How can I determine the canonical URL of a given resource address when crawling it?
A "spider" (crawler) is given the URL of a resource, for example, http://www.example.com/blog/2019/mega-article
However, the same page may also be reachable at other URL variants:
https://www.example.com/blog/2019/mega-article
http://m.example.com/blog/2019/mega-article
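If the page declares nothing, one common fallback is to collapse such variants with a normalization rule of your own. The sketch below does that under loudly stated assumptions: HTTP and HTTPS serve the same content, the `m.` and `www.` hosts are mirrors of the bare domain, and `https://www.` is the preferred form. `normalize_variant` and `MIRROR_PREFIXES` are hypothetical names, and the rule is a heuristic; it can be wrong for sites where these hosts serve different content.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical policy: host prefixes treated as mirrors of the bare domain.
MIRROR_PREFIXES = ("www.", "m.")


def normalize_variant(url: str) -> str:
    """Collapse scheme/host variants of the same page into one form (heuristic)."""
    parts = urlsplit(url.strip())
    # .hostname is already lowercased and excludes any port.
    host = parts.hostname or ""
    for prefix in MIRROR_PREFIXES:
        if host.startswith(prefix):
            host = host[len(prefix):]
            break
    # Assumption: HTTPS on the "www." host is the preferred variant.
    host = "www." + host
    path = parts.path or "/"
    # The fragment is never sent to the server, so it is dropped.
    return urlunsplit(("https", host, path, parts.query, ""))
```

With this rule all three addresses from the question map to `https://www.example.com/blog/2019/mega-article`.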
If the page's <head> contains rel=canonical, everything is clear: just extract that URL and you're done.
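Extracting it can be done with the standard library alone. A minimal sketch, assuming the HTML has already been downloaded; `canonical_from_html` is a hypothetical helper name. It takes the first `<link>` whose `rel` contains `canonical` (it does not check that the tag is actually inside `<head>`) and resolves a relative `href` against the URL the page was fetched from.

```python
from html.parser import HTMLParser
from typing import Optional
from urllib.parse import urljoin


class _CanonicalFinder(HTMLParser):
    """Collects the href of the first <link rel="canonical"> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.href: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "link" or self.href is not None:
            return
        attr = dict(attrs)
        rels = (attr.get("rel") or "").lower().split()
        if "canonical" in rels and attr.get("href"):
            self.href = attr["href"]


def canonical_from_html(html: str, page_url: str) -> Optional[str]:
    """Return the absolute canonical URL declared in the page, or None."""
    finder = _CanonicalFinder()
    finder.feed(html)
    finder.close()
    if finder.href is None:
        return None
    # A relative href is resolved against the URL the page was fetched from.
    return urljoin(page_url, finder.href.strip())
```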
What if rel=canonical is not specified?
Are there other ways to find the canonical URL? And if I still need one, how do I handle this situation?
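One other place a canonical URL can be declared is the HTTP response itself: a `Link` header carrying `rel="canonical"` (Web Linking syntax, RFC 8288), which search engines such as Google accept for non-HTML resources like PDFs. The sketch below pulls that URI out of a raw header value; `canonical_from_link_header` is a hypothetical name, and the regex is a simplified parser that does not handle commas or semicolons inside quoted parameter values.

```python
import re
from typing import Optional

# Matches one link-value: <uri> followed by zero or more ";param" parts.
_LINK_RE = re.compile(r"<([^>]*)>\s*((?:;\s*[^;,]+)*)")


def canonical_from_link_header(header: str) -> Optional[str]:
    """Return the URI marked rel="canonical" in a Link header value, if any."""
    for match in _LINK_RE.finditer(header):
        uri, params = match.group(1), match.group(2)
        for param in params.split(";"):
            name, _, value = param.strip().partition("=")
            if name.strip().lower() == "rel":
                # rel may hold several space-separated relation types.
                rels = value.strip().strip('"').lower().split()
                if "canonical" in rels:
                    return uri
    return None
```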
Update:
What I need is an ID that represents the canonical URL of a given URL.
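Once the canonical URL has been determined, a stable ID can be derived from it by hashing. A minimal sketch, assuming a truncated SHA-256 hex digest is acceptable as an identifier; `url_id` and the default length of 16 hex characters are illustrative choices, not requirements. Every variant that resolves to the same canonical string gets the same ID.

```python
import hashlib


def url_id(canonical_url: str, length: int = 16) -> str:
    """Derive a stable, fixed-length ID from an already-canonical URL."""
    digest = hashlib.sha256(canonical_url.encode("utf-8")).hexdigest()
    # 16 hex chars = 64 bits; increase `length` if collisions are a concern.
    return digest[:length]
```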
Could you rephrase the question or explain what you need it for?
As it stands, the question reads as "how do I get the canonical URL if it is not in the page code?"
If a page has no rel=canonical pointing to another URL, then that page is its own canonical by default.
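That default can be expressed as a small fallback. A minimal sketch; `effective_canonical` is a hypothetical name, `declared` is whatever canonical value (if any) was extracted from the page, and `page_url` is assumed to be the URL the page was finally served from.

```python
from typing import Optional
from urllib.parse import urljoin


def effective_canonical(page_url: str, declared: Optional[str]) -> str:
    """Return the declared canonical URL, or the page's own URL if none is declared."""
    if declared and declared.strip():
        # A declared canonical may be relative, so resolve it against the page URL.
        return urljoin(page_url, declared.strip())
    # No canonical pointing elsewhere: the page is its own canonical.
    return page_url
```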