SEO was so much simpler back in the old days. Code was written — the bots obeyed. That was the end of it. These days, however, robots don’t always do as they’re told.
Search engines (mainly Google) are growing a sophisticated brain of their own. They will take your code as a signal, but they don’t always listen anymore. They determine, on their own, whether or not your command is beneficial for their index and for their users as a general rule. As such, it’s important to understand how some critical tags and requests are handled.
Why are they doing this? It doesn’t really matter, because it simply doesn’t change the fact that it’s happening and most likely won’t be going away. So, as always, the best thing to do is adapt to the change quickly. It’s the only way to stay ahead and on top of your competition.
Now, most elements are seen as either suggestive (taken as a “hint”, but not necessarily followed) or a directive (much more likely to be followed as instructed). So let’s explore them further and note which is which.
This is where the “nofollow” tag came in. For years, webmasters utilized this little tag to prevent spammers from flooding their blog comments, forum signatures and various other UGC channels for quick links. It was useful in a myriad of ways.
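For reference, a nofollowed link is just an ordinary anchor carrying a rel attribute (the URL and anchor text here are placeholders):

```html
<a href="https://site.com/some-page" rel="nofollow">anchor text</a>
```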
However, the nofollow tag has now reached a point where search engines have publicly stated that, even though it will be acknowledged, it will no longer be the directive it once was. All things being equal, this shouldn’t hurt search quality the way many expect. But if you were to secure a nofollowed link from Wikipedia, for example, it could now do more good than you think. I’d conclude this is the sort of scenario the search engines had in mind when making the change.
The canonical tag is extremely useful in situations like these, and can help a company avoid link dilution and self-cannibalization when near-duplication of pages is unavoidable or purely accidental. For some reason, it is merely suggestive, meaning search engines may still index the duplicated pages. I don’t see why this should be the case; nevertheless, it is something to note if you see such pages still being indexed.
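As a refresher, the canonical tag sits in the head of the duplicate page and points at the preferred URL (the addresses below are placeholders):

```html
<!-- Placed on https://site.com/shoes?sort=price, pointing to the preferred version -->
<link rel="canonical" href="https://site.com/shoes">
```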
<link rel="prev" href="https://site.com/shoes?page=3">
As search engines have become smarter, they are able to understand which pages are paginated and treat them accordingly. As a result, Google has decided to retire the tag altogether:
As we evaluated our indexing signals, we decided to retire rel=prev/next.
Studies show that users love single-page content, aim for that when possible, but multi-part is also fine for Google Search. Know and do what's best for *your* users! #springiscoming
— Google Search Central (@googlesearchc) March 21, 2019
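Next up is robots.txt. A simple file might look like this:

```text
User-agent: *
Disallow: /admin

User-agent: baiduspider
Disallow: /
```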
The above example is a simple setup where every robot (represented by the asterisk) is permitted to crawl the website in question, bar the /admin folder and every file within it. The exception is the Chinese search engine, Baidu (i.e. baiduspider), which is restricted from crawling the website entirely (the lone forward slash covering the site root and everything beneath it).
Search engines take robots.txt rules quite seriously; they are considered directives in every sense of the word.
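As a quick sanity check, Python’s standard urllib.robotparser module can evaluate rules like the ones described above (the site and bot names are placeholders):

```python
from urllib import robotparser

# Rules matching the hypothetical example discussed above
rules = """\
User-agent: *
Disallow: /admin

User-agent: baiduspider
Disallow: /
"""

rp = robotparser.RobotFileParser()
rp.parse(rules.splitlines())

# Any generic bot may fetch normal pages, but not /admin
print(rp.can_fetch("mybot", "https://site.com/shoes"))        # True
print(rp.can_fetch("mybot", "https://site.com/admin/users"))  # False

# Baidu's crawler is blocked from the entire site
print(rp.can_fetch("baiduspider", "https://site.com/"))       # False
```

This is also a handy way to verify your own robots.txt before deploying it, rather than waiting for crawl stats to reveal a mistake.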
XML sitemaps are not necessary for a website to be crawled and indexed, but they certainly help, especially in circumstances where a page is not linked to from any other page on the site, leaving what is known as an “orphan” page. Think of an XML sitemap as an auxiliary method of ensuring every possible page is discovered and crawled by the bots.
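A minimal sitemap entry looks something like this (the URL, date and values are placeholders):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://site.com/shoes</loc>
    <lastmod>2019-03-21</lastmod>
    <changefreq>weekly</changefreq>
    <priority>0.8</priority>
  </url>
</urlset>
```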
While some tags, such as changefreq (change frequency) and priority, are ignored by the engines, loc (the URL) and lastmod (last modified) are not.
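On to robots meta tags. A typical directive placed in a page’s head looks like this:

```html
<meta name="robots" content="noindex, follow">
```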
Other parameters can be included alongside the noindex function, such as the “follow” featured above. This tells the bots that, while we don’t want the page to be indexed, they should still follow all the links on the page and pass equity to the linked pages.
There are many scenarios where different combinations might be ideal, e.g. “index, nofollow” or “noindex, nofollow”. The important thing is to remain consistent and to cross-reference with the robots.txt file, ensuring there isn’t any contradictory or mixed messaging being sent to the crawlers.
There is always going to be contention and debate as to how influential each of these signals is. Search engines remain quite vague on the specifics, and their engineers routinely contradict one another in person and on social media. There is also the discrepancy between what the official webmaster blogs claim and what actually happens during a typical crawl and indexing cycle, as the experiments some SEOs conduct to verify such claims have shown.
Being aware of these issues, along with the fact that over 500 changes are made to the algorithm annually, may leave you feeling so overwhelmed and discouraged that you don’t even bother keeping up with it all. And that, as they say, is when people get left behind. The fact of the matter is that it’s an SEO’s duty to stay on the cutting edge and to thrive in this ever-changing landscape. Besides, it’s a great way to level the playing field for old and new consultants alike. It’s what keeps us sharp and on our toes, never relying on past success but always reinventing ourselves in new and better ways.
Wouldn’t you agree?