SEO was so much simpler back in the old days. Code was written — the bots obeyed. That was the end of it. These days, however, robots don’t always do as they’re told.
Search engines (mainly Google) are growing a sophisticated brain of their own. They will take your implementation of code as a sign, but they don’t always listen anymore. They will determine, on their own, whether or not your command was a beneficial one for their index and for the users as a general rule. As such, it’s important to understand how some critical tags and requests are handled.
Why are they doing this? It doesn’t really matter, because it simply doesn’t change the fact that it’s happening and most likely won’t be going away. So, as always, the best thing to do is adapt to the change quickly. It’s the only way to stay ahead and on top of your competition.
Now, most elements are seen either as suggestive (taken as a “hint”, but not necessarily followed) or as a directive (much more likely to be followed as directed). So let’s explore them further and note what’s what.
Nofollow
Link building is perhaps the strongest factor in determining a website’s authority and ability to appear for its target keywords. A link from one website to another is made up of a standard HTML reference, the “anchor hyperlink reference” (an <a> element with an href attribute). Search engines realized, however, that some links are not meant to be an endorsement or a helpful indicator of value, and that webmasters should have the ability to make these links pass no authority, even when the link still needs to live on the site.
This is where the “nofollow” tag (strictly speaking, a value of the link’s rel attribute) came in. For years, webmasters used this little tag to prevent spammers from flooding their blog comments, forum signatures and other UGC channels in pursuit of quick links. It was useful in a myriad of ways.
However, search engines have now publicly announced that, although nofollow will still be acknowledged, it will be treated as a hint rather than the directive it once was. All things being equal, this shouldn’t hurt search quality as most expect, but if you were to secure a nofollowed link from Wikipedia, for example, it could now do more good than you think. I’d conclude this is the sort of scenario the search engines were imagining when making the change.
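To make the markup concrete, here is a minimal Python sketch that scans a snippet of HTML and lists the links carrying nofollow. The sample markup, the example.com URLs and the helper names (LinkRelParser, nofollowed_links) are made up for illustration; only the standard library's html.parser is used.

```python
from html.parser import HTMLParser


class LinkRelParser(HTMLParser):
    """Collect (href, rel tokens) for every <a> element that has an href."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        href = attr_map.get("href")
        if href is None:
            return
        # rel is a space-separated token list, e.g. "nofollow ugc"
        rel_tokens = (attr_map.get("rel") or "").lower().split()
        self.links.append((href, rel_tokens))


def nofollowed_links(html):
    """Return the hrefs of all links whose rel attribute includes nofollow."""
    parser = LinkRelParser()
    parser.feed(html)
    return [href for href, rel in parser.links if "nofollow" in rel]


# Hypothetical markup: one ordinary link and one nofollowed comment link
SAMPLE_HTML = """
<p>
  <a href="https://example.com/partner">An endorsed link</a>
  <a href="https://example.com/comment-spam" rel="nofollow ugc">A comment link</a>
</p>
"""

print(nofollowed_links(SAMPLE_HTML))
```

Keep in mind this only tells you what the markup asks for. Whether a search engine honors the hint is its own decision.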
Canonical
A canonical tag specifies the preferred URL when multiple URLs on a site have the same or similar content, in order to reduce duplicate content issues. An example would be a product listings page on an eCommerce website where filters (size, color, fabric, etc.) are applied: the URL differs slightly with each filter, yet the page displays essentially the same content. It can also be used where a 301 redirect is not possible for one reason or another.
The canonical tag is extremely useful in situations like these, and can help a company avoid link dilution and self-cannibalization when near-duplication of pages is unavoidable or purely accidental. For some reason, though, it is merely suggestive, meaning search engines may still choose to index the duplicate pages. I don’t see why this would be the case; nevertheless, it is something to keep in mind if you see such pages still being indexed.
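As a rough illustration, the sketch below shows what a canonical tag looks like in the head of a filtered listing page, and how you might check which URL a page declares as canonical. The page markup, the example.com URL and the helper names (CanonicalParser, find_canonical) are assumptions for the sake of the example.

```python
from html.parser import HTMLParser


class CanonicalParser(HTMLParser):
    """Record the href of the first <link rel="canonical"> element."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attr_map = dict(attrs)
        rel_tokens = (attr_map.get("rel") or "").lower().split()
        if "canonical" in rel_tokens:
            self.canonical = attr_map.get("href")


def find_canonical(html):
    """Return the declared canonical URL, or None if the page has none."""
    parser = CanonicalParser()
    parser.feed(html)
    return parser.canonical


# Hypothetical filtered listing page, e.g. served at /shirts?color=blue&size=m
FILTERED_PAGE = """
<html>
  <head>
    <title>Shirts</title>
    <link rel="canonical" href="https://example.com/shirts">
  </head>
  <body></body>
</html>
"""

print(find_canonical(FILTERED_PAGE))
```

Running a check like this across every filtered variant of a listing page is a quick way to confirm they all point at the same preferred URL.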
Pagination
Pagination is a common phenomenon on eCommerce websites that have more than one page of listings for each product category. In the not too distant past, the way to avoid issues with many pages sharing the same title, description and other similar content was to let the robots know about it through the rel=next/prev tags.
As search engines have become smarter, they are able to understand which pages are paginated and treat them accordingly – in doing so, they have decided to deprecate the tag altogether:
As we evaluated our indexing signals, we decided to retire rel=prev/next.
Studies show that users love single-page content, aim for that when possible, but multi-part is also fine for Google Search. Know and do what's best for *your* users! #springiscoming
— Google Search Central (@googlesearchc) March 21, 2019
Robots.txt
Robots.txt is exactly what it sounds like: a text file found in the root directory of a website, used exclusively to restrict and control the behavior of search engine robots. Every search engine and SEO auditing tool has its own robot with a name and identifier, such as Googlebot for Google, Slurp for Yahoo, SemrushBot for SEMrush, etc.
A simple robots.txt might permit every robot (represented by the asterisk) to crawl the website in question, bar the /admin folder and every file within it. An exception could then be made for the Chinese search engine Baidu (i.e. baiduspider), restricting it from crawling the website entirely (a Disallow rule of a single forward slash covers the entire directory).
Search engines take the instructions in robots.txt files quite seriously, and these are considered directives in every sense of the word.
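Here is a sketch of a robots.txt with the rules described in this section, checked with Python's built-in urllib.robotparser. The file contents are a reconstruction from the description rather than a copy of any real site's file, and example.com and the allowed helper are placeholders.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt: everyone may crawl except /admin/, Baidu may crawl nothing
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/

User-agent: baiduspider
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(user_agent, path):
    """Return True if the given robot may crawl the given path."""
    return parser.can_fetch(user_agent, "https://example.com" + path)


for agent, path in [
    ("Googlebot", "/products"),
    ("Googlebot", "/admin/settings"),
    ("Baiduspider", "/products"),
]:
    print(agent, path, allowed(agent, path))
```

With these rules, Googlebot may fetch /products but not anything under /admin/, while Baiduspider is turned away from every path.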
Sitemap.xml
To clarify, there are many kinds of sitemaps available to a webmaster. The most common type is the HTML sitemap, typically found on a standalone page of a site. Every single page is listed in some order or fashion, designed to help users find exactly what they are looking for without having to deal with frustrating navigation. XML sitemaps are similar in this way, but are composed in a .xml file so that search engines can process them more easily.
XML sitemaps are not necessary for a website to be crawled and indexed, but they certainly help, especially where a page is not linked to from any other page on the site, leaving what is known as an “orphan” page. Think of an XML sitemap as an auxiliary method of ensuring every possible page is discovered and crawled by the bots.
While some elements, such as changefreq (change frequency) and priority, are ignored by search engines, loc (URL) and lastmod (last modified) are not.
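For illustration, a minimal XML sitemap containing only loc and lastmod can be generated with Python's standard library, as sketched below. The URLs and dates are invented, and build_sitemap is a hypothetical helper, not part of any SEO tool.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Build sitemap XML from (url, last_modified) pairs, dates as YYYY-MM-DD."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url, last_modified in pages:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
        ET.SubElement(entry, "lastmod").text = last_modified
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical pages, including one that nothing else on the site links to
PAGES = [
    ("https://example.com/", "2024-01-15"),
    ("https://example.com/orphan-landing-page", "2024-01-10"),
]

sitemap_xml = build_sitemap(PAGES)
print(sitemap_xml)
```

The sketch leaves out changefreq and priority on purpose, since those are the elements noted above as ignored.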
Noindex
The noindex tag is pretty self-explanatory: it instructs robots not to index a certain page. Of course, this is just one way of keeping a page out of the index. Other methods include blocking the page from robots via the robots.txt file (although this only stops crawling, so a blocked URL can still end up indexed if other pages link to it), or using the Search Console Remove URL tool to have it temporarily removed from the index.
Other parameters can be included alongside noindex, such as “follow”, as in “noindex, follow”. This tells the bots that, while we don’t want the page to be indexed, they should still follow all the links on the page and pass equity to the linked pages.
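As a sketch of what this looks like in practice, the snippet below reads the robots meta tag out of a page and splits it into its individual parameters. The page markup and the helper names (MetaRobotsParser, robots_directives) are assumptions for illustration.

```python
from html.parser import HTMLParser


class MetaRobotsParser(HTMLParser):
    """Collect the comma-separated values of <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = dict(attrs)
        if (attr_map.get("name") or "").lower() != "robots":
            return
        content = attr_map.get("content") or ""
        for token in content.split(","):
            token = token.strip().lower()
            if token:
                self.directives.add(token)


def robots_directives(html):
    """Return the set of robots meta parameters declared on the page."""
    parser = MetaRobotsParser()
    parser.feed(html)
    return parser.directives


# Hypothetical page that should stay out of the index but still pass equity
PAGE = '<html><head><meta name="robots" content="noindex, follow"></head></html>'

directives = robots_directives(PAGE)
print(sorted(directives))
```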
There are many scenarios where different combinations might be ideal, e.g. “index, nofollow” or “noindex, nofollow”. The important thing is to remain consistent and cross-reference with the robots.txt file to ensure no contradictory or mixed messages are being sent to the crawlers.
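One mixed message worth checking for is a page that carries noindex but is also blocked in robots.txt, since a crawler that is not allowed to fetch the page never gets to read the tag. The sketch below flags such pages. The robots.txt rules, the paths, the meta values and the mixed_signals helper are all made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt and per-page robots meta values
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

META_ROBOTS = {
    "/private/report": "noindex, follow",
    "/blog/post": "index, follow",
    "/thank-you": "noindex, follow",
}

robots = RobotFileParser()
robots.parse(ROBOTS_TXT.splitlines())


def mixed_signals(meta_by_path, user_agent="*"):
    """Return paths that carry noindex but are also blocked from crawling."""
    flagged = []
    for path, content in meta_by_path.items():
        tokens = {token.strip().lower() for token in content.split(",")}
        blocked = not robots.can_fetch(user_agent, path)
        if "noindex" in tokens and blocked:
            flagged.append(path)
    return flagged


print(mixed_signals(META_ROBOTS))
```

Here only /private/report is flagged: it asks not to be indexed, yet robots.txt keeps the crawler from ever seeing that request.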
There is always going to be contention and debate as to how influential each of these signals is. Search engines remain quite vague on the specifics, and engineers routinely contradict each other in person and on social media. There is also a discrepancy between what the official webmaster blogs claim and what actually happens in a typical crawl and indexing effort when SEOs conduct experiments to verify such claims.
Being aware of all this, along with the fact that over 500 changes are made to the algorithm annually, may leave you feeling so overwhelmed and discouraged that you don’t even bother keeping up with it all. And that, as they say, is when people get left behind. The fact of the matter is that it’s the duty of an SEO to stay on the cutting edge and to thrive in this ever-changing landscape. Besides, it’s a great way to level the playing field for old and new consultants alike. It’s what keeps us sharp and on our toes, not relying on our past success but always recreating ourselves in new and better ways.
Wouldn’t you agree?