What does “canonical” mean? I have been asked many times, so I created this explanation for clients and other non-technical parties.
What is a “canonical”?
The term “canonical” means “a standard or fundamental principle”. In website development, a canonical reference is used when you have multiple web pages with duplicate or near-duplicate content. The canonical tells search engines “Yes, these pages are duplicates. Please use this page as the official version”.
Why do canonicals matter?
Duplicate content is not penalized by search engines, but there is good reason to avoid it. “Backlinks”, links to your site from other sites, are important ranking signals for search engines. If you have two identical pages, each page might gain links. The result is that you dilute the benefit of those backlinks by spreading them across two pages. Proper canonicalization will pass the link value to the “official” version.
In addition to diluting backlinks, there are other concerns. Every site has a “crawl budget”, meaning how often Googlebot visits the site and how many pages it views. You don’t want to waste that budget on duplicate pages. Nor does Google. If Googlebot finds too many duplicate pages, it may decrease your crawl budget. Thus, it is possible that Googlebot will not even visit important pages.
Real world examples
Every day we see websites with multiple versions of their homepage. For most websites the homepage will receive more backlinks than any other page. Diluting this link value is a huge missed opportunity. Here are some common variations. They all show identical content, but are technically distinct web pages.
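As a rough illustration of how one homepage can exist at several addresses, the Python sketch below lists a few typical variants and maps each to a single preferred form. The hostname example.com, the list of index file names, and the choice of "https plus www plus trailing slash" as the preferred form are all assumptions made for the example, not a recommendation for any particular site.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical homepage variants; example.com stands in for a real site.
HOMEPAGE_VARIANTS = [
    "http://example.com",
    "http://www.example.com",
    "https://example.com",
    "https://www.example.com",
    "https://www.example.com/",
    "https://www.example.com/index.html",
]

# Assumed default-document names that a server might serve for "/".
INDEX_FILES = ("index.html", "index.htm", "index.php")


def preferred_homepage(url: str) -> str:
    """Map a homepage variant to one assumed preferred form.

    Forces https, adds "www." if missing, strips a trailing index file,
    and drops any query string or fragment.
    """
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if not host.startswith("www."):
        host = "www." + host
    path = parts.path
    for name in INDEX_FILES:
        if path.endswith("/" + name):
            path = path[: -len(name)]  # keep the trailing slash
            break
    if not path:
        path = "/"
    return urlunsplit(("https", host, path, "", ""))


# Every variant collapses to the same preferred URL.
distinct = {preferred_homepage(u) for u in HOMEPAGE_VARIANTS}
print(len(HOMEPAGE_VARIANTS), "variants ->", sorted(distinct))
```

To a search engine, each entry in the list is a separate page until a canonical (or a redirect) says otherwise.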
In another scenario, imagine an ecom site with variations of a single product. Faceted navigation (i.e. search refinements) often results in near-duplicate content spread across many distinct URLs. For example, you sell a shirt that is available in 4 sizes and 5 colors. The canonical should be the base URL, with all variations referencing that base URL. However, we often find that each combination of size and color is given a unique URL that includes those search parameters, and that each of those URLs is given a self-referential canonical. This scenario will generate 21 URLs with identical content: the base URL plus 20 size-and-color combinations.
Extrapolate this to 5,000 products and you can easily end up with more than 100,000 unique URLs.
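The arithmetic behind those numbers can be checked with a short Python sketch. The base URL, the parameter names size and color, and the specific size and color values are hypothetical; only the counts (4 sizes, 5 colors, 5,000 products) come from the example above.

```python
from itertools import product
from urllib.parse import urlencode

BASE_URL = "https://www.example.com/shirt"  # hypothetical product URL
SIZES = ["s", "m", "l", "xl"]  # 4 sizes
COLORS = ["red", "blue", "green", "black", "white"]  # 5 colors


def variant_urls(base, sizes, colors):
    """Return the base URL plus one parameterized URL per size/color pair."""
    urls = [base]
    for size, color in product(sizes, colors):
        urls.append(base + "?" + urlencode({"size": size, "color": color}))
    return urls


def canonical_for(url):
    """The canonical every variant should declare: the URL without parameters."""
    return url.split("?", 1)[0]


urls = variant_urls(BASE_URL, SIZES, COLORS)
per_product = len(urls)             # 1 base + 4 * 5 combinations
total_for_catalog = per_product * 5000

print(per_product, total_for_catalog)
print(all(canonical_for(u) == BASE_URL for u in urls))
```

With correct canonicals, all of those URLs point back to one base URL per product, so 5,000 products means 5,000 official pages.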
rel=canonical in your code
In HTML it is written <link rel="canonical" href="https://www.pageurl.com"> and is placed in the <head> of the page. Here is an example: page1 and page2 are identical. We want to define page1 as the canonical, i.e. “official” version.
The canonical for page1 is self-referential, meaning literally, “the official version of this page is this page”.
Page2 is in a different navigation hierarchy, but is otherwise identical. We want to tell Google to ignore this page and instead use page1.
The URL for page2 is https://www.sitename.com/main-services/page2/. The canonical URL will be that of page1.
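To make the page1 and page2 relationship concrete, here is a minimal Python sketch that produces the tag each page would carry in its <head>. The page2 URL is the one given above. The page1 URL is a hypothetical stand-in, and the function name canonical_link_tag and the lookup table are invented for this illustration.

```python
from html import escape

# Hypothetical URL for page1; the real one is whatever your site uses.
PAGE1_URL = "https://www.sitename.com/services/page1/"
PAGE2_URL = "https://www.sitename.com/main-services/page2/"

# Which URL each page declares as its official version.
CANONICAL_FOR = {
    PAGE1_URL: PAGE1_URL,  # self-referential
    PAGE2_URL: PAGE1_URL,  # duplicate defers to page1
}


def canonical_link_tag(page_url):
    """Build the <link rel="canonical"> tag for the given page."""
    target = CANONICAL_FOR[page_url]
    return '<link rel="canonical" href="{}">'.format(escape(target, quote=True))


print(canonical_link_tag(PAGE1_URL))
print(canonical_link_tag(PAGE2_URL))
```

Both pages end up with the same tag, which is the point: whichever page a search engine lands on, it is told that page1 is the official version.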
Auditing and implementing rel=canonical
Discovering the canonicals for each page on your site is quite easy using a crawler such as Screaming Frog. This will map each URL and its canonical, and will tell you if the canonical is missing.
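For readers curious what such an audit does under the hood, the sketch below uses only the Python standard library to read the canonical out of a page's HTML and classify it. It is not how Screaming Frog works internally. The sample pages, their URLs (apart from page2's), and the status labels are assumptions for the example, and a real audit would fetch live pages rather than use hard-coded strings.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collect the href of every <link rel="canonical"> tag in a document."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        rel_values = (attr.get("rel") or "").lower().split()
        if "canonical" in rel_values and attr.get("href"):
            self.hrefs.append(attr["href"])


def audit_canonical(page_url, html_text):
    """Return a short status label for the page's canonical."""
    finder = CanonicalFinder()
    finder.feed(html_text)
    if not finder.hrefs:
        return "missing"
    if len(set(finder.hrefs)) > 1:
        return "conflicting"
    if finder.hrefs[0] == page_url:
        return "self-referential"
    return "points to " + finder.hrefs[0]


# Hypothetical crawl results: URL -> HTML source.
PAGE1 = "https://www.sitename.com/services/page1/"  # hypothetical
PAGE2 = "https://www.sitename.com/main-services/page2/"
PAGE3 = "https://www.sitename.com/about/"  # hypothetical
pages = {
    PAGE1: '<html><head><link rel="canonical" href="%s"></head></html>' % PAGE1,
    PAGE2: '<html><head><link rel="canonical" href="%s"></head></html>' % PAGE1,
    PAGE3: "<html><head><title>About</title></head></html>",
}

report = {url: audit_canonical(url, source) for url, source in pages.items()}
for url, status in report.items():
    print(url, "->", status)
```

The three statuses that matter most in practice are the ones shown here: a page that names itself, a page that defers to another, and a page with no canonical at all.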
Every site and CMS handles the management of canonicals differently. Common CMSs like WordPress are generally straightforward. Some popular CMSs that “do SEO for you” automatically create self-referencing canonicals. One example is Squarespace, which, I’ve found, prevents any editing of the canonicals. Custom websites vary wildly.