You only have so much time to capture a user’s attention when they are on your site, so every bit of your website real estate is crucially important. Duplicating information on your website could be the very reason people decide to leave and take their web surfing somewhere else. Not only could duplicate information increase your bounce rate, but it could also have a significant negative impact on your SEO.
So how do you find that duplicate content on your site if you have a large number of pages and links and you don’t want to sit and crawl through every page yourself? It’s important to note that there are different types of duplicate content (cross-domain, internal, and partial, to name a few), and the one I will be talking about here is the internal duplicate.
First, let’s define what we’re talking about. Content is duplicate when it can be reached by multiple URL paths. If a product page is assigned to multiple category pages, and each of those pages has its own URL, the result is internal duplicate content.
In a case like this, you end up with duplicate content that ideally needs to be resolved to restore your site’s SEO health. If you don’t explicitly tell Google that one of the pages is a duplicate, Google will simply index all of them, and that is not good for your SEO value.
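To make the category-path scenario concrete, here is a minimal Python sketch that flags URLs ending in the same product slug. The URLs and the `group_by_slug` helper are hypothetical, and the sketch assumes the last path segment identifies the product, which will not hold on every platform.

```python
from collections import defaultdict
from urllib.parse import urlparse


def group_by_slug(urls):
    """Group URLs by their final path segment (assumed to be the product slug)."""
    groups = defaultdict(list)
    for url in urls:
        path = urlparse(url).path.rstrip("/")
        slug = path.rsplit("/", 1)[-1]
        groups[slug].append(url)
    # Keep only slugs that are reachable through more than one URL.
    return {slug: paths for slug, paths in groups.items() if len(paths) > 1}


# Hypothetical URLs, for illustration only.
urls = [
    "https://sitename.com/shoes/red-sneaker",
    "https://sitename.com/sale/red-sneaker",
    "https://sitename.com/shoes/blue-boot",
]
duplicates = group_by_slug(urls)
print(duplicates)
```

Here the same product is reachable under both the "shoes" and the "sale" category, so it is reported as a likely internal duplicate.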
If you know that your site has 100 pages (just an example), but Google reports 200 URLs in its index, then you have duplicate content issues. To get to the bottom of this, we need to check how many pages the site actually has and how many URLs are in Google’s index.
The Actual Number of Pages
The easiest way to check the number of pages is to head to the site’s XML sitemap (or its sitemap index). As per Wikipedia:
“The Sitemaps protocol allows a webmaster to inform search engines about URLs on a website that are available for crawling. A Sitemap is an XML file that lists the URLs for a site.”
But you might be thinking, where do you find the XML sitemap? That’s a great question! It really depends on the platform. On Shopify stores, the XML sitemap is created automatically. You access it by adding “sitemap.xml” to the root domain, like this: sitename.com/sitemap.xml.
On BigCommerce it would be sitename.com/xmlsitemap.php. If you are not using either of those platforms, simply head over to the site’s robots.txt file at sitename.com/robots.txt. That’s one of the locations where the XML sitemap can be referenced.
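If you would rather not read robots.txt by eye, the lookup can be sketched in a few lines of Python. This is only an illustration: the `sitemap_urls_from_robots` helper and the sample file contents are made up, and the sketch assumes the sitemap is declared on a "Sitemap:" line, which is the usual convention.

```python
def sitemap_urls_from_robots(robots_txt):
    """Return the URLs listed on 'Sitemap:' lines of a robots.txt body."""
    found = []
    for line in robots_txt.splitlines():
        # Drop trailing comments, then split the directive name from its value.
        line = line.split("#", 1)[0].strip()
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "sitemap" and value.strip():
            found.append(value.strip())
    return found


# Hypothetical robots.txt contents, for illustration only.
sample = """User-agent: *
Disallow: /cart
Sitemap: https://sitename.com/sitemap.xml
"""
print(sitemap_urls_from_robots(sample))
```

An empty list means the sitemap is not referenced in robots.txt, and you will need to look for it elsewhere.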
Once you have found the XML sitemap, simply count the number of URLs referenced in it. That’s your number X.
Tip: if you are on Windows, press Ctrl + F and search for <url>. The number of matches is the number of URLs in the sitemap.
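For a large sitemap, counting can also be done with a short script. The following is a rough sketch using Python’s standard library. The `count_sitemap_urls` helper and the sample XML are hypothetical, and it assumes a plain sitemap that uses the standard sitemaps.org namespace.

```python
import xml.etree.ElementTree as ET

# Namespace used by the Sitemaps protocol.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def count_sitemap_urls(xml_text):
    """Count the <url> entries directly under the sitemap's root element."""
    root = ET.fromstring(xml_text)
    return len(root.findall("sm:url", NS))


# Hypothetical sitemap contents, for illustration only.
sample = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://sitename.com/</loc></url>
  <url><loc>https://sitename.com/shoes/red-sneaker</loc></url>
</urlset>"""
print(count_sitemap_urls(sample))
```

If your site exposes a sitemap index instead, each sitemap it lists would need to be counted separately and the totals added together to get number X.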
Total URLs Indexed by Google
Now that you know how many URLs your site has from the sitemap, it’s time to compare this to Google’s list, which is actually a piece of cake to do. Just go to Google.com and search for site:sitename.com.
Be sure to replace “sitename.com” with the actual root domain of your site. The number of results Google reports is your number Y.
Comparing the Numbers
Finally, it’s time to compare the numbers! If number Y is substantially higher than number X, then, my friend, you have got work to do. You can either do it yourself, or you can outsource this to a digital marketing agency that does SEO work… you know… kind of like Slicedbread.
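As a quick sanity check, the comparison can be written down as a ratio of Y to X. This is only a sketch: the function names are hypothetical, and the 1.5 threshold is an arbitrary assumption standing in for “substantially higher”, not an official cutoff.

```python
def index_ratio(sitemap_count, indexed_count):
    """Return Y / X: indexed URLs per URL listed in the sitemap."""
    if sitemap_count <= 0:
        raise ValueError("sitemap_count must be positive")
    return indexed_count / sitemap_count


def looks_duplicated(sitemap_count, indexed_count, threshold=1.5):
    """Flag the site when Y exceeds X by the (assumed) threshold factor."""
    return index_ratio(sitemap_count, indexed_count) >= threshold


# The 100-page site with 200 indexed URLs from the example earlier.
print(index_ratio(100, 200))
print(looks_duplicated(100, 200))
```

A ratio close to 1 suggests the index roughly matches the sitemap, while a ratio near 2, as in the 100 versus 200 example, points to a duplicate content problem worth investigating.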
Every little thing can help you make more sales and increase your traffic, so why not start with the basics and make sure that all of the pages on your website are unique, interesting, and easily indexed by Google? Your website is a direct representation of your brand, so don’t let it get sloppy or it will reflect poorly on your product as well.