SEO: 2 Good Ways to Remove Duplicate Content, and 8 Bad Ones : Softpact eBusiness Solutions

Duplicate content is two or more pages containing the same or very similar text. Duplicate content splits link authority and thus diminishes a page's ability to rank in organic search results.

Say a site has two identical pages, each with 10 external, inbound links. That site could have harnessed the strength of 20 links to boost the ranking of a single page. Instead, the site has two pages with 10 links each. Neither would rank as highly.
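The arithmetic can be sketched as a toy model in Python. It assumes, purely for illustration, that link authority is a plain count of inbound links; the URLs and the duplicate-to-preferred mapping are hypothetical, and real search engines weigh links in far more complex ways.

```python
from collections import Counter

# Hypothetical inbound links: each entry is the URL a link points to.
inbound_links = (
    ["https://www.example.com/widget"] * 10
    + ["https://www.example.com/widget?ref=home"] * 10
)

# Hypothetical map from a duplicate URL to the preferred URL, as a
# 301 redirect or canonical tag would declare it.
preferred = {
    "https://www.example.com/widget?ref=home": "https://www.example.com/widget",
}

def link_counts(links, redirects=None):
    """Count links per target URL, optionally folding duplicates into one URL."""
    redirects = redirects or {}
    return Counter(redirects.get(url, url) for url in links)

split = link_counts(inbound_links)
consolidated = link_counts(inbound_links, preferred)
print(split)         # two URLs with 10 links each
print(consolidated)  # one URL with 20 links
```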

Duplicate content also hurts crawl budget and otherwise bloats search engines' indexes.

Ecommerce sites create duplicate content. It's a byproduct of platform settings and technology decisions. What follows are two good ways to remove duplicate content from search-engine indexes, and eight to avoid.

Remove Indexed Duplicate Content

To correct indexed duplicate content, (i) consolidate link authority into a single page and (ii) prompt the search engines to remove the duplicate page from their index. There are two good ways to do this.

301 redirects are the best option. 301 redirects consolidate link authority, prompt de-indexation, and also redirect the user to the new page. Google has stated that it assigns 100 percent of the link authority to the new page with a 301 redirect. But Bing and other search engines are more tight-lipped. Regardless, use 301 redirects only when the page has been permanently removed.
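How a 301 looks at the HTTP level can be sketched with Python's standard http.server module. This is a minimal illustration, not production server configuration; the redirect map, the function redirect_for, and the handler name are hypothetical. In this sketch, any path that is not in the map is simply treated as missing.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical map of permanently removed paths to their replacements.
PERMANENT_REDIRECTS = {
    "/widget-blue/": "/widget/",
    "/old-widget/": "/widget/",
}

def redirect_for(path):
    """Return (status, location) for a permanently moved path, else None."""
    target = PERMANENT_REDIRECTS.get(path)
    return (301, target) if target is not None else None

class RedirectHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        redirect = redirect_for(self.path)
        if redirect is None:
            # Sketch only: every path outside the map is answered as missing.
            self.send_error(404)
            return
        status, location = redirect
        # 301 Moved Permanently: the browser follows Location immediately,
        # and search engines treat the old URL as permanently replaced.
        self.send_response(status)
        self.send_header("Location", location)
        self.end_headers()

def run(port=8000):
    """Serve the redirects on localhost; call run() manually to try it."""
    HTTPServer(("127.0.0.1", port), RedirectHandler).serve_forever()
```

Keeping the lookup in a plain function separates the redirect rules from the server plumbing, so the same map could be checked or reused elsewhere.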

Canonical tags. "Canonical" is a fancy word for something that is recognized as the one truth. In search engine optimization, canonical tags identify which page should be indexed and assigned link authority. The tags are suggestions to search engines, not commands like 301 redirects. Search engines typically respect canonical tags for truly duplicate content.

Canonical tags are the next best option when (i) 301 redirects are impractical or (ii) the duplicate page needs to remain accessible. For example, if you have two product grid pages, one sorted high-to-low and the other low-to-high, you wouldn't want to redirect one to the other.
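For the sorted-grid case, both sort orders can point their canonical tag at one unsorted URL. A minimal Python sketch, assuming the sort order lives in a hypothetical "sort" query parameter (and a hypothetical "view" parameter for layout); the function names are also illustrative.

```python
from html import escape
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical query parameters that change only presentation, not which
# products appear on the page.
PRESENTATION_PARAMS = {"sort", "view"}

def canonical_url(url):
    """Drop presentation-only parameters to get the preferred URL."""
    parts = urlsplit(url)
    kept = [(key, value)
            for key, value in parse_qsl(parts.query, keep_blank_values=True)
            if key not in PRESENTATION_PARAMS]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))

def canonical_tag(url):
    """Build the <link rel="canonical"> element for the page's <head>."""
    return f'<link rel="canonical" href="{escape(canonical_url(url), quote=True)}">'

print(canonical_tag("https://www.example.com/shoes/?sort=price-desc"))
# <link rel="canonical" href="https://www.example.com/shoes/">
```

Both the high-to-low and low-to-high pages stay reachable for shoppers, while each names the same URL as the one to index.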

Eight Methods to Avoid

Some options that remove, or claim to remove, duplicate content from search indexes are not advisable, in my experience.

302 redirects signal a temporary move rather than a permanent one. Google has said for years that 302 redirects pass 100 percent of the link authority. However, 302s don't prompt de-indexation. Since they take the same amount of effort to implement as 301s, 302 redirects should be used only when the redirect is truly temporary and will someday be removed.

JavaScript redirects are regarded by Google as valid, but only after several days or even weeks have passed for the rendering to complete. There's little reason to use JavaScript redirects unless you lack server access for 301s.

Meta refreshes are visible to shoppers as a brief blip or multisecond page load on their screen before the browser loads a new page. They're a poor choice due to the obnoxious user experience and the rendering time Google needs to process them as redirects.
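To find meta refreshes that are candidates for replacement with 301 redirects, a page's HTML can be scanned for them. A minimal sketch using Python's standard html.parser module; the class and function names and the sample page are hypothetical.

```python
from html.parser import HTMLParser

class MetaRefreshFinder(HTMLParser):
    """Collect the content attribute of every <meta http-equiv="refresh">."""

    def __init__(self):
        super().__init__()
        self.refreshes = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # html.parser already lowercases tag and attribute names.
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("http-equiv", "").lower() == "refresh":
            self.refreshes.append(attributes.get("content", ""))

def find_meta_refreshes(html):
    finder = MetaRefreshFinder()
    finder.feed(html)
    finder.close()
    return finder.refreshes

page = '<html><head><meta http-equiv="refresh" content="3; url=/new-page/"></head></html>'
print(find_meta_refreshes(page))  # ['3; url=/new-page/']
```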

404 errors reveal that the requested file isn't on the server, prompting search engines to de-index that page. But 404s also remove the page's associated link authority. Try to 301 redirect a deleted page when you can.

Soft 404 errors occur when the server 302 redirects a bad URL to what looks like an error page, which then returns a 200 OK server header response. For example, say www.example.com/page/ has been removed and should return a 404 error. Instead, it 302 redirects to a page that looks like an error page (such as www.example.com/error-page/), but returns a 200 OK response.

The 302 response inadvertently tells search engines that www.example.com/page/ is gone but might be coming back, so the page should remain indexed. Moreover, the 200 response tells search engines that www.example.com/error-page/ is a valid page for indexing. Soft 404s thus bloat the index even further, resulting in not just one bad URL being indexed, but two.
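The difference between a soft 404, a true 404, and a 301 shows up in the chain of status codes a crawler sees when it requests a removed URL. A minimal Python sketch that classifies such a chain; it assumes you already recorded the hops as hypothetical (url, status) pairs, and the labels simply restate the outcomes described above.

```python
def diagnose_removed_url(chain):
    """Classify how a server answers a URL that has been removed.

    chain: hypothetical list of (url, status) hops recorded while following
    redirects, starting with the removed URL itself.
    """
    first_status = chain[0][1]
    final_status = chain[-1][1]
    if first_status == 404:
        return "404: de-indexed, link authority lost"
    if first_status == 301 and final_status == 200:
        return "301: de-indexed, link authority consolidated"
    if final_status == 200:
        # A 302 (or a direct 200) ending in an "error page" that says OK.
        return "soft 404: stays indexed"
    return "other: inspect manually"

soft = [("https://www.example.com/page/", 302),
        ("https://www.example.com/error-page/", 200)]
print(diagnose_removed_url(soft))  # soft 404: stays indexed
```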

Search engine tools. Google and Bing provide tools to remove a URL. However, since both require that the submitted URL return a valid 404 error, the tools are a backup step after removing the page from your server.

The meta robots noindex tag sits in the head of the HTML file. The noindex attribute tells bots not to index the page. When applied after a page has been indexed, it may eventually result in de-indexation, but that could take months. Unfortunately, link authority dies with the engines' ability to index the page. And since search engines must continue to crawl a page to verify that the noindex attribute is still in place, this option doesn't reduce the number of dead-weight pages in the index. (Note, incidentally, that the nofollow attribute of the meta robots tag has no impact on that page's indexation.)
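Auditing which pages carry noindex (and confirming that nofollow is not being relied on for indexation) can start with reading the meta robots directives out of the HTML. A minimal sketch with Python's standard html.parser module; the class and function names and the sample head are hypothetical.

```python
from html.parser import HTMLParser

class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots" content="...">."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        attributes = {name: (value or "") for name, value in attrs}
        if tag == "meta" and attributes.get("name", "").lower() == "robots":
            for directive in attributes.get("content", "").split(","):
                if directive.strip():
                    self.directives.add(directive.strip().lower())

def robots_directives(html):
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives

head = '<head><meta name="robots" content="noindex, follow"></head>'
print("noindex" in robots_directives(head))  # True
```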

Robots.txt disallow doesn't prompt de-indexation. Pages that are disallowed after they've been indexed are no longer crawled by search engine bots, but they may or may not remain indexed. It's unlikely that these pages will show up in search results unless searched for by URL, however, because the search engines will no longer crawl the page.
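Python's standard urllib.robotparser module shows what a disallow rule actually governs: whether a crawler may fetch a URL, and nothing about whether an already-indexed URL leaves the index. A minimal sketch; the robots.txt rules and URLs are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that disallows internal search result pages.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for url in ("https://www.example.com/search?q=shoes",
            "https://www.example.com/shoes/"):
    # can_fetch answers "may a bot crawl this URL?", not "is it indexed?"
    print(url, "crawlable" if parser.can_fetch("*", url) else "blocked")
```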

While they're not ideal for removing indexed content, meta robots noindex and robots.txt disallow should both prevent new duplicate content from being indexed. Their application, however, requires that duplicate content be identified before the launch of a new site, and they are not 100-percent effective.

Your Best Bet

If you need a sure method of de-indexation, a 301 redirect or 404 error is your best bet because the server no longer loads the content that had been found on that page. If you need to de-index the page and harness the link authority, use a 301 redirect.