Google has recently been working hard on developing tools and new methods to combat duplicate content. 301s remain the best fix, and prevention is best done with robots.txt blocks and nofollow, but the new tools are great if you can't get access to set up redirects or blocks.
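To make the 301 fix concrete, here's a minimal sketch in Python using Flask (my own choice of framework for illustration, and the duplicate paths are made up): any request for a known duplicate URL is permanently redirected to its canonical version, so only one page gets indexed.

```python
from flask import Flask, redirect

app = Flask(__name__)

# Hypothetical mapping of duplicate paths to their canonical URLs.
CANONICAL = {
    "/index.html": "/",
    "/old-page": "/new-page",
}

@app.route("/", defaults={"page": ""})
@app.route("/<path:page>")
def serve(page):
    path = "/" + page
    if path in CANONICAL:
        # A 301 tells crawlers the duplicate has permanently moved,
        # so they index (and credit) the canonical URL only.
        return redirect(CANONICAL[path], code=301)
    return f"Content for {path}"

if __name__ == "__main__":
    app.run()
```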
Duplicate content splits PageRank between URLs and can cause some pages to be filtered from the rankings, yet big websites don't seem to care about the issue. If the big sites don't care, why should a small business spend time on lengthy changes or redirects for a problem some sites don't even bat an eyelid at?
Let’s have a look at some examples.
BBC

On the whole the SEO on the BBC site is good, but they do have a duplicate content issue. I first pointed this out in a post back in February: the problem was two URLs for each page.
You also have a second URL; the only difference is that it sits in the sport2 folder rather than sport1.
On top of that, there is also a low-graphics version of the page.
And the same low-graphics version again under the sport2 folder.
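If you want to confirm this sort of duplication yourself, a quick sketch like the one below fetches each URL variant and compares a hash of the response bodies. The URLs are placeholder stand-ins rather than the exact BBC pages, and dynamic page elements (ads, timestamps) can make the hashes differ even when the content is effectively the same:

```python
import hashlib
import urllib.request

def body_hash(url):
    """Fetch a URL and return an MD5 digest of its response body."""
    with urllib.request.urlopen(url) as resp:
        return hashlib.md5(resp.read()).hexdigest()

# Placeholder URL variants suspected of serving the same page.
variants = [
    "http://www.example.com/sport1/football",
    "http://www.example.com/sport2/football",
]

hashes = {url: body_hash(url) for url in variants}
for url, digest in hashes.items():
    print(digest, url)

if len(set(hashes.values())) == 1:
    print("Identical bodies: these URLs are serving duplicate content.")
```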
Facebook

In another post I did a while ago, I showed how your profile can be loaded on two URLs.
Some profiles are also now appearing with ?_fb_noscript=1 after the URL. The example above isn't indexed, but these two are: http://www.facebook.com/wgardner69 and http://en-gb.facebook.com/wgardner69?_fb_noscript=1 (some random person!).
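A site can kill this kind of query-string duplicate by 301-redirecting any request carrying the offending parameter back to the clean URL. Here's a hedged sketch, again in Flask purely for illustration (this is not Facebook's actual stack):

```python
from urllib.parse import urlencode
from flask import Flask, redirect, request

app = Flask(__name__)

@app.before_request
def strip_noscript_param():
    # If the duplicate-generating parameter is present, 301 to the
    # same URL without it so only one version can get indexed.
    if "_fb_noscript" in request.args:
        args = {k: v for k, v in request.args.items() if k != "_fb_noscript"}
        clean = request.base_url + ("?" + urlencode(args) if args else "")
        return redirect(clean, code=301)
```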
Travel Supermarket & Virgin Media
Another spot from Neil was a very strange duplication on Travel Supermarket & Virgin Media. This time it looked like they had duplicated content on a subdomain rather than two URLs for one page of content.
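The standard fix for subdomain duplication is a host-level 301: any request that arrives on the duplicate subdomain gets redirected to the same path on the canonical domain. A sketch with hypothetical host names:

```python
from flask import Flask, redirect, request

app = Flask(__name__)

# Hypothetical canonical host; any other host serving this app
# (e.g. a stray subdomain) gets 301-redirected to it.
CANONICAL_HOST = "www.example.com"

@app.before_request
def force_canonical_host():
    if request.host != CANONICAL_HOST:
        path = request.full_path.rstrip("?")  # keep path and query string
        return redirect(f"http://{CANONICAL_HOST}{path}", code=301)
```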
Twitter

I can't remember who spotted this (please comment and I'll link to you), but Twitter has an HTTPS duplication problem and a mobile subdomain duplicating content.
Looking today, explore.twitter.com/johnpcampbell is also indexed, but they have a fix in place in the form of a 301 redirect to twitter.com/johnpcampbell.
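You can verify that sort of fix without a browser: request the duplicate URL, don't follow redirects, and check for a 301 status plus a Location header pointing at the canonical page. A small sketch (no guarantee these exact hosts still behave this way):

```python
import http.client
from urllib.parse import urlparse

def check_redirect(url):
    """Request a URL without following redirects and report the result."""
    parts = urlparse(url)
    conn = http.client.HTTPConnection(parts.netloc)
    conn.request("GET", parts.path or "/")
    resp = conn.getresponse()
    print(url, "->", resp.status, resp.getheader("Location"))
    conn.close()

check_redirect("http://explore.twitter.com/johnpcampbell")
```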
Should you still care about duplicate content?
In all these examples, due to the size and power of the sites, the duplication isn't really having an adverse effect on their overall performance (like throwing a dart at Godzilla: he's not going to feel a thing). Google seems able to work out which is the correct URL to display. It would be nice to know the effect of correcting the problem, given how many pages these sites have.
Just a little fix to stop duplicate content on Twitter would cut Google's crawling time, allowing the search engine to spider more pages. Unfortunately we'll never know, but I'll keep on fixing site-wide duplicate content issues.
Do you think big companies need to sort out duplicate content issues? Add a comment.