Google has recently been working hard on tools and new methods to combat duplicate content. A 301 redirect remains the best fix, and prevention is best done with robots.txt blocks and nofollow, but the new tools are a great option if you can't get server access to redirect or block.
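
For example, the rel=canonical link element Google announced in February 2009 lets you hint at the preferred URL without needing server access. A minimal example, placed in the head of each duplicate page (the URL is illustrative):

<link rel="canonical" href="http://www.example.com/preferred-page/" />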

Duplicate content splits PageRank between URLs and can cause some pages to be filtered out of the rankings, yet big websites don't seem to care about the issue. If the big sites don't care, why should a small business spend time on lengthy changes or redirects when sites many times their size don't bat an eyelid at the problem?

Let’s have a look at some examples.

BBC

On the whole the SEO on the BBC site is good, but it does have a duplicate content issue, which I first pointed out in a post back in February. The problem was two URLs for each page.

http://news.bbc.co.uk/sport1/hi/football/teams/a/arsenal/7831046.stm

There is also a second URL; the only difference is that it sits in the sport2 folder rather than sport1:

http://news.bbc.co.uk/sport2/hi/football/teams/a/arsenal/7831046.stm

On top of that, there is the low-graphics version of the page:

http://news.bbc.co.uk/sport1/low/football/teams/b/blackpool/7831046.stm

And the same low-graphics page under the sport2 folder:

http://news.bbc.co.uk/sport2/low/football/teams/b/blackpool/7831046.stm
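
If the BBC wanted to fix this, one option (assuming an Apache-style setup, which is purely my assumption) is a rewrite rule that 301s every sport2 URL to its sport1 twin:

# Hypothetical .htaccess sketch: 301 sport2 pages to sport1
RewriteEngine On
RewriteRule ^sport2/(.*)$ http://news.bbc.co.uk/sport1/$1 [R=301,L]

The low-graphics pages are there for users, so rather than redirecting those, a canonical tag pointing at the full version would be the gentler fix.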

Facebook

Another one I covered in a post a while ago: your profile can be loaded on two URLs.

http://www.facebook.com/johnpcampbell

and

http://en-gb.facebook.com/johnpcampbell

Some profiles are now also appearing with ?_fb_noscript=1 appended to the URL. The example above isn't indexed, but these two are: http://www.facebook.com/wgardner69 and http://en-gb.facebook.com/wgardner69?_fb_noscript=1 (some random person!).
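
The usual fix for parameter and subdomain duplicates like these is a canonical tag on every version of the profile pointing at one preferred URL; something like this in the page head (a sketch, not what Facebook actually does):

<link rel="canonical" href="http://www.facebook.com/johnpcampbell" />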

LinkedIn

My work colleague Neil Walker (follow him on Twitter: @theukseo) noticed LinkedIn has a duplication problem, with two URLs for his profile:

http://www.linkedin.com/pub/neil-walker/4/41a/793

http://www.linkedin.com/in/internetmarketingoptimisation

Travel Supermarket & Virgin Media

Another spot from Neil was a very strange duplication on Travel Supermarket and Virgin Media. This time it looked like content was duplicated on a subdomain rather than there being two URLs for one page of content.

Twitter

I can't remember who spotted this (please comment and I'll add a link), but Twitter has an HTTPS duplication problem plus a mobile subdomain duplicating content:

m.twitter.com/johnpcampbell

twitter.com/johnpcampbell

https://twitter.com/johnpcampbell

Looking today, explore.twitter.com/johnpcampbell is also indexed, but a fix is in place there in the form of a 301 redirect to twitter.com/johnpcampbell.
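
A similar server-side fix would mop up the rest: 301 any non-canonical host to twitter.com (the HTTPS duplicate would need the equivalent rule served from the secure host). A hypothetical Apache sketch, not what Twitter actually runs:

# 301 m.twitter.com, explore.twitter.com etc. to the canonical host
RewriteEngine On
RewriteCond %{HTTP_HOST} !^twitter\.com$ [NC]
RewriteRule ^(.*)$ http://twitter.com/$1 [R=301,L]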

Should you still care about duplicate content?

In all these examples, the size and power of the sites means the duplication isn't really having an adverse effect on their overall performance (like throwing a dart at Godzilla: he's not going to feel a thing). Google seems able to work out which URL is the correct one to display. It would be nice to know what effect correcting it would have, given how many pages these sites carry.

Even a small fix to stop duplicate content on Twitter would cut Google's crawling time, allowing the search engine to spider more pages. Unfortunately we'll never know, but I'll keep on fixing site-wide duplicate content issues.

Do you think big companies need to sort out their duplicate content issues? Add a comment.

Moving domain from .com to .co.uk

As I'm changing my domain from .com to .co.uk sometime next week, I thought I'd document the process from an SEO point of view.

1. Change all internal links to point to the new domain. If you are doing this on a test server, block it from Google until you are ready, as you don't want the new domain indexed while it contains duplicate content.
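
The simplest block is a blanket robots.txt on the test server, for example:

User-agent: *
Disallow: /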

2. Add your 301 redirect from the old domain to the new one. Make sure inner pages also redirect to their equivalents on the new domain, not just the homepage.
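
On Apache this can be a single catch-all rule on the old domain, mapping every path across to its twin on the new one. A sketch assuming the domains from this post:

# On the old domain: 301 every URL to the same path on the new domain
RewriteEngine On
RewriteCond %{HTTP_HOST} ^(www\.)?aukseo\.com$ [NC]
RewriteRule ^(.*)$ http://www.aukseo.co.uk/$1 [R=301,L]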

3. Re-run your XML sitemap and update the sitemap reference in robots.txt:

Sitemap: http://www.aukseo.co.uk/sitemap.xml

4. Add the new site to Webmaster Tools, verify it and submit the new sitemap.

5. If possible, point some links at the new domain, as this should help with indexing. Also change any external links to point directly at the new domain where you can; if you can't, don't worry too much, as the links will still pass value through the 301.

6. Make sure you update anything that depends on your domain: analytics, FeedBurner, ranking software and payment systems.

EDIT: 14/07/09
7. In Google Webmaster Tools, verify the new site and use the Change of Address tool under Site Configuration to notify Google that you're moving to a new domain.

Give Google a couple of days, then check back using the site: command on the old domain (e.g. site:aukseo.com). You should start to see the page count fall on the old domain as pages move over to the new one.

If you have some straggler pages that don't redirect to the new domain, use an HTML sitemap and point links at the old URLs. This gives Google a reason to recrawl them and pick up the 301. From experience this works, but don't worry too much if an old page doesn't move over; it will still receive the traffic it used to. Once the pages are indexed on the new domain, remove the links from the sitemap.
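
The sitemap itself needs nothing fancy; a bare list of links to the stragglers is enough (the URLs below are purely illustrative):

<ul>
  <li><a href="http://www.aukseo.com/old-page-1/">Old page one</a></li>
  <li><a href="http://www.aukseo.com/old-page-2/">Old page two</a></li>
</ul>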

Any other ideas please add to the comments below!
