<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>AUKSEO - Blog from a Search Engine Optimiser based in the UK &#187; duplicate content</title>
	<atom:link href="http://www.aukseo.co.uk/tag/duplicate-content/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.aukseo.co.uk</link>
	<description>The views of a UK SEO</description>
	<lastBuildDate>Tue, 03 Aug 2010 19:33:17 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Is Duplicate Content an Issue for Big Sites? Twitter, BBC, LinkedIn..</title>
		<link>http://www.aukseo.co.uk/is-duplicate-content-an-issue-for-big-sites-twitter-bbc-linkedin-694/</link>
		<comments>http://www.aukseo.co.uk/is-duplicate-content-an-issue-for-big-sites-twitter-bbc-linkedin-694/#comments</comments>
		<pubDate>Thu, 08 Oct 2009 07:20:50 +0000</pubDate>
		<dc:creator>aukseo</dc:creator>
				<category><![CDATA[SEO]]></category>
		<category><![CDATA[twitter]]></category>
		<category><![CDATA[duplicate content]]></category>
		<category><![CDATA[redirect]]></category>

		<guid isPermaLink="false">http://www.aukseo.co.uk/?p=694</guid>
		<description><![CDATA[Google has recently been working hard on developing tools and new methods to combat duplicate content. 301 redirects remain the best fix, and prevention is best done with robots.txt blocks and nofollow, but the new tools are great if you can&#8217;t get access to redirect or block.

Duplicate content splits PageRank, can cause [...]]]></description>
			<content:encoded><![CDATA[
<p>Google has recently been working hard on <a href="http://googlewebmastercentral.blogspot.com/2009/10/reunifying-duplicate-content-on-your.html" target="_blank">developing tools and new methods to combat duplicate content</a>. 301 redirects remain the best fix, and prevention is best done with robots.txt blocks and nofollow, but the new tools are great if you can&#8217;t get access to redirect or block.</p>
<p><a href="http://www.aukseo.co.uk/wp-content/uploads/2009/10/duplicate-cat.jpg"><img src="http://www.aukseo.co.uk/wp-content/uploads/2009/10/duplicate-cat.jpg" alt="duplicate-cat" title="duplicate-cat" width="420" height="315" class="aligncenter size-full wp-image-705" /></a></p>
<p>Duplicate content splits PageRank and can cause some pages to be filtered from rankings, but <b>big websites</b> seem not to care about the issue. If the big sites don&#8217;t care about it, why should a small business site spend time on often lengthy changes or redirects to fix something the big sites don&#8217;t even bat an eyelid at?</p>
<p>Let&#8217;s have a look at some examples. </p>
<h3><u>BBC</u></h3>
<p>On the whole the SEO on the BBC site is good, but it does have a duplicate content issue. <a href="http://www.aukseo.co.uk/bbc-duplicate-content-305/">I first pointed this out in a post back in February</a>. The problem was two URLs for each page.</p>
<p><em><b>http://news.bbc.co.uk/sport1/hi/football/teams/a/arsenal/7831046.stm</b></em></p>
<p>There is also a second URL; the only difference is that it is in the folder sport2 rather than sport1:</p>
<p><em><b>http://news.bbc.co.uk/sport2/hi/football/teams/a/arsenal/7831046.stm</b></em></p>
<p>On top of that there is also the low-graphics version of the page.</p>
<p><em><b>http://news.bbc.co.uk/sport1/low/football/teams/b/blackpool/7831046.stm</b></em></p>
<p>And under the sport2 folder</p>
<p><em><b>http://news.bbc.co.uk/sport2/low/football/teams/b/blackpool/7831046.stm</b></em></p>
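<p>As a rough sketch of how these variants could be mapped to a single preferred form, here is a little Python. Treating sport1 and hi as the preferred folders is purely my assumption for illustration, and the names PREFERRED_SEGMENTS and canonical_bbc_url are made up for this example; nothing here reflects how the BBC actually handles its URLs.</p>

```python
from urllib.parse import urlsplit, urlunsplit

# Assumption for this sketch: sport1 is preferred over sport2, hi over low.
PREFERRED_SEGMENTS = {"sport2": "sport1", "low": "hi"}


def canonical_bbc_url(url: str) -> str:
    """Rewrite the duplicate folder names at the start of the path."""
    parts = urlsplit(url)
    segments = parts.path.split("/")
    # Only touch the leading folders (e.g. /sport2/low/), not the rest.
    head = [PREFERRED_SEGMENTS.get(s, s) for s in segments[:3]]
    path = "/".join(head + segments[3:])
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, ""))


urls = [
    "http://news.bbc.co.uk/sport1/hi/football/teams/a/arsenal/7831046.stm",
    "http://news.bbc.co.uk/sport2/hi/football/teams/a/arsenal/7831046.stm",
    "http://news.bbc.co.uk/sport1/low/football/teams/b/blackpool/7831046.stm",
    "http://news.bbc.co.uk/sport2/low/football/teams/b/blackpool/7831046.stm",
]

for url in urls:
    print(url, "->", canonical_bbc_url(url))
```

<p>Any URL whose canonical form differs from itself is a candidate for a 301 redirect to that canonical form.</p>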
<h3><u>Facebook</u></h3>
<p>I covered this in another post a while ago: your profile can be loaded on two URLs.</p>
<p><em><b>http://www.facebook.com/johnpcampbell</b></em></p>
<p>and</p>
<p><em><b>http://en-gb.facebook.com/johnpcampbell</b></em></p>
<p>Some profiles are also now appearing with ?_fb_noscript=1 appended to the URL. The example above isn&#8217;t indexed, but these two (belonging to some random person!) are: <b>http://www.facebook.com/wgardner69</b> and <b>http://en-gb.facebook.com/wgardner69?_fb_noscript=1</b></p>
<h3><u>LinkedIn</u></h3>
<p>A work colleague of mine, <a href="http://www.seomad.com" target="_blank">Neil Walker</a> (follow him on Twitter <a href="http://twitter.com/theukseo" target="_blank">@theukseo</a>), noticed that <a href="http://www.seomad.com/SEOBlog/linkedin-duplicate-content-issues.html" target="_blank">LinkedIn had a duplication problem</a> with two URLs for his profile.</p>
<p><em><b>http://www.linkedin.com/pub/neil-walker/4/41a/793</b></em></p>
<p><em><b>http://www.linkedin.com/in/internetmarketingoptimisation</b></em></p>
<h3><u>Travel Supermarket &#038; Virgin Media</u></h3>
<p>Another spot from Neil was a <a href="http://www.seomad.com/SEOBlog/whats-happening-with-travel-supermarket-and-virgin-media.html" target="_blank">very strange duplication on Travel Supermarket &#038; Virgin Media</a>. This time it looked like they had duplicated content on a subdomain rather than having two URLs for one page of content.</p>
<h3><u>Twitter</u></h3>
<p>I can&#8217;t remember who spotted this (please comment and I&#8217;ll link), but Twitter has an https duplication problem and a mobile subdomain duplicating its pages.</p>
<p><em><b>m.twitter.com/johnpcampbell</b></em></p>
<p><em><b>twitter.com/johnpcampbell</b></em></p>
<p><em><b>https://twitter.com/johnpcampbell</b></em></p>
<p>Looking today there is also <b>explore.twitter.com/johnpcampbell</b> indexed, but they have a fix in place in the form of a 301 redirect to twitter.com/johnpcampbell.</p>
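<p>To illustrate the kind of rule that would fold the other variants into one, here is a minimal Python sketch that works out where a 301 should point. It assumes http://twitter.com is the preferred host and scheme (my guess, based only on where the explore redirect goes), and the names CANONICAL_HOST, DUPLICATE_HOSTS and redirect_target are hypothetical. It also ignores query strings to keep things short.</p>

```python
from urllib.parse import urlsplit

# Assumptions for this sketch, not Twitter's real configuration.
CANONICAL_HOST = "twitter.com"
DUPLICATE_HOSTS = {"m.twitter.com", "explore.twitter.com"}


def redirect_target(url: str):
    """Return the URL a 301 should point at, or None if already canonical."""
    if "://" not in url:
        url = "http://" + url  # some URLs in the post have no scheme
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host in DUPLICATE_HOSTS:
        host = CANONICAL_HOST
    canonical = "http://" + host + parts.path
    return None if canonical == url else canonical


for candidate in (
    "m.twitter.com/johnpcampbell",
    "twitter.com/johnpcampbell",
    "https://twitter.com/johnpcampbell",
    "explore.twitter.com/johnpcampbell",
):
    print(candidate, "->", redirect_target(candidate))
```

<p>A result of None means the URL is already the preferred one; anything else is the address the duplicate would be redirected to.</p>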
<p><strong>Should you still care about duplicate content?</strong></p>
<p>In all these examples, the size and power of the sites mean the duplication isn&#8217;t really having an adverse effect on their overall performance <em>(like throwing a dart at Godzilla! He&#8217;s not going to feel a thing)</em>. Google seems to be able to work out which is the correct URL to display. It would be nice to know the effect of correcting this, as these sites have so many pages.</p>
<p>Even a little fix to stop duplicate content on Twitter would cut the time Google spends crawling duplicates, allowing the search engine to spider more pages. Unfortunately we&#8217;ll never know, but I&#8217;ll keep on fixing site-wide duplicate content issues.</p>
<p><b>Do you think big companies need to sort out duplicate content issues? <u>Add a comment</u></b></p>
]]></content:encoded>
			<wfw:commentRss>http://www.aukseo.co.uk/is-duplicate-content-an-issue-for-big-sites-twitter-bbc-linkedin-694/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>BBC duplicate content</title>
		<link>http://www.aukseo.co.uk/bbc-duplicate-content-305/</link>
		<comments>http://www.aukseo.co.uk/bbc-duplicate-content-305/#comments</comments>
		<pubDate>Tue, 03 Feb 2009 23:53:11 +0000</pubDate>
		<dc:creator>aukseo</dc:creator>
				<category><![CDATA[SEO]]></category>
		<category><![CDATA[bbc]]></category>
		<category><![CDATA[duplicate content]]></category>

		<guid isPermaLink="false">http://www.aukseo.co.uk/?p=305</guid>
		<description><![CDATA[Something that I spend a lot of time doing at work is finding content management and e-commerce platforms that create duplicate content. It&#8217;s generally generated from print views, PDFs, differences in URL generation or clients ripping off other sites.
It can lead to big problems if it affects your entire site. Often getting these [...]]]></description>
			<content:encoded><![CDATA[
<p>Something that I spend a lot of time doing at work is finding content management and e-commerce platforms that create duplicate content. It&#8217;s generally generated from print views, PDFs, differences in URL generation or clients ripping off other sites.</p>
<p>It can lead to big problems if it affects your entire site. Getting these problems sorted out can often increase long tail traffic by 10 to 20 percent. I have noticed in the past that the BBC site does spit out duplicate pages. For example:</p>
<p>Andrei Arshavin signed for Arsenal today, and the BBC has a nice article with a video at the top. The URL for that page is:</p>
<p><strong>http://news.bbc.co.uk/sport1/hi/football/teams/a/arsenal/7831046.stm</strong></p>
<p>There is also a second URL; the only difference is that it is in the folder sport2 rather than sport1:</p>
<p><strong>http://news.bbc.co.uk/sport2/hi/football/teams/a/arsenal/7831046.stm</strong></p>
<p>That URL 302 redirects to the first one, but Google still caches the second URL. This can be seen at <a href="http://tinyurl.com/bdsj3q">http://tinyurl.com/bdsj3q</a>.</p>
<p>That&#8217;s not all: soon the low-graphics version will get cached too.</p>
<p><strong>http://news.bbc.co.uk/sport1/low/football/teams/b/blackpool/7831046.stm</strong></p>
<p>And under the sport2 folder</p>
<p><strong>http://news.bbc.co.uk/sport2/low/football/teams/b/blackpool/7831046.stm</strong></p>
<p>So it&#8217;s duplicate content. Google says that you should try to make sure that you only have one version of a page on your site. The reasons why you should sort this out are pretty simple.</p>
<p>- Splits the flow of link juice<br />
- Splits the possibility of inbound links<br />
- Search engines spend time caching pages they&#8217;ve already seen rather than picking up your new pages</p>
<p>So should the BBC block Google from caching the duplicates? Well, yes, but for a site of that size, given the speed at which Google caches its content and the inbound links it generates, it&#8217;s not going to cause a problem.</p>
<p>If you see similar problems on new sites then you do need to get fixes in place. 301 redirect the pages that have been cached already and then block the search engines!</p>
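<p>For the blocking part, one way to sanity-check a set of robots.txt rules before putting them live is Python&#8217;s standard urllib.robotparser module. The rules below are hypothetical ones I have written for this sketch around the BBC example (they are not the BBC&#8217;s real robots.txt), and is_blocked is just a helper name made up here.</p>

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for this sketch: block the sport2 copies and the
# low-graphics copies, leave the sport1/hi pages crawlable.
ROBOTS_TXT = """\
User-agent: *
Disallow: /sport2/
Disallow: /sport1/low/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def is_blocked(url: str) -> bool:
    """True if the rules above stop a generic crawler fetching the URL."""
    return not parser.can_fetch("*", url)


for url in (
    "http://news.bbc.co.uk/sport1/hi/football/teams/a/arsenal/7831046.stm",
    "http://news.bbc.co.uk/sport2/hi/football/teams/a/arsenal/7831046.stm",
    "http://news.bbc.co.uk/sport1/low/football/teams/b/blackpool/7831046.stm",
):
    print(url, "blocked" if is_blocked(url) else "allowed")
```

<p>Running the rules against a list of known duplicate and preferred URLs like this is a quick way to confirm that only the duplicates are blocked.</p>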
]]></content:encoded>
			<wfw:commentRss>http://www.aukseo.co.uk/bbc-duplicate-content-305/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
	</channel>
</rss>
