Avoiding the Duplicate Content Filter on Your Drupal Site
One easily overlooked SEO consideration is whether the same content can be reached at multiple URLs on your site. This is called duplicate content, and it can get your site flagged and even dropped from Google or other search engines.
If you're using Drupal with Clean URLs and Pathauto, consider these two URLs, which serve the same page:
http://www.example.com/blog/2010-03-10
http://www.example.com/node/256
Each of these URLs can be reached with or without the www prefix and with or without a trailing slash. That means there are now 8 possible URLs for one page (2 paths × 2 hostnames × 2 slash options), which could significantly affect your search engine rankings.
Whether this hurts you is a matter of perception, namely Google's: does it believe the intent behind the duplicate content is malicious? Since there's no way of knowing for certain how your site will be judged, it's best to take a few proactive measures.
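To see where the count of 8 comes from, here is a small Python sketch that enumerates every combination for the two example paths. It only builds strings; example.com is the placeholder domain used throughout this post.

```python
from itertools import product

# The two paths that serve the same node: the Pathauto alias and the system path.
PATHS = ["/blog/2010-03-10", "/node/256"]
HOSTS = ["www.example.com", "example.com"]
SLASHES = ["", "/"]


def url_variants(paths, hosts, slashes):
    """Return every URL formed by combining a host, a path and an optional trailing slash."""
    return [
        f"http://{host}{path}{slash}"
        for path, host, slash in product(paths, hosts, slashes)
    ]


variants = url_variants(PATHS, HOSTS, SLASHES)
print(len(variants))  # 2 paths x 2 hosts x 2 slash options = 8
for url in variants:
    print(url)
```

Each measure that follows collapses one of these factors: the www redirect, the trailing slash redirect and the robots.txt rule.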
With or Without www
Starting at line 83 of your Drupal .htaccess file are comments explaining how to set up your site with or without the www prefix. You don't really need the www in your URL; it's a matter of personal preference, and your choice may depend on the nature of your site. What matters is picking one form and redirecting the other to it.
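Both rule pairs that follow use the same pattern: match the hostname you don't want, then issue a permanent (301) redirect to the one you do. Purely as an illustration of that logic, not of how Apache actually evaluates rules, here is a Python model of the "Without www" pair. The function name www_redirect is hypothetical, example.com is the placeholder domain, and query strings are ignored.

```python
import re
from typing import Optional

# Same test as the RewriteCond on HTTP_HOST; the [NC] flag means
# "no case", which corresponds to re.IGNORECASE here.
WWW_HOST = re.compile(r"^www\.example\.com$", re.IGNORECASE)


def www_redirect(host: str, path: str) -> Optional[str]:
    """Return the 301 target for a request, or None if no redirect applies."""
    if not WWW_HOST.match(host):
        return None  # already on the bare domain: serve the page normally
    # In .htaccess context the path seen by RewriteRule has no leading "/",
    # so (.*) captures everything after it as $1.
    captured = path[1:] if path.startswith("/") else path
    return f"http://example.com/{captured}"
```

For example, www_redirect("WWW.Example.com", "/node/256") returns "http://example.com/node/256", while a request that is already on example.com returns None.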
With www
RewriteCond %{HTTP_HOST} ^example\.com$ [NC]
RewriteRule ^(.*)$ http://www.example.com/$1 [L,R=301]
Without www
RewriteCond %{HTTP_HOST} ^www\.example\.com$ [NC]
RewriteRule ^(.*)$ http://example.com/$1 [L,R=301]
Remove Trailing Slashes
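Also in your .htaccess file, you can tell your site to remove trailing slashes by 301-redirecting any URL that ends in a slash to the same URL without it. Like the www rules, these lines belong in the mod_rewrite section, before Drupal's own rules that rewrite requests to index.php.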
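The trailing slash rule captures everything before a final slash and redirects to that path on the same host; the site root is left alone because the pattern needs at least one character before the slash. Here is a rough Python model of that decision. The function name is hypothetical and query strings are ignored.

```python
import re
from typing import Optional

# Same pattern as the RewriteRule: one or more characters followed by a final "/".
TRAILING_SLASH = re.compile(r"^(.+)/$")


def trailing_slash_redirect(host: str, path: str) -> Optional[str]:
    """Return the 301 target without the trailing slash, or None if none is needed."""
    # In .htaccess context mod_rewrite matches the path without its leading "/".
    relative = path[1:] if path.startswith("/") else path
    match = TRAILING_SLASH.match(relative)
    if match is None:
        return None  # no trailing slash, or the site root "/" itself
    return f"http://{host}/{match.group(1)}"
```

So trailing_slash_redirect("example.com", "/blog/2010-03-10/") gives "http://example.com/blog/2010-03-10", and "/" on its own is never redirected.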
# Remove Trailing Slashes
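RewriteCond %{HTTP_HOST} ^(www\.)?example\.com$ [NC]
# Skip real directories: mod_dir adds their slash back, which would cause a redirect loop
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.+)/$ http://%{HTTP_HOST}/$1 [R=301,L]
Update robots.txt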
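To tell search engines to ignore system paths such as 'node/256', add a Disallow rule for /node/ to the bottom of the robots.txt file in the root of your Drupal installation. Drupal's default robots.txt opens with a 'User-agent: *' group, so a line added at the bottom falls under that group and applies to all crawlers.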
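Python's standard library includes urllib.robotparser, which applies robots.txt rules much as a well-behaved crawler does, so you can sanity-check the rule before deploying it. This sketch assumes your robots.txt already has Drupal's default 'User-agent: *' group; the two-line excerpt below stands in for the full file, and example.com is a placeholder.

```python
from urllib.robotparser import RobotFileParser

# Assumed excerpt: the new Disallow line sits inside the "User-agent: *" group.
ROBOTS_TXT = """\
User-agent: *
Disallow: /node/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for url in ("http://example.com/node/256", "http://example.com/blog/2010-03-10"):
    # Expected: False for the /node/ system path, True for the Pathauto alias.
    print(url, parser.can_fetch("Googlebot", url))
```

The rule matches only paths that begin with /node/, so the bare /node listing page is not covered by it.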
Disallow: /node/
Aggregating Content from Other Sites
If your site is built on aggregating news or other data feeds, consider also publishing unique content to reduce the risk of being penalized by the duplicate content filter. Unique content could be original articles, comments, or anything else that changes regularly and keeps your pages from appearing stale.