Overview of Duplicate Content Issues
Duplicate content issues are something that probably only SEOs worry about, or even think of. Before entering the search engine marketing world, I didn’t give them a single thought, and wouldn’t have known what you were talking about had you asked me about them.
Search Engine Indexing
Search engines store web pages, not web sites, and because they store each page individually they also store the URL for that page (the URL being the web address, e.g. www.netshiftmedia.com/services). Search engine spiders scour the Internet looking for links and then grab the content from those pages to help build their index. This is great in that you want all of your content indexed. However, if you provide multiple copies of the same content, you can be penalized. Even if you are not penalized, you will be creating a library of your own pages that compete for the exact same terms, making it even more difficult to rank for those terms. But why would you submit duplicate content to search engines? You wouldn’t, at least not intentionally.
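To make the "pages, not sites" idea concrete, here is a toy sketch of an index keyed by URL. The example.com addresses and page text are made up for illustration, and real search engines use far more sophisticated duplicate detection than an exact hash comparison; this only shows why two URLs with identical content look like two separate entries.

```python
import hashlib
from collections import defaultdict

# Toy "index": each URL is stored as its own entry (hypothetical data).
index = {
    "http://www.example.com/services": "Our services ...",
    "http://example.com/services": "Our services ...",
    "http://www.example.com/about": "About us ...",
}

def group_duplicates(pages):
    """Group URLs whose content is byte-for-byte identical."""
    groups = defaultdict(list)
    for url, content in pages.items():
        digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
        groups[digest].append(url)
    # Keep only the groups that contain more than one URL.
    return [sorted(urls) for urls in groups.values() if len(urls) > 1]

duplicates = group_duplicates(index)
print(duplicates)
```

The two /services addresses end up in the same group, which is exactly the situation described above: one page, stored twice.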
Duplicate Content Examples and How to Fix Them
A strong majority of web sites are set up so that you can reach them either with or without the www in front of the domain. If your site is set up this way, try visiting it without typing the www. You’ll get the same site, but the address bar will show the domain without the www. If search engines find the non-www version, they will index each page both with and without the www, meaning they’ll have 2 copies of every page on your site! Further, if your site offers a printer-friendly version of each page through a link, search engines will have a third copy of each page. Now search engines don’t necessarily know which page to rank, since there are 3 different versions of the same content. Making matters worse, what if you’ve registered 2 or 3 extra domains for your web site? Having 4 domains pointing to the same site, each with www and non-www versions of the URL, means that search engines could have up to 8 different copies of the exact same page stored in their index! As well, any links you receive may point to any one of your domains, with or without the www. You could theoretically have thousands of links pointing to your web site, but because they are spread across the separate domains you may really only have a couple hundred links to each one, which can move your site down in the search engine rankings.
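The arithmetic behind "4 domains, up to 8 copies" can be sketched in a few lines. The four example.* domains below are placeholders chosen for illustration, not the domains of any real site, and printer-friendly versions are left out to keep the count simple.

```python
from itertools import product

# Hypothetical domains that all serve the same site (placeholders).
DOMAINS = ["example.com", "example.net", "example.org", "example.ca"]
# Each domain answers both with and without the www prefix.
PREFIXES = ["www.", ""]

def url_variants(path):
    """List every address under which the same page can be reached."""
    return [
        f"http://{prefix}{domain}{path}"
        for domain, prefix in product(DOMAINS, PREFIXES)
    ]

variants = url_variants("/services")
for url in variants:
    print(url)
print(len(variants), "addresses for one page")
```

Four domains times two prefixes gives eight addresses for a single page, and inbound links can land on any of them.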
Fixing the Problem
Search engines try their best to get around these issues, and the recent Big Daddy update at Google was supposed to rectify a large number of them, although that remains to be seen. In the meantime, there is a tool that can redirect such content to the proper page on your site should someone try to access one of the multiple versions of your page. This tool is called the 301 redirect. 301 is the HTTP response code a server returns when something has been moved permanently. Essentially, you tell your web server that any request for a non-www page should be redirected to your domain with the www in front. This way, if a search engine follows a link and requests your page without the www, it is sent on to the same page on the www domain instead of indexing the non-www copy. All of your links and content will be stored in one place with search engines, and there won’t be multiple copies of the page in their index. As an example, Net Shift Media has the following domains registered: www.netshift.ca, www.netshiftmedia.com, www.netshiftmediainc.com. If you visit any of those addresses you will be redirected to our primary domain, www.netshiftmedia.com. This also applies if you visit any of the domains without the www in front.
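As a rough illustration of the consolidation described above, the sketch below counts inbound links per host before and after the redirects are followed. The host names come from the article, but the list of inbound links is invented for the example, and how search engines actually credit redirected links is simplified here to a plain lookup.

```python
from collections import Counter

CANONICAL_HOST = "www.netshiftmedia.com"
# Every other host that serves the same site and should 301 to the canonical one.
ALIAS_HOSTS = {
    "netshiftmedia.com",
    "netshift.ca",
    "www.netshift.ca",
    "netshiftmediainc.com",
    "www.netshiftmediainc.com",
}

def resolve_host(host):
    """Return the host a visitor or spider ends up on after any 301 redirect."""
    host = host.lower()
    return CANONICAL_HOST if host in ALIAS_HOSTS else host

# Made-up sample of hosts that inbound links point to.
inbound_link_hosts = [
    "netshift.ca",
    "www.netshiftmedia.com",
    "netshiftmedia.com",
    "www.netshiftmediainc.com",
    "netshiftmedia.com",
]

before = Counter(inbound_link_hosts)
after = Counter(resolve_host(host) for host in inbound_link_hosts)
print("Before redirects:", dict(before))
print("After redirects: ", dict(after))
```

Before the redirects the five links are split across four hosts; afterwards all five resolve to the single canonical host.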
301 Redirect
Setting up these redirects in Apache is quite simple. Using an .htaccess file, you can specify a number of redirect rules for your web site, as outlined below.
RewriteEngine On
# Redirect the non-www primary domain to the www version
RewriteCond %{HTTP_HOST} ^netshiftmedia\.com [NC]
RewriteRule (.*) http://www.netshiftmedia.com/$1 [R=301,L]
# Redirect the other registered domains, with and without the www
RewriteCond %{HTTP_HOST} ^www\.netshift\.ca [NC]
RewriteRule (.*) http://www.netshiftmedia.com/$1 [R=301,L]
RewriteCond %{HTTP_HOST} ^netshift\.ca [NC]
RewriteRule (.*) http://www.netshiftmedia.com/$1 [R=301,L]
RewriteCond %{HTTP_HOST} ^netshiftmediainc\.com [NC]
RewriteRule (.*) http://www.netshiftmedia.com/$1 [R=301,L]
RewriteCond %{HTTP_HOST} ^www\.netshiftmediainc\.com [NC]
RewriteRule (.*) http://www.netshiftmedia.com/$1 [R=301,L]
The above code should be placed in a file named .htaccess and saved in the root directory of your web site. The first line tells Apache to turn on its rewrite engine, which is necessary for the server to execute the statements that follow. The next statement sets a condition for the rewrite to take place. I won’t get into regular expressions here, but you can check out our regular expression library and tester for more information. The condition basically states that if the requested host is netshiftmedia.com (without the www), the rule on the following line should be executed. The rewrite rule tells the server to send the user to the www-based domain with the 301 redirect code, appending whatever came after the domain in the original URL to the end of the new URL. For example, if you entered netshiftmedia.com/services into the address bar you would be redirected to www.netshiftmedia.com/services. The remaining lines duplicate the same functionality, but check for our other registered domains and redirect them to the proper www-based domain.
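For readers who find the condition-and-rule pair easier to follow as ordinary code, here is a rough Python imitation of the first pair only. It is not how Apache implements mod_rewrite; the function name apply_first_rule is made up for this sketch, and it assumes the path arrives without its leading slash, as it does for rules in an .htaccess file.

```python
import re

# Mirrors: RewriteCond %{HTTP_HOST} ^netshiftmedia\.com [NC]
HOST_CONDITION = re.compile(r"^netshiftmedia\.com", re.IGNORECASE)
# Mirrors the pattern in: RewriteRule (.*) http://www.netshiftmedia.com/$1 [R=301,L]
PATH_PATTERN = re.compile(r"(.*)")

def apply_first_rule(host, path):
    """Return (status, location) if the rule fires, otherwise None.

    `path` is assumed to have no leading slash, e.g. "services".
    """
    if not HOST_CONDITION.search(host):
        return None  # Condition not met, so the rule is skipped.
    captured = PATH_PATTERN.match(path).group(1)  # Plays the role of $1.
    return 301, f"http://www.netshiftmedia.com/{captured}"

print(apply_first_rule("netshiftmedia.com", "services"))
print(apply_first_rule("www.netshiftmedia.com", "services"))
```

The first call yields a 301 to the www address with the path carried over, while the second returns None because the host already starts with www and the condition does not match.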
This simple code in our .htaccess file helps search engines identify which pages on our site they should be indexing and makes their job easier, which can in turn help our rankings.



