
Blogging, Robots.txt and the Supplemental Index

A minor detail that most bloggers overlook, myself included, is setting up a proper robots.txt file for the blog.
Here’s the situation: suddenly a lot of your blog posts and indexed pages show up in the supplemental index for no apparent reason. It’s all original content, so why is it in the supplemental index?
It turns out Google also indexes your blog’s feeds, and that can cause the real pages to go supplemental. Do a site: search on your own domain and narrow it to RSS feeds or the keyword "feed" and you should see what I mean pretty quickly, just like I did.
The easy solution is to use robots.txt and take advantage of Googlebot’s ability to handle wildcard characters by adding the following to your robots.txt file:

User-agent: Googlebot
Disallow: /*/feed/$
Disallow: /*/feed/rss/$
Disallow: /*/trackback/$
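
If you want to sanity-check which of your URLs those three rules would actually catch, here is a rough Python sketch. It assumes the usual reading of Googlebot's wildcards (`*` matches any run of characters, a trailing `$` anchors the end of the path); the helper names `pattern_to_regex` and `is_blocked` and the sample paths are made up for illustration, and this is not Google's own matcher.

```python
import re

# The three Disallow patterns from the robots.txt snippet above.
RULES = ["/*/feed/$", "/*/feed/rss/$", "/*/trackback/$"]


def pattern_to_regex(pattern):
    """Turn a wildcard Disallow pattern into a compiled regex (illustrative helper)."""
    # Assumption: a trailing '$' means "the path must end here".
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Assumption: '*' matches any run of characters; everything else is literal.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))


def is_blocked(path, rules=RULES):
    """Return True if any rule matches the URL path from its start."""
    return any(pattern_to_regex(rule).match(path) for rule in rules)


if __name__ == "__main__":
    # Hypothetical blog paths, just to see the rules in action.
    for path in ["/2007/05/my-post/feed/", "/my-post/trackback/", "/my-post/", "/feed/"]:
        print(path, "blocked" if is_blocked(path) else "allowed")
```

One thing this turns up: under that reading, the patterns need something between the first two slashes, so a site-wide `/feed/` at the root would not be matched by these rules, only the per-post feeds and trackbacks.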

This got me thinking: what else am I missing from my robots.txt file, and what are the bigger, more successful blogs using? Here’s the trick: go to any blog you follow or like and add /robots.txt to the end of its domain in the address bar, and the file will appear on your screen for inspection. Once you understand how the file works, you’ll see how other sites are using their robots.txt files and can hopefully incorporate some of the tricks of the trade into your own site.
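
If you would rather compare a handful of blogs at once than type each address by hand, the same trick can be sketched in a few lines of Python using only the standard library. The function names (`robots_url`, `fetch_robots`, `disallow_rules`) and the example.com address are placeholders I made up; swap in the blogs you actually follow.

```python
from urllib.parse import urljoin
from urllib.request import urlopen


def robots_url(site_url):
    """Build the robots.txt address for whatever site the URL belongs to."""
    # robots.txt always lives at the root of the host, whatever page you start from.
    return urljoin(site_url, "/robots.txt")


def fetch_robots(site_url):
    """Download the robots.txt file and return it as text (needs network access)."""
    with urlopen(robots_url(site_url), timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def disallow_rules(robots_text):
    """Collect the Disallow paths from a robots.txt file, ignoring comments."""
    rules = []
    for line in robots_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        if line.lower().startswith("disallow:"):
            rules.append(line.split(":", 1)[1].strip())
    return rules


if __name__ == "__main__":
    # Placeholder address; replace with a blog you want to study.
    text = fetch_robots("https://example.com/blog/some-post/")
    for rule in disallow_rules(text):
        print(rule)
```

This only lists the Disallow lines and ignores which User-agent each one belongs to, so treat it as a quick way to skim what other sites block rather than a full parser.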
