######## http://www.robotstxt.org/ - this is the best source of information I have seen on this topic ######## /Techback
#
#Where do I find out how /robots.txt files work?
#
#You can read the whole standard specification, but the basic concept is simple: by writing a structured text file you can indicate to robots that certain parts of your server are off-limits to some or all robots. It is best explained with an example:
#
# /robots.txt file for http://webcrawler.com/
# mail webmaster@webcrawler.com for constructive criticism

#User-agent: webcrawler
#Disallow:

#User-agent: lycra
#Disallow: /

User-agent: *
Disallow: /img
Disallow: /chatTest
Disallow: /DB
Disallow: /inc_asp
Disallow: /inc_css
Disallow: /inc_js
Disallow: /in_xml
Disallow: /inc_wav
Disallow: /inc_swf

#The first two lines of the example, starting with '#', are comments.
#
#The first paragraph (commented out in this file) specifies that the robot called 'webcrawler' has nothing disallowed: it may go anywhere.
#
#The second paragraph (also commented out in this file) indicates that the robot called 'lycra' has all relative URLs starting with '/' disallowed. Because all relative URLs on a server start with '/', this means the entire site is closed off.
#
#The third paragraph indicates that all other robots should not visit URLs starting with any of the listed paths (/img, /chatTest, /DB and so on). Note the '*' is a special token, meaning "any other User-agent"; you cannot use wildcard patterns or regular expressions in either User-agent or Disallow lines.
#
#Two common errors:
#Wildcards are _not_ supported: instead of 'Disallow: /tmp/*' just say 'Disallow: /tmp/'.
#You shouldn't put more than one path on a Disallow line (this may change in a future version of the spec).
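#
#The three kinds of record described above can be checked with a short script. This is only a sketch, not part of the robots.txt standard: it assumes Python's standard-library urllib.robotparser, enables the two commented-out records for illustration, and keeps just two of the Disallow paths. The names RULES, build_parser, is_allowed and 'somebot' are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rule set for illustration only: the 'webcrawler' and 'lycra'
# records are enabled here, and only two of the Disallow paths are kept.
RULES = [
    "User-agent: webcrawler",
    "Disallow:",
    "",
    "User-agent: lycra",
    "Disallow: /",
    "",
    "User-agent: *",
    "Disallow: /img",
    "Disallow: /DB",
]


def build_parser(lines):
    """Parse robots.txt lines held in memory (no network access)."""
    parser = RobotFileParser()
    parser.parse(lines)
    return parser


def is_allowed(parser, agent, path):
    """Return True if the named robot may fetch the given path."""
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    parser = build_parser(RULES)
    checks = [
        ("webcrawler", "/img/logo.png"),  # empty Disallow: nothing is off-limits
        ("lycra", "/index.html"),         # 'Disallow: /' closes the whole site
        ("somebot", "/img/logo.png"),     # falls through to the '*' record
        ("somebot", "/index.html"),       # no rule in the '*' record matches
    ]
    for agent, path in checks:
        print(agent, path, is_allowed(parser, agent, path))
```

#Each Disallow value is compared as a plain prefix of the URL path, which is why a single 'Disallow: /img' line is enough to cover everything underneath /img.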