A robots.txt file tells search engine spiders which pages or files they should or shouldn't request from your site. It is a way of preventing your site from being overloaded by requests rather than a secure mechanism for blocking access. It shouldn't be relied on to keep content private, and the chances are that some spiders will access the pages anyway. If you do need to prevent access, think about using a noindex directive within the page itself, or even password protecting the page.
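As a quick illustration of the noindex option, a robots meta tag can be placed in the page's head section; this is a minimal sketch rather than a full setup.

<meta name="robots" content="noindex">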

The syntax of a robots.txt file is pretty simple. Each group of rules must be preceded by the user agent it applies to, with the wildcard * used to apply the rules to all user agents.

User-agent: *

To allow search engine spiders to crawl a page, use the Allow rule. For example, to allow all spiders access to the entire site:

User-agent: *
Allow: /

To block search engine spiders from a page, use the Disallow rule. For example, to disallow all spiders access to a single file:

User-agent: *
Disallow: /somefile.html

You can also apply rules to a single search engine spider by naming the spider directly, as in the example below.
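For instance, the following sketch targets Googlebot alone; the /private/ path is just a hypothetical placeholder, and other spiders are unaffected by this group of rules.

User-agent: Googlebot
Disallow: /private/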
