I, Robots.txt -- Redmond Channel Partner

By Russ Cooper
February 15, 2005

In a recent bout of stupidity, the U.S. Department of Energy apparently published, by accident, confidential Homeland Security Department documents marked "For Official Use Only," and the documents remain visible via Google's Web cache.

To avoid situations like this, be sure you've created a properly configured robots.txt file on your Web servers. While it won't prevent confidential documents from being placed on a publicly available server, it is at least one way to prevent such documents from being available in Google's Web cache from now until eternity.

The robots.txt file isn't based on any officially recognized standard, but it has been in existence since 1994 and is generally accepted. Full details can be found in the original "A Standard for Robot Exclusion" document.

The robots.txt file is placed on a Web server to provide instructions to well-behaved Web crawlers, or spiders. Anyone can run a crawler, but they're most often used by search engines to collect information about Web sites. The file tells the crawler which directories or files it should not index. There are basically two kinds of lines:
User-Agent:
Disallow:

These lines can be repeated within the same file. The "User-Agent:" line indicates which crawler type the subsequent "Disallow:" lines apply to. You can specify a particular crawler by indicating its User-Agent value (found in your Web logs), or simply specify "*" to indicate all crawlers.
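As a rough illustration of how repeated "User-Agent:" groups are interpreted, the sketch below feeds a small rule set to the robots.txt parser in Python's standard library. The crawler names "ExampleBot" and "SomeOtherBot" and the file paths are made up for the example; real crawlers may interpret edge cases differently.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: one group aimed at a crawler calling itself
# "ExampleBot", followed by a catch-all group for every other crawler.
ROBOTS_TXT = """\
User-Agent: ExampleBot
Disallow: /

User-Agent: *
Disallow: /Private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# ExampleBot matches its own group, which disallows the whole site.
examplebot_home = parser.can_fetch("ExampleBot", "/index.html")

# Any other crawler falls through to the "*" group.
other_home = parser.can_fetch("SomeOtherBot", "/index.html")
other_private = parser.can_fetch("SomeOtherBot", "/Private/report.html")

print(examplebot_home, other_home, other_private)
```

The group that names a crawler is applied to that crawler, and the "*" group is used only when no named group matches.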

Following the "User-Agent:" line are one or more "Disallow:" lines, typically indicating directories. Files can also be specified if desired. Here's a sample robots.txt file:
User-Agent: *
Disallow: /

These two lines, if placed in the robots.txt file at the root of your Web site, tell crawlers to ignore your site.
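One way to check what these two lines mean to a compliant crawler is to run them through the parser in Python's standard library. This is only a sketch: the helper name is_allowed, the crawler name "AnyBot" and the www.yoursite.com URLs are placeholders chosen for the example.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given rules let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# The two-line file from the text: every crawler, whole site.
BLOCK_ALL = "User-Agent: *\nDisallow: /\n"

root_allowed = is_allowed(BLOCK_ALL, "AnyBot", "http://www.yoursite.com/")
page_allowed = is_allowed(BLOCK_ALL, "AnyBot", "http://www.yoursite.com/any/page.html")

print(root_allowed, page_allowed)
```

Because every path on the site begins with "/", the single Disallow line covers all of them.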

It's important to understand that a robots.txt file isn't a security mechanism; it does nothing to prevent crawlers or individuals from searching your site for files to index or view. Only polite crawlers will request the file and honor its contents.
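The difference between a polite crawler and any other client can be sketched in a few lines. Everything here is hypothetical: the function names, the "PoliteBot" name, the URL, and the fake_fetch stand-in that replaces a real HTTP request so the example runs without a network. The point is only that the check happens inside the crawler, not on the server.

```python
from typing import Callable, Optional
from urllib.robotparser import RobotFileParser

# Rules the site owner published: keep everyone out of everything.
ROBOTS_TXT = "User-Agent: *\nDisallow: /\n"


def polite_fetch(url: str, user_agent: str,
                 fetch: Callable[[str], str]) -> Optional[str]:
    """Consult robots.txt first and skip the URL if it is disallowed."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    if not parser.can_fetch(user_agent, url):
        return None  # the crawler chooses to stay out
    return fetch(url)


def rude_fetch(url: str, fetch: Callable[[str], str]) -> str:
    """Ignore robots.txt entirely; the file itself cannot stop this."""
    return fetch(url)


def fake_fetch(url: str) -> str:
    # Stand-in for a real HTTP GET so the sketch needs no network access.
    return f"contents of {url}"


url = "http://www.yoursite.com/index.html"
polite_result = polite_fetch(url, "PoliteBot", fake_fetch)
rude_result = rude_fetch(url, fake_fetch)

print(polite_result)
print(rude_result)
```

The polite path returns nothing because it voluntarily honors the rules, while the other path retrieves the page as if the file did not exist.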

If you want some of your site to be found in search engines, but have other files you want to keep out, you should disallow all directories except the ones you want to make available in the search engine. For example, if you have the following structure on your Web root:
"/": Publicly available information to be put into search engines
"/Dev": Stuff you're working on but don't want published
"/Private": Stuff you definitely don't want published

Your robots.txt file would look like this:
User-Agent: *
Disallow: /Dev/
Disallow: /Private/
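To see how a compliant crawler might read this file, the following sketch checks a few paths against it with Python's standard-library parser. The file names index.html and plans.doc and the crawler name "AnyBot" are invented for the illustration; FOO.ASP is just a sample file in /Dev.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt from the example structure above.
ROBOTS_TXT = """\
User-Agent: *
Disallow: /Dev/
Disallow: /Private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Hypothetical paths: two public, one under each disallowed directory.
paths = ["/", "/index.html", "/Dev/FOO.ASP", "/Private/plans.doc"]
results = {path: parser.can_fetch("AnyBot", path) for path in paths}

for path, allowed in results.items():
    print(f"{path}: {'allowed' if allowed else 'disallowed'}")
```

Only the paths that begin with one of the listed directory prefixes are reported as disallowed; everything else under the Web root stays open to crawlers.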

To be extra secure, you should put some form of authentication on both the /Dev and /Private sub-directories.

Finally, you might have specified that nothing should be crawled, yet you find crawlers still reading directory pages that should be inaccessible. This means there's still a link to a page on your site somewhere on the Internet.

Using the previous example, let's say you've got a file named FOO.ASP in the /Dev directory. According to your robots.txt file, it shouldn't be crawled. However, there's no defense if some other site offers up a link like this:
"http://www.yoursite.com/Dev/FOO.ASP"

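A crawler that doesn't honor robots.txt will follow that link to your FOO.ASP page and include it in its index, and even a compliant search engine may still list the bare URL it discovered through that link. There's nothing robots.txt can do about this. That's why authentication is a necessary extra step to prevent access.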

Russ Cooper is a Senior Information Security Analyst with Cybertrust, Inc., www.cybertrust.com. He's also founder and editor of NTBugtraq, www.ntbugtraq.com, one of the industry's most influential mailing lists dedicated to Microsoft security. One of the world's most-recognized security experts, he's often quoted by major media outlets on security issues.

Russ Cooper's Security Watch column appears every Monday in the Redmond magazine/ENT Security Watch e-mail newsletter.

About the Author

Russ Cooper is a senior information security analyst with Verizon Business, Inc. He's also founder and editor of NTBugtraq, www.ntbugtraq.com, one of the industry's most influential mailing lists dedicated to Microsoft security. One of the world's most-recognized security experts, he's often quoted by major media outlets on security issues.
