What Is Robots.txt?
Robots.txt is a text file that website owners put on their sites to tell search engines which pages should not be visited. The file is kept on the web server, and it only gives instructions to visiting robots; it does not enforce them. It is like telling a child not to do something. A good child will follow your instructions, but a naughty one will surely disobey you. So if you have sensitive data, it would be very unwise to rely fully on a robots.txt file to protect it. To understand robots.txt better, you must first know what a robot is.
A robot is a program that search engines use to find new websites on the internet. The search is carried out in order to index these websites and gather information from them. Robots are also referred to as “spiders”, “crawlers” and “bots”.
How Robots Work. The work of a robot can be divided into the following steps.
Step 1 :- Site Indexing : The robot finds new websites and stores them on the search engine's servers.
Step 2 :- Validate Site Content : Here robots analyze the content of the site and check whether it follows the standards. Sites that follow the laid-down standards are graded according to their performance.
Step 3 :- Link Checking : Here robots analyze all the incoming and outgoing links of the website. This analysis helps them grade sites by relevance. Various algorithms are used for this purpose.
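To make the link checking step a little more concrete, here is a toy sketch in Python that collects the outgoing links of a single page. It is not how any search engine actually does it; the names LinkCollector and outgoing_links and the example.com addresses are made up for illustration, and only the standard library is used.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class LinkCollector(HTMLParser):
    """Hypothetical helper: gathers the href of every <a> tag on a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Turn relative links into absolute ones.
                self.links.append(urljoin(self.base_url, value))


def outgoing_links(html, base_url):
    """Return the http(s) links that point to a different host than base_url."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    host = urlsplit(base_url).netloc
    return [
        link
        for link in collector.links
        if urlsplit(link).scheme in ("http", "https")
        and urlsplit(link).netloc != host
    ]


if __name__ == "__main__":
    page = '<a href="/about.html">About</a> <a href="https://other.example.org/x">X</a>'
    print(outgoing_links(page, "https://www.example.com/"))
```

Incoming links cannot be found this way, because they live on other websites; a robot only learns about them while crawling those sites.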
Where can you find the robots.txt file? You can find the robots.txt file in the root folder of your website, that is, the topmost directory of the site that is accessible to the public. It is very important to place the robots.txt file in the root directory, because placing it anywhere else will not have any effect.
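As a rough illustration of the root folder rule, the Python sketch below works out the robots.txt address from the address of any page on the same site. The function name robots_txt_url and the example.com address are made up for illustration.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url):
    """Return the robots.txt address for the site that serves page_url.

    Hypothetical helper: robots look for the file only at the root of
    the host, so the path of the page itself is thrown away.
    """
    parts = urlsplit(page_url)
    # Keep the scheme and host, replace the path, drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


if __name__ == "__main__":
    print(robots_txt_url("https://www.example.com/blog/2020/post.html?id=7"))
```

Whatever page you start from, the result always points at the top of the site, which is why a file placed in a subfolder is never read.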
What is the importance of the robots.txt file for webmasters? For website owners and webmasters the robots.txt file is very important, as it helps search engines index the website in a better way. Robots pass information about the website to search engines, and search engines can then rank it better. The file also gives website owners control over how a search engine visits their site: which content to make available to it and which content to hide.
Structure of a Robots.txt File
The structure of a robots.txt file is pretty simple: it is a list of user agents, each followed by the files and directories that are disallowed or allowed for it. Basically, the syntax is as follows:
User-agent:
Disallow:
Here, User-agent specifies the search engine spider that the rules apply to, e.g.
User-agent: Googlebot (Means the robots/crawlers from Google)
User-agent: * (Means all search engine robots/crawlers)
Allow/Disallow tells the search engine spiders which pages to crawl and which ones not to crawl, e.g.
User-agent: *
Disallow: /temp/
(Means all user agents are disallowed from crawling the /temp/ directory.)
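To see how a well-behaved robot might read such rules, here is a minimal sketch using Python's standard urllib.robotparser module. The rules are the /temp/ example from above; the robot name ExampleBot, the helper build_parser and the example.com addresses are assumptions made only for this illustration.

```python
from urllib.robotparser import RobotFileParser

# The example rules from the text: every robot is kept out of /temp/.
RULES = """\
User-agent: *
Disallow: /temp/
"""


def build_parser(rules_text):
    """Hypothetical helper: load robots.txt rules from a string."""
    parser = RobotFileParser()
    parser.parse(rules_text.splitlines())
    return parser


if __name__ == "__main__":
    parser = build_parser(RULES)
    for url in (
        "https://www.example.com/temp/draft.html",
        "https://www.example.com/index.html",
    ):
        # A polite robot asks before fetching; nothing forces it to.
        print(url, parser.can_fetch("ExampleBot", url))
```

The check happens entirely on the robot's side, so a robot that skips it can still request the page. That is why robots.txt should not be treated as a way to protect sensitive data.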