Robots Exclusion Protocol
The robots.txt definitions are advice to robots; they do not actually block access to data. Well-behaved robots obey these directives, but compliance is entirely voluntary.
Disallowing features
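Robots gain nothing from crawling feature pages that only edit, print, or otherwise duplicate content. As a minimal sketch (the tiki-editpage.php and tiki-print.php script names assume a default TikiWiki installation):

    User-agent: *
    # Edit screens and printer-friendly views only duplicate page content
    Disallow: /tiki-editpage.php
    Disallow: /tiki-print.php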
Blocking duplicate access paths
If your site is not using SEFURLs, many parts of the site can only be reached with at least one parameter, such as "?id=123", so you should not block patterns containing a question mark. With SEFURLs enabled, your URLs will contain fewer question marks. For TikiWiki, "Disallow: /*&" can be used to disallow every URL containing an ampersand, which stops robots from crawling variations of the same page. Examine your site first, however, to decide whether to block all URLs with an ampersand or only specific parameters: some default URLs require at least one ampersand (accessing a file's information, for instance, requires both a gallery ID and a file ID). By adding a parameter name after the ampersand you can disallow specific parameters. For example, if a robot fully crawls a file directory in the default order, there is no need for it to also follow the "&sort_mode" URLs and view the same data in a different order. The early 2009 version of the Cuil crawler, twiceler, evidently crawls every identified variant of a URL, and it is not known whether twiceler obeys the * wildcard.
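As a minimal sketch of the two approaches just described (the sort_mode parameter name assumes a default TikiWiki installation):

    User-agent: *
    # Broad: skip every URL containing an ampersand,
    # i.e. every multi-parameter variant of a page
    Disallow: /*&

    # Narrow alternative: skip only re-sorted views of data
    # the robot has already seen in the default order
    Disallow: /*&sort_mode=

Note that the * wildcard is a widely supported extension rather than part of the original robots.txt standard, so some crawlers may ignore these rules.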
Robots.txt Directives
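As a minimal annotated sketch of the commonly used directives (Allow, Crawl-delay, and Sitemap are extensions honored by some crawlers rather than part of the original standard):

    # Rules apply to all robots unless a more specific User-agent block matches
    User-agent: *
    # Forbid crawling anything under this path
    Disallow: /temp/
    # Explicitly permit a subpath inside a disallowed one (extension)
    Allow: /temp/public/
    # Ask the robot to wait this many seconds between requests (extension)
    Crawl-delay: 10
    # Point robots at the sitemap (extension)
    Sitemap: http://example.com/sitemap.xml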
HTML META Directives
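Where you cannot edit robots.txt, or want per-page control, the same advice can be given to robots in a page's head element. A minimal sketch:

    <!-- In the page's <head>: ask robots not to index this page
         and not to follow any of its links -->
    <meta name="robots" content="noindex, nofollow">

Like robots.txt, these are advisory: well-behaved robots honor values such as noindex, nofollow, and noarchive, but nothing enforces them.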