Robots.io is a Java library designed to make parsing a website's 'robots.txt' file easy.
The RobotsParser class provides all the functionality needed to use Robots.io.
The Javadoc for Robots.io can be found here.

To parse the robots.txt for Google with the User-Agent string "test":
RobotsParser robotsParser = new RobotsParser("test");
robotsParser.connect("http://google.com");Alternatively, to parse with no User-Agent, simply leave the constructor blank.
You can also pass a domain with a path.
robotsParser.connect("http://google.com/example.htm"); //This would also be validNote: Domains can either be passed in string form or as a URL object to all methods.
To check if a URL is allowed:
robotsParser.isAllowed("http://google.com/test"); // Returns true if allowed

Or, to get all the rules parsed from the file:
robotsParser.getDisallowedPaths(); // This will return an ArrayList of Strings

The results parsed are cached in the robotsParser object until the connect() method is called again, overwriting the previously parsed data.
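For instance, a sketch of this caching behaviour (the second domain is purely illustrative):

robotsParser.connect("http://google.com");
robotsParser.getDisallowedPaths(); // rules parsed from google.com

robotsParser.connect("http://example.com"); // overwrites the previously cached rules
robotsParser.getDisallowedPaths(); // now returns the rules parsed from example.com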
In the event that all access is denied, a RobotsDisallowedException will be thrown.
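A sketch of handling this case, assuming RobotsDisallowedException is a checked exception thrown by connect():

try {
    robotsParser.connect("http://google.com");
} catch (RobotsDisallowedException e) {
    // robots.txt denies all access for this User-Agent; skip the domain.
    System.err.println("Crawling disallowed: " + e.getMessage());
}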
Domains passed to RobotsParser are normalised to always end in a forward slash. Disallowed paths returned will never begin with a forward slash. This is so that URLs can easily be constructed. For example:
robotsParser.getDomain() + robotsParser.getDisallowedPaths().get(0); // http://google.com/example.htm

Robots.io is distributed under the GPL.