Package org.jbox.webSpider.simpleSpider

This package defines the APIs of a simple implementation of WebSpider.


Class Summary
HtmlFetcher An HTML fetcher.
HtmlVisitor An HTML text visitor.
SimpleSpider An implementation of WebSpider.
 

Exception Summary
UnknownEncodingException Thrown when the encoding of a page cannot be resolved.
 

Package org.jbox.webSpider.simpleSpider Description

This package defines the APIs of a simple implementation of WebSpider.


The construction of SimpleSpider is like below:

Note: SimpleSpider does not handle "robots.txt" in the current version, so it may not be suitable for crawling the whole Internet. It is designed for crawling a personal website. Support for "robots.txt" will be added in a later version.
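
Until that support exists, a caller could filter URLs before handing them to the spider. The following is a minimal sketch in plain Java and is not part of this package: the class RobotsTxtSketch and its methods are hypothetical names chosen for illustration. It assumes the robots file has already been downloaded as text, and it only understands User-agent and Disallow lines with simple prefix matching (no Allow rules, no wildcards). As a conservative simplification, it applies the rules of every group that matches the user agent, including the "*" group.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of org.jbox.webSpider.simpleSpider.
public class RobotsTxtSketch {

    /** Collects the Disallow prefixes of every group that matches the user agent. */
    static List<String> disallowedPrefixes(String robotsTxt, String userAgent) {
        List<String> prefixes = new ArrayList<>();
        String agent = userAgent.toLowerCase(Locale.ROOT);
        boolean groupApplies = false; // does the current group match our agent?
        boolean inAgentLines = false; // still reading consecutive User-agent lines?

        for (String rawLine : robotsTxt.split("\\r?\\n")) {
            // Strip comments and surrounding whitespace.
            int hash = rawLine.indexOf('#');
            String line = (hash >= 0 ? rawLine.substring(0, hash) : rawLine).trim();
            int colon = line.indexOf(':');
            if (colon < 0) {
                continue; // blank or malformed line
            }
            String field = line.substring(0, colon).trim().toLowerCase(Locale.ROOT);
            String value = line.substring(colon + 1).trim();

            if (field.equals("user-agent")) {
                String token = value.toLowerCase(Locale.ROOT);
                boolean matches = token.equals("*")
                        || (!token.isEmpty() && agent.contains(token));
                // Consecutive User-agent lines belong to the same group.
                groupApplies = inAgentLines ? (groupApplies || matches) : matches;
                inAgentLines = true;
            } else {
                inAgentLines = false;
                // An empty Disallow value means "nothing is disallowed".
                if (field.equals("disallow") && groupApplies && !value.isEmpty()) {
                    prefixes.add(value);
                }
            }
        }
        return prefixes;
    }

    /** Returns true if no applicable Disallow prefix matches the path. */
    static boolean isAllowed(String robotsTxt, String userAgent, String path) {
        for (String prefix : disallowedPrefixes(robotsTxt, userAgent)) {
            if (path.startsWith(prefix)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String robots = "User-agent: *\nDisallow: /private/\n";
        System.out.println(isAllowed(robots, "MySpider", "/index.html"));     // prints true
        System.out.println(isAllowed(robots, "MySpider", "/private/a.html")); // prints false
    }
}
```

How the filtered URLs are then passed to SimpleSpider is deliberately not shown, since that depends on the spider's own API.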