java.lang.Object org.jbox.webSpider.simpleSpider.SimpleSpider
public class SimpleSpider
An implementation of WebSpider.
Note that SimpleSpider does not honor "robots.txt".
Constructor Summary | |
---|---|
SimpleSpider()
Constructs a new SimpleSpider. |
Method Summary | |
---|---|
int |
getMaxPageNum()
Return the max page number that the spider will crawl. |
boolean |
hashNext()
Check whether there is a next page to visit or whether the spider has reached the max page number. |
Page |
next()
Visit and return the next Page object. |
void |
setMaxPageNum(int maxPageNum)
Set the max page number that the spider will crawl. |
void |
setRules(java.lang.String[] rules)
Set crawl rules of WebSpider. |
void |
setStartUrls(java.lang.String[] startUrls)
Set start URLs of WebSpider. |
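
The documentation states only that the rules passed to setRules are written as regular expressions (REGEXP); how SimpleSpider applies them is not described. The following is a minimal sketch, assuming a URL is accepted when it fully matches at least one rule. RuleSketch, compileRules and isAllowed are hypothetical names used for illustration and are not part of the library.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class RuleSketch {
    /** Compiles each rule string once so it can be reused for every URL. */
    static List<Pattern> compileRules(String[] rules) {
        List<Pattern> compiled = new ArrayList<>();
        for (String rule : rules) {
            compiled.add(Pattern.compile(rule));
        }
        return compiled;
    }

    /** Assumption: a URL is accepted when it fully matches at least one rule. */
    static boolean isAllowed(String url, List<Pattern> rules) {
        for (Pattern p : rules) {
            if (p.matcher(url).matches()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // The same kind of String[] that would be handed to setRules(...)
        String[] rules = { "https?://example\\.com/docs/.*" };
        List<Pattern> compiled = compileRules(rules);
        System.out.println(isAllowed("http://example.com/docs/index.html", compiled)); // prints "true"
        System.out.println(isAllowed("http://example.com/blog/post.html", compiled));  // prints "false"
    }
}
```

Note that the dot in the host name is escaped in the rule; an unescaped dot would match any character.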
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Constructor Detail |
---|
public SimpleSpider()
Method Detail |
---|
public void setStartUrls(java.lang.String[] startUrls)
setStartUrls
in interface WebSpider
startUrls - String array containing the start URLs of the WebSpider.
public void setRules(java.lang.String[] rules)
setRules
in interface WebSpider
rules - String array containing rules written as regular expressions.
public void setMaxPageNum(int maxPageNum)
setMaxPageNum
in interface WebSpider
maxPageNum - the max page number that the spider will crawl.
public int getMaxPageNum()
getMaxPageNum
in interface WebSpider
public boolean hashNext()
hashNext
in interface WebSpider
public Page next()
next
in interface WebSpider
UnknownEncodingException - thrown if the encoding of a page could not be resolved.
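
The methods documented above suggest an iterator-style crawl loop: configure the spider, then call next() while hashNext() returns true. The sketch below illustrates that pattern under stated assumptions. The nested WebSpider interface and StubSpider class are hypothetical stand-ins that only mirror the documented signatures; the stub returns URL strings instead of Page objects, performs no network access, and ignores the rules.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

public class CrawlLoopSketch {
    /** Hypothetical stand-in mirroring the documented WebSpider methods. */
    interface WebSpider {
        void setStartUrls(String[] startUrls);
        void setRules(String[] rules);
        void setMaxPageNum(int maxPageNum);
        int getMaxPageNum();
        boolean hashNext(); // spelled as in the documented API
        String next();      // the documented method returns a Page object
    }

    /** Stub that "visits" the start URLs in order, without any network access. */
    static class StubSpider implements WebSpider {
        private final Deque<String> queue = new ArrayDeque<>();
        private int maxPageNum = Integer.MAX_VALUE;
        private int visited = 0;

        public void setStartUrls(String[] startUrls) { queue.addAll(Arrays.asList(startUrls)); }
        public void setRules(String[] rules) { /* ignored by this stub */ }
        public void setMaxPageNum(int maxPageNum) { this.maxPageNum = maxPageNum; }
        public int getMaxPageNum() { return maxPageNum; }
        // Assumption: false once the queue is empty or the page limit is reached.
        public boolean hashNext() { return !queue.isEmpty() && visited < maxPageNum; }
        public String next() { visited++; return queue.removeFirst(); }
    }

    /** Drains the spider and collects everything it returns. */
    static List<String> crawl(WebSpider spider) {
        List<String> pages = new ArrayList<>();
        // The documented next() may throw UnknownEncodingException; whether it is
        // checked is not stated, so the stand-in omits it.
        while (spider.hashNext()) {
            pages.add(spider.next());
        }
        return pages;
    }

    public static void main(String[] args) {
        WebSpider spider = new StubSpider();
        spider.setStartUrls(new String[] {
            "http://example.com/a", "http://example.com/b", "http://example.com/c" });
        spider.setMaxPageNum(2);
        System.out.println(crawl(spider).size()); // prints "2"
    }
}
```

With the real SimpleSpider, the loop body would receive Page objects, and the caller would decide whether an encoding failure on one page should end the crawl or only skip that page.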