java.lang.Object
  extended by org.jbox.webSpider.simpleSpider.SimpleSpider
public class SimpleSpider
extends java.lang.Object
implements WebSpider
An implementation of WebSpider.
Note that SimpleSpider does not process "robots.txt", so it does not respect a site's crawl exclusions.
| Constructor Summary | |
|---|---|
| SimpleSpider() | Constructs a new SimpleSpider. |
| Method Summary | | |
|---|---|---|
| int | getMaxPageNum() | Returns the maximum number of pages the spider will crawl. |
| boolean | hashNext() | Checks whether there is another page to visit and the maximum page number has not yet been reached. |
| Page | next() | Visits and returns the next Page object. |
| void | setMaxPageNum(int maxPageNum) | Sets the maximum number of pages the spider will crawl. |
| void | setRules(java.lang.String[] rules) | Sets the crawl rules of the WebSpider. |
| void | setStartUrls(java.lang.String[] startUrls) | Sets the start URLs of the WebSpider. |
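
The method summary implies an iterator-style protocol: configure the spider with setStartUrls, setRules and setMaxPageNum, then call next() only while hashNext() returns true. The following is a minimal sketch of that loop, assuming nothing beyond the documented method names. `Spider` is a hypothetical interface mirroring those names (the real WebSpider and Page types are not reproduced, so pages are modeled by a type parameter), and `FakeSpider` is a hypothetical in-memory stand-in used only to exercise the loop; it is not SimpleSpider's implementation.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

public class CrawlLoopSketch {
    // Hypothetical stand-in for the WebSpider interface, mirroring the documented
    // method names; P replaces the real Page type, which is not reproduced here.
    public interface Spider<P> {
        void setStartUrls(String[] startUrls);
        void setRules(String[] rules);
        void setMaxPageNum(int maxPageNum);
        int getMaxPageNum();
        boolean hashNext(); // spelled as in the documented API
        P next();
    }

    // Drives a spider to completion: call next() only while hashNext() is true.
    public static <P> List<P> crawl(Spider<P> spider) {
        List<P> pages = new ArrayList<>();
        while (spider.hashNext()) {
            pages.add(spider.next());
        }
        return pages;
    }

    // Hypothetical in-memory spider used only to exercise crawl(); it "visits"
    // the start URLs in order, ignores rules, and stops after maxPageNum pages.
    public static class FakeSpider implements Spider<String> {
        private final Deque<String> queue = new ArrayDeque<>();
        private int maxPageNum = Integer.MAX_VALUE;
        private int visited = 0;

        @Override public void setStartUrls(String[] startUrls) { queue.addAll(Arrays.asList(startUrls)); }
        @Override public void setRules(String[] rules) { /* not modeled by the fake */ }
        @Override public void setMaxPageNum(int maxPageNum) { this.maxPageNum = maxPageNum; }
        @Override public int getMaxPageNum() { return maxPageNum; }
        @Override public boolean hashNext() { return !queue.isEmpty() && visited < maxPageNum; }
        @Override public String next() { visited++; return queue.poll(); }
    }
}
```

With three start URLs and setMaxPageNum(2), crawl() on a FakeSpider returns only the first two URLs. With the real class, a caller would pass a SimpleSpider where the sketch uses FakeSpider and should also be prepared for next() to throw UnknownEncodingException.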
| Methods inherited from class java.lang.Object |
|---|
| clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Constructor Detail |
|---|
public SimpleSpider()
| Method Detail |
|---|
public void setStartUrls(java.lang.String[] startUrls)
Specified by: setStartUrls in interface WebSpider
Parameters: startUrls - String array containing the start URLs of the WebSpider.

public void setRules(java.lang.String[] rules)
Specified by: setRules in interface WebSpider
Parameters: rules - String array containing the crawl rules, written as regular expressions.

public void setMaxPageNum(int maxPageNum)
Specified by: setMaxPageNum in interface WebSpider
Parameters: maxPageNum - the maximum number of pages to crawl.

public int getMaxPageNum()
Specified by: getMaxPageNum in interface WebSpider

public boolean hashNext()
Specified by: hashNext in interface WebSpider

public Page next()
Specified by: next in interface WebSpider
Throws: UnknownEncodingException - if the encoding of a page cannot be resolved.
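
setRules takes rules written as regular expressions, but this page does not say how a rule is applied (full match or substring search, and whether a URL must match any rule or all of them). The following minimal sketch assumes a URL is followed when it fully matches at least one rule, using java.util.regex; `RuleFilterSketch` is a hypothetical helper, not part of the library.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class RuleFilterSketch {
    private final List<Pattern> patterns = new ArrayList<>();

    // Compiles each rule once; an invalid expression fails fast with
    // java.util.regex.PatternSyntaxException.
    public RuleFilterSketch(String[] rules) {
        for (String rule : rules) {
            patterns.add(Pattern.compile(rule));
        }
    }

    // Assumed semantics (not confirmed by the documentation): a URL is
    // accepted if it matches at least one rule in its entirety.
    public boolean accepts(String url) {
        for (Pattern p : patterns) {
            if (p.matcher(url).matches()) {
                return true;
            }
        }
        return false;
    }
}
```

Under that assumption, a call such as `spider.setRules(new String[] {"https?://example\\.com/docs/.*"})` would keep the crawl inside one section of a site. If the library instead uses `Matcher.find()`, unanchored rules would accept more URLs than this sketch does.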
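
next() throws UnknownEncodingException when a page's encoding cannot be resolved. How SimpleSpider determines the encoding is not documented here; one common approach, sketched below purely as an assumption, is to read the `charset` parameter of the HTTP Content-Type header and look it up with java.nio.charset.Charset. `CharsetSketch` and `EncodingNotResolvedException` are hypothetical names; the latter stands in for the library's UnknownEncodingException, whose package and constructors are not shown on this page.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.util.Locale;

public class CharsetSketch {
    // Hypothetical stand-in for the library's UnknownEncodingException.
    public static class EncodingNotResolvedException extends Exception {
        public EncodingNotResolvedException(String message) {
            super(message);
        }
    }

    // Resolves the charset named in a Content-Type value such as
    // "text/html; charset=UTF-8". Assumption: the header is the only source
    // consulted (no <meta> tag sniffing, no default fallback).
    public static Charset resolve(String contentType) throws EncodingNotResolvedException {
        if (contentType != null) {
            for (String param : contentType.split(";")) {
                String p = param.trim();
                if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                    // Strip optional quotes, e.g. charset="ISO-8859-1".
                    String name = p.substring("charset=".length()).replace("\"", "").trim();
                    try {
                        if (Charset.isSupported(name)) {
                            return Charset.forName(name);
                        }
                    } catch (IllegalCharsetNameException e) {
                        // Malformed name: treat it like an unsupported one.
                    }
                    throw new EncodingNotResolvedException("unsupported charset: " + name);
                }
            }
        }
        throw new EncodingNotResolvedException("no charset in Content-Type: " + contentType);
    }
}
```

A real crawler would usually fall back to a `<meta charset>` declaration or a default encoding before giving up; the sketch fails immediately to keep the error path visible.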