java.lang.Object
  org.htmlparser.visitors.NodeVisitor
    org.htmlparser.beans.StringBean
      org.jbox.webSpider.simpleSpider.HtmlVisitor
public class HtmlVisitor
extends org.htmlparser.beans.StringBean
An HTML text visitor.
| Field Summary |
|---|
| Fields inherited from class org.htmlparser.beans.StringBean |
|---|
mBuffer, mCollapse, mCollapseState, mIsPre, mIsScript, mIsStyle, mLinks, mParser, mPropertySupport, mReplaceSpace, mStrings, PROP_COLLAPSE_PROPERTY, PROP_CONNECTION_PROPERTY, PROP_LINKS_PROPERTY, PROP_REPLACE_SPACE_PROPERTY, PROP_STRINGS_PROPERTY, PROP_URL_PROPERTY |
| Constructor Summary | |
|---|---|
HtmlVisitor(java.lang.String[] rules)
Constructs a new HtmlVisitor object with a String array of rules. |
|
| Method Summary | |
|---|---|
java.util.LinkedList<java.lang.String> |
getLinksUnderRules()
Returns the links in an HTML page that match the rules. |
java.lang.String |
getText()
Returns the page text with HTML tags removed. |
java.lang.String |
getTitle()
Returns the title of the page. |
void |
parse(java.lang.String html,
java.lang.String encoding)
Parses HTML text with the specified encoding. |
void |
visitTag(org.htmlparser.Tag tag)
Visits an HTML tag. |
| Methods inherited from class org.htmlparser.beans.StringBean |
|---|
addPropertyChangeListener, carriageReturn, collapse, extractStrings, getCollapse, getConnection, getLinks, getReplaceNonBreakingSpaces, getStrings, getURL, main, removePropertyChangeListener, setCollapse, setConnection, setLinks, setReplaceNonBreakingSpaces, setStrings, setURL, updateStrings, visitEndTag, visitStringNode |
| Methods inherited from class org.htmlparser.visitors.NodeVisitor |
|---|
beginParsing, finishedParsing, shouldRecurseChildren, shouldRecurseSelf, visitRemarkNode |
| Methods inherited from class java.lang.Object |
|---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Constructor Detail |
|---|
public HtmlVisitor(java.lang.String[] rules)
Parameters: rules - String array containing rules written as regular expressions.
| Method Detail |
|---|
public void visitTag(org.htmlparser.Tag tag)
Overrides: visitTag in class org.htmlparser.beans.StringBean
public java.lang.String getText()
public java.util.LinkedList<java.lang.String> getLinksUnderRules()
public java.lang.String getTitle()
public void parse(java.lang.String html,
java.lang.String encoding)
throws org.htmlparser.util.ParserException
Parameters:
html - String of HTML text to parse.
encoding - String naming the character encoding to use when parsing.
Throws:
org.htmlparser.util.ParserException - if the HTML cannot be parsed.
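The constructor takes its rules as regular expressions, but this page does not say how a link is tested against them. The following is a minimal sketch of one plausible interpretation, assuming a link is kept when it fully matches at least one rule. The class `RuleFilterSketch` and the method `filterByRules` are hypothetical names used for illustration only; they are not part of HtmlVisitor or of the htmlparser library, and the actual behavior of getLinksUnderRules() may differ.

```java
import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not part of HtmlVisitor: illustrates one way
// regular-expression rules could be applied to a list of links.
class RuleFilterSketch {

    // Keeps every link that fully matches at least one rule.
    // Full-match semantics are an assumption, not documented behavior.
    static LinkedList<String> filterByRules(List<String> links, String[] rules) {
        // Compile each rule once so it is not re-parsed for every link.
        List<Pattern> patterns = new LinkedList<>();
        for (String rule : rules) {
            patterns.add(Pattern.compile(rule));
        }

        LinkedList<String> accepted = new LinkedList<>();
        for (String link : links) {
            for (Pattern pattern : patterns) {
                if (pattern.matcher(link).matches()) {
                    accepted.add(link);
                    break; // one matching rule is enough
                }
            }
        }
        return accepted;
    }

    public static void main(String[] args) {
        String[] rules = { "https?://example\\.org/docs/.*" };
        List<String> links = Arrays.asList(
                "http://example.org/docs/index.html",
                "http://example.org/about.html");
        // Only the first link matches the rule.
        System.out.println(filterByRules(links, rules));
    }
}
```

Under this assumption an empty rules array accepts no links. A caller who wants substring matching instead could swap `matches()` for `find()`.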