java.lang.Object
  org.htmlparser.visitors.NodeVisitor
    org.htmlparser.beans.StringBean
      org.jbox.webSpider.simpleSpider.HtmlVisitor
public class HtmlVisitor
extends org.htmlparser.beans.StringBean
An HTML text visitor.
Field Summary |
---|
Fields inherited from class org.htmlparser.beans.StringBean |
---|
mBuffer, mCollapse, mCollapseState, mIsPre, mIsScript, mIsStyle, mLinks, mParser, mPropertySupport, mReplaceSpace, mStrings, PROP_COLLAPSE_PROPERTY, PROP_CONNECTION_PROPERTY, PROP_LINKS_PROPERTY, PROP_REPLACE_SPACE_PROPERTY, PROP_STRINGS_PROPERTY, PROP_URL_PROPERTY |
Constructor Summary | |
---|---|
HtmlVisitor(java.lang.String[] rules)
Constructs a new HtmlVisitor object with a String array of rules. |
Method Summary | |
---|---|
java.util.LinkedList<java.lang.String> |
getLinksUnderRules()
Returns the links in an HTML page that match the rules. |
java.lang.String |
getText()
Returns the page text with HTML tags removed. |
java.lang.String |
getTitle()
Returns the title of the page. |
void |
parse(java.lang.String html,
java.lang.String encoding)
Parses HTML text with the specified encoding. |
void |
visitTag(org.htmlparser.Tag tag)
Visits an HTML tag. |
Methods inherited from class org.htmlparser.beans.StringBean |
---|
addPropertyChangeListener, carriageReturn, collapse, extractStrings, getCollapse, getConnection, getLinks, getReplaceNonBreakingSpaces, getStrings, getURL, main, removePropertyChangeListener, setCollapse, setConnection, setLinks, setReplaceNonBreakingSpaces, setStrings, setURL, updateStrings, visitEndTag, visitStringNode |
Methods inherited from class org.htmlparser.visitors.NodeVisitor |
---|
beginParsing, finishedParsing, shouldRecurseChildren, shouldRecurseSelf, visitRemarkNode |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Constructor Detail |
---|
public HtmlVisitor(java.lang.String[] rules)
Parameters:
rules - String array of rules written as regular expressions.
Method Detail |
---|
public void visitTag(org.htmlparser.Tag tag)
Overrides:
visitTag in class org.htmlparser.beans.StringBean
public java.lang.String getText()
public java.util.LinkedList<java.lang.String> getLinksUnderRules()
public java.lang.String getTitle()
public void parse(java.lang.String html, java.lang.String encoding) throws org.htmlparser.util.ParserException
Parameters:
html - String of HTML text to parse.
encoding - String naming the character encoding to use for parsing.
Throws:
org.htmlparser.util.ParserException - if the HTML cannot be parsed.
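The constructor takes rules written as regular expressions, and getLinksUnderRules() returns the links that meet them. This page does not say how a link is tested against a rule, so the following is only a minimal sketch of one plausible reading, in which a link is kept when it fully matches at least one rule. The class LinkRuleFilterSketch, the method filterLinks, and the example URLs are hypothetical and are not part of HtmlVisitor or htmlparser.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;
import java.util.regex.Pattern;

public class LinkRuleFilterSketch {

    /**
     * Hypothetical helper, not part of HtmlVisitor: keeps every link that
     * fully matches at least one regular-expression rule. Full matching
     * (rather than substring search) is an assumption of this sketch.
     */
    public static LinkedList<String> filterLinks(List<String> links, String[] rules) {
        // Compile each rule once instead of once per link.
        List<Pattern> patterns = new ArrayList<>();
        for (String rule : rules) {
            patterns.add(Pattern.compile(rule));
        }

        LinkedList<String> matched = new LinkedList<>();
        for (String link : links) {
            for (Pattern pattern : patterns) {
                if (pattern.matcher(link).matches()) {
                    matched.add(link);
                    break; // one matching rule is enough
                }
            }
        }
        return matched;
    }

    public static void main(String[] args) {
        // Illustrative rule: any page under /docs/ on example.com.
        String[] rules = { "https?://example\\.com/docs/.*" };
        List<String> links = Arrays.asList(
                "http://example.com/docs/index.html",
                "http://example.com/about.html",
                "https://example.org/docs/index.html");

        System.out.println(filterLinks(links, rules));
    }
}
```

With the rule shown, only the first link is kept; the second is outside /docs/ and the third is on a different host. If the real class searches for a rule anywhere in the link instead, the equivalent sketch would call find() in place of matches().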