org.jbox.webSpider.simpleSpider
Class HtmlFetcher
java.lang.Object
org.jbox.webSpider.simpleSpider.HtmlFetcher
public class HtmlFetcher
- extends java.lang.Object
An HTML fetcher.
- Version:
- 1.0
- Author:
- YiBin.H
Field Summary |
protected java.net.URLConnection |
urlConn
|
Method Summary |
void |
connect(java.lang.String urlStr)
Connects to the specified URL. |
java.lang.String |
fectch()
Fetches the text of a page. |
protected java.lang.String |
fetchEncoding()
Fetches the encoding of a page. |
Methods inherited from class java.lang.Object |
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
urlConn
protected java.net.URLConnection urlConn
HtmlFetcher
public HtmlFetcher()
connect
public void connect(java.lang.String urlStr)
throws java.io.IOException
- Connects to the specified URL.
- Parameters:
urlStr
- the URL to connect to.
- Throws:
java.io.IOException
- if the connection to the URL fails.
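A minimal sketch of what a connect method with this signature might do is shown below. It assumes the method only parses the string and stores an unopened java.net.URLConnection in the urlConn field; the class name ConnectSketch is hypothetical, and the real HtmlFetcher may behave differently (for example, by opening the connection eagerly).

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLConnection;

// Hypothetical stand-in for illustration; not the actual HtmlFetcher source.
class ConnectSketch {
    // Mirrors the documented protected field.
    protected URLConnection urlConn;

    // Assumption: connect parses the URL and creates the connection object.
    // openConnection() does not perform network I/O by itself.
    public void connect(String urlStr) throws IOException {
        URL url = new URL(urlStr); // MalformedURLException is a subclass of IOException
        urlConn = url.openConnection();
    }
}
```

Because java.net.MalformedURLException extends java.io.IOException, a malformed urlStr surfaces through the same declared exception as a failed connection.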
fetchEncoding
protected java.lang.String fetchEncoding()
throws java.io.IOException,
UnknownEncodingException
- Fetches the encoding of a page.
If "charset=" appears in the Content-Type response header, this method
returns its value. Otherwise the spider downloads the page content until
it reaches the string "charset=" and returns the value that follows it.
If "charset=" appears in neither place, the method throws an
UnknownEncodingException.
- Returns:
- the encoding of the page.
- Throws:
java.io.IOException
- if downloading the HTML of the page fails.
UnknownEncodingException
- if the encoding of the page cannot be determined.
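The lookup order described for fetchEncoding (response header first, then page content) could be sketched roughly as follows. This is an illustration under stated assumptions, not the library's implementation: the class CharsetSniffer, its method names, the 64 KB scan limit, the regular expression, and the nested stand-in UnknownEncodingException are all hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of the header-then-body charset lookup.
final class CharsetSniffer {
    // Stand-in for the library's exception; its real constructors are not documented here.
    static class UnknownEncodingException extends Exception {
        UnknownEncodingException(String msg) { super(msg); }
    }

    // Matches "charset=" followed by an optionally quoted charset name.
    private static final Pattern CHARSET =
            Pattern.compile("charset=[\"']?([A-Za-z0-9._:-]+)", Pattern.CASE_INSENSITIVE);

    // Returns the value after "charset=", or null if the marker is absent.
    static String extractCharset(String text) {
        if (text == null) return null;
        Matcher m = CHARSET.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    // Reads the body in chunks until "charset=" and its complete value have been seen.
    static String fromBody(InputStream in, int maxBytes) throws IOException {
        StringBuilder seen = new StringBuilder();
        byte[] chunk = new byte[1024];
        int n;
        while (seen.length() < maxBytes && (n = in.read(chunk)) != -1) {
            // ISO-8859-1 maps each byte to one char, so no bytes are lost before the charset is known.
            seen.append(new String(chunk, 0, n, StandardCharsets.ISO_8859_1));
            Matcher m = CHARSET.matcher(seen);
            // Accept only if the value does not touch the buffer end (it might be cut off).
            if (m.find() && m.end() < seen.length()) return m.group(1);
        }
        return extractCharset(seen.toString());
    }

    // Header first, then body; throws if neither contains "charset=".
    static String detect(String contentType, InputStream body)
            throws IOException, UnknownEncodingException {
        String cs = extractCharset(contentType);
        if (cs == null) cs = fromBody(body, 64 * 1024);
        if (cs == null) throw new UnknownEncodingException("no charset= found");
        return cs;
    }

    public static void main(String[] args) throws Exception {
        InputStream body = new ByteArrayInputStream(
                "<meta content=\"text/html; charset=gb2312\">".getBytes(StandardCharsets.ISO_8859_1));
        System.out.println(detect("text/html", body));
    }
}
```

Decoding the partial download as ISO-8859-1 is a deliberate choice in this sketch: the marker and charset names are ASCII, so they can be located before the real encoding is known.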
fectch
public java.lang.String fectch()
throws java.io.IOException,
UnknownEncodingException
- Fetches the text of a page.
- Returns:
- the text of the page.
- Throws:
java.io.IOException
- if fetching the HTML of the page fails.
UnknownEncodingException
- if the encoding of the page cannot be resolved.
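Once an encoding name is known, turning the downloaded bytes into text might look like the sketch below. The class PageDecoder and the method readAll are hypothetical names, and mapping an unusable charset name to java.io.IOException is an assumption made to keep the example self-contained; the documented method reports encoding problems through UnknownEncodingException instead.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.UnsupportedCharsetException;

// Hypothetical helper for illustration; not part of the documented API.
final class PageDecoder {
    // Reads the whole stream and decodes it with the named charset.
    static String readAll(InputStream in, String charsetName) throws IOException {
        Charset cs;
        try {
            cs = Charset.forName(charsetName);
        } catch (IllegalCharsetNameException | UnsupportedCharsetException e) {
            // Assumption: surfaced as IOException here; the real class may differ.
            throw new IOException("unusable charset: " + charsetName, e);
        }
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buf.write(chunk, 0, n);
        }
        return new String(buf.toByteArray(), cs);
    }
}
```

Buffering the raw bytes and decoding once at the end avoids splitting a multi-byte character across chunk boundaries.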