org.jbox.webSpider.simpleSpider
Class HtmlFetcher
java.lang.Object
  
org.jbox.webSpider.simpleSpider.HtmlFetcher
public class HtmlFetcher
- extends java.lang.Object
 
An HTML fetcher.
- Version:
 
  - 1.0
 
- Author:
 
  - YiBin.H
 
Field Summary
| protected java.net.URLConnection | urlConn |
 
 
Method Summary
| void | connect(java.lang.String urlStr) | Connects to the specified URL. |
| java.lang.String | fectch() | Fetches the text of a page. |
| protected java.lang.String | fetchEncoding() | Fetches the encoding of a page. |
 
| Methods inherited from class java.lang.Object | 
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait | 
 
urlConn
protected java.net.URLConnection urlConn
HtmlFetcher
public HtmlFetcher()
connect
public void connect(java.lang.String urlStr)
             throws java.io.IOException
- Connects to the specified URL.
- Parameters:
 urlStr - the URL to connect to.
- Throws:
 java.io.IOException - if the connection to the URL fails.
 
 
fetchEncoding
protected java.lang.String fetchEncoding()
                                  throws java.io.IOException,
                                         UnknownEncodingException
- Fetches the encoding of a page.
 
 If "charset=" appears in the Content-Type response header, this method
 returns its value. Otherwise, the spider downloads the page content until
 it encounters the string "charset=" and returns the value that follows it.
 If "charset=" appears in neither the header nor the content, the method
 throws an UnknownEncodingException.
- Returns:
 - the encoding of the page.
- Throws:
 java.io.IOException - if downloading the HTML of the page fails.
 UnknownEncodingException - if the encoding of the page cannot be determined.
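
The header-then-content lookup described above can be sketched as follows. This is only an illustration under stated assumptions, not the actual HtmlFetcher source: the class name CharsetSketch, both method names, the case-insensitive match, and the use of IllegalStateException in place of UnknownEncodingException are all hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of the lookup order documented for fetchEncoding().
public class CharsetSketch {
    // Matches "charset=" followed by an optionally quoted charset name.
    // Case-insensitive matching is an assumption, not documented behavior.
    private static final Pattern CHARSET =
            Pattern.compile("charset=[\"']?([\\w.:\\-]+)", Pattern.CASE_INSENSITIVE);

    /** Returns the value after "charset=" in the text, or null if absent. */
    public static String extractCharset(String text) {
        if (text == null) {
            return null;
        }
        Matcher m = CHARSET.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    /**
     * Tries the Content-Type header first, then the downloaded page content.
     * IllegalStateException stands in for UnknownEncodingException here.
     */
    public static String resolveEncoding(String contentType, String pageContent) {
        String fromHeader = extractCharset(contentType);
        if (fromHeader != null) {
            return fromHeader;
        }
        String fromContent = extractCharset(pageContent);
        if (fromContent != null) {
            return fromContent;
        }
        throw new IllegalStateException("no charset= found in header or content");
    }
}
```

In this sketch the page content is passed in as an already-decoded string for simplicity; the documented method instead downloads the content incrementally until "charset=" is found.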
 
 
fectch
public java.lang.String fectch()
                        throws java.io.IOException,
                               UnknownEncodingException
- Fetches the text of a page.
- Returns:
 - the text of the page.
- Throws:
 java.io.IOException - if fetching the HTML of the page fails.
 UnknownEncodingException - if the encoding of the page cannot be resolved.
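
Fetching the text of a page presumably means reading the response bytes and decoding them with the encoding resolved by fetchEncoding(). A minimal sketch of that decoding step is shown below. The class DecodeSketch and the method readAll are hypothetical names, and the real fectch() may be implemented differently.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;

// Hypothetical sketch: read a stream fully and decode it with a named charset.
public class DecodeSketch {
    /**
     * Reads all bytes from the stream and decodes them with the given encoding.
     * Charset.forName throws an unchecked exception for an unsupported or
     * illegal charset name.
     */
    public static String readAll(InputStream in, String encoding) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buffer.write(chunk, 0, n);
        }
        return new String(buffer.toByteArray(), Charset.forName(encoding));
    }
}
```

With a java.net.URLConnection such as urlConn, the input would typically come from urlConn.getInputStream().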