org.jbox.configuration
Class Configuration

java.lang.Object
  extended by org.jbox.configuration.Configuration

public class Configuration
extends java.lang.Object

The Configuration class specifies the properties and mapping documents used when creating WebSpider, CutterBox, IndexWriter, and Searcher objects.

Note: the build methods in this class do not follow the singleton pattern; each call creates a new object.

A new Configuration will use the properties specified in jbox.cfg.xml by default.

Version:
1.0
Author:
YiBin.H
See Also:
WebSpider, Cutter, CutterBox, IndexWriter, Searcher

Constructor Summary
Configuration()
          Load the configuration file from the default path, jbox.cfg.xml.
Configuration(java.lang.String path)
          Load the configuration file from the specified path.
 
Method Summary
 CutterBox buildCutterBox()
          Create a new CutterBox object using the properties and mappings in this configuration.
 IndexWriter buildIndexWriter()
          Create a new IndexWriter object using the properties and mappings in this configuration.
 Searcher buildSearcher()
          Create a new Searcher object using the properties and mappings in this configuration.
 WebSpider buildWebSpider()
          Create a new WebSpider object using the properties and mappings in this configuration.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

Configuration

public Configuration()
Load the configuration file from the default path, jbox.cfg.xml.


Configuration

public Configuration(java.lang.String path)
Load the configuration file from the specified path.

Parameters:
path - path of the configuration file.
Method Detail

buildWebSpider

public WebSpider buildWebSpider()
Create a new WebSpider object using the properties and mappings in this configuration.


A WebSpider configuration may look like this:


<spider class = "org.jbox.spider.htmlSpider.SimpleSpider">
   <maxPageNum> 1 </maxPageNum>
   <startUrls>
     <property name = "URL">http://localhost</property>
   </startUrls>
   <crawlRules>
     <property name = "Rule">http://.*</property>
   </crawlRules>
</spider>

Attribute "class" specifies which concrete WebSpider to create.

Element <maxPageNum> specifies the number of pages to visit.

Element <startUrls> lists the URLs from which the WebSpider starts crawling.

Element <crawlRules> lists the rules the WebSpider applies when deciding which URLs to visit. Each rule is a regular expression. For example:
"http://.*(\.html)$"
means the WebSpider ignores every URL that does not end with ".html".
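The effect of such a rule can be checked with java.util.regex alone. The sketch below is not part of the jbox API: the class CrawlRuleDemo and the method isAllowed are hypothetical names, and it assumes the rule must match the whole URL, which this page does not confirm.

```java
import java.util.regex.Pattern;

public class CrawlRuleDemo {
    // Rule copied from the example above: only URLs ending in ".html" pass.
    static final Pattern HTML_ONLY = Pattern.compile("http://.*(\\.html)$");

    // Assumption: a URL is accepted only if the entire URL matches the rule.
    static boolean isAllowed(String url) {
        return HTML_ONLY.matcher(url).matches();
    }

    public static void main(String[] args) {
        String[] urls = {
            "http://localhost/index.html",
            "http://localhost/logo.png"
        };
        for (String url : urls) {
            System.out.println(url + " -> " + isAllowed(url));
        }
    }
}
```

Note that the backslash in the rule is doubled only inside the Java string literal; in the XML configuration file it is written once, as shown above.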

Returns:
Concrete WebSpider object specified by the <spider> tag in the configuration file.
Throws:
ConfigurationException - if the WebSpider cannot be created.

buildCutterBox

public CutterBox buildCutterBox()
Create a new CutterBox object using the properties and mappings in this configuration.


A CutterBox configuration may look like this:


<cutterBox>
  <cutter language="EN" class="org.jbox.textCutter.EN.SimpleENCutter">
    <property name = "UnicodeScope" start="0x0030" end="0x0039"/>
    <property name = "UnicodeScope" start="0x0041" end="0x005a"/>
    <property name = "UnicodeScope" start="0x0061" end="0x007a"/>
  </cutter>
  <cutter language="CJK" class="org.jbox.textCutter.CJK.SimpleCJKCutter">
    <property name = "UnicodeBlock">CJK_UNIFIED_IDEOGRAPHS</property>
    <property name = "UnicodeBlock">KATAKANA</property>
    <property name = "UnicodeBlock">HANUNOO</property>
  </cutter>
</cutterBox>

Element <cutter> specifies a concrete Cutter to put in the CutterBox.

Property "UnicodeBlock" specifies a Unicode range handled by the Cutter, named by a java.lang.Character.UnicodeBlock constant.

Property "UnicodeScope" specifies a Unicode range handled by the Cutter as an int array with two elements: the start code point and the end code point.
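The two ways of describing a range can be illustrated with the standard library. This is only a sketch: UnicodeScopeDemo, inScope, and inBlock are hypothetical names rather than jbox classes, and treating both bounds of a UnicodeScope as inclusive is an assumption.

```java
public class UnicodeScopeDemo {
    // "UnicodeScope" style: a range given by hexadecimal start and end values.
    // Assumption: both bounds are inclusive.
    static boolean inScope(int codePoint, String start, String end) {
        return codePoint >= Integer.decode(start) && codePoint <= Integer.decode(end);
    }

    // "UnicodeBlock" style: the range is looked up by its block name.
    static boolean inBlock(int codePoint, String blockName) {
        Character.UnicodeBlock block = Character.UnicodeBlock.forName(blockName);
        return block.equals(Character.UnicodeBlock.of(codePoint));
    }

    public static void main(String[] args) {
        // The digit '7' (0x0037) lies inside 0x0030..0x0039.
        System.out.println(inScope('7', "0x0030", "0x0039"));
        // U+4E2D is a CJK unified ideograph.
        System.out.println(inBlock(0x4E2D, "CJK_UNIFIED_IDEOGRAPHS"));
        // 'A' belongs to Basic Latin, not Katakana.
        System.out.println(inBlock('A', "KATAKANA"));
    }
}
```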

Returns:
CutterBox object specified by the <cutterBox> tag in the configuration file.
Throws:
ConfigurationException - if the CutterBox cannot be created.

buildIndexWriter

public IndexWriter buildIndexWriter()
Create a new IndexWriter object using the properties and mappings in this configuration.


An IndexWriter configuration may look like this:


<indexWriter class = "org.jbox.index.IndexWriterWithTFLOC">
  <property name = "PageHome">org.jbox.dao.PageHomeByProcedure</property>
  <property name = "WordHome">org.jbox.dao.WordHomeByProcedure</property>
</indexWriter>

Property "PageHome" specifies the PageHome DAO class used for Page objects.

Property "WordHome" specifies the WordHome DAO class used for Word objects.
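To show how name/value properties of this shape can be read, the sketch below parses the fragment with the JDK's DOM parser. It does not describe how Configuration itself loads the file; PropertyReaderDemo and readProperties are hypothetical names used only for illustration.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class PropertyReaderDemo {
    // Collect every <property name="..."> element into a name -> value map.
    static Map<String, String> readProperties(String xml) {
        try {
            Element root = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
            Map<String, String> props = new LinkedHashMap<>();
            NodeList nodes = root.getElementsByTagName("property");
            for (int i = 0; i < nodes.getLength(); i++) {
                Element p = (Element) nodes.item(i);
                // Trim so that stray whitespace around the class name is ignored.
                props.put(p.getAttribute("name"), p.getTextContent().trim());
            }
            return props;
        } catch (Exception e) {
            throw new IllegalStateException("Cannot parse configuration fragment", e);
        }
    }

    public static void main(String[] args) {
        String xml =
            "<indexWriter class = \"org.jbox.index.IndexWriterWithTFLOC\">"
          + "<property name = \"PageHome\">org.jbox.dao.PageHomeByProcedure</property>"
          + "<property name = \"WordHome\">org.jbox.dao.WordHomeByProcedure</property>"
          + "</indexWriter>";
        System.out.println(readProperties(xml));
    }
}
```

Property names are case-sensitive in this sketch, so "WordHome" and "wordHome" would be treated as different keys.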

Returns:
Concrete IndexWriter object specified by the <indexWriter> tag in the configuration file.
Throws:
ConfigurationException - if the IndexWriter cannot be created.

buildSearcher

public Searcher buildSearcher()
Create a new Searcher object using the properties and mappings in this configuration.


A Searcher configuration may look like this:


<searcher class = "org.jbox.searcher.simpleSearcher.SimpleSearcher">
   <property name = "PageHome">org.jbox.dao.PageHomeByProcedure</property>
   <property name = "WordHome">org.jbox.dao.WordHomeByProcedure</property>
</searcher>

Property "PageHome" specifies the PageHome DAO class used for Page objects.

Property "WordHome" specifies the WordHome DAO class used for Word objects.

Returns:
Concrete Searcher object specified by the <searcher> tag in the configuration file.
Throws:
ConfigurationException - if the Searcher cannot be created.