java.lang.Object
  extended by org.jbox.configuration.Configuration
public class Configuration
The Configuration class specifies the properties and mapping documents used to create WebSpider, CutterBox, IndexWriter, and Searcher objects.
Note: the build methods in this class do not follow the Singleton pattern; every call creates a new object.
By default, a new Configuration uses the properties specified in jbox.cfg.xml.
See Also:
WebSpider, Cutter, CutterBox, IndexWriter, Searcher

| Constructor Summary | |
|---|---|
| Configuration() | Load the configuration file from the default path, jbox.cfg.xml. |
| Configuration(java.lang.String path) | Load the configuration file from the specified path. |
| Method Summary | |
|---|---|
| CutterBox | buildCutterBox(): Create a new CutterBox object using the properties and mappings in this configuration. |
| IndexWriter | buildIndexWriter(): Create a new IndexWriter object using the properties and mappings in this configuration. |
| Searcher | buildSearcher(): Create a new Searcher object using the properties and mappings in this configuration. |
| WebSpider | buildWebSpider(): Create a new WebSpider object using the properties and mappings in this configuration. |
| Methods inherited from class java.lang.Object |
|---|
| clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Constructor Detail |
|---|
public Configuration()
public Configuration(java.lang.String path)
Parameters:
path - path of the configuration file.

| Method Detail |
|---|
public WebSpider buildWebSpider()
Create a new WebSpider object using the properties and mappings in this configuration.
A WebSpider configuration may look like this:
<spider class="org.jbox.spider.htmlSpider.SimpleSpider">
  <maxPageNum>1</maxPageNum>
  <startUrls>
    <property name="URL">http://localhost</property>
  </startUrls>
  <crawlRules>
    <property name="Rule">http://.*</property>
  </crawlRules>
</spider>
The "class" attribute specifies which concrete WebSpider implementation to create.
The <maxPageNum> element specifies the maximum number of pages to visit.
The <startUrls> element lists the URLs from which the WebSpider starts crawling.
The <crawlRules> element lists the rules the WebSpider applies when visiting pages. Each rule is a regular expression. For example,
"http://.*(\.html)$"
means the WebSpider ignores every URL that does not end with ".html".
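The following sketch shows how the example rule behaves, using java.util.regex from the JDK. It assumes a URL is visited only when the whole URL matches the rule (Matcher.matches()); the SimpleSpider source is not shown here, so that matching mode and the names CrawlRuleDemo and isAllowed are illustrative assumptions, not part of jbox.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class CrawlRuleDemo {
    // The rule from the example above; the backslash is doubled for a Java string literal.
    static final Pattern HTML_ONLY = Pattern.compile("http://.*(\\.html)$");

    // Assumption: a URL is visited only if the entire URL matches the rule.
    static boolean isAllowed(String url) {
        return HTML_ONLY.matcher(url).matches();
    }

    public static void main(String[] args) {
        List<String> urls = Arrays.asList(
                "http://localhost/index.html",
                "http://localhost/docs/guide.html",
                "http://localhost/logo.png",
                "https://localhost/index.html"); // also rejected: the rule requires "http://"
        for (String url : urls) {
            System.out.println(url + " -> " + (isAllowed(url) ? "visit" : "ignore"));
        }
    }
}
```

Note that the rule also pins the scheme: an "https://" URL is ignored even when it ends with ".html".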
Throws:
ConfigurationException - if the WebSpider cannot be created.

public CutterBox buildCutterBox()
Create a new CutterBox object using the properties and mappings in this configuration.
A CutterBox configuration may look like this:
<cutterBox>
  <cutter language="EN" class="org.jbox.textCutter.EN.SimpleENCutter">
    <property name="UnicodeScope" start="0x0030" end="0x0039"/>
    <property name="UnicodeScope" start="0x0041" end="0x005a"/>
    <property name="UnicodeScope" start="0x0061" end="0x007a"/>
  </cutter>
  <cutter language="CJK" class="org.jbox.textCutter.CJK.SimpleCJKCutter">
    <property name="UnicodeBlock">CJK_UNIFIED_IDEOGRAPHS</property>
    <property name="UnicodeBlock">KATAKANA</property>
    <property name="UnicodeBlock">HANUNOO</property>
  </cutter>
</cutterBox>
Each <cutter> element specifies a concrete Cutter to add to the CutterBox.
The "UnicodeBlock" property specifies a Unicode range handled by the Cutter as a java.lang.Character.UnicodeBlock name.
The "UnicodeScope" property specifies a Unicode range handled by the Cutter as a two-element int array holding the start and end code points, given by the start and end attributes.
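To make the two property forms concrete, this sketch classifies single characters using only the JDK. It assumes that UnicodeScope bounds are inclusive hexadecimal code points (parsed with Integer.decode) and that UnicodeBlock values are Character.UnicodeBlock constant names (resolved with Character.UnicodeBlock.forName). How SimpleENCutter and SimpleCJKCutter actually apply these ranges is not documented here; UnicodeRangeDemo, inScope, and inBlock are hypothetical names.

```java
import java.lang.Character.UnicodeBlock;

public class UnicodeRangeDemo {
    // "UnicodeScope": inclusive range given as hex strings such as "0x0030" and "0x0039".
    static boolean inScope(int codePoint, String start, String end) {
        return codePoint >= Integer.decode(start) && codePoint <= Integer.decode(end);
    }

    // "UnicodeBlock": a Character.UnicodeBlock constant name such as "KATAKANA".
    static boolean inBlock(int codePoint, String blockName) {
        return UnicodeBlock.of(codePoint) == UnicodeBlock.forName(blockName);
    }

    public static void main(String[] args) {
        System.out.println(inScope('7', "0x0030", "0x0039"));                // true: ASCII digit
        System.out.println(inScope('Q', "0x0041", "0x005a"));                // true: upper-case Latin
        System.out.println(inBlock('\u4E2D', "CJK_UNIFIED_IDEOGRAPHS"));     // true: a CJK ideograph
        System.out.println(inBlock('\u30AB', "KATAKANA"));                   // true: katakana KA
        System.out.println(inBlock('a', "KATAKANA"));                        // false: Basic Latin
    }
}
```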
Returns:
the CutterBox object represented by the <cutterBox> element in the configuration file.
Throws:
ConfigurationException - if the CutterBox cannot be created.

public IndexWriter buildIndexWriter()
Create a new IndexWriter object using the properties and mappings in this configuration.
An IndexWriter configuration may look like this:
<indexWriter class="org.jbox.index.IndexWriterWithTFLOC">
  <property name="PageHome">org.jbox.dao.PageHomeByProcedure</property>
  <property name="WordHome">org.jbox.dao.WordHomeByProcedure</property>
</indexWriter>
The "PageHome" property specifies the PageHome DAO class used for Page objects.
The "WordHome" property specifies the WordHome DAO class used for Word objects.
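A build method needs to read the class attribute and the <property> children of its element. The sketch below does this for the <indexWriter> fragment above with the JDK's DOM parser. How Configuration actually parses jbox.cfg.xml is not documented here, so PropertyReaderDemo, parse, and readProperties are hypothetical, and the fragment is parsed on its own rather than inside a full configuration file.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class PropertyReaderDemo {
    static final String XML =
            "<indexWriter class=\"org.jbox.index.IndexWriterWithTFLOC\">"
            + "<property name=\"PageHome\">org.jbox.dao.PageHomeByProcedure</property>"
            + "<property name=\"WordHome\">org.jbox.dao.WordHomeByProcedure</property>"
            + "</indexWriter>";

    // Parses an XML fragment and returns its root element.
    static Element parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Malformed configuration fragment", e);
        }
    }

    // Collects every <property name="...">value</property> under parent into a map.
    static Map<String, String> readProperties(Element parent) {
        Map<String, String> props = new LinkedHashMap<>();
        NodeList nodes = parent.getElementsByTagName("property");
        for (int i = 0; i < nodes.getLength(); i++) {
            Element p = (Element) nodes.item(i);
            props.put(p.getAttribute("name"), p.getTextContent().trim());
        }
        return props;
    }

    public static void main(String[] args) {
        Element root = parse(XML);
        System.out.println("class = " + root.getAttribute("class"));
        readProperties(root).forEach((name, value) -> System.out.println(name + " = " + value));
    }
}
```

A map keyed by name suits PageHome and WordHome, which appear once each; repeated names such as "UnicodeScope" would need a list instead.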
Returns:
the IndexWriter object represented by the <indexWriter> element in the configuration file.
Throws:
ConfigurationException - if the IndexWriter cannot be created.

public Searcher buildSearcher()
Create a new Searcher object using the properties and mappings in this configuration.
A Searcher configuration may look like this:
<searcher class="org.jbox.searcher.simpleSearcher.SimpleSearcher">
  <property name="PageHome">org.jbox.dao.PageHomeByProcedure</property>
  <property name="WordHome">org.jbox.dao.WordHomeByProcedure</property>
</searcher>
The "PageHome" property specifies the PageHome DAO class used for Page objects.
The "WordHome" property specifies the WordHome DAO class used for Word objects.
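The "class" attribute and the PageHome and WordHome properties all name Java classes. One plausible way to turn such a name into an object is reflection, sketched below. This is an assumption about the technique, not a description of how Configuration is implemented; ReflectiveFactoryDemo and instantiate are hypothetical, and a real build method would presumably report failures as ConfigurationException rather than IllegalArgumentException. JDK classes stand in for the org.jbox types in the usage, since those are not on the classpath here.

```java
import java.util.List;

public class ReflectiveFactoryDemo {
    // Hypothetical helper: load the named class, check that it is the expected type,
    // and create an instance through its public no-argument constructor.
    static <T> T instantiate(String className, Class<T> expectedType) {
        try {
            Class<?> cls = Class.forName(className);
            if (!expectedType.isAssignableFrom(cls)) {
                throw new IllegalArgumentException(
                        className + " is not a " + expectedType.getName());
            }
            return expectedType.cast(cls.getConstructor().newInstance());
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("Cannot instantiate " + className, e);
        }
    }

    public static void main(String[] args) {
        // java.util.ArrayList plays the role of a configured implementation class.
        List<?> list = instantiate("java.util.ArrayList", List.class);
        System.out.println(list.getClass().getName());
    }
}
```

Checking the type before constructing the object rejects a misconfigured class name early, for example a DAO class given where a Searcher implementation is expected.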
Returns:
the Searcher object represented by the <searcher> element in the configuration file.
Throws:
ConfigurationException - if the Searcher cannot be created.