java.lang.Object
  org.jbox.configuration.Configuration
public class Configuration
The Configuration class is provided to specify the properties and mapping documents used when creating a WebSpider, CutterBox, IndexWriter, or Searcher.
Note: the build methods in this class do not follow the singleton pattern; each call creates a new object.
A new Configuration will use the properties specified in jbox.cfg.xml by default.
See Also: WebSpider, Cutter, CutterBox, IndexWriter, Searcher
Constructor Summary | |
---|---|
Configuration() | Uses the default path, jbox.cfg.xml, to load the configuration file. |
Configuration(java.lang.String path) | Loads the configuration file from the specified path. |
Method Summary | |
---|---|
CutterBox | buildCutterBox(): creates a new CutterBox object using the properties and mappings in this configuration. |
IndexWriter | buildIndexWriter(): creates a new IndexWriter object using the properties and mappings in this configuration. |
Searcher | buildSearcher(): creates a new Searcher object using the properties and mappings in this configuration. |
WebSpider | buildWebSpider(): creates a new WebSpider object using the properties and mappings in this configuration. |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Constructor Detail |
---|
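public Configuration()
Uses the default path, jbox.cfg.xml, to load the configuration file.
public Configuration(java.lang.String path)
Loads the configuration file from the specified path.
Parameters:
path - path of the configuration file.
Method Detail |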
---|
public WebSpider buildWebSpider()
Creates a new WebSpider object using the properties and mappings in this configuration.
The WebSpider configuration may look like this:
<spider class="org.jbox.spider.htmlSpider.SimpleSpider">
  <maxPageNum> 1 </maxPageNum>
  <startUrls>
    <property name="URL">http://localhost</property>
  </startUrls>
  <crawlRules>
    <property name="Rule">http://.*</property>
  </crawlRules>
</spider>
Attribute "class" specifies which concrete WebSpider is created.
Element <maxPageNum> gives the number of pages to visit.
Element <startUrls> lists the URLs at which a WebSpider starts.
Element <crawlRules> lists the rules a WebSpider applies while crawling. A rule is written as a regular expression (regexp). For example:
"http://.*(\.html)$"
means the WebSpider ignores every URL unless it ends with ".html".
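The effect of a crawl rule can be checked with plain java.util.regex. The sketch below is only an illustration: CrawlRuleDemo and accepts are hypothetical names, not part of jbox, and it assumes a rule must match the whole URL, which this page does not confirm.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of jbox: shows how a crawl rule behaves as a regular expression.
class CrawlRuleDemo {

    // Returns true when the whole URL matches the rule.
    // Assumption: full-string matching; the page does not say how a WebSpider applies a rule.
    static boolean accepts(String rule, String url) {
        return Pattern.compile(rule).matcher(url).matches();
    }

    public static void main(String[] args) {
        String rule = "http://.*(\\.html)$"; // the example rule, escaped for a Java string literal
        String[] urls = {
            "http://localhost/index.html",
            "http://localhost/index.php",
            "http://localhost/"
        };
        for (String url : urls) {
            System.out.println(url + " -> " + (accepts(rule, url) ? "visit" : "ignore"));
        }
    }
}
```

Under that assumption only the first URL is visited. The rule "http://.*" from the sample configuration would accept all three.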
Throws:
ConfigurationException - if the WebSpider cannot be created.
public CutterBox buildCutterBox()
Creates a new CutterBox object using the properties and mappings in this configuration.
The CutterBox configuration may look like this:
<cutterBox>
  <cutter language="EN" class="org.jbox.textCutter.EN.SimpleENCutter">
    <property name="UnicodeScope" start="0x0030" end="0x0039"/>
    <property name="UnicodeScope" start="0x0041" end="0x005a"/>
    <property name="UnicodeScope" start="0x0061" end="0x007a"/>
  </cutter>
  <cutter language="CJK" class="org.jbox.textCutter.CJK.SimpleCJKCutter">
    <property name="UnicodeBlock">CJK_UNIFIED_IDEOGRAPHS</property>
    <property name="UnicodeBlock">KATAKANA</property>
    <property name="UnicodeBlock">HANUNOO</property>
  </cutter>
</cutterBox>
Element <cutter> specifies a concrete Cutter to put in the CutterBox.
Property "UnicodeBlock" specifies a Unicode range for the Cutter by the name of a java.lang.Character.UnicodeBlock.
Property "UnicodeScope" specifies a Unicode range for the Cutter as an int array with two elements: the start code point and the end code point.
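The two ways of describing a Unicode range can be tried out with the standard library alone. This is a minimal sketch, not the Cutter implementation: UnicodeRangeDemo, inBlock, and inScope are hypothetical names, and treating the "end" value as inclusive is an assumption.

```java
// Hypothetical helper, not part of jbox: illustrates the two range styles used in <cutter>.
class UnicodeRangeDemo {

    // "UnicodeBlock" style: look the block up by name and compare it with the code point's block.
    static boolean inBlock(String blockName, int codePoint) {
        Character.UnicodeBlock block = Character.UnicodeBlock.forName(blockName);
        return block.equals(Character.UnicodeBlock.of(codePoint));
    }

    // "UnicodeScope" style: decode the hexadecimal start and end values and test the range.
    // Assumption: the end value is inclusive.
    static boolean inScope(String start, String end, int codePoint) {
        int lo = Integer.decode(start);
        int hi = Integer.decode(end);
        return codePoint >= lo && codePoint <= hi;
    }

    public static void main(String[] args) {
        // U+4E2D is a CJK unified ideograph.
        System.out.println(inBlock("CJK_UNIFIED_IDEOGRAPHS", 0x4E2D));
        // 'A' is U+0041, inside the upper-case scope 0x0041 to 0x005a.
        System.out.println(inScope("0x0041", "0x005a", 'A'));
        // 'A' is outside the digit scope 0x0030 to 0x0039.
        System.out.println(inScope("0x0030", "0x0039", 'A'));
    }
}
```

The three scopes in the sample EN cutter correspond to the ASCII digits, upper-case letters, and lower-case letters.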
Returns:
the CutterBox object represented by the <cutterBox> tag in the configuration file.
Throws:
ConfigurationException - if the CutterBox cannot be created.
public IndexWriter buildIndexWriter()
Creates a new IndexWriter object using the properties and mappings in this configuration.
The IndexWriter configuration may look like this:
<indexWriter class="org.jbox.index.IndexWriterWithTFLOC">
  <property name="PageHome">org.jbox.dao.PageHomeByProcedure</property>
  <property name="WordHome">org.jbox.dao.WordHomeByProcedure</property>
</indexWriter>
Property "PageHome" specifies the PageHome DAO for Page.
Property "WordHome" specifies the WordHome DAO for Word.
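To show how the name/value properties are laid out, the sketch below reads them from such an element with the JDK's DOM parser. It is not how Configuration itself loads the file: PropertyReaderDemo and readProperties are hypothetical names, and trimming the property text is an assumption.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical reader, not part of jbox: collects <property name="..."> values from one element.
class PropertyReaderDemo {

    // Parses the XML text and maps each property name to its trimmed text content.
    static Map<String, String> readProperties(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Map<String, String> properties = new LinkedHashMap<>();
            NodeList nodes = doc.getDocumentElement().getElementsByTagName("property");
            for (int i = 0; i < nodes.getLength(); i++) {
                Element property = (Element) nodes.item(i);
                properties.put(property.getAttribute("name"), property.getTextContent().trim());
            }
            return properties;
        } catch (Exception e) {
            // Wrap parser errors so callers need no checked-exception handling.
            throw new IllegalArgumentException("Invalid configuration XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<indexWriter class=\"org.jbox.index.IndexWriterWithTFLOC\">"
                + "<property name=\"PageHome\">org.jbox.dao.PageHomeByProcedure</property>"
                + "<property name=\"WordHome\">org.jbox.dao.WordHomeByProcedure</property>"
                + "</indexWriter>";
        System.out.println(readProperties(xml));
    }
}
```

The values are fully qualified class names, so a loader would presumably resolve them to DAO classes; that step is left out because the page does not describe it.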
Returns:
the IndexWriter object represented by the <indexWriter> tag in the configuration file.
Throws:
ConfigurationException - if the IndexWriter cannot be created.
public Searcher buildSearcher()
Creates a new Searcher object using the properties and mappings in this configuration.
The Searcher configuration may look like this:
<searcher class="org.jbox.searcher.simpleSearcher.SimpleSearcher">
  <property name="PageHome">org.jbox.dao.PageHomeByProcedure</property>
  <property name="WordHome">org.jbox.dao.WordHomeByProcedure</property>
</searcher>
Property "PageHome" specifies the PageHome DAO for Page.
Property "WordHome" specifies the WordHome DAO for Word.
Returns:
the Searcher object represented by the <searcher> tag in the configuration file.
Throws:
ConfigurationException - if the Searcher cannot be created.