org.jbox.textCutter
Class AbstractCutter

java.lang.Object
  extended by org.jbox.textCutter.AbstractCutter
All Implemented Interfaces:
Cutter
Direct Known Subclasses:
SimpleCJKCutter, SimpleENCutter

public abstract class AbstractCutter
extends java.lang.Object
implements Cutter

An abstract class that defines the default behavior of a Cutter.

Version:
1.0
Author:
YiBin.H
See Also:
CutterBox, LanguageFilter

Field Summary
protected  LanguageFilter langFilter
           
 
Constructor Summary
AbstractCutter()
           
 
Method Summary
protected abstract  java.util.Collection<java.lang.String> cutSentenceToWord(java.lang.String checkedString)
          Cut text into words.
 java.util.Collection<java.lang.String> cutSentenceToWord(java.lang.StringBuffer unCheckedString)
          Cut text in a StringBuffer into words.
 void setUnicode(java.lang.Character.UnicodeBlock[] unicodeBlocks, int[][] unicodeScopes)
          Specifies the Unicode scope that the Cutter uses to filter text.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

langFilter

protected LanguageFilter langFilter

Constructor Detail

AbstractCutter

public AbstractCutter()

Method Detail

cutSentenceToWord

protected abstract java.util.Collection<java.lang.String> cutSentenceToWord(java.lang.String checkedString)
Cut text into words.

Parameters:
checkedString - text whose characters all belong to the Unicode scope of the Cutter.
Returns:
the words of the text.
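
Example (illustrative sketch, not taken from the library source): a subclass supplies the word-splitting rule by implementing this method. To keep the sketch compilable on its own, CutterStandIn is a hypothetical stand-in that copies only the abstract method signature documented above, and WhitespaceCutterSketch is a hypothetical subclass. Splitting on whitespace is an assumption chosen for brevity.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

// Hypothetical stand-in for AbstractCutter so the sketch compiles alone;
// only the abstract method signature is taken from this page.
abstract class CutterStandIn {
    protected abstract Collection<String> cutSentenceToWord(String checkedString);
}

// Hypothetical subclass: treats runs of whitespace as word boundaries.
class WhitespaceCutterSketch extends CutterStandIn {
    @Override
    protected Collection<String> cutSentenceToWord(String checkedString) {
        List<String> words = new ArrayList<>();
        for (String token : checkedString.trim().split("\\s+")) {
            // An empty or all-whitespace input yields one empty token; skip it.
            if (!token.isEmpty()) {
                words.add(token);
            }
        }
        return words;
    }
}
```

With the real library, the class would extend AbstractCutter instead of the stand-in.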

setUnicode

public void setUnicode(java.lang.Character.UnicodeBlock[] unicodeBlocks,
                       int[][] unicodeScopes)
Specifies the Unicode scope that the Cutter uses to filter text.

Specified by:
setUnicode in interface Cutter
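
Example (illustrative sketch): the two arguments can be built as shown below. This page does not describe the layout of unicodeScopes, so the sketch assumes each inner array is an inclusive {first, last} code point range. UnicodeScopeSketch and its inScope helper are hypothetical and only show how such a scope might be tested; they are not part of the library.

```java
import java.lang.Character.UnicodeBlock;

class UnicodeScopeSketch {
    // Blocks a CJK-oriented cutter might accept (illustrative choice).
    static final UnicodeBlock[] BLOCKS = {
        UnicodeBlock.CJK_UNIFIED_IDEOGRAPHS,
        UnicodeBlock.CJK_SYMBOLS_AND_PUNCTUATION
    };

    // Assumption: each inner array is an inclusive {first, last} code point range.
    static final int[][] SCOPES = {
        {0x0030, 0x0039}   // ASCII digits 0-9
    };

    // Hypothetical check: a code point is in scope if it falls in one of the
    // listed blocks or inside one of the explicit ranges.
    static boolean inScope(int codePoint, UnicodeBlock[] blocks, int[][] scopes) {
        UnicodeBlock block = UnicodeBlock.of(codePoint);
        for (UnicodeBlock b : blocks) {
            if (b.equals(block)) {
                return true;
            }
        }
        for (int[] range : scopes) {
            if (codePoint >= range[0] && codePoint <= range[1]) {
                return true;
            }
        }
        return false;
    }
}
```

With the real library, the same arrays would be passed as cutter.setUnicode(BLOCKS, SCOPES).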

cutSentenceToWord

public java.util.Collection<java.lang.String> cutSentenceToWord(java.lang.StringBuffer unCheckedString)
Description copied from interface: Cutter
Cut text in a StringBuffer into words.

Specified by:
cutSentenceToWord in interface Cutter
Parameters:
unCheckedString - text that may contain characters from several languages.
Returns:
a collection of strings containing the words of the text that belong to the Unicode scope of the Cutter.
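
Example (illustrative sketch): one plausible reading of the two overloads is that the public method filters the unchecked text and then delegates to the protected abstract method. The page does not say how out-of-scope characters are handled; replacing them with a space is an assumption. FilteringCutterSketch, its accepts hook, and AsciiLetterCutterSketch are hypothetical names; the real class keeps its filter in the langFilter field.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

abstract class FilteringCutterSketch {
    // Hypothetical filter hook standing in for the LanguageFilter.
    protected abstract boolean accepts(int codePoint);

    // Same signature as the abstract method documented on this page.
    protected abstract Collection<String> cutSentenceToWord(String checkedString);

    // Filters the raw text, then hands the checked text to the subclass.
    public Collection<String> cutSentenceToWord(StringBuffer unCheckedString) {
        StringBuilder checked = new StringBuilder();
        unCheckedString.codePoints().forEach(cp -> {
            // Assumption: out-of-scope characters act as word separators.
            if (accepts(cp)) {
                checked.appendCodePoint(cp);
            } else {
                checked.append(' ');
            }
        });
        return cutSentenceToWord(checked.toString());
    }
}

// Hypothetical subclass that keeps ASCII letters and splits on whitespace.
class AsciiLetterCutterSketch extends FilteringCutterSketch {
    @Override
    protected boolean accepts(int codePoint) {
        return (codePoint >= 'a' && codePoint <= 'z')
            || (codePoint >= 'A' && codePoint <= 'Z');
    }

    @Override
    protected Collection<String> cutSentenceToWord(String checkedString) {
        List<String> words = new ArrayList<>();
        for (String token : checkedString.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                words.add(token);
            }
        }
        return words;
    }
}
```

Calling new AsciiLetterCutterSketch().cutSentenceToWord(new StringBuffer("hello, 123 world")) drops the punctuation and digits and returns the two words hello and world.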