org.jbox.textCutter.util
Class NoiseFilter

java.lang.Object
  extended by org.jbox.textCutter.util.NoiseFilter

public class NoiseFilter
extends java.lang.Object

A filter that removes noise words.

It filters noise words for CutterBox. All noise words must be defined in a file in the directory "DICT/NOISE/". For example, to filter the word "fool", add it to an existing file in "DICT/NOISE/" or to a new file there, such as "myNoise.txt". The word "fool" will then be ignored when text is cut. CutterBox invokes this filter when CutterBox.cutPage(Page) is called.

Version:
1.0
See Also:
Dict, CutterBox

Field Summary
protected  Dict noise
           
 
Constructor Summary
NoiseFilter()
          Constructs a new NoiseFilter with the default path "DICT/NOISE".
NoiseFilter(java.lang.String path)
          Constructs a new NoiseFilter with the specified path.
 
Method Summary
 void filterNoise(java.util.Collection<Word> words)
          Filters noise words: removes the words defined in the noise dictionary from the specified collection.
 void filterRedundancy(java.util.Collection<Word> unFilteredWords)
          Filters redundant words.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

noise

protected Dict noise
Constructor Detail

NoiseFilter

public NoiseFilter()
Constructs a new NoiseFilter with the default path "DICT/NOISE".


NoiseFilter

public NoiseFilter(java.lang.String path)
Constructs a new NoiseFilter with the specified path.

Parameters:
path - the path of the noise dictionary file or directory.
Method Detail

filterNoise

public void filterNoise(java.util.Collection<Word> words)
Filters noise words: removes the words defined in the noise dictionary from the specified collection.

Parameters:
words - the collection of Word objects to be filtered.
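
The in-place removal described for filterNoise can be sketched as follows. This is only an illustration, not the library's implementation: NoiseWord and NoiseFilterSketch are hypothetical stand-ins for Word and NoiseFilter, the noise dictionary is modelled as a plain set of strings rather than a Dict, and exact, case-sensitive matching is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for the library's Word class; only the text matters here.
class NoiseWord {
    final String text;

    NoiseWord(String text) {
        this.text = text;
    }
}

// Hypothetical stand-in for NoiseFilter, backed by a set instead of a Dict.
class NoiseFilterSketch {
    private final Set<String> noise;

    NoiseFilterSketch(Collection<String> noiseWords) {
        this.noise = new HashSet<>(noiseWords);
    }

    // Removes, in place, every word whose text appears in the noise set.
    // Assumption: matching is exact and case-sensitive.
    void filterNoise(Collection<NoiseWord> words) {
        words.removeIf(w -> noise.contains(w.text));
    }

    public static void main(String[] args) {
        List<NoiseWord> words = new ArrayList<>(Arrays.asList(
                new NoiseWord("a"), new NoiseWord("fool"), new NoiseWord("speaks")));
        new NoiseFilterSketch(Arrays.asList("fool")).filterNoise(words);
        // Prints the words that survived filtering, one per line.
        for (NoiseWord w : words) {
            System.out.println(w.text);
        }
    }
}
```

Because the method mutates its argument, the caller must pass a modifiable collection; an unmodifiable one would cause removeIf to throw UnsupportedOperationException when a noise word is found.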

filterRedundancy

public void filterRedundancy(java.util.Collection<Word> unFilteredWords)
Filters redundant words. If the specified collection contains two Word objects with the same string, one of them is removed from the collection and its locations are added to the other. For example, suppose there are two words with the same string "fun", where the first has the locations {1,2} and the second has {5}. After this method is invoked, the second word has been removed and the locations of the first word are {1,2,5}.

Parameters:
unFilteredWords - the collection of Word objects from which redundant words are to be removed.
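
The merge described for filterRedundancy can be sketched as follows, using the "fun" example from the description. This is a minimal illustration under stated assumptions, not the library's code: LocatedWord and RedundancySketch are hypothetical stand-ins for Word and NoiseFilter, locations are modelled as a list of integers, and the first occurrence is assumed to be the one that is kept.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the library's Word class: a string plus its locations.
class LocatedWord {
    final String text;
    final List<Integer> locations;

    LocatedWord(String text, Integer... locations) {
        this.text = text;
        this.locations = new ArrayList<>(Arrays.asList(locations));
    }
}

class RedundancySketch {
    // Keeps the first word seen for each string and folds the locations
    // of later duplicates into it, removing the duplicates in place.
    static void filterRedundancy(Collection<LocatedWord> words) {
        Map<String, LocatedWord> firstSeen = new HashMap<>();
        Iterator<LocatedWord> it = words.iterator();
        while (it.hasNext()) {
            LocatedWord current = it.next();
            // putIfAbsent returns null when the string has not been seen before.
            LocatedWord kept = firstSeen.putIfAbsent(current.text, current);
            if (kept != null) {
                kept.locations.addAll(current.locations);
                it.remove();
            }
        }
    }

    public static void main(String[] args) {
        List<LocatedWord> words = new ArrayList<>();
        words.add(new LocatedWord("fun", 1, 2));
        words.add(new LocatedWord("fun", 5));
        filterRedundancy(words);
        // One word remains, carrying the locations of both originals.
        System.out.println(words.get(0).text + " " + words.get(0).locations);
    }
}
```

Removing through the iterator avoids a ConcurrentModificationException while the collection is being traversed, and the map lookup keeps the pass linear in the number of words.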