- All Implemented Interfaces:
Closeable, AutoCloseable
Spell checker class (initially inspired by the David Spencer code).
Example Usage:
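// spellIndexDirectory, my_lucene_reader, a_field and indexWriterConfig are placeholders supplied by the caller.
SpellChecker spellchecker = new SpellChecker(spellIndexDirectory);
// To index a field of a user index:
spellchecker.indexDictionary(new LuceneDictionary(my_lucene_reader, a_field), indexWriterConfig, false);
// To index a file containing words (java.nio.file.Paths):
spellchecker.indexDictionary(new PlainTextDictionary(Paths.get("myfile.txt")), indexWriterConfig, false);
// Ask for several suggestions; a single suggestion is not guaranteed to be the best match:
String[] suggestions = spellchecker.suggestSimilar("misspelt", 5);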
-
Field Summary
Fields (Modifier and Type, Field, Description)
private float accuracy
private float bEnd
private float bStart
    Boost value for start and end grams
private boolean closed
private Comparator<SuggestWord> comparator
static final float DEFAULT_ACCURACY
    The default minimum score to use, if not specified by calling setAccuracy(float).
static final String F_WORD
    Field name for each word in the ngram index.
private final Object modifyCurrentIndexLock
private StringDistance sd
private IndexSearcher searcher
private final Object searcherLock
(package private) Directory spellIndex
    the spell index
-
Constructor Summary
Constructors (Constructor, Description)
SpellChecker(Directory spellIndex)
    Use the given directory as a spell checker index with a LevenshteinDistance as the default StringDistance.
SpellChecker(Directory spellIndex, StringDistance sd)
    Use the given directory as a spell checker index.
SpellChecker(Directory spellIndex, StringDistance sd, Comparator<SuggestWord> comparator)
    Use the given directory as a spell checker index with the given StringDistance measure and the given Comparator for sorting the results.
-
Method Summary
Methods (Modifier and Type, Method, Description)
private static void add(BooleanQuery.Builder q, String name, String value)
    Add a clause to a boolean query.
private static void add(BooleanQuery.Builder q, String name, String value, float boost)
    Add a clause to a boolean query.
private static void addGram(...)
void clearIndex()
    Removes all terms from the spell check index.
void close()
    Close the IndexSearcher used by this SpellChecker.
private static Document createDocument(String text, int ng1, int ng2)
(package private) IndexSearcher createSearcher(Directory dir)
    Creates a new read-only IndexSearcher.
private void ensureOpen()
boolean exist(String word)
    Check whether the word exists in the index.
private static String[] formGrams(String text, int ng)
    Form all ngrams for a given word.
float getAccuracy()
    The accuracy (minimum score) to be used, unless overridden in suggestSimilar(String, int, IndexReader, String, SuggestMode, float), to decide whether a suggestion is included or not.
Comparator<SuggestWord> getComparator()
    Gets the comparator in use for ranking suggestions.
private static int getMax(int l)
private static int getMin(int l)
StringDistance getStringDistance()
    Returns the StringDistance instance used by this SpellChecker instance.
final void indexDictionary(Dictionary dict, IndexWriterConfig config, boolean fullMerge)
    Indexes the data from the given Dictionary.
(package private) boolean isClosed()
private IndexSearcher obtainSearcher()
private void releaseSearcher(IndexSearcher aSearcher)
void setAccuracy(float acc)
    Sets the accuracy 0 < minScore < 1; default DEFAULT_ACCURACY.
void setComparator(Comparator<SuggestWord> comparator)
    Sets the Comparator for the SuggestWordQueue.
void setSpellIndex(Directory spellIndexDir)
    Use a different index as the spell checker index or re-open the existing index if spellIndex is the same value as given in the constructor.
void setStringDistance(StringDistance sd)
    Sets the StringDistance implementation for this SpellChecker instance.
String[] suggestSimilar(String word, int numSug)
    Suggest similar words.
String[] suggestSimilar(String word, int numSug, float accuracy)
    Suggest similar words.
String[] suggestSimilar(String word, int numSug, IndexReader ir, String field, SuggestMode suggestMode)
String[] suggestSimilar(String word, int numSug, IndexReader ir, String field, SuggestMode suggestMode, float accuracy)
    Suggest similar words (optionally restricted to a field of an index).
private void swapSearcher(Directory dir)
-
Field Details
-
DEFAULT_ACCURACY
public static final float DEFAULT_ACCURACY
The default minimum score to use, if not specified by calling setAccuracy(float).
-
F_WORD
Field name for each word in the ngram index.
-
spellIndex
Directory spellIndex
The spell index.
-
bStart
private float bStart
Boost value for start and end grams.
-
bEnd
private float bEnd
-
searcher
-
searcherLock
-
modifyCurrentIndexLock
-
closed
private volatile boolean closed
-
accuracy
private float accuracy
-
sd
-
comparator
-
-
Constructor Details
-
SpellChecker
Use the given directory as a spell checker index. The directory is created if it doesn't exist yet.
- Parameters:
spellIndex - the spell index directory
sd - the StringDistance measurement to use
- Throws:
IOException - if the SpellChecker cannot open the directory
-
SpellChecker
Use the given directory as a spell checker index with a LevenshteinDistance as the default StringDistance. The directory is created if it doesn't exist yet.
- Parameters:
spellIndex - the spell index directory
- Throws:
IOException - if the SpellChecker cannot open the directory
-
SpellChecker
public SpellChecker(Directory spellIndex, StringDistance sd, Comparator<SuggestWord> comparator) throws IOException
Use the given directory as a spell checker index with the given StringDistance measure and the given Comparator for sorting the results.
- Parameters:
spellIndex - the spell index directory
sd - the StringDistance measure to use
comparator - the Comparator used to sort the results
- Throws:
IOException - if there is a problem opening the index
-
-
Method Details
-
setSpellIndex
Use a different index as the spell checker index or re-open the existing index if spellIndex is the same value as given in the constructor.
- Parameters:
spellIndexDir - the spell directory to use
- Throws:
AlreadyClosedException - if the SpellChecker is already closed
IOException - if the SpellChecker cannot open the directory
-
setComparator
Sets the Comparator for the SuggestWordQueue.
- Parameters:
comparator - the comparator
-
getComparator
Gets the comparator in use for ranking suggestions.
-
setStringDistance
Sets the StringDistance implementation for this SpellChecker instance.
- Parameters:
sd - the StringDistance implementation for this SpellChecker instance
-
getStringDistance
Returns the StringDistance instance used by this SpellChecker instance.
- Returns:
the StringDistance instance used by this SpellChecker instance.
-
setAccuracy
public void setAccuracy(float acc)
Sets the accuracy (the minimum score), where 0 < minScore < 1; the default is DEFAULT_ACCURACY.
- Parameters:
acc - the new accuracy
-
getAccuracy
public float getAccuracy()
The accuracy (minimum score) to be used, unless overridden in suggestSimilar(String, int, IndexReader, String, SuggestMode, float), to decide whether a suggestion is included or not.
- Returns:
The current accuracy setting
-
suggestSimilar
Suggest similar words.
Lucene fetches candidate terms by n-gram similarity, which is not the same measure as the edit-distance strategy used to pick the best-matching word from those hits. One therefore usually has to request several suggestions in order to get the true best match.
I.e. if numSug == 1, don't count on that suggestion being the best one. Set numSug to at least 5 for a good suggestion.
- Parameters:
word - the word you want a spell check done on
numSug - the number of suggested words
- Returns:
String[]
- Throws:
IOException - if the underlying index throws an IOException
AlreadyClosedException - if the SpellChecker is already closed
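The reason for requesting several suggestions can be illustrated without Lucene. The following is a minimal stand-alone sketch, not the library's code: the class name RerankSketch, the hand-written candidate list, the 0.5 threshold, and the similarity formula 1 - distance / max(length) are all assumptions made for this example. Candidates arrive in some retrieval order and are then re-scored by edit distance, so the first retrieved candidate is not necessarily the closest word.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RerankSketch {
    /** Classic two-row dynamic-programming Levenshtein edit distance. */
    public static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Assumed normalization: 1.0 means identical, 0.0 means nothing in common. */
    public static float similarity(String a, String b) {
        int max = Math.max(a.length(), b.length());
        return max == 0 ? 1.0f : 1.0f - (float) editDistance(a, b) / max;
    }

    /** Drops candidates scoring below minScore, sorts best first, keeps at most numSug. */
    public static List<String> rerank(String word, List<String> candidates, int numSug, float minScore) {
        List<String> kept = new ArrayList<>();
        for (String c : candidates) {
            if (similarity(word, c) >= minScore) {
                kept.add(c);
            }
        }
        kept.sort((x, y) -> Float.compare(similarity(word, y), similarity(word, x)));
        return new ArrayList<>(kept.subList(0, Math.min(numSug, kept.size())));
    }

    public static void main(String[] args) {
        // Hypothetical retrieval order: "misspelled" comes first but is not the closest word.
        List<String> retrieved = Arrays.asList("misspelled", "misspell", "dispel");
        System.out.println(rerank("misspelt", retrieved, 2, 0.5f));
    }
}
```

In this sketch, looking only at the first retrieved candidate would yield "misspelled", while re-ranking a larger pool puts "misspell" first.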
-
suggestSimilar
Suggest similar words.
Lucene fetches candidate terms by n-gram similarity, which is not the same measure as the edit-distance strategy used to pick the best-matching word from those hits. One therefore usually has to request several suggestions in order to get the true best match.
I.e. if numSug == 1, don't count on that suggestion being the best one. Set numSug to at least 5 for a good suggestion.
- Parameters:
word - the word you want a spell check done on
numSug - the number of suggested words
accuracy - the minimum score a suggestion must have in order to qualify for inclusion in the results
- Returns:
String[]
- Throws:
IOException - if the underlying index throws an IOException
AlreadyClosedException - if the SpellChecker is already closed
-
suggestSimilar
public String[] suggestSimilar(String word, int numSug, IndexReader ir, String field, SuggestMode suggestMode) throws IOException
- Throws:
IOException
-
suggestSimilar
public String[] suggestSimilar(String word, int numSug, IndexReader ir, String field, SuggestMode suggestMode, float accuracy) throws IOException
Suggest similar words (optionally restricted to a field of an index).
Lucene fetches candidate terms by n-gram similarity, which is not the same measure as the edit-distance strategy used to pick the best-matching word from those hits. One therefore usually has to request several suggestions in order to get the true best match.
I.e. if numSug == 1, don't count on that suggestion being the best one. Set numSug to at least 5 for a good suggestion.
- Parameters:
word - the word you want a spell check done on
numSug - the number of suggested words
ir - the IndexReader of the user index (can be null; see the field parameter)
field - the field of the user index: if field is not null, the suggested words are restricted to the words present in this field.
suggestMode - (NOTE: if ir == null and/or field == null, then this is overridden with SuggestMode.SUGGEST_ALWAYS)
accuracy - the minimum score a suggestion must have in order to qualify for inclusion in the results
- Returns:
String[] - the suggested words, sorted by two criteria: first, the edit distance; second (only in restricted mode), the popularity of the suggested word in the field of the user index
- Throws:
IOException - if the underlying index throws an IOException
AlreadyClosedException - if the SpellChecker is already closed
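The two-criteria ordering described for the return value can be sketched as a plain comparator. This is only an illustration under stated assumptions: the Suggestion class, its fields, and the SuggestionOrderSketch name are hypothetical stand-ins and are not SuggestWord or the comparator the library ships. A higher score sorts first, and ties are broken by a higher frequency in the user field.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class SuggestionOrderSketch {
    /** Hypothetical holder: candidate word, similarity score, frequency in the user field. */
    public static final class Suggestion {
        final String word;
        final float score;
        final int freq;

        public Suggestion(String word, float score, int freq) {
            this.word = word;
            this.score = score;
            this.freq = freq;
        }
    }

    /** Best first: higher score wins; on equal scores, higher frequency wins. */
    public static final Comparator<Suggestion> BEST_FIRST = (a, b) -> {
        int byScore = Float.compare(b.score, a.score);
        return byScore != 0 ? byScore : Integer.compare(b.freq, a.freq);
    };

    /** Returns the words of the given suggestions in best-first order. */
    public static List<String> order(List<Suggestion> suggestions) {
        List<Suggestion> sorted = new ArrayList<>(suggestions);
        sorted.sort(BEST_FIRST);
        List<String> words = new ArrayList<>();
        for (Suggestion s : sorted) {
            words.add(s.word);
        }
        return words;
    }
}
```

When no user index or field is supplied there is no frequency to compare, so in this sketch every candidate would carry the same freq value and only the score would decide the order.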
-
add
Add a clause to a boolean query. -
add
Add a clause to a boolean query. -
formGrams
Form all ngrams for a given word.
- Parameters:
text - the word to parse
ng - the ngram length, e.g. 3
- Returns:
an array of all ngrams in the word; note that duplicates are not removed
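As a rough illustration of the n-gram formation described here, the sketch below slides a window of length ng over the word and keeps duplicates. The class name NGramSketch and the choice to return an empty array for words shorter than ng are assumptions made for this example; this is not the library's private implementation.

```java
import java.util.Arrays;

public class NGramSketch {
    /** Returns every substring of length ng, in order of position; duplicates are kept. */
    public static String[] formGrams(String text, int ng) {
        int count = text.length() - ng + 1;
        if (ng <= 0 || count <= 0) {
            return new String[0]; // assumption: too-short words yield no grams
        }
        String[] grams = new String[count];
        for (int i = 0; i < count; i++) {
            grams[i] = text.substring(i, i + ng);
        }
        return grams;
    }

    public static void main(String[] args) {
        // "ana" appears twice because duplicates are not removed.
        System.out.println(Arrays.toString(formGrams("banana", 3))); // prints [ban, ana, nan, ana]
    }
}
```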
-
clearIndex
Removes all terms from the spell check index.
- Throws:
IOException - if there is a low-level I/O error
AlreadyClosedException - if the SpellChecker is already closed
-
exist
Check whether the word exists in the index.
- Parameters:
word - word to check
- Returns:
true if the word exists in the index
- Throws:
IOException - if there is a low-level I/O error
AlreadyClosedException - if the SpellChecker is already closed
-
indexDictionary
public final void indexDictionary(Dictionary dict, IndexWriterConfig config, boolean fullMerge) throws IOException
Indexes the data from the given Dictionary.
- Parameters:
dict - Dictionary to index
config - IndexWriterConfig to use
fullMerge - whether or not the spellcheck index should be fully merged
- Throws:
AlreadyClosedException - if the SpellChecker is already closed
IOException - if there is a low-level I/O error
-
getMin
private static int getMin(int l)
-
getMax
private static int getMax(int l)
-
createDocument
-
addGram
-
obtainSearcher
-
releaseSearcher
- Throws:
IOException
-
ensureOpen
private void ensureOpen()
-
close
Close the IndexSearcher used by this SpellChecker.
- Specified by:
close in interface AutoCloseable
- Specified by:
close in interface Closeable
- Throws:
IOException - if the close operation causes an IOException
AlreadyClosedException - if the SpellChecker is already closed
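The closed-state contract that recurs on this page (operations fail once the instance has been closed) follows a common guard pattern. The sketch below is a stand-alone illustration, not the SpellChecker source: the class GuardedResource and its doWork method are hypothetical, and IllegalStateException stands in for Lucene's AlreadyClosedException.

```java
import java.io.Closeable;

public class GuardedResource implements Closeable {
    // volatile so that a close() on one thread is visible to readers on another
    private volatile boolean closed = false;

    /** Fails fast when the resource has already been closed. */
    private void ensureOpen() {
        if (closed) {
            throw new IllegalStateException("resource is closed");
        }
    }

    /** Hypothetical operation that is only valid while the resource is open. */
    public String doWork() {
        ensureOpen();
        return "ok";
    }

    public boolean isClosed() {
        return closed;
    }

    /** Closing twice is treated as an error, mirroring the documented contract. */
    @Override
    public void close() {
        ensureOpen();
        closed = true;
        // A real implementation would also release the underlying searcher here,
        // and would need locking to make the check-then-close step atomic.
    }
}
```

Because the class implements Closeable, callers can also place it in a try-with-resources statement so that close() runs exactly once.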
-
swapSearcher
- Throws:
IOException
-
createSearcher
Creates a new read-only IndexSearcher.
- Parameters:
dir - the directory used to open the searcher
- Returns:
a new read-only IndexSearcher
- Throws:
IOException - if there is a low-level I/O error
-
isClosed
boolean isClosed()
- Returns:
true if and only if the SpellChecker is closed, otherwise false.
-