Class BinaryDictionary
java.lang.Object
org.apache.lucene.analysis.ja.dict.BinaryDictionary
- All Implemented Interfaces:
Dictionary
- Direct Known Subclasses:
TokenInfoDictionary, UnknownDictionary
Base class for a binary-encoded in-memory dictionary.
-
Nested Class Summary
Nested Classes
Modifier and Type | Class | Description
static enum | BinaryDictionary.ResourceScheme | Deprecated, for removal: This API element is subject to removal in a future version.
-
Field Summary
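Fields
Modifier and Type | Field | Description
private final ByteBuffer | buffer
static final String | DICT_FILENAME_SUFFIX
static final String | DICT_HEADER
static final int | HAS_BASEFORM | flag that the entry has baseform data.
static final int | HAS_PRONUNCIATION | flag that the entry has pronunciation data.
static final int | HAS_READING | flag that the entry has reading data.
private final String[] | inflFormDict
private final String[] | inflTypeDict
private final String[] | posDict
static final String | POSDICT_FILENAME_SUFFIX
static final String | POSDICT_HEADER
private final int[] | targetMap
static final String | TARGETMAP_FILENAME_SUFFIX
static final String | TARGETMAP_HEADER
private final int[] | targetMapOffsets
static final int | VERSION
Fields inherited from interface org.apache.lucene.analysis.ja.dict.Dictionary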
INTERNAL_SEPARATOR -
Constructor Summary
Constructors
Modifier | Constructor | Description
protected | BinaryDictionary(IOSupplier<InputStream> targetMapResource, IOSupplier<InputStream> posResource, IOSupplier<InputStream> dictResource)
-
Method Summary
Modifier and Type | Method | Description
private static int | baseFormOffset(int wordId)
String | getBaseForm(int wordId, char[] surfaceForm, int off, int len) | Get base form of word
String | getInflectionForm(int wordId) | Get inflection form of tokens
String | getInflectionType(int wordId) | Get inflection type of tokens
int | getLeftId(int wordId) | Get left id of specified word
String | getPartOfSpeech(int wordId) | Get Part-Of-Speech of tokens
String | getPronunciation(int wordId, char[] surface, int off, int len) | Get pronunciation of tokens
String | getReading(int wordId, char[] surface, int off, int len) | Get reading of tokens
static final InputStream | getResource(BinaryDictionary.ResourceScheme scheme, String path) | Deprecated, for removal: This API element is subject to removal in a future version.
int | getRightId(int wordId) | Get right id of specified word
int | getWordCost(int wordId) | Get word cost of specified word
private boolean | hasBaseFormData(int wordId)
private boolean | hasPronunciationData(int wordId)
private boolean | hasReadingData(int wordId)
void | lookupWordIds(int sourceId, IntsRef ref)
private static void | populatePosDict(DataInput in, int posSize, String[] posDict, String[] inflTypeDict, String[] inflFormDict)
private static void | populateTargetMap(DataInput in, int[] targetMap, int[] targetMapOffsets)
private int | pronunciationOffset(int wordId)
private int | readingOffset(int wordId)
private String | readString(int offset, int length, boolean kana)
-
Field Details
-
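DICT_FILENAME_SUFFIX
public static final String DICT_FILENAME_SUFFIX
- See Also:
Constant Field Values
-
TARGETMAP_FILENAME_SUFFIX
public static final String TARGETMAP_FILENAME_SUFFIX
- See Also:
Constant Field Values
-
POSDICT_FILENAME_SUFFIX
public static final String POSDICT_FILENAME_SUFFIX
- See Also:
Constant Field Values
-
DICT_HEADER
public static final String DICT_HEADER
- See Also:
Constant Field Values
-
TARGETMAP_HEADER
public static final String TARGETMAP_HEADER
- See Also:
Constant Field Values
-
POSDICT_HEADER
public static final String POSDICT_HEADER
- See Also:
Constant Field Values
-
VERSION
public static final int VERSION
- See Also:
Constant Field Values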
-
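buffer
private final ByteBuffer buffer
-
targetMapOffsets
private final int[] targetMapOffsets
-
targetMap
private final int[] targetMap
-
posDict
private final String[] posDict
-
inflTypeDict
private final String[] inflTypeDict
-
inflFormDict
private final String[] inflFormDict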
-
HAS_BASEFORM
public static final int HAS_BASEFORM
Flag indicating that the entry has base form data. Otherwise the entry is not inflected (its base form is the same as the surface form).
- See Also:
Constant Field Values
-
HAS_READING
public static final int HAS_READING
Flag indicating that the entry has reading data. Otherwise the reading is the surface form converted to katakana.
- See Also:
Constant Field Values
-
HAS_PRONUNCIATION
public static final int HAS_PRONUNCIATION
Flag indicating that the entry has pronunciation data. Otherwise the pronunciation is the same as the reading.
- See Also:
Constant Field Values
-
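The fallback rules described for HAS_BASEFORM, HAS_READING and HAS_PRONUNCIATION can be illustrated with a small standalone sketch. It does not use the Lucene classes: the bit values, the class name FlagFallbackSketch and the helper methods are all assumptions made for illustration, and the real constant values should be taken from Constant Field Values.

```java
// Standalone sketch of the flag fallback rules. Everything here is hypothetical:
// the bit values are assumed, and the helpers only mirror the documented behaviour.
final class FlagFallbackSketch {
  static final int HAS_BASEFORM = 1;      // assumed bit value
  static final int HAS_READING = 2;       // assumed bit value
  static final int HAS_PRONUNCIATION = 4; // assumed bit value

  /** Converts hiragana to katakana; all other characters pass through unchanged. */
  static String toKatakana(String s) {
    StringBuilder sb = new StringBuilder(s.length());
    for (int i = 0; i < s.length(); i++) {
      char c = s.charAt(i);
      // Hiragana U+3041..U+3096 maps onto katakana U+30A1..U+30F6 (offset 0x60).
      sb.append(c >= 0x3041 && c <= 0x3096 ? (char) (c + 0x60) : c);
    }
    return sb.toString();
  }

  /** Stored base form if the flag is set, otherwise null (the word is not inflected). */
  static String baseForm(int flags, String stored) {
    return (flags & HAS_BASEFORM) != 0 ? stored : null;
  }

  /** Stored reading if the flag is set, otherwise the surface form in katakana. */
  static String reading(int flags, String stored, String surface) {
    return (flags & HAS_READING) != 0 ? stored : toKatakana(surface);
  }

  /** Stored pronunciation if the flag is set, otherwise the reading. */
  static String pronunciation(int flags, String storedPron, String storedReading, String surface) {
    return (flags & HAS_PRONUNCIATION) != 0
        ? storedPron
        : reading(flags, storedReading, surface);
  }

  public static void main(String[] args) {
    String surface = "\u3059\u3057"; // "sushi" written in hiragana
    // No flags set: reading and pronunciation both fall back to the katakana surface form.
    System.out.println(reading(0, null, surface));
    System.out.println(pronunciation(0, null, null, surface));
    // No base form data: callers receive null and can reuse the surface form.
    System.out.println(baseForm(0, "ignored"));
  }
}
```

Callers that receive null from the base form lookup would typically reuse the surface form, which matches the "otherwise null" return documented for getBaseForm.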
-
Constructor Details
-
BinaryDictionary
protected BinaryDictionary(IOSupplier<InputStream> targetMapResource, IOSupplier<InputStream> posResource, IOSupplier<InputStream> dictResource) throws IOException
- Throws:
IOException
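The constructor receives each resource as an IOSupplier<InputStream> rather than as an already opened stream. The sketch below shows that pattern in isolation. It declares a local stand-in interface instead of importing Lucene's IOSupplier, and the class SupplierSketch, the method readFirstInt and the four-byte fake resource are hypothetical names and data chosen for illustration, not part of the dictionary format.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Standalone sketch of the "supplier of a stream" pattern; not the Lucene implementation.
final class SupplierSketch {
  /** Local stand-in (assumption): a supplier whose get() may throw IOException. */
  @FunctionalInterface
  interface IOSupplier<T> {
    T get() throws IOException;
  }

  /** Opens the stream only when needed, reads one int, and always closes the stream. */
  static int readFirstInt(IOSupplier<InputStream> resource) throws IOException {
    try (InputStream is = resource.get();
        DataInputStream in = new DataInputStream(is)) {
      return in.readInt(); // big-endian, as defined by DataInputStream
    }
  }

  /** Runs the sketch against an in-memory fake resource. */
  static int demo() {
    byte[] fakeResource = {0, 0, 0, 42};
    try {
      return readFirstInt(() -> new ByteArrayInputStream(fakeResource));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(demo()); // prints 42
  }
}
```

Passing a supplier lets the receiving code decide when to open the stream and keeps the responsibility for closing it in one place.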
-
-
Method Details
-
populateTargetMap
private static void populateTargetMap(DataInput in, int[] targetMap, int[] targetMapOffsets) throws IOException
- Throws:
IOException
-
populatePosDict
private static void populatePosDict(DataInput in, int posSize, String[] posDict, String[] inflTypeDict, String[] inflFormDict) throws IOException
- Throws:
IOException
-
getResource
@Deprecated(forRemoval=true, since="9.1") public static final InputStream getResource(BinaryDictionary.ResourceScheme scheme, String path) throws IOException
Deprecated, for removal: This API element is subject to removal in a future version.
- Throws:
IOException
-
lookupWordIds
public void lookupWordIds(int sourceId, IntsRef ref)
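This page does not describe how targetMap and targetMapOffsets are laid out. One plausible arrangement, assumed here purely for illustration, is that targetMap holds the word IDs grouped by source ID and targetMapOffsets holds the start of each group plus one trailing end offset. Under that assumption a lookup can hand back a slice of the shared array without copying. The classes TargetMapSketch and Slice are hypothetical stand-ins, not the Lucene types.

```java
import java.util.Arrays;

// Standalone sketch of an offsets-plus-values lookup; the layout is an assumption.
final class TargetMapSketch {
  /** Minimal stand-in for an (array, offset, length) slice holder. */
  static final class Slice {
    int[] ints;
    int offset;
    int length;
  }

  private final int[] targetMap;        // word IDs, grouped by source ID (assumed)
  private final int[] targetMapOffsets; // group starts plus one trailing end offset (assumed)

  TargetMapSketch(int[] targetMap, int[] targetMapOffsets) {
    this.targetMap = targetMap;
    this.targetMapOffsets = targetMapOffsets;
  }

  /** Points ref at the word IDs for sourceId; the backing array is shared, not copied. */
  void lookupWordIds(int sourceId, Slice ref) {
    ref.ints = targetMap;
    ref.offset = targetMapOffsets[sourceId];
    ref.length = targetMapOffsets[sourceId + 1] - ref.offset;
  }

  /** Looks up one source ID in a tiny example map and copies the slice out for display. */
  static int[] demo(int sourceId) {
    TargetMapSketch dict =
        new TargetMapSketch(new int[] {10, 11, 20, 30, 31, 32}, new int[] {0, 2, 3, 6});
    Slice ref = new Slice();
    dict.lookupWordIds(sourceId, ref);
    return Arrays.copyOfRange(ref.ints, ref.offset, ref.offset + ref.length);
  }

  public static void main(String[] args) {
    System.out.println(Arrays.toString(demo(0))); // prints [10, 11]
    System.out.println(Arrays.toString(demo(2))); // prints [30, 31, 32]
  }
}
```

Because the slice shares the backing array, callers in this sketch should treat its contents as read-only.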
-
getLeftId
public int getLeftId(int wordId)
Description copied from interface: Dictionary
Get left id of specified word
- Specified by:
getLeftId in interface Dictionary
- Returns:
left id
-
getRightId
public int getRightId(int wordId)
Description copied from interface: Dictionary
Get right id of specified word
- Specified by:
getRightId in interface Dictionary
- Returns:
right id
-
getWordCost
public int getWordCost(int wordId)
Description copied from interface: Dictionary
Get word cost of specified word
- Specified by:
getWordCost in interface Dictionary
- Returns:
word's cost
-
getBaseForm
public String getBaseForm(int wordId, char[] surfaceForm, int off, int len)
Description copied from interface: Dictionary
Get base form of word
- Specified by:
getBaseForm in interface Dictionary
- Parameters:
wordId - word ID of token
- Returns:
Base form (only different for inflected words, otherwise null)
-
getReading
public String getReading(int wordId, char[] surface, int off, int len)
Description copied from interface: Dictionary
Get reading of tokens
- Specified by:
getReading in interface Dictionary
- Parameters:
wordId - word ID of token
- Returns:
Reading of the token
-
getPartOfSpeech
public String getPartOfSpeech(int wordId)
Description copied from interface: Dictionary
Get Part-Of-Speech of tokens
- Specified by:
getPartOfSpeech in interface Dictionary
- Parameters:
wordId - word ID of token
- Returns:
Part-Of-Speech of the token
-
getPronunciation
public String getPronunciation(int wordId, char[] surface, int off, int len)
Description copied from interface: Dictionary
Get pronunciation of tokens
- Specified by:
getPronunciation in interface Dictionary
- Parameters:
wordId - word ID of token
- Returns:
Pronunciation of the token
-
getInflectionType
public String getInflectionType(int wordId)
Description copied from interface: Dictionary
Get inflection type of tokens
- Specified by:
getInflectionType in interface Dictionary
- Parameters:
wordId - word ID of token
- Returns:
inflection type, or null
-
getInflectionForm
public String getInflectionForm(int wordId)
Description copied from interface: Dictionary
Get inflection form of tokens
- Specified by:
getInflectionForm in interface Dictionary
- Parameters:
wordId - word ID of token
- Returns:
inflection form, or null
-
baseFormOffset
private static int baseFormOffset(int wordId)
-
readingOffset
private int readingOffset(int wordId)
-
pronunciationOffset
private int pronunciationOffset(int wordId)
-
hasBaseFormData
private boolean hasBaseFormData(int wordId)
-
hasReadingData
private boolean hasReadingData(int wordId)
-
hasPronunciationData
private boolean hasPronunciationData(int wordId)
-
readString
private String readString(int offset, int length, boolean kana)
-