Uses of Class
org.apache.lucene.analysis.Tokenizer
Packages that use Tokenizer

  org.apache.lucene.analysis: Text analysis.
  org.apache.lucene.analysis.classic: Fast, general-purpose grammar-based tokenizers.
  org.apache.lucene.analysis.cn.smart: Analyzer for Simplified Chinese, which indexes words.
  org.apache.lucene.analysis.core: Basic, general-purpose analysis components.
  org.apache.lucene.analysis.email: Fast, general-purpose tokenizers for URLs and email addresses.
  org.apache.lucene.analysis.icu.segmentation: Tokenizer that breaks text into words with the Unicode Text Segmentation algorithm.
  org.apache.lucene.analysis.ja: Analyzer for Japanese.
  org.apache.lucene.analysis.ko: Analyzer for Korean.
  org.apache.lucene.analysis.ngram: Character n-gram tokenizers and filters.
  org.apache.lucene.analysis.path: Analysis components for path-like strings such as filenames.
  org.apache.lucene.analysis.pattern: Set of components for pattern-based (regex) analysis.
  org.apache.lucene.analysis.standard: Fast, general-purpose grammar-based tokenizer: StandardTokenizer implements the Word Break rules from the Unicode Text Segmentation algorithm, as specified in Unicode Standard Annex #29.
  org.apache.lucene.analysis.th: Analyzer for Thai.
  org.apache.lucene.analysis.util: Utility functions for text analysis.
  org.apache.lucene.analysis.wikipedia: Tokenizer that is aware of Wikipedia syntax.
Uses of Tokenizer in org.apache.lucene.analysis
Methods in org.apache.lucene.analysis that return Tokenizer

  final Tokenizer    TokenizerFactory.create()
      Creates a TokenStream of the specified input using the default attribute factory.
  abstract Tokenizer TokenizerFactory.create(AttributeFactory factory)
      Creates a TokenStream of the specified input using the given AttributeFactory.

Constructors in org.apache.lucene.analysis with parameters of type Tokenizer

  TokenStreamComponents(Tokenizer tokenizer)
      Creates a new Analyzer.TokenStreamComponents from a Tokenizer.
  TokenStreamComponents(Tokenizer tokenizer, TokenStream result)
      Creates a new Analyzer.TokenStreamComponents instance.
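The constructors above are typically called from Analyzer.createComponents(String). A minimal sketch, assuming a recent Lucene (9.x) with lucene-core and lucene-analysis-common on the classpath; the anonymous analyzer, the field name "field", and the tokenize helper are illustrative, not part of the API listed here:

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.core.WhitespaceTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class TokenizerComponentsSketch {

    // A minimal Analyzer that wires a Tokenizer into Analyzer.TokenStreamComponents
    // via the single-argument constructor listed above.
    static final Analyzer ANALYZER = new Analyzer() {
        @Override
        protected TokenStreamComponents createComponents(String fieldName) {
            Tokenizer source = new WhitespaceTokenizer();
            return new TokenStreamComponents(source);
        }
    };

    // The standard TokenStream consumption loop: reset, incrementToken, end, close.
    static List<String> tokenize(String text) throws IOException {
        List<String> tokens = new ArrayList<>();
        try (TokenStream ts = ANALYZER.tokenStream("field", new StringReader(text))) {
            CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
            ts.reset();
            while (ts.incrementToken()) {
                tokens.add(term.toString());
            }
            ts.end();
        }
        return tokens;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(tokenize("quick brown fox"));
    }
}
```

The two-argument constructor would be used instead when a TokenFilter chain wraps the Tokenizer; the Tokenizer must still be passed as the first argument so the analyzer can call setReader on it.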
Uses of Tokenizer in org.apache.lucene.analysis.classic
Subclasses of Tokenizer in org.apache.lucene.analysis.classic

  final class ClassicTokenizer
      A grammar-based tokenizer constructed with JFlex.
Uses of Tokenizer in org.apache.lucene.analysis.cn.smart
Subclasses of Tokenizer in org.apache.lucene.analysis.cn.smart

  class HMMChineseTokenizer
      Tokenizer for Chinese or mixed Chinese-English text.

Methods in org.apache.lucene.analysis.cn.smart that return Tokenizer
Uses of Tokenizer in org.apache.lucene.analysis.core
Subclasses of Tokenizer in org.apache.lucene.analysis.core

  final class KeywordTokenizer
      Emits the entire input as a single token.
  class LetterTokenizer
      A LetterTokenizer is a tokenizer that divides text at non-letters.
  final class UnicodeWhitespaceTokenizer
      A UnicodeWhitespaceTokenizer is a tokenizer that divides text at whitespace.
  final class WhitespaceTokenizer
      A tokenizer that divides text at whitespace characters as defined by Character.isWhitespace(int).

Methods in org.apache.lucene.analysis.core that return Tokenizer
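The rule LetterTokenizer applies, divide at non-letters, can be illustrated without Lucene. A plain-Java sketch of the same contract (the helper name is hypothetical, not a Lucene API):

```java
import java.util.ArrayList;
import java.util.List;

public class LetterSplitSketch {
    // Approximates LetterTokenizer's rule: maximal runs of letters become tokens,
    // everything else separates them. Like LetterTokenizer, this works on code
    // points via Character.isLetter(int).
    static List<String> splitAtNonLetters(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        text.codePoints().forEach(cp -> {
            if (Character.isLetter(cp)) {
                current.appendCodePoint(cp);
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        });
        if (current.length() > 0) tokens.add(current.toString());
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(splitAtNonLetters("don't stop-me now2"));
        // [don, t, stop, me, now]
    }
}
```

WhitespaceTokenizer follows the same shape with Character.isWhitespace(int) as the separator test instead.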
Uses of Tokenizer in org.apache.lucene.analysis.email
Subclasses of Tokenizer in org.apache.lucene.analysis.email

  final class UAX29URLEmailTokenizer
      This class implements Word Break rules from the Unicode Text Segmentation algorithm, as specified in Unicode Standard Annex #29. URLs and email addresses are also tokenized according to the relevant RFCs.
Uses of Tokenizer in org.apache.lucene.analysis.icu.segmentation
Subclasses of Tokenizer in org.apache.lucene.analysis.icu.segmentation

  final class ICUTokenizer
      Breaks text into words according to UAX #29: Unicode Text Segmentation (http://www.unicode.org/reports/tr29/).
Uses of Tokenizer in org.apache.lucene.analysis.ja
Subclasses of Tokenizer in org.apache.lucene.analysis.ja

  final class JapaneseTokenizer
      Tokenizer for Japanese that uses morphological analysis.
Uses of Tokenizer in org.apache.lucene.analysis.ko
Subclasses of Tokenizer in org.apache.lucene.analysis.ko

  final class KoreanTokenizer
      Tokenizer for Korean that uses morphological analysis.
Uses of Tokenizer in org.apache.lucene.analysis.ngram
Subclasses of Tokenizer in org.apache.lucene.analysis.ngram

  class EdgeNGramTokenizer
      Tokenizes the input from an edge into n-grams of given size(s).
  class NGramTokenizer
      Tokenizes the input into n-grams of the given size(s).

Methods in org.apache.lucene.analysis.ngram that return Tokenizer

  Tokenizer EdgeNGramTokenizerFactory.create(AttributeFactory factory)
  Tokenizer NGramTokenizerFactory.create(AttributeFactory factory)
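The difference between the two classes can be illustrated in plain Java: NGramTokenizer emits n-grams starting at every position, while EdgeNGramTokenizer emits only those anchored at the front edge. A sketch with hypothetical helper names (Lucene's own emission order and code-point handling may differ by version):

```java
import java.util.ArrayList;
import java.util.List;

public class NGramSketch {
    // All character n-grams of sizes min..max, one start position at a time.
    static List<String> ngrams(String s, int min, int max) {
        List<String> out = new ArrayList<>();
        for (int start = 0; start < s.length(); start++) {
            for (int n = min; n <= max && start + n <= s.length(); n++) {
                out.add(s.substring(start, start + n));
            }
        }
        return out;
    }

    // Only the n-grams anchored at the start of the input.
    static List<String> edgeNgrams(String s, int min, int max) {
        List<String> out = new ArrayList<>();
        for (int n = min; n <= max && n <= s.length(); n++) {
            out.add(s.substring(0, n));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(ngrams("abc", 1, 2));     // [a, ab, b, bc, c]
        System.out.println(edgeNgrams("abc", 1, 3)); // [a, ab, abc]
    }
}
```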
Uses of Tokenizer in org.apache.lucene.analysis.path
Subclasses of Tokenizer in org.apache.lucene.analysis.path

  class PathHierarchyTokenizer
      Tokenizer for path-like hierarchies.
  class ReversePathHierarchyTokenizer
      Tokenizer for domain-like hierarchies.

Methods in org.apache.lucene.analysis.path that return Tokenizer
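PathHierarchyTokenizer emits one token per ancestor path; the reverse variant works from the other end, which suits domain names. A plain-Java sketch of the two output shapes (helper names are hypothetical, and the real tokenizers expose more options such as skip counts and delimiter replacement):

```java
import java.util.ArrayList;
import java.util.List;

public class PathHierarchySketch {
    // "/usr/local/lib" -> ["/usr", "/usr/local", "/usr/local/lib"],
    // the shape PathHierarchyTokenizer produces.
    static List<String> pathPrefixes(String path, char delimiter) {
        List<String> out = new ArrayList<>();
        for (int i = 1; i < path.length(); i++) {
            if (path.charAt(i) == delimiter) out.add(path.substring(0, i));
        }
        out.add(path);
        return out;
    }

    // "www.example.com" -> ["com", "example.com", "www.example.com"],
    // the domain-like shape ReversePathHierarchyTokenizer is meant for.
    static List<String> domainSuffixes(String domain, char delimiter) {
        List<String> out = new ArrayList<>();
        for (int i = domain.length() - 1; i > 0; i--) {
            if (domain.charAt(i) == delimiter) out.add(domain.substring(i + 1));
        }
        out.add(domain);
        return out;
    }

    public static void main(String[] args) {
        System.out.println(pathPrefixes("/usr/local/lib", '/'));
        System.out.println(domainSuffixes("www.example.com", '.'));
    }
}
```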
Uses of Tokenizer in org.apache.lucene.analysis.pattern
Subclasses of Tokenizer in org.apache.lucene.analysis.pattern

  final class PatternTokenizer
      This tokenizer uses regex pattern matching to construct distinct tokens for the input stream.
  final class SimplePatternSplitTokenizer
  final class SimplePatternTokenizer
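PatternTokenizer can either split on a pattern (group -1) or keep a pattern group's matches as tokens (group >= 0). The two modes can be sketched with java.util.regex alone (the helper name is hypothetical):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PatternTokenizeSketch {
    // group == -1: the pattern marks separators, as in String.split.
    // group >= 0: each match of that capture group becomes a token.
    static List<String> tokenize(String text, Pattern pattern, int group) {
        List<String> tokens = new ArrayList<>();
        if (group < 0) {
            for (String piece : pattern.split(text)) {
                if (!piece.isEmpty()) tokens.add(piece);
            }
        } else {
            Matcher m = pattern.matcher(text);
            while (m.find()) {
                tokens.add(m.group(group));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("a, b;c", Pattern.compile("[,;]\\s*"), -1));  // [a, b, c]
        System.out.println(tokenize("x=1 y=22", Pattern.compile("(\\d+)"), 1));   // [1, 22]
    }
}
```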
Uses of Tokenizer in org.apache.lucene.analysis.standard
Subclasses of Tokenizer in org.apache.lucene.analysis.standard

  final class StandardTokenizer
      A grammar-based tokenizer constructed with JFlex.
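StandardTokenizer's UAX #29 word-break behavior can be roughly approximated with the JDK's own java.text.BreakIterator, which also follows the annex's default rules; keeping only segments that contain a letter or digit drops the punctuation and whitespace pieces. A sketch only, not StandardTokenizer itself (the helper name is hypothetical, and edge cases will differ):

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class WordBreakSketch {
    // Segment text at UAX #29 word boundaries, keeping segments that
    // contain at least one letter or digit.
    static List<String> words(String text) {
        List<String> out = new ArrayList<>();
        BreakIterator bi = BreakIterator.getWordInstance(Locale.ROOT);
        bi.setText(text);
        int start = bi.first();
        for (int end = bi.next(); end != BreakIterator.DONE; start = end, end = bi.next()) {
            String segment = text.substring(start, end);
            if (segment.codePoints().anyMatch(Character::isLetterOrDigit)) {
                out.add(segment);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(words("Hello, world! 2024."));
    }
}
```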
Uses of Tokenizer in org.apache.lucene.analysis.th
Subclasses of Tokenizer in org.apache.lucene.analysis.th

  class ThaiTokenizer

Methods in org.apache.lucene.analysis.th that return Tokenizer
Uses of Tokenizer in org.apache.lucene.analysis.util
Subclasses of Tokenizer in org.apache.lucene.analysis.util

  class CharTokenizer
      An abstract base class for simple, character-oriented tokenizers.
  class SegmentingTokenizerBase
      Breaks text into sentences with a BreakIterator and allows subclasses to decompose these sentences into words.
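SegmentingTokenizerBase's two-phase approach, sentence segmentation with a BreakIterator followed by per-sentence decomposition, can be sketched with the JDK's sentence iterator. The word step here is a trivial whitespace split, standing in for whatever a real subclass would implement; helper names are hypothetical:

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class SentenceSegmentSketch {
    // Phase 1: break the text into sentences with a BreakIterator.
    static List<String> sentences(String text) {
        List<String> out = new ArrayList<>();
        BreakIterator bi = BreakIterator.getSentenceInstance(Locale.ROOT);
        bi.setText(text);
        int start = bi.first();
        for (int end = bi.next(); end != BreakIterator.DONE; start = end, end = bi.next()) {
            out.add(text.substring(start, end).trim());
        }
        return out;
    }

    // Phase 2: decompose each sentence into words (a stand-in for the
    // subclass hook SegmentingTokenizerBase provides).
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String sentence : sentences(text)) {
            for (String w : sentence.split("\\s+")) {
                if (!w.isEmpty()) tokens.add(w);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(sentences("One ends here. Two starts now."));
    }
}
```

Working a sentence at a time keeps memory bounded on long inputs, which is the main reason for the two-phase design.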
Uses of Tokenizer in org.apache.lucene.analysis.wikipedia
Subclasses of Tokenizer in org.apache.lucene.analysis.wikipedia

  final class WikipediaTokenizer
      Extension of StandardTokenizer that is aware of Wikipedia syntax.