Package org.apache.lucene.analysis.br
Class BrazilianStemmer
java.lang.Object
org.apache.lucene.analysis.br.BrazilianStemmer
A stemmer for Brazilian Portuguese words.
-
Field Summary
Fields -
Constructor Summary
Constructors -
Method Summary
Modifier and Type | Method | Description
private String | changeTerm(String value) | 1) Turn to lowercase 2) Remove accents 3) ã -> a ; õ -> o 4) ç -> c
private void | createCT | Creates CT (changed term), substituting 'ã' and 'õ' for 'a~' and 'o~'.
private String | getR1 | Gets R1.
private String | getRV | Gets RV.
private boolean | isIndexable(String term) | Checks whether a term can be indexed.
private boolean | isStemmable(String term) | Checks whether a term can be processed correctly.
private boolean | isVowel(char value) | Checks whether the character is one of 'a', 'e', 'i', 'o', 'u'.
 | log() | For logging and debugging purposes.
private String | removeSuffix(String value, String toRemove) | Removes a string suffix.
private String | replaceSuffix(String value, String toReplace, String changeTo) | Replaces a string suffix with another.
protected String | stem | Stems the given term to a unique discriminator.
private boolean | step1() | Standard suffix removal.
private boolean | step2() | Verb suffixes.
private void | step3() | Delete suffix 'i' if in RV and preceded by 'c'.
private void | step4() | Residual suffix.
private void | step5() | If the word ends with one of (e é ê) in RV, delete it, and if preceded by 'gu' (or 'ci') with the 'u' (or 'i') in RV, delete the 'u' (or 'i').
private boolean | suffix | Checks whether a string ends with a suffix.
private boolean | suffixPreceded(String value, String suffix, String preceded) | Checks whether a suffix is preceded by a given string.
-
Field Details
-
locale
-
TERM
Changed term -
CT
-
R1
-
R2
-
RV
-
-
Constructor Details
-
BrazilianStemmer
public BrazilianStemmer()
-
-
Method Details
-
stem
Stems the given term to a unique discriminator.
- Parameters:
term - The term that should be stemmed.
- Returns:
- Discriminator for term
-
isStemmable
Checks whether a term can be processed correctly.
- Returns:
- true if, and only if, the given term consists entirely of letters.
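The letters-only check described above could be sketched as follows. This is an illustrative stand-in, not the library's code: the class name `StemmableSketch` is hypothetical, and treating a null term as not stemmable (and an empty term as stemmable) are assumptions the page does not state.

```java
final class StemmableSketch {
    // Hypothetical stand-in for isStemmable: true only if every character is a letter.
    static boolean isStemmable(String term) {
        if (term == null) {
            return false; // assumption: a null term is not stemmable
        }
        for (int i = 0; i < term.length(); i++) {
            if (!Character.isLetter(term.charAt(i))) {
                return false; // a digit, hyphen, space, etc. disqualifies the term
            }
        }
        return true; // note: an empty term passes vacuously in this sketch
    }

    public static void main(String[] args) {
        System.out.println(isStemmable("casa"));
        System.out.println(isStemmable("casa2"));
    }
}
```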
-
isIndexable
Checks whether a term can be indexed.
- Returns:
- true if it can be indexed
-
isVowel
private boolean isVowel(char value)
Checks whether the character is one of 'a', 'e', 'i', 'o', 'u'.
- Returns:
- true if the character is a vowel
-
getR1
Gets R1.
R1 is the region after the first non-vowel following a vowel, or the null region at the end of the word if there is no such non-vowel.
- Returns:
- null or a string representing R1
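A minimal sketch of the R1 definition above, assuming the vowel set is exactly 'a', 'e', 'i', 'o', 'u' (as `isVowel` describes) and that an empty region is reported as null. The class name `R1Sketch` is hypothetical, and the actual method may differ in edge cases.

```java
final class R1Sketch {
    // Vowel test as described for isVowel: unaccented lowercase vowels only.
    static boolean isVowel(char c) {
        return c == 'a' || c == 'e' || c == 'i' || c == 'o' || c == 'u';
    }

    // Returns the region after the first non-vowel that follows a vowel, or null.
    static String getR1(String value) {
        if (value == null) {
            return null;
        }
        int n = value.length();
        int j = 0;
        // Skip ahead to the first vowel.
        while (j < n && !isVowel(value.charAt(j))) {
            j++;
        }
        // Skip the run of vowels; j then points at the first non-vowel after a vowel.
        while (j < n && isVowel(value.charAt(j))) {
            j++;
        }
        // Assumption: an empty or missing region is reported as null.
        return (j + 1 < n) ? value.substring(j + 1) : null;
    }

    public static void main(String[] args) {
        System.out.println(getR1("comer"));
        System.out.println(getR1("beautiful"));
    }
}
```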
-
getRV
Gets RV.
If the second letter is a consonant, RV is the region after the next following vowel;
or, if the first two letters are vowels, RV is the region after the next consonant;
otherwise (consonant-vowel case), RV is the region after the third letter.
RV is the end of the word if these positions cannot be found.
- Returns:
- null or a string representing RV
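The three RV cases above can be sketched as below. This assumes the same five-letter vowel set as `isVowel`, treats every non-vowel as a consonant, and reports "end of the word" as null; `RvSketch` and the `region` helper are hypothetical names introduced for illustration.

```java
final class RvSketch {
    static boolean isVowel(char c) {
        return c == 'a' || c == 'e' || c == 'i' || c == 'o' || c == 'u';
    }

    // Hypothetical helper: the text from 'start' onward, or null if nothing is left.
    static String region(String value, int start) {
        return start < value.length() ? value.substring(start) : null;
    }

    static String getRV(String value) {
        if (value == null || value.length() < 2) {
            return null;
        }
        int n = value.length();
        if (!isVowel(value.charAt(1))) {
            // Second letter is a consonant: region after the next following vowel.
            for (int j = 2; j < n; j++) {
                if (isVowel(value.charAt(j))) {
                    return region(value, j + 1);
                }
            }
            return null;
        }
        if (isVowel(value.charAt(0))) {
            // First two letters are vowels: region after the next consonant.
            for (int j = 2; j < n; j++) {
                if (!isVowel(value.charAt(j))) {
                    return region(value, j + 1);
                }
            }
            return null;
        }
        // Consonant-vowel case: region after the third letter.
        return region(value, 3);
    }

    public static void main(String[] args) {
        System.out.println(getRV("macho"));
        System.out.println(getRV("oliva"));
    }
}
```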
-
changeTerm
1) Turn to lowercase
2) Remove accents
3) ã -> a ; õ -> o
4) ç -> c
- Returns:
- null or the transformed string
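One way to get the same four effects is Unicode decomposition followed by stripping combining marks. This is a swapped-in technique using `java.text.Normalizer`, not necessarily how the class itself does it, and it strips every diacritic rather than a fixed list. The class name `ChangeTermSketch` is hypothetical; non-ASCII letters are written as Unicode escapes so the source stays ASCII-only.

```java
import java.text.Normalizer;
import java.util.Locale;

final class ChangeTermSketch {
    // Lowercases the term, then removes accents, tildes and cedillas.
    static String changeTerm(String value) {
        if (value == null) {
            return null;
        }
        String lower = value.toLowerCase(Locale.ROOT);
        // NFD splits an accented letter into its base letter plus combining marks.
        String decomposed = Normalizer.normalize(lower, Normalizer.Form.NFD);
        // Dropping all combining marks leaves only the base letters.
        return decomposed.replaceAll("\\p{M}+", "");
    }

    public static void main(String[] args) {
        // "A" + c-cedilla + a-tilde + "o"
        System.out.println(changeTerm("A\u00e7\u00e3o"));
    }
}
```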
-
suffix
Checks whether a string ends with a suffix.
- Returns:
- true if the string ends with the specified suffix
-
replaceSuffix
Replaces a string suffix with another.
- Returns:
- the string with the suffix replaced
-
removeSuffix
Removes a string suffix.
- Returns:
- the string without the suffix
-
suffixPreceded
Checks whether a suffix is preceded by a given string.
- Returns:
- true if the suffix is preceded by the given string
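The four suffix helpers above (`suffix`, `replaceSuffix`, `removeSuffix`, `suffixPreceded`) fit together roughly as in this sketch. The null handling and the choice to return the input unchanged when the suffix is absent are assumptions, and `SuffixSketch` is a hypothetical class name.

```java
final class SuffixSketch {
    // True if value ends with the given suffix.
    static boolean suffix(String value, String suffix) {
        return value != null && suffix != null && value.endsWith(suffix);
    }

    // Removes the suffix if present; otherwise returns value unchanged (assumption).
    static String removeSuffix(String value, String toRemove) {
        if (!suffix(value, toRemove)) {
            return value;
        }
        return value.substring(0, value.length() - toRemove.length());
    }

    // Replaces the suffix if present; otherwise returns value unchanged (assumption).
    static String replaceSuffix(String value, String toReplace, String changeTo) {
        if (changeTo == null || !suffix(value, toReplace)) {
            return value;
        }
        return removeSuffix(value, toReplace) + changeTo;
    }

    // True if value ends with suffix and the text just before it ends with preceded.
    static boolean suffixPreceded(String value, String suffix, String preceded) {
        if (preceded == null || !suffix(value, suffix)) {
            return false;
        }
        return suffix(removeSuffix(value, suffix), preceded);
    }

    public static void main(String[] args) {
        System.out.println(removeSuffix("cantando", "ando"));
        System.out.println(suffixPreceded("publicidade", "idade", "ic"));
    }
}
```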
-
createCT
Creates CT (changed term), substituting 'ã' and 'õ' for 'a~' and 'o~'.
-
step1
private boolean step1()
Standard suffix removal. Search for the longest among the following suffixes, and perform the following actions:
- Returns:
- false if no ending was removed
-
step2
private boolean step2()
Verb suffixes. Search for the longest among the following suffixes in RV, and if found, delete.
- Returns:
- false if no ending was removed
-
step3
private void step3()
Delete suffix 'i' if in RV and preceded by 'c'.
-
step4
private void step4()
Residual suffix.
If the word ends with one of the suffixes (os a i o á í ó) in RV, delete it.
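A rough sketch of this residual-suffix rule, written as a pure function rather than a method that mutates the stemmer's fields. The class name `Step4Sketch`, the two-argument signature, and the requirement that `rv` be a trailing part of `ct` are all assumptions made for illustration.

```java
final class Step4Sketch {
    // Residual suffixes from the description; accented vowels are Unicode escapes.
    private static final String[] SUFFIXES = {
        "os", "a", "i", "o", "\u00e1", "\u00ed", "\u00f3"
    };

    // Returns ct with one residual suffix removed if that suffix lies inside rv.
    static String step4(String ct, String rv) {
        // Assumption: rv must be the trailing part of ct, as a region of the word.
        if (ct == null || rv == null || !ct.endsWith(rv)) {
            return ct;
        }
        for (String s : SUFFIXES) {
            if (rv.endsWith(s)) {
                return ct.substring(0, ct.length() - s.length());
            }
        }
        return ct;
    }

    public static void main(String[] args) {
        System.out.println(step4("meninos", "inos"));
    }
}
```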
-
step5
private void step5()
If the word ends with one of (e é ê) in RV, delete it, and if preceded by 'gu' (or 'ci') with the 'u' (or 'i') in RV, delete the 'u' (or 'i').
Or, if the word ends with ç, remove the cedilla.
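The rule above could be sketched as a pure function like the following. As with the other sketches, `Step5Sketch` and its two-argument signature are hypothetical, and `rv` is assumed to be the trailing part of `ct`; the real method works on the stemmer's own fields.

```java
final class Step5Sketch {
    // Applies the final-e rule, or else the cedilla rule, to ct given its RV region.
    static String step5(String ct, String rv) {
        if (ct == null) {
            return null;
        }
        // Assumption: rv must be the trailing part of ct.
        boolean rvUsable = rv != null && !rv.isEmpty() && ct.endsWith(rv);
        // The three endings are e, e-acute and e-circumflex.
        if (rvUsable && "e\u00e9\u00ea".indexOf(rv.charAt(rv.length() - 1)) >= 0) {
            String stem = ct.substring(0, ct.length() - 1);
            String rvRest = rv.substring(0, rv.length() - 1);
            // Drop the 'u' of "gu" or the 'i' of "ci" when that letter is still in RV.
            boolean gu = stem.endsWith("gu") && rvRest.endsWith("u");
            boolean ci = stem.endsWith("ci") && rvRest.endsWith("i");
            return (gu || ci) ? stem.substring(0, stem.length() - 1) : stem;
        }
        // Otherwise, a final c-cedilla loses its cedilla.
        if (ct.endsWith("\u00e7")) {
            return ct.substring(0, ct.length() - 1) + "c";
        }
        return ct;
    }

    public static void main(String[] args) {
        System.out.println(step5("segue", "ue"));
    }
}
```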
-
log
For logging and debugging purposes.
- Returns:
- TERM, CT, RV, R1 and R2
-