Snowball stemming algorithms, for information retrieval
Stemming algorithms
PyStemmer provides access to efficient algorithms for calculating a
"stemmed" form of a word: a form with most of the common
morphological endings removed, ideally representing a common
linguistic base form. This is most useful in building search
engines and information retrieval software; for example, a search
with stemming enabled should be able to find a document containing
"cycling" given the query "cycles".
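To show why stemming helps retrieval, here is a minimal, self-contained sketch. It deliberately does not use PyStemmer or any Snowball algorithm. Instead it uses a hypothetical toy suffix stripper (`naive_stem`, with an illustrative suffix list) and a tiny inverted index (`build_index`, `search`). Because both documents and queries are reduced to the same stemmed form before lookup, the query "cycles" finds a document containing "cycling". A real stemmer handles far more cases than this toy does.

```python
from collections import defaultdict

# Hypothetical toy suffix list, checked in order (longer suffixes first).
# This is NOT the Snowball or Porter algorithm, only an illustration.
SUFFIXES = ("ing", "es", "ed", "s", "e")


def naive_stem(word: str) -> str:
    """Strip the first matching suffix, keeping a stem of at least 3 letters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each stemmed token to the set of document ids containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.split():
            index[naive_stem(token.strip(".,!?"))].add(doc_id)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return ids of documents containing every stemmed query term."""
    results = [index.get(naive_stem(term), set()) for term in query.split()]
    return set.intersection(*results) if results else set()


DOCS = {
    "d1": "Cycling to work every day.",
    "d2": "Walking in the park.",
}
INDEX = build_index(DOCS)

print(naive_stem("cycling"), naive_stem("cycles"))  # cycl cycl
print(search(INDEX, "cycles"))  # {'d1'}
```

Without the stemming step, the token "cycles" would never match "cycling" in the index; normalizing both sides to a shared base form is what makes the match possible.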
PyStemmer provides algorithms for several (mainly European) languages,
by wrapping the libstemmer library from the Snowball project in a
Python module.
It also provides access to the classic Porter stemming algorithm for
English: although this has been superseded by an improved algorithm,
the original algorithm may be of interest to information retrieval
researchers wishing to reproduce results of earlier experiments.