author Hunter Sezen <ovariegata@yahoo.com> 2017-12-19 21:43:25 +0000
committer Willy Sudiarto Raharjo <willysr@slackbuilds.org> 2017-12-21 08:19:23 +0700
commit 4c4269a4c7d788c7b73f60423aaf01cda72d4f8b
tree 25d54b931c01185d920050cefa18c02232ab84f5
parent dec9ed6dfab03059ef9716c95b24e5d995b59a2d
libraries/libexttextcat: Updated for version 3.4.5.
Signed-off-by: David Spencer <idlemoor@slackbuilds.org>
Diffstat (limited to 'libraries/libexttextcat/README')
-rw-r--r-- libraries/libexttextcat/README | 4
1 files changed, 2 insertions, 2 deletions
diff --git a/libraries/libexttextcat/README b/libraries/libexttextcat/README
index 3b9743c04a4d..9332783b6ed7 100644
--- a/libraries/libexttextcat/README
+++ b/libraries/libexttextcat/README
@@ -3,7 +3,7 @@ classification technique described in Cavnar & Trenkle, "N-Gram-Based
Text Categorization". It was primarily developed for language
guessing, a task on which it is known to perform with near-perfect
accuracy.
-
+
The central idea of the Cavnar & Trenkle technique is to calculate a
"fingerprint" of a document with an unknown category, and compare this
with the fingerprints of a number of documents of which the categories
@@ -12,7 +12,7 @@ classification. A fingerprint is a list of the most frequent n-grams
occurring in a document, ordered by frequency. Fingerprints are
compared with a simple out-of-place metric. See the article for more
details.
-
+
Considerable effort went into making this implementation fast and
efficient. The language guesser processes over 100 documents/second on
a simple PC, which makes it practical for many uses. It was developed