author | Petar Petrov <slackalaxy@gmail.com> | 2018-12-24 09:09:59 +0700 |
---|---|---|
committer | Willy Sudiarto Raharjo <willysr@slackbuilds.org> | 2018-12-24 09:09:59 +0700 |
commit | f6b1cc61cacf8d462a563b885065902a78d72fa7 (patch) | |
tree | 16b50bc50f77e1255fb29274f52b8217beec1270 /academic/clark-ugene/README | |
parent | 6e7811910300d45a4b302c25204c7b97ec24019c (diff) |
academic/clark-ugene: Added (supervised sequence classification).
Signed-off-by: Willy Sudiarto Raharjo <willysr@slackbuilds.org>
Diffstat (limited to 'academic/clark-ugene/README')
-rw-r--r-- | academic/clark-ugene/README | 39 |
1 files changed, 39 insertions, 0 deletions
diff --git a/academic/clark-ugene/README b/academic/clark-ugene/README
new file mode 100644
index 0000000000000..4e9386f2ff5fa
--- /dev/null
+++ b/academic/clark-ugene/README
@@ -0,0 +1,39 @@
+This is Ugene's (http://ugene.net/) fork of the CLARK tool
+(http://clark.cs.ucr.edu/Tool/), with supports building DB directly from
+gzip & 7z packed RefSeq files
+
+CLARK: CLAssifier based on Reduced K-mers
+
+The problem of DNA sequence classification is central to several
+application domains in molecular biology, genomics, metagenomics and
+genetics. The problem is computationally challenging due to the size of
+datasets generated by modern sequencing instruments and the growing size
+of reference sequence databases.
+
+CLARK is a novel method for supervised sequence classification based on
+discriminative k-mers. Somewhat unique among other metagenomic and
+genomic classification methods, CLARK provides a confidence score for
+its assignments which can be used in downstream analysis. The utility of
+CLARK is demonstrated on two distinct specific classification problems:
+
+1) the assignment of metagenomic reads to known bacterial genomes
+2) the assignment of BAC clones and transcript to chromosome arms (in
+   the absence of a finished assembly for the reference genome).
+
+Three classifiers or variants in the CLARK framework are provided :
+CLARK (default): created for powerful workstation, it may require a
+significant amount of RAM to run with large database (e.g., all
+bacterial genomes from NCBI/RefSeq). This classifier queries k-mers
+with exact matching.
+
+CLARK-l (light): created for workstations with limited memory, this
+software tool provides precise classification on small metagenomes.
+Indeed, for metagenomics analysis, CLARK-l works with a sparse or
+"light" database (up to 4 GB of RAM) that is built using distant and
+non-overlapping k-mers. This classifier queries k-mers with exact
+matching.
+
+CLARK-S (spaced): created for powerful workstation exploiting spaced k-
+mers, this classifier requires a higher RAM usage than CLARK or CLARK-l,
+but it does offer a higher sensitivity. CLARK-S completes the CLARK
+series of classifiers.
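
The README added by this commit describes classification by exact matching against discriminative k-mers, with a confidence score per assignment. As a rough illustration of that idea only, here is a minimal Python sketch. It is not CLARK's or Ugene's code: the function names (`kmers`, `build_index`, `classify`) are hypothetical, the index is a plain in-memory dictionary, and the confidence formula (best hit count divided by the sum of the best and second-best hit counts) is an assumption of this sketch.

```python
from collections import Counter, defaultdict


def kmers(seq, k):
    """Yield every overlapping k-mer of seq (upper-cased)."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_index(targets, k):
    """Keep only discriminative k-mers: those found in exactly one target.

    targets maps a target name to its reference sequence.
    Returns a dict mapping each discriminative k-mer to its single owner.
    """
    owners = defaultdict(set)
    for name, seq in targets.items():
        for km in kmers(seq, k):
            owners[km].add(name)
    return {km: next(iter(names)) for km, names in owners.items() if len(names) == 1}


def classify(read, index, k):
    """Assign a read to the target with the most exact k-mer hits.

    Returns (target, confidence). The confidence used here, h1 / (h1 + h2)
    with h1 and h2 the best and second-best hit counts, is an assumption of
    this sketch. A read with no hits returns (None, 0.0).
    """
    hits = Counter(index[km] for km in kmers(read, k) if km in index)
    if not hits:
        return None, 0.0
    ranked = hits.most_common(2)
    best, h1 = ranked[0]
    h2 = ranked[1][1] if len(ranked) > 1 else 0
    return best, h1 / (h1 + h2)
```

Dropping k-mers shared between targets at index-build time is what makes each remaining hit unambiguous, so a read can be assigned by simple counting. A real tool would store the index far more compactly than a Python dictionary.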