author | Petar Petrov <slackalaxy@gmail.com> | 2022-10-08 18:13:17 +0100 |
---|---|---|
committer | Willy Sudiarto Raharjo <willysr@slackbuilds.org> | 2022-10-15 10:47:28 +0700 |
commit | 1f43c07f3ce63f69512eda9410131fad0d266ea5 (patch) | |
tree | 5ed4ffaf5afc8a991dfa015786554799d23483ef /academic | |
parent | d5622d171bc7a0a532386d1d82232d3c0b384275 (diff) |
academic/muscle5: Added (MUSCLE 5: Next-generation MUSCLE)
Signed-off-by: Willy Sudiarto Raharjo <willysr@slackbuilds.org>
Diffstat (limited to 'academic')
-rw-r--r-- | academic/muscle5/README | 28 |
-rw-r--r-- | academic/muscle5/References | 5 |
-rw-r--r-- | academic/muscle5/muscle5.1 | 93 |
-rw-r--r-- | academic/muscle5/muscle5.SlackBuild | 118 |
-rw-r--r-- | academic/muscle5/muscle5.info | 10 |
-rw-r--r-- | academic/muscle5/slack-desc | 19 |
6 files changed, 273 insertions, 0 deletions
diff --git a/academic/muscle5/README b/academic/muscle5/README
new file mode 100644
index 0000000000000..bdea0f68e6d36
--- /dev/null
+++ b/academic/muscle5/README
@@ -0,0 +1,28 @@
+MUSCLE 5: Next-generation MUSCLE
+
+Muscle v5 is a major re-write of MUSCLE based on new algorithms.
+
+* Highest accuracy, scalable to thousands of sequences:
+Compared to previous versions, Muscle v5 is much more accurate, is often
+faster, and scales to much larger datasets. At the time of writing (late
+2021), Muscle v5 has the highest scores on multiple alignment benchmarks
+including Balibase, Bralibase, Prefab and Balifam. It can align tens of
+thousands of sequences with high accuracy on a low-cost commodity
+computer (say, an 8-core Intel CPU with 32 Gb RAM). On large datasets,
+Muscle v5 is 20-30% more accurate than MAFFT and Clustal-Omega.
+
+* Alignment ensembles:
+Muscle v5 can generate ensembles of high-accuracy alternative
+alignments. All replicates have equal average accuracy on benchmark
+test, including the MSA made with default parameters. By comparing
+results of downstream analysis (trees, structure prediction...) on
+different replicates, you can assess the effects of alignment errors on
+your study.
+
+* Manual:
+https://drive5.com/muscle5/manual/
+
+* Reference (included in the package)
+R.C. Edgar (2021) "MUSCLE v5 enables improved estimates of phylogenetic
+tree confidence by ensemble bootstrapping"
+https://www.biorxiv.org/content/10.1101/2021.06.20.449169v1.full.pdf
diff --git a/academic/muscle5/References b/academic/muscle5/References
new file mode 100644
index 0000000000000..e11f73531f734
--- /dev/null
+++ b/academic/muscle5/References
@@ -0,0 +1,5 @@
+References
+
+R.C. Edgar (2021) "MUSCLE v5 enables improved estimates of phylogenetic
+tree confidence by ensemble bootstrapping"
+https://www.biorxiv.org/content/10.1101/2021.06.20.449169v1.full.pdf
diff --git a/academic/muscle5/muscle5.1 b/academic/muscle5/muscle5.1
new file mode 100644
index 0000000000000..d1c2661ec23d8
--- /dev/null
+++ b/academic/muscle5/muscle5.1
@@ -0,0 +1,93 @@
+.\" DO NOT MODIFY THIS FILE! It was generated by help2man 1.48.5.
+.TH MUSCLE "1" "January 2022" "muscle 5.1" "User Commands"
+.SH NAME
+muscle \- Multiple alignment program of protein sequences
+.SH DESCRIPTION
+MUSCLE is a multiple alignment program for protein sequences. MUSCLE
+stands for multiple sequence comparison by log-expectation. In the
+authors tests, MUSCLE achieved the highest scores of all tested
+programs on several alignment accuracy benchmarks, and is also one of
+the fastest programs out there.
+.SH USAGE
+.SS "Align FASTA input, write aligned FASTA (AFA) output:"
+.IP
+muscle \fB\-align\fR input.fa \fB\-output\fR aln.afa
+.PP
+Align large input using Super5 algorithm if \fB\-align\fR is too expensive,
+typically needed with more than a few hundred sequences:
+.IP
+muscle \fB\-super5\fR input.fa \fB\-output\fR aln.afa
+.SS "Single replicate alignment:"
+.IP
+muscle \fB\-align\fR input.fa \fB\-perm\fR PERM \fB\-perturb\fR SEED \fB\-output\fR aln.afa
+muscle \fB\-super5\fR input.fa \fB\-perm\fR PERM \fB\-perturb\fR SEED \fB\-output\fR aln.afa
+.IP
+PERM is guide tree permutation none, abc, acb, bca (default none).
+SEED is perturbation seed 0, 1, 2... (default 0 = don't perturb).
+.PP
+Ensemble of replicate alignments, output in Ensemble FASTA (EFA) format,
+EFA has one aligned FASTA for each replicate with header line "<PERM.SEED":
+.IP
+muscle \fB\-align\fR input.fa \fB\-stratified\fR \fB\-output\fR stratified_ensemble.efa
+muscle \fB\-align\fR input.fa \fB\-diversified\fR \fB\-output\fR diversified_ensemble.afa
+.HP
+\fB\-replicates\fR N
+.IP
+Number of replicates, defaults 4, 100, 100 for stratified,
+.IP
+diversified, resampled. With \fB\-stratified\fR there is one
+replicate per guide tree permutation, total is 4 x N.
+.PP
+Generate resampled ensemble from existing ensemble by sampling columns
+with replacement:
+.IP
+muscle \fB\-resample\fR ensemble.efa \fB\-output\fR resampled.efa
+.HP
+\fB\-maxgapfract\fR F
+.IP
+Maximum fraction of gaps in a column (F=0..1, default 0.5).
+.HP
+\fB\-minconf\fR CC
+.IP
+Minimum column confidence (CC=0..1, default 0.5).
+.PP
+If ensemble output filename has @, then one FASTA file is generated
+for each replicate where @ is replaced by perm.s, otherwise all replicates
+are written to one EFA file.
+.SS "Calculate disperson of an ensemble:"
+.IP
+muscle \fB\-disperse\fR ensemble.efa
+.SS "Extract replicate with highest total CC (diversified input recommended):"
+.IP
+muscle \fB\-maxcc\fR ensemble.efa \fB\-output\fR maxcc.afa
+.SS "Extract aligned FASTA files from EFA file:"
+.IP
+muscle \fB\-efa_explode\fR ensemble.efa
+.SS "Convert FASTA to EFA, input has one filename per line:"
+.IP
+muscle \fB\-fa2efa\fR filenames.txt \fB\-output\fR ensemble.efa
+.PP
+Update ensemble by adding two sequences of digits to each replicate, digits
+are column confidence (CC) values, e.g. "73" means CC=0.73, "++" is CC=1.0:
+.IP
+muscle \fB\-addconfseqs\fR ensemble.efa \fB\-output\fR ensemble_cc.efa
+.PP
+Calculate letter confidence (LC) values, \fB\-ref\fR specifies the alignment to
+compare against the ensemble (e.g. from \fB\-maxcc\fR), output is in aligned
+FASTA format with LC values 0, 1 ... 9 instead of letters:
+.IP
+muscle \fB\-letterconf\fR ensemble.efa \fB\-ref\fR aln.afa \fB\-output\fR letterconf.afa
+.HP
+\fB\-html\fR aln.html
+.IP
+Alignment colored by LC in HTML format.
+.HP
+\fB\-jalview\fR aln.features
+.IP
+Jalview feature file with LC values and colors.
+.SS "More documentation at:"
+.IP
+https://drive5.com/muscle
+.SH AUTHOR
+ This manpage was written by Andreas Tille for the Debian distribution and
+ can be used for any other usage of the program.
diff --git a/academic/muscle5/muscle5.SlackBuild b/academic/muscle5/muscle5.SlackBuild
new file mode 100644
index 0000000000000..541a2182a3f3b
--- /dev/null
+++ b/academic/muscle5/muscle5.SlackBuild
@@ -0,0 +1,118 @@
+#!/bin/bash
+
+# Slackware build script for muscle5
+
+# Copyright 2022 Petar Petrov slackalaxy@gmail.com
+# All rights reserved.
+#
+# Redistribution and use of this script, with or without modification, is
+# permitted provided that the following conditions are met:
+#
+# 1. Redistributions of this script must retain the above copyright
+#    notice, this list of conditions and the following disclaimer.
+#
+# THIS SOFTWARE IS PROVIDED BY THE AUTHOR "AS IS" AND ANY EXPRESS OR IMPLIED
+# WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
+# MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO
+# EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
+# PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS;
+# OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
+# WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR
+# OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF
+# ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+cd $(dirname $0) ; CWD=$(pwd)
+
+PRGNAM=muscle5
+VERSION=${VERSION:-5.1}
+BUILD=${BUILD:-1}
+TAG=${TAG:-_SBo}
+PKGTYPE=${PKGTYPE:-tgz}
+
+SRCNAM=muscle
+
+if [ -z "$ARCH" ]; then
+  case "$( uname -m )" in
+    i?86) ARCH=i586 ;;
+    arm*) ARCH=arm ;;
+       *) ARCH=$( uname -m ) ;;
+  esac
+fi
+
+# If the variable PRINT_PACKAGE_NAME is set, then this script will report what
+# the name of the created package would be, and then exit. This information
+# could be useful to other scripts.
+if [ ! -z "${PRINT_PACKAGE_NAME}" ]; then
+  echo "$PRGNAM-$VERSION-$ARCH-$BUILD$TAG.$PKGTYPE"
+  exit 0
+fi
+
+TMP=${TMP:-/tmp/SBo}
+PKG=$TMP/package-$PRGNAM
+OUTPUT=${OUTPUT:-/tmp}
+
+if [ "$ARCH" = "i586" ]; then
+  SLKCFLAGS="-O2 -march=i586 -mtune=i686"
+  LIBDIRSUFFIX=""
+elif [ "$ARCH" = "i686" ]; then
+  SLKCFLAGS="-O2 -march=i686 -mtune=i686"
+  LIBDIRSUFFIX=""
+elif [ "$ARCH" = "x86_64" ]; then
+  SLKCFLAGS="-O2 -fPIC"
+  LIBDIRSUFFIX="64"
+else
+  SLKCFLAGS="-O2"
+  LIBDIRSUFFIX=""
+fi
+
+set -e
+
+rm -rf $PKG
+mkdir -p $TMP $PKG $OUTPUT
+cd $TMP
+rm -rf $SRCNAM-$VERSION
+tar xvf $CWD/$SRCNAM-$VERSION.tar.gz
+cd $SRCNAM-$VERSION
+
+chown -R root:root .
+find -L . \
+ \( -perm 777 -o -perm 775 -o -perm 750 -o -perm 711 -o -perm 555 \
+  -o -perm 511 \) -exec chmod 755 {} \; -o \
+ \( -perm 666 -o -perm 664 -o -perm 640 -o -perm 600 -o -perm 444 \
+  -o -perm 440 -o -perm 400 \) -exec chmod 644 {} \;
+
+cd src
+
+# do not create static executable
+sed -i "s:LDFLAGS += -static:#LDFLAGS += -static:" Makefile
+make CFLAGS="$SLKCFLAGS" \
+CXXFLAGS="$SLKCFLAGS"
+
+install -D -m755 Linux/$SRCNAM $PKG/usr/bin/$PRGNAM
+cd ..
+
+# Thanks to Debian for the man page
+mkdir -p $PKG/usr/man/man1
+cp $CWD/$PRGNAM.1 $PKG/usr/man/man1/$PRGNAM.1
+
+# The Makefile strips the binary...
+#find $PKG -print0 | xargs -0 file | grep -e "executable" -e "shared object" | grep ELF \
+#  | cut -f 1 -d : | xargs strip --strip-unneeded 2> /dev/null || true
+
+find $PKG/usr/man -type f -exec gzip -9 {} \;
+for i in $( find $PKG/usr/man -type l ) ; do ln -s $( readlink $i ).gz $i.gz ; rm $i ; done
+
+mkdir -p $PKG/usr/doc/$PRGNAM-$VERSION
+cp -a \
+  CONTRIBUTING.md LICENSE README.md \
+  $PKG/usr/doc/$PRGNAM-$VERSION
+
+cat $CWD/$PRGNAM.SlackBuild > $PKG/usr/doc/$PRGNAM-$VERSION/$PRGNAM.SlackBuild
+cat $CWD/References > $PKG/usr/doc/$PRGNAM-$VERSION/References
+
+mkdir -p $PKG/install
+cat $CWD/slack-desc > $PKG/install/slack-desc
+
+cd $PKG
+/sbin/makepkg -l y -c n $OUTPUT/$PRGNAM-$VERSION-$ARCH-$BUILD$TAG.$PKGTYPE
diff --git a/academic/muscle5/muscle5.info b/academic/muscle5/muscle5.info
new file mode 100644
index 0000000000000..1749642b988a9
--- /dev/null
+++ b/academic/muscle5/muscle5.info
@@ -0,0 +1,10 @@
+PRGNAM="muscle5"
+VERSION="5.1"
+HOMEPAGE="https://github.com/rcedgar/muscle"
+DOWNLOAD="https://github.com/rcedgar/muscle/archive/v5.1/muscle-5.1.tar.gz"
+MD5SUM="99b5ef38a119994e7a8f0ea7a12b5987"
+DOWNLOAD_x86_64=""
+MD5SUM_x86_64=""
+REQUIRES=""
+MAINTAINER="Petar Petrov"
+EMAIL="slackalaxy@gmail.com"
diff --git a/academic/muscle5/slack-desc b/academic/muscle5/slack-desc
new file mode 100644
index 0000000000000..bc8ca327050a3
--- /dev/null
+++ b/academic/muscle5/slack-desc
@@ -0,0 +1,19 @@
+# HOW TO EDIT THIS FILE:
+# The "handy ruler" below makes it easier to edit a package description.
+# Line up the first '|' above the ':' following the base package name, and
+# the '|' on the right side marks the last column you can put a character in.
+# You must make exactly 11 lines for the formatting to be correct. It's also
+# customary to leave one space after the ':' except on otherwise blank lines.
+
+       |-----handy-ruler------------------------------------------------------|
+muscle5: muscle5 (MUSCLE 5: Next-generation MUSCLE)
+muscle5:
+muscle5: Muscle v5 is a major re-write of MUSCLE based on new algorithms.
+muscle5: Compared to previous versions, Muscle v5 is much more accurate,
+muscle5: faster, and scales to much larger datasets.
+muscle5:
+muscle5: https://drive5.com/muscle5/
+muscle5: https://drive5.com/muscle5/manual/
+muscle5:
+muscle5:
+muscle5:
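Note on the "do not create static executable" step in muscle5.SlackBuild: the script comments out the static link flag with sed before running make. The sketch below applies that same substitution to a throwaway Makefile so its effect can be inspected in isolation. The scratch Makefile contents and the `workdir` and `result` names are assumptions made for illustration; they are not taken from the upstream muscle source.

```shell
#!/bin/bash
# Apply the sed substitution from muscle5.SlackBuild to a scratch Makefile.
# Assumption: this two-line Makefile is a stand-in, not the upstream file.
set -e

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

cat > "$workdir/Makefile" <<'EOF'
CXXFLAGS := -O3
LDFLAGS += -static
EOF

# Same expression the SlackBuild runs inside the src directory:
# prefix the matching line with '#' so the linker flag is no longer set.
sed -i "s:LDFLAGS += -static:#LDFLAGS += -static:" "$workdir/Makefile"

# Keep the edited file in a variable so it can be inspected after cleanup.
result=$(cat "$workdir/Makefile")
echo "$result"
```

Only the line containing `LDFLAGS += -static` is changed; other lines pass through untouched. If upstream ever rewords that line, the substitution silently matches nothing, so checking the result (as the echo does here) is a reasonable precaution when bumping the version.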