path: root/lib/enca/man/enca.1
Diffstat (limited to 'lib/enca/man/enca.1')
-rw-r--r-- lib/enca/man/enca.1 867
1 files changed, 0 insertions, 867 deletions
diff --git a/lib/enca/man/enca.1 b/lib/enca/man/enca.1
deleted file mode 100644
index 5eb8e11756..0000000000
--- a/lib/enca/man/enca.1
+++ /dev/null
@@ -1,867 +0,0 @@
-.de XA
-.RS
-.PP
-\\$1
-.RE
-.PP
-..
-.TH "enca" "1" "Sep 2009" "enca 1.11" " "
-.SH "NAME"
-.PP
-enca \-\- detect and convert encoding of text files
-.
-.
-.SH "SYNOPSIS"
-.PP
-\fBenca\fR [\fB\-L\fR \fILANGUAGE\fR] [\fIOPTION\fR]... [\fIFILE\fR]...
-.br
-\fBenconv\fR [\fB\-L\fR \fILANGUAGE\fR] [\fIOPTION\fR]... [\fIFILE\fR]...
-.
-.SH "INTRODUCTION AND EXAMPLES"
-.PP
-If you are lucky enough, the only two things you will ever need to know are:
-command
-.XA "enca \fIFILE\fR"
-will tell you which encoding file \fIFILE\fR uses (without changing it), and
-.XA "enconv \fIFILE\fR"
-will convert file \fIFILE\fR to your locale native encoding.
-To convert the file to some other encoding use the \fB\-x\fR option
-(see \fB\-x\fR entry in section \fBOPTIONS\fR and sections \fBCONVERSION\fR
-and \fBENCODINGS\fR for details).
-.PP
-Both work with multiple files and standard input (output) too.
-E.g.
-.XA "enca \-x latin2 <sometext | lpr"
-assures file `sometext' is in ISO Latin\~2 when it's sent to printer.
-.PP
-The main reason why these commands will fail and turn your files into
-garbage is that Enca needs to know their language to detect the encoding.
-It tries to determine your language and preferred charset from locale
-settings, which might not be what you want.
-.PP
-You can (or have to) use \fB\-L\fR option to tell it the right language.
-Suppose, you downloaded some Russian HTML file,
-`file.htm', it claims it's windows-1251 but it isn't.
-So you run
-.XA "enca \-L ru file.htm"
-and find out it's KOI8\-R (for example).
-Be warned, currently there are not many supported languages (see section
-\fBLANGUAGES\fR).
-.PP
-Another warning concerns the fact that several of Enca's features, namely its
-charset conversion capabilities, strongly depend on what other tools
-are installed on your system (see section \fBCONVERSION\fR)\-\-run
-.XA "enca \-\-version"
-to get list of features (see section \fBFEATURES\fR).
-Also try
-.XA "enca \-\-help"
-to get description of all other Enca options (and to find the rest of this
-manual page redundant).
-.
-.
-.SH "DESCRIPTION"
-.PP
-Enca reads given text files, or standard input when none are given,
-and uses knowledge about their language (must be supported by you)
-and a mixture of parsing, statistical analysis, guessing and
-black magic to determine their encodings, which it then prints to standard
-output (or it confesses it doesn't have any idea what the encoding could be).
-By default, Enca presents results as multiline human-readable descriptions;
-several other formats are available\-\-see Output type selectors below.
-.PP
-Enca can also convert files to some other encoding \fIENC\fR
-when you ask for it\-\-either using a built\-in converter,
-some conversion library, or by calling an external converter.
-.PP
-Enca's primary goal is to be usable unattended, as an automatic conversion
-tool, though it perhaps has not reached this point yet (please see section
-\fBSECURITY\fR).
-.PP
-Please note that, except in rare cases, Enca really has to know the language of input
-files to give you a reliable answer.
-On the other hand, it can then cope quite well with files that are not purely
-textual or even detect charset of text strings inside some binary file;
-of course, it depends on the character of the non-text component.
-.PP
-Enca doesn't care about structure of input files, it views them as a uniform
-piece of text/data. In case of multipart files (e.g. mailboxes), you have to
-use some tool knowing the structure to extract the individual parts first.
-It's the cost of ability to detect encodings of any damaged, incomplete or
-otherwise incorrect files.
-.
-.
-.SH "OPTIONS"
-.PP
-There are several categories of options: operation mode options, output type
-selectors, guessing parameters, conversion parameters, general options and
-listings.
-.PP
-All long options can be abbreviated as long as they are unambiguous,
-mandatory parameters of long options are mandatory for short options too.
-.PP
-.
-.SS Operation modes
-.PP
-are following:
-.TP
-\fB\-c\fR, \fB\-\-auto\-convert\fR
-Equivalent to calling Enca as \fBenconv\fR.
-.sp
-If no output type selector is specified, detect file encodings, guess your
-preferred charset from locales, and convert files to it (only available with
-+target\-charset\-auto feature).
-.TP
-\fB\-g\fR, \fB\-\-guess\fR
-Equivalent to calling Enca as \fBenca\fR.
-.sp
-If no output type selector is specified, detect file encodings and report
-them.
-.PP
-.
-.SS Output type selectors
-.PP
-select what action Enca will take when it determines the encoding;
-most of them just choose between different names, formats and conventions
-how encodings can be printed, but one of them (\fB\-x\fR)
-is special: it tells Enca to recode files to some other
-encoding \fIENC\fR.
-These options are mutually exclusive; if you specify more than one output
-type selector the last one takes precedence.
-.sp
-Several output types represent charset name used by some other program,
-but not all these programs know all the charsets which Enca recognises.
-Be warned, Enca makes no difference between unrecognised charset and
-charset having no name in given namespace in such situations.
-.TP
-\fB\-d\fR, \fB\-\-details\fR
-It used to print a few pages of details about the guessing process, but since
-Enca is just a program linked against Enca library, this is not possible and
-this option is roughly equivalent to \fB\-\-human\-readable\fR,
-except it reports failure reason when Enca doesn't recognize the encoding.
-.TP
-\fB\-e\fR, \fB\-\-enca\-name\fR
-Prints Enca's nice name of the charset, i.e., perhaps the most generally
-accepted and more or less human-readable charset identifier,
-with surfaces appended.
-.sp
-This name is used when calling an external converter, too.
-.TP
-\fB\-f\fR, \fB\-\-human\-readable\fR
-Prints verbal description of the detected charset and surfaces\-\-something
-a human understands best.
-This is the default behaviour.
-.sp
-The precise format is following: the first line contains charset name alone,
-and it's followed by zero or more indented lines containing names of detected
-surfaces.
-This format is not, however, suitable or intended for further
-machine-processing, and the verbal charset descriptions are likely to change
-in the future.
-.TP
-\fB\-i\fR, \fB\-\-iconv\-name\fR
-Prints how \fIiconv\fR(3) (and/or \fIiconv\fR(1)) calls the detected charset.
-More precisely, it prints one, more or less arbitrarily chosen, alias
-accepted by iconv.
-A charset unknown to iconv counts as unknown.
-.sp
-This output type makes sense only when Enca is compiled with iconv support
-(feature +iconv\-interface).
-.TP
-\fB\-r\fR, \fB\-\-rfc1345\-name\fR
-Prints RFC\~1345 charset name.
-When such a name doesn't exist because RFC\~1345 doesn't define given
-encoding, some other name defined in some other RFC or just the name which
-the author considers `the most canonical', is printed.
-.sp
-Since RFC\~1345 doesn't define surfaces, no surface info is appended.
-.TP
-\fB\-m\fR, \fB\-\-mime\-name\fR
-Prints preferred MIME name of detected charset. This is the name you should
-normally use when fixing e-mails or web pages.
-.sp
-A charset not present in http://www.iana.org/assignments/character-sets
-counts as unknown.
-.TP
-\fB\-s\fR, \fB\-\-cstocs\-name\fR
-Prints how \fIcstocs\fR(1) calls the detected charset.
-A charset unknown to cstocs counts as unknown.
-.TP
-\fB\-n\fR, \fB\-\-name=\fR\fIWORD\fR
-Prints charset (encoding) name selected by \fIWORD\fR (can be abbreviated as
-long as it is unambiguous).
-For names listed above, \fB\-\-name=\fR\fIWORD\fR is equivalent to
-\fB\-\-\fR\fIWORD\fR.
-.sp
-Using \fBaliases\fR as the output type causes Enca to print list of all
-accepted aliases of detected charset.
-.TP
-\fB\-x\fR, \fB\-\-convert\-to=\fR[\fB..\fR]\fIENC\fR
-Converts file to encoding \fIENC\fR.
-.sp
-The optional `..' before encoding name has no special meaning, except you can
-use it to remind yourself that, unlike in \fIrecode\fR(1), you should specify
-\fIdesired\fR encoding, instead of current.
-.sp
-You can use \fIrecode\fR(1) recoding chains or any other kind of braindead
-recoding specification for \fIENC\fR, provided that you tell Enca to use some
-tool understanding it for conversion (see section \fBCONVERSION\fR).
-.sp
-When Enca fails to determine the encoding, it prints a warning and leaves the
-file as is; when it is run as a filter it tries to do its best to copy
-standard input to standard output unchanged.
-Nevertheless, you should not rely on it and should make a backup.
-.PP
-.
-.SS Guessing parameters
-.PP
-There's only one: \fB\-L\fR setting language of input files. This option is
-mandatory (but see below).
-.TP
-\fB\-L\fR, \fB\-\-language=\fR\fILANG\fR
-Sets language of input files to \fILANG\fR.
-.sp
-More precisely, \fILANG\fR can be any valid locale name (or alias with
-+locale\-alias feature) of some supported language.
-You can also specify `none' as language name, only multibyte encodings are
-recognised then.
-Run
-.sp
-enca \-\-list languages
-.sp
-to get list of supported languages.
-When you don't specify any language Enca tries to guess your language from
-locale settings and assumes input files use this language.
-See section \fBLANGUAGES\fR for details.
-.PP
-.
-.SS Conversion parameters
-.PP
-give you finer control of how charset conversion will be performed.
-They don't affect anything when \fB\-x\fR is not specified as output type.
-Please see section \fBCONVERSION\fR for the gory conversion details.
-.TP
-\fB\-C\fR, \fB\-\-try\-converters=\fR\fILIST\fR
-Appends comma separated \fILIST\fR to the list of converters that will
-be tried when you ask for conversion.
-Their names can be abbreviated as long as they are unambiguous.
-Run
-.sp
-enca \-\-list converters
-.sp
-to get list of all valid converter names (and see section \fBCONVERSION\fR
-for their description).
-.sp
-The default list depends on how Enca has been compiled, run
-.sp
-enca \-\-help
-.sp
-to find out default converter list.
-.sp
-Note the default list is used only when you don't specify \fB\-C\fR at all.
-Otherwise, the list is built as if it were initially empty and every
-\fB\-C\fR adds new converter(s) to it. Moreover, specifying
-\fBnone\fR as converter name causes clearing the converter list.
-.TP
-\fB\-E\fR, \fB\-\-external\-converter\-program=\fR\fIPATH\fR
-Sets external converter program name to \fIPATH\fR.
-Default external converter depends on how enca has been compiled, and the
-possibility to use external converters may not be available at all.
-Run
-.sp
-enca \-\-help
-.sp
-to find out default converter program in your enca build.
-.PP
-.
-.SS General options
-.PP
-don't fit to other option categories...
-.TP
-\fB\-p\fR, \fB\-\-with\-filename\fR
-Forces Enca to prefix each result with corresponding file name.
-By default, Enca prefixes results with filenames when run on multiple files.
-.sp
-Standard input is printed as \fBSTDIN\fR
-and standard output as \fBSTDOUT\fR
-(the latter can be probably seen in error messages only).
-.TP
-\fB\-P\fR, \fB\-\-no\-filename\fR
-Forces Enca to not prefix results with file names.
-By default, Enca doesn't prefix result with file name when
-run on a single file (including standard input).
-.TP
-\fB\-V\fR, \fB\-\-verbose\fR
-Increases verbosity level (each use increases it by one).
-.sp
-Currently this option is not very useful because different parts of Enca
-respond differently to the same verbosity level, mostly not at all.
-.PP
-.
-.SS Listings
-.PP
-are all terminal, i.e. when Enca encounters some of them it prints
-the required listing and terminates without processing any following options.
-.TP
-\fB\-h\fR, \fB\-\-help\fR
-Prints brief usage help.
-.TP
-\fB\-G\fR, \fB\-\-license\fR
-Prints full Enca license (through a pager, if possible).
-.TP
-\fB\-l\fR, \fB\-\-list=\fR\fIWORD\fR
-Prints list specified by \fIWORD\fR (can be abbreviated as long as it is
-unambiguous).
-Available lists include:
-.sp
-\fBbuilt\-in\-charsets\fR.
-All encodings convertible by built\-in converter, by group
-(both input and output encoding must be from this list and belong to the same
-group for internal conversion).
-.sp
-\fBbuilt\-in\-encodings\fR.
-Equivalent to \fBbuilt\-in\-charsets\fR, but considered obsolete; will
-be accepted with a warning, for a while.
-.sp
-\fBconverters\fR.
-All valid converter names (to be used with \fB\-C\fR).
-.sp
-\fBcharsets\fR.
-All encodings (charsets).
-You can select what names will be printed with \fB\-\-name\fR or any
-name output type selector (of course, only encodings having a name in given
-namespace will be printed then), the selector must be specified \fIbefore\fR
-\fB\-\-list\fR.
-.sp
-\fBencodings\fR.
-Equivalent to \fBcharsets\fR, but considered obsolete; will
-be accepted with a warning, for a while.
-.sp
-\fBlanguages\fR.
-All supported languages together with charsets belonging to them.
-Note output type selects language name style, not charset name style here.
-.sp
-\fBnames\fR.
-All possible values of \fB\-\-name\fR option.
-.sp
-\fBlists\fR.
-All possible values of this option.
-(Crazy?)
-.sp
-\fBsurfaces\fR.
-All surfaces Enca recognises.
-.TP
-\fB\-v\fR, \fB\-\-version\fR
-Prints program version and list of features (see section \fBFEATURES\fR).
-.
-.
-.SH "CONVERSION"
-.PP
-Though Enca has been originally designed as a tool for guessing encoding
-only, it now features several methods of charset conversion.
-You can control which of them will be used with \fB\-C\fR.
-.PP
-Enca sequentially tries converters from the list specified by \fB\-C\fR
-until it finds some that
-is able to perform required conversion or until it exhausts the list.
-You should specify preferred converters first, less preferred later.
-External converter (\fBextern\fR)
-should be always specified last, only as last resort, since it's usually not
-possible to recover when it fails.
-The default list of converters always starts with \fBbuilt\-in\fR and then
-continues with the first one available from: \fBlibrecode\fR, \fBiconv\fR,
-nothing.
-.PP
-It should be noted when Enca says it is not able to perform the
-conversion it only means none of the converters is able to perform it.
-It can be still possible to perform the required conversion in several steps,
-using several converters, but to figure out how, human intelligence is
-probably needed.
-.PP
-.
-.SS Built\-in converter
-.PP
-is the simplest and far the fastest of all, can perform only
-a few byte-to-byte conversions and modifies files directly in place (may
-be considered dangerous, but is pretty efficient). You can get list of
-all encodings it can convert with
-.XA "enca \-\-list built\-in"
-Beside speed, its main advantage (and also disadvantage) is that it doesn't
-care: it simply converts characters having a representation in target
-encoding, doesn't touch anything else and never prints any error message.
-.sp
-This converter can be specified as \fBbuilt\-in\fR with \fB\-C\fR.
-.PP
-.
-.SS Librecode converter
-.PP
-is an interface to GNU recode library, that does the actual recoding job.
-It may or may not be compiled in; run
-.XA "enca \-\-version"
-to find out its availability in your enca build
-(feature +librecode\-interface).
-.sp
-You should be familiar with \fIrecode\fR(1) before using it,
-since recode is a quite sophisticated and powerful charset conversion tool.
-You may run into problems using it together with Enca
-particularly because Enca's support for surfaces is not 100% compatible,
-because recode tries too hard to make the transformation reversible,
-because it sometimes silently ignores I/O errors,
-and because it's incredibly buggy.
-Please see GNU recode info pages for details about recode library.
-.sp
-This converter can be specified as \fBlibrecode\fR with \fB\-C\fR.
-.PP
-.
-.SS Iconv converter
-.PP
-is an interface to the UNIX98 \fIiconv\fR(3)
-conversion functions, that do the actual recoding job.
-It may or may not be compiled in; run
-.XA "enca \-\-version"
-to find out its availability in your enca build
-(feature +iconv\-interface).
-.sp
-While iconv is present on most of today's systems, it only rarely
-offers a useful set of available conversions, the only notable exception
-being iconv from GNU libc.
-It is usually quite picky about surfaces, too (while, at the same time,
-not implementing surface conversion).
-It however probably represents the only standard(ized) tool
-able to perform conversion from/to Unicode.
-Please see iconv documentation for details about its capabilities on
-your particular system.
-.sp
-This converter can be specified as \fBiconv\fR with \fB\-C\fR.
-.PP
-.
-.SS External converter
-.PP
-is an arbitrary external conversion tool that can be specified with
-\fB\-E\fR option (at most one can be defined simultaneously).
-There are some standard, provided together with enca: \fBcstocs\fR,
-\fBrecode\fR, \fBmap\fR, \fBumap\fR, and \fBpiconv\fR.
-All are wrapper scripts: for \fIcstocs\fR(1), \fIrecode\fR(1),
-\fImap\fR(1), \fIumap\fR(1), and \fIpiconv\fR(1).
-.sp
-Please note enca has little control what the external converter really does.
-If you set it to \fB/bin/rm\fR
-you are fully responsible for the consequences.
-.sp
-If you want to make your own converter to use with enca,
-you should know it is always called
-.XA "\fICONVERTER\fR \fIENC_CURRENT\fR \fIENC\fR \fIFILE\fR [\fB\-\fR]"
-where \fICONVERTER\fR is what has been set by \fB\-E\fR,
-\fIENC_CURRENT\fR is detected encoding,
-\fIENC\fR is what has been specified with \fB\-x\fR,
-and \fIFILE\fR is the file to convert, i.e. it is called for each file
-separately.
-The optional fourth parameter, \fB\-\fR, should cause (when present)
-sending result of conversion to standard output instead of overwriting
-the file \fIFILE\fR.
-The converter should also take care of not changing file permissions,
-returning error code\~1 when it fails and cleaning its temporary files.
-Please see the standard external converters for examples.
-.sp
-This converter can be specified as \fBextern\fR with \fB\-C\fR.
-.PP
-.
-.SS Default target charset
-.PP
-The straightforward way of specifying target charset is the \fB\-x\fR
-option, which overrides any defaults.
-When Enca is called as \fBenconv\fR, default target charset is selected
-exactly the same way as \fIrecode\fR(1) does it.
-.PP
-If the \fBDEFAULT_CHARSET\fR environment variable is set, it's used as the
-target charset.
-.PP
-Otherwise, if your system provides the \fInl_langinfo\fR(3) function, current
-locale's native charset is used as the target charset.
-.PP
-When both methods fail, Enca complains and terminates.
-.PP
-.
-.SS Reversibility notes
-.PP
-If reversibility is crucial for you, you shouldn't use enca as converter
-at all (or maybe you can, with very specifically designed \fIrecode\fR(1)
-wrapper).
-Otherwise you should at least know that there are four
-basic means of handling inconvertible character entities:
-.sp
-fail\-\-this is a possibility, too, and incidentally it's exactly what current
-GNU libc iconv implementation does (recode can be also told to do it)
-.sp
-don't touch them\-\-this is what enca internal converter always does and
-recode can do; though it is not reversible, a human being is usually able to
-reconstruct the original (at least in principle)
-.sp
-approximate them\-\-this is what cstocs can do, and recode too, though
-differently; and the best choice if you
-just want to make the accursed text readable
-.sp
-drop them out\-\-this is what both recode and cstocs can do (cstocs can also
-replace these characters by some fixed character instead of mere ignoring);
-useful when the to\-be\-omitted characters contain only noise.
-.sp
-Please consult your favourite converter manual for details of this issue.
-Generally, if you are not lucky enough to have all convertible characters
-in your file, manual intervention is needed anyway.
-.PP
-.
-.SS Performance notes
-.PP
-Poor performance of available converters has been one of main reasons for
-including built\-in converter in enca.
-Try to use it whenever possible, i.e. when files in consideration are
-charset-clean enough or charset-messy enough so that its zero built\-in
-intelligence doesn't matter.
-It requires no extra disk space nor extra memory and can outperform
-\fIrecode\fR(1) more than 10 times on large files and Perl
-version (i.e. the faster one) of \fIcstocs\fR(1) more than 400 times on small
-files (in fact it's almost as fast as mere \fIcp\fR(1)).
-.PP
-Try to avoid external converters when it's not absolutely necessary since
-all the forking and moving stuff around is incredibly slow.
-.
-.
-.SH "ENCODINGS"
-.PP
-You can get list of recognised character sets with
-.XA "enca \-\-list charsets"
-and using \fB\-\-name\fR parameter you can select any name you want to be
-used in the listing.
-You can also list all surfaces with
-.XA "enca \-\-list surfaces"
-Encoding and surface names are case insensitive and non-alphanumeric
-characters are not taken into account.
-However, non-alphanumeric characters are mostly
-not allowed at all. The only allowed are: `\-', `_', `.', `:', and\~`/'
-(as charset/surface separator).
-So `ibm852' and `IBM-852' are the same, while `IBM 852' is not accepted.
-.PP
-.
-.SS Charsets
-.PP
-Following list of recognised charsets uses Enca's names (\fB\-e\fR) and
-verbal descriptions as reported by Enca (\fB\-f\fR):
-.PP
-.TS
-tab (@);
-l l.
-ASCII@7bit ASCII characters
-ISO-8859-2@ISO 8859-2 standard; ISO Latin 2
-ISO-8859-4@ISO 8859-4 standard; Latin 4
-ISO-8859-5@ISO 8859-5 standard; ISO Cyrillic
-ISO-8859-13@ISO 8859-13 standard; ISO Baltic; Latin 7
-ISO-8859-16@ISO 8859-16 standard
-CP1125@MS-Windows code page 1125
-CP1250@MS-Windows code page 1250
-CP1251@MS-Windows code page 1251
-CP1257@MS-Windows code page 1257; WinBaltRim
-IBM852@IBM/MS code page 852; PC (DOS) Latin 2
-IBM855@IBM/MS code page 855
-IBM775@IBM/MS code page 775
-IBM866@IBM/MS code page 866
-baltic@ISO-IR-179; Baltic
-KEYBCS2@Kamenicky encoding; KEYBCS2
-macce@Macintosh Central European
-maccyr@Macintosh Cyrillic
-ECMA-113@Ecma Cyrillic; ECMA-113
-KOI-8_CS_2@KOI8-CS2 code (`T602')
-KOI8-R@KOI8-R Cyrillic
-KOI8-U@KOI8-U Cyrillic
-KOI8-UNI@KOI8-Unified Cyrillic
-TeX@(La)TeX control sequences
-UCS-2@Universal character set 2 bytes; UCS-2; BMP
-UCS-4@Universal character set 4 bytes; UCS-4; ISO-10646
-UTF-7@Universal transformation format 7 bits; UTF-7
-UTF-8@Universal transformation format 8 bits; UTF-8
-CORK@Cork encoding; T1
-GBK@Simplified Chinese National Standard; GB2312
-BIG5@Traditional Chinese Industrial Standard; Big5
-HZ@HZ encoded GB2312
-unknown@Unrecognized encoding
-.TE
-.PP
-where \fBunknown\fR is not any real encoding,
-it's reported when Enca is not able to give a reliable answer.
-.PP
-.
-.SS Surfaces
-.PP
-Enca has some experimental support for so-called surfaces (see below).
-It detects following surfaces (not all can be applied to all charsets):
-.PP
-.TS
-tab (@);
-l l.
-/CR@CR line terminators
-/LF@LF line terminators
-/CRLF@CRLF line terminators
-N.A.@Mixed line terminators
-N.A.@Surrounded by/intermixed with non-text data
-/21@Byte order reversed in pairs (1,2 -> 2,1)
-/4321@Byte order reversed in quadruples (1,2,3,4 -> 4,3,2,1)
-N.A.@Both little and big endian chunks, concatenated
-/qp@Quoted-printable encoded
-.TE
-.PP
-Note some surfaces have N.A. in place of identifier\-\-they
-cannot be specified on command line, they can only be reported by Enca.
-This is intentional because they only inform you why the file cannot be
-considered surface-consistent instead of representing a real surface.
-.PP
-Each charset has its natural surface (called `implied' in recode) which is not
-reported, e.g., for IBM 852 charset it's `CRLF line terminators'.
-For UCS encodings, big endian is considered as natural surface;
-unusual byte orders are constructed from 21 and 4321 permutations:
-2143 is reported simply as 21,
-while 3412 is reported as combination of 4321 and 21.
-.PP
-Doubly-encoded UTF-8 is neither charset nor surface, it's just reported.
-.PP
-.
-.SS About charsets, encodings and surfaces
-.PP
-Charset is a set of character entities while encoding is its representation
-in the terms of bytes and bits.
-In Enca, the word \fIencoding\fR means the same as `representation of text',
-i.e. the relation between sequence of character entities constituting the
-text and sequence of bytes (bits) constituting the file.
-.PP
-So, encoding is both character set and so-called surface
-(line terminators, byte order, combining, Base64 transformation, etc.).
-Nevertheless, it proves convenient to work with some {charset,surface} pairs
-as with genuine charsets.
-So, as in \fIrecode\fR(1), all UCS- and UTF- encodings of Universal character
-set are called charsets.
-Please see recode documentation for more details of this issue.
-.PP
-The only good thing about surfaces is: when you don't start playing with
-them, Enca won't start either, and it will try to behave as much as
-possible as a surface-unaware program, even when talking to recode.
-.PP
-.
-.
-.SH "LANGUAGES"
-.PP
-Enca needs to know the language of input files to work reliably, at least
-in case of regular 8bit encoding.
-Multibyte encodings should be recognised for any Latin, Cyrillic or Greek
-language.
-.PP
-You can (or have to) use \fB\-L\fR option to tell Enca the language.
-Since people most often work with files in the same language for which they
-have configured locales, Enca tries to guess the language by examining
-value of \fBLC_CTYPE\fR and other locale categories
-(please see \fIlocale\fR(7)) and using it for the
-language when you don't specify any.
-Of course, it may be completely wrong and will give you nonsense answers and
-damage your files, so please don't forget to use the \fB\-L\fR option.
-You can also use \fBENCAOPT\fR environment variable to set a default language
-(see section \fBENVIRONMENT\fR).
-.PP
-Following languages are supported by Enca (each language is listed together
-with supported 8bit encodings).
-.PP
-.TS
-tab (@);
-l l.
-Belarussian@CP1251 IBM866 ISO\-8859\-5 KOI8\-UNI maccyr IBM855
-Bulgarian @CP1251 ISO\-8859\-5 IBM855 maccyr ECMA\-113
-Czech @ISO\-8859\-2 CP1250 IBM852 KEYBCS2 macce KOI\-8_CS_2 CORK
-Estonian @ISO\-8859\-4 CP1257 IBM775 ISO\-8859\-13 macce baltic
-Croatian @CP1250 ISO\-8859\-2 IBM852 macce CORK
-Hungarian @ISO\-8859\-2 CP1250 IBM852 macce CORK
-Lithuanian @CP1257 ISO\-8859\-4 IBM775 ISO\-8859\-13 macce baltic
-Latvian @CP1257 ISO\-8859\-4 IBM775 ISO\-8859\-13 macce baltic
-Polish @ISO\-8859\-2 CP1250 IBM852 macce ISO\-8859\-13 ISO\-8859\-16 baltic CORK
-Russian @KOI8\-R CP1251 ISO\-8859\-5 IBM866 maccyr
-Slovak @CP1250 ISO\-8859\-2 IBM852 KEYBCS2 macce KOI\-8_CS_2 CORK
-Slovene @ISO\-8859\-2 CP1250 IBM852 macce CORK
-Ukrainian @CP1251 IBM855 ISO\-8859\-5 CP1125 KOI8\-U maccyr
-Chinese @GBK BIG5 HZ
-none @
-.TE
-.PP
-The special language \fBnone\fR can be shortened to \fB__\fR, it
-contains no 8bit encodings, so only multibyte encodings are detected.
-.PP
-.
-.
-.SH "FEATURES"
-.PP
-Several Enca's features depend on what is available on your system and how
-it was compiled.
-You can get their list with
-.XA "enca \-\-version"
-Plus sign before a feature name means it's available, minus sign means
-this build lacks the particular feature.
-.PP
-\fBlibrecode\-interface\fR.
-Enca has interface to GNU recode library charset conversion functions.
-.sp
-\fBiconv\-interface\fR.
-Enca has interface to UNIX98 iconv charset conversion functions.
-.sp
-\fBexternal\-converter\fR.
-Enca can use external conversion programs (if you have some suitable
-installed).
-.sp
-\fBlanguage\-detection\fR.
-Enca tries to guess language (\fB\-L\fR) from locales. You don't need the
-\fB\-\-language\fR option, at least in principle.
-.sp
-\fBlocale\-alias\fR.
-Enca is able to decrypt locale aliases used for language names.
-.sp
-\fBtarget\-charset\-auto\fR.
-Enca tries to detect your preferred charset from locales.
-Option \fB\-\-auto\-convert\fR and calling Enca as \fBenconv\fR works, at
-least in principle.
-.sp
-\fBENCAOPT\fR.
-Enca is able to correctly parse this environment variable before command line
-parameters. Simple stuff like \fBENCAOPT="\-L uk"\fR will work even without
-this feature.
-.PP
-.
-.
-.SH "ENVIRONMENT"
-.PP
-The variable \fBENCAOPT\fR can hold set of default Enca options.
-Its content is interpreted before command line arguments.
-Unfortunately, this doesn't work everywhere (must have +ENCAOPT
-feature).
-.PP
-\fBLC_CTYPE\fR, \fBLC_COLLATE\fR, \fBLC_MESSAGES\fR
-(possibly inherited from \fBLC_ALL\fR or \fBLANG\fR) is used
-for guessing your language (must have +language-detection feature).
-.PP
-The variable \fBDEFAULT_CHARSET\fR can be used by \fBenconv\fR as the default
-target charset.
-.PP
-.
-.
-.SH "DIAGNOSTICS"
-.PP
-Enca returns exit code\~0 when all input files were successfully processed
-(i.e. all encodings were detected and all files were converted to required
-encoding, if conversion was asked for).
-Exit code\~1 is returned when Enca wasn't able to either guess encoding or
-perform conversion on any input file because it's not clever enough.
-Exit code\~2 is returned in case of serious (e.g. I/O) troubles.
-.PP
-.
-.
-.SH "SECURITY"
-.PP
-It should be possible to let Enca work unattended, it's its goal. However:
-.PP
-There's no warranty the detection works 100%. Don't bet on it, you can easily
-lose valuable data.
-.PP
-Don't use enca (the program), link to libenca instead if you want anything
-resembling security. You have to perform the eventual conversion yourself
-then.
-.PP
-Don't use external converters. Ideally, disable them compile-time.
-.PP
-Be aware of \fBENCAOPT\fR and all the built-in automagic guessing various
-things from environment, namely locales.
-.PP
-.
-.
-.SH "SEE ALSO"
-.PP
-\fIautoconvert\fR(1),
-\fIcstocs\fR(1),
-\fIfile\fR(1),
-\fIiconv\fR(1),
-\fIiconv\fR(3),
-\fInl_langinfo\fR(3),
-\fImap\fR(1),
-\fIpiconv\fR(1),
-\fIrecode\fR(1),
-\fIlocale\fR(5),
-\fIlocale\fR(7),
-\fIltt\fR(1),
-\fIumap\fR(1),
-\fIunicode\fR(7),
-\fIutf-8\fR(7),
-\fIxcode\fR(1)
-.PP
-.
-.
-.SH "KNOWN BUGS"
-.PP
-It has too many \fIunknown\fR bugs.
-.PP
-The idea of using \fBLC_*\fR value for language is certainly braindead.
-However I like it.
-.PP
-It can't backup files before mangling them.
-.PP
-In certain situations, it may behave incorrectly on >31bit file systems
-and/or over NFS (both untested but shouldn't cause problems in practice).
-.PP
-Built\-in converter does not convert character `ch' from \fIKOI8-CS2\fR,
-and possibly some other characters you've probably never heard about anyway.
-.PP
-EOL type recognition works poorly on Quoted-printable encoded files.
-This should be fixed someday.
-.PP
-There are no command line options to tune libenca parameters.
-This is intentional (Enca should DWIM) but sometimes this is a nuisance.
-.PP
-The manual page is too long, especially this section.
-This doesn't matter since nobody does read it.
-.PP
-Send bug reports to <http://bugs.cihar.com/>.
-.
-.
-.SH "TRIVIA"
-.PP
-Enca is Extremely Naive Charset Analyser.
-Nevertheless, the `enc' originally comes from `encoding'
-so the leading\~`e' should be read as in
-`encoding' not as in `extreme'.
-.
-.
-.SH "AUTHORS"
-.PP
-David Necas (Yeti) <yeti@physics.muni.cz>
-.PP
-Michal Cihar <michal@cihar.com>
-.sp
-Unicode data has been generated from various (free) on\-line resources or
-using GNU recode.
-Statistical data has been generated from various texts on the Net, I hope
-character counting doesn't break anyone's copyright.
-.
-.
-.SH "ACKNOWLEDGEMENTS"
-.PP
-Please see the file THANKS in distribution.
-.
-.
-.SH "COPYRIGHT"
-.PP
-Copyright (C) 2000-2003 David Necas (Yeti).
-.PP
-Copyright (C) 2009 Michal Cihar <michal@cihar.com>.
-.sp
-Enca is free software; you can redistribute it and/or modify it
-under the terms of version 2 of the GNU General Public License
-as published by the Free Software Foundation.
-.sp
-Enca is distributed in the hope that it will be useful,
-but WITHOUT ANY WARRANTY; without even the implied warranty
-of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
-See the GNU General Public License for more details.
-.sp
-You should have received a copy of the GNU General Public License
-along with Enca; if not, write to the Free Software Foundation,
-Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
-.
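
Note on the external converter interface: the "External converter" subsection of the removed page says a converter is always called as CONVERTER ENC_CURRENT ENC FILE [-], must write to standard output when the trailing `-' is present, must keep file permissions, must return error code 1 on failure, and must clean up its temporary files. The following is only a minimal sketch of a program honouring that contract, not one of the wrapper scripts shipped with enca. The function names convert and main are hypothetical, and it assumes the encoding names it receives are ones Python's codecs module accepts; Enca names carrying a surface suffix (e.g. a `/CRLF' part) are not handled.

```python
import os
import shutil
import sys
import tempfile


def convert(enc_current, enc, path, to_stdout=False, out=None):
    """Recode `path` from enc_current to enc; return 0 on success, 1 on failure."""
    try:
        with open(path, "rb") as src:
            data = src.read().decode(enc_current).encode(enc)
    except (OSError, LookupError, UnicodeError):
        # Unreadable file, unknown codec name, or inconvertible characters.
        return 1
    if to_stdout:
        # The trailing `-' was given: leave FILE alone, emit the result instead.
        stream = out if out is not None else sys.stdout.buffer
        stream.write(data)
        return 0
    tmp_name = None
    try:
        # Create the temporary file next to FILE so os.replace() never has
        # to cross a filesystem boundary.
        directory = os.path.dirname(os.path.abspath(path))
        fd, tmp_name = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
        shutil.copymode(path, tmp_name)  # keep the original permission bits
        os.replace(tmp_name, path)
        tmp_name = None  # nothing left to clean up
        return 0
    except OSError:
        return 1
    finally:
        if tmp_name is not None and os.path.exists(tmp_name):
            os.remove(tmp_name)  # remove the temporary file after a failure


def main(argv):
    """Parse `CONVERTER ENC_CURRENT ENC FILE [-]` and return the exit code."""
    if len(argv) not in (4, 5) or (len(argv) == 5 and argv[4] != "-"):
        return 1
    return convert(argv[1], argv[2], argv[3], to_stdout=(len(argv) == 5))

# A real script would finish with: sys.exit(main(sys.argv))
```

Saved as an executable script with that final line enabled, such a program could be selected with the -E option and the extern converter named in -C, as the page describes. Decoding and re-encoding strictly means inconvertible characters make the conversion fail, which is the first of the four strategies listed under "Reversibility notes".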