#============================================================================
# Enca v1.12 (2009-10-29) guess and convert encoding of text files
# Copyright (C) 2000-2003 David Necas (Yeti) <yeti@physics.muni.cz>
# Copyright (C) 2009 Michal Cihar <michal@cihar.com>
#============================================================================
Contents
0. Developing programs utilizing libenca
1. How to add a new charset/encoding to libenca
2. How to add a new surface to libenca
3. How to add a new language to libenca
4. Automake, autoconf, libtool, ... note
0. Developing programs utilizing libenca
****************************************
* Look at libenca API documentation in devel-docs/html.
* Look at the enca source to see how it uses libenca.
Note that enca is quite a simple application (practically all libenca
interaction is in src/enca.c). It is single-threaded and uses one
language and one analyser all the time. Provided each thread has its own
analyser, libenca should be thread-safe (untested).
* Take names starting with ENCA, Enca, enca, _ENCA, _Enca, and _enca
as reserved.
* pkg-config is supported: you can use PKG_CHECK_MODULES to check for libenca
in your configure scripts.
1. How to add a new charset/encoding
************************************
Optional steps are marked `[optional]'.
iconvcap.c:
* Add a new test (even if you are 100% sure iconv will never support it).
See the top of iconvcap.c for some documentation on how it works.
tools/encodings.dat:
* Add a new entry.
* Use @ICONV_NAME_<name>@ (as it will appear in iconvcap output) for
iconv names.
tools/iconvenc.null:
* Add it (with NULL)
Specifically, for regular 8bit (language dependent) charsets:
lib/unicodemap.c:
* Add a new map to Unicode (UCS-2) unicode_map_...[].
* Add a new UNICODE_MAP[] entry.
lib/filters.c: [optional]
* Create a new filter or make an alias of an existing filter.
lib/lang_??.c:
* Add the new encoding to some existing language(s).
* Add appropriate filters or hooks [optional].
data/maps/??.map:
* Add a new map to Unicode (UCS-2)
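As an illustration of what a map to Unicode (UCS-2) looks like, here is a
minimal, self-contained sketch. It is not the actual layout used in
lib/unicodemap.c: the names toy_unicode_map, TOY_UNICODE_MAP, TOY_NOT_A_CHAR
and toy_to_ucs2 are hypothetical, and the table covers only the first eight
upper-half characters of ISO-8859-2 instead of a complete charset.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical marker for bytes that have no Unicode equivalent. */
#define TOY_NOT_A_CHAR 0xFFFFu

/* Hypothetical map descriptor; the real structure may differ. */
struct toy_unicode_map {
  const char *name;             /* charset name */
  size_t tstart;                /* first byte value covered by the table */
  size_t tlen;                  /* number of table entries */
  const unsigned short *table;  /* UCS-2 code points for tstart, tstart+1, ... */
};

/* ISO-8859-2 bytes 0xA0..0xA7 only; a real map would go up to 0xFF. */
static const unsigned short toy_map_latin2_part[] = {
  0x00A0, 0x0104, 0x02D8, 0x0141, 0x00A4, 0x013D, 0x015A, 0x00A7
};

static const struct toy_unicode_map TOY_UNICODE_MAP[] = {
  { "TOY-LATIN2", 0xA0, 8, toy_map_latin2_part },
};

/* Convert one byte to UCS-2; bytes below tstart map to themselves. */
unsigned int toy_to_ucs2(const struct toy_unicode_map *map, unsigned char c)
{
  assert(map != NULL);
  if (c < map->tstart)
    return c;
  if ((size_t)c - map->tstart < map->tlen)
    return map->table[c - map->tstart];
  return TOY_NOT_A_CHAR;
}
```

Storing only the part of the table that differs from the identity mapping
keeps the data small; whether this matches the real tables should be checked
against the existing entries in lib/unicodemap.c.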
Specifically, for multibyte encodings:
lib/multibyte.c:
* Create a new check function.
* Put it into appropriate ascii/8bit/binary test group
ENCA_MULTIBYTE_TESTS_ASCII[], ENCA_MULTIBYTE_TESTS_8BIT[],
ENCA_MULTIBYTE_TESTS_BINARY[].
* Put strict tests (i.e. tests which may fail) first, looks-like tests
last.
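The ordering rule above (strict tests first, looks-like tests last) can be
sketched as follows. This is a standalone toy, not the code in
lib/multibyte.c: the check function signature, the mb_test structure, the
TOY_TESTS[] list and toy_guess() are all assumptions made for illustration,
and the UTF-8 validator is simplified (it does not reject overlong 3-byte
and 4-byte forms or surrogates).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical check signature: nonzero means "this encoding was detected". */
typedef int (*check_fn)(const unsigned char *buf, size_t len);

struct mb_test {
  const char *name;
  check_fn check;
};

/* Strict test: any malformed sequence makes it fail. */
static int check_utf8_strict(const unsigned char *buf, size_t len)
{
  size_t i = 0, multibyte = 0;

  assert(buf != NULL);
  while (i < len) {
    unsigned char c = buf[i++];
    size_t need;

    if (c < 0x80)
      continue;                       /* plain ASCII */
    else if (c >= 0xC2 && c <= 0xDF)
      need = 1;
    else if (c >= 0xE0 && c <= 0xEF)
      need = 2;
    else if (c >= 0xF0 && c <= 0xF4)
      need = 3;
    else
      return 0;                       /* invalid lead byte */
    if (len - i < need)
      return 0;                       /* truncated sequence */
    while (need-- > 0) {
      if ((buf[i++] & 0xC0) != 0x80)
        return 0;                     /* not a continuation byte */
    }
    multibyte++;
  }
  return multibyte > 0;               /* pure ASCII is not reported as UTF-8 */
}

/* Looks-like test: a heuristic that only counts zero high bytes. */
static int check_ucs2be_lookslike(const unsigned char *buf, size_t len)
{
  size_t i, zero_high = 0;

  assert(buf != NULL);
  if (len < 2 || len % 2 != 0)
    return 0;
  for (i = 0; i < len; i += 2) {
    if (buf[i] == 0)
      zero_high++;
  }
  return 2 * zero_high > len / 2;     /* more than half of the units */
}

/* Strict tests come first so a heuristic cannot shadow a definite answer. */
static const struct mb_test TOY_TESTS[] = {
  { "UTF-8",   check_utf8_strict },
  { "UCS-2BE", check_ucs2be_lookslike },
};

/* Return the name of the first matching test, or NULL if none matches. */
const char *toy_guess(const unsigned char *buf, size_t len)
{
  size_t i;

  for (i = 0; i < sizeof(TOY_TESTS) / sizeof(TOY_TESTS[0]); i++) {
    if (TOY_TESTS[i].check(buf, len))
      return TOY_TESTS[i].name;
  }
  return NULL;
}
```

Because the first matching test wins, a permissive heuristic placed early
would hide the result of a stricter test placed after it, which is the
reason for the ordering.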
2. How to add a new surface
***************************
* Try to ask the author what to do, since this may be complicated, or
* Hack. Basically, it must be added to the EncaSurface enum in lib/enca.h
and to SURFACE_INFO[] in lib/encnames.c, and a detection method must be
added to lib/guess.c. Now the most complicated part: this new method must
be used ``in the right places'' in make_guess() in lib/guess.c.
3. How to add a new language
****************************
Create a new language file:
* Create a new lib/lang_....c file by copying an existing one (use the
locale code for the name).
* Fill in all encoding and occurrence data, create filters and hooks (see
filters.c too). You can do it manually, but look at how it is done for
existing languages in data/* and read data/README.
lib/internal.h:
* Add new ENCA_LANGUAGE_....
src/lang.c:
* Add a new LANGUAGE_LIST[] entry pointing to the ENCA_LANGUAGE_....
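The registration steps above amount to defining a language record and adding
a pointer to it to a list that is searched by locale code. A minimal sketch
of that pattern follows. The structure fields, the "xx" language, its charset
list and the toy_* names are hypothetical; the real language structure holds
considerably more data (occurrence tables, filters, hooks).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, heavily reduced language record. */
struct toy_language {
  const char *name;             /* locale code */
  const char *humanname;        /* name shown to users */
  size_t ncharsets;             /* number of entries in csnames */
  const char *const *csnames;   /* charsets this language may use */
};

static const char *const TOY_XX_CHARSETS[] = { "ISO-8859-2", "CP1250" };

/* Would live in the new lib/lang_xx.c file. */
static const struct toy_language TOY_LANGUAGE_XX = {
  "xx", "Example language", 2, TOY_XX_CHARSETS
};

/* The list to which the new entry is added. */
static const struct toy_language *const TOY_LANGUAGE_LIST[] = {
  &TOY_LANGUAGE_XX,
};

/* Find a language by locale code; returns NULL when it is not registered. */
const struct toy_language *toy_find_language(const char *code)
{
  size_t i;

  assert(code != NULL);
  for (i = 0; i < sizeof(TOY_LANGUAGE_LIST) / sizeof(TOY_LANGUAGE_LIST[0]); i++) {
    if (strcmp(TOY_LANGUAGE_LIST[i]->name, code) == 0)
      return TOY_LANGUAGE_LIST[i];
  }
  return NULL;
}
```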
4. Automake, autoconf, libtool, ... note
****************************************
If you run ./autogen.sh and it finishes OK, you are lucky and can expect
things to work.
You have to give --enable-maintainer-mode to ./configure (or ./autogen.sh) to
build dists and/or the strange stuff in tools/, data/, tests/, and
devel-docs/.