NAME
tconv_ext - tconv extended API
SYNOPSIS
#include <tconv.h>
tconv_t tconv_open_ext(const char *tocodes, const char *fromcodes, tconv_option_t *tconvOptionp);
void tconv_trace_on(tconv_t tconvp);
void tconv_trace_off(tconv_t tconvp);
void tconv_trace(tconv_t tconvp, const char *fmts, ...);
char *tconv_error_set(tconv_t tconvp, const char *msgs);
char *tconv_error(tconv_t tconvp);
char *tconv_fromcode(tconv_t tconvp);
char *tconv_tocode(tconv_t tconvp);
short tconv_helper(tconv_t tconvp,
                   void *contextp,
                   short (*producerp)(void *contextp, char **bufpp, size_t *countlp, short *eofbp),
                   short (*consumerp)(void *contextp, char *bufp, size_t countl, short eofb, size_t *resultlp)
                  );
DESCRIPTION
The tconv extended API provides additional entry points to query or control how tconv behaves. Because tconv is a generic layer on top of iconv(), ICU, etc., additional semantics are needed.
METHODS
tconv_open_ext
tconv_t tconv_open_ext(const char *tocodes, const char *fromcodes, tconv_option_t *tconvOptionp);
typedef void (*tconvTraceCallback_t)(void *userDatavp, const char *msgs);
typedef struct tconv_option {
  tconv_charset_t *charsetp;
  tconv_convert_t *convertp;
  tconvTraceCallback_t traceCallbackp;
  void *traceUserDatavp;
  const char *fallbacks;
} tconv_option_t;
tconv supports two engine types: one for charset detection, one for character conversion. Each engine has its own option structure:
- charsetp
-
Describes the charset engine options.
- convertp
-
Describes the conversion engine options.
Logging is provided through the genericLogger package, and the developer may provide a function pointer with an associated context:
- traceCallbackp
-
A function pointer.
- traceUserDatavp
-
Opaque context passed to the function pointer.
- fallbacks
-
Fallback charset, used when the user gave none and the guess failed.
If tconvOptionp is NULL, defaults will apply. Otherwise, if charsetp is NULL, charset defaults apply; if convertp is NULL, conversion defaults apply; and if traceCallbackp is NULL, no logging is possible.
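As an illustration of the option structure, here is a minimal sketch that keeps both engine defaults and installs a trace callback. The typedefs are stand-ins copied from the declarations above so that the sketch compiles without tconv.h; my_trace_ctx_t, my_trace, my_option_init and the "UTF-8" fallback value are hypothetical names and values chosen for the example.

```c
#include <stdio.h>
#include <stddef.h>

/* Stand-ins mirroring the declarations above, so that this sketch
   compiles on its own; real code would include <tconv.h> instead. */
typedef struct tconv_charset tconv_charset_t;
typedef struct tconv_convert tconv_convert_t;
typedef void (*tconvTraceCallback_t)(void *userDatavp, const char *msgs);
typedef struct tconv_option {
  tconv_charset_t     *charsetp;
  tconv_convert_t     *convertp;
  tconvTraceCallback_t traceCallbackp;
  void                *traceUserDatavp;
  const char          *fallbacks;
} tconv_option_t;

/* Hypothetical user context: a prefix and a message counter. */
typedef struct my_trace_ctx {
  const char *prefixs;
  int         counti;
} my_trace_ctx_t;

/* Trace callback matching tconvTraceCallback_t. */
static void my_trace(void *userDatavp, const char *msgs)
{
  my_trace_ctx_t *ctxp = (my_trace_ctx_t *) userDatavp;

  ctxp->counti++;
  fprintf(stderr, "[%s] %s\n", ctxp->prefixs, msgs);
}

/* Fill an option structure: engine defaults, custom trace, fallback. */
static void my_option_init(tconv_option_t *optionp, my_trace_ctx_t *ctxp)
{
  optionp->charsetp        = NULL;     /* charset engine defaults */
  optionp->convertp        = NULL;     /* convert engine defaults */
  optionp->traceCallbackp  = my_trace;
  optionp->traceUserDatavp = ctxp;
  optionp->fallbacks       = "UTF-8";  /* illustrative fallback charset */
}
```

The filled structure would then be given as the third argument of tconv_open_ext().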
charset engine
A charset engine may support three entry points:
typedef void *(*tconv_charset_new_t) (tconv_t tconvp, void *optionp);
typedef char *(*tconv_charset_run_t) (tconv_t tconvp, void *contextp, char *bytep, size_t bytel);
typedef void (*tconv_charset_free_t)(tconv_t tconvp, void *contextp);
All entry points start with a tconvp pointer, which they can use to trigger logging or to set an error.
The new entry point is optional; it receives a pointer to a data area that is opaque from tconv's point of view, and returns a charset specific opaque context. If new is not NULL, then free must not be NULL, and it will be called with the charset specific context pointer returned by new. When new is NULL, the charset specific context will be NULL.
The only required entry point is run, which receives a pointer to bytes and the number of bytes.
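A run entry point could look like the following sketch, which only recognizes a few byte order marks. The tconv_t typedef is a stand-in for the opaque handle, my_charset_run is a hypothetical name, and two points are assumptions not stated above: that returning NULL means "no guess", and that returning a static string is acceptable. UTF-32 marks are deliberately ignored.

```c
#include <stddef.h>

/* Stand-in for the opaque handle; real code would include <tconv.h>. */
typedef struct tconv *tconv_t;

/* Same shape as tconv_charset_run_t above. */
typedef char *(*tconv_charset_run_t)(tconv_t tconvp, void *contextp, char *bytep, size_t bytel);

/* Hypothetical run entry point: recognizes a few byte order marks and
   returns NULL when nothing matches (assumed here to mean "no guess"). */
static char *my_charset_run(tconv_t tconvp, void *contextp, char *bytep, size_t bytel)
{
  const unsigned char *p = (const unsigned char *) bytep;

  (void) tconvp;    /* a real engine could use it for tracing or errors */
  (void) contextp;  /* NULL here, since no new entry point is provided  */

  if (bytel >= 3 && p[0] == 0xEF && p[1] == 0xBB && p[2] == 0xBF) {
    return "UTF-8";
  }
  if (bytel >= 2 && p[0] == 0xFE && p[1] == 0xFF) {
    return "UTF-16BE";
  }
  if (bytel >= 2 && p[0] == 0xFF && p[1] == 0xFE) {
    return "UTF-16LE";
  }
  return NULL;
}
```

Since no new entry point is needed here, the matching free entry point can stay NULL as well.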
charsetp must point to a structure defined as:
typedef struct tconv_charset {
  enum {
    TCONV_CHARSET_EXTERNAL = 0,
    TCONV_CHARSET_PLUGIN,
    TCONV_CHARSET_ICU,
    TCONV_CHARSET_CCHARDET,
  } charseti;
  union {
    tconv_charset_external_t external;
    tconv_charset_plugin_t plugin;
    tconv_charset_ICU_option_t *ICUOptionp;
    tconv_charset_cchardet_option_t *cchardetOptionp;
  } u;
} tconv_charset_t;
i.e. a charset engine can be of four types:
- TCONV_CHARSET_EXTERNAL
-
An external charset engine type is a structure that explicitly gives the three entry points described at the beginning of this section, and a pointer to an opaque charset specific option area. It is defined as:
typedef struct tconv_charset_external {
  void *optionp;
  tconv_charset_new_t tconv_charset_newp;
  tconv_charset_run_t tconv_charset_runp;
  tconv_charset_free_t tconv_charset_freep;
} tconv_charset_external_t;
- TCONV_CHARSET_PLUGIN
-
The charset engine is dynamically loaded. A plugin definition is:
typedef struct tconv_charset_plugin {
  void *optionp;
  char *news;
  char *runs;
  char *frees;
  char *filenames;
} tconv_charset_plugin_t;
i.e. tconv will use filenames as the path of a shared library and will try to load it. optionp is a pointer to a charset specific option area. tconv will look for the three entry points named by news, runs and frees:
- news
-
If news is NULL, the environment variable TCONV_ENV_CHARSET_NEW is looked at; if it is not set either, tconv_charset_newp is used.
- runs
-
If runs is NULL, the environment variable TCONV_ENV_CHARSET_RUN is looked at; if it is not set either, tconv_charset_runp is used.
- frees
-
If frees is NULL, the environment variable TCONV_ENV_CHARSET_FREE is looked at; if it is not set either, tconv_charset_freep is used.
Please note that dynamic loading is not always thread-safe, and tconv will not try to adapt to this situation. Therefore, it is up to the caller to make sure that tconv_open_ext() is called within a context that is not affected by a possibly non-thread-safe workflow (typically within a critical section, or at program startup).
- TCONV_CHARSET_ICU
-
ICU built-in, available when tconv has been compiled with ICU. If tconv has not been compiled with such support, TCONV_CHARSET_ICU remains available, but using it will fail.
If ICUOptionp is not NULL, it must be a pointer to a structure defined as:
typedef struct tconv_charset_ICU_option {
  int confidencei;
} tconv_charset_ICU_option_t;
where confidencei is the minimum accepted confidence level. If ICUOptionp is NULL, a default of 10 is used, unless the environment variable TCONV_ENV_CHARSET_ICU_CONFIDENCE is set.
- TCONV_CHARSET_CCHARDET
-
cchardet built-in, always available.
If cchardetOptionp is not NULL, it must be a pointer to a structure defined as:
typedef struct tconv_charset_cchardet_option {
  float confidencef;
} tconv_charset_cchardet_option_t;
where confidencef is the minimum accepted confidence level. If cchardetOptionp is NULL, a default of 0.4f is used. This can also be set via the environment variable TCONV_ENV_CHARSET_CCHARDET_CONFIDENCE.
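The lookup order described for TCONV_CHARSET_PLUGIN entry point names can be sketched as follows. This is not tconv's implementation: my_plugin_symbol is a hypothetical helper, and it assumes that the environment variable is literally named as written above and that an empty value counts as unset.

```c
#include <stdlib.h>
#include <stddef.h>

/* Pick the symbol name to look up in the shared library:
   explicit name first, then the environment variable, then the default. */
static const char *my_plugin_symbol(const char *explicits, const char *envnames, const char *defaults)
{
  const char *values;

  if (explicits != NULL) {
    return explicits;
  }
  values = getenv(envnames);
  if (values != NULL && values[0] != '\0') {
    return values;  /* assumption: an empty value is treated as unset */
  }
  return defaults;
}
```

For the new entry point this would be called as my_plugin_symbol(plugin.news, "TCONV_ENV_CHARSET_NEW", "tconv_charset_newp"), and similarly for run and free.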
convert engine
A convert engine may support three entry points:
typedef void *(*tconv_convert_new_t) (tconv_t tconvp, const char *tocodes, const char *fromcodes, void *optionp);
typedef size_t (*tconv_convert_run_t) (tconv_t tconvp, void *contextp, char **inbufsp, size_t *inbytesleftlp, char **outbufsp, size_t *outbytesleftlp);
typedef int (*tconv_convert_free_t)(tconv_t tconvp, void *contextp);
All entry points start with a tconvp pointer.
The new entry point is optional; it receives the destination and source codesets and a pointer to a data area that is opaque from tconv's point of view, and returns a convert specific opaque context. If new is not NULL, then free must not be NULL, and it will be called with the convert specific context pointer returned by new. When new is NULL, the convert specific context will be NULL.
The only required entry point is run, with additional parameters that follow the iconv() semantics: pointers to the input buffer, the number of input bytes left, the output buffer, and the number of output bytes left.
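To make the iconv() conventions concrete, here is a sketch of a run entry point that simply copies bytes unchanged. my_convert_run is a hypothetical name and the tconv_t typedef is a stand-in for the opaque handle; the error convention shown, (size_t)-1 with errno set to E2BIG when the output buffer is full, is the one documented for iconv().

```c
#include <errno.h>
#include <stddef.h>

/* Stand-in for the opaque handle; real code would include <tconv.h>. */
typedef struct tconv *tconv_t;

/* Hypothetical run entry point that copies bytes unchanged. It follows the
   iconv() conventions: advance the buffers, decrease the byte counts, and
   return (size_t)-1 with errno set to E2BIG when the output is full. */
static size_t my_convert_run(tconv_t tconvp, void *contextp,
                             char **inbufsp, size_t *inbytesleftlp,
                             char **outbufsp, size_t *outbytesleftlp)
{
  (void) tconvp;
  (void) contextp;

  /* iconv() accepts a NULL input to flush or reset; nothing to do here. */
  if (inbufsp == NULL || *inbufsp == NULL) {
    return 0;
  }
  while (*inbytesleftlp > 0) {
    if (*outbytesleftlp == 0) {
      errno = E2BIG;
      return (size_t) -1;
    }
    *(*outbufsp)++ = *(*inbufsp)++;
    (*inbytesleftlp)--;
    (*outbytesleftlp)--;
  }
  return 0; /* no non-reversible conversion was performed */
}
```

A caller that gets E2BIG is expected to drain the output buffer and call run again with the updated pointers, exactly as with iconv().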
convertp must point to a structure defined as:
typedef struct tconv_convert {
  enum {
    TCONV_CONVERT_EXTERNAL = 0,
    TCONV_CONVERT_PLUGIN,
    TCONV_CONVERT_ICU,
    TCONV_CONVERT_ICONV
  } converti;
  union {
    tconv_convert_external_t external;
    tconv_convert_plugin_t plugin;
    tconv_convert_ICU_option_t *ICUOptionp;
    tconv_convert_iconv_option_t *iconvOptionp;
  } u;
} tconv_convert_t;
i.e. a convert engine can be of four types:
- TCONV_CONVERT_EXTERNAL
-
An external convert engine type is a structure that explicitly gives the three entry points described above, and a pointer to an opaque convert specific option area. It is defined as:
typedef struct tconv_convert_external {
  void *optionp;
  tconv_convert_new_t tconv_convert_newp;
  tconv_convert_run_t tconv_convert_runp;
  tconv_convert_free_t tconv_convert_freep;
} tconv_convert_external_t;
- TCONV_CONVERT_PLUGIN
-
The convert engine is dynamically loaded. A plugin definition is:
typedef struct tconv_convert_plugin {
  void *optionp;
  char *news;
  char *runs;
  char *frees;
  char *filenames;
} tconv_convert_plugin_t;
i.e. tconv will use filenames as the path of a shared library and will try to load it. optionp is a pointer to a convert specific option area. tconv will look for the three entry points named by news, runs and frees:
- news
-
If news is NULL, the environment variable TCONV_ENV_CONVERT_NEW is looked at; if it is not set either, tconv_convert_newp is used.
- runs
-
If runs is NULL, the environment variable TCONV_ENV_CONVERT_RUN is looked at; if it is not set either, tconv_convert_runp is used.
- frees
-
If frees is NULL, the environment variable TCONV_ENV_CONVERT_FREE is looked at; if it is not set either, tconv_convert_freep is used.
Same remark about thread-safety as for the charset engine.
- TCONV_CONVERT_ICU
-
ICU built-in, available when tconv has been compiled with ICU. If tconv has not been compiled with such support, TCONV_CONVERT_ICU remains available, but using it will fail.
If ICUOptionp is not NULL, it must be a pointer to a structure defined as:
typedef struct tconv_convert_ICU_option {
  size_t uCharCapacityl;
  short fallbackb;
  int signaturei;
} tconv_convert_ICU_option_t;
containing:
- uCharCapacityl
-
ICU conversion always goes through an internal UTF-16 buffer by design. uCharCapacityl is the number of bytes of this internal intermediary buffer. The default is 4096, unless the environment variable TCONV_ENV_CONVERT_ICU_UCHARCAPACITY is set.
- fallbackb
-
ICU conversion has an optional fallback mechanism for unknown characters. The default is a false value, unless TCONV_ENV_CONVERT_ICU_FALLBACK is set.
- signaturei
-
A signature may be added or removed on demand. If signaturei is lower than zero, the signature is removed. If signaturei is higher than zero, the signature is added. Otherwise the ICU default applies. The default is 0, unless TCONV_ENV_CONVERT_ICU_SIGNATURE is set.
- TCONV_CONVERT_ICONV
-
iconv built-in, always available. No special option.
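Selecting a convert engine amounts to setting converti and the matching union member. The sketch below picks the ICU engine with explicit options. All typedefs are stand-ins copied from the declarations above so that it compiles without tconv.h (tconv_convert_iconv_option_t is left opaque since its content is not described); my_convert_use_icu and the chosen values are hypothetical.

```c
#include <stddef.h>

/* Stand-ins mirroring the declarations above; real code would
   include <tconv.h> instead. */
typedef struct tconv *tconv_t;
typedef void  *(*tconv_convert_new_t) (tconv_t tconvp, const char *tocodes, const char *fromcodes, void *optionp);
typedef size_t (*tconv_convert_run_t) (tconv_t tconvp, void *contextp, char **inbufsp, size_t *inbytesleftlp, char **outbufsp, size_t *outbytesleftlp);
typedef int    (*tconv_convert_free_t)(tconv_t tconvp, void *contextp);
typedef struct tconv_convert_external {
  void                *optionp;
  tconv_convert_new_t  tconv_convert_newp;
  tconv_convert_run_t  tconv_convert_runp;
  tconv_convert_free_t tconv_convert_freep;
} tconv_convert_external_t;
typedef struct tconv_convert_plugin {
  void *optionp;
  char *news;
  char *runs;
  char *frees;
  char *filenames;
} tconv_convert_plugin_t;
typedef struct tconv_convert_ICU_option {
  size_t uCharCapacityl;
  short  fallbackb;
  int    signaturei;
} tconv_convert_ICU_option_t;
typedef struct tconv_convert_iconv_option tconv_convert_iconv_option_t; /* opaque here */
typedef struct tconv_convert {
  enum {
    TCONV_CONVERT_EXTERNAL = 0,
    TCONV_CONVERT_PLUGIN,
    TCONV_CONVERT_ICU,
    TCONV_CONVERT_ICONV
  } converti;
  union {
    tconv_convert_external_t      external;
    tconv_convert_plugin_t        plugin;
    tconv_convert_ICU_option_t   *ICUOptionp;
    tconv_convert_iconv_option_t *iconvOptionp;
  } u;
} tconv_convert_t;

/* Hypothetical helper: select the ICU engine with explicit options. */
static void my_convert_use_icu(tconv_convert_t *convertp, tconv_convert_ICU_option_t *icup)
{
  icup->uCharCapacityl = 8192;  /* twice the documented default of 4096 */
  icup->fallbackb      = 1;     /* enable the fallback mechanism */
  icup->signaturei     = -1;    /* lower than zero: remove the signature */

  convertp->converti     = TCONV_CONVERT_ICU;
  convertp->u.ICUOptionp = icup;
}
```

The resulting tconv_convert_t would be referenced by the convertp member of the option structure given to tconv_open_ext(); both structures must stay alive at least until that call returns.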
tconv_trace_on
void tconv_trace_on(tconv_t tconvp);
Set tracing. Any subsequent call to tconv_trace() will then trigger a call to the traceCallbackp given in tconv_open_ext()'s option structure.
tconv_trace_off
void tconv_trace_off(tconv_t tconvp);
Unset tracing.
tconv_trace
void tconv_trace(tconv_t tconvp, const char *fmts, ...);
Formats a message string and calls traceCallbackp if tracing is on.
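The interplay between the on/off switch, the formatting and the callback can be pictured with the following stand-alone sketch. It is not tconv's implementation: my_tracer_t, my_trace_on, my_trace_off, my_tracef and my_remember are hypothetical names, and the 256-byte message buffer is an arbitrary choice for the example.

```c
#include <stdarg.h>
#include <stdio.h>

typedef void (*tconvTraceCallback_t)(void *userDatavp, const char *msgs);

/* Hypothetical stand-in for the part of a handle involved in tracing. */
typedef struct my_tracer {
  short                traceb;
  tconvTraceCallback_t traceCallbackp;
  void                *traceUserDatavp;
} my_tracer_t;

static void my_trace_on(my_tracer_t *tracerp)  { tracerp->traceb = 1; }
static void my_trace_off(my_tracer_t *tracerp) { tracerp->traceb = 0; }

/* Format the message, then hand it to the callback if tracing is on. */
static void my_tracef(my_tracer_t *tracerp, const char *fmts, ...)
{
  char    msgs[256];
  va_list ap;

  if (! tracerp->traceb || tracerp->traceCallbackp == NULL) {
    return;
  }
  va_start(ap, fmts);
  vsnprintf(msgs, sizeof(msgs), fmts, ap);
  va_end(ap);
  tracerp->traceCallbackp(tracerp->traceUserDatavp, msgs);
}

/* Example callback: remembers the last message; userDatavp must point
   to a buffer of at least 64 bytes. */
static void my_remember(void *userDatavp, const char *msgs)
{
  snprintf((char *) userDatavp, 64, "%s", msgs);
}
```

With this shape, a call made while tracing is off costs only a flag test, which is why engines can leave their trace calls in place.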
tconv_error_set
char *tconv_error_set(tconv_t tconvp, const char *msgs);
Set a string that should contain a more accurate description of the last error. Any engine should use it when a specific description exists. The default is to use the system's errno description.
tconv_error
char *tconv_error(tconv_t tconvp);
Get the latest value of the specific error string.
tconv_fromcode
char *tconv_fromcode(tconv_t tconvp);
Get the source codeset.
tconv_tocode
char *tconv_tocode(tconv_t tconvp);
Get the destination codeset.
tconv_helper
short tconv_helper(tconv_t tconvp,
                   void *contextp,
                   short (*producerp)(void *contextp, char **bufpp, size_t *countlp, short *eofbp),
                   short (*consumerp)(void *contextp, char *bufp, size_t countl, short eofb, size_t *resultlp)
                  );
From an end-user point of view, the only important thing is to produce bytes that must be converted and to consume the converted bytes. The tconv_helper method hides all the iconv API subtleties, leaving only the two methods that are meaningful for the vast majority of applications. The parameters are:
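A producer and consumer pair matching the two callback prototypes could look like the sketch below. Everything about their contract is an assumption made for the example: that the producer sets the buffer, the byte count and an end-of-input flag, that the consumer reports in resultlp how many bytes it accepted, and that both return a true value on success. my_io_t, my_producer and my_consumer are hypothetical names.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical context shared by both callbacks. */
typedef struct my_io {
  char  *inputp;       /* bytes still to hand out         */
  size_t inputl;       /* number of bytes left in inputp  */
  size_t chunkl;       /* maximum bytes per producer call */
  char   outputp[64];  /* collected result                */
  size_t outputl;      /* number of bytes in outputp      */
} my_io_t;

/* Producer: hands out the next chunk and flags EOF on the last one. */
static short my_producer(void *contextp, char **bufpp, size_t *countlp, short *eofbp)
{
  my_io_t *iop    = (my_io_t *) contextp;
  size_t   countl = iop->inputl < iop->chunkl ? iop->inputl : iop->chunkl;

  *bufpp       = iop->inputp;
  *countlp     = countl;
  iop->inputp += countl;
  iop->inputl -= countl;
  *eofbp       = (short) (iop->inputl == 0);
  return 1; /* assumed: a true value means success */
}

/* Consumer: appends what fits and reports how many bytes it took. */
static short my_consumer(void *contextp, char *bufp, size_t countl, short eofb, size_t *resultlp)
{
  my_io_t *iop   = (my_io_t *) contextp;
  size_t   rooml = sizeof(iop->outputp) - iop->outputl;
  size_t   takel = countl < rooml ? countl : rooml;

  (void) eofb; /* nothing special to do at end of input in this sketch */
  memcpy(iop->outputp + iop->outputl, bufp, takel);
  iop->outputl += takel;
  *resultlp = takel;
  return 1; /* assumed: a true value means success */
}
```

Following the prototype, the pair would be handed over as tconv_helper(tconvp, &io, my_producer, my_consumer), with io being a my_io_t initialized by the caller.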
NOTES
- tracing
-
tconv can trace itself, unless it has been compiled with -DTCONV_NDEBUG, which is the default. When compiled without -DTCONV_NDEBUG, the default tracing level is 0, unless the environment variable TCONV_ENV_TRACE is set and its value is a true value.
- specific error string
-
tconv internally limits the length of such a string to 1024 bytes (including NUL).
- normalized charset name
-
A normalized charset name contains only characters in the set [a-z0-9+.:].
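The character set given for normalized charset names can be checked with a few lines of C. This sketch only tests whether a name is already in normalized form; how tconv derives the normalized form from a user-supplied name is not described here. my_is_normalized_charset is a hypothetical helper, and treating an empty name as invalid is an assumption.

```c
#include <stddef.h>
#include <string.h>

/* Return 1 when names contains only characters from [a-z0-9+.:],
   0 otherwise. NULL and empty names are rejected (an assumption). */
static int my_is_normalized_charset(const char *names)
{
  static const char alloweds[] = "abcdefghijklmnopqrstuvwxyz0123456789+.:";

  if (names == NULL || names[0] == '\0') {
    return 0;
  }
  /* strspn() gives the length of the leading run of allowed characters;
     the name is valid when that run reaches the terminating NUL. */
  return names[strspn(names, alloweds)] == '\0';
}
```

Note that neither uppercase letters nor '-' belong to the set, so "UTF-8" is not a normalized name while "utf8" is.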