NAME

tconv_ext - tconv extended API

SYNOPSIS

#include <tconv.h>

tconv_t tconv_open_ext(const char *tocodes, const char *fromcodes, tconv_option_t *tconvOptionp);

void  tconv_trace_on(tconv_t tconvp);
void  tconv_trace_off(tconv_t tconvp);
void  tconv_trace(tconv_t tconvp, const char *fmts, ...);
char *tconv_error_set(tconv_t tconvp, const char *msgs);
char *tconv_error(tconv_t tconvp);
char *tconv_fromcode(tconv_t tconvp);
char *tconv_tocode(tconv_t tconvp);
short tconv_fuzzy_set(tconv_t tconvp, short fuzzyb);
short tconv_fuzzy(tconv_t tconvp);
short tconv_helper(tconv_t  tconvp,
                   void    *contextp,
                   short   (*producerp)(void *contextp, char **bufpp, size_t *countlp, short *eofbp),
                   short   (*consumerp)(void *contextp, char *bufp, size_t countl, short eofb, size_t *resultlp)
                   );

DESCRIPTION

The tconv extended API provides additional entry points to query or control how tconv behaves. Because tconv is a generic layer on top of iconv(), ICU, etc., additional semantics are needed.

METHODS

tconv_open_ext

tconv_t tconv_open_ext(const char *tocodes, const char *fromcodes, tconv_option_t *tconvOptionp);

typedef void (*tconvTraceCallback_t)(void *userDatavp, const char *msgs);
typedef struct tconv_option {
  tconv_charset_t      *charsetp;
  tconv_convert_t      *convertp;
  tconvTraceCallback_t  traceCallbackp;
  void                 *traceUserDatavp;
} tconv_option_t;

tconv supports two engine types: one for charset detection, one for character conversion. Each engine has its own option structure:

charsetp

Describes the charset engine options.

convertp

Describes the conversion engine options.

Logging is provided through the genericLogger package, and the developer may provide a function pointer with an associated context:

traceCallbackp

A function pointer.

traceUserDatavp

Opaque context passed to the function pointer.

If tconvOptionp is NULL, defaults apply. Otherwise, if charsetp is NULL, charset defaults apply; if convertp is NULL, conversion defaults apply; and if traceCallbackp is NULL, no logging is possible.
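As an illustration of the trace callback contract, here is a minimal sketch of a callback matching tconvTraceCallback_t. The typedef is copied from the definition above so that the block compiles on its own; myTraceContext_t and myTraceCallback are hypothetical names that are not part of tconv.

```c
#include <stdio.h>
#include <string.h>

/* Callback type copied from the tconv_option_t definition above. */
typedef void (*tconvTraceCallback_t)(void *userDatavp, const char *msgs);

/* Hypothetical user context: counts messages and keeps the last one. */
typedef struct myTraceContext {
  size_t countl;
  char   lasts[256];
} myTraceContext_t;

/* Hypothetical callback: records the message in the user context. */
void myTraceCallback(void *userDatavp, const char *msgs)
{
  myTraceContext_t *ctxp = (myTraceContext_t *) userDatavp;

  if (ctxp == NULL || msgs == NULL) {
    return;
  }
  ctxp->countl++;
  /* Bounded copy, always NUL-terminated. */
  snprintf(ctxp->lasts, sizeof(ctxp->lasts), "%s", msgs);
}
```

With the real header, such a callback would be stored in traceCallbackp and the address of the context in traceUserDatavp before calling tconv_open_ext().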

charset engine

A charset engine may support three entry points:

typedef void *(*tconv_charset_new_t) (tconv_t tconvp, void *optionp);
typedef char *(*tconv_charset_run_t) (tconv_t tconvp, void *contextp, char *bytep, size_t bytel);
typedef void  (*tconv_charset_free_t)(tconv_t tconvp, void *contextp);

All entry points start with a tconvp pointer, which they can use to trigger logging or to set an error.

The new entry point is optional. It receives a pointer to a data area that is opaque from tconv's point of view, and returns a charset specific opaque context. If new is not NULL, then free must not be NULL, and it will be called with the charset specific context pointer returned by new. When new is NULL, the charset specific context will be NULL.

The only required entry point is run, with a pointer to bytes, and the number of bytes.
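A minimal sketch of a run entry point may help here. It assumes that tconv_t is an opaque pointer type (the stand-in typedef below replaces the one from tconv.h), that returning NULL means that no charset was detected, and that returning a string with static storage is acceptable; none of these points is stated in this page. myCharsetRun is a hypothetical name.

```c
#include <stddef.h>

/* Stand-in for the opaque handle; the real definition comes from <tconv.h>. */
typedef struct tconv *tconv_t;

/* Entry point type copied from the section above. */
typedef char *(*tconv_charset_run_t)(tconv_t tconvp, void *contextp, char *bytep, size_t bytel);

/* Hypothetical run entry point: reports "UTF-8" when the input starts with a
   UTF-8 byte order mark, and NULL (assumed to mean "not detected") otherwise.
   There is no new entry point in this sketch, so contextp is expected to be NULL. */
char *myCharsetRun(tconv_t tconvp, void *contextp, char *bytep, size_t bytel)
{
  static char          utf8s[] = "UTF-8";
  const unsigned char *p = (const unsigned char *) bytep;

  (void) tconvp;   /* Unused in this sketch */
  (void) contextp; /* Unused in this sketch */

  if (p != NULL && bytel >= 3 && p[0] == 0xEF && p[1] == 0xBB && p[2] == 0xBF) {
    return utf8s;
  }
  return NULL;
}
```

Such a function could be given as tconv_charset_runp in an external charset engine, with the new and free entry points left to NULL.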

charsetp must point to a structure defined as:

typedef struct tconv_charset {
  enum {
    TCONV_CHARSET_EXTERNAL = 0,
    TCONV_CHARSET_PLUGIN,
    TCONV_CHARSET_ICU,
    TCONV_CHARSET_CCHARDET,
  } charseti;
  union {
    tconv_charset_external_t         external;
    tconv_charset_plugin_t           plugin;
    tconv_charset_ICU_option_t      *ICUOptionp;
    tconv_charset_cchardet_option_t *cchardetOptionp;
  } u;
} tconv_charset_t;

i.e. a charset engine can be of four types:

TCONV_CHARSET_EXTERNAL

An external charset engine type is a structure that explicitly gives the three entry points described at the beginning of this section, and a pointer to an opaque charset specific option area. It is defined as:

typedef struct tconv_charset_external {
  void                *optionp;
  tconv_charset_new_t  tconv_charset_newp;
  tconv_charset_run_t  tconv_charset_runp;
  tconv_charset_free_t tconv_charset_freep;
} tconv_charset_external_t;

TCONV_CHARSET_PLUGIN

The charset engine is dynamically loaded. A plugin definition is:

typedef struct tconv_charset_plugin {
  void *optionp;
  char *news;
  char *runs;
  char *frees;
  char *filenames;
} tconv_charset_plugin_t;

i.e. tconv will use filenames as the path of a shared library and will try to load it. optionp is a pointer to a charset specific option area. tconv will look up the three entry points named by news, runs and frees:

news

If news is NULL, the name is taken from the environment variable TCONV_ENV_CHARSET_NEW if it is set, else tconv_charset_newp will be looked up.

runs

If runs is NULL, the name is taken from the environment variable TCONV_ENV_CHARSET_RUN if it is set, else tconv_charset_runp will be looked up.

frees

If frees is NULL, the name is taken from the environment variable TCONV_ENV_CHARSET_FREE if it is set, else tconv_charset_freep will be looked up.
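The name resolution described for news, runs and frees can be sketched as follows. This is an illustration only, assuming the order is: explicit name first, then the environment variable, then the built-in default name. resolveEntryPointName is a hypothetical helper, not a tconv function.

```c
#include <stdlib.h>

/* Returns the symbol name to look up in the shared library: the explicit name
   when given, else the value of the environment variable when it is set, else
   the default name. */
const char *resolveEntryPointName(const char *names, const char *envs, const char *defaults)
{
  const char *values = NULL;

  if (names != NULL) {
    return names;
  }
  if (envs != NULL) {
    values = getenv(envs);
  }
  return (values != NULL) ? values : defaults;
}
```

For the run entry point of a charset plugin, the call would then be resolveEntryPointName(plugin.runs, "TCONV_ENV_CHARSET_RUN", "tconv_charset_runp").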

Please note that dynamic loading is not always thread-safe, and tconv will not try to adapt to this situation. Therefore, it is up to the caller to make sure that tconv_open_ext() is called within a context that is not affected by a possibly non-thread-safe workflow (typically within a critical section, or at program startup).

TCONV_CHARSET_ICU

ICU built-in, available when tconv has been compiled with ICU. If tconv has not been compiled with such support, TCONV_CHARSET_ICU remains available, but using it will fail.

If ICUOptionp is not NULL, it must be a pointer to a structure defined as:

typedef struct tconv_charset_ICU_option {
  int confidencei;
} tconv_charset_ICU_option_t;

where confidencei is the minimum accepted confidence level. If ICUOptionp is NULL, a default of 10 is used, unless the environment variable TCONV_ENV_CHARSET_ICU_CONFIDENCE is set.

TCONV_CHARSET_CCHARDET

cchardet built-in, always available.

If cchardetOptionp is not NULL, it must be a pointer to a structure defined as:

typedef struct tconv_charset_cchardet_option {
  float confidencef;
} tconv_charset_cchardet_option_t;

where confidencef is the minimum accepted confidence level. If cchardetOptionp is NULL, a default of 0.4f is used. This default can also be set via the environment variable TCONV_ENV_CHARSET_CCHARDET_CONFIDENCE.

convert engine

A convert engine may support three entry points:

typedef void   *(*tconv_convert_new_t) (tconv_t tconvp, const char *tocodes, const char *fromcodes, void *optionp);
typedef size_t  (*tconv_convert_run_t) (tconv_t tconvp, void *contextp, char **inbufsp, size_t *inbytesleftlp, char **outbufsp, size_t *outbytesleftlp);
typedef int     (*tconv_convert_free_t)(tconv_t tconvp, void *contextp);

All entry points start with a tconvp pointer.

The new entry point is optional. It receives a pointer to a data area that is opaque from tconv's point of view, and returns a convert specific opaque context. If new is not NULL, then free must not be NULL, and it will be called with the convert specific context pointer returned by new. When new is NULL, the convert specific context will be NULL.

The only required entry point is run, with additional parameters that follow the iconv() semantics, i.e. pointers to:

a pointer to input bytes
number of input bytes
a pointer to output bytes
number of output bytes
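To make the iconv() semantics concrete, here is a minimal sketch of a run entry point that copies bytes unchanged. It advances the buffer pointers, decrements the byte counts, and follows the iconv() convention of returning (size_t)-1 with errno set to E2BIG when the output buffer is too small. The tconv_t typedef is a stand-in for the one in tconv.h, and myConvertRun is a hypothetical name.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for the opaque handle; the real definition comes from <tconv.h>. */
typedef struct tconv *tconv_t;

/* Hypothetical run entry point matching tconv_convert_run_t: identity copy. */
size_t myConvertRun(tconv_t tconvp, void *contextp,
                    char **inbufsp, size_t *inbytesleftlp,
                    char **outbufsp, size_t *outbytesleftlp)
{
  size_t copyl;

  (void) tconvp;   /* Unused in this sketch */
  (void) contextp; /* Unused in this sketch */

  /* iconv() accepts a NULL input to reset or flush; nothing to do here. */
  if (inbufsp == NULL || *inbufsp == NULL) {
    return 0;
  }
  if (inbytesleftlp == NULL || outbufsp == NULL || *outbufsp == NULL || outbytesleftlp == NULL) {
    errno = EINVAL;
    return (size_t) -1;
  }

  /* Copy as many bytes as fit, then update all four in/out parameters. */
  copyl = (*inbytesleftlp < *outbytesleftlp) ? *inbytesleftlp : *outbytesleftlp;
  if (copyl > 0) {
    memcpy(*outbufsp, *inbufsp, copyl);
  }
  *inbufsp        += copyl;
  *inbytesleftlp  -= copyl;
  *outbufsp       += copyl;
  *outbytesleftlp -= copyl;

  if (*inbytesleftlp > 0) {
    errno = E2BIG; /* Output buffer exhausted before the input was */
    return (size_t) -1;
  }
  return 0; /* No non-reversible conversion was performed */
}
```

A caller that gets E2BIG is expected to drain the output buffer and call again with the updated input pointer and count.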

convertp must point to a structure defined as:

typedef struct tconv_convert {
  enum {
    TCONV_CONVERT_EXTERNAL = 0,
    TCONV_CONVERT_PLUGIN,
    TCONV_CONVERT_ICU,
    TCONV_CONVERT_ICONV
  } converti;
  union {
    tconv_convert_external_t      external;
    tconv_convert_plugin_t        plugin;
    tconv_convert_ICU_option_t   *ICUOptionp;
    tconv_convert_iconv_option_t *iconvOptionp;
  } u;
} tconv_convert_t;

i.e. a convert engine can be of four types:

TCONV_CONVERT_EXTERNAL

An external convert engine type is a structure that explicitly gives the three entry points described above, and a pointer to an opaque convert specific option area. It is defined as:

typedef struct tconv_convert_external {
  void                 *optionp;
  tconv_convert_new_t  tconv_convert_newp;
  tconv_convert_run_t  tconv_convert_runp;
  tconv_convert_free_t tconv_convert_freep;
} tconv_convert_external_t;

TCONV_CONVERT_PLUGIN

The convert engine is dynamically loaded. A plugin definition is:

typedef struct tconv_convert_plugin {
  void *optionp;
  char *news;
  char *runs;
  char *frees;
  char *filenames;
} tconv_convert_plugin_t;

i.e. tconv will use filenames as the path of a shared library and will try to load it. optionp is a pointer to a convert specific option area. tconv will look up the three entry points named by news, runs and frees:

news

If news is NULL, the name is taken from the environment variable TCONV_ENV_CONVERT_NEW if it is set, else tconv_convert_newp will be looked up.

runs

If runs is NULL, the name is taken from the environment variable TCONV_ENV_CONVERT_RUN if it is set, else tconv_convert_runp will be looked up.

frees

If frees is NULL, the name is taken from the environment variable TCONV_ENV_CONVERT_FREE if it is set, else tconv_convert_freep will be looked up.

The remark about thread-safety made for the charset engine plugin applies here as well.

TCONV_CONVERT_ICU

ICU built-in, available when tconv has been compiled with ICU. If tconv has not been compiled with such support, TCONV_CONVERT_ICU remains available, but using it will fail.

If ICUOptionp is not NULL, it must be a pointer to a structure defined as:

typedef struct tconv_convert_ICU_option {
  size_t uCharCapacityl;
  short  fallbackb;
  int    signaturei;
} tconv_convert_ICU_option_t;

containing:

uCharCapacityl

ICU conversion always goes through an internal UTF-16 buffer by design. uCharCapacityl is the number of bytes of this internal intermediary buffer. The default is 4096, unless the environment variable TCONV_ENV_CONVERT_ICU_UCHARCAPACITY is set.

fallbackb

ICU conversion has an optional fallback mechanism for unknown characters. The default is a false value, unless the environment variable TCONV_ENV_CONVERT_ICU_FALLBACK is set.

signaturei

A signature may be added or removed on demand. If signaturei is lower than zero, the signature is removed. If signaturei is higher than zero, the signature is added. Otherwise the ICU default applies. The default is 0, unless the environment variable TCONV_ENV_CONVERT_ICU_SIGNATURE is set.

TCONV_CONVERT_ICONV

iconv built-in, available when tconv has been compiled with iconv. If tconv has not been compiled with such support, TCONV_CONVERT_ICONV remains available, but using it will fail.

If iconvOptionp is not NULL, it has no effect, since the definition of the corresponding type is:

typedef void tconv_convert_iconv_option_t;

which means that tconv is then only a proxy to the iconv() with which it was compiled and linked.

tconv_trace_on

void  tconv_trace_on(tconv_t tconvp);

Turns tracing on. Any subsequent call to tconv_trace() will then trigger a call to the traceCallbackp given in tconv_open_ext()'s option structure.

tconv_trace_off

void  tconv_trace_off(tconv_t tconvp);

Turns tracing off.

tconv_trace

void  tconv_trace(tconv_t tconvp, const char *fmts, ...);

Formats a message string and calls traceCallbackp if tracing is on.
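The behaviour described for tconv_trace() can be pictured with the following sketch. It is not tconv's implementation: myTracer_t is a hypothetical stand-in for the relevant part of the tconv state, and the 1024 byte message buffer is an arbitrary choice for this example.

```c
#include <stdarg.h>
#include <stdio.h>

#define MY_TRACE_SIZE 1024 /* Arbitrary message size for this sketch */

/* Callback type copied from the tconv_open_ext section. */
typedef void (*tconvTraceCallback_t)(void *userDatavp, const char *msgs);

/* Hypothetical stand-in for the tracing part of the tconv state. */
typedef struct myTracer {
  short                 traceb;          /* True when tracing is on */
  tconvTraceCallback_t  traceCallbackp;
  void                 *traceUserDatavp;
} myTracer_t;

/* Formats the message, then calls the callback only if tracing is on. */
void myTrace(myTracer_t *tracerp, const char *fmts, ...)
{
  char    msgs[MY_TRACE_SIZE];
  va_list ap;

  if (tracerp == NULL || ! tracerp->traceb || tracerp->traceCallbackp == NULL) {
    return;
  }
  va_start(ap, fmts);
  vsnprintf(msgs, sizeof(msgs), fmts, ap); /* Truncates long messages */
  va_end(ap);
  tracerp->traceCallbackp(tracerp->traceUserDatavp, msgs);
}

/* Hypothetical callback: userDatavp must point to MY_TRACE_SIZE bytes. */
void myKeepLastCallback(void *userDatavp, const char *msgs)
{
  snprintf((char *) userDatavp, MY_TRACE_SIZE, "%s", msgs);
}
```

In this sketch, toggling traceb plays the role of tconv_trace_on() and tconv_trace_off().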

tconv_error_set

char *tconv_error_set(tconv_t tconvp, const char *msgs);

Sets a string that should contain a more accurate description of the last error. Any engine should use this when a specific description exists. The default is to use the system's errno description.

tconv_error

char *tconv_error(tconv_t tconvp);

Get the latest value of specific error string.

tconv_fromcode

char *tconv_fromcode(tconv_t tconvp);

Get the source codeset.

tconv_tocode

char *tconv_tocode(tconv_t tconvp);

Get the destination codeset.

tconv_fuzzy_set

short tconv_fuzzy_set(tconv_t tconvp, short fuzzyb);

Sets and returns the fuzzy mode. In some rare cases, tconv cannot determine the converter family of the two charsets. In this case, and if the two normalized charset strings are equivalent, a direct mode is used. This is the meaning of fuzzy. This never happens with the ICU built-in converter, but can happen with the ICONV built-in.

Fuzzy mode is preferred over doing iconv with the same charset in input and output, because some iconv implementations simply fail to do that. The only drawback of the fuzzy mode is that it does not provide any charset validation. This is why the next method exists, in case the end-user application wants to do something special when fuzzy mode is on.

tconv_fuzzy

short tconv_fuzzy(tconv_t tconvp);

A true value means that fuzzy mode is on.

tconv_helper

short tconv_helper(tconv_t  tconvp,
                   void    *contextp,
                   short   (*producerp)(void *contextp, char **bufpp, size_t *countlp, short *eofbp),
                   short   (*consumerp)(void *contextp, char *bufp, size_t countl, short eofb, size_t *resultlp)
                   );

From an end-user point of view, the only important things are to produce the bytes that must be converted and to consume the converted bytes. The tconv_helper method hides all the subtleties of the iconv API, leaving only the two methods that are meaningful for the vast majority of applications. The parameters are:

tconvp
contextp
a producer
a consumer
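A minimal sketch of a producer and a consumer follows. The callback signatures are those of the prototype above; everything else is an assumption made for this example: a true return value means success, the producer sets *eofbp when it has handed out its last chunk, and the consumer reports in *resultlp how many bytes it took. myHelperContext_t, myProducer and myConsumer are hypothetical names.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical context shared by the producer and the consumer. */
typedef struct myHelperContext {
  char   *inputs;      /* Bytes to convert */
  size_t  inputl;      /* Number of bytes to convert */
  size_t  offsetl;     /* Number of bytes already produced */
  size_t  chunkl;      /* Maximum bytes per producer call, must be > 0 */
  char    outputs[64]; /* Where consumed bytes are stored */
  size_t  outputl;     /* Number of bytes stored in outputs */
} myHelperContext_t;

/* Hands out the next chunk of input and flags the end of input. */
short myProducer(void *contextp, char **bufpp, size_t *countlp, short *eofbp)
{
  myHelperContext_t *ctxp = (myHelperContext_t *) contextp;
  size_t             remainingl = ctxp->inputl - ctxp->offsetl;
  size_t             countl = (remainingl < ctxp->chunkl) ? remainingl : ctxp->chunkl;

  *bufpp   = ctxp->inputs + ctxp->offsetl;
  *countlp = countl;
  ctxp->offsetl += countl;
  *eofbp = (ctxp->offsetl >= ctxp->inputl) ? 1 : 0;
  return 1;
}

/* Appends the bytes it is given to the context output area. */
short myConsumer(void *contextp, char *bufp, size_t countl, short eofb, size_t *resultlp)
{
  myHelperContext_t *ctxp = (myHelperContext_t *) contextp;

  (void) eofb; /* Nothing special to do at end of input in this sketch */

  if (countl > sizeof(ctxp->outputs) - ctxp->outputl) {
    return 0; /* Not enough room: assumed here to signal a failure */
  }
  if (countl > 0) {
    memcpy(ctxp->outputs + ctxp->outputl, bufp, countl);
    ctxp->outputl += countl;
  }
  *resultlp = countl;
  return 1;
}
```

With the real library, the call would be tconv_helper(tconvp, &ctx, myProducer, myConsumer), tconv doing the conversion between the two callbacks.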

NOTES

tracing

tconv can trace itself, unless it has been compiled with -DTCONV_NDEBUG, which is the default. When compiled without -DTCONV_NDEBUG, the default tracing level is 0, unless the environment variable TCONV_ENV_TRACE is set to a true value.

specific error string

tconv internally limits the length of such a string to 1024 bytes (including NUL).
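The effect of this limit can be sketched as a truncating copy. This is an illustration under the assumption that longer messages are silently truncated; myErrorHolder_t and myErrorSet are hypothetical names and do not reflect how tconv stores its error string.

```c
#include <stddef.h>
#include <string.h>

#define MY_ERROR_SIZE 1024 /* Documented limit, NUL included */

/* Hypothetical holder for the specific error string. */
typedef struct myErrorHolder {
  char errors[MY_ERROR_SIZE];
} myErrorHolder_t;

/* Stores at most MY_ERROR_SIZE - 1 bytes of msgs, always NUL-terminated. */
char *myErrorSet(myErrorHolder_t *holderp, const char *msgs)
{
  size_t lengthl;

  if (holderp == NULL || msgs == NULL) {
    return NULL;
  }
  lengthl = strlen(msgs);
  if (lengthl > MY_ERROR_SIZE - 1) {
    lengthl = MY_ERROR_SIZE - 1;
  }
  memcpy(holderp->errors, msgs, lengthl);
  holderp->errors[lengthl] = '\0';
  return holderp->errors;
}
```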

normalized charset name

A normalized charset name contains only characters in the set [a-z0-9+.:].
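As an illustration, here is one way to produce a name restricted to that character set and to compare two charset names, as done when fuzzy mode is considered. The rule used here (ASCII lowercasing, then dropping every other character) is an assumption made for this sketch; this page does not say how tconv treats characters outside the set. normalizeCharset and sameNormalizedCharset are hypothetical names.

```c
#include <stddef.h>
#include <string.h>

/* Writes into dsts the characters of srcs that belong to [a-z0-9+.:], after
   ASCII lowercasing. Returns 1 on success, 0 if dsts is too small. */
short normalizeCharset(const char *srcs, char *dsts, size_t dstl)
{
  size_t outl = 0;

  if (srcs == NULL || dsts == NULL || dstl == 0) {
    return 0;
  }
  for (; *srcs != '\0'; srcs++) {
    char c = *srcs;

    if (c >= 'A' && c <= 'Z') {
      c = (char) (c - 'A' + 'a');
    }
    if ((c >= 'a' && c <= 'z') || (c >= '0' && c <= '9') || c == '+' || c == '.' || c == ':') {
      if (outl + 1 >= dstl) {
        return 0; /* No room left for this character and the final NUL */
      }
      dsts[outl++] = c;
    }
  }
  dsts[outl] = '\0';
  return 1;
}

/* Returns 1 when both names normalize to the same string, 0 otherwise. */
short sameNormalizedCharset(const char *as, const char *bs)
{
  char nas[64];
  char nbs[64];

  if (! normalizeCharset(as, nas, sizeof(nas)) || ! normalizeCharset(bs, nbs, sizeof(nbs))) {
    return 0;
  }
  return (strcmp(nas, nbs) == 0) ? 1 : 0;
}
```

Under this rule, "UTF-8" and "utf8" are considered equivalent, while "UTF-8" and "UTF-16" are not.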

SEE ALSO

tconv(3), genericLogger(3)