gendict(1) — Linux manual page

GENDICT(1)                     ICU 67.1 Manual                    GENDICT(1)

NAME

       gendict - Compiles word list into ICU string trie dictionary

SYNOPSIS

       gendict [ --uchars | --bytes --transform transform ] [ -h, -?, --help
       ] [ -V, --version ] [ -c, --copyright ] [ -v, --verbose ] [ -i,
       --icudatadir directory ]  input-file  output-file

DESCRIPTION

       gendict reads the word list from input-file and writes a string trie
       dictionary to output-file. By convention, this data file has the
       .dict extension.

       Each word starts at the beginning of a line and ends at the first
       whitespace character.  Lines that begin with whitespace are ignored.
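
       The line rules above can be illustrated with a minimal Python
       sketch. It is not gendict's own parser: the function name
       read_words is hypothetical, Python's notion of whitespace may
       differ from the one ICU uses, and skipping empty lines is an
       assumption of this sketch.

```python
def read_words(text):
    """Return the words found in a gendict-style word list."""
    words = []
    for line in text.split("\n"):
        # Lines that begin with whitespace are ignored. Treating empty
        # lines the same way is an assumption of this sketch.
        if not line or line[0].isspace():
            continue
        # The word starts at the beginning of the line and ends at the
        # first whitespace character; anything after it is dropped here.
        words.append(line.split(None, 1)[0])
    return words
```

       Text that follows the word on the same line, such as an integer
       value, is simply discarded by this sketch.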

OPTIONS

       -h, -?, --help
              Print help about usage and exit.

       -V, --version
              Print the version of gendict and exit.

       -c, --copyright
              Embed the standard ICU copyright in output-file.

       -v, --verbose
              Display extra informative messages during execution.

       -i, --icudatadir directory
              Look for any necessary ICU data files in directory.  For
              example, the file pnames.icu must be located when ICU's data
              is not built as a shared library.  The default ICU data
              directory is specified by the environment variable ICU_DATA.
              Most configurations of ICU do not require this argument.

       --uchars
              Set the output trie type to UChar. Mutually exclusive with
              --bytes.

       --bytes
              Set the output trie type to Bytes. Mutually exclusive with
              --uchars.

       --transform transform
              Set the transform type. Specify this option only with
              --bytes.  The currently supported transform is
              offset-<hex-number>, which subtracts the given offset from
              every input character.  For compatibility with languages
              that require these characters, the offset transform also
              maps U+200D to 0xFF and U+200C to 0xFE.  A bytes trie
              requires a transform, and the transform, when applied to the
              non-value characters in input-file, must produce values
              between 0x00 and 0xFF.
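
              The offset transform can be sketched in Python as follows.
              This only illustrates the mapping described above and is
              not gendict's implementation; the function name
              offset_transform is hypothetical, and raising an error for
              out-of-range results is an assumption about how such input
              would be handled.

```python
def offset_transform(word, offset):
    """Map each character of `word` to one byte using an offset transform."""
    out = bytearray()
    for ch in word:
        cp = ord(ch)
        if cp == 0x200D:
            # U+200D is mapped to 0xFF.
            value = 0xFF
        elif cp == 0x200C:
            # U+200C is mapped to 0xFE.
            value = 0xFE
        else:
            # Every other character has the offset subtracted.
            value = cp - offset
            if not 0x00 <= value <= 0xFF:
                # Assumption: out-of-range results are treated as errors.
                raise ValueError(
                    "U+%04X does not fit in one byte with offset 0x%X"
                    % (cp, offset)
                )
        out.append(value)
    return bytes(out)
```

              For instance, with an offset of 0x0E00 the Thai letters
              U+0E01 and U+0E02 become the bytes 0x01 and 0x02.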

       input-file
              The source file to read.

       output-file
              The file to write the output dictionary to.
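
       The constraints among --uchars, --bytes and --transform can be
       sketched as a small Python helper that assembles a command line.
       The helper gendict_argv, the file names, and the spelling of the
       hex number in the example transform are all hypothetical; only the
       option names come from this page.

```python
def gendict_argv(input_file, output_file, trie_type, transform=None,
                 verbose=False):
    """Build an argument list for gendict from the options on this page."""
    # Exactly one of --uchars or --bytes must be chosen.
    if trie_type not in ("uchars", "bytes"):
        raise ValueError("trie_type must be 'uchars' or 'bytes'")
    # A bytes trie requires a transform.
    if trie_type == "bytes" and transform is None:
        raise ValueError("--bytes requires --transform")
    # --transform should only be given together with --bytes.
    if trie_type == "uchars" and transform is not None:
        raise ValueError("--transform is only valid with --bytes")

    argv = ["gendict", "--" + trie_type]
    if transform is not None:
        argv += ["--transform", transform]
    if verbose:
        argv.append("--verbose")
    # The input file comes before the output file.
    argv += [input_file, output_file]
    return argv
```

       The resulting list could be passed to subprocess.run, for example
       gendict_argv("words.txt", "words.dict", "bytes",
       transform="offset-0x0e00").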

CAVEATS

       The input-file is assumed to be encoded in UTF-8.  The integers in
       the input-file that are used as values must be made up of ASCII
       digits. They may be specified either in hex, by using a 0x prefix, or
       in decimal.  Either --bytes or --uchars must be specified.
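
       The value syntax described above can be checked with a short
       Python sketch. The function name parse_value is hypothetical, and
       accepting only a lowercase 0x prefix is an assumption; gendict
       itself may be more or less strict.

```python
import string


def parse_value(token):
    """Parse a dictionary value written in decimal or 0x-prefixed hex."""
    if token.startswith("0x"):
        digits, base, allowed = token[2:], 16, string.hexdigits
    else:
        digits, base, allowed = token, 10, string.digits
    # Only ASCII digits are accepted. The explicit check matters because
    # Python's int() would otherwise accept non-ASCII digits as well.
    if not digits or any(ch not in allowed for ch in digits):
        raise ValueError("not an ASCII integer: %r" % token)
    return int(digits, base)
```

       With this sketch, "42" and "0x2A" both parse to the same integer,
       while a value written with fullwidth digits is rejected.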

ENVIRONMENT

       ICU_DATA  Specifies the directory containing ICU data. Defaults to
                 ${prefix}/share/icu/67.1/.  Some tools in ICU depend on the
                 presence of the trailing slash. It is thus important to
                 make sure that it is present if ICU_DATA is set.

AUTHORS

       Maxime Serrano

VERSION

       1.0

COPYRIGHT

       Copyright (C) 2012 International Business Machines Corporation and
       others

SEE ALSO

       http://www.icu-project.org/userguide/boundaryAnalysis.html 

COLOPHON

       This page is part of the ICU (International Components for Unicode)
       project.  Information about the project can be found at 
       ⟨http://site.icu-project.org/home⟩.  If you have a bug report for this
       manual page, see ⟨http://site.icu-project.org/bugs⟩.  This page was
       obtained from the project's upstream Git repository
       ⟨https://github.com/unicode-org/icu⟩ on 2020-08-13.  (At that time,
       the date of the most recent commit that was found in the repository
       was 2020-08-12.)  If you discover any rendering problems in this HTML
       version of the page, or you believe there is a better or more up-to-
       date source for the page, or you have corrections or improvements to
       the information in this COLOPHON (which is not part of the original
       manual page), send a mail to man-pages@man7.org

ICU MANPAGE                      1 June 2012                      GENDICT(1)