
NAME

Image::Leptonica::Func::psio1

VERSION

version 0.04

psio1.c

  |=============================================================|
  |                         Important note                      |
  |=============================================================|
  | Some of these functions require libtiff, libjpeg and libz.  |
  | If you do not have these libraries, you must set            |
  |     #define  USE_PSIO     0                                 |
  | in environ.h.  This will link psio1stub.c                   |
  |=============================================================|

   This is a PostScript "device driver" for wrapping images
   in PostScript.  The images can be rendered by a PostScript
   interpreter for viewing, using a viewer such as evince or gv.
   They can also be rasterized for printing, using gs or an
   embedded interpreter in a PostScript printer, and they can be
   converted to PDF using gs (ps2pdf).

   Convert specified files to PS
        l_int32          convertFilesToPS()
        l_int32          sarrayConvertFilesToPS()
        l_int32          convertFilesFittedToPS()
        l_int32          sarrayConvertFilesFittedToPS()
        l_int32          writeImageCompressedToPSFile()

   Convert mixed text/image files to PS
        l_int32          convertSegmentedPagesToPS()
        l_int32          pixWriteSegmentedPageToPS()
        l_int32          pixWriteMixedToPS()

   Convert any image file to PS for embedding
        l_int32          convertToPSEmbed()

   Write all images in a pixa out to PS
        l_int32          pixaWriteCompressedToPS()

These PostScript converters are used in three different ways.

(1) For embedding a PS file in a program like TeX.
    convertToPSEmbed() handles this for levels 1, 2 and 3 output,
    and prog/converttops wraps this in an executable.
    converttops is a generalization of Thomas Merz's jpeg2ps wrapper,
    in that it works for all types (formats, depth, colormap)
    of input images and gives PS output in one of these formats:
      * level 1 (uncompressed)
      * level 2 (compressed ccittg4 or dct)
      * level 3 (compressed flate)

(2) For composing a set of pages with any number of images
    painted on them, in either level 2 or level 3 formats.

(3) For printing a page image or a set of page images, at a
    resolution that optimally fills the page, using
    convertFilesFittedToPS().

The top-level calls of utilities in category 2, which can compose
multiple images on a page and generate a PostScript file for
printing or display (e.g., after conversion to PDF), are:
    convertFilesToPS()
    convertFilesFittedToPS()
    convertSegmentedPagesToPS()

All images are output with page numbers.  Bounding box hints are
more subtle.  They must be included for embedding images in
TeX, for example, and the low-level writers include bounding
box hints by default.  However, these hints should not be included for
multi-page PostScript that is composed of a sequence of images;
consequently, they are not written when calling higher-level
functions such as convertFilesToPS(), convertFilesFittedToPS()
and convertSegmentedPagesToPS().  The function l_psWriteBoundingBox()
sets a flag to give low-level control over this.

FUNCTIONS

convertFilesFittedToPS

l_int32 convertFilesFittedToPS ( const char *dirin, const char *substr, l_float32 xpts, l_float32 ypts, const char *fileout )

convertFilesFittedToPS()

    Input:  dirin (input directory)
            substr (<optional> substring filter on filenames; can be NULL)
            xpts, ypts (desired size in printer points; use 0 for default)
            fileout (output ps file)
    Return: 0 if OK, 1 on error

Notes:
    (1) This generates a PS file for all files in a specified directory
        whose names contain the substring @substr (all files if @substr
        is NULL).
    (2) Each image is written to a separate page in the output PS file.
    (3) All images are written compressed:
            * if tiffg4  -->  use ccittg4
            * if jpeg    -->  use dct
            * all others -->  use flate
        If the image is jpeg or tiffg4, we use the existing compressed
        strings for the encoding; otherwise, we read the image into
        a pix and flate-encode the pieces.
    (4) The resolution is internally determined such that the images
        are rendered, in at least one direction, at 100% of the given
        size in printer points.  Use 0.0 for xpts or ypts to get
        the default value, which is 612.0 or 792.0, respectively
        (8.5 x 11 inches at 72 points per inch).
    (5) The size of the PostScript file is independent of the resolution,
        because the full image data is always encoded.  The @xpts and
        @ypts parameters only tell the PS interpreter how to render the page.
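Note (4) can be read as a small constraint problem: at a resolution of res ppi, an image w pixels wide renders at 72 * w / res points, so it fits the page only if res >= 72 * w / xpts and res >= 72 * h / ypts. The C sketch below takes the smallest admissible resolution, which fills the page exactly in one direction. fitted_resolution() is a hypothetical name, not a Leptonica function, and the library's own rounding may differ.

```c
/* Default page size in printer points (1 pt = 1/72 inch):
 * 8.5 x 11 inch letter paper. */
#define DEFAULT_XPTS 612.0f
#define DEFAULT_YPTS 792.0f

/* Hypothetical sketch: choose the rendering resolution (ppi) so that a
 * w x h pixel image fits inside an xpts x ypts page, filling it exactly
 * in at least one direction.  Pass 0 for xpts or ypts to use the default. */
float fitted_resolution(int w, int h, float xpts, float ypts)
{
    if (xpts <= 0.0f) xpts = DEFAULT_XPTS;
    if (ypts <= 0.0f) ypts = DEFAULT_YPTS;

    /* At resolution res the rendered width is 72 * w / res points.
     * Fitting needs res >= 72 * w / xpts and res >= 72 * h / ypts;
     * the larger of the two bounds is the smallest res that fits. */
    float resx = 72.0f * (float)w / xpts;
    float resy = 72.0f * (float)h / ypts;
    return resx > resy ? resx : resy;
}
```

For a letter page scanned at 300 ppi (2550 x 3300 pixels), fitted_resolution(2550, 3300, 0, 0) gives 300, so the page exactly fills the default 612 x 792 pt area.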

convertFilesToPS

l_int32 convertFilesToPS ( const char *dirin, const char *substr, l_int32 res, const char *fileout )

convertFilesToPS()

    Input:  dirin (input directory)
            substr (<optional> substring filter on filenames; can be NULL)
            res (typ. 300 or 600 ppi)
            fileout (output ps file)
    Return: 0 if OK, 1 on error

Notes:
    (1) This generates a PS file for all image files in a specified
        directory whose names contain the substring @substr (all image
        files if @substr is NULL).
    (2) Each image is written to a separate page in the output PS file.
    (3) All images are written compressed:
            * if tiffg4  -->  use ccittg4
            * if jpeg    -->  use dct
            * all others -->  use flate
        If the image is jpeg or tiffg4, we use the existing compressed
        strings for the encoding; otherwise, we read the image into
        a pix and flate-encode the pieces.
    (4) The meaning of @res is often confusing.  It is interpreted
        as the resolution of the output display device: if the
        input image were digitized at 300 ppi, what would it
        look like when displayed at @res ppi?  So, for example,
        if res = 100 ppi, then the display pixels are 3x larger
        than the 300 ppi pixels, and the image will be rendered
        3x larger.
    (5) The size of the PostScript file is independent of the resolution,
        because the full image data is always encoded.  The @res parameter
        only tells the PS interpreter how to render the page.  Therefore,
        for minimum file size without loss of visual information,
        if the output res is less than 300, you should downscale
        the image to the output resolution before wrapping in PS.
    (6) The "canvas" on which the image is rendered, at the given
        output resolution, is a standard page size (8.5 x 11 in).
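The arithmetic behind note (4) is that a dimension of n pixels rendered at res ppi occupies n / res inches, or 72 * n / res printer points. rendered_points() below is a hypothetical helper written only to show this relation; it is not part of Leptonica.

```c
/* Hypothetical helper: size, in printer points (1/72 inch), at which an
 * image dimension of npix pixels is rendered when the output device
 * resolution is res ppi.  A smaller res gives a larger rendered image. */
double rendered_points(int npix, int res)
{
    return 72.0 * (double)npix / (double)res;
}
```

A page 2550 pixels wide (8.5 in at 300 ppi) gives 612 pt at res = 300 and 1836 pt at res = 100, three times larger, which overflows the 8.5 x 11 in canvas of note (6).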

convertSegmentedPagesToPS

l_int32 convertSegmentedPagesToPS ( const char *pagedir, const char *pagestr, const char *maskdir, const char *maskstr, l_int32 numpre, l_int32 numpost, l_int32 maxnum, l_float32 textscale, l_float32 imagescale, l_int32 threshold, const char *fileout )

convertSegmentedPagesToPS()

    Input:  pagedir (input page image directory)
            pagestr (<optional> substring filter on page filenames;
                     can be NULL)
            maskdir (input mask image directory)
            maskstr (<optional> substring filter on mask filenames;
                     can be NULL)
            numpre (number of characters in name before number)
            numpost (number of characters in name after number)
            maxnum (only consider page numbers up to this value)
            textscale (scale of text output relative to pixs)
            imagescale (scale of image output relative to pixs)
            threshold (for binarization; typ. about 190; 0 for default)
            fileout (output ps file)
    Return: 0 if OK, 1 on error

Notes:
    (1) This generates a PS file for all page image and mask files in the
        two specified directories whose names contain page numbers, as
        described in note (7).  The two directories can be the same, in
        which case the page and mask files are distinguished by the two
        substrings @pagestr and @maskstr.
    (2) The page images are taken in lexicographic order.
        Mask images whose numbers match the page images are used to
        segment the page images.  Page images without a matching
        mask image are scaled, thresholded and rendered entirely as text.
    (3) Each PS page is generated as a compressed representation of
        the page image, where the part of the image under the mask
        is suitably scaled and compressed as DCT (i.e., jpeg), and
        the remaining part of the page is suitably scaled, thresholded,
        compressed as G4 (i.e., tiff g4), and rendered by painting
        black through the resulting text mask.
    (4) The scaling is typically 2x down for the DCT component
        (@imagescale = 0.5) and 2x up for the G4 component
        (@textscale = 2.0).
    (5) The resolution is automatically set to fit to a
        letter-size (8.5 x 11 inch) page.
    (6) Both the DCT and the G4 encoding are PostScript level 2.
    (7) It is assumed that the page number is contained within
        the basename (the filename without directory or extension).
        @numpre is the number of characters in the basename
        preceding the actual page number; @numpost is the number
        following the page number.  Note: the same numbers must be
        applied to both the page and mask image names.
    (8) To render a page as is -- that is, with no thresholding
        of any pixels -- use a mask in the mask directory that is
        full size with all pixels set to 1.  If the page is 1 bpp,
        it is not necessary to have a mask.
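Note (7) describes how page numbers are located in filenames. The sketch below shows one way to apply @numpre and @numpost; page_number_from_path() is hypothetical and is not the library's implementation. It assumes '/' as the directory separator, treats the text after the last '.' as the extension, and does not check for integer overflow.

```c
#include <string.h>

/* Hypothetical helper: extract the page number from a file path.
 * The basename is the filename with its directory and last extension
 * removed; the number lies between the first numpre and the last
 * numpost characters of that basename.  Returns -1 if none is found. */
int page_number_from_path(const char *path, int numpre, int numpost)
{
    const char *base = strrchr(path, '/');
    base = base ? base + 1 : path;                 /* strip directory */

    const char *dot = strrchr(base, '.');
    size_t len = dot ? (size_t)(dot - base) : strlen(base); /* strip ext */

    if (numpre < 0 || numpost < 0 ||
        (size_t)numpre + (size_t)numpost >= len)
        return -1;                                 /* no room for digits */

    size_t ndigits = len - (size_t)numpre - (size_t)numpost;
    const char *start = base + numpre;
    int value = 0;
    for (size_t i = 0; i < ndigits; i++) {
        if (start[i] < '0' || start[i] > '9')
            return -1;                             /* not purely numeric */
        value = value * 10 + (start[i] - '0');
    }
    return value;
}
```

With numpre = 5 and numpost = 0, both "/scans/page_0012.tif" and "masks/mask_0012.png" yield 12, so that page and mask would be paired.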

convertToPSEmbed

l_int32 convertToPSEmbed ( const char *filein, const char *fileout, l_int32 level )

convertToPSEmbed()

    Input:  filein (input image file -- any format)
            fileout (output ps file)
            level (compression: 1 (uncompressed), 2 or 3)
    Return: 0 if OK, 1 on error

Notes:
    (1) This is a wrapper function that generates a PS file with
        a bounding box, from any input image file.
    (2) The best available compression is chosen for the specified level.
        @level=3 does flate compression on anything that is not
        tiffg4 (1 bpp) or jpeg (8 bpp or rgb).
    (3) If @level=2 and the file is not tiffg4 or jpeg, it will
        first be written to file as jpeg with quality = 75.
        This will remove the colormap and cause some degradation
        in the image.
    (4) The bounding box is required when a program such as TeX
        (through epsf) places and rescales the image.  It is
        sized for fitting the image to an 8.5 x 11.0 inch page.
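Notes (2) and (3) amount to a decision table on the input format and the level. The enum and function names below are hypothetical; the sketch encodes only what the notes state and treats any level other than 1 or 2 as level 3.

```c
/* Input formats distinguished by the notes. */
enum input_format { FMT_TIFF_G4, FMT_JPEG, FMT_OTHER };

/* PostScript encodings that can result. */
enum ps_encoding { ENC_UNCOMPRESSED, ENC_G4, ENC_DCT, ENC_FLATE };

/* Hypothetical sketch of the per-level choice.  *transcoded is set to 1
 * when the input must first be rewritten as jpeg (quality 75), which
 * drops any colormap and degrades the image. */
enum ps_encoding embed_encoding(enum input_format fmt, int level,
                                int *transcoded)
{
    *transcoded = 0;
    if (level == 1)
        return ENC_UNCOMPRESSED;        /* level 1: no compression */
    if (fmt == FMT_TIFF_G4)
        return ENC_G4;                  /* reuse the ccittg4 data */
    if (fmt == FMT_JPEG)
        return ENC_DCT;                 /* reuse the jpeg data */
    if (level == 2) {
        *transcoded = 1;                /* write as jpeg, quality 75, first */
        return ENC_DCT;
    }
    return ENC_FLATE;                   /* level 3: flate everything else */
}
```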

pixWriteMixedToPS

l_int32 pixWriteMixedToPS ( PIX *pixb, PIX *pixc, l_float32 scale, l_int32 pageno, const char *fileout )

pixWriteMixedToPS()

    Input:  pixb (<optional> 1 bpp "mask"; typically for text)
            pixc (<optional> 8 or 32 bpp image regions)
            scale (scale factor for rendering pixb relative to
                  pixc; typ. 4.0)
            pageno (page number in set; use 1 for new output file)
            fileout (output ps file)
    Return: 0 if OK, 1 on error

Notes:
    (1) This low level function generates the PS string for a mixed
        text/image page, and adds it to an existing file if
        @pageno > 1.
    (2) The two images (pixb and pixc) are typically generated at the
        resolution that they will be rendered in the PS file.
    (3) pixb is the text component.  In the PostScript world, we think of
        it as a mask through which we paint black.
    (4) pixc is the (typically halftone) image component.  It is
        white in the rest of the page.  To minimize the size of the
        PS file, it should be rendered at a resolution that is at
        least equal to its actual resolution.
    (5) @scale gives the ratio of resolution of pixb to pixc.
        Typical resolutions are: 600 ppi for pixb, 150 ppi for pixc;
        so @scale = 4.0.  If one of the images is not defined,
        the value of @scale is ignored.
    (6) We write pixc with DCT compression (jpeg).  This is followed
        by painting the text as black through the mask pixb.  If
        pixc doesn't exist (all text), we write the text with the
        PS "image" operator instead of the "imagemask" operator,
        because ghostscript's ps2pdf is flaky when the latter is used.
    (7) The actual output resolution is determined by fitting the
        result to a letter-size (8.5 x 11 inch) page.
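Notes (3) to (6) describe the page that results: the image component is painted first, and black is then painted through the text mask. The sketch below models only that visual result on 8-bit gray rasters (0 = black, 255 = white, one byte per mask pixel). composite_page() is hypothetical, it does not generate PostScript, and it takes an integer scale for simplicity, whereas the real @scale is a float.

```c
#include <stdint.h>

/* Hypothetical model of the rendered page.  mask is bw x bh (nonzero =
 * text); img is cw x ch at 1/scale of the mask resolution, or NULL for
 * an all-text page.  out receives bw x bh gray pixels. */
void composite_page(const uint8_t *mask, int bw, int bh,
                    const uint8_t *img, int cw, int ch, int scale,
                    uint8_t *out)
{
    if (scale < 1)
        scale = 1;                               /* guard the division */
    for (int y = 0; y < bh; y++) {
        for (int x = 0; x < bw; x++) {
            uint8_t v = 255;                     /* white page */
            if (img) {
                int cx = x / scale, cy = y / scale;
                if (cx < cw && cy < ch)
                    v = img[cy * cw + cx];       /* image component */
            }
            if (mask[y * bw + x])
                v = 0;                           /* paint black through mask */
            out[y * bw + x] = v;
        }
    }
}
```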

pixWriteSegmentedPageToPS

l_int32 pixWriteSegmentedPageToPS ( PIX *pixs, PIX *pixm, l_float32 textscale, l_float32 imagescale, l_int32 threshold, l_int32 pageno, const char *fileout )

pixWriteSegmentedPageToPS()

    Input:  pixs (all depths; colormap ok)
            pixm (<optional> 1 bpp segmentation mask over image region)
            textscale (scale of text output relative to pixs)
            imagescale (scale of image output relative to pixs)
            threshold (threshold for binarization; typ. 190)
            pageno (page number in set; use 1 for new output file)
            fileout (output ps file)
    Return: 0 if OK, 1 on error

Notes:
    (1) This generates the PS string for a mixed text/image page,
        and adds it to an existing file if @pageno > 1.
        The PS output is determined by fitting the result to
        a letter-size (8.5 x 11 inch) page.
    (2) The two images (pixs and pixm) are at the same resolution
        (typically 300 ppi).  They are used to generate two compressed
        images, pixb and pixc, that are put directly into the output
        PS file.
    (3) pixb is the text component.  In the PostScript world, we think of
        it as a mask through which we paint black.  It is produced by
        scaling pixs by @textscale, and thresholding to 1 bpp.
    (4) pixc is the image component, which is that part of pixs under
        the mask pixm.  It is scaled from pixs by @imagescale.
    (5) Typical values are textscale = 2.0 and imagescale = 0.5.
    (6) If pixm == NULL, the page has only text.  If pixm is all black
        (every pixel set), the page is all image and has no text.
    (7) This can be used to write a multi-page PS file, by using
        sequential page numbers with the same output file.  It can
        also be used to write separate PS files for each page,
        by using different output files with @pageno = 0 or 1.
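With the typical values in note (5), a 300 ppi page yields text at 600 ppi and image regions at 150 ppi, the same typical resolutions given for pixWriteMixedToPS(), whose @scale is their ratio, 4.0. The sketch below assumes the ratio relevant there is textscale / imagescale; that link is inferred from the two sets of notes rather than stated in them, and mixed_resolutions() is a hypothetical name.

```c
/* Hypothetical sketch: resolutions of the two components derived from a
 * source page at src_ppi, and their ratio (text ppi over image ppi). */
struct mixed_res { double text_ppi, image_ppi, scale; };

struct mixed_res mixed_resolutions(double src_ppi, double textscale,
                                   double imagescale)
{
    struct mixed_res r;
    r.text_ppi  = src_ppi * textscale;   /* pixb: scaled, thresholded text */
    r.image_ppi = src_ppi * imagescale;  /* pixc: scaled image region */
    r.scale     = textscale / imagescale;
    return r;
}
```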

pixaWriteCompressedToPS

l_int32 pixaWriteCompressedToPS ( PIXA *pixa, const char *fileout, l_int32 res, l_int32 level )

pixaWriteCompressedToPS()

    Input:  pixa (any set of images)
            fileout (output ps file)
            res (of input image)
            level (compression: 2 or 3)
    Return: 0 if OK, 1 on error

Notes:
    (1) This generates a PS file of multiple page images, all
        with bounding boxes.
    (2) It compresses to:
            cmap + level2:        jpeg
            cmap + level3:        flate
            1 bpp:                tiffg4
            2 or 4 bpp + level2:  jpeg
            2 or 4 bpp + level3:  flate
            8 bpp:                jpeg
            16 bpp:               flate
            32 bpp:               jpeg
    (3) To generate a pdf, use: ps2pdf <infile.ps> <outfile.pdf>
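The table in note (2) can be expressed as a lookup on depth, colormap and level. pixa_choose_encoding() and its enum are hypothetical names. The colormap case is tested first because the table lists it first; how the library treats a colormapped 1 bpp image is an assumption here.

```c
/* Encodings listed in the table. */
enum pixa_encoding { PIXA_JPEG, PIXA_FLATE, PIXA_G4, PIXA_UNSUPPORTED };

/* Hypothetical sketch: pick an encoding from pixel depth, presence of a
 * colormap, and PS level (2 or 3), following the table row by row. */
enum pixa_encoding pixa_choose_encoding(int depth, int has_cmap, int level)
{
    if (has_cmap)
        return level == 2 ? PIXA_JPEG : PIXA_FLATE;
    switch (depth) {
    case 1:  return PIXA_G4;
    case 2:
    case 4:  return level == 2 ? PIXA_JPEG : PIXA_FLATE;
    case 8:  return PIXA_JPEG;
    case 16: return PIXA_FLATE;
    case 32: return PIXA_JPEG;
    default: return PIXA_UNSUPPORTED;   /* depth not covered by the table */
    }
}
```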

sarrayConvertFilesFittedToPS

l_int32 sarrayConvertFilesFittedToPS ( SARRAY *sa, l_float32 xpts, l_float32 ypts, const char *fileout )

sarrayConvertFilesFittedToPS()

    Input:  sa (sarray of full path names)
            xpts, ypts (desired size in printer points; use 0 for default)
            fileout (output ps file)
    Return: 0 if OK, 1 on error

Notes:
    (1) See convertFilesFittedToPS()

sarrayConvertFilesToPS

l_int32 sarrayConvertFilesToPS ( SARRAY *sa, l_int32 res, const char *fileout )

sarrayConvertFilesToPS()

    Input:  sa (sarray of full path names)
            res (typ. 300 or 600 ppi)
            fileout (output ps file)
    Return: 0 if OK, 1 on error

Notes:
    (1) See convertFilesToPS()

writeImageCompressedToPSFile

l_int32 writeImageCompressedToPSFile ( const char *filein, const char *fileout, l_int32 res, l_int32 *pfirstfile, l_int32 *pindex )

writeImageCompressedToPSFile()

    Input:  filein (input image file)
            fileout (output ps file)
            res (output printer resolution)
            &firstfile (<input and return> 1 if the first image;
                        0 otherwise)
            &index (<input and return> index of image in output ps file)
    Return: 0 if OK, 1 on error

Notes:
    (1) This wraps a single page image in PS.
    (2) The input file can be in any format.  It is compressed as follows:
           * if in tiffg4  -->  use ccittg4
           * if in jpeg    -->  use dct
           * all others    -->  use flate
    (3) Before the first call, set @firstfile = 1.  After writing
        the first page, it will be set to 0.
    (4) @index is incremented if the page is successfully written.
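Notes (3) and (4) describe an in/out protocol for @firstfile and @index that the caller threads through successive calls. Because the real function needs Leptonica and image files, the sketch below replaces it with a hypothetical stub, write_page_stub(), that only updates the two values as the notes describe; leaving both unchanged on failure is an assumption.

```c
/* Hypothetical stand-in for writeImageCompressedToPSFile(): ok simulates
 * whether the page was written.  Returns 0 on success, 1 on error. */
int write_page_stub(int ok, int *pfirstfile, int *pindex)
{
    if (!ok)
        return 1;            /* failure: firstfile and index unchanged */
    /* A real writer would start a new file when *pfirstfile == 1 and
     * append to the existing file otherwise. */
    *pfirstfile = 0;         /* later pages are appended */
    (*pindex)++;             /* number of pages written so far */
    return 0;
}

/* Write n pages in sequence, as a caller of the real function would.
 * Returns the number of failed pages. */
int write_pages(const int *ok, int n, int *pindex)
{
    int firstfile = 1;       /* must be 1 before the first call */
    int nerrors = 0;
    *pindex = 0;
    for (int i = 0; i < n; i++)
        nerrors += write_page_stub(ok[i], &firstfile, pindex);
    return nerrors;
}
```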

AUTHOR

Zakariyya Mughal <zmughal@cpan.org>

COPYRIGHT AND LICENSE

This software is copyright (c) 2014 by Zakariyya Mughal.

This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.