Revision history for Perl module EBook::Ishmael.
1.04 Mar 28 2025
- Added the -e|--encoding option: Specify the output encoding for outputted
text.
- Also added the ISHMAEL_ENCODING environment variable.
- Improve UTF8-handling for plain text ebook formats (text, zTXT, PalmDoc).
- Added support for the chawan web browser as a potential HTML formatter.
- When specifying cover image output, '.-' should be used now instead of '.*'
for suffix substitution. '.*' is now deprecated, but will still be
supported. However, it will be removed at some point in a future release.
- Die when stdout ('-') is given as output argument to --image|-g.
- '<body>' tags are no longer included in HTML/XHTML html dump.
- Added additional test environment variables to force enable/disable tests
for optional ebook formats.
- TEST_CBR, TEST_CB7, TEST_CHM
- Fix typos in documentation.
1.03 Mar 21 2025
- Add support for the KF8/AZW3 ebook format.
- Fix raw text dumping in Mobis.
- Fix Mobi HTML cleanup.
- ishmael verifies Mobi image records actually contain image data.
- Fix PalmDoc decoding errors.
1.02 Mar 14 2025
- Do not try to unzip CBZ archives to symlinked temporary directories.
Primarily an issue with Darwin platforms.
- Ishmael-formatted metadata (--metadata=ishmael) is now dumped in UTF8.
- Recognize DC metadata in X?HTML files.
- Recognize "generator" metadata in X?HTML files as software metadata.
- Removed dump tests.
- Improved Mobi encoding handling.
- ishmael dies if comic book archive does not have any images.
- Fixed unescaped left brace error in queequeg regex.
1.01 Mar 11 2025
- Force text browsers to dump output in UTF8.
- Fix dumping X?HTML files with no title.
- Add -e|--html-encoding option to queequeg: Force the use of a specified
encoding when reading HTML input.
- Remove unnecessary module loads.
- Use "remove_tree" instead of "rmtree".
- Fixed previous changelog date.
1.00 Mar 04 2025
This is a major upgrade that introduces many changes that may be incompatible
with previous versions of ishmael. Pay special attention to the Added,
Changes, and Removed sections of this changelog if this concerns you.
Added:
- Added the -g/--image option: Dump all images present in a given ebook to a
specified directory.
- Added support for additional ebook formats.
- Microsoft Compiled HTML (CHM)
- Comic Book archives (cb7, cbr, cbz)
- These new formats have introduced some new optional dependencies; chmlib
for CHM, 7z for cb7, and unrar for cbr.
- New metadata dump formats: pretty xml, normal xml, and normal JSON. See
the changes section for information on how the new metadata dumping system
works.
- Added EXAMPLES section to manual.
Changes:
- Rebranding as a general ebook dumper as opposed to just a plain text
converter.
- The -m/--metadata option can now dump multiple different formats of
metadata; ishmael (the original), json, pjson (the original --meta-json),
xml, and pxml. These formats are specified by an optional argument to
--metadata (--metadata=<form>). This has removed the need for --meta-json
to be a seperate option.
- Changed default -c/--cover behavior. Output is now written to a file
rather than stdout by default ("pick the right default"). You can also add
a '.*' (dot asterisk) to the end of the output path name which ishmael
will substitute for the image's format suffix.
- The -c/--cover option no longer "dies" if a cover image is not present
in an ebook.
Removed:
- Removed the -o/--output option. The new way to specify output is via a
second command-line argument following the given ebook file.
- Removed the -j/--meta-json option. Please use --metadata=pjson instead.
Fixes:
- When executing system commands via qx, ishmael now quotes arguments using
single quotes instead of double quotes. This should mean that arguments
with shell metacharacters should not cause unwanted behavior.
- ishmael no longer relies on an EPUB's metadata file to specify the 'dc'
namespace, which should fix reading some unconventionally formatted
EPUBs.
- ishmael now converts CP1252-encoded Mobis to UTF-8.
- Unix time handling has been fixed for PDB-based formats (Mobi, AZW,
PalmDoc, zTXT).
- ishmael no longer recognizes unset creation/modification dates in
PDB-based formats.
- Fixed HTML/XHTML identification heuristics.
- Fix documentation typos.
- Fix test typos.
Improvements:
- Format identification heuristics have been optimized.
0.07 Feb 25 2025
- Added -r/--raw option: Dumps the raw, unformatted text contents of a given
ebook.
- Added -c/--cover option: Dump the cover image of a given ebook if one is
present.
- As a result, pdftopng is an additional dependency if one wishes to dump
PDF covers. pdftopng should be included with most versions of
poppler-utils.
- MIME::Base64 was also added as a dependency, although it should be
included with Perl core.
- XHTML is now considered a seperate format from HTML (although its class is
derived from the HTML class, so it should act mostly the same except for
being called XHTML rather than HTML).
- Recognize some more FictionBook2 metadata.
- Improve some format heuristics.
- FictionBook2, HTML, XHTML
- When reading EPUBs, try not to dump items that are not under the
"application/xhtml+xml" media type.
- Moved PDB modules out of EBook namespace.
- Removed EBook::Ishmael::EBook::Skeleton.
0.06 Feb 22 2025
- When ran with no arguments, queequeg reads input from stdin.
- queequeg now accepts multiple files as argument.
- --output option now works with --metadata and --meta-json options.
- Removed extra newline from json output.
- No longer bothering categorizing Changes entries.
- AZW support.
- Add support for Huff/CDIC MOBI compression scheme.
0.05 Feb 17 2025
Added:
- queequeg: ishmael's own HTML dumper script that acts as a fallback
formatter if none are available.
Changes:
- Specify XML::LibXML version 1.70 as minimum required version.
Improvements:
- If temporary directory and working directory are not ok directories to
unzip epub to, try HOME directory.
- When reading HTML documents, ishmael attempts to recover malformed HTML.
Fixes:
- Check if working directory is writable instead of readable when creating
temporary epub extract directory.
0.04 Feb 14 2025
Added:
- Support for TEST_PDF environment variable in tests. This allows you to
force enable/disable tests for PDFs regardless of whether poppler utils
are installed or not.
Changes:
- Introduced Cwd dependency.
- Specified File::Temp version 0.10 as minimum required version.
Improvements:
- More accurate text heuristic.
Fixes:
- Try to unzip epub to working directory if temporary directory is
symlinked, as Archive::Zip does not support unzipping to symlinked
directories. This was primarily an issue with Darwin platforms.
- Corrected some documentation errors.
0.03 Feb 11 2025
Changes:
- ishmael now uses a format-agnostic interface for handling ebook metadata.
This means that although the metadata ishmael gathers will now be less
detailed, it should at least be consistent across different formats.
Fixes:
- Modify PDF tests to account for differing output between different
verions of pdfinfo.
- Use '-T text/html' option for w3m, so that it recognizes given data as
HTML and actually tries to format it.
0.02 Feb 10 2025
Added:
- --format can now recognize some alternative format names.
- fb2 for FictionBook2
- xhtml for HTML
Changes:
- Removed dumper list from help message.
Fixes:
- Fix the syntax of the width check in browser_dump() to make it compatible
with older Perl versions.
- Add missing developer documentation.
0.01 Feb 09 2025
- Initial release.