NAME
Squeeze.pm - Shorten text to minimum syllables by using hash and vowel deletion
REVISION
$Id: Squeeze.pm,v 1.24 1998/10/08 14:58:15 jaalto Exp $
SYNOPSIS
    use Squeeze;                # import the function only
    use Squeeze qw( :ALL );     # import all functions and variables
    use English;

    while (<>)
    {
        print SqueezeText $ARG;
    }
DESCRIPTION
Squeeze text (English) to the most compact format possible, so that it is barely readable. You should convert all text to lowercase for maximum compression, because the optimisations have been designed mostly for uncapitalised letters.
Warning: Each line is processed multiple times, so expect slow conversion.
You can use this module e.g. to preprocess text before it is sent to electronic media that has maximum text size limit. For example Pagers have some arbitrary text size limit, say 200 characters, which you want to fill as much as possible. Alternatively you may have GSM Cellular phone which is capable of receiving Short Messages (SMS), whose text limit is 160 characters. To your amusement, the description text of this paragraph has been converted below using this library's SqueezeText() function. See yourself if it's readable (Yes, it takes some time to get used to). The compress ratio is typically 30-40%.
u _n use thi mod to prprce txt bfre i_s snt to
elrnic mda t_hs max txt siz lim. f_xmple Pag hv
som abitry txt siz lim, say 200 chr, W/ u wnt to fll
as mch as psbleAlternatvly u may hv GSM Cllar PH wch is
cpble of rcivng Short msg (SMS), WS/ txt lim is 160
chrTo u/ amsment, dsc txt of thi prgra has
ben cnv_ blow usng thi lbrrys SquezText() fnc See
uself if i_s redble (Yes, it tak som T to get usdto
compr rati is typcly 30-40
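The ratio quoted above can be checked for any input/output pair. The following is a minimal sketch, not part of the module: `compression_ratio` is a hypothetical helper, and it assumes the ratio means the percentage of characters saved. The sample strings are the opening words of the paragraph and of its squeezed form.

```perl
use strict;
use warnings;

# Hypothetical helper (not provided by the module): percentage of
# characters saved when $original is shortened to $squeezed.
sub compression_ratio {
    my ($original, $squeezed) = @_;
    return 0 unless length $original;    # avoid division by zero
    return 100 * (1 - length($squeezed) / length($original));
}

# Opening words of the paragraph above and of its squeezed form.
my $original = 'You can use this module';
my $squeezed = 'u _n use thi mod';

printf "%.0f%% saved\n", compression_ratio($original, $squeezed);
```

Running the same calculation over a whole message before sending it shows whether the squeezed text fits a 160 or 200 character limit.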
There are a few grammar rules which are used to shorten some English tokens considerably:
Word that has _ is usually a verb
Word that has / is usually a substantive, noun, pronoun or other non-verb
The tokens below, for example, must be understood before the text can be read. This is not yet like Geek code, because you don't need an external parser to understand it, just some common sense and time to adapt yourself to the text. For a complete, up-to-date list you have to peek at the source code.
automatically => 'acly_'
for => 4
for him => 4h
for her => 4h
for them => 4t
for those => 4t
can => _n
does => _s
it is => i_s
that is => t_s
which is => w_s
that are => t_r
which are => w_r
less => -/
more => +/
most => ++
however => h/ver
think => thk_
useful => usful
you => u
your => u/
you'd => u/d
you'll => u/l
they => t/
their => t/r
will => /w
would => /d
with => w/
without => w/o
which => W/
whose => WS/
Time is expressed with capital letters
time => T
minute => MIN
second => SEC
hour => HH
day => DD
month => MM
year => YY
Other capital-letter acronyms
phone => PH
EXAMPLES
To add new words, e.g. to the word conversion hash table, define your custom set and merge it with the existing ones. Do the same for %SQZ_WXLATE_MULTI_HASH and $SQZ_ZAP_REGEXP, and then start using the conversion function.
    use English;
    use Squeeze qw( :ALL );

    # Keys that contain a hyphen must be quoted
    my %myExtraWordHash =
    (
          'new-word1'   => 'conversion1'
        , 'new-word2'   => 'conversion2'
        , 'new-word3'   => 'conversion3'
        , 'new-word4'   => 'conversion4'
    );

    # First take the existing tables and merge them with my
    # translation table
    my %myCustomWordHash =
    (
          %SQZ_WXLATE_HASH
        , %SQZ_WXLATE_EXTRA_HASH
        , %myExtraWordHash
    );

    my $myXlat = 0;                             # state flag

    while (<>)
    {
        # $condition is a placeholder for your own test; the two
        # tests below would normally differ
        if ( $condition )
        {
            SqueezeHashSet \%myCustomWordHash;  # Use MY conversions
            $myXlat = 1;
        }

        if ( $myXlat and $condition )
        {
            SqueezeHashSet "reset";             # Back to default table
            $myXlat = 0;
        }

        print SqueezeText $ARG;
    }
Similarly, you can redefine the multi-word translation table by supplying another hash reference in the call to SqueezeHashSet(). To kill more text immediately in addition to the default, just concatenate your regexps to $SQZ_ZAP_REGEXP.
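Concatenating to the zap regexp can be sketched without the module as follows. Everything here is an assumption made for illustration: `$zap_regexp` stands in for $SQZ_ZAP_REGEXP (the module's real default value is not shown in this document), and `zap_text` is a hypothetical helper that only mimics the "kill immediately" step. With the real variable, the concatenation has to happen before the first call to SqueezeText(), because the regexp is compiled at that point.

```perl
use strict;
use warnings;

# Stand-in for $SQZ_ZAP_REGEXP; the value is invented for this sketch.
my $zap_regexp = '\b(?:hm|hi|hello)\b[,.!]*\s*';

# Extend the pattern with another alternative by plain concatenation.
$zap_regexp .= '|\b(?:regards|cheers)\b[,.!]*\s*';

# Hypothetical helper: delete every match of the zap pattern.
sub zap_text {
    my ($text) = @_;
    $text =~ s/$zap_regexp//gi;
    return $text;
}

print zap_text("Hello, the server is down. Regards, ops"), "\n";
```

Joining alternatives with | keeps the default patterns active while adding your own.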
KNOWN BUGS
There may be a lot of false conversions. If you think that some word squeezing went too far, please turn on debugging and send the log to the maintainer. To see how the conversion goes, e.g. for the word Messages:
    use English;
    use Lingua::EN::Squeeze qw( :ALL );

    SqueezeDebug( 1, '(?i)Messages' );

    $ARG = "This line has some Messages in it";
    print SqueezeText $ARG;
EXPORTABLE VARIABLES
The defaults may not cover all possible text, so you may wish to extend the hash tables and $SQZ_ZAP_REGEXP to include your unique text.
$SQZ_ZAP_REGEXP
Text to kill immediately, like "Hm, Hi, Hello...". You can only set this once, because this regexp is compiled when SqueezeText() is first called.
%SQZ_WXLATE_MULTI_HASH
Multi Word conversion: "for you" => "4u" ...
%SQZ_WXLATE_HASH
Single Word conversion hash: word => conversion. This table is applied after %SQZ_WXLATE_MULTI_HASH.
%SQZ_WXLATE_EXTRA_HASH
Aggressive Single Word conversions like: without => w/o ...
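The order in which the tables are applied (multi-word first, then single-word) can be illustrated with a small stand-alone sketch. It does not use the module: the two hashes are tiny stand-ins built from tokens listed in this document, and `translate` is a hypothetical function that shows only the table lookups, not the vowel deletion. Input is assumed to be lowercase.

```perl
use strict;
use warnings;

# Tiny stand-in tables; the module's own tables are larger.
my %multi_word  = ( 'it is' => 'i_s', 'which is' => 'w_s', 'for them' => '4t' );
my %single_word = ( 'you' => 'u', 'your' => 'u/', 'with' => 'w/',
                    'for' => '4', 'which' => 'W/' );

# Hypothetical two-pass translation: phrases first, then single words.
sub translate {
    my ($text) = @_;

    # Pass 1: multi-word phrases, longest first so longer phrases win.
    for my $phrase (sort { length($b) <=> length($a) } keys %multi_word) {
        $text =~ s/\b\Q$phrase\E\b/$multi_word{$phrase}/g;
    }

    # Pass 2: single words that survived pass 1.
    $text =~ s/\b([a-z']+)\b/exists $single_word{$1} ? $single_word{$1} : $1/ge;

    return $text;
}

print translate("which is for you"), "\n";
```

Running the phrase pass first is what lets "which is" become w_s; in the opposite order "which" alone would already have been replaced by W/.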
INTERFACE FUNCTIONS
SqueezeText($)
- Description
  Squeeze text by using vowel substitutions and deletions and hash tables that guide text substitutions.
- arg1: $text
  Text.
- Return values
  String, squeezed text.
new()
SqueezeHashSet($;$)
- Description
  Set the hash tables to use for converting text. The multiple-word conversion is done first, followed by the single-word conversions.
- arg1: \%wordHashRef
  Reference to the hash used to convert single words. If "reset", use the default hash table.
- arg2: \%multiHashRef [optional]
  Reference to the hash used to convert multiple words. If "reset", use the default hash table.
- Return values
  None.
SqueezeControl(;$)
- Description
  Select the level of text squeezing: no conversion, enabled, medium or maximum.
- arg1: $state
  String. If nothing, set the maximum squeeze level (in effect: restore defaults).
  noconv   Turn off squeeze
  conv     Turn on squeeze
  med      Set squeezing level to medium
  max      Set squeezing level to maximum
- Return values
  None.
SqueezeDebug(;$$)
- Description
  Activate or deactivate debug.
- arg1: $state [optional]
  If not given, turn debug off. If non-zero, turn debug on. You must also supply regexp if you turn on debug, unless you have given it previously.
- arg2: $regexp [optional]
  If given, use regexp to trigger debug output when debug is on.
- Return values
  None.
AVAILABILITY
Mailto: jari.aalto@poboxes.com The home page via a forwarding service is at http://www.netforward.com/poboxes/?jari.aalto or alternatively the absolute URL is ftp://cs.uta.fi/pub/ssjaaa/ but this may move without notice. Prefer keeping the forwarding service link in your bookmarks.
AUTHOR
Copyright (C) 1998-1999 Jari Aalto. All rights reserved. This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself or under the terms of the GNU General Public Licence v2 or later.