
NAME

freqtable - Print frequency table of lines/words/characters/bytes/numbers

VERSION

This document describes version 0.008 of freqtable (from Perl distribution App-freqtable), released on 2023-12-28.

SYNOPSIS

% freqtable [OPTIONS] < INPUT

Sample input:

% cat input-lines.txt
one
one
two
three
four
five
five
five
six
seven
eight
eight
nine

% cat input-words.txt
one one two three four five five five six seven eight eight nine

% cat input-nums.txt
9.99 cents
9.99 dollars
9 cents

Modes

Display frequency table (by default: lines):

% freqtable input-lines.txt
3       five
2       eight
2       one
1       four
1       nine
1       seven
1       six
1       three
1       two

Display frequency table (words):

% freqtable -w input-words.txt
3       five
2       eight
2       one
1       four
1       nine
1       seven
1       six
1       three
1       two

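Display frequency table (characters; per OPTIONS, -c counts bytes, which is the same thing for this ASCII input). The two rows without a visible key are the space and the newline: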

% freqtable -c input-words.txt
12
12      e
 7      i
 5      n
 4      f
 4      o
 4      t
 4      v
 3      h
 2      g
 2      r
 2      s
 1

 1      u
 1      w
 1      x
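The line, word, and character modes differ only in how the input is split into items before counting. The following Python sketch illustrates that idea; it is not freqtable's actual implementation. The helper name `split_items` is hypothetical, and treating a word as any whitespace-separated run is an assumption, since the exact splitting rule is not stated here.

```python
from collections import Counter

def split_items(text, mode="lines"):
    """Split text into the items to be counted (hypothetical helper)."""
    if mode == "lines":
        return text.splitlines()
    if mode == "words":
        # Assumption: a word is any run of non-whitespace characters.
        return text.split()
    if mode == "chars":
        # Every character counts, including spaces and the newline.
        return list(text)
    raise ValueError(f"unknown mode: {mode}")

text = "one one two three four five five five six seven eight eight nine\n"
word_counts = Counter(split_items(text, "words"))
char_counts = Counter(split_items(text, "chars"))
```

In this sketch the space is counted 12 times and the newline once, which corresponds to the two rows without a visible key in the character table above.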

Display frequency table (nums):

% freqtable -n input-nums.txt
2      9.99
1      9

Display frequency table (integers):

% freqtable -i input-nums.txt
3      9
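The number and integer modes can be thought of as reducing each line to a numeric key before counting. A rough Python sketch follows, assuming the key is the leading numeric prefix of the line and that a line without one counts as 0; how freqtable itself parses numbers is not specified here, and the helper names are hypothetical.

```python
import re
from collections import Counter

# Assumption: the number is the leading numeric prefix of the line,
# and a line with no such prefix is treated as 0.
_NUM = re.compile(r"\s*([+-]?(?:\d+\.?\d*|\.\d+))")

def leading_number(line):
    """Return the leading number of a line as a float (hypothetical helper)."""
    m = _NUM.match(line)
    return float(m.group(1)) if m else 0.0

def as_key(line, integer=False):
    """Return the key a line is counted under (hypothetical helper)."""
    value = leading_number(line)
    if integer:
        return str(int(value))  # truncates toward zero: 9.99 -> 9
    # "g" formatting renders 9.0 as "9" and 9.99 as "9.99".
    return f"{value:g}"

lines = ["9.99 cents", "9.99 dollars", "9 cents"]
num_counts = Counter(as_key(line) for line in lines)
int_counts = Counter(as_key(line, integer=True) for line in lines)
```

With the sample input, `num_counts` groups the lines under "9.99" (twice) and "9" (once), while `int_counts` puts all three lines under "9", matching the two tables above.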

-F option

Don't display the frequencies:

% freqtable -F input-lines.txt
five
eight
one
four
nine
seven
six
three
two

Filter by frequencies

Only display lines that appear three times:

% freqtable input-lines.txt --freq 3
3       five

Only display lines that appear more than once:

% freqtable input-lines.txt --freq 2-
3       five
2       eight
2       one

Only display lines that appear fewer than three times:

% freqtable input-lines.txt --freq -2
2       eight
2       one
1       four
1       nine
1       seven
1       six
1       three
1       two
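The four forms accepted by --freq (N, M-N, M-, and -N, as described under OPTIONS) all reduce to an inclusive range with optional bounds. A minimal Python sketch of that interpretation, using the hypothetical helpers `parse_freq` and `freq_matches`; it illustrates the documented semantics and is not taken from freqtable's source:

```python
def parse_freq(spec):
    """Parse 'N', 'M-N', 'M-' or '-N' into an inclusive (lo, hi) range.

    None means that the bound is open. Hypothetical helper.
    """
    if "-" not in spec:
        n = int(spec)
        return n, n
    lo, hi = spec.split("-", 1)
    return (int(lo) if lo else None, int(hi) if hi else None)

def freq_matches(count, spec):
    """Return True if count falls inside the range described by spec."""
    lo, hi = parse_freq(spec)
    return (lo is None or count >= lo) and (hi is None or count <= hi)

counts = {"five": 3, "eight": 2, "one": 2, "four": 1}
at_least_two = [key for key, n in counts.items() if freq_matches(n, "2-")]
at_most_two = [key for key, n in counts.items() if freq_matches(n, "-2")]
```

Here `at_least_two` keeps five, eight, and one, while `at_most_two` keeps eight, one, and four.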

Sorting

Instead of the default sort by frequency (descending), you can sort by key: specify --sort-sub (and optionally one or more --sort-arg) to use one of the Sort::Sub::* subroutines. Examples:

# sort by keys, asciibetically
% freqtable input-lines.txt --sort-sub asciibetically
2       eight
3       five
1       four
1       nine
2       one
1       seven
1       six
1       three
1       two

# sort by keys, asciibetically (descending order)
% freqtable input-lines.txt --sort-sub 'asciibetically<r>'
1       two
1       three
1       six
1       seven
2       one
1       nine
1       four
3       five
2       eight

# sort by keys, randomly using Perl code (essentially, shuffling)
% freqtable input-lines.txt --sort-sub 'by_perl_code' --sort-arg 'code=int(rand()*3)-1'
3       five
1       three
2       eight
1       seven
2       one
1       six
1       nine
1       two
1       four
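A sort specification such as 'asciibetically<r>' combines a routine name with optional i and r flags. The Python sketch below shows one way such a specification could be parsed and applied to keys. It is an illustration only: the helper names are hypothetical, only the asciibetically routine is modeled, and the real work is done by the Perl Sort::Sub modules.

```python
import re

# name, optionally followed by <i>, <r> or <ir>
_SPEC = re.compile(r"^(\w+)(?:<(i?r?)>)?$")

def parse_sort_spec(spec):
    """Split 'name<ir>' into (name, ignore_case, reverse). Hypothetical."""
    m = _SPEC.match(spec)
    if not m:
        raise ValueError(f"bad sort spec: {spec!r}")
    flags = m.group(2) or ""
    return m.group(1), "i" in flags, "r" in flags

def sort_keys(keys, spec):
    """Sort keys according to spec; only 'asciibetically' is sketched."""
    name, ignore_case, reverse = parse_sort_spec(spec)
    if name != "asciibetically":
        raise NotImplementedError(name)
    # Plain string comparison orders ASCII text by character code.
    keyfunc = str.lower if ignore_case else None
    return sorted(keys, key=keyfunc, reverse=reverse)

descending = sort_keys(["one", "two", "eight", "five"], "asciibetically<r>")
```

With these four keys, `descending` comes out as two, one, five, eight, the same relative order as in the descending example above.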

DESCRIPTION

This utility counts the occurrences of lines (or words/characters) in the input, then displays each unique item along with its number of occurrences. You can also instruct it to show only items that have a specified number of occurrences.

You can use the following Unix command to count occurrences of lines:

% sort input-lines.txt | uniq -c | sort -nr

and with a bit more work you can also use a combination of existing Unix commands to count occurrences of words/characters, as well as filter items that have a specified number of occurrences; freqtable basically offers convenience.
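The pipeline above amounts to counting identical lines and then sorting on the counts. As a rough illustration only (this is not freqtable's Perl implementation), the same table could be computed in Python as follows. The function name `freq_table` is hypothetical, and breaking ties alphabetically is an assumption inferred from the sample output in the SYNOPSIS.

```python
from collections import Counter

def freq_table(lines):
    """Return (count, line) pairs, most frequent first."""
    counts = Counter(line.rstrip("\n") for line in lines)
    # Sort by descending count; ties are broken by key (an assumption).
    return sorted(((n, key) for key, n in counts.items()),
                  key=lambda pair: (-pair[0], pair[1]))

sample = ["one", "one", "two", "three", "four", "five", "five", "five",
          "six", "seven", "eight", "eight", "nine"]
table = freq_table(sample)

if __name__ == "__main__":
    for n, key in table:
        print(f"{n}\t{key}")
```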

EXIT CODES

0 on success.

255 on I/O error.

99 on command-line options error.

OPTIONS

  • --bytes, -c

  • --chars, -m

  • --words, -w

  • --lines, -l

  • --number, -n

    Treat each line as a number. A line like this:

    9.99 cents

    will be regarded as:

    9.99

  • --integer, -i

    Treat each line as an integer. A line like this:

    9.99 cents

    will be regarded as:

    9

  • --ignore-case, -f

  • --no-print-freq, -F

    Will not print the frequencies.

  • --freq=s

    Filter by frequencies. N (e.g. --freq 5) means only display items that occur N times. M-N (e.g. --freq 5-10) means only display items that occur between M and N times. M- (e.g. --freq 5-) means only display items that occur at least M times. -N (e.g. --freq -10) means only display items that occur at most N times.

  • --sort-sub=s

    This will cause freqtable to sort by key name instead of by frequency. Use this option to specify a Sort::Sub routine, which is the name of a Sort::Sub::* module without the Sort::Sub:: prefix, e.g. asciibetically. The name can optionally be followed by <i>, <r>, or <ir> to mean case-insensitive sorting, reverse order, or reverse-order case-insensitive sorting, respectively. When you use one of these suffixes on the command line, remember to quote it, since < and > can be interpreted by the shell.

    Examples:

    asciibetically
    asciibetically<i>
    by_length<r>

  • --sort-arg=ARGNAME=ARGVALUE

    Pass argument(s) to the sort subroutine. Can be specified multiple times, once for every argument.

  • -a

    Shortcut for --sort-sub=asciibetically.

  • --percent, -p

    Show frequencies as percentages.
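As an illustration of the --percent idea, the Python sketch below converts raw counts into percentages. It assumes the denominator is the total number of counted items; the exact denominator and output format that freqtable uses are not documented here, and `as_percentages` is a hypothetical name.

```python
def as_percentages(counts):
    """Map each key to its share of all counted items, in percent."""
    total = sum(counts.values())
    if total == 0:
        return {}
    # Assumption: percentages are relative to the total item count.
    return {key: 100.0 * n / total for key, n in counts.items()}

shares = as_percentages({"five": 3, "one": 2, "two": 1, "six": 2})
```

With these counts the total is 8, so "five" accounts for 37.5 percent and "two" for 12.5 percent.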

FAQ

HOMEPAGE

Please visit the project's homepage at https://metacpan.org/release/App-freqtable.

SOURCE

Source repository is at https://github.com/perlancar/perl-App-freqtable.

SEE ALSO

Unix commands wc, sort, uniq

wordstat from App::wordstat

csv-freqtable from App::CSVUtils

AUTHOR

perlancar <perlancar@cpan.org>

CONTRIBUTING

To contribute, you can send patches by email/via RT, or send pull requests on GitHub.

Most of the time, you don't need to build the distribution yourself. You can simply modify the code, then test via:

% prove -l

If you want to build the distribution (e.g. to try to install it locally on your system), you can install Dist::Zilla, Dist::Zilla::PluginBundle::Author::PERLANCAR, Pod::Weaver::PluginBundle::Author::PERLANCAR, and sometimes one or two other Dist::Zilla- and/or Pod::Weaver plugins. Any additional steps required beyond that are considered a bug and can be reported to me.

COPYRIGHT AND LICENSE

This software is copyright (c) 2023, 2022, 2018 by perlancar <perlancar@cpan.org>.

This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.

BUGS

Please report any bugs or feature requests on the bugtracker website https://rt.cpan.org/Public/Dist/Display.html?Name=App-freqtable

When submitting a bug or request, please include a test-file or a patch to an existing test-file that illustrates the bug or desired feature.