NAME
ptp - An expressive Pipelining Text Processor
SYNOPSIS
ptp file1 file2 ... [--grep re] [--substitute re subst] ... [-o out]
The program takes as arguments a list of files (which can appear anywhere on the command line) and a list of commands that describe a pipeline to apply to each input file.
OPTIONS SUMMARY
Here is a short summary of some of the main options available. Many more options are described, in more detail, below in the "OPTIONS" section.
- -g pattern, -s pattern subst
-
Filter all the lines using the given pattern (inverted with -V before the -g option), or replace all matches of the pattern with the given substitution string.
- -p perl code
-
Execute the given code for each line of input (the line is in $_, which can be modified).
- -n perl code
-
Replace each line by the return value of the given code (the input line is in $_).
- --sort, --uniq, --head n, --tail n, --reverse, --shuffle, ...
-
Sort the file, remove duplicate lines, keep the first or last lines, reverse the file, randomly shuffle the file, etc.
- --pivot, --anti-pivot, --transpose
-
Join all the lines into a single line (--pivot), or split the fields of each line into multiple lines (--anti-pivot). Invert lines and columns (fields on a line) with --transpose.
- --cut f1,f2,...
-
Keep only the given fields of each line. By default, fields can be separated by tabs or commas and are separated by tabs in the output; this can be overridden with -F and -P.
- --paste filename
-
Join each line of the current file with the matching line of the given filename.
- --tee filename, --shell command
-
Write the content of the file to the given filename or send it to the given shell command.
- -o filename, -a filename, -i
-
Write the output to the given file (instead of the standard output), or append to the file, or write it in-place in the input files.
DESCRIPTION
PTP is a versatile and expressive text processor program. The core features that it tries to provide are the following:
Provide grep, sed-like and other operations with a coherent regular expression language (grep has a -P flag but sed has nothing of the kind).
Provide powerful input/output file support, which is lacking when using vanilla Perl one-liners (recursion in directories, in-place output with optional backups, etc.).
Pipelining of multiple operations on multiple files (using a pipeline made of several standard tools usually makes it difficult to process several input files at once).
See examples of PTP in action below, in the "EXAMPLES" section.
OPTIONS
All options are case sensitive and can be abbreviated down to uniqueness. However, it is recommended to use only the variants that are documented here, in case options are introduced in the future that render some abbreviations ambiguous. Unless specified otherwise, the arguments to all the options are mandatory (for brevity they are usually documented only on the short form of the options, but they are mandatory for the long form too).
The program expects four different kinds of arguments (all described below). They can be mixed in any order. However, for some of these arguments the order is meaningful (e.g. the commands are applied in the order in which they are specified):
"INPUT FILES", which can be specified anywhere on the command line, except between a flag and its argument.
"PIPELINE COMMANDS", which describe what operations should be executed on each input file. The commands are all executed, in the order in which they are specified on the command line, and applied to all input files.
"PROGRAM BEHAVIOR" options, which set global options for the program. These flags can appear multiple times on the command line, but only the last occurrence is used. To avoid mistakes, the program stops with an error when some of these flags are specified more than once.
"PIPELINE MODES" flags, which modify how the pipeline commands behave. These flags take effect at the point where they are specified and apply to all the pipeline commands specified after them. Usually, each of these flags has an opposite flag that reverts to the default behavior if needed.
INPUT FILES
Input files can be specified anywhere on the command line. They will be processed in the order in which they appear but their position relative to other arguments is ignored. Any command line argument that does not start with a - is considered to be a filename (unless it is an argument to a preceding flag).
A single - alone indicates that the standard input will be processed; this can be mixed with reading other files. If no input files at all are specified, then the standard input is processed.
Finally, you can stop the processing of the command line arguments by specifying a -- option. In that case, all remaining arguments will be considered as input files, even if they start with a -.
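As a rough sketch of the rules above, the following commands read the standard input on its own, mix it with a regular file, and use -- to pass a file whose name starts with a dash. The file names and their content are made up for the illustration.

```shell
# Work in a scratch directory; all file names here are hypothetical.
cd "$(mktemp -d)"
printf 'line from a file\n' > notes.txt
printf 'odd file\n' > ./-odd.txt

# No input file is given: the standard input is processed.
printf 'line from stdin\n' | ptp

# A lone - reads the standard input, here before notes.txt.
printf 'line from stdin\n' | ptp - notes.txt

# After --, every argument is an input file, even one starting with a dash.
ptp -- -odd.txt
```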
PIPELINE COMMANDS
The options in this section specify what processing to apply on the input files. For each input file, all of these commands are applied in the order in which they are specified, before the next file is processed.
If the --merge command is used, then all the input files are merged at that point and all the content processed up to that point is considered as a single input for the rest of the pipeline (this is described below).
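The difference between per-file processing and a merged pipeline can be sketched with the --line-count command described later in this section. The two sample files are assumptions made for the illustration.

```shell
# Work in a scratch directory; the file names are hypothetical.
cd "$(mktemp -d)"
printf 'a\nb\n' > first.txt
printf 'c\n' > second.txt

# Without --merge, --line-count is applied to each input file separately.
ptp first.txt second.txt --line-count

# With --merge before it, --line-count sees a single merged input.
ptp first.txt second.txt --merge --line-count
```

Commands placed before --merge still run once per file; only the commands after it operate on the merged content.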
Many of the commands from this list are affected by the flags described in "PIPELINE MODES". An overview of the most important ones is given in the description of the affected commands.
- -g pattern, --grep
-
Filter each input to keep only the lines that match the given regular expression. That expression cannot have delimiters (e.g. /foo/) so, if you want to pass options to the regex, you need to use the group syntax (e.g. (?i)foo).
If you use parentheses, you probably want to enclose the expression in single quotes, to prevent the shell from interpreting them.
This command is much faster than manually giving a match operation to the --filter command, because the code does not need to be escaped.
This operation can be made case-insensitive with the -I flag, inverted with -V and the pattern can be interpreted as an exact string with -Q.
- -s pattern subst, --substitute
-
Replace all matches of the given regular expression with the given substitution pattern on each line of the input. The substitution string is evaluated like a Perl string, so it can contain references to capture groups in the regular expression using the $1, $2, etc. syntax.
In addition to the -I and -Q flags that also apply to this operation (see the description of the --grep command), this command can be made to match at most once per line with -L.
- -p code, --perl
-
Execute the given perl code for each line of the input. The content of the line is in the $_ variable. That variable can be modified to change the content of the line. If the variable is undefined then the line is removed.
Note that if you introduce new-line characters (or whatever characters are specified by the --input-separator flag), the resulting line will not be split again by the program and will be considered as a single line for the rest of the pipeline.
See also the "PERL ENVIRONMENT" section for details on variables and functions available to your Perl code.
An error in the Perl code will result in a message printed to the standard output but the processing will continue. The current line may or may not be modified.
- -n code
-
Execute the given perl code for each line of the input. Replace each line with the return value from the code. The input line is in the $_ variable. If the return value is undef then the line is removed.
See the note on new-line characters given in the description of the --perl command.
An error in the Perl code will result in a message printed to the standard output but the processing will continue. The current line will not be modified.
- -f code, --filter
-
Execute the given perl code for each line of the input and keep the lines where the return value from the code is true. The input line is in the $_ variable. Note that you can modify that variable, but you should probably avoid doing so.
An error in the Perl code will result in a message printed to the standard output but the processing will continue. The current line will not be removed.
- --ml code, --mark-line
-
Execute the given code for each line of input (the current line is in the $_ variable) and store the return value (usually a boolean) in the marker of the current line.
The marker can then be accessed by other commands through the $m variable or used directly by the commands that operate on marked lines.
- -e code, --execute
-
Execute the given code. Like the other commands, this will be executed once per input file being processed. This command can be used to initialize variables or functions used in --perl or -n commands.
Any error in the Perl code will terminate the execution of the program.
- -M module
-
Load the given Perl module in the Perl environment. This option cannot be used when --safe is specified with level strictly greater than 0.
- -l path, --load
-
Same as --execute except that it takes the code to execute from the given file.
Any error in the Perl code will terminate the execution of the program.
- --sort
-
Sort the content of the input using the default lexicographic order, or the comparator specified with the --comparator flag.
Any error in the Perl code of the comparator will terminate the execution of the program.
- --ns, --numeric-sort
-
Sort the content of the input using a numeric sort. The numeric value of each line is extracted by parsing a number at the beginning of the line (which should look like a number).
The markers of the input lines are reset (no line is marked after this command).
- --ls, --locale-sort
-
Sort the content of the input using a locale sensitive sorting. The exact meaning of this depends on the configuration of your system (see the perllocale documentation for more details). In practice, it will do things like correctly comparing equivalent characters and/or ignoring the case.
The markers of the input lines are reset (no line is marked after this command).
- --cs code, --custom-sort
-
Sort the content of the input using the given custom comparator. See the --comparator flag for a specification of the argument of this command.
All markers are unset after this operation.
- -u, --unique
-
Remove consecutive lines that are identical. You will often want to have a --sort step before this one.
The markers of the lines that are kept are not changed.
- --gu, --global-unique
-
Remove duplicate lines in the file, even if they are not consecutive. The first occurrence of each line is kept.
The markers of the lines that are kept are not changed.
- --head [n]
-
Keep only the first n lines of the input. If n is negative, remove that many lines from the end of the input instead. If n is omitted, a default value is used.
- --tail [n]
-
Keep only the last n lines of the input. If n is negative, remove that many lines from the beginning of the input instead. If n is omitted, a default value is used.
- --reverse, --tac
-
Reverse the order of the lines of the input. The marker of each line is preserved (the markers are reversed with the input).
- --shuffle
-
Shuffle all the lines of the input in random order. The markers of the input lines are reset (no line is marked after this command).
- --eat
-
Delete the entire content of the file (eat it). This is useful if you don't need the content anymore (maybe you have sent it to another command with --shell) but you cannot redirect the output (typically to get the output of that shell command).
- --delete-marked
-
Delete every line whose marker is currently set. See the --mark-line command for details on how to set the marker of a line.
After this operation, no line has a marker set (they were all deleted).
- --delete-before
-
Delete all the lines immediately preceding a line whose marker is set. The markers of the lines that are not deleted are not changed.
- --delete-after
-
Delete all the lines immediately following a line whose marker is set. The markers of the lines that are not deleted are not changed.
- --delete-at-offset offset
-
Delete all the lines situated at the given offset from a marked line. A positive offset means lines that are after the marked lines.
- --insert-before text
-
Insert the given line of text immediately before each marked line. The given text is treated as a quoted Perl string, so it can use any of the variables described in "PERL ENVIRONMENT". In particular, the $_ variable is set to the marked line before which the insertion is taking place. However, this text is not a general Perl expression, so you may have to post-process it with another command for complex processing.
Note that if the -Q flag is in effect, then the given text is inserted as-is without any variable interpolation (except anything that may have been done by your shell before the argument is read by the program).
The newly inserted lines have their markers unset. Other lines' markers are not changed.
- --insert-after text
-
Same as --insert-before, but the new line is inserted after the marked line.
- --insert-at-offset offset text
-
Generalized version of the --insert-before and --insert-after commands. This command inserts the given text at the given offset relative to the marked line. Offset 0 means inserting the line immediately after the marked line.
- --clear-markers
-
Clear the marker of all the input lines.
- --set-all-markers
-
Set the marker of all the input lines.
- --cut field,field,...
-
Select specific fields of each input line and replace the line content with these fields pasted together. The given fields must be integers. The first field has the number 1. It is also possible to give negative field numbers, to count from the end of the line. A line does not need to have all the specified fields available: missing fields are replaced by empty strings. The separator itself is not kept in the content of the fields.
The notion of what constitutes a field is defined by the -F flag described below in "PIPELINE MODES". The default splits on both tabs and commas. When the fields are pasted together, a tab is added between fields, but this can be overridden with the -P flag.
The value of the -F flag is also affected by the -Q and -I flags.
- --paste file
-
Join each line of the input with the matching line of the given file (in sequential order). The joined file is reset for each new input that is processed. If the input and the given file don't have the same length, then the missing lines are replaced by empty strings.
The lines are joined using the separator given by the -P flag (which defaults to a tab). If the side file was longer than a given input, the new lines that are created in the processed file have their markers unset.
- --pivot
-
Join all lines of each input file into a single line. Use the separator given by the -P flag to paste the lines together (this defaults to a tab).
After this command each input file contains a single line, whose marker is unset.
- --anti-pivot
-
Split every line according to the -F flag (see --cut for more details) into multiple lines. This is not the same as adding new-lines in the middle of lines: with this command, the resulting lines are treated as distinct lines by subsequent commands. Lines with no fields according to the -F flag are dropped entirely.
After this command, the marker of every line is unset.
- --transpose
-
Split every line according to the -F flag (see --cut for more details) and then transpose the rows and columns, so that the first fields of each line are assembled on the first line of the output, the second fields on the second line, etc. Missing fields are replaced with empty strings.
The fields assembled on a given line are joined using the separator given by the -P flag (this defaults to a tab).
After this command, the marker of every line is unset.
- --nl, --number-lines
-
Number each line of the input (putting the line number in a prefix of the line). If you want more control on how the line numbering is done, you can have a look at the "EXAMPLES" section.
- --fn, --file-name
-
Replace the entire content of the input with a single line containing the name of the file currently being processed. Does nothing if a file is entirely empty at that point in the processing.
The resulting line has its marker unset.
- --pfn, --prefix-file-name
-
Add the name of the current file as the first line of the file. That line has its marker unset.
- --lc, --line-count
-
Replace the entire content of the input with a single line containing the number of lines in the file currently being processed.
The resulting line has its marker unset.
- -m, --merge
-
Merge the content of all the files at this point in the pipeline. Then continue to process the rest of the pipeline (specified after this command) as if there was a single merged input with the content of all the files.
This command can only be specified once in a pipeline.
- --tee filename
-
Output the current content of the input to the given file. When inserted in the middle of the command pipeline, the content written to this file is not necessarily the same content as the final output of the pipeline.
If filename is a single -, then write to the standard output.
Note that the filename is actually evaluated as a quoted Perl string, so it can contain variables from the Perl environment. This behavior can be deactivated by the -Q flag, to have the filename be used as-is (in particular, you probably want to use this flag on platforms where file names use back-slash characters '\').
- --shell command
-
Execute the given command and pass it the content of the current file as standard input. The output of the command goes on the standard output, it is not read by the program. The current content and markers are not modified.
Note that the given command is first interpreted as a Perl string, so it can contain variables from the Perl environment. This behavior can be deactivated by the -Q flag. Then, the command is passed to a shell which will do another pass of interpretation. That pass cannot be deactivated.
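The marker-based commands of this section are easier to follow on a concrete input. The sketch below marks the lines that start with a # using --mark-line and then acts on the marked lines; the sample file and the inserted text are assumptions made for the illustration.

```shell
# Work in a scratch directory; the file name is hypothetical.
cd "$(mktemp -d)"
printf '# comment\nkeep me\n# another\nkeep me too\n' > sample.txt

# Mark the lines starting with '#', then delete the marked lines.
ptp sample.txt --mark-line '/^#/' --delete-marked

# Mark the same lines, then insert a line of text before each of them.
# The text is a quoted Perl string, so $_ is the marked line itself.
ptp sample.txt --mark-line '/^#/' --insert-before 'next is: $_'
```

The single quotes keep the shell from expanding $_ before the program sees it.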
PROGRAM BEHAVIOR
The options in this section globally modify the behavior of the program. They can be specified anywhere on the command line with the same effect. Most of them can appear multiple times on the command line but only the last occurrence is taken into account. To help find possible causes of problems, for some of these options, specifying them multiple times will generate an error.
- -o output_file, --output
-
Send all output of the program to the specified file. The file is created if needed and its content is deleted at the beginning of the execution of the program. So the file cannot be used as an input to the program.
You can only specify a single output file.
- -a output_file, --append
-
Same as --output but append to the specified file instead of deleting its content.
- -i, --in-place
-
Write the output of the pipeline for each input file in-place into these files. This cannot be used when reading from the standard input or when --merge is used in the pipeline.
- -R, --recursive
-
Allow directories to be specified instead of files on the command line. The entire content of the specified directories will be processed, as if all the files had been mentioned on the command line.
- --input-filter code
-
When recursively expanding a directory passed on the command line (when the -R option is active), execute the given Perl code (typically just a regex match like /foo.*bar/). Only file names for which the code returns a true value are kept. The complete file name is passed to the code in the default $_ variable. You can view this option in action in the "EXAMPLES" section.
This option applies only to files recursively expanded from a directory passed on the command line. It does not apply to files that are explicitly listed. In particular, this option does not apply to files that are expanded by a shell glob. It follows that this option is useless unless -R is specified too.
All the functions from the Perl File::Spec::Functions module are available to the code being executed (e.g. the splitpath function).
- --input-encoding encoding (alias --in-encoding)
-
Specify the encoding used to read the input files. The default is UTF-8.
- --output-encoding encoding (alias --out-encoding)
-
Specify the encoding used to write the output. The default is UTF-8.
- --input-separator separator (alias --in-separator)
-
Specify the separator that is used to split the lines in the input files. The default is "\n" (LF). Note that currently, on Windows, "\r\n" (CRLF) sequences in input files will be automatically transformed into "\n" characters.
- --output-separator separator (alias --out-separator)
-
Specify the separator that is added in-between lines in the output. The default is "\n" (LF). Note that currently, on Windows, this is automatically transformed into an "\r\n" (CRLF) sequence.
- --eol, --preserve-input-separator
-
Keep the input separators in the content of each line. It is then your responsibility to preserve (or to change) this separator when the files are processed.
Setting this flag also sets the --output-separator to the empty string. This can be overridden if needed by passing that flag after the --eol one (this would result in each line having its initial end-of-line separator plus the one specified for the output, unless the initial one is removed during the processing).
- --fix-final-separator
-
If set, then the final line of each file is always terminated by a line separator in the output (as specified by --output-separator), even if it did not have one in the input.
- -0
-
Set the --input-separator to the null character (\000) and the --output-separator to the empty string. This results in each file being read entirely as a single logical line.
- --00
-
Set the --output-separator to the null character (\000). This produces output compatible with the -0 option of xargs.
- -h, --help
-
Print this help message and exit. Note: the help message will be much improved if you have the perldoc program installed (sometimes from a perl-doc package).
- --version
-
Print the version of the program and exit.
- -d, --debug
-
Send debug output on the execution of the program to the standard error output. If you specify this option a second time, then the final output itself will be modified to contain some debugging information too (sent on the standard output or in any file that you specify as the output, not to the standard error).
- --abort
-
Abort the execution of the program after all arguments have been parsed but before the actual execution of the program. This is most useful with --debug to check that the arguments are interpreted as expected.
- --preserve-perl-env
-
By default, the Perl environment accessible to the commands executing user supplied code (--perl, -n, --filter, etc.) is reset between each input file. When this option is passed, the environment is preserved between the files.
- --safe [n]
-
Switch to a safer mode of evaluating user supplied Perl code (from the command line). The default mode (equivalent to passing --safe 0) is the fastest, but some specifically crafted user supplied code could break the behavior of the program. That code is also run with all the privileges of the current user, so it can do anything on the system.
When passed a value of 1 or more, the user code is run in a container that protects the rest of the program from that code. The code still has access to the rest of the system. This mode is approximately 30 times slower than the default.
When passed a value of 2 or more, the container additionally tries to prevent the user code from any interaction with the rest of the system (outside of the content of the files passed to the program). However, no claim is made that this is actually secure (and it most certainly is not).
If the argument to --safe is omitted, the value 2 is used.
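Several of the options above are typically combined. As an illustrative sketch (the directory layout and file names are assumptions), the command below recurses into a directory with -R, restricts the expansion to some files with --input-filter, and sends the result to a file with -o.

```shell
# Work in a scratch directory; the layout below is hypothetical.
cd "$(mktemp -d)"
mkdir -p logs/old
printf 'error: disk full\ninfo: ok\n' > logs/app.log
printf 'error: not a log file\n' > logs/old/readme.txt

# Recurse into logs/, keep only the files whose name ends in .log, and
# write the matching lines to errors.txt instead of the standard output.
ptp -R logs --input-filter '/\.log$/' --grep '^error' -o errors.txt
cat errors.txt
```

The output file is kept outside of the processed directory because, as noted for --output, it cannot also be used as an input.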
PIPELINE MODES
The options in this section modify the way the pipeline commands work. These options apply to all pipeline commands specified after them, until they are cancelled by another option.
- -I, --case-insensitive
-
Make the regular expressions used for the --grep and --substitute commands be case-insensitive by default (this can still be overridden in a given regular expression with the (?-i) flag).
This does not apply to regular expressions evaluated through the --perl command.
- -S, --case-sensitive
-
Make the regular expressions used for the --grep and --substitute commands be case-sensitive by default (this can still be overridden in a given regular expression with the (?i) flag).
This is the default mode unless --case-insensitive is specified.
- -Q, --quote-regexp
-
Quote the regular expressions passed to the --grep and --substitute commands so that all (possibly) special characters are treated like normal characters. In practice this means that the matching done by these commands will be a simple text matching. Also disable variable interpolation for the substitution string passed to --substitute and for the arguments to the --insert-before, --insert-after, --insert-at-offset, --tee, and --shell commands.
This does not apply to regular expressions evaluated through the --perl command.
- -E, --end-quote-regexp
-
Stop the effect of the --quote-regexp mode and resume normal interpretation of regular expressions.
This is the default mode when --quote-regexp is not specified.
- -G, --global-match
-
Apply the substitution given to the --substitute command as many times as possible (this is the default).
- -L, --local-match
-
Apply the substitution given to the --substitute command at most once per line.
- -C code, --comparator
-
Specify a custom comparator to use with the --sort command. This flag expects a Perl expression that receives the two lines to compare in the $a and $b variables and should return an integer less than, equal to, or greater than 0 depending on the order of the lines (less than 0 if $a should be before $b).
The default value is roughly equivalent to specifying --comparator '$a cmp $b'. However, a user specified comparator will always be less efficient than the default one.
Any error in the Perl code of the comparator will terminate the execution of the program.
- -F regex, --input-field-spec
-
Specify the regular expression used to cut fields with the --cut command. The default is \s*,\s*|\t. Note that this value is also affected by the -Q and -I flags (as well as their opposites -E and -S).
- -P string, --output-field-spec
-
Specify the separator used to paste fields together with the --cut command and to join lines with the --paste command. The default is a tabulation character.
- Default values for -F and -P
-
The flags below set both the -F and -P values, used to split fields and to paste them:
--default restores the default values for the two flags, as documented above.
--bytes sets the flags so that each input character is a field and there is no separator when fields are pasted together. Note that the name is something of a misnomer, as it splits on characters and not on bytes (you can split on bytes by specifying an 'ascii' input encoding).
--csv splits on commas (ignoring surrounding spaces) and uses a comma to paste the fields.
--tsv splits on each tab character, and uses one to paste the fields.
--none never splits on anything. This flag only sets the -F value, not the -P value. It is meant to be used with the --transpose command so that all the lines of the input are joined into a single line. In that case, the --transpose command becomes equivalent to the --pivot one.
- --sq character, --single-quote-replacement
-
Define a character or a string which, if present in any of the commands that accept Perl code as argument, will be replaced by a single quote character (') before the command is passed to Perl. This is useful to work around limitations of shell escaping.
- --dq character, --double-quote-replacement
-
Define a character or a string which, if present in any of the commands that accept Perl code as argument, will be replaced by a double quote character (") before the command is passed to Perl. This is useful to work around limitations of shell escaping.
- --ds character, --dollar-sigil-replacement
-
Define a character or a string which, if present in any of the commands that accept Perl code as argument, will be replaced by a dollar character ($) before the command is passed to Perl. This is useful to work around limitations of shell escaping.
- --re engine, --regex-engine
-
Select the regular expression engine used for the --grep and --substitute commands. The default value perl uses Perl's built-in engine. Other values are expected to be the last part of the name of an re::engine::value module (e.g. RE2, PCRE, TRE, GNU, etc.). The matching Perl module needs to be installed. Note that the name of the engine is case-sensitive.
For the --substitute command, only the pattern is affected by this option. The substitution still uses the Perl syntax to refer to matched groups (e.g. $1, etc.).
Finally, note that this option does not apply to regex that would be manually specified through any of the commands executing Perl code (e.g. --perl, -e, --filter, etc.).
- -X, --fatal-error
-
Make any Perl code error in the --perl, -n and --filter commands a fatal error (the execution of the program is aborted).
- --ignore-error
-
Print an error to the standard output when an error occurs in the Perl code provided to the --perl, -n and --filter commands and continue the processing (this is the default).
- -V, --inverse-match
-
Invert the behavior of the --grep and --filter commands (lines that would normally be dropped are kept, and vice versa).
- -N, --normal-match
-
Give the default behavior to the --grep and --filter commands.
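Because the flags of this section only affect the commands that follow them, their position on the command line matters. The sketch below illustrates this with -I and -S, with -Q, and with a custom comparator given to --sort; the sample content is an assumption made for the illustration.

```shell
# Work in a scratch directory; the file name is hypothetical.
cd "$(mktemp -d)"
printf 'Foo.bar\nfooXbar\nFOO.BAR\n' > sample.txt

# -I applies to the --grep that follows it; -S then restores
# case-sensitive matching for the later --substitute command.
ptp sample.txt -I --grep 'foo' -S --substitute 'Foo' 'baz'

# With -Q the pattern is a plain string, so the dot only matches a dot.
ptp sample.txt -Q --grep 'o.b'

# -C must come before the --sort command that uses it.
printf 'ccc\na\nbb\n' | ptp -C 'length($a) <=> length($b)' --sort
```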
PERL ENVIRONMENT
Below is a description of variables and functions available to the Perl code executed as part of the --perl, -n, --execute, --load, --filter, --mark-line, and --custom-sort (or --sort with --comparator) commands.
While not directly executing Perl code, the --grep and --substitute commands also have access to the variables described below and those that are created by user supplied code.
$_
This variable is set to the current line being processed. In most contexts (but not all), it can be modified to modify that line.
$f
This variable contains the name of the input file currently being processed as given on the command line. This variable is available to all the commands. When processing the standard input, this will be '-'.
$F
This variable contains the absolute path of the input file currently being processed. This variable is available to all the commands. When processing the standard input, this will be '-'.
$n
This variable contains the number of the line currently being processed. It is available only to the --perl, -n, -s (in the substitution argument only), --mark-line, and --filter commands.
The same value is also available under the standard $. variable, which makes it possible to use the Perl .. operator. One difference is that any write to $n is ignored, while a write to $. will modify that variable (but not $n).
$N
This variable contains the total number of lines in the current input.
$m
This variable contains the marker of the current line. This is the value that is set by the --mark-line command, but it can be manipulated by any other line-processing operation (mainly --perl, -n, and --filter).
@m
This array contains the markers of all the lines. It is accessed using an index relative to the current line (and using the Perl convention, where array elements are read using the $ sigil), so $m[0] is the marker of the current line (equivalent to $m), $m[1] is the marker of the following line, etc. The markers for lines that don't exist are all unset.
This array can be used to modify the marker of any line. Modifying a marker outside of the existing lines is ignored.
$I
This variable contains the index of the file currently being processed (starting at 1 for the first file).
ss start[, len[, $var]]
Returns the sub-string of the given $var, starting at position start and of length len. If $var is omitted, uses the default $_ variable. If len is omitted or 0, returns the entire remainder of the string.
If start is negative, starts counting from the end of the string. If len is negative, removes that many characters from the end of the string.
This is quite similar to the built-in substr function, except that ss returns the empty string instead of undef if the specified sub-string is outside of the input.
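The behavior described above can be approximated in plain Perl. The ss_sketch function below is a hypothetical stand-in written only from this description; it is not the program's actual implementation, and the sample string is made up.

```perl
use strict;
use warnings;

# Hypothetical re-implementation, written only from the description above;
# it is not the code that ptp itself uses.
sub ss_sketch {
    my ($start, $len, $str) = @_;
    $str = $_ if @_ < 3;              # default to $_ when no string is given
    return '' unless defined $str;
    no warnings 'substr';             # out-of-range requests are expected here
    my $res = $len ? substr($str, $start, $len)   # explicit, non-zero length
                   : substr($str, $start);        # omitted or 0: rest of string
    return defined $res ? $res : '';  # empty string instead of undef
}

my $s = 'abcdef';
my @results = (
    ss_sketch(1, 3, $s),    # 'bcd'
    ss_sketch(-2, 0, $s),   # 'ef'   (negative start counts from the end)
    ss_sketch(0, -2, $s),   # 'abcd' (negative length drops trailing characters)
    ss_sketch(10, 2, $s),   # ''     (outside of the string)
);
print join('|', @results), "\n";
```

The last call is the case where the built-in substr would return undef (and warn), while the sketch returns the empty string.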
pf format[, args...]
Formats the given args using the format string (following the standard printf format) and stores the result in the default $_ variable.
spf format[, args...]
Formats the given args using the format string (following the standard printf format) and returns the result.
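The difference between the two functions can be sketched in plain Perl. The spf_sketch and pf_sketch names are hypothetical stand-ins based only on the descriptions above, and the sample data is made up.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for spf and pf, based only on the descriptions above.
sub spf_sketch { my $fmt = shift; return sprintf($fmt, @_) }
sub pf_sketch  { $_ = spf_sketch(@_); return }

# spf-like: the formatted string is returned to the caller.
my $returned = spf_sketch('%5d %s', 7, 'seven');

# pf-like: the formatted string replaces $_ (here, the current array element).
my @lines = ('alpha', 'beta');
my $i = 0;
for (@lines) {
    pf_sketch('%5d %s', ++$i, $_);
}

print "$returned\n", join("\n", @lines), "\n";
```

This is why the returning form fits a command that uses the return value of the code, while the storing form fits a command that reads back $_.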
EXAMPLES
A default invocation of the program, without arguments other than file names, behaves like the cat program, printing the concatenated content of all its input files:
ptp file1 file2 file3
This example is similar to the built-in --nl command. It replaces each line with the output of the sprintf function which, here, prefixes each line with its line number.
That example also demonstrates that a variable (here $i) can be reused across the lines of an input, but that it is reset between inputs. Using the variables and functions described in "PERL ENVIRONMENT", the argument to the -n command could be rewritten as spf "% 5d %s", $n, $_:
ptp file1 file2 -n 'sprintf("%5d %s", ++$i, $_)'
Same as the example above, but does not number empty lines (this is the default behavior of the GNU nl util). Also, this uses the pf function, which modifies the $_ variable, so it can be used directly with the --perl command instead of -n:
ptp file -p 'pf("%5d %s", ++$i, $_) if $_'
Print a sorted list of the login name of all users:
ptp /etc/passwd -F : --cut 1 --sort
Number all the lines of multiple inputs, as if they were a single file:
ptp file1 file2 -m --nl
Join lines that end with an = character with the next line. The chomp Perl command removes the end-of-line character from the current line (it is present because of the --eol flag). In this example, that command is applied only if the line matches the given regex (which searches for the = character at the end of the line):
ptp file --eol -p 'chomp if /=$/'
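The effect of that conditional chomp can be reproduced in plain Perl. This is only a sketch with made-up data, assuming each line still carries its trailing newline, as the paragraph above says is the case with --eol.

```perl
use strict;
use warnings;

# Demo data: each line still ends with its newline character.
my @lines = ("total =\n", "42\n", "done\n");

for (@lines) {
    # $ matches just before a trailing newline, so /=$/ sees the final '='.
    chomp if /=$/;
}

# Only the first line lost its newline, so it is now joined with the second.
my $out = join '', @lines;
print $out;
```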
Output the number of lines of comment in all the source files in a given directory, filtering only the files that match some extensions. The --input-filter option ensures that only source files are used inside the given directory. The -g (--grep) command keeps only the lines that start with a C-style comment (or spaces followed by a comment), then the --lc command (--line-count) replaces the entire content of the file with just the number of lines that it contains (the number of comments at that point). Finally, the --pfn command (--prefix-file-name) adds the name of the current file as the first line of each file, and --pivot joins the two lines of each file (the file name and the number of lines):
ptp dir -R --input-filter '/\.(c|h|cc)$/' -g '^\s*//' --lc --pfn --pivot
Find all the occurrences of a given regex in a file and print them, one per line. The regex can contain capture groups (using parentheses). In that case only the content of the capture groups is kept:
ptp file -n 'join(",", /regex/g)' --anti-pivot --fix-final-separator
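The Perl expression used in that command relies on the list returned by a global match. The following plain-Perl sketch (the sample line and patterns are made up for illustration) shows the difference between a regex without and with a capture group.

```perl
use strict;
use warnings;

my $line = 'a=1 b=22 c=333';   # demo input, not taken from the text above

# No capture group: each element of the list is a whole match.
my $whole = join(',', $line =~ /\w=\d+/g);

# One capture group: each element is the captured text only.
my $captured = join(',', $line =~ /\w=(\d+)/g);

print "$whole\n$captured\n";
```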
ENVIRONMENT
Some environment variables can affect the default options of the program when they are set.
- PTP_DEFAULT_CASE_INSENSITIVE
-
Setting this variable to 1 means that the -I flag is in effect at the beginning of the parsing of the command line arguments. Setting the variable to 0 gives the default behavior (as if -S was passed).
- PTP_DEFAULT_QUOTE_REGEX
-
Setting this variable to 1 means that the -Q flag is in effect at the beginning of the parsing of the command line arguments. Setting the variable to 0 gives the default behavior (as if -E was passed).
- PTP_DEFAULT_LOCAL_MATCH
-
Setting this variable to 1 means that the -L flag is in effect at the beginning of the parsing of the command line arguments. Setting the variable to 0 gives the default behavior (as if -G was passed).
- PTP_DEFAULT_REGEX_ENGINE
-
Setting this variable overrides the default regex engine used by the program. That variable can take the same values as the --re flag.
- PTP_DEFAULT_FATAL_ERROR
-
Setting this variable to 1 means that the -X flag is in effect at the beginning of the parsing of the command line arguments. Setting the variable to 0 gives the default behavior (as if --ignore-error was passed).
- PTP_DEFAULT_INVERSE_MATCH
-
Setting this variable to 1 means that the -V flag is in effect at the beginning of the parsing of the command line arguments. Setting the variable to 0 gives the default behavior (as if -N was passed).
- PTP_DEFAULT_SAFE
-
Setting this variable to an integer value sets the default mode for executing user-supplied Perl code, as if the --safe option was given.
CAVEATS
This program is optimized for expressivity rather than performance (also, modern computers are powerful), so it reads each file entirely into memory before processing it. In particular, if you use the --merge option, all the input files are loaded entirely into memory at the same time.
The handling of user-supplied code might differ depending on whether the --safe option is in effect. In particular, any exception thrown by user code in safe mode is currently ignored entirely. While this is a bug, one could say that it helps prevent anything unpredictable from happening to the calling code...
AUTHOR
This program has been written by Mathias Kende.
LICENCE
Copyright 2019 Mathias Kende
This program is distributed under the MIT (X11) License: http://www.opensource.org/licenses/mit-license.php
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.