NAME

Code::ART - Analyze/Rename/Track Perl source code

VERSION

This document describes Code::ART version 0.000004

SYNOPSIS

use Code::ART;

# Convert source code fragment to sub and call...
$refactored = refactor_to_sub( $source_code, \%options );

# or:
$refactored = hoist_to_lexical( $source_code, \%options );

# Source code of sub or lexical...
$sub_definition  = $refactored->{code};

# Code to call sub with args, or to evaluate lexical...
$sub_call_syntax = $refactored->{call};

# Array of arg names (as strings, only for refactor_to_sub() )...
@sub_arg_list    = @{ $refactored->{args} };

# Only if refactoring failed...
$failure_message = $refactored->{failed};

DESCRIPTION

This module provides a range of subroutines to help you refactor valid Perl source into cleaner, better decomposed code.

The module also comes with a Vim plugin to plumb those refactoring behaviours directly into that editor (see "Vim integration").

For example, the module provides a subroutine (refactor_to_sub()) that takes a source code fragment as a string, analyzes it to determine the unbound variables within it, then constructs the source code of an equivalent subroutine (with the unbound variables converted to parameters) plus the source code of a suitable call to that subroutine.

It is useful when hooked into an editor, allowing you to (semi-)automatically convert functional code like:

my @heatmap =
    map  { $config{$_} }
    sort {
           my $a_key = $a =~ /(\d+)/ ? $1 : undef;
           my $b_key = $b =~ /(\d+)/ ? $1 : undef;
           defined $a_key && defined $b_key
              ? $a_key <=> $b_key
              : $a     cmp $b;
         }
    grep { /^heatmap/ }
    keys %config;

into a much cleaner:

my @heatmap =
    map  { $config{$_} }
    nsort
    grep { /^heatmap/ }
    keys %config;

plus:

sub nsort {
    sort {
           my $a_key = $a =~ /(\d+)/ ? $1 : undef;
           my $b_key = $b =~ /(\d+)/ ? $1 : undef;
           defined $a_key && defined $b_key
              ? $a_key <=> $b_key
              : $a     cmp $b;
    } @_;
}

Or to replace something long and imperative like:

my @heatmap_keys;

for my $key (keys %config) {
    next if $key !~ /^heatmap/;
    push @heatmap_keys, $key;
}

@heatmap_keys
    = sort {
            my $a_key = $a =~ /(\d+)/ ? $1 : undef;
            my $b_key = $b =~ /(\d+)/ ? $1 : undef;
            defined $a_key && defined $b_key
                ? $a_key <=> $b_key
                : $a     cmp $b;
        } @heatmap_keys;

my @heatmap;

for (@heatmap_keys) {
    push @heatmap, $config{$_};
}

with something short and imperative:

my @heatmap;

for ( get_heatmap_keys(\%config) ) {
    push @heatmap, $config{$_};
}

plus:

sub get_heatmap_keys {
    my ($config_href) = @_;

    my @heatmap_keys;

    for my $key (keys %{$config_href}) {
        next if $key !~ /^heatmap/;
        push @heatmap_keys, $key;
    }

    @heatmap_keys = sort {
                        my $a_key = $a =~ /(\d+)/ ? $1 : undef;
                        my $b_key = $b =~ /(\d+)/ ? $1 : undef;
                        defined $a_key && defined $b_key
                            ? $a_key <=> $b_key
                            : $a     cmp $b;
                    } @heatmap_keys;

    return @heatmap_keys;
}

INTERFACE

Refactoring a fragment of Perl code

To refactor some Perl code, call the refactor_to_sub() subroutine, which is automatically exported when the module is loaded.

my $refactored = refactor_to_sub( $source_code_string, \%options );

Note that this subroutine does not actually rewrite the source code with the refactoring; it merely returns the components with which you could transform the original source yourself.

The subroutine takes a single required argument: a string containing the complete source code within which some element is to be refactored.

The options specify where and how to refactor that code element, as follows:

from => $starting_string_index
to => $ending_string_index

These two options are actually required. They must be non-negative integer values that represent the indexes in the string where the fragment you wish to refactor begins and ends.
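
As a rough sketch of how these two indexes might be computed when the text of the fragment is known, the snippet below locates a fragment with Perl's built-in index() and derives both ends. It assumes that indexes are zero-based and that 'to' names the final character of the fragment (inclusive); if the module instead expects the position just past the fragment, drop the "- 1". The source text and variable names are illustrative only.

```perl
use strict;
use warnings;

# Hypothetical source text and the fragment we want to refactor.
my $source = join "\n",
    'my $total = 0;',
    '$total += $_ for @prices;',
    'print $total;';
my $fragment = '$total += $_ for @prices;';

# index() returns the zero-based offset of the first occurrence, or -1.
my $from = index($source, $fragment);
die "fragment not found\n" if $from < 0;

# Assumption: 'to' is the index of the fragment's final character.
my $to = $from + length($fragment) - 1;

# These would then be passed as the second argument of the call.
my %options = ( from => $from, to => $to );
```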

name => $name_of_new_sub

This option allows you to specify the name of the new subroutine. If it is not provided, the module uses a bad generic name instead (__REFACTORED_SUB__), which you'll have to change anyway, so passing the option is strongly recommended.

data => $name_of_the_var_to_hold_any_trailing_data

This option allows you to specify the name of the slurpy variable into which any trailing arguments for the new subroutine (i.e. in addition to those the refactorer determines are required) will be placed.

If it is not provided, the module uses a generic name instead (@__EXTRA_DATA__).

return => $source_of_the_expr_to_be_returned

If this option is specified, the refactorer places its value in a return statement at the end of the refactored subroutine.

If it is not provided, no extra return statement is added.

The return value of refactor_to_sub() in all contexts and in all cases is a hash reference containing one or more of the following keys:

'code'

The value for this key will be a string representing the source code for the new subroutine into which the original code was refactored.

'call'

The value for this key will be a string representing the source code for the specific call to the new subroutine (including its arguments) that can be used to replace the original code.

'return'

The value of this key will be a reference to a hash, whose keys are the names of the variables present inside the original code that was refactored, and whose values are the equivalent names of those variables in the refactored code.

The purpose of this information is to allow your code to present the user with a list of possible return values to select from (i.e. the keys of the hash) and then install a suitable return statement (i.e. the value of the selected key).

'failed'

This key will be present only when the attempt to refactor the code failed for some reason. The value of this key will be a string containing the reason that the original code could not be refactored. See "DIAGNOSTICS" for a list of these error messages.

Note that, if the 'failed' key is present in the returned hash, then the hash may not contain entries for 'code', 'call', or 'return'.

Hence a generic usage might be:

my $refactoring = refactor_to_sub( $original_code, \%options );

if (exists $refactoring->{failed}) {
    warn $refactoring->{failed};
}
else {
    replace_original_code_with( $refactoring->{call} );
    add_subroutine_definition(  $refactoring->{code} );
}
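
The two helper calls in that example are left undefined. One way they might be implemented for a source held in a single string is sketched below. The apply_refactoring() helper and the mock result hash are hypothetical, and the sketch assumes that 'from' and 'to' are zero-based and inclusive, and that the new subroutine is simply appended to the end of the source.

```perl
use strict;
use warnings;

# Replace the characters from $from to $to (inclusive) with the call,
# then append the new subroutine definition to the end of the source.
sub apply_refactoring {
    my ($source, $from, $to, $refactoring) = @_;

    # Leave the source untouched if the refactoring failed.
    return $source if exists $refactoring->{failed};

    # Four-argument substr() replaces the selected span in place.
    substr($source, $from, $to - $from + 1, $refactoring->{call});

    return $source . "\n" . $refactoring->{code} . "\n";
}

# Hypothetical result, shaped like the hash described above.
my %mock_result = (
    call => 'add_tax($price)',
    code => 'sub add_tax { my ($price) = @_; return $price * 1.1; }',
);

# Characters 12..23 of this string are the expression '$price * 1.1'.
my $updated = apply_refactoring(
    'my $gross = $price * 1.1;', 12, 23, \%mock_result
);
```

Because the replacement is done on a copy, the caller still has the original string available if the result needs to be rejected.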

Hoisting an expression to a variable or closure

To refactor a single Perl expression into a scalar variable or a lexical closure, call the hoist_to_lexical() subroutine, which is automatically exported when the module is loaded:

my $refactored = hoist_to_lexical( $source_code_string, \%options );

Note that this subroutine does not actually rewrite the source code with the hoisting; it merely returns the components with which you could transform the original source yourself.

The subroutine takes a single required argument: a string containing the complete source code within which some expression is to be refactored.

The options specify where and how to refactor that expression, as follows:

from => $starting_string_index
to => $ending_string_index

These two options are actually required. They must be non-negative integer values that represent the indexes in the string where the expression you wish to refactor begins and ends.

name => $name_of_new_lexical

This option allows you to specify the name of the new lexical variable or closure. If it is not provided, the module uses a bad generic name instead (__REFACTORED_LEXICAL), which you'll have to change anyway, so passing the option is strongly recommended.

all => $boolean

This option allows you to specify whether the refactorer should attempt to hoist every instance of the specified expression (if the option is true) or just the selected instance (if the option is false or omitted).

closure => $boolean

This option allows you to specify whether the refactorer should attempt to hoist the specified expression into a closure (if the option is true), instead of into a lexical variable (if the option is false or omitted).

Closures are a better choice whenever the expression has side-effects, because hoisting such an expression into a variable will most likely change the behaviour of the refactored code. The hoist_to_lexical() subroutine can detect some types of side-effects automatically, and will automatically use a closure in those cases, regardless of the value of this option.

The return value of hoist_to_lexical() in all contexts and in all cases is a hash reference containing one or more of the following keys:

'code'

The value for this key will be a string representing the source code for the new variable or closure declaration into which the original expression was refactored.

'call'

The value for this key will be a string representing the source code for the specific call to the new closure, or use of the new variable, that can be used to replace the original expression.

'hoistloc'

The string index into the source string at which the 'code' declaration should be installed.

'matches'

A reference to an array of hashes. Each hash records one location where the specified expression was found: the string index at which it starts ('from') and the number of characters it occupies in the string ('length').

For example:

matches => [
             { from => 140, length => 24 },
             { from => 180, length => 22 },
             { from => 299, length => 26 },
           ],

'mutators'

The number of mutation operators detected in the expression. If this number is not zero, refactoring into a variable instead of a closure will usually change the behaviour of the entire code. hoist_to_lexical() tries its darnedest to prevent that.

'target'

The actual selected expression that was hoisted.

'use_version'

A version object representing the version that the source code claimed to require (via an embedded use VERSION statement).

'failed'

This key will be present only when the attempt to refactor the code failed for some reason. The value of this key will be a string containing the reason that the original code could not be refactored. See "DIAGNOSTICS" for a list of these error messages.

Note that, if the 'failed' key is present in the returned hash, then the hash may not contain entries for 'code', 'call', or the other keys listed above.
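
The 'code', 'call', 'hoistloc', and 'matches' entries described above could be combined along the following lines to rewrite a source string. This is only a sketch: apply_hoist() and the mock result are hypothetical, and whether the 'code' string needs a trailing newline or extra indentation added is left to the caller.

```perl
use strict;
use warnings;

# Apply a hoisting result to the source: replace every match with the
# 'call' text and insert the 'code' declaration at 'hoistloc'.
sub apply_hoist {
    my ($source, $hoisted) = @_;
    return $source if exists $hoisted->{failed};

    # Collect every edit as [ position, length_to_replace, new_text ].
    my @edits = map { [ $_->{from}, $_->{length}, $hoisted->{call} ] }
                    @{ $hoisted->{matches} };
    push @edits, [ $hoisted->{hoistloc}, 0, $hoisted->{code} ];

    # Apply from the end of the string backwards, so that earlier
    # positions are not shifted by later edits. On a tie, replace
    # before inserting, so the inserted text is never overwritten.
    for my $edit (sort { $b->[0] <=> $a->[0] || $b->[1] <=> $a->[1] } @edits) {
        my ($pos, $len, $text) = @{$edit};
        substr($source, $pos, $len, $text);
    }
    return $source;
}

# Hypothetical result, shaped like the hash described above.
my %mock_hoist = (
    code     => 'my $limit = $max * 2;' . "\n",
    call     => '$limit',
    hoistloc => 0,
    matches  => [ { from => 4, length => 8 }, { from => 19, length => 8 } ],
);

my $updated = apply_hoist('foo($max * 2); bar($max * 2);', \%mock_hoist);
```

Working backwards through the string avoids having to recompute each index after every replacement.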

Analysing variable usage within some source code

To detect and analyse the declaration and usage of variables in a piece of source code, call the classify_all_vars_in() subroutine, which is exported by default when the module is used.

The subroutine takes a single argument: a string containing the source code to be analysed.

It returns a hash containing two keys:

'use_version'

The value of this key is a version object representing the version that the source code claimed it required, via an embedded use VERSION statement.

'vars'

A hash of hashes, each of which represents one distinct variable in the source code. The key of each subhash is the string index within the source at which the variable was declared (or a unique negative number, if the variable wasn't declared). Each subhash has the following structure:

{
  decl_name      => '$name_and_sigil_with_which_the_variable_was_declared',
  sigil          => '$|@|%',
  aliases        => \%hash_of_any_known_aliases_for_the_variable,

  declarator     => "my|our|state|for|sub",
  declared_at    => $string_index_where_declared,
  used_at        => \%hash_of_indexes_within_source_string_where_variable_used,

  desc           => "text of any comment on the same line as the declaration",

  start_of_scope => $string_index_where_variable_came_into_scope,
  end_of_scope   => $string_index_where_variable_went_out_of_scope,
  scope_scale    => $fraction_of_the_complete_source_where_variable_is_in_scope,

  is_builtin     => $true_if_variable_is_a_standard_Perl_built_in,

  homograms      => \%hash_of_names_and_keys_of_other_variables_with_the_same_name,
  parograms      => \%hash_of_names_and_keys_of_other_variables_with_similar_names,
  is_cacogram    => $true_if_variable_name_is_pitifully_generic_and_uninformative,
}
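
As an illustration of how this structure might be consumed, the sketch below produces a one-line summary for each declared variable. The summarise_vars() helper and the mock data are hypothetical; in particular, the values stored under 'used_at' are shown as placeholders, since only the keys (the usage indexes) are relied upon here.

```perl
use strict;
use warnings;

# Summarise the 'vars' structure: one line per declared variable,
# in order of declaration, flagging uninformative names.
sub summarise_vars {
    my ($vars_ref) = @_;
    my @lines;

    # Negative keys mark undeclared variables, so skip them here.
    for my $key (sort { $a <=> $b } grep { $_ >= 0 } keys %{$vars_ref}) {
        my $var  = $vars_ref->{$key};
        my $uses = scalar keys %{ $var->{used_at} };
        my $flag = $var->{is_cacogram} ? ' (generic name)' : '';
        push @lines, sprintf '%s %s at %d, used %d time(s)%s',
            $var->{declarator}, $var->{decl_name}, $key, $uses, $flag;
    }
    return @lines;
}

# Hypothetical data, shaped like the 'vars' entry described above.
my %mock_vars = (
    0  => { decl_name => '$data',  declarator => 'my',
            used_at   => { 20 => 1, 41 => 1 }, is_cacogram => 1 },
    12 => { decl_name => '@rows',  declarator => 'my',
            used_at   => { 30 => 1 },          is_cacogram => 0 },
    -1 => { decl_name => '$stray', declarator => '',
            used_at   => { 55 => 1 },          is_cacogram => 0 },
);

my @summary = summarise_vars(\%mock_vars);
print "$_\n" for @summary;
```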

Renaming a variable

To rename a variable throughout the source code, call the rename_variable() subroutine, which is exported by default.

The subroutine expects three arguments:

  • The original source code (as a string).

  • A string index at which some usage of the variable is located (i.e. a point in the source where a hypothetical cursor would be "over" the variable).

  • The new name of the variable.

The subroutine returns a hash with a single entry:

{ source => $copy_of_source_string_with_the_variable_renamed }

If the specified string index does not cover a variable, a hash is still returned, but with the single entry:

{ failed => "reason_for_failure" }
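
Editors usually report the cursor as a 1-based line and column, whereas the string index expected here counts from zero. A minimal sketch of the conversion follows. The cursor_to_index() helper is hypothetical, and it assumes that lines are separated by a single "\n" and that columns count characters rather than bytes.

```perl
use strict;
use warnings;

# Convert a 1-based (line, column) cursor position into a 0-based
# character index into $source. Returns nothing if out of range.
sub cursor_to_index {
    my ($source, $line, $column) = @_;

    # A negative limit makes split keep trailing empty lines.
    my @lines = split /\n/, $source, -1;
    return if $line   < 1 || $line   > @lines;
    return if $column < 1 || $column > length $lines[$line - 1];

    # Add the length of every preceding line, plus 1 for each "\n".
    my $index = 0;
    $index += length($lines[$_]) + 1 for 0 .. $line - 2;

    return $index + $column - 1;
}

my $source = 'my $count = 0;' . "\n" . '$count++;' . "\n";

# Cursor on the '$' at the start of line 2.
my $index = cursor_to_index($source, 2, 1);
```

The resulting index could then be passed as the second argument of rename_variable().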

Vim integration

The module distribution includes a Vim plugin: vim/perlart.vim

This plugin sets up a series of mappings that refactor or rename code elements that have been visually selected or on which the cursor is sitting.

For example, the <CTRL-S> mapping yanks the visual selection, refactors the code into a subroutine, requests a name for the new subroutine, requests a return value (if one seems needed), and then pastes the resulting subroutine call over the original selected text.

The mapping also places the resulting subroutine definition code in the unnamed register, as well as in register "s (for "subroutine"), so that the definition is easy to paste back into your source somewhere.

The following Normal mode mappings are also available:

<CTRL-N>

Rename the variable under the cursor.

<CTRL-S>

Search for all instances of the variable under the cursor.

WARNING: In some environments, CTRL-S will suspend terminal interactions. If your terminal locks up when you use this mapping, hit CTRL-Q to restart terminal interactions. In this case, you will need either to change the behaviour of CTRL-S in your terminal (for example: https://coderwall.com/p/ltiqsq/disable-ctrl-s-and-ctrl-q-on-terminal), or else to change this mapping to something else.

gd

Jump to the declaration of the variable under the cursor.

*

Jump to the next usage of the variable under the cursor.

The following Visual mode mappings are also available:

<CTRL-H>

Hoist all instances of the visually selected code into a lexical variable.

<CTRL-C>

Hoist all instances of the visually selected code into a lexical closure.

<CTRL-S>

Refactor all instances of the visually selected code into a parameterized subroutine.

<CTRL-H><CTRL-H>
<CTRL-C><CTRL-C>
<CTRL-S><CTRL-S>

Same as the single-control-character versions above, but these only refactor the code actually selected, rather than every equivalent instance throughout the buffer.

DIAGNOSTICS

The analysis and refactoring subroutines all return a hash, in all cases. However, if any subroutine cannot perform its task (usually because the code it has been given is invalid), then the returned hash will contain the key 'failed', and the corresponding value will give a reason for the failure (if possible).

The following failure messages may be encountered:

failed => 'invalid source code'

The code you passed in as the first argument could not be recognized by PPR as valid Perl.

There is a small chance this was caused by a bug in PPR, but it's more likely that something was wrong with the code you passed in.

failed => 'not a valid series of statements'

The subset of the code you asked refactor_to_sub() to refactor could not be recognized by PPR as a refactorable sequence of Perl statements.

Check whether you caught an extra unmatched opening or closing brace, or started in the middle of a string.

failed => 'the code has an internal return statement'

If the code you're trying to put into a subroutine contains a (conditional) return statement anywhere but at the end of the fragment, then there's no way to refactor it cleanly into another subroutine, because the internal return will return from the newly refactored subroutine, not from the place where you'll be replacing the original code with a call to the newly refactored subroutine. So refactor_to_sub() doesn't try.

failed => "code has both a leading assignment and an explicit return"

If you're attempting to refactor a fragment of code that starts with the rvalue of an assignment, and ends in a return, there's no way to put both into a new subroutine and still have the previous behaviour of the original code preserved. So refactor_to_sub() doesn't try.

failed => "because the target code is not a simple expression"

Only simple expressions (not full statements) can be hoisted into a lexical variable or closure. You tried to hoist something "bigger" than that.

failed => "because there is no variable at the specified location"

You called classify_var_at() but gave it a position in the source code where there was no variable. If you're doing that from within some editor, you may have an out-by-one error if the buffer positions you're detecting and passing back to the module start at 1 instead of zero.

failed => 'because the apparent variable is not actually a variable'

You called classify_var_at() but gave it a position in the source code where there was no variable. It looks like there is a variable there, but there isn't. Is the apparent variable actually in an uninterpolated string, or a comment, or some POD, or after the __DATA__ or __END__ marker?

API errors are signalled by throwing an exception:

"%s argument of %s must be a %s"

You called the specified subroutine with the wrong kind of argument. The error message will specify which argument and what kind of value it requires.

"Unexpected extra argument passed to %s"

You called the specified subroutine with an extra unexpected argument. Did you mean to put that argument in the subroutine's options hash instead?

"Unknown option (%s) passed to %s"

You passed an unexpected named argument via the specified subroutine's options hash. Did you misspell it, perhaps?
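
Since failures can therefore arrive in two ways (a 'failed' key for refactoring problems, an exception for API errors), calling code may want to handle both in one place. The sketch below shows one possible approach. The run_refactoring() wrapper is hypothetical, and the three closures are stand-ins for real calls into the module.

```perl
use strict;
use warnings;

# Run a refactoring call and report problems of either kind uniformly.
# Returns ($result, undef) on success or (undef, $message) on failure.
sub run_refactoring {
    my ($refactor) = @_;

    # API errors are thrown as exceptions, so trap them with eval.
    my $result = eval { $refactor->() };
    if (!defined $result) {
        my $error = $@ || 'unknown error';
        return (undef, "API error: $error");
    }

    # Refactoring failures are reported via the 'failed' key.
    return (undef, "Refactoring failed: $result->{failed}")
        if exists $result->{failed};

    return ($result, undef);
}

# Hypothetical stand-ins for real calls into the module.
my ($ok, $no_error) = run_refactoring(
    sub { return { call => 'f()', code => 'sub f {}' } }
);
my (undef, $failure) = run_refactoring(
    sub { return { failed => 'invalid source code' } }
);
my (undef, $thrown) = run_refactoring(
    sub { die "Unknown option (nmae) passed to refactor_to_sub\n" }
);
```

In real use, the closure would wrap the actual call, for example sub { refactor_to_sub($source, \%options) }.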

CONFIGURATION AND ENVIRONMENT

Code::ART requires no configuration files or environment variables.

DEPENDENCIES

The PPR module (version 0.000027 or later)

INCOMPATIBILITIES

Because this module relies on the PPR module, it will not run under Perl 5.20 (because regexes are broken in that version of Perl).

BUGS AND LIMITATIONS

These refactoring and analysis algorithms are not intelligent or self-aware. They do not understand the code they are processing, and especially not the purpose or intent of that code. They are merely applying a set of heuristics (i.e. informed guessing) to try to determine what you actually wanted the replacement code to do. Sometimes they will guess wrong. Treat them as handy-but-dumb tools, not as magical A.I. superfriends. Trust...but verify.

No bugs have been reported.

Please report any bugs or feature requests to bug-code-art@rt.cpan.org, or through the web interface at http://rt.cpan.org.

AUTHOR

Damian Conway <DCONWAY@CPAN.org>

LICENCE AND COPYRIGHT

Copyright (c) 2018, Damian Conway <DCONWAY@CPAN.org>. All rights reserved.

This module is free software; you can redistribute it and/or modify it under the same terms as Perl itself. See perlartistic.

DISCLAIMER OF WARRANTY

BECAUSE THIS SOFTWARE IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY FOR THE SOFTWARE, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES PROVIDE THE SOFTWARE "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE SOFTWARE IS WITH YOU. SHOULD THE SOFTWARE PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, REPAIR, OR CORRECTION.

IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR REDISTRIBUTE THE SOFTWARE AS PERMITTED BY THE ABOVE LICENCE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR INABILITY TO USE THE SOFTWARE (INCLUDING BUT NOT LIMITED TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD PARTIES OR A FAILURE OF THE SOFTWARE TO OPERATE WITH ANY OTHER SOFTWARE), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.
