NAME
Text::Tradition::Collation - a software model for a text collation
SYNOPSIS
use Text::Tradition;
my $t = Text::Tradition->new(
    'name'  => 'this is a text',
    'input' => 'TEI',
    'file'  => '/path/to/tei_parallel_seg_file.xml' );
my $c = $t->collation;
my @readings = $c->readings;
my @paths = $c->paths;
my @relationships = $c->relationships;
my $svg_variant_graph = $t->collation->as_svg();
DESCRIPTION
Text::Tradition is a library for the representation and analysis of collated texts, particularly medieval ones. The Collation is the central feature of a Tradition: it is where the text, its sequence of readings, and the relationships between those readings are actually kept.
CONSTRUCTOR
new
The constructor. Takes a hash or hashref of the following arguments:
tradition - The Text::Tradition object to which the collation belongs. Required.
linear - Whether the collation should be linear; that is, whether transposed readings should be treated as two linked readings rather than one, and therefore whether the collation graph is acyclic. Defaults to true.
baselabel - The default label for the path taken by a base text (if any). Defaults to 'base text'.
wit_list_separator - The string to join a list of witnesses for purposes of making labels in display graphs. Defaults to ', '.
ac_label - The label appended to a witness sigil to denote an additional layer of that witness's path, i.e. when the text has more than one possible reading because of scribal corrections or the like. Defaults to ' (a.c.)'.
wordsep - The string used to separate words in the original text. Defaults to ' '.
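As an illustration of how wit_list_separator and ac_label might combine when a display graph labels an edge, here is a minimal pure-Perl sketch. The edge_label helper and the layer flag on each witness record are hypothetical; they are not part of the module's API.

```perl
use strict;
use warnings;

# Documented constructor defaults for the two display-label options.
my %opts = (
    wit_list_separator => ', ',
    ac_label           => ' (a.c.)',
);

# Hypothetical helper (not part of the module): build an edge label from
# witness records, appending ac_label to uncorrected (a.c.) layers.
sub edge_label {
    my ( $opts, @wits ) = @_;
    my @labels =
      map { $_->{layer} ? $_->{sigil} . $opts->{ac_label} : $_->{sigil} } @wits;
    return join $opts->{wit_list_separator}, sort @labels;
}

my $label = edge_label( \%opts, { sigil => 'A' }, { sigil => 'B', layer => 1 } );
print "$label\n";    # prints "A, B (a.c.)"
```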
ACCESSORS
tradition
linear
wit_list_separator
baselabel
ac_label
wordsep
Simple accessors for collation attributes.
start
The meta-reading at the start of every witness path.
end
The meta-reading at the end of every witness path.
readings
Returns all Reading objects in the graph.
reading( $id )
Returns the Reading object corresponding to the given ID.
add_reading( $reading_args )
Adds a new reading object to the collation. See Text::Tradition::Collation::Reading for the available arguments.
del_reading( $object_or_id )
Removes the given reading from the collation, implicitly removing its paths and relationships.
has_reading( $id )
Predicate to see whether a given reading ID is in the graph.
reading_witnesses( $object_or_id )
Returns a list of sigils whose witnesses contain the reading.
paths
Returns all reading paths within the document - that is, all edges in the collation graph. Each path is an arrayref of [ $source, $target ] reading IDs.
add_path( $source, $target, $sigil )
Links the given readings in the collation in sequence, under the given witness sigil. The readings may be specified by object or ID.
del_path( $source, $target, $sigil )
Removes the path between the given readings for the given witness sigil, so that they are no longer linked in sequence in that witness. The readings may be specified by object or ID.
has_path( $source, $target )
Returns true if the two readings are linked in sequence in any witness. The readings may be specified by object or ID.
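The path methods above treat the collation as a graph whose edges carry witness sigils. The sketch below models that idea with a plain nested hash keyed on source and target reading IDs. The storage layout and the all_paths name are hypothetical and say nothing about how the module stores its graph internally.

```perl
use strict;
use warnings;

# Hypothetical store (not the module's internals):
# $paths{$source}{$target}{$sigil} is true if witness $sigil has that edge.
my %paths;

sub add_path {
    my ( $source, $target, $sigil ) = @_;
    $paths{$source}{$target}{$sigil} = 1;
    return;
}

sub del_path {
    my ( $source, $target, $sigil ) = @_;
    return unless has_path( $source, $target );
    delete $paths{$source}{$target}{$sigil};
    # Remove the edge itself once no witness follows it any more.
    delete $paths{$source}{$target} unless %{ $paths{$source}{$target} };
    return;
}

# True if the two readings are linked in sequence in any witness.
sub has_path {
    my ( $source, $target ) = @_;
    return exists $paths{$source} && exists $paths{$source}{$target};
}

# Every edge as a [ $source, $target ] pair, mirroring what paths() returns.
sub all_paths {
    my @edges;
    for my $s ( sort keys %paths ) {
        push @edges, [ $s, $_ ] for sort keys %{ $paths{$s} };
    }
    return @edges;
}

add_path( 'r1', 'r2', 'A' );
add_path( 'r1', 'r2', 'B' );
del_path( 'r1', 'r2', 'A' );
my $state = has_path( 'r1', 'r2' ) ? 'still linked' : 'unlinked';
print "$state\n";    # prints "still linked", since witness B keeps the edge
```

Deleting a path for one witness leaves the edge in place as long as another witness still follows it, which is why has_path asks about any witness.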
relationships
Returns all Relationship objects in the collation.
add_relationship( $reading, $other_reading, $options, $changed_readings )
Adds a new relationship of the type given in $options between the two readings, which may be specified by object or ID. Returns a list ( $status, @vectors ), where $status is true on success and @vectors is the list of relationship edges that were ultimately added. If an array reference is passed in as $changed_readings, any readings that were altered by the relationship creation are added to that array.
See Text::Tradition::Collation::Relationship for the available options.
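The return convention of add_relationship (a status flag followed by the added edges, plus an optional array reference that collects changed readings) is a common Perl idiom. The stub below mimics only that calling convention. Its body, the 'spelling' type value and the shape of the returned vectors are placeholders, not the module's behaviour.

```perl
use strict;
use warnings;

# Hypothetical stand-in that mimics only the documented calling convention:
# add_relationship( $reading, $other_reading, $options, $changed_readings )
sub add_relationship {
    my ( $source, $target, $options, $changed ) = @_;
    # Placeholder check; the real validation rules live in the module.
    return (0) unless $options && $options->{type};
    my @vectors = ( [ $source, $target ] );
    # Stub behaviour: report both readings as altered, but only if the
    # caller passed an array reference.
    push @{$changed}, $source, $target if ref $changed eq 'ARRAY';
    return ( 1, @vectors );
}

my @changed;
my ( $ok, @vectors ) =
  add_relationship( 'r3', 'r7', { type => 'spelling' }, \@changed );
my $msg = $ok ? 'added ' . scalar(@vectors) . ' edge(s)' : 'failed';
print "$msg\n";
```

Capturing the result as ( $ok, @vectors ) keeps the status separate from however many edges come back.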
register_relationship_type( %relationship_definition )
Add a relationship type definition to this collation. The argument can be either a hash or a hashref, defining the properties of the relationship. For relationship types and their properties, see Text::Tradition::Collation::RelationshipType.
get_relationship_type( $relationship_name )
Retrieve the RelationshipType object for the relationship with the given name.
merge_readings( $main, $second, $concatenate, $with_str )
Merges the $second reading into the $main one. If $concatenate is true, then the merged node will carry the text of both readings, concatenated with either $with_str (if specified) or a sensible default (the empty string if the appropriate 'join_*' flag is set on either reading, or else $self->wordsep.)
The first two arguments may be either readings or reading IDs.
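The separator rule for concatenating merged readings can be sketched as follows. The reading hashes are hypothetical, the flag names join_next and join_prior are assumptions (the text above says only 'join_*'), and which flag counts as "appropriate" is a guess: here, the main reading's join_next or the second reading's join_prior.

```perl
use strict;
use warnings;

# Hypothetical sketch of the separator choice described for merge_readings.
# The flag names join_next / join_prior are assumptions.
sub merge_separator {
    my ( $main, $second, $with_str, $wordsep ) = @_;
    return $with_str if defined $with_str;    # an explicit separator wins
    return '' if $main->{join_next} || $second->{join_prior};
    return $wordsep;                          # otherwise use the collation's wordsep
}

sub merged_text {
    my ( $main, $second, $with_str, $wordsep ) = @_;
    my $sep = merge_separator( $main, $second, $with_str, $wordsep );
    return join $sep, $main->{text}, $second->{text};
}

my $text = merged_text( { text => 'in' }, { text => 'principio' }, undef, ' ' );
print "$text\n";    # prints "in principio"
```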
merge_related( @relationship_types )
Merge all readings linked by any of the given relationship types. If any selected type is not a colocation, the graph will no longer be linear. In each case, the majority (or plurality) reading is the one kept.
WARNING: This operation cannot be undone.
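Choosing which reading survives a merge_related merge comes down to counting witnesses. A minimal sketch, assuming each reading record lists its witnesses. The plurality_reading name is hypothetical, and tie-breaking by reading ID is an arbitrary choice made here, not documented behaviour.

```perl
use strict;
use warnings;

# Hypothetical sketch: from a group of related readings, keep the one
# attested by the most witnesses. Ties are broken by reading ID here.
sub plurality_reading {
    my (@group) = @_;    # each element: { id => ..., witnesses => [...] }
    my ($keep) = sort {
        @{ $b->{witnesses} } <=> @{ $a->{witnesses} }
          or $a->{id} cmp $b->{id}
    } @group;
    return $keep;
}

my $kept = plurality_reading(
    { id => 'r5', witnesses => [ 'A', 'B', 'C' ] },
    { id => 'r9', witnesses => ['D'] },
);
print "$kept->{id}\n";    # prints "r5"
```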
compress_readings
Where possible in the graph, compresses plain sequences of readings into a single reading. The sequences must consist of readings with no relationships to other readings, with only a single witness path between them and no other witness paths from either that would skip the other. The readings must also not be marked as nonsense or bad grammar.
WARNING: This operation cannot be undone.
duplicate_reading( $reading, @witlist )
Split the given reading into two, so that the new reading is in the path for the witnesses given in @witlist. If the result is that certain non-colocated relationships (e.g. transpositions) are no longer valid, these will be removed. Returns the newly-created reading.
clear_witness( @sigil_list )
Clear the given witnesses out of the collation entirely, removing references to them in paths, and removing readings that belong only to them. Should only be called via $tradition->del_witness.
reading_witnesses( $reading )
Return a list of sigils corresponding to the witnesses in which the reading appears.
OUTPUT METHODS
as_svg( \%options )
Returns an SVG string that represents the graph, generated via as_dot and GraphViz. See as_dot for a list of options. Requires GraphViz (the dot program) to be installed.
as_dot( \%options )
Returns a string that is the collation graph expressed in dot (i.e. GraphViz) format. Options include:
from
to
color_common
path_witnesses( $edge )
Returns the list of sigils whose witnesses are associated with the given edge. The edge can be passed as either an array or an arrayref of ( $source, $target ).
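Accepting an edge either as a two-element list or as an array reference is a standard Perl idiom. A minimal sketch, assuming a hypothetical nested hash of sigils per edge (not the module's actual storage):

```perl
use strict;
use warnings;

# Hypothetical edge store (not the module's internals):
# $paths{$source}{$target} holds the sigils of witnesses taking that edge.
my %paths = ( r1 => { r2 => { A => 1, C => 1 } } );

sub path_witnesses {
    # Accept either ( $source, $target ) or [ $source, $target ].
    my ( $source, $target ) = ref $_[0] eq 'ARRAY' ? @{ $_[0] } : @_;
    return unless exists $paths{$source} && exists $paths{$source}{$target};
    return sort keys %{ $paths{$source}{$target} };
}

my @as_list = path_witnesses( 'r1', 'r2' );
my @as_aref = path_witnesses( [ 'r1', 'r2' ] );
print "@as_list | @as_aref\n";    # prints "A C | A C"
```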
as_adjacency_list
Returns a JSON structure that represents the collation sequence graph.
as_graphml
Returns a GraphML representation of the collation. The GraphML will contain two graphs. The first expresses the attributes of the readings and the witness paths that link them; the second expresses the relationships that link the readings. This is the native transfer format for a tradition.
as_csv
Returns a CSV alignment table representation of the collation graph, with one row per witness (and one for each uncorrected witness layer).
as_tsv
Returns a tab-separated alignment table representation of the collation graph, with one row per witness (and one for each uncorrected witness layer).
alignment_table
Return a reference to an alignment table, in a slightly enhanced CollateX format which looks like this:
    $table = { alignment => [ { witness => "SIGIL",
                                tokens  => [ { t => "TEXT" }, ... ] },
                              { witness => "SIG2",
                                tokens  => [ { t => "TEXT" }, ... ] },
                              ... ],
               length => TEXTLEN };
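Because alignment_table returns plain Perl data, a tab-separated export in the spirit of as_tsv (one row per witness) can be built from it directly. This sketch assumes that gaps appear as undef tokens, by analogy with null tokens in CollateX JSON. The table_to_tsv name is hypothetical, and as_tsv's actual layout may differ.

```perl
use strict;
use warnings;

# Sample table in the documented shape; undef marking a gap is an assumption.
my $table = {
    alignment => [
        { witness => 'A', tokens => [ { t => 'in' }, { t => 'principio' } ] },
        { witness => 'B', tokens => [ { t => 'in' }, undef ] },
    ],
    length => 2,
};

# One tab-separated row per witness: the sigil, then one cell per position.
sub table_to_tsv {
    my ($table) = @_;
    my @rows;
    for my $wit ( @{ $table->{alignment} } ) {
        my @cells = map { defined $_ ? $_->{t} : '' } @{ $wit->{tokens} };
        push @rows, join "\t", $wit->{witness}, @cells;
    }
    return join "\n", @rows;
}

my $tsv = table_to_tsv($table);
print "$tsv\n";
```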
NAVIGATION METHODS
reading_sequence( $first, $last, $sigil, $backup )
Returns the ordered list of readings, starting with $first and ending with $last, for the witness given in $sigil. If a $backup sigil is specified (e.g. when walking a layered witness), it will be used wherever no $sigil path exists. If there is a base text reading, that will be used wherever no path exists for $sigil or $backup.
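The fallback order in reading_sequence ($sigil first, then $backup, then the base text) can be illustrated on a toy graph. In this sketch each witness has a hypothetical successor table, and 'base' stands in for the base text. Neither detail is taken from the module.

```perl
use strict;
use warnings;

# Hypothetical successor tables: $next{$sigil}{$reading} = following reading.
my %next = (
    'A (a.c.)' => { '#START#' => 'r1', r1 => 'r2' },
    'A'        => { r2 => 'r4', r4 => '#END#' },
    'base'     => { r2 => 'r3', r3 => '#END#' },
);

# Follow $sigil where possible, else $backup, else the base text.
sub reading_sequence {
    my ( $first, $last, $sigil, $backup ) = @_;
    my @seq = ($first);
    my $cur = $first;
    while ( $cur ne $last ) {
        my ($step) = grep { defined }
          map { defined $_ ? $next{$_}{$cur} : undef } ( $sigil, $backup, 'base' );
        die "no path onward from $cur\n" unless defined $step;
        push @seq, $step;
        $cur = $step;
    }
    return @seq;
}

my @seq = reading_sequence( '#START#', '#END#', 'A (a.c.)', 'A' );
print "@seq\n";    # prints "#START# r1 r2 r4 #END#"
```

Walking the uncorrected layer with the corrected witness as backup gives the layered text. Where neither has a path, the base text fills in.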
readings_at_rank( $rank )
Returns a list of readings at a given rank, taken from the alignment table.
next_reading( $reading, $sigil )
Returns the reading that follows the given reading along the given witness path.
prior_reading( $reading, $sigil )
Returns the reading that precedes the given reading along the given witness path.
common_readings
Returns the list of common readings in the graph (i.e. those readings that are shared by all non-lacunose witnesses.)
path_text( $sigil [, $start, $end, $use_normal_form ] )
Returns the text of a witness (plus its backup, if we are using a layer) as stored in the collation. The text is returned as a string, where the individual readings are joined with spaces and the meta-readings (e.g. lacunae) are omitted. Optional specification of $start and $end allows the generation of a subset of the witness text. Optional specification of $use_normal_form produces a text based on the normal form, rather than the raw text, of the reading.
known_path_text( $use_normal_form, @sequence )
Returns the text of a given sequence of readings. No attempt is made to validate the sequence in question. If $use_normal_form is set to true, the normal form of each reading in the sequence will be used to construct the text.
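The joining rule described for path_text and known_path_text (readings joined with spaces, meta-readings omitted, normal forms used on request) reduces to a filter and a join. The reading records and their text, normal_form and is_meta keys are assumptions for illustration, and sequence_text is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical reading records along one witness path; the key names
# (text, normal_form, is_meta) are assumptions for this sketch.
my @sequence = (
    { text => '#START#',  is_meta => 1 },
    { text => 'Ihesus',   normal_form => 'Iesus' },
    { text => 'dixit' },
    { text => '#LACUNA#', is_meta => 1 },
    { text => '#END#',    is_meta => 1 },
);

# Join non-meta readings with spaces, optionally using normal forms.
sub sequence_text {
    my ( $use_normal_form, @readings ) = @_;
    my @words = map {
        my $r = $_;
        # Fall back to the raw text when no normal form is recorded.
        ( $use_normal_form && defined $r->{normal_form} )
          ? $r->{normal_form}
          : $r->{text};
    } grep { !$_->{is_meta} } @readings;
    return join ' ', @words;
}

my $raw    = sequence_text( 0, @sequence );
my $normal = sequence_text( 1, @sequence );
print "$raw\n$normal\n";    # prints "Ihesus dixit" then "Iesus dixit"
```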
INITIALIZATION METHODS
These are mostly for use by parsers.
make_witness_path( $witness )
Link the array of readings contained in $witness->path (and in $witness->uncorrected_path if it exists) into collation paths. Clear out the arrays when finished.
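Turning a witness's reading arrays into collation paths amounts to linking each consecutive pair of readings. A minimal sketch, assuming a hypothetical add_path stand-in that only records its arguments. Naming the uncorrected layer with the sigil plus the default ac_label is also an assumption made for illustration.

```perl
use strict;
use warnings;

my @added;    # records [ $source, $target, $sigil ] for inspection

# Hypothetical stand-in for the collation's add_path.
sub add_path { push @added, [@_]; return }

# Link each consecutive pair of readings in the witness's path (and its
# uncorrected path, if any), then empty the arrays as the text describes.
sub make_witness_path {
    my ( $witness, $ac_label ) = @_;
    my %layers = ( $witness->{sigil} => $witness->{path} );
    # Assumption: the uncorrected layer is keyed by sigil plus ac_label.
    $layers{ $witness->{sigil} . $ac_label } = $witness->{uncorrected_path}
      if $witness->{uncorrected_path};
    for my $sigil ( sort keys %layers ) {
        my $readings = $layers{$sigil};
        add_path( $readings->[ $_ - 1 ], $readings->[$_], $sigil )
          for 1 .. $#$readings;
        @$readings = ();
    }
    return;
}

my %wit = (
    sigil            => 'A',
    path             => [ '#START#', 'r1', 'r2', '#END#' ],
    uncorrected_path => [ '#START#', 'r1', 'r3', '#END#' ],
);
make_witness_path( \%wit, ' (a.c.)' );
my $count = @added;
print "$count paths added\n";    # prints "6 paths added"
```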
make_witness_paths
Call make_witness_path for all witnesses in the tradition.
calculate_ranks
Calculate the reading ranks (that is, their aligned positions relative to each other) for the graph. This can only be called on linear collations.
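On an acyclic graph, one standard way to assign ranks is to give each reading one more than the highest rank among its predecessors, processing readings in topological order. The sketch below does this with Kahn's algorithm over a plain adjacency hash. It ignores relationships, which the module's own ranking may take into account, and it is not taken from the module's source.

```perl
use strict;
use warnings;

# Hypothetical successor lists for a small acyclic collation graph.
my %succ = (
    '#START#' => [ 'r1', 'r2' ],
    r1        => ['r3'],
    r2        => ['r4'],
    r4        => ['r3'],
    r3        => ['#END#'],
    '#END#'   => [],
);

# Rank = 1 + highest rank among predecessors, computed in topological
# order (Kahn's algorithm). Relationships are ignored in this sketch.
sub calculate_ranks {
    my ($succ) = @_;
    my %indeg = map { ( $_ => 0 ) } keys %$succ;
    $indeg{$_}++ for map { @$_ } values %$succ;
    my @queue = sort { $a cmp $b } grep { $indeg{$_} == 0 } keys %indeg;
    my %rank  = map { ( $_ => 0 ) } @queue;
    while (@queue) {
        my $node = shift @queue;
        for my $next ( @{ $succ->{$node} } ) {
            my $cand = $rank{$node} + 1;
            $rank{$next} = $cand if !defined $rank{$next} || $cand > $rank{$next};
            push @queue, $next if --$indeg{$next} == 0;
        }
    }
    # Any reading left with unmet predecessors means the graph has a cycle.
    die "graph has a cycle; ranks need a linear collation\n"
      if grep { $indeg{$_} > 0 } keys %indeg;
    return \%rank;
}

my $ranks = calculate_ranks( \%succ );
for my $id ( sort { $ranks->{$a} <=> $ranks->{$b} or $a cmp $b } keys %$ranks ) {
    print "$id: $ranks->{$id}\n";
}
```

The cycle check mirrors the restriction above: a non-linear collation (for example, one where transposed readings were merged) has no consistent ranking of this kind.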
flatten_ranks
A convenience method for parsing collation data. Searches the graph for readings with the same text at the same rank, and merges any that are found.
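The search step of flatten_ranks, finding readings that share both rank and text, can be sketched by grouping on a composite key. Only the grouping is shown. The merge itself would presumably go through merge_readings. The reading records and the mergeable_groups name are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical reading records with ranks already calculated.
my @readings = (
    { id => 'r1', rank => 1, text => 'in' },
    { id => 'r2', rank => 1, text => 'in' },
    { id => 'r3', rank => 2, text => 'principio' },
    { id => 'r4', rank => 2, text => 'principium' },
);

# Group reading IDs by (rank, text); groups of two or more are merge candidates.
sub mergeable_groups {
    my @records = @_;
    my %by_key;
    for my $r (@records) {
        push @{ $by_key{"$r->{rank}\t$r->{text}"} }, $r->{id};
    }
    return grep { @$_ > 1 } map { $by_key{$_} } sort keys %by_key;
}

for my $group ( mergeable_groups(@readings) ) {
    print "merge: @$group\n";    # prints "merge: r1 r2"
}
```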
identical_readings
identical_readings( start => $startnode, end => $endnode )
identical_readings( startrank => $startrank, endrank => $endrank )
Goes through the graph identifying all pairs of readings that appear to be identical, and therefore able to be merged into a single reading. Returns the relevant identical pairs. Can be restricted to run over only a part of the graph, specified either by node or by rank.
calculate_common_readings
Goes through the graph identifying the readings that appear in every witness (apart from those with lacunae at that spot.) Marks them as common and returns the list.
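The test applied by calculate_common_readings can be expressed as a set check: a reading is common if every witness either attests it or has a lacuna at that point. The sketch assumes that each reading record lists its witnesses and that the lacunose witnesses at each rank are known in advance. Both inputs, and the common_readings helper shown here, are hypothetical.

```perl
use strict;
use warnings;

my @witnesses = ( 'A', 'B', 'C' );

# Hypothetical inputs: witnesses per reading, and witnesses lacunose at each rank.
my @readings = (
    { id => 'r1', rank => 1, witnesses => [ 'A', 'B', 'C' ] },
    { id => 'r2', rank => 2, witnesses => [ 'A', 'B' ] },
    { id => 'r3', rank => 3, witnesses => [ 'A', 'B' ] },
);
my %lacunose_at = ( 3 => ['C'] );

sub common_readings {
    my ( $wits, $lacunose_at, @candidates ) = @_;
    my @common;
    for my $r (@candidates) {
        # Witnesses that either attest the reading or are lacunose here.
        my %covered = map { ( $_ => 1 ) } @{ $r->{witnesses} },
          @{ $lacunose_at->{ $r->{rank} } || [] };
        push @common, $r->{id} unless grep { !$covered{$_} } @$wits;
    }
    return @common;
}

my @common = common_readings( \@witnesses, \%lacunose_at, @readings );
print "@common\n";    # prints "r1 r3"
```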
text_from_paths
Calculate the text array for all witnesses from the path, for later consistency checking. Only to be used if there is no non-graph-based way to know the original texts.
UTILITY FUNCTIONS
common_predecessor( $reading_a, $reading_b )
Find the last reading that occurs in sequence before both the given readings. Since every witness path begins at $self->start, that reading is returned if no later common predecessor exists.
common_successor( $reading_a, $reading_b )
Find the first reading that occurs in sequence after both the given readings. Since every witness path ends at $self->end, that reading is returned if no earlier common successor exists.
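One straightforward way to find a common predecessor is to collect the ancestors of each reading, intersect the two sets, and take the highest-ranked survivor. The sketch below does this on a toy graph with precomputed ranks. The graph, the ranks and the ancestors helper are hypothetical, and this is an illustration rather than the module's algorithm. common_successor would be the mirror image: walk successors instead and take the lowest rank.

```perl
use strict;
use warnings;

# Hypothetical predecessor lists and ranks for a small collation.
my %pred = (
    '#START#' => [],
    r1        => ['#START#'],
    r2        => ['r1'],
    r3        => ['r1'],
    r4        => [ 'r2', 'r3' ],
);
my %rank = ( '#START#' => 0, r1 => 1, r2 => 2, r3 => 2, r4 => 3 );

# All readings reachable by walking backwards from $start (excluding $start).
sub ancestors {
    my ($start) = @_;
    my %seen;
    my @stack = @{ $pred{$start} };
    while ( defined( my $n = pop @stack ) ) {
        next if $seen{$n}++;
        push @stack, @{ $pred{$n} };
    }
    return \%seen;
}

# The shared ancestor with the highest rank, i.e. the last one in sequence.
sub common_predecessor {
    my ( $x, $y ) = @_;
    my ( $ax, $ay ) = ( ancestors($x), ancestors($y) );
    my @shared = grep { $ay->{$_} } keys %$ax;
    my ($best) = sort { $rank{$b} <=> $rank{$a} or $a cmp $b } @shared;
    return $best;
}

my $cp = common_predecessor( 'r2', 'r3' );
print "$cp\n";    # prints "r1"
```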
BUGS/TODO
Rework XML serialization in a more modular way
LICENSE
This package is free software and is provided "as is" without express or implied warranty. You can redistribute it and/or modify it under the same terms as Perl itself.
AUTHOR
Tara L Andrews <aurum@cpan.org>