NAME
Chess::PGN::Parse - reads and parses PGN (Portable Game Notation) Chess files
SYNOPSIS
use Chess::PGN::Parse;
use English qw( -no_match_vars );
my $pgnfile = "kk_2001.pgn";
my $pgn = new Chess::PGN::Parse $pgnfile
or die "can't open $pgnfile\n";
while ($pgn->read_game()) {
print $pgn->white, ", " , $pgn->black, ", ",
$pgn->result, ", ",
$pgn->game, "\n";
}
use Chess::PGN::Parse;
use English qw( -no_match_vars );
my $pgnfile = "kk_2001.pgn";
my $text = "";
{
# slurp the whole file into $text
local $INPUT_RECORD_SEPARATOR = undef;
open my $fh, '<', $pgnfile or die "can't open $pgnfile: $OS_ERROR\n";
$text = <$fh>;
close $fh;
}
# reads from string instead of a file
my $pgn = new Chess::PGN::Parse undef, $text;
while ($pgn->read_game()) {
print $pgn->white, ", " , $pgn->black, ", ",
$pgn->result, ", ",
$pgn->game, "\n";
}
use Chess::PGN::Parse;
my $pgnfile = "kk_2001.pgn";
my $pgn = new Chess::PGN::Parse $pgnfile
or die "can't open $pgnfile\n";
my $games_ref = $pgn->smart_read_all();
DESCRIPTION
Chess::PGN::Parse offers a range of methods to read and manipulate Portable Game Notation files. PGN files contain chess games produced by chess programs following a standard format (http://www.schachprobleme.de/chessml/faq/pgn/). PGN is among the preferred means of distributing chess games. Being a public, well-established standard, it is understood by many chess archive programs. Parsing simple PGN files is not difficult. However, dealing with some of the intricacies of the standard is far from trivial. This module offers a clean interface for reading and parsing complex PGN files.
A PGN file has several tags, which are key/value pairs in the header of each game, in the format [key "value"].
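To make the tag format concrete, here is a minimal sketch of pulling tag pairs out of a header with a single regular expression. parse_tags() is a hypothetical helper, not part of Chess::PGN::Parse, and it assumes that tag values contain no escaped quotes. Since \s also matches newlines, several tags on one line and a tag broken across lines are both accepted.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Chess::PGN::Parse): extract [Key "value"]
# pairs from a header. Assumes values contain no escaped double quotes.
# \s matches newlines too, so tags may share a line or span two lines.
sub parse_tags {
    my ($header) = @_;
    my %tags;
    while ($header =~ /\[\s*(\w+)\s+"([^"]*)"\s*\]/g) {
        $tags{$1} = $2;
    }
    return \%tags;
}

my $tags = parse_tags(qq{[Event "Casual Game"] [Site "London"]\n[White "Anderssen"]\n[Black\n "Kieseritzky"]});
print "$_ = $tags->{$_}\n" for sort keys %$tags;
```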
After the header comes the game: a string of numbered chess moves, optionally interrupted by braced comments and by recursive, parenthesized variations and comments. While dealing with simple braced comments is straightforward, parsing nested comments can give you more than a headache.
The most immediate methods of Chess::PGN::Parse are:
read_game() reads one game, separating the tags and the game text.
parse_game() parses the current game, stores the moves into an array and optionally saves the comments into a hash for further usage. It can deal with nested comments and recursive variations.
quick_parse_game() does the same as the above, but doesn't save the comments, which are just stripped from the text. It can't deal with nested comments. It should be the preferred method when you know that you are dealing with simple PGNs.
smart_parse_game() is the best of the above methods. A preliminary check will call parse_game() or quick_parse_game(), depending on the presence of nested comments in the game.
read_all(), quick_read_all() and smart_read_all() read all the records in the current PGN file and return a reference to an array of hashes with all the parsed details from the games.
Parsing games
Parsing PGN games actually involves two actions: reading and parsing. Reading only identifies the two components of a game, i.e. the tags and the move text. During this phase, the tags are decomposed and stored in an internal hash for future use, while the game text is left untouched.
Reading a game is accomplished through the read_game() method, which will identify not only the standard game format but also some unorthodox cases, such as games with no separating blank line between tags and moves, games with no blank lines at the end of the moves, leading blank lines, tags spanning over several lines and some minor quibbles. If you know that your games don't have any of these problems, you might choose the read_standard_game() method, which is a bit faster.
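The reading phase can be sketched as follows, assuming that every header line starts with "[" (so this version does not handle tags spanning several lines). split_record() is a hypothetical helper: leading tag lines form the header, leading blank lines are skipped, and everything else is kept untouched as game text, with or without a separating blank line.

```perl
use strict;
use warnings;

# Hypothetical sketch of the reading phase: leading lines that start with
# "[" form the tag header; the rest is left untouched as game text.
# No blank separator line is required between the two parts.
sub split_record {
    my ($record) = @_;
    my (@header, @body);
    for my $line (split /\n/, $record) {
        if (!@body && $line =~ /^\s*\[/) { push @header, $line }
        elsif (@body || $line =~ /\S/)   { push @body,   $line }
    }
    return (join("\n", @header), join("\n", @body));
}

my ($header, $game) = split_record(qq{[White "Tal"]\n[Black "Botvinnik"]\n1. e4 c6 2. d4 d5 1/2-1/2\n});
print "HEADER:\n$header\nGAME:\n$game\n";
```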
After the reading, you can either use the game text as it is, or you can ask for parsing. Parsing is the process of identifying and isolating the moves from the rest of the game text, such as comments and recursive variations. This process can be accomplished in two ways. Using quick_parse_game(), the non-move elements are just stripped off and discarded, leaving an array of bare moves. If the comments and the recursive variations (RAV) are valuable to you, you can use the parse_game() method, which strips the excess text but can store it in an appropriate data structure. If you pass the option {save_comments => 'yes'} to parse_game(), game comments are stored in a hash whose key is the move number plus color. Multiple comments for the same move are appended to the previous one. If this structure doesn't provide enough details, a further option, {comments_struct => 'array'}, stores an array of comments for each move. Even more details are available using {comments_struct => 'hol'}, which triggers the creation of a hash of lists (hol), where the key is the comment type (RAV, NAG, brace, semicolon, escaped) and the value is a list of homogeneous comments belonging to the same move.
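The three comments_struct layouts can be pictured with a small sketch. store_comment() is hypothetical; the "3w" key (move number plus color) and the single space used to join 'string' comments are assumptions, not the module's documented format.

```perl
use strict;
use warnings;

# Hypothetical sketch of the 'string', 'array' and 'hol' layouts.
# The "3w" key format and the space separator are assumptions.
sub store_comment {
    my ($store, $mode, $key, $type, $text) = @_;
    if ($mode eq 'string') {      # one string per move, appended
        $store->{$key} = defined $store->{$key} ? "$store->{$key} $text" : $text;
    }
    elsif ($mode eq 'array') {    # one array element per comment
        push @{ $store->{$key} }, $text;
    }
    elsif ($mode eq 'hol') {      # one list per comment type
        push @{ $store->{$key}{$type} }, $text;
    }
    else {
        die "unknown comments_struct '$mode'\n";
    }
    return $store;
}

my (%as_string, %as_hol);
for my $c (['3w', 'brace', '{sharp}'], ['3w', 'NAG', '$1'], ['3w', 'RAV', '(3. Bc4 Nf6)']) {
    store_comment(\%as_string, 'string', @$c);
    store_comment(\%as_hol,    'hol',    @$c);
}
print "$as_string{'3w'}\n";                  # {sharp} $1 (3. Bc4 Nf6)
print scalar @{ $as_hol{'3w'}{RAV} }, "\n";  # 1
```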
A further option, {log_errors => 'yes'}, saves the errors into a structure similar to the one used for comments (with no format options, though: all errors for a given move are stored as a single string). What are errors? Anything that is not recognized as one of the previous elements: not a move, a move number, or a comment, either textual or recursive. Anything that the parser cannot actively classify as 'known' is stored as an error.
Getting the parsed values
At the end of the exercise, you can access the components through some standard methods. The standard tags have their own direct access methods (white, black, site, event, date, result, round). More methods give access to some commonly used elements: game() returns the unparsed text, moves() returns a reference to an array of parsed moves, without move numbers, and comments() and errors() return the respective structures after parsing. About game(), it's worth mentioning that, when using quick_parse_game(), the game text is stripped of all non-move elements. This is an intended feature, to favor speed. If you need to preserve the original game text after parsing, either copy it before calling quick_parse_game() or use parse_game() instead.
Recursive Parsing
PGN games may include RAV (Recursive Annotated Variations), which are just game text inside parentheses. This module can recognize RAV sequences and store them as comments. One of the things you can do with these sequences is to parse them again and get bare moves that you can feed to a chess engine or a move analyzer (Chess::PGN::EPD by H.S.Myers is one of them). Chess::PGN::Parse does not directly support recursive parsing of games, but it makes it possible. Parse a game, saving the comments as a hash of lists (see above), and then look for comments of 'RAV' type. For each such entry, strip the surrounding parentheses and create a new Chess::PGN::Parse object with that text. Easier to do than to describe, actually. For an example of this technique, check the file examples/test_recursive.pl.
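The same idea can be sketched without the module: a depth counter finds the top-level parenthesized variations and strips their outer parentheses, and each stripped variation is then scanned again. top_level_ravs() and walk() are hypothetical helpers that assume braced comments have already been removed; in practice you would hand each stripped variation to a new Chess::PGN::Parse object instead.

```perl
use strict;
use warnings;

# Hypothetical scanner: return the top-level variations with their outer
# parentheses removed; nested variations stay inside their parent's text.
# Assumes braced comments were removed beforehand.
sub top_level_ravs {
    my ($text) = @_;
    my ($depth, $start, @ravs) = (0, 0);
    for my $i (0 .. length($text) - 1) {
        my $ch = substr $text, $i, 1;
        if ($ch eq '(') {
            $start = $i + 1 if $depth == 0;
            $depth++;
        }
        elsif ($ch eq ')' && $depth > 0) {
            $depth--;
            push @ravs, substr($text, $start, $i - $start) if $depth == 0;
        }
    }
    return @ravs;
}

# Walk the variation tree depth-first, indenting each nesting level.
sub walk {
    my ($text, $level) = @_;
    for my $rav (top_level_ravs($text)) {
        print '  ' x $level, $rav, "\n";
        walk($rav, $level + 1);
    }
}

walk('1. e4 e5 (1... c5 2. Nf3 (2. c3 d5) d6) 2. Nf3', 0);
```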
EXPORT
new, STR, read_game, tags, event, site, white, black, round, date, result, game, NAG, moves
DEPENDENCIES
IO::File
Class methods
- new()
-
Creates a new Chess::PGN::Parse object from a file name. To read from a string instead, pass undef as the file name and the PGN text as the second argument.
my $pgn = Chess::PGN::Parse->new("filename.pgn") or die "no such file\n";
- NAG()
-
returns the corresponding Numeric Annotation Glyph
- STR()
-
returns the Seven Tags Roster array
@array = $pgn->STR();
@array = Chess::PGN::Parse::STR();
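For reference, the PGN standard defines the Seven Tag Roster as Event, Site, Date, Round, White, Black and Result, in that order. A sketch of checking a tag hash against it follows; missing_str_tags() is a hypothetical helper, and the list is hard-coded rather than taken from STR().

```perl
use strict;
use warnings;

# The Seven Tag Roster, in the order given by the PGN standard.
my @STR = qw(Event Site Date Round White Black Result);

# Hypothetical helper: list the roster tags missing from a tag hash.
sub missing_str_tags {
    my ($tags) = @_;
    return grep { !exists $tags->{$_} } @STR;
}

my %tags = (Event => 'Casual Game', White => 'Anderssen', Black => 'Kieseritzky');
print join(', ', missing_str_tags(\%tags)), "\n";   # Site, Date, Round, Result
```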
- event()
-
returns the Event tag
- site()
-
returns the Site tag
- date()
-
returns the Date tag
- white()
-
returns the White tag
- black()
-
returns the Black tag
- result()
-
returns the Result tag
- round()
-
returns the Round tag
- game()
-
returns the unparsed game moves
- time()
-
returns the Time tag
- eco()
-
returns the ECO tag
- eventdate()
-
returns the EventDate tag
- moves()
-
returns an array reference to the game moves (no numbers)
- comments()
-
returns a hash reference to the game comments (the key is the move number and the values are the comments for that move)
- errors()
-
returns a hash reference to the game errors (the key is the move number and the values are the errors for that move)
- set_event()
-
returns or modifies the Event tag
- set_site()
-
returns or modifies the Site tag
- set_date()
-
returns or modifies the Date tag
- set_white()
-
returns or modifies the White tag
- set_black()
-
returns or modifies the Black tag
- set_result()
-
returns or modifies the Result tag
- set_round()
-
returns or modifies the Round tag
- set_game()
-
returns or modifies the unparsed game moves
- set_time()
-
returns or modifies the Time tag
- set_eco()
-
returns or modifies the ECO tag
- set_eventdate()
-
returns or modifies the EventDate tag
- set_moves()
-
returns or modifies an array reference to the game moves (no numbers)
- tags()
-
returns a hash reference to all the parsed tags
$hash_ref = $pgn->tags();
- read_all()
-
Will read and parse all the games in the current file and return a reference to an array of hashes. Each hash item contains both the raw data and the parsed moves and comments.
Accepts the same parameters as parse_game(). Default: discard comments.
my $games_ref = $pgn->read_all();
- quick_read_all()
-
Will read and quick parse all the games in the current file and return a reference to an array of hashes. Each hash item contains both the raw data and the parsed moves. Comments are discarded. Accepts the same parameters as quick_parse_game().
my $games_ref = $pgn->quick_read_all();
- smart_read_all()
-
Will read and smart parse all the games in the current file and return a reference to an array of hashes. Each hash item contains both the raw data and the parsed moves. Comments are discarded. Calls smart_parse_game() to decide which method is best to parse each given game.
my $games_ref = $pgn->smart_read_all();
- read_game()
-
reads the next game from the given PGN file. Returns TRUE (1) if successful (= a game was read) or FALSE (0) if no more games are available or an unexpected EOF occurred before the end of parsing
while ($pgn->read_game()) { do_something_smart; }
It can read standard and, in some cases, even non-standard PGN games. The following deviations from the standard are handled:
1. no blank line between tags and moves;
2. no blank line between games;
3. blank line(s) before a game (start of file);
4. multiple tags in the same line;
5. tags spanning over more lines (cannot be combined with rule 4);
6. no tags (only moves) (cannot be combined with rule 2);
7. comments (starting with ";") outside the game text.
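Deviation 2 (no blank line between games) can be handled by treating a tag line that follows some move text as the start of a new game. The sketch below is a hypothetical illustration, not read_game() itself; it also shows why rule 6 cannot be combined with rule 2, since without tags nothing marks where the next game begins.

```perl
use strict;
use warnings;

# Hypothetical splitter: a tag line seen after move text starts a new game.
# Blank lines are dropped; each game is returned as one string.
sub split_games {
    my ($text) = @_;
    my (@games, @current);
    my $seen_moves = 0;
    for my $line (split /\n/, $text) {
        next unless $line =~ /\S/;
        if ($line =~ /^\s*\[/ && $seen_moves) {
            push @games, join("\n", @current);
            @current    = ();
            $seen_moves = 0;
        }
        $seen_moves = 1 if $line !~ /^\s*\[/;
        push @current, $line;
    }
    push @games, join("\n", @current) if @current;
    return @games;
}

my @games = split_games(qq{[White "A"]\n1. e4 e5 1-0\n[White "B"]\n1. d4 d5 0-1\n});
print scalar(@games), "\n";   # 2
```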
- read_standard_game()
-
reads the next game from the given PGN file. Returns TRUE (1) if successful (= a game was read) or FALSE (0) if no more games are available or an unexpected EOF occurred before the end of parsing
while ($pgn->read_standard_game()) { do_something_smart; }
This method deals only with well formed PGN games. Use the more forgiving read_game() for PGN files that don't fully respect the PGN standard.
_get_tags() returns a list of tags depending on the parameters.
_get_format() returns a format to be used when printing tags.
_get_formatted_tag() returns a tag formatted according to the given template.
- standard_PGN()
-
returns a string containing all current PGN tags, including the game. Parameters are passed through a hash reference. None is required.
tags     => [tag list],  # default is the Seven Tags Roster.
                         # You may specify only the tags you want to print:
                         # tags => [qw(White Black Result)]
all_tags => 'no',        # default 'no'. If 'yes' (or 1), outputs all the tags.
                         # If both 'tags' and 'all_tags' are used, 'all_tags'
                         # prevails.
nl       => "\n",        # default "\n". Tag separator. Can be changed
                         # according to your needs.
                         # nl => "<br>\n" is a good candidate for HTML output.
brackets => '[]',        # default '[]'. Output tags within brackets.
                         # Bracketing can be as creative as you want.
                         # If the left and right bracketing sequences are
                         # longer than one character, they must be separated
                         # by a pipe (|) symbol.
                         # '()', "(|)\t", "{|}\n" and '{}' are valid sequences.
                         # '<h1>|</h1>' will output HTML header 1;
                         # "<b>{</b>|<b>}</b>\n" will enclose each tag
                         # between bold braces.
quotes   => '"',         # default '"'. Quote tag values.
                         # As for brackets, quotes can be specified in
                         # pairs: '<>' and '<|>' are equivalent.
                         # If the quoting sequence is more than one char,
                         # the pipe symbol is needed to separate the left
                         # quote from the right one.
                         # '<i>|</i>' will produce HTML italicized text.
game     => 'yes',       # default 'yes'. Output the game text.
                         # If the game was parsed, outputs a clean list
                         # of moves, else the unparsed text.
comments => 'no'         # default 'no'. Output the game comments.
                         # Requires the 'game' option.
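The bracket and quote rules above can be sketched as follows. split_pair() and format_tag() are hypothetical helpers; treating a single character as both the left and the right delimiter is an assumption based on the default quotes => '"'.

```perl
use strict;
use warnings;

# Hypothetical sketch of the delimiter rules: "a|b" splits on the first
# pipe, a two-character string gives one character per side, and (an
# assumption) a single character is used on both sides.
sub split_pair {
    my ($spec) = @_;
    return split /\|/, $spec, 2 if index($spec, '|') >= 0;
    return split //, $spec      if length($spec) == 2;
    return ($spec, $spec);
}

# Build one tag line from a key, a value and the two delimiter specs.
sub format_tag {
    my ($key, $value, $brackets, $quotes) = @_;
    my ($lb, $rb) = split_pair($brackets);
    my ($lq, $rq) = split_pair($quotes);
    return "$lb$key $lq$value$rq$rb";
}

print format_tag('White', 'Kasparov', '[]', '"'), "\n";                 # [White "Kasparov"]
print format_tag('White', 'Kasparov', '<h1>|</h1>', '<i>|</i>'), "\n";  # <h1>White <i>Kasparov</i></h1>
```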
- smart_parse_game()
-
Parses the current game, returning the moves only. Uses by default quick_parse_game(), unless recursive comments are found in the source game.
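One plausible form of the preliminary check is sketched below; needs_full_parse() is hypothetical and the module's actual test may differ. Braced comments are removed first, since parentheses inside them are plain text, and any remaining "(" is taken to mean that the game contains RAVs.

```perl
use strict;
use warnings;

# Hypothetical decision rule: strip braced comments, then treat any
# remaining "(" as a RAV that requires the full parser.
sub needs_full_parse {
    my ($text) = @_;
    (my $stripped = $text) =~ s/\{[^}]*\}//g;
    return $stripped =~ /\(/ ? 1 : 0;
}

print needs_full_parse('1. e4 {a move (good)} e5'), "\n";   # 0
print needs_full_parse('1. e4 (1. d4 d5) e5'), "\n";        # 1
```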
- quick_parse_game()
-
Parses the current game, returning the moves only. Comments are discarded. This method fails on Recursive Annotated Variations and nested comments. Parameters (passed as a hash reference): check_moves => 'yes'|'no'. Default: 'no'. If requested, each move is checked against a regular expression, to filter out possible unbraced comments.
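A move-filtering regular expression of the kind check_moves could use is sketched below. It is a plausible Standard Algebraic Notation pattern, not necessarily the one the module uses: castling, or an optional piece letter, optional disambiguation, optional capture, a target square and optional promotion, followed by optional check/mate and annotation symbols.

```perl
use strict;
use warnings;

# Hypothetical SAN pattern (the module's own regex may differ).
my $SAN = qr/^(?:O-O(?:-O)?|[KQRBN]?[a-h]?[1-8]?x?[a-h][1-8](?:=[QRBN])?)[+#]?[!?]*$/;

my @tokens = ('e4', 'Nxf7+', 'exd8=Q#', 'O-O-O', 'Rad1', 'Qh4!?', 'good', 'White');
my @moves  = grep { $_ =~ $SAN } @tokens;   # barewords are filtered out
print "@moves\n";   # e4 Nxf7+ exd8=Q# O-O-O Rad1 Qh4!?
```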
- parse_game()
-
Parses the current game (after read_game() was called). Accepts parameters as hash reference.
$pgn->parse_game(); # default save_comments => 'no'
$pgn->parse_game({ save_comments => 'yes', comments_struct => 'string'});
{comments_struct => 'string'} is the default value. When 'comments_struct' is 'string', multiple comments for the same move are concatenated into one string.
{comments_struct => 'array'} If 'array', comments are stored as an anonymous array, one comment per element
{comments_struct => 'hol'} If 'hol', comments are stored as a hash of lists, where there is a list of comments for each comment type (NAG, RAV, braced, semicolon, escaped)
$pgn->parse_game({save_comments => 'yes', log_errors => 'yes'});
parse_game() implements a finite state machine on two assumptions:
1. No moves or move numbers are truncated at the end of a line;
2. the possible states in a PGN game are:
  a. move number
  b. move
  c. braced comment
  d. EOL comment
  e. Numeric Annotation Glyph
  f. Recursive Annotated Variation
  g. Result
  h. unbraced comments (barewords, "!?+-=")
Items from "a" to "g" are actively parsed and recognized. Anything unrecognized goes into the "h" state and is discarded (or stored, if log_errors was requested).
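The state machine can be approximated by a tokenizer that tries one pattern per state at the current position. This is a hypothetical sketch, not the module's code; it uses a recursive pattern (Perl 5.10 or later) so that nested variations come back as one token, and it does not handle a ")" inside a braced comment within a variation.

```perl
use strict;
use warnings;

# Hypothetical tokenizer mirroring states a-h; not the module's own code.
my $MOVE = qr/(?:O-O(?:-O)?|[KQRBN]?[a-h]?[1-8]?x?[a-h][1-8](?:=[QRBN])?)[+#]?[!?]*/;

sub tokenize {
    my ($text) = @_;
    my @tokens;
    pos($text) = 0;
    while (1) {
        $text =~ /\G\s+/gc;                          # skip whitespace
        last if pos($text) >= length $text;
        if    ($text =~ m{\G(1-0|0-1|1/2-1/2|\*)}gc)   { push @tokens, [result  => $1] }  # g
        elsif ($text =~ /\G(\d+\.(?:\.\.)?)/gc)        { push @tokens, [number  => $1] }  # a
        elsif ($text =~ /\G(\{[^}]*\})/gc)             { push @tokens, [brace   => $1] }  # c
        elsif ($text =~ /\G(;[^\n]*)/gc)               { push @tokens, [eol     => $1] }  # d
        elsif ($text =~ /\G(\$\d+)/gc)                 { push @tokens, [nag     => $1] }  # e
        elsif ($text =~ /\G(\((?:[^()]++|(?1))*\))/gc) { push @tokens, [rav     => $1] }  # f
        elsif ($text =~ /\G($MOVE)(?![^\s(){};\$])/gc) { push @tokens, [move    => $1] }  # b
        else {                                                                            # h
            $text =~ /\G([^\s(){};]+|\S)/gc;           # always consumes at least one char
            push @tokens, [unknown => $1];
        }
    }
    return @tokens;
}

my @tokens = tokenize('1. e4 {king pawn} e5 $1 2. Nf3 (2. f4 (2. d4) exf4) whoops 1-0');
printf "%-7s %s\n", @$_ for @tokens;
```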
- add_comments()
-
Allows inserting comments for an already parsed game; it accepts comments passed as an anonymous hash. An optional second parameter sets the storage type, with the same values as for parse_game():
'string' (default): all comments for a given move are concatenated together;
'array': each comment for a given move is stored as an array element;
'hol': comments are stored in a hash of lists, one list for each comment type.
- shrink_epd()
-
Given an EPD (Extended Position Description) string, shrink_epd() converts it into a bit string, which reduces the original by about 50%. The original string can be restored with expand_epd().
- expand_epd()
-
Given an EPD bit string created by shrink_epd(), expand_epd() restores the original text.
AUTHOR
Giuseppe Maxia, gmax@cpan.org
THANKS
Thanks to
- Hugh S. Myers for advice, support, testing and brainstorming;
- Damian Conway for the recursive Regular Expressions used to parse comments;
- all people at PerlMonks (www.perlmonks.org) for advice and a good developing environment;
- Nathan Neff for pointing out an insidious, hard-to-spot bug in my RegExes.
COPYRIGHT
The Chess::PGN::Parse module is Copyright (c) 2002 Giuseppe Maxia, Sardinia, Italy. All rights reserved.
You may distribute this software under the terms of either the GNU General Public License version 2 or the Artistic License, as specified in the Perl README file. The embedded and enclosed documentation is released under the GNU FDL Free Documentation License 1.1