NAME
DiaColloDB::Document::JSON - diachronic collocation db, source document, raw JSON
SYNOPSIS
##========================================================================
## PRELIMINARIES
use DiaColloDB::Document::JSON;
##========================================================================
## Constructors etc.
$doc = CLASS_OR_OBJECT->new(%args);
##========================================================================
## API: I/O: parse
$bool = $doc->fromFile($filename_or_fh, %opts);
DESCRIPTION
DiaColloDB::Document::JSON provides a DiaColloDB::Document-compliant API for parsing corpus files in raw JSON format, assuming the stored data maps 1:1 onto the DiaColloDB::Document structure.
Globals & Constants
- Variable: @ISA
DiaColloDB::Document::JSON inherits from DiaColloDB::Document and supports the DiaColloDB::Document API.
Constructors etc.
- new
$doc = CLASS_OR_OBJECT->new(%args);
%args, object structure:
##-- document data
date   =>$date,     ##-- year
tokens =>\@tokens,  ##-- tokens, including undef for EOS
meta   =>\%meta,    ##-- document metadata (e.g. author, title, collection, ...)
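Each token in @tokens is a HASH-ref {w=>$word,p=>$pos,l=>$lemma,...}, or undef for EOS. As the EXAMPLE section shows, the array may also contain plain strings such as "#s", "#p", and "#file"; these appear to mark structural breaks (sentence, paragraph, file).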
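The object structure described above can be illustrated with plain Perl data. The following is a minimal sketch using only core Perl: the document contents are made up, and sentences() is a hypothetical helper that is not part of DiaColloDB. It walks @tokens, collects the {w} field of each HASH-ref token, and closes a sentence at each undef.

```perl
use strict;
use warnings;

## Hypothetical in-memory document, following the %args layout described above.
my %args = (
  date   => 2016,
  tokens => [
    '#s',
    { w => 'This', p => 'DT',   l => 'this' },
    { w => 'is',   p => 'VBZ',  l => 'be'   },
    { w => '.',    p => 'SENT', l => '.'    },
    undef,   ##-- EOS
  ],
  meta   => { author => 'Jurish, Bryan', title => 'test document' },
);

## sentences(\@tokens): hypothetical helper, not part of DiaColloDB.
## Groups the {w} values of HASH-ref tokens into sentences, closing one at each undef.
sub sentences {
  my ($tokens) = @_;
  my (@sents, @cur);
  foreach my $tok (@$tokens) {
    if (!defined $tok) {            ##-- undef: EOS
      push @sents, [@cur] if @cur;
      @cur = ();
    }
    elsif (ref($tok) eq 'HASH') {   ##-- ordinary token
      push @cur, $tok->{w};
    }
    ##-- anything else (e.g. a plain string) is skipped in this sketch
  }
  push @sents, [@cur] if @cur;      ##-- flush a trailing sentence without EOS
  return @sents;
}

print join(' ', @$_), "\n" foreach sentences($args{tokens});
```

The same %args could presumably be passed to new(), since the constructor is documented as accepting this structure.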
API: I/O: parse
- fromFile
$bool = $doc->fromFile($filename_or_fh, %opts);
Parses tokens from $filename_or_fh, which may be a filename or an open filehandle. Any %opts given clobber (overwrite) the corresponding keys of %$doc.
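A typical call to fromFile() might look like the sketch below. It writes a small made-up document to a temporary file, then loads it through the documented API if DiaColloDB::Document::JSON is installed. The fallback branch is an assumption for illustration only: it decodes the file directly with the core JSON::PP module, relying on the 1:1 mapping stated in the DESCRIPTION. Accessing $doc->{date} and $doc->{tokens} directly assumes the object is a HASH-ref laid out as described under new().

```perl
use strict;
use warnings;
use JSON::PP qw(decode_json encode_json);
use File::Temp qw(tempfile);

## Write a tiny document to a temporary file (contents are made up for illustration).
my ($fh, $filename) = tempfile(SUFFIX => '.json', UNLINK => 1);
print {$fh} encode_json({
  date   => '2016',
  meta   => { title => 'test document' },
  tokens => [ { w => 'test', p => 'NN', l => 'test' }, undef ],
});
close($fh) or die "close failed for $filename: $!";

my $doc;
if (eval { require DiaColloDB::Document::JSON; 1 }) {
  ##-- module available: use the documented API
  $doc = DiaColloDB::Document::JSON->new();
  $doc->fromFile($filename)
    or die "fromFile() failed for $filename";
} else {
  ##-- fallback (assumption): decode the raw JSON directly
  open(my $in, '<:raw', $filename) or die "open failed for $filename: $!";
  my $json = do { local $/; <$in> };   ##-- slurp the whole file
  close($in);
  $doc = decode_json($json);
}

printf "date=%s, tokens=%d\n", $doc->{date}, scalar @{$doc->{tokens}};
```

Checking the return value matters here because fromFile() is documented as returning a boolean rather than the parsed document.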
EXAMPLE
The following is an example file in the format accepted by this module:
{
   "date" : "2016",
   "meta" : {
      "author" : "Jurish, Bryan",
      "collection" : "tiny",
      "date_" : "2016-02-25",
      "genre" : "dummy",
      "textClass" : "dummy:test-data",
      "title" : "test document"
   },
   "tokens" : [
      "#s",
      "#p",
      "#file",
      {
         "l" : "this",
         "p" : "DT",
         "w" : "This"
      },
      {
         "l" : "be",
         "p" : "VBZ",
         "w" : "is"
      },
      {
         "l" : "a",
         "p" : "DT",
         "w" : "a"
      },
      {
         "l" : "test",
         "p" : "NN",
         "w" : "test"
      },
      {
         "l" : ".",
         "p" : "SENT",
         "w" : "."
      },
      null,
      "#s",
      {
         "l" : "this",
         "p" : "DT",
         "w" : "This"
      },
      {
         "l" : "be",
         "p" : "VBZ",
         "w" : "is"
      },
      {
         "l" : "only",
         "p" : "RB",
         "w" : "only"
      },
      {
         "l" : "a",
         "p" : "DT",
         "w" : "a"
      },
      {
         "l" : "test",
         "p" : "NN",
         "w" : "test"
      },
      {
         "l" : ".",
         "p" : "SENT",
         "w" : "."
      },
      null,
      "#s",
      "#p",
      {
         "l" : "this",
         "p" : "DT",
         "w" : "This"
      },
      {
         "l" : "be",
         "p" : "VBZ",
         "w" : "is"
      },
      {
         "l" : "still",
         "p" : "RB",
         "w" : "still"
      },
      {
         "l" : "a",
         "p" : "DT",
         "w" : "a"
      },
      {
         "l" : "test",
         "p" : "NN",
         "w" : "test"
      },
      {
         "l" : ".",
         "p" : "SENT",
         "w" : "."
      },
      null
   ]
}
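Before passing a file in this format to the module, it can be useful to check that its overall shape matches the example. The following sketch uses only the core JSON::PP module; check_document() is a hypothetical helper, not part of DiaColloDB, and the rules it applies (a defined "date", an object-valued "meta", and an array of "tokens" whose elements are null, strings, or objects) are assumptions inferred from the example above rather than a specification enforced by the module.

```perl
use strict;
use warnings;
use JSON::PP qw(decode_json);

## check_document($data): hypothetical helper, not part of DiaColloDB.
## Returns a list of problems; an empty list means the structure looks plausible.
sub check_document {
  my ($data) = @_;
  return ('top-level value is not an object') if ref($data) ne 'HASH';
  my @problems;
  push @problems, 'missing "date"'          if !defined $data->{date};
  push @problems, '"meta" is not an object' if ref($data->{meta}) ne 'HASH';
  if (ref($data->{tokens}) ne 'ARRAY') {
    push @problems, '"tokens" is not an array';
    return @problems;
  }
  my $i = 0;
  foreach my $tok (@{$data->{tokens}}) {
    my $type = ref($tok);
    ##-- accept null (EOS), plain strings, and objects
    push @problems, "tokens[$i] has unexpected type $type"
      if defined($tok) && $type ne '' && $type ne 'HASH';
    ++$i;
  }
  return @problems;
}

## Abbreviated, made-up input in the same shape as the example above.
my $json = '{"date":"2016","meta":{"title":"test document"},'
         . '"tokens":["#s",{"l":"test","p":"NN","w":"test"},null]}';
my @problems = check_document(decode_json($json));
print @problems ? join("\n", @problems)."\n" : "ok\n";
```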
AUTHOR
Bryan Jurish <moocow@cpan.org>
COPYRIGHT AND LICENSE
Copyright (C) 2016-2020 by Bryan Jurish
This package is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.14.2 or, at your option, any later version of Perl 5 you may have available.