NAME
Finance::Shares::Model - Apply tests to series of stock quotes
SYNOPSIS
use Finance::Shares::Model;
my $fsm = Finance::Shares::Model->new( @spec );
$fsm->build();
DESCRIPTION
One or more graphs are built from a specification in the form of a list of hash/array references. Apart from a few configuration entries documented under CONSTRUCTOR, the specification deals with ten types of resource:
- sources
-
Declare where the price and volume data comes from.
- charts
-
These determine the charts' features, and can have a large number of sometimes deeply nested options.
- files
-
Control how the charts are output.
- stocks
-
Lists or individual stock abbreviations.
- dates
-
These entries specify the time period considered and how frequently graph entries occur.
- samples
-
The heart of the model, these determine which stocks, dates, charts, lines etc. are to be used to produce one or more chart pages.
- groups
-
Named groups of sample settings.
- names
-
It is possible to use short aliases for function names, for example.
- lines
-
These control how the data is processed and how the various functions are used.
- tests
-
Segments of perl code invoking a variety of actions depending on programmed conditions.
To fetch any data, there must be a source and a stock code specified. dates, files and charts have suitable defaults, and groups and names are optional. Nothing much happens without specifying at least one line or test and, of course, a sample entry to bring them all together.
The examples and tests tend to show the resources in the same order because they are easier to find that way. But this is not a requirement - any order will do.
The fsmodel Script
The normal way of using this module would be via the script fsmodel which has its own man page. Although fsmodel can fetch data and draw charts without one, non-trivial usage requires a model specification - a perl file ending with a list of resource definitions.
See "Using the fsmodel Script" in Finance::Shares::Overview for how to set it up and some examples to try. Also see the manpage fsmodel for further details.
Specification Format
A model is specified as a list of these resource definitions. If nothing is given to the constructor, a blank graph called default.ps is output. But that isn't much use.
A minimal model might specify a source, with a stock and a date group. This would then just display the price and volume data.
Example
my @spec = (
filename => 'hpq2',
source => 'hpq.csv',
stock => 'HPQ',
date => {
start => '2003-04-01',
end => '2003-06-30',
by => 'weekdays'
},
);
The file hpq.csv would hold daily quotes for Hewlett Packard from April 1st to June 30th 2003 in CSV format. The price and volume data will appear as two graphs on the same page, saved as hpq2.ps.
You will notice that most examples here terminate the hash/array definition with a comma (,) rather than a semi-colon (;). This is because they are part of a specification list. The fsmodel script expects a perl file with such a list as the last item. It might be pages long, but it's still one list.
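As a rough sketch of how such a list reaches the module from an ordinary script rather than through fsmodel, the minimal specification above could be wrapped as follows. It uses only the constructor and build() call shown in the SYNOPSIS, and assumes that hpq.csv exists in the current directory.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Finance::Shares::Model;

# The same minimal specification: one source, one stock, one date range.
my @spec = (
    filename => 'hpq2',
    source   => 'hpq.csv',      # assumed to exist in the current directory
    stock    => 'HPQ',
    date     => {
        start => '2003-04-01',
        end   => '2003-06-30',
        by    => 'weekdays',
    },
);

# Inside a script the list ends with a semi-colon; in a model file read by
# fsmodel the bare list would simply be the last item in the file.
my $fsm = Finance::Shares::Model->new(@spec);
$fsm->build();
```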
Keys and Tags
Typically, the entries are in either an array or a hash ref. The array ref is most common as it may hold any number of entries. A specification may have any number of blocks with the same key. They are merged together with only the last tag entry counting if there are duplicates.
Example
files => [
small => {
width => 400,
height => 500,
},
tall => {
landscape => 0,
},
letter => {
paper => 'US-Letter',
},
],
Here three different PostScript file formats are specified.
Outer, system-defined resource identifiers (like files, here) will be referred to as keys, while the inner, user-chosen identifiers (small, tall, letter) are tags.
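The merging rule can be illustrated with two files blocks in the same specification. This is a sketch only: the tags are arbitrary and the option values are made up for the illustration.

```perl
files => [
    small => { width => 400, height => 500 },
    tall  => { landscape => 0 },
],
# ... other resources ...
files => [
    # Same key again, so this block is merged with the one above.
    # The duplicate tag 'small' means only this later entry counts.
    small  => { width => 300, height => 400 },
    letter => { paper => 'US-Letter' },
],
```

Going by the rule above, the model would end up with three file entries (small, tall and letter), with small taking its settings from the second block.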
Singular and Plural
Generally, the keys identifying the resources can be either singular or plural (e.g. file or files). Both forms may be used interchangeably.
Tags, on the other hand, must always be used exactly as defined. All identifiers are case sensitive.
[line and test tags may become perl variables and so should be of that form. file and some chart tags may be in quotes and contain spaces, as they are file names and graph titles. Anomalies like this are noted as they arise.]
Default Entries
The other, hash ref, format specifies a single entry which normally becomes the default. That is, it is used if a needed resource has not been specified. Unlike the array ref form, there can only be one default, so only the last is used if several hash refs for the same key are given.
Example
file => {
landscape => 1,
},
This declares a default entry which will be used if the sample hasn't specified a file resource. No tag is specified, but the module assigns the tag default.
Single default entries like this are purely a convenience. If they are not present, the first entry in the array format is used as the default instead.
There are a couple of cases where the tag name has a special meaning.
- defaults
-
If one of the array entries is given the tag default, that is used instead of the first entry.
- file names
-
For the files resource, tags specify the file name to use. The default file name can be specified separately using the filename option. (See CONSTRUCTOR.)
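The two special cases might be combined as in the fragment below. It is a sketch under assumptions: the tag default is taken to mark the default entry even though it is not listed first, and the filename option is taken to name the output of that entry. The name overview is hypothetical.

```perl
filename => 'overview',     # hypothetical; '.ps' is appended to this
files => [
    small   => { width => 400, height => 500 },
    default => { landscape => 1 },   # used when a sample names no file
],
```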
Page Names
A page refers to the chart, where all the graphs for a particular data set are shown on the same page.
Mostly, functions use lines and data from their own chart page and return their own results there too. However, one of the motivations for this rewrite was to allow functions to use data from other charts, something which was impossible in the previous version.
Each page is created when a sample specifies a stock over a given date range. This combination uniquely identifies the quotes to be graphed and worked on. The page name is made from these tags separated by forward slashes:
<sample_tag> / <stock_tag> / <dates_tag>
e.g.
shops/MKS.L/july
The output PostScript file may contain multiple pages. There are two ways to generate them.
The most straightforward is to use multiple sample entries, each specifying its own date and stock. One advantage here is that pages can be individually named.
dates => [
by_days => {
start => '2003-03-01',
end => '2003-05-31',
by => 'weekdays',
},
by_weeks => {
start => '2003-06-01',
end => '2003-08-31',
by => 'weeks',
},
],
samples => [
one => {
stock => 'AZN.L',
dates => 'by_weeks',
page => 'astra',
},
two => {
stock => 'GSK.L',
dates => 'by_weeks',
page => 'glaxo',
},
],
The names of the two pages would be
one/AZN.L/by_weeks
two/GSK.L/by_weeks
Alternatively multiple stocks/dates can be specified from a single sample. This is more powerful, and is the mode used by fsmodel.
# dates as above
sample => {
stocks => [qw(AZN.L GSK.L SHP.L)],
dates => ['by_days', 'by_weeks'],
},
The names of the six pages (3 stocks x 2 dates) would be
default/AZN.L/by_days
default/AZN.L/by_weeks
default/GSK.L/by_days
default/GSK.L/by_weeks
default/SHP.L/by_days
default/SHP.L/by_weeks
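The naming rule can be mimicked in a few lines of plain perl. This is only an illustration of how the names are composed, not the module's own code, and page_names is a hypothetical helper.

```perl
use strict;
use warnings;

# Build <sample_tag>/<stock_tag>/<dates_tag> for every stock/dates pair.
sub page_names {
    my ($sample, $stocks, $dates) = @_;
    my @names;
    for my $stock (@$stocks) {
        for my $date (@$dates) {
            push @names, join '/', $sample, $stock, $date;
        }
    }
    return @names;
}

# An untagged sample gets the tag 'default'.
my @pages = page_names('default', [qw(AZN.L GSK.L SHP.L)], ['by_days', 'by_weeks']);
print "$_\n" for @pages;
```

Run on its own, this prints the six names listed above in the same order.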
So how can these data sets be used?
Fully Qualified Line Names
Function names are appended to page names, with optional line identifiers (if the function produces several lines). These would be fully qualified line names (FQLN):
one/AZN.L/by_weeks/avg
one/AZN.L/by_weeks/bollinger/high
default/AZN.L/by_days/data/close
default/AZN.L/by_days/data/volume
line entries for functions (like moving_average) usually have a line field inside the specification which indicates the source data to use in their calculations. avg above would be the tag for such a line specification.
Normally the line is referred to simply by its tag. But when referring to a line on a different page, the fully qualified name is needed.
So it would be possible to indicate AstraZeneca's volume on GlaxoSmithKline's chart in the following example.
lines => [
astra => {
function => 'moving_average',
period => 3,
gtype => 'volume',
line => 'astra///data/volume',
},
],
samples => [
astra => {
stock => 'AZN.L',
},
glaxo => {
stock => 'GSK.L',
line => 'astra',
},
],
A few things are worth noting.
- Tags are typed
-
The tag 'astra' can be used for both lines and samples without confusion.
- Using defaults in FQLNs
-
The fully qualified line name is not written as
astra/AZN.L/default/data/volume
(although it could have been). Provided there is no ambiguity, any of the entries in a page/line name may be omitted. Here 'astra' (the sample) has only one stock entry and uses the default date option, so they may be left blank.
- Compatible graph types
-
The line used has volume-sized numbers but, like most functions, moving_average works with prices by default. If it were used directly, the glaxo price graph would either be scaled to include the foreign line or the new line would be scaled to fit the host chart. Neither is very useful, so it is a good idea to include at least a gtype (or graph) entry in every line specification.
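For comparison, the same line entry with nothing omitted from the name might look like this. It is a sketch which assumes that the untagged dates entry is known as default, as the note on defaults suggests.

```perl
lines => [
    astra => {
        function => 'moving_average',
        period   => 3,
        gtype    => 'volume',
        # sample / stock / dates / function / line, all written out
        line     => 'astra/AZN.L/default/data/volume',
    },
],
```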
Wild Cards
Some functions use more than one source line. These are specified within an array.
lines => [
comp1 => {
function => 'compare',
lines => [qw( /AZN.L//data/close /GKN.L//data/close )],
},
],
sample => {
stocks => [qw(AZN.L GKN.L)],
line => 'comp1',
},
To avoid writing out a lot of similar FQLNs the sample, stock and date sections of line/page names may be regular expressions. Also a single '*' in a field stands for 'all but this one'. Some examples:
- '/.+N.L/.*/average'
-
This would match the 'average' line on a number of pages:
default/AZN.L/by_days/average
default/AZN.L/by_weeks/average
default/GKN.L/by_days/average
default/GKN.L/by_weeks/average
- '*//'
-
Matches every sample except the one being evaluated. Useful on a summary sheet which doesn't have the relevant lines anyway. The empty stock and date fields assume that the 'other' samples all have these set.
- '.*/.*/*'
-
Every sample, every stock and every other date set.
[NB: This facility is not at all solid. If you can make it work, fine. If it fails, write out all the FQLNs by hand. Look at the tests for patterns that work.]
Sources
A source entry specifies where the price and volume data comes from. It can be one of the following.
- array
-
The array ref should point to quotes data held in sub-arrays.
- string
-
This should be the name of a CSV file holding quotes data.
- hash
-
A hash ref holding options for creating a new Finance::Shares::MySQL object.
- object
-
A Finance::Shares::MySQL object created elsewhere.
To be of much use there must be at least one source, and the hash ref is probably the most useful. See Finance::Shares::MySQL for full details, but the top level keys that can be used here include:
hostname port
user password
database exchange
start_date end_date
mode tries
Example
sources => [
database => {
user => 'me',
password => 'Gu355',
database => 'mystocks',
},
test => 'test_file.csv',
],
Charts
These are the Finance::Shares::Chart options that control what the graphs look like. Throughout this document a chart refers to a collection of grids (the graphs) that appear on the same PostScript page.
See Finance::Shares::Chart for full details, but these top level sub-hash keys may be used in a chart resource:
bgnd_outline background
heading glyph_ratio
show_breaks smallest
heading_font normal_font
heading_size normal_size
heading_color normal_color
dpi key
x_axis graphs
[NB: Chart keys sample, file and page are ignored as they are filled internally.]
graphs is a special array listing descriptions for the graph grids that will appear on the page. Each graph sub-hash may contain the following keys, although 'points' is only for prices and 'bars' only for volumes. Generally, gtype and percent are essential. gtype must be one of price, volume, analysis or level.
gtype percent
points bars
y_axis layout
show_dates
It is probably a good idea to pre-define repeated elements (e.g. colours, styles) using perl variables as values in the hash or sub-hashes.
Example
my $bgnd = [1, 1, 0.95];
...
chart => {
dpi => 72,
background => $bgnd,
show_breaks => 1,
key => {
background => $bgnd,
},
graphs => [
'Quotes' => {
percent => 60,
gtype => 'price',
points => {
color => [0.7, 0, 0.3],
},
},
'Trading Volume' => {
percent => 40,
gtype => 'volume',
bars => {
color => [0.3, 0, 0.7],
},
},
],
},
Files
The output is a collection of charts saved to a PostScript file. Each hash ref holds options for creating the PostScript::File object used. Once it is created, any charts using it will be added on a new page.
If the array form is used, it contains one or more named hash refs. The tags become the name of the file, with '.ps' appended.
See PostScript::File for full details but these are the more useful sub-hash keys:
paper eps
height width
bottom top
left right
clip_command clipping
dir reencode
landscape headings
png gs
Example
files => [
'Food-Retailers' => {
dir => '~/models',
paper => 'A5',
},
],
samples => [
sample1 => {
stock => 'TSCO.L',
file => 'Food-Retailers',
...
},
],
Here the Tesco sample will appear as a page in the file ~/models/Food-Retailers.ps. In this case, the file sample entry is the default (the first in files) and so could have been omitted.
Where more than one sample specifies the same file, they appear on different pages within it, in the order declared.
Stocks
These are EPIC codes identifying public companies' share quotes. The codes are those used by Yahoo! at http://finance.yahoo.com. Each tag may refer to either a single code or a list enclosed within square brackets.
Example
stocks => [
lloyds => 'LLOY.L',
banks => [qw(AL.L BARC.L BB.L HBOS.L)],
],
Dates
Each series of quotes must have at least a start and end date. These entries specify how the time axis is set up for each graph and which quotes are used for the function calculations.
Example
dates => {
start => '2003-01-01',
end => '2003-03-31',
by => 'weeks',
},
The entries are hash refs with the following fields.
- start
-
In YYYY-MM-DD format, this should be the first date we are interested in (but see before). (Defaults to 60 periods before end.)
- end
-
In YYYY-MM-DD format, this should be the last quote date (see after). (Defaults to today's date.)
- by
-
This specifies how the intervening dates are counted. Suitable values are quotes, weekdays, days, weeks and months. (Default: 'weekdays')
quotes is the only choice which is guaranteed to have no undefined data. But then it has no relationship to time. weekdays is probably the closest, except that it becomes obvious when data is missing.
The rest use a proper time axis, with weeks and months using averaged data. If the dates axis is getting too crowded, these are a good way to cover a long period without using too many data points.
- before
-
Many functions require a number of data items before their results are valid. By default, this lead time is not displayed. However, the before field allows the user to override that value. For example, 0 will specify no lead time - so all the data is displayed and the lines begin when they can.
It is worth noting that the before value is calculated from the line specifications, before the quotes are fetched. Any missing data will have to be made up from the graphed dates.
- after
-
It is possible to specify a number of unused dates after the end of the data. This might be used for extrapolating function lines into the immediate future.
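A dates entry using all five fields might look like the fragment below. It is only a sketch: the tag long_term and the values are made up, and after is assumed to be a simple count of unused dates, as described above.

```perl
dates => [
    long_term => {
        start  => '2002-01-01',
        end    => '2003-06-30',
        by     => 'months',     # averaged data, fewer points on the axis
        before => 0,            # no hidden lead time; lines begin when they can
        after  => 3,            # three unused dates after the last quote
    },
],
```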
Samples
Pages are generated from sample entries. Defaults are generated for all essential features, making a valid (though not always useful) model. A sample normally has one entry for each resource type; the entries being tags found in the relevant resource blocks.
Example
samples => [
one => {
source => 'database',
chart => 'blue_graphs',
file => 'letter_format',
stock => ['banks', 'financial'],
dates => 'default',
lines => ['slow_avg', '10day_lows'],
tests => 'oversold',
group => '<none>',
page => 'bank1',
filename => 'banks',
},
],
There are a few things to note about this example.
-
All of these values ('database', 'blue_graphs') are meaningless in themselves. They should be tags in the appropriate resource block. So the stocks resource might include
stocks => [
banks => ['LLOY.L', 'HBOS.L', ... ],
financial => ['AV.L', 'PRU.L', ... ],
...
],
- Singular and plural
-
For system-defined keys (source, chart, stock etc.) a trailing 's' is optional. Sometimes it is more natural to use the singular and sometimes the plural, but they are interchangeable.
- Lists
-
Many sample entries can take a list of tags within square brackets as well as a single scalar. E.g.
stock => 'STAN.L',
stocks => ['STAN.L', 'HSBA.L'],
This makes no sense for source, chart and file as there can be only one of each. However each chart can have many lines and tests. stocks and dates may also take multiple entries, but these are used to generate a number of separate data sets. See "Page Names".
- Defaults
-
If an entry is omitted the value 'default' is assumed (so the dates entry above was not really necessary). Although defaults are filled predictably, the system is quite complex - especially if multiple entries, default blocks, groups and configuration files are all in use. When in doubt, it is always safer (though less lazy) to be explicit.
lines and tests are an exception. No defaults are used. If you want a line or test, you have to specify it. However, the system will automatically include any dependent lines - you don't have to list them all.
- Other entries
-
These other keys may appear within a sample specification.
- group
-
See Groups for details.
- page
-
A short identifier used to 'number' the PostScript pages. It is intended to be a number, letter, roman numeral etc. but any short word will do. PostScript viewers should use this to identify each sheet.
- filename
-
This is a convenience item allowing the user to specify individual filenames on a 'per sample' basis, leaving the files entry to be more generic. It is overridden by more global settings.
- Sample names
-
The sample tag, one, is never used and can be anything.
As a convenience, stocks and dates may be given directly instead of setting up a resource entry with a tag.
Example
sample => {
stock => 'HBOS.L',
date => {
start => '2003-09-01',
by => 'quotes',
},
},
Groups
It is quite likely that several samples will have many settings in common. Rather than repeating them, it is possible to put them in a group.
Example
sources => [
import => 'source.csv',
dbase => { ... },
],
files => [
summary => { ... },
pages => { ... },
],
charts => [
single => { ... },
quad => { ... },
],
lines => [
one => { ... },
two => { ... },
three => { ... },
four => { ... },
],
groups => [
main => {
file => 'pages',
chart => 'quad',
lines => [qw(one two)],
},
meta => {
file => 'summary',
chart => 'single',
},
],
samples => [
marks => { stock => 'MKS.L', page => 'marks' },
boots => { stock => 'BOOT.L', page => 'boots' },
dixons => { stock => 'DXN.L', page => 'dixons' },
argos => { stock => 'GUS.L', page => 'argos' },
totals => { group => 'meta', line => 'three' },
summary => { group => 'meta', line => 'four' },
],
group provides shorthand for a group of settings, and makes editing easier.
page is a kludge allowing individual pages to have their own page identifier. It actually sets the PostScript::File 'page' label, but is included as a sample key as it typically changes with each sample.
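To see what a group saves, here is the marks sample written out in full. The sketch assumes that, as with other resources, the first group (main) acts as the default for samples which name no group.

```perl
samples => [
    marks => {
        stock => 'MKS.L',
        page  => 'marks',
        file  => 'pages',          # supplied by group 'main'
        chart => 'quad',           # supplied by group 'main'
        lines => [qw(one two)],    # supplied by group 'main'
    },
],
```

With four stock samples sharing these settings, a change of chart or file then needs editing in one place only.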
Names
This provides an 'alias' facility. For example, you might be fed up with typing 'moving_average' a million times. So your config file might include the following, allowing 'mov' to be used instead.
names => [
mov => 'moving_average',
],
Remember that any number of resource blocks may be used as they are merged together.
[This is another facility that hasn't been well tested. But if it doesn't work, you can always copy & paste. So far names are consulted for:
function module names
page names (i.e. aliases for 'sample/stock/date')
]
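A line entry could then use the alias wherever the full function name is expected. The tag avg20 and the period are hypothetical, and the sketch relies on the alias lookup for function names mentioned above.

```perl
names => [
    mov => 'moving_average',
],
lines => [
    avg20 => {
        function => 'mov',     # resolved to 'moving_average' through names
        period   => 20,
    },
],
```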
Lines
This array ref lists all the functions known to the model. Like the other resources, they may or may not be used. However, unlike the others, the sub-hashes are not all the same format as they may control a wide range of objects producing graph lines.
Example
lines => [
grad1 => {
function => 'gradient',
period => 1,
style => $fn_style,
},
grad_env => {
function => 'envelope',
percent => 5,
graph => 'analysis',
line => 'grad1',
style => $fn_style,
},
expo => {
function => 'exponential_average',
},
],
$fn_style would probably be a hash ref holding PostScript::Graph::Style options.
There are three types of lines.
Data lines
These are built-in and never appear in a lines block. They all belong to the function data and the individual line tags are open, high, low, close and volume. Treat them as reserved words.
Dependent lines
Most functions produce lines in this category. They are defined in line blocks and usually show up as a line on a graph (though they may yield more depending on the function). As well as the function field, they also need line and either gtype or graph entries. If these are omitted, they usually default to closing prices (i.e. line = data/close and gtype = price).
Independent lines
These are defined in line blocks in the usual way, but they have no lines depending on them. Important built-in examples are value, which draws horizontal lines, and mark, which uses test code (see below) to fill the data points. More reserved words.
Common Entry Fields
The only requirement is that they must have a key, function, whose value holds the name of the method. However, these keys are also common:
graph gtype
key line
order period
shown style
Example
lines => [
avg => {
function => 'moving_average',
gtype => 'volume',
line => ['volume'],
period => 20,
key => '20 day average of Volume',
order => 99,
style => {
bgnd_color => 0,
line => {
inner_color => [ 1, 0.7, 0.3 ],
outer_color => 0.4,
inner_dashes => [ 12, 3, 6, 3 ],
outer_dashes => [ ],
inner_width => 1.5,
outer_width => 2.5,
},
point => {
color => [ 0, 0.3, 1.0 ],
shape => 'plus',
size => 8,
width => 4,
},
},
},
],
This example has an abnormally large style entry in order to illustrate the possible fields. The default styles are designed to draw each line in a different colour and often only need one or two fields specified directly.
Tests
One of the original aims in writing these modules was to develop a suite that would link stock market analysis with the power of perl. Well, here it is - its reason for existence.
Tests are segments of perl code (mostly in string form) that are eval'ed to produce signals of various kinds. There are three types of entry.
- Simple text
-
The simplest and most common form, this perl fragment is eval'ed for every data point in the sample where it is applied. It can be used to produce conditional signal marks on the charts, noting dates and prices for potential stock positions either as printed output or by invoking a callback function.
Don't worry - it is only compiled once, and the compiled code is repeated - so it is just as efficient as running code in a script file.
- Hash ref
-
This may have three keys, before, during and after. Each is a perl fragment, compiled and run before, during and after the sample points are considered. during is thus identical to "Simple text".
This form allows one-off invocation of callbacks, or an open-print-close sequence for writing signals to file.
- Raw code
-
These are the callbacks invoked by the previous types of perl fragment.
See Finance::Shares::test for full details, but here is an illustration from the package testing to give some idea of the supported features.
Example
tests => [
sub1 => sub {
my ($date, $high, $low) = @_;
print "$date\: $low to $high\n";
},
test1 => q(
mark($above, 300) if $high > $value or not defined $high;
call('sub1', $date, $high, $low) if $low >= 290;
),
test2 => {
before => <<END
print "before.\\n";
my \$name = "$filename.out";
open( \$self->{fh}, '>', \$name )
or die "Unable to write to '\$name'";
END
,
during => q(
if (defined $average and $close > $average) {
mark($mark, $close);
my $fh = $self->{fh};
print $fh "close=$close, average=$average\n"
if defined $close;
}
),
after => qq(
print "after.\\n";
close \$self->{fh};
),
},
],
Some notes.
- Perl variables
-
Perl variables may be used normally provided you avoid the tags used for lines and tests. Variables with these names refer to the value of the line/test at that time.
- Persistence
-
A special hash ref $self has several predefined variables. It is available to all the perl text, allowing variables set in one section to be available in another - when calculating averages, for example.
- Programming facility
-
The perl fragments are just text or code refs. They would typically be presented within a model spec file which fsmodel invokes using do. It is therefore possible to create the code from separate files, 'here' documents or quoted strings. Don't forget to escape the '$' signs when using double-quoted strings.
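The persistence note might be put to work as in the sketch below, which remembers the highest close seen in a sample. The tag top_close and the hash key my_top are hypothetical (the key is chosen to avoid any predefined entries in $self), and the test would still need to be listed in a sample's tests entry. Single-quoted q() strings are used so that no '$' needs escaping.

```perl
tests => [
    top_close => {
        # run once, before the first data point
        before => q(
            $self->{my_top} = undef;
        ),
        # run for every data point in the sample
        during => q(
            if (defined $close) {
                $self->{my_top} = $close
                    if not defined $self->{my_top} or $close > $self->{my_top};
            }
        ),
        # run once, after the last data point
        after => q(
            print "highest close: $self->{my_top}\n" if defined $self->{my_top};
        ),
    },
],
```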
CONSTRUCTOR
In addition to the resources covered in DESCRIPTION, the following options (in key => value form) may also be included in the model specification. [These, too, will be comma (not semi-colon) separated.]
- by
-
Set a default value for how the dates are shown. Suitable values are
quotes weekdays days weeks months
- config
-
The name of a file containing a (partial) model specification.
- directory
-
If filename doesn't have a directory component, this becomes the directory. Otherwise the current directory is used.
- end
-
Set a default value for the last date considered, in YYYY-MM-DD format.
- filename
-
The name associated with the default file specification. '.ps' will be appended to this.
- no_chart
-
Used by fsmodel, this suppresses any chart output if set to 1.
- null
-
The string assigned to this will be read as meaning 'nothing'. The following example states that the sample has no stock code. (Default: '<none>')
null => '<nothing>',
sample => {
stock => '<nothing>',
...
},
- show_values
-
Where a function requires more than one dependent line, any after the first may be a number. This is converted internally to a value line which may or may not be displayed.
There is no facility for specifying the style of these lines, but setting show_values to 1 will show and 0 will hide them all.
- start
-
Set a default value for the first date to be displayed, in YYYY-MM-DD format.
- verbose
-
Control the amount of feedback given while the model is being run.
0 silent
1 default
2 show eval'ed code in user tests
3 debug model outline
4 debug model including objects
5 most methods, including Chart
6 diagnostic, inc Functions
7 everything
- write_csv
-
This saves the sample data in CSV format. It may be either a name for the CSV file or '1', in which case a suitable name is generated. If more than one sample page exists, all subsequent pages will also be saved into separate files.
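Several of these options might sit at the top of a specification as shown below. This is a sketch only: the file and directory names are hypothetical, and the comments simply restate the option descriptions above.

```perl
my @spec = (
    # configuration entries
    filename    => 'banks',         # default output becomes banks.ps
    directory   => 'charts',        # hypothetical directory name
    start       => '2003-01-01',
    end         => '2003-03-31',
    by          => 'weeks',
    verbose     => 2,               # show eval'ed code in user tests
    show_values => 0,               # hide automatically created value lines
    write_csv   => 1,               # let the module generate the CSV name
    null        => '<nothing>',     # replaces the default '<none>'

    # resources follow as usual
    source => 'banks.csv',          # hypothetical CSV file
    stock  => 'LLOY.L',
);
```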
BUGS
Please let me know when you suspect something isn't right. A short script working from a CSV file demonstrating the problem would be very helpful.
In particular the regular expression/wild card matching doesn't work properly.
AUTHOR
Chris Willmot, chris@willmot.org.uk
LICENCE
Copyright (c) 2002-2003 Christopher P Willmot
This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. A copy can be found at http://www.gnu.org/copyleft/gpl.html
SEE ALSO
Finance::Shares::Overview provides an introduction to the suite, and fsmodel is the principal script.
Other modules involved in processing the model include Finance::Shares::MySQL, Finance::Shares::Chart.
Chart and file details may be found in PostScript::File, PostScript::Graph::Paper, PostScript::Graph::Key, PostScript::Graph::Style.
All functions are invoked from their own modules, all with lower-case names such as Finance::Shares::moving_average. The nitty-gritty on how to write each line specification is found there.
Core modules used directly by this module include Finance::Shares::data, Finance::Shares::value, Finance::Shares::mark and Finance::Shares::test.
For information on writing additional line functions see Finance::Shares::Function and Finance::Shares::Line. Also, Finance::Shares::test covers writing your own tests.