NAME

File::Tabular::Web - turn tabular files into web applications

SYNOPSIS

# start a local HTTP server
plackup -MFile::Tabular::Web -e "File::Tabular::Web->new->to_app"

# create an application scaffolding from a tabular file
perl ftw_new_app.pl path/to/some/data.txt

# use the app
http://localhost:5000/path/to/some/data.ftw?S=foo
http://localhost:5000/path/to/some/data.ftw?S=col1:bar*
http://localhost:5000/path/to/some/data.ftw?S=col2 < 123 AND col3 ~ \w\d
http://localhost:5000/path/to/some/data.ftw?L=id_of_some_record

# not displayed here
# - POST URLs to edit the data
# - integration in a real application server instead of localhost

# customize the app -- no programming involved
edit path/to/some/{data_short.tt,data_long.tt,data_edit.tt}  # views
edit path/to/some/data.ftw                                   # config

DESCRIPTION

This is a simple web application framework for searching, displaying and updating data from flat tabular files.

The framework is based on File::Tabular and Search::QueryParser for searching and editing facilities, and on Plack middleware for Web support. As a result, it will run on any Plack-supported infrastructure, like CGI, FCGI, mod_perl, or a local HTTP server launched from the command line through the plackup utility.

The strong point of File::Tabular::Web is that it is built around a versatile search engine, convenient for Web-style queries: this search engine spans all data fields by default, but can also retrieve words in specific fields, find prefixes, apply regular expressions, compare numerical values, use boolean combinations, etc. All of that power is available directly to all applications within the framework, without any programming. To build a new application, all that is needed is to invoke the ftw_new_app.pl script, which will create some scaffolding templates for searching, displaying and editing your data. The application is immediately usable; the templates can be customized to improve the look and feel, and the configuration file can be edited to tune some aspects like access control ... but no Perl code is needed, at least not for common needs. So if you are looking for simplicity and speed of development and deployment, and are ready to sacrifice some speed of execution, then you may have found a convenient tool.

This framework has been used successfully for about 15 years in our Intranet for managing lists of people, rooms, meetings, links, etc., and even for more sensitive information like lists of payments or the archived judgements (minutes) of Geneva courts. Of course this technology is much slower than a real database, but if the data is not too big and the frequency of requests is not too high, it can be a perfectly viable solution.

See also File::Tabular::Web::Attachments and File::Tabular::Web::Attachments::Indexed for subclasses that extend the framework with methods for managing documents attached to data fields.

QUICKSTART

HTTP server configuration

File::Tabular::Web is designed so that it only needs to be installed once and for all in your HTTP server configuration. Then all applications can be added or modified on the fly, without restarting the server.

For this to work, you need to tell the HTTP server which URLs are going to be served by File::Tabular::Web. Although there are several ways to achieve this, the recommended way is to choose a file extension to be associated with this module and define a general rule performing the mapping. Here is an example using Apache, Plack and mod_perl:

<LocationMatch "\.ftw$">
  SetHandler perl-script
  PerlResponseHandler Plack::Handler::Apache2
  PerlSetVar psgi_app /path/to/ftw.psgi
</LocationMatch>

where /path/to/ftw.psgi is the path to a simple PSGI file containing the following code:

use File::Tabular::Web;
my $app = File::Tabular::Web->new->to_app;

Once this is configured, any URL ending in .ftw will be served by File::Tabular::Web.

The example above just gives the general idea; similar configurations can be obtained with FCGI or other architectures, setting the rules either within the HTTP server or within the Plack middleware; see your web server documentation and the Plack documentation.

For development purposes, an application server can be started from the command line, thanks to the Plack infrastructure:

plackup /path/to/ftw.psgi

or the ftw.psgi can even be dispensed with, through the command

plackup -MFile::Tabular::Web -e "File::Tabular::Web->new->to_app"

Setting up a particular application

An application consists of a data file, a configuration file and a couple of template files. In the simplest setting, all of these should be located in the same directory, at some path under an app_root directory, which by default is the same as the DOCUMENT_ROOT of your HTTP server. The data file is all you need to get started; the other files will be generated automatically.

We will show this through the example of a simple people directory application, assuming an Apache server where the document root is the htdocs directory. If you want to try with a local server instead, using the plackup command shown above, your document root is the current directory.

  • First create directory htdocs/people.

  • Let's assume that you already have a list of people, in a spreadsheet or a database. Export that list into a flat text file named htdocs/people/dir.txt. If you export from an Excel spreadsheet, do NOT export in CSV format; choose "text (tab-separated)" instead. The datafile should contain one line per record, with a character like '|' or TAB as field separator, and field names on the first line (see File::Tabular for details).

  • Run the helper script

    perl ftw_new_app.pl --fieldSep \\t htdocs/people/dir.txt

    This will create in the same directory a configuration file dir.ftw, and a collection of HTML templates dir_short.tt, dir_long.tt, dir_modif.tt, etc. The --fieldSep option specifies which character acts as field separator (the default is '|'); other options are available, see

    perl ftw_new_app.pl --help

    for a list.

  • The URL http://your.web.server/people/dir.ftw is now available to access the application, ready for searching, displaying, and possibly editing the data. You may first test the default layout, and then customize the templates to suit your needs. The templating language is documented in the "Template Toolkit's documentation".

Note: initially all files are placed in the same directory, because it is simple and convenient; however, data and template files are not really web resources and therefore, strictly speaking, should not belong to the htdocs tree. If you want a more structured architecture, you may move these files to a different location, and specify within the configuration how to find them (see instructions below).

In most cases, the steps just shown will be sufficient, so they can be performed by a webmaster without Perl knowledge.

For more advanced uses, application-specific Perl subclasses can be hooked up into the framework for performing particular tasks. See for example the companion File::Tabular::Web::Attachments module, which provides services for attaching documents and indexing them through Search::Indexer, therefore providing a mini-framework for storing electronic documents.

WEB API

Entry points

Various entry points into the application (searching, editing, etc.) are chosen by single-letter arguments:

H

http://myServer/some/app.ftw?H

Displays the homepage of the application (through the home view). This is the default entry point, i.e. equivalent to

http://myServer/some/app.ftw

S

http://myServer/some/app.ftw?S=<criteria>

Searches records matching the specified criteria, and displays a short summary of each record (through the short view). Here are some examples of search criteria:

word1 word2 word3                 # records containing these 3 words anywhere
+word1 +word2 +word3              # idem
word1 word2 -word3                # containing word1 and word2 but not word3
word1 AND (word2 OR word3)        # obvious
"word1 word2 word3"               # sequence
word*                             # word completion
field1:word1 field2:word2         # restricted by field
field1 == val1  field2 > val2     # relational operators (will inspect the
                                  #   shape of supplied values to decide
                                  #   about string/numeric/date comparisons)
field~regex                       # regex

See Search::QueryParser and File::Tabular for more details.
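To make the semantics of the simplest criteria concrete, here is a toy matcher in plain Perl. It is only a sketch under stated assumptions: it handles bare words, +word, -word, field:word and word* (treating bare words as required, as the examples above indicate), compares whole words case-insensitively, and ignores AND/OR, quoted sequences, relational operators and regexes. The name compile_query is hypothetical; the real parsing and matching is done by Search::QueryParser and File::Tabular.

```perl
use strict;
use warnings;

# Toy matcher for a small subset of the query syntax shown above:
#   word  +word  -word  field:word  word*
# Illustration only: this is NOT how Search::QueryParser or File::Tabular
# work internally.
sub compile_query {
    my ($query) = @_;
    my @tests;
    for my $token (split ' ', $query) {
        my ($sign, $field, $word) = $token =~ /^([+-]?)(?:(\w+):)?(\S+)$/
            or next;
        my $prefix = $word =~ s/\*$//;     # trailing "*" means prefix match
        next if $word eq '';                # a lone "*" matches everything
        my $re = $prefix ? qr/\b\Q$word\E/i : qr/\b\Q$word\E\b/i;
        push @tests, { negate => $sign eq '-', field => $field, re => $re };
    }
    # The returned closure tests one record (hashref: field name => value);
    # bare and "+" terms are required, "-" terms are excluded.
    return sub {
        my ($record) = @_;
        for my $t (@tests) {
            my @values = defined $t->{field}
                       ? ($record->{ $t->{field} } // '')
                       : map { $_ // '' } values %$record;
            my $found = grep { $_ =~ $t->{re} } @values;
            return 0 if $t->{negate} ? $found : !$found;
        }
        return 1;
    };
}

my @people = (
    { name => 'Ann Smith', dept => 'sales',   note => '' },
    { name => 'Bob Smith', dept => 'sales',   note => 'retired' },
    { name => 'Cy Jones',  dept => 'support', note => '' },
);
my $match = compile_query('+smith -retired dept:sales');
my @hits  = grep { $match->($_) } @people;    # only Ann Smith
```

Compiling the query once into a closure and then applying it to each record mirrors the general shape of "parse the criteria, then scan the file", but the real engine supports far more operators.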

Additional parameters may control sorting and pagination. Ex:

?S=word&orderBy=birthdate:-d.m.y,lastname:alpha&count=20&start=40

count

How many items to display on one page. Default is 50.

start

Index within the list of results, telling which is the first record to display (basis is 0).

orderBy

How to sort results. This may be one or several field names, possibly followed by a specification like :num or :-alpha. Precise syntax is documented in "cmp" in Hash::Type.

max

Maximum number of records retrieved in a search (records beyond that number will be dropped).
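The interplay of count, start and max can be sketched as follows. This is an illustration, not the module's code: it assumes the results are held in a plain list, that max truncates the list before pagination, and that an out-of-range start falls back to 0 (an assumption, not documented behaviour). The function slice_info and its return keys are hypothetical names.

```perl
use strict;
use warnings;

# Sketch of the pagination arithmetic implied by count, start and max.
# slice_info and the returned keys (total, start, end, next_start,
# prev_start) are hypothetical names chosen for this example.
sub slice_info {
    my (%args) = @_;
    my $total = $args{total};
    # records beyond max are dropped
    $total = $args{max} if defined $args{max} && $total > $args{max};
    my $count = $args{count} // 50;             # documented default page size
    my $start = $args{start} // 0;              # 0-based index of first record
    # assumption: an out-of-range start falls back to the first page
    $start = 0 if $start < 0 || $start >= $total;
    my $end = $start + $count - 1;
    $end = $total - 1 if $end > $total - 1;
    my $prev = $start - $count;
    return {
        total      => $total,
        start      => $start,
        end        => $end,
        next_start => ($end < $total - 1 ? $end + 1 : undef),
        prev_start => ($start > 0 ? ($prev > 0 ? $prev : 0) : undef),
    };
}

my $page = slice_info(total => 137, count => 20, start => 40);
# records 40..59 are shown; next slice starts at 60, previous at 20
```

The records actually displayed would then be @results[$page->{start} .. $page->{end}].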

L

http://myServer/some/app.ftw?L=<key>

Finds the record with the given key and displays it in detail through the long view.

M

http://myServer/some/app.ftw?M=key

If called with method GET, finds the record with the given key and displays it through the modif view (typically this view will be an HTML form).

If called with method POST, finds the record with the given key and updates it with given field names and values. After update, displays an update message through the msg view.

A

http://myServer/some/app.ftw?A

If called with method GET, displays a form for creating a new record, through the modif view. Fields may be pre-filled by default values given in the configuration file.

If called with method POST, creates a new record, with values given by the submitted form. After record creation, displays an update message through the msg view.

D

http://myServer/some/app.ftw?D=<key>

Deletes record with the given key. After deletion, displays an update message through the msg view.

X

http://myServer/some/app.ftw?X

Displays all records through the download view (mnemonic: eXtract).

Additional parameters

V

Name of the view (i.e. template) that will be used for displaying results. For example, assuming that the application has defined a print view, we can call that view through

http://myServer/some/app.ftw?S=<criteria>&V=print

WRITING TEMPLATES

This section assumes that you already know how to write templates for the Template Toolkit (see Template).

The path for searching templates includes

  • the application directory (where the configuration file resides)

  • the directory specified within the configuration file by parameter [template]dir

  • some default directories: <app_root>/../lib/tmpl/ftw/<application_name>, <app_root>/../lib/tmpl/ftw/<default>, <app_root>/../lib/tmpl/ftw.

Values passed to templates

self

handle to the File::Tabular::Web object; from there you can access self.url (URL of the application), self.app_root (root dir for applications, by default equal to DOCUMENT_ROOT), self.cfg (configuration information, an AppConfig object), self.mtime (modification time of the data file), and self.msg (last message). You can also call methods "can_do" or "param", like for example

[% IF self.can_do('add') %]
   <a href="?A">Add a new record</a>
[% END # IF %]

or

[% self.param('myFancyParam') %]

found

structure containing the results of a search. Fields within this structure are:

count

how many records were retrieved

records

arrayref containing a slice of records

start

index of first record in the returned slice

end

index of last record in the returned slice

href link to the next slice of results (if any)

href link to the previous slice of results (if any)

Using relative URLs

All pages generated by the application have the same URL; query parameters control which page will be displayed. Therefore all internal links can just start with a question mark: the browser will recognize that this is a relative link to the same URL, with a different query string. So within templates we can write simple links like

<a href="?H">Homepage</a>
<a href="?S=*">See all records</a>
<a href="?A">Add a new record</a>
[% FOREACH record IN found.records %]
  <a href="?M=[% record.Id %]">Modify this record</a>
[% END # FOREACH  %]

Forms

Data input

A typical form for updating or adding a record will look like

<form method="POST">
 First Name <input name="firstname" value="[% record.firstname %]"><br>
 Last Name  <input name="lastname" value="[% record.lastname %]">
 <input type="submit">
</form>

Usually there is no need to specify the action of the form: the default action sent by the browser will be the same URL (including the query parameter ?A or ?M=[% record.Id %]). When the application receives a POST request, it knows it has to update or add the record instead of displaying the form. This implies that you must use the POST method for any data modification, whereas forms for searching may use either GET or POST.

For convenience, deletion through a GET URL of the form ?D=[% record.Id %] is supported; however, data modification through the GET method is not recommended, so it is preferable to write

<form method="post">
  <input type="hidden" name="D" value="[% record.Id %]">
  <input type="submit" value="Delete this record">
</form>

Searching

A typical form for searching will look like

<form method="POST" action="[% self.url %]">
   Search : 
     <select name="S">
       <option value="">--Choose in field1--</option>
       <option value="+field1:val1">val1</option>
       <option value="+field1:val2">val2</option>
       ...
     </select>
     Other : <input name="S">
 <input type="submit">
</form>

So the form can combine several search criteria, all passed through the S parameter. The form method can be either GET or POST; but if you choose POST, then it is recommended that you also specify

action="[% self.url %]"

instead of relying on the implicit self-URL from the browser. Otherwise the URL displayed in the browser may still contain the criteria from a previous search while the current form sends other search criteria; the application will not get confused, but the user might.

Highlighting the searched words

The preMatch and postMatch parameters in the configuration file (see below) define marker strings that will be automatically inserted into the data returned by a search, surrounding each word that was mentioned in the query. These marker strings should be chosen so that they are unlikely to clash with regular data or with HTML markup; the recommended values are

preMatch  {[
postMatch ]}

Then you can exploit that marking within your templates by calling the "highlight" and "unhighlight" template filters, described below.

CONFIGURATION FILE

The configuration file is always stored within the htdocs directory, at the location corresponding to the application URL; so for application http://myServer/some/data.ftw, the configuration file is in

/path/to/http/htdocs/some/data.ftw

Because of the HTTP server configuration directives described above, that URL is always served by File::Tabular::Web, so there is no risk of users seeing the content of the configuration file.

The configuration is written in AppConfig format. This format supports comments (starting with #), continuation lines (through a final \), "heredoc" quoting style for multiline values, and section headers similar to those of a Windows INI file. All details about the configuration file format can be found in AppConfig::File.

Below is the list of the various recognized sections and parameters.

Global section

The global section (without any section header) can contain general-purpose parameters that can be retrieved later from the viewing templates through [% self.cfg.<param> %]; this is useful for example for setting a title or other values that will be common to all templates.

The global section may also contain some options to "new" in File::Tabular : preMatch, postMatch, avoidMatchKey, fieldSep, recordSep.

Option highlightClass defines the class name used by the "highlight" filter (default is HL).

[fixed] / [default]

The fixed and default sections simulate parameters to the request. Specifications in the fixed section are stronger than HTTP parameters; specifications in the default section are weaker: the param method for the application will first look in the fixed section, then in the HTTP request, and finally in the default section. So for example with

[fixed]
count=50
[default]
orderBy=lastname

a request like

?S=*&count=20

will be treated as

?S=*&count=50&orderBy=lastname

Relevant parameters to put in fixed or in default are described in section "S" of this documentation, for example count, orderBy, etc.
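The lookup order can be sketched in a few lines of Perl. Here plain hashes stand in for the parsed [fixed] and [default] sections and for the HTTP parameters; this is an assumption for illustration, since the module itself reads them through AppConfig and Plack, and lookup_param is a hypothetical name.

```perl
use strict;
use warnings;

# Sketch of the lookup order: [fixed] beats the HTTP request,
# which beats [default].
sub lookup_param {
    my ($name, $fixed, $request, $default) = @_;
    for my $source ($fixed, $request, $default) {
        return $source->{$name} if exists $source->{$name};
    }
    return undef;    # explicit undef keeps the map below balanced
}

my %fixed   = (count   => 50);
my %default = (orderBy => 'lastname');
my %request = (S => '*', count => 20);      # ?S=*&count=20

my %effective = map { $_ => lookup_param($_, \%fixed, \%request, \%default) }
                qw(S count orderBy);
# %effective is (S => '*', count => 50, orderBy => 'lastname')
```

This reproduces the worked example above: the request's count=20 is overridden by the fixed value, and orderBy is filled in from the default section.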

[application]

dir=/some/directory

Directory where application files reside. By default : same directory as the configuration file.

name=some_name

Name of the application (will be used for example as prefix to find template files). This must be a single-level name (no pathnames allowed).

data=some_name

Name of the tabular file containing the data. This must be a single-level name and must reside in the application directory. By default: application name with the .txt suffix appended.

class=My::File::Tabular::Web::Subclass

Will dynamically load the specified module and use it as class for objects of this application. The specified module must be a subclass of File::Tabular::Web.

useFileCache=1

If true, the whole datafile will be slurped into memory and reused across requests (except update requests).

mtime=<format>

Format to display the last modified time of the data file, using POSIX strftime(). The result will be available to templates in [% self.mtime %]

[permissions]

This section specifies permissions to perform operations within the application. Of course, this requires the HTTP server to be configured for some kind of authentication, so that the application receives a user name through the REMOTE_USER environment variable; otherwise the default user name received by the application is "Anonymous". Instructions for setting up authentication for an Apache server are documented at http://httpd.apache.org/docs/2.4/howto/auth.html.

The HTTP server may also be configured to do some kind of authorisation checking, but this will control access to the application as a whole, whereas here we configure fine-grained permissions for various operations.

Builtin permission names are : search, read, add, delete, modif, and download. Each name also has a negative counterpart, i.e. no_search, no_read, etc.

For each of those permission names, the configuration can give a list of user names separated by commas or spaces: the current user name will be compared to this list. A permission may also specify '*', which means 'everybody': this is the default for permissions read, search and download. There is no builtin notion of "user groups", but you can introduce such a notion by writing a subclass which overrides the "user_match" method.

Permissions may also be granted or denied on a per-record basis : writing $fieldname (starting with a literal dollar sign) means that users can access records in which the content of fieldname matches their username. Usually this is associated with an automatic user field (see below), so that the user who created a new record can later modify it.

Example:

[permissions]
 read   = * # the default, could have been omitted
 search = * # idem
 add    = andy bill
 modif  = $last_author # username must match content of field 'last_author'
 delete = $last_author
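The rules just described (explicit user names, '*', and $fieldname for per-record access) can be sketched as below. This is not the module's can_do: the hash stands for the parsed [permissions] section, the comparison with a field's content uses a case-insensitive whole-word match (an assumption modelled on user_match), and the no_ counterparts are not modelled. can_do_sketch is a hypothetical name.

```perl
use strict;
use warnings;

# Defaults documented above: read, search and download are open to everybody.
my %default_acl = (read => '*', search => '*', download => '*');

sub can_do_sketch {
    my ($permissions, $user, $action, $record) = @_;
    my $acl = $permissions->{$action} // $default_acl{$action} // '';
    for my $entry (split /[\s,]+/, $acl) {
        next if $entry eq '';
        return 1 if $entry eq '*';                   # everybody
        if ($entry =~ /^\$(\w+)$/) {                 # per-record: $fieldname
            my $field = $1;
            return 1 if $record && defined $record->{$field}
                        && $record->{$field} =~ /\b\Q$user\E\b/i;
        }
        elsif (lc $entry eq lc $user) {              # explicit user name
            return 1;
        }
    }
    return 0;
}

# Same permissions as the example configuration above
my %perms = (add => 'andy bill', modif => '$last_author', delete => '$last_author');
my $rec   = { Id => 7, last_author => 'bill' };
# bill may modify record 7, andy may not; anybody may read
```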

[fields]

The fields section gives specific information about fields in the tabular file.

time <field> = <format>

Declares field to be a time field, which means that whenever a record is updated, the current local time will be automatically inserted in that field. The format argument will be passed to POSIX strftime(). For example:

time DateModif = %d.%m.%Y
time TimeModif = %H:%M:%S

user = <field>

Declares field to be a user field, which means that whenever a record is updated, the current username will be automatically inserted in that field.

default <field> = <value>

Default values for some fields; these will be inserted into new records.

autoNum <field>

Activates autonumbering for new records; the number will be stored in the given field. This automatically implies default <field> = '#'.

Subclasses may add more entries in this section (for example for specifying fields that will hold names of attached documents).
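The effect of the time and user declarations on an updated record can be sketched as follows. This is an illustration rather than the module's before_update: the hash of field name to strftime format stands for the parsed "time" entries, and stamp_record is a hypothetical name. Autonumbering is not covered.

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# Fill time fields (field => strftime format) with the current local time,
# and the user field with the current user name.
sub stamp_record {
    my ($record, $time_fields, $user_field, $user, $now) = @_;
    $now //= time;
    my @lt = localtime $now;
    $record->{$_} = strftime($time_fields->{$_}, @lt) for keys %$time_fields;
    $record->{$user_field} = $user if defined $user_field;
    return $record;
}

# Corresponds to: time DateModif = %d.%m.%Y / time TimeModif = %H:%M:%S /
#                 user = last_author
my $rec = stamp_record({ Id => 3, name => 'Ann' },
                       { DateModif => '%d.%m.%Y', TimeModif => '%H:%M:%S' },
                       'last_author', 'bill');
```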

[template]

This section specifies where to find templates for various views. The specified locations will be looked for in several directories: the application template directory (as specified by the dir directive, see below), the application directory, the default File::Tabular::Web template directory (as specified by the app_tmpl_default_dir method), or the subdirectory default of the above.

dir

specifies the application template directory

short

Template for the "short" display of records (typically a table for presenting search results).

long

Template for the "long" display of records (typically for a detailed presentation of a single record).

modif

Template for editing a record (typically a form whose action calls the update URL ?M=key).

msg

Template for presenting special messages to the user (messages after a record update or deletion, or error messages).

home

Homepage for the application.

Defaults for these templates are <application_name>_short.tt, <application_name>_long.tt, etc.

METHODS

Note on the architecture

The internal object-oriented design is a bit unorthodox, mainly because I wrote it many years ago, at a time when I was less familiar with Web architectures, and because when migrating to Plack I had to keep the previous mod_perl+CGI API to preserve backwards compatibility. Unfortunately the architecture cannot be changed now, because there might be subclasses that rely on this particular design. External users need not worry, but authors of subclasses should be aware of the design.

There are two kinds of instances of the File::Tabular::Web class:

  • if running under Plack, one instance is the persistent Plack component that will execute the "call" method at each request.

  • at each HTTP request, a new transient instance of File::Tabular::Web class is created; that instance holds temporary information needed to communicate across the various steps of request handling. It is automatically destroyed after having sent the response.

In addition, the module itself maintains a collection of application hashrefs, loaded dynamically when needed. Each application hashref holds information about its configuration file, template files, etc.

By convention, methods starting with an underscore are meant to be private, i.e. should not be redefined in subclasses.

Entry point

new

use File::Tabular::Web;
my $ftw = File::Tabular::Web->new(app_root => $some_directory);

The new method creates a Plack component which can serve requests to a collection of File::Tabular::Web applications.

The optional app_root argument tells where application files are located: relative URLs to applications will be mapped to relative paths starting from this root. If the argument is not explicitly supplied, a default value is guessed by the system, looking at

  • $mod_perl->document_root (if under mod_perl)

  • $env->{CONTEXT_DOCUMENT_ROOT} (new in Apache 2.4)

  • $env->{DOCUMENT_ROOT}

to_app

my $app = $ftw->to_app;

Creates a Plack application from the Plack component. This method is just inherited from Plack::Component.

handler

File::Tabular::Web->handler;

Legacy code: this method used to be the main entry point into the module, to be called from mod_perl or CGI scripts. Now the entry point is Plack's to_app method shown above. The handler method remains only for backwards compatibility; new projects should not use this.

Methods for creating / initializing "application" hashrefs

_app_new

Reads the configuration file for a given application and creates a hashref storing the information. The hashref is put in a global cache of all applications loaded so far.

This method should not be overridden in subclasses; if you need specific code to be executed, use the "app_initialize" method.

_app_read_config

Glue code for the AppConfig module.

app_initialize

Initializes the application hashref. In particular, it creates the Template object, with appropriate settings to specify where to look for templates.

If you override this method in subclasses, you should probably call SUPER::app_initialize.

app_tmpl_default_dir

Returns the default directory containing templates. The default is <app_root>/../lib/tmpl/ftw.

app_tmpl_filters

Returns a hashref of filters to be passed to the Template object (see Template::Filters).

The default contains two filters, which work together with the preMatch and postMatch parameters of the configuration file. Suppose the following configuration:

preMatch  {[
postMatch ]}

Then the filters are defined as follows:

highlight

Replaces strings of shape {[...]} by <span class="HL">...</span>.

The class name is HL by default, but another name can be defined through the highlightClass configuration parameter. Templates have to define a style for that class, like for example

<style>
  .HL {background: lightblue}
</style>

unhighlight

Replaces strings of shape {[...]} by ... (i.e. removes the marking).

These filters are intended to help highlight the words matched by a search request; usually this must happen after the data has been filtered for HTML entities. So a typical use in a template would be for example

<a href="/some/url?with=[% record.foo | unhighlight | uri %]">
    link to [% record.foo | html | highlight %]
</a>
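The behaviour of the two filters can be sketched in plain Perl, assuming preMatch "{[", postMatch "]}" and the default class name HL. This mirrors the documented behaviour only; the module builds its own filters from the configuration, and these subs are not its code.

```perl
use strict;
use warnings;

# Assumed configuration: preMatch {[  postMatch ]}  highlightClass HL
my ($pre, $post, $class) = ('{[', ']}', 'HL');

# {[...]} becomes <span class="HL">...</span>
sub highlight {
    my ($text) = @_;
    $text =~ s/\Q$pre\E(.*?)\Q$post\E/<span class="$class">$1<\/span>/g;
    return $text;
}

# {[...]} becomes ... (markers removed)
sub unhighlight {
    my ($text) = @_;
    $text =~ s/\Q$pre\E(.*?)\Q$post\E/$1/g;
    return $text;
}

my $marked = 'Ann {[Smith]} and Bob';
my $html   = highlight($marked);     # Ann <span class="HL">Smith</span> and Bob
my $plain  = unhighlight($marked);   # Ann Smith and Bob
```

This also shows why the order matters in templates: the html filter leaves the {[ and ]} markers untouched, but it would escape the <span> tags if it ran after highlight.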

app_phases_definition

As explained above in section "WEB API", various entry points into the application are chosen by single-letter arguments; here this method returns a table that specifies what happens for each of them.

A letter in the table is associated with a hashref, with the following keys:

pre

name of method to be executed in the "data preparation phase"

op

name of method to be executed in the "data manipulation phase"

view

name of view for displaying the results
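A table of this shape, and the way a dispatcher might walk it, can be sketched as below. The letter-to-method assignments are illustrative guesses, not the module's actual table; run_phases and the Demo class are hypothetical stand-ins used only to show the pre, op, view sequence.

```perl
use strict;
use warnings;

# Hypothetical phases table: each letter maps to optional "pre" and "op"
# method names and a "view" name.
my %phases = (
    H => {                                      view => 'home' },
    L => { pre => 'search_key',                 view => 'long' },
    D => { pre => 'search_key', op => 'delete', view => 'msg'  },
);

# Call the pre and op methods (when present) on the request object,
# then return the name of the view to display.
sub run_phases {
    my ($obj, $letter) = @_;
    my $def = $phases{$letter} or die "unknown entry point '$letter'\n";
    for my $phase (qw(pre op)) {
        my $method = $def->{$phase} or next;
        $obj->$method();
    }
    return $def->{view};
}

{
    package Demo;    # minimal stand-in that records which methods ran
    sub new        { return bless { calls => [] }, shift }
    sub search_key { push @{ $_[0]{calls} }, 'search_key'; return }
    sub delete     { push @{ $_[0]{calls} }, 'delete';     return }
}

my $obj  = Demo->new;
my $view = run_phases($obj, 'D');   # calls search_key, then delete
```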

Methods for instance creation / initialization

_new

Creates a new object, which represents an HTTP request to the application. The class for the created object is generally File::Tabular::Web, unless specified otherwise in the configuration file (see the class entry in section "CONFIGURATION FILE").

The _new method cannot be redefined in subclasses; if you need custom code to be executed, use "initialize" or "app_initialize" (both are invoked from _new).

initialize

Code to initialize the object. The default behaviour is to set up max, count and orderBy within the object hash.

_setup_phases

Reads the phases definition table and decides what to do in the next phases.

open_data

Retrieves the name of the datafile, decides whether it should be opened for readonly or for update, and creates a corresponding File::Tabular object. The datafile may be cached in memory if directive useFileCache is activated.

_cached_content

Implementation of the memory cache; checks the modification time of the file to detect changes and invalidate the cache.
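The idea of an mtime-checked memory cache can be sketched with core Perl only. This is an illustration of the principle described above, not the code of _cached_content; cached_content and %cache are hypothetical names.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my %cache;    # path => { mtime => ..., content => ... }

# Return the file content, re-reading it only when its modification
# time differs from the one recorded when it was last loaded.
sub cached_content {
    my ($path) = @_;
    my $mtime = (stat $path)[9];
    defined $mtime or die "cannot stat $path: $!\n";
    my $entry = $cache{$path};
    if (!$entry || $entry->{mtime} != $mtime) {
        open my $fh, '<', $path or die "cannot open $path: $!\n";
        local $/;                          # slurp mode
        my $content = <$fh>;
        close $fh;
        $entry = $cache{$path} = { mtime => $mtime, content => $content };
    }
    return $entry->{content};
}

# Usage: write a small data file, read it, then change it.
my ($out, $file) = tempfile(UNLINK => 1);
print {$out} "Id|name\n1|Ann\n";
close $out;
my $first = cached_content($file);         # read from disk and cached

open my $more, '>>', $file or die "cannot append to $file: $!\n";
print {$more} "2|Bob\n";
close $more;
utime time, time + 10, $file;              # make sure the mtime visibly changes
my $second = cached_content($file);        # mtime changed, so re-read
```

Since core stat returns whole seconds, two writes within the same second could go unnoticed by this sketch; that is why the usage example moves the mtime forward explicitly.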

Methods that can be called from templates

param

[% self.param %]

With no argument, returns the list of parameter names to the current HTTP request.

[% self.param(param_name) %]

With an argument, returns the value that was specified under $param_name in the HTTP request, or in the configuration file (see the description of [fixed]/[default] sections). The return value is always a scalar (so this is not exactly the same as calling cgi.param(...)). If the HTTP request contains multiple values under the same name, these values are joined with a space. Leading and trailing spaces are automatically removed.

If you need to access the list of values in the HTTP request, you can call

[% self.req.param(param_name) %]
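The scalar normalization performed by param can be sketched as follows; the list of raw values stands in for what the request object would return, and normalize_param is a hypothetical name. It also shows why the search form above can send several S fields, including an empty select option.

```perl
use strict;
use warnings;

# Join multiple values with a space, then strip leading and trailing
# whitespace (inner spacing is left as is).
sub normalize_param {
    my @values = grep { defined } @_;
    return undef unless @values;
    my $value = join ' ', @values;
    $value =~ s/^\s+|\s+$//g;
    return $value;
}

# Two S fields from the search form: a select and a text input
my $both  = normalize_param('+field1:val1', 'smith ');   # '+field1:val1 smith'
my $empty = normalize_param('', 'smith');                 # 'smith'
```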

can_do

[% self.can_do($action, [$record]) %]

Tells whether the current user has permission to do $action (which might be 'modif', 'delete', etc.). See explanations above about how permissions are specified in the configuration file. Sometimes permissions are set up in a record-specific way (for example one data field may contain the names of authorized users); the second optional argument is meant for those cases, so that can_do() can inspect the current data record.

Request handling: general methods

_dispatch_request

Executes the various phases of request handling

display

Finds the template corresponding to the view name, gathers its output, and prints it together with some HTTP headers.

Request handling: search methods

search_key

Searches for the record with a specific key. Puts the result into $self->{result}.

Search records matching given criteria (see File::Tabular for details). Puts results into $self->{result}.

Initializes $self->{search_string}. Overridden in subclasses for more specific searching (like for example adding fulltext search into attached documents).

sort_and_slice

Chooses a slice within the result set, according to the pagination parameters count and start.

_url_for_next_slice

Returns a URL to the next or previous slice, using "params_for_next_slice".

params_for_next_slice

Returns an array of strings "param=value" that will be inserted into the URL for next or previous slice.
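A next/previous link built from "param=value" strings can be sketched as below. Which parameters are carried over (here S, orderBy and count), the escaping helper and the function names are assumptions for illustration; the escaping handles byte strings only. Using an empty base yields a relative ?... link, in line with the relative-URL advice given for templates.

```perl
use strict;
use warnings;

# Percent-encode everything except unreserved URI characters
# (byte strings only).
sub uri_escape_simple {
    my ($s) = @_;
    $s =~ s/([^A-Za-z0-9\-._~])/sprintf('%%%02X', ord $1)/ge;
    return $s;
}

# Build "param=value" pairs from the current request, then add the new start.
sub url_for_slice {
    my ($base, $params, $start) = @_;
    my @pairs = map  { "$_=" . uri_escape_simple($params->{$_}) }
                grep { defined $params->{$_} }
                qw(S orderBy count);
    push @pairs, "start=$start";
    return "$base?" . join('&', @pairs);
}

my $next = url_for_slice('', { S => 'smith jones', orderBy => 'lastname:alpha', count => 20 }, 40);
# ?S=smith%20jones&orderBy=lastname%3Aalpha&count=20&start=40
```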

words_queried

List of words found in the query string (to be used for example for highlighting those words in the display).

Update Methods

empty_record

Generates an empty record (preparation for adding a new record). Fields are filled with default values specified in the configuration file.

update

Checks for permission and then performs the update. Most probably you don't want to override this method, but rather the methods before_update or after_update.

before_update

Copies values from HTTP parameters into the record, and automatically fills the user name or current time/date in appropriate fields.

after_update

Hook for any code to perform after an update (useful for example for attached documents).

rollback_update

Hook for any code to roll back whatever was performed in before_update, in case the update failed (useful for example for attached documents).

Delete Methods

delete

Checks for permission and then performs the delete. Most probably you don't want to override this method, but rather the methods before_delete or after_delete.

before_delete

Hook for any code to perform before a delete.

after_delete

Hook for any code to perform after a delete.

Miscellaneous methods

prepare_download

Checks for permission to download the whole dataset.

Prints help. Not implemented yet.

user_match

$self->user_match($access_control_list)

Returns true if the current user (as stored in $self->{user}) "matches" the access control list (given as an argument string).

The meaning of "matches" may be redefined in subclasses; the default implementation just performs a case-insensitive regex search within the list for a complete word equal to the username.

Override in subclasses if you need other authorization schemes (like for example dealing with groups).
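The default rule, and one possible group-aware variant of the kind a subclass might implement, can be sketched as plain functions. Both are illustrations: the "@groupname" syntax and the hard-coded %groups hash are inventions for this example, not features of the module, and the function names are hypothetical.

```perl
use strict;
use warnings;

# Default-style rule: the ACL contains the user name as a complete word,
# compared case-insensitively.
sub user_match_sketch {
    my ($user, $acl) = @_;
    return 0 unless defined $user && defined $acl;
    return $acl =~ /\b\Q$user\E\b/i ? 1 : 0;
}

# Hypothetical group support: "@editors" expands to the group's members.
my %groups = (editors => [qw(andy bill)]);

sub user_match_with_groups {
    my ($user, $acl) = @_;
    (my $expanded = $acl) =~ s/\@(\w+)/join ' ', @{ $groups{$1} || [] }/ge;
    return user_match_sketch($user, $expanded);
}
```

Note that \b treats characters such as '.' or '-' as word boundaries, so with this rule 'bill' would also match inside an entry like 'bill.smith'.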

key_field

Returns the name of the key field in the data file.

key

my $key = $self->key($record);

Returns the value in the first field of the record.

AUTHOR

Laurent Dami, <dami AT cpan DOT org>

COPYRIGHT & LICENSE

Copyright 2007-2016 Laurent Dami, all rights reserved.

This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.