NAME

CohortExplorer::Datasource - CohortExplorer datasource superclass

SYNOPSIS

# The code below shows methods your datasource class overrides

package CohortExplorer::Application::My::Datasource;
use base qw( CohortExplorer::Datasource );

sub authenticate {

    my ($self, $opts) = @_;

    # authentication code...

    # A successful response may be a scalar (e.g. project_id), depending on the requirement
    return $response;

}

sub additional_params {
    
     my ($self, $opts, $response) = @_;
      
     my %params;

     # get database handle (i.e. $self->dbh) and run some SQL queries to get additional parameters
     # (e.g. project_id ) to be used in entity/variable/table structure hooks
      
     return \%params;
}

sub entity_structure {
     
     my ($self) = @_;
     
     my %struct = (
                  -columns =>  {
                                 entity_id => 'd.record',
                                 variable => 'd.field_name',
                                 value => 'd.value',
                                 table => 'm.form_name'
                   },
                   -from =>  [ -join => qw/data|d <=>{project_id=project_id} metadata|m/ ],
                   -where =>  {
                                 # project_id accessor to impose condition
                                 'd.project_id' => $self->project_id
                    }
      );
      
      return \%struct;
 }
 
     
sub table_structure {
     
     my ($self) = @_;
     
     return {
             
              -columns => {
                             table => 'GROUP_CONCAT( DISTINCT form_name )', 
                             variable_count => 'COUNT( field_name )',
                             label => 'element_label'
              },
             -from  => 'metadata',
             -where => {
                         project_id => $self->project_id
              },
             -order_by => 'field_order',
             -group_by => 'form_name'
    };
 }
 
 sub variable_structure {
     
     my ($self) = @_;
     
     return {
             -columns => {
                           variable => 'field_name',
                           table => 'form_name',
                           label => 'element_label',
                           type => "IF( element_validation_type IS NULL, 'text', element_validation_type)",
                           category => "IF( element_enum like '%, %', REPLACE( element_enum, '\\\\n', '\n'), '')"
             },
            -from => 'metadata',
            -where => { 
                         project_id => $self->project_id
             },
            -order_by => 'field_order'
    };
 }
 
sub datatype_map {

    return {
             int          => 'signed',
             float        => 'decimal',
             date_dmy     => 'date',
             date_mdy     => 'date',
             date_ymd     => 'date',
             datetime_dmy => 'datetime'
    };
}

DESCRIPTION

CohortExplorer::Datasource is the base class for all datasources. When connecting CohortExplorer to EAV repositories other than Opal (OBiBa) and REDCap, the user is expected to create a class that inherits from CohortExplorer::Datasource. Datasources/projects implemented in Opal and REDCap can be queried using the inbuilt Opal (CohortExplorer::Application::Opal::Datasource) and REDCap (CohortExplorer::Application::REDCap::Datasource) APIs.

OBJECT CONSTRUCTION

initialize( $opts, $config_file )

CohortExplorer::Datasource is an abstract factory; initialize() is the factory method that constructs and returns an object of the datasource supplied as an application option. This class reads the datasource configuration from the config file (i.e. datasource-config.properties) to instantiate the datasource object. A sample config file is shown below:

<datasource datasourceA>
  namespace=CohortExplorer::Application::Opal::Datasource
  url=http://example.com
  entity_type=Participant
  dsn=DBI:mysql:database=opal;host=hostname;port=3306
  username=database_username
  password=database_password
</datasource> 

<datasource datasourceB> 
  namespace=CohortExplorer::Application::Opal::Datasource
  url=http://example.com
  entity_type=Instrument
  dsn=DBI:mysql:database=opal;host=hostname;port=3306
  username=database_username
  password=database_password
  name=datasourceA
</datasource> 

<datasource datasourceC> 
  namespace=CohortExplorer::Application::REDCap::Datasource
  dsn=DBI:mysql:database=opal;host=myhost;port=3306
  username=database_username
  password=database_password
</datasource>

Each block holds a unique datasource configuration. In addition to the reserved parameters (namespace, name, dsn, username, password, static_tables and visit_max), the user decides which other parameters to include in the configuration file. If a block name is an alias, the user can give the actual name of the datasource with the name parameter; if name is absent, the block name is taken to be the actual name. In the example above, datasourceA and datasourceB both connect to the same datasource (datasourceA) but with different configurations: datasourceA is configured to query participant data, whereas datasourceB queries instrument data. Once the class has instantiated the datasource object, each parameter is available through an accessor of the same name. For example, $self->dbh returns the database handle and $self->entity_type returns the entity type.
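The alias rule can be sketched in a few lines of plain Perl. This is only an illustration: the %config hash stands in for the parsed configuration file, and the helper resolve_datasource_name is hypothetical, not part of CohortExplorer.

```perl
use strict;
use warnings;

# Stand-in for the parsed config: block name => parameters (illustrative data).
my %config = (
    datasourceA => { entity_type => 'Participant' },
    datasourceB => { entity_type => 'Instrument', name => 'datasourceA' },
);

# Hypothetical helper: return the actual datasource name for a block.
# If the block has no 'name' parameter, the block name itself is used.
sub resolve_datasource_name {
    my ( $config, $block ) = @_;
    my $params = $config->{$block}
      or die "Unknown datasource block '$block'\n";
    return defined $params->{name} ? $params->{name} : $block;
}

for my $block ( sort keys %config ) {
    printf "%s -> %s (%s)\n", $block,
      resolve_datasource_name( \%config, $block ),
      $config{$block}{entity_type};
}
```

Both blocks resolve to datasourceA here, while each keeps its own entity_type.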

The namespace is the full package name of the inbuilt API the application will use to consult the parent EAV schema. The parameters present in the configuration file can be used by the subclass hooks to provide user or project specific functionality.
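A minimal sketch of how a namespace parameter might drive object construction is given below. It is an assumption-laden illustration, not the module's actual code: the class My::Demo::Datasource, the helper build_datasource and the way parameters are passed to new() are all hypothetical.

```perl
use strict;
use warnings;

# Hypothetical datasource class, defined inline so the sketch is self-contained.
package My::Demo::Datasource;

sub new {
    my ( $class, %params ) = @_;
    return bless {%params}, $class;
}

# Accessor named after the configuration parameter (assumed convention).
sub entity_type { return $_[0]->{entity_type} }

package main;

# Hypothetical factory: instantiate the class named by the 'namespace' parameter.
sub build_datasource {
    my ($params) = @_;
    my $pkg = $params->{namespace}
      or die "Missing 'namespace' parameter\n";
    $pkg->can('new')
      or die "Failed to instantiate datasource package '$pkg' via new()\n";
    return $pkg->new(%$params);
}

my $ds = build_datasource(
    { namespace => 'My::Demo::Datasource', entity_type => 'Participant' } );
print ref($ds), ' queries ', $ds->entity_type, " data\n";
```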

new()

$object = $ds_pkg->new();

Basic constructor.

PROCESSING

After instantiating the datasource object, the class first calls authenticate to perform user authentication. If authentication succeeds (i.e. authenticate returns a defined $response), the class sets any additional parameters via additional_params. It then calls entity_structure, table_structure, variable_structure and datatype_map, validating the return value of each. Upon successful validation, the class attempts to set the entity-, table- and variable-specific parameters by invoking the methods below:
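The order of calls described above can be sketched as follows. The class My::Demo::Pipeline, its canned return values, the username option and the driver process() are hypothetical; the real class performs more validation than a simple hash ref check.

```perl
use strict;
use warnings;

# Hypothetical minimal datasource implementing the hooks with canned values.
package My::Demo::Pipeline;

sub new { return bless {}, shift }

sub authenticate {
    my ( $self, $opts ) = @_;
    return ( $opts->{username} // '' ) eq 'demo' ? 42 : undef;
}

sub additional_params {
    my ( $self, $opts, $response ) = @_;
    return { project_id => $response };
}

sub entity_structure   { return { -columns => {}, -from => 'data',     -where => {} } }
sub table_structure    { return { -columns => {}, -from => 'metadata', -where => {} } }
sub variable_structure { return { -columns => {}, -from => 'metadata', -where => {} } }
sub datatype_map       { return { int => 'signed' } }

package main;

# Hypothetical driver mirroring the processing order.
sub process {
    my ( $ds, $opts ) = @_;

    my $response = $ds->authenticate($opts);
    return unless defined $response;    # later hooks run only after successful authentication

    my $params = $ds->additional_params( $opts, $response );
    ref $params eq 'HASH' or die "additional_params must return a hash ref\n";
    @{$ds}{ keys %$params } = values %$params;

    my %struct;
    for my $hook (qw/entity_structure table_structure variable_structure datatype_map/) {
        $struct{$hook} = $ds->$hook;
        ref $struct{$hook} eq 'HASH' or die "$hook must return a hash ref\n";
    }
    return \%struct;
}

my $ds     = My::Demo::Pipeline->new;
my $result = process( $ds, { username => 'demo' } );
print join( ', ', sort keys %$result ), "\n";
```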

set_entity_params( $struct )

This method attempts to retrieve the entity parameters entity_count and visit_max (for longitudinal datasources) from the database. It takes as input the return value of the subclass's entity_structure method.

set_table_params( $struct )

This method attempts to set tables and their attributes as a hash in which table names are keys and attribute name-value pairs are values. The table attributes are read from the -columns specified in the hash ref returned by the subclass's table_structure method.
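The resulting shape can be illustrated with a small sketch. The rows below are made-up query results and index_tables is a hypothetical helper; the actual method reads its rows from the database.

```perl
use strict;
use warnings;

# Illustrative rows, as a table_structure query might return them.
my @rows = (
    { table => 'demographics',  variable_count => 26, label => 'Demographics' },
    { table => 'baseline_data', variable_count => 19, label => 'Baseline Data' },
);

# Hypothetical helper: table name => { attribute => value, ... }.
sub index_tables {
    my (@rows) = @_;
    my %tables;
    for my $row (@rows) {
        my %attributes = %$row;
        my $name = delete $attributes{table};    # the table name becomes the key
        $tables{$name} = \%attributes;
    }
    return \%tables;
}

my $tables = index_tables(@rows);
print "$_: $tables->{$_}{variable_count} variables\n" for sort keys %$tables;
```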

set_variable_params( $struct )

This method attempts to set variables and their attributes as a hash in which the keys combine the table and variable names and the values are the attribute name-value pairs. The combined name is used as the key, rather than the variable name alone, because it:

a. shows the table under which the variable was recorded (e.g. CaseHistory.Onset_Age), and

b. distinguishes variables that share a name across tables (e.g. Subject.Sex and Informant.Sex).
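A short sketch shows why the composite key matters. The rows and the helper index_variables are hypothetical; whether the real method keeps table and variable among the stored attributes is not specified here.

```perl
use strict;
use warnings;

# Illustrative rows, as a variable_structure query might return them.
my @rows = (
    { table => 'Subject',     variable => 'Sex',       label => 'Sex of subject' },
    { table => 'Informant',   variable => 'Sex',       label => 'Sex of informant' },
    { table => 'CaseHistory', variable => 'Onset_Age', label => 'Age at onset' },
);

# Key each variable by "Table.Variable" so identical variable names do not collide.
sub index_variables {
    my (@rows) = @_;
    my %variables;
    for my $row (@rows) {
        my $key = $row->{table} . '.' . $row->{variable};
        $variables{$key} = {%$row};
    }
    return \%variables;
}

my $variables = index_variables(@rows);
print "$_\n" for sort keys %$variables;
```

Keyed by variable name alone, the two Sex variables would overwrite each other and only two entries would remain.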

set_visit_variables()

This method attempts to set the visit variables. It is called only if the datasource is longitudinal with data on at least 2 visits. The visit variables are valid for dynamic tables and represent the visit transformation of variables (e.g. vAny.var, vLast.var, v1.var ... vMax.var). The prefix vAny implies any visit, vLast the last visit, v1 the first visit and vMax the maximum visit for which data is available in the database. The compare command allows the use of visit variables when searching for entities of interest.
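As an illustration of the naming scheme, the sketch below expands one variable into its visit variables. The helper visit_variables is hypothetical, and the exact name format used internally is assumed from the examples in this section.

```perl
use strict;
use warnings;

# Hypothetical helper: expand a variable into vAny, vLast and v1..vMax forms.
# Returns an empty list unless there are at least 2 visits.
sub visit_variables {
    my ( $variable, $visit_max ) = @_;
    return if $visit_max < 2;

    my @prefixes = ( 'vAny', 'vLast' );
    push @prefixes, "v$_" for 1 .. $visit_max;
    return map { $_ . '.' . $variable } @prefixes;
}

print join( ', ', visit_variables( 'weight', 3 ) ), "\n";
```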

SUBCLASS HOOKS

The subclasses override the following hooks:

authenticate( $opts )

This method should return a response (a scalar) upon successful authentication, otherwise undef. It is called with one parameter, $opts, a hash with application options as keys and their user-provided values as values. Note that the hooks below are called only if authentication succeeds.
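A minimal sketch of such a hook follows. The class My::Demo::Auth, the in-memory %users table and the option names username and password are assumptions made for illustration; a real datasource would check credentials against its own repository.

```perl
use strict;
use warnings;

package My::Demo::Auth;

# Hypothetical credential store standing in for the repository's user tables.
my %users = (
    alice => { password => 'secret', project_id => 12 },
);

sub new { return bless {}, shift }

# Return a defined scalar (here a project_id) on success, undef otherwise.
sub authenticate {
    my ( $self, $opts ) = @_;
    my $user = $users{ $opts->{username} // '' };
    return undef unless $user;
    return undef unless ( $opts->{password} // '' ) eq $user->{password};
    return $user->{project_id};
}

package main;

my $auth     = My::Demo::Auth->new;
my $response = $auth->authenticate( { username => 'alice', password => 'secret' } );
print defined $response ? "authenticated, project_id=$response\n" : "rejected\n";
```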

additional_params( $opts, $response )

This method should return a hash ref containing parameter name-value pairs. Not all parameter values are known in advance, so they cannot all be specified in the datasource configuration file. Sometimes a value first needs to be retrieved from the database (e.g. the variables and records a given user has access to), and this hook exists for that purpose. The user can run SQL queries to retrieve the values of the parameters they want to add to the datasource object. The parameters used in calling this method are:

$opts a hash with application options as keys and their user-provided values as hash values.

$response a scalar received upon successful authentication. The user may want to use the scalar response to fetch other parameters (if any).
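The hook might look like the sketch below. To keep it runnable without a database, a mock handle imitates DBI's selectrow_array( $statement, \%attr, @bind_values ); the classes, the projects table and its columns are hypothetical.

```perl
use strict;
use warnings;

# Mock database handle so the sketch runs without a database.
package My::Demo::MockDBH;

sub new { return bless {}, shift }

sub selectrow_array {
    my ( $self, $sql, $attr, @bind ) = @_;
    return $bind[0] == 12 ? ('Demo Project') : ();
}

package My::Demo::Params;

sub new { my ( $class, %args ) = @_; return bless {%args}, $class }
sub dbh { return $_[0]->{dbh} }

# Use the authentication response to look up further parameters.
sub additional_params {
    my ( $self, $opts, $response ) = @_;
    my %params = ( project_id => $response );

    # Hypothetical table and column names.
    my ($name) = $self->dbh->selectrow_array(
        'SELECT project_name FROM projects WHERE project_id = ?',
        undef, $response );
    $params{project_name} = $name if defined $name;

    return \%params;
}

package main;

my $ds     = My::Demo::Params->new( dbh => My::Demo::MockDBH->new );
my $params = $ds->additional_params( {}, 12 );
print "$_ = $params->{$_}\n" for sort keys %$params;
```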

entity_structure()

The method should return a hash ref defining the entity structure in the database. The hash ref must have the following keys:

-columns

    entity_id

    variable

    value

    table

    visit (only for longitudinal datasources)

-from

    table specifications (see SQL::Abstract::More)

-where

    where clauses (see SQL::Abstract)
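The key requirements can be expressed as a small check. The helper check_entity_structure is hypothetical and assumes -columns is a hash ref, as in the SYNOPSIS; the literal project id 12 replaces the $self->project_id accessor so the sketch runs on its own.

```perl
use strict;
use warnings;

# Hypothetical validator: returns a list of problems found in the structure.
sub check_entity_structure {
    my ( $struct, $longitudinal ) = @_;
    return ('structure is not a hash ref') unless ref $struct eq 'HASH';

    my @problems;
    for my $key (qw/-columns -from -where/) {
        push @problems, "missing key $key" unless exists $struct->{$key};
    }

    my @required = qw/entity_id variable value table/;
    push @required, 'visit' if $longitudinal;    # visit applies to longitudinal datasources only

    my $columns = ref $struct->{-columns} eq 'HASH' ? $struct->{-columns} : {};
    for my $column (@required) {
        push @problems, "missing column $column" unless exists $columns->{$column};
    }
    return @problems;
}

my %struct = (
    -columns => {
        entity_id => 'd.record',
        variable  => 'd.field_name',
        value     => 'd.value',
        table     => 'm.form_name',
    },
    -from  => [ -join => qw/data|d <=>{project_id=project_id} metadata|m/ ],
    -where => { 'd.project_id' => 12 },
);

my @problems = check_entity_structure( \%struct, 1 );
print @problems ? join( "\n", @problems ) . "\n" : "structure looks complete\n";
```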

table_structure()

The method should return a hash ref defining the table structure in the database. A table in this context means a questionnaire or form. For example,

{
    -columns => [
                  table => 'GROUP_CONCAT( DISTINCT form_name )', 
                  variable_count => 'COUNT( field_name )',
                  label => 'element_label'
    ],
   -from  => 'metadata',
   -where => {
               project_id => $self->project_id
   },
  -order_by => 'field_order',
  -group_by => 'form_name'

}

The user should make sure the returned hash ref produces SQL output like the one below:

+-------------------+-----------------+------------------+
| table             | variable_count  | label            |
+-------------------+-----------------+------------------+
| demographics      |              26 | Demographics     |
| baseline_data     |              19 | Baseline Data    |
| month_1_data      |              20 | Month 1 Data     |
| month_2_data      |              20 | Month 2 Data     |
| month_3_data      |              28 | Month 3 Data     |
| completion_data   |               6 | Completion Data  |
+-------------------+-----------------+------------------+

Note that the -columns hash ref must have the key table, corresponding to the name of the form/questionnaire. The other columns are table attributes; it is up to the user to decide which attributes suitably describe the tables.

variable_structure()

This method should return a hash ref defining the variable structure in the database. For example,

{
    -columns => [
                   variable => 'field_name',
                   table => 'form_name',
                   label => 'element_label',
                   category => "IF( element_enum like '%, %', REPLACE( element_enum, '\\\\n', '\n'), '')",
                   type => "IF( element_validation_type IS NULL, 'text', element_validation_type)"
    ],
   -from => 'metadata',
   -where => { 
               project_id => $self->project_id
    },
    -order_by => 'field_order'
}

The user should make sure the returned hash ref produces SQL output like the one below:

+---------------------------+---------------+-------------------------+---------------+----------+
| variable                  | table         | label                   | category      | type     |
+---------------------------+---------------+-------------------------+---------------+----------+
| kt_v_b                    | baseline_data | Kt/V                    |               | float    |
| plasma1_b                 | baseline_data | Collected Plasma 1?     | 0, No         | text     |
|                           |               |                         | 1, Yes        |          |
| date_visit_1              | month_1_data  | Date of Month 1 visit   |               | date_ymd |
| alb_1                     | month_1_data  | Serum Albumin (g/dL)    |               | float    |
| prealb_1                  | month_1_data  | Serum Prealbumin (mg/dL)|               | float    |
| creat_1                   | month_1_data  | Creatinine (mg/dL)      |               | float    |
+---------------------------+---------------+-------------------------+---------------+----------+

Note that the -columns array ref must contain the elements variable and table. Again, it is up to the user to decide which variable attributes (i.e. metadata) define the variables in the datasource. The categories within category must be separated by a newline.
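To illustrate the newline convention, the sketch below splits a category string into code/label pairs. The helper parse_categories is hypothetical, and the "code, label" layout of each line is assumed from the sample output above.

```perl
use strict;
use warnings;

# Split a newline-separated category string such as "0, No\n1, Yes"
# into an ordered list of [ code, label ] pairs.
sub parse_categories {
    my ($category) = @_;
    return () unless defined $category && length $category;

    my @pairs;
    for my $line ( split /\n/, $category ) {
        my ( $code, $label ) = split /\s*,\s*/, $line, 2;
        push @pairs, [ $code, $label ];
    }
    return @pairs;
}

for my $pair ( parse_categories("0, No\n1, Yes") ) {
    print "$pair->[0] => $pair->[1]\n";
}
```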

datatype_map()

This method should return a hash ref with variable types as keys and the equivalent (i.e. castable) SQL types as values. For example, in some datasources the type int can be cast to signed and float to decimal. By default, all variable values are assumed to be varchar(255).
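One way such a map could be applied is sketched below. The helper cast_expression is hypothetical and only shows the idea of wrapping a value column in a cast when its variable type is mapped; how the module actually builds its SQL is not shown here.

```perl
use strict;
use warnings;

# Subset of the map from the SYNOPSIS.
my %datatype_map = (
    int      => 'signed',
    float    => 'decimal',
    date_ymd => 'date',
);

# Hypothetical helper: wrap a column in CAST when the variable type is mapped,
# otherwise leave it untouched (i.e. treated as a string).
sub cast_expression {
    my ( $column, $type ) = @_;
    my $sql_type = defined $type ? $datatype_map{$type} : undef;
    return defined $sql_type ? "CAST( $column AS $sql_type )" : $column;
}

print cast_expression( 'value', $_ ), "\n" for qw/int float text/;
```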

DIAGNOSTICS

  • Config::General fails to parse the datasource configuration file.

  • Failed to instantiate datasource package '<datasource pkg>' via new().

  • The return from methods additional_params, entity_structure, table_structure, variable_structure and datatype_map is either not hash worthy or is missing required columns.

  • The select method from SQL::Abstract::More fails to construct the SQL query from the supplied hash ref.

  • The method execute from DBI fails to execute the SQL query.

DEPENDENCIES

Carp

CLI::Framework::Exceptions

Config::General

DBI

Exception::Class::TryCatch

SQL::Abstract::More

Tie::IxHash

SEE ALSO

CohortExplorer

CohortExplorer::Application::Opal::Datasource

CohortExplorer::Application::REDCap::Datasource

CohortExplorer::Command::Describe

CohortExplorer::Command::Find

CohortExplorer::Command::History

CohortExplorer::Command::Query::Search

CohortExplorer::Command::Query::Compare

LICENSE AND COPYRIGHT

Copyright (c) 2013-2014 Abhishek Dixit (adixit@cpan.org). All rights reserved.

This program is free software: you can redistribute it and/or modify it under the terms of either:

  • the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version, or

  • the "Artistic Licence".

AUTHOR

Abhishek Dixit