NAME

DBIx::BulkUtil - Sybase/SybaseIQ/Oracle bulk load and other utilities

SYNOPSIS

use DBIx::BulkUtil;

# Return just the regular DBI handle
my $dbh = DBIx::BulkUtil->connect(%options, \%dbi_options);

# Or return a DBI handle and a 'Utility' object.

# syb_connect,ora_connect, and iq_connect methods are also provided
# to directly specify database type
my ($dbh, $db_util) = DBIx::BulkUtil->connect(%options, \%dbi_options);

# Wrappers for Sybase bcp, Oracle sqlldr, IQ 'load table'
$db_util->bcp_in($table,  [$file], [\%options]);
$db_util->bcp_out($table, [$file], [\%options]);

$column_info = $db_util->column_info($table);

$insert_sth = $db_util->prepare(%options);

$cnt = $db_util->merge(%options);

$blk_sth = $db_util->blk_prepare($table, %options);
$blk_sth->execute(@args);
$blk_sth->finish();

$index_info  = $db_util->index_info($table, [$all_indexes]);
$primary_key = $db_util->primary_key($table);

$stored_proc_sql = $db_util->sp_sql($stored_proc, @args);
$stored_proc_sth = $db_util->sp_sth($stored_proc, @args);

my $passwd = DBIx::BulkUtil->passwd();

$object_type = $db_util->obj_type($object_name);

$current_date_function = $db_util->now();
$date_function_str     = $db_util->add($date, $amount, $units);
$one_row_no_table_sql  = $db_util->row_select($select_clause);
@no_table_results      = $db_util->get($select_clause)
$ten_minutes_from_now  = $db_util->get( $db_util->add( $db_util->now(), 10, 'minute' ) );

DESCRIPTION

Provide easy to use bulk load and other utility methods.

CLASS METHODS

connect()

Returns a DBI database handle. In list context, returns a database handle and a database utility object. The default connection attribute values for the database handle are:

ChopBlanks => 1,
AutoCommit => 1,
PrintError => 0,
RaiseError => 1,

If the last argument to connect() is a hash reference, then you can either override these attributes or set other attributes on connect. (Note: The connect is made using RaiseError => 1, but after the connection is successful, HandleError is set to a subroutine that calls Carp::confess which gives a stack trace when DBI throws an exception).

The first argument to connect is a hash with the following keys:

Server

The database server to connect to. If no server or database is provided, then it is defaulted to the value of environment variable DSQUERY.

Database

The database to connect to. If a database but no server is provided, then it is assumed to be an Oracle database. If no server or database is provided, then the default is the pm database.

Type

The database type (Oracle, Sybase, or SybaseIQ). The default depends on what combination of Server and Database is provided.

User

The user to connect to the database as. Defaults to calling the user() method.

DataDir

Meant to be a data directory to keep config info in, to be used in the env2db method in any way you see fit.

Password

The password to use to login to the database. Default is to call the passwd method.

RetryCount

Will retry connection to the database this many times.

RetryMinutes

Will wait this many minutes before trying to connect to the database again.

BulkLogin

For Sybase, enables the use of the blk_prepare method on the utility handle for bulk inserts (i.e. the syb_bcp_attribs attribute on insert statements).

NoBlankNull

For SybaseIQ, when loading a file via bcp_in, will not convert blank columns to null values.

Dsl

A string or arrayref of connect options to use as the dsl connect string instead of using the Server or Database after 'dbi:$db_type:'

DslOptions

A string or arrayref of connect options to add to the dsl connect string. E.g.:

DBIx::BulkUtil->connect(
  Server => $server,
  DslOptions => [ 'interfaces=/my/interfaces_file', 'port=1234' ],
);

Will result in the connect string:

'dbi:Sybase:server=<server_name>;interfaces=/my/interfaces_file;port=1234'
NoServer

For Sybase, will not add the 'server=...' argument to the DSL connect string. For Oracle will not add the database name to the DSL string. The DslOptions option is necessary if this option is used.

connect_cached()

Same as connect method, but calls the DBI connect_cached method to make the actual database connection, and will return the same database handle previously returned for the same database, user, and DBI options.

syb_connect()

Same as connect method, but calls the DBIx::BulkUtil connect method with the Type option set to 'Sybase'

ora_connect()

Same as connect method, but calls the DBIx::BulkUtil connect method with the Type option set to 'Oracle'

ora_connect_cached()

Same as connect method, but calls the DBIx::BulkUtil connect_cached method with the Type option set to 'Oracle'

iq_connect()

Same as connect method, but calls the DBIx::BulkUtil connect method with the Type option set to 'SybaseIQ'

iq_connect_cached()

Same as connect method, but calls the DBIx::BulkUtil connect_cached method with the Type option set to 'SybaseIQ'

passwd()

Dummy function to override for determining password.

UTILITY OBJECT METHODS

Methods that may be called on the utility object that is optionally returned from the connect or connect_cached DBIX::BulkUtil class methods. These methods provide convenience and/or make some operations between Oracle and Sybase databases more transparent.

now

Returns sql that will return the current date/time of the database (e.g. to be used as a column in a select statement).

add

Returns sql that will add some unit of time to a datetime expression. E.g. $util->add($util->now(), 10, 'hour') adds 10 hours to the current time.

row_select

Given just a select clause (the part after the "SELECT" keyword), returns sql to select a row from no table (e.g. for fetching the current time from the database).

get

Fetches the row from a select clause with no table. E.g. $ect_util->get($ect_util->now()) returns the current database date/time.

obj_type

Returns T/V/P depending on whether the given object is a Table, View, or Procedure.

sp_sql

Returns sql to execute a given stored procedure with arguments. If one of the arguments is ":cursor", then for Sybase it is filtered out, for Oracle we assume it is a parameter name and not a literal string to be bound.

Sybase stored procedures can return multiple result sets, and also a list of output parameters. Oracle does not return result sets, but you can pass in a cursor as an output parameter. When you pass in a parameter ":cursor", we assume its an output parameter that will hold a statement handle, so you can return a single result set in a nearly "backwards compatible" way. But we don't handle "multiple" result sets (yet), we don't deal with other output parameters, and so this this method is not meant to be completely transparent for all stored procedures.

sp_sth

Prepares and executes a stored procedure with arguments, and returns a statement handle for fetching. If one of the arguments is ":cursor", then we assume for Oracle it is a cursor type output parameter, and the statement handle for the cursor is returned. For Sybase, we ignore any ":cursor" argument.

bcp_in

For Sybase, uses BCP, for Oracle, SQL Loader, to load a pipe-delimited file into a database table. If the last argument is a hash reference, then additional options may be specified. Current options are:

Delimiter

Specifies the delimiter in the bcp file (default: "|").

RowDelimiter

Specifies the record terminator in the bcp file (default: "\n").

For bcp_in, the number of rows to ignore at the start of the file. For bcp_out, if true, the first row will be the column names of the table.

DirectPath

For Oracle only, if true, does Direct path instead of conventional load. If value is 'P', also does parallel load. For parallel loads, indexes are not rebuilt after the load.

Constants

(Oracle and SybaseIQ only). A hashref of column names and constant values to set the columns to which are not in the file.

FieldRef

(Oracle only). A hashref of column names and sqlldr expressions to specify the value of the column. If the expression includes the column itself (e.g. ':column_name'), then the field will appear in the same position in the control file corresponding to its position in the table. If it does not (e.g. "to_date('2014-02-01','YYYY-MM-DD')"), then the column appears at the end of the control file field list (i.e. it assumes the column is not in the file).

Also, if the expression begins with "~", then assume the expression is position information for a fixed width file.

Filler

Generally used with the ColumnList option, a list of column names in the file which are filler and not loaded into the database.

Default

(SybaseIQ only). A reference to an array of column names not in the file, which will be set to their default values.

Stdin

(Oracle only). A file handle, subroutine, or reference to a scalar to supply data to sqlldr through stdin. If it is a subroutine, return values will be used as input until it returns undef. If this option is used, and none of the files supplied to bcp_in is named '-', then '-' is automatically added to the list. If one of the files is named '-', and this option is not used, then this option is assumed to be the *STDIN filehandle.

TrailingDelimiter

(SybaseIQ only). Boolean flag which indicates that the last column of each record has a trailing column delimiter.

DateFormat

(Oracle only). Sets the default date format mask for sqlldr. By default, the process will try to determine the date format for each date column from the input file. If the format can not be determined, then this format will be used.

SybaseDateFmt

(Oracle only). Sets the default DateFormat to 'MON DD YYYY HH12:MI:SS:FF3AM'.

QuoteFields

(Oracle only). Allows fields in bcp file to be quoted, thereby allowing delimiters within the field.

NLSLang

(Oracle only). Sets the characterset option set for sqlldr.

Semantics

(Oracle only). Sets the 'LENGTH SEMANTICS' in the control file. Allowable values are CHAR or BYTE.

Action

Sets sqlldr mode to APPEND, REPLACE, or TRUNCATE (valid values are A, R, or T, default is A). Simulates same thing for Sybase through truncate or delete sql statements for T and R. Replace does not replace individual rows, it deletes all rows first.

SybaseTypes

(Oracle only). An array of the Sybase data types being loaded. When the type is 'char' (not 'varchar'), then PRESERVE BLANKS is added in the control file for char(1) columns, and trim logic for char > 1 columns.

Debug

(Oracle only). Displays the sqlldr command line executed, and does not remove the sqlldr control, log, and bad record files.

NoExec

(Oracle only). Displays and returns but does not execute the sqlldr command line that would be executed.

CommitSize

The number of rows loaded before committing each batch (default: 1000).

MaxErrors

The maximum number of errors allowed before aborting the load (default: 0).

LoadWhen

(Oracle only). Adds this text to a WHEN clause in the sqlldr control file which determines which rows in the data file are loaded.

ColumnList

List of ordered column names in bcp file.

PacketSize

(Sybase only). Sets network packet size for bcp.

PassThru

(Sybase only). Allows an arbitrary list of arguments to be passed to the bcp command line.

If the file is not provided, it is assumed to be the table name with a ".bcp" extension.

Sybase bcp is broken. If you have delimiter characters in your data, there is no way to escape them. If your fields are quoted as in csv files, Sybase bcp will complain. For bcp_in, unquote fields and convert your file to a format with a new delimiter that does not appear in your data. For bcp_out, choose a delimiter that does not appear in your data.

bcp_out

For Sybase uses BCP, for Oracle, just select and print (Oracle has no "bcp out" type functionality) to export a database table to a file. See bcp_in for options.

If the file is not provided, it is assumed to be the table name with a ".bcp" extension.

For Sybase, if there are any money columns, or if the Filter option is used, then a view is temporarily created to bcp from. Money columns are converted to decimal so that they are not truncated.

Sybase bcp_out is broken. It does not escape delimiter characters. If you have delimiter characters in your data, you can call the select2file method, although bcp_in will not load the resulting file. See bcp_in.

If the last argument is a hash reference, then additional options may be specified. Current options are:

Delimiter

Same as bcp_in.

RowDelimiter

Same as bcp_in.

Header

Same as bcp_in.

NoFix

When using Sybase native bcp out, the default is to transform the dates into ANSI standard format (which historically used to be the only reason to use this library). This option, if true, disables that transformation, and can save time on large transfers.

Filter

Appends additional SQL clauses to the SELECT * statement, e.g. "WHERE asof_date > '2011-01-01'".

Columns

Comma separated list of columns to select from table.

blk_prepare

Prepares an insert statement for bulk insert into a table. For Sybase, the BulkLogin option must have been supplied with a true value on connect, or else this method will fail. Returns a statement handle that will insert arguments into the table. E.g.:

my $sth = $db_util->blk_prepare('some_table');
while (<FH>) {
  chomp;
  my @data = split /,/;
  $sth->execute(@data);
}
$sth->finish();

Inserts are batched, and so the finish() method must be called to commit the final batch. The execute() method must be called with a list of arguments corresponding to the list of columns from the table (excluding any columns in the Constant option below). The first argument is the table name. The following optional arguments are key/value pairs with the following keys:

Constants

A hash reference of column name and constant value pairs that will be inserted on every execute call. Values for these columns should not be included in the list of arguments in the call to execute().

CommitSize

The number of inserts per batch (default: 1000).

BlkOpts

A hash reference of options to pass to the Sybase syb_bcp_attribs options. Needed if inserting to identity columns. See DBD::Sybase for these options.

bcp_sql

(Sybase only). Given a table name and a sql statement, uses sqsh to execute a sql statement and bcp the results into a table.

select2file

Calls the non-Sybase version of bcp_out which just selects from a table and saves to a delimited file. Accepts an optional hashref as the last argument with the same options as bcp_out. Also accepts the option Filter which appends additional SQL clauses to the SELECT * statement. Include any additional keywords (e.g. "WHERE", "GROUP BY", etc.) in the Filter option.

Returns the number of rows selected.

bcp_file

Modifies a bcp_file that contains a header row. Arguments are ($input_file, output_file, {%options}), with available options KeepCols and DropCols. KeepCols is a list of column names to keep from the input file (and the order in which they will appear in the output file), DropCols is a list of columns to drop from the input file. KeepCols overrides DropCols. Also accepts the following same options as bcp_out: Delimiter.

merge

Merges data from a staging table into a target table. For Oracle, issues a MERGE statement, for Sybase, it deletes from the target table corresponding rows from the staging table, then inserts records from the staging to the target table. Columns last_chg_user and last_chg_date are appropriately updated in the target table. Accepts a hash with the following keys as its argument:

Table

The target table.

StgTable

The staging table.

KeyCols

A list of the key columns in the target table. Defaults to the columns in the first unique index found on the table.

UpdCols

A list of the columns to update in the target table. Defaults to all columns in the target table not in the list of key columns. Ignored in Sybase since rows are deleted and inserted, not updated.

ColMap

A hashref of target to staging table column name mappings.

NoExec

Display but do not execute SQL.

column_info

Given a table name, returns a hash reference with keys LIST and MAP. LIST will contain a list of all columns in the table lowercased. MAP contains a hash reference with the lowercase column names as keys, and the value is a hash reference with keys NAME and PRECISION. NAME is the column name in the actual (upper/lower/mixed) case in the database, and PRECISION is the size of the column.

index_info

Given a table name, returns a hash reference of the names of any indexes on the table and an array reference of the column names in the index. If the optional second argument is true, returns all indexes, otherwise returns only unique indexes.

key_columns

Given a table name, returns a reference to an array of column names that are in the primary_key, or if no primary_key exists, returns the columns in the first unique index found on the table.

upd_columns

Given a table name, returns a reference to an array of all of the column names in a table that are not the key columns of a table.

primary_key

Returns an array reference of the column names in the primary key of a table.

strptime_fmt

Given a date string, returns a template suitable for passing to strptime. Returns an undefined value if the format can not be determined.

ora_date_fmt

Given a date string, returns an Oracle date format string. Returns an undefined value if the format can not be determined.

prepare

Prepares a SQL statement, optionally binds a reference to a hash or array to its input parameters, and returns a statement handle.

Example:

my $sth = $dbu->prepare(
  Table => 'eqa_own.some_table',
  Columns => [qw(column1 column2)]
  BindHash => \my %href,
);
$href{column1} = 'Col1Value';

$href{column2} = 'Col2Value1';
$sth->execute();

$href{column2} = 'Col2Value2';
$sth->execute();

Accepts as arguments a hash with the following keys:

Table

Table name. If provided, will construct an insert statement for this table. (Required if Sql not provided).

Sql

Sql statement. Prepares this SQL statement. (Required if Table not provided).

Columns

List of column names. If Table is provided, must be names of columns in the table. (default: all columns in Table if Table is provided).

BindHash

(Oracle only). Reference to hash. Placeholders in SQL statement will be bound to this hash reference. (May not be used with BindArray).

BindArray

(Oracle only). Reference to array. Placeholders in SQL statement will be bound to this array reference. (May not be used with BindHash).

ByName

If true, use ":column_name" type placeholders in SQL statement and in binding to hash or array. If false (but defined), use "?" as placeholders. "?" placeholders may not be used with BindHash or BindArray. (default: Oracle true, Sybase false).