NAME
EAI::Wrap - framework for easy creation of Enterprise Application Integration tasks
SYNOPSIS
  # site.config
  %config = (
    sensitive => {
      myftp => {user => 'someone', pwd => 'password', privKey => 'pathToPrivateKey', hostkey => 'hostkey to be presented'},
      mydb => {user => 'someone', pwd => 'password'}
    },
    checkLookup => {"task_script.pl" => {errmailaddress => "test\@test.com", errmailsubject => "testjob failed", timeToCheck => "0800", freqToCheck => "B", logFileToCheck => "test.log", logcheck => "started.*"}},
    folderEnvironmentMapping => {Test => "Test", Dev => "Dev", "" => "Prod"},
    errmailaddress => 'To@somewhere.com',
    errmailsubject => "errMailSubject",
    fromaddress => 'from@somewhere.com',
    smtpServer => "MailServer",
    smtpTimeout => 60,
    logRootPath => "C:/dev/EAI/Logs",
    historyFolder => "History",
    redoDir => "redo",
    task => {
      retrySecondsErr => 60*5,
      retrySecondsPlanned => 60*15,
    },
    DB => {
      server => {Prod => "ProdServer", Test => "TestServer"},
      cutoffYr2000 => 60,
      DSN => 'driver={SQL Server};Server=$DB->{server}{$execute{env}};database=$DB->{database};TrustedConnection=Yes;',
      schemaName => "dbo",
    },
    FTP => {
      maxConnectionTries => 5,
      plinkInstallationPath => "C:/dev/EAI/putty/PLINK.EXE",
    },
    File => {
      format_thousandsep => ",",
      format_decimalsep => ".",
    }
  );

  # task_script.pl
  use EAI::Wrap;
  %common = (
    FTP => {
      remoteHost => {"Prod" => "ftp.com", "Test" => "ftp-test.com"},
      remoteDir => "/reports",
      port => 22,
      user => "myuser",
      privKey => 'C:/keystore/my_private_key.ppk',
      FTPdebugLevel => 0, # ~(1|2|4|8|16|1024|2048)
    },
    DB => {
      tablename => "ValueTable",
      deleteBeforeInsertSelector => "rptDate = ?",
      dontWarnOnNotExistingFields => 1,
      database => "DWH",
    },
    task => {
      plannedUntil => "2359",
    },
  );
  @loads = (
    {
      File => {
        filename => "Datafile1.XML",
        format_XML => 1,
        format_sep => ',',
        format_xpathRecordLevel => '//reportGrp/CM1/*',
        format_fieldXpath => {rptDate => '//rptHdr/rptDat', NotionalVal => 'NotionalVal', tradeRef => 'tradeRefId', UTI => 'UTI'},
        format_header => "rptDate,NotionalVal,tradeRef,UTI",
      },
    },
    {
      File => {
        filename => "Datafile2.txt",
        format_sep => "\t",
        format_skip => 1,
        format_header => "rptDate NotionalVal tradeRef UTI",
      },
    }
  );
  setupEAIWrap();
  openDBConn(\%common) or die;
  openFTPConn(\%common) or die;
  while (!$execute{processEnd}) {
    for my $load (@loads) {
      getFilesFromFTP($load);
      if (checkFiles($load)) {
        readFileData($load);
        dumpDataIntoDB($load);
        markProcessed($load);
      }
    }
    processingEnd();
  }
DESCRIPTION
EAI::Wrap provides a framework for defining EAI jobs directly in Perl, sparing the creator from low-level tasks such as FTP fetching, file parsing and storing into a database. It can also be used to handle other workflows, like creating files from the database and uploading them to FTP servers, or invoking other externally provided tools.
The definition is done by first setting up configuration hashes and then providing a high-level scripting of the job itself using the provided API (although any perl code is welcome here!).
EAI::Wrap has a lot of infrastructure already included, like logging using Log4perl, database handling with DBI and DBD::ODBC, FTP services using Net::SFTP::Foreign, file parsing using Text::CSV (text files), Data::XLSX::Parser and Spreadsheet::ParseExcel (Excel files) and XML::LibXML (XML files), and file writing with Spreadsheet::WriteExcel and Excel::Writer::XLSX (Excel files) as well as Text::CSV (text files).
Furthermore, it provides very flexible commandline options, allowing almost all configurations to be set on the commandline. Commandline options (e.g. additional information passed on with the interactive option) of the task script are fetched at INIT, allowing the use of options within the configuration, e.g. $opt{process}{interactive_startdate} for a passed start date, as sketched below.
The logging configured in $ENV{EAI_WRAP_CONFIG_PATH}/log.config (with the logfile root path set in $ENV{EAI_WRAP_CONFIG_PATH}/site.config) also starts immediately at INIT of the task script; to use a logger, simply make a call to get_logger(). For the logging configuration, see EAI::Common, setupLogging.
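A minimal sketch of using such an option inside the configuration; the option name interactive_startdate, the table and the query are illustrative only. Invoked e.g. as: perl task_script.pl --process interactive_startdate=20230101

  use EAI::Wrap;
  # %opt is already filled at INIT of EAI::Wrap, so it can be used in the definitions
  %common = (
    DB => {
      additionalLookup => "select tradeRef, UTI from ValueTable where rptDate >= '$opt{process}{interactive_startdate}'",
    },
  );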
API
- %config
  global config (set in $ENV{EAI_WRAP_CONFIG_PATH}/site.config, amended with $ENV{EAI_WRAP_CONFIG_PATH}/additional/*.config), contains special parameters (default error mail sending, logging paths, etc.) and site-wide pre-settings for the five configuration categories in task scripts, described below under "configuration categories"
- %common
  common configs for the task script; may contain one configuration hash for each configuration category.
- @loads
  list of hashes defining specific load processes within the task script. Each hash may contain one configuration hash for each configuration category.
- configuration categories
  The hashes mentioned above can contain five categories (sub-hashes): DB, File, FTP, process and task. These allow further parameters to be set for the respective parts of EAI::Wrap (EAI::DB, EAI::File and EAI::FTP), as well as process parameters and task parameters. The parameters are described in detail in section CONFIGURATION REFERENCE.
  The process category is used on the one hand to pass information within each process (data, additionalLookupData, filenames, hadErrors or custom commandline parameters starting with interactive), on the other hand for additional configurations not suitable for DB, File or FTP (e.g. uploadCMD). The task category contains parameters used on the task script level and is therefore only allowed in %config and %common; it contains parameters for skipping, retrying and redoing the whole task script.
  The settings in DB, File, FTP and task are "merge" inherited in a cascading manner (i.e. missing parameters are merged, parameters already set below are not overwritten):

    %config (defined in site.config and other associated configs loaded at INIT)
      merged into -> %common (common task parameters defined in the script)
        merged into each of -> $loads[]

  Special config parameters and DB, FTP, File, task parameters from command line options are merged at the respective level (config at the top, the rest at the bottom) and always override any set parameters. Only scalar parameters can be given on the command line, no lists and hashes are possible. Commandline options are given in the format:

    --<category> <parameter>=<value>

  for the common level and

    --load<i><category> <parameter>=<value>

  for the loads level.
  Command line options are also available to the script via the hash %opt or the list of hashes @optloads. So, in order to access the cmdline option --process interactive_date=20230101, you could use either $common{process}{interactive_date} or $opt{process}{interactive_date}; in order to use --load1process interactive_date=20230101, you would use $loads[1]{process}{interactive_date} or $optloads[1]{process}{interactive_date}.
  The merge inheritance for DB, FTP, File and task can be prevented by using an underscore after the hash key, i.e. DB_, FTP_, File_ and task_. In this case the parameters are not merged from common; however, they are always inherited from config. A minimal merge example is shown below.
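To make the cascading merge concrete, consider this hypothetical fragment (table and database names are placeholders):

  # %common sets DB defaults for all loads
  %common = (
    DB => {tablename => "ValueTable", database => "DWH"},
  );
  @loads = (
    # load 0: tablename is overridden, database => "DWH" is merged in
    {DB  => {tablename => "OtherTable"}},
    # load 1: the underscore prevents merging from %common{DB},
    # but config-level DB settings are still inherited
    {DB_ => {tablename => "ThirdTable", database => "DWH2"}},
  );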
- %execute
  hash of parameters for the current task execution; these are not set by the user but can be used to set other parameters and to control the flow. Most important here are $execute{env}, giving the currently used environment (Prod, Test, Dev, whatever), and $execute{envraw} (Production is the empty string here); further, there are the several file lists (files being processed, for deletion, moving, etc.), flags for ending/interrupting processing, directory locations such as home and history, etc.
  Detailed information about the several parameters used can be found in section execute of the configuration parameter reference; there are parameters for files (filesProcessed, filesToArchive, filesToDelete, filesToMoveinHistory, filesToMoveinHistoryUpload, filesToRemove and retrievedFiles), directories (homedir, historyFolder, historyFolderUpload and redoDir) and process controlling parameters (failcount, firstRunSuccess, retryBecauseOfError, retrySeconds and processEnd).
  Retrying by querying $execute{processEnd} can happen for two reasons: first, because task => {plannedUntil => "HHMM"} is set to a time until which the task has to be retried (however, this is done at most until midnight); second, because an error occurred, in which case $process->{hadErrors} is set on each load that failed. $execute{retryBecauseOfError} is also important in this context, as it prevents the repeated run of the following API procedures if the process didn't have an error: getLocalFiles, getFilesFromFTP, getFiles, checkFiles, extractArchives, getAdditionalDBData, readFileData, dumpDataIntoDB, writeFileFromDB, putFileInLocalDir, uploadFileToFTP, uploadFileCMD and uploadFile.
  After the first successful run of the task, $execute{firstRunSuccess} is set to prevent any error messages resulting from files having been moved/removed while rerunning the task until the defined planned time (task => {plannedUntil => "HHMM"}) has been reached. A sketch of using %execute in configurations follows below.
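Since %execute is populated at INIT, environment information can be used directly when building the configuration. A minimal sketch, assuming the remote directories /reports and /reports_test exist:

  use EAI::Wrap;
  %common = (
    FTP => {
      remoteHost => {Prod => "ftp.com", Test => "ftp-test.com"},
      # $execute{env} is resolved at INIT via folderEnvironmentMapping
      remoteDir => ($execute{env} eq "Prod" ? "/reports" : "/reports_test"),
    },
  );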
- INIT
  The INIT procedure is executed at the EAI::Wrap module initialization (i.e. when EAI::Wrap is used in the task script) and loads the site configuration, starts logging and reads commandline options. This means that everything passed to the script via command line may be used in the definitions, especially the process{interactive.*} parameters; the name and the type of these parameters are not checked by the consistency checks (all other parameters not allowed or having the wrong type would throw an error).
- removeFilesinFolderOlderX
  remove files on the FTP server that are older than a given time back (given in day/mon/year in remove => {removeFolders => ["",""], day =>, mon =>, year => 1}), see EAI::FTP::removeFilesOlderX
- openDBConn ($)
  argument $arg (ref to current load or common). Opens a DB connection with the information provided in $DB->{user} and $DB->{pwd} (these can be provided by the sensitive information looked up using $DB->{prefix}) and $DB->{DSN}, which can be dynamically configured using information from $DB itself, e.g. $execute{env} inside $DB->{server}{*}: 'driver={SQL Server};Server=$DB->{server}{$execute{env}};database=$DB->{database};TrustedConnection=Yes;'. Also see EAI::DB::newDBH. A configuration sketch follows below.
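A minimal sketch of the pieces involved, assuming the prefix mydb is defined under sensitive in site.config (as in the SYNOPSIS):

  # site.config keeps the credentials out of the task script:
  #   sensitive => {mydb => {user => 'someone', pwd => 'password'}},

  # task script: prefix selects the credentials, the DSN is evaluated dynamically
  %common = (
    DB => {
      prefix   => "mydb",
      database => "DWH",
      DSN      => 'driver={SQL Server};Server=$DB->{server}{$execute{env}};database=$DB->{database};TrustedConnection=Yes;',
    },
  );
  setupEAIWrap();
  openDBConn(\%common) or die "DB connection failed";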
- openFTPConn ($)
  argument $arg (ref to current load or common). Opens an FTP connection with the information provided in $FTP->{remoteHost}, $FTP->{user}, $FTP->{pwd}, $FTP->{hostkey} and $FTP->{privKey} (the latter four can be provided by the sensitive information looked up using $FTP->{prefix}) and $execute{env}. Also see EAI::FTP::login.
- redoFiles ($)
  argument $arg (ref to current load or common). Redoes files from the redo directory if specified ($common{task}{redoFile} is being set); this is also called by getLocalFiles and getFilesFromFTP. Arguments are fetched from common or loads[i], using the File parameter.
- getLocalFiles ($)
  argument $arg (ref to current load or common). Gets local file(s) from the source into the homedir; uses $File->{filename}, $File->{extension} and $File->{avoidRenameForRedo}. Arguments are fetched from common or loads[i], using the File parameter.
- getFilesFromFTP ($)
  argument $arg (ref to current load or common). Gets file(s) (can also be a glob for multiple files) from FTP into the homedir and extracts archives if needed. Arguments are fetched from common or loads[i], using the File and FTP parameters.
- getFiles ($)
  argument $arg (ref to current load or common). Combines the above two procedures in a general procedure to get files from FTP or locally. Arguments are fetched from common or loads[i], using the File and FTP parameters.
  All get*Files* functions also parse the file into the datastructure process{data}. Custom "hooks" can be defined with fieldCode and lineCode to modify and enhance the standard mapping defined in format_header. To access the final line data, the hash %EAI::File::line can be used (specific fields with $EAI::File::line{<target header column>}). If a field is being replaced using a different name from the target header, the data with the original header name is placed in %EAI::File::templine. You can also access data from the previous line with %EAI::File::previousline and the previous temp line with %EAI::File::previoustempline. A sketch of such hooks follows below.
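A sketch of these hooks, assuming a tab-separated file with rptDate and NotionalVal columns (file and field names are illustrative only):

  %common = (
    File => {
      filename => "Datafile2.txt",
      format_sep => "\t",
      format_header => "rptDate\tNotionalVal",
      fieldCode => {
        # normalize the date separator in the rptDate field
        rptDate => '$EAI::File::line{rptDate} =~ s/\./-/g;',
        # skip lines without a notional value
        NotionalVal => '$EAI::File::skipLineAssignment = 1 if !$EAI::File::line{NotionalVal};',
      },
      # line hook, invoked after the whole line has been read:
      # carry the date forward from the previous line if missing
      lineCode => '$EAI::File::line{rptDate} = $EAI::File::previousline{rptDate} if !$EAI::File::line{rptDate};',
    },
  );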
- checkFiles ($)
  argument $arg (ref to current load or common). Checks files for continuation of processing and extracts archives if needed. Arguments are fetched from common or loads[i], using the File parameter. The processed files are put into process->{filenames}.
- extractArchives ($)
  argument $arg (ref to current load or common). Extracts files from an archive. Arguments are fetched from common or loads[i], using only the process->{filenames} parameter that was filled by checkFiles.
- getAdditionalDBData ($;$)
  arguments $arg (ref to current load or common) and optional $refToDataHash. Gets additional data from the DB. Arguments are fetched from common or loads[i], using the DB and process parameters. You can also pass an optional ref to a data hash parameter to store the retrieved data there instead of in $process->{additionalLookupData}.
- readFileData ($)
  argument $arg (ref to current load or common). Reads data from a file. Arguments are fetched from common or loads[i], using the File parameter.
- dumpDataIntoDB ($)
  argument $arg (ref to current load or common). Stores data into the database. Arguments are fetched from common or loads[i], using the DB and File (for emptyOK) parameters.
- markProcessed ($)
  argument $arg (ref to current load or common). Marks files as being processed depending on whether there were errors; also decides on removal/archiving of downloaded files. Arguments are fetched from common or loads[i], using the File parameter.
- writeFileFromDB ($)
  argument $arg (ref to current load or common). Creates data files from the database. Arguments are fetched from common or loads[i], using the DB and File parameters.
- putFileInLocalDir ($)
  argument $arg (ref to current load or common). Puts files into a local folder if required. Arguments are fetched from common or loads[i], using the File parameter.
- markForHistoryDelete ($)
  argument $arg (ref to current load or common). Marks files to be removed or to be moved to history after upload. Arguments are fetched from common or loads[i], using the File parameter.
- uploadFileToFTP ($)
  argument $arg (ref to current load or common). Uploads files to FTP. Arguments are fetched from common or loads[i], using the FTP and File parameters.
- uploadFileCMD ($)
  argument $arg (ref to current load or common). Uploads files using an upload command program. Arguments are fetched from common or loads[i], using the File and process parameters.
- uploadFile ($)
  argument $arg (ref to current load or common). Combines the above two procedures in a general procedure to upload files via FTP or CMD, or to put them into a local dir. Arguments are fetched from common or loads[i], using the File and process parameters. A sketch of the upload direction follows below.
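For the opposite direction mentioned in the DESCRIPTION (creating a file from the database and uploading it), a task script might look like the following sketch; the table, query, host and file names are placeholders, and the exact File parameters used by writeFileFromDB (e.g. columns) are described in the CONFIGURATION REFERENCE:

  use EAI::Wrap;
  %common = (
    DB   => {database => "DWH", query => "select rptDate, NotionalVal from ValueTable"},
    FTP  => {remoteHost => {Prod => "ftp.com", Test => "ftp-test.com"}, remoteDir => "/upload"},
    File => {filename => "Valuedata.csv", format_sep => ",", format_header => "rptDate,NotionalVal"},
  );
  setupEAIWrap();
  openDBConn(\%common) or die;
  openFTPConn(\%common) or die;
  writeFileFromDB(\%common);
  uploadFileToFTP(\%common);
  processingEnd();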
- processingEnd
  final processing steps for processEnd (cleanup, FTP removal/archiving) or retry after pausing. No context argument, as this always depends on all loads and/or the common definition.
- processingPause ($)
  generally available procedure for pausing processing; the argument $pauseSeconds gives the delay.
- moveFilesToHistory (;$)
  optional argument $archiveTimestamp. Moves transferred files marked for moving (filesToMoveinHistory/filesToMoveinHistoryUpload) into the history and/or historyUpload folder. Optionally, a custom timestamp can be passed.
- deleteFiles ($)
  argument $filenames, ref to array. Deletes the transferred files given in $filenames.
CONFIGURATION REFERENCE
- config
  parameter category for site-global settings, defined in site.config and other associated configs loaded at INIT
  - checkLookup
    ref to datastructure {"scriptname.pl" => {errmailaddress => "", errmailsubject => "", timeToCheck => "", freqToCheck => "", logFileToCheck => "", logcheck => "", logRootPath => ""}, ...} used for the logchecker; each entry of the hash lookup table defines a log to be checked, defining errmailaddress to receive error mails, errmailsubject, timeToCheck as the earliest time to check for existence in the log, freqToCheck as the frequency of checks (daily/monthly/etc.), logFileToCheck as the name of the logfile to check, logcheck as the regex to check in the logfile and logRootPath as the folder where the logfile is found. lookup key: $execute{scriptname} + $execute{addToScriptName}
  - errmailaddress
    default mail address for central logcheck/errmail sending
  - errmailsubject
    default mail subject for central logcheck/errmail sending
  - executeOnInit
    code to be executed during INIT of EAI::Wrap, to allow for the assignment of config/execute parameters from commandline params BEFORE logging starts!
  - folderEnvironmentMapping
    ref to hash {Test => "Test", Dev => "Dev", "" => "Prod"}, mapping for $execute{envraw} to $execute{env}
  - fromaddress
    from address for central logcheck/errmail sending, also used as the default sender address for sendGeneralMail
  - historyFolder
    ref to hash {"scriptname.pl" => "folder"}, folders where downloaded files are historized; lookup key as in checkLookup, default in "" => "defaultfolder"
  - historyFolderUpload
    ref to hash {"scriptname.pl" => "folder"}, folders where uploaded files are historized; lookup key as in checkLookup, default in "" => "defaultfolder"
  - logCheckHoliday
    calendar for business days in central logcheck/errmail sending. Builtin calendars are AT (Austria), TG (Target), UK (United Kingdom) and WE (for only weekends). Calendars can be added with EAI::DateUtil::addCalendar
  - logs_to_be_ignored_in_nonprod
    logs to be ignored in central logcheck/errmail sending
  - logRootPath
    ref to hash {"scriptname.pl" => "folder"}, paths to log file root folders (the environment is added to that if non-production); lookup key as in checkLookup, default in "" => "defaultfolder"
  - redoDir
    ref to hash {"scriptname.pl" => "folder"}, folders where files for redo are contained; lookup key as in checkLookup, default in "" => "defaultfolder"
  - sensitive
    hash lookup table ({"prefix" => {user => "", pwd => "", hostkey => "", privKey => ""}, ...}) for sensitive access information in DB and FTP (lookup keys are set with DB{prefix} or FTP{prefix}); may also be placed outside of site.config. All sensitive keys can also be environment lookups, e.g. hostkey => {Test => "", Prod => ""}, to allow for environment-specific settings
  - smtpServer
    SMTP server for (error) mail sending
  - smtpTimeout
    timeout for SMTP response
  - testerrmailaddress
    error mail address in non-prod environments
- execute
  hash of parameters for the current task execution; these are not set by the user but can be used to set other parameters and to control the flow
  - alreadyMovedOrDeleted
    hash for checking the already moved or deleted files, to avoid moving/deleting them again at cleanup
  - addToScriptName
    can be set to be added to the scriptname for config{checkLookup} keys, e.g. some passed parameter
  - env
    Prod, Test, Dev, whatever
  - envraw
    Production has a special significance here, being the empty string (used for paths). Otherwise like env.
  - errmailaddress
    for central logcheck/errmail sending in the current process
  - errmailsubject
    for central logcheck/errmail sending in the current process
  - failcount
    for counting failures in processing, to switch to a longer wait period or to finish altogether
  - filesToArchive
    list of files to be moved to archiveDir on the FTP server, necessary for cleanup at the end of the process
  - filesToDelete
    list of files to be deleted on the FTP server, necessary for cleanup at the end of the process
  - filesToMoveinHistory
    list of files to be moved to historyFolder locally, necessary for cleanup at the end of the process
  - filesToMoveinHistoryUpload
    list of files to be moved to historyFolderUpload locally, necessary for cleanup at the end of the process
  - filesToRemove
    list of files to be deleted locally, necessary for cleanup at the end of the process
  - firstRunSuccess
    for planned retries (task => plannedUntil filled): this is set after the first run to avoid error messages resulting from files having been moved/removed
  - freqToCheck
    for logchecker: frequency to check entries (B, D, M, M1) ...
  - homedir
    the home folder of the script, mostly used to return from redo and other folders when globbing files
  - historyFolder
    actually set historyFolder
  - historyFolderUpload
    actually set historyFolderUpload
  - logcheck
    for logchecker: the logcheck (regex)
  - logFileToCheck
    for logchecker: logfile to be searched
  - logRootPath
    actually set logRootPath
  - processEnd
    specifies that the process is ended, checked in EAI::Wrap::processingEnd
  - redoDir
    actually set redoDir
  - retrievedFiles
    files retrieved from the FTP or redo directory
  - retryBecauseOfError
    shows whether a rerun occurs due to errors (for successMail) and also prevents several API calls from being run again
  - retrySeconds
    how many seconds pass between retries. This is set on error with task => retrySecondsErr, and if a planned retry is defined, with task => retrySecondsPlanned
  - scriptname
    name of the current process script, also used in log/history setup together with addToScriptName for config{checkLookup} keys
  - timeToCheck
    for logchecker: scheduled time of the job (don't look earlier for log entries)
- DB
  DB-specific configs
  - addID
    this hash can be used to additionally set a constant to given fields: Fieldname => Fieldvalue
  - additionalLookup
    query used in getAdditionalDBData to retrieve lookup information from the DB using readFromDBHash
  - additionalLookupKeys
    used for getAdditionalDBData, list of field names to be used as the keys of the returned hash
  - cutoffYr2000
    when storing date data with 2-digit years in dumpDataIntoDB/storeInDB, this is the cutoff where years are interpreted as 19XX (> cutoffYr2000) or 20XX (<= cutoffYr2000)
  - columnnames
    returned column names from readFromDB and readFromDBHash, used in writeFileFromDB to pass column information from the database to writeText
  - database
    database to be used for connecting
  - debugKeyIndicator
    used in dumpDataIntoDB/storeInDB as an indicator of keys for debugging information if primkey is not given (errors are shown with this key information). Format is the same as for primkey
  - deleteBeforeInsertSelector
    used in dumpDataIntoDB/storeInDB to delete specific data defined by key data before an insert (the first occurrence in the data is used for the key values). Format is the same as for primkey ("key1 = ? ...")
  - dontWarnOnNotExistingFields
    suppress warnings in dumpDataIntoDB/storeInDB for not existing fields
  - dontKeepContent
    if the table should be completely cleared before inserting data in dumpDataIntoDB/storeInDB
  - doUpdateBeforeInsert
    inverts the insert/update sequence in dumpDataIntoDB/storeInDB; the insert is only done when the upsert flag is set
  - DSN
    DSN string for the DB connection
  - incrementalStore
    when storing data with dumpDataIntoDB/storeInDB, avoid setting empty columns to NULL
  - ignoreDuplicateErrs
    ignore any duplicate errors in dumpDataIntoDB/storeInDB
  - keyfields
    used for readFromDBHash, list of field names to be used as the keys of the returned hash
  - longreadlen
    used for setting the database handle's LongReadLen parameter for the DB connection; if not set, defaults to 1024
  - noDBTransaction
    don't use a DB transaction for dumpDataIntoDB
  - noDumpIntoDB
    if files from this load should not be dumped into the database
  - postDumpExecs
    array of execs done in dumpDataIntoDB after postDumpProcessing and before commit/rollback: [{execs => ['',''], condition => ''}]. All execs are done in the DB (doInDB) if the condition (evaluated string or anonymous sub: condition => sub {...}) is fulfilled
  - postDumpProcessing
    done in dumpDataIntoDB after storeInDB; perl code executed in postDumpProcessing (evaluated string or anonymous sub: postDumpProcessing => sub {...})
  - postReadProcessing
    done in writeFileFromDB after readFromDB; perl code executed in postReadProcessing (evaluated string or anonymous sub: postReadProcessing => sub {...})
  - prefix
    key for sensitive information (e.g. pwd and user) in config{sensitive}
  - primkey
    primary key indicator to be used for update statements, format: "key1 = ? AND key2 = ? ..."
  - pwd
    for password setting, either directly (insecure -> visible) or via sensitive lookup
  - query
    query statement used for readFromDB and readFromDBHash
  - schemaName
    schema name used in dumpDataIntoDB/storeInDB; if tablename contains a dot, the schema extracted from tablename overrides this. Needed for datatype information!
  - server
    DB server in an environment hash lookup: {Prod => "", Test => ""}
  - tablename
    the table where data is stored in dumpDataIntoDB/storeInDB
  - upsert
    in dumpDataIntoDB/storeInDB: should an update be done after the insert failed (because of duplicate keys), or an insert after the update failed (because the key doesn't exist)?
  - user
    for user setting, either directly (insecure -> visible) or via sensitive lookup
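To make the interplay of primkey and upsert concrete, here is a hypothetical load definition (table and field names are illustrative only):

  @loads = (
    {
      DB => {
        tablename => "ValueTable",
        primkey   => "rptDate = ? AND tradeRef = ?",  # keys used for the update statement
        upsert    => 1,  # try insert first, update the row if the key already exists
      },
      File => {
        filename => "Datafile.txt",
        format_sep => "\t",
        format_header => "rptDate\tNotionalVal\ttradeRef\tUTI",
      },
    },
  );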
- File
  file parsing specific configs
  - avoidRenameForRedo
    when redoing, usually the cutoff (datetime/redo info) is removed following a pattern. Set this flag to avoid that
  - columns
    for writeText: hash of data fields that are to be written (in order of keys)
  - columnskip
    for writeText: boolean hash of column names that should be skipped when writing the file ({column1ToSkip => 1, column2ToSkip => 1, ...})
  - dontKeepHistory
    if the up- or downloaded file should not be moved into historyFolder but deleted
  - dontMoveIntoHistory
    if the up- or downloaded file should not be moved into historyFolder but kept in homedir
  - emptyOK
    flag to specify that empty files should not invoke an error message. Also needed to mark an empty file as processed in EAI::Wrap::markProcessed
  - extract
    flag to specify whether to extract files from an archive package (zip)
  - extension
    the extension of the file to be read (optional, used for redoFile)
  - fieldCode
    additional field-based processing code: fieldCode => {field1 => 'perl code', ..}, invoked if the key equals either the header (as in format_header) or the target header (as in format_targetheader), or invoked for all fields if the key is empty ({"" => 'perl code'}). Set $EAI::File::skipLineAssignment to true (1) if the current line should be skipped from the data. The perl code can be an evaluated string or an anonymous sub: field1 => sub {...}
  - filename
    the name of the file to be read
  - firstLineProc
    processing done when reading the first line of text files
  - format_allowLinefeedInData
    line feeds in values don't create artificial new lines/records; only works for quoted csv data
  - format_beforeHeader
    additional string to be written before the header in writeText
  - format_dateColumns
    numeric array of columns that contain date values (special parsing) in Excel files
  - format_decimalsep
    decimal separator used in numbers of the sourcefile (defaults to . if not given)
  - format_defaultsep
    default separator when format_sep is not given (usually set in site.config); if not given, "\t" is used as the default
  - format_encoding
    text encoding of the file in question (e.g. :encoding(utf8))
  - format_headerColumns
    optional numeric array of columns that contain data in Excel files (defaults to all columns, starting with the first column up to the length of format_targetheader)
  - format_header
    format_sep separated string containing the header fields (optional in Excel files, only used to check against an existing header row)
  - format_headerskip
    skip rows up to this row number when checking the header row against format_header in Excel files
  - format_eol
    for quoted csv, specify a special eol character (allowing newlines in values)
  - format_fieldXpath
    for XML reading, hash with field => xpath-to-content association entries
  - format_fix
    for text writing, specify whether fixed-length format should be used (requires format_padding)
  - format_namespaces
    for XML reading, hash with alias => namespace association entries
  - format_padding
    for text writing, hash with field number => padding to be applied for fixed-length format
  - format_poslen
    array of position/length definitions for fixed-length format text file parsing: e.g. "poslen => [(0,3),(3,3)]"
  - format_quotedcsv
    special parsing/writing of quoted csv data using Text::CSV
  - format_sep
    separator string for csv format, regex for split for other separated formats. Also needed for splitting up format_header and format_targetheader (Excel and XML formats use tab as the default separator here)
  - format_sepHead
    special separator for the header row in writeText, overrides format_sep
  - format_skip
    either numeric or a string; when reading a text file, skip rows up to the given row number if numeric, or until the appearance of the given string otherwise
  - format_stopOnEmptyValueColumn
    for Excel reading, stop row parsing when a cell with this column number is empty (denotes the end of data, to avoid very long parsing)
  - format_suppressHeader
    for text file writing, suppress output of the header
  - format_targetheader
    format_sep separated string containing the target header fields (= the field names in the target/database table). Optional for XML and tabular text files, defaults to format_header if not given there
  - format_thousandsep
    thousands separator used in numbers of the sourcefile (defaults to , if not given)
  - format_worksheetID
    worksheet number for Excel reading, this should always work
  - format_worksheet
    alternatively, the worksheet name can be passed; this only works for the new Excel format (xlsx)
  - format_xlformat
    Excel format for parsing, also specifies Excel parsing as such
  - format_xpathRecordLevel
    xpath of the level where the data nodes are located in the xml
  - format_XML
    specify xml parsing
  - lineCode
    additional line-based processing code, invoked after the whole line has been read (evaluated string or anonymous sub: lineCode => sub {...})
  - localFilesystemPath
    if files are taken from or put to the local file system with getLocalFiles/putFileInLocalDir, the path is given here. Setting this to "." avoids copying files
  - optional
    to avoid an error message for missing optional files, set this to 1
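As an illustration of the Excel-related settings, a hypothetical load for an xlsx file might look as follows; the value of format_xlformat is an assumption here (see EAI::File for the authoritative values), and the file/field names are placeholders:

  @loads = (
    {
      File => {
        filename => "Report.xlsx",
        format_xlformat => "xlsx",        # assumed value, enables Excel parsing
        format_worksheet => "Sheet1",     # select the worksheet by name (xlsx only)
        format_dateColumns => [1],        # column 1 contains date values
        format_stopOnEmptyValueColumn => 1,
        format_targetheader => "rptDate\tNotionalVal\ttradeRef\tUTI",
      },
    },
  );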
- FTP
  FTP-specific configs
  - archiveDir
    folder for archived files on the FTP server
  - dontMoveTempImmediately
    if 0 or missing: rename/move files immediately after writing to FTP to their final name; otherwise (1) a call to EAI::FTP::moveTempFiles is required for that
  - dontDoSetStat
    for Net::SFTP::Foreign, don't set the time stamp of the remote file to that of the local file (avoids error messages from FTP servers that don't support this)
  - dontDoUtime
    don't set the time stamp of the local file to that of the remote file
  - dontUseQuoteSystemForPwd
    on Windows, a special quoting is used for passing passwords to Net::SFTP::Foreign that contain [()"<>& . This flag can be used to disable this quoting.
  - dontUseTempFile
    directly upload files, without temp files
  - fileToArchive
    should the file be archived on the FTP server? Requires archiveDir to be set
  - fileToRemove
    should the file be removed on the FTP server?
  - FTPdebugLevel
    debug ftp: 0 or ~(1|2|4|8|16|1024|2048); the loglevel is automatically set to debug for module EAI::FTP
  - hostkey
    hostkey to present to the server for Net::SFTP::Foreign, either directly (insecure -> visible) or via sensitive lookup
  - localDir
    optional: local folder for files to be placed; if not given, files are downloaded into the current folder
  - maxConnectionTries
    maximum number of tries for connecting in the login procedure
  - onlyArchive
    only archive/remove on the FTP server, requires archiveDir to be set
  - path
    additional relative FTP path (under remoteDir, which is set at login) where the file(s) is/are located
  - port
    ftp/sftp port (leave empty for the default port 22)
  - prefix
    key for sensitive information (e.g. pwd and user) in config{sensitive}
  - privKey
    sftp key file location for Net::SFTP::Foreign, either directly (insecure -> visible) or via sensitive lookup
  - pwd
    for password setting, either directly (insecure -> visible) or via sensitive lookup
  - queue_size
    queue_size for Net::SFTP::Foreign; if > 1, this often causes connection issues
  - remove
    ref to hash {removeFolders => [], day =>, mon =>, year => 1} for removing (archived) files with removeFilesOlderX; all files in removeFolders that are older than day days, mon months and year years are deleted
  - remoteDir
    remote root folder for up-/download, archive and remove: "out/Marktdaten/"; path is then added for each filename (load)
  - remoteHost
    ref to hash of IP addresses/DNS names of the host(s)
  - SFTP
    to explicitly use SFTP; if not given, SFTP will be derived from the existence of privKey or hostkey
  - simulate
    for removal of files using removeFilesinFolderOlderX/removeFilesOlderX: only simulate (1) or actually do (0) the removal?
  - sshInstallationPath
    path where the ssh/plink executable to be used by Net::SFTP::Foreign is located
  - type
    (A)scii or (B)inary
  - user
    set the user, either directly (insecure -> visible) or via sensitive lookup
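A hypothetical FTP configuration combining archiving and periodic cleanup might look like this (host and folder names are placeholders):

  %common = (
    FTP => {
      remoteHost => {Prod => "ftp.com", Test => "ftp-test.com"},
      remoteDir => "out/reports",
      prefix => "myftp",              # credentials come from config{sensitive}{myftp}
      archiveDir => "archive",        # downloaded files are archived here ...
      fileToArchive => 1,             # ... because this flag is set
      # delete archived files older than 6 months (see removeFilesinFolderOlderX)
      remove => {removeFolders => ["archive"], mon => 6},
    },
  );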
- process
  used to pass information within each process (data, additionalLookupData, filenames, hadErrors or commandline parameters starting with interactive) and for additional configurations not suitable for DB, File or FTP (e.g. uploadCMD* and onlyExecFor)
  - additionalLookupData
    additional data retrieved from the database with EAI::Wrap::getAdditionalDBData
  - archivefilenames
    in case a zip archive package is retrieved, the filenames of these packages are kept here; necessary for cleanup at the end of the process
  - data
    loaded data: array (rows) of hash refs (columns)
  - filenames
    names of files that were retrieved and checked to be locally available for that load; can be more than the defined file in File->filename (due to a glob spec or a zip archive package)
  - filesProcessed
    hash for checking the processed files, necessary for cleanup at the end of the whole task
  - hadErrors
    set to 1 if there were any errors in the process
  - interactive_*
    interactive options (not checked), can be used to pass arbitrary data via the command line into the script (e.g. a selected date for the run with interactive_date)
  - onlyExecFor
    marks loads to be executed only under certain conditions; the load is skipped if $common{task}{execOnly} does not match $load->{process}{onlyExecFor}
  - uploadCMD
    upload command for use with uploadFileCMD
  - uploadCMDPath
    path of the upload command
  - uploadCMDLogfile
    logfile where the command given in uploadCMD writes its output (for error handling)
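A hypothetical load using an external upload program via uploadFileCMD instead of FTP (command and paths are placeholders):

  @loads = (
    {
      File => {filename => "Valuedata.csv"},
      process => {
        uploadCMD => "upload.bat",              # invoked by uploadFileCMD
        uploadCMDPath => "C:/tools/uploader",
        uploadCMDLogfile => "upload.log",       # checked for errors after the call
      },
    },
  );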
- task
  contains parameters used on the task script level
  - customHistoryTimestamp
    optional custom timestamp to be added to filenames moved to History/HistoryUpload/FTP archive; if not given, get_curdatetime is used (YYYYMMDD_hhmmss)
  - execOnly
    used to remove loads where $common{task}{execOnly} does not match $load->{process}{onlyExecFor}
  - ignoreNoTest
    ignore the notest file in the process-script folder, which usually prevents all runs that are not in production
  - plannedUntil
    latest time that a planned repetition should start; can be given either as HHMM (HourMinute) or HHMMSS (HourMinuteSecond); in case of HHMM, the "Second" part is set to 59
  - redoFile
    flag for specifying a redo
  - redoTimestampPatternPart
    part of the regex for checking against the filename in a redo with an additional timestamp/redoDir pattern (e.g. "redo", numbers and _); anything after the file's barename (and before ".$ext" if an extension is defined) is regarded as a timestamp. Example: '[\d_]', the regex is built like ($ext ? qr/$barename($redoTimestampPatternPart|$redoDir)*\.$ext/ : qr/$barename($redoTimestampPatternPart|$redoDir)*.*/)
  - retrySecondsErr
    retry period in case of an error
  - retrySecondsErrAfterXfails
    after the fail count is reached, this alternate retry period in case of error is applied. If 0/undefined, the job finishes after the fail count is reached
  - retrySecondsXfails
    fail count after which retrySecondsErr is changed to retrySecondsErrAfterXfails
  - retrySecondsPlanned
    retry period in case of a planned retry
  - skipHolidays
    skip script execution on holidays
  - skipHolidaysDefault
    holiday calendar to take into account for skipHolidays
  - skipWeekends
    skip script execution on weekends
  - skipForFirstBusinessDate
    used to "wait with execution until the first business date"; either a calendar or 1 (then the calendar is skipHolidaysDefault); cannot be used together with skipHolidays
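A hypothetical task configuration combining error retries with a fail-count escalation (the values are illustrative only):

  %common = (
    task => {
      retrySecondsErr => 60,              # retry every minute on error ...
      retrySecondsXfails => 3,            # ... but after 3 failures ...
      retrySecondsErrAfterXfails => 600,  # ... only every 10 minutes
      plannedUntil => "1800",             # planned repetition until 18:00:59
      retrySecondsPlanned => 60*15,
      skipWeekends => 1,
    },
  );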
COPYRIGHT
Copyright (c) 2023 Roland Kapl
All rights reserved. This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
The full text of the license can be found in the LICENSE file included with this module.