NAME
EAI::Wrap - framework for easy creation of Enterprise Application Integration tasks
SYNOPSIS
# site.config
%config = (
    sensitive => {
        dbSys => {user => "DBuser", pwd => "DBPwd"},
        ftpSystem1 => {user => "FTPuser", pwd => "FTPPwd", privKey => 'path_to_private_key', hostkey =>'hostkey'},
    },
    checkLookup => {"task_script.pl" => {errmailaddress => "test\@test.com", errmailsubject => "testjob failed", timeToCheck => "0800", freqToCheck => "B", logFileToCheck => "test.log", logcheck => "started.*"}},
    executeOnInit => sub {$execute{addToScriptName} = "doWhateverHereToModifySettings";},
    folderEnvironmentMapping => {Test => "Test", Dev => "Dev", "" => "Prod"},
    errmailaddress => 'your@mail.address',
    errmailsubject => "No errMailSubject defined",
    fromaddress => 'service@mail.address',
    smtpServer => "a.mail.server",
    smtpTimeout => 60,
    testerrmailaddress => 'your@mail.address',
    logRootPath => {"" => "C:/dev/EAI/Logs",},
    historyFolder => {"" => "History",},
    historyFolderUpload => "HistoryUpload",
    redoDir => {"" => "redo",},
    task => {
        redoTimestampPatternPart => '[\d_]',
        retrySecondsErr => 60*5,
        retrySecondsErrAfterXfails => 60*10,
        retrySecondsXfails => 2,
        retrySecondsPlanned => 60*15,
    },
    DB => {
        server => {Prod => "ProdServer", Test => "TestServer"},
        cutoffYr2000 => 60,
        DSN => 'driver={SQL Server};Server=$DB->{server}{$execute{env}};database=$DB->{database};TrustedConnection=Yes;',
        schemaName => "dbo",
    },
    FTP => {
        lookups => {
            ftpSystem1 => {remoteHost => {Test => "TestHost", Prod => "ProdHost"}, port => 5022},
        },
        maxConnectionTries => 5,
        sshInstallationPath => "C:/dev/EAI/putty/PLINK.EXE",
    },
    File => {
        format_defaultsep => "\t",
        format_thousandsep => ",",
        format_decimalsep => ".",
    }
);
# task_script.pl
use EAI::Wrap;
%common = (
    FTP => {
        remoteHost => {"Prod" => "ftp.com", "Test" => "ftp-test.com"},
        remoteDir => "/reports",
        port => 22,
        user => "myuser",
        privKey => 'C:/keystore/my_private_key.ppk',
        FTPdebugLevel => 0, # ~(1|2|4|8|16|1024|2048)
    },
    DB => {
        tablename => "ValueTable",
        deleteBeforeInsertSelector => "rptDate = ?",
        dontWarnOnNotExistingFields => 1,
        database => "DWH",
    },
    task => {
        plannedUntil => "2359",
    },
);
@loads = (
    {
        File => {
            filename => "Datafile1.XML",
            format_XML => 1,
            format_sep => ',',
            format_xpathRecordLevel => '//reportGrp/CM1/*',
            format_fieldXpath => {rptDate => '//rptHdr/rptDat', NotionalVal => 'NotionalVal', tradeRef => 'tradeRefId', UTI => 'UTI'},
            format_header => "rptDate,NotionalVal,tradeRef,UTI",
        },
    },
    {
        File => {
            filename => "Datafile2.txt",
            format_sep => "\t",
            format_skip => 1,
            format_header => "rptDate NotionalVal tradeRef UTI",
        },
    }
);
setupEAIWrap();
openDBConn(\%common) or die;
openFTPConn(\%common) or die;
while (processingContinues()) {
    for my $load (@loads) {
        if (getFilesFromFTP($load)) {
            readFileData($load);
            dumpDataIntoDB($load);
            markProcessed($load);
        }
    }
}
DESCRIPTION
EAI::Wrap provides a framework for defining EAI jobs directly in Perl, sparing the creator low-level tasks such as fetching from FTP, parsing files and storing data in a database. It can also be used to handle other workflows, like creating files from the database and uploading them to FTP servers, or using other externally provided tools.
The definition is done by first setting up datastructures for configurations and then providing a high-level scripting of the job itself using the provided subs (although any perl code is welcome here!).
EAI::Wrap has a lot of infrastructure already included, like logging using Log4perl, database handling with DBI and DBD::ODBC, FTP services using Net::SFTP::Foreign, file parsing using Text::CSV (text files), Data::XLSX::Parser and Spreadsheet::ParseExcel (excel files), XML::LibXML (xml files), file writing with Spreadsheet::WriteExcel and Excel::Writer::XLSX (excel files), Text::CSV (text files).
Furthermore it provides very flexible commandline options, allowing almost all configurations to be set on the commandline. Commandline options (e.g. additional information passed on with the interactive option) of the task script are fetched at INIT allowing use of options within the configuration, e.g. $opt{process}{interactive_startdate} for a passed start date.
Also the logging configured in $ENV{EAI_WRAP_CONFIG_PATH}/log.config (logfile root path set in $ENV{EAI_WRAP_CONFIG_PATH}/site.config) starts immediately at INIT of the task script. To use a logger, simply make a call to get_logger(). For the logging configuration, see EAI::Common, setupLogging.
There are two accompanying scripts:
setDebugLevel.pl to easily modify the configured log-levels of the task-script itself and all EAI-Wrap modules.
checkLogExist.pl to run checks on the produced logs (at given times using a cron-job or other scheduler) for their existence and certain (starting/finishing) entries, giving error notifications if the check failed.
API: datastructures for configurations
- %config
-
global config (set in $ENV{EAI_WRAP_CONFIG_PATH}/site.config, amended with $ENV{EAI_WRAP_CONFIG_PATH}/additional/*.config), contains special parameters (default error mail sending, logging paths, etc.) and site-wide pre-settings for the five categories in task scripts (described below under configuration categories)
- %common
-
common configs for the task script, may contain one configuration hash for each configuration category.
- @loads
-
list of hashes defining specific load processes within the task script. Each hash may contain one configuration hash for each configuration category.
- configuration categories
-
The hashes mentioned above can contain five categories (sub-hashes): DB, File, FTP, process and task. These allow further parameters to be set for the respective parts of EAI::Wrap (EAI::DB, EAI::File and EAI::FTP), process parameters and task parameters. The parameters are described in detail in section CONFIGURATION REFERENCE.
The process category is on the one hand used to pass information within each process (data, additionalLookupData, filenames, hadErrors or custom commandline parameters starting with interactive), on the other hand for additional configurations not suitable for DB, File or FTP (e.g. uploadCMD). The task category contains parameters used on the task script level and is therefore only allowed in %config and %common. It contains parameters for skipping, retrying and redoing the whole task script.
The settings in DB, File, FTP and task are "merge" inherited in a cascading manner (i.e. missing parameters are merged, parameters already set below are not overwritten):
%config (defined in site.config and other associated configs loaded at INIT) merged into -> %common (common task parameters defined in script) merged into each of -> $loads[]
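The cascade can be pictured with a small stand-alone sketch. It does not call EAI::Wrap; the helper merge_missing is a hypothetical name introduced only for illustration, and the real merge is performed by setupEAIWrap:

```perl
use strict;
use warnings;

# Hypothetical helper (not part of EAI::Wrap): copy every parameter from
# $parent into a copy of $child unless $child already defines it.
sub merge_missing {
    my ($parent, $child) = @_;
    my %merged = %{$child};
    for my $key (keys %{$parent}) {
        $merged{$key} = $parent->{$key} unless exists $merged{$key};
    }
    return \%merged;
}

my %config = (DB => {schemaName => "dbo", cutoffYr2000 => 60});
my %common = (DB => {database => "DWH", schemaName => "rpt"});
my @loads  = ({DB => {tablename => "ValueTable"}});

# %config -> %common -> each of $loads[]
my $commonDB = merge_missing($config{DB}, $common{DB});
my $loadDB   = merge_missing($commonDB, $loads[0]{DB});

# schemaName stays "rpt" because it was already set in %common
print join(", ", map { "$_=$loadDB->{$_}" } sort keys %{$loadDB}), "\n";
# prints: cutoffYr2000=60, database=DWH, schemaName=rpt, tablename=ValueTable
```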
Special config parameters and DB, FTP, File, task parameters from command line options are merged at the respective level (config at the top, the rest at the bottom) and always override any set parameters. Only scalar parameters can be given on the command line, no lists and hashes are possible. Commandline options are given in the format:
--<category> <parameter>=<value>
for the common level and
--load<i><category> <parameter>=<value>
for the loads level.
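To make the two option formats concrete, here is a minimal stand-alone sketch of how such arguments could be split into a common-level hash and a per-load list. The sub parse_options is hypothetical and only illustrates the format; EAI::Wrap does its own option handling at INIT:

```perl
use strict;
use warnings;

# Hypothetical parser, for illustration only.
sub parse_options {
    my @args = @_;
    my (%opt, @optloads);
    while (@args) {
        my $switch     = shift @args;    # e.g. --process or --load1File
        my $assignment = shift @args;    # e.g. interactive_date=20230101
        next unless defined $assignment;
        my ($param, $value) = split /=/, $assignment, 2;
        if ($switch =~ /^--load(\d+)(\w+)$/) {
            $optloads[$1]{$2}{$param} = $value;    # loads level
        } elsif ($switch =~ /^--(\w+)$/) {
            $opt{$1}{$param} = $value;             # common level
        }
    }
    return (\%opt, \@optloads);
}

my ($opt, $optloads) = parse_options(
    "--process",   "interactive_date=20230101",
    "--load1File", "filename=Datafile1.XML",
);
print "$opt->{process}{interactive_date}\n";
print "$optloads->[1]{File}{filename}\n";
```

Only scalar values are handled here, which matches the restriction stated above.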
Command line options are also available to the script via the hash %opt or the list of hashes @optloads, so in order to access the cmdline option --process interactive_date=20230101 you could either use $common{process}{interactive_date} or $opt{process}{interactive_date}.
In order to use --load1process interactive_date=20230101, you would use $loads[1]{process}{interactive_date} or $optloads[1]{process}{interactive_date}.
The merge inheritance for DB, FTP, File and task can be prevented by using an underscore after the hashkey, i.e. DB_, FTP_, File_ and task_. In this case the parameters are not merged from common. However, they are always inherited from config.
A special merge is done for configurations defined in hash lookups, which may appear in all five categories (sub-hashes) of the top-level configuration %config. This uses the prefix defined in the task script's %common configuration to get generally defined settings for this specific prefix. As an example, common remoteHosts or ports for FTP can be defined here. These settings also allow an environment dependent hash, like {Test => "TestHost", Prod => "ProdHost"}.
- %execute
-
hash of parameters for the current task execution which is not set by the user but can be used to set other parameters and control the flow. Most important here are $execute{env}, giving the currently used environment (Prod, Test, Dev, whatever), $execute{envraw} (Production is empty here), the several file lists (files being processed, files for deletion/moving, etc.), flags for ending/interrupting processing and directory locations such as home and history.
Detailed information about the several parameters used can be found in section execute of the configuration parameter reference. There are parameters for files (filesProcessed, filesToDelete, filesToMoveinHistory, filesToMoveinHistoryUpload, retrievedFiles and uploadFilesToDelete), directories (homedir, historyFolder, historyFolderUpload and redoDir) and process controlling parameters (failcount, firstRunSuccess, retryBecauseOfError, retrySeconds and processEnd).
Retrying with checking $execute{processEnd} (set during processingEnd(), combining this call and check can be done in the loop header at start with processingContinues()) can happen because of two reasons: First, due to task => {plannedUntil => "HHMM"} being set to a time until the task has to be retried, however this is done at most until midnight. Second, because an error occurred, in such a case $process->{hadErrors} is set for each load that failed. $process{successfullyDone} is also important in this context as it prevents the repeated run of the following API procedures if the loads didn't have an error during their execution:
openDBConn, openFTPConn, getLocalFiles, getFilesFromFTP, getFiles, extractArchives, getAdditionalDBData, readFileData, dumpDataIntoDB, writeFileFromDB, putFileInLocalDir, uploadFileToFTP, uploadFileCMD, and uploadFile.
checkFiles is always run, regardless of $process{successfullyDone}.
After the first successful run of the task, $execute{firstRunSuccess} is set to prevent any error messages resulting from files having been moved/removed while rerunning the task until the defined planned time (task => {plannedUntil => "HHMM"}) has been reached.
- initialization
-
The INIT procedure is executed at the task script initialization (when EAI::Wrap is used in the task script) and loads the site configuration, starts logging and reads commandline options. This means that everything passed to the script via command line may be used in the definitions, especially the task{interactive.*} parameters, here the name and the type of the parameter are not checked by the consistency checks (all other parameters not allowed or having the wrong type would throw an error). The task scripts configuration itself is then read with setupEAIWrap(), which is usually called immediately after the datastructures for configurations have been finished.
API: High-level subs
Following are the high level subs that can be called for a standard workflow. Most of them accumulate their sub names in process{successfullyDone} to prevent any further call in a faulting loop when they have already run successfully. Also process{hadErrors} is set in case of errors to provide for error repeating. Downloaded files are collected in process{filenames} and completely processed files in process{filesProcessed}.
- setupEAIWrap
-
setupEAIWrap is actually imported from EAI::Common, but as it is usually called as the first sub, it is mentioned here as well. This sub sets up the configuration datastructure and merges the hierarchy of configurations, more information in EAI::Common::setupEAIWrap.
- removeFilesinFolderOlderX
-
Usually done for clearing FTP archives, this removes files on the FTP server that are older than a given time back (given in day/mon/year in remove => {removeFolders => ["",""], day=>, mon=>, year=>1}), see EAI::FTP::removeFilesOlderX (always runs in a faulting loop)
- openDBConn ($)
-
argument $arg (ref to current load or common)
open a DB connection with the information provided in $DB->{user}, $DB->{pwd} (these can be provided by the sensitive information looked up using $DB->{prefix}) and $DB->{DSN} which can be dynamically configured using information from $DB itself, using $execute{env} inside $DB->{server}{*}: 'driver={SQL Server};Server=$DB->{server}{$execute{env}};database=$DB->{database};TrustedConnection=Yes;', also see EAI::DB::newDBH
If the DSN information is not found in $DB, then EAI::Wrap tries to fetch a system wide DSN for the set $DB{prefix} from $config{DB}{$DB{prefix}}{DSN}. This also respects environment information in $execute{env} if configured.
- openFTPConn ($)
-
argument $arg (ref to current load or common)
open an FTP connection with the information provided in $FTP->{remoteHost}, $FTP->{user}, $FTP->{pwd}, $FTP->{hostkey}, $FTP->{privKey} (these four can be provided by the sensitive information looked up using $FTP->{prefix}) and $execute{env}, also see EAI::FTP::login
If the remoteHost information is not found in $FTP, then EAI::Wrap tries to fetch a system wide remoteHost for the set $FTP{prefix} from $config{FTP}{$FTP{prefix}}{remoteHost}. This also respects environment information in $execute{env} if configured.
- redoFiles ($)
-
argument $arg (ref to current load or common)
redo file from redo directory if specified ($common{task}{redoFile} is being set), this is also being called by getLocalFiles and getFilesFromFTP. Arguments are fetched from common or loads[i], using File parameter. (always runs in a faulting loop when called directly)
- getLocalFiles ($)
-
argument $arg (ref to current load or common)
get local file(s) from source into homedir, checks files for continuation of processing and extract archives if needed. Arguments are fetched from common or loads[i], using File parameter. The processed files are put into process->{filenames} (always runs in a faulting loop). Uses $File->{filename}, $File->{extension} and $File->{avoidRenameForRedo}.
- getFilesFromFTP ($)
-
argument $arg (ref to current load or common)
get file/s (can also be a glob for multiple files) from FTP into homedir, checks files for continuation of processing and extract archives if needed. Arguments are fetched from common or loads[i], using File and FTP parameters. The processed files are put into process->{filenames} (always runs in a faulting loop).
- getFiles ($)
-
argument $arg (ref to current load or common)
combines above two procedures in a general procedure to get files from FTP or locally. Arguments are fetched from common or loads[i], using File and FTP parameters.
- checkFiles ($)
-
argument $arg (ref to current load or common)
check files for continuation of processing and extract archives if needed. Arguments are fetched from common or loads[i], using File parameter. The processed files are put into process->{filenames} (always runs in a faulting loop). Important: files (their filenames) not retrieved by getFilesFromFTP or getLocalFiles have to be put into $execute{retrievedFiles} (e.g. push @{$execute{retrievedFiles}}, $filenameTobeChecked)!
- extractArchives ($)
-
argument $arg (ref to current load or common)
extract files from archive (only one archive is allowed). Arguments are fetched from common or loads[i], using only the process->{filenames} parameter that was filled by checkFiles. If not being called by getFilesFromFTP/getLocalFiles and checkFiles @{$process{filenames}} has to contain the archive filename.
- getAdditionalDBData ($;$)
-
arguments $arg (ref to current load or common) and optional $refToDataHash
get additional data from DB. Arguments are fetched from common or loads[i], using DB and process parameters. You can also pass an optional ref to a data hash parameter to store the retrieved data there instead of $process->{additionalLookupData}
- readFileData ($)
-
argument $arg (ref to current load or common)
read data from a file. Arguments are fetched from common or loads[i], using File parameter. This parses the file content into the datastructure process{data}. Custom "hooks" can be defined with fieldCode and lineCode to modify and enhance the standard mapping defined in format_header. To access the final line data the hash %EAI::File::line can be used (specific fields with $EAI::File::line{<target header column>}). If a field is being replaced using a different name from targetheader, the data with the original header name is placed in %EAI::File::templine. You can also access data from the previous line with %EAI::File::previousline and the previous temp line with %EAI::File::previoustempline.
- dumpDataIntoDB ($)
-
argument $arg (ref to current load or common)
store data into Database. Arguments are fetched from common or loads[i], using DB and File (for emptyOK) parameters.
- markProcessed ($)
-
argument $arg (ref to current load or common)
mark files as being processed depending on whether there were errors, also decide on removal/archiving of downloaded files. Arguments are fetched from common or loads[i], using File parameter. (always runs in a faulting loop)
- writeFileFromDB ($)
-
argument $arg (ref to current load or common)
create Data-files from Database. Arguments are fetched from common or loads[i], using DB and File parameters.
- putFileInLocalDir ($)
-
argument $arg (ref to current load or common)
put files into local folder if required. Arguments are fetched from common or loads[i], using File parameter.
- markForHistoryDelete ($)
-
argument $arg (ref to current load or common)
mark to be removed or be moved to history after upload. Arguments are fetched from common or loads[i], using File parameter. (always runs in a faulting loop)
- uploadFileToFTP ($)
-
argument $arg (ref to current load or common)
upload files to FTP. Arguments are fetched from common or loads[i], using FTP and File parameters.
- uploadFileCMD ($)
-
argument $arg (ref to current load or common)
upload files using an upload command program. Arguments are fetched from common or loads[i], using File and process parameters.
- uploadFile ($)
-
argument $arg (ref to current load or common)
combines above two procedures in a general procedure to upload files via FTP or CMD or to put into local dir. Arguments are fetched from common or loads[i], using File and process parameters.
- processingEnd
-
final processing steps for process ending (cleanup, FTP removal/archiving) or retry after pausing. No context argument as this always depends on all loads and/or the common definition (always runs in a faulting loop). Returns true if process ended and false if not. Using this as a check also works for do .. while or do .. until loops.
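The two loop shapes that work with processingEnd() can be sketched without EAI::Wrap. In the sketch below processingEnd and %execute are simplified stand-ins (assumptions made only so the example runs on its own); the simulated task ends after its second pass:

```perl
use strict;
use warnings;

# Stand-ins for the EAI::Wrap sub and hash, for illustration only:
# the simulated task signals its end after the second pass.
my %execute = (processEnd => 0);
my $passes  = 0;
sub processingEnd {
    $execute{processEnd} = (++$passes >= 2) ? 1 : 0;
    return $execute{processEnd};
}

# Form 1: do .. until, using the return value of processingEnd() as the check
my $doUntilRuns = 0;
do {
    $doUntilRuns++;    # processing of the loads would go here
} until (processingEnd());

# reset the simulation for the second form
($passes, $execute{processEnd}) = (0, 0);

# Form 2: check !$execute{processEnd} in the header, call processingEnd() last
my $whileRuns = 0;
while (!$execute{processEnd}) {
    $whileRuns++;      # processing of the loads would go here
    processingEnd();
}

print "do..until body ran $doUntilRuns times, while body ran $whileRuns times\n";
# prints: do..until body ran 2 times, while body ran 2 times
```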
- processingPause ($)
-
generally available procedure for pausing processing, argument $pauseSeconds gives the delay
- processingContinues
-
Alternative and compact way to combine the call to processingEnd() and the check of $execute{processEnd} in one go in a while or until loop header. Returns true if process continues and false if not. Caveat: This doesn't work for do .. while or do .. until loops! Instead of checking processingEnd() and processingContinues(), a check of !$execute{processEnd} can be done in the while or until header with a call to processingEnd() at the end of the loop.
- moveFilesToHistory (;$)
-
optional argument $archiveTimestamp
move transferred files marked for moving (filesToMoveinHistory/filesToMoveinHistoryUpload) into history and/or historyUpload folder. Optionally a custom timestamp can be passed.
- deleteFiles ($)
-
argument $filenames, ref to array
delete transferred files given in $filenames
CONFIGURATION REFERENCE
- config
-
parameter category for site global settings, defined in site.config and other associated configs loaded at INIT
- checkLogExistDelay
-
ref to hash {Test => 2, Dev => 3, "" => 0}, mapping to set delays for checkLogExist per environment in $execute{env}, this can be further overridden per job (and environment) in checkLookup.
- checkLookup
-
ref to datastructure {"scriptname.pl + optional addToScriptName" => {errmailaddress => "",errmailsubject => "",timeToCheck =>"", freqToCheck => "", logFileToCheck => "", logcheck => "",logRootPath =>""},...} used for logchecker, each entry of the hash lookup table defines a log to be checked, defining errmailaddress to receive error mails, errmailsubject, timeToCheck as earliest time to check for existence in log, freqToCheck as frequency of checks (daily/monthly/etc), logFileToCheck as the name of the logfile to check, logcheck as the regex to check in the logfile and logRootPath as the folder where the logfile is found. lookup key: $execute{scriptname} + $execute{addToScriptName}
- errmailaddress
-
default mail address for central logcheck/errmail sending
- errmailsubject
-
default mail subject for central logcheck/errmail sending
- executeOnInit
-
code to be executed during INIT of EAI::Wrap to allow for assignment of config/execute parameters from commandline params BEFORE Logging!
- folderEnvironmentMapping
-
ref to hash {Test => "Test", Dev => "Dev", "" => "Prod"}, mapping for $execute{envraw} to $execute{env}
- fromaddress
-
from address for central logcheck/errmail sending, also used as default sender address for sendGeneralMail
- historyFolder
-
ref to hash {"scriptname.pl + optional addToScriptName" => "folder"}, folders where downloaded files are historized, lookup key as in checkLookup, default in "" => "defaultfolder". historyFolder, historyFolderUpload, logRootPath and redoDir are always built with an environment subfolder, the default is built as folderPath/endFolder/environ, otherwise it is built as folderPath/environ/endFolder. Environment subfolders (environ) are also built depending on prodEnvironmentInSeparatePath: either folderPath/endFolder/$execute{env} (prodEnvironmentInSeparatePath = true, Prod has own subfolder) or folderPath/endFolder/$execute{envraw} (prodEnvironmentInSeparatePath = false, Prod is in common folder, other environments have their own folder)
- historyFolderUpload
-
ref to hash {"scriptname.pl + optional addToScriptName" => "folder"}, folders where uploaded files are historized, lookup key as in checkLookup, default in "" => "defaultfolder"
- logCheckHoliday
-
calendar for business days in central logcheck/errmail sending. builtin calendars are AT (Austria), TG (Target), UK (United Kingdom) and WE (for only weekends). Calendars can be added with EAI::DateUtil::addCalendar
- logs_to_be_ignored_in_nonprod
-
regular expression to specify logs to be ignored in central logcheck/errmail sending
- logprefixForLastLogfile
-
prefix for previous (day) logs to be set in error mail (link), if not given, defaults to get_curdate(). In case Log::Dispatch::FileRotate is used as the File Appender in Log4perl config, the previous log is identified with <logname>.1
- logRootPath
-
ref to hash {"scriptname.pl + optional addToScriptName" => "folder"}, paths to log file root folders (environment is added to that if non production), lookup key as checkLookup, default in "" => "defaultfolder"
- prodEnvironmentInSeparatePath
-
set to 1 if the production scripts/logs etc. are in a separate Path defined by folderEnvironmentMapping (prod=root/Prod, test=root/Test, etc.), set to 0 if the production scripts/logs are in the root folder and all other environments are below that folder (prod=root, test=root/Test, etc.)
- redoDir
-
ref to hash {"scriptname.pl + optional addToScriptName" => "folder"}, folders where files for redo are contained, lookup key as checkLookup, default in "" => "defaultfolder"
- sensitive
-
hash lookup table ({"prefix" => {user=>"",pwd =>"",hostkey=>"",privkey =>""},...}) for sensitive access information in DB and FTP (lookup keys are set with DB{prefix} or FTP{prefix}), may also be placed outside of site.config; all sensitive keys can also be environment lookups, e.g. hostkey=>{Test => "", Prod => ""} to allow for environment specific setting
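How an entry that is either a plain value or an environment lookup could be resolved is shown in the following stand-alone sketch. The sub resolve_for_env and the sample values are hypothetical and only illustrate the lookup idea, not the module's internal code:

```perl
use strict;
use warnings;

# Hypothetical resolver: a sensitive value is either a plain scalar
# or a hash keyed by environment name.
sub resolve_for_env {
    my ($value, $env) = @_;
    return ref($value) eq 'HASH' ? $value->{$env} : $value;
}

my %sensitive = (
    ftpSystem1 => {
        user    => "FTPuser",
        hostkey => {Test => "testHostkey", Prod => "prodHostkey"},
    },
);
my %execute = (env => "Test");
my $prefix  = "ftpSystem1";    # would be taken from FTP{prefix}

for my $key (sort keys %{$sensitive{$prefix}}) {
    printf "%s => %s\n", $key, resolve_for_env($sensitive{$prefix}{$key}, $execute{env});
}
```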
- smtpServer
-
smtp server for (error) mail sending
- smtpTimeout
-
timeout for smtp response
- testerrmailaddress
-
error mail address in non prod environment
- execute
-
hash of parameters for current task execution which is not set by the user but can be used to set other parameters and control the flow
- alreadyMovedOrDeleted
-
hash for checking the already moved or deleted local files, to avoid moving/deleting them again at cleanup
- addToScriptName
-
this can be set to be added to the scriptname for config{checkLookup} keys, e.g. some passed parameter.
- env
-
Prod, Test, Dev, whatever is defined as the lookup value in folderEnvironmentMapping. homedir as fetched from the File::Basename::dirname of the executing script using /^.*[\\\/](.*?)$/ is used as the key for looking up this value.
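A stand-alone sketch of this lookup follows. The sub env_from_homedir is a hypothetical name, and treating every folder that is not a key of folderEnvironmentMapping as production (key "") is an assumption made for the illustration:

```perl
use strict;
use warnings;

my %folderEnvironmentMapping = (Test => "Test", Dev => "Dev", "" => "Prod");

# Hypothetical helper: take the last folder of the home directory and
# look it up; unknown folders are assumed to mean production (key "").
sub env_from_homedir {
    my ($homedir) = @_;
    my ($lastFolder) = $homedir =~ /^.*[\\\/](.*?)$/;
    my $envraw = (defined $lastFolder && exists $folderEnvironmentMapping{$lastFolder})
        ? $lastFolder
        : "";
    return ($folderEnvironmentMapping{$envraw}, $envraw);
}

for my $dir ("C:/dev/EAI/Test", "C:/dev/EAI") {
    my ($env, $envraw) = env_from_homedir($dir);
    printf "%s: env='%s' envraw='%s'\n", $dir, $env, $envraw;
}
# prints:
# C:/dev/EAI/Test: env='Test' envraw='Test'
# C:/dev/EAI: env='Prod' envraw=''
```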
- envraw
-
Production has a special significance here as being an empty string. Otherwise like env.
- errmailaddress
-
target address for central logcheck/errmail sending in current process
- errmailsubject
-
mail subject for central logcheck/errmail sending in current process
- failcount
-
for counting failures in processing to switch to longer wait period or finish altogether
- filesToDelete
-
list of files to be deleted locally after download, necessary for cleanup at the end of the process
- filesToMoveinHistory
-
list of files to be moved in historyFolder locally, necessary for cleanup at the end of the process
- filesToMoveinHistoryUpload
-
list of files to be moved in historyFolderUpload locally, necessary for cleanup at the end of the process
- firstRunSuccess
-
for planned retries (task=>plannedUntil filled) -> this is set after the first run to avoid error messages resulting from files having been moved/removed.
- freqToCheck
-
for logchecker: frequency to check entries (B,D,M,M1) ...
- homedir
-
the home folder of the script, mostly used to return from redo and other folders for globbing files.
- historyFolder
-
actually set historyFolder
- historyFolderUpload
-
actually set historyFolderUpload
- logcheck
-
for logchecker: the Logcheck (regex)
- logFileToCheck
-
for logchecker: Logfile to be searched
- logRootPath
-
actually set logRootPath
- processEnd
-
specifies that the process is ended, checked in EAI::Wrap::processingEnd
- redoDir
-
actually set redoDir
- retrievedFiles
-
files retrieved from FTP or redo directory
- retryBecauseOfError
-
retryBecauseOfError shows if a rerun occurs due to errors (for successMail)
- retrySeconds
-
how many seconds pass between retries. This is set on error with task=>retrySecondsErr and, if a planned retry is defined, with task=>retrySecondsPlanned
- scriptname
-
name of the current process script, also used in log/history setup together with addToScriptName for config{checkLookup} keys
- timeToCheck
-
for logchecker: scheduled time of job (don't look earlier for log entries)
- uploadFilesToDelete
-
list of files to be deleted locally after upload, necessary for cleanup at the end of the process
- DB
-
DB specific configs
- addID
-
this hash can be used to additionally set a constant to given fields: Fieldname => Fieldvalue
- additionalLookup
-
query used in getAdditionalDBData to retrieve lookup information from DB using readFromDBHash
- additionalLookupKeys
-
used for getAdditionalDBData, list of field names to be used as the keys of the returned hash
- cutoffYr2000
-
when storing date data with 2 year digits in dumpDataIntoDB/storeInDB, this is the cutoff where years are interpreted as 19XX (> cutoffYr2000) or 20XX (<= cutoffYr2000)
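The cutoff rule can be written down in a few lines. The sub expand_two_digit_year is a hypothetical name used only for this sketch; it follows the rule exactly as stated above:

```perl
use strict;
use warnings;

# Hypothetical helper: years above the cutoff become 19XX,
# years up to and including the cutoff become 20XX.
sub expand_two_digit_year {
    my ($yy, $cutoffYr2000) = @_;
    return $yy > $cutoffYr2000 ? 1900 + $yy : 2000 + $yy;
}

for my $yy (5, 60, 61, 99) {
    printf "%02d -> %d\n", $yy, expand_two_digit_year($yy, 60);
}
# prints:
# 05 -> 2005
# 60 -> 2060
# 61 -> 1961
# 99 -> 1999
```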
- columnnames
-
returned column names from readFromDB and readFromDBHash, this is used in writeFileFromDB to pass column information from database to writeText
- database
-
database to be used for connecting
- debugKeyIndicator
-
used in dumpDataIntoDB/storeInDB as an indicator for keys for debugging information if primkey not given (errors are shown with this key information). Format is the same as for primkey
- deleteBeforeInsertSelector
-
used in dumpDataIntoDB/storeInDB to delete specific data defined by keydata before an insert (first occurrence in data is used for key values). Format is the same as for primkey ("key1 = ? ...")
- dontWarnOnNotExistingFields
-
suppress warnings in dumpDataIntoDB/storeInDB for not existing fields
- dontKeepContent
-
if table should be completely cleared before inserting data in dumpDataIntoDB/storeInDB
- doUpdateBeforeInsert
-
invert insert/update sequence in dumpDataIntoDB/storeInDB, insert only done when upsert flag is set
- DSN
-
DSN String for DB connection
- incrementalStore
-
when storing data with dumpDataIntoDB/storeInDB, avoid setting empty columns to NULL
- ignoreDuplicateErrs
-
ignore any duplicate errors in dumpDataIntoDB/storeInDB
- keyfields
-
used for readFromDBHash, list of field names to be used as the keys of the returned hash
- longreadlen
-
used for setting database handles LongReadLen parameter for DB connection, if not set defaults to 1024
- lookups
-
similar to $config{sensitive}, a hash lookup table ({"prefix" => {remoteHost=>""},...} or {"prefix" => {remoteHost=>{Prod => "", Test => ""}},...}) for centrally looking up DSN Settings depending on $DB{prefix}. Overrides $DB{DSN} set in config, but is overridden by script-level settings in %common.
- noDBTransaction
-
don't use a DB transaction for dumpDataIntoDB
- noDumpIntoDB
-
if files from this load should not be dumped to the database
- postDumpExecs
-
array for execs done in dumpDataIntoDB after postDumpProcessing and before commit/rollback: [{execs => ['',''], condition => ''}]. doInDB all execs if condition (evaluated string or anonymous sub: condition => sub {...}) is fulfilled
- postDumpProcessing
-
done in dumpDataIntoDB after storeInDB, execute perl code in postDumpProcessing (evaluated string or anonymous sub: postDumpProcessing => sub {...})
- postReadProcessing
-
done in writeFileFromDB after readFromDB, execute perl code in postReadProcessing (evaluated string or anonymous sub: postReadProcessing => sub {...})
- prefix
-
key for sensitive information (e.g. pwd and user) in config{sensitive} or system wide DSN in config{DB}{prefix}{DSN}. respects environment in $execute{env} if configured.
- primkey
-
primary key indicator to be used for update statements, format: "key1 = ? AND key2 = ? ..."
- pwd
-
for password setting, either directly (insecure -> visible) or via sensitive lookup
- query
-
query statement used for readFromDB and readFromDBHash
- schemaName
-
schemaName used in dumpDataIntoDB/storeInDB, if tablename contains a dot, the schema extracted from tablename overrides this. Needed for datatype information!
- server
-
DB Server in environment hash lookup: {Prod => "", Test => ""}
- tablename
-
the table where data is stored in dumpDataIntoDB/storeInDB
- upsert
-
in dumpDataIntoDB/storeInDB, should an update be done after the insert failed (because of duplicate keys) or insert after the update failed (because of key not exists)?
- user
-
for user setting, either directly (insecure -> visible) or via sensitive lookup
- File
-
File parsing specific configs
- avoidRenameForRedo
-
when redoing, usually the cutoff (datetime/redo info) is removed following a pattern. set this flag to avoid this
- columns
-
for writeText: Hash of data fields, that are to be written (in order of keys)
- columnskip
-
for writeText: boolean hash of column names that should be skipped when writing the file ({column1ToSkip => 1, column2ToSkip => 1, ...})
- dontKeepHistory
-
if up- or downloaded file should not be moved into historyFolder but be deleted
- dontMoveIntoHistory
-
if up- or downloaded file should not be moved into historyFolder but be kept in homedir
- emptyOK
-
flag to specify whether empty files should not invoke an error message. Also needed to mark an empty file as processed in EAI::Wrap::markProcessed
- extract
-
flag to specify whether to extract files from archive package (zip)
- extension
-
the extension of the file to be read (optional, used for redoFile)
- fieldCode
-
additional field based processing code: fieldCode => {field1 => 'perl code', ..}, invoked if key equals either header (as in format_header) or targetheader (as in format_targetheader) or invoked for all fields if key is empty {"" => 'perl code'}. set $EAI::File::skipLineAssignment to true (1) if current line should be skipped from data. perl code can be an evaluated string or an anonymous sub: field1 => sub {...}
- filename
-
the name of the file to be read
- firstLineProc
-
processing done in reading the first line of text files
- format_allowLinefeedInData
-
line feeds in values don't create artificial new lines/records, only works for csv quoted data
- format_autoheader
-
assumption: header exists in file and format_header should be derived from there. only for readText
- format_beforeHeader
-
additional String to be written before the header in write text
- format_dateColumns
-
numeric array of columns that contain date values (special parsing) in excel files
- format_decimalsep
-
decimal separator used in numbers of sourcefile (defaults to . if not given)
- format_defaultsep
-
default separator when format_sep not given (usually in site.config), if not given, "\t" is used as default.
- format_encoding
-
text encoding of the file in question (e.g. :encoding(utf8))
- format_headerColumns
-
optional numeric array of columns that contain data in excel files (defaults to all columns starting with first column up to format_targetheader length)
- format_header
-
format_sep separated string containing header fields (optional in excel files, only used to check against existing header row)
- format_headerskip
-
skip until row-number for checking header row against format_header in excel files
- format_eol
-
for quoted csv specify special eol character (allowing newlines in values)
- format_fieldXpath
-
for XML reading, hash with field => xpath to content association entries
- format_fix
-
for text writing, specify whether fixed length format should be used (requires format_padding)
- format_namespaces
-
for XML reading, hash with alias => namespace association entries
- format_padding
-
for text writing, hash with field number => padding to be applied for fixed length format
- format_poslen
-
array of array defining positions and lengths [[pos1,len1],[pos2,len2]...[posN,lenN]] of data in fixed length format text files (if format_sep == "fix")
- format_quotedcsv
-
special parsing/writing of quoted csv data using Text::CSV
- format_sep
-
separator string for csv format, regex for split for other separated formats. Also needed for splitting up format_header and format_targetheader (Excel and XML-formats use tab as default separator here).
- format_sepHead
-
special separator for header row in write text, overrides format_sep
- format_skip
-
either numeric or string, skip until row-number if numeric or appearance of string otherwise in reading textfile
- format_stopOnEmptyValueColumn
-
for excel reading, stop row parsing when a cell with this column number is empty (denotes end of data, to avoid very long parsing).
- format_suppressHeader
-
for textfile writing, suppress output of header
- format_targetheader
-
format_sep separated string containing target header fields (= the field names in target/database table). optional for XML and tabular textfiles, defaults to format_header if not given there.
- format_thousandsep
-
thousand separator used in numbers of sourcefile (defaults to , if not given)
- format_worksheetID
-
worksheet number for excel reading, this should always work
- format_worksheet
-
alternatively the worksheet name can be passed, this only works for new excel format (xlsx)
- format_xlformat
-
excel format for parsing, setting this also specifies that excel parsing should be done
- format_xpathRecordLevel
-
xpath for level where data nodes are located in xml
- format_XML
-
specify xml parsing
- lineCode
-
additional line based processing code, invoked after whole line has been read (evaluated string or anonymous sub: lineCode => sub {...})
- localFilesystemPath
-
if files are taken from or put to the local file system with getLocalFiles/putFileInLocalDir then the path is given here. Setting this to "." avoids copying files.
- optional
-
to avoid error message for missing optional files, set this to 1
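The File parameters above could be put together roughly as follows for a semicolon-separated text file. This is a sketch under assumptions: the file name, the field names and the separator choices are hypothetical, and only parameter names documented above are used.

```perl
use strict;
use warnings;

# Sketch only: in task_script.pl, %common comes from "use EAI::Wrap;".
my %common = (
    File => {
        filename            => "values.csv",       # hypothetical file to be read
        extension           => "csv",              # optional, used for redoFile
        format_sep          => ";",                # also splits format_header/format_targetheader
        format_header       => "Date;Name;Value",  # header fields expected in the file
        # field names in the target table, same number of fields as format_header
        format_targetheader => "ValueDate;InstrumentName;Value",
        format_thousandsep  => ".",                # source numbers look like 1.234,56
        format_decimalsep   => ",",
        emptyOK             => 1,                  # an empty file is not an error
        optional            => 1,                  # no error message if the file is missing
        dontKeepHistory     => 1,                  # delete instead of moving into historyFolder
    },
);
```

format_thousandsep and format_decimalsep are set explicitly here because the assumed source file uses the opposite of the documented defaults ("," and ".").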
- FTP
-
FTP specific configs
- additionalParamsGet
-
additional parameters for Net::SFTP::Foreign get.
- additionalMoreArgs
-
additional more args for Net::SFTP::Foreign new (args passed to ssh command).
- additionalParamsNew
-
additional parameters for Net::SFTP::Foreign new.
- additionalParamsPut
-
additional parameters for Net::SFTP::Foreign put.
- archiveDir
-
folder for archived files on the FTP server
- dontMoveTempImmediately
-
if 0 or missing: rename/move files immediately after writing to FTP to the final name, otherwise/1: a call to EAI::FTP::moveTempFiles is required for that
- dontDoSetStat
-
for Net::SFTP::Foreign, no setting of time stamp of remote file to that of local file (avoid error messages of FTP Server if it doesn't support this)
- dontDoUtime
-
don't set time stamp of local file to that of remote file
- dontUseQuoteSystemForPwd
-
for windows, a special quoting is used for passing passwords to Net::SFTP::Foreign that contain [()"<>& . This flag can be used to disable this quoting.
- dontUseTempFile
-
directly upload files, without temp files
- fileToArchive
-
should files be archived on FTP server? if archiveDir is not set, then file is archived (rolled) in the same folder
- fileToRemove
-
should files be removed on FTP server?
- FTPdebugLevel
-
debug ftp: 0 or ~(1|2|4|8|16|1024|2048), loglevel automatically set to debug for module EAI::FTP
- hostkey
-
hostkey to present to the server for Net::SFTP::Foreign, either directly (insecure -> visible) or via sensitive lookup
- hostkey2
-
additional hostkey to be presented (e.g. in case of round robin DNS)
- localDir
-
optional: local folder for files to be placed, if not given files are downloaded into current folder
- lookups
-
similar to $config{sensitive}, a hash lookup table ({"prefix" => {remoteHost=>""},...} or {"prefix" => {remoteHost=>{Prod => "", Test => ""}},...}) for centrally looking up remoteHost and port settings depending on $FTP{prefix}.
- maxConnectionTries
-
maximum number of tries for connecting in login procedure
- noDirectRemoteDirChange
-
if no direct change into absolute paths (/some/path/to/change/into) is possible then set this to 1, this separates the change into setcwd(undef) and setcwd(remoteDir)
- onlyArchive
-
only archive/remove on the FTP server, requires archiveDir to be set
- path
-
additional relative FTP path (under remoteDir which is set at login), where the file(s) is/are located
- port
-
ftp/sftp port (leave empty for default port 22 when using Net::SFTP::Foreign, or port 21 when using Net::FTP)
- prefix
-
key for sensitive information (e.g. pwd and user) in config{sensitive} or system wide remoteHost/port in config{FTP}{lookups}{prefix}{remoteHost} or config{FTP}{lookups}{prefix}{port}. respects environment in $execute{env} if configured.
- privKey
-
sftp key file location for Net::SFTP::Foreign, either directly (insecure -> visible) or via sensitive lookup
- pwd
-
for password setting, either directly (insecure -> visible) or via sensitive lookup
- queue_size
-
queue_size for Net::SFTP::Foreign, values > 1 often cause connection issues
- remove
-
ref to hash {removeFolders=>[], day=>, mon=>, year=>1} for removing (archived) files with removeFilesOlderX, all files in removeFolders that are older than day days, mon months and year years are deleted
- remoteDir
-
remote root folder for up-/download, archive and remove, e.g. "out/Marktdaten/", path is then added for each filename (load)
- remoteHost
-
ref to hash of IP-addresses/DNS of host(s).
- SFTP
-
to explicitly use SFTP, if not given SFTP will be derived from existence of privKey or hostkey
- simulate
-
for removal of files using removeFilesinFolderOlderX/removeFilesOlderX: only simulate (1) or actually remove (0)?
- sshInstallationPath
-
path where the ssh/plink exe to be used by Net::SFTP::Foreign is located
- type
-
(A)scii or (B)inary, only applies to Net::FTP
- user
-
for user setting, either directly (insecure -> visible) or via sensitive lookup
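To show how prefix, the lookups table and the archive/remove settings relate, here is a minimal sketch of a site.config part and a matching task script part. Host names, port and credentials are the placeholders from the SYNOPSIS; the folder names and the retention period are hypothetical. In real scripts %config and %common are provided by EAI::Wrap and are assigned without "my".

```perl
use strict;
use warnings;

# site.config part (sketch): central settings, keyed by the prefix "ftpSystem1"
my %config = (
    sensitive => {
        ftpSystem1 => {user => "FTPuser", pwd => "FTPPwd"},
    },
    FTP => {
        lookups => {
            # remoteHost per environment, selected via $execute{env}
            ftpSystem1 => {remoteHost => {Test => "TestHost", Prod => "ProdHost"}, port => 5022},
        },
        maxConnectionTries => 5,
    },
);

# task_script.pl part (sketch): only the prefix and the task specific settings
my %common = (
    FTP => {
        prefix        => "ftpSystem1", # resolves user/pwd and remoteHost/port from %config
        remoteDir     => "/reports",   # remote root folder, set at login
        path          => "daily",      # hypothetical relative path under remoteDir
        localDir      => "incoming",   # hypothetical local download folder
        archiveDir    => "archive",    # hypothetical archive folder on the FTP server
        fileToArchive => 1,            # archive files on the server after transfer
        # remove files in "archive" that are older than 6 months
        remove        => {removeFolders => ["archive"], day => 0, mon => 6, year => 0},
        simulate      => 1,            # only simulate the removal at first
    },
);
```

Starting with simulate => 1 allows checking which files would be removed before switching to 0.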
- process
-
used to pass information within each process (data, additionalLookupData, filenames, hadErrors or commandline parameters starting with interactive) and for additional configurations not suitable for DB, File or FTP (e.g. uploadCMD* and onlyExecFor)
- additionalLookupData
-
additional data retrieved from database with EAI::Wrap::getAdditionalDBData
- archivefilenames
-
in case a zip archive package is retrieved, the filenames of these packages are kept here, necessary for cleanup at the end of the process
- data
-
loaded data: array (rows) of hash refs (columns)
- filenames
-
names of files that were retrieved and checked to be locally available for that load, can be more than the defined file in File->filename (due to glob spec or zip archive package)
- filesProcessed
-
hash for checking the processed files, necessary for cleanup at the end of the whole task
- hadErrors
-
set to 1 if there were any errors in the process
- interactive_
-
interactive options (are not checked), can be used to pass arbitrary data via command line into the script (e.g. a selected date for the run with interactive_date).
- onlyExecFor
-
define loads to only be executed when $common{task}{execOnly} =~ $load->{process}{onlyExecFor}. Empty onlyExecFor loads are always executed regardless of $common{task}{execOnly}
- successfullyDone
-
accumulates API sub names to prevent most API calls that ran successfully from being run again.
- uploadCMD
-
upload command for use with uploadFileCMD
- uploadCMDPath
-
path of upload command
- uploadCMDLogfile
-
logfile where command given in uploadCMD writes output (for error handling)
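The interplay of onlyExecFor and $common{task}{execOnly} can be sketched as follows: a load is skipped when execOnly does not match its onlyExecFor pattern, and loads with empty onlyExecFor always run. shouldExecuteLoad is a hypothetical helper written for this illustration, not a function of EAI::Wrap, and the load names are made up.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of EAI::Wrap): applies the documented rule
# "do not execute loads where execOnly !~ onlyExecFor".
sub shouldExecuteLoad {
    my ($execOnly, $load) = @_;
    my $onlyExecFor = $load->{process}{onlyExecFor};
    # loads with empty onlyExecFor are always executed
    return 1 unless defined $onlyExecFor and length $onlyExecFor;
    # onlyExecFor is used as the regex, execOnly as the string to check
    return $execOnly =~ /$onlyExecFor/ ? 1 : 0;
}

my %common = (task => {execOnly => "loadB"});
my @loads = (
    {process => {onlyExecFor => "loadA"}},        # skipped for execOnly "loadB"
    {process => {onlyExecFor => "loadB|loadC"}},  # executed
    {process => {}},                              # executed (empty onlyExecFor)
);

for my $i (0 .. $#loads) {
    my $run = shouldExecuteLoad($common{task}{execOnly}, $loads[$i]);
    print "load $i: ", ($run ? "execute" : "skip"), "\n";
}
```

How the module behaves when execOnly itself is empty is not covered by this sketch.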
- task
-
contains parameters used on the task script level
- customHistoryTimestamp
-
optional custom timestamp to be added to filenames moved to History/HistoryUpload/FTP archive, if not given, get_curdatetime is used (YYYYMMDD_hhmmss)
- execOnly
-
do not execute loads where $common{task}{execOnly} !~ $load->{process}{onlyExecFor}. Empty onlyExecFor loads are always executed regardless of $common{task}{execOnly}
- ignoreNoTest
-
ignore the notest file in the process-script folder, usually preventing all runs that are not in production
- plannedUntil
-
latest time that planned repetition should start, this can be given either as HHMM (HourMinute) or HHMMSS (HourMinuteSecond), in case of HHMM the "Second" part is attached as 59
- redoFile
-
flag for specifying a redo
- redoTimestampPatternPart
-
part of the regex for checking against filename in redo with additional timestamp/redoDir pattern (e.g. "redo", numbers and _), anything after the file's barename (and before ".$ext" if extension is defined) is regarded as a timestamp. Example: '[\d_]', the regex is built like ($ext ? qr/$barename($redoTimestampPatternPart|$redoDir)*\.$ext/ : qr/$barename($redoTimestampPatternPart|$redoDir)*.*/)
- retrySecondsErr
-
retry period in case of error
- retrySecondsErrAfterXfails
-
after fail count is reached this alternate retry period in case of error is applied. If 0/undefined then job finishes after fail count
- retrySecondsXfails
-
fail count after which retrySecondsErr is replaced by retrySecondsErrAfterXfails
- retrySecondsPlanned
-
retry period in case of planned retry
- skipHolidays
-
skip script execution on holidays
- skipHolidaysDefault
-
holiday calendar to take into account for skipHolidays
- skipWeekends
-
skip script execution on weekends
- skipForFirstBusinessDate
-
used for "wait with execution for first business date", either this is a calendar or 1 (then calendar is skipHolidaysDefault), this cannot be used together with skipHolidays
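The redo filename check described for redoTimestampPatternPart can be tried out in isolation. The sketch below builds the regex exactly as given in that description; redoPattern is a hypothetical helper for this illustration (not part of EAI::Wrap), and the file names are made up. The retry values are the ones from the SYNOPSIS.

```perl
use strict;
use warnings;

# task settings as in the SYNOPSIS (site.config)
my %task = (
    redoTimestampPatternPart   => '[\d_]',
    retrySecondsErr            => 60*5,   # retry period in case of error
    retrySecondsErrAfterXfails => 60*10,  # alternate retry period after the fail count
    retrySecondsXfails         => 2,      # fail count that triggers the alternate period
    retrySecondsPlanned        => 60*15,  # retry period for planned retries
);

# Hypothetical helper: builds the regex as described for redoTimestampPatternPart
sub redoPattern {
    my ($barename, $ext, $redoTimestampPatternPart, $redoDir) = @_;
    return $ext
        ? qr/$barename($redoTimestampPatternPart|$redoDir)*\.$ext/
        : qr/$barename($redoTimestampPatternPart|$redoDir)*.*/;
}

my $pattern = redoPattern("values", "csv", $task{redoTimestampPatternPart}, "redo");
for my $file ("values.csv", "values_20240131_120000.csv", "values_redo.csv", "values_final.csv") {
    printf "%-28s %s\n", $file, ($file =~ $pattern ? "matches" : "no match");
}
```

Only digits, underscores and the redoDir name are accepted between the barename and the extension, so "values_final.csv" is not treated as a redo file of "values.csv".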
COPYRIGHT
Copyright (c) 2024 Roland Kapl
All rights reserved. This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
The full text of the license can be found in the LICENSE file included with this module.