NAME

EAI::Wrap - framework for easy creation of Enterprise Application Integration tasks

SYNOPSIS

# site.config
%config = (
	sensitive => {
			dbSys => {user => "DBuser", pwd => "DBPwd"},
			ftpSystem1 => {user => "FTPuser", pwd => "FTPPwd", privKey => 'path_to_private_key', hostkey =>'hostkey'},
		},
	checkLookup => {"task_script.pl" => {errmailaddress => "test\@test.com", errmailsubject => "testjob failed", timeToCheck => "0800", freqToCheck => "B", logFileToCheck => "test.log", logcheck => "started.*"}},
	executeOnInit => sub {$execute{addToScriptName} = "doWhateverHereToModifySettings";},
	folderEnvironmentMapping => {Test => "Test", Dev => "Dev", "" => "Prod"},
	errmailaddress => 'your@mail.address',
	errmailsubject => "No errMailSubject defined",
	fromaddress => 'service@mail.address',
	smtpServer => "a.mail.server",
	smtpTimeout => 60,
	testerrmailaddress => 'your@mail.address',
	logRootPath => {"" => "C:/dev/EAI/Logs",},
	historyFolder => {"" => "History",},
	historyFolderUpload => "HistoryUpload",
	redoDir => {"" => "redo",},
	task => {
		redoTimestampPatternPart => '[\d_]',
		retrySecondsErr => 60*5,
		retrySecondsErrAfterXfails => 60*10,
		retrySecondsXfails => 2,
		retrySecondsPlanned => 60*15,
	},
	DB => {
		server => {Prod => "ProdServer", Test => "TestServer"},
		cutoffYr2000 => 60,
		DSN => 'driver={SQL Server};Server=$DB->{server}{$execute{env}};database=$DB->{database};TrustedConnection=Yes;',
		schemaName => "dbo",
	},
	FTP => {
		lookups => {
			ftpSystem1 => {remoteHost => {Test => "TestHost", Prod => "ProdHost"}, port => 5022},
		},
		maxConnectionTries => 5,
		sshInstallationPath => "C:/dev/EAI/putty/PLINK.EXE",
	},
	File => {
		format_defaultsep => "\t",
		format_thousandsep => ",",
		format_decimalsep => ".",
	}
);

# task_script.pl
use EAI::Wrap;
%common = (
	FTP => {
		remoteHost => {"Prod" => "ftp.com", "Test" => "ftp-test.com"},
		remoteDir => "/reports",
		port => 22,
		user => "myuser",
		privKey => 'C:/keystore/my_private_key.ppk',
		FTPdebugLevel => 0, # ~(1|2|4|8|16|1024|2048)
	},
	DB => {
		tablename => "ValueTable",
		deleteBeforeInsertSelector => "rptDate = ?",
		dontWarnOnNotExistingFields => 1,
		database => "DWH",
	},
	task => {
		plannedUntil => "2359",
	},
);
@loads = (
	{
		File => {
			filename => "Datafile1.XML",
			format_XML => 1,
			format_sep => ',',
			format_xpathRecordLevel => '//reportGrp/CM1/*',
			format_fieldXpath => {rptDate => '//rptHdr/rptDat', NotionalVal => 'NotionalVal', tradeRef => 'tradeRefId', UTI => 'UTI'}, 
			format_header => "rptDate,NotionalVal,tradeRef,UTI",
		},
	},
	{
		File => {
			filename => "Datafile2.txt",
			format_sep => "\t",
			format_skip => 1,
			format_header => "rptDate	NotionalVal	tradeRef	UTI",
		},
	}
);
setupEAIWrap();
standardLoop();

DESCRIPTION

EAI::Wrap provides a framework for defining EAI jobs directly in Perl, sparing the creator low-level tasks such as FTP fetching, file parsing and storing into a database. It can also be used to handle other workflows, like creating files from the database and uploading them to FTP servers, or using other externally provided tools.

The definition is done by first setting up datastructures for configurations and then providing a high-level scripting of the job itself using the provided subs (although any perl code is welcome here!).

EAI::Wrap has a lot of infrastructure already included, like logging using Log4perl, database handling with DBI and DBD::ODBC, FTP services using Net::SFTP::Foreign, file parsing using Text::CSV (text files), Data::XLSX::Parser and Spreadsheet::ParseExcel (excel files), XML::LibXML (xml files), file writing with Spreadsheet::WriteExcel and Excel::Writer::XLSX (excel files), Text::CSV (text files).

Furthermore it provides very flexible commandline options, allowing almost all configurations to be set on the commandline. Commandline options (e.g. additional information passed on with the interactive option) of the task script are fetched at INIT allowing use of options within the configuration, e.g. $opt{process}{interactive_startdate} for a passed start date.

Logging, configured in $ENV{EAI_WRAP_CONFIG_PATH}/log.config (logfile root path set in $ENV{EAI_WRAP_CONFIG_PATH}/site.config), also starts immediately at INIT of the task script. To use a logger, simply call get_logger(). For the logging configuration, see EAI::Common, setupLogging.

There are two accompanying scripts:

setDebugLevel.pl to easily modify the configured log-levels of the task-script itself and all EAI-Wrap modules.

checkLogExist.pl to run checks on the produced logs (at given times using a cron-job or other scheduler) for their existence and certain (starting/finishing) entries, giving error notifications if the check failed.

API: datastructures for configurations

%config

global config (set in $ENV{EAI_WRAP_CONFIG_PATH}/site.config, amended with $ENV{EAI_WRAP_CONFIG_PATH}/additional/*.config). It contains special parameters (default error mail sending, logging paths, etc.) and site-wide pre-settings for the five categories in task scripts, described below under configuration categories.

%common

common configs for the task script, may contain one configuration hash for each configuration category.

@loads

list of hashes defining specific load processes within the task script. Each hash may contain one configuration hash for each configuration category.

configuration categories

The hashes mentioned above can contain five categories (sub-hashes): DB, File, FTP, process and task. These allow further parameters to be set for the respective parts of EAI::Wrap (EAI::DB, EAI::File and EAI::FTP), process parameters and task parameters. The parameters are described in detail in section CONFIGURATION REFERENCE.

The process category is on the one hand used to pass information within each process (data, additionalLookupData, filenames, hadErrors or custom commandline parameters starting with interactive), on the other hand for additional configurations not suitable for DB, File or FTP (e.g. uploadCMD). The task category contains parameters used on the task script level and is therefore only allowed in %config and %common. It contains parameters for skipping, retrying and redoing the whole task script.

The settings in DB, File, FTP and task are "merge" inherited in a cascading manner (i.e. missing parameters are merged, parameters already set below are not overwritten):

- %config (defined in site.config and other associated configs. This is being loaded at INIT)
merged into ->
- %common (common task parameters defined in script. This is being loaded when calling setupEAIWrap())
merged into each instance of ->
- $loads[] (only if loads are defined, you can also stay with %common if there is only one load in the script)
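The cascade above can be sketched in plain Perl. This is only an illustration of the "missing parameters are merged, set parameters are kept" rule, not the code EAI::Wrap actually uses; the helper merge_missing and the sample values are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of EAI::Wrap): copy every key from the
# parent level that the child level has not set itself.
sub merge_missing {
    my ($parent, $child) = @_;
    my %merged = %$child;
    for my $key (keys %$parent) {
        $merged{$key} = $parent->{$key} unless exists $merged{$key};
    }
    return \%merged;
}

# Sample settings for the DB category on the three levels.
my %config = (DB => {schemaName => "dbo", cutoffYr2000 => 60});
my %common = (DB => {database => "DWH", schemaName => "rpt"});
my @loads  = ({DB => {tablename => "ValueTable"}});

# %config is merged into %common, the result into each load.
my $commonDB = merge_missing($config{DB}, $common{DB});
my $loadDB   = merge_missing($commonDB, $loads[0]{DB});

print join(", ", map { "$_=$loadDB->{$_}" } sort keys %$loadDB), "\n";
```

The load ends up with its own tablename, database from %common, cutoffYr2000 from %config, and schemaName from %common, because the value set in %common is not overwritten by the one in %config.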

Special config parameters and DB, FTP, File and task parameters from command line options are merged at the respective level (config at the top, the rest at the bottom) and always override any parameters already set. Only scalar parameters can be given on the command line; lists and hashes are not possible. Command line options are given in the format:

--<category> <parameter>=<value>

for the common level and

--load<i><category> <parameter>=<value>

for the loads level.

Command line options are also available to the script via the hash %opt or the list of hashes @optloads, so in order to access the cmdline option --process interactive_date=202300101 you could either use $common{process}{interactive_date} or $opt{process}{interactive_date}.

In order to use --load1process interactive_date=202300101, you would use $loads[1]{process}{interactive_date} or $optloads[1]{process}{interactive_date}.
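To make the option format more concrete, here is a minimal sketch of how options of this shape could be parsed with Getopt::Long. It is an assumption for illustration only; EAI::Wrap does its own parsing at INIT and may work differently. The argument list, the date values and the regrouping into @optloads are made up for the example.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Simulated command line of a task script (values are examples only).
my @args = (
    '--process',      'interactive_date=20230101',
    '--load1process', 'interactive_date=20230102',
);

# "=s%" lets an option collect <parameter>=<value> pairs into a hash.
my %opt;
GetOptionsFromArray(\@args, \%opt, 'process=s%', 'load1process=s%')
    or die "could not parse command line options\n";

# Hypothetical regrouping: move --load<i><category> options into a
# list of hashes indexed by <i>.
my @optloads;
for my $name (keys %opt) {
    if ($name =~ /^load(\d+)(\w+)$/) {
        $optloads[$1]{$2} = delete $opt{$name};
    }
}

print "common: $opt{process}{interactive_date}\n";
print "load 1: $optloads[1]{process}{interactive_date}\n";
```

In a real task script none of this is needed, as %opt and @optloads are already filled when the configuration is defined.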

The merge inheritance for DB, FTP, File and task can be prevented by using an underscore after the hashkey, i.e. DB_, FTP_, File_ and task_. In this case the parameters are not merged from common. However, they are always inherited from config.
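As a configuration fragment, preventing the inheritance for one load might look like the following. The table and database names are invented, and placing DB_ inside a load is an assumption derived from the description above.

```perl
# task_script.pl (fragment)
use EAI::Wrap;
%common = (
	DB => {
		tablename => "ValueTable",
		database => "DWH",
	},
);
@loads = (
	{
		# regular case: DB parameters missing here are merged from %common
		DB => {tablename => "OtherTable"},
	},
	{
		# DB_ instead of DB: nothing is merged from %common for this load,
		# so everything needed has to be given here (or come from %config)
		DB_ => {tablename => "SpecialTable", database => "OtherDB"},
	},
);
setupEAIWrap();
```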

A special merge is done for configurations defined in hash lookups, which may appear in all five categories (sub-hashes) of the top-level configuration %config. This uses the prefix defined in the task script's %common configuration to get generally defined settings for this specific prefix. As an example, common remoteHosts or ports for FTP can be defined here. These settings also allow an environment dependent hash, like {Test => "TestHost", Prod => "ProdHost"}.
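The lookup mechanism can be pictured with the following plain Perl sketch. It assumes that the prefix is given as prefix => "ftpSystem1" in the FTP category of %common and uses a hypothetical helper resolve_lookup; the actual merge is done inside EAI::Wrap and may differ in detail.

```perl
use strict;
use warnings;

# Site-wide lookups as in the SYNOPSIS, keyed by prefix.
my %config = (
    FTP => {
        lookups => {
            ftpSystem1 => {
                remoteHost => {Test => "TestHost", Prod => "ProdHost"},
                port       => 5022,
            },
        },
    },
);
# Task script settings naming the prefix (assumed key name: prefix).
my %common = (FTP => {prefix => "ftpSystem1", remoteDir => "/reports"});
my $env = "Test";   # stands in for $execute{env}

# Hypothetical helper: fetch the settings for a prefix and reduce
# environment-dependent hashes to the value of the given environment.
sub resolve_lookup {
    my ($lookups, $prefix, $env) = @_;
    my $entry = $lookups->{$prefix} || {};
    my %resolved;
    for my $key (keys %$entry) {
        my $value = $entry->{$key};
        $resolved{$key} = ref($value) eq 'HASH' ? $value->{$env} : $value;
    }
    return \%resolved;
}

# Settings from the task script take precedence over looked-up ones.
my $ftp = {
    %{ resolve_lookup($config{FTP}{lookups}, $common{FTP}{prefix}, $env) },
    %{ $common{FTP} },
};
print "$ftp->{remoteHost}:$ftp->{port}$ftp->{remoteDir}\n";
```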

%execute

hash of parameters for the current task execution, which is not set by the user but can be read to set other parameters and control the flow. Most important here are $execute{env}, giving the currently used environment (Prod, Test, Dev, whatever), $execute{envraw} (same as $execute{env}, with Production being empty here), the several file lists (files being processed, files for deletion/moving, etc.), flags for ending/interrupting processing and directory locations such as the home dir and history folders for processed files.

Detailed information about these parameters can be found in section execute of the configuration parameter reference. There are parameters for files (filesProcessed, filesToDelete, filesToMoveinHistory, filesToMoveinHistoryUpload, retrievedFiles and uploadFilesToDelete), directories (homedir, historyFolder, historyFolderUpload and redoDir) and process control (failcount, firstRunSuccess, retryBecauseOfError, retrySeconds and processEnd).

Retrying while $execute{processEnd} is false can happen for two reasons (this parameter is set during processingEnd(); the call and the check can be combined in the loop header with processingContinues()). First, task => {plannedUntil => "HHMM"} is set to a time until which the task has to be retried; this is done at most until midnight. Second, an error occurred; in that case $process->{hadErrors} is set for each load that failed. $process{successfullyDone} is also important in this context, as it prevents the following API procedures from being run again for loads that did not have an error during their execution:

openDBConn, openFTPConn, getLocalFiles, getFilesFromFTP, getFiles, extractArchives, getAdditionalDBData, readFileData, dumpDataIntoDB, writeFileFromDB, putFileInLocalDir, uploadFileToFTP, uploadFileCMD, and uploadFile.

checkFiles is always run, regardless of $process{successfullyDone}.
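The interplay of hadErrors and successfullyDone in a retry loop can be simulated without any connections. The sketch below is a simplified model and not the bookkeeping of EAI::Wrap itself: run_once is a hypothetical wrapper, the two steps are stubs, and the waiting time between retries (retrySeconds) is left out.

```perl
use strict;
use warnings;

my %process = (successfullyDone => "", hadErrors => 0);
my %calls;          # how often each step was really executed
my $attempt = 0;

# Hypothetical wrapper: skip a step that is already recorded as done,
# otherwise run it and record success or failure.
sub run_once {
    my ($name, $step) = @_;
    return 1 if $process{successfullyDone} =~ /\b\Q$name\E\b/;
    $calls{$name}++;
    if ($step->()) {
        $process{successfullyDone} .= "$name,";
        return 1;
    }
    $process{hadErrors} = 1;
    return 0;
}

my $processEnd = 0;
until ($processEnd) {
    $attempt++;
    $process{hadErrors} = 0;
    # getFiles always succeeds, readFileData fails on the first attempt
    run_once(getFiles => sub { 1 })
        and run_once(readFileData => sub { $attempt > 1 });
    # stop retrying as soon as an attempt finished without errors
    $processEnd = !$process{hadErrors};
}
print "attempts: $attempt, getFiles: $calls{getFiles}, readFileData: $calls{readFileData}\n";
```

In the second attempt getFiles is skipped because its name is already contained in successfullyDone, so only the failed step is repeated.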

After the first successful run of the task, $execute{firstRunSuccess} is set to prevent any error messages resulting from files having been moved/removed while rerunning the task until the defined planned time (task => {plannedUntil => "HHMM"}) has been reached.

initialization

The INIT procedure is executed at the task script initialization (when EAI::Wrap is "use"d in the task script) and loads the site configuration, starts logging and reads commandline options. This means that everything passed to the script via command line may be used in the definitions, especially the task{interactive.*} parameters. For these, the name and the type of the parameter are not checked by the consistency checks (other parameters that are not allowed or have the wrong type throw an error). The task script's configuration itself is then read with setupEAIWrap(), which is usually called immediately after the datastructures for configurations have been finished.

API: High-level subs

Following are the high-level subs that can be called for a standard workflow. Most of them accumulate their sub names in process{successfullyDone} to prevent any further call in a faulting loop when they already ran successfully. Also, process{hadErrors} is set in case of errors to allow repeating after an error. Downloaded files are collected in process{filenames} and completely processed files in process{filesProcessed}.

setupEAIWrap

setupEAIWrap is actually imported from EAI::Common, but as it is usually called as the first sub, it is mentioned here as well. This sub sets up the configuration datastructure and merges the hierarchy of configurations, more information in EAI::Common::setupEAIWrap.

removeFilesinFolderOlderX

Usually done for clearing FTP archives, this removes files on the FTP server that are older than a given time span (given as day/mon/year in remove => {removeFolders => ["",""], day=>, mon=>, year=>1}), see EAI::FTP::removeFilesOlderX (always runs in a faulting loop).

openDBConn ($)