Setting Options

Net::Z3950::AsyncZ sets options by means of named parameters for both the parent process and each of its child processes. Options for the parent are set in Net::Z3950::AsyncZ->new. Options for the child processes are set via the options parameter of Net::Z3950::AsyncZ->new; the value of this parameter must be a reference to an array of Net::Z3950::AsyncZ::Options::_params objects.

If a _params object doesn't exist for a child process, Net::Z3950::AsyncZ->new will create one with a set of default options. There will always be a _params object for every server in the servers array, and the two arrays are cross-indexed: $_params_object[0] is used for $servers[0], and so on. So, if you are creating your own array of _params objects, you must keep this parallelism in mind.

Types of Options

      [1] Options set in Net::Z3950::AsyncZ::new which control the parent process and selected features of the child processes for which no alternatives are present: the alternatives are set as indicated in [2] and [3].

      [2] Options set in a Net::Z3950::AsyncZ::Options::_params object: this is returned by Net::Z3950::AsyncZ::asyncZOptions(). There is one _params object for each server: if you don't create one, it is created for you with the default values. If you don't create a _params object for a server, then log and query options set in the AsyncZ constructor will be used. The rationale behind this is that you usually will be asking one question across all servers and will usually be using only one log file for debugging.

But in all other cases where it is possible to set an option for the child in either the AsyncZ constructor or _params, the _params setting will be used. At the moment this affects the format and num_to_fetch options.

      [3] Options set in the Net::Z3950::Manager by using the Z3950_options option of the _params object. These take precedence over any others and must be passed in with the first _params object, that is, $_params_object[0], because AsyncZ uses only one Net::Z3950::Manager. The Manager is created when setting up the first server passed into the constructor.
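
The fallback rules in [1] and [2] can be modelled in a few lines of plain Perl. This is only a sketch of the documented behaviour, not the module's own code: the hashes stand in for the constructor arguments and for a _params object, and effective_options is a name invented for the example. It covers query and num_to_fetch only.

```perl
use strict;
use warnings;

# Illustrative model only: plain hashes stand in for the AsyncZ constructor
# arguments and for a _params object. This is not the module's own code.
my %DEFAULTS = (num_to_fetch => 5);

# Resolve the effective query and num_to_fetch for one server.
# $ctor   - hash ref of options passed to the constructor
# $params - hash ref of options the caller put in a _params object,
#           or undef if the caller created none for this server
sub effective_options {
    my ($ctor, $params) = @_;

    # No user-created _params object: the constructor settings apply.
    if (!defined $params) {
        return {
            query        => $ctor->{query},
            num_to_fetch => $ctor->{num_to_fetch} // $DEFAULTS{num_to_fetch},
        };
    }

    return {
        # query falls back to the constructor's query when _params has none
        query        => $params->{query} // $ctor->{query},
        # num_to_fetch never falls back to the constructor: _params or default
        num_to_fetch => $params->{num_to_fetch} // $DEFAULTS{num_to_fetch},
    };
}

my %ctor = (query => '@attr 1=1003 "Henry James"', num_to_fetch => 10);

my $no_params   = effective_options(\%ctor, undef);
my $bare_params = effective_options(\%ctor, {});
my $own_query   = effective_options(\%ctor, { query => '@attr 1=4 "Daisy Miller"' });

for my $case (['no _params', $no_params], ['bare', $bare_params], ['own query', $own_query]) {
    printf "%-12s query=%s num_to_fetch=%d\n",
        $case->[0], $case->[1]{query}, $case->[1]{num_to_fetch};
}
```

The "bare" case is the one that tends to surprise: creating a _params object without num_to_fetch yields 5, not the constructor's 10, while the query is still inherited.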

Note:

Default values for options are shown to the right of the => operator:

HTML=>0

In some instances, the type of variable is shown and defaults detailed in commentary:

format=>\&format

Net::Z3950::AsyncZ::new

cb

        cb=>\&cb    callback function to which records will be sent as available. See Output Callback.

format

        format=>\&format    callback function to format individual lines of records. See Format Callback. If you create a _params object for a server and do not set its format option, then the default format will be used, even if you set the format option of the AsyncZ constructor to another value.

interval

        interval=>1    Event loop timer interval in seconds: This controls how frequently AsyncZ checks to see if servers have responded and if the timeout period is up.

log

        log=>undef    controls how extended error messages are handled. There are two sets of error messages: those handled through Net::Z3950::AsyncZ::ErrMsg, which are meant for the user, and those meant for debugging. The latter are generated by both AsyncZ and the Perl library and can accumulate at a rapid clip. AsyncZ writes its debugging messages to STDOUT, while those coming from library routines almost always go to STDERR. There are 3 options for log.

        [1] undef, the default, in which case all debugging messages go to the terminal, and those written to STDOUT will end up in a browser if you are on the web.

        [2] log=>Net::Z3950::AsyncZ::Errors::suppressErrors() (or log=>suppressErrors() if you import the function), in which case these messages will be suppressed.

        [3] log=>$filespec, in which case all of these messages will go to the file specified in $filespec.

The Net::Z3950::AsyncZ::Options::_params object also has a log option, which means that you can specify a log file for each child process (i.e. for each server queried) while keeping a separate one for the parent. Or you can set up a system where parent and child_1 write to log.1, while child_2 and child_3 write to log.2, etc.

Note: All error logs are automatically opened and closed. Do NOT open or close them yourself!

maxpipes

        maxpipes=>4    maximum number of forks to be executed at one time; the greater the number, the more resources are used, both memory and CPU.

monitor

        monitor=>0    timeout in seconds for a monitoring child process, or 0, in which case a monitor is not set.

        The monitor is a child process which runs a timer and kills the parent process if the parent exceeds the timeout period. Run the monitor only if your software hangs. An orderly shutdown of all running processes is then put into effect, the purpose of which is to prevent the development of zombie processes and to release all shared memory.

num_to_fetch

        num_to_fetch=>5    number of records to fetch; this setting will be used only if you have not created a _params object. This means that if you create a _params object for the server and do not set its num_to_fetch option, then num_to_fetch will default to 5 even if you have set another value for num_to_fetch in the AsyncZ constructor.

options

        options=>\@options    reference to an array of references to "Net::Z3950::AsyncZ::Options::_params" objects. Each reference is obtained from a call to "Net::Z3950::AsyncZ::asyncZOptions". For instance:

@options = (
	asyncZOptions(option_1=>opt_1,option_2=>opt_2, . . .),
	undef,
	asyncZOptions(option_1=>opt_1,option_2=>opt_2, . . .)
	);

This array parallels the servers array:

@servers = (
	[$host_1, $port_1, $database_1],
	[$host_2, $port_2, $database_2],
	[$host_3, $port_3, $database_3]
	);

$options[0] is used for $servers[0] and $options[2] for $servers[2]. If a _params object is not found or if it is not defined, as for $servers[1], then a default _params object is created for the server.
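
The index-by-index pairing can be pictured with the following plain-Perl sketch. It does not call the module: the host names are placeholders, hash refs stand in for _params objects, and the default of 5 records is taken from the num_to_fetch entry.

```perl
use strict;
use warnings;

# Hypothetical stand-ins: host names are placeholders and plain hash refs
# take the place of _params objects.
my @servers = (
    ['z3950.example.org', 210, 'DB1'],
    ['z3950.example.net', 210, 'DB2'],
    ['z3950.example.com', 210, 'DB3'],
);
my @options = (
    { num_to_fetch => 10 },
    undef,                      # no options for the second server
    { num_to_fetch => 2 },
);

# Pair each server with the options at the same index; an undefined or
# missing slot gets a fresh set of defaults.
my @paired;
for my $i (0 .. $#servers) {
    my $opts = $options[$i] // { num_to_fetch => 5 };
    push @paired, { server => $servers[$i], options => $opts };
}

for my $p (@paired) {
    my ($host, $port, $db) = @{ $p->{server} };
    printf "%s:%d/%s fetches %d records\n",
        $host, $port, $db, $p->{options}{num_to_fetch};
}
```

Keeping an explicit undef in the options list, as in the second slot here, is what preserves the alignment when only some servers need their own settings.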

query

        query=>undef    the query string: its format depends on Z3950 querytype and defaults to 'prefix' (as in Net::Z3950). You can set a separate Z3950 querytype for each query, or you can change the querytype for all servers by using Z3950_options.

If you create a _params for a server but do not set the query option in _params, then this query will be used. This means that you can set one query for all of your servers without having to re-set it for each of the _params objects you create. But if you create a _params with a different query, then the query set in _params will be used.

servers

        servers=>\@servers    array of references to servers in form: [ $host, $port, $database]

        See options above and AsyncZ.pod: "The Basic Script".

        See also basic.pl

swap_attempts

        swap_attempts=>5    the number of times that a swap check will be done before exiting; see swap_check for details.

swap_check

        swap_check=>0    the number of seconds between checks for swapping activity, used when querying a great number of servers and requesting large amounts of data. It instructs AsyncZ to sleep for swap_check seconds before processing any further connections.

        If you are attempting to process too much data for the size of your RAM, the system will have to swap out of memory into the swap space on your disk; too much swapping causes loss of data and disk "thrashing" (i.e. repeated disk access) and will overburden the system. When swap_check is set, AsyncZ will check for signs of swap activity; if it finds any, it will go to sleep for the number of seconds set in swap_check and then re-check, up to swap_attempts times. If the swap activity continues beyond this number of checks, AsyncZ dies.

        For large throughput, you will probably want to set the monitor, and to set it for a long period of time, for instance 3000 seconds. This means that you can set swap_check to a period of 10, 20, or 30 seconds. The values you set for these options will depend on your own system's memory resources and the amount of data you are processing.

        Note: This has been tested only on Linux but should also work on Unix, at least on Solaris.
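
The check, sleep, and re-check cycle described for swap_check and swap_attempts might look roughly like the sketch below. It is one reading of the description, not AsyncZ's implementation: wait_for_swap_to_settle and both code references are hypothetical, and the way swap activity is actually detected is not shown.

```perl
use strict;
use warnings;

# Sketch of the check/sleep/re-check cycle. Both code refs are hypothetical:
# $swapping->() returns true while swap activity is seen, and $pause->($secs)
# stands in for sleep so the example runs instantly.
sub wait_for_swap_to_settle {
    my (%arg) = @_;
    my ($swap_check, $swap_attempts) = @arg{qw(swap_check swap_attempts)};
    my ($swapping, $pause)           = @arg{qw(swapping pause)};

    return 1 unless $swap_check;      # 0 disables the check (the default)

    my $rechecks = 0;
    while ($swapping->()) {
        # Still swapping after the allowed number of re-checks: give up.
        return 0 if $rechecks++ >= $swap_attempts;
        $pause->($swap_check);        # wait before looking again
    }
    return 1;                         # no swap activity: carry on
}

my @readings = (1, 1, 0);             # swapping twice, then quiet
my $slept    = 0;
my $ok = wait_for_swap_to_settle(
    swap_check    => 10,
    swap_attempts => 5,
    swapping      => sub { shift @readings },
    pause         => sub { $slept += $_[0] },
);
print $ok ? "settled after ${slept}s of waiting\n" : "giving up\n";
```

In this model the worst case blocks for swap_check times swap_attempts seconds, which is why the monitor timeout has to be set comfortably longer than that product.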

timeout

        timeout=>25    total timeout in seconds for all processes to complete their work.

timeout_min

        timeout_min=>5    minimum timeout in seconds to exit Event loop if all processes are finished; a security blanket to make sure all processes get a chance to report their results to the parent process before exiting the loop.

 

   

Net::Z3950::AsyncZ::Options::_params

Where a _params option duplicates an AsyncZ::new option, consult the AsyncZ::new description for more details.

HTML

        HTML=>0    if true use default HTML formatting for records, if false format as plain text; see "Row Formatting Priorities".

Z3950_options

        Z3950_options=>undef    reference to hash of additional Z3950 options.

These options are passed to the Z3950 Manager and take precedence over _params options and options set in Net::Z3950::AsyncZ->new. Z3950_options makes it possible to implement Z3950 options which may not be specifically accounted for in any of the options to the AsyncZ module. For instance, to ask for "full" as opposed to "brief" records (which is the Z3950 default):

@options = (asyncZOptions(Z3950_options=>{elementSetName=>'f'}), asyncZOptions(. . .), . . . );

Note: To use this option, it must appear in the first _params object of the _params array, $options[0], as in the above example. It is ignored in any subsequent uses. This means that you cannot set these options on a per-server basis; they apply across the board to all the servers you are querying. In the above example, for instance, you could not ask for brief records from some servers and full from others.

See "Types of Options"

cb

        cb=>\&cb    reference to callback function to which records will be sent as available

format

        format=>\&format    reference to a callback function that formats each row of a record

interval

        interval=>5    timer interval for this forked process. See interval above under Net::Z3950::AsyncZ::new.

log

        log=>undef    controls how extended error messages are handled for this child process. A separate log file can be opened for each process.

Note: All error logs are automatically opened and closed.

See log above under Net::Z3950::AsyncZ::new.

marc_fields
marc_subst
marc_userdef
marc_xcl

These options are fully described and illustrated in Report.pod under the heading "MARC Bibliographic Format".

num_to_fetch

        num_to_fetch=>5    number of records to fetch from this server.

pipetimeout

        pipetimeout=>20    timeout in seconds for this child process

preferredRecordSyntax

        preferredRecordSyntax=>Net::Z3950::RecordSyntax::USMARC    the Z3950 preferredRecordSyntax for this child process

query

        query=>undef    the query for this process

querytype

        querytype=>'prefix'    Z3950 querytype for this child process; it can also be set to 'ccl' or 'ccl2rpn'.

raw

        raw=>0    (boolean) if true the raw record data for this process is returned; its format is dependent on the render option.

render

        render=>1    (boolean) if true the raw record data for this process is returned filtered through the Z3950 Record::render function; this is the default. If false the raw data is returned unfiltered in its original state. The unfiltered raw data can be read using Net::Z3950::AsyncZ::prep_Raw and Net::Z3950::AsyncZ::get_ZRawRec.

startrec

        startrec=>1    number of the record with which to start result from Record Set.

utf8

        utf8=>0    when set to true, conversions will be made to utf8/unicode characters from the character codes used in MARC records to represent non-latin1 and accented latin1 characters. When outputting utf8, you must call binmode on the output stream, for example:

binmode(STDOUT, ":utf8");

When outputting to a browser, you should also notify the browser:

	print "Content-type: text/html; charset=utf-8\n\n";
	print '<head><META http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body>';

See the sample script: MARC_HTML.pl.

Note: To use utf8 you must have the MARC::Charset module installed. Otherwise, the utf8 option will be ignored.

 

 

Row Formatting Priorities

If more than one option is set that affects the formatting of a record's rows, the following priority sequence is in effect:

raw, format, HTML, plaintext (default)
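
As a rough illustration, the priority sequence amounts to a chain of checks like the one below. This is a sketch in plain Perl: the hash stands in for a _params object, "format" is taken to mean a format callback supplied by the caller, and row_format_mode and the mode names it returns are invented for the example.

```perl
use strict;
use warnings;

# Pick the row-formatting mode for one server's options, in the documented
# order raw, format, HTML, plaintext. The hash is a stand-in for a _params
# object, and the mode names returned here are made up for the example.
sub row_format_mode {
    my ($opt) = @_;
    return 'raw'    if $opt->{raw};
    return 'format' if ref($opt->{format}) eq 'CODE';
    return 'HTML'   if $opt->{HTML};
    return 'plaintext';
}

my $fmt = sub { return "<li>$_[0]</li>" };

print row_format_mode({ HTML => 1 }), "\n";
print row_format_mode({ HTML => 1, format => $fmt }), "\n";
print row_format_mode({ HTML => 1, format => $fmt, raw => 1 }), "\n";
print row_format_mode({}), "\n";
```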
 

 

Methods for Setting _params Options

get/set methods

Net::Z3950::AsyncZ::Options::_params provides a full range of get_option / set_option methods, enabling the dynamic setting of option values.

	$_params_object->set_HTML(0);
        $num_to_fetch = $_params_object->get_num_to_fetch();

In addition there are functions for setting options with fixed values:

Function            Equivalent

set_marc_xtra()     set_marc_fields($Net::Z3950::AsyncZ::Report::xtra)
set_marc_all()      set_marc_fields($Net::Z3950::AsyncZ::Report::all)
set_marc_std()      set_marc_fields($Net::Z3950::AsyncZ::Report::std)
set_raw_on()        set_raw(1)
set_raw_off()       set_raw(0)
set_plaintext()     set_HTML(0)
set_HTML()          set_HTML(1)
set_prefix()        set_querytype('prefix')
set_ccl()           set_querytype('ccl')
set_GRS1()          set_preferredRecordSyntax(Net::Z3950::RecordSyntax::GRS1)
set_USMARC()        set_preferredRecordSyntax(Net::Z3950::RecordSyntax::USMARC)

The get/set methods guarantee that you have in fact set or queried the option you are interested in and, in the case of the fixed value options, that you have set it to the value required. You don't have to be concerned that a meaningless hash key will spring into existence through misspelling:

# 'leg' and 'num_to_fish' are misspellings of 'log' and 'num_to_fetch'
$_params_object = asyncZOptions(leg=>'Error.LOG', num_to_fish=>3);

In the case of some of the fixed value methods, one advantage is the obvious simplicity of calling set_GRS1() instead of set_preferredRecordSyntax(Net::Z3950::RecordSyntax::GRS1).

Net::Z3950::AsyncZ::Options::_params::option

This method works to both get and set values.

$value = $_params_obj->option('option');
$old_options_ref = $_params_obj->option(option=>value,option=>value,option=>value. . . );

params

in get mode:    'option' to be queried
in set mode:    list of option=>value pairs to be set (or %hash)

returns

in get mode:    $value of option being queried
in set mode:    $old_options_ref -- reference to a hash of the option=>value pairs
                which have been replaced by list or %hash
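
The two modes can be pictured with a toy class. This is only a model of the behaviour described above and not the _params implementation; ToyParams is a hypothetical name, and the real method may handle unknown options differently.

```perl
use strict;
use warnings;

# Toy class, not the real _params implementation: it keeps options in a
# hash and mimics the get/set behaviour described for option().
package ToyParams;

sub new {
    my ($class, %opts) = @_;
    my $self = { %opts };
    return bless $self, $class;
}

sub option {
    my ($self, @args) = @_;

    # Get mode: a single option name returns its current value.
    return $self->{ $args[0] } if @args == 1;

    # Set mode: store each pair and collect the values being replaced.
    my %new = @args;
    my %old;
    for my $name (keys %new) {
        $old{$name}    = $self->{$name};
        $self->{$name} = $new{$name};
    }
    return \%old;
}

package main;

my $p   = ToyParams->new(num_to_fetch => 5, HTML => 0);
my $old = $p->option(num_to_fetch => 3, HTML => 1);
print "was $old->{num_to_fetch}, now ", $p->option('num_to_fetch'), "\n";
```

Holding on to the returned hash reference makes it straightforward to restore the earlier settings later by passing %$old back in set mode.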

Net::Z3950::AsyncZ::Options::_params::validOption

$bool = $_params_obj->validOption('option');

Net::Z3950::AsyncZ::Options::_params::invalidOption

$bool = $_params_obj->invalidOption('option');

Both of the above methods let you determine whether an option you want to set is valid. They are useful when using Net::Z3950::AsyncZ::Options::_params::option.

	$option = 'num_to_fetch';
        $_params_obj->validOption($option) ? $_params_obj->option($option=>3) :
				 die "invalid option: $option";

Net::Z3950::AsyncZ::Options::_params::test

$_params_obj->test();

Calling this function will print a listing of defined options and values for $_params_obj.

 

 

AUTHOR

Myron Turner <turnermm@shaw.ca> or <mturner@ms.umanitoba.ca>

COPYRIGHT AND LICENSE

Copyright 2003 by Myron Turner

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.