NAME

File::AptFetch - Perl interface to APT-Methods.

SYNOPSIS

# TODO:

DESCRIPTION

In short:

  • Methods are ordinary executables; hence F::AF forks.

  • Methods have no command-line interface. The IPC runs over two pipes (STDIN and STDOUT from the method's point of view).

  • Each unit of communication (called a message) consists of a numerical code with explanatory text, followed by a sequence of lines whose fields are separated by a colon (':'). A message is terminated by an empty line.

  • File::AptFetch::Cookbook has more.
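The message layout described above can be sketched as follows. This is only an illustration, not F::AF code: format_message is a hypothetical helper, and the '600 URI Acquire' request with its URI and Filename headers is taken from the "APT Method Interface" description, not from this module.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of File::AptFetch): serialize one message.
# A message is a status line ("<code> <text>"), then "Header: value" lines,
# then an empty line that terminates the message.
sub format_message {
    my( $code, $text, @headers ) = @_;
    my @lines = ( "$code $text" );
    while( my( $name, $value ) = splice @headers, 0, 2 ) {
        push @lines, "$name: $value";
    }
    # The two trailing empty strings produce the terminating empty line.
    return join "\n", @lines, '', '';
}

print format_message(
    600, 'URI Acquire',
    URI      => 'file:/etc/hostname',
    Filename => '/tmp/hostname' );
```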

(disclaimer) Right now, F::AF is in a "proof-of-concept" state. It certainly works with local methods (file and copy); I hope it will work in trivial cases with remote methods. (F::AF has no means to accept, let alone pass along, authentication credentials; so if your upstream needs authentication, F::AF is of no help here.) One more warning: you're supposed to do all the dirty work of management yourself -- F::AF is only for communication. Hopefully, some day there will be a kind of super-module that simplifies all this.

(warning) You should understand one potential problem with F::AF: wget(1), curl(1), various FTP clients, and whatever else serves as a fetcher are (I hope) thoroughly tested against a monkey wrench on the other side of the connection. APT methods are not. APT talks to repositories; those repositories are mostly mirrors, and administrators of mirrors and mirror-net roots have at least a basic clue. How APT methods behave when they face a misbehaving peer on the other side of the connection is yet to be discovered.

Known bugs, caveats, and deficiencies:

  • At two points F::AF reads and writes pipes. SIGALRM and SIGPIPE are handled there (SIGCHLD support merely reports that signal; the signal itself is ignored). However, it's possible that the eval would be broken by some other signal. Hopefully, some day I'll find another way to handle that situation. Right now F::AF will die.

  • It seems that in normal operation no zombies are left behind. However, I'm not sure that waitpid works as expected. (What if some method takes a long time to die after being signalled?)

  • SIGCHLD is ignored by default. SIGPIPE is not; it's handled only while interacting with a child. If a method decides to die at some time outside those IPC sections, then your process will get SIGCHLD and possibly SIGPIPE. (To be honest, maybe I'm overly pessimistic here: if a process goes away, it becomes a zombie; if it hasn't closed its input (your output), then it should stay; then there's no place for SIGPIPE. Should verify.)

  • Methods are supposed (or not?) to write extra diagnostics to their STDERR, which is the same as your process's STDERR. However, I still haven't seen any such output. So (first) I (and you) have nothing to worry about, and (second) I have nothing to work with. It's possible that this issue will stay as a caveat.

  • @$log is fragile. Don't touch it. However, @$log can get corrupted like this: if a method goes insane and outputs unparsable messages, then "gain" will give up immediately, leaving @$log non-empty. In that case you're supposed to recreate the F::AF object (or give up). If you don't, then strange things can happen (mostly more give-ups). So, please, do.

  • @$diag grows. The next release will provide some means to manage that. Right now, clean @$diag yourself if that becomes an issue.

  • You're supposed to maintain a balance of requests and fetches. If you try "gain" when there are no unfinished requests, then the method will time out. There's actually nothing to worry about except hanging for 120sec.
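The eval-plus-SIGALRM protection mentioned in the first item is a common Perl idiom; a minimal sketch of it follows. It isn't F::AF's actual code: read_line_with_timeout is a hypothetical helper, it covers only the reading side, and the one-second timeout is chosen purely for the demonstration.

```perl
use strict;
use warnings;

# Hypothetical helper: read one line from $fh, giving up after $timeout sec.
# Returns ( $line, undef ) on success or EOF, ( undef, $reason ) otherwise.
sub read_line_with_timeout {
    my( $fh, $timeout ) = @_;
    my $line;
    my $ok = eval {
        local $SIG{ALRM} = sub { die "alarm\n" };
        alarm $timeout;
        $line = readline $fh;
        alarm 0;
        1;
    };
    alarm 0;    # in case something other than the alarm broke the eval
    return ( $line, undef ) if $ok;
    return ( undef, $@ eq "alarm\n" ? 'timeout' : "unexpected: $@" );
}

# Nothing is ever written to this pipe, so the read has to time out.
pipe( my $reader, my $writer ) or die "pipe: $!";
my( $line, $error ) = read_line_with_timeout( $reader, 1 );
print defined $error ? "error: $error\n" : "got: $line";
```

Anything that breaks the eval for a reason other than the alarm is reported as 'unexpected', which is the situation the item above warns about.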

(note) Documentation of this library must maintain 4 namespaces:

Function/method parameter list (@_)

Within a section, these always refer to the parameter names and keys (if @_ is a hash) mentioned in the nearest synopsis.

Explicit values in descriptive codes

These always refer to some value in the nearest code. $method, $? etc. mean that some value related to the named thing would appear there. POD markup in descriptions means exactly that.

Keys of File::AptFetch blessed object

Whatever is missing from the nearest synopsis belongs here. Each key is written with an explicit dereference. So @$log means that the key named log holds an ARRAY reference, %$message holds a HASH reference, and $status holds a plain scalar (it's not a reference to SCALAR, or it would be $$status).

Keys of File::AptFetch::ConfigData configuration module

Within each section they are explicitly identified as such when introduced. The above explanation of explicit dereferencing applies here too.

(note) Message headers are referred to as keys of some fake global %$message. So Filename becomes $message{filename}, and Last-Modified becomes $message{last-modified} (as you can see, that notation is somewhat syntactically incorrect). I hope it's clear from context whether a header is down- or upstream.

(note) Throughout this POD "log item" means one line in @$log; "log entry" means a sequence of log items including the terminating empty item.

(note) Throughout this POD "120sec timeout" means: "$timeout in File::AptFetch::ConfigData being left as set in the stock distribution, overridden during pre-build configuration, or set at run-time".

IMPORTANT NOTE ON PERL-5.10.0

It's neither a bug nor a caveat. And it's out of my hands, really. perl-5.10.0 exits application code differently compared with perl-5.10.1 (unbelievable?). My understanding is that v5.10.0 closes handles first, then DESTROYs. Sometimes that filehandle closing happens in the right order. But most probably the application is killed with $SIG{CHLD}. END{} doesn't help -- that filehandle massacre happens before those blocks are run. I believe any tinkering with the global $SIG{CHLD} is a bad idea. And terminating every method just after transfers have finished is just as stupid. Thus, if you run perl-5.10.0 (probably any earlier version too), destroy the File::AptFetch object explicitly before exiting the app if you care about not being $SIG{CHLD}ed.

METHODS

init
ref(my $fetch = File::AptFetch->init($method)) or
  die $fetch;

This does the initialization. APT-Methods are userspace executables, hence it forks. If fork fails, then it dies. If all preparations succeed, then it returns a File::AptFetch blessed object; otherwise a string describing the issue is returned. Any diagnostics from the forked instance and, later, the execed $method go through STDERR. (And see "_cache_configuration".)

The idea behind this ridiculous construct is that some day, in some future, there will be lots of concurrency. (I didn't say that would be threads, did I?) Hence it's impossible to maintain one package-wide store for failure descriptions. All methods of File::AptFetch return descriptive strings in case of errors. &init just follows them.

$method is saved in the key of the same name for reuse.
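The "object on success, string on failure" convention can be sketched with a stand-in class. My::Thing and its error text are hypothetical and exist only to show why the caller tests the result with ref.

```perl
use strict;
use warnings;

package My::Thing;    # hypothetical class, for illustration only

# Returns a blessed object on success, or a plain string describing the failure.
sub init {
    my( $class, $method ) = @_;
    defined $method && length $method
        or return 'method is unspecified';
    return bless { method => $method }, $class;
}

package main;

for my $arg ( 'copy', undef ) {
    my $thing = My::Thing->init( $arg );
    # A blessed object is a reference; a failure description is not.
    my $report = ref( $thing ) ? "ok: $thing->{method}" : "failed: $thing";
    print "$report\n";
}
```

Because the failure description travels in the return value, no package-wide error variable is needed, which is the point made above about concurrency.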

C<$method>: (I<lib_method>): neither preset nor found

$lib_method (in File::AptFetch::ConfigData) points to the directory where APT-Methods reside. Without that knowledge File::AptFetch has nothing to do. It's picked either from configuration (build-time) or from apt-config output (run-time), in that order. If it was found in neither place, you have a fairly strange APT.

I<method> is unspecified

$method is a required argument, so, please, provide it.

C<$method>: ($?): died without handshake

Start-up configuration is essential. If $method disconnects early, then that is a problem. The exit code (with no postprocessing at all) is provided in parentheses.

C<$method>: timeouted without handshake

$method failed to configure within the time frame provided. (v.0.0.8) "_read" has more about timeouts.

C<$method>: ($Status): that's supposed to be (100) (Capabilities)

As described in "APT Method Interface", Section 2.2, $method starts with '100 Capabilities' Status Code. $method didn't. Thus that's not an APT-Method. File::AptFetch has given up.

Also refer to "_parse_status_code", "_parse_message", and "_cache_configuration" -- those can emit their own give-up codes (they are passed up immediately by init without postprocessing).

DESTROY
undef $fetch;
# or leave the scope

That's the destructor for File::AptFetch objects. Pipes are destroyed first. Then, if $pid is found, this PID is killed, and then, if the kill happened to be successful, the upcoming zombie is reaped. waitpid is unconditional and isn't timeout protected.

The actual signal sent to $pid is configured with $signal in File::AptFetch::ConfigData. One can override it at build time or explicitly set it to any desired name or number at run time. Refer to File::AptFetch::ConfigData for details.
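The kill-then-reap sequence can be sketched like this. It's an illustration under assumptions, not the destructor itself: kill_and_reap is a hypothetical helper, 'TERM' is an arbitrary choice (the real signal comes from $signal in File::AptFetch::ConfigData), and the sleeping child merely stands in for a method.

```perl
use strict;
use warnings;

# Hypothetical helper: signal a child and, if the signal was delivered,
# reap the resulting zombie.  Returns the wait status, or undef if
# nothing was signalled.
sub kill_and_reap {
    my( $pid, $signal ) = @_;
    kill $signal, $pid or return undef;
    waitpid $pid, 0;    # blocks; not timeout protected, as noted above
    return $?;
}

defined( my $pid = fork ) or die "fork: $!";
unless( $pid ) {
    sleep 60;           # stand-in for a method waiting for requests
    exit 0;
}
my $status = kill_and_reap( $pid, 'TERM' );
printf "child %d was terminated by signal %d\n", $pid, $status & 127;
```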

request
my $rc = $fetch->request(
  $target0 => $source,
  $target1 => { uri => $source } );
$rc and die $rc;

(bug) In this section the abbreviation "URI" actually refers to the "scheme-specific-part". Beware.

This files requests for transfer. Each request is a pair of $target and one of the following:

$source

Simple scalar. It MUST NOT provide a scheme -- it's a pure filename (either local or remote). It MUST, though, provide all (and no more than) the needed leading slashes (a double slash for remotes).

$source is preprocessed -- $method (with the obvious colon) is prepended. (It seems that APT methods become very nervous when asked for a scheme that doesn't match the method's name.) (bug) That requirement will be slightly relaxed in the next release.

%$source HASH ref

The following keys are known:

$uri

The same requirements as for $source apply.

There are other keys yet that must be supported. Right now I'm unaware of any (pending real-life testing).

The actual request is filed at once (subject to buffering, though), in one big (or not so big) chunk (as requested by the API). The @$diag field is updated accordingly.

Diagnostic provided:

C<$method>: ($filename): URI is undefined

Either $source or $source{uri} evaluated to FALSE. (What is the request supposed to be?)

(caveat) While undef and empty string are invalid URIs, is 0 a valid URI? No, URI is supposed to have at least one leading slash.

&request pretends to be atomic: the request happens only if @_ has been parsed successfully.
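The preprocessing of a source described in this section can be sketched as below. build_uri is a hypothetical helper, not the module's code, and the host name is a placeholder; the sketch only shows the method name and colon being prepended and FALSE values being rejected.

```perl
use strict;
use warnings;

# Hypothetical helper: accept either a plain scalar or a { uri => ... } HASH
# ref, reject anything that evaluates to FALSE (undef, '' and 0 alike),
# and prepend the method name with a colon.
sub build_uri {
    my( $method, $source ) = @_;
    my $path = ref( $source ) eq 'HASH' ? $source->{uri} : $source;
    $path or return undef;
    return "$method:$path";
}

my $local  = build_uri( 'file', '/etc/hostname' );
my $remote = build_uri( 'http', { uri => '//deb.example.org/debian/dists/stable/Release' } );
print "$local\n$remote\n";
```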

gain
$rc = $fetch->gain;
$rc and die $rc;

That gains something. 'Something' means that it's unknown what kind of message the APT method will return. It can be a 'URI Start', 'URI Done', or 'URI Failure' message. In any case, the message is stored in the @$diag and %$message fields of the object; $Status and $status are set too.

Diagnostic provided:

C<$method>: ($CHLD_error): died

Something has gone wrong and the APT method has died. More diagnostics might have gone to STDERR.

C<$method>: timeouted without responce

The APT method has quit without properly terminating a message with an empty line, or has failed to output anything at all. Supposedly, this shouldn't happen.

C<$method>: timeouted

The APT method has sat silently all the time. A possible cause is that you've run out of requests (then the method has nothing to do at all (they don't tick, after all)).

"_parse_status_code" and "_parse_message" can emit their own messages.
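Keeping requests and gains in balance can be sketched as a loop that stops once every filed request has been answered. Everything here is an assumption made for illustration: Mock::Fetch is a stand-in so the sketch runs without APT, the status is read as $fetch->{Status}, and the codes 200 (URI Start), 201 (URI Done) and 400 (URI Failure) come from the "APT Method Interface" description.

```perl
use strict;
use warnings;

package Mock::Fetch;    # hypothetical stand-in for a File::AptFetch object

sub new {
    my( $class, @statuses ) = @_;
    return bless { queue => [ @statuses ] }, $class;
}

# Mimics the convention above: FALSE on success, a string on failure.
sub gain {
    my $self = shift;
    @{ $self->{queue} } or return 'mock: timeouted';
    $self->{Status} = shift @{ $self->{queue} };
    return '';
}

package main;

# Call gain until each of the $pending requests has finished (201)
# or failed (400); other messages do not settle a request.
sub drain {
    my( $fetch, $pending ) = @_;
    my %seen;
    while( $pending > 0 ) {
        my $rc = $fetch->gain;
        $rc and die $rc;
        $seen{ $fetch->{Status} }++;
        $pending-- if $fetch->{Status} == 201 || $fetch->{Status} == 400;
    }
    return \%seen;
}

my $seen = drain( Mock::Fetch->new( 200, 201, 200, 400 ), 2 );
print "$_ x $seen->{$_}\n" for sort keys %$seen;
```

Stopping as soon as the pending count reaches zero avoids the extra gain that would otherwise sit out the whole timeout.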

_parse_status_code
$rc = $self->_parse_status_code;
return $rc if $rc;

Internal. Picks one item from @$log and attempts to process it as a Status Code. Subsequent items are unaffected.

C<$method>: ($log_item): that's not a Status Code

The $log_item must be qr/^\d{3}\s+.+/. No luck this time.

Sets the appropriate fields ($Status with the Status Code, $status with the informational string), then backs up the processed item.
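Splitting a log item into the Status Code and the informational string might look like the sketch below. parse_status_line is a hypothetical helper; its pattern is the one quoted above with capture groups added, which is an assumption about how the two parts are separated.

```perl
use strict;
use warnings;

# Hypothetical helper: returns ( $code, $text ) for a valid Status Code item,
# or an empty list when the item doesn't match.
sub parse_status_line {
    my $item = shift;
    my( $code, $text ) = $item =~ /^(\d{3})\s+(.+)/
        or return;
    return ( $code, $text );
}

for my $item ( '100 Capabilities', 'Send-Config: true' ) {
    my @parsed = parse_status_line( $item );
    my $report = @parsed ? "code=$parsed[0] text=$parsed[1]" : "not a Status Code: $item";
    print "$report\n";
}
```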

_parse_message
$rc = $self->_parse_message;
return $rc if $rc;

Internal. Processes the log entry. Atomically sets either %$capabilities (if $Status is 100) or %$message (any other). Each key is lowercased.

(bug) It's ridiculous to write 'Last-Modified' => $time instead of last_modified => $time, isn't it? In next release hyphens ('-') will be substituted with underscore ('_').

C<$method>: ($log_item): that's not a Message

The $log_item must be qr/^[0-9a-z-]+:(?>\s+).+/i. It's not. No luck this time. The offending and all subsequent items are left on @$log.

The $log_items are backed up and removed from @$log.

(bug) If the last item isn't an empty line, then undef will be pushed. Beware and prevent before going for parsing.
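The header parsing and the guard against a missing terminating item can be sketched together. parse_headers is a hypothetical helper, not the module's implementation; it uses the pattern quoted above with capture groups added, and lowercases the keys as described.

```perl
use strict;
use warnings;

# Hypothetical helper: turn the items of one log entry into a HASH ref with
# lowercased keys.  Returns undef if the entry isn't terminated by an empty
# item or if any item is unparsable; nothing is returned half-done.
sub parse_headers {
    my @items = @_;
    @items && $items[-1] eq ''
        or return undef;
    my %message;
    for my $item ( @items ) {
        last if $item eq '';
        my( $key, $value ) = $item =~ /^([0-9a-z-]+):(?>\s+)(.+)/i
            or return undef;
        $message{ lc $key } = $value;
    }
    return \%message;
}

my $message = parse_headers(
    'Filename: /tmp/hostname',
    'Last-Modified: Thu, 01 Jan 2009 00:00:00 GMT',
    '' );
print "$_ => $message->{$_}\n" for sort keys %$message;
```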

_cache_configuration
$rc = $self->_cache_configuration;
return $rc if $rc;

Internal. Forks; dies if fork fails. The forked child execs the array set in @$config_source (from File::AptFetch::ConfigData). If $lib_method (from File::AptFetch::ConfigData) is unset, then it parses the prepared cache for the Dir::Bin::methods item and, if found, sets $lib_method. It doesn't complain if $lib_method happens to be left unset. If the cache is already set, it returns without any activity.

@$config_source is subject to the build-time configuration. It's preset with qw[ /usr/bin/apt-config dump ] (YMMV, refer to File::AptFetch::ConfigData to be sure). @$config_source must provide reasonable output -- that's the only requirement (look below for details).

(bug) While @$config_source is configurable, all diagnostic messages refer to 'apt-config'.

@$config_source's output is postprocessed -- configuration names and values are stored as equals-sign ('=') separated pairs in scalars and pushed onto an intermediate array. If everything finishes OK, then the package-wide cache is set. That cache is lexical (it's possible that I'll find a reason to make some kind of iterator later; such an iterator is missing right now).

Diagnostic provided:

C<$method>: ($line): that's unparsable

The $line must be qr/^[a-z-]+(?:::[a-z-]+)*(?:::)*\s+".*";$/i. $line doesn't match. Please note caveat below.

C<$method>: close (C<apt-config>) failed: $!

After processing input a pipe is closed. That close failed with $!.

C<$method>: (C<apt-config>): timeouted

While processing, a fair 120sec timeout is given (it's reset after each $line). @$config_source hung for that time.

C<$method>: (C<apt-config>) died: $?

@$config_source has exited uncleanly. More diagnostics are supposed to be on STDERR.

C<$method>: (C<apt-config>): failed to output anything

@$config_source has exited cleanly, but failed to provide any output to parse at all.

(caveat) apt-config can be made to output multiple (at least 2, that's what I can see) consecutive double-colon ('::') separators. apt-get (and ...(?)) removes such extra separators, while retaining the last pair ("quotation needed" (TM)), when talking to APT-Methods. So does File::AptFetch.

(bug) I've discovered that APT replaces each space (' ') in a configuration value with a %-escape ('%20'). I have no idea what other escapes are in use. I know for sure that double-double-quotes ('""') are removed (so 'ABC { "abc" "xyz"; };' in a configuration file becomes 'ABC "abc xyz";' in apt-config output). That's why File::AptFetch doesn't read configuration files by itself (one less point of frustration). (goddamn Debian.)
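The line check and the 'name=value' storage described for _cache_configuration can be sketched as follows. config_pair is a hypothetical helper; the validity pattern is the one quoted in the diagnostic above, the splitting pattern is an assumption, and the sample lines are made up to resemble apt-config dump output. Removal of extra '::' separators isn't attempted here.

```perl
use strict;
use warnings;

# Pattern quoted in the 'that's unparsable' diagnostic above.
my $valid = qr/^[a-z-]+(?:::[a-z-]+)*(?:::)*\s+".*";$/i;

# Hypothetical helper: turn one dump line into a 'name=value' scalar,
# or return undef if the line is unparsable.
sub config_pair {
    my $line = shift;
    $line =~ $valid or return undef;
    my( $name, $value ) = $line =~ /^(\S+)\s+"(.*)";$/;
    return "$name=$value";
}

my @cache;
for my $line (
  'APT::Architecture "amd64";',
  'Dir::Bin::methods "/usr/lib/apt/methods";' ) {
    my $pair = config_pair( $line );
    defined $pair or die "($line): that's unparsable";
    push @cache, $pair;
}

# Look up the item that would become $lib_method.
my( $lib_method ) = map { /^Dir::Bin::methods=(.*)/ ? $1 : ( ) } @cache;
print "$lib_method\n";
```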

_uncache_configuration
File::AptFetch::_uncache_configuration;
# or
$self->_uncache_configuration;
# or
$fetch->_uncache_configuration;

Internal. Cleans APT's configuration cache. That doesn't trigger recaching; caching happens whenever the cache is required again (subject to the natural control flow).

(caveat) &_cache_configuration sets $lib_method (in File::AptFetch::ConfigData) (if it happens to be undefined). &_uncache_configuration leaves it untouched.

_read
$fetch->_read;
$fetch->{ALRM_error} and
  die "internal error: requesting read while there shouldn't be any";
$fetch->{CHLD_error} and
  die "external error: method has gone nuts and AWOLed";

Internal. Refactored. Attempts to read a log entry. Each item is chomped and then pushed onto @$log. If an item happens to be an empty line, then reading finishes. @$log isn't filled atomically, so check whether the last line was empty.

That provides no diagnostic. However

child timeouts

If the child times out, then $ALRM_error is set (to TRUE, otherwise meaningless). Then it finishes.

(v0.0.8) And more about what the timeout is. It was believed that methods pulse their progress. That belief was in vain. Thus, for now:

  • The timeout is configurable through $timeout (in File::AptFetch::ConfigData) (120sec, by stock configuration; no defaults.)

  • The timeout is cached in each instance of File::AptFetch object.

  • Target filenames are cached in the File::AptFetch object too.

  • If a cycle of _read() has timed out, then each target filename is checked for a size change.

  • If any target file has changed, then request processing is considered to be still in progress, and the next cycle is started (as if the method had reported something).

  • (bug) It's clear that this is the place where a user-provided callback should be called. That's not the case yet, though.

child exits

The child is reaped with waitpid, then $CHLD_error is set, then it finishes.

unknown error

It's actually possible that the main reading cycle returns with neither a line, nor a timeout, nor a child exit. Then it dies.
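The size check performed after a timed-out cycle can be sketched like this. targets_changed is a hypothetical helper and the temporary file stands in for a target being written by a method; how F::AF itself caches the sizes is not shown in this document.

```perl
use strict;
use warnings;
use File::Temp qw[ tempfile ];

# Hypothetical helper: given { filename => last known size }, refresh the
# sizes and report whether any target has changed since the previous check.
# A missing file counts as size 0.
sub targets_changed {
    my $sizes = shift;
    my $changed = 0;
    for my $file ( keys %$sizes ) {
        my $now = -s $file;
        $now = 0 unless defined $now;
        $changed = 1 if $now != $sizes->{$file};
        $sizes->{$file} = $now;
    }
    return $changed;
}

my( $fh, $name ) = tempfile( UNLINK => 1 );
my %sizes = ( $name => 0 );
print {$fh} "partial data";
close $fh;

# First check sees growth; the second sees none and would count as a timeout.
for my $cycle ( 1, 2 ) {
    my $state = targets_changed( \%sizes ) ? 'still in progress' : 'stalled';
    print "cycle $cycle: $state\n";
}
```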

SEE ALSO

File::AptFetch::Cookbook, "APT Method Interface" in libapt-pkg-doc package, apt-config(1)

AUTHOR

Eric Pozharski, <whynot@cpan.org>

COPYRIGHT & LICENSE

Copyright 2009, 2010 by Eric Pozharski

This library is free in the sense: AS-IS, NO-WARRANTY, HOPE-TO-BE-USEFUL. This library is released under GNU LGPLv3.