NAME
Audio::FindChunks - breaks audio files into sound/silence parts.
SYNOPSIS
use Audio::FindChunks;
# Duplicate input to output, caching RMS values to a file (as a side effect)
Audio::FindChunks->new(rms_filename => 'x.rms', filter => 1)->get('rms_data');
# Output human-readable info, using RMS cache file 'xxx.rms' if present:
Audio::FindChunks->new(cache_rms => 1, filename => 'xxx.mp3',
stem_strip_extension => 1)->output_blocks();
# Remove start/end silence (if longer than 0.2sec):
Audio::FindChunks->new(cache_rms => 1, filename => 'xxx.mp3',
min_actual_silence_sec => 1e100)->split_file();
# Split a multiple-sides tape recording
Audio::FindChunks->new(filename => 'xxx.mp3', min_actual_silence_sec => 11
)->split_file({verbose => 1});
# Output the RMS levels of small intervals in human-readable form
Audio::FindChunks->new(filename => 'xxx.mp3')->output_levels();
DESCRIPTION
An audio sequence is broken into parts which contain only noise ("gaps") and parts with usable signal ("tracks").
The following configuration settings (and defaults) are supported:
# For getting PCM flow (and if averaging data is read from cache)
frequency => 44100, # If 'raw_pcm' or 'override_header_info' only
bytes_per_sample => 4, # likewise
channels => 2, # likewise
sizedata => MY_INF, # likewise (how many bytes of PCM to read)
out_fh => \*STDOUT, # mirror WAV/PCM to this FH if 'filter'
# Process non-WAV data:
preprocess => {mp3 => [[qw(lame --silent --decode)], [], ['-']]}, # Second contains extra args to read stdin
# RMS cache (used if 'valid_rms')
rms_extension => '.rms', # Appended to the 'filestem'
# Averaging to RMS info
sec_per_chunk => 0.1, # The window for taking mean square
# thresholds picking from the list of sorted 3-medians of RMS data
threshold_in_sorted_min_rel => 0, # relative position of 'threshold_min'
threshold_in_sorted_min_sec => 1, # shifted by this amount in the list
threshold_factor_min => 1, # the list elt is multiplied by this
threshold_in_sorted_max_rel => 0.5, # likewise
threshold_in_sorted_max_sec => 0, # likewise
threshold_factor_max => 1, # likewise
threshold_ratio => 0.15, # relative position between min/max
# Chunkification: smoothification
above_thres_window => 11, # in units of chunks
above_thres_window_rel => 0.25, # fractions of chunks above threshold
# in a window to make chunk signal
# Splitting into runs of signal/noise
max_tracks => 9999, # fail if more signal/noise runs
min_signal_sec => 5, # such runs of signal are forced
min_silence_sec => 2, # likewise
ignore_signal_sec => 1, # short runs of signal are ignored
min_silence_chunks_merge (see below) # and long resulting runs of silence
# are forced
# Calculate average signal in an interval "deeply inside" silence runs
local_level_ignore_pre_sec => 0.3, # offset the start of this interval
local_level_ignore_pre_rel => 0.02, # additional relative offset
local_level_ignore_post_sec => 0.3, # likewise for end of the interval
local_level_ignore_post_rel => 0.02, # likewise
# Enlargement of signal runs: attach consecutive chunks with signal this much
# above the average over the neighbouring silence run
local_threshold_factor => 1.05,
# Final enlargement of runs of signal
extend_track_end_sec => 0.5, # Unconditional enlargement
extend_track_begin_sec => 0.3, # likewise
min_boundary_silence_sec => 0.2, # Ignore short silence at start/end
Note that above_thres_window is the only value specified directly in units of chunks; the other *_sec values may optionally be specified in units of chunks by setting the corresponding *_chunks value. Note also that this window should be decreased if the minimal allowed silence length parameters are decreased.
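The relation between a *_sec setting and its *_chunks counterpart can be illustrated with a short sketch. The helper name sec_to_chunks is hypothetical, and rounding to the nearest whole chunk is an assumption; the module may round differently.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Audio::FindChunks): convert a duration
# in seconds into a whole number of chunks.  Rounding to nearest is an
# assumption made for this sketch.
sub sec_to_chunks {
    my ($sec, $sec_per_chunk) = @_;
    return int($sec / $sec_per_chunk + 0.5);
}

my $sec_per_chunk = 0.1;    # documented default
my %sec = (min_signal => 5, min_silence => 2, ignore_signal => 1);
for my $name (sort keys %sec) {
    printf "%s_chunks = %d\n", $name,
        sec_to_chunks($sec{$name}, $sec_per_chunk);
}
```

With the default 0.1 sec chunk, the default above_thres_window of 11 chunks corresponds to roughly 1.1 sec of audio under this reading.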
These values are mirrored from other values if not explicitly specified:
min_actual_silence_sec << min_silence_sec # Ignore short gaps
min_start_silence_sec << min_boundary_silence_sec # Same at start
min_end_silence_sec << min_boundary_silence_sec # Same at end
min_silence_chunks_merge << min_silence_chunks # See above
cache_rms_write <<< cache_rms # Boolean: write RMS cache
cache_rms_read <<< cache_rms # Boolean: read RMS cache (unless 'filter')
The following values default to undef:
filename # if undef, read data from STDIN
stem_strip_extension # Boolean: 'filestem' has no extension
filter # If true, PCM data is mirrored to out_fh
rms_filename # Specify cache file explicitly
raw_pcm # The input has no WAV header
override_header_info # The user specified values override WAV header
cache_rms # Use cache file (see *_write, *_read above)
skip_medians # Boolean: do not calculate 3-medians
subchunk_size # Optimization of calculation of RMS; the
# best value depends on the processor cache
METHODS
new(key1 => value1, key2 => value2, ....)
-
The arguments form a hash of configuration parameters.
set(key => value)
-
set a configuration parameter.
get(key)
-
get a configuration parameter, or a value which may be calculated from the configuration parameters.
output_levels([key])
-
prints a human-readable display of RMS (or similar) values. Defaults to rms_data; additional possible values are medians and sorted.
The format of the output data is similar to
Frequency: 44100. Stride: 4; 2 channels. Chunk=0.1sec=17640bytes.
ch0: -9999.0 .. 9999.0 (-10dB;-10dB). ch1: -9999.0 .. 9999.0 (-10dB;-10dB).
0: 0.0: 20.7= -61dB: ###########>
1: 0.1: 20.7= -61dB: ###########>
2: 0.2: 20.7= -61dB: ###########>
...
(with the ch0 ETC line empty if data is read from an RMS file). Each chunk gives a line with the chunk number, start (in sec), RMS intensity (in linear scale and in decibels), and a graphical representation of the decibel level (each # counts as 3dB, : adds 1dB, and > adds 2dB).
output_blocks([option_hashref], [key])
-
prints a human-readable display of the obtained audio chunks. key defaults to b; additional possible values are b0 to b4. The only recognized option key is format; it defaults to long, which results in verbose output; the value short results in shorter output and no preamble. Preamble lines are all #-commented; any output line is of the form
START_SEC =END_SEC # COMMENT
With the short format there is no preamble, and (currently) COMMENT is of the form
PIECE_NUMBER len=PIECE_DURATION_SEC
These formats are recognized, e.g., by MP3::Splitter::mp3split_read().
The default format is currently
# threshold: 1078.46653890971 (in 20.7214163971884 .. 7072.35556648067)
4.4 =25.8 # n=1 duration 21.4; gap 4.4 (4.4 .. 25.8; 21.4)
27.7 =67 # n=2 duration 39.3; gap 1.9 (27.7 .. 1m07.0; 39.3)
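Since every non-preamble line has the form START_SEC =END_SEC # COMMENT, such output is easy to read back. The following is a minimal sketch; parse_block_line is a hypothetical helper (it is not mp3split_read()), and it assumes that START_SEC and END_SEC are plain decimal seconds.

```perl
use strict;
use warnings;

# Hypothetical helper: parse one line "START_SEC =END_SEC # COMMENT".
# Returns (start, end, comment), or an empty list for preamble, blank,
# or unrecognized lines.
sub parse_block_line {
    my ($line) = @_;
    return if $line =~ /^\s*(?:#|$)/;    # '#'-commented preamble or empty
    my ($start, $end, $comment) =
        $line =~ /^\s*([\d.]+)\s+=([\d.]+)\s*(?:#\s*(.*))?$/
        or return;
    return ($start, $end, defined $comment ? $comment : '');
}

my @sample = (
    '# threshold: 1078.46653890971 (in 20.7214163971884 .. 7072.35556648067)',
    '4.4 =25.8 # n=1 duration 21.4; gap 4.4 (4.4 .. 25.8; 21.4)',
    '27.7 =67 # n=2 duration 39.3; gap 1.9 (27.7 .. 1m07.0; 39.3)',
);
for my $line (@sample) {
    my ($start, $end, $comment) = parse_block_line($line) or next;
    printf "%.1f .. %.1f (%.1f sec)\n", $start, $end, $end - $start;
}
```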
split_file([options], [key])
-
Splits the file (only MP3 via MP3::Splitter is supported now). The meaning of options is the same as for MP3::Splitter. Defaults to blocks of type b; additional possible values are b0 to b4.
@vals = get_rmsinfo(); set_rmsinfo(@vals)
-
Duplicate RMS info between two different Audio::FindChunks objects. The exchanged info is the following:
chunks rms_data medians sorted channels min max
frequency bytes_per_sample sec_per_chunk bytes_per_chunk
set_rmsinfo() returns the object itself.
set() and get()
In and Out
The functionality of the module is modelled on the architecture of Data::Flow: the two principal methods are set(key => value) and get(key); the module knows how to calculate keys based on the values of other keys.
The results of calculation are cached; in particular, if one needs to calculate some value for different values of a configuration parameter, one should create several copies of the Audio::FindChunks object, as in
my @info = Audio::FindChunks->new(filename => $f)->get_rmsinfo;
for my $ratio (0..100) {
  # A fresh object per ratio, so no stale cached values are reused
  Audio::FindChunks->new(threshold_ratio => $ratio/100)
    ->set_rmsinfo(@info)->output_blocks();
}
The internally used format of intermediate data is designed for quick shallow copying even for enormous audio files.
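The set()/get() model with cached results can be pictured with a small sketch. The class FlowSketch below is hypothetical (it is neither Data::Flow nor the module's actual code); its two rules are modelled on the samples_per_chunk and bytes_per_chunk dependencies, and the rounding in the first rule is an assumption.

```perl
use strict;
use warnings;

package FlowSketch;    # hypothetical class, for illustration only

# Each rule: key => [ [dependencies], code computing the key from them ].
my %rules = (
    samples_per_chunk => [ [qw(sec_per_chunk frequency)],
                           sub { int($_[0] * $_[1] + 0.5) } ],
    bytes_per_chunk   => [ [qw(samples_per_chunk bytes_per_sample)],
                           sub { $_[0] * $_[1] } ],
);

sub new { my ($class, %conf) = @_; return bless { %conf }, $class }

sub set { my ($self, $key, $value) = @_; $self->{$key} = $value; return $self }

sub get {
    my ($self, $key) = @_;
    return $self->{$key} if exists $self->{$key};    # explicitly set, or cached
    my $rule = $rules{$key} or die "No way to calculate '$key'\n";
    my ($deps, $code) = @$rule;
    # Resolve dependencies recursively, then cache the result
    return $self->{$key} = $code->(map { $self->get($_) } @$deps);
}

package main;

my $flow = FlowSketch->new(sec_per_chunk => 0.1, frequency => 44100,
                           bytes_per_sample => 4);
print $flow->get('bytes_per_chunk'), "\n";    # prints 17640
```

This sketch never invalidates a cached value when one of its inputs is set() later, which illustrates why a fresh object per parameter value is the safe pattern.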
Dependencies
The current dependencies for values which are not explicitly set():
filestem               <<< filename stem_strip_extension
input_type             <<< filename
preprocess_a           <<< input_type preprocess
preprocess_input       <<< preprocess_a filename
fh AND close_fh        <<< preprocess_input filename
fh_bin                 <<< fh
out_fh_bin             <<< filter out_fh
rms_filename_default   <<< filestem rms_extension
read_from_rms_file     <<< filter cache_rms_read rms_filename
write_to_rms_file      <<< cache_rms_write rms_filename
rms_filename_actual    <<< rms_filename rms_filename_default
samples_per_chunk      <<< sec_per_chunk frequency
bytes_per_chunk        <<< samples_per_chunk bytes_per_sample
rms_data_arr_f         <<< read_from_rms_file rms_filename_actual
                           samples_per_chunk
rms_data AND chunks    <<< rms_data_arr_f OR A LOT OF OTHER PARAMETERS
medians                <<< rms_data skip_medians chunks
sorted                 <<< medians chunks
threshold_in_sorted_*  <<< chunks threshold_in_sorted_*_*
threshold_min/max      <<< threshold_factor_* sorted threshold_in_sorted_min/max
threshold              <<< threshold_min threshold_ratio threshold_max
above_thres            <<< chunks rms_data threshold
above_thres_in_window  <<< above_thres chunks above_thres_window
above_thres_window_abs <<< above_thres_window_rel above_thres_window
maybe_signal           <<< above_thres_in_window chunks above_thres_window_abs
maybe_trk_pk           <<< max_tracks maybe_signal chunks
b0                     <<< maybe_trk_pk
b1                     <<< b0 min_signal_chunks min_silence_chunks
b2                     <<< b1 ignore_signal_chunks
b3                     <<< b2 min_silence_chunks_merge
b4                     <<< b3
b                      <<< b4 local_level_ignore_*
                           medians local_threshold_factor
                           extend_track_begin_chunks
                           extend_track_end_chunks
                           min_actual_silence_chunks
                           min_start_silence_chunks min_end_silence_chunks
If rms_data is not read from a cached source, many other fields may also be set from the WAV header (unless raw_pcm is set).
Formats
Potentially large internally-cached values are stored as array references to decrease the overhead of shallow copying.
The data which relates to the initial chunks (of size sec_per_chunk) is stored as length-1 arrays holding packed data (packed with either l* or d*, depending on the semantics); this allows working with huge audio files in a small memory footprint, and allows an easy implementation of the most computationally intensive work in C.
The blocks of audio/signal/noise/silence are stored as Perl arrays; each element is a reference to an array of length 3: type (-1 for silence, 0 for noise, 1 for signal, and 2 for audio), start (in chunks), and duration (in chunks).
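The two storage layouts can be illustrated as follows. The sample values and the helper chunk_value are made up for this sketch; only the layout (a length-1 array with a d*-packed string, and blocks as triples) comes from the description above.

```perl
use strict;
use warnings;

# Per-chunk values: a length-1 array holding one packed string.  Copying
# the outer reference is cheap regardless of the number of chunks.
my @rms      = (20.7, 20.7, 1500.2, 1499.8);    # made-up RMS levels
my $rms_data = [ pack('d*', @rms) ];

# Hypothetical accessor: read chunk $i without unpacking the whole string.
sub chunk_value {
    my ($packed, $i) = @_;
    my $size = length(pack('d', 0));             # bytes per native double
    return unpack('d', substr($packed->[0], $i * $size, $size));
}

# Blocks: plain Perl arrays of [type, start_chunk, duration_in_chunks].
my %type_name = (-1 => 'silence', 0 => 'noise', 1 => 'signal', 2 => 'audio');
my @blocks = ([-1, 0, 2], [2, 2, 2]);
for my $blk (@blocks) {
    my ($type, $start, $len) = @$blk;
    printf "%-7s chunks %d..%d, first level %.1f\n",
        $type_name{$type}, $start, $start + $len - 1,
        chunk_value($rms_data, $start);
}
```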
ALGORITHM
The algorithm for finding boundaries of parts follows closely the algorithm used by GramoFile v1.7 (however, this version is fully customizable, fully documented, and has some significant bugs fixed). The keywords in the discussion below refer to customization parameters; keywords of the form >>>key refer to get()able values set at the step in question.
- Smooth the input
-
This is done in 2 distinct steps:
Break the input into chunks of equal duration (governed by sec_per_chunk); find the acoustic energy of each channel per chunk (no customization); energy is the quadratic average of the signal level; calculate the maximal energy among channels per chunk (no customization; >>>rms_data).
Trim "extremal" chunks by replacing the energy level of each chunk by the median of it and its two neighbors (switched off if skip_medians; >>>medians).
- Calculate the signal/noise threshold
-
based on the distribution (>>>sorted) of smoothed values. Governed by threshold_* parameters. >>>threshold_min, >>>threshold_max, >>>threshold.
- Smooth it again
-
Separate into signal and noise chunks based on the number of above-threshold chunks in a small window about the given chunk. Governed by above_thres_window, above_thres_window_rel. >>>maybe_signal, >>>b0.
- Find certain intervals of sound and silence
-
Long enough runs of signal chunks are proclaimed to carry sound; likewise for noise chunks and silence. Governed by max_tracks, min_signal_chunks, min_silence_chunks. >>>b1.
Long enough "unproclaimed" runs of chunks with only short bursts of signal are proclaimed silence. Governed by ignore_signal_chunks, >>>b2; and min_silence_chunks_merge, >>>b3.
- Merge undecided into sound/silence
-
A run of chunks (signal or noise) "yet unproclaimed" to be sound or silence is proclaimed sound if it is adjacent to a run of sound on at least one side. The rest of the unproclaimed runs are proclaimed silence. No customization.
Runs of sound/silence are audio/gap candidates (no customization; >>>b4).
- Calculate average signal level in each gap candidate
-
ignoring short intervals near the ends of gaps. Governed by local_level_*.
- Allow for slow attack/decay or fade in/out
-
Extend runs of audio: join the consecutive runs of chunks of adjacent gaps where the energy level remains significantly larger than the average level in this gap. Additionally, unconditionally extend the tracks by a small amount. Governed by local_threshold_factor, extend_track_end_chunks, extend_track_begin_chunks.
- Long enough gap candidates are gaps
-
Gaps which became too short are considered audio and are merged into neighbors. Governed by min_actual_silence_chunks, min_start_silence_chunks, min_end_silence_chunks; >>>b.
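The smoothing and threshold steps above can be sketched in plain Perl. This is an illustration under stated assumptions, not the module's implementation: the function names are hypothetical, edge chunks are left unchanged by the median step, the window is centred and truncated at the ends, and the picks from the sorted list ignore the threshold_in_sorted_*_sec shifts and threshold_factor_* multipliers. The linear interpolation between threshold_min and threshold_max does agree with the sample "# threshold:" line of output_blocks().

```perl
use strict;
use warnings;

# Replace each level by the median of it and its two neighbours.
sub median3 {
    my @in  = @_;
    my @out = @in;                                   # edges kept (assumption)
    for my $i (1 .. $#in - 1) {
        $out[$i] = (sort { $a <=> $b } @in[$i - 1 .. $i + 1])[1];
    }
    return @out;
}

# Threshold at relative position $ratio between the min and max picks.
sub interpolate_threshold {
    my ($min, $max, $ratio) = @_;
    return $min + $ratio * ($max - $min);
}

# A chunk is "maybe signal" if at least $rel * $window chunks in the
# window about it are above the threshold.
sub maybe_signal {
    my ($levels, $threshold, $window, $rel) = @_;
    my @above = map { $_ > $threshold ? 1 : 0 } @$levels;
    my $half  = int($window / 2);
    my @out;
    for my $i (0 .. $#above) {
        my $lo = $i - $half < 0       ? 0       : $i - $half;
        my $hi = $i + $half > $#above ? $#above : $i + $half;
        my $count = 0;
        $count += $_ for @above[$lo .. $hi];
        push @out, $count >= $rel * $window ? 1 : 0;
    }
    return @out;
}

# Made-up RMS levels: noise, one isolated spike, a stretch of signal, noise.
my @rms     = (20, 21, 900, 22, 20, 800, 850, 900, 870, 860, 840, 880, 890, 21, 20);
my @medians = median3(@rms);
my @sorted  = sort { $a <=> $b } @medians;
my $thr     = interpolate_threshold($sorted[0], $sorted[int(0.5 * $#sorted)], 0.15);
print "medians:      @medians\n";
print "threshold:    $thr\n";
print "maybe_signal: ", join('', maybe_signal(\@medians, $thr, 3, 0.5)), "\n";
```

The median step removes the isolated spike at chunk 2 before the threshold is picked, so a single click does not start a track.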
Functions implemented in C
long bool_find_runs(int *input, array_run_t *output, long cnt, long out_cnt)
void double_find_above(double *input, int *output, long cnt, double threshold)
void double_median3(double *rmsarray, double *medarray, long total_blocks)
void double_sort(double *input, double *output, long cnt)
void int_find_above(int *input, int *output, long cnt, int threshold)
void int_sum_window(int *input, int *output, long cnt, int window_size)
void le_short_sample_stats(char *buf, int stride, long samples, array_stats_t *stat)
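Only the signatures of these functions are listed, so their exact behaviour is not documented here. As a rough Perl counterpart of what the names le_short_sample_stats and bool_find_runs suggest, the sketch below computes the quadratic average of one channel of little-endian 16-bit samples and collapses a 0/1 list into runs. The names channel_rms and find_runs, and the [value, start, length] layout of a run, are assumptions made for this illustration.

```perl
use strict;
use warnings;

# RMS of one channel in a buffer of interleaved little-endian signed
# 16-bit samples; $stride is the size of one frame in bytes.
sub channel_rms {
    my ($buf, $stride, $channel) = @_;
    my $frames = int(length($buf) / $stride);
    return 0 unless $frames;
    my $sum_sq = 0;
    for my $f (0 .. $frames - 1) {
        my $v = unpack('v', substr($buf, $f * $stride + 2 * $channel, 2));
        $v -= 65536 if $v >= 32768;              # unsigned -> signed
        $sum_sq += $v * $v;
    }
    return sqrt($sum_sq / $frames);
}

# Collapse a list of 0/1 flags into runs of [value, start, length].
sub find_runs {
    my @flags = @_;
    my @runs;
    for my $i (0 .. $#flags) {
        if (@runs && $runs[-1][0] == $flags[$i]) { $runs[-1][2]++ }
        else { push @runs, [ $flags[$i], $i, 1 ] }
    }
    return @runs;
}

# Four stereo frames (stride 4 bytes): channel 0 is +-3, channel 1 is +-100.
my $buf = pack('v*', map { $_ & 0xFFFF } (3, 100, -3, -100, 3, 100, -3, -100));
printf "ch0 RMS %.1f, ch1 RMS %.1f\n",
    channel_rms($buf, 4, 0), channel_rms($buf, 4, 1);
print join(' ', map { "[@$_]" } find_runs(0, 0, 1, 1, 1, 0)), "\n";
```

The per-chunk value described under ALGORITHM would then be the larger of the per-channel results.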
SEE ALSO
Data::Flow, MP3::Splitter
AUTHOR
Ilya Zakharevich, <cpan@ilyaz.org>
COPYRIGHT AND LICENSE
Copyright (C) 2004 by Ilya Zakharevich
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.8.2 or, at your option, any later version of Perl 5 you may have available.