NAME

Parallel::Batch - Run a large number of similar processes using bounded parallelism

SYNOPSIS

  use Parallel::Batch;
  
  my $batch = Parallel::Batch->new({code => \&frobnicate,
                                    jobs => [ ... ],
                                    maxprocs => 8});
  $batch->run();

DESCRIPTION

Parallel::Batch addresses a common problem in making efficient use of modern multi-CPU computers: you have a large number of independent pieces of data that all need to be processed in some way, and several of them can be processed at the same time.

There are a few trivial ways to execute a large number of jobs. You could run the entire set serially, but this will not use all the available processing power. You could create n processes at once to run all n jobs simultaneously, but this tends to quickly exhaust other resources such as memory and I/O bandwidth, making the whole batch slower. Or you could divide the set into m equally sized groups and have each processor run its group serially, but this usually wastes time at the end if some jobs take longer than others to finish.
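The cost of the equal-groups approach can be seen with a little arithmetic. The following is only a scheduling model, not anything this module runs: the job durations are hypothetical, and the two helper functions are illustrative names. It compares the total elapsed time of a fixed split into equal-sized groups with that of handing the next job to whichever worker becomes free first.

```perl
use strict;
use warnings;
use List::Util qw(max sum);

# Hypothetical job durations in seconds: a few slow jobs clustered at the start.
my @durations = (9, 8, 7, 1, 1, 1, 1, 1, 1, 1, 1, 1);
my $workers   = 3;

# Strategy A: split the list into equal-sized contiguous groups, one per worker.
# The batch takes as long as the slowest group.
sub static_makespan {
    my ($workers, @d) = @_;
    my $size = int((@d + $workers - 1) / $workers);    # ceiling division
    my @totals;
    while (my @group = splice @d, 0, $size) {
        push @totals, sum(@group);
    }
    return max(@totals);
}

# Strategy B: each job goes to whichever worker becomes free first.
sub dynamic_makespan {
    my ($workers, @d) = @_;
    my @busy_until = (0) x $workers;
    for my $t (@d) {
        # Find the worker that frees up earliest (lowest index wins ties).
        my $i = 0;
        for my $w (1 .. $#busy_until) {
            $i = $w if $busy_until[$w] < $busy_until[$i];
        }
        $busy_until[$i] += $t;
    }
    return max(@busy_until);
}

my $static  = static_makespan($workers, @durations);
my $dynamic = dynamic_makespan($workers, @durations);
printf "equal groups: %d s, next free worker: %d s\n", $static, $dynamic;
# prints "equal groups: 25 s, next free worker: 11 s"
```

With these made-up durations, one worker in the fixed split receives all three slow jobs and the other two sit idle for most of the run.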

This module works by calling fork() to create a new process, invoking a user-specified function on the next piece of data within that process, and returning once all the data has been processed and all processes have exited. It also keeps track of the number of jobs in progress, and keeps this under a set limit by delaying new forks until existing processes terminate.
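The general technique described above can be sketched with nothing but core Perl. This is not the module's actual source; run_bounded and its return values are hypothetical names chosen for illustration, and the job here simply sleeps for a moment.

```perl
use strict;
use warnings;
use POSIX ();

# Run $code on each job in its own child process, keeping at most $maxprocs
# children alive at once. Returns (jobs completed, peak number of children).
sub run_bounded {
    my ($code, $jobs, $maxprocs) = @_;
    my @queue = @$jobs;
    my %running;                          # pid => job
    my ($completed, $peak) = (0, 0);

    while (@queue || %running) {
        # Fork new children while we are under the limit and work remains.
        while (@queue && keys(%running) < $maxprocs) {
            my $job = shift @queue;
            my $pid = fork();
            die "fork failed: $!" unless defined $pid;
            if ($pid == 0) {
                # Child: run the job, then leave without returning to the
                # parent's loop or running the parent's END blocks.
                my $ok = eval { $code->($job); 1 };
                POSIX::_exit($ok ? 0 : 1);
            }
            $running{$pid} = $job;
            my $alive = keys %running;
            $peak = $alive if $alive > $peak;
        }

        # Block until any child exits, which frees a slot for the next fork.
        my $pid = waitpid(-1, 0);
        last if $pid == -1;               # no children left to wait for
        if (exists $running{$pid}) {
            delete $running{$pid};
            $completed++;
        }
    }
    return ($completed, $peak);
}

# Ten short jobs, never more than three at a time.
my ($completed, $peak) = run_bounded(
    sub { select(undef, undef, undef, 0.05) },    # sleep 50 ms
    [ 1 .. 10 ],
    3,
);
print "completed=$completed peak=$peak\n";
```

The blocking waitpid is what delays new forks: the outer loop cannot start another child until one of the running ones has been reaped.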

CONSTRUCTOR

new

Options:

The following options can be passed to the constructor in a hashref, or retrieved or changed later using their own accessor methods.

code

Coderef to be run on each piece of data. It will be passed a single argument, which is an element of the jobs array.

jobs

Arrayref of data objects to be processed.

maxprocs

Maximum number of child processes that should be running at any time.

progress_cb

Hashref of progress callbacks; see PROGRESS NOTIFICATION.
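As an illustration of how these options fit together, here is a small sketch. The job function square_to_file and the temporary output directory are made up for the example. Calling the maxprocs accessor with an argument to change the limit is an assumption about the accessor's calling convention, which this documentation does not spell out.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use Parallel::Batch;

# Hypothetical job: each job is a number, and the child process writes its
# square to a file named after it. Files are used because the work happens
# in a separate process, so the parent cannot see the child's variables.
my $outdir = tempdir(CLEANUP => 1);

sub square_to_file {
    my ($n) = @_;                         # one element of the jobs array
    open my $fh, '>', "$outdir/$n.txt" or die "cannot write $n.txt: $!";
    print {$fh} $n * $n, "\n";
    close $fh or die "cannot close $n.txt: $!";
}

my $batch = Parallel::Batch->new({
    code     => \&square_to_file,
    jobs     => [ 1 .. 20 ],
    maxprocs => 8,
});

# Assumption: the accessor takes the new value as its argument.
$batch->maxprocs(4);

$batch->run();                            # returns once all jobs are completed

my @results = glob "$outdir/*.txt";
printf "%d result files written\n", scalar @results;
```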

METHODS

run

Start running the jobs, and return once all are completed.

PROGRESS NOTIFICATION

Parallel::Batch can report its progress through application-defined callbacks as it runs. If the progress_cb argument is a hashref containing any of the following keys, the corresponding callbacks will be called at the points described:

start

Will be called just before any processes are spawned.

new

Will be called after each new process has been created.

finish

Will be called when a child process exits.

done

Will be called after all jobs are completed and all child processes have terminated.
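A progress_cb hashref using all four keys might look like the sketch below. The arguments passed to each callback are not documented here, so the callbacks ignore them. The example also assumes the callbacks run in the parent process, which is what allows the %seen counters to be updated. The job itself is a hypothetical short sleep.

```perl
use strict;
use warnings;
use Parallel::Batch;

$| = 1;    # unbuffered output, so forked children cannot re-print pending text

my %seen = (new => 0, finish => 0);

my $batch = Parallel::Batch->new({
    # Hypothetical job: sleep for the given fraction of a second.
    code        => sub { my ($secs) = @_; select(undef, undef, undef, $secs) },
    jobs        => [ (0.1) x 5 ],
    maxprocs    => 2,
    progress_cb => {
        start  => sub { print "starting batch\n" },
        new    => sub { $seen{new}++ },
        finish => sub { $seen{finish}++; print "$seen{finish} finished\n" },
        done   => sub { print "all jobs done\n" },
    },
});

$batch->run();
```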

AUTHOR

Stephen Cavilia, <sac@atomicradi.us>

COPYRIGHT AND LICENSE

Copyright (C) 2011 by Stephen Cavilia

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.12.2 or, at your option, any later version of Perl 5 you may have available.