NAME

Net::Hadoop::Oozie

VERSION

version 0.102

SYNOPSIS

    use Net::Hadoop::Oozie;
    my $oozie = Net::Hadoop::Oozie->new( %options );

DESCRIPTION

This module is a Perl interface to various Oozie REST service endpoints. It also includes utility methods for bulk requests and some admin functionality.

ACCESSORS

action

api_version

doas

filter

The service expects filters in the form filter_key1=filter_value1;filter_key2=...;, but here the filters are defined as a hash.

    filter => {
        status => ...,
    }

The valid filters are listed below.

name

The application name from the workflow/coordinator/bundle definition

user

The user that submitted the job

group

The group for the job

status

The status of the job

Keep the following behavior in mind when using filters:

  • The query will do an AND among all the filter names.

  • The query will do an OR among all the filter values for the same name.

  • Multiple values must be specified as different name value pairs.
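
To make the submission format concrete, here is a minimal sketch of how a filter hash could be flattened into name=value pairs. build_filter_string is a hypothetical helper written for this illustration and is not part of the module; accepting an array reference for several values of the same name is also an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper: flattens a filter hash into "name=value;name=value".
# An array reference stands for several values of the same filter name;
# each value is emitted as its own name=value pair.
sub build_filter_string {
    my (%filter) = @_;
    my @pairs;
    for my $name (sort keys %filter) {
        my $value  = $filter{$name};
        my @values = ref($value) eq 'ARRAY' ? @{$value} : ($value);
        push @pairs, "$name=$_" for @values;
    }
    return join ';', @pairs;
}

print build_filter_string(
    user   => 'hadoop',
    status => [qw( RUNNING SUSPENDED )],
), "\n";
```

With the rules above, the resulting string would match jobs of user hadoop whose status is either RUNNING or SUSPENDED.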

jobtype

The Oozie documentation lists workflow, coordinator and bundle, but in CDH 4.4 the valid values are '', 'coordinators' and 'bundles'. The "workflows" and "coordinators" methods are helper functions that set these values behind the scenes.

len

Defaults to 50.

offset

Defaults to 1.

order

Can be asc or desc; defaults to asc. For instance, when used on a coordinator in a "job" call, desc puts the len most recent actions in the actions key, most recent first; the offset is then applied from the end of the list.
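
The interplay of len, offset and order can be pictured with a small local model. This is only a sketch of the semantics described above: the real selection happens on the Oozie server, and select_actions is a hypothetical function that does not exist in the module.

```perl
use strict;
use warnings;

# Hypothetical local model of the len/offset/order semantics.
# It only mimics the documented behaviour on a plain list.
sub select_actions {
    my (%arg) = @_;
    my @actions = @{ $arg{actions} };
    my $len     = $arg{len}    // 50;       # documented default
    my $offset  = $arg{offset} // 1;        # documented default, 1-based
    my $order   = $arg{order}  // 'asc';    # documented default

    # With desc, the list is walked from the end, most recent first.
    @actions = reverse @actions if $order eq 'desc';

    my $first = $offset - 1;
    $first = 0 if $first < 0;
    return () if $first > $#actions;
    my $last = $first + $len - 1;
    $last = $#actions if $last > $#actions;
    return @actions[ $first .. $last ];
}

# Ten actions numbered from oldest (1) to most recent (10).
my @actions = ( 1 .. 10 );
print join( ', ', select_actions( actions => \@actions, len => 3, order => 'desc' ) ), "\n";
```

In this model, len => 3 with order => 'desc' selects actions 10, 9 and 8, while adding offset => 2 shifts the window to 9, 8 and 7.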

show

METHODS

END POINTS

admin

build_version

coord_rerun

coordinators

job

jobs

kill

submit_job

For details about job submission through REST, see https://oozie.apache.org/docs/4.0.0/WebServicesAPI.html#Job_Submission.

Required parameters are listed below.

  • oozie.wf.application.path

    A path like /oozie_workflows/myworkflow; the workflow must already be deployed there.

  • appName

    The name given to this specific instance; it can be anything you want.

Optional parameters are listed below.

Auto variables

If you want a variable interpolated in your script (a date, an integer, or anything else), pass it in the options you call the method with. If you pass foo => 'bar', you can use it inside the workflow as ${foo}.
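
The effect of this substitution can be mimicked locally. In the sketch below, interpolate is a hypothetical stand-in for what Oozie does on the server side, and the path is made up; leaving unknown placeholders untouched is a choice of this sketch, not a statement about Oozie.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the server-side substitution: replaces each
# ${name} placeholder with the matching option value. Placeholders with
# no matching option are left as they are (a choice made for this sketch).
sub interpolate {
    my ($text, %options) = @_;
    $text =~ s/\$\{(\w+)\}/exists $options{$1} ? $options{$1} : "\${$1}"/ge;
    return $text;
}

print interpolate( '/data/${foo}/input', foo => 'bar' ), "\n";
```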

Configuration properties

Parameters that are useful to Oozie itself (like the queue name) need, as far as I can tell, an extra level of handling. They can be set dynamically, but this requires a tweak in the workflow definition itself, in the top config section. For instance, to set mapreduce.job.queuename and assign the tasks to a specific fair scheduler queue, declare it in the global configuration section like this:

    <property>
        <name>mapreduce.job.queuename</name>
        <value>${queueName}</value>
    </property>

Then call "submit_job" with this added to the options hash:

    queueName => "root.<queue name>"

This method returns a job ID which you can pass directly to the "job" method above to query the job status. You can therefore launch a job from a script and have a loop poll its status at regular intervals (be nice, please) until it is done (untested code):

    my $oozie = Net::Hadoop::Oozie->new;
    my $job_params = [
        { appName => 'job1', myParam => 'foo' },
        { appName => 'job2', myParam => 'bar' },
        # ...
    ];
    my @ids; # IDs of the submitted jobs still to be checked
    for my $job (@$job_params) {
        my $jobid = $oozie->submit_job({
            myParam                     => $job->{myParam},
            debug                       => 0, # set to 1 to print the job config and response
            appName                     => $job->{appName},
            'oozie.wf.application.path' => "/wf_base_path/<workflow name>/",
        });
        push @ids, $jobid;
    }

    while (my $jobid = shift @ids) {
        my $status = $oozie->job($jobid)->{status};
        if ($status =~ /(WAITING|READY|SUBMITTED|RUNNING)/) {
            push @ids, $jobid; # still in progress: put back in the queue
            sleep 10; # or more, how about 60?
            next; # do not treat a job in progress as failed
        }
        # what do you want to do if not succeeded?
        if ($status !~ /SUCCEEDED/) {
            die "job $jobid died";
        }
    }

workflows

UTILITY METHODS

active_coordinators

active_job_paths

failed_workflows_last_n_hours

failed_workflows_last_n_hours_pretty

job_exists

This is syntactic sugar on top of the "job" method. Normally the REST interface just dies with an HTTP 400 message for a missing job. This method does not die: it returns the data set if there is a proper response from the service, and false otherwise.

    if ( my $job = $oozie->job_exists( $id ) ) {
        # do something
    }
    else {
        warn "No such job: $id";
    }
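
The general pattern behind such a wrapper can be sketched with eval. This is not the module's actual implementation: My::FakeClient is a hypothetical stand-in whose job method dies on unknown IDs, and job_exists_sketch is a made-up name.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a client whose job() dies on unknown IDs,
# mimicking the HTTP 400 behaviour described above.
package My::FakeClient;

sub new { return bless {}, shift }

sub job {
    my ($self, $id) = @_;
    die "HTTP 400: no such job $id\n" if $id ne 'known-job-id';
    return { id => $id, status => 'RUNNING' };
}

package main;

# Wraps the dying call in eval so that a missing job yields a false
# value instead of an exception.
sub job_exists_sketch {
    my ($client, $id) = @_;
    my $job = eval { $client->job($id) };
    return $job || 0;
}

my $client = My::FakeClient->new;
for my $id (qw( known-job-id missing-job-id )) {
    my $job = job_exists_sketch( $client, $id );
    print "$id: ", ( $job ? "status $job->{status}" : 'not found' ), "\n";
}
```

Swallowing the exception keeps calling code short, at the cost of hiding the reason for the failure; a caller that needs the reason can still call "job" directly.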

standalone_active_workflows

Returns an arrayref of standalone workflows (as in jobs not attached to a coordinator):

    my $wfs_without_a_coordinator = $oozie->standalone_active_workflows;
    foreach my $wf ( @{ $wfs_without_a_coordinator } ) {
        # do something
    }

suspended_coordinators

Returns an arrayref of suspended coordinators:

    my $suspended = $oozie->suspended_coordinators;
    foreach my $coord ( @{ $suspended } ) {
        # do something
    }

suspended_workflows

Returns an arrayref of suspended workflows:

    my $suspended = $oozie->suspended_workflows;
    foreach my $wf ( @{ $suspended } ) {
        # do something
    }

coordinators_with_the_same_appname_on_the_same_path

Returns a hash of the application names that are duplicated across multiple coordinators. Having coordinators like this is usually a user error when submitting jobs.

    my %offenders = $oozie->coordinators_with_the_same_appname_on_the_same_path;
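
The kind of grouping involved can be illustrated on plain data. The sketch below is an assumption-laden illustration, not the module's code: find_duplicates is a hypothetical function, and the id, name and path keys are made up rather than taken from the structure the module returns.

```perl
use strict;
use warnings;

# Hypothetical records; the "id", "name" and "path" keys are invented
# for this sketch.
my @coordinators = (
    { id => 'C-1', name => 'daily-import', path => '/apps/import' },
    { id => 'C-2', name => 'daily-import', path => '/apps/import' },
    { id => 'C-3', name => 'daily-export', path => '/apps/export' },
);

# Groups coordinator IDs by name and path, then keeps only the groups
# that contain more than one coordinator.
sub find_duplicates {
    my (@coords) = @_;
    my %seen;
    for my $c (@coords) {
        my $key = "$c->{name} on $c->{path}";
        push @{ $seen{$key} }, $c->{id};
    }
    my %dupes;
    for my $key (keys %seen) {
        $dupes{$key} = $seen{$key} if @{ $seen{$key} } > 1;
    }
    return %dupes;
}

my %found = find_duplicates(@coordinators);
for my $key (sort keys %found) {
    print "$key: ", join( ', ', @{ $found{$key} } ), "\n";
}
```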

AUTHOR

David Morel <david.morel@amakuru.net>

COPYRIGHT AND LICENSE

This software is copyright (c) 2015 by David Morel & Booking.com.

This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.