=pod

=head1 NAME

Algorithm::Diff::Apply -- apply one or more Algorithm::Diff diffs

=head1 SYNOPSIS

	## Single-diff form:
	use Algorithm::Diff::Apply qw{apply_diff};
	my @ary = ...;
	my @diff = ...;   # some call to Algorithm::Diff::diff()

	my @changed_ary = apply_diff(\@ary, \@diff);
	my $changed_ary = apply_diff(\@ary, \@diff);

	
	## To apply >1 diff at once, use the plural form:
	use Algorithm::Diff::Apply qw{apply_diffs};

        @newary = apply_diffs(\@ary,
                              diff1_name => \@diff1,
                              diff2_name => \@diff2,
	                      ...
                              diffN_name => \@diffN);

        # Alternatively:
        @newary = apply_diffs(\@ary, %named_diffs);

        # Scalar context:
        $newary = apply_diffs(\@ary, %named_diffs);

        # Extension argument syntax:
        $newary = apply_diffs(\@ary, {
                                  resolver => \&some_sub,
	                          optimisers => [\&foo, \&bar],
                                  key_generator => \&anothersub,
	                          opt4 => ...,
	                          opt5 => ...,
                              }, %named_diffs);

=head1 DESCRIPTION

    At worst, the user is made to feel superior to the computer.
      -- R. E. Mullen [1]

This module contains subroutines for applying diffs generated by
C<Algorithm::Diff> to a target array in the hope of regenerating a new
array incorporating all the changes described in the diffs into a new
merged array.

If two hunks from different diffs happen to affect the same line,
conflicts are detected and can be optionally handed off to helper
subroutines for resolution.

=over 4

=item apply_diff ARRAY,DIFF

Applies the changes described by a diff to a copy of I<ARRAY>, which
is then returned. I<DIFF> is a diff generated by
C<Algorithm::Diff::diff()>, and I<ARRAY> must be an array of an
appropriate length. Both parameters are passed in as references.
Neither argument is modified.

In a scalar context, C<apply_diff()> returns a reference to the
permuted copy that's generated. In an array context, the permuted copy
is returned as an array value.

This version of the algorithm is simpler and quicker than the
full-blown plural form, and should be used if you're only ever going
to be applying one diff at once.

=item apply_diffs ARRAY,HASH

=item apply_diffs ARRAY,OPTIONS,HASH

Applies more than one diff to a copy of an array at once, manages
conflicts, and returns the permuted copy as either a reference or an
array depending on context.

I<ARRAY> must be a reference to an array value of an appropriate
length. The array behind the passed reference is not permuted.

The I<HASH> parameter contains diffs from different sources, as
generated by C<Algorithm::Diff::diff()>. The diffs are keyed by
arbitrary strings which should reflect their source.
See L<DIFF LABELS> for details of what these strings might reflect.

I<OPTIONS>, if specified, must be a hash reference of option keywords
and the corresponding parameters. The following options are
recognised:

=over 8

=item Option "optimisers" (a.k.a. "optimizers")

Reference to an array of conflict optimiser subroutines. Normally
C<apply_diffs()> performs all the optimisations documented in this
module; this option can be used to change that. Pass in an empty array
to turn off optimisations altogether. See L<Conflict Optimiser
Callbacks> for details of how these subs are called.

=item Option "resolver"

This option can be used to supply a subroutine which will be called
when a conflict is detected. See L<Conflict Resolver Callbacks>
details of how resolver routines are called.

=item Option "key_generator"

If the elements in the target arrays and all the diffs cannot be
meaningfully compared with plain "C<eq>" - e.g. reference types - then
a hashing function can be supplied using this option. See C<Key
Generation Callbacks> below for details.

=back

=item mark_conflicts HASH

This is the default resolver callback; see L<Conflict Optimiser
Callbacks> for details of its interface. It causes C<apply_diffs()> to
return arrays looking a bit like:

   [ @part_before_conflict,
     ">>>>>> diff1_name\n",
     @lines_1,              # Lines permuted by diff1 (only)
     ">>>>>> diff2_name\n"
     @lines_2,              # The same lines, as permuted by diff2
     "<<<<<<\n",
     @part_after_conflict,
   ]

Which is probably the right thing to do if your array is going to be
printed out one item per line.

=item optimise_remove_duplicates HASH

Conflict optimiser: detects identical diff hunks in the conflict
block, and removes all but one of each duplicated hunk.

=back

=head1 CALLBACK INTERFACES

A lot of the dirty work in C<apply_diffs()> is done by callback
subroutines, and users of C<Algorithm::Diff::Apply> can override its
default behaviour if they wish by passing appropriate options to
C<apply_diffs()> (see above).

=head2 Conflict Optimiser Callbacks

Optimiser callbacks are a way of making conflict cases easier for
humans to deal with. This can be done by combining diff hunks from
different sources that do the same thing, factoring out common
changes, and other devious bits of monkeywork. Doing this can even
factor away a conflict situation entirely.

These callbacks are called by C<apply_diffs()> when it detects a group
of hunks from different diff sequences that can't all be applied at
once without conflicting with each other.

For each block of conflicting diffs, the callback will be called with
every diff in the block in the following format:

    @ret = optimiser_callback (
          "conflict_block" => {
               "name_1" => [$hunk_1_1, $hunk_1_2, ..., $hunk_1_N],
               "name_2" => [$hunk_2_1, $hunk_2_2, ..., $hunk_2_N],
                :
               "name_M" => [$hunk_M_1, $hunk_M_2, ..., $hunk_M_N],
          },
    );

The tag names are whatever you labelled the diff sequences you passed
to C<apply_diffs()>. By definition, a conflict block contains a subset
of at least two separate diffs.

Each of the C<$hunk_X_X> scalars in the arguments above is a hash
reference with the following structure:

    {
        "start"   => N,
        "changes" => [[OP1, DATA1], ..., [OPn, DATAn]],
    }

Where "start" is a line number in the I<target> array, indicating
where this hunk is intended to be applied, and "changes" contains the
changes to apply.

Optimiser callbacks should return a I<permuted copy> of what they were
passed. Empty diffs will be discarded automatically. If only one diff
remains after processing, the conflict will have been optimised away
completely.

=head2 Conflict Resolver Callbacks

Resolver callbacks are invoked when conflicts have been detected, and
the optimisers weren't able to completely factor away the conflict
block.

The job of a resolver callback is to return some kind of resolution of
the conflicting sub-arrays. These subs are called a little like this:

     @ret = resolver_callback (
	   "alt_txts" => {
                "diff1_name" => ['m', 'n', 'o'],
	        "diff3_name" => [],
           }
     );

The C<alt_txts> parameter is a hash ref keyed by (some of the) names
of the diffs being applied in the main C<apply_diffs()> call, whose
values are arrays containing alternative generated subsequences. Each
of these subsequences is the result of applying a set of hunks from
the corresponding diff to a copy of the slice of the source array
where the conflict happened.

A resolver callback should return an array which will be spliced into
the array that C<apply_diffs()> will return.

(Other options are passed to resolver subs too, but these are as yet
undocumented because they're still liable to change)

=head2 Key Generation Callbacks

When C<Algorithm::Diff::Apply> needs to compare two lines from
different hunks for equality during the optimisation phase, it uses
the "eq" operator by default. This is meaningless for many Perl
variables; for example object references stringify to a form that
doesn't capture the internals of the object, e.g.
"SomeClass=HASH(0x813aa8c)".

This also applies to plain ARRAY, HASH, SCALAR references, or other
types of ref. Ideally we'd like the internal state of any such object
pointed at by a scalar variable to be represented in a form that can
be compared against other varibales of the same type.

This is where key generation callbacks come in. A C<key_generator>
subroutine is a hashing routine; it takes as argument a scalar
variable, and returns a string that can be compared against other
strings returned by other calls to the key_generator. The return
values are cached inside a call to C<apply_diffs()> for efficiency.

The key generators used by C<Algorithm::Diff::Apply> have the same
semantics as the ones used by C<Algorithm::Diff>. See
L<Algorithm::Diff/KEY GENERATION FUNCTIONS>.

When you're dealing with objects, a simple wrapper around one of the
object class's methods often does the trick:

	# Book objects representing titles produced by a publishing
	# house are considered unique if their ISBNs are
	# identical.

	my @books = ...;
	my $booksdiff1 = ...;
	my $booksdiff2 = ...;
	apply_diffs(\@books, {
	                    key_generator => sub {
	                            my $book = shift;
	                            return $book->isbn();
	                    },
	            },
	            bookdiff1 => $booksdiff1,
	            bookdiff2 => $booksdiff2 );

Different kinds of thing in your array? Not a problem.

	#...
	key_generator => sub {
		my $arg = shift;
		local $_ = ref($arg) or return $arg;
		/Book/     and return $arg->isbn();
		/Product/  and return $arg->serial_number();
		/UKPerson/ and return $arg->nat_ins_number();
		/Person/   and return $arg->uid();
		# ... fallback case ...
		return "$arg";    # stringify: always unique
	},
	#...

Remember that A::D::Apply only uses this type of callback when trying
to optimise away or reduce conflicting hunks. That's why the "singular
form" C<apply_diff()> doesn't take this option.

=head1 DIFF LABELS

C<apply_diffs()> makes you uniquely label the diffs you're applying
when merging and applying them together, in the hope that when a
conflict occurs, a meaningful name can be associated with each source
of changes.

A diff label should reflect where the change came from, and can be
anything you like: a revision number, a URL, a filename, or just a
unique number. C<Algorithm::Diff::Apply> doesn't care about the exact
syntax of the tag names you use - any unique string will do.

This convention is also used by the callback routines whose interfaces
are described in L<CALLBACK INTERFACES> above.

=head1 TO-DO

The calling convention for options is ugly and confusing, and defeats
prototyping. It's done like that to correspond with other modules in
the I<Algorithm::{Diff,Patch}> family, but alternative syntaxes should
probably be permitted.

Add more conflict optimisers. If anyone out there has a good routine,
and would like to see it added to the library, then I'd be happy to
incorporate it.

=head1 SEE ALSO

=over 4

=item Tracking:

You can report bugs in this module at
L<http://rt.cpan.org/NoAuth/Bugs.html?Dist=Algorithm-Diff-Apply>, or to the
author directly via email.

=item Other Perl modules:

L<Algorithm::Diff>, of course. Other modules that do similar things to
C<Algorithm::Diff::Apply>: L<Algorithm::Merge>, L<VCS::Lite>.

=item Papers, software, and discussions:

[1] Automated Merging of Software Modifications, R. E. Mullen, CISL
Cambridge, Honeywell Software Productivity Symposium, Minneapolis MN,
April 1977: L<http://www.multicians.org/mullen-paper.html>.

Codeville: L<http://bitconjurer.org/codeville/>.

darcs: L<http://www.abridgegame.org/darcs/>.

Monotone: L<http://www.venge.net/monotone/>.

Interesting information for anyone building a full VCS:
L<http://kt.zork.net/kernel-traffic/kt20030323_210.html#1>.

=back

=head1 AUTHOR

Andrew Chadwick, I<andrewc-algodiffaply@piffle.org>.

=head1 LICENCE

Copyright (c) 2003 Andrew Chadwick. This program is free software; you may
copy it, redistribute it, or modify it under the same terms as Perl itself.

=cut