HTML::Differences - Reasonably sane HTML diffing
use HTML::Differences qw( html_text_diff );
my $html1 = <<'EOF';
<p>Some text</p>
EOF

my $html2 = <<'EOF';
<p>Some <strong>strong</strong> text</p>
EOF
print html_text_diff( $html1, $html2 );
This module provides a reasonably sane way to get the diff between two HTML documents or fragments. Under the hood, it uses HTML::Parser.
Internally, this module converts the HTML it gets into an array reference containing each unique HTML token. These tokens consist of things such as the doctype declaration, start and end tags, text, etc.
All whitespace between two pieces of text is converted to a single space, except when inside a <pre> block. Leading and trailing space on text is also stripped out.
Start tags are normalized so that attributes appear in sorted order, and all quotes are converted to double quotes, with one space before each attribute. Self-closing tags (like <hr/>) are converted to their simpler form (<hr>).
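The two normalizations described above can be sketched in plain Perl. This is only an illustration of the rules as stated, not the module's actual code: the helper names normalize_text and normalize_start_tag are hypothetical, attribute values are assumed to be already-decoded strings, and the <pre> exception is not handled.

```perl
use strict;
use warnings;

# Hypothetical helper: collapse each run of whitespace to a single
# space, then strip leading and trailing space (the rule for text
# outside a <pre> block).
sub normalize_text {
    my ($text) = @_;
    $text =~ s/\s+/ /g;
    $text =~ s/^ | $//g;
    return $text;
}

# Hypothetical helper: render a start tag with attributes in sorted
# order, double-quoted, with one space before each attribute.
# Escaping of double quotes inside values is deliberately left out.
sub normalize_start_tag {
    my ( $tag, $attr ) = @_;
    my $out = "<$tag";
    for my $name ( sort keys %{$attr} ) {
        $out .= qq{ $name="$attr->{$name}"};
    }
    return "$out>";
}

print normalize_text("  Some\n   text \t"), "\n";
print normalize_start_tag( 'img', { src => 'a.png', alt => 'A' } ), "\n";
```

Because both inputs to a diff go through the same normalization, differences in quoting, attribute order, or incidental whitespace do not show up as changes.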
Note that because HTML::Parser decodes HTML entities inside attribute values, this module cannot distinguish between two attributes where one contains an entity and one does not.
Missing end tags are not added, and will show up in the diff.
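As a small illustration of this behavior, the sketch below diffs a fragment against a copy that omits one </p>. The fragments are made up for the example, and the exact layout of the output depends on the diff style in effect.

```perl
use strict;
use warnings;
use HTML::Differences qw( html_text_diff );

# The second fragment omits the first </p>. No end tag is synthesized,
# so the two token lists differ and the diff should report it.
my $closed   = '<p>one</p><p>two</p>';
my $unclosed = '<p>one<p>two</p>';

my $diff = html_text_diff( $closed, $unclosed );
print $diff;
```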
Comments are included by default, but you can pass a flag to ignore them.
This module offers two optionally importable subroutines. Nothing is exported by default.
html_text_diff uses Text::Diff's diff() subroutine to produce a string version of the diff between the two pieces of HTML provided.
The HTML can be passed as a plain scalar or as a reference to a scalar.
After the two HTML parameters, you can pass key/value pairs as options:
ignore_comments: If this is true, then comments are ignored for the purpose of the diff. This defaults to false.
style: The style for the diff. This defaults to "Table". See Text::Diff for the available options.
context: The amount of context to show in the diff. This defaults to 2**31 to include all the context. You can set this to some smaller value if you prefer.
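A hedged usage sketch combining the options above. The option keys style and context are assumed here to be the lowercase names for the style and context settings just described; ignore_comments is named in this documentation. "Unified" is one of the styles Text::Diff provides.

```perl
use strict;
use warnings;
use HTML::Differences qw( html_text_diff );

my $old = '<p>Some text</p><!-- draft -->';
my $new = '<p>Some <strong>strong</strong> text</p>';

# HTML may be given as a plain scalar or as a reference to a scalar.
my $diff = html_text_diff(
    $old, \$new,
    ignore_comments => 1,           # leave the comment out of the diff
    style           => 'Unified',   # assumed key; a Text::Diff style name
    context         => 3,           # assumed key; lines of context to show
);

print $diff;
```

Lowering the context from the default is mostly useful for large documents, where showing every unchanged token would bury the changes.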
diffable_html( $html, ... ) returns an array reference of strings suitable for passing to any of Algorithm::Diff's methods or exported subroutines.
The only option currently accepted is ignore_comments.
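A minimal sketch of feeding the token lists to Algorithm::Diff, assuming the subroutine described here is imported under the name diffable_html. sdiff is one of Algorithm::Diff's exported subroutines; it returns one [flag, old, new] entry per compared element.

```perl
use strict;
use warnings;
use Algorithm::Diff qw( sdiff );
use HTML::Differences qw( diffable_html );

# Each call returns an array reference of normalized token strings.
my $old = diffable_html('<p>Some text</p>');
my $new = diffable_html(
    '<p>Some <strong>strong</strong> text</p>',
    ignore_comments => 1,
);

# Flags: 'u' unchanged, 'c' changed, '+' added, '-' removed.
for my $hunk ( sdiff( $old, $new ) ) {
    my ( $flag, $left, $right ) = @{$hunk};
    printf "%s %-20s %s\n", $flag, $left, $right;
}
```

Working with the token lists directly is useful when you want your own report format rather than the string that the text diff produces.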
There are a couple of other modules out there that do HTML diffs, so why write this one?
The HTML::Diff module uses regexes to parse HTML. This is crazy.
The Test::HTML::Differences module attempts to fix up the HTML a little too much for my purposes. It ends up ignoring missing end tags or breaking on them in various ways.
Dave Rolsky <email@example.com>
This software is Copyright (c) 2015 by Dave Rolsky.
This is free software, licensed under:
The Artistic License 2.0 (GPL Compatible)