NAME

App::SpamcupNG::HTMLParse - functions to extract information from Spamcop.net web pages

SYNOPSIS

    use App::SpamcupNG::HTMLParse qw(find_next_id find_errors find_warnings find_spam_header find_message_age find_header_info);

DESCRIPTION

This package exports functions that use XPath to extract specific information from spamcop.net HTML pages.

EXPORTS

The following functions are exported by this package.

find_header_info

Finds information from the e-mail header of the received SPAM and returns it.

Returns a hash reference with the following keys:

mailer: the X-Mailer header, if available
content_type: the Content-Type, if available

The Content-Type header is normalized on a best-effort basis: extra spaces are removed, only the first two entries are kept, and everything is converted to lower case.
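
The normalization described above can be sketched roughly as follows. This is only an illustration and not the module's actual code: normalize_content_type is a hypothetical helper, and it assumes that the "entries" of the header are separated by semicolons and rejoined with a semicolon.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of App::SpamcupNG::HTMLParse).
# Assumption: header entries are separated by ";".
sub normalize_content_type {
    my ($raw) = @_;
    return unless defined $raw;

    # keep only the first two entries
    my @entries = split /;/, $raw;
    splice @entries, 2 if @entries > 2;

    for my $entry (@entries) {
        $entry =~ s/^\s+|\s+$//g;    # trim leading and trailing spaces
        $entry =~ s/\s+/ /g;         # collapse internal runs of spaces
        $entry = lc $entry;          # lower case everything
    }

    return join ';', @entries;
}

print normalize_content_type('Text/HTML ;  charset="UTF-8"; format=flowed'), "\n";
```

The third entry (format=flowed) is dropped, and the two remaining entries are trimmed and lower-cased.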

find_message_age

Finds and returns the SPAM message age information.

Returns an array reference: index 0 holds the age as an integer and index 1 holds the age unit (for example, "hour").

If nothing is found, returns undef.
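
Since the return value is either an array reference or undef, callers need to check it before dereferencing. A minimal sketch of that handling follows; describe_age is a hypothetical caller-side helper, and the array reference passed to it is stand-in data in the documented shape, not real output from find_message_age.

```perl
use strict;
use warnings;

# Hypothetical caller-side helper (not part of App::SpamcupNG::HTMLParse).
# Expects the value returned by find_message_age: [ age, unit ] or undef.
sub describe_age {
    my ($age_ref) = @_;

    # undef means no age information was found on the page
    return 'age unknown' unless defined $age_ref;

    my ( $age, $unit ) = @{$age_ref};
    return sprintf '%d %s(s)', $age, $unit;
}

# stand-in values with the documented shape
print describe_age( [ 2, 'hour' ] ), "\n";
print describe_age(undef), "\n";
```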

find_next_id

Expects as parameter a scalar reference of the HTML page.

Tries to find the SPAM ID used to identify SPAM reports on the spamcop.net web page.

Returns the ID if found, otherwise undef.

find_warnings

Expects as parameter a scalar reference of the HTML page.

Tries to find all warnings in the HTML, based on CSS classes.

Returns an array reference with all warnings found.

find_errors

Expects as parameter a scalar reference of the HTML page.

Tries to find all errors in the HTML, based on CSS classes.

Returns an array reference with all errors found.

find_best_contacts

Expects as parameter a scalar reference of the HTML page.

Tries to find all best contacts in the HTML, based on CSS classes.

The best contacts are the e-mail addresses that Spamcop considers appropriate to use for SPAM reporting.

Returns an array reference with all best contacts found.

find_spam_header

Expects as parameter a scalar reference of the HTML page.

An optional second parameter defines whether each line is prefixed with a tab character. The default value is false.

Tries to find the e-mail header of the SPAM reported.

Returns an array reference with all the lines of the e-mail header found.
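
The effect of the optional tab prefix on the returned lines can be illustrated with a small sketch. prefix_lines is a hypothetical stand-in that only mimics the documented behavior on an array reference of lines; it does not parse HTML and is not how the module is implemented.

```perl
use strict;
use warnings;

# Hypothetical stand-in (not part of App::SpamcupNG::HTMLParse).
# Returns a new array reference, with each line prefixed by a tab
# when the second parameter is true.
sub prefix_lines {
    my ( $lines_ref, $with_tab ) = @_;
    return [ map { $with_tab ? "\t$_" : $_ } @{$lines_ref} ];
}

# stand-in header lines
my $header = [ 'Subject: example', 'X-Mailer: example mailer' ];

print "$_\n" for @{ prefix_lines( $header, 1 ) };    # tab-prefixed
print "$_\n" for @{ prefix_lines($header) };         # default: no prefix
```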

find_receivers

Expects as parameter a scalar reference of the HTML page.

Tries to find all the receivers of the SPAM report, even if those are not real e-mail addresses but only internal identifiers that Spamcop uses to store statistics.

Returns an array reference, where each item is a string.

SEE ALSO

AUTHOR

Alceu Rodrigues de Freitas Junior, <arfreitas@cpan.org>

COPYRIGHT AND LICENSE

This software is copyright (c) 2018 of Alceu Rodrigues de Freitas Junior, <arfreitas@cpan.org>

This file is part of App-SpamcupNG distribution.

App-SpamcupNG is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

App-SpamcupNG is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with App-SpamcupNG. If not, see <http://www.gnu.org/licenses/>.