Luc Didry
and 1 contributor

NAME

MHonArc::CharEnt - HTML Character routines for MHonArc.

SYNOPSIS

  use MHonArc::CharEnt;

  MHonArc resource file:

    <CharsetConverters>
    ...
    iso-8859-15;    MHonArc::CharEnt::str2sgml;     MHonArc/CharEnt.pm
    ...
    </CharsetConverters>

DESCRIPTION

MHonArc::CharEnt provides the main character conversion routine used by MHonArc for converting non-ASCII encoded message header data and text/plain character data into HTML. This module was initially written to support only 8-bit charsets. It has since been extended to support multibyte charsets.

All characters are mapped to HTML 4.0 character entity references (e.g. &lt; &gt;) or to Unicode numeric character entity references (e.g. &#x203E;). Most modern browsers will support the Unicode references directly.
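As an illustration of the kind of mapping described above, here is a minimal standalone sketch for a single 8-bit charset. It is not the module's actual code: the names %ISO_8859_15_DIFF, %SPECIAL and bytes_to_entities are hypothetical, and the sketch assumes that emitting numeric references for every non-ASCII byte is acceptable (the real module may prefer named entities). The table lists only the eight positions where ISO-8859-15 differs from ISO-8859-1; every other high byte has the same value as its Unicode code point.

```perl
use strict;
use warnings;

# Hypothetical table: the eight byte values where ISO-8859-15
# differs from ISO-8859-1, mapped to their Unicode code points.
my %ISO_8859_15_DIFF = (
    0xA4 => 0x20AC,    # EURO SIGN
    0xA6 => 0x0160,    # LATIN CAPITAL LETTER S WITH CARON
    0xA8 => 0x0161,    # LATIN SMALL LETTER S WITH CARON
    0xB4 => 0x017D,    # LATIN CAPITAL LETTER Z WITH CARON
    0xB8 => 0x017E,    # LATIN SMALL LETTER Z WITH CARON
    0xBC => 0x0152,    # LATIN CAPITAL LIGATURE OE
    0xBD => 0x0153,    # LATIN SMALL LIGATURE OE
    0xBE => 0x0178,    # LATIN CAPITAL LETTER Y WITH DIAERESIS
);

# ASCII characters that are special in HTML.
my %SPECIAL = ('<' => '&lt;', '>' => '&gt;', '&' => '&amp;');

# Convert a string of ISO-8859-15 bytes to ASCII-only HTML text.
sub bytes_to_entities {
    my ($bytes) = @_;
    my $out = '';
    for my $byte (unpack 'C*', $bytes) {
        if ($byte < 0x80) {
            my $char = chr $byte;
            $out .= exists $SPECIAL{$char} ? $SPECIAL{$char} : $char;
        }
        else {
            # Look up the table first; otherwise the byte value is
            # already the Unicode code point (as in ISO-8859-1).
            my $cp = exists $ISO_8859_15_DIFF{$byte}
                   ? $ISO_8859_15_DIFF{$byte}
                   : $byte;
            $out .= sprintf '&#x%X;', $cp;
        }
    }
    return $out;
}
```

For example, bytes_to_entities("5\xA4 <caf\xE9>") yields a pure-ASCII string in which the euro sign and the accented letter appear as numeric references and the angle brackets as &lt; and &gt;.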

NOTES

  • This module relies on MHonArc's CHARSETALIASES resource for defining alternate names for the charsets it supports.

  • Most character conversion is done through mapping tables that are dynamically loaded on an as-needed basis. There is probably room for optimization by replacing the tables for some charsets with algorithmic conversions.

    UTF-8 conversion is done algorithmically.

  • A main goal of this module is to convert raw non-ASCII data in various character sets to pure ASCII data, using entity references for the non-ASCII characters. This way, archive files are entirely ASCII, and modern compliant HTML browsers can render the non-ASCII characters from the standard named and numeric character entity references.

    This does make the raw HTML source difficult to read for non-English languages, but this may be a non-issue for most users.
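The note above says UTF-8 is converted algorithmically rather than through tables. The following is a rough sketch of what such a conversion can look like, not the module's own implementation. The name utf8_to_entities is hypothetical, malformed sequences are replaced with "?" as an arbitrary choice, and the sketch does not reject overlong encodings or surrogate code points.

```perl
use strict;
use warnings;

# ASCII characters that are special in HTML.
my %SPECIAL_UTF8 = ('<' => '&lt;', '>' => '&gt;', '&' => '&amp;');

# Decode a string of UTF-8 bytes by bit arithmetic and emit
# ASCII-only HTML text with numeric character references.
sub utf8_to_entities {
    my ($bytes) = @_;
    my @b   = unpack 'C*', $bytes;
    my $out = '';
    my $i   = 0;
    while ($i < @b) {
        my $lead = $b[$i++];
        if ($lead < 0x80) {
            my $char = chr $lead;
            $out .= exists $SPECIAL_UTF8{$char} ? $SPECIAL_UTF8{$char} : $char;
            next;
        }
        # The lead byte tells how many continuation bytes follow
        # and contributes the high bits of the code point.
        my ($need, $cp);
        if    (($lead & 0xE0) == 0xC0) { ($need, $cp) = (1, $lead & 0x1F) }
        elsif (($lead & 0xF0) == 0xE0) { ($need, $cp) = (2, $lead & 0x0F) }
        elsif (($lead & 0xF8) == 0xF0) { ($need, $cp) = (3, $lead & 0x07) }
        else  { $out .= '?'; next }    # stray continuation or invalid lead
        my $ok = 1;
        for (1 .. $need) {
            if ($i < @b && ($b[$i] & 0xC0) == 0x80) {
                # Each continuation byte carries six more bits.
                $cp = ($cp << 6) | ($b[$i++] & 0x3F);
            }
            else { $ok = 0; last }     # truncated sequence
        }
        $out .= $ok ? sprintf('&#x%X;', $cp) : '?';
    }
    return $out;
}
```

Given the three bytes E2 80 BE, this returns &#x203E;, the overline reference used as an example in the DESCRIPTION.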

VERSION

$Id: CharEnt.pm,v 1.17 2010/12/31 18:23:02 ehood Exp $

AUTHOR

Earl Hood, earl@earlhood.com

MHonArc comes with ABSOLUTELY NO WARRANTY and MHonArc may be copied only under the terms of the GNU General Public License, which may be found in the MHonArc distribution.