15 Aug 2013 02:11:22 UTC
- Distribution: CAM-PDF
- Module version: 1.60
- License: perl_5
- Perl: v5.6.0
Clotho Advanced Media, Inc.
- Dependencies
- Crypt::RC4
- Digest::MD5
- Text::PDF
- and possibly others
NAME
CAM::PDF::PageText - Extract text from PDF page tree
SYNOPSIS
    my $pdf = CAM::PDF->new($filename);
    my $pageone_tree = $pdf->getPageContentTree(1);
    print CAM::PDF::PageText->render($pageone_tree);
DESCRIPTION
This module attempts to extract sequential text from a PDF page. This is not a robust process, because PDF text is graphically laid out in arbitrary order. The module uses a few heuristics to guess which text goes next to which other text, but it is easily fooled by, for example, subscripts, non-horizontal text, changes in font, and form fields.
All those disclaimers aside, it is useful for a quick dump of text from a simple PDF file.
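As a rough illustration of such a quick dump, the sketch below loops over every page and joins the rendered text. The helper name dump_all_pages and its $renderer parameter are hypothetical (they are not part of CAM::PDF), and the check for an empty content tree is a defensive assumption rather than documented behavior. It relies on the numPages and getPageContentTree methods of CAM::PDF and on the render class method described under FUNCTIONS.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of CAM::PDF): collect the text of every page.
# $pdf must provide numPages() and getPageContentTree(), as CAM::PDF does;
# $renderer is a class name with a render() class method.
sub dump_all_pages {
    my ($pdf, $renderer) = @_;
    my @chunks;
    for my $pagenum (1 .. $pdf->numPages()) {
        my $tree = $pdf->getPageContentTree($pagenum);
        # Assumption: skip pages for which no content tree is returned.
        next if !$tree;
        push @chunks, $renderer->render($tree);
    }
    return join "\n", @chunks;
}

# When run as a script: perl dump_text.pl some.pdf
if (!caller && @ARGV) {
    require CAM::PDF;
    require CAM::PDF::PageText;
    my $pdf = CAM::PDF->new($ARGV[0])
        or die "Could not open $ARGV[0]: $CAM::PDF::errstr\n";
    print dump_all_pages($pdf, 'CAM::PDF::PageText'), "\n";
}
```

Passing the renderer as a parameter is only a convenience for trying the loop without a real PDF; in normal use it would simply be 'CAM::PDF::PageText'. The same caveats apply as for a single page: the output order depends on the heuristics and may not match the visual reading order.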
LICENSE
Same as CAM::PDF
FUNCTIONS
- $pkg->render($pagetree)
- $pkg->render($pagetree, $verbose)
Turn a page content tree into a string. This is a class method that should be called like:
    CAM::PDF::PageText->render($pagetree);
AUTHOR
See CAM::PDF