28 Oct 2018 07:26:20 UTC
- Distribution: App-scrape
- License: perl_5
- Perl: v5.6.0
scrape.pl - simple HTML scraping from the command line
ABSTRACT
This is a simple program to extract data from HTML by specifying CSS3 or XPath selectors.
SYNOPSIS
    scrape.pl URL selector selector ...

    # Print page title
    scrape.pl http://perl.org title
    # The Perl Programming Language - www.perl.org

    # Print links with titles, make links absolute
    scrape.pl http://perl.org a //a/@href --uri=2

    # Print all links to JPG images, make links absolute
    scrape.pl http://perl.org 'a[href$="jpg"]'

    # Print JSON about Amazon prices
    scrape.pl https://www.amazon.de/dp/0321751043 --format json \
        --name "title" '#productTitle' \
        --name "price" '#priceblock_ourprice' \
        --name "deal" '#priceblock_dealprice'

    # Print JSON about Amazon prices for multiple products
    scrape.pl --format json \
        --url https://www.amazon.de/dp/B01J90P010 \
        --url https://www.amazon.de/dp/B01M3015CT \
        --name "title" '#productTitle' \
        --name "price" '#priceblock_ourprice' \
        --name "deal" '#priceblock_dealprice'
DESCRIPTION
This program fetches an HTML page and extracts nodes matched by XPath or CSS selectors from it.
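The idea of "extracting nodes matched by a selector" can be illustrated with a minimal Python sketch. This is not how scrape.pl is implemented; it assumes a hypothetical, well-formed sample page and uses the limited XPath subset of Python's standard library, whereas real-world HTML usually needs a forgiving HTML parser.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample page (an assumption for illustration only).
PAGE = """<html>
  <head><title>Example page</title></head>
  <body>
    <a href="/about">About</a>
    <a href="http://example.com/pic.jpg">Picture</a>
  </body>
</html>"""


def extract(page, xpath):
    """Return the stripped text of every element matched by a limited XPath."""
    root = ET.fromstring(page)
    return [(node.text or "").strip() for node in root.findall(xpath)]


# One selector per output column, similar in spirit to "scrape.pl URL title a".
titles = extract(PAGE, ".//title")
links = extract(PAGE, ".//a")
print(titles)
print(links)
```

Each selector yields one list of values, which corresponds loosely to one output column.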
If URL is -, input will be read from STDIN.
OPTIONS
- --format
Output format. The default is csv. Valid values are csv or json.
- --url
URL to fetch. This can be given multiple times to fetch multiple URLs in one run. If this is not given, the first argument on the command line will be taken as the only URL to be fetched.
- --keep-url
Add the fetched URL as another column with the given name in the output. If you use CSV output, the URL will always be in the first column.
- --name
Name of the output column.
- --sep
Separator character to use for columns. Default is tab.
- --uri COLUMNS
Numbers of the columns to convert into absolute URIs, if the automatic handling of known attributes does not do everything you want.
- --no-uri
Switches off the automatic translation to absolute URIs for known attributes like href and src.
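What the --uri, --sep, --name and --format options describe can be sketched conceptually in Python. This is an illustration under stated assumptions, not scrape.pl's code: the helper names absolutize and render are hypothetical, column numbers are assumed to be 1-based (as the --uri=2 example in the synopsis suggests), and the JSON shape (a list of objects keyed by column name) is a guess, since the exact output layout is not documented here.

```python
import csv
import io
import json
from urllib.parse import urljoin


def absolutize(rows, base_url, columns):
    """Resolve the given 1-based columns of every row against base_url."""
    out = []
    for row in rows:
        row = list(row)
        for col in columns:
            row[col - 1] = urljoin(base_url, row[col - 1])
        out.append(row)
    return out


def render(rows, names, fmt="csv", sep="\t"):
    """Render rows as separator-delimited text or as JSON (assumed shape)."""
    if fmt == "json":
        return json.dumps([dict(zip(names, row)) for row in rows])
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter=sep, lineterminator="\n")
    writer.writerows(rows)
    return buf.getvalue()


# Hypothetical scraped rows: link text in column 1, href in column 2.
rows = [["About", "/about"], ["Docs", "docs/index.html"]]
fixed = absolutize(rows, "http://example.org/site/", columns=[2])
csv_text = render(fixed, ["title", "link"])
json_text = render(fixed, ["title", "link"], fmt="json")
print(csv_text)
print(json_text)
```

Relative links are resolved against the URL of the fetched page, so "/about" and "docs/index.html" end up under different paths of the same host.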
REPOSITORY
The public repository of this module is http://github.com/Corion/App-scrape.
SUPPORT
The public support forum of this program is http://perlmonks.org/.
AUTHOR
Max Maischein
corion@cpan.org
COPYRIGHT (c)
Copyright 2011-2018 by Max Maischein corion@cpan.org.
LICENSE
This module is released under the same terms as Perl itself.