MoonPoint Support


Thu, Mar 03, 2016 10:02 pm

Downloading a web page with Python using command line parameters

If you want a Python script to download a web page, you can import the urllib2 module, as explained in Downloading a web page with Python. I've modified the script posted there so that the web page's URL and the output file name can be given as command-line arguments:

#!/usr/bin/python

# download_page
# download a webpage to a specified file. The script takes two parameters:
# the URL of the page to download and a file name to be used to hold
# the downloaded web page.

import urllib2, sys

if len(sys.argv) < 2:
    print "Error - URL missing! Usage: ./download_page.py download_page_url outfile"
    sys.exit(1)
url = sys.argv[1]

if len(sys.argv) < 3:
    print "Error - missing output file name! Usage: ./download_page.py download_page_url outfile"
    sys.exit(1)
outfile = sys.argv[2]

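# Fetch the page; read() returns its contents as a byte string.
page = urllib2.urlopen(url)
source = page.read()

# Write in binary mode so the bytes are saved unchanged on every platform,
# and let the with statement close the file even if write() fails.
with open(outfile, 'wb') as downloadFile:
    downloadFile.write(source)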
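
The urllib2 module exists only in Python 2; in Python 3 its functionality was split into urllib.request and urllib.error, so the script above won't run there unchanged. What follows is a minimal Python 3 sketch of the same idea, not the original script: the function names download_page and main are my own, and main takes the argument list as a parameter instead of reading sys.argv directly so it can be tried out without a command line.

```python
#!/usr/bin/env python3
"""Python 3 sketch of download_page.py: save a web page to a file."""

import urllib.request

USAGE = "Usage: ./download_page.py download_page_url outfile"


def download_page(url, outfile):
    """Fetch url and write the raw response bytes to outfile."""
    with urllib.request.urlopen(url) as page:
        source = page.read()  # bytes, not str, in Python 3
    # Binary mode stores the bytes unchanged, whatever the page's encoding.
    with open(outfile, "wb") as download_file:
        download_file.write(source)


def main(argv):
    """Check an argv-style list, download the page, return an exit status."""
    # argv[0] is the script name; argv[1] is the URL, argv[2] the output file.
    if len(argv) < 2:
        print("Error - URL missing! " + USAGE)
        return 1
    if len(argv) < 3:
        print("Error - missing output file name! " + USAGE)
        return 1
    download_page(argv[1], argv[2])
    return 0
```

To use it as a script, save it as download_page.py, add import sys at the top, and append an if __name__ == "__main__": guard that calls sys.exit(main(sys.argv)); then run it with, e.g., python3 download_page.py URL outfile.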

The sys module is imported so the script can read its command-line arguments from sys.argv[x], where x is the argument's index. sys.argv[0] is always the name of the script itself, in this case download_page.py, so sys.argv[1] should be the URL of the web page to be saved and sys.argv[2] the name of the output file. The file name can include a directory path, e.g., mydir/somepage.html. The script doesn't check whether that directory exists; if it doesn't, the script exits with a Python "No such file or directory" error. If no directory path is given, the downloaded web page is saved in the directory from which the script is run.
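
Because the script leaves the directory check to open(), a missing directory only shows up as a traceback. If you'd rather fail with a clearer message, a check like the following could run before the download. This is a minimal Python 3 sketch, not part of the original script; output_dir_exists and check_args are hypothetical helper names, and raising ValueError instead of printing and calling sys.exit is my own choice so the checks can be reused elsewhere.

```python
import os


def output_dir_exists(outfile):
    """Return True if the directory part of outfile exists.

    A bare file name such as "somepage.html" has an empty directory
    part, meaning the current working directory, which always exists.
    """
    directory = os.path.dirname(outfile)
    return directory == "" or os.path.isdir(directory)


def check_args(argv):
    """Return (url, outfile) from an argv-style list or raise ValueError."""
    # argv[0] is the script name; the user's arguments start at argv[1].
    if len(argv) < 3:
        raise ValueError("Usage: ./download_page.py download_page_url outfile")
    url, outfile = argv[1], argv[2]
    if not output_dir_exists(outfile):
        raise ValueError("No such directory: " + os.path.dirname(outfile))
    return url, outfile
```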

The script prints an error message and exits if the URL or the output file name is missing from the command line. You can run it with python ./download_page.py, or simply with ./download_page.py if you have first marked the file as executable, e.g., with chmod 755 download_page.py.
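
Both points can also be checked from Python. The sketch below assumes a POSIX system such as Linux or macOS and Python 3.7 or later. It writes a hypothetical stand-in script that contains only the missing-URL check, applies os.chmod(path, 0o755), the equivalent of chmod 755 (read, write, and execute for the owner; read and execute for group and others), and then runs the stand-in with no arguments through the current interpreter to confirm that it prints the error and exits with status 1.

```python
import os
import stat
import subprocess
import sys
import tempfile

# Hypothetical stand-in that mimics download_page.py's missing-URL check.
SCRIPT = '''import sys
if len(sys.argv) < 2:
    print("Error - URL missing! Usage: ./download_page.py download_page_url outfile")
    sys.exit(1)
'''


def make_executable(path):
    """Python equivalent of `chmod 755 path` (rwxr-xr-x)."""
    os.chmod(path, 0o755)


def run_without_args(path):
    """Run the script with the current interpreter and no arguments."""
    result = subprocess.run([sys.executable, path],
                            capture_output=True, text=True)
    return result.returncode, result.stdout


script_path = os.path.join(tempfile.mkdtemp(), "download_page.py")
with open(script_path, "w") as f:
    f.write(SCRIPT)
make_executable(script_path)
```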