Mon, Oct 26, 2015 8:36 pm

Downloading a web page with Python

To download a web page with a Python script, you can use the following code, replacing the example URL with the URL of the page whose source code you wish to download:

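import urllib2

# Replace this with the URL of the page you want to download.
url = "http://www.example.com/somepage.html"

page = urllib2.urlopen(url)  # open the URL (urllib2 is Python 2 only)
source = page.read()         # read the entire response body
print source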
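
The listing above targets Python 2. In Python 3 the urllib2 module no longer exists: urlopen lives in urllib.request, and read() returns bytes rather than a string. A minimal sketch of the same download under Python 3 might look like the following; the helper name fetch_source and the UTF-8 fallback are assumptions made for illustration, not part of the original script.

```python
from urllib.request import urlopen


def fetch_source(url):
    """Return the body of the page at url as text (hypothetical helper)."""
    with urlopen(url) as page:
        raw = page.read()  # bytes in Python 3
        # Use the charset declared in the response headers, if any;
        # falling back to UTF-8 is an assumption, not a guarantee.
        charset = page.headers.get_content_charset() or "utf-8"
    # Undecodable bytes are replaced rather than raising an error.
    return raw.decode(charset, errors="replace")
```

Calling print(fetch_source("http://www.example.com/somepage.html")) would then print the page source, much as the Python 2 version does.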

To have the script prompt for the URL and for the name of a file in which to store the web page's source code, you can use the following:

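import urllib2

# Prompt for the page to fetch and the file to save it in.
url = raw_input("URL: ")
outfile = raw_input("Output file: ")

page = urllib2.urlopen(url)
source = page.read()

# Write the downloaded source to the output file.
f = open(outfile, 'w')
f.write(source)
f.close()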
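
Under Python 3 the same script could be sketched as below. raw_input becomes input, and because read() returns bytes the file is opened in binary mode ('wb') so the page is stored exactly as received. The function names save_page and main are hypothetical. The with block closes the file even if the write fails.

```python
from urllib.request import urlopen


def save_page(url, outfile):
    """Download url, store its raw bytes in outfile, return the byte count."""
    with urlopen(url) as page:
        source = page.read()
    # 'wb' writes the bytes unchanged; the with block closes the file.
    with open(outfile, "wb") as f:
        f.write(source)
    return len(source)


def main():
    # Prompt for the two values, as the Python 2 script does with raw_input.
    url = input("URL: ")
    outfile = input("Output file: ")
    print("Wrote %d bytes to %s" % (save_page(url, outfile), outfile))
```

Adding a call to main() as the last line of the file makes it behave like the prompting script above.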

The "w" in the open(outfile, 'w') call indicates that the file should be opened for writing. Other possible modes for the file are listed below:

Mode  Description
r     Opens a file for reading only. The file pointer is placed at the beginning of the file. This is the default mode.
rb    Opens a file for reading only in binary format. The file pointer is placed at the beginning of the file.
r+    Opens a file for both reading and writing. The file pointer is placed at the beginning of the file.
rb+   Opens a file for both reading and writing in binary format. The file pointer is placed at the beginning of the file.
w     Opens a file for writing only. Overwrites the file if it exists. If the file does not exist, creates a new file for writing.
wb    Opens a file for writing only in binary format. Overwrites the file if it exists. If the file does not exist, creates a new file for writing.
w+    Opens a file for both writing and reading. Overwrites the file if it exists. If the file does not exist, creates a new file for reading and writing.
wb+   Opens a file for both writing and reading in binary format. Overwrites the file if it exists. If the file does not exist, creates a new file for reading and writing.
a     Opens a file for appending. The file pointer is placed at the end of the file if the file exists. If the file does not exist, creates a new file for writing.
ab    Opens a file for appending in binary format. The file pointer is placed at the end of the file if the file exists. If the file does not exist, creates a new file for writing.
a+    Opens a file for both appending and reading. The file pointer is placed at the end of the file if the file exists. If the file does not exist, creates a new file for reading and writing.
ab+   Opens a file for both appending and reading in binary format. The file pointer is placed at the end of the file if the file exists. If the file does not exist, creates a new file for reading and writing.
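
The difference between the write, append, and read modes can be seen in a short experiment. The sketch below is written for Python 3 and uses a scratch file in a temporary directory (the name demo.txt is arbitrary) so that no existing file is touched.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.txt")

# 'w' creates the file, or truncates it if it already exists.
with open(path, "w") as f:
    f.write("first\n")

# 'a' keeps the existing contents and writes at the end of the file.
with open(path, "a") as f:
    f.write("second\n")

# 'r' (the default mode) reads from the beginning of the file.
with open(path) as f:
    after_append = f.read()

# Opening with 'w' again discards everything written so far.
with open(path, "w") as f:
    f.write("third\n")

with open(path, "r") as f:
    after_overwrite = f.read()

print(repr(after_append))     # 'first\nsecond\n'
print(repr(after_overwrite))  # 'third\n'
```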

If you named the script download_webpage.py, you could run it from a command-line interface (a shell prompt) as follows:

$ python download_webpage.py
URL: http://www.example.com/somepage.html
Output file: example-somepage.html

References:

  1. Python Files I/O, tutorialspoint
