Beautiful Soup

A Python HTML/XML parser designed for quick-turnaround projects such as screen scraping.

Install

pip install beautifulsoup

Note: on earlier versions of Python, it may be better to install version 3.0.8:

pip install beautifulsoup==3.0.8

For more information, see "Having problems with Beautiful Soup 3.1.0?".

Sample

A short example (for more details, see the Documentation):

html = '...

from BeautifulSoup import BeautifulSoup
soup = BeautifulSoup(html)

soup.find('label')
# returns: <label for="id_name">Place name</label>

soup.findAll('label')
# returns: [<label for="id_name">Place name</label>, <label>Place name</label>]

soup.findAll(id='id_name')
# returns: [<input name="name" value="East Anstey" class="textInput" maxlength="45" type="text" id="id_name" />]
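The sample above leaves the markup elided, so here is a minimal self-contained sketch of the same three lookups. The form markup is hypothetical, modelled on the output shown above. The fallback import of the newer bs4 package and the make_soup helper are assumptions added so the sketch also runs where Beautiful Soup 3 is not installed.

```python
try:
    # Beautiful Soup 3, as used in this document (Python 2 only)
    from BeautifulSoup import BeautifulSoup

    def make_soup(markup):
        return BeautifulSoup(markup)
except ImportError:
    # Assumption: fall back to the newer bs4 package when version 3 is missing
    from bs4 import BeautifulSoup

    def make_soup(markup):
        return BeautifulSoup(markup, "html.parser")

# Hypothetical markup, not taken from the original page
html = (
    '<form>'
    '<label for="id_name">Place name</label>'
    '<input name="name" value="East Anstey" class="textInput" '
    'maxlength="45" type="text" id="id_name" />'
    '<label>Place name</label>'
    '</form>'
)

soup = make_soup(html)

first_label = soup.find('label')      # first matching tag, or None if absent
all_labels = soup.findAll('label')    # list of every matching tag
by_id = soup.findAll(id='id_name')    # keyword arguments filter on attributes

print(first_label['for'])
print(len(all_labels))
print(by_id[0]['value'])
```

Since find returns None when nothing matches, it is worth checking the result before indexing into it.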

Attributes

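soup = BeautifulSoup(html)
# get the first top-level node (assumes the markup starts with a tag, not text)
element = soup.contents[0]
# attrs is a list of (name, value) pairs; copy it into a dict
attributes = dict(element.attrs)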
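The dict() call is needed because, in Beautiful Soup 3, attrs is a list of (name, value) pairs rather than a mapping. The sketch below shows only that conversion, using a hand-written list as a stand-in for element.attrs. The values are hypothetical and no parser is involved.

```python
# Stand-in for element.attrs as Beautiful Soup 3 would return it:
# a list of (name, value) tuples in document order (values are hypothetical).
attrs = [
    ('name', 'name'),
    ('value', 'East Anstey'),
    ('id', 'id_name'),
]

# dict() accepts any iterable of key/value pairs
attr_dict = dict(attrs)

print(attr_dict['value'])                     # East Anstey
# .get() returns a default instead of raising KeyError for a missing attribute
print(attr_dict.get('maxlength', 'not set'))  # not set
```

For a single attribute, indexing the tag directly (element['value']) avoids building the dict at all.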

Text

# find every text node whose content is exactly 'ABC'
soup.findAll(text='ABC')
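A plain string passed as text matches only text nodes equal to that string, while a compiled regular expression matches any node it finds a hit in. The following is a minimal sketch of the difference. The markup is hypothetical, and the fallback to the newer bs4 package and the make_soup helper are assumptions so that it runs where Beautiful Soup 3 is unavailable.

```python
import re

try:
    # Beautiful Soup 3, as used in this document (Python 2 only)
    from BeautifulSoup import BeautifulSoup

    def make_soup(markup):
        return BeautifulSoup(markup)
except ImportError:
    # Assumption: fall back to the newer bs4 package when version 3 is missing
    from bs4 import BeautifulSoup

    def make_soup(markup):
        return BeautifulSoup(markup, "html.parser")

# Hypothetical markup, not taken from the original page
html = '<p>ABC</p><p>ABC and more</p><p>xyz</p>'
soup = make_soup(html)

# Exact match: only the text node that equals 'ABC'
exact = soup.findAll(text='ABC')

# Pattern match: every text node in which the regex finds 'ABC'
partial = soup.findAll(text=re.compile('ABC'))

print(len(exact))
print(len(partial))
```

The results are text nodes, not tags; the enclosing tag of each one is available through its parent attribute.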