Beautiful Soup
**************

.. highlight:: python

A Python HTML/XML parser designed for quick-turnaround projects such as
screen scraping.

Links
=====

- Documentation_
- http://www.crummy.com/software/BeautifulSoup/
- `html5lib is the new, better BeautifulSoup`_:
  http://code.google.com/p/html5lib/

Install
=======

::

  pip install beautifulsoup


Note: on earlier versions of Python, it may be best to install version 3.0.8:

::

  pip install beautifulsoup==3.0.8

For more information, see `Having problems with Beautiful Soup 3.1.0?`_

Sample
======
For more details, see Documentation_:
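
::

  from BeautifulSoup import BeautifulSoup

  html = '...'
  soup = BeautifulSoup(html)

  # first <label> tag, or None if nothing matches
  soup.find('label')

  # list of every <label> tag
  soup.findAll('label')

  # list of tags whose id attribute is 'id_name'
  soup.findAll(id='id_name')
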
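
Because the sample above elides its markup, a self-contained variant may help.
The sketch below assumes Beautiful Soup 4 (the ``bs4`` package, where
``findAll`` is spelled ``find_all``) rather than the 3.x release described
here. The HTML snippet and its field names are invented for illustration.

```python
# Assumption: Beautiful Soup 4 is installed (pip install beautifulsoup4).
from bs4 import BeautifulSoup

# Hypothetical markup, made up for this example.
html = """
<form>
  <label for="id_name">Name</label>
  <input id="id_name" type="text" name="name">
  <label for="id_email">Email</label>
  <input id="id_email" type="text" name="email">
</form>
"""

# "html.parser" selects the parser from the Python standard library.
soup = BeautifulSoup(html, "html.parser")

first_label = soup.find("label")      # first match, or None
all_labels = soup.find_all("label")   # every match, in document order
by_id = soup.find_all(id="id_name")   # filter on an attribute value

print(first_label.get_text())
print([label["for"] for label in all_labels])
print(by_id[0].name)
```

``find`` returns a single tag (or ``None``), while ``find_all`` always returns
a list-like result, so the two need different handling when nothing matches.
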
Attributes
----------

::

  soup = BeautifulSoup(html)

  # get the first element
  element = soup.contents[0]

  # copy the element's attributes into a dict
  # (in Beautiful Soup 3, attrs is a list of (name, value) pairs)
  dict(element.attrs)

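
As a runnable illustration of reading attributes, here is a minimal sketch
that again assumes Beautiful Soup 4 (``bs4``), where ``attrs`` is already a
dict. The anchor tag is a made-up example.

```python
# Assumption: Beautiful Soup 4 is installed (pip install beautifulsoup4).
from bs4 import BeautifulSoup

# Hypothetical markup; it starts directly with a tag, so contents[0] is that tag.
html = '<a href="/about" title="About us">About</a>'
soup = BeautifulSoup(html, "html.parser")

element = soup.contents[0]

# dict() makes an independent copy, so edits do not touch the parse tree.
attrs = dict(element.attrs)

# .get() returns None for a missing attribute instead of raising KeyError.
missing = element.get("target")

print(attrs)
print(missing)
```

If the markup began with whitespace or text, ``soup.contents[0]`` would be a
string node rather than a tag, so ``soup.find(...)`` may be the safer way to
reach a specific element.
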
Text
----

::

  # find every text node that is exactly 'ABC'
  soup.findAll(text='ABC')

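
The difference between exact and partial text matching is easy to miss, so a
short sketch follows. It assumes Beautiful Soup 4 (``bs4``), where the
``text`` argument is spelled ``string``. The list markup is hypothetical.

```python
# Assumption: Beautiful Soup 4 is installed (pip install beautifulsoup4).
import re

from bs4 import BeautifulSoup

# Hypothetical markup, made up for this example.
html = "<ul><li>ABC</li><li>ABC and more</li><li>xyz</li></ul>"
soup = BeautifulSoup(html, "html.parser")

# A plain string only matches text nodes equal to it.
exact = soup.find_all(string="ABC")

# A compiled regular expression matches any text node it can be found in.
partial = soup.find_all(string=re.compile("ABC"))

# Each result is a text node; .parent leads back to the enclosing tag.
parents = [node.parent.name for node in exact]

print([str(node) for node in exact])
print(len(partial))
print(parents)
```
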
.. _`Beautiful Soup`: http://www.crummy.com/software/BeautifulSoup/
.. _`Having problems with Beautiful Soup 3.1.0?`: http://www.crummy.com/software/BeautifulSoup/3.1-problems.html
.. _`html5lib is the new, better BeautifulSoup`: http://twitter.com/#!/raymondh/status/1746646673129472
.. _Documentation: http://www.crummy.com/software/BeautifulSoup/documentation.html