From 858dc98f8da00d8cd86ddbe516ec17b399c1465a Mon Sep 17 00:00:00 2001
From: Dimitris Zlatanidis
Date: Thu, 23 Jan 2014 23:06:50 +0700
Subject: libraries/BeautifulSoup4: Added (Python HTML/XML parser).

Signed-off-by: Willy Sudiarto Raharjo
---
 libraries/BeautifulSoup4/README | 27 +++++++++++++++++++++++++++
 1 file changed, 27 insertions(+)
 create mode 100644 libraries/BeautifulSoup4/README

(limited to 'libraries/BeautifulSoup4/README')

diff --git a/libraries/BeautifulSoup4/README b/libraries/BeautifulSoup4/README
new file mode 100644
index 000000000000..41925ef89ed5
--- /dev/null
+++ b/libraries/BeautifulSoup4/README
@@ -0,0 +1,27 @@
+Beautiful Soup is a Python HTML/XML parser designed for quick turnaround
+projects like screen-scraping. Three features make it powerful:
+
+1. Beautiful Soup won't choke if you give it bad markup. It yields a
+parse tree that makes approximately as much sense as your original
+document. This is usually good enough to collect the data you need
+and run away.
+
+2. Beautiful Soup provides a few simple methods and Pythonic idioms for
+navigating, searching, and modifying a parse tree: a toolkit for
+dissecting a document and extracting what you need. You don't have to
+create a custom parser for each application.
+
+3. Beautiful Soup automatically converts incoming documents to Unicode and
+outgoing documents to UTF-8. You don't have to think about encodings,
+unless the document doesn't specify an encoding and Beautiful Soup
+can't autodetect one. Then you just have to specify the original
+encoding.
+
+Beautiful Soup parses anything you give it, and does the tree traversal
+stuff for you. You can tell it "Find all the links", or "Find all the links
+of class externalLink", or "Find all the links whose urls match "foo.com",
+or "Find the table heading that's got bold text, then give me that text."
+
+Valuable data that was once locked up in poorly-designed websites is now
+within your reach. Projects that would have taken hours take only minutes
+with Beautiful Soup.
-- 
cgit v1.2.3
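
Note (not part of the patch): the four queries the README quotes, plus the
encoding override from feature 3, might look roughly like the sketch below.
It assumes the package installed by this SlackBuild is importable as bs4 and
uses Python's built-in "html.parser" backend. The sample markup, the
variable names, and the choice of ISO-8859-1 are made up for illustration.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical sample markup; the <p> tag is deliberately left unclosed
# to show that sloppy input still yields a usable parse tree.
html = """
<p>Links:
<a href="http://foo.com/a" class="externalLink">Foo</a>
<a href="/local">Local</a>
<a href="http://bar.org/b" class="externalLink">Bar</a>
<table><tr><th><b>Price</b></th><th>Plain</th></tr></table>
"""

soup = BeautifulSoup(html, "html.parser")

# "Find all the links"
all_links = [a["href"] for a in soup.find_all("a")]

# "Find all the links of class externalLink"
external = [a["href"] for a in soup.find_all("a", class_="externalLink")]

# "Find all the links whose urls match foo.com"
foo_links = [a["href"] for a in soup.find_all("a", href=re.compile(r"foo\.com"))]

# "Find the table heading that's got bold text, then give me that text"
bold_heading = next(th.b.get_text() for th in soup.find_all("th") if th.b)

# Feature 3: when the bytes carry no encoding declaration, the original
# encoding can be supplied by hand (ISO-8859-1 is assumed here).
raw = b"<p>caf\xe9</p>"
decoded = BeautifulSoup(raw, "html.parser", from_encoding="iso-8859-1")
cafe_text = decoded.p.get_text()

print(all_links, external, foo_links, bold_heading, cafe_text)
```

Passing a compiled regular expression as the href filter keeps the "match"
query in a single find_all call. A plain string would only match an href
exactly equal to it.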