Diffstat (limited to 'libraries/html5lib/README')
-rw-r--r-- | libraries/html5lib/README | 15 |
1 files changed, 4 insertions, 11 deletions
diff --git a/libraries/html5lib/README b/libraries/html5lib/README
index a38654faac06..37294a67bc05 100644
--- a/libraries/html5lib/README
+++ b/libraries/html5lib/README
@@ -1,12 +1,5 @@
-html5lib (HTML parser based on the HTML5 specification)
+html5lib is a pure-python library for parsing HTML. It is designed to
+conform to the WHATWG HTML specification, as is implemented by all
+major web browsers.
 
-HTML parser designed to follow the HTML5 specification. The parser is
-designed to handle all flavours of HTML and parses invalid documents
-using well-defined error handling rules compatible with the behaviour of
-major desktop web browsers.
-
-Output is to a tree structure; the current release supports output
-to DOM, ElementTree and lxml tree formats as well as a simple
-custom format.
-
-Optional: datrie, lxml, and genshi
+Optional dependencies: chardet, genshi, and lxml