Reference
-
readability.browser.open_in_browser(html)
Opens the HTML document in a web browser by first saving it to a temporary file. Note that the file is not deleted after use. This is mainly meant for debugging.
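The behavior described above can be approximated with the standard library alone. This is only a rough sketch of the idea, not the library's implementation; the helper names save_html_to_temp_file and preview_in_browser are hypothetical.

```python
import os
import pathlib
import tempfile
import webbrowser


def save_html_to_temp_file(html):
    """Write html to a new temporary .html file and return its path.

    Hypothetical helper: the file is deliberately left on disk,
    mirroring the "does not delete the file after use" note above.
    """
    fd, path = tempfile.mkstemp(suffix=".html")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        handle.write(html)
    return path


def preview_in_browser(html):
    """Save html to a temporary file and ask the default browser to open it."""
    path = save_html_to_temp_file(html)
    # mkstemp returns an absolute path, so it can be turned into a file:// URI.
    webbrowser.open(pathlib.Path(path).as_uri())
    return path
```

Because nothing cleans up the temporary file, repeated calls during a debugging session leave one file per call in the system's temporary directory.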
-
readability.encoding.fix_charset(encoding)
Overrides the encoding when the declared or detected charset is a subset of a larger charset. Created because of issues with Chinese websites.
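To illustrate the idea of replacing a charset with a larger one that contains it, here is a minimal sketch using only Python's built-in codecs. The mapping table and the name widen_charset are assumptions made for illustration; the library's actual table may differ.

```python
# Hypothetical mapping for illustration only.
SUPERSET_CHARSETS = {
    "gb2312": "gb18030",  # GB 18030 is a superset of GB 2312
    "big5": "big5hkscs",  # Big5-HKSCS extends Big5
}


def widen_charset(encoding):
    """Return a superset charset for encoding if one is known, else the input lowercased."""
    encoding = encoding.lower()
    return SUPERSET_CHARSETS.get(encoding, encoding)


# Bytes produced with the smaller charset decode correctly with the larger one.
raw = "中文".encode("gb2312")
text = raw.decode(widen_charset("GB2312"))
```

Decoding with the superset is safe for pages that really are in the smaller charset, and it also copes with pages that declare the smaller charset but contain characters only the larger one defines.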
-
class readability.readability.Document(input, positive_keywords=None, negative_keywords=None, url=None, min_text_length=25, retry_length=250, xpath=False, handle_failures='discard')
Bases: object
Class to build an etree document out of HTML.
-
author()
Returns the document author.
-
content()
Returns the document body.
-
get_clean_html()
An internal method that can be overridden in subclasses, for example to disable or improve the DOM-to-text conversion in the .summary() method.
-
short_title()
Returns the cleaned-up document title.
-
summary(html_partial=False)
Given an HTML file, extracts the text of the article.
Parameters: html_partial – return only the div of the document, without wrapping it in html and body tags.
Warning: This method mutates the internal DOM representation of the HTML document, so it is better to call other API methods before this one.
-
title()
Returns the document title.
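The following sketch shows one way the Document methods listed above might be combined, following the warning on summary() by reading the titles first. It assumes the third-party readability-lxml package is installed and that Document can be imported from the top-level readability package; SAMPLE_HTML and extract_article are hypothetical names used only for illustration.

```python
SAMPLE_HTML = """<html><head><title>Example article</title></head>
<body><div id="content"><p>This is a reasonably long paragraph of article text,
written with several clauses, some commas, and enough words that an extraction
heuristic has something to score when it looks for the main content.</p></div>
</body></html>"""


def extract_article(html):
    """Return the title, short title, and extracted article HTML for html."""
    # Imported lazily so this module still loads when the optional
    # third-party dependency (readability-lxml) is not installed.
    from readability import Document

    doc = Document(html)
    # Read the titles before calling summary(), since summary() mutates
    # the internal DOM representation of the document.
    title = doc.title()
    short_title = doc.short_title()
    # html_partial=True asks for only the article div, without html/body wrappers.
    article_html = doc.summary(html_partial=True)
    return {"title": title, "short_title": short_title, "article_html": article_html}
```

Calling extract_article(SAMPLE_HTML) returns a dictionary of strings; the exact markup in article_html depends on the library version and its cleaning rules.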
-
exception readability.readability.Unparseable
Bases: ValueError
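Because Unparseable derives from ValueError, a handler written for ValueError also catches it. The sketch below demonstrates this with a stand-in class of the same name rather than the library's own; when the library actually raises the exception is not stated here, and parse_or_none and always_fails are hypothetical helpers.

```python
class Unparseable(ValueError):
    """Stand-in with the same base class as readability.readability.Unparseable."""


def parse_or_none(parse, html):
    """Call parse(html); return None if it raises ValueError or any subclass."""
    try:
        return parse(html)
    except ValueError:
        # Unparseable is a ValueError subclass, so it is caught here as well.
        return None


def always_fails(html):
    """Hypothetical parser that rejects every input."""
    raise Unparseable("could not parse document")


result = parse_or_none(always_fails, "<html></html>")
```

Catching the more specific Unparseable instead of ValueError keeps unrelated value errors from being silently swallowed.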