Html2Text
in package
Converts HTML into plain text format suitable for email display
Features:
- Maintains links with href copied over
- Information in the <head> element is lost
- Handles various HTML elements appropriately for text conversion
Table of Contents
- convert() : string
- Converts HTML into plain text format
- default_options() : array<string, bool|string>
- Default options for HTML to text conversion
- fix_newlines() : string
- Unify newlines
- is_office_document() : bool
- Can we guess that this HTML is generated by Microsoft Office?
- is_whitespace() : bool
- Check if text is whitespace
- nbsp_codes() : array<string|int, string>
- Get non-breaking space character codes
- process_whitespace_newlines() : string
- Remove leading or trailing spaces and excess empty lines from provided multiline text
- zwnj_codes() : array<string|int, string>
- Get zero-width non-joiner character codes
- get_document() : DOMDocument
- Parse HTML into a DOMDocument
- iterate_over_node() : string
- Iterate over a DOM node and convert to text
- next_child_name() : string|null
- Get the next child name
- render_text() : string
- Replace any special characters with simple text versions
Methods
convert()
Converts HTML into plain text format
public
static convert(string $html[, bool|array<string, bool|string> $options = array() ]) : string
Parameters
- $html : string
  The input HTML.
- $options : bool|array<string, bool|string> = array()
  Conversion options.
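Because $options accepts either a bool or an array, a caller-side normalisation step is implied. The sketch below is a minimal illustration, not the library's code: the function names h2t_default_options() and h2t_normalise_options() are hypothetical, and so are the option keys 'ignore_errors' and 'drop_links' and the assumption that a bare bool means "ignore parse errors".

```php
<?php
// Hypothetical option keys: the real default_options() may define different ones.
function h2t_default_options(): array
{
    return [
        'ignore_errors' => false,
        'drop_links'    => false,
    ];
}

// Normalise the bool|array $options argument into a full options array.
// Assumption: a bare bool is treated as the 'ignore_errors' flag.
function h2t_normalise_options($options): array
{
    if (is_bool($options)) {
        $options = ['ignore_errors' => $options];
    }
    $defaults = h2t_default_options();
    // Reject keys that have no default, so typos fail loudly.
    foreach (array_keys($options) as $key) {
        if (!array_key_exists($key, $defaults)) {
            throw new InvalidArgumentException("Unknown option '$key'");
        }
    }
    // Caller-supplied values override the defaults; key order follows the defaults.
    return array_merge($defaults, $options);
}
```

Merging over the defaults means callers only pass the keys they want to change.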
Return values
string — The HTML converted to text.
default_options()
Default options for HTML to text conversion
public
static default_options() : array<string, bool|string>
Return values
array<string, bool|string> — Default options array.
fix_newlines()
Unify newlines
public
static fix_newlines(string $text) : string
Converts \r\n to \n, and \r to \n, so that Unix (\n), Windows (\r\n) and classic Mac OS (\r) newlines all become \n.
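A minimal sketch of this normalisation, written as a hypothetical free function h2t_fix_newlines() rather than the class's own code. The one subtlety is ordering: \r\n must be replaced before lone \r, otherwise each Windows newline would turn into two \n characters.

```php
<?php
// Hypothetical stand-in for fix_newlines(); not the library's source.
function h2t_fix_newlines(string $text): string
{
    // Replace "\r\n" first so a Windows newline becomes one "\n", not two.
    $text = str_replace("\r\n", "\n", $text);
    // Any remaining "\r" is an old Mac-style newline.
    return str_replace("\r", "\n", $text);
}
```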
Parameters
- $text : string
  Text with any number of \r, \r\n and \n combinations.
Return values
string — The fixed text.
is_office_document()
Can we guess that this HTML is generated by Microsoft Office?
public
static is_office_document(string $html) : bool
Parameters
- $html : string
  The HTML content.
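The method only guesses, so a cheap string check is one plausible heuristic. The sketch below assumes that the Office XML namespace URI is the signal; HTML saved by Word typically declares xmlns:o="urn:schemas-microsoft-com:office:office". The function name h2t_is_office_document() is hypothetical and the real method may test other markers.

```php
<?php
// Hypothetical heuristic: Word/Outlook HTML typically declares the Office
// XML namespace, e.g. xmlns:o="urn:schemas-microsoft-com:office:office".
function h2t_is_office_document(string $html): bool
{
    return strpos($html, 'urn:schemas-microsoft-com:office') !== false;
}
```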
Return values
bool — True if this appears to be an Office document.
is_whitespace()
Check if text is whitespace
public
static is_whitespace(string $text) : bool
Parameters
- $text : string
  The text to check.
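A minimal sketch of a whitespace test, assuming (as the presence of nbsp_codes() suggests, but not confirmed here) that non-breaking spaces should count as whitespace. h2t_is_whitespace() is a hypothetical name; only U+00A0 is treated as a non-breaking space in this sketch.

```php
<?php
// Hypothetical stand-in for is_whitespace(). Assumption: a non-breaking space
// (U+00A0) counts as whitespace, like ordinary spaces, tabs and newlines.
function h2t_is_whitespace(string $text): bool
{
    // Map non-breaking spaces to ordinary spaces first...
    $text = str_replace("\u{00A0}", ' ', $text);
    // ...then strip ASCII whitespace; nothing left means whitespace only.
    return trim($text, " \t\n\r\0\x0B") === '';
}
```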
Return values
bool — True if the text is whitespace.
nbsp_codes()
Get non-breaking space character codes
public
static nbsp_codes() : array<string|int, string>
Return values
array<string|int, string> — Array of nbsp codes.
process_whitespace_newlines()
Remove leading or trailing spaces and excess empty lines from provided multiline text
public
static process_whitespace_newlines(string $text) : string
Parameters
- $text : string
  Multiline text with any number of leading or trailing spaces or excess lines.
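The cleanup described above can be sketched with a few regular expressions. This is a simplified, hypothetical h2t_process_whitespace_newlines(), not the library's implementation; in particular, "excess empty lines" is assumed to mean more than one consecutive blank line.

```php
<?php
// Hypothetical simplified stand-in for process_whitespace_newlines().
function h2t_process_whitespace_newlines(string $text): string
{
    // Unify newlines first so the patterns below only need to handle "\n".
    $text = str_replace(["\r\n", "\r"], "\n", $text);
    // Strip leading and trailing spaces/tabs on every line (/m makes ^ and $ per-line).
    $text = preg_replace('/^[ \t]+|[ \t]+$/m', '', $text);
    // Collapse two or more consecutive blank lines into a single blank line.
    $text = preg_replace("/\n{3,}/", "\n\n", $text);
    // Finally trim the text as a whole.
    return trim($text);
}
```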
Return values
string — The fixed text.
zwnj_codes()
Get zero-width non-joiner character codes
public
static zwnj_codes() : array<string|int, string>
Return values
array<string|int, string> — Array of zwnj codes.
get_document()
Parse HTML into a DOMDocument
private
static get_document(string $html, array<string, bool|string> $options) : DOMDocument
Parameters
- $html : string
  The input HTML.
- $options : array<string, bool|string>
  Parsing options.
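A sketch of how HTML can be parsed into a DOMDocument with PHP's DOM extension. The helper name h2t_get_document() and the 'ignore_errors' key are hypothetical. The '<?xml encoding="UTF-8">' prefix is a common workaround that stops DOMDocument::loadHTML() from reading input as ISO-8859-1; whether the real method uses it (or honours a character-set option instead) is not stated here.

```php
<?php
// Hypothetical get_document()-style helper built on PHP's DOM extension.
// Assumption: 'ignore_errors' decides whether libxml parse warnings are suppressed.
function h2t_get_document(string $html, array $options): DOMDocument
{
    $doc = new DOMDocument();
    $html = trim($html);
    if ($html === '') {
        return $doc; // loadHTML() rejects empty input, so return an empty document.
    }
    // Encoding hint so libxml treats the input as UTF-8 rather than ISO-8859-1.
    $html = '<?xml encoding="UTF-8">' . $html;

    $ignore = !empty($options['ignore_errors']);
    // Collect libxml errors internally when ignoring them; remember the old setting.
    $previous = libxml_use_internal_errors($ignore);
    $loaded = $doc->loadHTML($html);
    if ($ignore) {
        libxml_clear_errors();
    }
    libxml_use_internal_errors($previous);

    if (!$loaded) {
        throw new RuntimeException('Could not load HTML');
    }
    return $doc;
}
```

Restoring the previous libxml error mode keeps the helper from changing global state for the rest of the application.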
Return values
DOMDocument — The parsed document tree.
iterate_over_node()
Iterate over a DOM node and convert to text
private
static iterate_over_node(DOMNode $node, string|null $prev_name, bool $in_pre, bool $is_office_document, array<string, bool|string> $options) : string
Parameters
- $node : DOMNode
  The DOM node.
- $prev_name : string|null
  Previous node name.
- $in_pre : bool
  Whether we're in a pre block.
- $is_office_document : bool
  Whether this is an Office document.
- $options : array<string, bool|string>
  Conversion options.
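The core idea of a recursive DOM walk can be shown in a much-reduced form. The hypothetical h2t_iterate() below handles only text, <br>, <p>, <a> and <head>, and ignores $prev_name, $in_pre, $is_office_document and $options entirely. The "text [href]" link format is an assumption chosen for illustration, not necessarily the format the real method produces.

```php
<?php
// Hypothetical, heavily simplified stand-in for iterate_over_node().
function h2t_iterate(DOMNode $node): string
{
    if ($node instanceof DOMText) {
        // Collapse whitespace runs, as a browser does outside <pre>.
        return preg_replace('/\s+/', ' ', $node->data);
    }
    if (!($node instanceof DOMElement) && !($node instanceof DOMDocument)) {
        return ''; // Skip comments, doctypes and processing instructions.
    }
    $name = strtolower($node->nodeName);
    if ($name === 'head') {
        return ''; // Information in <head> is dropped.
    }
    if ($name === 'br') {
        return "\n";
    }
    $output = '';
    foreach ($node->childNodes as $child) {
        $output .= h2t_iterate($child);
    }
    if ($name === 'p') {
        // Adjacent paragraphs end up separated by one blank line.
        return "\n" . trim($output) . "\n";
    }
    if ($name === 'a' && $node instanceof DOMElement && $node->hasAttribute('href')) {
        // Assumed link format: keep the text and append the href in brackets.
        return trim($output) . ' [' . $node->getAttribute('href') . ']';
    }
    return $output;
}
```

In practice the result of such a walk would still be passed through a whitespace cleanup step like process_whitespace_newlines().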
Return values
string — The converted text.
next_child_name()
Get the next child name
private
static next_child_name(DOMNode|null $node) : string|null
Parameters
- $node : DOMNode|null
  The node to check.
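A sketch of finding the "next child name", under the assumption that it means the lower-cased node name of the next sibling that is either an element or non-whitespace text (so formatting whitespace between tags is skipped). The function name h2t_next_child_name() is hypothetical.

```php
<?php
// Hypothetical stand-in for next_child_name(): lower-cased name of the next
// sibling that is an element or non-whitespace text, or null if there is none.
function h2t_next_child_name(?DOMNode $node): ?string
{
    if ($node === null) {
        return null;
    }
    for ($next = $node->nextSibling; $next !== null; $next = $next->nextSibling) {
        if ($next instanceof DOMElement) {
            return strtolower($next->nodeName);
        }
        // Skip whitespace-only text nodes (formatting between tags).
        if ($next instanceof DOMText && trim($next->data) !== '') {
            return strtolower($next->nodeName); // "#text"
        }
        // Comments and other node types are skipped as well.
    }
    return null;
}
```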
Return values
string|null — The next child name.
render_text()
Replace any special characters with simple text versions
private
static render_text(string $text) : string
This prevents output issues by:
- converting non-breaking spaces to regular spaces; and
- removing zero-width non-joiners (replacing them with an empty string).
This matches our goal of rendering documents as a browser would render them.
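The two replacements map directly onto str_replace(). In this sketch the contents of nbsp_codes() and zwnj_codes() are assumed to be just the UTF-8 encodings of U+00A0 (NO-BREAK SPACE) and U+200C (ZERO WIDTH NON-JOINER); the real arrays may list more variants. h2t_render_text() is a hypothetical name.

```php
<?php
// Hypothetical stand-in for render_text(). Assumption: nbsp_codes() and
// zwnj_codes() contain only U+00A0 and U+200C respectively.
function h2t_render_text(string $text): string
{
    $nbsp = ["\u{00A0}"]; // NO-BREAK SPACE
    $zwnj = ["\u{200C}"]; // ZERO WIDTH NON-JOINER
    // Non-breaking spaces become ordinary spaces.
    $text = str_replace($nbsp, ' ', $text);
    // Zero-width non-joiners are removed, since a browser renders them as nothing.
    return str_replace($zwnj, '', $text);
}
```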
Parameters
- $text : string
  The text to process.