WooCommerce Code Reference

Html2Text
in package

Converts HTML into plain text format suitable for email display

Features:

  • Maintains links with href copied over
  • Information in the <head> is lost
  • Handles various HTML elements appropriately for text conversion

Table of Contents

convert()  : string
Converts HTML into plain text format
default_options()  : array<string, bool|string>
Default options for HTML to text conversion
fix_newlines()  : string
Unify newlines
is_office_document()  : bool
Can we guess that this HTML is generated by Microsoft Office?
is_whitespace()  : bool
Check if text is whitespace
nbsp_codes()  : array<string|int, string>
Get non-breaking space character codes
process_whitespace_newlines()  : string
Remove leading or trailing spaces and excess empty lines from provided multiline text
zwnj_codes()  : array<string|int, string>
Get zero-width non-joiner character codes
get_document()  : DOMDocument
Parse HTML into a DOMDocument
iterate_over_node()  : string
Iterate over a DOM node and convert to text
next_child_name()  : string|null
Get the next child name
render_text()  : string
Replace any special characters with simple text versions

Methods

convert()

Converts HTML into plain text format

public static convert(string $html[, bool|array<string, bool|string> $options = array() ]) : string
Parameters
$html : string

The input HTML.

$options : bool|array<string, bool|string> = array()

Conversion options.

Tags
throws
Html2Text_Exception|InvalidArgumentException

If the HTML could not be loaded or invalid options are provided.

Return values
string

The HTML converted to text.
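Because convert() can throw, callers may want a fallback. The following is a minimal sketch, not WooCommerce code: `example_to_plain_text()` is a hypothetical helper, and the converter is passed in as a callable so the snippet runs on its own. It also assumes that Html2Text_Exception extends PHP's base `\Exception`, which this page does not confirm.

```php
<?php
/**
 * Hypothetical helper: run an HTML-to-text converter and fall back to a
 * crude tag strip if it throws.
 *
 * @param string   $html      The input HTML.
 * @param callable $converter Any callable shaped like convert(): string in, string out.
 */
function example_to_plain_text( string $html, callable $converter ): string {
    try {
        return $converter( $html );
    } catch ( \Exception $e ) {
        // Assumption: Html2Text_Exception extends \Exception.
        // InvalidArgumentException certainly does.
        return trim( strip_tags( $html ) );
    }
}

// Stand-in converter that always fails, to exercise the fallback path.
$failing = function ( string $html ): string {
    throw new \InvalidArgumentException( 'invalid options' );
};

echo example_to_plain_text( '<p>Hello <b>world</b></p>', $failing ), "\n";
```

Inside WooCommerce, the class's own convert() method would be passed as the callable in place of the stand-in.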

fix_newlines()

Unify newlines

public static fix_newlines(string $text) : string

Converts \r\n to \n, and \r to \n. As a result, all newline styles (Unix, Windows, Mac) become \n.

Parameters
$text : string

Text with any number of \r, \r\n and \n combinations.

Return values
string

The fixed text.
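The normalization described above can be sketched in two replacements. `example_fix_newlines()` is a hypothetical stand-in written for illustration; the actual method body may differ.

```php
<?php
/**
 * Hypothetical stand-in for the newline unification described above.
 */
function example_fix_newlines( string $text ): string {
    // Replace Windows pairs first so "\r\n" does not turn into "\n\n".
    $text = str_replace( "\r\n", "\n", $text );
    // Any remaining lone "\r" is a classic Mac newline.
    return str_replace( "\r", "\n", $text );
}

var_dump( example_fix_newlines( "a\r\nb\rc\nd" ) === "a\nb\nc\nd" ); // bool(true)
```

The order matters: handling \r before \r\n would double every Windows newline.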

is_office_document()

Can we guess that this HTML is generated by Microsoft Office?

public static is_office_document(string $html) : bool
Parameters
$html : string

The HTML content.

Return values
bool

True if this appears to be an Office document.
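This page does not say which signal the method checks. One plausible heuristic, shown purely as an assumption, is to look for the XML namespace that Microsoft Office writes into exported HTML. `example_is_office_document()` is a hypothetical name.

```php
<?php
/**
 * Hypothetical heuristic: treat HTML as Office-generated when it declares
 * a Microsoft Office XML namespace. The real check is an assumption here.
 */
function example_is_office_document( string $html ): bool {
    return false !== strpos( $html, 'urn:schemas-microsoft-com:office' );
}

$word_html = '<html xmlns:o="urn:schemas-microsoft-com:office:office"><body><p>Hi</p></body></html>';

var_dump( example_is_office_document( $word_html ) );    // bool(true)
var_dump( example_is_office_document( '<p>Hi</p>' ) );   // bool(false)
```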

process_whitespace_newlines()

Remove leading or trailing spaces and excess empty lines from provided multiline text

public static process_whitespace_newlines(string $text) : string
Parameters
$text : string

Multiline text with any number of leading or trailing spaces or excess lines.

Return values
string

The fixed text.
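A rough sketch of this kind of cleanup, assuming newlines are already unified to \n. `example_process_whitespace_newlines()` is a hypothetical function; the real method may apply further steps or a different order.

```php
<?php
/**
 * Hypothetical cleanup: strip per-line leading and trailing spaces and
 * collapse excess blank lines. Assumes "\n" newlines.
 */
function example_process_whitespace_newlines( string $text ): string {
    // Trim the text as a whole.
    $text = trim( $text );
    // Drop spaces and tabs at the start of each line.
    $text = preg_replace( "/\n[ \t]+/", "\n", $text );
    // Drop spaces and tabs at the end of each line.
    $text = preg_replace( "/[ \t]+\n/", "\n", $text );
    // Collapse runs of three or more newlines into a single blank line.
    return preg_replace( "/\n{3,}/", "\n\n", $text );
}

var_dump( example_process_whitespace_newlines( "  Hello  \n\n\n\n   World \n" ) === "Hello\n\nWorld" ); // bool(true)
```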

get_document()

Parse HTML into a DOMDocument

private static get_document(string $html, array<string, bool|string> $options) : DOMDocument
Parameters
$html : string

The input HTML.

$options : array<string, bool|string>

Parsing options.

Tags
throws
Html2Text_Exception

If the HTML could not be loaded.

Return values
DOMDocument

The parsed document tree.
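A minimal sketch of parsing HTML into a DOMDocument with PHP's DOM extension. `example_get_document()` is hypothetical, ignores `$options`, and throws a plain `RuntimeException` as a stand-in for Html2Text_Exception.

```php
<?php
/**
 * Hypothetical parser: load an HTML string into a DOMDocument,
 * collecting libxml warnings instead of emitting them.
 */
function example_get_document( string $html ): DOMDocument {
    $html = trim( $html );
    if ( '' === $html ) {
        // loadHTML() rejects empty input, so hand back an empty document.
        return new DOMDocument();
    }

    // Real-world HTML is rarely valid; keep parse warnings internal.
    $previous = libxml_use_internal_errors( true );
    $doc      = new DOMDocument();
    $loaded   = $doc->loadHTML( $html );
    libxml_clear_errors();
    libxml_use_internal_errors( $previous );

    if ( ! $loaded ) {
        // Stand-in for Html2Text_Exception.
        throw new RuntimeException( 'Could not load HTML.' );
    }
    return $doc;
}

$doc = example_get_document( '<p>Hi</p>' );
echo $doc->getElementsByTagName( 'p' )->item( 0 )->textContent, "\n";
```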

iterate_over_node()

Iterate over a DOM node and convert to text

private static iterate_over_node(DOMNode $node, string|null $prev_name, bool $in_pre, bool $is_office_document, array<string, bool|string> $options) : string
Parameters
$node : DOMNode

The DOM node.

$prev_name : string|null

Previous node name.

$in_pre : bool

Whether we're in a pre block.

$is_office_document : bool

Whether this is an Office document.

$options : array<string, bool|string>

Conversion options.

Return values
string

The converted text.
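To illustrate the general shape of such a recursive walk, here is a much-reduced sketch. `example_node_to_text()` is hypothetical: it drops the `$prev_name`, `$is_office_document` and `$options` parameters, handles only a few tags, and the `[text](href)` link format is an assumption rather than something this page specifies.

```php
<?php
/**
 * Hypothetical, reduced DOM walk: text nodes, <br>, <a>, <p>/<div>, <pre>.
 */
function example_node_to_text( DOMNode $node, bool $in_pre = false ): string {
    if ( $node instanceof DOMText ) {
        // Outside <pre>, collapse whitespace runs as a browser would.
        return $in_pre ? $node->nodeValue : preg_replace( '/\s+/', ' ', $node->nodeValue );
    }

    $name = strtolower( $node->nodeName );
    if ( 'br' === $name ) {
        return "\n";
    }
    if ( in_array( $name, array( 'head', 'style', 'script' ), true ) ) {
        return ''; // Nothing here is visible in a rendered page.
    }
    if ( ! $node->hasChildNodes() ) {
        return ''; // Comments, doctype, empty elements.
    }

    $text = '';
    foreach ( $node->childNodes as $child ) {
        $text .= example_node_to_text( $child, $in_pre || 'pre' === $name );
    }

    if ( 'a' === $name && $node instanceof DOMElement && $node->hasAttribute( 'href' ) ) {
        // Assumed link format: keep the href next to the link text.
        return '[' . $text . '](' . $node->getAttribute( 'href' ) . ')';
    }
    if ( in_array( $name, array( 'p', 'div' ), true ) ) {
        return $text . "\n\n"; // Block elements end with a blank line.
    }
    return $text;
}

$doc = new DOMDocument();
$doc->loadHTML( '<p>See <a href="https://example.com">the docs</a></p>' );
echo trim( example_node_to_text( $doc ) ), "\n";
```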

render_text()

Replace any special characters with simple text versions

private static render_text(string $text) : string

This prevents output issues:

  • Convert non-breaking spaces to regular spaces; and
  • Convert zero-width non-joiners to '' (nothing).

This is to match our goal of rendering documents as they would be rendered by a browser.

Parameters
$text : string

The text to process.

Return values
string

The processed text.
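The two replacements can be sketched as follows. `example_render_text()` is hypothetical, and the exact set of codes returned by nbsp_codes() and zwnj_codes() is not shown on this page, so the UTF-8 byte sequences and named entities below are assumptions.

```php
<?php
/**
 * Hypothetical stand-in: swap non-breaking spaces for regular spaces and
 * remove zero-width non-joiners.
 */
function example_render_text( string $text ): string {
    // U+00A0 NO-BREAK SPACE as UTF-8 bytes, plus its named entity.
    $text = str_replace( array( "\xC2\xA0", '&nbsp;' ), ' ', $text );
    // U+200C ZERO WIDTH NON-JOINER as UTF-8 bytes, plus its named entity.
    return str_replace( array( "\xE2\x80\x8C", '&zwnj;' ), '', $text );
}

var_dump( example_render_text( "a\xC2\xA0b\xE2\x80\x8Cc" ) === 'a bc' ); // bool(true)
```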