WooCommerce Code Reference

HtmlPruner extends AbstractHtmlProcessor

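This class removes unwanted content from HTML: elements hidden with `display: none`, and classes in `class` attributes that are no longer needed.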

Table of Contents

CONTENT_TYPE_META_TAG  = '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">'
DEFAULT_DOCUMENT_TYPE  = '<!DOCTYPE html>'
HTML_COMMENT_PATTERN  = '/<!--[^-]*+(?:-(?!->)[^-]*+)*+(?:-->|$)/'
Regular expression pattern to match an HTML comment, including delimiters and modifiers.
HTML_TEMPLATE_ELEMENT_PATTERN  = '%<template[\s>][^<]*+(?:<(?!/template>)[^<]*+)*+(?:</template>|$)%i'
Regular expression pattern to match an HTML `<template>` element, including delimiters and modifiers.
PHP_UNRECOGNIZED_VOID_TAGNAME_MATCHER  = '(?:command|embed|keygen|source|track|wbr)'
TAGNAME_ALLOWED_BEFORE_BODY_MATCHER  = '(?:html|head|base|command|link|meta|noscript|script|style|template|title)'
Regular expression part to match tag names that may appear before the start of the `<body>` element. A start tag for any other element would implicitly start the `<body>` element due to tag omission rules.
DISPLAY_NONE_MATCHER  = '//*[@style and contains(translate(translate(@style," ",""),"NOE","noe"),"display:none")' . ' and not(@class and contains(concat(" ", normalize-space(@class), " "), " -emogrifier-keep "))]'
We need to look for display:none, but we need to do a case-insensitive search. Since DOMDocument only supports XPath 1.0, lower-case() isn't available to us. We've thus far only set attributes to lowercase, not attribute values. Consequently, we need to translate() the letters that would be in 'NONE' ("NOE") to lowercase.
$domDocument  : DOMDocument|null
$xPath  : DOMXPath|null
fromDomDocument()  : static
Builds a new instance from the given DOM document.
fromHtml()  : static
Builds a new instance from the given HTML.
getDomDocument()  : DOMDocument
Provides access to the internal DOMDocument representation of the HTML in its current state.
removeElementsWithDisplayNone()  : $this
Removes elements that have a "display: none;" style.
removeRedundantClasses()  : $this
Removes classes that are no longer required (e.g. because there are no longer any CSS rules that reference them) from `class` attributes.
removeRedundantClassesAfterCssInlined()  : $this
After CSS has been inlined, there will likely be some classes in `class` attributes that are no longer referenced by any remaining (uninlinable) CSS. This method removes such classes.
render()  : string
Renders the normalized and processed HTML.
renderBodyContent()  : string
Renders the content of the BODY element of the normalized and processed HTML.
getHtmlElement()  : DOMElement
Returns the HTML element.
getXPath()  : DOMXPath
__construct()  : mixed
The constructor.
addContentTypeMetaTag()  : string
Adds a Content-Type meta tag for the charset.
createRawDomDocument()  : void
Creates a DOMDocument instance from the given HTML and stores it in $this->domDocument.
createUnifiedDomDocument()  : void
Creates a DOM document from the given HTML and stores it in $this->domDocument.
ensureDocumentType()  : string
Makes sure that the passed HTML has a document type, with lowercase "html".
ensureExistenceOfBodyElement()  : void
Checks that $this->domDocument has a BODY element and adds it if it is missing.
ensurePhpUnrecognizedSelfClosingTagsAreXml()  : string
Makes sure that any self-closing tags not recognized as such by PHP's DOMDocument implementation have a self-closing slash.
getBodyElement()  : DOMElement
Returns the BODY element.
hasContentTypeMetaTagInHead()  : bool
Tests whether the given HTML has a valid `Content-Type` metadata element within the `<head>` element. Due to tag omission rules, HTML parsers are expected to end the `<head>` element and start the `<body>` element upon encountering a start tag for any element which is permitted only within the `<body>`.
hasEndOfHeadElement()  : bool
Tests whether the `<head>` element ends within the given HTML. Due to tag omission rules, HTML parsers are expected to end the `<head>` element and start the `<body>` element upon encountering a start tag for any element which is permitted only within the `<body>`.
normalizeDocumentType()  : string
Makes sure the document type in the passed HTML has lowercase "html".
prepareHtmlForDomConversion()  : string
Returns the HTML with added document type, Content-Type meta tag, and self-closing slashes, if needed, ensuring that the HTML will be good for creating a DOM document from it.
removeClassAttributeFromElements()  : void
Removes the `class` attribute from each element in `$elements`.
removeClassesFromElements()  : void
Removes classes from the `class` attribute of each element in `$elements`, except any in `$classesToKeep`, removing the `class` attribute itself if the resultant list is empty.
removeHtmlComments()  : string
Removes comments from the given HTML, including any which are unterminated, for which the remainder of the string is removed.
removeHtmlTemplateElements()  : string
Removes `<template>` elements from the given HTML, including any without an end tag, for which the remainder of the string is removed.
removeSelfClosingTagsClosingTags()  : string
Eliminates any invalid closing tags for void elements from the given HTML.
setDomDocument()  : void
setHtml()  : void
Sets the HTML to process.

Constants

TAGNAME_ALLOWED_BEFORE_BODY_MATCHER

Regular expression part to match tag names that may appear before the start of the `<body>` element. A start tag for any other element would implicitly start the `<body>` element due to tag omission rules.

protected string TAGNAME_ALLOWED_BEFORE_BODY_MATCHER = '(?:html|head|base|command|link|meta|noscript|script|style|template|title)'

DISPLAY_NONE_MATCHER

We need to look for display:none, but we need to do a case-insensitive search. Since DOMDocument only supports XPath 1.0, lower-case() isn't available to us. We've thus far only set attributes to lowercase, not attribute values. Consequently, we need to translate() the letters that would be in 'NONE' ("NOE") to lowercase.

private string DISPLAY_NONE_MATCHER = '//*[@style and contains(translate(translate(@style," ",""),"NOE","noe"),"display:none")' . ' and not(@class and contains(concat(" ", normalize-space(@class), " "), " -emogrifier-keep "))]'
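
The following is a minimal sketch of how an XPath expression like this one can be evaluated with PHP's DOM extension. It is not the class's own implementation: `DISPLAY_NONE_XPATH` and `pruneHiddenElements()` are hypothetical names introduced for illustration, and only the expression itself is copied from the constant above.

```php
<?php
// XPath expression copied from the DISPLAY_NONE_MATCHER constant documented above.
const DISPLAY_NONE_XPATH = '//*[@style and contains(translate(translate(@style," ",""),"NOE","noe"),"display:none")'
    . ' and not(@class and contains(concat(" ", normalize-space(@class), " "), " -emogrifier-keep "))]';

/**
 * Hypothetical helper: removes every element matched by the expression
 * and returns the serialized content of the BODY element.
 */
function pruneHiddenElements(string $html): string
{
    $document = new DOMDocument();
    $document->loadHTML($html);
    $xPath = new DOMXPath($document);

    // Copy the matches into an array before detaching nodes from the tree.
    $hiddenElements = iterator_to_array($xPath->query(DISPLAY_NONE_XPATH));
    foreach ($hiddenElements as $element) {
        if ($element->parentNode !== null) {
            $element->parentNode->removeChild($element);
        }
    }

    $result = '';
    foreach ($document->getElementsByTagName('body')->item(0)->childNodes as $child) {
        $result .= $document->saveHTML($child);
    }

    return $result;
}
```

As written, the expression strips only space characters and lowercases only the letters N, O and E. A value such as `display: NONE` therefore matches, while an upper-case property name such as `DISPLAY: none` does not. Elements carrying the `-emogrifier-keep` class are never matched.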

Properties

Methods

fromHtml()

Builds a new instance from the given HTML.

public static fromHtml(string $unprocessedHtml) : static
Parameters
$unprocessedHtml : string

raw HTML, must be UTF-8 encoded, must not be empty

Tags
throws
InvalidArgumentException

if $unprocessedHtml is anything other than a non-empty string

Return values
static

removeRedundantClasses()

Removes classes that are no longer required (e.g. because there are no longer any CSS rules that reference them) from `class` attributes.

public removeRedundantClasses([array<array-key, string> $classesToKeep = [] ]) : $this

Note that this does not inspect the CSS, but expects to be provided with a list of classes that are still in use.

This method also has the (presumably beneficial) side-effect of minifying (removing superfluous whitespace from) class attributes.

Parameters
$classesToKeep : array<array-key, string> = []

names of classes that should not be removed

Return values
$this
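
The behaviour described above (keep only listed classes, collapse whitespace, drop empty `class` attributes) can be sketched with plain DOM calls. This is an illustration under assumptions, not the method's actual code; `keepOnlyClasses()` is a hypothetical helper name.

```php
<?php
/**
 * Hypothetical helper: on every element with a `class` attribute, keeps only
 * the classes listed in $classesToKeep.
 *
 * @param string[] $classesToKeep
 */
function keepOnlyClasses(DOMDocument $document, array $classesToKeep): void
{
    $xPath = new DOMXPath($document);
    foreach ($xPath->query('//*[@class]') as $element) {
        // Splitting on runs of whitespace is what collapses superfluous whitespace.
        $classes = preg_split('/\s+/', $element->getAttribute('class'), -1, PREG_SPLIT_NO_EMPTY);
        $remaining = array_values(array_intersect($classes, $classesToKeep));

        if ($remaining === []) {
            // Nothing left to keep: remove the attribute itself.
            $element->removeAttribute('class');
        } else {
            $element->setAttribute('class', implode(' ', $remaining));
        }
    }
}

$document = new DOMDocument();
$document->loadHTML(
    '<!DOCTYPE html><html><body><p class="  intro   obsolete ">a</p><p class="obsolete">b</p></body></html>'
);
keepOnlyClasses($document, ['intro']);
$paragraphs = $document->getElementsByTagName('p');
```

After the call, the first paragraph carries `class="intro"` and the second has no `class` attribute at all.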

removeRedundantClassesAfterCssInlined()

After CSS has been inlined, there will likely be some classes in `class` attributes that are no longer referenced by any remaining (uninlinable) CSS. This method removes such classes.

public removeRedundantClassesAfterCssInlined(CssInliner $cssInliner) : $this

Note that it does not inspect the remaining CSS, but uses information readily available from the CssInliner instance about the CSS rules that could not be inlined.

Parameters
$cssInliner : CssInliner

object instance that performed the CSS inlining

Tags
throws
BadMethodCallException

if inlineCss has not first been called on $cssInliner

Return values
$this
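
One way to derive the classes still in use from a list of selectors that could not be inlined is sketched below. How the CssInliner instance exposes those selectors is not shown on this page, so the input array, the function name `classNamesFromSelectors()`, and the simplified pattern are all assumptions.

```php
<?php
/**
 * Hypothetical helper: collects the class names mentioned in a list of CSS selectors.
 *
 * @param string[] $selectors
 *
 * @return string[] unique class names, in order of first appearance
 */
function classNamesFromSelectors(array $selectors): array
{
    $classNames = [];
    foreach ($selectors as $selector) {
        // Simplification: a class selector is a dot followed by an identifier.
        // Escapes and dots inside attribute values are not handled here.
        if (preg_match_all('/\.(-?[_a-zA-Z][\w-]*)/', $selector, $matches) > 0) {
            foreach ($matches[1] as $name) {
                $classNames[$name] = true;
            }
        }
    }

    return array_keys($classNames);
}
```

The resulting list could then serve as the `$classesToKeep` argument of `removeRedundantClasses()`.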

createUnifiedDomDocument()

Creates a DOM document from the given HTML and stores it in $this->domDocument.

private createUnifiedDomDocument(string $html) : void

The DOM document will always have a BODY element and a document type.

Parameters
$html : string
Return values
void
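
The guarantee of a BODY element can be illustrated with a small sketch that appends one when the parsed document lacks it. The helper name `ensureBodyElement()` is hypothetical, and the class's private `ensureExistenceOfBodyElement()` may work differently.

```php
<?php
/**
 * Hypothetical helper: appends an empty BODY element if the document has none.
 */
function ensureBodyElement(DOMDocument $document): void
{
    if ($document->getElementsByTagName('body')->length > 0) {
        return;
    }

    $htmlElement = $document->getElementsByTagName('html')->item(0);
    if ($htmlElement === null) {
        throw new UnexpectedValueException('There is no HTML element to append a BODY element to.');
    }

    $htmlElement->appendChild($document->createElement('body'));
}

$document = new DOMDocument();
$document->loadHTML('<!DOCTYPE html><html><head><title>No body yet</title></head></html>');
ensureBodyElement($document);
```

Calling the helper a second time leaves the document unchanged, since the first check then finds the existing element.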

ensurePhpUnrecognizedSelfClosingTagsAreXml()

Makes sure that any self-closing tags not recognized as such by PHP's DOMDocument implementation have a self-closing slash.

private ensurePhpUnrecognizedSelfClosingTagsAreXml(string $html) : string
Parameters
$html : string
Return values
string

HTML with problematic tags converted.
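
A possible way to add the missing slashes with a regular expression is sketched below. The tag-name alternation is copied from `PHP_UNRECOGNIZED_VOID_TAGNAME_MATCHER`; the surrounding pattern, the constant name `VOID_TAGNAME_MATCHER`, and the function name `addSelfClosingSlashes()` are assumptions made for illustration.

```php
<?php
// Tag-name alternation copied from PHP_UNRECOGNIZED_VOID_TAGNAME_MATCHER.
const VOID_TAGNAME_MATCHER = '(?:command|embed|keygen|source|track|wbr)';

/**
 * Hypothetical helper: turns e.g. `<wbr>` into `<wbr/>` for the listed tag names.
 */
function addSelfClosingSlashes(string $html): string
{
    // Match a listed start tag up to, but not including, its ">",
    // unless a "/" already sits directly before the ">".
    $pattern = '%<' . VOID_TAGNAME_MATCHER . '\b[^>]*+(?<!/)(?=>)%i';

    $result = preg_replace($pattern, '$0/', $html);
    if ($result === null) {
        throw new RuntimeException('PCRE error code ' . preg_last_error());
    }

    return $result;
}
```

The word boundary after the tag name keeps longer names such as `<tracker>` from matching, and tags that already end in `/>` are left as they are.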

hasContentTypeMetaTagInHead()

Tests whether the given HTML has a valid `Content-Type` metadata element within the `<head>` element. Due to tag omission rules, HTML parsers are expected to end the `<head>` element and start the `<body>` element upon encountering a start tag for any element which is permitted only within the `<body>`.

private hasContentTypeMetaTagInHead(string $html) : bool
Parameters
$html : string
Return values
bool

hasEndOfHeadElement()

Tests whether the `<head>` element ends within the given HTML. Due to tag omission rules, HTML parsers are expected to end the `<head>` element and start the `<body>` element upon encountering a start tag for any element which is permitted only within the `<body>`.

private hasEndOfHeadElement(string $html) : bool
Parameters
$html : string
Tags
throws
RuntimeException
Return values
bool
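
The tag omission rule can be made concrete with a sketch built on the `TAGNAME_ALLOWED_BEFORE_BODY_MATCHER` alternation: the `<head>` element ends at an explicit `</head>` or at the first start tag whose name is not in that list. The pattern around the alternation and the names `ALLOWED_BEFORE_BODY` and `headEndsWithin()` are assumptions, not the class's code.

```php
<?php
// Alternation copied from TAGNAME_ALLOWED_BEFORE_BODY_MATCHER.
const ALLOWED_BEFORE_BODY = '(?:html|head|base|command|link|meta|noscript|script|style|template|title)';

/**
 * Hypothetical helper: reports whether the HEAD element ends somewhere in $html.
 */
function headEndsWithin(string $html): bool
{
    // Either a start tag whose name is not allowed before BODY, or an explicit </head>.
    $pattern = '%<(?!' . ALLOWED_BEFORE_BODY . '[\s/>])\w|</head>%i';

    $matchCount = preg_match($pattern, $html);
    if ($matchCount === false) {
        throw new RuntimeException('PCRE error code ' . preg_last_error());
    }

    return $matchCount === 1;
}
```

This sketch inspects the raw string, so tag-like text inside comments or `<template>` content would mislead it. That may be one reason the class also provides `removeHtmlComments()` and `removeHtmlTemplateElements()`.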

prepareHtmlForDomConversion()

Returns the HTML with added document type, Content-Type meta tag, and self-closing slashes, if needed, ensuring that the HTML will be good for creating a DOM document from it.

private prepareHtmlForDomConversion(string $html) : string
Parameters
$html : string
Return values
string

the unified HTML

removeClassAttributeFromElements()

Removes the `class` attribute from each element in `$elements`.

private removeClassAttributeFromElements(DOMNodeList $elements) : void
Parameters
$elements : DOMNodeList
Return values
void

removeClassesFromElements()

Removes classes from the `class` attribute of each element in `$elements`, except any in `$classesToKeep`, removing the `class` attribute itself if the resultant list is empty.

private removeClassesFromElements(DOMNodeList $elements, array<array-key, string> $classesToKeep) : void
Parameters
$elements : DOMNodeList
$classesToKeep : array<array-key, string>
Return values
void

removeHtmlComments()

Removes comments from the given HTML, including any which are unterminated, for which the remainder of the string is removed.

private removeHtmlComments(string $html) : string
Parameters
$html : string
Tags
throws
RuntimeException
Return values
string
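
A minimal sketch of the removal, using the pattern from `HTML_COMMENT_PATTERN`. The function name `stripHtmlComments()` and the constant name `COMMENT_PATTERN` are hypothetical, and the assumption that the `RuntimeException` corresponds to a failed `preg_replace()` call is not confirmed by this page.

```php
<?php
// Pattern copied from the HTML_COMMENT_PATTERN constant.
const COMMENT_PATTERN = '/<!--[^-]*+(?:-(?!->)[^-]*+)*+(?:-->|$)/';

/**
 * Hypothetical helper: strips HTML comments, including an unterminated one,
 * in which case everything from `<!--` to the end of the string is removed.
 */
function stripHtmlComments(string $html): string
{
    $result = preg_replace(COMMENT_PATTERN, '', $html);
    if ($result === null) {
        // preg_replace() returns null when the regular expression engine fails.
        throw new RuntimeException('PCRE error code ' . preg_last_error());
    }

    return $result;
}
```

The final alternative `(?:-->|$)` is what lets an unterminated comment match through to the end of the input.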

removeHtmlTemplateElements()

Removes `<template>` elements from the given HTML, including any without an end tag, for which the remainder of the string is removed.

private removeHtmlTemplateElements(string $html) : string
Parameters
$html : string
Tags
throws
RuntimeException
Return values
string
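
The same approach applies to `<template>` elements. The sketch below uses the pattern from `HTML_TEMPLATE_ELEMENT_PATTERN`; `stripTemplateElements()` and `TEMPLATE_PATTERN` are hypothetical names, and the error handling is an assumption.

```php
<?php
// Pattern copied from the HTML_TEMPLATE_ELEMENT_PATTERN constant.
const TEMPLATE_PATTERN = '%<template[\s>][^<]*+(?:<(?!/template>)[^<]*+)*+(?:</template>|$)%i';

/**
 * Hypothetical helper: strips `<template>` elements together with their content.
 * A `<template>` without an end tag removes the rest of the string.
 */
function stripTemplateElements(string $html): string
{
    $result = preg_replace(TEMPLATE_PATTERN, '', $html);
    if ($result === null) {
        throw new RuntimeException('PCRE error code ' . preg_last_error());
    }

    return $result;
}
```

Because of the `i` modifier, `<TEMPLATE>` and `</Template>` are matched as well.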