WooCommerce Code Reference

AbstractHtmlProcessor

Base class for HTML processors that can, for example, remove, add, or modify nodes or attributes.

The "vanilla" subclass is the HtmlNormalizer.

Table of Contents

CONTENT_TYPE_META_TAG  = '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">'
DEFAULT_DOCUMENT_TYPE  = '<!DOCTYPE html>'
HTML_COMMENT_PATTERN  = '/<!--[^-]*+(?:-(?!->)[^-]*+)*+(?:-->|$)/'
regular expression pattern to match an HTML comment, including delimiters and modifiers
HTML_TEMPLATE_ELEMENT_PATTERN  = '%<template[\s>][^<]*+(?:<(?!/template>)[^<]*+)*+(?:</template>|$)%i'
regular expression pattern to match an HTML `<template>` element, including delimiters and modifiers
PHP_UNRECOGNIZED_VOID_TAGNAME_MATCHER  = '(?:command|embed|keygen|source|track|wbr)'
TAGNAME_ALLOWED_BEFORE_BODY_MATCHER  = '(?:html|head|base|command|link|meta|noscript|script|style|template|title)'
Regular expression part to match tag names that may appear before the start of the `<body>` element. A start tag for any other element would implicitly start the `<body>` element due to tag omission rules.
$domDocument  : DOMDocument|null
$xPath  : DOMXPath|null
fromDomDocument()  : static
Builds a new instance from the given DOM document.
fromHtml()  : static
Builds a new instance from the given HTML.
getDomDocument()  : DOMDocument
Provides access to the internal DOMDocument representation of the HTML in its current state.
render()  : string
Renders the normalized and processed HTML.
renderBodyContent()  : string
Renders the content of the BODY element of the normalized and processed HTML.
getHtmlElement()  : DOMElement
Returns the HTML element.
getXPath()  : DOMXPath
__construct()  : mixed
The constructor.
addContentTypeMetaTag()  : string
Adds a Content-Type meta tag for the charset.
createRawDomDocument()  : void
Creates a DOMDocument instance from the given HTML and stores it in $this->domDocument.
createUnifiedDomDocument()  : void
Creates a DOM document from the given HTML and stores it in $this->domDocument.
ensureDocumentType()  : string
Makes sure that the passed HTML has a document type, with lowercase "html".
ensureExistenceOfBodyElement()  : void
Checks that $this->domDocument has a BODY element and adds it if it is missing.
ensurePhpUnrecognizedSelfClosingTagsAreXml()  : string
Makes sure that any self-closing tags not recognized as such by PHP's DOMDocument implementation have a self-closing slash.
getBodyElement()  : DOMElement
Returns the BODY element.
hasContentTypeMetaTagInHead()  : bool
Tests whether the given HTML has a valid `Content-Type` metadata element within the `<head>` element. Due to tag omission rules, HTML parsers are expected to end the `<head>` element and start the `<body>` element upon encountering a start tag for any element which is permitted only within the `<body>`.
hasEndOfHeadElement()  : bool
Tests whether the `<head>` element ends within the given HTML. Due to tag omission rules, HTML parsers are expected to end the `<head>` element and start the `<body>` element upon encountering a start tag for any element which is permitted only within the `<body>`.
normalizeDocumentType()  : string
Makes sure the document type in the passed HTML has lowercase "html".
prepareHtmlForDomConversion()  : string
Returns the HTML with a document type, a Content-Type meta tag, and self-closing slashes added where needed, so that a DOM document can be created from it.
removeHtmlComments()  : string
Removes comments from the given HTML, including any which are unterminated, for which the remainder of the string is removed.
removeHtmlTemplateElements()  : string
Removes `<template>` elements from the given HTML, including any without an end tag, for which the remainder of the string is removed.
removeSelfClosingTagsClosingTags()  : string
Eliminates any invalid closing tags for void elements from the given HTML.
setDomDocument()  : void
setHtml()  : void
Sets the HTML to process.

Constants

TAGNAME_ALLOWED_BEFORE_BODY_MATCHER

Regular expression part to match tag names that may appear before the start of the `<body>` element. A start tag for any other element would implicitly start the `<body>` element due to tag omission rules.

protected string TAGNAME_ALLOWED_BEFORE_BODY_MATCHER = '(?:html|head|base|command|link|meta|noscript|script|style|template|title)'

Properties

Methods

fromHtml()

Builds a new instance from the given HTML.

public static fromHtml(string $unprocessedHtml) : static
Parameters
$unprocessedHtml : string

raw HTML; must be UTF-8 encoded and must not be empty

Tags
throws
InvalidArgumentException

if $unprocessedHtml is anything other than a non-empty string

Return values
static
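
The contract described above (a named constructor that rejects empty input) can be illustrated with a minimal stand-in. This is only a sketch: `SketchHtmlProcessor` is a hypothetical class written for this example, not the library class, and its parsing step is a simplification that assumes PHP's DOM extension is available. It returns `self` rather than `static` only to stay compatible with older PHP versions.

```php
<?php

// Hypothetical stand-in that mirrors the documented contract of fromHtml().
// It is NOT the library implementation.
final class SketchHtmlProcessor
{
    /** @var DOMDocument */
    private $domDocument;

    private function __construct(DOMDocument $domDocument)
    {
        $this->domDocument = $domDocument;
    }

    public static function fromHtml(string $unprocessedHtml): self
    {
        if ($unprocessedHtml === '') {
            throw new InvalidArgumentException('The provided HTML must not be empty.');
        }

        // Collect libxml warnings (e.g. about HTML5 tags) instead of emitting them.
        $previous = libxml_use_internal_errors(true);
        $domDocument = new DOMDocument();
        // Prepending a doctype and Content-Type meta tag lets libxml read the input as UTF-8.
        $domDocument->loadHTML(
            '<!DOCTYPE html><meta http-equiv="Content-Type" content="text/html; charset=utf-8">'
            . $unprocessedHtml
        );
        libxml_clear_errors();
        libxml_use_internal_errors($previous);

        return new self($domDocument);
    }

    public function getDomDocument(): DOMDocument
    {
        return $this->domDocument;
    }
}

// Usage: build an instance and inspect the resulting DOM.
$processor = SketchHtmlProcessor::fromHtml('<p>Hello</p>');
$paragraphs = $processor->getDomDocument()->getElementsByTagName('p');
```

With the real class, the same call would be made on a concrete subclass such as the HtmlNormalizer mentioned above, since the base class is abstract.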

createUnifiedDomDocument()

Creates a DOM document from the given HTML and stores it in $this->domDocument.

private createUnifiedDomDocument(string $html) : void

The DOM document will always have a BODY element and a document type.

Parameters
$html : string
Return values
void

ensurePhpUnrecognizedSelfClosingTagsAreXml()

Makes sure that any self-closing tags not recognized as such by PHP's DOMDocument implementation have a self-closing slash.

private ensurePhpUnrecognizedSelfClosingTagsAreXml(string $html) : string
Parameters
$html : string
Return values
string

HTML with problematic tags converted.
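
One way such a conversion could work is sketched below. The tag-name list is copied from the PHP_UNRECOGNIZED_VOID_TAGNAME_MATCHER constant listed in the table of contents; the surrounding pattern and the function name `addSelfClosingSlashes` are assumptions made for illustration, and the private method itself may be implemented differently.

```php
<?php

// Copied from the PHP_UNRECOGNIZED_VOID_TAGNAME_MATCHER constant documented on this page.
const PHP_UNRECOGNIZED_VOID_TAGNAME_MATCHER = '(?:command|embed|keygen|source|track|wbr)';

/**
 * Hypothetical helper: appends "/" before the closing ">" of the listed void
 * tags, unless the tag already ends with a slash.
 */
function addSelfClosingSlashes(string $html): string
{
    // \b       : the tag name must end here (so "<tracker>" is not matched)
    // [^>]*+   : possessively consume any attributes
    // (?<!/)   : skip tags that already have a self-closing slash
    // (?=>)    : stop right before the closing ">"
    $pattern = '%<' . PHP_UNRECOGNIZED_VOID_TAGNAME_MATCHER . '\b[^>]*+(?<!/)(?=>)%';

    $result = preg_replace($pattern, '$0/', $html);
    if (!is_string($result)) {
        // preg_replace() returns null when PCRE reports an error.
        throw new RuntimeException('preg_replace failed with error code ' . preg_last_error());
    }

    return $result;
}

// Usage: "<wbr>" gains a slash, "<embed />" is left alone.
$converted = addSelfClosingSlashes('long<wbr>word <embed />');
```

The possessive quantifier keeps the engine from backtracking into the attribute list, so a tag that already ends in `/` cannot be matched by giving the slash back.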

hasContentTypeMetaTagInHead()

Tests whether the given HTML has a valid `Content-Type` metadata element within the `<head>` element. Due to tag omission rules, HTML parsers are expected to end the `<head>` element and start the `<body>` element upon encountering a start tag for any element which is permitted only within the `<body>`.

private hasContentTypeMetaTagInHead(string $html) : bool
Parameters
$html : string
Return values
bool

hasEndOfHeadElement()

Tests whether the `<head>` element ends within the given HTML. Due to tag omission rules, HTML parsers are expected to end the `<head>` element and start the `<body>` element upon encountering a start tag for any element which is permitted only within the `<body>`.

private hasEndOfHeadElement(string $html) : bool
Parameters
$html : string
Tags
throws
RuntimeException
Return values
bool
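
The tag omission rule described above can be sketched with the TAGNAME_ALLOWED_BEFORE_BODY_MATCHER constant. The function name `headEndsWithin` and the surrounding pattern are assumptions for illustration only. Unlike a complete implementation, this sketch does not discount start tags that appear inside comments or `<template>` elements.

```php
<?php

// Copied from the TAGNAME_ALLOWED_BEFORE_BODY_MATCHER constant documented on this page.
const TAGNAME_ALLOWED_BEFORE_BODY_MATCHER
    = '(?:html|head|base|command|link|meta|noscript|script|style|template|title)';

/**
 * Hypothetical helper: reports whether the <head> element ends within $html,
 * either explicitly ("</head>") or implicitly (a start tag for an element
 * that is not allowed before <body>).
 */
function headEndsWithin(string $html): bool
{
    // First alternative: "<" followed by a tag name that is NOT one of the
    // allowed names. The [\s/>] after the name makes "<header>" count as a
    // body-only tag even though it begins with "head".
    // Second alternative: an explicit </head> end tag.
    $pattern = '%<(?!' . TAGNAME_ALLOWED_BEFORE_BODY_MATCHER . '[\s/>])\w|</head>%i';

    return preg_match($pattern, $html) === 1;
}

// Usage: a <p> start tag implicitly ends the <head>.
$ended = headEndsWithin('<meta charset="utf-8"><p>hi</p>');
```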

prepareHtmlForDomConversion()

Returns the HTML with a document type, a Content-Type meta tag, and self-closing slashes added where needed, so that a DOM document can be created from it.

private prepareHtmlForDomConversion(string $html) : string
Parameters
$html : string
Return values
string

the unified HTML

removeHtmlComments()

Removes comments from the given HTML, including any which are unterminated, for which the remainder of the string is removed.

private removeHtmlComments(string $html) : string
Parameters
$html : string
Tags
throws
RuntimeException
Return values
string
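
The behaviour described above follows directly from the HTML_COMMENT_PATTERN constant, as the following sketch shows. The pattern is copied from this page; the function name `stripHtmlComments` is hypothetical, and mapping a PCRE failure to a RuntimeException is an assumption based on the documented `throws` tag.

```php
<?php

// Copied from the HTML_COMMENT_PATTERN constant documented on this page.
const HTML_COMMENT_PATTERN = '/<!--[^-]*+(?:-(?!->)[^-]*+)*+(?:-->|$)/';

/**
 * Hypothetical helper: removes HTML comments. An unterminated comment matches
 * through the "$" alternative, so everything after its "<!--" is removed too.
 */
function stripHtmlComments(string $html): string
{
    $result = preg_replace(HTML_COMMENT_PATTERN, '', $html);
    if (!is_string($result)) {
        // preg_replace() returns null when PCRE reports an error.
        throw new RuntimeException('preg_replace failed with error code ' . preg_last_error());
    }

    return $result;
}

// Usage: a terminated comment is cut out; an unterminated one swallows the rest.
$clean = stripHtmlComments('<p>a<!-- note -->b</p>');
$truncated = stripHtmlComments('<p>a</p><!-- never closed <p>lost</p>');
```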

removeHtmlTemplateElements()

Removes `<template>` elements from the given HTML, including any without an end tag, for which the remainder of the string is removed.

private removeHtmlTemplateElements(string $html) : string
Parameters
$html : string
Tags
throws
RuntimeException
Return values
string
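
The same approach applies to `<template>` elements, using the HTML_TEMPLATE_ELEMENT_PATTERN constant. The pattern is copied from this page; the function name `stripHtmlTemplateElements` is hypothetical, and the error handling is an assumption based on the documented `throws` tag.

```php
<?php

// Copied from the HTML_TEMPLATE_ELEMENT_PATTERN constant documented on this page.
const HTML_TEMPLATE_ELEMENT_PATTERN
    = '%<template[\s>][^<]*+(?:<(?!/template>)[^<]*+)*+(?:</template>|$)%i';

/**
 * Hypothetical helper: removes <template> elements. A <template> without an
 * end tag matches through the "$" alternative, so the rest of the string is
 * removed along with it.
 */
function stripHtmlTemplateElements(string $html): string
{
    $result = preg_replace(HTML_TEMPLATE_ELEMENT_PATTERN, '', $html);
    if (!is_string($result)) {
        // preg_replace() returns null when PCRE reports an error.
        throw new RuntimeException('preg_replace failed with error code ' . preg_last_error());
    }

    return $result;
}

// Usage: the template and its content disappear, the surrounding markup stays.
$clean = stripHtmlTemplateElements('<div>a</div><template><p>x</p></template><div>b</div>');
```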