Lexer
A lexer is a stateful stream generator: each time it is advanced, it returns the next token in the Source.
Assuming the source is valid, the final returned token will be EOF, after which the lexer will repeatedly return the same EOF token whenever called.
The algorithm is O(N) in both memory and time.
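The contract described above (advance to get the next token, then receive EOF on every later call) can be sketched with a toy class. `ToyLexer` is hypothetical and only splits on whitespace; it is not this library's Lexer, and the `'<EOF>'` marker is an assumption made for illustration.

```php
<?php
declare(strict_types=1);

// Hypothetical toy lexer, not the library's Lexer: it splits on whitespace
// and only illustrates the "EOF is repeated once reached" contract.
final class ToyLexer
{
    /** @var list<string> */
    private array $words;

    private int $index = 0;

    public function __construct(string $source)
    {
        // PREG_SPLIT_NO_EMPTY drops empty pieces from leading/trailing whitespace.
        $words = preg_split('/\s+/', $source, -1, PREG_SPLIT_NO_EMPTY);
        $this->words = $words === false ? [] : $words;
    }

    /** Returns the next token, or '<EOF>' on every call once input is exhausted. */
    public function advance(): string
    {
        if ($this->index >= count($this->words)) {
            return '<EOF>';
        }

        return $this->words[$this->index++];
    }
}

$lexer = new ToyLexer('query { field }');
$tokens = [];
for ($i = 0; $i < 6; $i++) {
    $tokens[] = $lexer->advance();
}
echo implode(' ', $tokens), PHP_EOL; // query { field } <EOF> <EOF>
```

Each token is produced exactly once and the cursor never moves backwards, which is consistent with the linear time bound stated above.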
Table of Contents
- TOKEN_AMP = 38
- TOKEN_AT = 64
- TOKEN_BANG = 33
- TOKEN_BRACE_L = 123
- TOKEN_BRACE_R = 125
- TOKEN_BRACKET_L = 91
- TOKEN_BRACKET_R = 93
- TOKEN_COLON = 58
- TOKEN_DOLLAR = 36
- TOKEN_DOT = 46
- TOKEN_EQUALS = 61
- TOKEN_PAREN_L = 40
- TOKEN_PAREN_R = 41
- TOKEN_PIPE = 124
- $lastToken : Token
- The previously focused non-ignored token.
- $line : int
- The (1-indexed) line containing the current token.
- $lineStart : int
- The character offset at which the current line begins.
- $options : array<string|int, mixed>
- $source : Source
- $token : Token
- The currently focused non-ignored token.
- $byteStreamPosition : int
- Current cursor position as a byte offset into the source (the source read as a single-byte stream).
- $position : int
- Current cursor position as a character offset into the UTF-8 encoded source.
- __construct() : mixed
- advance() : Token
- lookahead() : Token
- assertValidBlockStringCharacterCode() : void
- assertValidStringCharacterCode() : void
- moveStringCursor() : self
- Moves internal string cursor position.
- positionAfterWhitespace() : void
- Reads from the source body, starting at the current cursor position, until it finds a non-whitespace or comment character, then moves the cursor to the position of that character.
- readBlockString() : Token
- Reads a block string token from the source file.
- readChar() : array{string, int|null, int}
- Reads the next UTF-8 character from the byte stream, starting from $byteStreamPosition.
- readChars() : array{string, int}
- Reads the next $charCount UTF-8 characters from the byte stream.
- readComment() : Token
- Reads a comment token from the source file.
- readDigits() : string
- Returns a string containing all consecutive digits and moves the string cursor to the first character after them.
- readName() : Token
- Reads a name made of alphanumeric characters and underscores from the source.
- readNumber() : Token
- Reads a number token from the source file, either a float or an int depending on whether a decimal point appears.
- readString() : Token
- readToken() : Token
- unexpectedCharacterMessage() : string
Constants
TOKEN_AMP
private
mixed
TOKEN_AMP
= 38
TOKEN_AT
private
mixed
TOKEN_AT
= 64
TOKEN_BANG
private
mixed
TOKEN_BANG
= 33
TOKEN_BRACE_L
private
mixed
TOKEN_BRACE_L
= 123
TOKEN_BRACE_R
private
mixed
TOKEN_BRACE_R
= 125
TOKEN_BRACKET_L
private
mixed
TOKEN_BRACKET_L
= 91
TOKEN_BRACKET_R
private
mixed
TOKEN_BRACKET_R
= 93
TOKEN_COLON
private
mixed
TOKEN_COLON
= 58
TOKEN_DOLLAR
private
mixed
TOKEN_DOLLAR
= 36
TOKEN_DOT
private
mixed
TOKEN_DOT
= 46
TOKEN_EQUALS
private
mixed
TOKEN_EQUALS
= 61
TOKEN_PAREN_L
private
mixed
TOKEN_PAREN_L
= 40
TOKEN_PAREN_R
private
mixed
TOKEN_PAREN_R
= 41
TOKEN_PIPE
private
mixed
TOKEN_PIPE
= 124
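The constant values appear to be the ASCII codes of the punctuation characters their names suggest (for example, TOKEN_AMP for `&`). The following sketch checks that reading with `ord()`; the name-to-character pairing is an assumption, since the constants are private and undocumented here.

```php
<?php
declare(strict_types=1);

// Assumed pairing of punctuation characters with the values listed above.
$expected = [
    '&' => 38, '@' => 64, '!' => 33, '{' => 123, '}' => 125,
    '[' => 91, ']' => 93, ':' => 58, '$' => 36, '.' => 46,
    '=' => 61, '(' => 40, ')' => 41, '|' => 124,
];

$mismatches = [];
foreach ($expected as $char => $code) {
    // ord() returns the byte value of the first character of a string.
    if (ord((string) $char) !== $code) {
        $mismatches[] = $char;
    }
}
echo count($mismatches) === 0 ? 'all codes match' : 'mismatch', PHP_EOL;
```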
Properties
$lastToken
The previously focused non-ignored token.
public
Token
$lastToken
$line
The (1-indexed) line containing the current token.
public
int
$line
= 1
$lineStart
The character offset at which the current line begins.
public
int
$lineStart
= 0
$options
public
array<string|int, mixed>
$options
$source
public
Source
$source
$token
The currently focused non-ignored token.
public
Token
$token
$byteStreamPosition
Current cursor position as a byte offset into the source (the source read as a single-byte stream).
private
int
$byteStreamPosition
= 0
$position
Current cursor position as a character offset into the UTF-8 encoded source.
private
int
$position
= 0
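Why two cursors are kept can be shown with a minimal sketch, assuming that $position counts characters while $byteStreamPosition counts bytes. The helper `utf8CharAt()` is hypothetical and is not the library's readChar(); it assumes valid UTF-8 input and a cursor on a character boundary.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: returns the UTF-8 character that starts at $bytePos
 * together with its length in bytes.
 *
 * @return array{string, int}
 */
function utf8CharAt(string $body, int $bytePos): array
{
    $lead = ord($body[$bytePos]);
    // The lead byte encodes how many bytes the character occupies.
    if ($lead < 0x80) {
        $length = 1;
    } elseif ($lead < 0xE0) {
        $length = 2;
    } elseif ($lead < 0xF0) {
        $length = 3;
    } else {
        $length = 4;
    }

    return [substr($body, $bytePos, $length), $length];
}

$body = "\u{00E9}=1";    // "é=1"; "é" takes 2 bytes in UTF-8
$position = 0;           // counts characters
$byteStreamPosition = 0; // counts bytes
while ($byteStreamPosition < strlen($body)) {
    [$char, $bytes] = utf8CharAt($body, $byteStreamPosition);
    $position += 1;
    $byteStreamPosition += $bytes;
}
echo "$position characters, $byteStreamPosition bytes", PHP_EOL; // 3 characters, 4 bytes
```

The two counters diverge as soon as the source contains a multi-byte character, so an error location reported in characters cannot be derived from the byte offset alone.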
Methods
__construct()
public
__construct(Source $source[, array<string|int, mixed> $options = [] ]) : mixed
Parameters
- $source : Source
- $options : array<string|int, mixed> = []
Return values
mixed
advance()
public
advance() : Token
Return values
Token
lookahead()
public
lookahead() : Token
Return values
Token
assertValidBlockStringCharacterCode()
private
assertValidBlockStringCharacterCode(int $code, int $position) : void
Parameters
- $code : int
- $position : int
Return values
void
assertValidStringCharacterCode()
private
assertValidStringCharacterCode(int $code, int $position) : void
Parameters
- $code : int
- $position : int
Return values
void
moveStringCursor()
Moves internal string cursor position.
private
moveStringCursor(int $positionOffset, int $byteStreamOffset) : self
Parameters
- $positionOffset : int
- $byteStreamOffset : int
Return values
self
positionAfterWhitespace()
Reads from the source body, starting at the current cursor position, until it finds a non-whitespace or comment character, then moves the cursor to the position of that character.
private
positionAfterWhitespace() : void
Return values
void
readBlockString()
Reads a block string token from the source file.
private
readBlockString(int $line, int $col, Token $prev) : Token
"""("?"?(\\"""|\\(?!=""")|[^"\\]))*"""
Parameters
- $line : int
- $col : int
- $prev : Token
Return values
Token
readChar()
Reads the next UTF-8 character from the byte stream, starting from $byteStreamPosition.
private
readChar([bool $advance = false ][, int|null $byteStreamPosition = null ]) : array{string, int|null, int}
Parameters
- $advance : bool = false
- $byteStreamPosition : int|null = null
Return values
array{string, int|null, int}
readChars()
Reads the next $charCount UTF-8 characters from the byte stream.
private
readChars(int $charCount) : array{string, int}
Parameters
- $charCount : int
Return values
array{string, int}
readComment()
Reads a comment token from the source file.
private
readComment(int $line, int $col, Token $prev) : Token
#[\u0009\u0020-\uFFFF]*
Parameters
- $line : int
- $col : int
- $prev : Token
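The comment pattern above can be tried out with PCRE. This is a sketch rather than the method's actual implementation: it assumes the `\uXXXX` escapes translate to PCRE's `\x{XXXX}` form under the `/u` modifier, and the sample source string is made up.

```php
<?php
declare(strict_types=1);

// PCRE spelling of the documented pattern: \x{...} replaces \u....,
// and the /u modifier makes the engine treat the subject as UTF-8.
$commentPattern = '/^#[\x{0009}\x{0020}-\x{FFFF}]*/u';

// Tabs are allowed inside a comment; the line feed is not, so it ends the match.
$source = "# a comment\tstill going\nfield";
preg_match($commentPattern, $source, $match);
$comment = $match[0] ?? '';
echo $comment, PHP_EOL;
```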
Return values
Token
readDigits()
Returns a string containing all consecutive digits and moves the string cursor to the first character after them.
private
readDigits() : string
Return values
string
readName()
Reads a name made of alphanumeric characters and underscores from the source.
private
readName(int $line, int $col, Token $prev) : Token
[_A-Za-z][_0-9A-Za-z]*
Parameters
- $line : int
- $col : int
- $prev : Token
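The name pattern above can be checked against candidate strings as follows. `isName()` is a hypothetical helper written for illustration; the `\A` and `\z` anchors are added so that the whole string must match.

```php
<?php
declare(strict_types=1);

/** Hypothetical helper: true when $text as a whole matches the name pattern. */
function isName(string $text): bool
{
    // \A and \z anchor to the absolute start and end of the string,
    // so a trailing newline is not silently accepted.
    return preg_match('/\A[_A-Za-z][_0-9A-Za-z]*\z/', $text) === 1;
}

foreach (['_id', 'user2', '2user', 'first-name'] as $candidate) {
    echo $candidate, ': ', isName($candidate) ? 'name' : 'not a name', PHP_EOL;
}
```

A leading digit is rejected because the first character class excludes `0-9`, while later positions accept digits.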
Return values
Token
readNumber()
Reads a number token from the source file, either a float or an int depending on whether a decimal point appears.
private
readNumber(int $line, int $col, Token $prev) : Token
Int: -?(0|[1-9][0-9]*)
Float: -?(0|[1-9][0-9]*)(\.[0-9]+)?((E|e)(+|-)?[0-9]+)?
Parameters
- $line : int
- $col : int
- $prev : Token
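A minimal sketch of how the Int and Float patterns above separate inputs, under stated assumptions: the integer part is read as `[1-9][0-9]*` (zero or more further digits), the decimal point is escaped as `\.`, and the `+` in the exponent sign is escaped because PCRE rejects a bare `(+|-)`. `classifyNumber()` is a hypothetical helper, not the library's readNumber().

```php
<?php
declare(strict_types=1);

/** Hypothetical helper: returns 'Int', 'Float', or null when neither pattern matches. */
function classifyNumber(string $text): ?string
{
    // The Int pattern is tried first; it is a strict subset of the Float pattern.
    if (preg_match('/\A-?(0|[1-9][0-9]*)\z/', $text) === 1) {
        return 'Int';
    }
    if (preg_match('/\A-?(0|[1-9][0-9]*)(\.[0-9]+)?((E|e)(\+|-)?[0-9]+)?\z/', $text) === 1) {
        return 'Float';
    }

    return null;
}

foreach (['0', '-12', '3.14', '-0.5E-3', '007', '1.'] as $candidate) {
    echo $candidate, ' => ', classifyNumber($candidate) ?? 'invalid', PHP_EOL;
}
```

Leading zeros (`007`) and a dangling decimal point (`1.`) match neither pattern, so they are reported as invalid.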
Return values
Token
readString()
private
readString(int $line, int $col, Token $prev) : Token
Parameters
- $line : int
- $col : int
- $prev : Token
Return values
Token
readToken()
private
readToken(Token $prev) : Token
Parameters
- $prev : Token
Return values
Token
unexpectedCharacterMessage()
private
unexpectedCharacterMessage(int|null $code) : string
Parameters
- $code : int|null
