This removes another Match member that required destruction. The "API"
for accessing the strings is definitely a bit awkward. We'll think of
something nicer eventually.
This mode made a lot of incorrect assumptions about string lifetimes,
and instead of fixing it, let's just remove it and tweak the few unit
tests that used it.
If some state has already been tried, skip over it as it would never
lead to a match regardless.
This fixes performance/memory issues in cases like
/(a+)+b/.exec("aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa")
or
/(a|a?)+b/...
Fixes#2622.
Due to optimiser shenanigans in the tree alternative form, some
JumpNonEmpty ops might be moved before their Checkpoint instruction.
It is safe to assume the distance between the nonexistent checkpoint and
the current op is zero, so just do that.
Take record of the named capture group prior to parsing the group's
body. This requires removal of the recorded minimum length of the named
capture group directly, and now needs to be looked up via the group
minimu lengths table.
fe46b2c141 added the reset-temp-inverse flag, but set it up so all
tempinverse ops were negated at the start of the next op; this commit
makes it so these flags actually persist for one op and not zero.
Fixes#2296.
This commit un-deprecates DeprecatedString, and repurposes it as a byte
string.
As the null state has already been removed, there are no other
particularly hairy blockers in repurposing this type as a byte string
(what it _really_ is).
This commit is auto-generated:
$ xs=$(ack -l \bDeprecatedString\b\|deprecated_string AK Userland \
Meta Ports Ladybird Tests Kernel)
$ perl -pie 's/\bDeprecatedString\b/ByteString/g;
s/deprecated_string/byte_string/g' $xs
$ clang-format --style=file -i \
$(git diff --name-only | grep \.cpp\|\.h)
$ gn format $(git ls-files '*.gn' '*.gni')
This takes the previous alternation optimisation and applies it to all
the alternation blocks instead of just the few instructions at the
start.
By generating a trie of instructions, all logically equivalent
instructions will be consolidated into a single node, allowing the
engine to avoid checking the same thing multiple times.
For instance, given the pattern /abc|ac|ab/, this optimisation would
generate the following tree:
- a
| - b
| | - c
| | | - <accept>
| | - <accept>
| - c
| | - <accept>
which will attempt to match 'a' or 'b' only once, and would also limit
the number of backtrackings performed in case alternatives fails to
match.
This optimisation is currently gated behind a simple cost model that
estimates the number of instructions generated, which is pessimistic for
small patterns, though the change in performance in such patterns is not
particularly large.
The consume(size_t) overload consumes "at most" as many bytes as
requested, but consume() consumes exactly one byte.
This commit makes sure to avoid consuming past EOF.
Fixes#18324.
Fixes#18325.
If a block jumps before performing a compare, we'd need to recursively
find the first of the jumped-to block. While this is doable, it's not
really worth spending the time as most such cases won't actually qualify
for atomic loop rewrite anyway.
Fixes an invalid rewrite when `.+` is followed by an alternation, e.g.
/.+(a|b|c)/.
Previously we were only checking for overlap when the range wasn't in
inverse mode, which made us miss things like /[^x]x/; this patch makes
it so we don't miss that.
Having an alias function that only wraps another one is silly, and
keeping the more obvious name should flush out more uses of deprecated
strings.
No behavior change.
Previously, if a pattern matched the empty string (e.g. ".*"), it would
match the string twice instead of once. Among other issues, this caused
a Regex replacement to duplicate its expected output, since it would
replace "both" empty matches.
In 7c5e30daaa, the focus was "only" on
Userland/Libraries/, whereas this commit cleans up the remaining
headers in the repo, and any new badly-formatted include.
This will make it easier to support both string types at the same time
while we convert code, and tracking down remaining uses.
One big exception is Value::to_string() in LibJS, where the name is
dictated by the ToString AO.