mirror of https://github.com/LadybirdBrowser/ladybird.git synced 2025-06-08 05:27:14 +09:00

122 commits

Author SHA1 Message Date
Ali Mohammad Pur
76f5dce3db LibRegex: Flatten capture group list in MatchState
This makes copying the capture group COWVector significantly cheaper,
as we no longer have to run any constructors for it - just memcpy.
2025-04-18 17:09:27 +02:00
Andreas Kling
96f1f15ad6 LibRegex: Remove unused Utf8View/Utf32View support in RegexStringView 2025-04-16 10:04:50 +02:00
Andreas Kling
5308d77600 LibRegex: Don't use Optional<T> inside regex::Match
This prevented Match from being trivially copyable, which we want it to
be for fast Vector copying.
2025-04-14 17:40:13 +02:00
Andreas Kling
54edf29f1b LibRegex: Make Match::capture_group_name an index into the string table
This removes another Match member that required destruction. The "API"
for accessing the strings is definitely a bit awkward. We'll think of
something nicer eventually.
2025-04-14 17:40:13 +02:00
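The effect described in the two commits above can be illustrated with a small standalone sketch. It does not use the real regex::Match type: the struct names, the `name_of` helper, and the use of -1 as the "unnamed" marker are all assumptions made for illustration. It only shows that a member which owns a string blocks trivial copyability, while an index into a separate string table does not.

```cpp
#include <cstddef>
#include <string>
#include <type_traits>
#include <vector>

// Hypothetical stand-in for a match record that owns its group name.
struct MatchWithOwnedName {
    std::size_t start;
    std::size_t length;
    std::string capture_group_name; // needs a copy constructor and a destructor
};

// Hypothetical stand-in that refers to the name by index instead.
struct MatchWithNameIndex {
    std::size_t start;
    std::size_t length;
    int capture_group_name; // index into a string table, -1 if the group is unnamed
};

static_assert(!std::is_trivially_copyable<MatchWithOwnedName>::value,
    "an owned string member prevents memcpy-style copies");
static_assert(std::is_trivially_copyable<MatchWithNameIndex>::value,
    "plain integers can be copied with memcpy");

// Resolves the name through the table; returns nullptr for unnamed groups
// or out-of-range indices.
inline std::string const* name_of(MatchWithNameIndex const& match,
    std::vector<std::string> const& string_table)
{
    if (match.capture_group_name < 0)
        return nullptr;
    auto index = static_cast<std::size_t>(match.capture_group_name);
    if (index >= string_table.size())
        return nullptr;
    return &string_table[index];
}
```

The cost of this design is the one the commit message mentions: callers must have the string table at hand to turn the index back into a name.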
Ali Mohammad Pur
69050da929 LibRegex: Merge inverse string table mappings separately 2025-04-06 20:21:16 +02:00
Ali Mohammad Pur
299b9ca572 LibRegex: Check backreference index before looking it up
If a backref happens after it's cleared, the slot may be cleared
already.
2025-04-06 20:21:16 +02:00
Andreas Kling
6b6d3b32a4 LibRegex: Remove the StringCopyMatches mode
This mode made a lot of incorrect assumptions about string lifetimes,
and instead of fixing it, let's just remove it and tweak the few unit
tests that used it.
2025-03-24 22:27:17 +00:00
mikiubo
c85df78c4c LibRegex: Remove orphaned save points in nested LookAhead 2025-03-17 16:11:02 +01:00
Tim Ledbetter
b9ac99d2eb Revert "LibRegex: Remove orphaned save points in nested LookAhead"
This reverts commit f2678bfcb8.
2025-03-14 19:57:33 +00:00
mikiubo
f2678bfcb8 LibRegex: Remove orphaned save points in nested LookAhead 2025-03-14 09:41:41 +01:00
Ali Mohammad Pur
a37315da87 Tests: Get rid of clang-format: off in the Regex tests
Should've done this a long time ago, but now is better than never.
2025-03-09 14:37:57 +01:00
Ali Mohammad Pur
5355710481 LibRegex: Don't treat single-jump blocks as noop in the optimizer 2025-03-09 14:37:57 +01:00
aplefull
389a63d6bf LibRegex: Allow duplicate named capture groups in separate alternatives 2025-03-05 14:36:09 +01:00
aplefull
61744322ad LibRegex: Ensure nullable quantifiers backtrack when input remains
Makes patterns like `/(a?b??)*/` correctly match the string
2025-03-02 15:19:04 +01:00
mikiubo
8a6f7b787e LibRegex: Use depth-first search in regex optimizer
Use depth-first search in the optimizer code, because breadth-first
search introduces a bug. Add a test case to the test library.
2025-02-25 00:09:20 +01:00
Ali Mohammad Pur
08ebfaff17 LibRegex: Take trailing inversion state into account in block comparison
Fixes #3421.
2025-02-01 11:30:02 +01:00
Ali Mohammad Pur
cce000d57c LibRegex: Don't repeat the same fork again
If some state has already been tried, skip over it as it would never
lead to a match regardless.
This fixes performance/memory issues in cases like
/(a+)+b/.exec("aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa")
or
/(a|a?)+b/...

Fixes #2622.
2025-01-17 10:13:51 +01:00
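The idea of skipping states that were already tried can be sketched with a toy backtracking VM. This is not LibRegex's bytecode or matcher: the `Op`, `Insn`, and `Matcher` names and the three-instruction set are assumptions made for illustration, and the sketch ignores capture groups, which a real engine has to account for when deciding whether two states are equivalent.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <utility>
#include <vector>

enum class Op { Char, Fork, Match };

struct Insn {
    Op op;
    char ch;          // used by Char
    std::size_t x, y; // used by Fork: try x first, then y
};

// Toy program for /(a+)+b/, anchored at the start of the input.
inline std::vector<Insn> nested_plus_program()
{
    return {
        { Op::Char, 'a', 0, 0 }, // 0: match 'a'
        { Op::Fork, 0, 0, 2 },   // 1: inner +, repeat or continue
        { Op::Fork, 0, 0, 3 },   // 2: outer +, repeat or continue
        { Op::Char, 'b', 0, 0 }, // 3: match 'b'
        { Op::Match, 0, 0, 0 },  // 4: accept
    };
}

class Matcher {
public:
    Matcher(std::vector<Insn> program, std::string input)
        : m_program(std::move(program))
        , m_input(std::move(input))
    {
    }

    std::size_t steps = 0; // number of distinct states explored

    bool run(std::size_t ip, std::size_t sp)
    {
        // A state that was already tried cannot lead to a match, so skip it.
        if (!m_seen.insert({ ip, sp }).second)
            return false;
        ++steps;
        Insn const& insn = m_program[ip];
        switch (insn.op) {
        case Op::Char:
            return sp < m_input.size() && m_input[sp] == insn.ch && run(ip + 1, sp + 1);
        case Op::Fork:
            return run(insn.x, sp) || run(insn.y, sp);
        case Op::Match:
            return true;
        }
        return false;
    }

private:
    std::vector<Insn> m_program;
    std::string m_input;
    std::set<std::pair<std::size_t, std::size_t>> m_seen;
};
```

With the seen-set in place, the number of explored states is bounded by the program size times the input length plus one, so an input of fifty 'a' characters fails quickly instead of exploring an exponential number of paths.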
Ali Mohammad Pur
50733c564c LibRegex: Use the *actually* correct repeat start offset for Repeat
Fixes #2931 and various frequent crashes.
2024-12-23 13:13:52 +01:00
Ali Mohammad Pur
eee90f4aa2 LibRegex: Treat checks against nonexistent checkpoints as empty
Due to optimiser shenanigans in the tree alternative form, some
JumpNonEmpty ops might be moved before their Checkpoint instruction.
It is safe to assume the distance between the nonexistent checkpoint and
the current op is zero, so just do that.
2024-12-13 10:00:16 +01:00
Marc Jessome
efcaf991e6 LibRegex: Ensure nested capture groups have non-conflicting names
Take record of the named capture group prior to parsing the group's
body. This requires removal of the recorded minimum length of the named
capture group directly, and now needs to be looked up via the group
minimum lengths table.
2024-11-24 10:26:09 +01:00
Ali Mohammad Pur
5a4d657a4e LibRegex: Avoid generating ForkJumps when jumping to the next alt block
Fixes #2398.
2024-11-17 20:12:39 +01:00
Ali Mohammad Pur
00bc22c332 LibRegex: Don't immediately ignore TempInverse in optimizer
fe46b2c141 added the reset-temp-inverse flag, but set it up so all
tempinverse ops were negated at the start of the next op; this commit
makes it so these flags actually persist for one op and not zero.

Fixes #2296.
2024-11-17 09:03:29 -05:00
Ali Mohammad Pur
dabd60180f LibRegex: Don't ignore references that weren't bound in checked blocks
Fixes #2281.
2024-11-12 10:37:57 +01:00
Ali Mohammad Pur
00c45243bd LibRegex: Don't blindly accept inverted charclasses for atomic rewrite 2024-10-24 07:36:51 -04:00
Ali Mohammad Pur
cc1f0c3af2 LibRegex: Restore checkpoints when restoring the state post-fork
Fixes the lockup/OOM in #968.
2024-10-09 11:20:58 +02:00
Gingeh
de588a97c0 LibRegex: Only search start of line if pattern begins with ^ 2024-09-30 12:28:22 +02:00
Ali Mohammad Pur
27a38932da LibRegex: Account for extra explicit And/Or in class parser assertion
Fixes #23691.
2024-03-24 08:24:46 +01:00
Ali Mohammad Pur
e265d81277 LibRegex: Correct And/Or and inversion interplay semantics
This commit also fixes an incorrect test case from very early on; our
behaviour now matches the ECMA262 spec in this case.

Fixes #21786.
2024-01-11 11:36:09 +01:00
Ali Mohammad Pur
267040dde7 LibRegex: Error out on Eof when parsing nonempty class range elements
Fixes #22507.
2023-12-31 15:36:42 +01:00
Ali Mohammad Pur
5e1499d104 Everywhere: Rename {Deprecated => Byte}String
This commit un-deprecates DeprecatedString, and repurposes it as a byte
string.
As the null state has already been removed, there are no other
particularly hairy blockers in repurposing this type as a byte string
(what it _really_ is).

This commit is auto-generated:
  $ xs=$(ack -l \bDeprecatedString\b\|deprecated_string AK Userland \
    Meta Ports Ladybird Tests Kernel)
  $ perl -pie 's/\bDeprecatedString\b/ByteString/g;
    s/deprecated_string/byte_string/g' $xs
  $ clang-format --style=file -i \
    $(git diff --name-only | grep \.cpp\|\.h)
  $ gn format $(git ls-files '*.gn' '*.gni')
2023-12-17 18:25:10 +03:30
Timothy Flynn
e122039c99 LibRegex: Support non-ASCII case-insensitive character comparisons
Specifically, when the Unicode flag is set, use Unicode-aware case
folding to case-insensitively compare code points.
2023-11-08 12:54:26 -05:00
Timothy Flynn
3fbf33bd37 LibRegex: Change a couple function parameters to east-const
Automatically done by clang-format-17 (and clang-format-16 leaves these
alone afterwards).
2023-11-08 12:54:26 -05:00
Ali Mohammad Pur
4d71f4edc4 LibRegex: Don't add the Repeat instruction size to its jump target
This was causing the calculated jump target to become invalid, leading
to possibly invalid optimisations and (more likely) crashes.
Fixes #21047.
2023-09-15 18:07:23 +03:30
Ali Mohammad Pur
4d27257c45 LibRegex: Treat backwards jumps to IP 0 as normal backwards jumps too
This shows up in something like /\d+|x/, where the `+` ends up jumping
to the start of its own alternative.
2023-08-16 22:20:24 +03:30
Ali Mohammad Pur
e689422564 LibRegex: Keep track of instruction positions for backwards tree jumps 2023-08-05 16:40:04 +02:00
Ali Mohammad Pur
4e69eb89e8 LibRegex: Generate a search tree when patterns would benefit from it
This takes the previous alternation optimisation and applies it to all
the alternation blocks instead of just the few instructions at the
start.
By generating a trie of instructions, all logically equivalent
instructions will be consolidated into a single node, allowing the
engine to avoid checking the same thing multiple times.
For instance, given the pattern /abc|ac|ab/, this optimisation would
generate the following tree:
    - a
    | - b
    | | - c
    | | | - <accept>
    | | - <accept>
    | - c
    | | - <accept>
which will attempt to match 'a' or 'b' only once, and would also limit
the amount of backtracking performed in case an alternative fails to
match.

This optimisation is currently gated behind a simple cost model that
estimates the number of instructions generated, which is pessimistic for
small patterns, though the change in performance in such patterns is not
particularly large.
2023-07-31 05:31:33 +02:00
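The tree in the commit above can be approximated with an ordinary character trie. The sketch below is a simplification under stated assumptions: it works on literal strings rather than regex instructions, the `TrieNode`, `build_trie`, `count_nodes`, and `matches_any_prefix` names are hypothetical, and it ignores the priority order of alternatives and capture groups that a real engine must preserve. It requires C++14 or newer.

```cpp
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <vector>

struct TrieNode {
    std::map<char, std::unique_ptr<TrieNode>> children;
    bool accept = false; // true if some alternative ends at this node
};

// Inserts every alternative; shared prefixes end up sharing nodes.
inline TrieNode build_trie(std::vector<std::string> const& alternatives)
{
    TrieNode root;
    for (auto const& alternative : alternatives) {
        TrieNode* node = &root;
        for (char ch : alternative) {
            auto& child = node->children[ch];
            if (!child)
                child = std::make_unique<TrieNode>();
            node = child.get();
        }
        node->accept = true;
    }
    return root;
}

// Counts nodes, including the root.
inline std::size_t count_nodes(TrieNode const& node)
{
    std::size_t total = 1;
    for (auto const& entry : node.children)
        total += count_nodes(*entry.second);
    return total;
}

// Returns true if any alternative is a prefix of the input.
// Each input character is compared at most once.
inline bool matches_any_prefix(TrieNode const& root, std::string const& input)
{
    TrieNode const* node = &root;
    for (char ch : input) {
        if (node->accept)
            return true;
        auto it = node->children.find(ch);
        if (it == node->children.end())
            return false;
        node = it->second.get();
    }
    return node->accept;
}
```

For the alternatives "abc", "ac", and "ab", this produces the same shape as the tree in the commit message: one 'a' node with a 'b' child (itself accepting, with an accepting 'c' child) and an accepting 'c' child.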
Timothy Flynn
8b668da9d5 LibRegex: Bail parsing class set characters upon early EOF
Otherwise, we reach a skip() invocation at the end of this function,
which crashes due to EOF. Caught by test262.
2023-06-23 20:22:45 +02:00
Ali Mohammad Pur
b1ca2e5e39 LibRegex: Do not treat repeats followed by fallthroughs as atomic 2023-06-14 06:41:17 +02:00
Ali Mohammad Pur
eba466b8e7 LibRegex: Avoid calling GenericLexer::consume() past EOF
The consume(size_t) overload consumes "at most" as many bytes as
requested, but consume() consumes exactly one byte.
This commit makes sure to avoid consuming past EOF.

Fixes #18324.
Fixes #18325.
2023-04-14 12:33:54 +02:00
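The difference between an "at most" consume and an exact one-byte consume can be shown with a hypothetical lexer. `MiniLexer` below is not AK's GenericLexer and does not reproduce its API; the class, its `try_consume` method, and their behaviour are assumptions made for illustration only.

```cpp
#include <cstddef>
#include <string>
#include <utility>

class MiniLexer {
public:
    explicit MiniLexer(std::string input)
        : m_input(std::move(input))
    {
    }

    bool is_eof() const { return m_index >= m_input.size(); }

    // Consumes at most `count` bytes; asking for too many is harmless.
    std::string consume(std::size_t count)
    {
        std::size_t available = m_input.size() - m_index;
        std::size_t length = count < available ? count : available;
        std::string result = m_input.substr(m_index, length);
        m_index += length;
        return result;
    }

    // Consumes exactly one byte, but only after checking for EOF.
    // An unchecked single-byte consume would read past the end here.
    bool try_consume(char& out)
    {
        if (is_eof())
            return false;
        out = m_input[m_index++];
        return true;
    }

private:
    std::string m_input;
    std::size_t m_index = 0;
};
```

A parser built on such a lexer would call `try_consume` (or check `is_eof()` first) wherever the pattern may end unexpectedly, and report a parse error when it returns false.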
Ali Mohammad Pur
6fc9f5fa28 LibRegex: Make ^ and $ accept all LineTerminators instead of just '\n'
Also adds a couple tests.
2023-03-25 15:44:05 +01:00
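ECMA-262 defines four LineTerminator code points: LF (U+000A), CR (U+000D), LS (U+2028), and PS (U+2029). A minimal sketch of the check, independent of LibRegex's actual code, follows; the function names are hypothetical. The second function lists the offsets at which a multiline `^` may match, namely the start of the input and every position directly after a line terminator.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// True for the four LineTerminator code points listed in ECMA-262.
constexpr bool is_line_terminator(char32_t code_point)
{
    return code_point == 0x000A    // LF
        || code_point == 0x000D    // CR
        || code_point == 0x2028    // LS
        || code_point == 0x2029;   // PS
}

// Offsets where `^` may match in multiline mode.
inline std::vector<std::size_t> multiline_caret_positions(std::u32string const& text)
{
    std::vector<std::size_t> positions { 0 };
    for (std::size_t i = 0; i < text.size(); ++i) {
        if (is_line_terminator(text[i]))
            positions.push_back(i + 1);
    }
    return positions;
}
```

Note that a CR LF pair yields two positions under this rule, one after each code point, since both count as line terminators on their own.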
Ali Mohammad Pur
7f530c0753 LibRegex: Bail out of atomic rewrite if a block doesn't contain compares
If a block jumps before performing a compare, we'd need to recursively
find the first of the jumped-to block. While this is doable, it's not
really worth spending the time as most such cases won't actually qualify
for atomic loop rewrite anyway.
Fixes an invalid rewrite when `.+` is followed by an alternation, e.g.
/.+(a|b|c)/.
2023-02-15 10:14:26 +01:00
Ali Mohammad Pur
af441bb939 LibRegex: Consider the inverse=true case when finding pattern overlap
Previously we were only checking for overlap when the range wasn't in
inverse mode, which made us miss things like /[^x]x/; this patch makes
it so we don't miss that.
2023-02-15 10:14:26 +01:00
Ali Mohammad Pur
936a9fd759 LibRegex: Make '.' reject matching LF / LS / PS as per the ECMA262 spec
Previously we allowed it to match those, but the ECMA262 spec disallows
these (except in DotAll).
2023-02-15 10:14:26 +01:00
Ali Mohammad Pur
1e022295c4 Tests: Use .is_flag_set() instead of bitwise & in Regex flag tests
The default flag might not be zero, so don't assume masking off flags
will yield zero.
2023-02-15 10:14:26 +01:00
Linus Groh
6e7459322d AK: Remove StringBuilder::build() in favor of to_deprecated_string()
Having an alias function that only wraps another one is silly, and
keeping the more obvious name should flush out more uses of deprecated
strings.
No behavior change.
2023-01-27 20:38:49 +00:00
Timothy Flynn
1edb96376b AK+Everywhere: Make UTF-8 and UTF-32 to UTF-16 converters fallible
These could fail to allocate the underlying storage needed to store the
UTF-16 data. Propagate these errors.
2023-01-08 12:13:15 +01:00
Eli Youngs
87a961534f LibRegex: Prevent patterns from matching the empty string twice
Previously, if a pattern matched the empty string (e.g. ".*"), it would
match the string twice instead of once. Among other issues, this caused
a Regex replacement to duplicate its expected output, since it would
replace "both" empty matches.
2023-01-06 13:52:21 -07:00
Ben Wiederhake
8a331d4fa0 Everywhere: Move AK/Debug.h include to using files or remove 2023-01-02 20:27:20 -05:00
Ben Wiederhake
b83cb09db1 Everywhere: Fix badly-formatted includes
In 7c5e30daaa, the focus was "only" on
Userland/Libraries/, whereas this commit cleans up the remaining
headers in the repo, and any new badly-formatted include.
2023-01-02 11:06:15 -05:00
Linus Groh
57dc179b1f Everywhere: Rename to_{string => deprecated_string}() where applicable
This will make it easier to support both string types at the same time
while we convert code, and to track down remaining uses.

One big exception is Value::to_string() in LibJS, where the name is
dictated by the ToString AO.
2022-12-06 08:54:33 +01:00