ladybird

mirror of https://github.com/LadybirdBrowser/ladybird.git synced 2025-06-11 02:13:56 +09:00

Author	SHA1	Message	Date
Andrew Kaster	2fa9ec20bd	LibUnicode: Prefix AK::Duration with AK Namespace	2024-07-18 09:43:38 +01:00
Andrew Kaster	bd97442771	Meta: Add vulkan and vulkan-headers to vcpkg dependencies Also require a specific ICU version to not run into unexpected problems.	2024-07-06 01:44:58 +02:00
Timothy Flynn	672a555f98	LibCore+LibJS+LibUnicode: Port retrieving time zone offsets to ICU The changes to tests are due to LibTimeZone incorrectly interpreting time stamps in the TZDB. The TZDB will list zone transitions in either UTC or the zone's local time (which is then subject to DST offsets). LibTimeZone did not handle the latter at all. For example: The following rule is in effect until November 18, 6PM UTC. America/Chicago -5:50:36 - LMT 1883 Nov 18 18:00u The following rule is in effect until March 1, 2AM in Chicago time. But at that time, a DST transition occurs, so the local time is actually 3AM. America/Chicago -6:00 Chicago C%sT 1936 Mar 1 2:00	2024-06-26 10:14:02 +02:00
Timothy Flynn	1b2d47e6bb	LibJS+LibUnicode: Port retrieving available regional time zones to ICU	2024-06-26 10:14:02 +02:00
Timothy Flynn	4fc0fba646	LibCore+LibJS+LibUnicode: Port retrieving available time zones to ICU This required updating some LibJS spec steps to their latest versions, as the data expected by the old steps does not quite match the APIs that are available with the ICU. The new spec steps are much more aligned.	2024-06-26 10:14:02 +02:00
Timothy Flynn	d3e809bcd4	LibJS+LibUnicode: Port retrieving the system time zone to ICU	2024-06-26 10:14:02 +02:00
Timothy Flynn	c379b35798	LibUnicode: Move helper to convert StringEnumeration to a list to ICU.h This will be needed outside of UnicodeKeywords.cpp.	2024-06-26 10:14:02 +02:00
Andrew Kaster	a587eafbf4	CMake: Consistently use imported targets for third party dependencies	2024-06-25 17:15:42 -04:00
Timothy Flynn	ebdb92eef6	LibUnicode+Everywhere: Merge LibLocale back into LibUnicode LibLocale was split off from LibUnicode a couple years ago to reduce the number of applications on SerenityOS that depend on CLDR data. Now that we use ICU, both LibUnicode and LibLocale are actually linking in this data. And since vcpkg gives us static libraries, both libraries are over 30MB in size. This patch reverts the separation and merges LibLocale into LibUnicode again. We now have just one library that includes the ICU data. Further, this will let LibUnicode share the locale cache that previously would only exist in LibLocale.	2024-06-23 19:52:45 +02:00
Timothy Flynn	9220a89d2f	CI+LibUnicode: Remove the UCD from the system	2024-06-22 14:56:39 +02:00
Timothy Flynn	2ba7b4c529	LibUnicode: Remove now-unused code generator facilities	2024-06-22 14:56:39 +02:00
Timothy Flynn	069bed5d47	LibUnicode+LibGfx: Remove superfluous emoji metadata For SerenityOS, we parse emoji metadata from the UCD to learn emoji groups, subgroups, names, etc. We used this information only in the emoji picker dialog. It is entirely unused within Ladybird. This removes our dependence on the UCD emoji file, as we no longer need any of its information. All we need to know is the file path to our custom emoji, which we get from Meta/emoji-file-list.txt.	2024-06-22 14:56:39 +02:00
Timothy Flynn	aa3a30870b	LibUnicode: Replace code point bidirectional classes with ICU	2024-06-22 14:56:39 +02:00
Timothy Flynn	e77dafc987	LibUnicode: Replace code point scripts and script extensions with ICU	2024-06-22 14:56:39 +02:00
Timothy Flynn	986ff984cc	LibUnicode: Replace code point general categories with ICU	2024-06-22 14:56:39 +02:00
Timothy Flynn	c804bda5fd	LibUnicode: Replace code point properties with ICU	2024-06-22 14:56:39 +02:00
Timothy Flynn	ab56b8c8dc	LibUnicode: Remove the locale-unaware text segmentation implementation	2024-06-20 13:46:54 +02:00
Timothy Flynn	5cf818e305	LibUnicode: Replace case transformations and comparison with ICUs There are a couple of differences here due to using ICU: 1. Titlecasing behaves slightly differently. We previously transformed "123dollars" to "123Dollars", as we would use word segmentation to split a string into words, then transform the first cased character to titlecase. ICU doesn't go quite that far, and leaves the string as "123dollars". While this is a behavior change, the only user of this API is the `text-transform: capitalize;` CSS rule, and we now match the behavior of other browsers. 2. There isn't an API to compare strings with case insensitivity without allocating case-folded strings for both the left- and right-hand-side strings. Our implementation was previously allocation-free; however, in a benchmark, ICU is still ~1.4x faster.	2024-06-20 10:59:55 +02:00
Timothy Flynn	8d7216f4e0	LibUnicode: Replace IDNA ASCII conversion with ICU	2024-06-18 21:07:56 +02:00
Timothy Flynn	83475c5380	LibUnicode: Replace Unicode string normalization with ICU In a benchmark, ICU's implementation was over 3x faster than ours.	2024-06-18 21:07:56 +02:00
Timothy Flynn	1feef17bf7	LibUnicode: Remove completely unused code point name & block name data These were used for e.g. the Character Map on Serenity, but are not used at all for Ladybird.	2024-06-18 21:07:56 +02:00
Timothy Flynn	fe3fde2411	AK+LibUnicode: Implement a case-insensitive variant of find_byte_offset The existing String::find_byte_offset is case-sensitive. This variant allows performing searches using Unicode-aware case folding.	2024-06-01 07:37:54 +02:00
Andreas Kling	df547bb321	LibUnicode: Avoid redundant UTF-8 validation in AK::String helpers	2024-04-21 19:32:49 +02:00
Idan Horowitz	945c58c7c1	LibUnicode: Generate and use code point composition mappings These allow us to binary search the code point compositions based on the first code point being combined, which makes the search close to O(log N) instead of O(N).	2024-04-06 14:21:04 -04:00
Idan Horowitz	e227bf0f71	LibUnicode: Optimize the canonical composition algorithm implementation It now takes O(N) time instead of O(N^2) time. Additionally some always false conditions are removed.	2024-04-06 14:21:04 -04:00
Timothy Flynn	576c2f4f4d	LibURL+LibUnicode+LibWebView: Handle punycode directly in LibURL We had defined punycode handling in LibUnicode when LibURL (AK at the time) was unable to depend on LibUnicode. This is no longer the case.	2024-03-26 12:25:21 -04:00
Shannon Booth	e800605ad3	AK+LibURL: Move AK::URL into a new URL library This URL library ends up being a relatively fundamental base library of the system, as LibCore depends on LibURL. This change has two main benefits: * Moving AK back more towards being an agnostic library that can be used between the kernel and userspace. URL has never really fit that description - and is not used in the kernel. * URL _should_ depend on LibUnicode, as it needs punycode support. However, it's not really possible to do this inside of AK as it can't depend on any external library. This change brings us a little closer to being able to do that, but unfortunately we aren't there quite yet, as the code generators depend on LibCore.	2024-03-18 14:06:28 -04:00
Timothy Flynn	aa0a6d58b2	Userland: Remove LibCore dependency from libraries that do not use it	2024-01-22 08:48:34 -05:00
Ali Mohammad Pur	5e1499d104	Everywhere: Rename {Deprecated => Byte}String This commit un-deprecates DeprecatedString, and repurposes it as a byte string. As the null state has already been removed, there are no other particularly hairy blockers in repurposing this type as a byte string (what it _really_ is). This commit is auto-generated: $ xs=$(ack -l \bDeprecatedString\b\\|deprecated_string AK Userland \ Meta Ports Ladybird Tests Kernel) $ perl -pie 's/\bDeprecatedString\b/ByteString/g; s/deprecated_string/byte_string/g' $xs $ clang-format --style=file -i \ $(git diff --name-only \| grep \.cpp\\|\.h) $ gn format $(git ls-files '.gn' '.gni')	2023-12-17 18:25:10 +03:30
Timothy Flynn	43e9dc0500	LibUnicode: Use weak symbols to provide default IDNA definitions Rather than using #ifdef blocks, update the fallback IDNA definitions to use weak symbols to match the rest of LibUnicode / LibLocale.	2023-12-10 10:19:14 -05:00
Timothy Flynn	1f0e24bc3b	LibUnicode: Fix compilation when ENABLE_UNICODE_DATABASE_DOWNLOAD is OFF	2023-12-10 10:19:14 -05:00
Simon Wanner	58f08107b0	AK+LibUnicode: Add Unicode::create_unicode_url This is a workaround for the fact that AK::URLParser can't call into LibUnicode directly.	2023-12-10 08:04:58 -05:00
Simon Wanner	5bcb019106	LibUnicode: Add IDNA::to_ascii This implements the ToASCII operation of Unicode Technical Standard 46	2023-12-10 08:04:58 -05:00
Simon Wanner	7d9fe44039	LibUnicode: Download and parse IDNA data	2023-12-10 08:04:58 -05:00
Simon Wanner	cfd0a60863	LibUnicode: Add Punycode::encode	2023-12-10 08:04:58 -05:00
Simon Wanner	299d35aadc	LibUnicode: Add Punycode::decode	2023-12-10 08:04:58 -05:00
Shannon Booth	d777b279e3	LibUnicode+Tests: Remove now unused `to_unicode_*_full` methods Relocating all of the tests for these in LibUnicode over to the AK String testsuite.	2023-11-28 17:15:27 -05:00
Shannon Booth	6b32a1f18f	AK+LibUnicode: Expose TrailingCodePointTransformation in to_titlecase Relocating the definition of this enum from LibUnicode to AK.	2023-11-28 17:15:27 -05:00
Timothy Flynn	6070df40f3	LibUnicode: Define case-insensitive string comparison more generically The only user is currently String::equals_ignoring_case, but LibRegex will need to do the same case-folded comparison with UTF-32 data. As it turns out, the comparison works with all Unicode view types without much fuss.	2023-11-08 12:54:26 -05:00
Cr4xy	bbfe0d3a82	LibWeb: Implement `text-transform: capitalize`	2023-10-03 09:47:17 -04:00
Timothy Flynn	139c575cc9	LibUnicode: Update to Unicode version 15.1.0 https://unicode.org/versions/Unicode15.1.0/ This update includes a new set of code point properties, Indic Conjunct Break. These may have the values Consonant, Linker, or Extend. These are used in text segmentation to prevent breaking on some extended grapheme cluster sequences.	2023-09-15 18:30:26 +02:00
Timothy Flynn	02a8683266	LibUnicode+LibJS: Stop propagating small OOM errors from normalization This API only performs small allocations, and is only used by LibJS.	2023-09-09 13:03:25 -04:00
Sam Atkins	0d021a63c7	LibUnicode: Generate data for bidirectional character types This will let us examine code points to determine the rtl/ltr direction of a piece of text.	2023-08-20 16:21:35 -04:00
Timothy Flynn	456211932f	LibUnicode: Perform code point case conversion lookups in constant time Similar to commit `0652cc4`, we now generate 2-stage lookup tables for case conversion information. Only about 1500 code points are actually cased. This means that case information is rather highly compressible, as the blocks we break the code points into will generally all have no casing information at all. In total, this change: * Does not change the size of libunicode.so (which is nice because, generally, the 2-stage lookup tables are expected to trade a bit of size for performance). * Reduces the runtime of the new benchmark test case added here from 1.383s to 1.127s (about an 18.5% improvement).	2023-07-28 05:28:50 +02:00
Timothy Flynn	cb128dcf75	LibUnicode: Move the CodePointRangeComparator struct to a public header Move it out of the generated code so that it may be used by the code generator itself.	2023-07-26 08:36:20 +02:00
Timothy Flynn	c950f88611	LibUnicode: Stop generating Block property data We started generating this data in commit `0505e03`, but it was unused. It's still not used, so let's remove it, rather than bloating the size of libunicode.so with unused data. If we need it in the future, it's trivial to add back. Note we have always used the block name data from that commit, and that is still present here.	2023-07-26 08:36:20 +02:00
Timothy Flynn	1393ed2000	AK+LibUnicode: Implement String::equals_ignoring_case without allocating We currently fully casefold the left- and right-hand sides to compare two strings with case-insensitivity. Now, we casefold one code point at a time, storing the result in a view for comparison, until we exhaust both strings.	2023-03-08 18:57:53 +00:00
Timothy Flynn	f8a0365002	LibUnicode: Detect ZWJ sequences when filtering by emoji presentation This was preventing some unqualified emoji sequences from rendering properly, such as the custom SerenityOS flag. We rendered the flag correctly when given the fully qualified sequence: U+1F3F3 U+FE0F U+200D U+1F41E But were not detecting the unqualified sequence as an emoji when also filtering for emoji-presentation sequences: U+1F3F3 U+200D U+1F41E	2023-03-05 20:21:57 +01:00
Timothy Flynn	42c272c059	LibUnicode: Allow ignoring text presentation emoji in sequence detection This adds an option to only detect emoji that should always present as emoji. For example, the copyright symbol (unless followed by an emoji presentation selector) should render as text.	2023-02-28 13:22:58 +00:00
Timothy Flynn	fa96811a22	LibUnicode: Skip over emoji sequences in grapheme boundary segmentation Emoji sequences in the grapheme segmentation spec are a bit tricky: \p{Extended_Pictographic} Extend* ZWJ × \p{Extended_Pictographic} Our current strategy of tracking a boolean to indicate if we are in an emoji sequence was causing us to break up emoji made of multiple sub- sequences. For example, in the "family: man, woman, girl, boy" sequence: U+1F468 U+200D U+1F469 U+200D U+1F467 U+200D U+1F466 We would break at indices 0 (correctly) and 6 (incorrectly). Instead of tracking a boolean, it's quite a bit simpler to reason about emoji sequences by just skipping past them entirely. Note that in cases like the above emoji, we skip one sub-sequence at a time.	2023-02-25 22:23:39 +01:00
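
The approach described in commit fa96811a22 (skipping past emoji sub-sequences instead of tracking a boolean) can be illustrated with a minimal, self-contained sketch. This is not the LibUnicode implementation: the helper names (`is_extended_pictographic`, `is_extend`, `skip_emoji_subsequence`, `grapheme_boundaries`) are hypothetical, the property checks are rough approximations that only cover the code points mentioned in the log above, and every segmentation rule other than "do not break before Extend or ZWJ" and the emoji rule quoted in the commit message is ignored.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumption: crude stand-ins for real Unicode property lookups. They are
// only meant to classify the code points used in the examples from the log.
static bool is_extended_pictographic(char32_t cp)
{
    return cp >= 0x1F300 && cp <= 0x1FAFF;
}

static bool is_extend(char32_t cp)
{
    return cp == 0xFE0F; // emoji presentation selector only
}

static constexpr char32_t ZWJ = 0x200D;

// If text[start] begins "ExtPict Extend* ZWJ ExtPict", return the index of
// the trailing ExtPict. Otherwise return start unchanged.
static std::size_t skip_emoji_subsequence(std::vector<char32_t> const& text, std::size_t start)
{
    if (start >= text.size() || !is_extended_pictographic(text[start]))
        return start;

    std::size_t i = start + 1;
    while (i < text.size() && is_extend(text[i]))
        ++i;

    if (i + 1 < text.size() && text[i] == ZWJ && is_extended_pictographic(text[i + 1]))
        return i + 1;
    return start;
}

// Returns the code point indices at which a break is allowed, including the
// start and the end of the text.
std::vector<std::size_t> grapheme_boundaries(std::vector<char32_t> const& text)
{
    std::vector<std::size_t> boundaries;
    std::size_t i = 0;

    while (i < text.size()) {
        boundaries.push_back(i);

        // Skip one emoji sub-sequence at a time until none remains.
        for (;;) {
            std::size_t next = skip_emoji_subsequence(text, i);
            if (next == i)
                break;
            i = next;
        }

        ++i; // consume the (last) base code point of this cluster

        // Never break before Extend or ZWJ.
        while (i < text.size() && (is_extend(text[i]) || text[i] == ZWJ))
            ++i;
    }

    boundaries.push_back(text.size());
    return boundaries;
}
```

Under these assumptions, calling `grapheme_boundaries` on the "family: man, woman, girl, boy" sequence from the commit message (seven code points) yields breaks only at the start and the end of the text, since each `ExtPict ZWJ ExtPict` sub-sequence is skipped in turn.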

302 commits