mirror of
https://github.com/VSadov/Satori.git
synced 2025-06-09 09:34:49 +09:00
[browser][non-icu] HybridGlobalization
indexing (#85254)
* A bit faster version of indexing. WIP * Tiny speedup. * Fixed IndexOf, ToDo: LastIndexOf. * All tests pass. * Updated docs. * Update docs. * Slicing + saving previous absolute index instead of pushing the iterator to the start position. * Refactored. * Fixed tests on browser. * Str1 and str2 was confusing. * Fix CI- correctly trimming Hybrid properties. * Previous commit should target only Browser. * Applied @mkhamoyan's suggestion to avoid code duplication. * Applied @pavelsavara's review. * Get rid of build errors. * Revert.
This commit is contained in:
parent
1ffe321285
commit
6022b3e0c3
19 changed files with 384 additions and 101 deletions
|
@ -198,3 +198,50 @@ Web API does not expose locale-sensitive endsWith/startsWith function. As a work
|
|||
|
||||
- `IgnoreSymbols`
|
||||
Only comparisons that do not skip character types are allowed. E.g. `IgnoreSymbols` skips symbol-chars in comparison/indexing. All `CompareOptions` combinations that include `IgnoreSymbols` throw `PlatformNotSupportedException`.
|
||||
|
||||
|
||||
**String indexing**
|
||||
|
||||
Affected public APIs:
|
||||
- CompareInfo.IndexOf
|
||||
- CompareInfo.LastIndexOf
|
||||
- String.IndexOf
|
||||
- String.LastIndexOf
|
||||
|
||||
Web API does not expose locale-sensitive indexing function. There is a discussion on adding it: https://github.com/tc39/ecma402/issues/506. In the current state, as a workaround, locale-sensitive string segmenter combined with locale-sensitive comparison is used. This approach, beyond having the same compare option limitations as described under **String comparison**, has additional limitations connected with the workaround used. Information about additional limitations:
|
||||
|
||||
- Support depends on [`Intl.segmenter's support`](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Intl/Segmenter#browser_compatibility).
|
||||
|
||||
- `IgnoreSymbols`
|
||||
|
||||
Only comparisons that ignore types of characters but do not skip them are allowed. E.g. `IgnoreCase` ignores type (case) of characters but `IgnoreSymbols` skips symbol-chars in comparison/indexing. All `CompareOptions` combinations that include `IgnoreSymbols` throw `PlatformNotSupportedException`.
|
||||
|
||||
- Some letters consist of more than one grapheme.
|
||||
|
||||
Using locale-sensitive segmenter `Intl.Segmenter(locale, { granularity: "grapheme" })` does not guarantee that string will be segmented by letters but by graphemes. E.g. in `cs-CZ` and `sk-SK` "ch" is 1 letter, 2 graphemes. The following code with `HybridGlobalization` switched off returns -1 (not found) while with `HybridGlobalization` switched on, it returns 1.
|
||||
|
||||
``` C#
|
||||
new CultureInfo("sk-SK").CompareInfo.IndexOf("ch", "h"); // -1 or 1
|
||||
```
|
||||
|
||||
- Some graphemes consist of more than one character.
|
||||
E.g. `\r\n` that represents two characters in C#, is treated as one grapheme by the segmenter:
|
||||
|
||||
``` JS
|
||||
const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
|
||||
Array.from(segmenter.segment("\r\n")) // {segment: '\r\n', index: 0, input: '\r\n'}
|
||||
```
|
||||
|
||||
Because we are comparing grapheme-by-grapheme, character `\r` or character `\n` will not be found in `\r\n` string when `HybridGlobalization` is switched on.
|
||||
|
||||
- Some graphemes have multi-grapheme equivalents.
|
||||
E.g. in `de-DE` ß (%u00DF) is one letter and one grapheme and "ss" is one letter and is recognized as two graphemes. Web API's equivalent of `IgnoreNonSpace` treats them as the same letter when comparing. Similar case: dz (%u01F3) and dz.
|
||||
``` JS
|
||||
"ß".localeCompare("ss", "de-DE", { sensitivity: "case" }); // 0
|
||||
```
|
||||
|
||||
Using `IgnoreNonSpace` for these two with `HybridGlobalization` off, also returns 0 (they are equal). However, the workaround used in `HybridGlobalization` will compare them grapheme-by-grapheme and will return -1.
|
||||
|
||||
``` C#
|
||||
new CultureInfo("de-DE").CompareInfo.IndexOf("strasse", "stra\u00DFe", 0, CompareOptions.IgnoreNonSpace); // 0 or -1
|
||||
```
|
||||
|
|
Loading…
Add table
Add a link
Reference in a new issue