* WorkingOnIt
* It basically works for a single example.
Baseline
Loader Heap:
----------------------------------------
System Domain: 7ffab916ec00
LoaderAllocator: 7ffab916ec00
LowFrequencyHeap: Size: 0xf0000 (983040) bytes total.
HighFrequencyHeap: Size: 0x16a000 (1482752) bytes total, 0x3000 (12288) bytes wasted.
StubHeap: Size: 0x1000 (4096) bytes total.
FixupPrecodeHeap: Size: 0x168000 (1474560) bytes total.
NewStubPrecodeHeap: Size: 0x18000 (98304) bytes total.
IndirectionCellHeap: Size: 0x1000 (4096) bytes total.
CacheEntryHeap: Size: 0x1000 (4096) bytes total.
Total size: Size: 0x3dd000 (4050944) bytes total, 0x3000 (12288) bytes wasted.
Compare
Loader Heap:
----------------------------------------
System Domain: 7ff9eb49dc00
LoaderAllocator: 7ff9eb49dc00
LowFrequencyHeap: Size: 0xef000 (978944) bytes total.
HighFrequencyHeap: Size: 0x1b2000 (1777664) bytes total, 0x3000 (12288) bytes wasted.
StubHeap: Size: 0x1000 (4096) bytes total.
FixupPrecodeHeap: Size: 0x70000 (458752) bytes total.
NewStubPrecodeHeap: Size: 0x10000 (65536) bytes total.
IndirectionCellHeap: Size: 0x1000 (4096) bytes total.
CacheEntryHeap: Size: 0x1000 (4096) bytes total.
Total size: Size: 0x324000 (3293184) bytes total, 0x3000 (12288) bytes wasted.
LowFrequencyHeap is 4KB smaller
HighFrequencyHeap is 288KB bigger
FixupPrecodeHeap is 992KB smaller
NewStubPrecodeHeap is 32KB smaller
* If there isn't a parent MethodTable and the slot matches... then by definition the method is defining the slot
* Fix a couple more issues found when running a subset of the coreclr tests
* Get X86 building again
* Attempt to use a consistent api to force slots to be set
* Put cache around RequiresStableEntryPoint
* Fix typo
* Fix interop-identified issue where we sometimes set a non-Precode into an interface
* Move ARM and X86 to disable compact entry points
* Attempt to fix build breaks
* fix typo
* Fix another Musl validation issue
* More tweaks around NULL handling
* Hopefully the last NULL issue
* Fix more NULL issues
* Fixup obvious issues
* Fix allocation behavior so we don't free the data too early or too late
* Fix musl validation issue
* Fix tiered compilation
* Remove Compact Entrypoint logic
* Add new ISOSDacInterface15 api
* Fix some naming of NoAlloc to a more clear IfExists suffix
* Remove way in which GetTemporaryEntryPoint behaves differently for DAC builds, and then remove GetTemporaryEntrypoint usage from DAC entirely in favor of GetTemporaryEntryPointIfExists
* Attempt to reduce most of the use of EnsureSlotFilled. Untested, but it's late.
* Fix the build before sending to github
* Fix unix build break, and invalid assert
* Improve assertion checks to validate that we don't allocate temporary entrypoints that will be orphaned if the type doesn't actually end up published.
* Remove unused parameters and add contracts
* Update method-descriptor.md
* Fix musl validation issue
* Adjust SOS api to be an enumerator
* Fix assertion issues noted
Fix ISOSDacInterface15 to actually work
* Remove GetRestoredSlotIfExists
- It's the same as GetSlot; just replace it with that function.
* Update src/coreclr/debug/daccess/daccess.cpp
Co-authored-by: Jan Kotas <jkotas@microsoft.com>
* Update docs/design/coreclr/botr/method-descriptor.md
Co-authored-by: Jan Kotas <jkotas@microsoft.com>
* Update src/coreclr/vm/methodtable.inl
Co-authored-by: Jan Kotas <jkotas@microsoft.com>
* Update src/coreclr/vm/methodtable.h
Co-authored-by: Jan Kotas <jkotas@microsoft.com>
* Fix GetMethodDescForSlot_NoThrow
Try removing EnsureSlotFilled
Implement IsEligibleForTieredCompilation in terms of IsEligibleForTieredCompilation_NoCheckMethodDescChunk
* Fix missing change intended in last commit
* Fix some more IsPublished memory use issues
* Call the right GetSlot method
* Move another scenario to NoThrow, I think this should clear up our tests...
* Add additional IsPublished check
* Fix MUSL validation build error and Windows x86 build error
* Address code review feedback
* Fix classcompat build
* Update src/coreclr/vm/method.cpp
Co-authored-by: Aaron Robinson <arobins@microsoft.com>
* Remove assert that is invalid because TryGetMultiCallableAddrOfCode can return NULL ... and then another thread could produce a stable entrypoint and the assert could lose the race
* Final (hopefully) code review tweaks.
* It's possible for GetOrCreatePrecode to be called for cases where it isn't REQUIRED. We need to handle that case.
---------
Co-authored-by: Jan Kotas <jkotas@microsoft.com>
Co-authored-by: Aaron Robinson <arobins@microsoft.com>
This change makes access to statics much simpler to document and also removes some performance penalties that we've had for a long time due to the old model. Most statics access should be equivalent or faster.
This change converts static variables from a model where statics are associated with the module that defined the metadata of the static to a model where each individual type allocates its statics independently. In addition, it moves the flags that indicate whether or not a type is initialized, and whether or not its statics have been allocated to the `MethodTable` structures instead of storing them in a `DomainLocalModule` as was done before.
# Particularly notable changes
- All statics are now considered "dynamic" statics.
- Statics for collectible assemblies now have an identical path for lookup of the static variable addresses as compared to statics for non-collectible assemblies. It is now reasonable for the process of reading static variables to be inlined into shared generic code, although this PR does not attempt to do so.
- Lifetime management for collectible non-thread local statics is managed via a combination of a `LOADERHANDLE` to keep the static alive, and a new handle type called a `HNDTYPE_WEAK_INTERIOR_POINTER` which will keep the pointers to managed objects in the `MethodTable` structures up to date with the latest addresses of the static variables.
- Each individual type in thread statics has a unique object holding the statics for the type. This means that each type has a separate object[] (for GC statics) and/or double[] (for non-GC statics) per thread for TLS statics. This isn't necessarily ideal for non-collectible types, but it's not terrible either.
- Thread statics for collectible types are reported directly to the GC instead of being handled via a GCHandle. While this is needed to avoid complex lifetime rules for collectible types, it may not be ideal for non-collectible types.
- Since the `DomainLocalModule` no longer exists, the `ISOSDacInterface` has been augmented with a new API called `ISOSDacInterface14` which adds the ability to query for the static base/initialization status of an individual type directly.
- Significant changes for generated code include
- All the helpers are renamed
- The statics of generics which have not yet been initialized can now be referenced using a single constant pointer + a helper call instead of needing a pair of pointers. In practice, this was a rare condition in perf-critical code due to the presence of tiered compilation, so this is not a significant change to optimized code.
- The pre-initialization of statics can now occur for types which have non-primitive valuetype statics, as long as the type does not have a class constructor (see the sketch after this list).
- Thread static non-gc statics are now returned as byrefs. (It turns out that for collectible assemblies, there is currently a small GC hole if a function returns the address of a non-gc threadstatic. CoreCLR at this time does not attempt to keep the collectible assembly alive if that is the only live pointer to the collectible static in the system.)
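To make the pre-initialization bullet concrete, here is a hedged sketch (the `Point` and `Holder` names are invented for this illustration): a type whose statics are non-primitive valuetypes and which has no class constructor, so its statics can now be pre-initialized.

```csharp
public struct Point
{
    public int X;
    public int Y;
}

public static class Holder
{
    // A non-primitive valuetype static on a type with no class constructor
    // (no `static Holder()` and no field initializer that would require one),
    // so the runtime can now pre-initialize this type's statics.
    public static Point Origin;
}
```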
With this change, the pointers to normal static data are located at a fixed offset from the start of the `MethodTableAuxiliaryData`, and indices for thread static variables are also stored at such a fixed offset. Concepts such as the `DomainLocalModule`, `ThreadLocalModule`, `ModuleId` and `ModuleIndex` no longer exist.
# Lifetime management for collectible statics
- For normal collectible statics, each type will allocate a separate object[] for the GC statics and a double[] for the non-GC statics. A pointer to the data of these arrays will be stored in the `DynamicStaticsInfo` structure, and when relocation occurs, if the collectible type's managed `LoaderAllocator` is still alive, the static field address will be relocated if the object moves. This is done by means of the new Weak Interior Pointer GC handle type.
- For collectible thread-local statics, the lifetime management is substantially more complicated due to the issue that it is possible for either a thread or a collectible type to be collected first. Thus the collection algorithm is as follows.
- The system shall maintain a global mapping of TLS indices to MethodTable structures
- When a native `LoaderAllocator` is being cleaned up, before the WeakTrackResurrection GCHandle that points at the managed `LoaderAllocator` object is destroyed, the mapping from TLS indices to collectible `LoaderAllocator` structures shall be cleared of all relevant entries (and the current GC index shall be stored in the TLS to MethodTable mapping)
- When a GC promotion or collection scan occurs, for every TLS index which was freed to point at a GC index, the relevant entry in the TLS table shall be set to NULL in preparation for that entry in the table being reused in the future. In addition, if the TLS index refers to a `MethodTable` which is in a collectible assembly, and the associated `LoaderAllocator` has been freed, then the relevant entry shall be set to NULL.
- When allocating new entries from the TLS mapping table for new collectible thread local structures, do not re-use an entry in the table until at least 2 GCs have occurred. This is to allow every thread to have NULL'd out the relevant entry in its thread local table.
- When allocating new TLS entries for collectible TLS statics on a per-thread basis, allocate a `LOADERHANDLE` for each object allocated, and associate it with the TLS index on that thread.
- When cleaning up a thread, for each collectible thread static which is still allocated, we will have a `LOADERHANDLE`. If the collectible type still has a live managed `LoaderAllocator`, free the `LOADERHANDLE`.
# Expected cost model for extra GC interactions associated with this change
This change adds 3 possible ways in which the GC may have to perform additional work beyond what it used to do.
1. For normal statics on collectible types, it uses a weak interior pointer GC handle for each of these that is allocated. This is purely pay for play and trades off the performance of accessing collectible statics at runtime against the cost of maintaining a GCHandle in the GC. As the number of statics increases, this could in theory become a performance problem, but given the typical usages of collectible assemblies, we do not expect this to be significant.
2. For non-collectible thread statics, there is 1 GC pointer that is unconditionally reported for each thread. Usage of this removes a single indirection from every non-collectible thread-local access (see the sketch after this list). Given that this pointer is reported unconditionally, and is only a single pointer, this is not expected to be a significant cost.
3. For collectible thread statics, there is a complex protocol to keep thread statics alive for just long enough, and to clean them up as needed. This is expected to be completely pay for play with regard to usage of thread local variables in collectible assemblies, and while slightly more expensive to run than the current logic, will reduce the cost of creation/destruction of threads by a much more significant factor. In addition, if there are no collectible thread statics used on the thread, the cost of this is only a few branches per lookup.
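For reference, a minimal sketch of the kind of C# that exercises the non-collectible thread-statics path from point 2 (the type and field names are invented):

```csharp
using System;

public static class PerThreadCounter
{
    // A non-collectible thread static: each thread sees its own slot.
    // Per point 2 above, one unconditionally reported GC pointer per
    // thread buys back one indirection on every access like this.
    [ThreadStatic]
    private static int t_count;

    public static int Increment() => ++t_count;
}
```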
# Perf impact of this change
I've run the .NET Microbenchmark suite as well as a variety of ASP.NET Benchmarks. (Unfortunately the publicly visible infrastructure for running tests is incompatible with this change, so results are not public). The results are generally quite hard to interpret. ASP.NET Benchmarks are generally (very) slightly better, and the microbenchmarks are generally equivalent in performance, although there is variability in some tests that had not previously shown variability, and the differences in performance are contained within the margin of error in our perf testing for tests with any significant amount of code. When performance differences have been examined in detail, they tend to be in code which has not changed in any way due to this change, and when run in isolation the performance deltas have disappeared in all cases that I have examined. Thus, I assume they are side effects of changes in cache behavior. Performance testing has led me to add a change such that all NonGC, NonCollectible statics are allocated in a separate LoaderHeap, which appears to have reduced the variability in some of the tests by a small fraction, although results are not consistent enough for me to be extremely confident in that statement.
* Change the ReciprocalEstimate and ReciprocalSqrtEstimate APIs to be mustExpand on RyuJIT
* Apply formatting patch
* Fix the RV64 and LA64 builds
* Mark the ReciprocalEstimate and ReciprocalSqrtEstimate methods as AggressiveOptimization to bypass R2R
* Mark other usages of ReciprocalEstimate and ReciprocalSqrtEstimate in Corelib with AggressiveOptimization
* Mark several non-deterministic APIs as BypassReadyToRun and skip intrinsic expansion in R2R
* Cleanup based on PR recommendations to rely on the runtime rather than attributation of non-deterministic intrinsics
* Adding a regression test ensuring direct and indirect invocation of non-deterministic intrinsic APIs returns the same result
* Add a note about non-deterministic intrinsic expansion to the botr
* Apply formatting patch
* Ensure vector tests are correctly validating against the scalar implementation
* Fix the JIT/SIMD/VectorConvert test and workaround a 32-bit test issue
* Skip a test on Mono due to a known/tracked issue
* Ensure that lowering on Arm64 doesn't make an assumption about cast shapes
* Ensure the tier0opts local is used
* Ensure impEstimateIntrinsic bails out for APIs that need to be implemented as user calls
* Replace FEATURE_EH_FUNCLETS/FEATURE_EH_CALLFINALLY_THUNKS in JIT with runtime switch
* Cache Native AOT ABI check to see if TP improves
---------
Co-authored-by: Bruce Forstall <brucefo@microsoft.com>
The operation of `mkrefany` can easily be represented with more
generally handled nodes within the JIT today. This also allows promotion
to remain enabled for methods using this construct, so CQ improvements
are expected when optimizing.
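For context, `mkrefany` is the IL instruction the C# compiler emits for `__makeref`. A minimal sketch of code that produces it (the names here are invented for illustration); per the description above, promotion can now stay enabled for methods like this:

```csharp
using System;

static class TypedRefDemo
{
    static int ReadViaTypedReference()
    {
        int value = 42;
        TypedReference tr = __makeref(value); // the C# compiler emits `mkrefany` here
        return __refvalue(tr, int);           // and `refanyval` here
    }
}
```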
This adds a new phase meant for optimizing induction variables. It adds
infrastructure for SSA-based analysis of induction variables (scalar evolution
analysis), and uses it to do induction variable widening. For example, with
this optimization, codegen for
```csharp
[MethodImpl(MethodImplOptions.NoInlining)]
static int Foo(int[] arr)
{
    int sum = 0;
    for (int i = 0; i < arr.Length; i++)
    {
        sum += arr[i];
    }
    return sum;
}
```
goes from
```asm
; Assembly listing for method ConsoleApp34.Program:Foo(int[]):int (FullOpts)
; Emitting BLENDED_CODE for X64 with AVX - Windows
; FullOpts code
; optimized code
; rsp based frame
; fully interruptible
; No PGO data
; Final local variable assignments
;
; V00 arg0 [V00,T02] ( 4, 7 ) ref -> rcx class-hnd single-def <int[]>
; V01 loc0 [V01,T01] ( 4, 10 ) int -> rax
; V02 loc1 [V02,T00] ( 5, 17 ) int -> rdx
; V03 OutArgs [V03 ] ( 1, 1 ) struct (32) [rsp+0x00] do-not-enreg[XS] addr-exposed "OutgoingArgSpace"
; V04 cse0 [V04,T03] ( 3, 6 ) int -> r8 "CSE - aggressive"
;
; Lcl frame size = 40
G_M8112_IG01:
       sub      rsp, 40
       ;; size=4 bbWeight=1 PerfScore 0.25
G_M8112_IG02:
       xor      eax, eax
       xor      edx, edx
       mov      r8d, dword ptr [rcx+0x08]
       test     r8d, r8d
       jle      SHORT G_M8112_IG04
       align    [0 bytes for IG03]
       ;; size=13 bbWeight=1 PerfScore 3.75
G_M8112_IG03:
       mov      r10d, edx
       add      eax, dword ptr [rcx+4*r10+0x10]
       inc      edx
       cmp      r8d, edx
       jg       SHORT G_M8112_IG03
       ;; size=15 bbWeight=4 PerfScore 19.00
G_M8112_IG04:
       add      rsp, 40
       ret
       ;; size=5 bbWeight=1 PerfScore 1.25
; Total bytes of code 37, prolog size 4, PerfScore 24.25, instruction count 14, allocated bytes for code 37 (MethodHash=d1cce04f) for method ConsoleApp34.Program:Foo(int[]):int (FullOpts)
; ============================================================
```
to
```asm
; Assembly listing for method ConsoleApp34.Program:Foo(int[]):int (FullOpts)
; Emitting BLENDED_CODE for X64 with AVX - Windows
; FullOpts code
; optimized code
; rsp based frame
; fully interruptible
; No PGO data
; Final local variable assignments
;
; V00 arg0 [V00,T02] ( 4, 7 ) ref -> rcx class-hnd single-def <int[]>
; V01 loc0 [V01,T01] ( 4, 10 ) int -> rax
;* V02 loc1 [V02,T04] ( 0, 0 ) int -> zero-ref
; V03 OutArgs [V03 ] ( 1, 1 ) struct (32) [rsp+0x00] do-not-enreg[XS] addr-exposed "OutgoingArgSpace"
; V04 tmp1 [V04,T00] ( 5, 17 ) long -> r8 "Widened primary induction variable"
; V05 cse0 [V05,T03] ( 3, 6 ) int -> rdx "CSE - aggressive"
;
; Lcl frame size = 40
G_M8112_IG01:                ;; offset=0x0000
       sub      rsp, 40
       ;; size=4 bbWeight=1 PerfScore 0.25
G_M8112_IG02:                ;; offset=0x0004
       xor      eax, eax
       mov      edx, dword ptr [rcx+0x08]
       test     edx, edx
       jle      SHORT G_M8112_IG04
       xor      r8d, r8d
       align    [0 bytes for IG03]
       ;; size=12 bbWeight=1 PerfScore 3.75
G_M8112_IG03:                ;; offset=0x0010
       add      eax, dword ptr [rcx+4*r8+0x10]
       inc      r8d
       cmp      edx, r8d
       jg       SHORT G_M8112_IG03
       ;; size=13 bbWeight=4 PerfScore 18.00
G_M8112_IG04:                ;; offset=0x001D
       add      rsp, 40
       ret
       ;; size=5 bbWeight=1 PerfScore 1.25
; Total bytes of code 34, prolog size 4, PerfScore 23.25, instruction count 13, allocated bytes for code 34 (MethodHash=d1cce04f) for method ConsoleApp34.Program:Foo(int[]):int (FullOpts)
```
where we were able to drop a zero extension of the index inside the loop. In the
future I plan to build strength reduction on top of the same analysis package.
The analysis is inspired by [1] and by LLVM's scalar evolution package. It
provides a small IR that represents the evolving value of IR nodes inside loops.
At the core of this IR is the notion of an "add recurrence", which describes a
changing value as `<loop, start, step>`; the value of such an add recurrence is
$start + N * step$, where N is the iteration index. Currently only simple add
recurrences are supported where the start and step are either constants or
invariant locals, but the framework generalizes nicely to allow chains of
recurrences if we wish to support that. The IR also supports constants,
invariant locals and operators on top of these (casts, adds, multiplications and
shifts).
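As a hedged sketch of the semantics (the helper type below is invented for illustration and is not the JIT's actual representation), an add recurrence `<loop, start, step>` evaluates like this:

```csharp
// Value of <loop, start, step> at iteration N is start + N * step.
readonly struct AddRecurrence
{
    public readonly long Start;
    public readonly long Step;

    public AddRecurrence(long start, long step)
    {
        Start = start;
        Step = step;
    }

    public long Evaluate(long iteration) => Start + iteration * Step;
}

// The loop index `i` in Foo above is <L00, 0, 1>:
// new AddRecurrence(0, 1).Evaluate(5) == 5.
```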
For the IR for the above, the analysis produces the following:
```scala
Analyzing scalar evolution in L00 header: BB03
Members (1): BB03
Entry: BB02 -> BB03
Exit: BB03 -> BB04
Back: BB03 -> BB03
BB03 [0001] [006..016) -> BB03,BB04 (cond), preds={BB02,BB03} succs={BB04,BB03}
STMT00009 ( ??? ... ??? )
N004 ( 0, 0) [000045] DA--------- ▌ STORE_LCL_VAR int V01 loc0 d:3
N003 ( 0, 0) [000044] ----------- └──▌ PHI int
N001 ( 0, 0) [000050] ----------- pred BB03 ├──▌ PHI_ARG int V01 loc0 u:4
N002 ( 0, 0) [000047] ----------- pred BB02 └──▌ PHI_ARG int V01 loc0 u:2
[000047] => 0
STMT00007 ( ??? ... ??? )
N004 ( 0, 0) [000041] DA--------- ▌ STORE_LCL_VAR int V02 loc1 d:3
N003 ( 0, 0) [000040] ----------- └──▌ PHI int
N001 ( 0, 0) [000051] ----------- pred BB03 ├──▌ PHI_ARG int V02 loc1 u:4
N002 ( 0, 0) [000048] ----------- pred BB02 └──▌ PHI_ARG int V02 loc1 u:2
[000051] => <L00, 1, 1>
[000048] => 0
[000041] => <L00, 0, 1>
```
For example, here it was able to show that `V02`, the index, is a primary
induction variable; it is an add recurrence that starts at 0 and steps by 1
every iteration of the loop. It also showed that the value that comes from the
backedge is similarly an add recurrence, except that it starts at 1 in the first
iteration of the loop.
```scala
STMT00003 ( 0x006[E-] ... 0x00B )
N015 ( 8, 9) [000015] DA--GO----- ▌ STORE_LCL_VAR int V01 loc0 d:4
N014 ( 8, 9) [000014] ----GO----- └──▌ ADD int
N012 ( 6, 7) [000033] ----GO-N--- ├──▌ COMMA int
N001 ( 0, 0) [000025] ----------- │ ├──▌ NOP void
N011 ( 6, 7) [000034] n---GO----- │ └──▌ IND int
N010 ( 3, 5) [000032] -----O----- │ └──▌ ARR_ADDR byref int[]
N009 ( 3, 5) [000031] -------N--- │ └──▌ ADD byref
N002 ( 1, 1) [000022] ----------- │ ├──▌ LCL_VAR ref V00 arg0 u:1
N008 ( 4, 5) [000030] -------N--- │ └──▌ ADD long
N006 ( 3, 4) [000028] -------N--- │ ├──▌ LSH long
N004 ( 2, 3) [000026] ---------U- │ │ ├──▌ CAST long <- uint
N003 ( 1, 1) [000023] ----------- │ │ │ └──▌ LCL_VAR int V02 loc1 u:3
N005 ( 1, 1) [000027] -------N--- │ │ └──▌ CNS_INT long 2
N007 ( 1, 1) [000029] ----------- │ └──▌ CNS_INT long 16
N013 ( 1, 1) [000009] ----------- └──▌ LCL_VAR int V01 loc0 u:3 (last use)
[000022] => V00.1
[000023] => <L00, 0, 1>
[000026] => <L00, 0, 1>
[000027] => 2
[000028] => <L00, 0, 4>
[000029] => 16
[000030] => <L00, 16, 4>
[000031] => <L00, (V00.1 + 16), 4>
[000032] => <L00, (V00.1 + 16), 4>
```
This one is more interesting since we can see hints of how strength reduction is
going to utilize the information. In particular, the analysis was able to show
that the address `[000032]` is also an add recurrence; it starts at value
`(V00.1 + 16)` (the address of the first array element) and steps by `4` in
every iteration.
```scala
STMT00004 ( 0x00C[E-] ... 0x00F )
N004 ( 3, 3) [000019] DA--------- ▌ STORE_LCL_VAR int V02 loc1 d:4
N003 ( 3, 3) [000018] ----------- └──▌ ADD int
N001 ( 1, 1) [000016] ----------- ├──▌ LCL_VAR int V02 loc1 u:3 (last use)
N002 ( 1, 1) [000017] ----------- └──▌ CNS_INT int 1
[000016] => <L00, 0, 1>
[000017] => 1
[000018] => <L00, 1, 1>
[000019] => <L00, 1, 1>
STMT00002 ( 0x010[E-] ... 0x014 )
N005 ( 7, 7) [000008] ---X------- ▌ JTRUE void
N004 ( 5, 5) [000007] J--X---N--- └──▌ GT int
N002 ( 3, 3) [000006] ---X------- ├──▌ ARR_LENGTH int
N001 ( 1, 1) [000005] ----------- │ └──▌ LCL_VAR ref V00 arg0 u:1
N003 ( 1, 1) [000004] ----------- └──▌ LCL_VAR int V02 loc1 u:4
[000005] => V00.1
[000004] => <L00, 1, 1>
```
Here the analysis shows that the array object is invariant (otherwise analysis
would fail) and that the compared index is an add recurrence that starts at 1.
When the induction variable has uses after the loop the widening pass will store
the widened version back to the old local. This is only possible if all exits
where the old local is live-in are not critical blocks in the sense that all
their preds must come from inside the loop. Exit canonicalization ensures that
this is usually the case, but RBO/assertion prop may have uncovered new natural
loops, so we still have to repeat the check.
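A hedged sketch of the shape that triggers this store-back (the method is invented for illustration): the induction variable is live after the loop, so the widened copy must be written back to the old local on the exits.

```csharp
static class WideningDemo
{
    static int FirstZero(int[] arr)
    {
        int i = 0;
        for (; i < arr.Length; i++)
        {
            if (arr[i] == 0)
                break;
        }

        // `i` is used after the loop; if `i` was widened, the widened
        // value is stored back to the old local at the loop exits.
        return i;
    }
}
```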
[1] Michael Wolfe. 1992. Beyond induction variables. In Proceedings of the ACM
SIGPLAN 1992 conference on Programming language design and implementation (PLDI
'92). Association for Computing Machinery, New York, NY, USA, 162–174.
https://doi.org/10.1145/143095.143131
* Document JitDisasm and JitLateDisasm
As well as the emitter unit tests' usage of them.
Add all this content to viewing-jit-dumps.md, and rewrite
that slightly to accommodate.
Add more examples of things like method lists.
* Feedback
* Add ABI note about small return types
* Update docs/design/coreclr/botr/clr-abi.md
Co-authored-by: Jakob Botsch Nielsen <Jakob.botsch.nielsen@gmail.com>
* Fix fcall.h to match
* Set the CORINFO_EH_CLAUSE_SAMETRY on CORINFO_EH_CLAUSE
This change makes setting of the `CORINFO_EH_CLAUSE_SAMETRY` flag on `CORINFO_EH_CLAUSE` happen for coreclr too. It is a prerequisite for the port of exception handling from nativeaot to coreclr, and it is a no-op on coreclr with the old exception handling.
* Fix comments
* Add clr-abi note and r2rdump support for the flag
* Fix markdown LINT error
* Update docs/design/coreclr/botr/clr-abi.md
* Update docs/design/coreclr/botr/clr-abi.md
* Update the ABI doc
---------
Co-authored-by: Jan Kotas <jkotas@microsoft.com>
* CoreCLR and NativeAOT
* Add UnsafeAccessorAttribute API (see the usage sketch after this list)
* Implement IL generation for all accessor paths
* Implement static/instance field lookup - non-generic
* Implement static/instance method lookup - non-generic
* Defined ambiguity logic with respect to custom modifiers.
  - First pass ignores custom modifiers.
  - If ambiguity is detected, rerun the algorithm but require precise matching of custom modifiers.
  - If there is no clear match, throw AmbiguousImplementationException.
* Cleanup memory management confusion with ILStubResolver.
* Fix non-standard C++
* Remove CORINFO_MODULE_ALLACCESS scope
* Remove enum METHOD_TYPE.
* Update BOTR on TypeDesc
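As a usage illustration of the UnsafeAccessorAttribute API added above (the `Widget` type and accessor names are invented for this sketch):

```csharp
using System.Runtime.CompilerServices;

public class Widget
{
    private int _count;
    private Widget(int count) => _count = count;
}

public static class WidgetAccessors
{
    // ref-returning accessor for Widget's private field.
    [UnsafeAccessor(UnsafeAccessorKind.Field, Name = "_count")]
    public static extern ref int GetCount(Widget widget);

    // Accessor for Widget's private constructor.
    [UnsafeAccessor(UnsafeAccessorKind.Constructor)]
    public static extern Widget Create(int count);
}

// Usage: Widget w = WidgetAccessors.Create(5); int c = WidgetAccessors.GetCount(w);
```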
* Skip type validation by default in ReadyToRun images
- Technically, this is a breaking change, so I've provided a means for disabling the type validation skip
- The model is that the C# compiler won't get these details wrong, so disable the checks when run through crossgen2. The idea is that we'll get these things checked during normal, non-R2R usage of the app, and publish won't check these details.
* Replace expensive lookups of generic parameter and nested class data with R2R optimized forms
* Store index of MethodDesc as well as ChunkIndex. Makes MethodDesc::GetTemporaryEntryPoint *much* faster
* Optimize the path for computing if a method is eligible for tiered compilation
* Remove CanShareVtableChunksFrom concept
- it was only needed to support NGen
* Fix up some more issues
* Bring back late virtual propagation in the presence of covariant overrides only
* Check correct flag on EEClass
* Drive by fix for GetRestoredSlot. We don't need the handling of unrestored slots anymore
* Fix composite build with new tables
* Uniquify the mangled names
* Add more __
* Initial pass at the type validation skip checker
* Fix logging and some correctness issues
* Enable more of the type checking
- Notably, the recursive stuff now works
- Also fix a bug in constraint checking involving open types in the type system
* Fix build breaks involving new feature of GenericParameterDesc
* Add documentation for R2R format changes
Fix command line parameter to be more reasonable, and allow logging on command
Fix the rest of issues noted in crossgen2 testing
* Fix implementation of CompareMethodConstraints. Instead of using the IsGeneric map, check to see if the method is generic in the first place. It turns out we have an efficient way to check in every place that matters
* Fix nits noticed by Aaron
* Add some const correctness to the world
* Fix issues noted by Michal, as well as remaining constraint checking issues
* Code review details
* Code review from trylek
* Ensure Vector2/3/4, Quaternion, and Plane don't have a false dependency on Vector<T>
* Apply JIT formatting patch
* Fixing a build issue
* Handle an SPMI assert
* Fixing a couple small typos
* Remove getMaxIntrinsicSIMDVectorLength from the JIT/EE interface
* Update src/coreclr/vm/methodtablebuilder.cpp
---------
Co-authored-by: Jan Kotas <jkotas@microsoft.com>
* Add an undocumented switch to allow controlling the preferred vector width emitted implicitly by the JIT
* Resolving issues and responding to PR feedback
* Simplifying the xarch cpu info check
See https://github.com/markples/utils/tree/for-PR-dotnet-runtime-85847-others for the ILTransform tool. As usual, I recommend viewing the commit list, since it partitions the changes in a more readable way, and paying more attention to the manual changes.
* [ILTransform -public] Make test entrypoints accessible
* [ILTransform -ilfact] Main->TestEntryPoint, [Fact], remove OutputType=Exe (see the sketch after this list)
* Manual fixes for xUnit1013 - internal methods
* Add merged group
* Update porting-ryujit.md with info on merged test groups
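A hedged sketch of what the `-ilfact` transform above produces (the class name is invented; details of the merged-group wrappers are assumptions):

```csharp
using Xunit;

public class MyTest
{
    // Was `static int Main()` in the standalone test; now an accessible
    // [Fact] entrypoint so the test can run as part of a merged group.
    [Fact]
    public static int TestEntryPoint()
    {
        return 100; // 100 is the passing exit code for CoreCLR tests
    }
}
```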
* Implement analyzer for platform intrinsics use in System.Private.CoreLib
This analyzer detects the use of all platform intrinsics and checks that each one is either protected by an `if` statement or ternary operator that checks the appropriate IsSupported flag, or used within a method where platform support for the intrinsic is not allowed to vary between compile time and runtime. The analyzer attempts to be conservative about allowed patterns.
All existing code in System.Private.CoreLib has been annotated to avoid producing
errors.
See the markdown document for details.
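A minimal sketch of the guarded pattern the analyzer accepts (the helper below is illustrative, not taken from System.Private.CoreLib):

```csharp
using System.Numerics;
using System.Runtime.Intrinsics.X86;

static class BitHelper
{
    public static int PopCount(uint value)
    {
        if (Popcnt.IsSupported)
        {
            // Allowed: the intrinsic is guarded by its IsSupported check.
            return (int)Popcnt.PopCount(value);
        }

        // Software fallback for platforms without the instruction.
        return BitOperations.PopCount(value);
    }
}
```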
Co-authored-by: Jeremy Koritzinsky <jkoritzinsky@gmail.com>
* EnC non-functional changes
- Update inappropriate naming
- Update many logging statements
- Remove unused code
* EnC support for fields on generic types
* EnC support for methods on generic types
* Fix use after free introduced in EnC breakpoint.
Fix off by one for string logging.
* update new feature capabilities, JIT GUID
* Fix non-enc build
* Fix EnCFieldIndex check
* Remove IsFdPrivate assert
---------
Co-authored-by: Aaron R Robinson <arobins@microsoft.com>
Co-authored-by: Juan Hoyos <19413848+hoyosjs@users.noreply.github.com>
Co-authored-by: Tom McDonald <tommcdon@microsoft.com>
* Ensure EA_16BYTE is FEATURE_SIMD only and EA_32/64BYTE are TARGET_XARCH only
* Remove getSIMDSupportLevel as its now unnecessary
* Ensure canUseVexEncoding and canUseEvexEncoding are xarch only
* Don't make EA_16BYTE+ require FEATURE_SIMD
* Resolving formatting and build failures
* Adding back a check that shouldn't have been removed
* Replace the last two SIMDIntrinsic in LIR with NamedIntrinsic and delete GT_SIMD
* Applying formatting patch
* Ensure SIMD_UpperRestore/Save is handled in gtSetEvalOrder
* Handle some asserts on Arm64
This changes how the JIT matches method names and signatures for method
sets (e.g. JitDisasm). It also starts printing method instantiations for full method names
and makes references to types consistent in generic instantiations and the signature.
In addition, it starts supporting generic instantiations in release builds too.
To do this, most of the type printing is moved to the JIT, which also aligns the output
between crossgen2 and the VM (there were subtle differences here, like spaces between generic type arguments).
More importantly, we (for the most part) stop relying on JIT-EE methods that are documented to only be for debug purposes.
The new behavior of the matching is the following:
* The matching behavior is always string based.
* The JitDisasm string is a space-separated list of patterns. Patterns can arbitrarily
contain both '*' (match any characters) and '?' (match any 1 character).
* The string matched against depends on characters in the pattern:
+ If the pattern contains a ':' character, the string matched against is prefixed by the class name and a colon
+ If the pattern contains a '(' character, the string matched against is suffixed by the signature
+ If the class name (part before colon) contains a '[', the class contains its generic instantiation
+ If the method name (part between colon and '(') contains a '[', the method contains its generic instantiation
For example, consider
```
namespace MyNamespace
{
    public class C<T1, T2>
    {
        [MethodImpl(MethodImplOptions.NoInlining)]
        public void M<T3, T4>(T1 arg1, T2 arg2, T3 arg3, T4 arg4)
        {
        }
    }
}

new C<sbyte, string>().M<int, object>(default, default, default, default); // compilation 1
new C<int, int>().M<int, int>(default, default, default, default); // compilation 2
```
The full strings are:
Before the change:
```
MyNamespace.C`2[SByte,__Canon][System.SByte,System.__Canon]:M(byte,System.__Canon,int,System.__Canon)
MyNamespace.C`2[Int32,Int32][System.Int32,System.Int32]:M(int,int,int,int)
```
Notice no method instantiation and the double class instantiation, which seems like an EE bug. Also two different names are used for sbyte: System.SByte and byte.
After the change the strings are:
```
MyNamespace.C`2[byte,System.__Canon]:M[int,System.__Canon](byte,System.__Canon,int,System.__Canon)
MyNamespace.C`2[int,int]:M[int,int](int,int,int,int)
```
The following strings will match both compilations:
```
M
*C`2:M
*C`2[*]:M[*](*)
MyNamespace.C`2:M
```
The following will match only the first one:
```
M[int,*Canon]
MyNamespace.C`2[byte,*]:M
M(*Canon)
```
There is one significant change in behavior here, which is that I have removed the special case that allows matching class names without namespaces. In particular, today Console:WriteLine would match all overloads of System.Console.WriteLine, while after this change it will not match. However, with generalized wildcards the replacement is simply *Console:WriteLine.