mirror of https://github.com/VSadov/Satori.git synced 2025-06-12 02:30:29 +09:00
Commit graph

276 commits

Author SHA1 Message Date
Tanner Gooding
6d3cb53af9
Decompose some bitwise operations in HIR to allow more overall optimizations to kick in (#104517)
* Decompose some bitwise operations in HIR to allow more overall optimizations to kick in

* Ensure that we actually remove the underlying op

* Ensure the AND_NOT decomposition is still folded during import for minopts

* Ensure we propagate AllBitsSet into simd GT_XOR on xarch

* Ensure that we prefer AndNot over TernaryLogic

* Cleanup the TernaryLogic lowering code

* Ensure that TernaryLogic picks the best operand for containment

* Ensure we swap the operands that are being checked for containment

* Ensure that TernaryLogic is simplified where possible

* Apply formatting patch
2024-07-13 07:01:55 -07:00
Tanner Gooding
4addcaa7e3
Add some helper functions for getting the intrinsic ID to use for a given oper (#104498)
* Add some helper functions for getting the intrinsic ID to use for a given oper

* Make the Unix build happy

* Make the Arm64 build happy

* Respond to PR feedback

* Ensure we don't use EVEX unnecessarily

* Ensure zero diffs for x64
2024-07-06 17:35:46 -07:00
Andy Ayers
53a8a01fe1
Stack allocate unescaped boxes (#103361)
Enable object stack allocation for ref classes and extend the support to include boxed value classes. Use a specialized unbox helper for stack allocated boxes, both to avoid apparent escape of the box by the helper, and to ensure all box field accesses are visible to the JIT. Update the local address visitor to rewrite trees involving address of stack allocated boxes in some cases to avoid address exposure. Disable old promotion for stack allocated boxes (since we have no field handles) and allow physical promotion to enregister the box method table and/or payload as appropriate. In OSR methods handle the fact that the stack allocation may actually have been a heap allocation by the Tier0 method.

The analysis TP cost is around 0.4-0.7% (notes below). Boxes are much less likely to escape than ref classes (roughly ~90% of boxes escape, ~99.8% of ref classes escape). Codegen impact is diminished somewhat because many of the boxes are dead and were already getting optimized away.
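For illustration, a minimal sketch (constructed here, not taken from the PR) of the kind of non-escaping box this change lets the JIT stack allocate:

```csharp
public static class BoxExample
{
    public static int AddOne(int value)
    {
        // The box created here never escapes AddOne, so object stack allocation
        // can place it on the stack; the specialized unbox helper keeps the cast
        // below from being treated as an escape.
        object boxed = value;
        return (int)boxed + 1;
    }
}
```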
 
Fixes #4584, #9118, #10195, #11192, #53585, #58554, #85570

---------

Co-authored-by: Jakob Botsch Nielsen <jakob.botsch.nielsen@gmail.com>
Co-authored-by: Jan Kotas <jkotas@microsoft.com>
2024-07-01 06:54:49 -07:00
Tanner Gooding
fcdb6dba4c
Centralize the folding logic for ConditionalSelect and ensure side effects aren't dropped (#104175)
* Centralize the folding logic for ConditionalSelect and ensure side effects aren't dropped

* Ensure CndSel handles 64-bit operands where possible

* Don't fold if op3 has side effects

* Ensure that operands are passed into the lowered not->ternarylogic correctly
2024-06-30 09:28:46 -07:00
Mikhail Ablakatov
f21612abc1
Enable TYP_MASK support for ARM64 (#103818)
* Enable TYP_MASK support for ARM64

* cleanup: check for a FEATURE macro instead of TARGET

* jit format

---------

Co-authored-by: Kunal Pathak <Kunal.Pathak@microsoft.com>
2024-06-21 17:32:08 -07:00
Tanner Gooding
d7ae8c61f0
Add basic support for folding and normalizing hwintrinsic trees in morph (#103143)
* Add basic support for folding hwintrinsic trees in morph

* Reduce the amount of copying required to evaluate vector constants

* Have gtFoldExprHWIntrinsic handle side effects
2024-06-13 14:45:50 -07:00
David Wrighton
eb8f54d92b
Make normal statics simpler (#99183)
This change makes access to statics much simpler to document and also removes some performance penalties that we've had for a long time due to the old model. Most statics access should be equivalent or faster.

This change converts static variables from a model where statics are associated with the module that defined the metadata of the static to a model where each individual type allocates its statics independently. In addition, it moves the flags that indicate whether or not a type is initialized, and whether or not its statics have been allocated to the `MethodTable` structures instead of storing them in a `DomainLocalModule` as was done before.

# Particularly notable changes
- All statics are now considered "dynamic" statics.
- Statics for collectible assemblies now have an identical path for lookup of the static variable addresses as compared to statics for non-collectible assemblies. It is now reasonable for the process of reading static variables to be inlined into shared generic code, although this PR does not attempt to do so.
- Lifetime management for collectible non-thread local statics is managed via a combination of a `LOADERHANDLE` to keep the static alive, and a new handle type called a `HNDTYPE_WEAK_INTERIOR_POINTER` which will keep the pointers to managed objects in the `MethodTable` structures up to date with the latest addresses of the static variables.
- Each individual type with thread statics has a unique object holding the statics for the type. This means that each type has a separate object[] (for GC statics) and/or double[] (for non-GC statics) per thread for its TLS statics. This isn't necessarily ideal for non-collectible types, but it's not terrible either.
- Thread statics for collectible types are reported directly to the GC instead of being handled via a GCHandle. While needed to avoid complex lifetime rules for collectible types, this may not be ideal for non-collectible types.
- Since the `DomainLocalModule` no longer exists, the `ISOSDacInterface` has been augmented with a new api called `ISOSDacInterface14` which adds the ability to query for the static base/initialization status of an individual type directly.
- Significant changes for generated code include
  - All the helpers are renamed
  - The statics of generics which have not yet been initialized can now be referenced using a single constant pointer + a helper call instead of needing a pair of pointers. In practice, this was a rare condition in perf-critical code due to the presence of tiered compilation, so this is not a significant change to optimized code.
  - The pre-initialization of statics can now occur for types which have non-primitive valuetype statics as long as the type does not have a class constructor.
  - Thread static non-gc statics are now returned as byrefs. (It turns out that for collectible assemblies, there is currently a small GC hole if a function returns the address of a non-gc threadstatic. CoreCLR at this time does not attempt to keep the collectible assembly alive if that is the only live pointer to the collectible static in the system)

With this change, the pointers to normal static data are located at a fixed offset from the start of the `MethodTableAuxiliaryData`, and indices for Thread Static variables are also stored at such a fixed offset. Concepts such as the `DomainLocalModule`, `ThreadLocalModule`, `ModuleId` and `ModuleIndex` no longer exist.

# Lifetime management for collectible statics
- For normal collectible statics, each type will allocate a separate object[] for the GC statics and a double[] for the non-GC statics. A pointer to the data of these arrays will be stored in the `DynamicStaticsInfo` structure, and when relocation occurs, if the collectible type's managed `LoaderAllocator` is still alive, the static field address will be relocated if the object moves. This is done by means of the new Weak Interior Pointer GC handle type.
- For collectible thread-local statics, the lifetime management is substantially more complicated because either a thread or a collectible type may be collected first. Thus the collection algorithm is as follows.
  - The system shall maintain a global mapping of TLS indices to MethodTable structures
  - When a native `LoaderAllocator` is being cleaned up, before the WeakTrackResurrection GCHandle that points at the managed `LoaderAllocator` object is destroyed, the mapping from TLS indices to collectible `LoaderAllocator` structures shall be cleared of all relevant entries (and the current GC index shall be stored in the TLS to MethodTable mapping)
  - When a GC promotion or collection scan occurs, for every TLS index which was freed to point at a GC index the relevant entry in the TLS table shall be set to NULL in preparation for that entry in the table being reused in the future. In addition, if the TLS index refers to a `MethodTable` which is in a collectible assembly, and the associated `LoaderAllocator` has been freed, then set the relevant entry to NULL.
  - When allocating new entries from the TLS mapping table for new collectible thread local structures, do not re-use an entry in the table until at least 2 GCs have occurred. This is to allow every thread to have NULL'd out the relevant entry in its thread local table.
  - When allocating new TLS entries for collectible TLS statics on a per-thread basis allocate a `LOADERHANDLE` for each object allocated, and associate it with the TLS index on that thread.
  - When cleaning up a thread, for each collectible thread static which is still allocated, we will have a `LOADERHANDLE`. If the collectible type still has a live managed `LoaderAllocator` free the `LOADERHANDLE`.

# Expected cost model for extra GC interactions associated with this change
This change adds 3 possible ways in which the GC may have to perform additional work beyond what it used to do.
1. For normal statics on collectible types, it uses a weak interior pointer GC handle for each of these that is allocated. This is purely pay for play: it trades the cost of maintaining a GCHandle in the GC for faster access to collectible statics at runtime. As the number of statics increases, this could in theory become a performance problem, but given the typical usages of collectible assemblies, we do not expect this to be significant.
2. For non-collectible thread statics, there is 1 GC pointer that is unconditionally reported for each thread. Usage of this removes a single indirection from every non-collectible thread local access. Given that this pointer is reported unconditionally, and is only a single pointer, this is not expected to be a significant cost.
3. For collectible thread statics, there is a complex protocol to keep thread statics alive for just long enough, and to clean them up as needed. This is expected to be completely pay for play with regard to usage of thread local variables in collectible assemblies, and while slightly more expensive to run than the current logic, will reduce the cost of creation/destruction of threads by a much more significant factor. In addition, if there are no collectible thread statics used on the thread, the cost of this is only a few branches per lookup.

# Perf impact of this change
I've run the .NET Microbenchmark suite as well as a variety of ASP.NET Benchmarks. (Unfortunately the publicly visible infrastructure for running tests is incompatible with this change, so results are not public). The results are generally quite hard to interpret. ASP.NET Benchmarks are generally (very) slightly better, and the microbenchmarks are generally equivalent in performance, although there is variability in some tests that had not previously shown variability, and the differences in performance are contained within the margin of error in our perf testing for tests with any significant amount of code. When performance differences have been examined in detail, they tend to be in code which has not changed in any way due to this change, and when run in isolation the performance deltas have disappeared in all cases that I have examined. Thus, I assume they are caching side effect changes. Performance testing has led me to add a change such that all NonGC, NonCollectible statics are allocated in a separate LoaderHeap which appears to have reduced the variability in some of the tests by a small fraction, although results are not consistent enough for me to be extremely confident in that statement.
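For orientation, a hedged sketch (the assembly path and type name are hypothetical) of the collectible-statics scenario whose lifetime rules are described above:

```csharp
using System;
using System.Reflection;
using System.Runtime.Loader;

// Types loaded into a collectible AssemblyLoadContext get their own static
// storage under the new model; the weak interior pointer handles and the
// TLS-index protocol described above keep that storage valid exactly as long
// as the type's LoaderAllocator is alive.
var alc = new AssemblyLoadContext("plugin", isCollectible: true);
Assembly asm = alc.LoadFromAssemblyPath(@"C:\plugins\Plugin.dll"); // hypothetical path
Type counter = asm.GetType("Plugin.Counter")!;                     // hypothetical type
counter.GetMethod("Increment")!.Invoke(null, null);                // touches the type's statics
alc.Unload();                                                      // statics may now be collected
```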
2024-06-12 20:54:31 -07:00
Tanner Gooding
2c540e5c55
Adding some constant folding support for basic floating-point operations (#103206)
* Adding some constant folding support for basic floating-point operations

* Use gtWrapWithSideEffects and respond to PR feedback

* Make sure we set DEBUG_NODE_MORPHED on the comma
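As a small constructed illustration (not from the PR), the kind of floating-point expression that only becomes constant after inlining and can now be folded:

```csharp
public static class FoldExample
{
    public static double Half(double d) => d * 0.5;

    // After Half is inlined, the JIT sees 10.0 * 0.5 with two constant operands
    // and can fold the multiply to the constant 5.0 instead of emitting it.
    public static double Five() => Half(10.0);
}
```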
2024-06-12 19:12:15 -07:00
Tanner Gooding
96be3e2e81
Share more of the TYP_MASK handling and support rewriting TYP_MASK operands in rationalization (#103288)
* Share more of the TYP_MASK handling and support rewriting TYP_MASK operands in rationalization

* Ensure we pass in TYP_MASK, not the simdType

* Apply formatting patch

* Fix copy/paste error, pass in clsHnd for the argument

* Ensure that we normalize sigType before inserting the CvtMaskToVectorNode

* Ensure that we get the vector node on Arm64 (ConvertVectorToMask has 2 ops)
2024-06-12 00:01:54 -07:00
Khushal Modi
b5948bf403
AVX10.1 API introduction in JIT (#101938)
* Add AVX10v1 API surface

* Define HWINTRINSIC for AVX10v1, AVX10v1_V256 and AVX10v1_V512

* Setup template testing for AVX10v1 APIs

* Handle AVX10v1 APIs in JIT where equivalent AVX512* APIs are handled

* Merge Avx10v1 and Avx10v1.V256. Rename Avx10.cs to Avx10v1.cs

* Add Avx10v1 to relevant places

* Fix CI errors. Add missing API in Avx10v1.PlatformNotSupported and end the file with a newline character

* Changes to be made with latest changes on main. Make appropriate comments. Update tests in template testing for Avx10v1

* Lower AVX10v1 hwintrinsic in lowering and gentree.cpp for simdSize 32/16

* Fix failures on GNR for AVX10v1

* Disable template tests disabled for Avx512

* Distinguish between Avx10v1 and Avx10v1/512, Add appropriate comments and clean up code in lowerCast

* Remove duplicate code and rather use a single if condition

* Use bool instead of compIsa checks where possible

* remove duplication of code in shuffle

* resolve review comments. Make evex encoding checks clearer to read and resolve a bug in gtNewSimdCvtNode

* Add FMA and Avx512F.X64 instructions to AVX10v1. Restructure code and compOpportunistic checks

* Combine compOpportunistic checks with Avx10 check using IsAvx10OrIsaSupportedOpportunistically

* Introduce a new internal ISA InstructionSet_EVEX and remove InstructionSet_AVX10v1_V256 to make space for the new ISA. Also change all the internal special intrinsic nodes for Avx512F on x86/x64 arch to evex nodes

* Addressing review comments. resolving errors introduced when merged with main

* fix formatting

* Reorder declaration of InstructionSet_EVEX to its proper position. Run formatting and resolve errors introduced when merging with main
2024-06-08 20:03:38 -07:00
Jakob Botsch Nielsen
e867965446
JIT: Propagate LCL_ADDR nodes during local morph (#102808)
This changes local morph to run in RPO when optimizations are enabled. It adds
infrastructure to track and propagate LCL_ADDR values assigned to locals during
local morph. This allows us to avoid address exposure in cases where the
destination local does not actually end up escaping in any way.

Example:
```csharp
public struct Awaitable
{
    public int Opts;

    public Awaitable(bool value)
    {
        Opts = value ? 1 : 2;
    }
}

[MethodImpl(MethodImplOptions.NoInlining)]
public static int Test() => new Awaitable(false).Opts;
```

Before:
```asm
G_M59043_IG01:  ;; offset=0x0000
       push     rax
						;; size=1 bbWeight=1 PerfScore 1.00

G_M59043_IG02:  ;; offset=0x0001
       xor      eax, eax
       mov      dword ptr [rsp], eax
       mov      dword ptr [rsp], 2
       mov      eax, dword ptr [rsp]
						;; size=15 bbWeight=1 PerfScore 3.25

G_M59043_IG03:  ;; offset=0x0010
       add      rsp, 8
       ret
						;; size=5 bbWeight=1 PerfScore 1.25
; Total bytes of code: 21

```

After:
```asm
G_M59043_IG02:  ;; offset=0x0000
       mov      eax, 2
						;; size=5 bbWeight=1 PerfScore 0.25

G_M59043_IG03:  ;; offset=0x0005
       ret
```

Propagating the addresses works much like local assertion prop in morph does.
Proving that the locals that were stored to do not escape afterwards is done
with a simplistic approach: we check globally that no reads of the locals
exist, and if so, we replace the `LCL_ADDR` stored to them by a constant 0. We
leave it up to liveness to clean up the stores themselves.

If we were able to remove any `LCL_ADDR` in this way then we run an additional
pass over the locals of the IR to compute the final set of exposed locals.

This could be more sophisticated, but in practice this handles the reported
cases just fine.

Fix #87072
Fix #102273
Fix #102518

This is still not sufficient to handle #69254. To handle that we would need more
support around tracking the values of struct fields, and handling of promoted
fields. This PR currently does not handle promoted fields at all; we use
`lvHasLdAddrOp` as a conservative approximation of address exposure on the
destination locals, and promoted structs almost always have this set. If we were
to handle promoted fields we would need some other way to determine that a
destination holding a local address couldn't be exposed.
2024-05-31 21:06:40 +02:00
Egor Bogatov
4326eb7ed4
Fix unnecessary bounds check with ulong index (#101352) 2024-05-01 14:50:21 +02:00
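A hedged sketch of the pattern this fix targets (the shape is inferred from the title; the actual repro is in #101352):

```csharp
public static class IndexExample
{
    public static byte Read(byte[] data, ulong i)
    {
        // The guard proves i is within data.Length, so the bounds check that the
        // JIT previously emitted for the ulong-indexed element access below is
        // unnecessary and can now be removed.
        if (i < (ulong)data.Length)
            return data[i];
        return 0;
    }
}
```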
Jakob Botsch Nielsen
4965f2160d
JIT: Model JIT helper exceptions correctly in VN (#101062)
The JIT currently models all exceptions thrown by helpers with a
singleton VN. This can cause CSE to remove exception side effects
incorrectly.

This change starts modelling exceptions thrown by the following helpers
accurately:
- The R2R cast helper
- Division helpers
- Static constructor base helpers

Remaining JIT helpers are modelled conservatively, with an opaque VN in
the exception part.

Contributes to #63559
Fix #101028
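A constructed example (not from the PR) of why a single shared exception VN is unsafe for the division helpers:

```csharp
public static class DivideExample
{
    public static int Sum(int x, int a, int b)
    {
        // Each division can throw for a different divisor (DivideByZeroException,
        // or OverflowException for int.MinValue / -1). If both exception side
        // effects were modelled by one singleton VN, CSE could treat the second
        // division's exception as redundant with the first and drop it.
        int r1 = x / a;
        int r2 = x / b;
        return r1 + r2;
    }
}
```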
2024-04-17 16:54:51 +02:00
Aman Khalid
ec7648702c
JIT: Add GT_SWIFT_ERROR_RET to represent loading error register upon return (#100692)
Follow-up to #100429. If a method has a `SwiftError*` out parameter, a new phase -- `fgAddSwiftErrorReturns` -- converts all `GT_RETURN` nodes into `GT_SWIFT_ERROR_RET` nodes; this new node type is a binop that takes the error value as its first operand, and the normal return value (if there is one) as its second operand. The error value is loaded into the Swift error register upon returning.
2024-04-12 15:26:00 -04:00
Aman Khalid
2c824aed7b
JIT: Allow helper calls that always throw to be marked as no-return (#100900)
Fixes #100458 by addressing two issues:

When flagging a call node as no-return with GTF_CALL_M_DOES_NOT_RETURN, we should always increment Compiler::optNoReturnCallCount to avoid asserts in Compiler::fgTailMergeThrows. Previously, we weren't doing this in a unified place, which seemed error-prone.
When incrementing the optNoReturnCallCount member of an inlinee compiler, ensure this information is propagated to the inlinee's parent compiler. In a similar vein, if we try to inline a call, and the inlinee compiler determines it does not return, make sure we increment optNoReturnCallCount in the parent compiler object if the inline fails -- we've kept the call, and we now know it doesn't return.
With these changes, I can now mark helper calls that always throw as no-return; this unblocks morph to convert BBJ_ALWAYS blocks with helper calls that throw into BBJ_THROW blocks, and has the nice side effect of improving the efficacy of throw merging. Since I was touching relevant code, I decided to improve our usage of GenTreeCall::IsHelperCall, too.
2024-04-12 15:06:35 -04:00
Egor Bogatov
674ba3fade
JIT: Track sideness of arrOp in GetCheckedBoundArithInfo (#100848) 2024-04-10 19:32:06 +02:00
SingleAccretion
d9dcc58d95
Remove more ASG terminology from the codebase (#86760)
* Excise 'assignment' terminology from the codebase

* Standardize on the 'value' terminology for store operands

But only in the frontend; backend has lots of "data"s, and it did not seem purposeful renaming them.
2024-04-05 11:57:14 -07:00
Filip Navara
41b1091890
Replace FEATURE_EH_FUNCLETS in JIT with runtime switch (#99191)
* Replace FEATURE_EH_FUNCLETS/FEATURE_EH_CALLFINALLY_THUNKS in JIT with runtime switch

* Cache Native AOT ABI check to see if TP improves

---------

Co-authored-by: Bruce Forstall <brucefo@microsoft.com>
2024-04-05 11:46:18 -07:00
Bruce Forstall
862c82f0e9
Update JIT sources to clang-format/clang-tidy 17.0.6 (#100498)
* Update JIT sources to clang-format/clang-tidy 17.0.6

* Reformat

* Reformat x86
2024-04-03 14:43:36 -07:00
Bruce Forstall
c4796a3626
Convert asserts in CEEInfo::getStaticFieldContent() to 'if' checks (#100320)
* Add additional checks to optimization of constant static field loads

In `fgGetStaticFieldSeqAndAddress`, if we have a static field address
altered by a tree of `ADD CNS_INT` nodes, we need to verify that the
address is within the found field sequence. It might not be after
shared constant CSE kicks in (e.g., under OptRepeat), where the
sequence of ADDs might alter an arbitrary constant address from
one type into the address of the static field of a different type.
So we can't use the FieldSeq of the base address when considering
the full offset.

* Review feedback

1. Use `eeGetFieldName` / `eeGetClassName` return pointer
2. Only query extra metadata under `verbose || JitConfig.EnableExtraSuperPmiQueries()`

* Convert asserts in CEEInfo::getStaticFieldContent() to 'if' checks

* Make fgGetStaticFieldSeqAndAddress static

* Code review feedback
2024-03-27 13:36:40 -07:00
Jan Kotas
503970c14c
Simplify floating point mod and round math jit helpers (#100222)
Co-authored-by: Michał Petryka <paprikapolishgamer@gmail.com>
2024-03-26 13:23:03 -07:00
Bruce Forstall
c1fc1eef94
Remove GetFoldedArithOpResultHandleFlags (#100060)
For any constant arithmetic on a handle, lose the handle type:
it's unreliable.

Eliminates problems seen in https://github.com/dotnet/runtime/issues/100059
2024-03-21 09:50:05 -07:00
Alan Hayward
12d96ccfae
JIT ARM64-SVE: Allow LCL_VARs to store as mask (#99608)
* JIT ARM64-SVE: Allow LCL_VARs to store as mask

* Remove FEATURE_MASKED_SIMD

* More generic ifdefs

* Add varTypeIsSIMDOrMask

* Add extra type checks

* Fix use of isValidSimm9, and add extra uses

* Rename mask conversion functions to gtNewSimdConvert*

* Add OperIs functions

* Mark untested uses of mov

* Add INS_SCALABLE_OPTS_PREDICATE_DEST

* Valuenum fixes for tier 1

* Remove importer changes

* XARCH versions of OperIsConvertMaskToVector

* Revert "Remove importer changes"

This reverts commit b5502a6968c1304986f206ea6ac9de9d2fb63f7d.

* Add tests for emitIns_S_R and emitIns_R_S

* Fix formatting

* Reapply "Remove importer changes"

This reverts commit d8dea0e83c2318a4638d9beea11d3d188c2d5fa2.

* Use dummy mask ldr and str

* Refactor emitIns_S_R and emitIns_R_S

* Move str_mask/ldr_mask

* Fix formatting

* Set imm

* fix assert

* Fix assert (2)

* Fix assert (3)

* nop
2024-03-21 09:38:45 -07:00
Bruce Forstall
ca905a2a34
Add basic support for TYP_MASK constants (#99743)
This is to support fixing JitOptRepeat (https://github.com/dotnet/runtime/pull/94250).
I was seeing failures in a Tensor test where `TYP_MASK`-generating
instructions were getting CSE'd. When OptRepeat kicks in
and runs VN over the new IR, it wants to create a "zero" value
for the new CSE locals.

This change creates a `TYP_MASK` constant type, `simdmask_t`, like the
pre-existing `simd64_t`, `simd32_t`, etc. `simdmask_t` is basically a
`simd8_t` type, but with its own type. I expanded basically every place
that generally handles `simd64_t` with `simdmask_t` support. This might be
more than we currently need, but it seems to be a reasonable step towards
making `TYP_MASK` more first-class. However, I didn't go so far as to
support load/store of these constants, for example.
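A rough illustration (constructed, not from the change) of where `TYP_MASK` values come from in user code:

```csharp
using System.Runtime.Intrinsics;

public static class MaskExample
{
    // On AVX-512 capable hardware the comparison below produces a predicate
    // mask (TYP_MASK) internally before ConditionalSelect consumes it. When
    // such mask-producing nodes get CSE'd under JitOptRepeat, VN needs a way
    // to represent a "zero" TYP_MASK constant -- the simdmask_t added here.
    public static Vector512<float> Max(Vector512<float> a, Vector512<float> b)
        => Vector512.ConditionalSelect(Vector512.GreaterThan(a, b), a, b);
}
```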
2024-03-14 10:14:18 -07:00
Aman Khalid
40d1c8975d
JIT: Use successor edges instead of block targets for remaining block kinds (#98993)
Part of #93020. Replaces BasicBlock::bbTarget/bbFalseTarget/bbTrueTarget with FlowEdge* members.
2024-02-27 22:16:59 -05:00
Egor Bogatov
51b51bffbb
Remove GT_STORE_DYN_BLK (#98905) 2024-02-27 11:31:21 +01:00
Aman Khalid
597d647327
JIT: Support storing Swift error register value to SwiftError struct (#98586)
Adds JIT support for storing the error result of a Swift call to the provided SwiftError struct, assuming the caller passed a SwiftError* argument. The LSRA changes assume the presence of a SwiftError* argument in a Swift call indicates the error register may be trashed, and thus should be killed until consumed by GT_SWIFT_ERROR, a new GenTree node for representing the value of the error register post-Swift call. Similarly, these changes also assume the lack of a SwiftError* argument indicates the Swift call cannot throw, and thus will not trash the error register; thus, the Swift call should not block the register's usage.
2024-02-26 16:12:31 -05:00
Bruce Forstall
96bee8dcab
Fix handle dumping for AOT scenarios (#98728)
Revert #97573 to previous behavior (not dumping handle strings) for
NativeAOT and R2R compiles; those require more work to find the
handle to use.
2024-02-21 09:54:24 -08:00
Egor Bogatov
8fb9f4b9fa
JIT: Fold more casts (#98528) 2024-02-21 10:47:21 +01:00
Filip Navara
1be948c959
Prevent incorrect constant folding (#98561)
* Prevent incorrect constant folding of binary operations involving handle and integer

* Fix the conditions for null nodes

* Fix the last commit for non-GT/GE/LT/LE

* Make JIT format happy

* More conservative approach: only limit arithmetic operations involving handles in relocatable code.
2024-02-19 19:30:48 +01:00
Egor Bogatov
0272fcc6bf
Fold "cast(cast(obj, cls), cls)" to "cast(obj, cls)" (#98337)
Co-authored-by: Andy Ayers <andya@microsoft.com>
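A small constructed example (not from the PR) of where the nested-cast shape arises after inlining:

```csharp
public static class CastExample
{
    private static object RequireString(object o) => (string)o;

    // After RequireString is inlined, the JIT sees two back-to-back casts of o
    // to string -- cast(cast(o, string), string) -- which this change folds to
    // a single cast(o, string).
    public static string Use(object o) => (string)RequireString(o);
}
```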
2024-02-15 20:55:35 +01:00
Bruce Forstall
b1d7ad6933
Display names of handles in dumps (#97573)
For class/method/field handles, display their name in dumps
in addition to their handle value.

Also fixes a problem in assertion prop dumping where 64-bit class
handle constants were truncated to 32-bit in dump.
2024-02-12 13:52:07 -08:00
Egor Bogatov
33e6c90c19
isinst(cls, null) -> null (#98284) 2024-02-12 14:05:55 +01:00
Jeremy Koritzinsky
0ce3c32f6c
Remove CoreCLR math.h CRT PAL Redefines (#98048) 2024-02-10 09:21:50 -08:00
Egor Bogatov
0a5e97f46f
Don't use checked write barriers for boxed statics (#98166) 2024-02-09 12:12:35 +01:00
Egor Bogatov
765d8845db
BitCast<TYP_REF>(TypeHandleToRuntimeTypeHandle(clsHandle)) => nongc obj (#97955) 2024-02-07 14:44:18 +01:00
Tanner Gooding
85b5eab2ff
Ensure that constant folding for SIMD shifts on xarch follow the correct behavior on overshift (#98001)
* Ensure that constant folding for SIMD shifts on xarch follow the correct behavior on overshift

* Ensure we test Sse2.IsSupported
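A hedged illustration (constructed here) of the overshift behavior the folding has to match:

```csharp
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;

public static class OvershiftExample
{
    public static (int Scalar, Vector128<int> Vector) Show()
    {
        int count = 33;

        // Scalar C# shifts mask the count (33 & 31 == 1), so this yields 2.
        int scalar = 1 << count;

        // The xarch vector shift does not mask the count: shifting a 32-bit lane
        // by 33 "overshifts" and every lane becomes 0. Constant folding of the
        // vector form must reproduce that behavior, not the scalar masking.
        Vector128<int> vector = Sse2.IsSupported
            ? Sse2.ShiftLeftLogical(Vector128.Create(1), 33)
            : default;

        return (scalar, vector);
    }
}
```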
2024-02-06 12:47:47 -08:00
Filip Navara
b7dcefe06d
[ARM] Use Math[F].Round implementation in managed code (#97964)
* Delete all code for ARM [Float/Double]Round intrinsics

* Revert "Delete all code for ARM [Float/Double]Round intrinsics"

This reverts commit c4c5b3fc9e15238a683cd1a4971f5461099e7b46.

* Start small and remove just the new managed impl and JIT code that generates CORINFO_HELP_FLTROUND/CORINFO_HELP_DBLROUND

* Remove the references to non-existent JIT helpers

* Apply code suggestion

* Update src/coreclr/inc/readytorun.h

Co-authored-by: Jan Kotas <jkotas@microsoft.com>

---------

Co-authored-by: Jan Kotas <jkotas@microsoft.com>
2024-02-05 18:13:46 -08:00
Egor Bogatov
64822a667f
JIT: Assertprop improvements (#97908) 2024-02-05 12:10:25 +01:00
Tanner Gooding
803afaad00
Update constant prop to only consider certain hwintrinsics (#97616)
* Update constant prop to only consider certain hwintrinsics

* Don't use gtFindLink unnecessarily

* Apply formatting patch

* Still allow constant propagation for single use locals

* Apply formatting patch
2024-02-01 13:01:21 -08:00
Kunal Pathak
7989f18c48
[NativeAOT] Inline TLS access for windows/x64 (#89472)
* wip

* working model

* wip

* wip

* working

* Add helper for tlsIndex

* add methods in superpmi

* revert some local changes

* misc fixes

* Stop emitting TLS access code for windows/x64

* fix linux build errors

* Do not throw not implemented for windows/x64

* fix the problem where ThreadStaticBase helper was still getting invoked

* Revert certain changes from JIT method

* Introduce getThreadLocalStaticInfo_ReadyToRun()

* Consume getThreadLocalStaticInfo_ReadyToRun()

* Remove getTlsRootInfo() and other methods

* Revert unneeded changes

* missing gtInitCldHnd initialization

* save target address

* jit format

* run thunkgenerator

* resolve merge conflicts

* fix issues so the TLS is inlined

* Rename data structures from *_ReadyToRun to *_NativeAOT

* jit format

* fix some unit test

* fix a bug

* fix the weird jump problem

* use enclosing type cls handle for VN of static gc/non-gc helper

* fix a bug of resetting the flag

* useEnclosingTypeOnly from runtime to determine if VN should optimize it

* do not use vnf, but only use useEnclosingTypeAsArg0

* Use GT_COMMA to add GCStaticBase call next to TLS call

* optimize the cctor call

* Remove lazy ctor generation from tls

* Update jitinterface to not fetch data for lazy ctor

* fix errors after merge

* fix test build errors

* fix bug in CSE

* Use CORINFO_FLG_FIELD_INITCLASS instead of separate flag

* Use the INITCLASS flag

* Remove useEnclosingTypeOnly

* Add NoCtor

* Use CORINFO_HELP_READYTORUN_THREADSTATIC_BASE_NOCTOR

* Minor cleanup

* Regenerate thunk

* Add the SetFalseTarget

* fix merge conflict resolution

* better handling of GTF_ICON_SECREL_OFFSET

* review feedback

* Disable optimization for minopts

* Add comments around iiaSecRel

* jit format

* create emitNewInstrCns()

* Expand TLS even if optimization is disabled

* Track t_inlinedThreadStaticBase

Better tracking `t_inlinedThreadStaticBase` as TYP_REF

* jit format
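For context, a hedged sketch (illustrative only) of the access pattern this work inlines for NativeAOT on windows/x64:

```csharp
using System;

public static class TlsExample
{
    [ThreadStatic]
    private static int t_counter;

    // Previously every access here went through a thread-static-base helper
    // call; with inlined TLS access the JIT emits the windows/x64 TLS lookup
    // sequence directly, keeping the helper only on the slow path.
    public static int Next() => ++t_counter;
}
```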
2024-01-17 12:31:31 -08:00
Bruce Forstall
83239262d5
Various no-diff cleanups and debugging additions/fixes (#96200)
* Various no-diff cleanups and debugging additions

1. Add debugging cVN/dVN dumpers of value numbers
2. Fix new loop debugging dumpers on x86
3. Display the bbID, not just bbNum, in more places
4. For ValueNum handle constants, display the handle constant type
5. Fix some grammar nits and try to rewrite some comments to read better

* Code review feedback

Have function `GenTree::gtGetHandleKindString` to return handle string;
let the caller print it.
2023-12-21 10:27:07 -08:00
Bruce Forstall
30c65646b9
Improve some BasicBlock asserts (#96231)
1. Ensure that the `bbTarget` field is never read except by
block kinds for which `HasTarget()` is `true`.
2. Remove `BBJ_COND` from these kinds, since it now has its
own true/false targets.
3. Add a `TransferTargets()` function which is like `CopyTargets()`
but it takes memory ownership of the target descriptors for switch/
ehfinallyret which are then invalidated, instead of creating a new copy.
4. Stop using `JumpsToNext()` for `BBJ_COND`
2023-12-21 10:25:26 -08:00
Jakob Botsch Nielsen
c7a51fdaa4
JIT: Remove loop-related VN quirks (#95729)
Some minor diffs expected from increased VN precision around newly
recognized loops, which leads to different CSEs.
2023-12-13 11:02:43 +01:00
Aman Khalid
6c7e6e2e50
JIT: Add explicit successor for BBJ_COND false branch (#95773)
This change refactors the BasicBlock API surface such that for BBJ_COND blocks, bbTrueTarget must be used in lieu of bbTarget, and bbFalseTarget must be used in lieu of bbNext (for now, BBJ_COND blocks still fall through into the next block if the false branch is taken, so bbFalseTarget is consistent with bbNext).
2023-12-11 14:49:40 -05:00
Jakob Botsch Nielsen
03c2d25ecb
JIT: Port VN and loop hoisting to new loop representation (#95589)
Switch VN to use the new loop representation for the limited amount of
stuff it does. This is mainly computing loop side effects and using it
for some more precision, in addition to storing some memory dependencies
around loops.
It requires us to have a block -> loop mapping, so add a
BlockToNaturalLoopMap class to do this. We really do not need this
functionality for much, so we may consider seeing if we can remove it in
the future (and remove BasicBlock::bbNatLoopNum).

In loop hoisting move a few members out of LoopDsc and into
LoopHoistContext; in particular the counts of hoisted variables, which
we use for profitability checks and which gets updated while hoisting is
going on for a single loop. We do not need to refer to the information
from other loops.

Record separate postorder numbers for the old and new DFS since we need
to use both simultaneously after loop unrolling. We will be able to get
rid of this again soon.

A small number of diffs are expected because the loop side effects
computation is now more precise, since the old loop code includes some
blocks in loops that are not actually part of the loop. For example,
blocks that always throw can be in the lexical range and would
previously cause the side effect computation to believe there was a
memory havoc. Also, the side effect computation does some limited value
tracking of assignment, which is more precise now since it is running in
RPO instead of being based on loop order as before.
2023-12-07 11:59:33 +01:00
Egor Bogatov
3805c174d0
Clean up GT_NOP (#95353)
---------

Co-authored-by: SingleAccretion <62474226+SingleAccretion@users.noreply.github.com>
2023-11-30 12:04:07 +01:00
Egor Bogatov
0a5e211201
JIT: Remove questionable transformation in optNarrowTree (#95249) 2023-11-28 19:47:27 +01:00
Jakob Botsch Nielsen
f106d7ecd1
JIT: Factor SSA's DFS and profile synthesis's loop finding (#95251)
Factor out SSA's general DFS (that takes EH into account) and
encapsulate it in a `FlowGraphDfsTree` class.

Factor out profile synthesis's loop finding and encapsulate it in a
`FlowGraphNaturalLoops` class. Switch construction of it to use the
general DFS instead of the restricted one (that does not account for
exceptional flow).

Optimize a few things in the process:
* Avoid storing loop blocks in a larger than necessary bit vector; store
  them starting from the loop header's postorder index instead.
* Provide post-order and reverse post-order visitors for the loop
  blocks; switch profile synthesis to use this in one place

No diffs are expected. A small amount of diffs are expected when profile
synthesis is enabled due to the modelling of exceptional flow and also
from handling unreachable predecessors (which would reject some loops as
unnatural loops before).

My future plans are to proceed to replace the loop representation of
loops with this factored version, removing the lexicality requirement in
the process, and hopefully fixing some of our deficiencies.
2023-11-28 12:02:21 +01:00
Jakob Botsch Nielsen
f44b4d17c4
JIT: Filter out a few more phi args in VN (#94699)
I realized we could be a bit more precise when VN'ing phis.
2023-11-14 19:10:37 +01:00