mirror of https://github.com/VSadov/Satori.git synced 2025-06-12 02:30:29 +09:00
Commit graph

276 commits

Author SHA1 Message Date
Tanner Gooding
6d3cb53af9
Decompose some bitwise operations in HIR to allow more overall optimizations to kick in (#104517)
* Decompose some bitwise operations in HIR to allow more overall optimizations to kick in

* Ensure that we actually remove the underlying op

* Ensure the AND_NOT decomposition is still folded during import for minopts

* Ensure we propagate AllBitsSet into simd GT_XOR on xarch

* Ensure that we prefer AndNot over TernaryLogic

* Cleanup the TernaryLogic lowering code

* Ensure that TernaryLogic picks the best operand for containment

* Ensure we swap the operands that are being checked for containment

* Ensure that TernaryLogic is simplified where possible

* Apply formatting patch
2024-07-13 07:01:55 -07:00
Tanner Gooding
4addcaa7e3
Add some helper functions for getting the intrinsic ID to use for a given oper (#104498)
* Add some helper functions for getting the intrinsic ID to use for a given oper

* Make the Unix build happy

* Make the Arm64 build happy

* Respond to PR feedback

* Ensure we don't use EVEX unnecessarily

* Ensure zero diffs for x64
2024-07-06 17:35:46 -07:00
Andy Ayers
53a8a01fe1
Stack allocate unescaped boxes (#103361)
Enable object stack allocation for ref classes and extend the support to include boxed value classes. Use a specialized unbox helper for stack allocated boxes, both to avoid apparent escape of the box by the helper, and to ensure all box field accesses are visible to the JIT. Update the local address visitor to rewrite trees involving address of stack allocated boxes in some cases to avoid address exposure. Disable old promotion for stack allocated boxes (since we have no field handles) and allow physical promotion to enregister the box method table and/or payload as appropriate. In OSR methods handle the fact that the stack allocation may actually have been a heap allocation by the Tier0 method.

The analysis TP cost is around 0.4-0.7% (notes below). Boxes are much less likely to escape than ref classes (roughly ~90% of boxes escape, ~99.8% of ref classes escape). Codegen impact is diminished somewhat because many of the boxes are dead and were already getting optimized away.
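For illustration, a minimal sketch (constructed here, not taken from the PR) of the kind of non-escaping box this change lets the JIT stack allocate:

```csharp
public static class BoxExample
{
    public static int AddOne(int value)
    {
        // The box created here never escapes AddOne, so object stack allocation
        // can place it on the stack; the specialized unbox helper keeps the cast
        // below from being treated as an escape.
        object boxed = value;
        return (int)boxed + 1;
    }
}
```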
 
Fixes #4584, #9118, #10195, #11192, #53585, #58554, #85570

---------

Co-authored-by: Jakob Botsch Nielsen <jakob.botsch.nielsen@gmail.com>
Co-authored-by: Jan Kotas <jkotas@microsoft.com>
2024-07-01 06:54:49 -07:00
Tanner Gooding
fcdb6dba4c
Centralize the folding logic for ConditionalSelect and ensure side effects aren't dropped (#104175)
* Centralize the folding logic for ConditionalSelect and ensure side effects aren't dropped

* Ensure CndSel handles 64-bit operands where possible

* Don't fold if op3 has side effects

* Ensure that operands are passed into the lowered not->ternarylogic correctly
2024-06-30 09:28:46 -07:00
Mikhail Ablakatov
f21612abc1
Enable TYP_MASK support for ARM64 (#103818)
* Enable TYP_MASK support for ARM64

* cleanup: check for a FEATURE macro instead of TARGET

* jit format

---------

Co-authored-by: Kunal Pathak <Kunal.Pathak@microsoft.com>
2024-06-21 17:32:08 -07:00
Tanner Gooding
d7ae8c61f0
Add basic support for folding and normalizing hwintrinsic trees in morph (#103143)
* Add basic support for folding hwintrinsic trees in morph

* Reduce the amount of copying required to evaluate vector constants

* Have gtFoldExprHWIntrinsic handle side effects
2024-06-13 14:45:50 -07:00
David Wrighton
eb8f54d92b
Make normal statics simpler (#99183)
This change makes access to statics much simpler to document and also removes some performance penalties that we've had for a long time due to the old model. Most statics access should be equivalent or faster.

This change converts static variables from a model where statics are associated with the module that defined the metadata of the static to a model where each individual type allocates its statics independently. In addition, it moves the flags that indicate whether or not a type is initialized, and whether or not its statics have been allocated to the `MethodTable` structures instead of storing them in a `DomainLocalModule` as was done before.

# Particularly notable changes
- All statics are now considered "dynamic" statics.
- Statics for collectible assemblies now have an identical path for lookup of the static variable addresses as compared to statics for non-collectible assemblies. It is now reasonable for the process of reading static variables to be inlined into shared generic code, although this PR does not attempt to do so.
- Lifetime management for collectible non-thread local statics is managed via a combination of a `LOADERHANDLE` to keep the static alive, and a new handle type called a `HNDTYPE_WEAK_INTERIOR_POINTER` which will keep the pointers to managed objects in the `MethodTable` structures up to date with the latest addresses of the static variables.
- Each individual type with thread statics has a unique object holding the statics for the type. This means that each type has a separate object[] (for GC statics) and/or double[] (for non-GC statics) per thread for its TLS statics. This isn't necessarily ideal for non-collectible types, but it's not terrible either.
- Thread statics for collectible types are reported directly to the GC instead of being handled via a GCHandle. While needed to avoid complex lifetime rules for collectible types, this may not be ideal for non-collectible types.
- Since the `DomainLocalModule` no longer exists, the `ISOSDacInterface` has been augmented with a new api called `ISOSDacInterface14` which adds the ability to query for the static base/initialization status of an individual type directly.
- Significant changes for generated code include
  - All the helpers are renamed
  - The statics of generics which have not yet been initialized can now be referenced using a single constant pointer + a helper call instead of needing a pair of pointers. In practice, this was a rare condition in perf-critical code due to the presence of tiered compilation, so this is not a significant change to optimized code.
  - The pre-initialization of statics can now occur for types which have non-primitive valuetype statics as long as the type does not have a class constructor.
  - Thread static non-gc statics are now returned as byrefs. (It turns out that for collectible assemblies, there is currently a small GC hole if a function returns the address of a non-gc threadstatic. CoreCLR at this time does not attempt to keep the collectible assembly alive if that is the only live pointer to the collectible static in the system)

With this change, the pointers to normal static data are located at a fixed offset from the start of the `MethodTableAuxiliaryData`, and indices for Thread Static variables are also stored at such a fixed offset. Concepts such as the `DomainLocalModule`, `ThreadLocalModule`, `ModuleId` and `ModuleIndex` no longer exist.

# Lifetime management for collectible statics
- For normal collectible statics, each type will allocate a separate object[] for the GC statics and a double[] for the non-GC statics. A pointer to the data of these arrays will be stored in the `DynamicStaticsInfo` structure, and when relocation occurs, if the collectible type's managed `LoaderAllocator` is still alive, the static field address will be relocated if the object moves. This is done by means of the new Weak Interior Pointer GC handle type.
- For collectible thread-local statics, the lifetime management is substantially more complicated because either a thread or a collectible type may be collected first. Thus the collection algorithm is as follows.
  - The system shall maintain a global mapping of TLS indices to MethodTable structures
  - When a native `LoaderAllocator` is being cleaned up, before the WeakTrackResurrection GCHandle that points at the managed `LoaderAllocator` object is destroyed, the mapping from TLS indices to collectible `LoaderAllocator` structures shall be cleared of all relevant entries (and the current GC index shall be stored in the TLS to MethodTable mapping)
  - When a GC promotion or collection scan occurs, for every TLS index which was freed to point at a GC index the relevant entry in the TLS table shall be set to NULL in preparation for that entry in the table being reused in the future. In addition, if the TLS index refers to a `MethodTable` which is in a collectible assembly, and the associated `LoaderAllocator` has been freed, then set the relevant entry to NULL.
  - When allocating new entries from the TLS mapping table for new collectible thread local structures, do not re-use an entry in the table until at least 2 GCs have occurred. This is to allow every thread to have NULL'd out the relevant entry in its thread local table.
  - When allocating new TLS entries for collectible TLS statics on a per-thread basis allocate a `LOADERHANDLE` for each object allocated, and associate it with the TLS index on that thread.
  - When cleaning up a thread, for each collectible thread static which is still allocated, we will have a `LOADERHANDLE`. If the collectible type still has a live managed `LoaderAllocator` free the `LOADERHANDLE`.

# Expected cost model for extra GC interactions associated with this change
This change adds 3 possible ways in which the GC may have to perform additional work beyond what it used to do.
1. For normal statics on collectible types, it uses a weak interior pointer GC handle for each of these that is allocated. This is purely pay for play: it trades the cost of maintaining a GCHandle in the GC for faster access to collectible statics at runtime. As the number of statics increases, this could in theory become a performance problem, but given the typical usages of collectible assemblies, we do not expect this to be significant.
2. For non-collectible thread statics, there is 1 GC pointer that is unconditionally reported for each thread. Usage of this removes a single indirection from every non-collectible thread local access. Given that this pointer is reported unconditionally, and is only a single pointer, this is not expected to be a significant cost.
3. For collectible thread statics, there is a complex protocol to keep thread statics alive for just long enough, and to clean them up as needed. This is expected to be completely pay for play with regard to usage of thread local variables in collectible assemblies, and while slightly more expensive to run than the current logic, will reduce the cost of creation/destruction of threads by a much more significant factor. In addition, if there are no collectible thread statics used on the thread, the cost of this is only a few branches per lookup.

# Perf impact of this change
I've run the .NET Microbenchmark suite as well as a variety of ASP.NET Benchmarks. (Unfortunately the publicly visible infrastructure for running tests is incompatible with this change, so results are not public). The results are generally quite hard to interpret. ASP.NET Benchmarks are generally (very) slightly better, and the microbenchmarks are generally equivalent in performance, although there is variability in some tests that had not previously shown variability, and the differences in performance are contained within the margin of error in our perf testing for tests with any significant amount of code. When performance differences have been examined in detail, they tend to be in code which has not changed in any way due to this change, and when run in isolation the performance deltas have disappeared in all cases that I have examined. Thus, I assume they are caching side effect changes. Performance testing has led me to add a change such that all NonGC, NonCollectible statics are allocated in a separate LoaderHeap which appears to have reduced the variability in some of the tests by a small fraction, although results are not consistent enough for me to be extremely confident in that statement.
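For orientation, a hedged sketch (the assembly path and type name are hypothetical) of the collectible-statics scenario whose lifetime rules are described above:

```csharp
using System;
using System.Reflection;
using System.Runtime.Loader;

// Types loaded into a collectible AssemblyLoadContext get their own static
// storage under the new model; the weak interior pointer handles and the
// TLS-index protocol described above keep that storage valid exactly as long
// as the type's LoaderAllocator is alive.
var alc = new AssemblyLoadContext("plugin", isCollectible: true);
Assembly asm = alc.LoadFromAssemblyPath(@"C:\plugins\Plugin.dll"); // hypothetical path
Type counter = asm.GetType("Plugin.Counter")!;                     // hypothetical type
counter.GetMethod("Increment")!.Invoke(null, null);                // touches the type's statics
alc.Unload();                                                      // statics may now be collected
```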
2024-06-12 20:54:31 -07:00
Tanner Gooding
2c540e5c55
Adding some constant folding support for basic floating-point operations (#103206)
* Adding some constant folding support for basic floating-point operations

* Use gtWrapWithSideEffects and respond to PR feedback

* Make sure we set DEBUG_NODE_MORPHED on the comma
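As a small constructed illustration (not from the PR), the kind of floating-point expression that only becomes constant after inlining and can now be folded:

```csharp
public static class FoldExample
{
    public static double Half(double d) => d * 0.5;

    // After Half is inlined, the JIT sees 10.0 * 0.5 with two constant operands
    // and can fold the multiply to the constant 5.0 instead of emitting it.
    public static double Five() => Half(10.0);
}
```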
2024-06-12 19:12:15 -07:00
Tanner Gooding
96be3e2e81
Share more of the TYP_MASK handling and support rewriting TYP_MASK operands in rationalization (#103288)
* Share more of the TYP_MASK handling and support rewriting TYP_MASK operands in rationalization

* Ensure we pass in TYP_MASK, not the simdType

* Apply formatting patch

* Fix copy/paste error, pass in clsHnd for the argument

* Ensure that we normalize sigType before inserting the CvtMaskToVectorNode

* Ensure that we get the vector node on Arm64 (ConvertVectorToMask has 2 ops)
2024-06-12 00:01:54 -07:00
Khushal Modi
b5948bf403
AVX10.1 API introduction in JIT (#101938)
* Add AVX10v1 API surface

* Define HWINTRINSIC for AVX10v1, AVX10v1_V256 and AVX10v1_V512

* Setup template testing for AVX10v1 APIs

* Handle AVX10v1 APIs in JIT where equivalent AVX512* APIs are handled

* Merge Avx10v1 and Avx10v1.V256. Rename Avx10.cs to Avx10v1.cs

* Add Avx10v1 to relevant places

* Fix CI errors. Add missing API in Avx10v1.PlatformNotSupported and end the file with a newline character

* Changes to be made with latest changes on main. Make appropriate comments. Update tests in template testing for Avx10v1

* Lower AVX10v1 hwintrinsic in lowering and gentree.cpp for simdSize 32/16

* Fix failures on GNR for AVX10v1

* Disable template tests disabled for Avx512

* Distinguish between Avx10v1 and Avx10v1/512, Add appropriate comments and clean up code in lowerCast

* Remove duplicate code and rather use a single if condition

* Use bool instead of compIsa checks where possible

* remove duplication of code in shuffle

* resolve review comments. Make evex encoding checks clearer to read and resolve a bug in gtNewSimdCvtNode

* Add FMA and Avx512F.X64 instructions to AVX10v1. Restructure code and compOpportunistic checks

* Combine compOpportunistic checks with Avx10 check using IsAvx10OrIsaSupportedOpportunistically

* Introduce a new internal ISA InstructionSet_EVEX and remove InstructionSet_AVX10v1_V256 to make space for the new ISA. Also change all the internal special intrinsic nodes for Avx512F on x86/x64 arch to evex nodes

* Addressing review comments. resolving errors introduced when merged with main

* fix formatting

* Reorder declaration of InstructionSet_EVEX to its proper position. Run formatting and resolve errors introduced when merging with main
2024-06-08 20:03:38 -07:00
Jakob Botsch Nielsen
e867965446
JIT: Propagate LCL_ADDR nodes during local morph (#102808)
This changes local morph to run in RPO when optimizations are enabled. It adds
infrastructure to track and propagate LCL_ADDR values assigned to locals during
local morph. This allows us to avoid address exposure in cases where the
destination local does not actually end up escaping in any way.

Example:
```csharp
public struct Awaitable
{
    public int Opts;

    public Awaitable(bool value)
    {
        Opts = value ? 1 : 2;
    }
}

[MethodImpl(MethodImplOptions.NoInlining)]
public static int Test() => new Awaitable(false).Opts;
```

Before:
```asm
G_M59043_IG01:  ;; offset=0x0000
       push     rax
						;; size=1 bbWeight=1 PerfScore 1.00

G_M59043_IG02:  ;; offset=0x0001
       xor      eax, eax
       mov      dword ptr [rsp], eax
       mov      dword ptr [rsp], 2
       mov      eax, dword ptr [rsp]
						;; size=15 bbWeight=1 PerfScore 3.25

G_M59043_IG03:  ;; offset=0x0010
       add      rsp, 8
       ret
						;; size=5 bbWeight=1 PerfScore 1.25
; Total bytes of code: 21

```

After:
```asm
G_M59043_IG02:  ;; offset=0x0000
       mov      eax, 2
						;; size=5 bbWeight=1 PerfScore 0.25

G_M59043_IG03:  ;; offset=0x0005
       ret
```

Propagating the addresses works much like local assertion prop in morph does.
Proving that the locals that were stored to do not escape afterwards is done
with a simplistic approach: we check globally that no reads of the locals
exist, and if so, we replace the `LCL_ADDR` stored to them by a constant 0. We
leave it up to liveness to clean up the stores themselves.

If we were able to remove any `LCL_ADDR` in this way then we run an additional
pass over the locals of the IR to compute the final set of exposed locals.

This could be more sophisticated, but in practice this handles the reported
cases just fine.

Fix #87072
Fix #102273
Fix #102518

This is still not sufficient to handle #69254. To handle that we would need more
support around tracking the values of struct fields, and handling of promoted
fields. This PR currently does not handle promoted fields at all; we use
`lvHasLdAddrOp` as a conservative approximation of address exposure on the
destination locals, and promoted structs almost always have this set. If we were
to handle promoted fields we would need some other way to determine that a
destination holding a local address couldn't be exposed.
2024-05-31 21:06:40 +02:00
Egor Bogatov
4326eb7ed4
Fix unnecessary bounds check with ulong index (#101352) 2024-05-01 14:50:21 +02:00
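A hedged sketch of the pattern this fix targets (the shape is inferred from the title; the actual repro is in #101352):

```csharp
public static class IndexExample
{
    public static byte Read(byte[] data, ulong i)
    {
        // The guard proves i is within data.Length, so the bounds check that the
        // JIT previously emitted for the ulong-indexed element access below is
        // unnecessary and can now be removed.
        if (i < (ulong)data.Length)
            return data[i];
        return 0;
    }
}
```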
Jakob Botsch Nielsen
4965f2160d
JIT: Model JIT helper exceptions correctly in VN (#101062)
The JIT currently models all exceptions thrown by helpers with a
singleton VN. This can cause CSE to remove exception side effects
incorrectly.

This change starts modelling exceptions thrown by the following helpers
accurately:
- The R2R cast helper
- Division helpers
- Static constructor base helpers

Remaining JIT helpers are modelled conservatively, with an opaque VN in
the exception part.

Contributes to #63559
Fix #101028
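A constructed example (not from the PR) of why a single shared exception VN is unsafe for the division helpers:

```csharp
public static class DivideExample
{
    public static int Sum(int x, int a, int b)
    {
        // Each division can throw for a different divisor (DivideByZeroException,
        // or OverflowException for int.MinValue / -1). If both exception side
        // effects were modelled by one singleton VN, CSE could treat the second
        // division's exception as redundant with the first and drop it.
        int r1 = x / a;
        int r2 = x / b;
        return r1 + r2;
    }
}
```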
2024-04-17 16:54:51 +02:00
Aman Khalid
ec7648702c
JIT: Add GT_SWIFT_ERROR_RET to represent loading error register upon return (#100692)
Follow-up to #100429. If a method has a `SwiftError*` out parameter, a new phase -- `fgAddSwiftErrorReturns` -- converts all `GT_RETURN` nodes into `GT_SWIFT_ERROR_RET` nodes; this new node type is a binop that takes the error value as its first operand, and the normal return value (if there is one) as its second operand. The error value is loaded into the Swift error register upon returning.
2024-04-12 15:26:00 -04:00
Aman Khalid
2c824aed7b
JIT: Allow helper calls that always throw to be marked as no-return (#100900)
Fixes #100458 by addressing two issues:

When flagging a call node as no-return with GTF_CALL_M_DOES_NOT_RETURN, we should always increment Compiler::optNoReturnCallCount to avoid asserts in Compiler::fgTailMergeThrows. Previously, we weren't doing this in a unified place, which seemed error-prone.
When incrementing the optNoReturnCallCount member of an inlinee compiler, ensure this information is propagated to the inlinee's parent compiler. In a similar vein, if we try to inline a call, and the inlinee compiler determines it does not return, make sure we increment optNoReturnCallCount in the parent compiler object if the inline fails -- we've kept the call, and we now know it doesn't return.
With these changes, I can now mark helper calls that always throw as no-return; this unblocks morph to convert BBJ_ALWAYS blocks with helper calls that throw into BBJ_THROW blocks, and has the nice side effect of improving the efficacy of throw merging. Since I was touching relevant code, I decided to improve our usage of GenTreeCall::IsHelperCall, too.
2024-04-12 15:06:35 -04:00
Egor Bogatov
674ba3fade
JIT: Track sideness of arrOp in GetCheckedBoundArithInfo (#100848) 2024-04-10 19:32:06 +02:00
SingleAccretion
d9dcc58d95
Remove more ASG terminology from the codebase (#86760)
* Excise 'assignment' terminology from the codebase

* Standardize on the 'value' terminology for store operands

But only in the frontend; backend has lots of "data"s, and it did not seem purposeful renaming them.
2024-04-05 11:57:14 -07:00
Filip Navara
41b1091890
Replace FEATURE_EH_FUNCLETS in JIT with runtime switch (#99191)
* Replace FEATURE_EH_FUNCLETS/FEATURE_EH_CALLFINALLY_THUNKS in JIT with runtime switch

* Cache Native AOT ABI check to see if TP improves

---------

Co-authored-by: Bruce Forstall <brucefo@microsoft.com>
2024-04-05 11:46:18 -07:00
Bruce Forstall
862c82f0e9
Update JIT sources to clang-format/clang-tidy 17.0.6 (#100498)
* Update JIT sources to clang-format/clang-tidy 17.0.6

* Reformat

* Reformat x86
2024-04-03 14:43:36 -07:00
Bruce Forstall
c4796a3626
Convert asserts in CEEInfo::getStaticFieldContent() to 'if' checks (#100320)
* Add additional checks to optimization of constant static field loads

In `fgGetStaticFieldSeqAndAddress`, if we have a static field address
altered by a tree of `ADD CNS_INT` nodes, we need to verify that the
address is within the found field sequence. It might not be after
shared constant CSE kicks in (e.g., under OptRepeat), where the
sequence of ADDs might alter an arbitrary constant address from
one type into the address of the static field of a different type.
So we can't use the FieldSeq of the base address when considering
the full offset.

* Review feedback

1. Use `eeGetFieldName` / `eeGetClassName` return pointer
2. Only query extra metadata under `verbose || JitConfig.EnableExtraSuperPmiQueries()`

* Convert asserts in CEEInfo::getStaticFieldContent() to 'if' checks

* Make fgGetStaticFieldSeqAndAddress static

* Code review feedback
2024-03-27 13:36:40 -07:00
Jan Kotas
503970c14c
Simplify floating point mod and round math jit helpers (#100222)
Co-authored-by: Michał Petryka <paprikapolishgamer@gmail.com>
2024-03-26 13:23:03 -07:00
Bruce Forstall
c1fc1eef94
Remove GetFoldedArithOpResultHandleFlags (#100060)
For any constant arithmetic on a handle, lose the handle type:
it's unreliable.

Eliminates problems seen in https://github.com/dotnet/runtime/issues/100059
2024-03-21 09:50:05 -07:00
Alan Hayward
12d96ccfae
JIT ARM64-SVE: Allow LCL_VARs to store as mask (#99608)
* JIT ARM64-SVE: Allow LCL_VARs to store as mask

* Remove FEATURE_MASKED_SIMD

* More generic ifdefs

* Add varTypeIsSIMDOrMask

* Add extra type checks

* Fix use of isValidSimm9, and add extra uses

* Rename mask conversion functions to gtNewSimdConvert*

* Add OperIs functions

* Mark untested uses of mov

* Add INS_SCALABLE_OPTS_PREDICATE_DEST

* Valuenum fixes for tier 1

* Remove importer changes

* XARCH versions of OperIsConvertMaskToVector

* Revert "Remove importer changes"

This reverts commit b5502a6968c1304986f206ea6ac9de9d2fb63f7d.

* Add tests for emitIns_S_R and emitIns_R_S

* Fix formatting

* Reapply "Remove importer changes"

This reverts commit d8dea0e83c2318a4638d9beea11d3d188c2d5fa2.

* Use dummy mask ldr and str

* Refactor emitIns_S_R and emitIns_R_S

* Move str_mask/ldr_mask

* Fix formatting

* Set imm

* fix assert

* Fix assert (2)

* Fix assert (3)

* nop
2024-03-21 09:38:45 -07:00
Bruce Forstall
ca905a2a34
Add basic support for TYP_MASK constants (#99743)
This is to support fixing JitOptRepeat (https://github.com/dotnet/runtime/pull/94250).
I was seeing failures in a Tensor test where `TYP_MASK`-generating
instructions were getting CSE'd. When OptRepeat kicks in
and runs VN over the new IR, it wants to create a "zero" value
for the new CSE locals.

This change creates a `TYP_MASK` constant type, `simdmask_t`, like the
pre-existing `simd64_t`, `simd32_t`, etc. `simdmask_t` is basically a
`simd8_t` type, but with its own type. I expanded basically every place
that generally handles `simd64_t` with `simdmask_t` support. This might be
more than we currently need, but it seems to be a reasonable step towards
making `TYP_MASK` more first-class. However, I didn't go so far as to
support load/store of these constants, for example.
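A rough illustration (constructed, not from the change) of where `TYP_MASK` values come from in user code:

```csharp
using System.Runtime.Intrinsics;

public static class MaskExample
{
    // On AVX-512 capable hardware the comparison below produces a predicate
    // mask (TYP_MASK) internally before ConditionalSelect consumes it. When
    // such mask-producing nodes get CSE'd under JitOptRepeat, VN needs a way
    // to represent a "zero" TYP_MASK constant -- the simdmask_t added here.
    public static Vector512<float> Max(Vector512<float> a, Vector512<float> b)
        => Vector512.ConditionalSelect(Vector512.GreaterThan(a, b), a, b);
}
```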
2024-03-14 10:14:18 -07:00
Aman Khalid
40d1c8975d
JIT: Use successor edges instead of block targets for remaining block kinds (#98993)
Part of #93020. Replaces BasicBlock::bbTarget/bbFalseTarget/bbTrueTarget with FlowEdge* members.
2024-02-27 22:16:59 -05:00
Egor Bogatov
51b51bffbb
Remove GT_STORE_DYN_BLK (#98905) 2024-02-27 11:31:21 +01:00
Aman Khalid
597d647327
JIT: Support storing Swift error register value to SwiftError struct (#98586)
Adds JIT support for storing the error result of a Swift call to the provided SwiftError struct, assuming the caller passed a SwiftError* argument. The LSRA changes assume the presence of a SwiftError* argument in a Swift call indicates the error register may be trashed, and thus should be killed until consumed by GT_SWIFT_ERROR, a new GenTree node for representing the value of the error register post-Swift call. Similarly, these changes also assume the lack of a SwiftError* argument indicates the Swift call cannot throw, and thus will not trash the error register; thus, the Swift call should not block the register's usage.
2024-02-26 16:12:31 -05:00
Bruce Forstall
96bee8dcab
Fix handle dumping for AOT scenarios (#98728)
Revert #97573 to previous behavior (not dumping handle strings) for
NativeAOT and R2R compiles; those require more work to find the
handle to use.
2024-02-21 09:54:24 -08:00
Egor Bogatov
8fb9f4b9fa
JIT: Fold more casts (#98528) 2024-02-21 10:47:21 +01:00
Filip Navara
1be948c959
Prevent incorrect constant folding (#98561)
* Prevent incorrect constant folding of binary operations involving handle and integer

* Fix the conditions for null nodes

* Fix the last commit for non-GT/GE/LT/LE

* Make JIT format happy

* More conservative approach: only limit arithmetic operations involving handles in relocatable code.
2024-02-19 19:30:48 +01:00
Egor Bogatov
0272fcc6bf
Fold "cast(cast(obj, cls), cls)" to "cast(obj, cls)" (#98337)
Co-authored-by: Andy Ayers <andya@microsoft.com>
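A small constructed example (not from the PR) of where the nested-cast shape arises after inlining:

```csharp
public static class CastExample
{
    private static object RequireString(object o) => (string)o;

    // After RequireString is inlined, the JIT sees two back-to-back casts of o
    // to string -- cast(cast(o, string), string) -- which this change folds to
    // a single cast(o, string).
    public static string Use(object o) => (string)RequireString(o);
}
```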
2024-02-15 20:55:35 +01:00
Bruce Forstall
b1d7ad6933
Display names of handles in dumps (#97573)
For class/method/field handles, display their name in dumps
in addition to their handle value.

Also fixes a problem in assertion prop dumping where 64-bit class
handle constants were truncated to 32-bit in dump.
2024-02-12 13:52:07 -08:00
Egor Bogatov
33e6c90c19
isinst(cls, null) -> null (#98284) 2024-02-12 14:05:55 +01:00
Jeremy Koritzinsky
0ce3c32f6c
Remove CoreCLR math.h CRT PAL Redefines (#98048) 2024-02-10 09:21:50 -08:00
Egor Bogatov
0a5e97f46f
Don't use checked write barriers for boxed statics (#98166) 2024-02-09 12:12:35 +01:00
Egor Bogatov
765d8845db
BitCast<TYP_REF>(TypeHandleToRuntimeTypeHandle(clsHandle)) => nongc obj (#97955) 2024-02-07 14:44:18 +01:00
Tanner Gooding
85b5eab2ff
Ensure that constant folding for SIMD shifts on xarch follow the correct behavior on overshift (#98001)
* Ensure that constant folding for SIMD shifts on xarch follow the correct behavior on overshift

* Ensure we test Sse2.IsSupported
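A hedged illustration (constructed here) of the overshift behavior the folding has to match:

```csharp
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;

public static class OvershiftExample
{
    public static (int Scalar, Vector128<int> Vector) Show()
    {
        int count = 33;

        // Scalar C# shifts mask the count (33 & 31 == 1), so this yields 2.
        int scalar = 1 << count;

        // The xarch vector shift does not mask the count: shifting a 32-bit lane
        // by 33 "overshifts" and every lane becomes 0. Constant folding of the
        // vector form must reproduce that behavior, not the scalar masking.
        Vector128<int> vector = Sse2.IsSupported
            ? Sse2.ShiftLeftLogical(Vector128.Create(1), 33)
            : default;

        return (scalar, vector);
    }
}
```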
2024-02-06 12:47:47 -08:00
Filip Navara
b7dcefe06d
[ARM] Use Math[F].Round implementation in managed code (#97964)
* Delete all code for ARM [Float/Double]Round intrinsics

* Revert "Delete all code for ARM [Float/Double]Round intrinsics"

This reverts commit c4c5b3fc9e15238a683cd1a4971f5461099e7b46.

* Start small and remove just the new managed impl and JIT code that generates CORINFO_HELP_FLTROUND/CORINFO_HELP_DBLROUND

* Remove the references to non-existent JIT helpers

* Apply code suggestion

* Update src/coreclr/inc/readytorun.h

Co-authored-by: Jan Kotas <jkotas@microsoft.com>

---------

Co-authored-by: Jan Kotas <jkotas@microsoft.com>
2024-02-05 18:13:46 -08:00
Egor Bogatov
64822a667f
JIT: Assertprop improvements (#97908) 2024-02-05 12:10:25 +01:00
Tanner Gooding
803afaad00
Update constant prop to only consider certain hwintrinsics (#97616)
* Update constant prop to only consider certain hwintrinsics

* Don't use gtFindLink unnecessarily

* Apply formatting patch

* Still allow constant propagation for single use locals

* Apply formatting patch
2024-02-01 13:01:21 -08:00
Kunal Pathak
7989f18c48
[NativeAOT] Inline TLS access for windows/x64 (#89472)
* wip

* working model

* wip

* wip

* working

* Add helper for tlsIndex

* add methods in superpmi

* revert some local changes

* misc fixes

* Stop emitting TLS access code for windows/x64

* fix linux build errors

* Do not throw not implemented for windows/x64

* fix the problem where ThreadStaticBase helper was still getting invoked

* Revert certain changes from JIT method

* Introduce getThreadLocalStaticInfo_ReadyToRun()

* Consume getThreadLocalStaticInfo_ReadyToRun()

* Remove getTlsRootInfo() and other methods

* Revert unneeded changes

* missing gtInitCldHnd initialization

* save target address

* jit format

* run thunkgenerator

* resolve merge conflicts

* fix issues so the TLS is inlined

* Rename data structures from *_ReadyToRun to *_NativeAOT

* jit format

* fix some unit test

* fix a bug

* fix the weird jump problem

* use enclosing type cls handle for VN of static gc/non-gc helper

* fix a bug of resetting the flag

* useEnclosingTypeOnly from runtime to determine if VN should optimize it

* do not use vnf, but only use useEnclosingTypeAsArg0

* Use GT_COMMA to add GCStaticBase call next to TLS call

* optimize the cctor call

* Remove lazy ctor generation from tls

* Update jitinterface to not fetch data for lazy ctor

* fix errors after merge

* fix test build errors

* fix bug in CSE

* Use CORINFO_FLG_FIELD_INITCLASS instead of separate flag

* Use the INITCLASS flag

* Remove useEnclosingTypeOnly

* Add NoCtor

* Use CORINFO_HELP_READYTORUN_THREADSTATIC_BASE_NOCTOR

* Minor cleanup

* Regenerate thunk

* Add the SetFalseTarget

* fix merge conflict resolution

* better handling of GTF_ICON_SECREL_OFFSET

* review feedback

* Disable optimization for minopts

* Add comments around iiaSecRel

* jit format

* create emitNewInstrCns()

* Expand TLS even if optimization is disabled

* Track t_inlinedThreadStaticBase

Better tracking `t_inlinedThreadStaticBase` as TYP_REF

* jit format
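For context, a hedged sketch (illustrative only) of the access pattern this work inlines for NativeAOT on windows/x64:

```csharp
using System;

public static class TlsExample
{
    [ThreadStatic]
    private static int t_counter;

    // Previously every access here went through a thread-static-base helper
    // call; with inlined TLS access the JIT emits the windows/x64 TLS lookup
    // sequence directly, keeping the helper only on the slow path.
    public static int Next() => ++t_counter;
}
```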
2024-01-17 12:31:31 -08:00
Bruce Forstall
83239262d5
Various no-diff cleanups and debugging additions/fixes (#96200)
* Various no-diff cleanups and debugging additions

1. Add debugging cVN/dVN dumpers of value numbers
2. Fix new loop debugging dumpers on x86
3. Display the bbID, not just bbNum, in more places
4. For ValueNum handle constants, display the handle constant type
5. Fix some grammar nits and try to rewrite some comments to read better

* Code review feedback

Have function `GenTree::gtGetHandleKindString` to return handle string;
let the caller print it.
2023-12-21 10:27:07 -08:00
Bruce Forstall
30c65646b9
Improve some BasicBlock asserts (#96231)
1. Ensure that the `bbTarget` field is never read except by
block kinds for which `HasTarget()` is `true`.
2. Remove `BBJ_COND` from these kinds, since it now has its
own true/false targets.
3. Add a `TransferTargets()` function which is like `CopyTargets()`
but it takes memory ownership of the target descriptors for switch/
ehfinallyret which are then invalidated, instead of creating a new copy.
4. Stop using `JumpsToNext()` for `BBJ_COND`
2023-12-21 10:25:26 -08:00
Jakob Botsch Nielsen
c7a51fdaa4
JIT: Remove loop-related VN quirks (#95729)
Some minor diffs expected from increased VN precision around newly
recognized loops, which leads to different CSEs.
2023-12-13 11:02:43 +01:00
Aman Khalid
6c7e6e2e50
JIT: Add explicit successor for BBJ_COND false branch (#95773)
This change refactors the BasicBlock API surface such that for BBJ_COND blocks, bbTrueTarget must be used in lieu of bbTarget, and bbFalseTarget must be used in lieu of bbNext (for now, BBJ_COND blocks still fall through into the next block if the false branch is taken, so bbFalseTarget is consistent with bbNext).
2023-12-11 14:49:40 -05:00
Jakob Botsch Nielsen
03c2d25ecb
JIT: Port VN and loop hoisting to new loop representation (#95589)
Switch VN to use the new loop representation for the limited amount of
stuff it does. This is mainly computing loop side effects and using it
for some more precision, in addition to storing some memory dependencies
around loops.
It requires us to have a block -> loop mapping, so add a
BlockToNaturalLoopMap class to do this. We really do not need this
functionality for much, so we may consider seeing if we can remove it in
the future (and remove BasicBlock::bbNatLoopNum).

In loop hoisting move a few members out of LoopDsc and into
LoopHoistContext; in particular the counts of hoisted variables, which
we use for profitability checks and which gets updated while hoisting is
going on for a single loop. We do not need to refer to the information
from other loops.

Record separate postorder numbers for the old and new DFS since we need
to use both simultaneously after loop unrolling. We will be able to get
rid of this again soon.

A small number of diffs are expected because the loop side effects
computation is now more precise, since the old loop code includes some
blocks in loops that are not actually part of the loop. For example,
blocks that always throw can be in the lexical range and would
previously cause the side effect computation to believe there was a
memory havoc. Also, the side effect computation does some limited value
tracking of assignment, which is more precise now since it is running in
RPO instead of being based on loop order as before.
2023-12-07 11:59:33 +01:00
Egor Bogatov
3805c174d0
Clean up GT_NOP (#95353)
---------

Co-authored-by: SingleAccretion <62474226+SingleAccretion@users.noreply.github.com>
2023-11-30 12:04:07 +01:00
Egor Bogatov
0a5e211201
JIT: Remove questionable transformation in optNarrowTree (#95249) 2023-11-28 19:47:27 +01:00
Jakob Botsch Nielsen
f106d7ecd1
JIT: Factor SSA's DFS and profile synthesis's loop finding (#95251)
Factor out SSA's general DFS (that takes EH into account) and
encapsulate it in a `FlowGraphDfsTree` class.

Factor out profile synthesis's loop finding and encapsulate it in a
`FlowGraphNaturalLoops` class. Switch construction of it to use the
general DFS instead of the restricted one (that does not account for
exceptional flow).

Optimize a few things in the process:
* Avoid storing loop blocks in a larger than necessary bit vector; store
  them starting from the loop header's postorder index instead.
* Provide post-order and reverse post-order visitors for the loop
  blocks; switch profile synthesis to use this in one place

No diffs are expected. A small amount of diffs are expected when profile
synthesis is enabled due to the modelling of exceptional flow and also
from handling unreachable predecessors (which would reject some loops as
unnatural loops before).

My future plans are to proceed to replace the loop representation of
loops with this factored version, removing the lexicality requirement in
the process, and hopefully fixing some of our deficiencies.
2023-11-28 12:02:21 +01:00
Jakob Botsch Nielsen
f44b4d17c4
JIT: Filter out a few more phi args in VN (#94699)
I realized we could be a bit more precise when VN'ing phis.
2023-11-14 19:10:37 +01:00