This change makes access to statics much simpler to document and also removes some performance penalties that we've had for a long time due to the old model. Most static accesses should be equivalent or faster.
This change converts static variables from a model where statics are associated with the module that defined the metadata of the static to a model where each individual type allocates its statics independently. In addition, it moves the flags that indicate whether a type is initialized and whether its statics have been allocated into the `MethodTable` structures, instead of storing them in a `DomainLocalModule` as was done before.
# Particularly notable changes
- All statics are now considered "dynamic" statics.
- Statics for collectible assemblies now have an identical path for lookup of the static variable addresses as compared to statics for non-collectible assemblies. It is now reasonable for the process of reading static variables to be inlined into shared generic code, although this PR does not attempt to do so.
- Lifetime management for collectible non-thread-local statics uses a combination of a `LOADERHANDLE` to keep the static alive and a new handle type, `HNDTYPE_WEAK_INTERIOR_POINTER`, which keeps the pointers to managed objects in the `MethodTable` structures up to date with the latest addresses of the static variables.
- Each individual type in thread statics has a unique object holding the statics for the type. This means that each type has a separate object[] (for GC statics) and/or double[] (for non-GC statics) per thread for TLS statics. This isn't necessarily ideal for non-collectible types, but it's not terrible either.
- Thread statics for collectible types are reported directly to the GC instead of being handled via a GCHandle. While this is needed to avoid complex lifetime rules for collectible types, it may not be ideal for non-collectible types.
- Since the `DomainLocalModule` no longer exists, the `ISOSDacInterface` has been augmented with a new interface, `ISOSDacInterface14`, which adds the ability to query the static base and initialization status of an individual type directly.
- Significant changes for generated code include:
  - All the helpers have been renamed.
  - The statics of generics that have not yet been initialized can now be referenced using a single constant pointer plus a helper call, instead of needing a pair of pointers. In practice, this was a rare condition in perf-critical code due to the presence of tiered compilation, so this is not a significant change to optimized code.
  - The pre-initialization of statics can now occur for types that have non-primitive valuetype statics, as long as the type does not have a class constructor.
  - Thread static non-GC statics are now returned as byrefs. (It turns out that for collectible assemblies, there is currently a small GC hole if a function returns the address of a non-GC thread static. CoreCLR at this time does not attempt to keep the collectible assembly alive if that is the only live pointer to the collectible static in the system.)
With this change, the pointers to normal static data are located at a fixed offset from the start of the `MethodTableAuxiliaryData`, and indices for thread static variables are also stored at such a fixed offset. Concepts such as the `DomainLocalModule`, `ThreadLocalModule`, `ModuleId` and `ModuleIndex` no longer exist.
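To make the new lookup path concrete, here is a minimal Python model of per-type statics. It is a sketch under simplifying assumptions, not the runtime's code: `MethodTableAuxiliaryData` and `DynamicStaticsInfo` are taken from the text, but their fields, the `get_non_gc_static_base` helper, and the use of Python lists in place of object[]/double[] are hypothetical, and real code also needs locking around class constructor execution, which is omitted here.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

# Hypothetical model: field and helper names are illustrative, not CoreCLR's layout.

@dataclass
class DynamicStaticsInfo:
    gc_statics: Optional[List[object]] = None     # stands in for the per-type object[]
    non_gc_statics: Optional[List[float]] = None  # stands in for the per-type double[]

@dataclass
class MethodTableAuxiliaryData:
    is_initialized: bool = False                  # the flag now lives with the type itself
    statics: DynamicStaticsInfo = field(default_factory=DynamicStaticsInfo)

@dataclass
class MethodTable:
    name: str
    gc_static_count: int
    non_gc_static_count: int
    cctor: Optional[Callable[["MethodTable"], None]] = None
    aux: MethodTableAuxiliaryData = field(default_factory=MethodTableAuxiliaryData)

def get_non_gc_static_base(mt: MethodTable) -> List[float]:
    # Fast path: one flag check on the type, then a fixed-offset load of the statics pointer.
    if mt.aux.is_initialized:
        return mt.aux.statics.non_gc_statics
    # Slow path (a helper call in generated code): allocate this type's statics on first use.
    statics = mt.aux.statics
    if statics.non_gc_statics is None:
        statics.gc_statics = [None] * mt.gc_static_count
        statics.non_gc_statics = [0.0] * mt.non_gc_static_count
    # Run the class constructor (if any), then publish the initialized flag.
    if mt.cctor is not None:
        mt.cctor(mt)
    mt.aux.is_initialized = True
    return statics.non_gc_statics

def counter_cctor(mt: MethodTable) -> None:
    mt.aux.statics.non_gc_statics[0] = 42.0  # e.g. `static double Seed = 42;`

counter = MethodTable("Counter", 1, 2, cctor=counter_cctor)
print(get_non_gc_static_base(counter))  # [42.0, 0.0]
```

Because each type owns its own statics, there is no module-level table to index into; the lookup is a property of the type alone.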
# Lifetime management for collectible statics
- For normal collectible statics, each type allocates a separate object[] for the GC statics and a double[] for the non-GC statics. A pointer to the data of these arrays is stored in the `DynamicStaticsInfo` structure, and if the collectible type's managed `LoaderAllocator` is still alive when the GC relocates one of these arrays, the stored static field address is updated to the new location. This is done by means of the new weak interior pointer GC handle type.
- For collectible thread-local statics, lifetime management is substantially more complicated, because either the thread or the collectible type may be collected first. The collection algorithm is as follows:
  - The system shall maintain a global mapping of TLS indices to `MethodTable` structures.
  - When a native `LoaderAllocator` is being cleaned up, before the WeakTrackResurrection GCHandle that points at the managed `LoaderAllocator` object is destroyed, the mapping from TLS indices to collectible `LoaderAllocator` structures shall be cleared of all relevant entries (and the current GC index shall be stored in the TLS to `MethodTable` mapping).
  - When a GC promotion or collection scan occurs, for every TLS index that was freed and marked with a GC index, the relevant entry in each thread's TLS table shall be set to NULL in preparation for that entry being reused in the future. In addition, if the TLS index refers to a `MethodTable` in a collectible assembly whose associated `LoaderAllocator` has been freed, the relevant entry shall be set to NULL.
  - When allocating new entries from the TLS mapping table for new collectible thread-local structures, do not reuse an entry in the table until at least 2 GCs have occurred. This allows every thread to NULL out the relevant entry in its thread-local table.
  - When allocating new TLS entries for collectible TLS statics on a per-thread basis, allocate a `LOADERHANDLE` for each object allocated, and associate it with the TLS index on that thread.
  - When cleaning up a thread, there is a `LOADERHANDLE` for each collectible thread static that is still allocated. If the collectible type still has a live managed `LoaderAllocator`, free the `LOADERHANDLE`.
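The TLS index reuse rule can be modeled in a few lines. This is a hypothetical Python sketch (`TlsIndexMap`, `ThreadStatics`, `free_for_unloaded_type` and `on_gc` are invented names): it records the GC count at which an index was freed, NULLs every thread's slot for freed indices during each GC, and refuses to hand a freed index out again until at least 2 GCs have passed. It deliberately leaves out `LOADERHANDLE`s and the `LoaderAllocator` liveness checks.

```python
from typing import Dict, List, Optional

class ThreadStatics:
    """Hypothetical per-thread table: TLS index -> statics object for one type."""
    def __init__(self) -> None:
        self.slots: Dict[int, Optional[object]] = {}

class TlsIndexMap:
    """Hypothetical global mapping of TLS indices to the owning type."""
    def __init__(self) -> None:
        self.owners: List[Optional[str]] = []  # TLS index -> owning type (None once freed)
        self.freed_at_gc: Dict[int, int] = {}  # freed TLS index -> GC count when freed
        self.gc_count = 0

    def allocate(self, type_name: str) -> int:
        # Reuse a freed index only after at least 2 GCs, so every thread has NULLed its slot.
        for index, freed_gc in sorted(self.freed_at_gc.items()):
            if self.gc_count - freed_gc >= 2:
                del self.freed_at_gc[index]
                self.owners[index] = type_name
                return index
        self.owners.append(type_name)
        return len(self.owners) - 1

    def free_for_unloaded_type(self, index: int) -> None:
        # Called while the LoaderAllocator is cleaned up: drop the mapping, record the GC count.
        self.owners[index] = None
        self.freed_at_gc[index] = self.gc_count

    def on_gc(self, threads: List[ThreadStatics]) -> None:
        # During the GC scan, NULL out every thread's slot for each freed index.
        for index in self.freed_at_gc:
            for thread in threads:
                if index in thread.slots:
                    thread.slots[index] = None
        self.gc_count += 1
```

Waiting for two GCs rather than one is what guarantees that no thread still holds a stale object in a slot that is about to be handed to a different type.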
# Expected cost model for extra GC interactions associated with this change
This change adds 3 possible ways in which the GC may have to perform additional work beyond what it used to do.
1. For normal statics on collectible types, a weak interior pointer GC handle is used for each allocated statics array. This is purely pay-for-play and trades faster runtime access to collectible statics for the cost of maintaining a GCHandle in the GC. As the number of statics increases, this could in theory become a performance problem, but given the typical usages of collectible assemblies, we do not expect this to be significant.
2. For non-collectible thread statics, there is 1 GC pointer that is unconditionally reported for each thread. Usage of this removes a single indirection from every non-collectible thread local access. Given that this pointer is reported unconditionally, and is only a single pointer, this is not expected to be a significant cost.
3. For collectible thread statics, there is a complex protocol to keep thread statics alive for just long enough, and to clean them up as needed. This is expected to be completely pay-for-play with regard to usage of thread-local variables in collectible assemblies, and while slightly more expensive to run than the current logic, it will reduce the cost of thread creation and destruction by a much more significant factor. In addition, if no collectible thread statics are used on the thread, the cost of this is only a few branches per lookup.
# Perf impact of this change
I've run the .NET Microbenchmark suite as well as a variety of ASP.NET Benchmarks. (Unfortunately, the publicly visible infrastructure for running tests is incompatible with this change, so the results are not public.) The results are generally quite hard to interpret. ASP.NET Benchmarks are generally (very) slightly better, and the microbenchmarks are generally equivalent in performance, although some tests now show variability that they had not previously shown. For tests with any significant amount of code, the differences in performance are within the margin of error of our perf testing. When performance differences have been examined in detail, they tend to be in code that has not changed in any way due to this change, and when run in isolation the performance deltas have disappeared in all cases that I have examined. Thus, I assume they are side effects of changes in caching behavior. Performance testing has led me to add a change such that all NonGC, NonCollectible statics are allocated in a separate LoaderHeap, which appears to have reduced the variability in some of the tests by a small fraction, although the results are not consistent enough for me to be extremely confident in that statement.
- Implement `GetThreadStoreData` and `GetThreadCounts` in `Thread` contract
- Finish implementing `ISOSDacInterface::GetThreadStoreData` in cDAC
  - Add specific threads (first in thread store, Finalizer, GC) and counts
- Make existing DAC call into cDAC for `GetThreadData` if available
  - Only fills out managed thread ID and next thread right now, so it always returns E_NOTIMPL
- Update the example C# API in docs to be closer to what we have now
* Add support for primary constructors in LoggerMessageGenerator
* Get the primary constructor parameters types from the constructor symbol instead of from the semantic model
* Prioritize fields over primary constructor parameters and ignore shadowed parameters when finding a logger
* Make checking for primary constructors non-conditional on Roslyn version and simplify project setup
* Reintroduce Roslyn 4.8 test project
* Add info-level diagnostic for logger primary constructor parameters that are shadowed by field
* Update list of diagnostics with new logging message generator diagnostic
* Only add non-logger field names to set of shadowed names
* Add comment explaining the use of the set of shadowed names with an example
* Change the ReciprocalEstimate and ReciprocalSqrtEstimate APIs to be mustExpand on RyuJIT
* Apply formatting patch
* Fix the RV64 and LA64 builds
* Mark the ReciprocalEstimate and ReciprocalSqrtEstimate methods as AggressiveOptimization to bypass R2R
* Mark other usages of ReciprocalEstimate and ReciprocalSqrtEstimate in Corelib with AggressiveOptimization
* Mark several non-deterministic APIs as BypassReadyToRun and skip intrinsic expansion in R2R
* Cleanup based on PR recommendations to rely on the runtime rather than attribution of non-deterministic intrinsics
* Adding a regression test ensuring direct and indirect invocation of non-deterministic intrinsic APIs returns the same result
* Add a note about non-deterministic intrinsic expansion to the BOTR
* Apply formatting patch
* Ensure vector tests are correctly validating against the scalar implementation
* Fix the JIT/SIMD/VectorConvert test and work around a 32-bit test issue
* Skip a test on Mono due to a known/tracked issue
* Ensure that lowering on Arm64 doesn't make an assumption about cast shapes
* Ensure the tier0opts local is used
* Ensure impEstimateIntrinsic bails out for APIs that need to be implemented as user calls
* Update link to dotnet-format tool
  * It now lives in dotnet/sdk
  * Link to official docs instead
Co-authored-by: Jan Kotas <jkotas@microsoft.com>
---------
Co-authored-by: Jan Kotas <jkotas@microsoft.com>
- Map indirect global values to corresponding values in `pointer_data` array from contract descriptor
- Add `[Try]Read<T>`, `[Try]ReadPointer`, `[Try]ReadGlobal<T>` and `[Try]ReadGlobalPointer` to `Target`
- Make cDAC implementation of `ISOSDacInterface9.GetBreakingChangeVersion` read value from globals
- Create test helpers for mocking out reading from the target and providing a contract descriptor for different bitness/endianness
- Add unit tests for reading global values (direct and indirect)
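A rough model of the read helpers and the test mocks follows. It assumes a descriptor in which a direct global is stored as a number and an indirect global as a one-element list holding an index into `pointer_data`; `MockTarget`, its method names, and that encoding are assumptions for illustration, not the cDAC's C# API. Byte order and pointer size are parameters so the same mock covers different bitness/endianness.

```python
import struct
from typing import Dict, List, Optional, Union

# A global is either a direct value (int) or an indirect one ([index] into pointer_data).
GlobalEntry = Union[int, List[int]]

class MockTarget:
    """Hypothetical stand-in for the cDAC Target: flat memory plus a parsed descriptor."""

    def __init__(self, memory: bytes, base: int, little_endian: bool, pointer_size: int,
                 globals_: Dict[str, GlobalEntry], pointer_data: List[int]) -> None:
        self.memory = memory
        self.base = base                                # target address of memory[0]
        self.order = "<" if little_endian else ">"
        self.pointer_size = pointer_size                # 4 or 8
        self.globals = globals_
        self.pointer_data = pointer_data

    def read_u32(self, address: int) -> int:
        return struct.unpack_from(self.order + "I", self.memory, address - self.base)[0]

    def read_pointer(self, address: int) -> int:
        fmt = "Q" if self.pointer_size == 8 else "I"
        return struct.unpack_from(self.order + fmt, self.memory, address - self.base)[0]

    def try_read_global(self, name: str) -> Optional[int]:
        entry = self.globals.get(name)
        if entry is None:
            return None
        if isinstance(entry, list):                     # indirect: look up pointer_data
            return self.pointer_data[entry[0]]
        return entry                                    # direct: value is in the descriptor

    def read_global(self, name: str) -> int:
        value = self.try_read_global(name)
        if value is None:
            raise KeyError(f"global '{name}' not found in contract descriptor")
        return value
```

The `try_` variant returns `None` instead of raising, mirroring the `[Try]` naming pattern listed above.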
# cDAC Build Tool
## Summary
The purpose of `cdac-build-tool` is to generate a `.c` file that contains a JSON cDAC contract descriptor.
It works by processing one or more object files containing data descriptors and zero or more text
files that specify contracts.
## Running
```console
% cdac-build-tool compose [-v] -o contractdescriptor.c -c contracts.txt datadescriptor.o
```
## .NET runtime build integration
`cdac-build-tool` is meant to run as a CMake custom command.
It consumes a target platform object file and emits a C source
file that contains a JSON contract descriptor. The C source
is then included in the normal build and link steps to create the runtime.
The contract descriptor source file depends on `contractpointerdata.c`, a source file that contains
the definitions of the "indirect pointer data" referenced by the data descriptor. These are typically the addresses of important global variables in the runtime.
Constants and build flags are embedded directly in the JSON payload.
Multiple data descriptor source files may be specified (for example if they are produced by different components of the runtime, or by different source languages). The final JSON payload will be a composition of all the data descriptors.
Multiple contracts text files may be specified. This may be useful if some contracts are conditionally included (for example if they are platform-specific). The final JSON payload will be a composition of all the contracts files.
```mermaid
flowchart TB
headers("runtime headers")
data_header("datadescriptor.h")
data_src("datadescriptor.c")
compile_data["clang"]
data_obj("datadescriptor.o")
contracts("contracts.txt")
globals("contractpointerdata.c")
build[["cdac-build-tool"]]
descriptor_src("contractdescriptor.c")
vm("runtime sources")
compile_runtime["clang"]
runtime_lib(["libcoreclr.so"])
headers -.-> data_src
headers ~~~ data_header
data_header -.-> data_src
headers -.-> globals
headers -.-> vm
data_src --> compile_data --> data_obj --> build
contracts ---> build
build --> descriptor_src
descriptor_src --> compile_runtime
data_header -.-> globals ----> compile_runtime
vm ----> compile_runtime --> runtime_lib
```
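The composition step can be sketched as a merge of the parsed inputs. This Python sketch is hypothetical: the `types`/`globals`/`contracts` keys, the `compose` function, and the choice to reject conflicting definitions are assumptions, not the actual behavior of `cdac-build-tool`, which additionally wraps the payload in the generated C source.

```python
import json
from typing import Dict, List

def _merge(into: Dict[str, object], entries: Dict[str, object], kind: str) -> None:
    for name, value in entries.items():
        if name in into and into[name] != value:
            raise ValueError(f"conflicting {kind} definition for '{name}'")
        into[name] = value

def compose(data_descriptors: List[dict], contract_files: List[Dict[str, int]]) -> str:
    types: Dict[str, object] = {}
    globals_: Dict[str, object] = {}
    contracts: Dict[str, object] = {}
    for descriptor in data_descriptors:    # e.g. one per scraped object file
        _merge(types, descriptor.get("types", {}), "type")
        _merge(globals_, descriptor.get("globals", {}), "global")
    for contract_file in contract_files:   # e.g. one per (possibly platform-specific) file
        _merge(contracts, contract_file, "contract")
    # Compact output, since the payload is embedded as a string in the generated C source.
    return json.dumps({"types": types, "globals": globals_, "contracts": contracts},
                      separators=(",", ":"), sort_keys=True)

print(compose([{"types": {"Thread": {"Id": 0}}}], [{"Thread": 1}]))
# {"contracts":{"Thread":1},"globals":{},"types":{"Thread":{"Id":0}}}
```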
---
* add implementation notes
* add an emitter
* read in the directory header
* contract parsing
* indirect pointer value support
* move sample to tool dir
* Take baselines from the docs/design/datacontracts/data dir
We don't parse them yet, however
* Add README
* fix BE
Store the magic as a uint64_t so that it will follow the platform endianness.
Store endmagic as bytes so that it directly follows the name pool - and fix the endmagic check not to look at the endianness
* hook up cdac-build-tool to the coreclr build; export DotNetRuntimeContractDescriptor
* cleanup; add contracts.txt
* add diagram to README
* move implementation notes
* better verbose output from ObjectFileScraper
* turn off whole program optimizations for data-descriptor.obj
On Windows, /GL creates object files that cdac-build-tool cannot read
It's ok to do this because we don't ship data-descriptor.obj as part of the product - it's only used to generate the cDAC descriptor
* C++-ify and add real Thread offsets
* no C99 designated initializers in C++ until C++20
* build data descriptor after core runtime
* fix gcc build
* simplify ObjectFileScraper
just read the whole file into memory
* invoke 'dotnet cdac-build-tool.dll' instead of 'dotnet run --project'
* clean up macro boilerplate
* platform flags
* turn off verbose output
* can't use constexpr function in coreclr
because debugreturn.h defines a `return` macro that expands to something that is not C++11 constexpr
* Rename "aux data" to "pointer data"
* rename "data-descriptor" to "datadescriptor"
* simplify linking
* cdac-build-tool don't build dotnet tool; turn on analyzers
* rationalize naming; update docs; add some inline comments
* rename cdac.h to cdacoffsets.h
* improve output: hex offsets; improved formatting
* don't throw in ParseContracts; add line numbers to errors
* change input format for contracts to jsonc
* add custom JsonConverter instances for the compact json representation
* simplify; bug fix - PointerDataCount include placeholder
* one more set of feedback changes: simpler json converters
* set _RequiresLiveILLink=false for cdac-build-tool.csproj
fixes windows builds:
error MSB3026: (NETCORE_ENGINEERING_TELEMETRY=Build) Could not copy "D:\a\_work\1\s\artifacts\obj\ILLink.Tasks\Debug\net9.0\ILLink.Tasks.dll" to "D:\a\_work\1\s\artifacts\bin\ILLink.Tasks\Debug\net9.0\ILLink.Tasks.dll". Beginning retry 1 in 1000ms. The process cannot access the file 'D:\a\_work\1\s\artifacts\bin\ILLink.Tasks\Debug\net9.0\ILLink.Tasks.dll' because it is being used by another process.
---------
Co-authored-by: Elinor Fung <elfung@microsoft.com>
Co-authored-by: Aaron Robinson <arobins@microsoft.com>
This is a small workaround to allow developers working on Mac the
ability to generate .dSYM bundles as part of inner-loop development,
instead of the unsupported .dwarf files that are generated by default.
A full solution to use .dSYM bundles everywhere on Mac, including
packaging and symbol indexing, is tracked by
https://github.com/dotnet/runtime/issues/92911.
To build .dSYM bundles instead of .dwarf files, invoke build.sh as
follows:
```bash
./build.sh --subset clr --cmakeargs "-DCLR_CMAKE_APPLE_DSYM=TRUE"
```
* Replace FEATURE_EH_FUNCLETS/FEATURE_EH_CALLFINALLY_THUNKS in JIT with runtime switch
* Cache Native AOT ABI check to see if TP improves
---------
Co-authored-by: Bruce Forstall <brucefo@microsoft.com>
- Only create one .NET install layout to be shared by all host tests
- Add `pretest.proj` for `host.pretest` subset that builds all test project assets and creates the single .NET install layout
- Fix `NativeHostApis` tests that were editing the .NET install layout directly (instead of creating a copy to edit)
- Remove some unnecessary copying/creating of SDKs and frameworks by sharing the fixture across tests
- Update host testing doc with simpler setup instructions and more details around investigating test failures
Building on #100253, describe an in-memory representation of the top-level contract descriptor, consisting of:
* some target architecture properties
* a data descriptor
* a collection of compatible contracts
Contributes to #99298
Fixes https://github.com/dotnet/runtime/issues/99299
---
* [cdac] Physical contract descriptor spec
* Add "contracts" to the data descriptor
* one runtime per module
if there are multiple hosted runtimes, diagnostic tooling should look in each loaded module to discover the contract descriptor
* Apply suggestions from code review
* Review feedback
- put the aux data and descriptor sizes closer to the pointers
- Don't include the trailing NUL in `descriptor_size`. Clarify that it counts bytes and that `descriptor` is in UTF-8
- Simplify `DotNetRuntimeContractDescriptor` naming discussion
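To illustrate the `descriptor_size` rule: the size counts the UTF-8 bytes of the JSON and excludes the trailing NUL that a C string literal carries. A minimal Python sketch, where `encode_descriptor` is a hypothetical helper:

```python
import json
from typing import Tuple

def encode_descriptor(descriptor: dict) -> Tuple[bytes, int]:
    """Return (NUL-terminated UTF-8 blob, descriptor_size in bytes excluding the NUL)."""
    payload = json.dumps(descriptor, separators=(",", ":"), ensure_ascii=False).encode("utf-8")
    return payload + b"\0", len(payload)

blob, size = encode_descriptor({"n": "é"})
print(size, len(blob))  # 10 11: the JSON is 9 characters, but 'é' takes 2 UTF-8 bytes
```

Counting bytes rather than characters matters as soon as any name in the descriptor is non-ASCII.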
---------
Co-authored-by: Elinor Fung <elfung@microsoft.com>
- Delete build infrastructure around test project asset restore
- Remove requirement that packs must be built before running host tests
- Building packs was only necessary to support directing the restore/build for the test project assets to the built packs
Contributes to #100162 which is part of #99298
Follow-up to #99936 that removes "type layout" and "global value" contracts and instead replaces them with a "data descriptor" blob.
Conceptually a particular target runtime provides a pair of a logical data descriptor together with a set of algorithmic contract versions. The logical data descriptor is just a single model that defines all the globals and type layouts relevant to the set of algorithmic contract versions.
A logical data descriptor is realized by merging two physical data descriptors in a prescribed order.
The physical data descriptors provide some subset of the type layouts or global values.
The physical data descriptors come in two flavors:
- baseline descriptors that are checked into the dotnet/runtime repo and have well-known names
- in-proc descriptors that get embedded into a target runtime.
Each in-proc descriptor may refer to a baseline and represents a delta applied on top of the baseline.
The data contract model works on top of a flattened logical data descriptor.
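Flattening can be sketched as applying the baseline first and the in-proc delta second. The Python below is a hypothetical illustration (`flatten` is an invented name): it assumes the delta may add or override individual fields of a baseline type and replace whole global values, which may not match the exact merge rules.

```python
from typing import Optional

def flatten(baseline: Optional[dict], in_proc: dict) -> dict:
    """Merge physical descriptors in order: baseline first, then the in-proc delta."""
    result: dict = {"types": {}, "globals": {}}
    for layer in (baseline or {}, in_proc):
        for name, fields in layer.get("types", {}).items():
            # A later layer may add or override individual fields of a type.
            result["types"][name] = {**result["types"].get(name, {}), **fields}
        for name, value in layer.get("globals", {}).items():
            # A later layer replaces a global's value outright.
            result["globals"][name] = value
    return result

base = {"types": {"Thread": {"Id": 0, "Next": 8}}, "globals": {"A": 1}}
delta = {"types": {"Thread": {"Next": 16}}, "globals": {"B": 2}}
print(flatten(base, delta)["types"]["Thread"])  # {'Id': 0, 'Next': 16}
```

Building new dictionaries for each type keeps the checked-in baseline unmodified, so the same baseline can be flattened against different in-proc deltas.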
Co-authored-by: Aaron Robinson <arobins@microsoft.com>
Co-authored-by: Jan Kotas <jkotas@microsoft.com>
Co-authored-by: Noah Falk <noahfalk@users.noreply.github.com>
* Delete native safehandle
* Delete PInvoke last error on thread
* Delete IsRealThreadPoolResetNeeded
* Delete TS_TaskReset
* Delete GetThreadContext
* Fix build break
* Delete unused resource strings
* Introduce FEATURE_IJW and use it in number of places
The operation of `mkrefany` can easily be represented with more
generally handled nodes within the JIT today. This also allows promotion
to remain enabled for methods using this construct, so CQ improvements
are expected when optimizing.