
Profiling

Profiling, in this document, means monitoring the execution of a program which is executing on the Common Language Runtime (CLR). This document details the interfaces, provided by the Runtime, to access such information.

Although it is called the Profiling API, the functionality provided by it is suitable for use by more than just traditional profiling tools. Traditional profiling tools focus on measuring the execution of the program—time spent in each function, or memory usage of the program over time. However, the profiling API is really targeted at a broader class of diagnostic tools, such as code-coverage utilities or even advanced debugging aids.

The common thread among all of these uses is that they are all diagnostic in nature — the tool is written to monitor the execution of a program. The Profiling API should never be used by the program itself, and the correctness of the program's execution should not depend on (or be affected by) having a profiler active against it.

Profiling a CLR program requires more support than profiling conventionally compiled machine code. This is because the CLR introduces concepts such as application domains, garbage collection, managed exception handling and JIT compilation of code (converting Intermediate Language into native machine code) that conventional profiling mechanisms cannot identify or provide useful information about. The Profiling API provides this missing information in an efficient way that minimizes the impact on the performance of the CLR and the profiled program.

Note that JIT-compiling routines at runtime provides good opportunities, as the API allows a profiler to change the in-memory IL code stream for a routine and then request that it be JIT-compiled anew. In this way, the profiler can dynamically add instrumentation code to particular routines that need deeper investigation. Although this approach is possible in conventional scenarios, it is much easier to do for the CLR.
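
Jumping ahead a little, the sketch below shows the general shape of that pattern for a hypothetical profiler class MyProfiler, assuming m_pInfo is an ICorProfilerInfo pointer saved during initialization (discussed later) and BuildInstrumentedBody is a hypothetical helper, not part of the API, that produces the rewritten IL using the module's IMethodMalloc allocator:

// Sketch only: rewrite a method's IL from the JITCompilationStarted callback of a
// hypothetical profiler class MyProfiler. m_pInfo is the saved ICorProfilerInfo
// pointer; BuildInstrumentedBody is a hypothetical helper, not part of the API.
HRESULT MyProfiler::JITCompilationStarted(FunctionID functionId, BOOL fIsSafeToBlock)
{
    ClassID classId;
    ModuleID moduleId;
    mdToken methodToken;
    if (FAILED(m_pInfo->GetFunctionInfo(functionId, &classId, &moduleId, &methodToken)))
        return S_OK;                                  // never fail the runtime's callback

    // Read the method's current IL body.
    LPCBYTE pOldBody = nullptr;
    ULONG cbOldBody = 0;
    if (FAILED(m_pInfo->GetILFunctionBody(moduleId, (mdMethodDef)methodToken, &pOldBody, &cbOldBody)))
        return S_OK;

    // The replacement body must be allocated by the module's IL allocator.
    IMethodMalloc *pMalloc = nullptr;
    if (FAILED(m_pInfo->GetILFunctionBodyAllocator(moduleId, &pMalloc)))
        return S_OK;

    LPCBYTE pNewBody = BuildInstrumentedBody(pOldBody, cbOldBody, pMalloc);  // hypothetical
    if (pNewBody != nullptr)
        m_pInfo->SetILFunctionBody(moduleId, (mdMethodDef)methodToken, pNewBody);

    pMalloc->Release();
    return S_OK;
}

The replacement body must come from the allocator returned by GetILFunctionBodyAllocator so that it lives in memory the runtime can address from that module.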

Goals for the Profiling API

  • Expose information that existing profilers will require for a user to determine and analyze performance of a program run on the CLR. Specifically:

    • Common Language Runtime startup and shutdown events
    • Application domain creation and shutdown events
    • Assembly loading and unloading events
    • Module load/unload events
    • COM VTable creation and destruction events
    • JIT-compilation and code-pitching events
    • Class load/unload events
    • Thread birth/death/synchronization
    • Function entry/exit events
    • Exceptions
    • Transitions between managed and unmanaged execution
    • Transitions between different Runtime contexts
    • Information about Runtime suspensions
    • Information about the Runtime memory heap and garbage collection activity
  • Callable from any (non-managed) COM-compatible language

  • Efficient, in terms of CPU and memory consumption - the act of profiling should not change the behavior of the program being profiled so much that the results are misleading

  • Useful to both sampling and non-sampling profilers. [A _sampling_ profiler inspects the profilee at regular clock ticks - say, 5 milliseconds apart. A _non-sampling_ profiler is informed of events synchronously with the thread that causes them.]

Non-goals for the Profiling API

  • The Profiling API does not support profiling unmanaged code. Existing mechanisms must instead be used to profile unmanaged code. The CLR profiling API works only for managed code. However, the Profiling API provides managed/unmanaged transition events that can be used to determine the boundaries between managed and unmanaged code.
  • The Profiling API does not support writing applications that will modify their own code, for purposes such as aspect-oriented programming.
  • The Profiling API does not provide information needed to check bounds. The CLR provides intrinsic support for bounds checking of all managed code.

The CLR code profiler interfaces do not support remote profiling, for the following reasons:

  • The execution-time overhead of these interfaces must be minimized so that profiling results are not unduly affected. This is especially true when execution performance is being monitored. It is less of a limitation when the interfaces are used to monitor memory usage or to obtain Runtime information about stack frames, objects, etc.
  • The code profiler needs to register one or more callback interfaces with the Runtime on the local machine on which the application being profiled runs. This limits the ability to create a remote code profiler.

Profiling API Overview

The profiling API within the CLR allows the user to monitor the execution and memory usage of a running application. Typically, this API will be used to write a code profiler package. In the sections that follow, we will talk about a profiler as a package built to monitor execution of any managed application.

The profiling API is used by a profiler DLL, loaded into the same process as the program being profiled. The profiler DLL implements a callback interface (ICorProfilerCallback2). The runtime calls methods on that interface to notify the profiler of events in the profiled process. The profiler can call back into the runtime with methods on ICorProfilerInfo to get information about the state of the profiled application.
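
As a rough sketch (class and member names here are illustrative, not prescribed by the API), such a profiler DLL exposes a COM class along the following lines. Every method of ICorProfilerCallback2 must be implemented, but most are usually trivial stubs that simply return S_OK:

// Shape of a profiler DLL's callback class (names are illustrative). Every method of
// ICorProfilerCallback2 must be implemented; those the profiler does not care about
// are typically trivial stubs that return S_OK.
#include <cor.h>
#include <corprof.h>

class MyProfiler : public ICorProfilerCallback2
{
public:
    // IUnknown methods (QueryInterface / AddRef / Release) omitted for brevity.

    // Called once by the runtime when the profiler is loaded into the process.
    HRESULT STDMETHODCALLTYPE Initialize(IUnknown *pICorProfilerInfoUnk) override;

    // A few of the notifications this profiler chooses to act on.
    HRESULT STDMETHODCALLTYPE ClassLoadFinished(ClassID classId, HRESULT hrStatus) override;
    HRESULT STDMETHODCALLTYPE JITCompilationStarted(FunctionID functionId, BOOL fIsSafeToBlock) override;

    // ...all remaining ICorProfilerCallback/ICorProfilerCallback2 methods: return S_OK...

private:
    ICorProfilerInfo *m_pInfo = nullptr;   // saved in Initialize, used for later queries
};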

Note that only the data-gathering part of the profiler solution should be running in-process with the profiled application—UI and data analysis should be done in a separate process.

Profiling Process Overview

The ICorProfilerCallback and ICorProfilerCallback2 interfaces consist of methods with names like ClassLoadStarted, ClassLoadFinished, and JITCompilationStarted. Each time the CLR loads/unloads a class, compiles a function, etc., it calls the corresponding method in the profiler's ICorProfilerCallback/ICorProfilerCallback2 interface. (The same pattern holds for all of the other notifications; see below for details.)

So, for example, a profiler could measure code performance via the two notifications FunctionEnter and FunctionLeave. It simply timestamps each notification, accumulates results, then outputs a list indicating which functions consumed the most CPU time, or the most wall-clock time, during execution of the application.
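
The bookkeeping behind such a profiler might look roughly like the sketch below. Note that the real enter/leave hooks are raw function pointers registered through ICorProfilerInfo::SetEnterLeaveFunctionHooks and are subject to strict calling-convention rules; this sketch ignores that plumbing, as well as locking and nested or recursive calls, and uses illustrative names such as OnFunctionEnter:

// Simplified sketch of the bookkeeping behind an enter/leave profiler. The real hooks
// are registered with ICorProfilerInfo::SetEnterLeaveFunctionHooks and have strict
// calling-convention requirements; that plumbing, locking, and nested/recursive calls
// are all ignored here. OnFunctionEnter/OnFunctionLeave are illustrative names.
#include <windows.h>
#include <corprof.h>
#include <map>

static std::map<FunctionID, ULONGLONG> g_inclusiveMs;   // accumulated milliseconds per function
static thread_local ULONGLONG t_enterMs;                // timestamp taken on entry (no nesting)

void OnFunctionEnter(FunctionID functionId)
{
    t_enterMs = GetTickCount64();
}

void OnFunctionLeave(FunctionID functionId)
{
    g_inclusiveMs[functionId] += GetTickCount64() - t_enterMs;   // not thread-safe; sketch only
}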

The ICorProfilerCallback/ICorProfilerCallback2 interface can be considered to be the "notifications API".

The other interface involved in profiling is ICorProfilerInfo. The profiler calls this, as required, to obtain more information to help its analysis. For example, whenever the CLR calls FunctionEnter it supplies a value for the FunctionId. The profiler can discover more about that FunctionId by calling ICorProfilerInfo::GetFunctionInfo to discover the function's parent class, its name, and so on.
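
For example, a sketch of resolving a FunctionID to a method name might look like the following, assuming pInfo is the saved ICorProfilerInfo pointer (error handling abbreviated):

// Sketch: resolve a FunctionID to its class, metadata token and method name.
// pInfo is the saved ICorProfilerInfo pointer; error handling is abbreviated.
void PrintFunctionName(ICorProfilerInfo *pInfo, FunctionID functionId)
{
    ClassID classId;
    ModuleID moduleId;
    mdToken token;
    if (FAILED(pInfo->GetFunctionInfo(functionId, &classId, &moduleId, &token)))
        return;

    // Obtain a metadata reader for the module containing the function.
    IMetaDataImport *pImport = nullptr;
    mdToken methodToken;
    if (FAILED(pInfo->GetTokenAndMetaDataFromFunction(functionId, IID_IMetaDataImport,
                                                      (IUnknown **)&pImport, &methodToken)))
        return;

    WCHAR name[256];
    ULONG cchName = 0;
    mdTypeDef typeDef;
    if (SUCCEEDED(pImport->GetMethodProps(methodToken, &typeDef, name, 256, &cchName,
                                          nullptr, nullptr, nullptr, nullptr, nullptr)))
        wprintf(L"function: %s\n", name);

    pImport->Release();
}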

The picture so far describes what happens once the application and profiler are running. But how are the two connected together when an application is started? The CLR makes the connection during its initialization in each process. It decides whether to connect to a profiler, and which profiler that should be, depending upon the values of two environment variables, checked one after the other:

  • Cor_Enable_Profiling - only connect with a profiler if this environment variable exists and is set to a non-zero value.
  • Cor_Profiler - connect with the profiler with this CLSID or ProgID (which must have been stored previously in the Registry). The Cor_Profiler environment variable is defined as a string:
    • set Cor_Profiler={32E2F4DA-1BEA-47ea-88F9-C5DAF691C94A}, or
    • set Cor_Profiler="MyProfiler"
  • The profiler class is the one that implements ICorProfilerCallback/ICorProfilerCallback2. It is required that a profiler implement ICorProfilerCallback2; if it does not, it will not be loaded.

When both checks above pass, the CLR creates an instance of the profiler in a similar fashion to CoCreateInstance. The profiler is not loaded through a direct call to CoCreateInstance, in order to avoid a call to CoInitialize, which would require setting the threading model. The CLR then calls the ICorProfilerCallback::Initialize method in the profiler. The signature of this method is:

HRESULT Initialize(IUnknown *pICorProfilerInfoUnk)

The profiler must QueryInterface pICorProfilerInfoUnk for an ICorProfilerInfo interface pointer and save it so that it can request further information later during profiling. It then calls ICorProfilerInfo::SetEventMask to specify which categories of notifications it is interested in. For example:

// Obtain the ICorProfilerInfo interface from the IUnknown passed to Initialize.
ICorProfilerInfo* pInfo = nullptr;

pICorProfilerInfoUnk->QueryInterface(IID_ICorProfilerInfo, (void**)&pInfo);

// Ask for only the notification categories this profiler cares about.
pInfo->SetEventMask(COR_PRF_MONITOR_ENTERLEAVE | COR_PRF_MONITOR_GC);

This mask would be used for a profiler interested only in function enter/leave notifications and garbage collection notifications. The profiler then simply returns, and is off and running!

By setting the notifications mask in this way, the profiler can limit which notifications it receives. This obviously helps the user build a simpler or special-purpose profiler; it also avoids wasting CPU time sending notifications that the profiler would simply 'drop on the floor' (see later for details).

TODO: This text is a bit confusing. It seems to be conflating the fact that you need to create a different 'environment' (as in environment variables) to specify a different profiler and the fact that only one profiler can attach to a process at once. It may also be conflating launch vs. attach scenarios. Is that right??

Note that only one profiler can be profiling a process at one time in a given environment. In different environments it is possible to have two different profilers registered in each environment, each profiling separate processes.

Certain profiler events are IMMUTABLE, which means that once they are set in the ICorProfilerCallback::Initialize callback they cannot be turned off later using ICorProfilerInfo::SetEventMask. Trying to change an immutable event results in SetEventMask returning a failed HRESULT.
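
A minimal sketch of adjusting the mask after Initialize and checking the result follows, again assuming pInfo is the saved ICorProfilerInfo pointer. The particular flag shown is only illustrative; the immutable set itself is defined by the COR_PRF_MONITOR_IMMUTABLE mask in corprof.h:

// Sketch: toggle a notification after Initialize and check the result. Which flags
// are immutable is defined by the COR_PRF_MONITOR_IMMUTABLE mask in corprof.h; the
// flag used here is only an illustration.
DWORD events = 0;
if (SUCCEEDED(pInfo->GetEventMask(&events)))
{
    HRESULT hr = pInfo->SetEventMask(events | COR_PRF_MONITOR_EXCEPTIONS);
    if (FAILED(hr))
    {
        // The request touched an immutable event (or was otherwise invalid);
        // the previously set mask remains in effect.
    }
}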

The profiler must be implemented as an in-process COM server (a DLL mapped into the same address space as the process being profiled). Any other type of COM server is not supported. If a profiler wants to monitor applications from a remote computer, for example, it must implement 'collector agents' on each machine, which batch results and communicate them to the central data-collection machine.

Profiling API Recurring Concepts

This brief section explains a few concepts that apply throughout the profiling API, rather than repeat them with the description of each method.

IDs

Runtime notifications supply an ID for reported classes, threads, AppDomains, etc. These IDs can be used to query the Runtime for more information. Each ID is simply the address of a block in memory that describes the item; however, IDs should be treated as opaque handles by any profiler. If an invalid ID is used in a call to any Profiling API function, the results are undefined - most likely, an access violation. The caller has to ensure that the IDs used are valid; the profiling API does not perform any validation, since that would add overhead and slow down execution considerably.

Uniqueness

A ProcessID is unique system-wide for the lifetime of the process. All other IDs are unique process-wide for the lifetime of the ID.

Hierarchy & Containment

IDs are arranged in a hierarchy, mirroring the hierarchy in the process: Processes contain AppDomains, which contain Assemblies, which contain Modules, which contain Classes, which contain Functions. Threads are contained within Processes, and may move from AppDomain to AppDomain. Objects are mostly contained within AppDomains (a very few objects may be members of more than one AppDomain at a time). Contexts are contained within Processes.

Lifetime & Stability

When a given ID dies, all IDs contained within it die.

  • ProcessID - Alive and stable from the call to Initialize until the return from Shutdown.
  • AppDomainID - Alive and stable from the call to AppDomainCreationFinished until the return from AppDomainShutdownStarted.
  • AssemblyID, ModuleID, ClassID - Alive and stable from the call to LoadFinished for the ID until the return from UnloadStarted for the ID.
  • FunctionID - Alive and stable from the call to JITCompilationFinished or JITCachedFunctionSearchFinished until the death of the containing ClassID.
  • ThreadID - Alive and stable from the call to ThreadCreated until the return from ThreadDestroyed.
  • ObjectID - Alive beginning with the call to ObjectAllocated. Eligible to change or die with each garbage collection.
  • GCHandleID - Alive from the call to HandleCreated until the return from HandleDestroyed.

In addition, any ID returned from a profiling API function will be alive at the time it is returned.
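
In practice these rules mean a profiler caches information when an ID becomes alive and discards it when the ID is about to die. A sketch, using illustrative names such as g_moduleNames and assuming m_pInfo is the saved ICorProfilerInfo pointer:

// Sketch: cache per-module data at ModuleLoadFinished and discard it at
// ModuleUnloadStarted, matching the lifetime rules above. g_moduleNames is an
// illustrative cache; a real profiler would also need synchronization.
#include <map>
#include <string>

static std::map<ModuleID, std::wstring> g_moduleNames;

HRESULT MyProfiler::ModuleLoadFinished(ModuleID moduleId, HRESULT hrStatus)
{
    if (SUCCEEDED(hrStatus))
    {
        WCHAR name[512];
        ULONG cchName = 0;
        LPCBYTE baseAddress = nullptr;
        AssemblyID assemblyId;
        if (SUCCEEDED(m_pInfo->GetModuleInfo(moduleId, &baseAddress, 512, &cchName, name, &assemblyId)))
            g_moduleNames[moduleId] = name;          // the ModuleID is valid from here...
    }
    return S_OK;
}

HRESULT MyProfiler::ModuleUnloadStarted(ModuleID moduleId)
{
    g_moduleNames.erase(moduleId);                   // ...until the return from this callback
    return S_OK;
}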

App-Domain Affinity

There is an AppDomainID for each user-created app-domain in the process, plus the "default" domain, plus a special pseudo-domain used for holding domain-neutral assemblies.

Assembly, Module, Class, Function, and GCHandleIDs have app-domain affinity, meaning that if an assembly is loaded into multiple app domains, it (and all of the modules, classes, and functions contained within it) will have a different ID in each, and operations upon each ID will take effect only in the associated app domain. Domain-neutral assemblies will appear in the special pseudo-domain mentioned above.
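
As a sketch of observing this affinity (method shape as in the earlier callback sketch, with m_pInfo the saved ICorProfilerInfo pointer): if the same assembly is loaded into two app domains, AssemblyLoadFinished is raised once per domain with a distinct AssemblyID, and ICorProfilerInfo::GetAssemblyInfo reports the owning AppDomainID for each.

// Sketch: the same assembly loaded into two app domains produces two distinct
// AssemblyIDs; GetAssemblyInfo reports the owning AppDomainID for each.
HRESULT MyProfiler::AssemblyLoadFinished(AssemblyID assemblyId, HRESULT hrStatus)
{
    if (SUCCEEDED(hrStatus))
    {
        WCHAR name[512];
        ULONG cchName = 0;
        AppDomainID appDomainId;
        ModuleID manifestModuleId;
        if (SUCCEEDED(m_pInfo->GetAssemblyInfo(assemblyId, 512, &cchName, name,
                                               &appDomainId, &manifestModuleId)))
        {
            // Record (assemblyId, appDomainId): operations on this AssemblyID
            // affect only the associated app domain.
        }
    }
    return S_OK;
}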

Special Notes

All IDs except ObjectID should be treated as opaque values. Most IDs are fairly self-explanatory. A few are worth explaining in more detail:

ClassIDs represent classes. In the case of generic classes, they represent fully-instantiated types: List<int>, List<char>, and List<object>, for example, each have their own ClassID.