mirror of https://github.com/VSadov/Satori.git synced 2025-06-09 09:34:49 +09:00

Move and lint ECMA-335 Augments spec (#35095)

* Move specs and rename Ecma-335-Issues.md

* Fix typos

* MD009/no-trailing-spaces

* MD012/no-multiple-blanks

* MD007/ul-indent

* MD031/blanks-around-fences

* MD032/blanks-around-lists

* MD040/fenced-code-language

* MD030/list-marker-space

* Simplify markdown code span syntax

* Update title and intro

* Fix broken markdown code span

* Link addendum doc from dotnet-standards.md
Joseph Musser 2020-04-17 11:26:55 -04:00 committed by GitHub
parent 684a76b1e3
commit da94c02257
Signed by: github
GPG key ID: 4AEE18F83AFDEB23
4 changed files with 59 additions and 62 deletions

@ -1,25 +1,20 @@
# Known Issues in ECMA-335 CLI Specification
# ECMA-335 CLI Specification Addendum
This is an informal list of notes on issues that have been encountered
with the ECMA-335 CLI specification, primarily during development,
testing, and support of System.Reflection.Metadata.
Some of these are definite spec errors while others could be reasoned
as Microsoft implementation quirks.
This is a list of additions and edits to be made in ECMA-335 specifications. It includes both documentation of new runtime features and issues encountered during development. Some of the issues are definite spec errors while others could be reasoned as Microsoft implementation quirks.
## Signatures
There is a general philosophical issue whereby the spec defines the
There is a general philosophical issue whereby the spec defines the
*syntax* of signatures to exclude errors such as:
* using void outside of return types or pointer element types
* instantiating a generic with a byref type
* having a field of byref type
* etc.
* using void outside of return types or pointer element types
* instantiating a generic with a byref type
* having a field of byref type
* etc.
Another approach is to syntactically treat `VOID`, `TYPEDBYREF`,
`BYREF Type`, `CMOD_OPT Type`, `CMOD_REQ Type` as the other `Type`s
and then deal with the cases like those above as semantic errors in their use.
and then deal with the cases like those above as semantic errors in their use.
That is closer to how many implementations work. It is also how type syntax
is defined in the grammar for IL, with many of the semantic errors
deferred to peverify and/or runtime checking rather than being checked
@ -33,22 +28,24 @@ grammars and diagrams.
Many of the specific issues below arise from the tension between these
two approaches.
### 1. `(CLASS | VALUETYPE)` cannot be followed by TypeSpec in practice
In II.23.2.12 and II.23.2.14, it is implied that the token in `(CLASS
| VALUETYPE) TypeDefOrRefOrSpecEncoded` can be a `TypeSpec`, when in
In II.23.2.12 and II.23.2.14, it is implied that the token in
`(CLASS | VALUETYPE) TypeDefOrRefOrSpecEncoded` can be a `TypeSpec`, when in
fact it must be a `TypeDef` or `TypeRef`.
peverify gives the following error:
```
[MD]: Error: Signature has token following ELEMENT_TYPE_CLASS
(_VALUETYPE) that is not a TypeDef or TypeRef
```
An insightful comment in CLR source code notes that this rule prevents
cycles in signatures, but see #2 below.
Related issue:
* https://github.com/dotnet/roslyn/issues/7970
#### Proposed specification change
@ -59,11 +56,11 @@ b) Replace
> These items are compact ways to store a TypeDef, TypeRef, or TypeSpec token in a Signature (§II.23.2.12).
with
with
> TypeDefOrRefEncoded is a compact representation of either TypeDef or TypeRef token in a Signature (§II.23.2.12). TypeDefOrRefOrSpecEncoded is a compact representation of either TypeDef, TypeRef or TypeSpec token in a Signature.
Also correct
Also correct
> The encoded version of this TypeRef token is made up as follows:
@ -73,7 +70,7 @@ to
c) In section II.23.2.12 replace
```
```ebnf
Type ::=
...
CLASS TypeDefOrRefOrSpecEncoded
@ -86,19 +83,19 @@ Type ::=
with
```
```ebnf
Type ::=
...
(CLASS | VALUETYPE) TypeDefOrRefEncoded
GENERICINST (CLASS | VALUETYPE) TypeDefOrRefEncoded GenArgCount Type+
GENERICINST (CLASS | VALUETYPE) TypeDefOrRefEncoded GenArgCount Type+
...
```
Note also the correction of ```Type*``` to ```Type+```. A generic type instantiation shall have at least one type argument.
Note also the correction of `Type*` to `Type+`. A generic type instantiation shall have at least one type argument.
d) In section II.23.2.14 replace
```
```ebnf
TypeSpecBlob ::=
...
GENERICINST (CLASS | VALUETYPE) TypeDefOrRefOrSpecEncoded GenArgCount Type Type*
@ -107,14 +104,14 @@ TypeSpecBlob ::=
with
```
```ebnf
TypeSpecBlob ::=
...
GENERICINST (CLASS | VALUETYPE) TypeDefOrRefEncoded GenArgCount Type+
...
```
```Type Type*``` is simplified to ```Type+```.
`Type Type*` is simplified to `Type+`.
#### Rationale of the proposal
@ -122,19 +119,18 @@ TypeSpecBlob ::=
2. Potential TypeSpec recursion is prevented.
3. PEVerify, the CLR runtime and C# compiler prior to VS 2015 report an error when encountering an encoded TypeSpec in the positions decribed above.
3. PEVerify, the CLR runtime and C# compiler prior to VS 2015 report an error when encountering an encoded TypeSpec in the positions described above.
### 2. `(CMOD_OPT | CMOD_REQ) <TypeSpec>` is permitted in practice
### 2. `(CMOD_OPT | CMOD_REQ) <TypeSpec>` is permitted in practice
In II.23.2.7, it is noted that CMOD_OPT or CMOD_REQD is followed
by a TypeRef or TypeDef metadata token, but TypeSpec tokens are
by a TypeRef or TypeDef metadata token, but TypeSpec tokens are
also allowed by ilasm, csc, peverify, and the CLR.
Note, in particular, that TypeSpecs are used there by C++/CLI to
represent strongly-typed boxing in C++/CLI. e.g. `Nullable<int>^`
in C++/CLI becomes `[mscorlib]System.ValueType
modopt([mscorlib]System.Nulllable`1<int>)
modopt([mscorlib]System.Runtime.CompilerServices.IsBoxed)`
represent strongly-typed boxing in C++/CLI. e.g. `Nullable<int>^`
in C++/CLI becomes
``[mscorlib]System.ValueType modopt([mscorlib]System.Nullable`1<int>) modopt([mscorlib]System.Runtime.CompilerServices.IsBoxed)``
in IL.
This tolerance adds a loophole to the rule above whereby cyclical
@ -149,6 +145,7 @@ permitted, and ideally readers would detect such cycles and handle the
error with a suitable message rather than a stack overflow.
Related issues:
* https://github.com/dotnet/roslyn/issues/7971
* https://github.com/dotnet/coreclr/issues/2674
@ -157,14 +154,14 @@ Related issues:
In section II.23.2.7, replace
> The CMOD_OPT or CMOD_REQD is followed by a metadata token that indexes a row in the TypeDef
table or the TypeRef table. However, these tokens are encoded and compressed see §II.23.2.8
table or the TypeRef table. However, these tokens are encoded and compressed see §II.23.2.8
for details
with
> The CMOD_OPT or CMOD_REQD is followed by a metadata token that indexes a row in the TypeDef
table, TypeRef table, or TypeSpec table. However, these tokens are encoded and compressed
see §II.23.2.8 for details. Furthermore, if a row in the TypeSpec table is indicated,
table, TypeRef table, or TypeSpec table. However, these tokens are encoded and compressed
see §II.23.2.8 for details. Furthermore, if a row in the TypeSpec table is indicated,
it must not create cycle.
### 3. Custom modifiers can go in more places than specified
@ -181,13 +178,13 @@ and return types.
a) In section II.23.2.4 FieldSig, replace the diagram with a production rule:
```
```ebnf
FieldSig ::= FIELD Type
```
b) In section II.23.2.5 PropertySig, replace the diagram with a production rule:
```
```ebnf
PropertySig ::= PROPERTY HASTHIS? ParamCount RetType Param*
```
@ -195,8 +192,8 @@ Note that this change also allows properties to have BYREF type.
c) In section II.23.2.6 LocalVarSig, replace the diagram with production rules:
```
LocalVarSig ::=
```ebnf
LocalVarSig ::=
LOCAL_SIG Count LocalVarType+
LocalVarType ::=
@ -209,8 +206,8 @@ LocalVarType ::=
d) In section II.23.2.10 Param, replace the diagram with production rules:
```
Param ::=
```ebnf
Param ::=
Type
CustomMod* BYREF Type
CustomMod* TYPEDBYREF
@ -218,7 +215,7 @@ Param ::=
e) In section II.23.2.11 RetType, replace the diagram with production rules:
```
```ebnf
RetType ::=
Type
CustomMod* BYREF Type
@ -226,50 +223,50 @@ RetType ::=
CustomMod* VOID
```
f) In section II.23.2.12 Type, add a production rule to the definition of ```Type```:
f) In section II.23.2.12 Type, add a production rule to the definition of `Type`:
```
```ebnf
Type ::= CustomMod* Type
```
g) In sections II.23.2.12 Type and II.23.2.14 TypeSpec replace production rule
```
```ebnf
PTR CustomMod* Type
```
with
with
```
```ebnf
PTR Type
```
and replace production rule
```
```ebnf
SZARRAY CustomMod* Type
```
with
with
```
```ebnf
SZARRAY Type
```
### 4. BYREF can come before custom modifiers
Everywhere `BYREF` appears in the spec's box and pointer diagrams, it
comes after any custom modifiers, but the C++/CLI declaration `const
int&` is emitted as `BYREF CMOD_OPT IsConst I4`, and a call-site using
comes after any custom modifiers, but the C++/CLI declaration `const int&`
is emitted as `BYREF CMOD_OPT IsConst I4`, and a call-site using
`CMOD_OPT IsConst BYREF I4` will not match.
Under the interpretation that `BYREF` is just a managed pointer type, it
makes sense that there should be parity between `PTR` and `BYREF` with
respect to modifiers. Consider, `const int*` vs. `int* const` in
C++. The former (pointer to constant int) is `PTR CMOD_OPT IsConst I4`
and the latter (constant pointer to int) is `CMOD_OPT IsConst PTR
I4`. The analogy from `const int*` to `const int&` justifies C++'s
and the latter (constant pointer to int) is `CMOD_OPT IsConst PTR I4`.
The analogy from `const int*` to `const int&` justifies C++'s
encoding of `BYREF` before `CMOD_OPT` in defiance of the spec.
#### Proposed specification change
@ -308,9 +305,9 @@ typespec with signature I4.
Even more obscurely, this gives us a way to use `VOID`, `TYPEDBYREF`,
`CMOD_OPT`, and `CMOD_REQ` at the root of a `TypeSpec`, which are not even
specified as valid at the root of a `Type`: `modopt(int32
modopt(int32))`, `modopt(void)`, and `modopt(typedref)` all work in
practice. `CMOD_OPT` and `CMOD_REQ` at the root can also be otained by putting
specified as valid at the root of a `Type`: `modopt(int32 modopt(int32))`,
`modopt(void)`, and `modopt(typedref)` all work in
practice. `CMOD_OPT` and `CMOD_REQ` at the root can also be obtained by putting
a modifier on the type used with `constrained.`.
## Heap sizes
@ -319,13 +316,12 @@ The ECMA-335-II specification isn't clear on the maximum sizes of #String, #Blob
#### Proposed specification change
We propose the limit on #String and #Blob heap size is 2^29 (0.5 GB), that is any index to these heaps fits into 29 bits.
We propose the limit on #String and #Blob heap size is 2^29 (0.5 GB), that is any index to these heaps fits into 29 bits.
#### Rationale of the proposal
1) 2^29 is the maximum value representable by a compressed integer as defined elsewhere in the spec. Currently the metadata don't encode heap indices anywhere using compressed integers. However the Portable PDB specification uses compressed integers for efficient encoding of heap indices. We could extend the definition of compressed integer to cover all 32 bit integers, but it would be simpler if we could leave it as is.
1) 2^29 is the maximum value representable by a compressed integer as defined elsewhere in the spec. Currently the metadata don't encode heap indices anywhere using compressed integers. However the Portable PDB specification uses compressed integers for efficient encoding of heap indices. We could extend the definition of compressed integer to cover all 32 bit integers, but it would be simpler if we could leave it as is.
2) 0.5 GB is a very large heap. Having such a big PE file seems unreasonable and very rare scenario (if it exists at all).
3) Having 3 spare bits available is very beneficial for the implementation. It allows to represent WinRT projected strings, namespaces, etc. in very efficient way. If we had to represent heap indices with all 32 bits it would bloat various structures and increase memory pressure. PE files over 0.5 GB of size are very rare, but the overhead would affect all compilers and tools working with the metadata reader.
2) 0.5 GB is a very large heap. Having such a big PE file seems unreasonable and very rare scenario (if it exists at all).
3) Having 3 spare bits available is very beneficial for the implementation. It allows to represent WinRT projected strings, namespaces, etc. in very efficient way. If we had to represent heap indices with all 32 bits it would bloat various structures and increase memory pressure. PE files over 0.5 GB of size are very rare, but the overhead would affect all compilers and tools working with the metadata reader.
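The 29-bit argument in rationale 1) can be illustrated with a minimal sketch of the unsigned compressed integer encoding from §II.23.2. The function names and the `MAX_COMPRESSED` constant are hypothetical and chosen only for illustration; they are not part of any existing metadata reader API. Note that, strictly speaking, the largest encodable value is 2^29 - 1 (`0x1FFFFFFF`), which is the largest index into a heap of 2^29 bytes.

```python
# Illustrative sketch only: names below are assumptions, not an existing API.
MAX_COMPRESSED = 0x1FFFFFFF  # 2^29 - 1, the largest value that fits in 29 bits


def encode_compressed_uint(value: int) -> bytes:
    """Encode an unsigned integer using the 1-, 2-, or 4-byte compressed form."""
    if value < 0 or value > MAX_COMPRESSED:
        raise ValueError("value does not fit into 29 bits")
    if value <= 0x7F:
        # One byte, top bit clear.
        return bytes([value])
    if value <= 0x3FFF:
        # Two bytes, big-endian, top bits '10'.
        return (0x8000 | value).to_bytes(2, "big")
    # Four bytes, big-endian, top bits '110'.
    return (0xC0000000 | value).to_bytes(4, "big")


def decode_compressed_uint(data: bytes) -> int:
    """Decode a compressed unsigned integer from the start of `data`."""
    first = data[0]
    if first & 0x80 == 0x00:
        return first
    if first & 0xC0 == 0x80:
        return int.from_bytes(data[:2], "big") & 0x3FFF
    if first & 0xE0 == 0xC0:
        return int.from_bytes(data[:4], "big") & MAX_COMPRESSED
    raise ValueError("invalid compressed integer prefix")
```

Under this sketch, a heap index of `0x1FFFFFFF` still round-trips through the 4-byte form, while `0x20000000` is rejected, which is why capping the heaps at 2^29 bytes lets the compressed integer definition stay unchanged.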

@ -29,6 +29,7 @@ ECMA 335 - CLI
- [ECMA 335 Standard Overview](http://www.ecma-international.org/publications/standards/Ecma-335.htm)
- [ECMA 335 Standard (PDF)](http://www.ecma-international.org/publications/files/ECMA-ST/ECMA-335.pdf)
- [Wikipedia entry on CLI](http://en.wikipedia.org/wiki/Common_Language_Infrastructure)
- [ECMA 335 Addendum](../design/specs/Ecma-335-Augments.md)
**ECMA 335 Partitions with added Microsoft Specific Implementation Notes**