Tag Archives: MSVC

Two-phase Lookup in C++ Templates

This is a quick note to C++ Templates: The Complete Guide. Name Taxonomy comes first in chapter 9:

Qualified name: This term is not defined in the standard, but we use it to refer to names that undergo so-called qualified lookup. Specifically, this is a qualified-id or an unqualified-id that is used after an explicit member access operator (. or ->). Examples are S::x, this->f, and p->A::m. However, just class_mem in a context that is implicitly equivalent to this->class_mem is not a qualified name: The member access must be explicit.

Unqualified name: An unqualified-id that is not a qualified name. This is not a standard term but corresponds to names that undergo what the standard calls unqualified lookup.

Dependent name: A name that depends in some way on a template parameter. Certainly any qualified or unqualified name that explicitly contains a template parameter is dependent. Furthermore, a qualified name that is qualified by a member access operator (. or ->) is dependent if the type of the expression on the left of the access operator depends on a template parameter. In particular, b in this->b is a dependent name when it appears in a template. Finally, the identifier ident in a call of the form ident(x, y, z) is a dependent name if and only if any of the argument expressions has a type that depends on a template parameter.

Nondependent name: A name that is not a dependent name by the above description.

And the definition from Chapter 10:

This leads to the concept of two-phase lookup: The first phase is the parsing of a template, and the second phase is its instantiation.

During the first phase, nondependent names are looked up while the template is being parsed using both the ordinary lookup rules and, if applicable, the rules for argument-dependent lookup (ADL). Unqualified dependent names (which are dependent because they look like the name of a function in a function call with dependent arguments) are also looked up that way, but the result of the lookup is not considered complete until an additional lookup is performed when the template is instantiated.

During the second phase, which occurs when templates are instantiated at a point called the point of instantiation(POI), dependent qualified names are looked up (with the template parameters replaced with the template arguments for that specific instantiation), and an additional ADL is performed for the unqualified dependent names.

To summarize: nondependent names are looked up in first phase, qualified dependent names are looked up in second phase, and unqualified dependent names are looked up in both phases. Some code to illustrate how this works:

Now look into Derived::foo(). I is a nondependent name, it should be looked up only in first phase. But at that point, the compiler cannot decide the type of it. When instantiated with Derived<bool>, I is type int. When instantiated with Derived<void>, I is type double. So it’s better to look up I in the second phase. We can use typename Base<T>::I i = 1.024; to delay the look up, for I is a qualified dependent name now.

Unfortunately, two-phase lookup(C++03 standard) is not fully supported in VC++ even in VC++2013. It compiles well and gives your most expecting result(output 1 and 1.024). With gcc-4.6, it gives errors like:

Another code snippet:

When the compiler sees f(), g() has not been declared. This code should not compile, if f() is a nontemplate function. Since f() is a template function and g() is a nondependent name, the compiler can use ADL in first phase to find the declaration of g(). Note, a user-defined type like Int is required here. Since int is a primitive type, it has no associated namespace, and no ADL is performed.

VC++2013 still compiles well with this code. You can find some clue that they will not support it in the next VC++2015 release. With gcc, they declared to fully support two-phase lookup in gcc-4.7. I used gcc-4.8, error output looks like:

And the code compiles well with self-defined type Int(using -D_USE_STRUCT switch).

Pre/Post-main Function Call Implementation in C

In C++, pre/post-main function call can be implemented using a global class instance. Its constructor and destructor are invoked automatically before and after the main function. But in C, no such mechanism. Actually, there’s a glib implementation that can help. You may want to read my previous post about CRT sections of MSVC. I just copy the code and do some renaming:

One limitation in glib code is the lack of support for VS2003 and early versions. #pragma code_seg() is used to implement the same function:

Output from msvc/gcc:

MSVC CRT Initialization

This post provides a detailed view of the MSDN article CRT Initialization. Just paste some content here:

The CRT obtains the list of function pointers from the Visual C++ compiler. When the compiler sees a global initializer, it generates a dynamic initializer in the .CRT$XCU section (where CRT is the section name and XCU is the group name). To obtain a list of those dynamic initializers run the command dumpbin /all main.obj, and then search the .CRT$XCU section (when main.cpp is compiled as a C++ file, not a C file).

The CRT defines two pointers:
– __xc_a in .CRT$XCA
– __xc_z in .CRT$XCZ

Both groups do not have any other symbols defined except __xc_a and __xc_z. Now, when the linker reads various .CRT groups, it combines them in one section and orders them alphabetically. This means that the user-defined global initializers (which the Visual C++ compiler puts in .CRT$XCU) will always come after .CRT$XCA and before .CRT$XCZ.

So, the CRT library uses both __xc_a and __xc_z to determine the start and end of the global initializers list because of the way in which they are laid out in memory after the image is loaded.

Let’s run our VS debugger to further investigate the CRT implementation. I’m using VS2010, and a global instance of class A is declared and initialized:

Now set the breakpoints in the constructor and destructor, and start debugging. I’ve tried exe/dll and dynamic/static CRT combinations to view the call stacks:

_initterm is defined as follow. It is used to walk through __xc_a and __xc_z mentioned above:

__xc_a, __xc_z and other section groups are defined as:

gcc uses similar technology to deal with pre/post-main stuff. The section names are .init and .fini .

Compiler Intrinsic Functions

Copied from Wikipedia:

An intrinsic function is a function available for use in a given programming language whose implementation is handled specially by the compiler. Typically, it substitutes a sequence of automatically generated instructions for the original function call, similar to an inline function. Unlike an inline function though, the compiler has an intimate knowledge of the intrinsic function and can therefore better integrate it and optimize it for the situation. This is also called builtin function in many languages.

A code snippet is written to check the code generation when intrinsic is enabled or not:

Generated assembly:

Only printf() is in code. No abs() nor memcpy(). Since they are intrinsic, as listed here in gcc’s online document.

Intrinsic can be explicitly disabled. For instance, CRT intrinsic must be disabled for kernel development. Add -fno-builtin flag to gcc, or remove /Oi switch in MSVC. Only paste the generated code in gcc case here:

There _are_ abs() and memcpy() now. General MSVC intrinsic can be found here.

Intrinsic is easier than inline assembly. It is used to increase performance in most cases. Both gcc and MSVC provide intrinsic support for Intel’s MMX, SSE and SSE2 instrument set. Code snippet to use MMX:

Assembly looks like:

You see MMX registers and instruments this time. -mmmx flag is required to build for gcc. MSVC also generate similar code. Reference for these instrument set is available on Intel’s website.

A simple benchmark to use SSE is avalable here.

MSVC Inline Assembly

MSVC’s inline assembly is easier to use, as compared to gcc’s version. It is easier to write right code than wrong one, I think. Still a simple add function is used to illustrate:

The corresponding inline version:

__asm keyword is used to specify a inline assembly block. From MSDN, there is another asm keyword which is not recommended:

Visual C++ support for the Standard C++ asm keyword is limited to the fact that the compiler will not generate an error on the keyword. However, an asm block will not generate any meaningful code. Use __asm instead of asm.

Symbols in C/C++ code can be used directly in inline assembly. This is much more convenient than gcc. And it is also not necessary to load parameters into registers before usage as in gcc. MSVC does the job right even in optimization case.

NOTE: Inline assembly is not supported on the Itanium and x64 processors.

Let’s see the generated code:

Output:

Function parameters are located in [ebp+12] and [ebp+8] as referred by symbol a and b. Then, what happened if registers other than scratch registers are specified?

Output assembly code:

As you see, MSVC automatically preserves ebx for us. From MSDN:

When using __asm to write assembly language in C/C++ functions, you don’t need to preserve the EAX, EBX, ECX, EDX, ESI, or EDI registers.

Let’s see the case when stdcall calling convention is used:

Output:

In stdcall, stack is cleaned up by callee. So, there’s a ret 8 before return. And the function name is mangled to _add4@8.

MSVC also supports fastcall calling convention, but it causes register conflicts as mentioned on MSDN, and is not recommended. Just test it here, the code happens to work:)

Output:

Function parameters are passed in ecx and edx when using fastcall. But they are saved to stack first. It seems we get no benefit using this calling convention. Maybe MSVC does not implement it well. The function name is mangled to @add5@8.

Last, we can tell MSVC that we want to write our own prolog/epilog code sequences using __declspec(naked) directive:

Output:

Normal prolog/epilog is used here. MSVC does not generate duplicate these code when using __declspec(naked) directive.