Andrey Karpov, Evgenii Ryzhkov

Oct 02 2008

Tags:

#64bit

AMD64 (EM64T) architecture


Introduction
1. AMD64 architecture
2. AMD64 program model
3. Porting applications on AMD64
Conclusion
References

This article briefly describes the AMD64 architecture developed by AMD and its implementation by Intel, EM64T. It covers the architecture's features, advantages and disadvantages.

Introduction

The tasks solved on computers place ever greater demands on the hardware. The requirements for personal-computer-class systems have been growing year after year for 20 years now. This happens because people want to solve more and more complex tasks on their personal computers, tasks which earlier could be solved only on high-performance mainframes.

What are the requirements for personal computers that solve complex tasks? Of course, these are main memory size and processor performance (not to be confused with clock frequency!). The IA-32 architecture (Intel Architecture 32), dominant during the last decade, offers 4Gb (2^32 bytes) of address space, of which only 2Gb are usually available to an application. It also offers various register sets and a number of techniques, such as the branch prediction unit, which are meant to increase system performance without increasing such an abstract parameter as the processor's frequency [1].

Modern tasks for personal computers approach the 2Gb limit, while raising processor frequency no longer helps to increase performance.

The 64-bit architectures SPARC64 and Intel Itanium can to some extent solve the problem of 32-bit computers' limitations. But they are intended for high-end systems and are not available as cheap solutions. It is the AMD64 architecture by AMD and its implementation by Intel, EM64T, that are set to become really popular. These architectures are twins, and programs compiled for one of them can run on the other as well. Historically, the solution by AMD appeared first; EM64T is essentially an implementation of AMD64 by Intel. The AMD64 architecture is now implemented in processors of all classes: mobile, workstation and server.

Despite the evident advantages of the AMD64 platform (which are described in detail in this article), it doesn't introduce anything revolutionary into computing. The move from 32 bits to 64 bits didn't bring a qualitative improvement, whereas the earlier move from 16 bits to 32 bits had increased systems' safety and performance significantly.

1. AMD64 architecture

The AMD64 architecture is fully described in five documentation volumes provided by AMD. This chapter gives a brief description based on the first volume [2]. Note that the official documentation refers to this architecture as AMD x86-64, which underlines its backward compatibility.

1.1. The architecture's description

The AMD x86-64 architecture is a simple but powerful backward-compatible extension of the legacy industry-standard x86 architecture [1]. It adds a 64-bit address space and extends the register resources to give higher performance to recompiled 64-bit programs, while still supporting legacy 16-bit and 32-bit applications and operating systems without modifying or recompiling them.

The need for a 64-bit x86 architecture is driven by applications that require a large address space. These are high-performance servers, database management systems, CAD systems and, of course, games. Such applications benefit from the 64-bit address space and the larger number of registers. The small number of registers available in the legacy x86 architecture limits the performance of computational tasks. More registers provide sufficient performance for most applications.

The x86-64 architecture introduces two new features:

1. Extended registers (Picture 1):

8 additional general-purpose registers;
all 16 general-purpose registers are 64-bit;
8 additional 128-bit XMM registers;
a new instruction prefix (REX) for access to the extended registers.

2. A special mode, "Long Mode", which is shown in Table 1:

up to 64-bit virtual addresses;
64-bit instruction pointer (RIP);
flat address space.

Picture 1. Set of x86-64 registers

Table 1. Processor operating modes.

Table 2 compares the register and stack resources available to an application in different modes. The left columns show the resources provided by the legacy x86 architecture, which are the only ones available in compatibility mode. The right columns show the resources available in 64-bit mode. The differences between the modes are marked in grey.

Table 2. Registers and stack available in different modes

As shown in Table 2, the legacy x86 architecture (this mode is called legacy mode in x86-64) supports 8 general-purpose registers. In practice, however, only 4 of them are normally used freely: EAX, EBX, ECX and EDX. The registers EBP, ESI, EDI and ESP have special purposes. The x86-64 architecture adds 8 general-purpose registers and widens the registers from 32 bits to 64 bits. This allows compilers to generate faster code: a 64-bit compiler can keep more variables in registers and thus reduce memory accesses by performing operations on operands held in general-purpose registers.

The x86-64 architecture supports the whole x86 instruction set and adds some new instructions to support long mode. The commands are divided into several subsets:
General-purpose commands. These are the main x86 integer commands used in all programs. Most of them load, store and process data located in general-purpose registers or memory. Some of them control the flow of execution, transferring control from one program section to another.
128-bit media commands. These are the SSE and SSE2 (streaming SIMD extensions) commands, which load, store or process data located in the 128-bit XMM registers. They perform integer or floating-point operations on vector (packed) and scalar data types. Since a vector command performs one operation on several data elements independently, such commands are called single-instruction, multiple-data (SIMD) commands. They are used in media and scientific applications for processing blocks of data.
64-bit media commands. These are the multimedia extension (MMX) and 3DNow! commands. They save, restore and process data located in the 64-bit MMX registers. Like the 128-bit commands described above, they perform integer and floating-point operations on vector (packed) and scalar data.
x87 commands. They perform floating-point operations in legacy x87 applications and process data in the x87 registers.

Some commands span two or more of the subsets described above. Examples are the commands that move data between general-purpose registers and XMM or MMX registers.

Let's consider in detail the operating modes supported by x86-64 and shown in Table 1. In most cases the default address and operand sizes can be overridden with an instruction prefix.

Let's describe long mode first. It is an extension of the legacy protected mode. Long mode consists of two submodes: 64-bit mode and compatibility mode. 64-bit mode supports all the new features and register extensions introduced in x86-64. Compatibility mode provides binary compatibility with existing 16-bit and 32-bit code. Long mode supports neither the legacy real mode nor the legacy virtual-8086 mode, and it doesn't support hardware task switching either.

As 64-bit mode supports a 64-bit address space, it requires a new 64-bit operating system. Meanwhile, existing applications can run without recompilation in compatibility mode under an OS working in 64-bit mode. 64-bit mode uses a 64-bit instruction pointer (RIP) and a new addressing mode with a single flat address space for code, stack and data.

64-bit mode provides access to the extended registers through a new group of instruction prefixes called REX.

In 64-bit mode the default address size is 64 bits, although implementations of x86-64 may support fewer address bits. The default operand size is 32 bits. For most instructions the operand size can be overridden with a REX prefix.

64-bit mode provides data addressing relative to the 64-bit RIP register. The legacy x86 architecture provided addressing relative to the instruction pointer only in control transfer commands. RIP-relative addressing makes position-independent code and code that addresses global data more efficient.

Some instruction opcodes were redefined to support the extended registers and 64-bit addressing.

Compatibility mode is intended for executing existing 16-bit and 32-bit programs in a 64-bit OS. Applications run in compatibility mode using a 32- or 16-bit address space and have access to 4Gb of virtual address space. Instruction prefixes can switch between 16- and 32-bit address and operand sizes.

From the application's viewpoint, compatibility mode looks like the legacy x86 protected mode, but from the viewpoint of the OS (address translation, handling of interrupts and exceptions) the 64-bit mechanisms are used.

Legacy mode provides binary compatibility not only with 16- and 32-bit applications but with 16- and 32-bit operating systems as well. It includes three modes:

Protected mode. 16- and 32-bit programs with segmented memory organization, privilege levels and virtual memory support. Address space is 4Gb.
Virtual-8086 mode. Supports 16-bit applications launched as tasks in protected mode. Address space is 1Mb.
Real mode. Supports 16-bit programs with simple register addressing of segmented memory. Virtual memory and privileges are not supported. 1Mb of memory is available.

Legacy mode is used only when a 16- or 32-bit OS is running.

1.2. The architecture's advantages

Let's outline the main advantages of AMD x86-64 architecture.

64-bit address space.
Extended register set.
A command set familiar to developers.
The ability to run legacy 32-bit applications in a 64-bit OS.
The ability to use a 32-bit OS.

1.3. The architecture's disadvantages

The new AMD x86-64 architecture hasn't introduced any crucial disadvantages compared with the 32-bit architecture. We can point out only slightly higher memory requirements of programs because of the larger size of addresses and operands. However, this doesn't significantly affect either the code size or the amount of main memory required.

But the fact is that AMD x86-64 hasn't introduced anything significantly new either. There is no dramatic performance gain: on average, you can expect a 5-15% performance gain after recompiling a program.
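The growth in memory use comes mostly from pointer-heavy data structures. The sketch below illustrates this with a hypothetical `ListNode` type and `node_bytes` helper (both names are introduced here only for illustration). The sizes in the comments assume a typical compiler where `int` is 4 bytes and pointers are naturally aligned; other platforms may differ.

```cpp
#include <cstddef>

// A hypothetical singly linked list node: one pointer plus a small payload.
struct ListNode {
    ListNode* next;  // 4 bytes in a 32-bit build, 8 bytes in a 64-bit build
    int value;       // typically 4 bytes in both builds
};
// Typical sizeof(ListNode): 8 bytes (32-bit) versus 16 bytes (64-bit),
// because the 64-bit build adds 4 bytes of padding after `value`.

// Memory taken by `count` nodes, not counting allocator overhead.
constexpr std::size_t node_bytes(std::size_t count) {
    return count * sizeof(ListNode);
}
```

Under these assumptions a list of one million nodes grows from about 8Mb to about 16Mb, while a plain array of `int` values keeps the same size in both builds.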

2. AMD64 program model

Nearly all modern operating systems now have versions for the AMD64 architecture. For example, Microsoft offers Windows XP 64-bit, Windows Server 2003 64-bit and Windows Vista 64-bit. The leading UNIX system developers also provide 64-bit versions, for example, Linux Debian 3.1 x86-64. But this doesn't mean that all the code of such a system is 64-bit. Some OS code and many applications can remain 32-bit, since AMD64 provides backward compatibility.

The 64-bit version of Windows, for example, uses a special subsystem, WoW64 (Windows-on-Windows 64), which translates 32-bit applications' calls into calls to the resources of the 64-bit OS. Let's consider in detail the AMD64 program model available to a programmer in 64-bit Windows, called Win64 for short.

Let's begin with the address space. Although a 64-bit processor can theoretically address 16 exabytes (2^64 bytes), Win64 currently supports 16 terabytes (2^44 bytes). There are several reasons for this. Existing processors can provide access to only 1 terabyte (2^40 bytes) of physical memory. The architecture (but not the current hardware) can extend this space up to 4 petabytes. In any case, a large amount of memory is needed for the page tables that describe this memory (see Table 3).

	32-bit mode	64-bit mode
Process's general address space	4Gb	16Tb
Address space available to a 32-bit process	2Gb (3Gb if the system is loaded with /3GB key)	4Gb if the application is compiled with /LARGEADDRESSAWARE key (2Gb otherwise)
Address space available to a 64-bit process	Impossible	8Tb
Paged pool	470Mb	128Gb
Non-paged pool	256Mb	128Gb
System Page Table (PTE)	660Mb - 900Mb	128Gb

Table 3. Main memory limitations in Windows
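The address-space figures quoted in this section follow directly from the number of address bits. The following sketch checks them at compile time; the helper `addressable_bytes` and the unit constants are hypothetical names used only in this example, and the units are binary (1 terabyte is taken as 2^40 bytes).

```cpp
#include <cstdint>

// Number of bytes addressable with `bits` address bits (valid for bits < 64).
constexpr std::uint64_t addressable_bytes(unsigned bits) {
    return std::uint64_t{1} << bits;
}

constexpr std::uint64_t kGiB = addressable_bytes(30);
constexpr std::uint64_t kTiB = addressable_bytes(40);
constexpr std::uint64_t kPiB = addressable_bytes(50);

static_assert(addressable_bytes(32) == 4 * kGiB,  "32 bits: 4Gb (IA-32)");
static_assert(addressable_bytes(40) == 1 * kTiB,  "40 bits: 1 terabyte");
static_assert(addressable_bytes(43) == 8 * kTiB,  "43 bits: 8Tb per process");
static_assert(addressable_bytes(44) == 16 * kTiB, "44 bits: 16 terabytes");
static_assert(addressable_bytes(52) == 4 * kPiB,  "52 bits: 4 petabytes");
// 2^64 itself does not fit into a 64-bit integer; it equals 2^4 * 2^60,
// i.e. 16 exabytes.
```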

As in Win32, the addressable memory range is divided into user and system addresses. Each process receives 8Tb, and 8Tb remain for the system (compared with 2Gb and 2Gb in Win32). Different Windows versions have different limitations, shown in Table 4.

Physical memory and number of processors	32-bit models	64-bit models
Windows XP Home	4 Gb, 1 CPU	Not present
Windows XP Professional	4 Gb, 1-2 CPU	128 Gb, 1-2 CPU
Windows Server 2003, Standard	4 Gb, 1-4 CPU	32 Gb, 1-4 CPU
Windows Server 2003, Enterprise	64 Gb, 1-8 CPU	1 Tb, 1-8 CPU
Windows Server 2003, Datacenter	64 Gb, 8-32 CPU	1 Tb, 8-64 CPU
Windows Server 2008, Datacenter	64 Gb, 2-64 CPU	2 Tb, 2-64 CPU
Windows Server 2008, Enterprise	64 Gb, 1-8 CPU	2 Tb, 1-8 CPU
Windows Server 2008, Standard	4 Gb, 1-4 CPU	32 Gb, 1-4 CPU
Windows Server 2008, Web Server	4 Gb, 1-4 CPU	32 Gb, 1-4 CPU
Vista Home Basic	4 Gb, 1 CPU	8 Gb, 1 CPU
Vista Home Premium	4 Gb, 1-2 CPU	16 Gb, 1-2 CPU
Vista Business	4 Gb, 1-2 CPU	128 Gb, 1-2 CPU
Vista Enterprise	4 Gb, 1-2 CPU	128 Gb, 1-2 CPU
Vista Ultimate	4 Gb, 1-2 CPU	128 Gb, 1-2 CPU

Table 4. Limitations of different Windows versions

As in Win32, the page size is 4Kb. The first 64Kb of the address space are never mapped, i.e. the lowest valid address is 0x10000. Unlike in Win32, system DLLs are loaded above the 4Gb boundary.

All processors implementing AMD64 support the "No Execute" (NX) bit, which Windows uses to implement the hardware-enforced "Data Execution Prevention" (DEP) technology. DEP forbids executing user data as code. This increases programs' safety by ruling out errors such as the execution of a data buffer as code.

A peculiarity of AMD64 compilers is that they can use registers very efficiently for passing parameters to functions instead of using the stack. This allowed the Win64 developers to get rid of the variety of calling conventions. In Win32 you can use different conventions (ways of passing parameters): __stdcall, __cdecl, __fastcall etc. In Win64 there is only one calling convention. Let's consider an example of how four integer arguments are passed in registers:
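The relation between the 4Kb page size and the lowest valid address 0x10000 can be shown with a little arithmetic. This is only an illustrative sketch; the constants and helper functions below are hypothetical names, not part of any Windows API.

```cpp
#include <cstdint>

constexpr std::uint64_t kPageSize = 4096;              // 4Kb page
constexpr std::uint64_t kLowestValidAddress = 0x10000; // value quoted in the text

// Index of the page that contains `address`.
constexpr std::uint64_t page_number(std::uint64_t address) {
    return address / kPageSize;
}

// Offset of `address` inside its page.
constexpr std::uint64_t page_offset(std::uint64_t address) {
    return address % kPageSize;
}

// 0x10000 is 65536 bytes, so the lowest valid address starts page 16
// and the 16 pages below it are not available to the program.
static_assert(page_number(kLowestValidAddress) == 16, "0x10000 starts page 16");
static_assert(page_offset(kLowestValidAddress) == 0, "0x10000 is page-aligned");
```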

RCX: first argument
RDX: second argument
R8: third argument
R9: fourth argument

Integer arguments after the first four are passed on the stack. For floating-point arguments, the registers XMM0-XMM3 are used, and then the stack as well.

Because of the difference in calling conventions, you cannot mix 64-bit and 32-bit code in one program. In other words, if an application is compiled for 64-bit mode, all the DLLs it uses must be 64-bit too.

When writing 64-bit code you can get an additional performance gain through special optimizations. This question is considered in detail in the optimization guide [3].
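The register assignment described above can be annotated on ordinary C++ functions. The functions below (`sum6`, `scale`, `mix`) are hypothetical examples; the comments show where each argument would travel under the Win64 convention. The assignment is done by the compiler, so the code itself is portable, and on other 64-bit platforms (for example, Linux) a different convention with different registers applies.

```cpp
#include <cstdint>

// Six integer arguments: the first four go in registers, the rest on the stack.
std::int64_t sum6(std::int64_t a,   // RCX
                  std::int64_t b,   // RDX
                  std::int64_t c,   // R8
                  std::int64_t d,   // R9
                  std::int64_t e,   // stack
                  std::int64_t f)   // stack
{
    return a + b + c + d + e + f;
}

// Floating-point arguments use the XMM registers instead.
double scale(double x,   // XMM0
             double k)   // XMM1
{
    return x * k;
}

// With mixed arguments the register is chosen by the argument's position:
// the N-th argument uses either the N-th integer register or XMM(N-1).
double mix(int a,      // RCX
           double b,   // XMM1
           int c,      // R8
           double d)   // XMM3
{
    return a + b + c + d;
}
```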

3. Porting applications on AMD64

One of the purposes of high-level languages is to reduce as far as possible the dependence of program code on the architecture and to provide the greatest possible portability between hardware platforms. For example, correctly written C++ programs are theoretically independent of the hardware platform. Ideally, to build a 32-bit application for the AMD64 platform it should be enough to change the compiler [4] and recompile the program. But in practice everything is more complicated.

Software that uses assembler code for 32-bit processors still exists. Many programs written in high-level languages contain inline assembler blocks. That's why it is often impossible simply to recompile a large project. There are two ways to deal with this problem. Firstly, you can decide not to port the application to the new platform. This can be a very reasonable solution because, for example, Windows-family operating systems provide good backward compatibility thanks to the WoW64 technology. The second option is to rewrite the program code, and it seems reasonable to rewrite it in a high-level language. By the way, note that the Visual C++ compiler no longer supports inline assembler blocks in 64-bit compilation mode [5].

The presence of assembler code is not the only obstacle we face when mastering 64-bit systems. When programs are ported to 64-bit systems, various errors occur that are related to the change of the data model (the sizes of types). What's more, some errors show up only when a program uses large amounts of memory that were unavailable on 32-bit systems. Such errors are well described in the article "20 issues of porting C++ code on the 64-bit platform" [6].

All said above relates mostly to C/C++ applications. The situation is better with managed code (C#), although some small problems can arise here as well. Unfortunately, large software systems are often built using libraries written in C/C++. That is why a large C# project most likely contains C/C++ modules or libraries, which can be unsafe and contain vulnerabilities.

To test and check program code ported to a 64-bit platform you can use various special methods and tools [7]. For example, such static analyzers as Viva64 (for Windows systems) and PC-Lint (for Unix systems) can provide good results. To learn more about these tools, read the article "Comparison of analyzers' diagnostic abilities while testing 64-bit code" [8].
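Two typical data-model errors are storing a pointer in a 32-bit integer and indexing large arrays with `int`. The following is a minimal sketch of the safer patterns, not an excerpt from the cited article; all function names are hypothetical, and it assumes a platform where `unsigned int` is 32 bits wide, as in Win64.

```cpp
#include <cstddef>
#include <cstdint>

// A pointer stored in std::uintptr_t survives the round trip, because that
// type is wide enough for a pointer in both 32-bit and 64-bit builds.
inline bool pointer_survives_uintptr(const void* p) {
    std::uintptr_t bits = reinterpret_cast<std::uintptr_t>(p);
    return reinterpret_cast<const void*>(bits) == p;
}

// Reports whether a pointer-sized value would lose its upper bits if it were
// cast to unsigned int, the classic truncation error in ported 32-bit code.
inline bool would_truncate(std::uintptr_t value) {
    return static_cast<std::uintptr_t>(static_cast<unsigned int>(value)) != value;
}

// Indexing with std::size_t keeps the loop correct for arrays that hold more
// elements than an int can count.
inline std::size_t count_nonzero(const unsigned char* data, std::size_t n) {
    std::size_t count = 0;
    for (std::size_t i = 0; i < n; ++i) {
        if (data[i] != 0) {
            ++count;
        }
    }
    return count;
}
```

In a 32-bit build `would_truncate` never returns true, which is exactly why such casts go unnoticed until the program is recompiled for 64 bits and starts receiving addresses above 4Gb.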

Conclusion

Undoubtedly, the AMD64 architecture offered by AMD has turned out to be in demand on the market. Its advantage is that it allows a smooth switch to 64-bit programs without losing compatibility with legacy 32-bit applications. But there is nothing revolutionary in AMD64.

As experiments demonstrate, migrating 32-bit programs to AMD64 allows you, firstly, to solve tasks which are much more memory-demanding and, secondly, to get about a 10% performance gain "for free", without changing the code, because the compiler optimizes the application for the new architecture.

We may conclude that the AMD64 architecture has postponed the problem of limited main memory for many years but hasn't solved the problem of increasing the performance of modern personal computers. The future still belongs to multi-core and multi-processor systems.