The OpenD Programming Language

Inline Assembler


D, being a systems programming language, provides an inline assembler. The inline assembler is standardized for D implementations across the same CPU family. For example, the Intel Pentium inline assembler for a Win32 D compiler is syntax-compatible with the inline assembler for Linux running on an Intel Pentium.

Implementations of D on different architectures, however, are free to innovate upon the memory model, function call/return conventions, argument passing conventions, etc.

This document describes the x86 and x86_64 implementations of the inline assembler. A compiler indicates that it supports these inline assemblers by predefining the D_InlineAsm_X86 and D_InlineAsm_X86_64 version identifiers, respectively.
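Code can test these version identifiers to select an assembler path. The sketch below assumes a compiler that accepts the DMD-style syntax on x86 or x86_64 (for example DMD or LDC). The function name `twice` is hypothetical. Targets that predefine neither identifier use a plain D fallback.

```d
// Sketch: choose an implementation from the predefined version identifiers.
// `twice` is a hypothetical example function.
int twice(int x)
{
    version (D_InlineAsm_X86_64)
    {
        asm
        {
            mov EAX, x;   // load the parameter
            add EAX, EAX; // double it
            mov x, EAX;   // write the result back to x
        }
        return x;
    }
    else version (D_InlineAsm_X86)
    {
        asm
        {
            mov EAX, x;
            add EAX, EAX;
            mov x, EAX;
        }
        return x;
    }
    else
    {
        return x * 2; // no DMD-style inline assembler on this target
    }
}

unittest
{
    assert(twice(21) == 42);
    assert(twice(-5) == -10);
}
```

With DMD, the unittest can be run with `dmd -unittest -main -run` on the file.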

Asm statement

 AsmStatement:
    asm FunctionAttributesopt { AsmInstructionListopt }

 AsmInstructionList:
    AsmInstruction ;
    AsmInstruction ; AsmInstructionList

Assembler instructions must be located inside an asm block. Like functions, asm statements must be annotated with the function attributes needed to make them compatible with the caller. The attributes of an asm statement must be stated explicitly; they are not inferred.

void func1() pure nothrow @safe @nogc
{
    asm pure nothrow @trusted @nogc
    {}
}

void func2() @safe @nogc
{
    asm @nogc // Error: asm statement is assumed to be @system - mark it with '@trusted' if it is not
    {}
}

Asm instruction

 AsmInstruction:
    Identifier : AsmInstruction
    align IntegerExpression
    even
    naked
    db Operands
    ds Operands
    di Operands
    dl Operands
    df Operands
    dd Operands
    de Operands
    db StringLiteral
    ds StringLiteral
    di StringLiteral
    dl StringLiteral
    dw StringLiteral
    dq StringLiteral
    Opcode
    Opcode Operands

 Opcode:
    Identifier
    int
    in
    out

 Operands:
    Operand
    Operand , Operands

Labels

Assembler instructions can be labeled just like other statements. They can be the target of goto statements. For example:

void *pc;
asm
{
    call L1          ;
  L1:                ;
    pop  EBX         ;
    mov  pc[EBP],EBX ; // pc now points to code at L1
}
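Labels can also form loops entirely within an asm block. This minimal sketch computes 1 + 2 + ... + n using a forward jump and a backward conditional jump. It assumes DMD-style inline assembler support. `sumTo` and the version identifier `AnyX86Asm` are hypothetical names introduced for the example.

```d
// Hypothetical version identifier: set when either DMD-style assembler is available.
version (D_InlineAsm_X86_64) version = AnyX86Asm;
else version (D_InlineAsm_X86) version = AnyX86Asm;

// Hypothetical helper: returns 1 + 2 + ... + n, or 0 when n <= 0.
int sumTo(int n)
{
    int result;
    version (AnyX86Asm)
    {
        asm
        {
            xor EAX, EAX;   // EAX = running total
            mov ECX, n;     // ECX = loop counter
            test ECX, ECX;
            jle Done;       // nothing to add when n <= 0
          Again:
            add EAX, ECX;   // total += counter
            dec ECX;
            jnz Again;      // loop until the counter reaches 0
          Done:
            mov result, EAX;
        }
    }
    else
    {
        foreach (i; 1 .. n + 1)
            result += i;
    }
    return result;
}

unittest
{
    assert(sumTo(10) == 55);
    assert(sumTo(0) == 0);
}
```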

align IntegerExpression

 IntegerExpression:
    IntegerLiteral
    Identifier

Causes the assembler to emit NOP instructions to align the next assembler instruction on an IntegerExpression boundary. IntegerExpression must evaluate at compile time to an integer that is a power of 2.

Aligning the start of a loop body can sometimes have a dramatic effect on the execution speed.
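The sketch below places `align 16` just before the head of a loop, so that the first loop instruction starts on a 16-byte boundary. It assumes DMD-style inline assembler support. The helper `countBits` and the version identifier `AnyX86Asm` are hypothetical. Whether the alignment measurably helps depends on the CPU.

```d
// Hypothetical version identifier: set when either DMD-style assembler is available.
version (D_InlineAsm_X86_64) version = AnyX86Asm;
else version (D_InlineAsm_X86) version = AnyX86Asm;

// Hypothetical helper: counts the set bits of x, examining one bit per iteration.
uint countBits(uint x)
{
    uint count;
    version (AnyX86Asm)
    {
        asm
        {
            xor EAX, EAX;   // EAX = number of set bits seen so far
            mov EDX, x;     // EDX = bits still to examine
            test EDX, EDX;
            jz Done;
            align 16;       // pad with NOPs so the loop head is 16-byte aligned
          Again:
            mov ECX, EDX;
            and ECX, 1;     // ECX = lowest remaining bit
            add EAX, ECX;
            shr EDX, 1;     // drop that bit; ZF is set once EDX becomes 0
            jnz Again;
          Done:
            mov count, EAX;
        }
    }
    else
    {
        for (; x; x >>= 1)
            count += x & 1;
    }
    return count;
}

unittest
{
    assert(countBits(0b1011) == 3);
    assert(countBits(uint.max) == 32);
}
```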

even

Causes the assembler to emit NOP instructions to align the next assembler instruction on an even boundary.

naked

Causes the compiler not to generate the function prolog and epilog sequences. Providing them then becomes the responsibility of the inline assembly programmer. naked is normally used when the entire function is written in assembler.
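A naked function must produce its own return. This minimal sketch returns a constant in EAX and then executes `ret` itself. It assumes that int results are returned in EAX, which holds for the common x86 and x86_64 calling conventions. The name `fortyTwo` and the version identifier `AnyX86Asm` are hypothetical. The asm version is defined as a separate function so that its body consists only of the asm block.

```d
// Hypothetical version identifier: set when either DMD-style assembler is available.
version (D_InlineAsm_X86_64) version = AnyX86Asm;
else version (D_InlineAsm_X86) version = AnyX86Asm;

version (AnyX86Asm)
{
    int fortyTwo()
    {
        asm
        {
            naked;        // no prolog or epilog is generated
            mov EAX, 42;  // int return value goes in EAX
            ret;          // the asm must return by itself
        }
    }
}
else
{
    int fortyTwo() { return 42; } // portable fallback
}

unittest
{
    assert(fortyTwo() == 42);
}
```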

db, ds, di, dl, df, dd, de

These pseudo-ops insert raw data directly into the code. db is for bytes, ds is for 16-bit words, di is for 32-bit words, dl is for 64-bit words, df is for 32-bit floats, dd is for 64-bit doubles, and de is for 80-bit extended reals. Each can take multiple operands. A string literal operand is treated as if it were length separate operands, where length is the number of characters in the string, with one character per operand. For example:

asm
{
    db 5,6,0x83;   // insert bytes 0x05, 0x06, and 0x83 into code
    ds 0x1234;     // insert bytes 0x34, 0x12
    di 0x1234;     // insert bytes 0x34, 0x12, 0x00, 0x00
    dl 0x1234;     // insert bytes 0x34, 0x12, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00
    df 1.234;      // insert float 1.234
    dd 1.234;      // insert double 1.234
    de 1.234;      // insert real 1.234
    db "abc";      // insert bytes 0x61, 0x62, and 0x63
    ds "abc";      // insert bytes 0x61, 0x00, 0x62, 0x00, 0x63, 0x00
}
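Because db emits bytes directly into the instruction stream, it can also hand-encode an instruction. In this sketch the bytes 0xFF, 0xC0 encode `inc EAX`: opcode FF /0 with ModRM byte C0 selects EAX in both 32-bit and 64-bit mode. It assumes DMD-style inline assembler support. `incrementRaw` and `AnyX86Asm` are hypothetical names.

```d
// Hypothetical version identifier: set when either DMD-style assembler is available.
version (D_InlineAsm_X86_64) version = AnyX86Asm;
else version (D_InlineAsm_X86) version = AnyX86Asm;

// Hypothetical helper: adds 1 to x using a hand-encoded instruction.
int incrementRaw(int x)
{
    version (AnyX86Asm)
    {
        asm
        {
            mov EAX, x;
            db 0xFF, 0xC0;  // raw encoding of: inc EAX
            mov x, EAX;
        }
        return x;
    }
    else
    {
        return x + 1;
    }
}

unittest
{
    assert(incrementRaw(41) == 42);
}
```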

Opcodes

A list of supported opcodes is at the end.

The following registers are supported. Register names are always in upper case.

 Register:
    AL AH AX EAX
    BL BH BX EBX
    CL CH CX ECX
    DL DH DX EDX
    BP EBP
    SP ESP
    DI EDI
    SI ESI
    ES CS SS DS GS FS
    CR0 CR2 CR3 CR4
    DR0 DR1 DR2 DR3 DR6 DR7
    TR3 TR4 TR5 TR6 TR7
    ST
    ST(0) ST(1) ST(2) ST(3) ST(4) ST(5) ST(6) ST(7)
    MM0 MM1 MM2 MM3 MM4 MM5 MM6 MM7
    XMM0 XMM1 XMM2 XMM3 XMM4 XMM5 XMM6 XMM7

x86_64 adds these additional registers.

 Register64:
    RAX RBX RCX RDX
    BPL RBP
    SPL RSP
    DIL RDI
    SIL RSI
    R8B R8W R8D R8
    R9B R9W R9D R9
    R10B R10W R10D R10
    R11B R11W R11D R11
    R12B R12W R12D R12
    R13B R13W R13D R13
    R14B R14W R14D R14
    R15B R15W R15D R15
    XMM8 XMM9 XMM10 XMM11 XMM12 XMM13 XMM14 XMM15
    YMM0 YMM1 YMM2 YMM3 YMM4 YMM5 YMM6 YMM7
    YMM8 YMM9 YMM10 YMM11 YMM12 YMM13 YMM14 YMM15

Special Cases

lock, rep, repe, repne, repnz, repz
These prefix instructions do not appear in the same statement as the instructions they prefix; they appear in their own statement. For example:
asm
{
    rep   ;
    movsb ;
}
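In practice, the prefix statement is followed directly by the string instruction it repeats. This sketch copies n bytes with `rep; movsb`, using RSI as the source, RDI as the destination and RCX as the count. It assumes x86_64 and a clear direction flag on entry, which the common x86_64 ABIs guarantee. Other targets use a D loop. `copyBytes` is a hypothetical helper.

```d
// Hypothetical helper: copies n bytes from src to dst.
void copyBytes(ubyte* dst, const(ubyte)* src, size_t n)
{
    version (D_InlineAsm_X86_64)
    {
        asm
        {
            mov RSI, src;   // source address
            mov RDI, dst;   // destination address
            mov RCX, n;     // byte count
            rep;            // the prefix is its own statement
            movsb;          // repeated RCX times
        }
    }
    else
    {
        foreach (i; 0 .. n)
            dst[i] = src[i];
    }
}

unittest
{
    ubyte[5] a = [1, 2, 3, 4, 5];
    ubyte[5] b;
    copyBytes(b.ptr, a.ptr, a.length);
    assert(b == a);
}
```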
pause
This opcode is not supported by the assembler; instead, use
asm
{
    rep  ;
    nop  ;
}

which produces the same result.

floating point ops
Use the two-operand form of the instruction:
fdiv ST(1);     // wrong
fmul ST;        // wrong
fdiv ST,ST(1);  // right
fmul ST,ST(0);  // right
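The sketch below uses the two-operand form in a complete sequence. It multiplies two doubles on the x87 register stack and leaves the stack empty on exit: two loads push, and two stores pop. It assumes DMD-style inline assembler support. `x87Multiply` and `AnyX86Asm` are hypothetical names.

```d
// Hypothetical version identifier: set when either DMD-style assembler is available.
version (D_InlineAsm_X86_64) version = AnyX86Asm;
else version (D_InlineAsm_X86) version = AnyX86Asm;

// Hypothetical helper: returns a * b computed with x87 instructions.
double x87Multiply(double a, double b)
{
    double result;
    version (AnyX86Asm)
    {
        asm
        {
            fld a;           // ST(0) = a
            fld b;           // ST(0) = b, ST(1) = a
            fmul ST,ST(1);   // two-operand form: ST(0) = b * a
            fstp result;     // store the product and pop; ST(0) = a
            fstp ST(0);      // pop the remaining value so the stack is empty
        }
    }
    else
    {
        result = a * b;
    }
    return result;
}

unittest
{
    assert(x87Multiply(1.5, 4.0) == 6.0);
}
```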

Operands

 Operand:
    AsmExp

 AsmExp:
    AsmLogOrExp
    AsmLogOrExp ? AsmExp : AsmExp

 AsmLogOrExp:
    AsmLogAndExp
    AsmLogOrExp || AsmLogAndExp

 AsmLogAndExp:
    AsmOrExp
    AsmLogAndExp && AsmOrExp

 AsmOrExp:
    AsmXorExp
    AsmOrExp | AsmXorExp

 AsmXorExp:
    AsmAndExp
    AsmXorExp ^ AsmAndExp

 AsmAndExp:
    AsmEqualExp
    AsmAndExp & AsmEqualExp

 AsmEqualExp:
    AsmRelExp
    AsmEqualExp == AsmRelExp
    AsmEqualExp != AsmRelExp

 AsmRelExp:
    AsmShiftExp
    AsmRelExp < AsmShiftExp
    AsmRelExp <= AsmShiftExp
    AsmRelExp > AsmShiftExp
    AsmRelExp >= AsmShiftExp

 AsmShiftExp:
    AsmAddExp
    AsmShiftExp << AsmAddExp
    AsmShiftExp >> AsmAddExp
    AsmShiftExp >>> AsmAddExp

 AsmAddExp:
    AsmMulExp
    AsmAddExp + AsmMulExp
    AsmAddExp - AsmMulExp

 AsmMulExp:
    AsmBrExp
    AsmMulExp * AsmBrExp
    AsmMulExp / AsmBrExp
    AsmMulExp % AsmBrExp

 AsmBrExp:
    AsmUnaExp
    AsmBrExp [ AsmExp ]

 AsmUnaExp:
    AsmTypePrefix AsmExp
    offsetof AsmExp
    seg AsmExp
    + AsmUnaExp
    - AsmUnaExp
    ! AsmUnaExp
    ~ AsmUnaExp
    AsmPrimaryExp

 AsmPrimaryExp:
    IntegerLiteral
    FloatLiteral
    __LOCAL_SIZE
    $
    Register
    Register : AsmExp
    Register64
    Register64 : AsmExp
    DotIdentifier
    this

 DotIdentifier:
    Identifier
    Identifier . DotIdentifier
    FundamentalType . Identifier

The operand syntax more or less follows the Intel CPU documentation conventions. In particular, for two-operand instructions the source is the right operand and the destination is the left operand. The syntax differs from Intel's in order to be compatible with the D language tokenizer and to simplify parsing.

seg loads the number of the segment that the symbol is in. This is not relevant for flat-model code; instead, move the value from the relevant segment register.

A dotted expression is evaluated during compilation and must then either yield a constant or refer to a higher-level variable that fits in the target register or variable.
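Operand expressions are folded at compile time. In the sketch below, `int.sizeof` (the `FundamentalType . Identifier` form) combined with arithmetic yields the constant 44, and `base` names a variable that fits in a 32-bit operand. It assumes DMD-style inline assembler support. `constantOperands` and `AnyX86Asm` are hypothetical names.

```d
// Hypothetical version identifier: set when either DMD-style assembler is available.
version (D_InlineAsm_X86_64) version = AnyX86Asm;
else version (D_InlineAsm_X86) version = AnyX86Asm;

// Hypothetical helper: returns 4 * 10 + 4 + base.
int constantOperands(int base)
{
    int r;
    version (AnyX86Asm)
    {
        asm
        {
            mov EAX, int.sizeof * 10 + 4; // constant operand, folded to 44
            add EAX, base;                // variable operand
            mov r, EAX;
        }
    }
    else
    {
        r = cast(int)(int.sizeof * 10 + 4) + base;
    }
    return r;
}

unittest
{
    assert(constantOperands(0) == 44);
}
```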

Operand Types

 AsmTypePrefix:
    near ptr
    far ptr
    word ptr
    dword ptr
    qword ptr
    FundamentalType ptr

In cases where the operand size is ambiguous, as in:

add [EAX],3     ;

it can be disambiguated by using an AsmTypePrefix:

add  byte ptr [EAX],3 ;
add  int ptr [EAX],7  ;

far ptr is not relevant for flat model code.
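The sketch below puts a type prefix in a complete function. It adds 7 to an int through a pointer. The memory operand `[RAX]` (or `[EAX]`) has no size of its own, so `int ptr` selects a 32-bit add. It assumes DMD-style inline assembler support. `addSevenThroughPointer` is a hypothetical helper.

```d
// Hypothetical helper: performs *p += 7 in assembler.
void addSevenThroughPointer(int* p)
{
    version (D_InlineAsm_X86_64)
    {
        asm
        {
            mov RAX, p;             // pointer is 64 bits on x86_64
            add int ptr [RAX], 7;   // int ptr makes this a 32-bit add
        }
    }
    else version (D_InlineAsm_X86)
    {
        asm
        {
            mov EAX, p;
            add int ptr [EAX], 7;
        }
    }
    else
    {
        *p += 7;
    }
}

unittest
{
    int v = 35;
    addSevenThroughPointer(&v);
    assert(v == 42);
}
```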

Struct/Union/Class Member Offsets

To access a member of an aggregate when a pointer to the aggregate is in a register, use the .offsetof property of the member's qualified name:

struct Foo { int a,b,c; }
int bar(Foo *f)
{
    asm
    {
        mov EBX,f                   ;
        mov EAX,Foo.b.offsetof[EBX] ;
    }
}
void main()
{
    Foo f = Foo(0, 2, 0);
    assert(bar(&f) == 2);
}

Alternatively, inside the scope of an aggregate, only the member name is needed:

struct Foo   // or class
{
    int a,b,c;
    int bar()
    {
        asm
        {
            mov EBX, this   ;
            mov EAX, b[EBX] ;
        }
    }
}
void main()
{
    Foo f = Foo(0, 2, 0);
    assert(f.bar() == 2);
}
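The examples above use 32-bit registers to hold the pointer. On x86_64, a pointer needs a 64-bit register. This sketch shows that variant and stores the loaded value into a local explicitly, so it does not rely on EAX at function exit. It assumes D_InlineAsm_X86_64. `getB` is a hypothetical name.

```d
struct Foo { int a, b, c; }

// Hypothetical helper: returns f.b, using Foo.b.offsetof on x86_64.
int getB(Foo* f)
{
    int r;
    version (D_InlineAsm_X86_64)
    {
        asm
        {
            mov RDX, f;                    // 64-bit pointer to the struct
            mov EAX, Foo.b.offsetof[RDX];  // load the 32-bit member b
            mov r, EAX;
        }
    }
    else
    {
        r = f.b;
    }
    return r;
}

unittest
{
    Foo f = Foo(0, 2, 0);
    assert(getB(&f) == 2);
}
```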

Stack Variables

Stack variables (variables local to a function and allocated on the stack) are accessed via the name of the variable indexed by EBP:

int foo(int x)
{
    asm
    {
        mov EAX,x[EBP] ; // loads value of parameter x into EAX
        mov EAX,x      ; // does the same thing
    }
}

If the [EBP] index is omitted, it is implied for local variables. This no longer holds when naked is used.
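When parameters and locals are referenced by name only, the compiler supplies their stack addresses. The same asm text therefore assembles for both x86 and x86_64 in this sketch. It assumes DMD-style inline assembler support in a non-naked function. `sumViaLocals` and `AnyX86Asm` are hypothetical names.

```d
// Hypothetical version identifier: set when either DMD-style assembler is available.
version (D_InlineAsm_X86_64) version = AnyX86Asm;
else version (D_InlineAsm_X86) version = AnyX86Asm;

// Hypothetical helper: returns x + y + 3, reading a local variable from asm.
int sumViaLocals(int x, int y)
{
    int tmp = 3;
    int r;
    version (AnyX86Asm)
    {
        asm
        {
            mov EAX, x;     // parameter, addressed by the compiler
            add EAX, y;     // parameter
            add EAX, tmp;   // local variable
            mov r, EAX;
        }
    }
    else
    {
        r = x + y + tmp;
    }
    return r;
}

unittest
{
    assert(sumViaLocals(10, 29) == 42);
}
```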

Special Symbols

$
Represents the program counter of the start of the next instruction. So,
jmp  $  ;

branches to the instruction following the jmp instruction. The $ can only appear as the target of a jmp or call instruction.

__LOCAL_SIZE
This is replaced by the number of bytes of local variables in the stack frame. It is most useful when naked is used and a custom stack frame is set up.

Opcodes Supported

Opcodes
aaa aad aam aas adc
add addpd addps addsd addss
and andnpd andnps andpd andps
arpl bound bsf bsr bswap
bt btc btr bts call
cbw cdq clc cld clflush
cli clts cmc cmova cmovae
cmovb cmovbe cmovc cmove cmovg
cmovge cmovl cmovle cmovna cmovnae
cmovnb cmovnbe cmovnc cmovne cmovng
cmovnge cmovnl cmovnle cmovno cmovnp
cmovns cmovnz cmovo cmovp cmovpe
cmovpo cmovs cmovz cmp cmppd
cmpps cmps cmpsb cmpsd cmpss
cmpsw cmpxchg cmpxchg8b cmpxchg16b
comisd comiss
cpuid cvtdq2pd cvtdq2ps cvtpd2dq cvtpd2pi
cvtpd2ps cvtpi2pd cvtpi2ps cvtps2dq cvtps2pd
cvtps2pi cvtsd2si cvtsd2ss cvtsi2sd cvtsi2ss
cvtss2sd cvtss2si cvttpd2dq cvttpd2pi cvttps2dq
cvttps2pi cvttsd2si cvttss2si cwd cwde
da daa das db dd
de dec df di div
divpd divps divsd divss dl
dq ds dt dw emms
enter f2xm1 fabs fadd faddp
fbld fbstp fchs fclex fcmovb
fcmovbe fcmove fcmovnb fcmovnbe fcmovne
fcmovnu fcmovu fcom fcomi fcomip
fcomp fcompp fcos fdecstp fdisi
fdiv fdivp fdivr fdivrp feni
ffree fiadd ficom ficomp fidiv
fidivr fild fimul fincstp finit
fist fistp fisub fisubr fld
fld1 fldcw fldenv fldl2e fldl2t
fldlg2 fldln2 fldpi fldz fmul
fmulp fnclex fndisi fneni fninit
fnop fnsave fnstcw fnstenv fnstsw
fpatan fprem fprem1 fptan frndint
frstor fsave fscale fsetpm fsin
fsincos fsqrt fst fstcw fstenv
fstp fstsw fsub fsubp fsubr
fsubrp ftst fucom fucomi fucomip
fucomp fucompp fwait fxam fxch
fxrstor fxsave fxtract fyl2x fyl2xp1
hlt idiv imul in inc
ins insb insd insw int
into invd invlpg iret iretd
iretq ja jae jb jbe
jc jcxz je jecxz jg
jge jl jle jmp jna
jnae jnb jnbe jnc jne
jng jnge jnl jnle jno
jnp jns jnz jo jp
jpe jpo js jz lahf
lar ldmxcsr lds lea leave
les lfence lfs lgdt lgs
lidt lldt lmsw lock lods
lodsb lodsd lodsw loop loope
loopne loopnz loopz lsl lss
ltr maskmovdqu maskmovq maxpd maxps
maxsd maxss mfence minpd minps
minsd minss mov movapd movaps
movd movdq2q movdqa movdqu movhlps
movhpd movhps movlhps movlpd movlps
movmskpd movmskps movntdq movnti movntpd
movntps movntq movq movq2dq movs
movsb movsd movss movsw movsx
movupd movups movzx mul mulpd
mulps mulsd mulss neg nop
not or orpd orps out
outs outsb outsd outsw packssdw
packsswb packuswb paddb paddd paddq
paddsb paddsw paddusb paddusw paddw
pand pandn pavgb pavgw pcmpeqb
pcmpeqd pcmpeqw pcmpgtb pcmpgtd pcmpgtw
pextrw pinsrw pmaddwd pmaxsw pmaxub
pminsw pminub pmovmskb pmulhuw pmulhw
pmullw pmuludq pop popa popad
popf popfd por prefetchnta prefetcht0
prefetcht1 prefetcht2 psadbw pshufd pshufhw
pshuflw pshufw pslld pslldq psllq
psllw psrad psraw psrld psrldq
psrlq psrlw psubb psubd psubq
psubsb psubsw psubusb psubusw psubw
punpckhbw punpckhdq punpckhqdq punpckhwd punpcklbw
punpckldq punpcklqdq punpcklwd push pusha
pushad pushf pushfd pxor rcl
rcpps rcpss rcr rdmsr rdpmc
rdtsc rep repe repne repnz
repz ret retf rol ror
rsm rsqrtps rsqrtss sahf sal
sar sbb scas scasb scasd
scasw seta setae setb setbe
setc sete setg setge setl
setle setna setnae setnb setnbe
setnc setne setng setnge setnl
setnle setno setnp setns setnz
seto setp setpe setpo sets
setz sfence sgdt shl shld
shr shrd shufpd shufps sidt
sldt smsw sqrtpd sqrtps sqrtsd
sqrtss stc std sti stmxcsr
stos stosb stosd stosw str
sub subpd subps subsd subss
syscall sysenter sysexit sysret test
ucomisd ucomiss ud2 unpckhpd unpckhps
unpcklpd unpcklps verr verw wait
wbinvd wrmsr xadd xchg xlat
xlatb xor xorpd xorps

Pentium 4 (Prescott) Opcodes Supported

Pentium 4 Opcodes
addsub pd addsubps fisttp haddpd haddps
hsubpd hsubps lddqu monitor movddup
movshdup movsldup mwait

AMD Opcodes Supported

AMD Opcodes
pavgusb pf2id pfacc pfadd pfcmpeq
pfcmpge pfcmpgt pfmax pfmin pfmul
pfnacc pfpnacc pfrcp pfrcpit1 pfrcpit2
pfrsqit1 pfrsqrt pfsub pfsubr pi2fd
pmulhrw pswapd

SIMD

SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2 and AVX are supported.

GCC syntax

The GNU D Compiler uses an alternative, GCC-based syntax for inline assembler:

 GccAsmStatement:
    asm FunctionAttributesopt { GccAsmInstructionList }

 GccAsmInstructionList:
    GccAsmInstruction ;
    GccAsmInstruction ; GccAsmInstructionList

 GccAsmInstruction:
    GccBasicAsmInstruction
    GccExtAsmInstruction
    GccGotoAsmInstruction

 GccBasicAsmInstruction:
    AssignExpression

 GccExtAsmInstruction:
    AssignExpression : GccAsmOperandsopt
    AssignExpression : GccAsmOperandsopt : GccAsmOperandsopt
    AssignExpression : GccAsmOperandsopt : GccAsmOperandsopt : GccAsmClobbersopt

 GccGotoAsmInstruction:
    AssignExpression : : GccAsmOperandsopt : GccAsmClobbersopt : GccAsmGotoLabelsopt

 GccAsmOperands:
    GccSymbolicNameopt StringLiteral ( AssignExpression )
    GccSymbolicNameopt StringLiteral ( AssignExpression ) , GccAsmOperands

 GccSymbolicName:
    [ Identifier ]

 GccAsmClobbers:
    StringLiteral
    StringLiteral , GccAsmClobbers

 GccAsmGotoLabels:
    Identifier
    Identifier , GccAsmGotoLabels
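The sketch below shows a GccExtAsmInstruction. It has an AT&T-syntax template (GCC's default on x86), an in/out operand "+r", an input operand "r" and a "cc" clobber. It assumes GDC targeting x86 or x86_64 and falls back to plain D elsewhere. `gccAdd` and the version identifier `GccX86Asm` are hypothetical names.

```d
// Hypothetical version identifier: GDC targeting x86 or x86_64.
version (GNU)
{
    version (X86_64) version = GccX86Asm;
    else version (X86) version = GccX86Asm;
}

// Hypothetical helper: returns a + b using GCC-style extended asm.
int gccAdd(int a, int b)
{
    version (GccX86Asm)
    {
        asm pure nothrow @nogc
        {
            // %0 is a (read and written), %1 is b; addl sets the flags, hence "cc".
            "addl %1, %0" : "+r" (a) : "r" (b) : "cc";
        }
        return a;
    }
    else
    {
        return a + b;
    }
}

unittest
{
    assert(gccAdd(40, 2) == 42);
}
```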
