Assembly language for CE newbie #3: Arithmetic, Conditions and Branches operations with SSE

Post by bbfox »

This article is for newbies.

Note: I'm not an expert. I'm just someone who knows some instructions and can write Auto Assembler scripts. This article shares my experiences.

Warning: This post features AI-assisted content. While I created the first document, an AI arranged the syntax and wording, which I then curated. If you prefer not to engage with such material, please use your browser's back button.



Table of Contents
Assembly language for CE newbie #3: Arithmetic, Conditions and Branches operations with SSE
Arithmetic operation SSE Scalar instructions
Arithmetic operations with AVX Scalar Instructions
Move / Convert data type between source and SSE/AVX instructions
Conditions and Branches with SSE/AVX instructions
Understanding SHUFPS and VSHUFPS Instructions in SIMD Programming






Reference:
About Intel x64 CPU Registers
Performing Arithmetic Operations on Floats using SSE2 (Source Integer in EAX)


"SSE", or "Streaming SIMD Extensions", is a set of SIMD (single instruction, multiple data) instructions: one instruction can operate on several data elements at once, such as four 32-bit floats, two 64-bit doubles, or packed integers.

SSE instructions operate on scalar/serial or packed/parallel data. In the instruction name, "s?" means scalar/serial and "p?" means packed/parallel.

In the CE environment, we usually operate on scalar single-precision ("float") or scalar double-precision ("double") values. The names of these instructions contain "ss" / "sd".


SSE XMM Registers
An XMM register is 128 bits wide, which means it can store four 32-bit floats or two 64-bit doubles.

  • If an instruction stores and operates on only one float or double in the XMM register, it is called scalar single-precision (float) or scalar double-precision (double).

  • If an instruction stores and operates on 4 floats or 2 doubles at once, it is called packed single-precision (4 floats) or packed double-precision (2 doubles).


AVX instructions introduce new, wider ?MM series registers

  • A YMM register (AVX) is 256 bits wide, which means it can store eight 32-bit floats or four 64-bit doubles.

  • A ZMM register (AVX-512) is 512 bits wide, which means it can store sixteen 32-bit floats or eight 64-bit doubles.

As of Cheat Engine 7.5, its Auto Assembler (AA) does not support ZMM registers. YMM registers can be used, but there is no syntax highlighting for them.
I have never used any YMM registers so far.

XMM registers (128 bits wide)
Instruction series:

                                      Scalar             Packed
-----------------------------------------------------------------------------
Single-precision "float"              ss                 ps (4 floats/singles)
Double-precision "double"             sd                 pd (2 doubles)

XMM  registers (128 bit width)

bit#              128    96     64     32      0
---------------------------+------+------+------+
ss instructions                           XXXXXX  1 float
ps instructions      XXXXXX XXXXXX XXXXXX XXXXXX  4 floats
sd instructions                    XXXXXX-XXXXXX  1 double
pd instructions      XXXXXX-XXXXXX XXXXXX-XXXXXX  2 doubles

Instruction examples:
| Instruction | Operation                       |
|-------------|---------------------------------|
| addss       | Add Scalar Single-precision     |
| addsd       | Add Scalar Double-precision     |
| addps       | Add Packed Single-precision     |

In CE's Auto Assembler, most of the instructions I use are from the ss series -- Scalar Single-precision operations. Occasionally, I use the sd series, depending on whether the source data is already in double precision format.


Some cases for SS operations

Whether you use ss or sd depends on what you need. There is no fixed rule for this.

For integer operations, you can simply use the integer arithmetic instructions (add, sub, mul, etc.) if you want to.

Reasons I use SS instructions:

  • Source data is float

  • Source data is integer and I want to multiply it by a non-integer factor such as 1.33 or 2.5

  • Source data is integer and I don't want to preserve the RDX/RAX registers via push/pop (integer mul and div use them implicitly)

Reasons I use SD instructions:

  • Source data is double



Arithmetic operation SSE Scalar instructions

Instruction format:

  • ???ss xmm1, xmm2/m32: operand 1 is xmm register, operand 2 can be an xmm register or 32-bit memory location.

  • ???sd xmm1, xmm2/m64: operand 1 is xmm register, operand 2 can be an xmm register or 64-bit memory location.


Instructions I commonly use:
Add

addss: for float
addsd: for double

Example:

addss xmm1, xmm2  ; xmm1 = xmm1 + xmm2

Subtract

subss: for float
subsd: for double

Example:

subss xmm1, xmm2  ; xmm1 = xmm1 - xmm2

Multiply

mulss: for float
mulsd: for double

Example:

mulss xmm1, xmm2  ; xmm1 = xmm1 * xmm2

Divide

divss: for float
divsd: for double

Example:

divss xmm1, xmm2  ; xmm1 = xmm1 / xmm2

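The addss/subss... instructions listed above belong to the original SSE instruction set (introduced with the Pentium III in 1999). The double-precision versions (addsd/subsd...) were added with SSE2 (introduced with the Pentium 4). Every x64 CPU supports both.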



Arithmetic operations with AVX Scalar Instructions

AVX Instruction Set
AVX was introduced in 2011:

  • Intel: Starting from the "Sandy Bridge" architecture or later.
  • AMD: Starting from the "Bulldozer" architecture. Piledriver, Steamroller, Excavator, and Zen all support AVX.

Atom, Celeron, or older Pentium CPUs may not support AVX.
Use the CPU-Z tool to check if your CPU supports AVX.

The SSE scalar instructions mentioned above, such as addss/subss, overwrite the destination XMM register's value with the result. In some cases, we have to reload the original data from the source. AVX introduces similar instructions that do not destroy the original register's content. I use AVX whenever possible (note: AVX cannot be used in 32-bit programs). The drawback is that if the user's CPU is very old or does not support AVX, the script may crash the program.

Instruction Format:
v???ss xmm1, xmm2, xmm3/m32
v???sd xmm1, xmm2, xmm3/m64

The result is stored in xmm1.

Common Instructions I Use:
Add

- vaddss: for float
- vaddsd: for double

Example:

vaddss xmm1, xmm2, xmm3              ; xmm1 = xmm2 + xmm3
vaddss xmm1, xmm2, dword ptr [var1]  ; xmm1 = xmm2 + memory location [var1]

Subtract

- vsubss: for float
- vsubsd: for double

Example:

vsubss xmm1, xmm2, xmm3  ; xmm1 = xmm2 - xmm3
vsubss xmm1, xmm1, xmm3  ; xmm1 = xmm1 - xmm3

Multiply

- vmulss: for float
- vmulsd: for double

Example:

vmulss xmm1, xmm2, xmm3  ; xmm1 = xmm2 * xmm3

Divide

- vdivss: for float
- vdivsd: for double

Example:

vdivss xmm1, xmm2, xmm3  ; xmm1 = xmm2 / xmm3
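
The practical difference between the two-operand SSE form and the three-operand AVX form can be sketched in JavaScript. This is only a model under my own assumptions: registers are `Float32Array(4)` objects in a plain `regs` table, and `subss` / `vsubss` are made-up helper functions named after the instructions.

```javascript
// Registers are modeled as Float32Array(4); lane 0 is the low 32 bits.
// subss / vsubss are illustrative stand-ins, not real CE functions.
function subss(regs, dst, src) {
    // Two-operand SSE form: dst is both an input and the output.
    regs[dst][0] = regs[dst][0] - regs[src][0];
}

function vsubss(regs, dst, src1, src2) {
    // Three-operand AVX form: result goes to dst, src1 and src2 keep their values.
    const low = regs[src1][0] - regs[src2][0];
    const upper = regs[src1].slice(1);          // lanes 1-3 are copied from src1
    regs[dst] = Float32Array.of(low, ...upper);
}

const regs = {
    xmm1: Float32Array.of(0, 0, 0, 0),
    xmm2: Float32Array.of(50, 7, 8, 9),
    xmm3: Float32Array.of(20, 0, 0, 0),
};

vsubss(regs, "xmm1", "xmm2", "xmm3");
console.log(regs.xmm1[0], regs.xmm2[0]); // 30 50  (xmm2 still holds the original value)

subss(regs, "xmm2", "xmm3");
console.log(regs.xmm2[0]);               // 30     (the original 50 is gone)
```

With the SSE form you would need an extra movss to keep a copy of the original value. The AVX form saves that step.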




Move / Convert Data Types Between Source and SSE/AVX Instructions

The first problem I faced was: how to convert or move data from a source to xmm registers?
I used two major types of instructions:

  1. Move-in / Move-out data: moving data from general-purpose registers to XMM registers, from memory to XMM registers, and from XMM registers to registers or memory.
  2. Data conversion: converting integers to floats and floats to integers.

Move-In / Move-Out Data

                  SSE         AVX
-------------------------------------
(Source Type)
float            movss      vmovss
double           movsd      vmovsd
32-bit integer    movd       vmovd
64-bit integer    movq       vmovq

Example:

movss xmm1, [fltVar1]    ; copy float from memory [fltVar1] to xmm1
                         ; lower 32 bits replaced by [fltVar1]
                         ; higher bits will be cleared.

vmovss xmm1, [fltVar1]   ; copy float from memory [fltVar1] to xmm1 with AVX
                         ; lower 32 bits replaced by [fltVar1]
                         ; higher bits will be cleared.

movss xmm1, xmm2         ; copy lower 32-bit data from xmm2 to xmm1
                         ; higher bits of xmm1 remain unchanged

vmovss xmm1, xmm3, xmm2  ; copy lower 32-bit data from xmm2 to xmm1
                         ; copy bits 32-127 from xmm3 to xmm1

movss [fltVar2], xmm1    ; copy lower 32-bit xmm1 float to [fltVar2]

vmovss [fltVar2], xmm1   ; copy lower 32-bit xmm1 float to [fltVar2]

movd xmm1, eax           ; copy data from eax to xmm1 lower 32-bit
                         ; higher bits cleared (set to zero)

vmovd xmm1, eax          ; copy data from eax to xmm1 lower 32-bit with AVX
                         ; higher bits cleared (set to zero)

vmovd xmm1, [intVar1]    ; copy data from [intVar1] to xmm1 lower 32-bit with AVX
                         ; higher bits cleared (set to zero)
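
The three behaviours in the listing above (a load from memory clears the upper bits, a register-to-register movss keeps them, and movd copies raw bits without converting) can be checked with a small JavaScript model. It is a sketch under my own assumptions: the register is a 16-byte buffer, and `makeXmm`, `lanesOf`, `movssFromMemory`, `movssFromRegister` and `movd` are hypothetical helper names.

```javascript
// A 128-bit register is modeled as a 16-byte buffer accessed through a DataView
// (little-endian, so byte offset 0 is the lowest lane). Helper names are made up.
function makeXmm(lanes) {
    const view = new DataView(new ArrayBuffer(16));
    lanes.forEach((v, i) => view.setFloat32(i * 4, v, true));
    return view;
}
const lanesOf = (xmm) => [0, 4, 8, 12].map((off) => xmm.getFloat32(off, true));

function movssFromMemory(dst, value) {
    // movss xmm, m32: low lane loaded, bits 32-127 cleared.
    dst.setFloat32(0, value, true);
    for (const off of [4, 8, 12]) dst.setUint32(off, 0, true);
}

function movssFromRegister(dst, src) {
    // movss xmm, xmm: only the low lane is copied, the rest of dst is kept.
    dst.setUint32(0, src.getUint32(0, true), true);
}

function movd(dst, eax) {
    // movd xmm, r32: the raw 32 bits are copied (no int-to-float conversion),
    // upper bits cleared.
    dst.setUint32(0, eax >>> 0, true);
    for (const off of [4, 8, 12]) dst.setUint32(off, 0, true);
}

const xmm1 = makeXmm([1, 2, 3, 4]);
const xmm2 = makeXmm([9, 9, 9, 9]);

movssFromRegister(xmm1, xmm2);
console.log(lanesOf(xmm1));            // [ 9, 2, 3, 4 ]

movssFromMemory(xmm1, 100);
console.log(lanesOf(xmm1));            // [ 100, 0, 0, 0 ]

movd(xmm1, 0x42c80000);                // bit pattern of the float 100.0
console.log(lanesOf(xmm1)[0]);         // 100
movd(xmm1, 100);                       // integer 100 as raw bits is a tiny denormal
console.log(lanesOf(xmm1)[0] === 100); // false
```

The last two lines are the reason movd alone is not enough for an integer source: the value still has to be converted, which is what the cvt instructions are for.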

Convert data type from one to another

Converting between different types is a common operation for SSE/AVX instructions.
Typically, we convert data between float and integer data types.

SSE instructions
             (to)  32-bit int.      float         double
--------------------------------------------------------
(From)
32/64-bit integer         N/A    cvtsi2ss       cvtsi2sd
float                cvtss2si         N/A       cvtss2sd
double               cvtsd2si    cvtsd2ss            N/A

  
AVX instructions
             (to)  32-bit int.      float         double
--------------------------------------------------------
(From)
32/64-bit integer         N/A   vcvtsi2ss      vcvtsi2sd
float               vcvtss2si         N/A      vcvtss2sd
double              vcvtsd2si   vcvtsd2ss            N/A

Most frequently used are the integer <-> float conversions.

Example:

cvtsi2ss xmm0, eax         ; convert integer in eax to float, store in
                           ; xmm0's lower 32-bit location
                           ; other bits in xmm0 remain unchanged

vcvtsi2ss xmm0, xmm1, eax  ; convert integer in eax to float, store in
                           ; xmm0's lower 32-bit location
                           ; other bits in xmm0 replaced by xmm1

cvtss2si eax, xmm0         ; convert float in xmm0's lower 32-bit to
                           ; integer, stored in eax

vcvtss2si eax, xmm0        ; convert float in xmm0's lower 32-bit to
                           ; integer with AVX, stored in eax

Example of multiplication with SSE:

cvtsi2ss xmm15, eax      ; convert 32-bit integer in eax to float in xmm15
movss xmm14, [fltMul]    ; move float value from memory [fltMul] to xmm14
mulss xmm15, xmm14       ; xmm15 = xmm15 * xmm14
cvtss2si eax, xmm15      ; convert float in xmm15 to 32-bit integer, store in eax

Example of multiplication with AVX:

vcvtsi2ss xmm15, xmm15, eax  ; convert 32-bit integer in eax to float in xmm15
vmovss xmm14, [fltMul]       ; move float value from memory [fltMul] to xmm14
vmulss xmm13, xmm15, xmm14   ; xmm13 = xmm15 * xmm14
vcvtss2si eax, xmm13         ; convert float in xmm13 to 32-bit integer, store in eax
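
To see what numbers come out of the multiplication examples, here is a rough JavaScript model of the three steps. It assumes the default MXCSR rounding mode (round to nearest, ties to even) and a result that fits in a signed 32-bit integer. `multiplyInt` is a hypothetical wrapper, and the other functions only borrow the instruction names.

```javascript
// Math.fround rounds a JavaScript double to the nearest 32-bit float,
// like storing a value into the low "ss" lane of an xmm register.
const cvtsi2ss = (eax) => Math.fround(eax);
const mulss = (a, b) => Math.fround(a * b);

function cvtss2si(x) {
    // Assumes default rounding (nearest, ties to even) and an in-range result.
    const floor = Math.floor(x);
    const frac = x - floor;
    if (frac < 0.5) return floor;
    if (frac > 0.5) return floor + 1;
    return floor % 2 === 0 ? floor : floor + 1;   // exact tie: pick the even neighbour
}

function multiplyInt(eax, fltMul) {
    const xmm15 = cvtsi2ss(eax);                  // cvtsi2ss xmm15, eax
    const xmm14 = Math.fround(fltMul);            // movss xmm14, [fltMul]
    return cvtss2si(mulss(xmm15, xmm14));         // mulss, then cvtss2si eax, xmm15
}

console.log(multiplyInt(100, 1.33)); // 133
console.log(multiplyInt(3, 2.5));    // 8   (7.5 rounds to the even value 8)
console.log(multiplyInt(5, 2.5));    // 12  (12.5 rounds to the even value 12)
```

Note that cvtss2si rounds rather than truncates. If you want the fraction simply cut off, there is a separate instruction, cvttss2si (with an extra "t").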

Remember to save registers before using them. Whether this is needed depends on how the game code uses those registers around the injection point.
Reference:
Preserving XMM Registers
Preserving XMM Registers to Pre-Allocated Memory
Preserving Register States in Assembly



Conditions and Branches with SSE/AVX Instructions

Sometimes we need to perform checks to see if a value is non-negative (i.e., must be >= 0). For general-purpose registers (GPRs), we use the cmp instruction along with branch instructions like ja, je, or jge to perform different operations.
In SSE/AVX, we can use comiss or ucomiss to complete this task:

  • GPRs: cmp

  • SSE: comiss, ucomiss

  • AVX: vcomiss, vucomiss

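The difference between comiss and ucomiss is how they handle NaN (Not a Number). Both report an "unordered" result (at least one operand is NaN) with the same EFLAGS values:

        Flags:     CF       PF       ZF
------------------------------------------
comiss              1        1        1
ucomiss             1        1        1
vcomiss             1        1        1
vucomiss            1        1        1

What differs is the floating-point invalid-operation exception: comiss signals it for any NaN (quiet or signaling), while ucomiss signals it only for a signaling NaN.

You can check for NaN with an instruction like jp, because PF is set only for an unordered result.
I do not check for NaN in all cases, so whether I use comiss or ucomiss depends on the specific requirements. The program may crash or fail if NaN is encountered.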

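After executing comiss or ucomiss, we can use branch instructions like ja, jb, je, jae, jbe, etc., to perform conditional branching. Use these unsigned-style jumps rather than jg/jl: comiss and ucomiss report the result only in ZF, PF and CF, and clear OF and SF.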
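
The flag results and the jumps that read them can be modeled in a few lines of JavaScript. This is a sketch assuming floating-point exceptions are masked; `comiss` and the jump helpers are hypothetical functions named after the instructions.

```javascript
// Flag results of "comiss a, b" (ucomiss gives the same flags), as a plain object.
function comiss(a, b) {
    if (Number.isNaN(a) || Number.isNaN(b)) return { zf: 1, pf: 1, cf: 1 }; // unordered
    if (a > b) return { zf: 0, pf: 0, cf: 0 };
    if (a < b) return { zf: 0, pf: 0, cf: 1 };
    return { zf: 1, pf: 0, cf: 0 };                                          // equal
}

// The conditional jumps only look at the flags.
const ja  = (f) => f.cf === 0 && f.zf === 0;  // above
const jb  = (f) => f.cf === 1;                // below
const je  = (f) => f.zf === 1;                // equal
const jbe = (f) => f.cf === 1 || f.zf === 1;  // below or equal
const jp  = (f) => f.pf === 1;                // parity: set only for unordered (NaN)

console.log(ja(comiss(2.5, 0)));    // true
console.log(jbe(comiss(-1, 0)));    // true
console.log(jbe(comiss(NaN, 0)));   // true: a NaN also takes jbe, jb and je
console.log(jp(comiss(NaN, 0)));    // true: test jp first if NaN must be handled separately
```

Because a NaN sets all three flags, it looks like "below" and "equal" at the same time. A jp placed before the other jumps is the usual way to separate it.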

Example:
Return a multiplied value only if the result is > 0.

cvtsi2ss xmm15, eax      ; Convert 32-bit integer in eax to float in xmm15.
movss xmm14, [fltMul]    ; Move float value from memory [fltMul] to xmm14.
mulss xmm15, xmm14       ; xmm15 = xmm15 * xmm14.
subss xmm15, [decVal]    ; xmm15 = xmm15 - [decVal].
xorps xmm14, xmm14       ; Clear xmm14 to zero.
comiss xmm15, xmm14
jbe endp                 ; Jump if xmm15 <= 0, skip conversion if result not > 0.

cvtss2si eax, xmm15      ; Convert float in xmm15 to 32-bit integer, store in eax.

endp:

Or using AVX instructions:

vcvtsi2ss xmm15, xmm15, eax      ; Convert 32-bit integer in eax to float in xmm15.
vmovss xmm14, [fltMul]           ; Move float value from memory [fltMul] to xmm14.
vmulss xmm13, xmm15, xmm14       ; xmm13 = xmm15 * xmm14.
vsubss xmm13, xmm13, [decVal]    ; xmm13 = xmm13 - [decVal].
vxorps xmm14, xmm14, xmm14       ; Clear xmm14 to zero.
vcomiss xmm13, xmm14             ; Compare the result in xmm13 with zero.
jbe endp                         ; Jump if xmm13 <= 0, skip conversion if result not > 0.

vcvtss2si eax, xmm13             ; Convert float in xmm13 to 32-bit integer, store in eax.

endp:

Note on float to integer conversion:
If the value in the xmm register does not fit in a signed 32-bit integer (for example, it exceeds 2147483647) or is NaN, cvtss2si does not saturate to the maximum value. With the default (masked) floating-point exception settings, it stores the "integer indefinite" value 0x80000000 (-2147483648) in eax.
You may choose to add more checks before conversion, but I have chosen to ignore it. Users must accept the risk themselves.
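
If you do want such a check, the idea is to clamp the float before converting it. The JavaScript sketch below models the overflow result and one possible clamp. The names `roundTiesToEven`, `clampedCvtss2si` and `MAX_SAFE_FLOAT` are my own, and the clamp is an assumption about how you might guard the conversion (in a script, minss/maxss could play that role), not something from the original scripts.

```javascript
const INT_INDEFINITE = -2147483648;   // 0x80000000: cvtss2si result on overflow or NaN
const MAX_SAFE_FLOAT = 2147483520;    // largest 32-bit float below 2^31

function roundTiesToEven(x) {
    const r = Math.round(x);          // Math.round sends ties toward +infinity
    return (r - x === 0.5 && r % 2 !== 0) ? r - 1 : r;
}

function cvtss2si(x) {
    // Model only: default rounding mode, invalid-operation exception masked.
    const r = roundTiesToEven(Math.fround(x));
    if (Number.isNaN(r) || r > 2147483647 || r < -2147483648) return INT_INDEFINITE;
    return r;
}

function clampedCvtss2si(x) {
    // Hypothetical extra check: limit the value to the convertible range first.
    // NaN is not handled by this clamp and still gives INT_INDEFINITE.
    const clamped = Math.min(Math.max(Math.fround(x), -2147483648), MAX_SAFE_FLOAT);
    return cvtss2si(clamped);
}

console.log(cvtss2si(3e9));            // -2147483648
console.log(cvtss2si(2147483647));     // -2147483648 (as a float it rounds up to 2^31)
console.log(clampedCvtss2si(3e9));     // 2147483520
```

The upper limit is 2147483520 rather than 2147483647 because a 32-bit float cannot hold 2147483647 exactly.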


Understanding SHUFPS and VSHUFPS Instructions in SIMD Programming

shufps and vshufps are powerful SIMD instructions used for reordering (shuffling) single-precision floating-point elements within xmm or ymm registers. These instructions are widely used in SSE (Streaming SIMD Extensions) and AVX (Advanced Vector Extensions) for efficiently manipulating data. In this article, we will discuss how these instructions work, with a particular focus on the use of the immediate value (imm8) control byte.

1. The Structure of shufps and vshufps Control Byte (imm8)

The control byte (imm8) in shufps and vshufps instructions is an 8-bit immediate value that determines how elements in the source registers are rearranged. The 8 bits are split into four pairs, and each pair selects which source element is written to one of the 32-bit positions in the target register.

  • imm8 consists of 8 bits:

    [7, 6, 5, 4, 3, 2, 1, 0]
  • Each pair of bits controls a specific position in the target register:
    • Bits [1:0]: Controls the element in position 0 of the target register.
    • Bits [3:2]: Controls the element in position 1 of the target register.
    • Bits [5:4]: Controls the element in position 2 of the target register.
    • Bits [7:6]: Controls the element in position 3 of the target register.

The value of each pair determines the source of the element in the final register. Positions 0 and 1 are always picked from the first source operand, and positions 2 and 3 from the second source operand:

  • 00 (“0”): Selects the element from position 0 of that source register.
  • 01 (“1”): Selects the element from position 1 of that source register.
  • 10 (“2”): Selects the element from position 2 of that source register.
  • 11 (“3”): Selects the element from position 3 of that source register.

2. Example: imm8 = 0b01000100 (0x44)

Let’s take an example with imm8 set to 0b01000100 (which is equivalent to 0x44 in hexadecimal). Here’s how the control byte is interpreted:

  • Bits [1:0] (“00”): The element at position 0 of the target register will come from position 0 of the source1 register.
  • Bits [3:2] (“01”): The element at position 1 of the target register will come from position 1 of the source1 register.
  • Bits [5:4] (“00”): The element at position 2 of the target register will come from position 0 of the source2 register.
  • Bits [7:6] (“01”): The element at position 3 of the target register will come from position 1 of the source2 register.

Thus, imm8 = 0b01000100 copies the two low elements of source1 into positions 0-1 and the two low elements of source2 into positions 2-3 of the target register.
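
The bit layout can be checked with two small JavaScript functions. `encodeImm8` and `decodeImm8` are hypothetical names for this sketch; the encoding is the same shift-and-or used by the helper page at the end of this post.

```javascript
// Build imm8 from the four 2-bit selectors (sel[0] is for result position 0).
function encodeImm8(sel) {
    return (sel[3] << 6) | (sel[2] << 4) | (sel[1] << 2) | sel[0];
}

// Split imm8 back into the four selectors.
function decodeImm8(imm8) {
    return [0, 2, 4, 6].map((shift) => (imm8 >> shift) & 3);
}

console.log(decodeImm8(0x44));                             // [ 0, 1, 0, 1 ]
console.log("0x" + encodeImm8([0, 1, 0, 1]).toString(16)); // 0x44
```

The decoded list reads from result position 0 upward, so `[0, 1, 0, 1]` means positions 0 and 1 take elements 0 and 1 of the first source, and positions 2 and 3 take elements 0 and 1 of the second source.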

3. Example Operation

Suppose xmm1 contains the elements [d, c, b, a], and xmm2 contains [h, g, f, e]. Using the following instruction:

shufps xmm1, xmm2, 0b11010001

The result in xmm1 would be [h, f, a, b]. This means that:

  • The 0th element is replaced by the #1 element from xmm1 (“b”).
  • The 1st element is replaced by the #0 element from xmm1 (“a”).
  • The 2nd element is taken from the #1 element of xmm2 (“f”).
  • The 3rd element is taken from the #3 element of xmm2 (“h”).

3.1 Example Operation: rotate

We want the floats in xmm1 to rotate right, from [e, f, g, h] to [h, e, f, g]. The imm8 bits should be:

  • The 0th element is replaced by the #1 element from xmm1 (“g”); imm bits = 01
  • The 1st element is replaced by the #2 element from xmm1 (“f”); imm bits = 10
  • The 2nd element is replaced by the #3 element of xmm1 (“e”); imm bits = 11
  • The 3rd element is replaced by the #0 element of xmm1 (“h”); imm bits = 00

So imm8 = 0b00111001 = 0x39:

shufps xmm1, xmm1, 0x39
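
Both worked examples (0b11010001 and the 0x39 rotate) can be reproduced with a short JavaScript model of shufps. This is a sketch: `shufps` and `highToLow` are hypothetical helpers, and registers are plain arrays where index 0 is the lowest float, so the text's high-to-low notation [d, c, b, a] becomes `["a", "b", "c", "d"]`.

```javascript
// Registers as arrays indexed by position: index 0 is the lowest float.
function shufps(dst, src, imm8) {
    const sel = (n) => (imm8 >> (2 * n)) & 3;
    return [
        dst[sel(0)],   // position 0: picked from the first operand
        dst[sel(1)],   // position 1: picked from the first operand
        src[sel(2)],   // position 2: picked from the second operand
        src[sel(3)],   // position 3: picked from the second operand
    ];
}

// Print a register the way the text writes it: highest position first.
const highToLow = (reg) => "[" + [...reg].reverse().join(", ") + "]";

const xmm1 = ["a", "b", "c", "d"];                       // [d, c, b, a] in the text
const xmm2 = ["e", "f", "g", "h"];                       // [h, g, f, e] in the text
console.log(highToLow(shufps(xmm1, xmm2, 0b11010001))); // [h, f, a, b]

const r = ["h", "g", "f", "e"];                          // [e, f, g, h] in the text
console.log(highToLow(shufps(r, r, 0x39)));              // [h, e, f, g]
```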

4. Key Differences Between shufps and vshufps

  • shufps: Used for 128-bit xmm registers. This instruction is part of the SSE instruction set and allows rearranging elements within 128-bit registers.
  • vshufps: An AVX instruction that can be used with both 128-bit (xmm) and 256-bit (ymm) registers. It supports three operands: two source registers and a destination register. This flexibility makes vshufps more powerful for certain operations, as it allows preserving the original source registers while writing the shuffled result to a different destination register.
    vshufps instruction format (the ymm form is not covered here):
    vshufps xmm1, xmm2, xmm3/m128, imm8
    
    Source: xmm2, xmm3
    Destination: xmm1
    How it works:
    Lower position elements 0 - 1: picked from xmm2
    Upper position elements 2 - 3: picked from xmm3/m128
    Store result in xmm1

More examples
Copy the lowest float in an xmm register to positions 1-3 (broadcast):

shufps xmm1, xmm1, 0
//or
vshufps xmm1, xmm1, xmm1, 0

Copy float data from position 1 to position 0 (don't care others):

shufps xmm1, xmm1, 1
//or
vshufps xmm1, xmm1, xmm1, 1

Copy float data from position 2 to position 0 (don't care others):

shufps xmm1, xmm1, 2
//or
vshufps xmm1, xmm1, xmm1, 2

Insert the lowest float of xmm2 into position 1 of xmm1:

shufps xmm1, xmm2, 0 // pos0-1: xmm1:0, pos2-3: xmm2:0; or movlhps xmm1, xmm2: copy the 2 low elements of xmm2 to the high positions of xmm1
shufps xmm1, xmm1, 8 // 0b00001000 // pos0: xmm1:0, pos1: xmm1:2, pos2: xmm1:0, pos3: xmm1:0
//or (SSE4.1; positions 2-3 of xmm1 are preserved)
insertps xmm1, xmm2, 0x10 // 7:6 -> src. pos, 5:4 -> dest. pos, 3:0 -> zero mask (0 = do not zero any element)
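
The two routes can be compared with a small JavaScript model. It is a sketch with hypothetical helper functions named after the instructions. For insertps with a register source, Intel's manual defines imm8 bits 7:6 as the source element, bits 5:4 as the destination position and bits 3:0 as a zero mask, so moving element 0 of xmm2 into position 1 of xmm1 needs imm8 = 0x10.

```javascript
// Registers as arrays indexed by position: index 0 is the lowest float.
function shufps(dst, src, imm8) {
    const sel = (n) => (imm8 >> (2 * n)) & 3;
    return [dst[sel(0)], dst[sel(1)], src[sel(2)], src[sel(3)]];
}

function insertps(dst, src, imm8) {
    const srcPos = (imm8 >> 6) & 3;   // bits 7:6: which element of src to take
    const dstPos = (imm8 >> 4) & 3;   // bits 5:4: where to put it in dst
    const zmask  = imm8 & 0xf;        // bits 3:0: result positions forced to zero
    const out = [...dst];
    out[dstPos] = src[srcPos];
    return out.map((v, i) => ((zmask >> i) & 1 ? 0 : v));
}

const xmm1 = [1, 2, 3, 4];
const xmm2 = [50, 60, 70, 80];

// Two shufps: xmm2[0] ends up in position 1, but positions 2-3 of xmm1 are overwritten.
let t = shufps(xmm1, xmm2, 0);            // [1, 1, 50, 50]
t = shufps(t, t, 8);
console.log(t);                           // [ 1, 50, 1, 1 ]

// insertps: the same element is moved and the other positions are kept.
console.log(insertps(xmm1, xmm2, 0x10));  // [ 1, 50, 3, 4 ]
```

If only positions 0 and 1 matter afterwards, both routes give the same answer. The shufps pair has the advantage of needing only SSE.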

An HTML/JavaScript helper page for building shufps / vshufps instructions:

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>SHUFPS / VSHUFPS Imm8 Helper</title>
    <style>
        body {
            font-family: Arial, sans-serif;
        }
        .container {
            max-width: 600px;
            margin: 0 auto;
        }
        .xmm-select {
            margin-bottom: 20px;
        }
    </style>
</head>
<body>
<div class="container">
    <h1>SHUFPS / VSHUFPS Imm8 Helper</h1>
    <div class="xmm-select">
        <label for="xmm1">First XMM Register:</label>
        <select id="xmm1">
            <option value="xmm0">xmm0</option>
            <option value="xmm1">xmm1</option>
            <option value="xmm2">xmm2</option>
            <option value="xmm3">xmm3</option>
            <option value="xmm4">xmm4</option>
            <option value="xmm5">xmm5</option>
            <option value="xmm6">xmm6</option>
            <option value="xmm7">xmm7</option>
            <option value="xmm8">xmm8</option>
            <option value="xmm9">xmm9</option>
            <option value="xmm10">xmm10</option>
            <option value="xmm11">xmm11</option>
            <option value="xmm12">xmm12</option>
            <option value="xmm13">xmm13</option>
            <option value="xmm14">xmm14</option>
            <option value="xmm15">xmm15</option>
        </select>
    </div>
    <div class="xmm-select">
        <label for="xmm2">Second XMM Register:</label>
        <select id="xmm2">
            <option value="xmm0">xmm0</option>
            <option value="xmm1">xmm1</option>
            <option value="xmm2">xmm2</option>
            <option value="xmm3">xmm3</option>
            <option value="xmm4">xmm4</option>
            <option value="xmm5">xmm5</option>
            <option value="xmm6">xmm6</option>
            <option value="xmm7">xmm7</option>
            <option value="xmm8">xmm8</option>
            <option value="xmm9">xmm9</option>
            <option value="xmm10">xmm10</option>
            <option value="xmm11">xmm11</option>
            <option value="xmm12">xmm12</option>
            <option value="xmm13">xmm13</option>
            <option value="xmm14">xmm14</option>
            <option value="xmm15">xmm15</option>
        </select>
    </div>
    <div class="xmm-select">
        <label for="xmm_dest">Destination XMM Register (for VSHUFPS):</label>
        <select id="xmm_dest">
            <option value="xmm0">xmm0</option>
            <option value="xmm1">xmm1</option>
            <option value="xmm2">xmm2</option>
            <option value="xmm3">xmm3</option>
            <option value="xmm4">xmm4</option>
            <option value="xmm5">xmm5</option>
            <option value="xmm6">xmm6</option>
            <option value="xmm7">xmm7</option>
            <option value="xmm8">xmm8</option>
            <option value="xmm9">xmm9</option>
            <option value="xmm10">xmm10</option>
            <option value="xmm11">xmm11</option>
            <option value="xmm12">xmm12</option>
            <option value="xmm13">xmm13</option>
            <option value="xmm14">xmm14</option>
            <option value="xmm15">xmm15</option>
        </select>
    </div>
    <h3>Select Floats to Shuffle</h3>
    <p>Choose the 4 positions from the two registers (0-1 from the first xmm, 2-3 from the second xmm).</p>
    <div id="float-positions">
        <label>Result Position 0:</label>
        <select class="position-select" id="pos0">
            <option value="0">xmm 1st[0]</option>
            <option value="1">xmm 1st[1]</option>
            <option value="2">xmm 1st[2]</option>
            <option value="3">xmm 1st[3]</option>
        </select>
        <br>
        <label>Result Position 1:</label>
        <select class="position-select" id="pos1">
            <option value="0">xmm 1st[0]</option>
            <option value="1">xmm 1st[1]</option>
            <option value="2">xmm 1st[2]</option>
            <option value="3">xmm 1st[3]</option>
        </select>
        <br>
        <label>Result Position 2:</label>
        <select class="position-select" id="pos2">
            <option value="0">xmm 2nd[0]</option>
            <option value="1">xmm 2nd[1]</option>
            <option value="2">xmm 2nd[2]</option>
            <option value="3">xmm 2nd[3]</option>
        </select>
        <br>
        <label>Result Position 3:</label>
        <select class="position-select" id="pos3">
            <option value="0">xmm 2nd[0]</option>
            <option value="1">xmm 2nd[1]</option>
            <option value="2">xmm 2nd[2]</option>
            <option value="3">xmm 2nd[3]</option>
        </select>
    </div>
    <br>
    <button onclick="generateInstruction('shufps')">Generate SHUFPS Instruction</button>
    <button onclick="generateInstruction('vshufps')">Generate VSHUFPS Instruction</button>
    <h3>Result:</h3>
    <p id="instruction"></p>
</div>
<script>
    function generateInstruction(type) {
        const xmm1 = document.getElementById('xmm1').value;
        const xmm2 = document.getElementById('xmm2').value;
        const xmmDest = document.getElementById('xmm_dest').value;
        const pos0 = parseInt(document.getElementById('pos0').value);
        const pos1 = parseInt(document.getElementById('pos1').value);
        const pos2 = parseInt(document.getElementById('pos2').value);
        const pos3 = parseInt(document.getElementById('pos3').value);

        // Calculate imm8 value
        const imm8 = (pos3 << 6) | (pos2 << 4) | (pos1 << 2) | pos0;

        // Generate instruction
        let instruction = '';
        if (type === 'shufps') {
            instruction = `shufps ${xmm1}, ${xmm2}, 0x${imm8.toString(16)}`;
        } else if (type === 'vshufps') {
            instruction = `vshufps ${xmmDest}, ${xmm1}, ${xmm2}, 0x${imm8.toString(16)}`;
        }
        document.getElementById('instruction').textContent = instruction;
    }
</script>
</body>
</html>
Last edited by bbfox on Sat Dec 14, 2024 12:10 am, edited 10 times in total.

I create tables to suit my preferences. Table is free to use, but need to leave the author's name and source URL: https://opencheattables.com.
Table will not be up-to-date. Feel free to modify it, but kindly provide credit to the source.


Post by Alex Darkside »

bbfox wrote: Sun Apr 14, 2024 9:55 am

movss xmm1, [fltVar1] ; copy float from memory [fltVar1] to xmm1
; lower 32 bits replaced by [fltVar1]
; higher bits remain unchanged.

(Google translation)

Small clarification.
This is a screenshot from the "Intel® 64 and IA-32 Architectures Software Developer's Manual".
Please pay attention to the highlighted places in the text.

[Image: screenshot from the Intel manual with highlighted text]


Post by bbfox »

Corrected.
I'm not sure what the described behavior is. I tested it, even in Visual Studio, and the high bits are cleared.

Checked before and after vmovss xmm1, dword ptr [exflt1]:
[Images: xmm1 before and after the instruction]

; masm code start
align 10h

flt1 real4 1.0f
flt2 real4 2.0f
flt3 real4 3.0f
flt4 real4 4.0f

mul1 real4 1.5f
mul2 real4 2.0f
mul3 real4 2.5f
mul4 real4 3.0f

res1 real4 0.0f, 0.0f, 0.0f, 0.0f

exflt1 real4 100.0f
exflt2 real4 200.0f

.code
main PROC
    ; Code start
    movaps xmm1, [flt1]
    vmovss xmm1, dword ptr [exflt1]

movaps xmm1, [flt1]

I don't know what "load" means here. In the manual, the description of the move from memory to xmm is:
[Image: excerpt from the manual]

Source: Intel 64 and IA-32 Architectures Software Developer’s Manual: Volume 2B, instruction set reference M-U, page 124
So the high bits will be cleared.

