This article is for newbies.

Note: I'm not an expert. I'm just someone who knows some instructions and can write Auto Assembler scripts. This article shares my experiences.

Warning: This post features **AI-assisted content**. While I created the first document, an AI arranged the syntax and wording, which I then curated. If you prefer not to engage with such material, please use your browser's back button.

Table of Contents

Assembly language for CE newbie #3: Arithmetic, Conditions and Branches operations with SSE

Arithmetic Operations with SSE Scalar Instructions

Arithmetic Operations with AVX Scalar Instructions

Move / Convert Data Types Between Source and SSE/AVX Instructions

Conditions and Branches with SSE/AVX Instructions

**Assembly language for CE newbie #3: Arithmetic, Conditions and Branches operations with SSE**

References:

- About Intel x64 CPU Registers
- Performing Arithmetic Operations on Floats using SSE2 (Source Integer in EAX)

SSE (Streaming SIMD Extensions) is an instruction set extension whose instructions can operate on multiple data elements at once (SIMD: Single Instruction, Multiple Data). Examples are four 32-bit floats, two 64-bit doubles, or packed integers.

SSE instructions operate on either scalar (single-value) or packed (parallel) data. In the instruction name, an "s?" suffix means scalar and a "p?" suffix means packed.

In CE, we usually work with scalar single-precision values ("float") or scalar double-precision values ("double"). The names of these instructions end in "ss" or "sd".

**SSE XMM Registers**

An XMM register is 128 bits wide, so it can hold four 32-bit floats or two 64-bit doubles.

When an instruction stores and operates on only one float or double (the lowest element of the register), the operation is called scalar single-precision or scalar double-precision.

When an instruction operates on all four floats or both doubles in a single instruction, the operation is called packed single-precision (4 floats) or packed double-precision (2 doubles).

**AVX introduces wider registers: YMM (and ZMM with AVX-512)**

A YMM register is 256 bits wide, so it can hold eight 32-bit floats or four 64-bit doubles. The lower 128 bits of each YMM register are the corresponding XMM register.

A ZMM register (AVX-512) is 512 bits wide, so it can hold sixteen 32-bit floats or eight 64-bit doubles.

As of Cheat Engine 7.5, the Auto Assembler (AA) does not support ZMM registers. YMM registers can be used, but they get no syntax highlighting.

So far, I have never used YMM registers.

| XMM registers (128 bits wide) | Scalar | Packed |
|-------------------------------|--------|--------|
| Single-precision "float" | ss | ps (4 floats/singles) |
| Double-precision "double" | sd | pd (2 doubles) |

```text
XMM register (128 bits wide)
bit#                128    96     64     32     0
                    +------+------+------+------+
ss instructions     |      |      |      |XXXXXX|  1 float
ps instructions     |XXXXXX|XXXXXX|XXXXXX|XXXXXX|  4 floats
sd instructions     |      |      |XXXXXX-XXXXXX|  1 double
pd instructions     |XXXXXX-XXXXXX|XXXXXX-XXXXXX|  2 doubles
```

Instruction examples:

| Instruction | Operation |
|-------------|-----------------------------|
| addss | Add Scalar Single-precision |
| addsd | Add Scalar Double-precision |
| addps | Add Packed Single-precision |
| addpd | Add Packed Double-precision |
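The small C sketch below shows "scalar" and "packed" lane by lane. It uses the standard SSE/SSE2 compiler intrinsics from `<xmmintrin.h>` and `<emmintrin.h>` and assumes an x86-64 compiler such as GCC or Clang. Each intrinsic compiles to the instruction named in its comment. The helper names (`lanes_ps`, `demo_addss`, `demo_addps`, `demo_addsd`) are hypothetical and exist only for this illustration.

```c
#include <xmmintrin.h>  /* SSE: __m128, addss/addps intrinsics */
#include <emmintrin.h>  /* SSE2: __m128d, addsd intrinsics */

/* Copy the four float lanes of an XMM value into an array (out[0] = bits 0..31). */
void lanes_ps(__m128 v, float out[4]) { _mm_storeu_ps(out, v); }

/* addss: only lane 0 is added; lanes 1..3 are copied from the first operand. */
__m128 demo_addss(__m128 a, __m128 b) { return _mm_add_ss(a, b); }

/* addps: all four lanes are added by one instruction. */
__m128 demo_addps(__m128 a, __m128 b) { return _mm_add_ps(a, b); }

/* addsd: only the low 64-bit double is added; the high double comes from a. */
__m128d demo_addsd(__m128d a, __m128d b) { return _mm_add_sd(a, b); }
```

For example, adding {1, 2, 3, 4} and {10, 20, 30, 40} gives {11, 2, 3, 4} with addss and {11, 22, 33, 44} with addps.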

In CE's Auto Assembler, most of the instructions I use are from the ss series (scalar single-precision operations). Occasionally, I use the sd series, depending on whether the source data is already in double-precision format.

**Some cases for SS operations**

Whether to use ss or sd depends on what you need. There is no fixed rule.

For integer operations, you can just use the integer arithmetic instructions, e.g. `add`, `sub`, `mul`, etc.

Reasons I use SS instructions:

- Source data is a float
- Source data is an integer and I want to multiply it by a non-integer factor such as 1.33 or 2.5
- Source data is an integer and I don't want to save/restore RDX/RAX via push/pop (the one-operand integer `mul`/`div` forms use EDX:EAX)

Reasons I use SD instructions:

- Source data is a double

**Arithmetic Operations with SSE Scalar Instructions**

Instruction format:

- `???ss xmm1, xmm2/m32`: the first operand is an XMM register and receives the result. The second operand can be an XMM register or a 32-bit memory location.
- `???sd xmm1, xmm2/m64`: the same, but the memory operand is a 64-bit location.

Instructions I commonly use:

**Add**

- addss: for float
- addsd: for double

Example:

```asm
addss xmm1, xmm2 ; xmm1 = xmm1 + xmm2
```

**Subtract**

- subss: for float
- subsd: for double

Example:

```asm
subss xmm1, xmm2 ; xmm1 = xmm1 - xmm2
```

**Multiply**

- mulss: for float
- mulsd: for double

Example:

```asm
mulss xmm1, xmm2 ; xmm1 = xmm1 * xmm2
```

**Divide**

- divss: for float
- divsd: for double

Example:

```asm
divss xmm1, xmm2 ; xmm1 = xmm1 / xmm2
```

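The addss/subss/mulss/divss instructions listed above belong to the original SSE instruction set (Pentium III, 1999). The addsd/subsd/mulsd/divsd instructions belong to SSE2 (Pentium 4, 2000). Both sets are available on every x86-64 CPU.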
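A chained calculation in AA usually reuses one register as an accumulator and overwrites it at every step. The C sketch below mirrors that pattern with SSE intrinsics and assumes an x86-64 GCC/Clang toolchain. The function and parameter names are hypothetical, and each comment gives the AA line that the intrinsic corresponds to.

```c
#include <xmmintrin.h>  /* SSE scalar single-precision intrinsics */

/* Evaluates ((base + bonus) - penalty) * mult / div with scalar-single ops,
   reusing one accumulator the way "op xmm0, ..." overwrites xmm0 each step.
   All names here are hypothetical. */
float chain_ss(float base, float bonus, float penalty, float mult, float div)
{
    __m128 acc = _mm_set_ss(base);              /* movss xmm0, [base]    */
    acc = _mm_add_ss(acc, _mm_set_ss(bonus));   /* addss xmm0, [bonus]   */
    acc = _mm_sub_ss(acc, _mm_set_ss(penalty)); /* subss xmm0, [penalty] */
    acc = _mm_mul_ss(acc, _mm_set_ss(mult));    /* mulss xmm0, [mult]    */
    acc = _mm_div_ss(acc, _mm_set_ss(div));     /* divss xmm0, [div]     */
    return _mm_cvtss_f32(acc);                  /* movss [result], xmm0  */
}
```

For example, chain_ss(10, 4, 2, 3, 4) computes ((10 + 4) - 2) * 3 / 4 = 9.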

**Arithmetic operations with AVX Scalar Instructions**

**AVX Instruction Set**

AVX was introduced in 2011:

- **Intel:** starting from the "Sandy Bridge" architecture.
- **AMD:** the Bulldozer, Piledriver, Steamroller, Excavator, and Zen architectures support AVX.

Atom, Celeron, or older Pentium CPUs may not support AVX.

Use the CPU-Z tool to check if your CPU supports AVX.

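The SSE/SSE2 scalar instructions mentioned above, such as addss/subss, overwrite the destination XMM register with the result. In some cases, we then have to reload the original data from the source. AVX introduces similar instructions with a separate destination operand, so they do not destroy the contents of the source registers. I use AVX whenever possible. Note that I use AVX only in 64-bit targets: the CPU supports AVX in 32-bit mode too, but only xmm0-xmm7 exist there, so code that uses xmm8-xmm15 cannot run in a 32-bit process. The drawback is that if the user's CPU is very old or does not support AVX, executing an AVX instruction raises an invalid-opcode exception, which typically crashes the program.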
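You can also do the CPU-Z check programmatically. Below is a C sketch that assumes GCC or Clang on x86. `__builtin_cpu_supports` is a compiler builtin, not part of standard C, and the wrapper name `cpu_has_avx` is hypothetical.

```c
/* Returns 1 if the running CPU reports AVX support, else 0.
   __builtin_cpu_supports / __builtin_cpu_init are GCC/Clang builtins. */
int cpu_has_avx(void)
{
    __builtin_cpu_init();  /* strictly required only before constructors run; harmless here */
    return __builtin_cpu_supports("avx") ? 1 : 0;
}
```

A tool written this way could fall back to an SSE version of a routine when cpu_has_avx() returns 0, rather than crashing on the first AVX instruction.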

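**Instruction Format:**

`v???ss xmm1, xmm2, xmm3/m32`

`v???sd xmm1, xmm2, xmm3/m64`

The result is stored in xmm1, and xmm2 and xmm3 are left unchanged. For ss, bits 32-127 of xmm1 are copied from xmm2 (bits 64-127 for sd). Bits 128-255 of the corresponding YMM register are zeroed.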

**Common Instructions I Use:**

**Add**

- vaddss: for float
- vaddsd: for double

Example:

```asm
vaddss xmm1, xmm2, xmm3              ; xmm1 = xmm2 + xmm3
vaddss xmm1, xmm2, dword ptr [var1]  ; xmm1 = xmm2 + memory location [var1]
```

**Subtract**

- vsubss: for float
- vsubsd: for double

Example:

```asm
vsubss xmm1, xmm2, xmm3  ; xmm1 = xmm2 - xmm3
vsubss xmm1, xmm1, xmm3  ; xmm1 = xmm1 - xmm3
```

**Multiply**

- vmulss: for float
- vmulsd: for double

Example:

```asm
vmulss xmm1, xmm2, xmm3 ; xmm1 = xmm2 * xmm3
```

**Divide**

- vdivss: for float
- vdivsd: for double

Example:

```asm
vdivss xmm1, xmm2, xmm3 ; xmm1 = xmm2 / xmm3
```

**Move / Convert Data Types Between Source and SSE/AVX Instructions**

The first problem I faced was how to move or convert data from a source into XMM registers.

I used two major types of instructions:

- Move-in / Move-out data: moving data from general-purpose registers to XMM registers, from memory to XMM registers, and from XMM registers to registers or memory.
- Data conversion: converting integers to floats and floats to integers.

**Move-In / Move-Out Data**

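| Source type | SSE | AVX |
|----------------|-------|--------|
| float | movss | vmovss |
| double | movsd | vmovsd |
| 32-bit integer | movd | vmovd |
| 64-bit integer | movq | vmovq |

Note that movd/movq copy the raw bits and do not convert an integer into a float. To do that, use the conversion instructions described below.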

Example:

```asm
movss  xmm1, [fltVar1]   ; copy float from memory [fltVar1] to xmm1
                         ; lower 32 bits replaced by [fltVar1]
                         ; higher bits cleared
vmovss xmm1, [fltVar1]   ; copy float from memory [fltVar1] to xmm1 with AVX
                         ; lower 32 bits replaced by [fltVar1]
                         ; higher bits cleared
movss  xmm1, xmm2        ; copy lower 32 bits from xmm2 to xmm1
                         ; bits 32-127 of xmm1 unchanged
vmovss xmm1, xmm3, xmm2  ; copy lower 32 bits from xmm2 to xmm1
                         ; copy bits 32-127 from xmm3 to xmm1
movss  [fltVar2], xmm1   ; copy lower 32-bit float of xmm1 to [fltVar2]
vmovss [fltVar2], xmm1   ; copy lower 32-bit float of xmm1 to [fltVar2] with AVX
movd   xmm1, eax         ; copy the raw bits of eax to xmm1's lower 32 bits
                         ; higher bits cleared (set to zero)
vmovd  xmm1, eax         ; copy the raw bits of eax to xmm1's lower 32 bits with AVX
                         ; higher bits cleared (set to zero)
vmovd  xmm1, [intVar1]   ; copy [intVar1] to xmm1's lower 32 bits with AVX
                         ; higher bits cleared (set to zero)
```
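It is easy to get the behavior of the upper bits wrong. The C sketch below performs the same moves with SSE/SSE2 intrinsics and assumes GCC or Clang on x86-64. The helper names are hypothetical. It also shows that movd copies bits rather than converting: moving the integer 42 leaves the integer bit pattern 42 in the low lane, not the float 42.0.

```c
#include <xmmintrin.h>  /* SSE: movss-style loads/stores/moves */
#include <emmintrin.h>  /* SSE2: movd-style integer moves */

/* movss xmm, [mem]: load one float, upper 96 bits are zeroed. */
__m128 load_float(const float *p) { return _mm_load_ss(p); }

/* movss xmm1, xmm2: low float from xmm2, upper 96 bits kept from xmm1. */
__m128 move_low(__m128 xmm1, __m128 xmm2) { return _mm_move_ss(xmm1, xmm2); }

/* movss [mem], xmm: store only the low float. */
void store_float(float *p, __m128 v) { _mm_store_ss(p, v); }

/* movd xmm, r32: copy the raw 32 bits of an int, upper 96 bits zeroed.
   No conversion happens; use cvtsi2ss (_mm_cvtsi32_ss) to get a float. */
__m128i move_int(int eax) { return _mm_cvtsi32_si128(eax); }
```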

**Convert data type from one to another**

Converting between different types is a common operation for SSE/AVX instructions.

Typically, we convert data between float and integer data types.

SSE instructions:

| From | To 32/64-bit integer | To float | To double |
|-------------------|----------|----------|----------|
| 32/64-bit integer | N/A | cvtsi2ss | cvtsi2sd |
| float | cvtss2si | N/A | cvtss2sd |
| double | cvtsd2si | cvtsd2ss | N/A |

AVX instructions:

| From | To 32/64-bit integer | To float | To double |
|-------------------|-----------|-----------|-----------|
| 32/64-bit integer | N/A | vcvtsi2ss | vcvtsi2sd |
| float | vcvtss2si | N/A | vcvtss2sd |
| double | vcvtsd2si | vcvtsd2ss | N/A |

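**Most frequently used are the integer <-> float conversions.** Note that cvtss2si and cvtsd2si round according to the current rounding mode (round to nearest, ties to even, by default), so 2.5 becomes 2 and 3.5 becomes 4. The cvttss2si and cvttsd2si variants always truncate toward zero instead.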

Example:

```asm
cvtsi2ss  xmm0, eax        ; convert integer in eax to float, store in
                           ; xmm0's lower 32 bits
                           ; other bits in xmm0 remain unchanged
vcvtsi2ss xmm0, xmm1, eax  ; convert integer in eax to float, store in
                           ; xmm0's lower 32 bits
                           ; bits 32-127 of xmm0 copied from xmm1
cvtss2si  eax, xmm0        ; convert float in xmm0's lower 32 bits to
                           ; integer (rounded), store in eax
vcvtss2si eax, xmm0        ; convert float in xmm0's lower 32 bits to
                           ; integer (rounded) with AVX, store in eax
```
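The C sketch below uses the matching SSE/SSE2 intrinsics to show the rounding behavior of these conversions. It assumes GCC or Clang on x86-64 with the default rounding mode, and the helper names are hypothetical. `_mm_cvttss_si32` is the intrinsic for cvttss2si, the truncating variant.

```c
#include <xmmintrin.h>  /* SSE: cvtsi2ss, cvtss2si, cvttss2si */
#include <emmintrin.h>  /* SSE2: cvtss2sd, cvtsd2ss */

/* cvtsi2ss xmm, r32: int -> float in the low lane. */
float int_to_float(int v) { return _mm_cvtss_f32(_mm_cvtsi32_ss(_mm_setzero_ps(), v)); }

/* cvtss2si r32, xmm: float -> int using the current MXCSR rounding mode
   (round to nearest, ties to even, by default). */
int float_to_int_round(float f) { return _mm_cvtss_si32(_mm_set_ss(f)); }

/* cvttss2si r32, xmm: float -> int, always truncating toward zero. */
int float_to_int_trunc(float f) { return _mm_cvttss_si32(_mm_set_ss(f)); }

/* cvtss2sd / cvtsd2ss: widen float to double, narrow double to float. */
double float_to_double(float f)
{
    return _mm_cvtsd_f64(_mm_cvtss_sd(_mm_setzero_pd(), _mm_set_ss(f)));
}
float double_to_float(double d)
{
    return _mm_cvtss_f32(_mm_cvtsd_ss(_mm_setzero_ps(), _mm_set_sd(d)));
}
```

For example, float_to_int_round(2.7f) gives 3, while float_to_int_trunc(2.7f) gives 2.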

Example of multiplication with SSE:

```asm
cvtsi2ss xmm15, eax       ; convert 32-bit integer in eax to float in xmm15
movss    xmm14, [fltMul]  ; move float value from memory [fltMul] to xmm14
mulss    xmm15, xmm14     ; xmm15 = xmm15 * xmm14
cvtss2si eax, xmm15       ; convert float in xmm15 to 32-bit integer, store in eax
```

Example of multiplication with AVX:

```asm
vcvtsi2ss xmm15, xmm15, eax    ; convert 32-bit integer in eax to float in xmm15
vmovss    xmm14, [fltMul]      ; move float value from memory [fltMul] to xmm14
vmulss    xmm13, xmm15, xmm14  ; xmm13 = xmm15 * xmm14
vcvtss2si eax, xmm13           ; convert float in xmm13 to 32-bit integer, store in eax
```
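The C code below models the SSE multiplication sequence. It is a sketch that assumes GCC or Clang on x86-64. `scale_int` is a hypothetical name, and the local variables are named after the registers only for readability.

```c
#include <xmmintrin.h>  /* SSE scalar intrinsics */

/* Mirrors:
     cvtsi2ss xmm15, eax
     movss    xmm14, [fltMul]
     mulss    xmm15, xmm14
     cvtss2si eax, xmm15
   fltMul is the multiplier, e.g. 2.5 or 1.33. */
int scale_int(int eax, float fltMul)
{
    __m128 xmm15 = _mm_cvtsi32_ss(_mm_setzero_ps(), eax); /* cvtsi2ss */
    __m128 xmm14 = _mm_set_ss(fltMul);                    /* movss    */
    xmm15 = _mm_mul_ss(xmm15, xmm14);                     /* mulss    */
    return _mm_cvtss_si32(xmm15);                         /* cvtss2si */
}
```

By default, cvtss2si rounds to nearest with ties to even. So scale_int(3, 1.5f) gives 4 (from 4.5), while scale_int(5, 1.5f) gives 8 (from 7.5).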

Remember to save (and later restore) any registers you use. Which ones need saving depends on how the original code uses them around the injection point.

References:

- Preserving XMM Registers
- Preserving XMM Registers to Pre-Allocated Memory
- Preserving Register States in Assembly

**Conditions and Branches with SSE/AVX Instructions**

Sometimes we need to check a value, for example whether it is non-negative (i.e., >= 0). For general-purpose registers (GPRs), we use the `cmp` instruction together with conditional jumps like `ja`, `je`, or `jge` to perform different operations.

In SSE/AVX, we can use `comiss` or `ucomiss` for this task:

- GPRs: `cmp`
- SSE: `comiss`, `ucomiss` (`comisd`, `ucomisd` for doubles)
- AVX: `vcomiss`, `vucomiss` (`vcomisd`, `vucomisd` for doubles)

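The difference between `comiss` and `ucomiss` is how they handle NaN (Not a Number). Both set EFLAGS the same way: when either operand is NaN, the result is "unordered" and ZF, PF and CF are all set to 1. Where they differ is exception signaling. `comiss` signals an invalid-operation exception for any NaN (quiet or signaling), while `ucomiss` signals it only for a signaling NaN. With the default MXCSR settings this exception is masked, so in practice both just set the flags.

| Result | ZF | PF | CF |
|-----------------|----|----|----|
| Unordered (NaN) | 1 | 1 | 1 |
| Greater than | 0 | 0 | 0 |
| Less than | 0 | 0 | 1 |
| Equal | 1 | 0 | 0 |

These flag results are the same for `comiss`, `ucomiss`, `vcomiss` and `vucomiss`.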

Because PF is set only for an unordered result, you can detect NaN with `jp` right after the comparison.

I do not check for NaN in every case, and whether I use `comiss` or `ucomiss` depends on the specific requirements. If a NaN slips through unchecked, the program may crash or behave incorrectly.
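You can check the NaN behavior from C using the intrinsics for comiss and ucomiss. This is a sketch that assumes GCC or Clang on x86-64, and the helper names are hypothetical. A "greater than" test is false for NaN with either instruction. A value compared with itself is unordered only when it is NaN, which is what `ucomiss xmm0, xmm0` followed by `jp` detects. The C expression `a != a` is typically compiled to exactly that kind of unordered compare.

```c
#include <xmmintrin.h>  /* SSE: comiss / ucomiss intrinsics */

/* comiss a, b + "ja": taken only when a > b and neither value is NaN
   (ja needs CF=0 and ZF=0, but an unordered result sets ZF=PF=CF=1). */
int greater_comi(float a, float b)  { return _mm_comigt_ss(_mm_set_ss(a), _mm_set_ss(b)); }

/* Same test with ucomiss: identical flags, only exception signaling differs. */
int greater_ucomi(float a, float b) { return _mm_ucomigt_ss(_mm_set_ss(a), _mm_set_ss(b)); }

/* Self-comparison is unordered only for NaN ("ucomiss xmm0, xmm0 / jp is_nan"). */
int is_nan(float a) { return a != a; }
```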

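After executing `comiss` or `ucomiss`, we can use conditional jumps like `ja`, `jb`, `je`, `jae`, `jbe`, etc. to perform conditional branching. Use these "unsigned" jumps rather than `jg`, `jl`, `jge` or `jle`. `comiss` sets only ZF, PF and CF and clears OF and SF, so the signed jumps do not give the expected results.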

Example:

Return a multiplied value only if the result is > 0.

```asm
cvtsi2ss xmm15, eax       ; Convert 32-bit integer in eax to float in xmm15.
movss    xmm14, [fltMul]  ; Move float value from memory [fltMul] to xmm14.
mulss    xmm15, xmm14     ; xmm15 = xmm15 * xmm14.
subss    xmm15, [decVal]  ; xmm15 = xmm15 - [decVal].
xorps    xmm14, xmm14     ; Clear xmm14 to zero.
comiss   xmm15, xmm14     ; Compare xmm15 with 0.0.
jbe      endp             ; Jump if xmm15 <= 0 (or NaN): skip conversion if result not > 0.
cvtss2si eax, xmm15       ; Convert float in xmm15 to 32-bit integer, store in eax.
endp:
```

Or using AVX instructions:

```asm
vcvtsi2ss xmm15, xmm15, eax       ; Convert 32-bit integer in eax to float in xmm15.
vmovss    xmm14, [fltMul]         ; Move float value from memory [fltMul] to xmm14.
vmulss    xmm13, xmm15, xmm14     ; xmm13 = xmm15 * xmm14.
vsubss    xmm13, xmm13, [decVal]  ; xmm13 = xmm13 - [decVal].
vxorps    xmm14, xmm14, xmm14     ; Clear xmm14 to zero.
vcomiss   xmm13, xmm14            ; Compare the result (xmm13) with 0.0.
jbe       endp                    ; Jump if xmm13 <= 0 (or NaN): skip conversion if result not > 0.
vcvtss2si eax, xmm13              ; Convert float in xmm13 to 32-bit integer, store in eax.
endp:
```
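Here is the SSE branch example written in C, as a sketch for checking the logic. It assumes GCC or Clang on x86-64, and `scale_if_positive` is a hypothetical name. `_mm_comigt_ss` returns 1 only for an ordered "greater than". A result that is zero, negative, or NaN therefore takes the `jbe endp` path and leaves eax unchanged.

```c
#include <xmmintrin.h>  /* SSE scalar intrinsics */

/* eax = eax * fltMul - decVal, but only when that result is > 0;
   otherwise eax is returned unchanged (the "jbe endp" path). */
int scale_if_positive(int eax, float fltMul, float decVal)
{
    __m128 xmm15 = _mm_cvtsi32_ss(_mm_setzero_ps(), eax); /* cvtsi2ss xmm15, eax       */
    xmm15 = _mm_mul_ss(xmm15, _mm_set_ss(fltMul));        /* mulss xmm15, [fltMul]     */
    xmm15 = _mm_sub_ss(xmm15, _mm_set_ss(decVal));        /* subss xmm15, [decVal]     */
    __m128 xmm14 = _mm_setzero_ps();                      /* xorps xmm14, xmm14        */
    if (!_mm_comigt_ss(xmm15, xmm14))                     /* comiss xmm15, xmm14 / jbe */
        return eax;                                       /* endp: eax untouched       */
    return _mm_cvtss_si32(xmm15);                         /* cvtss2si eax, xmm15       */
}
```

For example, scale_if_positive(10, 2.0f, 5.0f) gives 15, while scale_if_positive(2, 2.0f, 5.0f) computes -1 and returns the original 2.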

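Note on float-to-integer conversion:

Sometimes the value in the XMM register is outside the signed 32-bit range (greater than 2147483647 or less than -2147483648), or is NaN. In that case cvtss2si cannot represent it and returns the "integer indefinite" value 0x80000000 (-2147483648) in eax, as long as the invalid-operation exception is masked (the default). It does not clamp to 2147483647.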

You may add range checks before the conversion, but I have chosen not to. Users must accept the risk themselves.
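If you do want the check, one option is to range-test the float before converting. In AA, that would mean a comiss against a stored limit before cvtss2si. The C sketch below assumes GCC or Clang on x86-64. The function names and the NaN-to-0 policy are hypothetical choices. It shows both the raw cvtss2si result for an out-of-range value and a clamped variant. The limit 2147483648.0f is 2^31, the first float above 2147483647, which itself cannot be represented exactly as a float.

```c
#include <limits.h>     /* INT_MAX, INT_MIN */
#include <math.h>       /* isnan, NAN */
#include <xmmintrin.h>  /* SSE: cvtss2si */

/* Plain cvtss2si: out-of-range values and NaN give the "integer indefinite"
   value 0x80000000, i.e. INT_MIN. */
int cvt_raw(float f) { return _mm_cvtss_si32(_mm_set_ss(f)); }

/* Clamped variant: range-check first, then convert.
   Returning 0 for NaN is an arbitrary, hypothetical policy. */
int cvt_saturate(float f)
{
    if (isnan(f))            return 0;
    if (f >= 2147483648.0f)  return INT_MAX;  /* 2^31 and above */
    if (f < -2147483648.0f)  return INT_MIN;  /* below -2^31    */
    return _mm_cvtss_si32(_mm_set_ss(f));     /* safe: in range */
}
```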