The OpenD Programming Language

inteli.xmmintrin

Public Imports

inteli.types
public import inteli.types;
Undocumented in source.

Members

Aliases

_mm_cvt_pi2ps
alias _mm_cvt_pi2ps = _mm_cvtpi32_ps

Convert packed signed 32-bit integers in b to packed single-precision (32-bit) floating-point elements, store the results in the lower 2 elements, and copy the upper 2 packed elements from a to the upper elements of result.

_mm_load1_ps
alias _mm_load1_ps = _mm_load_ps1

Load a single-precision (32-bit) floating-point element from memory into all elements.

Functions

_MM_GET_EXCEPTION_MASK
uint _MM_GET_EXCEPTION_MASK()

Get the exception mask bits from the MXCSR control and status register. The exception mask may contain any of the following flags: _MM_MASK_INVALID, _MM_MASK_DIV_ZERO, _MM_MASK_DENORM, _MM_MASK_OVERFLOW, _MM_MASK_UNDERFLOW, _MM_MASK_INEXACT. Note: won't correspond to reality on non-x86, where MXCSR is emulated.

_MM_GET_EXCEPTION_STATE
uint _MM_GET_EXCEPTION_STATE()

Get the exception state bits from the MXCSR control and status register. The exception state may contain any of the following flags: _MM_EXCEPT_INVALID, _MM_EXCEPT_DIV_ZERO, _MM_EXCEPT_DENORM, _MM_EXCEPT_OVERFLOW, _MM_EXCEPT_UNDERFLOW, _MM_EXCEPT_INEXACT. Note: won't correspond to reality on non-x86, where MXCSR is emulated; no exception is reported there.

_MM_GET_FLUSH_ZERO_MODE
uint _MM_GET_FLUSH_ZERO_MODE()

Get the flush zero bits from the MXCSR control and status register. The flush zero mode is either _MM_FLUSH_ZERO_ON or _MM_FLUSH_ZERO_OFF.

_MM_GET_ROUNDING_MODE
uint _MM_GET_ROUNDING_MODE()

Get the rounding mode bits from the MXCSR control and status register. The rounding mode is one of the following: _MM_ROUND_NEAREST, _MM_ROUND_DOWN, _MM_ROUND_UP, _MM_ROUND_TOWARD_ZERO.

_MM_SET_EXCEPTION_MASK
void _MM_SET_EXCEPTION_MASK(int _MM_MASK_xxxx)

Set the exception mask bits of the MXCSR control and status register to the value in unsigned 32-bit integer _MM_MASK_xxxx. The exception mask may contain any of the following flags: _MM_MASK_INVALID, _MM_MASK_DIV_ZERO, _MM_MASK_DENORM, _MM_MASK_OVERFLOW, _MM_MASK_UNDERFLOW, _MM_MASK_INEXACT.

_MM_SET_EXCEPTION_STATE
void _MM_SET_EXCEPTION_STATE(int _MM_EXCEPT_xxxx)

Set the exception state bits of the MXCSR control and status register to the value in unsigned 32-bit integer _MM_EXCEPT_xxxx. The exception state may contain any of the following flags: _MM_EXCEPT_INVALID, _MM_EXCEPT_DIV_ZERO, _MM_EXCEPT_DENORM, _MM_EXCEPT_OVERFLOW, _MM_EXCEPT_UNDERFLOW, _MM_EXCEPT_INEXACT.

_MM_SET_FLUSH_ZERO_MODE
void _MM_SET_FLUSH_ZERO_MODE(int _MM_FLUSH_xxxx)

Set the flush zero bits of the MXCSR control and status register to the value in unsigned 32-bit integer _MM_FLUSH_xxxx. The flush zero mode is either _MM_FLUSH_ZERO_ON or _MM_FLUSH_ZERO_OFF.

_MM_SET_ROUNDING_MODE
void _MM_SET_ROUNDING_MODE(int _MM_ROUND_xxxx)

Set the rounding mode bits of the MXCSR control and status register to the value in unsigned 32-bit integer _MM_ROUND_xxxx. The rounding mode may contain any of the following flags: _MM_ROUND_NEAREST, _MM_ROUND_DOWN, _MM_ROUND_UP, _MM_ROUND_TOWARD_ZERO.
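Because the rounding mode lives in MXCSR and stays in effect after the call, code that changes it usually restores the previous value. The following is a minimal sketch of that save/set/restore pattern, assuming the intel-intrinsics package is available as a dependency; the helper name withRoundingMode is hypothetical and not part of this module.

```d
import inteli.xmmintrin;

/// Hypothetical helper (not part of inteli.xmmintrin): runs `work` under the
/// given rounding mode, then restores the mode that was active before.
void withRoundingMode(int mode, scope void delegate() work)
{
    uint saved = _MM_GET_ROUNDING_MODE();      // remember the caller's mode
    _MM_SET_ROUNDING_MODE(mode);
    scope(exit) _MM_SET_ROUNDING_MODE(saved);  // runs even if `work` throws
    work();
}

unittest
{
    uint before = _MM_GET_ROUNDING_MODE();

    withRoundingMode(_MM_ROUND_TOWARD_ZERO, {
        // Inside the delegate the requested mode is the one reported back.
        assert(_MM_GET_ROUNDING_MODE() == _MM_ROUND_TOWARD_ZERO);
    });

    // Afterwards the original mode is back in place.
    assert(_MM_GET_ROUNDING_MODE() == before);
}
```

The same pattern should apply to _MM_SET_FLUSH_ZERO_MODE and _MM_SET_EXCEPTION_MASK with their matching getters.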

_MM_TRANSPOSE4_PS
void _MM_TRANSPOSE4_PS(__m128 row0, __m128 row1, __m128 row2, __m128 row3)

Transpose the 4x4 matrix formed by the 4 rows of single-precision (32-bit) floating-point elements in row0, row1, row2, and row3, and store the transposed matrix in these vectors (row0 now contains column 0, etc.).
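As an illustration, the sketch below transposes a small matrix in place. It assumes the four arguments are taken by reference (the transposed rows are written back into the same variables, as the description states) and that the lanes of an __m128 can be read through its .array property.

```d
import inteli.xmmintrin;

unittest
{
    // Rows of the matrix, lane 0 first.
    __m128 r0 = _mm_setr_ps( 1,  2,  3,  4);
    __m128 r1 = _mm_setr_ps( 5,  6,  7,  8);
    __m128 r2 = _mm_setr_ps( 9, 10, 11, 12);
    __m128 r3 = _mm_setr_ps(13, 14, 15, 16);

    // Assumption: the rows are passed by reference and overwritten.
    _MM_TRANSPOSE4_PS(r0, r1, r2, r3);

    // r0 now holds the former column 0, r3 the former column 3.
    float[4] col0 = [1, 5, 9, 13];
    float[4] col3 = [4, 8, 12, 16];
    assert(r0.array == col0);
    assert(r3.array == col3);
}
```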

_mm_add_ps
__m128 _mm_add_ps(__m128 a, __m128 b)

Add packed single-precision (32-bit) floating-point elements in a and b.

_mm_add_ss
__m128 _mm_add_ss(__m128 a, __m128 b)

Add the lower single-precision (32-bit) floating-point element in a and b, store the result in the lower element of result, and copy the upper 3 packed elements from a to the upper elements of result.
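The difference between the packed (_ps) and scalar (_ss) forms is the same for most operations in this module, so a short sketch with addition may help. It assumes lanes are readable through .array; the input values are arbitrary.

```d
import inteli.xmmintrin;

unittest
{
    __m128 a = _mm_setr_ps( 1,  2,  3,  4);
    __m128 b = _mm_setr_ps(10, 20, 30, 40);

    // Packed form: every lane of a is added to the matching lane of b.
    __m128 packed = _mm_add_ps(a, b);
    float[4] expectedPacked = [11, 22, 33, 44];
    assert(packed.array == expectedPacked);

    // Scalar form: only lane 0 is added; lanes 1..3 are copied from a.
    __m128 scalar = _mm_add_ss(a, b);
    float[4] expectedScalar = [11, 2, 3, 4];
    assert(scalar.array == expectedScalar);
}
```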

_mm_and_ps
__m128 _mm_and_ps(__m128 a, __m128 b)

Compute the bitwise AND of packed single-precision (32-bit) floating-point elements in a and b.

_mm_andnot_ps
__m128 _mm_andnot_ps(__m128 a, __m128 b)

Compute the bitwise NOT of packed single-precision (32-bit) floating-point elements in a and then AND with b.
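Note that the NOT applies to the first operand only. A common use is clearing the sign bit to get an absolute value; the sketch below shows this under the assumption that -0.0f has only the sign bit set (true for IEEE 754 single precision). The helper name absPs is hypothetical.

```d
import inteli.xmmintrin;

/// Hypothetical helper: absolute value of each lane, computed by clearing
/// the sign bit with (~signBit) & x.
__m128 absPs(__m128 x)
{
    __m128 signBit = _mm_set1_ps(-0.0f); // only bit 31 is set in each lane
    return _mm_andnot_ps(signBit, x);    // NOT is applied to signBit, not to x
}

unittest
{
    __m128 r = absPs(_mm_setr_ps(-1.5f, 2.0f, -3.0f, 4.0f));
    float[4] expected = [1.5f, 2.0f, 3.0f, 4.0f];
    assert(r.array == expected);
}
```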

_mm_avg_pu16
__m64 _mm_avg_pu16(__m64 a, __m64 b)

Average packed unsigned 16-bit integers in a and b.

_mm_avg_pu8
__m64 _mm_avg_pu8(__m64 a, __m64 b)

Average packed unsigned 8-bit integers in a and b.

_mm_cmpeq_ps
__m128 _mm_cmpeq_ps(__m128 a, __m128 b)

Compare packed single-precision (32-bit) floating-point elements in a and b for equality.

_mm_cmpeq_ss
__m128 _mm_cmpeq_ss(__m128 a, __m128 b)

Compare the lower single-precision (32-bit) floating-point elements in a and b for equality, and copy the upper 3 packed elements from a to the upper elements of result.

_mm_cmpge_ps
__m128 _mm_cmpge_ps(__m128 a, __m128 b)

Compare packed single-precision (32-bit) floating-point elements in a and b for greater-than-or-equal.

_mm_cmpge_ss
__m128 _mm_cmpge_ss(__m128 a, __m128 b)

Compare the lower single-precision (32-bit) floating-point elements in a and b for greater-than-or-equal, and copy the upper 3 packed elements from a to the upper elements of result.

_mm_cmpgt_ps
__m128 _mm_cmpgt_ps(__m128 a, __m128 b)

Compare packed single-precision (32-bit) floating-point elements in a and b for greater-than.

_mm_cmpgt_ss
__m128 _mm_cmpgt_ss(__m128 a, __m128 b)

Compare the lower single-precision (32-bit) floating-point elements in a and b for greater-than, and copy the upper 3 packed elements from a to the upper elements of result.

_mm_cmple_ps
__m128 _mm_cmple_ps(__m128 a, __m128 b)

Compare packed single-precision (32-bit) floating-point elements in a and b for less-than-or-equal.

_mm_cmple_ss
__m128 _mm_cmple_ss(__m128 a, __m128 b)

Compare the lower single-precision (32-bit) floating-point elements in a and b for less-than-or-equal, and copy the upper 3 packed elements from a to the upper elements of result.

_mm_cmplt_ps
__m128 _mm_cmplt_ps(__m128 a, __m128 b)

Compare packed single-precision (32-bit) floating-point elements in a and b for less-than.

_mm_cmplt_ss
__m128 _mm_cmplt_ss(__m128 a, __m128 b)

Compare the lower single-precision (32-bit) floating-point elements in a and b for less-than, and copy the upper 3 packed elements from a to the upper elements of result.

_mm_cmpneq_ps
__m128 _mm_cmpneq_ps(__m128 a, __m128 b)

Compare packed single-precision (32-bit) floating-point elements in a and b for not-equal.

_mm_cmpneq_ss
__m128 _mm_cmpneq_ss(__m128 a, __m128 b)

Compare the lower single-precision (32-bit) floating-point elements in a and b for not-equal, and copy the upper 3 packed elements from a to the upper elements of result.

_mm_cmpnge_ps
__m128 _mm_cmpnge_ps(__m128 a, __m128 b)

Compare packed single-precision (32-bit) floating-point elements in a and b for not-greater-than-or-equal.

_mm_cmpnge_ss
__m128 _mm_cmpnge_ss(__m128 a, __m128 b)

Compare the lower single-precision (32-bit) floating-point elements in a and b for not-greater-than-or-equal, and copy the upper 3 packed elements from a to the upper elements of result.

_mm_cmpngt_ps
__m128 _mm_cmpngt_ps(__m128 a, __m128 b)

Compare packed single-precision (32-bit) floating-point elements in a and b for not-greater-than.

_mm_cmpngt_ss
__m128 _mm_cmpngt_ss(__m128 a, __m128 b)

Compare the lower single-precision (32-bit) floating-point elements in a and b for not-greater-than, and copy the upper 3 packed elements from a to the upper elements of result.

_mm_cmpnle_ps
__m128 _mm_cmpnle_ps(__m128 a, __m128 b)

Compare packed single-precision (32-bit) floating-point elements in a and b for not-less-than-or-equal.

_mm_cmpnle_ss
__m128 _mm_cmpnle_ss(__m128 a, __m128 b)

Compare the lower single-precision (32-bit) floating-point elements in a and b for not-less-than-or-equal, and copy the upper 3 packed elements from a to the upper elements of result.

_mm_cmpnlt_ps
__m128 _mm_cmpnlt_ps(__m128 a, __m128 b)

Compare packed single-precision (32-bit) floating-point elements in a and b for not-less-than.

_mm_cmpnlt_ss
__m128 _mm_cmpnlt_ss(__m128 a, __m128 b)

Compare the lower single-precision (32-bit) floating-point elements in a and b for not-less-than, and copy the upper 3 packed elements from a to the upper elements of result.

_mm_cmpord_ps
__m128 _mm_cmpord_ps(__m128 a, __m128 b)

Compare packed single-precision (32-bit) floating-point elements in a and b to see if neither is NaN.

_mm_cmpord_ss
__m128 _mm_cmpord_ss(__m128 a, __m128 b)

Compare the lower single-precision (32-bit) floating-point elements in a and b to see if neither is NaN, and copy the upper 3 packed elements from a to the upper elements of result.

_mm_cmpunord_ps
__m128 _mm_cmpunord_ps(__m128 a, __m128 b)

Compare packed single-precision (32-bit) floating-point elements in a and b to see if either is NaN.

_mm_cmpunord_ss
__m128 _mm_cmpunord_ss(__m128 a, __m128 b)

Compare the lower single-precision (32-bit) floating-point elements in a and b to see if either is NaN, and copy the upper 3 packed elements from a to the upper elements of result.

_mm_comieq_ss
int _mm_comieq_ss(__m128 a, __m128 b)

Compare the lower single-precision (32-bit) floating-point element in a and b for equality, and return the boolean result (0 or 1).

_mm_comige_ss
int _mm_comige_ss(__m128 a, __m128 b)

Compare the lower single-precision (32-bit) floating-point element in a and b for greater-than-or-equal, and return the boolean result (0 or 1).

_mm_comigt_ss
int _mm_comigt_ss(__m128 a, __m128 b)

Compare the lower single-precision (32-bit) floating-point element in a and b for greater-than, and return the boolean result (0 or 1).

_mm_comile_ss
int _mm_comile_ss(__m128 a, __m128 b)

Compare the lower single-precision (32-bit) floating-point element in a and b for less-than-or-equal, and return the boolean result (0 or 1).

_mm_comilt_ss
int _mm_comilt_ss(__m128 a, __m128 b)

Compare the lower single-precision (32-bit) floating-point element in a and b for less-than, and return the boolean result (0 or 1).

_mm_comineq_ss
int _mm_comineq_ss(__m128 a, __m128 b)

Compare the lower single-precision (32-bit) floating-point element in a and b for not-equal, and return the boolean result (0 or 1).

_mm_cvt_ps2pi
__m64 _mm_cvt_ps2pi(__m128 a)

Convert 2 lower packed single-precision (32-bit) floating-point elements in a to packed 32-bit integers.

_mm_cvt_si2ss
__m128 _mm_cvt_si2ss(__m128 v, int x)

Convert the signed 32-bit integer x to a single-precision (32-bit) floating-point element, store the result in the lower element, and copy the upper 3 packed elements from v to the upper elements of the result.

_mm_cvtpi16_ps
__m128 _mm_cvtpi16_ps(__m64 a)

Convert packed 16-bit integers in a to packed single-precision (32-bit) floating-point elements.

_mm_cvtpi32_ps
__m128 _mm_cvtpi32_ps(__m128 a, __m64 b)

Convert packed signed 32-bit integers in b to packed single-precision (32-bit) floating-point elements, store the results in the lower 2 elements, and copy the upper 2 packed elements from a to the upper elements of result.

_mm_cvtpi32x2_ps
__m128 _mm_cvtpi32x2_ps(__m64 a, __m64 b)

Convert packed signed 32-bit integers in a to packed single-precision (32-bit) floating-point elements, store the results in the lower 2 elements, then convert the packed signed 32-bit integers in b to single-precision (32-bit) floating-point elements, and store the results in the upper 2 elements.

_mm_cvtpi8_ps
__m128 _mm_cvtpi8_ps(__m64 a)

Convert the lower packed 8-bit integers in a to packed single-precision (32-bit) floating-point elements.

_mm_cvtps_pi16
__m64 _mm_cvtps_pi16(__m128 a)

Convert packed single-precision (32-bit) floating-point elements in a to packed 16-bit integers. Note: this intrinsic will generate 0x7FFF, rather than 0x8000, for input values between 0x7FFF and 0x7FFFFFFF.

_mm_cvtps_pi32
__m64 _mm_cvtps_pi32(__m128 a)

Convert packed single-precision (32-bit) floating-point elements in a to packed 32-bit integers.

_mm_cvtps_pi8
__m64 _mm_cvtps_pi8(__m128 a)

Convert packed single-precision (32-bit) floating-point elements in a to packed 8-bit integers, and store the results in lower 4 elements. Note: this intrinsic will generate 0x7F, rather than 0x80, for input values between 0x7F and 0x7FFFFFFF.

_mm_cvtpu16_ps
__m128 _mm_cvtpu16_ps(__m64 a)

Convert packed unsigned 16-bit integers in a to packed single-precision (32-bit) floating-point elements.

_mm_cvtpu8_ps
__m128 _mm_cvtpu8_ps(__m64 a)

Convert the lower packed unsigned 8-bit integers in a to packed single-precision (32-bit) floating-point elements.

_mm_cvtsi32_ss
__m128 _mm_cvtsi32_ss(__m128 v, int x)

Convert the signed 32-bit integer x to a single-precision (32-bit) floating-point element, store the result in the lower element, and copy the upper 3 packed elements from v to the upper elements of result.

_mm_cvtsi64_ss
__m128 _mm_cvtsi64_ss(__m128 v, long x)

Convert the signed 64-bit integer x to a single-precision (32-bit) floating-point element, store the result in the lower element, and copy the upper 3 packed elements from v to the upper elements of result.

_mm_cvtss_f32
float _mm_cvtss_f32(__m128 a)

Take the lower single-precision (32-bit) floating-point element of a.

_mm_cvtss_si32
int _mm_cvtss_si32(__m128 a)

Convert the lower single-precision (32-bit) floating-point element in a to a 32-bit integer.

_mm_cvtss_si64
long _mm_cvtss_si64(__m128 a)

Convert the lower single-precision (32-bit) floating-point element in a to a 64-bit integer.

_mm_cvtt_ps2pi
__m64 _mm_cvtt_ps2pi(__m128 a)

Convert packed single-precision (32-bit) floating-point elements in a to packed 32-bit integers with truncation.

_mm_cvtt_ss2si
int _mm_cvtt_ss2si(__m128 a)

Convert the lower single-precision (32-bit) floating-point element in a to a 32-bit integer with truncation.

_mm_cvttss_si64
long _mm_cvttss_si64(__m128 a)

Convert the lower single-precision (32-bit) floating-point element in a to a 64-bit integer with truncation.
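The "with truncation" variants always round toward zero, whereas the plain conversions such as _mm_cvtss_si32 follow the current MXCSR rounding mode. A short sketch of the truncating behaviour, with arbitrary input values:

```d
import inteli.xmmintrin;

unittest
{
    // Truncation drops the fractional part, for positive and negative inputs.
    assert(_mm_cvtt_ss2si(_mm_set_ss( 2.7f)) ==  2);
    assert(_mm_cvtt_ss2si(_mm_set_ss(-2.7f)) == -2);

    // The 64-bit variant behaves the same way, with a wider result.
    assert(_mm_cvttss_si64(_mm_set_ss(-2.7f)) == -2);

    // Only lane 0 takes part; the upper lanes are ignored.
    assert(_mm_cvtt_ss2si(_mm_setr_ps(9.9f, 100.0f, 200.0f, 300.0f)) == 9);
}
```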

_mm_div_ps
__m128 _mm_div_ps(__m128 a, __m128 b)

Divide packed single-precision (32-bit) floating-point elements in a by packed elements in b.

_mm_div_ss
__m128 _mm_div_ss(__m128 a, __m128 b)

Divide the lower single-precision (32-bit) floating-point element in a by the lower single-precision (32-bit) floating-point element in b, store the result in the lower element of result, and copy the upper 3 packed elements from a to the upper elements of result.

_mm_extract_pi16
int _mm_extract_pi16(__m64 a, int imm8)

Extract a 16-bit unsigned integer from a, selected with imm8. Zero-extended.

_mm_free
void _mm_free(void* mem_addr)

Free aligned memory that was allocated with _mm_malloc or _mm_realloc.

_mm_getcsr
uint _mm_getcsr()

Get the unsigned 32-bit value of the MXCSR control and status register. Note: this is emulated on ARM, because there is no MXCSR register there.

_mm_insert_pi16
__m64 _mm_insert_pi16(__m64 v, int i, int imm8)

Insert a 16-bit integer i into v at the location specified by imm8.

_mm_load_ps
__m128 _mm_load_ps(const(float)* p)

Load 128-bits (composed of 4 packed single-precision (32-bit) floating-point elements) from memory.

_mm_load_ps1
__m128 _mm_load_ps1(const(float)* p)

Load a single-precision (32-bit) floating-point element from memory into all elements.

_mm_load_ss
__m128 _mm_load_ss(const(float)* mem_addr)

Load a single-precision (32-bit) floating-point element from memory into the lower element of result, and zero the upper 3 elements. mem_addr does not need to be aligned on any particular boundary.

_mm_loadh_pi
__m128 _mm_loadh_pi(__m128 a, const(__m64)* mem_addr)

Load 2 single-precision (32-bit) floating-point elements from memory into the upper 2 elements of result, and copy the lower 2 elements from a to result. mem_addr does not need to be aligned on any particular boundary.

_mm_loadl_pi
__m128 _mm_loadl_pi(__m128 a, const(__m64)* mem_addr)

Load 2 single-precision (32-bit) floating-point elements from memory into the lower 2 elements of result, and copy the upper 2 elements from a to result. mem_addr does not need to be aligned on any particular boundary.

_mm_loadr_ps
__m128 _mm_loadr_ps(const(float)* mem_addr)

Load 4 single-precision (32-bit) floating-point elements from memory in reverse order. mem_addr must be aligned on a 16-byte boundary or a general-protection exception may be generated.

_mm_loadu_ps
__m128 _mm_loadu_ps(const(float)* mem_addr)

Load 128-bits (composed of 4 packed single-precision (32-bit) floating-point elements) from memory. mem_addr does not need to be aligned on any particular boundary.

_mm_malloc
void* _mm_malloc(size_t size, size_t alignment)

Allocate size bytes of memory, aligned to the alignment specified in alignment, and return a pointer to the allocated memory. _mm_free should be used to free memory that is allocated with _mm_malloc.
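The aligned load and store functions in this module require 16-byte aligned addresses, which is what _mm_malloc can provide. The following is a minimal sketch of an allocate, use, free cycle; the buffer size and values are arbitrary.

```d
import inteli.xmmintrin;

unittest
{
    enum size_t alignment = 16; // required by _mm_load_ps / _mm_store_ps

    float* p = cast(float*) _mm_malloc(4 * float.sizeof, alignment);
    assert(p !is null);
    scope(exit) _mm_free(p);    // memory from _mm_malloc goes back via _mm_free

    // The returned address honours the requested alignment.
    assert((cast(size_t) p) % alignment == 0);

    // Aligned store followed by an aligned load round-trips the vector.
    _mm_store_ps(p, _mm_setr_ps(1, 2, 3, 4));
    __m128 v = _mm_load_ps(p);
    float[4] expected = [1, 2, 3, 4];
    assert(v.array == expected);
}
```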

_mm_maskmove_si64
void _mm_maskmove_si64(__m64 a, __m64 mask, char* mem_addr)

Conditionally store 8-bit integer elements from a into memory using mask (elements are not stored when the highest bit is not set in the corresponding element) and a non-temporal memory hint.

_mm_max_pi16
__m64 _mm_max_pi16(__m64 a, __m64 b)

Compare packed signed 16-bit integers in a and b, and return packed maximum values.

_mm_max_ps
__m128 _mm_max_ps(__m128 a, __m128 b)

Compare packed single-precision (32-bit) floating-point elements in a and b, and return packed maximum values.

_mm_max_pu8
__m64 _mm_max_pu8(__m64 a, __m64 b)

Compare packed unsigned 8-bit integers in a and b, and return packed maximum values.

_mm_max_ss
__m128 _mm_max_ss(__m128 a, __m128 b)

Compare the lower single-precision (32-bit) floating-point elements in a and b, store the maximum value in the lower element of result, and copy the upper 3 packed elements from a to the upper elements of result.

_mm_min_pi16
__m64 _mm_min_pi16(__m64 a, __m64 b)

Compare packed signed 16-bit integers in a and b, and return packed minimum values.

_mm_min_ps
__m128 _mm_min_ps(__m128 a, __m128 b)

Compare packed single-precision (32-bit) floating-point elements in a and b, and return packed minimum values.

_mm_min_pu8
__m64 _mm_min_pu8(__m64 a, __m64 b)

Compare packed unsigned 8-bit integers in a and b, and return packed minimum values.

_mm_min_ss
__m128 _mm_min_ss(__m128 a, __m128 b)

Compare the lower single-precision (32-bit) floating-point elements in a and b, store the minimum value in the lower element of result, and copy the upper 3 packed elements from a to the upper elements of result.

_mm_move_ss
__m128 _mm_move_ss(__m128 a, __m128 b)

Move the lower single-precision (32-bit) floating-point element from b to the lower element of result, and copy the upper 3 packed elements from a to the upper elements of result.

_mm_movehl_ps
__m128 _mm_movehl_ps(__m128 a, __m128 b)

Move the upper 2 single-precision (32-bit) floating-point elements from b to the lower 2 elements of result, and copy the upper 2 elements from a to the upper 2 elements of result.

_mm_movelh_ps
__m128 _mm_movelh_ps(__m128 a, __m128 b)

Move the lower 2 single-precision (32-bit) floating-point elements from b to the upper 2 elements of result, and copy the lower 2 elements from a to the lower 2 elements of result.
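The two half-moves are easy to confuse, so here is a small sketch of which lanes end up where. The input values are arbitrary and lanes are read through .array.

```d
import inteli.xmmintrin;

unittest
{
    __m128 a = _mm_setr_ps(1, 2, 3, 4);
    __m128 b = _mm_setr_ps(5, 6, 7, 8);

    // movehl: high half of b goes to the low half, high half of a is kept.
    __m128 hl = _mm_movehl_ps(a, b);
    float[4] expectedHl = [7, 8, 3, 4];
    assert(hl.array == expectedHl);

    // movelh: low half of a is kept, low half of b goes to the high half.
    __m128 lh = _mm_movelh_ps(a, b);
    float[4] expectedLh = [1, 2, 5, 6];
    assert(lh.array == expectedLh);
}
```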

_mm_movemask_pi8
int _mm_movemask_pi8(__m64 a)

Create mask from the most significant bit of each 8-bit element in a.

_mm_movemask_ps
int _mm_movemask_ps(__m128 a)

Set each bit of result based on the most significant bit of the corresponding packed single-precision (32-bit) floating-point element in a.
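_mm_movemask_ps is typically paired with the packed comparisons, whose true lanes have every bit set (including the most significant one). The sketch below turns a comparison into an integer bit mask; the helper name lanesBelow is hypothetical.

```d
import inteli.xmmintrin;

/// Hypothetical helper: returns a 4-bit mask in which bit i is set when
/// lane i of v is strictly less than threshold.
int lanesBelow(__m128 v, float threshold)
{
    __m128 mask = _mm_cmplt_ps(v, _mm_set1_ps(threshold));
    return _mm_movemask_ps(mask); // collects the top bit of each lane
}

unittest
{
    // Lanes 0 and 2 (values 1 and 3) are below 4, lanes 1 and 3 are not.
    assert(lanesBelow(_mm_setr_ps(1, 5, 3, 7), 4) == 0b0101);

    // No lane below the threshold gives an empty mask.
    assert(lanesBelow(_mm_setr_ps(1, 5, 3, 7), 0) == 0);
}
```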

_mm_mul_ps
__m128 _mm_mul_ps(__m128 a, __m128 b)

Multiply packed single-precision (32-bit) floating-point elements in a and b.

_mm_mul_ss
__m128 _mm_mul_ss(__m128 a, __m128 b)

Multiply the lower single-precision (32-bit) floating-point element in a and b, store the result in the lower element of result, and copy the upper 3 packed elements from a to the upper elements of result.

_mm_mulhi_pu16
__m64 _mm_mulhi_pu16(__m64 a, __m64 b)

Multiply the packed unsigned 16-bit integers in a and b, producing intermediate 32-bit integers, and return the high 16 bits of the intermediate integers.

_mm_or_ps
__m128 _mm_or_ps(__m128 a, __m128 b)

Compute the bitwise OR of packed single-precision (32-bit) floating-point elements in a and b, and return the result.

_mm_prefetch
void _mm_prefetch(const(void)* p)

Fetch the line of data from memory that contains address p to a location in the cache hierarchy specified by a locality hint (see the _MM_HINT_xxx constants).

_mm_rcp_ps
__m128 _mm_rcp_ps(__m128 a)

Compute the approximate reciprocal of packed single-precision (32-bit) floating-point elements in a, and return the results. The maximum relative error for this approximation is less than 1.5*2^-12.

_mm_rcp_ss
__m128 _mm_rcp_ss(__m128 a)

Compute the approximate reciprocal of the lower single-precision (32-bit) floating-point element in a, store it in the lower element of the result, and copy the upper 3 packed elements from a to the upper elements of result. The maximum relative error for this approximation is less than 1.5*2^-12.

_mm_realloc
void* _mm_realloc(void* aligned, size_t size, size_t alignment)

Reallocate size bytes of memory, aligned to the alignment specified in alignment, and return a pointer to the newly allocated memory. Previous data is preserved if any.

_mm_realloc_discard
void* _mm_realloc_discard(void* aligned, size_t size, size_t alignment)

Reallocate size bytes of memory, aligned to the alignment specified in alignment, and return a pointer to the newly allocated memory. Previous data is discarded.

_mm_rsqrt_ps
__m128 _mm_rsqrt_ps(__m128 a)

Compute the approximate reciprocal square root of packed single-precision (32-bit) floating-point elements in a. The maximum relative error for this approximation is less than 1.5*2^-12.

_mm_rsqrt_ss
__m128 _mm_rsqrt_ss(__m128 a)

Compute the approximate reciprocal square root of the lower single-precision (32-bit) floating-point element in a, store the result in the lower element. Copy the upper 3 packed elements from a to the upper elements of result. The maximum relative error for this approximation is less than 1.5*2^-12.

_mm_sad_pu8
__m64 _mm_sad_pu8(__m64 a, __m64 b)

Compute the absolute differences of packed unsigned 8-bit integers in a and b, then horizontally sum the 8 differences to produce one unsigned 16-bit integer, and store it in the low 16 bits of result. The remaining bits of result are zero.

_mm_set1_ps
__m128 _mm_set1_ps(float a)

Broadcast single-precision (32-bit) floating-point value a to all elements.

_mm_set_ps
__m128 _mm_set_ps(float e3, float e2, float e1, float e0)

Set packed single-precision (32-bit) floating-point elements with the supplied values.

_mm_set_ss
__m128 _mm_set_ss(float a)

Copy single-precision (32-bit) floating-point element a to the lower element of result, and zero the upper 3 elements.

_mm_setcsr
void _mm_setcsr(uint controlWord)

Set the MXCSR control and status register with the value in unsigned 32-bit integer controlWord.

_mm_setr_ps
__m128 _mm_setr_ps(float e3, float e2, float e1, float e0)

Set packed single-precision (32-bit) floating-point elements with the supplied values in reverse order.
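The argument order of the two setters is a frequent source of confusion: _mm_set_ps takes the highest lane first, while _mm_setr_ps takes lane 0 first. A short sketch, reading lanes through .array:

```d
import inteli.xmmintrin;

unittest
{
    __m128 a = _mm_set_ps(4, 3, 2, 1);  // last argument lands in lane 0
    __m128 b = _mm_setr_ps(1, 2, 3, 4); // first argument lands in lane 0

    // Both calls build the same vector.
    assert(a.array == b.array);
    assert(a.array[0] == 1 && a.array[3] == 4);

    // The low lane is also what _mm_cvtss_f32 returns.
    assert(_mm_cvtss_f32(b) == 1);
}
```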

_mm_setzero_ps
__m128 _mm_setzero_ps()

Return vector of type __m128 with all elements set to zero.

_mm_sfence
void _mm_sfence()

Do a serializing operation on all store-to-memory instructions that were issued prior to this instruction. Guarantees that every store instruction that precedes, in program order, is globally visible before any store instruction which follows the fence in program order.

_mm_shuffle_ps
__m128 _mm_shuffle_ps(__m128 a, __m128 b)

Shuffle single-precision (32-bit) floating-point elements in a and b using the control in imm8. Warning: the immediate shuffle value imm8 is given at compile-time instead of runtime.
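The listed signature omits how the compile-time control is supplied. The sketch below assumes it is passed as a template argument, written _mm_shuffle_ps!imm8(a, b); treat that call form as an assumption to check against the source. The control is built by hand here, with two bits per result lane.

```d
import inteli.xmmintrin;

unittest
{
    __m128 a = _mm_setr_ps(1, 2, 3, 4);

    // Each 2-bit field of the control selects a source lane:
    //   bits 1..0 -> result[0], taken from a
    //   bits 3..2 -> result[1], taken from a
    //   bits 5..4 -> result[2], taken from b
    //   bits 7..6 -> result[3], taken from b
    enum int reverse = (0 << 6) | (1 << 4) | (2 << 2) | 3;

    // Assumed call form: the control is a template argument.
    __m128 r = _mm_shuffle_ps!reverse(a, a);

    // Using a for both operands with this control reverses the lanes.
    float[4] expected = [4, 3, 2, 1];
    assert(r.array == expected);
}
```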

_mm_sqrt_ps
__m128 _mm_sqrt_ps(__m128 a)

Compute the square root of packed single-precision (32-bit) floating-point elements in a.

_mm_sqrt_ss
__m128 _mm_sqrt_ss(__m128 a)

Compute the square root of the lower single-precision (32-bit) floating-point element in a, store it in the lower element, and copy the upper 3 packed elements from a to the upper elements of result.

_mm_store1_ps
void _mm_store1_ps(float* mem_addr, __m128 a)

Store the lower single-precision (32-bit) floating-point element from a into 4 contiguous elements in memory. mem_addr must be aligned on a 16-byte boundary or a general-protection exception may be generated.

_mm_store_ps
void _mm_store_ps(float* mem_addr, __m128 a)

Store 128-bits (composed of 4 packed single-precision (32-bit) floating-point elements) from a into memory. mem_addr must be aligned on a 16-byte boundary or a general-protection exception may be generated.

_mm_store_ss
void _mm_store_ss(float* mem_addr, __m128 a)

Store the lower single-precision (32-bit) floating-point element from a into memory. mem_addr does not need to be aligned on any particular boundary.

_mm_storeh_pi
void _mm_storeh_pi(__m64* p, __m128 a)

Store the upper 2 single-precision (32-bit) floating-point elements from a into memory.

_mm_storel_pi
void _mm_storel_pi(__m64* p, __m128 a)

Store the lower 2 single-precision (32-bit) floating-point elements from a into memory.

_mm_storer_ps
void _mm_storer_ps(float* mem_addr, __m128 a)

Store 4 single-precision (32-bit) floating-point elements from a into memory in reverse order. mem_addr must be aligned on a 16-byte boundary or a general-protection exception may be generated.

_mm_storeu_ps
void _mm_storeu_ps(float* mem_addr, __m128 a)

Store 128-bits (composed of 4 packed single-precision (32-bit) floating-point elements) from a into memory. mem_addr does not need to be aligned on any particular boundary.

_mm_stream_pi
void _mm_stream_pi(__m64* mem_addr, __m64 a)

Store 64-bits of integer data from a into memory using a non-temporal memory hint. Note: non-temporal stores should be followed by _mm_sfence() for reader threads.

_mm_stream_ps
void _mm_stream_ps(float* mem_addr, __m128 a)

Store 128-bits (composed of 4 packed single-precision (32-bit) floating-point elements) from a into memory using a non-temporal memory hint. mem_addr must be aligned on a 16-byte boundary or a general-protection exception may be generated. Note: non-temporal stores should be followed by _mm_sfence() for reader threads.

_mm_sub_ps
__m128 _mm_sub_ps(__m128 a, __m128 b)

Subtract packed single-precision (32-bit) floating-point elements in b from packed single-precision (32-bit) floating-point elements in a.

_mm_sub_ss
__m128 _mm_sub_ss(__m128 a, __m128 b)

Subtract the lower single-precision (32-bit) floating-point element in b from the lower single-precision (32-bit) floating-point element in a, store the subtraction result in the lower element of result, and copy the upper 3 packed elements from a to the upper elements of result.

_mm_undefined_ps
__m128 _mm_undefined_ps()

Return vector of type __m128 with undefined elements.

_mm_unpackhi_ps
__m128 _mm_unpackhi_ps(__m128 a, __m128 b)

Unpack and interleave single-precision (32-bit) floating-point elements from the high half of a and b.

_mm_unpacklo_ps
__m128 _mm_unpacklo_ps(__m128 a, __m128 b)

Unpack and interleave single-precision (32-bit) floating-point elements from the low half of a and b.
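Interleaving alternates lanes from the two inputs, starting with a. A short sketch of both unpack functions with arbitrary values, reading lanes through .array:

```d
import inteli.xmmintrin;

unittest
{
    __m128 a = _mm_setr_ps(1, 2, 3, 4);
    __m128 b = _mm_setr_ps(5, 6, 7, 8);

    // Low halves interleaved: a0, b0, a1, b1.
    __m128 lo = _mm_unpacklo_ps(a, b);
    float[4] expectedLo = [1, 5, 2, 6];
    assert(lo.array == expectedLo);

    // High halves interleaved: a2, b2, a3, b3.
    __m128 hi = _mm_unpackhi_ps(a, b);
    float[4] expectedHi = [3, 7, 4, 8];
    assert(hi.array == expectedHi);
}
```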

_mm_xor_ps
__m128 _mm_xor_ps(__m128 a, __m128 b)

Compute the bitwise XOR of packed single-precision (32-bit) floating-point elements in a and b.

Manifest constants

_MM_HINT_NTA
enum _MM_HINT_NTA;
_MM_HINT_T0
enum _MM_HINT_T0;
_MM_HINT_T1
enum _MM_HINT_T1;
_MM_HINT_T2
enum _MM_HINT_T2;

Variables

_MM_EXCEPT_DENORM
enum int _MM_EXCEPT_DENORM;
_MM_EXCEPT_DIV_ZERO
enum int _MM_EXCEPT_DIV_ZERO;
_MM_EXCEPT_INEXACT
enum int _MM_EXCEPT_INEXACT;

MXCSR Exception states.

_MM_EXCEPT_INVALID
enum int _MM_EXCEPT_INVALID;

MXCSR Exception states.

_MM_EXCEPT_MASK
enum int _MM_EXCEPT_MASK;

MXCSR Exception states mask.

_MM_EXCEPT_OVERFLOW
enum int _MM_EXCEPT_OVERFLOW;
_MM_EXCEPT_UNDERFLOW
enum int _MM_EXCEPT_UNDERFLOW;

MXCSR Exception states.

_MM_FLUSH_ZERO_MASK
enum int _MM_FLUSH_ZERO_MASK;

MXCSR Denormal flush to zero mask.

_MM_FLUSH_ZERO_OFF
enum int _MM_FLUSH_ZERO_OFF;

MXCSR Denormal flush to zero modes.

_MM_FLUSH_ZERO_ON
enum int _MM_FLUSH_ZERO_ON;

MXCSR Denormal flush to zero modes.

_MM_MASK_DENORM
enum int _MM_MASK_DENORM;
_MM_MASK_DIV_ZERO
enum int _MM_MASK_DIV_ZERO;
_MM_MASK_INEXACT
enum int _MM_MASK_INEXACT;

MXCSR Exception masks.

_MM_MASK_INVALID
enum int _MM_MASK_INVALID;

MXCSR Exception masks.

_MM_MASK_MASK
enum int _MM_MASK_MASK;

MXCSR Exception masks mask.

_MM_MASK_OVERFLOW
enum int _MM_MASK_OVERFLOW;
_MM_MASK_UNDERFLOW
enum int _MM_MASK_UNDERFLOW;

MXCSR Exception masks.

_MM_ROUND_DOWN
enum int _MM_ROUND_DOWN;

MXCSR Rounding mode.

_MM_ROUND_MASK
enum int _MM_ROUND_MASK;

MXCSR Rounding mode mask.

_MM_ROUND_NEAREST
enum int _MM_ROUND_NEAREST;
_MM_ROUND_TOWARD_ZERO
enum int _MM_ROUND_TOWARD_ZERO;
_MM_ROUND_UP
enum int _MM_ROUND_UP;

MXCSR Rounding mode.

Meta