The OpenD Programming Language

inteli.avxintrin

Public Imports

inteli.types
public import inteli.types;
Undocumented in source.
inteli.smmintrin
public import inteli.smmintrin;
Undocumented in source.
inteli.tmmintrin
public import inteli.tmmintrin;
Undocumented in source.
inteli.nmmintrin
public import inteli.nmmintrin;
Undocumented in source.

Members

Aliases

_CMP_EQ
alias _CMP_EQ = int

IMPORTANT NOTE ABOUT MASK LOAD/STORE:

_mm256_set1_epi64
alias _mm256_set1_epi64 = _mm256_set1_epi64x

Broadcast 64-bit integer a to all elements of the return value.

_mm256_set_epi64
alias _mm256_set_epi64 = _mm256_set_epi64x

Set packed 64-bit integers with the supplied values.

_mm256_setr_epi64
alias _mm256_setr_epi64 = _mm256_setr_epi64x

Set packed 64-bit integers with the supplied values in reverse order.

Enums

_CMP_EQ_OQ
anonymousenum _CMP_EQ_OQ

IMPORTANT NOTE ABOUT MASK LOAD/STORE:

Functions

_mm256_add_pd
__m256d _mm256_add_pd(__m256d a, __m256d b)

Add packed double-precision (64-bit) floating-point elements in a and b.

_mm256_add_ps
__m256 _mm256_add_ps(__m256 a, __m256 b)

Add packed single-precision (32-bit) floating-point elements in a and b.

_mm256_addsub_pd
__m256d _mm256_addsub_pd(__m256d a, __m256d b)

Alternately subtract and add packed double-precision (64-bit) floating-point elements in b from/to packed elements in a: even-indexed results are a - b, odd-indexed results are a + b.
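
As an illustration of the lane pattern, here is a minimal sketch. It assumes the Intel semantics this intrinsic mirrors (subtract in even lanes, add in odd lanes). The helper name `addsubLanes` is hypothetical and not part of inteli.

```d
import inteli.avxintrin;

/// Hypothetical helper (not part of inteli): applies _mm256_addsub_pd
/// and returns the four result lanes as a plain array.
double[4] addsubLanes(double[4] a, double[4] b)
{
    // _mm256_setr_pd takes its arguments lowest element first.
    __m256d va = _mm256_setr_pd(a[0], a[1], a[2], a[3]);
    __m256d vb = _mm256_setr_pd(b[0], b[1], b[2], b[3]);
    __m256d r = _mm256_addsub_pd(va, vb);
    return r.array;
}

unittest
{
    double[4] r = addsubLanes([1.0, 2.0, 3.0, 4.0], [10.0, 10.0, 10.0, 10.0]);
    // Even lanes hold a - b, odd lanes hold a + b.
    assert(r == [-9.0, 12.0, -7.0, 14.0]);
}
```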

_mm256_addsub_ps
__m256 _mm256_addsub_ps(__m256 a, __m256 b)

Alternately subtract and add packed single-precision (32-bit) floating-point elements in b from/to packed elements in a: even-indexed results are a - b, odd-indexed results are a + b.

_mm256_and_pd
__m256d _mm256_and_pd(__m256d a, __m256d b)

Compute the bitwise AND of packed double-precision (64-bit) floating-point elements in a and b.

_mm256_and_ps
__m256 _mm256_and_ps(__m256 a, __m256 b)

Compute the bitwise AND of packed single-precision (32-bit) floating-point elements in a and b.

_mm256_andnot_pd
__m256d _mm256_andnot_pd(__m256d a, __m256d b)

Compute the bitwise NOT of packed double-precision (64-bit) floating-point elements in a and then AND with b.

_mm256_andnot_ps
__m256 _mm256_andnot_ps(__m256 a, __m256 b)

Compute the bitwise NOT of packed single-precision (32-bit) floating-point elements in a and then AND with b.

_mm256_blend_pd
__m256d _mm256_blend_pd(__m256d a, __m256d b)

Blend packed double-precision (64-bit) floating-point elements from a and b using control mask imm8.

_mm256_blend_ps
__m256 _mm256_blend_ps(__m256 a, __m256 b)

Blend packed single-precision (32-bit) floating-point elements from a and b using control mask imm8.

_mm256_blendv_pd
__m256d _mm256_blendv_pd(__m256d a, __m256d b, __m256d mask)

Blend packed double-precision (64-bit) floating-point elements from a and b using mask.
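
The description does not say how mask is interpreted. The sketch below assumes the Intel semantics this intrinsic mirrors: only the sign bit of each mask element matters, and a set sign bit selects the element from b. The helper name `blendvLanes` is hypothetical.

```d
import inteli.avxintrin;

/// Hypothetical helper (not part of inteli): blends a and b per lane
/// and returns the result as a plain array.
double[4] blendvLanes(double[4] a, double[4] b, double[4] mask)
{
    __m256d va = _mm256_setr_pd(a[0], a[1], a[2], a[3]);
    __m256d vb = _mm256_setr_pd(b[0], b[1], b[2], b[3]);
    __m256d vm = _mm256_setr_pd(mask[0], mask[1], mask[2], mask[3]);
    // A lane comes from b when the sign bit of the mask lane is set,
    // otherwise from a. The magnitude of the mask value is ignored.
    __m256d r = _mm256_blendv_pd(va, vb, vm);
    return r.array;
}

unittest
{
    // -0.0 has its sign bit set, so it selects b just like -1.0 does.
    double[4] r = blendvLanes([1.0, 2.0, 3.0, 4.0],
                              [10.0, 20.0, 30.0, 40.0],
                              [-1.0, 0.0, -0.0, 5.0]);
    assert(r == [10.0, 2.0, 30.0, 4.0]);
}
```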

_mm256_blendv_ps
__m256 _mm256_blendv_ps(__m256 a, __m256 b, __m256 mask)

Blend packed single-precision (32-bit) floating-point elements from a and b using mask.

_mm256_broadcast_pd
__m256d _mm256_broadcast_pd(const(__m128d)* mem_addr)

Broadcast 128 bits from memory (composed of 2 packed double-precision (64-bit) floating-point elements) to all elements. This effectively duplicates the 128-bit vector.

_mm256_broadcast_ps
__m256 _mm256_broadcast_ps(const(__m128)* mem_addr)

Broadcast 128 bits from memory (composed of 4 packed single-precision (32-bit) floating-point elements) to all elements. This effectively duplicates the 128-bit vector.

_mm256_broadcast_sd
__m256d _mm256_broadcast_sd(const(double)* mem_addr)

Broadcast a double-precision (64-bit) floating-point element from memory to all elements.

_mm256_castpd128_pd256
__m256d _mm256_castpd128_pd256(__m128d a)

Cast vector of type __m128d to type __m256d; the upper 128 bits of the result are undefined.

_mm256_castpd256_pd128
__m128d _mm256_castpd256_pd128(__m256d a)

Cast vector of type __m256d to type __m128d; the upper 128 bits of a are lost.

_mm256_castpd_ps
__m256 _mm256_castpd_ps(__m256d a)

Cast vector of type __m256d to type __m256.

_mm256_castpd_si256
__m256i _mm256_castpd_si256(__m256d a)

Cast vector of type __m256d to type __m256i.

_mm256_castps128_ps256
__m256 _mm256_castps128_ps256(__m128 a)

Cast vector of type __m128 to type __m256; the upper 128 bits of the result are undefined.

_mm256_castps256_ps128
__m128 _mm256_castps256_ps128(__m256 a)

Cast vector of type __m256 to type __m128; the upper 128 bits of a are lost.

_mm256_castps_pd
__m256d _mm256_castps_pd(__m256 a)

Cast vector of type __m256 to type __m256d.

_mm256_castps_si256
__m256i _mm256_castps_si256(__m256 a)

Cast vector of type __m256 to type __m256i.

_mm256_castsi128_si256
__m256i _mm256_castsi128_si256(__m128i a)

Cast vector of type __m128i to type __m256i; the upper 128 bits of the result are undefined.

_mm256_castsi256_pd
__m256d _mm256_castsi256_pd(__m256i a)

Cast vector of type __m256i to type __m256d.

_mm256_castsi256_ps
__m256 _mm256_castsi256_ps(__m256i a)

Cast vector of type __m256i to type __m256.

_mm256_castsi256_si128
__m128i _mm256_castsi256_si128(__m256i a)

Cast vector of type __m256i to type __m128i; the upper 128 bits of a are lost.

_mm256_ceil_pd
__m256d _mm256_ceil_pd(__m256d a)

Round the packed double-precision (64-bit) floating-point elements in a up to an integer value, and store the results as packed double-precision floating-point elements.

_mm256_ceil_ps
__m256 _mm256_ceil_ps(__m256 a)

Round the packed single-precision (32-bit) floating-point elements in a up to an integer value, and store the results as packed single-precision floating-point elements.

_mm256_cmp_pd
__m256d _mm256_cmp_pd(__m256d a, __m256d b)

Compare packed double-precision (64-bit) floating-point elements in a and b based on the comparison operand specified by imm8.

_mm256_cmp_ps
__m256 _mm256_cmp_ps(__m256 a, __m256 b)

Compare packed single-precision (32-bit) floating-point elements in a and b based on the comparison operand specified by imm8.

_mm256_cvtepi32_pd
__m256d _mm256_cvtepi32_pd(__m128i a)

Convert packed signed 32-bit integers in a to packed double-precision (64-bit) floating-point elements.

_mm256_cvtepi32_ps
__m256 _mm256_cvtepi32_ps(__m256i a)

Convert packed signed 32-bit integers in a to packed single-precision (32-bit) floating-point elements.

_mm256_cvtpd_epi32
__m128i _mm256_cvtpd_epi32(__m256d a)

Convert packed double-precision (64-bit) floating-point elements in a to packed 32-bit integers. Follows the current rounding mode.

_mm256_cvtpd_ps
__m128 _mm256_cvtpd_ps(__m256d a)

Convert packed double-precision (64-bit) floating-point elements in a to packed single-precision (32-bit) floating-point elements.

_mm256_cvtps_epi32
__m256i _mm256_cvtps_epi32(__m256 a)

Convert packed single-precision (32-bit) floating-point elements in a to packed 32-bit integers, using the current rounding mode.

_mm256_cvtps_pd
__m256d _mm256_cvtps_pd(__m128 a)

Convert packed single-precision (32-bit) floating-point elements in a to packed double-precision (64-bit) floating-point elements.

_mm256_cvtsd_f64
double _mm256_cvtsd_f64(__m256d a)

Return the lower double-precision (64-bit) floating-point element of a.

_mm256_cvtsi256_si32
int _mm256_cvtsi256_si32(__m256i a)

Return the lower 32-bit integer in a.

_mm256_cvtss_f32
float _mm256_cvtss_f32(__m256 a)

Return the lower single-precision (32-bit) floating-point element of a.

_mm256_cvttpd_epi32
__m128i _mm256_cvttpd_epi32(__m256d a)

Convert packed double-precision (64-bit) floating-point elements in a to packed 32-bit integers with truncation.

_mm256_cvttps_epi32
__m256i _mm256_cvttps_epi32(__m256 a)

Convert packed single-precision (32-bit) floating-point elements in a to packed 32-bit integers with truncation.

_mm256_div_pd
__m256d _mm256_div_pd(__m256d a, __m256d b)

Divide packed double-precision (64-bit) floating-point elements in a by packed elements in b.

_mm256_div_ps
__m256 _mm256_div_ps(__m256 a, __m256 b)

Divide packed single-precision (32-bit) floating-point elements in a by packed elements in b.

_mm256_dp_ps
__m256 _mm256_dp_ps(__m256 a, __m256 b)

Conditionally multiply the packed single-precision (32-bit) floating-point elements in a and b using the high 4 bits in imm8, sum the four products, and conditionally store the sum using the low 4 bits of imm8.

_mm256_extract_epi32
int _mm256_extract_epi32(__m256i a, int imm8)

Extract a 32-bit integer from a, selected with imm8.

_mm256_extract_epi64
long _mm256_extract_epi64(__m256i a, int index)

Extract a 64-bit integer from a, selected with index.

_mm256_extractf128_pd
__m128d _mm256_extractf128_pd(__m256d a)
_mm256_extractf128_ps
__m128 _mm256_extractf128_ps(__m256 a)
_mm256_extractf128_si256
__m128i _mm256_extractf128_si256(__m256i a)

Extract a 128-bit lane from a, selected with index (0 or 1). Note: _mm256_extractf128_pd!0 is equivalent to _mm256_castpd256_pd128.
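
The lane selection and the equivalence noted above can be sketched as follows. The helper names `laneOf` and `lowLaneViaCast` are hypothetical. The lane index is passed as a template argument, following the `!0` notation used in the note.

```d
import inteli.avxintrin;

/// Hypothetical helper: returns 128-bit lane `index` (0 or 1) of a.
double[2] laneOf(int index)(double[4] a)
{
    __m256d v = _mm256_setr_pd(a[0], a[1], a[2], a[3]);
    __m128d lane = _mm256_extractf128_pd!index(v);
    return lane.array;
}

/// Hypothetical helper: takes the low lane through the cast intrinsic.
double[2] lowLaneViaCast(double[4] a)
{
    __m256d v = _mm256_setr_pd(a[0], a[1], a[2], a[3]);
    __m128d lane = _mm256_castpd256_pd128(v);
    return lane.array;
}

unittest
{
    double[4] a = [1.0, 2.0, 3.0, 4.0];
    assert(laneOf!0(a) == [1.0, 2.0]);        // low lane
    assert(laneOf!1(a) == [3.0, 4.0]);        // high lane
    assert(lowLaneViaCast(a) == laneOf!0(a)); // same as index 0
}
```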

_mm256_floor_pd
__m256d _mm256_floor_pd(__m256d a)

Round the packed double-precision (64-bit) floating-point elements in a down to an integer value, and store the results as packed double-precision floating-point elements.

_mm256_floor_ps
__m256 _mm256_floor_ps(__m256 a)

Round the packed single-precision (32-bit) floating-point elements in a down to an integer value, and store the results as packed single-precision floating-point elements.

_mm256_hadd_pd
__m256d _mm256_hadd_pd(__m256d a, __m256d b)

Horizontally add adjacent pairs of double-precision (64-bit) floating-point elements in a and b.
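
A short sketch of where each pair sum lands, assuming the Intel semantics this intrinsic mirrors (sums from a and b are interleaved within each 128-bit lane). The helper name `haddLanes` is hypothetical.

```d
import inteli.avxintrin;

/// Hypothetical helper (not part of inteli): returns the lanes of
/// _mm256_hadd_pd(a, b) as a plain array.
double[4] haddLanes(double[4] a, double[4] b)
{
    __m256d va = _mm256_setr_pd(a[0], a[1], a[2], a[3]);
    __m256d vb = _mm256_setr_pd(b[0], b[1], b[2], b[3]);
    // Result layout: [a0+a1, b0+b1, a2+a3, b2+b3]
    __m256d r = _mm256_hadd_pd(va, vb);
    return r.array;
}

unittest
{
    double[4] r = haddLanes([1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 40.0]);
    assert(r == [3.0, 30.0, 7.0, 70.0]);
}
```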

_mm256_hadd_ps
__m256 _mm256_hadd_ps(__m256 a, __m256 b)

Horizontally add adjacent pairs of single-precision (32-bit) floating-point elements in a and b.

_mm256_hsub_pd
__m256d _mm256_hsub_pd(__m256d a, __m256d b)

Horizontally subtract adjacent pairs of double-precision (64-bit) floating-point elements in a and b.

_mm256_insert_epi16
__m256i _mm256_insert_epi16(__m256i a, short i, int index)

Copy a, and insert the 16-bit integer i into the result at the location specified by index & 15.

_mm256_insert_epi32
__m256i _mm256_insert_epi32(__m256i a, int i, int index)

Copy a, and insert the 32-bit integer i into the result at the location specified by index & 7.

_mm256_insert_epi64
__m256i _mm256_insert_epi64(__m256i a, long i, int index)

Copy a, and insert the 64-bit integer i into the result at the location specified by index & 3.

_mm256_insert_epi8
__m256i _mm256_insert_epi8(__m256i a, byte i, int index)

Copy a, and insert the 8-bit integer i into the result at the location specified by index & 31.

_mm256_insertf128_pd
__m256d _mm256_insertf128_pd(__m256d a, __m128d b)

Copy a, then insert 128 bits (composed of 2 packed double-precision (64-bit) floating-point elements) from b at the location specified by imm8.

_mm256_insertf128_ps
__m256 _mm256_insertf128_ps(__m256 a, __m128 b)

Copy a then insert 128 bits (composed of 4 packed single-precision (32-bit) floating-point elements) from b, at the location specified by imm8.

_mm256_insertf128_si256
__m256i _mm256_insertf128_si256(__m256i a, __m128i b)

Copy a, then insert 128 bits from b at the location specified by imm8.

_mm256_lddqu_si256
__m256i _mm256_lddqu_si256(const(__m256i)* mem_addr)

Load 256-bits of integer data from unaligned memory into dst. This intrinsic may run better than _mm256_loadu_si256 when the data crosses a cache line boundary.

_mm256_load_pd
__m256d _mm256_load_pd(const(double)* mem_addr)

Load 256-bits (composed of 4 packed double-precision (64-bit) floating-point elements) from memory. mem_addr must be aligned on a 32-byte boundary or a general-protection exception may be generated.

_mm256_load_ps
__m256 _mm256_load_ps(const(float)* mem_addr)

Load 256-bits (composed of 8 packed single-precision (32-bit) floating-point elements) from memory. mem_addr must be aligned on a 32-byte boundary or a general-protection exception may be generated.

_mm256_load_si256
__m256i _mm256_load_si256(const(void)* mem_addr)

Load 256-bits of integer data from memory. mem_addr must be aligned on a 32-byte boundary or a general-protection exception may be generated.

_mm256_loadu2_m128
__m256 _mm256_loadu2_m128(const(float)* hiaddr, const(float)* loaddr)

Load two 128-bit values (composed of 4 packed single-precision (32-bit) floating-point elements) from memory, and combine them into a 256-bit value. hiaddr and loaddr do not need to be aligned on any particular boundary.

_mm256_loadu2_m128d
__m256d _mm256_loadu2_m128d(const(double)* hiaddr, const(double)* loaddr)

Load two 128-bit values (composed of 2 packed double-precision (64-bit) floating-point elements) from memory, and combine them into a 256-bit value. hiaddr and loaddr do not need to be aligned on any particular boundary.

_mm256_loadu2_m128i
__m256i _mm256_loadu2_m128i(const(__m128i)* hiaddr, const(__m128i)* loaddr)

Load two 128-bit values (composed of integer data) from memory, and combine them into a 256-bit value. hiaddr and loaddr do not need to be aligned on any particular boundary.

_mm256_loadu_pd
__m256d _mm256_loadu_pd(const(void)* mem_addr)

Load 256-bits (composed of 4 packed double-precision (64-bit) floating-point elements) from memory. mem_addr does not need to be aligned on any particular boundary.

_mm256_loadu_ps
__m256 _mm256_loadu_ps(const(float)* mem_addr)

Load 256-bits (composed of 8 packed single-precision (32-bit) floating-point elements) from memory. mem_addr does not need to be aligned on any particular boundary.

_mm256_loadu_si256
__m256i _mm256_loadu_si256(const(__m256i)* mem_addr)

Load 256-bits of integer data from memory. mem_addr does not need to be aligned on any particular boundary.

_mm256_maskload_pd
__m256d _mm256_maskload_pd(const(double)* mem_addr, __m256i mask)

Load packed double-precision (64-bit) floating-point elements from memory using mask (elements are zeroed out when the high bit of the corresponding element is not set). See: "Note about mask load/store" to know why you must address valid memory only.
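
A minimal sketch of a masked load. The helper name `maskedLoad` is hypothetical. All four source elements live in a valid array, so the whole 256-bit span is addressable even for lanes the mask leaves out.

```d
import inteli.avxintrin;

/// Hypothetical helper (not part of inteli): loads the lanes of `data`
/// whose mask element has its high bit set, and zeroes the others.
double[4] maskedLoad(double[4] data, long[4] mask)
{
    __m256i vm = _mm256_setr_epi64x(mask[0], mask[1], mask[2], mask[3]);
    // data is a full 4-element array, so every lane points at valid memory.
    __m256d r = _mm256_maskload_pd(data.ptr, vm);
    return r.array;
}

unittest
{
    // -1 has the high bit set (load), 0 and 1 do not (zeroed).
    double[4] r = maskedLoad([1.0, 2.0, 3.0, 4.0], [-1, 0, -1, 1]);
    assert(r == [1.0, 0.0, 3.0, 0.0]);
}
```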

_mm256_maskload_ps
__m256 _mm256_maskload_ps(const(float)* mem_addr, __m256i mask)

Load packed single-precision (32-bit) floating-point elements from memory using mask (elements are zeroed out when the high bit of the corresponding element is not set). Note: emulating that instruction isn't efficient, since it needs to perform memory access only when needed. See: "Note about mask load/store" to know why you must address valid memory only.

_mm256_maskstore_pd
void _mm256_maskstore_pd(double* mem_addr, __m256i mask, __m256d a)

Store packed double-precision (64-bit) floating-point elements from a into memory using mask. See: "Note about mask load/store" to know why you must address valid memory only.

_mm256_maskstore_ps
void _mm256_maskstore_ps(float* mem_addr, __m256i mask, __m256 a)

Store packed single-precision (32-bit) floating-point elements from a into memory using mask. See: "Note about mask load/store" to know why you must address valid memory only.

_mm256_max_pd
__m256d _mm256_max_pd(__m256d a, __m256d b)

Compare packed double-precision (64-bit) floating-point elements in a and b, and return packed maximum values.

_mm256_max_ps
__m256 _mm256_max_ps(__m256 a, __m256 b)

Compare packed single-precision (32-bit) floating-point elements in a and b, and return packed maximum values.

_mm256_min_pd
__m256d _mm256_min_pd(__m256d a, __m256d b)

Compare packed double-precision (64-bit) floating-point elements in a and b, and return packed minimum values.

_mm256_min_ps
__m256 _mm256_min_ps(__m256 a, __m256 b)

Compare packed single-precision (32-bit) floating-point elements in a and b, and return packed minimum values.

_mm256_movedup_pd
__m256d _mm256_movedup_pd(__m256d a)

Duplicate even-indexed double-precision (64-bit) floating-point elements from a.

_mm256_movehdup_ps
__m256 _mm256_movehdup_ps(__m256 a)

Duplicate odd-indexed single-precision (32-bit) floating-point elements from a.

_mm256_moveldup_ps
__m256 _mm256_moveldup_ps(__m256 a)

Duplicate even-indexed single-precision (32-bit) floating-point elements from a.

_mm256_movemask_pd
int _mm256_movemask_pd(__m256d a)

Set each bit of result mask based on the most significant bit of the corresponding packed double-precision (64-bit) floating-point element in a.
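
For example, bit i of the result reflects the sign bit of element i. The helper name `signMask` below is hypothetical.

```d
import inteli.avxintrin;

/// Hypothetical helper (not part of inteli): returns the 4-bit sign mask
/// of four doubles, with e0 as the lowest element.
int signMask(double e0, double e1, double e2, double e3)
{
    return _mm256_movemask_pd(_mm256_setr_pd(e0, e1, e2, e3));
}

unittest
{
    // Elements 0 and 2 are negative, so bits 0 and 2 are set.
    assert(signMask(-1.0, 2.0, -3.0, 4.0) == 0b0101);
    // Negative zero also has its sign bit set.
    assert(signMask(-0.0, 0.0, 0.0, 0.0) == 0b0001);
}
```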

_mm256_movemask_ps
int _mm256_movemask_ps(__m256 a)

Set each bit of result mask based on the most significant bit of the corresponding packed single-precision (32-bit) floating-point element in a.

_mm256_mul_pd
__m256d _mm256_mul_pd(__m256d a, __m256d b)

Multiply packed double-precision (64-bit) floating-point elements in a and b.

_mm256_mul_ps
__m256 _mm256_mul_ps(__m256 a, __m256 b)

Multiply packed single-precision (32-bit) floating-point elements in a and b.

_mm256_not_si256
__m256i _mm256_not_si256(__m256i a)

Compute the bitwise NOT of 256 bits in a. #BONUS

_mm256_or_pd
__m256d _mm256_or_pd(__m256d a, __m256d b)

Compute the bitwise OR of packed double-precision (64-bit) floating-point elements in a and b.

_mm256_or_ps
__m256 _mm256_or_ps(__m256 a, __m256 b)

Compute the bitwise OR of packed single-precision (32-bit) floating-point elements in a and b.

_mm256_permute2f128_pd
__m256d _mm256_permute2f128_pd(__m256d a, __m256d b)
_mm256_permute2f128_ps
__m256d _mm256_permute2f128_ps(__m256 a, __m256 b)
_mm256_permute2f128_si256
__m256i _mm256_permute2f128_si256(__m256i a, __m256i b)

Shuffle 128-bits (composed of 2 packed double-precision (64-bit) floating-point elements) selected by imm8 from a and b.

_mm256_permute_pd
__m256d _mm256_permute_pd(__m256d a)

Shuffle double-precision (64-bit) floating-point elements in a using the control in imm8.

_mm256_permute_ps
__m256 _mm256_permute_ps(__m256 a)

Shuffle single-precision (32-bit) floating-point elements in a within 128-bit lanes using the control in imm8. The same shuffle is applied in lower and higher 128-bit lane.

_mm256_permutevar_pd
__m256d _mm256_permutevar_pd(__m256d a, __m256i b)

Shuffle double-precision (64-bit) floating-point elements in a using the control in b. Warning: the selector is in bit 1, not bit 0, of each 64-bit element! This is really not intuitive.
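
The bit-1 selector from the warning above can be seen in a short sketch. It assumes the Intel semantics this intrinsic mirrors, where each result element is picked from the same 128-bit lane of a. The helper name `permuteLanes` is hypothetical.

```d
import inteli.avxintrin;

/// Hypothetical helper (not part of inteli): permutes a with one 64-bit
/// control value per element and returns the result as a plain array.
double[4] permuteLanes(double[4] a, long[4] control)
{
    __m256d va = _mm256_setr_pd(a[0], a[1], a[2], a[3]);
    __m256i vc = _mm256_setr_epi64x(control[0], control[1], control[2], control[3]);
    __m256d r = _mm256_permutevar_pd(va, vc);
    return r.array;
}

unittest
{
    double[4] a = [10.0, 11.0, 12.0, 13.0];
    // Control value 2 (bit 1 set) picks the high element of the lane,
    // control value 0 picks the low element.
    assert(permuteLanes(a, [2, 0, 0, 2]) == [11.0, 10.0, 12.0, 13.0]);
    // Control value 1 only sets bit 0, which is ignored: low element.
    assert(permuteLanes(a, [1, 1, 1, 1]) == [10.0, 10.0, 12.0, 12.0]);
}
```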

_mm256_permutevar_ps
__m256 _mm256_permutevar_ps(__m256 a, __m256i b)

Shuffle single-precision (32-bit) floating-point elements in a using the control in b.

_mm256_rcp_ps
__m256 _mm256_rcp_ps(__m256 a)

Compute the approximate reciprocal of packed single-precision (32-bit) floating-point elements in a. The maximum relative error for this approximation is less than 1.5*2^-12.

_mm256_round_pd
__m256d _mm256_round_pd(__m256d a)

Round the packed double-precision (64-bit) floating-point elements in a using the rounding parameter, and store the results as packed double-precision floating-point elements. Rounding is done according to the rounding[3:0] parameter, which can be one of:
(_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC) // round to nearest, and suppress exceptions
(_MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC) // round down, and suppress exceptions
(_MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC) // round up, and suppress exceptions
(_MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC) // truncate, and suppress exceptions
_MM_FROUND_CUR_DIRECTION // use MXCSR.RC; see _MM_SET_ROUNDING_MODE

_mm256_round_ps
__m256 _mm256_round_ps(__m256 a)

Round the packed single-precision (32-bit) floating-point elements in a using the rounding parameter, and store the results as packed single-precision floating-point elements. Rounding is done according to the rounding[3:0] parameter, which can be one of:
(_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC) // round to nearest, and suppress exceptions
(_MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC) // round down, and suppress exceptions
(_MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC) // round up, and suppress exceptions
(_MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC) // truncate, and suppress exceptions
_MM_FROUND_CUR_DIRECTION // use MXCSR.RC; see _MM_SET_ROUNDING_MODE

_mm256_rsqrt_ps
__m256 _mm256_rsqrt_ps(__m256 a)

Compute the approximate reciprocal square root of packed single-precision (32-bit) floating-point elements in a. The maximum relative error for this approximation is less than 1.5*2^-12.

_mm256_set1_epi16
__m256i _mm256_set1_epi16(short a)

Broadcast 16-bit integer a to all elements of the return value.

_mm256_set1_epi32
__m256i _mm256_set1_epi32(int a)

Broadcast 32-bit integer a to all elements.

_mm256_set1_epi64x
__m256i _mm256_set1_epi64x(long a)

Broadcast 64-bit integer a to all elements of the return value.

_mm256_set1_epi8
__m256i _mm256_set1_epi8(byte a)

Broadcast 8-bit integer a to all elements of the return value.

_mm256_set1_pd
__m256d _mm256_set1_pd(double a)

Broadcast double-precision (64-bit) floating-point value a to all elements of the return value.

_mm256_set1_ps
__m256 _mm256_set1_ps(float a)

Broadcast single-precision (32-bit) floating-point value a to all elements of the return value.

_mm256_set_epi16
__m256i _mm256_set_epi16(short e15, short e14, short e13, short e12, short e11, short e10, short e9, short e8, short e7, short e6, short e5, short e4, short e3, short e2, short e1, short e0)

Set packed 16-bit integers with the supplied values.

_mm256_set_epi32
__m256i _mm256_set_epi32(int e7, int e6, int e5, int e4, int e3, int e2, int e1, int e0)

Set packed 32-bit integers with the supplied values.

_mm256_set_epi64x
__m256i _mm256_set_epi64x(long e3, long e2, long e1, long e0)

Set packed 64-bit integers with the supplied values.

_mm256_set_epi8
__m256i _mm256_set_epi8(byte e31, byte e30, byte e29, byte e28, byte e27, byte e26, byte e25, byte e24, byte e23, byte e22, byte e21, byte e20, byte e19, byte e18, byte e17, byte e16, byte e15, byte e14, byte e13, byte e12, byte e11, byte e10, byte e9, byte e8, byte e7, byte e6, byte e5, byte e4, byte e3, byte e2, byte e1, byte e0)

Set packed 8-bit integers with the supplied values.

_mm256_set_m128
__m256 _mm256_set_m128(__m128 hi, __m128 lo)

Set packed __m256 vector with the supplied values.

_mm256_set_m128d
__m256d _mm256_set_m128d(__m128d hi, __m128d lo)

Set packed __m256d vector with the supplied values.

_mm256_set_m128i
__m256i _mm256_set_m128i(__m128i hi, __m128i lo)

Set packed __m256i vector with the supplied values.

_mm256_set_pd
__m256d _mm256_set_pd(double e3, double e2, double e1, double e0)

Set packed double-precision (64-bit) floating-point elements with the supplied values.

_mm256_set_ps
__m256 _mm256_set_ps(float e7, float e6, float e5, float e4, float e3, float e2, float e1, float e0)

Set packed single-precision (32-bit) floating-point elements with the supplied values.

_mm256_setr_epi16
__m256i _mm256_setr_epi16(short e15, short e14, short e13, short e12, short e11, short e10, short e9, short e8, short e7, short e6, short e5, short e4, short e3, short e2, short e1, short e0)

Set packed 16-bit integers with the supplied values in reverse order.

_mm256_setr_epi32
__m256i _mm256_setr_epi32(int e7, int e6, int e5, int e4, int e3, int e2, int e1, int e0)

Set packed 32-bit integers with the supplied values in reverse order.

_mm256_setr_epi64x
__m256i _mm256_setr_epi64x(long e3, long e2, long e1, long e0)

Set packed 64-bit integers with the supplied values in reverse order.

_mm256_setr_epi8
__m256i _mm256_setr_epi8(byte e31, byte e30, byte e29, byte e28, byte e27, byte e26, byte e25, byte e24, byte e23, byte e22, byte e21, byte e20, byte e19, byte e18, byte e17, byte e16, byte e15, byte e14, byte e13, byte e12, byte e11, byte e10, byte e9, byte e8, byte e7, byte e6, byte e5, byte e4, byte e3, byte e2, byte e1, byte e0)

Set packed 8-bit integers with the supplied values in reverse order.

_mm256_setr_m128
__m256 _mm256_setr_m128(__m128 lo, __m128 hi)

Set packed __m256 vector with the supplied values.

_mm256_setr_m128d
__m256d _mm256_setr_m128d(__m128d lo, __m128d hi)

Set packed __m256d vector with the supplied values.

_mm256_setr_m128i
__m256i _mm256_setr_m128i(__m128i lo, __m128i hi)

Set packed __m256i vector with the supplied values.

_mm256_setr_pd
__m256d _mm256_setr_pd(double e3, double e2, double e1, double e0)

Set packed double-precision (64-bit) floating-point elements with the supplied values in reverse order.
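
The difference between the set and setr argument orders can be sketched as below. The helper names `viaSet` and `viaSetr` are hypothetical. With set, the first argument becomes the highest element. With setr, the first argument becomes the lowest element, which matches memory order.

```d
import inteli.avxintrin;

/// Hypothetical helper: builds [0, 1, 2, 3] with _mm256_set_pd,
/// whose arguments run from the highest element down to the lowest.
double[4] viaSet()
{
    __m256d v = _mm256_set_pd(3.0, 2.0, 1.0, 0.0);
    return v.array;
}

/// Hypothetical helper: builds the same vector with _mm256_setr_pd,
/// whose arguments run from the lowest element up to the highest.
double[4] viaSetr()
{
    __m256d v = _mm256_setr_pd(0.0, 1.0, 2.0, 3.0);
    return v.array;
}

unittest
{
    assert(viaSet() == [0.0, 1.0, 2.0, 3.0]);
    assert(viaSetr() == viaSet());
}
```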

_mm256_setr_ps
__m256 _mm256_setr_ps(float e7, float e6, float e5, float e4, float e3, float e2, float e1, float e0)

Set packed single-precision (32-bit) floating-point elements with the supplied values in reverse order.

_mm256_setzero_pd
__m256d _mm256_setzero_pd()

Return vector of type __m256d with all elements set to zero.

_mm256_setzero_ps
__m256 _mm256_setzero_ps()

Return vector of type __m256 with all elements set to zero.

_mm256_setzero_si256
__m256i _mm256_setzero_si256()

Return vector of type __m256i with all elements set to zero.

_mm256_shuffle_pd
__m256d _mm256_shuffle_pd(__m256d a, __m256d b)

Shuffle double-precision (64-bit) floating-point elements within 128-bit lanes using the control in imm8.

_mm256_shuffle_ps
__m256 _mm256_shuffle_ps(__m256 a, __m256 b)

Shuffle single-precision (32-bit) floating-point elements in a within 128-bit lanes using the control in imm8.

_mm256_sqrt_pd
__m256d _mm256_sqrt_pd(__m256d a)

Compute the square root of packed double-precision (64-bit) floating-point elements in a.

_mm256_sqrt_ps
__m256 _mm256_sqrt_ps(__m256 a)

Compute the square root of packed single-precision (32-bit) floating-point elements in a.

_mm256_store_pd
void _mm256_store_pd(double* mem_addr, __m256d a)

Store 256-bits (composed of 4 packed double-precision (64-bit) floating-point elements) from a into memory. mem_addr must be aligned on a 32-byte boundary or a general-protection exception may be generated.

_mm256_store_ps
void _mm256_store_ps(float* mem_addr, __m256 a)

Store 256-bits (composed of 8 packed single-precision (32-bit) floating-point elements) from a into memory. mem_addr must be aligned on a 32-byte boundary or a general-protection exception may be generated.

_mm256_store_si256
void _mm256_store_si256(__m256i* mem_addr, __m256i a)

Store 256-bits of integer data from a into memory. mem_addr must be aligned on a 32-byte boundary or a general-protection exception may be generated.

_mm256_storeu2_m128
void _mm256_storeu2_m128(float* hiaddr, float* loaddr, __m256 a)

Store the high and low 128-bit halves (each composed of 4 packed single-precision (32-bit) floating-point elements) from a into two different 128-bit memory locations. hiaddr and loaddr do not need to be aligned on any particular boundary.

_mm256_storeu2_m128d
void _mm256_storeu2_m128d(double* hiaddr, double* loaddr, __m256d a)

Store the high and low 128-bit halves (each composed of 2 packed double-precision (64-bit) floating-point elements) from a into two different 128-bit memory locations. hiaddr and loaddr do not need to be aligned on any particular boundary.

_mm256_storeu2_m128i
void _mm256_storeu2_m128i(__m128i* hiaddr, __m128i* loaddr, __m256i a)

Store the high and low 128-bit halves (each composed of integer data) from a into two different 128-bit memory locations. hiaddr and loaddr do not need to be aligned on any particular boundary.

_mm256_storeu_pd
void _mm256_storeu_pd(double* mem_addr, __m256d a)

Store 256-bits (composed of 4 packed double-precision (64-bit) floating-point elements) from a into memory. mem_addr does not need to be aligned on any particular boundary.

_mm256_storeu_ps
void _mm256_storeu_ps(float* mem_addr, __m256 a)

Store 256-bits (composed of 8 packed single-precision (32-bit) floating-point elements) from a into memory. mem_addr does not need to be aligned on any particular boundary.
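
A typical use of the unaligned load and store intrinsics is a loop that handles eight floats per iteration and finishes the remainder with scalar code. This is a minimal sketch. The function name `addArrays` is hypothetical, and it allocates its result for simplicity.

```d
import inteli.avxintrin;

/// Hypothetical helper (not part of inteli): element-wise sum of two
/// equally sized float slices, 8 elements at a time.
float[] addArrays(const(float)[] a, const(float)[] b)
{
    assert(a.length == b.length);
    float[] dst = new float[a.length];

    size_t i = 0;
    // Vector part: unaligned loads and stores, 8 floats per step.
    for (; i + 8 <= a.length; i += 8)
    {
        __m256 va = _mm256_loadu_ps(a.ptr + i);
        __m256 vb = _mm256_loadu_ps(b.ptr + i);
        _mm256_storeu_ps(dst.ptr + i, _mm256_add_ps(va, vb));
    }
    // Scalar tail for the remaining 0 to 7 elements.
    for (; i < a.length; ++i)
        dst[i] = a[i] + b[i];

    return dst;
}
```

The scalar tail keeps every load and store inside the slices, so no read or write goes past the end of the arrays.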

_mm256_storeu_si256
void _mm256_storeu_si256(__m256i* mem_addr, __m256i a)

Store 256-bits of integer data from a into memory. mem_addr does not need to be aligned on any particular boundary.

_mm256_stream_pd
void _mm256_stream_pd(double* mem_addr, __m256d a)

Store 256-bits (composed of 4 packed double-precision (64-bit) floating-point elements) from a into memory using a non-temporal memory hint. mem_addr must be aligned on a 32-byte boundary or a general-protection exception may be generated. Note: non-temporal stores should be followed by _mm_sfence() for reader threads.

_mm256_stream_ps
void _mm256_stream_ps(float* mem_addr, __m256 a)

Store 256-bits (composed of 8 packed single-precision (32-bit) floating-point elements) from a into memory using a non-temporal memory hint. mem_addr must be aligned on a 32-byte boundary or a general-protection exception may be generated. Note: non-temporal stores should be followed by _mm_sfence() for reader threads.

_mm256_stream_si256
void _mm256_stream_si256(__m256i* mem_addr, __m256i a)

Store 256-bits of integer data from a into memory using a non-temporal memory hint. mem_addr must be aligned on a 32-byte boundary or a general-protection exception may be generated. Note: there isn't any particular instruction in AVX to do that. It just defers to SSE2. Note: non-temporal stores should be followed by _mm_sfence() for reader threads.

_mm256_sub_pd
__m256d _mm256_sub_pd(__m256d a, __m256d b)

Subtract packed double-precision (64-bit) floating-point elements in b from packed double-precision (64-bit) floating-point elements in a.

_mm256_sub_ps
__m256 _mm256_sub_ps(__m256 a, __m256 b)

Subtract packed single-precision (32-bit) floating-point elements in b from packed single-precision (32-bit) floating-point elements in a.

_mm256_testc_pd
int _mm256_testc_pd(__m256d a, __m256d b)

Compute the bitwise NOT of a and then AND with b, producing an intermediate value, and return 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise return 0.

_mm256_testc_ps
int _mm256_testc_ps(__m256 a, __m256 b)

Compute the bitwise NOT of a and then AND with b, producing an intermediate value, and return 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise return 0.

_mm256_testc_si256
int _mm256_testc_si256(__m256i a, __m256i b)

Compute the bitwise NOT of a and then AND with b, and return 1 if the result is zero, otherwise return 0. In other words, test if all bits masked by b are also 1 in a.
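
The "all bits masked by b are also 1 in a" reading can be sketched with a small wrapper. The name `containsAllBits` is hypothetical, and the same 32-bit pattern is broadcast to every element to keep the example short.

```d
import inteli.avxintrin;

/// Hypothetical helper (not part of inteli): true when every bit set in
/// `b` is also set in `a`, checked across a full 256-bit vector.
bool containsAllBits(int a, int b)
{
    __m256i va = _mm256_set1_epi32(a);
    __m256i vb = _mm256_set1_epi32(b);
    // testc returns 1 when (~a & b) is all zeroes.
    return _mm256_testc_si256(va, vb) == 1;
}

unittest
{
    assert(containsAllBits(0b1110, 0b0110));   // both bits of b are in a
    assert(!containsAllBits(0b1000, 0b0110));  // bits 1 and 2 are missing
}
```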

_mm256_testnzc_pd
int _mm256_testnzc_pd(__m256d a, __m256d b)

Compute the bitwise AND of 256 bits (representing double-precision (64-bit) floating-point elements) in a and b, producing an intermediate 256-bit value, and set ZF to 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise set ZF to 0. Compute the bitwise NOT of a and then AND with b, producing an intermediate value, and set CF to 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise set CF to 0. Return 1 if both the ZF and CF values are zero, otherwise return 0.

_mm256_testnzc_ps
int _mm256_testnzc_ps(__m256 a, __m256 b)

Compute the bitwise AND of 256 bits (representing single-precision (32-bit) floating-point elements) in a and b, producing an intermediate 256-bit value, and set ZF to 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise set ZF to 0. Compute the bitwise NOT of a and then AND with b, producing an intermediate value, and set CF to 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise set CF to 0. Return 1 if both the ZF and CF values are zero, otherwise return 0.

_mm256_testnzc_si256
int _mm256_testnzc_si256(__m256i a, __m256i b)

Compute the bitwise AND of 256 bits (representing integer data) in a and b, and set ZF to 1 if the result is zero, otherwise set ZF to 0. Compute the bitwise NOT of a and then AND with b, and set CF to 1 if the result is zero, otherwise set CF to 0. Return 1 if both the ZF and CF values are zero, otherwise return 0.

_mm256_testz_pd
int _mm256_testz_pd(__m256d a, __m256d b)

Compute the bitwise AND of 256 bits (representing double-precision (64-bit) floating-point elements) in a and b, producing an intermediate 256-bit value, and return 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise return 0. In other words, return 1 if a and b don't both have a negative number at the same position.

_mm256_testz_ps
int _mm256_testz_ps(__m256 a, __m256 b)

Compute the bitwise AND of 256 bits (representing single-precision (32-bit) floating-point elements) in a and b, producing an intermediate 256-bit value, and return 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise return 0. In other words, return 1 if a and b don't both have a negative number at the same position.

_mm256_testz_si256
int _mm256_testz_si256(__m256i a, __m256i b)

Compute the bitwise AND of 256 bits (representing integer data) in a and b, and return 1 if the result is zero, otherwise return 0. In other words, test if all bits masked by b are 0 in a.

_mm256_undefined_pd
__m256d _mm256_undefined_pd()

Return vector of type __m256d with undefined elements.

_mm256_undefined_ps
__m256 _mm256_undefined_ps()

Return vector of type __m256 with undefined elements.

_mm256_undefined_si256
__m256i _mm256_undefined_si256()

Return vector of type __m256i with undefined elements.

_mm256_unpackhi_pd
__m256d _mm256_unpackhi_pd(__m256d a, __m256d b)

Unpack and interleave double-precision (64-bit) floating-point elements from the high half of each 128-bit lane in a and b.

_mm256_unpackhi_ps
__m256 _mm256_unpackhi_ps(__m256 a, __m256 b)

Unpack and interleave single-precision (32-bit) floating-point elements from the high half of each 128-bit lane in a and b.

_mm256_unpacklo_pd
__m256d _mm256_unpacklo_pd(__m256d a, __m256d b)

Unpack and interleave double-precision (64-bit) floating-point elements from the low half of each 128-bit lane in a and b.

_mm256_unpacklo_ps
__m256 _mm256_unpacklo_ps(__m256 a, __m256 b)

Unpack and interleave single-precision (32-bit) floating-point elements from the low half of each 128-bit lane in a and b.
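The per-lane interleaving of the single-precision unpack operations can be sketched with a scalar model in C. The array layout (elements 0 to 3 form the low 128-bit lane, 4 to 7 the high lane) and the function names are assumptions made for illustration only.

```c
/* Scalar model of _mm256_unpacklo_ps: within each 128-bit lane,
   interleave the two low elements of a and b. */
static void model_unpacklo_ps(const float a[8], const float b[8], float r[8])
{
    for (int lane = 0; lane < 2; ++lane) {
        int base = lane * 4;          /* first element of this lane */
        r[base + 0] = a[base + 0];
        r[base + 1] = b[base + 0];
        r[base + 2] = a[base + 1];
        r[base + 3] = b[base + 1];
    }
}

/* Scalar model of _mm256_unpackhi_ps: same pattern, using the two
   high elements of each lane. */
static void model_unpackhi_ps(const float a[8], const float b[8], float r[8])
{
    for (int lane = 0; lane < 2; ++lane) {
        int base = lane * 4;
        r[base + 0] = a[base + 2];
        r[base + 1] = b[base + 2];
        r[base + 2] = a[base + 3];
        r[base + 3] = b[base + 3];
    }
}
```

Note that elements never cross between the two 128-bit lanes in this model.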

_mm256_xor_pd
__m256d _mm256_xor_pd(__m256d a, __m256d b)

Compute the bitwise XOR of packed double-precision (64-bit) floating-point elements in a and b.

_mm256_xor_ps
__m256 _mm256_xor_ps(__m256 a, __m256 b)

Compute the bitwise XOR of packed single-precision (32-bit) floating-point elements in a and b.

_mm256_zextpd128_pd256
__m256d _mm256_zextpd128_pd256(__m128d a)

Cast vector of type __m128d to type __m256d; the upper 128 bits of the result are zeroed.

_mm256_zextps128_ps256
__m256 _mm256_zextps128_ps256(__m128 a)

Cast vector of type __m128 to type __m256; the upper 128 bits of the result are zeroed.

_mm256_zextsi128_si256
__m256i _mm256_zextsi128_si256(__m128i a)

Cast vector of type __m128i to type __m256i; the upper 128 bits of the result are zeroed.

_mm_broadcast_ss
__m128 _mm_broadcast_ss(const(float)* mem_addr)

Broadcast a single-precision (32-bit) floating-point element from memory to all elements.

_mm_cmp_pd
__m128d _mm_cmp_pd(__m128d a, __m128d b)

Compare packed double-precision (64-bit) floating-point elements in a and b based on the comparison operand specified by imm8.

_mm_cmp_ps
__m128 _mm_cmp_ps(__m128 a, __m128 b)

Compare packed single-precision (32-bit) floating-point elements in a and b based on the comparison operand specified by imm8.

_mm_cmp_sd
__m128d _mm_cmp_sd(__m128d a, __m128d b)

Compare the lower double-precision (64-bit) floating-point element in a and b based on the comparison operand specified by imm8, store the result in the lower element of result, and copy the upper element from a to the upper element of result.
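As a rough illustration of the scalar comparison, the following C sketch models the lower-element compare for a handful of predicates. It assumes the usual Intel encoding (0 = EQ_OQ, 1 = LT_OS, 2 = LE_OS, 3 = UNORD_Q) and that a true comparison yields an all-ones 64-bit mask; the `MODEL_` constants and function names are hypothetical, signalling behaviour is ignored, and other predicates are deliberately left unmodelled.

```c
#include <stdint.h>
#include <string.h>

enum {
    MODEL_CMP_EQ_OQ   = 0,
    MODEL_CMP_LT_OS   = 1,
    MODEL_CMP_LE_OS   = 2,
    MODEL_CMP_UNORD_Q = 3
};

/* Returns an all-ones mask when the predicate holds, else zero. */
static uint64_t model_cmp_mask(double x, double y, int imm8)
{
    int holds;
    switch (imm8) {
    case MODEL_CMP_EQ_OQ:   holds = (x == y); break;
    case MODEL_CMP_LT_OS:   holds = (x < y); break;
    case MODEL_CMP_LE_OS:   holds = (x <= y); break;
    case MODEL_CMP_UNORD_Q: holds = (x != x) || (y != y); break; /* NaN check */
    default:                holds = 0; break; /* predicate not modelled here */
    }
    return holds ? UINT64_MAX : 0;
}

/* Scalar model of _mm_cmp_sd: mask in the lower element,
   upper element copied from a. */
static void model_cmp_sd(const double a[2], const double b[2], int imm8,
                         double r[2])
{
    uint64_t m = model_cmp_mask(a[0], b[0], imm8);
    memcpy(&r[0], &m, sizeof m);  /* store mask bits, not a numeric value */
    r[1] = a[1];
}
```

The lower element of the result is a bit mask, so it should be inspected through its bits rather than compared as a number.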

_mm_cmp_ss
__m128 _mm_cmp_ss(__m128 a, __m128 b)

Compare the lower single-precision (32-bit) floating-point element in a and b based on the comparison operand specified by imm8, store the result in the lower element of result, and copy the upper 3 packed elements from a to the upper elements of result.

_mm_maskload_pd
__m128d _mm_maskload_pd(const(double)* mem_addr, __m128i mask)

Load packed double-precision (64-bit) floating-point elements from memory using mask (elements are zeroed out when the high bit of the corresponding element is not set). Note: emulating that instruction isn't efficient, since it needs to perform memory access only when needed. See: "Note about mask load/store" to know why you must address valid memory only.

_mm_maskload_ps
__m128 _mm_maskload_ps(const(float)* mem_addr, __m128i mask)

Load packed single-precision (32-bit) floating-point elements from memory using mask (elements are zeroed out when the high bit of the corresponding element is not set). Warning: See "Note about mask load/store" to know why you must address valid memory only.

_mm_maskstore_pd
void _mm_maskstore_pd(double* mem_addr, __m128i mask, __m128d a)

Store packed double-precision (64-bit) floating-point elements from a into memory using mask. Note: emulating that instruction isn't efficient, since it needs to perform memory access only when needed. See: "Note about mask load/store" to know why you must address valid memory only.

_mm_maskstore_ps
void _mm_maskstore_ps(float* mem_addr, __m128i mask, __m128 a)

Store packed single-precision (32-bit) floating-point elements from a into memory using mask. Note: emulating that instruction isn't efficient, since it needs to perform memory access only when needed. See: "Note about mask load/store" to know why you must address valid memory only.
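The masking rule for the 128-bit single-precision load and store can be sketched as a scalar model in C. The function names are hypothetical and the mask is assumed to be four 32-bit integers whose high bit selects the element. This sketch touches only the selected elements; the library's note about mask load/store still applies, so real callers must pass fully valid memory.

```c
#include <stdint.h>

/* Scalar model of _mm_maskload_ps: an element is loaded when the high
   bit of its mask element is set, otherwise it is zeroed. */
static void model_maskload_ps(const float *mem_addr, const int32_t mask[4],
                              float r[4])
{
    for (int i = 0; i < 4; ++i) {
        /* high bit set is the same as negative for a 32-bit signed int */
        r[i] = (mask[i] < 0) ? mem_addr[i] : 0.0f;
    }
}

/* Scalar model of _mm_maskstore_ps: only selected elements are written. */
static void model_maskstore_ps(float *mem_addr, const int32_t mask[4],
                               const float a[4])
{
    for (int i = 0; i < 4; ++i) {
        if (mask[i] < 0)
            mem_addr[i] = a[i];
    }
}
```

A mask value such as 1 does not select its element, since only the high bit is examined.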

_mm_permute_pd
__m128d _mm_permute_pd(__m128d a)

Shuffle double-precision (64-bit) floating-point elements in a using the control in imm8.

_mm_permute_ps
__m128 _mm_permute_ps(__m128 a)

Shuffle single-precision (32-bit) floating-point elements in a using the control in imm8.
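How imm8 drives the single-precision shuffle can be sketched in scalar C. The sketch assumes the usual encoding, in which result element i is chosen by bits 2i+1:2i of imm8; `model_permute_ps` is a hypothetical name, and imm8 is an ordinary argument here rather than a compile-time constant.

```c
/* Scalar model of _mm_permute_ps: each 2-bit field of imm8 selects
   which element of a goes to the corresponding result position. */
static void model_permute_ps(const float a[4], unsigned imm8, float r[4])
{
    for (int i = 0; i < 4; ++i) {
        unsigned sel = (imm8 >> (2 * i)) & 3u;  /* 2-bit selector for slot i */
        r[i] = a[sel];
    }
}
```

For example, in this model imm8 = 0x1B (binary 00 01 10 11) reverses the four elements.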

_mm_permutevar_pd
__m128d _mm_permutevar_pd(__m128d a, __m128i b)

Shuffle double-precision (64-bit) floating-point elements in a using the control in b. Warning: the selector is in bit 1, not bit 0, of each 64-bit element! This is really not intuitive.
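The bit-1 selector mentioned in the warning can be illustrated with a small scalar model in C. The name `model_permutevar_pd` is hypothetical, and the control vector is assumed to be two 64-bit integers.

```c
#include <stdint.h>

/* Scalar model of _mm_permutevar_pd: bit 1 (not bit 0) of each 64-bit
   control element picks which element of a is copied. */
static void model_permutevar_pd(const double a[2], const uint64_t b[2],
                                double r[2])
{
    for (int i = 0; i < 2; ++i) {
        unsigned sel = (unsigned)((b[i] >> 1) & 1u);  /* selector is bit 1 */
        r[i] = a[sel];
    }
}
```

In this model a control value of 1 selects element 0, while a control value of 2 selects element 1.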

_mm_permutevar_ps
__m128 _mm_permutevar_ps(__m128 a, __m128i b)

Shuffle single-precision (32-bit) floating-point elements in a using the control in b.

_mm_testc_pd
int _mm_testc_pd(__m128d a, __m128d b)

Compute the bitwise NOT of a and then AND with b, producing an intermediate value, and return 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise return 0.
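The NOT-then-AND sign test can be sketched in scalar C as follows. It assumes the two doubles of a `__m128d` are held in a plain array, and `model_testc_pd` is a hypothetical name used only for this illustration.

```c
#include <stdint.h>
#include <string.h>

/* Scalar model of _mm_testc_pd: returns 1 when the sign bit of
   (~a & b) is zero in every 64-bit element. */
static int model_testc_pd(const double a[2], const double b[2])
{
    for (int i = 0; i < 2; ++i) {
        uint64_t ua, ub;
        memcpy(&ua, &a[i], sizeof ua);   /* reinterpret double bits safely */
        memcpy(&ub, &b[i], sizeof ub);
        if ((~ua & ub) >> 63)            /* b negative where a is not */
            return 0;
    }
    return 1;
}
```

Put differently, this model returns 1 when every position where b has its sign bit set also has the sign bit set in a.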

_mm_testc_ps
int _mm_testc_ps(__m128 a, __m128 b)

Compute the bitwise NOT of a and then AND with b, producing an intermediate value, and return 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise return 0.

_mm_testnzc_pd
int _mm_testnzc_pd(__m128d a, __m128d b)

Compute the bitwise AND of 128 bits (representing double-precision (64-bit) floating-point elements) in a and b, producing an intermediate 128-bit value, and set ZF to 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise set ZF to 0. Compute the bitwise NOT of a and then AND with b, producing an intermediate value, and set CF to 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise set CF to 0. Return 1 if both the ZF and CF values are zero, otherwise return 0.

_mm_testnzc_ps
int _mm_testnzc_ps(__m128 a, __m128 b)

Compute the bitwise AND of 128 bits (representing single-precision (32-bit) floating-point elements) in a and b, producing an intermediate 128-bit value, and set ZF to 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise set ZF to 0. Compute the bitwise NOT of a and then AND with b, producing an intermediate value, and set CF to 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise set CF to 0. Return 1 if both the ZF and CF values are zero, otherwise return 0.

_mm_testz_pd
int _mm_testz_pd(__m128d a, __m128d b)

Compute the bitwise AND of 128 bits (representing double-precision (64-bit) floating-point elements) in a and b, producing an intermediate 128-bit value, and return 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise return 0. In other words, return 1 if a and b don't both have a negative number at the same position.

_mm_testz_ps
int _mm_testz_ps(__m128 a, __m128 b)

Compute the bitwise AND of 128 bits (representing single-precision (32-bit) floating-point elements) in a and b, producing an intermediate 128-bit value, and return 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise return 0. In other words, return 1 if a and b don't both have a negative number at the same position.

Meta