Broadcast 128 bits of integer data from a to all 128-bit lanes in result. Note: also exists under the name _mm256_broadcastsi128_si256, which is identical.
Compute the absolute value of packed signed 16-bit integers in a.
Compute the absolute value of packed signed 32-bit integers in a.
Compute the absolute value of packed signed 8-bit integers in a.
Add packed 16-bit integers in a and b.
Add packed 32-bit integers in a and b.
Add packed 64-bit integers in a and b.
Add packed 8-bit integers in a and b.
Add packed 16-bit signed integers in a and b using signed saturation.
Add packed 8-bit signed integers in a and b using signed saturation.
Add packed 16-bit unsigned integers in a and b using unsigned saturation.
Add packed 8-bit unsigned integers in a and b using unsigned saturation.
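To illustrate what saturation means in the additions above, here is a minimal scalar sketch in C that models a single element rather than calling the intrinsics. The names adds_i16, adds_u8 and adds_epi16_model are hypothetical helpers invented for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of one signed 16-bit element: clamp instead of wrapping. */
int16_t adds_i16(int16_t x, int16_t y)
{
    int32_t sum = (int32_t)x + (int32_t)y;   /* exact in 32 bits */
    if (sum > INT16_MAX) return INT16_MAX;
    if (sum < INT16_MIN) return INT16_MIN;
    return (int16_t)sum;
}

/* Hypothetical model of one unsigned 8-bit element: clamp at 255. */
uint8_t adds_u8(uint8_t x, uint8_t y)
{
    unsigned sum = (unsigned)x + (unsigned)y;
    return sum > UINT8_MAX ? (uint8_t)UINT8_MAX : (uint8_t)sum;
}

/* Apply the signed model to the 16 elements of a 256-bit vector. */
void adds_epi16_model(const int16_t a[16], const int16_t b[16], int16_t out[16])
{
    assert(a && b && out);
    for (int i = 0; i < 16; ++i)
        out[i] = adds_i16(a[i], b[i]);
}
```

The non-saturating additions listed earlier would instead wrap around modulo the element width.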
Concatenate pairs of 16-byte blocks in a and b into a 32-byte temporary result, shift the result right by imm8 bytes, and return the low 16 bytes of that in each lane.
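The per-lane concatenate-and-shift described above can be sketched with a scalar model. This assumes the operand order given in Intel's documentation (a supplies the high 16 bytes of the temporary, b the low 16 bytes); the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of one 128-bit lane. */
void alignr_lane_model(const uint8_t a[16], const uint8_t b[16],
                       unsigned imm8, uint8_t out[16])
{
    uint8_t tmp[32];
    assert(imm8 <= 255u);        /* the real immediate is 8 bits wide */
    memcpy(tmp, b, 16);          /* b: low 16 bytes of the temporary */
    memcpy(tmp + 16, a, 16);     /* a: high 16 bytes of the temporary */
    for (unsigned i = 0; i < 16; ++i) {
        unsigned src = i + imm8; /* byte-wise right shift */
        out[i] = src < 32 ? tmp[src] : 0;   /* zeros enter from the top */
    }
}

/* The 256-bit form applies the same step to each lane independently. */
void alignr_epi8_model(const uint8_t a[32], const uint8_t b[32],
                       unsigned imm8, uint8_t out[32])
{
    alignr_lane_model(a, b, imm8, out);
    alignr_lane_model(a + 16, b + 16, imm8, out + 16);
}
```

Note that no byte ever crosses from one 128-bit lane into the other.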
Compute the bitwise AND of 256 bits (representing integer data) in a and b.
Compute the bitwise NOT of 256 bits (representing integer data) in a and then AND with b.
Average packed unsigned 16-bit integers in a and b.
Average packed unsigned 8-bit integers in a and b.
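The descriptions above do not state the rounding. Assuming the behavior Intel documents for the underlying instructions, (a + b + 1) >> 1 computed without overflow, one element can be modeled as below. The helper names avg_u8 and avg_u16 are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of one unsigned 8-bit element: average, rounding up. */
uint8_t avg_u8(uint8_t x, uint8_t y)
{
    return (uint8_t)(((unsigned)x + (unsigned)y + 1u) >> 1);
}

/* Hypothetical model of one unsigned 16-bit element. */
uint16_t avg_u16(uint16_t x, uint16_t y)
{
    uint32_t sum = (uint32_t)x + (uint32_t)y + 1u;   /* cannot overflow 32 bits */
    assert(sum <= 2u * UINT16_MAX + 1u);
    return (uint16_t)(sum >> 1);
}
```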
Blend packed 16-bit integers from a and b using 8-bit control mask imm8, applied identically within each of the two 128-bit lanes. Note: this is functionally equivalent to two _mm_blend_epi16.
Blend packed 32-bit integers from a and b using 8-bit control mask imm8.
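The difference between the two blends above is how imm8 is mapped onto elements. The following scalar sketch assumes the usual convention that a set bit selects the element from b; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* 16-bit blend: the same 8 mask bits are reused in both 128-bit lanes. */
void blend_epi16_model(const uint16_t a[16], const uint16_t b[16],
                       unsigned imm8, uint16_t out[16])
{
    assert(imm8 <= 255u);
    for (int i = 0; i < 16; ++i) {
        unsigned bit = (imm8 >> (i % 8)) & 1u;
        out[i] = bit ? b[i] : a[i];
    }
}

/* 32-bit blend: each of the 8 mask bits controls one distinct element. */
void blend_epi32_model(const uint32_t a[8], const uint32_t b[8],
                       unsigned imm8, uint32_t out[8])
{
    assert(imm8 <= 255u);
    for (int i = 0; i < 8; ++i)
        out[i] = ((imm8 >> i) & 1u) ? b[i] : a[i];
}
```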
Broadcast the low packed 8-bit integer from a to all elements of result.
Broadcast the low packed 32-bit integer from a to all elements of result.
Broadcast the low packed 64-bit integer from a to all elements of result.
Broadcast the low double-precision (64-bit) floating-point element from a to all elements of result.
Broadcast the low single-precision (32-bit) floating-point element from a to all elements of result.
Broadcast the low packed 16-bit integer from a to all elements of result.
Compare packed 16-bit integers in a and b for equality.
Compare packed 32-bit integers in a and b for equality.
Compare packed 64-bit integers in a and b for equality.
Compare packed 8-bit integers in a and b for equality.
Compare packed signed 16-bit integers in a and b for greater-than.
Compare packed signed 32-bit integers in a and b for greater-than.
Compare packed signed 8-bit integers in a and b for greater-than.
Sign-extend packed 16-bit integers in a to packed 32-bit integers.
Sign-extend packed 16-bit integers in a to packed 64-bit integers.
Sign-extend packed 32-bit integers in a to packed 64-bit integers.
Sign-extend packed 8-bit integers in a to packed 16-bit integers.
Sign-extend packed 8-bit integers in a to packed 32-bit integers.
Sign-extend packed 8-bit integers in the low 8 bytes of a to packed 64-bit integers.
Zero-extend packed unsigned 16-bit integers in a to packed 32-bit integers.
Zero-extend packed unsigned 16-bit integers in a to packed 64-bit integers.
Zero-extend packed unsigned 32-bit integers in a to packed 64-bit integers.
Zero-extend packed unsigned 8-bit integers in a to packed 16-bit integers.
Zero-extend packed unsigned 8-bit integers in a to packed 32-bit integers.
Zero-extend packed unsigned 8-bit integers in a to packed 64-bit integers.
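The sign-extending and zero-extending conversions above differ only in how the top bit of each source element is treated. Here is a small scalar sketch of the 8-bit to 32-bit case, with hypothetical function names. It consumes the low 8 bytes of a 16-byte input, since eight 32-bit results fill 256 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Sign extension: the top bit of each byte is replicated upward. */
void cvtepi8_epi32_model(const int8_t a[16], int32_t out[8])
{
    assert(a && out);
    for (int i = 0; i < 8; ++i)
        out[i] = (int32_t)a[i];
}

/* Zero extension: the upper 24 bits of each result are cleared. */
void cvtepu8_epi32_model(const uint8_t a[16], uint32_t out[8])
{
    assert(a && out);
    for (int i = 0; i < 8; ++i)
        out[i] = (uint32_t)a[i];
}
```

The same byte pattern 0xFF therefore becomes -1 under sign extension and 255 under zero extension.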
Extract a 16-bit integer from a, selected with index.
Extract an 8-bit integer from a, selected with index.
Extract 128 bits (composed of integer data) from a, selected with imm8.
Copy a to result, then insert 128 bits from b into result at the location specified by imm8.
Multiply packed signed 16-bit integers in a and b, producing intermediate signed 32-bit integers. Horizontally add adjacent pairs of intermediate 32-bit integers, and pack the results in result.
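The multiply-then-add-pairs step above can be sketched in scalar C as follows. The function name madd_epi16_model is hypothetical, and the wrap-around in the single overflowing case is an assumption based on the instruction's documented 32-bit addition.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: 16 signed 16-bit inputs per operand, 8 signed 32-bit outputs. */
void madd_epi16_model(const int16_t a[16], const int16_t b[16], int32_t out[8])
{
    assert(a && b && out);
    for (int i = 0; i < 8; ++i) {
        int32_t lo = (int32_t)a[2 * i]     * (int32_t)b[2 * i];
        int32_t hi = (int32_t)a[2 * i + 1] * (int32_t)b[2 * i + 1];
        /* Add in unsigned arithmetic so the one overflowing case
           (all four inputs equal to -32768) wraps instead of being
           undefined; converting back wraps on common compilers. */
        out[i] = (int32_t)((uint32_t)lo + (uint32_t)hi);
    }
}
```

This pattern is the usual building block for integer dot products, since each output already holds the sum of two products.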
Compare packed signed 16-bit integers in a and b, and return packed maximum values.
Compare packed signed 32-bit integers in a and b, and return packed maximum values.
Compare packed signed 8-bit integers in a and b, and return packed maximum values.
Compare packed unsigned 16-bit integers in a and b, and return packed maximum values.
Compare packed unsigned 32-bit integers in a and b, and return packed maximum values.
Compare packed unsigned 8-bit integers in a and b, and return packed maximum values.
Compare packed signed 32-bit integers in a and b, and return packed minimum values.
Compare packed signed 8-bit integers in a and b, and return packed minimum values.
Compare packed unsigned 16-bit integers in a and b, and return packed minimum values.
Compare packed unsigned 32-bit integers in a and b, and return packed minimum values.
Compare packed unsigned 8-bit integers in a and b, and return packed minimum values.
Multiply the low signed 32-bit integers from each packed 64-bit element in a and b, and return the signed 64-bit results.
Multiply the low unsigned 32-bit integers from each packed 64-bit element in a and b, and return the unsigned 64-bit results.
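Both multiplications above read only the low 32 bits of each 64-bit element and produce a full 64-bit product. A scalar sketch of one element, using hypothetical helper names:

```c
#include <assert.h>
#include <stdint.h>

/* Interpret the low 32 bits of a 64-bit element as a signed value. */
int64_t sign_extend_low32(uint64_t e)
{
    uint32_t lo = (uint32_t)e;               /* upper 32 bits are ignored */
    return (lo & 0x80000000u) ? (int64_t)lo - 4294967296LL : (int64_t)lo;
}

/* Signed form: the product of two 32-bit values always fits in 64 bits. */
int64_t mul_epi32_lane(uint64_t x, uint64_t y)
{
    int64_t p = sign_extend_low32(x) * sign_extend_low32(y);
    assert(p >= INT64_MIN / 2);              /* magnitude is at most 2^62 */
    return p;
}

/* Unsigned form. */
uint64_t mul_epu32_lane(uint64_t x, uint64_t y)
{
    return (uint64_t)(uint32_t)x * (uint64_t)(uint32_t)y;
}
```

The same bit pattern gives different results in the two forms: a low half of 0xFFFFFFFF is -1 when signed and 4294967295 when unsigned.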
Compute the bitwise OR of 256 bits (representing integer data) in a and b.
Convert packed signed 16-bit integers from a and b to packed 8-bit integers using signed saturation. Warning: a and b are interleaved per-lane. Result has: a lane 0, b lane 0, a lane 1, b lane 1.
Convert packed signed 32-bit integers from a and b to packed 16-bit integers using signed saturation. Warning: a and b are interleaved per-lane. Result has: a lane 0, b lane 0, a lane 1, b lane 1.
Convert packed signed 16-bit integers from a and b to packed 8-bit integers using unsigned saturation. Warning: a and b are interleaved per-lane. Result has: a lane 0, b lane 0, a lane 1, b lane 1.
Convert packed signed 32-bit integers from a and b to packed 16-bit integers using unsigned saturation. Warning: a and b are interleaved per-lane. Result has: a lane 0, b lane 0, a lane 1, b lane 1.
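The per-lane interleaving that the warnings above describe is easy to get wrong, so here is a scalar sketch of the signed 16-bit to 8-bit case. The names sat_i8 and packs_epi16_model are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Clamp a signed 16-bit value into the signed 8-bit range. */
int8_t sat_i8(int16_t x)
{
    if (x > INT8_MAX) return INT8_MAX;
    if (x < INT8_MIN) return INT8_MIN;
    return (int8_t)x;
}

/* Result layout: a lane 0, b lane 0, a lane 1, b lane 1. */
void packs_epi16_model(const int16_t a[16], const int16_t b[16], int8_t out[32])
{
    assert(a && b && out);
    for (int lane = 0; lane < 2; ++lane) {
        for (int i = 0; i < 8; ++i) {
            out[lane * 16 + i]     = sat_i8(a[lane * 8 + i]);  /* 8 bytes from a */
            out[lane * 16 + 8 + i] = sat_i8(b[lane * 8 + i]);  /* 8 bytes from b */
        }
    }
}
```

Code that expects all of a followed by all of b has to reorder the 64-bit quarters of the result afterwards.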
Compute the absolute differences of packed unsigned 8-bit integers in a and b, then horizontally sum each consecutive 8 differences to produce four unsigned 16-bit integers, and pack these unsigned 16-bit integers in the low 16 bits of 64-bit elements in result.
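The sum-of-absolute-differences step above can be sketched in scalar C. Each group of 8 bytes yields one sum, stored in its own 64-bit element; the function name sad_epu8_model is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model for a 256-bit input: 32 bytes in, one sum per 8 bytes out. */
void sad_epu8_model(const uint8_t a[32], const uint8_t b[32], uint64_t out[4])
{
    for (int g = 0; g < 4; ++g) {
        unsigned sum = 0;
        for (int i = 0; i < 8; ++i) {
            unsigned x = a[g * 8 + i];
            unsigned y = b[g * 8 + i];
            sum += x > y ? x - y : y - x;    /* absolute difference */
        }
        assert(sum <= 8u * 255u);            /* always fits in 16 bits */
        out[g] = sum;                        /* upper 48 bits stay zero */
    }
}
```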
Shift packed 16-bit integers in a left by imm8 while shifting in zeros.
Shift packed 32-bit integers in a left by imm8 while shifting in zeros.
Shift packed 64-bit integers in a left by imm8 while shifting in zeros.
Shift packed 16-bit integers in a right by imm8 while shifting in sign bits.
Shift packed 32-bit integers in a right by imm8 while shifting in sign bits.
Shift packed 16-bit integers in a right by imm8 while shifting in zeros.
Shift packed 32-bit integers in a right by imm8 while shifting in zeros.
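The shifts above differ from C's shift operators when the count reaches the element width. The scalar sketch below models one 32-bit element, assuming the behavior Intel documents for out-of-range counts (logical shifts give zero, arithmetic shifts fill with the sign bit). The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Logical left shift: zeros enter from the bottom. */
uint32_t slli_u32(uint32_t x, unsigned imm8)
{
    return imm8 > 31 ? 0u : x << imm8;
}

/* Logical right shift: zeros enter from the top. */
uint32_t srli_u32(uint32_t x, unsigned imm8)
{
    return imm8 > 31 ? 0u : x >> imm8;
}

/* Arithmetic right shift: copies of the sign bit enter from the top. */
int32_t srai_i32(int32_t x, unsigned imm8)
{
    unsigned n = imm8 > 31 ? 31 : imm8;      /* large counts behave like 31 */
    assert(n <= 31);
    /* Shift the complement for negative inputs so that only
       non-negative values are shifted, which is fully defined in C. */
    return x < 0 ? ~(~x >> n) : x >> n;
}
```

Writing x >> 40 directly on a 32-bit value in C would be undefined behavior, which is why the models clamp the count first.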
Subtract packed 16-bit integers in b from packed 16-bit integers in a.
Subtract packed 32-bit integers in b from packed 32-bit integers in a.
Subtract packed 64-bit integers in b from packed 64-bit integers in a.
Subtract packed 8-bit integers in b from packed 8-bit integers in a.
Subtract packed signed 16-bit integers in b from packed signed 16-bit integers in a using signed saturation.
Subtract packed signed 8-bit integers in b from packed signed 8-bit integers in a using signed saturation.
Subtract packed unsigned 16-bit integers in b from packed unsigned 16-bit integers in a using unsigned saturation.
Subtract packed unsigned 8-bit integers in b from packed unsigned 8-bit integers in a using unsigned saturation.
Unpack and interleave 16-bit integers from the high half of each 128-bit lane in a and b.
Unpack and interleave 32-bit integers from the high half of each 128-bit lane in a and b.
Unpack and interleave 64-bit integers from the high half of each 128-bit lane in a and b.
Unpack and interleave 8-bit integers from the high half of each 128-bit lane in a and b.
Unpack and interleave 16-bit integers from the low half of each 128-bit lane in a and b.
Unpack and interleave 32-bit integers from the low half of each 128-bit lane in a and b.
Unpack and interleave 64-bit integers from the low half of each 128-bit lane in a and b.
Unpack and interleave 8-bit integers from the low half of each 128-bit lane in a and b.
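The unpack operations above work within each 128-bit lane, not across the whole 256-bit vector. A scalar sketch of the 32-bit case, with hypothetical function names:

```c
#include <assert.h>
#include <stdint.h>

/* Interleave the low two 32-bit elements of each lane of a and b. */
void unpacklo_epi32_model(const uint32_t a[8], const uint32_t b[8], uint32_t out[8])
{
    assert(a && b && out);
    for (int lane = 0; lane < 2; ++lane) {
        int base = lane * 4;                 /* 4 elements per 128-bit lane */
        out[base + 0] = a[base + 0];
        out[base + 1] = b[base + 0];
        out[base + 2] = a[base + 1];
        out[base + 3] = b[base + 1];
    }
}

/* Interleave the high two 32-bit elements of each lane of a and b. */
void unpackhi_epi32_model(const uint32_t a[8], const uint32_t b[8], uint32_t out[8])
{
    assert(a && b && out);
    for (int lane = 0; lane < 2; ++lane) {
        int base = lane * 4;
        out[base + 0] = a[base + 2];
        out[base + 1] = b[base + 2];
        out[base + 2] = a[base + 3];
        out[base + 3] = b[base + 3];
    }
}
```

With a = {0..7} and b = {10..17}, the low form produces {0, 10, 1, 11, 4, 14, 5, 15}: elements 2 and 3 of the inputs are skipped because they belong to the high half of lane 0.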
Compute the bitwise XOR of 256 bits (representing integer data) in a and b.
Blend packed 32-bit integers from a and b using 4-bit control mask imm8.
Broadcast the low packed 8-bit integer from a to all elements of result.
Broadcast the low packed 32-bit integer from a to all elements of result.
Broadcast the low packed 64-bit integer from a to all elements of result.
Broadcast the low double-precision (64-bit) floating-point element from a to all elements of result.
Broadcast 128 bits of integer data from a to all 128-bit lanes in result. Note: also exists under the name _mm256_broadcastsi128_si256, which is identical.
Broadcast the low single-precision (32-bit) floating-point element from a to all elements of result.
Broadcast the low packed 16-bit integer from a to all elements of result.
Load packed 32-bit integers from memory using mask (an element is zeroed when the highest bit of the corresponding mask element is not set). Warning: see "Note about mask load/store" for why you must address valid memory only.
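The selection rule of the masked load above can be sketched in scalar C. The function name maskload_epi32_model is hypothetical, and the four-element width is an assumption (a 128-bit form); a 256-bit form would process eight elements the same way.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: an element is loaded only when the highest bit
   of the corresponding mask element is set; otherwise it is zero. */
void maskload_epi32_model(const int32_t *mem, const int32_t mask[4], int32_t out[4])
{
    assert(mem && mask && out);
    for (int i = 0; i < 4; ++i) {
        /* For a 32-bit signed value, "highest bit set" means "negative". */
        out[i] = (mask[i] < 0) ? mem[i] : 0;
    }
}
```

This sketch skips masked-off elements entirely. It says nothing about whether a real implementation touches those addresses, which is the concern behind the warning about valid memory.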
AVX2 intrinsics. https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#techs=AVX2