Blend packed 16-bit integers from a and b using control mask imm8, and store the results.
Blend packed double-precision (64-bit) floating-point elements from a and b using control mask imm8.
Blend packed single-precision (32-bit) floating-point elements from a and b using control mask imm8.
Blend packed 8-bit integers from a and b using mask.
Blend packed double-precision (64-bit) floating-point elements from a and b using mask.
Blend packed single-precision (32-bit) floating-point elements from a and b using mask.
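The blend entries above can be read through a small scalar model. This is only a sketch in portable C, not the intrinsics themselves; the function names `blend_epi16_model` and `blendv_epi8_model` are hypothetical. It assumes the usual instruction behavior: for the imm8 forms, bit i of imm8 selects lane i from b, and for the mask forms, the top bit of each mask element does the selecting.

```c
#include <stdint.h>

/* Scalar model of an immediate-controlled blend over eight 16-bit lanes:
   bit i of imm8 set -> take b[i], clear -> take a[i]. */
void blend_epi16_model(const int16_t a[8], const int16_t b[8],
                       unsigned imm8, int16_t out[8])
{
    for (int i = 0; i < 8; ++i)
        out[i] = ((imm8 >> i) & 1u) ? b[i] : a[i];
}

/* Scalar model of a variable blend over sixteen 8-bit lanes:
   only the most significant bit of mask[i] is consulted. */
void blendv_epi8_model(const uint8_t a[16], const uint8_t b[16],
                       const uint8_t mask[16], uint8_t out[16])
{
    for (int i = 0; i < 16; ++i)
        out[i] = (mask[i] & 0x80u) ? b[i] : a[i];
}
```

With imm8 = 0x05, for example, lanes 0 and 2 come from b and every other lane comes from a.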
Round the packed double-precision (64-bit) floating-point elements in a up to an integer value, and store the results as packed double-precision floating-point elements.
Round the packed single-precision (32-bit) floating-point elements in a up to an integer value, and store the results as packed single-precision floating-point elements.
Round the lower double-precision (64-bit) floating-point element in b up to an integer value, store the result as a double-precision floating-point element in the lower element of result, and copy the upper element from a to the upper element of result.
Round the lower single-precision (32-bit) floating-point element in b up to an integer value, store the result as a single-precision floating-point element in the lower element of result, and copy the upper 3 packed elements from a to the upper elements of result.
Compare packed 64-bit integers in a and b for equality.
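As a hedged illustration of what "compare for equality" produces, the sketch below models the packed 64-bit compare in scalar C. The name `cmpeq_epi64_model` is hypothetical, and the model assumes the usual SSE convention that a matching lane becomes all ones and a non-matching lane becomes zero.

```c
#include <stdint.h>

/* Scalar model: each 64-bit lane of the result is all ones (-1) when the
   corresponding lanes of a and b are equal, and 0 otherwise. */
void cmpeq_epi64_model(const int64_t a[2], const int64_t b[2], int64_t out[2])
{
    for (int i = 0; i < 2; ++i)
        out[i] = (a[i] == b[i]) ? (int64_t)-1 : (int64_t)0;
}
```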
Sign extend packed 16-bit integers in a to packed 32-bit integers.
Sign extend packed 16-bit integers in a to packed 64-bit integers.
Sign extend packed 32-bit integers in a to packed 64-bit integers.
Sign extend packed 8-bit integers in a to packed 16-bit integers.
Sign extend packed 8-bit integers in a to packed 32-bit integers.
Sign extend packed 8-bit integers in the low 2 bytes of a to packed 64-bit integers.
Zero extend packed unsigned 16-bit integers in a to packed 32-bit integers.
Zero extend packed unsigned 16-bit integers in a to packed 64-bit integers.
Zero extend packed unsigned 32-bit integers in a to packed 64-bit integers.
Zero extend packed unsigned 8-bit integers in a to packed 16-bit integers.
Zero extend packed unsigned 8-bit integers in a to packed 32-bit integers.
Zero extend packed unsigned 8-bit integers in the low 2 bytes of a to packed 64-bit integers.
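The difference between the sign-extending and zero-extending conversions above can be sketched with a scalar model. The function names here are hypothetical, and only the 8-bit to 32-bit case is shown; the other widths follow the same pattern, with only the lowest elements of a being consumed.

```c
#include <stdint.h>

/* Sign extension: the four lowest signed bytes keep their numeric value. */
void cvtepi8_epi32_model(const int8_t a[16], int32_t out[4])
{
    for (int i = 0; i < 4; ++i)
        out[i] = (int32_t)a[i];
}

/* Zero extension: the four lowest bytes are treated as unsigned, so the
   upper 24 bits of each result are always zero. */
void cvtepu8_epi32_model(const uint8_t a[16], uint32_t out[4])
{
    for (int i = 0; i < 4; ++i)
        out[i] = (uint32_t)a[i];
}
```

The byte 0xFF therefore becomes -1 under sign extension and 255 under zero extension.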
Conditionally multiply the packed double-precision (64-bit) floating-point elements in a and b using imm8[5:4], sum the two products, and conditionally store the sum in the result using imm8[1:0].
Conditionally multiply the packed single-precision (32-bit) floating-point elements in a and b using the high 4 bits in imm8, sum the four products, and conditionally store the sum in result using the low 4 bits of imm8.
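The two halves of imm8 in the single-precision dot product are easier to follow in a scalar sketch. `dp_ps_model` is a hypothetical name, and the model adds the products in lane order, so its rounding may differ from the hardware for inputs that are not exactly representable.

```c
/* Scalar model of the conditional dot product over four float lanes.
   imm8[7:4] chooses which products enter the sum;
   imm8[3:0] chooses which result lanes receive the sum (others get 0). */
void dp_ps_model(const float a[4], const float b[4],
                 unsigned imm8, float out[4])
{
    float sum = 0.0f;
    for (int i = 0; i < 4; ++i)
        if ((imm8 >> (4 + i)) & 1u)
            sum += a[i] * b[i];
    for (int i = 0; i < 4; ++i)
        out[i] = ((imm8 >> i) & 1u) ? sum : 0.0f;
}
```

With imm8 = 0x71, the first three products are summed and the sum is written to lane 0 only.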
Extract a 32-bit integer from a, selected with imm8.
Extract a 64-bit integer from a, selected with imm8.
Extract an 8-bit integer from a, selected with imm8. Warning: the returned value is zero-extended to 32 bits.
Extract a single-precision (32-bit) floating-point element from a, selected with imm8. Note: returns a 32-bit integer.
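The two caveats above (zero-extension of the 8-bit extract, and the float extract returning raw bits as an integer) can be sketched as follows. All names are hypothetical, and the model assumes `float` is a 32-bit IEEE 754 type.

```c
#include <stdint.h>
#include <string.h>

/* Scalar model of the 8-bit extract: the selected byte is returned
   zero-extended, so 0xFF comes back as 255, never as -1. */
int extract_epi8_model(const uint8_t a[16], unsigned imm8)
{
    return (int)a[imm8 & 15u];
}

/* Scalar model of the float extract: the result is the bit pattern of the
   selected lane, not a converted numeric value. */
int32_t extract_ps_model(const float a[4], unsigned imm8)
{
    int32_t bits;
    memcpy(&bits, &a[imm8 & 3u], sizeof bits);
    return bits;
}

/* Helper to turn the returned bits back into a float. */
float bits_to_float(int32_t bits)
{
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}
```

A caller who wants the float value back has to reinterpret the bits, as `bits_to_float` does, rather than cast the integer.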
Round the packed double-precision (64-bit) floating-point elements in a down to an integer value, and store the results as packed double-precision floating-point elements.
Round the packed single-precision (32-bit) floating-point elements in a down to an integer value, and store the results as packed single-precision floating-point elements.
Round the lower double-precision (64-bit) floating-point element in b down to an integer value, store the result as a double-precision floating-point element in the lower element, and copy the upper element from a to the upper element.
Round the lower single-precision (32-bit) floating-point element in b down to an integer value, store the result as a single-precision floating-point element in the lower element, and copy the upper 3 packed elements from a to the upper elements.
Insert the 32-bit integer i into a at the location specified by imm8[1:0].
Insert the 64-bit integer i into a at the location specified by imm8[0].
Copy a to the result, and insert the lower 8-bit integer from i at the location specified by imm8[3:0].
Warning: of course it does something totally different from _mm_insert_epi32! Copy a to tmp, then insert a single-precision (32-bit) floating-point element from b into tmp using the control in imm8. Store tmp to result using the mask in imm8[3:0] (elements are zeroed out when the corresponding bit is set).
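A scalar sketch may help untangle the three fields packed into imm8 for the float insert. `insert_ps_model` is a hypothetical name; the field layout used here (imm8[7:6] selects the source lane of b, imm8[5:4] selects the destination lane, imm8[3:0] is the zero mask) is the INSERTPS encoding as I understand it, so treat it as an assumption to check against the instruction reference. The output array must not alias the inputs.

```c
/* Scalar model of the float insert with zero mask. */
void insert_ps_model(const float a[4], const float b[4],
                     unsigned imm8, float out[4])
{
    unsigned src = (imm8 >> 6) & 3u;   /* imm8[7:6]: lane read from b    */
    unsigned dst = (imm8 >> 4) & 3u;   /* imm8[5:4]: lane written in tmp */

    for (int i = 0; i < 4; ++i)        /* tmp = a */
        out[i] = a[i];
    out[dst] = b[src];                 /* insert one element from b */

    for (int i = 0; i < 4; ++i)        /* imm8[3:0]: set bit zeroes the lane */
        if ((imm8 >> i) & 1u)
            out[i] = 0.0f;
}
```

For imm8 = 0x98, lane 2 of b is written to lane 1 and lane 3 is zeroed.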
Compare packed signed 32-bit integers in a and b, and return packed maximum values.
Compare packed signed 8-bit integers in a and b, and return packed maximum values.
Compare packed unsigned 16-bit integers in a and b, and return packed maximum values.
Compare packed unsigned 32-bit integers in a and b, and return packed maximum values.
Compare packed signed 32-bit integers in a and b, and return packed minimum values.
Compare packed signed 8-bit integers in a and b, and return packed minimum values.
Compare packed unsigned 16-bit integers in a and b, and return packed minimum values.
Compare packed unsigned 32-bit integers in a and b, and return packed minimum values.
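The signed and unsigned variants above differ only in how the lane bits are interpreted. The scalar sketch below, with hypothetical function names, shows the same bit pattern giving different maxima.

```c
#include <stdint.h>

/* Scalar model of the signed 32-bit packed maximum. */
void max_epi32_model(const int32_t a[4], const int32_t b[4], int32_t out[4])
{
    for (int i = 0; i < 4; ++i)
        out[i] = (a[i] > b[i]) ? a[i] : b[i];
}

/* Scalar model of the unsigned 32-bit packed maximum. */
void max_epu32_model(const uint32_t a[4], const uint32_t b[4], uint32_t out[4])
{
    for (int i = 0; i < 4; ++i)
        out[i] = (a[i] > b[i]) ? a[i] : b[i];
}
```

The pattern 0xFFFFFFFF is -1 when signed, so it loses to 1, but it is the largest possible value when unsigned.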
Horizontally compute the minimum amongst the packed unsigned 16-bit integers in a, store the minimum and index in return value, and zero the remaining bits.
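The packing of minimum and index can be sketched in scalar C. `minpos_epu16_model` is a hypothetical name; the model returns only the low 32 bits of the 128-bit result (minimum in bits 15:0, index in bits 18:16), since all higher bits are zero, and it assumes the lowest index wins on ties.

```c
#include <stdint.h>

/* Scalar model of the horizontal unsigned 16-bit minimum. */
uint32_t minpos_epu16_model(const uint16_t a[8])
{
    unsigned idx = 0;
    for (unsigned i = 1; i < 8; ++i)
        if (a[i] < a[idx])      /* strict '<' keeps the lowest index on ties */
            idx = i;
    return (uint32_t)a[idx] | ((uint32_t)idx << 16);
}
```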
Compute the sum of absolute differences (SADs) of quadruplets of unsigned 8-bit integers in a compared to those in b, and store the 16-bit results in dst. Eight SADs are performed using one quadruplet from b and eight quadruplets from a. One quadruplet is selected from b starting at the offset specified in imm8[1:0]. Eight quadruplets are formed from sequential 8-bit integers selected from a starting at the offset specified in imm8[2].
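The sliding-window structure of the eight SADs is clearer in a scalar sketch. `mpsadbw_model` is a hypothetical name, and the model assumes both offsets count 32-bit (4-byte) units, so imm8[1:0] selects byte offset 0, 4, 8 or 12 in b and imm8[2] selects byte offset 0 or 4 in a.

```c
#include <stdint.h>

/* Scalar model of the multiple-SAD operation. */
void mpsadbw_model(const uint8_t a[16], const uint8_t b[16],
                   unsigned imm8, uint16_t out[8])
{
    unsigned b_off = (imm8 & 3u) * 4u;          /* imm8[1:0], in 4-byte units */
    unsigned a_off = ((imm8 >> 2) & 1u) * 4u;   /* imm8[2],   in 4-byte units */

    for (unsigned i = 0; i < 8; ++i) {
        unsigned sad = 0;
        for (unsigned k = 0; k < 4; ++k) {
            int d = (int)a[a_off + i + k] - (int)b[b_off + k];
            sad += (unsigned)(d < 0 ? -d : d);
        }
        out[i] = (uint16_t)sad;   /* at most 4 * 255, fits in 16 bits */
    }
}
```

Each output i compares the same quadruplet of b against the window of a that starts i bytes after the selected offset.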
Multiply the low signed 32-bit integers from each packed 64-bit element in a and b, and store the signed 64-bit results in dst.
Multiply the packed 32-bit integers in a and b, producing intermediate 64-bit integers, and return the low 32 bits of the intermediate integers.
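The two multiplies above differ in which lanes they read and how much of each product they keep. The following scalar sketch uses hypothetical names; the widening form reads lanes 0 and 2 (the low 32 bits of each 64-bit element), and the low-half form wraps modulo 2^32.

```c
#include <stdint.h>

/* Scalar model of the widening signed multiply: two full 64-bit products. */
void mul_epi32_model(const int32_t a[4], const int32_t b[4], int64_t out[2])
{
    out[0] = (int64_t)a[0] * (int64_t)b[0];
    out[1] = (int64_t)a[2] * (int64_t)b[2];
}

/* Scalar model of the low-half multiply: four products truncated to 32 bits. */
void mullo_epi32_model(const int32_t a[4], const int32_t b[4], int32_t out[4])
{
    for (int i = 0; i < 4; ++i) {
        /* Unsigned arithmetic wraps modulo 2^32 without undefined behavior.
           The cast back to int32_t is implementation-defined before C23,
           but wraps on mainstream compilers. */
        uint32_t lo = (uint32_t)a[i] * (uint32_t)b[i];
        out[i] = (int32_t)lo;
    }
}
```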
Convert packed signed 32-bit integers from a and b to packed 16-bit integers using unsigned saturation.
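Unsigned saturation here means clamping each signed 32-bit input to the range 0 to 65535. A scalar sketch, with the hypothetical name `packus_epi32_model`, assuming the four lanes of a fill the low half of the result and the four lanes of b fill the high half:

```c
#include <stdint.h>

/* Clamp a signed 32-bit value to the unsigned 16-bit range. */
uint16_t saturate_u16(int32_t x)
{
    if (x < 0)     return 0;
    if (x > 65535) return 65535;
    return (uint16_t)x;
}

/* Scalar model of the pack with unsigned saturation. */
void packus_epi32_model(const int32_t a[4], const int32_t b[4], uint16_t out[8])
{
    for (int i = 0; i < 4; ++i) {
        out[i]     = saturate_u16(a[i]);
        out[4 + i] = saturate_u16(b[i]);
    }
}
```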
Round the packed double-precision (64-bit) floating-point elements in a using the rounding parameter, and store the results as packed double-precision floating-point elements. Rounding is done according to the rounding[3:0] parameter, which can be one of:
    (_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC) // round to nearest, and suppress exceptions
    (_MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC) // round down, and suppress exceptions
    (_MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC) // round up, and suppress exceptions
    (_MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC) // truncate, and suppress exceptions
    _MM_FROUND_CUR_DIRECTION // use MXCSR.RC; see _MM_SET_ROUNDING_MODE
Round the packed single-precision (32-bit) floating-point elements in a using the rounding parameter, and store the results as packed single-precision floating-point elements. Rounding is done according to the rounding[3:0] parameter, which can be one of:
    (_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC) // round to nearest, and suppress exceptions
    (_MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC) // round down, and suppress exceptions
    (_MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC) // round up, and suppress exceptions
    (_MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC) // truncate, and suppress exceptions
    _MM_FROUND_CUR_DIRECTION // use MXCSR.RC; see _MM_SET_ROUNDING_MODE
Round the lower double-precision (64-bit) floating-point element in b using the rounding parameter, store the result as a double-precision floating-point element in the lower element of result, and copy the upper element from a to the upper element of result. Rounding is done according to the rounding[3:0] parameter, which can be one of:
    (_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC) // round to nearest, and suppress exceptions
    (_MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC) // round down, and suppress exceptions
    (_MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC) // round up, and suppress exceptions
    (_MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC) // truncate, and suppress exceptions
    _MM_FROUND_CUR_DIRECTION // use MXCSR.RC; see _MM_SET_ROUNDING_MODE
Round the lower single-precision (32-bit) floating-point element in b using the rounding parameter, store the result as a single-precision floating-point element in the lower element of result, and copy the upper 3 packed elements from a to the upper elements of result. Rounding is done according to the rounding[3:0] parameter, which can be one of:
    (_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC) // round to nearest, and suppress exceptions
    (_MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC) // round down, and suppress exceptions
    (_MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC) // round up, and suppress exceptions
    (_MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC) // truncate, and suppress exceptions
    _MM_FROUND_CUR_DIRECTION // use MXCSR.RC; see _MM_SET_ROUNDING_MODE
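The four explicit rounding directions can be compared with a one-lane scalar sketch. `round_model` is a hypothetical helper, and the `MODEL_FROUND_*` constants are stand-ins for the low two bits of the rounding parameter (assumed to be 0 for nearest, 1 for down, 2 for up, 3 for truncate). The model handles only finite inputs of modest magnitude, ignores the current-direction and exception-suppression bits, and does not preserve the sign of zero.

```c
/* Stand-ins for the low two bits of the rounding parameter (assumed values). */
enum {
    MODEL_FROUND_TO_NEAREST_INT = 0,
    MODEL_FROUND_TO_NEG_INF     = 1,
    MODEL_FROUND_TO_POS_INF     = 2,
    MODEL_FROUND_TO_ZERO        = 3
};

/* Scalar model of rounding one double to an integral value. */
double round_model(double x, unsigned rounding)
{
    double t  = (double)(long long)x;      /* truncate toward zero   */
    double fl = (t > x) ? t - 1.0 : t;     /* largest integer <= x   */

    switch (rounding & 3u) {
    case MODEL_FROUND_TO_NEG_INF: return fl;
    case MODEL_FROUND_TO_POS_INF: return (t < x) ? t + 1.0 : t;
    case MODEL_FROUND_TO_ZERO:    return t;
    default: {                             /* nearest, ties to even  */
        double diff = x - fl;
        if (diff > 0.5) return fl + 1.0;
        if (diff < 0.5) return fl;
        return ((long long)fl % 2 == 0) ? fl : fl + 1.0;
    }
    }
}
```

Note that round-to-nearest resolves exact halves to the even neighbor, so 2.5 rounds to 2 while 3.5 rounds to 4.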
Load 128 bits of integer data from memory using a non-temporal memory hint. mem_addr must be aligned on a 16-byte boundary or a general-protection exception may be generated.
Return 1 if all bits in a are 1, otherwise return 0.
Return 1 if all bits in a are 0, otherwise return 0.
Compute the bitwise AND of 128 bits (representing integer data) in a and mask, and return 1 if the result is zero, otherwise return 0.
Compute the bitwise AND of 128 bits (representing integer data) in a and mask, and set ZF to 1 if the result is zero, otherwise set ZF to 0. Compute the bitwise NOT of a and then AND with mask, and set CF to 1 if the result is zero, otherwise set CF to 0. Return 1 if both the ZF and CF values are zero, otherwise return 0.
Compute the bitwise NOT of a and then AND with b, and return 1 if the result is zero, otherwise return 0. In other words, test if all bits masked by b are 1 in a.
Compute the bitwise AND of 128 bits (representing integer data) in a and b, and set ZF to 1 if the result is zero, otherwise set ZF to 0. Compute the bitwise NOT of a and then AND with b, and set CF to 1 if the result is zero, otherwise set CF to 0. Return 1 if both the ZF and CF values are zero, otherwise return 0.
Compute the bitwise AND of 128 bits (representing integer data) in a and b, and return 1 if the result is zero, otherwise return 0. In other words, test if all bits masked by b are 0 in a.
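The test entries above all derive from the same two flags, which the following scalar sketch makes explicit. The 128-bit operands are modeled as two 64-bit halves, and every name here is hypothetical.

```c
#include <stdint.h>

/* The two flags described above. */
typedef struct {
    int zf;   /* 1 when (a AND b) is all zero       */
    int cf;   /* 1 when ((NOT a) AND b) is all zero */
} ptest_flags;

ptest_flags ptest_model(const uint64_t a[2], const uint64_t b[2])
{
    ptest_flags f;
    f.zf = ((a[0] & b[0]) | (a[1] & b[1])) == 0;
    f.cf = ((~a[0] & b[0]) | (~a[1] & b[1])) == 0;
    return f;
}

/* 1 if all bits masked by b are 0 in a. */
int testz_model(const uint64_t a[2], const uint64_t b[2])
{
    return ptest_model(a, b).zf;
}

/* 1 if all bits masked by b are 1 in a. */
int testc_model(const uint64_t a[2], const uint64_t b[2])
{
    return ptest_model(a, b).cf;
}

/* 1 if the masked bits of a are a mix of ones and zeros. */
int testnzc_model(const uint64_t a[2], const uint64_t b[2])
{
    ptest_flags f = ptest_model(a, b);
    return !f.zf && !f.cf;
}
```

Under this model, the all-ones check corresponds to `testc_model` with b set to all ones, and the all-zeros check corresponds to `testz_model` with b set to a itself.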
SSE4.1 rounding modes
SSE4.1 rounding modes
SSE4.1 intrinsics. https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#techs=SSE4_1