Intrinsics for Fused Multiply Add Operations

_mm_fmadd_pd, _mm256_fmadd_pd
Multiply-adds packed double-precision floating-point values using three float64 vectors. The corresponding FMA instruction is VFMADD<XXX>PD, where XXX could be 132, 213, or 231.
_mm_fmadd_ps, _mm256_fmadd_ps
Multiply-adds packed single-precision floating-point values using three float32 vectors. The corresponding FMA instruction is VFMADD<XXX>PS, where XXX could be 132, 213, or 231.
_mm_fmadd_sd
Multiply-adds the lowest double-precision floating-point elements of three float64 vectors; the upper element of the result is copied from the first vector. The corresponding FMA instruction is VFMADD<XXX>SD, where XXX could be 132, 213, or 231.
_mm_fmadd_ss
Multiply-adds the lowest single-precision floating-point elements of three float32 vectors; the upper three elements of the result are copied from the first vector. The corresponding FMA instruction is VFMADD<XXX>SS, where XXX could be 132, 213, or 231.
_mm_fmaddsub_pd, _mm256_fmaddsub_pd
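Multiplies packed double-precision floating-point values of the first two float64 vectors; the third vector is subtracted from the product in even-indexed elements and added to it in odd-indexed elements (counting from 0 at the lowest element). The corresponding FMA instruction is VFMADDSUB<XXX>PD, where XXX could be 132, 213, or 231.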
_mm_fmaddsub_ps, _mm256_fmaddsub_ps
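Multiplies packed single-precision floating-point values of the first two float32 vectors; the third vector is subtracted from the product in even-indexed elements and added to it in odd-indexed elements (counting from 0 at the lowest element). The corresponding FMA instruction is VFMADDSUB<XXX>PS, where XXX could be 132, 213, or 231.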
_mm_fmsubadd_pd, _mm256_fmsubadd_pd
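Multiplies packed double-precision floating-point values of the first two float64 vectors; the third vector is added to the product in even-indexed elements and subtracted from it in odd-indexed elements (counting from 0 at the lowest element). The corresponding FMA instruction is VFMSUBADD<XXX>PD, where XXX could be 132, 213, or 231.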
_mm_fmsubadd_ps, _mm256_fmsubadd_ps
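Multiplies packed single-precision floating-point values of the first two float32 vectors; the third vector is added to the product in even-indexed elements and subtracted from it in odd-indexed elements (counting from 0 at the lowest element). The corresponding FMA instruction is VFMSUBADD<XXX>PS, where XXX could be 132, 213, or 231.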
_mm_fmsub_pd, _mm256_fmsub_pd
Multiply-subtracts packed double-precision floating-point values using three float64 vectors. The corresponding FMA instruction is VFMSUB<XXX>PD, where XXX could be 132, 213, or 231.
_mm_fmsub_ps, _mm256_fmsub_ps
Multiply-subtracts packed single-precision floating-point values using three float32 vectors. The corresponding FMA instruction is VFMSUB<XXX>PS, where XXX could be 132, 213, or 231.
_mm_fmsub_sd
Multiply-subtracts the lowest double-precision floating-point elements of three float64 vectors; the upper element of the result is copied from the first vector. The corresponding FMA instruction is VFMSUB<XXX>SD, where XXX could be 132, 213, or 231.
_mm_fmsub_ss
Multiply-subtracts the lowest single-precision floating-point elements of three float32 vectors; the upper three elements of the result are copied from the first vector. The corresponding FMA instruction is VFMSUB<XXX>SS, where XXX could be 132, 213, or 231.
_mm_fnmadd_pd, _mm256_fnmadd_pd
Multiplies packed double-precision floating-point values of the first two float64 vectors, negates the products, and adds the values of the third vector. The corresponding FMA instruction is VFNMADD<XXX>PD, where XXX could be 132, 213, or 231.
_mm_fnmadd_ps, _mm256_fnmadd_ps
Multiplies packed single-precision floating-point values of the first two float32 vectors, negates the products, and adds the values of the third vector. The corresponding FMA instruction is VFNMADD<XXX>PS, where XXX could be 132, 213, or 231.
_mm_fnmadd_sd
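Multiplies the lowest double-precision floating-point elements of the first two float64 vectors, negates the product, and adds the lowest element of the third vector; the upper element of the result is copied from the first vector. The corresponding FMA instruction is VFNMADD<XXX>SD, where XXX could be 132, 213, or 231.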
_mm_fnmadd_ss
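Multiplies the lowest single-precision floating-point elements of the first two float32 vectors, negates the product, and adds the lowest element of the third vector; the upper three elements of the result are copied from the first vector. The corresponding FMA instruction is VFNMADD<XXX>SS, where XXX could be 132, 213, or 231.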
_mm_fnmsub_pd, _mm256_fnmsub_pd
Multiplies packed double-precision floating-point values of the first two float64 vectors, negates the products, and subtracts the values of the third vector. The corresponding FMA instruction is VFNMSUB<XXX>PD, where XXX could be 132, 213, or 231.
_mm_fnmsub_ps, _mm256_fnmsub_ps
Multiplies packed single-precision floating-point values of the first two float32 vectors, negates the products, and subtracts the values of the third vector. The corresponding FMA instruction is VFNMSUB<XXX>PS, where XXX could be 132, 213, or 231.
_mm_fnmsub_sd
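Multiplies the lowest double-precision floating-point elements of the first two float64 vectors, negates the product, and subtracts the lowest element of the third vector; the upper element of the result is copied from the first vector. The corresponding FMA instruction is VFNMSUB<XXX>SD, where XXX could be 132, 213, or 231.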
_mm_fnmsub_ss
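Multiplies the lowest single-precision floating-point elements of the first two float32 vectors, negates the product, and subtracts the lowest element of the third vector; the upper three elements of the result are copied from the first vector. The corresponding FMA instruction is VFNMSUB<XXX>SS, where XXX could be 132, 213, or 231.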