Load Intrinsics

The prototypes for Intel® Streaming SIMD Extensions (Intel® SSE) intrinsics for load operations are in the xmmintrin.h header file.

Each intrinsic places its result in a register, shown for each intrinsic as R0-R3. R0, R1, R2, and R3 each represent one of the four 32-bit pieces of the result register, with R0 the lowest 32 bits.

Intrinsic Name	Operation	Corresponding Intel® SSE Instructions
_mm_loadh_pi	Load high	MOVHPS reg, mem
_mm_loadl_pi	Load low	MOVLPS reg, mem
_mm_load_ss	Load the low value and clear the three high values	MOVSS
_mm_load1_ps	Load one value into all four words	MOVSS + Shuffling
_mm_load_ps	Load four values, address aligned	MOVAPS
_mm_loadu_ps	Load four values, address unaligned	MOVUPS
_mm_loadr_ps	Load four values in reverse	MOVAPS + Shuffling

__m128 _mm_loadh_pi(__m128 a, __m64 const *p)

Sets the upper two single-precision floating-point (SP FP) values with 64 bits of data loaded from the address p; the lower two values are passed through from a.

R0	R1	R2	R3
a0	a1	*p0	*p1
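
As a rough illustration of the layout above, the following sketch replaces only the upper half of a vector. The helper name load_high_pair and the cast from a float pointer to an __m64 pointer are assumptions made for this example, not part of the intrinsic's definition.

```c
#include <assert.h>
#include <stddef.h>
#include <xmmintrin.h>

/* Hypothetical helper: keep R0 and R1 of a, and replace R2 and R3 with
   pair[0] and pair[1]. The 64-bit load does not require 16-byte alignment. */
static __m128 load_high_pair(__m128 a, const float *pair)
{
    assert(pair != NULL);
    return _mm_loadh_pi(a, (const __m64 *)pair);
}
```

Calling load_high_pair on a vector holding 1, 2, 3, 4 (R0 to R3) with a pair holding 5, 6 should give 1, 2, 5, 6.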

__m128 _mm_loadl_pi(__m128 a, __m64 const *p)

Sets the lower two SP FP values with 64 bits of data loaded from the address p; the upper two values are passed through from a.

R0	R1	R2	R3
*p0	*p1	a2	a3
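
One common use of the two half-load intrinsics is to assemble a full vector from two pairs of floats that are not adjacent in memory. The sketch below assumes a hypothetical helper named load_two_pairs; the pointer casts are likewise an assumption of this example.

```c
#include <assert.h>
#include <stddef.h>
#include <xmmintrin.h>

/* Hypothetical helper: build { lo[0], lo[1], hi[0], hi[1] } from two
   separate float pairs. The low half is loaded into a zeroed vector
   first, then the high half is loaded on top of that result. */
static __m128 load_two_pairs(const float *lo, const float *hi)
{
    assert(lo != NULL && hi != NULL);
    __m128 v = _mm_setzero_ps();
    v = _mm_loadl_pi(v, (const __m64 *)lo);   /* R0, R1 <- lo[0], lo[1] */
    v = _mm_loadh_pi(v, (const __m64 *)hi);   /* R2, R3 <- hi[0], hi[1] */
    return v;
}
```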

__m128 _mm_load_ss(float * p)

Loads an SP FP value into the low word and clears the upper three words.

R0	R1	R2	R3
*p	0.0	0.0	0.0
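
A minimal sketch of the behavior described above, assuming a hypothetical helper named load_low_scalar that writes the four result lanes to an output array so they can be inspected:

```c
#include <assert.h>
#include <stddef.h>
#include <xmmintrin.h>

/* Hypothetical helper: load *p into R0, zero R1..R3, and copy the four
   lanes to out[0..3] for inspection. */
static void load_low_scalar(const float *p, float out[4])
{
    assert(p != NULL);
    __m128 v = _mm_load_ss(p);
    _mm_storeu_ps(out, v);
}
```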

__m128 _mm_load1_ps(float * p)

Loads a single SP FP value, copying it into all four words.

R0	R1	R2	R3
*p	*p	*p	*p
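
Broadcasting a scalar is typically useful when one value must be combined with every element of a vector. The following sketch scales four floats by a single factor; the helper name scale4 and the choice of unaligned loads and stores for the array operands are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <xmmintrin.h>

/* Hypothetical helper: dst[i] = src[i] * (*factor) for i = 0..3.
   _mm_load1_ps copies *factor into all four lanes so that one
   multiply handles all four elements. */
static void scale4(float dst[4], const float src[4], const float *factor)
{
    assert(factor != NULL);
    __m128 f = _mm_load1_ps(factor);
    __m128 v = _mm_loadu_ps(src);
    _mm_storeu_ps(dst, _mm_mul_ps(v, f));
}
```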

__m128 _mm_load_ps(float * p)

Loads four SP FP values. The address must be 16-byte-aligned.

R0	R1	R2	R3
p[0]	p[1]	p[2]	p[3]
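
Because the aligned form requires a 16-byte-aligned address, it can help to check that precondition in debug builds. The sketch below assumes a hypothetical helper named load4_aligned; the assertion is an illustrative safeguard, not something the intrinsic performs itself.

```c
#include <assert.h>
#include <stdint.h>
#include <xmmintrin.h>

/* Hypothetical helper: load p[0..3] from an address that the caller
   guarantees is 16-byte-aligned. The assert documents and checks that
   guarantee when assertions are enabled. */
static __m128 load4_aligned(const float *p)
{
    assert(((uintptr_t)p & 15u) == 0);
    return _mm_load_ps(p);
}
```

A caller might obtain suitably aligned storage with a C11 declaration such as `_Alignas(16) float buf[4];`.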

__m128 _mm_loadu_ps(float * p)

Loads four SP FP values. The address need not be 16-byte-aligned.

R0	R1	R2	R3
p[0]	p[1]	p[2]	p[3]
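
The unaligned form is convenient when the starting address is offset by an arbitrary number of elements. As a sketch, the hypothetical helper adjacent_sums below loads two overlapping windows, x[0..3] and x[1..4], at least one of which cannot be 16-byte-aligned.

```c
#include <assert.h>
#include <stddef.h>
#include <xmmintrin.h>

/* Hypothetical helper: out[i] = x[i] + x[i + 1] for i = 0..3.
   x must point to at least five floats. */
static void adjacent_sums(const float *x, float out[4])
{
    assert(x != NULL);
    __m128 a = _mm_loadu_ps(x);       /* x[0], x[1], x[2], x[3] */
    __m128 b = _mm_loadu_ps(x + 1);   /* x[1], x[2], x[3], x[4] */
    _mm_storeu_ps(out, _mm_add_ps(a, b));
}
```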

__m128 _mm_loadr_ps(float * p)

Loads four SP FP values in reverse order. The address must be 16-byte-aligned.

R0	R1	R2	R3
p[3]	p[2]	p[1]	p[0]
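
The reversed load can be pictured as an aligned load followed by a shuffle, as the instruction table suggests. The sketch below shows both forms side by side; the helper names reverse4_loadr and reverse4_shuffle are assumptions for this example, and the exact instruction sequence a compiler emits may differ.

```c
#include <assert.h>
#include <stdint.h>
#include <xmmintrin.h>

/* Hypothetical helper: reversed load using the intrinsic directly.
   p must be 16-byte-aligned. */
static __m128 reverse4_loadr(const float *p)
{
    assert(((uintptr_t)p & 15u) == 0);
    return _mm_loadr_ps(p);
}

/* Hypothetical helper: the same result written as an aligned load plus
   a shuffle that selects lanes 3, 2, 1, 0 for R0, R1, R2, R3. */
static __m128 reverse4_shuffle(const float *p)
{
    assert(((uintptr_t)p & 15u) == 0);
    __m128 v = _mm_load_ps(p);
    return _mm_shuffle_ps(v, v, _MM_SHUFFLE(0, 1, 2, 3));
}
```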