PsychoacousticModelTwo

Implements ISO/IEC 11172-3 psychoacoustic model recommendation 2 to estimate masked threshold and perceptual entropy associated with a block of PCM audio input.

Syntax

IppStatus ippsPsychoacousticModelTwo_MP3_16s(const Ipp16s* pSrcPcm, IppMP3PsychoacousticModelTwoAnalysis* pDstPsyInfo, int* pDstIsSfbBound, IppMP3SideInfo* pDstSideInfo, IppMP3FrameHeader* pFrameHeader, IppMP3PsychoacousticModelTwoState* pFramePsyState, Ipp32s* pWorkBuffer, int pcmMode);

Parameters

pSrcPcm	Pointer to the start of the buffer containing the input PCM audio vector. The samples must be 16 bits per sample, signed, little-endian, Q15. The buffer must contain 1152 samples (two granules of 576 samples each) if pFrameHeader->mode has the value 1 (mono), or 2304 samples (two granules of 576 samples each for each of the two channels) if pFrameHeader->mode has the value 2 (stereo, dual mono). In the stereophonic case, the PCM samples of the left and right channels must be organized according to the pcmMode flag. If any of the above PCM format or buffer requirements is not met, the model outputs are undefined.
pDstPsyInfo	Pointer to the first element in a set of IppMP3PsychoacousticModelTwoAnalysis structures. Each set member contains the MSR and PE estimates for one granule of one channel. The number of elements in the set is equal to 2 times the number of channels, with the outputs arranged as follows: Analysis[0] = granule 1, channel 1; Analysis[1] = granule 1, channel 2; Analysis[2] = granule 2, channel 1; Analysis[3] = granule 2, channel 2.
pDstIsSfbBound	If intensity coding is enabled, pDstIsSfbBound points to the list of SFB lower bounds above which all spectral coefficients should be processed by the joint stereo intensity coding module. Because the intensity coding SFB lower bound is block-specific, the number of valid elements pointed to by pDstIsSfbBound depends on the block type of each granule. The list of SFB bounds is indexed as pDstIsSfbBound[3*gr] for long-block granules and pDstIsSfbBound[3*gr + w] for short-block granules, where gr is the granule index (0 indicates granule 1, 1 indicates granule 2) and w is the block index (0, 1, and 2 indicate blocks 1, 2, and 3).
	For example, given short-block analysis in granule 1 followed by long-block analysis in granule 2, the list of SFB bounds is generated in the following order: pDstIsSfbBound[] = {granule 1/block 1, granule 1/block 2, granule 1/block 3, granule 2/long block}.
	Only one SFB lower bound decision is generated for long block granules, whereas three are generated for short block granules. If both MS and intensity coding are enabled, then the SFB intensity coding lower bound simultaneously represents the upper bound SFB for MS coding. If only MS coding has been enabled, then the SFB bound represents the lowest non-MS SFB.
pDstSideInfo	Pointer to the updated set of IppMP3SideInfo structures associated with all granules and channels. The model updates the following fields in all set elements: blockType, winSwitch, and mixedBlock. The number of elements in the set is equal to 2 times the number of channels. The set elements are ordered in the same way as for pDstPsyInfo.
pFrameHeader	Pointer to the updated IppMP3FrameHeader structure that contains the header associated with the current frame. The model updates the element modeExt to reflect the joint stereo coding mode decision. No other frame header fields are modified by this function.
pFramePsyState	Pointer to the first element in a set of IppMP3PsychoacousticModelTwoState structures that contains the updated psychoacoustic model state information associated with both the current frame and the next frame. The number of elements in the set is equal to the number of channels in the input audio, because a separate analysis is carried out for each channel.
	Prior to encoding a new audio stream, all elements of the psychoacoustic model state set pointed to by pFramePsyState should be initialized to 0.
	You can do this with the signal processing function ippsZero_16s, for example: ippsZero_16s((Ipp16s*)pFramePsyState, nChannels*sizeof(IppMP3PsychoacousticModelTwoState)/sizeof(Ipp16s)), where nChannels is the number of input channels.
pWorkBuffer	Pointer to the workspace buffer internally used by the psychoacoustic model for storage of intermediate results and other temporary data. The buffer length must be at least 25,200 bytes, that is, 6300 elements of type Ipp32s.
pcmMode	PCM mode flag. Tells the psychoacoustic model which PCM vector organization to expect:
	pcmMode = 1 denotes non-interleaved PCM input samples, that is, pSrcPcm[0..1151] contains the input samples of the left channel and pSrcPcm[1152..2303] contains the input samples of the right channel.
	pcmMode = 2 denotes interleaved PCM input samples, that is, pSrcPcm[2*i] and pSrcPcm[2*i+1] contain the samples of the left and right channels, respectively, where i = 0, 1, ..., 1151.
	You can also use appropriately typecast elements ippMP3NonInterleavedPCM and ippMP3InterleavedPCM of the enumerated type IppMP3PcmMode as an alternative to the constants 1 and 2 for pcmMode.

Description

The function is declared in the ippac.h file. It implements the ISO/IEC 11172-3 psychoacoustic model recommendation 2 to estimate the masked threshold and perceptual entropy associated with a block of PCM audio input. The quantization process uses the model outputs to estimate a perceptually optimal bit allocation for the spectral coefficients generated by the analysis filterbank. The psychoacoustic model also controls stereophonic MS/intensity mode selection and processing, as well as analysis filterbank block size switching. Given one frame of PCM input audio of 1152 samples per channel, that is, two granules of 576 samples each, the psychoacoustic model generates the following outputs:

Estimated SFB (scale factor band) Mask-to-Signal ratios (MSRs). The model generates a vector of estimated MSRs for the 21 SFBs in long block mode, and for the 12 SFBs of each of three consecutive blocks in short block mode. The MSR is derived from the masked threshold, which quantifies the simultaneous masking power associated with one granule/channel (576 samples) of input audio. Given the properties of the audio stimulus presented to the listener, this threshold essentially quantifies the instantaneous (per-granule) modified threshold of hearing. Ideally, the threshold estimate provides a frequency-dependent intensity (dB SPL) profile beneath which an average listener cannot perceive quantization noise or, for that matter, any other spectral energy. To estimate the masked threshold from a block of input audio, the function ippsPsychoacousticModelTwo_MP3_16s implements the procedure recommended in Annex D.2 of ISO/IEC 11172-3. First, the output of a classical FFT-based spectral analysis is grouped into threshold calculation partitions that are organized to achieve sub-critical-bandwidth resolution. In each threshold calculation partition, the model estimates masking power using a weighted measure of tone-like versus noise-like signal behavior, determined by assessing the unpredictability of the spectrum across time. Second, a spreading function is applied to model the spectral selectivity of the auditory system. Finally, the estimated threshold is compared against the absolute threshold of hearing in quiet, and the maximum of the two is assigned to the threshold calculation partition. To match its output to the bit allocation scheme of the quantization module, the model then converts from the threshold calculation partition scale to a scale factor band (SFB) scale. One set of 21 SFB thresholds is generated for long blocks (576 samples), or three consecutive sets of 12 SFB thresholds are generated for short blocks (192 samples each).
To facilitate efficient quantization, the SFB thresholds are normalized by the SFB signal energy to yield a vector of SFB Mask-to-Signal ratios (MSRs), the inverse of the signal-to-mask ratios (SMRs). The estimated MSRs are returned in the IppMP3PsychoacousticModelTwoAnalysis structure.
Estimated perceptual entropy. The model generates a perceptual entropy (PE) estimate for each granule. The PE quantifies the minimum number of bits required to represent the PCM samples of the granule with “perceptual transparency”, that is, without audible loss of quality for an average listener in comparison to the original, uncoded version. The estimated PE is derived from the masked threshold in combination with the classical assumption that each additional bit improves the signal-to-noise ratio (SNR) by about 6 dB; the minimum required SNR, and hence the minimum required number of bits, for each SFB is derived from the signal-to-mask ratio (SMR). Perceptual entropy is used to control analysis filterbank block size switching, because sudden large PE increases are often associated with transient audio events that are prone to pre-echo distortion. The PE estimate is returned in the IppMP3PsychoacousticModelTwoAnalysis structure.
Analysis filterbank block size decision. Using perceptual entropy and other indicators, the model determines whether the current granule is susceptible to pre-echo distortion. The short block mode is preferred when pre-echoes are likely, and long blocks are used in all other cases. To ensure selection of the appropriate block type, the decision incorporates single-block look-ahead switching logic. For example, if the current block type is long and the next block type is short, the current block type is changed from “long/normal” to “long/start” to guarantee seamless block processing at the mode switch. Similarly, if the current block type has been designated as “long/stop” and the next block type is determined to be “short”, the block switching logic changes the current block from “long/stop” back to “short” to avoid unnecessary mode switching. The block type decision is returned in the IppMP3SideInfo structure for each granule.
Joint stereophonic processing mode decision. For 2-channel audio sources, the model evaluates interchannel correlations and other indicators in order to generate joint stereo LR/MS and/or intensity processing mode decisions. The joint stereo mode decision is returned in the modeExt field of the IppMP3FrameHeader structure.
Intensity stereo coding SFB bound decision. If intensity coding has been activated (see the preceding item joint stereophonic processing mode decision), the psychoacoustic model determines an appropriate lower SFB bound above which all spectral coefficients should be encoded using intensity mode stereophonic processing.

The psychoacoustic model performs analysis on a frame basis (1152 samples per channel), covering two granules and up to two channels for stereophonic or dual mono inputs. Valid lengths for both input and output vectors depend on whether mono or stereo (including dual mono) channel mode is enabled.

Return Values

ippStsNoErr	Indicates no error.
ippStsNullPtrErr	Indicates an error when at least one of the pointers pSrcPcm, pDstPsyInfo, pDstSideInfo, pDstIsSfbBound, pFrameHeader, pFramePsyState, or pWorkBuffer is NULL.