Two-stage Algorithm in Inspector-Executor Sparse BLAS Routines

You can use a two-stage algorithm in the Inspector-Executor Sparse BLAS routines that produce a sparse matrix. The applicable routines are:

mkl_sparse_sp2m (BSR/CSR/CSC formats)
mkl_sparse_sypr (CSR format)

The two-stage algorithm allows you to split computations into stages. The main purpose of the splitting is to provide an estimate for the memory required for the output prior to allocating the largest part of the memory (for the indices and values of the non-zero elements). Additionally, the two-stage approach extends the functionality and allows more complex usage models.

Note

The multistage approach currently does not allow you to allocate memory for the output matrix outside oneMKL.

In the two-stage algorithm:

The first stage allocates the data needed for memory estimation (the arrays rows_start/rows_end or cols_start/cols_end, depending on the format; see Sparse Matrix Storage Formats) and computes the number of entries or the full structure of the matrix.
Note
The format of the output is decided internally but can be checked using the export functionality mkl_sparse_?_export_<format>.
The second stage allocates data and computes column or row indices (depending on the format) of non-zero elements and/or values of the output matrix.
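
To make the split concrete, here is a minimal plain-C sketch of a two-stage sparse product C = A*B in CSR form. It is an illustration of the general count-then-fill idea only, not oneMKL's internal implementation: the `csr_t` type and the functions `stage_nnz_count` and `stage_finalize` are hypothetical names introduced for this example, and the sketch assumes zero-based indexing with a single row-pointer array instead of separate rows_start/rows_end arrays.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical minimal CSR container (zero-based, 3-array form). */
typedef struct {
    int nrows, ncols;
    int *row_ptr;    /* nrows + 1 entries; filled by stage 1 */
    int *col_idx;    /* nnz entries; filled by stage 2 */
    double *val;     /* nnz entries; filled by stage 2 */
} csr_t;

/* Allocate n ints set to -1 (at least one element, so n == 0 is safe). */
static int *alloc_markers(int n)
{
    int *m = malloc((size_t)(n > 0 ? n : 1) * sizeof *m);
    for (int j = 0; m && j < n; ++j) m[j] = -1;
    return m;
}

/* Stage 1: allocate row_ptr and count the entries in each row of C.
 * Returns nnz(C), or -1 if an allocation fails. */
int stage_nnz_count(const csr_t *A, const csr_t *B, csr_t *C)
{
    assert(A->ncols == B->nrows);
    int *mark = alloc_markers(B->ncols);  /* mark[j] == i: column j seen in row i */
    int *row_ptr = malloc(((size_t)A->nrows + 1) * sizeof *row_ptr);
    if (!mark || !row_ptr) { free(mark); free(row_ptr); return -1; }

    row_ptr[0] = 0;
    for (int i = 0; i < A->nrows; ++i) {
        int count = 0;
        for (int p = A->row_ptr[i]; p < A->row_ptr[i + 1]; ++p) {
            int k = A->col_idx[p];
            for (int q = B->row_ptr[k]; q < B->row_ptr[k + 1]; ++q) {
                int j = B->col_idx[q];
                if (mark[j] != i) { mark[j] = i; ++count; }
            }
        }
        row_ptr[i + 1] = row_ptr[i] + count;
    }
    free(mark);
    C->nrows = A->nrows; C->ncols = B->ncols;
    C->row_ptr = row_ptr; C->col_idx = NULL; C->val = NULL;
    return row_ptr[A->nrows];
}

/* Stage 2: allocate col_idx/val using the counts from stage 1 and fill them.
 * Returns 0 on success, -1 if an allocation fails. */
int stage_finalize(const csr_t *A, const csr_t *B, csr_t *C)
{
    int nnz = C->row_ptr[C->nrows];
    size_t n = (size_t)(nnz > 0 ? nnz : 1);
    int *pos = alloc_markers(B->ncols);   /* pos[j] >= row start: slot of column j */
    int *col_idx = malloc(n * sizeof *col_idx);
    double *val = malloc(n * sizeof *val);
    if (!pos || !col_idx || !val) { free(pos); free(col_idx); free(val); return -1; }

    for (int i = 0; i < A->nrows; ++i) {
        int start = C->row_ptr[i], next = start;
        for (int p = A->row_ptr[i]; p < A->row_ptr[i + 1]; ++p) {
            int k = A->col_idx[p];
            for (int q = B->row_ptr[k]; q < B->row_ptr[k + 1]; ++q) {
                int j = B->col_idx[q];
                if (pos[j] < start) {     /* first time column j appears in row i */
                    pos[j] = next; col_idx[next] = j; val[next] = 0.0; ++next;
                }
                val[pos[j]] += A->val[p] * B->val[q];
            }
        }
    }
    free(pos);
    C->col_idx = col_idx; C->val = val;
    return 0;
}
```

Between the two calls, the caller knows the exact number of entries and can decide whether the remaining allocation is affordable. In this sketch the column indices within a row appear in discovery order and are not sorted.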

You specify the stage to execute through the sparse_request_t parameter of the routine, which accepts the following values:

Values for sparse_request_t parameter
Value	Description
`SPARSE_STAGE_NNZ_COUNT`	Allocates and computes only the `rows_start`/`rows_end` (CSR/BSR format) or `cols_start`/`cols_end` (CSC format) arrays for the output matrix. After this stage, by calling mkl_sparse_?_export_<format>, you can obtain the number of non-zeros in the output matrix and calculate the amount of memory required for the output matrix.
`SPARSE_STAGE_FINALIZE_MULT_NO_VAL`	Allocates and computes row/column indices provided that `rows_start`/`rows_end` or `cols_start`/`cols_end` have already been computed in a prior call with the request `SPARSE_STAGE_NNZ_COUNT`. The values of the output matrix are not computed.
`SPARSE_STAGE_FINALIZE_MULT`	Depending on the state of the output matrix `C` on entry to the routine, this stage does one of the following: (1) allocates and computes row/column indices and values of non-zero elements, if only `rows_start`/`rows_end` or `cols_start`/`cols_end` are present; (2) allocates and computes values of non-zero elements, if `rows_start`/`rows_end` or `cols_start`/`cols_end` and the row/column indices of non-zero elements are present.
`SPARSE_STAGE_FULL_MULT_NO_VAL`	Allocates and computes the output matrix structure in a single step. The values of the output matrix are not computed.
`SPARSE_STAGE_FULL_MULT`	Allocates and computes the entire output matrix (structure and values) in a single step.
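
The table can be read as a small state model: each request adds some parts of the output matrix, and the finalize requests require that the pointer arrays already exist. The following C sketch encodes that reading. All names (`request_t`, `out_state_t`, `apply_request`) are hypothetical and are not part of oneMKL. The sketch also assumes that a request whose precondition is not met is simply rejected and that `REQ_NNZ_COUNT` starts from a fresh output; the table does not describe how the library itself behaves in those cases.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mirror of the request values in the table (not the oneMKL enum). */
typedef enum {
    REQ_NNZ_COUNT,
    REQ_FINALIZE_MULT_NO_VAL,
    REQ_FINALIZE_MULT,
    REQ_FULL_MULT_NO_VAL,
    REQ_FULL_MULT
} request_t;

/* Which parts of the output matrix are currently present. */
typedef struct {
    bool has_ptr;   /* rows_start/rows_end or cols_start/cols_end */
    bool has_idx;   /* column or row indices of the non-zero elements */
    bool has_val;   /* values of the non-zero elements */
} out_state_t;

/* Apply a request to the model. Returns false when the precondition
 * stated in the table is not met (assumed behavior of this model only). */
bool apply_request(out_state_t *c, request_t req)
{
    assert(c != 0);
    switch (req) {
    case REQ_NNZ_COUNT:              /* pointer arrays only (fresh output assumed) */
        c->has_ptr = true; c->has_idx = false; c->has_val = false;
        return true;
    case REQ_FINALIZE_MULT_NO_VAL:   /* adds indices, never values */
        if (!c->has_ptr) return false;
        c->has_idx = true;
        return true;
    case REQ_FINALIZE_MULT:          /* adds whatever is still missing */
        if (!c->has_ptr) return false;
        c->has_idx = true; c->has_val = true;
        return true;
    case REQ_FULL_MULT_NO_VAL:       /* full structure in one step, no values */
        c->has_ptr = true; c->has_idx = true; c->has_val = false;
        return true;
    case REQ_FULL_MULT:              /* structure and values in one step */
        c->has_ptr = true; c->has_idx = true; c->has_val = true;
        return true;
    }
    return false;
}
```

In this model, the sequence `REQ_NNZ_COUNT`, `REQ_FINALIZE_MULT_NO_VAL`, `REQ_FINALIZE_MULT` ends in the same state as a single `REQ_FULL_MULT`.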

The example below shows how you can use the two-stage approach for estimating the memory requirements for the output matrix in CSR format:

First stage (sparse_request_t = SPARSE_STAGE_NNZ_COUNT)

The routine mkl_sparse_sp2m is called with the request parameter SPARSE_STAGE_NNZ_COUNT.
The arrays rows_start and rows_end are exported using the mkl_sparse_?_export_csr routine.
These arrays are used to calculate the number of non-zeros (nnz) of the resulting output matrix.

Note that by the end of the first stage, the arrays associated with column indices and values of the output matrix have not been allocated or computed yet.

sparse_matrix_t csrC = NULL;
status = mkl_sparse_sp2m (opA, descrA, csrA, opB, descrB, csrB, SPARSE_STAGE_NNZ_COUNT, &csrC);

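/* Optional: compute nnz of the output matrix to estimate the memory it needs. */
sparse_index_base_t indexing;
MKL_INT nrows, ncols, *rows_start, *rows_end, *col_indx;
double *values;

/* Double-precision variant shown; use the s/c/z variant that matches your data type.
   At this point col_indx and values have not been computed yet. */
status = mkl_sparse_d_export_csr (csrC, &indexing, &nrows, &ncols, &rows_start, &rows_end, &col_indx, &values);

MKL_INT nnz = rows_end[nrows-1] - rows_start[0];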
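
The memory estimate itself is simple arithmetic on the exported arrays. The helpers below are a sketch of that calculation in plain C, so they can be tried without linking against oneMKL. The function names are hypothetical, `long long` stands in for `MKL_INT`, and the estimate covers only the column-index and value arrays that the second stage still has to allocate.

```c
#include <assert.h>
#include <stddef.h>

/* Number of stored entries of a CSR matrix given its rows_start/rows_end
 * arrays. The difference is independent of zero- or one-based indexing. */
long long csr_nnz(const long long *rows_start, const long long *rows_end,
                  long long nrows)
{
    assert(nrows > 0);
    return rows_end[nrows - 1] - rows_start[0];
}

/* Bytes needed for the column-index and value arrays of nnz entries. */
size_t csr_remaining_bytes(long long nnz, size_t index_size, size_t value_size)
{
    assert(nnz >= 0);
    return (size_t)nnz * (index_size + value_size);
}
```

For example, a 3-row matrix with one-based arrays rows_start = {1, 3, 4} and rows_end = {3, 4, 7} has 6 entries. With 4-byte indices and 8-byte values, the second stage would need about 72 bytes for the remaining arrays.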

Second stage (sparse_request_t = SPARSE_STAGE_FINALIZE_MULT)

This stage allocates and computes the remaining output arrays (associated with column indices and values of output matrix entries) and completes the matrix-matrix multiplication.

status = mkl_sparse_sp2m (opA, descrA, csrA, opB, descrB, csrB, SPARSE_STAGE_FINALIZE_MULT, &csrC);

When the two-stage approach is not needed, you can perform both stages in a single call:

Single stage operation (sparse_request_t = SPARSE_STAGE_FULL_MULT)

status = mkl_sparse_sp2m (opA, descrA, csrA, opB, descrB, csrB, SPARSE_STAGE_FULL_MULT, &csrC);