Intel® oneAPI Math Kernel Library Developer Reference - C
You can use a two-stage algorithm in the Inspector-executor Sparse BLAS routines that produce a sparse matrix. The applicable routines include mkl_sparse_sp2m, which is used in the examples on this page.
The two-stage algorithm lets you split the computation into stages. The main purpose of the split is to estimate the memory required for the output before the largest part of that memory is allocated (the arrays holding the indices and values of the non-zero elements). The staged approach also supports additional usage models, such as computing only the structure of the output matrix without its values.
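The following minimal sketch in plain C, independent of oneMKL, illustrates what a counting stage computes. It is a symbolic pass for C = A*B that fills the row pointers of C and returns its number of non-zeros without touching any values. It assumes zero-based, three-array CSR inputs. The helper name `symbolic_spgemm` is hypothetical, and the code is not the oneMKL implementation.

```c
#include <stdlib.h>

/* Hypothetical helper (not an oneMKL routine): symbolic stage of C = A*B for
 * zero-based, three-array CSR matrices. A is m x k, B is k x n.
 * Fills row_ptr_c (length m+1) and returns nnz(C), or -1 on allocation failure.
 * No column indices or values of C are allocated or computed here. */
int symbolic_spgemm(int m, int n,
                    const int *row_ptr_a, const int *col_a,
                    const int *row_ptr_b, const int *col_b,
                    int *row_ptr_c)
{
    /* marker[j] == i means column j was already counted for row i of C */
    int *marker = malloc((size_t)(n > 0 ? n : 1) * sizeof *marker);
    if (marker == NULL) return -1;
    for (int j = 0; j < n; ++j) marker[j] = -1;

    row_ptr_c[0] = 0;
    for (int i = 0; i < m; ++i) {
        int count = 0;
        for (int pa = row_ptr_a[i]; pa < row_ptr_a[i + 1]; ++pa) {
            int k = col_a[pa];                  /* A(i,k) is non-zero */
            for (int pb = row_ptr_b[k]; pb < row_ptr_b[k + 1]; ++pb) {
                int j = col_b[pb];              /* B(k,j) is non-zero */
                if (marker[j] != i) {           /* first contribution to C(i,j) */
                    marker[j] = i;
                    ++count;
                }
            }
        }
        row_ptr_c[i + 1] = row_ptr_c[i] + count;
    }
    free(marker);
    return row_ptr_c[m];
}
```

Consider A = [1 2 0; 0 0 3] and B = [4 0; 0 5; 6 0]. For these inputs, the call fills `row_ptr_c` with {0, 2, 3} and returns 3. Only an `m+1` index array and an `n`-length work array are needed at this point. The larger index and value arrays can then be sized exactly.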
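In the two-stage algorithm, the first stage allocates and computes only the rows_start/rows_end (or cols_start/cols_end) arrays of the output matrix, which determine its number of non-zero elements. The second stage allocates and computes the remaining arrays for the indices and values of the non-zero elements.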
You specify the stage to execute through the sparse_request_t parameter, which accepts the following values:
Value | Description |
---|---|
SPARSE_STAGE_NNZ_COUNT | Allocates and computes only the rows_start/rows_end (CSR/BSR format) or cols_start/cols_end (CSC format) arrays for the output matrix. After this stage, you can call mkl_sparse_?_export_<format> to obtain the number of non-zero elements in the output matrix and calculate the memory it requires. |
SPARSE_STAGE_FINALIZE_MULT_NO_VAL | Allocates and computes row/column indices provided that rows_start/rows_end or cols_start/cols_end have already been computed in a prior call with the request SPARSE_STAGE_NNZ_COUNT. The values of the output matrix are not computed. |
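SPARSE_STAGE_FINALIZE_MULT | Depending on the state of the output matrix C on entry to the routine, this stage does one of the following. If only rows_start/rows_end or cols_start/cols_end have been computed (by SPARSE_STAGE_NNZ_COUNT), it allocates and computes the row/column indices and the values of the non-zero elements. If the row/column indices have also been computed (by SPARSE_STAGE_FINALIZE_MULT_NO_VAL), it allocates and computes only the values. |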
SPARSE_STAGE_FULL_MULT_NO_VAL | Allocates and computes the output matrix structure in a single step. The values of the output matrix are not computed. |
SPARSE_STAGE_FULL_MULT | Allocates and computes the entire output matrix (structure and values) in a single step. |
The example below shows how you can use the two-stage approach for estimating the memory requirements for the output matrix in CSR format:
First stage (sparse_request_t = SPARSE_STAGE_NNZ_COUNT)
Note that by the end of the first stage, the arrays associated with column indices and values of the output matrix have not been allocated or computed yet.
    sparse_status_t status;
    sparse_index_base_t indexing;
    MKL_INT nrows, ncols, *rows_start, *rows_end, *col_indx;
    double *values;   /* double precision shown; s, c, z variants exist */

    sparse_matrix_t csrC = NULL;
    status = mkl_sparse_sp2m(opA, descrA, csrA, opB, descrB, csrB,
                             SPARSE_STAGE_NNZ_COUNT, &csrC);

    /* optional: compute nnz of the output matrix to get a memory estimate */
    status = mkl_sparse_d_export_csr(csrC, &indexing, &nrows, &ncols,
                                     &rows_start, &rows_end, &col_indx, &values);
    MKL_INT nnz = rows_end[nrows-1] - rows_start[0];
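The memory estimate after the first stage is simple arithmetic on the exported row pointers. Here is a minimal plain-C sketch with hypothetical helper names. In it, `index_t` stands in for `MKL_INT`, which is a 32-bit `int` in the LP64 interface and a 64-bit integer in the ILP64 interface. `value_t` stands in for `double`. The sketch assumes the rows are stored contiguously (rows_end[i] == rows_start[i+1]).

```c
#include <stddef.h>

/* Hypothetical stand-ins: index_t for MKL_INT (LP64 assumed), value_t for double */
typedef int index_t;
typedef double value_t;

/* Number of stored entries. rows_start[0] equals the index base (0 or 1),
 * so the difference is correct for either base when rows are contiguous. */
index_t nnz_from_row_pointers(index_t nrows,
                              const index_t *rows_start, const index_t *rows_end)
{
    return rows_end[nrows - 1] - rows_start[0];
}

/* Bytes still to be allocated by the second stage:
 * one column index and one value per non-zero element. */
size_t remaining_bytes(index_t nnz)
{
    return (size_t)nnz * (sizeof(index_t) + sizeof(value_t));
}
```

For example, consider a one-based 3x3 result whose rows hold 2, 0, and 3 entries. Its row pointers are rows_start = {1, 3, 3} and rows_end = {3, 3, 6}. That gives nnz = 6 - 1 = 5. With 4-byte indices and 8-byte values, the remaining arrays need 5 * 12 = 60 bytes.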
Second stage (sparse_request_t = SPARSE_STAGE_FINALIZE_MULT)
This stage allocates and computes the remaining output arrays (associated with column indices and values of output matrix entries) and completes the matrix-matrix multiplication.
    status = mkl_sparse_sp2m (opA, descrA, csrA, opB, descrB, csrB, SPARSE_STAGE_FINALIZE_MULT, &csrC);
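The following minimal sketch in plain C, independent of oneMKL, illustrates what a finalize stage does once the row pointers of C are known. Because nnz(C) = row_ptr_c[m] is already fixed, the index and value arrays are allocated exactly once, with no reallocation. The function name `finalize_spgemm` is hypothetical. The sketch assumes zero-based, three-array CSR inputs, and it leaves the columns of each row in first-touch order, which is not necessarily sorted. It is not the oneMKL implementation.

```c
#include <stdlib.h>

/* Hypothetical finalize stage for C = A*B (zero-based, three-array CSR).
 * row_ptr_c (length m+1) must hold the row pointers of C from a prior
 * counting stage. Allocates *col_c and *val_c with exactly row_ptr_c[m]
 * entries and fills them. Returns 0 on success, -1 on allocation failure. */
int finalize_spgemm(int m, int n,
                    const int *row_ptr_a, const int *col_a, const double *val_a,
                    const int *row_ptr_b, const int *col_b, const double *val_b,
                    const int *row_ptr_c, int **col_c, double **val_c)
{
    int nnz = row_ptr_c[m];
    size_t cap = (size_t)(nnz > 0 ? nnz : 1);            /* avoid malloc(0) */
    int *pos = malloc((size_t)(n > 0 ? n : 1) * sizeof *pos);
    int *cols = malloc(cap * sizeof *cols);
    double *vals = malloc(cap * sizeof *vals);
    if (!pos || !cols || !vals) { free(pos); free(cols); free(vals); return -1; }
    for (int j = 0; j < n; ++j) pos[j] = -1;              /* pos[j]: slot of C(i,j) */

    for (int i = 0; i < m; ++i) {
        int next = row_ptr_c[i];                          /* next free slot in row i */
        for (int pa = row_ptr_a[i]; pa < row_ptr_a[i + 1]; ++pa) {
            int k = col_a[pa];
            for (int pb = row_ptr_b[k]; pb < row_ptr_b[k + 1]; ++pb) {
                int j = col_b[pb];
                if (pos[j] < row_ptr_c[i]) {              /* C(i,j) not yet in row i */
                    pos[j] = next++;
                    cols[pos[j]] = j;
                    vals[pos[j]] = 0.0;
                }
                vals[pos[j]] += val_a[pa] * val_b[pb];    /* accumulate A(i,k)*B(k,j) */
            }
        }
    }
    free(pos);
    *col_c = cols;
    *val_c = vals;
    return 0;
}
```

Take A = [1 2 0; 0 0 3], B = [4 0; 0 5; 6 0], and precomputed row pointers {0, 2, 3}. The call produces column indices {0, 1, 0} and values {4, 10, 18}, that is, C = [4 10; 18 0].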
When the two-stage approach is not needed, you can perform both stages in a single call:
Single stage operation (sparse_request_t = SPARSE_STAGE_FULL_MULT)
    status = mkl_sparse_sp2m (opA, descrA, csrA, opB, descrB, csrB, SPARSE_STAGE_FULL_MULT, &csrC);