.. _onemkl_blas_dgmm_batch:

dgmm_batch
==========

Computes a group of diagonal matrix-matrix product (``dgmm``) operations.

Description
***********

The ``dgmm_batch`` routines perform multiple diagonal matrix-matrix product
(``dgmm``) operations in a single call. The diagonal matrices are stored as
dense vectors, and the operations are performed on groups of matrices and
vectors.

``dgmm_batch`` supports the following precisions:

.. list-table::
   :header-rows: 1

   * - T
   * - ``float``
   * - ``double``
   * - ``std::complex<float>``
   * - ``std::complex<double>``

dgmm_batch (Buffer Version)
***************************

The buffer version of ``dgmm_batch`` supports only the strided API.

**Strided API**
---------------

The strided API operation is defined as:

.. code-block::

   for i = 0 … batch_size - 1
       A and C are matrices at offset i * stridea in a, i * stridec in c.
       X is a vector at offset i * stridex in x
       if (left_right == side::left)
           C = diag(X) * A
       else
           C = A * diag(X)
   end for

where:

- ``A`` is a matrix
- ``X`` is a diagonal matrix stored as a vector

For the strided API, all matrices ``A`` and ``C`` and all vectors ``X`` have
the same parameters (size, increments) and are stored at constant strides from
each other, given by ``stridea``, ``stridec`` and ``stridex`` respectively.
The ``a`` and ``x`` buffers contain all the input matrices and vectors. The
total number of matrices in ``a`` and of vectors in ``x`` is given by the
``batch_size`` parameter.

Syntax
------

.. code-block:: cpp

   namespace oneapi::mkl::blas::column_major {
       void dgmm_batch(sycl::queue &queue,
                       oneapi::mkl::side left_right,
                       std::int64_t m,
                       std::int64_t n,
                       sycl::buffer<T,1> &a,
                       std::int64_t lda,
                       std::int64_t stridea,
                       sycl::buffer<T,1> &x,
                       std::int64_t incx,
                       std::int64_t stridex,
                       sycl::buffer<T,1> &c,
                       std::int64_t ldc,
                       std::int64_t stridec,
                       std::int64_t batch_size);
   }

.. code-block:: cpp

   namespace oneapi::mkl::blas::row_major {
       void dgmm_batch(sycl::queue &queue,
                       oneapi::mkl::side left_right,
                       std::int64_t m,
                       std::int64_t n,
                       sycl::buffer<T,1> &a,
                       std::int64_t lda,
                       std::int64_t stridea,
                       sycl::buffer<T,1> &x,
                       std::int64_t incx,
                       std::int64_t stridex,
                       sycl::buffer<T,1> &c,
                       std::int64_t ldc,
                       std::int64_t stridec,
                       std::int64_t batch_size);
   }

Input Parameters
----------------

queue
   The queue where the routine should be executed.

left_right
   Specifies the position of the diagonal matrix in the product.
   See :ref:`data-types` for more details.

m
   Number of rows of matrix ``A`` and matrix ``C``. Must be at least zero.

n
   Number of columns of matrix ``A`` and matrix ``C``. Must be at least zero.

a
   Buffer holding the input matrices ``A``. Size of the buffer must be at
   least ``lda`` * ``k`` + ``stridea`` * (``batch_size`` - 1), where ``k`` is
   ``n`` if column major layout is used or ``m`` if row major layout is used.

lda
   Leading dimension of matrices ``A``. Must be at least ``m`` if column major
   layout is used or ``n`` if row major layout is used. Must be positive.

stridea
   Stride between two consecutive ``A`` matrices. Must be at least zero.
   See :ref:`matrix-storage` for more details.

x
   Buffer holding the input vectors ``X``. Size of the buffer must be at least
   (1 + (``len`` - 1) * abs(``incx``)) + ``stridex`` * (``batch_size`` - 1),
   where ``len`` is ``n`` if the diagonal matrix is on the right of the
   product or ``m`` otherwise.

incx
   Stride between two consecutive elements of the ``X`` vectors.

stridex
   Stride between two consecutive ``X`` vectors. Must be at least zero.
   See :ref:`matrix-storage` for more details.

c
   Buffer holding the input/output matrices ``C``. Size of the buffer must be
   at least ``batch_size`` * ``stridec``.

ldc
   Leading dimension of matrices ``C``. Must be at least ``m`` if column major
   layout is used or ``n`` if row major layout is used. Must be positive.

stridec
   Stride between two consecutive ``C`` matrices. Must be at least
   ``ldc`` * ``n`` if column major layout is used or ``ldc`` * ``m`` if row
   major layout is used. See :ref:`matrix-storage` for more details.

batch_size
   Number of ``dgmm`` computations to perform. Must be at least zero.

Output Parameters
-----------------

c
   Buffer holding the output matrices ``C``, overwritten by ``batch_size``
   ``dgmm`` operations.

dgmm_batch (USM Version)
************************

The USM version of ``dgmm_batch`` supports the group API and the strided API.

**Group API**
-------------

The group API operation is defined as:

.. code-block::

   idx = 0
   for i = 0 … group_count - 1
       for j = 0 … group_size[i] - 1
           A and C are matrices at a[idx] and c[idx]
           X is a vector at x[idx]
           if (left_right[i] == side::left)
               C = diag(X) * A
           else
               C = A * diag(X)
           idx = idx + 1
       end for
   end for

where:

- ``A`` is a matrix
- ``X`` is a diagonal matrix stored as a vector

For the group API, each group contains matrices and vectors with the same
parameters (size, increment). The ``a`` and ``x`` arrays contain the pointers
to all the input matrices and vectors. The total number of matrices in ``a``
and of vectors in ``x`` is given by:

.. math::

   total\_batch\_count = \sum_{i=0}^{group\_count-1}group\_size[i]

Syntax
------

.. code-block:: cpp

   namespace oneapi::mkl::blas::column_major {
       sycl::event dgmm_batch(sycl::queue &queue,
                              oneapi::mkl::side *left_right,
                              std::int64_t *m,
                              std::int64_t *n,
                              const T **a,
                              std::int64_t *lda,
                              const T **x,
                              std::int64_t *incx,
                              T **c,
                              std::int64_t *ldc,
                              std::int64_t group_count,
                              std::int64_t *group_size,
                              const std::vector<sycl::event> &dependencies = {});
   }

.. code-block:: cpp

   namespace oneapi::mkl::blas::row_major {
       sycl::event dgmm_batch(sycl::queue &queue,
                              oneapi::mkl::side *left_right,
                              std::int64_t *m,
                              std::int64_t *n,
                              const T **a,
                              std::int64_t *lda,
                              const T **x,
                              std::int64_t *incx,
                              T **c,
                              std::int64_t *ldc,
                              std::int64_t group_count,
                              std::int64_t *group_size,
                              const std::vector<sycl::event> &dependencies = {});
   }

Input Parameters
----------------

queue
   The queue where the routine should be executed.
left_right
   Array of ``group_count`` parameters. ``left_right[i]`` specifies the
   position of the diagonal matrix in group ``i``.
   See :ref:`data-types` for more details.

m
   Array of ``group_count`` integers. ``m[i]`` specifies the number of rows of
   ``A`` for every matrix in group ``i``. All entries must be at least zero.

n
   Array of ``group_count`` integers. ``n[i]`` specifies the number of columns
   of ``A`` for every matrix in group ``i``. All entries must be at least
   zero.

a
   Array of ``total_batch_count`` pointers to the input matrices ``A``. The
   array allocated for each matrix ``A`` in group ``i`` must have size at
   least ``lda[i]`` * ``n[i]`` if column major layout is used or at least
   ``lda[i]`` * ``m[i]`` if row major layout is used.
   See :ref:`matrix-storage` for more details.

lda
   Array of ``group_count`` integers. ``lda[i]`` specifies the leading
   dimension of ``A`` for every matrix in group ``i``. All entries must be
   positive and at least ``m[i]`` if column major layout is used or at least
   ``n[i]`` if row major layout is used.

x
   Array of ``total_batch_count`` pointers to the input vectors ``X``. The
   array allocated for each vector ``X`` in group ``i`` must have size at
   least (1 + (``len[i]`` - 1) * abs(``incx[i]``)), where ``len[i]`` is
   ``n[i]`` if the diagonal matrix is on the right of the product or ``m[i]``
   otherwise. See :ref:`matrix-storage` for more details.

incx
   Array of ``group_count`` integers. ``incx[i]`` specifies the stride of
   ``X`` for every vector in group ``i``. All entries must be positive.

c
   Array of ``total_batch_count`` pointers to the input/output matrices ``C``.
   The array allocated for each matrix ``C`` in group ``i`` must have size at
   least ``ldc[i]`` * ``n[i]`` if column major layout is used or at least
   ``ldc[i]`` * ``m[i]`` if row major layout is used.
   See :ref:`matrix-storage` for more details.

ldc
   Array of ``group_count`` integers. ``ldc[i]`` specifies the leading
   dimension of ``C`` for every matrix in group ``i``. All entries must be
   positive and at least ``m[i]`` if column major layout is used or at least
   ``n[i]`` if row major layout is used.

group_count
   Specifies the number of groups. Must be at least zero.
group_size
   Array of ``group_count`` integers. ``group_size[i]`` specifies the number
   of diagonal matrix-matrix product operations in group ``i``. All entries
   must be at least zero.

dependencies
   List of events to wait for before starting computation, if any.
   If omitted, defaults to no dependencies.

Output Parameters
-----------------

c
   Array of pointers to the output matrices ``C``, overwritten by
   ``total_batch_count`` ``dgmm`` operations.

Return Values
-------------

Output event to wait on to ensure computation is complete.

**Strided API**
---------------

The strided API operation is defined as:

.. code-block::

   for i = 0 … batch_size - 1
       A and C are matrices at offset i * stridea in a, i * stridec in c.
       X is a vector at offset i * stridex in x
       if (left_right == side::left)
           C = diag(X) * A
       else
           C = A * diag(X)
   end for

where:

- ``A`` is a matrix
- ``X`` is a diagonal matrix stored as a vector

For the strided API, all matrices ``A`` and ``C`` and all vectors ``X`` have
the same parameters (size, increments) and are stored at constant strides from
each other, given by ``stridea``, ``stridec`` and ``stridex`` respectively.
The ``a`` and ``x`` arrays contain all the input matrices and vectors. The
total number of matrices in ``a`` and of vectors in ``x`` is given by the
``batch_size`` parameter.

Syntax
------

.. code-block:: cpp

   namespace oneapi::mkl::blas::column_major {
       sycl::event dgmm_batch(sycl::queue &queue,
                              oneapi::mkl::side left_right,
                              std::int64_t m,
                              std::int64_t n,
                              const T *a,
                              std::int64_t lda,
                              std::int64_t stridea,
                              const T *x,
                              std::int64_t incx,
                              std::int64_t stridex,
                              T *c,
                              std::int64_t ldc,
                              std::int64_t stridec,
                              std::int64_t batch_size,
                              const std::vector<sycl::event> &dependencies = {});
   }

.. code-block:: cpp

   namespace oneapi::mkl::blas::row_major {
       sycl::event dgmm_batch(sycl::queue &queue,
                              oneapi::mkl::side left_right,
                              std::int64_t m,
                              std::int64_t n,
                              const T *a,
                              std::int64_t lda,
                              std::int64_t stridea,
                              const T *x,
                              std::int64_t incx,
                              std::int64_t stridex,
                              T *c,
                              std::int64_t ldc,
                              std::int64_t stridec,
                              std::int64_t batch_size,
                              const std::vector<sycl::event> &dependencies = {});
   }

Input Parameters
----------------

queue
   The queue where the routine should be executed.

left_right
   Specifies the position of the diagonal matrix in the product.
   See :ref:`data-types` for more details.

m
   Number of rows of matrix ``A`` and matrix ``C``. Must be at least zero.

n
   Number of columns of matrix ``A`` and matrix ``C``. Must be at least zero.

a
   Pointer to the input matrices ``A``. Size of the array must be at least
   ``lda`` * ``k`` + ``stridea`` * (``batch_size`` - 1), where ``k`` is ``n``
   if column major layout is used or ``m`` if row major layout is used.

lda
   Leading dimension of matrices ``A``. Must be at least ``m`` if column major
   layout is used or ``n`` if row major layout is used. Must be positive.

stridea
   Stride between two consecutive ``A`` matrices. Must be at least zero.
   See :ref:`matrix-storage` for more details.

x
   Pointer to the input vectors ``X``. Size of the array must be at least
   (1 + (``len`` - 1) * abs(``incx``)) + ``stridex`` * (``batch_size`` - 1),
   where ``len`` is ``n`` if the diagonal matrix is on the right of the
   product or ``m`` otherwise.

incx
   Stride between two consecutive elements of the ``X`` vectors.

stridex
   Stride between two consecutive ``X`` vectors. Must be at least zero.
   See :ref:`matrix-storage` for more details.

c
   Pointer to the input/output matrices ``C``. Size of the array must be at
   least ``batch_size`` * ``stridec``.

ldc
   Leading dimension of matrices ``C``. Must be at least ``m`` if column major
   layout is used or ``n`` if row major layout is used. Must be positive.

stridec
   Stride between two consecutive ``C`` matrices. Must be at least
   ``ldc`` * ``n`` if column major layout is used or ``ldc`` * ``m`` if row
   major layout is used. See :ref:`matrix-storage` for more details.

batch_size
   Number of ``dgmm`` computations to perform. Must be at least zero.

dependencies
   List of events to wait for before starting computation, if any.
   If omitted, defaults to no dependencies.

Output Parameters
-----------------

c
   Pointer to the output matrices ``C``, overwritten by ``batch_size``
   ``dgmm`` operations.

Return Values
-------------

Output event to wait on to ensure computation is complete.
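
To make the strided operation defined above concrete, the following is a
minimal host-side reference sketch in plain C++ for the column major layout.
It is not the oneMKL implementation and does not call the library: ``side`` is
a local stand-in for ``oneapi::mkl::side``, and ``dgmm_batch_ref``,
``min_x_size`` and ``run_example`` are hypothetical names introduced only for
this illustration. The sketch assumes that ``incx`` is positive and that ``m``,
``n`` and ``batch_size`` are at least one.

.. code-block:: cpp

   #include <cassert>
   #include <cstdint>
   #include <cstdlib>
   #include <vector>

   // Hypothetical stand-in for oneapi::mkl::side so the sketch builds without oneMKL.
   enum class side { left, right };

   // Minimum element count of x, following the size formula given for the strided API.
   inline std::int64_t min_x_size(side left_right, std::int64_t m, std::int64_t n,
                                  std::int64_t incx, std::int64_t stridex,
                                  std::int64_t batch_size) {
       const std::int64_t len = (left_right == side::right) ? n : m;
       return (1 + (len - 1) * std::abs(incx)) + stridex * (batch_size - 1);
   }

   // Reference loop for the column major strided operation (assumes incx > 0).
   template <typename T>
   void dgmm_batch_ref(side left_right, std::int64_t m, std::int64_t n,
                       const T *a, std::int64_t lda, std::int64_t stridea,
                       const T *x, std::int64_t incx, std::int64_t stridex,
                       T *c, std::int64_t ldc, std::int64_t stridec,
                       std::int64_t batch_size) {
       assert(incx > 0 && lda >= m && ldc >= m);
       for (std::int64_t i = 0; i < batch_size; ++i) {
           const T *A = a + i * stridea;  // i-th input matrix
           const T *X = x + i * stridex;  // i-th diagonal, stored as a vector
           T *C = c + i * stridec;        // i-th output matrix
           for (std::int64_t col = 0; col < n; ++col) {
               for (std::int64_t row = 0; row < m; ++row) {
                   // diag(X) * A scales each row; A * diag(X) scales each column.
                   const std::int64_t k = (left_right == side::left) ? row : col;
                   C[row + col * ldc] = X[k * incx] * A[row + col * lda];
               }
           }
       }
   }

   // Two 2 x 2 problems packed back to back; returns the contents of c.
   inline std::vector<double> run_example(side left_right) {
       const std::int64_t m = 2, n = 2, batch_size = 2;
       const std::int64_t lda = 2, stridea = 4, ldc = 2, stridec = 4;
       const std::int64_t incx = 1, stridex = 2;
       const std::vector<double> a = {1, 2, 3, 4, 5, 6, 7, 8};
       const std::vector<double> x = {10, 100, 2, 3};
       assert(static_cast<std::int64_t>(x.size()) >=
              min_x_size(left_right, m, n, incx, stridex, batch_size));
       std::vector<double> c(batch_size * stridec, 0.0);
       dgmm_batch_ref(left_right, m, n, a.data(), lda, stridea,
                      x.data(), incx, stridex, c.data(), ldc, stridec, batch_size);
       return c;
   }

With these inputs, ``run_example(side::left)`` yields
``{10, 200, 30, 400, 10, 18, 14, 24}`` because each row of ``A`` is scaled by
the matching element of ``X``, while ``run_example(side::right)`` yields
``{10, 20, 300, 400, 10, 12, 21, 24}`` because each column is scaled instead.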