.. _onemkl_blas_dgmm_batch:

dgmm_batch
==========

Computes a group of diagonal matrix-matrix product (``dgmm``) operations.

Description
***********

The ``dgmm_batch`` routines perform multiple diagonal matrix-matrix product (``dgmm``) operations in a single call. 
The diagonal matrices are stored as dense vectors and the operations are performed with groups of matrices and vectors.

``dgmm_batch`` supports the following precisions:

.. list-table::
   :header-rows: 1

   * -  T
   * -  ``float``
   * -  ``double``
   * -  ``std::complex<float>``
   * -  ``std::complex<double>``

dgmm_batch (Buffer Version)
***************************

Buffer version of ``dgmm_batch`` supports only strided API. 

**Strided API**
---------------

Strided API operation is defined as:

.. code-block::

   for i = 0 … batch_size - 1
       A and C are matrices at offset i * stridea in a, i * stridec in c.
       X is a vector at offset i * stridex in x 
       if (left_right == side::left)
           C = diag(X) * A
       else
           C = A * diag(X)
   end for

where:

- ``A`` is a matrix

- ``X`` is a diagonal matrix stored as a vector

For the strided API, all matrices ``A`` and ``C`` and all vectors ``X`` share the same parameters (size, increments). Consecutive matrices or vectors are separated by a constant stride: ``stridea`` for ``A``, ``stridec`` for ``C``, and ``stridex`` for ``X``.

The ``a`` and ``x`` buffers contain all the input matrices and vectors. The total number of matrices in ``a`` and vectors in ``x`` is given by the ``batch_size`` parameter.
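
As an illustration only, the strided operation can be mirrored on the host in plain C++ for the column major layout. The sketch below does not use SYCL or oneMKL: the ``side`` enum and the function name ``dgmm_batch_strided_ref`` are hypothetical stand-ins, ``std::vector`` replaces the buffers, and ``incx`` is assumed to be positive.

.. code-block:: cpp

   #include <cassert>
   #include <cstdint>
   #include <vector>

   enum class side { left, right };  // hypothetical stand-in for oneapi::mkl::side

   // Host-side reference for the strided operation, column major layout.
   // Assumption: incx > 0 (negative increments are not handled in this sketch).
   template <typename T>
   void dgmm_batch_strided_ref(side left_right, std::int64_t m, std::int64_t n,
                               const std::vector<T> &a, std::int64_t lda, std::int64_t stridea,
                               const std::vector<T> &x, std::int64_t incx, std::int64_t stridex,
                               std::vector<T> &c, std::int64_t ldc, std::int64_t stridec,
                               std::int64_t batch_size) {
       assert(m >= 0 && n >= 0 && batch_size >= 0 && incx > 0);
       for (std::int64_t i = 0; i < batch_size; ++i) {
           const T *A = a.data() + i * stridea;  // i-th input matrix
           const T *X = x.data() + i * stridex;  // i-th diagonal, stored as a vector
           T *C = c.data() + i * stridec;        // i-th output matrix
           for (std::int64_t col = 0; col < n; ++col) {
               for (std::int64_t row = 0; row < m; ++row) {
                   // diag(X) * A scales each row by X[row];
                   // A * diag(X) scales each column by X[col].
                   const T d = (left_right == side::left) ? X[row * incx] : X[col * incx];
                   C[row + col * ldc] = d * A[row + col * lda];
               }
           }
       }
   }

With two 2-by-2 matrices stored back to back (``stridea`` = ``stridec`` = 4, ``stridex`` = 2), ``side::left`` multiplies row ``r`` of each matrix by the ``r``-th element of its own ``X`` vector, while ``side::right`` multiplies column ``c`` by the ``c``-th element.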

Syntax
------

.. code-block:: cpp

   namespace oneapi::mkl::blas::column_major {
       void dgmm_batch(sycl::queue &queue,
                       oneapi::mkl::side left_right,
                       std::int64_t m,
                       std::int64_t n,
                       sycl::buffer<T,1> &a, 
                       std::int64_t lda, 
                       std::int64_t stridea,
                       sycl::buffer<T,1> &x, 
                       std::int64_t incx, 
                       std::int64_t stridex,
                       sycl::buffer<T,1> &c, 
                       std::int64_t ldc, 
                       std::int64_t stridec,
                       std::int64_t batch_size);
   }

.. code-block:: cpp

   namespace oneapi::mkl::blas::row_major {
       void dgmm_batch(sycl::queue &queue,
                       oneapi::mkl::side left_right,
                       std::int64_t m,
                       std::int64_t n,
                       sycl::buffer<T,1> &a, 
                       std::int64_t lda, 
                       std::int64_t stridea,
                       sycl::buffer<T,1> &x, 
                       std::int64_t incx, 
                       std::int64_t stridex,
                       sycl::buffer<T,1> &c, 
                       std::int64_t ldc, 
                       std::int64_t stridec,
                       std::int64_t batch_size);
   }

Input Parameters
----------------

queue
   The queue where the routine should be executed.

left_right
   Specifies the position of the diagonal matrix in the product. See :ref:`data-types` for more details.

m
   Number of rows of matrix ``A`` and matrix ``C``. Must be at least zero.

n
   Number of columns of matrix ``A`` and matrix ``C``. Must be at least zero.

a
   Buffer holding input matrices ``A``. Size of the buffer must be at least ``lda`` * ``k`` + ``stridea`` * (``batch_size`` - 1) where ``k`` is ``n`` if column major layout or ``m`` if row major layout is used.

lda
   Leading dimension of matrices ``A``. Must be at least ``m`` if column major layout or ``n`` if row major layout is used. Must be positive. 

stridea
   Stride between two consecutive ``A`` matrices. Must be at least zero. See :ref:`matrix-storage` for more details.

x
   Buffer holding input matrices ``X``. Size of the buffer must be at least (1 + (``len`` - 1)*abs(``incx``)) + ``stridex`` * (``batch_size`` - 1) where ``len`` is ``n`` if the diagonal matrix is on the right of the product or ``m`` otherwise.

incx
   Stride between two consecutive elements of the ``X`` vectors.

stridex
   Stride between two consecutive ``X`` vectors.  Must be at least zero. See :ref:`matrix-storage` for more details.

c
   Buffer holding input/output matrices ``C``. Size of the buffer must be at least ``batch_size`` * ``stridec``.

ldc
   Leading dimension of matrices ``C``. Must be at least ``m`` if column major layout or ``n`` if row major layout is used. Must be positive.

stridec
   Stride between two consecutive ``C`` matrices. Must be at least ``ldc`` * ``n`` if column major layout or ``ldc`` * ``m`` if row major layout is used. See :ref:`matrix-storage` for more details.

batch_size
   Number of ``dgmm`` computations to perform. Must be at least zero.

Output Parameters
-----------------

c
   Buffer holding output matrices ``C`` overwritten by ``batch_size`` ``dgmm`` operations.
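
The minimum buffer sizes stated above for ``a``, ``x`` and ``c`` can be computed with a few helper functions. This is a sketch, not part of the oneMKL API: the ``layout`` and ``side`` enums and the ``min_*_size`` names are hypothetical, and the formulas are applied as written, assuming ``batch_size`` is at least one and the vector length ``len`` is at least one.

.. code-block:: cpp

   #include <cassert>
   #include <cstdint>
   #include <cstdlib>

   enum class layout { column_major, row_major };  // hypothetical helper enum
   enum class side { left, right };                // hypothetical stand-in for oneapi::mkl::side

   // lda * k + stridea * (batch_size - 1), with k = n (column major) or m (row major).
   std::int64_t min_a_size(layout l, std::int64_t m, std::int64_t n, std::int64_t lda,
                           std::int64_t stridea, std::int64_t batch_size) {
       assert(batch_size >= 1);
       const std::int64_t k = (l == layout::column_major) ? n : m;
       return lda * k + stridea * (batch_size - 1);
   }

   // (1 + (len - 1) * abs(incx)) + stridex * (batch_size - 1),
   // with len = n if the diagonal matrix is on the right, m otherwise.
   std::int64_t min_x_size(side left_right, std::int64_t m, std::int64_t n, std::int64_t incx,
                           std::int64_t stridex, std::int64_t batch_size) {
       assert(batch_size >= 1);
       const std::int64_t len = (left_right == side::right) ? n : m;
       assert(len >= 1);
       return (1 + (len - 1) * std::abs(incx)) + stridex * (batch_size - 1);
   }

   // batch_size * stridec
   std::int64_t min_c_size(std::int64_t stridec, std::int64_t batch_size) {
       assert(batch_size >= 1);
       return batch_size * stridec;
   }

For example, with ``m`` = 3, ``n`` = 4, ``lda`` = 3, ``stridea`` = 12 and ``batch_size`` = 5 in column major layout, the ``a`` buffer needs at least 3 * 4 + 12 * 4 = 60 elements.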


dgmm_batch (USM Version)
************************

USM version of ``dgmm_batch`` supports group API and strided API.

**Group API**
-------------

Group API operation is defined as:

.. code-block::

   idx = 0
   for i = 0 … group_count - 1
        for j = 0 … group_size[i] - 1
            A and C are matrices at a[idx] and c[idx]
            X is a vector at x[idx]
            if (left_right[i] == side::left)
                C = diag(X) * A
            else
                C = A * diag(X)
            idx = idx + 1
        end for
   end for

where:

- ``A`` is a matrix

- ``X`` is a diagonal matrix stored as a vector

For the group API, each group contains matrices and vectors with the same parameters (size, increment).
The ``a`` and ``x`` arrays contain the pointers to all the input matrices and vectors. The total number of matrices in ``a`` and vectors in ``x`` is given by:

.. math::

      total\_batch\_count = \sum_{i=0}^{group\_count-1}group\_size[i]
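
The sum above, together with the position at which each group starts in the flat ``a``, ``x`` and ``c`` pointer arrays, can be computed as in the following sketch. The function names ``total_batch_count`` and ``group_offsets`` are hypothetical helpers, not part of the oneMKL API.

.. code-block:: cpp

   #include <cassert>
   #include <cstdint>
   #include <vector>

   // Sum of group_size[i] over all groups.
   std::int64_t total_batch_count(const std::vector<std::int64_t> &group_size) {
       std::int64_t total = 0;
       for (std::int64_t g : group_size) {
           assert(g >= 0);  // every group size must be at least zero
           total += g;
       }
       return total;
   }

   // offsets[i] is the index of the first matrix of group i in the flat pointer arrays.
   std::vector<std::int64_t> group_offsets(const std::vector<std::int64_t> &group_size) {
       std::vector<std::int64_t> offsets(group_size.size());
       std::int64_t running = 0;
       for (std::size_t i = 0; i < group_size.size(); ++i) {
           offsets[i] = running;
           running += group_size[i];
       }
       return offsets;
   }

For group sizes 2, 3 and 1, the pointer arrays hold 6 entries, and the groups start at indices 0, 2 and 5.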

Syntax
------

.. code-block:: cpp

   namespace oneapi::mkl::blas::column_major {
       sycl::event dgmm_batch(sycl::queue &queue,
                              oneapi::mkl::side *left_right,
                              std::int64_t *m,
                              std::int64_t *n,
                              const T **a,
                              std::int64_t *lda,
                              const T **x,
                              std::int64_t *incx,
                              T **c,
                              std::int64_t *ldc,
                              std::int64_t group_count,
                              std::int64_t *group_size,
                              const std::vector<sycl::event> &dependencies = {});
   }

.. code-block:: cpp

   namespace oneapi::mkl::blas::row_major {
       sycl::event dgmm_batch(sycl::queue &queue,
                              oneapi::mkl::side *left_right,
                              std::int64_t *m,
                              std::int64_t *n,
                              const T **a,
                              std::int64_t *lda,
                              const T **x,
                              std::int64_t *incx,
                              T **c,
                              std::int64_t *ldc,
                              std::int64_t group_count,
                              std::int64_t *group_size,
                              const std::vector<sycl::event> &dependencies = {});
   }

Input Parameters
----------------

queue
   The queue where the routine should be executed.

left_right
   Array of ``group_count`` parameters. ``left_right[i]`` specifies the position of the diagonal matrix in group ``i``.
   See :ref:`data-types` for more details.

m
   Array of ``group_count`` integers. ``m[i]`` specifies the
   number of rows of ``A`` for every matrix in group ``i``. All entries must be at least zero.

n
   Array of ``group_count`` integers. ``n[i]`` specifies the
   number of columns of ``A`` for every matrix in group ``i``. All entries must be at least zero.

a
   Array of ``total_batch_count`` pointers to input matrices ``A``. Each matrix in
   group ``i`` must have size at least ``lda[i]`` * ``n[i]`` if
   column major layout or at least ``lda[i]`` * ``m[i]`` if row major layout is used.
   See :ref:`matrix-storage` for more details.

lda
   Array of ``group_count`` integers. ``lda[i]`` specifies the
   leading dimension of ``A`` for every matrix in group ``i``. All
   entries must be positive and at least ``m[i]`` if column major
   layout or at least ``n[i]`` if row major layout is used.

x
   Array of ``total_batch_count`` pointers to input vectors ``X``. Each vector in
   group ``i`` must have size at least (1 + (``len[i]`` - 1)*abs(``incx[i]``)) where
   ``len[i]`` is ``n[i]`` if the diagonal matrix is on the
   right of the product or ``m[i]`` otherwise.
   See :ref:`matrix-storage` for more details.

incx
   Array of ``group_count`` integers. ``incx[i]`` specifies the
   stride of ``X`` for every vector in group ``i``. All entries
   must be positive.

c
   Array of ``total_batch_count`` pointers to input/output matrices ``C``.
   Each matrix in group ``i`` must have size at least ``ldc[i]`` * ``n[i]`` if column major layout or at least
   ``ldc[i]`` * ``m[i]`` if row major layout is used.
   See :ref:`matrix-storage` for more details.

ldc
   Array of ``group_count`` integers. ``ldc[i]`` specifies the
   leading dimension of ``C`` for every matrix in group ``i``.  All
   entries must be positive and at least
   ``m[i]`` if column major layout or at least ``n[i]`` if row major layout is used.

group_count
   Number of groups. Must be at least zero.

group_size
   Array of ``group_count`` integers. ``group_size[i]`` specifies the
   number of diagonal matrix-matrix product operations in group ``i``.
   All entries must be at least zero.

dependencies
   List of events to wait for before starting computation, if any.
   If omitted, defaults to no dependencies.

Output Parameters
-----------------

c
   Array of pointers to output matrices ``C`` overwritten by ``total_batch_count`` ``dgmm`` operations.

Return Values
-------------

Output event to wait on to ensure computation is complete.
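
To make the indexing of the group API concrete, the following host-side sketch walks the groups in plain C++ for the column major layout. It is not the oneMKL implementation: the ``side`` enum and the function name ``dgmm_batch_group_ref`` are hypothetical, raw host pointers stand in for USM allocations, and no events are involved. Per-group parameters are indexed by the group number ``i``, while the pointer arrays are indexed by the running matrix index ``idx``.

.. code-block:: cpp

   #include <cassert>
   #include <cstdint>

   enum class side { left, right };  // hypothetical stand-in for oneapi::mkl::side

   // Host-side reference for the group operation, column major layout.
   template <typename T>
   void dgmm_batch_group_ref(const side *left_right, const std::int64_t *m, const std::int64_t *n,
                             const T **a, const std::int64_t *lda,
                             const T **x, const std::int64_t *incx,
                             T **c, const std::int64_t *ldc,
                             std::int64_t group_count, const std::int64_t *group_size) {
       assert(group_count >= 0);
       std::int64_t idx = 0;  // running index into a, x and c
       for (std::int64_t i = 0; i < group_count; ++i) {
           assert(m[i] >= 0 && n[i] >= 0 && incx[i] > 0 && group_size[i] >= 0);
           for (std::int64_t j = 0; j < group_size[i]; ++j) {
               const T *A = a[idx];
               const T *X = x[idx];
               T *C = c[idx];
               for (std::int64_t col = 0; col < n[i]; ++col) {
                   for (std::int64_t row = 0; row < m[i]; ++row) {
                       // Left: scale row by X[row]; right: scale column by X[col].
                       const T d = (left_right[i] == side::left) ? X[row * incx[i]]
                                                                 : X[col * incx[i]];
                       C[row + col * ldc[i]] = d * A[row + col * lda[i]];
                   }
               }
               ++idx;
           }
       }
   }

Because each group carries its own ``m``, ``n``, leading dimensions and ``left_right`` value, one call can mix, for example, a group of 2-by-2 left products with a group of 1-by-3 right products.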


**Strided API**
---------------

Strided API operation is defined as:

.. code-block::

   for i = 0 … batch_size - 1
       A and C are matrices at offset i * stridea in a, i * stridec in c.
       X is a vector at offset i * stridex in x 
       if (left_right == side::left)
           C = diag(X) * A
       else
           C = A * diag(X)
   end for

where:

- ``A`` is a matrix

- ``X`` is a diagonal matrix stored as a vector

For the strided API, all matrices ``A`` and ``C`` and all vectors ``X`` share the same parameters (size, increments). Consecutive matrices or vectors are separated by a constant stride: ``stridea`` for ``A``, ``stridec`` for ``C``, and ``stridex`` for ``X``.

The ``a`` and ``x`` arrays contain all the input matrices and vectors. The total number of matrices in ``a`` and vectors in ``x`` is given by the ``batch_size`` parameter.
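
The pointer-based strided operation can be sketched on the host for the row major layout, where element (``row``, ``col``) of a matrix sits at ``row * ld + col``. This is an illustration under stated assumptions rather than the library's implementation: the ``side`` enum and the name ``dgmm_batch_strided_row_major_ref`` are hypothetical, plain host pointers replace USM pointers, ``incx`` is assumed positive, and no events are modeled.

.. code-block:: cpp

   #include <cassert>
   #include <cstdint>

   enum class side { left, right };  // hypothetical stand-in for oneapi::mkl::side

   // Host-side reference for the strided operation, row major layout.
   template <typename T>
   void dgmm_batch_strided_row_major_ref(side left_right, std::int64_t m, std::int64_t n,
                                         const T *a, std::int64_t lda, std::int64_t stridea,
                                         const T *x, std::int64_t incx, std::int64_t stridex,
                                         T *c, std::int64_t ldc, std::int64_t stridec,
                                         std::int64_t batch_size) {
       assert(m >= 0 && n >= 0 && batch_size >= 0 && incx > 0);
       for (std::int64_t i = 0; i < batch_size; ++i) {
           const T *A = a + i * stridea;
           const T *X = x + i * stridex;
           T *C = c + i * stridec;
           for (std::int64_t row = 0; row < m; ++row) {
               for (std::int64_t col = 0; col < n; ++col) {
                   // Left: scale row by X[row]; right: scale column by X[col].
                   const T d = (left_right == side::left) ? X[row * incx] : X[col * incx];
                   C[row * ldc + col] = d * A[row * lda + col];
               }
           }
       }
   }

Only the element addressing differs from the column major case; the scaling rule for ``side::left`` and ``side::right`` is unchanged.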

Syntax
------

.. code-block:: cpp

   namespace oneapi::mkl::blas::column_major {
       sycl::event dgmm_batch(sycl::queue &queue,
                              oneapi::mkl::side left_right,
                              std::int64_t m,
                              std::int64_t n,
                              const T *a, 
                              std::int64_t lda, 
                              std::int64_t stridea,
                              const T *x, 
                              std::int64_t incx, 
                              std::int64_t stridex,
                              T *c, 
                              std::int64_t ldc, 
                              std::int64_t stridec,
                              std::int64_t batch_size,
                              const std::vector<sycl::event> &dependencies = {});
   }

.. code-block:: cpp

   namespace oneapi::mkl::blas::row_major {
       sycl::event dgmm_batch(sycl::queue &queue,
                              oneapi::mkl::side left_right,
                              std::int64_t m,
                              std::int64_t n,
                              const T *a, 
                              std::int64_t lda, 
                              std::int64_t stridea,
                              const T *x, 
                              std::int64_t incx, 
                              std::int64_t stridex,
                              T *c, 
                              std::int64_t ldc, 
                              std::int64_t stridec,
                              std::int64_t batch_size,
                              const std::vector<sycl::event> &dependencies = {});
   }

Input Parameters
----------------

queue
   The queue where the routine should be executed.

left_right
   Specifies the position of the diagonal matrix in the product. See :ref:`data-types` for more details.

m
   Number of rows of matrix ``A`` and matrix ``C``. Must be at least zero.

n
   Number of columns of matrix ``A`` and matrix ``C``. Must be at least zero.

a
   Pointer to input matrices ``A``. Size of the array must be at least ``lda`` * ``k`` + ``stridea`` * (``batch_size`` - 1) where ``k`` is ``n`` if column major layout or ``m`` if row major layout is used.

lda
   Leading dimension of matrices ``A``. Must be at least ``m`` if column major layout or ``n`` if row major layout is used. Must be positive. 

stridea
   Stride between two consecutive ``A`` matrices. Must be at least zero. See :ref:`matrix-storage` for more details.

x
   Pointer to input matrices ``X``. Size of the array must be at least (1 + (``len`` - 1)*abs(``incx``)) + ``stridex`` * (``batch_size`` - 1) where ``len`` is ``n`` if the diagonal matrix is on the right of the product or ``m`` otherwise.

incx
   Stride between two consecutive elements of the ``X`` vectors.

stridex
   Stride between two consecutive ``X`` vectors.  Must be at least zero. See :ref:`matrix-storage` for more details.

c
   Pointer to input/output matrices ``C``. Size of the array must be at least ``batch_size`` * ``stridec``.

ldc
   Leading dimension of matrices ``C``. Must be at least ``m`` if column major layout or ``n`` if row major layout is used. Must be positive.

stridec
   Stride between two consecutive ``C`` matrices. Must be at least ``ldc`` * ``n`` if column major layout or ``ldc`` * ``m`` if row major layout is used. See :ref:`matrix-storage` for more details.

batch_size
   Number of ``dgmm`` computations to perform. Must be at least zero.

dependencies
   List of events to wait for before starting computation, if any. If omitted, defaults to no dependencies.

Output Parameters
-----------------

c
   Pointer to output matrices ``C`` overwritten by ``batch_size`` ``dgmm`` operations.

Return Values
-------------

Output event to wait on to ensure computation is complete.