.. _onemkl_blas_syrk_batch:

syrk_batch
==========

Computes a group of ``syrk`` operations.

Description
***********

The ``syrk_batch`` routines are batched versions of :ref:`onemkl_blas_syrk`, performing multiple ``syrk`` operations in a single call. Each ``syrk`` operation performs a rank-k update of a symmetric matrix ``C`` using a general matrix ``A``.

``syrk_batch`` supports the following precisions:

.. list-table::
   :header-rows: 1

   * -  T
   * -  ``float``
   * -  ``double``
   * -  ``std::complex<float>``
   * -  ``std::complex<double>``


syrk_batch (Buffer Version)
---------------------------

The buffer version of ``syrk_batch`` supports only the strided API.

**Strided API**
---------------

The strided API operation is defined as:

.. code-block::

   for i = 0 … batch_size – 1
       A and C are matrices at offset i * stridea and i * stridec in a and c.
       C = alpha * op(A) * op(A)^T + beta * C
   end for

where:

- op(``X``) is one of op(``X``) = ``X``, or op(``X``) = ``X``\ :sup:`T`, or op(``X``) = ``X``\ :sup:`H`

- ``alpha`` and ``beta`` are scalars

- ``A`` is a general matrix and ``C`` is a symmetric matrix

- op(``A``) is ``n`` x ``k`` and ``C`` is ``n`` x ``n``

For the strided API, the ``a`` and ``c`` buffers contain all the input matrices. The stride between matrices is given by the stride parameters. The total number of matrices in the ``a`` and ``c`` buffers is given by the ``batch_size`` parameter.
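
The loop above can be sketched as a host-side reference in plain C++. This is only an illustration under stated assumptions, not the library implementation: it covers the column-major layout and real types only, ``Uplo``, ``Trans``, ``op_a`` and ``syrk_batch_strided_ref`` are hypothetical names introduced here, and it assumes (following the usual BLAS convention) that only the triangle of ``C`` selected by ``upper_lower`` is read and written.

.. code-block:: cpp

   #include <cassert>
   #include <cstdint>
   #include <vector>

   // Hypothetical stand-ins for oneapi::mkl::uplo and oneapi::mkl::transpose.
   enum class Uplo { upper, lower };
   enum class Trans { nontrans, trans };

   // Element (r, p) of op(A), where A is stored column major with leading dimension lda.
   template <typename T>
   T op_a(const T* A, std::int64_t lda, Trans trans, std::int64_t r, std::int64_t p) {
       return trans == Trans::nontrans ? A[r + p * lda] : A[p + r * lda];
   }

   // Host-side reference for the strided loop (column major, real T).
   template <typename T>
   void syrk_batch_strided_ref(Uplo upper_lower, Trans trans, std::int64_t n, std::int64_t k,
                               T alpha, const std::vector<T>& a, std::int64_t lda,
                               std::int64_t stridea, T beta, std::vector<T>& c,
                               std::int64_t ldc, std::int64_t stridec,
                               std::int64_t batch_size) {
       assert(n >= 0 && k >= 0 && batch_size >= 0);
       for (std::int64_t i = 0; i < batch_size; ++i) {
           const T* A = a.data() + i * stridea;  // matrix i inside a
           T* C = c.data() + i * stridec;        // matrix i inside c
           for (std::int64_t col = 0; col < n; ++col) {
               // Assumption: only the triangle selected by upper_lower is touched.
               const std::int64_t row_begin = (upper_lower == Uplo::upper) ? 0 : col;
               const std::int64_t row_end = (upper_lower == Uplo::upper) ? col + 1 : n;
               for (std::int64_t row = row_begin; row < row_end; ++row) {
                   T sum = T(0);
                   for (std::int64_t p = 0; p < k; ++p)
                       sum += op_a(A, lda, trans, row, p) * op_a(A, lda, trans, col, p);
                   C[row + col * ldc] = alpha * sum + beta * C[row + col * ldc];
               }
           }
       }
   }

A reference like this is mainly useful for checking results of the batched routine on small inputs; the opposite triangle of each ``C`` is left unchanged by the sketch.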

Syntax
------

.. code-block:: cpp

   namespace oneapi::mkl::blas::column_major {
      void syrk_batch(sycl::queue &queue,
                      oneapi::mkl::uplo upper_lower,
                      oneapi::mkl::transpose trans,
                      std::int64_t n,
                      std::int64_t k,
                      T alpha,
                      sycl::buffer<T,1> &a,
                      std::int64_t lda,
                      std::int64_t stridea,
                      T beta,
                      sycl::buffer<T,1> &c,
                      std::int64_t ldc,
                      std::int64_t stridec,
                      std::int64_t batch_size)
   }

.. code-block:: cpp

   namespace oneapi::mkl::blas::row_major {
      void syrk_batch(sycl::queue &queue,
                      oneapi::mkl::uplo upper_lower,
                      oneapi::mkl::transpose trans,
                      std::int64_t n,
                      std::int64_t k,
                      T alpha,
                      sycl::buffer<T,1> &a,
                      std::int64_t lda,
                      std::int64_t stridea,
                      T beta,
                      sycl::buffer<T,1> &c,
                      std::int64_t ldc,
                      std::int64_t stridec,
                      std::int64_t batch_size)
   }

Input Parameters
----------------

queue
   The queue where the routine should be executed.

upper_lower
   Specifies whether matrices ``C`` are upper or lower triangular. See :ref:`data-types` for more details.

trans
   Specifies op(``A``), the transposition operation applied to matrices ``A``.
   Conjugation is never performed, even if ``trans`` = ``transpose::conjtrans``. See :ref:`data-types` for more details.

n
   Number of rows and columns of matrices ``C``. Must be at least zero.

k
   Number of columns of matrices op(``A``). Must be at least zero.

alpha
   Scaling factor for rank-k update.

a
   Buffer holding input matrices ``A``. Size of the buffer must be at least ``stridea`` * ``batch_size``.

lda
   Leading dimension of matrices ``A``. Must be positive. 

   .. list-table::
      :header-rows: 1
  
      * -
        - ``trans`` = ``transpose::nontrans``
        - ``trans`` = ``transpose::trans`` or ``trans`` = ``transpose::conjtrans``
      * - Column major
        - Must be at least ``n``
        - Must be at least ``k``
      * - Row major
        - Must be at least ``k``
        - Must be at least ``n``

stridea
   Stride between two consecutive ``A`` matrices. 

   .. list-table::
      :header-rows: 1
  
      * -
        - ``trans`` = ``transpose::nontrans``
        - ``trans`` = ``transpose::trans`` or ``trans`` = ``transpose::conjtrans``
      * - Column major
        - Must be at least ``lda`` * ``k``
        - Must be at least ``lda`` * ``n``
      * - Row major
        - Must be at least ``lda`` * ``n``
        - Must be at least ``lda`` * ``k``

beta
   Scaling factor for matrices ``C``.

c
   Buffer holding input/output matrices ``C``. Size of the buffer must be at least ``stridec`` * ``batch_size``.

ldc
   Leading dimension of matrices ``C``. Must be positive and at least ``n``.

stridec
   Stride between two consecutive ``C`` matrices. Must be at least ``ldc`` * ``n``.

batch_size
   Specifies the number of ``syrk`` operations to perform.
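
The ``lda`` and ``stridea`` tables above follow one pattern: the leading dimension covers the contiguous extent of each stored ``A``, and the stride covers the whole matrix. A minimal sketch of helpers that compute the smallest admissible values is shown below; ``Layout``, ``Trans``, ``min_lda`` and ``min_stridea`` are hypothetical names for illustration and are not part of the oneMKL API.

.. code-block:: cpp

   #include <algorithm>
   #include <cassert>
   #include <cstdint>

   // Hypothetical stand-ins for the layout namespaces and oneapi::mkl::transpose.
   enum class Layout { column_major, row_major };
   enum class Trans { nontrans, trans, conjtrans };

   // Smallest lda allowed by the table; lda must also be positive.
   inline std::int64_t min_lda(Layout layout, Trans trans, std::int64_t n, std::int64_t k) {
       assert(n >= 0 && k >= 0);
       // The contiguous extent is n for (column major, nontrans) and (row major, trans),
       // and k for the other two combinations.
       const bool extent_is_n = (layout == Layout::column_major) == (trans == Trans::nontrans);
       return std::max<std::int64_t>(1, extent_is_n ? n : k);
   }

   // Smallest stridea allowed by the table for a given lda.
   inline std::int64_t min_stridea(Layout layout, Trans trans, std::int64_t n, std::int64_t k,
                                   std::int64_t lda) {
       assert(lda >= min_lda(layout, trans, n, k));
       // The number of stored columns (column major) or rows (row major) is the other extent.
       const bool count_is_k = (layout == Layout::column_major) == (trans == Trans::nontrans);
       return lda * (count_is_k ? k : n);
   }

For example, with ``n`` = 4, ``k`` = 3 and a padded ``lda`` of 5 in column-major layout, the smallest ``stridea`` is 15 without transposition. The buffer ``a`` then needs at least ``stridea`` * ``batch_size`` elements.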

Output Parameters
-----------------

c
   Output buffer overwritten by ``batch_size`` ``syrk`` operations of the form ``alpha`` * op(``A``) * op(``A``)\ :sup:`T` + ``beta`` * ``C``.
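
Because conjugation is never performed, the result for complex types is a complex symmetric matrix, not a Hermitian one. The sketch below illustrates the difference for a single entry of the product; ``transpose_product_entry`` and ``conjugate_product_entry`` are hypothetical helper names used only for this comparison.

.. code-block:: cpp

   #include <cassert>
   #include <complex>
   #include <cstddef>
   #include <vector>

   using cplx = std::complex<double>;

   // Entry (i, j) of op(A) * op(A)^T, given rows i and j of op(A): no conjugation.
   inline cplx transpose_product_entry(const std::vector<cplx>& row_i,
                                       const std::vector<cplx>& row_j) {
       assert(row_i.size() == row_j.size());
       cplx sum(0.0, 0.0);
       for (std::size_t p = 0; p < row_i.size(); ++p)
           sum += row_i[p] * row_j[p];
       return sum;
   }

   // For contrast only: the same entry of op(A) * op(A)^H, which syrk_batch does not compute.
   inline cplx conjugate_product_entry(const std::vector<cplx>& row_i,
                                       const std::vector<cplx>& row_j) {
       assert(row_i.size() == row_j.size());
       cplx sum(0.0, 0.0);
       for (std::size_t p = 0; p < row_i.size(); ++p)
           sum += row_i[p] * std::conj(row_j[p]);
       return sum;
   }

For a 1 x 1 matrix ``A`` holding 1 + i, the first helper gives (1 + i)\ :sup:`2` = 2i, while the conjugated product would give 2.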


syrk_batch (USM Version)
------------------------

The USM version of ``syrk_batch`` supports the group API and the strided API.

**Group API**
-------------

Group API operation is defined as:

.. code-block::

   idx = 0
   for i = 0 … group_count – 1
       for j = 0 … group_size – 1
           A and C are matrices in a[idx] and c[idx]
           C = alpha[i] * op(A) * op(A)^T + beta[i] * C
           idx := idx + 1
       end for
   end for

where:

- op(``X``) is one of op(``X``) = ``X``, or op(``X``) = ``X``\ :sup:`T`, or op(``X``) = ``X``\ :sup:`H`

- ``alpha`` and ``beta`` are scalars

- ``A`` is a general matrix and ``C`` is a symmetric matrix

- op(``A``) is ``n`` x ``k`` and ``C`` is ``n`` x ``n``

For the group API, the ``a`` and ``c`` arrays contain pointers to all the input matrices.
The total number of matrices in ``a`` and ``c`` is given by:

.. math::

      total\_batch\_count = \sum_{i=0}^{group\_count-1}group\_size[i]    
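
As a small illustration of the sum above, the following sketch computes ``total_batch_count`` and the index of the first matrix of each group within ``a`` and ``c``. The function names and the use of ``std::vector`` are assumptions made for this example only.

.. code-block:: cpp

   #include <cassert>
   #include <cstddef>
   #include <cstdint>
   #include <vector>

   // Sum of all group sizes: the number of pointers expected in a and c.
   inline std::int64_t total_batch_count(const std::vector<std::int64_t>& group_size) {
       std::int64_t total = 0;
       for (std::int64_t size : group_size) {
           assert(size >= 0);  // each group size must be at least zero
           total += size;
       }
       return total;
   }

   // offsets[i] is the value of idx for the first matrix of group i.
   inline std::vector<std::int64_t> group_offsets(const std::vector<std::int64_t>& group_size) {
       std::vector<std::int64_t> offsets(group_size.size());
       std::int64_t running = 0;
       for (std::size_t i = 0; i < group_size.size(); ++i) {
           offsets[i] = running;
           running += group_size[i];
       }
       return offsets;
   }

With group sizes 2, 0 and 3, the pointer arrays hold 5 entries and the groups start at indices 0, 2 and 2; an empty group contributes no matrices.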

Syntax
------

.. code-block:: cpp

   namespace oneapi::mkl::blas::column_major {
       sycl::event syrk_batch(sycl::queue &queue,
                              oneapi::mkl::uplo *upper_lower,
                              oneapi::mkl::transpose *trans,
                              std::int64_t *n,
                              std::int64_t *k,
                              T *alpha,
                              const T **a,
                              std::int64_t *lda,
                              T *beta,
                              T **c,
                              std::int64_t *ldc,
                              std::int64_t group_count,
                              std::int64_t *group_size,
                              const std::vector<sycl::event> &dependencies = {})
   }

.. code-block:: cpp

   namespace oneapi::mkl::blas::row_major {
       sycl::event syrk_batch(sycl::queue &queue,
                              oneapi::mkl::uplo *upper_lower,
                              oneapi::mkl::transpose *trans,
                              std::int64_t *n,
                              std::int64_t *k,
                              T *alpha,
                              const T **a,
                              std::int64_t *lda,
                              T *beta,
                              T **c,
                              std::int64_t *ldc,
                              std::int64_t group_count,
                              std::int64_t *group_size,
                              const std::vector<sycl::event> &dependencies = {})
   }

Input Parameters
----------------

queue
   The queue where the routine should be executed.

upper_lower
   Array of ``group_count`` ``oneapi::mkl::uplo`` values. ``upper_lower[i]`` specifies whether matrices ``C`` are 
   upper or lower triangular in group ``i``. See :ref:`data-types` for more details.

trans
   Array of ``group_count`` ``oneapi::mkl::transpose`` values. ``trans[i]`` specifies op(``A``), the transposition operation
   applied to matrices ``A`` in group ``i``. See :ref:`data-types` for more details.

n
   Array of ``group_count`` integers. ``n[i]`` specifies number of rows and columns of matrices ``C`` in group ``i``. All entries must be at least zero.

k
   Array of ``group_count`` integers. ``k[i]`` specifies number of columns of matrices op(``A``) in group ``i``. All entries must be at least zero.

alpha
   Array of ``group_count`` scalar elements.  ``alpha[i]`` specifies scaling factor for every rank-k update in group ``i``. 

a
   Array of ``total_batch_count`` pointers for input matrices ``A``. See :ref:`matrix-storage` for more details.

   .. list-table::
      :header-rows: 1
     
      * -
        - ``trans`` = ``transpose::nontrans``
        - ``trans`` = ``transpose::trans`` or ``trans`` = ``transpose::conjtrans``
      * - Column major
        - Size of array ``A[i]`` must be at least ``lda[i]`` * ``k[i]``
        - Size of array ``A[i]`` must be at least ``lda[i]`` * ``n[i]``
      * - Row major
        - Size of array ``A[i]`` must be at least ``lda[i]`` * ``n[i]``
        - Size of array ``A[i]`` must be at least ``lda[i]`` * ``k[i]``

lda
   Array of ``group_count`` integers. ``lda[i]`` specifies leading dimension of matrices ``A`` in group ``i``. Must be positive.

   .. list-table::
      :header-rows: 1

      * -
        - ``trans`` = ``transpose::nontrans``
        - ``trans`` = ``transpose::trans`` or ``trans`` = ``transpose::conjtrans``
      * - Column major
        - Must be at least ``n[i]``.
        - Must be at least ``k[i]``.
      * - Row major
        - Must be at least ``k[i]``.
        - Must be at least ``n[i]``.

beta
   Array of ``group_count`` scalar elements. ``beta[i]`` specifies scaling factor for matrices ``C`` in group ``i``.

c
   Array of ``total_batch_count`` pointers for input/output matrices ``C``. Size of array ``C[i]`` 
   must be at least ``ldc[i]`` * ``n[i]``. See :ref:`matrix-storage` for more details.

ldc
   Array of ``group_count`` integers. ``ldc[i]`` specifies leading dimension of matrices ``C`` in group ``i``. Must be positive and at least ``n[i]``.

group_count
   Number of groups. Must be at least zero.

group_size
   Array of ``group_count`` integers. ``group_size[i]`` specifies the number of ``syrk`` operations in group ``i``. Each element in ``group_size`` must be at least zero.

dependencies
   List of events to wait for before starting computation, if any. If omitted, defaults to no dependencies.
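
How the per-group arrays (``upper_lower``, ``trans``, ``n``, ``k``, ``alpha``, ``lda``, ``beta``, ``ldc``) combine with the per-matrix pointer arrays (``a``, ``c``) can be sketched with a host-side reference loop. This is an illustration only, assuming column-major layout, real types, host-accessible pointers, and updates restricted to the selected triangle; ``Uplo``, ``Trans``, ``syrk_one`` and ``syrk_batch_group_ref`` are hypothetical names and the ``dependencies`` argument is left out.

.. code-block:: cpp

   #include <cassert>
   #include <cstdint>

   // Hypothetical stand-ins for oneapi::mkl::uplo and oneapi::mkl::transpose.
   enum class Uplo { upper, lower };
   enum class Trans { nontrans, trans };

   // One column-major syrk update on the selected triangle of C (real T).
   template <typename T>
   void syrk_one(Uplo ul, Trans tr, std::int64_t n, std::int64_t k, T alpha,
                 const T* A, std::int64_t lda, T beta, T* C, std::int64_t ldc) {
       for (std::int64_t col = 0; col < n; ++col) {
           const std::int64_t lo = (ul == Uplo::upper) ? 0 : col;
           const std::int64_t hi = (ul == Uplo::upper) ? col + 1 : n;
           for (std::int64_t row = lo; row < hi; ++row) {
               T sum = T(0);
               for (std::int64_t p = 0; p < k; ++p) {
                   // x and y are elements (row, p) and (col, p) of op(A).
                   const T x = (tr == Trans::nontrans) ? A[row + p * lda] : A[p + row * lda];
                   const T y = (tr == Trans::nontrans) ? A[col + p * lda] : A[p + col * lda];
                   sum += x * y;
               }
               C[row + col * ldc] = alpha * sum + beta * C[row + col * ldc];
           }
       }
   }

   // Group loop: parameters are indexed by group i, matrices by the running idx.
   template <typename T>
   void syrk_batch_group_ref(const Uplo* upper_lower, const Trans* trans,
                             const std::int64_t* n, const std::int64_t* k, const T* alpha,
                             const T** a, const std::int64_t* lda, const T* beta, T** c,
                             const std::int64_t* ldc, std::int64_t group_count,
                             const std::int64_t* group_size) {
       assert(group_count >= 0);
       std::int64_t idx = 0;
       for (std::int64_t i = 0; i < group_count; ++i) {
           assert(group_size[i] >= 0);
           for (std::int64_t j = 0; j < group_size[i]; ++j) {
               syrk_one(upper_lower[i], trans[i], n[i], k[i], alpha[i],
                        a[idx], lda[i], beta[i], c[idx], ldc[i]);
               ++idx;
           }
       }
   }

All matrices in a group share the same sizes, scalars, and leading dimensions, so only the pointers differ from one operation to the next within a group.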

Output Parameters
-----------------

c
   Array of pointers to output matrices ``C`` overwritten by ``total_batch_count`` ``syrk`` operations of the form ``alpha`` * op(``A``) * op(``A``)\ :sup:`T` + ``beta`` * ``C``. 

Return Values
-------------

Output event to wait on to ensure computation is complete.


**Strided API**
---------------

The strided API operation is defined as:

.. code-block::

   for i = 0 … batch_size – 1
       A and C are matrices at offset i * stridea and i * stridec in a and c.
       C = alpha * op(A) * op(A)^T + beta * C
   end for

where:

- op(``X``) is one of op(``X``) = ``X``, or op(``X``) = ``X``\ :sup:`T`, or op(``X``) = ``X``\ :sup:`H`

- ``alpha`` and ``beta`` are scalars

- ``A`` is a general matrix and ``C`` is a symmetric matrix

- op(``A``) is ``n`` x ``k`` and ``C`` is ``n`` x ``n``

For the strided API, the ``a`` and ``c`` arrays contain all the input matrices. The stride between matrices is given by the stride parameters. The total number of matrices in the ``a`` and ``c`` arrays is given by the ``batch_size`` parameter.
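
To make the role of the stride and leading dimension concrete, the sketch below computes the position of one element inside a strided array for both layouts. ``Layout`` and ``element_index`` are hypothetical names introduced for this example; the formula assumes the usual column-major and row-major storage conventions.

.. code-block:: cpp

   #include <cassert>
   #include <cstdint>

   // Hypothetical stand-in for the column_major / row_major namespaces.
   enum class Layout { column_major, row_major };

   // Index of element (row, col) of matrix number `batch` inside a strided array.
   inline std::int64_t element_index(Layout layout, std::int64_t batch, std::int64_t row,
                                     std::int64_t col, std::int64_t ld, std::int64_t stride) {
       assert(batch >= 0 && row >= 0 && col >= 0 && ld > 0);
       const std::int64_t within = (layout == Layout::column_major)
                                       ? row + col * ld   // columns are contiguous
                                       : row * ld + col;  // rows are contiguous
       return batch * stride + within;
   }

For instance, with a leading dimension of 3 and a stride of 12, element (1, 2) of matrix 1 sits at index 19 in column-major layout and at index 17 in row-major layout.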

Syntax
------

.. code-block:: cpp

   namespace oneapi::mkl::blas::column_major {
      sycl::event syrk_batch(sycl::queue &queue,
                             oneapi::mkl::uplo upper_lower,
                             oneapi::mkl::transpose trans,
                             std::int64_t n,
                             std::int64_t k,
                             T alpha,
                             const T *a,
                             std::int64_t lda,
                             std::int64_t stridea,
                             T beta,
                             T *c,
                             std::int64_t ldc,
                             std::int64_t stridec,
                             std::int64_t batch_size,
                             const std::vector<sycl::event> &dependencies = {})
   }

.. code-block:: cpp

   namespace oneapi::mkl::blas::row_major {
      sycl::event syrk_batch(sycl::queue &queue,
                             oneapi::mkl::uplo upper_lower,
                             oneapi::mkl::transpose trans,
                             std::int64_t n,
                             std::int64_t k,
                             T alpha,
                             const T *a,
                             std::int64_t lda,
                             std::int64_t stridea,
                             T beta,
                             T *c,
                             std::int64_t ldc,
                             std::int64_t stridec,
                             std::int64_t batch_size,
                             const std::vector<sycl::event> &dependencies = {})
   }

Input Parameters
----------------

queue
   The queue where the routine should be executed.

upper_lower
   Specifies whether matrices ``C`` are upper or lower triangular. See :ref:`data-types` for more details.

trans
   Specifies op(``A``), the transposition operation applied to matrices ``A``.
   Conjugation is never performed, even if ``trans`` = ``transpose::conjtrans``. See :ref:`data-types` for more details.

n
   Number of rows and columns of matrices ``C``. Must be at least zero.

k
   Number of columns of matrices op(``A``). Must be at least zero.

alpha
   Scaling factor for rank-k update.

a
   Pointer to input matrices ``A``. Size of the array must be at least ``stridea`` * ``batch_size``.

lda
   Leading dimension of matrices ``A``. Must be positive. 

   .. list-table::
      :header-rows: 1
  
      * -
        - ``trans`` = ``transpose::nontrans``
        - ``trans`` = ``transpose::trans`` or ``trans`` = ``transpose::conjtrans``
      * - Column major
        - Must be at least ``n``
        - Must be at least ``k``
      * - Row major
        - Must be at least ``k``
        - Must be at least ``n``

stridea
   Stride between two consecutive ``A`` matrices. 

   .. list-table::
      :header-rows: 1
  
      * -
        - ``trans`` = ``transpose::nontrans``
        - ``trans`` = ``transpose::trans`` or ``trans`` = ``transpose::conjtrans``
      * - Column major
        - Must be at least ``lda`` * ``k``
        - Must be at least ``lda`` * ``n``
      * - Row major
        - Must be at least ``lda`` * ``n``
        - Must be at least ``lda`` * ``k``

beta
   Scaling factor for matrices ``C``.

c
   Pointer to input/output matrices ``C``. Size of the array must be at least ``stridec`` * ``batch_size``.

ldc
   Leading dimension of matrices ``C``. Must be positive and at least ``n``.

stridec
   Stride between two consecutive ``C`` matrices. Must be at least ``ldc`` * ``n``.

batch_size
   Specifies the number of ``syrk`` operations to perform.

dependencies
   List of events to wait for before starting computation, if any. If omitted, defaults to no dependencies.

Output Parameters
-----------------

c
   Pointer to output matrices ``C`` overwritten by ``batch_size`` ``syrk`` operations of the form ``alpha`` * op(``A``) * op(``A``)\ :sup:`T` + ``beta`` * ``C``.

Return Values
-------------

Output event to wait on to ensure computation is complete.