.. _onemkl_blas_trsm_batch:

trsm_batch
==========

Computes a group of ``trsm`` operations.

Description
***********

The ``trsm_batch`` routines are batched versions of :ref:`onemkl_blas_trsm`, performing multiple ``trsm`` operations in a single call. Each ``trsm`` solves an equation of the form op(A) * X = alpha * B or X * op(A) = alpha * B.

``trsm_batch`` supports the following precisions:

.. list-table::
   :header-rows: 1

   * -  T
   * -  ``float``
   * -  ``double``
   * -  ``std::complex<float>``
   * -  ``std::complex<double>``


trsm_batch (Buffer Version)
---------------------------

The buffer version of ``trsm_batch`` supports only the strided API.
  
**Strided API**
---------------

Strided API operation is defined as:

.. code-block::

      for i = 0 … batch_size – 1
          A and B are matrices at offset i * stridea and i * strideb in a and b.
          if (left_right == side::left) then
             compute X such that op(A) * X = alpha * B
          else
             compute X such that X * op(A) = alpha * B
          end if
          B = X
      end for

where:

- op(``A``) is one of op(``A``) = ``A``, or op(``A``) = ``A``\ :sup:`T`, or op(``A``) = ``A``\ :sup:`H`

- ``alpha`` is a scalar

- ``A`` is a triangular matrix of size ``m`` x ``m`` if ``left_right`` = ``side::left``, or ``n`` x ``n`` if ``left_right`` = ``side::right``

- ``B`` and ``X`` are ``m`` x ``n`` general matrices

On return, matrix ``B`` is overwritten by solution matrix ``X``.

For the strided API, the ``a`` and ``b`` buffers contain all the input matrices. The stride between matrices is given by the stride parameters. The total number of matrices in the ``a`` and ``b`` buffers is given by the ``batch_size`` parameter.
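
To make the offsets in the definition above concrete, here is a minimal host-side reference loop in plain C++. It is only a sketch, not the library implementation: the function name ``trsm_batch_strided_ref`` is hypothetical, ``std::vector`` stands in for the buffers, and it covers just one case (column major layout, ``side::left``, lower triangular, no transposition, non-unit diagonal).

.. code-block:: cpp

   #include <cassert>
   #include <cstdint>
   #include <vector>

   // Hypothetical reference routine (not part of oneMKL).
   // Assumed case: column major, side::left, lower, nontrans, nonunit.
   // Solves A_i * X_i = alpha * B_i and overwrites B_i with X_i.
   void trsm_batch_strided_ref(std::int64_t m, std::int64_t n, double alpha,
                               const std::vector<double> &a, std::int64_t lda,
                               std::int64_t stridea,
                               std::vector<double> &b, std::int64_t ldb,
                               std::int64_t strideb, std::int64_t batch_size) {
       assert(lda >= m && ldb >= m);
       for (std::int64_t i = 0; i < batch_size; ++i) {
           const double *A = a.data() + i * stridea;  // matrix i of the batch
           double *B = b.data() + i * strideb;
           for (std::int64_t col = 0; col < n; ++col) {
               for (std::int64_t row = 0; row < m; ++row) {
                   // Forward substitution down one column of B.
                   double sum = alpha * B[row + col * ldb];
                   for (std::int64_t k = 0; k < row; ++k)
                       sum -= A[row + k * lda] * B[k + col * ldb];
                   B[row + col * ldb] = sum / A[row + row * lda];
               }
           }
       }
   }

A loop of this kind can serve as a small correctness check for results returned by ``trsm_batch`` on tiny inputs.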

Syntax
------

.. code-block:: cpp

   namespace oneapi::mkl::blas::column_major {
       void trsm_batch(sycl::queue &queue,
                       oneapi::mkl::side left_right,
                       oneapi::mkl::uplo upper_lower,
                       oneapi::mkl::transpose trans,
                       oneapi::mkl::diag unit_diag,
                       std::int64_t m,
                       std::int64_t n,
                       T alpha,
                       sycl::buffer<T,1> &a,
                       std::int64_t lda,
                       std::int64_t stridea,
                       sycl::buffer<T,1> &b,
                       std::int64_t ldb,
                       std::int64_t strideb,
                       std::int64_t batch_size)
   }

.. code-block:: cpp

   namespace oneapi::mkl::blas::row_major {
       void trsm_batch(sycl::queue &queue,
                       oneapi::mkl::side left_right,
                       oneapi::mkl::uplo upper_lower,
                       oneapi::mkl::transpose trans,
                       oneapi::mkl::diag unit_diag,
                       std::int64_t m,
                       std::int64_t n,
                       T alpha,
                       sycl::buffer<T,1> &a,
                       std::int64_t lda,
                       std::int64_t stridea,
                       sycl::buffer<T,1> &b,
                       std::int64_t ldb,
                       std::int64_t strideb,
                       std::int64_t batch_size)
   }

Input Parameters
----------------

queue
   The queue where the routine should be executed.


left_right
   Specifies whether matrices ``A`` are on the left side or right side of the multiplication. See :ref:`data-types` for more details.

upper_lower
   Specifies whether matrices ``A`` are upper or lower triangular. See :ref:`data-types` for more details.

trans
   Specifies op(``A``), transposition operation applied to matrices ``A``. See :ref:`data-types` for more details.

unit_diag
   Specifies whether matrices ``A`` are unit triangular or not. See :ref:`data-types` for more details.

m
   Number of rows of matrices ``B``. Must be at least zero.

n
   Number of columns of matrices ``B``. Must be at least zero.

alpha
   Scaling factor for the solution.

a
   Buffer holding input matrices ``A``. Size of the buffer must be at least ``stridea`` * ``batch_size``.


lda
   Leading dimension of matrices ``A``. Must be at least ``m`` if ``left_right`` = ``side::left`` or at least ``n`` if ``left_right`` = ``side::right``. Must be positive.

stridea
   Stride between two consecutive ``A`` matrices.

b
   Buffer holding input/output matrices ``B``. Size of the buffer must be at least ``strideb`` * ``batch_size``.

ldb
   Leading dimension of matrices ``B``. Must be at least ``m`` if column major layout or at least ``n`` if row major layout is used. Must be positive.

strideb
   Stride between two consecutive ``B`` matrices.

batch_size
   Specifies the number of triangular linear systems to solve.

Output Parameters
-----------------

b
   Output buffer overwritten by ``batch_size`` solution matrices ``X``.

.. note::
   If ``alpha`` = 0, matrices ``B`` are set to zero, and ``A`` and ``B`` do not need to be initialized before calling ``trsm_batch``.
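
The size constraints listed under Input Parameters can be checked on the host before the call. The sketch below is illustrative only: the ``side`` and ``layout`` enums are stand-ins defined here (the real API uses ``oneapi::mkl::side`` and selects the layout by namespace), and ``min_stride_a`` assumes the ``A`` matrices are packed without overlapping, which is an assumption of this example and not a requirement stated on this page.

.. code-block:: cpp

   #include <cassert>
   #include <cstdint>

   enum class side { left, right };                // stand-in for oneapi::mkl::side
   enum class layout { column_major, row_major };  // hypothetical layout tag

   // Returns true when the documented constraints on m, n, lda and ldb hold.
   bool strided_args_ok(layout l, side s, std::int64_t m, std::int64_t n,
                        std::int64_t lda, std::int64_t ldb) {
       if (m < 0 || n < 0) return false;        // m and n must be at least zero
       if (lda <= 0 || ldb <= 0) return false;  // leading dimensions must be positive
       const std::int64_t k = (s == side::left) ? m : n;  // order of A
       if (lda < k) return false;
       const std::int64_t min_ldb = (l == layout::column_major) ? m : n;
       return ldb >= min_ldb;
   }

   // Assumption: non-overlapping storage, so each A spans lda * k elements.
   std::int64_t min_stride_a(side s, std::int64_t m, std::int64_t n,
                             std::int64_t lda) {
       assert(lda > 0);
       return lda * ((s == side::left) ? m : n);
   }

For example, with ``m`` = 3, ``n`` = 5 and ``side::right``, ``A`` is 5 x 5, so ``lda`` must be at least 5.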


trsm_batch (USM Version)
------------------------

The USM version of ``trsm_batch`` supports the group API and the strided API.

**Group API**
-------------

Group API operation is defined as:

.. code-block::

      idx = 0
      for i = 0 … group_count – 1
          for j = 0 … group_size – 1
              A and B are matrices in a[idx] and b[idx]
              if (left_right == side::left) then
                  compute X such that op(A) * X = alpha[i] * B
              else
                  compute X such that X * op(A) = alpha[i] * B
              end if
              B = X
              idx = idx + 1
          end for
      end for     

where:

- op(``A``) is one of op(``A``) = ``A``, or op(``A``) = ``A``\ :sup:`T`, or op(``A``) = ``A``\ :sup:`H`

- ``alpha`` is a scalar

- ``A`` is a triangular matrix of size ``m`` x ``m`` if ``left_right`` = ``side::left``, or ``n`` x ``n`` if ``left_right`` = ``side::right``

- ``B`` and ``X`` are ``m`` x ``n`` general matrices

On return, matrix ``B`` is overwritten by solution matrix ``X``.

For the group API, the ``a`` and ``b`` arrays contain the pointers for all the input matrices.
The total number of matrices in ``a`` and ``b`` is given by:

.. math::

      total\_batch\_count = \sum_{i=0}^{group\_count-1}group\_size[i]    
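
The sum above is easy to compute on the host when sizing the pointer arrays. The following sketch uses only the C++ standard library; ``group_offset`` is a hypothetical helper, not part of the API, that gives the position in ``a`` and ``b`` of the first matrix of a group.

.. code-block:: cpp

   #include <cassert>
   #include <cstdint>
   #include <numeric>
   #include <vector>

   // total_batch_count is the sum of all group sizes.
   std::int64_t total_batch_count(const std::vector<std::int64_t> &group_size) {
       return std::accumulate(group_size.begin(), group_size.end(),
                              std::int64_t{0});
   }

   // Hypothetical helper: index in a and b of the first matrix of group g.
   std::int64_t group_offset(const std::vector<std::int64_t> &group_size,
                             std::int64_t g) {
       assert(g >= 0 && g <= static_cast<std::int64_t>(group_size.size()));
       return std::accumulate(group_size.begin(), group_size.begin() + g,
                              std::int64_t{0});
   }

A group of size zero contributes no matrices, so it shares its offset with the group that follows it.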

Syntax
------

.. code-block:: cpp

   namespace oneapi::mkl::blas::column_major {
       sycl::event trsm_batch(sycl::queue &queue,
                              oneapi::mkl::side *left_right,
                              oneapi::mkl::uplo *upper_lower,
                              oneapi::mkl::transpose *trans,
                              oneapi::mkl::diag *unit_diag,
                              std::int64_t *m,
                              std::int64_t *n,
                              T *alpha,
                              const T **a,
                              std::int64_t *lda,
                              T **b,
                              std::int64_t *ldb,
                              std::int64_t group_count,
                              std::int64_t *group_size,
                              const std::vector<sycl::event> &dependencies = {})
   }

.. code-block:: cpp

   namespace oneapi::mkl::blas::row_major {
       sycl::event trsm_batch(sycl::queue &queue,
                              oneapi::mkl::side *left_right,
                              oneapi::mkl::uplo *upper_lower,
                              oneapi::mkl::transpose *trans,
                              oneapi::mkl::diag *unit_diag,
                              std::int64_t *m,
                              std::int64_t *n,
                              T *alpha,
                              const T **a,
                              std::int64_t *lda,
                              T **b,
                              std::int64_t *ldb,
                              std::int64_t group_count,
                              std::int64_t *group_size,
                              const std::vector<sycl::event> &dependencies = {})
   }

Input Parameters
----------------

queue
   The queue where the routine should be executed.

left_right
   Array of ``group_count`` ``oneapi::mkl::side`` values. ``left_right[i]`` specifies whether matrices
   ``A`` are on the left side or right side of the multiplication in group ``i``. See :ref:`data-types` for more details.

upper_lower
   Array of ``group_count`` ``oneapi::mkl::uplo`` values. ``upper_lower[i]`` specifies whether matrices
   ``A`` are upper or lower triangular in group ``i``. See :ref:`data-types` for more details.

trans
   Array of ``group_count`` ``oneapi::mkl::transpose`` values. ``trans[i]`` specifies op(``A``), the transposition operation
   applied to matrices ``A`` in group ``i``. See :ref:`data-types` for more details.

unit_diag
   Array of ``group_count`` ``oneapi::mkl::diag`` values. ``unit_diag[i]`` specifies whether matrices
   ``A`` are unit triangular or not in group ``i``. See :ref:`data-types` for more details.

m
   Array of ``group_count`` integers. ``m[i]`` specifies number of rows of matrices ``B`` in group ``i``. All entries must be at least zero.

n
   Array of ``group_count`` integers. ``n[i]`` specifies number of columns of matrices ``B`` in group ``i``. All entries must be at least zero.

alpha
   Array of ``group_count`` scalar elements. ``alpha[i]`` specifies scaling factors for the solutions in group ``i``.

a
   Array of ``total_batch_count`` pointers for input matrices ``A``. See :ref:`matrix-storage` for more details.

lda
   Array of ``group_count`` integers. ``lda[i]`` specifies leading dimension of matrices ``A`` in group ``i``. Must be at least ``m[i]`` if 
   ``left_right[i]`` = ``side::left`` or at least ``n[i]`` if ``left_right[i]`` = ``side::right``. All entries must be positive. 

b
   Array of ``total_batch_count`` pointers for input/output matrices ``B``. See :ref:`matrix-storage` for more details.

ldb
   Array of ``group_count`` integers. ``ldb[i]`` specifies leading dimension of matrices ``B`` in group ``i``. Must be at least ``m[i]`` if 
   column major layout or at least ``n[i]`` if row major layout is used. All entries must be positive. 

group_count
   Number of groups. Must be at least zero.

group_size
   Array of ``group_count`` integers. ``group_size[i]`` specifies the number of ``trsm`` operations in group ``i``. Each element in ``group_size`` must be at least zero.

dependencies
   List of events to wait for before starting computation, if any. If omitted, defaults to no dependencies.

Output Parameters
-----------------

b
   Array of pointers to output matrices ``B`` overwritten by ``total_batch_count`` solution matrices ``X``.

.. note::
   If ``alpha`` = 0, matrices ``B`` are set to zero, and ``A`` and ``B`` do not need to be initialized before calling ``trsm_batch``.

Return Values
-------------

Output event to wait on to ensure computation is complete.
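
The Group API operation defined earlier in this section can be mirrored by a short host-side loop, which shows how per-group parameters (indexed by ``i``) combine with the flat pointer arrays (indexed by ``idx``). This is a sketch under stated assumptions, not the library implementation: ``trsm_batch_group_ref`` is a hypothetical name, and every group is assumed to use column major layout, ``side::left``, lower triangular, no transposition, and a non-unit diagonal.

.. code-block:: cpp

   #include <cassert>
   #include <cstdint>

   // Hypothetical reference routine (not part of oneMKL).
   // Assumed for every group: column major, side::left, lower, nontrans, nonunit.
   void trsm_batch_group_ref(const std::int64_t *m, const std::int64_t *n,
                             const double *alpha, const double **a,
                             const std::int64_t *lda, double **b,
                             const std::int64_t *ldb, std::int64_t group_count,
                             const std::int64_t *group_size) {
       std::int64_t idx = 0;  // running index into a and b
       for (std::int64_t i = 0; i < group_count; ++i) {
           assert(lda[i] >= m[i] && ldb[i] >= m[i]);
           for (std::int64_t j = 0; j < group_size[i]; ++j, ++idx) {
               const double *A = a[idx];
               double *B = b[idx];
               for (std::int64_t col = 0; col < n[i]; ++col) {
                   for (std::int64_t row = 0; row < m[i]; ++row) {
                       // Forward substitution with the group's alpha.
                       double sum = alpha[i] * B[row + col * ldb[i]];
                       for (std::int64_t k = 0; k < row; ++k)
                           sum -= A[row + k * lda[i]] * B[k + col * ldb[i]];
                       B[row + col * ldb[i]] = sum / A[row + row * lda[i]];
                   }
               }
           }
       }
   }

Note that sizes, leading dimensions, and ``alpha`` are read once per group, while ``a`` and ``b`` advance once per matrix.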

**Strided API**
---------------

Strided API operation is defined as:

.. code-block::

      for i = 0 … batch_size – 1
          A and B are matrices at offset i * stridea and i * strideb in a and b.
          if (left_right == side::left) then
             compute X such that op(A) * X = alpha * B
          else 
             compute X such that X * op(A) = alpha * B
          B = X
      end for

where:

- op(``A``) is one of op(``A``) = ``A``, or op(``A``) = ``A``\ :sup:`T`, or op(``A``) = ``A``\ :sup:`H`

- ``alpha`` is a scalar

- ``A`` is a triangular matrix of size ``m`` x ``m`` if ``left_right`` = ``side::left``, or ``n`` x ``n`` if ``left_right`` = ``side::right``

- ``B`` and ``X`` are ``m`` x ``n`` general matrices

On return, matrix ``B`` is overwritten by solution matrix ``X``.

For the strided API, the ``a`` and ``b`` arrays contain all the input matrices. The stride between matrices is given by the stride parameters. The total number of matrices in the ``a`` and ``b`` arrays is given by the ``batch_size`` parameter.
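
When filling or reading the strided arrays on the host, it helps to have the flat index of one element written out. The helpers below are hypothetical (they are not part of the API) and assume the usual dense storage conventions: in column major layout consecutive elements of a column are adjacent, and in row major layout consecutive elements of a row are adjacent.

.. code-block:: cpp

   #include <cassert>
   #include <cstdint>

   // Flat index of element (row, col) of matrix i, column major layout.
   std::int64_t index_column_major(std::int64_t i, std::int64_t stride,
                                   std::int64_t ld, std::int64_t row,
                                   std::int64_t col) {
       assert(ld > 0);
       return i * stride + row + col * ld;
   }

   // Flat index of element (row, col) of matrix i, row major layout.
   std::int64_t index_row_major(std::int64_t i, std::int64_t stride,
                                std::int64_t ld, std::int64_t row,
                                std::int64_t col) {
       assert(ld > 0);
       return i * stride + row * ld + col;
   }

The same expressions apply to both ``a`` (with ``stridea``, ``lda``) and ``b`` (with ``strideb``, ``ldb``).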

Syntax
------

.. code-block:: cpp

   namespace oneapi::mkl::blas::column_major {
       sycl::event trsm_batch(sycl::queue &queue,
                              oneapi::mkl::side left_right,
                              oneapi::mkl::uplo upper_lower,
                              oneapi::mkl::transpose trans,
                              oneapi::mkl::diag unit_diag,
                              std::int64_t m,
                              std::int64_t n,
                              T alpha,
                              const T *a,
                              std::int64_t lda,
                              std::int64_t stridea,
                              T *b,
                              std::int64_t ldb,
                              std::int64_t strideb,
                              std::int64_t batch_size,
                              const std::vector<sycl::event> &dependencies = {})
   }

.. code-block:: cpp

   namespace oneapi::mkl::blas::row_major {
       sycl::event trsm_batch(sycl::queue &queue,
                              oneapi::mkl::side left_right,
                              oneapi::mkl::uplo upper_lower,
                              oneapi::mkl::transpose trans,
                              oneapi::mkl::diag unit_diag,
                              std::int64_t m,
                              std::int64_t n,
                              T alpha,
                              const T *a,
                              std::int64_t lda,
                              std::int64_t stridea,
                              T *b,
                              std::int64_t ldb,
                              std::int64_t strideb,
                              std::int64_t batch_size,
                              const std::vector<sycl::event> &dependencies = {})
   }

Input Parameters
----------------

queue
   The queue where the routine should be executed.


left_right
   Specifies whether matrices ``A`` are on the left side or right side of the multiplication. See :ref:`data-types` for more details.

upper_lower
   Specifies whether matrices ``A`` are upper or lower triangular. See :ref:`data-types` for more details.

trans
   Specifies op(``A``), transposition operation applied to matrices ``A``. See :ref:`data-types` for more details.

unit_diag
   Specifies whether matrices ``A`` are unit triangular or not. See :ref:`data-types` for more details.

m
   Number of rows of matrices ``B``. Must be at least zero.

n
   Number of columns of matrices ``B``. Must be at least zero.

alpha
   Scaling factor for the solution.

a
   Pointer to input matrices ``A``. Size of the array must be at least ``stridea`` * ``batch_size``.

lda
   Leading dimension of matrices ``A``. Must be at least ``m`` if ``left_right`` = ``side::left`` or at least ``n`` if ``left_right`` = ``side::right``. Must be positive.

stridea
   Stride between two consecutive ``A`` matrices.

b
   Pointer to input/output matrices ``B``. Size of the array must be at least ``strideb`` * ``batch_size``.

ldb
   Leading dimension of matrices ``B``. Must be at least ``m`` if column major layout or at least ``n`` if row major layout is used. Must be positive.

strideb
   Stride between two consecutive ``B`` matrices.

batch_size
   Specifies the number of triangular linear systems to solve.

dependencies
   List of events to wait for before starting computation, if any. If omitted, defaults to no dependencies.

Output Parameters
-----------------

b
   Pointer to output matrices ``B``, overwritten by ``batch_size`` solution matrices ``X``.

.. note::
   If ``alpha`` = 0, matrices ``B`` are set to zero, and ``A`` and ``B`` do not need to be initialized before calling ``trsm_batch``.

Return Values
-------------

Output event to wait on to ensure computation is complete.
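
Once the returned event has completed and the results are accessible on the host, a solution can be sanity-checked by substituting it back into the equation. The sketch below is one possible check, not something prescribed by this page: ``max_residual_left`` is a hypothetical helper that assumes column major layout, ``side::left``, a lower triangular non-unit ``A`` without transposition, and a saved copy ``B0`` of the original right-hand side.

.. code-block:: cpp

   #include <algorithm>
   #include <cassert>
   #include <cmath>
   #include <cstdint>

   // Largest entry of |A * X - alpha * B0| for one problem of the batch.
   // Assumed case: column major, side::left, lower, nontrans, nonunit.
   double max_residual_left(std::int64_t m, std::int64_t n, double alpha,
                            const double *A, std::int64_t lda,
                            const double *B0, const double *X,
                            std::int64_t ldb) {
       assert(lda >= m && ldb >= m);
       double worst = 0.0;
       for (std::int64_t col = 0; col < n; ++col) {
           for (std::int64_t row = 0; row < m; ++row) {
               double ax = 0.0;
               // Only the lower triangle of A (k <= row) is referenced.
               for (std::int64_t k = 0; k <= row; ++k)
                   ax += A[row + k * lda] * X[k + col * ldb];
               const double r = std::abs(ax - alpha * B0[row + col * ldb]);
               worst = std::max(worst, r);
           }
       }
       return worst;
   }

In floating point the residual is generally small but nonzero, so it should be compared against a tolerance suited to the precision ``T`` in use.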