diff --git a/fix-rankk-rank2k/Makefile b/fix-rankk-rank2k/Makefile
new file mode 100644
index 00000000..655cdec6
--- /dev/null
+++ b/fix-rankk-rank2k/Makefile
@@ -0,0 +1,3 @@
+include ../P0009/wg21/Makefile
+
+.DEFAULT_GOAL := $(HTML)
diff --git a/fix-rankk-rank2k/P3371R0.html b/fix-rankk-rank2k/P3371R0.html
new file mode 100644
index 00000000..c2b73a5b
--- /dev/null
+++ b/fix-rankk-rank2k/P3371R0.html
@@ -0,0 +1,1301 @@
Document #: P3371R0
Date: 2024-08-07
Project: Programming Language C++
Audience: LEWG
Reply-to: Mark Hoemmen <mhoemmen@nvidia.com>
The four std::linalg functions
+symmetric_matrix_rank_k_update,
+hermitian_matrix_rank_k_update,
+symmetric_matrix_rank_2k_update, and
+hermitian_matrix_rank_2k_update currently have behavior
+inconsistent with their corresponding BLAS (Basic Linear Algebra
+Subroutines) routines. Also, their behavior is inconsistent with
+std::linalg’s matrix_product, even though in
+mathematical terms they are special cases of a matrix-matrix
+product.
We propose fixing this by making two changes. First, we will add
+“updating” overloads analogous to the updating overloads of
+matrix_product. For example,
+symmetric_matrix_rank_k_update(A, scaled(beta, C), C, upper_triangle)
+will perform C := βC + AAT.
+Second, we will change the behavior of the existing overloads to be
+“overwriting,” without changing the number of their parameters or how
+they are called. For example,
+symmetric_matrix_rank_k_update(A, C, upper_triangle) will
+perform C := AAT
+instead of C := C + AAT.
The second set of changes is a breaking change. Thus, we must finish +this before finalization of C++26.
+Each function in std::linalg generally corresponds to one or more +routines or functions in the original BLAS (Basic Linear Algebra +Subroutines). Every computation that the BLAS can do, a function in +std::linalg should be able to do.
+One std::linalg user
+reported
+an exception to this rule. The BLAS routine DSYRK
+(Double-precision SYmmetric Rank-K update) computes C := βC + αAAT,
+but the corresponding std::linalg function
+symmetric_matrix_rank_k_update only computes C := C + αAAT.
+That is, std::linalg currently has no way to express this
+BLAS operation with a general β scaling factor. This issue applies
+to all of the following functions.
- symmetric_matrix_rank_k_update: computes C := C + αAAT
- hermitian_matrix_rank_k_update: computes C := C + αAAH
- symmetric_matrix_rank_2k_update: computes C := C + αABT + αBAT
- hermitian_matrix_rank_2k_update: computes C := C + αABH + ᾱBAH, where ᾱ denotes the complex conjugate of α

These functions implement special cases of matrix-matrix products.
+The matrix_product function in std::linalg
+implements the general case of matrix-matrix products. This function
+corresponds to the BLAS’s SGEMM, DGEMM,
+CGEMM, and ZGEMM, which compute C := βC + αAB,
+where α and β are scaling factors. The
+matrix_product function has two kinds of overloads:
- overwriting (C = AB), and
- updating (C = E + AB).
The updating overloads handle the general α and β case by
+matrix_product(scaled(alpha, A), B, scaled(beta, C), C).
+The specification explicitly permits the input
+scaled(beta, C) to alias the output C
+([linalg.algs.blas3.gemm] 10: “Remarks:
+C may alias E”). The std::linalg
+library provides overwriting and updating overloads so that it can do
+everything that the BLAS does, just in a more idiomatically C++ way.
+Please see
+P1673R13
+Section 10.3 (“Function argument aliasing and zero scalar
+multipliers”) for a more detailed explanation.
The problem with functions like
+symmetric_matrix_rank_k_update is that they have the same
+calling syntax as the overwriting version of
+matrix_product, but semantics that differ from
+both the overwriting and the updating versions of
+matrix_product. The current overloads are not overwriting,
+so we can’t just fix this problem by introducing an “updating” overload
+for each function.
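To make the mismatch concrete, consider the following sketch (assuming suitable objects alpha, A, and C; the triangle argument is illustrative, and matrix_product writes all of C rather than just one triangle).

// Current behavior: updates C, that is, C := C + alpha * A * A^T
// (only the specified triangle of C is accessed).
symmetric_matrix_rank_k_update(alpha, A, C, upper_triangle);

// The mathematically corresponding matrix_product call overwrites C,
// that is, C = alpha * A * A^T.
matrix_product(scaled(alpha, A), transposed(A), C);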
Incidentally, the fact that these functions have “update” in their
+name is not relevant, because that naming choice is original to the
+BLAS. The BLAS calls its corresponding xSYRK,
+xHERK, xSYR2K, and xHER2K
routines “{Symmetric, Hermitian} rank {k, 2k} update,” even though
+setting β = 0 makes these
+routines “overwriting” in the sense of std::linalg.
We propose to fix this by making the four functions work just like
+matrix_vector_product or matrix_product. This
+entails two changes.
We will add “updating” overloads with a new input matrix
+parameter E, analogous to the updating overloads of
+matrix_product. For example,
+symmetric_matrix_rank_k_update(A, E, C, upper_triangle)
+will compute C = E + AAT.
+We will specifically permit C and E to alias,
+thus permitting the desired case where E is
+scaled(beta, C). The updating overloads will take
+E as an in-matrix, and will change
+C from a possibly-packed-inout-matrix
+to a possibly-packed-out-matrix.
We will change the behavior of the existing overloads to be
+“overwriting,” without changing how they are called. For example,
+symmetric_matrix_rank_k_update(A, C, upper_triangle) will
+compute C = AAT
+instead of C := C + AAT.
+The overwriting overloads will change C from a
+possibly-packed-inout-matrix to a
+possibly-packed-out-matrix.
This second set of changes is a breaking change. It needs to be so +that we can provide the overwriting behavior C := αAAT +that the corresponding BLAS routines already provide. Thus, we must +finish this before finalization of C++26.
+Both sets of overloads still only write to the specified triangle
+(lower or upper) of the output matrix C. As a result, the
+new updating overloads only read from that triangle of the input matrix
+E. Therefore, even though E may be a different
+matrix than C, the updating overloads do not need an
+additional Triangle t_E parameter for E. The
+symmetric_* functions interpret E as symmetric
+in the same way that they interpret C as symmetric, and the
+hermitian_* functions interpret E as Hermitian
+in the same way that they interpret C as Hermitian.
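As a sketch of the proposed usage (assuming this proposal is adopted, with A, C, and beta as above), the DSYRK-style operation C := βC + AAT would be written as follows.

// Proposed updating overload: C = E + A * A^T, with E = scaled(beta, C).
// E may alias C, and only the specified triangle of C is written
// (and only that triangle of E is read).
symmetric_matrix_rank_k_update(A, scaled(beta, C), C, upper_triangle);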
The four functions we propose to change have the following rank-1 and +rank-2 analogs, where A +denotes a symmetric or Hermitian matrix (depending on the function’s +name) and x and y denote vectors.
- symmetric_matrix_rank_1_update: computes A := A + αxxT
- hermitian_matrix_rank_1_update: computes A := A + αxxH
- symmetric_matrix_rank_2_update: computes A := A + αxyT + αyxT
- hermitian_matrix_rank_2_update: computes A := A + αxyH + ᾱyxH

We do NOT propose to change these functions analogously to the rank-k
+and rank-2k update functions. This is because the BLAS routines
+corresponding to the rank-1 and rank-2 functions – xSYR,
+xHER, xSYR2, and xHER2 – do not
+have a way to supply a β
+scaling factor. That is, these std::linalg functions can
+already do everything that their corresponding BLAS routines can do.
+This is consistent with our design intent in
+Section
10.3 of P1673R13 for translating Fortran INTENT(INOUT)
+arguments into a C++ idiom.
- Else, if the BLAS function unconditionally updates (like xGER), we retain read-and-write behavior for that argument.

- Else, if the BLAS function uses a scalar beta argument to decide whether to read the output argument as well as write to it (like xGEMM), we provide two versions: a write-only [that is, “overwriting”] version (as if beta is zero), and a read-and-write [that is, “updating”] version (as if beta is nonzero).
The rank-1 and rank-2 update functions “unconditionally update,” in
+the same way that the BLAS’s general rank-1 update routine
+xGER does. However, the BLAS’s rank-k and rank-2k update
+functions “use a scalar beta argument…,” so for
+consistency, it makes sense for std::linalg to provide both
+overwriting and updating versions.
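For example (a sketch assuming a vector x, a matrix A, and a scalar alpha), the existing rank-1 function already covers everything its BLAS routine xSYR can do, because xSYR has no β scaling factor.

// Existing overload with a Scalar alpha parameter:
// unconditionally updates A, that is, A := A + alpha * x * x^T.
symmetric_matrix_rank_1_update(alpha, x, A, upper_triangle);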
Since we do not propose changing the symmetric and Hermitian rank-1
+and rank-2 functions, we retain the exposition-only concept
+possibly-packed-inout-matrix, which they use to
+constrain their parameter A.
There are some wording issues in
+[linalg.algs.blas3.rankk] and
+[linalg.algs.blas3.rank2k], specifically in the
+Mandates, Preconditions, and Complexity clauses. These clauses do not
+reflect the design intent, which is expressed by the corresponding BLAS
+routines and the mathematical expressions in the current wording that
+express the functions’ behavior. We have not yet filed an LWG issue for
+these wording issues. Given that the above changes will call for
+adjusting some parts of these clauses anyway (e.g., by changing
+InOutMat to OutMat), we have decided to
+propose “drive-by” fixes to these clauses in this proposal. We
+nevertheless plan to file an LWG issue to fix both these and other
+wording issues we have found.
Many thanks (with permission) to Raffaele Solcà (CSCS Swiss National
+Supercomputing Centre, raffaele.solca@cscs.ch) for pointing
+out this issue and related issues that we plan to file as part of the
+aforementioned LWG issue.
++Text in blockquotes is not proposed wording, but rather instructions +for generating proposed wording. The � character is used to denote a +placeholder section number which the editor shall determine.
+In [version.syn], for the following definition,
+
#define __cpp_lib_linalg YYYYMML // also in <linalg>++adjust the placeholder value
+YYYYMMLas needed so as to +denote this proposal’s date of adoption.
++In the header
+<linalg>synopsis +[linalg.syn], immediately before the following,
template<class T>
+ concept possibly-packed-inout-matrix = see below; // exposition only++add the following.
+
template<class T>
+ concept possibly-packed-out-matrix = see below; // exposition onlyThen, to [linalg.helpers.concepts], immediately +before the following,
+template<class T>
+ concept possibly-packed-inout-matrix =
+ is-mdspan<T> && T::rank() == 2 &&
+ is_assignable_v<typename T::reference, typename T::element_type> &&
+ (T::is_always_unique() || is-layout-blas-packed<typename T::layout_type>);++add the following definition of the exposition-only concept +
+possibly-packed-out-matrix.
template<class T>
+ concept possibly-packed-out-matrix =
+ is-mdspan<T> && T::rank() == 2 &&
+ is_assignable_v<typename T::reference, typename T::element_type> &&
+ (T::is_always_unique() || is-layout-blas-packed<typename T::layout_type>);++In the header
+<linalg>synopsis +[linalg.syn], change the following
// rank-k symmetric matrix update
+ template<class Scalar, in-matrix InMat, possibly-packed-inout-matrix InOutMat, class Triangle>
+ void symmetric_matrix_rank_k_update(Scalar alpha, InMat A, InOutMat C, Triangle t);
+ template<class ExecutionPolicy, class Scalar,
+ in-matrix InMat, possibly-packed-inout-matrix InOutMat, class Triangle>
+ void symmetric_matrix_rank_k_update(ExecutionPolicy&& exec,
+ Scalar alpha, InMat A, InOutMat C, Triangle t);
+
+ template<in-matrix InMat, possibly-packed-inout-matrix InOutMat, class Triangle>
+ void symmetric_matrix_rank_k_update(InMat A, InOutMat C, Triangle t);
+ template<class ExecutionPolicy,
+ in-matrix InMat, possibly-packed-inout-matrix InOutMat, class Triangle>
+ void symmetric_matrix_rank_k_update(ExecutionPolicy&& exec,
+ InMat A, InOutMat C, Triangle t);
+
+ // rank-k Hermitian matrix update
+ template<class Scalar, in-matrix InMat, possibly-packed-inout-matrix InOutMat, class Triangle>
+ void hermitian_matrix_rank_k_update(Scalar alpha, InMat A, InOutMat C, Triangle t);
+ template<class ExecutionPolicy,
+ class Scalar, in-matrix InMat, possibly-packed-inout-matrix InOutMat, class Triangle>
+ void hermitian_matrix_rank_k_update(ExecutionPolicy&& exec,
+ Scalar alpha, InMat A, InOutMat C, Triangle t);
+
+ template<in-matrix InMat, possibly-packed-inout-matrix InOutMat, class Triangle>
+ void hermitian_matrix_rank_k_update(InMat A, InOutMat C, Triangle t);
+ template<class ExecutionPolicy,
+ in-matrix InMat, possibly-packed-inout-matrix InOutMat, class Triangle>
+ void hermitian_matrix_rank_k_update(ExecutionPolicy&& exec,
+ InMat A, InOutMat C, Triangle t);++to read as follows.
+
// overwriting rank-k symmetric matrix update
+ template<class Scalar,
+ in-matrix InMat,
+ possibly-packed-out-matrix OutMat,
+ class Triangle>
+ void symmetric_matrix_rank_k_update(
+ Scalar alpha, InMat A, OutMat C, Triangle t);
+ template<class ExecutionPolicy, class Scalar,
+ in-matrix InMat,
+ possibly-packed-out-matrix OutMat,
+ class Triangle>
+ void symmetric_matrix_rank_k_update(
+ ExecutionPolicy&& exec, Scalar alpha, InMat A, OutMat C, Triangle t);
+ template<in-matrix InMat,
+ possibly-packed-out-matrix OutMat,
+ class Triangle>
+ void symmetric_matrix_rank_k_update(
+ InMat A, OutMat C, Triangle t);
+ template<class ExecutionPolicy,
+ in-matrix InMat,
+ possibly-packed-out-matrix OutMat,
+ class Triangle>
+ void symmetric_matrix_rank_k_update(
+ ExecutionPolicy&& exec, InMat A, OutMat C, Triangle t);
+
+ // updating rank-k symmetric matrix update
+ template<class Scalar,
+ in-matrix InMat1,
+ in-matrix InMat2,
+ possibly-packed-out-matrix OutMat,
+ class Triangle>
+ void symmetric_matrix_rank_k_update(
+ Scalar alpha,
+ InMat1 A, InMat2 E, OutMat C, Triangle t);
+ template<class ExecutionPolicy, class Scalar,
+ in-matrix InMat1,
+ in-matrix InMat2,
+ possibly-packed-out-matrix OutMat,
+ class Triangle>
+ void symmetric_matrix_rank_k_update(
+ ExecutionPolicy&& exec, Scalar alpha,
+ InMat1 A, InMat2 E, OutMat C, Triangle t);
+ template<in-matrix InMat1,
+ in-matrix InMat2,
+ possibly-packed-out-matrix OutMat,
+ class Triangle>
+ void symmetric_matrix_rank_k_update(
+ InMat1 A, InMat2 E, OutMat C, Triangle t);
+ template<class ExecutionPolicy,
+ in-matrix InMat1,
+ in-matrix InMat2,
+ possibly-packed-out-matrix OutMat,
+ class Triangle>
+ void symmetric_matrix_rank_k_update(
+ ExecutionPolicy&& exec,
+ InMat1 A, InMat2 E, OutMat C, Triangle t);
+
+ // overwriting rank-k Hermitian matrix update
+ template<class Scalar,
+ in-matrix InMat,
+ possibly-packed-out-matrix OutMat,
+ class Triangle>
+ void hermitian_matrix_rank_k_update(
+ Scalar alpha, InMat A, OutMat C, Triangle t);
+ template<class ExecutionPolicy, class Scalar,
+ in-matrix InMat,
+ possibly-packed-out-matrix OutMat,
+ class Triangle>
+ void hermitian_matrix_rank_k_update(
+ ExecutionPolicy&& exec, Scalar alpha, InMat A, OutMat C, Triangle t);
+ template<in-matrix InMat,
+ possibly-packed-out-matrix OutMat,
+ class Triangle>
+ void hermitian_matrix_rank_k_update(
+ InMat A, OutMat C, Triangle t);
+ template<class ExecutionPolicy,
+ in-matrix InMat,
+ possibly-packed-out-matrix OutMat,
+ class Triangle>
+ void hermitian_matrix_rank_k_update(
+ ExecutionPolicy&& exec, InMat A, OutMat C, Triangle t);
+
+ // updating rank-k Hermitian matrix update
+ template<class Scalar,
+ in-matrix InMat1,
+ in-matrix InMat2,
+ possibly-packed-out-matrix OutMat,
+ class Triangle>
+ void hermitian_matrix_rank_k_update(
+ Scalar alpha,
+ InMat1 A, InMat2 E, OutMat C, Triangle t);
+ template<class ExecutionPolicy, class Scalar,
+ in-matrix InMat1,
+ in-matrix InMat2,
+ possibly-packed-out-matrix OutMat,
+ class Triangle>
+ void hermitian_matrix_rank_k_update(
+ ExecutionPolicy&& exec, Scalar alpha,
+ InMat1 A, InMat2 E, OutMat C, Triangle t);
+ template<in-matrix InMat1,
+ in-matrix InMat2,
+ possibly-packed-out-matrix OutMat,
+ class Triangle>
+ void hermitian_matrix_rank_k_update(
+ InMat1 A, InMat2 E, OutMat C, Triangle t);
+ template<class ExecutionPolicy,
+ in-matrix InMat1,
+ in-matrix InMat2,
+ possibly-packed-out-matrix OutMat,
+ class Triangle>
+ void hermitian_matrix_rank_k_update(
+ ExecutionPolicy&& exec,
+ InMat1 A, InMat2 E, OutMat C, Triangle t);++In the header
+<linalg>synopsis +[linalg.syn], change the following
// rank-2k symmetric matrix update
+ template<in-matrix InMat1, in-matrix InMat2,
+ possibly-packed-inout-matrix InOutMat, class Triangle>
+ void symmetric_matrix_rank_2k_update(InMat1 A, InMat2 B, InOutMat C, Triangle t);
+ template<class ExecutionPolicy,
+ in-matrix InMat1, in-matrix InMat2,
+ possibly-packed-inout-matrix InOutMat, class Triangle>
+ void symmetric_matrix_rank_2k_update(ExecutionPolicy&& exec,
+ InMat1 A, InMat2 B, InOutMat C, Triangle t);
+
+ // rank-2k Hermitian matrix update
+ template<in-matrix InMat1, in-matrix InMat2,
+ possibly-packed-inout-matrix InOutMat, class Triangle>
+ void hermitian_matrix_rank_2k_update(InMat1 A, InMat2 B, InOutMat C, Triangle t);
+ template<class ExecutionPolicy,
+ in-matrix InMat1, in-matrix InMat2,
+ possibly-packed-inout-matrix InOutMat, class Triangle>
+ void hermitian_matrix_rank_2k_update(ExecutionPolicy&& exec,
+ InMat1 A, InMat2 B, InOutMat C, Triangle t);++to read as follows.
+
// overwriting rank-2k symmetric matrix update
+ template<in-matrix InMat1, in-matrix InMat2,
+ possibly-packed-out-matrix OutMat, class Triangle>
+ void symmetric_matrix_rank_2k_update(InMat1 A, InMat2 B, OutMat C, Triangle t);
+ template<class ExecutionPolicy,
+ in-matrix InMat1, in-matrix InMat2,
+ possibly-packed-out-matrix OutMat, class Triangle>
+ void symmetric_matrix_rank_2k_update(ExecutionPolicy&& exec,
+ InMat1 A, InMat2 B, OutMat C, Triangle t);
+
+ // updating rank-2k symmetric matrix update
+ template<in-matrix InMat1, in-matrix InMat2, in-matrix InMat3,
+ possibly-packed-out-matrix OutMat, class Triangle>
+ void symmetric_matrix_rank_2k_update(InMat1 A, InMat2 B, InMat3 E, OutMat C, Triangle t);
+ template<class ExecutionPolicy,
+ in-matrix InMat1, in-matrix InMat2, in-matrix InMat3,
+ possibly-packed-out-matrix OutMat, class Triangle>
+ void symmetric_matrix_rank_2k_update(ExecutionPolicy&& exec,
+ InMat1 A, InMat2 B, InMat3 E, OutMat C, Triangle t);
+
+ // overwriting rank-2k Hermitian matrix update
+ template<in-matrix InMat1, in-matrix InMat2,
+ possibly-packed-out-matrix OutMat, class Triangle>
+ void hermitian_matrix_rank_2k_update(InMat1 A, InMat2 B, OutMat C, Triangle t);
+ template<class ExecutionPolicy,
+ in-matrix InMat1, in-matrix InMat2,
+ possibly-packed-out-matrix OutMat, class Triangle>
+ void hermitian_matrix_rank_2k_update(ExecutionPolicy&& exec,
+ InMat1 A, InMat2 B, OutMat C, Triangle t);
+
+ // updating rank-2k Hermitian matrix update
+ template<in-matrix InMat1, in-matrix InMat2, in-matrix InMat3,
+ possibly-packed-out-matrix OutMat, class Triangle>
+ void hermitian_matrix_rank_2k_update(InMat1 A, InMat2 B, InMat3 E, OutMat C, Triangle t);
+ template<class ExecutionPolicy,
+ in-matrix InMat1, in-matrix InMat2, in-matrix InMat3,
+ possibly-packed-out-matrix OutMat, class Triangle>
+ void hermitian_matrix_rank_2k_update(ExecutionPolicy&& exec,
+ InMat1 A, InMat2 B, InMat3 E, OutMat C, Triangle t);++Replace the entire contents of [linalg.algs.blas3.rankk] with the +following.
+
[Note: These functions correspond to the BLAS functions
+xSYRK and xHERK. – end note]
1 +The following elements apply to all functions in +[linalg.algs.blas3.rankk].
+2 +Mandates:
+(2.1) If
+OutMat has layout_blas_packed layout, then the
+layout’s Triangle template argument has the same type as
+the function’s Triangle template argument.
(2.2)
+possibly-multipliable<decltype(A), decltype(transposed(A)), decltype(C)>
+is true.
(2.3)
+possibly-addable<decltype(C), decltype(E), decltype(C)>
+is true for those overloads that take an E
+parameter.
3 +Preconditions:
+(3.1)
+multipliable(A, transposed(A), C) is
+true. [Note: This implies that C is
+square. – end note]
(3.2)
+addable(C, E, C) is true
+for those overloads that take an E parameter.
4
+Complexity: O(
+A.extent(0) ⋅
+A.extent(1) ⋅
+A.extent(0) ).
template<class Scalar,
+ in-matrix InMat,
+ possibly-packed-out-matrix OutMat,
+ class Triangle>
+void symmetric_matrix_rank_k_update(
+ Scalar alpha,
+ InMat A,
+ OutMat C,
+ Triangle t);
+template<class ExecutionPolicy,
+ class Scalar,
+ in-matrix InMat,
+ possibly-packed-out-matrix OutMat,
+ class Triangle>
+void symmetric_matrix_rank_k_update(
+ ExecutionPolicy&& exec,
+ Scalar alpha,
+ InMat A,
+ OutMat C,
+ Triangle t);5
+Effects: Computes C = αAAT,
+where the scalar α is
+alpha.
template<in-matrix InMat,
+ possibly-packed-out-matrix OutMat,
+ class Triangle>
+void symmetric_matrix_rank_k_update(
+ InMat A,
+ OutMat C,
+ Triangle t);
+template<class ExecutionPolicy,
+ in-matrix InMat,
+ possibly-packed-out-matrix OutMat,
+ class Triangle>
+void symmetric_matrix_rank_k_update(
+ ExecutionPolicy&& exec,
+ InMat A,
+ OutMat C,
+ Triangle t);6 +Effects: Computes C = AAT.
+template<class Scalar,
+ in-matrix InMat,
+ possibly-packed-out-matrix OutMat,
+ class Triangle>
+void hermitian_matrix_rank_k_update(
+ Scalar alpha,
+ InMat A,
+ OutMat C,
+ Triangle t);
+template<class ExecutionPolicy,
+ class Scalar,
+ in-matrix InMat,
+ possibly-packed-out-matrix OutMat,
+ class Triangle>
+void hermitian_matrix_rank_k_update(
+ ExecutionPolicy&& exec,
+ Scalar alpha,
+ InMat A,
+ OutMat C,
+ Triangle t);7
+Effects: Computes C = αAAH,
+where the scalar α is
+alpha.
template<in-matrix InMat,
+ possibly-packed-out-matrix OutMat,
+ class Triangle>
+void hermitian_matrix_rank_k_update(
+ InMat A,
+ OutMat C,
+ Triangle t);
+template<class ExecutionPolicy,
+ in-matrix InMat,
+ possibly-packed-out-matrix OutMat,
+ class Triangle>
+void hermitian_matrix_rank_k_update(
+ ExecutionPolicy&& exec,
+ InMat A,
+ OutMat C,
+ Triangle t);8 +Effects: Computes C = AAH.
+template<class Scalar,
+ in-matrix InMat1,
+ in-matrix InMat2,
+ possibly-packed-out-matrix OutMat,
+ class Triangle>
+void symmetric_matrix_rank_k_update(
+ Scalar alpha,
+ InMat1 A,
+ InMat2 E,
+ OutMat C,
+ Triangle t);
+template<class ExecutionPolicy,
+ class Scalar,
+ in-matrix InMat1,
+ in-matrix InMat2,
+ possibly-packed-out-matrix OutMat,
+ class Triangle>
+void symmetric_matrix_rank_k_update(
+ ExecutionPolicy&& exec,
+ Scalar alpha,
+ InMat1 A,
+ InMat2 E,
+ OutMat C,
+ Triangle t);9
+Effects: Computes C = E + αAAT,
+where the scalar α is
+alpha.
template<in-matrix InMat1,
+ in-matrix InMat2,
+ possibly-packed-out-matrix OutMat,
+ class Triangle>
+void symmetric_matrix_rank_k_update(
+ InMat1 A,
+ InMat2 E,
+ OutMat C,
+ Triangle t);
+template<class ExecutionPolicy,
+ in-matrix InMat1,
+ in-matrix InMat2,
+ possibly-packed-out-matrix OutMat,
+ class Triangle>
+void symmetric_matrix_rank_k_update(
+ ExecutionPolicy&& exec,
+ InMat1 A,
+ InMat2 E,
+ OutMat C,
+ Triangle t);10 +Effects: Computes C = E + AAT.
+template<class Scalar,
+ in-matrix InMat1,
+ in-matrix InMat2,
+ possibly-packed-out-matrix OutMat,
+ class Triangle>
+void hermitian_matrix_rank_k_update(
+ Scalar alpha,
+ InMat1 A,
+ InMat2 E,
+ OutMat C,
+ Triangle t);
+template<class ExecutionPolicy,
+ class Scalar,
+ in-matrix InMat1,
+ in-matrix InMat2,
+ possibly-packed-out-matrix OutMat,
+ class Triangle>
+void hermitian_matrix_rank_k_update(
+ ExecutionPolicy&& exec,
+ Scalar alpha,
+ InMat1 A,
+ InMat2 E,
+ OutMat C,
+ Triangle t);11
+Effects: Computes C = E + αAAH,
+where the scalar α is
+alpha.
template<in-matrix InMat1,
+ in-matrix InMat2,
+ possibly-packed-out-matrix OutMat,
+ class Triangle>
+void hermitian_matrix_rank_k_update(
+ InMat1 A,
+ InMat2 E,
+ OutMat C,
+ Triangle t);
+template<class ExecutionPolicy,
+ in-matrix InMat1,
+ in-matrix InMat2,
+ possibly-packed-out-matrix OutMat,
+ class Triangle>
+void hermitian_matrix_rank_k_update(
+ ExecutionPolicy&& exec,
+ InMat1 A,
+ InMat2 E,
+ OutMat C,
+ Triangle t);12 +Effects: Computes C = E + AAH.
+++Replace the entire contents of [linalg.algs.blas3.rank2k] with the +following.
+
[Note: These functions correspond to the BLAS functions
+xSYR2K and xHER2K. – end note]
1 +The following elements apply to all functions in +[linalg.algs.blas3.rank2k].
+2 +Mandates:
+(2.1) If
+OutMat has layout_blas_packed layout, then the
+layout’s Triangle template argument has the same type as
+the function’s Triangle template argument;
(2.2)
+possibly-multipliable<decltype(A), decltype(transposed(B)), decltype(C)>
+is true.
(2.3)
+possibly-multipliable<decltype(B), decltype(transposed(A)), decltype(C)>
+is true.
(2.4)
+possibly-addable<decltype(C), decltype(E), decltype(C)>
+is true for those overloads that take an E
+parameter.
3 +Preconditions:
+(3.1)
+multipliable(A, transposed(B), C) is
+true.
(3.2)
+multipliable(B, transposed(A), C) is
+true. [Note: This and the previous imply that
+C is square. – end note]
(3.3)
+addable(C, E, C) is true
+for those overloads that take an E parameter.
4
+Complexity: O(
+A.extent(0) ⋅
+A.extent(1) ⋅
+B.extent(0) )
template<in-matrix InMat1,
+ in-matrix InMat2,
+ possibly-packed-out-matrix OutMat,
+ class Triangle>
+void symmetric_matrix_rank_2k_update(
+ InMat1 A,
+ InMat2 B,
+ OutMat C,
+ Triangle t);
+template<class ExecutionPolicy,
+ in-matrix InMat1,
+ in-matrix InMat2,
+ possibly-packed-out-matrix OutMat,
+ class Triangle>
+void symmetric_matrix_rank_2k_update(
+ ExecutionPolicy&& exec,
+ InMat1 A,
+ InMat2 B,
+ OutMat C,
+ Triangle t);5 +Effects: Computes C = ABT + BAT.
+template<in-matrix InMat1,
+ in-matrix InMat2,
+ possibly-packed-out-matrix OutMat,
+ class Triangle>
+void hermitian_matrix_rank_2k_update(
+ InMat1 A,
+ InMat2 B,
+ OutMat C,
+ Triangle t);
+template<class ExecutionPolicy,
+ in-matrix InMat1,
+ in-matrix InMat2,
+ possibly-packed-out-matrix OutMat,
+ class Triangle>
+void hermitian_matrix_rank_2k_update(
+ ExecutionPolicy&& exec,
+ InMat1 A,
+ InMat2 B,
+ OutMat C,
+ Triangle t);6 +Effects: Computes C = ABH + BAH.
+template<in-matrix InMat1,
+ in-matrix InMat2,
+ in-matrix InMat3,
+ possibly-packed-out-matrix OutMat,
+ class Triangle>
+void symmetric_matrix_rank_2k_update(
+ InMat1 A,
+ InMat2 B,
+ InMat3 E,
+ OutMat C,
+ Triangle t);
+template<class ExecutionPolicy,
+ in-matrix InMat1,
+ in-matrix InMat2,
+ in-matrix InMat3,
+ possibly-packed-out-matrix OutMat,
+ class Triangle>
+void symmetric_matrix_rank_2k_update(
+ ExecutionPolicy&& exec,
+ InMat1 A,
+ InMat2 B,
+ InMat3 E,
+ OutMat C,
+ Triangle t);7 +Effects: Computes C = E + ABT + BAT.
+template<in-matrix InMat1,
+ in-matrix InMat2,
+ in-matrix InMat3,
+ possibly-packed-out-matrix OutMat,
+ class Triangle>
+void hermitian_matrix_rank_2k_update(
+ InMat1 A,
+ InMat2 B,
+ InMat3 E,
+ OutMat C,
+ Triangle t);
+template<class ExecutionPolicy,
+ in-matrix InMat1,
+ in-matrix InMat2,
+ in-matrix InMat3,
+ possibly-packed-out-matrix OutMat,
+ class Triangle>
+void hermitian_matrix_rank_2k_update(
+ ExecutionPolicy&& exec,
+ InMat1 A,
+ InMat2 B,
+ InMat3 E,
+ OutMat C,
+ Triangle t);8 +Effects: Computes C = E + ABH + BAH.
Document #: P3371R1
Date: 2024-09-11
Project: Programming Language C++
Audience: LEWG
Reply-to: Mark Hoemmen <mhoemmen@nvidia.com>
          Ilya Burylov <burylov@gmail.com>
Revision 0 to be submitted 2024-08-15
Revision 1 to be submitted 2024-09-15
+Main wording changes from R0:
+Instead of just changing the symmetric and Hermitian rank-k and +rank-2k updates to have both overwriting and updating overloads, change +all the update functions in this way: rank-1 (general, +symmetric, and Hermitian), rank-2, rank-k, and rank-2k
For all the new updating overloads, specify that C
+(or A) may alias E
For the new symmetric and Hermitian updating overloads, specify
+that the functions access the new E parameter in the same
+way (e.g., with respect to the lower or upper triangle) as the
+C (or A) parameter
Add exposition-only concept noncomplex to
+constrain a scaling factor to be noncomplex, as needed for Hermitian
+rank-1 and rank-k functions
Add Ilya Burylov as coauthor
Change title and abstract to express the wording changes
Add nonwording section explaining why we change rank-1 and rank-2 +updates to be consistent with rank-k and rank-2k updates
Add nonwording sections explaining why we don’t change
+hermitian_matrix_vector_product,
+hermitian_matrix_product,
+triangular_matrix_product, or the triangular
+solves
Add nonwording section explaining why we constrain some scaling +factors to be noncomplex at compile time, instead of taking a run-time +approach
Reorganize and expand nonwording sections
The [linalg] functions matrix_rank_1_update,
+matrix_rank_1_update_c,
+symmetric_rank_1_update,
+hermitian_rank_1_update,
+symmetric_matrix_rank_k_update,
+hermitian_matrix_rank_k_update,
+symmetric_matrix_rank_2k_update, and
+hermitian_matrix_rank_2k_update currently have behavior
+inconsistent with their corresponding BLAS (Basic Linear Algebra
+Subroutines) routines. Also, the behavior of the rank-k and rank-2k
+updates is inconsistent with that of matrix_product, even
+though in mathematical terms they are special cases of a matrix-matrix
+product. We propose three fixes.
Add “updating” overloads to the rank-1, rank-2, rank-k, and
+rank-2k update functions. The new overloads are analogous to the
+updating overloads of matrix_product. For example,
+symmetric_matrix_rank_k_update(A, scaled(beta, C), C, upper_triangle)
+will perform C := βC + AAT.
Change the behavior of the existing rank-1, rank-2, rank-k, and
+rank-2k update functions to be “overwriting” instead of “unconditionally
+updating.” For example,
+symmetric_matrix_rank_k_update(A, C, upper_triangle) will
+perform C = AAT
+instead of C := C + AAT.
For hermitian_rank_1_update and
+hermitian_rank_k_update, constrain the Scalar
+template parameter (if any) to be noncomplex. This ensures that the
+update will be mathematically Hermitian. (A constraint is not needed for
+the rank-2 and rank-2k update functions.)
Items (2) and (3) are breaking changes to the current Working Draft. +Thus, we must finish this before finalization of C++26.
+Each function in any section whose label begins with “linalg.algs” +generally corresponds to one or more routines or functions in the +original BLAS (Basic Linear Algebra Subroutines). Every computation that +the BLAS can do, a function in the C++ Standard Library should be able +to do.
+One std::linalg user
+reported
+an exception to this rule. The BLAS routine DSYRK
+(Double-precision SYmmetric Rank-K update) computes C := βC + αAAT,
+but the corresponding std::linalg function
+symmetric_matrix_rank_k_update only computes C := C + αAAT.
+That is, std::linalg currently has no way to express this
+BLAS operation with a general β scaling factor. This issue applies
+to all of the symmetric and Hermitian rank-k and rank-2k update
+functions.
- symmetric_matrix_rank_k_update: computes C := C + αAAT
- hermitian_matrix_rank_k_update: computes C := C + αAAH
- symmetric_matrix_rank_2k_update: computes C := C + αABT + αBAT
- hermitian_matrix_rank_2k_update: computes C := C + αABH + ᾱBAH, where ᾱ denotes the complex conjugate of α

These functions implement special cases of matrix-matrix products.
+The matrix_product function in std::linalg
+implements the general case of matrix-matrix products. This function
+corresponds to the BLAS’s SGEMM, DGEMM,
+CGEMM, and ZGEMM, which compute C := βC + αAB,
+where α and β are scaling factors. The
+matrix_product function has two kinds of overloads:
- overwriting (C = AB), and
- updating (C = E + AB).
The updating overloads handle the general α and β case by
+matrix_product(scaled(alpha, A), B, scaled(beta, C), C).
+The specification explicitly permits the input
+scaled(beta, C) to alias the output C
+([linalg.algs.blas3.gemm] 10: “Remarks:
+C may alias E”). The std::linalg
+library provides overwriting and updating overloads so that it can do
+everything that the BLAS does, just in a more idiomatically C++ way.
+Please see
+P1673R13
+Section 10.3 (“Function argument aliasing and zero scalar
+multipliers”) for a more detailed explanation.
The problem with the current symmetric and Hermitian rank-k and
+rank-2k functions is that they have the same calling syntax as
+the overwriting version of matrix_product, but
+semantics that differ from both the overwriting and the
+updating versions of matrix_product. For example,
hermitian_matrix_rank_k_update(alpha, A, C, upper_triangle);

updates C with C + αAAH, but

matrix_product(scaled(alpha, A), conjugate_transposed(A), C);

overwrites C with αAAH. The current rank-k and rank-2k overloads are not overwriting, so we can’t just fix this problem by introducing an “updating” overload for each function.
+Incidentally, the fact that these functions have “update” in their
+name is not relevant, because that naming choice is original to the
+BLAS. The BLAS calls its corresponding xSYRK,
+xHERK, xSYR2K, and xHER2K
routines “{Symmetric, Hermitian} rank {k, 2k} update,” even though
+setting β = 0 makes these
+routines “overwriting” in the sense of std::linalg.
We propose to fix this by making the functions work just like
+matrix_vector_product or matrix_product. This
+entails three changes.
Add two new exposition-only concepts
+possibly-packed-in-matrix and
+possibly-packed-out-matrix for constraining input
+and output parameters of the changed or new symmetric and Hermitian
+update functions.
Add “updating” overloads of the symmetric and Hermitian rank-k +and rank-2k update functions.
+The updating overloads take a new input matrix parameter
+E, analogous to the updating overloads of
+matrix_product, and make C an output parameter
+instead of an in/out parameter. For example,
+symmetric_matrix_rank_k_update(A, E, C, upper_triangle)
+computes C = E + AAT.
Explicitly permit C and E to alias,
+thus permitting the desired case where E is
+scaled(beta, C).
The updating overloads take E as a
+possibly-packed-in-matrix, and take C
+as a possibly-packed-out-matrix (instead of a
+possibly-packed-inout-matrix).
E must be accessed as a symmetric or Hermitian
+matrix (depending on the function name) and such accesses must use the
+same triangle as C. (The existing [linalg.general] 4
+wording for symmetric and Hermitian behavior does not cover
+E.)
Change the behavior of the existing symmetric and Hermitian +rank-k and rank-2k overloads to be overwriting instead of updating.
+For example,
+symmetric_matrix_rank_k_update(A, C, upper_triangle) will
+compute C = AAT
+instead of C := C + AAT.
Change C from a
+possibly-packed-inout-matrix to a
+possibly-packed-out-matrix.
Items (2) and (3) are breaking changes to the current Working Draft. +This needs to be so that we can provide the overwriting behavior C := αAAT +or C := αAAH +that the corresponding BLAS routines already provide. Thus, we must +finish this before finalization of C++26.
+Both sets of overloads still only write to the specified triangle
+(lower or upper) of the output matrix C. As a result, the
+new updating overloads only read from that triangle of the input matrix
+E. Therefore, even though E may be a different
+matrix than C, the updating overloads do not need an
+additional Triangle t_E parameter for E. The
+symmetric_* functions interpret E as symmetric
+in the same way that they interpret C as symmetric, and the
+hermitian_* functions interpret E as Hermitian
+in the same way that they interpret C as Hermitian.
+Nevertheless, we do need new wording to explain how the functions may
+interpret and access E.
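For example (a sketch assuming the proposed updating overloads, with A, B, C, and a real scalar beta as before), the xHER2K-style operation C := βC + ABH + BAH would be written as follows.

// Proposed updating rank-2k overload: C = E + A * B^H + B * A^H,
// with E = scaled(beta, C), which is explicitly permitted to alias C.
hermitian_matrix_rank_2k_update(A, B, scaled(beta, C), C, upper_triangle);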
Rank-1 and rank-2 updates currently unconditionally update and do +not take a β scaling +factor.
We propose making all the rank-1 and rank-2 update functions +consistent with the proposed change to the rank-k and rank-2k updates. +This means both changing the meaning of the current overloads to be +overwriting, and adding new overloads that are updating. This includes +general (nonsymmetric), symmetric, and Hermitian rank-1 update +functions, as well as symmetric and Hermitian rank-2 update +functions.
As a result, the exposition-only concept
+possibly-packed-inout-matrix is no longer needed.
+We propose removing it.
The rank-k and rank-2k update functions have the following rank-1 and +rank-2 analogs, where A +denotes a symmetric or Hermitian matrix (depending on the function’s +name) and x and y denote vectors.
- symmetric_matrix_rank_1_update: computes A := A + αxxT
- hermitian_matrix_rank_1_update: computes A := A + αxxH
- symmetric_matrix_rank_2_update: computes A := A + αxyT + αyxT
- hermitian_matrix_rank_2_update: computes A := A + αxyH + ᾱyxH

These functions unconditionally update the matrix A. They do not have an overwriting option. In this, they follow the “general” (not necessarily symmetric or Hermitian) rank-1 update functions.
- matrix_rank_1_update: computes A := A + xyT
- matrix_rank_1_update_c: computes A := A + xyH

These six rank-1 and rank-2 update functions map to BLAS routines as follows.

- matrix_rank_1_update: xGER
- matrix_rank_1_update_c: xGERC
- symmetric_matrix_rank_1_update: xSYR, xSPR
- hermitian_matrix_rank_1_update: xHER, xHPR
- symmetric_matrix_rank_2_update: xSYR2, xSPR2
- hermitian_matrix_rank_2_update: xHER2, xHPR2
+xHPR2The Reference BLAS and the BLAS Standard (see Chapter 2, pp. 64 - 68)
+differ here. The Reference BLAS and the original 1988 BLAS 2 paper
+specify all of the rank-1 and rank-2 update routines listed above as
+unconditionally updating, and not taking a β scaling factor. However, the
+(2002) BLAS Standard specifies all of these rank-1 and rank-2 update
+functions as taking a β
+scaling factor. We consider the latter to express our design intent. It
+is also consistent with the corresponding rank-k and rank-2k update
+functions in the BLAS, which all take a β scaling factor and thus can do
+either overwriting (with zero β) or updating (with nonzero β). These routines include
+xSYRK, xHERK, xSYR2K, and
+xHER2K. One could also include the general matrix-matrix
+product xGEMM among these, as xGEMM also takes
+a β scaling factor.
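A hypothetical sketch of what the proposed overwriting and updating rank-1 pair would look like in use (assuming the new E parameter goes immediately before the output matrix, by analogy with the rank-k overloads above; the parameter order here is illustrative only):

// Proposed overwriting behavior of the existing overload:
// A = alpha * x * x^T.
symmetric_matrix_rank_1_update(alpha, x, A, upper_triangle);

// Proposed updating overload: A = E + alpha * x * x^T,
// with E = scaled(beta, A) permitted to alias A.
symmetric_matrix_rank_1_update(alpha, x, scaled(beta, A), A, upper_triangle);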
Section
+10.3 of P1673R13 explains the three ways that the std::linalg design
+translates Fortran INTENT(INOUT) arguments into a C++
+idiom.
1. Provide in-place and not-in-place overloads for triangular solve and triangular multiply.

2. “Else, if the BLAS function unconditionally updates (like xGER), we retain read-and-write behavior for that argument.”

3. “Else, if the BLAS function uses a scalar beta argument to decide whether to read the output argument as well as write to it (like xGEMM), we provide two versions: a write-only version (as if beta is zero), and a read-and-write version (as if beta is nonzero).”
Our design goal was for functions with vector or matrix output to
+imitate std::transform as much as possible. This favors Way
+(3) as the default approach, which turns INTENT(INOUT)
+arguments on the Fortran side into separate input and output parameters
+on the C++ side. Way (2) is really an awkward special case. The BLAS
+Standard effectively eliminates this special case on the Fortran side by
+making the rank-1 and rank-2 updates work just like the rank-k and
+rank-2k updates, with a β
+scaling factor. This makes it natural to eliminate the Way (2) special
+case on the C++ side as well.
These changes make the exposition-only concept
+possibly-packed-inout-matrix superfluous. We
+propose removing it.
Note that this would not eliminate all uses of the exposition-only
+concept inout-matrix. The in-place triangular
+matrix product functions triangular_matrix_left_product and
+triangular_matrix_right_product, and the in-place
+triangular linear system solve functions
+triangular_matrix_matrix_left_solve and
+triangular_matrix_matrix_right_solve would still use
+inout-matrix.
The C++ Working Draft already has Scalar alpha overloads
+of hermitian_rank_k_update. The Scalar type
+currently can be complex. However, if alpha has nonzero
+imaginary part, then αAAH
+may no longer be a Hermitian matrix, even though AAH is
+mathematically always Hermitian. For example, if A is the identity matrix (with ones
+on the diagonal and zeros elsewhere) and α = i, then αAAH
+is the diagonal matrix whose diagonal elements are all i. While that matrix is symmetric,
+it is not Hermitian, because all elements on the diagonal of a Hermitian
matrix must have zero imaginary part. The rank-1 update function
+hermitian_rank_1_update has the analogous issue.
The BLAS solves this problem by having the Hermitian rank-1 update
+routines xHER and rank-k update routines xHERK
+take the scaling factor α as a
+noncomplex number. This suggests a fix: For all
+hermitian_rank_1_update and
+hermitian_rank_k_update overloads that take
+Scalar alpha, constrain Scalar so that it is
+noncomplex. We can avoid introducing new undefined behavior (or “valid
+but unspecified” elements of the output matrix) by making “noncomplex” a
+constraint on the Scalar type of alpha.
+“Noncomplex” should follow the definition of “noncomplex” used by
conj-if-needed: either an arithmetic type, or a type for which conj(E) is not ADL-findable for an expression E of type Scalar.
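A hypothetical sketch of such a constraint (not proposed wording; the names and exact form are placeholders) might look like the following.

#include <type_traits>

// Hypothetical sketch only, not proposed wording.
// "Complex" here means that conj(t) is findable by argument-dependent lookup.
template<class T>
concept has_adl_conj = requires(const T& t) { conj(t); };

template<class T>
concept noncomplex = std::is_arithmetic_v<T> || !has_adl_conj<T>;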
This issue does not arise with the rank-2 or rank-2k
+updates. In the BLAS, the rank-2 updates xHER2 and the
+rank-2k updates xHER2K all take alpha as a
+complex number. The matrix αABH + ᾱBAH
+is Hermitian by construction, so there’s no need to impose a
+precondition on the value of α.
Both the current wording and the proposed changes to all these update
+functions behave correctly with respect to beta.
For the new updating overloads of
+hermitian_rank_1_update and
+hermitian_rank_k_update, [linalg] expresses a
+beta scaling factor by letting users supply
+scaled(beta, C) as the argument for E. The
+wording only requires that E be Hermitian. If
+E is scaled(beta, C), this concerns only the
+product of beta and C. It would be incorrect
+to constrain beta or C separately. For
+example, if β = − i
+and C is the matrix whose
+elements are all i, then C is not Hermitian but βC (and therefore
+scaled(beta, C)) is Hermitian.
This issue does not arise with the rank-2k updates. For
+example, the BLAS routine xHER2K takes beta as
+a real number. The previous paragraph’s reasoning for beta
+applies here as well.
This issue also does not arise with the rank-2 updates. In the
+Reference BLAS, the rank-2 update routines xHER2 do not
+have a way to supply beta, while in the BLAS Standard,
+xHER2 does take beta. The BLAS
+Standard says that “α is a
+complex scalar and and [sic] β
+is a real scalar.” The Fortran 77 and C bindings specify the type of
+beta as real (<rtype> resp.
+RSCALAR_IN), but the Fortran 95 binding lists both
+alpha and beta as
+COMPLEX(<wp>). The type of beta in the
Fortran 95 binding is likely a typo, considering the wording.
In our view, the current behavior of
+hermitian_matrix_vector_product and
+hermitian_matrix_product is correct with respect to the
+alpha scaling factor. These functions do not need extra
+overloads that take Scalar alpha. Users who want to supply
+alpha with nonzero imaginary part should not scale
+the matrix A (as in scaled(alpha, A)).
+Instead, they should scale the input vector x, as in the
+following.
auto alpha = std::complex{0.0, 1.0};
hermitian_matrix_vector_product(A, upper_triangle, scaled(alpha, x), y);

In Chapter 2 of the BLAS Standard, both xHEMV and
+xHEMM take the scaling factors α and β as complex numbers
+(COMPLEX<wp>, where <wp>
+represents the current working precision). The BLAS permits
+xHEMV or xHEMM to be called with
+alpha whose imaginary part is nonzero. The matrix that the
+BLAS assumes to be Hermitian is A, not αA. Even if A is Hermitian, αA might not necessarily be
+Hermitian. For example, if A
+is the identity matrix (diagonal all ones) and α is i, then αA is not Hermitian but
+skew-Hermitian.
The current [linalg] wording requires that the input matrix be
+Hermitian. This excludes using scaled(alpha, A) as the
+input matrix, where alpha has nonzero imaginary part. For
+example, the following gives mathematically incorrect results.
auto alpha = std::complex{0.0, 1.0};
hermitian_matrix_vector_product(scaled(alpha, A), upper_triangle, x, y);

Note that the behavior of this is still well defined, at least after applying the fix proposed in LWG4136 for diagonal elements with nonzero imaginary part. It does not violate a precondition. Therefore, the Standard has no way to tell the user that they did something wrong.
+The current wording permits introducing the scaling factor
+alpha through the input vector.
auto alpha = std::complex{0.0, 1.0};
hermitian_matrix_vector_product(A, upper_triangle, scaled(alpha, x), y);

This is fine as long as αAx equals Aαx, that is, as
+long as α commutes with the
+elements of A. This issue would only be of concern if those
+multiplications might be noncommutative. We’re already well outside the
+world of “types that do ordinary arithmetic with
+std::complex.” This scenario would only arise given a
+user-defined complex type, like
+user_complex<user_noncommutative> in the example
+below, whose real parts have noncommutative multiplication.
template<class T>
+class user_complex {
+public:
+ user_complex(T re, T im) : re_(re), im_(im) {}
+
+ // ... overloaded arithmetic operators ...
+
+ friend T real(user_complex<T> z) { return z.re_; }
+ friend T imag(user_complex<T> z) { return z.im_; }
+ friend user_complex<T> conj(user_complex<T> z) {
+ return {real(z), -imag(z)};
+ }
+
+private:
+ T re_, im_;
+};
+
+auto alpha = user_complex<user_noncommutative>{something, something_else};
hermitian_matrix_vector_product(A, upper_triangle, scaled(alpha, x), y);

The [linalg] library was designed to support element types with
+noncommutative multiplication. On the other hand, generally, if we speak
+of Hermitian matrices or even of inner products (which are used to
+define Hermitian matrices), we’re working in a vector space. This means
+that multiplication of the matrix’s elements is commutative. Anything
+more general than that is far beyond what the BLAS can do. Thus, we
+think restricting use of alpha with nonzero imaginary part
+to scaled(alpha, x) is not so onerous.
Many users may not like the status quo of needing to scale
+x instead of A. First, it differs from the
+BLAS, which puts alpha before A in its
+xHEMV and xHEMM function arguments. Second,
+users would get no compile-time feedback and likely no run-time feedback
+if they scale A instead of x. Their code would
+compile and produce correct results for almost all matrix-vector or
+matrix-matrix products, except for the Hermitian case, and
+only when the scaling factor has a nonzero imaginary part.
+However, we still think the status quo is the least bad choice. We
+explain why by discussing the following alternatives.
Treat scaled(alpha, A) as a special case: expect
+A to be Hermitian and permit alpha to have
+nonzero imaginary part
Forbid scaled(alpha, A) at compile time, so that
+users must scale x instead
Add overloads that take alpha, analogous to the
+rank-1 and rank-k update functions
The first choice is mathematically incorrect, as we will explain +below. The second is not incorrect, but could only catch some errors. +The third is likewise not incorrect, but would add a lot of overloads +for an uncommon use case, and would still not prevent users from scaling +the matrix in mathematically incorrect ways.
+“Treat scaled(alpha, A) as a special case” actually
+means three special cases, given some nested accessor type
+Acc and a scaling factor alpha of type
+Scalar.
scaled(alpha, A), whose accessor type is
+scaled_accessor<Scalar, Acc>
conjugated(scaled(alpha, A)), whose accessor type is
+conjugated_accessor<scaled_accessor<Scalar, Acc>>
scaled(alpha, conjugated(A)), whose accessor type is
+scaled_accessor<Scalar, conjugated_accessor<Acc>>
One could replace conjugated with
+conjugate_transposed (which we expect to be more common in
+practice) without changing the accessor type.
This approach violates the fundamental [linalg] principle that “…
+each mdspan parameter of a function behaves as itself and
+is not otherwise ‘modified’ by other parameters” (Section 10.2.5,
+P1673R13). The behavior of [linalg] is agnostic of specific accessor
+types, even though implementations likely have optimizations for
+specific accessor types. [linalg] takes this approach for consistency,
+even where it results in different behavior from the BLAS (see Section
+10.5.2 of P1673R13). The application of this principle here is “the
+input parameter A is always Hermitian.”
In this case, the consistency matters for mathematical correctness.
+What if scaled(alpha, A) is Hermitian, but A
+itself is not? An example is α = − i and A is the 2 x 2 matrix whose elements
+are all i. If we treat
+scaled_accessor as a special case, then
+hermitian_matrix_vector_product will compute different
+results.
Another problem is that users are permitted to define their own
+accessor types, including nested accessors. Arbitrary user accessors
+might have scaled_accessor somewhere in that nesting, or
+they might have the effect of scaled_accessor
+without being called that. Thus, we can’t detect all possible ways that
+the matrix might be scaled.
Hermitian matrix-matrix and matrix-vector product functions could +instead forbid scaling the matrix at compile time, at least for +the three accessor type cases listed in the previous option.
+scaled_accessor<Scalar, Acc>
conjugated_accessor<scaled_accessor<Scalar, Acc>>
scaled_accessor<Scalar, conjugated_accessor<Acc>>
Doing this would certainly encourage correct behavior for the most +common cases. However, as mentioned above, users are permitted to define +their own accessor types, including nested accessors. Thus, we can’t +detect all ways that an arbitrary, possibly user-defined accessor might +scale the matrix. Furthermore, scaling the matrix might still be +mathematically correct. A scaling factor with nonzero imaginary part +might even make the matrix Hermitian. Applying i as a scaling factor twice would +give a perfectly valid scaling factor of − 1.
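For concreteness, a hypothetical compile-time check for the three accessor nestings listed above might look like the following; as the text notes, it is necessarily incomplete, because user-defined accessors can scale by other means.

#include <linalg>

// Hypothetical sketch, not proposed wording.  It detects only the specific
// scaled_accessor nestings listed above.
template<class Accessor>
inline constexpr bool detects_scaling = false;

// Cases 1 and 3: scaled_accessor<Scalar, Acc>, where Acc may itself
// be a conjugated_accessor.
template<class Scalar, class Nested>
inline constexpr bool detects_scaling<
    std::linalg::scaled_accessor<Scalar, Nested>> = true;

// Case 2: conjugated_accessor<scaled_accessor<Scalar, Acc>>.
template<class Scalar, class Nested>
inline constexpr bool detects_scaling<
    std::linalg::conjugated_accessor<
        std::linalg::scaled_accessor<Scalar, Nested>>> = true;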
+One could imagine adding overloads that take alpha,
+analogous to the rank-1 and rank-k update overloads that take
+alpha. Users could then write
hermitian_matrix_vector_product(alpha, A, upper_triangle, x, y);

instead of

hermitian_matrix_vector_product(A, upper_triangle, scaled(alpha, x), y);

We do not support this approach. First, it would introduce many
+overloads, without adding to what the existing overloads can accomplish.
+(Users can already introduce the scaling factor alpha
+through x.) Contrast this with the rank-1 and rank-k
+Hermitian update functions, where not having alpha
+overloads might break simple cases. Here are two examples.
If the matrix and vector element types and the scaling factor
+α = 2 are all integers, then
+the update C := C − 2xxH
+can be computed using integer types and arithmetic with an
+alpha overload. However, without an alpha
+overload, the user would need to use scaled(sqrt(2), x) as
+the input vector, thus forcing use of floating-point arithmetic that may
+give inexact results.
The update C := C − xxH +would require resorting to complex arithmetic, as the only way to +express − xxH +with the same scaling factor for both vectors is (ix)(ix)H.
Second, alpha overloads would not prevent users from
+also supplying scaled(gamma, A) as the matrix for
+some other scaling factor gamma. Thus, instead of solving
+the problem, the overloads would introduce more possibilities for
+errors.
Scalar is noncomplex but conj is ADL-findable?

Our proposed change defines a “noncomplex number” at compile time. We
+say that complex numbers have conj that is findable by ADL,
+and noncomplex numbers are either arithmetic types or do not have an
+ADL-findable conj. We choose this definition because it is
+the same one that we use to define the behavior of
+conjugated_accessor (and also conjugated, if
+P3050 is adopted). It also is the C++ analog of what the BLAS does,
+namely specify the type of the alpha argument as real
+instead of complex.
This definition is conservative, because it excludes complex numbers
+with zero imaginary part. For conjugated_accessor and
+conjugated, this does not matter; the class and function
+behave the same from the user’s perspective. The exposition-only
+function conj-if-needed specifically exists so
+that conjugated_accessor and conjugated do not
+change their input mdspan’s value_type.
+However, for the rank-1 and rank-k Hermitian update functions affected
+by this proposal, constraining Scalar alpha at compile time
+to be noncomplex prevents users from calling those functions with a
+“complex” number alpha whose imaginary part is zero.
This matters if the user defines a number type Real that
+is meant to represent noncomplex numbers, but nevertheless has an
+ADL-findable conj, thus making it a “complex” number type
+from the perspective of [linalg] functions. There are two ways users
+might define conj(Real).
Imitating std::complex: Users might define
+a complex number type UserComplex whose real and imaginary
+parts have type Real, and then imitate the behavior of
+std::conj(double) by defining
+UserComplex conj(Real x) to return a
+UserComplex number with real part x and
+imaginary part zero.
Type-preserving: Real conj(Real x) returns
+x.
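The two options might look like the following (a hypothetical sketch; each namespace stands for a different user library). Under the proposed constraint, both options make Real a “complex” number type, because conj is ADL-findable either way.

namespace option1 {  // (1) Imitating std::complex
  struct Real { double value; };
  struct UserComplex { Real re, im; };
  // conj changes the type, as std::conj(double) returns std::complex<double>.
  UserComplex conj(Real x) { return UserComplex{x, Real{0.0}}; }
}

namespace option2 {  // (2) Type-preserving
  struct Real { double value; };
  // conj returns its argument unchanged.
  Real conj(Real x) { return x; }
}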
Option (1) would be an unfortunate choice. [linalg] defines
+conj-if-needed specifically to fix the problem
+that std::conj(double) returns
+std::complex<double> instead of double.
+However, Option (2) would be a reasonable thing for users to do,
+especially if they have designed custom number types without [linalg] in
+mind. One could accommodate such users by relaxing the constraint on
+Scalar and taking one of the following two approaches.
Adding a precondition that
+imag-if-needed(alpha) equals
+Scalar{}
Imitating LWG
+4136, by defining the scaling factor to be
+real-if-needed(alpha) instead of
+alpha
We did not take Approach (1), because adding a precondition decreases
+safety by adding undefined behavior. It also forces users to add
+run-time checks. Defining those checks correctly for generic, possibly
+but not necessarily complex number types would be challenging. We did
+not take Approach (2) because its behavior would deviate from the BLAS,
+which requires the scaling factor alpha to be noncomplex at
+compile time.
In BLAS, triangular matrix-vector and matrix-matrix products
+apply alpha scaling to the implicit unit diagonal. In
+[linalg], the scaling factor alpha is not applied to the
+implicit unit diagonal. This is because the library does not interpret
+scaled(alpha, A) differently than any other
+mdspan.
Users of triangular matrix-vector products can recover BLAS +functionality by scaling the input vector instead of the input matrix, +so this only matters for triangular matrix-matrix products.
All calls of the BLAS’s triangular matrix-matrix product routine
+xTRMM in LAPACK (other than in testing routines) use
+alpha equal to one.
Straightforward approaches for fixing this issue would not break +backwards compatibility.
Therefore, we do not consider fixing this a high-priority issue, +and we do not propose a fix for it in this paper.
The triangular_matrix_vector_product and
+triangular_matrix_product algorithms have an
+implicit_unit_diagonal option. This makes the algorithm not
+access the diagonal of the matrix, and compute as if the diagonal were
+all ones. The option corresponds to the BLAS’s “Unit” flag. BLAS
+routines that take both a “Unit” flag and an alpha scaling
+factor apply “Unit” before scaling by alpha, so
+that the matrix is treated as if it has a diagonal of all
+alpha values. In contrast, [linalg] follows the general
+principle that scaled(alpha, A) should be treated like any
+other kind of mdspan. As a result, algorithms interpret
+implicit_unit_diagonal as applied to the matrix
+after scaling by alpha, so that the matrix still
+has a diagonal of all ones.
The triangular solve algorithms in std::linalg are not affected,
+because their BLAS analogs either do not take an alpha
+argument (as with xTRSV), or the alpha
+argument does not affect the triangular matrix (with xTRSM,
+alpha affects the right-hand sides B, not the
+triangular matrix A).
This issue only reduces functionality of
+triangular_matrix_product. Users of
+triangular_matrix_vector_product who wish to replicate the
original BLAS functionality can scale the input vector (by supplying
+scaled(alpha, x) instead of x as the input
+argument) instead of the triangular matrix.
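For example (a sketch assuming suitable A, x, y, and alpha), the following call reproduces the BLAS behavior described above.

// Overwriting triangular matrix-vector product: y = T * (alpha * x),
// where T is the lower triangle of A with an implicit unit diagonal.
// This equals (alpha * T) * x, so the diagonal is effectively scaled
// by alpha, as in the BLAS.
triangular_matrix_vector_product(A, lower_triangle, implicit_unit_diagonal,
                                 scaled(alpha, x), y);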
The following example computes A := 2AB where +A is a lower triangular +matrix, but it makes the diagonal of A all ones on the input (right-hand) +side.
triangular_matrix_product(scaled(2.0, A), lower_triangle, implicit_unit_diagonal, B, A);

Contrast with the analogous BLAS routine DTRMM, which
+has the effect of making the diagonal elements all 2.0.
dtrmm('Left side', 'Lower triangular', 'No transpose', 'Unit diagonal', m, n, 2.0, A, lda, B, ldb)

If we want to use [linalg] to express what the BLAS call expresses, we need to perform a separate scaling.
+triangular_matrix_product(A, lower_triangle, implicit_unit_diagonal, B, A);
scale(2.0, A);

This is counterintuitive, and may also affect performance.
+Performance of scale is typically bound by memory bandwidth
+and/or latency, but if the work done by scale could be
+fused with the work done by the triangular_matrix_product,
+then scale’s memory operations could be “hidden” in the
+cost of the matrix product.
xTRMM with the implicit unit diagonal option and
+alpha not equal to oneHow much might users care about this missing [linalg] feature? +P1673R13 explains that the BLAS was codesigned with LAPACK and that +every reference BLAS routine is used by some LAPACK routine. “The BLAS +does not aim to provide a complete set of mathematical operations. Every +function in the BLAS exists because some LINPACK or LAPACK algorithm +needs it” (Section 10.6.1). Therefore, to judge the urgency of adding +new functionality to [linalg], we can ask whether the functionality +would be needed by a C++ re-implementation of LAPACK. We think not much, +because the highest-priority target audience of the BLAS is LAPACK +developers, and LAPACK routines (other than testing routines) never use +a scaling factor alpha other than one together with the implicit unit +diagonal option.
+We survey calls to xTRMM in the latest version of LAPACK
+as of the publication date of R1 of this proposal, LAPACK 3.12.0. It
+suffices to survey DTRMM, the double-precision real case,
+since for all the routines of interest, the complex case follows the
+same pattern. (We did survey ZTRMM, the double-precision
+complex case, just in case.) LAPACK has 24 routines that call
+DTRMM directly. They fall into five categories.
Test routines: DCHK3, DCHKE,
+DLARHS
Routines relating to QR factorization or using the result of a QR
+factorization (especially with block Householder reflectors):
+DGELQT3, DLARFB, DGEQRT3,
+DLARFB_GETT, DLARZB,
+DORM22
Routines relating to computing an inverse of a triangular matrix
+or of a matrix that has been factored into triangular matrices:
+DLAUUM, DTRTRI, DTFTRI,
+DPFTRI
Routines relating to solving eigenvalue (or generalized
+eigenvalue) problems: DLAHR2, DSYGST,
+DGEHRD, DSYGV, DSYGV_2STAGE,
+DSYGVD, DSYGVX (note that DLAQR5
+depends on DTRMM via EXTERNAL declaration, but
+doesn’t actually call it)
Routines relating to symmetric indefinite factorizations:
+DSYT01_AA, DSYTRI2X,
+DSYTRI_3X
The only routines that call DTRMM with
+alpha equal to anything other than one or negative one are
+the testing routines. Some calls in DGELQT3 and
+DLARFB_GETT use negative one, but these calls never specify
+an implicit unit diagonal (they use the explicit diagonal option). The
+only routine that might possibly call DTRMM with both
+negative one as alpha and the implicit unit diagonal is
+DTFTRI. (This routine “computes the inverse of a triangular
+matrix A stored in RFP [Rectangular Full Packed] format.” RFP format was
+introduced to LAPACK in the late 2000’s, well after the BLAS Standard
+was published. See
+LAPACK
+Working Note 199, which was published in 2008.) DTFTRI
+passes its caller’s diag argument (which specifies either
+implicit unit diagonal or explicit diagonal) to DTRMM. The
+only two LAPACK routines that call DTFTRI are
+DERRRFP (a testing routine) and DPFTRI.
+DPFTRI only ever calls DTFTRI with
+diag not specifying the implicit unit diagonal
+option. Therefore, LAPACK never needs both alpha not equal
+to one and the implicit unit diagonal option, so adding the ability to
+“scale the implicit diagonal” in [linalg] is a low-priority feature.
We can think of two ways to fix this issue. First, we could add an
+alpha scaling parameter, analogous to the symmetric and
+Hermitian rank-1 and rank-k update functions. Second, we could add a new
+kind of Diagonal template parameter type that expresses a
+“diagonal value.” For example, implicit_diagonal_t{alpha}
+(or a function form, implicit_diagonal(alpha)) would tell
+the algorithm not to access the diagonal elements, but instead to assume
+that their value is alpha. Both of these solutions would
+let users specify the diagonal’s scaling factor separately from the
+scaling factor for the rest of the matrix. Those two scaling factors
+could differ, which is new functionality not offered by the BLAS. More
+importantly, both of these solutions could be added later, after C++26,
+without breaking backwards compatibility.
In BLAS, triangular solves with possibly multiple right-hand
+sides (xTRSM) apply alpha scaling to the
+implicit unit diagonal. In [linalg], the scaling factor
+alpha is not applied to the implicit unit diagonal. This is
+because the library does not interpret scaled(alpha, A)
+differently than any other mdspan.
Users of triangular solves would need a separate
+scale call to recover BLAS functionality.
LAPACK sometimes calls xTRSM with alpha
+not equal to one.
Straightforward approaches for fixing this issue would not break +backwards compatibility.
Therefore, we do not consider fixing this a high-priority issue, +and we do not propose a fix for it in this paper.
Triangular solves have a similar issue to the one explained in the
+previous section. The BLAS routine xTRSM applies
+alpha “after” the implicit unit diagonal, while std::linalg
+applies alpha “before.” (xTRSV does not take
+an alpha scaling factor.) As a result, the BLAS solves with
+a different matrix than std::linalg.
In mathematical terms, xTRSM solves the equation α(A+I)X = B
+for X, where A is the user’s input matrix
+(without implicit unit diagonal) and I is the identity matrix (with ones
+on the diagonal and zeros everywhere else).
+triangular_matrix_matrix_left_solve solves the equation
+(αA+I)Y = B
+for Y. The two results X and Y are not equal in general.
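For concreteness, consider the 2 × 2 case with a single strictly lower triangular entry a (the algebra below only illustrates the claim above):

A = \begin{pmatrix} 0 & 0 \\ a & 0 \end{pmatrix}, \qquad
B = \begin{pmatrix} b_1 \\ b_2 \end{pmatrix},
\alpha (A + I) X = B \;\Longrightarrow\; X = \frac{1}{\alpha} \begin{pmatrix} b_1 \\ b_2 - a\, b_1 \end{pmatrix},
(\alpha A + I) Y = B \;\Longrightarrow\; Y = \begin{pmatrix} b_1 \\ b_2 - \alpha\, a\, b_1 \end{pmatrix},

so X and Y coincide for every right-hand side only when α = 1.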
Users could work around this problem by first scaling the matrix
+A by α, and then solving for Y. In the common case where the
+“other triangle” of A holds
+another triangular matrix, users could not call
+scale(alpha, A). They would instead need to iterate over
+the elements of A manually.
+Users might also need to “unscale” the matrix after the solve. Another
+option would be to copy the matrix A before scaling.
for (size_t i = 0; i < A.extent(0); ++i) {
+  for (size_t j = 0; j < i; ++j) {
+ A[i, j] *= alpha;
+ }
+}
+triangular_matrix_matrix_left_solve(A, lower_triangle, implicit_unit_diagonal, B, Y);
+for (size_t i = 0; i < A.extent(0); ++i) {
+  for (size_t j = 0; j < i; ++j) {
+ A[i, j] /= alpha;
+ }
+}Users cannot solve this problem by scaling B (either with
+scaled(1.0 / alpha, B) or with
+scale(1.0 / alpha, B)). Transforming X into Y or vice versa is mathematically
+nontrivial in general, and may introduce new failure conditions. This
+issue occurs with both the in-place and out-of-place triangular
+solves.
The common case in LAPACK is calling xTRSM with
+alpha equal to one, but other values of alpha
+occur. For example, xTRTRI calls xTRSM with
+alpha equal to − 1. Thus,
+we cannot dismiss this issue, as we could with xTRMM.
As with triangular matrix products above, we can think of two ways to
+fix this issue. First, we could add an alpha scaling
+parameter, analogous to the symmetric and Hermitian rank-1 and rank-k
+update functions. Second, we could add a new kind of
+Diagonal template parameter type that expresses a “diagonal
+value.” For example, implicit_diagonal_t{alpha} (or a
+function form, implicit_diagonal(alpha)) would tell the
+algorithm not to access the diagonal elements, but instead to assume
+that their value is alpha. Both of these solutions would
+let users specify the diagonal’s scaling factor separately from the
+scaling factor for the rest of the matrix. Those two scaling factors
+could differ, which is new functionality not offered by the BLAS. More
+importantly, both of these solutions could be added later, after C++26,
+without breaking backwards compatibility.
We currently have two other std::linalg fix papers in
+review.
P3222: Fix C++26 by adding transposed special cases
+for P2642 layouts (forwarded by LEWG to LWG on 2024-08-27 pending
+electronic poll results)
P3050: “Fix C++26 by optimizing linalg::conjugated
+for noncomplex value types” (forwarded by LEWG to LWG on 2024-09-03
+pending electronic poll results)
LEWG was aware of these two papers and this pending paper P3371 in
+its 2024-09-03 review of P3050R2. All three of these papers increment
+the value of the __cpp_lib_linalg macro. While this
+technically causes a conflict between the papers, advice from LEWG on
+2024-09-03 was not to introduce special wording changes to avoid this
+conflict.
We also have two outstanding LWG issues.
+LWG4136 +specifies the behavior of Hermitian algorithms on diagonal matrix +elements with nonzero imaginary part. (As the BLAS Standard specifies +and the Reference BLAS implements, the Hermitian algorithms do not +access the imaginary parts of diagonal elements, and assume they are +zero.) In our view, P3371 does not conflict with LWG4136.
LWG4137, +“Fix Mandates, Preconditions, and Complexity elements of [linalg] +algorithms,” affects several sections touched by this proposal, +including [linalg.algs.blas3.rankk] and [linalg.algs.blas3.rank2k]. We +consider P3371 rebased atop the wording changes proposed by LWG4137. +While the wording changes may conflict in a formal (“diff”) sense, it is +our view that they do not conflict in a mathematical or specification +sense.
Many thanks (with permission) to Raffaele Solcà (CSCS Swiss National
+Supercomputing Centre, raffaele.solca@cscs.ch) for pointing
+out some of the issues fixed by this paper, as well as the issues
+leading to LWG4137.
++Text in blockquotes is not proposed wording, but rather instructions +for generating proposed wording. The � character is used to denote a +placeholder section number which the editor shall determine.
+In [version.syn], for the following definition,
+
#define __cpp_lib_linalg YYYYMML // also in <linalg>++adjust the placeholder value
+YYYYMMLas needed so as to +denote this proposal’s date of adoption.
++Add the following lines to the header
+<linalg>+synopsis [linalg.syn], just after the declaration of +the exposition-only conceptinout-matrixand +before the declaration of the exposition-only concept +possibly-packed-inout-matrix. [Editorial +Note: This addition is not shown in green, because the authors could +not convince Markdown to format the code correctly. – end +note]
template<class T>
+ concept possibly-packed-in-matrix = see below; // exposition only
+
+ template<class T>
+ concept possibly-packed-out-matrix = see below; // exposition only++Then, remove the declaration of the exposition-only concept +
+possibly-packed-inout-matrixfrom the header +<linalg>synopsis [linalg.syn].
++Then, add the following lines to +[linalg.helpers.concepts], just after the definition of +the exposition-only variable template +
+is-layout_blas_packedand just before the +definition of the exposition-only concept +possibly-packed-inout-matrix. [Editorial +Note: This addition is not shown in green, becuase the authors could +not convince Markdown to format the code correctly. – end +note]
template<class T>
+ concept possibly-packed-in-matrix =
+ is-mdspan<T> && T::rank() == 2 &&
+ (T::is_always_unique() || is-layout-blas-packed<typename T::layout_type>);
+
+ template<class T>
+ concept possibly-packed-out-matrix =
+ is-mdspan<T> && T::rank() == 2 &&
+ is_assignable_v<typename T::reference, typename T::element_type> &&
+ (T::is_always_unique() || is-layout-blas-packed<typename T::layout_type>);++Then, remove the definition of the exposition-only concept +
+possibly-packed-inout-matrixfrom +[linalg.helpers.concepts].
++In the Header
+<linalg>synopsis [linalg.syn], at +the end of the section started by the following comment:+
// [linalg.helpers.concepts], linear algebra argument concepts,add the following declaration of the exposition-only concept +
+noncomplex. [Editorial Note: This addition +is not shown in green, becuase the authors could not convince Markdown +to format the code correctly. – end note]
template<class T>
+ concept noncomplex = see below; // exposition only++In [linalg.helpers.concepts], change paragraph 3 to read as follows +(new content in green).
+
Unless explicitly permitted, any inout-vector,
+inout-matrix, inout-object,
+out-vector, out-matrix,
+out-object, possibly-packed-out-matrix,
+or possibly-packed-inout-matrix parameter of a
+function in [linalg] shall not overlap any other mdspan
+parameter of the function.
++Append the following to the end of [linalg.helpers.concepts]. +[Editorial Note: These additions are not shown in green, because +the authors could not convince Markdown to format the code correctly. +– end note]
+
template<class T>
+ concept noncomplex = see below;4
+A type T models noncomplex if
+T is a linear algebra value type, and either
(4.1)
+T is an arithmetic type, or
(4.2) the
+expression conj(E) is not valid, with overload resolution
+performed in a context that includes the declaration
+template<class T> T conj(const T&) = delete;.
++In the header
+<linalg>synopsis +[linalg.syn], replace all the declarations of all the +matrix_rank_1_update,matrix_rank_1_update_c, +symmetric_matrix_rank_1_update, and +hermitian_matrix_rank_1_updateoverloads to read as +follows. [Editorial Note: There are three changes here. First, +the existing overloads become “overwriting” overloads. Second, new +“updating” overloads are added. Third, the +hermitian_matrix_rank_1_update functions that take an +alphaparameter now constrainalphato be +noncomplex.Changes do not use red or green highlighting, because the authors +could not convince Markdown to format the code correctly. – end +note]
+
// [linalg.algs.blas2.rank1], nonsymmetric rank-1 matrix update
+
+ // overwriting nonsymmetric rank-1 matrix update
+ template<in-vector InVec1, in-vector InVec2, out-matrix OutMat>
+ void matrix_rank_1_update(InVec1 x, InVec2 y, OutMat A);
+ template<class ExecutionPolicy, in-vector InVec1, in-vector InVec2, out-matrix OutMat>
+ void matrix_rank_1_update(ExecutionPolicy&& exec,
+ InVec1 x, InVec2 y, OutMat A);
+
+ template<in-vector InVec1, in-vector InVec2, out-matrix OutMat>
+ void matrix_rank_1_update_c(InVec1 x, InVec2 y, OutMat A);
+ template<class ExecutionPolicy, in-vector InVec1, in-vector InVec2, out-matrix OutMat>
+ void matrix_rank_1_update_c(ExecutionPolicy&& exec,
+ InVec1 x, InVec2 y, OutMat A);
+
+ // updating nonsymmetric rank-1 matrix update
+ template<in-vector InVec1, in-vector InVec2, in-matrix InMat, out-matrix OutMat>
+ void matrix_rank_1_update(InVec1 x, InVec2 y, InMat E, OutMat A);
+  template<class ExecutionPolicy, in-vector InVec1, in-vector InVec2, in-matrix InMat, out-matrix OutMat>
+ void matrix_rank_1_update(ExecutionPolicy&& exec,
+ InVec1 x, InVec2 y, InMat E, OutMat A);
+
+ template<in-vector InVec1, in-vector InVec2, in-matrix InMat, out-matrix OutMat>
+ void matrix_rank_1_update_c(InVec1 x, InVec2 y, InMat E, OutMat A);
+ template<class ExecutionPolicy, in-vector InVec1, in-vector InVec2, in-matrix InMat, out-matrix OutMat>
+ void matrix_rank_1_update_c(ExecutionPolicy&& exec,
+ InVec1 x, InVec2 y, InMat E, OutMat A);
+
+ // [linalg.algs.blas2.symherrank1], symmetric or Hermitian rank-1 matrix update
+
+ // overwriting symmetric rank-1 matrix update
+ template<class Scalar, in-vector InVec, possibly-packed-out-matrix OutMat, class Triangle>
+ void symmetric_matrix_rank_1_update(Scalar alpha, InVec x, OutMat A, Triangle t);
+ template<class ExecutionPolicy,
+ class Scalar, in-vector InVec, possibly-packed-out-matrix OutMat, class Triangle>
+ void symmetric_matrix_rank_1_update(ExecutionPolicy&& exec,
+ Scalar alpha, InVec x, OutMat A, Triangle t);
+ template<in-vector InVec, possibly-packed-out-matrix OutMat, class Triangle>
+ void symmetric_matrix_rank_1_update(InVec x, OutMat A, Triangle t);
+ template<class ExecutionPolicy,
+ in-vector InVec, possibly-packed-out-matrix OutMat, class Triangle>
+ void symmetric_matrix_rank_1_update(ExecutionPolicy&& exec,
+ InVec x, OutMat A, Triangle t);
+
+ // updating symmetric rank-1 matrix update
+ template<class Scalar, in-vector InVec, possibly-packed-in-matrix InMat, possibly-packed-out-matrix OutMat, class Triangle>
+ void symmetric_matrix_rank_1_update(Scalar alpha, InVec x, InMat E, OutMat A, Triangle t);
+ template<class ExecutionPolicy,
+ class Scalar, in-vector InVec, possibly-packed-in-matrix InMat, possibly-packed-out-matrix OutMat, class Triangle>
+ void symmetric_matrix_rank_1_update(ExecutionPolicy&& exec,
+ Scalar alpha, InVec x, InMat E, OutMat A, Triangle t);
+ template<in-vector InVec, possibly-packed-in-matrix InMat, possibly-packed-out-matrix OutMat, class Triangle>
+ void symmetric_matrix_rank_1_update(InVec x, InMat E, OutMat A, Triangle t);
+ template<class ExecutionPolicy,
+ in-vector InVec, possibly-packed-in-matrix InMat, possibly-packed-out-matrix OutMat, class Triangle>
+ void symmetric_matrix_rank_1_update(ExecutionPolicy&& exec,
+ InVec x, InMat E, OutMat A, Triangle t);
+
+ // overwriting Hermitian rank-1 matrix update
+ template<noncomplex Scalar, in-vector InVec, possibly-packed-out-matrix OutMat, class Triangle>
+ void hermitian_matrix_rank_1_update(Scalar alpha, InVec x, OutMat A, Triangle t);
+ template<class ExecutionPolicy,
+ noncomplex Scalar, in-vector InVec, possibly-packed-out-matrix OutMat, class Triangle>
+ void hermitian_matrix_rank_1_update(ExecutionPolicy&& exec,
+ Scalar alpha, InVec x, OutMat A, Triangle t);
+ template<in-vector InVec, possibly-packed-out-matrix OutMat, class Triangle>
+ void hermitian_matrix_rank_1_update(InVec x, OutMat A, Triangle t);
+ template<class ExecutionPolicy,
+ in-vector InVec, possibly-packed-out-matrix OutMat, class Triangle>
+ void hermitian_matrix_rank_1_update(ExecutionPolicy&& exec,
+ InVec x, OutMat A, Triangle t);
+
+ // updating Hermitian rank-1 matrix update
+ template<noncomplex Scalar, in-vector InVec, possibly-packed-in-matrix InMat, possibly-packed-out-matrix OutMat, class Triangle>
+ void hermitian_matrix_rank_1_update(Scalar alpha, InVec x, InMat E, OutMat A, Triangle t);
+ template<class ExecutionPolicy,
+ noncomplex Scalar, in-vector InVec, possibly-packed-in-matrix InMat, possibly-packed-out-matrix OutMat, class Triangle>
+ void hermitian_matrix_rank_1_update(ExecutionPolicy&& exec,
+ Scalar alpha, InVec x, InMat E, OutMat A, Triangle t);
+ template<in-vector InVec, possibly-packed-in-matrix InMat, possibly-packed-out-matrix OutMat, class Triangle>
+ void hermitian_matrix_rank_1_update(InVec x, InMat E, OutMat A, Triangle t);
+ template<class ExecutionPolicy,
+ in-vector InVec, possibly-packed-in-matrix InMat, possibly-packed-out-matrix OutMat, class Triangle>
+ void hermitian_matrix_rank_1_update(ExecutionPolicy&& exec,
+ InVec x, InMat E, OutMat A, Triangle t);++In the header
+<linalg>synopsis +[linalg.syn], replace all the declarations of all the +symmetric_matrix_rank_2_updateand +hermitian_matrix_rank_2_updateoverloads to read as +follows. [Editorial Note: These additions are not shown in green, +because the authors could not convince Markdown to format the code +correctly. – end note]
// [linalg.algs.blas2.rank2], symmetric and Hermitian rank-2 matrix updates
+
+ // overwriting symmetric rank-2 matrix update
+ template<in-vector InVec1, in-vector InVec2,
+ possibly-packed-out-matrix OutMat, class Triangle>
+ void symmetric_matrix_rank_2_update(InVec1 x, InVec2 y, OutMat A, Triangle t);
+ template<class ExecutionPolicy, in-vector InVec1, in-vector InVec2,
+ possibly-packed-out-matrix OutMat, class Triangle>
+ void symmetric_matrix_rank_2_update(ExecutionPolicy&& exec,
+ InVec1 x, InVec2 y, OutMat A, Triangle t);
+
+ // updating symmetric rank-2 matrix update
+ template<in-vector InVec1, in-vector InVec2,
+ possibly-packed-in-matrix InMat,
+ possibly-packed-out-matrix OutMat, class Triangle>
+ void symmetric_matrix_rank_2_update(InVec1 x, InVec2 y, InMat E, OutMat A, Triangle t);
+ template<class ExecutionPolicy, in-vector InVec1, in-vector InVec2,
+ possibly-packed-in-matrix InMat,
+ possibly-packed-out-matrix OutMat, class Triangle>
+ void symmetric_matrix_rank_2_update(ExecutionPolicy&& exec,
+ InVec1 x, InVec2 y, InMat E, OutMat A, Triangle t);
+
+ // overwriting Hermitian rank-2 matrix update
+ template<in-vector InVec1, in-vector InVec2,
+ possibly-packed-out-matrix OutMat, class Triangle>
+ void hermitian_matrix_rank_2_update(InVec1 x, InVec2 y, OutMat A, Triangle t);
+ template<class ExecutionPolicy, in-vector InVec1, in-vector InVec2,
+ possibly-packed-out-matrix OutMat, class Triangle>
+ void hermitian_matrix_rank_2_update(ExecutionPolicy&& exec,
+ InVec1 x, InVec2 y, OutMat A, Triangle t);
+
+ // updating Hermitian rank-2 matrix update
+ template<in-vector InVec1, in-vector InVec2,
+ possibly-packed-in-matrix InMat,
+ possibly-packed-out-matrix OutMat, class Triangle>
+ void hermitian_matrix_rank_2_update(InVec1 x, InVec2 y, InMat E, OutMat A, Triangle t);
+ template<class ExecutionPolicy, in-vector InVec1, in-vector InVec2,
+ possibly-packed-in-matrix InMat,
+ possibly-packed-out-matrix OutMat, class Triangle>
+ void hermitian_matrix_rank_2_update(ExecutionPolicy&& exec,
+ InVec1 x, InVec2 y, InMat E, OutMat A, Triangle t);++In the header
+<linalg>synopsis +[linalg.syn], replace all the declarations of all the +symmetric_matrix_rank_k_updateand +hermitian_matrix_rank_k_updateoverloads to read as +follows. [Editorial Note: These additions are not shown in green, +because the authors could not convince Markdown to format the code +correctly. – end note]
// [linalg.algs.blas3.rankk], rank-k update of a symmetric or Hermitian matrix
+
+ // overwriting symmetric rank-k matrix update
+ template<class Scalar,
+ in-matrix InMat,
+ possibly-packed-out-matrix OutMat,
+ class Triangle>
+ void symmetric_matrix_rank_k_update(
+ Scalar alpha, InMat A, OutMat C, Triangle t);
+ template<class ExecutionPolicy, class Scalar,
+ in-matrix InMat,
+ possibly-packed-out-matrix OutMat,
+ class Triangle>
+ void symmetric_matrix_rank_k_update(
+ ExecutionPolicy&& exec, Scalar alpha, InMat A, OutMat C, Triangle t);
+ template<in-matrix InMat,
+ possibly-packed-out-matrix OutMat,
+ class Triangle>
+ void symmetric_matrix_rank_k_update(
+ InMat A, OutMat C, Triangle t);
+ template<class ExecutionPolicy,
+ in-matrix InMat,
+ possibly-packed-out-matrix OutMat,
+ class Triangle>
+ void symmetric_matrix_rank_k_update(
+ ExecutionPolicy&& exec, InMat A, OutMat C, Triangle t);
+
+ // updating symmetric rank-k matrix update
+ template<class Scalar,
+ in-matrix InMat1,
+ possibly-packed-in-matrix InMat2,
+ possibly-packed-out-matrix OutMat,
+ class Triangle>
+ void symmetric_matrix_rank_k_update(
+ Scalar alpha,
+ InMat1 A, InMat2 E, OutMat C, Triangle t);
+ template<class ExecutionPolicy, class Scalar,
+ in-matrix InMat1,
+ possibly-packed-in-matrix InMat2,
+ possibly-packed-out-matrix OutMat,
+ class Triangle>
+ void symmetric_matrix_rank_k_update(
+ ExecutionPolicy&& exec, Scalar alpha,
+ InMat1 A, InMat2 E, OutMat C, Triangle t);
+ template<in-matrix InMat1,
+ possibly-packed-in-matrix InMat2,
+ possibly-packed-out-matrix OutMat,
+ class Triangle>
+ void symmetric_matrix_rank_k_update(
+ InMat1 A, InMat2 E, OutMat C, Triangle t);
+ template<class ExecutionPolicy,
+ in-matrix InMat1,
+ possibly-packed-in-matrix InMat2,
+ possibly-packed-out-matrix OutMat,
+ class Triangle>
+ void symmetric_matrix_rank_k_update(
+ ExecutionPolicy&& exec,
+ InMat1 A, InMat2 E, OutMat C, Triangle t);
+
+ // overwriting rank-k Hermitian matrix update
+ template<noncomplex Scalar,
+ in-matrix InMat,
+ possibly-packed-out-matrix OutMat,
+ class Triangle>
+ void hermitian_matrix_rank_k_update(
+ Scalar alpha, InMat A, OutMat C, Triangle t);
+ template<class ExecutionPolicy, noncomplex Scalar,
+ in-matrix InMat,
+ possibly-packed-out-matrix OutMat,
+ class Triangle>
+ void hermitian_matrix_rank_k_update(
+ ExecutionPolicy&& exec, Scalar alpha, InMat A, OutMat C, Triangle t);
+ template<in-matrix InMat,
+ possibly-packed-out-matrix OutMat,
+ class Triangle>
+ void hermitian_matrix_rank_k_update(
+ InMat A, OutMat C, Triangle t);
+ template<class ExecutionPolicy,
+ in-matrix InMat,
+ possibly-packed-out-matrix OutMat,
+ class Triangle>
+ void hermitian_matrix_rank_k_update(
+ ExecutionPolicy&& exec, InMat A, OutMat C, Triangle t);
+
+ // updating rank-k Hermitian matrix update
+ template<noncomplex Scalar,
+ in-matrix InMat1,
+ possibly-packed-in-matrix InMat2,
+ possibly-packed-out-matrix OutMat,
+ class Triangle>
+ void hermitian_matrix_rank_k_update(
+ Scalar alpha,
+ InMat1 A, InMat2 E, OutMat C, Triangle t);
+ template<class ExecutionPolicy, noncomplex Scalar,
+ in-matrix InMat1,
+ possibly-packed-in-matrix InMat2,
+ possibly-packed-out-matrix OutMat,
+ class Triangle>
+ void hermitian_matrix_rank_k_update(
+ ExecutionPolicy&& exec, Scalar alpha,
+ InMat1 A, InMat2 E, OutMat C, Triangle t);
+ template<in-matrix InMat1,
+ possibly-packed-in-matrix InMat2,
+ possibly-packed-out-matrix OutMat,
+ class Triangle>
+ void hermitian_matrix_rank_k_update(
+ InMat1 A, InMat2 E, OutMat C, Triangle t);
+ template<class ExecutionPolicy,
+ in-matrix InMat1,
+ possibly-packed-in-matrix InMat2,
+ possibly-packed-out-matrix OutMat,
+ class Triangle>
+ void hermitian_matrix_rank_k_update(
+ ExecutionPolicy&& exec,
+ InMat1 A, InMat2 E, OutMat C, Triangle t);++In the header
+<linalg>synopsis +[linalg.syn], replace all the declarations of all the +symmetric_matrix_rank_2k_updateand +hermitian_matrix_rank_2k_updateoverloads to read as +follows. [Editorial Note: These additions are not shown in green, +because the authors could not convince Markdown to format the code +correctly. – end note]
// [linalg.algs.blas3.rank2k], rank-2k update of a symmetric or Hermitian matrix
+
+ // overwriting symmetric rank-2k matrix update
+ template<in-matrix InMat1, in-matrix InMat2,
+ possibly-packed-out-matrix OutMat, class Triangle>
+ void symmetric_matrix_rank_2k_update(InMat1 A, InMat2 B, OutMat C, Triangle t);
+ template<class ExecutionPolicy,
+ in-matrix InMat1, in-matrix InMat2,
+ possibly-packed-out-matrix OutMat, class Triangle>
+ void symmetric_matrix_rank_2k_update(ExecutionPolicy&& exec,
+ InMat1 A, InMat2 B, OutMat C, Triangle t);
+
+ // updating symmetric rank-2k matrix update
+ template<in-matrix InMat1, in-matrix InMat2,
+ possibly-packed-in-matrix InMat3,
+ possibly-packed-out-matrix OutMat, class Triangle>
+ void symmetric_matrix_rank_2k_update(InMat1 A, InMat2 B, InMat3 E, OutMat C, Triangle t);
+ template<class ExecutionPolicy,
+ in-matrix InMat1, in-matrix InMat2,
+ possibly-packed-in-matrix InMat3,
+ possibly-packed-out-matrix OutMat, class Triangle>
+ void symmetric_matrix_rank_2k_update(ExecutionPolicy&& exec,
+ InMat1 A, InMat2 B, InMat3 E, OutMat C, Triangle t);
+
+ // overwriting Hermitian rank-2k matrix update
+ template<in-matrix InMat1, in-matrix InMat2,
+ possibly-packed-out-matrix OutMat, class Triangle>
+ void hermitian_matrix_rank_2k_update(InMat1 A, InMat2 B, OutMat C, Triangle t);
+ template<class ExecutionPolicy,
+ in-matrix InMat1, in-matrix InMat2,
+ possibly-packed-out-matrix OutMat, class Triangle>
+ void hermitian_matrix_rank_2k_update(ExecutionPolicy&& exec,
+ InMat1 A, InMat2 B, OutMat C, Triangle t);
+
+ // updating Hermitian rank-2k matrix update
+ template<in-matrix InMat1, in-matrix InMat2,
+ possibly-packed-in-matrix InMat3,
+ possibly-packed-out-matrix OutMat, class Triangle>
+ void hermitian_matrix_rank_2k_update(InMat1 A, InMat2 B, InMat3 E, OutMat C, Triangle t);
+ template<class ExecutionPolicy,
+ in-matrix InMat1, in-matrix InMat2,
+ possibly-packed-in-matrix InMat3,
+ possibly-packed-out-matrix OutMat, class Triangle>
+ void hermitian_matrix_rank_2k_update(ExecutionPolicy&& exec,
+ InMat1 A, InMat2 B, InMat3 E, OutMat C, Triangle t);++Replace the entire contents of [linalg.algs.blas2.rank1] with the +following.
+
1 +The following elements apply to all functions in +[linalg.algs.blas2.rank1].
+2 +Mandates:
+(2.1)
+possibly-multipliable<OutMat, InVec2, InVec1>()
+is true, and
(2.2)
+possibly-addable<decltype(A), decltype(E), decltype(A)>() is
+true for those overloads that take an E
+parameter.
3 +Preconditions:
+(3.1)
+multipliable(A, y, x) is true, and
(3.2)
+addable(A, E, A) is true
+for those overloads that take an E parameter.
4
+Complexity: O(
+x.extent(0) × y.extent(0) ).
template<in-vector InVec1, in-vector InVec2, out-matrix OutMat>
+ void matrix_rank_1_update(InVec1 x, InVec2 y, OutMat A);
+template<class ExecutionPolicy, in-vector InVec1, in-vector InVec2, out-matrix OutMat>
+  void matrix_rank_1_update(ExecutionPolicy&& exec, InVec1 x, InVec2 y, OutMat A);5 +These functions perform an overwriting nonsymmetric nonconjugated rank-1 +update.
+[Note: These functions correspond to the BLAS functions
+xGER (for real element types) and xGERU (for
+complex element types)[bib]. – end note]
6 +Effects: Computes A = xyT.
+template<in-vector InVec1, in-vector InVec2, in-matrix InMat, out-matrix OutMat>
+ void matrix_rank_1_update(InVec1 x, InVec2 y, InMat E, OutMat A);
+template<class ExecutionPolicy, in-vector InVec1, in-vector InVec2, in-matrix InMat, out-matrix OutMat>
+ void matrix_rank_1_update(ExecutionPolicy&& exec, InVec1 x, InVec2 y, InMat E, OutMat A);7 +These functions perform an updating nonsymmetric nonconjugated rank-1 +update.
+[Note: These functions correspond to the BLAS functions
+xGER (for real element types) and xGERU (for
+complex element types)[bib]. – end note]
8 +Effects: Computes A = E + xyT.
+9
+Remarks: A may alias E.
template<in-vector InVec1, in-vector InVec2, out-matrix OutMat>
+ void matrix_rank_1_update_c(InVec1 x, InVec2 y, OutMat A);
+template<class ExecutionPolicy, in-vector InVec1, in-vector InVec2, out-matrix OutMat>
+  void matrix_rank_1_update_c(ExecutionPolicy&& exec, InVec1 x, InVec2 y, OutMat A);10 These +functions perform an overwriting nonsymmetric conjugated rank-1 +update.
+[Note: These functions correspond to the BLAS functions
+xGER (for real element types) and xGERC (for
+complex element types)[bib]. – end note]
11 +Effects:
+(11.1) For
+the overloads without an ExecutionPolicy argument,
+equivalent to:
matrix_rank_1_update(x, conjugated(y), A);(11.2) +otherwise, equivalent to:
+matrix_rank_1_update(std::forward<ExecutionPolicy>(exec), x, conjugated(y), A);++Replace the entire contents of [linalg.algs.blas2.symherrank1] with +the following.
+
1
+[Note: These functions correspond to the BLAS functions
+xSYR, xSPR, xHER, and
+xHPR[bib]. They have overloads taking a scaling factor
+alpha, because it would be impossible to express the update
+A = αxxT
+otherwise. – end note]
2 +The following elements apply to all functions in +[linalg.algs.blas2.symherrank1].
+3
+For any function F in this section that takes a parameter
+named t, an InMat template parameter, and a
+function parameter InMat E, t applies to
+accesses done through the parameter E. F will
+only access the triangle of E specified by t.
+For accesses of diagonal elements E[i, i], F
+will use the value
+real-if-needed(E[i, i]) if the name
+of F starts with hermitian. For accesses
+E[i, j] outside the triangle specified by t,
+F will use the value
(3.1)
+conj-if-needed(E[j, i]) if the name
+of F starts with hermitian, or
(3.2)
+E[j, i] if the name of F starts with
+symmetric.
4 +Mandates:
+(4.1) If
+OutMat has layout_blas_packed layout, then the
+layout’s Triangle template argument has the same type as
+the function’s Triangle template argument;
(4.2) If
+the function has an InMat template parameter and
+InMat has layout_blas_packed layout, then the
+layout’s Triangle template argument has the same type as
+the function’s Triangle template argument;
(4.3)
+compatible-static-extents<decltype(A), decltype(A)>(0, 1)
+is true;
(4.4)
+compatible-static-extents<decltype(A), decltype(x)>(0, 0)
+is true; and
(4.5)
+possibly-addable<decltype(A), decltype(E), decltype(A)>
+is true for those overloads that take an E
+parameter.
5 +Preconditions:
+(5.1)
+A.extent(0) equals A.extent(1),
(5.2)
+A.extent(0) equals x.extent(0), and
(5.3)
+addable(A, E, A) is true
+for those overloads that take an E parameter.
6
+Complexity: O(
+x.extent(0) × x.extent(0) ).
template<class Scalar, in-vector InVec, possibly-packed-out-matrix OutMat, class Triangle>
+ void symmetric_matrix_rank_1_update(Scalar alpha, InVec x, OutMat A, Triangle t);
+template<class ExecutionPolicy,
+ class Scalar, in-vector InVec, possibly-packed-out-matrix OutMat, class Triangle>
+ void symmetric_matrix_rank_1_update(ExecutionPolicy&& exec,
+ Scalar alpha, InVec x, OutMat A, Triangle t);7
+These functions perform an overwriting symmetric rank-1 update of the
+symmetric matrix A, taking into account the
+Triangle parameter that applies to A
+([linalg.general]).
8
+Effects: Computes A = αxxT,
+where the scalar α is
+alpha.
template<in-vector InVec, possibly-packed-out-matrix OutMat, class Triangle>
+ void symmetric_matrix_rank_1_update(InVec x, OutMat A, Triangle t);
+template<class ExecutionPolicy,
+ in-vector InVec, possibly-packed-out-matrix OutMat, class Triangle>
+ void symmetric_matrix_rank_1_update(ExecutionPolicy&& exec, InVec x, OutMat A, Triangle t);9
+These functions perform an overwriting symmetric rank-1 update of the
+symmetric matrix A, taking into account the
+Triangle parameter that applies to A
+([linalg.general]).
10 +Effects: Computes A = xxT.
+template<class Scalar, in-vector InVec, possibly-packed-in-matrix InMat, possibly-packed-out-matrix OutMat, class Triangle>
+ void symmetric_matrix_rank_1_update(Scalar alpha, InVec x, InMat E, OutMat A, Triangle t);
+template<class ExecutionPolicy,
+ class Scalar, in-vector InVec, possibly-packed-in-matrix InMat, possibly-packed-out-matrix OutMat, class Triangle>
+ void symmetric_matrix_rank_1_update(ExecutionPolicy&& exec,
+ Scalar alpha, InVec x, InMat E, OutMat A, Triangle t);11 These
+functions perform an updating symmetric rank-1 update of the symmetric
+matrix A using the symmetric matrix E, taking
+into account the Triangle parameter that applies to
+A and E ([linalg.general]).
12
+Effects: Computes A = E + αxxT,
+where the scalar α is
+alpha.
template<in-vector InVec, possibly-packed-in-matrix InMat, possibly-packed-out-matrix OutMat, class Triangle>
+ void symmetric_matrix_rank_1_update(InVec x, InMat E, OutMat A, Triangle t);
+template<class ExecutionPolicy,
+ in-vector InVec, possibly-packed-in-matrix InMat, possibly-packed-out-matrix OutMat, class Triangle>
+ void symmetric_matrix_rank_1_update(ExecutionPolicy&& exec,
+ InVec x, InMat E, OutMat A, Triangle t);13 These
+functions perform an updating symmetric rank-1 update of the symmetric
+matrix A using the symmetric matrix E, taking
+into account the Triangle parameter that applies to
+A and E ([linalg.general]).
14 +Effects: Computes A = E + xxT.
+template<noncomplex Scalar, in-vector InVec, possibly-packed-out-matrix OutMat, class Triangle>
+ void hermitian_matrix_rank_1_update(Scalar alpha, InVec x, OutMat A, Triangle t);
+template<class ExecutionPolicy,
+ noncomplex Scalar, in-vector InVec, possibly-packed-out-matrix OutMat, class Triangle>
+ void hermitian_matrix_rank_1_update(ExecutionPolicy&& exec,
+ Scalar alpha, InVec x, OutMat A, Triangle t);15 These
+functions perform an overwriting Hermitian rank-1 update of the
+Hermitian matrix A, taking into account the
+Triangle parameter that applies to A
+([linalg.general]).
16
+Effects: Computes A = αxxH,
+where the scalar α is
+alpha.
template<in-vector InVec, possibly-packed-out-matrix OutMat, class Triangle>
+ void hermitian_matrix_rank_1_update(InVec x, OutMat A, Triangle t);
+template<class ExecutionPolicy,
+ in-vector InVec, possibly-packed-out-matrix OutMat, class Triangle>
+ void hermitian_matrix_rank_1_update(ExecutionPolicy&& exec, InVec x, OutMat A, Triangle t);17 These
+functions perform an overwriting Hermitian rank-1 update of the
+Hermitian matrix A, taking into account the
+Triangle parameter that applies to A
+([linalg.general]).
18 +Effects: Computes A = xxH.
+template<noncomplex Scalar, in-vector InVec, possibly-packed-in-matrix InMat, possibly-packed-out-matrix OutMat, class Triangle>
+ void hermitian_matrix_rank_1_update(Scalar alpha, InVec x, InMat E, OutMat A, Triangle t);
+template<class ExecutionPolicy,
+ noncomplex Scalar, in-vector InVec, possibly-packed-in-matrix InMat, possibly-packed-out-matrix OutMat, class Triangle>
+ void hermitian_matrix_rank_1_update(ExecutionPolicy&& exec,
+ Scalar alpha, InVec x, InMat E, OutMat A, Triangle t);19 These
+functions perform an updating Hermitian rank-1 update of the Hermitian
+matrix A using the Hermitian matrix E, taking
+into account the Triangle parameter that applies to
+A and E ([linalg.general]).
20
+Effects: Computes A = E + αxxH,
+where the scalar α is
+alpha.
template<in-vector InVec, possibly-packed-in-matrix InMat, possibly-packed-out-matrix OutMat, class Triangle>
+ void hermitian_matrix_rank_1_update(InVec x, InMat E, OutMat A, Triangle t);
+template<class ExecutionPolicy,
+ in-vector InVec, possibly-packed-in-matrix InMat, possibly-packed-out-matrix OutMat, class Triangle>
+ void hermitian_matrix_rank_1_update(ExecutionPolicy&& exec,
+ InVec x, InMat E, OutMat A, Triangle t);21 These
+functions perform an updating Hermitian rank-1 update of the Hermitian
+matrix A using the Hermitian matrix E, taking
+into account the Triangle parameter that applies to
+A and E ([linalg.general]).
22 +Effects: Computes A = E + xxH.
+++Replace the entire contents of [linalg.algs.blas2.rank2] with the +following.
+
1
+[Note: These functions correspond to the BLAS functions
+xSYR2, xSPR2, xHER2, and
+xHPR2 [bib]. – end note]
2 +The following elements apply to all functions in +[linalg.algs.blas2.rank2].
+3
+For any function F in this section that takes a parameter
+named t, an InMat template parameter, and a
+function parameter InMat E, t applies to
+accesses done through the parameter E. F will
+only access the triangle of E specified by t.
+For accesses of diagonal elements E[i, i], F
+will use the value
+real-if-needed(E[i, i]) if the name
+of F starts with hermitian. For accesses
+E[i, j] outside the triangle specified by t,
+F will use the value
(3.1)
+conj-if-needed(E[j, i]) if the name
+of F starts with hermitian, or
(3.2)
+E[j, i] if the name of F starts with
+symmetric.
4 +Mandates:
+(4.1) If
+OutMat has layout_blas_packed layout, then the
+layout’s Triangle template argument has the same type as
+the function’s Triangle template argument;
(4.2) If
+the function has an InMat template parameter and
+InMat has layout_blas_packed layout, then the
+layout’s Triangle template argument has the same type as
+the function’s Triangle template argument;
(4.3)
+compatible-static-extents<decltype(A), decltype(A)>(0, 1)
+is true;
(4.4)
+possibly-multipliable<decltype(A), decltype(x), decltype(y)>()
+is true; and
(4.5)
+possibly-addable<decltype(A), decltype(E), decltype(A)>
+is true for those overloads that take an E
+parameter.
5 +Preconditions:
+(5.1)
+A.extent(0) equals A.extent(1),
(5.2)
+multipliable(A, x, y) is
+true, and
(5.3)
+addable(A, E, A) is true
+for those overloads that take an E parameter.
6
+Complexity: O(
+x.extent(0) × y.extent(0) ).
template<in-vector InVec1, in-vector InVec2,
+ possibly-packed-out-matrix OutMat, class Triangle>
+ void symmetric_matrix_rank_2_update(InVec1 x, InVec2 y, OutMat A, Triangle t);
+template<class ExecutionPolicy, in-vector InVec1, in-vector InVec2,
+ possibly-packed-out-matrix OutMat, class Triangle>
+ void symmetric_matrix_rank_2_update(ExecutionPolicy&& exec,
+ InVec1 x, InVec2 y, OutMat A, Triangle t);7
+These functions perform an overwriting symmetric rank-2 update of the
+symmetric matrix A, taking into account the
+Triangle parameter that applies to A
+([linalg.general]).
8 +Effects: Computes A = xyT + yxT.
+template<in-vector InVec1, in-vector InVec2,
+ possibly-packed-in-matrix InMat,
+ possibly-packed-out-matrix OutMat, class Triangle>
+ void symmetric_matrix_rank_2_update(InVec1 x, InVec2 y, InMat E, OutMat A, Triangle t);
+template<class ExecutionPolicy, in-vector InVec1, in-vector InVec2,
+ possibly-packed-in-matrix InMat,
+ possibly-packed-out-matrix OutMat, class Triangle>
+ void symmetric_matrix_rank_2_update(ExecutionPolicy&& exec,
+ InVec1 x, InVec2 y, InMat E, OutMat A, Triangle t);9
+These functions perform an updating symmetric rank-2 update of the
+symmetric matrix A using the symmetric matrix
+E, taking into account the Triangle parameter
+that applies to A and E
+([linalg.general]).
10 +Effects: Computes A = E + xyT + yxT.
+template<in-vector InVec1, in-vector InVec2,
+ possibly-packed-out-matrix OutMat, class Triangle>
+ void hermitian_matrix_rank_2_update(InVec1 x, InVec2 y, OutMat A, Triangle t);
+template<class ExecutionPolicy, in-vector InVec1, in-vector InVec2,
+ possibly-packed-out-matrix OutMat, class Triangle>
+ void hermitian_matrix_rank_2_update(ExecutionPolicy&& exec,
+ InVec1 x, InVec2 y, OutMat A, Triangle t);11 These
+functions perform an overwriting Hermitian rank-2 update of the
+Hermitian matrix A, taking into account the
+Triangle parameter that applies to A
+([linalg.general]).
12 +Effects: Computes A = xyH + yxH.
+template<in-vector InVec1, in-vector InVec2,
+ possibly-packed-in-matrix InMat,
+ possibly-packed-out-matrix OutMat, class Triangle>
+ void hermitian_matrix_rank_2_update(InVec1 x, InVec2 y, InMat E, OutMat A, Triangle t);
+template<class ExecutionPolicy, in-vector InVec1, in-vector InVec2,
+ possibly-packed-in-matrix InMat,
+ possibly-packed-out-matrix OutMat, class Triangle>
+ void hermitian_matrix_rank_2_update(ExecutionPolicy&& exec,
+ InVec1 x, InVec2 y, InMat E, OutMat A, Triangle t);13 These
+functions perform an updating Hermitian rank-2 update of the Hermitian
+matrix A using the Hermitian matrix E, taking
+into account the Triangle parameter that applies to
+A and E ([linalg.general]).
14 +Effects: Computes A = E + xyH + yxH.
+++Replace the entire contents of [linalg.algs.blas3.rankk] with the +following.
+
[Note: These functions correspond to the BLAS functions
+xSYRK and xHERK. – end note]
1 +The following elements apply to all functions in +[linalg.algs.blas3.rankk].
+2
+For any function F in this section that takes a parameter
+named t, an InMat2 template parameter, and a
+function parameter InMat2 E, t applies to
+accesses done through the parameter E. F will
+only access the triangle of E specified by t.
+For accesses of diagonal elements E[i, i], F
+will use the value
+real-if-needed(E[i, i]) if the name
+of F starts with hermitian. For accesses
+E[i, j] outside the triangle specified by t,
+F will use the value
(2.1)
+conj-if-needed(E[j, i]) if the name
+of F starts with hermitian, or
(2.2)
+E[j, i] if the name of F starts with
+symmetric.
3 +Mandates:
+(3.1) If
+OutMat has layout_blas_packed layout, then the
+layout’s Triangle template argument has the same type as
+the function’s Triangle template argument.
(3.2) If
+the function takes an InMat2 template parameter and if
+InMat2 has layout_blas_packed layout, then the
+layout’s Triangle template argument has the same type as
+the function’s Triangle template argument.
(3.3)
+possibly-multipliable<decltype(A), decltype(transposed(A)), decltype(C)>
+is true.
(3.4)
+possibly-addable<decltype(C), decltype(E), decltype(C)>
+is true for those overloads that take an E
+parameter.
4 +Preconditions:
+(4.1)
+multipliable(A, transposed(A), C) is
+true. [Note: This implies that C is
+square. – end note]
(4.2)
+addable(C, E, C) is true
+for those overloads that take an E parameter.
5
+Complexity: O(
+A.extent(0) ⋅
+A.extent(1) ⋅
+A.extent(0) ).
6
+Remarks: C may alias E for those
+overloads that take an E parameter.
template<class Scalar,
+ in-matrix InMat,
+ possibly-packed-out-matrix OutMat,
+ class Triangle>
+void symmetric_matrix_rank_k_update(
+ Scalar alpha,
+ InMat A,
+ OutMat C,
+ Triangle t);
+template<class ExecutionPolicy,
+ class Scalar,
+ in-matrix InMat,
+ possibly-packed-out-matrix OutMat,
+ class Triangle>
+void symmetric_matrix_rank_k_update(
+ ExecutionPolicy&& exec,
+ Scalar alpha,
+ InMat A,
+ OutMat C,
+  Triangle t);7
+Effects: Computes C = αAAT,
+where the scalar α is
+alpha.
template<in-matrix InMat,
+ possibly-packed-out-matrix OutMat,
+ class Triangle>
+void symmetric_matrix_rank_k_update(
+ InMat A,
+ OutMat C,
+ Triangle t);
+template<class ExecutionPolicy,
+ in-matrix InMat,
+ possibly-packed-out-matrix OutMat,
+ class Triangle>
+void symmetric_matrix_rank_k_update(
+ ExecutionPolicy&& exec,
+ InMat A,
+ OutMat C,
+  Triangle t);8 +Effects: Computes C = AAT.
+template<noncomplex Scalar,
+ in-matrix InMat,
+ possibly-packed-out-matrix OutMat,
+ class Triangle>
+void hermitian_matrix_rank_k_update(
+ Scalar alpha,
+ InMat A,
+ OutMat C,
+ Triangle t);
+template<class ExecutionPolicy,
+ noncomplex Scalar,
+ in-matrix InMat,
+ possibly-packed-out-matrix OutMat,
+ class Triangle>
+void hermitian_matrix_rank_k_update(
+ ExecutionPolicy&& exec,
+ Scalar alpha,
+ InMat A,
+ OutMat C,
+  Triangle t);9
+Effects: Computes C = αAAH,
+where the scalar α is
+alpha.
template<in-matrix InMat,
+ possibly-packed-out-matrix OutMat,
+ class Triangle>
+void hermitian_matrix_rank_k_update(
+ InMat A,
+ OutMat C,
+ Triangle t);
+template<class ExecutionPolicy,
+ in-matrix InMat,
+ possibly-packed-out-matrix OutMat,
+ class Triangle>
+void hermitian_matrix_rank_k_update(
+ ExecutionPolicy&& exec,
+ InMat A,
+ OutMat C,
+  Triangle t);10 +Effects: Computes C = AAH.
+template<class Scalar,
+ in-matrix InMat1,
+ possibly-packed-in-matrix InMat2,
+ possibly-packed-out-matrix OutMat,
+ class Triangle>
+void symmetric_matrix_rank_k_update(
+ Scalar alpha,
+ InMat1 A,
+ InMat2 E,
+ OutMat C,
+ Triangle t);
+template<class ExecutionPolicy,
+ class Scalar,
+ in-matrix InMat1,
+ possibly-packed-in-matrix InMat2,
+ possibly-packed-out-matrix OutMat,
+ class Triangle>
+void symmetric_matrix_rank_k_update(
+ ExecutionPolicy&& exec,
+ Scalar alpha,
+ InMat1 A,
+ InMat2 E,
+ OutMat C,
+  Triangle t);11
+Effects: Computes C = E + αAAT,
+where the scalar α is
+alpha.
template<in-matrix InMat1,
+ possibly-packed-in-matrix InMat2,
+ possibly-packed-out-matrix OutMat,
+ class Triangle>
+void symmetric_matrix_rank_k_update(
+ InMat1 A,
+ InMat2 E,
+ OutMat C,
+ Triangle t);
+template<class ExecutionPolicy,
+ in-matrix InMat1,
+ possibly-packed-in-matrix InMat2,
+ possibly-packed-out-matrix OutMat,
+ class Triangle>
+void symmetric_matrix_rank_k_update(
+ ExecutionPolicy&& exec,
+ InMat1 A,
+ InMat2 E,
+ OutMat C,
+  Triangle t);12 +Effects: Computes C = E + AAT.
+template<noncomplex Scalar,
+ in-matrix InMat1,
+ possibly-packed-in-matrix InMat2,
+ possibly-packed-out-matrix OutMat,
+ class Triangle>
+void hermitian_matrix_rank_k_update(
+ Scalar alpha,
+ InMat1 A,
+ InMat2 E,
+ OutMat C,
+ Triangle t);
+template<class ExecutionPolicy,
+ noncomplex Scalar,
+ in-matrix InMat1,
+ possibly-packed-in-matrix InMat2,
+ possibly-packed-out-matrix OutMat,
+ class Triangle>
+void hermitian_matrix_rank_k_update(
+ ExecutionPolicy&& exec,
+ Scalar alpha,
+ InMat1 A,
+ InMat2 E,
+ OutMat C,
+  Triangle t);13
+Effects: Computes C = E + αAAH,
+where the scalar α is
+alpha.
template<in-matrix InMat1,
+ possibly-packed-in-matrix InMat2,
+ possibly-packed-out-matrix OutMat,
+ class Triangle>
+void hermitian_matrix_rank_k_update(
+ InMat1 A,
+ InMat2 E,
+ OutMat C,
+ Triangle t);
+template<class ExecutionPolicy,
+ in-matrix InMat1,
+ possibly-packed-in-matrix InMat2,
+ possibly-packed-out-matrix OutMat,
+ class Triangle>
+void hermitian_matrix_rank_k_update(
+ ExecutionPolicy&& exec,
+ InMat1 A,
+ InMat2 E,
+ OutMat C,
+  Triangle t);14 +Effects: Computes C = E + AAH.
+++Replace the entire contents of [linalg.algs.blas3.rank2k] with the +following.
+
[Note: These functions correspond to the BLAS functions
+xSYR2K and xHER2K. – end note]
1 +The following elements apply to all functions in +[linalg.algs.blas3.rank2k].
+2
+For any function F in this section that takes a parameter
+named t, an InMat3 template parameter, and a
+function parameter InMat3 E, t applies to
+accesses done through the parameter E. F will
+only access the triangle of E specified by t.
+For accesses of diagonal elements E[i, i], F
+will use the value
+real-if-needed(E[i, i]) if the name
+of F starts with hermitian. For accesses
+E[i, j] outside the triangle specified by t,
+F will use the value
(2.1)
+conj-if-needed(E[j, i]) if the name
+of F starts with hermitian, or
(2.2)
+E[j, i] if the name of F starts with
+symmetric.
3 +Mandates:
+(3.1) If
+OutMat has layout_blas_packed layout, then the
+layout’s Triangle template argument has the same type as
+the function’s Triangle template argument;
(3.2) If
+the function takes an InMat3 template parameter and if
+InMat3 has layout_blas_packed layout, then the
+layout’s Triangle template argument has the same type as
+the function’s Triangle template argument.
(3.3)
+possibly-multipliable<decltype(A), decltype(transposed(B)), decltype(C)>
+is true.
(3.4)
+possibly-multipliable<decltype(B), decltype(transposed(A)), decltype(C)>
+is true.
(3.5)
+possibly-addable<decltype(C), decltype(E), decltype(C)>
+is true for those overloads that take an E
+parameter.
4 +Preconditions:
+(4.1)
+multipliable(A, transposed(B), C) is
+true.
(4.2)
+multipliable(B, transposed(A), C) is
+true. [Note: This and the previous imply that
+C is square. – end note]
(4.3)
+addable(C, E, C) is true
+for those overloads that take an E parameter.
5
+Complexity: O(
+A.extent(0) ⋅
+A.extent(1) ⋅
+B.extent(0) )
6
+Remarks: C may alias E for those
+overloads that take an E parameter.
template<in-matrix InMat1,
+ in-matrix InMat2,
+ possibly-packed-out-matrix OutMat,
+ class Triangle>
+void symmetric_matrix_rank_2k_update(
+ InMat1 A,
+ InMat2 B,
+ OutMat C,
+ Triangle t);
+template<class ExecutionPolicy,
+ in-matrix InMat1,
+ in-matrix InMat2,
+ possibly-packed-out-matrix OutMat,
+ class Triangle>
+void symmetric_matrix_rank_2k_update(
+ ExecutionPolicy&& exec,
+ InMat1 A,
+ InMat2 B,
+ OutMat C,
+  Triangle t);7 +Effects: Computes C = ABT + BAT.
+template<in-matrix InMat1,
+ in-matrix InMat2,
+ possibly-packed-out-matrix OutMat,
+ class Triangle>
+void hermitian_matrix_rank_2k_update(
+ InMat1 A,
+ InMat2 B,
+ OutMat C,
+ Triangle t);
+template<class ExecutionPolicy,
+ in-matrix InMat1,
+ in-matrix InMat2,
+ possibly-packed-out-matrix OutMat,
+ class Triangle>
+void hermitian_matrix_rank_2k_update(
+ ExecutionPolicy&& exec,
+ InMat1 A,
+ InMat2 B,
+ OutMat C,
+  Triangle t);8 +Effects: Computes C = ABH + BAH.
+template<in-matrix InMat1,
+ in-matrix InMat2,
+  possibly-packed-in-matrix InMat3,
+ possibly-packed-out-matrix OutMat,
+ class Triangle>
+void symmetric_matrix_rank_2k_update(
+ InMat1 A,
+ InMat2 B,
+ InMat3 E,
+ OutMat C,
+ Triangle t);
+template<class ExecutionPolicy,
+ in-matrix InMat1,
+ in-matrix InMat2,
+  possibly-packed-in-matrix InMat3,
+ possibly-packed-out-matrix OutMat,
+ class Triangle>
+void symmetric_matrix_rank_2k_update(
+ ExecutionPolicy&& exec,
+ InMat1 A,
+ InMat2 B,
+ InMat3 E,
+ OutMat C,
+  Triangle t);9 +Effects: Computes C = E + ABT + BAT.
+template<in-matrix InMat1,
+ in-matrix InMat2,
+  possibly-packed-in-matrix InMat3,
+ possibly-packed-out-matrix OutMat,
+ class Triangle>
+void hermitian_matrix_rank_2k_update(
+ InMat1 A,
+ InMat2 B,
+ InMat3 E,
+ OutMat C,
+ Triangle t);
+template<class ExecutionPolicy,
+ in-matrix InMat1,
+ in-matrix InMat2,
+  possibly-packed-in-matrix InMat3,
+ possibly-packed-out-matrix OutMat,
+ class Triangle>
+void hermitian_matrix_rank_2k_update(
+ ExecutionPolicy&& exec,
+ InMat1 A,
+ InMat2 B,
+ InMat3 E,
+ OutMat C,
+  Triangle t);10 +Effects: Computes C = E + ABH + BAH.
+| Document #: | +P3371R2 | +
| Date: | +2024-10-14 | +
| Project: | +Programming Language C++ + LEWG + |
+
| Reply-to: | +
+ Mark Hoemmen <mhoemmen@nvidia.com> + Ilya Burylov <burylov@gmail.com> + |
+
+Revision 0 to be submitted 2024-08-15
Revision 1 to be submitted 2024-09-15
+Main wording changes from R0:
+Instead of just changing the symmetric and Hermitian rank-k and +rank-2k updates to have both overwriting and updating overloads, change +all the update functions in this way: rank-1 (general, +symmetric, and Hermitian), rank-2, rank-k, and rank-2k
For all the new updating overloads, specify that C
+(or A) may alias E
For the new symmetric and Hermitian updating overloads, specify
+that the functions access the new E parameter in the same
+way (e.g., with respect to the lower or upper triangle) as the
+C (or A) parameter
Add exposition-only concept noncomplex to
+constrain a scaling factor to be noncomplex, as needed for Hermitian
+rank-1 and rank-k functions
Add Ilya Burylov as coauthor
Change title and abstract to express the wording changes
Add nonwording section explaining why we change rank-1 and rank-2 +updates to be consistent with rank-k and rank-2k updates
Add nonwording sections explaining why we don’t change
+hermitian_matrix_vector_product,
+hermitian_matrix_product,
+triangular_matrix_product, or the triangular
+solves
Add nonwording section explaining why we constrain some scaling +factors to be noncomplex at compile time, instead of taking a run-time +approach
Reorganize and expand nonwording sections
Revision 2 to be submitted 2024-10-15
+For Hermitian matrix rank-1 and rank-k updates, do not constrain
+the type of the scaling factor alpha, as R1 did. Instead,
+define the algorithms to use
+real-if-needed(alpha). Remove
+exposition-only concept noncomplex.
Remove the exposition-only concept
+possibly-packed-in-matrix that was introduced in
+R1, as it is overly restrictive. (Input matrices never need to have
+unique layout, even if they are not packed.)
We propose the following changes to [linalg] that improve consistency +of the rank-1, rank-2, rank-k, and rank-2k update functions with the +BLAS.
+Add “updating” overloads to all the rank-1, rank-2, rank-k, and
+rank-2k update functions: general, symmetric, and Hermitian. The new
+overloads are analogous to the updating overloads of
+matrix_product. For example,
+symmetric_matrix_rank_k_update(A, scaled(beta, C), C, upper_triangle)
+will perform C := βC + AAT.
+This makes the functions consistent with the BLAS’s behavior for nonzero
+beta, and also more consistent with the behavior of
+matrix_product (of which they are mathematically a special
+case).
Change the behavior of all the existing rank-1, rank-2, rank-k,
+and rank-2k update functions (general, symmetric, and Hermitian) to be
+“overwriting” instead of “unconditionally updating.” For example,
+symmetric_matrix_rank_k_update(A, C, upper_triangle) will
+perform C = AAT
+instead of C := C + AAT.
+This makes them consistent with the BLAS’s behavior when
+beta is zero.
For the overloads of hermitian_matrix_rank_1_update and
+hermitian_matrix_rank_k_update that have an alpha
+scaling factor parameter, only use
+real-if-needed(alpha) in the update.
+This ensures that the update will be mathematically Hermitian, and makes
+the behavior well defined if alpha has nonzero imaginary
+part. The change is also consistent with our proposed resolution for LWG
+4136 (“Specify behavior of [linalg] Hermitian algorithms on diagonal
+with nonzero imaginary part”).
Items (2) and (3) are breaking changes to the current Working Draft. +Thus, we must finish this before finalization of C++26.
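The following sketch shows both kinds of calls side by side (it assumes an implementation of the wording proposed below; dimensions and values are arbitrary):

#include <linalg>
#include <mdspan>
#include <vector>

int main() {
  constexpr std::size_t n = 3, k = 2;
  std::vector<double> A_storage(n * k, 1.0);
  std::vector<double> C_storage(n * n, 0.0);
  std::mdspan A(A_storage.data(), n, k);
  std::mdspan C(C_storage.data(), n, n);
  using std::linalg::scaled;
  using std::linalg::upper_triangle;

  // Change (2): the existing overload becomes overwriting, C = A A^T.
  std::linalg::symmetric_matrix_rank_k_update(A, C, upper_triangle);

  // Change (1): new updating overload, here computing C = beta C + alpha A A^T
  // by passing E = scaled(beta, C); E is explicitly permitted to alias C.
  const double alpha = 2.0, beta = 0.5;
  std::linalg::symmetric_matrix_rank_k_update(alpha, A, scaled(beta, C), C, upper_triangle);
}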
+For rank-k and rank-2k updates (general, symmetric, and
+Hermitian), BLAS routines support both overwriting and updating behavior
+by exposing a scaling factor beta. The corresponding
+[linalg] algorithms currently do not expose the equivalent
+functionality. Instead, they are unconditionally updating, as if
+beta is one.
The rank-k and rank-2k updates are special cases of
+matrix_product, but as a result of (1), their behavior is
+not consistent with matrix_product.
Therefore, we need to add updating overloads of the rank-k and +rank-2k updates, and change the existing overloads to be +overwriting.
The change to existing overloads is a breaking change and thus +must be finished before C++26.
To simplify wording, we add the new exposition-only concept
+possibly-packed-out-matrix for symmetric and
+Hermitian matrix update algorithms.
beta, while
+std::linalg currently does notEach function in any section whose label begins with “linalg.algs” +generally corresponds to one or more routines or functions in the +original BLAS (Basic Linear Algebra Subroutines). Every computation that +the BLAS can do, a function in the C++ Standard Library should be able +to do.
+One std::linalg user
+reported
+an exception to this rule. The BLAS routine xSYRK
+(SYmmetric Rank-K update) computes C := βC + αAAT,
+but the corresponding std::linalg function
+symmetric_matrix_rank_k_update only computes C := C + αAAT.
+That is, std::linalg currently has no way to express this
+BLAS operation with a general β scaling factor.
This issue applies to all of the symmetric and Hermitian rank-k and
+rank-2k update functions. The following table lists these functions and
+what they compute now. A and
+B denote general matrices,
+C denotes a symmetric or
+Hermitian matrix (depending on the algorithm’s name), the superscript
+T denotes the transpose, the superscript H
+denotes the Hermitian transpose, α denotes a scaling factor, and
+ᾱ denotes the complex
+conjugate of α. Making the
+functions have “updating” overloads that take an input matrix E would permit E = βC, and thus
+make [linalg] able to compute what the BLAS can compute.
| +[linalg] algorithm + | ++What it computes now + | +
|---|---|
+symmetric_matrix_rank_k_update
+ |
++C := C + αAAT + | +
+hermitian_matrix_rank_k_update
+ |
++C := C + αAAH + | +
+symmetric_matrix_rank_2k_update
+ |
++C := C + αABT + αBAT + | +
+hermitian_matrix_rank_2k_update
+ |
++C := C + αABH + ᾱBAH + | +
These functions implement special cases of matrix-matrix products.
+The matrix_product function in std::linalg
+implements the general case of matrix-matrix products. This function
+corresponds to the BLAS’s SGEMM, DGEMM,
+CGEMM, and ZGEMM, which compute C := βC + αAB,
+where α and β are scaling factors. The
+matrix_product function has two kinds of overloads:
overwriting (C = AB) +and
updating (C = E + AB).
The updating overloads handle the general α and β case by
+matrix_product(scaled(alpha, A), B, scaled(beta, C), C).
+The specification explicitly permits the input
+scaled(beta, C) to alias the output C
+([linalg.algs.blas3.gemm] 10: “Remarks:
+C may alias E”). The std::linalg
+library provides overwriting and updating overloads so that it can do
+everything that the BLAS does, just in a more idiomatically C++ way.
+Please see
+P1673R13
+Section 10.3 (“Function argument aliasing and zero scalar
+multipliers”) for a more detailed explanation.
The problem with the current symmetric and Hermitian rank-k and
+rank-2k functions is that they have the same calling syntax as
+the overwriting version of matrix_product, but
+semantics that differ from both the overwriting and the
+updating versions of matrix_product. For example,
hermitian_matrix_rank_k_update(alpha, A, C);
updates C with C + αAAH, but
+matrix_product(scaled(alpha, A), conjugate_transposed(A), C);
overwrites C with αAAH. The current rank-k and rank-2k overloads are not overwriting, so we can’t just fix this problem by introducing an “updating” overload for each function.
+Incidentally, the fact that these functions have “update” in their
+name is not relevant, because that naming choice is original to the
+BLAS. The BLAS calls its corresponding xSYRK,
+xHERK, xSYR2K, and xHER2K
+routines “{Symmetric, Hermitian} rank {one, two} update,” even though
+setting β = 0 makes these
+routines “overwriting” in the sense of std::linalg.
We propose to fix this by making the functions work just like
+matrix_vector_product or matrix_product. This
+entails three changes.
Add a new exposition-only concept
+possibly-packed-out-matrix for constraining output
+parameters of the changed or new symmetric and Hermitian update
+functions.
Add “updating” overloads of the symmetric and Hermitian rank-k +and rank-2k update functions.
+The updating overloads take a new input matrix parameter
+E, analogous to the updating overloads of
+matrix_product, and make C an output parameter
+instead of an in/out parameter. For example,
+symmetric_matrix_rank_k_update(A, E, C, upper_triangle)
+computes C = E + AAT.
Explicitly permit C and E to alias,
+thus permitting the desired case where E is
+scaled(beta, C).
The updating overloads take E as an
+in-matrix, and take C as a
+possibly-packed-out-matrix (instead of a
+possibly-packed-inout-matrix).
E must be accessed as a symmetric or Hermitian
+matrix (depending on the function name) and such accesses must use the
+same triangle as C. (The existing [linalg.general] 4
+wording for symmetric and Hermitian behavior does not cover
+E.)
Change the behavior of the existing symmetric and Hermitian +rank-k and rank-2k overloads to be overwriting instead of updating.
+For example,
+symmetric_matrix_rank_k_update(A, C, upper_triangle) will
+compute C = AAT
+instead of C := C + AAT.
Change C from a
+possibly-packed-inout-matrix to a
+possibly-packed-out-matrix.
Items (2) and (3) are breaking changes to the current Working Draft. +This needs to be so that we can provide the overwriting behavior C := αAAT +or C := αAAH +that the corresponding BLAS routines already provide. Thus, we must +finish this before finalization of C++26.
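The following sketch is not proposed wording; it only illustrates the two calling styles, assuming C++26 <linalg> and <mdspan> together with the overloads proposed above. The storage names and extents are illustrative.

#include <cstddef>
#include <linalg>
#include <mdspan>
#include <vector>

void rank_k_example() {
  constexpr std::size_t n = 4;
  constexpr std::size_t k = 3;
  std::vector<double> A_storage(n * k, 1.0);  // n x k input matrix A
  std::vector<double> C_storage(n * n, 1.0);  // n x n symmetric matrix C
  std::mdspan A(A_storage.data(), n, k);
  std::mdspan C(C_storage.data(), n, n);
  const double beta = 0.5;

  // Existing overload, with the overwriting behavior proposed here:
  // C := A A^T, writing only the upper triangle of C.
  std::linalg::symmetric_matrix_rank_k_update(A, C, std::linalg::upper_triangle);

  // Proposed updating overload; E is scaled(beta, C), which may alias C:
  // C := beta C + A A^T.
  std::linalg::symmetric_matrix_rank_k_update(
      A, std::linalg::scaled(beta, C), C, std::linalg::upper_triangle);
}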
+Both sets of overloads still only write to the specified triangle
+(lower or upper) of the output matrix C. As a result, the
+new updating overloads only read from that triangle of the input matrix
+E. Therefore, even though E may be a different
+matrix than C, the updating overloads do not need an
+additional Triangle t_E parameter for E. The
+symmetric_* functions interpret E as symmetric
+in the same way that they interpret C as symmetric, and the
+hermitian_* functions interpret E as Hermitian
+in the same way that they interpret C as Hermitian.
+Nevertheless, we do need new wording to explain how the functions may
+interpret and access E.
| +[linalg] algorithm + | ++What it computes now + | ++Change (overwriting) + | ++Add (updating) + | +
|---|---|---|---|
+symmetric_matrix_rank_k_update
+ |
++C := C + αAAT + | ++C = αAAT + | ++C = E + αAAT + | +
+hermitian_matrix_rank_k_update
+ |
++C := C + αAAH + | ++C = αAAH + | ++C = E + αAAH + | +
+symmetric_matrix_rank_2k_update
+ |
++C := C + αABT + αBAT + | ++C = αABT + αBAT + | ++C = E + αABT + αBAT + | +
+hermitian_matrix_rank_2k_update
+ |
++C := C + αABH + ᾱBAH + | ++C = αABH + ᾱBAH + | ++C = E + αABH + ᾱBAH + | +
Currently, the rank-1 and rank-2 updates unconditionally update
+and do not take a beta scaling factor.
This behavior deviates from the BLAS Standard, is inconsistent +with the rank-k and rank-2k updates, and introduces a special case in +[linalg]’s design.
We propose making all the rank-1 and rank-2 update functions +consistent with the proposed change to the rank-k and rank-2k updates. +This means both changing the meaning of the current overloads to be +overwriting, and adding new overloads that are updating. This includes +general (nonsymmetric), symmetric, and Hermitian rank-1 update +functions, as well as symmetric and Hermitian rank-2 update +functions.
The exposition-only concept
+possibly-packed-inout-matrix is no longer needed.
+We propose removing it.
The symmetric and Hermitian rank-k and rank-2k update functions have the following rank-1 and rank-2 analogs. In the table below, A denotes a symmetric or Hermitian matrix (depending on the function’s name), x and y denote vectors, and α denotes a scaling factor.
+| +[linalg] algorithm + | ++What it computes now + | +
|---|---|
+symmetric_matrix_rank_1_update
+ |
++A := A + αxxT + | +
+hermitian_matrix_rank_1_update
+ |
++A := A + αxxH + | +
+symmetric_matrix_rank_2_update
+ |
++A := A + αxyT + αyxT + | +
+hermitian_matrix_rank_2_update
+ |
+A := A + αxyH + ᾱyxH + | +
These functions unconditionally update the matrix A. They do not have an overwriting +option. In this, they follow the “general” (not necessarily symmetric or +Hermitian) rank-1 update functions.
+| +[linalg] algorithm + | ++What it computes now + | +
|---|---|
+matrix_rank_1_update
+ |
++A := A + xyT + | +
+matrix_rank_1_update_c
+ |
++A := A + xyH + | +
These six rank-1 and rank-2 update functions map to BLAS routines as +follows.
+| +[linalg] algorithm + | ++Corresponding BLAS routine(s) + | +
|---|---|
+matrix_rank_1_update
+ |
+
+xGER
+ |
+
+matrix_rank_1_update_c
+ |
+
+xGERC
+ |
+
+symmetric_matrix_rank_1_update
+ |
+
+xSYR, xSPR
+ |
+
+hermitian_matrix_rank_1_update
+ |
+
+xHER, xHPR
+ |
+
+symmetric_matrix_rank_2_update
+ |
+
+xSYR2, xSPR2
+ |
+
+hermitian_matrix_rank_2_update
+ |
+
+xHER2, xHPR2
+ |
+
The Reference BLAS and the BLAS Standard (see Chapter 2, pp. 64 - 68)
+differ here. The Reference BLAS and the original 1988 BLAS 2 paper
+specify all of the rank-1 and rank-2 update routines listed above as
+unconditionally updating, and not taking a β scaling factor. However, the
+(2002) BLAS Standard specifies all of these rank-1 and rank-2 update
+functions as taking a β
+scaling factor. We consider the latter to express our design intent. It
+is also consistent with the corresponding rank-k and rank-2k update
+functions in the BLAS, which all take a β scaling factor and thus can do
+either overwriting (with zero β) or updating (with nonzero β). These routines include
+xSYRK, xHERK, xSYR2K, and
+xHER2K. One could also include the general matrix-matrix
+product xGEMM among these, as xGEMM also takes
+a β scaling factor.
Section
+10.3 of P1673R13 explains the three ways that the std::linalg design
+translates Fortran INTENT(INOUT) arguments into a C++
+idiom.
Provide both in-place and not-in-place overloads for triangular +solve and triangular multiply.
“Else, if the BLAS function unconditionally updates (like
+xGER), we retain read-and-write behavior for that
+argument.”
“Else, if the BLAS function uses a scalar beta
+argument to decide whether to read the output argument as well as write
+to it (like xGEMM), we provide two versions: a write-only
+version (as if beta is zero), and a read-and-write version
+(as if beta is nonzero).”
Our design goal was for functions with vector or matrix output to
+imitate std::transform as much as possible. This favors Way
+(3) as the default approach, which turns INTENT(INOUT)
+arguments on the Fortran side into separate input and output parameters
+on the C++ side. Way (2) is really an awkward special case. The BLAS
+Standard effectively eliminates this special case by making the rank-1
+and rank-2 updates work just like the rank-k and rank-2k updates, with a
+β scaling factor. This makes
+it natural to eliminate the Way (2) special case in [linalg] as
+well.
These changes make the exposition-only concept
+possibly-packed-inout-matrix superfluous. We
+propose removing it.
Note that this would not eliminate all uses of the exposition-only
+concept inout-matrix. The in-place triangular
+matrix product functions triangular_matrix_left_product and
+triangular_matrix_right_product, and the in-place
+triangular linear system solve functions
+triangular_matrix_matrix_left_solve and
+triangular_matrix_matrix_right_solve would still use
+inout-matrix.
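For instance (a sketch only; A and C are assumed to be rank-2 mdspan objects with compatible extents), an in-place triangular product both reads and writes its last argument, which is exactly what inout-matrix expresses:

// C := A C, where A is read as upper triangular with an explicitly stored diagonal;
// C is both read and written, so it must satisfy inout-matrix.
std::linalg::triangular_matrix_left_product(
    A, std::linalg::upper_triangle, std::linalg::explicit_diagonal, C);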
| +[linalg] algorithm + | ++What it computes now + | ++Change (overwriting) + | ++Add (updating) + | +
|---|---|---|---|
+matrix_rank_1_update
+ |
++A := A + xyT + | ++A = xyT + | ++A = E + xyT + | +
+matrix_rank_1_update_c
+ |
++A := A + xyH + | ++A = xyH + | ++A = E + xyH + | +
+symmetric_matrix_rank_1_update
+ |
++A := A + αxxT + | ++A = αxxT + | ++A = E + αxxT + | +
+hermitian_matrix_rank_1_update
+ |
++A := A + αxxH + | ++A = αxxH + | ++A = E + αxxH + | +
+symmetric_matrix_rank_2_update
+ |
++A := A + αxyT + αyxT + | ++A = αxyT + αyxT + | ++A = E + αxyT + αyxT + | +
+hermitian_matrix_rank_2_update
+ |
+A := A + αxyH + ᾱyxH + | +A = αxyH + ᾱyxH + | +A = E + αxyH + ᾱyxH + |
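For example (a sketch assuming the proposed overloads; x, y, A, alpha, and beta are assumed to be declared elsewhere, and namespace qualification is omitted as in the paper’s other examples):

// Proposed overwriting behavior of the existing overloads.
matrix_rank_1_update(x, y, A);                                // A := x y^T
symmetric_matrix_rank_1_update(alpha, x, A, upper_triangle);  // A := alpha x x^T

// Proposed updating overloads; E is scaled(beta, A), which may alias A.
matrix_rank_1_update(x, y, scaled(beta, A), A);               // A := beta A + x y^T
symmetric_matrix_rank_1_update(alpha, x, scaled(beta, A), A,
                               upper_triangle);               // A := beta A + alpha x x^T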
alpha for Hermitian matrix rank-1 and rank-k
+updatesFor Hermitian rank-1 and rank-k matrix updates, if users provide a
+scaling factor alpha, it must have zero imaginary part.
+Otherwise, the matrix update will not be Hermitian, because all elements
+on the diagonal of a Hermitian matrix must have zero imaginary part.
+Even though AAH is
+mathematically always Hermitian, if α has nonzero imaginary part, then
+αAAH
+may no longer be a Hermitian matrix. For example, if A is the identity matrix (with ones
+on the diagonal and zeros elsewhere) and α = i (the imaginary unit,
+which is the square root of negative one), then αAAH
+is the diagonal matrix whose diagonal elements are all i, and thus has nonzero imaginary
+part.
The specification of hermitian_matrix_rank_1_update and
+hermitian_matrix_rank_k_update does not currently require
+that alpha have zero imaginary part. We propose fixing this
+by making these update algorithms only use the real part of
+alpha, as in
+real-if-needed(alpha). This solution
+is consistent with our proposed resolution of
+LWG Issue 4136,
+“Specify behavior of [linalg] Hermitian algorithms on diagonal with
+nonzero imaginary part,” where we make Hermitian rank-1 and rank-k
+matrix updates use only the real part of matrices’ diagonals.
We begin with a summary of all the Hermitian matrix BLAS routines and how scaling factors influence their mathematical correctness. Then, we explain how these scaling factor concerns translate into [linalg] function concerns. Finally, we discuss alternative solutions.
+The BLAS’s Hermitian matrix routines take alpha and
+beta scaling factors. The BLAS addresses the resulting
+correctness concerns in different ways, depending on what each routine
+computes. For routines where a nonzero imaginary part could make the
+result incorrect, the routine restricts the scaling factor to have a
+noncomplex number type. Otherwise, the routine takes the scaling factor
+as a complex number type. We discuss all the Hermitian routines
+here.
xHEMM:
+Hermitian matrix-matrix multiplyxHEMM (HErmitian Matrix-matrix Multiply) computes either
+C := αAB + βC
+or C := αBA + βC,
+where A is a Hermitian matrix,
+and neither B nor C need to be Hermitian. The products
+AB and BA thus need not be
+Hermitian, so the scaling factors α and β can have nonzero imaginary parts.
+The BLAS takes them both as complex numbers.
xHEMV:
+HErmitian Matrix-Vector multiplyxHEMV (HErmitian Matrix-Vector multiply) computes y := αAx + βy,
+where A is a Hermitian matrix
+and x and y are vectors. The scaled matrix
+αA does not need to
+be Hermitian. Thus, α and
+β can have nonzero imaginary
+parts. The BLAS takes them both as complex numbers.
xHER:
+HErmitian Rank-1 updatexHER (HErmitian Rank-1 update) differs between the
+Reference BLAS (which computes A := αxxH + A)
+and the BLAS Standard (which computes A := αxxH + βA).
+The matrix A must be
+Hermitian, and the rank-1 matrix xxH is
+always mathematically Hermitian, so both α and β need to have zero imaginary part
+in order for the update to preserve A’s Hermitian property. The BLAS
+takes them both as real (noncomplex) numbers.
xHER2:
+HErmitian Rank-2 updatexHER2 (HErmitian Rank-2 update) differs between the
+Reference BLAS (which computes A := αxyH + ᾱyxH + A,
+where ᾱ denotes the complex
+conjugate of α) and the BLAS
+Standard (which computes A := αxyH + ᾱyxH + βA).
+The matrix A must be
+Hermitian, and the rank-2 matrix αxyH + ᾱyxH
+is always mathematically Hermitian, no matter the value of α. Thus, α can have nonzero imaginary part,
+but β cannot. The BLAS thus
+takes alpha as a complex number, but beta as a
+real (noncomplex) number. (There is likely a typo in the BLAS Standard’s
+description of the Fortran 95 binding. It says that both
+alpha and beta are complex (have type
+COMPLEX(<wp>)), even though in the Fortran 77
+binding, beta is real (<rtype>). The
+BLAS Standard’s description of xHER2K (see below) says that
+alpha is complex but beta is real.
+xHER2 needs to be consistent with xHER2K.)
xHERK:
+HErmitian Rank-K updatexHERK (HErmitian Rank-K update) computes either C := αAAH + βC
+or C := αAHA + βC,
+where C must be Hermitian.
+This is a generalization of xHER and thus both α and β need to have zero imaginary part.
+The BLAS takes them both as real (noncomplex) numbers.
xHER2K:
+HErmitian Rank-2k updatexHER2K (HErmitian Rank-2k update) computes either C := αABH + ᾱBAH + βC
+or C := αAHB + ᾱBHA + βC.
+This is a generalization of xHER2: α can have nonzero imaginary part,
+but β cannot. The BLAS thus
+takes alpha as a complex number, but beta as a
+real (noncomplex) number.
The following table lists, for all the Hermitian matrix update BLAS
+routines, whether the routine restricts alpha and/or
+beta to have zero imaginary part, and whether the routine
+is a generalization of some other routine in the list (N/A, “not
+applicable,” means that it is not).
| +BLAS routine + | +
+Restricts alpha
+ |
+
+Restricts beta
+ |
++Generalizes + | +
|---|---|---|---|
+xHEMM
+ |
++No + | ++No + | ++N/A + | +
+xHER
+ |
++Yes + | ++Yes + | ++N/A + | +
+xHER2
+ |
++No + | ++Yes + | ++N/A + | +
+xHERK
+ |
++Yes + | ++Yes + | +
+xHER
+ |
+
+xHER2K
+ |
++No + | ++Yes + | +
+xHER2
+ |
+
We assume here the changes proposed in previous sections that remove +inout matrix parameters from the rank-1, rank-2, rank-k, and rank-2k +algorithms, and separate these algorithms into overwriting and updating +overloads. This lets us only consider input matrix and vector +parameters.
+The [linalg] library and the BLAS treat scaling factors in different
+ways. First, [linalg] treats the result of scaled just like
+any other matrix or vector parameter. It applies any mathematical
+requirements (like being Hermitian) to the parameter, regardless of
+whether the corresponding argument results from scaled. It
+also does not forbid any input argument from being the result of
+scaled. Second, the BLAS always exposes alpha
+and beta scaling factor parameters separately from the
+matrix or vector parameters to which they are applied. In contrast,
+[linalg] only exposes a separate alpha scaling factor
+(never beta) if it would otherwise be mathematically
+impossible to express an operation that the BLAS can express. For
+example, for matrices and scaling factors that are noncomplex,
+symmetric_matrix_rank_1_update could not express A := A − xxT
+without a separate alpha parameter: expressing it by scaling x instead would require the square root of −1, which is i.
In some cases, [linalg] does not expose a separate scaling factor
+parameter, even when this prevents [linalg] from doing some things that
+the BLAS can do. We give an example below of triangular matrix solves
+with multiple right-hand sides and alpha scaling not equal
+to one, where the matrix has an implicit unit diagonal.
Even though this means that the BLAS can do some things that [linalg] +cannot do, it does not cause [linalg] to violate mathematical +consistency. More importantly, as we show later, [linalg] can be +extended to do whatever the BLAS can do without breaking backwards +compatibility. Thus, we consider the status quo acceptable for +C++26.
+beta is not a concern
The [linalg] library never exposes a scaling factor
+beta. For BLAS routines that perform an update with
+beta times an inout matrix or vector parameter (e.g., βy or βC), [linalg] instead takes
+an input matrix or vector parameter (e.g., E) that can be
+separate from the output matrix or vector (e.g., C). For
+Hermitian BLAS routines where beta needs to have zero
+imaginary part, [linalg] simply requires that E be
+Hermitian – a strictly more general requirement. For example, for the
+new updating overloads of hermitian_matrix_rank_1_update and
+hermitian_matrix_rank_k_update proposed above, [linalg] expresses
+a beta scaling factor by letting users supply
+scaled(beta, C) as the argument for E. The
+wording only requires that E be Hermitian. If
+E is scaled(beta, C), this concerns only the
+product of beta and C. It would be incorrect
+to constrain beta or C separately. For
+example, if β = − i
+and C is the matrix whose
+elements are all i, then C is not Hermitian but βC (and therefore
+scaled(beta, C)) is Hermitian. The same reasoning applies
+for the rank-2 and rank-2k updates.
The above arguments help us restrict our concerns. This section of +our proposal concerns itself with Hermitian matrix update algorithms +where
+the algorithm exposes a separate scaling factor parameter
+alpha, and
alpha needs to have zero imaginary part,
+but
nothing in the wording currently prevents alpha from
+having nonzero imaginary part.
These correspond exactly to the BLAS’s Hermitian matrix update
+routines where the type of alpha is real: xHER
+and xHERK. This strongly suggests solving the problem in
+[linalg] by constraining the type of alpha to be
+noncomplex. However, as we explain in “Alternative solutions” below, it
+is hard to define a “noncomplex number” constraint that works well for
+user-defined number types. Instead, we propose fixing this in a way that
+is consistent with our proposed resolution of
+LWG Issue 4136,
+“Specify behavior of [linalg] Hermitian algorithms on diagonal with
+nonzero imaginary part.” That is, the Hermitian rank-1 and rank-k update
+algorithms will simply use
+real-if-needed(alpha) and ignore any
+nonzero imaginary part of alpha.
We can think of at least four different ways to solve this problem, +and will explain why we did not choose those solutions.
+Constrain alpha by defining a generic “noncomplex
+number type” constraint.
Only constrain alpha not to be
+std::complex<T>; do not try to define a generic
+“noncomplex number” constraint.
Constrain alpha by default using a generic
+“noncomplex number type” constraint, but let users specialize an
+“opt-out trait” to tell the library that their number type is
+noncomplex.
Impose a precondition that the imaginary part of
+alpha is zero.
If we had to pick one of these solutions, we would favor first (4), +then (3), and then (1). We would object most to (2).
alpha via generic “noncomplex” constraint
The BLAS solves this problem by having the Hermitian rank-1 update
+routines xHER and rank-k update routines xHERK
+take the scaling factor α as a
+noncomplex number. This suggests constraining alpha’s type
+to be noncomplex. However, [linalg] accepts user-defined number types,
+including user-defined complex number types. How do we define a
+“noncomplex number type”? If we get it wrong and say that a number type
+is complex when its “imaginary part” is always zero, we end up excluding
+a perfectly valid custom number type.
For number types that have an additive identity (like zero for the
+integers and reals), it’s mathematically correct to treat those as
+“complex numbers” whose imaginary parts are always zero. This is what
+conjugated_accessor does if you give it a number type for
+which ADL cannot find conj. P3050 optimizes
+conjugated for such number types by bypassing
+conjugated_accessor and using default_accessor
+instead, but this is just a code optimization. Therefore, it can afford
+to be conservative: if a number type might be complex, then
+conjugated needs to use conjugated_accessor.
+This is why P3050’s approach doesn’t apply here. If a number type has
+ADL-findable conj, real, and
+imag, then it might be complex. However, if we
+define the opposite of that as “noncomplex,” then we might be preventing
+users from using otherwise reasonable number types.
The following MyRealNumber example always has zero
+imaginary part, but nevertheless has ADL-findable conj,
+real, and imag. Furthermore, it has a
+constructor for which MyRealNumber(1.2, 3.4) is
+well-formed. (This is an unfortunate design choice; making
+precision have class type that is not implicitly
+convertible from double would be wiser, so that users would
+have to type something like
+MyRealNumber(1.2, Precision(42)).) As a result, there is no
+reasonable way to tell at compile time if MyRealNumber
+might represent a complex number.
class MyRealNumber {
+public:
+ explicit MyRealNumber(double initial_value);
+ // precision represents the amount of storage
+ // used by an arbitrary-precision real number object
+ explicit MyRealNumber(double initial_value, int precision);
+
+ MyRealNumber(); // result is the additive identity "zero"
+
+ // ... other members to make MyRealNumber regular ...
+ // ... hidden friend overloaded arithmetic operators ...
+
+ friend MyRealNumber conj(MyRealNumber x) { return x; }
+ friend MyRealNumber real(MyRealNumber x) { return x; }
+ friend MyRealNumber imag(MyRealNumber) { return {}; }
+};
It’s reasonable to write custom noncomplex number types that define
+ADL-findable conj, real, and
+imag. First, users may want to write or use libraries of
+generic numerical algorithms that work for both complex and noncomplex
+number types. P1673 argues that defining conj to be
+type-preserving (unlike std::conj in the C++ Standard
+Library) makes this possible. For example, Appendix A below shows how to
+implement a generic two-norm absolute value (or magnitude, for complex
+numbers) function, using this interface. Second, the Standard Library
+might lead users to write a type like this. std::conj
+accepts arguments of any integer or floating-point type, none of which
+represent complex numbers. The Standard makes the unfortunate choice for
+std::conj of an integer type to return
+std::complex<double>. However, users who try to make
+conj(MyRealNumber) return
+std::complex<MyRealNumber> would find out that
+std::complex<MyRealNumber> does not compile, because
+std::complex<T> requires that T be a
+floating-point type. The least-effort next step would be to make
+conj(MyRealNumber) return MyRealNumber.
We want rank-1 and rank-k Hermitian matrix updates to work with types
+like MyRealNumber, but any reasonable way to constrain the
+type of alpha would exclude MyRealNumber.
alpha not to be std::complex
A variant of this suggestion would be only to constrain
+alpha not to be std::complex<T>, and not
+try to define a generic “noncomplex number” constraint. However, this
+would break generic numerical algorithms by making valid code for the
+non-Standard complex number case invalid code for
+std::complex<T>. We do not want [linalg] to treat
+custom complex number types differently than
+std::complex.
alpha by default, but let users “opt out”
Another option would be to constrain alpha by default
+using the same generic “noncomplex number type” constraint as in Option
+(1), but let users specialize an “opt-out trait” to tell the library
+that their number type is noncomplex. We would add a new “tag” type
+trait is_noncomplex that users can specialize. By default,
+is_noncomplex<T>::value is false for any
+type T. This does not mean that the type is
+complex, just that the user declares their type to be noncomplex. The
+distinction matters, because a noncomplex number type might still
+provide ADL-findable conj, real, and
+imag, as we showed above. Users must take positive action
+to declare their type U as “not a complex number type,” by
+specializing is_noncomplex<U> so that
+is_noncomplex<U>::value is true. If
+users do that, then the library will ignore any ADL-findable functions
+conj, real, and imag (whether or
+not they exist), and will assume that the number type is noncomplex.
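A hypothetical sketch of the opt-out, using the MyRealNumber type above (is_noncomplex is not proposed in this paper, and its name and namespace are illustrative only):

// Not proposed wording; only illustrates the opt-out idea.
namespace std::linalg {
  template<>
  struct is_noncomplex<MyRealNumber> : std::true_type {};
}
// With this specialization, the library would ignore MyRealNumber's
// ADL-findable conj, real, and imag, and treat it as noncomplex.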
Standard Library precedent for this approach is in heterogeneous
+lookup for associative containers (see N3657 for ordered associative
+containers, and P0919 and P1690 for unordered containers). User-defined
+hash functions and key equality comparison functions can tell the
+container to provide heterogeneous comparisons by exposing a
+static constexpr bool is_transparent whose value is
+true. Default behavior does not expose heterogeneous
+comparisons. Thus, users must opt in at compile time to assert something
+about their user-defined types. Another example is
+uses_allocator<T, Alloc>, whose value
+member defaults to false unless T has a nested
+type allocator_type that is convertible from
+Alloc. Standard Library types like tuple use
+uses_allocator to determine if a user-defined type
+T is allocator-aware.
Of the three constraint-based approaches discussed in this proposal,
+we favor this one the most. It still treats types “as they are” and does
+not permit users to claim that a type is complex when it lacks the
+needed operations, but it lets users optimize by giving the Standard
+Library a single bit of compile-time information. By default, any linear
+algebra value type (see [linalg.reqs.val]) that meets the
+maybe_complex concept below would be considered “possibly
+complex.” Types that do not meet this concept would result in
+compilation errors; users would then be able to search documentation or
+web pages to find out that they need to specialize
+is_noncomplex.
template<class T>
+concept maybe_complex =
+  std::semiregular<T> &&
+  requires(T t) {
+    {conj(t)} -> std::same_as<T>;
+    {real(t)} -> std::convertible_to<T>;
+    {imag(t)} -> std::convertible_to<T>;
+  };
P1673 generally avoids approaches based on specializing traits. Its design philosophy favors treating types as they are. Users should not need to do something “extra” with their custom number types to get correct behavior, beyond what they would reasonably need to define to make a custom number type behave like a number.
+We base this principle on our past experiences in generic numerical
+algorithms development. In the 2010’s, one of the authors maintained a
+generic mathematical algorithms library called Trilinos. The Teuchos
+(pronounced “TEFF-os”) package of Trilinos provides a monolithic
+ScalarTraits class template that defines different
+properties of a number type. It combines the features of
+std::numeric_limits with generic complex arithmetic
+operations like conjugate, real, and
+imag. Trilinos’ generic algorithms assume that number types
+are regular and define overloaded +, -,
+*, and /, but use
+ScalarTraits<T>::conjugate,
+ScalarTraits<T>::real, and
+ScalarTraits<T>::imag. As a result, users with a
+custom complex number type had to specialize ScalarTraits
+and provide all these operations. Even if users had imitated
+std::complex’s interface perfectly and provided
+ADL-findable conj, real, and
+imag, users had to do extra work to make Trilinos compile
+and run correctly for their numbers. With P1673, we decided instead that
+users who define a custom complex number type with an interface
+sufficiently like std::complex should get reasonable
+behavior without needing to do anything else.
As a tangent, we would like to comment on the monolithic design of
+Teuchos::ScalarTraits. The monolithic design was partly an
+imitation of std::numeric_limits, and partly a side effect
+of a requirement to support pre-C++11 compilers that did not permit
+partial specialization of function templates. (The typical pre-C++11
+work-around is to define an unspecialized function template that
+dispatches to a possibly specialized class template.) Partial
+specialization of function templates and C++14’s variable templates both
+encourage “breaking up” monolithic traits classes into separate traits.
+Our paper P1370R1 (“Generic numerical algorithm development with(out)
+numeric_limits”) aligns with this trend.
alpha
Another option would be to impose a precondition that
+imag-if-needed(alpha) is zero.
+However, this would be inconsistent with our proposed resolution of
+LWG Issue 4136,
+“Specify behavior of [linalg] Hermitian algorithms on diagonal with
+nonzero imaginary part”. WG21 members have expressed wanting
+fewer preconditions and less undefined behavior in the
+Standard Library.
If users call Hermitian matrix rank-1 or rank-k updates with
+alpha being std::complex<float> or
+std::complex<double>, implementations of [linalg]
+that call an underlying C or Fortran BLAS would have to get the real
+part of alpha anyway, because these BLAS routines only take
+alpha as a real type. Thus, our proposed solution – to
+define the behavior of the update algorithms as using
+real-if-needed(alpha) – would not add
+overhead.
For Hermitian matrix update algorithms where
+the algorithm exposes a separate scaling factor parameter
+alpha, and
alpha needs to have zero imaginary part,
+but
nothing in the wording currently prevents alpha from
+having nonzero imaginary part,
specify that these algorithms use
+real-if-needed(alpha) and ignore any
+nonzero imaginary part of alpha.
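For example (a sketch of the proposed behavior; x and A are assumed to be an mdspan vector and Hermitian matrix of compatible extents, and namespace qualification is omitted):

std::complex<double> alpha{2.0, 3.0};
// Under this proposal the overload uses real-if-needed(alpha), i.e. 2.0,
// so it performs the (overwriting) update A := 2.0 x x^H and ignores the 3.0i part.
hermitian_matrix_rank_1_update(alpha, x, A, upper_triangle);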
We pointed out above that
+hermitian_matrix_vector_product and
+hermitian_matrix_product expect that the (possibly scaled)
+input matrix is Hermitian, while the corresponding BLAS routines
+xHEMV and xHEMM expect that the unscaled input
+matrix is Hermitian and permit the scaling factor alpha to
+have nonzero imaginary part. However, this does not affect the ability
+of these [linalg] algorithms to compute what the BLAS can compute. Users
+who want to supply alpha with nonzero imaginary part should
+not scale the matrix A (as in
+scaled(alpha, A)). Instead, they should scale the input
+vector x, as in the following.
auto alpha = std::complex{0.0, 1.0};
+hermitian_matrix_vector_product(A, upper_triangle, scaled(alpha, x), y);
Therefore, hermitian_matrix_vector_product and
+hermitian_matrix_product do not need extra
+overloads with a scaling factor alpha parameter.
In Chapter 2 of the BLAS Standard, both xHEMV and
+xHEMM take the scaling factors α and β as complex numbers
+(COMPLEX<wp>, where <wp>
+represents the current working precision). The BLAS permits
+xHEMV or xHEMM to be called with
+alpha whose imaginary part is nonzero. The matrix that the
+BLAS assumes to be Hermitian is A, not αA. Even if A is Hermitian, αA might not necessarily be
+Hermitian. For example, if A
+is the identity matrix (diagonal all ones) and α is i, then αA is not Hermitian but
+skew-Hermitian.
The current [linalg] wording requires that the input matrix be
+Hermitian. This excludes replicating BLAS behavior by using
+scaled(alpha, A) (where alpha has nonzero
+imaginary part, and A is any Hermitian matrix) as the input
+matrix. Note that the behavior of this is still otherwise well defined,
+at least after applying the fix proposed in LWG4136 for diagonal
+elements with nonzero imaginary part. It does not violate a
+precondition. Therefore, the Standard has no way to tell the user that
+they did something wrong.
The current wording permits introducing the scaling factor
+alpha through the input vector, even if alpha
+has nonzero imaginary part.
auto alpha = std::complex{0.0, 1.0};
+hermitian_matrix_vector_product(A, upper_triangle, scaled(alpha, x), y);
This is mathematically correct as long as αAx equals Aαx, that is, as
+long as α commutes with the
+elements of A. This issue would only be of concern if those
+multiplications might be noncommutative. Multiplication with
+floating-point numbers, integers, and anything that behaves reasonably
+like a real or complex number is commutative. However, practical number
+types exist that have noncommutative multiplication. Quaternions are one
+example. Another is the ring of square N by N matrices (for some fixed dimension
+N), with matrix multiplication
+using the same definition that linalg::matrix_product uses.
+One way for a user-defined complex number type to have noncommutative
+multiplication would be if its real and imaginary parts each have a
+user-defined number type with noncommutative multiplication, as in the
+user_complex<noncommutative> example below.
template<class T>
+class user_complex {
+public:
+ user_complex(T re, T im) : re_(re), im_(im) {}
+
+ friend T real(user_complex<T> z) { return z.re_; }
+ friend T imag(user_complex<T> z) { return z.im_; }
+ friend user_complex<T> conj(user_complex<T> z) {
+ return {real(z), -imag(z)};
+ }
+
+ // ... other overloaded arithmetic operators ...
+
+ // the usual complex arithmetic definition
+ friend user_complex<T>
+ operator*(user_complex<T> z, user_complex<T> w) {
+ return {
+ real(z) * real(w) - imag(z) * imag(w),
+ real(z) * imag(w) + imag(z) * real(w)
+ };
+ }
+
+private:
+ T re_, im_;
+};
+
+class noncommutative {
+public:
+ explicit noncommutative(double value) : value_(value) {}
+
+ // ... overloaded arithmetic operators ...
+
+ // x * y != y * x here, for example with x=3 and y=5
+ friend auto operator*(noncommutative x, noncommutative y) {
+ return x + 2.0 * y.value_;
+ }
+
+private:
+ double value_;
+};
+
+auto alpha = user_complex<noncommutative>{
+ noncommutative{3.0}, noncommutative{4.0}
+};
+hermitian_matrix_vector_product(A, upper_triangle, scaled(alpha, x), y);
The [linalg] library was designed to support element types with
+noncommutative multiplication. On the other hand, generally, if we speak
+of Hermitian matrices or even of inner products (which are used to
+define Hermitian matrices), we’re working in a vector space. This means
+that multiplication of the matrix’s elements is commutative. Thus, we
+think it is not so onerous to restrict use of alpha with
+nonzero imaginary part in this case.
Many users may not like the status quo of needing to scale
+x instead of A. First, it differs from the
+BLAS, which puts alpha before A in its
+xHEMV and xHEMM function arguments. Second,
+users would get no compile-time feedback and likely no run-time feedback
+if they scale A instead of x. Their code would
+compile and produce correct results for almost all matrix-vector or
+matrix-matrix products, except for the Hermitian case, and
+only when the scaling factor has a nonzero imaginary part.
+However, we still think the status quo is the least bad choice. We
+explain why by discussing the following alternatives.
Treat scaled(alpha, A) as a special case: expect
+A to be Hermitian and permit alpha to have
+nonzero imaginary part
Forbid scaled(alpha, A) at compile time, so that
+users must scale x instead
Add overloads that take alpha, analogous to the
+rank-1 and rank-k update functions
The first choice is mathematically incorrect, as we will explain +below. The second is not incorrect, but could only catch some errors. +The third is likewise not incorrect, but would add a lot of overloads +for an uncommon use case, and would still not prevent users from scaling +the matrix in mathematically incorrect ways.
+“Treat scaled(alpha, A) as a special case” actually
+means three special cases, given some nested accessor type
+Acc and a scaling factor alpha of type
+Scalar.
scaled(alpha, A), whose accessor type is
+scaled_accessor<Scalar, Acc>
conjugated(scaled(alpha, A)), whose accessor type is
+conjugated_accessor<scaled_accessor<Scalar, Acc>>
scaled(alpha, conjugated(A)), whose accessor type is
+scaled_accessor<Scalar, conjugated_accessor<Acc>>
One could replace conjugated with
+conjugate_transposed (which we expect to be more common in
+practice) without changing the accessor type.
This approach violates the fundamental [linalg] principle that “…
+each mdspan parameter of a function behaves as itself and
+is not otherwise ‘modified’ by other parameters” (Section 10.2.5,
+P1673R13). The behavior of [linalg] is agnostic of specific accessor
+types, even though implementations likely have optimizations for
+specific accessor types. [linalg] takes this approach for consistency,
+even where it results in different behavior from the BLAS (see Section
+10.5.2 of P1673R13). The application of this principle here is “the
+input parameter A is always Hermitian.” In this case, the
+consistency matters for mathematical correctness. What if
+scaled(alpha, A) is Hermitian, but A itself is
+not? An example is α = − i and A is the 2 x 2 matrix whose elements
+are all i. If we treat
+scaled_accessor as a special case, then
+hermitian_matrix_vector_product will compute different
+results.
Another problem with this approach is that users might define their
+own accessor types with the effect of scaled_accessor, or
+combine existing nested accessor types in hard-to-detect ways (like a
+long nesting of conjugated_accessor with a
+scaled_accessor inside). The [linalg] library has no way to
+detect all possible ways that the matrix might be scaled.
scaled_accessor would not solve the problem
Hermitian matrix-matrix and matrix-vector product functions could instead forbid scaling the matrix at compile time. This means excluding, at compile time, the three accessor type cases listed in the previous option.
+scaled_accessor<Scalar, Acc>
conjugated_accessor<scaled_accessor<Scalar, Acc>>
scaled_accessor<Scalar, conjugated_accessor<Acc>>
Doing this would certainly encourage correct behavior for the most +common cases. However, as mentioned above, users are permitted to define +their own accessor types, and to combine existing nested accessors in +arbitrary ways. The [linalg] library cannot detect all possible ways +that an arbitrary, possibly user-defined accessor might scale the +matrix. Furthermore, scaling the matrix might still be mathematically +correct. A scaling factor with nonzero imaginary part might even +make the matrix Hermitian. Applying i as a scaling factor twice would +give a perfectly valid scaling factor of − 1.
+alpha
+overloads would make the problem worse
One could imagine adding overloads that take alpha,
+analogous to the rank-1 and rank-k update overloads that take
+alpha. Users could then write
hermitian_matrix_vector_product(alpha, A, upper_triangle, x, y);
instead of
+hermitian_matrix_vector_product(A, upper_triangle, scaled(alpha, x), y);
We do not support this approach. First, users can already introduce a
+scaling factor through the input vector parameter, so adding
+alpha overloads would not add much to what the existing
+overloads can accomplish. Contrast this with the rank-1 and rank-k
+Hermitian update functions, where not having alpha
+overloads would prevent simple cases, like C := C − 2xxH
+with a user-defined complex number type whose real and imaginary parts
+are both integers. Second, alpha overloads would not
+prevent users from also supplying scaled(gamma, A)
+as the matrix for some other scaling factor gamma. This
+would make the problem worse, because users would need to reason about
+the combination of two ways that the matrix could be scaled.
In BLAS, triangular matrix-vector and matrix-matrix products
+apply alpha scaling to the implicit unit diagonal. In
+[linalg], the scaling factor alpha is not applied to the
+implicit unit diagonal. This is because the library does not interpret
+scaled(alpha, A) differently than any other
+mdspan.
Users of triangular matrix-vector products can recover BLAS +functionality by scaling the input vector instead of the input matrix, +so this only matters for triangular matrix-matrix products.
All calls of the BLAS’s triangular matrix-matrix product routine
+xTRMM in LAPACK (other than in testing routines) use
+alpha equal to one or negative one, and never combine alpha different from one with the implicit unit diagonal option.
Straightforward approaches for fixing this issue would not break +backwards compatibility.
Therefore, we do not consider fixing this a high-priority issue, +and we do not propose a fix for it in this paper.
The triangular_matrix_vector_product and
+triangular_matrix_product algorithms have an
+implicit_unit_diagonal option. This makes the algorithm not
+access the diagonal of the matrix, and compute as if the diagonal were
+all ones. The option corresponds to the BLAS’s “Unit” flag. BLAS
+routines that take both a “Unit” flag and an alpha scaling
+factor apply “Unit” before scaling by alpha, so
+that the matrix is treated as if it has a diagonal of all
+alpha values. In contrast, [linalg] follows the general
+principle that scaled(alpha, A) should be treated like any
+other kind of mdspan. As a result, algorithms interpret
+implicit_unit_diagonal as applied to the matrix
+after scaling by alpha, so that the matrix still
+has a diagonal of all ones.
The triangular solve algorithms in std::linalg are not affected,
+because their BLAS analogs either do not take an alpha
+argument (as with xTRSV), or the alpha
+argument does not affect the triangular matrix (with xTRSM,
+alpha affects the right-hand sides B, not the
+triangular matrix A).
This issue only reduces functionality of
+triangular_matrix_product. Users of
+triangular_matrix_vector_product who wish to replicate the
+original BLAS functionality can scale the input vector (by supplying
+scaled(alpha, x) instead of x as the input
+argument) instead of the triangular matrix.
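For example (a sketch; A, x, y, and alpha are assumed to be declared elsewhere, and namespace qualification is omitted):

// Computes y := (strictly lower triangle of A, plus an implicit unit diagonal) * (alpha x),
// which equals alpha (A + I) x, the same result the BLAS produces by applying
// its scaling factor after the implicit unit diagonal.
triangular_matrix_vector_product(A, lower_triangle, implicit_unit_diagonal,
                                 scaled(alpha, x), y);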
The following example computes A := 2AB where +A is a lower triangular +matrix, but it makes the diagonal of A all ones on the input (right-hand) +side.
triangular_matrix_product(scaled(2.0, A), lower_triangle, implicit_unit_diagonal, B, A);
Contrast with the analogous BLAS routine DTRMM, which
+has the effect of making the diagonal elements all 2.0.
dtrmm('Left side', 'Lower triangular', 'No transpose', 'Unit diagonal', m, n, 2.0, A, lda, B, ldb)
If we want to use [linalg] to express what the BLAS call expresses, we need to perform a separate scaling.
+triangular_matrix_product(A, lower_triangle, implicit_unit_diagonal, B, A);
+scale(2.0, A);
This is counterintuitive, and may also affect performance.
+Performance of scale is typically bound by memory bandwidth
+and/or latency, but if the work done by scale could be
+fused with the work done by the triangular_matrix_product,
+then scale’s memory operations could be “hidden” in the
+cost of the matrix product.
xTRMM with the implicit unit diagonal option and
+alpha not equal to one
How much might users care about this missing [linalg] feature? P1673R13 explains that the BLAS was codesigned with LAPACK and that every reference BLAS routine is used by some LAPACK routine. “The BLAS does not aim to provide a complete set of mathematical operations. Every function in the BLAS exists because some LINPACK or LAPACK algorithm needs it” (Section 10.6.1). Therefore, to judge the urgency of adding new functionality to [linalg], we can ask whether the functionality would be needed by a C++ re-implementation of LAPACK. We think not much, because the highest-priority target audience of the BLAS is LAPACK developers, and LAPACK routines (other than testing routines) never call xTRMM with both alpha different from one and the implicit unit diagonal option.
+We survey calls to xTRMM in the latest version of LAPACK
+as of the publication date of R1 of this proposal, LAPACK 3.12.0. It
+suffices to survey DTRMM, the double-precision real case,
+since for all the routines of interest, the complex case follows the
+same pattern. (We did survey ZTRMM, the double-precision
+complex case, just in case.) LAPACK has 24 routines that call
+DTRMM directly. They fall into five categories.
Test routines: DCHK3, DCHKE,
+DLARHS
Routines relating to QR factorization or using the result of a QR
+factorization (especially with block Householder reflectors):
+DGELQT3, DLARFB, DGEQRT3,
+DLARFB_GETT, DLARZB,
+DORM22
Routines relating to computing an inverse of a triangular matrix
+or of a matrix that has been factored into triangular matrices:
+DLAUUM, DTRITRI, DTFTRI,
+DPFTRI
Routines relating to solving eigenvalue (or generalized
+eigenvalue) problems: DLAHR2, DSYGST,
+DGEHRD, DSYGV, DSYGV_2STAGE,
+DSYGVD, DSYGVX (note that DLAQR5
+depends on DTRMM via EXTERNAL declaration, but
+doesn’t actually call it)
Routines relating to symmetric indefinite factorizations:
+DSYT01_AA, DSYTRI2X,
+DSYTRI_3X
The only routines that call DTRMM with
+alpha equal to anything other than one or negative one are
+the testing routines. Some calls in DGELQT3 and
+DLARFB_GETT use negative one, but these calls never specify
+an implicit unit diagonal (they use the explicit diagonal option). The
+only routine that might possibly call DTRMM with both
+negative one as alpha and the implicit unit diagonal is
+DTFTRI. (This routine “computes the inverse of a triangular
+matrix A stored in RFP [Rectangular Full Packed] format.” RFP format was
+introduced to LAPACK in the late 2000’s, well after the BLAS Standard
+was published. See
+LAPACK
+Working Note 199, which was published in 2008.) DTFTRI
+passes its caller’s diag argument (which specifies either
+implicit unit diagonal or explicit diagonal) to DTRMM. The
+only two LAPACK routines that call DTFTRI are
+DERRRFP (a testing routine) and DPFTRI.
+DPFTRI only ever calls DTFTRI with
+diag not specifying the implicit unit diagonal
+option. Therefore, LAPACK never needs both alpha not equal
+to one and the implicit unit diagonal option, so adding the ability to
+“scale the implicit diagonal” in [linalg] is a low-priority feature.
We can think of two ways to fix this issue. First, we could add an
+alpha scaling parameter, analogous to the symmetric and
+Hermitian rank-1 and rank-k update functions. Second, we could add a new
+kind of Diagonal template parameter type that expresses a
+“diagonal value.” For example, implicit_diagonal_t{alpha}
+(or a function form, implicit_diagonal(alpha)) would tell
+the algorithm not to access the diagonal elements, but instead to assume
+that their value is alpha. Both of these solutions would
+let users specify the diagonal’s scaling factor separately from the
+scaling factor for the rest of the matrix. Those two scaling factors
+could differ, which is new functionality not offered by the BLAS. More
+importantly, both of these solutions could be added later, after C++26,
+without breaking backwards compatibility.
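A hypothetical sketch of the second approach (implicit_diagonal is not proposed in this paper; its name and spelling are illustrative, and A, B, C, and alpha are assumed to be declared elsewhere):

// Not proposed wording.  The hypothetical implicit_diagonal(alpha) option would tell
// the algorithm not to access A's diagonal and to assume each diagonal element is alpha,
// so a single call could express DTRMM's combination of "Unit diagonal" and alpha != 1.
triangular_matrix_product(scaled(alpha, A), lower_triangle,
                          implicit_diagonal(alpha), B, C);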
In BLAS, triangular solves with possibly multiple right-hand
+sides (xTRSM) apply alpha scaling to the
+implicit unit diagonal. In [linalg], the scaling factor
+alpha is not applied to the implicit unit diagonal. This is
+because the library does not interpret scaled(alpha, A)
+differently than any other mdspan.
Users of triangular solves would need a separate
+scale call to recover BLAS functionality.
LAPACK sometimes calls xTRSM with alpha
+not equal to one.
Straightforward approaches for fixing this issue would not break +backwards compatibility.
Therefore, we do not consider fixing this a high-priority issue, +and we do not propose a fix for it in this paper.
Triangular solves have a similar issue to the one explained in the
+previous section. The BLAS routine xTRSM applies
+alpha “after” the implicit unit diagonal, while std::linalg
+applies alpha “before.” (xTRSV does not take
+an alpha scaling factor.) As a result, the BLAS solves with
+a different matrix than std::linalg.
In mathematical terms, xTRSM solves the equation α(A+I)X = B
+for X, where A is the user’s input matrix
+(without implicit unit diagonal) and I is the identity matrix (with ones
+on the diagonal and zeros everywhere else).
+triangular_matrix_matrix_left_solve solves the equation
+(αA+I)Y = B
+for Y. The two results X and Y are not equal in general.
Users could work around this problem by first scaling the matrix
+A by α, and then solving for Y. In the common case where the
+“other triangle” of A holds
+another triangular matrix, users could not call
+scale(alpha, A). They would instead need to iterate over
+the elements of A manually.
+Users might also need to “unscale” the matrix after the solve. Another
+option would be to copy the matrix A before scaling.
for (size_t i = 0; i < A.extent(0); ++i) {
+  for (size_t j = 0; j < i; ++j) {  // strictly lower triangle: the elements the solve reads
+    A[i, j] *= alpha;
+  }
+}
+triangular_matrix_matrix_left_solve(A, lower_triangle, implicit_unit_diagonal, B, Y);
+for (size_t i = 0; i < A.extent(0); ++i) {
+  for (size_t j = 0; j < i; ++j) {  // undo the scaling afterwards
+    A[i, j] /= alpha;
+  }
+}
Users cannot solve this problem by scaling B (either with
+scaled(1.0 / alpha, B) or with
+scale(1.0 / alpha, B)). Transforming X into Y or vice versa is mathematically
+nontrivial in general, and may introduce new failure conditions. This
+issue occurs with both the in-place and out-of-place triangular
+solves.
The common case in LAPACK is calling xTRSM with
+alpha equal to one, but other values of alpha
+occur. For example, xTRTRI calls xTRSM with
+alpha equal to − 1. Thus,
+we cannot dismiss this issue, as we could with xTRMM.
As with triangular matrix products above, we can think of two ways to
+fix this issue. First, we could add an alpha scaling
+parameter, analogous to the symmetric and Hermitian rank-1 and rank-k
+update functions. Second, we could add a new kind of
+Diagonal template parameter type that expresses a “diagonal
+value.” For example, implicit_diagonal_t{alpha} (or a
+function form, implicit_diagonal(alpha)) would tell the
+algorithm not to access the diagonal elements, but instead to assume
+that their value is alpha. Both of these solutions would
+let users specify the diagonal’s scaling factor separately from the
+scaling factor for the rest of the matrix. Those two scaling factors
+could differ, which is new functionality not offered by the BLAS. More
+importantly, both of these solutions could be added later, after C++26,
+without breaking backwards compatibility.
We currently have two other std::linalg fix papers in
+review.
P3222: Fix C++26 by adding transposed special cases
+for P2642 layouts (forwarded by LEWG to LWG on 2024-08-27 pending
+electronic poll results)
P3050: “Fix C++26 by optimizing linalg::conjugated
+for noncomplex value types” (forwarded by LEWG to LWG on 2024-09-03
+pending electronic poll results)
LEWG was aware of these two papers and this pending paper P3371 in
+its 2024-09-03 review of P3050R2. All three of these papers increment
+the value of the __cpp_lib_linalg macro. While this
+technically causes a conflict between the papers, advice from LEWG on
+2024-09-03 was not to introduce special wording changes to avoid this
+conflict.
We also have two outstanding LWG issues.
+LWG4136 +specifies the behavior of Hermitian algorithms on diagonal matrix +elements with nonzero imaginary part. (As the BLAS Standard specifies +and the Reference BLAS implements, the Hermitian algorithms do not +access the imaginary parts of diagonal elements, and assume they are +zero.) In our view, P3371 does not conflict with LWG4136.
LWG4137, +“Fix Mandates, Preconditions, and Complexity elements of [linalg] +algorithms,” affects several sections touched by this proposal, +including [linalg.algs.blas3.rankk] and [linalg.algs.blas3.rank2k]. We +consider P3371 rebased atop the wording changes proposed by LWG4137. +While the wording changes may conflict in a formal (“diff”) sense, it is +our view that they do not conflict in a mathematical or specification +sense.
The following function overload sets need changing.
+matrix_rank_1_update
+matrix_rank_1_update_c
+symmetric_matrix_rank_1_update
+hermitian_matrix_rank_1_update
+symmetric_matrix_rank_2_update
+hermitian_matrix_rank_2_update
+symmetric_matrix_rank_k_update
+hermitian_matrix_rank_k_update
+symmetric_matrix_rank_2k_update
+hermitian_matrix_rank_2k_update
As of 2024/10/04, Pull request 293 in the reference std::linalg implementation implements changes to the following functions, and adds tests to ensure test coverage of the new overloads.
+matrix_rank_1_update
+matrix_rank_1_update_c
+symmetric_matrix_rank_1_update
+hermitian_matrix_rank_1_update
Many thanks (with permission) to Raffaele Solcà (CSCS Swiss National
+Supercomputing Centre, raffaele.solca@cscs.ch) for pointing
+out some of the issues fixed by this paper, as well as the issues
+leading to LWG4137.
++Text in blockquotes is not proposed wording, but rather instructions +for generating proposed wording. The � character is used to denote a +placeholder section number which the editor shall determine.
+In [version.syn], for the following definition,
+
#define __cpp_lib_linalg YYYYMML // also in <linalg>
+adjust the placeholder value YYYYMML as needed so as to
+denote this proposal’s date of adoption.
++Add the following lines to the header
+<linalg> synopsis [linalg.syn], just after the declaration of the exposition-only concept inout-matrix and before the declaration of the exposition-only concept possibly-packed-inout-matrix. [Editorial Note: This addition is not shown in green, because the authors could not convince Markdown to format the code correctly. – end note]
template<class T>
+ concept possibly-packed-out-matrix = see below; // exposition only++Then, remove the declaration of the exposition-only concept +
+possibly-packed-inout-matrixfrom the header +<linalg>synopsis [linalg.syn].
++Then, add the following lines to +[linalg.helpers.concepts], just after the definition of +the exposition-only variable template +
+is-layout_blas_packedand just before the +definition of the exposition-only concept +possibly-packed-inout-matrix. [Editorial +Note: This addition is not shown in green, becuase the authors could +not convince Markdown to format the code correctly. – end +note]
template<class T>
+ concept possibly-packed-out-matrix =
+ is-mdspan<T> && T::rank() == 2 &&
+ is_assignable_v<typename T::reference, typename T::element_type> &&
+ (T::is_always_unique() || is-layout-blas-packed<typename T::layout_type>);++Then, remove the definition of the exposition-only concept +
+possibly-packed-inout-matrixfrom +[linalg.helpers.concepts].
++Then, in [linalg.helpers.concepts], change paragraph 3 to read as +follows (new content “or +
+possibly-packed-out-matrix” in green; removed +content “orpossibly-packed-inout-matrix” in +red).
Unless explicitly permitted, any inout-vector,
+inout-matrix, inout-object,
+out-vector, out-matrix,
+out-object, or
+possibly-packed-out-matrix, or
+
+parameter of a function in [linalg] shall not overlap any other
+possibly-packed-inout-matrixmdspan parameter of the function.
++In the header
+<linalg>synopsis +[linalg.syn], replace all the declarations of all the +matrix_rank_1_update,matrix_rank_1_update_c, +symmetric_matrix_rank_1_update, and +hermitian_matrix_rank_1_updateoverloads to read as +follows. [Editorial Note: There are two changes here. First, the +existing overloads become “overwriting” overloads. Second, new +“updating” overloads are added.Changes do not use red or green highlighting, becuase the authors +could not convince Markdown to format the code correctly. – end +note]
+
// [linalg.algs.blas2.rank1], nonsymmetric rank-1 matrix update
+
+ // overwriting nonsymmetric rank-1 matrix update
+ template<in-vector InVec1, in-vector InVec2, out-matrix OutMat>
+ void matrix_rank_1_update(InVec1 x, InVec2 y, OutMat A);
+ template<class ExecutionPolicy, in-vector InVec1, in-vector InVec2, out-matrix OutMat>
+ void matrix_rank_1_update(ExecutionPolicy&& exec,
+ InVec1 x, InVec2 y, OutMat A);
+
+ template<in-vector InVec1, in-vector InVec2, out-matrix OutMat>
+ void matrix_rank_1_update_c(InVec1 x, InVec2 y, OutMat A);
+ template<class ExecutionPolicy, in-vector InVec1, in-vector InVec2, out-matrix OutMat>
+ void matrix_rank_1_update_c(ExecutionPolicy&& exec,
+ InVec1 x, InVec2 y, OutMat A);
+
+ // updating nonsymmetric rank-1 matrix update
+ template<in-vector InVec1, in-vector InVec2, in-matrix InMat, out-matrix OutMat>
+ void matrix_rank_1_update(InVec1 x, InVec2 y, InMat E, OutMat A);
+  template<class ExecutionPolicy, in-vector InVec1, in-vector InVec2, in-matrix InMat, out-matrix OutMat>
+ void matrix_rank_1_update(ExecutionPolicy&& exec,
+ InVec1 x, InVec2 y, InMat E, OutMat A);
+
+ template<in-vector InVec1, in-vector InVec2, in-matrix InMat, out-matrix OutMat>
+ void matrix_rank_1_update_c(InVec1 x, InVec2 y, InMat E, OutMat A);
+ template<class ExecutionPolicy, in-vector InVec1, in-vector InVec2, in-matrix InMat, out-matrix OutMat>
+ void matrix_rank_1_update_c(ExecutionPolicy&& exec,
+ InVec1 x, InVec2 y, InMat E, OutMat A);
+
+ // [linalg.algs.blas2.symherrank1], symmetric or Hermitian rank-1 matrix update
+
+ // overwriting symmetric rank-1 matrix update
+ template<class Scalar, in-vector InVec, possibly-packed-out-matrix OutMat, class Triangle>
+ void symmetric_matrix_rank_1_update(Scalar alpha, InVec x, OutMat A, Triangle t);
+ template<class ExecutionPolicy,
+ class Scalar, in-vector InVec, possibly-packed-out-matrix OutMat, class Triangle>
+ void symmetric_matrix_rank_1_update(ExecutionPolicy&& exec,
+ Scalar alpha, InVec x, OutMat A, Triangle t);
+ template<in-vector InVec, possibly-packed-out-matrix OutMat, class Triangle>
+ void symmetric_matrix_rank_1_update(InVec x, OutMat A, Triangle t);
+ template<class ExecutionPolicy,
+ in-vector InVec, possibly-packed-out-matrix OutMat, class Triangle>
+ void symmetric_matrix_rank_1_update(ExecutionPolicy&& exec,
+ InVec x, OutMat A, Triangle t);
+
+ // updating symmetric rank-1 matrix update
+ template<class Scalar, in-vector InVec, in-matrix InMat, possibly-packed-out-matrix OutMat, class Triangle>
+ void symmetric_matrix_rank_1_update(Scalar alpha, InVec x, InMat E, OutMat A, Triangle t);
+ template<class ExecutionPolicy,
+ class Scalar, in-vector InVec, in-matrix InMat, possibly-packed-out-matrix OutMat, class Triangle>
+ void symmetric_matrix_rank_1_update(ExecutionPolicy&& exec,
+ Scalar alpha, InVec x, InMat E, OutMat A, Triangle t);
+ template<in-vector InVec, in-matrix InMat, possibly-packed-out-matrix OutMat, class Triangle>
+ void symmetric_matrix_rank_1_update(InVec x, InMat E, OutMat A, Triangle t);
+ template<class ExecutionPolicy,
+ in-vector InVec, in-matrix InMat, possibly-packed-out-matrix OutMat, class Triangle>
+ void symmetric_matrix_rank_1_update(ExecutionPolicy&& exec,
+ InVec x, InMat E, OutMat A, Triangle t);
+
+ // overwriting Hermitian rank-1 matrix update
+ template<class Scalar, in-vector InVec, possibly-packed-out-matrix OutMat, class Triangle>
+ void hermitian_matrix_rank_1_update(Scalar alpha, InVec x, OutMat A, Triangle t);
+ template<class ExecutionPolicy,
+ class Scalar, in-vector InVec, possibly-packed-out-matrix OutMat, class Triangle>
+ void hermitian_matrix_rank_1_update(ExecutionPolicy&& exec,
+ Scalar alpha, InVec x, OutMat A, Triangle t);
+ template<in-vector InVec, possibly-packed-out-matrix OutMat, class Triangle>
+ void hermitian_matrix_rank_1_update(InVec x, OutMat A, Triangle t);
+ template<class ExecutionPolicy,
+ in-vector InVec, possibly-packed-out-matrix OutMat, class Triangle>
+ void hermitian_matrix_rank_1_update(ExecutionPolicy&& exec,
+ InVec x, OutMat A, Triangle t);
+
+ // updating Hermitian rank-1 matrix update
+ template<class Scalar, in-vector InVec, in-matrix InMat, possibly-packed-out-matrix OutMat, class Triangle>
+ void hermitian_matrix_rank_1_update(Scalar alpha, InVec x, InMat E, OutMat A, Triangle t);
+ template<class ExecutionPolicy,
+ class Scalar, in-vector InVec, in-matrix InMat, possibly-packed-out-matrix OutMat, class Triangle>
+ void hermitian_matrix_rank_1_update(ExecutionPolicy&& exec,
+ Scalar alpha, InVec x, InMat E, OutMat A, Triangle t);
+ template<in-vector InVec, in-matrix InMat, possibly-packed-out-matrix OutMat, class Triangle>
+ void hermitian_matrix_rank_1_update(InVec x, InMat E, OutMat A, Triangle t);
+ template<class ExecutionPolicy,
+ in-vector InVec, in-matrix InMat, possibly-packed-out-matrix OutMat, class Triangle>
+ void hermitian_matrix_rank_1_update(ExecutionPolicy&& exec,
+ InVec x, InMat E, OutMat A, Triangle t);++In the header
+<linalg> synopsis [linalg.syn], replace the declarations of all the symmetric_matrix_rank_2_update and hermitian_matrix_rank_2_update overloads to read as follows. [Editorial Note: These additions are not shown in green, because the authors could not convince Markdown to format the code correctly. – end note]
// [linalg.algs.blas2.rank2], symmetric and Hermitian rank-2 matrix updates
+
+ // overwriting symmetric rank-2 matrix update
+ template<in-vector InVec1, in-vector InVec2,
+ possibly-packed-out-matrix OutMat, class Triangle>
+ void symmetric_matrix_rank_2_update(InVec1 x, InVec2 y, OutMat A, Triangle t);
+ template<class ExecutionPolicy, in-vector InVec1, in-vector InVec2,
+ possibly-packed-out-matrix OutMat, class Triangle>
+ void symmetric_matrix_rank_2_update(ExecutionPolicy&& exec,
+ InVec1 x, InVec2 y, OutMat A, Triangle t);
+
+ // updating symmetric rank-2 matrix update
+ template<in-vector InVec1, in-vector InVec2,
+ in-matrix InMat,
+ possibly-packed-out-matrix OutMat, class Triangle>
+ void symmetric_matrix_rank_2_update(InVec1 x, InVec2 y, InMat E, OutMat A, Triangle t);
+ template<class ExecutionPolicy, in-vector InVec1, in-vector InVec2,
+ in-matrix InMat,
+ possibly-packed-out-matrix OutMat, class Triangle>
+ void symmetric_matrix_rank_2_update(ExecutionPolicy&& exec,
+ InVec1 x, InVec2 y, InMat E, OutMat A, Triangle t);
+
+ // overwriting Hermitian rank-2 matrix update
+ template<in-vector InVec1, in-vector InVec2,
+ possibly-packed-out-matrix OutMat, class Triangle>
+ void hermitian_matrix_rank_2_update(InVec1 x, InVec2 y, OutMat A, Triangle t);
+ template<class ExecutionPolicy, in-vector InVec1, in-vector InVec2,
+ possibly-packed-out-matrix OutMat, class Triangle>
+ void hermitian_matrix_rank_2_update(ExecutionPolicy&& exec,
+ InVec1 x, InVec2 y, OutMat A, Triangle t);
+
+ // updating Hermitian rank-2 matrix update
+ template<in-vector InVec1, in-vector InVec2,
+ in-matrix InMat,
+ possibly-packed-out-matrix OutMat, class Triangle>
+ void hermitian_matrix_rank_2_update(InVec1 x, InVec2 y, InMat E, OutMat A, Triangle t);
+ template<class ExecutionPolicy, in-vector InVec1, in-vector InVec2,
+ in-matrix InMat,
+ possibly-packed-out-matrix OutMat, class Triangle>
+ void hermitian_matrix_rank_2_update(ExecutionPolicy&& exec,
+ InVec1 x, InVec2 y, InMat E, OutMat A, Triangle t);++In the header
+<linalg> synopsis [linalg.syn], replace the declarations of all the symmetric_matrix_rank_k_update and hermitian_matrix_rank_k_update overloads to read as follows. [Editorial Note: These additions are not shown in green, because the authors could not convince Markdown to format the code correctly. – end note]
// [linalg.algs.blas3.rankk], rank-k update of a symmetric or Hermitian matrix
+
+ // overwriting symmetric rank-k matrix update
+ template<class Scalar,
+ in-matrix InMat,
+ possibly-packed-out-matrix OutMat,
+ class Triangle>
+ void symmetric_matrix_rank_k_update(
+ Scalar alpha, InMat A, OutMat C, Triangle t);
+ template<class ExecutionPolicy, class Scalar,
+ in-matrix InMat,
+ possibly-packed-out-matrix OutMat,
+ class Triangle>
+ void symmetric_matrix_rank_k_update(
+ ExecutionPolicy&& exec, Scalar alpha, InMat A, OutMat C, Triangle t);
+ template<in-matrix InMat,
+ possibly-packed-out-matrix OutMat,
+ class Triangle>
+ void symmetric_matrix_rank_k_update(
+ InMat A, OutMat C, Triangle t);
+ template<class ExecutionPolicy,
+ in-matrix InMat,
+ possibly-packed-out-matrix OutMat,
+ class Triangle>
+ void symmetric_matrix_rank_k_update(
+ ExecutionPolicy&& exec, InMat A, OutMat C, Triangle t);
+
+ // updating symmetric rank-k matrix update
+ template<class Scalar,
+ in-matrix InMat1,
+ in-matrix InMat2,
+ possibly-packed-out-matrix OutMat,
+ class Triangle>
+ void symmetric_matrix_rank_k_update(
+ Scalar alpha,
+ InMat1 A, InMat2 E, OutMat C, Triangle t);
+ template<class ExecutionPolicy, class Scalar,
+ in-matrix InMat1,
+ in-matrix InMat2,
+ possibly-packed-out-matrix OutMat,
+ class Triangle>
+ void symmetric_matrix_rank_k_update(
+ ExecutionPolicy&& exec, Scalar alpha,
+ InMat1 A, InMat2 E, OutMat C, Triangle t);
+ template<in-matrix InMat1,
+ in-matrix InMat2,
+ possibly-packed-out-matrix OutMat,
+ class Triangle>
+ void symmetric_matrix_rank_k_update(
+ InMat1 A, InMat2 E, OutMat C, Triangle t);
+ template<class ExecutionPolicy,
+ in-matrix InMat1,
+ in-matrix InMat2,
+ possibly-packed-out-matrix OutMat,
+ class Triangle>
+ void symmetric_matrix_rank_k_update(
+ ExecutionPolicy&& exec,
+ InMat1 A, InMat2 E, OutMat C, Triangle t);
+
+  // overwriting Hermitian rank-k matrix update
+ template<class Scalar,
+ in-matrix InMat,
+ possibly-packed-out-matrix OutMat,
+ class Triangle>
+ void hermitian_matrix_rank_k_update(
+ Scalar alpha, InMat A, OutMat C, Triangle t);
+ template<class ExecutionPolicy, class Scalar,
+ in-matrix InMat,
+ possibly-packed-out-matrix OutMat,
+ class Triangle>
+ void hermitian_matrix_rank_k_update(
+ ExecutionPolicy&& exec, Scalar alpha, InMat A, OutMat C, Triangle t);
+ template<in-matrix InMat,
+ possibly-packed-out-matrix OutMat,
+ class Triangle>
+ void hermitian_matrix_rank_k_update(
+ InMat A, OutMat C, Triangle t);
+ template<class ExecutionPolicy,
+ in-matrix InMat,
+ possibly-packed-out-matrix OutMat,
+ class Triangle>
+ void hermitian_matrix_rank_k_update(
+ ExecutionPolicy&& exec, InMat A, OutMat C, Triangle t);
+
+  // updating Hermitian rank-k matrix update
+ template<class Scalar,
+ in-matrix InMat1,
+ in-matrix InMat2,
+ possibly-packed-out-matrix OutMat,
+ class Triangle>
+ void hermitian_matrix_rank_k_update(
+ Scalar alpha,
+ InMat1 A, InMat2 E, OutMat C, Triangle t);
+ template<class ExecutionPolicy, class Scalar,
+ in-matrix InMat1,
+ in-matrix InMat2,
+ possibly-packed-out-matrix OutMat,
+ class Triangle>
+ void hermitian_matrix_rank_k_update(
+ ExecutionPolicy&& exec, Scalar alpha,
+ InMat1 A, InMat2 E, OutMat C, Triangle t);
+ template<in-matrix InMat1,
+ in-matrix InMat2,
+ possibly-packed-out-matrix OutMat,
+ class Triangle>
+ void hermitian_matrix_rank_k_update(
+ InMat1 A, InMat2 E, OutMat C, Triangle t);
+ template<class ExecutionPolicy,
+ in-matrix InMat1,
+ in-matrix InMat2,
+ possibly-packed-out-matrix OutMat,
+ class Triangle>
+ void hermitian_matrix_rank_k_update(
+ ExecutionPolicy&& exec,
+ InMat1 A, InMat2 E, OutMat C, Triangle t);++In the header
+<linalg> synopsis [linalg.syn], replace the declarations of all the symmetric_matrix_rank_2k_update and hermitian_matrix_rank_2k_update overloads to read as follows. [Editorial Note: These additions are not shown in green, because the authors could not convince Markdown to format the code correctly. – end note]
// [linalg.algs.blas3.rank2k], rank-2k update of a symmetric or Hermitian matrix
+
+ // overwriting symmetric rank-2k matrix update
+ template<in-matrix InMat1, in-matrix InMat2,
+ possibly-packed-out-matrix OutMat, class Triangle>
+ void symmetric_matrix_rank_2k_update(InMat1 A, InMat2 B, OutMat C, Triangle t);
+ template<class ExecutionPolicy,
+ in-matrix InMat1, in-matrix InMat2,
+ possibly-packed-out-matrix OutMat, class Triangle>
+ void symmetric_matrix_rank_2k_update(ExecutionPolicy&& exec,
+ InMat1 A, InMat2 B, OutMat C, Triangle t);
+
+ // updating symmetric rank-2k matrix update
+ template<in-matrix InMat1, in-matrix InMat2,
+ in-matrix InMat3,
+ possibly-packed-out-matrix OutMat, class Triangle>
+ void symmetric_matrix_rank_2k_update(InMat1 A, InMat2 B, InMat3 E, OutMat C, Triangle t);
+ template<class ExecutionPolicy,
+ in-matrix InMat1, in-matrix InMat2,
+ in-matrix InMat3,
+ possibly-packed-out-matrix OutMat, class Triangle>
+ void symmetric_matrix_rank_2k_update(ExecutionPolicy&& exec,
+ InMat1 A, InMat2 B, InMat3 E, OutMat C, Triangle t);
+
+ // overwriting Hermitian rank-2k matrix update
+ template<in-matrix InMat1, in-matrix InMat2,
+ possibly-packed-out-matrix OutMat, class Triangle>
+ void hermitian_matrix_rank_2k_update(InMat1 A, InMat2 B, OutMat C, Triangle t);
+ template<class ExecutionPolicy,
+ in-matrix InMat1, in-matrix InMat2,
+ possibly-packed-out-matrix OutMat, class Triangle>
+ void hermitian_matrix_rank_2k_update(ExecutionPolicy&& exec,
+ InMat1 A, InMat2 B, OutMat C, Triangle t);
+
+ // updating Hermitian rank-2k matrix update
+ template<in-matrix InMat1, in-matrix InMat2,
+ in-matrix InMat3,
+ possibly-packed-out-matrix OutMat, class Triangle>
+ void hermitian_matrix_rank_2k_update(InMat1 A, InMat2 B, InMat3 E, OutMat C, Triangle t);
+ template<class ExecutionPolicy,
+ in-matrix InMat1, in-matrix InMat2,
+ in-matrix InMat3,
+ possibly-packed-out-matrix OutMat, class Triangle>
+ void hermitian_matrix_rank_2k_update(ExecutionPolicy&& exec,
+ InMat1 A, InMat2 B, InMat3 E, OutMat C, Triangle t);++Replace the entire contents of [linalg.algs.blas2.rank1] with the +following.
+
1 +The following elements apply to all functions in +[linalg.algs.blas2.rank1].
+2 +Mandates:
+(2.1)
+possibly-multipliable<OutMat, InVec2, InVec1>()
+is true, and
(2.2)
+possibly-addable<decltype(A), decltype(E), decltype(A)>() is
+true for those overloads that take an E
+parameter.
3 +Preconditions:
+(3.1)
+multipliable(A, y, x) is true, and
(3.2)
+addable(A, E, A) is true
+for those overloads that take an E parameter.
4
+Complexity: O(
+x.extent(0) × y.extent(0) ).
template<in-vector InVec1, in-vector InVec2, out-matrix OutMat>
+ void matrix_rank_1_update(InVec1 x, InVec2 y, OutMat A);
+template<class ExecutionPolicy, in-vector InVec1, in-vector InVec2, out-matrix OutMat>
+  void matrix_rank_1_update(ExecutionPolicy&& exec, InVec1 x, InVec2 y, OutMat A);
5 These functions perform an overwriting nonsymmetric nonconjugated rank-1 update.
+[Note: These functions correspond to the BLAS functions
+xGER (for real element types) and xGERU (for
+complex element types)[bib]. – end note]
6 +Effects: Computes A = xyT.
+template<in-vector InVec1, in-vector InVec2, in-matrix InMat, out-matrix OutMat>
+ void matrix_rank_1_update(InVec1 x, InVec2 y, InMat E, OutMat A);
+template<class ExecutionPolicy, in-vector InVec1, in-vector InVec2, in-matrix InMat, out-matrix OutMat>
+ void matrix_rank_1_update(ExecutionPolicy&& exec, InVec1 x, InVec2 y, InMat E, OutMat A);7 +These functions perform an updating nonsymmetric nonconjugated rank-1 +update.
+[Note: These functions correspond to the BLAS functions
+xGER (for real element types) and xGERU (for
+complex element types)[bib]. – end note]
8 +Effects: Computes A = E + xyT.
+9
+Remarks: A may alias E.
template<in-vector InVec1, in-vector InVec2, out-matrix OutMat>
+ void matrix_rank_1_update_c(InVec1 x, InVec2 y, OutMat A);
+template<class ExecutionPolicy, in-vector InVec1, in-vector InVec2, out-matrix OutMat>
+  void matrix_rank_1_update_c(ExecutionPolicy&& exec, InVec1 x, InVec2 y, OutMat A);
10 These functions perform an overwriting nonsymmetric conjugated rank-1 update.
+[Note: These functions correspond to the BLAS functions
+xGER (for real element types) and xGERC (for
+complex element types)[bib]. – end note]
11 +Effects:
+(11.1) For
+the overloads without an ExecutionPolicy argument,
+equivalent to:
matrix_rank_1_update(x, conjugated(y), A);(11.2) +otherwise, equivalent to:
+matrix_rank_1_update(std::forward<ExecutionPolicy>(exec), x, conjugated(y), A);++Replace the entire contents of [linalg.algs.blas2.symherrank1] with +the following.
+
1
+[Note: These functions correspond to the BLAS functions
+xSYR, xSPR, xHER, and
+xHPR[bib]. They have overloads taking a scaling factor
+alpha, because it would be impossible to express the update
+A = A − xxT
+in noncomplex arithmetic otherwise. – end note]
2 +The following elements apply to all functions in +[linalg.algs.blas2.symherrank1].
+3
+For any function F in this section that takes a parameter
+named t, an InMat template parameter, and a
+function parameter InMat E, t applies to
+accesses done through the parameter E. F will
+only access the triangle of E specified by t.
+For accesses of diagonal elements E[i, i], F
+will use the value
+real-if-needed(E[i, i]) if the name
+of F starts with hermitian. For accesses
+E[i, j] outside the triangle specified by t,
+F will use the value
(3.1)
+conj-if-needed(E[j, i]) if the name
+of F starts with hermitian, or
(3.2)
+E[j, i] if the name of F starts with
+symmetric.
4 +Mandates:
+(4.1) If
+OutMat has layout_blas_packed layout, then the
+layout’s Triangle template argument has the same type as
+the function’s Triangle template argument;
(4.2) If
+the function has an InMat template parameter and
+InMat has layout_blas_packed layout, then the
+layout’s Triangle template argument has the same type as
+the function’s Triangle template argument;
(4.3)
+compatible-static-extents<decltype(A), decltype(A)>(0, 1)
+is true;
(4.4)
+compatible-static-extents<decltype(A), decltype(x)>(0, 0)
+is true; and
(4.5)
+possibly-addable<decltype(A), decltype(E), decltype(A)>()
+is true for those overloads that take an E
+parameter.
5 +Preconditions:
+(5.1)
+A.extent(0) equals A.extent(1),
(5.2)
+A.extent(0) equals x.extent(0), and
(5.3)
+addable(A, E, A) is true
+for those overloads that take an E parameter.
6
+Complexity: O(
+x.extent(0) × x.extent(0) ).
template<class Scalar, in-vector InVec, possibly-packed-out-matrix OutMat, class Triangle>
+ void symmetric_matrix_rank_1_update(Scalar alpha, InVec x, OutMat A, Triangle t);
+template<class ExecutionPolicy,
+ class Scalar, in-vector InVec, possibly-packed-out-matrix OutMat, class Triangle>
+ void symmetric_matrix_rank_1_update(ExecutionPolicy&& exec,
+ Scalar alpha, InVec x, OutMat A, Triangle t);7
+These functions perform an overwriting symmetric rank-1 update of the
+symmetric matrix A, taking into account the
+Triangle parameter that applies to A
+([linalg.general]).
8
+Effects: Computes A = αxxT,
+where the scalar α is
+alpha.
template<in-vector InVec, possibly-packed-out-matrix OutMat, class Triangle>
+ void symmetric_matrix_rank_1_update(InVec x, OutMat A, Triangle t);
+template<class ExecutionPolicy,
+ in-vector InVec, possibly-packed-out-matrix OutMat, class Triangle>
+ void symmetric_matrix_rank_1_update(ExecutionPolicy&& exec, InVec x, OutMat A, Triangle t);9
+These functions perform an overwriting symmetric rank-1 update of the
+symmetric matrix A, taking into account the
+Triangle parameter that applies to A
+([linalg.general]).
10 +Effects: Computes A = xxT.
+template<class Scalar, in-vector InVec, in-matrix InMat, possibly-packed-out-matrix OutMat, class Triangle>
+ void symmetric_matrix_rank_1_update(Scalar alpha, InVec x, InMat E, OutMat A, Triangle t);
+template<class ExecutionPolicy,
+ class Scalar, in-vector InVec, in-matrix InMat, possibly-packed-out-matrix OutMat, class Triangle>
+ void symmetric_matrix_rank_1_update(ExecutionPolicy&& exec,
+ Scalar alpha, InVec x, InMat E, OutMat A, Triangle t);11 These
+functions perform an updating symmetric rank-1 update of the symmetric
+matrix A using the symmetric matrix E, taking
+into account the Triangle parameter that applies to
+A and E ([linalg.general]).
12
+Effects: Computes A = E + αxxT,
+where the scalar α is
+alpha.
template<in-vector InVec, in-matrix InMat, possibly-packed-out-matrix OutMat, class Triangle>
+ void symmetric_matrix_rank_1_update(InVec x, InMat E, OutMat A, Triangle t);
+template<class ExecutionPolicy,
+ in-vector InVec, in-matrix InMat, possibly-packed-out-matrix OutMat, class Triangle>
+ void symmetric_matrix_rank_1_update(ExecutionPolicy&& exec,
+ InVec x, InMat E, OutMat A, Triangle t);13 These
+functions perform an updating symmetric rank-1 update of the symmetric
+matrix A using the symmetric matrix E, taking
+into account the Triangle parameter that applies to
+A and E ([linalg.general]).
14 +Effects: Computes A = E + xxT.
+template<class Scalar, in-vector InVec, possibly-packed-out-matrix OutMat, class Triangle>
+ void hermitian_matrix_rank_1_update(Scalar alpha, InVec x, OutMat A, Triangle t);
+template<class ExecutionPolicy,
+ class Scalar, in-vector InVec, possibly-packed-out-matrix OutMat, class Triangle>
+ void hermitian_matrix_rank_1_update(ExecutionPolicy&& exec,
+ Scalar alpha, InVec x, OutMat A, Triangle t);15 These
+functions perform an overwriting Hermitian rank-1 update of the
+Hermitian matrix A, taking into account the
+Triangle parameter that applies to A
+([linalg.general]).
16
+Effects: Computes A = αxxH,
+where the scalar α is
+real-if-needed(alpha).
template<in-vector InVec, possibly-packed-out-matrix OutMat, class Triangle>
+ void hermitian_matrix_rank_1_update(InVec x, OutMat A, Triangle t);
+template<class ExecutionPolicy,
+ in-vector InVec, possibly-packed-out-matrix OutMat, class Triangle>
+ void hermitian_matrix_rank_1_update(ExecutionPolicy&& exec, InVec x, OutMat A, Triangle t);17 These
+functions perform an overwriting Hermitian rank-1 update of the
+Hermitian matrix A, taking into account the
+Triangle parameter that applies to A
+([linalg.general]).
18 Effects: Computes A = xxH.
+template<class Scalar, in-vector InVec, in-matrix InMat, possibly-packed-out-matrix OutMat, class Triangle>
+ void hermitian_matrix_rank_1_update(Scalar alpha, InVec x, InMat E, OutMat A, Triangle t);
+template<class ExecutionPolicy,
+ class Scalar, in-vector InVec, in-matrix InMat, possibly-packed-out-matrix OutMat, class Triangle>
+ void hermitian_matrix_rank_1_update(ExecutionPolicy&& exec,
+ Scalar alpha, InVec x, InMat E, OutMat A, Triangle t);19 These
+functions perform an updating Hermitian rank-1 update of the Hermitian
+matrix A using the Hermitian matrix E, taking
+into account the Triangle parameter that applies to
+A and E ([linalg.general]).
20
+Effects: Computes A = E + αxxH,
+where the scalar α is
+real-if-needed(alpha).
template<in-vector InVec, in-matrix InMat, possibly-packed-out-matrix OutMat, class Triangle>
+ void hermitian_matrix_rank_1_update(InVec x, InMat E, OutMat A, Triangle t);
+template<class ExecutionPolicy,
+ in-vector InVec, in-matrix InMat, possibly-packed-out-matrix OutMat, class Triangle>
+ void hermitian_matrix_rank_1_update(ExecutionPolicy&& exec,
+ InVec x, InMat E, OutMat A, Triangle t);21 These
+functions perform an updating Hermitian rank-1 update of the Hermitian
+matrix A using the Hermitian matrix E, taking
+into account the Triangle parameter that applies to
+A and E ([linalg.general]).
22 Effects: Computes A = E + xxH.
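As a non-normative illustration of the rank-1 wording above (not proposed wording), the sketch below contrasts the overwriting and updating symmetric rank-1 overloads; the container sizes and the scaling factor are made up for the example.

#include <cstddef>
#include <linalg>
#include <mdspan>
#include <vector>

int main() {
  constexpr std::size_t n = 3;
  std::vector<double> x_storage(n, 1.0);
  std::vector<double> A_storage(n * n);
  std::vector<double> E_storage(n * n, 2.0);

  std::mdspan x(x_storage.data(), n);
  std::mdspan A(A_storage.data(), n, n);
  std::mdspan E(E_storage.data(), n, n);

  // Overwriting overload: A = 2.0 * x * x^T (paragraph 8 above).
  std::linalg::symmetric_matrix_rank_1_update(2.0, x, A, std::linalg::upper_triangle);

  // Updating overload: A = E + 2.0 * x * x^T (paragraph 12 above).
  std::linalg::symmetric_matrix_rank_1_update(2.0, x, E, A, std::linalg::upper_triangle);
}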
+++Replace the entire contents of [linalg.algs.blas2.rank2] with the +following.
+
1
+[Note: These functions correspond to the BLAS functions
+xSYR2, xSPR2, xHER2, and
+xHPR2 [bib]. – end note]
2 +The following elements apply to all functions in +[linalg.algs.blas2.rank2].
+3
+For any function F in this section that takes a parameter
+named t, an InMat template parameter, and a
+function parameter InMat E, t applies to
+accesses done through the parameter E. F will
+only access the triangle of E specified by t.
+For accesses of diagonal elements E[i, i], F
+will use the value
+real-if-needed(E[i, i]) if the name
+of F starts with hermitian. For accesses
+E[i, j] outside the triangle specified by t,
+F will use the value
(3.1)
+conj-if-needed(E[j, i]) if the name
+of F starts with hermitian, or
(3.2)
+E[j, i] if the name of F starts with
+symmetric.
4 +Mandates:
+(4.1) If
+OutMat has layout_blas_packed layout, then the
+layout’s Triangle template argument has the same type as
+the function’s Triangle template argument;
(4.2) If
+the function has an InMat template parameter and
+InMat has layout_blas_packed layout, then the
+layout’s Triangle template argument has the same type as
+the function’s Triangle template argument;
(4.3)
+compatible-static-extents<decltype(A), decltype(A)>(0, 1)
+is true;
(4.4)
+possibly-multipliable<decltype(A), decltype(x), decltype(y)>()
+is true; and
(4.5)
+possibly-addable<decltype(A), decltype(E), decltype(A)>()
+is true for those overloads that take an E
+parameter.
5 +Preconditions:
+(5.1)
+A.extent(0) equals A.extent(1),
(5.2)
+multipliable(A, x, y) is
+true, and
(5.3)
+addable(A, E, A) is true
+for those overloads that take an E parameter.
6
+Complexity: O(
+x.extent(0) × y.extent(0) ).
template<in-vector InVec1, in-vector InVec2,
+ possibly-packed-out-matrix OutMat, class Triangle>
+ void symmetric_matrix_rank_2_update(InVec1 x, InVec2 y, OutMat A, Triangle t);
+template<class ExecutionPolicy, in-vector InVec1, in-vector InVec2,
+ possibly-packed-out-matrix OutMat, class Triangle>
+ void symmetric_matrix_rank_2_update(ExecutionPolicy&& exec,
+ InVec1 x, InVec2 y, OutMat A, Triangle t);7
+These functions perform an overwriting symmetric rank-2 update of the
+symmetric matrix A, taking into account the
+Triangle parameter that applies to A
+([linalg.general]).
8 +Effects: Computes A = xyT + yxT.
+template<in-vector InVec1, in-vector InVec2,
+ in-matrix InMat,
+ possibly-packed-out-matrix OutMat, class Triangle>
+ void symmetric_matrix_rank_2_update(InVec1 x, InVec2 y, InMat E, OutMat A, Triangle t);
+template<class ExecutionPolicy, in-vector InVec1, in-vector InVec2,
+ in-matrix InMat,
+ possibly-packed-out-matrix OutMat, class Triangle>
+ void symmetric_matrix_rank_2_update(ExecutionPolicy&& exec,
+ InVec1 x, InVec2 y, InMat E, OutMat A, Triangle t);9
+These functions perform an updating symmetric rank-2 update of the
+symmetric matrix A using the symmetric matrix
+E, taking into account the Triangle parameter
+that applies to A and E
+([linalg.general]).
10 +Effects: Computes A = E + xyT + yxT.
+template<in-vector InVec1, in-vector InVec2,
+ possibly-packed-out-matrix OutMat, class Triangle>
+ void hermitian_matrix_rank_2_update(InVec1 x, InVec2 y, OutMat A, Triangle t);
+template<class ExecutionPolicy, in-vector InVec1, in-vector InVec2,
+ possibly-packed-out-matrix OutMat, class Triangle>
+ void hermitian_matrix_rank_2_update(ExecutionPolicy&& exec,
+ InVec1 x, InVec2 y, OutMat A, Triangle t);11 These
+functions perform an overwriting Hermitian rank-2 update of the
+Hermitian matrix A, taking into account the
+Triangle parameter that applies to A
+([linalg.general]).
12 +Effects: Computes A = xyH + yxH.
+template<in-vector InVec1, in-vector InVec2,
+ in-matrix InMat,
+ possibly-packed-out-matrix OutMat, class Triangle>
+ void hermitian_matrix_rank_2_update(InVec1 x, InVec2 y, InMat E, OutMat A, Triangle t);
+template<class ExecutionPolicy, in-vector InVec1, in-vector InVec2,
+ in-matrix InMat,
+ possibly-packed-out-matrix OutMat, class Triangle>
+ void hermitian_matrix_rank_2_update(ExecutionPolicy&& exec,
+ InVec1 x, InVec2 y, InMat E, OutMat A, Triangle t);13 These
+functions perform an updating Hermitian rank-2 update of the Hermitian
+matrix A using the Hermitian matrix E, taking
+into account the Triangle parameter that applies to
+A and E ([linalg.general]).
14 +Effects: Computes A = E + xyH + yxH.
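A non-normative observation about the rank-2 functions above: they take no alpha parameter, but the BLAS-style scaled update A = E + αxyT + αyxT can still be written by scaling one of the vectors, since scaling x by α scales both outer products. The sketch below is not proposed wording; the element type, names, and sizes are illustrative.

#include <cstddef>
#include <linalg>
#include <mdspan>
#include <vector>

int main() {
  constexpr std::size_t n = 4;
  std::vector<double> x_storage(n, 1.0), y_storage(n, 2.0);
  std::vector<double> A_storage(n * n), E_storage(n * n, 1.0);

  std::mdspan x(x_storage.data(), n);
  std::mdspan y(y_storage.data(), n);
  std::mdspan A(A_storage.data(), n, n);
  std::mdspan E(E_storage.data(), n, n);

  const double alpha = 0.5;
  // Updating overload (paragraph 10 above) with the first vector scaled:
  // A = E + (alpha*x)*y^T + y*(alpha*x)^T = E + alpha*x*y^T + alpha*y*x^T.
  std::linalg::symmetric_matrix_rank_2_update(
      std::linalg::scaled(alpha, x), y, E, A, std::linalg::upper_triangle);
}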
+++Replace the entire contents of [linalg.algs.blas3.rankk] with the +following.
+
[Note: These functions correspond to the BLAS functions
+xSYRK and xHERK. – end note]
1 +The following elements apply to all functions in +[linalg.algs.blas3.rankk].
+2
+For any function F in this section that takes a parameter
+named t, an InMat2 template parameter, and a
+function parameter InMat2 E, t applies to
+accesses done through the parameter E. F will
+only access the triangle of E specified by t.
+For accesses of diagonal elements E[i, i], F
+will use the value
+real-if-needed(E[i, i]) if the name
+of F starts with hermitian. For accesses
+E[i, j] outside the triangle specified by t,
+F will use the value
(2.1)
+conj-if-needed(E[j, i]) if the name
+of F starts with hermitian, or
(2.2)
+E[j, i] if the name of F starts with
+symmetric.
3 +Mandates:
+(3.1) If
+OutMat has layout_blas_packed layout, then the
+layout’s Triangle template argument has the same type as
+the function’s Triangle template argument.
(3.2) If
+the function takes an InMat2 template parameter and if
+InMat2 has layout_blas_packed layout, then the
+layout’s Triangle template argument has the same type as
+the function’s Triangle template argument.
(3.3)
+possibly-multipliable<decltype(A), decltype(transposed(A)), decltype(C)>()
+is true.
(3.4)
+possibly-addable<decltype(C), decltype(E), decltype(C)>()
+is true for those overloads that take an E
+parameter.
4 +Preconditions:
+(4.1)
+multipliable(A, transposed(A), C) is
+true. [Note: This implies that C is
+square. – end note]
(4.2)
+addable(C, E, C) is true
+for those overloads that take an E parameter.
5
+Complexity: O(
+A.extent(0) ⋅
+A.extent(1) ⋅
+A.extent(0) ).
6
+Remarks: C may alias E for those
+overloads that take an E parameter.
template<class Scalar,
+ in-matrix InMat,
+ possibly-packed-out-matrix OutMat,
+ class Triangle>
+void symmetric_matrix_rank_k_update(
+ Scalar alpha,
+ InMat A,
+ OutMat C,
+ Triangle t);
+template<class ExecutionPolicy,
+ class Scalar,
+ in-matrix InMat,
+ possibly-packed-out-matrix OutMat,
+ class Triangle>
+void symmetric_matrix_rank_k_update(
+ ExecutionPolicy&& exec,
+ Scalar alpha,
+ InMat A,
+ OutMat C,
+ Triangle t);5
+Effects: Computes C = αAAT,
+where the scalar α is
+alpha.
template<in-matrix InMat,
+ possibly-packed-out-matrix OutMat,
+ class Triangle>
+void symmetric_matrix_rank_k_update(
+ InMat A,
+ OutMat C,
+ Triangle t);
+template<class ExecutionPolicy,
+ in-matrix InMat,
+ possibly-packed-out-matrix OutMat,
+ class Triangle>
+void symmetric_matrix_rank_k_update(
+ ExecutionPolicy&& exec,
+ InMat A,
+ OutMat C,
+ Triangle t);6 +Effects: Computes C = AAT.
+template<class Scalar,
+ in-matrix InMat,
+ possibly-packed-out-matrix OutMat,
+ class Triangle>
+void hermitian_matrix_rank_k_update(
+ Scalar alpha,
+ InMat A,
+ OutMat C,
+ Triangle t);
+template<class ExecutionPolicy,
+ class Scalar,
+ in-matrix InMat,
+ possibly-packed-out-matrix OutMat,
+ class Triangle>
+void hermitian_matrix_rank_k_update(
+ ExecutionPolicy&& exec,
+ Scalar alpha,
+ InMat A,
+ OutMat C,
+ Triangle t);7
+Effects: Computes C = αAAH,
+where the scalar α is
+real-if-needed(alpha).
template<in-matrix InMat,
+ possibly-packed-out-matrix OutMat,
+ class Triangle>
+void hermitian_matrix_rank_k_update(
+ InMat A,
+ OutMat C,
+ Triangle t);
+template<class ExecutionPolicy,
+ in-matrix InMat,
+ possibly-packed-out-matrix OutMat,
+ class Triangle>
+void hermitian_matrix_rank_k_update(
+ ExecutionPolicy&& exec,
+ InMat A,
+ OutMat C,
+ Triangle t);8 +Effects: Computes C = AAH.
+template<class Scalar,
+ in-matrix InMat1,
+ in-matrix InMat2,
+ possibly-packed-out-matrix OutMat,
+ class Triangle>
+void symmetric_matrix_rank_k_update(
+ Scalar alpha,
+ InMat1 A,
+ InMat2 E,
+ OutMat C,
+ Triangle t);
+template<class ExecutionPolicy,
+ class Scalar,
+ in-matrix InMat1,
+ in-matrix InMat2,
+ possibly-packed-out-matrix OutMat,
+ class Triangle>
+void symmetric_matrix_rank_k_update(
+ ExecutionPolicy&& exec,
+ Scalar alpha,
+ InMat1 A,
+ InMat2 E,
+ OutMat C,
+ Triangle t);9
+Effects: Computes C = E + αAAT,
+where the scalar α is
+alpha.
template<in-matrix InMat1,
+ in-matrix InMat2,
+ possibly-packed-out-matrix OutMat,
+ class Triangle>
+void symmetric_matrix_rank_k_update(
+ InMat1 A,
+ InMat2 E,
+ OutMat C,
+ Triangle t);
+template<class ExecutionPolicy,
+ in-matrix InMat1,
+ in-matrix InMat2,
+ possibly-packed-out-matrix OutMat,
+ class Triangle>
+void symmetric_matrix_rank_k_update(
+ ExecutionPolicy&& exec,
+ InMat1 A,
+ InMat2 E,
+ OutMat C,
+ Triangle t);10 +Effects: Computes C = E + AAT.
+template<class Scalar,
+ in-matrix InMat1,
+ in-matrix InMat2,
+ possibly-packed-out-matrix OutMat,
+ class Triangle>
+void hermitian_matrix_rank_k_update(
+ Scalar alpha,
+ InMat1 A,
+ InMat2 E,
+ OutMat C,
+ Triangle t);
+template<class ExecutionPolicy,
+ class Scalar,
+ in-matrix InMat1,
+ in-matrix InMat2,
+ possibly-packed-out-matrix OutMat,
+ class Triangle>
+void hermitian_matrix_rank_k_update(
+ ExecutionPolicy&& exec,
+ Scalar alpha,
+ InMat1 A,
+ InMat2 E,
+ OutMat C,
+ Triangle t);11
+Effects: Computes C = E + αAAH,
+where the scalar α is
+real-if-needed(alpha).
template<in-matrix InMat1,
+ in-matrix InMat2,
+ possibly-packed-out-matrix OutMat,
+ class Triangle>
+void hermitian_matrix_rank_k_update(
+ InMat1 A,
+ InMat2 E,
+ OutMat C,
+ Triangle t);
+template<class ExecutionPolicy,
+ in-matrix InMat1,
+ in-matrix InMat2,
+ possibly-packed-out-matrix OutMat,
+ class Triangle>
+void hermitian_matrix_rank_k_update(
+ ExecutionPolicy&& exec,
+ InMat1 A,
+ InMat2 E,
+ OutMat C,
+ Triangle t);12 +Effects: Computes C = E + AAH.
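The updating rank-k overloads above, together with scaled and the permission for C to alias E (see the Remarks earlier in this section), recover the general BLAS xSYRK/xHERK operation with both scaling factors. A non-normative sketch (not proposed wording; sizes and values are illustrative):

#include <complex>
#include <cstddef>
#include <linalg>
#include <mdspan>
#include <vector>

int main() {
  using cplx = std::complex<double>;
  constexpr std::size_t n = 4, k = 2;
  std::vector<cplx> A_storage(n * k, cplx(1.0, -1.0));
  std::vector<cplx> C_storage(n * n, cplx(1.0, 0.0));

  std::mdspan A(A_storage.data(), n, k);
  std::mdspan C(C_storage.data(), n, n);

  const double alpha = 2.0, beta = 0.5;
  // Updating overload (paragraph 11 above) with E = scaled(beta, C) aliasing C:
  // C = beta*C + alpha*A*A^H, i.e., the full xHERK operation.
  std::linalg::hermitian_matrix_rank_k_update(
      alpha, A, std::linalg::scaled(beta, C), C, std::linalg::upper_triangle);
}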
+++Replace the entire contents of [linalg.algs.blas3.rank2k] with the +following.
+
[Note: These functions correspond to the BLAS functions
+xSYR2K and xHER2K. – end note]
1 +The following elements apply to all functions in +[linalg.algs.blas3.rank2k].
+2
+For any function F in this section that takes a parameter
+named t, an InMat3 template parameter, and a
+function parameter InMat3 E, t applies to
+accesses done through the parameter E. F will
+only access the triangle of E specified by t.
+For accesses of diagonal elements E[i, i], F
+will use the value
+real-if-needed(E[i, i]) if the name
+of F starts with hermitian. For accesses
+E[i, j] outside the triangle specified by t,
+F will use the value
(2.1)
+conj-if-needed(E[j, i]) if the name
+of F starts with hermitian, or
(2.2)
+E[j, i] if the name of F starts with
+symmetric.
3 +Mandates:
+(3.1) If
+OutMat has layout_blas_packed layout, then the
+layout’s Triangle template argument has the same type as
+the function’s Triangle template argument;
(3.2) If
+the function takes an InMat3 template parameter and if
+InMat3 has layout_blas_packed layout, then the
+layout’s Triangle template argument has the same type as
+the function’s Triangle template argument.
(3.3)
+possibly-multipliable<decltype(A), decltype(transposed(B)), decltype(C)>()
+is true.
(3.4)
+possibly-multipliable<decltype(B), decltype(transposed(A)), decltype(C)>()
+is true.
(3.5)
+possibly-addable<decltype(C), decltype(E), decltype(C)>()
+is true for those overloads that take an E
+parameter.
4 +Preconditions:
+(4.1)
+multipliable(A, transposed(B), C) is
+true.
(4.2)
+multipliable(B, transposed(A), C) is
+true. [Note: This and the previous imply that
+C is square. – end note]
(4.3)
+addable(C, E, C) is true
+for those overloads that take an E parameter.
4
+Complexity: O(
+A.extent(0) ⋅
+A.extent(1) ⋅
+B.extent(0) )
5
+Remarks: C may alias E for those
+overloads that take an E parameter.
template<in-matrix InMat1,
+ in-matrix InMat2,
+ possibly-packed-out-matrix OutMat,
+ class Triangle>
+void symmetric_matrix_rank_2k_update(
+ InMat1 A,
+ InMat2 B,
+ OutMat C,
+ Triangle t);
+template<class ExecutionPolicy,
+ in-matrix InMat1,
+ in-matrix InMat2,
+ possibly-packed-out-matrix OutMat,
+ class Triangle>
+void symmetric_matrix_rank_2k_update(
+ ExecutionPolicy&& exec,
+ InMat1 A,
+ InMat2 B,
+ OutMat C,
+ Triangle t);5 +Effects: Computes C = ABT + BAT.
+template<in-matrix InMat1,
+ in-matrix InMat2,
+ possibly-packed-out-matrix OutMat,
+ class Triangle>
+void hermitian_matrix_rank_2k_update(
+ InMat1 A,
+ InMat2 B,
+ OutMat C,
+ Triangle t);
+template<class ExecutionPolicy,
+ in-matrix InMat1,
+ in-matrix InMat2,
+ possibly-packed-out-matrix OutMat,
+ class Triangle>
+void hermitian_matrix_rank_2k_update(
+ ExecutionPolicy&& exec,
+ InMat1 A,
+ InMat2 B,
+ OutMat C,
+ Triangle t);6 +Effects: Computes C = ABH + BAH.
+template<in-matrix InMat1,
+ in-matrix InMat2,
+ in-matrix InMat3,
+ possibly-packed-out-matrix OutMat,
+ class Triangle>
+void symmetric_matrix_rank_2k_update(
+ InMat1 A,
+ InMat2 B,
+ InMat3 E,
+ OutMat C,
+ Triangle t);
+template<class ExecutionPolicy,
+ in-matrix InMat1,
+ in-matrix InMat2,
+ in-matrix InMat3,
+ possibly-packed-out-matrix OutMat,
+ class Triangle>
+void symmetric_matrix_rank_2k_update(
+ ExecutionPolicy&& exec,
+ InMat1 A,
+ InMat2 B,
+ InMat3 E,
+ OutMat C,
+ Triangle t);7 +Effects: Computes C = E + ABT + BAT.
+template<in-matrix InMat1,
+ in-matrix InMat2,
+ in-matrix InMat3,
+ possibly-packed-out-matrix OutMat,
+ class Triangle>
+void hermitian_matrix_rank_2k_update(
+ InMat1 A,
+ InMat2 B,
+ InMat3 E,
+ OutMat C,
+ Triangle t);
+template<class ExecutionPolicy,
+ in-matrix InMat1,
+ in-matrix InMat2,
+ in-matrix InMat3,
+ possibly-packed-out-matrix OutMat,
+ class Triangle>
+void hermitian_matrix_rank_2k_update(
+ ExecutionPolicy&& exec,
+ InMat1 A,
+ InMat2 B,
+ InMat3 E,
+ OutMat C,
+ Triangle t);8 +Effects: Computes C = E + ABH + BAH.
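Similarly, although the rank-2k functions take no alpha parameter, scaling A recovers the full xSYR2K/xHER2K operations: with scaled(alpha, A), the Hermitian updating overload computes C = E + (αA)BH + B(αA)H = E + αABH + ᾱBAH, and E may be scaled(beta, C) aliasing C per the Remarks above. A non-normative sketch (not proposed wording; names, sizes, and values are illustrative):

#include <complex>
#include <cstddef>
#include <linalg>
#include <mdspan>
#include <vector>

int main() {
  using cplx = std::complex<double>;
  constexpr std::size_t n = 4, k = 2;
  std::vector<cplx> A_storage(n * k, cplx(1.0, 2.0));
  std::vector<cplx> B_storage(n * k, cplx(0.5, -1.0));
  std::vector<cplx> C_storage(n * n, cplx(1.0, 0.0));

  std::mdspan A(A_storage.data(), n, k);
  std::mdspan B(B_storage.data(), n, k);
  std::mdspan C(C_storage.data(), n, n);

  const cplx alpha(2.0, 1.0);   // may be complex; conjugation falls out of (alpha*A)^H
  const double beta = 0.5;      // kept real so the result stays Hermitian
  // Updating overload (paragraph 8 above): C = beta*C + alpha*A*B^H + conj(alpha)*B*A^H.
  std::linalg::hermitian_matrix_rank_2k_update(
      std::linalg::scaled(alpha, A), B, std::linalg::scaled(beta, C), C,
      std::linalg::upper_triangle);
}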
+The following example shows how to implement a generic numerical
+algorithm, two_norm_abs. This algorithm computes the
+absolute value of a variety of number types. For complex numbers, it
+returns the magnitude, which is the same as the two-norm of the
+two-element vector composed of the real and imaginary parts of the
+complex number. This Compiler
+Explorer link demonstrates the implementation. This is not meant to
+show an ideal implementation. (A better one would use rescaling, in the
+manner of std::hypot, to avoid undue overflow or
+underflow.) Instead, it illustrates generic numerical algorithm
+development. Commenting out the
+#define DEFINE_CONJ_REAL_IMAG_FOR_REAL 1 line shows that
+without ADL-findable conj, real, and
+imag, users’ generic numerical algorithms would need more
+special cases and more assumptions on number types.
#include <cassert>
+#include <cmath>
+#include <complex>
+#include <concepts>
+#include <type_traits>
+
+#define DEFINE_CONJ_REAL_IMAG_FOR_REAL 1
+
+template<class T>
+constexpr bool is_std_complex = false;
+template<class R>
+constexpr bool is_std_complex<std::complex<R>> = true;
+
+template<class T>
+auto two_norm_abs(T t) {
+ if constexpr (std::is_unsigned_v<T>) {
+ return t;
+ }
+ else if constexpr (std::is_arithmetic_v<T> || is_std_complex<T>) {
+ return std::abs(t);
+ }
+#if ! defined(DEFINE_CONJ_REAL_IMAG_FOR_REAL)
+ else if constexpr (requires(T x) {
+ {abs(x)} -> std::convertible_to<T>;
+ }) {
+ return T{abs(t)};
+ }
+#endif
+ else if constexpr (requires(T x) {
+ {sqrt(real(x * conj(x)))} -> std::same_as<decltype(real(x))>;
+ }) {
+ return sqrt(real(t * conj(t)));
+ }
+ else {
+ static_assert(false, "No reasonable way to implement abs(t)");
+ }
+}
+
+struct MyRealNumber {
+ MyRealNumber() = default;
+ MyRealNumber(double value) : value_(value) {}
+
+ double value() const {
+ return value_;
+ }
+
+ friend bool operator==(MyRealNumber, MyRealNumber) = default;
+ friend MyRealNumber operator-(MyRealNumber x) {
+ return {-x.value_};
+ }
+ friend MyRealNumber operator+(MyRealNumber x, MyRealNumber y) {
+ return {x.value_ + y.value_};
+ }
+ friend MyRealNumber operator-(MyRealNumber x, MyRealNumber y) {
+ return {x.value_ - y.value_};
+ }
+ friend MyRealNumber operator*(MyRealNumber x, MyRealNumber y) {
+ return x.value_ * y.value_;
+ }
+
+#if defined(DEFINE_CONJ_REAL_IMAG_FOR_REAL)
+ friend MyRealNumber conj(MyRealNumber x) { return x; }
+ friend MyRealNumber real(MyRealNumber x) { return x; }
+ friend MyRealNumber imag(MyRealNumber x) { return {}; }
+#else
+ friend MyRealNumber abs(MyRealNumber x) { return std::abs(x.value_); }
+#endif
+ friend MyRealNumber sqrt(MyRealNumber x) { return std::sqrt(x.value_); }
+
+private:
+ double value_{};
+};
+
+class MyComplexNumber {
+public:
+ MyComplexNumber(MyRealNumber re, MyRealNumber im = {}) : re_(re), im_(im) {}
+ MyComplexNumber() = default;
+
+ std::complex<double> value() const {
+ return {re_.value(), im_.value()};
+ }
+
+ friend bool operator==(MyComplexNumber, MyComplexNumber) = default;
+ friend MyComplexNumber operator*(MyComplexNumber z, MyComplexNumber w) {
+ return {real(z) * real(w) - imag(z) * imag(w),
+ real(z) * imag(w) + imag(z) * real(w)};
+ }
+ friend MyComplexNumber conj(MyComplexNumber z) { return {real(z), -imag(z)}; }
+ friend MyRealNumber real(MyComplexNumber z) { return z.re_; }
+ friend MyRealNumber imag(MyComplexNumber z) { return z.im_; }
+
+private:
+ MyRealNumber re_{};
+ MyRealNumber im_{};
+};
+
+int main() {
+ [[maybe_unused]] double x1 = two_norm_abs(-4.2);
+ assert(x1 == 4.2);
+
+ [[maybe_unused]] float y0 = two_norm_abs(std::complex<float>{-3.0f, 4.0f});
+ assert(y0 == 5.0f);
+
+ [[maybe_unused]] MyRealNumber r = two_norm_abs(MyRealNumber{-6.7});
+ assert(r == MyRealNumber{6.7});
+
+ [[maybe_unused]] MyRealNumber z = two_norm_abs(MyComplexNumber{-3, 4});
+ assert(z.value() == 5.0);
+
+ return 0;
+}| [linalg] algorithm | +What it computes now | +
|---|---|
| `symmetric_matrix_rank_k_update` | +$C := C + \alpha A A^T$ | +
| `hermitian_matrix_rank_k_update` | +$C := C + \alpha A A^H$ | +
| `symmetric_matrix_rank_2k_update` | +$C := C + \alpha A B^T + \alpha B A^T$ | +
| `hermitian_matrix_rank_2k_update` | +$C := C + \alpha A B^H + \bar{\alpha} B A^H$ | +
| [linalg] algorithm | +What it computes now | +Change (overwriting) | +Add (updating) | +
|---|---|---|---|
| `symmetric_matrix_rank_k_update` | +$C := C + \alpha A A^T$ | +$C = \alpha A A^T$ | +$C = E + \alpha A A^T$ | +
| `hermitian_matrix_rank_k_update` | +$C := C + \alpha A A^H$ | +$C = \alpha A A^H$ | +$C = E + \alpha A A^H$ | +
| `symmetric_matrix_rank_2k_update` | +$C := C + \alpha A B^T + \alpha B A^T$ | +$C = \alpha A B^T + \alpha B A^T$ | +$C = E + \alpha A B^T + \alpha B A^T$ | +
| `hermitian_matrix_rank_2k_update` | +$C := C + \alpha A B^H + \bar{\alpha} B A^H$ | +$C = \alpha A B^H + \bar{\alpha} B A^H$ | +$C = E + \alpha A B^H + \bar{\alpha} B A^H$ | +
| [linalg] algorithm | +What it computes now | +
|---|---|
| `symmetric_matrix_rank_1_update` | +$A := A + \alpha x x^T$ | +
| `hermitian_matrix_rank_1_update` | +$A := A + \alpha x x^H$ | +
| `symmetric_matrix_rank_2_update` | +$A := A + \alpha x y^T + \alpha y x^T$ | +
| `hermitian_matrix_rank_2_update` | $A := A + \alpha x y^H + \bar{\alpha} y x^H$ |
| [linalg] algorithm | +What it computes now | +
|---|---|
| `matrix_rank_1_update` | +$A := A + x y^T$ | +
| `matrix_rank_1_update_c` | +$A := A + x y^H$ | +
| [linalg] algorithm | +Corresponding BLAS routine(s) | +
|---|---|
| `matrix_rank_1_update` | +`xGER` | +
| `matrix_rank_1_update_c` | +`xGERC` | +
| `symmetric_matrix_rank_1_update` | +`xSYR`, `xSPR` | +
| `hermitian_matrix_rank_1_update` | +`xHER`, `xHPR` | +
| `symmetric_matrix_rank_2_update` | +`xSYR2`, `xSPR2` | +
| `hermitian_matrix_rank_2_update` | +`xHER2`, `xHPR2` | +
| [linalg] algorithm | +What it computes now | +Change (overwriting) | +Add (updating) | +
|---|---|---|---|
| `matrix_rank_1_update` | +$A := A + x y^T$ | +$A = x y^T$ | +$A = E + x y^T$ | +
| `matrix_rank_1_update_c` | +$A := A + x y^H$ | +$A = x y^H$ | +$A = E + x y^H$ | +
| `symmetric_matrix_rank_1_update` | +$A := A + \alpha x x^T$ | +$A = \alpha x x^T$ | +$A = E + \alpha x x^T$ | +
| `hermitian_matrix_rank_1_update` | +$A := A + \alpha x x^H$ | +$A = \alpha x x^H$ | +$A = E + \alpha x x^H$ | +
| `symmetric_matrix_rank_2_update` | +$A := A + \alpha x y^T + \alpha y x^T$ | +$A = \alpha x y^T + \alpha y x^T$ | +$A = E + \alpha x y^T + \alpha y x^T$ | +
| `hermitian_matrix_rank_2_update` | $A := A + \alpha x y^H + \bar{\alpha} y x^H$ | $A = \alpha x y^H + \bar{\alpha} y x^H$ | $A = E + \alpha x y^H + \bar{\alpha} y x^H$ |
| BLAS routine | +Restricts `alpha` | +Restricts `beta` | +Generalizes | +
|---|---|---|---|
| `xHEMM` | +No | +No | +N/A | +
| `xHER` | +Yes | +Yes | +N/A | +
| `xHER2` | +No | +Yes | +N/A | +
| `xHERK` | +Yes | +Yes | +`xHER` | +
| `xHER2K` | +No | +Yes | +`xHER2` | +