From c793411c7e3f83d33d39cd6fc4f256e815e3ebc6 Mon Sep 17 00:00:00 2001 From: Mark Hoemmen Date: Wed, 7 Aug 2024 15:16:29 -0600 Subject: [PATCH 01/17] P3371R0: Fix rank-k & rank-2k updates --- fix-rankk-rank2k/Makefile | 3 + fix-rankk-rank2k/P3371R0.html | 1297 ++++++++++++++++++++++++++ fix-rankk-rank2k/README | 7 + fix-rankk-rank2k/fix-rankk-rank2k.md | 742 +++++++++++++++ 4 files changed, 2049 insertions(+) create mode 100644 fix-rankk-rank2k/Makefile create mode 100644 fix-rankk-rank2k/P3371R0.html create mode 100644 fix-rankk-rank2k/README create mode 100644 fix-rankk-rank2k/fix-rankk-rank2k.md diff --git a/fix-rankk-rank2k/Makefile b/fix-rankk-rank2k/Makefile new file mode 100644 index 00000000..655cdec6 --- /dev/null +++ b/fix-rankk-rank2k/Makefile @@ -0,0 +1,3 @@ +include ../P0009/wg21/Makefile + +.DEFAULT_GOAL := $(HTML) diff --git a/fix-rankk-rank2k/P3371R0.html b/fix-rankk-rank2k/P3371R0.html new file mode 100644 index 00000000..645b16b5 --- /dev/null +++ b/fix-rankk-rank2k/P3371R0.html @@ -0,0 +1,1297 @@ + + + + + + + + Fix C++26 by making the symmetric and Hermitian rank-k and rank-2k updates consistent with the BLAS + + + + + + + + +
+
+

Fix C++26 by making the +symmetric and Hermitian rank-k and rank-2k updates consistent with the +BLAS

+ + + + + + + + + + + + + + + + + + +
Document #: P3371R0
Date: 2024-08-07
Project: Programming Language C++
+ LEWG
+
Reply-to: + Mark Hoemmen
<>
+
+ +
+
+ +

1 Authors

+
    +
  • Mark Hoemmen (mhoemmen@nvidia.com) (NVIDIA)
  • +
+

2 Revision history

+
    +
  • Revision 0 to be submitted 2024-08-15
  • +
+

3 Abstract

+

The four std::linalg functions +symmetric_matrix_rank_k_update, +hermitian_matrix_rank_k_update, +symmetric_matrix_rank_2k_update, and +hermitian_matrix_rank_2k_update currently have behavior +inconsistent with their corresponding BLAS (Basic Linear Algebra +Subroutines) routines. Also, their behavior is inconsistent with +std::linalg’s matrix_product, even though in +mathematical terms they are special cases of a matrix-matrix +product.

+

We propose fixing this by making two changes. First, we will add +“updating” overloads analogous to the updating overloads of +matrix_product. For example, +symmetric_matrix_rank_k_update(A, scaled(beta, C), C, upper_triangle) +will perform C := βC + AAT. +Second, we will change the behavior of the existing overloads to be +“overwriting,” without changing the number of their parameters or how +they are called. For example, +symmetric_matrix_rank_k_update(A, C, upper_triangle) will +perform C := AAT +instead of C := C + AAT.

+

The second set of changes is a breaking change. Thus, we must finish +this before finalization of C++26.

+

4 Motivation

+

Each function in std::linalg generally corresponds to one or more +routines or functions in the original BLAS (Basic Linear Algebra +Subroutines). Every computation that the BLAS can do, a function in +std::linalg should be able to do.

+

One std::linalg user +reported +an exception to this rule. The BLAS routine DSYRK +(Double-precision SYmmetric Rank-K update) computes C := βC + αAAT, +but the corresponding std::linalg function +symmetric_matrix_rank_k_update only computes C := C + αAAT. +That is, std::linalg currently has no way to express this +BLAS operation with a general β scaling factor. This issue applies +to all of the following functions.

+
    +
  • symmetric_matrix_rank_k_update: computes C := C + αAAT
  • +
  • hermitian_matrix_rank_k_update: computes C := C + αAAH
  • +
  • symmetric_matrix_rank_2k_update: computes C := C + αABH + αBAH
  • +
  • hermitian_matrix_rank_2k_update: computes C := C + αABH + ᾱBAH, +where ᾱ denotes the complex +conjugate of α
  • +
+

These functions implement special cases of matrix-matrix products. +The matrix_product function in std::linalg +implements the general case of matrix-matrix products. This function +corresponds to the BLAS’s SGEMM, DGEMM, +CGEMM, and ZGEMM, which compute C := βC + αAB, +where α and β are scaling factors. The +matrix_product function has two kinds of overloads:

+
    +
  1. overwriting (C = AB) +and

  2. +
  3. updating (C = E + AB).

  4. +
+

The updating overloads handle the general α and β case by +matrix_product(scaled(alpha, A), B, scaled(beta, C), C). +The specification explicitly permits the input +scaled(beta, C) to alias the output C +([linalg.algs.blas3.gemm] 10: “Remarks: +C may alias E”). The std::linalg +library provides overwriting and updating overloads so that it can do +everything that the BLAS does, just in a more idiomatically C++ way. +Please see +P1673R13 +Section 10.3 (“Function argument aliasing and zero scalar +multipliers”) for a more detailed explanation.

+

The problem with functions like +symmetric_matrix_rank_k_update is that they have the same +calling syntax as the overwriting version of +matrix_product, but semantics that differ from +both the overwriting and the updating versions of +matrix_product. The current overloads are not overwriting, +so we can’t just fix this problem by introducing an “updating” overload +for each function.

+

Incidentally, the fact that these functions have “update” in their +name is not relevant, because that naming choice is original to the +BLAS. The BLAS calls its corresponding xSYRK, +xHERK, xSYR2K, and xHER2K +routines “{Symmetric, Hermitian} rank {one, two} update,” even though +setting β = 0 makes these +routines “overwriting” in the sense of std::linalg.

+

5 Proposed changes

+

We propose to fix this by making the four functions work just like +matrix_vector_product or matrix_product. This +entails two changes.

+
    +
  1. We will add “updating” overloads with a new input matrix +parameter E, analogous to the updating overloads of +matrix_product. For example, +symmetric_matrix_rank_k_update(A, E, C, upper_triangle) +will compute C = E + AAT. +We will specifically permit C and E to alias, +thus permitting the desired case where E is +scaled(beta, C). The updating overloads will take +E as an in-matrix, and will change +C from a possibly-packed-inout-matrix +to a possibly-packed-out-matrix.

  2. +
  3. We will change the behavior of the existing overloads to be +“overwriting,” without changing how they are called. For example, +symmetric_matrix_rank_k_update(A, C, upper_triangle) will +compute C = AAT +instead of C := C + AAT. +The overwriting overloads will change C from a +possibly-packed-inout-matrix to a +possibly-packed-out-matrix.

  4. +
+

This second set of changes is a breaking change. It needs to be so +that we can provide the overwriting behavior C := αAAT +that the corresponding BLAS routines already provide. Thus, we must +finish this before finalization of C++26.

+

Both sets of overloads still only write to the specified triangle +(lower or upper) of the output matrix C. As a result, the +new updating overloads only read from that triangle of the input matrix +E. Therefore, even though E may be a different +matrix than C, the updating overloads do not need an +additional Triangle t_E parameter for E. The +symmetric_* functions interpret E as symmetric +in the same way that they interpret C as symmetric, and the +hermitian_* functions interpret E as Hermitian +in the same way that they interpret C as Hermitian.

+

6 We do NOT propose changing rank-1 +or rank-2 update functions

+

The four functions we propose to change have the following rank-1 and +rank-2 analogs, where A +denotes a symmetric or Hermitian matrix (depending on the function’s +name) and x and y denote vectors.

+
    +
  • symmetric_matrix_rank_1_update: computes A := A + αxxT
  • +
  • hermetian_matrix_rank_1_update: computes A := A + αxxH
  • +
  • symmetric_matrix_rank_2_update: computes A := A + αxyT + αyxT
  • +
  • hermitian_matrix_rank_2_update: computes A := A + αxyH + ᾱxyH
  • +
+

We do NOT propose to change these functions analogously to the rank-k +and rank-2k update functions. This is because the BLAS routines +corresponding to the rank-1 and rank-2 functions – xSYR, +xHER, xSYR2, and xHER2 – do not +have a way to supply a β +scaling factor. That is, these std::linalg functions can +already do everything that their corresponding BLAS routines can do. +This is consistent with our design intent in +Section +10.3 of P1673R3 for translating Fortran INTENT(INOUT) +arguments into a C++ idiom.

+
+
    +
  1. Else, if the BLAS function unconditionally updates (like +xGER), we retain read-and-write behavior for that +argument.

  2. +
  3. Else, if the BLAS function uses a scalar beta +argument to decide whether to read the output argument as well as write +to it (like xGEMM), we provide two versions: a write-only +[that is, “overwriting”] version (as if beta is zero), and +a read-and-write [that is, “updating”] version (as if beta +is nonzero).

  4. +
+
+

The rank-1 and rank-2 update functions “unconditionally update,” in +the same way that the BLAS’s general rank-1 update routine +xGER does. However, the BLAS’s rank-k and rank-2k update +functions “use a scalar beta argument…,” so for +consistency, it makes sense for std::linalg to provide both +overwriting and updating versions.

+

Since we do not propose changing the symmetric and Hermitian rank-1 +and rank-2 functions, we retain the exposition-only concept +possibly-packed-inout-matrix, which they use to +constrain their parameter A.

+

7 Fixes for some wording issues

+

There are some wording issues in +[linalg.algs.blas3.rankk] and +[linalg.algs.blas3.rank2k], specifically in the +Mandates, Preconditions, and Complexity clauses. These clauses do not +reflect the design intent, which is expressed by the corresponding BLAS +routines and the mathematical expressions in the current wording that +express the functions’ behavior. We have not yet filed an LWG issue for +these wording issues. Given that the above changes will call for +adjusting some parts of these clauses anyway (e.g., by changing +InOutMat to OutMat), we have decided to +propose “drive-by” fixes to these clauses in this proposal. We +nevertheless plan to file an LWG issue to fix both these and other +wording issues we have found.

+

8 Wording

+
+

Text in blockquotes is not proposed wording, but rather instructions +for generating proposed wording. The � character is used to denote a +placeholder section number which the editor shall determine.

+

In [version.syn], for the following definition,

+
+
#define __cpp_lib_linalg YYYYMML // also in <linalg>
+
+

adjust the placeholder value YYYYMML as needed so as to +denote this proposal’s date of adoption.

+
+

8.1 New exposition-only concept

+
+

In the header <linalg> synopsis +[linalg.syn], immediately before the following,

+
+
  template<class T>
+    concept possibly-packed-inout-matrix = see below;   // exposition only
+
+

add the following.

+
+
  template<class T>
+    concept possibly-packed-out-matrix = see below;   // exposition only
+

Then, to [linalg.helpers.concepts], immediately +before the following,

+
template<class T>
+  concept possibly-packed-inout-matrix =
+    is-mdspan<T> && T::rank() == 2 &&
+    is_assignable_v<typename T::reference, typename T::element_type> &&
+    (T::is_always_unique() || is-layout-blas-packed<typename T::layout_type>);
+
+

add the following definition of the exposition-only concept +possibly-packed-out-matrix.

+
+
template<class T>
+  concept possibly-packed-out-matrix =
+    is-mdspan<T> && T::rank() == 2 &&
+    is_assignable_v<typename T::reference, typename T::element_type> &&
+    (T::is_always_unique() || is-layout-blas-packed<typename T::layout_type>);
+

8.2 Rank-k update functions in +synopsis

+
+

In the header <linalg> synopsis +[linalg.syn], change the following

+
+
  // rank-k symmetric matrix update
+  template<class Scalar, in-matrix InMat, possibly-packed-inout-matrix InOutMat, class Triangle>
+    void symmetric_matrix_rank_k_update(Scalar alpha, InMat A, InOutMat C, Triangle t);
+  template<class ExecutionPolicy, class Scalar,
+           in-matrix InMat, possibly-packed-inout-matrix InOutMat, class Triangle>
+    void symmetric_matrix_rank_k_update(ExecutionPolicy&& exec,
+                                        Scalar alpha, InMat A, InOutMat C, Triangle t);
+
+  template<in-matrix InMat, possibly-packed-inout-matrix InOutMat, class Triangle>
+    void symmetric_matrix_rank_k_update(InMat A, InOutMat C, Triangle t);
+  template<class ExecutionPolicy,
+           in-matrix InMat, possibly-packed-inout-matrix InOutMat, class Triangle>
+    void symmetric_matrix_rank_k_update(ExecutionPolicy&& exec,
+                                        InMat A, InOutMat C, Triangle t);
+
+  // rank-k Hermitian matrix update
+  template<class Scalar, in-matrix InMat, possibly-packed-inout-matrix InOutMat, class Triangle>
+    void hermitian_matrix_rank_k_update(Scalar alpha, InMat A, InOutMat C, Triangle t);
+  template<class ExecutionPolicy,
+           class Scalar, in-matrix InMat, possibly-packed-inout-matrix InOutMat, class Triangle>
+    void hermitian_matrix_rank_k_update(ExecutionPolicy&& exec,
+                                        Scalar alpha, InMat A, InOutMat C, Triangle t);
+
+  template<in-matrix InMat, possibly-packed-inout-matrix InOutMat, class Triangle>
+    void hermitian_matrix_rank_k_update(InMat A, InOutMat C, Triangle t);
+  template<class ExecutionPolicy,
+           in-matrix InMat, possibly-packed-inout-matrix InOutMat, class Triangle>
+    void hermitian_matrix_rank_k_update(ExecutionPolicy&& exec,
+                                        InMat A, InOutMat C, Triangle t);
+
+

to read as follows.

+
+
  // overwriting rank-k symmetric matrix update
+  template<class Scalar,
+           in-matrix InMat,
+           possibly-packed-out-matrix OutMat,
+           class Triangle>
+    void symmetric_matrix_rank_k_update(
+      Scalar alpha, InMat A, OutMat C, Triangle t);
+  template<class ExecutionPolicy, class Scalar,
+           in-matrix InMat,
+           possibly-packed-out-matrix OutMat,
+           class Triangle>
+    void symmetric_matrix_rank_k_update(
+      ExecutionPolicy&& exec, Scalar alpha, InMat A, OutMat C, Triangle t);
+  template<in-matrix InMat,
+           possibly-packed-out-matrix OutMat,
+           class Triangle>
+    void symmetric_matrix_rank_k_update(
+      InMat A, OutMat C, Triangle t);
+  template<class ExecutionPolicy,
+           in-matrix InMat,
+           possibly-packed-out-matrix OutMat,
+           class Triangle>
+    void symmetric_matrix_rank_k_update(
+      ExecutionPolicy&& exec, InMat A, OutMat C, Triangle t);
+
+  // updating rank-k symmetric matrix update
+  template<class Scalar,
+           in-matrix InMat1,
+           in-matrix InMat2,
+           possibly-packed-out-matrix OutMat,
+           class Triangle>
+    void symmetric_matrix_rank_k_update(
+      Scalar alpha,
+      InMat1 A, InMat2 E, OutMat C, Triangle t);
+  template<class ExecutionPolicy, class Scalar,
+           in-matrix InMat1,
+           in-matrix InMat2,
+           possibly-packed-out-matrix OutMat,
+           class Triangle>
+    void symmetric_matrix_rank_k_update(
+      ExecutionPolicy&& exec, Scalar alpha,
+      InMat1 A, InMat2 E, OutMat C, Triangle t);
+  template<in-matrix InMat1,
+           in-matrix InMat2,
+           possibly-packed-out-matrix OutMat,
+           class Triangle>
+    void symmetric_matrix_rank_k_update(
+      InMat1 A, InMat2 E, OutMat C, Triangle t);
+  template<class ExecutionPolicy,
+           in-matrix InMat1,
+           in-matrix InMat2,
+           possibly-packed-out-matrix OutMat,
+           class Triangle>
+    void symmetric_matrix_rank_k_update(
+      ExecutionPolicy&& exec,
+      InMat1 A, InMat2 E, OutMat C, Triangle t);
+
+  // overwriting rank-k Hermitian matrix update
+  template<class Scalar,
+           in-matrix InMat,
+           possibly-packed-out-matrix OutMat,
+           class Triangle>
+    void hermitian_matrix_rank_k_update(
+      Scalar alpha, InMat A, OutMat C, Triangle t);
+  template<class ExecutionPolicy, class Scalar,
+           in-matrix InMat,
+           possibly-packed-out-matrix OutMat,
+           class Triangle>
+    void hermitian_matrix_rank_k_update(
+      ExecutionPolicy&& exec, Scalar alpha, InMat A, OutMat C, Triangle t);
+  template<in-matrix InMat,
+           possibly-packed-out-matrix OutMat,
+           class Triangle>
+    void hermitian_matrix_rank_k_update(
+      InMat A, OutMat C, Triangle t);
+  template<class ExecutionPolicy,
+           in-matrix InMat,
+           possibly-packed-out-matrix OutMat,
+           class Triangle>
+    void hermitian_matrix_rank_k_update(
+      ExecutionPolicy&& exec, InMat A, OutMat C, Triangle t);
+
+  // updating rank-k Hermitian matrix update
+  template<class Scalar,
+           in-matrix InMat1,
+           in-matrix InMat2,
+           possibly-packed-out-matrix OutMat,
+           class Triangle>
+    void hermitian_matrix_rank_k_update(
+      Scalar alpha,
+      InMat1 A, InMat2 E, OutMat C, Triangle t);
+  template<class ExecutionPolicy, class Scalar,
+           in-matrix InMat1,
+           in-matrix InMat2,
+           possibly-packed-out-matrix OutMat,
+           class Triangle>
+    void hermitian_matrix_rank_k_update(
+      ExecutionPolicy&& exec, Scalar alpha,
+      InMat1 A, InMat2 E, OutMat C, Triangle t);
+  template<in-matrix InMat1,
+           in-matrix InMat2,
+           possibly-packed-out-matrix OutMat,
+           class Triangle>
+    void hermitian_matrix_rank_k_update(
+      InMat1 A, InMat2 E, OutMat C, Triangle t);
+  template<class ExecutionPolicy,
+           in-matrix InMat1,
+           in-matrix InMat2,
+           possibly-packed-out-matrix OutMat,
+           class Triangle>
+    void hermitian_matrix_rank_k_update(
+      ExecutionPolicy&& exec,
+      InMat1 A, InMat2 E, OutMat C, Triangle t);
+

8.3 Rank-2k update functions in +synopsis

+
+

In the header <linalg> synopsis +[linalg.syn], change the following

+
+
  // rank-2k symmetric matrix update
+  template<in-matrix InMat1, in-matrix InMat2,
+           possibly-packed-inout-matrix InOutMat, class Triangle>
+    void symmetric_matrix_rank_2k_update(InMat1 A, InMat2 B, InOutMat C, Triangle t);
+  template<class ExecutionPolicy,
+           in-matrix InMat1, in-matrix InMat2,
+           possibly-packed-inout-matrix InOutMat, class Triangle>
+    void symmetric_matrix_rank_2k_update(ExecutionPolicy&& exec,
+                                         InMat1 A, InMat2 B, InOutMat C, Triangle t);
+
+  // rank-2k Hermitian matrix update
+  template<in-matrix InMat1, in-matrix InMat2,
+           possibly-packed-inout-matrix InOutMat, class Triangle>
+    void hermitian_matrix_rank_2k_update(InMat1 A, InMat2 B, InOutMat C, Triangle t);
+  template<class ExecutionPolicy,
+           in-matrix InMat1, in-matrix InMat2,
+           possibly-packed-inout-matrix InOutMat, class Triangle>
+    void hermitian_matrix_rank_2k_update(ExecutionPolicy&& exec,
+                                         InMat1 A, InMat2 B, InOutMat C, Triangle t);
+
+

to read as follows.

+
+
  // overwriting rank-2k symmetric matrix update
+  template<in-matrix InMat1, in-matrix InMat2,
+           possibly-packed-out-matrix OutMat, class Triangle>
+    void symmetric_matrix_rank_2k_update(InMat1 A, InMat2 B, OutMat C, Triangle t);
+  template<class ExecutionPolicy,
+           in-matrix InMat1, in-matrix InMat2,
+           possibly-packed-out-matrix OutMat, class Triangle>
+    void symmetric_matrix_rank_2k_update(ExecutionPolicy&& exec,
+                                         InMat1 A, InMat2 B, OutMat C, Triangle t);
+
+  // updating rank-2k symmetric matrix update
+  template<in-matrix InMat1, in-matrix InMat2, in-matrix InMat3,
+           possibly-packed-out-matrix OutMat, class Triangle>
+    void symmetric_matrix_rank_2k_update(InMat1 A, InMat2 B, InMat3 E, OutMat C, Triangle t);
+  template<class ExecutionPolicy,
+           in-matrix InMat1, in-matrix InMat2, in-matrix InMat3,
+           possibly-packed-out-matrix OutMat, class Triangle>
+    void symmetric_matrix_rank_2k_update(ExecutionPolicy&& exec,
+                                         InMat1 A, InMat2 B, InMat3 E, OutMat C, Triangle t);
+
+  // overwriting rank-2k Hermitian matrix update
+  template<in-matrix InMat1, in-matrix InMat2,
+           possibly-packed-out-matrix OutMat, class Triangle>
+    void hermitian_matrix_rank_2k_update(InMat1 A, InMat2 B, OutMat C, Triangle t);
+  template<class ExecutionPolicy,
+           in-matrix InMat1, in-matrix InMat2,
+           possibly-packed-out-matrix OutMat, class Triangle>
+    void hermitian_matrix_rank_2k_update(ExecutionPolicy&& exec,
+                                         InMat1 A, InMat2 B, OutMat C, Triangle t);
+
+  // updating rank-2k Hermitian matrix update
+  template<in-matrix InMat1, in-matrix InMat2, in-matrix InMat3,
+           possibly-packed-out-matrix OutMat, class Triangle>
+    void hermitian_matrix_rank_2k_update(InMat1 A, InMat2 B, InMat3 E, OutMat C, Triangle t);
+  template<class ExecutionPolicy,
+           in-matrix InMat1, in-matrix InMat2, in-matrix InMat3,
+           possibly-packed-out-matrix OutMat, class Triangle>
+    void hermitian_matrix_rank_2k_update(ExecutionPolicy&& exec,
+                                         InMat1 A, InMat2 B, InMat3 E, OutMat C, Triangle t);
+

8.4 Specification of rank-k update +functions

+
+

Replace the entire contents of [linalg.algs.blas3.rankk] with the +following.

+
+

[Note: These functions correspond to the BLAS functions +xSYRK and xHERK. – end note]

+

1 +The following elements apply to all functions in +[linalg.algs.blas3.rankk].

+

2 +Mandates:

+
    +
  • (2.1) If +OutMat has layout_blas_packed layout, then the +layout’s Triangle template argument has the same type as +the function’s Triangle template argument.

  • +
  • (2.2) +possibly-multipliable<decltype(A), decltype(transposed(A)), decltype(C)> +is true.

  • +
  • (2.3) +possibly-addable<decltype(C), decltype(E), decltype(C)> +is true for those overloads that take an E +parameter.

  • +
+

3 +Preconditions:

+
    +
  • (3.1) +multipliable(A, transposed(A), C) is +true. [Note: This implies that C is +square. – end note]

  • +
  • (3.2) +addable(C, E, C) is true +for those overloads that take an E parameter.

  • +
+

4 +Complexity: O( +A.extent(0) +A.extent(1) +A.extent(0) ).

+
template<class Scalar,
+         in-matrix InMat,
+         possibly-packed-out-matrix OutMat,
+         class Triangle>
+void symmetric_matrix_rank_k_update(
+  Scalar alpha,
+  InMat A,
+  OutMat C,
+  Triangle t);
+template<class ExecutionPolicy,
+         class Scalar,
+         in-matrix InMat,
+         possibly-packed-out-matrix OutMat,
+         class Triangle>
+void symmetric_matrix_rank_k_update(
+  ExecutionPolicy&& exec,
+  Scalar alpha,
+  InMat A,
+  OutMat C,
+  Triangle t);
+

5 +Effects: Computes C = αAAT, +where the scalar α is +alpha.

+
template<in-matrix InMat,
+         possibly-packed-out-matrix OutMat,
+         class Triangle>
+void symmetric_matrix_rank_k_update(
+  InMat A,
+  OutMat C,
+  Triangle t);
+template<class ExecutionPolicy,
+         in-matrix InMat,
+         possibly-packed-out-matrix OutMat,
+         class Triangle>
+void symmetric_matrix_rank_k_update(
+  ExecutionPolicy&& exec,
+  InMat A,
+  OutMat C,
+  Triangle t);
+

6 +Effects: Computes C = AAT.

+
template<class Scalar,
+         in-matrix InMat,
+         possibly-packed-out-matrix OutMat,
+         class Triangle>
+void hermitian_matrix_rank_k_update(
+  Scalar alpha,
+  InMat A,
+  OutMat C,
+  Triangle t);
+template<class ExecutionPolicy,
+         class Scalar,
+         in-matrix InMat,
+         possibly-packed-out-matrix OutMat,
+         class Triangle>
+void hermitian_matrix_rank_k_update(
+  ExecutionPolicy&& exec,
+  Scalar alpha,
+  InMat A,
+  OutMat C,
+  Triangle t);
+

7 +Effects: Computes C = αAAH, +where the scalar α is +alpha.

+
template<in-matrix InMat,
+         possibly-packed-out-matrix OutMat,
+         class Triangle>
+void hermitian_matrix_rank_k_update(
+  InMat A,
+  OutMat C,
+  Triangle t);
+template<class ExecutionPolicy,
+         in-matrix InMat,
+         possibly-packed-out-matrix OutMat,
+         class Triangle>
+void hermitian_matrix_rank_k_update(
+  ExecutionPolicy&& exec,
+  InMat A,
+  OutMat C,
+  Triangle t);
+

8 +Effects: Computes C = AAH.

+
template<class Scalar,
+         in-matrix InMat1,
+         in-matrix InMat2,
+         possibly-packed-out-matrix OutMat,
+         class Triangle>
+void symmetric_matrix_rank_k_update(
+  Scalar alpha,
+  InMat1 A,
+  InMat2 E,
+  OutMat C,
+  Triangle t);
+template<class ExecutionPolicy,
+         class Scalar,
+         in-matrix InMat1,
+         in-matrix InMat2,
+         possibly-packed-out-matrix OutMat,
+         class Triangle>
+void symmetric_matrix_rank_k_update(
+  ExecutionPolicy&& exec,
+  Scalar alpha,
+  InMat1 A,
+  InMat2 E,
+  OutMat C,
+  Triangle t);
+

9 +Effects: Computes C = E + αAAT, +where the scalar α is +alpha.

+
template<in-matrix InMat1,
+         in-matrix InMat2,
+         possibly-packed-out-matrix OutMat,
+         class Triangle>
+void symmetric_matrix_rank_k_update(
+  InMat1 A,
+  InMat2 E,
+  OutMat C,
+  Triangle t);
+template<class ExecutionPolicy,
+         in-matrix InMat1,
+         in-matrix InMat2,
+         possibly-packed-out-matrix OutMat,
+         class Triangle>
+void symmetric_matrix_rank_k_update(
+  ExecutionPolicy&& exec,
+  InMat1 A,
+  InMat2 E,
+  OutMat C,
+  Triangle t);
+

10 +Effects: Computes C = E + AAT.

+
template<class Scalar,
+         in-matrix InMat1,
+         in-matrix InMat2,
+         possibly-packed-out-matrix OutMat,
+         class Triangle>
+void hermitian_matrix_rank_k_update(
+  Scalar alpha,
+  InMat1 A,
+  InMat2 E,
+  OutMat C,
+  Triangle t);
+template<class ExecutionPolicy,
+         class Scalar,
+         in-matrix InMat1,
+         in-matrix InMat2,
+         possibly-packed-out-matrix OutMat,
+         class Triangle>
+void hermitian_matrix_rank_k_update(
+  ExecutionPolicy&& exec,
+  Scalar alpha,
+  InMat1 A,
+  InMat2 E,
+  OutMat C,
+  Triangle t);
+

11 +Effects: Computes C = E + αAAH, +where the scalar α is +alpha.

+
template<in-matrix InMat1,
+         in-matrix InMat2,
+         possibly-packed-out-matrix OutMat,
+         class Triangle>
+void hermitian_matrix_rank_k_update(
+  InMat1 A,
+  InMat2 E,
+  OutMat C,
+  Triangle t);
+template<class ExecutionPolicy,
+         in-matrix InMat1,
+         in-matrix InMat2,
+         possibly-packed-out-matrix OutMat,
+         class Triangle>
+void hermitian_matrix_rank_k_update(
+  ExecutionPolicy&& exec,
+  InMat1 A,
+  InMat2 E,
+  OutMat C,
+  Triangle t);
+

12 +Effects: Computes C = E + AAH.

+

8.5 Specification of rank-2k update +functions

+
+

Replace the entire contents of [linalg.algs.blas3.rank2k] with the +following.

+
+

[Note: These functions correspond to the BLAS functions +xSYR2K and xHER2K. – end note]

+

1 +The following elements apply to all functions in +[linalg.algs.blas3.rank2k].

+

2 +Mandates:

+
    +
  • (2.1) If +OutMat has layout_blas_packed layout, then the +layout’s Triangle template argument has the same type as +the function’s Triangle template argument;

  • +
  • (2.2) +possibly-multipliable<decltype(A), decltype(transposed(B)), decltype(C)> +is true.

  • +
  • (2.3) +possibly-multipliable<decltype(B), decltype(transposed(A)), decltype(C)> +is true.

  • +
  • (2.4) +possibly-addable<decltype(C), decltype(E), decltype(C)> +is true for those overloads that take an E +parameter.

  • +
+

3 +Preconditions:

+
    +
  • (3.1) +multipliable(A, transposed(B), C) is +true.

  • +
  • (3.2) +multipliable(B, transposed(A), C) is +true. [Note: This and the previous imply that +C is square. – end note]

  • +
  • (3.3) +addable(C, E, C) is true +for those overloads that take an E parameter.

  • +
+

4 +Complexity: O( +A.extent(0) +A.extent(1) +B.extent(0) )

+
template<in-matrix InMat1,
+         in-matrix InMat2,
+         possibly-packed-out-matrix OutMat,
+         class Triangle>
+void symmetric_matrix_rank_2k_update(
+  InMat1 A,
+  InMat2 B,
+  OutMat C,
+  Triangle t);
+template<class ExecutionPolicy,
+         in-matrix InMat1,
+         in-matrix InMat2,
+         possibly-packed-out-matrix OutMat,
+         class Triangle>
+void symmetric_matrix_rank_2k_update(
+  ExecutionPolicy&& exec,
+  InMat1 A,
+  InMat2 B,
+  OutMat C,
+  Triangle t);
+

5 +Effects: Computes C = ABT + BAT.

+
template<in-matrix InMat1,
+         in-matrix InMat2,
+         possibly-packed-out-matrix OutMat,
+         class Triangle>
+void hermitian_matrix_rank_2k_update(
+  InMat1 A,
+  InMat2 B,
+  OutMat C,
+  Triangle t);
+template<class ExecutionPolicy,
+         in-matrix InMat1,
+         in-matrix InMat2,
+         possibly-packed-out-matrix OutMat,
+         class Triangle>
+void hermitian_matrix_rank_2k_update(
+  ExecutionPolicy&& exec,
+  InMat1 A,
+  InMat2 B,
+  OutMat C,
+  Triangle t);
+

6 +Effects: Computes C = ABH + BAH.

+
template<in-matrix InMat1,
+         in-matrix InMat2,
+         in-matrix InMat3,
+         possibly-packed-out-matrix OutMat,
+         class Triangle>
+void symmetric_matrix_rank_2k_update(
+  InMat1 A,
+  InMat2 B,
+  InMat3 E,
+  OutMat C,
+  Triangle t);
+template<class ExecutionPolicy,
+         in-matrix InMat1,
+         in-matrix InMat2,
+         in-matrix InMat3,
+         possibly-packed-out-matrix OutMat,
+         class Triangle>
+void symmetric_matrix_rank_2k_update(
+  ExecutionPolicy&& exec,
+  InMat1 A,
+  InMat2 B,
+  InMat3 E,
+  OutMat C,
+  Triangle t);
+

7 +Effects: Computes C = E + ABT + BAT.

+
template<in-matrix InMat1,
+         in-matrix InMat2,
+         in-matrix InMat3,
+         possibly-packed-out-matrix OutMat,
+         class Triangle>
+void hermitian_matrix_rank_2k_update(
+  InMat1 A,
+  InMat2 B,
+  InMat3 E,
+  OutMat C,
+  Triangle t);
+template<class ExecutionPolicy,
+         in-matrix InMat1,
+         in-matrix InMat2,
+         in-matrix InMat3,
+         possibly-packed-out-matrix OutMat,
+         class Triangle>
+void hermitian_matrix_rank_2k_update(
+  ExecutionPolicy&& exec,
+  InMat1 A,
+  InMat2 B,
+  InMat3 E,
+  OutMat C,
+  Triangle t);
+

8 +Effects: Computes C = E + ABH + BAH.

+
+
+ + diff --git a/fix-rankk-rank2k/README b/fix-rankk-rank2k/README new file mode 100644 index 00000000..84b5ae65 --- /dev/null +++ b/fix-rankk-rank2k/README @@ -0,0 +1,7 @@ +To render the document on Linux: + +1. Install "pandoc" (a new enough version that `pandoc --citeproc` works -- + this may require downloading a recent binary or source package, + even on a fairly recent Ubuntu or WSL) +2. Install the Python "panflute" package +3. Run make diff --git a/fix-rankk-rank2k/fix-rankk-rank2k.md b/fix-rankk-rank2k/fix-rankk-rank2k.md new file mode 100644 index 00000000..7a746b71 --- /dev/null +++ b/fix-rankk-rank2k/fix-rankk-rank2k.md @@ -0,0 +1,742 @@ + +--- +title: "Fix C++26 by making the symmetric and Hermitian rank-k and rank-2k updates consistent with the BLAS" +document: P3371R0 +date: today +audience: LEWG +author: + - name: Mark Hoemmen + email: +toc: true +--- + +# Authors + +* Mark Hoemmen (mhoemmen@nvidia.com) (NVIDIA) + +# Revision history + +* Revision 0 to be submitted 2024-08-15 + +# Abstract + +The four `std::linalg` functions `symmetric_matrix_rank_k_update`, `hermitian_matrix_rank_k_update`, `symmetric_matrix_rank_2k_update`, and `hermitian_matrix_rank_2k_update` currently have behavior inconsistent with their corresponding BLAS (Basic Linear Algebra Subroutines) routines. Also, their behavior is inconsistent with `std::linalg`'s `matrix_product`, even though in mathematical terms they are special cases of a matrix-matrix product. + +We propose fixing this by making two changes. First, we will add "updating" overloads analogous to the updating overloads of `matrix_product`. For example, `symmetric_matrix_rank_k_update(A, scaled(beta, C), C, upper_triangle)` will perform $C := \beta C + A A^T$. Second, we will change the behavior of the existing overloads to be "overwriting," without changing the number of their parameters or how they are called. For example, `symmetric_matrix_rank_k_update(A, C, upper_triangle)` will perform $C := A A^T$ instead of $C := C + A A^T$. + +The second set of changes is a breaking change. Thus, we must finish this before finalization of C++26. + +# Motivation + +Each function in std::linalg generally corresponds to one or more routines or functions in the original BLAS (Basic Linear Algebra Subroutines). Every computation that the BLAS can do, a function in std::linalg should be able to do. + +One std::linalg user reported an exception to this rule. The BLAS routine `DSYRK` (Double-precision SYmmetric Rank-K update) computes $C := \beta C + \alpha A A^T$, but the corresponding `std::linalg` function `symmetric_matrix_rank_k_update` only computes $C := C + \alpha A A^T$. That is, `std::linalg` currently has no way to express this BLAS operation with a general $\beta$ scaling factor. This issue applies to all of the following functions. + +* `symmetric_matrix_rank_k_update`: computes $C := C + \alpha A A^T$ +* `hermitian_matrix_rank_k_update`: computes $C := C + \alpha A A^H$ +* `symmetric_matrix_rank_2k_update`: computes $C := C + \alpha A B^H + \alpha B A^H$ +* `hermitian_matrix_rank_2k_update`: computes $C := C + \alpha A B^H + \bar{\alpha} B A^H$, where $\bar{\alpha}$ denotes the complex conjugate of $\alpha$ + +These functions implement special cases of matrix-matrix products. The `matrix_product` function in `std::linalg` implements the general case of matrix-matrix products. This function corresponds to the BLAS's `SGEMM`, `DGEMM`, `CGEMM`, and `ZGEMM`, which compute $C := \beta C + \alpha A B$, where $\alpha$ and $\beta$ are scaling factors. The `matrix_product` function has two kinds of overloads: + +1. *overwriting* ($C = A B$) and + +2. *updating* ($C = E + A B$). + +The updating overloads handle the general $\alpha$ and $\beta$ case by `matrix_product(scaled(alpha, A), B, scaled(beta, C), C)`. The specification explicitly permits the input `scaled(beta, C)` to alias the output `C` (**[linalg.algs.blas3.gemm]** 10: "*Remarks*: `C` may alias `E`"). The `std::linalg` library provides overwriting and updating overloads so that it can do everything that the BLAS does, just in a more idiomatically C++ way. Please see P1673R13 Section 10.3 ("Function argument aliasing and zero scalar multipliers") for a more detailed explanation. + +The problem with functions like `symmetric_matrix_rank_k_update` is that they have the same _calling syntax_ as the overwriting version of `matrix_product`, but _semantics_ that differ from both the overwriting and the updating versions of `matrix_product`. The current overloads are not overwriting, so we can't just fix this problem by introducing an "updating" overload for each function. + +Incidentally, the fact that these functions have "update" in their name is not relevant, because that naming choice is original to the BLAS. The BLAS calls its corresponding `xSYRK`, `xHERK`, `xSYR2K`, and `xHER2K` routines "{Symmetric, Hermitian} rank {one, two} update," even though setting $\beta = 0$ makes these routines "overwriting" in the sense of `std::linalg`. + +# Proposed changes + +We propose to fix this by making the four functions work just like `matrix_vector_product` or `matrix_product`. This entails two changes. + +1. We will add "updating" overloads with a new input matrix parameter `E`, analogous to the updating overloads of `matrix_product`. For example, `symmetric_matrix_rank_k_update(A, E, C, upper_triangle)` will compute $C = E + A A^T$. We will specifically permit `C` and `E` to alias, thus permitting the desired case where `E` is `scaled(beta, C)`. The updating overloads will take `E` as an _`in-matrix`_, and will change `C` from a _`possibly-packed-inout-matrix`_ to a _`possibly-packed-out-matrix`_. + +2. We will change the behavior of the existing overloads to be "overwriting," without changing how they are called. For example, `symmetric_matrix_rank_k_update(A, C, upper_triangle)` will compute $C = A A^T$ instead of $C := C + A A^T$. The overwriting overloads will change `C` from a _`possibly-packed-inout-matrix`_ to a _`possibly-packed-out-matrix`_. + +This second set of changes is a breaking change. It needs to be so that we can provide the overwriting behavior $C := \alpha A A^T$ that the corresponding BLAS routines already provide. Thus, we must finish this before finalization of C++26. + +Both sets of overloads still only write to the specified triangle (lower or upper) of the output matrix `C`. As a result, the new updating overloads only read from that triangle of the input matrix `E`. Therefore, even though `E` may be a different matrix than `C`, the updating overloads do not need an additional `Triangle t_E` parameter for `E`. The `symmetric_*` functions interpret `E` as symmetric in the same way that they interpret `C` as symmetric, and the `hermitian_*` functions interpret `E` as Hermitian in the same way that they interpret `C` as Hermitian. + +# We do NOT propose changing rank-1 or rank-2 update functions + +The four functions we propose to change have the following rank-1 and rank-2 analogs, where $A$ denotes a symmetric or Hermitian matrix (depending on the function's name) and $x$ and $y$ denote vectors. + +* `symmetric_matrix_rank_1_update`: computes $A := A + \alpha x x^T$ +* `hermetian_matrix_rank_1_update`: computes $A := A + \alpha x x^H$ +* `symmetric_matrix_rank_2_update`: computes $A := A + \alpha x y^T + \alpha y x^T$ +* `hermitian_matrix_rank_2_update`: computes $A := A + \alpha x y^H + \bar{\alpha} x y^H$ + +We do NOT propose to change these functions analogously to the rank-k and rank-2k update functions. This is because the BLAS routines corresponding to the rank-1 and rank-2 functions -- `xSYR`, `xHER`, `xSYR2`, and `xHER2` -- do not have a way to supply a $\beta$ scaling factor. That is, these `std::linalg` functions can already do everything that their corresponding BLAS routines can do. This is consistent with our design intent in Section 10.3 of P1673R3 for translating Fortran `INTENT(INOUT)` arguments into a C++ idiom. + +> b. Else, if the BLAS function unconditionally updates (like `xGER`), we retain read-and-write behavior for that argument. +> +> c. Else, if the BLAS function uses a scalar `beta` argument to decide whether to read the output argument as well as write to it (like `xGEMM`), we provide two versions: a write-only [that is, "overwriting"] version (as if `beta` is zero), and a read-and-write [that is, "updating"] version (as if `beta` is nonzero). + +The rank-1 and rank-2 update functions "unconditionally update," in the same way that the BLAS's general rank-1 update routine `xGER` does. However, the BLAS's rank-k and rank-2k update functions "use a scalar `beta` argument...," so for consistency, it makes sense for `std::linalg` to provide both overwriting and updating versions. + +Since we do not propose changing the symmetric and Hermitian rank-1 and rank-2 functions, we retain the exposition-only concept _`possibly-packed-inout-matrix`_, which they use to constrain their parameter `A`. + +# Fixes for some wording issues + +There are some wording issues in **[linalg.algs.blas3.rankk]** and **[linalg.algs.blas3.rank2k]**, specifically in the Mandates, Preconditions, and Complexity clauses. These clauses do not reflect the design intent, which is expressed by the corresponding BLAS routines and the mathematical expressions in the current wording that express the functions' behavior. We have not yet filed an LWG issue for these wording issues. Given that the above changes will call for adjusting some parts of these clauses anyway (e.g., by changing `InOutMat` to `OutMat`), we have decided to propose "drive-by" fixes to these clauses in this proposal. We nevertheless plan to file an LWG issue to fix both these and other wording issues we have found. + +# Wording + +> Text in blockquotes is not proposed wording, +> but rather instructions for generating proposed wording. +> The � character is used to denote a placeholder section number +> which the editor shall determine. +> +> In **[version.syn]**, for the following definition, + +```c++ +#define __cpp_lib_linalg YYYYMML // also in +``` + +> adjust the placeholder value `YYYYMML` as needed +> so as to denote this proposal's date of adoption. + +## New exposition-only concept + +> In the header `` synopsis **[linalg.syn]**, +> immediately before the following, + +```c++ + template + concept @_possibly-packed-inout-matrix_@ = @_see below_@; // exposition only +``` + +> add the following. + +```c++ + template + concept @_possibly-packed-out-matrix_@ = @_see below_@; // exposition only +``` + +Then, to **[linalg.helpers.concepts]**, immediately before the following, + +```c++ +template + concept @_possibly-packed-inout-matrix_@ = + is-mdspan && T::rank() == 2 && + is_assignable_v && + (T::is_always_unique() || is-layout-blas-packed); +``` + +> add the following definition of the exposition-only concept _`possibly-packed-out-matrix`_. + +```c++ +template + concept @_possibly-packed-out-matrix_@ = + is-mdspan && T::rank() == 2 && + is_assignable_v && + (T::is_always_unique() || is-layout-blas-packed); +``` + +## Rank-k update functions in synopsis + +> In the header `` synopsis **[linalg.syn]**, +> change the following + +```c++ + // rank-k symmetric matrix update + template + void symmetric_matrix_rank_k_update(Scalar alpha, InMat A, InOutMat C, Triangle t); + template + void symmetric_matrix_rank_k_update(ExecutionPolicy&& exec, + Scalar alpha, InMat A, InOutMat C, Triangle t); + + template<@_in-matrix_@ InMat, @_possibly-packed-inout-matrix_@ InOutMat, class Triangle> + void symmetric_matrix_rank_k_update(InMat A, InOutMat C, Triangle t); + template + void symmetric_matrix_rank_k_update(ExecutionPolicy&& exec, + InMat A, InOutMat C, Triangle t); + + // rank-k Hermitian matrix update + template + void hermitian_matrix_rank_k_update(Scalar alpha, InMat A, InOutMat C, Triangle t); + template + void hermitian_matrix_rank_k_update(ExecutionPolicy&& exec, + Scalar alpha, InMat A, InOutMat C, Triangle t); + + template<@_in-matrix_@ InMat, @_possibly-packed-inout-matrix_@ InOutMat, class Triangle> + void hermitian_matrix_rank_k_update(InMat A, InOutMat C, Triangle t); + template + void hermitian_matrix_rank_k_update(ExecutionPolicy&& exec, + InMat A, InOutMat C, Triangle t); +``` + +> to read as follows. + +```c++ + // overwriting rank-k symmetric matrix update + template + void symmetric_matrix_rank_k_update( + Scalar alpha, InMat A, OutMat C, Triangle t); + template + void symmetric_matrix_rank_k_update( + ExecutionPolicy&& exec, Scalar alpha, InMat A, OutMat C, Triangle t); + template<@_in-matrix_@ InMat, + @_possibly-packed-out-matrix_@ OutMat, + class Triangle> + void symmetric_matrix_rank_k_update( + InMat A, OutMat C, Triangle t); + template + void symmetric_matrix_rank_k_update( + ExecutionPolicy&& exec, InMat A, OutMat C, Triangle t); + + // updating rank-k symmetric matrix update + template + void symmetric_matrix_rank_k_update( + Scalar alpha, + InMat1 A, InMat2 E, OutMat C, Triangle t); + template + void symmetric_matrix_rank_k_update( + ExecutionPolicy&& exec, Scalar alpha, + InMat1 A, InMat2 E, OutMat C, Triangle t); + template<@_in-matrix_@ InMat1, + @_in-matrix_@ InMat2, + @_possibly-packed-out-matrix_@ OutMat, + class Triangle> + void symmetric_matrix_rank_k_update( + InMat1 A, InMat2 E, OutMat C, Triangle t); + template + void symmetric_matrix_rank_k_update( + ExecutionPolicy&& exec, + InMat1 A, InMat2 E, OutMat C, Triangle t); + + // overwriting rank-k Hermitian matrix update + template + void hermitian_matrix_rank_k_update( + Scalar alpha, InMat A, OutMat C, Triangle t); + template + void hermitian_matrix_rank_k_update( + ExecutionPolicy&& exec, Scalar alpha, InMat A, OutMat C, Triangle t); + template<@_in-matrix_@ InMat, + @_possibly-packed-out-matrix_@ OutMat, + class Triangle> + void hermitian_matrix_rank_k_update( + InMat A, OutMat C, Triangle t); + template + void hermitian_matrix_rank_k_update( + ExecutionPolicy&& exec, InMat A, OutMat C, Triangle t); + + // updating rank-k Hermitian matrix update + template + void hermitian_matrix_rank_k_update( + Scalar alpha, + InMat1 A, InMat2 E, OutMat C, Triangle t); + template + void hermitian_matrix_rank_k_update( + ExecutionPolicy&& exec, Scalar alpha, + InMat1 A, InMat2 E, OutMat C, Triangle t); + template<@_in-matrix_@ InMat1, + @_in-matrix_@ InMat2, + @_possibly-packed-out-matrix_@ OutMat, + class Triangle> + void hermitian_matrix_rank_k_update( + InMat1 A, InMat2 E, OutMat C, Triangle t); + template + void hermitian_matrix_rank_k_update( + ExecutionPolicy&& exec, + InMat1 A, InMat2 E, OutMat C, Triangle t); +``` + +## Rank-2k update functions in synopsis + +> In the header `` synopsis **[linalg.syn]**, +> change the following + +```c++ + // rank-2k symmetric matrix update + template<@_in-matrix_@ InMat1, @_in-matrix_@ InMat2, + @_possibly-packed-inout-matrix_@ InOutMat, class Triangle> + void symmetric_matrix_rank_2k_update(InMat1 A, InMat2 B, InOutMat C, Triangle t); + template + void symmetric_matrix_rank_2k_update(ExecutionPolicy&& exec, + InMat1 A, InMat2 B, InOutMat C, Triangle t); + + // rank-2k Hermitian matrix update + template<@_in-matrix_@ InMat1, @_in-matrix_@ InMat2, + @_possibly-packed-inout-matrix_@ InOutMat, class Triangle> + void hermitian_matrix_rank_2k_update(InMat1 A, InMat2 B, InOutMat C, Triangle t); + template + void hermitian_matrix_rank_2k_update(ExecutionPolicy&& exec, + InMat1 A, InMat2 B, InOutMat C, Triangle t); +``` + +> to read as follows. + +```c++ + // overwriting rank-2k symmetric matrix update + template<@_in-matrix_@ InMat1, @_in-matrix_@ InMat2, + @_possibly-packed-out-matrix_@ OutMat, class Triangle> + void symmetric_matrix_rank_2k_update(InMat1 A, InMat2 B, OutMat C, Triangle t); + template + void symmetric_matrix_rank_2k_update(ExecutionPolicy&& exec, + InMat1 A, InMat2 B, OutMat C, Triangle t); + + // updating rank-2k symmetric matrix update + template<@_in-matrix_@ InMat1, @_in-matrix_@ InMat2, @_in-matrix_@ InMat3, + @_possibly-packed-out-matrix_@ OutMat, class Triangle> + void symmetric_matrix_rank_2k_update(InMat1 A, InMat2 B, InMat3 E, OutMat C, Triangle t); + template + void symmetric_matrix_rank_2k_update(ExecutionPolicy&& exec, + InMat1 A, InMat2 B, InMat3 E, OutMat C, Triangle t); + + // overwriting rank-2k Hermitian matrix update + template<@_in-matrix_@ InMat1, @_in-matrix_@ InMat2, + @_possibly-packed-out-matrix_@ OutMat, class Triangle> + void hermitian_matrix_rank_2k_update(InMat1 A, InMat2 B, OutMat C, Triangle t); + template + void hermitian_matrix_rank_2k_update(ExecutionPolicy&& exec, + InMat1 A, InMat2 B, OutMat C, Triangle t); + + // updating rank-2k Hermitian matrix update + template<@_in-matrix_@ InMat1, @_in-matrix_@ InMat2, @_in-matrix_@ InMat3, + @_possibly-packed-out-matrix_@ OutMat, class Triangle> + void hermitian_matrix_rank_2k_update(InMat1 A, InMat2 B, InMat3 E, OutMat C, Triangle t); + template + void hermitian_matrix_rank_2k_update(ExecutionPolicy&& exec, + InMat1 A, InMat2 B, InMat3 E, OutMat C, Triangle t); +``` + +## Specification of rank-k update functions + +> Replace the entire contents of [linalg.algs.blas3.rankk] with the following. + +[Note: These functions correspond to the BLAS functions +`xSYRK` and `xHERK`. -- end note] + +[1]{.pnum} The following elements apply to all functions in [linalg.algs.blas3.rankk]. + +[2]{.pnum} *Mandates:* + + * [2.1]{.pnum} If `OutMat` has `layout_blas_packed` layout, then the + layout's `Triangle` template argument has the same type as + the function's `Triangle` template argument. + + * [2.2]{.pnum} _`possibly-multipliable`_`` is `true`. + + * [2.3]{.pnum} _`possibly-addable`_`` is `true` for those overloads that take an `E` parameter. + +[3]{.pnum} *Preconditions:* + + * [3.1]{.pnum} _`multipliable`_`(A, transposed(A), C)` is `true`. [Note: This implies that `C` is square. -- end note] + + * [3.2]{.pnum} _`addable`_`(C, E, C)` is `true` for those overloads that take an `E` parameter. + +[4]{.pnum} *Complexity:* $O($ `A.extent(0)` $\cdot$ `A.extent(1)` $\cdot$ `A.extent(0)` $)$. + +```c++ +template +void symmetric_matrix_rank_k_update( + Scalar alpha, + InMat A, + OutMat C, + Triangle t); +template +void symmetric_matrix_rank_k_update( + ExecutionPolicy&& exec, + Scalar alpha, + InMat A, + OutMat C, + Triangle t); +``` + +[5]{.pnum} *Effects:* +Computes $C = \alpha A A^T$, +where the scalar $\alpha$ is `alpha`. + +```c++ +template<@_in-matrix_@ InMat, + @_possibly-packed-out-matrix_@ OutMat, + class Triangle> +void symmetric_matrix_rank_k_update( + InMat A, + OutMat C, + Triangle t); +template +void symmetric_matrix_rank_k_update( + ExecutionPolicy&& exec, + InMat A, + OutMat C, + Triangle t); +``` + +[6]{.pnum} *Effects:* +Computes $C = A A^T$. + +```c++ +template +void hermitian_matrix_rank_k_update( + Scalar alpha, + InMat A, + OutMat C, + Triangle t); +template +void hermitian_matrix_rank_k_update( + ExecutionPolicy&& exec, + Scalar alpha, + InMat A, + OutMat C, + Triangle t); +``` + +[7]{.pnum} *Effects:* +Computes $C = \alpha A A^H$, +where the scalar $\alpha$ is `alpha`. + +```c++ +template<@_in-matrix_@ InMat, + @_possibly-packed-out-matrix_@ OutMat, + class Triangle> +void hermitian_matrix_rank_k_update( + InMat A, + OutMat C, + Triangle t); +template +void hermitian_matrix_rank_k_update( + ExecutionPolicy&& exec, + InMat A, + OutMat C, + Triangle t); +``` + +[8]{.pnum} *Effects:* +Computes $C = A A^H$. + +```c++ +template +void symmetric_matrix_rank_k_update( + Scalar alpha, + InMat1 A, + InMat2 E, + OutMat C, + Triangle t); +template +void symmetric_matrix_rank_k_update( + ExecutionPolicy&& exec, + Scalar alpha, + InMat1 A, + InMat2 E, + OutMat C, + Triangle t); +``` + +[9]{.pnum} *Effects:* +Computes $C = E + \alpha A A^T$, +where the scalar $\alpha$ is `alpha`. + +```c++ +template<@_in-matrix_@ InMat1, + @_in-matrix_@ InMat2, + @_possibly-packed-out-matrix_@ OutMat, + class Triangle> +void symmetric_matrix_rank_k_update( + InMat1 A, + InMat2 E, + OutMat C, + Triangle t); +template +void symmetric_matrix_rank_k_update( + ExecutionPolicy&& exec, + InMat1 A, + InMat2 E, + OutMat C, + Triangle t); +``` + +[10]{.pnum} *Effects:* +Computes $C = E + A A^T$. + +```c++ +template +void hermitian_matrix_rank_k_update( + Scalar alpha, + InMat1 A, + InMat2 E, + OutMat C, + Triangle t); +template +void hermitian_matrix_rank_k_update( + ExecutionPolicy&& exec, + Scalar alpha, + InMat1 A, + InMat2 E, + OutMat C, + Triangle t); +``` + +[11]{.pnum} *Effects:* +Computes $C = E + \alpha A A^H$, +where the scalar $\alpha$ is `alpha`. + +```c++ +template<@_in-matrix_@ InMat1, + @_in-matrix_@ InMat2, + @_possibly-packed-out-matrix_@ OutMat, + class Triangle> +void hermitian_matrix_rank_k_update( + InMat1 A, + InMat2 E, + OutMat C, + Triangle t); +template +void hermitian_matrix_rank_k_update( + ExecutionPolicy&& exec, + InMat1 A, + InMat2 E, + OutMat C, + Triangle t); +``` + +[12]{.pnum} *Effects:* +Computes $C = E + A A^H$. + +## Specification of rank-2k update functions + +> Replace the entire contents of [linalg.algs.blas3.rank2k] with the following. + +[Note: These functions correspond to the BLAS functions +`xSYR2K` and `xHER2K`. -- end note] + +[1]{.pnum} The following elements apply to all functions in [linalg.algs.blas3.rank2k]. + +[2]{.pnum} *Mandates:* + + * [2.1]{.pnum} If `OutMat` has `layout_blas_packed` layout, then the + layout's `Triangle` template argument has the same type as + the function's `Triangle` template argument; + + * [2.2]{.pnum} _`possibly-multipliable`_`` is `true`. + + * [2.3]{.pnum} _`possibly-multipliable`_`` is `true`. + + * [2.4]{.pnum} _`possibly-addable`_`` is `true` for those overloads that take an `E` parameter. + +[3]{.pnum} *Preconditions:* + + * [3.1]{.pnum} _`multipliable`_`(A, transposed(B), C)` is `true`. + + * [3.2]{.pnum} _`multipliable`_`(B, transposed(A), C)` is `true`. [Note: This and the previous imply that `C` is square. -- end note] + + * [3.3]{.pnum} _`addable`_`(C, E, C)` is `true` for those overloads that take an `E` parameter. + +[4]{.pnum} *Complexity:* $O($ `A.extent(0)` $\cdot$ `A.extent(1)` $\cdot$ `B.extent(0)` $)$ + +```c++ +template<@_in-matrix_@ InMat1, + @_in-matrix_@ InMat2, + @_possibly-packed-out-matrix_@ OutMat, + class Triangle> +void symmetric_matrix_rank_2k_update( + InMat1 A, + InMat2 B, + OutMat C, + Triangle t); +template +void symmetric_matrix_rank_2k_update( + ExecutionPolicy&& exec, + InMat1 A, + InMat2 B, + OutMat C, + Triangle t); +``` + +[5]{.pnum} *Effects:* Computes $C = A B^T + B A^T$. + +```c++ +template<@_in-matrix_@ InMat1, + @_in-matrix_@ InMat2, + @_possibly-packed-out-matrix_@ OutMat, + class Triangle> +void hermitian_matrix_rank_2k_update( + InMat1 A, + InMat2 B, + OutMat C, + Triangle t); +template +void hermitian_matrix_rank_2k_update( + ExecutionPolicy&& exec, + InMat1 A, + InMat2 B, + OutMat C, + Triangle t); +``` + +[6]{.pnum} *Effects:* Computes $C = A B^H + B A^H$. + +```c++ +template<@_in-matrix_@ InMat1, + @_in-matrix_@ InMat2, + @_in-matrix_@ InMat3, + @_possibly-packed-out-matrix_@ OutMat, + class Triangle> +void symmetric_matrix_rank_2k_update( + InMat1 A, + InMat2 B, + InMat3 E, + OutMat C, + Triangle t); +template +void symmetric_matrix_rank_2k_update( + ExecutionPolicy&& exec, + InMat1 A, + InMat2 B, + InMat3 E, + OutMat C, + Triangle t); +``` + +[7]{.pnum} *Effects:* Computes $C = E + A B^T + B A^T$. + +```c++ +template<@_in-matrix_@ InMat1, + @_in-matrix_@ InMat2, + @_in-matrix_@ InMat3, + @_possibly-packed-out-matrix_@ OutMat, + class Triangle> +void hermitian_matrix_rank_2k_update( + InMat1 A, + InMat2 B, + InMat3 E, + OutMat C, + Triangle t); +template +void hermitian_matrix_rank_2k_update( + ExecutionPolicy&& exec, + InMat1 A, + InMat2 B, + InMat3 E, + OutMat C, + Triangle t); +``` + +[8]{.pnum} *Effects:* Computes $C = E + A B^H + B A^H$. From d23f8da45adb1b1cd08ce11e5d6e4f3e5883c33f Mon Sep 17 00:00:00 2001 From: Mark Hoemmen Date: Wed, 7 Aug 2024 21:58:34 -0600 Subject: [PATCH 02/17] P3371R0: Add Acknowledgment --- fix-rankk-rank2k/P3371R0.html | 4 ++++ fix-rankk-rank2k/fix-rankk-rank2k.md | 2 ++ 2 files changed, 6 insertions(+) diff --git a/fix-rankk-rank2k/P3371R0.html b/fix-rankk-rank2k/P3371R0.html index 645b16b5..c2b73a5b 100644 --- a/fix-rankk-rank2k/P3371R0.html +++ b/fix-rankk-rank2k/P3371R0.html @@ -663,6 +663,10 @@

8 Wording

Text in blockquotes is not proposed wording, but rather instructions diff --git a/fix-rankk-rank2k/fix-rankk-rank2k.md b/fix-rankk-rank2k/fix-rankk-rank2k.md index 7a746b71..5774e04d 100644 --- a/fix-rankk-rank2k/fix-rankk-rank2k.md +++ b/fix-rankk-rank2k/fix-rankk-rank2k.md @@ -84,6 +84,8 @@ Since we do not propose changing the symmetric and Hermitian rank-1 and rank-2 f There are some wording issues in **[linalg.algs.blas3.rankk]** and **[linalg.algs.blas3.rank2k]**, specifically in the Mandates, Preconditions, and Complexity clauses. These clauses do not reflect the design intent, which is expressed by the corresponding BLAS routines and the mathematical expressions in the current wording that express the functions' behavior. We have not yet filed an LWG issue for these wording issues. Given that the above changes will call for adjusting some parts of these clauses anyway (e.g., by changing `InOutMat` to `OutMat`), we have decided to propose "drive-by" fixes to these clauses in this proposal. We nevertheless plan to file an LWG issue to fix both these and other wording issues we have found. +Many thanks (with permission) to Raffaele Solcà (CSCS Swiss National Supercomputing Centre, `raffaele.solca@cscs.ch`) for pointing out this issue and related issues that we plan to file as part of the aforementioned LWG issue. + # Wording > Text in blockquotes is not proposed wording, From 5fffbc43f8adfc7e4d58d2221fdb36432ff3ff21 Mon Sep 17 00:00:00 2001 From: Mark Hoemmen Date: Wed, 4 Sep 2024 22:31:59 -0600 Subject: [PATCH 03/17] P3371: Start R1 Add Ilya Burylov as coauthor --- fix-rankk-rank2k/P3371R1.html | 1307 ++++++++++++++++++++++++++ fix-rankk-rank2k/fix-rankk-rank2k.md | 9 +- 2 files changed, 1315 insertions(+), 1 deletion(-) create mode 100644 fix-rankk-rank2k/P3371R1.html diff --git a/fix-rankk-rank2k/P3371R1.html b/fix-rankk-rank2k/P3371R1.html new file mode 100644 index 00000000..cc45751d --- /dev/null +++ b/fix-rankk-rank2k/P3371R1.html @@ -0,0 +1,1307 @@ + + + + + + + + Fix C++26 by making the symmetric and Hermitian rank-k and rank-2k updates consistent with the BLAS + + + + + + + + +

+
+

Fix C++26 by making the +symmetric and Hermitian rank-k and rank-2k updates consistent with the +BLAS

+ + + + + + + + + + + + + + + + + + +
Document #: P3371R1
Date: 2024-09-04
Project: Programming Language C++
+ LEWG
+
Reply-to: + Mark Hoemmen
<>
+ Ilya Burylov
<>
+
+ +
+
+ +

1 Authors

+
    +
  • Mark Hoemmen (mhoemmen@nvidia.com) (NVIDIA)
  • +
  • Ilya Burylov (burylov@gmail.com) (NVIDIA)
  • +
+

2 Revision history

+
    +
  • Revision 0 to be submitted 2024-08-15

  • +
  • Revision 1 to be submitted 2024-09-15

    +
      +
    • Add Ilya Burylov as coauthor
    • +
  • +
+

3 Abstract

+

The four std::linalg functions +symmetric_matrix_rank_k_update, +hermitian_matrix_rank_k_update, +symmetric_matrix_rank_2k_update, and +hermitian_matrix_rank_2k_update currently have behavior +inconsistent with their corresponding BLAS (Basic Linear Algebra +Subroutines) routines. Also, their behavior is inconsistent with +std::linalg’s matrix_product, even though in +mathematical terms they are special cases of a matrix-matrix +product.

+

We propose fixing this by making two changes. First, we will add +“updating” overloads analogous to the updating overloads of +matrix_product. For example, +symmetric_matrix_rank_k_update(A, scaled(beta, C), C, upper_triangle) +will perform C := βC + AAT. +Second, we will change the behavior of the existing overloads to be +“overwriting,” without changing the number of their parameters or how +they are called. For example, +symmetric_matrix_rank_k_update(A, C, upper_triangle) will +perform C := AAT +instead of C := C + AAT.

+

The second set of changes is a breaking change. Thus, we must finish +this before finalization of C++26.

+

4 Motivation

+

Each function in std::linalg generally corresponds to one or more +routines or functions in the original BLAS (Basic Linear Algebra +Subroutines). Every computation that the BLAS can do, a function in +std::linalg should be able to do.

+

One std::linalg user +reported +an exception to this rule. The BLAS routine DSYRK +(Double-precision SYmmetric Rank-K update) computes C := βC + αAAT, +but the corresponding std::linalg function +symmetric_matrix_rank_k_update only computes C := C + αAAT. +That is, std::linalg currently has no way to express this +BLAS operation with a general β scaling factor. This issue applies +to all of the following functions.

+
    +
  • symmetric_matrix_rank_k_update: computes C := C + αAAT
  • +
  • hermitian_matrix_rank_k_update: computes C := C + αAAH
  • +
  • symmetric_matrix_rank_2k_update: computes C := C + αABH + αBAH
  • +
  • hermitian_matrix_rank_2k_update: computes C := C + αABH + ᾱBAH, +where ᾱ denotes the complex +conjugate of α
  • +
+

These functions implement special cases of matrix-matrix products. +The matrix_product function in std::linalg +implements the general case of matrix-matrix products. This function +corresponds to the BLAS’s SGEMM, DGEMM, +CGEMM, and ZGEMM, which compute C := βC + αAB, +where α and β are scaling factors. The +matrix_product function has two kinds of overloads:

+
    +
  1. overwriting (C = AB) +and

  2. +
  3. updating (C = E + AB).

  4. +
+

The updating overloads handle the general α and β case by +matrix_product(scaled(alpha, A), B, scaled(beta, C), C). +The specification explicitly permits the input +scaled(beta, C) to alias the output C +([linalg.algs.blas3.gemm] 10: “Remarks: +C may alias E”). The std::linalg +library provides overwriting and updating overloads so that it can do +everything that the BLAS does, just in a more idiomatically C++ way. +Please see +P1673R13 +Section 10.3 (“Function argument aliasing and zero scalar +multipliers”) for a more detailed explanation.

+

The problem with functions like +symmetric_matrix_rank_k_update is that they have the same +calling syntax as the overwriting version of +matrix_product, but semantics that differ from +both the overwriting and the updating versions of +matrix_product. The current overloads are not overwriting, +so we can’t just fix this problem by introducing an “updating” overload +for each function.

+

Incidentally, the fact that these functions have “update” in their +name is not relevant, because that naming choice is original to the +BLAS. The BLAS calls its corresponding xSYRK, +xHERK, xSYR2K, and xHER2K +routines “{Symmetric, Hermitian} rank {one, two} update,” even though +setting β = 0 makes these +routines “overwriting” in the sense of std::linalg.

+

5 Proposed changes

+

We propose to fix this by making the four functions work just like +matrix_vector_product or matrix_product. This +entails two changes.

+
    +
  1. We will add “updating” overloads with a new input matrix +parameter E, analogous to the updating overloads of +matrix_product. For example, +symmetric_matrix_rank_k_update(A, E, C, upper_triangle) +will compute C = E + AAT. +We will specifically permit C and E to alias, +thus permitting the desired case where E is +scaled(beta, C). The updating overloads will take +E as an in-matrix, and will change +C from a possibly-packed-inout-matrix +to a possibly-packed-out-matrix.

  2. +
  3. We will change the behavior of the existing overloads to be +“overwriting,” without changing how they are called. For example, +symmetric_matrix_rank_k_update(A, C, upper_triangle) will +compute C = AAT +instead of C := C + AAT. +The overwriting overloads will change C from a +possibly-packed-inout-matrix to a +possibly-packed-out-matrix.

  4. +
+

This second set of changes is a breaking change. It needs to be so +that we can provide the overwriting behavior C := αAAT +that the corresponding BLAS routines already provide. Thus, we must +finish this before finalization of C++26.

+

Both sets of overloads still only write to the specified triangle +(lower or upper) of the output matrix C. As a result, the +new updating overloads only read from that triangle of the input matrix +E. Therefore, even though E may be a different +matrix than C, the updating overloads do not need an +additional Triangle t_E parameter for E. The +symmetric_* functions interpret E as symmetric +in the same way that they interpret C as symmetric, and the +hermitian_* functions interpret E as Hermitian +in the same way that they interpret C as Hermitian.

+

6 We do NOT propose changing rank-1 +or rank-2 update functions

+

The four functions we propose to change have the following rank-1 and +rank-2 analogs, where A +denotes a symmetric or Hermitian matrix (depending on the function’s +name) and x and y denote vectors.

+
    +
  • symmetric_matrix_rank_1_update: computes A := A + αxxT
  • +
  • hermetian_matrix_rank_1_update: computes A := A + αxxH
  • +
  • symmetric_matrix_rank_2_update: computes A := A + αxyT + αyxT
  • +
  • hermitian_matrix_rank_2_update: computes A := A + αxyH + ᾱxyH
  • +
+

We do NOT propose to change these functions analogously to the rank-k +and rank-2k update functions. This is because the BLAS routines +corresponding to the rank-1 and rank-2 functions – xSYR, +xHER, xSYR2, and xHER2 – do not +have a way to supply a β +scaling factor. That is, these std::linalg functions can +already do everything that their corresponding BLAS routines can do. +This is consistent with our design intent in +Section +10.3 of P1673R3 for translating Fortran INTENT(INOUT) +arguments into a C++ idiom.

+
+
    +
  1. Else, if the BLAS function unconditionally updates (like +xGER), we retain read-and-write behavior for that +argument.

  2. +
  3. Else, if the BLAS function uses a scalar beta +argument to decide whether to read the output argument as well as write +to it (like xGEMM), we provide two versions: a write-only +[that is, “overwriting”] version (as if beta is zero), and +a read-and-write [that is, “updating”] version (as if beta +is nonzero).

  4. +
+
+

The rank-1 and rank-2 update functions “unconditionally update,” in +the same way that the BLAS’s general rank-1 update routine +xGER does. However, the BLAS’s rank-k and rank-2k update +functions “use a scalar beta argument…,” so for +consistency, it makes sense for std::linalg to provide both +overwriting and updating versions.

+

Since we do not propose changing the symmetric and Hermitian rank-1 +and rank-2 functions, we retain the exposition-only concept +possibly-packed-inout-matrix, which they use to +constrain their parameter A.

+

7 Fixes for some wording issues

+

There are some wording issues in +[linalg.algs.blas3.rankk] and +[linalg.algs.blas3.rank2k], specifically in the +Mandates, Preconditions, and Complexity clauses. These clauses do not +reflect the design intent, which is expressed by the corresponding BLAS +routines and the mathematical expressions in the current wording that +express the functions’ behavior. We have not yet filed an LWG issue for +these wording issues. Given that the above changes will call for +adjusting some parts of these clauses anyway (e.g., by changing +InOutMat to OutMat), we have decided to +propose “drive-by” fixes to these clauses in this proposal. We +nevertheless plan to file an LWG issue to fix both these and other +wording issues we have found.

+

Many thanks (with permission) to Raffaele Solcà (CSCS Swiss National +Supercomputing Centre, raffaele.solca@cscs.ch) for pointing +out this issue and related issues that we plan to file as part of the +aforementioned LWG issue.

+

8 Wording

+
+

Text in blockquotes is not proposed wording, but rather instructions +for generating proposed wording. The � character is used to denote a +placeholder section number which the editor shall determine.

+

In [version.syn], for the following definition,

+
+
#define __cpp_lib_linalg YYYYMML // also in <linalg>
+
+

adjust the placeholder value YYYYMML as needed so as to +denote this proposal’s date of adoption.

+
+

8.1 New exposition-only concept

+
+

In the header <linalg> synopsis +[linalg.syn], immediately before the following,

+
+
  template<class T>
+    concept possibly-packed-inout-matrix = see below;   // exposition only
+
+

add the following.

+
+
  template<class T>
+    concept possibly-packed-out-matrix = see below;   // exposition only
+

Then, to [linalg.helpers.concepts], immediately +before the following,

+
template<class T>
+  concept possibly-packed-inout-matrix =
+    is-mdspan<T> && T::rank() == 2 &&
+    is_assignable_v<typename T::reference, typename T::element_type> &&
+    (T::is_always_unique() || is-layout-blas-packed<typename T::layout_type>);
+
+

add the following definition of the exposition-only concept +possibly-packed-out-matrix.

+
+
template<class T>
+  concept possibly-packed-out-matrix =
+    is-mdspan<T> && T::rank() == 2 &&
+    is_assignable_v<typename T::reference, typename T::element_type> &&
+    (T::is_always_unique() || is-layout-blas-packed<typename T::layout_type>);
+

8.2 Rank-k update functions in +synopsis

+
+

In the header <linalg> synopsis +[linalg.syn], change the following

+
+
  // rank-k symmetric matrix update
+  template<class Scalar, in-matrix InMat, possibly-packed-inout-matrix InOutMat, class Triangle>
+    void symmetric_matrix_rank_k_update(Scalar alpha, InMat A, InOutMat C, Triangle t);
+  template<class ExecutionPolicy, class Scalar,
+           in-matrix InMat, possibly-packed-inout-matrix InOutMat, class Triangle>
+    void symmetric_matrix_rank_k_update(ExecutionPolicy&& exec,
+                                        Scalar alpha, InMat A, InOutMat C, Triangle t);
+
+  template<in-matrix InMat, possibly-packed-inout-matrix InOutMat, class Triangle>
+    void symmetric_matrix_rank_k_update(InMat A, InOutMat C, Triangle t);
+  template<class ExecutionPolicy,
+           in-matrix InMat, possibly-packed-inout-matrix InOutMat, class Triangle>
+    void symmetric_matrix_rank_k_update(ExecutionPolicy&& exec,
+                                        InMat A, InOutMat C, Triangle t);
+
+  // rank-k Hermitian matrix update
+  template<class Scalar, in-matrix InMat, possibly-packed-inout-matrix InOutMat, class Triangle>
+    void hermitian_matrix_rank_k_update(Scalar alpha, InMat A, InOutMat C, Triangle t);
+  template<class ExecutionPolicy,
+           class Scalar, in-matrix InMat, possibly-packed-inout-matrix InOutMat, class Triangle>
+    void hermitian_matrix_rank_k_update(ExecutionPolicy&& exec,
+                                        Scalar alpha, InMat A, InOutMat C, Triangle t);
+
+  template<in-matrix InMat, possibly-packed-inout-matrix InOutMat, class Triangle>
+    void hermitian_matrix_rank_k_update(InMat A, InOutMat C, Triangle t);
+  template<class ExecutionPolicy,
+           in-matrix InMat, possibly-packed-inout-matrix InOutMat, class Triangle>
+    void hermitian_matrix_rank_k_update(ExecutionPolicy&& exec,
+                                        InMat A, InOutMat C, Triangle t);
+
+

to read as follows.

+
+
  // overwriting rank-k symmetric matrix update
+  template<class Scalar,
+           in-matrix InMat,
+           possibly-packed-out-matrix OutMat,
+           class Triangle>
+    void symmetric_matrix_rank_k_update(
+      Scalar alpha, InMat A, OutMat C, Triangle t);
+  template<class ExecutionPolicy, class Scalar,
+           in-matrix InMat,
+           possibly-packed-out-matrix OutMat,
+           class Triangle>
+    void symmetric_matrix_rank_k_update(
+      ExecutionPolicy&& exec, Scalar alpha, InMat A, OutMat C, Triangle t);
+  template<in-matrix InMat,
+           possibly-packed-out-matrix OutMat,
+           class Triangle>
+    void symmetric_matrix_rank_k_update(
+      InMat A, OutMat C, Triangle t);
+  template<class ExecutionPolicy,
+           in-matrix InMat,
+           possibly-packed-out-matrix OutMat,
+           class Triangle>
+    void symmetric_matrix_rank_k_update(
+      ExecutionPolicy&& exec, InMat A, OutMat C, Triangle t);
+
+  // updating rank-k symmetric matrix update
+  template<class Scalar,
+           in-matrix InMat1,
+           in-matrix InMat2,
+           possibly-packed-out-matrix OutMat,
+           class Triangle>
+    void symmetric_matrix_rank_k_update(
+      Scalar alpha,
+      InMat1 A, InMat2 E, OutMat C, Triangle t);
+  template<class ExecutionPolicy, class Scalar,
+           in-matrix InMat1,
+           in-matrix InMat2,
+           possibly-packed-out-matrix OutMat,
+           class Triangle>
+    void symmetric_matrix_rank_k_update(
+      ExecutionPolicy&& exec, Scalar alpha,
+      InMat1 A, InMat2 E, OutMat C, Triangle t);
+  template<in-matrix InMat1,
+           in-matrix InMat2,
+           possibly-packed-out-matrix OutMat,
+           class Triangle>
+    void symmetric_matrix_rank_k_update(
+      InMat1 A, InMat2 E, OutMat C, Triangle t);
+  template<class ExecutionPolicy,
+           in-matrix InMat1,
+           in-matrix InMat2,
+           possibly-packed-out-matrix OutMat,
+           class Triangle>
+    void symmetric_matrix_rank_k_update(
+      ExecutionPolicy&& exec,
+      InMat1 A, InMat2 E, OutMat C, Triangle t);
+
+  // overwriting rank-k Hermitian matrix update
+  template<class Scalar,
+           in-matrix InMat,
+           possibly-packed-out-matrix OutMat,
+           class Triangle>
+    void hermitian_matrix_rank_k_update(
+      Scalar alpha, InMat A, OutMat C, Triangle t);
+  template<class ExecutionPolicy, class Scalar,
+           in-matrix InMat,
+           possibly-packed-out-matrix OutMat,
+           class Triangle>
+    void hermitian_matrix_rank_k_update(
+      ExecutionPolicy&& exec, Scalar alpha, InMat A, OutMat C, Triangle t);
+  template<in-matrix InMat,
+           possibly-packed-out-matrix OutMat,
+           class Triangle>
+    void hermitian_matrix_rank_k_update(
+      InMat A, OutMat C, Triangle t);
+  template<class ExecutionPolicy,
+           in-matrix InMat,
+           possibly-packed-out-matrix OutMat,
+           class Triangle>
+    void hermitian_matrix_rank_k_update(
+      ExecutionPolicy&& exec, InMat A, OutMat C, Triangle t);
+
+  // updating rank-k Hermitian matrix update
+  template<class Scalar,
+           in-matrix InMat1,
+           in-matrix InMat2,
+           possibly-packed-out-matrix OutMat,
+           class Triangle>
+    void hermitian_matrix_rank_k_update(
+      Scalar alpha,
+      InMat1 A, InMat2 E, OutMat C, Triangle t);
+  template<class ExecutionPolicy, class Scalar,
+           in-matrix InMat1,
+           in-matrix InMat2,
+           possibly-packed-out-matrix OutMat,
+           class Triangle>
+    void hermitian_matrix_rank_k_update(
+      ExecutionPolicy&& exec, Scalar alpha,
+      InMat1 A, InMat2 E, OutMat C, Triangle t);
+  template<in-matrix InMat1,
+           in-matrix InMat2,
+           possibly-packed-out-matrix OutMat,
+           class Triangle>
+    void hermitian_matrix_rank_k_update(
+      InMat1 A, InMat2 E, OutMat C, Triangle t);
+  template<class ExecutionPolicy,
+           in-matrix InMat1,
+           in-matrix InMat2,
+           possibly-packed-out-matrix OutMat,
+           class Triangle>
+    void hermitian_matrix_rank_k_update(
+      ExecutionPolicy&& exec,
+      InMat1 A, InMat2 E, OutMat C, Triangle t);
+

8.3 Rank-2k update functions in +synopsis

+
+

In the header <linalg> synopsis +[linalg.syn], change the following

+
+
  // rank-2k symmetric matrix update
+  template<in-matrix InMat1, in-matrix InMat2,
+           possibly-packed-inout-matrix InOutMat, class Triangle>
+    void symmetric_matrix_rank_2k_update(InMat1 A, InMat2 B, InOutMat C, Triangle t);
+  template<class ExecutionPolicy,
+           in-matrix InMat1, in-matrix InMat2,
+           possibly-packed-inout-matrix InOutMat, class Triangle>
+    void symmetric_matrix_rank_2k_update(ExecutionPolicy&& exec,
+                                         InMat1 A, InMat2 B, InOutMat C, Triangle t);
+
+  // rank-2k Hermitian matrix update
+  template<in-matrix InMat1, in-matrix InMat2,
+           possibly-packed-inout-matrix InOutMat, class Triangle>
+    void hermitian_matrix_rank_2k_update(InMat1 A, InMat2 B, InOutMat C, Triangle t);
+  template<class ExecutionPolicy,
+           in-matrix InMat1, in-matrix InMat2,
+           possibly-packed-inout-matrix InOutMat, class Triangle>
+    void hermitian_matrix_rank_2k_update(ExecutionPolicy&& exec,
+                                         InMat1 A, InMat2 B, InOutMat C, Triangle t);
+
+

to read as follows.

+
+
  // overwriting rank-2k symmetric matrix update
+  template<in-matrix InMat1, in-matrix InMat2,
+           possibly-packed-out-matrix OutMat, class Triangle>
+    void symmetric_matrix_rank_2k_update(InMat1 A, InMat2 B, OutMat C, Triangle t);
+  template<class ExecutionPolicy,
+           in-matrix InMat1, in-matrix InMat2,
+           possibly-packed-out-matrix OutMat, class Triangle>
+    void symmetric_matrix_rank_2k_update(ExecutionPolicy&& exec,
+                                         InMat1 A, InMat2 B, OutMat C, Triangle t);
+
+  // updating rank-2k symmetric matrix update
+  template<in-matrix InMat1, in-matrix InMat2, in-matrix InMat3,
+           possibly-packed-out-matrix OutMat, class Triangle>
+    void symmetric_matrix_rank_2k_update(InMat1 A, InMat2 B, InMat3 E, OutMat C, Triangle t);
+  template<class ExecutionPolicy,
+           in-matrix InMat1, in-matrix InMat2, in-matrix InMat3,
+           possibly-packed-out-matrix OutMat, class Triangle>
+    void symmetric_matrix_rank_2k_update(ExecutionPolicy&& exec,
+                                         InMat1 A, InMat2 B, InMat3 E, OutMat C, Triangle t);
+
+  // overwriting rank-2k Hermitian matrix update
+  template<in-matrix InMat1, in-matrix InMat2,
+           possibly-packed-out-matrix OutMat, class Triangle>
+    void hermitian_matrix_rank_2k_update(InMat1 A, InMat2 B, OutMat C, Triangle t);
+  template<class ExecutionPolicy,
+           in-matrix InMat1, in-matrix InMat2,
+           possibly-packed-out-matrix OutMat, class Triangle>
+    void hermitian_matrix_rank_2k_update(ExecutionPolicy&& exec,
+                                         InMat1 A, InMat2 B, OutMat C, Triangle t);
+
+  // updating rank-2k Hermitian matrix update
+  template<in-matrix InMat1, in-matrix InMat2, in-matrix InMat3,
+           possibly-packed-out-matrix OutMat, class Triangle>
+    void hermitian_matrix_rank_2k_update(InMat1 A, InMat2 B, InMat3 E, OutMat C, Triangle t);
+  template<class ExecutionPolicy,
+           in-matrix InMat1, in-matrix InMat2, in-matrix InMat3,
+           possibly-packed-out-matrix OutMat, class Triangle>
+    void hermitian_matrix_rank_2k_update(ExecutionPolicy&& exec,
+                                         InMat1 A, InMat2 B, InMat3 E, OutMat C, Triangle t);
+

8.4 Specification of rank-k update +functions

+
+

Replace the entire contents of [linalg.algs.blas3.rankk] with the +following.

+
+

[Note: These functions correspond to the BLAS functions +xSYRK and xHERK. – end note]

+

1 +The following elements apply to all functions in +[linalg.algs.blas3.rankk].

+

2 +Mandates:

+
    +
  • (2.1) If +OutMat has layout_blas_packed layout, then the +layout’s Triangle template argument has the same type as +the function’s Triangle template argument.

  • +
  • (2.2) +possibly-multipliable<decltype(A), decltype(transposed(A)), decltype(C)> +is true.

  • +
  • (2.3) +possibly-addable<decltype(C), decltype(E), decltype(C)> +is true for those overloads that take an E +parameter.

  • +
+

3 +Preconditions:

+
    +
  • (3.1) +multipliable(A, transposed(A), C) is +true. [Note: This implies that C is +square. – end note]

  • +
  • (3.2) +addable(C, E, C) is true +for those overloads that take an E parameter.

  • +
+

4 +Complexity: O( +A.extent(0) +A.extent(1) +A.extent(0) ).

+
template<class Scalar,
+         in-matrix InMat,
+         possibly-packed-out-matrix OutMat,
+         class Triangle>
+void symmetric_matrix_rank_k_update(
+  Scalar alpha,
+  InMat A,
+  OutMat C,
+  Triangle t);
+template<class ExecutionPolicy,
+         class Scalar,
+         in-matrix InMat,
+         possibly-packed-out-matrix OutMat,
+         class Triangle>
+void symmetric_matrix_rank_k_update(
+  ExecutionPolicy&& exec,
+  Scalar alpha,
+  InMat A,
+  OutMat C,
+  Triangle t);
+

5 +Effects: Computes C = αAAT, +where the scalar α is +alpha.

+
template<in-matrix InMat,
+         possibly-packed-out-matrix OutMat,
+         class Triangle>
+void symmetric_matrix_rank_k_update(
+  InMat A,
+  OutMat C,
+  Triangle t);
+template<class ExecutionPolicy,
+         in-matrix InMat,
+         possibly-packed-out-matrix OutMat,
+         class Triangle>
+void symmetric_matrix_rank_k_update(
+  ExecutionPolicy&& exec,
+  InMat A,
+  OutMat C,
+  Triangle t);
+

6 +Effects: Computes C = AAT.

+
template<class Scalar,
+         in-matrix InMat,
+         possibly-packed-out-matrix OutMat,
+         class Triangle>
+void hermitian_matrix_rank_k_update(
+  Scalar alpha,
+  InMat A,
+  OutMat C,
+  Triangle t);
+template<class ExecutionPolicy,
+         class Scalar,
+         in-matrix InMat,
+         possibly-packed-out-matrix OutMat,
+         class Triangle>
+void hermitian_matrix_rank_k_update(
+  ExecutionPolicy&& exec,
+  Scalar alpha,
+  InMat A,
+  OutMat C,
+  Triangle t);
+

7 +Effects: Computes C = αAAH, +where the scalar α is +alpha.

+
template<in-matrix InMat,
+         possibly-packed-out-matrix OutMat,
+         class Triangle>
+void hermitian_matrix_rank_k_update(
+  InMat A,
+  OutMat C,
+  Triangle t);
+template<class ExecutionPolicy,
+         in-matrix InMat,
+         possibly-packed-out-matrix OutMat,
+         class Triangle>
+void hermitian_matrix_rank_k_update(
+  ExecutionPolicy&& exec,
+  InMat A,
+  OutMat C,
+  Triangle t);
+

8 +Effects: Computes C = AAH.

+
template<class Scalar,
+         in-matrix InMat1,
+         in-matrix InMat2,
+         possibly-packed-out-matrix OutMat,
+         class Triangle>
+void symmetric_matrix_rank_k_update(
+  Scalar alpha,
+  InMat1 A,
+  InMat2 E,
+  OutMat C,
+  Triangle t);
+template<class ExecutionPolicy,
+         class Scalar,
+         in-matrix InMat1,
+         in-matrix InMat2,
+         possibly-packed-out-matrix OutMat,
+         class Triangle>
+void symmetric_matrix_rank_k_update(
+  ExecutionPolicy&& exec,
+  Scalar alpha,
+  InMat1 A,
+  InMat2 E,
+  OutMat C,
+  Triangle t);
+

9 +Effects: Computes C = E + αAAT, +where the scalar α is +alpha.

+
template<in-matrix InMat1,
+         in-matrix InMat2,
+         possibly-packed-out-matrix OutMat,
+         class Triangle>
+void symmetric_matrix_rank_k_update(
+  InMat1 A,
+  InMat2 E,
+  OutMat C,
+  Triangle t);
+template<class ExecutionPolicy,
+         in-matrix InMat1,
+         in-matrix InMat2,
+         possibly-packed-out-matrix OutMat,
+         class Triangle>
+void symmetric_matrix_rank_k_update(
+  ExecutionPolicy&& exec,
+  InMat1 A,
+  InMat2 E,
+  OutMat C,
+  Triangle t);
+

10 +Effects: Computes C = E + AAT.

+
template<class Scalar,
+         in-matrix InMat1,
+         in-matrix InMat2,
+         possibly-packed-out-matrix OutMat,
+         class Triangle>
+void hermitian_matrix_rank_k_update(
+  Scalar alpha,
+  InMat1 A,
+  InMat2 E,
+  OutMat C,
+  Triangle t);
+template<class ExecutionPolicy,
+         class Scalar,
+         in-matrix InMat1,
+         in-matrix InMat2,
+         possibly-packed-out-matrix OutMat,
+         class Triangle>
+void hermitian_matrix_rank_k_update(
+  ExecutionPolicy&& exec,
+  Scalar alpha,
+  InMat1 A,
+  InMat2 E,
+  OutMat C,
+  Triangle t);
+

11 +Effects: Computes C = E + αAAH, +where the scalar α is +alpha.

+
template<in-matrix InMat1,
+         in-matrix InMat2,
+         possibly-packed-out-matrix OutMat,
+         class Triangle>
+void hermitian_matrix_rank_k_update(
+  InMat1 A,
+  InMat2 E,
+  OutMat C,
+  Triangle t);
+template<class ExecutionPolicy,
+         in-matrix InMat1,
+         in-matrix InMat2,
+         possibly-packed-out-matrix OutMat,
+         class Triangle>
+void hermitian_matrix_rank_k_update(
+  ExecutionPolicy&& exec,
+  InMat1 A,
+  InMat2 E,
+  OutMat C,
+  Triangle t);
+

12 +Effects: Computes C = E + AAH.

+

8.5 Specification of rank-2k update +functions

+
+

Replace the entire contents of [linalg.algs.blas3.rank2k] with the +following.

+
+

[Note: These functions correspond to the BLAS functions +xSYR2K and xHER2K. – end note]

+

1 +The following elements apply to all functions in +[linalg.algs.blas3.rank2k].

+

2 +Mandates:

+
    +
  • (2.1) If +OutMat has layout_blas_packed layout, then the +layout’s Triangle template argument has the same type as +the function’s Triangle template argument;

  • +
  • (2.2) +possibly-multipliable<decltype(A), decltype(transposed(B)), decltype(C)> +is true.

  • +
  • (2.3) +possibly-multipliable<decltype(B), decltype(transposed(A)), decltype(C)> +is true.

  • +
  • (2.4) +possibly-addable<decltype(C), decltype(E), decltype(C)> +is true for those overloads that take an E +parameter.

  • +
+

3 +Preconditions:

+
    +
  • (3.1) +multipliable(A, transposed(B), C) is +true.

  • +
  • (3.2) +multipliable(B, transposed(A), C) is +true. [Note: This and the previous imply that +C is square. – end note]

  • +
  • (3.3) +addable(C, E, C) is true +for those overloads that take an E parameter.

  • +
+

4 +Complexity: O( +A.extent(0) +A.extent(1) +B.extent(0) )

+
template<in-matrix InMat1,
+         in-matrix InMat2,
+         possibly-packed-out-matrix OutMat,
+         class Triangle>
+void symmetric_matrix_rank_2k_update(
+  InMat1 A,
+  InMat2 B,
+  OutMat C,
+  Triangle t);
+template<class ExecutionPolicy,
+         in-matrix InMat1,
+         in-matrix InMat2,
+         possibly-packed-out-matrix OutMat,
+         class Triangle>
+void symmetric_matrix_rank_2k_update(
+  ExecutionPolicy&& exec,
+  InMat1 A,
+  InMat2 B,
+  OutMat C,
+  Triangle t);
+

5 +Effects: Computes C = ABT + BAT.

+
template<in-matrix InMat1,
+         in-matrix InMat2,
+         possibly-packed-out-matrix OutMat,
+         class Triangle>
+void hermitian_matrix_rank_2k_update(
+  InMat1 A,
+  InMat2 B,
+  OutMat C,
+  Triangle t);
+template<class ExecutionPolicy,
+         in-matrix InMat1,
+         in-matrix InMat2,
+         possibly-packed-out-matrix OutMat,
+         class Triangle>
+void hermitian_matrix_rank_2k_update(
+  ExecutionPolicy&& exec,
+  InMat1 A,
+  InMat2 B,
+  OutMat C,
+  Triangle t);
+

6 +Effects: Computes C = ABH + BAH.

+
template<in-matrix InMat1,
+         in-matrix InMat2,
+         in-matrix InMat3,
+         possibly-packed-out-matrix OutMat,
+         class Triangle>
+void symmetric_matrix_rank_2k_update(
+  InMat1 A,
+  InMat2 B,
+  InMat3 E,
+  OutMat C,
+  Triangle t);
+template<class ExecutionPolicy,
+         in-matrix InMat1,
+         in-matrix InMat2,
+         in-matrix InMat3,
+         possibly-packed-out-matrix OutMat,
+         class Triangle>
+void symmetric_matrix_rank_2k_update(
+  ExecutionPolicy&& exec,
+  InMat1 A,
+  InMat2 B,
+  InMat3 E,
+  OutMat C,
+  Triangle t);
+

7 +Effects: Computes C = E + ABT + BAT.

+
template<in-matrix InMat1,
+         in-matrix InMat2,
+         in-matrix InMat3,
+         possibly-packed-out-matrix OutMat,
+         class Triangle>
+void hermitian_matrix_rank_2k_update(
+  InMat1 A,
+  InMat2 B,
+  InMat3 E,
+  OutMat C,
+  Triangle t);
+template<class ExecutionPolicy,
+         in-matrix InMat1,
+         in-matrix InMat2,
+         in-matrix InMat3,
+         possibly-packed-out-matrix OutMat,
+         class Triangle>
+void hermitian_matrix_rank_2k_update(
+  ExecutionPolicy&& exec,
+  InMat1 A,
+  InMat2 B,
+  InMat3 E,
+  OutMat C,
+  Triangle t);
+

8 +Effects: Computes C = E + ABH + BAH.

+
+
+ + diff --git a/fix-rankk-rank2k/fix-rankk-rank2k.md b/fix-rankk-rank2k/fix-rankk-rank2k.md index 5774e04d..812220c3 100644 --- a/fix-rankk-rank2k/fix-rankk-rank2k.md +++ b/fix-rankk-rank2k/fix-rankk-rank2k.md @@ -1,23 +1,30 @@ --- title: "Fix C++26 by making the symmetric and Hermitian rank-k and rank-2k updates consistent with the BLAS" -document: P3371R0 +document: P3371R1 date: today audience: LEWG author: - name: Mark Hoemmen email: + - name: Ilya Burylov + email: toc: true --- # Authors * Mark Hoemmen (mhoemmen@nvidia.com) (NVIDIA) +* Ilya Burylov (burylov@gmail.com) (NVIDIA) # Revision history * Revision 0 to be submitted 2024-08-15 +* Revision 1 to be submitted 2024-09-15 + + * Add Ilya Burylov as coauthor + # Abstract The four `std::linalg` functions `symmetric_matrix_rank_k_update`, `hermitian_matrix_rank_k_update`, `symmetric_matrix_rank_2k_update`, and `hermitian_matrix_rank_2k_update` currently have behavior inconsistent with their corresponding BLAS (Basic Linear Algebra Subroutines) routines. Also, their behavior is inconsistent with `std::linalg`'s `matrix_product`, even though in mathematical terms they are special cases of a matrix-matrix product. From 448b3038355f1b189831d3eaac3674b3b1fa4ea3 Mon Sep 17 00:00:00 2001 From: Mark Hoemmen Date: Thu, 5 Sep 2024 11:13:15 -0600 Subject: [PATCH 04/17] P3371: More R1 updates * Reorganize and expand nonwording sections * Add "`C` may alias `E`" to all the new updating overloads of the symmetric and Hermitian rank-k and rank-2k functions --- fix-rankk-rank2k/P3371R1.html | 1471 ++++++++++++++++---------- fix-rankk-rank2k/fix-rankk-rank2k.md | 229 +++- 2 files changed, 1134 insertions(+), 566 deletions(-) diff --git a/fix-rankk-rank2k/P3371R1.html b/fix-rankk-rank2k/P3371R1.html index cc45751d..3862c286 100644 --- a/fix-rankk-rank2k/P3371R1.html +++ b/fix-rankk-rank2k/P3371R1.html @@ -4,7 +4,7 @@ - + Fix C++26 by making the symmetric and Hermitian rank-k and rank-2k updates consistent with the BLAS + + + + + + + +
+
+

Fix C++26 by making the +rank-1, rank-2, rank-k, and rank-2k updates consistent with the +BLAS

+ + + + + + + + + + + + + + + + + + +
Document #: P3371R2
Date: 2024-10-07
Project: Programming Language C++
+ LEWG
+
Reply-to: + Mark Hoemmen
<>
+ Ilya Burylov
<>
+
+ +
+
+
+

Contents

+ +
+

1 Authors

+
    +
  • Mark Hoemmen (mhoemmen@nvidia.com) (NVIDIA)
  • +
  • Ilya Burylov (burylov@gmail.com) (NVIDIA)
  • +
+

2 Revision history

+
    +
  • Revision 0 to be submitted 2024-08-15

  • +
  • Revision 1 to be submitted 2024-09-15

    +
      +
    • Main wording changes from R0:

      +
        +
      • Instead of just changing the symmetric and Hermitian rank-k and +rank-2k updates to have both overwriting and updating overloads, change +all the update functions in this way: rank-1 (general, +symmetric, and Hermitian), rank-2, rank-k, and rank-2k

      • +
      • For all the new updating overloads, specify that C +(or A) may alias E

      • +
      • For the new symmetric and Hermitian updating overloads, specify +that the functions access the new E parameter in the same +way (e.g., with respect to the lower or upper triangle) as the +C (or A) parameter

      • +
      • Add exposition-only concept noncomplex to +constrain a scaling factor to be noncomplex, as needed for Hermitian +rank-1 and rank-k functions

      • +
    • +
    • Add Ilya Burylov as coauthor

    • +
    • Change title and abstract to express the wording changes

    • +
    • Add nonwording section explaining why we change rank-1 and rank-2 +updates to be consistent with rank-k and rank-2k updates

    • +
    • Add nonwording sections explaining why we don’t change +hermitian_matrix_vector_product, +hermitian_matrix_product, +triangular_matrix_product, or the triangular +solves

    • +
    • Add nonwording section explaining why we constrain some scaling +factors to be noncomplex at compile time, instead of taking a run-time +approach

    • +
    • Reorganize and expand nonwording sections

    • +
  • +
  • Revision 2 to be submitted 2024-10-15

    +
      +
    • For Hermitian matrix rank-1 and rank-k updates, do not constrain the +type of the scaling factor alpha, as R1 did. Instead, +define the algorithms to use +real-if-needed(alpha). Remove +exposition-only concept noncomplex.
    • +
  • +
+

3 Abstract

+

We propose the following changes to [linalg] that improve consistency +of the rank-1, rank-2, rank-k, and rank-2k update functions with the +BLAS.

+
    +
  1. Add “updating” overloads to all the rank-1, rank-2, rank-k, and +rank-2k update functions: general, symmetric, and Hermitian. The new +overloads are analogous to the updating overloads of +matrix_product. For example, +symmetric_matrix_rank_k_update(A, scaled(beta, C), C, upper_triangle) +will perform C := βC + AAT. +This makes the functions consistent with the BLAS’s behavior for nonzero +beta, and also more consistent with the behavior of +matrix_product (of which they are mathematically a special +case).

  2. +
  3. Change the behavior of all the existing rank-1, rank-2, rank-k, +and rank-2k update functions (general, symmetric, and Hermitian) to be +“overwriting” instead of “unconditionally updating.” For example, +symmetric_matrix_rank_k_update(A, C, upper_triangle) will +perform C = AAT +instead of C := C + AAT. +This makes them consistent with the BLAS’s behavior when +beta is zero.

  4. +
  5. For the overloads of hermitian_rank_1_update and +hermitian_rank_k_update that have an alpha +scaling factor parameter, only use +real-if-needed(alpha) in the update. +This ensures that the update will be mathematically Hermitian, and makes +the behavior well defined if alpha has nonzero imaginary +part. The change is also consistent with our proposed resolution for LWG +4136 (“Specify behavior of [linalg] Hermitian algorithms on diagonal +with nonzero imaginary part”).

  6. +
+

Items (2) and (3) are breaking changes to the current Working Draft. +Thus, we must finish this before finalization of C++26.

+

4 Discussion of proposed changes

+

4.1 Support both overwriting and +updating rank-k and rank-2k updates

+

4.1.1 BLAS supports scaling factor +beta; std::linalg currently does not

+

Each function in any section whose label begins with “linalg.algs” +generally corresponds to one or more routines or functions in the +original BLAS (Basic Linear Algebra Subroutines). Every computation that +the BLAS can do, a function in the C++ Standard Library should be able +to do.

+

One std::linalg user +reported +an exception to this rule. The BLAS routine DSYRK +(Double-precision SYmmetric Rank-K update) computes C := βC + αAAT, +but the corresponding std::linalg function +symmetric_matrix_rank_k_update only computes C := C + αAAT. +That is, std::linalg currently has no way to express this +BLAS operation with a general β scaling factor. This issue applies +to all of the symmetric and Hermitian rank-k and rank-2k update +functions.

+
    +
  • symmetric_matrix_rank_k_update: computes C := C + αAAT
  • +
  • hermitian_matrix_rank_k_update: computes C := C + αAAH
  • +
  • symmetric_matrix_rank_2k_update: computes C := C + αABH + αBAH
  • +
  • hermitian_matrix_rank_2k_update: computes C := C + αABH + ᾱBAH, +where ᾱ denotes the complex +conjugate of α
  • +
+

4.1.2 Make these functions +consistent with general matrix product

+

These functions implement special cases of matrix-matrix products. +The matrix_product function in std::linalg +implements the general case of matrix-matrix products. This function +corresponds to the BLAS’s SGEMM, DGEMM, +CGEMM, and ZGEMM, which compute C := βC + αAB, +where α and β are scaling factors. The +matrix_product function has two kinds of overloads:

+
    +
  1. overwriting (C = AB) +and

  2. +
  3. updating (C = E + AB).

  4. +
+

The updating overloads handle the general α and β case by +matrix_product(scaled(alpha, A), B, scaled(beta, C), C). +The specification explicitly permits the input +scaled(beta, C) to alias the output C +([linalg.algs.blas3.gemm] 10: “Remarks: +C may alias E”). The std::linalg +library provides overwriting and updating overloads so that it can do +everything that the BLAS does, just in a more idiomatically C++ way. +Please see +P1673R13 +Section 10.3 (“Function argument aliasing and zero scalar +multipliers”) for a more detailed explanation.

+

4.1.3 Fix requires changing +behavior of existing overloads

+

The problem with the current symmetric and Hermitian rank-k and +rank-2k functions is that they have the same calling syntax as +the overwriting version of matrix_product, but +semantics that differ from both the overwriting and the +updating versions of matrix_product. For example,

+
hermitian_matrix_rank_k_update(alpha, A, C);
+

updates C with C + αAAH, +but

+
matrix_product(scaled(alpha, A), conjugate_transposed(A), C);
+

overwrites C with αAAH. +The current rank-k and rank-2k overloads are not overwriting, so we +can’t just fix this problem by introducing an “updating” overload for +each function.

+

Incidentally, the fact that these functions have “update” in their +name is not relevant, because that naming choice is original to the +BLAS. The BLAS calls its corresponding xSYRK, +xHERK, xSYR2K, and xHER2K +routines “{Symmetric, Hermitian} rank {one, two} update,” even though +setting β = 0 makes these +routines “overwriting” in the sense of std::linalg.

+

4.1.4 Add new updating overloads; +make existing ones overwriting

+

We propose to fix this by making the functions work just like +matrix_vector_product or matrix_product. This +entails three changes.

+
    +
  1. Add two new exposition-only concepts +possibly-packed-in-matrix and +possibly-packed-out-matrix for constraining input +and output parameters of the changed or new symmetric and Hermitian +update functions.

  2. +
  3. Add “updating” overloads of the symmetric and Hermitian rank-k +and rank-2k update functions.

    +
      +
    1. The updating overloads take a new input matrix parameter +E, analogous to the updating overloads of +matrix_product, and make C an output parameter +instead of an in/out parameter. For example, +symmetric_matrix_rank_k_update(A, E, C, upper_triangle) +computes C = E + AAT.

    2. +
    3. Explicitly permit C and E to alias, +thus permitting the desired case where E is +scaled(beta, C).

    4. +
    5. The updating overloads take E as a +possibly-packed-in-matrix, and take C +as a possibly-packed-out-matrix (instead of a +possibly-packed-inout-matrix).

    6. +
    7. E must be accessed as a symmetric or Hermitian +matrix (depending on the function name) and such accesses must use the +same triangle as C. (The existing [linalg.general] 4 +wording for symmetric and Hermitian behavior does not cover +E.)

    8. +
  4. +
  5. Change the behavior of the existing symmetric and Hermitian +rank-k and rank-2k overloads to be overwriting instead of updating.

    +
      +
    1. For example, +symmetric_matrix_rank_k_update(A, C, upper_triangle) will +compute C = AAT +instead of C := C + AAT.

    2. +
    3. Change C from a +possibly-packed-inout-matrix to a +possibly-packed-out-matrix.

    4. +
  6. +
+

Items (2) and (3) are breaking changes to the current Working Draft. +This needs to be so that we can provide the overwriting behavior C := αAAT +or C := αAAH +that the corresponding BLAS routines already provide. Thus, we must +finish this before finalization of C++26.

+

Both sets of overloads still only write to the specified triangle +(lower or upper) of the output matrix C. As a result, the +new updating overloads only read from that triangle of the input matrix +E. Therefore, even though E may be a different +matrix than C, the updating overloads do not need an +additional Triangle t_E parameter for E. The +symmetric_* functions interpret E as symmetric +in the same way that they interpret C as symmetric, and the +hermitian_* functions interpret E as Hermitian +in the same way that they interpret C as Hermitian. +Nevertheless, we do need new wording to explain how the functions may +interpret and access E.

+

4.2 Change rank-1 and rank-2 +updates to be consistent with rank-k and rank-2k

+
    +
  1. Rank-1 and rank-2 updates currently unconditionally update and do +not take a β scaling +factor.

  2. +
  3. We propose making all the rank-1 and rank-2 update functions +consistent with the proposed change to the rank-k and rank-2k updates. +This means both changing the meaning of the current overloads to be +overwriting, and adding new overloads that are updating. This includes +general (nonsymmetric), symmetric, and Hermitian rank-1 update +functions, as well as symmetric and Hermitian rank-2 update +functions.

  4. +
  5. As a result, the exposition-only concept +possibly-packed-inout-matrix is no longer needed. +We propose removing it.

  6. +
+

4.2.1 Current std::linalg +behavior

+

The rank-k and rank-2k update functions have the following rank-1 and +rank-2 analogs, where A +denotes a symmetric or Hermitian matrix (depending on the function’s +name) and x and y denote vectors.

+
    +
  • symmetric_matrix_rank_1_update: computes A := A + αxxT
  • +
  • hermetian_matrix_rank_1_update: computes A := A + αxxH
  • +
  • symmetric_matrix_rank_2_update: computes A := A + αxyT + αyxT
  • +
  • hermitian_matrix_rank_2_update: computes A := A + αxyH + ᾱxyH
  • +
+

These functions unconditionally update the matrix A. They do not have an overwriting +option. In this, they follow the “general” (not necessarily symmetric or +Hermitian) rank-1 update functions.

+
    +
  • matrix_rank_1_update: computes A := A + xyT
  • +
  • matrix_rank_1_update_c: computes A := A + xyH
  • +
+

4.2.2 Current behavior is +inconsistent with BLAS Standard and rank-k and rank-2k updates

+

These six rank-1 and rank-2 update functions map to BLAS routines as +follows.

+
    +
  • matrix_rank_1_update: xGER
  • +
  • matrix_rank_1_update: xGERC
  • +
  • symmetric_matrix_rank_1_update: xSYR, +xSPR
  • +
  • hermitian_matrix_rank_1_update: xHER, +xHPR
  • +
  • hermitian_matrix_rank_1_update: xSYR2, +xSPR2
  • +
  • hermitian_matrix_rank_1_update: xHER2, +xHPR2
  • +
+

The Reference BLAS and the BLAS Standard (see Chapter 2, pp. 64 - 68) +differ here. The Reference BLAS and the original 1988 BLAS 2 paper +specify all of the rank-1 and rank-2 update routines listed above as +unconditionally updating, and not taking a β scaling factor. However, the +(2002) BLAS Standard specifies all of these rank-1 and rank-2 update +functions as taking a β +scaling factor. We consider the latter to express our design intent. It +is also consistent with the corresponding rank-k and rank-2k update +functions in the BLAS, which all take a β scaling factor and thus can do +either overwriting (with zero β) or updating (with nonzero β). These routines include +xSYRK, xHERK, xSYR2K, and +xHER2K. One could also include the general matrix-matrix +product xGEMM among these, as xGEMM also takes +a β scaling factor.

+

4.2.3 This change would remove a +special case in std::linalg’s design

+

Section +10.3 of P1673R13 explains the three ways that the std::linalg design +translates Fortran INTENT(INOUT) arguments into a C++ +idiom.

+
    +
  1. Provide in-place and not-in-place overloads for triangular solve +and triangular multiply.

  2. +
  3. “Else, if the BLAS function unconditionally updates (like +xGER), we retain read-and-write behavior for that +argument.”

  4. +
  5. “Else, if the BLAS function uses a scalar beta argument to decide +whether to read the output argument as well as write to it (like +xGEMM), we provide two versions: a write-only version (as +if beta is zero), and a read-and-write version (as if +beta is nonzero).”

  6. +
+

Our design goal was for functions with vector or matrix output to +imitate std::transform as much as possible. This favors Way +(3) as the default approach, which turns INTENT(INOUT) +arguments on the Fortran side into separate input and output parameters +on the C++ side. Way (2) is really an awkward special case. The BLAS +Standard effectively eliminates this special case on the Fortran side by +making the rank-1 and rank-2 updates work just like the rank-k and +rank-2k updates, with a β +scaling factor. This makes it natural to eliminate the Way (2) special +case on the C++ side as well.

+

4.2.4 Exposition-only concept no +longer needed

+

These changes make the exposition-only concept +possibly-packed-inout-matrix superfluous. We +propose removing it.

+

Note that this would not eliminate all uses of the exposition-only +concept inout-matrix. The in-place triangular +matrix product functions triangular_matrix_left_product and +triangular_matrix_right_product, and the in-place +triangular linear system solve functions +triangular_matrix_matrix_left_solve and +triangular_matrix_matrix_right_solve would still use +inout-matrix.

+

4.3 Use only the real part of +scaling factor alpha for Hermitian matrix rank-1 and rank-k +updates

+

For Hermitian rank-1 and rank-k matrix updates, if users provide a +scaling factor alpha, it must have zero imaginary part. +Otherwise, the matrix update will not be Hermitian. Neither the C++ +Working Draft nor any other proposal in flight requires this. We propose +fixing it by making these update algorithms only use the real part of +alpha, as in +real-if-needed(alpha). This solution +is consistent with our proposed resolution of +LWG Issue 4136, +“Specify behavior of [linalg] Hermitian algorithms on diagonal with +nonzero imaginary part,” where we make Hermitian rank-1 and rank-k +matrix updates use only the real part of matrices’ diagonals.

+

We can think of at least three alternative solutions, and will +explain here why we did not choose them. The first is to constrain +alpha to have a noncomplex number type, the second is to +exclude std::complex<T> but not try to define a +“noncomplex number” constraint, and the third is to impose a +precondition that the imaginary part of alpha is zero.

+

4.3.1 Justification: Scaling factor +needs to be noncomplex, else update may be non-Hermitian

+

The C++ Working Draft already has Scalar alpha overloads +of hermitian_rank_k_update. The Scalar type +currently can be complex. However, if alpha has nonzero +imaginary part, then αAAH +may no longer be a Hermitian matrix, even though AAH is +mathematically always Hermitian. For example, if A is the identity matrix (with ones +on the diagonal and zeros elsewhere) and α = i (the imaginary unit, +which is the square root of negative one), then αAAH +is the diagonal matrix whose diagonal elements are all i. While that matrix is symmetric, +it is not Hermitian, because all elements on the diagonal of a Hermitian +matrix must have nonzero imaginary part. The rank-1 update function +hermitian_rank_1_update has the analogous issue.

+

4.3.2 Alternatives

+

We can think of at least four different ways to solve this problem, +and will explain why we did not choose those solutions.

+
    +
  1. Constrain alpha by defining a generic “noncomplex +number type” constraint

  2. +
  3. Only constrain alpha not to be +std::complex<T>; do not try to define a generic +“noncomplex number” constraint

  4. +
  5. Constrain alpha by default using a generic +“noncomplex number type” constraint, but let users specialize an +“opt-out trait” to tell the library that their number type is +noncomplex

  6. +
  7. Impose a precondition that the imaginary part of +alpha is zero

  8. +
+

If we had to pick one of these solutions, we would favor first (4), +then (3), and then (1). We would object most to (2).

+

4.3.2.1 Alternative: Constrain +alpha via generic “noncomplex” constraint

+

The BLAS solves this problem by having the Hermitian rank-1 update +routines xHER and rank-k update routines xHERK +take the scaling factor α as a +noncomplex number. This suggests constraining alpha’s type +to be noncomplex. However, [linalg] accepts user-defined number types, +including user-defined complex number types. How do we define a +“noncomplex number type”? If we get it wrong and say that a number type +is complex when its “imaginary part” is always zero, we end up excluding +a perfectly valid custom number type.

+

For number types that have an additive identity (like zero for the +integers and reals), it’s mathematically correct to treat those as +“complex numbers” whose imaginary parts are always zero. This is what +conjugated_accessor does if you give it a number type for +which ADL cannot find conj. P3050 optimizes +conjugated for such number types by bypassing +conjugated_accessor and using default_accessor +instead, but this is just a code optimization. Therefore, it can afford +to be conservative: if a number type might be complex, then +conjugated needs to use conjugated_accessor. +This is why P3050’s approach doesn’t apply here. If a number type has +ADL-findable conj, real, and +imag, then it might be complex. However, if we +define the opposite of that as “noncomplex,” then we might be preventing +users from using otherwise reasonable number types.

+

The following MyRealNumber example always has zero +imaginary part, but nevertheless has ADL-findable conj, +real, and imag. Furthermore, it has a +constructor for which MyRealNumber(1.2, 3.4) is +well-formed. (This is an unfortunate design choice; making +precision have class type that is not implicitly +convertible from double would be wiser, so that users would +have to type MyRealNumber(1.2, Precision(42)).) As a +result, there is no reasonable way to tell at compile time if +MyRealNumber might represent a complex number.

+
class MyRealNumber {
+public:
+  explicit MyRealNumber(double initial_value);
+  // precision represents the amount of storage
+  // used by an arbitrary-precision real number object
+  explicit MyRealNumber(double initial_value, int precision);
+
+  MyRealNumber(); // result is the additive identity "zero"
+
+  // ... other members to make MyRealNumber regular ...
+  // ... hidden friend overloaded arithmetic operators ...
+
+  friend MyRealNumber conj(MyRealNumber x) { return  x; }
+  friend MyRealNumber real(MyRealNumber x) { return  x; }
+  friend MyRealNumber imag(MyRealNumber)   { return {}; }
+};
+

It’s reasonable to write custom noncomplex number types that define +ADL-findable conj, real, and +imag. First, users may want to write or use libraries of +generic numerical algorithms that work for both complex and noncomplex +number types. P1673 argues that defining conj to be +type-preserving (unlike std::conj in the C++ Standard +Library) makes this possible. For example, Appendix A below shows how to +implement a generic two-norm absolute value (or magnitude, for complex +numbers) function, using this interface. Second, the Standard Library +might lead users to write a type like this. std::conj +accepts arguments of any integer or floating-point type, none of which +represent complex numbers. The Standard makes the unfortunate choice for +std::conj of an integer type to return +std::complex<double>. However, users who try to make +conj(MyRealNumber) return +std::complex<MyRealNumber> would find out that +std::complex<MyRealNumber> does not compile, because +std::complex<T> requires that T be a +floating-point type. The least-effort next step would be to make +conj(MyRealNumber) return MyRealNumber.

+

We want rank-1 and rank-k Hermitian matrix updates to work with types +like MyRealNumber, but any reasonable way to constrain the +type of alpha would exclude MyRealNumber.

+

4.3.2.2 Alternative: Only constrain +alpha not to be std::complex

+

A variant of this suggestion would be only to constrain +alpha not to be std::complex<T>, and not +try to define a generic “noncomplex number” constraint. However, this +would break generic numerical algorithms by making valid code for the +non-Standard complex number case invalid code for +std::complex<T>. We do not want [linalg] to treat +custom complex number types differently than +std::complex.

+

4.3.2.3 Alternative: Constrain +alpha by default, but let users “opt out”

+

Another option would be to constrain alpha by default +using the same generic “noncomplex number type” constraint as in Option +(1), but let users specialize an “opt-out trait” to tell the library +that their number type is noncomplex. We would add a new “tag” type +trait is_noncomplex that users can specialize. By default, +is_noncomplex<T>::value is false for any +type T. This does not mean that the type is +complex, just that the user declares their type to be noncomplex. The +distinction matters, because a noncomplex number type might still +provide ADL-findable conj, real, and +imag, as we showed above. Users must take positive action +to declare their type U as “not a complex number type,” by +specializing is_noncomplex<U> so that +is_noncomplex<U>::value is true. If +users do that, then the library will ignore any ADL-findable functions +conj, real, and imag (whether or +not they exist), and will assume that the number type is noncomplex.

+

Standard Library precedent for this approach is in heterogeneous +lookup for associative containers (see N3657 for ordered associative +containers, and P0919 and P1690 for unordered containers). User-defined +hash functions and key equality comparison functions can tell the +container to provide heterogeneous comparisons by exposing a +static constexpr bool is_transparent whose value is +true. Default behavior does not expose heterogeneous +comparisons. Thus, users must opt in at compile time to assert something +about their user-defined types.

+

Of the three constraint-based approaches discussed in this proposal, +we favor this one the most. It still treats types “as they are” and does +not permit users to claim that a type is complex when it lacks the +needed operations, but it lets users optimize by giving the Standard +Library a single bit of compile-time information. By default, any linear +algebra value type (see [linalg.reqs.val]) that meets the +maybe_complex concept below would be considered “possibly +complex.” Types that do not meet this concept would result in +compilation errors; users would then be able to search documentation or +web pages to find out that they need to specialize +is_noncomplex.

+
template<class T>
+concept maybe_complex =
+  std::semiregular<T> &&
+  requires(T t) {
+    {conj(t)} -> T;
+    {real(t)} -> std::convertible_to<T>;
+    {imag(t)} -> std::convertible_to<T>;
+

P1673 generally avoids approaches based on specializing traits. Its +design philosophy favors treating types as they are. Users should not +need to do something to get correct behavior. We based this off our past +experiences in generic numerical algorithms development. In the 2010’s, +one of the authors maintained a generic mathematical algorithms library +called Trilinos. The Teuchos (pronounced “TEFF-os”) package of Trilinos +provides a monolithic ScalarTraits class template that +defines different properties of a number type. It combines the features +of std::numeric_limits with generic complex arithmetic +operations like conjugate, real, and +imag. Trilinos’ generic algorithms assume that number types +are regular and define overloaded +, -, +*, and /, but use +ScalarTraits<T>::conjugate, +ScalarTraits<T>::real, and +ScalarTraits<T>::imag. As a result, users with a +custom complex number type had to specialize ScalarTraits +and provide all these operations. Even if users had imitated +std::complex’s interface perfectly and provided +ADL-findable conj, real, and +imag, users had to do extra work to make Trilinos compile +and run correctly for their numbers. With P1673, we decided instead that +users who define a custom complex number type with an interface +sufficiently like std::complex should get reasonable +behavior without needing to do anything else.

+

As a tangent, we would like to comment on the monolithic design of +Teuchos::ScalarTraits. The monolithic design was partly an +imitation of std::numeric_limits, and partly a side effect +of a requirement to support pre-C++11 compilers that did not permit +partial specialization of function templates. (The typical pre-C++11 +work-around is to define an unspecialized function template that +dispatches to a possibly specialized class template.) C++11 permits +partial specialization of function templates and C++14 introduces +variable templates; these features have encouraged “breaking up” +monolithic traits classes into separate traits. Our paper P1370R1 +(“Generic numerical algorithm development with(out) +numeric_limits”) aligns with this trend.

+

4.3.2.4 Alternative: Impose +precondition on alpha

+

Another option would be to impose a precondition that +imag-if-needed(alpha) is zero. +However, this would be inconsistent with our proposed resolution of +LWG Issue 4136, +“Specify behavior of [linalg] Hermitian algorithms on diagonal with +nonzero imaginary part”. WG21 members have expressed wanting +fewer preconditions and less undefined behavior in the +Standard Library.

+

If users call Hermitian matrix rank-1 or rank-k updates with +alpha being std::complex<float> or +std::complex<double>, implementations of [linalg] +that call an underlying C or Fortran BLAS would have to get the real +part of alpha anyway, because these BLAS routines only take +alpha as a real type. Thus, our proposed solution – to +define the behavior of the update algorithms as using +real-if-needed(alpha) – would not add +overhead.

+

4.4 Things relating to scaling +factors that we do not propose changing

+

4.4.1 Nothing wrong with rank-2 or +rank-2k updates

+

This issue does not arise with the rank-2 or rank-2k +updates. In the BLAS, the rank-2 updates xHER2 and the +rank-2k updates xHER2K all take alpha as a +complex number. The corresponding [linalg] functions do not take a +separate alpha parameter, because users can pass it in via +scaled, attached either to A or +B: e.g., scaled(alpha, A). The matrix αABH + ᾱBAH +is Hermitian by construction, no matter the value of α.

+

4.4.2 Nothing wrong with scaling +factor beta

+

Both the current wording and the proposed changes to all these update +functions behave correctly with respect to beta.

+

For the new updating overloads of +hermitian_rank_1_update and +hermitian_rank_k_update, [linalg] expresses a +beta scaling factor by letting users supply +scaled(beta, C) as the argument for E. The +wording only requires that E be Hermitian. If +E is scaled(beta, C), this concerns only the +product of beta and C. It would be incorrect +to constrain beta or C separately. For +example, if β =  − i +and C is the matrix whose +elements are all i, then C is not Hermitian but βC (and therefore +scaled(beta, C)) is Hermitian.

+

This issue does not arise with the rank-2k updates. For +example, the BLAS routine xHER2K takes beta as +a real number. The previous paragraph’s reasoning for beta +applies here as well.

+

This issue also does not arise with the rank-2 updates. In the +Reference BLAS, the rank-2 update routines xHER2 do not +have a way to supply beta, while in the BLAS Standard, +xHER2 does take beta. The BLAS +Standard says that “α is a +complex scalar and and [sic] β +is a real scalar.” The Fortran 77 and C bindings specify the type of +beta as real (<rtype> resp. +RSCALAR_IN), but the Fortran 95 binding lists both +alpha and beta as +COMPLEX(<wp>). The type of beta in the +Fortran 95 is likely a typo, considering the wording.

+

4.4.3 Nothing wrong with Hermitian +matrix-vector and matrix-matrix products

+

In our view, the current behavior of +hermitian_matrix_vector_product and +hermitian_matrix_product is correct with respect to the +alpha scaling factor. These functions do not need extra +overloads that take Scalar alpha. Users who want to supply +alpha with nonzero imaginary part should not scale +the matrix A (as in scaled(alpha, A)). +Instead, they should scale the input vector x, as in the +following.

+
auto alpha = std::complex{0.0, 1.0};
+hermitian_matrix_vector_product(A, upper_triangle, scaled(alpha, x), y);
+

4.4.3.1 In BLAS, matrix is +Hermitian, but scaled matrix need not be

+

In Chapter 2 of the BLAS Standard, both xHEMV and +xHEMM take the scaling factors α and β as complex numbers +(COMPLEX<wp>, where <wp> +represents the current working precision). The BLAS permits +xHEMV or xHEMM to be called with +alpha whose imaginary part is nonzero. The matrix that the +BLAS assumes to be Hermitian is A, not αA. Even if A is Hermitian, αA might not necessarily be +Hermitian. For example, if A +is the identity matrix (diagonal all ones) and α is i, then αA is not Hermitian but +skew-Hermitian.

+

The current [linalg] wording requires that the input matrix be +Hermitian. This excludes using scaled(alpha, A) as the +input matrix, where alpha has nonzero imaginary part. For +example, the following gives mathematically incorrect results.

+
auto alpha = std::complex{0.0, 1.0};
+hermitian_matrix_vector_product(scaled(alpha, A), upper_triangle, x, y);
+

Note that the behavior of this is still otherwise well defined, at +least after applying the fix proposed in LWG4136 for diagonal elements +with nonzero imaginary part. It does not violate a precondition. +Therefore, the Standard has no way to tell the user that they did +something wrong.

+

4.4.3.2 Status quo permits scaling +via the input vector

+

The current wording permits introducing the scaling factor +alpha through the input vector.

+
auto alpha = std::complex{0.0, 1.0};
+hermitian_matrix_vector_product(A, upper_triangle, scaled(alpha, x), y);
+

This is fine as long as αAx equals Aαx, that is, as +long as α commutes with the +elements of A. This issue would only be of concern if those +multiplications might be noncommutative. We’re already well outside the +world of “types that do ordinary arithmetic with +std::complex.” This scenario would only arise given a +user-defined complex type, like +user_complex<user_noncommutative> in the example +below, whose real parts have noncommutative multiplication.

+
template<class T>
+class user_complex {
+public:
+  user_complex(T re, T im) : re_(re), im_(im) {}
+
+  // ... overloaded arithmetic operators ...
+
+  friend T real(user_complex<T> z) { return z.re_; }
+  friend T imag(user_complex<T> z) { return z.im_; } 
+  friend user_complex<T> conj(user_complex<T> z) {
+    return {real(z), -imag(z)};
+  }
+
+private:
+  T re_, im_;
+};
+
+auto alpha = user_complex<user_noncommutative>{something, something_else};
+hermitian_matrix_vector_product(N, upper_triangle, scaled(alpha, x), y);
+

The [linalg] library was designed to support element types with +noncommutative multiplication. On the other hand, generally, if we speak +of Hermitian matrices or even of inner products (which are used to +define Hermitian matrices), we’re working in a vector space. This means +that multiplication of the matrix’s elements is commutative. Anything +more general than that is far beyond what the BLAS can do. Thus, we +think restricting use of alpha with nonzero imaginary part +to scaled(alpha, x) is not so onerous.

+

4.4.3.3 Scaling via the input +vector is weird, but the least bad choice

+

Many users may not like the status quo of needing to scale +x instead of A. First, it differs from the +BLAS, which puts alpha before A in its +xHEMV and xHEMM function arguments. Second, +users would get no compile-time feedback and likely no run-time feedback +if they scale A instead of x. Their code would +compile and produce correct results for almost all matrix-vector or +matrix-matrix products, except for the Hermitian case, and +only when the scaling factor has a nonzero imaginary part. +However, we still think the status quo is the least bad choice. We +explain why by discussing the following alternatives.

+
    +
  1. Treat scaled(alpha, A) as a special case: expect +A to be Hermitian and permit alpha to have +nonzero imaginary part

  2. +
  3. Forbid scaled(alpha, A) at compile time, so that +users must scale x instead

  4. +
  5. Add overloads that take alpha, analogous to the +rank-1 and rank-k update functions

  6. +
+

The first choice is mathematically incorrect, as we will explain +below. The second is not incorrect, but could only catch some errors. +The third is likewise not incorrect, but would add a lot of overloads +for an uncommon use case, and would still not prevent users from scaling +the matrix in mathematically incorrect ways.

+
4.4.3.3.1 Bad choice: Special cases +for scaling the matrix
+

“Treat scaled(alpha, A) as a special case” actually +means three special cases, given some nested accessor type +Acc and a scaling factor alpha of type +Scalar.

+
    +
  1. scaled(alpha, A), whose accessor type is +scaled_accessor<Scalar, Acc>

  2. +
  3. conjugated(scaled(alpha, A)), whose accessor type is +conjugated_accessor<scaled_accessor<Scalar, Acc>>

  4. +
  5. scaled(alpha, conjugated(A)), whose accessor type is +scaled_accessor<Scalar, conjugated_accessor<Acc>>

  6. +
+

One could replace conjugated with +conjugate_transposed (which we expect to be more common in +practice) without changing the accessor type.

+

This approach violates the fundamental [linalg] principle that “… +each mdspan parameter of a function behaves as itself and +is not otherwise ‘modified’ by other parameters” (Section 10.2.5, +P1673R13). The behavior of [linalg] is agnostic of specific accessor +types, even though implementations likely have optimizations for +specific accessor types. [linalg] takes this approach for consistency, +even where it results in different behavior from the BLAS (see Section +10.5.2 of P1673R13). The application of this principle here is “the +input parameter A is always Hermitian.”

+

In this case, the consistency matters for mathematical correctness. +What if scaled(alpha, A) is Hermitian, but A +itself is not? An example is α =  − i and A is the 2 x 2 matrix whose elements +are all i. If we treat +scaled_accessor as a special case, then +hermitian_matrix_vector_product will compute different +results.

+

Another problem is that users are permitted to define their own +accessor types, including nested accessors. Arbitrary user accessors +might have scaled_accessor somewhere in that nesting, or +they might have the effect of scaled_accessor +without being called that. Thus, we can’t detect all possible ways that +the matrix might be scaled.

+
4.4.3.3.2 Not-so-good choice: +Forbid scaling the matrix at compile time
+

Hermitian matrix-matrix and matrix-vector product functions could +instead forbid scaling the matrix at compile time, at least for +the three accessor type cases listed in the previous option.

+
    +
  1. scaled_accessor<Scalar, Acc>

  2. +
  3. conjugated_accessor<scaled_accessor<Scalar, Acc>>

  4. +
  5. scaled_accessor<Scalar, conjugated_accessor<Acc>>

  6. +
+

Doing this would certainly encourage correct behavior for the most +common cases. However, as mentioned above, users are permitted to define +their own accessor types, including nested accessors. Thus, we can’t +detect all ways that an arbitrary, possibly user-defined accessor might +scale the matrix. Furthermore, scaling the matrix might still be +mathematically correct. A scaling factor with nonzero imaginary part +might even make the matrix Hermitian. Applying i as a scaling factor twice would +give a perfectly valid scaling factor of  − 1.

+
4.4.3.3.3 Not-so-good choice: Add +alpha overloads
+

One could imagine adding overloads that take alpha, +analogous to the rank-1 and rank-k update overloads that take +alpha. Users could then write

+
hermitian_matrix_vector_product(alpha, A, upper_triangle, x, y);
+

instead of

+
hermitian_matrix_vector_product(A, upper_triangle, scaled(alpha, x), y);
+

We do not support this approach. First, it would introduce many +overloads, without adding to what the existing overloads can accomplish. +(Users can already introduce the scaling factor alpha +through x.) Contrast this with the rank-1 and rank-k +Hermitian update functions, where not having alpha +overloads might break simple cases. Here are two examples.

+
    +
  1. If the matrix and vector element types and the scaling factor +α = 2 are all integers, then +the update C := C − 2xxH +can be computed using integer types and arithmetic with an +alpha overload. However, without an alpha +overload, the user would need to use scaled(sqrt(2), x) as +the input vector, thus forcing use of floating-point arithmetic that may +give inexact results.

  2. +
  3. The update C := C − xxH +would require resorting to complex arithmetic, as the only way to +express  − xxH +with the same scaling factor for both vectors is (ix)(ix)H.

  4. +
+

Second, alpha overloads would not prevent users from +also supplying scaled(gamma, A) as the matrix for +some other scaling factor gamma. Thus, instead of solving +the problem, the overloads would introduce more possibilities for +errors.

+

4.4.4 Triangular matrix products, +unit diagonals, and scaling factors

+
    +
  1. In BLAS, triangular matrix-vector and matrix-matrix products +apply alpha scaling to the implicit unit diagonal. In +[linalg], the scaling factor alpha is not applied to the +implicit unit diagonal. This is because the library does not interpret +scaled(alpha, A) differently than any other +mdspan.

  2. +
  3. Users of triangular matrix-vector products can recover BLAS +functionality by scaling the input vector instead of the input matrix, +so this only matters for triangular matrix-matrix products.

  4. +
  5. All calls of the BLAS’s triangular matrix-matrix product routine +xTRMM in LAPACK (other than in testing routines) use +alpha equal to one.

  6. +
  7. Straightforward approaches for fixing this issue would not break +backwards compatibility.

  8. +
  9. Therefore, we do not consider fixing this a high-priority issue, +and we do not propose a fix for it in this paper.

  10. +
+

4.4.4.1 BLAS applies alpha after +unit diagonal; linalg applies it before

+

The triangular_matrix_vector_product and +triangular_matrix_product algorithms have an +implicit_unit_diagonal option. This makes the algorithm not +access the diagonal of the matrix, and compute as if the diagonal were +all ones. The option corresponds to the BLAS’s “Unit” flag. BLAS +routines that take both a “Unit” flag and an alpha scaling +factor apply “Unit” before scaling by alpha, so +that the matrix is treated as if it has a diagonal of all +alpha values. In contrast, [linalg] follows the general +principle that scaled(alpha, A) should be treated like any +other kind of mdspan. As a result, algorithms interpret +implicit_unit_diagonal as applied to the matrix +after scaling by alpha, so that the matrix still +has a diagonal of all ones.

+

4.4.4.2 Triangular solve algorithms +not affected

+

The triangular solve algorithms in std::linalg are not affected, +because their BLAS analogs either do not take an alpha +argument (as with xTRSV), or the alpha +argument does not affect the triangular matrix (with xTRSM, +alpha affects the right-hand sides B, not the +triangular matrix A).

+

4.4.4.3 Triangular matrix-vector +product work-around

+

This issue only reduces functionality of +triangular_matrix_product. Users of +triangular_matrix_vector_product who wish to replicate the +original BLAS functionality can scale the input matrix (by supplying +scaled(alpha, x) instead of x as the input +argument) instead of the triangular matrix.

+

4.4.4.4 Triangular matrix-matrix +product example

+

The following example computes A := 2AB where +A is a lower triangular +matrix, but it makes the diagonal of A all ones on the input (right-hand) +side.

+
triangular_matrix_product(scaled(2.0, A), lower_triangle, implicit_unit_diagonal, B, A);
+

Contrast with the analogous BLAS routine DTRMM, which +has the effect of making the diagonal elements all 2.0.

+
dtrmm('Left side', 'Lower triangular', 'No transpose', 'Unit diagonal', m, n, 2.0, A, lda, B, ldb)
+

If we want to use [linalg] to express what the BLAS call expresses, +we need to perform a separate scaling.

+
triangular_matrix_product(A, lower_triangle, implicit_unit_diagonal, B, A);
+scale(2.0, A);
+

This is counterintuitive, and may also affect performance. +Performance of scale is typically bound by memory bandwidth +and/or latency, but if the work done by scale could be +fused with the work done by the triangular_matrix_product, +then scale’s memory operations could be “hidden” in the +cost of the matrix product.

+

4.4.4.5 LAPACK never calls +xTRMM with the implicit unit diagonal option and +alpha not equal to one

+

How much might users care about this missing [linalg] feature? +P1673R13 explains that the BLAS was codesigned with LAPACK and that +every reference BLAS routine is used by some LAPACK routine. “The BLAS +does not aim to provide a complete set of mathematical operations. Every +function in the BLAS exists because some LINPACK or LAPACK algorithm +needs it” (Section 10.6.1). Therefore, to judge the urgency of adding +new functionality to [linalg], we can ask whether the functionality +would be needed by a C++ re-implementation of LAPACK. We think not much, +because the highest-priority target audience of the BLAS is LAPACK +developers, and LAPACK routines (other than testing routines) never use +a scaling factor alpha other than one.

+

We survey calls to xTRMM in the latest version of LAPACK +as of the publication date of R1 of this proposal, LAPACK 3.12.0. It +suffices to survey DTRMM, the double-precision real case, +since for all the routines of interest, the complex case follows the +same pattern. (We did survey ZTRMM, the double-precision +complex case, just in case.) LAPACK has 24 routines that call +DTRMM directly. They fall into five categories.

+
    +
  1. Test routines: DCHK3, DCHKE, +DLARHS

  2. +
  3. Routines relating to QR factorization or using the result of a QR +factorization (especially with block Householder reflectors): +DGELQT3, DLARFB, DGEQRT3, +DLARFB_GETT, DLARZB, +DORM22

  4. +
  5. Routines relating to computing an inverse of a triangular matrix +or of a matrix that has been factored into triangular matrices: +DLAUUM, DTRITRI, DTFTRI, +DPFTRI

  6. +
  7. Routines relating to solving eigenvalue (or generalized +eigenvalue) problems: DLAHR2, DSYGST, +DGEHRD, DSYGV, DSYGV_2STAGE, +DSYGVD, DSYGVX (note that DLAQR5 +depends on DTRMM via EXTERNAL declaration, but +doesn’t actually call it)

  8. +
  9. Routines relating to symmetric indefinite factorizations: +DSYT01_AA, DSYTRI2X, +DSYTRI_3X

  10. +
+

The only routines that call DTRMM with +alpha equal to anything other than one or negative one are +the testing routines. Some calls in DGELQT3 and +DLARFB_GETT use negative one, but these calls never specify +an implicit unit diagonal (they use the explicit diagonal option). The +only routine that might possibly call DTRMM with both +negative one as alpha and the implicit unit diagonal is +DTFTRI. (This routine “computes the inverse of a triangular +matrix A stored in RFP [Rectangular Full Packed] format.” RFP format was +introduced to LAPACK in the late 2000’s, well after the BLAS Standard +was published. See +LAPACK +Working Note 199, which was published in 2008.) DTFTRI +passes its caller’s diag argument (which specifies either +implicit unit diagonal or explicit diagonal) to DTRMM. The +only two LAPACK routines that call DTFTRI are +DERRRFP (a testing routine) and DPFTRI. +DPFTRI only ever calls DTFTRI with +diag not specifying the implicit unit diagonal +option. Therefore, LAPACK never needs both alpha not equal +to one and the implicit unit diagonal option, so adding the ability to +“scale the implicit diagonal” in [linalg] is a low-priority feature.

+

4.4.4.6 Fixes would not break +backwards compatibility

+

We can think of two ways to fix this issue. First, we could add an +alpha scaling parameter, analogous to the symmetric and +Hermitian rank-1 and rank-k update functions. Second, we could add a new +kind of Diagonal template parameter type that expresses a +“diagonal value.” For example, implicit_diagonal_t{alpha} +(or a function form, implicit_diagonal(alpha)) would tell +the algorithm not to access the diagonal elements, but instead to assume +that their value is alpha. Both of these solutions would +let users specify the diagonal’s scaling factor separately from the +scaling factor for the rest of the matrix. Those two scaling factors +could differ, which is new functionality not offered by the BLAS. More +importantly, both of these solutions could be added later, after C++26, +without breaking backwards compatibility.

+

4.4.5 Triangular solves, unit +diagonals, and scaling factors

+
    +
  1. In BLAS, triangular solves with possibly multiple right-hand +sides (xTRSM) apply alpha scaling to the +implicit unit diagonal. In [linalg], the scaling factor +alpha is not applied to the implicit unit diagonal. This is +because the library does not interpret scaled(alpha, A) +differently than any other mdspan.

  2. +
  3. Users of triangular solves would need a separate +scale call to recover BLAS functionality.

  4. +
  5. LAPACK sometimes calls xTRSM with alpha +not equal to one.

  6. +
  7. Straightforward approaches for fixing this issue would not break +backwards compatibility.

  8. +
  9. Therefore, we do not consider fixing this a high-priority issue, +and we do not propose a fix for it in this paper.

  10. +
+

4.4.5.1 BLAS applies alpha after +unit diagonal; linalg applies it before

+

Triangular solves have a similar issue to the one explained in the +previous section. The BLAS routine xTRSM applies +alpha “after” the implicit unit diagonal, while std::linalg +applies alpha “before.” (xTRSV does not take +an alpha scaling factor.) As a result, the BLAS solves with +a different matrix than std::linalg.

+

In mathematical terms, xTRSM solves the equation α(A+I)X = B +for X, where A is the user’s input matrix +(without implicit unit diagonal) and I is the identity matrix (with ones +on the diagonal and zeros everywhere else). +triangular_matrix_matrix_left_solve solves the equation +(αA+I)Y = B +for Y. The two results X and Y are not equal in general.

+

4.4.5.2 Work-around requires +changing all elements of the matrix

+

Users could work around this problem by first scaling the matrix +A by α, and then solving for Y. In the common case where the +“other triangle” of A holds +another triangular matrix, users could not call +scale(alpha, A). They would instead need to iterate over +the elements of A manually. +Users might also need to “unscale” the matrix after the solve. Another +option would be to copy the matrix A before scaling.

+
for (size_t i = 0; i < A.extent(0); ++i) {
+  for (size_t j = i + 1; j < A.extent(1); ++j) {
+    A[i, j] *= alpha;
+  }
+}
+triangular_matrix_matrix_left_solve(A, lower_triangle, implicit_unit_diagonal, B, Y);
+for (size_t i = 0; i < A.extent(0); ++i) {
+  for (size_t j = i + 1; j < A.extent(1); ++j) {
+    A[i, j] /= alpha;
+  }
+}
+

Users cannot solve this problem by scaling B (either with +scaled(1.0 / alpha, B) or with +scale(1.0 / alpha, B)). Transforming X into Y or vice versa is mathematically +nontrivial in general, and may introduce new failure conditions. This +issue occurs with both the in-place and out-of-place triangular +solves.

+

4.4.5.3 Unsupported case occurs in +LAPACK

+

The common case in LAPACK is calling xTRSM with +alpha equal to one, but other values of alpha +occur. For example, xTRTRI calls xTRSM with +alpha equal to  − 1. Thus, +we cannot dismiss this issue, as we could with xTRMM.

+

4.4.5.4 Fixes would not break +backwards compatibility

+

As with triangular matrix products above, we can think of two ways to +fix this issue. First, we could add an alpha scaling +parameter, analogous to the symmetric and Hermitian rank-1 and rank-k +update functions. Second, we could add a new kind of +Diagonal template parameter type that expresses a “diagonal +value.” For example, implicit_diagonal_t{alpha} (or a +function form, implicit_diagonal(alpha)) would tell the +algorithm not to access the diagonal elements, but instead to assume +that their value is alpha. Both of these solutions would +let users specify the diagonal’s scaling factor separately from the +scaling factor for the rest of the matrix. Those two scaling factors +could differ, which is new functionality not offered by the BLAS. More +importantly, both of these solutions could be added later, after C++26, +without breaking backwards compatibility.

+

5 Ordering with respect to other +proposals and LWG issues

+

We currently have two other std::linalg fix papers in +review.

+
    +
  • P3222: Fix C++26 by adding transposed special cases +for P2642 layouts (forwarded by LEWG to LWG on 2024-08-27 pending +electronic poll results)

  • +
  • P3050: “Fix C++26 by optimizing linalg::conjugated +for noncomplex value types” (forwarded by LEWG to LWG on 2024-09-03 +pending electronic poll results)

  • +
+

LEWG was aware of these two papers and this pending paper P3371 in +its 2024-09-03 review of P3050R2. All three of these papers increment +the value of the __cpp_lib_linalg macro. While this +technically causes a conflict between the papers, advice from LEWG on +2024-09-03 was not to introduce special wording changes to avoid this +conflict.

+

We also have two outstanding LWG issues.

+
    +
  • LWG4136 +specifies the behavior of Hermitian algorithms on diagonal matrix +elements with nonzero imaginary part. (As the BLAS Standard specifies +and the Reference BLAS implements, the Hermitian algorithms do not +access the imaginary parts of diagonal elements, and assume they are +zero.) In our view, P3371 does not conflict with LWG4136.

  • +
  • LWG4137, +“Fix Mandates, Preconditions, and Complexity elements of [linalg] +algorithms,” affects several sections touched by this proposal, +including [linalg.algs.blas3.rankk] and [linalg.algs.blas3.rank2k]. We +consider P3371 rebased atop the wording changes proposed by LWG4137. +While the wording changes may conflict in a formal (“diff”) sense, it is +our view that they do not conflict in a mathematical or specification +sense.

  • +
+

6 Implementation status

+

The following function overload sets need changing.

+
    +
  • matrix_rank_1_update
  • +
  • matrix_rank_1_update_c
  • +
  • symmetric_matrix_rank_1_update
  • +
  • hermitian_matrix_rank_1_update
  • +
  • symmetric_matrix_rank_2_update
  • +
  • hermitian_matrix_rank_2_update
  • +
  • symmetric_matrix_rank_k_update
  • +
  • hermitian_matrix_rank_k_update
  • +
  • symmetric_matrix_rank_2k_update
  • +
  • hermitian_matrix_rank_2k_update
  • +
+

As of 2024/10/04, +Pull request +293 in the reference std::linalg implementation implements changes +to the following functions, and adds tests to ensure test coverage of +the new overloads.

+
    +
  • matrix_rank_1_update
  • +
  • matrix_rank_1_update_c
  • +
  • symmetric_matrix_rank_1_update
  • +
  • hermitian_matrix_rank_1_update
  • +
+

7 Acknowledgments

+

Many thanks (with permission) to Raffaele Solcà (CSCS Swiss National +Supercomputing Centre, raffaele.solca@cscs.ch) for pointing +out some of the issues fixed by this paper, as well as the issues +leading to LWG4137.

+

8 Wording

+
+

Text in blockquotes is not proposed wording, but rather instructions +for generating proposed wording. The � character is used to denote a +placeholder section number which the editor shall determine.

+

In [version.syn], for the following definition,

+
+
#define __cpp_lib_linalg YYYYMML // also in <linalg>
+
+

adjust the placeholder value YYYYMML as needed so as to +denote this proposal’s date of adoption.

+
+

8.1 New exposition-only concepts +for possibly-packed input and output matrices

+
+

Add the following lines to the header <linalg> +synopsis [linalg.syn], just after the declaration of +the exposition-only concept inout-matrix and +before the declaration of the exposition-only concept +possibly-packed-inout-matrix. [Editorial +Note: This addition is not shown in green, becuase the authors could +not convince Markdown to format the code correctly. – end +note]

+
+
  template<class T>
+    concept possibly-packed-in-matrix = see below;     // exposition only
+
+  template<class T>
+    concept possibly-packed-out-matrix = see below;     // exposition only
+
+

Then, remove the declaration of the exposition-only concept +possibly-packed-inout-matrix from the header +<linalg> synopsis [linalg.syn].

+
+
+

Then, add the following lines to +[linalg.helpers.concepts], just after the definition of +the exposition-only variable template +is-layout_blas_packed and just before the +definition of the exposition-only concept +possibly-packed-inout-matrix. [Editorial +Note: This addition is not shown in green, becuase the authors could +not convince Markdown to format the code correctly. – end +note]

+
+
  template<class T>
+    concept possibly-packed-in-matrix =
+      is-mdspan<T> && T::rank() == 2 &&
+      (T::is_always_unique() || is-layout-blas-packed<typename T::layout_type>);
+
+  template<class T>
+    concept possibly-packed-out-matrix =
+      is-mdspan<T> && T::rank() == 2 &&
+      is_assignable_v<typename T::reference, typename T::element_type> &&
+      (T::is_always_unique() || is-layout-blas-packed<typename T::layout_type>);
+
+

Then, remove the definition of the exposition-only concept +possibly-packed-inout-matrix from +[linalg.helpers.concepts].

+
+
+

Then, in [linalg.helpers.concepts], change paragraph 3 to read as +follows (new content “or +possibly-packed-out-matrix” in green; removed +content “or possibly-packed-inout-matrix” in +red).

+
+

Unless explicitly permitted, any inout-vector, +inout-matrix, inout-object, +out-vector, out-matrix, +out-object, or +possibly-packed-out-matrix, or +possibly-packed-inout-matrix parameter of a +function in [linalg] shall not overlap any other mdspan +parameter of the function.

+

8.2 Rank-1 update functions in +synopsis

+
+

In the header <linalg> synopsis +[linalg.syn], replace all the declarations of all the +matrix_rank_1_update, matrix_rank_1_update_c, +symmetric_matrix_rank_1_update, and +hermitian_matrix_rank_1_update overloads to read as +follows. [Editorial Note: There are two changes here. First, the +existing overloads become “overwriting” overloads. Second, new +“updating” overloads are added.

+

Changes do not use red or green highlighting, becuase the authors +could not convince Markdown to format the code correctly. – end +note]

+
+
  // [linalg.algs.blas2.rank1], nonsymmetric rank-1 matrix update
+
+  // overwriting nonsymmetric rank-1 matrix update
+  template<in-vector InVec1, in-vector InVec2, out-matrix OutMat>
+    void matrix_rank_1_update(InVec1 x, InVec2 y, OutMat A);
+  template<class ExecutionPolicy, in-vector InVec1, in-vector InVec2, out-matrix OutMat>
+    void matrix_rank_1_update(ExecutionPolicy&& exec,
+                              InVec1 x, InVec2 y, OutMat A);
+
+  template<in-vector InVec1, in-vector InVec2, out-matrix OutMat>
+    void matrix_rank_1_update_c(InVec1 x, InVec2 y, OutMat A);
+  template<class ExecutionPolicy, in-vector InVec1, in-vector InVec2, out-matrix OutMat>
+    void matrix_rank_1_update_c(ExecutionPolicy&& exec,
+                                InVec1 x, InVec2 y, OutMat A);
+
+  // updating nonsymmetric rank-1 matrix update
+  template<in-vector InVec1, in-vector InVec2, in-matrix InMat, out-matrix OutMat>
+    void matrix_rank_1_update(InVec1 x, InVec2 y, InMat E, OutMat A);
+  template<class ExecutionPolicy, in-vector InVec1, in-matrix InMat, in-vector InVec2, out-matrix OutMat>
+    void matrix_rank_1_update(ExecutionPolicy&& exec,
+                              InVec1 x, InVec2 y, InMat E, OutMat A);
+
+  template<in-vector InVec1, in-vector InVec2, in-matrix InMat, out-matrix OutMat>
+    void matrix_rank_1_update_c(InVec1 x, InVec2 y, InMat E, OutMat A);
+  template<class ExecutionPolicy, in-vector InVec1, in-vector InVec2, in-matrix InMat, out-matrix OutMat>
+    void matrix_rank_1_update_c(ExecutionPolicy&& exec,
+                                InVec1 x, InVec2 y, InMat E, OutMat A);
+
+  // [linalg.algs.blas2.symherrank1], symmetric or Hermitian rank-1 matrix update
+
+  // overwriting symmetric rank-1 matrix update 
+  template<class Scalar, in-vector InVec, possibly-packed-out-matrix OutMat, class Triangle>
+    void symmetric_matrix_rank_1_update(Scalar alpha, InVec x, OutMat A, Triangle t);
+  template<class ExecutionPolicy,
+           class Scalar, in-vector InVec, possibly-packed-out-matrix OutMat, class Triangle>
+    void symmetric_matrix_rank_1_update(ExecutionPolicy&& exec,
+                                        Scalar alpha, InVec x, OutMat A, Triangle t);
+  template<in-vector InVec, possibly-packed-out-matrix OutMat, class Triangle>
+    void symmetric_matrix_rank_1_update(InVec x, OutMat A, Triangle t);
+  template<class ExecutionPolicy,
+           in-vector InVec, possibly-packed-out-matrix OutMat, class Triangle>
+    void symmetric_matrix_rank_1_update(ExecutionPolicy&& exec,
+                                        InVec x, OutMat A, Triangle t);
+
+  // updating symmetric rank-1 matrix update 
+  template<class Scalar, in-vector InVec, possibly-packed-in-matrix InMat, possibly-packed-out-matrix OutMat, class Triangle>
+    void symmetric_matrix_rank_1_update(Scalar alpha, InVec x, InMat E, OutMat A, Triangle t);
+  template<class ExecutionPolicy,
+           class Scalar, in-vector InVec, possibly-packed-in-matrix InMat, possibly-packed-out-matrix OutMat, class Triangle>
+    void symmetric_matrix_rank_1_update(ExecutionPolicy&& exec,
+                                        Scalar alpha, InVec x, InMat E, OutMat A, Triangle t);
+  template<in-vector InVec, possibly-packed-in-matrix InMat, possibly-packed-out-matrix OutMat, class Triangle>
+    void symmetric_matrix_rank_1_update(InVec x, InMat E, OutMat A, Triangle t);
+  template<class ExecutionPolicy,
+           in-vector InVec, possibly-packed-in-matrix InMat, possibly-packed-out-matrix OutMat, class Triangle>
+    void symmetric_matrix_rank_1_update(ExecutionPolicy&& exec,
+                                        InVec x, InMat E, OutMat A, Triangle t);
+
+  // overwriting Hermitian rank-1 matrix update 
+  template<class Scalar, in-vector InVec, possibly-packed-out-matrix OutMat, class Triangle>
+    void hermitian_matrix_rank_1_update(Scalar alpha, InVec x, OutMat A, Triangle t);
+  template<class ExecutionPolicy,
+           class Scalar, in-vector InVec, possibly-packed-out-matrix OutMat, class Triangle>
+    void hermitian_matrix_rank_1_update(ExecutionPolicy&& exec,
+                                        Scalar alpha, InVec x, OutMat A, Triangle t);
+  template<in-vector InVec, possibly-packed-out-matrix OutMat, class Triangle>
+    void hermitian_matrix_rank_1_update(InVec x, OutMat A, Triangle t);
+  template<class ExecutionPolicy,
+           in-vector InVec, possibly-packed-out-matrix OutMat, class Triangle>
+    void hermitian_matrix_rank_1_update(ExecutionPolicy&& exec,
+                                        InVec x, OutMat A, Triangle t);
+
+  // updating Hermitian rank-1 matrix update 
+  template<class Scalar, in-vector InVec, possibly-packed-in-matrix InMat, possibly-packed-out-matrix OutMat, class Triangle>
+    void hermitian_matrix_rank_1_update(Scalar alpha, InVec x, InMat E, OutMat A, Triangle t);
+  template<class ExecutionPolicy,
+           class Scalar, in-vector InVec, possibly-packed-in-matrix InMat, possibly-packed-out-matrix OutMat, class Triangle>
+    void hermitian_matrix_rank_1_update(ExecutionPolicy&& exec,
+                                        Scalar alpha, InVec x, InMat E, OutMat A, Triangle t);
+  template<in-vector InVec, possibly-packed-in-matrix InMat, possibly-packed-out-matrix OutMat, class Triangle>
+    void hermitian_matrix_rank_1_update(InVec x, InMat E, OutMat A, Triangle t);
+  template<class ExecutionPolicy,
+           in-vector InVec, possibly-packed-in-matrix InMat, possibly-packed-out-matrix OutMat, class Triangle>
+    void hermitian_matrix_rank_1_update(ExecutionPolicy&& exec,
+                                        InVec x, InMat E, OutMat A, Triangle t);
+

8.3 Rank-2 update functions in +synopsis

+
+

In the header <linalg> synopsis +[linalg.syn], replace all the declarations of all the +symmetric_matrix_rank_2_update and +hermitian_matrix_rank_2_update overloads to read as +follows. [Editorial Note: These additions are not shown in green, +becuase the authors could not convince Markdown to format the code +correctly. – end note]

+
+
  // [linalg.algs.blas2.rank2], symmetric and Hermitian rank-2 matrix updates
+
+  // overwriting symmetric rank-2 matrix update
+  template<in-vector InVec1, in-vector InVec2,
+           possibly-packed-out-matrix OutMat, class Triangle>
+    void symmetric_matrix_rank_2_update(InVec1 x, InVec2 y, OutMat A, Triangle t);
+  template<class ExecutionPolicy, in-vector InVec1, in-vector InVec2,
+           possibly-packed-out-matrix OutMat, class Triangle>
+    void symmetric_matrix_rank_2_update(ExecutionPolicy&& exec,
+                                        InVec1 x, InVec2 y, OutMat A, Triangle t);
+
+  // updating symmetric rank-2 matrix update
+  template<in-vector InVec1, in-vector InVec2,
+           possibly-packed-in-matrix InMat,
+           possibly-packed-out-matrix OutMat, class Triangle>
+    void symmetric_matrix_rank_2_update(InVec1 x, InVec2 y, InMat E, OutMat A, Triangle t);
+  template<class ExecutionPolicy, in-vector InVec1, in-vector InVec2,
+           possibly-packed-in-matrix InMat,
+           possibly-packed-out-matrix OutMat, class Triangle>
+    void symmetric_matrix_rank_2_update(ExecutionPolicy&& exec,
+                                        InVec1 x, InVec2 y, InMat E, OutMat A, Triangle t);
+
+  // overwriting Hermitian rank-2 matrix update
+  template<in-vector InVec1, in-vector InVec2,
+           possibly-packed-out-matrix OutMat, class Triangle>
+    void hermitian_matrix_rank_2_update(InVec1 x, InVec2 y, OutMat A, Triangle t);
+  template<class ExecutionPolicy, in-vector InVec1, in-vector InVec2,
+           possibly-packed-out-matrix OutMat, class Triangle>
+    void hermitian_matrix_rank_2_update(ExecutionPolicy&& exec,
+                                        InVec1 x, InVec2 y, OutMat A, Triangle t);
+
+  // updating Hermitian rank-2 matrix update
+  template<in-vector InVec1, in-vector InVec2,
+           possibly-packed-in-matrix InMat,
+           possibly-packed-out-matrix OutMat, class Triangle>
+    void hermitian_matrix_rank_2_update(InVec1 x, InVec2 y, InMat E, OutMat A, Triangle t);
+  template<class ExecutionPolicy, in-vector InVec1, in-vector InVec2,
+           possibly-packed-in-matrix InMat,
+           possibly-packed-out-matrix OutMat, class Triangle>
+    void hermitian_matrix_rank_2_update(ExecutionPolicy&& exec,
+                                        InVec1 x, InVec2 y, InMat E, OutMat A, Triangle t);
+

8.4 Rank-k update functions in +synopsis

+
+

In the header <linalg> synopsis +[linalg.syn], replace all the declarations of all the +symmetric_matrix_rank_k_update and +hermitian_matrix_rank_k_update overloads to read as +follows. [Editorial Note: These additions are not shown in green, +becuase the authors could not convince Markdown to format the code +correctly. – end note]

+
+
  // [linalg.algs.blas3.rankk], rank-k update of a symmetric or Hermitian matrix
+
+  // overwriting symmetric rank-k matrix update
+  template<class Scalar,
+           in-matrix InMat,
+           possibly-packed-out-matrix OutMat,
+           class Triangle>
+    void symmetric_matrix_rank_k_update(
+      Scalar alpha, InMat A, OutMat C, Triangle t);
+  template<class ExecutionPolicy, class Scalar,
+           in-matrix InMat,
+           possibly-packed-out-matrix OutMat,
+           class Triangle>
+    void symmetric_matrix_rank_k_update(
+      ExecutionPolicy&& exec, Scalar alpha, InMat A, OutMat C, Triangle t);
+  template<in-matrix InMat,
+           possibly-packed-out-matrix OutMat,
+           class Triangle>
+    void symmetric_matrix_rank_k_update(
+      InMat A, OutMat C, Triangle t);
+  template<class ExecutionPolicy,
+           in-matrix InMat,
+           possibly-packed-out-matrix OutMat,
+           class Triangle>
+    void symmetric_matrix_rank_k_update(
+      ExecutionPolicy&& exec, InMat A, OutMat C, Triangle t);
+
+  // updating symmetric rank-k matrix update
+  template<class Scalar,
+           in-matrix InMat1,
+           possibly-packed-in-matrix InMat2,
+           possibly-packed-out-matrix OutMat,
+           class Triangle>
+    void symmetric_matrix_rank_k_update(
+      Scalar alpha,
+      InMat1 A, InMat2 E, OutMat C, Triangle t);
+  template<class ExecutionPolicy, class Scalar,
+           in-matrix InMat1,
+           possibly-packed-in-matrix InMat2,
+           possibly-packed-out-matrix OutMat,
+           class Triangle>
+    void symmetric_matrix_rank_k_update(
+      ExecutionPolicy&& exec, Scalar alpha,
+      InMat1 A, InMat2 E, OutMat C, Triangle t);
+  template<in-matrix InMat1,
+           possibly-packed-in-matrix InMat2,
+           possibly-packed-out-matrix OutMat,
+           class Triangle>
+    void symmetric_matrix_rank_k_update(
+      InMat1 A, InMat2 E, OutMat C, Triangle t);
+  template<class ExecutionPolicy,
+           in-matrix InMat1,
+           possibly-packed-in-matrix InMat2,
+           possibly-packed-out-matrix OutMat,
+           class Triangle>
+    void symmetric_matrix_rank_k_update(
+      ExecutionPolicy&& exec,
+      InMat1 A, InMat2 E, OutMat C, Triangle t);
+
+  // overwriting rank-k Hermitian matrix update
+  template<class Scalar,
+           in-matrix InMat,
+           possibly-packed-out-matrix OutMat,
+           class Triangle>
+    void hermitian_matrix_rank_k_update(
+      Scalar alpha, InMat A, OutMat C, Triangle t);
+  template<class ExecutionPolicy, class Scalar,
+           in-matrix InMat,
+           possibly-packed-out-matrix OutMat,
+           class Triangle>
+    void hermitian_matrix_rank_k_update(
+      ExecutionPolicy&& exec, Scalar alpha, InMat A, OutMat C, Triangle t);
+  template<in-matrix InMat,
+           possibly-packed-out-matrix OutMat,
+           class Triangle>
+    void hermitian_matrix_rank_k_update(
+      InMat A, OutMat C, Triangle t);
+  template<class ExecutionPolicy,
+           in-matrix InMat,
+           possibly-packed-out-matrix OutMat,
+           class Triangle>
+    void hermitian_matrix_rank_k_update(
+      ExecutionPolicy&& exec, InMat A, OutMat C, Triangle t);
+
+  // updating rank-k Hermitian matrix update
+  template<class Scalar,
+           in-matrix InMat1,
+           possibly-packed-in-matrix InMat2,
+           possibly-packed-out-matrix OutMat,
+           class Triangle>
+    void hermitian_matrix_rank_k_update(
+      Scalar alpha,
+      InMat1 A, InMat2 E, OutMat C, Triangle t);
+  template<class ExecutionPolicy, class Scalar,
+           in-matrix InMat1,
+           possibly-packed-in-matrix InMat2,
+           possibly-packed-out-matrix OutMat,
+           class Triangle>
+    void hermitian_matrix_rank_k_update(
+      ExecutionPolicy&& exec, Scalar alpha,
+      InMat1 A, InMat2 E, OutMat C, Triangle t);
+  template<in-matrix InMat1,
+           possibly-packed-in-matrix InMat2,
+           possibly-packed-out-matrix OutMat,
+           class Triangle>
+    void hermitian_matrix_rank_k_update(
+      InMat1 A, InMat2 E, OutMat C, Triangle t);
+  template<class ExecutionPolicy,
+           in-matrix InMat1,
+           possibly-packed-in-matrix InMat2,
+           possibly-packed-out-matrix OutMat,
+           class Triangle>
+    void hermitian_matrix_rank_k_update(
+      ExecutionPolicy&& exec,
+      InMat1 A, InMat2 E, OutMat C, Triangle t);
+

8.5 Rank-2k update functions in +synopsis

+
+

In the header <linalg> synopsis +[linalg.syn], replace all the declarations of all the +symmetric_matrix_rank_2k_update and +hermitian_matrix_rank_2k_update overloads to read as +follows. [Editorial Note: These additions are not shown in green, +becuase the authors could not convince Markdown to format the code +correctly. – end note]

+
+
  // [linalg.algs.blas3.rank2k], rank-2k update of a symmetric or Hermitian matrix
+
+  // overwriting symmetric rank-2k matrix update
+  template<in-matrix InMat1, in-matrix InMat2,
+           possibly-packed-out-matrix OutMat, class Triangle>
+    void symmetric_matrix_rank_2k_update(InMat1 A, InMat2 B, OutMat C, Triangle t);
+  template<class ExecutionPolicy,
+           in-matrix InMat1, in-matrix InMat2,
+           possibly-packed-out-matrix OutMat, class Triangle>
+    void symmetric_matrix_rank_2k_update(ExecutionPolicy&& exec,
+                                         InMat1 A, InMat2 B, OutMat C, Triangle t);
+
+  // updating symmetric rank-2k matrix update
+  template<in-matrix InMat1, in-matrix InMat2,
+           possibly-packed-in-matrix InMat3,
+           possibly-packed-out-matrix OutMat, class Triangle>
+    void symmetric_matrix_rank_2k_update(InMat1 A, InMat2 B, InMat3 E, OutMat C, Triangle t);
+  template<class ExecutionPolicy,
+           in-matrix InMat1, in-matrix InMat2,
+           possibly-packed-in-matrix InMat3,
+           possibly-packed-out-matrix OutMat, class Triangle>
+    void symmetric_matrix_rank_2k_update(ExecutionPolicy&& exec,
+                                         InMat1 A, InMat2 B, InMat3 E, OutMat C, Triangle t);
+
+  // overwriting Hermitian rank-2k matrix update
+  template<in-matrix InMat1, in-matrix InMat2,
+           possibly-packed-out-matrix OutMat, class Triangle>
+    void hermitian_matrix_rank_2k_update(InMat1 A, InMat2 B, OutMat C, Triangle t);
+  template<class ExecutionPolicy,
+           in-matrix InMat1, in-matrix InMat2,
+           possibly-packed-out-matrix OutMat, class Triangle>
+    void hermitian_matrix_rank_2k_update(ExecutionPolicy&& exec,
+                                         InMat1 A, InMat2 B, OutMat C, Triangle t);
+
+  // updating Hermitian rank-2k matrix update
+  template<in-matrix InMat1, in-matrix InMat2,
+           possibly-packed-in-matrix InMat3,
+           possibly-packed-out-matrix OutMat, class Triangle>
+    void hermitian_matrix_rank_2k_update(InMat1 A, InMat2 B, InMat3 E, OutMat C, Triangle t);
+  template<class ExecutionPolicy,
+           in-matrix InMat1, in-matrix InMat2,
+           possibly-packed-in-matrix InMat3,
+           possibly-packed-out-matrix OutMat, class Triangle>
+    void hermitian_matrix_rank_2k_update(ExecutionPolicy&& exec,
+                                         InMat1 A, InMat2 B, InMat3 E, OutMat C, Triangle t);
+

8.6 Specification of nonsymmetric +rank-1 update functions

+
+

Replace the entire contents of [linalg.algs.blas2.rank1] with the +following.

+
+

1 +The following elements apply to all functions in +[linalg.algs.blas2.rank1].

+

2 +Mandates:

+

(2.1) +possibly-multipliable<OutMat, InVec2, InVec1>() +is true, and

+

(2.2) +possibly-addable(A, E, A) is +true for those overloads that take an E +parameter.

+

3 +Preconditions:

+

(3.1) +multipliable(A, y, x) is true, and

+

(3.2) +addable(A, E, A) is true +for those overloads that take an E parameter.

+

4 +Complexity: O( +x.extent(0) × y.extent(0) ).

+
template<in-vector InVec1, in-vector InVec2, out-matrix OutMat>
+  void matrix_rank_1_update(InVec1 x, InVec2 y, OutMat A);
+template<class ExecutionPolicy, in-vector InVec1, in-vector InVec2, out-matrix OutMat>
+  void matrix_rank_1_update(ExecutionPolicy&& exec, InVec1 x, InVec2 y, OutMat A);
+

5 +These functions perform a overwriting nonsymmetric nonconjugated rank-1 +update.

+

[Note: These functions correspond to the BLAS functions +xGER (for real element types) and xGERU (for +complex element types)[bib]. – end note]

+

6 +Effects: Computes A = xyT.

+
template<in-vector InVec1, in-vector InVec2, in-matrix InMat, out-matrix OutMat>
+  void matrix_rank_1_update(InVec1 x, InVec2 y, InMat E, OutMat A);
+template<class ExecutionPolicy, in-vector InVec1, in-vector InVec2, in-matrix InMat, out-matrix OutMat>
+  void matrix_rank_1_update(ExecutionPolicy&& exec, InVec1 x, InVec2 y, InMat E, OutMat A);
+

7 +These functions perform an updating nonsymmetric nonconjugated rank-1 +update.

+

[Note: These functions correspond to the BLAS functions +xGER (for real element types) and xGERU (for +complex element types)[bib]. – end note]

+

8 +Effects: Computes A = E + xyT.

+

9 +Remarks: A may alias E.

+
template<in-vector InVec1, in-vector InVec2, out-matrix OutMat>
+  void matrix_rank_1_update_c(InVec1 x, InVec2 y, OutMat A);
+template<class ExecutionPolicy, in-vector InVec1, in-vector InVec2, out-matrix OutMat>
+  void matrix_rank_1_update_c(ExecutionPolicy&& exec, InVec1 x, InVec2 y, OutMat A);
+

10 These +functions perform a overwriting nonsymmetric conjugated rank-1 +update.

+

[Note: These functions correspond to the BLAS functions +xGER (for real element types) and xGERC (for +complex element types)[bib]. – end note]

+

11 +Effects:

+

(11.1) For +the overloads without an ExecutionPolicy argument, +equivalent to:

+
matrix_rank_1_update(x, conjugated(y), A);
+

(11.2) +otherwise, equivalent to:

+
matrix_rank_1_update(std::forward<ExecutionPolicy>(exec), x, conjugated(y), A);
+

8.7 Specification of symmetric and +Hermitian rank-1 update functions

+
+

Replace the entire contents of [linalg.algs.blas2.symherrank1] with +the following.

+
+

1 +[Note: These functions correspond to the BLAS functions +xSYR, xSPR, xHER, and +xHPR[bib]. They have overloads taking a scaling factor +alpha, because it would be impossible to express the update +A = AxxT +otherwise. – end note]

+

2 +The following elements apply to all functions in +[linalg.algs.blas2.symherrank1].

+

3 +For any function F in this section that takes a parameter +named t, an InMat template parameter, and a +function parameter InMat E, t applies to +accesses done through the parameter E. F will +only access the triangle of E specified by t. +For accesses of diagonal elements E[i, i], F +will use the value +real-if-needed(E[i, i]) if the name +of F starts with hermitian. For accesses +E[i, j] outside the triangle specified by t, +F will use the value

+

(3.1) +conj-if-needed(E[j, i]) if the name +of F starts with hermitian, or

+

(3.2) +E[j, i] if the name of F starts with +symmetric.

+

4 +Mandates:

+

(4.1) If +OutMat has layout_blas_packed layout, then the +layout’s Triangle template argument has the same type as +the function’s Triangle template argument;

+

(4.2) If +the function has an InMat template parameter and +InMat has layout_blas_packed layout, then the +layout’s Triangle template argument has the same type as +the function’s Triangle template argument;

+

(4.3) +compatible-static-extents<decltype(A), decltype(A)>(0, 1) +is true;

+

(4.4) +compatible-static-extents<decltype(A), decltype(x)>(0, 0) +is true; and

+

(4.5) +possibly-addable<decltype(A), decltype(E), decltype(A)> +is true for those overloads that take an E +parameter.

+

5 +Preconditions:

+

(5.1) +A.extent(0) equals A.extent(1),

+

(5.2) +A.extent(0) equals x.extent(0), and

+

(5.3) +addable(A, E, A) is true +for those overloads that take an E parameter.

+

6 +Complexity: O( +x.extent(0) × x.extent(0) ).

+
template<class Scalar, in-vector InVec, possibly-packed-out-matrix OutMat, class Triangle>
+  void symmetric_matrix_rank_1_update(Scalar alpha, InVec x, OutMat A, Triangle t);
+template<class ExecutionPolicy,
+         class Scalar, in-vector InVec, possibly-packed-out-matrix OutMat, class Triangle>
+  void symmetric_matrix_rank_1_update(ExecutionPolicy&& exec,
+                                      Scalar alpha, InVec x, OutMat A, Triangle t);
+

7 +These functions perform an overwriting symmetric rank-1 update of the +symmetric matrix A, taking into account the +Triangle parameter that applies to A +([linalg.general]).

+

8 +Effects: Computes A = αxxT, +where the scalar α is +alpha.

+
template<in-vector InVec, possibly-packed-out-matrix OutMat, class Triangle>
+  void symmetric_matrix_rank_1_update(InVec x, OutMat A, Triangle t);
+template<class ExecutionPolicy,
+         in-vector InVec, possibly-packed-out-matrix OutMat, class Triangle>
+  void symmetric_matrix_rank_1_update(ExecutionPolicy&& exec, InVec x, OutMat A, Triangle t);
+

9 +These functions perform an overwriting symmetric rank-1 update of the +symmetric matrix A, taking into account the +Triangle parameter that applies to A +([linalg.general]).

+

10 +Effects: Computes A = xxT.

+
template<class Scalar, in-vector InVec, possibly-packed-in-matrix InMat, possibly-packed-out-matrix OutMat, class Triangle>
+  void symmetric_matrix_rank_1_update(Scalar alpha, InVec x, InMat E, OutMat A, Triangle t);
+template<class ExecutionPolicy,
+         class Scalar, in-vector InVec, possibly-packed-in-matrix InMat, possibly-packed-out-matrix OutMat, class Triangle>
+  void symmetric_matrix_rank_1_update(ExecutionPolicy&& exec,
+                                      Scalar alpha, InVec x, InMat E, OutMat A, Triangle t);
+

11 These +functions perform an updating symmetric rank-1 update of the symmetric +matrix A using the symmetric matrix E, taking +into account the Triangle parameter that applies to +A and E ([linalg.general]).

+

12 +Effects: Computes A = E + αxxT, +where the scalar α is +real-if-needed(alpha).

+
template<in-vector InVec, possibly-packed-in-matrix InMat, possibly-packed-out-matrix OutMat, class Triangle>
+  void symmetric_matrix_rank_1_update(InVec x, InMat E, OutMat A, Triangle t);
+template<class ExecutionPolicy,
+         in-vector InVec, possibly-packed-in-matrix InMat, possibly-packed-out-matrix OutMat, class Triangle>
+  void symmetric_matrix_rank_1_update(ExecutionPolicy&& exec,
+                                      InVec x, InMat E, OutMat A, Triangle t);
+

13 These +functions perform an updating symmetric rank-1 update of the symmetric +matrix A using the symmetric matrix E, taking +into account the Triangle parameter that applies to +A and E ([linalg.general]).

+

14 +Effects: Computes A = E + xxT.

+
template<class Scalar, in-vector InVec, possibly-packed-out-matrix OutMat, class Triangle>
+  void hermitian_matrix_rank_1_update(Scalar alpha, InVec x, OutMat A, Triangle t);
+template<class ExecutionPolicy,
+         class Scalar, in-vector InVec, possibly-packed-out-matrix OutMat, class Triangle>
+  void hermitian_matrix_rank_1_update(ExecutionPolicy&& exec,
+                                      Scalar alpha, InVec x, OutMat A, Triangle t);
+

15 These +functions perform an overwriting Hermitian rank-1 update of the +Hermitian matrix A, taking into account the +Triangle parameter that applies to A +([linalg.general]).

+

16 +Effects: Computes A = αxxH, +where the scalar α is +real-if-needed(alpha).

+
template<in-vector InVec, possibly-packed-out-matrix OutMat, class Triangle>
+  void hermitian_matrix_rank_1_update(InVec x, OutMat A, Triangle t);
+template<class ExecutionPolicy,
+         in-vector InVec, possibly-packed-out-matrix OutMat, class Triangle>
+  void hermitian_matrix_rank_1_update(ExecutionPolicy&& exec, InVec x, OutMat A, Triangle t);
+

17 These +functions perform an overwriting Hermitian rank-1 update of the +Hermitian matrix A, taking into account the +Triangle parameter that applies to A +([linalg.general]).

+

18 +Effects: Computes A = xxT.

+
template<class Scalar, in-vector InVec, possibly-packed-in-matrix InMat, possibly-packed-out-matrix OutMat, class Triangle>
+  void hermitian_matrix_rank_1_update(Scalar alpha, InVec x, InMat E, OutMat A, Triangle t);
+template<class ExecutionPolicy,
+         class Scalar, in-vector InVec, possibly-packed-in-matrix InMat, possibly-packed-out-matrix OutMat, class Triangle>
+  void hermitian_matrix_rank_1_update(ExecutionPolicy&& exec,
+                                      Scalar alpha, InVec x, InMat E, OutMat A, Triangle t);
+

19 These +functions perform an updating Hermitian rank-1 update of the Hermitian +matrix A using the Hermitian matrix E, taking +into account the Triangle parameter that applies to +A and E ([linalg.general]).

+

20 +Effects: Computes A = E + αxxH, +where the scalar α is +real-if-needed(alpha).

+
template<in-vector InVec, possibly-packed-in-matrix InMat, possibly-packed-out-matrix OutMat, class Triangle>
+  void hermitian_matrix_rank_1_update(InVec x, InMat E, OutMat A, Triangle t);
+template<class ExecutionPolicy,
+         in-vector InVec, possibly-packed-in-matrix InMat, possibly-packed-out-matrix OutMat, class Triangle>
+  void hermitian_matrix_rank_1_update(ExecutionPolicy&& exec,
+                                      InVec x, InMat E, OutMat A, Triangle t);
+

21 These +functions perform an updating Hermitian rank-1 update of the Hermitian +matrix A using the Hermitian matrix E, taking +into account the Triangle parameter that applies to +A and E ([linalg.general]).

+

22 +Effects: Computes A = E + xxT.

+

8.8 Specification of symmetric and +Hermitian rank-2 update functions

+
+

Replace the entire contents of [linalg.algs.blas2.rank2] with the +following.

+
+

1 +[Note: These functions correspond to the BLAS functions +xSYR2, xSPR2, xHER2, and +xHPR2 [bib]. – end note]

+

2 +The following elements apply to all functions in +[linalg.algs.blas2.rank2].

+

3 +For any function F in this section that takes a parameter +named t, an InMat template parameter, and a +function parameter InMat E, t applies to +accesses done through the parameter E. F will +only access the triangle of E specified by t. +For accesses of diagonal elements E[i, i], F +will use the value +real-if-needed(E[i, i]) if the name +of F starts with hermitian. For accesses +E[i, j] outside the triangle specified by t, +F will use the value

+

(3.1) +conj-if-needed(E[j, i]) if the name +of F starts with hermitian, or

+

(3.2) +E[j, i] if the name of F starts with +symmetric.

+

4 +Mandates:

+

(4.1) If +OutMat has layout_blas_packed layout, then the +layout’s Triangle template argument has the same type as +the function’s Triangle template argument;

+

(4.2) If +the function has an InMat template parameter and +InMat has layout_blas_packed layout, then the +layout’s Triangle template argument has the same type as +the function’s Triangle template argument;

+

(4.3) +compatible-static-extents<decltype(A), decltype(A)>(0, 1) +is true;

+

(4.4) +possibly-multipliable<decltype(A), decltype(x), decltype(y)>() +is true; and

+

(4.5) +possibly-addable<decltype(A), decltype(E), decltype(A)> +is true for those overloads that take an E +parameter.

+

5 +Preconditions:

+

(5.1) +A.extent(0) equals A.extent(1),

+

(5.2) +multipliable(A, x, y) is +true, and

+

(5.3) +addable(A, E, A) is true +for those overloads that take an E parameter.

+

6 +Complexity: O( +x.extent(0) × y.extent(0) ).

+
template<in-vector InVec1, in-vector InVec2,
+         possibly-packed-out-matrix OutMat, class Triangle>
+  void symmetric_matrix_rank_2_update(InVec1 x, InVec2 y, OutMat A, Triangle t);
+template<class ExecutionPolicy, in-vector InVec1, in-vector InVec2,
+         possibly-packed-out-matrix OutMat, class Triangle>
+  void symmetric_matrix_rank_2_update(ExecutionPolicy&& exec,
+                                      InVec1 x, InVec2 y, OutMat A, Triangle t);
+

7 +These functions perform an overwriting symmetric rank-2 update of the +symmetric matrix A, taking into account the +Triangle parameter that applies to A +([linalg.general]).

+

8 +Effects: Computes A = xyT + yxT.

+
template<in-vector InVec1, in-vector InVec2,
+         possibly-packed-in-matrix InMat,
+         possibly-packed-out-matrix OutMat, class Triangle>
+  void symmetric_matrix_rank_2_update(InVec1 x, InVec2 y, InMat E, OutMat A, Triangle t);
+template<class ExecutionPolicy, in-vector InVec1, in-vector InVec2,
+         possibly-packed-in-matrix InMat,
+         possibly-packed-out-matrix OutMat, class Triangle>
+  void symmetric_matrix_rank_2_update(ExecutionPolicy&& exec,
+                                      InVec1 x, InVec2 y, InMat E, OutMat A, Triangle t);
+

9 +These functions perform an updating symmetric rank-2 update of the +symmetric matrix A using the symmetric matrix +E, taking into account the Triangle parameter +that applies to A and E +([linalg.general]).

+

10 +Effects: Computes A = E + xyT + yxT.

+
template<in-vector InVec1, in-vector InVec2,
+         possibly-packed-out-matrix OutMat, class Triangle>
+  void hermitian_matrix_rank_2_update(InVec1 x, InVec2 y, OutMat A, Triangle t);
+template<class ExecutionPolicy, in-vector InVec1, in-vector InVec2,
+         possibly-packed-out-matrix OutMat, class Triangle>
+  void hermitian_matrix_rank_2_update(ExecutionPolicy&& exec,
+                                      InVec1 x, InVec2 y, OutMat A, Triangle t);
+

11 These +functions perform an overwriting Hermitian rank-2 update of the +Hermitian matrix A, taking into account the +Triangle parameter that applies to A +([linalg.general]).

+

12 +Effects: Computes A = xyH + yxH.

+
template<in-vector InVec1, in-vector InVec2,
+         possibly-packed-in-matrix InMat,
+         possibly-packed-out-matrix OutMat, class Triangle>
+  void hermitian_matrix_rank_2_update(InVec1 x, InVec2 y, InMat E, OutMat A, Triangle t);
+template<class ExecutionPolicy, in-vector InVec1, in-vector InVec2,
+         possibly-packed-in-matrix InMat,
+         possibly-packed-out-matrix OutMat, class Triangle>
+  void hermitian_matrix_rank_2_update(ExecutionPolicy&& exec,
+                                      InVec1 x, InVec2 y, InMat E, OutMat A, Triangle t);
+

13 These +functions perform an updating Hermitian rank-2 update of the Hermitian +matrix A using the Hermitian matrix E, taking +into account the Triangle parameter that applies to +A and E ([linalg.general]).

+

14 +Effects: Computes A = E + xyH + yxH.

+

8.9 Specification of rank-k update +functions

+
+

Replace the entire contents of [linalg.algs.blas3.rankk] with the +following.

+
+

[Note: These functions correspond to the BLAS functions +xSYRK and xHERK. – end note]

+

1 +The following elements apply to all functions in +[linalg.algs.blas3.rankk].

+

2 +For any function F in this section that takes a parameter +named t, an InMat2 template parameter, and a +function parameter InMat2 E, t applies to +accesses done through the parameter E. F will +only access the triangle of E specified by t. +For accesses of diagonal elements E[i, i], F +will use the value +real-if-needed(E[i, i]) if the name +of F starts with hermitian. For accesses +E[i, j] outside the triangle specified by t, +F will use the value

+

(2.1) +conj-if-needed(E[j, i]) if the name +of F starts with hermitian, or

+

(2.2) +E[j, i] if the name of F starts with +symmetric.

+

3 +Mandates:

+
    +
  • (3.1) If +OutMat has layout_blas_packed layout, then the +layout’s Triangle template argument has the same type as +the function’s Triangle template argument.

  • +
  • (3.2) If +the function takes an InMat2 template parameter and if +InMat2 has layout_blas_packed layout, then the +layout’s Triangle template argument has the same type as +the function’s Triangle template argument.

  • +
  • (3.3) +possibly-multipliable<decltype(A), decltype(transposed(A)), decltype(C)> +is true.

  • +
  • (3.4) +possibly-addable<decltype(C), decltype(E), decltype(C)> +is true for those overloads that take an E +parameter.

  • +
+

4 +Preconditions:

+
    +
  • (4.1) +multipliable(A, transposed(A), C) is +true. [Note: This implies that C is +square. – end note]

  • +
  • (4.2) +addable(C, E, C) is true +for those overloads that take an E parameter.

  • +
+

5 +Complexity: O( +A.extent(0) +A.extent(1) +A.extent(0) ).

+

6 +Remarks: C may alias E for those +overloads that take an E parameter.

+
template<class Scalar,
+         in-matrix InMat,
+         possibly-packed-out-matrix OutMat,
+         class Triangle>
+void symmetric_matrix_rank_k_update(
+  Scalar alpha,
+  InMat A,
+  OutMat C,
+  Triangle t);
+template<class ExecutionPolicy,
+         class Scalar,
+         in-matrix InMat,
+         possibly-packed-out-matrix OutMat,
+         class Triangle>
+void symmetric_matrix_rank_k_update(
+  ExecutionPolicy&& exec,
+  Scalar alpha,
+  InMat A,
+  OutMat C,
+  Triangle t);
+

5 +Effects: Computes C = αAAT, +where the scalar α is +alpha.

+
template<in-matrix InMat,
+         possibly-packed-out-matrix OutMat,
+         class Triangle>
+void symmetric_matrix_rank_k_update(
+  InMat A,
+  OutMat C,
+  Triangle t);
+template<class ExecutionPolicy,
+         in-matrix InMat,
+         possibly-packed-out-matrix OutMat,
+         class Triangle>
+void symmetric_matrix_rank_k_update(
+  ExecutionPolicy&& exec,
+  InMat A,
+  OutMat C,
+  Triangle t);
+

6 +Effects: Computes C = AAT.

+
template<class Scalar,
+         in-matrix InMat,
+         possibly-packed-out-matrix OutMat,
+         class Triangle>
+void hermitian_matrix_rank_k_update(
+  Scalar alpha,
+  InMat A,
+  OutMat C,
+  Triangle t);
+template<class ExecutionPolicy,
+         class Scalar,
+         in-matrix InMat,
+         possibly-packed-out-matrix OutMat,
+         class Triangle>
+void hermitian_matrix_rank_k_update(
+  ExecutionPolicy&& exec,
+  Scalar alpha,
+  InMat A,
+  OutMat C,
+  Triangle t);
+

7 +Effects: Computes C = αAAH, +where the scalar α is +real-if-needed(alpha).

+
template<in-matrix InMat,
+         possibly-packed-out-matrix OutMat,
+         class Triangle>
+void hermitian_matrix_rank_k_update(
+  InMat A,
+  OutMat C,
+  Triangle t);
+template<class ExecutionPolicy,
+         in-matrix InMat,
+         possibly-packed-out-matrix OutMat,
+         class Triangle>
+void hermitian_matrix_rank_k_update(
+  ExecutionPolicy&& exec,
+  InMat A,
+  OutMat C,
+  Triangle t);
+

8 +Effects: Computes C = AAH.

+
template<class Scalar,
+         in-matrix InMat1,
+         possibly-packed-in-matrix InMat2,
+         possibly-packed-out-matrix OutMat,
+         class Triangle>
+void symmetric_matrix_rank_k_update(
+  Scalar alpha,
+  InMat1 A,
+  InMat2 E,
+  OutMat C,
+  Triangle t);
+template<class ExecutionPolicy,
+         class Scalar,
+         in-matrix InMat1,
+         possibly-packed-in-matrix InMat2,
+         possibly-packed-out-matrix OutMat,
+         class Triangle>
+void symmetric_matrix_rank_k_update(
+  ExecutionPolicy&& exec,
+  Scalar alpha,
+  InMat1 A,
+  InMat2 E,
+  OutMat C,
+  Triangle t);
+

9 +Effects: Computes C = E + αAAT, +where the scalar α is +real-if-needed(alpha).

+
template<in-matrix InMat1,
+         possibly-packed-in-matrix InMat2,
+         possibly-packed-out-matrix OutMat,
+         class Triangle>
+void symmetric_matrix_rank_k_update(
+  InMat1 A,
+  InMat2 E,
+  OutMat C,
+  Triangle t);
+template<class ExecutionPolicy,
+         in-matrix InMat1,
+         possibly-packed-in-matrix InMat2,
+         possibly-packed-out-matrix OutMat,
+         class Triangle>
+void symmetric_matrix_rank_k_update(
+  ExecutionPolicy&& exec,
+  InMat1 A,
+  InMat2 E,
+  OutMat C,
+  Triangle t);
+

10 +Effects: Computes C = E + AAT.

+
template<class Scalar,
+         in-matrix InMat1,
+         possibly-packed-in-matrix InMat2,
+         possibly-packed-out-matrix OutMat,
+         class Triangle>
+void hermitian_matrix_rank_k_update(
+  Scalar alpha,
+  InMat1 A,
+  InMat2 E,
+  OutMat C,
+  Triangle t);
+template<class ExecutionPolicy,
+         class Scalar,
+         in-matrix InMat1,
+         possibly-packed-in-matrix InMat2,
+         possibly-packed-out-matrix OutMat,
+         class Triangle>
+void hermitian_matrix_rank_k_update(
+  ExecutionPolicy&& exec,
+  Scalar alpha,
+  InMat1 A,
+  InMat2 E,
+  OutMat C,
+  Triangle t);
+

11 +Effects: Computes C = E + αAAH, +where the scalar α is +real-if-needed(alpha).

+
template<in-matrix InMat1,
+         possibly-packed-in-matrix InMat2,
+         possibly-packed-out-matrix OutMat,
+         class Triangle>
+void hermitian_matrix_rank_k_update(
+  InMat1 A,
+  InMat2 E,
+  OutMat C,
+  Triangle t);
+template<class ExecutionPolicy,
+         in-matrix InMat1,
+         possibly-packed-in-matrix InMat2,
+         possibly-packed-out-matrix OutMat,
+         class Triangle>
+void hermitian_matrix_rank_k_update(
+  ExecutionPolicy&& exec,
+  InMat1 A,
+  InMat2 E,
+  OutMat C,
+  Triangle t);
+

12 +Effects: Computes C = E + AAH.

+

8.10 Specification of rank-2k +update functions

+
+

Replace the entire contents of [linalg.algs.blas3.rank2k] with the +following.

+
+

[Note: These functions correspond to the BLAS functions +xSYR2K and xHER2K. – end note]

+

1 +The following elements apply to all functions in +[linalg.algs.blas3.rank2k].

+

2 +For any function F in this section that takes a parameter +named t, an InMat3 template parameter, and a +function parameter InMat3 E, t applies to +accesses done through the parameter E. F will +only access the triangle of E specified by t. +For accesses of diagonal elements E[i, i], F +will use the value +real-if-needed(E[i, i]) if the name +of F starts with hermitian. For accesses +E[i, j] outside the triangle specified by t, +F will use the value

+

(2.1) +conj-if-needed(E[j, i]) if the name +of F starts with hermitian, or

+

(2.2) +E[j, i] if the name of F starts with +symmetric.

+

3 +Mandates:

+
    +
  • (3.1) If +OutMat has layout_blas_packed layout, then the +layout’s Triangle template argument has the same type as +the function’s Triangle template argument;

  • +
  • (3.2) If +the function takes an InMat3 template parameter and if +InMat3 has layout_blas_packed layout, then the +layout’s Triangle template argument has the same type as +the function’s Triangle template argument.

  • +
  • (3.3) +possibly-multipliable<decltype(A), decltype(transposed(B)), decltype(C)> +is true.

  • +
  • (3.4) +possibly-multipliable<decltype(B), decltype(transposed(A)), decltype(C)> +is true.

  • +
  • (3.5) +possibly-addable<decltype(C), decltype(E), decltype(C)> +is true for those overloads that take an E +parameter.

  • +
+

4 +Preconditions:

+
    +
  • (4.1) +multipliable(A, transposed(B), C) is +true.

  • +
  • (4.2) +multipliable(B, transposed(A), C) is +true. [Note: This and the previous imply that +C is square. – end note]

  • +
  • (4.3) +addable(C, E, C) is true +for those overloads that take an E parameter.

  • +
+

4 +Complexity: O( +A.extent(0) +A.extent(1) +B.extent(0) )

+

5 +Remarks: C may alias E for those +overloads that take an E parameter.

+
template<in-matrix InMat1,
+         in-matrix InMat2,
+         possibly-packed-out-matrix OutMat,
+         class Triangle>
+void symmetric_matrix_rank_2k_update(
+  InMat1 A,
+  InMat2 B,
+  OutMat C,
+  Triangle t);
+template<class ExecutionPolicy,
+         in-matrix InMat1,
+         in-matrix InMat2,
+         possibly-packed-out-matrix OutMat,
+         class Triangle>
+void symmetric_matrix_rank_2k_update(
+  ExecutionPolicy&& exec,
+  InMat1 A,
+  InMat2 B,
+  OutMat C,
+  Triangle t);
+

5 +Effects: Computes C = ABT + BAT.

+
template<in-matrix InMat1,
+         in-matrix InMat2,
+         possibly-packed-out-matrix OutMat,
+         class Triangle>
+void hermitian_matrix_rank_2k_update(
+  InMat1 A,
+  InMat2 B,
+  OutMat C,
+  Triangle t);
+template<class ExecutionPolicy,
+         in-matrix InMat1,
+         in-matrix InMat2,
+         possibly-packed-out-matrix OutMat,
+         class Triangle>
+void hermitian_matrix_rank_2k_update(
+  ExecutionPolicy&& exec,
+  InMat1 A,
+  InMat2 B,
+  OutMat C,
+  Triangle t);
+

6 +Effects: Computes C = ABH + BAH.

+
template<in-matrix InMat1,
+         in-matrix InMat2,
+         in-matrix InMat3,
+         possibly-packed-out-matrix OutMat,
+         class Triangle>
+void symmetric_matrix_rank_2k_update(
+  InMat1 A,
+  InMat2 B,
+  InMat3 E,
+  OutMat C,
+  Triangle t);
+template<class ExecutionPolicy,
+         in-matrix InMat1,
+         in-matrix InMat2,
+         in-matrix InMat3,
+         possibly-packed-out-matrix OutMat,
+         class Triangle>
+void symmetric_matrix_rank_2k_update(
+  ExecutionPolicy&& exec,
+  InMat1 A,
+  InMat2 B,
+  InMat3 E,
+  OutMat C,
+  Triangle t);
+

7 +Effects: Computes C = E + ABT + BAT.

+
template<in-matrix InMat1,
+         in-matrix InMat2,
+         in-matrix InMat3,
+         possibly-packed-out-matrix OutMat,
+         class Triangle>
+void hermitian_matrix_rank_2k_update(
+  InMat1 A,
+  InMat2 B,
+  InMat3 E,
+  OutMat C,
+  Triangle t);
+template<class ExecutionPolicy,
+         in-matrix InMat1,
+         in-matrix InMat2,
+         in-matrix InMat3,
+         possibly-packed-out-matrix OutMat,
+         class Triangle>
+void hermitian_matrix_rank_2k_update(
+  ExecutionPolicy&& exec,
+  InMat1 A,
+  InMat2 B,
+  InMat3 E,
+  OutMat C,
+  Triangle t);
+

8 +Effects: Computes C = E + ABH + BAH.

+

9 Appendix A: Example of a generic +numerical algorithm

+

The following example shows how to implement a generic numerical +algorithm, two_norm_abs. This algorithm computes the +absolute value of a variety of number types. For complex numbers, it +returns the magnitude, which is the same as the two-norm of the +two-element vector composed of the real and imaginary parts of the +complex number. This Compiler +Explorer link demonstrates the implementation. This is not meant to +show an ideal implementation. (A better one would use rescaling, in the +manner of std::hypot, to avoid undue overflow or +underflow.) Instead, it illustrates generic numerical algorithm +development. Commenting out the +#define DEFINE_CONJ_REAL_IMAG_FOR_REAL 1 line shows that +without ADL-findable conj, real, and +imag, users’ generic numerical algorithms would need more +special cases and more assumptions on number types.

+
#include <cassert>
+#include <cmath>
+#include <complex>
+#include <concepts>
+#include <type_traits>
+
+#define DEFINE_CONJ_REAL_IMAG_FOR_REAL 1
+
+template<class T>
+constexpr bool is_std_complex = false;
+template<class R>
+constexpr bool is_std_complex<std::complex<R>> = true;
+
+template<class T>
+auto two_norm_abs(T t) {
+  if constexpr (std::is_unsigned_v<T>) {
+    return t;
+  }
+  else if constexpr (std::is_arithmetic_v<T> || is_std_complex<T>) {
+    return std::abs(t);
+  }
+#if ! defined(DEFINE_CONJ_REAL_IMAG_FOR_REAL)
+  else if constexpr (requires(T x) {
+      {abs(x)} -> std::convertible_to<T>;
+    }) {
+    return T{abs(t)};
+  }
+#endif
+  else if constexpr (requires(T x) {
+      {sqrt(real(x * conj(x)))} -> std::same_as<decltype(real(x))>;
+    }) {
+    return sqrt(real(t * conj(t)));
+  }
+  else {
+    static_assert(false, "No reasonable way to implement abs(t)");
+  }
+}
+
+struct MyRealNumber {
+  MyRealNumber() = default;
+  MyRealNumber(double value) : value_(value) {}
+
+  double value() const {
+    return value_;
+  }
+
+  friend bool operator==(MyRealNumber, MyRealNumber) = default;
+  friend MyRealNumber operator-(MyRealNumber x) {
+    return {-x.value_};
+  }
+  friend MyRealNumber operator+(MyRealNumber x, MyRealNumber y) {
+    return {x.value_ + y.value_};
+  }
+  friend MyRealNumber operator-(MyRealNumber x, MyRealNumber y) {
+    return {x.value_ - y.value_};
+  }
+  friend MyRealNumber operator*(MyRealNumber x, MyRealNumber y) {
+    return x.value_ * y.value_;
+  }
+
+#if defined(DEFINE_CONJ_REAL_IMAG_FOR_REAL)
+  friend MyRealNumber conj(MyRealNumber x) { return x; }
+  friend MyRealNumber real(MyRealNumber x) { return x; }
+  friend MyRealNumber imag(MyRealNumber x) { return {}; }
+#else
+  friend MyRealNumber abs(MyRealNumber x) { return std::abs(x.value_); }
+#endif
+  friend MyRealNumber sqrt(MyRealNumber x) { return std::sqrt(x.value_); }
+
+private:
+  double value_{};
+};
+
+class MyComplexNumber {
+public:
+  MyComplexNumber(MyRealNumber re, MyRealNumber im = {}) : re_(re), im_(im) {}
+  MyComplexNumber() = default;
+
+  std::complex<double> value() const {
+    return {re_.value(), im_.value()};
+  }
+
+  friend bool operator==(MyComplexNumber, MyComplexNumber) = default;
+  friend MyComplexNumber operator*(MyComplexNumber z, MyComplexNumber w) {
+    return {real(z) * real(w) - imag(z) * imag(w),
+      real(z) * imag(w) + imag(z) * real(w)};
+  }
+  friend MyComplexNumber conj(MyComplexNumber z) { return {real(z), -imag(z)}; }
+  friend MyRealNumber real(MyComplexNumber z) { return z.re_; }
+  friend MyRealNumber imag(MyComplexNumber z) { return z.im_; }
+
+private:
+  MyRealNumber re_{};
+  MyRealNumber im_{};
+};
+
+int main() {
+  [[maybe_unused]] double x1 = two_norm_abs(-4.2);
+  assert(x1 == 4.2);
+
+  [[maybe_unused]] float y0 = two_norm_abs(std::complex<float>{-3.0f, 4.0f});
+  assert(y0 == 5.0f);
+
+  [[maybe_unused]] MyRealNumber r = two_norm_abs(MyRealNumber{-6.7});
+  assert(r == MyRealNumber{6.7});
+
+  [[maybe_unused]] MyRealNumber z = two_norm_abs(MyComplexNumber{-3, 4});
+  assert(z.value() == 5.0);
+ 
+  return 0;
+}
+
+
+ + diff --git a/fix-rankk-rank2k/fix-rankk-rank2k.md b/fix-rankk-rank2k/fix-rankk-rank2k.md index 1c564cfb..b5c59f00 100644 --- a/fix-rankk-rank2k/fix-rankk-rank2k.md +++ b/fix-rankk-rank2k/fix-rankk-rank2k.md @@ -1,7 +1,7 @@ --- title: "Fix C++26 by making the rank-1, rank-2, rank-k, and rank-2k updates consistent with the BLAS" -document: P3371R1 +document: P3371R2 date: today audience: LEWG author: @@ -45,19 +45,23 @@ toc: true * Reorganize and expand nonwording sections +* Revision 2 to be submitted 2024-10-15 + + * For Hermitian matrix rank-1 and rank-k updates, do not constrain the type of the scaling factor `alpha`, as R1 did. Instead, define the algorithms to use _`real-if-needed`_`(alpha)`. Remove exposition-only concept _`noncomplex`_. + # Abstract -The [linalg] functions `matrix_rank_1_update`, `matrix_rank_1_update_c`, `symmetric_rank_1_update`, `hermitian_rank_1_update`, `symmetric_matrix_rank_k_update`, `hermitian_matrix_rank_k_update`, `symmetric_matrix_rank_2k_update`, and `hermitian_matrix_rank_2k_update` currently have behavior inconsistent with their corresponding BLAS (Basic Linear Algebra Subroutines) routines. Also, the behavior of the rank-k and rank-2k updates is inconsistent with that of `matrix_product`, even though in mathematical terms they are special cases of a matrix-matrix product. We propose three fixes. +We propose the following changes to [linalg] that improve consistency of the rank-1, rank-2, rank-k, and rank-2k update functions with the BLAS. -1. Add "updating" overloads to the rank-1, rank-2, rank-k, and rank-2k update functions. The new overloads are analogous to the updating overloads of `matrix_product`. For example, `symmetric_matrix_rank_k_update(A, scaled(beta, C), C, upper_triangle)` will perform $C := \beta C + A A^T$. +1. Add "updating" overloads to all the rank-1, rank-2, rank-k, and rank-2k update functions: general, symmetric, and Hermitian. The new overloads are analogous to the updating overloads of `matrix_product`. For example, `symmetric_matrix_rank_k_update(A, scaled(beta, C), C, upper_triangle)` will perform $C := \beta C + A A^T$. This makes the functions consistent with the BLAS's behavior for nonzero `beta`, and also more consistent with the behavior of `matrix_product` (of which they are mathematically a special case). -2. Change the behavior of the existing rank-1, rank-2, rank-k, and rank-2k update functions to be "overwriting" instead of "unconditionally updating." For example, `symmetric_matrix_rank_k_update(A, C, upper_triangle)` will perform $C = A A^T$ instead of $C := C + A A^T$. +2. Change the behavior of all the existing rank-1, rank-2, rank-k, and rank-2k update functions (general, symmetric, and Hermitian) to be "overwriting" instead of "unconditionally updating." For example, `symmetric_matrix_rank_k_update(A, C, upper_triangle)` will perform $C = A A^T$ instead of $C := C + A A^T$. This makes them consistent with the BLAS's behavior when `beta` is zero. -3. For `hermitian_rank_1_update` and `hermitian_rank_k_update`, constrain the `Scalar` template parameter (if any) to be noncomplex. This ensures that the update will be mathematically Hermitian. (A constraint is not needed for the rank-2 and rank-2k update functions.) +3. For the overloads of `hermitian_rank_1_update` and `hermitian_rank_k_update` that have an `alpha` scaling factor parameter, only use _`real-if-needed`_`(alpha)` in the update. This ensures that the update will be mathematically Hermitian, and makes the behavior well defined if `alpha` has nonzero imaginary part. The change is also consistent with our proposed resolution for LWG 4136 ("Specify behavior of [linalg] Hermitian algorithms on diagonal with nonzero imaginary part"). Items (2) and (3) are breaking changes to the current Working Draft. Thus, we must finish this before finalization of C++26. -# Discussion and proposed changes +# Discussion of proposed changes ## Support both overwriting and updating rank-k and rank-2k updates @@ -175,17 +179,98 @@ These changes make the exposition-only concept _`possibly-packed-inout-matrix`_ Note that this would not eliminate all uses of the exposition-only concept _`inout-matrix`_. The in-place triangular matrix product functions `triangular_matrix_left_product` and `triangular_matrix_right_product`, and the in-place triangular linear system solve functions `triangular_matrix_matrix_left_solve` and `triangular_matrix_matrix_right_solve` would still use _`inout-matrix`_. -## Constrain alpha in Hermitian rank-1 and rank-k updates to be noncomplex +## Use only the real part of scaling factor `alpha` for Hermitian matrix rank-1 and rank-k updates + +For Hermitian rank-1 and rank-k matrix updates, if users provide a scaling factor `alpha`, it must have zero imaginary part. Otherwise, the matrix update will not be Hermitian. Neither the C++ Working Draft nor any other proposal in flight requires this. We propose fixing it by making these update algorithms only use the real part of `alpha`, as in _`real-if-needed`_`(alpha)`. This solution is consistent with our proposed resolution of LWG Issue 4136, "Specify behavior of [linalg] Hermitian algorithms on diagonal with nonzero imaginary part," where we make Hermitian rank-1 and rank-k matrix updates use only the real part of matrices' diagonals. + +We can think of at least three alternative solutions, and will explain here why we did not choose them. The first is to constrain `alpha` to have a noncomplex number type, the second is to exclude `std::complex` but not try to define a "noncomplex number" constraint, and the third is to impose a precondition that the imaginary part of `alpha` is zero. + +### Justification: Scaling factor needs to be noncomplex, else update may be non-Hermitian + +The C++ Working Draft already has `Scalar alpha` overloads of `hermitian_rank_k_update`. The `Scalar` type currently can be complex. However, if `alpha` has nonzero imaginary part, then $\alpha A A^H$ may no longer be a Hermitian matrix, even though $A A^H$ is mathematically always Hermitian. For example, if $A$ is the identity matrix (with ones on the diagonal and zeros elsewhere) and $\alpha = i$ (the imaginary unit, which is the square root of negative one), then $\alpha A A^H$ is the diagonal matrix whose diagonal elements are all $i$. While that matrix is symmetric, it is not Hermitian, because all elements on the diagonal of a Hermitian matrix must have nonzero imaginary part. The rank-1 update function `hermitian_rank_1_update` has the analogous issue. + +### Alternatives + +We can think of at least four different ways to solve this problem, and will explain why we did not choose those solutions. + +1. Constrain `alpha` by defining a generic "noncomplex number type" constraint + +2. Only constrain `alpha` not to be `std::complex`; do not try to define a generic "noncomplex number" constraint + +3. Constrain `alpha` by default using a generic "noncomplex number type" constraint, but let users specialize an "opt-out trait" to tell the library that their number type is noncomplex + +4. Impose a precondition that the imaginary part of `alpha` is zero + +If we had to pick one of these solutions, we would favor first (4), then (3), and then (1). We would object most to (2). + +#### Alternative: Constrain `alpha` via generic "noncomplex" constraint + +The BLAS solves this problem by having the Hermitian rank-1 update routines `xHER` and rank-k update routines `xHERK` take the scaling factor $\alpha$ as a noncomplex number. This suggests constraining `alpha`'s type to be noncomplex. However, [linalg] accepts user-defined number types, including user-defined complex number types. How do we define a "noncomplex number type"? If we get it wrong and say that a number type is complex when its "imaginary part" is always zero, we end up excluding a perfectly valid custom number type. + +For number types that have an additive identity (like zero for the integers and reals), it's mathematically correct to treat those as "complex numbers" whose imaginary parts are always zero. This is what `conjugated_accessor` does if you give it a number type for which ADL cannot find `conj`. P3050 optimizes `conjugated` for such number types by bypassing `conjugated_accessor` and using `default_accessor` instead, but this is just a code optimization. Therefore, it can afford to be conservative: if a number type _might_ be complex, then `conjugated` needs to use `conjugated_accessor`. This is why P3050's approach doesn't apply here. If a number type has ADL-findable `conj`, `real`, and `imag`, then it _might_ be complex. However, if we define the opposite of that as "noncomplex," then we might be preventing users from using otherwise reasonable number types. + +The following `MyRealNumber` example always has zero imaginary part, but nevertheless has ADL-findable `conj`, `real`, and `imag`. Furthermore, it has a constructor for which `MyRealNumber(1.2, 3.4)` is well-formed. (This is an unfortunate design choice; making `precision` have class type that is not implicitly convertible from `double` would be wiser, so that users would have to type `MyRealNumber(1.2, Precision(42))`.) As a result, there is no reasonable way to tell at compile time if `MyRealNumber` might represent a complex number. -### Scaling factor alpha needs to be noncomplex, else update may be non-Hermitian +```c++ +class MyRealNumber { +public: + explicit MyRealNumber(double initial_value); + // precision represents the amount of storage + // used by an arbitrary-precision real number object + explicit MyRealNumber(double initial_value, int precision); + + MyRealNumber(); // result is the additive identity "zero" + + // ... other members to make MyRealNumber regular ... + // ... hidden friend overloaded arithmetic operators ... + + friend MyRealNumber conj(MyRealNumber x) { return x; } + friend MyRealNumber real(MyRealNumber x) { return x; } + friend MyRealNumber imag(MyRealNumber) { return {}; } +}; +``` + +It's reasonable to write custom noncomplex number types that define ADL-findable `conj`, `real`, and `imag`. First, users may want to write or use libraries of generic numerical algorithms that work for both complex and noncomplex number types. P1673 argues that defining `conj` to be type-preserving (unlike `std::conj` in the C++ Standard Library) makes this possible. For example, Appendix A below shows how to implement a generic two-norm absolute value (or magnitude, for complex numbers) function, using this interface. Second, the Standard Library might lead users to write a type like this. `std::conj` accepts arguments of any integer or floating-point type, none of which represent complex numbers. The Standard makes the unfortunate choice for `std::conj` of an integer type to return `std::complex`. However, users who try to make `conj(MyRealNumber)` return `std::complex` would find out that `std::complex` does not compile, because `std::complex` requires that `T` be a floating-point type. The least-effort next step would be to make `conj(MyRealNumber)` return `MyRealNumber`. + +We want rank-1 and rank-k Hermitian matrix updates to work with types like `MyRealNumber`, but any reasonable way to constrain the type of `alpha` would exclude `MyRealNumber`. -The C++ Working Draft already has `Scalar alpha` overloads of `hermitian_rank_k_update`. The `Scalar` type currently can be complex. However, if `alpha` has nonzero imaginary part, then $\alpha A A^H$ may no longer be a Hermitian matrix, even though $A A^H$ is mathematically always Hermitian. For example, if $A$ is the identity matrix (with ones on the diagonal and zeros elsewhere) and $\alpha = i$, then $\alpha A A^H$ is the diagonal matrix whose diagonal elements are all $i$. While that matrix is symmetric, it is not Hermitian, because all elements on the diagonal of a Hermitian matrix must have nonzero imaginary part. The rank-1 update function `hermitian_rank_1_update` has the analogous issue. +#### Alternative: Only constrain `alpha` not to be `std::complex` -The BLAS solves this problem by having the Hermitian rank-1 update routines `xHER` and rank-k update routines `xHERK` take the scaling factor $\alpha$ as a noncomplex number. This suggests a fix: For all `hermitian_rank_1_update` and `hermitian_rank_k_update` overloads that take `Scalar alpha`, constrain `Scalar` so that it is noncomplex. We can avoid introducing new undefined behavior (or "valid but unspecified" elements of the output matrix) by making "noncomplex" a constraint on the `Scalar` type of `alpha`. "Noncomplex" should follow the definition of "noncomplex" used by _`conj-if-needed`_: either an arithmetic type, or `conj(E)` is not ADL-findable for an expression `E` of type `Scalar`. +A variant of this suggestion would be only to constrain `alpha` not to be `std::complex`, and not try to define a generic "noncomplex number" constraint. However, this would break generic numerical algorithms by making valid code for the non-Standard complex number case invalid code for `std::complex`. We do not want [linalg] to treat custom complex number types differently than `std::complex`. + +#### Alternative: Constrain `alpha` by default, but let users "opt out" + +Another option would be to constrain `alpha` by default using the same generic "noncomplex number type" constraint as in Option (1), but let users specialize an "opt-out trait" to tell the library that their number type is noncomplex. We would add a new "tag" type trait `is_noncomplex` that users can specialize. By default, `is_noncomplex::value` is `false` for any type `T`. This does _not_ mean that the type is complex, just that the user declares their type to be noncomplex. The distinction matters, because a noncomplex number type might still provide ADL-findable `conj`, `real`, and `imag`, as we showed above. Users must take positive action to declare their type `U` as "not a complex number type," by specializing `is_noncomplex` so that `is_noncomplex::value` is `true`. If users do that, then the library will ignore any ADL-findable functions `conj`, `real`, and `imag` (whether or not they exist), and will assume that the number type is noncomplex. + +Standard Library precedent for this approach is in heterogeneous lookup for associative containers (see N3657 for ordered associative containers, and P0919 and P1690 for unordered containers). User-defined hash functions and key equality comparison functions can tell the container to provide heterogeneous comparisons by exposing a `static constexpr bool is_transparent` whose value is `true`. Default behavior does not expose heterogeneous comparisons. Thus, users must opt in at compile time to assert something about their user-defined types. + +Of the three constraint-based approaches discussed in this proposal, we favor this one the most. It still treats types "as they are" and does not permit users to claim that a type is complex when it lacks the needed operations, but it lets users optimize by giving the Standard Library a single bit of compile-time information. By default, any linear algebra value type (see [linalg.reqs.val]) that meets the `maybe_complex` concept below would be considered "possibly complex." Types that do not meet this concept would result in compilation errors; users would then be able to search documentation or web pages to find out that they need to specialize `is_noncomplex`. + +```c++ +template +concept maybe_complex = + std::semiregular && + requires(T t) { + {conj(t)} -> T; + {real(t)} -> std::convertible_to; + {imag(t)} -> std::convertible_to; +``` + +P1673 generally avoids approaches based on specializing traits. Its design philosophy favors treating types as they are. Users should not need to do something to get correct behavior. We based this off our past experiences in generic numerical algorithms development. In the 2010's, one of the authors maintained a generic mathematical algorithms library called Trilinos. The Teuchos (pronounced "TEFF-os") package of Trilinos provides a monolithic `ScalarTraits` class template that defines different properties of a number type. It combines the features of `std::numeric_limits` with generic complex arithmetic operations like `conjugate`, `real`, and `imag`. Trilinos' generic algorithms assume that number types are regular and define overloaded `+`, `-`, `*`, and `/`, but use `ScalarTraits::conjugate`, `ScalarTraits::real`, and `ScalarTraits::imag`. As a result, users with a custom complex number type had to specialize `ScalarTraits` and provide all these operations. Even if users had imitated `std::complex`'s interface perfectly and provided ADL-findable `conj`, `real`, and `imag`, users had to do extra work to make Trilinos compile and run correctly for their numbers. With P1673, we decided instead that users who define a custom complex number type with an interface sufficiently like `std::complex` should get reasonable behavior without needing to do anything else. + +As a tangent, we would like to comment on the monolithic design of `Teuchos::ScalarTraits`. The monolithic design was partly an imitation of `std::numeric_limits`, and partly a side effect of a requirement to support pre-C++11 compilers that did not permit partial specialization of function templates. (The typical pre-C++11 work-around is to define an unspecialized function template that dispatches to a possibly specialized class template.) C++11 permits partial specialization of function templates and C++14 introduces variable templates; these features have encouraged "breaking up" monolithic traits classes into separate traits. Our paper P1370R1 ("Generic numerical algorithm development with(out) `numeric_limits`") aligns with this trend. + +#### Alternative: Impose precondition on `alpha` + +Another option would be to impose a precondition that _`imag-if-needed`_`(alpha)` is zero. However, this would be inconsistent with our proposed resolution of LWG Issue 4136, "Specify behavior of [linalg] Hermitian algorithms on diagonal with nonzero imaginary part". WG21 members have expressed wanting _fewer_ preconditions and _less_ undefined behavior in the Standard Library. + +If users call Hermitian matrix rank-1 or rank-k updates with `alpha` being `std::complex` or `std::complex`, implementations of [linalg] that call an underlying C or Fortran BLAS would have to get the real part of `alpha` anyway, because these BLAS routines only take `alpha` as a real type. Thus, our proposed solution -- to _define_ the behavior of the update algorithms as using _`real-if-needed`_`(alpha)` -- would not add overhead. + +## Things relating to scaling factors that we do not propose changing ### Nothing wrong with rank-2 or rank-2k updates -This issue does *not* arise with the rank-2 or rank-2k updates. In the BLAS, the rank-2 updates `xHER2` and the rank-2k updates `xHER2K` all take `alpha` as a complex number. The matrix $\alpha A B^H + \bar{\alpha} B A^H$ is Hermitian by construction, so there's no need to impose a precondition on the value of $\alpha$. +This issue does *not* arise with the rank-2 or rank-2k updates. In the BLAS, the rank-2 updates `xHER2` and the rank-2k updates `xHER2K` all take `alpha` as a complex number. The corresponding [linalg] functions do not take a separate `alpha` parameter, because users can pass it in via `scaled`, attached either to `A` or `B`: e.g., `scaled(alpha, A)`. The matrix $\alpha A B^H + \bar{\alpha} B A^H$ is Hermitian by construction, no matter the value of $\alpha$. ### Nothing wrong with scaling factor beta @@ -214,7 +299,7 @@ The current [linalg] wording requires that the input matrix be Hermitian. This auto alpha = std::complex{0.0, 1.0}; hermitian_matrix_vector_product(scaled(alpha, A), upper_triangle, x, y); ``` -Note that the behavior of this is still well defined, at least after applying the fix proposed in LWG4136 for diagonal elements with nonzero imaginary part. It does not violate a precondition. Therefore, the Standard has no way to tell the user that they did something wrong. +Note that the behavior of this is still otherwise well defined, at least after applying the fix proposed in LWG4136 for diagonal elements with nonzero imaginary part. It does not violate a precondition. Therefore, the Standard has no way to tell the user that they did something wrong. #### Status quo permits scaling via the input vector @@ -247,7 +332,7 @@ auto alpha = user_complex{something, something_else}; hermitian_matrix_vector_product(N, upper_triangle, scaled(alpha, x), y); ``` -The [linalg] library was designed to support element types with noncommutative multiplication. On the other hand, generally, if we speak of Hermitian matrices or even of inner products (which are used to define Hermitian matrices), we're working in a vector space. This means that multiplication of the matrix's elements is commutative. Anything more general than that is far beyond what the BLAS can do. Thus, we think restricting use of `alpha` with nonzero imaginary part to `scaled(alpha, x)` is not so onerous. +The [linalg] library was designed to support element types with noncommutative multiplication. On the other hand, generally, if we speak of Hermitian matrices or even of inner products (which are used to define Hermitian matrices), we're working in a vector space. This means that multiplication of the matrix's elements is commutative. Anything more general than that is far beyond what the BLAS can do. Thus, we think restricting use of `alpha` with nonzero imaginary part to `scaled(alpha, x)` is not so onerous. #### Scaling via the input vector is weird, but the least bad choice @@ -310,27 +395,7 @@ We do not support this approach. First, it would introduce many overloads, with Second, `alpha` overloads would not prevent users from *also* supplying `scaled(gamma, A)` as the matrix for some other scaling factor `gamma`. Thus, instead of solving the problem, the overloads would introduce more possibilities for errors. -### What if `Scalar` is noncomplex but `conj` is ADL-findable? - -Our proposed change defines a "noncomplex number" at compile time. We say that complex numbers have `conj` that is findable by ADL, and noncomplex numbers are either arithmetic types or do not have an ADL-findable `conj`. We choose this definition because it is the same one that we use to define the behavior of `conjugated_accessor` (and also `conjugated`, if P3050 is adopted). It also is the C++ analog of what the BLAS does, namely specify the type of the `alpha` argument as real instead of complex. - -This definition is conservative, because it excludes complex numbers with zero imaginary part. For `conjugated_accessor` and `conjugated`, this does not matter; the class and function behave the same from the user's perspective. The exposition-only function _`conj-if-needed`_ specifically exists so that `conjugated_accessor` and `conjugated` do not change their input `mdspan`'s `value_type`. However, for the rank-1 and rank-k Hermitian update functions affected by this proposal, constraining `Scalar alpha` at compile time to be noncomplex prevents users from calling those functions with a "complex" number `alpha` whose imaginary part is zero. - -This matters if the user defines a number type `Real` that is meant to represent noncomplex numbers, but nevertheless has an ADL-findable `conj`, thus making it a "complex" number type from the perspective of [linalg] functions. There are two ways users might define `conj(Real)`. - -1. *Imitating* `std::complex`: Users might define a complex number type `UserComplex` whose real and imaginary parts have type `Real`, and then imitate the behavior of `std::conj(double)` by defining `UserComplex conj(Real x)` to return a `UserComplex` number with real part `x` and imaginary part zero. - -2. *Type-preserving*: `Real conj(Real x)` returns `x`. - -Option (1) would be an unfortunate choice. [linalg] defines _`conj-if-needed`_ specifically to fix the problem that `std::conj(double)` returns `std::complex` instead of `double`. However, Option (2) would be a reasonable thing for users to do, especially if they have designed custom number types without [linalg] in mind. One could accommodate such users by relaxing the constraint on `Scalar` and taking one of the following two approaches. - -1. Adding a precondition that _`imag-if-needed`_`(alpha)` equals `Scalar{}` - -2. Imitating LWG 4136, by defining the scaling factor to be _`real-if-needed`_`(alpha)` instead of `alpha` - -We did not take Approach (1), because adding a precondition decreases safety by adding undefined behavior. It also forces users to add run-time checks. Defining those checks correctly for generic, possibly but not necessarily complex number types would be challenging. We did not take Approach (2) because its behavior would deviate from the BLAS, which requires the scaling factor `alpha` to be noncomplex at compile time. - -## Triangular matrix products, unit diagonals, and scaling factors +### Triangular matrix products, unit diagonals, and scaling factors 1. In BLAS, triangular matrix-vector and matrix-matrix products apply `alpha` scaling to the implicit unit diagonal. In [linalg], the scaling factor `alpha` is not applied to the implicit unit diagonal. This is because the library does not interpret `scaled(alpha, A)` differently than any other `mdspan`. @@ -342,19 +407,19 @@ We did not take Approach (1), because adding a precondition decreases safety by 5. Therefore, we do not consider fixing this a high-priority issue, and we do not propose a fix for it in this paper. -### BLAS applies alpha after unit diagonal; linalg applies it before +#### BLAS applies alpha after unit diagonal; linalg applies it before The `triangular_matrix_vector_product` and `triangular_matrix_product` algorithms have an `implicit_unit_diagonal` option. This makes the algorithm not access the diagonal of the matrix, and compute as if the diagonal were all ones. The option corresponds to the BLAS's "Unit" flag. BLAS routines that take both a "Unit" flag and an `alpha` scaling factor apply "Unit" *before* scaling by `alpha`, so that the matrix is treated as if it has a diagonal of all `alpha` values. In contrast, [linalg] follows the general principle that `scaled(alpha, A)` should be treated like any other kind of `mdspan`. As a result, algorithms interpret `implicit_unit_diagonal` as applied to the matrix *after* scaling by `alpha`, so that the matrix still has a diagonal of all ones. -### Triangular solve algorithms not affected +#### Triangular solve algorithms not affected The triangular solve algorithms in std::linalg are not affected, because their BLAS analogs either do not take an `alpha` argument (as with `xTRSV`), or the `alpha` argument does not affect the triangular matrix (with `xTRSM`, `alpha` affects the right-hand sides `B`, not the triangular matrix `A`). -### Triangular matrix-vector product work-around +#### Triangular matrix-vector product work-around This issue only reduces functionality of `triangular_matrix_product`. Users of `triangular_matrix_vector_product` who wish to replicate the original BLAS functionality can scale the input matrix (by supplying `scaled(alpha, x)` instead of `x` as the input argument) instead of the triangular matrix. -### Triangular matrix-matrix product example +#### Triangular matrix-matrix product example The following example computes $A := 2 A B$ where $A$ is a lower triangular matrix, but it makes the diagonal of $A$ all ones on the input (right-hand) side. ```c++ @@ -371,7 +436,7 @@ scale(2.0, A); ``` This is counterintuitive, and may also affect performance. Performance of `scale` is typically bound by memory bandwidth and/or latency, but if the work done by `scale` could be fused with the work done by the `triangular_matrix_product`, then `scale`'s memory operations could be "hidden" in the cost of the matrix product. -### LAPACK never calls `xTRMM` with the implicit unit diagonal option and `alpha` not equal to one +#### LAPACK never calls `xTRMM` with the implicit unit diagonal option and `alpha` not equal to one How much might users care about this missing [linalg] feature? P1673R13 explains that the BLAS was codesigned with LAPACK and that every reference BLAS routine is used by some LAPACK routine. "The BLAS does not aim to provide a complete set of mathematical operations. Every function in the BLAS exists because some LINPACK or LAPACK algorithm needs it" (Section 10.6.1). Therefore, to judge the urgency of adding new functionality to [linalg], we can ask whether the functionality would be needed by a C++ re-implementation of LAPACK. We think not much, because the highest-priority target audience of the BLAS is LAPACK developers, and LAPACK routines (other than testing routines) never use a scaling factor alpha other than one. @@ -389,11 +454,11 @@ We survey calls to `xTRMM` in the latest version of LAPACK as of the publication The only routines that call `DTRMM` with `alpha` equal to anything other than one or negative one are the testing routines. Some calls in `DGELQT3` and `DLARFB_GETT` use negative one, but these calls never specify an implicit unit diagonal (they use the explicit diagonal option). The only routine that might possibly call `DTRMM` with both negative one as alpha and the implicit unit diagonal is `DTFTRI`. (This routine "computes the inverse of a triangular matrix A stored in RFP [Rectangular Full Packed] format." RFP format was introduced to LAPACK in the late 2000's, well after the BLAS Standard was published. See LAPACK Working Note 199, which was published in 2008.) `DTFTRI` passes its caller's `diag` argument (which specifies either implicit unit diagonal or explicit diagonal) to `DTRMM`. The only two LAPACK routines that call `DTFTRI` are `DERRRFP` (a testing routine) and `DPFTRI`. `DPFTRI` only ever calls `DTFTRI` with `diag` *not* specifying the implicit unit diagonal option. Therefore, LAPACK never needs both `alpha` not equal to one and the implicit unit diagonal option, so adding the ability to "scale the implicit diagonal" in [linalg] is a low-priority feature. -### Fixes would not break backwards compatibility +#### Fixes would not break backwards compatibility We can think of two ways to fix this issue. First, we could add an `alpha` scaling parameter, analogous to the symmetric and Hermitian rank-1 and rank-k update functions. Second, we could add a new kind of `Diagonal` template parameter type that expresses a "diagonal value." For example, `implicit_diagonal_t{alpha}` (or a function form, `implicit_diagonal(alpha)`) would tell the algorithm not to access the diagonal elements, but instead to assume that their value is `alpha`. Both of these solutions would let users specify the diagonal's scaling factor separately from the scaling factor for the rest of the matrix. Those two scaling factors could differ, which is new functionality not offered by the BLAS. More importantly, both of these solutions could be added later, after C++26, without breaking backwards compatibility. -## Triangular solves, unit diagonals, and scaling factors +### Triangular solves, unit diagonals, and scaling factors 1. In BLAS, triangular solves with possibly multiple right-hand sides (`xTRSM`) apply `alpha` scaling to the implicit unit diagonal. In [linalg], the scaling factor `alpha` is not applied to the implicit unit diagonal. This is because the library does not interpret `scaled(alpha, A)` differently than any other `mdspan`. @@ -405,13 +470,13 @@ We can think of two ways to fix this issue. First, we could add an `alpha` scal 5. Therefore, we do not consider fixing this a high-priority issue, and we do not propose a fix for it in this paper. -### BLAS applies alpha after unit diagonal; linalg applies it before +#### BLAS applies alpha after unit diagonal; linalg applies it before Triangular solves have a similar issue to the one explained in the previous section. The BLAS routine `xTRSM` applies `alpha` "after" the implicit unit diagonal, while std::linalg applies `alpha` "before." (`xTRSV` does not take an `alpha` scaling factor.) As a result, the BLAS solves with a different matrix than std::linalg. In mathematical terms, `xTRSM` solves the equation $\alpha (A + I) X = B$ for $X$, where $A$ is the user's input matrix (without implicit unit diagonal) and $I$ is the identity matrix (with ones on the diagonal and zeros everywhere else). `triangular_matrix_matrix_left_solve` solves the equation $(\alpha A + I) Y = B$ for $Y$. The two results $X$ and $Y$ are not equal in general. -### Work-around requires changing all elements of the matrix +#### Work-around requires changing all elements of the matrix Users could work around this problem by first scaling the matrix $A$ by $\alpha$, and then solving for $Y$. In the common case where the "other triangle" of $A$ holds another triangular matrix, users could not call `scale(alpha, A)`. They would instead need to iterate over the elements of $A$ manually. Users might also need to "unscale" the matrix after the solve. Another option would be to copy the matrix $A$ before scaling. ```c++ @@ -429,11 +494,11 @@ for (size_t i = 0; i < A.extent(0); ++i) { ``` Users cannot solve this problem by scaling $B$ (either with `scaled(1.0 / alpha, B)` or with `scale(1.0 / alpha, B)`). Transforming $X$ into $Y$ or vice versa is mathematically nontrivial in general, and may introduce new failure conditions. This issue occurs with both the in-place and out-of-place triangular solves. -### Unsupported case occurs in LAPACK +#### Unsupported case occurs in LAPACK The common case in LAPACK is calling `xTRSM` with `alpha` equal to one, but other values of `alpha` occur. For example, `xTRTRI` calls `xTRSM` with `alpha` equal to $-1$. Thus, we cannot dismiss this issue, as we could with `xTRMM`. -### Fixes would not break backwards compatibility +#### Fixes would not break backwards compatibility As with triangular matrix products above, we can think of two ways to fix this issue. First, we could add an `alpha` scaling parameter, analogous to the symmetric and Hermitian rank-1 and rank-k update functions. Second, we could add a new kind of `Diagonal` template parameter type that expresses a "diagonal value." For example, `implicit_diagonal_t{alpha}` (or a function form, `implicit_diagonal(alpha)`) would tell the algorithm not to access the diagonal elements, but instead to assume that their value is `alpha`. Both of these solutions would let users specify the diagonal's scaling factor separately from the scaling factor for the rest of the matrix. Those two scaling factors could differ, which is new functionality not offered by the BLAS. More importantly, both of these solutions could be added later, after C++26, without breaking backwards compatibility. @@ -453,6 +518,28 @@ We also have two outstanding LWG issues. * LWG4137, "Fix Mandates, Preconditions, and Complexity elements of [linalg] algorithms," affects several sections touched by this proposal, including [linalg.algs.blas3.rankk] and [linalg.algs.blas3.rank2k]. We consider P3371 rebased atop the wording changes proposed by LWG4137. While the wording changes may conflict in a formal ("diff") sense, it is our view that they do not conflict in a mathematical or specification sense. +# Implementation status + +The following function overload sets need changing. + +* `matrix_rank_1_update` +* `matrix_rank_1_update_c` +* `symmetric_matrix_rank_1_update` +* `hermitian_matrix_rank_1_update` +* `symmetric_matrix_rank_2_update` +* `hermitian_matrix_rank_2_update` +* `symmetric_matrix_rank_k_update` +* `hermitian_matrix_rank_k_update` +* `symmetric_matrix_rank_2k_update` +* `hermitian_matrix_rank_2k_update` + +As of 2024/10/04, Pull request 293 in the reference std::linalg implementation implements changes to the following functions, and adds tests to ensure test coverage of the new overloads. + +* `matrix_rank_1_update` +* `matrix_rank_1_update_c` +* `symmetric_matrix_rank_1_update` +* `hermitian_matrix_rank_1_update` + # Acknowledgments Many thanks (with permission) to Raffaele Solcà (CSCS Swiss National Supercomputing Centre, `raffaele.solca@cscs.ch`) for pointing out some of the issues fixed by this paper, as well as the issues leading to LWG4137. @@ -506,48 +593,17 @@ Many thanks (with permission) to Raffaele Solcà (CSCS Swiss National Supercompu > Then, remove the definition of the exposition-only concept _`possibly-packed-inout-matrix`_ from **[linalg.helpers.concepts]**. -## New exposition-only concept for noncomplex numbers - -> In the Header `` synopsis [linalg.syn], at the end of the section -> started by the following comment: -> -> `// [linalg.helpers.concepts], linear algebra argument concepts`, -> -> add the following declaration of the exposition-only concept _`noncomplex`_. -> [Editorial Note: This addition is not shown in green, becuase the authors could not convince Markdown to format the code correctly. -- end note] - -```c++ -template - concept @_noncomplex_@ = @_see below_@; // exposition only -``` - -> In [linalg.helpers.concepts], change paragraph 3 to read as follows (new content in green). - -Unless explicitly permitted, any _`inout-vector`_, _`inout-matrix`_, _`inout-object`_, _`out-vector`_, _`out-matrix`_, _`out-object`_, _`possibly-packed-out-matrix`_, or _`possibly-packed-inout-matrix`_ parameter of a function in [linalg] shall not overlap any other `mdspan` parameter of the function. - -> Append the following to the end of [linalg.helpers.concepts]. -> [Editorial Note: These additions are not shown in green, becuase the authors could not convince Markdown to format the code correctly. -- end note] - -```c++ -template - concept @_noncomplex_@ = @_see below_@; -``` - -[4]{.pnum} A type `T` models _`noncomplex`_ if `T` is a linear algebra value type, and either +> Then, in [linalg.helpers.concepts], change paragraph 3 to read as follows (new content "or _`possibly-packed-out-matrix`_" in green; removed content "or _`possibly-packed-inout-matrix`_" in red). -[4.1]{.pnum} `T` is not an arithmetic type, or - -[4.2]{.pnum} the expression `conj(E)` is not valid, with overload resolution performed in a context that includes the declaration `template T conj(const T&) = delete;`. +Unless explicitly permitted, any _`inout-vector`_, _`inout-matrix`_, _`inout-object`_, _`out-vector`_, _`out-matrix`_, _`out-object`_, or _`possibly-packed-out-matrix`_, or _`possibly-packed-inout-matrix`_ parameter of a function in [linalg] shall not overlap any other `mdspan` parameter of the function. ## Rank-1 update functions in synopsis > In the header `` synopsis **[linalg.syn]**, replace all the declarations of all the `matrix_rank_1_update`, `matrix_rank_1_update_c`, `symmetric_matrix_rank_1_update`, and `hermitian_matrix_rank_1_update` overloads to read as follows. > [Editorial Note: -> There are three changes here. +> There are two changes here. > First, the existing overloads become "overwriting" overloads. > Second, new "updating" overloads are added. -> Third, the `hermitian_rank_1_update` functions that take an `alpha` parameter -> now constrain `alpha` to be _`noncomplex`_. > > Changes do not use red or green highlighting, becuase the authors could not convince Markdown to format the code correctly. > -- end note] @@ -612,10 +668,10 @@ template InVec x, InMat E, OutMat A, Triangle t); // overwriting Hermitian rank-1 matrix update - template<@_noncomplex_@ Scalar, @_in-vector_@ InVec, @_possibly-packed-out-matrix_@ OutMat, class Triangle> + template void hermitian_matrix_rank_1_update(Scalar alpha, InVec x, OutMat A, Triangle t); template + class Scalar, @_in-vector_@ InVec, @_possibly-packed-out-matrix_@ OutMat, class Triangle> void hermitian_matrix_rank_1_update(ExecutionPolicy&& exec, Scalar alpha, InVec x, OutMat A, Triangle t); template<@_in-vector_@ InVec, @_possibly-packed-out-matrix_@ OutMat, class Triangle> @@ -626,10 +682,10 @@ template InVec x, OutMat A, Triangle t); // updating Hermitian rank-1 matrix update - template<@_noncomplex_@ Scalar, @_in-vector_@ InVec, @_possibly-packed-in-matrix_@ InMat, @_possibly-packed-out-matrix_@ OutMat, class Triangle> + template void hermitian_matrix_rank_1_update(Scalar alpha, InVec x, InMat E, OutMat A, Triangle t); template + class Scalar, @_in-vector_@ InVec, @_possibly-packed-in-matrix_@ InMat, @_possibly-packed-out-matrix_@ OutMat, class Triangle> void hermitian_matrix_rank_1_update(ExecutionPolicy&& exec, Scalar alpha, InVec x, InMat E, OutMat A, Triangle t); template<@_in-vector_@ InVec, @_possibly-packed-in-matrix_@ InMat, @_possibly-packed-out-matrix_@ OutMat, class Triangle> @@ -755,13 +811,13 @@ template InMat1 A, InMat2 E, OutMat C, Triangle t); // overwriting rank-k Hermitian matrix update - template<@_noncomplex_@ Scalar, + template void hermitian_matrix_rank_k_update( Scalar alpha, InMat A, OutMat C, Triangle t); - template @@ -780,7 +836,7 @@ template ExecutionPolicy&& exec, InMat A, OutMat C, Triangle t); // updating rank-k Hermitian matrix update - template<@_noncomplex_@ Scalar, + template void hermitian_matrix_rank_k_update( Scalar alpha, InMat1 A, InMat2 E, OutMat C, Triangle t); - template @@ -1033,17 +1089,17 @@ template +template void hermitian_matrix_rank_1_update(Scalar alpha, InVec x, OutMat A, Triangle t); template + class Scalar, @_in-vector_@ InVec, @_possibly-packed-out-matrix_@ OutMat, class Triangle> void hermitian_matrix_rank_1_update(ExecutionPolicy&& exec, Scalar alpha, InVec x, OutMat A, Triangle t); ``` [15]{.pnum} These functions perform an overwriting Hermitian rank-1 update of the Hermitian matrix `A`, taking into account the `Triangle` parameter that applies to `A` ([linalg.general]). -[16]{.pnum} *Effects*: Computes $A = \alpha x x^H$, where the scalar $\alpha$ is `alpha`. +[16]{.pnum} *Effects*: Computes $A = \alpha x x^H$, where the scalar $\alpha$ is _`real-if-needed`_`(alpha)`. ```c++ template<@_in-vector_@ InVec, @_possibly-packed-out-matrix_@ OutMat, class Triangle> @@ -1058,17 +1114,17 @@ template +template void hermitian_matrix_rank_1_update(Scalar alpha, InVec x, InMat E, OutMat A, Triangle t); template + class Scalar, @_in-vector_@ InVec, @_possibly-packed-in-matrix_@ InMat, @_possibly-packed-out-matrix_@ OutMat, class Triangle> void hermitian_matrix_rank_1_update(ExecutionPolicy&& exec, Scalar alpha, InVec x, InMat E, OutMat A, Triangle t); ``` [19]{.pnum} These functions perform an updating Hermitian rank-1 update of the Hermitian matrix `A` using the Hermitian matrix `E`, taking into account the `Triangle` parameter that applies to `A` and `E` ([linalg.general]). -[20]{.pnum} *Effects*: Computes $A = E + \alpha x x^H$, where the scalar $\alpha$ is `alpha`. +[20]{.pnum} *Effects*: Computes $A = E + \alpha x x^H$, where the scalar $\alpha$ is _`real-if-needed`_`(alpha)`. ```c++ template<@_in-vector_@ InVec, @_possibly-packed-in-matrix_@ InMat, @_possibly-packed-out-matrix_@ OutMat, class Triangle> @@ -1270,7 +1326,7 @@ void symmetric_matrix_rank_k_update( Computes $C = A A^T$. ```c++ -template<@_noncomplex_@ Scalar, +template @@ -1280,7 +1336,7 @@ void hermitian_matrix_rank_k_update( OutMat C, Triangle t); template @@ -1294,7 +1350,7 @@ void hermitian_matrix_rank_k_update( [7]{.pnum} *Effects:* Computes $C = \alpha A A^H$, -where the scalar $\alpha$ is `alpha`. +where the scalar $\alpha$ is _`real-if-needed`_`(alpha)`. ```c++ template<@_in-matrix_@ InMat, @@ -1347,7 +1403,7 @@ void symmetric_matrix_rank_k_update( [9]{.pnum} *Effects:* Computes $C = E + \alpha A A^T$, -where the scalar $\alpha$ is `alpha`. +where the scalar $\alpha$ is _`real-if-needed`_`(alpha)`. ```c++ template<@_in-matrix_@ InMat1, @@ -1376,7 +1432,7 @@ void symmetric_matrix_rank_k_update( Computes $C = E + A A^T$. ```c++ -template<@_noncomplex_@ Scalar, +templateThis Compiler Explorer link demonstrates the implementation. This is not meant to show an ideal implementation. (A better one would use rescaling, in the manner of `std::hypot`, to avoid undue overflow or underflow.) Instead, it illustrates generic numerical algorithm development. Commenting out the `#define DEFINE_CONJ_REAL_IMAG_FOR_REAL 1` line shows that without ADL-findable `conj`, `real`, and `imag`, users' generic numerical algorithms would need more special cases and more assumptions on number types. + +```c++ +#include +#include +#include +#include +#include + +#define DEFINE_CONJ_REAL_IMAG_FOR_REAL 1 + +template +constexpr bool is_std_complex = false; +template +constexpr bool is_std_complex> = true; + +template +auto two_norm_abs(T t) { + if constexpr (std::is_unsigned_v) { + return t; + } + else if constexpr (std::is_arithmetic_v || is_std_complex) { + return std::abs(t); + } +#if ! defined(DEFINE_CONJ_REAL_IMAG_FOR_REAL) + else if constexpr (requires(T x) { + {abs(x)} -> std::convertible_to; + }) { + return T{abs(t)}; + } +#endif + else if constexpr (requires(T x) { + {sqrt(real(x * conj(x)))} -> std::same_as; + }) { + return sqrt(real(t * conj(t))); + } + else { + static_assert(false, "No reasonable way to implement abs(t)"); + } +} + +struct MyRealNumber { + MyRealNumber() = default; + MyRealNumber(double value) : value_(value) {} + + double value() const { + return value_; + } + + friend bool operator==(MyRealNumber, MyRealNumber) = default; + friend MyRealNumber operator-(MyRealNumber x) { + return {-x.value_}; + } + friend MyRealNumber operator+(MyRealNumber x, MyRealNumber y) { + return {x.value_ + y.value_}; + } + friend MyRealNumber operator-(MyRealNumber x, MyRealNumber y) { + return {x.value_ - y.value_}; + } + friend MyRealNumber operator*(MyRealNumber x, MyRealNumber y) { + return x.value_ * y.value_; + } + +#if defined(DEFINE_CONJ_REAL_IMAG_FOR_REAL) + friend MyRealNumber conj(MyRealNumber x) { return x; } + friend MyRealNumber real(MyRealNumber x) { return x; } + friend MyRealNumber imag(MyRealNumber x) { return {}; } +#else + friend MyRealNumber abs(MyRealNumber x) { return std::abs(x.value_); } +#endif + friend MyRealNumber sqrt(MyRealNumber x) { return std::sqrt(x.value_); } + +private: + double value_{}; +}; + +class MyComplexNumber { +public: + MyComplexNumber(MyRealNumber re, MyRealNumber im = {}) : re_(re), im_(im) {} + MyComplexNumber() = default; + + std::complex value() const { + return {re_.value(), im_.value()}; + } + + friend bool operator==(MyComplexNumber, MyComplexNumber) = default; + friend MyComplexNumber operator*(MyComplexNumber z, MyComplexNumber w) { + return {real(z) * real(w) - imag(z) * imag(w), + real(z) * imag(w) + imag(z) * real(w)}; + } + friend MyComplexNumber conj(MyComplexNumber z) { return {real(z), -imag(z)}; } + friend MyRealNumber real(MyComplexNumber z) { return z.re_; } + friend MyRealNumber imag(MyComplexNumber z) { return z.im_; } + +private: + MyRealNumber re_{}; + MyRealNumber im_{}; +}; + +int main() { + [[maybe_unused]] double x1 = two_norm_abs(-4.2); + assert(x1 == 4.2); + + [[maybe_unused]] float y0 = two_norm_abs(std::complex{-3.0f, 4.0f}); + assert(y0 == 5.0f); + + [[maybe_unused]] MyRealNumber r = two_norm_abs(MyRealNumber{-6.7}); + assert(r == MyRealNumber{6.7}); + + [[maybe_unused]] MyRealNumber z = two_norm_abs(MyComplexNumber{-3, 4}); + assert(z.value() == 5.0); + + return 0; +} +``` From 996ad5f467ef9aa8a475bcaae0c615eb26223959 Mon Sep 17 00:00:00 2001 From: Mark Hoemmen Date: Wed, 9 Oct 2024 13:32:19 -0600 Subject: [PATCH 13/17] P3371R2: Improve nonwording section Thanks to Ilya Burylov and Christian Trott for feedback and suggestions! --- fix-rankk-rank2k/P3371R2.html | 509 ++++++++++++++++++++------- fix-rankk-rank2k/fix-rankk-rank2k.md | 164 +++++++-- 2 files changed, 506 insertions(+), 167 deletions(-) diff --git a/fix-rankk-rank2k/P3371R2.html b/fix-rankk-rank2k/P3371R2.html index 70d55152..e1ad21c6 100644 --- a/fix-rankk-rank2k/P3371R2.html +++ b/fix-rankk-rank2k/P3371R2.html @@ -4,7 +4,7 @@ - + Fix C++26 by making the rank-1, rank-2, rank-k, and rank-2k updates consistent with the BLAS