From ca1ed29072e09142c50e4de29fabd48c31234e98 Mon Sep 17 00:00:00 2001
From: Mark Hoemmen <mhoemmen@users.noreply.github.com>
Date: Tue, 10 May 2022 22:46:33 -0600
Subject: [PATCH] P1674: Changes based on PR suggestions for P1673

PR https://github.com/ORNL/cpp-proposals-pub/pull/228 by Jeff Hammond suggests changes to P1673.
Some of those changes also apply to P1674.
This PR applies them to P1674.
---
 D1674/evolving-from-blas.md | 25 +++++++++++++++----------
 1 file changed, 15 insertions(+), 10 deletions(-)

diff --git a/D1674/evolving-from-blas.md b/D1674/evolving-from-blas.md
index 0b6a41a4..a6668def 100644
--- a/D1674/evolving-from-blas.md
+++ b/D1674/evolving-from-blas.md
@@ -100,10 +100,11 @@ comes only in Fortran.  It's also slow; for example,
 its matrix-matrix multiply routine uses nearly the same triply nested
 loops that a naïve developer would write.  The intent of the BLAS is
 that users who care about performance find optimized implementations,
-either by hardware vendors or by projects like ATLAS (Whaley et
-al. 2001), the
+either by hardware vendors or by projects like
+[ATLAS](http://math-atlas.sourceforge.net/) (see also Whaley et al. 2001),
 [GotoBLAS](https://www.tacc.utexas.edu/research-development/tacc-software/gotoblas2),
-or [OpenBLAS](http://www.openblas.net).
+[OpenBLAS](https://github.com/xianyi/OpenBLAS),
+or [BLIS](https://github.com/flame/blis).
 
 Suppose that our developer has found an optimized implementation of
 the BLAS, and they want to call some of its routines from C++.  Here
@@ -1494,16 +1495,16 @@ Thanks to Damien Lebrun-Grandie for reviewing Revision 1 changes.
   A Portable, High-Performance, ANSI C Coding Methodology and its
   application to Matrix Multiply," LAPACK Working Note 111, 1996.
 
-* K. Goto and R. A. van de Geijn, "Anatomy of high-performance matrix
-  multiplication", ACM Transactions of Mathematical Software (TOMS),
-  Vol. 34, No. 3, May 2008.
+* K. Goto and R. A. van de Geijn,
+  ["Anatomy of high-performance matrix multiplication"](https://doi.org/10.1145/1356052.1356053),
+  *ACM Transactions on Mathematical Software* (TOMS),
+  Vol. 34, No. 3, May 2008.
 
 * M. Hoemmen, D. Hollman, C. Trott, D. Sunderland, N. Liber, A. Klinvex,
   Li-Ta Lo, D. Lebrun-Grandie, G. Lopez, P. Caday, S. Knepper, P. Luszczek,
   and T. Costa,
   "A free function linear algebra interface based on the BLAS,"
-  P1673R6,
-  Dec. 2021.
+  P1673R7, Apr. 2022.
 
 * C. Trott, D. Hollman, M. Hoemmen, and D. Sunderland,
   "`mdarray`: An Owning Multidimensional Array Analog of `mdspan`",
@@ -1521,14 +1522,18 @@ Thanks to Damien Lebrun-Grandie for reviewing Revision 1 changes.
 
 * J. Siek and A. Lumsdaine, "The Matrix Template Library: A Generic
   Programming Approach to High Performance Numerical Linear Algebra,"
-  in proceedings of the Second International Symposium on Computing in
+  in Proceedings of the Second International Symposium on Computing in
   Object-Oriented Parallel Environments (ISCOPE) 1998, Santa Fe, NM,
   USA, Dec. 1998.
+
+* F. G. Van Zee and R. A. van de Geijn,
+  ["BLIS: A Framework for Rapidly Instantiating BLAS Functionality,"](https://doi.org/10.1145/2764454),
+  *ACM Transactions on Mathematical Software* (TOMS), Vol. 41, No. 3, June 2015.
 
 * R. Vuduc, "Automatic performance tuning of sparse matrix kernels,"
   PhD dissertation, Electrical Engineering and Computer Science,
   University of California Berkeley, 2004.
 
 * R. C. Whaley, A. Petitet, and J. Dongarra, "Automated Empirical
-  Optimization of Software and the ATLAS Project," Parallel Computing,
+  Optimization of Software and the ATLAS Project," *Parallel Computing*,
   Vol. 27, No. 1-2, Jan. 2001, pp. 3-35.