diff --git a/Accelerated_Python_User_Guide/notebooks/2D_Navier_Stokes_In_Warp.ipynb b/Accelerated_Python_User_Guide/notebooks/2D_Navier_Stokes_In_Warp.ipynb new file mode 100644 index 00000000..7d57b9ef --- /dev/null +++ b/Accelerated_Python_User_Guide/notebooks/2D_Navier_Stokes_In_Warp.ipynb @@ -0,0 +1,1535 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# SPDX-License-Identifier: Apache-2.0 AND CC-BY-NC-4.0\n", + "#\n", + "# Licensed under the Apache License, Version 2.0 (the \"License\");\n", + "# you may not use this file except in compliance with the License.\n", + "# You may obtain a copy of the License at\n", + "#\n", + "# http://www.apache.org/licenses/LICENSE-2.0\n", + "#\n", + "# Unless required by applicable law or agreed to in writing, software\n", + "# distributed under the License is distributed on an \"AS IS\" BASIS,\n", + "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", + "# See the License for the specific language governing permissions and\n", + "# limitations under the License." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# GPU-Accelerated 2-D Navier Stokes Simulation using NVIDIA Warp\n", + "\n", + "## Overview\n", + "\n", + "In this notebook, we will build a fluid solver entirely in Python using the Warp framework. Specifically, we will simulate 2-D turbulent flow in a box.\n", + "\n", + "Through this implementation, you will learn:\n", + "\n", + "* Writing SIMT (single instruction, multiple threads) kernels for finite difference operators and flow initialization\n", + "* Warp's tile-based programming primitives for efficient matrix operations, e.g., FFT, inverse FFT, and matrix transpose\n", + "* CUDA graph capture for improving performance of timestepping codes\n", + "\n", + "By the end of this lab, we will have built a GPU-accelerated 2-D Navier-Stokes solver in a periodic box, while gaining experience in applying Warp's features to leverage a GPU for a computational physics problem. " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "---\n", + "## Setup\n", + "\n", + "Before we begin implementing the 2-D N-S solver, let's ensure we have all the necessary packages installed." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# install Warp\n", + "%pip install warp-lang \n", + "\n", + "# install visualization package\n", + "%pip install matplotlib Pillow" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Now let us import the necessary libraries and initialize Warp to check if GPU support is available:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import numpy as np\n", + "import warp as wp\n", + "\n", + "if wp.get_cuda_device_count() > 0:\n", + " print(\"GPU detected successfully\")\n", + "else:\n", + " print(\"No GPU detected!\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "---\n", + "## Introduction\n", + "\n", + "This seemingly simple 2-D example combines multiple Warp features that can be also leveraged to build industrial-grade solvers:\n", + " - Accelerated Warp kernels for finite difference operators\n", + " - Tile-based matrix operations for a more performant code\n", + " - CUDA graph capture to speed-up multiple calls to the same kernel\n", + "\n", + "Before diving into the code, let's understand a bit of physics behind the underlying equations. " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "---\n", + "## Background: Equations, boundary conditions, and initial conditions\n", + "\n", + "### 1. Vorticity-stream function formulation\n", + "\n", + "For a 2-D incompressible flow, we define:\n", + "- **Vorticity**: $\\omega = \\partial v/\\partial x - \\partial u/\\partial y$\n", + "- **Stream function**: $\\psi$ such that $u = \\partial\\psi/\\partial y$, $v = -\\partial\\psi/\\partial x$\n", + "\n", + "In the vorticity and stream function equation above, $u$ and $v$ are the two components of the velocity vector $\\mathbf{u} = [u, v]$ in 2-D. This formulation automatically satisfies continuity and eliminates pressure from the Navier-Stokes equations.\n", + "\n", + "### 2. Governing equation\n", + "\n", + "In 2-D, the non-dimensional governing equations for incompressible flows can be written using the **vorticity-stream function transport equation** as\n", + "\n", + "$$\\underbrace{\\frac{\\partial \\omega}{\\partial t}}_{\\text{unsteadiness}} + \\underbrace{\\frac{\\partial \\psi}{\\partial y}\\frac{\\partial \\omega}{\\partial x} - \\frac{\\partial \\psi}{\\partial x}\\frac{\\partial \\omega}{\\partial y}}_{\\text{advection}} = \\underbrace{\\frac{1}{Re}\\left(\\frac{\\partial^2 \\omega}{\\partial x^2} + \\frac{\\partial^2 \\omega}{\\partial y^2}\\right)}_{\\text{diffusion}} \\tag{1},$$\n", + "\n", + "along with the relationship between stream function and vorticity, given by a Poisson equation. \n", + "\n", + "Starting from the definition of vorticity (in 2-D)\n", + " \n", + "$$\\omega = \\frac{\\partial v}{\\partial x} - \\frac{\\partial u}{\\partial y}.$$\n", + " \n", + "Substituting the streamfunction relations $u = \\frac{\\partial \\psi}{\\partial y}$ and $v = -\\frac{\\partial \\psi}{\\partial x}$ gives\n", + " \n", + "$$\\omega = \\frac{\\partial}{\\partial x}\\left(-\\frac{\\partial \\psi}{\\partial x}\\right) - \\frac{\\partial}{\\partial y}\\left(\\frac{\\partial \\psi}{\\partial y}\\right),$$\n", + " \n", + "$$\\omega = -\\frac{\\partial^2 \\psi}{\\partial x^2} - \\frac{\\partial^2 \\psi}{\\partial y^2}.$$\n", + " \n", + "Rearranging gives the Poisson equation\n", + " \n", + "$$\\frac{\\partial^2 \\psi}{\\partial x^2} + \\frac{\\partial^2 \\psi}{\\partial y^2} = -\\omega \\tag{2}.$$\n", + "\n", + "When writing the solver, equations (1) and (2) will be solved at each timestep to evolve the flow. \n", + "\n", + "**NOTE: These equations might look intimidating, but we will break down the different parts in the subsequent cells to make them more tractable. So, please bear with us for a bit!**\n", + "\n", + "### 3. Spectral Poisson solver\n", + "\n", + "The Poisson equation in (2) can be discretized and solved using iterative methods like Jacobi and Gauss-Seidel methods. However, since we have a periodic domain, we can represent both $\\omega$ and $\\psi$ using Fourier series with sine and cosine basis functions as\n", + " \n", + "$$\\omega(x,y) = \\sum_{m,n} \\hat{\\omega}_{m,n} e^{i(k_x x + k_y y)},$$\n", + " \n", + "$$\\psi(x,y) = \\sum_{m,n} \\hat{\\psi}_{m,n} e^{i(k_x x + k_y y)},$$\n", + " \n", + "where $\\hat{\\omega}_{m,n}$ and $\\hat{\\psi}_{m,n}$ are what we call the Fourier coefficients, and $k_x = \\frac{2\\pi m}{L_x}$, $k_y = \\frac{2\\pi n}{L_y}$ are the wavenumbers. Now, if we substitute these Fourier representations back into equation (2), we can compute the derivatives as\n", + "\n", + "$$\\frac{\\partial^2 \\psi}{\\partial x^2} = \\sum_{m,n} -k_x^2 \\hat{\\psi}_{m,n} e^{i(k_x x + k_y y)},$$\n", + " \n", + "$$\\frac{\\partial^2 \\psi}{\\partial y^2} = \\sum_{m,n} -k_y^2 \\hat{\\psi}_{m,n} e^{i(k_x x + k_y y)}.$$\n", + " \n", + "Substituting these into the Poisson equation, i.e., equation (2) above will result in\n", + " \n", + "$$\\sum_{m,n} -(k_x^2 + k_y^2) \\hat{\\psi}_{m,n} e^{i(k_x x + k_y y)} = -\\sum_{m,n} \\hat{\\omega}_{m,n} e^{i(k_x x + k_y y)}.$$\n", + " \n", + "Since this must hold for each pair of $[k_x, k_y]$ independently, **the Poisson equation reduces to a simple algebraic equation in Fourier space** \n", + "\n", + "$$\\hat{\\psi}_{m,n} = \\frac{\\hat{\\omega}_{m,n}}{(k_x^2 + k_y^2)}. \\tag{3}$$\n", + "\n", + "\n", + "Note that **Fourier space** refers to the spectral domain where we work directly with the Fourier coefficients $\\hat{\\omega}_{m,n}$ and $\\hat{\\psi}_{m,n}$ along with their corresponding wavenumbers $k_x$ and $k_y$. After solving in Fourier space, we recover the physical fields $\\omega$ and $\\psi$ using the inverse Fourier transform. The wavenumbers are defined as $k_x = \\frac{2\\pi m}{L_x}, \\quad k_y = \\frac{2\\pi n}{L_y}$, where $m$ and $n$ are the Fourier mode indices, each ranging from $[-N/2, N/2 -1]$ on a square grid (with $N$ being the total number of grid points along any given direction of the square grid). \n", + "\n", + "\n", + "### 4. Time integration \n", + "Runge-Kutta methods advance the solution by taking multiple intermediate \"substeps\" within each timestep, achieving better accuracy than a single forward step. In this example, we use a **strong-stability preserving (SSP) third-order** **explicit** Runge-Kutta method.\n", + "\n", + "- **Third-order**: The numerical error scales as $\\mathcal{O}(\\Delta t)^3$. So, halving the timestep reduces error by $\\sim 8\\times$.\n", + "\n", + "- **SSP**: Schemes designed specifically to prevent spurious oscillations for hyperbolic PDEs. Interested readers can read more [here](https://api.drum.lib.umd.edu/server/api/core/bitstreams/faa96f40-0fbb-4af0-84a3-5e3b80c908df/content).\n", + "\n", + "- **Explicit**: Each new timestep is computed directly from the previous one (easier to program, as we will see!).\n", + "\n", + "SSP RK3 scheme takes the following form below, where $\\omega^{(n)}$ and $\\omega^{(n+1)}$ are the vorticity field at the beginning and the end of a timestep. $\\omega^{(1)}$ and $\\omega^{(2)}$ are the intermediate vorticity fields.\n", + "\n", + "$$\\omega^{(1)} = \\omega^{(n)} + \\Delta t \\mathcal{L}(\\omega^{(n)}),$$\n", + " \n", + "$$\\omega^{(2)} = \\frac{3}{4}\\omega^{(n)} + \\frac{1}{4}\\omega^{(1)} + \\frac{1}{4}\\Delta t \\mathcal{L}(\\omega^{(1)}),$$ \n", + "\n", + "$$\\omega^{(n+1)} = \\frac{1}{3}\\omega^{(n)} + \\frac{2}{3}\\omega^{(2)} + \\frac{2}{3}\\Delta t \\mathcal{L}(\\omega^{(2)}). \\tag{4}$$\n", + "\n", + "Here $\\mathcal{L}(\\omega)$ encapsulates the finite-difference forms of both advection and diffusion terms in equation (1). \n", + "\n", + "\n", + "### 5. Initial condition: 2-D decaying turbulence\n", + "\n", + "In this problem, we solve the governing equations on a square domain of size $2\\pi \\times 2\\pi$. Recall that the wavenumbers are defined as $k_x = 2 \\pi m / L_x$ and $k_y = 2 \\pi n / L_y$. By choosing $L_x = L_y = 2\\pi$, the wavenumbers $k_x$ and $k_y$ become integers (specifically, $k_x = m$ and $k_y = n$), which simplifies some of the FFT-based operations. This choice of domain size is a common convention in spectral solver literature for this reason.\n", + "\n", + "The initial condition is generated using the energy spectrum from San & Staples CNF (2012)\n", + "\n", + "$$E(k) = \\frac{a_s}{2 k_p}\\left(\\frac{k}{k_p}\\right)^{2s+1} \\exp\\left[-\\left(s + \\frac{1}{2}\\right)\\left(\\frac{k}{k_p}\\right)^2\\right] \\tag{5},$$\n", + "\n", + "where\n", + "- $k = |\\mathbf{k}| = \\sqrt{k_x^2 + k_y^2}$,\n", + "- $k_p = 12 (wavenumber at which $E(k)$ peaks), \n", + "- $s = 3$ (shape parameter of the spectrum),\n", + "- $a_s = (2s+1)^{s+1}/(2^s s!)$.\n", + "\n", + "The magnitude of Fourier coefficients related to the initial energy spectrum is given by \n", + "\n", + "$$|\\hat{\\omega}(\\mathbf{k})| = \\sqrt{\\frac{k}{\\pi}E(k)},$$\n", + "\n", + "which is then randomized with a phase function $\\zeta(\\mathbf{k})$ so that $\\hat{\\omega}(k_x, k_y) = \\hat{\\omega}(\\mathbf{k})$ becomes \n", + "\n", + "$$\\hat{\\omega}(\\mathbf{k}) = \\sqrt{\\frac{k}{\\pi}E(k)} \\, e^{i\\zeta(\\mathbf{k})}. \\tag{6}$$\n", + " \n", + "The phase function $\\zeta(\\mathbf{k})$ is constructed with specific symmetry constraints to ensure that the vorticity field $\\omega(x,y)$ is real-valued in the physical space. The details of how $\\zeta(\\mathbf{k})$ is formulated is provided in **Appendix A**.\n", + "\n", + "**NOTE: Although the initialization equations may appear complex, this notebook focuses primarily on the Warp implementation. Much of the boilerplate code will be provided for you.**" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "---\n", + "## Building blocks of the solver\n", + "\n", + "$$\n", + "\\boxed{\\;\\;\n", + "\\begin{array}{c}\n", + "\\large\\textbf{Solver Pipeline} \\\\[14pt]\n", + "\\hat{\\omega} \\;\\xrightarrow{\\;\\text{IFFT}\\;}\\; \\omega(t) \\;\\xrightarrow{\\quad}\\;\n", + "\\underbrace{\\text{Discretize} \\;\\to\\; \\text{1 RK substep} \\;\\to\\; \\text{Poisson solve}}_{\\times\\, 3 \\text{\\ times for a RK3 scheme}} \n", + "\\;\\xrightarrow{\\quad}\\; \\omega(t+\\Delta t) \\\\[6pt]\n", + "\\uparrow \\hspace{20em} \\downarrow \\\\[-2pt]\n", + "\\xleftarrow{\\hspace{5em} \\text{For next timestep/RK step} \\hspace{2em}}\n", + "\\end{array}\n", + "\\;\\;}\n", + "$$\n", + "\n", + "There are four main building blocks of the solver that we would tackle in a sequential manner.\n", + "\n", + "- Initialize $\\omega$, i.e., generate $\\hat{\\omega}$ in the Fourier space using the energy spectrum specified above and inverse FFT it to the physical space.\n", + "- Discretize the advection and diffusion terms to obtain the RHS $\\mathcal{L}(\\omega)$ of $\\frac{\\partial \\omega}{\\partial t} = \\mathcal{L}(\\omega)$ (see equation 1).\n", + "- Advancing $\\omega(t)$ to $\\omega(t+\\Delta t)$ using SSP RK3 scheme (equation 4).\n", + "- Solve Poisson equation (equation 2) in the Fourier space $\\hat{\\psi}_{m,n} = \\hat{\\omega}_{m,n}/(k_x^2 + k_y^2)$ (equation 3), and inverse FFT $\\hat{psi}$ to $\\psi$ after each RK substep. " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "---\n", + "## Simulation parameters" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# grid resolution\n", + "N_GRID = 512\n", + "# box size\n", + "LEN = 2 * np.pi\n", + "# delta t for timestepping\n", + "DT = 0.001\n", + "# Reynolds number\n", + "RE = 1000.0\n", + "# define SSP-RK3 coefficients used for timestepping\n", + "RK3_COEFFS = [\n", + " [1.0, 0.0, 1.0],\n", + " [3.0 / 4.0, 1.0 / 4.0, 1.0 / 4.0],\n", + " [1.0 / 3.0, 2.0 / 3.0, 2.0 / 3.0],\n", + "]\n", + "# h = l/N (size of a computational cell)\n", + "H = LEN / N_GRID\n", + "# parameters for Warp's tiled-FFT functionality\n", + "TILE_M = 1\n", + "TILE_N = N_GRID\n", + "TILE_TRANSPOSE_DIM = 16\n", + "BLOCK_DIM = TILE_N // 2" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "---\n", + "## Building block 1: Flow initialization in Warp\n", + "\n", + "Recall from equation (6) in the Background section that the initial vorticity field $\\hat{\\omega}(\\mathbf{k})$ in Fourier space is given by\n", + "\n", + "$$\\hat{\\omega}(\\mathbf{k}) = \\underbrace{\\sqrt{\\frac{k}{\\pi}E(k)}}_{\\text{amplitude}} \\cdot \\underbrace{e^{i\\zeta(\\mathbf{k})}}_{\\text{phase}} = \\underbrace{\\sqrt{\\frac{k}{\\pi}E(k)}}_{\\text{amplitude}} \\cdot \\left(\\cos\\zeta(\\mathbf{k}) + i\\sin\\zeta(\\mathbf{k})\\right),$$\n", + "\n", + "which gives us the real and imaginary parts\n", + " \n", + "$$\\text{Re}(\\hat{\\omega}) = \\sqrt{\\frac{k}{\\pi}E(k)} \\cos\\zeta(\\mathbf{k}), \\quad \\text{Im}(\\hat{\\omega}) = \\sqrt{\\frac{k}{\\pi}E(k)} \\sin\\zeta(\\mathbf{k}).$$\n", + "\n", + "$E(k)$ is the energy spectrum (equation 5) and $\\zeta(\\mathbf{k})$ is the phase function with symmetry constraints ensuring a real-valued physical field.\n", + "\n", + "**What do we need to implement?**\n", + "\n", + "To initialize the flow, we need:\n", + "\n", + "1. A kernel to populate $\\hat{\\omega}(\\mathbf{k})$ on the entire 2-D grid using equation (6).\n", + "\n", + "2. Two helper functions for that kernel (and equation 6):\n", + "\n", + " a. A function to compute the **energy spectrum** $E(k)$ at each wavenumber magnitude.\n", + "\n", + " b. A function to compute the **phase** $\\zeta(\\mathbf{k})$.\n", + " \n", + "The helper functions will be `@wp.func` device functions (callable from kernels), and the initialization kernel will be a `@wp.kernel`." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**Note on Warp math utilities:** Warp provides built-in functions for basic operations like `wp.cos()`, `wp.sqrt()`, `wp.sin()`, etc. for use in kernels. For complex number arithmetic, Warp uses `wp.vec2f` and `wp.vec2d` as data types for 32-bit and 64-bit precision, respectively. For example, complex numbers can represented as `wp.vec2f(real_part, imaginary_part)`, where the first component is the real part of type `wp.float32` and the second component is the imaginary part of type `wp.float32`.\n", + "\n", + "**EXERCISE: Assuming the helper functions are correct, complete the kernel `decaying_turbulence_initializer(...)` by computing the amplitude using the Warp function `def energy_spectrum(...)`, the phase function using `def phase_randomizer(...)`, and combining them to form the complex valued `omega_hat_init[i, j]`.**" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "@wp.func\n", + "def factorial(n: wp.int32) -> wp.int32:\n", + " \"\"\"Compute factorial.\n", + "\n", + " Args:\n", + " n: Input integer for which we want factorial.\n", + "\n", + " Returns:\n", + " Factorial of input n.\n", + " \"\"\"\n", + " result = wp.int32(1)\n", + " for i in range(2, n + 1):\n", + " result *= i\n", + " return result\n", + "\n", + "\n", + "@wp.func\n", + "def energy_spectrum(k: wp.float32, s: wp.int32, kp: wp.float32) -> wp.float32:\n", + " \"\"\"Compute energy at wavenumber magnitude k.\n", + "\n", + " Follows San and Staples 2012 Computers and Fluids (page 49).\n", + " https://www.sciencedirect.com/science/article/abs/pii/S0045793012001363.\n", + "\n", + " Args:\n", + " k: Input wavenumber magnitude.\n", + " s: Shape parameter of spectrum.\n", + " kp: Wavenumber magnitude at which maximum of energy spectrum lies.\n", + "\n", + " Returns:\n", + " Energy contained at wavenumber magnitude k.\n", + " \"\"\"\n", + " s_factorial = wp.float32(factorial(s))\n", + " s_float32 = wp.float32(s)\n", + " a_s = (2.0 * s_float32 + 1.0) ** (s_float32 + 1.0) / (2.0**s_float32 * s_factorial)\n", + " energy_k = (\n", + " a_s\n", + " / (2.0 * kp)\n", + " * (k / kp) ** (2.0 * s_float32 + 1.0)\n", + " * wp.exp(-(s_float32 + 0.5) * (k / kp) ** 2.0)\n", + " )\n", + " return energy_k\n", + "\n", + "\n", + "@wp.func\n", + "def phase_randomizer(\n", + " n: int,\n", + " zeta: wp.array2d(dtype=wp.float32),\n", + " eta: wp.array2d(dtype=wp.float32),\n", + " i: int,\n", + " j: int,\n", + ") -> wp.float32:\n", + " \"\"\"Calculate value of the random phase at index (i, j).\n", + "\n", + " Follows San and Staples 2012 to return phase value in any quadrant based on\n", + " the values of eta and zeta in the first quadrant.\n", + "\n", + " Args:\n", + " n: Size of the simulation domain.\n", + " zeta: First phase function.\n", + " eta: Second phase function\n", + " i: rowwise index on the 2-D simulation domain.\n", + " j: columnwise index on the 2-D simulation domain\n", + "\n", + " Returns:\n", + " Value of the random phase in any quadrant.\n", + " \"\"\"\n", + " n_half = n // 2\n", + "\n", + " # first quadrant\n", + " if i < n_half and j < n_half:\n", + " return zeta[i, j] + eta[i, j]\n", + " # second quadrant\n", + " if i >= n_half and j < n_half:\n", + " return -zeta[n - i, j] + eta[n - i, j]\n", + " # third quadrant\n", + " if i >= n_half and j >= n_half:\n", + " return -zeta[n - i, n - j] - eta[n - i, n - j]\n", + " # fourth quadrant\n", + " return zeta[i, n - j] - eta[i, n - j]\n", + "\n", + "@wp.kernel\n", + "def decaying_turbulence_initializer(\n", + " n: int,\n", + " kp: wp.float32,\n", + " s: wp.int32,\n", + " k_mag: wp.array2d(dtype=wp.float32),\n", + " zeta: wp.array2d(dtype=wp.float32),\n", + " eta: wp.array2d(dtype=wp.float32),\n", + " omega_hat_init: wp.array2d(dtype=wp.vec2f), # In Warp, wp.vec2f/wp.vec2d are used to handle complex numbers\n", + "):\n", + " \"\"\"Initialize the vorticity field in Fourier space for decaying turbulence.\n", + "\n", + " Args:\n", + " n: Size of the simulation domain.\n", + " kp: Wavenumber magnitude at which maximum of energy spectrum lies.\n", + " s: Shape parameter of the energy spectrum.\n", + " k_mag: Wavenumber magnitude on the 2-D grid.\n", + " zeta: First phase function for phase randomization.\n", + " eta: Second phase function for phase randomization.\n", + " omega_hat_init: Output vorticity field in Fourier space.\n", + " \"\"\"\n", + " i, j = wp.tid()\n", + "\n", + " amplitude = wp.sqrt((k_mag[i, j] / wp.pi) * energy_spectrum(k_mag[i, j], s, kp)) # MISSING\n", + " phase = phase_randomizer(n, zeta, eta, i, j) # MISSING\n", + " omega_hat_init[i, j] = wp.vec2f(amplitude*wp.cos(phase), amplitude*wp.sin(phase)) # MISSING" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "---\n", + "## Building blocks 2 and 3: Discretization of advection and viscous diffusion terms for RHS and Runge-Kutta timestepping\n", + "\n", + "Recall the vorticity transport equation (1) from the Background section given as\n", + "\n", + "$$\\frac{\\partial \\omega}{\\partial t} + \\underbrace{\\frac{\\partial \\psi}{\\partial y}\\frac{\\partial \\omega}{\\partial x} - \\frac{\\partial \\psi}{\\partial x}\\frac{\\partial \\omega}{\\partial y}}_{\\text{advection}} = \\underbrace{\\frac{1}{Re}\\left(\\frac{\\partial^2 \\omega}{\\partial x^2} + \\frac{\\partial^2 \\omega}{\\partial y^2}\\right)}_{\\text{diffusion}}.$$\n", + "\n", + "We discretize the **advection** and **diffusion** terms on a **uniform grid** using **central finite difference schemes**. Let us see what does that mean. It's not as scary as it sounds!\n", + "### Advection term\n", + "\n", + "Using central differences for first derivatives with grid spacing $h$,\n", + "\n", + "$$\\frac{\\partial \\omega}{\\partial x}\\bigg|_{i,j} \\approx \\frac{\\omega_{i+1,j} - \\omega_{i-1,j}}{2h}, \\quad \\frac{\\partial \\omega}{\\partial y}\\bigg|_{i,j} \\approx \\frac{\\omega_{i,j+1} - \\omega_{i,j-1}}{2h}, \\quad \\frac{\\partial \\psi}{\\partial x}\\bigg|_{i,j} \\approx \\frac{\\psi_{i+1,j} - \\psi_{i-1,j}}{2h}, \\quad \\frac{\\partial \\psi}{\\partial y}\\bigg|_{i,j} \\approx \\frac{\\psi_{i,j+1} - \\psi_{i,j-1}}{2h}.$$\n", + "\n", + "The advection term becomes\n", + "\n", + "$$\\text{Advection}_{i,j} = \\underbrace{\\frac{\\omega_{i+1,j} - \\omega_{i-1,j}}{2h} \\cdot \\frac{\\psi_{i,j+1} - \\psi_{i,j-1}}{2h}}_{\\texttt{term\\_1}} - \\underbrace{\\frac{\\omega_{i,j+1} - \\omega_{i,j-1}}{2h} \\cdot \\frac{\\psi_{i+1,j} - \\psi_{i-1,j}}{2h}}_{\\texttt{term\\_2}}.$$\n", + "\n", + "### Diffusion (or Laplacian) term\n", + "\n", + "Using the standard 5-point stencil for the Laplacian, the diffusion term becomes\n", + "\n", + "$$\\text{Diffusion}_{i,j} = \\nabla^2 \\omega\\bigg|_{i,j} = \\frac{\\partial^2 \\omega}{\\partial x^2}\\bigg|_{i,j} + \\frac{\\partial^2 \\omega}{\\partial y^2}\\bigg|_{i,j} \\approx \\frac{\\omega_{i+1,j} - 2\\omega_{i,j} + \\omega_{i-1,j}}{h^2} + \\frac{\\omega_{i,j+1} - 2\\omega_{i,j} + \\omega_{i,j-1}}{h^2} = \\frac{\\omega_{i+1,j} + \\omega_{i-1,j} + \\omega_{i,j+1} + \\omega_{i,j-1} - 4\\omega_{i,j}}{h^2}.$$\n", + "\n", + "### Runge-Kutta timestepping\n", + "\n", + "Recall the SSP-RK3 scheme from equation (4). Each sub-step has the general form\n", + "\n", + "$$\\omega^{(\\text{new})} = c_0 \\cdot \\omega^{(n)} + c_1 \\cdot \\omega^{(\\text{old})} + c_2 \\cdot \\Delta t \\cdot \\mathcal{L}(\\omega^{(\\text{old})}),$$\n", + "\n", + "where \n", + "\n", + "- $\\mathcal{L}(\\omega) = \\frac{1}{Re}\\text{Diffusion} - \\text{Advection}$ is the RHS term we just discretized above. \n", + "- $\\omega^{(n)}$ is the vorticity field on at the beginning of the timestep $t$. \n", + "- $\\omega^{\\text{old}}$ and $\\omega^{\\text{new}}$ are the vorticity field at the beginning and the end **of a RK substep**, respectively. \n", + "\n", + "**Note: At the first and last RK substep specifically, $\\omega^{\\text{(new)}}$ and $\\omega^{\\text{(old)}}$ are equal to $\\omega(t+1)$ and $\\omega(t)$, respectively. For intermediate RK substeps, this does not necessarily hold true.**\n", + "\n", + "In a Warp kernel, all of these operations would translate to:\n", + "1. Compute `rhs = (1/Re) * diffusion(...) - advection(...)`.\n", + "2. Update `omega_old = coeff0 * omega_n + coeff1 * omega_old + coeff2 * dt * rhs`, where `omega_n` is the vorticity field at the very beginning of the RK timestepping. By passing different coefficients, each time the kernel is called, the same kernel can handle all three RK sub-steps. Here, note that the same variable `omega_old` is overwritten to store the new values after a RK substep to store $\\omega^{\\text{(new)}}$. \n", + "\n", + "**What do we need to implement?**\n", + "\n", + "Two `@wp.func` helper functions:\n", + "1. `viscous_advection_rk3_kernel(...)` - Warp kernel that calls two Warp functions `advection(...)` and `diffusion(...)` to calculate the advection and diffusion terms, and then performs one RK update on the entire 2-D grid.\n", + "2. `advection(...)` - Warp function that computes the advection term at a grid point given the flow information on the neighboring left, right, top, and bottom neighbors.\n", + "3. `diffusion(...)` - Warp function computes the Laplacian at a grid point given the flow information on the left, right, top and bottom neighbors." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**EXERCISE: Fill in the MISSING parts in `advection(...)`, `diffusion(...)`, using the finite difference formulas described above and the RK update equation.**\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "@wp.func\n", + "def cyclic_index(idx: wp.int32, n: wp.int32) -> wp.int32:\n", + " \"\"\"Map any index to [0, n-1] for periodic boundary conditions.\n", + "\n", + " Args:\n", + " idx: Input index that may be outside the valid range.\n", + " n: Grid size defining the periodic domain.\n", + "\n", + " Returns:\n", + " Index wrapped to the range [0, n-1].\n", + " \"\"\"\n", + " ret_idx = idx % n\n", + " if ret_idx < 0:\n", + " ret_idx += n\n", + " return ret_idx\n", + " \n", + "@wp.func\n", + "def advection(\n", + " omega_left: wp.float32,\n", + " omega_right: wp.float32,\n", + " omega_top: wp.float32,\n", + " omega_down: wp.float32,\n", + " psi_left: wp.float32,\n", + " psi_right: wp.float32,\n", + " psi_top: wp.float32,\n", + " psi_down: wp.float32,\n", + " h: wp.float32,\n", + ") -> wp.float32:\n", + " \"\"\"Calculate the advection term using central finite difference.\n", + "\n", + " Args:\n", + " omega_left: Vorticity at (i-1, j).\n", + " omega_right: Vorticity at (i+1, j).\n", + " omega_top: Vorticity at (i, j+1).\n", + " omega_down: Vorticity at (i, j-1).\n", + " psi_left: Stream function at (i-1, j).\n", + " psi_right: Stream function at (i+1, j).\n", + " psi_top: Stream function at (i, j+1).\n", + " psi_down: Stream function at (i, j-1).\n", + " h: Grid spacing.\n", + "\n", + " Returns:\n", + " Advection term value at grid point (i, j).\n", + " \"\"\"\n", + " inv_2h = 1.0 / (2.0 * h)\n", + " term_1 = ((omega_right - omega_left) * inv_2h) * ((psi_top - psi_down) * inv_2h) # MISSING\n", + " term_2 = ((omega_top - omega_down) * inv_2h) * ((psi_right - psi_left) * inv_2h) # MISSING\n", + " return term_1 - term_2\n", + "\n", + "\n", + "@wp.func\n", + "def diffusion(\n", + " omega_left: wp.float32,\n", + " omega_right: wp.float32,\n", + " omega_center: wp.float32,\n", + " omega_down: wp.float32,\n", + " omega_top: wp.float32,\n", + " h: wp.float32,\n", + ") -> wp.float32:\n", + " \"\"\"Calculate the Laplacian for viscous diffusion using central difference.\n", + "\n", + " Args:\n", + " omega_left: Vorticity at (i-1, j).\n", + " omega_right: Vorticity at (i+1, j).\n", + " omega_center: Vorticity at (i, j).\n", + " omega_down: Vorticity at (i, j-1).\n", + " omega_top: Vorticity at (i, j+1).\n", + " h: Grid spacing.\n", + "\n", + " Returns:\n", + " Laplacian of vorticity at grid point (i, j).\n", + " \"\"\"\n", + " inv_h2 = 1.0 / (h * h)\n", + " # combine both the diffusion terms in the x and y direction together\n", + " laplacian = (omega_left + omega_right + omega_top + omega_down - 4.0*omega_center) * inv_h2 # MISSING\n", + " return laplacian" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Validating advection and diffusion kernels\n", + "\n", + "Before moving to `def viscous_advection_rk3_kernel(...)`, we validate the finite-difference implementation using a **manufactured solution** with known analytical derivatives.\n", + "\n", + "$$\\psi(x,y) = \\sin(k_x x)\\sin(k_y y), \\quad \\omega(x,y) = (k_x^2 + k_y^2)\\sin(k_x x)\\sin(k_y y)$$\n", + "\n", + "Note that $\\nabla^{2} \\psi = -\\omega$. \n", + "\n", + "Applying the Laplacian to $\\omega$\n", + "\n", + "$$\\nabla^2\\omega = \\frac{\\partial^2 \\omega}{\\partial x^2} + \\frac{\\partial^2 \\omega}{\\partial y^2} = -(k_x^2 + k_y^2)^2 \\sin(k_x x)\\sin(k_y y) = -(k_x^2 + k_y^2)\\omega.$$\n", + "\n", + "Evaluating $\\frac{\\partial \\psi}{\\partial x}\\frac{\\partial \\omega}{\\partial y} - \\frac{\\partial \\psi}{\\partial y}\\frac{\\partial \\omega}{\\partial x}$ gives\n", + "\n", + "$$\\frac{\\partial \\psi}{\\partial x} = k_x\\cos(k_x x)\\sin(k_y y), \\quad \\frac{\\partial \\psi}{\\partial y} = k_y\\sin(k_x x)\\cos(k_y y)$$\n", + "\n", + "$$\\frac{\\partial \\omega}{\\partial x} = (k_x^2+k_y^2)k_x\\cos(k_x x)\\sin(k_y y), \\quad \\frac{\\partial \\omega}{\\partial y} = (k_x^2+k_y^2)k_y\\sin(k_x x)\\cos(k_y y)$$\n", + "\n", + "Substituting into the Jacobian:\n", + "\n", + "$$\\frac{\\partial \\psi}{\\partial x}\\frac{\\partial \\omega}{\\partial y} = (k_x^2+k_y^2)k_x k_y \\cos(k_x x)\\sin(k_x x)\\sin(k_y y)\\cos(k_y y)$$\n", + "\n", + "$$\\frac{\\partial \\psi}{\\partial y}\\frac{\\partial \\omega}{\\partial x} = (k_x^2+k_y^2)k_x k_y \\sin(k_x x)\\cos(k_x x)\\cos(k_y y)\\sin(k_y y)$$\n", + "\n", + "These two terms are identical, so advection terms evaluate to zero.\n", + "\n", + "The cell below compares numerical output against these analytical values. \n", + "\n", + "**EXERCISE: Run the initial validation and try playing with `n_grid`, `kx`, `ky` parameters. For now, treat the `def compute_advection_diffusion_kernel(...)` as a black box. We will encounter a similar kernel soon.**" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from utils import validate_advection_diffusion\n", + "\n", + "@wp.kernel\n", + "def compute_advection_diffusion_kernel(\n", + " omega: wp.array2d(dtype=wp.float32),\n", + " psi: wp.array2d(dtype=wp.float32),\n", + " advection_out: wp.array2d(dtype=wp.float32),\n", + " diffusion_out: wp.array2d(dtype=wp.float32),\n", + " h: wp.float32,\n", + " n: wp.int32,\n", + "):\n", + " \"\"\"Compute advection and diffusion terms at each grid point.\n", + "\n", + " Args:\n", + " omega: Vorticity field.\n", + " psi: Stream function field.\n", + " advection_out: Output array for advection term J(psi, omega).\n", + " diffusion_out: Output array for diffusion term (Laplacian of omega).\n", + " h: Grid spacing.\n", + " n: Grid size.\n", + " \"\"\"\n", + " i, j = wp.tid()\n", + "\n", + " # obtain the neighboring indices for the [i, j]th cell in a periodic square box\n", + " i_left = cyclic_index(i - 1, n)\n", + " i_right = cyclic_index(i + 1, n)\n", + " j_down = cyclic_index(j - 1, n)\n", + " j_top = cyclic_index(j + 1, n)\n", + "\n", + " # gather omega values at neighbors\n", + " omega_center = omega[i, j]\n", + " omega_left = omega[i_left, j]\n", + " omega_right = omega[i_right, j]\n", + " omega_down = omega[i, j_down]\n", + " omega_top = omega[i, j_top]\n", + "\n", + " # gather psi values at neighbors\n", + " psi_left = psi[i_left, j]\n", + " psi_right = psi[i_right, j]\n", + " psi_down = psi[i, j_down]\n", + " psi_top = psi[i, j_top]\n", + "\n", + " # compute diffusion term (Laplacian)\n", + " diffusion_out[i, j] = diffusion(\n", + " omega_left,\n", + " omega_right,\n", + " omega_center,\n", + " omega_down,\n", + " omega_top,\n", + " h,\n", + " )\n", + "\n", + " # compute advection term J(psi, omega)\n", + " advection_out[i, j] = advection(\n", + " omega_left,\n", + " omega_right,\n", + " omega_top,\n", + " omega_down,\n", + " psi_left,\n", + " psi_right,\n", + " psi_top,\n", + " psi_down,\n", + " h,\n", + " )\n", + "\n", + "# Change n_grid, kx, ky here and play with different values\n", + "(fig_diff, _), (fig_adv, _) = validate_advection_diffusion(\n", + " compute_advection_diffusion_kernel, n_grid=N_GRID, kx=2, ky=3\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**EXERCISE: Given that we have the correct `def advection(...)` and `def diffusion(...)` functions, let's assemble the timestepping kernel together. Recall**\n", + "\n", + "$$\\omega^{(\\text{new})} = c_0 \\cdot \\omega^{(n)} + c_1 \\cdot \\omega^{(\\text{old})} + c_2 \\cdot \\Delta t \\cdot \\mathcal{L}(\\omega^{(\\text{old})}).$$\n", + "\n", + "In coding terminology, we can write `omega_old = coeff0 * omega_n + coeff1 * omega_old + coeff2 * dt * rhs`.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "\n", + "@wp.kernel\n", + "def viscous_advection_rk3_kernel(\n", + " n: int,\n", + " h: float,\n", + " re: float,\n", + " dt: float,\n", + " coeff0: float,\n", + " coeff1: float,\n", + " coeff2: float,\n", + " omega_0: wp.array2d(dtype=float),\n", + " omega_1: wp.array2d(dtype=float),\n", + " psi: wp.array2d(dtype=float),\n", + " rhs: wp.array2d(dtype=float),\n", + "):\n", + " \"\"\"Perform a single substep of SSP-RK3.\n", + "\n", + " Args:\n", + " n: Grid size.\n", + " h: Grid spacing.\n", + " re: Reynolds number.\n", + " dt: Timestep size.\n", + " coeff0: SSP-RK3 coefficient for omega_0.\n", + " coeff1: SSP-RK3 coefficient for omega_1.\n", + " coeff2: SSP-RK3 coefficient for RHS.\n", + " omega_0: Vorticity field at the beginning of the timestep.\n", + " omega_1: Vorticity field at the end of the RK step.\n", + " psi: Stream function field.\n", + " rhs: Temporarily stores diffusion + advection terms.\n", + " \"\"\"\n", + " i, j = wp.tid()\n", + "\n", + " # obtain the neighboring indices for the [i, j]th cell in a periodic square box\n", + " left_idx = cyclic_index(i - 1, n)\n", + " right_idx = cyclic_index(i + 1, n)\n", + " top_idx = cyclic_index(j + 1, n)\n", + " down_idx = cyclic_index(j - 1, n)\n", + "\n", + " # compute viscous diffusion term\n", + " rhs[i, j] = (1.0 / re) * diffusion(\n", + " omega_1[left_idx, j],\n", + " omega_1[right_idx, j],\n", + " omega_1[i, j],\n", + " omega_1[i, down_idx],\n", + " omega_1[i, top_idx],\n", + " h,\n", + " )\n", + "\n", + " # add advection term\n", + " rhs[i, j] -= advection(\n", + " omega_1[left_idx, j],\n", + " omega_1[right_idx, j],\n", + " omega_1[i, top_idx],\n", + " omega_1[i, down_idx],\n", + " psi[left_idx, j],\n", + " psi[right_idx, j],\n", + " psi[i, top_idx],\n", + " psi[i, down_idx],\n", + " h,\n", + " )\n", + "\n", + " # perform RK update\n", + " omega_1[i, j] = coeff0 * omega_0[i, j] + coeff1 * omega_1[i, j] + coeff2 * dt * rhs[i, j] # MISSING" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "---\n", + "## Building block 4: Spectral Poisson solver using tile-based FFT and transpose\n", + "\n", + "Recall from equation (3) in the Background section that the Poisson equation in the Fourier space is\n", + "\n", + "$$\\hat{\\psi}_{m,n} = \\frac{\\hat{\\omega}_{m,n}}{k_x^2 + k_y^2}.$$\n", + "\n", + "To solve this, we need to perform the follow steps:\n", + "1. Transform $\\omega$ to Fourier space using 2-D FFT → $\\hat{\\omega}$.\n", + "2. Divide by $|k|^2$ to obtain $\\hat{\\psi}$.\n", + "3. Transform $\\hat{\\psi}$ back to physical space using 2-D IFFT.\n", + "\n", + "**Warp's FFT primitives**: Warp provides `wp.tile_fft()` and `wp.tile_ifft()` that operate on `wp.vec2f` (or `wp.vec2d`) arrays and perform row-wise FFT/IFFT, where:\n", + "- `.x` component = **real part**\n", + "- `.y` component = **imaginary part**\n", + "\n", + "Since our physical fields $\\omega$, $\\psi$ are stored as `wp.float32`, we need a helper kernel to convert between data types.\n", + "\n", + "**What do we need to implement?**\n", + "\n", + "1. **Data type conversion**: Kernels to move data from `wp.float32` to `wp.vec2f` (for FFT) and from `wp.vec2f` to `wp.float32` (after IFFT).\n", + "\n", + "2. **Tile-based 1-D FFT/IFFT kernels**: Kernels that will use `wp.tile_fft()` and `wp.tile_ifft()` to perform row-wise transforms.\n", + "\n", + "3. **Transpose kernel**: A tiled transpose for efficient matrix transpose.\n", + "\n", + "4. **2-D FFT composition**: A Python function that will put together the kernels in steps (2) and (3) above to do 2-D FFT as **row FFT → transpose → row FFT**.\n", + "\n", + "Let's build these step by step." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Step 1: Data type conversion kernels\n", + "\n", + "Since Warp's FFT operates on `wp.vec2f` (complex) arrays but our physical fields are `wp.float32` (real), we need:\n", + "\n", + "- `copy_float_to_vec2`: Convert real array to complex with zero imaginary part (before FFT).\n", + "- `extract_real_and_scale`: Extract real part and apply scaling by dividing the real part by `scale` (after IFFT)." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**EXERCISE: Complete the `def extract_real_and_scale(...)` kernel definition. `def copy_float_to_vec2(...)` is already completed for you for reference.**" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "@wp.kernel\n", + "def copy_float_to_vec2(\n", + " omega: wp.array2d(dtype=wp.float32), omega_complex: wp.array2d(dtype=wp.vec2f)\n", + "):\n", + " \"\"\"Copy real vorticity to a complex array with zero imaginary part.\n", + "\n", + " Args:\n", + " omega: Input real-valued vorticity array.\n", + " omega_complex: Output complex array where real part is omega, imaginary is 0.\n", + " \"\"\"\n", + " i, j = wp.tid()\n", + " omega_complex[i, j] = wp.vec2f(omega[i, j], 0.0)\n", + "\n", + "\n", + "@wp.kernel\n", + "def extract_real_and_scale(\n", + " scale: wp.float32,\n", + " complex_array: wp.array2d(dtype=wp.vec2f),\n", + " real_array: wp.array2d(dtype=wp.float32),\n", + "):\n", + " \"\"\"Extract real part from complex array and scale in one pass.\n", + "\n", + " Args:\n", + " scale: Scale factor to multiply each element by.\n", + " complex_array: Input complex array (vec2f where .x is real part).\n", + " real_array: Output real array (scaled).\n", + " \"\"\"\n", + " i, j = wp.tid()\n", + " real_array[i, j] = complex_array[i, j].x / scale # MISSING" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Step 2: Tile-based 1-D FFT and IFFT kernels\n", + "\n", + "Warp's `wp.tile_fft()` and `wp.tile_ifft()` perform FFT on tiles loaded into registers. For row-wise transforms:\n", + "\n", + "1. Each thread block loads one row (tile) of the 2-D array\n", + "2. Performs FFT/IFFT using `wp.tile_fft()` / `wp.tile_ifft()`\n", + "3. Stores the result back to global memory\n", + "\n", + "The tile dimensions `TILE_M=1` and `TILE_N=N_GRID` mean each tile is exactly one row.\n", + "\n", + "**TODO: I am not sure if we should ask the readers to fill in the kernels definitions for `tile_fft(...)` and `tile_ifft(...)`. Also, @Eric for a better description of tile-based operations...**" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "@wp.kernel\n", + "def fft_tiled(x: wp.array2d(dtype=wp.vec2f), y: wp.array2d(dtype=wp.vec2f)):\n", + " \"\"\"Perform 1-D FFT on each row using wp.tile_fft().\n", + "\n", + " Args:\n", + " x: Input complex array of shape (N, N).\n", + " y: Output complex array of shape (N, N) storing FFT results.\n", + " \"\"\"\n", + " i, _, _ = wp.tid()\n", + " a = wp.tile_load(x, shape=(TILE_M, TILE_N), offset=(i * TILE_M, 0))\n", + " wp.tile_fft(a)\n", + " wp.tile_store(y, a, offset=(i * TILE_M, 0))\n", + "\n", + "\n", + "@wp.kernel\n", + "def ifft_tiled(x: wp.array2d(dtype=wp.vec2f), y: wp.array2d(dtype=wp.vec2f)):\n", + " \"\"\"Perform 1-D inverse FFT on each row using wp.tile_ifft().\n", + "\n", + " Args:\n", + " x: Input complex array of shape (N, N).\n", + " y: Output complex array of shape (N, N) storing IFFT results.\n", + " \"\"\"\n", + " i, _, _ = wp.tid()\n", + " a = wp.tile_load(x, shape=(TILE_M, TILE_N), offset=(i * TILE_M, 0))\n", + " wp.tile_ifft(a)\n", + " wp.tile_store(y, a, offset=(i * TILE_M, 0))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Validating FFT/IFFT kernels \n", + "To verify that `fft_tiled` and `ifft_tiled` work correctly, we test them with a known signal $f(x) = \\sin(x)$ on the domain $[0, 2\\pi)$. The Fourier transform of $\\sin(x)$ has a simple analytical form. Since $\\sin(x) = \\frac{e^{ix} - e^{-ix}}{2i}$, the FFT should produce peaks only at wavenumbers $k = \\pm 1$, with zero amplitude elsewhere. \n", + "\n", + "The `validate_fft_roundtrip(...)` function initializes the signal $f(x)$ on the $[0, 2\\pi)$ domain and performs a full roundtrip, i.e., it applies the FFT to the signal, then applies the IFFT to the result. The function then plots the magnitude spectrum $|\\hat{f}(k)|$ and compares the original signal with the reconstructed signal obtained from the FFT -> IFFT roundtrip." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Validate FFT kernels with a simple sine wave test\n", + "from utils import validate_fft_roundtrip\n", + "\n", + "fig, axes, max_error = validate_fft_roundtrip(\n", + " fft_kernel=fft_tiled,\n", + " ifft_kernel=ifft_tiled,\n", + " n_grid=N_GRID,\n", + " tile_m=TILE_M,\n", + " tile_n=TILE_N,\n", + " block_dim=BLOCK_DIM,\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Step 3: Tiled transpose kernel\n", + "\n", + "To compute 2-D FFT using 1-D row transforms, we need a transpose operation between passes\n", + "\n", + "$$\\text{2-D FFT} = \\text{row FFT} \\rightarrow \\text{transpose} \\rightarrow \\text{row FFT}.$$\n", + "\n", + "To perform the matrix transpose, we again use Warp's tile-based primitive `wp.tile_transpose()` below in the kernel `def tiled_transpose(...)` that takes in a `x` array and transposes it to give the `y` array.\n", + "\n", + "**TODO**: Same question as above, should we ask them to fill in these tile kernels. Again @Eric for a more accurate description of tile-based transpose...\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "@wp.kernel\n", + "def tiled_transpose(x: wp.array2d(dtype=wp.vec2f), y: wp.array2d(dtype=wp.vec2f)):\n", + " \"\"\"Transpose a 2-D array using tiled approach with shared memory.\n", + "\n", + " Args:\n", + " x: Input complex array.\n", + " y: Output complex array storing the transpose of x.\n", + " \"\"\"\n", + " i, j = wp.tid()\n", + " t = wp.tile_load(\n", + " x,\n", + " shape=(TILE_TRANSPOSE_DIM, TILE_TRANSPOSE_DIM),\n", + " offset=(i * TILE_TRANSPOSE_DIM, j * TILE_TRANSPOSE_DIM),\n", + " storage=\"shared\",\n", + " )\n", + " t_transposed = wp.tile_transpose(t)\n", + " wp.tile_store(\n", + " y, t_transposed, offset=(j * TILE_TRANSPOSE_DIM, i * TILE_TRANSPOSE_DIM)\n", + " )" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Validating transpose kernel\n", + "\n", + "In `def validate_transpose(...)`, we validate this using an upper triangular matrix, which should become lower triangular after transpose. For example, in case of a $4 \\times 4$ upper triangular matrix, transpose operation should result in the following change\n", + "\n", + "$$\n", + "\\begin{bmatrix} 1 & 1 & 1 & 1 \\\\ 0 & 1 & 1 & 1 \\\\ 0 & 0 & 1 & 1 \\\\ 0 & 0 & 0 & 1 \\end{bmatrix}\n", + "\\xrightarrow{\\text{transpose}}\n", + "\\begin{bmatrix} 1 & 0 & 0 & 0 \\\\ 1 & 1 & 0 & 0 \\\\ 1 & 1 & 1 & 0 \\\\ 1 & 1 & 1 & 1 \\end{bmatrix}.\n", + "$$" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import utils\n", + "from utils import validate_transpose\n", + "\n", + "fig, axes = validate_transpose(\n", + " transpose_kernel=tiled_transpose,\n", + " tile_transpose_dim=TILE_TRANSPOSE_DIM,\n", + " n_test=64,\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Step 4: Going from $\\hat{\\omega}$ to $\\hat{\\psi}$ in the Fourier space\n", + "\n", + "After transforming $\\omega$ to Fourier space, we solve equation (3)\n", + "\n", + "$$\\hat{\\psi}_{m,n} = \\frac{\\hat{\\omega}_{m,n}}{k_x^2 + k_y^2} = \\frac{\\hat{\\omega}_{m,n}}{|k|^2} = \\hat{\\omega}_{m,n} \\cdot |k|^{-2}.$$\n", + "\n", + "We precompute $1/|k|^2$ and multiply it elementwise to the `omega_hat` array to obtain the `psi_hat` array." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**EXERCISE: Please fill out the `def multiply_k2_inverse(...)` kernel definition.**" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "@wp.kernel\n", + "def multiply_k2_inverse(\n", + " k2i: wp.array2d(dtype=wp.float32),\n", + " omega_hat: wp.array2d(dtype=wp.vec2f),\n", + " psi_hat: wp.array2d(dtype=wp.vec2f),\n", + "):\n", + " \"\"\"Solve Poisson equation in Fourier space: psi_hat = omega_hat / |k|^2.\n", + "\n", + " Args:\n", + " k2i: Precomputed 1/|k|^2 array (0 at k=0 to handle DC component).\n", + " omega_hat: Fourier transform of vorticity.\n", + " psi_hat: Output Fourier transform of stream function.\n", + " \"\"\"\n", + " i, j = wp.tid()\n", + " psi_hat[i, j] = omega_hat[i, j] * k2i[i, j] # MISSING" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Now that we have all the kernel calls figured out, we need to put together two Python functions before we start assembling the solver together!\n", + "\n", + "1. `def fft_2d(fft_kernel, input_arr)` - Python function that composes row-wise FFT/IFFT -> transpose → row-wise FFT to perform a full 2-D FFT/IFFT on a input complex (`wp.vec2f`) array.\n", + "\n", + "2. `def solve_poisson(omega, k2i)` - Python function that takes the real-valued vorticity field and pre-computed $|k|^2$ and gives out real values $\\psi$.\n", + "\n", + "3. `def step(omega_old, omega_new)` - Python function that performs on Runge-Kutte timestep to go from `omega_old` to `omega_new`." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**EXERCISE: Until now, you have mostly filled in and worked through Warp kernels and functions. In the next cell, we ask you to call a Warp kernel from inside the Python function `def fft_2d(fft_kernel, input_arr)`. This function is FFT/IFFT agnostic and can be used for both (how?). We ask you to fill two kernel calls, one to `def tiled_transpose(...)` and another to `def fft_kernel(...)`.**" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def fft_2d(fft_kernel, input_arr):\n", + " \"\"\"Perform 2-D FFT or IFFT using row-wise transform + transpose pattern.\n", + "\n", + " Args:\n", + " fft_kernel: Either def fft_tiled(...) or def ifft_tiled(...) Warp kernel.\n", + " input_arr: Input complex array.\n", + " \n", + " Returns:\n", + " Output complex array with FFT/IFFT results.\n", + " \"\"\"\n", + " # allocate temporary buffers for intermediate FFT results\n", + " temp_1 = wp.zeros((N_GRID, N_GRID), dtype=wp.vec2f)\n", + " temp_2 = wp.zeros((N_GRID, N_GRID), dtype=wp.vec2f)\n", + "\n", + " # perform rowwise FFT/IFFT\n", + " wp.launch_tiled(\n", + " fft_kernel,\n", + " dim=[N_GRID, 1],\n", + " inputs=[input_arr],\n", + " outputs=[temp_1],\n", + " block_dim=BLOCK_DIM,\n", + " )\n", + "\n", + " wp.launch_tiled(\n", + " tiled_transpose,\n", + " dim=(N_GRID // TILE_TRANSPOSE_DIM, N_GRID // TILE_TRANSPOSE_DIM),\n", + " inputs= [temp_1], # MISSING \n", + " outputs= [temp_2], # MISSING \n", + " block_dim=TILE_TRANSPOSE_DIM * TILE_TRANSPOSE_DIM,\n", + " )\n", + "\n", + " # perform columnwise FFT/IFFT\n", + " output_arr = wp.empty_like(input_arr)\n", + " wp.launch_tiled(\n", + " fft_kernel,\n", + " dim=[N_GRID, 1],\n", + " inputs= [temp_2], # MISSING \n", + " outputs=[output_arr],\n", + " block_dim=BLOCK_DIM,\n", + " )\n", + " \n", + " return output_arr" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**EXERCISE: Now, let's fill in the function call to `def fft_2d(...)` in the `def solve_poisson(...)` below.**" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def solve_poisson(omega, psi, k2i):\n", + " \"\"\"Solve the Poisson equation using FFT.\n", + " \n", + " Args:\n", + " omega: Vorticity field (wp.float32 array).\n", + " psi: Stream function solution (wp.float32 array).\n", + " k2i: Precomputed 1/|k|^2 array.\n", + " \"\"\"\n", + " # allocate all temporary arrays locally\n", + " omega_complex = wp.zeros((N_GRID, N_GRID), dtype=wp.vec2f)\n", + " temp_1 = wp.zeros((N_GRID, N_GRID), dtype=wp.vec2f)\n", + " temp_2 = wp.zeros((N_GRID, N_GRID), dtype=wp.vec2f)\n", + "\n", + " # convert vorticity from wp.float32 to wp.vec2f\n", + " wp.launch(copy_float_to_vec2, dim=(N_GRID, N_GRID),\n", + " inputs=[omega], outputs=[omega_complex])\n", + "\n", + " # forward FFT: take omega\n", + " # Remember fft_2d first and second arguments are fft_tiled/ifft_tiled and vorticity field (complex valued, wp.vec2f)\n", + " temp_1 = fft_2d(fft_tiled, omega_complex) # MISSING\n", + "\n", + " # solve Poisson in Fourier space: psi_hat = omega_hat / |k|^2\n", + " wp.launch(multiply_k2_inverse, dim=(N_GRID, N_GRID),\n", + " inputs=[k2i, temp_1], outputs=[temp_2])\n", + "\n", + " # inverse FFT: psi_hat -> psi\n", + " # Remember fft_2d first and second arguments are fft_tiled/ifft_tiled and vorticity field (complex valued, wp.vec2f)\n", + " temp_1 = fft_2d(ifft_tiled, temp_2) # MISSING\n", + "\n", + " # extract real part and normalize\n", + " wp.launch(extract_real_and_scale, dim=(N_GRID, N_GRID),\n", + " inputs=[wp.float32(N_GRID * N_GRID), temp_1],\n", + " outputs=[psi]) # MISSING" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Validating the Poisson solver\n", + "\n", + "We validate `solve_poisson` using an analytical test case. Consider the Poisson equation $\\nabla^2 \\psi = -\\omega$. If we choose $\\psi_{\\text{analytical}} = \\sin(k_x x)\\sin(k_y y)$, then\n", + "\n", + "$$\\nabla^2 \\psi = -k_x^2 \\sin(k_x x)\\sin(k_y y) - k_y^2 \\sin(k_x x)\\sin(k_y y) = -(k_x^2 + k_y^2)\\sin(k_x x)\\sin(k_y y).$$\n", + "\n", + "which means $\\omega = (k_x^2 + k_y^2)\\sin(k_x x)\\sin(k_y y)$.\n", + "\n", + "In the `def validate_poisson_solver(...)`, we set $\\omega = (k_x^2 + k_y^2)\\sin(k_x x)\\sin(k_y y)$ and make a call to `def solve_poisson(...)` to obtain $\\psi_{\\text{numerical}}$. Thereafter, we compare the obtained $\\psi_{\\text{numerical}}$ with $\\psi_{\\text{analytical}}$.\n", + "\n", + "In the cell below, for starters, we have set $k_x = k_y = 1$. Run the cell and see what you get. \n", + "\n", + "**Exercise: Change the values of $k_x$ and $k_y$ and see how the plots look for different $[k_x, k_y]$ pair.**" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Validate Poisson solver\n", + "from utils import validate_poisson_solver\n", + "\n", + "fig, axes, max_error = validate_poisson_solver(\n", + " solve_poisson_func=solve_poisson,\n", + " n_grid=N_GRID,\n", + " kx=1,\n", + " ky=1,\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We have already given you the `def step(...)` full definition in the cell below. A single call to `def step(...)` basically advances the whole $\\omega$ field on the 2-D grid from $t$ to $t + \\Delta t$. " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def step(omega_0, omega_1, psi, rhs, k2i):\n", + " \"\"\"Advance simulation by one timestep using SSP-RK3.\n", + " \n", + " Args:\n", + " omega_0: Vorticity at the start of timestep on the 2-D grid.\n", + " omega_1: Vorticity at the end of timestep/RK substep on the 2-D grid.\n", + " psi: Stream function on the 2-D grid.\n", + " rhs: Temporary array for advection + diffusion terms.\n", + " k2i: Precomputed 1/|k|^2 for Poisson solver.\n", + " \"\"\"\n", + " # loop over 3 RK substeps with coefficients (c0, c1, c2)\n", + " for c0, c1, c2 in RK3_COEFFS:\n", + " rhs.zero_()\n", + "\n", + " # kernel call to compute the advection and diffusion components\n", + " # then advance the vorticity 1 RK3 substep\n", + " wp.launch(\n", + " viscous_advection_rk3_kernel,\n", + " dim=(N_GRID, N_GRID),\n", + " inputs=[N_GRID, H, RE, DT, c0, c1, c2, omega_0, omega_1, psi, rhs],\n", + " )\n", + "\n", + " # solve Poisson equation to get psi from omega_1\n", + " solve_poisson(omega_1, psi, k2i)\n", + "\n", + " # omega_1 now holds omega(t+dt), copy back to omega_0 for next timestep\n", + " wp.copy(omega_0, omega_1)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "---\n", + "## Assembling the Solver\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Now that we have all the building blocks in place, let's put them together to run the simulation. In the cells below, we will:\n", + "\n", + "1. Allocate arrays for vorticity ($\\omega$), stream function ($\\psi$), RHS terms, and pre-compute $1/|k|^2$.\n", + "2. Initialize the vorticity field using the decaying turbulence spectrum mentioned above.\n", + "3. Capture the time-stepping loop in a CUDA graph for improved performance.\n", + "4. Run the simulation and visualize the results." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 1. Allocating arrays for the simulation" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# allocate warp arrays for vorticity, stream-function, and RHS of NS equation\n", + "omega_0 = wp.zeros((N_GRID, N_GRID), dtype=wp.float32) # MISSING\n", + "omega_1 = wp.zeros((N_GRID, N_GRID), dtype=wp.float32) # MISSING\n", + "psi = wp.zeros((N_GRID, N_GRID), dtype=wp.float32) # MISSING\n", + "rhs = wp.zeros((N_GRID, N_GRID), dtype=wp.float32) # MISSING\n", + "\n", + "# precompute 1/k^2 for spectral Poisson solver (avoid division by zero at k=0)\n", + "k = np.fft.fftfreq(N_GRID, d=1.0 / N_GRID)\n", + "kx, ky = np.meshgrid(k, k)\n", + "k2 = kx**2 + ky**2\n", + "k2i_np = np.zeros_like(k2)\n", + "nonzero = k2 != 0\n", + "k2i_np[nonzero] = 1.0 / k2[nonzero]\n", + "k2i = wp.array2d(k2i_np.astype(np.float32), dtype=wp.float32)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 2. Initialize vorticity on the 2-D grid using the spectrum discussed before" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# compute initial vorticity distribution for decaying turbulence\n", + "k_mag_np = np.sqrt(k**2 + k[:, np.newaxis] ** 2)\n", + "k_mag = wp.array2d(k_mag_np.astype(np.float32), dtype=wp.float32)\n", + "\n", + "rng = np.random.default_rng(42)\n", + "zeta_np = 2 * np.pi * rng.random((N_GRID // 2 + 1, N_GRID // 2 + 1))\n", + "eta_np = 2 * np.pi * rng.random((N_GRID // 2 + 1, N_GRID // 2 + 1))\n", + "zeta = wp.array2d(zeta_np.astype(np.float32), dtype=wp.float32)\n", + "eta = wp.array2d(eta_np.astype(np.float32), dtype=wp.float32)\n", + "omega_complex = wp.zeros((N_GRID, N_GRID), dtype=wp.vec2f)\n", + "\n", + "# set parameters for energy spectrum\n", + "K_P = 12.0\n", + "S = 3\n", + "\n", + "wp.launch(\n", + " decaying_turbulence_initializer,\n", + " dim=(N_GRID, N_GRID),\n", + " inputs=[N_GRID, K_P, S, k_mag, zeta, eta],\n", + " outputs=[omega_complex], # MISSING\n", + ")\n", + "\n", + "# compute IFFT of omega_complex field to get initial vorticity in physical space\n", + "# Remember fft_2d first and second arguments are fft_tiled/ifft_tiled and vorticity field (complex valued, wp.vec2f)\n", + "fft_result = fft_2d(ifft_tiled, omega_complex) # MISSING\n", + "\n", + "# extract real part to get initial vorticity field (scale=1.0 for normalization)\n", + "wp.launch(\n", + " extract_real_and_scale,\n", + " dim=(N_GRID, N_GRID),\n", + " inputs=[1.0, fft_result],\n", + " outputs=[omega_0],\n", + ")\n", + "\n", + "# for initial distribution, set both omega_1 and omega_0 to be the same\n", + "wp.copy(omega_1, omega_0)\n", + "\n", + "# solve initial Poisson equation to get psi from initial vorticity field\n", + "solve_poisson(omega_1, psi, k2i)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 3. CUDA graph capture\n", + "\n", + "CUDA graphs let you define a sequence of operations (kernel launches, data movement) and their dependencies once, then replay it repeatedly with minimal overhead. Normally, each kernel launch incurs CPU-side setup costs. For short-running kernels launched many times, this overhead dominates. By capturing the entire workflow in a graph, these setup costs are paid once during graph creation, and subsequent launches are nearly free.\n", + "\n", + "In Warp, you can capture a CUDA graph with `wp.ScopedCapture()`. In the code block below, a single call to `step(...)` is captured and will be replayed for as many times as we want to run the simulation." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# capture first step in a CUDA graph\n", + "with wp.ScopedCapture() as capture:\n", + " step(omega_0, omega_1, psi, rhs, k2i)\n", + "step_graph = capture.graph" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 4. Running the simulation and saving the results\n", + "\n", + "The boilerplate code below runs the simulation for 2000 timesteps and saves the vorticity field at every 10th timestep. Thereafter, the vorticity fields are plotted and saved as a GIF." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import time\n", + "from PIL import Image\n", + "import matplotlib.pyplot as plt\n", + "from matplotlib.colors import Normalize\n", + "import os\n", + "\n", + "NUM_FRAMES = 200\n", + "STEPS_PER_FRAME = 10\n", + "\n", + "# Colormap setup\n", + "cmap = plt.cm.twilight\n", + "norm = Normalize(vmin=-15, vmax=15)\n", + "\n", + "frames = []\n", + "print(f\"Running {NUM_FRAMES} frames ({STEPS_PER_FRAME} steps each)...\")\n", + "\n", + "start_time = time.perf_counter()\n", + "for frame in range(NUM_FRAMES):\n", + " # Advance simulation\n", + " for _ in range(STEPS_PER_FRAME):\n", + " wp.capture_launch(step_graph)\n", + "\n", + " # Capture frame\n", + " vorticity = omega_1.numpy().T # Transpose for correct orientation\n", + " colored = cmap(norm(vorticity))\n", + " rgb = (colored[:, :, :3] * 255).astype(np.uint8)\n", + " frames.append(rgb)\n", + "\n", + " if (frame + 1) % 50 == 0:\n", + " print(f\" Frame {frame + 1}/{NUM_FRAMES}\")\n", + "\n", + "elapsed = time.perf_counter() - start_time\n", + "total_steps = NUM_FRAMES * STEPS_PER_FRAME\n", + "print(f\"Completed {total_steps} steps in {elapsed:.2f}s ({total_steps/elapsed:.0f} steps/s)\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import IPython.display\n", + "\n", + "# Create output directory\n", + "os.makedirs(\"./images/chapter-12.2\", exist_ok=True)\n", + "output_file = f\"./images/chapter-12.2/turbulence_{N_GRID}x{N_GRID}_Re{int(RE)}.gif\"\n", + "\n", + "# Save as GIF\n", + "pil_images = [Image.fromarray(frame) for frame in frames]\n", + "pil_images[0].save(\n", + " output_file,\n", + " save_all=True,\n", + " append_images=pil_images[1:],\n", + " duration=50,\n", + " loop=0,\n", + ")\n", + "\n", + "print(f\"Saved: {output_file}\")\n", + "IPython.display.Image(output_file)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "---\n", + "## Validation\n", + "\n", + "**TODO**" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "---\n", + "## Conclusion\n", + "\n", + "**TODO**" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "---\n", + "## References\n", + "\n", + "**TODO**" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "---\n", + "## Appendix A: Phase symmetry for real-valued physical fields\n", + "\n", + "Following San and Staples 2012, we decompose the phase as $\\zeta(\\mathbf{k}) = \\xi(\\mathbf{k}) + \\eta(\\mathbf{k})$, where $\\xi$ and $\\eta$ are independently chosen random values in $[0, 2\\pi]$ in the first quadrant ($k_x \\geq 0$, $k_y \\geq 0$).\n", + "\n", + "$\\xi(\\mathbf{k})$ and $\\eta(\\mathbf{k})$ must follow these relations in other quadrants\n", + "\n", + "$$\\xi(-k_x, k_y) = -\\xi(k_x, k_y),$$\n", + "$$\\xi(-k_x, -k_y) = -\\xi(k_x, k_y),$$\n", + "$$\\xi(k_x, -k_y) = \\xi(k_x, k_y),$$\n", + "\n", + "$$\\eta(-k_x, k_y) = \\eta(k_x, k_y),$$\n", + "$$\\eta(-k_x, -k_y) = -\\eta(k_x, k_y),$$\n", + "$$\\eta(k_x, -k_y) = -\\eta(k_x, k_y).$$" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "warp-cfd", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.10" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/Accelerated_Python_User_Guide/notebooks/utils.py b/Accelerated_Python_User_Guide/notebooks/utils.py new file mode 100644 index 00000000..fa08403b --- /dev/null +++ b/Accelerated_Python_User_Guide/notebooks/utils.py @@ -0,0 +1,449 @@ +# SPDX-License-Identifier: Apache-2.0 AND CC-BY-NC-4.0 +"""Utility functions for validating FFT kernels in the 2D Navier-Stokes solver.""" + +import numpy as np +import warp as wp +import matplotlib.pyplot as plt + + +def validate_fft_roundtrip( + fft_kernel, + ifft_kernel, + n_grid: int, + tile_m: int, + tile_n: int, + block_dim: int, + figsize=(14, 4), +): + """Validate FFT and IFFT kernels using a sine wave test. + + Creates a sin(x) function on [0, 2\pi], applies FFT to show the amplitude + spectrum (should peak at k=1), then applies IFFT and compares with original. + + Args: + fft_kernel: The fft_tiled Warp kernel. + ifft_kernel: The ifft_tiled Warp kernel. + n_grid: Grid resolution (must match TILE_N used in kernel compilation). + tile_m: TILE_M parameter (typically 1 for row-wise FFT). + tile_n: TILE_N parameter (typically N_GRID). + block_dim: Block dimension for kernel launch. + figsize: Figure size tuple. + + Returns: + fig, axes: Matplotlib figure and axes. + max_error: Maximum absolute error between original and reconstructed. + """ + # Create physical grid on [0, 2\pi) + L = 2.0 * np.pi + x = np.linspace(0, L, n_grid, endpoint=False) + + # Create test signal: sin(x) which has a peak at k=1 + # Since f(x) = sin(x) = (e^{ix} - e^{-ix}) / 2i, FFT should show peaks at k=\pm 1 + signal = np.sin(x) + + # Prepare 2D array (fft_tiled operates on rows of 2D arrays) + # We'll use a single row for this 1D test + signal_2d = signal.reshape(1, n_grid).astype(np.float32) + + # Convert to complex (wp.vec2f format): [real, imag] + # Shape must be (1, n_grid, 2) for Warp to interpret as (1, n_grid) of vec2f + signal_complex = np.zeros((1, n_grid, 2), dtype=np.float32) + signal_complex[:, :, 0] = signal_2d # Real part + signal_complex[:, :, 1] = 0.0 # Imaginary part = 0 + + # Create Warp arrays - pass shape (1, n_grid) and let Warp handle vec2f + input_wp = wp.array(signal_complex, dtype=wp.vec2f, shape=(1, n_grid)) + fft_result = wp.zeros((1, n_grid), dtype=wp.vec2f) + ifft_result = wp.zeros((1, n_grid), dtype=wp.vec2f) + + # Perform FFT + wp.launch_tiled( + fft_kernel, + dim=[1, 1], + inputs=[input_wp], + outputs=[fft_result], + block_dim=block_dim, + ) + + # Perform IFFT + wp.launch_tiled( + ifft_kernel, + dim=[1, 1], + inputs=[fft_result], + outputs=[ifft_result], + block_dim=block_dim, + ) + + # Synchronize and get results + wp.synchronize() + + # Convert back to numpy + fft_np = fft_result.numpy() + ifft_np = ifft_result.numpy() + + # Extract real and imaginary parts from FFT result + fft_real = fft_np[0, :, 0] + fft_imag = fft_np[0, :, 1] + fft_magnitude = np.sqrt(fft_real**2 + fft_imag**2) + + # IFFT result (need to normalize by N) + reconstructed = ifft_np[0, :, 0] / n_grid # Real part, normalized + + # Wavenumbers for FFT (standard ordering: 0, 1, 2, ..., N/2-1, -N/2, ..., -1) + k = np.fft.fftfreq(n_grid, d=L / (2 * np.pi * n_grid)) + + # Compute error + max_error = np.max(np.abs(signal - reconstructed)) + + # Create figure with 3 panels + fig, axes = plt.subplots(1, 3, figsize=figsize) + + # Panel 1: Original signal + axes[0].plot(x, signal, "b-", linewidth=2, label="sin(x)") + axes[0].set_xlabel("x") + axes[0].set_ylabel("f(x)") + axes[0].set_title("Original Signal: f(x) = sin(x)") + axes[0].set_xlim([0, L]) + axes[0].grid(True, alpha=0.3) + axes[0].legend() + + # Panel 2: FFT magnitude spectrum + # Shift for better visualization (put k=0 at center) + k_shifted = np.fft.fftshift(k) + mag_shifted = np.fft.fftshift(fft_magnitude) + + axes[1].stem(k_shifted, mag_shifted, basefmt=" ") + axes[1].set_xlabel("Wavenumber k") + axes[1].set_ylabel(r"$|\hat{f}(k)|$") + axes[1].set_title("FFT Amplitude Spectrum") + axes[1].set_xlim([-10, 10]) # Focus on low wavenumbers + axes[1].grid(True, alpha=0.3) + axes[1].axvline(x=1, color="r", linestyle="--", alpha=0.5, label="k=1") + axes[1].axvline(x=-1, color="r", linestyle="--", alpha=0.5, label="k=-1") + axes[1].legend() + + # Panel 3: Comparison of original vs reconstructed + axes[2].plot(x, signal, "b-", linewidth=2, label="Original") + axes[2].plot(x, reconstructed, "r--", linewidth=2, label="FFT→IFFT") + axes[2].set_xlabel("x") + axes[2].set_ylabel("f(x)") + axes[2].set_title(f"Comparison b/w original and reconstructed signal") + axes[2].set_xlim([0, L]) + axes[2].grid(True, alpha=0.3) + axes[2].legend() + + plt.tight_layout() + + return fig, axes, max_error + + +def validate_transpose( + transpose_kernel, + tile_transpose_dim: int, + n_test: int = 64, + figsize=(10, 4), +): + """Validate the tiled transpose kernel with a visual comparison. + + Creates an upper triangular matrix (1s above diagonal, 0s below), + applies transpose, and shows before/after. + + Args: + transpose_kernel: The tiled_transpose Warp kernel. + tile_transpose_dim: TILE_TRANSPOSE_DIM parameter used in kernel. + n_test: Size of the test matrix (default 64x64). + figsize: Figure size tuple. + + Returns: + fig, axes: Matplotlib figure and axes. + """ + # Create upper triangular matrix: 1s above diagonal, 0s below + pattern = np.triu(np.ones((n_test, n_test), dtype=np.float32)) + + # Convert to complex format (wp.vec2f) - store pattern in real part + input_complex = np.zeros((n_test, n_test, 2), dtype=np.float32) + input_complex[:, :, 0] = pattern + + # Create Warp arrays + input_wp = wp.array(input_complex, dtype=wp.vec2f, shape=(n_test, n_test)) + output_wp = wp.zeros((n_test, n_test), dtype=wp.vec2f) + + # Launch transpose kernel + wp.launch_tiled( + transpose_kernel, + dim=(n_test // tile_transpose_dim, n_test // tile_transpose_dim), + inputs=[input_wp], + outputs=[output_wp], + block_dim=tile_transpose_dim * tile_transpose_dim, + ) + wp.synchronize() + + # Get result + transposed_result = output_wp.numpy()[:, :, 0] + + # Plot + fig, axes = plt.subplots(1, 2, figsize=figsize) + + axes[0].imshow(pattern, cmap="gray_r") + axes[0].set_title("Original") + axes[0].set_xlabel("j") + axes[0].set_ylabel("i") + + axes[1].imshow(transposed_result, cmap="gray_r") + axes[1].set_title("After transpose") + axes[1].set_xlabel("j") + axes[1].set_ylabel("i") + + plt.tight_layout() + return fig, axes + + +def validate_poisson_solver( + solve_poisson_func, + n_grid: int, + kx: int = 1, + ky: int = 1, + figsize=(14, 4), +): + """Validate the Poisson solver using an analytical test case. + + Tests with \omega = (kx^2 + ky^2)sin(kx·x)sin(ky·y), which has the analytical + solution psi = sin(kx·x)sin(ky·y) for the Poisson equation \laplacian \psi = -\omega. + + Args: + solve_poisson_func: The solve_poisson function to validate. + n_grid: Grid resolution. + kx: Wavenumber in x-direction (default 1). + ky: Wavenumber in y-direction (default 1). + figsize: Figure size tuple. + + Returns: + fig, axes: Matplotlib figure and axes. + max_error: Maximum absolute error between computed and analytical solution. + """ + import warp as wp + + # Create 2D grid on [0, 2\pi) \times [0, 2\pi) + L = 2.0 * np.pi + x = np.linspace(0, L, n_grid, endpoint=False) + y = np.linspace(0, L, n_grid, endpoint=False) + X, Y = np.meshgrid(x, y) + + # Analytical solution: \psi = sin(kx·x)sin(ky·y) + psi_analytical = np.sin(kx * X) * np.sin(ky * Y) + + # Corresponding vorticity: \omega = (kx^2 + ky^2)sin(kx·x)sin(ky·y) + # From \laplacian\psi = -\omega, we have \omega = -\laplacian\psi = (kx^2 + ky^2)\psi + k_squared = kx**2 + ky**2 + omega = k_squared * psi_analytical + + # Precompute 1/|k|^2 for spectral Poisson solver + k_freq = np.fft.fftfreq(n_grid, d=1.0 / n_grid) + kx_grid, ky_grid = np.meshgrid(k_freq, k_freq) + k2 = kx_grid**2 + ky_grid**2 + k2i_np = np.zeros_like(k2) + nonzero = k2 != 0 + k2i_np[nonzero] = 1.0 / k2[nonzero] + k2i = wp.array(k2i_np.astype(np.float32), dtype=wp.float32) + + # Create Warp arrays for omega and psi + omega_wp = wp.array(omega.astype(np.float32), dtype=wp.float32) + psi_wp = wp.zeros((n_grid, n_grid), dtype=wp.float32) + + # Solve Poisson equation + solve_poisson_func(omega_wp, psi_wp, k2i) + wp.synchronize() + + # Get computed solution + psi_computed = psi_wp.numpy() + + # Compute error + error = np.abs(psi_computed - psi_analytical) + max_error = np.max(error) + + # Create visualization + fig, axes = plt.subplots(1, 3, figsize=figsize) + + # Common colormap settings + vmin = min(psi_analytical.min(), psi_computed.min()) + vmax = max(psi_analytical.max(), psi_computed.max()) + + # Panel 1: Computed solution + im0 = axes[0].imshow( + psi_computed, + extent=[0, L, 0, L], + origin="lower", + cmap="RdBu_r", + vmin=vmin, + vmax=vmax, + ) + axes[0].set_title(r"Computed $\psi$") + axes[0].set_xlabel("x") + axes[0].set_ylabel("y") + plt.colorbar(im0, ax=axes[0], shrink=0.8) + + # Panel 2: Analytical solution + im1 = axes[1].imshow( + psi_analytical, + extent=[0, L, 0, L], + origin="lower", + cmap="RdBu_r", + vmin=vmin, + vmax=vmax, + ) + axes[1].set_title(r"Analytical $\psi = \sin(x)\sin(y)$") + axes[1].set_xlabel("x") + axes[1].set_ylabel("y") + plt.colorbar(im1, ax=axes[1], shrink=0.8) + + # Panel 3: Error + im2 = axes[2].imshow(error, extent=[0, L, 0, L], origin="lower", cmap="hot") + axes[2].set_title(f"Error") + axes[2].set_xlabel("x") + axes[2].set_ylabel("y") + plt.colorbar(im2, ax=axes[2], shrink=0.8) + + plt.tight_layout() + return fig, axes, max_error + + +def validate_advection_diffusion( + advection_diffusion_kernel, + n_grid: int, + kx: int = 2, + ky: int = 3, + figsize=(10, 4), +): + """Validate finite-difference advection and diffusion using analytical test case. + + Uses test functions: + omega = (kx^2 + ky^2) * sin(kx*x) * sin(ky*y) + psi = sin(kx*x) * sin(ky*y) + + Analytical results: + Diffusion: nabla^2 omega = -(kx^2 + ky^2)^2 * sin(kx*x) * sin(ky*y) + Advection: J(psi, omega) = 0 (since omega = c * psi with constant c) + + Args: + advection_diffusion_kernel: Warp kernel with signature: + (omega, psi, advection_out, diffusion_out, h, n) where all arrays + are wp.array2d(dtype=wp.float32), h is wp.float32, n is wp.int32. + n_grid: Grid resolution. + kx: Wavenumber in x-direction. + ky: Wavenumber in y-direction. + figsize: Figure size for each 1x2 panel. + + Returns: + (fig_diff, axes_diff): Matplotlib figure/axes for diffusion comparison. + (fig_adv, axes_adv): Matplotlib figure/axes for advection comparison. + max_diff_error: Maximum absolute error in diffusion. + max_adv_error: Maximum absolute error in advection. + """ + # Create 2D grid on [0, 2*pi) x [0, 2*pi) + L = 2.0 * np.pi + h = L / n_grid + x = np.linspace(0, L, n_grid, endpoint=False) + y = np.linspace(0, L, n_grid, endpoint=False) + X, Y = np.meshgrid(x, y) + + # Analytical fields + k_squared = kx**2 + ky**2 + psi_analytical = np.sin(kx * X) * np.sin(ky * Y) + omega_analytical = k_squared * psi_analytical + + # Analytical diffusion: nabla^2 omega = -(kx^2 + ky^2) * omega + diffusion_analytical = -k_squared * omega_analytical + + # Analytical advection: J(psi, omega) = 0 (since omega = c * psi) + advection_analytical = np.zeros_like(omega_analytical) + + # Create Warp arrays + omega_wp = wp.array(omega_analytical.astype(np.float32), dtype=wp.float32) + psi_wp = wp.array(psi_analytical.astype(np.float32), dtype=wp.float32) + advection_wp = wp.zeros((n_grid, n_grid), dtype=wp.float32) + diffusion_wp = wp.zeros((n_grid, n_grid), dtype=wp.float32) + + # Launch user-provided kernel + wp.launch( + advection_diffusion_kernel, + dim=(n_grid, n_grid), + inputs=[omega_wp, psi_wp, advection_wp, diffusion_wp, float(h), n_grid], + ) + wp.synchronize() + + # Get numerical results + diffusion_numerical = diffusion_wp.numpy() + advection_numerical = advection_wp.numpy() + + # --- Diffusion comparison plot (1x2 panel) --- + fig_diff, axes_diff = plt.subplots(1, 2, figsize=figsize) + + vmin_d = min(diffusion_analytical.min(), diffusion_numerical.min()) + vmax_d = max(diffusion_analytical.max(), diffusion_numerical.max()) + + im0 = axes_diff[0].imshow( + diffusion_analytical, + extent=[0, L, 0, L], + origin="lower", + cmap="RdBu_r", + vmin=vmin_d, + vmax=vmax_d, + ) + axes_diff[0].set_title(r"Analytical $\nabla^2\omega$") + axes_diff[0].set_xlabel("x") + axes_diff[0].set_ylabel("y") + plt.colorbar(im0, ax=axes_diff[0], shrink=0.8) + + im1 = axes_diff[1].imshow( + diffusion_numerical, + extent=[0, L, 0, L], + origin="lower", + cmap="RdBu_r", + vmin=vmin_d, + vmax=vmax_d, + ) + axes_diff[1].set_title(rf"Numerical Diffusion") + axes_diff[1].set_xlabel("x") + axes_diff[1].set_ylabel("y") + plt.colorbar(im1, ax=axes_diff[1], shrink=0.8) + + fig_diff.suptitle(f"Diffusion Validation (kx={kx}, ky={ky}, N={n_grid})", y=1.02) + fig_diff.tight_layout() + + # --- Advection comparison plot (1x2 panel) --- + fig_adv, axes_adv = plt.subplots(1, 2, figsize=figsize) + + # For advection, analytical is zero, so set symmetric limits around numerical + adv_max = max(np.abs(advection_numerical).max(), 1e-10) + vmin_a, vmax_a = -adv_max, adv_max + + im2 = axes_adv[0].imshow( + advection_analytical, + extent=[0, L, 0, L], + origin="lower", + cmap="RdBu_r", + vmin=vmin_a, + vmax=vmax_a, + ) + axes_adv[0].set_title(r"Analytical Advection (= 0)") + axes_adv[0].set_xlabel("x") + axes_adv[0].set_ylabel("y") + plt.colorbar(im2, ax=axes_adv[0], shrink=0.8) + + im3 = axes_adv[1].imshow( + advection_numerical, + extent=[0, L, 0, L], + origin="lower", + cmap="RdBu_r", + vmin=vmin_a, + vmax=vmax_a, + ) + axes_adv[1].set_title(rf"Numerical Advection") + axes_adv[1].set_xlabel("x") + axes_adv[1].set_ylabel("y") + plt.colorbar(im3, ax=axes_adv[1], shrink=0.8) + + fig_adv.suptitle(f"Advection Validation (kx={kx}, ky={ky}, N={n_grid})", y=1.02) + fig_adv.tight_layout() + + return (fig_diff, axes_diff), (fig_adv, axes_adv)