diff --git a/Accelerated_Python_User_Guide/notebooks/2D_Navier_Stokes_In_Warp.ipynb b/Accelerated_Python_User_Guide/notebooks/2D_Navier_Stokes_In_Warp.ipynb
new file mode 100644
index 00000000..7d57b9ef
--- /dev/null
+++ b/Accelerated_Python_User_Guide/notebooks/2D_Navier_Stokes_In_Warp.ipynb
@@ -0,0 +1,1535 @@
+{
+ "cells": [
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# SPDX-License-Identifier: Apache-2.0 AND CC-BY-NC-4.0\n",
+ "#\n",
+ "# Licensed under the Apache License, Version 2.0 (the \"License\");\n",
+ "# you may not use this file except in compliance with the License.\n",
+ "# You may obtain a copy of the License at\n",
+ "#\n",
+ "# http://www.apache.org/licenses/LICENSE-2.0\n",
+ "#\n",
+ "# Unless required by applicable law or agreed to in writing, software\n",
+ "# distributed under the License is distributed on an \"AS IS\" BASIS,\n",
+ "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n",
+ "# See the License for the specific language governing permissions and\n",
+ "# limitations under the License."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# GPU-Accelerated 2-D Navier Stokes Simulation using NVIDIA Warp\n",
+ "\n",
+ "## Overview\n",
+ "\n",
+ "In this notebook, we will build a fluid solver entirely in Python using the Warp framework. Specifically, we will simulate 2-D turbulent flow in a box.\n",
+ "\n",
+ "Through this implementation, you will learn:\n",
+ "\n",
+ "* Writing SIMT (single instruction, multiple threads) kernels for finite difference operators and flow initialization\n",
+ "* Warp's tile-based programming primitives for efficient matrix operations, e.g., FFT, inverse FFT, and matrix transpose\n",
+ "* CUDA graph capture for improving performance of timestepping codes\n",
+ "\n",
+ "By the end of this lab, we will have built a GPU-accelerated 2-D Navier-Stokes solver in a periodic box, while gaining experience in applying Warp's features to leverage a GPU for a computational physics problem. "
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "---\n",
+ "## Setup\n",
+ "\n",
+ "Before we begin implementing the 2-D N-S solver, let's ensure we have all the necessary packages installed."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# install Warp\n",
+ "%pip install warp-lang \n",
+ "\n",
+ "# install visualization package\n",
+ "%pip install matplotlib Pillow"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Now let us import the necessary libraries and initialize Warp to check if GPU support is available:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import numpy as np\n",
+ "import warp as wp\n",
+ "\n",
+ "if wp.get_cuda_device_count() > 0:\n",
+ " print(\"GPU detected successfully\")\n",
+ "else:\n",
+ " print(\"No GPU detected!\")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "---\n",
+ "## Introduction\n",
+ "\n",
+ "This seemingly simple 2-D example combines multiple Warp features that can be also leveraged to build industrial-grade solvers:\n",
+ " - Accelerated Warp kernels for finite difference operators\n",
+ " - Tile-based matrix operations for a more performant code\n",
+ " - CUDA graph capture to speed-up multiple calls to the same kernel\n",
+ "\n",
+ "Before diving into the code, let's understand a bit of physics behind the underlying equations. "
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "---\n",
+ "## Background: Equations, boundary conditions, and initial conditions\n",
+ "\n",
+ "### 1. Vorticity-stream function formulation\n",
+ "\n",
+ "For a 2-D incompressible flow, we define:\n",
+ "- **Vorticity**: $\\omega = \\partial v/\\partial x - \\partial u/\\partial y$\n",
+ "- **Stream function**: $\\psi$ such that $u = \\partial\\psi/\\partial y$, $v = -\\partial\\psi/\\partial x$\n",
+ "\n",
+ "In the vorticity and stream function equation above, $u$ and $v$ are the two components of the velocity vector $\\mathbf{u} = [u, v]$ in 2-D. This formulation automatically satisfies continuity and eliminates pressure from the Navier-Stokes equations.\n",
+ "\n",
+ "### 2. Governing equation\n",
+ "\n",
+ "In 2-D, the non-dimensional governing equations for incompressible flows can be written using the **vorticity-stream function transport equation** as\n",
+ "\n",
+ "$$\\underbrace{\\frac{\\partial \\omega}{\\partial t}}_{\\text{unsteadiness}} + \\underbrace{\\frac{\\partial \\psi}{\\partial y}\\frac{\\partial \\omega}{\\partial x} - \\frac{\\partial \\psi}{\\partial x}\\frac{\\partial \\omega}{\\partial y}}_{\\text{advection}} = \\underbrace{\\frac{1}{Re}\\left(\\frac{\\partial^2 \\omega}{\\partial x^2} + \\frac{\\partial^2 \\omega}{\\partial y^2}\\right)}_{\\text{diffusion}} \\tag{1},$$\n",
+ "\n",
+ "along with the relationship between stream function and vorticity, given by a Poisson equation. \n",
+ "\n",
+ "Starting from the definition of vorticity (in 2-D)\n",
+ " \n",
+ "$$\\omega = \\frac{\\partial v}{\\partial x} - \\frac{\\partial u}{\\partial y}.$$\n",
+ " \n",
+ "Substituting the streamfunction relations $u = \\frac{\\partial \\psi}{\\partial y}$ and $v = -\\frac{\\partial \\psi}{\\partial x}$ gives\n",
+ " \n",
+ "$$\\omega = \\frac{\\partial}{\\partial x}\\left(-\\frac{\\partial \\psi}{\\partial x}\\right) - \\frac{\\partial}{\\partial y}\\left(\\frac{\\partial \\psi}{\\partial y}\\right),$$\n",
+ " \n",
+ "$$\\omega = -\\frac{\\partial^2 \\psi}{\\partial x^2} - \\frac{\\partial^2 \\psi}{\\partial y^2}.$$\n",
+ " \n",
+ "Rearranging gives the Poisson equation\n",
+ " \n",
+ "$$\\frac{\\partial^2 \\psi}{\\partial x^2} + \\frac{\\partial^2 \\psi}{\\partial y^2} = -\\omega \\tag{2}.$$\n",
+ "\n",
+ "When writing the solver, equations (1) and (2) will be solved at each timestep to evolve the flow. \n",
+ "\n",
+ "**NOTE: These equations might look intimidating, but we will break down the different parts in the subsequent cells to make them more tractable. So, please bear with us for a bit!**\n",
+ "\n",
+ "### 3. Spectral Poisson solver\n",
+ "\n",
+ "The Poisson equation in (2) can be discretized and solved using iterative methods like Jacobi and Gauss-Seidel methods. However, since we have a periodic domain, we can represent both $\\omega$ and $\\psi$ using Fourier series with sine and cosine basis functions as\n",
+ " \n",
+ "$$\\omega(x,y) = \\sum_{m,n} \\hat{\\omega}_{m,n} e^{i(k_x x + k_y y)},$$\n",
+ " \n",
+ "$$\\psi(x,y) = \\sum_{m,n} \\hat{\\psi}_{m,n} e^{i(k_x x + k_y y)},$$\n",
+ " \n",
+ "where $\\hat{\\omega}_{m,n}$ and $\\hat{\\psi}_{m,n}$ are what we call the Fourier coefficients, and $k_x = \\frac{2\\pi m}{L_x}$, $k_y = \\frac{2\\pi n}{L_y}$ are the wavenumbers. Now, if we substitute these Fourier representations back into equation (2), we can compute the derivatives as\n",
+ "\n",
+ "$$\\frac{\\partial^2 \\psi}{\\partial x^2} = \\sum_{m,n} -k_x^2 \\hat{\\psi}_{m,n} e^{i(k_x x + k_y y)},$$\n",
+ " \n",
+ "$$\\frac{\\partial^2 \\psi}{\\partial y^2} = \\sum_{m,n} -k_y^2 \\hat{\\psi}_{m,n} e^{i(k_x x + k_y y)}.$$\n",
+ " \n",
+ "Substituting these into the Poisson equation, i.e., equation (2) above will result in\n",
+ " \n",
+ "$$\\sum_{m,n} -(k_x^2 + k_y^2) \\hat{\\psi}_{m,n} e^{i(k_x x + k_y y)} = -\\sum_{m,n} \\hat{\\omega}_{m,n} e^{i(k_x x + k_y y)}.$$\n",
+ " \n",
+ "Since this must hold for each pair of $[k_x, k_y]$ independently, **the Poisson equation reduces to a simple algebraic equation in Fourier space** \n",
+ "\n",
+ "$$\\hat{\\psi}_{m,n} = \\frac{\\hat{\\omega}_{m,n}}{(k_x^2 + k_y^2)}. \\tag{3}$$\n",
+ "\n",
+ "\n",
+ "Note that **Fourier space** refers to the spectral domain where we work directly with the Fourier coefficients $\\hat{\\omega}_{m,n}$ and $\\hat{\\psi}_{m,n}$ along with their corresponding wavenumbers $k_x$ and $k_y$. After solving in Fourier space, we recover the physical fields $\\omega$ and $\\psi$ using the inverse Fourier transform. The wavenumbers are defined as $k_x = \\frac{2\\pi m}{L_x}, \\quad k_y = \\frac{2\\pi n}{L_y}$, where $m$ and $n$ are the Fourier mode indices, each ranging from $[-N/2, N/2 -1]$ on a square grid (with $N$ being the total number of grid points along any given direction of the square grid). \n",
+ "\n",
+ "\n",
+ "### 4. Time integration \n",
+ "Runge-Kutta methods advance the solution by taking multiple intermediate \"substeps\" within each timestep, achieving better accuracy than a single forward step. In this example, we use a **strong-stability preserving (SSP) third-order** **explicit** Runge-Kutta method.\n",
+ "\n",
+ "- **Third-order**: The numerical error scales as $\\mathcal{O}(\\Delta t)^3$. So, halving the timestep reduces error by $\\sim 8\\times$.\n",
+ "\n",
+ "- **SSP**: Schemes designed specifically to prevent spurious oscillations for hyperbolic PDEs. Interested readers can read more [here](https://api.drum.lib.umd.edu/server/api/core/bitstreams/faa96f40-0fbb-4af0-84a3-5e3b80c908df/content).\n",
+ "\n",
+ "- **Explicit**: Each new timestep is computed directly from the previous one (easier to program, as we will see!).\n",
+ "\n",
+ "SSP RK3 scheme takes the following form below, where $\\omega^{(n)}$ and $\\omega^{(n+1)}$ are the vorticity field at the beginning and the end of a timestep. $\\omega^{(1)}$ and $\\omega^{(2)}$ are the intermediate vorticity fields.\n",
+ "\n",
+ "$$\\omega^{(1)} = \\omega^{(n)} + \\Delta t \\mathcal{L}(\\omega^{(n)}),$$\n",
+ " \n",
+ "$$\\omega^{(2)} = \\frac{3}{4}\\omega^{(n)} + \\frac{1}{4}\\omega^{(1)} + \\frac{1}{4}\\Delta t \\mathcal{L}(\\omega^{(1)}),$$ \n",
+ "\n",
+ "$$\\omega^{(n+1)} = \\frac{1}{3}\\omega^{(n)} + \\frac{2}{3}\\omega^{(2)} + \\frac{2}{3}\\Delta t \\mathcal{L}(\\omega^{(2)}). \\tag{4}$$\n",
+ "\n",
+ "Here $\\mathcal{L}(\\omega)$ encapsulates the finite-difference forms of both advection and diffusion terms in equation (1). \n",
+ "\n",
+ "\n",
+ "### 5. Initial condition: 2-D decaying turbulence\n",
+ "\n",
+ "In this problem, we solve the governing equations on a square domain of size $2\\pi \\times 2\\pi$. Recall that the wavenumbers are defined as $k_x = 2 \\pi m / L_x$ and $k_y = 2 \\pi n / L_y$. By choosing $L_x = L_y = 2\\pi$, the wavenumbers $k_x$ and $k_y$ become integers (specifically, $k_x = m$ and $k_y = n$), which simplifies some of the FFT-based operations. This choice of domain size is a common convention in spectral solver literature for this reason.\n",
+ "\n",
+ "The initial condition is generated using the energy spectrum from San & Staples CNF (2012)\n",
+ "\n",
+ "$$E(k) = \\frac{a_s}{2 k_p}\\left(\\frac{k}{k_p}\\right)^{2s+1} \\exp\\left[-\\left(s + \\frac{1}{2}\\right)\\left(\\frac{k}{k_p}\\right)^2\\right] \\tag{5},$$\n",
+ "\n",
+ "where\n",
+ "- $k = |\\mathbf{k}| = \\sqrt{k_x^2 + k_y^2}$,\n",
+ "- $k_p = 12 (wavenumber at which $E(k)$ peaks), \n",
+ "- $s = 3$ (shape parameter of the spectrum),\n",
+ "- $a_s = (2s+1)^{s+1}/(2^s s!)$.\n",
+ "\n",
+ "The magnitude of Fourier coefficients related to the initial energy spectrum is given by \n",
+ "\n",
+ "$$|\\hat{\\omega}(\\mathbf{k})| = \\sqrt{\\frac{k}{\\pi}E(k)},$$\n",
+ "\n",
+ "which is then randomized with a phase function $\\zeta(\\mathbf{k})$ so that $\\hat{\\omega}(k_x, k_y) = \\hat{\\omega}(\\mathbf{k})$ becomes \n",
+ "\n",
+ "$$\\hat{\\omega}(\\mathbf{k}) = \\sqrt{\\frac{k}{\\pi}E(k)} \\, e^{i\\zeta(\\mathbf{k})}. \\tag{6}$$\n",
+ " \n",
+ "The phase function $\\zeta(\\mathbf{k})$ is constructed with specific symmetry constraints to ensure that the vorticity field $\\omega(x,y)$ is real-valued in the physical space. The details of how $\\zeta(\\mathbf{k})$ is formulated is provided in **Appendix A**.\n",
+ "\n",
+ "**NOTE: Although the initialization equations may appear complex, this notebook focuses primarily on the Warp implementation. Much of the boilerplate code will be provided for you.**"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "---\n",
+ "## Building blocks of the solver\n",
+ "\n",
+ "$$\n",
+ "\\boxed{\\;\\;\n",
+ "\\begin{array}{c}\n",
+ "\\large\\textbf{Solver Pipeline} \\\\[14pt]\n",
+ "\\hat{\\omega} \\;\\xrightarrow{\\;\\text{IFFT}\\;}\\; \\omega(t) \\;\\xrightarrow{\\quad}\\;\n",
+ "\\underbrace{\\text{Discretize} \\;\\to\\; \\text{1 RK substep} \\;\\to\\; \\text{Poisson solve}}_{\\times\\, 3 \\text{\\ times for a RK3 scheme}} \n",
+ "\\;\\xrightarrow{\\quad}\\; \\omega(t+\\Delta t) \\\\[6pt]\n",
+ "\\uparrow \\hspace{20em} \\downarrow \\\\[-2pt]\n",
+ "\\xleftarrow{\\hspace{5em} \\text{For next timestep/RK step} \\hspace{2em}}\n",
+ "\\end{array}\n",
+ "\\;\\;}\n",
+ "$$\n",
+ "\n",
+ "There are four main building blocks of the solver that we would tackle in a sequential manner.\n",
+ "\n",
+ "- Initialize $\\omega$, i.e., generate $\\hat{\\omega}$ in the Fourier space using the energy spectrum specified above and inverse FFT it to the physical space.\n",
+ "- Discretize the advection and diffusion terms to obtain the RHS $\\mathcal{L}(\\omega)$ of $\\frac{\\partial \\omega}{\\partial t} = \\mathcal{L}(\\omega)$ (see equation 1).\n",
+ "- Advancing $\\omega(t)$ to $\\omega(t+\\Delta t)$ using SSP RK3 scheme (equation 4).\n",
+ "- Solve Poisson equation (equation 2) in the Fourier space $\\hat{\\psi}_{m,n} = \\hat{\\omega}_{m,n}/(k_x^2 + k_y^2)$ (equation 3), and inverse FFT $\\hat{psi}$ to $\\psi$ after each RK substep. "
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "---\n",
+ "## Simulation parameters"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# grid resolution\n",
+ "N_GRID = 512\n",
+ "# box size\n",
+ "LEN = 2 * np.pi\n",
+ "# delta t for timestepping\n",
+ "DT = 0.001\n",
+ "# Reynolds number\n",
+ "RE = 1000.0\n",
+ "# define SSP-RK3 coefficients used for timestepping\n",
+ "RK3_COEFFS = [\n",
+ " [1.0, 0.0, 1.0],\n",
+ " [3.0 / 4.0, 1.0 / 4.0, 1.0 / 4.0],\n",
+ " [1.0 / 3.0, 2.0 / 3.0, 2.0 / 3.0],\n",
+ "]\n",
+ "# h = l/N (size of a computational cell)\n",
+ "H = LEN / N_GRID\n",
+ "# parameters for Warp's tiled-FFT functionality\n",
+ "TILE_M = 1\n",
+ "TILE_N = N_GRID\n",
+ "TILE_TRANSPOSE_DIM = 16\n",
+ "BLOCK_DIM = TILE_N // 2"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "---\n",
+ "## Building block 1: Flow initialization in Warp\n",
+ "\n",
+ "Recall from equation (6) in the Background section that the initial vorticity field $\\hat{\\omega}(\\mathbf{k})$ in Fourier space is given by\n",
+ "\n",
+ "$$\\hat{\\omega}(\\mathbf{k}) = \\underbrace{\\sqrt{\\frac{k}{\\pi}E(k)}}_{\\text{amplitude}} \\cdot \\underbrace{e^{i\\zeta(\\mathbf{k})}}_{\\text{phase}} = \\underbrace{\\sqrt{\\frac{k}{\\pi}E(k)}}_{\\text{amplitude}} \\cdot \\left(\\cos\\zeta(\\mathbf{k}) + i\\sin\\zeta(\\mathbf{k})\\right),$$\n",
+ "\n",
+ "which gives us the real and imaginary parts\n",
+ " \n",
+ "$$\\text{Re}(\\hat{\\omega}) = \\sqrt{\\frac{k}{\\pi}E(k)} \\cos\\zeta(\\mathbf{k}), \\quad \\text{Im}(\\hat{\\omega}) = \\sqrt{\\frac{k}{\\pi}E(k)} \\sin\\zeta(\\mathbf{k}).$$\n",
+ "\n",
+ "$E(k)$ is the energy spectrum (equation 5) and $\\zeta(\\mathbf{k})$ is the phase function with symmetry constraints ensuring a real-valued physical field.\n",
+ "\n",
+ "**What do we need to implement?**\n",
+ "\n",
+ "To initialize the flow, we need:\n",
+ "\n",
+ "1. A kernel to populate $\\hat{\\omega}(\\mathbf{k})$ on the entire 2-D grid using equation (6).\n",
+ "\n",
+ "2. Two helper functions for that kernel (and equation 6):\n",
+ "\n",
+ " a. A function to compute the **energy spectrum** $E(k)$ at each wavenumber magnitude.\n",
+ "\n",
+ " b. A function to compute the **phase** $\\zeta(\\mathbf{k})$.\n",
+ " \n",
+ "The helper functions will be `@wp.func` device functions (callable from kernels), and the initialization kernel will be a `@wp.kernel`."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "**Note on Warp math utilities:** Warp provides built-in functions for basic operations like `wp.cos()`, `wp.sqrt()`, `wp.sin()`, etc. for use in kernels. For complex number arithmetic, Warp uses `wp.vec2f` and `wp.vec2d` as data types for 32-bit and 64-bit precision, respectively. For example, complex numbers can represented as `wp.vec2f(real_part, imaginary_part)`, where the first component is the real part of type `wp.float32` and the second component is the imaginary part of type `wp.float32`.\n",
+ "\n",
+ "**EXERCISE: Assuming the helper functions are correct, complete the kernel `decaying_turbulence_initializer(...)` by computing the amplitude using the Warp function `def energy_spectrum(...)`, the phase function using `def phase_randomizer(...)`, and combining them to form the complex valued `omega_hat_init[i, j]`.**"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "@wp.func\n",
+ "def factorial(n: wp.int32) -> wp.int32:\n",
+ " \"\"\"Compute factorial.\n",
+ "\n",
+ " Args:\n",
+ " n: Input integer for which we want factorial.\n",
+ "\n",
+ " Returns:\n",
+ " Factorial of input n.\n",
+ " \"\"\"\n",
+ " result = wp.int32(1)\n",
+ " for i in range(2, n + 1):\n",
+ " result *= i\n",
+ " return result\n",
+ "\n",
+ "\n",
+ "@wp.func\n",
+ "def energy_spectrum(k: wp.float32, s: wp.int32, kp: wp.float32) -> wp.float32:\n",
+ " \"\"\"Compute energy at wavenumber magnitude k.\n",
+ "\n",
+ " Follows San and Staples 2012 Computers and Fluids (page 49).\n",
+ " https://www.sciencedirect.com/science/article/abs/pii/S0045793012001363.\n",
+ "\n",
+ " Args:\n",
+ " k: Input wavenumber magnitude.\n",
+ " s: Shape parameter of spectrum.\n",
+ " kp: Wavenumber magnitude at which maximum of energy spectrum lies.\n",
+ "\n",
+ " Returns:\n",
+ " Energy contained at wavenumber magnitude k.\n",
+ " \"\"\"\n",
+ " s_factorial = wp.float32(factorial(s))\n",
+ " s_float32 = wp.float32(s)\n",
+ " a_s = (2.0 * s_float32 + 1.0) ** (s_float32 + 1.0) / (2.0**s_float32 * s_factorial)\n",
+ " energy_k = (\n",
+ " a_s\n",
+ " / (2.0 * kp)\n",
+ " * (k / kp) ** (2.0 * s_float32 + 1.0)\n",
+ " * wp.exp(-(s_float32 + 0.5) * (k / kp) ** 2.0)\n",
+ " )\n",
+ " return energy_k\n",
+ "\n",
+ "\n",
+ "@wp.func\n",
+ "def phase_randomizer(\n",
+ " n: int,\n",
+ " zeta: wp.array2d(dtype=wp.float32),\n",
+ " eta: wp.array2d(dtype=wp.float32),\n",
+ " i: int,\n",
+ " j: int,\n",
+ ") -> wp.float32:\n",
+ " \"\"\"Calculate value of the random phase at index (i, j).\n",
+ "\n",
+ " Follows San and Staples 2012 to return phase value in any quadrant based on\n",
+ " the values of eta and zeta in the first quadrant.\n",
+ "\n",
+ " Args:\n",
+ " n: Size of the simulation domain.\n",
+ " zeta: First phase function.\n",
+ " eta: Second phase function\n",
+ " i: rowwise index on the 2-D simulation domain.\n",
+ " j: columnwise index on the 2-D simulation domain\n",
+ "\n",
+ " Returns:\n",
+ " Value of the random phase in any quadrant.\n",
+ " \"\"\"\n",
+ " n_half = n // 2\n",
+ "\n",
+ " # first quadrant\n",
+ " if i < n_half and j < n_half:\n",
+ " return zeta[i, j] + eta[i, j]\n",
+ " # second quadrant\n",
+ " if i >= n_half and j < n_half:\n",
+ " return -zeta[n - i, j] + eta[n - i, j]\n",
+ " # third quadrant\n",
+ " if i >= n_half and j >= n_half:\n",
+ " return -zeta[n - i, n - j] - eta[n - i, n - j]\n",
+ " # fourth quadrant\n",
+ " return zeta[i, n - j] - eta[i, n - j]\n",
+ "\n",
+ "@wp.kernel\n",
+ "def decaying_turbulence_initializer(\n",
+ " n: int,\n",
+ " kp: wp.float32,\n",
+ " s: wp.int32,\n",
+ " k_mag: wp.array2d(dtype=wp.float32),\n",
+ " zeta: wp.array2d(dtype=wp.float32),\n",
+ " eta: wp.array2d(dtype=wp.float32),\n",
+ " omega_hat_init: wp.array2d(dtype=wp.vec2f), # In Warp, wp.vec2f/wp.vec2d are used to handle complex numbers\n",
+ "):\n",
+ " \"\"\"Initialize the vorticity field in Fourier space for decaying turbulence.\n",
+ "\n",
+ " Args:\n",
+ " n: Size of the simulation domain.\n",
+ " kp: Wavenumber magnitude at which maximum of energy spectrum lies.\n",
+ " s: Shape parameter of the energy spectrum.\n",
+ " k_mag: Wavenumber magnitude on the 2-D grid.\n",
+ " zeta: First phase function for phase randomization.\n",
+ " eta: Second phase function for phase randomization.\n",
+ " omega_hat_init: Output vorticity field in Fourier space.\n",
+ " \"\"\"\n",
+ " i, j = wp.tid()\n",
+ "\n",
+ " amplitude = wp.sqrt((k_mag[i, j] / wp.pi) * energy_spectrum(k_mag[i, j], s, kp)) # MISSING\n",
+ " phase = phase_randomizer(n, zeta, eta, i, j) # MISSING\n",
+ " omega_hat_init[i, j] = wp.vec2f(amplitude*wp.cos(phase), amplitude*wp.sin(phase)) # MISSING"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "---\n",
+ "## Building blocks 2 and 3: Discretization of advection and viscous diffusion terms for RHS and Runge-Kutta timestepping\n",
+ "\n",
+ "Recall the vorticity transport equation (1) from the Background section given as\n",
+ "\n",
+ "$$\\frac{\\partial \\omega}{\\partial t} + \\underbrace{\\frac{\\partial \\psi}{\\partial y}\\frac{\\partial \\omega}{\\partial x} - \\frac{\\partial \\psi}{\\partial x}\\frac{\\partial \\omega}{\\partial y}}_{\\text{advection}} = \\underbrace{\\frac{1}{Re}\\left(\\frac{\\partial^2 \\omega}{\\partial x^2} + \\frac{\\partial^2 \\omega}{\\partial y^2}\\right)}_{\\text{diffusion}}.$$\n",
+ "\n",
+ "We discretize the **advection** and **diffusion** terms on a **uniform grid** using **central finite difference schemes**. Let us see what does that mean. It's not as scary as it sounds!\n",
+ "### Advection term\n",
+ "\n",
+ "Using central differences for first derivatives with grid spacing $h$,\n",
+ "\n",
+ "$$\\frac{\\partial \\omega}{\\partial x}\\bigg|_{i,j} \\approx \\frac{\\omega_{i+1,j} - \\omega_{i-1,j}}{2h}, \\quad \\frac{\\partial \\omega}{\\partial y}\\bigg|_{i,j} \\approx \\frac{\\omega_{i,j+1} - \\omega_{i,j-1}}{2h}, \\quad \\frac{\\partial \\psi}{\\partial x}\\bigg|_{i,j} \\approx \\frac{\\psi_{i+1,j} - \\psi_{i-1,j}}{2h}, \\quad \\frac{\\partial \\psi}{\\partial y}\\bigg|_{i,j} \\approx \\frac{\\psi_{i,j+1} - \\psi_{i,j-1}}{2h}.$$\n",
+ "\n",
+ "The advection term becomes\n",
+ "\n",
+ "$$\\text{Advection}_{i,j} = \\underbrace{\\frac{\\omega_{i+1,j} - \\omega_{i-1,j}}{2h} \\cdot \\frac{\\psi_{i,j+1} - \\psi_{i,j-1}}{2h}}_{\\texttt{term\\_1}} - \\underbrace{\\frac{\\omega_{i,j+1} - \\omega_{i,j-1}}{2h} \\cdot \\frac{\\psi_{i+1,j} - \\psi_{i-1,j}}{2h}}_{\\texttt{term\\_2}}.$$\n",
+ "\n",
+ "### Diffusion (or Laplacian) term\n",
+ "\n",
+ "Using the standard 5-point stencil for the Laplacian, the diffusion term becomes\n",
+ "\n",
+ "$$\\text{Diffusion}_{i,j} = \\nabla^2 \\omega\\bigg|_{i,j} = \\frac{\\partial^2 \\omega}{\\partial x^2}\\bigg|_{i,j} + \\frac{\\partial^2 \\omega}{\\partial y^2}\\bigg|_{i,j} \\approx \\frac{\\omega_{i+1,j} - 2\\omega_{i,j} + \\omega_{i-1,j}}{h^2} + \\frac{\\omega_{i,j+1} - 2\\omega_{i,j} + \\omega_{i,j-1}}{h^2} = \\frac{\\omega_{i+1,j} + \\omega_{i-1,j} + \\omega_{i,j+1} + \\omega_{i,j-1} - 4\\omega_{i,j}}{h^2}.$$\n",
+ "\n",
+ "### Runge-Kutta timestepping\n",
+ "\n",
+ "Recall the SSP-RK3 scheme from equation (4). Each sub-step has the general form\n",
+ "\n",
+ "$$\\omega^{(\\text{new})} = c_0 \\cdot \\omega^{(n)} + c_1 \\cdot \\omega^{(\\text{old})} + c_2 \\cdot \\Delta t \\cdot \\mathcal{L}(\\omega^{(\\text{old})}),$$\n",
+ "\n",
+ "where \n",
+ "\n",
+ "- $\\mathcal{L}(\\omega) = \\frac{1}{Re}\\text{Diffusion} - \\text{Advection}$ is the RHS term we just discretized above. \n",
+ "- $\\omega^{(n)}$ is the vorticity field on at the beginning of the timestep $t$. \n",
+ "- $\\omega^{\\text{old}}$ and $\\omega^{\\text{new}}$ are the vorticity field at the beginning and the end **of a RK substep**, respectively. \n",
+ "\n",
+ "**Note: At the first and last RK substep specifically, $\\omega^{\\text{(new)}}$ and $\\omega^{\\text{(old)}}$ are equal to $\\omega(t+1)$ and $\\omega(t)$, respectively. For intermediate RK substeps, this does not necessarily hold true.**\n",
+ "\n",
+ "In a Warp kernel, all of these operations would translate to:\n",
+ "1. Compute `rhs = (1/Re) * diffusion(...) - advection(...)`.\n",
+ "2. Update `omega_old = coeff0 * omega_n + coeff1 * omega_old + coeff2 * dt * rhs`, where `omega_n` is the vorticity field at the very beginning of the RK timestepping. By passing different coefficients, each time the kernel is called, the same kernel can handle all three RK sub-steps. Here, note that the same variable `omega_old` is overwritten to store the new values after a RK substep to store $\\omega^{\\text{(new)}}$. \n",
+ "\n",
+ "**What do we need to implement?**\n",
+ "\n",
+ "Two `@wp.func` helper functions:\n",
+ "1. `viscous_advection_rk3_kernel(...)` - Warp kernel that calls two Warp functions `advection(...)` and `diffusion(...)` to calculate the advection and diffusion terms, and then performs one RK update on the entire 2-D grid.\n",
+ "2. `advection(...)` - Warp function that computes the advection term at a grid point given the flow information on the neighboring left, right, top, and bottom neighbors.\n",
+ "3. `diffusion(...)` - Warp function computes the Laplacian at a grid point given the flow information on the left, right, top and bottom neighbors."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "**EXERCISE: Fill in the MISSING parts in `advection(...)`, `diffusion(...)`, using the finite difference formulas described above and the RK update equation.**\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "@wp.func\n",
+ "def cyclic_index(idx: wp.int32, n: wp.int32) -> wp.int32:\n",
+ " \"\"\"Map any index to [0, n-1] for periodic boundary conditions.\n",
+ "\n",
+ " Args:\n",
+ " idx: Input index that may be outside the valid range.\n",
+ " n: Grid size defining the periodic domain.\n",
+ "\n",
+ " Returns:\n",
+ " Index wrapped to the range [0, n-1].\n",
+ " \"\"\"\n",
+ " ret_idx = idx % n\n",
+ " if ret_idx < 0:\n",
+ " ret_idx += n\n",
+ " return ret_idx\n",
+ " \n",
+ "@wp.func\n",
+ "def advection(\n",
+ " omega_left: wp.float32,\n",
+ " omega_right: wp.float32,\n",
+ " omega_top: wp.float32,\n",
+ " omega_down: wp.float32,\n",
+ " psi_left: wp.float32,\n",
+ " psi_right: wp.float32,\n",
+ " psi_top: wp.float32,\n",
+ " psi_down: wp.float32,\n",
+ " h: wp.float32,\n",
+ ") -> wp.float32:\n",
+ " \"\"\"Calculate the advection term using central finite difference.\n",
+ "\n",
+ " Args:\n",
+ " omega_left: Vorticity at (i-1, j).\n",
+ " omega_right: Vorticity at (i+1, j).\n",
+ " omega_top: Vorticity at (i, j+1).\n",
+ " omega_down: Vorticity at (i, j-1).\n",
+ " psi_left: Stream function at (i-1, j).\n",
+ " psi_right: Stream function at (i+1, j).\n",
+ " psi_top: Stream function at (i, j+1).\n",
+ " psi_down: Stream function at (i, j-1).\n",
+ " h: Grid spacing.\n",
+ "\n",
+ " Returns:\n",
+ " Advection term value at grid point (i, j).\n",
+ " \"\"\"\n",
+ " inv_2h = 1.0 / (2.0 * h)\n",
+ " term_1 = ((omega_right - omega_left) * inv_2h) * ((psi_top - psi_down) * inv_2h) # MISSING\n",
+ " term_2 = ((omega_top - omega_down) * inv_2h) * ((psi_right - psi_left) * inv_2h) # MISSING\n",
+ " return term_1 - term_2\n",
+ "\n",
+ "\n",
+ "@wp.func\n",
+ "def diffusion(\n",
+ " omega_left: wp.float32,\n",
+ " omega_right: wp.float32,\n",
+ " omega_center: wp.float32,\n",
+ " omega_down: wp.float32,\n",
+ " omega_top: wp.float32,\n",
+ " h: wp.float32,\n",
+ ") -> wp.float32:\n",
+ " \"\"\"Calculate the Laplacian for viscous diffusion using central difference.\n",
+ "\n",
+ " Args:\n",
+ " omega_left: Vorticity at (i-1, j).\n",
+ " omega_right: Vorticity at (i+1, j).\n",
+ " omega_center: Vorticity at (i, j).\n",
+ " omega_down: Vorticity at (i, j-1).\n",
+ " omega_top: Vorticity at (i, j+1).\n",
+ " h: Grid spacing.\n",
+ "\n",
+ " Returns:\n",
+ " Laplacian of vorticity at grid point (i, j).\n",
+ " \"\"\"\n",
+ " inv_h2 = 1.0 / (h * h)\n",
+ " # combine both the diffusion terms in the x and y direction together\n",
+ " laplacian = (omega_left + omega_right + omega_top + omega_down - 4.0*omega_center) * inv_h2 # MISSING\n",
+ " return laplacian"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Validating advection and diffusion kernels\n",
+ "\n",
+ "Before moving to `def viscous_advection_rk3_kernel(...)`, we validate the finite-difference implementation using a **manufactured solution** with known analytical derivatives.\n",
+ "\n",
+ "$$\\psi(x,y) = \\sin(k_x x)\\sin(k_y y), \\quad \\omega(x,y) = (k_x^2 + k_y^2)\\sin(k_x x)\\sin(k_y y)$$\n",
+ "\n",
+ "Note that $\\nabla^{2} \\psi = -\\omega$. \n",
+ "\n",
+ "Applying the Laplacian to $\\omega$\n",
+ "\n",
+ "$$\\nabla^2\\omega = \\frac{\\partial^2 \\omega}{\\partial x^2} + \\frac{\\partial^2 \\omega}{\\partial y^2} = -(k_x^2 + k_y^2)^2 \\sin(k_x x)\\sin(k_y y) = -(k_x^2 + k_y^2)\\omega.$$\n",
+ "\n",
+ "Evaluating $\\frac{\\partial \\psi}{\\partial x}\\frac{\\partial \\omega}{\\partial y} - \\frac{\\partial \\psi}{\\partial y}\\frac{\\partial \\omega}{\\partial x}$ gives\n",
+ "\n",
+ "$$\\frac{\\partial \\psi}{\\partial x} = k_x\\cos(k_x x)\\sin(k_y y), \\quad \\frac{\\partial \\psi}{\\partial y} = k_y\\sin(k_x x)\\cos(k_y y)$$\n",
+ "\n",
+ "$$\\frac{\\partial \\omega}{\\partial x} = (k_x^2+k_y^2)k_x\\cos(k_x x)\\sin(k_y y), \\quad \\frac{\\partial \\omega}{\\partial y} = (k_x^2+k_y^2)k_y\\sin(k_x x)\\cos(k_y y)$$\n",
+ "\n",
+ "Substituting into the Jacobian:\n",
+ "\n",
+ "$$\\frac{\\partial \\psi}{\\partial x}\\frac{\\partial \\omega}{\\partial y} = (k_x^2+k_y^2)k_x k_y \\cos(k_x x)\\sin(k_x x)\\sin(k_y y)\\cos(k_y y)$$\n",
+ "\n",
+ "$$\\frac{\\partial \\psi}{\\partial y}\\frac{\\partial \\omega}{\\partial x} = (k_x^2+k_y^2)k_x k_y \\sin(k_x x)\\cos(k_x x)\\cos(k_y y)\\sin(k_y y)$$\n",
+ "\n",
+ "These two terms are identical, so advection terms evaluate to zero.\n",
+ "\n",
+ "The cell below compares numerical output against these analytical values. \n",
+ "\n",
+ "**EXERCISE: Run the initial validation and try playing with `n_grid`, `kx`, `ky` parameters. For now, treat the `def compute_advection_diffusion_kernel(...)` as a black box. We will encounter a similar kernel soon.**"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from utils import validate_advection_diffusion\n",
+ "\n",
+ "@wp.kernel\n",
+ "def compute_advection_diffusion_kernel(\n",
+ " omega: wp.array2d(dtype=wp.float32),\n",
+ " psi: wp.array2d(dtype=wp.float32),\n",
+ " advection_out: wp.array2d(dtype=wp.float32),\n",
+ " diffusion_out: wp.array2d(dtype=wp.float32),\n",
+ " h: wp.float32,\n",
+ " n: wp.int32,\n",
+ "):\n",
+ " \"\"\"Compute advection and diffusion terms at each grid point.\n",
+ "\n",
+ " Args:\n",
+ " omega: Vorticity field.\n",
+ " psi: Stream function field.\n",
+ " advection_out: Output array for advection term J(psi, omega).\n",
+ " diffusion_out: Output array for diffusion term (Laplacian of omega).\n",
+ " h: Grid spacing.\n",
+ " n: Grid size.\n",
+ " \"\"\"\n",
+ " i, j = wp.tid()\n",
+ "\n",
+ " # obtain the neighboring indices for the [i, j]th cell in a periodic square box\n",
+ " i_left = cyclic_index(i - 1, n)\n",
+ " i_right = cyclic_index(i + 1, n)\n",
+ " j_down = cyclic_index(j - 1, n)\n",
+ " j_top = cyclic_index(j + 1, n)\n",
+ "\n",
+ " # gather omega values at neighbors\n",
+ " omega_center = omega[i, j]\n",
+ " omega_left = omega[i_left, j]\n",
+ " omega_right = omega[i_right, j]\n",
+ " omega_down = omega[i, j_down]\n",
+ " omega_top = omega[i, j_top]\n",
+ "\n",
+ " # gather psi values at neighbors\n",
+ " psi_left = psi[i_left, j]\n",
+ " psi_right = psi[i_right, j]\n",
+ " psi_down = psi[i, j_down]\n",
+ " psi_top = psi[i, j_top]\n",
+ "\n",
+ " # compute diffusion term (Laplacian)\n",
+ " diffusion_out[i, j] = diffusion(\n",
+ " omega_left,\n",
+ " omega_right,\n",
+ " omega_center,\n",
+ " omega_down,\n",
+ " omega_top,\n",
+ " h,\n",
+ " )\n",
+ "\n",
+ " # compute advection term J(psi, omega)\n",
+ " advection_out[i, j] = advection(\n",
+ " omega_left,\n",
+ " omega_right,\n",
+ " omega_top,\n",
+ " omega_down,\n",
+ " psi_left,\n",
+ " psi_right,\n",
+ " psi_top,\n",
+ " psi_down,\n",
+ " h,\n",
+ " )\n",
+ "\n",
+ "# Change n_grid, kx, ky here and play with different values\n",
+ "(fig_diff, _), (fig_adv, _) = validate_advection_diffusion(\n",
+ " compute_advection_diffusion_kernel, n_grid=N_GRID, kx=2, ky=3\n",
+ ")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "**EXERCISE: Given that we have the correct `def advection(...)` and `def diffusion(...)` functions, let's assemble the timestepping kernel together. Recall**\n",
+ "\n",
+ "$$\\omega^{(\\text{new})} = c_0 \\cdot \\omega^{(n)} + c_1 \\cdot \\omega^{(\\text{old})} + c_2 \\cdot \\Delta t \\cdot \\mathcal{L}(\\omega^{(\\text{old})}).$$\n",
+ "\n",
+ "In coding terminology, we can write `omega_old = coeff0 * omega_n + coeff1 * omega_old + coeff2 * dt * rhs`.\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "\n",
+ "@wp.kernel\n",
+ "def viscous_advection_rk3_kernel(\n",
+ " n: int,\n",
+ " h: float,\n",
+ " re: float,\n",
+ " dt: float,\n",
+ " coeff0: float,\n",
+ " coeff1: float,\n",
+ " coeff2: float,\n",
+ " omega_0: wp.array2d(dtype=float),\n",
+ " omega_1: wp.array2d(dtype=float),\n",
+ " psi: wp.array2d(dtype=float),\n",
+ " rhs: wp.array2d(dtype=float),\n",
+ "):\n",
+ " \"\"\"Perform a single substep of SSP-RK3.\n",
+ "\n",
+ " Args:\n",
+ " n: Grid size.\n",
+ " h: Grid spacing.\n",
+ " re: Reynolds number.\n",
+ " dt: Timestep size.\n",
+ " coeff0: SSP-RK3 coefficient for omega_0.\n",
+ " coeff1: SSP-RK3 coefficient for omega_1.\n",
+ " coeff2: SSP-RK3 coefficient for RHS.\n",
+ " omega_0: Vorticity field at the beginning of the timestep.\n",
+ " omega_1: Vorticity field at the end of the RK step.\n",
+ " psi: Stream function field.\n",
+ " rhs: Temporarily stores diffusion + advection terms.\n",
+ " \"\"\"\n",
+ " i, j = wp.tid()\n",
+ "\n",
+ " # obtain the neighboring indices for the [i, j]th cell in a periodic square box\n",
+ " left_idx = cyclic_index(i - 1, n)\n",
+ " right_idx = cyclic_index(i + 1, n)\n",
+ " top_idx = cyclic_index(j + 1, n)\n",
+ " down_idx = cyclic_index(j - 1, n)\n",
+ "\n",
+ " # compute viscous diffusion term\n",
+ " rhs[i, j] = (1.0 / re) * diffusion(\n",
+ " omega_1[left_idx, j],\n",
+ " omega_1[right_idx, j],\n",
+ " omega_1[i, j],\n",
+ " omega_1[i, down_idx],\n",
+ " omega_1[i, top_idx],\n",
+ " h,\n",
+ " )\n",
+ "\n",
+ " # add advection term\n",
+ " rhs[i, j] -= advection(\n",
+ " omega_1[left_idx, j],\n",
+ " omega_1[right_idx, j],\n",
+ " omega_1[i, top_idx],\n",
+ " omega_1[i, down_idx],\n",
+ " psi[left_idx, j],\n",
+ " psi[right_idx, j],\n",
+ " psi[i, top_idx],\n",
+ " psi[i, down_idx],\n",
+ " h,\n",
+ " )\n",
+ "\n",
+ " # perform RK update\n",
+ " omega_1[i, j] = coeff0 * omega_0[i, j] + coeff1 * omega_1[i, j] + coeff2 * dt * rhs[i, j] # MISSING"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "---\n",
+ "## Building block 4: Spectral Poisson solver using tile-based FFT and transpose\n",
+ "\n",
+ "Recall from equation (3) in the Background section that the Poisson equation in the Fourier space is\n",
+ "\n",
+ "$$\\hat{\\psi}_{m,n} = \\frac{\\hat{\\omega}_{m,n}}{k_x^2 + k_y^2}.$$\n",
+ "\n",
+ "To solve this, we need to perform the follow steps:\n",
+ "1. Transform $\\omega$ to Fourier space using 2-D FFT → $\\hat{\\omega}$.\n",
+ "2. Divide by $|k|^2$ to obtain $\\hat{\\psi}$.\n",
+ "3. Transform $\\hat{\\psi}$ back to physical space using 2-D IFFT.\n",
+ "\n",
+ "**Warp's FFT primitives**: Warp provides `wp.tile_fft()` and `wp.tile_ifft()` that operate on `wp.vec2f` (or `wp.vec2d`) arrays and perform row-wise FFT/IFFT, where:\n",
+ "- `.x` component = **real part**\n",
+ "- `.y` component = **imaginary part**\n",
+ "\n",
+ "Since our physical fields $\\omega$, $\\psi$ are stored as `wp.float32`, we need a helper kernel to convert between data types.\n",
+ "\n",
+ "**What do we need to implement?**\n",
+ "\n",
+ "1. **Data type conversion**: Kernels to move data from `wp.float32` to `wp.vec2f` (for FFT) and from `wp.vec2f` to `wp.float32` (after IFFT).\n",
+ "\n",
+ "2. **Tile-based 1-D FFT/IFFT kernels**: Kernels that will use `wp.tile_fft()` and `wp.tile_ifft()` to perform row-wise transforms.\n",
+ "\n",
+ "3. **Transpose kernel**: A tiled transpose for efficient matrix transpose.\n",
+ "\n",
+ "4. **2-D FFT composition**: A Python function that will put together the kernels in steps (2) and (3) above to do 2-D FFT as **row FFT → transpose → row FFT**.\n",
+ "\n",
+ "Let's build these step by step."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Step 1: Data type conversion kernels\n",
+ "\n",
+ "Since Warp's FFT operates on `wp.vec2f` (complex) arrays but our physical fields are `wp.float32` (real), we need:\n",
+ "\n",
+ "- `copy_float_to_vec2`: Convert real array to complex with zero imaginary part (before FFT).\n",
+ "- `extract_real_and_scale`: Extract real part and apply scaling by dividing the real part by `scale` (after IFFT)."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "**EXERCISE: Complete the `def extract_real_and_scale(...)` kernel definition. `def copy_float_to_vec2(...)` is already completed for you for reference.**"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "@wp.kernel\n",
+ "def copy_float_to_vec2(\n",
+ " omega: wp.array2d(dtype=wp.float32), omega_complex: wp.array2d(dtype=wp.vec2f)\n",
+ "):\n",
+ " \"\"\"Copy real vorticity to a complex array with zero imaginary part.\n",
+ "\n",
+ " Args:\n",
+ " omega: Input real-valued vorticity array.\n",
+ " omega_complex: Output complex array where real part is omega, imaginary is 0.\n",
+ " \"\"\"\n",
+ " i, j = wp.tid()\n",
+ " omega_complex[i, j] = wp.vec2f(omega[i, j], 0.0)\n",
+ "\n",
+ "\n",
+ "@wp.kernel\n",
+ "def extract_real_and_scale(\n",
+ " scale: wp.float32,\n",
+ " complex_array: wp.array2d(dtype=wp.vec2f),\n",
+ " real_array: wp.array2d(dtype=wp.float32),\n",
+ "):\n",
+ " \"\"\"Extract real part from complex array and scale in one pass.\n",
+ "\n",
+ " Args:\n",
+ " scale: Scale factor to multiply each element by.\n",
+ " complex_array: Input complex array (vec2f where .x is real part).\n",
+ " real_array: Output real array (scaled).\n",
+ " \"\"\"\n",
+ " i, j = wp.tid()\n",
+ " real_array[i, j] = complex_array[i, j].x / scale # MISSING"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Step 2: Tile-based 1-D FFT and IFFT kernels\n",
+ "\n",
+ "Warp's `wp.tile_fft()` and `wp.tile_ifft()` perform FFT on tiles loaded into registers. For row-wise transforms:\n",
+ "\n",
+ "1. Each thread block loads one row (tile) of the 2-D array\n",
+ "2. Performs FFT/IFFT using `wp.tile_fft()` / `wp.tile_ifft()`\n",
+ "3. Stores the result back to global memory\n",
+ "\n",
+ "The tile dimensions `TILE_M=1` and `TILE_N=N_GRID` mean each tile is exactly one row.\n",
+ "\n",
+ "**TODO: I am not sure if we should ask the readers to fill in the kernels definitions for `tile_fft(...)` and `tile_ifft(...)`. Also, @Eric for a better description of tile-based operations...**"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "@wp.kernel\n",
+ "def fft_tiled(x: wp.array2d(dtype=wp.vec2f), y: wp.array2d(dtype=wp.vec2f)):\n",
+ " \"\"\"Perform 1-D FFT on each row using wp.tile_fft().\n",
+ "\n",
+ " Args:\n",
+ " x: Input complex array of shape (N, N).\n",
+ " y: Output complex array of shape (N, N) storing FFT results.\n",
+ " \"\"\"\n",
+ " i, _, _ = wp.tid()\n",
+ " a = wp.tile_load(x, shape=(TILE_M, TILE_N), offset=(i * TILE_M, 0))\n",
+ " wp.tile_fft(a)\n",
+ " wp.tile_store(y, a, offset=(i * TILE_M, 0))\n",
+ "\n",
+ "\n",
+ "@wp.kernel\n",
+ "def ifft_tiled(x: wp.array2d(dtype=wp.vec2f), y: wp.array2d(dtype=wp.vec2f)):\n",
+ " \"\"\"Perform 1-D inverse FFT on each row using wp.tile_ifft().\n",
+ "\n",
+ " Args:\n",
+ " x: Input complex array of shape (N, N).\n",
+ " y: Output complex array of shape (N, N) storing IFFT results.\n",
+ " \"\"\"\n",
+ " i, _, _ = wp.tid()\n",
+ " a = wp.tile_load(x, shape=(TILE_M, TILE_N), offset=(i * TILE_M, 0))\n",
+ " wp.tile_ifft(a)\n",
+ " wp.tile_store(y, a, offset=(i * TILE_M, 0))"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Validating FFT/IFFT kernels \n",
+ "To verify that `fft_tiled` and `ifft_tiled` work correctly, we test them with a known signal $f(x) = \\sin(x)$ on the domain $[0, 2\\pi)$. The Fourier transform of $\\sin(x)$ has a simple analytical form. Since $\\sin(x) = \\frac{e^{ix} - e^{-ix}}{2i}$, the FFT should produce peaks only at wavenumbers $k = \\pm 1$, with zero amplitude elsewhere. \n",
+ "\n",
+ "The `validate_fft_roundtrip(...)` function initializes the signal $f(x)$ on the $[0, 2\\pi)$ domain and performs a full roundtrip, i.e., it applies the FFT to the signal, then applies the IFFT to the result. The function then plots the magnitude spectrum $|\\hat{f}(k)|$ and compares the original signal with the reconstructed signal obtained from the FFT -> IFFT roundtrip."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Validate FFT kernels with a simple sine wave test\n",
+ "from utils import validate_fft_roundtrip\n",
+ "\n",
+ "fig, axes, max_error = validate_fft_roundtrip(\n",
+ " fft_kernel=fft_tiled,\n",
+ " ifft_kernel=ifft_tiled,\n",
+ " n_grid=N_GRID,\n",
+ " tile_m=TILE_M,\n",
+ " tile_n=TILE_N,\n",
+ " block_dim=BLOCK_DIM,\n",
+ ")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Step 3: Tiled transpose kernel\n",
+ "\n",
+ "To compute 2-D FFT using 1-D row transforms, we need a transpose operation between passes\n",
+ "\n",
+ "$$\\text{2-D FFT} = \\text{row FFT} \\rightarrow \\text{transpose} \\rightarrow \\text{row FFT}.$$\n",
+ "\n",
+ "To perform the matrix transpose, we again use Warp's tile-based primitive `wp.tile_transpose()` below in the kernel `def tiled_transpose(...)` that takes in a `x` array and transposes it to give the `y` array.\n",
+ "\n",
+ "**TODO**: Same question as above, should we ask them to fill in these tile kernels. Again @Eric for a more accurate description of tile-based transpose...\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "@wp.kernel\n",
+ "def tiled_transpose(x: wp.array2d(dtype=wp.vec2f), y: wp.array2d(dtype=wp.vec2f)):\n",
+ " \"\"\"Transpose a 2-D array using tiled approach with shared memory.\n",
+ "\n",
+ " Args:\n",
+ " x: Input complex array.\n",
+ " y: Output complex array storing the transpose of x.\n",
+ " \"\"\"\n",
+ " i, j = wp.tid()\n",
+ " t = wp.tile_load(\n",
+ " x,\n",
+ " shape=(TILE_TRANSPOSE_DIM, TILE_TRANSPOSE_DIM),\n",
+ " offset=(i * TILE_TRANSPOSE_DIM, j * TILE_TRANSPOSE_DIM),\n",
+ " storage=\"shared\",\n",
+ " )\n",
+ " t_transposed = wp.tile_transpose(t)\n",
+ " wp.tile_store(\n",
+ " y, t_transposed, offset=(j * TILE_TRANSPOSE_DIM, i * TILE_TRANSPOSE_DIM)\n",
+ " )"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Validating transpose kernel\n",
+ "\n",
+ "In `def validate_transpose(...)`, we validate this using an upper triangular matrix, which should become lower triangular after transpose. For example, in case of a $4 \\times 4$ upper triangular matrix, transpose operation should result in the following change\n",
+ "\n",
+ "$$\n",
+ "\\begin{bmatrix} 1 & 1 & 1 & 1 \\\\ 0 & 1 & 1 & 1 \\\\ 0 & 0 & 1 & 1 \\\\ 0 & 0 & 0 & 1 \\end{bmatrix}\n",
+ "\\xrightarrow{\\text{transpose}}\n",
+ "\\begin{bmatrix} 1 & 0 & 0 & 0 \\\\ 1 & 1 & 0 & 0 \\\\ 1 & 1 & 1 & 0 \\\\ 1 & 1 & 1 & 1 \\end{bmatrix}.\n",
+ "$$"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import utils\n",
+ "from utils import validate_transpose\n",
+ "\n",
+ "fig, axes = validate_transpose(\n",
+ " transpose_kernel=tiled_transpose,\n",
+ " tile_transpose_dim=TILE_TRANSPOSE_DIM,\n",
+ " n_test=64,\n",
+ ")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Step 4: Going from $\\hat{\\omega}$ to $\\hat{\\psi}$ in the Fourier space\n",
+ "\n",
+ "After transforming $\\omega$ to Fourier space, we solve equation (3)\n",
+ "\n",
+ "$$\\hat{\\psi}_{m,n} = \\frac{\\hat{\\omega}_{m,n}}{k_x^2 + k_y^2} = \\frac{\\hat{\\omega}_{m,n}}{|k|^2} = \\hat{\\omega}_{m,n} \\cdot |k|^{-2}.$$\n",
+ "\n",
+ "We precompute $1/|k|^2$ and multiply it elementwise to the `omega_hat` array to obtain the `psi_hat` array."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "**EXERCISE: Please fill out the `def multiply_k2_inverse(...)` kernel definition.**"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "@wp.kernel\n",
+ "def multiply_k2_inverse(\n",
+ " k2i: wp.array2d(dtype=wp.float32),\n",
+ " omega_hat: wp.array2d(dtype=wp.vec2f),\n",
+ " psi_hat: wp.array2d(dtype=wp.vec2f),\n",
+ "):\n",
+ " \"\"\"Solve Poisson equation in Fourier space: psi_hat = omega_hat / |k|^2.\n",
+ "\n",
+ " Args:\n",
+ " k2i: Precomputed 1/|k|^2 array (0 at k=0 to handle DC component).\n",
+ " omega_hat: Fourier transform of vorticity.\n",
+ " psi_hat: Output Fourier transform of stream function.\n",
+ " \"\"\"\n",
+ " i, j = wp.tid()\n",
+ " psi_hat[i, j] = omega_hat[i, j] * k2i[i, j] # MISSING"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Now that we have all the kernel calls figured out, we need to put together two Python functions before we start assembling the solver together!\n",
+ "\n",
+ "1. `def fft_2d(fft_kernel, input_arr)` - Python function that composes row-wise FFT/IFFT -> transpose → row-wise FFT to perform a full 2-D FFT/IFFT on a input complex (`wp.vec2f`) array.\n",
+ "\n",
+ "2. `def solve_poisson(omega, k2i)` - Python function that takes the real-valued vorticity field and pre-computed $|k|^2$ and gives out real values $\\psi$.\n",
+ "\n",
+ "3. `def step(omega_old, omega_new)` - Python function that performs on Runge-Kutte timestep to go from `omega_old` to `omega_new`."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "**EXERCISE: Until now, you have mostly filled in and worked through Warp kernels and functions. In the next cell, we ask you to call a Warp kernel from inside the Python function `def fft_2d(fft_kernel, input_arr)`. This function is FFT/IFFT agnostic and can be used for both (how?). We ask you to fill two kernel calls, one to `def tiled_transpose(...)` and another to `def fft_kernel(...)`.**"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "def fft_2d(fft_kernel, input_arr):\n",
+ " \"\"\"Perform 2-D FFT or IFFT using row-wise transform + transpose pattern.\n",
+ "\n",
+ " Args:\n",
+ " fft_kernel: Either def fft_tiled(...) or def ifft_tiled(...) Warp kernel.\n",
+ " input_arr: Input complex array.\n",
+ " \n",
+ " Returns:\n",
+ " Output complex array with FFT/IFFT results.\n",
+ " \"\"\"\n",
+ " # allocate temporary buffers for intermediate FFT results\n",
+ " temp_1 = wp.zeros((N_GRID, N_GRID), dtype=wp.vec2f)\n",
+ " temp_2 = wp.zeros((N_GRID, N_GRID), dtype=wp.vec2f)\n",
+ "\n",
+ " # perform rowwise FFT/IFFT\n",
+ " wp.launch_tiled(\n",
+ " fft_kernel,\n",
+ " dim=[N_GRID, 1],\n",
+ " inputs=[input_arr],\n",
+ " outputs=[temp_1],\n",
+ " block_dim=BLOCK_DIM,\n",
+ " )\n",
+ "\n",
+ " wp.launch_tiled(\n",
+ " tiled_transpose,\n",
+ " dim=(N_GRID // TILE_TRANSPOSE_DIM, N_GRID // TILE_TRANSPOSE_DIM),\n",
+ " inputs= [temp_1], # MISSING \n",
+ " outputs= [temp_2], # MISSING \n",
+ " block_dim=TILE_TRANSPOSE_DIM * TILE_TRANSPOSE_DIM,\n",
+ " )\n",
+ "\n",
+ " # perform columnwise FFT/IFFT\n",
+ " output_arr = wp.empty_like(input_arr)\n",
+ " wp.launch_tiled(\n",
+ " fft_kernel,\n",
+ " dim=[N_GRID, 1],\n",
+ " inputs= [temp_2], # MISSING \n",
+ " outputs=[output_arr],\n",
+ " block_dim=BLOCK_DIM,\n",
+ " )\n",
+ " \n",
+ " return output_arr"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "**EXERCISE: Now, let's fill in the function call to `def fft_2d(...)` in the `def solve_poisson(...)` below.**"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "def solve_poisson(omega, psi, k2i):\n",
+ " \"\"\"Solve the Poisson equation using FFT.\n",
+ " \n",
+ " Args:\n",
+ " omega: Vorticity field (wp.float32 array).\n",
+ " psi: Stream function solution (wp.float32 array).\n",
+ " k2i: Precomputed 1/|k|^2 array.\n",
+ " \"\"\"\n",
+ " # allocate all temporary arrays locally\n",
+ " omega_complex = wp.zeros((N_GRID, N_GRID), dtype=wp.vec2f)\n",
+ " temp_1 = wp.zeros((N_GRID, N_GRID), dtype=wp.vec2f)\n",
+ " temp_2 = wp.zeros((N_GRID, N_GRID), dtype=wp.vec2f)\n",
+ "\n",
+ " # convert vorticity from wp.float32 to wp.vec2f\n",
+ " wp.launch(copy_float_to_vec2, dim=(N_GRID, N_GRID),\n",
+ " inputs=[omega], outputs=[omega_complex])\n",
+ "\n",
+ " # forward FFT: take omega\n",
+ " # Remember fft_2d first and second arguments are fft_tiled/ifft_tiled and vorticity field (complex valued, wp.vec2f)\n",
+ " temp_1 = fft_2d(fft_tiled, omega_complex) # MISSING\n",
+ "\n",
+ " # solve Poisson in Fourier space: psi_hat = omega_hat / |k|^2\n",
+ " wp.launch(multiply_k2_inverse, dim=(N_GRID, N_GRID),\n",
+ " inputs=[k2i, temp_1], outputs=[temp_2])\n",
+ "\n",
+ " # inverse FFT: psi_hat -> psi\n",
+ " # Remember fft_2d first and second arguments are fft_tiled/ifft_tiled and vorticity field (complex valued, wp.vec2f)\n",
+ " temp_1 = fft_2d(ifft_tiled, temp_2) # MISSING\n",
+ "\n",
+ " # extract real part and normalize\n",
+ " wp.launch(extract_real_and_scale, dim=(N_GRID, N_GRID),\n",
+ " inputs=[wp.float32(N_GRID * N_GRID), temp_1],\n",
+ " outputs=[psi]) # MISSING"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Validating the Poisson solver\n",
+ "\n",
+ "We validate `solve_poisson` using an analytical test case. Consider the Poisson equation $\\nabla^2 \\psi = -\\omega$. If we choose $\\psi_{\\text{analytical}} = \\sin(k_x x)\\sin(k_y y)$, then\n",
+ "\n",
+ "$$\\nabla^2 \\psi = -k_x^2 \\sin(k_x x)\\sin(k_y y) - k_y^2 \\sin(k_x x)\\sin(k_y y) = -(k_x^2 + k_y^2)\\sin(k_x x)\\sin(k_y y).$$\n",
+ "\n",
+ "which means $\\omega = (k_x^2 + k_y^2)\\sin(k_x x)\\sin(k_y y)$.\n",
+ "\n",
+ "In the `def validate_poisson_solver(...)`, we set $\\omega = (k_x^2 + k_y^2)\\sin(k_x x)\\sin(k_y y)$ and make a call to `def solve_poisson(...)` to obtain $\\psi_{\\text{numerical}}$. Thereafter, we compare the obtained $\\psi_{\\text{numerical}}$ with $\\psi_{\\text{analytical}}$.\n",
+ "\n",
+ "In the cell below, for starters, we have set $k_x = k_y = 1$. Run the cell and see what you get. \n",
+ "\n",
+ "**Exercise: Change the values of $k_x$ and $k_y$ and see how the plots look for different $[k_x, k_y]$ pair.**"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Validate Poisson solver\n",
+ "from utils import validate_poisson_solver\n",
+ "\n",
+ "fig, axes, max_error = validate_poisson_solver(\n",
+ " solve_poisson_func=solve_poisson,\n",
+ " n_grid=N_GRID,\n",
+ " kx=1,\n",
+ " ky=1,\n",
+ ")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "We have already given you the `def step(...)` full definition in the cell below. A single call to `def step(...)` basically advances the whole $\\omega$ field on the 2-D grid from $t$ to $t + \\Delta t$. "
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "def step(omega_0, omega_1, psi, rhs, k2i):\n",
+ " \"\"\"Advance simulation by one timestep using SSP-RK3.\n",
+ " \n",
+ " Args:\n",
+ " omega_0: Vorticity at the start of timestep on the 2-D grid.\n",
+ " omega_1: Vorticity at the end of timestep/RK substep on the 2-D grid.\n",
+ " psi: Stream function on the 2-D grid.\n",
+ " rhs: Temporary array for advection + diffusion terms.\n",
+ " k2i: Precomputed 1/|k|^2 for Poisson solver.\n",
+ " \"\"\"\n",
+ " # loop over 3 RK substeps with coefficients (c0, c1, c2)\n",
+ " for c0, c1, c2 in RK3_COEFFS:\n",
+ " rhs.zero_()\n",
+ "\n",
+ " # kernel call to compute the advection and diffusion components\n",
+ " # then advance the vorticity 1 RK3 substep\n",
+ " wp.launch(\n",
+ " viscous_advection_rk3_kernel,\n",
+ " dim=(N_GRID, N_GRID),\n",
+ " inputs=[N_GRID, H, RE, DT, c0, c1, c2, omega_0, omega_1, psi, rhs],\n",
+ " )\n",
+ "\n",
+ " # solve Poisson equation to get psi from omega_1\n",
+ " solve_poisson(omega_1, psi, k2i)\n",
+ "\n",
+ " # omega_1 now holds omega(t+dt), copy back to omega_0 for next timestep\n",
+ " wp.copy(omega_0, omega_1)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "---\n",
+ "## Assembling the Solver\n",
+ "\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Now that we have all the building blocks in place, let's put them together to run the simulation. In the cells below, we will:\n",
+ "\n",
+ "1. Allocate arrays for vorticity ($\\omega$), stream function ($\\psi$), RHS terms, and pre-compute $1/|k|^2$.\n",
+ "2. Initialize the vorticity field using the decaying turbulence spectrum mentioned above.\n",
+ "3. Capture the time-stepping loop in a CUDA graph for improved performance.\n",
+ "4. Run the simulation and visualize the results."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### 1. Allocating arrays for the simulation"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# allocate warp arrays for vorticity, stream-function, and RHS of NS equation\n",
+ "omega_0 = wp.zeros((N_GRID, N_GRID), dtype=wp.float32) # MISSING\n",
+ "omega_1 = wp.zeros((N_GRID, N_GRID), dtype=wp.float32) # MISSING\n",
+ "psi = wp.zeros((N_GRID, N_GRID), dtype=wp.float32) # MISSING\n",
+ "rhs = wp.zeros((N_GRID, N_GRID), dtype=wp.float32) # MISSING\n",
+ "\n",
+ "# precompute 1/k^2 for spectral Poisson solver (avoid division by zero at k=0)\n",
+ "k = np.fft.fftfreq(N_GRID, d=1.0 / N_GRID)\n",
+ "kx, ky = np.meshgrid(k, k)\n",
+ "k2 = kx**2 + ky**2\n",
+ "k2i_np = np.zeros_like(k2)\n",
+ "nonzero = k2 != 0\n",
+ "k2i_np[nonzero] = 1.0 / k2[nonzero]\n",
+ "k2i = wp.array2d(k2i_np.astype(np.float32), dtype=wp.float32)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### 2. Initialize vorticity on the 2-D grid using the spectrum discussed before"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# compute initial vorticity distribution for decaying turbulence\n",
+ "k_mag_np = np.sqrt(k**2 + k[:, np.newaxis] ** 2)\n",
+ "k_mag = wp.array2d(k_mag_np.astype(np.float32), dtype=wp.float32)\n",
+ "\n",
+ "rng = np.random.default_rng(42)\n",
+ "zeta_np = 2 * np.pi * rng.random((N_GRID // 2 + 1, N_GRID // 2 + 1))\n",
+ "eta_np = 2 * np.pi * rng.random((N_GRID // 2 + 1, N_GRID // 2 + 1))\n",
+ "zeta = wp.array2d(zeta_np.astype(np.float32), dtype=wp.float32)\n",
+ "eta = wp.array2d(eta_np.astype(np.float32), dtype=wp.float32)\n",
+ "omega_complex = wp.zeros((N_GRID, N_GRID), dtype=wp.vec2f)\n",
+ "\n",
+ "# set parameters for energy spectrum\n",
+ "K_P = 12.0\n",
+ "S = 3\n",
+ "\n",
+ "wp.launch(\n",
+ " decaying_turbulence_initializer,\n",
+ " dim=(N_GRID, N_GRID),\n",
+ " inputs=[N_GRID, K_P, S, k_mag, zeta, eta],\n",
+ " outputs=[omega_complex], # MISSING\n",
+ ")\n",
+ "\n",
+ "# compute IFFT of omega_complex field to get initial vorticity in physical space\n",
+ "# Remember fft_2d first and second arguments are fft_tiled/ifft_tiled and vorticity field (complex valued, wp.vec2f)\n",
+ "fft_result = fft_2d(ifft_tiled, omega_complex) # MISSING\n",
+ "\n",
+ "# extract real part to get initial vorticity field (scale=1.0 for normalization)\n",
+ "wp.launch(\n",
+ " extract_real_and_scale,\n",
+ " dim=(N_GRID, N_GRID),\n",
+ " inputs=[1.0, fft_result],\n",
+ " outputs=[omega_0],\n",
+ ")\n",
+ "\n",
+ "# for initial distribution, set both omega_1 and omega_0 to be the same\n",
+ "wp.copy(omega_1, omega_0)\n",
+ "\n",
+ "# solve initial Poisson equation to get psi from initial vorticity field\n",
+ "solve_poisson(omega_1, psi, k2i)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### 3. CUDA graph capture\n",
+ "\n",
+ "CUDA graphs let you define a sequence of operations (kernel launches, data movement) and their dependencies once, then replay it repeatedly with minimal overhead. Normally, each kernel launch incurs CPU-side setup costs. For short-running kernels launched many times, this overhead dominates. By capturing the entire workflow in a graph, these setup costs are paid once during graph creation, and subsequent launches are nearly free.\n",
+ "\n",
+ "In Warp, you can capture a CUDA graph with `wp.ScopedCapture()`. In the code block below, a single call to `step(...)` is captured and will be replayed for as many times as we want to run the simulation."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# capture first step in a CUDA graph\n",
+ "with wp.ScopedCapture() as capture:\n",
+ " step(omega_0, omega_1, psi, rhs, k2i)\n",
+ "step_graph = capture.graph"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### 4. Running the simulation and saving the results\n",
+ "\n",
+ "The boilerplate code below runs the simulation for 2000 timesteps and saves the vorticity field at every 10th timestep. Thereafter, the vorticity fields are plotted and saved as a GIF."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import time\n",
+ "from PIL import Image\n",
+ "import matplotlib.pyplot as plt\n",
+ "from matplotlib.colors import Normalize\n",
+ "import os\n",
+ "\n",
+ "NUM_FRAMES = 200\n",
+ "STEPS_PER_FRAME = 10\n",
+ "\n",
+ "# Colormap setup\n",
+ "cmap = plt.cm.twilight\n",
+ "norm = Normalize(vmin=-15, vmax=15)\n",
+ "\n",
+ "frames = []\n",
+ "print(f\"Running {NUM_FRAMES} frames ({STEPS_PER_FRAME} steps each)...\")\n",
+ "\n",
+ "start_time = time.perf_counter()\n",
+ "for frame in range(NUM_FRAMES):\n",
+ " # Advance simulation\n",
+ " for _ in range(STEPS_PER_FRAME):\n",
+ " wp.capture_launch(step_graph)\n",
+ "\n",
+ " # Capture frame\n",
+ " vorticity = omega_1.numpy().T # Transpose for correct orientation\n",
+ " colored = cmap(norm(vorticity))\n",
+ " rgb = (colored[:, :, :3] * 255).astype(np.uint8)\n",
+ " frames.append(rgb)\n",
+ "\n",
+ " if (frame + 1) % 50 == 0:\n",
+ " print(f\" Frame {frame + 1}/{NUM_FRAMES}\")\n",
+ "\n",
+ "elapsed = time.perf_counter() - start_time\n",
+ "total_steps = NUM_FRAMES * STEPS_PER_FRAME\n",
+ "print(f\"Completed {total_steps} steps in {elapsed:.2f}s ({total_steps/elapsed:.0f} steps/s)\")"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import IPython.display\n",
+ "\n",
+ "# Create output directory\n",
+ "os.makedirs(\"./images/chapter-12.2\", exist_ok=True)\n",
+ "output_file = f\"./images/chapter-12.2/turbulence_{N_GRID}x{N_GRID}_Re{int(RE)}.gif\"\n",
+ "\n",
+ "# Save as GIF\n",
+ "pil_images = [Image.fromarray(frame) for frame in frames]\n",
+ "pil_images[0].save(\n",
+ " output_file,\n",
+ " save_all=True,\n",
+ " append_images=pil_images[1:],\n",
+ " duration=50,\n",
+ " loop=0,\n",
+ ")\n",
+ "\n",
+ "print(f\"Saved: {output_file}\")\n",
+ "IPython.display.Image(output_file)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "---\n",
+ "## Validation\n",
+ "\n",
+ "**TODO**"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "---\n",
+ "## Conclusion\n",
+ "\n",
+ "**TODO**"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "---\n",
+ "## References\n",
+ "\n",
+ "**TODO**"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "---\n",
+ "## Appendix A: Phase symmetry for real-valued physical fields\n",
+ "\n",
+ "Following San and Staples 2012, we decompose the phase as $\\zeta(\\mathbf{k}) = \\xi(\\mathbf{k}) + \\eta(\\mathbf{k})$, where $\\xi$ and $\\eta$ are independently chosen random values in $[0, 2\\pi]$ in the first quadrant ($k_x \\geq 0$, $k_y \\geq 0$).\n",
+ "\n",
+ "$\\xi(\\mathbf{k})$ and $\\eta(\\mathbf{k})$ must follow these relations in other quadrants\n",
+ "\n",
+ "$$\\xi(-k_x, k_y) = -\\xi(k_x, k_y),$$\n",
+ "$$\\xi(-k_x, -k_y) = -\\xi(k_x, k_y),$$\n",
+ "$$\\xi(k_x, -k_y) = \\xi(k_x, k_y),$$\n",
+ "\n",
+ "$$\\eta(-k_x, k_y) = \\eta(k_x, k_y),$$\n",
+ "$$\\eta(-k_x, -k_y) = -\\eta(k_x, k_y),$$\n",
+ "$$\\eta(k_x, -k_y) = -\\eta(k_x, k_y).$$"
+ ]
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "warp-cfd",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.11.10"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
diff --git a/Accelerated_Python_User_Guide/notebooks/utils.py b/Accelerated_Python_User_Guide/notebooks/utils.py
new file mode 100644
index 00000000..fa08403b
--- /dev/null
+++ b/Accelerated_Python_User_Guide/notebooks/utils.py
@@ -0,0 +1,449 @@
+# SPDX-License-Identifier: Apache-2.0 AND CC-BY-NC-4.0
+"""Utility functions for validating FFT kernels in the 2D Navier-Stokes solver."""
+
+import numpy as np
+import warp as wp
+import matplotlib.pyplot as plt
+
+
+def validate_fft_roundtrip(
+ fft_kernel,
+ ifft_kernel,
+ n_grid: int,
+ tile_m: int,
+ tile_n: int,
+ block_dim: int,
+ figsize=(14, 4),
+):
+ """Validate FFT and IFFT kernels using a sine wave test.
+
+ Creates a sin(x) function on [0, 2\pi], applies FFT to show the amplitude
+ spectrum (should peak at k=1), then applies IFFT and compares with original.
+
+ Args:
+ fft_kernel: The fft_tiled Warp kernel.
+ ifft_kernel: The ifft_tiled Warp kernel.
+ n_grid: Grid resolution (must match TILE_N used in kernel compilation).
+ tile_m: TILE_M parameter (typically 1 for row-wise FFT).
+ tile_n: TILE_N parameter (typically N_GRID).
+ block_dim: Block dimension for kernel launch.
+ figsize: Figure size tuple.
+
+ Returns:
+ fig, axes: Matplotlib figure and axes.
+ max_error: Maximum absolute error between original and reconstructed.
+ """
+ # Create physical grid on [0, 2\pi)
+ L = 2.0 * np.pi
+ x = np.linspace(0, L, n_grid, endpoint=False)
+
+ # Create test signal: sin(x) which has a peak at k=1
+ # Since f(x) = sin(x) = (e^{ix} - e^{-ix}) / 2i, FFT should show peaks at k=\pm 1
+ signal = np.sin(x)
+
+ # Prepare 2D array (fft_tiled operates on rows of 2D arrays)
+ # We'll use a single row for this 1D test
+ signal_2d = signal.reshape(1, n_grid).astype(np.float32)
+
+ # Convert to complex (wp.vec2f format): [real, imag]
+ # Shape must be (1, n_grid, 2) for Warp to interpret as (1, n_grid) of vec2f
+ signal_complex = np.zeros((1, n_grid, 2), dtype=np.float32)
+ signal_complex[:, :, 0] = signal_2d # Real part
+ signal_complex[:, :, 1] = 0.0 # Imaginary part = 0
+
+ # Create Warp arrays - pass shape (1, n_grid) and let Warp handle vec2f
+ input_wp = wp.array(signal_complex, dtype=wp.vec2f, shape=(1, n_grid))
+ fft_result = wp.zeros((1, n_grid), dtype=wp.vec2f)
+ ifft_result = wp.zeros((1, n_grid), dtype=wp.vec2f)
+
+ # Perform FFT
+ wp.launch_tiled(
+ fft_kernel,
+ dim=[1, 1],
+ inputs=[input_wp],
+ outputs=[fft_result],
+ block_dim=block_dim,
+ )
+
+ # Perform IFFT
+ wp.launch_tiled(
+ ifft_kernel,
+ dim=[1, 1],
+ inputs=[fft_result],
+ outputs=[ifft_result],
+ block_dim=block_dim,
+ )
+
+ # Synchronize and get results
+ wp.synchronize()
+
+ # Convert back to numpy
+ fft_np = fft_result.numpy()
+ ifft_np = ifft_result.numpy()
+
+ # Extract real and imaginary parts from FFT result
+ fft_real = fft_np[0, :, 0]
+ fft_imag = fft_np[0, :, 1]
+ fft_magnitude = np.sqrt(fft_real**2 + fft_imag**2)
+
+ # IFFT result (need to normalize by N)
+ reconstructed = ifft_np[0, :, 0] / n_grid # Real part, normalized
+
+ # Wavenumbers for FFT (standard ordering: 0, 1, 2, ..., N/2-1, -N/2, ..., -1)
+ k = np.fft.fftfreq(n_grid, d=L / (2 * np.pi * n_grid))
+
+ # Compute error
+ max_error = np.max(np.abs(signal - reconstructed))
+
+ # Create figure with 3 panels
+ fig, axes = plt.subplots(1, 3, figsize=figsize)
+
+ # Panel 1: Original signal
+ axes[0].plot(x, signal, "b-", linewidth=2, label="sin(x)")
+ axes[0].set_xlabel("x")
+ axes[0].set_ylabel("f(x)")
+ axes[0].set_title("Original Signal: f(x) = sin(x)")
+ axes[0].set_xlim([0, L])
+ axes[0].grid(True, alpha=0.3)
+ axes[0].legend()
+
+ # Panel 2: FFT magnitude spectrum
+ # Shift for better visualization (put k=0 at center)
+ k_shifted = np.fft.fftshift(k)
+ mag_shifted = np.fft.fftshift(fft_magnitude)
+
+ axes[1].stem(k_shifted, mag_shifted, basefmt=" ")
+ axes[1].set_xlabel("Wavenumber k")
+ axes[1].set_ylabel(r"$|\hat{f}(k)|$")
+ axes[1].set_title("FFT Amplitude Spectrum")
+ axes[1].set_xlim([-10, 10]) # Focus on low wavenumbers
+ axes[1].grid(True, alpha=0.3)
+ axes[1].axvline(x=1, color="r", linestyle="--", alpha=0.5, label="k=1")
+ axes[1].axvline(x=-1, color="r", linestyle="--", alpha=0.5, label="k=-1")
+ axes[1].legend()
+
+ # Panel 3: Comparison of original vs reconstructed
+ axes[2].plot(x, signal, "b-", linewidth=2, label="Original")
+ axes[2].plot(x, reconstructed, "r--", linewidth=2, label="FFT→IFFT")
+ axes[2].set_xlabel("x")
+ axes[2].set_ylabel("f(x)")
+ axes[2].set_title(f"Comparison b/w original and reconstructed signal")
+ axes[2].set_xlim([0, L])
+ axes[2].grid(True, alpha=0.3)
+ axes[2].legend()
+
+ plt.tight_layout()
+
+ return fig, axes, max_error
+
+
+def validate_transpose(
+ transpose_kernel,
+ tile_transpose_dim: int,
+ n_test: int = 64,
+ figsize=(10, 4),
+):
+ """Validate the tiled transpose kernel with a visual comparison.
+
+ Creates an upper triangular matrix (1s above diagonal, 0s below),
+ applies transpose, and shows before/after.
+
+ Args:
+ transpose_kernel: The tiled_transpose Warp kernel.
+ tile_transpose_dim: TILE_TRANSPOSE_DIM parameter used in kernel.
+ n_test: Size of the test matrix (default 64x64).
+ figsize: Figure size tuple.
+
+ Returns:
+ fig, axes: Matplotlib figure and axes.
+ """
+ # Create upper triangular matrix: 1s above diagonal, 0s below
+ pattern = np.triu(np.ones((n_test, n_test), dtype=np.float32))
+
+ # Convert to complex format (wp.vec2f) - store pattern in real part
+ input_complex = np.zeros((n_test, n_test, 2), dtype=np.float32)
+ input_complex[:, :, 0] = pattern
+
+ # Create Warp arrays
+ input_wp = wp.array(input_complex, dtype=wp.vec2f, shape=(n_test, n_test))
+ output_wp = wp.zeros((n_test, n_test), dtype=wp.vec2f)
+
+ # Launch transpose kernel
+ wp.launch_tiled(
+ transpose_kernel,
+ dim=(n_test // tile_transpose_dim, n_test // tile_transpose_dim),
+ inputs=[input_wp],
+ outputs=[output_wp],
+ block_dim=tile_transpose_dim * tile_transpose_dim,
+ )
+ wp.synchronize()
+
+ # Get result
+ transposed_result = output_wp.numpy()[:, :, 0]
+
+ # Plot
+ fig, axes = plt.subplots(1, 2, figsize=figsize)
+
+ axes[0].imshow(pattern, cmap="gray_r")
+ axes[0].set_title("Original")
+ axes[0].set_xlabel("j")
+ axes[0].set_ylabel("i")
+
+ axes[1].imshow(transposed_result, cmap="gray_r")
+ axes[1].set_title("After transpose")
+ axes[1].set_xlabel("j")
+ axes[1].set_ylabel("i")
+
+ plt.tight_layout()
+ return fig, axes
+
+
+def validate_poisson_solver(
+ solve_poisson_func,
+ n_grid: int,
+ kx: int = 1,
+ ky: int = 1,
+ figsize=(14, 4),
+):
+ """Validate the Poisson solver using an analytical test case.
+
+ Tests with \omega = (kx^2 + ky^2)sin(kx·x)sin(ky·y), which has the analytical
+ solution psi = sin(kx·x)sin(ky·y) for the Poisson equation \laplacian \psi = -\omega.
+
+ Args:
+ solve_poisson_func: The solve_poisson function to validate.
+ n_grid: Grid resolution.
+ kx: Wavenumber in x-direction (default 1).
+ ky: Wavenumber in y-direction (default 1).
+ figsize: Figure size tuple.
+
+ Returns:
+ fig, axes: Matplotlib figure and axes.
+ max_error: Maximum absolute error between computed and analytical solution.
+ """
+ import warp as wp
+
+ # Create 2D grid on [0, 2\pi) \times [0, 2\pi)
+ L = 2.0 * np.pi
+ x = np.linspace(0, L, n_grid, endpoint=False)
+ y = np.linspace(0, L, n_grid, endpoint=False)
+ X, Y = np.meshgrid(x, y)
+
+ # Analytical solution: \psi = sin(kx·x)sin(ky·y)
+ psi_analytical = np.sin(kx * X) * np.sin(ky * Y)
+
+ # Corresponding vorticity: \omega = (kx^2 + ky^2)sin(kx·x)sin(ky·y)
+ # From \laplacian\psi = -\omega, we have \omega = -\laplacian\psi = (kx^2 + ky^2)\psi
+ k_squared = kx**2 + ky**2
+ omega = k_squared * psi_analytical
+
+ # Precompute 1/|k|^2 for spectral Poisson solver
+ k_freq = np.fft.fftfreq(n_grid, d=1.0 / n_grid)
+ kx_grid, ky_grid = np.meshgrid(k_freq, k_freq)
+ k2 = kx_grid**2 + ky_grid**2
+ k2i_np = np.zeros_like(k2)
+ nonzero = k2 != 0
+ k2i_np[nonzero] = 1.0 / k2[nonzero]
+ k2i = wp.array(k2i_np.astype(np.float32), dtype=wp.float32)
+
+ # Create Warp arrays for omega and psi
+ omega_wp = wp.array(omega.astype(np.float32), dtype=wp.float32)
+ psi_wp = wp.zeros((n_grid, n_grid), dtype=wp.float32)
+
+ # Solve Poisson equation
+ solve_poisson_func(omega_wp, psi_wp, k2i)
+ wp.synchronize()
+
+ # Get computed solution
+ psi_computed = psi_wp.numpy()
+
+ # Compute error
+ error = np.abs(psi_computed - psi_analytical)
+ max_error = np.max(error)
+
+ # Create visualization
+ fig, axes = plt.subplots(1, 3, figsize=figsize)
+
+ # Common colormap settings
+ vmin = min(psi_analytical.min(), psi_computed.min())
+ vmax = max(psi_analytical.max(), psi_computed.max())
+
+ # Panel 1: Computed solution
+ im0 = axes[0].imshow(
+ psi_computed,
+ extent=[0, L, 0, L],
+ origin="lower",
+ cmap="RdBu_r",
+ vmin=vmin,
+ vmax=vmax,
+ )
+ axes[0].set_title(r"Computed $\psi$")
+ axes[0].set_xlabel("x")
+ axes[0].set_ylabel("y")
+ plt.colorbar(im0, ax=axes[0], shrink=0.8)
+
+ # Panel 2: Analytical solution
+ im1 = axes[1].imshow(
+ psi_analytical,
+ extent=[0, L, 0, L],
+ origin="lower",
+ cmap="RdBu_r",
+ vmin=vmin,
+ vmax=vmax,
+ )
+ axes[1].set_title(r"Analytical $\psi = \sin(x)\sin(y)$")
+ axes[1].set_xlabel("x")
+ axes[1].set_ylabel("y")
+ plt.colorbar(im1, ax=axes[1], shrink=0.8)
+
+ # Panel 3: Error
+ im2 = axes[2].imshow(error, extent=[0, L, 0, L], origin="lower", cmap="hot")
+ axes[2].set_title(f"Error")
+ axes[2].set_xlabel("x")
+ axes[2].set_ylabel("y")
+ plt.colorbar(im2, ax=axes[2], shrink=0.8)
+
+ plt.tight_layout()
+ return fig, axes, max_error
+
+
+def validate_advection_diffusion(
+ advection_diffusion_kernel,
+ n_grid: int,
+ kx: int = 2,
+ ky: int = 3,
+ figsize=(10, 4),
+):
+ """Validate finite-difference advection and diffusion using analytical test case.
+
+ Uses test functions:
+ omega = (kx^2 + ky^2) * sin(kx*x) * sin(ky*y)
+ psi = sin(kx*x) * sin(ky*y)
+
+ Analytical results:
+ Diffusion: nabla^2 omega = -(kx^2 + ky^2)^2 * sin(kx*x) * sin(ky*y)
+ Advection: J(psi, omega) = 0 (since omega = c * psi with constant c)
+
+ Args:
+ advection_diffusion_kernel: Warp kernel with signature:
+ (omega, psi, advection_out, diffusion_out, h, n) where all arrays
+ are wp.array2d(dtype=wp.float32), h is wp.float32, n is wp.int32.
+ n_grid: Grid resolution.
+ kx: Wavenumber in x-direction.
+ ky: Wavenumber in y-direction.
+ figsize: Figure size for each 1x2 panel.
+
+ Returns:
+ (fig_diff, axes_diff): Matplotlib figure/axes for diffusion comparison.
+ (fig_adv, axes_adv): Matplotlib figure/axes for advection comparison.
+ max_diff_error: Maximum absolute error in diffusion.
+ max_adv_error: Maximum absolute error in advection.
+ """
+ # Create 2D grid on [0, 2*pi) x [0, 2*pi)
+ L = 2.0 * np.pi
+ h = L / n_grid
+ x = np.linspace(0, L, n_grid, endpoint=False)
+ y = np.linspace(0, L, n_grid, endpoint=False)
+ X, Y = np.meshgrid(x, y)
+
+ # Analytical fields
+ k_squared = kx**2 + ky**2
+ psi_analytical = np.sin(kx * X) * np.sin(ky * Y)
+ omega_analytical = k_squared * psi_analytical
+
+ # Analytical diffusion: nabla^2 omega = -(kx^2 + ky^2) * omega
+ diffusion_analytical = -k_squared * omega_analytical
+
+ # Analytical advection: J(psi, omega) = 0 (since omega = c * psi)
+ advection_analytical = np.zeros_like(omega_analytical)
+
+ # Create Warp arrays
+ omega_wp = wp.array(omega_analytical.astype(np.float32), dtype=wp.float32)
+ psi_wp = wp.array(psi_analytical.astype(np.float32), dtype=wp.float32)
+ advection_wp = wp.zeros((n_grid, n_grid), dtype=wp.float32)
+ diffusion_wp = wp.zeros((n_grid, n_grid), dtype=wp.float32)
+
+ # Launch user-provided kernel
+ wp.launch(
+ advection_diffusion_kernel,
+ dim=(n_grid, n_grid),
+ inputs=[omega_wp, psi_wp, advection_wp, diffusion_wp, float(h), n_grid],
+ )
+ wp.synchronize()
+
+ # Get numerical results
+ diffusion_numerical = diffusion_wp.numpy()
+ advection_numerical = advection_wp.numpy()
+
+ # --- Diffusion comparison plot (1x2 panel) ---
+ fig_diff, axes_diff = plt.subplots(1, 2, figsize=figsize)
+
+ vmin_d = min(diffusion_analytical.min(), diffusion_numerical.min())
+ vmax_d = max(diffusion_analytical.max(), diffusion_numerical.max())
+
+ im0 = axes_diff[0].imshow(
+ diffusion_analytical,
+ extent=[0, L, 0, L],
+ origin="lower",
+ cmap="RdBu_r",
+ vmin=vmin_d,
+ vmax=vmax_d,
+ )
+ axes_diff[0].set_title(r"Analytical $\nabla^2\omega$")
+ axes_diff[0].set_xlabel("x")
+ axes_diff[0].set_ylabel("y")
+ plt.colorbar(im0, ax=axes_diff[0], shrink=0.8)
+
+ im1 = axes_diff[1].imshow(
+ diffusion_numerical,
+ extent=[0, L, 0, L],
+ origin="lower",
+ cmap="RdBu_r",
+ vmin=vmin_d,
+ vmax=vmax_d,
+ )
+ axes_diff[1].set_title(rf"Numerical Diffusion")
+ axes_diff[1].set_xlabel("x")
+ axes_diff[1].set_ylabel("y")
+ plt.colorbar(im1, ax=axes_diff[1], shrink=0.8)
+
+ fig_diff.suptitle(f"Diffusion Validation (kx={kx}, ky={ky}, N={n_grid})", y=1.02)
+ fig_diff.tight_layout()
+
+ # --- Advection comparison plot (1x2 panel) ---
+ fig_adv, axes_adv = plt.subplots(1, 2, figsize=figsize)
+
+ # For advection, analytical is zero, so set symmetric limits around numerical
+ adv_max = max(np.abs(advection_numerical).max(), 1e-10)
+ vmin_a, vmax_a = -adv_max, adv_max
+
+ im2 = axes_adv[0].imshow(
+ advection_analytical,
+ extent=[0, L, 0, L],
+ origin="lower",
+ cmap="RdBu_r",
+ vmin=vmin_a,
+ vmax=vmax_a,
+ )
+ axes_adv[0].set_title(r"Analytical Advection (= 0)")
+ axes_adv[0].set_xlabel("x")
+ axes_adv[0].set_ylabel("y")
+ plt.colorbar(im2, ax=axes_adv[0], shrink=0.8)
+
+ im3 = axes_adv[1].imshow(
+ advection_numerical,
+ extent=[0, L, 0, L],
+ origin="lower",
+ cmap="RdBu_r",
+ vmin=vmin_a,
+ vmax=vmax_a,
+ )
+ axes_adv[1].set_title(rf"Numerical Advection")
+ axes_adv[1].set_xlabel("x")
+ axes_adv[1].set_ylabel("y")
+ plt.colorbar(im3, ax=axes_adv[1], shrink=0.8)
+
+ fig_adv.suptitle(f"Advection Validation (kx={kx}, ky={ky}, N={n_grid})", y=1.02)
+ fig_adv.tight_layout()
+
+ return (fig_diff, axes_diff), (fig_adv, axes_adv)