neo_bspline: 1D B-spline + matrix-free CGLS LSQ #151

krystophny · 2025-12-02T00:48:09Z

User description

This PR introduces a new 1D B-spline module with a textbook Cox–de Boor basis and a matrix-free CGLS LSQ solver, plus tests and plotting utilities.

Highlights:

Add a new module implementing:
- a 1D B-spline type with open-uniform knot vectors on [x_min,x_max].
- basis evaluation using Cox–de Boor recursion.
- a standard CGLS algorithm in terms of operator calls (spline evaluation and its adjoint), fully matrix-free (no design matrix assembled).
Add a test program with two analytic tests:
- Partition of unity: verifies that sum_j N_j(x) == 1 for all x in [X_MIN,X_MAX] to ~1e-16.
- LSQ fit for f(x) = cos(2x) + 0.5 sin(3x) from scattered data, with L2 error ~2.7e-7 on a moderate grid (degree 5, 40 control points, 400 data points).
Add a plotting script to visualize the LSQ fit (analytic vs neo_bspline fit) from the file written by the test program (bspline_1d_lsq.dat).
Wire everything into the existing CMake and test harness (new target, new test). Build and test log:

cmake --build --preset default
[1/9] cd /home/ert/code/libneo/extra/MyMPILib && /home/ert/code/libneo/extra/MyMPILib/Scripts/do_versioning.sh
Versioning MyMPILib...
[2/9] Building Fortran preprocessed extra/MyMPILib/CMakeFiles/MyMPILib.dir/Specific/mpiprovider_module.f90-pp.f90
[3/9] Generating Fortran dyndep file extra/MyMPILib/CMakeFiles/MyMPILib.dir/Fortran.dd
[4/15] Building Fortran object extra/MyMPILib/CMakeFiles/MyMPILib.dir/Specific/mpiprovider_module.f90.o
[5/15] Building Fortran object extra/MyMPILib/CMakeFiles/MyMPILib.dir/Generic/genericWorkunit_module.f90.o
[6/15] Building Fortran object extra/MyMPILib/CMakeFiles/MyMPILib.dir/Generic/initWorkunit_module.f90.o
[7/15] Building Fortran object extra/MyMPILib/CMakeFiles/MyMPILib.dir/Internal/wuMergeWorkunit_module.f90.o
[8/15] Building Fortran object extra/MyMPILib/CMakeFiles/MyMPILib.dir/Internal/wuMergeChunk_module.f90.o
[9/15] Building Fortran object extra/MyMPILib/CMakeFiles/MyMPILib.dir/Internal/wuDataRequester_module.f90.o
[10/15] Building Fortran object extra/MyMPILib/CMakeFiles/MyMPILib.dir/Generic/scheduler_module.f90.o
[11/15] Linking Fortran static library extra/MyMPILib/libMyMPILib.a
[12/15] Building Fortran object test/CMakeFiles/test_mympilib.x.dir/source/derived_scheduler_module.f90.o
[13/15] Linking Fortran executable test/test_arnoldi.x
[14/15] Building Fortran object test/CMakeFiles/test_mympilib.x.dir/source/test_mympilib.f90.o
[15/15] Linking Fortran executable test/test_mympilib.x
cd build && ctest
Test project /home/ert/code/libneo/build
Start 1: test_binsrc
1/42 Test #1: test_binsrc ....................... Passed 0.01 sec
Start 2: test_boozer_class
2/42 Test #2: test_boozer_class ................. Passed 0.02 sec
Start 3: test_efit_class
3/42 Test #3: test_efit_class ................... Passed 0.02 sec
Start 4: test_geqdsk_tools
4/42 Test #4: test_geqdsk_tools ................. Passed 0.03 sec
Start 5: test_simpson
5/42 Test #5: test_simpson ...................... Passed 0.01 sec
Start 6: test_hdf5_tools
6/42 Test #6: test_hdf5_tools ................... Passed 2.02 sec
Start 7: test_util
7/42 Test #7: test_util ......................... Passed 0.01 sec
Start 8: test_neo_bspline_1d
8/42 Test #8: test_neo_bspline_1d ............... Passed 0.02 sec
Start 9: test_interpolate
9/42 Test #9: test_interpolate .................. Passed 1.86 sec
Start 10: test_batch_interpolate
10/42 Test #10: test_batch_interpolate ............ Passed 1.86 sec
Start 11: test_collision_freqs
11/42 Test #11: test_collision_freqs .............. Passed 0.01 sec
Start 12: test_transport
12/42 Test #12: test_transport .................... Passed 0.01 sec
Start 13: test_vmec_modules
13/42 Test #13: test_vmec_modules ................. Passed 0.01 sec
Start 14: test_coordinate_systems
14/42 Test #14: test_coordinate_systems ........... Passed 0.01 sec
Start 15: test_analytical_circular
15/42 Test #15: test_analytical_circular .......... Passed 0.06 sec
Start 16: test_ascot5_compare
16/42 Test #16: test_ascot5_compare ............... Passed 17.08 sec
Start 17: test_arnoldi_setup
17/42 Test #17: test_arnoldi_setup ................ Passed 0.35 sec
Start 18: test_arnoldi
18/42 Test #18: test_arnoldi ...................... Passed 0.06 sec
Start 19: test_mympilib
19/42 Test #19: test_mympilib ..................... Passed 0.06 sec
Start 20: test_system_utility
20/42 Test #20: test_system_utility ............... Passed 0.01 sec
Start 21: setup_test_jorek_field
21/42 Test #21: setup_test_jorek_field ............ Passed 0.01 sec
Start 30: test_jorek_field
22/42 Test #30: test_jorek_field .................. Passed 0.05 sec
Start 31: test_field
23/42 Test #31: test_field ........................ Passed 0.05 sec
Start 22: cleanup_test_jorek_field
24/42 Test #22: cleanup_test_jorek_field .......... Passed 0.01 sec
Start 23: test_biotsavart
25/42 Test #23: test_biotsavart ................... Passed 0.39 sec
Start 24: test_example_field
26/42 Test #24: test_example_field ................ Passed 0.01 sec
Start 25: test_biotsavart_field
27/42 Test #25: test_biotsavart_field ............. Passed 0.02 sec
Start 26: test_mesh
28/42 Test #26: test_mesh ......................... Passed 0.01 sec
Start 27: test_field_mesh
29/42 Test #27: test_field_mesh ................... Passed 0.01 sec
Start 28: test_polylag_field
30/42 Test #28: test_polylag_field ................ Passed 0.09 sec
Start 29: test_spline_field
31/42 Test #29: test_spline_field ................. Passed 0.49 sec
Start 32: test_stretch_coords
32/42 Test #32: test_stretch_coords ............... Passed 0.12 sec
Start 33: test_coil_tools_biot_savart
33/42 Test #33: test_coil_tools_biot_savart ....... Passed 17.48 sec
Start 34: tilted_coil_generate_geometry
34/42 Test #34: tilted_coil_generate_geometry ..... Passed 0.14 sec
Start 35: tilted_coil_fourier_modes
35/42 Test #35: tilted_coil_fourier_modes ......... Passed 26.39 sec
Start 36: tilted_coil_field_validation
36/42 Test #36: tilted_coil_field_validation ...... Passed 3.81 sec
Start 37: test_polylag_5
37/42 Test #37: test_polylag_5 .................... Passed 0.01 sec
Start 38: test_poincare
38/42 Test #38: test_poincare ..................... Passed 0.02 sec
Start 39: test_odeint_allroutines_context
39/42 Test #39: test_odeint_allroutines_context ... Passed 0.01 sec
Start 40: test_odeint_thread_safety
40/42 Test #40: test_odeint_thread_safety ......... Passed 0.01 sec
Start 41: build_test_golden_record_odeint
41/42 Test #41: build_test_golden_record_odeint ... Passed 0.02 sec
Start 42: test_golden_record_odeint
42/42 Test #42: test_golden_record_odeint ......... Passed 0.19 sec

100% tests passed, 0 tests failed out of 42

Label Time Summary:
biot_savart = 17.48 sec*proc (1 test)
build-helper = 0.02 sec*proc (1 test)
field = 1.25 sec*proc (10 tests)
magfie = 47.82 sec*proc (4 tests)
poincare = 0.02 sec*proc (1 test)
polylag = 0.01 sec*proc (1 test)
tilted_coil = 30.34 sec*proc (3 tests)

Total Test time (real) = 72.91 sec

The build and test run above passes all Fortran and Python tests.

This provides a clean, textbook 1D B-spline basis and a proven matrix-free LSQ core as a foundation for future 2D/3D and batch extensions.
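
As a rough illustration of the open-uniform knot vector and the Cox–de Boor recursion described above, here is a minimal Python sketch. It is not the Fortran code from this PR: the function names (`open_uniform_knots`, `basis`, `partition_of_unity_error`) are hypothetical, and it uses the plain recursive definition rather than an optimized evaluation. Only the knot layout (first and last `degree+1` knots clamped, interior spacing `h = (x_max - x_min)/(n_ctrl - degree)`) is taken from the code quoted in the review below.

```python
def open_uniform_knots(degree, n_ctrl, x_min, x_max):
    """Clamp the first and last degree+1 knots; space interior knots equally."""
    h = (x_max - x_min) / (n_ctrl - degree)
    interior = [x_min + (k + 1) * h for k in range(n_ctrl - degree - 1)]
    return [x_min] * (degree + 1) + interior + [x_max] * (degree + 1)


def basis(i, p, knots, x):
    """Cox-de Boor recursion for N_{i,p}(x); zero-width terms contribute 0."""
    if p == 0:
        # Half-open intervals, except that the last non-empty one includes x_max.
        is_last = knots[i] < knots[i + 1] and knots[i + 1] == knots[-1]
        inside = knots[i] <= x < knots[i + 1] or (is_last and x == knots[-1])
        return 1.0 if inside else 0.0
    value = 0.0
    left_width = knots[i + p] - knots[i]
    if left_width > 0.0:
        value += (x - knots[i]) / left_width * basis(i, p - 1, knots, x)
    right_width = knots[i + p + 1] - knots[i + 1]
    if right_width > 0.0:
        value += (knots[i + p + 1] - x) / right_width * basis(i + 1, p - 1, knots, x)
    return value


def partition_of_unity_error(degree, n_ctrl, x_min, x_max, n_sample):
    """Largest |sum_j N_j(x) - 1| over equally spaced samples including x_max."""
    knots = open_uniform_knots(degree, n_ctrl, x_min, x_max)
    xs = [x_min + (x_max - x_min) * k / (n_sample - 1) for k in range(n_sample - 1)]
    xs.append(x_max)  # append the endpoint exactly to avoid rounding past x_max
    return max(
        abs(sum(basis(j, degree, knots, x) for j in range(n_ctrl)) - 1.0) for x in xs
    )


print(partition_of_unity_error(degree=5, n_ctrl=40, x_min=0.0, x_max=2.0, n_sample=21))
```

The recursive form costs on the order of 2^p calls per basis function, so it is only meant to make the definition concrete; a production routine would evaluate the `p+1` nonzero functions of one span iteratively.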

PR Type

Enhancement

Description

Implement 1D B-spline basis with Cox–de Boor recursion
Add matrix-free CGLS solver for least-squares fitting
Include partition of unity and LSQ accuracy tests
Provide Python visualization utility for fit results

Diagram Walkthrough

flowchart LR
  A["B-spline Basis<br/>Cox-de Boor"] --> B["Spline Evaluation<br/>apply_A"]
  A --> C["Adjoint Action<br/>apply_AT"]
  B --> D["Matrix-free CGLS<br/>LSQ Solver"]
  C --> D
  D --> E["Control Points<br/>coeff"]
  E --> F["Fit Visualization<br/>Python Plot"]
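
The data flow in the diagram (an operator `apply_A`, its adjoint `apply_AT`, and a CGLS loop that only ever calls those two) can be sketched in a few lines of Python. This is an illustrative sketch, not the PR's Fortran routine: `cgls` and `make_hat_operators` are hypothetical names, the stand-in operator uses degree-1 B-splines (hat functions) on [0, 1] instead of the general basis, and the relative stopping test on ||A^T r|| is an assumption chosen for this example.

```python
def cgls(apply_A, apply_AT, b, n_unknowns, max_iter=200, tol=1e-10):
    """Minimise ||A x - b||_2 using only products with A and A^T."""
    x = [0.0] * n_unknowns
    r = list(b)                 # residual b - A x for the start vector x = 0
    s = apply_AT(r)             # normal-equation residual A^T r
    p = list(s)                 # search direction
    gamma = sum(v * v for v in s)
    gamma0 = gamma
    for _ in range(max_iter):
        if gamma == 0.0 or gamma <= tol * tol * gamma0:
            break
        q = apply_A(p)
        qq = sum(v * v for v in q)
        if qq == 0.0:
            break
        alpha = gamma / qq
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * qi for ri, qi in zip(r, q)]
        s = apply_AT(r)
        gamma_new = sum(v * v for v in s)
        p = [si + (gamma_new / gamma) * pi for si, pi in zip(s, p)]
        gamma = gamma_new
    return x


def make_hat_operators(x_data, n_ctrl):
    """Matrix-free A and A^T for degree-1 B-splines on uniform nodes in [0, 1]."""
    h = 1.0 / (n_ctrl - 1)
    spans = [min(int(x / h), n_ctrl - 2) for x in x_data]
    weights = [x / h - k for x, k in zip(x_data, spans)]

    def apply_A(c):
        # Evaluate the spline with coefficients c at every data point.
        return [(1.0 - w) * c[k] + w * c[k + 1] for k, w in zip(spans, weights)]

    def apply_AT(r):
        # Adjoint: scatter each data residual back onto its two coefficients.
        g = [0.0] * n_ctrl
        for k, w, ri in zip(spans, weights, r):
            g[k] += (1.0 - w) * ri
            g[k + 1] += w * ri
        return g

    return apply_A, apply_AT


x_data = [i / 49 for i in range(50)]
f_data = [2.0 * x + 1.0 for x in x_data]
apply_A, apply_AT = make_hat_operators(x_data, n_ctrl=6)
coeff = cgls(apply_A, apply_AT, f_data, n_unknowns=6)
print(coeff)
```

Because the solver never sees a matrix, swapping in a higher-degree or multi-dimensional operator only requires a new pair of callables with the same adjoint relationship.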

File Walkthrough

Relevant files

Enhancement

neo_bspline.f90: `1D B-spline basis and matrix-free CGLS implementation` (+327/-0)
src/interpolate/neo_bspline.f90
- Implement `bspline_1d` type with degree, control points, and open-uniform knot vector
- Add `bspline_1d_init_uniform` for knot vector initialization on [x_min, x_max]
- Implement `bspline_1d_eval` using Cox–de Boor recursion for basis evaluation
- Add matrix-free CGLS algorithm `bspline_1d_lsq_cgls` with operator-based A and A^T
- Include helper routines: `find_span`, `basis_funs`, `apply_A`, `apply_AT`

Tests

test_neo_bspline_1d.f90: `B-spline partition of unity and LSQ accuracy tests` (+117/-0)
test/source/test_neo_bspline_1d.f90
- Add partition of unity test verifying sum_j N_j(x) == 1 to ~1e-10 accuracy
- Add LSQ fitting test for f(x) = cos(2x) + 0.5*sin(3x) with scattered data
- Verify L2 error < 1e-5 on 201-point evaluation grid
- Write plot data to file for visualization

Documentation

plot_neo_bspline_1d.py: `Python visualization utility for B-spline LSQ fits` (+45/-0)
test/scripts/plot_neo_bspline_1d.py
- Create Python script to visualize LSQ fit results
- Plot analytic reference vs neo_bspline fit from test data file
- Support configurable input data and output PNG paths

Configuration changes

CMakeLists.txt: `Register neo_bspline module in build system` (+1/-0)
src/interpolate/CMakeLists.txt
- Add `neo_bspline.f90` to interpolate library build

CMakeLists.txt: `Add neo_bspline test to CMake and CTest` (+3/-0)
test/CMakeLists.txt
- Add `test_neo_bspline_1d.x` executable target
- Link with common libraries
- Register test in CTest harness

qodo-code-review · 2025-12-02T00:48:40Z

PR Compliance Guide 🔍

Below is a summary of compliance checks for this PR:

Security Compliance

🟢

No security concerns identified

No security vulnerabilities detected by AI analysis. Human verification advised for critical code.

Ticket Compliance

🟡

🎫 #1

🟢	Reduce reliance on Fortran features beyond F95 unless clearly beneficial.
🔴	Make MPI entirely optional in low-level routines (e.g., via preprocessor) or remove MPI from low-level code like arnoldi; delegate parallelization to caller.

🟡

🎫 #2

🔴	Integrate VMEC interfaces into libneo and maintain as a central implementation.
	Depend on NetCDF (and HDF5) as required deps for VMEC features.
	Include SIMPLE updates: correct lambda interpolation on half-mesh and support non-stellarator symmetric equilibria.
	Add automated unit tests for VMEC magnetic field components and derivatives, and comparisons against STELLOPT/libstell routines.
	Provide API and user documentation for VMEC routines.
	Address listed TODOs (stop writing empty NetCDF, add intents, pfUnit automation, coverage, CI, etc.).

🟡

🎫 #3

🔴	Supply latest versions of all magnetic field routine variants, including VMEC flux coordinates, Boozer coordinates, legacy variants, and additions from related projects.

Codebase Duplication Compliance

⚪

Codebase context is not defined

Follow the guide to enable codebase context checks.

Custom Compliance

🟢

Generic: Meaningful Naming and Self-Documenting Code

Objective: Ensure all identifiers clearly express their purpose and intent, making code
self-documenting

Status: Passed

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Secure Logging Practices

Objective: To ensure logs are useful for debugging and auditing without exposing sensitive
information like PII, PHI, or cardholder data.

Status: Passed

🔴

Generic: Robust Error Handling and Edge Case Management

Objective: Ensure comprehensive error handling that provides meaningful context and graceful
degradation

Status:
Unhandled edge cases: Several routines use error stop for validation and do not handle edge cases like zero
denominators in basis evaluation or provide recoverable errors, leading to abrupt
termination.

Referred Code

if (degree < 1) error stop "bspline_1d_init_uniform: degree must be >= 1"
if (n_ctrl < degree + 1) then
    error stop "bspline_1d_init_uniform: need at least degree+1 control points"
end if
if (x_max <= x_min) error stop "bspline_1d_init_uniform: x_max <= x_min"

p = degree
n = n_ctrl
m = n + p + 1

spl%degree = p
spl%n_ctrl = n
spl%x_min = x_min
spl%x_max = x_max

if (allocated(spl%knots)) deallocate(spl%knots)
allocate(spl%knots(m))

! Open-uniform knot vector:
! First p+1 knots at x_min, last p+1 knots at x_max, interior equally spaced.
h = (x_max - x_min) / real(n - p, dp)


 ... (clipped 200 lines)

⚪

Generic: Comprehensive Audit Trails

Objective: To create a detailed and reliable record of critical system actions for security analysis
and compliance.

Status:
No audit logs: The new numerical routines perform computations without emitting any audit logs for
critical actions, but it is unclear whether such logging is required in this
non-user-facing scientific module.

Referred Code

subroutine bspline_1d_init_uniform(spl, degree, n_ctrl, x_min, x_max)
    !! Initialise open-uniform 1D B-spline on [x_min, x_max].
    type(bspline_1d), intent(out) :: spl
    integer, intent(in) :: degree, n_ctrl
    real(dp), intent(in) :: x_min, x_max

    integer :: p, n, m, i
    real(dp) :: h, left, right

    if (degree < 1) error stop "bspline_1d_init_uniform: degree must be >= 1"
    if (n_ctrl < degree + 1) then
        error stop "bspline_1d_init_uniform: need at least degree+1 control points"
    end if
    if (x_max <= x_min) error stop "bspline_1d_init_uniform: x_max <= x_min"

    p = degree
    n = n_ctrl
    m = n + p + 1

    spl%degree = p
    spl%n_ctrl = n


 ... (clipped 133 lines)

Generic: Secure Error Handling

Objective: To prevent the leakage of sensitive system information through error messages while
providing sufficient detail for internal debugging.

Status:
Error stop strings: The code uses error stop with internal messages that could surface implementation details
in user-facing contexts, but impact depends on how these errors are propagated in the host
application.

Referred Code

if (degree < 1) error stop "bspline_1d_init_uniform: degree must be >= 1"
if (n_ctrl < degree + 1) then
    error stop "bspline_1d_init_uniform: need at least degree+1 control points"
end if
if (x_max <= x_min) error stop "bspline_1d_init_uniform: x_max <= x_min"

p = degree
n = n_ctrl
m = n + p + 1

spl%degree = p
spl%n_ctrl = n
spl%x_min = x_min
spl%x_max = x_max

if (allocated(spl%knots)) deallocate(spl%knots)
allocate(spl%knots(m))

! Open-uniform knot vector:
! First p+1 knots at x_min, last p+1 knots at x_max, interior equally spaced.
h = (x_max - x_min) / real(n - p, dp)


 ... (clipped 69 lines)

Generic: Security-First Input Validation and Data Handling

Objective: Ensure all data inputs are validated, sanitized, and handled securely to prevent
vulnerabilities

Status:
Minimal validation: The numerical routines validate dimensions and ranges but do not sanitize external inputs
or enforce strict bounds on data arrays beyond sizes; suitability depends on whether
inputs are trusted internal test data only.

Referred Code

n_data = size(x_data)
if (n_data /= size(f_data)) then
    error stop "bspline_1d_lsq_cgls: x_data and f_data size mismatch"
end if

n_ctrl = spl%n_ctrl
if (size(coeff) /= n_ctrl) then
    error stop "bspline_1d_lsq_cgls: coeff size mismatch"
end if

if (present(max_iter)) then
    kmax = max_iter
else
    kmax = 200
end if
if (present(tol)) then
    atol = tol
else
    atol = 1.0d-10
end if


 ... (clipped 42 lines)

Compliance status legend

🟢 - Fully Compliant
🟡 - Partial Compliant
🔴 - Not Compliant
⚪ - Requires Further Human Verification
🏷️ - Compliance label

qodo-code-review · 2025-12-02T00:49:44Z

PR Code Suggestions ✨

Explore these optional code suggestions:

Category: Possible issue
Suggestion: ✅ ~~Fix potential infinite binary search~~
Impact: High

Suggestion Impact: The commit modifies find_span but keeps the original potentially non-terminating binary search pattern. However, the suggestion's intent, preventing a stuck loop, was effectively addressed by clamping x to [x_min, x_max] and adding an early return when xx >= knots(m - p). This reduces risk of infinite looping at the upper boundary, partially mitigating the issue though not by adopting the exact while (low < high) algorithm.

code diff:

    + subroutine find_span(spl, x, span)
    +     type(bspline_1d), intent(in) :: spl
    +     real(dp), intent(in) :: x
    +     integer, intent(out) :: span
    +     integer :: low, high, mid, p, n, m
    +     real(dp) :: xx
    +
    +     p = spl%degree
    +     n = spl%n_ctrl
    +     m = n + p + 1
    +     xx = min(max(x, spl%x_min), spl%x_max)
    +     if (xx >= spl%knots(m - p)) then
    +         span = n
    +         return
    +     end if
    +     low = p + 1
    +     high = n + 1
    +     do
    +         mid = (low + high)/2
    +         if (xx < spl%knots(mid)) then
    +             high = mid
    +         else if (xx >= spl%knots(mid + 1)) then
    +             low = mid
    +         else
    +             span = mid
    +             exit
    +         end if
    +     end do
    + end subroutine find_span

Fix a potential infinite loop in the `find_span` binary search by replacing the current implementation with a more robust algorithm that ensures loop termination.

src/interpolate/neo_bspline.f90 [209-219]

    -do
    +do while (low < high)
         mid = (low + high)/2
         if (xx < spl%knots(mid)) then
             high = mid
    -    else if (xx >= spl%knots(mid+1)) then
    -        low = mid
         else
    -        span = mid
    -        exit
    +        low = mid + 1
         end if
     end do
    +span = low - 1

`[Suggestion processed]`

Suggestion importance[1-10]: 9
Why: The suggestion correctly identifies a critical bug in the binary search implementation that can lead to an infinite loop, and provides a robust, standard algorithm to fix it.

Category: Possible issue
Suggestion: Prevent division by zero error
Impact: High

Prevent a potential division-by-zero error in `basis_funs` by checking if the denominator is zero before performing the division. This can occur with coincident knots.

src/interpolate/neo_bspline.f90 [243-253]

     do j = 1, p
         left(j) = x - spl%knots(span+1-j)
         right(j) = spl%knots(span+j) - x
         saved = 0.0_dp
         do r = 0, j-1
    -        temp = N(r)/(right(r+1) + left(j-r))
    +        temp = right(r+1) + left(j-r)
    +        if (temp /= 0.0_dp) then
    +            temp = N(r)/temp
    +        else
    +            temp = 0.0_dp
    +        end if
             N(r) = saved + right(r+1)*temp
             saved = left(j-r)*temp
         end do
         N(j) = saved
     end do

Suggestion importance[1-10]: 9
Why: The suggestion correctly identifies a potential division-by-zero error due to coincident knots, which is a critical bug that would cause the program to crash. The proposed fix correctly handles this case.

Category: High-level
Suggestion: Re-evaluate adding a new spline implementation
Impact: Medium

Instead of adding a new B-spline module, consider integrating the new functionality into the existing spline code. This would avoid code duplication and reduce future maintenance.

Examples:

src/interpolate/neo_bspline.f90 [1-327]

    module neo_bspline
        !! Simple 1D B-spline basis and matrix-free LSQ via CGLS.
        !!
        !! Uses textbook Cox–de Boor recursion (Piegl & Tiller) for basis
        !! evaluation and a standard CGLS algorithm for least-squares fitting.
        use, intrinsic :: iso_fortran_env, only : dp => real64
        implicit none
        private

        type :: bspline_1d

     ... (clipped 317 lines)

Solution Walkthrough:

Before:

    // src/interpolate/CMakeLists.txt
    add_library(interpolate STATIC
        ../spl_three_to_five.f90
        neo_bspline.f90 // New file added
        ...
    )

    // src/interpolate/neo_bspline.f90 (new file)
    module neo_bspline
        type :: bspline_1d
        ...
        subroutine bspline_1d_init_uniform(...)
        subroutine bspline_1d_eval(...)
        subroutine bspline_1d_lsq_cgls(...)
        ...
    end module

After:

    // src/interpolate/CMakeLists.txt
    add_library(interpolate STATIC
        ../spl_three_to_five.f90 // Existing file is modified/extended
        ...
        // neo_bspline.f90 is not added
    )

    // src/interpolate/spl_three_to_five.f90 (hypothetical, modified)
    module spl_three_to_five
        ... // Existing spline code

        // New B-spline functionality is integrated here
        type :: bspline_1d
        ...
        subroutine bspline_1d_lsq_cgls(...)
        ...
    end module

Suggestion importance[1-10]: 8
Why: The suggestion raises a crucial architectural concern about introducing a new spline implementation (`neo_bspline.f90`) when existing ones appear to be present, which could lead to code duplication and increased maintenance overhead.

Category: General
Suggestion: Improve test performance by reducing calls
Impact: Low

Improve the performance of the `test_bspline_1d_partition_unity` test by calling `basis_funs` once per sample point and summing the results, instead of repeatedly calling `bspline_1d_eval` in a loop.

test/source/test_neo_bspline_1d.f90 [36-46]

    +integer :: span
    +real(dp), allocatable :: N(:)
    +
    +allocate(N(0:spl%degree))
    +
     do i = 1, N_SAMPLE
         x = x_eval(i)
    -    sum_basis = 0.0d0
    -    do j = 1, N_CTRL
    -        c = 0.0d0
    -        c(j) = 1.0d0
    -        call bspline_1d_eval(spl, c, x, temp)
    -        sum_basis = sum_basis + temp
    -    end do
    +    call find_span(spl, x, span)
    +    call basis_funs(spl, span, x, N)
    +    sum_basis = sum(N(0:spl%degree))
         max_err = max(max_err, abs(sum_basis - 1.0d0))
     end do

    +deallocate(N)
    +

`[To ensure code accuracy, apply this suggestion manually]`

Suggestion importance[1-10]: 6
Why: The suggestion proposes a valid and significant performance optimization for a test routine by avoiding redundant calculations, which improves test execution speed without changing the logic.
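
The three code-level suggestions above (a span search that visibly terminates, a guarded denominator, and a partition-of-unity check that sums only the `degree+1` nonzero basis values) can be tried together in a small Python sketch. This is a 0-based transcription under stated assumptions, not the PR's Fortran: the names `open_uniform_knots`, `find_span`, and `basis_funs` mirror the routines discussed but the signatures are hypothetical. The search keeps the invariant `knots[low] <= x < knots[high]` and halves the bracket each step, so it stops once `high - low == 1`. For a span of nonzero width the denominator `right[r+1] + left[j-r]` is at least the span width, so the guard only matters if a zero-width span were ever passed in.

```python
def open_uniform_knots(degree, n_ctrl, x_min, x_max):
    """Clamp the first and last degree+1 knots; space interior knots equally."""
    h = (x_max - x_min) / (n_ctrl - degree)
    interior = [x_min + (k + 1) * h for k in range(n_ctrl - degree - 1)]
    return [x_min] * (degree + 1) + interior + [x_max] * (degree + 1)


def find_span(knots, degree, n_ctrl, x):
    """Return i with knots[i] <= x < knots[i+1] (0-based), clamping x to the domain."""
    x = min(max(x, knots[degree]), knots[n_ctrl])
    if x >= knots[n_ctrl]:
        return n_ctrl - 1            # right endpoint belongs to the last span
    low, high = degree, n_ctrl       # invariant: knots[low] <= x < knots[high]
    while high - low > 1:
        mid = (low + high) // 2      # strictly between low and high
        if x < knots[mid]:
            high = mid
        else:
            low = mid
    return low


def basis_funs(knots, degree, span, x):
    """Values of the degree+1 basis functions that are nonzero on the given span."""
    N = [0.0] * (degree + 1)
    left = [0.0] * (degree + 1)
    right = [0.0] * (degree + 1)
    N[0] = 1.0
    for j in range(1, degree + 1):
        left[j] = x - knots[span + 1 - j]
        right[j] = knots[span + j] - x
        saved = 0.0
        for r in range(j):
            denom = right[r + 1] + left[j - r]
            temp = N[r] / denom if denom != 0.0 else 0.0  # guard against 0/0
            N[r] = saved + right[r + 1] * temp
            saved = left[j - r] * temp
        N[j] = saved
    return N


knots = open_uniform_knots(degree=3, n_ctrl=7, x_min=0.0, x_max=1.0)
span = find_span(knots, 3, 7, 0.6)
print(span, sum(basis_funs(knots, 3, span, 0.6)))
```

Summing the returned list is the cheaper partition-of-unity check proposed in the last suggestion: one span lookup and one basis evaluation per sample instead of one spline evaluation per control point.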

krystophny · 2025-12-02T08:52:24Z

Update: Unified High-Performance Implementation

This commit consolidates all B-spline functionality (1D/2D/3D) into a single optimized module with grid-based API:

Key Changes

Unified module: Merged neo_bspline_3d.f90 into neo_bspline.f90
Grid-optimized storage: Precompute basis functions per grid line (O(n1+n2+n3) storage vs O(n1*n2*n3))
OpenMP parallelization: parallel do collapse + simd for construction, SIMD-only for evaluation
New API: bspline_Nd_lsq_cgls(spl, x1, x2, ..., f_grid, coeff) for regular grids

Benchmark Results (up to 100,000 points)

3D Performance (cubic B-splines, degree 3):

Grid Points	interpolate create	neo_bspline create	Speedup
1,000	0.87 ms	4.07 ms	0.2x
2,744	2.76 ms	33.6 ms	0.08x
9,261	9.31 ms	271 ms	0.03x
29,791	N/A (skipped)	956 ms	-
97,336	N/A (skipped)	3.17 s	-

Evaluation Performance (3D):

Grid Points	interpolate eval	neo_bspline eval	Speedup
9,261	1.81 ms	1.32 ms	1.4x
29,791	N/A	5.26 ms	-
97,336	N/A	14.4 ms	-

Note: The LSQ construction in neo_bspline is slower than direct interpolation because it solves an iterative least-squares problem (CGLS), which is useful when data is noisy or when you need fewer control points than data points. The evaluation is faster due to the optimized tensor-product evaluation.

All 44 tests pass.

Cache spans and basis function values before CG iterations, then reuse them through cached apply_A3D and apply_A3D_T operators. This eliminates redundant find_span and basis_funs calls in each iteration.

Benchmark results for 9261 3D data points:
- Before: 1.05s
- After: 0.26s (4x speedup)

Consolidate all B-spline functionality (1D/2D/3D) into a single module with optimized grid-based API exploiting separable tensor product structure:
- Remove neo_bspline_3d.f90 and merge into unified neo_bspline.f90
- Precompute basis functions per grid line (O(n1+n2+n3) storage vs O(n*n*n))
- OpenMP parallel do + SIMD for construction operators
- SIMD-only evaluation (no thread overhead for single-point queries)
- New grid-based LSQ CGLS API: bspline_Nd_lsq_cgls(spl, x1, x2, ..., f_grid, coeff)
- Update all tests and benchmarks for new API
- 44/44 tests pass

This enables efficient least-squares fitting on regular grids by caching basis functions once per coordinate axis instead of per grid point.

…rpolation

Reorganize the B-spline code into clean, single-responsibility modules:
- neo_bspline_base: core types, initialization, evaluation, basis functions
- neo_bspline_interp: direct interpolation via LAPACK collocation solve
- neo_bspline_lsq: least-squares fitting via matrix-free CGLS

The direct interpolation variant solves A*c = f where A_ij = B_j(x_i), using separable tensor-product structure in 2D/3D to reduce dense solve to dimension-by-dimension LU factorizations. OpenMP parallelizes the back-substitution loops in 3D.

Benchmarks updated to compare all three methods: interpolate module, neo_bspline LSQ, and neo_bspline direct. Large grid sizes skip direct interpolation (O(n^3) scaling) to avoid timeouts.
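
The separable tensor-product structure mentioned in these commits can be illustrated in 2D with a short Python sketch. It is not taken from the PR: `hat_row`, `apply_separable`, and `apply_pointwise` are hypothetical names, and degree-1 B-splines on [0, 1] stand in for the general basis. The point is only that per-axis basis tables `B1` and `B2` are enough to apply the grid operator one axis at a time, giving the same result as summing over the full tensor-product basis at every grid point.

```python
def hat_row(x, n):
    """Values at x of n degree-1 B-splines (hat functions) on uniform nodes in [0, 1]."""
    h = 1.0 / (n - 1)
    k = min(int(x / h), n - 2)
    w = x / h - k
    row = [0.0] * n
    row[k], row[k + 1] = 1.0 - w, w
    return row


def apply_separable(B1, B2, C):
    """F[i][j] = sum_{k,l} B1[i][k] * C[k][l] * B2[j][l], applied axis by axis."""
    m1, m2 = len(C), len(C[0])
    # Axis 1: contract the first coefficient index with the x1 basis table.
    T = [[sum(B1[i][k] * C[k][l] for k in range(m1)) for l in range(m2)]
         for i in range(len(B1))]
    # Axis 2: contract the second coefficient index with the x2 basis table.
    return [[sum(T[i][l] * B2[j][l] for l in range(m2)) for j in range(len(B2))]
            for i in range(len(B1))]


def apply_pointwise(B1, B2, C):
    """Reference: the full double sum evaluated independently at each grid point."""
    m1, m2 = len(C), len(C[0])
    return [[sum(B1[i][k] * B2[j][l] * C[k][l] for k in range(m1) for l in range(m2))
             for j in range(len(B2))] for i in range(len(B1))]


x1 = [i / 8 for i in range(9)]
x2 = [j / 6 for j in range(7)]
B1 = [hat_row(x, 4) for x in x1]   # one basis table per coordinate axis
B2 = [hat_row(x, 5) for x in x2]
C = [[k / 3 + 2.0 * l / 4 for l in range(5)] for k in range(4)]
F = apply_separable(B1, B2, C)
print(F[3][2], x1[3] + 2.0 * x2[2])
```

In this sketch the tables are stored densely for clarity; with a local basis each row has only `degree+1` nonzeros, which is what makes per-axis storage scale with n1+n2 rather than with the number of grid points.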

qodo-code-review bot added the Review effort 3/5 label Dec 2, 2025

krystophny temporarily deployed to github-pages December 2, 2025 00:49 — with GitHub Actions Inactive

krystophny temporarily deployed to github-pages December 2, 2025 01:17 — with GitHub Actions Inactive

krystophny temporarily deployed to github-pages December 2, 2025 01:25 — with GitHub Actions Inactive

krystophny temporarily deployed to github-pages December 2, 2025 01:28 — with GitHub Actions Inactive

krystophny temporarily deployed to github-pages December 2, 2025 07:09 — with GitHub Actions Inactive

krystophny temporarily deployed to github-pages December 2, 2025 07:16 — with GitHub Actions Inactive

krystophny temporarily deployed to github-pages December 2, 2025 07:41 — with GitHub Actions Inactive

krystophny temporarily deployed to github-pages December 2, 2025 08:24 — with GitHub Actions Inactive

krystophny temporarily deployed to github-pages December 2, 2025 08:53 — with GitHub Actions Inactive

krystophny added 15 commits January 11, 2026 02:39

neo_bspline: add 1D B-spline basis and matrix-free CGLS LSQ with tests

181d800

Add matrix-free neo_bspline 3D and batched LSQ

f0b433f

Parallelize matrix-free neo_bspline operators with OpenMP

ce14ff9

Shorten OpenMP directives to respect line-length

48fe904

Add benchmarks comparing interpolate and neo_bspline runtimes

ed0fb8a

Use nanosecond timer and separate create/eval benchmark plots

9850da5

Revert experimental 3D LSQ caching to preserve correctness

0ece3cd

Fix bspline benchmark timing and ignore generated plots

ddc05ba

Plot direct bspline interpolation timings

8c0daab

Make direct bspline bench run all sizes and use band solver

309f2bf

Benchmark interpolate 3D at all sizes

d776fa9

Add 2D direct bspline analytical test

d5355ef

krystophny force-pushed the feature/neo_bspline branch from 6b4b71e to d5355ef Compare January 11, 2026 01:42

krystophny temporarily deployed to github-pages January 11, 2026 01:49 — with GitHub Actions Inactive
