@JohnCari commented Dec 28, 2025

Summary

PostgreSQL's planner can choose a Nested Loop join when joining foreign tables. During a Nested Loop join, the inner (right) table is scanned multiple times, once for each row of the outer table. The FDW framework calls re_scan() to restart the inner scan for each outer row.

The ClickHouse FDW was using the default no-op re_scan() implementation, so joins returned incomplete results. After the first scan completed, every rescan came back empty because:

  • is_scan_complete was still true
  • row_receiver channel was exhausted
  • No new streaming task was spawned
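
The failure mode can be reproduced in a toy model. The sketch below is not the FDW code: `ToyScan`, `iter_scan`, `re_scan_noop`, `re_scan_reset`, and `nested_loop_join` are hypothetical names, and an in-memory vector stands in for the ClickHouse stream. It only illustrates why a no-op rescan leaves a Nested Loop join with matches for the first outer row alone.

```rust
/// Toy stand-in for a streaming foreign scan. The flag name mirrors the PR
/// text; the struct itself is hypothetical.
struct ToyScan {
    source: Vec<i64>,
    cursor: usize,
    is_scan_complete: bool,
}

impl ToyScan {
    fn new(source: Vec<i64>) -> Self {
        ToyScan { source, cursor: 0, is_scan_complete: false }
    }

    /// Return the next row, or None once the stream is exhausted.
    fn iter_scan(&mut self) -> Option<i64> {
        if self.is_scan_complete {
            return None;
        }
        match self.source.get(self.cursor).copied() {
            Some(value) => {
                self.cursor += 1;
                Some(value)
            }
            None => {
                self.is_scan_complete = true;
                None
            }
        }
    }

    /// Models the default behaviour described above: nothing is reset.
    fn re_scan_noop(&mut self) {}

    /// Models a working rescan: start the stream over.
    fn re_scan_reset(&mut self) {
        self.cursor = 0;
        self.is_scan_complete = false;
    }
}

/// Nested Loop inner join on equality: the inner side is rescanned once per
/// outer row.
fn nested_loop_join(
    outer: &[i64],
    inner: &mut ToyScan,
    reset_on_rescan: bool,
) -> Vec<(i64, i64)> {
    let mut joined = Vec::new();
    for &outer_row in outer {
        if reset_on_rescan {
            inner.re_scan_reset();
        } else {
            inner.re_scan_noop();
        }
        while let Some(inner_row) = inner.iter_scan() {
            if inner_row == outer_row {
                joined.push((outer_row, inner_row));
            }
        }
    }
    joined
}
```

Joining `[1, 2, 3]` against a `ToyScan` over the same values with `reset_on_rescan = false` yields only `(1, 1)`, because the inner scan is already complete when the second outer row arrives. With `reset_on_rescan = true` all three pairs come back.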

This fix implements re_scan() to restart the async streaming by:

  • Aborting existing streaming task if running
  • Resetting scan state flags
  • Reinitializing the bounded channel
  • Spawning new streaming task with the same SQL query
  • Fetching the first row to initialize the rescan
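
The restart sequence above can be sketched with standard-library primitives. This is an illustration under stated assumptions, not the code in this PR: the real FDW streams asynchronously, whereas the sketch uses `std::thread` and `std::sync::mpsc::sync_channel` to stay self-contained. `StreamingScan`, `spawn_stream`, `begin`, `next_row`, and `current_row` are hypothetical, and the stand-in producer always emits the rows 1, 2, 3. Because a thread cannot be aborted, dropping the receiver plays the role of the abort step: the producer's next `send` fails and the thread exits.

```rust
use std::sync::mpsc::{sync_channel, Receiver};
use std::thread;

/// Hypothetical scan state. `row_receiver` and `is_scan_complete` follow the
/// PR description; everything else is assumed for illustration.
struct StreamingScan {
    sql: String,
    row_receiver: Option<Receiver<i64>>,
    is_scan_complete: bool,
    current_row: Option<i64>,
}

/// Stand-in for the remote query: streams three rows regardless of `_sql`.
fn spawn_stream(_sql: &str) -> Receiver<i64> {
    let (sender, receiver) = sync_channel(2); // bounded channel
    thread::spawn(move || {
        for value in [1_i64, 2, 3] {
            // `send` fails once the receiver is dropped, which ends the task.
            if sender.send(value).is_err() {
                break;
            }
        }
    });
    receiver
}

impl StreamingScan {
    fn begin(sql: &str) -> Self {
        let mut scan = StreamingScan {
            sql: sql.to_string(),
            row_receiver: None,
            is_scan_complete: false,
            current_row: None,
        };
        scan.re_scan();
        scan
    }

    fn re_scan(&mut self) {
        // 1. Stop the old producer by dropping its receiver.
        self.row_receiver = None;
        // 2. Reset the scan state flags.
        self.is_scan_complete = false;
        // 3 and 4. New bounded channel and new producer for the same SQL.
        let receiver = spawn_stream(&self.sql);
        // 5. Fetch the first row so the next read has data ready.
        self.current_row = receiver.recv().ok();
        self.is_scan_complete = self.current_row.is_none();
        self.row_receiver = Some(receiver);
    }

    /// Hand out the prefetched row and prefetch the following one.
    fn next_row(&mut self) -> Option<i64> {
        let row = self.current_row.take()?;
        self.current_row = self
            .row_receiver
            .as_ref()
            .and_then(|receiver| receiver.recv().ok());
        if self.current_row.is_none() {
            self.is_scan_complete = true;
        }
        Some(row)
    }
}
```

In this sketch, draining a scan with `next_row()` until it returns `None` and then calling `re_scan()` makes the same rows available again, which is the behaviour a Nested Loop join relies on for its inner side.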

Test plan

  • Added integration test clickhouse_join_test that verifies inner, left, and right joins return correct results
  • All existing tests pass (cargo pgrx test pg16)
  • cargo clippy passes with no warnings
  • cargo fmt passes

The test creates two foreign tables with values (1, 2, 3) each and verifies:

  • Inner join returns 3 rows with values (1,1), (2,2), (3,3)
  • Left join returns 3 rows
  • Right join returns 3 rows

Fixes supabase#532

PostgreSQL's planner can choose a Nested Loop join when joining
foreign tables. During a Nested Loop join, the inner (right) table
is scanned multiple times, once for each row of the outer table.
The FDW framework calls re_scan() to restart the inner scan for
each outer row.

The ClickHouse FDW was using the default no-op re_scan()
implementation, which meant that after the first scan completed,
subsequent rescans returned no data because:
- is_scan_complete was still true
- row_receiver channel was exhausted
- No new streaming task was spawned

This fix implements re_scan() to restart the async streaming:
- Abort existing streaming task if running
- Reset scan state flags
- Reinitialize the bounded channel
- Spawn new streaming task with the same SQL query
- Fetch the first row to initialize the rescan

Also adds integration tests for inner, left, and right joins.

Development

Successfully merging this pull request may close these issues.

[clickhouse_fdw] Data is missing when joining two foreign tables
