perf: faster PSQL ingestion and queries via parallelism#686

Merged
william-silversmith merged 13 commits into master from psql_perf
Feb 26, 2026
Conversation

ranlu (Contributor) commented Feb 23, 2026

Improve psql performance for ingestion and for querying all spatial indices by multithreading with an optimized psql command, so psql can process data on multiple cores. Extended the query() call with an nthread parameter to specify the number of threads used by psql on the fast path. This change speeds up spatial index ingestion and mesh merge task generation from > 8 hours to < 2 hours.

…d distinct labels

Replace the single-threaded named-cursor SELECT DISTINCT with a
parallel range-partitioned approach for the Postgres fast path in
query():

- Add PG_RANGE_DISTINCT_SQL template that scans a [low, high) slice
  of the PK B-tree with hash aggregation for dedup within each range.
- Add _parse_pg_binary_copy_bigint() to parse PG binary COPY output
  for a single BIGINT column into a numpy uint64 array.
- Add _pg_parallel_distinct_labels() which splits the label keyspace
  into 8 non-overlapping ranges via MIN/MAX + np.linspace, queries
  each on a separate Postgres connection (= separate backend process
  = separate CPU core) using ThreadPoolExecutor, then concatenates
  the already-sorted results with no final sort step.
- Update query() Postgres fast path to delegate to the new parallel
  function instead of using a single named cursor.

Each worker sets work_mem=256MB to keep hash aggregation in-memory.
Ranges are non-overlapping so the concatenated result is globally
sorted without an extra sort pass.
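The range-partitioning scheme described above can be sketched in a self-contained way. The real code issues PG_RANGE_DISTINCT_SQL over one Postgres connection per range; here each "worker" just filters a numpy array so the example runs standalone. The 8-way split and the no-final-sort property follow the commit message; the function names and bodies below are illustrative, not the merged code.

```python
# Sketch of the idea behind _pg_parallel_distinct_labels(): split the label
# keyspace into non-overlapping [lo, hi) ranges, dedup each range in parallel,
# and concatenate the already-sorted chunks with no final sort pass.
from concurrent.futures import ThreadPoolExecutor

import numpy as np

def range_bounds(low, high, nthread):
    """Split [low, high] into nthread non-overlapping [lo, hi) slices."""
    edges = np.linspace(low, high + 1, nthread + 1).astype(np.uint64)
    return list(zip(edges[:-1], edges[1:]))

def distinct_in_range(labels, lo, hi):
    # Stand-in for one worker's range-restricted SELECT DISTINCT; np.unique
    # returns sorted output, as each Postgres worker does in the real path.
    mask = (labels >= lo) & (labels < hi)
    return np.unique(labels[mask])

def parallel_distinct(labels, nthread=8):
    low, high = int(labels.min()), int(labels.max())
    bounds = range_bounds(low, high, nthread)
    with ThreadPoolExecutor(max_workers=nthread) as pool:
        chunks = list(pool.map(lambda b: distinct_in_range(labels, *b), bounds))
    # Ranges are disjoint and each chunk is sorted, so plain concatenation
    # is already globally sorted.
    return np.concatenate(chunks)
```

In the real implementation each range maps to its own Postgres connection, i.e. its own backend process and CPU core; the thread pool only exists to drive the connections concurrently.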
…aints and binary COPY

Significantly reduce Postgres ingestion time by eliminating the main
bottlenecks: incremental B-tree maintenance and text-based COPY.

- Defer PK/FK constraints: create file_lookup without PRIMARY KEY or
  FOREIGN KEY during bulk load. After all data is inserted, add them
  via ALTER TABLE which builds the B-tree in a single sort+bulk-load
  pass (~10x faster than incremental insertion).
- Replace text COPY with binary COPY: add _build_pg_binary_copy_two_bigints()
  that constructs the PG binary COPY buffer using a numpy structured
  array (single tobytes() call). This replaces the old StringIO-based
  text COPY that did millions of f-string formats and .write() calls.
- Vectorize value construction: build flat numpy arrays (labels_arr,
  fids_arr) instead of a list of (int, int) tuples, reducing memory
  allocations and GIL contention.
- Skip redundant file_lbl index for Postgres: the PK (label, fid)
  leading column already covers label-only lookups.
- Add ANALYZE after index creation to ensure the planner has accurate
  statistics for hash aggregation.
- Remove unused AUTOINC/INTEGER variables from insert_index_files()
  that were only needed by the old text COPY path.
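The binary COPY construction described above can be sketched with a numpy structured array. The layout comes from the PostgreSQL binary COPY file format (signature, 4-byte flags, 4-byte extension length, then per tuple an int16 field count and an int32 length before each field, all big-endian, with an int16 -1 trailer). The function name follows the commit message; the body is an illustrative reconstruction, not the merged code.

```python
# Sketch of _build_pg_binary_copy_two_bigints(): pack (label, fid) pairs into
# the PostgreSQL binary COPY format with a single structured-array tobytes()
# call, replacing per-row f-string formatting into a StringIO.
import struct

import numpy as np

PG_BINARY_COPY_SIGNATURE = b'PGCOPY\n\377\r\n\0'

def build_pg_binary_copy_two_bigints(labels, fids):
    # Per-tuple layout: int16 field count, then (int32 length, payload) per
    # field. All integers are big-endian, hence the '>' dtypes.
    row = np.dtype([
        ('nfields', '>i2'),
        ('len1', '>i4'), ('label', '>i8'),
        ('len2', '>i4'), ('fid', '>i8'),
    ])
    rows = np.empty(len(labels), dtype=row)
    rows['nfields'] = 2
    rows['len1'] = 8
    rows['len2'] = 8
    rows['label'] = labels
    rows['fid'] = fids

    header = PG_BINARY_COPY_SIGNATURE + struct.pack('>ii', 0, 0)  # flags, ext len
    trailer = struct.pack('>h', -1)  # end-of-data marker
    return header + rows.tobytes() + trailer
```

The resulting buffer can be streamed to `COPY file_lookup FROM STDIN (FORMAT binary)` via `copy_expert`, avoiding millions of small Python-level writes.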
…st path

Add a 'nthread' keyword argument (default=1) to query() that controls
the number of threads used by _pg_parallel_distinct_labels(). Print a
warning when nthread > 1 but the fast path or Postgres is not in use.
william-silversmith (Contributor) commented:

This looks pretty good except for a few bits of missing documentation and the exclusion of other databases from logic that might benefit them.

Did Claude one shot this?

print("Creating filename index...")
cur.execute("CREATE INDEX fname ON file_lookup (fid)")

if db_type == DbType.POSTGRES:
ranlu (Contributor, Author) replied:
Postgres needs ANALYZE to optimize hash aggregation when fetching all indices out of the database. ANALYZE TABLE in MySQL does not help performance for our access pattern. Not sure about SQLite.

william-silversmith changed the title from "Psql perf" to "perf: faster PSQL ingestion and queries via parallelism" on Feb 23, 2026
william-silversmith added the spatial-index label (Relates to the CloudVolume spatial index usually constructed for meshes and skeletons) on Feb 23, 2026
ranlu added 10 commits February 24, 2026 12:56
…y_copy_bigint

Add a reference to the PostgreSQL COPY file format documentation
in the docstring, as requested in PR review.
Replace byte slicing with memoryview slicing to avoid an intermediate
copy when parsing PG binary COPY output. np.frombuffer already accepts
memoryview, so the body slice is now zero-copy.
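The memoryview change can be sketched end to end. Slicing a memoryview yields another view with no byte copy, and np.frombuffer accepts it directly, so the only real copy is the final endianness conversion. This assumes every tuple is exactly one non-NULL BIGINT field, per the binary COPY layout; the function name follows the commit, but the body is an illustrative reconstruction.

```python
# Sketch of _parse_pg_binary_copy_bigint() after the memoryview change:
# parse PG binary COPY output for a single BIGINT column into uint64.
import struct

import numpy as np

PG_BINARY_COPY_SIGNATURE = b'PGCOPY\n\377\r\n\0'
HEADER_LEN = len(PG_BINARY_COPY_SIGNATURE) + 8  # signature + flags + ext length
ROW = np.dtype([('nfields', '>i2'), ('len', '>i4'), ('value', '>i8')])

def parse_pg_binary_copy_bigint(buf):
    mv = memoryview(buf)
    assert mv[:len(PG_BINARY_COPY_SIGNATURE)] == PG_BINARY_COPY_SIGNATURE
    body = mv[HEADER_LEN:-2]  # zero-copy slice; last 2 bytes are the \xff\xff trailer
    rows = np.frombuffer(body, dtype=ROW)
    # astype() is where the single real copy happens: big-endian int64 column
    # converted to a native uint64 array.
    return rows['value'].astype(np.uint64)
```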
Unify the duplicated b'PGCOPY\n\377\r\n\0' magic string into a single
PG_BINARY_COPY_SIGNATURE constant to prevent future inconsistencies.
Extend the deferred constraint pattern (previously Postgres-only) to
MySQL and SQLite. All DB types now create file_lookup without PK/FK
during bulk insert, then add constraints post-load:
- Postgres/MySQL: ALTER TABLE ADD PRIMARY KEY + ADD FOREIGN KEY
- SQLite: CREATE UNIQUE INDEX (SQLite lacks ALTER TABLE ADD PRIMARY KEY)
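The SQLite branch of this pattern can be sketched with the stdlib sqlite3 module: bulk-load file_lookup with no constraints, then build a UNIQUE index in one pass, since SQLite lacks ALTER TABLE ADD PRIMARY KEY. The table and column names follow the PR; the helper name, index name, and data are made up for illustration.

```python
# Sketch of deferred index creation for SQLite: no PK/FK during bulk insert,
# then CREATE UNIQUE INDEX + ANALYZE after all rows are loaded.
import sqlite3

def bulk_load_sqlite(rows):
    conn = sqlite3.connect(':memory:')
    cur = conn.cursor()
    # No PRIMARY KEY / FOREIGN KEY during the bulk insert.
    cur.execute("CREATE TABLE file_lookup (label INTEGER, fid INTEGER)")
    cur.executemany("INSERT INTO file_lookup VALUES (?, ?)", rows)
    # Build the unique index in a single pass after the load.
    cur.execute("CREATE UNIQUE INDEX file_lookup_pk ON file_lookup (label, fid)")
    cur.execute("ANALYZE")  # refresh planner statistics, as in the PR
    conn.commit()
    return conn
```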
The deferred PK/FK constraint creation was running unconditionally,
ignoring the create_indices parameter. Move all post-load constraint
and index creation under the if create_indices guard.
ANALYZE was previously only run for Postgres. MySQL and SQLite also
benefit from updated table statistics for query planning. MySQL uses
ANALYZE TABLE syntax; Postgres and SQLite use ANALYZE.
Add a comment explaining that numpy basic slicing returns views
(no data copy), and that the actual copy occurs inside
_build_pg_binary_copy_two_bigints when building the structured array.
MySQL's ANALYZE TABLE returns a result set (Table, Op, Msg_type,
Msg_text columns). mysql.connector requires this to be consumed
before calling conn.commit(), otherwise it raises
'InternalError: Unread result found'. Add cur.fetchall() to drain it.
The deferred constraint pattern (create table without PK/FK, then
ALTER TABLE ADD PRIMARY KEY post-load) causes a ~2x slowdown on MySQL
compared to the original inline PK/FK at CREATE TABLE time.

Root cause: InnoDB uses a clustered index where the primary key IS the
table's physical row storage order. When PK is defined at CREATE TABLE
time, InnoDB inserts each row directly into its sorted position in the
clustered B-tree -- this is incremental but efficient because the
B-tree is maintained as rows arrive.

When PK is deferred and added via ALTER TABLE ADD PRIMARY KEY after
bulk loading, InnoDB must rebuild the entire table: it copies all rows,
sorts them by the new PK columns, and reconstructs the clustered index
from scratch. This is effectively a full table copy + sort, which is
significantly slower than incremental insertion for large tables.

This is the opposite of PostgreSQL, where heap tables are unordered
(rows are appended to the heap in insertion order) and the PK is a
separate B-tree structure. Building that B-tree in one bulk pass after
all data is loaded avoids the overhead of incremental B-tree splits
during insertion, and the UNLOGGED table mode eliminates WAL overhead
entirely.

SQLite also benefits from deferred indexing because its tables use a
rowid-based heap similar to PostgreSQL, and CREATE UNIQUE INDEX after
bulk load is faster than maintaining a UNIQUE constraint per-row.

Also skips ANALYZE for MySQL -- InnoDB's ANALYZE TABLE performs random
index dives but does not materially improve query plans for this
workload and adds unnecessary overhead. ANALYZE is kept for Postgres
and SQLite where it meaningfully helps the query planner.

Also skips the redundant file_lbl index for MySQL (the PK leading
column already covers label-only lookups, same as Postgres).
ranlu (Contributor, Author) commented Feb 25, 2026

@william-silversmith, addressed most of the review points. I tested the deferred constraints approach for MariaDB, but it made ingestion a lot slower, so that change was reverted, with detailed comments from Claude in the commit message. I have not tested SQLite, but I suppose we will not use SQLite to support large datasets anyway.

william-silversmith (Contributor) commented Feb 25, 2026 via email

william-silversmith (Contributor) commented:
I think this looks good to me. If you have the time to run a sqlite test, that would be great. Otherwise LGTM.

ranlu (Contributor, Author) commented Feb 26, 2026

Tested SQLite; did not see a significant change in performance based on a 5 GB test, which took ~10 minutes with or without the PR.

william-silversmith (Contributor) commented:
Thank you for testing!

@william-silversmith william-silversmith merged commit fcc90e3 into master Feb 26, 2026
@william-silversmith william-silversmith deleted the psql_perf branch February 26, 2026 18:39
Labels

spatial-index Relates to the CloudVolume spatial index usually constructed for meshes and skeletons.
