perf: faster PSQL ingestion and queries via parallelism #686
william-silversmith merged 13 commits into master
Conversation
…d distinct labels

Replace the single-threaded named-cursor SELECT DISTINCT with a parallel range-partitioned approach for the Postgres fast path in query():

- Add PG_RANGE_DISTINCT_SQL template that scans a [low, high) slice of the PK B-tree with hash aggregation for dedup within each range.
- Add _parse_pg_binary_copy_bigint() to parse PG binary COPY output for a single BIGINT column into a numpy uint64 array.
- Add _pg_parallel_distinct_labels(), which splits the label keyspace into 8 non-overlapping ranges via MIN/MAX + np.linspace, queries each on a separate Postgres connection (= separate backend process = separate CPU core) using ThreadPoolExecutor, then concatenates the already-sorted results with no final sort step.
- Update the query() Postgres fast path to delegate to the new parallel function instead of using a single named cursor.

Each worker sets work_mem=256MB to keep hash aggregation in-memory. Ranges are non-overlapping, so the concatenated result is globally sorted without an extra sort pass.
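A minimal sketch of the range-partitioned parallel DISTINCT described above. The function and SQL names (`parallel_distinct_labels`, `RANGE_DISTINCT_SQL`) and the `connect` factory are illustrative stand-ins, not the PR's actual identifiers; assume a DB-API connection per worker.

```python
from concurrent.futures import ThreadPoolExecutor
import numpy as np

# Hypothetical SQL; the PR's PG_RANGE_DISTINCT_SQL template may differ.
RANGE_DISTINCT_SQL = """
SELECT DISTINCT label FROM file_lookup
WHERE label >= %s AND label < %s
ORDER BY label
"""

def parallel_distinct_labels(connect, lo, hi, nthread=8):
    # Split [lo, hi] into nthread non-overlapping half-open ranges.
    edges = np.linspace(lo, hi + 1, nthread + 1).astype(np.uint64)

    def worker(rng):
        low, high = rng
        conn = connect()  # one connection = one PG backend = one CPU core
        try:
            cur = conn.cursor()
            cur.execute("SET work_mem = '256MB'")  # keep hash agg in-memory
            cur.execute(RANGE_DISTINCT_SQL, (int(low), int(high)))
            return np.array([row[0] for row in cur.fetchall()], dtype=np.uint64)
        finally:
            conn.close()

    with ThreadPoolExecutor(max_workers=nthread) as pool:
        parts = list(pool.map(worker, zip(edges[:-1], edges[1:])))

    # Ranges are disjoint and each part is sorted, so concatenation
    # yields a globally sorted result with no final sort pass.
    return np.concatenate(parts) if parts else np.zeros(0, dtype=np.uint64)
```

The key property is that each worker's range is disjoint and internally sorted, so the merge step is a plain concatenation.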
…aints and binary COPY

Significantly reduce Postgres ingestion time by eliminating the main bottlenecks: incremental B-tree maintenance and text-based COPY.

- Defer PK/FK constraints: create file_lookup without PRIMARY KEY or FOREIGN KEY during the bulk load. After all data is inserted, add them via ALTER TABLE, which builds the B-tree in a single sort+bulk-load pass (~10x faster than incremental insertion).
- Replace text COPY with binary COPY: add _build_pg_binary_copy_two_bigints(), which constructs the PG binary COPY buffer from a numpy structured array (a single tobytes() call). This replaces the old StringIO-based text COPY that did millions of f-string formats and .write() calls.
- Vectorize value construction: build flat numpy arrays (labels_arr, fids_arr) instead of a list of (int, int) tuples, reducing memory allocations and GIL contention.
- Skip the redundant file_lbl index for Postgres: the PK's leading column (label, fid) already covers label-only lookups.
- Add ANALYZE after index creation so the planner has accurate statistics for hash aggregation.
- Remove unused AUTOINC/INTEGER variables from insert_index_files() that were only needed by the old text COPY path.
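The binary COPY construction above can be sketched as follows. This is a hedged approximation of what a function like `_build_pg_binary_copy_two_bigints()` might look like: a packed big-endian structured array serialized with one `tobytes()` call, per the PostgreSQL binary COPY format (11-byte signature, 4-byte flags, 4-byte header extension length, per-tuple field count and field lengths, and an int16 -1 trailer).

```python
import numpy as np

PG_BINARY_COPY_SIGNATURE = b'PGCOPY\n\377\r\n\0'

def build_pg_binary_copy_two_bigints(labels, fids):
    # Per-tuple layout: int16 field count (2), then for each column an
    # int32 byte length (8) followed by the big-endian int64 value.
    dt = np.dtype([
        ('nfields', '>i2'),
        ('len1', '>i4'), ('val1', '>i8'),
        ('len2', '>i4'), ('val2', '>i8'),
    ])  # packed (no padding) by default, itemsize 26
    rows = np.empty(len(labels), dtype=dt)
    rows['nfields'] = 2
    rows['len1'] = 8
    rows['len2'] = 8
    rows['val1'] = labels
    rows['val2'] = fids
    # 11-byte signature + int32 flags (0) + int32 header extension len (0)
    header = PG_BINARY_COPY_SIGNATURE + b'\x00\x00\x00\x00' + b'\x00\x00\x00\x00'
    trailer = (-1).to_bytes(2, 'big', signed=True)  # end-of-data marker
    return header + rows.tobytes() + trailer
```

The whole row buffer is emitted by a single `tobytes()`, replacing millions of per-row string formats in the old text COPY path.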
…st path

Add an nthread keyword argument (default=1) to query() that controls the number of threads used by _pg_parallel_distinct_labels(). Print a warning when nthread > 1 but the fast path or Postgres is not in use.
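A sketch of the guard described above, assuming a simplified `query()` signature (the real one in the PR differs) and using `warnings.warn` as a stand-in for however the PR prints its warning:

```python
import warnings

def query(labels, db_type="postgres", fast_path=True, nthread=1):
    # nthread only takes effect on the Postgres fast path; warn and fall
    # back to single-threaded behavior otherwise.
    if nthread > 1 and not (db_type == "postgres" and fast_path):
        warnings.warn("nthread > 1 has no effect outside the Postgres fast path")
        nthread = 1
    # ... actual query dispatch would go here ...
    return nthread  # stand-in return so the guard is observable
```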
This looks pretty good except for a few bits of missing documentation and the exclusion of other databases from logic that might benefit them. Did Claude one-shot this?
```python
print("Creating filename index...")
cur.execute("CREATE INDEX fname ON file_lookup (fid)")
...
if db_type == DbType.POSTGRES:
```
It looks like MySQL and SQLite also have ANALYZE
https://sqlite.org/lang_analyze.html
https://anotherboringtechblog.com/2024/05/analyze-in-mysql/
Postgres needs ANALYZE to optimize hash aggregation when fetching all indices out of the database. ANALYZE TABLE in MySQL does not help performance for our access pattern. Not sure about SQLite.
…y_copy_bigint

Add a reference to the PostgreSQL COPY file format documentation in the docstring, as requested in PR review.
Replace byte slicing with memoryview slicing to avoid an intermediate copy when parsing PG binary COPY output. np.frombuffer already accepts memoryview, so the body slice is now zero-copy.
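A small sketch of the zero-copy parse described above, assuming the single-BIGINT-column binary COPY layout; the function name is illustrative of `_parse_pg_binary_copy_bigint()` but not the PR's exact code:

```python
import numpy as np

PG_BINARY_COPY_SIGNATURE = b'PGCOPY\n\377\r\n\0'

def parse_pg_binary_copy_bigint(buf):
    mv = memoryview(buf)  # slicing a memoryview does not copy the bytes
    assert bytes(mv[:11]) == PG_BINARY_COPY_SIGNATURE
    # Skip 11-byte signature + int32 flags + int32 header extension length;
    # drop the 2-byte end-of-data trailer (int16 -1).
    body = mv[19:-2]
    # Each tuple: int16 field count (1), int32 field length (8), int64 value.
    recs = np.frombuffer(body, dtype=np.dtype([('n', '>i2'), ('l', '>i4'), ('v', '>u8')]))
    return recs['v'].astype(np.uint64)  # byte-swap to native order copies here
```

`np.frombuffer` accepts the memoryview directly, so the only copy is the final big-endian-to-native conversion.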
Unify the duplicated b'PGCOPY\n\377\r\n\0' magic string into a single PG_BINARY_COPY_SIGNATURE constant to prevent future inconsistencies.
Extend the deferred constraint pattern (previously Postgres-only) to MySQL and SQLite. All DB types now create file_lookup without PK/FK during bulk insert, then add constraints post-load:

- Postgres/MySQL: ALTER TABLE ADD PRIMARY KEY + ADD FOREIGN KEY
- SQLite: CREATE UNIQUE INDEX (SQLite lacks ALTER TABLE ADD PRIMARY KEY)
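The SQLite branch of this pattern can be sketched as below. The table and function names mirror the PR's file_lookup table, but the schema is simplified (the real table has more columns and a foreign key):

```python
import sqlite3

def bulk_load_then_index(conn, rows):
    cur = conn.cursor()
    # 1. Create the table with no PK/UNIQUE constraint, so the bulk insert
    #    is a plain append with no per-row B-tree maintenance.
    cur.execute("CREATE TABLE file_lookup (label INTEGER, fid INTEGER)")
    cur.executemany("INSERT INTO file_lookup VALUES (?, ?)", rows)
    # 2. Build the unique index in one pass post-load; SQLite cannot
    #    ALTER TABLE ... ADD PRIMARY KEY, so a unique index stands in.
    cur.execute("CREATE UNIQUE INDEX file_lookup_pk ON file_lookup (label, fid)")
    conn.commit()
```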
The deferred PK/FK constraint creation was running unconditionally, ignoring the create_indices parameter. Move all post-load constraint and index creation under the if create_indices guard.
ANALYZE was previously only run for Postgres. MySQL and SQLite also benefit from updated table statistics for query planning. MySQL uses ANALYZE TABLE syntax; Postgres and SQLite use ANALYZE.
Add a comment explaining that numpy basic slicing returns views (no data copy), and that the actual copy occurs inside _build_pg_binary_copy_two_bigints when building the structured array.
MySQL's ANALYZE TABLE returns a result set (Table, Op, Msg_type, Msg_text columns). mysql.connector requires this to be consumed before calling conn.commit(), otherwise it raises 'InternalError: Unread result found'. Add cur.fetchall() to drain it.
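The drain fix can be sketched as follows; the function name is hypothetical and `conn` is assumed to be a mysql.connector connection:

```python
def analyze_mysql_table(conn, table="file_lookup"):
    cur = conn.cursor()
    cur.execute(f"ANALYZE TABLE {table}")
    # ANALYZE TABLE returns a result set (Table, Op, Msg_type, Msg_text);
    # it must be consumed before commit(), or mysql.connector raises
    # 'InternalError: Unread result found'.
    cur.fetchall()
    conn.commit()
```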
The deferred constraint pattern (create table without PK/FK, then ALTER TABLE ADD PRIMARY KEY post-load) causes a ~2x slowdown on MySQL compared to the original inline PK/FK at CREATE TABLE time.

Root cause: InnoDB uses a clustered index where the primary key IS the table's physical row storage order. When the PK is defined at CREATE TABLE time, InnoDB inserts each row directly into its sorted position in the clustered B-tree; this is incremental but efficient because the B-tree is maintained as rows arrive. When the PK is deferred and added via ALTER TABLE ADD PRIMARY KEY after bulk loading, InnoDB must rebuild the entire table: it copies all rows, sorts them by the new PK columns, and reconstructs the clustered index from scratch. This is effectively a full table copy + sort, which is significantly slower than incremental insertion for large tables.

This is the opposite of PostgreSQL, where heap tables are unordered (rows are appended to the heap in insertion order) and the PK is a separate B-tree structure. Building that B-tree in one bulk pass after all data is loaded avoids the overhead of incremental B-tree splits during insertion, and the UNLOGGED table mode eliminates WAL overhead entirely. SQLite also benefits from deferred indexing because its tables use a rowid-based heap similar to PostgreSQL, and CREATE UNIQUE INDEX after bulk load is faster than maintaining a UNIQUE constraint per row.

Also skips ANALYZE for MySQL: InnoDB's ANALYZE TABLE performs random index dives but does not materially improve query plans for this workload and adds unnecessary overhead. ANALYZE is kept for Postgres and SQLite, where it meaningfully helps the query planner.

Also skips the redundant file_lbl index for MySQL (the PK leading column already covers label-only lookups, same as Postgres).
@william-silversmith, addressed most of the review points. I tested the deferred constraints approach for MariaDB, but it made ingestion a lot slower, so that change was reverted with detailed comments from Claude in the commit message. Have not tested SQLite, but I suppose we will not use SQLite to support large datasets anyway.
SQLite can sometimes be used for large datasets, FYI. I used it in 2020 to skeletonize minnie65 on Della. Will look when I get a chance!
I think this looks good to me. If you have the time to run a SQLite test, that would be great. Otherwise LGTM.
Tested SQLite: did not see a significant change in performance in a 5 GB test; it took ~10 minutes with or without the PR.
Thank you for testing!
Improve PSQL performance for ingestion and querying of all spatial indices: multithread with optimized PSQL commands so Postgres can process data on multiple cores. Extended the query call with an nthread parameter to specify the number of threads used for PSQL in the fast path. The change speeds up spatial index ingestion and mesh merge task generation from > 8 hours to < 2 hours.