Feature/ducklake crude operations and data preview #197
Conversation
- Implement DROP TABLE deletion for DuckDB/SQLite/PostgreSQL adapters
- Finish DuckLakeService.deleteTable with validation + existence checks
- Add DataLake tables UI delete action with confirmation dialog
- Reuse existing React Query mutation/toast and invalidate tables cache
…PC channels
- Implement CatalogAdapter updateRows/deleteRows/upsertRows in duckdb/sqlite/postgresql adapters
- Return {rowsAffected} for DML operations
- Ensure IPC channel typing includes ducklake:table:updateRows/deleteRows/upsertRows
- Add addColumn/dropColumn to catalog adapter contract and implement for duckdb/sqlite/postgresql
- Wire through DuckLakeService, IPC channels, renderer service, and React Query hooks
- Add Schema tab UI in DataLakeTableDetails (add column dialog + drop column confirmation)
- Prevent dropping partition columns; default value treated as raw SQL expression
- Add CatalogAdapter.renameColumn and implement in duckdb/sqlite/postgresql adapters
- Add DuckLakeService.renameColumn with validation and partition-column guard
- Wire IPC channel ducklake:table:renameColumn and renderer service method
- Add React Query mutation useRenameDuckLakeColumn with cache invalidations
- Add Schema-tab per-column Rename action + dialog in DataLakeTableDetails
- extend CatalogAdapter with alterColumnType and implement for duckdb/sqlite/postgresql
- add DuckLakeService.alterColumnType with table/column validation and partition-column guard
- wire IPC channel ducklake:table:alterColumnType + typed DuckLakeChannels update
- add renderer service + React Query mutation with cache invalidations
- add Schema tab UI action + dialog; fix related lint formatting
- add Data tab to DataLakeTableDetails with dialogs for UPDATE/DELETE/UPSERT SQL
- wire dialogs to existing React Query mutations (useUpdateDuckLakeRows/useDeleteDuckLakeRows/useUpsertDuckLakeRows)
- disable actions while mutations are running
- fix tab index ordering after inserting Data tab
- add CatalogAdapter.setPartitionedBy and implement for duckdb/sqlite/postgresql
- add DuckLakeService.setPartitionedBy with table/column validation
- wire IPC channel ducklake:table:setPartitionedBy + typed channel updates
- add renderer service + React Query mutation hook with cache invalidations
- extend Import Data wizard with Partitioning step and apply partitioning after import
- Add DuckLake CDC types (DuckLakeTableChange)
- Implement queryTableChanges in adapter + service and expose via IPC
- Add renderer service + React Query hook for table changes
- Add CDC UI (snapshot range selects + changes table) to table details
- Fix TS/ESLint issues introduced during integration
…complete, utilizing a newly defined list of DuckLake supported types.
- Extracted specific tab logic from [DataLakeTableDetails.tsx] into individual child components (Overview, Schema, Data Rows, Statistics, Data Files, Partitions, History, Tags) located in `src/renderer/components/dataLake/tableDetails/`.
- Moved utility functions (`formatBytes`, `formatNumber`, `safeToString`) to the global `src/renderer/helpers/utils.ts` to promote reuse and reduce duplication.
- Simplified `DataLakeTableDetails` to act as a container component, improving maintainability and readability.
- Cleaned up imports and removed the temporary local `utils.ts` file.
…rt options to data lake tables.
… pattern for table row updates.
…e a dynamic delete query builder.
… up console logs.
📝 Walkthrough
Adds end-to-end DuckLake table/column/row mutation and snapshot restore capabilities: new renderer UI/tabs and hooks, new IPC channels and main-process handlers, extended DuckLakeService and adapter implementations (DuckDB/Postgres/SQLite), and supporting types/utilities.
Changes
Sequence Diagram(s)
sequenceDiagram
autonumber
participant UI as Renderer (UI)
participant Rsvc as Renderer DuckLakeService
participant IPC as Electron IPC
participant MainHandler as Main IPC Handler
participant MSvc as Main DuckLakeService
participant Adapter as CatalogAdapter
participant DB as Database
UI->>Rsvc: trigger mutation (e.g. updateRows)
Rsvc->>IPC: invoke 'ducklake:table:updateRows'(instanceId, table, query)
IPC->>MainHandler: deliver channel call
MainHandler->>MSvc: DuckLakeService.updateRows(...)
MSvc->>Adapter: adapter.updateRows(table, updateQuery)
Adapter->>DB: execute SQL (UPDATE / CTAS / RETURNING)
DB-->>Adapter: result / rows affected
Adapter-->>MSvc: { rowsAffected }
MSvc-->>MainHandler: return result
MainHandler-->>IPC: IPC response
IPC-->>Rsvc: resolve promise { rowsAffected }
Rsvc-->>UI: onSuccess -> invalidate queries / refresh UI
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 1 | ❌ 2
❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (1 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings.
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
Actionable comments posted: 17
🤖 Fix all issues with AI agents
In `@src/main/ipcHandlers/duckLake.ipcHandlers.ts`:
- Around line 182-216: The ipc handlers pass raw SQL strings to
DuckLakeService.updateRows/deleteRows/upsertRows without protection against
statement chaining; add validation inside those service methods (or a shared
validator called by them) to trim the query, ensure it matches the expected
single-statement prefix (case-insensitive), reject any query containing a
semicolon (or any additional SQL after the first statement) and/or explicitly
check there is only one statement token, and optionally migrate these methods to
parameterized execution to eliminate chaining entirely; update
DuckLakeService.updateRows, DuckLakeService.deleteRows and
DuckLakeService.upsertRows to perform this validation before executing the SQL
and return a clear error when rejected.
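For illustration, a validator along the lines this comment asks for could look like the sketch below. The helper name is a placeholder, not an existing project API, and the real service would throw DuckLakeError.validation instead of a plain Error.

// Hypothetical sketch: reject statement chaining and enforce the expected statement type
// before executing user-supplied SQL.
function assertSingleStatement(
  query: string,
  expectedPrefix: 'UPDATE' | 'DELETE' | 'INSERT' | 'MERGE',
): string {
  const trimmed = query.trim().replace(/;\s*$/, ''); // tolerate one trailing semicolon
  if (trimmed.includes(';')) {
    // Real code would use DuckLakeError.validation(...)
    throw new Error('Query must contain exactly one SQL statement');
  }
  if (!trimmed.toUpperCase().startsWith(expectedPrefix)) {
    throw new Error(`Query must be a ${expectedPrefix} statement`);
  }
  return trimmed;
}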
In `@src/main/services/duckLake.service.ts`:
- Around line 53-101: Add a strict format check for SQL type and default value
strings to prevent injection: in addColumn (and similarly in alterColumnType for
newType) validate columnType (and newType) against an allowlist regex such as
/^[A-Za-z_][A-Za-z0-9_() ,]*$/ after trimming and throw
DuckLakeError.validation('Invalid column type format') if it fails; also
validate defaultValue (when provided) with a conservative pattern appropriate
for literals (or whitelist simple numeric/quoted literal forms) and throw
DuckLakeError.validation('Invalid default value format') if it fails; place
these checks right after the existing non-empty checks in addColumn and in
alterColumnType before calling adapter.addColumn/alterColumnType.
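A minimal sketch of the allowlist checks described above; the regexes are illustrative only and would need tuning against the full set of DuckLake-supported types, and the error type is assumed to be DuckLakeError.validation in the real service.

// Hypothetical validation helpers for addColumn / alterColumnType inputs.
const TYPE_PATTERN = /^[A-Za-z_][A-Za-z0-9_(), ]*$/;
const DEFAULT_PATTERN = /^(NULL|TRUE|FALSE|[+-]?\d+(\.\d+)?|'[^']*')$/i;

function validateColumnType(columnType: string): string {
  const trimmed = columnType.trim();
  if (!TYPE_PATTERN.test(trimmed)) {
    throw new Error('Invalid column type format'); // real code: DuckLakeError.validation(...)
  }
  return trimmed;
}

function validateDefaultValue(defaultValue: string): string {
  const trimmed = defaultValue.trim();
  if (!DEFAULT_PATTERN.test(trimmed)) {
    throw new Error('Invalid default value format');
  }
  return trimmed;
}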
- Around line 406-414: The validation in updateRows incorrectly allows CREATE
statements; change the check on normalizedQuery inside updateRows to only permit
UPDATE statements (remove the CREATE branch) so that any non-UPDATE query throws
DuckLakeError.validation('Query must be an UPDATE statement'); keep using the
existing normalizedQuery variable and the same error helper
(DuckLakeError.validation) and adjust the error message accordingly.
In `@src/main/services/duckLake/adapters/duckdb.adapter.ts`:
- Around line 779-817: The pagination count logic can both mis-handle zero
results and accidentally re-execute DML; modify the block that builds countQuery
so it only runs when the base SQL is a read/select statement (e.g., test the
original query/request.sql with a regex for /^\s*SELECT\b/i and skip counting
for INSERT|UPDATE|DELETE|MERGE|CREATE|ALTER|DROP), and change the falsy-zero
extraction to use nullish coalescing when reading countRow (replace
countRow.total || Object.values(countRow)[0] with countRow.total ??
Object.values(countRow)[0]). Ensure you still set totalRows from
Number(countVal) and keep the existing try/catch logging around the counting
code (refer to variables countQuery, countResult, countRows, countRow,
totalRows, and request.limit/request.offset).
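A sketch of the guarded counting logic; countTotalRows and the injected run callback are hypothetical names used for illustration, not the adapter's actual structure.

// Only derive a COUNT(*) total for plain SELECTs, and keep a literal zero as a valid count.
async function countTotalRows(
  sql: string,
  run: (q: string) => Promise<Record<string, unknown>[]>,
): Promise<number | undefined> {
  if (!/^\s*SELECT\b/i.test(sql)) return undefined; // never re-run DML just to count
  try {
    const rows = await run(`SELECT COUNT(*) AS total FROM (${sql}) AS sub`);
    const row = rows[0] ?? {};
    const value = row.total ?? Object.values(row)[0]; // ?? preserves a real 0, unlike ||
    return Number(value);
  } catch (err) {
    console.warn('Count query failed; falling back to page-size only', err);
    return undefined;
  }
}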
- Around line 265-351: The affected-row counting in updateRows, deleteRows, and
upsertRows is unreliable because it uses result.getRows() (requires RETURNING);
replace that logic by executing a follow-up "SELECT changes() AS changes" on the
same DuckDB connection (this.connectionInfo.connection) after the initial run,
parse the first row's changes value into rowsAffected (fallback to 0 if
missing), and return that value; update the functions updateRows, deleteRows,
and upsertRows to perform this SELECT changes() query instead of trying to read
result.getRows().
In `@src/main/services/duckLake/adapters/postgresql.adapter.ts`:
- Around line 190-214: The ALTER column DDL in alterColumnType interpolates
newType directly into SQL (same risk exists for columnType in addColumn); add a
validation step inside alterColumnType (and reuse for addColumn) that checks
newType against an allowlist or strict regex of permitted PostgreSQL type tokens
(e.g., ^(SMALLINT|INTEGER|BIGINT|BOOLEAN|TEXT|DATE|TIMESTAMP|REAL|DOUBLE
PRECISION|VARCHAR\(\d+\)|CHAR\(\d+\))$ with case-insensitive matching),
normalize/canonicalize the token (uppercase) and reject anything that doesn't
match by throwing a clear error before running connection.run, so only
validated, safe type strings are ever interpolated into the ALTER TABLE
statement.
- Around line 109-139: The addColumn implementation interpolates defaultValue
directly into SQL, enabling injection; in addColumn (and the same method across
other adapters) validate and sanitize defaultValue before building
defaultClause: reject or strip any semicolons or SQL comment markers, allow only
simple literal patterns (e.g. numeric, boolean) or treat as a string by escaping
single quotes and surrounding with single quotes, and then use that sanitized
token when constructing the ALTER TABLE ... ADD COLUMN statement via
connectionInfo.connection.run; alternatively, enforce that callers pass
null/undefined for complex defaults and centralize validation so defaultClause
is only set to a safe, escaped literal.
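One possible shape for the centralized default-clause builder suggested here; a sketch only, since the adapters may choose a different split between validation and escaping.

// Hypothetical shared helper: escape or reject a user-supplied default before it
// reaches the ALTER TABLE ... ADD COLUMN string.
function buildDefaultClause(defaultValue: string | null | undefined): string {
  if (defaultValue === null || defaultValue === undefined || defaultValue.trim() === '') {
    return '';
  }
  const v = defaultValue.trim();
  if (/^[+-]?\d+(\.\d+)?$/.test(v) || /^(true|false|null)$/i.test(v)) {
    return ` DEFAULT ${v}`; // simple numeric / boolean / NULL literals pass through
  }
  // Everything else is treated as a string literal with single quotes doubled.
  return ` DEFAULT '${v.replace(/'/g, "''")}'`;
}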
In `@src/renderer/components/dataLake/DataLakeTablesView.tsx`:
- Around line 45-60: The partition mutation call using
setPartitionedByMutation.mutate lacks an onError handler so when it fails the
import wizard stays open; update the mutate call for setPartitionedByMutation
(the block calling setPartitionedByMutation.mutate with { instanceId, tableName,
columnNames: partitionColumns }) to include an onError callback that at minimum
calls setImportWizardOpen(false) (and also calls invalidateTables(instanceId) to
keep state consistent) so the wizard is closed and tables refreshed even when
the mutation errors.
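For reference, a mutate call shaped like the comment describes might look as follows; setImportWizardOpen, invalidateTables, and the payload fields are the names used above and assumed to exist in the component.

setPartitionedByMutation.mutate(
  { instanceId, tableName, columnNames: partitionColumns },
  {
    onSuccess: () => {
      setImportWizardOpen(false);
      invalidateTables(instanceId);
    },
    onError: () => {
      // Keep the UI consistent even when partitioning fails:
      // the imported table exists, so close the wizard and refresh the list.
      setImportWizardOpen(false);
      invalidateTables(instanceId);
    },
  },
);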
In `@src/renderer/components/dataLake/tableDetails/TableDataFilesTab.tsx`:
- Around line 22-24: The prop TableDataFilesTabProps currently types
tableDetails as any and accesses tableDetails.dataFiles without guards; change
the prop type to a proper interface (e.g., TableDataFilesTabProps {
tableDetails: { dataFiles?: DuckLakeDataFileInfo[] } }) and update the
TableDataFilesTab component to use a safe access pattern (optional chaining or
defaulting to an empty array) when checking length and mapping (references:
TableDataFilesTabProps, tableDetails, dataFiles, TableDataFilesTab). Ensure all
occurrences replace raw .dataFiles usage with tableDetails.dataFiles ?? [] or
tableDetails.dataFiles?.length checks so no runtime .length on undefined occurs.
In `@src/renderer/components/dataLake/tableDetails/TableDataRowsTab.tsx`:
- Around line 242-255: Remove the useEffect that watches
updateRowsMutation.isSuccess / deleteRowsMutation.isSuccess /
upsertRowsMutation.isSuccess and calls fetchData(), and keep the explicit
fetchData() calls in the onSuccess handlers (handleConfirmUpdateRows,
handleConfirmDeleteRows, handleConfirmUpsertRows) so each mutation only triggers
one fetch; also reset each mutation's status after handling success (e.g., call
updateRowsMutation.reset(), deleteRowsMutation.reset(),
upsertRowsMutation.reset() in the onSuccess flows) to prevent stale isSuccess
from causing future redundant fetches when fetchData reference changes.
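A sketch of one confirm handler after this change; the payload fields, generatedUpdateQuery, and fetchData are assumed from the surrounding description rather than copied from the component.

const handleConfirmUpdateRows = () => {
  updateRowsMutation.mutate(
    { instanceId, tableName, query: generatedUpdateQuery },
    {
      onSuccess: () => {
        fetchData(); // the single, explicit refresh
        updateRowsMutation.reset(); // clear isSuccess so no effect re-fetches later
      },
    },
  );
};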
- Line 139: The export SQL builds identifiers using `"${tableName}"` without
escaping double-quotes inside tableName, which breaks SQL for names containing
quotes; update the construction in TableDataRowsTab (the exportQuery variable
and the other similar instances flagged) to escape double quotes in tableName by
replacing every " with "" before embedding (e.g., use tableName.replace(/"/g,
'""') when interpolating), so the identifier becomes a safely quoted string;
apply this same change to the other occurrences that build SELECT ... FROM
"${tableName}" in this file.
- Around line 280-348: The generated CTAS and delete query construction is
vulnerable to SQL injection because numeric/boolean inputs for updateFields and
whereConditions are interpolated raw when needsQuotes(...) is false; update the
React.useEffect logic that builds whereClauses and caseStatements to validate
and sanitize non-quoted values: for each updateFields[].value and
whereConditions[].value for columns where needsQuotes(col) === false, confirm
they parse as a valid number/boolean (e.g., Number(value) isFinite or
/^[+-]?\d+(\.\d+)?$/.test(value) for numbers, and strict match for true/false),
and if invalid either reject/skip the field (set generated query to empty and
surface an error) or coerce to a safe NULL/validated literal before
interpolation; also add the same validation/sanitization to the delete query
builder code path so both CREATE OR REPLACE TABLE generation (caseStatements,
updatedCols, whereClauses) and delete query builders never interpolate
unvalidated raw inputs.
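A sketch of the kind of literal validation the comment describes; the helper names are illustrative, not existing code in the component.

// Hypothetical validators for values that will be interpolated without quotes.
const isSafeNumericLiteral = (value: string): boolean =>
  /^[+-]?\d+(\.\d+)?$/.test(value.trim());
const isSafeBooleanLiteral = (value: string): boolean =>
  /^(true|false)$/i.test(value.trim());

// Returns the literal if it is safe to embed unquoted, otherwise null so the
// caller can abort query generation and surface an error instead of interpolating.
function toUnquotedLiteral(value: string): string | null {
  const v = value.trim();
  return isSafeNumericLiteral(v) || isSafeBooleanLiteral(v) ? v : null;
}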
In `@src/renderer/components/dataLake/tableDetails/TableOverviewTab.tsx`:
- Around line 96-133: The JSX in TableOverviewTab reads
tableDetails.stats.recordCount, fileSizeBytes and nextRowId directly which can
throw if stats is null/undefined; update the TableOverviewTab render to use safe
access and defaults (e.g., tableDetails.stats?.recordCount ?? 0 for
formatNumber, tableDetails.stats?.fileSizeBytes ?? 0 for formatBytes, and
tableDetails.stats?.nextRowId ?? 0) and also defensively handle arrays like
tableDetails.columns?.length ?? 0 and tableDetails.dataFiles?.length ?? 0 so the
component never dereferences undefined.
In `@src/renderer/components/dataLake/tableDetails/TablePartitionsTab.tsx`:
- Around line 34-70: tableDetails.partitionInfo may be truthy while its arrays
are undefined, causing .length and .map to crash; update all usages inside
TablePartitionsTab (references: tableDetails.partitionInfo,
partitionInfo.columns, partitionInfo.filePartitionValues, and
DuckLakePartitionColumn mapping) to guard with optional chaining or default
empty arrays (e.g., use partitionInfo?.columns ?? [] for .map and
(partitionInfo?.columns ?? []).length for counts) so safeToString and the row
render logic never receive undefined.
In `@src/renderer/helpers/utils.ts`:
- Around line 423-429: formatBytes currently passes negative numbers to Math.log
and can index past the sizes array for very large values; change formatBytes to
record the original sign, work with absBytes = Math.abs(bytes) (handle
non-finite/NaN by returning '0 Bytes' or similar), compute i =
Math.floor(Math.log(absBytes) / Math.log(k)) but clamp i to sizes.length - 1,
then format using sizes[i] and reapply the negative sign prefix if the original
bytes was negative; ensure all references are to formatBytes, sizes, i, k and
guard against division by zero when absBytes === 0.
In `@src/types/duckLake.ts`:
- Around line 544-548: The DuckLakeIpcChannels interface is missing type
signatures for eight IPC channels declared elsewhere; update the
DuckLakeIpcChannels interface (the type that currently contains
'ducklake:table:setPartitionedBy') to include the following method signatures
with matching parameter shapes: 'ducklake:table:rename' (instanceId: string,
oldName: string, newName: string) => Promise<void>, 'ducklake:table:addColumn'
(instanceId: string, tableName: string, columnDef: any) => Promise<void>,
'ducklake:table:dropColumn' (instanceId: string, tableName: string, columnName:
string) => Promise<void>, 'ducklake:table:renameColumn' (instanceId: string,
tableName: string, oldName: string, newName: string) => Promise<void>,
'ducklake:table:alterColumnType' (instanceId: string, tableName: string,
columnName: string, newType: string) => Promise<void>,
'ducklake:table:updateRows' (instanceId: string, tableName: string, filter: any,
updates: any) => Promise<number>, 'ducklake:table:deleteRows' (instanceId:
string, tableName: string, filter: any) => Promise<number>, and
'ducklake:table:upsertRows' (instanceId: string, tableName: string, rows: any[])
=> Promise<number> so the interface matches the IPC channel declarations.
In `@src/types/ipc.ts`:
- Around line 264-273: The union includes the channel
'ducklake:table:queryChanges' but there is no service/controller/IPC handler for
it; either remove 'ducklake:table:queryChanges' from the IPC union in the types,
or implement it by adding a service method (e.g., add queryChanges(...) to
DuckLakeService), a controller handler (e.g.,
duckLakeController.handleQueryChanges that calls DuckLakeService.queryChanges),
and wire the controller to the IPC dispatcher/registration so the
'ducklake:table:queryChanges' channel is handled consistently with other table
mutation channels like updateRows/deleteRows/upsertRows.
🧹 Nitpick comments (18)
src/renderer/config/ducklake.ts (1)
13-29: Hardcoded parameterized types limit user flexibility.
`decimal(18,3)`, `list(INTEGER)`, `struct(name VARCHAR)`, and `map(VARCHAR, INTEGER)` are single fixed instances rather than templates. If this drives a type-selection dropdown, users can't choose, e.g., `decimal(10,2)` or `list(VARCHAR)`. Consider either allowing free-text input alongside the dropdown, or separating base types (`decimal`, `list`, `struct`, `map`) and prompting for parameters in a follow-up input.
src/renderer/components/dataLake/DataLakeTableImportWizard.tsx (1)
114-141: Partition-column parsing is duplicated.
The `.split(',').map(…).filter(…)` logic appears identically in `handleImport` (Lines 132-135) and `renderReview` (Lines 317-320). Extract it once (e.g., a `useMemo` or a local helper) to keep them in sync and avoid subtle divergence.
♻️ Suggested refactor
Add a helper or memo near the top of the component:
+ const parsedPartitionColumns = useMemo(
+   () => partitionColumnsText.split(',').map((c) => c.trim()).filter((c) => c.length > 0),
+   [partitionColumnsText],
+ );
Then reference `parsedPartitionColumns` in both `handleImport` and `renderReview`.
src/main/ipcHandlers/utils.ipcHandlers.ts (2)
42-58: Consider passing the parent `BrowserWindow` to make dialogs modal.
`dialog.showOpenDialog(options)` and `dialog.showSaveDialog(options)` without a `BrowserWindow` argument create non-modal dialogs that can appear behind the application window on some platforms. Passing the focused window ensures correct modal behavior:
- async (_event, options: OpenDialogOptions) => {
-   const result = await dialog.showOpenDialog(options);
+ async (event, options: OpenDialogOptions) => {
+   const win = BrowserWindow.fromWebContents(event.sender);
+   const result = win
+     ? await dialog.showOpenDialog(win, options)
+     : await dialog.showOpenDialog(options);
Same pattern for `showSaveDialog`.
43-49: IPC dialog options are passed through without sanitization.
The renderer sends arbitrary `OpenDialogOptions`/`SaveDialogOptions` to the main process. While Electron's dialog APIs are inherently safe (native OS dialogs), consider allowlisting properties (e.g., `filters`, `defaultPath`, `properties`) to follow the principle of least privilege for IPC boundaries.
Also applies to: 52-58
src/renderer/components/dataLake/tableDetails/TableOverviewTab.tsx (1)
18-20: Use `DuckLakeTableDetails` instead of `any` for `tableDetails` prop.
All tab components (`TableOverviewTab`, `TableTagsTab`, `TableSchemaTab`, `TableHistoryTab`, etc.) type `tableDetails` as `any`, losing compile-time safety. The `DuckLakeTableDetails` interface already exists in `src/types/duckLake.ts` and matches the shape being accessed. This would catch property access errors at build time.
♻️ Proposed fix
+import { DuckLakeTableDetails } from '../../../../types/duckLake';
+
 interface TableOverviewTabProps {
-  tableDetails: any;
+  tableDetails: DuckLakeTableDetails;
 }
src/renderer/components/dataLake/tableDetails/TableTagsTab.tsx (1)
74-95: Column Tags table omits Snapshot Range, inconsistent with Table Tags.
The Table Tags section displays `beginSnapshot`/`endSnapshot` in a "Snapshot Range" column, but the Column Tags section omits it even though `DuckLakeColumnTag` has both fields. Consider adding the snapshot range column to Column Tags for consistency.
♻️ Proposed fix — add Snapshot Range to Column Tags
<TableHead> <TableRow> <TableCell>Column</TableCell> <TableCell>Key</TableCell> <TableCell>Value</TableCell> + <TableCell>Snapshot Range</TableCell> </TableRow> </TableHead> <TableBody> {tableDetails.columnTags.map( (tag: DuckLakeColumnTag, idx: number) => ( <TableRow key={idx}> <TableCell> <strong>{tag.columnName}</strong> </TableCell> <TableCell>{tag.key}</TableCell> <TableCell>{tag.value}</TableCell> + <TableCell> + {safeToString(tag.beginSnapshot)} + {tag.endSnapshot + ? ` - ${safeToString(tag.endSnapshot)}` + : ' (current)'} + </TableCell> </TableRow> ), )}src/renderer/components/dataLake/tableDetails/TableSchemaTab.tsx (2)
76-83: Memoize the partition column ID set to avoid re-creating it on every row render.
`isPartitionColumnId` builds a new `Set` on each invocation. Since it's called once per column row in the render loop, the Set is needlessly rebuilt O(N) times. Extract it with `useMemo`.
♻️ Proposed fix
+import { useMemo } from 'react';
+
+  const partitionColumnIds = useMemo(() => {
+    return new Set<number>(
+      (tableDetails?.partitionInfo?.columns || []).map((c: any) =>
+        Number(c.columnId),
+      ),
+    );
+  }, [tableDetails?.partitionInfo?.columns]);
+
-  const isPartitionColumnId = (columnId: number) => {
-    const ids = new Set<number>(
-      (tableDetails?.partitionInfo?.columns || []).map((c: any) =>
-        Number(c.columnId),
-      ),
-    );
-    return ids.has(Number(columnId));
-  };
+  const isPartitionColumnId = (columnId: number) =>
+    partitionColumnIds.has(Number(columnId));
92-108: Dialog closes before mutation result — mutation errors won't re-open the dialog.
`handleConfirmAddColumn` (and the other confirm handlers) close the dialog and reset state immediately after calling `mutate()`. If the mutation fails, the user sees a toast but must re-open the dialog and re-enter all fields. Consider keeping the dialog open until `onSuccess`/`onSettled`, or use `mutateAsync` with try/catch.
This applies to all four confirm handlers (add, drop, rename, alter type).
src/renderer/components/dataLake/tableDetails/TablePartitionsTab.tsx (2)
82-94: Prefer a composite key over array index for file partition values.
`DuckLakeFilePartitionValue` has `dataFileId` and `partitionKeyIndex` which together form a natural unique key. Using the array index (`idx`) can cause React reconciliation issues if the list is ever reordered or filtered.
♻️ Suggested key
- (val: DuckLakeFilePartitionValue, idx: number) => (
-   <TableRow key={idx}>
+ (val: DuckLakeFilePartitionValue) => (
+   <TableRow key={`${val.dataFileId}-${val.partitionKeyIndex}`}>
21-23: Consider stronger typing for `tableDetails`.
`tableDetails: any` loses all type safety. If there's a shared `DuckLakeTableDetails` interface (or similar) used by sibling tab components, using it here would catch shape mismatches at compile time.
src/renderer/components/dataLake/DataLakeTables.tsx (2)
102-115: Delete and rename mutations fire-and-forget: dialog closes before result.
`handleConfirmDelete` (and similarly `handleConfirmRename`) closes the dialog and clears state immediately after calling `mutate`, not in the mutation's `onSuccess`. If the mutation is slow or fails, the user has already lost the dialog context. Consider moving the dialog-close logic into the `onSuccess`/`onError` callbacks of the `.mutate()` call.
♻️ Example for delete
  const handleConfirmDelete = () => {
    if (!tableToDelete) {
      setDeleteDialogOpen(false);
      return;
    }
    deleteTableMutation.mutate(
      {
        instanceId: tableToDelete.instanceId,
        tableName: tableToDelete.name,
      },
+     {
+       onSettled: () => {
+         setDeleteDialogOpen(false);
+         setTableToDelete(null);
+       },
+     },
    );
-
-   setDeleteDialogOpen(false);
-   setTableToDelete(null);
  };
123-138: Rename allows submitting the same name and doesn't trim whitespace.
The disabled check on Line 385 blocks empty/whitespace-only names, but `newTableName` is submitted untrimmed (Line 132). Also, renaming to the same name isn't blocked—this will hit the backend and return an error. A quick client-side guard would improve UX.
♻️ Proposed fix
  const handleConfirmRename = () => {
    if (!tableToRename) {
      setRenameDialogOpen(false);
      return;
    }
+   const trimmedName = newTableName.trim();
+   if (trimmedName === tableToRename.name) return;
+
    renameTableMutation.mutate({
      instanceId: tableToRename.instanceId,
      oldName: tableToRename.name,
-     newName: newTableName,
+     newName: trimmedName,
    });
526-791: Significant duplication between the Update and Delete dialog WHERE-condition builders.
The WHERE section UI (column select, operator select, value input, add/remove row) is nearly identical between the Update dialog (Lines 638–752) and Delete dialog (Lines 803–917). Extracting a shared `WhereConditionsBuilder` component would eliminate ~120 lines of duplication and keep the two dialogs in sync for future changes.
src/renderer/controllers/duckLake.controller.ts (1)
54-396: Extract shared cache invalidation logic to reduce duplication.
Nine of the new hooks contain the identical four-line invalidation block for `tableDetails`, `snapshots`, `table`, and `tables`. A small helper would DRY this up and ensure consistency if keys are added/changed later.
♻️ Suggested helper
function invalidateTableCaches(
  queryClient: ReturnType<typeof useQueryClient>,
  instanceId: string,
  tableName: string,
) {
  queryClient.invalidateQueries(duckLakeKeys.tableDetails(instanceId, tableName));
  queryClient.invalidateQueries(duckLakeKeys.snapshots(instanceId, tableName));
  queryClient.invalidateQueries(duckLakeKeys.table(instanceId, tableName));
  queryClient.invalidateQueries(duckLakeKeys.tables(instanceId));
}
Then in each hook's `onSuccess`:
- queryClient.invalidateQueries(duckLakeKeys.tableDetails(instanceId, tableName));
- queryClient.invalidateQueries(duckLakeKeys.snapshots(instanceId, tableName));
- queryClient.invalidateQueries(duckLakeKeys.table(instanceId, tableName));
- queryClient.invalidateQueries(duckLakeKeys.tables(instanceId));
+ invalidateTableCaches(queryClient, instanceId, tableName);
src/main/services/duckLake/adapters/postgresql.adapter.ts (1)
267-349: `updateRows`, `deleteRows`, and `upsertRows` are nearly identical — extract shared logic.
All three methods share the same pattern: connection check → run query → `SELECT changes()` → parse hugeint → return `rowsAffected`. Only the error message differs. Consider extracting a private helper.
Proposed helper extraction
+ private async executeAndCountChanges( + query: string, + operationName: string, + ): Promise<{ rowsAffected: number }> { + if (!this.connectionInfo) { + throw new Error('No active connection'); + } + try { + await this.connectionInfo.connection.run(query); + const changesResult = await this.connectionInfo.connection.run( + 'SELECT changes() as changes', + ); + const rows = await changesResult.getRows(); + const value = rows?.[0]?.[0] ?? 0; + const numeric = + typeof value === 'object' && value?.hugeint !== undefined + ? Number(value.hugeint) + : Number(value); + return { rowsAffected: Number.isFinite(numeric) ? numeric : 0 }; + } catch (error) { + console.error(`Failed to ${operationName} rows:`, error); + throw error; + } + }Then each method becomes a one-liner delegating to this helper. This could also be lifted to the base adapter since SQLite uses the same pattern.
src/main/services/duckLake/adapters/sqlite.adapter.ts (1)
88-348: All mutation methods are duplicated verbatim across DuckDB, PostgreSQL, and SQLite adapters.
Since all three adapters execute DuckDB SQL through the same DuckDB engine (only the attach/connect path differs), the mutation methods (`restoreSnapshot`, `addColumn`, `dropColumn`, `renameColumn`, `alterColumnType`, `setPartitionedBy`, `renameTable`, `deleteTable`, `updateRows`, `deleteRows`, `upsertRows`) are identical across adapters. Only the error log prefix changes (e.g., "SQLite" vs "PostgreSQL").
Consider implementing these once in the base `CatalogAdapter` class as concrete (non-abstract) methods, since they all share the same `this.connectionInfo.connection.run(...)` pattern.
Sketch: lift common mutations to the base class
// In base.adapter.ts — replace the abstract declarations with concrete implementations: - abstract deleteTable(tableName: string): Promise<void>; + async deleteTable(tableName: string): Promise<void> { + if (!this.connectionInfo) throw new Error('No active connection'); + const escaped = tableName.replace(/"/g, '""'); + await this.connectionInfo.connection.run(`DROP TABLE IF EXISTS "${escaped}"`); + } // Same for addColumn, dropColumn, renameColumn, alterColumnType, // setPartitionedBy, renameTable, restoreSnapshot, and the row mutation methods.This eliminates ~700+ lines of near-identical code and ensures behavioral consistency across adapters. Any adapter needing custom behavior can override the method.
src/main/services/duckLake/adapters/base.adapter.ts (2)
542-568: `mapResultRow` — currently unused; adapters perform domain-specific row mapping elsewhere.
This utility centralizes array-to-object row normalization, but it is not called anywhere in the codebase. The adapters (duckdb, postgresql, sqlite) inline equivalent logic extensively—checking `Array.isArray(row)` and accessing `row[0]`, `row[1]`, etc. to build typed domain objects (SnapshotMetadata, ColumnStatistics, DataFileMetadata, etc.).
Consider adding a TODO comment if this is intended for future use, or refactor adapters to use `mapResultRow` if its generic column-mapping behavior aligns with adapter needs.
517-540: `getMetadataDatabaseName` — apply to adapters to eliminate duplication.
The extracted helper method is defined but not used. All three adapter implementations (DuckDB, PostgreSQL, SQLite) inline the same metadata database lookup logic across multiple methods instead of calling this helper. This creates ~10+ duplicate query patterns throughout the codebase.
| async updateRows( | ||
| _tableName: string, | ||
| updateQuery: string, | ||
| ): Promise<{ rowsAffected: number }> { | ||
| try { | ||
| if (!this.connectionInfo) { | ||
| throw new Error('No active connection'); | ||
| } | ||
|
|
||
| const result = await this.connectionInfo.connection.run(updateQuery); | ||
|
|
||
| // Try to get rows to determine affected count (requires RETURNING clause in query) | ||
| let rowsAffected = 0; | ||
| try { | ||
| const rows = await result.getRows(); | ||
| if (rows && rows.length > 0) { | ||
| rowsAffected = rows.length; | ||
| } | ||
| } catch (e) { | ||
| // Ignore errors if result doesn't support getRows | ||
| } | ||
|
|
||
| return { rowsAffected }; | ||
| } catch (error) { | ||
| // eslint-disable-next-line no-console | ||
| console.error('Failed to update rows:', error); | ||
| throw error; | ||
| } | ||
| } | ||
|
|
||
| async deleteRows( | ||
| _tableName: string, | ||
| deleteQuery: string, | ||
| ): Promise<{ rowsAffected: number }> { | ||
| try { | ||
| if (!this.connectionInfo) { | ||
| throw new Error('No active connection'); | ||
| } | ||
|
|
||
| const result = await this.connectionInfo.connection.run(deleteQuery); | ||
|
|
||
| let rowsAffected = 0; | ||
| try { | ||
| const rows = await result.getRows(); | ||
| if (rows && rows.length > 0) { | ||
| rowsAffected = rows.length; | ||
| } | ||
| } catch (e) { | ||
| // Ignore | ||
| } | ||
|
|
||
| return { rowsAffected }; | ||
| } catch (error) { | ||
| // eslint-disable-next-line no-console | ||
| console.error('Failed to delete rows:', error); | ||
| throw error; | ||
| } | ||
| } | ||
|
|
||
| async upsertRows( | ||
| _tableName: string, | ||
| upsertQuery: string, | ||
| ): Promise<{ rowsAffected: number }> { | ||
| try { | ||
| if (!this.connectionInfo) { | ||
| throw new Error('No active connection'); | ||
| } | ||
|
|
||
| const result = await this.connectionInfo.connection.run(upsertQuery); | ||
|
|
||
| let rowsAffected = 0; | ||
| try { | ||
| const rows = await result.getRows(); | ||
| if (rows && rows.length > 0) { | ||
| rowsAffected = rows.length; | ||
| } | ||
| } catch (e) { | ||
| // Ignore | ||
| } | ||
|
|
||
| return { rowsAffected }; | ||
| } catch (error) { | ||
| // eslint-disable-next-line no-console | ||
| console.error('Failed to upsert rows:', error); | ||
| throw error; | ||
| } | ||
| } |
DuckDB rowsAffected counting is unreliable compared to PostgreSQL/SQLite adapters.
The DuckDB adapter counts affected rows via result.getRows().length, which only works if the query includes a RETURNING clause. Without it, rowsAffected will always be 0. In contrast, the PostgreSQL and SQLite adapters use SELECT changes() which reliably returns the count regardless of the query shape.
Since all three adapters run queries through DuckDB's engine (DuckLake), SELECT changes() should work here too.
Proposed fix: use `changes()` consistently
async updateRows(
_tableName: string,
updateQuery: string,
): Promise<{ rowsAffected: number }> {
try {
if (!this.connectionInfo) {
throw new Error('No active connection');
}
- const result = await this.connectionInfo.connection.run(updateQuery);
-
- // Try to get rows to determine affected count (requires RETURNING clause in query)
- let rowsAffected = 0;
- try {
- const rows = await result.getRows();
- if (rows && rows.length > 0) {
- rowsAffected = rows.length;
- }
- } catch (e) {
- // Ignore errors if result doesn't support getRows
- }
-
- return { rowsAffected };
+ await this.connectionInfo.connection.run(updateQuery);
+ const changesResult = await this.connectionInfo.connection.run(
+ 'SELECT changes() as changes',
+ );
+ const rows = await changesResult.getRows();
+ const value = rows?.[0]?.[0] ?? 0;
+ const numeric =
+ typeof value === 'object' && value?.hugeint !== undefined
+ ? Number(value.hugeint)
+ : Number(value);
+ return { rowsAffected: Number.isFinite(numeric) ? numeric : 0 };
} catch (error) {
// eslint-disable-next-line no-console
console.error('Failed to update rows:', error);
throw error;
}
}Apply the same pattern to deleteRows and upsertRows.
🤖 Prompt for AI Agents
In `@src/main/services/duckLake/adapters/duckdb.adapter.ts` around lines 265 -
351, The affected-row counting in updateRows, deleteRows, and upsertRows is
unreliable because it uses result.getRows() (requires RETURNING); replace that
logic by executing a follow-up "SELECT changes() AS changes" on the same DuckDB
connection (this.connectionInfo.connection) after the initial run, parse the
first row's changes value into rowsAffected (fallback to 0 if missing), and
return that value; update the functions updateRows, deleteRows, and upsertRows
to perform this SELECT changes() query instead of trying to read
result.getRows().
| <Table size="small"> | ||
| <TableBody> | ||
| <TableRow> | ||
| <TableCell> | ||
| <strong>Record Count</strong> | ||
| </TableCell> | ||
| <TableCell> | ||
| {formatNumber(tableDetails.stats.recordCount)} | ||
| </TableCell> | ||
| </TableRow> | ||
| <TableRow> | ||
| <TableCell> | ||
| <strong>File Size</strong> | ||
| </TableCell> | ||
| <TableCell> | ||
| {formatBytes(tableDetails.stats.fileSizeBytes)} | ||
| </TableCell> | ||
| </TableRow> | ||
| <TableRow> | ||
| <TableCell> | ||
| <strong>Next Row ID</strong> | ||
| </TableCell> | ||
| <TableCell> | ||
| {formatNumber(tableDetails.stats.nextRowId)} | ||
| </TableCell> | ||
| </TableRow> | ||
| <TableRow> | ||
| <TableCell> | ||
| <strong>Columns</strong> | ||
| </TableCell> | ||
| <TableCell>{tableDetails.columns.length}</TableCell> | ||
| </TableRow> | ||
| <TableRow> | ||
| <TableCell> | ||
| <strong>Data Files</strong> | ||
| </TableCell> | ||
| <TableCell>{tableDetails.dataFiles.length}</TableCell> | ||
| </TableRow> |
Potential runtime error if tableDetails.stats is missing or has unexpected shape.
formatNumber and formatBytes are called directly on tableDetails.stats.recordCount, tableDetails.stats.fileSizeBytes, and tableDetails.stats.nextRowId. If stats is ever null/undefined (e.g., newly created table with no stats yet), this will throw. Given the any typing, the compiler won't catch this.
Consider adding a guard or defaulting: tableDetails.stats?.recordCount ?? 0.
🤖 Prompt for AI Agents
In `@src/renderer/components/dataLake/tableDetails/TableOverviewTab.tsx` around
lines 96 - 133, The JSX in TableOverviewTab reads
tableDetails.stats.recordCount, fileSizeBytes and nextRowId directly which can
throw if stats is null/undefined; update the TableOverviewTab render to use safe
access and defaults (e.g., tableDetails.stats?.recordCount ?? 0 for
formatNumber, tableDetails.stats?.fileSizeBytes ?? 0 for formatBytes, and
tableDetails.stats?.nextRowId ?? 0) and also defensively handle arrays like
tableDetails.columns?.length ?? 0 and tableDetails.dataFiles?.length ?? 0 so the
component never dereferences undefined.
| {tableDetails.partitionInfo ? ( | ||
| <Box> | ||
| <Typography variant="subtitle1" gutterBottom> | ||
| Partition Columns ({tableDetails.partitionInfo.columns.length}) | ||
| </Typography> | ||
| <TableContainer sx={{ mb: 3 }}> | ||
| <Table> | ||
| <TableHead> | ||
| <TableRow> | ||
| <TableCell>Key Index</TableCell> | ||
| <TableCell>Column Name</TableCell> | ||
| <TableCell>Transform</TableCell> | ||
| </TableRow> | ||
| </TableHead> | ||
| <TableBody> | ||
| {tableDetails.partitionInfo.columns.map( | ||
| (col: DuckLakePartitionColumn) => ( | ||
| <TableRow | ||
| key={`${col.partitionId}-${safeToString(col.partitionKeyIndex)}`} | ||
| > | ||
| <TableCell> | ||
| {safeToString(col.partitionKeyIndex)} | ||
| </TableCell> | ||
| <TableCell> | ||
| <strong>{col.columnName}</strong> | ||
| </TableCell> | ||
| <TableCell>{col.transform || 'identity'}</TableCell> | ||
| </TableRow> | ||
| ), | ||
| )} | ||
| </TableBody> | ||
| </Table> | ||
| </TableContainer> | ||
|
|
||
| <Typography variant="subtitle1" gutterBottom> | ||
| File Partition Values ( | ||
| {tableDetails.partitionInfo.filePartitionValues.length}) |
Guard against missing columns or filePartitionValues when partitionInfo is truthy.
If partitionInfo exists but columns or filePartitionValues is undefined/null, accessing .length and .map will throw at runtime (Lines 37, 49, 70, 82). Consider optional chaining or default arrays.
🛡️ Proposed safeguard
- {tableDetails.partitionInfo ? (
+ {tableDetails.partitionInfo ? (
<Box>
<Typography variant="subtitle1" gutterBottom>
- Partition Columns ({tableDetails.partitionInfo.columns.length})
+ Partition Columns ({tableDetails.partitionInfo.columns?.length ?? 0})
</Typography>
<TableContainer sx={{ mb: 3 }}>
<Table>
...
<TableBody>
- {tableDetails.partitionInfo.columns.map(
+ {(tableDetails.partitionInfo.columns ?? []).map(Apply the same pattern to filePartitionValues.
🤖 Prompt for AI Agents
In `@src/renderer/components/dataLake/tableDetails/TablePartitionsTab.tsx` around
lines 34 - 70, tableDetails.partitionInfo may be truthy while its arrays are
undefined, causing .length and .map to crash; update all usages inside
TablePartitionsTab (references: tableDetails.partitionInfo,
partitionInfo.columns, partitionInfo.filePartitionValues, and
DuckLakePartitionColumn mapping) to guard with optional chaining or default
empty arrays (e.g., use partitionInfo?.columns ?? [] for .map and
(partitionInfo?.columns ?? []).length for counts) so safeToString and the row
render logic never receive undefined.
export const formatBytes = (bytes: number) => {
  if (bytes === 0) return '0 Bytes';
  const k = 1024;
  const sizes = ['Bytes', 'KB', 'MB', 'GB', 'TB'];
  const i = Math.floor(Math.log(bytes) / Math.log(k));
  return `${Math.round((bytes / k ** i) * 100) / 100} ${sizes[i]}`;
};
formatBytes breaks on negative or extremely large values.
Math.log(negative) returns NaN, causing the output to be "NaN undefined". Similarly, values beyond the TB range will index past the sizes array, producing undefined as the unit.
🛡️ Proposed defensive fix
export const formatBytes = (bytes: number) => {
if (bytes === 0) return '0 Bytes';
+ if (bytes < 0) return '-';
const k = 1024;
const sizes = ['Bytes', 'KB', 'MB', 'GB', 'TB'];
- const i = Math.floor(Math.log(bytes) / Math.log(k));
+ const i = Math.min(Math.floor(Math.log(bytes) / Math.log(k)), sizes.length - 1);
return `${Math.round((bytes / k ** i) * 100) / 100} ${sizes[i]}`;
};🤖 Prompt for AI Agents
In `@src/renderer/helpers/utils.ts` around lines 423 - 429, formatBytes currently
passes negative numbers to Math.log and can index past the sizes array for very
large values; change formatBytes to record the original sign, work with absBytes
= Math.abs(bytes) (handle non-finite/NaN by returning '0 Bytes' or similar),
compute i = Math.floor(Math.log(absBytes) / Math.log(k)) but clamp i to
sizes.length - 1, then format using sizes[i] and reapply the negative sign
prefix if the original bytes was negative; ensure all references are to
formatBytes, sizes, i, k and guard against division by zero when absBytes === 0.
'ducklake:table:setPartitionedBy': (
  instanceId: string,
  tableName: string,
  columnNames: string[],
) => Promise<void>;
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Check if the other new ducklake:table:* IPC channels are declared in ipc.ts or elsewhere
rg -n "ducklake:table:rename\b|ducklake:table:addColumn|ducklake:table:dropColumn|ducklake:table:renameColumn|ducklake:table:alterColumnType|ducklake:table:updateRows|ducklake:table:deleteRows|ducklake:table:upsertRows" --type=ts -g '*.ts' -g '!*.ipcHandlers.ts'Repository: rosettadb/dbt-studio
Length of output: 1139
🏁 Script executed:
# Also check duckLake.ts to see all declared channels
cat -n src/types/duckLake.ts | grep -A 50 "DuckLakeIpcChannels"Repository: rosettadb/dbt-studio
Length of output: 2099
🏁 Script executed:
# Check ipc.ts for DuckLake channel declarations
cat -n src/types/ipc.ts | head -100Repository: rosettadb/dbt-studio
Length of output: 3372
🏁 Script executed:
# Search for where IPC handlers are defined
fd -e ts -x grep -l "ducklake:table:" {}Repository: rosettadb/dbt-studio
Length of output: 197
🏁 Script executed:
# View the context around lines 264-272 in ipc.ts
sed -n '250,280p' src/types/ipc.tsRepository: rosettadb/dbt-studio
Length of output: 999
🏁 Script executed:
# Also check if there are more DuckLake channels after line 272
sed -n '260,290p' src/types/ipc.tsRepository: rosettadb/dbt-studio
Length of output: 935
🏁 Script executed:
# View more of the DuckLakeIpcChannels interface in duckLake.ts to see all declared channels
sed -n '512,600p' src/types/duckLake.ts | head -100Repository: rosettadb/dbt-studio
Length of output: 2796
🏁 Script executed:
# Check if there are more table management channels in the duckLake.ts interface
rg -A 100 "export interface DuckLakeIpcChannels" src/types/duckLake.ts | grep "ducklake:table"Repository: rosettadb/dbt-studio
Length of output: 298
Add missing DuckLake IPC channel type signatures to the interface.
Eight of the nine new IPC channels are declared in src/types/ipc.ts but missing from the DuckLakeIpcChannels interface in src/types/duckLake.ts. Only setPartitionedBy has been added to the interface. The missing channels are: ducklake:table:rename, ducklake:table:addColumn, ducklake:table:dropColumn, ducklake:table:renameColumn, ducklake:table:alterColumnType, ducklake:table:updateRows, ducklake:table:deleteRows, and ducklake:table:upsertRows. Add their type signatures to the interface to keep all DuckLake channel declarations consistent in one location.
🤖 Prompt for AI Agents
In `@src/types/duckLake.ts` around lines 544 - 548, The DuckLakeIpcChannels
interface is missing type signatures for eight IPC channels declared elsewhere;
update the DuckLakeIpcChannels interface (the type that currently contains
'ducklake:table:setPartitionedBy') to include the following method signatures
with matching parameter shapes: 'ducklake:table:rename' (instanceId: string,
oldName: string, newName: string) => Promise<void>, 'ducklake:table:addColumn'
(instanceId: string, tableName: string, columnDef: any) => Promise<void>,
'ducklake:table:dropColumn' (instanceId: string, tableName: string, columnName:
string) => Promise<void>, 'ducklake:table:renameColumn' (instanceId: string,
tableName: string, oldName: string, newName: string) => Promise<void>,
'ducklake:table:alterColumnType' (instanceId: string, tableName: string,
columnName: string, newType: string) => Promise<void>,
'ducklake:table:updateRows' (instanceId: string, tableName: string, filter: any,
updates: any) => Promise<number>, 'ducklake:table:deleteRows' (instanceId:
string, tableName: string, filter: any) => Promise<number>, and
'ducklake:table:upsertRows' (instanceId: string, tableName: string, rows: any[])
=> Promise<number> so the interface matches the IPC channel declarations.
| 'ducklake:table:rename'
| 'ducklake:table:addColumn'
| 'ducklake:table:dropColumn'
| 'ducklake:table:renameColumn'
| 'ducklake:table:alterColumnType'
| 'ducklake:table:setPartitionedBy'
| 'ducklake:table:updateRows'
| 'ducklake:table:deleteRows'
| 'ducklake:table:upsertRows'
| 'ducklake:table:queryChanges'
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Search for all occurrences of queryChanges
echo "=== All occurrences of 'queryChanges' ==="
rg 'queryChanges' --type=ts -i -n
echo ""
echo "=== Checking src/types/ipc.ts for context ==="
rg -B5 -A5 'queryChanges' src/types/ipc.ts
echo ""
echo "=== Looking for handler registration/wiring ==="
rg 'on.*queryChanges|queryChanges.*handler|ipcMain.*queryChanges' --type=ts -n
echo ""
echo "=== Checking for service implementation files ==="
find . -name '*duckLake*service*' -o -name '*ipc*handler*' -o -name '*channel*handler*' | grep -v node_modulesRepository: rosettadb/dbt-studio
Length of output: 752
🏁 Script executed:
#!/bin/bash
echo "=== Main service file ==="
cat -n src/main/services/duckLake.service.ts | grep -i 'changes\|query' || echo "No matches found"
echo ""
echo "=== Renderer service file ==="
cat -n src/renderer/services/duckLake.service.ts | grep -i 'changes\|query' || echo "No matches found"
echo ""
echo "=== Check surrounding channels for implementation pattern ==="
echo "Checking if updateRows/deleteRows/upsertRows have handlers:"
rg 'updateRows|deleteRows|upsertRows' --type=ts -n | head -20Repository: rosettadb/dbt-studio
Length of output: 4668
Remove ducklake:table:queryChanges from the channel union or implement its handler and service method.
This channel is declared in the type union but has no corresponding service method, controller handler, or IPC wiring anywhere in the codebase. All other table mutation channels (updateRows, deleteRows, upsertRows) have complete implementations with service methods in src/renderer/services/duckLake.service.ts and handlers in src/renderer/controllers/duckLake.controller.ts. Either remove this dead channel or implement it fully.
🤖 Prompt for AI Agents
In `@src/types/ipc.ts` around lines 264 - 273, The union includes the channel
'ducklake:table:queryChanges' but there is no service/controller/IPC handler for
it; either remove 'ducklake:table:queryChanges' from the IPC union in the types,
or implement it by adding a service method (e.g., add queryChanges(...) to
DuckLakeService), a controller handler (e.g.,
duckLakeController.handleQueryChanges that calls DuckLakeService.queryChanges),
and wire the controller to the IPC dispatcher/registration so the
'ducklake:table:queryChanges' channel is handled consistently with other table
mutation channels like updateRows/deleteRows/upsertRows.
…aining and injection, replacing inline query validation in `updateTable`, `deleteTable`, and `upsertTable`.
…led file/row statistics in the DuckDB adapter and UI.
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@src/main/services/duckLake.service.ts`:
- Around line 75-88: The SQL comment detection currently only matches `--` at
line ends; update the suspiciousPatterns/check to catch inline `--` comments too
by either adding a pattern like /--/m or /--(?=\s|$)/ to the suspiciousPatterns
array or, preferably, strip SQL comments (both `--` single-line and `/*...*/`
multi-line) from queryWithoutTerminalSemicolon before running the
prefix/validation checks; make sure the code that throws
DuckLakeError.validation('Query contains suspicious patterns that may indicate
SQL injection') still references the same suspiciousPatterns and
queryWithoutTerminalSemicolon so all inline `--` occurrences are detected and
cause the validation error.
In `@src/main/services/duckLake/adapters/duckdb.adapter.ts`:
- Around line 87-105: The restoreSnapshot implementation uses invalid DuckDB
syntax and will lose DuckLake lineage by recreating the table; update the
restore logic in the restoreSnapshot method (and the analogous method in
postgresql.adapter.ts) to use DuckLake time-travel syntax (SELECT ... AT
(VERSION => snapshotId)) instead of `FOR SYSTEM_VERSION AS OF`, and restore data
by inserting the time-traveled rows into the existing table to preserve row
lineage (e.g., run a transaction that truncates or deletes current rows from the
target table and then runs an INSERT INTO "<escapedName>" SELECT * FROM
"<escapedName>" AT (VERSION => snapshotId) via connectionInfo.connection.run),
ensuring you keep the escapedName handling and error handling around
connectionInfo and rethrow on failure.
🧹 Nitpick comments (3)
src/main/services/duckLake.service.ts (1)
112-174: Significant code duplication across DDL methods.Every method repeats the same pattern: validate inputs →
listTables→ check existence →getTableDetails→ check columns → call adapter. Consider extracting shared helpers likeensureTableExists(adapter, tableName)andgetActiveColumn(adapter, tableName, columnName)to reduce boilerplate and make future changes easier.Also applies to: 176-229, 231-302, 304-367, 369-425, 427-465
src/renderer/components/dataLake/DataLakeInstanceDetails.tsx (1)
763-784: LocalsafeToStringduplicates the shared utility insrc/renderer/helpers/utils.ts.This component defines its own
safeToStringwith a slightly different implementation (it additionally handlesBigInt). Consider consolidating into the shared utility — add theBigInthandling there and import it here to avoid divergence.src/renderer/components/dataLake/tableDetails/TableHistoryTab.tsx (1)
28-32:tableDetailsprop is typed asany.Consider using the
DuckLakeTableDetailstype fromsrc/types/duckLake.tsfor better type safety, which would catch issues like accessingtableDetails.snapshotswhen it might not exist.
// Additional check: ensure no suspicious patterns that could indicate injection
// Check for common SQL injection patterns after semicolon removal
const suspiciousPatterns = [
  /--\s*$/m, // SQL comments at the end
  /\/\*.*\*\//s, // Multi-line comments
];

suspiciousPatterns.forEach((pattern) => {
  if (pattern.test(queryWithoutTerminalSemicolon)) {
    throw DuckLakeError.validation(
      'Query contains suspicious patterns that may indicate SQL injection',
    );
  }
});
SQL comment detection is incomplete — inline -- comments are not caught.
The regex /--\s*$/m only matches -- at the end of a line. A query like UPDATE t SET x=1 -- comment\nDROP TABLE t would pass all checks because the semicolon check was already stripped by the earlier logic and the -- isn't at EOL in the trimmed single-statement form. Consider anchoring on -- anywhere (not just at line end) or stripping comments before prefix validation.
🤖 Prompt for AI Agents
In `@src/main/services/duckLake.service.ts` around lines 75 - 88, The SQL comment
detection currently only matches `--` at line ends; update the
suspiciousPatterns/check to catch inline `--` comments too by either adding a
pattern like /--/m or /--(?=\s|$)/ to the suspiciousPatterns array or,
preferably, strip SQL comments (both `--` single-line and `/*...*/` multi-line)
from queryWithoutTerminalSemicolon before running the prefix/validation checks;
make sure the code that throws DuckLakeError.validation('Query contains
suspicious patterns that may indicate SQL injection') still references the same
suspiciousPatterns and queryWithoutTerminalSemicolon so all inline `--`
occurrences are detected and cause the validation error.
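For illustration, a comment-stripping pass run before the prefix/semicolon checks could look like this sketch; note that a plain regex pass like this can also strip "--" or "/*" inside string literals, so the project may prefer a real tokenizer.

// Hypothetical helper: remove SQL comments before validating the statement.
function stripSqlComments(sql: string): string {
  return sql
    .replace(/\/\*[\s\S]*?\*\//g, ' ') // block comments
    .replace(/--[^\n\r]*/g, ' ');      // line comments, wherever they start
}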
async restoreSnapshot(tableName: string, snapshotId: number): Promise<void> {
  try {
    if (!this.connectionInfo) {
      throw new Error('No active connection');
    }

    const escapedName = tableName.replace(/"/g, '""');
    await this.connectionInfo.connection.run(
      `CREATE OR REPLACE TABLE "${escapedName}" AS SELECT * FROM "${escapedName}" FOR SYSTEM_VERSION AS OF ${snapshotId}`,
    );
  } catch (error) {
    // eslint-disable-next-line no-console
    console.error(
      `Failed to restore DuckDB table ${tableName} to snapshot ${snapshotId}:`,
      error,
    );
    throw error;
  }
}
🧩 Analysis chain
🌐 Web query:
DuckLake snapshot restore preserve lineage metadata
💡 Result:
In DuckLake, “restoring” a snapshot is typically done via time travel (querying or attaching the catalog at an older snapshot), not by rewriting the current head in-place. When you do this, you’re reading the exact historical metadata + data state recorded by that snapshot. [1]
-
Read/query an old snapshot (preserves the snapshot’s state/metadata):
- Table time travel:
SELECT * FROM my_table AT (VERSION => <snapshot_id>); [1]
- Catalog attach at a snapshot:
ATTACH 'ducklake:metadata.duckdb' (SNAPSHOT_VERSION <snapshot_id>);[1]
Attaching with `SNAPSHOT_VERSION`/`SNAPSHOT_TIME` is read-only, so it won’t “move HEAD”; it just exposes that historical state for querying/exporting. [2]
- Table time travel:
-
Row lineage (“rowid”) preservation:
DuckLake assigns each row a unique `rowid` on first insert and preserves it when rows are moved between files (e.g., UPDATE/compaction). So if you access an older snapshot via time travel, the `rowid` lineage for rows in that table history is preserved. [3]
What breaks lineage/history:
If you “restore” by re-inserting data into a new table (or rewriting tables as new inserts), those rows get newrowids (new lineage). (Becauserowidis assigned on first insert.) [3]
If you expire snapshots, you delete historical snapshots (and thus the ability to time travel to them / retain their historical view). [4]
Rule of thumb: use time travel/attach-at-snapshot to preserve historical state and lineage; use “copy data out then reinsert” only if you accept losing original row lineage. [1][3][2]
Sources: [1] Time Travel – DuckLake (ducklake.select), [2] DuckLake attachment access modes (snapshot attach forces read-only) (deepwiki.com), [3] Row Lineage – DuckLake (ducklake.select), [4] Expire Snapshots – DuckLake (ducklake.select)
Citations:
- 1: https://ducklake.select/docs/stable/duckdb/usage/time_travel
- 2: https://deepwiki.com/duckdb/ducklake/2.2-database-initialization-and-attachment?utm_source=openai
- 3: https://ducklake.select/docs/stable/duckdb/advanced_features/row_lineage.html
- 4: https://ducklake.select/docs/stable/duckdb/maintenance/expire_snapshots?utm_source=openai
🌐 Web query:
DuckDB CREATE OR REPLACE TABLE metadata system_version_as_of
💡 Result:
DuckDB doesn’t have a special SYSTEM_VERSION_AS_OF / system_version_as_of clause for CREATE OR REPLACE TABLE. If your goal is to materialize an Iceberg table “as of” a snapshot into a (local) DuckDB table, do it with CTAS + Iceberg time-travel parameters:
-- optional: keep these tables in a separate schema
CREATE SCHEMA IF NOT EXISTS metadata;
-- pick a snapshot_id (see iceberg_snapshots() below), then materialize it
CREATE OR REPLACE TABLE metadata.system_version_as_of AS
SELECT *
FROM iceberg_scan('path_or_table', snapshot_from_id = 7635660646343998149);
To discover available snapshots:
SELECT *
FROM iceberg_snapshots('path_or_table');Notes:
CREATE OR REPLACE TABLE ... AS SELECT ...overwrites the table if it already exists. [1]- Iceberg “as of” is done via
snapshot_from_idorsnapshot_from_timestampparameters (not a SQL... AS OF ...clause). [2]
Sources: DuckDB CREATE OR REPLACE TABLE docs [1], DuckDB Iceberg extension docs (snapshot parameters + iceberg_snapshots) [2].
[1] (duckdb.org)
[2] (duckdb.org)
Citations:
- 1: https://duckdb.org/docs/stable/sql/statements/create_table?utm_source=openai
- 2: https://duckdb.org/docs/stable/core_extensions/iceberg/overview
🏁 Script executed:
# Search for DuckLake-related code to understand metadata/lineage handling
rg -i "duckLake|lineage|metadata" src/main/services/duckLake --type ts -A 3 -B 1Repository: rosettadb/dbt-studio
Length of output: 50377
🏁 Script executed:
# Look for snapshot-related code and tests
find . -type f -name "*snapshot*" | head -20

Repository: rosettadb/dbt-studio
Length of output: 46
🏁 Script executed:
# Check for DuckLake documentation or comments about restore behavior
rg -i "restore|snapshot" src/main/services/duckLake --type ts -B 2 -A 5Repository: rosettadb/dbt-studio
Length of output: 50377
Invalid DuckDB syntax and wrong restore semantics — fix required.
The FOR SYSTEM_VERSION AS OF syntax is not valid DuckDB SQL. DuckDB doesn't support temporal table syntax; it uses the DuckLake extension with time-travel semantics. The correct approach is to use SELECT * FROM table_name AT (VERSION => snapshot_id) or attach at a specific snapshot version.
Additionally, CREATE OR REPLACE TABLE ... AS SELECT * loses DuckLake row lineage — each re-inserted row gets a new rowid rather than preserving the original. Per DuckLake design, snapshot restore should use time-travel queries instead of table replacement.
This pattern appears in both duckdb.adapter.ts and postgresql.adapter.ts.
🤖 Prompt for AI Agents
In `@src/main/services/duckLake/adapters/duckdb.adapter.ts` around lines 87 - 105,
The restoreSnapshot implementation uses invalid DuckDB syntax and will lose
DuckLake lineage by recreating the table; update the restore logic in the
restoreSnapshot method (and the analogous method in postgresql.adapter.ts) to
use DuckLake time-travel syntax (SELECT ... AT (VERSION => snapshotId)) instead
of `FOR SYSTEM_VERSION AS OF`, and restore data by inserting the time-traveled
rows into the existing table to preserve row lineage (e.g., run a transaction
that truncates or deletes current rows from the target table and then runs an
INSERT INTO "<escapedName>" SELECT * FROM "<escapedName>" AT (VERSION =>
snapshotId) via connectionInfo.connection.run), ensuring you keep the
escapedName handling and error handling around connectionInfo and rethrow on
failure.
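For orientation, here is a minimal sketch of the rewrite described in the prompt above, applied to the DuckDB adapter. It assumes the same connectionInfo/connection.run API used elsewhere in this file and that DuckLake's AT (VERSION => ...) time-travel clause is available; the explicit BEGIN/COMMIT/ROLLBACK wrapping is an added assumption to keep the delete-and-reinsert atomic, not something the review prescribes:

```typescript
async restoreSnapshot(tableName: string, snapshotId: number): Promise<void> {
  if (!this.connectionInfo) {
    throw new Error('No active connection');
  }

  const escapedName = tableName.replace(/"/g, '""');
  const { connection } = this.connectionInfo;

  try {
    // Keep the delete + reinsert atomic so a failed time-travel read does not
    // leave the table empty. (Transaction wrapping is an assumption here.)
    await connection.run('BEGIN TRANSACTION');

    // Clear current rows without dropping the table.
    await connection.run(`DELETE FROM "${escapedName}"`);

    // Re-insert the historical rows via DuckLake time travel instead of
    // CREATE OR REPLACE ... FOR SYSTEM_VERSION AS OF, which is not valid DuckDB SQL.
    await connection.run(
      `INSERT INTO "${escapedName}" SELECT * FROM "${escapedName}" AT (VERSION => ${snapshotId})`,
    );

    await connection.run('COMMIT');
  } catch (error) {
    try {
      await connection.run('ROLLBACK');
    } catch {
      // Ignore rollback failures; the original error is more informative.
    }
    // eslint-disable-next-line no-console
    console.error(
      `Failed to restore table ${tableName} to snapshot ${snapshotId}:`,
      error,
    );
    throw error;
  }
}
```

Note that, per the time-travel analysis above, any re-insert still assigns new rowids; this shape preserves the table identity and its snapshot history, not per-row lineage.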
…in DuckDB pagination.
… apply it to column default clauses in database adapters to prevent SQL injection.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/main/services/duckLake/adapters/postgresql.adapter.ts (1)
901-941: ⚠️ Potential issue | 🟡 Minor

PostgreSQL adapter missing author/commitMessage columns in listInstanceSnapshots.

The DuckDB adapter dynamically detects and includes optional author, commit_message, and commit_extra_info columns via getAvailableSnapshotColumns(), but the PostgreSQL adapter's listInstanceSnapshots and getTableDetails snapshot queries omit them entirely. This creates an inconsistent API response shape across adapters.

Consider adding the same getAvailableSnapshotColumns helper and optional column detection to keep the adapter outputs uniform; a sketch follows below.
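As a rough sketch of that suggestion for the PostgreSQL adapter — the this.client field and its query(text, values) signature are placeholders for whatever Postgres client the adapter actually holds, and the metadata schema name is assumed to be passed in — the optional snapshot columns could be probed via information_schema:

```typescript
private async getAvailableSnapshotColumns(metadataSchema: string): Promise<{
  hasAuthor: boolean;
  hasCommitMessage: boolean;
  hasCommitExtraInfo: boolean;
}> {
  try {
    // Parameterized query: the schema name is never interpolated into SQL.
    const result = await this.client.query(
      `SELECT column_name
         FROM information_schema.columns
        WHERE table_schema = $1
          AND table_name = 'ducklake_snapshot_changes'`,
      [metadataSchema],
    );
    const names = result.rows.map((row: { column_name: string }) =>
      row.column_name.toLowerCase(),
    );
    return {
      hasAuthor: names.includes('author'),
      hasCommitMessage: names.includes('commit_message'),
      hasCommitExtraInfo: names.includes('commit_extra_info'),
    };
  } catch {
    // Fall back to the base snapshot columns if the metadata table
    // cannot be inspected.
    return {
      hasAuthor: false,
      hasCommitMessage: false,
      hasCommitExtraInfo: false,
    };
  }
}
```

listInstanceSnapshots and getTableDetails could then append author, commit_message, and commit_extra_info to their SELECT lists only when the corresponding flags are true, mirroring the DuckDB adapter's behavior.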
🤖 Fix all issues with AI agents
In `@src/main/services/duckLake/adapters/base.adapter.ts`:
- Around line 547-576: sanitizeDefaultValue currently uses case-sensitive
substring checks and an overly broad blocklist (e.g., includes('DROP'),
includes('CREATE')) which can be bypassed or wrongly reject benign defaults;
update sanitizeDefaultValue to either (A) perform case-insensitive,
word-boundary checks (e.g., compare defaultValue.toLowerCase() against regexes
like /\b(drop|delete|insert|update|create|alter)\b/) to avoid matching benign
substrings, or (preferred) remove the manual blocklist and instead strictly
validate against the allowed literal pattern used in validateColumnType (reuse
or mirror the simpleLiteralPattern logic) and reject anything that does not
match the expected literal form; ensure that when treating values as string
literals you still escape single quotes and wrap in quotes as currently done in
sanitizeDefaultValue.
In `@src/main/services/duckLake/adapters/duckdb.adapter.ts`:
- Around line 365-405: The PRAGMA call in getAvailableSnapshotColumns
interpolates metadataDatabase directly into a single-quoted string, risking SQL
injection if the name contains a quote; change the call to use the safe metadata
DB naming used elsewhere (e.g., getMetadataDatabaseName()) or escape/quote the
identifier with double-quotes instead of single quotes, and build the PRAGMA
string using the safe identifier so table_info gets called on
"metadataDatabase". Ensure you update the variable used in the PRAGMA query
inside getAvailableSnapshotColumns and keep the rest of the result handling
unchanged.
🧹 Nitpick comments (1)
src/main/services/duckLake/adapters/base.adapter.ts (1)
517-540: getMetadataDatabaseName — good extraction of repeated logic.

This helper consolidates the metadata database lookup pattern that's duplicated across many methods in the concrete adapters. However, note that most methods in the concrete adapters still inline this same lookup rather than calling this helper.
  protected sanitizeDefaultValue(defaultValue: string): string {
    // Check for dangerous SQL patterns
    if (
      defaultValue.includes(';') ||
      defaultValue.includes('--') ||
      defaultValue.includes('/*') ||
      defaultValue.includes('*/') ||
      defaultValue.includes('DROP') ||
      defaultValue.includes('DELETE') ||
      defaultValue.includes('INSERT') ||
      defaultValue.includes('UPDATE') ||
      defaultValue.includes('CREATE') ||
      defaultValue.includes('ALTER')
    ) {
      throw new Error(
        'Invalid default value: contains potentially dangerous SQL patterns',
      );
    }

    // Check if it's a simple literal (number, boolean, NULL)
    const simpleLiteralPattern =
      /^(NULL|TRUE|FALSE|-?\d+(\.\d+)?|CURRENT_TIMESTAMP|CURRENT_DATE|CURRENT_TIME)$/i;
    if (simpleLiteralPattern.test(defaultValue.trim())) {
      return defaultValue.trim();
    }

    // Treat as string literal: escape single quotes and wrap in quotes
    const escapedValue = defaultValue.replace(/'/g, "''");
    return `'${escapedValue}'`;
  }
sanitizeDefaultValue blocklist checks are case-sensitive — easily bypassed and overly broad.
The keyword checks on lines 554–559 use exact-case includes('DROP'), includes('DELETE'), etc., so drop table... in lowercase passes through. Meanwhile validateColumnType correctly uses .toLowerCase(). Additionally, the blocklist approach will reject legitimate default values containing these substrings (e.g., a string default of "CREATED_AT" is blocked by includes('CREATE')).
Consider either switching to .toLowerCase() for consistency and using word-boundary checks, or better yet, relying on the same pattern as validateColumnType — validate that the value matches an expected literal pattern and reject everything else:
Proposed fix sketch
  protected sanitizeDefaultValue(defaultValue: string): string {
-   // Check for dangerous SQL patterns
-   if (
-     defaultValue.includes(';') ||
-     defaultValue.includes('--') ||
-     defaultValue.includes('/*') ||
-     defaultValue.includes('*/') ||
-     defaultValue.includes('DROP') ||
-     defaultValue.includes('DELETE') ||
-     defaultValue.includes('INSERT') ||
-     defaultValue.includes('UPDATE') ||
-     defaultValue.includes('CREATE') ||
-     defaultValue.includes('ALTER')
-   ) {
+   // Check for dangerous SQL patterns (case-insensitive, word-boundary)
+   const dangerousPatterns =
+     /[;]|--|\/\*|\*\/|\b(drop|delete|insert|update|create|alter)\b/i;
+   if (dangerousPatterns.test(defaultValue)) {
      throw new Error(
        'Invalid default value: contains potentially dangerous SQL patterns',
      );
    }

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
  protected sanitizeDefaultValue(defaultValue: string): string {
    // Check for dangerous SQL patterns (case-insensitive, word-boundary)
    const dangerousPatterns =
      /[;]|--|\/\*|\*\/|\b(drop|delete|insert|update|create|alter)\b/i;
    if (dangerousPatterns.test(defaultValue)) {
      throw new Error(
        'Invalid default value: contains potentially dangerous SQL patterns',
      );
    }

    // Check if it's a simple literal (number, boolean, NULL)
    const simpleLiteralPattern =
      /^(NULL|TRUE|FALSE|-?\d+(\.\d+)?|CURRENT_TIMESTAMP|CURRENT_DATE|CURRENT_TIME)$/i;
    if (simpleLiteralPattern.test(defaultValue.trim())) {
      return defaultValue.trim();
    }

    // Treat as string literal: escape single quotes and wrap in quotes
    const escapedValue = defaultValue.replace(/'/g, "''");
    return `'${escapedValue}'`;
  }
🤖 Prompt for AI Agents
In `@src/main/services/duckLake/adapters/base.adapter.ts` around lines 547 - 576,
sanitizeDefaultValue currently uses case-sensitive substring checks and an
overly broad blocklist (e.g., includes('DROP'), includes('CREATE')) which can be
bypassed or wrongly reject benign defaults; update sanitizeDefaultValue to
either (A) perform case-insensitive, word-boundary checks (e.g., compare
defaultValue.toLowerCase() against regexes like
/\b(drop|delete|insert|update|create|alter)\b/) to avoid matching benign
substrings, or (preferred) remove the manual blocklist and instead strictly
validate against the allowed literal pattern used in validateColumnType (reuse
or mirror the simpleLiteralPattern logic) and reject anything that does not
match the expected literal form; ensure that when treating values as string
literals you still escape single quotes and wrap in quotes as currently done in
sanitizeDefaultValue.
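For completeness, here is a minimal sketch of the "preferred" allowlist-only option named in the prompt above. It keeps the existing simpleLiteralPattern, drops the keyword blocklist entirely, and relies on quote escaping for everything else; it assumes non-literal defaults should always be stored as plain strings, so raw SQL expressions as defaults would no longer pass through:

```typescript
protected sanitizeDefaultValue(defaultValue: string): string {
  const trimmed = defaultValue.trim();

  // Allowlist: simple literals pass through unchanged
  // (same pattern the current implementation already uses).
  const simpleLiteralPattern =
    /^(NULL|TRUE|FALSE|-?\d+(\.\d+)?|CURRENT_TIMESTAMP|CURRENT_DATE|CURRENT_TIME)$/i;
  if (simpleLiteralPattern.test(trimmed)) {
    return trimmed;
  }

  // Everything else is emitted strictly as a single-quoted string literal.
  // Doubling single quotes keeps user input inside the literal, so keyword
  // checks (DROP, CREATE, ...) are unnecessary and benign defaults such as
  // 'CREATED_AT' are no longer rejected.
  return `'${trimmed.replace(/'/g, "''")}'`;
}
```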
  private async getAvailableSnapshotColumns(metadataDatabase: string): Promise<{
    hasAuthor: boolean;
    hasCommitMessage: boolean;
    hasCommitExtraInfo: boolean;
  }> {
    try {
      // Check available columns in ducklake_snapshot_changes
      // We use the database name directly in the PRAGMA string
      if (!this.connectionInfo) {
        return {
          hasAuthor: false,
          hasCommitMessage: false,
          hasCommitExtraInfo: false,
        };
      }
      const tableInfoQuery = `PRAGMA table_info('${metadataDatabase}.main.ducklake_snapshot_changes')`;
      const result = await this.connectionInfo.connection.run(tableInfoQuery);
      const rows = await result.getRows();

      const columnNames = rows
        .map((row: any) => (Array.isArray(row) ? row[1] : row.name))
        .map((name: string) => name.toLowerCase());

      return {
        hasAuthor: columnNames.includes('author'),
        hasCommitMessage: columnNames.includes('commit_message'),
        hasCommitExtraInfo: columnNames.includes('commit_extra_info'),
      };
    } catch (error) {
      // eslint-disable-next-line no-console
      console.warn(
        'Failed to check table info for ducklake_snapshot_changes',
        error,
      );
      return {
        hasAuthor: false,
        hasCommitMessage: false,
        hasCommitExtraInfo: false,
      };
    }
  }
getAvailableSnapshotColumns — potential SQL injection in PRAGMA call.
Line 380: the metadataDatabase value is interpolated into a single-quoted string passed to PRAGMA table_info(...). If the database name contains a single quote, this would break. Use the same double-quote escaping approach used elsewhere, or use the getMetadataDatabaseName() helper.
Proposed fix
- const tableInfoQuery = `PRAGMA table_info('${metadataDatabase}.main.ducklake_snapshot_changes')`;
+ const safeName = metadataDatabase.replace(/'/g, "''");
+ const tableInfoQuery = `PRAGMA table_info('${safeName}.main.ducklake_snapshot_changes')`;

🤖 Prompt for AI Agents
In `@src/main/services/duckLake/adapters/duckdb.adapter.ts` around lines 365 -
405, The PRAGMA call in getAvailableSnapshotColumns interpolates
metadataDatabase directly into a single-quoted string, risking SQL injection if
the name contains a quote; change the call to use the safe metadata DB naming
used elsewhere (e.g., getMetadataDatabaseName()) or escape/quote the identifier
with double-quotes instead of single quotes, and build the PRAGMA string using
the safe identifier so table_info gets called on "metadataDatabase". Ensure you
update the variable used in the PRAGMA query inside getAvailableSnapshotColumns
and keep the rest of the result handling unchanged.
Summary by CodeRabbit
New Features
Refactor
Helpers