improvement(mothership): table speed parity — keyset reads, limit bounds, background import/delete jobs#5012
improvement(mothership): table speed parity — keyset reads, limit bounds, background import/delete jobs#5012TheodoreSpeaks wants to merge 5 commits into
Conversation
… + tenant-scoped query performance (#4915) * feat(tables): paginated background row-delete jobs via table_jobs * fix(tables): address review on async row-delete (filtered count, scoped optimistic clear, Cmd+A select-all, hide delete from tray) * improvement(tables): filter-aware select-all runs, delete-job read mask, keyset index + autovacuum tuning * feat(tables): run import/delete/export/backfill jobs on trigger.dev with in-process fallback * improvement(tables): raise delete page to 10k and export batch to 5k * improvement(tables): raise CSV import batch to 5k rows (param-cap bounded) * feat(tables): surface export jobs in the header tray with progress, cancel, and download * improvement(tables): surface exports as derived tables-scoped toasts instead of the import tray * Revert "improvement(tables): surface exports as derived tables-scoped toasts instead of the import tray" This reverts commit 1ea5871. * fix(tables): preserve export storage key (NoSuchKey) and unify jobs in one spinner tray * improvement(tables): jobs tray icon reflects aggregate state (spinner/check/alert) * fix(tables): restore jobs tray on the tables list (dropped in staging merge) * improvement(tables): keyset-paginate export row reads (offset paging was O(n^2) over large tables) * perf(tables): keyset pagination for grid infinite scroll Default-order row pages now cursor on (order_key, id) instead of OFFSET — each page is an index seek on tableOrderKeyIdx, where OFFSET re-scans and discards every prior row (O(N²) across deep scrolls and full drains like select-all/export-to-clipboard). Sorted views keep offset paging; the contract refines after+sort as mutually exclusive. v1 public rows API is unchanged (extends the unrefined base, omits after). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * fix(tables): show export in job tray immediately on kickoff The export-jobs query's poll only self-sustains once a running job is already in the cache, so a freshly kicked export stayed invisible until an SSE event or page refresh. Invalidate the tray query on kickoff success so the icon appears right away. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * fix(tables): surface real row/column write errors in toasts Drizzle wraps DB errors in DrizzleQueryError whose message is the failed SQL — the real cause (e.g. the row-limit trigger's RAISE) sits on .cause, so the routes' substring classification never matched and everything fell through to generic 500s ("Failed to insert row"). Add rootErrorMessage (cause-chain unwrap) and a shared rowWriteErrorResponse classifier that consolidates the per-route pattern lists and rewrites the trigger message into a friendly "Row limit exceeded — capped at N rows". Applied across the app and v1 row-write routes and the columns route. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * perf(tables): tenant-bound filtered row counts (12.7s -> 0.6s) JSONB filter predicates (->> ILIKE / range casts) are opaque to the planner: it estimates a handful of matches and picks a parallel seq scan over the entire shared user_table_rows relation — every tenant's rows — for the page-0 COUNT(*), so any non-equality filter on a large table cost 10s+ regardless of how few rows matched. Run filtered counts in a transaction with SET LOCAL enable_seqscan = off, forcing the tenant-bounded bitmap plan. Unfiltered counts keep their index-only scan. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * perf(tables): tenant-bound Cmd+F search and stream its window (75s -> 2s) Same planner trap as the filtered count, compounded: the lateral jsonb_each_text ILIKE is unestimatable, so findRowMatches on a 1M-row table seq-scanned the whole 12M-row shared relation and disk-sorted ~120MB of window input (75s measured). SET LOCAL enable_seqscan=off bounds the scan to the tenant; on the default order, additionally penalizing bitmap/sort/parallel steers the planner onto the already- sorted (table_id, order_key, id) index walk so row_number() streams with no sort at all (2s measured). Flags only penalize plan shapes — a custom sort still sorts. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * perf(tables): tenant-bound sorted pages and filtered write selections Extends the seqscan fix to every remaining jsonb-predicate path, all measured on a 1M-row table in a 12M-row shared relation: - sorted page query (ORDER BY data->>'col'): 9.7s/page -> 0.76s, and deep pages stop spilling ~130MB sorts to disk - updateRowsByFilter / deleteRowsByFilter row selection: 14.4s -> bounded - delete-job worker selectRowIdPage with a filter: 12.6s/page -> bounded - dispatcher filtered-scope window walk: same shape, same fix Shared withSeqscanOff helper moves to lib/table/planner.ts (service + dispatcher both consume it; dispatcher can't import service). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * perf(tables): tenant-scoped containment index (migration 0232) The plain GIN on user_table_rows.data matched @> candidates across every tenant sharing the relation — a hot value in someone else's table inflated everyone's equality filters (1.07M candidates fetched for a 33k-row match, lossy bitmap, 1.1s). Replace it with btree_gin (table_id, data jsonb_path_ops): the tenant intersection happens inside the index and paths are single hashed entries. Rare-equality probe 326ms -> 17ms with zero wasted candidates; unique-constraint checks and upsert conflict lookups ride the same index. The new index is smaller than the one it replaces (529MB vs 694MB on the 12M-row dev relation). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * perf(tables): tenant-bound unique-constraint checks (3.5s -> <1s per write) The unique check runs lower(data->>'col') = $1 LIMIT 1 on every insert and cell edit touching a unique column. The predicate is unestimatable and a unique (non-conflicting) value never exits early, so the planner seq-scanned all 12.3M shared-relation rows per check — 3.5s measured. Tenant-bound both the single and batch variants; the batch path sets the flag on the caller's transaction when one is supplied (SET LOCAL dies at its commit, and the statements that follow are tenant-scoped writes). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * perf(tables): tenant-bound upsert conflict lookup Same unestimatable data->>key predicate as the unique checks; an insert-path upsert has no existing match so the lookup can't exit early and seq-scans the whole shared relation. The upsert already runs in a transaction — set the planner flag on it. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * refactor(tables): consolidate executor types onto planner exports service.ts kept a private DbTransaction alias and two inline typeof db | DbTransaction unions after planner.ts began exporting the canonical DbTransaction/DbExecutor — import those instead. From the /simplify review of the perf series; no behavior change. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * fix(tests): drop narrow schema mock override in process-contents test The local vi.mock('@sim/db/schema') stubbed only document/knowledgeBase, but the file's import graph reaches lib/table/service whose module scope now references tableJobs. The global schema mock already covers all of it — rely on it per the testing rules instead of re-mocking. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * fix(tables): scope cancels and counts to the filtered selection (review) Addresses the open Bugbot/Greptile findings on filtered select-all: - Filtered runs no longer cancel the whole table: cancelWorkflowGroupRuns takes a filter — it stops only dispatches with that exact filter scope and only in-flight cells on matching rows (semi-join); whole-table and differently-scoped dispatches keep running, their cancelled cells skipped via cancelledAt > requestedAt. - Stop on a filtered select-all sends the filter through cancel-runs (contract + route + mutation) instead of a table-wide cancel. - runColumnBodySchema rejects rowIds + filter together (mirrors deleteTableRowsBodySchema). - Select-all delete clears the selection in onSuccess, not at click, so a failed kickoff restores both rows and selection. - Clipboard copy/cut estimates use the filter-aware total (rowTotal) instead of the whole-table rowCount. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * chore: retrigger CI (Actions dropped the previous push events) Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * chore: bump api-validation route baseline to 807 (staging route + merge) Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * fix(tables): release job claim when trigger.dev dispatch fails If tasks.trigger (or its dynamic imports) throws after markTableJobRunning, the ghost running row held the table's one-write-job slot until the stale-job janitor fired (~15-20 min of 409s). All four kickoff routes now release the claim and rethrow; the backfill runner releases and warns (a failed backfill never fails the schema change). Greptile P1. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * chore(db): squash 0232 into 0231 (one migration for the PR) Both are branch-only — no environment has applied them through the migration ledger yet — so the tenant-scoped GIN (btree_gin extension, index swap) folds into 0231_table_jobs_and_keyset. Snapshot chain re-pointed; drizzle-kit generate confirms zero drift. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * fix(tables): context-menu delete label shows the true select-all count Under select-all the context menu counted only the loaded page ("Delete 1000 rows" on a 999k-row table) while the action correctly deletes every matching row via the background job. Delete now gets its own count from the filter-aware total minus deselections; the run-action labels keep the loaded-row count since those actions act on loaded rows only. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * fix(tables): context-menu bulk actions act on the full select-all scope Follow-up to the label fix: under select-all the context menu's Run / Re-run / Stop only acted on the loaded page of rows. They now route through the same scopes as the action bar — runs dispatch by filter (whole table when unfiltered), Stop uses the filter-scoped cancel — and all labels share one true count (filter-aware total minus deselections, locale-formatted). Like the action bar, filter-scoped runs ignore deselections (the run API has no exclusion set). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * feat(tables): exclusion set for select-all runs and stops Select-all minus deselected rows now means exactly that for every bulk action, not just delete. runColumnBodySchema and cancelTableRunsBodySchema accept excludeRowIds (bounded by MAX_EXCLUDE_ROW_IDS, select-all scope only); the dispatch scope persists it and the dispatcher window walk, eager bulk-clear, pre-run cancel, and filter/table-scoped cancel all skip excluded rows. Client threads exclusions from the selection through the action bar and the grid context menu, including the optimistic stamps. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * fix(tables): spare excluded-row dispatches on Stop; no orphan placeholder table Two Bugbot findings on the exclusion work: - Select-all-minus-deselections Stop (no filter) cancelled every active dispatch table-wide, killing row-scoped runs on deselected rows. markActiveDispatchesCancelled now spares dispatches whose scope.rowIds are fully contained in the exclusion set (coalesce(false) keeps table-wide dispatches cancellable). - Create-mode import: a failed trigger.dev dispatch released the job claim but left the just-created placeholder table in the workspace. Archive it on the failure path (no hard-delete surface exists). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * fix(tables): row counts reflect a running delete everywhere A mid-delete refresh resurrected the old counts: the optimistic update stripped cached rows but left page-0 totalCount (footer / select-all label) at the old total, and list/detail counts reported raw row_count including doomed-but-not-yet-deleted rows. - onMutate now sets the active view's totalCount to the kept rows and decrements the cached detail rowCount by the doomed estimate - the kickoff persists that estimate on the job (payload.doomedCount, clamped server-side); getTableById/listTables subtract the not-yet-deleted remainder (doomedCount - rows_processed) while the delete runs, so refetched counts match the read path's delete mask Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * copy(tables): drop background mention from delete confirm Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * fix(tables): clear select-all immediately when a delete kicks off The header checkbox lingered as a minus over the optimistically-emptied grid: rowSelectionCoversAll treats zero rows as not-covered, and the selection clear waited for the kickoff's onSuccess. Clear at click (failed kickoffs visibly restore rows + toast; re-selecting is cheap) and render an empty grid's header checkbox unchecked regardless — a selection over zero rows is vacuous. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * fix(tables): export takes the async path while a delete job runs The sync/async export choice reads rowCount, which is a doomed-estimate- adjusted number during a running delete (and the estimate is client- supplied) — an overstated estimate could route a still-large masked set through the synchronous stream. Mid-delete exports now always run as a job: safe at any size, and exports bypass the one-job-per-table gate. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * fix(build): stop uploads setup from sweeping the project into route graphs next build (Turbopack) failed with "Two or more assets with different content were emitted to the same output path" on the server-root chunk. Root cause: setup.server.ts's unscoped path.resolve(process.cwd()) made node-file-tracing sweep the entire project — next.config.ts included — into every route graph reaching lib/uploads (the files/upload route and, since the export job, the export-async path). Two producers emitted the swept config into same-named chunks; staging's latest commits made their contents diverge and the names collided. Annotate the path derivation with turbopackIgnore per the NFT warning's own remediation — the build passes and all ~390 "unexpected file in NFT list" warnings disappear. Also inline the releaseJobClaim dynamic imports in the kickoff routes to plain static imports — service is already statically imported there. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
…post-index ANALYZE guard (#4997) * fix(tables): per-batch delete-job commits, real trigger.dev retries, post-index ANALYZE guard * fix(tables): resume job progress across retries, rethrow root cause for clean failure messages
… up tables feature (#4995) * improvement(tables): migrate inputs to emcn chip components and clean up tables feature * fix(tables): address review feedback — stale filter column label, shared FieldError in enrichment config * improvement(tables): scope create-table callback to stable mutateAsync
…me/drop prompt `drizzle-kit push --force` only suppresses the data-loss confirm, not the rename-vs-drop disambiguation prompt. That prompt fires whenever a diff both adds and drops tables/columns at once (e.g. migration 0231 created sim_trigger_state while dropping the workspace_notification_* tables), and in CI it crashes with a bare "Interactive prompts require a TTY" stack trace. Catch that specific failure in the dev push step and emit a GitHub error annotation explaining the cause and the fix (drop the stale objects on the dev DB to match schema.ts — the same DROPs the versioned migration already applied to staging/prod), instead of leaving an opaque trace. Exit status is preserved either way. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…nds, background import/delete jobs Mothership's table operations lagged the fast paths the tables feature already has: - function_execute mounted inputTables via queryRows with defaults, silently truncating the sandbox CSV to 100 rows and paying for a count and execution metadata it never used. Mounts now drain the keyset export reader page by page, remap stored column-id keys to display names (headers previously didn't match id-keyed cell data), mount tables in parallel, and count toward the 50MB sandbox mount budget. - user_table query_rows accepted unbounded limits (whole table into one tool result) and only offset paging. It now enforces the contracts' MAX_QUERY_LIMIT, skips execution metadata, accepts the keyset `after` cursor, and returns `nextCursor` when a default-order page fills. - import_file / create_from_file / delete_rows_by_filter mutated without claiming the per-table job slot, racing running background jobs, and ran whole imports inline in the chat request. CSV/TSV imports ≥8MB and unbounded filter-deletes matching >1000 rows now dispatch the same trigger.dev jobs the UI routes use (slot claim, release-on-dispatch- failure, delete mask); inline imports claim and release the slot. A new get_job operation surfaces the table's derived job state for polling. - TableImportPayload grows deleteSourceFile (default true) so Mothership imports of persistent workspace files survive the worker's single-use source cleanup. Generated tool catalog artifacts regenerated from simstudioai/copilot#289. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
|
@greptile review |
|
The latest updates on your projects. Learn more about Vercel for GitHub. |
PR SummaryHigh Risk Overview Bulk operations now accept The workspace grid treats select-all as filter-scoped totals (including unloaded rows): exclusions survive toggles, Cmd+A toggles whole-table selection, delete-all goes through a background job instead of loading every id, and run/stop/copy counts use the filter-scoped total. SSE handles Dev CI Reviewed by Cursor Bugbot for commit 7310c06. Bugbot is set up for automated code reviews on this repo. Configure here. |
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
Reviewed by Cursor Bugbot for commit 7310c06. Configure here.
| sort: validated.sort ? wire.sortIn(validated.sort) : undefined, | ||
| limit: validated.limit, | ||
| offset: validated.offset, | ||
| after: validated.after, |
There was a problem hiding this comment.
HTTP rows missing next cursor
Medium Severity
The GET rows handler now accepts an after keyset cursor and passes it to queryRows, but the JSON response never includes a nextCursor when a full page is returned. REST clients cannot advance past the first default-order page without guessing offsets or reusing Mothership-only tooling.
Reviewed by Cursor Bugbot for commit 7310c06. Configure here.
Greptile SummaryThis PR brings Mothership's table operations to parity with the UI's recent performance work: full-table keyset CSV mounts replacing the silently-truncated 100-row
Confidence Score: 3/5The job infrastructure, migration, and background worker paths are solid, but the sandbox table mount builds every table CSV fully in memory before the budget check can reject it — for large enterprise tables this can spike the web container's heap before the error is thrown. The core job-slot enforcement (partial unique index, markTableJobRunning/releaseJobClaim, stale-job janitor) is well-structured and the import/delete runner paths handle failure and supersession correctly. The one concrete defect is in buildTableCsvForMount: tables are drained in parallel via Promise.all before the 50MB budget loop runs, so all CSVs land in memory simultaneously regardless of size — the budget check comes too late to prevent the allocation. Everything else in the PR (keyset pagination, limit clamping, deleteSourceFile flag, migration) looks correct and tested. apps/sim/lib/copilot/tools/handlers/function-execute.ts — the parallel CSV build before budget enforcement is the main concern; the rest of the changed files look safe to merge. Important Files Changed
Sequence DiagramsequenceDiagram
participant MB as Mothership Tool
participant SVC as table/service
participant JOB as table_jobs DB
participant WRK as Background Worker
participant SSE as SSE / get_job
Note over MB,SSE: import_file (large CSV / trigger.dev path)
MB->>SVC: markTableJobRunning(tableId, importId, 'import')
SVC->>JOB: INSERT table_jobs ON CONFLICT DO NOTHING
JOB-->>SVC: "claimed=true"
MB->>WRK: tasks.trigger('table-import', payload)
WRK-->>MB: dispatched
MB-->>MB: return jobId
Note over MB,SSE: delete_rows_by_filter (unbounded >1000 matches)
MB->>SVC: queryRows(filter, limit:1) totalCount
MB->>SVC: markTableJobRunning(tableId, jobId, 'delete', payload)
SVC->>JOB: "INSERT status=running payload={filter,cutoff,doomedCount}"
MB->>WRK: tasks.trigger('table-delete')
WRK->>SVC: selectRowIdPage + deletePageByIds keyset loop
WRK->>SVC: updateJobProgress / markJobReady
SVC->>JOB: "UPDATE status=ready"
WRK->>SSE: "appendTableEvent status=ready"
Note over MB,SSE: get_job polling
MB->>SVC: getTableById(tableId)
SVC->>JOB: SELECT latest non-export job
JOB-->>SVC: status rowsProcessed error
SVC-->>MB: jobStatus jobRowsProcessed
Reviews (1): Last reviewed commit: "improvement(mothership): table speed par..." | Re-trigger Greptile |
| const tableMounts = await Promise.all( | ||
| inputTables.map(async (tableRef) => { | ||
| const tableId = | ||
| typeof tableRef === 'string' | ||
| ? tableRef | ||
| : tableRef && typeof tableRef === 'object' | ||
| ? (tableRef as CanonicalTableInput).tableId || (tableRef as CanonicalTableInput).path | ||
| : undefined | ||
| if (!tableId) return null | ||
| const table = await resolveTableRef(tableId, tablePathLookup) | ||
| if (!table || table.workspaceId !== workspaceId) { | ||
| throw new Error( | ||
| `Input table not found: "${tableId}". Pass the table id (tbl_...) from tables/{name}/meta.json, or a tables/{name}/meta.json path.` | ||
| ) | ||
| } | ||
| const csvContent = await buildTableCsvForMount(table) | ||
| const sandboxPath = | ||
| typeof tableRef === 'object' && tableRef !== null | ||
| ? (tableRef as CanonicalTableInput).sandboxPath | ||
| : undefined | ||
| if (!tableId) continue | ||
| const table = await resolveTableRef(tableId, tablePathLookup) | ||
| if (!table || table.workspaceId !== workspaceId) { | ||
| throw new Error( | ||
| `Input table not found: "${tableId}". Pass the table id (tbl_...) from tables/{name}/meta.json, or a tables/{name}/meta.json path.` | ||
| ) | ||
| } | ||
| const rows = await queryRows(table, {}, 'copilot-fn-exec') | ||
|
|
||
| const allKeys = new Set(table.schema.columns.map((column) => column.name)) | ||
| for (const row of rows.rows ?? []) { | ||
| if (row.data && typeof row.data === 'object') { | ||
| for (const key of Object.keys(row.data as Record<string, unknown>)) { | ||
| allKeys.add(key) | ||
| } | ||
| return { | ||
| path: sandboxPath || `/home/user/tables/${table.id}.csv`, | ||
| content: csvContent, | ||
| } | ||
| } | ||
| const headers = Array.from(allKeys) | ||
| const csvLines = [headers.join(',')] | ||
| for (const row of rows.rows ?? []) { | ||
| const data = (row.data || {}) as Record<string, unknown> | ||
| csvLines.push( | ||
| headers | ||
| .map((h) => { | ||
| const val = data[h] | ||
| const str = val === null || val === undefined ? '' : String(val) | ||
| return str.includes(',') || str.includes('"') || str.includes('\n') | ||
| ? `"${str.replace(/"/g, '""')}"` | ||
| : str | ||
| }) | ||
| .join(',') | ||
| }) | ||
| ) | ||
| for (const mount of tableMounts) { | ||
| if (!mount) continue | ||
| if (totalSize + mount.content.length > MAX_TOTAL_SIZE) { | ||
| throw new Error( | ||
| `Mounting table "${mount.path}" would exceed the ${MAX_TOTAL_SIZE / 1024 / 1024}MB total mount limit. Mount fewer or smaller tables.` | ||
| ) | ||
| } | ||
| const csvContent = csvLines.join('\n') | ||
| const sandboxPath = | ||
| typeof tableRef === 'object' && tableRef !== null | ||
| ? (tableRef as CanonicalTableInput).sandboxPath | ||
| : undefined | ||
| sandboxFiles.push({ | ||
| path: sandboxPath || `/home/user/tables/${table.id}.csv`, | ||
| content: csvContent, | ||
| }) | ||
| totalSize += mount.content.length | ||
| sandboxFiles.push(mount) | ||
| } |
There was a problem hiding this comment.
Full table CSVs built before budget enforcement
All input tables are serialized in parallel via Promise.all before any size check runs. For each table the buildTableCsvForMount loop fetches every row page-by-page and accumulates the entire CSV in memory — no size gate inside the loop. The outer budget check (totalSize + mount.content.length > MAX_TOTAL_SIZE) only fires after every table has been fully loaded.
Concrete failure: a user mounts two enterprise-plan tables each at ~40MB of rows. Both CSVs are built (80MB in memory) before the loop sees that the second one exceeds the 50MB cap and throws. File and directory mounts check record.size before fetching; tables lack that pre-flight guard. Under traffic, several concurrent requests doing this can exhaust the web container's heap.
| if (args.limit === undefined) { | ||
| const { totalCount } = await queryRows( | ||
| table, | ||
| { filter: idFilter, limit: 1, withExecutions: false }, | ||
| requestId | ||
| ) | ||
| const matchCount = totalCount ?? 0 | ||
| if (matchCount > TABLE_LIMITS.MAX_BULK_OPERATION_SIZE) { | ||
| const doomedCount = Math.min(matchCount, table.rowCount) | ||
| const cutoff = new Date() | ||
| const jobId = generateId() | ||
| const payload: TableDeleteJobPayload = { | ||
| filter: idFilter, | ||
| cutoff: cutoff.toISOString(), | ||
| doomedCount, | ||
| } | ||
| assertNotAborted() | ||
| const claimed = await markTableJobRunning(table.id, jobId, 'delete', payload) | ||
| if (!claimed) { | ||
| return { success: false, message: 'A job is already in progress for this table' } | ||
| } | ||
| await dispatchDeleteJob({ | ||
| jobId, | ||
| tableId: table.id, | ||
| workspaceId, | ||
| filter: idFilter, | ||
| cutoff, | ||
| }) | ||
| return { | ||
| success: true, | ||
| message: `Started background delete of ${doomedCount} matching rows (job ${jobId}). The rows are hidden from reads immediately; call get_job to track progress.`, | ||
| data: { jobId, doomedCount }, | ||
| } | ||
| } |
There was a problem hiding this comment.
doomedCount can overestimate when rows are inserted between count and dispatch
matchCount comes from queryRows(..., { limit: 1, withExecutions: false }) and reflects the delete-mask-filtered row count. Between that count query and markTableJobRunning, rows inserted after the cutoff timestamp could match the filter, making matchCount stale. The cutoff passed to the background worker correctly caps new-row inclusion, but doomedCount stored in the job payload — and surfaced in the get_job response message — would over-report the number of rows being removed. Not a correctness issue for the worker, but the user-visible estimate is an upper bound, not an exact figure.
Greptile SummaryThis PR closes three performance gaps in Mothership's table operations: sandbox table mounts now drain all rows via keyset pagination instead of silently truncating to 100,
Confidence Score: 3/5Safe to merge for most workloads; enterprise tables with hundreds of thousands of rows could exhaust Vercel function memory during a sandbox mount before the 50 MB budget check fires. The core job-slot migration, keyset pagination, background import/delete dispatch, and schema change are all well-structured and tested. The one concern worth resolving before a wide enterprise rollout is buildTableCsvForMount: all table CSVs are fully assembled in parallel memory before the budget check, so a user who mounts a 500k-row enterprise table gets a function OOM rather than a clean rejection. apps/sim/lib/copilot/tools/handlers/function-execute.ts — the buildTableCsvForMount loop and the parallel Promise.all mount assembly need a per-page budget check to avoid OOM on large enterprise tables. Important Files Changed
Flowchart%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[Mothership: import_file / create_from_file] --> B{CSV/TSV 8 MB?}
B -- No --> C[markTableJobRunning claim slot]
C --> D[Inline import parseFileRows + batchInsert]
D --> E[releaseJobClaim]
B -- Yes --> F[createTable with jobStatus=running or markTableJobRunning]
F --> G{isTriggerDevEnabled?}
G -- Yes --> H[tasks.trigger tableImportTask]
G -- No --> I[runDetached runTableImport]
H -- dispatch error --> J[releaseJobClaim / deleteTable]
I --> K[runTableImport streaming keyset import]
H --> K
L[Mothership: delete_rows_by_filter] --> M{limit === undefined?}
M -- Yes --> N[queryRows count with pendingDeleteMask]
N --> O{matchCount > 1000?}
O -- No --> P[deleteRowsByFilter inline]
O -- Yes --> Q[markTableJobRunning claim]
Q --> R[dispatchDeleteJob]
R --> S[runTableDelete keyset page walk]
M -- No --> P
T[Mothership: query_rows] --> U{limit > 1000?}
U -- Yes --> V[return error]
U -- No --> W[queryRows with pendingDeleteMask + after cursor]
W --> X[return rows + nextCursor]
Y[Mothership: get_job] --> Z[getTableById latestJobForTable]
Z --> AA[return job status / rowCount]
Reviews (2): Last reviewed commit: "improvement(mothership): table speed par..." | Re-trigger Greptile |
| * would corrupt values. | ||
| */ | ||
| function formatMountCsvValue(value: unknown): string { | ||
| if (value === null || value === undefined) return '' | ||
| if (value instanceof Date) return value.toISOString() | ||
| if (typeof value === 'object') return JSON.stringify(value) | ||
| return String(value) | ||
| } | ||
|
|
||
| /** | ||
| * Serializes a full table to CSV for a sandbox mount. Walks the keyset export | ||
| * reader page by page so every row is included (`queryRows` with defaults | ||
| * silently truncated mounts to its 100-row page and paid for a count and | ||
| * execution metadata the CSV never used), and remaps stored column-id keys | ||
| * back to display names so headers line up with cell values. | ||
| */ | ||
| async function buildTableCsvForMount(table: TableDefinition): Promise<string> { | ||
| const nameById = buildNameById(table.schema) | ||
| const headers = table.schema.columns.map((c) => c.name) | ||
| const lines = [toCsvRow(headers)] | ||
| let after: { position: number; id: string } | null = null | ||
| while (true) { | ||
| const page = await selectExportRowPage(table, after, TABLE_MOUNT_PAGE_SIZE) | ||
| for (const row of page) { | ||
| const data = rowDataIdToName(row.data, nameById) | ||
| lines.push(toCsvRow(headers.map((header) => formatMountCsvValue(data[header])))) | ||
| } | ||
| if (page.length < TABLE_MOUNT_PAGE_SIZE) return lines.join('\n') | ||
| const last = page[page.length - 1] | ||
| after = { position: last.position, id: last.id } | ||
| } | ||
| } | ||
|
|
||
| async function resolveInputFiles( | ||
| workspaceId: string, | ||
| inputFiles?: unknown[], |
There was a problem hiding this comment.
Full CSV materialized in memory before the 50 MB budget check
buildTableCsvForMount drains every row into a lines[] array and only returns after the full table is assembled. The budget check in the caller happens after all table CSVs are built via Promise.all. For an enterprise table with 100k+ rows, the CSV string can be hundreds of megabytes or more before the check ever fires — the budget guard prevents the file from reaching the sandbox, but the memory damage is already done (and in the parallel case, all tables are materialized simultaneously).
The old queryRows path implicitly bounded this to 100 rows. A straightforward fix is to track a running byte count inside the drain loop and throw early once the caller's budget would be exceeded — or pass maxBytes as a parameter and abort the page walk.
| * would corrupt values. | ||
| */ | ||
| function formatMountCsvValue(value: unknown): string { | ||
| if (value === null || value === undefined) return '' | ||
| if (value instanceof Date) return value.toISOString() | ||
| if (typeof value === 'object') return JSON.stringify(value) | ||
| return String(value) | ||
| } | ||
|
|
||
| /** | ||
| * Serializes a full table to CSV for a sandbox mount. Walks the keyset export | ||
| * reader page by page so every row is included (`queryRows` with defaults | ||
| * silently truncated mounts to its 100-row page and paid for a count and | ||
| * execution metadata the CSV never used), and remaps stored column-id keys | ||
| * back to display names so headers line up with cell values. | ||
| */ | ||
| async function buildTableCsvForMount(table: TableDefinition): Promise<string> { | ||
| const nameById = buildNameById(table.schema) | ||
| const headers = table.schema.columns.map((c) => c.name) | ||
| const lines = [toCsvRow(headers)] | ||
| let after: { position: number; id: string } | null = null | ||
| while (true) { | ||
| const page = await selectExportRowPage(table, after, TABLE_MOUNT_PAGE_SIZE) | ||
| for (const row of page) { | ||
| const data = rowDataIdToName(row.data, nameById) | ||
| lines.push(toCsvRow(headers.map((header) => formatMountCsvValue(data[header])))) | ||
| } | ||
| if (page.length < TABLE_MOUNT_PAGE_SIZE) return lines.join('\n') | ||
| const last = page[page.length - 1] | ||
| after = { position: last.position, id: last.id } | ||
| } | ||
| } | ||
|
|
||
| async function resolveInputFiles( | ||
| workspaceId: string, | ||
| inputFiles?: unknown[], |
There was a problem hiding this comment.
pendingDeleteMask fetched on every page — N+1 DB round-trips during drain
selectExportRowPage calls pendingDeleteMask(table) unconditionally, so a 100k-row table drained in 20 pages makes 20 extra DB round-trips just to check whether a delete job is running. The mask is derived from table.id which is constant, and the delete job status won't change meaningfully mid-drain. Fetching it once before the loop and threading it through would eliminate the repeated queries.
| * @param requestId - Request ID for logging | ||
| * @returns Query result with rows and pagination info | ||
| */ | ||
| /** | ||
| * Visibility mask for a running delete job: returns a clause keeping only rows the job will NOT | ||
| * delete, or `undefined` when no delete job is running. The job's persisted scope | ||
| * ({@link TableDeleteJobPayload}) defines the doomed set — `matches(filter) AND created_at <= | ||
| * cutoff AND id NOT IN excludeRowIds` — exactly what the worker's `selectRowIdPage` selects, so | ||
| * mid-job reads (refresh, other clients, exports) are consistent with the eventual result. The | ||
| * mask lifts automatically when the job leaves `running` (done, failed, or canceled). | ||
| * | ||
| * `(doomed) IS NOT TRUE` rather than `NOT (doomed)`: JSONB predicates evaluate to NULL on missing | ||
| * cells, and those rows are NOT selected for deletion (NULL ≠ TRUE) — they must stay visible. | ||
| */ |
There was a problem hiding this comment.
pendingDeleteMask adds an extra DB query to every queryRows call
pendingDeleteMask is now called at the top of queryRows on every invocation — adding a round-trip to every read path (grid page loads, Mothership query_rows, count refreshes, etc.). The vast majority of tables will never have a running delete job, so this query returns nothing almost all the time. A lightweight fast-path could skip the query if the table's derived jobStatus (already available on TableDefinition) isn't 'running'/'delete', or the mask could be pre-fetched where the caller already holds a recent TableDefinition.


Context
Audit of Mothership's table operations against the tables feature's recent perf work (tenant-scoped queries, keyset pagination, background
table_jobs). The coreuser_tablerow CRUD was already on the fast paths; these were the gaps.Changes
function_execute table mounts were silently truncated to 100 rows.
inputTablesmounted viaqueryRows(table, {}, …)— default limit 100, plus aCOUNT(*)and an executions load the CSV never used. Sandbox code computed over truncated data. Mounts now drainselectExportRowPage(keyset, tenant-bounded, delete-mask-aware) in 5k pages, remap stored column-id keys back to display names (previously headers were names but cells were id-keyed, producing empty columns on id-keyed tables), run per-table mounts in parallel, and count toward the 50MB mount budget.query_rowsgets contract-parity limits and keyset paging.limitwas passed through unclamped (the model could pull an entire table into one tool result) and onlyoffsetpaging existed. Now:limit≤MAX_QUERY_LIMIT(1000) rejected with the contract's message,withExecutions: false,after: {orderKey, id}accepted (rejected withsort, like the HTTP contract), andnextCursorreturned when a default-order page fills.Imports and unbounded filter-deletes get full async-job parity with the UI routes.
import_file/create_from_file: CSV/TSV ≥CSV_ASYNC_IMPORT_THRESHOLD_BYTES(8MB) dispatchtableImportTask(claim slot → dispatch → release-on-failure, mirroring theimport-asyncroutes; create mode uses the placeholder-table pattern from the workspace-level route). Smaller/other-format files stay inline but now claim and release the table's one-write-job slot, so a Mothership import can no longer interleave with a running background import/delete.delete_rows_by_filter: explicitlimit≤ 1000 stays inline; with no limit, the match count is measured first (tenant-bounded count) and >1000 matches dispatchtableDeleteTaskwith the standard{filter, cutoff, doomedCount}payload — doomed rows hidden immediately via the existingpendingDeleteMask.get_joboperation returns the table's derived job state ({id, type, status, error, rowsProcessed}) for polling.TableImportPayload.deleteSourceFile(defaulttrue). The import worker deletes its source object when terminal — correct for the UI's single-use direct uploads, destructive for the persistent workspace files Mothership imports from. Mothership passesfalse.Generated artifacts (
tool-catalog-v1.ts,tool-schemas-v1.ts) regenerated from the companion contract PR simstudioai/copilot#289 (get_job,afterparam, limit maximums, background-behavior docs). Sim-side behavior is live regardless; the model sees the new schema once the copilot PR deploys.Tests
user-table.test.ts: limit clamps, keysetafter/nextCursor(+after+sortrejection), inline slot claim/release, slot-held rejection, background import/delete dispatch payloads,get_job(33 passing)function-execute.test.ts(new): full keyset drain across pages (5002-row table), id→name header mapping, cross-workspace rejectionimport-runner.test.ts(new): source-file cleanup default vsdeleteSourceFile: falsebunx vitest run lib/copilot lib/table(78 files, 721 tests),bun run lint:check,bun run check:api-validation,tsc --noEmitall greenRelated: #5011 (cross-tenant
outputTablefix, split out), simstudioai/copilot#289 (contract).🤖 Generated with Claude Code