feat(records): route to secondary store based on plan#6363
Conversation
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 6b5cfc6b24
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
There was a problem hiding this comment.
9 issues found across 34 files
Confidence score: 2/5
- High-risk changes are concentrated in routing and DB safety paths, with multiple high-severity/high-confidence findings (7–8/10) indicating concrete regression potential rather than minor polish issues.
packages/records/lib/stores/postgres/config.tsdisables SSL certificate verification for records DB connections, which is a meaningful security regression and not just an observability concern.packages/shared/lib/services/plans/plans.tsandpackages/records/lib/catalog/router.tsintroduce silent fallbacks/caching to the default store on lookup or persistence failures, creating a realistic path to persistent misrouting after transient errors.- Pay close attention to
packages/records/lib/stores/postgres/config.ts,packages/records/lib/catalog/router.ts,packages/shared/lib/services/plans/plans.ts,packages/persist/lib/routes/environment/environmentId/connection/connectionId/getCursor.ts,packages/persist/lib/daemons/autopruning.daemon.ts- security hardening and fallback routing behavior can mask failures and route data operations to the wrong store.
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="packages/records/lib/stores/postgres/postgres.ts">
<violation number="1" location="packages/records/lib/stores/postgres/postgres.ts:1704">
P2: Routing persistence failures are silently downgraded to the default store and cached, so one DB error can pin a connection/model to the wrong store with no diagnostic signal.
(Based on your team's feedback about observability for failure paths and dropped work.) [FEEDBACK_USED]</violation>
</file>
<file name="packages/shared/lib/services/plans/plans.ts">
<violation number="1" location="packages/shared/lib/services/plans/plans.ts:67">
P1: `getPlanSafe()` hides plan lookup failures and silently falls back to default-store routing</violation>
</file>
<file name="packages/persist/lib/routes/environment/environmentId/connection/connectionId/getCursor.ts">
<violation number="1" location="packages/persist/lib/routes/environment/environmentId/connection/connectionId/getCursor.ts:34">
P1: Passing `plan` here makes cursor reads silently fall back to the default store when routing lookup fails.</violation>
</file>
<file name="packages/records/lib/stores/postgres/config.ts">
<violation number="1" location="packages/records/lib/stores/postgres/config.ts:21">
P1: SSL mode disables certificate verification for all records DB connections.</violation>
</file>
Tip: instead of fixing issues one by one fix them all with cubic
Re-trigger cubic
| const [row] = result.rows; | ||
| return Ok(row!.store_key as K); | ||
| } catch (err) { | ||
| return Err(new Error('Failed to get or create routing', { cause: err })); |
There was a problem hiding this comment.
P2: Routing persistence failures are silently downgraded to the default store and cached, so one DB error can pin a connection/model to the wrong store with no diagnostic signal.
(Based on your team's feedback about observability for failure paths and dropped work.)
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At packages/records/lib/stores/postgres/postgres.ts, line 1704:
<comment>Routing persistence failures are silently downgraded to the default store and cached, so one DB error can pin a connection/model to the wrong store with no diagnostic signal.
(Based on your team's feedback about observability for failure paths and dropped work.) </comment>
<file context>
@@ -1672,6 +1672,39 @@ export class PostgresStore implements RecordsStore {
+ const [row] = result.rows;
+ return Ok(row!.store_key as K);
+ } catch (err) {
+ return Err(new Error('Failed to get or create routing', { cause: err }));
+ }
+ }
</file context>
| return Err(new Error('Failed to get or create routing', { cause: err })); | |
| logger.error( | |
| `[records routing] failed to get or create routing for connection_id=${connectionId} model=${model}: ${err instanceof Error ? err.message : String(err)}` | |
| ); | |
| return Err(new Error('Failed to get or create routing', { cause: err })); |
| // In the future, this can be extended to route based on connectionId, model, plan, account, etc. | ||
| // and routing decisions can be stored in a database | ||
| return 'default'; | ||
| const routingCache = new Map<string, keyof typeof stores>(); |
There was a problem hiding this comment.
no LRU or capping. A quick estimate looking at the db shows that storing every single possible key would currently take 12MB max
- Add records_store column to plans (default: 'default') - Add records_routing table to persist per-connection/model store assignments - Existing connection/model with records always pin to 'default' store - Thread plan through all records operations for routing context - No-op when secondary store is not configured
6b5cfc6 to
658a3a3
Compare
| RECORDS_DATABASE_SCHEMA: z.string().optional().default('nango_records'), | ||
| RECORDS_DATABASE_SSL: z.stringbool().optional().default(false), | ||
|
|
||
| RECORDS_SECONDARY_DATABASE_URL: z.url().optional(), |
There was a problem hiding this comment.
if we plan to have more and more, maybe we can name them as RECORDS_STORE_N_ and the default one is 1 - or we just skip the 1 from now on
| import { makePostgresConfig } from '../stores/postgres/config.js'; | ||
| import { PostgresStore } from '../stores/postgres/postgres.js'; | ||
|
|
||
| export const secondaryStore: PostgresStore | undefined = (() => { |
There was a problem hiding this comment.
left a comment below, the other think that we could do is name them like recordsStoreN
| ELSE :storeKey | ||
| END | ||
| ) | ||
| ON CONFLICT (connection_id, model) DO UPDATE SET store_key = "${RECORDS_ROUTING_TABLE}".store_key |
There was a problem hiding this comment.
Is it me or the whole account is pinned into the same storage regardless of the connection id and model?
There was a problem hiding this comment.
this line ensures that an existing dataset (connection/model) stays pinned to the same store. A ON CONFLICT DO NOTHING would be clearer but with DO NOTHING pg skip the row. We want to return it though so the trick is to touch the row by setting the same value again
| @@ -0,0 +1,19 @@ | |||
| import type { Knex } from 'knex'; | |||
|
|
|||
| const TABLE = 'records_routing'; | |||
There was a problem hiding this comment.
we will end up creating this table to all record stores right but only one would be the right source of truth
Im wondering if this is more a table for the control plane - which we have not, well could be server Today.
There was a problem hiding this comment.
you are right. I considered the tradeoff of reusing the existing store but having a unused empty table in all pg databases vs adding yet another db interface for the routing control plane.
As you can see in the Routing logic, I even aliased this db as routingStore. In the future it could be its own database but for now since there is no plan to get rid of the current records db I went for the simplest solution which is to reuse it.
There was a problem hiding this comment.
2 issues found across 4 files (changes from recent commits).
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="packages/types/lib/plans/db.ts">
<violation number="1" location="packages/types/lib/plans/db.ts:206">
P2: Move `records_store` to a shared validated enum/constant instead of another ad-hoc string union. Right now the DB accepts any string and the router hardcodes the same keys separately, so store names can drift between persisted data, types, and runtime routing.
(Based on your team's feedback about validated enums and typed constants for known domain values.) [FEEDBACK_USED].</violation>
</file>
<file name="packages/records/lib/catalog/router.ts">
<violation number="1" location="packages/records/lib/catalog/router.ts:49">
P2: `migrate()` should wait for every store migration to settle before throwing. The new `Promise.all()` fail-fast behavior can abort startup while another store is still migrating and before the surrounding startup cleanup runs.</violation>
</file>
Reply with feedback, questions, or to request a fix.
Fix all with cubic | Re-trigger cubic
| * Records store key for this account | ||
| * @default 'default' | ||
| */ | ||
| records_store: 'default' | 'records2'; |
There was a problem hiding this comment.
P2: Move records_store to a shared validated enum/constant instead of another ad-hoc string union. Right now the DB accepts any string and the router hardcodes the same keys separately, so store names can drift between persisted data, types, and runtime routing.
(Based on your team's feedback about validated enums and typed constants for known domain values.) .
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At packages/types/lib/plans/db.ts, line 206:
<comment>Move `records_store` to a shared validated enum/constant instead of another ad-hoc string union. Right now the DB accepts any string and the router hardcodes the same keys separately, so store names can drift between persisted data, types, and runtime routing.
(Based on your team's feedback about validated enums and typed constants for known domain values.) .</comment>
<file context>
@@ -203,7 +203,7 @@ export interface DBPlan extends Timestamps {
* @default 'default'
*/
- records_store: 'default' | 'secondary';
+ records_store: 'default' | 'records2';
/**
</file context>
| // Lifecycle: runs against all stores | ||
| migrate: RecordsStore['migrate'] = async () => { | ||
| await Promise.allSettled([...this.stores.values()].map((s) => s.migrate())); | ||
| await Promise.all([...this.stores.values()].map((s) => s.migrate())); |
There was a problem hiding this comment.
P2: migrate() should wait for every store migration to settle before throwing. The new Promise.all() fail-fast behavior can abort startup while another store is still migrating and before the surrounding startup cleanup runs.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At packages/records/lib/catalog/router.ts, line 49:
<comment>`migrate()` should wait for every store migration to settle before throwing. The new `Promise.all()` fail-fast behavior can abort startup while another store is still migrating and before the surrounding startup cleanup runs.</comment>
<file context>
@@ -45,7 +46,7 @@ export class RecordsRouter<K extends string> implements RoutedRecordsStore {
// Lifecycle: runs against all stores
migrate: RecordsStore['migrate'] = async () => {
- await Promise.allSettled([...this.stores.values()].map((s) => s.migrate()));
+ await Promise.all([...this.stores.values()].map((s) => s.migrate()));
};
close: RecordsStore['close'] = async () => {
</file context>
| await Promise.all([...this.stores.values()].map((s) => s.migrate())); | |
| const results = await Promise.allSettled([...this.stores.values()].map((s) => s.migrate())); | |
| const failures = results.filter((result): result is PromiseRejectedResult => result.status === 'rejected'); | |
| if (failures.length > 0) { | |
| throw new AggregateError( | |
| failures.map((failure) => failure.reason), | |
| 'Failed to migrate one or more record stores' | |
| ); | |
| } |
Came up with a middle ground approach, creating a second logical records store but pointing it our for now at the same db instance (as another schema). This way we can have all the code ready (including plan support) but not have to operate another database yet. Let me know what you think.
The main design decision being made is
datasetsare pinned to a store. Not accounts, environments, etc... To do so we would need to implement live migration of existing records or limit new 'stores' to new customers, env, etc...This PR:
Note:
secondaryname is TBD. I am open to suggestion