cmd/seed: synthetic dataset for local development #70
Add a deterministic ~6-month synthetic dataset (~9.8k rows, gentle sinusoid with occasional spikes and quiet days) for exercising the dashboard locally without needing real production exports. The generator deliberately spans every period (7d / 30d / 3m / 6m / 1y) so the chart UI has data to render at any range.

Safety properties:
- Refuses to run unless Config.Environment == "development".
- INSERT … ON CONFLICT (id) DO NOTHING, so re-running is a no-op.
- Steam IDs use a clearly-synthetic 76561198000000000 prefix.
- Snowflake IDs encode the same created_at + sequence layout as the production generator, so synthetic rows sort chronologically alongside any real rows already in the DB.

internal-docs/ and internal/devseed/fixtures/ are added to .gitignore to keep author scratch space and any future local CSV fixtures out of the public repo.

Co-authored-by: Cursor <cursoragent@cursor.com>
const (
	syntheticRNGSeed int64 = 42
	syntheticDays          = 180
	syntheticTargetTotal   = 9800
Specify the datatype here.
Also, we probably don't need 9800 entries; a couple thousand should work.
Done — syntheticTargetTotal int = 2000.
// days). Snowflake IDs are unique within the slice and won't collide with
// real CSV-seeded IDs, so callers can pipe the result straight into
// InsertReversals.
func GenerateSynthetic(now time.Time) []*models.Reversal {
Let's normalize `now` to UTC before we derive any dates.
Done — now() is normalized to UTC before any dates are derived.
The seed:

- Refuses to run unless `Environment` is `development`.
- Uses `INSERT … ON CONFLICT (id) DO NOTHING`, so it's safe to re-run.
- Generates a deterministic 6-month dataset so the dashboard at `/` has enough data to exercise every period (7d / 30d / 3m / 6m / 1y).
- Uses a synthetic Steam ID prefix (`76561198000000000`) so generated IDs are clearly fake.
You can just state that this must be run in a development environment. Also, please move this to the end of the README.
Done — simplified to "must be run in a development environment" and moved the section to the end of the README.
- Give syntheticTargetTotal an explicit int type and reduce the seed to ~2,000 rows (a couple thousand) for faster local seeding.
- Normalize the incoming now to UTC before deriving any dates so the generated series is timezone-independent.
- Simplify the README seeding guard to "must be run in a development environment", move the seeding section to the end of the README, and stop claiming idempotency (reruns add more data since IDs derive from wall-clock time).

Co-authored-by: Cursor <cursoragent@cursor.com>
Cursor Bugbot has reviewed your changes using default effort and found 2 potential issues.
Reviewed by Cursor Bugbot for commit c4d8523.
Re-running the seed on a different day produces new snowflake IDs but the same deterministic (steam_id, marketplace_slug) pairs, which collide with the partial unique index idx_reversals_steam_id_marketplace_slug. Target that natural key (with the WHERE deleted_at IS NULL predicate) in the ON CONFLICT DO NOTHING clause so reruns skip existing rows instead of raising a unique-constraint error, and correct the README/doc comments to describe the idempotent behavior.

Also clamp the synthetic createdAt to models.Epoch before the unsigned snowflake timestamp subtraction so an out-of-range clock can't underflow into a garbage ID.

Summary
Add a deterministic ~6-month synthetic dataset (~9.8k rows, gentle sinusoid with occasional spikes and quiet days) for exercising the dashboard locally without needing a real production export. The generator deliberately spans every period the chart picker offers (7d / 30d / 3m / 6m / 1y), so any range renders meaningful data.
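As a rough illustration of the shape described above, here is a sketch of a seeded generator that produces a sinusoidal daily baseline with occasional spikes and quiet days. The function name `dailyCounts` and every shape parameter (30-day period, 30% amplitude, 5% spike and quiet-day probabilities) are assumptions, not the values used in `internal/devseed`.

```go
package main

import (
	"fmt"
	"math"
	"math/rand"
)

// dailyCounts returns one row count per day: a sinusoidal baseline with
// occasional spikes and quiet days. A fixed seed makes the series
// reproducible. All shape parameters are illustrative assumptions.
func dailyCounts(seed int64, days, targetTotal int) []int {
	rng := rand.New(rand.NewSource(seed))
	base := float64(targetTotal) / float64(days)
	counts := make([]int, days)
	for d := range counts {
		// Gentle wave: +/-30% around the baseline with a 30-day period.
		v := base * (1 + 0.3*math.Sin(2*math.Pi*float64(d)/30))
		switch r := rng.Float64(); {
		case r < 0.05:
			v *= 3 // occasional spike
		case r < 0.10:
			v *= 0.1 // occasional quiet day
		}
		counts[d] = int(math.Round(v))
	}
	return counts
}

func main() {
	a := dailyCounts(42, 180, 9800)
	b := dailyCounts(42, 180, 9800)
	total, same := 0, true
	for i := range a {
		total += a[i]
		if a[i] != b[i] {
			same = false
		}
	}
	// The total only approximates the target because of spikes and rounding.
	fmt.Println(len(a), same, total)
}
```

Seeding a private `rand.Rand` rather than using the package-level source keeps the series stable regardless of what else in the process draws random numbers.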
Safety properties
.gitignore
Adds `internal-docs/` (author scratch space) and `internal/devseed/fixtures/` (room for any future local-only CSV fixtures) so neither leaks into the public repo.
Test plan
Note
Low Risk
Dev-only CLI with an environment guard and idempotent inserts; not linked to the production server binary.
Overview
Adds a dev-only seed path so local Postgres can hold a deterministic ~6-month reversal history (~9.8k rows) without production exports.
`go run ./cmd/seed` loads config, exits unless `Environment` is `development`, then bulk-inserts generated rows into the public DB via `ON CONFLICT (id) DO NOTHING` (safe to re-run). `internal/devseed` builds daily volume with variance, marketplace mix, sources, optional expungements, synthetic Steam IDs, and snowflake IDs aligned with production ordering.
README documents the workflow and dashboard period coverage; `.gitignore` excludes `internal-docs/` and optional `internal/devseed/fixtures/`.
Reviewed by Cursor Bugbot for commit b067b6b.