
cmd/seed: synthetic dataset for local development #70

Open
ZukwiZ wants to merge 3 commits into master from feat/dev-seed-synthetic

@ZukwiZ ZukwiZ commented May 27, 2026

Collaborator

Summary

Add a deterministic ~6-month synthetic dataset (~9.8k rows, gentle sinusoid with occasional spikes and quiet days) for exercising the dashboard locally without needing a real production export. The generator deliberately spans every period the chart picker offers (7d / 30d / 3m / 6m / 1y), so any range renders meaningful data.
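A minimal sketch of how a deterministic daily-volume series of this shape could be produced, assuming a fixed RNG seed. The function name `dailyVolumes` and every shape constant (base of 50 rows, amplitude, 30-day period, spike and quiet-day odds) are illustrative guesses, not values taken from `internal/devseed`.

```go
package main

import (
	"fmt"
	"math"
	"math/rand"
)

// dailyVolumes returns one synthetic row count per day.
// All shape constants below are assumptions made for this sketch.
func dailyVolumes(seed int64, days int) []int {
	rng := rand.New(rand.NewSource(seed)) // fixed seed => same series on every run
	out := make([]int, days)
	for d := range out {
		// Gentle sinusoid with a 30-day period around a base of 50 rows/day.
		v := 50 + 15*math.Sin(2*math.Pi*float64(d)/30)
		switch r := rng.Float64(); {
		case r < 0.05:
			v *= 3 // occasional spike
		case r < 0.10:
			v = 0 // quiet day
		}
		out[d] = int(math.Round(v))
	}
	return out
}

func main() {
	a, b := dailyVolumes(42, 180), dailyVolumes(42, 180)
	same := len(a) == len(b)
	for i := range a {
		same = same && a[i] == b[i]
	}
	fmt.Println(len(a), same) // prints "180 true"
}
```

Because the generator owns its own seeded `rand.Rand` rather than using the global source, two runs with the same seed yield the same series.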

Safety properties

  • Refuses to run unless `Config.Environment == "development"`.
  • `INSERT … ON CONFLICT (id) DO NOTHING`, so re-running is a no-op.
  • Steam IDs use a clearly-synthetic `76561198000000000` prefix; real Steam IDs sit higher up the 64-bit range, so generated rows are easy to spot in the DB.
  • Snowflake IDs encode the same `created_at` + sequence layout as the production generator (`domain/models/snowflake.go`), so synthetic rows sort chronologically alongside any real rows already present.
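The environment guard in the first bullet could look roughly like the following. This is a sketch under stated assumptions, not the code in `cmd/seed/main.go`: the `Config` struct here is a stand-in that models only the `Environment` field named above, and `requireDevelopment` is a hypothetical helper.

```go
package main

import (
	"errors"
	"fmt"
)

// Config is a stand-in for the project's config type; only the
// Environment field mentioned in the PR is modelled.
type Config struct {
	Environment string
}

var errNotDevelopment = errors.New("seed: refusing to run outside the development environment")

// requireDevelopment returns an error unless Environment is exactly "development".
func requireDevelopment(cfg Config) error {
	if cfg.Environment != "development" {
		return fmt.Errorf("%w (Environment=%q)", errNotDevelopment, cfg.Environment)
	}
	return nil
}

func main() {
	for _, env := range []string{"development", "production"} {
		if err := requireDevelopment(Config{Environment: env}); err != nil {
			// A real CLI would print this to stderr and exit non-zero.
			fmt.Println("refused:", err)
			continue
		}
		fmt.Println("ok:", env)
	}
}
```

Comparing against the exact string means an empty or misspelled environment is refused rather than treated as development.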

.gitignore

Adds `internal-docs/` (author scratch space) and `internal/devseed/fixtures/` (room for any future local-only CSV fixtures) so neither leaks into the public repo.

Test plan

  • `go build ./cmd/seed`
  • Against a local dev DB with `Environment=development`: `go run ./cmd/seed` prints `generated …` and `seed complete: N inserted, 0 already present`.
  • Re-run: prints `seed complete: 0 inserted, N already present` (idempotent).
  • With `Environment` set to anything else: exits non-zero with a refusal message.
  • After seeding, the dashboard at `/` shows ~6 months of activity across all period picker ranges.

Made with Cursor


Note

Low Risk
Dev-only CLI with an environment guard and idempotent inserts; not linked to the production server binary.

Overview
Adds a dev-only seed path so local Postgres can hold a deterministic ~6-month reversal history (~9.8k rows) without production exports.

go run ./cmd/seed loads config, exits unless Environment is development, then bulk-inserts generated rows into the public DB via ON CONFLICT (id) DO NOTHING (safe to re-run). internal/devseed builds daily volume with variance, marketplace mix, sources, optional expungements, synthetic Steam IDs, and snowflake IDs aligned with production ordering.

README documents the workflow and dashboard period coverage; .gitignore excludes internal-docs/ and optional internal/devseed/fixtures/.

Reviewed by Cursor Bugbot for commit b067b6b.

Add a deterministic ~6-month synthetic dataset (~9.8k rows, gentle
sinusoid with occasional spikes and quiet days) for exercising the
dashboard locally without needing real production exports. The
generator deliberately spans every period (7d / 30d / 3m / 6m / 1y)
so the chart UI has data to render at any range.

Safety properties:

- Refuses to run unless Config.Environment == "development".
- INSERT … ON CONFLICT (id) DO NOTHING, so re-running is a no-op.
- Steam IDs use a clearly-synthetic 76561198000000000 prefix.
- Snowflake IDs encode the same created_at + sequence layout as
  the production generator, so synthetic rows sort chronologically
  alongside any real rows already in the DB.

internal-docs/ and internal/devseed/fixtures/ are added to .gitignore
to keep author scratch space and any future local CSV fixtures out of
the public repo.

Co-authored-by: Cursor <cursoragent@cursor.com>
Comment thread cmd/seed/main.go
@zedimytch zedimytch self-requested a review June 3, 2026 16:54
Comment thread cmd/seed/main.go
Comment thread internal/devseed/synthetic.go Outdated
const (
syntheticRNGSeed int64 = 42
syntheticDays = 180
syntheticTargetTotal = 9800

Collaborator


Specify the data type here.
Also, we probably don't need 9800 entries; a couple thousand should work.

Collaborator Author


Done — syntheticTargetTotal int = 2000.

// days). Snowflake IDs are unique within the slice and won't collide with
// real CSV-seeded IDs, so callers can pipe the result straight into
// InsertReversals.
func GenerateSynthetic(now time.Time) []*models.Reversal {

Collaborator


Let's normalize `now` to UTC before we derive any dates.

Collaborator Author


Done. `now` is normalized to UTC before any dates are derived.
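For illustration, UTC normalization ahead of date derivation might be sketched as below. The helper `seedWindow`, its `days` parameter, and the `exampleNow` value are hypothetical and only show why converting first makes the result independent of the caller's time zone.

```go
package main

import (
	"fmt"
	"time"
)

// exampleNow is 03:00 on 2026-06-04 in a UTC+9 zone, i.e. 18:00 UTC on 2026-06-03.
var exampleNow = time.Date(2026, time.June, 4, 3, 0, 0, 0, time.FixedZone("UTC+9", 9*60*60))

// seedWindow converts now to UTC, then returns midnight UTC of the first and
// last seeded day. The name and signature are assumptions for this sketch.
func seedWindow(now time.Time, days int) (first, last time.Time) {
	now = now.UTC()
	last = time.Date(now.Year(), now.Month(), now.Day(), 0, 0, 0, 0, time.UTC)
	first = last.AddDate(0, 0, -(days - 1))
	return first, last
}

func main() {
	f1, l1 := seedWindow(exampleNow, 180)
	f2, l2 := seedWindow(exampleNow.UTC(), 180) // same instant, different zone
	fmt.Println(l1.Format("2006-01-02"), f1.Equal(f2), l1.Equal(l2))
	// prints "2026-06-03 true true"
}
```

Without the `now.UTC()` call, the UTC+9 caller would derive 2026-06-04 as the last day while a UTC caller would derive 2026-06-03 for the same instant.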

Comment thread README.md Outdated
Comment on lines +31 to +36
The seed:

- Refuses to run unless `Environment` is `development`.
- Uses `INSERT … ON CONFLICT (id) DO NOTHING`, so it's safe to re-run.
- Generates a deterministic 6-month dataset so the dashboard at `/` has enough data to exercise every period (7d / 30d / 3m / 6m / 1y).
- Uses a synthetic Steam ID prefix (`76561198000000000`) so generated IDs are clearly fake.

Collaborator


You can just state that this must be run in a development environment. Also, please move this to the end of the README.

Collaborator Author


Done — simplified to "must be run in a development environment" and moved the section to the end of the README.

- Give syntheticTargetTotal an explicit int type and reduce the seed to
  ~2,000 rows (a couple thousand) for faster local seeding.
- Normalize the incoming now to UTC before deriving any dates so the
  generated series is timezone-independent.
- Simplify the README seeding guard to "must be run in a development
  environment", move the seeding section to the end of the README, and
  stop claiming idempotency (reruns add more data since IDs derive from
  wall-clock time).

Co-authored-by: Cursor <cursoragent@cursor.com>

@cursor cursor Bot left a comment



Cursor Bugbot has reviewed your changes using default effort and found 2 potential issues.


Reviewed by Cursor Bugbot for commit c4d8523.

Comment thread internal/devseed/synthetic.go Outdated
Comment thread internal/devseed/synthetic.go
Re-running the seed on a different day produces new snowflake IDs but the
same deterministic (steam_id, marketplace_slug) pairs, which collide with
the partial unique index idx_reversals_steam_id_marketplace_slug. Target
that natural key (with the WHERE deleted_at IS NULL predicate) in the
ON CONFLICT DO NOTHING clause so reruns skip existing rows instead of
raising a unique-constraint error, and correct the README/doc comments to
describe the idempotent behavior.
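A sketch of what such a statement could look like when built in Go. The table name `reversals` and the column list are assumptions inferred from the index name, and `insertReversalSQL` is a hypothetical helper. The `ON CONFLICT (cols) WHERE predicate` form is the PostgreSQL syntax for inferring a partial unique index.

```go
package main

import (
	"fmt"
	"strings"
)

// insertReversalSQL builds a single-row INSERT whose conflict target is the
// natural key covered by the partial unique index. Table and column names
// are assumptions made for this sketch.
func insertReversalSQL() string {
	cols := []string{"id", "steam_id", "marketplace_slug", "created_at"}
	params := make([]string, len(cols))
	for i := range cols {
		params[i] = fmt.Sprintf("$%d", i+1) // positional placeholders: $1, $2, ...
	}
	return fmt.Sprintf(
		"INSERT INTO reversals (%s) VALUES (%s) "+
			"ON CONFLICT (steam_id, marketplace_slug) WHERE deleted_at IS NULL DO NOTHING",
		strings.Join(cols, ", "), strings.Join(params, ", "),
	)
}

func main() {
	fmt.Println(insertReversalSQL())
}
```

With an explicit conflict target, only violations of that index are skipped; a collision on another unique constraint would still raise an error. Omitting the target entirely makes `DO NOTHING` apply to any unique violation.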

Also clamp the synthetic createdAt to models.Epoch before the unsigned
snowflake timestamp subtraction so an out-of-range clock can't underflow
into a garbage ID.
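
The clamp can be illustrated with the sketch below. The bit layout (millisecond timestamp above a 12-bit sequence) and the `epoch` value are assumptions standing in for `models.Epoch` and the production generator, whose actual layout is not shown in this PR.

```go
package main

import (
	"fmt"
	"time"
)

// epoch stands in for models.Epoch; the real value lives in the project.
var epoch = time.Date(2024, time.January, 1, 0, 0, 0, 0, time.UTC)

// sequenceBits is an assumed layout: timestamp bits sit above a 12-bit sequence.
const sequenceBits = 12

// snowflakeID clamps createdAt to epoch so the unsigned subtraction cannot wrap.
func snowflakeID(createdAt time.Time, seq uint64) uint64 {
	if createdAt.Before(epoch) {
		createdAt = epoch
	}
	ms := uint64(createdAt.UnixMilli()) - uint64(epoch.UnixMilli())
	return (ms << sequenceBits) | (seq & (1<<sequenceBits - 1))
}

func main() {
	before := epoch.Add(-time.Hour) // would underflow without the clamp
	after := epoch.Add(time.Second)
	fmt.Println(snowflakeID(before, 0), snowflakeID(after, 1)) // prints "0 4096001"
}
```

Without the clamp, a `createdAt` one hour before the epoch would wrap to a value near the top of the `uint64` range and sort after every legitimate ID.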
@ZukwiZ ZukwiZ requested a review from zedimytch June 24, 2026 13:40
3 participants