fix: guard executor shutdown in BaseCaptureStrategy.stop() #5627

Open
tsushanth wants to merge 8 commits into getsentry:main from tsushanth:fix-persisting-executor-leak-5564

Conversation

@tsushanth

Contributor

Fixes #5564

Problem

Each ReplayIntegration.start() constructs a fresh SessionCaptureStrategy or BufferCaptureStrategy. Both inherit BaseCaptureStrategy, which owns a persistingExecutor that was previously declared as a Kotlin lazy delegate.

stop() resets the delegated properties segmentTimestamp and currentReplayId. Their setters call runInBackground, which accesses persistingExecutor whenever options.threadChecker.isMainThread() is true, silently initialising the lazy delegate. Because stop() never shuts the executor down, every start/stop cycle abandons one SentryReplayPersister-* thread. Kiln benchmark data confirmed monotonically growing thread counts and a GC allocation rate roughly 9× baseline after the repeated start/stop pattern introduced in kiln#73.
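A minimal Kotlin sketch of the leak mechanism, using hypothetical stand-ins (LeakyStrategy, NamedThreadFactory, liveThreadCount, and a DemoPersister thread prefix) rather than Sentry's real classes. It assumes only what the description states: a by lazy executor reached from a property setter on the main-thread path, and a stop() that resets the property without shutting the executor down.

```kotlin
import java.util.concurrent.ExecutorService
import java.util.concurrent.ScheduledThreadPoolExecutor
import java.util.concurrent.ThreadFactory
import java.util.concurrent.atomic.AtomicInteger

// Hypothetical thread factory: names threads "<prefix>-N" and marks them as daemons
// so the threads leaked by this demo do not keep the JVM alive.
class NamedThreadFactory(private val prefix: String) : ThreadFactory {
    private val counter = AtomicInteger()
    override fun newThread(r: Runnable): Thread =
        Thread(r, "$prefix-${counter.incrementAndGet()}").apply { isDaemon = true }
}

// Counts live threads whose name starts with [prefix].
fun liveThreadCount(prefix: String): Int =
    Thread.getAllStackTraces().keys.count { it.isAlive && it.name.startsWith(prefix) }

// Hypothetical, simplified model of the pre-fix BaseCaptureStrategy.
class LeakyStrategy(private val isMainThread: () -> Boolean) {
    // Lazy delegate: created on first access and never shut down.
    private val persistingExecutor: ExecutorService by lazy {
        ScheduledThreadPoolExecutor(1, NamedThreadFactory("DemoPersister"))
    }

    var currentReplayId: String? = null
        set(value) {
            field = value
            runInBackground { /* persist the new value */ }
        }

    private fun runInBackground(task: () -> Unit) {
        // Only the main-thread branch touches, and therefore initialises, the executor.
        if (isMainThread()) persistingExecutor.execute { task() } else task()
    }

    fun stop() {
        // Resetting the property runs its setter, which initialises the lazy executor
        // and starts a worker thread that nothing ever shuts down.
        currentReplayId = null
    }
}
```

With isMainThread returning true, each LeakyStrategy { true }.stop() call leaves one more live DemoPersister-* thread behind; with false, the executor is never touched and nothing leaks.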

Fix

Replace the lazy delegate with an explicit nullable holder (persistingExecutorHolder). The custom get() mirrors the old lazy behaviour (create on first access) while making initialisation detectable. At the end of stop(), if the holder is non-null, the executor is shut down via shutdownNow(), which is non-blocking and safe to call on the main thread, and the holder is cleared so that a subsequent start() gets a fresh executor with no residual state.
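A minimal sketch of the holder approach under simplifying assumptions: HolderStrategy and its createdExecutors test hook are hypothetical names, and a ScheduledThreadPoolExecutor stands in for the real executor. Unlike by lazy, which is synchronized by default, a plain nullable holder is not thread-safe, so the sketch assumes the getter and stop() run on the same thread.

```kotlin
import java.util.concurrent.ExecutorService
import java.util.concurrent.ScheduledThreadPoolExecutor

// Hypothetical, simplified model of the holder-based fix; not the PR's actual code.
class HolderStrategy(private val isMainThread: () -> Boolean) {
    // Explicit nullable holder replaces the `by lazy` delegate.
    private var persistingExecutorHolder: ExecutorService? = null

    // Hypothetical test hook: every executor this instance has created.
    val createdExecutors = mutableListOf<ExecutorService>()

    // Create on first access, like the lazy delegate, but initialisation is now observable.
    private val persistingExecutor: ExecutorService
        get() = persistingExecutorHolder
            ?: ScheduledThreadPoolExecutor(1).also {
                persistingExecutorHolder = it
                createdExecutors += it
            }

    var currentReplayId: String? = null
        set(value) {
            field = value
            runInBackground { /* persist the new value */ }
        }

    private fun runInBackground(task: () -> Unit) {
        if (isMainThread()) persistingExecutor.execute { task() } else task()
    }

    fun stop() {
        currentReplayId = null // may still create the executor, as before
        // Non-blocking: interrupts the worker and discards queued tasks.
        persistingExecutorHolder?.shutdownNow()
        // Cleared so the next access creates a fresh executor.
        persistingExecutorHolder = null
    }
}
```

Each stop() on the main-thread path now creates at most one executor and shuts it down before returning, so no worker thread outlives the cycle.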

ReplayExecutorService.shutdown() is intentionally not used here: it blocks for up to options.shutdownTimeoutMillis and risks an ANR when invoked on the main thread. shutdownNow() interrupts any in-flight persistence write and discards the queue, which is acceptable because cache.close() is called just before it in the same stop() body.
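The trade-off between the two shutdown styles can be shown with plain java.util.concurrent calls. blockingShutdown is a hypothetical helper that models the behaviour attributed to ReplayExecutorService.shutdown() (shutdown() followed by awaitTermination(), bounded by a timeout standing in for options.shutdownTimeoutMillis); it is not Sentry's implementation, and busyExecutor and millisBlocked exist only for the demonstration.

```kotlin
import java.util.concurrent.ExecutorService
import java.util.concurrent.Executors
import java.util.concurrent.TimeUnit

// Hypothetical model of a blocking shutdown: the caller waits up to timeoutMillis
// for running tasks to finish.
fun blockingShutdown(executor: ExecutorService, timeoutMillis: Long): Boolean {
    executor.shutdown()
    return executor.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS)
}

// Single-threaded executor with one slow write in progress and one write still queued.
fun busyExecutor(): ExecutorService =
    Executors.newSingleThreadExecutor().apply {
        execute {
            try {
                Thread.sleep(10_000) // simulates a slow persistence write
            } catch (e: InterruptedException) {
                Thread.currentThread().interrupt() // preserve the interrupt and exit
            }
        }
        execute { /* a queued write that has not started yet */ }
    }

// Measures how long the calling thread is blocked by [action], in milliseconds.
fun millisBlocked(action: () -> Unit): Long {
    val start = System.nanoTime()
    action()
    return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start)
}
```

With a slow write in flight, blockingShutdown(executor, 300) keeps the caller waiting for the full 300 ms and returns false, while shutdownNow() returns immediately with the one queued task it discarded and interrupts the running one. On Android, the first behaviour on the main thread is what risks an ANR.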

Test

Added the test "stop shuts down persisting executor so no SentryReplayPersister threads leak across cycles" to SessionCaptureStrategyTest. The test stubs threadChecker.isMainThread() to return true (forcing the persisting-executor path), runs three start/stop cycles, and asserts that no SentryReplayPersister-* threads remain alive after a short drain window.

@runningcode

Contributor

Thanks for contributing! Could you run spotlessApply to fix the formatting failure? Then we'll review this more closely.

romtsn left a comment

Member

@tsushanth thanks for your contribution, that's very much appreciated! I'm wondering whether there's a better approach: pull persistingExecutor up to the ReplayIntegration level and pass it as a constructor argument to the respective CaptureStrategy.

It could then have the same lifecycle as the replayExecutor (that is, shut it down inside ReplayIntegration.close()) and would survive multiple start/stop calls, potentially saving us the cost of creating a new executor every time we start a new recording.
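A minimal sketch of the proposed ownership model. IntegrationModel and CaptureStrategyModel are hypothetical simplifications of ReplayIntegration and the capture strategies, and threadsCreated and isExecutorShutdown are test hooks. The integration creates the executor lazily, passes the same instance to every strategy it constructs, and shuts it down only in close(). Keeping a reference to the Lazy object is an assumption of this sketch, not necessarily how the PR does it; it lets close() skip the shutdown when no recording ever started. Passing the executor into the constructor initialises the lazy on the first start().

```kotlin
import java.util.concurrent.ExecutorService
import java.util.concurrent.ScheduledThreadPoolExecutor
import java.util.concurrent.ThreadFactory
import java.util.concurrent.atomic.AtomicInteger

// Hypothetical capture strategy: receives the executor instead of owning it.
class CaptureStrategyModel(private val persistingExecutor: ExecutorService) {
    fun stop() {
        // Still persists through the executor, but never shuts it down.
        persistingExecutor.execute { /* persist the reset state */ }
    }
}

// Hypothetical integration that owns the executor's whole lifecycle.
class IntegrationModel {
    // Hypothetical test hook: how many worker threads the factory has created.
    val threadsCreated = AtomicInteger()

    // Keep the Lazy itself so close() can ask whether it was ever initialised.
    private val persistingExecutorLazy = lazy {
        ScheduledThreadPoolExecutor(1, ThreadFactory { r ->
            Thread(r, "DemoPersister-${threadsCreated.incrementAndGet()}")
        })
    }
    private val persistingExecutor by persistingExecutorLazy

    private var strategy: CaptureStrategyModel? = null

    fun start() {
        // Every recording reuses the same executor instance.
        strategy = CaptureStrategyModel(persistingExecutor)
    }

    fun stop() {
        strategy?.stop()
        strategy = null
    }

    fun close() {
        stop()
        // Non-blocking, and skipped entirely if start() never created the executor.
        if (persistingExecutorLazy.isInitialized()) persistingExecutor.shutdownNow()
    }

    // Hypothetical test hook.
    val isExecutorShutdown: Boolean
        get() = persistingExecutorLazy.isInitialized() && persistingExecutor.isShutdown
}
```

Three start()/stop() cycles create a single worker thread, and close() then shuts it down without blocking.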

@tsushanth

Contributor Author

Refactored per the feedback — persistingExecutor is now owned by ReplayIntegration instead of BaseCaptureStrategy:

  • Declared as a lazy field in ReplayIntegration using the same pattern as replayExecutor (thread factory moved there too)
  • Passed as a constructor argument to SessionCaptureStrategy and BufferCaptureStrategy, which forward it up to BaseCaptureStrategy
  • BufferCaptureStrategy.convert() hands both executors to the new SessionCaptureStrategy to preserve ordering
  • ReplayIntegration.close() shuts down persistingExecutor alongside replayExecutor
  • BaseCaptureStrategy.stop() no longer touches the executor — the holder/lazy workaround and per-stop shutdown are gone
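The ordering point about BufferCaptureStrategy.convert() rests on a general property of a single-threaded executor: tasks run in the order they were submitted. A hypothetical sketch that models only the persisting executor (BufferStrategyModel, SessionStrategyModel, and runConversion are illustrative names): because convert() hands the same executor to the new strategy, writes queued before the conversion still run before writes queued after it. Two independent executors would give no such guarantee.

```kotlin
import java.util.Collections
import java.util.concurrent.ExecutorService
import java.util.concurrent.Executors
import java.util.concurrent.TimeUnit

// Hypothetical session strategy: persists through whatever executor it is handed.
class SessionStrategyModel(
    private val persistingExecutor: ExecutorService,
    private val log: MutableList<String>,
) {
    fun persist(entry: String) = persistingExecutor.execute { log += entry }
}

// Hypothetical buffer strategy: convert() passes its executor on instead of creating one.
class BufferStrategyModel(
    private val persistingExecutor: ExecutorService,
    private val log: MutableList<String>,
) {
    fun persist(entry: String) = persistingExecutor.execute { log += entry }

    fun convert(): SessionStrategyModel = SessionStrategyModel(persistingExecutor, log)
}

// Queues writes before and after a conversion on one shared single-threaded executor
// and returns the order in which they actually ran.
fun runConversion(): List<String> {
    val log: MutableList<String> = Collections.synchronizedList(mutableListOf<String>())
    val executor = Executors.newSingleThreadExecutor()
    val buffer = BufferStrategyModel(executor, log)
    (1..3).forEach { buffer.persist("buffer-$it") }
    val session = buffer.convert()
    (1..3).forEach { session.persist("session-$it") }
    executor.shutdown() // stop accepting work but let queued writes finish
    executor.awaitTermination(5, TimeUnit.SECONDS)
    return log.toList()
}
```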

Tests updated to pass a mock persistingExecutor matching the existing mock pattern.

@tsushanth

Contributor Author

Refactored as suggested, @romtsn: persistingExecutor is now owned by ReplayIntegration and passed as a constructor argument to the capture strategies. It shares the same lifecycle as replayExecutor: it is created lazily on first use, survives stop()/start() cycles, and is shut down in ReplayIntegration.close().

Also ran spotlessApply to fix the formatting failure @runningcode flagged.

romtsn force-pushed the fix-persisting-executor-leak-5564 branch from d32fda0 to 25cec73 on July 1, 2026 17:04
tsushanth and others added 4 commits July 1, 2026 19:04
Each start/stop cycle leaked one SentryReplayPersister-* thread because
stop() reset delegated properties (segmentTimestamp, currentReplayId)
whose setters dispatch to persistingExecutor, initialising the lazy —
but stop() never shut it down.

Replace the lazy delegate with an explicit nullable holder so the
executor is only created when actually needed and can be detected at
stop() time.  Call shutdownNow() (non-blocking) rather than the blocking
shutdown() to avoid ANRs when stop() runs on the main thread.

Fixes getsentry#5564
Move persistingExecutor out of BaseCaptureStrategy and into
ReplayIntegration, passing it as a constructor argument to CaptureStrategy
subclasses. Shut it down in ReplayIntegration.close() alongside
replayExecutor so executor lifecycle is managed in one place.
…ut down

Add the persistingExecutor argument to SessionCaptureStrategy and
BufferCaptureStrategy constructor calls in tests, and add changelog entry.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
romtsn force-pushed the fix-persisting-executor-leak-5564 branch from 25cec73 to 6d4e33b on July 1, 2026 17:05

cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 2 potential issues.

Reviewed by Cursor Bugbot for commit 6d4e33b.

Comment thread CHANGELOG.md Outdated
Comment thread sentry-android-replay/src/main/java/io/sentry/android/replay/ReplayIntegration.kt Outdated
romtsn and others added 2 commits July 1, 2026 19:11
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The test used a mocked executor that never spawned threads, so the
thread-count assertion was always true regardless of the fix.
The executor lifecycle is now owned by ReplayIntegration, not
SessionCaptureStrategy, so the test belonged at the wrong layer.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Comment thread sentry-android-replay/src/main/java/io/sentry/android/replay/ReplayIntegration.kt Outdated
Uses real ScheduledThreadPoolExecutor threads so the test actually
fails if the shutdown in close() is removed.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
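A sketch of the kind of check these commits describe, assuming hypothetical names (ExecutorOwner, liveThreads, awaitThreadCount, and a DemoReplayPersister- prefix). Because the executor is real, a worker thread actually starts, so the assertion fails when the shutdown is missing; a mocked executor never starts a thread and would pass either way. The short drain window covers the gap between shutdownNow() returning and the worker thread exiting.

```kotlin
import java.util.concurrent.ScheduledThreadPoolExecutor
import java.util.concurrent.ThreadFactory

// Hypothetical thread-name prefix used only by this sketch.
val demoPrefix = "DemoReplayPersister-"

// Counts live threads whose name starts with [prefix].
fun liveThreads(prefix: String): Int =
    Thread.getAllStackTraces().keys.count { it.isAlive && it.name.startsWith(prefix) }

// Polls until exactly [expected] matching threads are alive or the drain window expires.
// A drain window is needed because shutdownNow() returns before worker threads exit.
fun awaitThreadCount(prefix: String, expected: Int, drainMillis: Long = 1_000): Boolean {
    val deadline = System.nanoTime() + drainMillis * 1_000_000
    while (System.nanoTime() - deadline < 0) {
        if (liveThreads(prefix) == expected) return true
        Thread.sleep(10)
    }
    return liveThreads(prefix) == expected
}

// Hypothetical owner backed by a real executor. shutDownOnClose = false simulates
// removing the shutdown from close(), which a real-thread test must detect.
class ExecutorOwner(private val shutDownOnClose: Boolean) {
    private val executor = ScheduledThreadPoolExecutor(1, ThreadFactory { r ->
        Thread(r, "${demoPrefix}worker").apply { isDaemon = true }
    })

    fun work() = executor.execute { /* persist something */ }

    fun close() {
        if (shutDownOnClose) executor.shutdownNow()
    }
}
```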
romtsn force-pushed the fix-persisting-executor-leak-5564 branch 4 times, most recently from 6c60e00 to 7a3e33b on July 1, 2026 19:27
shutdown() calls awaitTermination() which blocks up to
shutdownTimeoutMillis. Since close() can run on the main thread
(via Sentry.close() from hybrid SDKs), this risks an ANR.
shutdownNow() is non-blocking and sufficient at teardown.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
romtsn force-pushed the fix-persisting-executor-leak-5564 branch from 7a3e33b to c14175f on July 1, 2026 19:33

Development

Successfully merging this pull request may close these issues.

Fix leaked persisting executor in BaseCaptureStrategy

3 participants