Skip to content

Tolerate slow or unreachable cross-node SharedDoc during session cleanup#4845

Open
midigofrank wants to merge 1 commit into
mainfrom
4817-crash-collab-editor
Open

Tolerate slow or unreachable cross-node SharedDoc during session cleanup#4845
midigofrank wants to merge 1 commit into
mainfrom
4817-crash-collab-editor

Conversation

@midigofrank

Copy link
Copy Markdown
Collaborator

Description

The collaborative editor Session calls SharedDoc.unobserve/1 during cleanup, which issues a cross-node GenServer.call to a SharedDoc that may live on another node. When that node is unreachable (:noconnection) or slow to reply (:timeout), the call exited and crashed the Session; a Phoenix channel blocked in save_workflow then inherited the exit and dropped the client's editor connection.

Wrap both unobserve call sites (terminate/2 and the parent :DOWN handler) in safe_unobserve/1, which catches the exit and logs at warning instead of crashing. Skipping the unobserve is safe: the SharedDoc monitors each observer and runs the same cleanup from its own :DOWN handler when the Session dies.

Closes #4817

AI Usage

Please disclose whether you've used AI anywhere in this PR (it's cool, we just
want to know!):

  • I have used Claude Code
  • I have used another model
  • I have not used AI

You can read more details in our
Responsible AI Policy

Pre-submission checklist

  • I have performed an AI review of my code (we recommend using /review
    with Claude Code)
  • I have implemented and tested all related authorization policies.
    (e.g., :owner, :admin, :editor, :viewer)
  • I have updated the changelog.
  • I have ticked a box in "AI usage" in this PR

The collaborative editor Session calls SharedDoc.unobserve/1 during
cleanup, which issues a cross-node GenServer.call to a SharedDoc that may
live on another node. When that node is unreachable (:noconnection) or
slow to reply (:timeout), the call exited and crashed the Session; a
Phoenix channel blocked in save_workflow then inherited the exit and
dropped the client's editor connection.

Wrap both unobserve call sites (terminate/2 and the parent :DOWN handler)
in safe_unobserve/1, which catches the exit and logs at warning instead
of crashing. Skipping the unobserve is safe: the SharedDoc monitors each
observer and runs the same cleanup from its own :DOWN handler when the
Session dies.

Closes #4817
@github-project-automation github-project-automation Bot moved this to New Issues in Core Jun 10, 2026
@midigofrank midigofrank requested a review from stuartc June 10, 2026 08:41
@midigofrank midigofrank marked this pull request as ready for review June 10, 2026 08:41
@github-actions

Copy link
Copy Markdown

The PR changes are scoped to a Session cleanup resilience fix — safe_unobserve/1 catches :exit from cross-node SharedDoc.unobserve/1 in terminate/2 and the :DOWN handler. No data access, web entrypoint, or persistence paths are touched.

Security Review ✅

  • S0 (project scoping): N/A — change only wraps a cleanup-path GenServer.call in lib/lightning/collaboration/session.ex:140,438 with an :exit rescue; no new queries or entrypoints touch project-scoped data.
  • S1 (authorization): N/A — no new web-layer actions, handle_events, or controllers; the modified code is internal GenServer cleanup invoked by process lifecycle, not user requests.
  • S2 (audit trail): N/A — no inserts/updates/deletes on configuration resources; the patch only changes a failed cleanup unobserve into a logged no-op (lib/lightning/collaboration/session.ex:156).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: New Issues

Development

Successfully merging this pull request may close these issues.

Collaborative editor: cross-node SharedDoc unobserve crashes Session/channel on node disconnect or timeout (LIGHTNING-1D5 + LIGHTNING-1A5)

1 participant