IGNITE-28747 GridToStringBuilder#handleRecursion may cause NPE. Colle…#13262
EgorBaranovEnjoysTyping wants to merge 15 commits into
…ct toString as tree, avoid extra allocations and problems in recursion resolution
…ide all SBLengthLimit's parent methods (wrong logic). Added new tests.
… up in case of exception
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
Refactors Ignite’s toString infrastructure to a node-based implementation with improved recursion handling and enhanced bounded string building, plus adds/extends tests for the new behavior.
Changes:
- Replaced the previous thread-local `SBLimitedLength` + recursion map approach with a `GridToStringNode` tree, recursion monitors, and marker recovery.
- Extended `SBLimitedLength`/`CircularStringBuilder` capabilities (insert/substring, head/tail limits) to support the new toString flow.
- Added new unit tests for `SBLimitedLength` and expanded existing toString/circular buffer tests.
Reviewed changes
Copilot reviewed 20 out of 20 changed files in this pull request and generated 12 comments.
| File | Description |
|---|---|
| modules/core/src/test/java/org/apache/ignite/testsuites/IgniteUtilSelfTestSuite.java | Registers the new SBLimitedLengthSelfTest in the util test suite. |
| modules/core/src/test/java/org/apache/ignite/internal/util/tostring/SBLimitedLengthSelfTest.java | Adds focused tests for append/insert/toString and prohibited operations in SBLimitedLength. |
| modules/core/src/test/java/org/apache/ignite/internal/util/tostring/GridToStringBuilderSelfTest.java | Adds regression tests around recursion and an NPE scenario for the new implementation. |
| modules/core/src/test/java/org/apache/ignite/internal/util/tostring/CircularStringBuilderSelfTest.java | Adds tests for CircularStringBuilder.insert and substring. |
| modules/commons/src/main/java/org/apache/ignite/internal/util/tostring/SBLimitedLength.java | Updates limited-length builder behavior, adds insert routing logic, forbids mutating reductions, and integrates marker recovery. |
| modules/commons/src/main/java/org/apache/ignite/internal/util/tostring/SBLengthLimit.java | Adjusts overflow logic and tail creation behavior used by SBLimitedLength. |
| modules/commons/src/main/java/org/apache/ignite/internal/util/tostring/NodeRecursionMonitor.java | Introduces recursion tracking via a thread-local identity registry. |
| modules/commons/src/main/java/org/apache/ignite/internal/util/tostring/Node.java | Adds node types and a factory to build a stringification tree with recursion termination and collection/map/array handling. |
| modules/commons/src/main/java/org/apache/ignite/internal/util/tostring/GridToStringBuilder.java | Refactors to the node-based flow and adds throwable rendering. |
| modules/commons/src/main/java/org/apache/ignite/internal/util/tostring/CircularStringBuilder.java | Adds insert and substring operations; removes reset. |
| modules/commons/src/main/java/org/apache/ignite/internal/util/GridStringBuilder.java | Integrates marker recovery into various append/insert/replace operations and updates javadocs. |
Comments suppressed due to low confidence (1)
modules/commons/src/main/java/org/apache/ignite/internal/util/tostring/SBLimitedLength.java:1
- These `i(...)` overloads bypass `SBLimitedLength`'s custom insert routing by calling `super.i(...)`, which inserts directly into the head buffer and ignores head/tail limiting. They should delegate to this class's `i(int, String)` (or equivalent) so inserts beyond the head limit are correctly redirected/handled.
anton-vinogradov left a comment
Thanks for digging into this — the NPE is real and the diagnosis is correct. My concern is that a small, local defect is being fixed with a full rewrite of the toString engine, and the rewrite introduces problems that are worse than the original bug.
Root cause (small and local). SBLimitedLength overrides every a(...) (append) but none of the i(...) (insert) methods. handleRecursion() calls buf.i(pos, hash), which writes straight into impl() and bypasses onWrite() — the only place that allocates tail. The insert pushes the head past HEAD_LEN, so overflowed() flips to true while tail is still null, and the next a(savedName) takes the overflow branch -> tail.append(...) -> NPE. The RecursivePayload test reproduces exactly this.
The minimal fix is already in this PR. The i(...) overrides + lazy-tail in SBLimitedLength are the correct fix and are enough to close the ticket on their own (~2 files + the new test). Everything else — the GridToStringNode tree, NodeRecursionMonitor, the marker/recover mechanism, gutting GridToStringBuilder by ~685 lines — is an unrelated refactor.
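The failure sequence described above can be sketched with a deliberately simplified model. This is not Ignite's `SBLimitedLength`: the class names, the `HEAD_LEN` value, and the field layout are assumptions made for illustration. It only shows how an insert that skips the bookkeeping hook can leave `tail` unallocated, and how routing the insert through the same hook avoids the NPE.

```java
/** Hypothetical, simplified model; not Ignite's actual SBLimitedLength. */
class InsertOverflowSketch {
    static final int HEAD_LEN = 8; // Illustrative limit, not the real constant.

    /** Insert writes to the head but never allocates the tail. */
    static class Buggy {
        final StringBuilder head = new StringBuilder();
        StringBuilder tail; // Allocated only by onWrite().

        boolean overflowed() {
            return head.length() >= HEAD_LEN;
        }

        void onWrite() {
            if (overflowed() && tail == null)
                tail = new StringBuilder();
        }

        Buggy a(String s) {
            if (overflowed())
                tail.append(s); // NPE when tail was never allocated.
            else {
                head.append(s);
                onWrite();
            }
            return this;
        }

        Buggy i(int pos, String s) {
            head.insert(pos, s); // Bypasses onWrite().
            return this;
        }
    }

    /** Insert performs the same bookkeeping as append. */
    static class Fixed extends Buggy {
        @Override Buggy i(int pos, String s) {
            head.insert(pos, s);
            onWrite();
            return this;
        }
    }

    /** Runs append, oversized insert, append; reports whether an NPE escaped. */
    static boolean throwsNpe(Buggy b) {
        try {
            b.a("abc").i(0, "0123456789").a("x");
            return false;
        }
        catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("buggy throws NPE: " + throwsNpe(new Buggy()));
        System.out.println("fixed throws NPE: " + throwsNpe(new Fixed()));
    }
}
```

The real class also has head/tail limiting that this model leaves out; the sketch covers only the ordering problem between insert and tail allocation.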
Blocking concerns with the refactor (details inline):
- Marker/recover breaks the `toString()` contract. A nested `S.toString()` returns a marker string (`new String("GridToStringNode")`) instead of its value, recovered later by String identity. Any wrapping (`"Wrapper{" + S.toString(...) + "}"`, very common in our code) defeats the identity lookup and leaks the literal `GridToStringNode` into the log. `recoverObject`'s own javadoc concedes the design can't guarantee correctness.
- Thread-keyed static map leaks. `static ConcurrentHashMap<Thread, ...> CATCHED_NODES` pins `Thread` keys (and captured nodes) whenever `clear()` is skipped on any path. The original `ThreadLocal` was leak-free by construction; this replaces it with a slower, leak-prone map of identical semantics.
- `toString()` can now throw. `markNode()` ends with `identities().orElseThrow()`; on a path where `init()` didn't run, an inner `toString()` throws, which is the exact failure mode the original blanket try/catch existed to prevent.
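The wrapping concern in the first bullet can be reproduced in isolation. The sketch below is a stand-in, not the PR's code: `MARKERS`, `nestedToString`, and `recover` are hypothetical names, and the registry is a plain `IdentityHashMap`. It shows that an identity-keyed lookup finds the marker when it is passed through unchanged and misses it once the marker has been concatenated into a new string.

```java
import java.util.IdentityHashMap;
import java.util.Map;

/** Hypothetical stand-in for the marker/recover idea; names are illustrative. */
class MarkerIdentitySketch {
    /** Registry keyed by String identity: marker instance -> real rendered value. */
    static final Map<String, String> MARKERS = new IdentityHashMap<>();

    /** Stand-in for a nested toString() that returns a marker instead of its value. */
    static String nestedToString(String realVal) {
        String marker = new String("GridToStringNode"); // Fresh identity on purpose.

        MARKERS.put(marker, realVal);

        return marker;
    }

    /** Identity lookup; falls back to the input when it is not a registered marker. */
    static String recover(String s) {
        String real = MARKERS.get(s);

        return real != null ? real : s;
    }

    public static void main(String[] args) {
        // Marker passed through untouched: the lookup succeeds.
        System.out.println(recover(nestedToString("Inner [x=1]")));

        // Marker wrapped by concatenation: a new String instance, so the lookup misses.
        System.out.println(recover("Wrapper{" + nestedToString("Inner [x=1]") + "}"));
    }
}
```

In the second call the marker text itself ends up in the output, which is the leak the comment describes.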
Also: the tree is built in full (reflection + recursion + each field's toString()) and only length-limited at render time, which defeats the whole point of SBLimitedLength (cap the cost of toString on large graphs). The PR claims fewer allocations but adds per-node objects, Optional/lambda chains, a per-call new SBLimitedLength, and a recoverObject lookup on every append — with no benchmark.
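The cost argument can be illustrated with a generic comparison, under loud assumptions: this does not model `SBLimitedLength`'s head/tail layout or the node tree, and every name below is hypothetical. It only counts how many element `toString()` calls happen when output is truncated after full rendering versus when rendering stops once a limit is reached.

```java
import java.util.ArrayList;
import java.util.List;

/** Generic illustration only; does not model Ignite's head/tail buffers. */
class BoundedRenderSketch {
    /** Element that counts how often its toString() is invoked. */
    static class Counting {
        static int calls;

        @Override public String toString() {
            calls++;

            return "item";
        }
    }

    /** Renders every element first and truncates afterwards. */
    static String renderThenTruncate(List<?> items, int limit) {
        StringBuilder sb = new StringBuilder();

        for (Object o : items)
            sb.append(o).append(',');

        return sb.length() > limit ? sb.substring(0, limit) : sb.toString();
    }

    /** Stops visiting elements once the limit has been reached. */
    static String renderBounded(List<?> items, int limit) {
        StringBuilder sb = new StringBuilder();

        for (Object o : items) {
            if (sb.length() >= limit)
                break;

            sb.append(o).append(',');
        }

        return sb.length() > limit ? sb.substring(0, limit) : sb.toString();
    }

    /** Returns the number of toString() calls made for 1000 elements and a limit of 20. */
    static int countCalls(boolean bounded) {
        List<Counting> items = new ArrayList<>();

        for (int i = 0; i < 1000; i++)
            items.add(new Counting());

        Counting.calls = 0;

        if (bounded)
            renderBounded(items, 20);
        else
            renderThenTruncate(items, 20);

        return Counting.calls;
    }

    public static void main(String[] args) {
        System.out.println("render-then-truncate calls: " + countCalls(false));
        System.out.println("bounded calls: " + countCalls(true));
    }
}
```

A count like this is not a benchmark; it only makes the shape of the concern concrete, and a JMH comparison would still be needed to measure real allocation behavior.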
Proposal: reduce this PR to the SBLimitedLength insert overrides + lazy tail + the RecursivePayload test. If we genuinely want to cut toString allocations, let's do it as a separate ticket with JMH before/after and an explicit story for large graphs and wrapped toString().
| * @return The unique marker string. | ||
| */ | ||
| static String markNode(GridToStringNode node) { | ||
| String result = new String(GridToStringNode.class.getSimpleName()); |
There was a problem hiding this comment.
Two problems on this method:
- Using `new String(...)` identity as a map key is extremely fragile (see the wrapped-`toString` case).
- `orElseThrow()` makes `markNode` throw if the thread state wasn't initialized (a nested call on a path where `init()` didn't run). `toString()` must never throw. The original guarded every path with try/catch; this reintroduces throwing from `toString`.
There was a problem hiding this comment.
- `new String` guarantees a new object; `toString` may return a value from the String pool.
- It's an assertion that `init` has run.
There was a problem hiding this comment.
- Agreed on `new String(...)`: you do need a fresh identity because `toString()` may hand back an interned/pooled instance. That part is fine.
- But `Optional.orElseThrow()` is not a Java `assert`: it throws `NoSuchElementException` unconditionally (it's not gated by `-ea`), and that throw escapes from inside `toString()`, which is exactly the "toString must never throw" failure mode the original blanket try/catch existed to prevent. If the intent is only to document the "`init()` ran" invariant, use `assert identities().isPresent()` (or `orElseThrow(() -> new AssertionError(...))`), so it's a no-op in production instead of being able to turn a logging call into a thrown exception.
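The difference between the two forms can be checked with a small sketch. The `identities()` method here is a hypothetical stand-in that always returns an empty `Optional`, simulating a path where `init()` did not run; the fallback string is also an assumption, not something the PR defines.

```java
import java.util.NoSuchElementException;
import java.util.Optional;

/** Hypothetical stand-in; identities() simulates missing thread state. */
class InvariantCheckSketch {
    /** Simulates a path where init() did not run. */
    static Optional<String> identities() {
        return Optional.empty();
    }

    /** orElseThrow() shape (Java 10+): throws whether or not -ea is set. */
    static String renderStrict() {
        return identities().orElseThrow();
    }

    /** Documents the invariant with assert, then degrades to a placeholder. */
    static String renderLenient() {
        assert identities().isPresent() : "init() must run first";

        return identities().orElse("<uninitialized>");
    }

    /** Reports whether the strict variant lets NoSuchElementException escape. */
    static boolean strictThrows() {
        try {
            renderStrict();

            return false;
        }
        catch (NoSuchElementException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("strict throws: " + strictThrows());

        try {
            System.out.println("lenient result: " + renderLenient());
        }
        catch (AssertionError e) {
            System.out.println("lenient failed only because assertions are enabled");
        }
    }
}
```

Only the `assert` form is skipped when assertions are disabled. As far as the language rules go, `orElseThrow(() -> new AssertionError(...))` still throws in production, so it changes the exception type rather than removing the throw.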
| * A thread-local cache for nodes, used to handle references of | ||
| * inner toString() calls by mapping temporary markers to actual nodes. | ||
| */ | ||
| static final ConcurrentHashMap<Thread, IdentityHashMap<String, GridToStringNode>> CATCHED_NODES |
There was a problem hiding this comment.
static ConcurrentHashMap<Thread, ...> keyed by Thread is a classic leak: in a pooled-thread server the keys never die, and any path that skips clear() (exception in a user toString, reentrancy) accumulates entries that pin the Thread and all captured nodes/objects. The value is a per-thread map touched only by its owner, so the ConcurrentHashMap buys nothing but overhead. This should be a ThreadLocal — which is what the original used and why it didn't leak. (nit: CATCHED -> CACHED.)
There was a problem hiding this comment.
`clear` is always called in a `finally` block for the first call in the sequence.
There was a problem hiding this comment.
The finally-clear handles the leak (given perfectly balanced init/clear), but the design point stands: this map is only ever touched by its owning thread, so ConcurrentHashMap<Thread, …> is pure overhead over a ThreadLocal — and LAST_CONSTRUCTED_… / OBJECT_REGISTRY right next to it are already ThreadLocal. A ThreadLocal<IdentityHashMap<…>> is leak-free by construction and removes the per-call thread lookup. Also the CATCHED → CACHED rename (Copilot's nit) doesn't look applied yet.
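A minimal sketch of the `ThreadLocal` shape suggested here, assuming a hypothetical `withRegistry` wrapper and a depth counter so that only the outermost call clears the state. None of these names come from the PR; the value type is simplified to `Object`.

```java
import java.util.IdentityHashMap;
import java.util.function.Supplier;

/** Hypothetical per-thread registry; names and structure are illustrative. */
class ThreadLocalRegistrySketch {
    /** Per-thread marker registry, created lazily for the calling thread. */
    private static final ThreadLocal<IdentityHashMap<String, Object>> NODES =
        ThreadLocal.withInitial(IdentityHashMap::new);

    /** Nesting depth, so that only the outermost call clears the state. */
    private static final ThreadLocal<Integer> DEPTH = ThreadLocal.withInitial(() -> 0);

    /** Runs the body and removes the per-thread state when the outermost call exits. */
    static <T> T withRegistry(Supplier<T> body) {
        DEPTH.set(DEPTH.get() + 1);

        try {
            return body.get();
        }
        finally {
            int d = DEPTH.get() - 1;

            if (d == 0) {
                NODES.remove();
                DEPTH.remove();
            }
            else
                DEPTH.set(d);
        }
    }

    static void register(String marker, Object node) {
        NODES.get().put(marker, node);
    }

    /** Size of the calling thread's registry (creates an empty one if absent). */
    static int size() {
        return NODES.get().size();
    }

    public static void main(String[] args) {
        int inside = withRegistry(() -> {
            register(new String("marker"), "node");

            return size();
        });

        System.out.println("inside: " + inside + ", after: " + size());
    }
}
```

With this shape there is no shared map keyed by `Thread`, so no cross-thread lookup is needed, and the `finally` block covers the exception path of the outermost call.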
| } | ||
|
|
||
| /** {@inheritDoc} */ | ||
| @Override public GridStringBuilder i(int offset, String str) { |
There was a problem hiding this comment.
This is the correct fix for the NPE. My suggestion is to ship this (the insert overrides + lazy tail) plus the RecursivePayload test as the entire patch, drop recoverObject from it, and split the node-tree work into a separate ticket/RFC.
There was a problem hiding this comment.
I dropped recoverObject.
I don't want to artificially modify the tests by removing valid cases just to satisfy the 'Fix NPE' requirement. I'd rather view this task as 'The handleRecursion method is not working correctly'.
I pushed a minimal alternative targeted at this branch: EgorBaranovEnjoysTyping#1 The root cause is a one-method omission: The alternative overrides My reasoning for preferring a localized fix over the node-tree rewrite is in the inline review comments above (marker/recover changing the |
…I review comments
…I review comments
…I review comments
…I review comments. Rollback overflowed method, because it breaks tests
…I review comments. Method reference replaced with supplier to avoid NPE
Thanks for the round of fixes — re-reviewed at The main blocker is genuinely resolved. Cycle handling and the original NPE repro (self-ref sweep across the A few things still open:
On the bigger picture: I understand you'd rather frame this as " |
Collect toString as tree, avoid extra allocations and problems in recursion resolution
Problems solved:
- Any recursive call in a `toString` method is handled.
- Less memory is allocated.
- Memory is not retained for the whole life of the thread.