test(conformance): arm the fixtures for the input-required scenario family#2319
Conversation
|
@modelcontextprotocol/client
@modelcontextprotocol/codemod
@modelcontextprotocol/server
@modelcontextprotocol/server-legacy
@modelcontextprotocol/express
@modelcontextprotocol/fastify
@modelcontextprotocol/hono
@modelcontextprotocol/node
commit: |
There was a problem hiding this comment.
No bugs found, but this is a sizeable conformance-fixture change (new SEP-2322 tool/prompt fixtures, a custom-fetch discover shim in the client scenario, and baseline burn-down) stacked on the unmerged MRTR server-seam branch, so a human should confirm the fixture design and that the conformance legs stay green once the base lands.
Extended reasoning...
Overview
This PR is test-infrastructure only: it adds SEP-2322 (multi round-trip / input-required) diagnostic tools and a prompt to the conformance everything server, a new sep-2322-client-request-state scenario handler to the everything client, and removes the corresponding entries from both expected-failures baselines (keeping input-required-result-tampered-state with an updated rationale). No SDK source under src/ or published package code is touched.
Security risks
None to production code — the HMAC requestState minting/verification helpers and the local server/discover fetch shim live entirely in the conformance fixture. The HMAC helper is actually a worked example of the integrity-protection obligation the migration guide documents, and uses timingSafeEqual correctly with a length pre-check.
Level of scrutiny
Moderate. Although test-only, the change is large (~450 added lines), depends on the not-yet-merged MRTR server-seam branch for the inputRequired() / acceptedContent / ServerContext APIs it imports, and encodes design judgments a maintainer may want to weigh in on: which scenarios stay baselined (tampered-state because tool throws surface as isError results), the ctx as ServerContext cast in the no-args prompt registration, and especially the client scenario answering server/discover locally through a custom fetch because the conformance mock implements neither negotiation path — that shim is a reasonable workaround but is the kind of fixture trick a maintainer should bless. The baselines are self-policing (the runner fails on stale entries and unexpected failures), so CI on the merged stack will catch any mismatch between the registered fixture tool names and what the pinned conformance release expects.
Other factors
The bug-hunting pass found no issues, and the author reports all five conformance legs exiting 0 against the pinned conformance release with zero stale entries. The changeset bot warning is fine since this is a test package only. Given the size, the stacked base, and the draft-spec (SEP-2322) subject matter, this falls outside the simple/mechanical bar for shadow approval, but there is nothing blocking from the automated review side.
cfa23e9 to
58297e2
Compare
e62b9cc to
e240f12
Compare
317ad59 to
893242c
Compare
e240f12 to
dccb610
Compare
dccb610 to
6c96dd6
Compare
…cenarios Register the test_input_required_result_* diagnostic tools and the input-required prompt in the everything server, written write-once style with the inputRequired() builder and completing from ctx.mcpReq.inputResponses on the retried request. Tools that mint requestState integrity-protect it with a fixture-local HMAC helper and verify it on every retry, as the worked example of the consumer obligation documented in the migration guide. Add the SEP-2322 client scenario to the everything client: the SDK client negotiates the 2026-07-28 era and its auto-fulfilment driver answers the mock's embedded elicitation requests, echoes requestState byte-exact, and retries on fresh request ids. The scenario's mock answers neither server/discover nor initialize, so the fixture supplies the connect-time discover response through the transport's custom fetch; everything the scenario measures runs against the real mock.
… now pass Remove the scenarios the armed fixtures make pass from both expected-failures files: the input-required-result server family (except tampered-state) from the draft and forced-2026 server legs, and sep-2322-client-request-state from both client legs. The tampered-state entry stays with an updated reason: the fixture detects tampering, but a rejection thrown from a tool callback surfaces as an isError tool result rather than the JSON-RPC error the scenario requires.
…red-state now passes Moves the fixture-local HMAC verification of requestState into the new ServerOptions.requestState.verify hook so a tampered retry answers the wire-level -32602 the scenario requires (instead of an isError tool result from the tools/call funnel). Removes the input-required-result-tampered-state entry from both expected-failures baselines.
6c96dd6 to
72cab85
Compare
This arms the conformance fixtures for the multi round-trip request scenarios (SEP-2322,
protocol revision 2026-07-28) now that the SDK has the
inputRequired()server API and theclient auto-fulfilment engine, and burns the corresponding entries out of the expected-failures
baselines.
Motivation and Context
The conformance suite's
input-required-result-*server family and the SEP-2322 clientscenario were baselined as expected failures because the fixtures predate the multi round-trip
feature: the everything server did not register the diagnostic
test_input_required_result_*tools and the everything client never reached the auto-fulfilment driver. With the feature in
place, the fixtures can exercise it and the baselines can shrink.
How Has This Been Tested?
@modelcontextprotocol/conformancerelease, and exit 0 with zero stale baseline entries:the 2025 server leg stays 42/0, the draft server leg goes 39→58 passed checks (14→3
expected-failure entries), the forced-2026 server leg 63→82 (15→4), the client
allleg324→329 (15→14), and the forced-2026 client leg 207→212 (25→24).
with exactly the newly passing scenarios reported as stale, which is the gating mechanism
this change relies on.
Breaking Changes
None — test fixtures and expected-failure baselines only.
Types of changes
Checklist
Additional context
inputRequired()and completes from
ctx.mcpReq.inputResponseson the retry. Tools that mintrequestStateintegrity-protect it with a small fixture-local HMAC helper and verify it on every retry —
the worked example of the consumer obligation the migration guide documents.
input-required-result-tampered-statestays in the baseline: the fixture detects tamperingand rejects, but a rejection thrown from a tool callback surfaces as an
isErrortool result(the tools/call funnel wraps handler throws), not the JSON-RPC error the scenario requires.
It unblocks when the SDK can reject invalid
requestStateat the protocol layer (the plannedopt-in sealing helper) or lets this rejection escape as a protocol error.
server/discovernorinitialize, so the everything client supplies the connect-time discover response throughthe transport's custom
fetchand lets every other request hit the real mock; everything thescenario measures (auto-fulfilment through the registered elicitation handler, byte-exact
requestStateecho, fresh request ids, isolation of unrelated calls, no retry on completeresults) is the SDK driver's behavior.
missing-input-responseandignore-extra-paramsentries areremoved too: the armed tool re-requests missing keys and ignores unrecognized ones, so those
scenarios now pass cleanly.
(checklist note: "Documentation update" is the closest box for a tests/baselines-only change —
swap to "New feature" if the reviewer prefers classifying fixture arming that way.)