[Cosmos] Implement azure_data_cosmos_driver_native Crate to Support Async FFI Invocation#4515

Draft
kundadebdatta wants to merge 25 commits into main from users/kundadebdatta/4372_cosmos_driver_native_crate_async_impl

Conversation

@kundadebdatta
Member

  • Implement azure_data_cosmos_driver_native Crate to Support Async FFI Invocation

kundadebdatta and others added 12 commits May 22, 2026 15:16
Adds docs/NATIVE_WRAPPER_SPEC.md to the azure_data_cosmos_driver crate. This is the design spec for a new azure_data_cosmos_driver_native crate that will expose the driver's transport, routing, retry, partition-key, operation, response, and diagnostics primitives through a stable C ABI so non-Rust SDKs (Java, .NET, Python, ...) and C/C++ apps can reuse the driver's machinery. Shipping the spec first so the design can be reviewed independently before implementation lands in a follow-up PR. The spec covers: motivation and scope, crate layout, naming conventions, FFI plumbing (CallContext, RuntimeContext, byte-buffer marshalling, error model), the full handle surface (runtime, account, driver, references, partition keys, operations, response, diagnostics, options builders), build and distribution (cdylib + staticlib, cbindgen, CMake + corrosion), error semantics that intentionally diverge from the prior typed-SDK wrapper (non-success HTTP statuses are surfaced via cosmos_response_status_code, not mapped to error codes), versioning and ABI rules, a 10-phase implementation plan, open design questions, and a migration table for consumers of the deleted azure_data_cosmos_native crate.
…PER_SPEC

Blocking fixes:
- §6 / §4.7: bind cosmos_driver_execute to execute_singleton_operation,
  document Result<Option<CosmosResponse>> handling and FEED_EXHAUSTED.
- §4.6: remove cosmos_operation_with_partition_key; PK now lives on item /
  feed factory args (ItemReference::from_name requires PK at construction).
- §4.4: add normative driver-cache documentation (endpoint-only key,
  options dropped on cache hit, credential collision caveat) plus a new
  COSMOS_ERROR_CODE_OPTIONS_IGNORED_ON_CACHE_HIT advisory.
- §3.5 / §6: rewrite error model for azure_data_cosmos::Error from #4442
  (Kind enum, typed accessors, predicates, synthetic sub-status codes,
  non_exhaustive future-proofing via COSMOS_ERROR_KIND_UNKNOWN).
- §4.2: fix DriverOptions surface to mirror the actual 3-field type;
  builder takes account, with_operation_options for per-call defaults;
  removed allow_emulator_invalid_certs (lives on runtime instead).
- §5.2: fix Cargo features (default = tokio + rustls; drop fake
  reqwest_native_tls / tracing feature; native_tls is the correct name).
- §4.6.3: normative execute-consumption contract (sentinel, free always
  safe, failed execute does not consume, double-execute returns 4005).

Recommended fixes:
- §4.6.2: split with_precondition into IF_MATCH / IF_NONE_MATCH with a
  PRECONDITION_ALREADY_SET error to enforce single-precondition rule.
- §4.3: document resource-token routing via master-key Secret path; add
  with_credential mirror.
- §4.6.1: add missing factories — read_all_items_cross_partition,
  query_items, batch, query/read/replace_offer.
- §4.3 / §4.5 / §3.4: define cosmos_*_clone functions promised in §3.4.
- §5.3: Phase 0 ancillary-tooling re-introduction checklist for entries
  PR #4103 removed (cbindgen as [build-dependencies] only per heaths,
  dict files, .cspell.json, verify-dependencies.rs, AGENTS.md, skills).
- §3 / §5.1: tighten inheritance attribution between #2906 and #3347.
- §7: loosen strict ABI version equality to major-equal / minor->= so the
  spec's additive-growth promise actually holds.
- §2.2: add cbindgen export.rename / item_types policy so the generated
  header matches the naming table (avoids cosmos_cosmos_* double-prefix
  and prevents driver-internal types from leaking).

Other:
- §9: replaced single open question on RuntimeAlreadyInitialized with 9
  questions covering implementation parking-lot items (header-visitor
  borrow vs copy, continuation-token format, multi-part body, C++
  companion header, credential identity in cache, ConnectionString
  parser ownership, symbol stripping, pager continuation-token resume).
- §8: updated Phase 0 / 1 / 2 / 3 / 5 / 6 acceptance criteria to match
  the new contracts (header check-in, error accessor coverage, cache-hit
  advisory test, error-mapped 404 surfaces as is_not_found).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
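
The relaxed §7 rule above (major must be equal, the library's minor must be at least the minor the caller was built against) can be sketched as follows. `AbiVersion` and `is_compatible` are hypothetical names used only for illustration; the spec's actual version representation is not shown in this commit message.

```rust
/// Hypothetical ABI version pair (illustrative; not a type from the spec).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct AbiVersion {
    pub major: u32,
    pub minor: u32,
}

/// Returns true when a library exposing `library` can serve a caller that was
/// compiled against `expected`: the majors must match exactly, and the
/// library's minor must be greater than or equal to the expected minor.
pub fn is_compatible(library: AbiVersion, expected: AbiVersion) -> bool {
    library.major == expected.major && library.minor >= expected.minor
}
```

Under this rule a newer minor release of the library still loads for older callers, which is what makes additive growth possible; strict equality would reject it.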
Second-round PR Deep Reviewer findings on PR #4461 (post-1672bd8). Addresses
5 blocking and 5 recommended items; nits / one open question deferred.

F1  Predicate forwarding + cosmos_error_is_service_error (#4442 alignment).
    §3.5.2: documented that cosmos_error_is_* predicates forward to
    CosmosStatus::is_*; added is_service_error. §6.4: split backtrace
    rate knob into separate captures/resolutions per #4442 surface.

F2  query_items + query_plan factory shape. §4.6.1: query_items now takes
    cosmos_feed_range_t* (matches driver's Option<FeedRange>); added a
    separate cosmos_operation_query_plan for the SQL-string path.

F3  Factory signature template. §4.6.1: every CosmosOperation factory
    converted to "cosmos_error_code_t fn(... cosmos_operation_t **out_op)"
    matching the §3.2 ABI shape; removes the bare-pointer divergence.

F4  Removed cosmos_response_iter_headers visitor. §4.7: typed accessors
    only; documented unknown-header drop; §9 Q2 reframed as a
    forward-compat passthrough question.

F5  Opacity sweep. §3.1 cosmos_call_context_t opaque + accessors
    (_create/_free/_runtime/_set_include_error_details/_include_error_
    details). §3.3 cosmos_bytes_t opaque (_data/_len/_free); kept
    cosmos_bytes_view_t for by-value input. §3.4 ownership table updated.
    §4.7 cosmos_response_into_body now writes cosmos_bytes_t **out_body.
    §4.8 cosmos_diagnostics_to_json signature corrected to **out_json.

F6  Landing-prereq callouts. §3.5 / §4.2: explicitly cite #4442 (errors)
    and #4452 (Tokio thread-name prefix) as prerequisites; #4452-only
    surface reworded as "landed in".

F7  Phase 5 operation-options enumeration. Removed max_item_count from
    the OperationOptions list (it's on CosmosOperation::with_max_item_count
    per §4.6.2); grouped all 17 OperationOptions fields by category and
    documented optional v1 subset path. §4.2 prose updated accordingly.

F8  Partition-key value variants. §4.5: renamed append_none ->
    append_undefined to match Cosmos JSON semantics; added
    append_infinity; corrected source line ref to :303.

F9  Cache-advisory warning class. §3.5.1: created a 5001..=5999 warning
    band (non-SUCCESS, populates out_*); moved
    OPTIONS_IGNORED_ON_CACHE_HIT from 4001 -> 5001; reserved 4001.
    §4.4.1: cache-hit advisory now returns 5001 (not SUCCESS) and is
    no longer predicated on single-runtime mode; documented the
    runtime.rs:380-390 lost-race redundant-init path. §9 Q1 reworded.

F10 §5.3 ancillary-tooling checklist expanded. Added P0 workspace
    `members` entry, deny.toml MPL-2.0, sibling azure_data_cosmos
    README/lib.rs/ARCHITECTURE.md cross-link restoration, cspell
    regression-diff note, and deleted-file disposition for
    azurecosmos.pc.in / cmake/DiscoverTests.cmake / next_generation_
    sdks_design_principles.md; added "Lessons from #4090 / #4103"
    preamble.

Also: §4.6.4 NEW (minimal cosmos_feed_range_t surface); §10 migration
table updated to show the new out_op factory pattern.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
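
As a rough illustration of the F9 warning band, the sketch below classifies a numeric return code and reports whether the out-parameters can be read. Only the 5001..=5999 warning range and the 5001 value come from the text above. The `CodeClass` enum, the function names, SUCCESS being 0, and the 4000..=4999 hard-error range are assumptions made for the example.

```rust
/// Hypothetical classification of a wrapper return code.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CodeClass {
    Success,
    Error,
    Warning,
    Unknown,
}

/// Advisory code moved from 4001 to 5001 in F9.
pub const OPTIONS_IGNORED_ON_CACHE_HIT: u32 = 5001;

/// Maps a raw code to its band.
/// Assumptions: SUCCESS is 0 and hard errors occupy 4000..=4999.
pub fn classify(code: u32) -> CodeClass {
    match code {
        0 => CodeClass::Success,
        4000..=4999 => CodeClass::Error,
        5001..=5999 => CodeClass::Warning,
        _ => CodeClass::Unknown,
    }
}

/// Warnings are non-SUCCESS but still populate out_*; hard errors do not.
pub fn outputs_populated(code: u32) -> bool {
    matches!(classify(code), CodeClass::Success | CodeClass::Warning)
}
```

A host SDK following this shape would treat 5001 as "use the returned handle, but log that the supplied options were ignored" rather than as a failure.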
Three driver-surface alignment fixes surfaced by the PR Deep Reviewer (review #3) — each one was promising a C-ABI shape the underlying azure_data_cosmos_driver cannot deliver as currently exposed.

B1 (s4.5 partition keys). The driver's InnerPartitionKeyValue::Infinity variant is pub(crate) and explicitly documented as 'used only internally for EPK boundary calculations'. There is no public constructor for it. Dropped cosmos_partition_key_builder_append_infinity from the v1 surface and rewrote the s4.5 preamble to clarify that the wrapper exposes the five public variants only. Added Q10 to s9 to track the decision between (a) promoting Infinity to pub in the driver and (b) routing all EPK-boundary use cases through cosmos_feed_range_* permanently.

B2 (s4.6.1 query_plan). The driver's CosmosOperation::query_plan(container, supported_query_features) is a metadata fetch keyed by the supported-query-features mask string, NOT a SQL-execution entry point. The earlier spec text described it as 'build a SQL query as a standalone query-plan operation' and paired it with a fictional cosmos_operation_with_query_parameter mutator that has no driver counterpart. Renamed the wrapper symbol to cosmos_operation_query_plan_for_features, rewrote its doc to explain it's a feature-mask fetch rather than a SQL submitter, and removed _with_query_parameter entirely. The actual SQL path is documented as cosmos_operation_query_items(c, feed_range, &op) + cosmos_operation_with_body(op, json_body), matching the schema-agnostic G2 contract. Updated the Phase 5 factory list in s8 to reflect the rename.

B3 (s4.6.4 FeedRange). The earlier surface promised four constructors, two of which had no driver-side counterpart and two of which had the wrong shape. Replaced with the two driver-public constructors that exist today (FeedRange::full() and FeedRange::for_partition(pk, &PartitionKeyDefinition)), with the for_partition wrapper taking a cosmos_container_ref_t so it can pull the partition-key definition off the container internally. Documented the dropped constructors as deferred (Q11 in s9): cosmos_feed_range_for_epk_range is blocked on a driver-side string-parseable EPK type, and cosmos_feed_range_for_partition_key_range is blocked on a non-existent FeedRangeRepr::PartitionKeyRangeId variant. Also fixed the s3.4 ownership-table reference (_for_* placeholder updated to the two actual constructor names).

No other sections changed. The spec is now line-grounded against the current azure_data_cosmos_driver/src/models/{partition_key, feed_range, cosmos_operation}.rs surface; implementers can match every promised cosmos_* function to a real driver entry point.
Critical fixes:

- Rewrite cancellation (sec 3.6.3) to honestly describe wrapper-side tokio::select! + future-drop, with explicit caveats and pointer to sec 9 Q13 for a future driver-side CancellationToken overload.

- Replace single-shot cosmos_runtime_create with cosmos_runtime_builder_* family that handles async CosmosDriverRuntimeBuilder::build() correctly (sec 4.1).

- Document cosmos_pager_t as wrapper-owned state over (OperationPlan, execute_plan) with strict-sequential _next_submit semantics (sec 4.7).

- Remove undeliverable cosmos_operation_handle_diagnostics_snapshot in favor of a cosmos_operation_handle_state poller; defer mid-flight diagnostics to a driver-side refactor tracked in sec 9 Q16.

Recommended fixes:

- Fix .NET and Java handle leaks in sec 3.1 examples.

- Add max_capacity hard-cap, cosmos_cq_wait_writable, cosmos_cq_wait_batch, cosmos_completion_was_cancel_requested.

- Clarify cosmos_completion_op_handle ownership and cosmos_completion_status derivation when include_error_details=false.

- Add error codes 4013 QUEUE_FULL, 4014 INVALID_OPTION_VALUE, 4015 RUNTIME_BUILD_FAILED.

- Add value-handle immutability + runtime-cardinality advisories.

- Add sec 9 Q13-Q16 and rethread cross-refs through sec 3.4, 4.4.1, 8 rollout phases, and 10 migration table.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Targets analogrelay and FabianMeiswinkel review feedback on PR #4461:

* Rewrite the §3.5 motivation that incorrectly said host SDKs implement retry/throttle/conditional-write recovery. Host SDKs do classification and diagnosability; retry is owned by the driver pipeline.

* Document the §3.3 cosmos_bytes_view_t lifetime explicitly for async submits, and spell out in §4.6.2 that cosmos_operation_with_body copies the bytes (so callers may release source memory immediately after the synchronous call returns).

* Add a normative §3.5.2 paragraph requiring synthetic substatus codes to be emitted as named C constants in azurecosmosdriver.h via a wrapper-side substatus_constants.rs module mirroring the driver's SubStatusCode 1:1.

* Add §9 Q17: driver-owned buffer pool for request bodies (analogrelay #12 / FabianMeiswinkel #16 — zero-copy alternative to the §3.3 copy contract).

* Add §9 Q18: driver→SDK logging callback (FabianMeiswinkel #17 — hosting driver logs in the SDK's ILogger / SLF4J / slog).

* Add §9 Q19: cosmos_operation_options_t builder vs. flat C struct (analogrelay #11 / FabianMeiswinkel #15 — reduce per-call FFI chattiness, host-side merge of layered options).

* Add §9 Q20: handle-table for value-type references (analogrelay #5/#6 — integer-id handles with generation counter to catch double-free and use-after-free without UB).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…l architecture doc

Spec corrections after merging main (PR #4442 now landed):

- Drop nonexistent Kind enum and COSMOS_ERROR_KIND_* taxonomy from sections 3.5.2, 3.6.1, 4.7, 6.x. The merged azure_data_cosmos_driver::error::CosmosError is monomorphic; failure-class taxonomy is encoded via (status_code, sub_status) on CosmosStatus.
- Align cosmos_error_t accessor surface 1:1 with merged CosmosError API: add cosmos_error_is_from_wire and cosmos_error_response; document header/body accessors as walks through err.response(); rewrite predicate list to match the actually-implemented CosmosStatus::is_* methods.
- Replace fictional sub-status names (TRANSPORT_REQUEST_TIMEOUT, CLOSED_CLIENT, etc.) with the 9 names that actually exist on SubStatusCode; switch codegen sample to u16 sourced from SubStatusCode::FOO.value() so the wrapper auto-tracks driver updates.
- Rewrite section 6.3 mapping table from Kind-based to (is_from_wire, sub_status, status_code)-based routing.
- Replace per-runtime / per-driver backtrace setters with a single process-global cosmos_set_backtrace_options reflecting error::set_backtrace_options. Update Phase 2/3 bullets accordingly.
- Misc clean-ups: section 2.2 cbindgen example, section 4.7 doc comment, section 8 Phase 1/6 done-when criteria.

Visual architecture doc:
- New ASYNC_INVOCATION_ARCHITECTURE.md with 5 Mermaid diagrams covering component layout, submission/completion sequence, two-handle ownership, cancellation race, queue lifecycle, plus per-language pinning matrix and threading-rules summary.
- Cross-link added from spec section 3.1.
…grams

Two Mermaid sequence diagrams broke at render time on punctuation the parser cannot tokenize inside Note text and participant aliases:

- Section 2 submission/completion lifecycle: 'Note over App: App now `await`s' contained backticks. Backticks inside a Mermaid Note terminate the surrounding markdown code block during GitHub's pre-pass, leaving the rest of the diagram dangling and producing 'Expecting SOLID_OPEN_ARROW ... got NEWLINE' on the next line. Replaced with plain 'awaits'.
- Section 4 cancellation: 'participant Sel as tokio::select! { fut, cancel }' used '::', '!', '{', '}', ',' in a participant alias; those characters confuse the Mermaid alias parser. Renamed alias to 'Tokio select (future vs cancel)'. Also replaced curly-brace / pipe Completion payloads inside arrow text with plain prose and dropped double quotes from the closing Note (quotes around phrases inside Note text are similarly risky).

All 5 diagrams in the file (component flowchart, lifecycle sequence, ownership flowchart, cancellation sequence, queue state diagram) now render cleanly when validated through Mermaid.
Bootstraps the C ABI wrapper crate for azure_data_cosmos_driver per the NATIVE_WRAPPER_SPEC.md Phase 0 acceptance criteria:

- New crate sdk/cosmos/azure_data_cosmos_driver_native: cdylib + staticlib named azurecosmosdriver (libazurecosmosdriver.{so,dylib,dll}).
- build.rs: emits BUILD_IDENTIFIER and runs cbindgen with the export.rename / item_types policy from spec section 2.2 (rename targets are unprefixed; export.prefix = cosmos_ produces the spec-mandated cosmos_*_t names).
- src/lib.rs + src/string.rs + src/bytes.rs: c_str! macro, cosmos_version(), cosmos_string_free, cosmos_bytes_t opaque handle + cosmos_bytes_data/_len/_free.
- include/azurecosmosdriver.h: checked in per spec section 5.1 header-check-in policy.
- CMakeLists.txt + cmake/DiscoverTests.cmake + azurecosmosdriver.pc.in: corrosion-based build harness mirroring the deleted azure_data_cosmos_native crate.
- c_tests/test_common.h + c_tests/version.c: minimal Phase 0 test harness (suite registration, discovery, version round-trip, null-safe free assertions).

Repo plumbing:
- Cargo.toml: workspace member entry.
- deny.toml: MPL-2.0 license allowance (cbindgen).
- eng/dict/crates.txt + eng/dict/rust-custom.txt: dictionary entries.
- sdk/cosmos/.cspell.json: azurecosmosdriver token.

Validation: cargo build / fmt / clippy / doc all clean; generated header matches the cosmos_*_t naming policy without cosmos_cosmos_* double-prefixing.
kundadebdatta and others added 8 commits June 2, 2026 12:48
…ocation primitives)

Lands the C ABI surface for the spec sections 3.5 (cosmos_error_*), 3.6 (cosmos_completion_*, cosmos_operation_handle_*), 3.1.2 / 3.1.3 (cosmos_cq_* / runtime), and 6.4 (cosmos_set_backtrace_options).

Modules:
- src/error.rs: cosmos_error_code_t enum (40xx + 50xx codes incl. QueueShutdown / OperationCancelled / QueueFull / InvalidOptionValue / RuntimeBuildFailed / OptionsIgnoredOnCacheHit), cosmos_error_t opaque storage-punned handle (no cosmos_Arc_* leakage), all accessors + 16 predicates forwarded to CosmosStatus::is_*, cosmos_set_backtrace_options.
- src/runtime.rs: cosmos_runtime_t opaque storage-punned handle wrapping a Tokio multi-thread runtime + Arc<RuntimeContextInner>; cosmos_runtime_free; test-only constructor __test_only_create_default_runtime so Phase 1 can exercise the queue end-to-end before Phase 2 wires the public builder.
- src/completion.rs: cosmos_cq_t / cosmos_completion_t / cosmos_operation_handle_t opaque types + their full FFI surface (cq create/free/wait/try_wait/wait_batch/wait_writable/shutdown/state, completion outcome/status/user_data/op_handle/was_cancel_requested/take_response/response/take_error/error/free, op handle cancel/state/free). Two test-only synthesizers (__test_only_create_operation_handle + __test_only_enqueue_completion) drive Phase 1 integration tests.

cbindgen config:
- Enum types renamed to *_t with the variant prefix baked into the Rust variant names (e.g. CosmosCqState::CosmosCqStateRunning -> COSMOS_CQ_STATE_RUNNING) so the C header produces the spec-mandated form without _T_ infixes.
- All handle types use a Box<*Storage> pun so the generated header is Arc-free.

Validation: cargo build / fmt / clippy / doc all clean on the wrapper crate; 23/23 Rust tests pass (null-safety for every cosmos_error_*, synthetic round-trip carrying user_data + outcome, cancel-vs-completion race, shutdown/drained transitions, take_error ownership transfer, include_error_details=false drops the rich error, max_capacity rejection, wait_writable timeout, wait_batch drains multiple, shutdown wakes a blocked waiter). The header check-in mirrors the new surface and the existing rename + item_types policy holds (zero double-prefix; the only function name colliding with a typedef is cosmos_cq_state which differs from cosmos_cq_state_t by suffix).
Captures the current state of the azure_data_cosmos_driver_native rollout (Phase 0 + Phase 1 done; Phase 2 next), the decisions baked in so far (cbindgen variant naming, opaque storage-pun pattern, test-only helpers, error-code band layout, process-global backtrace knobs, cargo features), the Windows / CMake / PowerShell gotchas, the resume checklist, and the relevant open questions tracked in the spec.

Located under sdk/cosmos/copilot_memory.md so it sits next to the spec + the diagram doc and survives any future crate move.
Implements spec NATIVE_WRAPPER_SPEC.md section 4.1 + section 8 Phase 2:

- New module src/runtime_builder.rs exposes the opaque
  cosmos_runtime_builder_t with the merged driver's actually-existing
  primitive setters: workload_id (1-50), correlation_id, user_agent_suffix,
  wrapping_sdk_identifier, cpu_refresh_interval_ms (1000-60000). The
  spec-listed worker_threads / thread_name_prefix /
  allow_emulator_invalid_certs do not exist in the merged
  CosmosDriverRuntimeBuilder; they are intentionally omitted.

- cosmos_runtime_builder_build consumes the builder and bridges the
  driver's async build() via the wrapper's own multi-threaded Tokio
  runtime. On failure surfaces RUNTIME_BUILD_FAILED (4015) plus the rich
  CosmosError via out_error. Tokio-init failures are mapped to the
  driver's TRANSPORT_IO_FAILED status so they share the same accessor
  surface as a real driver-side failure.

- RuntimeContextInner grows an Arc<CosmosDriverRuntime> alongside the
  Tokio runtime. The driver Arc is consumed by Phase 3 (driver / account
  refs) via Arc::clone into per-account handles.

- Complex nested setters (ClientOptions, ConnectionPoolOptions,
  OperationOptions, ThroughputControlGroupOptions, fault_injection_rules)
  are explicit Phase 2+ follow-ups -- each requires its own FFI builder
  surface that would dwarf this commit if folded in here.

- Cargo.toml grows the rustls feature on the driver dep so
  DefaultHttpClientFactory::new() can actually construct an HTTP client
  during build() (the wrapper previously only opted into tokio, which
  left reqwest unbuilt).

- 12 new Rust tests (35 total -- 23 from Phase 1 + 12 new) cover
  lifecycle, NULL safety, every setter's validation surface, build
  happy path, build NULL-argument rejection, and end-to-end
  runtime -> cq_create handoff. C harness c_tests/runtime_lifecycle.c
  mirrors the lifecycle / setter / build coverage so CI exercises the
  same paths through the actual C surface.

Validation: cargo build / fmt / clippy clean (zero wrapper-side
warnings; the 3 pre-existing driver-side warnings documented in
copilot_memory.md section 5.5 remain). cargo doc emits 6 'private item
link' warnings consistent with the Phase 1 pattern.
…iver lifecycle)

Implements spec NATIVE_WRAPPER_SPEC.md section 4.3 + 4.4 + section 8
Phase 3 (minimal scope):

- src/account_ref.rs exposes opaque cosmos_account_ref_t with the
  master-key constructor plus _clone and _free. Token-credential and
  resource-token constructors are deferred (Arc<dyn TokenCredential> is
  an async trait; bridging arbitrary C-side async credentials through
  FFI is non-trivial and warrants its own follow-up).

- src/database_ref.rs exposes opaque cosmos_database_ref_t with the
  name-based _create plus _clone and _free. The RID-based path
  (DatabaseReference::from_rid) is mechanically identical but deferred
  until Phase 6 surfaces resolved RIDs through responses.

- src/driver_options.rs ships cosmos_driver_options_builder_t with
  _new(account) + _with_preferred_regions + _build + _free, plus the
  built cosmos_driver_options_t with _free. The with_operation_options
  setter is deferred to Phase 5 (depends on cosmos_operation_options_*).

- src/driver.rs ships cosmos_driver_t with the SYNCHRONOUS convenience
  entry cosmos_driver_get_or_create_blocking (bridges the driver's
  async get_or_create_driver via the wrapper Tokio runtime's block_on)
  plus _free. The async _submit, _initialize_submit, and the
  cosmos_response_take_driver accessor are deferred to Phase 6 where
  the generic tokio::spawn -> cq_enqueue submit pipeline lands once
  for all operations.

- Cache-hit advisory (5001 OPTIONS_IGNORED_ON_CACHE_HIT, spec 4.4.1)
  is NOT emitted in Phase 3. The merged CosmosDriverRuntime::
  get_or_create_driver API does not surface a was_cached signal;
  detecting cache hits cleanly requires a driver-side enhancement
  rather than wrapper-side hackery. Documented in driver.rs module
  docs and tracked as Phase 3+ deferral in copilot_memory.md.

- Container references (cosmos_container_ref_*) are NOT in Phase 3 --
  ContainerReference::new is pub(crate)-only in the driver and demands
  RID + partition-key definition obtainable only through async
  CosmosDriver::resolve_container. Lands in Phase 6 alongside the
  resolved-container response surface.

- Cargo.toml adds azure_core (for Secret) and url (for Url::parse) as
  runtime deps; both are already in the driver's closure so no extra
  build cost.

- 20 new Rust tests (55 total: 35 prior + 20 new; 1 #[ignore]'d
  network test for manual exercise of the failure path against an
  invalid endpoint). C harness c_tests/account_and_driver_options.c
  covers lifecycle, NULL safety, validation paths, clone round-trips,
  and builder happy paths.

Validation: cargo build / fmt / clippy clean (zero wrapper-side
warnings; the 3 pre-existing driver-side warnings documented in
copilot_memory.md section 5.5 remain). cargo doc emits 9
private-item-link warnings consistent with the Phase 1 / Phase 2
pattern.
…lder)

Implements spec NATIVE_WRAPPER_SPEC.md section 4.5 + section 8 Phase 4:

- src/partition_key.rs ships opaque cosmos_partition_key_builder_t
  with _new, _add_string, _add_number, _add_bool, _add_null,
  _add_undefined, _build, _free. The incremental shape mirrors what
  cross-language SDKs need (C can't construct Rust tuples for the
  driver's From<(T1, T2, ...)> impls). Components are accumulated
  into a Vec<PartitionKeyValue> and converted via the driver's
  From<Vec<...>> at _build time.

- Opaque cosmos_partition_key_t exposes _empty (cross-partition
  sentinel), _clone, _free, _component_count, _is_empty. The empty
  variant is reachable ONLY via _empty -- _build rejects empty
  builders with INVALID_PARTITION_KEY (4004) so callers can't
  accidentally fan out cross-partition through misuse.

- Pre-flight validation prevents driver-side panics:
  * 4th-component append on any setter -> INVALID_OPTION_VALUE (4014)
    (driver's From<Vec<...>> asserts len <= 3 -- we'd abort otherwise).
  * _add_number(NaN / +Inf / -Inf) -> INVALID_OPTION_VALUE (4014)
    (driver's From<f64> routes through FiniteF64::new_strict which
    panics on non-finite -- we reject up-front for a clean error path).

- 15 new Rust tests (70 total: 55 prior + 15 new). Wire-equality
  asserted against driver-direct construction (e.g. our (string,
  number, bool) build matches DriverPartitionKey::from((s, n, b))).
  The done-when criterion's PartitionKeyHashBaselineTest.*.xml
  round-trip is documented as a Phase 4+ follow-up in
  copilot_memory.md; the wire-equality tests satisfy the spirit of
  the contract.

- C harness c_tests/partition_key.c covers every value kind,
  hierarchical and single keys, the 4th-append rejection on all five
  setters, non-finite rejection, NULL safety, and the
  _empty / _clone / _component_count / _is_empty accessors.

Validation: cargo build / fmt / clippy clean (zero wrapper-side
warnings; the 3 pre-existing driver-side warnings documented in
copilot_memory.md section 5.5 remain). cargo doc emits 9
private-item-link warnings consistent with the Phase 1-3 pattern.
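
The pre-flight validation described in this commit can be sketched in safe Rust as below. This is a minimal model, not the crate's code: `Component` and `PartitionKeyBuilder` are hypothetical stand-ins for the driver's PartitionKeyValue and the opaque cosmos_partition_key_builder_t, and the FFI layer (raw pointers, out-parameters) is left out. The numeric codes 4004 and 4014 are taken from the text above.

```rust
pub const INVALID_PARTITION_KEY: u32 = 4004;
pub const INVALID_OPTION_VALUE: u32 = 4014;

/// The commit message states the driver asserts len <= 3.
const MAX_COMPONENTS: usize = 3;

/// Hypothetical stand-in for the five public partition-key value kinds.
#[derive(Debug, Clone, PartialEq)]
pub enum Component {
    String(String),
    Number(f64),
    Bool(bool),
    Null,
    Undefined,
}

#[derive(Debug, Default)]
pub struct PartitionKeyBuilder {
    components: Vec<Component>,
}

impl PartitionKeyBuilder {
    pub fn new() -> Self {
        Self::default()
    }

    /// Shared append path: reject a 4th component before it can reach
    /// any code that would assert on the length.
    fn push(&mut self, c: Component) -> Result<(), u32> {
        if self.components.len() >= MAX_COMPONENTS {
            return Err(INVALID_OPTION_VALUE);
        }
        self.components.push(c);
        Ok(())
    }

    pub fn add_string(&mut self, s: &str) -> Result<(), u32> {
        self.push(Component::String(s.to_owned()))
    }

    /// NaN and +/-Inf are rejected up front instead of panicking later.
    pub fn add_number(&mut self, n: f64) -> Result<(), u32> {
        if !n.is_finite() {
            return Err(INVALID_OPTION_VALUE);
        }
        self.push(Component::Number(n))
    }

    pub fn add_bool(&mut self, b: bool) -> Result<(), u32> {
        self.push(Component::Bool(b))
    }

    pub fn add_null(&mut self) -> Result<(), u32> {
        self.push(Component::Null)
    }

    pub fn add_undefined(&mut self) -> Result<(), u32> {
        self.push(Component::Undefined)
    }

    /// Empty builders are rejected; the empty key has its own constructor.
    pub fn build(self) -> Result<Vec<Component>, u32> {
        if self.components.is_empty() {
            return Err(INVALID_PARTITION_KEY);
        }
        Ok(self.components)
    }
}
```

Routing every setter through one `push` helper keeps the 4th-component check in a single place, so all five setters reject identically.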
…, factories, mutators)

Implements spec NATIVE_WRAPPER_SPEC.md section 4.6 + section 8 Phase 5
(minus the container/item factories which need cosmos_container_ref_*
-- those land in Phase 6 alongside the resolve-container response
surface).

- src/operation_options.rs ships opaque cosmos_operation_options_builder_t
  with _new / _free / _build plus paired _with_<field> / _clear_<field>
  setters for every one of the driver's 16 OperationOptions fields:
  read_consistency_strategy, content_response_on_write,
  excluded_regions, throughput_control_group,
  end_to_end_latency_policy (ms input, driver clamps below 1s),
  endpoint_unavailability_ttl, session_capturing_disabled,
  max_failover_retry_count, max_session_retry_count, the 7 PPCB knobs
  (circuit_breaker_failure_count_for_reads / _writes,
  circuit_breaker_timeout_counter_reset_window_in_minutes,
  allowed_partition_unavailability_duration_in_seconds,
  ppcb_stale_partition_unavailability_refresh_interval_in_seconds,
  per_partition_circuit_breaker_enabled), and custom_headers
  (incremental set + clear). Two new C enums
  (cosmos_read_consistency_t, cosmos_content_response_on_write_t)
  expose the typed value types; everything else uses primitive ints,
  bools, durations, strings, or string arrays.

  Custom headers are accumulated on a side-channel HashMap (the
  auto-generated OperationOptionsBuilder is not Clone, so
  read-modify-write through it is not possible). Validation rejects
  empty strings and control characters with INVALID_HEADER (4010) so
  callers see deterministic errors instead of wire-side rejections.

- src/operation.rs ships opaque cosmos_operation_t backed by
  Box<OperationInner { op: Option<CosmosOperation> }>. The Option
  matches spec section 4.6.3: Phase 6's submit will take() the inner op
  into the driver pipeline, leaving a consumed-sentinel handle behind;
  mutators on a consumed handle return OPERATION_CONSUMED (4005).
  11 container-ref-free factories (create_database, read_database,
  delete_database, read_all_databases, query_databases,
  create_container, read_all_containers, query_containers,
  query_offers, read_offer, replace_offer) plus 8 mutators
  (with_body which copies caller bytes, with_session_token,
  with_activity_id, with_max_item_count covering -1=ServerDecides /
  positive=Limit / 0=INVALID_OPTION_VALUE,
  with_populate_index_metrics, with_populate_query_metrics,
  with_precondition_if_match, with_precondition_if_none_match which
  reject any second precondition setter with
  PRECONDITION_ALREADY_SET 4008).

- driver_options.rs wires cosmos_driver_options_builder_with_operation_options
  (closes the Phase 3 deferral).

- Phase 5 deferrals (documented in module docs + copilot_memory.md):
  * cosmos_operation_with_request_header -- the driver's
    CosmosRequestHeaders is a typed whitelist with no slot for
    arbitrary custom headers. Custom headers live on OperationOptions
    instead. Needs spec reconciliation.
  * priority_level -- spec lists it but the merged driver's
    OperationOptions does not have the field.
  * Container/item factories (read_item, create_item, etc.),
    feed_range, query_plan, batch builder, patch_max_attempts mutator
    -- all land in Phase 6 with cosmos_container_ref_*.

- 32 new Rust tests (102 total: 70 prior + 32 new). C harness
  c_tests/operation_construction.c covers options builder happy /
  clear / NULL paths, custom-header validation, each factory, each
  mutator (including ServerDecides/Limit/zero on with_max_item_count
  and double-set rejection on precondition), and the
  with_operation_options driver-options wiring.

Validation: cargo build / fmt / clippy clean (zero wrapper-side
warnings; the 3 pre-existing driver-side warnings documented in
copilot_memory.md section 5.5 remain). cargo doc emits 9
private-item-link warnings consistent with the Phase 1-4 pattern.
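
The consumed-sentinel and single-precondition rules above can be modelled without any FFI as in the sketch below. `Operation`, `OperationHandle`, `take_for_submit`, and the two enums are hypothetical stand-ins for CosmosOperation and the opaque cosmos_operation_t; only the numeric codes and the -1 / positive / 0 mapping come from the commit message. Treating negative values other than -1 as invalid is an assumption.

```rust
pub const OPERATION_CONSUMED: u32 = 4005;
pub const PRECONDITION_ALREADY_SET: u32 = 4008;
pub const INVALID_OPTION_VALUE: u32 = 4014;

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Precondition {
    IfMatch(String),
    IfNoneMatch(String),
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum MaxItemCount {
    ServerDecides,
    Limit(u32),
}

/// Hypothetical stand-in for the driver's operation type.
#[derive(Debug, Default, PartialEq, Eq)]
pub struct Operation {
    pub precondition: Option<Precondition>,
    pub max_item_count: Option<MaxItemCount>,
}

/// The handle keeps the operation in an Option; None is the consumed sentinel.
pub struct OperationHandle {
    op: Option<Operation>,
}

impl OperationHandle {
    pub fn new() -> Self {
        Self { op: Some(Operation::default()) }
    }

    /// Every mutator goes through this check first.
    fn live(&mut self) -> Result<&mut Operation, u32> {
        self.op.as_mut().ok_or(OPERATION_CONSUMED)
    }

    fn set_precondition(&mut self, p: Precondition) -> Result<(), u32> {
        let op = self.live()?;
        if op.precondition.is_some() {
            return Err(PRECONDITION_ALREADY_SET);
        }
        op.precondition = Some(p);
        Ok(())
    }

    pub fn with_precondition_if_match(&mut self, etag: &str) -> Result<(), u32> {
        self.set_precondition(Precondition::IfMatch(etag.to_owned()))
    }

    pub fn with_precondition_if_none_match(&mut self, etag: &str) -> Result<(), u32> {
        self.set_precondition(Precondition::IfNoneMatch(etag.to_owned()))
    }

    /// -1 means the server decides, positive values are a limit, 0 is invalid.
    /// Assumption: other negative values are rejected as well.
    pub fn with_max_item_count(&mut self, n: i32) -> Result<(), u32> {
        let op = self.live()?;
        op.max_item_count = Some(match n {
            -1 => MaxItemCount::ServerDecides,
            n if n > 0 => MaxItemCount::Limit(n as u32),
            _ => return Err(INVALID_OPTION_VALUE),
        });
        Ok(())
    }

    /// Submit takes the inner operation and leaves the sentinel behind.
    pub fn take_for_submit(&mut self) -> Result<Operation, u32> {
        self.op.take().ok_or(OPERATION_CONSUMED)
    }
}
```

Because the handle itself survives the take, freeing it stays safe after submit, while any further mutator or second submit gets a deterministic 4005 instead of touching moved-out state.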
…response, container/item factories)

Implements spec NATIVE_WRAPPER_SPEC.md section 4.4 (async driver
creation) + section 4.6 (container/item factories + feed range) +
section 4.7 (response surface) + section 8 Phase 6.

This is the milestone phase: external SDKs can now do end-to-end CRUD
against a real Cosmos endpoint via the C ABI.

What landed:

- src/submit.rs ships the generic tokio::spawn -> cq_enqueue submit
  pipeline (SpawnContext + spawn_oneshot) plus three FFI entry points
  that use it:
    * cosmos_driver_submit (item-CRUD; binds to
      CosmosDriver::execute_singleton_operation).
    * cosmos_driver_get_or_create_submit (Phase 3 deferral closed --
      async driver creation).
    * cosmos_driver_resolve_container_submit (async container resolve).
  The pipeline takes operation ownership via Option::take so re-submit
  / mutator-after-submit return OPERATION_CONSUMED (4005) per spec
  section 4.6.3 #4. Pre-flight failure does NOT consume the operation
  (restore_inner restores the handle); runtime failures consume it.
  user_data is encoded as usize to avoid async-block auto-trait
  decomposition flagging *mut c_void as !Send.

- src/response.rs ships opaque cosmos_response_t with:
    * Lifecycle (_free) and four typed-header accessors with stable
      borrowed pointers (activity_id, session_token, etag,
      continuation_token).
    * Zero-copy body view via out-param pattern; handles all three
      ResponseBody variants (Bytes / Items->first / NoPayload).
    * Status code + RU charge.
    * Side-payload take accessors (_take_driver, _take_container) for
      the degenerate responses produced by the driver-creation /
      container-resolve submit paths. ResponseInner.inner is
      Option<CosmosResponse> so degenerate responses don't need to
      fabricate one (CosmosResponse::new is pub(crate)).

- src/container_ref.rs ships opaque cosmos_container_ref_t with
  _clone / _free plus the synchronous
  cosmos_driver_resolve_container_blocking. The async variant lives
  in submit.rs alongside the rest of the submit pipeline.

- src/feed_range.rs ships opaque cosmos_feed_range_t with the two
  public constructors from spec section 4.6.4: _full and
  _for_partition_key (which extracts the partition-key definition
  from the supplied container ref). Plus _clone / _free.

- src/operation.rs grows 13 new factories (read_container,
  replace_container, delete_container, read_all_items,
  read_all_items_cross_partition, query_items,
  query_plan_for_features [reserved -- returns INVALID_ARGUMENT
  pending driver/spec reconciliation], batch, create_item, read_item,
  upsert_item, replace_item, delete_item, patch_item) plus the
  with_patch_max_attempts mutator (rejects non-patch ops with
  UNSUPPORTED_OPERATION_FOR_MUTATOR 4009).

- src/completion.rs grows a response slot on Completion (freed by
  Drop if never taken) and rewrites the Phase-1 stub
  cosmos_completion_take_response / _response accessors to return
  real ResponseHandle pointers. CompletionQueueInner is promoted to
  pub(crate) along with new helpers (runtime / max_capacity /
  current_len / include_error_details / inner_arc /
  enqueue_into_inner / OperationHandle::allocate / inner_arc /
  drop_raw / Completion::new_for_publish) so the submit pipeline
  can publish completions from spawned Tokio tasks that survive
  concurrent cq_free / op_handle_free from the producer side.

- 14 new Rust tests (116 total: 102 prior + 14 new). C harness
  c_tests/submit_and_response.c covers lifecycle / NULL safety on the
  full Phase 6 surface; the emulator-backed CRUD scenario from spec
  section 8 Phase 6 done-when criterion is exercised via CI when the
  emulator endpoint is reachable.
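
The ownership rule for the submit pipeline (Option::take consumes the operation, a pre-flight failure restores it, a second submit reports OPERATION_CONSUMED) can be sketched as below. This is a simplified model, not the code in src/submit.rs: the String payload, the PREFLIGHT_FAILED and SUCCESS values, and the preflight_ok flag are assumptions; only the 4005 code and the restore_inner name come from the commit message.

```rust
// OPERATION_CONSUMED (4005) comes from the commit message; every other
// name and value below is a hypothetical stand-in for illustration.
pub const SUCCESS: i32 = 0;
pub const PREFLIGHT_FAILED: i32 = 1;
pub const OPERATION_CONSUMED: i32 = 4005;

/// Stand-in for the operation handle: `inner` is emptied on submit.
pub struct OperationHandle {
    inner: Option<String>,
}

impl OperationHandle {
    pub fn new(description: &str) -> Self {
        Self { inner: Some(description.to_owned()) }
    }

    /// Puts the operation back after a pre-flight failure.
    fn restore_inner(&mut self, op: String) {
        self.inner = Some(op);
    }

    pub fn is_consumed(&self) -> bool {
        self.inner.is_none()
    }
}

/// `preflight_ok` models checks that run before any task is spawned
/// (an assumed example would be argument or queue validation).
pub fn submit(handle: &mut OperationHandle, preflight_ok: bool) -> i32 {
    // Take ownership first; an empty slot means the handle was already used.
    let op = match handle.inner.take() {
        Some(op) => op,
        None => return OPERATION_CONSUMED,
    };
    if !preflight_ok {
        // Nothing was spawned yet, so the caller keeps a usable handle.
        handle.restore_inner(op);
        return PREFLIGHT_FAILED;
    }
    // A real pipeline would move `op` into the spawned task here.
    drop(op);
    SUCCESS
}
```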

Phase 6 deferrals worth tracking (documented in copilot_memory.md
section 3.5):
  * cosmos_operation_query_plan_for_features -- FFI symbol reserved
    but routing pending spec/driver reconciliation.
  * cosmos_operation_with_request_header (carry-over from Phase 5).
  * Multi-part body iteration -- Phase 8 (pager).
  * cosmos_response_diagnostics -- Phase 7.
  * Token-credential / resource-token account constructors -- carry-over.
  * Cache-hit advisory 5001 -- carry-over.
  * Long-tail typed-header accessors -- add as host SDKs need them.

Validation: cargo build / fmt / clippy clean (zero wrapper-side
warnings; the 3 pre-existing driver-side warnings documented in
copilot_memory.md section 5.5 remain). cargo doc emits 11
private-item-link warnings consistent with the Phase 1-5 pattern.
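
The note about encoding user_data as usize can be illustrated with a small sketch. Raw pointers such as *mut c_void are not Send, so a closure or async block that captures one cannot be handed to a spawner that requires Send. The sketch uses std::thread::spawn as a stand-in for tokio::spawn (both require Send), and the Completion shape and function names are hypothetical.

```rust
use std::ffi::c_void;
use std::sync::mpsc;
use std::thread;

/// Hypothetical stand-in for a published completion.
pub struct Completion {
    pub user_data: usize,
}

/// Accepts the caller's opaque pointer and carries it across a thread
/// boundary as an integer. The pointer is never dereferenced.
pub fn submit(user_data: *mut c_void) -> Completion {
    // Convert before building the closure: capturing `user_data` itself
    // would make the closure !Send and `thread::spawn` would not compile.
    let cookie = user_data as usize;
    let (tx, rx) = mpsc::channel();
    let worker = thread::spawn(move || {
        tx.send(Completion { user_data: cookie })
            .expect("receiver is still alive");
    });
    worker.join().expect("worker thread panicked");
    rx.recv().expect("completion was published")
}

/// Hands the cookie back to the host in its original pointer shape.
pub fn completion_user_data(c: &Completion) -> *mut c_void {
    c.user_data as *mut c_void
}
```

The integer encoding is only sound because the value is treated as an opaque cookie; the host remains responsible for whatever the pointer refers to.
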
…nguage quick-starts

Rewrites sdk/cosmos/azure_data_cosmos_driver_native/README.md to reflect
the crate's current state (Phase 0-6 complete) and onboard
cross-language binding authors.

What changed:

- Rollout-status table now shows Phases 0-6 as complete with a
  surface summary per phase, plus Phases 7-10 as the remaining
  scope. The old README still claimed "Phase 0 ships only the
  scaffolding".

- New capability matrix lists every user-visible feature with
  green / pending status -- ergonomic at-a-glance reference for
  binding authors deciding when their work is unblocked.

- New "Usage examples" section ships a full CREATE / READ / DELETE
  walk-through against the local Cosmos DB emulator for each of
  the four target binding languages:
    * .NET 8+ (C# 12, DllImport)
    * Java 22+ (FFM API, java.lang.foreign)
    * Go (cgo)
    * Python 3.10+ (ctypes)
  Each example uses the same 7-step blueprint
  (runtime + queue, account, driver, container, partition key,
  CRUD with completion drain, LIFO tear-down) so authors can map
  shapes between languages.

- New "Notes that apply to all four bindings" section documents the
  production-shape contract: single-producer/single-consumer queue,
  user_data correlation pattern, lifetime ownership cheat-sheet,
  schema-agnostic body bytes, diagnostics-on-error caveat,
  per-runtime driver cache scoping.

- Repository archaeology section is preserved verbatim.

Also updates sdk/cosmos/copilot_memory.md push-status header to
reflect Phase 6 having been pushed to origin earlier this session.

No source / build changes; documentation only.
kundadebdatta and others added 5 commits June 2, 2026 23:59
Mermaid treats ; as a statement terminator inside Note text, so the line

  Note over App: App now awaits; no thread is blocked.

was split into two pieces, and the parser then failed on the following
Tok->>Drv: line. GitHub surfaced this as:

  Parse error on line 16:
    ...o thread is blocked. Tok->>Drv: pol
  Expecting 'SOLID_OPEN_ARROW' ..., got 'NEWLINE'

Replace the semicolon with &mdash; so the Note stays a single
statement. Other Notes in the file already use &mdash; and are not
affected. No semantic / rendering change beyond restoring the diagram.
…e 1, additive)

Collapse the per-operation factory/mutator/options-builder FFI into two canonical submit entry points that consume flat #[repr(C)] structs filled by the host SDK.

- op_request.rs: CosmosOperationRequest, CosmosOperationOptions (tri-state mirror of driver OperationOptions), CosmosOperationKind, build_request/build_operation/apply_inline_mutators, cosmos_operation_options_default(), plus unit tests.

- submit.rs: cosmos_driver_execute_operation_submit (plan_operation + execute_plan, threads inbound continuation and surfaces next-page token) and cosmos_driver_execute_singleton_operation_submit.

- response.rs: next_continuation CString plumbing + cosmos_response_next_continuation accessor (distinct from header-derived continuation).

- lib.rs: register op_request module. Regenerated azurecosmosdriver.h (additive only).

Existing per-operation FFI left intact; deletions deferred to Phase 2.
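
A flat, tri-state options struct of the kind described here might look like the following sketch. The field names, the u8 tri-state encoding, the error type, and the *_sketch identifiers are all assumptions; the real CosmosOperationOptions layout is defined in op_request.rs and the generated header, and only the ideas of a default constructor and a to_driver conversion are taken from the commit text.

```rust
// Assumed tri-state encoding for optional booleans crossing the C ABI.
pub const TRI_UNSET: u8 = 0;
pub const TRI_FALSE: u8 = 1;
pub const TRI_TRUE: u8 = 2;

/// Hypothetical flat struct the host fills in; not the real layout.
#[repr(C)]
#[derive(Clone, Copy, Debug)]
pub struct FlatOptionsSketch {
    pub content_response_on_write: u8, // TRI_* value
    pub has_max_item_count: bool,      // presence flag for the field below
    pub max_item_count: u32,
}

/// Hypothetical Rust-side mirror using Option for "not set".
#[derive(Debug, Default, PartialEq, Eq)]
pub struct DriverOptionsSketch {
    pub content_response_on_write: Option<bool>,
    pub max_item_count: Option<u32>,
}

#[derive(Debug, PartialEq, Eq)]
pub struct InvalidTriState(pub u8);

/// Returns a struct with every field unset, so hosts start from a
/// well-defined baseline instead of zeroing memory by hand.
pub extern "C" fn flat_options_default_sketch() -> FlatOptionsSketch {
    FlatOptionsSketch {
        content_response_on_write: TRI_UNSET,
        has_max_item_count: false,
        max_item_count: 0,
    }
}

impl FlatOptionsSketch {
    /// Converts the flat form into the Option-based form, rejecting
    /// tri-state bytes outside the three known values.
    pub fn to_driver(&self) -> Result<DriverOptionsSketch, InvalidTriState> {
        let content_response_on_write = match self.content_response_on_write {
            TRI_UNSET => None,
            TRI_FALSE => Some(false),
            TRI_TRUE => Some(true),
            other => return Err(InvalidTriState(other)),
        };
        let max_item_count = self.has_max_item_count.then_some(self.max_item_count);
        Ok(DriverOptionsSketch { content_response_on_write, max_item_count })
    }
}
```

Carrying the tri-state as a plain byte rather than a Rust enum lets the conversion reject out-of-range values coming from the host.
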
Phase 2 of the native-wrapper redesign. The execution surface is now the two canonical entry points (cosmos_driver_execute_operation_submit and cosmos_driver_execute_singleton_operation_submit) consuming the flat cosmos_CosmosOperationRequest / cosmos_CosmosOperationOptions structs introduced in Phase 1.

Removed:
- src/operation.rs (per-operation factories, mutators, cosmos_operation_free, OperationDescHandle)
- src/operation_options.rs (OperationOptionsHandle/builder + ~25 builder fns)
- cosmos_driver_submit from submit.rs and its module declarations from lib.rs

Migrated cosmos_driver_options_builder_with_operation_options to consume the flat CosmosOperationOptions via to_driver() (now pub(crate)). Regenerated the C header, updated error-code docs, README status/capability tables, and rewrote the C# quick-start to the new flat-request API.
@kundadebdatta kundadebdatta self-assigned this Jun 5, 2026
@ananth7592
Member

ananth7592 commented Jun 5, 2026


user_data: codify the opaque-cookie contract: void * → intptr_t

While reviewing the binding shape for the .NET POC, Aaron Robinson (.NET Interop team) flagged the user_data parameter type as a documentation-grade nit worth addressing before any downstream language picks it up. Filing here so it doesn't get lost.

Current shape (this header, 4be377483):

| Header line | Signature |
| --- | --- |
| 1118 | void *cosmos_completion_user_data(const cosmos_completion_t *c); |
| 1897 | cosmos_driver_get_or_create_submit(..., void *user_data, ...) |
| 1924 | cosmos_driver_resolve_container_submit(..., void *user_data, ...) |
| 1942 | cosmos_driver_execute_operation_submit(..., void *user_data, ...) |
| 1958 | cosmos_driver_execute_singleton_operation_submit(..., void *user_data, ...) |

Suggested change: swap void * for intptr_t on those 5 signatures, and change the corresponding Rust storage in Completion::user_data from *mut c_void to isize (which cbindgen emits as intptr_t).

Why it matters (and why it's not just bikeshedding):

  1. It's a contract statement. The driver never dereferences this value — it round-trips it verbatim from *_submit to the matching cosmos_completion_user_data. void * says "this is a pointer to memory, mind its lifetime and validity"; intptr_t says "this is an opaque integer cookie." Aaron's exact phrasing was "Avoid all the shenanigans with pointers and ownership by using an intptr_t instead of a void* for the user data. I believe you that it is 'just data' and won't be dereferenced. Codify that in the contract."
  2. It removes a binding-side foot-gun. Every binding generator surfaces void * with its full pointer machinery:
    • cbindgen / *mut c_void in Rust callers
    • P/Invoke generators in .NET emit void* → IntPtr but linters and analyzers flag it as pointer-shaped
    • SWIG/JNI for Java emits Pointer/long mismatches that need a manual override
    • WASM bindgen forces a usize cast
      intptr_t collapses all of those to "pointer-sized integer," which is precisely what hosts want for cookie payloads (GCHandle integers in .NET, slab indices in Go, JNI globalref handles wrapped in long, etc.).
  3. It future-proofs us against accidental dereference inside the driver. Right now any helper added to submit.rs could plausibly call .read() on the *mut c_void. With isize the type system rejects it.
  4. Zero ABI cost on every platform the crate currently supports — LP64, LLP64, ILP32 all guarantee sizeof(void*) == sizeof(intptr_t). No binding has to change.
  5. No callers in flight yet. We (the .NET POC) are the first downstream consumer, and we're rebinding against this PR's surface anyway. There's no breakage to manage.
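
To make the cookie idea concrete, here is a hedged host-side sketch of the slab-index pattern mentioned in point 2, together with the size check behind point 4. PendingOps, driver_round_trip, and the String payload are hypothetical; the driver is modelled as a function that returns the value it was given.

```rust
use std::collections::HashMap;
use std::ffi::c_void;
use std::mem::size_of;

/// isize is Rust's counterpart of C's intptr_t; on the usual data models
/// it has the same size as a data pointer.
pub fn cookie_is_pointer_sized() -> bool {
    size_of::<isize>() == size_of::<*mut c_void>()
}

/// Hypothetical host-side table keyed by integer cookies.
pub struct PendingOps {
    next: isize,
    ops: HashMap<isize, String>,
}

impl PendingOps {
    pub fn new() -> Self {
        Self { next: 1, ops: HashMap::new() }
    }

    /// Registers an in-flight operation and returns the cookie to pass
    /// as user_data.
    pub fn register(&mut self, label: &str) -> isize {
        let cookie = self.next;
        self.next += 1;
        self.ops.insert(cookie, label.to_owned());
        cookie
    }

    /// Looks up and removes the operation when its completion arrives.
    pub fn complete(&mut self, cookie: isize) -> Option<String> {
        self.ops.remove(&cookie)
    }
}

/// Stand-in for the driver: the cookie is stored and returned verbatim.
/// An isize has no `.read()` or `*` dereference, so code like
/// `user_data.read()` would not compile.
pub fn driver_round_trip(user_data: isize) -> isize {
    user_data
}
```

With an integer key the host never hands the driver anything that looks like an address, so no lifetime or validity question arises on the native side.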

Estimated change cost:

  • submit.rs: 4 parameter types *mut c_void → isize, 4 storage assignments, the Completion field once.
  • completion.rs: 1 storage field + the accessor return type.
  • Regenerate azurecosmosdriver.h.
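
On the driver side, the proposed change amounts to storing and returning an isize. The sketch below shows that shape only; the *_sketch names, the boxed allocation, and returning 0 for a NULL completion are assumptions, and the real functions take additional parameters that are elided in the table above.

```rust
/// Hypothetical completion record holding the host's cookie.
pub struct CompletionSketch {
    user_data: isize,
}

/// Models a submit call: records the cookie and returns a completion.
/// A real submit would enqueue the completion rather than return it.
pub extern "C" fn submit_sketch(user_data: isize) -> *mut CompletionSketch {
    Box::into_raw(Box::new(CompletionSketch { user_data }))
}

/// Accessor returning the cookie unchanged.
///
/// # Safety
/// `c` must be NULL or a pointer obtained from `submit_sketch` that has
/// not been freed.
pub unsafe extern "C" fn completion_user_data_sketch(c: *const CompletionSketch) -> isize {
    if c.is_null() {
        return 0; // assumed NULL behaviour for this sketch
    }
    unsafe { (*c).user_data }
}

/// Releases a completion created by `submit_sketch`.
///
/// # Safety
/// `c` must be NULL or an unfreed pointer from `submit_sketch`.
pub unsafe extern "C" fn completion_free_sketch(c: *mut CompletionSketch) {
    if !c.is_null() {
        drop(unsafe { Box::from_raw(c) });
    }
}
```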

I'm happy to put up the change as a follow-up PR if it's preferred over folding it into this one — let me know what works best for the merge plan.

Reference for the broader review thread: the same Teams discussion (with Kevin Pilch and Aaron) also confirmed the routing model on the .NET side (GCHandle-boxed AsyncOperation over ConcurrentDictionary) and validated the bulk-completion API (cosmos_cq_wait_batch) — both of which line up with the current spec. The intptr_t ask is the one remaining loose end from that thread.

Thanks for the spec collapse work in d748def8d/4ecdf69ad — the flat-struct + 2-fn surface is a big readability and forward-compat win.

cc @Pilchie
