[Question]: How does nvshmem_signal_wait_until guarantee data visibility? #75

@xxxxx-ctrl

Question

Hi,

I noticed that nvshmem_uint64_wait_until calls nvshmemi_transfer_syncapi_update_mem() after the wait, which triggers a flush (via cuFlushGPUDirectRDMAWrites or RDMA Read loopback) to ensure preceding RDMA Write data is visible to GPU SMs.

However, nvshmem_signal_wait_until skips this step entirely: it only polls the signal word through a volatile read and then returns. The proxy's process_channel_put_signal does not call enforce_cst either.

How does nvshmem_signal_wait_until guarantee that the put data (sent before the signal) is visible to GPU SMs by the time the signal is observed? Should users call nvshmem_uint64_wait_until instead?
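For reference, here is a minimal sketch of the usage pattern I am asking about. It is only an illustration, not code taken from NVSHMEM: the kernel name `put_signal_demo`, the buffer names, and the payload size `kCount` are all made up, and it assumes at least two PEs with one GPU each. PE 0 issues a put-with-signal, and PE 1 waits on the signal with nvshmem_signal_wait_until and then reads the payload directly.

```cuda
#include <cstdint>
#include <cstdio>
#include <cuda_runtime.h>
#include <nvshmem.h>
#include <nvshmemx.h>

constexpr int kCount = 256;  // payload length in ints (arbitrary, for illustration)

// PE 0 sends payload + signal; PE 1 waits on the signal, then reads the payload.
__global__ void put_signal_demo(int *src, int *dst, uint64_t *sig, int *mismatches) {
    const int me = nvshmem_my_pe();
    if (me == 0) {
        for (int i = 0; i < kCount; ++i) src[i] = i + 1;
        // Write src into dst on PE 1, then set PE 1's signal word to 1.
        nvshmem_putmem_signal(dst, src, kCount * sizeof(int), sig, 1,
                              NVSHMEM_SIGNAL_SET, 1);
    } else if (me == 1) {
        // The call in question: poll the signal word until it equals 1.
        nvshmem_signal_wait_until(sig, NVSHMEM_CMP_EQ, 1);
        // Read the payload right after the wait, with no extra flush or fence.
        int bad = 0;
        for (int i = 0; i < kCount; ++i) {
            if (dst[i] != i + 1) ++bad;
        }
        *mismatches = bad;
    }
}

int main() {
    nvshmem_init();
    // Assumption: one GPU per PE, selected by the PE's rank within its node.
    cudaSetDevice(nvshmem_team_my_pe(NVSHMEMX_TEAM_NODE));
    if (nvshmem_n_pes() < 2) {
        printf("this sketch needs at least 2 PEs\n");
        nvshmem_finalize();
        return 1;
    }

    // Symmetric allocations: every PE makes the same calls in the same order.
    int *src = static_cast<int *>(nvshmem_malloc(kCount * sizeof(int)));
    int *dst = static_cast<int *>(nvshmem_malloc(kCount * sizeof(int)));
    uint64_t *sig = static_cast<uint64_t *>(nvshmem_malloc(sizeof(uint64_t)));
    int *mismatches = static_cast<int *>(nvshmem_malloc(sizeof(int)));

    cudaMemset(dst, 0, kCount * sizeof(int));
    cudaMemset(sig, 0, sizeof(uint64_t));
    cudaMemset(mismatches, 0, sizeof(int));
    cudaDeviceSynchronize();
    nvshmem_barrier_all();  // all PEs have zeroed their buffers before any put

    put_signal_demo<<<1, 1>>>(src, dst, sig, mismatches);
    cudaDeviceSynchronize();

    int bad = 0;
    cudaMemcpy(&bad, mismatches, sizeof(int), cudaMemcpyDeviceToHost);
    if (nvshmem_my_pe() == 1) {
        printf("PE 1 saw %d mismatched elements\n", bad);
    }

    nvshmem_barrier_all();
    nvshmem_free(mismatches);
    nvshmem_free(sig);
    nvshmem_free(dst);
    nvshmem_free(src);
    nvshmem_finalize();
    return 0;
}
```

A clean run of something like this would not by itself show that visibility is guaranteed, since a missing flush could go unnoticed on a given system; it is only meant to pin down exactly which read my question is about.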

Thanks!
