[SYCL][DOC] Remove the P2P-related requirement from enqueue_signal_event#22201
[SYCL][DOC] Remove the P2P-related requirement from enqueue_signal_event#22201slawekptak wants to merge 1 commit into
Conversation
There was a problem hiding this comment.
Pull request overview
This PR updates the proposed sycl_ext_oneapi_reusable_events extension specification to reflect that counter-based events may be enqueued for signaling on any device in the system, removing the previously documented requirement for P2P accessibility between the event’s device and the queue’s device.
Changes:
- Removed the documented
_Throws_condition that required P2P access (and referencedext_oneapi_can_access_peer/ext_oneapi_enable_peer_access) whenevtandqwere associated with different devices. - Keeps the
enqueue_signal_eventbehavior consistent with the rest of the document’s cross-device/cross-context event usage remarks.
| using the `device::ext_oneapi_enable_peer_access` function defined in the | ||
| link:../experimental/sycl_ext_oneapi_peer_access.asciidoc[ | ||
| sycl_ext_oneapi_peer_access] extension. | ||
|
|
There was a problem hiding this comment.
I think this was the whole reason we added the "device" parameter to make_event in #21485. If there is no longer any requirement for the event's device to match the queue's device, can we revert most of the changes in #21485? The API would be more similar to CUDA and easier to use if the you can create an event without specifying a device that is associated with it.
There was a problem hiding this comment.
This is my current thinking too... It seems like SYCL will not use the device for any restrictions for both: reusable events and events IPC. SYCL RT can handle event wait, even if there is no P2P access to the signaling device. The only issue is which device to use when creating the event - maybe we can delay the Level Zero create event call until the first signal event call.
There was a problem hiding this comment.
The only issue is which device to use when creating the event - maybe we can delay the Level Zero create event call until the first signal event call.
I like that idea. What is the default state of an event immediately after creation? It wasn't clear to me from this spec.
cudaEventCreate starts in "unrecorded" state, where waiting on it would block.
I'm asking, because delayed event creation would have consequences for how e.g., passing a new unsubmitted event to a waitlist is handled.
There was a problem hiding this comment.
What is the default state of an event immediately after creation?
Heh, good point! We should specify that in the spec. When you create an event with the default event constructor, the initial state is "signaled". The make_event function doesn't necessarily need to be the same, but it would probably be less confusing to users if it has the same semantic.
The only issue is which device to use when creating the event - maybe we can delay the Level Zero create event call until the first signal event call.
Yes, that could make sense. Another option is to arbitrarily choose one of the devices from the provided context. Does the implementation re-associate the event with a new device each time the event is recorded? If so, the initial device doesn't matter much because it will get overwritten anyways the first time the event is recorded.
There was a problem hiding this comment.
These are all great points. I will post an update once I check what is the initial event state in Level Zero.
Sure, we can choose one of the devices. The event will be re-associated to a new device on the first enqueue signal call. I guess this might have some performance impact, since some resource are pre-allocated on the selected device when the event is created, so probably it is best to create it and then enqueue for signal on the same device. I'm not sure how much impact it might have - also will let you know once I learn this.
There was a problem hiding this comment.
If it turns out there is a performance benefit, then we could keep the overload of make_event that takes a device, but document it as a hint for performance. In that case, we'd probably also provide an overload that does not take a device. I think we'd also remove queries like ext_oneapi_get_device since the device no longer has a semantic meaning.
Counter based events can be enqueued for signaling on any device in the system, so there is no requirement for P2P access between devices.
Counter based events can be enqueued for signaling on any device in the system, so there is no requirement for P2P access between devices.