
ci(e2e): add RHEL 10 Docker/Podman/SELinux compatibility suite #2093

Open
ericcurtin wants to merge 1 commit into NVIDIA:main from ericcurtin:test-podman-docker


@ericcurtin

Contributor

Summary

Adds an opt-in E2E suite that installs Docker Engine alongside RHEL's preinstalled Podman on a bare RHEL 10 runner, verifies SELinux is enforcing, and runs the standard e2e:docker and e2e:podman suites against both engines on the same SELinux-enforcing host, per the discussion in #2092.

Related Issue

Follow-up to the CI request in #2092 (comment).

Changes

  • New reusable workflow .github/workflows/e2e-rhel-selinux.yml:
    • Runs on a bare (non-containerized) RHEL 10 runner so it can install/configure dockerd and read real SELinux/audit state (a step running inside our usual ghcr.io/nvidia/openshell/ci container can't do either).
    • Installs Docker via get.docker.com and applies the --ip-forward-no-drop dockerd override from the linked comment so Docker and Podman coexist on one host.
    • Fails fast if SELinux isn't Enforcing, so the suite can't silently pass on a permissive host.
    • Runs mise run e2e:docker then mise run e2e:podman against the built supervisor image.
    • Greps the audit log for AVC denials recorded during the run and fails the job if any are found, surfacing SELinux/bind-mount relabeling regressions (e.g. the :z/:Z support being added in feat(docker,podman): add SELinux label support for bind mounts #2092).
  • Wires the suite into branch-e2e.yml behind a new test:e2e-rhel label, following the existing test:e2e-kubernetes pattern (optional, not part of the required CI gate, since it's new/unproven runner infrastructure).
  • Updates e2e-label-help.yml, CI.md, and CONTRIBUTING.md to document the new label.
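The two SELinux checks described above (the fail-fast Enforcing gate and the post-run AVC denial scan) can be sketched in Python as follows. This is a hypothetical illustration, not the workflow's actual implementation: the function names are made up, and it assumes the kernel exposes the SELinux mode at /sys/fs/selinux/enforce and that audit records follow the standard auditd "type=AVC msg=audit(<epoch>:<serial>): avc:  denied ..." layout.

```python
"""Hypothetical sketch of the RHEL E2E job's SELinux checks.

Assumptions (not taken from the PR): the mode is read from
/sys/fs/selinux/enforce ("1" = Enforcing), and audit lines use the
standard auditd "type=AVC msg=audit(<epoch>:<serial>): ..." format.
"""
import re
from pathlib import Path

# Captures the epoch timestamp from an auditd record header.
AUDIT_TS = re.compile(r"msg=audit\((\d+(?:\.\d+)?):\d+\)")
# Distinguishes real denials from "avc:  granted" (auditallow) records.
AVC_DENIED = re.compile(r"avc:\s+denied")


def selinux_is_enforcing(
    enforce_file: Path = Path("/sys/fs/selinux/enforce"),
) -> bool:
    """Return True only if SELinux is present and in Enforcing mode."""
    try:
        return enforce_file.read_text().strip() == "1"
    except FileNotFoundError:
        # No selinuxfs mount means SELinux is disabled: not enforcing.
        return False


def avc_denials_since(lines, start_epoch: float) -> list[str]:
    """Return AVC denial records stamped at or after start_epoch."""
    denials = []
    for line in lines:
        # Skip non-AVC records (SYSCALL, PATH, ...) and granted AVCs.
        if not line.startswith("type=AVC") or not AVC_DENIED.search(line):
            continue
        match = AUDIT_TS.search(line)
        # Ignore denials logged before the suite started.
        if match and float(match.group(1)) >= start_epoch:
            denials.append(line.rstrip("\n"))
    return denials
```

In a job, the gate would run before mise run e2e:docker, the start time would be recorded (for example with time.time()), and after mise run e2e:podman the scan would read the root-readable /var/log/audit/audit.log and fail the job if the returned list is non-empty. Filtering on the audit timestamp, rather than truncating the log, keeps denials that predate the run out of the result.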

Note on runner provisioning

This requires an org-provisioned GitHub-hosted "larger runner" using the RHEL 10 partner image, labeled linux-amd64-rhel10 (the workflow's default runner input). That provisioning happens in GitHub org settings and isn't something this PR can do; until it exists, test:e2e-rhel runs will queue. The runner input is overridable if a different label is preferred.

Testing

  • Validated all touched/added workflow YAML with actionlint (no new findings; the pre-existing runner-label warnings are unrelated and occur because the repo has no actionlint.yaml runner-label configuration)
  • mise run markdown:lint passes
  • License header check (update_license_headers.py --check) passes
  • Not runnable end-to-end from this session: no linux-amd64-rhel10 runner is provisioned yet, and no Rust/Python source changed, so mise run pre-commit's Rust/Python checks weren't re-run

Checklist

  • Follows Conventional Commits
  • Commits are signed off (DCO)

PR NVIDIA#2092 adds SELinux relabeling (:z/:Z) support to the Docker and
Podman driver bind mounts, but our existing E2E lanes run inside the
Ubuntu-based ghcr.io/nvidia/openshell/ci container image, which does
not enforce SELinux. That leaves the new relabeling behavior, and any
future SELinux-sensitive change, without a host that can actually deny
a mislabeled mount.

Add an opt-in `test:e2e-rhel` label that runs a new reusable workflow,
e2e-rhel-selinux.yml, on a bare (non-containerized) RHEL 10 runner. The
job installs Docker Engine alongside RHEL's preinstalled Podman using
the commands from the PR discussion (get.docker.com plus the
--ip-forward-no-drop dockerd override needed for the two engines to
coexist), verifies SELinux is enforcing, runs the standard e2e:docker
and e2e:podman suites against both engines on the same host, and fails
the job if the audit log shows AVC denials during the run.

This requires an org-provisioned GitHub-hosted "larger runner" using
the RHEL 10 partner image (public preview per the June 2026 GitHub
changelog) under the linux-amd64-rhel10 label; until that runner
exists, jobs using the default label will queue. The suite is wired in
as optional/non-blocking, matching the existing test:e2e-kubernetes
pattern, since it exercises unproven infrastructure.

Documentation in CI.md and CONTRIBUTING.md is updated to list the new
label alongside the existing E2E labels.

Signed-off-by: Eric Curtin <eric.curtin@docker.com>
@copy-pr-bot

copy-pr-bot Bot commented Jul 1, 2026


This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@ericcurtin

Contributor Author

I think an org admin has to turn on RHEL 10 runners, though.

@maxamillion

Collaborator

This looks good to me, but I think you're right on needing an admin. cc @drew @TaylorMutch

@ericcurtin

Contributor Author

> This looks good to me, but I think you're right on needing an admin. cc @drew @TaylorMutch

We could theoretically use CentOS Stream VMs to work around this; that would be good enough...

@maxamillion

Collaborator

@ericcurtin yeah, good point. It probably makes sense too, since CentOS Stream is more community- and upstream-oriented than RHEL. I do love me some RHEL, but this is probably a better fit for CentOS Stream.


Labels

None yet

Projects

None yet


2 participants