Improve Linux distro rootfs compatibility on Apple Silicon #96
doanbaotrung wants to merge 1 commit into
c503893 to 0331222
3 issues found and verified against the latest diff
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="src/syscall/fs.c">
<violation number="1" location="src/syscall/fs.c:335">
P1: `relative_path_between` incorrectly returns EXDEV for single-component sysroot paths</violation>
</file>
<file name="src/syscall/proc-identity.c">
<violation number="1" location="src/syscall/proc-identity.c:39">
P2: Environment UID/GID parsing accepts UINT32_MAX ((uint32_t)-1), a reserved Linux sentinel, as a valid initial identity value</violation>
</file>
    common = i;
    }

    if (common == 0) {
P1: relative_path_between incorrectly returns EXDEV for single-component sysroot paths
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At src/syscall/fs.c, line 335:
<comment>`relative_path_between` incorrectly returns EXDEV for single-component sysroot paths</comment>
<file context>
@@ -288,6 +288,133 @@ static int64_t reject_unsupported_fuse_path_op(const path_translation_t *tx)
+ common = i;
+ }
+
+ if (common == 0) {
+ errno = EXDEV;
+ return -1;
</file context>
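One way to avoid special-casing a zero-length common prefix is to treat `/` as an ancestor that two absolute paths always share. The sketch below is not the PR's `relative_path_between`; `rel_path_sketch` and its helpers are hypothetical names, and it assumes both inputs are absolute, already-normalized paths. Whether a target that escapes the sysroot should still fail with EXDEV is a separate containment check that this sketch does not attempt.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Length of the path component starting at p (up to '/' or NUL). */
static size_t sketch_comp_len(const char *p)
{
    size_t n = 0;
    while (p[n] != '\0' && p[n] != '/')
        n++;
    return n;
}

/* Append n bytes of s to out, keeping it NUL-terminated. */
static int sketch_append(char *out, size_t out_len, size_t *used,
                         const char *s, size_t n)
{
    if (*used + n + 1 > out_len) {
        errno = ENAMETOOLONG;
        return -1;
    }
    memcpy(out + *used, s, n);
    *used += n;
    out[*used] = '\0';
    return 0;
}

/* Hypothetical helper: relative path from directory from_dir to path to.
 * Assumes absolute, normalized inputs (no "." or ".." components). */
static int rel_path_sketch(const char *from_dir, const char *to,
                           char *out, size_t out_len)
{
    if (from_dir[0] != '/' || to[0] != '/' || out_len == 0) {
        errno = EINVAL;
        return -1;
    }
    const char *a = from_dir, *b = to;
    /* Skip every leading component the two paths share. Sharing only "/"
     * is a valid outcome, not an error. */
    for (;;) {
        while (*a == '/') a++;
        while (*b == '/') b++;
        size_t la = sketch_comp_len(a), lb = sketch_comp_len(b);
        if (la == 0 || la != lb || memcmp(a, b, la) != 0)
            break;
        a += la;
        b += lb;
    }
    size_t used = 0;
    out[0] = '\0';
    /* One ".." for each component left in from_dir. */
    while (*a != '\0') {
        a += sketch_comp_len(a);
        while (*a == '/') a++;
        if (used > 0 && sketch_append(out, out_len, &used, "/", 1) < 0)
            return -1;
        if (sketch_append(out, out_len, &used, "..", 2) < 0)
            return -1;
    }
    /* Then whatever is left of the target. */
    if (*b != '\0') {
        if (used > 0 && sketch_append(out, out_len, &used, "/", 1) < 0)
            return -1;
        if (sketch_append(out, out_len, &used, b, strlen(b)) < 0)
            return -1;
    }
    if (used == 0)
        return sketch_append(out, out_len, &used, ".", 1);
    return 0;
}
```

With a single-component sysroot such as `/rootfs`, a link directory of `/rootfs` and a target of `/rootfs/lib/ld.so` yield `lib/ld.so`, and two unrelated top-level directories `/a` and `/b` yield `../b` rather than an error.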
    errno = 0;
    char *end = NULL;
    unsigned long parsed = strtoul(value, &end, 10);
    if (errno != 0 || end == value || *end != '\0' || parsed > UINT32_MAX)
P2: Environment UID/GID parsing accepts UINT32_MAX ((uint32_t)-1), a reserved Linux sentinel, as a valid initial identity value
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At src/syscall/proc-identity.c, line 39:
<comment>Environment UID/GID parsing accepts UINT32_MAX ((uint32_t)-1), a reserved Linux sentinel, as a valid initial identity value</comment>
<file context>
@@ -24,16 +27,33 @@ static _Atomic int64_t guest_sid = 1, guest_pgid = 1;
+ errno = 0;
+ char *end = NULL;
+ unsigned long parsed = strtoul(value, &end, 10);
+ if (errno != 0 || end == value || *end != '\0' || parsed > UINT32_MAX)
+ return fallback;
+ return (uint32_t) parsed;
</file context>
    - if (errno != 0 || end == value || *end != '\0' || parsed > UINT32_MAX)
    + if (errno != 0 || end == value || *end != '\0' || parsed >= UINT32_MAX)
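A minimal standalone sketch of the stricter parse follows. `parse_guest_id` is a hypothetical name, not the function in `proc-identity.c`. Beyond the suggested `>=` comparison, it also rejects a leading sign or whitespace, because `strtoul` accepts `-1` and negates it; that extra check is an assumption about the desired behavior, not something the review asked for.

```c
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the environment UID/GID parser.
 * Returns fallback for anything that is not a plain decimal number
 * in the range 0 .. UINT32_MAX - 1. */
static uint32_t parse_guest_id(const char *value, uint32_t fallback)
{
    /* strtoul() skips whitespace and accepts '+'/'-'; insist on a digit. */
    if (value == NULL || !isdigit((unsigned char) value[0]))
        return fallback;

    errno = 0;
    char *end = NULL;
    unsigned long parsed = strtoul(value, &end, 10);

    /* UINT32_MAX is (uint32_t) -1, which the set*id calls treat as
     * "leave unchanged", so it is rejected as an initial identity. */
    if (errno != 0 || end == value || *end != '\0' || parsed >= UINT32_MAX)
        return fallback;
    return (uint32_t) parsed;
}
```

A caller would pass something like `parse_guest_id(getenv("ELFUSE_GUEST_UID"), default_uid)`, where `default_uid` stands for whatever fixed ID the emulator used before.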
    static int fork_child_vfork_notify_fd = -1;

    /* Linux clone flags */
This is completely duplicated in forkipc.c:574–582.
bc4c7e0 to a2d33ce
Improve compatibility with real Linux distro rootfs environments on Apple Silicon hosts. Package-manager and shell workflows need behavior closer to Linux for credentials, script execution, fork/clone state, wait handling, pipes, /proc, and shared mappings.

Preserve dynamic guest UID/GID state in auxv instead of always reporting fixed guest IDs, and allow the initial guest identity to be configured with ELFUSE_GUEST_UID and ELFUSE_GUEST_GID. This lets distro workflows such as apt post-install scripts run with root-like guest credentials when needed.

Probe ELF binaries quietly before falling back to shebang handling, so script execution does not emit misleading "not an ELF" diagnostics.

Extend fork IPC state and child restore handling to carry more complete CPU state, including TLS-related registers, PAC keys, clone flags, child TID handling, TPIDRRO_EL0, TPIDR2_EL0, and the original SPSR. Add child process monitoring so host child exit can wake Linux-style wait and signal behavior.

Align non-fixed file-backed MAP_SHARED mappings to 2 MiB stage-2 boundaries to avoid HVF mapping issues on Apple Silicon.

Improve sysroot symlink creation for absolute guest symlink targets, and add small Linux compatibility behavior for sync_file_range and pipe F_SETNOSIGPIPE.

These changes were tested with an Ubuntu arm64 rootfs using shell pipelines, /proc checks, and apt-get update smoke testing.
a2d33ce to 22d8532
    if (saved_errno == EPIPE)
        signal_queue(LINUX_SIGPIPE);
    errno = saved_errno;
    return linux_errno();
SIGPIPE is no longer queued on EPIPE for sys_write. F_SETNOSIGPIPE in pipe2 only suppresses the host signal; the guest stops seeing SIGPIPE on broken-pipe write. Restore signal_queue(LINUX_SIGPIPE) on EPIPE here.
    }
    if (nr == SYS_write && errno == EPIPE)
        signal_queue(LINUX_SIGPIPE);
    result = linux_errno();
Same SIGPIPE regression on the dispatch fast path. Re-queue LINUX_SIGPIPE when errno == EPIPE before returning linux_errno().
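The two SIGPIPE comments describe the same pattern: suppress the host signal, but still deliver the guest one when the host write fails with EPIPE. The sketch below illustrates it outside the emulator. Every `sketch_` name is hypothetical, the single-slot "queue" only stands in for `signal_queue`, and the host signal is suppressed with `signal(SIGPIPE, SIG_IGN)` for portability rather than the per-descriptor `F_SETNOSIGPIPE` the PR uses.

```c
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <unistd.h>

#define SKETCH_LINUX_SIGPIPE 13 /* Linux signal number for SIGPIPE */

/* Stand-in for the guest signal queue: remembers the last queued signal. */
static int sketch_pending_guest_signal;

static void sketch_signal_queue(int linux_signo)
{
    sketch_pending_guest_signal = linux_signo;
}

/* Write on behalf of the guest. Returns bytes written or a negated host
 * errno (a real implementation would translate to a Linux errno). */
static long sketch_guest_write(int host_fd, const void *buf, size_t len)
{
    ssize_t n = write(host_fd, buf, len);
    if (n >= 0)
        return (long) n;

    int saved_errno = errno;
    /* The host no longer raises SIGPIPE, so the guest-visible signal
     * has to be queued explicitly on a broken pipe. */
    if (saved_errno == EPIPE)
        sketch_signal_queue(SKETCH_LINUX_SIGPIPE);
    return -(long) saved_errno;
}

/* Returns 1 when a write to a pipe with no reader reports EPIPE and
 * queues the guest SIGPIPE, 0 when it does not, -1 on setup failure. */
static int sketch_broken_pipe_demo(void)
{
    int fds[2];
    if (pipe(fds) < 0)
        return -1;
    signal(SIGPIPE, SIG_IGN); /* host-side suppression for this sketch */
    close(fds[0]);            /* no reader left */
    long r = sketch_guest_write(fds[1], "x", 1);
    close(fds[1]);
    return r == -EPIPE &&
           sketch_pending_guest_signal == SKETCH_LINUX_SIGPIPE;
}
```

Saving `errno` before queuing matters because the queueing path may itself make calls that overwrite it.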
    /* Round length up to align size (overflow-safe) */
    if (length > UINT64_MAX - (align - 1))
        return -LINUX_ENOMEM;
Rounding length up to BLOCK_2MIB for file-backed MAP_SHARED turns a 4 KiB shm into a 2 MiB VMA. Tail access past EOF SIGBUSes and the Linux-visible length is wrong. Only 2 MiB-align placement (search start); keep length at PAGE_ALIGN_UP(length, 4 KiB).
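The distinction between aligning placement and aligning length can be sketched as below. The `sketch_` names and constants are hypothetical, not the PR's `mmap` code, and the function only computes the numbers; it does not search the guest address space.

```c
#include <stdint.h>

#define SKETCH_PAGE_SIZE  4096ULL
#define SKETCH_BLOCK_2MIB (2ULL * 1024 * 1024)

/* Round value up to a power-of-two alignment; -1 on overflow. */
static int sketch_align_up(uint64_t value, uint64_t align, uint64_t *out)
{
    if (value > UINT64_MAX - (align - 1))
        return -1;
    *out = (value + align - 1) & ~(align - 1);
    return 0;
}

/* Choose where a non-fixed, file-backed MAP_SHARED mapping would go.
 * Only the start address is aligned to 2 MiB; the length stays
 * page-granular so the guest-visible VMA matches the request. */
static int sketch_place_shared(uint64_t search_start, uint64_t length,
                               uint64_t *addr, uint64_t *map_len)
{
    if (length == 0)
        return -1;
    if (sketch_align_up(length, SKETCH_PAGE_SIZE, map_len) < 0)
        return -1;
    if (sketch_align_up(search_start, SKETCH_BLOCK_2MIB, addr) < 0)
        return -1;
    if (*addr > UINT64_MAX - *map_len) /* end address must not wrap */
        return -1;
    return 0;
}
```

For a 4 KiB request with a search start of `0x10001000`, this gives a start of `0x10200000` and a length of 4096, so no pages beyond the file's end become part of the mapping.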
    }
    } else {
        char dir_host[LINUX_PATH_MAX];
        if (fcntl(dir_ref->fd, F_GETPATH, dir_host) < 0)
fcntl(dir_ref->fd, F_GETPATH, ...) returns EBADF when dirfd is AT_FDCWD, so symlinkat(target="/abs", AT_FDCWD, rel-linkpath) fails with -EBADF. Branch on dir_ref->fd == AT_FDCWD and use getcwd(); fall back to the original guest target if path recovery still fails.
    @@ -762,6 +913,9 @@ static void *thread_create_and_run(void *arg)
    } else {
        WORKER_HV(hv_vcpu_set_sys_reg(vcpu, HV_SYS_REG_TPIDR_EL0, tca->tpidr));
    }
Two lines below this added pauth restore, line 942 still has hv_vcpu_set_reg(vcpu, HV_REG_CPSR, 0) /* EL0t */. Same pattern at vm_clone_thread_run line 1252. fork_child_main was fixed to use regs.spsr_el1; these two in-process worker paths should also set HV_REG_CPSR to tca->spsr so parent NZCV/PSTATE survives the clone return.
        errno = 0;
    } while (kevent(kq, NULL, 0, &kev, 1, NULL) < 0 && errno == EINTR);
    close(kq);
    signal_queue(LINUX_SIGCHLD);
The kqueue loop sets errno=0 then waits; on non-EINTR kevent failure or zero events SIGCHLD is queued anyway. Also: no waitpid/status capture, no pidfd notify, no shutdown hook on exit_group. Gate signal_queue on (ret == 1 && (kev.fflags & NOTE_EXIT)); add pidfd notification; tie monitor lifetime to a shutdown flag.
        pty_keepalive_table[slot].slave_host_fd = slave_host_fd;
    } else {
        if (slave_host_fd >= 0)
            close(slave_host_fd);
Closing slave_host_fd when stale_open_once is false defeats the keepalive's HUP suppression for live entries. If the intent is to let HUP propagate on real child close, split that into an explicit flag distinct from stale_open_once and add a regression test for master HUP behavior.
    - int ngroups = get_cached_linux_groups();
    - if (ngroups < 0)
    -     return linux_errno();
    + const int ngroups = 1;
Returning [proc_get_gid()] fabricates membership in the primary gid (Linux supplementary groups are independent of primary gid). Return ngroups=0 until elfuse implements setgroups.
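For reference, the Linux `getgroups(2)` contract with an empty supplementary list can be sketched as follows. The `sketch_` storage and function are hypothetical; the list stays empty on the assumption that `setgroups` is not implemented yet, and the return value is a negated host errno rather than a translated Linux one.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical supplementary group storage: empty until setgroups exists. */
static uint32_t sketch_groups[1];
static int sketch_ngroups = 0;

/* Follows getgroups(2): size 0 only reports the count, and a size
 * smaller than the count is EINVAL. The primary GID is deliberately
 * not added to the list. */
static int sketch_getgroups(int size, uint32_t *list)
{
    if (size < 0)
        return -EINVAL;
    if (size == 0)
        return sketch_ngroups;
    if (size < sketch_ngroups)
        return -EINVAL;
    for (int i = 0; i < sketch_ngroups; i++)
        list[i] = sketch_groups[i];
    return sketch_ngroups;
}
```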
     * otherwise libc's post-fork canary check observes zeroed guard storage
     * and aborts before the child can exec.
     */
    if (n < max && g->interp_base > 0 &&
Hardcoded BLOCK_2MIB at interp_base may over- or under-copy a future dynamic linker. Track the interpreter's actual load_min..load_max in elf_resolve_interp and emit a region rounded from that. The comment about __stack_chk_guard is also misleading: that symbol normally lives in libc, not ld.so.
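Deriving the region from the interpreter's loadable segments could look roughly like this. The struct and function are hypothetical and take plain `vaddr`/`memsz` pairs (the `p_vaddr` and `p_memsz` of each PT_LOAD header) rather than the project's own ELF types, which are not shown in this thread.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of one PT_LOAD program header. */
typedef struct {
    uint64_t vaddr; /* p_vaddr */
    uint64_t memsz; /* p_memsz */
} sketch_load_seg_t;

/* Compute the page-rounded [lo, hi) span covering all non-empty
 * segments. page must be a power of two. Returns 0 on success,
 * -1 if there is nothing to map or the arithmetic would overflow. */
static int sketch_load_span(const sketch_load_seg_t *segs, size_t n,
                            uint64_t page, uint64_t *lo, uint64_t *hi)
{
    uint64_t min = UINT64_MAX, max = 0;
    for (size_t i = 0; i < n; i++) {
        if (segs[i].memsz == 0)
            continue;
        if (segs[i].vaddr > UINT64_MAX - segs[i].memsz)
            return -1;
        uint64_t end = segs[i].vaddr + segs[i].memsz;
        if (segs[i].vaddr < min)
            min = segs[i].vaddr;
        if (end > max)
            max = end;
    }
    if (max == 0 || max > UINT64_MAX - (page - 1))
        return -1;
    *lo = min & ~(page - 1);
    *hi = (max + page - 1) & ~(page - 1);
    return 0;
}
```

The copied region would then be `interp_base + lo` through `interp_base + hi`, which tracks the actual size of the dynamic linker instead of assuming it fits in one 2 MiB block.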
      * interpret unknown trailing fields.
      */
    -#define IPC_VERSION 11
    +#define IPC_VERSION 13
IPC_VERSION jumps 11 -> 13, skipping wire value 12. The magic-mismatch check still catches old children, but the gap reads like a rebase artifact. Either renumber to 12 or note in the comment that version 12 was rolled into the same release.
IPC_VERSION was removed in #93
Don't touch this portion.
Summary by cubic
Improves Linux distro rootfs behavior on Apple Silicon so package managers and shell pipelines run cleanly. Tightens identity, exec, fork/clone, memory, pipes, /proc, and symlink handling to better match real Linux.
New Features
Initial guest identity is configurable with ELFUSE_GUEST_UID/ELFUSE_GUEST_GID. /proc and getgroups return these values. set*id syscalls follow Linux privileged semantics.
symlinkat rewrites absolute guest targets to a relative path inside the sysroot; pipe2 sets F_SETNOSIGPIPE when available; sync_file_range is stubbed.
Bug Fixes
Written for commit 22d8532. Summary will update on new commits.