Skip to content

Conversation

@c22
Copy link

@c22 c22 commented Jan 29, 2026

This work is based on the previous work by @seruman in their fork https://github.com/seruman/shpool/tree/macos

I tried to retain as much of the existing work done and base my work on top of that.

seruman's fork had socket timeouts disabled on macOS. I wasn't able to reproduce socket timeouts not working on macOS but I was seeing unhandled EINVAL errors on daemon presence probe connections (where the peer closes immediately). These are now handled gracefully, so I have dropped the changes that disabled socket timeouts.

I also ran into a bug where ConfigWatcher was not using canonicalized paths, but inotify et. al. use canonicalized paths. This likely wasn't being ran into much on Linux environments but becomes very obvious on macOS because /tmp is a symlink.

Timing on macOS is indeed tricky, the fix would involve probably significantly re-working the way test are being done, but the workaround I found was to run them in serial.

I think the rest of the changes were relatively minor.

Here's a summary of the changes:

  • fix: handle EINVAL on set_read_timeout for macOS
  • test: fix integration tests for macOS
  • chore: bump shpool_pty to 0.3.2
  • fix: correct test synchronization in ConfigWatcher
  • fix: canonicalize watched paths in ConfigWatcher
  • fix: use into_owned() to avoid double allocation

and the cherry-picked commits from https://github.com/seruman/shpool/tree/macos

  • use platform agnostic watcher instead of inotify
  • add macos/bsd specific passwd fields
  • use std::env::current_exe instead of proc fs
  • add macos specific peer creds handling
  • use getpeereid instead of LOCAL_PEERCRED socket opts

@google-cla
Copy link

google-cla bot commented Jan 29, 2026

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

seruman and others added 11 commits January 29, 2026 17:23
basically the same according to manual
File system watchers (inotify, FSEvents) report canonical paths, but ConfigWatcher was storing paths as provided by the caller. When a path contained symlinks (e.g., /var -> /private/var on macOS), event paths wouldn't match stored paths, causing config changes to be missed.

Add canonicalize_path() to resolve symlinks in the existing portion of a path (handling paths where the final components don't exist yet), and apply it when storing paths in the watcher.

Include a regression test that creates a symlink and verifies events are received when watching through the symlink.
The worker_ready() test helper signals when the worker is idle, but it was signaling even when there was a pending reload_deadline. This could cause tests to proceed before the debounce timeout fired, leading to flaky test results.

Only signal idle when there's no pending work (reload_deadline is None).

Update the debounce test to not call worker_ready() between writes, as that now correctly waits for the reload to complete.
This version includes macOS support via ptsname (vs ptsname_r).

See: shell-pool/shpool_pty#4
- Forward test_hooks feature from shpool to libshpool crate
- Shorten socket paths to avoid macOS 104-byte sun_path limit
- Forward HOME and PATH env vars to attach subprocess (macOS typically lacks XDG_RUNTIME_DIR, needs HOME instead)
- Use std::env::var directly to avoid unused import warning
- Document that tests should run with --test-threads=1 on macOS due to FSEvents timing behavior under concurrent load
macOS returns EINVAL when setting socket timeout on a connection where the peer has already closed (documented in setsockopt(2)).

This typically happens with daemon presence probe connections.
@c22 c22 marked this pull request as ready for review January 29, 2026 06:24
@c22 c22 mentioned this pull request Jan 29, 2026
// closed (e.g., a daemon presence probe). This is documented in the
// macOS setsockopt(2) man page. Treat this the same as a broken pipe.
#[cfg(target_os = "macos")]
if let Err(e) = stream.set_read_timeout(Some(consts::SOCK_STREAM_TIMEOUT)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This block could be simplified by removing the #[cfg(not(target_os = "macos"))] block below and moving the #[cfg(target_os = "macos")] block to only be attached to the if statement.

In fact, since it looks like on linux EINVAL is only returned when the input is malformed, and we're using a safe wrapper that should never malform input, we could probably get away with not even including the cfg annotation. It's probably better to keep it just to be explicit though, since this would do the wrong thing on linux if we somehow did get EINVAL.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I originally had the EINVAL check as something which ran on all platforms but I was being particularly careful not to affect other platforms with potentially BSD 4.4/macOS specific stuff, mainly because I couldn't really find a definitive answer on what conditions we would see EINVAL on other platforms (even the macOS man pages leave a lot unanswered questions for me).

I'll take another swing at this section.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was being particularly careful not to affect other platforms with potentially BSD 4.4/macOS specific stuff

Yeah, probably a good idea. Thanks for being careful about that kind of thing.

Copy link
Contributor

@ethanpailes ethanpailes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks pretty good to me. The main merge blockers are the lack of safety comments and the fact that I still have to patch this in on my macbook and run the tests.

We should also consider adding support for running the tests under macos in CI. I'm not sure how hard that will be to set up. I think it would be fair to punt on doing that until a followup.


let mut peer_uid: libc::uid_t = 0;
let mut peer_gid: libc::gid_t = 0;
unsafe {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a // Safety: comment that explains why this unsafe block is safe. In this case, we probably just need to say that "sock" is a valid fd, and peer_{uid,gid} are valid references to the stack for the duration of the call.


let mut peer_pid: libc::pid_t = 0;
let mut len = std::mem::size_of::<libc::pid_t>() as libc::socklen_t;
unsafe {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same thing about a // Safety comment.

@ethanpailes
Copy link
Contributor

I just patched this branch in on my macbook and ran the tests, including the single-threading flag. Unfortunately I saw a bunch of errors.

@c22 is this expected? I would be ok with merging even with these errors since it's an improvement over the status quo, but we'll need to add something to the README explaining that while shpool builds on macos it's not yet an officially supported platform since the test suite doesn't pass yet.

@c22
Copy link
Author

c22 commented Jan 29, 2026

@c22 is this expected?

Ah yeah sorry, I should have been more explicit about that. A bunch of integration tests failed for me as well and bounce is still probably a bit flaky on macOS even in single threaded testing. I will diff against my side later today and see if anything stands out between your results.

@ethanpailes
Copy link
Contributor

A bunch of integration tests failed for me as well and bounce is still probably a bit flaky on macOS even in single threaded testing.

Got it. Where you thinking of trying to fix them all before merge or do you want to just get something merged and then fix the remaining tests on mac in followups? Like I said, since all the linux tests are passing I'm happy to merge even with some tests broken on mac.

@c22
Copy link
Author

c22 commented Jan 30, 2026

I checked my end and the exact same tests were failing on my machine as well, so yes, that was expected.

Where you thinking of trying to fix them all before merge or do you want to just get something merged and then fix the remaining tests on mac in followups?

I was thinking of leaving them failed for now as they were also failing/skipped on seruman's builds.

With some quick-fixes and workarounds, I was able to get 8 more tests to pass (2 are related to hardcoded fish/zsh paths, 6 are related to aggressive timing, eg. 500ms on Linux -> 2500ms on macOS). 4 more tests are MOTD pager related, I didn't look into them too deeply.

5 of the tests are re-attachment failures, these are a bit more concerning to me, because re-attachment is probably a key part of my use-case on Mac. Fixing that locally required changing the way shutdown() works in shell.rs to handle ENOTCONN errors on macOS.

I'm happy for you guys to take this as-is if you like (with the unsafe comments added), but if you're receptive to another fix, I can incorporate it as part of this change as well, I just probably won't get to it until after the weekend.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants