Contributor

@Shubhamag12 Shubhamag12 commented Jan 16, 2026

Context: https://ibm.ent.box.com/file/2102027246213

Description

Problem Statement

Currently, ai-services only supports execution as the root user.

Solution

This PR enables ai-services to run as a non-root user. The tool now supports a hybrid approach:

  • bootstrap (validate/configure): Runs with sudo for one-time system setup
  • runtime operations (application commands): Run as a regular user without elevated privileges

User Setup Requirements

To use ai-services as a non-root user, the following one-time setup is required:

1. Add user to sudo group (wheel)

sudo usermod -aG wheel <username>

On systems where the wheel group is granted sudo rights (the default on RHEL-family distributions), this lets the user run commands with sudo when needed.

2. Enable systemd user session persistence

sudo loginctl enable-linger <username>

This ensures the user's systemd instance persists even when the user is not logged in, which is required for:

  • Rootless podman socket to remain active
  • User services to run in the background
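
A minimal sketch of how a tool could verify the linger prerequisite before starting user services. It assumes `loginctl show-user <username> --property=Linger` is available and prints `Linger=yes` or `Linger=no`; the helper names `lingerEnabled` and `checkLinger` are hypothetical and not part of this PR.

```go
package main

import (
	"fmt"
	"os/exec"
	"os/user"
	"strings"
)

// lingerEnabled reports whether loginctl output contains the line "Linger=yes".
func lingerEnabled(output string) bool {
	for _, line := range strings.Split(output, "\n") {
		if strings.TrimSpace(line) == "Linger=yes" {
			return true
		}
	}
	return false
}

// checkLinger (hypothetical helper) asks loginctl whether linger is enabled
// for the given user. loginctl may fail if the user has no session and no
// linger; that case is returned as an error here.
func checkLinger(username string) (bool, error) {
	out, err := exec.Command("loginctl", "show-user", username, "--property=Linger").CombinedOutput()
	if err != nil {
		return false, fmt.Errorf("loginctl show-user failed: %w, output: %s", err, out)
	}
	return lingerEnabled(string(out)), nil
}

func main() {
	u, err := user.Current()
	if err != nil {
		fmt.Println("cannot determine current user:", err)
		return
	}
	ok, err := checkLinger(u.Username)
	if err != nil {
		fmt.Println("could not check linger:", err)
		return
	}
	fmt.Printf("linger enabled for %s: %t\n", u.Username, ok)
}
```

If the check reports false, the fix is the `sudo loginctl enable-linger <username>` step described above.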

Implementation Details

Rootless Podman

  • Containers run as the regular user on the host (not root)
  • Uses user namespaces: processes appear as "root" inside containers but are actually the user's UID on the host
  • Podman socket created at /run/user/<UID>/podman/podman.sock (user-specific, not system-wide)
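
To illustrate the socket location, here is a small sketch that picks a Podman socket URI from the effective UID. The rootless path comes from the description above; the root path `/run/podman/podman.sock` is Podman's usual system-wide socket and is an assumption about how the tool treats root. The function name `podmanSocketURI` is hypothetical.

```go
package main

import (
	"fmt"
	"os"
)

// podmanSocketURI returns the Podman API socket URI for an effective UID.
// Root is assumed to use the system-wide socket; any other user gets the
// per-user socket under /run/user/<UID>.
func podmanSocketURI(euid int) string {
	if euid == 0 {
		return "unix:///run/podman/podman.sock"
	}
	return fmt.Sprintf("unix:///run/user/%d/podman/podman.sock", euid)
}

func main() {
	fmt.Println(podmanSocketURI(os.Geteuid()))
}
```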

Directory Permissions

  • Bootstrap creates directories with ownership set to $SUDO_USER
  • Regular user has full read/write access to application directories
  • No root privileges required for runtime operations
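
The ownership step could look roughly like the sketch below: create a directory while running under sudo, then hand it to the user named in SUDO_USER. This is an illustration under assumptions, not the PR's code; `parseOwner`, `createOwnedDir`, and the example path are hypothetical.

```go
package main

import (
	"fmt"
	"os"
	"os/user"
	"path/filepath"
	"strconv"
)

// parseOwner converts the textual UID and GID from os/user into integers.
func parseOwner(uidStr, gidStr string) (int, int, error) {
	uid, err := strconv.Atoi(uidStr)
	if err != nil {
		return 0, 0, fmt.Errorf("invalid uid %q: %w", uidStr, err)
	}
	gid, err := strconv.Atoi(gidStr)
	if err != nil {
		return 0, 0, fmt.Errorf("invalid gid %q: %w", gidStr, err)
	}
	return uid, gid, nil
}

// createOwnedDir creates dir and, if SUDO_USER is set, changes the owner of
// dir to that user. Only dir itself is chowned, not any parents MkdirAll made.
func createOwnedDir(dir string) error {
	if err := os.MkdirAll(dir, 0o755); err != nil {
		return fmt.Errorf("create %s: %w", dir, err)
	}
	name := os.Getenv("SUDO_USER")
	if name == "" {
		return nil // not invoked through sudo; keep current ownership
	}
	u, err := user.Lookup(name)
	if err != nil {
		return fmt.Errorf("lookup %s: %w", name, err)
	}
	uid, gid, err := parseOwner(u.Uid, u.Gid)
	if err != nil {
		return err
	}
	if err := os.Chown(dir, uid, gid); err != nil {
		return fmt.Errorf("chown %s: %w", dir, err)
	}
	return nil
}

func main() {
	// Hypothetical directory, used only for illustration.
	dir := filepath.Join(os.TempDir(), "ai-services-example")
	if err := createOwnedDir(dir); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("created", dir)
}
```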

Environment Variables

  • XDG_RUNTIME_DIR: Automatically set if missing (required for user systemd operations)
  • SUDO_USER: Used during bootstrap to determine the actual user for ownership settings
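
A sketch of the XDG_RUNTIME_DIR fallback, assuming the conventional per-user runtime directory `/run/user/<UID>` and that the variable is only filled in for non-root users when it is empty. The function name `runtimeDir` is hypothetical.

```go
package main

import (
	"fmt"
	"os"
)

// runtimeDir returns the value to use for XDG_RUNTIME_DIR. An existing value
// is kept as-is; for a non-root user with no value set, the conventional
// /run/user/<UID> directory is assumed. Root with no value gets "".
func runtimeDir(euid int, current string) string {
	if current != "" || euid == 0 {
		return current
	}
	return fmt.Sprintf("/run/user/%d", euid)
}

func main() {
	dir := runtimeDir(os.Geteuid(), os.Getenv("XDG_RUNTIME_DIR"))
	if dir != "" {
		if err := os.Setenv("XDG_RUNTIME_DIR", dir); err != nil {
			fmt.Println("could not set XDG_RUNTIME_DIR:", err)
			return
		}
	}
	fmt.Println("XDG_RUNTIME_DIR =", dir)
}
```

Setting the variable in the process environment means child processes such as `systemctl --user` inherit it.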

Q&A

Q: Why do we need to add the non-root user to the sudo group?
A: Some commands, such as ai-services bootstrap, still need sudo because they configure and modify the environment, which requires root permissions.

Q: Which commands would be run without sudo?
A: Every command except bootstrap runs without sudo.

@Shubhamag12 Shubhamag12 changed the title from "Non root user" to "enable ai-services to run as non-root-user" Jan 16, 2026
@Shubhamag12 Shubhamag12 force-pushed the non-root-user branch 2 times, most recently from 3a07de7 to eea51b8 Compare January 16, 2026 08:27
@Shubhamag12 Shubhamag12 marked this pull request as ready for review January 16, 2026 09:21
@Shubhamag12 Shubhamag12 requested a review from manju956 January 19, 2026 09:06
return fmt.Errorf("current user is not root (EUID: %d)", euid)
if euid != 0 && os.Getenv("XDG_RUNTIME_DIR") == "" {
uid := os.Getuid()
logger.Infoln("running command as %s", uid, logger.VerbosityLevelDebug)
Member

Do we need to set the user ID in XDG_RUNTIME_DIR? Also, why do we need to set this at all?

Member

From what I found online, the recommendation is not to set this value explicitly, though I don't know much about this :)

Contributor Author

  1. Usually it is set automatically, but I noticed one LPAR where it was not set, which is why we configure it if it is missing.
  2. XDG_RUNTIME_DIR points to the directory that holds the user-specific runtime files.
  3. systemctl --user needs to connect to the user's systemd instance. Without this variable, systemctl doesn't know where to find the socket, and we get a "No medium found" error.

Contributor

Does setting the user ID in XDG_RUNTIME_DIR have any implications?

Contributor Author

Okay, it looks like it's not needed if we enable linger (loginctl enable-linger <username>), which is required anyway and will be mentioned in the documentation.

Contributor Author

I will remove this from the code and push the changes.

Contributor Author

Wait, my bad. While creating the application we get:

failed to pull image registry.redhat.io/rhaiis/vllm-spyre-rhel9:3.2.5: reading JSON file "/run/containers/1001/auth.json": open /run/containers/1001/auth.json: permission denied

This happens because it is not able to look up the directory. So yes, this is required.

Apologies for the confusion.

Contributor

Ok

Comment on lines +81 to +86
func userPodmanURI(uid int) string {
	return fmt.Sprintf(
		"unix:///run/user/%d/podman/podman.sock",
		uid,
	)
}
Member

Just checking: did all the pod deployments work fine as non-root?
Did you test by deploying the RAG application?

Contributor Author

I tried this on a setup with 3 Spyre cards. All pods were healthy except vllm, which requires 4 cards for instruct and 1 for the reranker. I configured instruct to use 2 cards, which is why it was unhealthy.

cmd := exec.Command("sudo", "ppc64_cpu", "--smt")
out, err := cmd.CombinedOutput()
if err != nil {
	return fmt.Errorf("failed to check current SMT level: %v, output: %s", err, string(out))
}
Member

If we are using sudo, will this prompt the user for a password here?

Contributor Author

If passwordless sudo is enabled for the user, then no.
Otherwise, it will ask for a password for all sudo operations.

but I think here it might fail, as exec.Command doesn't attach to terminal stdin by default afaik. Let me confirm and post results here

Contributor Author

> but I think here it might fail, as exec.Command doesn't attach to terminal stdin by default afaik.

Confirmed. I have disabled the spinner for the SMT level check, since we might need user input for this.
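
For reference, a hedged sketch of the two options discussed in this thread. In Go, an `exec.Cmd` whose `Stdin` is nil reads from the null device, so a caller that wants interactive input has to attach `os.Stdin` itself. Separately, `sudo -n` makes sudo fail instead of prompting, which can be used to detect whether a password would be required. The helper names `sudoArgs` and `runWithSudo` are hypothetical, not from this PR.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// sudoArgs builds the arguments passed to sudo. With nonInteractive set, -n is
// added so sudo exits with an error rather than prompting for a password.
func sudoArgs(nonInteractive bool, command ...string) []string {
	args := []string{}
	if nonInteractive {
		args = append(args, "-n")
	}
	return append(args, command...)
}

// runWithSudo (hypothetical helper) runs a command through sudo with the
// parent's stdin and stderr attached, and returns the command's stdout.
// Example call: runWithSudo("ppc64_cpu", "--smt")
func runWithSudo(command ...string) (string, error) {
	cmd := exec.Command("sudo", sudoArgs(false, command...)...)
	cmd.Stdin = os.Stdin   // allow interactive input if sudo needs it
	cmd.Stderr = os.Stderr // keep prompts and errors visible to the user
	out, err := cmd.Output()
	if err != nil {
		return "", fmt.Errorf("sudo %v failed: %w", command, err)
	}
	return string(out), nil
}

func main() {
	// `sudo -n true` succeeds only when no password prompt is needed.
	if err := exec.Command("sudo", sudoArgs(true, "true")...).Run(); err != nil {
		fmt.Println("sudo would prompt for a password (or is unavailable):", err)
		return
	}
	fmt.Println("sudo can run without a prompt")
}
```

Checking with `-n` first would let the tool decide up front whether to show a spinner or leave the terminal free for a password prompt.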

Contributor

We need to enable passwordless sudo for the user so that no user intervention is needed.

Contributor Author

I don't think we should do that.
It's up to the admin how the user is set up, passwordless or with a password.

Member

@mayuka-c mayuka-c Jan 21, 2026

So now the create command also requires a sudo user, right, because it sets the SMT level?
If I'm correct, wouldn't this defeat the purpose of what we wanted to achieve?

We wanted bootstrap to be run by either root or a sudo user, and create by a normal user, right?

Contributor Author

The user will have permission to run sudo anyway. Moving this to configure.go is not possible because we want to keep the SMT value configurable per application, which is why I thought of adding sudo.
CC: @mkumatag

Contributor Author

Let me know if there is any other alternative

Member

Not sure. Maybe we could set a default constant value during bootstrap.
For now, I guess we need to document that only the root user and non-root users with sudo privileges can run this.

Member

Yes, SMT will be set per application template, as needed.

Signed-off-by: sagarwal-ibm <sagarwal@ibm.com>
Member

@mkumatag mkumatag left a comment

Initial comments: the overall flow looks really complicated. I might need some more time to go through this code and the description.

Signed-off-by: sagarwal-ibm <sagarwal@ibm.com>

address comments
Member

mkumatag commented Feb 2, 2026

Please squash and merge the changes after @mayuka-c's approval.

Member

mayuka-c commented Feb 3, 2026

@Shubhamag12 is currently testing it. Once that is done, we can merge.

4 participants