protect_exec(3) is intended to provide a single execve-like function that accepts the following arguments:
- SquashFS file path
- Path at which to mount the SquashFS file (a.k.a. root path)
- Path to the root of the desired cgroup filesystem
- UID under which to execute the program
- Arguments to pass to execve(2)
and performs the following steps:
1. Link a loopback device to the SquashFS file
2. Mount that loopback device at /, tmpfs at /db, and all automatic /etc/fstab entries (relative to new root path)
3. Call clone(2) with CLONE_NEWNS, CLONE_NEWNET, CLONE_NEWIPC, and CLONE_NEWUTS options set
4. Join the specified cgroup
5. pivot_root(2) into the new root
6. Change the namespaces to their desired configuration (e.g. unmount everything not from step 2, remove all interfaces)
7. Perform setuid(2) with the specified UID
8. execve(2) the specified program
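The clone, pivot_root, setuid, and execve steps can be sketched roughly as follows. This is a minimal illustration and not the actual protect_exec(3) implementation: struct protect_args, protect_child(), protect_spawn(), and PROTECT_CLONE_FLAGS are hypothetical names, the loopback, mount, cgroup, and namespace cleanup steps are omitted, and the sketch assumes the SquashFS image is already mounted at new_root.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/mount.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical argument bundle; the real signature may differ. */
struct protect_args {
    const char *new_root;  /* where the SquashFS image is mounted */
    uid_t uid;             /* UID to run the program as */
    char *const *argv;     /* argv[0] is the program path */
    char *const *envp;     /* environment for the program */
};

/* The namespace flags listed in the clone(2) step. */
#define PROTECT_CLONE_FLAGS \
    (CLONE_NEWNS | CLONE_NEWNET | CLONE_NEWIPC | CLONE_NEWUTS)

/* Runs in the child. Returns (with a nonzero status) only on failure. */
int protect_child(void *arg)
{
    const struct protect_args *a = arg;

    /* Stop mount changes from propagating back to the parent namespace;
     * pivot_root(2) also rejects mounts with shared propagation. */
    if (mount(NULL, "/", NULL, MS_REC | MS_PRIVATE, NULL) != 0) return 1;

    /* glibc provides no pivot_root wrapper, so use the raw syscall.
     * pivot_root(".", ".") stacks the old root over the new one,
     * which is then detached (see the notes in pivot_root(2)). */
    if (chdir(a->new_root) != 0) return 1;
    if (syscall(SYS_pivot_root, ".", ".") != 0) return 1;
    if (umount2(".", MNT_DETACH) != 0) return 1;
    if (chdir("/") != 0) return 1;

    /* Drop to the requested UID, then replace the process image. */
    if (setuid(a->uid) != 0) return 1;
    execve(a->argv[0], a->argv, a->envp);
    return 1;
}

/* Starts the child in fresh namespaces. Returns its PID, or -1 on error. */
pid_t protect_spawn(struct protect_args *a)
{
    const size_t stack_size = 1024 * 1024;
    char *stack;
    pid_t pid;

    assert(a != NULL && a->argv != NULL && a->argv[0] != NULL);
    stack = malloc(stack_size);
    if (stack == NULL) return -1;

    /* The stack grows downward on common architectures, so pass its top.
     * SIGCHLD lets the parent waitpid(2) for the child. */
    pid = clone(protect_child, stack + stack_size,
                PROTECT_CLONE_FLAGS | SIGCHLD, a);

    /* Without CLONE_VM the child has its own copy of the stack. */
    free(stack);
    return pid;
}
```

A caller would fill in a struct protect_args, call protect_spawn(), and then waitpid(2) on the returned PID. Note that the sketch only changes the UID, as the steps describe; a real implementation may also want to handle group IDs.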
protect_exec(3) must be called by a process that has the following capabilities:
- CAP_SYS_ADMIN
- CAP_SYS_CHROOT
- CAP_SETUID
and write access to the cgroup tasks file. In any case, root meets these requirements and is likely the simplest option.
Presently, the only way to specify which device nodes should be created is to list a devtmpfs in /etc/fstab. We could later add support for a CPIO-style node table called /etc/nodtab or similar.