From e44e4eb9285d48b030b5784e805de87dd10a1292 Mon Sep 17 00:00:00 2001
From: "W. Trevor King"
Date: Wed, 17 May 2017 22:54:13 -0700
Subject: [PATCH 1/2] config: Make capabilities and noNewPrivileges Linux-only (again)

Roll back the genericization from 718f9f3f (minor narrative cleanup
regarding config compatibility, 2017-01-30, #673). Lifting the
restriction there seems to have been motivated by "Solaris supports
capabilities", but that was before the split into a capabilities
object which happened in eb114f05 (Add ambient and bounding capability
support, 2017-02-02, #675). It's not clear if Solaris supports ambient
caps, or what Solaris API noNewPrivileges was punting to [1].

And John Howard has recently confirmed that Windows does not support
capabilities and is unlikely to do so in the future [2]. He also
confirmed that Windows does not support rlimits [3]. John's statement
didn't directly address noNewPrivileges, but we can always restore any
of these properties to the Solaris/Windows platforms if/when we get
docs about which API we're punting to on those platforms.

Also add some backticks, remove the hyphens in "OPTIONAL) - the",
standardize lines I touch to use "the process" [4], and use four-space
indents here to keep Pandoc happy (see 7795661 (runtime.md: Fix
sub-bullet indentation, 2016-06-08, #495)).

[1]: https://github.com/opencontainers/runtime-spec/pull/673#discussion_r99353136
[2]: https://github.com/opencontainers/runtime-spec/pull/810#issuecomment-301594590
[3]: https://github.com/opencontainers/runtime-spec/pull/835#issuecomment-303455386
[4]: https://github.com/opencontainers/runtime-spec/pull/809#discussion_r116297660

Signed-off-by: W. Trevor King
---
 config.md | 25 ++++++++++++-------------
 1 file changed, 12 insertions(+), 13 deletions(-)

diff --git a/config.md b/config.md
index 78de158cf..8eb2e3414 100644
--- a/config.md
+++ b/config.md
@@ -145,16 +145,6 @@ For all platform-specific configuration values, the scope defined below in the [
 * **`env`** (array of strings, OPTIONAL) with the same semantics as [IEEE Std 1003.1-2001's `environ`][ieee-1003.1-2001-xbd-c8.1].
 * **`args`** (array of strings, REQUIRED) with similar semantics to [IEEE Std 1003.1-2001 `execvp`'s *argv*][ieee-1003.1-2001-xsh-exec].
   This specification extends the IEEE standard in that at least one entry is REQUIRED, and that entry is used with the same semantics as `execvp`'s *file*.
-* **`capabilities`** (object, OPTIONAL) is an object containing arrays that specifies the sets of capabilities for the process(es) inside the container.
-  Valid values are platform-specific.
-  For example, valid values for Linux are defined in the [capabilities(7)][capabilities.7] man page, such as `CAP_CHOWN`.
-  Any value which cannot be mapped to a relevant kernel interface MUST cause an error.
-  capabilities contains the following properties:
-  * **`effective`** (array of strings, OPTIONAL) - the `effective` field is an array of effective capabilities that are kept for the process.
-  * **`bounding`** (array of strings, OPTIONAL) - the `bounding` field is an array of bounding capabilities that are kept for the process.
-  * **`inheritable`** (array of strings, OPTIONAL) - the `inheritable` field is an array of inheritable capabilities that are kept for the process.
-  * **`permitted`** (array of strings, OPTIONAL) - the `permitted` field is an array of permitted capabilities that are kept for the process.
-  * **`ambient`** (array of strings, OPTIONAL) - the `ambient` field is an array of ambient capabilities that are kept for the process.
 * **`rlimits`** (array of objects, OPTIONAL) allows setting resource limits for a process inside the container.
   Each entry has the following structure:
 
@@ -165,13 +155,22 @@ For all platform-specific configuration values, the scope defined below in the [
 
   If `rlimits` contains duplicated entries with same `type`, the runtime MUST error out.
 
-* **`noNewPrivileges`** (bool, OPTIONAL) setting `noNewPrivileges` to true prevents the processes in the container from gaining additional privileges.
-  As an example, the ['no_new_privs'][no-new-privs] article in the kernel documentation has information on how this is achieved using a prctl system call on Linux.
-
 For Linux-based systems the process structure supports the following process-specific fields.
 
 * **`apparmorProfile`** (string, OPTIONAL) specifies the name of the AppArmor profile to be applied to processes in the container.
   For more information about AppArmor, see [AppArmor documentation][apparmor].
+* **`capabilities`** (object, OPTIONAL) is an object containing arrays that specifies the sets of capabilities for the process.
+    Valid values are defined in the [capabilities(7)][capabilities.7] man page, such as `CAP_CHOWN`.
+    Any value which cannot be mapped to a relevant kernel interface MUST cause an error.
+    `capabilities` contains the following properties:
+
+    * **`effective`** (array of strings, OPTIONAL) the `effective` field is an array of effective capabilities that are kept for the process.
+    * **`bounding`** (array of strings, OPTIONAL) the `bounding` field is an array of bounding capabilities that are kept for the process.
+    * **`inheritable`** (array of strings, OPTIONAL) the `inheritable` field is an array of inheritable capabilities that are kept for the process.
+    * **`permitted`** (array of strings, OPTIONAL) the `permitted` field is an array of permitted capabilities that are kept for the process.
+    * **`ambient`** (array of strings, OPTIONAL) the `ambient` field is an array of ambient capabilities that are kept for the process.
+* **`noNewPrivileges`** (bool, OPTIONAL) setting `noNewPrivileges` to true prevents the process from gaining additional privileges.
+    As an example, the [`no_new_privs`][no-new-privs] article in the kernel documentation has information on how this is achieved using a `prctl` system call on Linux.
 * **`oomScoreAdj`** *(int, OPTIONAL)* adjusts the oom-killer score in `[pid]/oom_score_adj` for the container process's `[pid]` in a [proc pseudo-filesystem][procfs].
     If `oomScoreAdj` is set, the runtime MUST set `oom_score_adj` to the given value.
     If `oomScoreAdj` is not set, the runtime MUST NOT change the value of `oom_score_adj`.

From cb8df7b0c568fdc928d1ebca0327b18218d50b5c Mon Sep 17 00:00:00 2001
From: "W. Trevor King"
Date: Tue, 23 May 2017 10:28:50 -0700
Subject: [PATCH 2/2] config: Make rlimits POSIX-specific

This property was initially Linux-specific. 718f9f3f (minor narrative
cleanup regarding config compatibility, 2017-01-30, #673) removed the
Linux restriction, but the rlimit concept is from POSIX and Windows
doesn't support it [1].

This commit adds new subsections for the POSIX-specific and
Linux-specific process entries (to match the approach we currently use
for process.user), and punts to POSIX for the Solaris values and
compliance testing approach. If/when we get a Solaris-specific doc for
valid values, we can replace the POSIX punt there, but we probably
want to continue punting to POSIX for getrlimit(3)-based compliance
testing.

I've renamed the overly-specific LinuxRlimit to POSIXRlimit. We could
use the generic Rlimit, but then we'd be stuck if/when Windows adds
support for some rlimit-like thing that doesn't match up cleanly
enough for us to use the POSIX structure.

[1]: https://github.com/opencontainers/runtime-spec/pull/835#issuecomment-303455386

Signed-off-by: W. Trevor King
---
 config.md          | 33 +++++++++++++++++++++++++--------
 specs-go/config.go |  6 +++---
 2 files changed, 28 insertions(+), 11 deletions(-)

diff --git a/config.md b/config.md
index 8eb2e3414..16234f4eb 100644
--- a/config.md
+++ b/config.md
@@ -145,17 +145,33 @@ For all platform-specific configuration values, the scope defined below in the [
 * **`env`** (array of strings, OPTIONAL) with the same semantics as [IEEE Std 1003.1-2001's `environ`][ieee-1003.1-2001-xbd-c8.1].
 * **`args`** (array of strings, REQUIRED) with similar semantics to [IEEE Std 1003.1-2001 `execvp`'s *argv*][ieee-1003.1-2001-xsh-exec].
   This specification extends the IEEE standard in that at least one entry is REQUIRED, and that entry is used with the same semantics as `execvp`'s *file*.
-* **`rlimits`** (array of objects, OPTIONAL) allows setting resource limits for a process inside the container.
+
+### Linux and Solaris Process
+
+For POSIX-based systems (Linux and Solaris), the `process` object supports the following process-specific properties:
+
+* **`rlimits`** (array of objects, OPTIONAL) allows setting resource limits for the process.
   Each entry has the following structure:
 
-  * **`type`** (string, REQUIRED) - the platform resource being limited, for example on Linux as defined in the [setrlimit(2)][setrlimit.2] man page.
-  * **`soft`** (uint64, REQUIRED) - the value of the limit enforced for the corresponding resource.
-  * **`hard`** (uint64, REQUIRED) - the ceiling for the soft limit that could be set by an unprivileged process.
-    Only a privileged process (e.g. under Linux: one with the CAP_SYS_RESOURCE capability) can raise a hard limit.
+    * **`type`** (string, REQUIRED) the platform resource being limited.
+        * Linux: valid values are defined in the [`getrlimit(2)`][getrlimit.2] man page, such as `RLIMIT_MSGQUEUE`.
+        * Solaris: valid values are defined in the [`getrlimit(3)`][getrlimit.3] man page, such as `RLIMIT_CORE`.
+
+        The runtime MUST [generate an error](runtime.md#errors) for any values which cannot be mapped to a relevant kernel interface.
+        For each entry in `rlimits`, a [`getrlimit(3)`][getrlimit.3] on `type` MUST succeed.
+        For the following properties, `rlim` refers to the status returned by the `getrlimit(3)` call.
+
+    * **`soft`** (uint64, REQUIRED) the value of the limit enforced for the corresponding resource.
+        `rlim.rlim_cur` MUST match the configured value.
+    * **`hard`** (uint64, REQUIRED) the ceiling for the soft limit that could be set by an unprivileged process.
+        `rlim.rlim_max` MUST match the configured value.
+        Only a privileged process (e.g. one with the `CAP_SYS_RESOURCE` capability) can raise a hard limit.
+
+    If `rlimits` contains duplicated entries with the same `type`, the runtime MUST [generate an error](runtime.md#errors).
 
-  If `rlimits` contains duplicated entries with same `type`, the runtime MUST error out.
+### Linux Process
 
-For Linux-based systems the process structure supports the following process-specific fields.
+For Linux-based systems, the `process` object supports the following process-specific properties.
 
 * **`apparmorProfile`** (string, OPTIONAL) specifies the name of the AppArmor profile to be applied to processes in the container.
   For more information about AppArmor, see [AppArmor documentation][apparmor].
@@ -862,7 +878,8 @@ Here is a full example `config.json` for reference.
 [mount.8]: http://man7.org/linux/man-pages/man8/mount.8.html
 [mount.8-filesystem-independent]: http://man7.org/linux/man-pages/man8/mount.8.html#FILESYSTEM-INDEPENDENT_MOUNT%20OPTIONS
 [mount.8-filesystem-specific]: http://man7.org/linux/man-pages/man8/mount.8.html#FILESYSTEM-SPECIFIC_MOUNT%20OPTIONS
-[setrlimit.2]: http://man7.org/linux/man-pages/man2/setrlimit.2.html
+[getrlimit.2]: http://man7.org/linux/man-pages/man2/getrlimit.2.html
+[getrlimit.3]: http://pubs.opengroup.org/onlinepubs/9699919799/functions/getrlimit.html
 [stdin.3]: http://man7.org/linux/man-pages/man3/stdin.3.html
 [uts-namespace.7]: http://man7.org/linux/man-pages/man7/namespaces.7.html
 [zonecfg.1m]: http://docs.oracle.com/cd/E86824_01/html/E54764/zonecfg-1m.html
diff --git a/specs-go/config.go b/specs-go/config.go
index 595fbb4bd..fe0d8ac6c 100644
--- a/specs-go/config.go
+++ b/specs-go/config.go
@@ -47,7 +47,7 @@ type Process struct {
 	// Capabilities are Linux capabilities that are kept for the process.
 	Capabilities *LinuxCapabilities `json:"capabilities,omitempty" platform:"linux"`
 	// Rlimits specifies rlimit options to apply to the process.
-	Rlimits []LinuxRlimit `json:"rlimits,omitempty" platform:"linux"`
+	Rlimits []POSIXRlimit `json:"rlimits,omitempty" platform:"linux,solaris"`
 	// NoNewPrivileges controls whether additional privileges could be gained by processes in the container.
 	NoNewPrivileges bool `json:"noNewPrivileges,omitempty" platform:"linux"`
 	// ApparmorProfile specifies the apparmor profile for the container.
@@ -214,8 +214,8 @@ type LinuxIDMapping struct {
 	Size uint32 `json:"size"`
 }
 
-// LinuxRlimit type and restrictions
-type LinuxRlimit struct {
+// POSIXRlimit type and restrictions
+type POSIXRlimit struct {
 	// Type of the rlimit to set
 	Type string `json:"type"`
 	// Hard is the hard limit for the specified type