diff --git a/.github/workflows/egress-test.yaml.yml b/.github/workflows/egress-test.yaml.yml index 675634b9..102f9556 100644 --- a/.github/workflows/egress-test.yaml.yml +++ b/.github/workflows/egress-test.yaml.yml @@ -50,3 +50,23 @@ jobs: run: | chmod +x tests/smoke-nft.sh ./tests/smoke-nft.sh + + - name: Run dynamic ip test + working-directory: components/egress + run: | + chmod +x tests/smoke-dynamic-ip.sh + ./tests/smoke-dynamic-ip.sh + + bench: + runs-on: ubuntu-latest + steps: + - name: Checkout code + uses: actions/checkout@v4 + + - name: Run bench test + working-directory: components/egress + run: | + chmod +x tests/bench-dns-nft.sh + ./tests/bench-dns-nft.sh + env: + BENCH_SAMPLE_SIZE: "20" diff --git a/components/egress/README.md b/components/egress/README.md index 71f7b0f5..d6b5dc60 100644 --- a/components/egress/README.md +++ b/components/egress/README.md @@ -10,6 +10,7 @@ The **Egress Sidecar** is a core component of OpenSandbox that provides **FQDN-b - **FQDN-based Allowlist**: Control outbound traffic by domain name (e.g., `api.github.com`). - **Wildcard Support**: Allow subdomains using wildcards (e.g., `*.pypi.org`). - **Transparent Interception**: Uses transparent DNS proxying; no application configuration required. +- **Dynamic DNS (dns+nft mode)**: When a domain is allowed and the proxy resolves it, the resolved A/AAAA IPs are added to nftables with TTL so that default-deny + domain-allow is enforced at the network layer. - **Privilege Isolation**: Requires `CAP_NET_ADMIN` only for the sidecar; the application container runs unprivileged. - **Graceful Degradation**: If `CAP_NET_ADMIN` is missing, it warns and disables enforcement instead of crashing. @@ -23,8 +24,9 @@ The egress control is implemented as a **Sidecar** that shares the network names - Filters queries based on the allowlist. - Returns `NXDOMAIN` for denied domains. -2. 
**Network Filter (Layer 2)** (Roadmap): - - Will use `nftables` to enforce IP-level restrictions based on resolved domains. +2. **Network Filter (Layer 2)** (when `OPENSANDBOX_EGRESS_MODE=dns+nft`): + - Uses `nftables` to enforce IP-level allow/deny. Resolved IPs for allowed domains are added to dynamic allow sets with TTL (dynamic DNS). + - At startup, the sidecar whitelists **127.0.0.1** (redirect target for the proxy) and **nameserver IPs** from `/etc/resolv.conf` so DNS resolution and proxy upstream work (including private DNS). Nameserver count is capped and invalid IPs are filtered; see [Configuration](#configuration). ## Requirements @@ -43,6 +45,11 @@ The egress control is implemented as a **Sidecar** that shares the network names - Mode (`OPENSANDBOX_EGRESS_MODE`, default `dns`): - `dns`: DNS proxy only, no nftables (IP/CIDR rules have no effect at L2). - `dns+nft`: enable nftables; if nft apply fails, fallback to `dns`. IP/CIDR enforcement and DoH/DoT blocking require this mode. +- **DNS and nft mode (nameserver whitelist)** + In `dns+nft` mode, the sidecar automatically allows: + - **127.0.0.1**, so that packets redirected by iptables to the proxy (127.0.0.1:15353) are accepted by nft. + - **Nameserver IPs** from `/etc/resolv.conf`, so that client DNS and proxy upstream work (e.g. private DNS). + Nameserver IPs are validated (unspecified and loopback addresses are skipped) and capped. Set the cap with `OPENSANDBOX_EGRESS_MAX_NS` (default `3`; `0` = no cap; `1`–`10` = cap; values above `10` are clamped to `10`, and negative or non-numeric values fall back to the default). See [SECURITY-RISKS.md](SECURITY-RISKS.md) for the trust assumptions and scope of this whitelist. - DoH/DoT blocking: - DoT (tcp/udp 853) blocked by default. - Optional DoH over 443: `OPENSANDBOX_EGRESS_BLOCK_DOH_443=true`. If enabled without blocklist, all 443 is dropped. @@ -139,15 +146,30 @@ To test the sidecar with a sandbox application: - **Key Packages**: - `pkg/dnsproxy`: DNS server and policy matching logic. - `pkg/iptables`: `iptables` rule management.
+ - `pkg/nftables`: nftables static/dynamic rules and DNS-resolved IP sets. - `pkg/policy`: Policy parsing and definition. +- **Main (egress)**: + - `nameserver.go`: Builds the list of IPs to whitelist for DNS in nft mode (127.0.0.1 + validated/capped nameservers from resolv.conf). ```bash # Run tests go test ./... ``` +### E2E benchmark: dns vs dns+nft (sync dynamic IP write) + +An end-to-end benchmark compares **dns** (pass-through, no nft write) and **dns+nft** (sync `AddResolvedIPs` before each DNS reply) under real conditions: sidecar in Docker, iptables redirect, and real DNS + HTTPS requests issued from inside the sidecar container. + +```bash +./tests/bench-dns-nft.sh +``` + +More details in [docs/benchmark.md](docs/benchmark.md). + ## Troubleshooting - **"iptables setup failed"**: Ensure the sidecar container has `--cap-add=NET_ADMIN`. -- **DNS resolution fails for all domains**: Check if the upstream DNS (from `/etc/resolv.conf`) is reachable. -- **Traffic not blocked**: If applying nftables fails, the sidecar falls back to DNS-only; check the logs, `nft list table inet opensandbox`, and the `CAP_NET_ADMIN` permission. +- **DNS resolution fails for all domains**: + - Check if the upstream DNS (from `/etc/resolv.conf`) is reachable. + - In `dns+nft` mode, the sidecar whitelists nameserver IPs from resolv.conf at startup; check logs for `[dns] whitelisting proxy listen + N nameserver(s)` (or `no valid nameserver IPs from /etc/resolv.conf` if none qualified) and ensure `/etc/resolv.conf` is readable and contains valid, reachable nameservers. The proxy prefers the first non-loopback nameserver from resolv.conf; if only loopback nameservers exist (e.g. Docker 127.0.0.11), the first one is used (proxy upstream traffic bypasses the redirect). The proxy falls back to 8.8.8.8 only when resolv.conf is empty or unreadable. +- **Traffic not blocked**: If nftables apply fails, the sidecar falls back to dns; check logs, `nft list table inet opensandbox`, and `CAP_NET_ADMIN`.
diff --git a/components/egress/TODO.md b/components/egress/TODO.md index 680dbdb8..f796d81b 100644 --- a/components/egress/TODO.md +++ b/components/egress/TODO.md @@ -1,7 +1,7 @@ # Egress Sidecar TODO (Linux MVP → Full OSEP-0001) -- Layer 2 still partial: static IP/CIDR now pushed to nftables, DoH/DoT blocking added (853 + optional 443 blocklist). DNS-learned IPs/dynamic isolation intentionally NOT targeted (see No goals). -- Policy surface: IP/CIDR parsing/validation done; `require_full_isolation` and richer validation messages are out of scope (see No goals). Dynamic IP learn/apply is out of scope (see No goals). +- Layer 2 still partial: static IP/CIDR now pushed to nftables, DoH/DoT blocking added (853 + optional 443 blocklist). DNS-learned IPs/dynamic isolation planned (see Short-term priorities). +- Policy surface: IP/CIDR parsing/validation done; `require_full_isolation` and richer validation messages are out of scope (see No goals). - Observability missing: no violation logs. - Capability probing missing: no CAP_NET_ADMIN/nftables detection; hostNetwork is already blocked on the server side. Capability detection + mode exposure moved to No goals. - Platform integration completed: specs/SDK/server wiring done; NET_ADMIN only on sidecar. diff --git a/components/egress/docs/benchmark.md b/components/egress/docs/benchmark.md new file mode 100644 index 00000000..99040f00 --- /dev/null +++ b/components/egress/docs/benchmark.md @@ -0,0 +1,83 @@ +# Egress Benchmark + +This document describes the **Egress Sidecar** end-to-end benchmark: it compares **dns** and **dns+nft** modes under real conditions for latency and throughput. + +## Purpose + +- **dns**: DNS proxy only (pass-through), no nftables writes; used as the baseline. +- **dns+nft**: DNS proxy plus synchronous `AddResolvedIPs` before each DNS reply, writing resolved IPs into nftables for + L2 egress enforcement.
+ +The benchmark runs the same workload in both modes and reports end-to-end latency (P50, P99) and throughput (Req/s) to +measure the overhead of the synchronous nft write path. + +## Environment and Flow + +- **Environment**: The Egress sidecar runs in a Docker container on the host. The container includes the sidecar (DNS + proxy and optional nft), iptables redirect of port 53 to the proxy, and the policy server on port 18080. The workload + runs **inside the same container**, so its DNS and HTTPS traffic goes through the proxy. +- **Flow** (per phase): + 1. Start the sidecar with the chosen mode (`dns` or `dns+nft`). + 2. Wait for health checks, then push the allow list to `/policy` (see domain list below). + 3. Write the domain list into the container as `/tmp/bench-domains.txt` (one `https://<domain>` URL per line). + 4. **Warm-up**: One round of requests to the first 10 domains, all 10 issued concurrently. + 5. **Timed run**: 10 rounds; each round sends one request per domain for all domains, issued concurrently; each request + records `time_namelookup` and `time_total`. + 6. Copy results from the container and compute P50, P99, average latency, and Req/s. +- **Execution order**: **dns+nft** runs first, then **dns**; the comparison table is printed at the end. + +## Workload + +- **Domain list**: Read from `components/egress/tests/hostname.txt`, one domain per line (lines starting with `#` and + empty lines are ignored). The default list contains about 100 resolvable domains. +- **Rounds and concurrency**: The script uses `ROUNDS=10`. Each round issues one HTTPS request per domain in + `hostname.txt`, with all requests in that round concurrent; 10 rounds total. +- **Total requests**: `TOTAL_REQUESTS = ROUNDS × NUM_DOMAINS` (e.g. 10 × 100 = 1000). +- **Per request**: Inside the container, `curl -o /dev/null -s -w "%{time_namelookup}\t%{time_total}\n"` is used against + `https://<domain>`, with a 10s timeout per request; the timed run as a whole has a 300s wall-clock timeout.
+ +## Policy + +- Policy is default-deny with explicit allow rules: one `{"action":"allow","target":"<domain>"}` rule per domain in + `hostname.txt` is sent via `POST /policy`, so every domain used in the benchmark is allowed. + +## How to Run + +**Script**: `components/egress/tests/bench-dns-nft.sh` + +**Requirements**: Docker and `curl` on the host (for pushing policy); the Egress image includes `curl` for the workload. + +**Command** (from `components/egress`; from the repo root, run `components/egress/tests/bench-dns-nft.sh` instead): + +```bash +./tests/bench-dns-nft.sh +``` + +The script resolves `tests/hostname.txt` relative to its own path, so the working directory does not need to be changed. + +## Configuration + +| Item | Location / variable | Default / notes | +|---------------------|----------------------------------------|------------------------------------------------| +| Domain list | `components/egress/tests/hostname.txt` | One domain per line; `#` comments allowed | +| Rounds | `ROUNDS` in script | 10 | +| Per-request timeout | `CURL_TIMEOUT` in script | 10 seconds | +| Benchmark timeout | `BENCH_EXEC_TIMEOUT` in script | 300 seconds (max wall time for the timed run) | +| Image | `IMG` in script | See script; override for a locally built image | + +Changing the number of domains or rounds changes the total request count; the report shows “N rounds × M domains” for +the current config. + +## Output and Metrics + +- **Terminal**: A table with **Req/s**, **Avg(s)**, **P50(s)**, **P99(s)** for both modes, plus short notes (dns vs + dns+nft, warm-up, first-resolution cost). +- **Artifacts** (on the host under `/tmp`): `bench-e2e-dns-total.txt`, `bench-e2e-dns+nft-total.txt` (one + `time_total` per line), plus matching per-mode `-namelookup.txt`, `-wall.txt`, and other files for further analysis or plotting. + +## Notes + +- The first resolution of a domain in dns+nft triggers a DNS lookup and an nft write, so cost is higher; later requests + for the same domain hit the set and are cheaper.
The multi-round, multi-domain design mixes cold and warm resolution. +- In CI (e.g. GitHub Actions), the script applies `timeout` to the timed-run `docker exec` call inside the shell function, not to the function itself: + `timeout` can only execute real commands, so passing it a shell function name fails with “No such file or directory”. diff --git a/components/egress/main.go b/components/egress/main.go index d5e3b0f8..bca6d33b 100644 --- a/components/egress/main.go +++ b/components/egress/main.go @@ -17,7 +17,6 @@ package main import ( "context" "log" - "net/netip" "os" "os/signal" "strings" @@ -26,33 +25,22 @@ import ( "github.com/alibaba/opensandbox/egress/pkg/constants" "github.com/alibaba/opensandbox/egress/pkg/dnsproxy" "github.com/alibaba/opensandbox/egress/pkg/iptables" - "github.com/alibaba/opensandbox/egress/pkg/nftables" ) -// Linux MVP: DNS proxy + iptables REDIRECT. No nftables/full isolation yet. func main() { ctx, cancel := signal.NotifyContext(context.Background(), syscall.SIGINT, syscall.SIGTERM) defer cancel() - // Optional bootstrap via env; still allow runtime HTTP updates.
- initialPolicy, err := dnsproxy.LoadPolicyFromEnvVar(constants.EnvEgressRules) + initialRules, err := dnsproxy.LoadPolicyFromEnvVar(constants.EnvEgressRules) if err != nil { log.Fatalf("failed to parse %s: %v", constants.EnvEgressRules, err) } - if initialPolicy != nil { - log.Printf("loaded initial egress policy from %s", constants.EnvEgressRules) - } - - requestedMode := parseMode() - enforcementMode := requestedMode - var nftMgr nftApplier - if requestedMode == constants.PolicyDnsNft { - nftOpts := parseNftOptions() - nftMgr = nftables.NewManagerWithOptions(nftOpts) - } + allowIPs := AllowIPsForNft("/etc/resolv.conf") - proxy, err := dnsproxy.New(initialPolicy, "") + mode := parseMode() + nftMgr := createNftManager(mode) + proxy, err := dnsproxy.New(initialRules, "") if err != nil { log.Fatalf("failed to init dns proxy: %v", err) } @@ -66,20 +54,11 @@ func main() { } log.Printf("iptables redirect configured (OUTPUT 53 -> 15353) with SO_MARK bypass for proxy upstream traffic") - if nftMgr != nil { - if err := nftMgr.ApplyStatic(ctx, initialPolicy); err != nil { - log.Fatalf("nftables static apply failed; please check logs): %v", err) - } else { - log.Printf("nftables static policy applied (table inet opensandbox)") - } - } + setupNft(ctx, nftMgr, initialRules, proxy, allowIPs) - httpAddr := os.Getenv(constants.EnvEgressHTTPAddr) - if httpAddr == "" { - httpAddr = constants.DefaultEgressServerAddr - } - token := os.Getenv(constants.EnvEgressToken) - if err := startPolicyServer(ctx, proxy, nftMgr, enforcementMode, httpAddr, token); err != nil { + // start policy server + httpAddr := envOrDefault(constants.EnvEgressHTTPAddr, constants.DefaultEgressServerAddr) + if err = startPolicyServer(ctx, proxy, nftMgr, mode, httpAddr, os.Getenv(constants.EnvEgressToken), allowIPs); err != nil { log.Fatalf("failed to start policy server: %v", err) } log.Printf("policy server listening on %s (POST /policy)", httpAddr) @@ -89,38 +68,11 @@ func main() { _ = os.Stderr.Sync() } 
-func parseNftOptions() nftables.Options { - opts := nftables.Options{BlockDoT: true} - if isTruthy(os.Getenv(constants.EnvBlockDoH443)) { - opts.BlockDoH443 = true - } - if raw := os.Getenv(constants.EnvDoHBlocklist); strings.TrimSpace(raw) != "" { - parts := strings.Split(raw, ",") - for _, p := range parts { - target := strings.TrimSpace(p) - if target == "" { - continue - } - if addr, err := netip.ParseAddr(target); err == nil { - if addr.Is4() { - opts.DoHBlocklistV4 = append(opts.DoHBlocklistV4, target) - } else if addr.Is6() { - opts.DoHBlocklistV6 = append(opts.DoHBlocklistV6, target) - } - continue - } - if prefix, err := netip.ParsePrefix(target); err == nil { - if prefix.Addr().Is4() { - opts.DoHBlocklistV4 = append(opts.DoHBlocklistV4, target) - } else if prefix.Addr().Is6() { - opts.DoHBlocklistV6 = append(opts.DoHBlocklistV6, target) - } - continue - } - log.Printf("ignoring invalid DoH blocklist entry: %s", target) - } +func envOrDefault(key, defaultVal string) string { + if v := strings.TrimSpace(os.Getenv(key)); v != "" { + return v } - return opts + return defaultVal } func isTruthy(v string) bool { diff --git a/components/egress/nameserver.go b/components/egress/nameserver.go new file mode 100644 index 00000000..a3d84c47 --- /dev/null +++ b/components/egress/nameserver.go @@ -0,0 +1,91 @@ +// Copyright 2026 Alibaba Group Holding Ltd. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. 
+ +package main + +import ( + "log" + "net/netip" + "os" + "strconv" + + "github.com/alibaba/opensandbox/egress/pkg/constants" + "github.com/alibaba/opensandbox/egress/pkg/dnsproxy" +) + +// AllowIPsForNft returns the list of IPs to merge into the nft allow set for DNS in dns+nft mode: +// 127.0.0.1 (proxy listen / iptables redirect target) plus validated, capped nameserver IPs from resolvPath. +// Validation: skips unspecified (0.0.0.0, ::) and loopback (127.x, ::1). +// Cap: keeps at most N valid nameservers, N from OPENSANDBOX_EGRESS_MAX_NS (default 3; 0 = no cap; values above 10 are clamped to 10). +func AllowIPsForNft(resolvPath string) []netip.Addr { + raw, _ := dnsproxy.ResolvNameserverIPs(resolvPath) + maxNsCount := maxNameserversFromEnv() + + var validated []netip.Addr + for _, ip := range raw { + if maxNsCount > 0 && len(validated) >= maxNsCount { + break + } + if !isValidNameserverIP(ip) { + continue + } + validated = append(validated, ip) + } + + // 127.0.0.1 first so packets redirected to proxy are accepted by nft. + out := make([]netip.Addr, 0, 1+len(validated)) + out = append(out, netip.MustParseAddr("127.0.0.1")) + out = append(out, validated...)
+ + if len(out) > 1 { + log.Printf("[dns] whitelisting proxy listen + %d nameserver(s) for nft: %v", len(validated), formatIPs(out)) + } else { + log.Printf("[dns] whitelisting proxy listen (127.0.0.1); no valid nameserver IPs from %s", resolvPath) + } + return out +} + +func maxNameserversFromEnv() int { + s := os.Getenv(constants.EnvMaxNameservers) + if s == "" { + return constants.DefaultMaxNameservers + } + n, err := strconv.Atoi(s) + if err != nil || n < 0 { + return constants.DefaultMaxNameservers + } + if n > 10 { + return 10 + } + // 0 = no cap + return n +} + +func isValidNameserverIP(ip netip.Addr) bool { + if ip.IsUnspecified() { + return false + } + if ip.IsLoopback() { + return false + } + return true +} + +func formatIPs(ips []netip.Addr) []string { + out := make([]string, len(ips)) + for i, ip := range ips { + out[i] = ip.String() + } + return out +} diff --git a/components/egress/nameserver_test.go b/components/egress/nameserver_test.go new file mode 100644 index 00000000..2edc5142 --- /dev/null +++ b/components/egress/nameserver_test.go @@ -0,0 +1,150 @@ +// Copyright 2026 Alibaba Group Holding Ltd. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. 
+ +package main + +import ( + "net/netip" + "os" + "path/filepath" + "testing" + + "github.com/alibaba/opensandbox/egress/pkg/constants" +) + +func TestAllowIPsForNft_EmptyResolv(t *testing.T) { + dir := t.TempDir() + resolv := filepath.Join(dir, "resolv.conf") + if err := os.WriteFile(resolv, []byte("# empty\n"), 0644); err != nil { + t.Fatal(err) + } + ips := AllowIPsForNft(resolv) + if len(ips) != 1 { + t.Fatalf("expected 1 IP (127.0.0.1), got %d", len(ips)) + } + if ips[0] != netip.MustParseAddr("127.0.0.1") { + t.Fatalf("expected 127.0.0.1, got %s", ips[0]) + } +} + +func TestAllowIPsForNft_ValidNameservers(t *testing.T) { + dir := t.TempDir() + resolv := filepath.Join(dir, "resolv.conf") + // Standard resolv.conf with two nameservers + content := "nameserver 192.168.65.7\nnameserver 10.0.0.1\n" + if err := os.WriteFile(resolv, []byte(content), 0644); err != nil { + t.Fatal(err) + } + ips := AllowIPsForNft(resolv) + if len(ips) != 3 { + t.Fatalf("expected 3 IPs (127.0.0.1 + 2 nameservers), got %d", len(ips)) + } + if ips[0] != netip.MustParseAddr("127.0.0.1") { + t.Fatalf("expected first 127.0.0.1, got %s", ips[0]) + } + if ips[1] != netip.MustParseAddr("192.168.65.7") { + t.Fatalf("expected 192.168.65.7, got %s", ips[1]) + } + if ips[2] != netip.MustParseAddr("10.0.0.1") { + t.Fatalf("expected 10.0.0.1, got %s", ips[2]) + } +} + +func TestAllowIPsForNft_FiltersInvalid(t *testing.T) { + dir := t.TempDir() + resolv := filepath.Join(dir, "resolv.conf") + // 0.0.0.0 and 127.0.0.11 should be filtered; 192.168.1.1 kept + content := "nameserver 0.0.0.0\nnameserver 192.168.1.1\nnameserver 127.0.0.11\n" + if err := os.WriteFile(resolv, []byte(content), 0644); err != nil { + t.Fatal(err) + } + ips := AllowIPsForNft(resolv) + if len(ips) != 2 { + t.Fatalf("expected 2 IPs (127.0.0.1 + 192.168.1.1), got %d: %v", len(ips), ips) + } + if ips[0] != netip.MustParseAddr("127.0.0.1") { + t.Fatalf("expected first 127.0.0.1, got %s", ips[0]) + } + if ips[1] != 
netip.MustParseAddr("192.168.1.1") { + t.Fatalf("expected 192.168.1.1, got %s", ips[1]) + } +} + +func TestAllowIPsForNft_Cap(t *testing.T) { + dir := t.TempDir() + resolv := filepath.Join(dir, "resolv.conf") + content := "nameserver 10.0.0.1\nnameserver 10.0.0.2\nnameserver 10.0.0.3\nnameserver 10.0.0.4\n" + if err := os.WriteFile(resolv, []byte(content), 0644); err != nil { + t.Fatal(err) + } + old := os.Getenv(constants.EnvMaxNameservers) + defer os.Setenv(constants.EnvMaxNameservers, old) + os.Setenv(constants.EnvMaxNameservers, "2") + + ips := AllowIPsForNft(resolv) + // 127.0.0.1 + 2 nameservers (cap) + if len(ips) != 3 { + t.Fatalf("expected 3 IPs (127.0.0.1 + 2 capped), got %d: %v", len(ips), ips) + } + if ips[1] != netip.MustParseAddr("10.0.0.1") || ips[2] != netip.MustParseAddr("10.0.0.2") { + t.Fatalf("expected first two nameservers, got %v", ips[1:]) + } +} + +func TestIsValidNameserverIP(t *testing.T) { + tests := []struct { + ip string + want bool + }{ + {"0.0.0.0", false}, + {"::", false}, + {"127.0.0.1", false}, + {"127.0.0.11", false}, + {"::1", false}, + {"192.168.65.7", true}, + {"10.0.0.1", true}, + {"8.8.8.8", true}, + } + for _, tt := range tests { + ip := netip.MustParseAddr(tt.ip) + got := isValidNameserverIP(ip) + if got != tt.want { + t.Errorf("isValidNameserverIP(%s) = %v, want %v", tt.ip, got, tt.want) + } + } +} + +func TestMaxNameserversFromEnv(t *testing.T) { + old := os.Getenv(constants.EnvMaxNameservers) + defer os.Setenv(constants.EnvMaxNameservers, old) + + for _, s := range []string{"", "x", "-1"} { + os.Setenv(constants.EnvMaxNameservers, s) + if got := maxNameserversFromEnv(); got != constants.DefaultMaxNameservers { + t.Errorf("maxNameserversFromEnv(%q) = %d, want default %d", s, got, constants.DefaultMaxNameservers) + } + } + os.Setenv(constants.EnvMaxNameservers, "0") + if got := maxNameserversFromEnv(); got != 0 { + t.Errorf("maxNameserversFromEnv(0) = %d, want 0", got) + } + os.Setenv(constants.EnvMaxNameservers, "5") + if 
got := maxNameserversFromEnv(); got != 5 { + t.Errorf("maxNameserversFromEnv(5) = %d, want 5", got) + } + os.Setenv(constants.EnvMaxNameservers, "99") + if got := maxNameserversFromEnv(); got != 10 { + t.Errorf("maxNameserversFromEnv(99) = %d, want 10 (capped)", got) + } +} diff --git a/components/egress/nft.go b/components/egress/nft.go new file mode 100644 index 00000000..74ce65a1 --- /dev/null +++ b/components/egress/nft.go @@ -0,0 +1,88 @@ +// Copyright 2026 Alibaba Group Holding Ltd. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +package main + +import ( + "context" + "log" + "net/netip" + "os" + "strings" + + "github.com/alibaba/opensandbox/egress/pkg/constants" + "github.com/alibaba/opensandbox/egress/pkg/dnsproxy" + "github.com/alibaba/opensandbox/egress/pkg/nftables" + "github.com/alibaba/opensandbox/egress/pkg/policy" +) + +// createNftManager returns an nft manager for dns+nft mode, or nil for dns-only. +func createNftManager(mode string) nftApplier { + if mode != constants.PolicyDnsNft { + return nil + } + return nftables.NewManagerWithOptions(parseNftOptions()) +} + +// setupNft applies static policy to nft and wires DNS-resolved IPs into the proxy when nft is enabled. +// nameserverIPs are merged into the allow set at startup so system DNS works (client + proxy upstream, e.g. private DNS). 
+func setupNft(ctx context.Context, nftMgr nftApplier, initialPolicy *policy.NetworkPolicy, proxy *dnsproxy.Proxy, nameserverIPs []netip.Addr) { + if nftMgr == nil { + return + } + policyWithNS := initialPolicy.WithExtraAllowIPs(nameserverIPs) + if err := nftMgr.ApplyStatic(ctx, policyWithNS); err != nil { + log.Fatalf("nftables static apply failed: %v", err) + } + log.Printf("nftables static policy applied (table inet opensandbox)") + proxy.SetOnResolved(func(domain string, ips []nftables.ResolvedIP) { + if err := nftMgr.AddResolvedIPs(ctx, ips); err != nil { + log.Printf("[dns] add resolved IPs to nft failed: %v", err) + } + }) +} + +func parseNftOptions() nftables.Options { + opts := nftables.Options{BlockDoT: true} + if isTruthy(os.Getenv(constants.EnvBlockDoH443)) { + opts.BlockDoH443 = true + } + if raw := os.Getenv(constants.EnvDoHBlocklist); strings.TrimSpace(raw) != "" { + parts := strings.Split(raw, ",") + for _, p := range parts { + target := strings.TrimSpace(p) + if target == "" { + continue + } + if addr, err := netip.ParseAddr(target); err == nil { + if addr.Is4() { + opts.DoHBlocklistV4 = append(opts.DoHBlocklistV4, target) + } else if addr.Is6() { + opts.DoHBlocklistV6 = append(opts.DoHBlocklistV6, target) + } + continue + } + if prefix, err := netip.ParsePrefix(target); err == nil { + if prefix.Addr().Is4() { + opts.DoHBlocklistV4 = append(opts.DoHBlocklistV4, target) + } else if prefix.Addr().Is6() { + opts.DoHBlocklistV6 = append(opts.DoHBlocklistV6, target) + } + continue + } + log.Printf("ignoring invalid DoH blocklist entry: %s", target) + } + } + return opts +} diff --git a/components/egress/pkg/constants/configuration.go b/components/egress/pkg/constants/configuration.go index 64be6d0e..d4d44fe2 100644 --- a/components/egress/pkg/constants/configuration.go +++ b/components/egress/pkg/constants/configuration.go @@ -21,6 +21,7 @@ const ( EnvEgressHTTPAddr = "OPENSANDBOX_EGRESS_HTTP_ADDR" EnvEgressToken = "OPENSANDBOX_EGRESS_TOKEN" 
EnvEgressRules = "OPENSANDBOX_EGRESS_RULES" + EnvMaxNameservers = "OPENSANDBOX_EGRESS_MAX_NS" ) const ( @@ -30,4 +31,5 @@ const ( const ( DefaultEgressServerAddr = ":18080" + DefaultMaxNameservers = 3 ) diff --git a/components/egress/pkg/dnsproxy/proxy.go b/components/egress/pkg/dnsproxy/proxy.go index e6c0659c..6e6e853d 100644 --- a/components/egress/pkg/dnsproxy/proxy.go +++ b/components/egress/pkg/dnsproxy/proxy.go @@ -19,12 +19,14 @@ import ( "fmt" "log" "net" + "net/netip" "os" "sync" "time" "github.com/miekg/dns" + "github.com/alibaba/opensandbox/egress/pkg/nftables" "github.com/alibaba/opensandbox/egress/pkg/policy" ) @@ -36,6 +38,9 @@ type Proxy struct { listenAddr string upstream string // single upstream for MVP servers []*dns.Server + + // optional; called synchronously, before the DNS reply is written, when A/AAAA are present + onResolved func(domain string, ips []nftables.ResolvedIP) } // New builds a proxy with resolved upstream; listenAddr can be empty for default. @@ -118,9 +123,23 @@ func (p *Proxy) serveDNS(w dns.ResponseWriter, r *dns.Msg) { _ = w.WriteMsg(fail) return } + p.maybeNotifyResolved(domain, resp) _ = w.WriteMsg(resp) } +// maybeNotifyResolved calls onResolved synchronously when resp contains A/AAAA, +// so that IPs are in nft before the client receives the DNS response and connects. +func (p *Proxy) maybeNotifyResolved(domain string, resp *dns.Msg) { + if p.onResolved == nil { + return + } + ips := extractResolvedIPs(resp) + if len(ips) == 0 { + return + } + p.onResolved(domain, ips) +} + func (p *Proxy) forward(r *dns.Msg) (*dns.Msg, error) { c := &dns.Client{ Timeout: 5 * time.Second, @@ -154,14 +173,90 @@ func (p *Proxy) CurrentPolicy() *policy.NetworkPolicy { return p.policy } +// SetOnResolved sets the callback invoked when an allowed domain resolves to A/AAAA. +// Called synchronously from serveDNS before the reply is written, so it should return quickly; pass nil to disable. Only used when L2 dynamic IP is enabled (e.g. dns+nft mode).
+func (p *Proxy) SetOnResolved(fn func(domain string, ips []nftables.ResolvedIP)) { + p.onResolved = fn +} + +// extractResolvedIPs parses A and AAAA records from resp.Answer into ResolvedIP slice. +func extractResolvedIPs(resp *dns.Msg) []nftables.ResolvedIP { + if resp == nil || len(resp.Answer) == 0 { + return nil + } + + var out []nftables.ResolvedIP + for _, rr := range resp.Answer { + switch v := rr.(type) { + case *dns.A: + if v.A == nil { + continue + } + addr, err := netip.ParseAddr(v.A.String()) + if err != nil { + continue + } + out = append(out, nftables.ResolvedIP{Addr: addr, TTL: time.Duration(v.Hdr.Ttl) * time.Second}) + case *dns.AAAA: + if v.AAAA == nil { + continue + } + addr, err := netip.ParseAddr(v.AAAA.String()) + if err != nil { + continue + } + out = append(out, nftables.ResolvedIP{Addr: addr, TTL: time.Duration(v.Hdr.Ttl) * time.Second}) + } + } + return out +} + +const fallbackUpstream = "8.8.8.8:53" + func discoverUpstream() (string, error) { cfg, err := dns.ClientConfigFromFile("/etc/resolv.conf") - if err == nil && len(cfg.Servers) > 0 { - return net.JoinHostPort(cfg.Servers[0], cfg.Port), nil + if err != nil || len(cfg.Servers) == 0 { + if err != nil { + log.Printf("[dns] fallback upstream resolver due to error: %v", err) + } + return fallbackUpstream, nil + } + // Prefer first non-loopback nameserver (e.g. K8s cluster DNS after 127.0.0.11). + // If only loopback exists (e.g. Docker 127.0.0.11), use it: proxy upstream traffic + // is marked and bypasses the redirect, so loopback is reachable from the sidecar. + var chosen string + for _, s := range cfg.Servers { + if ip := net.ParseIP(s); ip != nil && ip.IsLoopback() { + if chosen == "" { + chosen = s + } + continue + } + chosen = s + break + } + if chosen == "" { + chosen = cfg.Servers[0] + } + return net.JoinHostPort(chosen, cfg.Port), nil +} + +// ResolvNameserverIPs reads nameserver lines from resolvPath and returns parsed IPv4/IPv6 addresses. 
+// Used at startup to whitelist the system DNS so client traffic to it is allowed and proxy can use it as upstream. +func ResolvNameserverIPs(resolvPath string) ([]netip.Addr, error) { + cfg, err := dns.ClientConfigFromFile(resolvPath) + if err != nil || len(cfg.Servers) == 0 { + return nil, nil + } + var out []netip.Addr + for _, s := range cfg.Servers { + ip, err := netip.ParseAddr(s) + if err != nil { + continue + } + out = append(out, ip) } - // fallback to public resolver; comment to explain deterministic behavior - log.Printf("[dns] fallback upstream resolver due to error: %v", err) - return "8.8.8.8:53", nil + return out, nil } // LoadPolicyFromEnvVar reads the given env var and parses a policy; empty falls back to default deny-all. diff --git a/components/egress/pkg/dnsproxy/proxy_test.go b/components/egress/pkg/dnsproxy/proxy_test.go index 31f831e1..4f1a52b0 100644 --- a/components/egress/pkg/dnsproxy/proxy_test.go +++ b/components/egress/pkg/dnsproxy/proxy_test.go @@ -15,8 +15,13 @@ package dnsproxy import ( + "net" "testing" + "time" + "github.com/miekg/dns" + + "github.com/alibaba/opensandbox/egress/pkg/nftables" "github.com/alibaba/opensandbox/egress/pkg/policy" ) @@ -79,3 +84,139 @@ func TestLoadPolicyFromEnvVar(t *testing.T) { t.Fatalf("expected default deny when env is empty, got %+v", pol) } } + +func TestExtractResolvedIPs(t *testing.T) { + msg := new(dns.Msg) + msg.Answer = []dns.RR{ + &dns.A{Hdr: dns.RR_Header{Name: "example.com.", Ttl: 120}, A: net.ParseIP("1.2.3.4")}, + &dns.AAAA{Hdr: dns.RR_Header{Name: "example.com.", Ttl: 60}, AAAA: net.ParseIP("2001:db8::1")}, + &dns.A{Hdr: dns.RR_Header{Name: "example.com.", Ttl: 90}, A: net.ParseIP("5.6.7.8")}, + } + ips := extractResolvedIPs(msg) + if len(ips) != 3 { + t.Fatalf("expected 3 IPs, got %d", len(ips)) + } + // Order follows Answer; check first A and AAAA + if ips[0].Addr.String() != "1.2.3.4" || ips[0].TTL != 120*time.Second { + t.Fatalf("first IP: got %s TTL %v", ips[0].Addr, ips[0].TTL) + 
} + if ips[1].Addr.String() != "2001:db8::1" || ips[1].TTL != 60*time.Second { + t.Fatalf("second IP: got %s TTL %v", ips[1].Addr, ips[1].TTL) + } + if ips[2].Addr.String() != "5.6.7.8" || ips[2].TTL != 90*time.Second { + t.Fatalf("third IP: got %s TTL %v", ips[2].Addr, ips[2].TTL) + } +} + +func TestExtractResolvedIPs_EmptyOrNil(t *testing.T) { + if got := extractResolvedIPs(nil); got != nil { + t.Fatalf("nil msg: expected nil, got %v", got) + } + msg := new(dns.Msg) + if got := extractResolvedIPs(msg); got != nil { + t.Fatalf("empty answer: expected nil, got %v", got) + } + msg.Answer = []dns.RR{&dns.CNAME{Hdr: dns.RR_Header{Name: "x."}, Target: "y."}} + if got := extractResolvedIPs(msg); got != nil { + t.Fatalf("CNAME only: expected nil, got %v", got) + } +} + +func TestSetOnResolved(t *testing.T) { + proxy, err := New(policy.DefaultDenyPolicy(), "") + if err != nil { + t.Fatalf("New: %v", err) + } + var called bool + var capturedDomain string + var capturedIPs []nftables.ResolvedIP + proxy.SetOnResolved(func(domain string, ips []nftables.ResolvedIP) { + called = true + capturedDomain = domain + capturedIPs = ips + }) + if proxy.onResolved == nil { + t.Fatalf("SetOnResolved did not set callback") + } + proxy.SetOnResolved(nil) + if proxy.onResolved != nil { + t.Fatalf("SetOnResolved(nil) did not clear callback") + } + _ = called + _ = capturedDomain + _ = capturedIPs +} + +func TestMaybeNotifyResolved_CallsCallbackWhenAOrAAAA(t *testing.T) { + proxy, err := New(policy.DefaultDenyPolicy(), "") + if err != nil { + t.Fatalf("New: %v", err) + } + ch := make(chan struct { + domain string + ips []nftables.ResolvedIP + }, 1) + proxy.SetOnResolved(func(domain string, ips []nftables.ResolvedIP) { + ch <- struct { + domain string + ips []nftables.ResolvedIP + }{domain, ips} + }) + + msg := new(dns.Msg) + msg.Answer = []dns.RR{ + &dns.A{Hdr: dns.RR_Header{Name: "example.com.", Ttl: 120}, A: net.ParseIP("1.2.3.4")}, + } + proxy.maybeNotifyResolved("example.com.", msg) + + 
select { + case got := <-ch: + if got.domain != "example.com." { + t.Fatalf("domain: got %q", got.domain) + } + if len(got.ips) != 1 || got.ips[0].Addr.String() != "1.2.3.4" { + t.Fatalf("ips: got %v", got.ips) + } + case <-time.After(2 * time.Second): + t.Fatal("callback was not invoked") + } +} + +func TestMaybeNotifyResolved_NoCallWhenOnResolvedNil(t *testing.T) { + proxy, err := New(policy.DefaultDenyPolicy(), "") + if err != nil { + t.Fatalf("New: %v", err) + } + msg := new(dns.Msg) + msg.Answer = []dns.RR{&dns.A{Hdr: dns.RR_Header{Name: "x.", Ttl: 60}, A: net.ParseIP("10.0.0.1")}} + proxy.maybeNotifyResolved("x.", msg) + // No callback set; should not panic. No assertion needed. +} + +func TestMaybeNotifyResolved_NoCallWhenNoAOrAAAA(t *testing.T) { + proxy, err := New(policy.DefaultDenyPolicy(), "") + if err != nil { + t.Fatalf("New: %v", err) + } + ch := make(chan struct { + domain string + ips []nftables.ResolvedIP + }, 1) + proxy.SetOnResolved(func(domain string, ips []nftables.ResolvedIP) { + ch <- struct { + domain string + ips []nftables.ResolvedIP + }{domain, ips} + }) + + msg := new(dns.Msg) + msg.Answer = []dns.RR{&dns.CNAME{Hdr: dns.RR_Header{Name: "x."}, Target: "y."}} + proxy.maybeNotifyResolved("x.", msg) + + select { + case <-ch: + t.Fatal("callback should not be invoked when resp has no A/AAAA") + case <-time.After(200 * time.Millisecond): + // Expected: no callback + } +} diff --git a/components/egress/pkg/nftables/dynamic.go b/components/egress/pkg/nftables/dynamic.go new file mode 100644 index 00000000..1e16e621 --- /dev/null +++ b/components/egress/pkg/nftables/dynamic.go @@ -0,0 +1,69 @@ +// Copyright 2026 Alibaba Group Holding Ltd. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. 
+// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +package nftables + +import ( + "fmt" + "net/netip" + "strings" + "time" +) + +const ( + dynAllowV4Set = "dyn_allow_v4" + dynAllowV6Set = "dyn_allow_v6" + dynSetTimeoutS = 300 + minTTLSec = 60 + maxTTLSec = 300 +) + +// ResolvedIP is a single IP learned from DNS with TTL for dynamic nft set. +type ResolvedIP struct { + Addr netip.Addr + TTL time.Duration +} + +// buildAddResolvedIPsScript returns a nft script fragment that +// adds resolved IPs to dyn_allow_v4/v6 with timeout. +func buildAddResolvedIPsScript(table string, ips []ResolvedIP) string { + var v4, v6 []string + for _, r := range ips { + sec := clampTTL(r.TTL) + if r.Addr.Is4() { + v4 = append(v4, fmt.Sprintf("%s timeout %ds", r.Addr.String(), sec)) + } else if r.Addr.Is6() { + v6 = append(v6, fmt.Sprintf("%s timeout %ds", r.Addr.String(), sec)) + } + } + var b strings.Builder + if len(v4) > 0 { + fmt.Fprintf(&b, "add element inet %s %s { %s }\n", table, dynAllowV4Set, strings.Join(v4, ", ")) + } + if len(v6) > 0 { + fmt.Fprintf(&b, "add element inet %s %s { %s }\n", table, dynAllowV6Set, strings.Join(v6, ", ")) + } + return b.String() +} + +func clampTTL(d time.Duration) int { + sec := int(d.Seconds()) + if sec < minTTLSec { + return minTTLSec + } + if sec > maxTTLSec { + return maxTTLSec + } + return sec +} diff --git a/components/egress/pkg/nftables/manager.go b/components/egress/pkg/nftables/manager.go index 1477ecd3..a58c02f3 100644 --- a/components/egress/pkg/nftables/manager.go +++ b/components/egress/pkg/nftables/manager.go @@ -19,6 +19,7 @@ import ( "fmt" "os/exec" "strings" + 
"sync" "github.com/alibaba/opensandbox/egress/pkg/constants" "github.com/alibaba/opensandbox/egress/pkg/policy" @@ -47,10 +48,11 @@ type Options struct { DoHBlocklistV6 []string } -// Manager applies static IP/CIDR policy into nftables. +// Manager applies static IP/CIDR policy into nftables and dynamic DNS-learned IPs. type Manager struct { run runner opts Options + mu sync.Mutex } // NewManager builds an nftables manager that shells out to `nft -f -` with defaults. @@ -95,6 +97,23 @@ func (m *Manager) ApplyStatic(ctx context.Context, p *policy.NetworkPolicy) erro return nil } +// AddResolvedIPs adds DNS-learned IPs to dynamic allow sets with TTL-based timeout. +// TTL is clamped to minTTLSec–maxTTLSec. Call only when table exists (dns+nft mode). +func (m *Manager) AddResolvedIPs(ctx context.Context, ips []ResolvedIP) error { + if len(ips) == 0 { + return nil + } + + m.mu.Lock() + defer m.mu.Unlock() + script := buildAddResolvedIPsScript(tableName, ips) + if script == "" { + return nil + } + _, err := m.run(ctx, script) + return err +} + func buildRuleset(p *policy.NetworkPolicy, opts Options) string { allowV4, allowV6, denyV4, denyV6 := p.StaticIPSets() @@ -107,6 +126,8 @@ func buildRuleset(p *policy.NetworkPolicy, opts Options) string { fmt.Fprintf(&b, "add set inet %s %s { type ipv4_addr; flags interval; }\n", tableName, denyV4Set) fmt.Fprintf(&b, "add set inet %s %s { type ipv6_addr; flags interval; }\n", tableName, allowV6Set) fmt.Fprintf(&b, "add set inet %s %s { type ipv6_addr; flags interval; }\n", tableName, denyV6Set) + fmt.Fprintf(&b, "add set inet %s %s { type ipv4_addr; timeout %ds; }\n", tableName, dynAllowV4Set, dynSetTimeoutS) + fmt.Fprintf(&b, "add set inet %s %s { type ipv6_addr; timeout %ds; }\n", tableName, dynAllowV6Set, dynSetTimeoutS) if len(opts.DoHBlocklistV4) > 0 { fmt.Fprintf(&b, "add set inet %s %s { type ipv4_addr; flags interval; }\n", tableName, dohBlockV4Set) @@ -149,6 +170,8 @@ func buildRuleset(p *policy.NetworkPolicy, opts 
Options) string { } fmt.Fprintf(&b, "add rule inet %s %s ip daddr @%s drop\n", tableName, chainName, denyV4Set) fmt.Fprintf(&b, "add rule inet %s %s ip6 daddr @%s drop\n", tableName, chainName, denyV6Set) + fmt.Fprintf(&b, "add rule inet %s %s ip daddr @%s accept\n", tableName, chainName, dynAllowV4Set) + fmt.Fprintf(&b, "add rule inet %s %s ip6 daddr @%s accept\n", tableName, chainName, dynAllowV6Set) fmt.Fprintf(&b, "add rule inet %s %s ip daddr @%s accept\n", tableName, chainName, allowV4Set) fmt.Fprintf(&b, "add rule inet %s %s ip6 daddr @%s accept\n", tableName, chainName, allowV6Set) if chainPolicy == "drop" { diff --git a/components/egress/pkg/nftables/manager_test.go b/components/egress/pkg/nftables/manager_test.go index 8abbd897..02126c6c 100644 --- a/components/egress/pkg/nftables/manager_test.go +++ b/components/egress/pkg/nftables/manager_test.go @@ -17,8 +17,10 @@ package nftables import ( "context" "fmt" + "net/netip" "strings" "testing" + "time" "github.com/alibaba/opensandbox/egress/pkg/policy" ) @@ -52,8 +54,12 @@ func TestApplyStatic_BuildsRuleset_DefaultDeny(t *testing.T) { expectContains(t, rendered, "add rule inet opensandbox egress oifname \"lo\" accept") expectContains(t, rendered, "add rule inet opensandbox egress tcp dport 853 drop") expectContains(t, rendered, "add rule inet opensandbox egress udp dport 853 drop") + expectContains(t, rendered, "add set inet opensandbox dyn_allow_v4 { type ipv4_addr; timeout 300s; }") + expectContains(t, rendered, "add set inet opensandbox dyn_allow_v6 { type ipv6_addr; timeout 300s; }") expectContains(t, rendered, "add element inet opensandbox allow_v4 { 1.1.1.1, 2.2.0.0/16 }") expectContains(t, rendered, "add element inet opensandbox deny_v6 { 2001:db8::/32 }") + expectContains(t, rendered, "add rule inet opensandbox egress ip daddr @dyn_allow_v4 accept") + expectContains(t, rendered, "add rule inet opensandbox egress ip6 daddr @dyn_allow_v6 accept") expectContains(t, rendered, "add rule inet opensandbox 
egress counter drop") } @@ -138,3 +144,50 @@ func TestApplyStatic_DoHBlocklist(t *testing.T) { expectContains(t, rendered, "add rule inet opensandbox egress ip daddr @doh_block_v4 tcp dport 443 drop") expectContains(t, rendered, "add rule inet opensandbox egress ip6 daddr @doh_block_v6 tcp dport 443 drop") } + +func TestAddResolvedIPs_BuildsDynamicElements(t *testing.T) { + var rendered string + m := NewManagerWithRunner(func(_ context.Context, script string) ([]byte, error) { + rendered = script + return nil, nil + }) + ips := []ResolvedIP{ + {Addr: netip.MustParseAddr("1.1.1.1"), TTL: 120 * time.Second}, + {Addr: netip.MustParseAddr("2001:db8::1"), TTL: 60 * time.Second}, + } + if err := m.AddResolvedIPs(context.Background(), ips); err != nil { + t.Fatalf("AddResolvedIPs: %v", err) + } + expectContains(t, rendered, "add element inet opensandbox dyn_allow_v4 { 1.1.1.1 timeout 120s }") + expectContains(t, rendered, "add element inet opensandbox dyn_allow_v6 { 2001:db8::1 timeout 60s }") +} + +func TestAddResolvedIPs_ClampsTTL(t *testing.T) { + var rendered string + m := NewManagerWithRunner(func(_ context.Context, script string) ([]byte, error) { + rendered = script + return nil, nil + }) + ips := []ResolvedIP{ + {Addr: netip.MustParseAddr("10.0.0.1"), TTL: 10 * time.Second}, + {Addr: netip.MustParseAddr("10.0.0.2"), TTL: 9999 * time.Second}, + } + if err := m.AddResolvedIPs(context.Background(), ips); err != nil { + t.Fatalf("AddResolvedIPs: %v", err) + } + expectContains(t, rendered, "10.0.0.1 timeout 60s") + expectContains(t, rendered, "10.0.0.2 timeout 300s") +} + +func TestAddResolvedIPs_EmptyNoOp(t *testing.T) { + m := NewManagerWithRunner(func(_ context.Context, script string) ([]byte, error) { + t.Fatal("runner should not be called for empty ips") + return nil, nil + }) + if err := m.AddResolvedIPs(context.Background(), nil); err != nil { + t.Fatalf("AddResolvedIPs: %v", err) + } + if err := m.AddResolvedIPs(context.Background(), []ResolvedIP{}); err != nil 
{ + t.Fatalf("AddResolvedIPs: %v", err) + } +} diff --git a/components/egress/pkg/policy/policy.go b/components/egress/pkg/policy/policy.go index 9bc55e77..84cdb5fd 100644 --- a/components/egress/pkg/policy/policy.go +++ b/components/egress/pkg/policy/policy.go @@ -143,6 +143,26 @@ func normalizePolicy(p *NetworkPolicy) error { return nil } +// WithExtraAllowIPs returns a copy of the policy with additional allow rules for each IP. +// Used at startup to whitelist system nameservers so client DNS and proxy upstream work with private DNS. +func (p *NetworkPolicy) WithExtraAllowIPs(ips []netip.Addr) *NetworkPolicy { + if p == nil || len(ips) == 0 { + return p + } + out := *p + out.Egress = make([]EgressRule, len(p.Egress), len(p.Egress)+len(ips)) + copy(out.Egress, p.Egress) + for _, ip := range ips { + out.Egress = append(out.Egress, EgressRule{ + Action: ActionAllow, + Target: ip.String(), + targetKind: targetIP, + ip: ip, + }) + } + return &out +} + // StaticIPSets splits static IP/CIDR rules into allow/deny IPv4/IPv6 buckets. // Empty or nil policy returns empty slices. 
func (p *NetworkPolicy) StaticIPSets() (allowV4, allowV6, denyV4, denyV6 []string) { diff --git a/components/egress/pkg/policy/policy_test.go b/components/egress/pkg/policy/policy_test.go index be654c37..135d6995 100644 --- a/components/egress/pkg/policy/policy_test.go +++ b/components/egress/pkg/policy/policy_test.go @@ -14,7 +14,10 @@ package policy -import "testing" +import ( + "net/netip" + "testing" +) func TestParsePolicy_EmptyOrNullDefaultsDeny(t *testing.T) { cases := []string{ @@ -103,3 +106,35 @@ func TestParsePolicy_EmptyTargetError(t *testing.T) { t.Fatalf("expected error for empty target") } } + +func TestWithExtraAllowIPs(t *testing.T) { + p, _ := ParsePolicy(`{"defaultAction":"deny","egress":[{"action":"allow","target":"example.com"}]}`) + allowV4, allowV6, _, _ := p.StaticIPSets() + if len(allowV4) != 0 || len(allowV6) != 0 { + t.Fatalf("domain-only policy should have no static allow IPs, got allowV4=%v allowV6=%v", allowV4, allowV6) + } + + ips := []netip.Addr{ + netip.MustParseAddr("192.168.65.7"), + netip.MustParseAddr("2001:db8::1"), + } + merged := p.WithExtraAllowIPs(ips) + if merged == p { + t.Fatalf("expected new policy instance") + } + allowV4, allowV6, _, _ = merged.StaticIPSets() + if len(allowV4) != 1 || allowV4[0] != "192.168.65.7" { + t.Fatalf("allowV4 expected [192.168.65.7], got %v", allowV4) + } + if len(allowV6) != 1 || allowV6[0] != "2001:db8::1" { + t.Fatalf("allowV6 expected [2001:db8::1], got %v", allowV6) + } + + // nil/empty ips returns same policy + if got := p.WithExtraAllowIPs(nil); got != p { + t.Fatalf("WithExtraAllowIPs(nil) should return same policy") + } + if got := p.WithExtraAllowIPs([]netip.Addr{}); got != p { + t.Fatalf("WithExtraAllowIPs([]) should return same policy") + } +} diff --git a/components/egress/policy_server.go b/components/egress/policy_server.go index 73a2b8e1..25117dce 100644 --- a/components/egress/policy_server.go +++ b/components/egress/policy_server.go @@ -23,10 +23,12 @@ import ( "io" "log" 
"net/http" + "net/netip" "strings" "time" "github.com/alibaba/opensandbox/egress/pkg/constants" + "github.com/alibaba/opensandbox/egress/pkg/nftables" "github.com/alibaba/opensandbox/egress/pkg/policy" ) @@ -40,18 +42,21 @@ type enforcementReporter interface { EnforcementMode() string } -// nftApplier is a narrow interface for applying static IP/CIDR rules. +// nftApplier applies static policy and optional dynamic DNS-learned IPs to nftables. type nftApplier interface { ApplyStatic(context.Context, *policy.NetworkPolicy) error + AddResolvedIPs(context.Context, []nftables.ResolvedIP) error } // startPolicyServer launches a lightweight HTTP API for updating the egress policy at runtime. // Supported endpoints: // - GET /policy : returns the currently enforced policy. // - POST /policy : replace the policy; empty body resets to default deny-all. -func startPolicyServer(ctx context.Context, proxy policyUpdater, nft nftApplier, enforcementMode string, addr string, token string) error { +// +// nameserverIPs are merged into every applied policy so system DNS stays allowed (e.g. private DNS). 
+func startPolicyServer(ctx context.Context, proxy policyUpdater, nft nftApplier, enforcementMode string, addr string, token string, nameserverIPs []netip.Addr) error { mux := http.NewServeMux() - handler := &policyServer{proxy: proxy, nft: nft, token: token, enforcementMode: enforcementMode} + handler := &policyServer{proxy: proxy, nft: nft, token: token, enforcementMode: enforcementMode, nameserverIPs: nameserverIPs} mux.HandleFunc("/policy", handler.handlePolicy) mux.HandleFunc("/healthz", func(w http.ResponseWriter, _ *http.Request) { w.WriteHeader(http.StatusOK) @@ -98,6 +103,7 @@ type policyServer struct { server *http.Server token string enforcementMode string + nameserverIPs []netip.Addr } func (s *policyServer) handlePolicy(w http.ResponseWriter, r *http.Request) { @@ -136,7 +142,15 @@ func (s *policyServer) handlePost(w http.ResponseWriter, r *http.Request) { } raw := strings.TrimSpace(string(body)) if raw == "" { - s.proxy.UpdatePolicy(policy.DefaultDenyPolicy()) + def := policy.DefaultDenyPolicy() + if s.nft != nil { + defWithNS := def.WithExtraAllowIPs(s.nameserverIPs) + if err := s.nft.ApplyStatic(r.Context(), defWithNS); err != nil { + http.Error(w, fmt.Sprintf("failed to apply nftables: %v", err), http.StatusInternalServerError) + return + } + } + s.proxy.UpdatePolicy(def) writeJSON(w, http.StatusOK, map[string]any{ "status": "ok", "mode": "deny_all", @@ -151,7 +165,8 @@ func (s *policyServer) handlePost(w http.ResponseWriter, r *http.Request) { return } if s.nft != nil { - if err := s.nft.ApplyStatic(r.Context(), pol); err != nil { + polWithNS := pol.WithExtraAllowIPs(s.nameserverIPs) + if err := s.nft.ApplyStatic(r.Context(), polWithNS); err != nil { http.Error(w, fmt.Sprintf("failed to apply nftables policy: %v", err), http.StatusInternalServerError) return } diff --git a/components/egress/policy_server_test.go b/components/egress/policy_server_test.go index 12e3b4db..032d1435 100644 --- a/components/egress/policy_server_test.go +++ 
b/components/egress/policy_server_test.go @@ -23,6 +23,7 @@ import ( "strings" "testing" + "github.com/alibaba/opensandbox/egress/pkg/nftables" "github.com/alibaba/opensandbox/egress/pkg/policy" ) @@ -50,6 +51,10 @@ func (s *stubNft) ApplyStatic(_ context.Context, p *policy.NetworkPolicy) error return s.err } +func (s *stubNft) AddResolvedIPs(_ context.Context, _ []nftables.ResolvedIP) error { + return nil +} + func TestHandlePolicy_AppliesNftAndUpdatesProxy(t *testing.T) { proxy := &stubProxy{} nft := &stubNft{} diff --git a/components/egress/tests/bench-dns-nft.sh b/components/egress/tests/bench-dns-nft.sh new file mode 100755 index 00000000..be046344 --- /dev/null +++ b/components/egress/tests/bench-dns-nft.sh @@ -0,0 +1,304 @@ +#!/bin/bash + +# Copyright 2026 Alibaba Group Holding Ltd. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +# E2E benchmark: baseline (no egress) vs dns (pass-through) vs dns+nft (sync dynamic IP write). +# Baseline: plain curl container, same workload, no egress container. Then the egress sidecar in dns and dns+nft modes. +# Metrics: E2E latency (p50, p99), throughput (req/s). +# +# Usage: ./tests/bench-dns-nft.sh +# Optional: BENCH_SAMPLE_SIZE=n to randomly use n domains from hostname.txt (default: use all). +# Requires: Docker, curl in PATH (for policy push). Egress image and baseline image (default curlimages/curl:latest) must have curl. +# Domain list: tests/hostname.txt (one domain per line).
+ +set -euo pipefail + +info() { echo "[$(date +%H:%M:%S)] $*"; } + +SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" +HOSTNAME_FILE="${SCRIPT_DIR}/hostname.txt" + +IMG="opensandbox/egress:local" +BASELINE_IMG="${BASELINE_IMG:-curlimages/curl:latest}" +CONTAINER_NAME="egress-bench-e2e" +POLICY_PORT=18080 +ROUNDS=10 + +# Load benchmark domains from hostname.txt (one domain per line). +if [[ ! -f "${HOSTNAME_FILE}" ]] || [[ ! -s "${HOSTNAME_FILE}" ]]; then + echo "Error: domain file not found or empty: ${HOSTNAME_FILE}" >&2 + exit 1 +fi +BENCH_DOMAINS=() +while IFS= read -r line; do + line="${line%%#*}" + line="${line#"${line%%[![:space:]]*}"}" + line="${line%"${line##*[![:space:]]}"}" + [[ -n "$line" ]] && BENCH_DOMAINS+=( "$line" ) +done < "${HOSTNAME_FILE}" +total_in_file=${#BENCH_DOMAINS[@]} +if [[ "$total_in_file" -eq 0 ]]; then + echo "Error: no domains in ${HOSTNAME_FILE}" >&2 + exit 1 +fi + +# Optionally randomly sample n domains (BENCH_SAMPLE_SIZE); if unset or 0, use all. 
+if [[ -n "${BENCH_SAMPLE_SIZE:-}" ]] && [[ "${BENCH_SAMPLE_SIZE}" -gt 0 ]]; then + if [[ "${BENCH_SAMPLE_SIZE}" -ge "$total_in_file" ]]; then + NUM_DOMAINS=$total_in_file + else + # Portable shuffle: shuf (Linux), gshuf (macOS coreutils), else awk + if command -v shuf >/dev/null 2>&1; then + BENCH_DOMAINS=( $(printf '%s\n' "${BENCH_DOMAINS[@]}" | shuf -n "${BENCH_SAMPLE_SIZE}") ) + elif command -v gshuf >/dev/null 2>&1; then + BENCH_DOMAINS=( $(printf '%s\n' "${BENCH_DOMAINS[@]}" | gshuf -n "${BENCH_SAMPLE_SIZE}") ) + else + BENCH_DOMAINS=( $(printf '%s\n' "${BENCH_DOMAINS[@]}" | awk 'BEGIN{srand()} {printf "%s\t%s\n", rand(), $0}' | sort -n | cut -f2- | head -n "${BENCH_SAMPLE_SIZE}") ) + fi + NUM_DOMAINS=${#BENCH_DOMAINS[@]} + info "Using ${NUM_DOMAINS} randomly sampled domains (of ${total_in_file}) from ${HOSTNAME_FILE}" + fi +else + NUM_DOMAINS=$total_in_file +fi +TOTAL_REQUESTS=$((ROUNDS * NUM_DOMAINS)) +CURL_TIMEOUT=10 +# Max wall time for the benchmark loop (docker exec); avoid hanging forever. +BENCH_EXEC_TIMEOUT=300 + +cleanup() { + docker rm -f "${CONTAINER_NAME}" >/dev/null 2>&1 || true +} +trap cleanup EXIT + +# Compute stats from a file with one numeric value per line (e.g. time_total in seconds). +# Output: count avg_s p50_s p99_s +stats() { + local file="$1" + if [[ ! -f "$file" ]] || [[ ! -s "$file" ]]; then + echo "0 0 0 0" + return + fi + sort -n "$file" > "${file}.sorted" + local n + n=$(wc -l < "${file}.sorted") + if [[ "$n" -eq 0 ]]; then + echo "0 0 0 0" + return + fi + local avg p50 p99 + avg=$(awk '{s+=$1; c++} END { if(c>0) print s/c; else print 0 }' "$file") + p50=$(awk -v n="$n" 'NR==int(n*0.5+0.5){print $1; exit}' "${file}.sorted") + p99=$(awk -v n="$n" 'NR==int(n*0.99+0.5){print $1; exit}' "${file}.sorted") + echo "$n $avg $p50 $p99" +} + +# Run workload inside CONTAINER_NAME; /tmp/bench-domains.txt must already exist in container. 
+# Usage: run_bench_to <outfile> [limit] [rounds] [timeout] +run_bench_to() { + local outfile="$1" + local limit="${2:-9999}" + local rounds="${3:-1}" + local use_timeout="${4:-}" + local cmd=( + docker exec -e BENCH_TIMEOUT="${CURL_TIMEOUT}" -e BENCH_OUTFILE="${outfile}" -e BENCH_LIMIT="${limit}" -e BENCH_ROUNDS="${rounds}" \ + "${CONTAINER_NAME}" sh -c ' + : > "$BENCH_OUTFILE" + r=1 + while [ "$r" -le "$BENCH_ROUNDS" ]; do + n=0 + while IFS= read -r url && [ "$n" -lt "$BENCH_LIMIT" ]; do + ( curl -o /dev/null -s -I -w "%{time_namelookup}\t%{time_total}\n" --max-time "$BENCH_TIMEOUT" "$url" >> "$BENCH_OUTFILE" ) & + n=$((n+1)) + done < /tmp/bench-domains.txt + wait + r=$((r+1)) + done + ' + ) + if [[ "$use_timeout" == "timeout" ]] && command -v timeout >/dev/null 2>&1; then + timeout "${BENCH_EXEC_TIMEOUT}" "${cmd[@]}" + else + "${cmd[@]}" + fi +} + +# Copy URL file into container (create temp file, docker cp, rm). Uses BENCH_DOMAINS. +copy_url_file_to_container() { + local url_file="/tmp/bench-e2e-domains-$$.txt" + : > "${url_file}" + for d in "${BENCH_DOMAINS[@]}"; do + echo "https://${d}" >> "${url_file}" + done + docker cp "${url_file}" "${CONTAINER_NAME}:/tmp/bench-domains.txt" + rm -f "${url_file}" +} + +# Run warm-up + timed benchmark, collect timings. Writes /tmp/bench-e2e-{mode}-total.txt, -namelookup.txt, -wall.txt. +# Requires: CONTAINER_NAME running, /tmp/bench-domains.txt inside container. +run_workload() { + local mode="$1" + local out_total="/tmp/bench-e2e-${mode}-total.txt" + local out_namelookup="/tmp/bench-e2e-${mode}-namelookup.txt" + : > "$out_total" + : > "$out_namelookup" + + local first_url="https://${BENCH_DOMAINS[0]}" + sleep 1 + # HEAD request: no response body, only check DNS + TCP + TLS + HTTP response. + if !
docker exec "${CONTAINER_NAME}" curl -o /dev/null -s -I --max-time "${CURL_TIMEOUT}" "${first_url}"; then + info "Warm-up curl failed; stderr from one attempt:" + docker exec "${CONTAINER_NAME}" curl -o /dev/null -s -I --max-time 5 "${first_url}" 2>&1 || true + return 1 + fi + + info "Warm-up: first 10 domains, 1 round..." + bench_ret=0 + run_bench_to /tmp/bench-warmup.txt 10 1 2>/tmp/bench-e2e-stderr.txt || bench_ret=$? + if [[ "$bench_ret" -ne 0 ]]; then + info "Warm-up run failed (exit $bench_ret); continuing with timed run anyway." + fi + + info "Running ${TOTAL_REQUESTS} E2E requests (${ROUNDS} rounds × ${NUM_DOMAINS} domains) inside container (max ${BENCH_EXEC_TIMEOUT}s)..." + local start_ts + start_ts=$(date +%s.%N) + bench_ret=0 + run_bench_to /tmp/bench-raw.txt 9999 "${ROUNDS}" timeout 2>/tmp/bench-e2e-stderr.txt || bench_ret=$? + if [[ "$bench_ret" -ne 0 ]]; then + info "Benchmark run failed (exit $bench_ret) or hit timeout; using partial results if any." + fi + docker cp "${CONTAINER_NAME}:/tmp/bench-raw.txt" /tmp/bench-e2e-raw.txt 2>/dev/null || true + local end_ts + end_ts=$(date +%s.%N) + + if [[ -s /tmp/bench-e2e-stderr.txt ]]; then + info "docker exec stderr (first 10 lines):" + head -10 /tmp/bench-e2e-stderr.txt >&2 + fi + if [[ ! -f /tmp/bench-e2e-raw.txt ]]; then + : > /tmp/bench-e2e-raw.txt + fi + local lines + lines=$(wc -l < /tmp/bench-e2e-raw.txt 2>/dev/null || echo 0) + if [[ "$lines" -lt $((TOTAL_REQUESTS / 2)) ]]; then + info "WARN: only ${lines}/${TOTAL_REQUESTS} responses captured; curl may be failing inside container." + fi + + awk -F'\t' '{print $2}' /tmp/bench-e2e-raw.txt 2>/dev/null > "$out_total" + awk -F'\t' '{print $1}' /tmp/bench-e2e-raw.txt 2>/dev/null > "$out_namelookup" + local wall_s + wall_s=$(awk -v s="$start_ts" -v e="$end_ts" 'BEGIN { print e - s }') + echo "$wall_s" > "/tmp/bench-e2e-${mode}-wall.txt" +} + +# Run one benchmark phase: start container with given mode, push policy, run client workload, collect timings. 
+# Usage: run_phase "dns" | "dns+nft" +run_phase() { + local mode="$1" + info "Phase: ${mode}" + cleanup + docker run -d --name "${CONTAINER_NAME}" \ + --cap-add=NET_ADMIN \ + --sysctl net.ipv6.conf.all.disable_ipv6=1 \ + --sysctl net.ipv6.conf.default.disable_ipv6=1 \ + -e OPENSANDBOX_EGRESS_MODE="${mode}" \ + -p "${POLICY_PORT}:18080" \ + "${IMG}" + + for i in $(seq 1 30); do + if curl -sf "http://127.0.0.1:${POLICY_PORT}/healthz" >/dev/null 2>&1; then + break + fi + sleep 0.5 + done + + local policy_egress="" + for d in "${BENCH_DOMAINS[@]}"; do + policy_egress="${policy_egress}{\"action\":\"allow\",\"target\":\"${d}\"}," + done + policy_egress="${policy_egress%,}" + local policy_json="{\"defaultAction\":\"deny\",\"egress\":[${policy_egress}]}" + curl -sf -XPOST "http://127.0.0.1:${POLICY_PORT}/policy" -d "${policy_json}" >/dev/null + + copy_url_file_to_container + run_workload "${mode}" +} + +# Run baseline phase: plain curl container, no egress container. Same workload for comparison. 
+run_phase_baseline() { + info "Phase: baseline (no egress)" + cleanup + docker pull "${BASELINE_IMG}" > /dev/null 2>&1 + docker run -d --name "${CONTAINER_NAME}" "${BASELINE_IMG}" sleep 3600 + sleep 2 + copy_url_file_to_container + run_workload "baseline" +} + +# Print comparison table (baseline, dns, dns+nft) +report() { + local nb n1 n2 avg0 avg1 avg2 p50_0 p50_1 p50_2 p99_0 p99_1 p99_2 wall0 wall1 wall2 + read -r nb avg0 p50_0 p99_0 <<< "$(stats /tmp/bench-e2e-baseline-total.txt)" + read -r n1 avg1 p50_1 p99_1 <<< "$(stats /tmp/bench-e2e-dns-total.txt)" + read -r n2 avg2 p50_2 p99_2 <<< "$(stats /tmp/bench-e2e-dns+nft-total.txt)" + wall0=$(cat /tmp/bench-e2e-baseline-wall.txt 2>/dev/null || echo "0") + wall1=$(cat /tmp/bench-e2e-dns-wall.txt 2>/dev/null || echo "0") + wall2=$(cat /tmp/bench-e2e-dns+nft-wall.txt 2>/dev/null || echo "0") + if [[ "${nb:-0}" -eq 0 ]] || [[ "${n1:-0}" -eq 0 ]] || [[ "${n2:-0}" -eq 0 ]]; then + echo "WARN: some phases had no successful requests; check container logs and network." + fi + + local rps0 rps1 rps2 + rps0=$(awk -v n="$nb" -v w="$wall0" 'BEGIN { print (w>0 && n>0) ? n/w : 0 }') + rps1=$(awk -v n="$n1" -v w="$wall1" 'BEGIN { print (w>0 && n>0) ? n/w : 0 }') + rps2=$(awk -v n="$n2" -v w="$wall2" 'BEGIN { print (w>0 && n>0) ? n/w : 0 }') + + echo "" + echo "========== E2E benchmark: baseline vs dns vs dns+nft ==========" + echo "Workload: ${TOTAL_REQUESTS} requests (${ROUNDS} rounds × ${NUM_DOMAINS} domains)" + echo "" + local ov_avg1 ov_p50_1 ov_p99_1 ov_rps1 ov_avg2 ov_p50_2 ov_p99_2 ov_rps2 + ov_avg1=$(awk -v a="$avg1" -v b="$avg0" 'BEGIN { printf "%+.1f", (b>0 && b!="") ? (a-b)/b*100 : 0 }') + ov_p50_1=$(awk -v a="$p50_1" -v b="$p50_0" 'BEGIN { printf "%+.1f", (b>0 && b!="") ? (a-b)/b*100 : 0 }') + ov_p99_1=$(awk -v a="$p99_1" -v b="$p99_0" 'BEGIN { printf "%+.1f", (b>0 && b!="") ? (a-b)/b*100 : 0 }') + ov_rps1=$(awk -v a="$rps1" -v b="$rps0" 'BEGIN { printf "%+.1f", (b>0 && b!="") ? 
(a-b)/b*100 : 0 }') + ov_avg2=$(awk -v a="$avg2" -v b="$avg0" 'BEGIN { printf "%+.1f", (b>0 && b!="") ? (a-b)/b*100 : 0 }') + ov_p50_2=$(awk -v a="$p50_2" -v b="$p50_0" 'BEGIN { printf "%+.1f", (b>0 && b!="") ? (a-b)/b*100 : 0 }') + ov_p99_2=$(awk -v a="$p99_2" -v b="$p99_0" 'BEGIN { printf "%+.1f", (b>0 && b!="") ? (a-b)/b*100 : 0 }') + ov_rps2=$(awk -v a="$rps2" -v b="$rps0" 'BEGIN { printf "%+.1f", (b>0 && b!="") ? (a-b)/b*100 : 0 }') + + printf "%-10s %14s %20s %20s %20s\n" "Mode" "Req/s" "Avg(s)" "P50(s)" "P99(s)" + printf "%-10s %14s %20s %20s %20s\n" "baseline" "$rps0" "$avg0" "$p50_0" "$p99_0" + printf "%-10s %14s %20s %20s %20s\n" "dns" "$(printf '%.2f(%s%%)' "$rps1" "$ov_rps1")" "$(printf '%.3f(%s%%)' "$avg1" "$ov_avg1")" "$(printf '%.3f(%s%%)' "$p50_1" "$ov_p50_1")" "$(printf '%.3f(%s%%)' "$p99_1" "$ov_p99_1")" + printf "%-10s %14s %20s %20s %20s\n" "dns+nft" "$(printf '%.2f(%s%%)' "$rps2" "$ov_rps2")" "$(printf '%.3f(%s%%)' "$avg2" "$ov_avg2")" "$(printf '%.3f(%s%%)' "$p50_2" "$ov_p50_2")" "$(printf '%.3f(%s%%)' "$p99_2" "$ov_p99_2")" + echo "" + echo "Overhead in parentheses vs baseline: latency +% = slower, Req/s -% = lower throughput." + echo "baseline: Plain container (${BASELINE_IMG}), no egress container." + echo "dns: DNS proxy only, no nft write (pass-through)." + echo "dns+nft: DNS proxy + sync AddResolvedIPs before each DNS reply (L2 enforcement)." + echo "" + echo "Note: Warm-up runs before each phase. Baseline gives no-proxy comparison." + echo "==========" +} + +info "Building image ${IMG}" +docker build -t "${IMG}" .
> /dev/null 2>&1 + +run_phase_baseline +run_phase "dns+nft" +run_phase "dns" +report +info "Cleaning up" +cleanup diff --git a/components/egress/tests/hostname.txt b/components/egress/tests/hostname.txt new file mode 100644 index 00000000..9a4b895f --- /dev/null +++ b/components/egress/tests/hostname.txt @@ -0,0 +1,101 @@ +example.com +example.org +example.net +example.edu +example.io +github.com +github.io +google.com +cloudflare.com +amazon.com +wikipedia.org +mozilla.org +apple.com +microsoft.com +yahoo.com +facebook.com +twitter.com +instagram.com +linkedin.com +reddit.com +stackoverflow.com +npmjs.com +python.org +golang.org +rust-lang.org +docker.com +kubernetes.io +apache.org +gnu.org +kernel.org +ibm.com +oracle.com +openai.com +anthropic.com +stripe.com +slack.com +dropbox.com +spotify.com +netflix.com +twitch.tv +discord.com +zoom.us +medium.com +substack.com +blogger.com +tumblr.com +imgur.com +flickr.com +vimeo.com +soundcloud.com +bandcamp.com +patreon.com +kickstarter.com +etsy.com +ebay.com +craigslist.org +alibaba.com +bing.com +duckduckgo.com +brave.com +opera.com +protonmail.com +fastmail.com +zoho.com +notion.so +trello.com +asana.com +atlassian.com +bitbucket.org +gitlab.com +sourceforge.net +codepen.io +vercel.com +netlify.com +heroku.com +digitalocean.com +linode.com +vultr.com +ovh.com +hetzner.com +scaleway.com +archlinux.org +debian.org +ubuntu.com +fedoraproject.org +opensuse.org +freebsd.org +openbsd.org +mysql.com +mongodb.com +redis.io +elastic.co +nodejs.org +reactjs.org +vuejs.org +svelte.dev +nextjs.org +nuxtjs.org +jquery.com +bootstrap.com +tailwindcss.com diff --git a/components/egress/tests/smoke-dynamic-ip.sh b/components/egress/tests/smoke-dynamic-ip.sh new file mode 100755 index 00000000..f920cdfc --- /dev/null +++ b/components/egress/tests/smoke-dynamic-ip.sh @@ -0,0 +1,83 @@ +#!/bin/bash + +# Copyright 2026 Alibaba Group Holding Ltd. 
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+# Smoke test: default deny + domain allow in dns+nft mode.
+# Verifies that allowing a domain causes its resolved IP to be added to nft (dynamic IP),
+# so that curl to that domain succeeds without static IP/CIDR in policy.
+
+set -euo pipefail
+
+IMG="opensandbox/egress:local"
+containerName="egress-smoke-dynamic-ip"
+POLICY_PORT=18080
+
+info() { echo "[$(date +%H:%M:%S)] $*"; }
+
+cleanup() {
+  docker rm -f "${containerName}" >/dev/null 2>&1 || true
+}
+trap cleanup EXIT
+
+info "Building image ${IMG}"
+docker build -t "${IMG}" .
+
+info "Starting sidecar (dns+nft)"
+docker run -d --name "${containerName}" \
+  --cap-add=NET_ADMIN \
+  --sysctl net.ipv6.conf.all.disable_ipv6=1 \
+  --sysctl net.ipv6.conf.default.disable_ipv6=1 \
+  -e OPENSANDBOX_EGRESS_MODE=dns+nft \
+  -p ${POLICY_PORT}:18080 \
+  "${IMG}"
+
+info "Waiting for policy server..."
+for i in $(seq 1 50); do
+  if curl -sf "http://127.0.0.1:${POLICY_PORT}/healthz" >/dev/null; then
+    break
+  fi
+  sleep 0.5
+done
+
+info "Pushing policy (default deny; allow example.com only)"
+curl -sSf -XPOST "http://127.0.0.1:${POLICY_PORT}/policy" \
+  -d '{"defaultAction":"deny","egress":[{"action":"allow","target":"example.com"}]}'
+
+run_in_app() {
+  docker run --rm --network container:"${containerName}" curlimages/curl "$@"
+}
+
+pass() { info "PASS: $*"; }
+fail() { echo "FAIL: $*" >&2; exit 1; }
+
+info "Test: allowed domain (example.com) should succeed via dynamic IP"
+run_in_app -I https://example.com --max-time 15 >/dev/null 2>&1 || fail "example.com should succeed (DNS allow + dynamic IP in nft)"
+pass "example.com allowed"
+
+info "Test: denied domain (api.github.com) should fail"
+if run_in_app -I https://api.github.com --max-time 8 >/dev/null 2>&1; then
+  fail "api.github.com should be blocked"
+else
+  pass "api.github.com blocked"
+fi
+
+info "Test: denied IP (1.1.1.1) should fail"
+if run_in_app -I 1.1.1.1 --max-time 8 >/dev/null 2>&1; then
+  fail "1.1.1.1 should be blocked"
+else
+  pass "1.1.1.1 blocked"
+fi
+
+info "All smoke tests (dynamic IP) passed."
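The percentages in parentheses in the bench report are relative differences against the baseline run. Below is a minimal sketch of that arithmetic. It assumes a single formula, (mode - baseline) / baseline * 100, is applied to every column, so that higher latency reads as a positive percentage and lower Req/s reads as a negative one, which is what the report's legend describes. The helper name `overhead_pct` is hypothetical and does not exist in `tests/bench-dns-nft.sh`.

```bash
#!/bin/bash
# Hypothetical helper (not part of the egress tests) that sketches the
# relative-overhead arithmetic used by the bench report:
#   overhead% = (mode - baseline) / baseline * 100
# An empty or non-positive baseline yields 0, mirroring the awk guard in the script.
overhead_pct() {
  local mode="$1" baseline="$2"
  awk -v a="$mode" -v b="$baseline" \
    'BEGIN { printf "%+.1f\n", (b != "" && b > 0) ? (a - b) / b * 100 : 0 }'
}

# Latency above baseline => positive (slower).
overhead_pct 0.120 0.100   # -> +20.0
# Throughput below baseline => negative (fewer requests per second).
overhead_pct 80 100        # -> -20.0
```

Guarding on a missing or zero baseline keeps awk from dividing by zero and prints `+0.0` instead.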