16 changes: 8 additions & 8 deletions .claude/commands/review-challenge.md
@@ -1,5 +1,5 @@
---
-allowed-tools: Bash(kubeasy-cli*),Bash(kubectl*),Bash(cat*),Bash(grep*),Bash(ls*),Bash(sleep*),Bash(head*),Bash(tail*),Read,Write,Edit
+allowed-tools: Bash(kubeasy*),Bash(kubectl*),Bash(cat*),Bash(grep*),Bash(ls*),Bash(sleep*),Bash(head*),Bash(tail*),Read,Write,Edit
description: Review a Kubeasy challenge for quality, pedagogy, and bypass resistance
---

@@ -32,7 +32,7 @@ You must experience the challenge as a learner first.
Run structural validation before deploying anything:

```bash
-kubeasy-cli dev lint <slug>
+kubeasy dev lint <slug>
```

If lint fails → **stop the review immediately**, score 0/20, verdict ❌ Fail.
@@ -41,15 +41,15 @@ Write the PR comment with lint errors and exit.
### Phase 3: Deploy and Verify Broken State

```bash
-kubeasy-cli dev apply <slug> --clean
+kubeasy dev apply <slug> --clean
sleep 10
-kubeasy-cli dev status <slug>
+kubeasy dev status <slug>
```

**Then immediately run validations:**

```bash
-kubeasy-cli dev validate <slug>
+kubeasy dev validate <slug>
```

All validations MUST FAIL at this point. This confirms the broken state is real.
@@ -74,7 +74,7 @@ kubectl get events -n <slug> --sort-by='.lastTimestamp'

1. Form a hypothesis about what's wrong
2. Apply a fix using `kubectl`
-3. Verify with `kubeasy-cli dev validate <slug>`
+3. Verify with `kubeasy dev validate <slug>`

**Maximum 5 attempts.** If you can't solve it after 5 tries, flag the challenge and continue.
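The bounded attempt loop above can be sketched in shell. This is an illustrative skeleton only: `validate` is a stub standing in for `kubeasy dev validate <slug>`, and the pass-on-attempt-3 behavior is invented for the demo.

```shell
# Sketch of the hypothesis -> fix -> verify loop, capped at 5 attempts.
# `validate` is a stub standing in for `kubeasy dev validate <slug>`;
# here it simply pretends validation starts passing on attempt 3.
validate() { [ "$1" -ge 3 ]; }

solved=""
attempt=1
while [ "$attempt" -le 5 ]; do
  # (form a hypothesis and apply a kubectl fix here)
  if validate "$attempt"; then
    solved="$attempt"
    break
  fi
  attempt=$((attempt + 1))
done
if [ -n "$solved" ]; then
  echo "solved on attempt $solved"
else
  echo "unsolved after 5 attempts: flag the challenge and continue"
fi
```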

@@ -83,7 +83,7 @@ kubectl get events -n <slug> --sort-by='.lastTimestamp'
Reset to broken state:

```bash
-kubeasy-cli dev apply <slug> --clean
+kubeasy dev apply <slug> --clean
sleep 10
```

@@ -167,7 +167,7 @@ Write a spoiler-free PR comment to `review-<slug>-pr-comment.md` in the current
### Phase 10: Clean up

```bash
-kubeasy-cli dev clean <slug>
+kubeasy dev clean <slug>
```

## Spoiler-Free Writing Guide
116 changes: 116 additions & 0 deletions cascading-blackout/challenge.yaml
@@ -0,0 +1,116 @@
title: "Cascading Blackout"
type: "fix"
theme: "networking"
difficulty: "hard"
estimatedTime: 30

description: |
  The order-processing platform was running perfectly until a recent security hardening push.
  The edge proxy returns HTTP 200 on its health endpoint, but actual order requests
  fail silently — customers see empty responses or timeouts.
  The team reports that "nothing changed in the application code" —
  but the infrastructure change touched multiple components at once.

initialSituation: |
  A three-tier order processing system is deployed in the namespace:
  - An edge proxy (nginx) that routes requests to a backend service
  - A backend application that processes orders and caches results
  - A Redis cache used by the backend for session and order data
  Each tier has its own Deployment and Service, and all pods are running.
  After a recent infrastructure change, the edge proxy health check still works,
  but end-to-end order requests fail.
  The security hardening introduced several changes simultaneously —
  investigate each tier carefully before concluding the root cause.

objective: |
  Restore full end-to-end communication across the platform.
  Orders submitted through the edge proxy must reach the backend,
  and the backend must be able to read and write to the cache.
  All services should remain healthy and reachable through their Services.

objectives:
  - key: gateway-running
    title: "Gateway Online"
    description: "The edge proxy pods must be running and ready"
    order: 1
    type: condition
    spec:
      target:
        kind: Pod
        labelSelector:
          app: edge-proxy
      checks:
        - type: Ready
          status: "True"

  - key: backend-running
    title: "Backend Online"
    description: "The backend pods must be running and ready"
    order: 2
    type: condition
    spec:
      target:
        kind: Pod
        labelSelector:
          app: order-backend
      checks:
        - type: Ready
          status: "True"

  - key: cache-running
    title: "Cache Online"
    description: "The cache pods must be running and ready"
    order: 3
    type: condition
    spec:
      target:
        kind: Pod
        labelSelector:
          app: order-cache
      checks:
        - type: Ready
          status: "True"

  - key: gateway-to-backend
    title: "Gateway Reaches Backend"
    description: "The edge proxy must be able to forward requests to the backend service"
    order: 4
    type: connectivity
    spec:
      sourcePod:
        labelSelector:
          app: edge-proxy
      targets:
        - url: "http://order-backend:8080/health"
          expectedStatusCode: 200
          timeoutSeconds: 5

  - key: backend-service-identity
    title: "Backend Service Classification"
    description: "The backend pods are correctly classified within the platform"
    order: 5
    type: condition
    spec:
      target:
        kind: Pod
        labelSelector:
          app: order-backend
          tier: backend
      checks:
        - type: Initialized
          status: "True"

  - key: backend-healthy
    title: "Backend Fully Operational"
    description: "The backend reports healthy status including cache connectivity"
    order: 6
    type: log
    spec:
      target:
        kind: Pod
        labelSelector:
          app: order-backend
      container: order-backend
      expectedStrings:
        - "ready to accept connections"
      sinceSeconds: 120
64 changes: 64 additions & 0 deletions cascading-blackout/manifests/backend.yaml
@@ -0,0 +1,64 @@
apiVersion: apps/v1
kind: Deployment
metadata:
  name: order-backend
  namespace: cascading-blackout
  labels:
    app: order-backend
spec:
  replicas: 1
  selector:
    matchLabels:
      app: order-backend
  template:
    metadata:
      labels:
        app: order-backend
    spec:
      containers:
        - name: order-backend
          image: busybox:1.36
          ports:
            - containerPort: 8080
          command:
            - /bin/sh
            - -c
            - |
              # Background: check cache and update state flag
              while true; do
                if nc -z -w2 order-cache 6379 2>/dev/null; then
                  echo "[$(date)] ready to accept connections"
                  touch /tmp/cache_ok
                else
                  echo "[$(date)] ERROR: cannot reach cache at order-cache:6379"
                  rm -f /tmp/cache_ok
                fi
                sleep 5
              done &
              # Foreground: HTTP server reflects cache state
              while true; do
                if [ -f /tmp/cache_ok ]; then
                  echo -e "HTTP/1.1 200 OK\r\nContent-Type: text/plain\r\n\r\nok" | nc -l -p 8080 -w5 || true
                else
                  echo -e "HTTP/1.1 503 Service Unavailable\r\nContent-Type: text/plain\r\n\r\nunavailable" | nc -l -p 8080 -w5 || true
                fi
              done
          readinessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 10
            failureThreshold: 3
---
apiVersion: v1
kind: Service
metadata:
  name: order-backend
  namespace: cascading-blackout
spec:
  selector:
    app: order-backend
  ports:
    - port: 8080
      targetPort: 8080
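A note on the container script above: the background cache checker and the foreground HTTP loop communicate through a flag file (`/tmp/cache_ok`). The pattern can be exercised locally without a cluster — in this sketch, a shell function stands in for the `nc -z -w2 order-cache 6379` probe:

```shell
# Local sketch of the flag-file pattern from the backend script above.
# `check_cache` stands in for `nc -z -w2 order-cache 6379`; no cluster needed.
flag="./cache_ok.flag"
check_cache() { true; }   # flip to `false` to simulate an unreachable cache

# Checker side: set or clear the flag based on the probe result.
if check_cache; then touch "$flag"; else rm -f "$flag"; fi

# Server side: the HTTP status is derived purely from the flag's presence.
if [ -f "$flag" ]; then
  status="200 OK"
else
  status="503 Service Unavailable"
fi
echo "backend would answer: HTTP/1.1 $status"
rm -f "$flag"
```

This is why the challenge's log objective can key on "ready to accept connections": that line is only printed when the probe succeeds.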
41 changes: 41 additions & 0 deletions cascading-blackout/manifests/cache.yaml
@@ -0,0 +1,41 @@
apiVersion: apps/v1
kind: Deployment
metadata:
  name: order-cache
  namespace: cascading-blackout
  labels:
    app: order-cache
    tier: cache
spec:
  replicas: 1
  selector:
    matchLabels:
      app: order-cache
  template:
    metadata:
      labels:
        app: order-cache
        tier: cache
    spec:
      containers:
        - name: redis
          image: redis:7-alpine
          ports:
            - containerPort: 6379
          readinessProbe:
            exec:
              command: ["redis-cli", "ping"]
            initialDelaySeconds: 5
            periodSeconds: 5
---
apiVersion: v1
kind: Service
metadata:
  name: order-cache
  namespace: cascading-blackout
spec:
  selector:
    app: order-cache
  ports:
    - port: 6379
      targetPort: 6379
78 changes: 78 additions & 0 deletions cascading-blackout/manifests/gateway.yaml
@@ -0,0 +1,78 @@
apiVersion: apps/v1
kind: Deployment
metadata:
  name: edge-proxy
  namespace: cascading-blackout
  labels:
    app: edge-proxy
    tier: frontend
spec:
  replicas: 1
  selector:
    matchLabels:
      app: edge-proxy
  template:
    metadata:
      labels:
        app: edge-proxy
        tier: frontend
    spec:
      containers:
        - name: edge-proxy
          image: nginx:1.25-alpine
          ports:
            - containerPort: 80
          volumeMounts:
            - name: nginx-config
              mountPath: /etc/nginx/conf.d/default.conf
              subPath: default.conf
          readinessProbe:
            httpGet:
              path: /healthz
              port: 80
            initialDelaySeconds: 5
            periodSeconds: 5
          livenessProbe:
            httpGet:
              path: /healthz
              port: 80
            initialDelaySeconds: 10
            periodSeconds: 10
      volumes:
        - name: nginx-config
          configMap:
            name: gateway-config
---
apiVersion: v1
kind: Service
metadata:
  name: edge-proxy
  namespace: cascading-blackout
spec:
  selector:
    app: edge-proxy
  ports:
    - port: 80
      targetPort: 80
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: gateway-config
  namespace: cascading-blackout
data:
  default.conf: |
    server {
      listen 80;

      location /healthz {
        return 200 'ok';
        add_header Content-Type text/plain;
      }

      location /api/ {
        proxy_pass http://order-backend:8080/;
        proxy_connect_timeout 5s;
        proxy_read_timeout 10s;
      }
    }
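One reviewer-relevant detail in the nginx config above: because the `proxy_pass` URI ends with `/`, nginx replaces the matched `/api/` location prefix rather than forwarding the path verbatim, so `GET /api/orders/42` reaches the backend as `/orders/42`. A pure-shell illustration of that mapping (the request path is a made-up example; no nginx involved):

```shell
# Shell illustration of nginx's prefix replacement for
# `location /api/ { proxy_pass http://order-backend:8080/; }`:
# the matched "/api/" prefix is swapped for the "/" on proxy_pass.
request_uri="/api/orders/42"
upstream_path="/${request_uri#/api/}"
echo "$request_uri -> $upstream_path"
```

If the trailing slash were dropped from `proxy_pass`, the backend would instead receive the full `/api/...` path, which is easy to miss when debugging a proxy tier.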