Skip to content

[RORDEV-1940] Fix ES 8.0.x/8.1.x container startup crash caused by JDK 17.0.2 cgroup v2 bug#73

Open
coutoPL wants to merge 5 commits intomasterfrom
bugfix/RORDEV-1940
Open

[RORDEV-1940] Fix ES 8.0.x/8.1.x container startup crash caused by JDK 17.0.2 cgroup v2 bug#73
coutoPL wants to merge 5 commits intomasterfrom
bugfix/RORDEV-1940

Conversation

@coutoPL
Copy link
Collaborator

@coutoPL coutoPL commented Feb 14, 2026

Summary by CodeRabbit

  • Bug Fixes

    • Replaced bundled Elasticsearch JDK with compatible Corretto builds to fix ES 7.x/8.x compatibility.
    • Added an explicit image pull policy for more consistent ES container behavior.
  • New Features

    • Automatically builds and loads JDK-patched Elasticsearch images during environment setup so compatible images are available before services start.
    • Added a non-destructive check mode to detect whether a JDK patch is needed.

@coderabbitai

This comment was marked as spam.

coderabbitai[bot]

This comment was marked as resolved.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@environments/eck-ror/start.sh`:
- Line 158: The curl command that downloads Corretto uses the insecure -k flag;
remove the -k option from the curl invocation so TLS certificate verification
remains enabled (the line containing curl -fsSLk
"https://corretto.aws/downloads/resources/${CORRETTO_VERSION}/amazon-corretto-${CORRETTO_VERSION}-linux-${ARCH}.tar.gz"
-o /tmp/jdk.tar.gz should be changed to drop -k), keeping the rest of the flags
(-fsSL) and output path (/tmp/jdk.tar.gz) unchanged.

In `@environments/elk-ror/images/es/Dockerfile`:
- Line 31: The curl invocation that downloads Corretto uses the insecure -k
flag; remove -k from the curl command in the Dockerfile line containing "curl
-fsSLk
\"https://corretto.aws/downloads/resources/${CORRETTO_VERSION}/amazon-corretto-${CORRETTO_VERSION}-linux-${ARCH}.tar.gz\""
and also remove -k from the corresponding curl in start.sh (around the download
at ~line 158) so TLS certificate verification is enforced; keep the other flags
(-fsSL) unchanged and verify the commands still work against the corretto.aws
endpoint.
🧹 Nitpick comments (1)
environments/eck-ror/start.sh (1)

130-168: Duplicated version-mapping logic between this script and the Dockerfile.

The ES version → Corretto version mapping is duplicated in environments/elk-ror/images/es/Dockerfile (lines 17-27). If the affected version ranges change, both places must be updated in lockstep. Consider extracting the mapping to a shared file (e.g., a sourced shell snippet or a simple lookup table) to keep it DRY.

if [ -n "$CORRETTO_VERSION" ]; then \
ARCH=$(uname -m | sed 's/x86_64/x64/' | sed 's/arm64/aarch64/') && \
echo "Replacing buggy bundled JDK with Corretto $CORRETTO_VERSION for ES $ES_VERSION (arch: $ARCH)" && \
curl -fsSLk "https://corretto.aws/downloads/resources/${CORRETTO_VERSION}/amazon-corretto-${CORRETTO_VERSION}-linux-${ARCH}.tar.gz" -o /tmp/jdk.tar.gz && \
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Remove -k (insecure) flag from curl.

The -k flag disables TLS certificate verification, allowing MITM attacks during the JDK download. The corretto.aws endpoint has valid certificates, so this flag is unnecessary. The same issue exists in start.sh line 158.

Proposed fix
-      curl -fsSLk "https://corretto.aws/downloads/resources/${CORRETTO_VERSION}/amazon-corretto-${CORRETTO_VERSION}-linux-${ARCH}.tar.gz" -o /tmp/jdk.tar.gz && \
+      curl -fsSL "https://corretto.aws/downloads/resources/${CORRETTO_VERSION}/amazon-corretto-${CORRETTO_VERSION}-linux-${ARCH}.tar.gz" -o /tmp/jdk.tar.gz && \
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
curl -fsSLk "https://corretto.aws/downloads/resources/${CORRETTO_VERSION}/amazon-corretto-${CORRETTO_VERSION}-linux-${ARCH}.tar.gz" -o /tmp/jdk.tar.gz && \
curl -fsSL "https://corretto.aws/downloads/resources/${CORRETTO_VERSION}/amazon-corretto-${CORRETTO_VERSION}-linux-${ARCH}.tar.gz" -o /tmp/jdk.tar.gz && \
🤖 Prompt for AI Agents
In `@environments/elk-ror/images/es/Dockerfile` at line 31, The curl invocation
that downloads Corretto uses the insecure -k flag; remove -k from the curl
command in the Dockerfile line containing "curl -fsSLk
\"https://corretto.aws/downloads/resources/${CORRETTO_VERSION}/amazon-corretto-${CORRETTO_VERSION}-linux-${ARCH}.tar.gz\""
and also remove -k from the corresponding curl in start.sh (around the download
at ~line 158) so TLS certificate verification is enforced; keep the other flags
(-fsSL) unchanged and verify the commands still work against the corretto.aws
endpoint.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@environments/common/images/es-jdk-patch/patch-es-jdk.sh`:
- Around line 12-14: Validate ES_VERSION before using cut: check that ES_VERSION
is set and matches a semantic version pattern (e.g., digits.digits.digits) and
exit with a clear error if not; only after that parse MAJOR, MINOR, PATCH from
ES_VERSION and proceed with the existing numeric comparisons in the script
(referencing ES_VERSION, MAJOR, MINOR, PATCH). Ensure the guard triggers under
set -e (use an explicit if [[ -z "$ES_VERSION" || ! "$ES_VERSION" =~
^[0-9]+(\.[0-9]+){1,2}$ ]]; then ... fi) so downstream arithmetic won't run on
empty values.

In `@environments/elk-ror/images/es/Dockerfile`:
- Line 19: The Dockerfile ends with "USER root" which leaves the container
running as root; either switch back to the non-root Elasticsearch user after
privileged setup or document why root is required. Modify the Dockerfile so
privileged operations (package install, permission/chown for ReadonlyREST
patching) run as root, then add a final "USER elasticsearch" (or the appropriate
non-root username used in this image) to drop privileges, or if reverting to
non-root is not possible for E2E needs, add a clear comment next to the "USER
root" instruction explaining the ReadonlyREST patching workflow and why root is
required for this test image.
🧹 Nitpick comments (3)
environments/elk-ror/start.sh (1)

119-125: Consider skipping the patch build for unaffected ES versions.

The patch script supports a --check mode to test whether patching is needed. Currently, this builds a new Docker image for every ES version, even when no JDK replacement occurs. You could gate this step to avoid unnecessary image builds:

💡 Suggested optimization
-echo "Building JDK-patched ES base image ..."
-export ES_PATCHED_IMAGE="es-ror-patched:${ES_VERSION}"
-docker build \
-  --build-arg BASE_IMAGE="${ROR_ES_REPO}:${ES_VERSION}-ror-${ROR_ES_VERSION}" \
-  --build-arg ES_VERSION="$ES_VERSION" \
-  -t "$ES_PATCHED_IMAGE" \
-  ../common/images/es-jdk-patch/
+BASE_ES_IMAGE="${ROR_ES_REPO}:${ES_VERSION}-ror-${ROR_ES_VERSION}"
+if ES_VERSION="$ES_VERSION" ../common/images/es-jdk-patch/patch-es-jdk.sh --check 2>/dev/null; then
+  echo "Building JDK-patched ES base image ..."
+  export ES_PATCHED_IMAGE="es-ror-patched:${ES_VERSION}"
+  docker build \
+    --build-arg BASE_IMAGE="$BASE_ES_IMAGE" \
+    --build-arg ES_VERSION="$ES_VERSION" \
+    -t "$ES_PATCHED_IMAGE" \
+    ../common/images/es-jdk-patch/
+else
+  echo "No JDK patch needed for ES $ES_VERSION, using base image directly."
+  export ES_PATCHED_IMAGE="$BASE_ES_IMAGE"
+fi
environments/common/images/es-jdk-patch/Dockerfile (1)

1-7: Consider restoring the original user after patching.

Trivy flags USER root as the last instruction (DS-0002). While this is an intermediate image and the downstream es/Dockerfile sets its own USER, restoring elasticsearch here would be defensive and make this image safe to use directly as well.

Suggested improvement
 ARG ES_VERSION
 USER root
 COPY patch-es-jdk.sh /tmp/patch-es-jdk.sh
 RUN chmod +x /tmp/patch-es-jdk.sh && ES_VERSION=${ES_VERSION} /tmp/patch-es-jdk.sh && rm -f /tmp/patch-es-jdk.sh
+USER elasticsearch
environments/common/images/es-jdk-patch/patch-es-jdk.sh (1)

29-32: --check mode: consider placing it before the version parsing to clarify intent, or document that it requires ES_VERSION.

Minor: the --check argument is parsed after version logic, which is fine functionally, but could be confusing if someone passes --check without ES_VERSION (it would error out on the arithmetic comparisons before reaching line 29).

Comment on lines +12 to +14
MAJOR=$(echo "$ES_VERSION" | cut -d. -f1)
MINOR=$(echo "$ES_VERSION" | cut -d. -f2)
PATCH=$(echo "$ES_VERSION" | cut -d. -f3)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Missing validation for ES_VERSION.

If ES_VERSION is unset or empty, cut will produce empty strings and the arithmetic comparisons on lines 17–26 will fail with errors under set -e. Add a guard:

Suggested fix
+if [ -z "$ES_VERSION" ]; then
+  echo "ES_VERSION is not set" >&2
+  exit 1
+fi
+
 MAJOR=$(echo "$ES_VERSION" | cut -d. -f1)
 MINOR=$(echo "$ES_VERSION" | cut -d. -f2)
 PATCH=$(echo "$ES_VERSION" | cut -d. -f3)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
MAJOR=$(echo "$ES_VERSION" | cut -d. -f1)
MINOR=$(echo "$ES_VERSION" | cut -d. -f2)
PATCH=$(echo "$ES_VERSION" | cut -d. -f3)
if [ -z "$ES_VERSION" ]; then
echo "ES_VERSION is not set" >&2
exit 1
fi
MAJOR=$(echo "$ES_VERSION" | cut -d. -f1)
MINOR=$(echo "$ES_VERSION" | cut -d. -f2)
PATCH=$(echo "$ES_VERSION" | cut -d. -f3)
🤖 Prompt for AI Agents
In `@environments/common/images/es-jdk-patch/patch-es-jdk.sh` around lines 12 -
14, Validate ES_VERSION before using cut: check that ES_VERSION is set and
matches a semantic version pattern (e.g., digits.digits.digits) and exit with a
clear error if not; only after that parse MAJOR, MINOR, PATCH from ES_VERSION
and proceed with the existing numeric comparisons in the script (referencing
ES_VERSION, MAJOR, MINOR, PATCH). Ensure the guard triggers under set -e (use an
explicit if [[ -z "$ES_VERSION" || ! "$ES_VERSION" =~ ^[0-9]+(\.[0-9]+){1,2}$
]]; then ... fi) so downstream arithmetic won't run on empty values.

# For ROR_ES_VERSION < 1.64.0
ENV I_UNDERSTAND_IMPLICATION_OF_ES_PATCHING yes
USER root No newline at end of file
USER root
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Read the full Dockerfile to see context
cat -n environments/elk-ror/images/es/Dockerfile

Repository: beshu-tech/readonlyrest-e2e-tests

Length of output: 976


🏁 Script executed:

#!/bin/bash
# Check for entrypoint scripts in the same directory
ls -la environments/elk-ror/images/es/

Repository: beshu-tech/readonlyrest-e2e-tests

Length of output: 244


🏁 Script executed:

#!/bin/bash
# Search for any entrypoint or startup scripts that might handle privilege dropping
find environments/elk-ror/images/es -type f \( -name "entrypoint*" -o -name "start*" -o -name "docker-entrypoint*" \) 2>/dev/null | head -20

Repository: beshu-tech/readonlyrest-e2e-tests

Length of output: 59


🏁 Script executed:

#!/bin/bash
# Check other Dockerfiles in the repo to see common USER patterns
find . -name "Dockerfile*" -type f | xargs grep -l "USER" | head -10

Repository: beshu-tech/readonlyrest-e2e-tests

Length of output: 305


🏁 Script executed:

#!/bin/bash
# Check USER patterns in other Dockerfiles
for file in ./docker-based-ror-dev-env/src/Dockerfile ./environments/common/images/es-jdk-patch/Dockerfile ./environments/elk-ror/images/apm/Dockerfile ./environments/elk-ror/images/kbn/Dockerfile; do
  echo "=== $file ==="
  grep -n "USER" "$file" | tail -5
done

Repository: beshu-tech/readonlyrest-e2e-tests

Length of output: 364


🏁 Script executed:

#!/bin/bash
# Check if there's any entrypoint or startup logic in the ES_PATCHED_IMAGE base image reference or documentation
grep -r "ES_PATCHED_IMAGE\|entrypoint" environments/elk-ror/images/es/ 2>/dev/null || echo "No additional files found"

Repository: beshu-tech/readonlyrest-e2e-tests

Length of output: 206


🏁 Script executed:

#!/bin/bash
# Search for any build scripts or compose files that might show how the image is built/used
find . -name "docker-compose*" -o -name "*build*" | grep -E "(elk-ror|es)" | head -5

Repository: beshu-tech/readonlyrest-e2e-tests

Length of output: 141


🏁 Script executed:

#!/bin/bash
# Read the kbn Dockerfile to see the pattern
cat -n environments/elk-ror/images/kbn/Dockerfile

Repository: beshu-tech/readonlyrest-e2e-tests

Length of output: 1263


🏁 Script executed:

#!/bin/bash
# Check the buildRorDevEnvImageIfNecessary.sh to understand context
head -30 ./docker-based-ror-dev-env/src/buildRorDevEnvImageIfNecessary.sh

Repository: beshu-tech/readonlyrest-e2e-tests

Length of output: 444


USER root as the final Dockerfile instruction — container defaults to running as root.

Trivy correctly flags this (DS-0002). However, examining the related Kibana Dockerfile in the same environment shows that root privileges are intentionally needed for ReadonlyREST patching operations (e.g., package installation and permission/ownership modifications). This pattern is consistent across the E2E test environment images.

For a production image, consider either:

  • Dropping back to the elasticsearch user after any privileged initialization
  • Adding a comment explaining why root is required for the ROR patching workflow

For an E2E test environment, this may be acceptable as-is, but documenting the rationale would improve clarity.

🧰 Tools
🪛 Trivy (0.69.1)

[error] 19-19: Image user should not be 'root'

Last USER command in Dockerfile should not be 'root'

Rule: DS-0002

Learn more

(IaC/Dockerfile)

🤖 Prompt for AI Agents
In `@environments/elk-ror/images/es/Dockerfile` at line 19, The Dockerfile ends
with "USER root" which leaves the container running as root; either switch back
to the non-root Elasticsearch user after privileged setup or document why root
is required. Modify the Dockerfile so privileged operations (package install,
permission/chown for ReadonlyREST patching) run as root, then add a final "USER
elasticsearch" (or the appropriate non-root username used in this image) to drop
privileges, or if reverting to non-root is not possible for E2E needs, add a
clear comment next to the "USER root" instruction explaining the ReadonlyREST
patching workflow and why root is required for this test image.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant