[Feature] Prometheus + Grafana 모니터링 시스템 구축#138
Conversation
📝 Walkthrough

Adds a Prometheus/Grafana-based monitoring stack and the Micrometer Prometheus registry dependency, adjusts the application's Actuator/metrics configuration and profiles, and introduces a Docker Compose monitoring stack, Grafana provisioning (datasources, dashboards, alerting), Prometheus configuration, and a local/dev-only monitoring test controller.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant App as "Spring App\n(com.sopt.cherrish)"
    participant Prom as "Prometheus\n(prom/prometheus:9090)"
    participant Graf as "Grafana\n(grafana:3000)"
    participant Discord as "Discord\n(Webhook)"
    App->>Prom: exposes /actuator/prometheus (scrape target)
    Prom->>Prom: scrapes (15s interval)
    Graf->>Prom: dashboard/alert queries (evaluation)
    Graf->>Graf: evaluates alert rules (Error Rate, High Latency, Metrics Down)
    Graf-->>Discord: sends alerts via Discord webhook
```
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
Actionable comments posted: 10
🤖 Fix all issues with AI agents
In `@docker-compose.monitoring.yml`:
- Around line 27-28: The docker-compose uses a dangerous default admin password
via GF_SECURITY_ADMIN_PASSWORD=${GRAFANA_PASSWORD:-admin}; remove the hardcoded
fallback and require an explicit secret by changing to
GF_SECURITY_ADMIN_PASSWORD=${GRAFANA_PASSWORD} (and consider the same for
GF_SECURITY_ADMIN_USER), and add startup validation or container healthcheck
that fails fast when GRAFANA_PASSWORD is empty so deployments without a strong
password do not start.
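A minimal sketch of that change — the `${VAR:?message}` form makes Compose fail fast when the variable is unset or empty (`GRAFANA_ADMIN_USER` is a hypothetical variable name):

```yaml
# docker-compose.monitoring.yml (sketch)
environment:
  - GF_SECURITY_ADMIN_USER=${GRAFANA_ADMIN_USER:?GRAFANA_ADMIN_USER must be set}
  - GF_SECURITY_ADMIN_PASSWORD=${GRAFANA_PASSWORD:?GRAFANA_PASSWORD must be set}
```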
- Around line 36-37: Update the compose so Grafana waits for Prometheus to be
healthy rather than just started: add a healthcheck block to the Prometheus
service (define a reliable check that verifies Prometheus readiness) and change
Grafana's depends_on from the simple list to the condition form that references
prometheus with condition: service_healthy (i.e., use depends_on: prometheus:
condition: service_healthy). Ensure the healthcheck command and interval/retries
are appropriate for Prometheus readiness.
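A sketch of the suggested wiring, probing Prometheus's /-/healthy endpoint with wget:

```yaml
services:
  prometheus:
    healthcheck:
      test: ["CMD", "wget", "--no-verbose", "--tries=1", "--spider", "http://localhost:9090/-/healthy"]
      interval: 10s
      timeout: 5s
      retries: 3

  grafana:
    depends_on:
      prometheus:
        condition: service_healthy   # waits for healthy, not merely started
```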
In `@monitoring/grafana/provisioning/alerting/rules.yml`:
- Line 7: The current group evaluation interval of interval: 1m, combined with the 5m range used in the PromQL queries, can cause overlapping evaluations. Review the alerting rules' 'interval' setting and either increase the value (e.g., interval: 2m) or, conversely, shorten the query range (e.g., 1m) to balance sensitivity against resource usage. The target is the 'interval' entry in provisioning/alerting/rules.yml; after the change, test the alert frequency and check for duplicate evaluations to confirm the result.
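A sketch of the first option (the group name is hypothetical; only `interval` changes):

```yaml
# monitoring/grafana/provisioning/alerting/rules.yml (sketch)
groups:
  - name: cherrish-alerts   # hypothetical group name
    interval: 2m            # was 1m; reduces overlapping evaluations of 5m-range queries
```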
- Around line 56-57: Change the alert rules so that missing metrics don’t
silently show OK: update noDataState (currently set to OK) to a non-OK state
(e.g., NoData or Alert) for the rules that define noDataState and execErrState,
and/or add a dedicated "Metrics Collection Health" alert that monitors
up{job="cherrish"} so scrapes/down metrics trigger alerts; modify the entries
referencing noDataState and execErrState in the rules.yml and add the
up{job="cherrish"} rule as suggested.
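A sketch of the suggested "Metrics Collection Health" rule as a rule entry in Grafana's alert-provisioning format (the UID, datasource UID, time range, and `for` duration are assumptions):

```yaml
- uid: metrics-collection-health   # hypothetical UID
  title: Metrics Collection Health
  condition: A
  noDataState: Alerting            # instead of OK, so silent scrape failures surface
  execErrState: Alerting
  for: 2m
  data:
    - refId: A
      datasourceUid: prometheus    # assumed datasource UID
      relativeTimeRange: { from: 300, to: 0 }
      model:
        expr: up{job="cherrish"} == 0
```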
In `@monitoring/grafana/provisioning/dashboards/dashboard.yml`:
- Around line 1-11: Change the provider block in dashboard.yml to set
disableDeletion: true to prevent accidental removal of provisioned dashboards
(look for the providers list and the entry with name: 'Cherrish Dashboards'),
and consider increasing updateIntervalSeconds from 30 to a higher value (e.g.,
300) for production; ensure options.path remains
/etc/grafana/provisioning/dashboards/json and keep orgId, folder and type
unchanged.
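The resulting provider block might look like this (orgId and folder values are assumptions based on typical defaults):

```yaml
# monitoring/grafana/provisioning/dashboards/dashboard.yml (sketch)
apiVersion: 1
providers:
  - name: 'Cherrish Dashboards'
    orgId: 1
    folder: ''
    type: file
    disableDeletion: true        # prevents accidental removal of provisioned dashboards
    updateIntervalSeconds: 300   # was 30; gentler for production
    options:
      path: /etc/grafana/provisioning/dashboards/json
```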
In `@monitoring/grafana/provisioning/dashboards/json/cherrish-overview.json`:
- Line 501: The dashboard's refresh interval is set too aggressively via the
JSON key "refresh": "5s"; update the "refresh" value in cherrish-overview.json
from "5s" to a less frequent interval such as "30s" or "1m" to reduce load on
Prometheus/Grafana and avoid unnecessary scraping; locate the "refresh" entry
(currently "refresh": "5s") and replace it with the chosen value, then validate
the JSON and reload the dashboard.
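The one-line edit in cherrish-overview.json:

```json
"refresh": "30s"
```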
- Around line 47-54: Update the threshold "steps" values to match each panel's
"unit" instead of the blanket 80: for the "CPU Usage" panel (unit "percentunit",
max 1) change the red threshold step value from 80 to 0.8; for the "JVM Heap
Memory" panel (unit "bytes") either convert the 80 to a byte value (e.g., 0.8 *
configured max heap in bytes) or switch that panel's unit to a percentage unit
and set the threshold value to 0.8; for the "GC Pause Time" panel (unit "s")
replace 80 with a realistic seconds threshold (e.g., 0.5 for 500ms or another
appropriate value). Edit the JSON objects under each panel's "thresholds.steps"
and/or "unit" properties accordingly.
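For the "CPU Usage" panel, the corrected fragment could look like this (the surrounding fieldConfig structure is assumed from standard Grafana dashboard JSON):

```json
"fieldConfig": {
  "defaults": {
    "unit": "percentunit",
    "max": 1,
    "thresholds": {
      "mode": "absolute",
      "steps": [
        { "color": "green", "value": null },
        { "color": "red", "value": 0.8 }
      ]
    }
  }
}
```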
In `@monitoring/prometheus/prometheus.yml`:
- Around line 10-16: The current Prometheus scrape job for job_name
'cherrish-server' hardcodes a local Docker target and leaves production config
commented out; instead, make the target configurable per environment by either
(a) splitting prometheus.yml into environment-specific files and loading the
correct one during deployment, or (b) parameterizing the static_configs target
using environment variables (referencing the job_name 'cherrish-server' and
metrics_path '/actuator/prometheus') so the target host and port come from
CHERRISH_SERVER_HOST/CHERRISH_SERVER_PORT (with sensible defaults) and remove
manual comment toggling.
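One way to do option (b): plain Prometheus does not expand environment variables inside scrape configs, so the file can be kept as a template and rendered at deploy time, e.g. with envsubst (the template filename is an assumption):

```yaml
# monitoring/prometheus/prometheus.yml.tpl (sketch) — rendered before container start
scrape_configs:
  - job_name: 'cherrish-server'
    metrics_path: '/actuator/prometheus'
    static_configs:
      - targets: ['${CHERRISH_SERVER_HOST}:${CHERRISH_SERVER_PORT}']
```

Rendered with something like `CHERRISH_SERVER_HOST=host.docker.internal CHERRISH_SERVER_PORT=8080 envsubst < prometheus.yml.tpl > prometheus.yml` as part of the deployment step.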
In `@src/main/resources/application-monitoring.yaml`:
- Around line 9-10: The management Prometheus access property management.endpoint.prometheus.access currently has an empty value; either assign it explicitly to one of 'none', 'read-only', or 'unrestricted' according to the intended access level, or delete the property line (access:) to fall back to the default behavior. Apply the change to the management.endpoint.prometheus.access entry in application-monitoring.yaml.
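The explicit form (which value to pick depends on the intended exposure):

```yaml
# src/main/resources/application-monitoring.yaml (sketch)
management:
  endpoint:
    prometheus:
      access: read-only   # or none / unrestricted
```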
In `@src/main/resources/application-prod.yaml`:
- Around line 12-17: The management endpoints are exposed without proper auth;
update production config or add security: either implement Spring Security to
protect management endpoints and enforce IP-based access (configure access
control for Prometheus via prometheus.access and secure endpoints under
management.* and endpoint.health.show-details), or remove the monitoring profile
from production by ensuring the "monitoring" profile is not active in production
application-prod.yaml and keep management.server.port: 8081 while
disabling/promoting safe defaults for management.endpoint.* to avoid
unauthenticated metric/health exposure.
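A sketch of the safer-defaults direction on the YAML side (whether this matches the project's intended exposure is for the authors to decide):

```yaml
# src/main/resources/application-prod.yaml (sketch)
management:
  server:
    port: 8081                        # keep management traffic off the public port
  endpoints:
    web:
      exposure:
        include: health,prometheus    # expose only what monitoring needs
  endpoint:
    health:
      show-details: never             # avoid leaking details unauthenticated
```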
Merge branch '129-feature/prometheus-grafana-monitoring-system' of https://github.com/TEAM-Cherrish/Cherrish-Server into 129-feature/prometheus-grafana-monitoring-system
Actionable comments posted: 3
🤖 Fix all issues with AI agents
In `@monitoring/grafana/provisioning/alerting/policies.yml`:
- Around line 3-10: Current group_by uses only grafana_folder which can merge
unrelated alerts; update the policies block so group_by includes a distinct rule
identifier (e.g., add alertname or the rule UID label) in addition to
grafana_folder to ensure alerts are grouped per rule. Locate the policies entry
and modify the group_by array to include "alertname" (or your rule UID label)
alongside "grafana_folder", keeping group_wait/group_interval/repeat_interval
as-is.
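A sketch of the adjusted policy (the receiver name and timing values are assumptions):

```yaml
# monitoring/grafana/provisioning/alerting/policies.yml (sketch)
apiVersion: 1
policies:
  - orgId: 1
    receiver: discord-webhook                   # assumed receiver name
    group_by: ['grafana_folder', 'alertname']   # alertname added so alerts group per rule
    group_wait: 30s
    group_interval: 5m
    repeat_interval: 4h
```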
In `@monitoring/prometheus/prometheus.prod.yml`:
- Around line 10-13: The Prometheus scrape job for job_name 'cherrish-server'
uses a static target 'cherrish-server:8081' which may not resolve in production;
verify that the DNS name 'cherrish-server' resolves in your production runtime
(Kubernetes/Docker Swarm) and if not, switch this scrape config to use the
appropriate service discovery (e.g., kubernetes_sd_configs or
docker_sd_configs), a fully qualified domain name, or an environment-specific
variable so it matches how the service is exposed in production; check related
app config in application-prod.yaml to ensure port 8081 matches and update the
prometheus scrape target or discovery method accordingly.
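If the service is reachable via DNS rather than a Compose network alias, one alternative is DNS-based discovery (the FQDN below is hypothetical):

```yaml
# monitoring/prometheus/prometheus.prod.yml (sketch)
scrape_configs:
  - job_name: 'cherrish-server'
    metrics_path: '/actuator/prometheus'
    dns_sd_configs:
      - names: ['cherrish-server.internal.example.com']   # hypothetical FQDN
        type: A
        port: 8081
```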
In `@monitoring/prometheus/prometheus.yml`:
- Around line 10-13: The Prometheus scrape target uses host.docker.internal:8080
which may fail on Linux because docker-compose.monitoring.yml lacks the
extra_hosts mapping; update the prometheus service in
docker-compose.monitoring.yml to include extra_hosts:
["host.docker.internal:host-gateway"] so host.docker.internal resolves, or
alternatively change the target in monitoring/prometheus/prometheus.yml for
job_name 'cherrish-server' to use the internal service hostname (e.g.,
cherrish-server:8081 as in prometheus.prod.yml) to ensure cross-platform
scraping works.
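The first option is a one-stanza change in the compose file:

```yaml
services:
  prometheus:
    extra_hosts:
      - "host.docker.internal:host-gateway"   # makes host.docker.internal resolve on Linux
```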
♻️ Duplicate comments (1)
monitoring/grafana/provisioning/dashboards/json/cherrish-overview.json (1)
407-415: Response-time panel threshold is excessive for its unit
On the response-time panel with unit: "s", the red threshold is set to 80s, which is far out of line with any realistic SLO. Even though thresholdsStyle is currently off, it would give a misleading signal if enabled later, so adjust it to a reasonable SLO value (e.g., around 0.5–1s).

♻️ Suggested fix

```diff
- { "color": "red", "value": 80 }
+ { "color": "red", "value": 0.5 }
```
Actionable comments posted: 4
🤖 Fix all issues with AI agents
In `@docker-compose.monitoring.yml`:
- Line 16: The '--web.enable-lifecycle' flag still present in the compose file can expose the /-/reload and /-/quit endpoints without authentication in production, so it should be removed or disabled per environment. To fix this, delete the '--web.enable-lifecycle' entry from docker-compose.monitoring.yml for production builds/deployments, or gate it behind an environment mechanism such as PROM_ENABLE_LIFECYCLE so it is only set in development (or block access to the /-/reload and /-/quit endpoints at the deployment layer with a reverse proxy/network policy), and verify the flag and the affected endpoints ('/-/reload', '/-/quit') before applying; a sketch of the override-file approach follows.
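Compose cannot conditionally include a single `command` list item, so one workable pattern is a dev-only override file that re-states the full command with the flag (the override filename is an assumption; note that `command` in an override replaces the base command entirely):

```yaml
# docker-compose.monitoring.dev.yml — hypothetical dev-only override
services:
  prometheus:
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
      - '--storage.tsdb.retention.time=15d'
      - '--web.enable-lifecycle'   # lifecycle endpoints enabled only in dev
```

Started with `docker compose -f docker-compose.monitoring.yml -f docker-compose.monitoring.dev.yml up -d`; the base file then omits the flag entirely.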
- Line 1: The top-level 'version' field in docker-compose.monitoring.yml is ignored by Docker Compose V2, so delete the version: '3.8' entry; that is, remove the top-level version key and leave the rest of the services/volumes/networks definitions as they are.
- Line 30: Update the Grafana image tag used in the docker-compose service (the
line containing image: grafana/grafana:10.0.0) to the current stable release by
replacing 10.0.0 with 12.3.1 (i.e., image: grafana/grafana:12.3.1) or with a
configurable variable (e.g., ${GRAFANA_IMAGE:-grafana/grafana:12.3.1}) so the
service uses the latest supported/stable Grafana release and can be updated
easily.
- Line 5: Update the Prometheus image tag in docker-compose.monitoring.yml:
replace the outdated image string "prom/prometheus:v2.45.0" with the current
stable release "prom/prometheus:3.9.1" (ensuring any related service name or
labels referencing Prometheus remain unchanged), then redeploy to pick up the
security-fixed version.
♻️ Duplicate comments (1)
docker-compose.monitoring.yml (1)
35-36: The default admin password security issue was discussed in an earlier review, and it was confirmed that the production environment is managed separately.
…tps://github.com/TEAM-Cherrish/Cherrish-Server into 129-feature/prometheus-grafana-monitoring-system
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@docker-compose.monitoring.yml`:
- Around line 1-25: Add explicit CPU and memory limits for the prometheus
service to prevent it from exhausting host resources: update the prometheus
service (service name "prometheus") to include resource constraints—for Docker
Compose v3 use deploy.resources.limits with cpu and memory (e.g., cpu: "1.0",
memory: "2G"), and if supporting older Compose formats add equivalent mem_limit
and cpus entries—so the container has bounded memory/CPU while retaining the
existing command, volumes, healthcheck, networks, and restart settings.
- Around line 27-45: The Grafana service lacks resource limits; update the
grafana service block (service named "grafana", image "grafana/grafana:11.6.9")
to include resource constraints by adding deploy.resources.limits (e.g., cpu and
memory) and deploy.resources.reservations to cap and reserve CPU/memory for the
container; if using plain docker-compose (non-swarm) add equivalent
mem_limit/cpu_shares or use compose v2/3 fields appropriate for your setup so
Grafana cannot exhaust host resources.
```yaml
services:
  prometheus:
    image: prom/prometheus:v3.5.1
    container_name: cherrish-prometheus
    ports:
      - "9090:9090"
    volumes:
      - ./monitoring/prometheus/prometheus.yml:/etc/prometheus/prometheus.yml
      - prometheus_data:/prometheus
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
      - '--storage.tsdb.retention.time=15d'
      - '--web.enable-lifecycle'
    healthcheck:
      test: ["CMD", "wget", "--no-verbose", "--tries=1", "--spider", "http://localhost:9090/-/healthy"]
      interval: 10s
      timeout: 5s
      retries: 3
      start_period: 10s
    extra_hosts:
      - "host.docker.internal:host-gateway"
    networks:
      - monitoring
    restart: unless-stopped
```
🧹 Nitpick | 🔵 Trivial
Adding resource limits is recommended for production stability.
Without memory/CPU limits on the Prometheus container, host resources can be consumed excessively as metric data grows; combined with the 15-day retention in particular, memory usage can climb gradually.
♻️ Example: adding resource limits

```diff
 prometheus:
   image: prom/prometheus:v3.5.1
   container_name: cherrish-prometheus
+  deploy:
+    resources:
+      limits:
+        memory: 2G
+        cpus: '1.0'
+      reservations:
+        memory: 512M
   ports:
     - "9090:9090"
```
```yaml
  grafana:
    image: grafana/grafana:11.6.9
    container_name: cherrish-grafana
    ports:
      - "3000:3000"
    environment:
      - GF_SECURITY_ADMIN_USER=admin
      - GF_SECURITY_ADMIN_PASSWORD=${GRAFANA_PASSWORD:-admin}
      - GF_USERS_ALLOW_SIGN_UP=false
      - DISCORD_MONITORING_WEBHOOK_URL=${DISCORD_MONITORING_WEBHOOK_URL}
    volumes:
      - ./monitoring/grafana/provisioning:/etc/grafana/provisioning
      - grafana_data:/var/lib/grafana
    networks:
      - monitoring
    depends_on:
      prometheus:
        condition: service_healthy
    restart: unless-stopped
```
🧹 Nitpick | 🔵 Trivial
Adding resource limits to the Grafana container is recommended as well.
Grafana's resource usage can also grow with dashboard complexity and the number of concurrent users.
♻️ Example: adding resource limits

```diff
 grafana:
   image: grafana/grafana:11.6.9
   container_name: cherrish-grafana
+  deploy:
+    resources:
+      limits:
+        memory: 512M
+        cpus: '0.5'
+      reservations:
+        memory: 128M
   ports:
     - "3000:3000"
```
🛠 Related issue 🛠
✏️ Work Description ✏️
Phase 1: Expose application metrics

- Add the `micrometer-registry-prometheus` dependency to `build.gradle`
- Create the `application-monitoring.yaml` profile (shared metrics settings)
- Include the monitoring profile in `application.yaml`
- Configure a separate Actuator port (8081) in `application-prod.yaml`
- Clean up duplicated settings in `application-dev.yaml`

Phase 2: Build the monitoring infrastructure

- Write `docker-compose.monitoring.yml` (Prometheus + Grafana)
- Configure `monitoring/prometheus/prometheus.yml`
- Provision datasources and dashboards under `monitoring/grafana/provisioning/`

Phase 3: Build the Grafana dashboard

Phase 4: Set up alerting

Misc

- Add `MonitoringTestController` (local/dev profiles only, for testing)

📸 Screenshot 📸

😅 Uncompleted Tasks 😅

- Set the `DISCORD_MONITORING_WEBHOOK_URL` environment variable

📢 To Reviewers 📢

- `application-monitoring.yaml` is newly added and is applied to every environment via `profiles.include`
- `MonitoringTestController` is disabled in production via `@Profile({"local", "dev"})`
- To run the monitoring stack: `DISCORD_MONITORING_WEBHOOK_URL=... docker-compose -f docker-compose.monitoring.yml up -d`