roboco-io/KSAT-AI-Benchmark

🎓 KSAT AI Benchmark

Measuring the ability of AI models with Korea's College Scholastic Ability Test (KSAT)

License: CC BY-NC 4.0 Python 3.10+ Deploy to GitHub Pages GitHub Pages

🌐 Leaderboard: https://roboco.io/KSAT-AI-Benchmark/

📅 2026 KSAT: scheduled for Thursday, November 13, 2025. Benchmark results will be updated shortly after the exam.

📖 Introduction

KSAT AI Benchmark is an open-source project that uses questions from Korea's College Scholastic Ability Test to evaluate the problem-solving ability of various AI models and shares the results publicly.

🎯 Project Philosophy

Human-Centered AI Evaluation

Most existing AI benchmarks use synthetic datasets designed for AI or problems optimized for a specific task. We took a different approach:

  • Evaluation with an exam real people take: AI is evaluated with the KSAT questions that Korean high school students actually sit
  • Standardized measurement: the KSAT is set at the same difficulty and in the same format every year, giving a consistent baseline for comparing AI ability
  • Demands comprehensive thinking: it assesses reading comprehension, reasoning, and problem solving together rather than rote memorization
  • Transparent benchmarking: every question, answer, and grading step is public, so anyone can verify the results

Vibe Coding: A Natural Development Experience

This project was built with a "Vibe Coding" philosophy:

  • Natural workflow: intuitive commands such as make korean and make gpt-5 2025 korean
  • Intelligent automation: PDFs are parsed with a Vision API and evaluation is automated with GitHub Actions
  • Immediate feedback: the leaderboard and detailed analysis are available right after evaluation
  • Extensible design: a structure that makes it easy to add new models and new exams

Tracking AI Progress Over Time

  • Focused benchmarking with 6 carefully selected recent models (Gemini 2.5 Pro is excluded because of safety-filter issues)
    • OpenAI: GPT-5, GPT-4o
    • Anthropic: Claude Opus 4.1, Claude Sonnet 4.5 (via Perplexity)
    • Upstage: Solar Pro (specialized for Korean)
    • Perplexity: Sonar Pro
  • Objective comparison of AI progress over time using the same exam
  • Strengths and weaknesses identified by subject (Korean, Math, English) and by area (language comprehension, mathematical reasoning, problem solving)

⚠️ Why Google Gemini 2.5 Pro is excluded: Google's safety filter misclassifies Korean KSAT question content as harmful and returns a SAFETY response (finish_reason=2) for most questions. Even with the BLOCK_NONE setting, a normal evaluation was not possible, so the model was excluded from the benchmark.

Key Features

  • 🤖 Intelligent Vision API parsing: GPT-4o Vision handles complex formulas, graphs, and two-column layouts
  • 🎯 Carefully selected recent AI models: GPT-5, GPT-4o, Claude Opus 4.1, Claude Sonnet 4.5, Solar Pro, Sonar Pro
  • 📊 Detailed result analysis: accuracy, reasoning behind each answer, solve time, and per-subject rank are all recorded
  • ⚡ Fully automated pipeline: PDF → Vision parsing → evaluation → web deployment, automated with GitHub Actions
  • 🌐 Modern web UI: an interactive leaderboard built with Next.js + Mantine UI
  • 🔄 Continuous updates: when a new exam or model is added, it is evaluated and deployed automatically

🎯 Evaluation Metrics

Each AI model is evaluated on the following criteria:

  1. Accuracy & score: the share of questions answered correctly and the points earned
  2. Answer reasoning: the detailed logic and explanation behind the chosen answer
  3. Solve time: the time taken to solve each question (in seconds)
  4. Per-subject results: scores by section (Korean, Math, English, Inquiry)
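
The numeric metrics (accuracy, score, and solve time) reduce to a simple aggregation over per-question records. The following is a minimal sketch, not the project's scoring code: the `QuestionResult` record and its field names are assumptions made for illustration. Per-subject results would be the same aggregation run once per subject.

```python
from typing import NamedTuple


class QuestionResult(NamedTuple):
    """One graded answer. Field names are illustrative assumptions."""
    question_number: int
    points: int          # points the question is worth
    is_correct: bool     # whether the model's answer matched the key
    time_seconds: float  # wall-clock time spent on the question


def summarize(results: list[QuestionResult]) -> dict:
    """Aggregate accuracy, earned score, maximum score, and mean solve time."""
    if not results:
        return {"accuracy": 0.0, "score": 0, "max_score": 0, "avg_time": 0.0}
    correct = [r for r in results if r.is_correct]
    return {
        "accuracy": len(correct) / len(results),
        "score": sum(r.points for r in correct),
        "max_score": sum(r.points for r in results),
        "avg_time": sum(r.time_seconds for r in results) / len(results),
    }


if __name__ == "__main__":
    demo = [
        QuestionResult(1, 2, True, 4.0),
        QuestionResult(2, 3, False, 6.0),
        QuestionResult(3, 2, True, 5.0),
    ]
    print(summarize(demo))
```

Keeping the score separate from accuracy matters because KSAT questions carry different point values, so two models with the same accuracy can earn different scores.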

⚡ Makefile Quick Start

# Help
make help

# Parse Korean + enter the answer key
make korean

# Parse Math + enter the answer key (Vision API)
make math

# Process all subjects
make all

# Parse a custom PDF
make parse PDF=exams/pdf/2025/국어영역_문제지_홀수형.pdf
make parse-vision PDF=exams/pdf/2025/수학영역_문제지_홀수형.pdf

# Validate YAML
make validate

# Clean up
make clean

🔄 Automation Workflow

Fully Automated Pipeline

graph LR
    A[Run evaluation] --> B[Save to results/]
    B --> C[Git Push]
    C --> D[GitHub Actions]
    D --> E[Automatic deployment]

Automation steps:

  1. Local evaluation: run make gpt-5 2025 korean → YAML is saved in the results/ directory
  2. Git commit/push: push the results/ changes to the main branch
  3. GitHub Actions runs automatically (.github/workflows/deploy-pages.yml):
    • Converts YAML → JSON with Python (scripts/export_data.py)
    • Builds the Next.js website
    • Deploys to GitHub Pages automatically
  4. The website updates automatically: https://roboco.io/KSAT-AI-Benchmark/
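
Step 3 turns result files into JSON that the site can load. The contents of scripts/export_data.py are not shown here, so the sketch below only illustrates that kind of export step. It starts from plain dictionaries standing in for already-loaded YAML; the record keys, the placeholder model names, and the ranking rule (higher score first, then lower average time) are all assumptions.

```python
import json


def build_leaderboard(results: list[dict]) -> list[dict]:
    """Rank per-model summaries by score (descending), then average time (ascending)."""
    ordered = sorted(results, key=lambda r: (-r["score"], r["avg_time"]))
    return [
        {"rank": i, "model": r["model"], "score": r["score"], "avg_time": r["avg_time"]}
        for i, r in enumerate(ordered, start=1)
    ]


def export_json(results: list[dict]) -> str:
    """Serialize the leaderboard; ensure_ascii=False keeps Korean text readable."""
    payload = {"leaderboard": build_leaderboard(results)}
    return json.dumps(payload, ensure_ascii=False, indent=2)


if __name__ == "__main__":
    # Placeholder values for illustration only, not real benchmark results.
    sample = [
        {"model": "model-a", "score": 88, "avg_time": 12.5},
        {"model": "model-b", "score": 92, "avg_time": 30.1},
        {"model": "model-c", "score": 88, "avg_time": 9.8},
    ]
    print(export_json(sample))
```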

Vibe Coding Workflow

graph TD
    A[Upload PDF<br/>exams/pdf/] --> B[Vision API parsing<br/>GPT-4o Vision]
    B --> C[Generate YAML<br/>exams/parsed/]
    C --> D[Model evaluation<br/>Run AI models]
    D --> E[Save results<br/>results/]
    E --> F[Git Commit & Push]
    F --> G[GitHub Actions trigger]
    G --> H[Web deployment<br/>GitHub Pages]
    
    B1[Formulas → LaTeX] -.-> B
    B2[Graph recognition] -.-> B
    B3[Two-column layout handling] -.-> B
    
    D1[Read the question] -.-> D
    D2[Think & reason] -.-> D
    D3[Choose an answer] -.-> D
    D4[Explain the reasoning] -.-> D

End-to-End Flow from the User's Perspective

sequenceDiagram
    participant User as 👤 User
    participant Make as 🔧 Makefile
    participant Parser as 📄 Parser
    participant Eval as 🤖 Evaluator
    participant Git as 📦 Git
    participant GHA as ⚙️ GitHub Actions
    participant Web as 🌐 Website
    
    User->>Make: make korean
    Make->>Parser: Parse PDF (Vision API)
    Parser->>Parser: Generate YAML
    Parser-->>User: ✅ YAML file created
    
    User->>Make: make gpt-5 2025 korean
    Make->>Eval: Run model evaluation
    Eval->>Eval: Solve questions with the AI model
    Eval->>Eval: Save to results/
    Eval-->>User: ✅ Evaluation complete
    
    User->>Git: git push
    Git->>GHA: Trigger
    GHA->>GHA: Convert YAML → JSON
    GHA->>GHA: Build Next.js
    GHA->>Web: Deploy to GitHub Pages
    Web-->>User: 🎉 Leaderboard updated

🚀 Enabling GitHub Pages

GitHub Pages must be configured for the GitHub Actions workflow to run:

  1. GitHub repository → Settings → Pages
  2. Source: select "GitHub Actions"
  3. After saving, the workflow runs automatically

To trigger it manually:

  • GitHub repository → Actions tab → "Deploy to GitHub Pages" → "Run workflow"

To check the deployment status:

  • Check the workflow run status in the Actions tab
  • Once deployment finishes, open https://roboco.io/KSAT-AI-Benchmark/

🎨 Vibe Coding in Action

A step-by-step look at the project's full workflow:

1. Upload the PDF: add the exam PDF to exams/pdf/

  • Upload the KSAT question paper as-is (no OCR required)

2. Vision API parsing: intelligent extraction with GPT-4o Vision

  • Complex math formulas → converted accurately to LaTeX
  • Graphs and charts → visual elements recognized and described
  • Two-column layout → structure detected and rearranged into logical order
  • A single command: make korean or make math

3. YAML generation: structured data is saved in exams/parsed/

  • Human-readable format
  • Version-controllable
  • Easy to reuse and verify

4. Model evaluation: each AI model sits the actual exam

  • Read the question → think → choose an answer → explain the reasoning
  • Elapsed time measured as the model works
  • Flexible evaluation: make gpt-5 2025 korean,math
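
The evaluation step (read, reason, answer, explain, with per-question timing) might be sketched as follows. This is an illustration rather than the project's evaluator: `solve` is a hypothetical callable standing in for a real model client, and the keys of the returned record are assumptions.

```python
import time
from typing import Callable


def evaluate_question(question: dict, solve: Callable[[dict], tuple[int, str]]) -> dict:
    """Run one question through `solve` and record answer, reasoning, and elapsed time."""
    start = time.perf_counter()
    answer, reasoning = solve(question)
    elapsed = time.perf_counter() - start
    return {
        "question_number": question["question_number"],
        "answer": answer,
        "reasoning": reasoning,
        "time_seconds": elapsed,
        "is_correct": answer == question["correct_answer"],
    }


def always_first_choice(question: dict) -> tuple[int, str]:
    """Stand-in 'model' for the demo: always picks choice 1."""
    return 1, "Placeholder reasoning: always selects the first choice."


if __name__ == "__main__":
    q = {"question_number": 1, "choices": ["a", "b", "c", "d", "e"], "correct_answer": 1}
    print(evaluate_question(q, always_first_choice))
```

`time.perf_counter()` is used because it is a monotonic clock meant for measuring intervals, so the recorded time is not affected by system clock adjustments.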

5. Saving results: stored as YAML in results/

  • Results separated by model and by exam
  • Both the answer reasoning and the time are recorded
  • Can be re-analyzed at any time

6. Web deployment: deployed automatically by GitHub Actions

  • Commit results → automatic build → GitHub Pages deployment
  • Leaderboard updates in real time
  • Per-subject ranks and detailed statistics generated automatically

🚀 Quick Start

Requirements

  • Python 3.10 or later
  • Node.js 18 or later
  • AI model API keys:
    • OpenAI (GPT-4, GPT-3.5)
    • Anthropic (Claude)
    • Google (Gemini)
    • Upstage (Solar)
    • Perplexity (Sonar)

Installation

# Clone the repository
git clone https://github.com/roboco-io/KSAT-AI-Benchmark.git
cd KSAT-AI-Benchmark

# Install Python dependencies
pip install -r requirements.txt

# Set up environment variables
cp .env.example .env
# Enter your API keys in the .env file

Adding a New Exam

Method 1: Use the Makefile ⭐️ Easiest!

# Parse Korean + enter the answer key (in one step)
make korean

# Parse Math + enter the answer key (in one step)
make math

# Parse English + enter the answer key (in one step)
make english

# Process all subjects
make all

# Show help
make help

Method 2: Run the Python scripts directly

# 1. Parse the PDF (locally)
# Text-based (Korean, social studies, etc.)
python src/parser/parse_exam.py exams/pdf/2025/국어영역_문제지_홀수형.pdf

# Vision API (math, science, etc.) - recognizes formulas and graphs
python src/parser/parse_exam.py exams/pdf/2025/수학영역_문제지_홀수형.pdf --vision

# 2. Parse the answer key and fill it in automatically
python src/parser/parse_answer_key.py \
  exams/pdf/2025/수학영역_정답표.pdf \
  exams/parsed/2025-math-sat.yaml

# 3. Add to Git and commit
git add exams/parsed/2025-math-sat.yaml
git commit -m "feat: add 2025 math exam"
git push

# 4. GitHub Actions then automatically:
#    - runs the evaluation with every AI model
#    - saves the results to results/
#    - updates the website

Parsing guide: see docs/PARSER_GUIDE.md for detailed parsing instructions.

Method 3: Write the YAML by hand

You can also write a YAML file directly in the exams/parsed/ folder.

Korean exam (Optimized Schema - removes duplicated passages):

exam_id: 2024-korean-sat
title: 2024 KSAT Korean Section
subject: korean
year: 2024

# Passages are managed centrally (groups of questions sharing one passage)
passages:
  - passage_id: p1
    passage_text: "Long passage text..."
    question_numbers: [1, 2, 3]

# Questions reference a passage by passage_id
questions:
  - question_id: q1
    question_number: 1
    question_text: "Which of the following agrees with the passage above?"
    passage_id: p1  # passage reference
    choices: ["Choice 1", "Choice 2", ...]
    correct_answer: 3
    points: 2
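
In the optimized schema each passage is stored once and questions point to it through `passage_id`, so a consumer has to resolve that reference before it can show the question with its passage. A minimal sketch, assuming the exam has already been loaded into a dictionary with the keys shown in the schema; `attach_passages` is a hypothetical helper, not a function from the repository.

```python
def attach_passages(exam: dict) -> list[dict]:
    """Return questions with their shared passage text resolved via passage_id."""
    passages = {p["passage_id"]: p["passage_text"] for p in exam.get("passages", [])}
    resolved = []
    for q in exam["questions"]:
        pid = q.get("passage_id")
        if pid is not None and pid not in passages:
            raise KeyError(f"question {q['question_id']} references unknown passage {pid!r}")
        # Questions without a passage_id get passage_text=None.
        resolved.append({**q, "passage_text": passages.get(pid)})
    return resolved


if __name__ == "__main__":
    exam = {
        "passages": [
            {"passage_id": "p1", "passage_text": "Long passage text...", "question_numbers": [1, 2, 3]},
        ],
        "questions": [
            {"question_id": "q1", "question_number": 1, "passage_id": "p1"},
            {"question_id": "q2", "question_number": 2, "passage_id": "p1"},
        ],
    }
    for q in attach_passages(exam):
        print(q["question_id"], "->", q["passage_text"])
```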

Math exam (Legacy Schema - original approach):

exam_id: 2024-math-sat
title: 2024 KSAT Math Section
subject: math
year: 2024

questions:
  - question_id: q1
    question_number: 1
    question_text: "Which of the following is correct?"
    passage: "Passage text (optional)"  # included inline
    choices: ["1", "2", "3", "4", "5"]
    correct_answer: "3"
    points: 2
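
The two examples write `correct_answer` differently (the number 3 in one, the string "3" in the other), so a validation pass is a natural place to check required fields and normalize the answer. The sketch below is one possible check, not the logic behind `make validate`. It assumes every question is multiple choice and that the answer is a 1-based choice number; both are assumptions.

```python
REQUIRED_EXAM_KEYS = ("exam_id", "title", "subject", "year", "questions")
REQUIRED_QUESTION_KEYS = (
    "question_id", "question_number", "question_text",
    "choices", "correct_answer", "points",
)


def validate_exam(exam: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means the exam passed."""
    errors = [f"missing exam key: {k}" for k in REQUIRED_EXAM_KEYS if k not in exam]
    for q in exam.get("questions", []):
        label = q.get("question_id", "<unknown>")
        missing = [k for k in REQUIRED_QUESTION_KEYS if k not in q]
        if missing:
            errors.append(f"{label}: missing {', '.join(missing)}")
            continue
        try:
            answer = int(q["correct_answer"])  # accepts both 3 and "3"
        except (TypeError, ValueError):
            errors.append(f"{label}: correct_answer is not a number")
            continue
        if not 1 <= answer <= len(q["choices"]):
            errors.append(f"{label}: correct_answer {answer} is outside the choice range")
    return errors


if __name__ == "__main__":
    exam = {
        "exam_id": "2024-math-sat", "title": "2024 KSAT Math Section",
        "subject": "math", "year": 2024,
        "questions": [{
            "question_id": "q1", "question_number": 1,
            "question_text": "Which of the following is correct?",
            "choices": ["1", "2", "3", "4", "5"], "correct_answer": "3", "points": 2,
        }],
    }
    print(validate_exam(exam) or "OK")
```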

Adding a New Model

Add the model information to the models/models.json file:

{
  "models": [
    {
      "name": "gpt-4-turbo",
      "provider": "openai",
      "version": "2024-01",
      "api_key_env": "OPENAI_API_KEY",
      "max_tokens": 2000,
      "timeout": 60
    }
  ]
}
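
Each entry names the environment variable that holds its API key (`api_key_env`) rather than the key itself. A loader could read the file and check which models are usable in the current environment, as in the sketch below. It is illustrative only: `load_models` and the `api_key_present` flag are hypothetical and not part of the repository.

```python
import json
import os


def load_models(config_text: str, env=None) -> list[dict]:
    """Parse the model list and flag whether each model's API key is set."""
    env = os.environ if env is None else env
    models = json.loads(config_text)["models"]
    loaded = []
    for m in models:
        # Only record whether a key exists; never copy the secret into the config.
        loaded.append({**m, "api_key_present": bool(env.get(m["api_key_env"]))})
    return loaded


if __name__ == "__main__":
    config = """
    {
      "models": [
        {
          "name": "gpt-4-turbo",
          "provider": "openai",
          "version": "2024-01",
          "api_key_env": "OPENAI_API_KEY",
          "max_tokens": 2000,
          "timeout": 60
        }
      ]
    }
    """
    for model in load_models(config):
        status = "ready" if model["api_key_present"] else "missing API key"
        print(f"{model['name']} ({model['provider']}): {status}")
```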

Running an Evaluation Locally

# Parse a PDF
python -m src.parser.main --input exams/pdf/2024-ksat-math.pdf

# Evaluate every exam with every model
python -m src.evaluator.main

# Evaluate a specific exam with a specific model
python -m src.evaluator.main --model gpt-4-turbo --exam 2024-ksat-math

Running the Web Interface Locally

cd web
npm install
npm run dev
# Open http://localhost:3000

📁 Project Structure

KSAT-AI-Benchmark/
├── .github/
│   └── workflows/              # GitHub Actions workflows
│       ├── parse-and-evaluate.yml
│       └── deploy-pages.yml
├── exams/
│   ├── pdf/                    # Original exam PDFs
│   └── parsed/                 # Parsed YAML files
├── models/                     # AI model configuration
├── src/
│   ├── parser/                 # PDF parsing system
│   ├── evaluator/              # Evaluation system
│   └── models/                 # Model interfaces
├── results/                    # Evaluation result YAML
├── web/                        # Next.js frontend
│   ├── app/                    # App Router pages
│   ├── components/             # React components
│   └── lib/                    # YAML loader, etc.
├── docs/                       # Project documentation
└── tests/                      # Test code

⚙️ GitHub Actions Workflows

1. PDF Parsing and Evaluation (parse-and-evaluate.yml)

Triggers:

  • A new PDF is added to exams/pdf/
  • models/models.json is modified
  • Manual run

Process:

graph TD
    subgraph "Job 1: PDF parsing"
        A1[Extract PDF text/images] --> A2[Generate YAML]
        A2 --> A3[Commit to exams/parsed/]
    end
    
    subgraph "Job 2: Model evaluation"
        B1[Load YAML] --> B2[Solve questions with each model]
        B2 --> B3[Save result YAML]
        B3 --> B4[Commit to results/]
    end
    
    A3 --> B1

2. Website Deployment (deploy-pages.yml)

Triggers:

  • Changes in the results/ folder
  • Changes in the exams/parsed/ folder
  • Changes in the web/ folder

Process:

graph TD
    C1[Convert YAML → JSON] --> C2[Build Next.js]
    C2 --> C3[Generate static HTML]
    C3 --> C4[Deploy to GitHub Pages]

3. Workflows by Trigger

graph TD
    subgraph "Triggers"
        T1[exams/pdf/ changed]
        T2[models/models.json changed]
        T3[results/ changed]
        T4[web/ changed]
    end
    
    subgraph "Workflows"
        W1[parse-and-evaluate.yml]
        W2[deploy-pages.yml]
    end
    
    T1 --> W1
    T2 --> W1
    T3 --> W2
    T4 --> W2
    
    W1 --> R1[PDF parsing & evaluation]
    W2 --> R2[Website deployment]
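
The trigger rules amount to a lookup from changed paths to workflows. GitHub Actions does this matching itself from each workflow's path filters; the sketch below only mirrors the routing in Python to make it explicit. The prefixes are taken from the trigger lists in sections 1 and 2, and `workflows_for` is a hypothetical helper.

```python
WORKFLOW_TRIGGERS = {
    "parse-and-evaluate.yml": ("exams/pdf/", "models/models.json"),
    "deploy-pages.yml": ("results/", "exams/parsed/", "web/"),
}


def workflows_for(changed_paths: list[str]) -> list[str]:
    """Return the workflows whose trigger prefixes match any changed path."""
    hits = []
    for workflow, prefixes in WORKFLOW_TRIGGERS.items():
        # str.startswith accepts a tuple and matches if any prefix matches.
        if any(path.startswith(prefixes) for path in changed_paths):
            hits.append(workflow)
    return hits


if __name__ == "__main__":
    print(workflows_for(["exams/pdf/2025/exam.pdf"]))
    print(workflows_for(["results/run.yaml", "web/app/page.tsx"]))
    print(workflows_for(["README.md"]))
```

Because the first workflow commits into exams/parsed/ and results/, a single PDF upload ends up triggering the deployment workflow as well, which is what makes the pipeline end-to-end.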

🌐 Viewing the Results

Evaluation results are available at the following link:

👉 https://roboco.io/KSAT-AI-Benchmark/

Main Pages

1. Leaderboard (main)

  • Overall ranking table by model
  • Score filtering by subject
  • Accuracy and average solve time

2. Question list page

  • Shows every question in the exam
  • A grid of every model's answer for each question
  • Color coding: correct (green) / incorrect (red)
  • Click an answer → detail modal pops up

3. Answer detail modal

  • The chosen answer
  • Reasoning behind the answer (full explanation)
  • Time taken to solve
  • Points earned

4. Per-model detail page

  • The model's overall results
  • Tabs by subject
  • Correct/incorrect detail for each question
  • Charts and statistics

🤝 Contributing

🌟 This project is in the early stages of development!

Contributions are welcome. Let's build a new standard for AI benchmarking together!

🚀 Contribution Process

1. Check for or create an issue

  • Check existing issues in the issue tracker
  • If you have a new idea, open an issue and discuss it
  • Look for beginner-friendly tasks under the good first issue label

2. Fork & Clone

# After forking the repository
git clone https://github.com/YOUR_USERNAME/KSAT-AI-Benchmark.git
cd KSAT-AI-Benchmark

3. Set up the development environment

# Install Python dependencies
make install

# Create the .env file and set your API keys
make env
code .env  # enter your API keys

# Install development tools
make dev-install

4. Create a feature branch

git checkout -b feature/amazing-feature
# or
git checkout -b fix/bug-description

5. Develop and test

# Write code

# Run tests
make test

# Format code
make format

# Linting
make lint

6. Commit & push

git add .
git commit -m "feat: add an amazing feature"
git push origin feature/amazing-feature

7. Open a Pull Request

  • Create the PR on GitHub
  • Title: clear and concise (e.g. feat: add per-subject leaderboard tabs)
  • Description: include the changes, how to test them, and screenshots (for UI changes)

📋 Contribution Guidelines

🆕 Adding a new exam

  • ✅ Only add exam questions that may be published (copyright check required)
  • ✅ Always verify manually after Vision API parsing
  • ✅ After parsing the answer key, verify with a sample evaluation
  • 📁 Location: exams/pdf/YYYY/과목영역_문제지_홀수형.pdf

🤖 Adding a new model

  • ✅ Add the configuration to models/models.json
  • ✅ Implement the provider in src/evaluator/models/ (if it is a new provider)
  • ✅ Test with at least one exam before opening a PR
  • 📝 Update the model list in README.md

💻 Code contributions

  • ✅ Follow the PEP 8 style guide
  • ✅ Write tests alongside new features
  • ✅ Update CLAUDE.md (for significant changes)
  • ✅ Keep to the Vibe Coding philosophy:
    • Intuitive commands and APIs
    • Intelligent automation
    • Transparent processes

📚 Documentation

  • 📖 README.md: guide from the user's perspective
  • 🔧 CLAUDE.md: guide from the developer's perspective
  • 💬 Code comments: add explanations for complex logic
  • 🌐 Korean first: documentation is written in Korean

🎨 Frontend contributions

  • ✅ Use Next.js 15 + App Router
  • ✅ Use Mantine UI v7 components
  • ✅ Responsive design (mobile support)
  • ✅ Consider accessibility (a11y)

🏗️ Main Contribution Areas

🔥 Urgent (High Priority)

  • GitHub Actions workflow implementation - automated deployment pipeline
  • Website UI/UX improvements - per-subject tabs, charts, filters
  • Adding models - benchmarking the latest AI models

⚡ Important (Medium Priority)

  • PDF parsing improvements - better accuracy, error handling
  • Test coverage - add unit tests and integration tests
  • Performance optimization - evaluation speed, web loading time

💡 Improvements (Nice to Have)

  • Multilingual exam support - SAT, Gaokao, etc.
  • API service - an API for querying evaluation results
  • Charts & visualization - performance trends, comparison graphs

🐛 Bug Reports

Found a bug? Please open an issue!

What to include:

  • 🔍 Steps to reproduce: step-by-step description
  • 🎯 Expected behavior: how it should work
  • 💥 Actual behavior: how it actually works
  • 🖼️ Screenshots: attach if possible
  • 🔧 Environment: OS, Python version, Node.js version

💬 Questions & Discussion


🙌 Contributor Code of Conduct

  • 🤝 Respect: we respect every contributor
  • 🌈 Inclusion: we welcome diverse backgrounds and perspectives
  • 🎯 Constructive: keep feedback constructive and specific
  • 🚀 Collaboration: a community that grows together

Let's build this together! Even small contributions make a big impact. 💪

📊 Benchmarked Models

Currently active models (6):

| Provider | Model | Version | Notes |
|---|---|---|---|
| OpenAI | GPT-5 | 2025 | Greatly improved math/science ability |
| OpenAI | GPT-4o | 2024-08 | Latest multimodal model |
| Anthropic | Claude Opus 4.1 | 2025-08 | Stronger agents, coding, and reasoning |
| Anthropic | Claude Sonnet 4.5 | 2025 | Served through the Perplexity API |
| Upstage | Solar Pro | 2024 | Model specialized for Korean |
| Perplexity | Sonar Pro | 2024-11 | Latest reasoning model |

Disabled models: GPT-4 Turbo, GPT-4, GPT-3.5-turbo, Claude 3.5 Sonnet, Claude 3 series, Gemini 2.0/1.5 series, Solar Mini, Sonar online series

💡 Model selection criteria: models are chosen around each provider's newest and strongest offering to maximize benchmark efficiency

Please file an issue to request a new model.

📋 Exam Subjects

  • Korean
  • Math
  • English
  • Korean History
  • Social Studies Inquiry
  • Science Inquiry

⚙️ Environment Variables

Set the following environment variables in the .env file:

# AI Model API Keys
OPENAI_API_KEY=your_openai_api_key
ANTHROPIC_API_KEY=your_anthropic_api_key
GOOGLE_API_KEY=your_google_api_key
UPSTAGE_API_KEY=your_upstage_api_key
PERPLEXITY_API_KEY=your_perplexity_api_key

# Evaluation Settings
MAX_RETRIES=3
TIMEOUT=60
API_CALL_DELAY=1
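
The evaluation settings suggest a retry loop around each API call. How the project actually consumes them is not shown here, so the following is a sketch under assumptions: `MAX_RETRIES` is read as the total number of attempts, `API_CALL_DELAY` as the pause in seconds between attempts, and `call_with_retries` is a hypothetical helper. `TIMEOUT` is not used below; it would typically be handed to the HTTP client making the request.

```python
import os
import time


def call_with_retries(call, max_retries=None, delay=None, sleep=time.sleep):
    """Call `call()` up to max_retries times, pausing `delay` seconds between attempts."""
    if max_retries is None:
        max_retries = int(os.environ.get("MAX_RETRIES", "3"))
    if delay is None:
        delay = float(os.environ.get("API_CALL_DELAY", "1"))
    last_error = None
    for attempt in range(1, max_retries + 1):
        try:
            return call()
        except Exception as exc:  # a real client would catch narrower error types
            last_error = exc
            if attempt < max_retries:
                sleep(delay)
    raise RuntimeError(f"gave up after {max_retries} attempts") from last_error


if __name__ == "__main__":
    attempts = {"count": 0}

    def flaky_call():
        """Fails twice, then succeeds, to exercise the retry path."""
        attempts["count"] += 1
        if attempts["count"] < 3:
            raise ConnectionError("temporary failure")
        return "answer"

    print(call_with_retries(flaky_call, max_retries=3, delay=0))
    print("attempts:", attempts["count"])
```

The `sleep` parameter is injectable so the loop can be tested without real waiting.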

🧪 Tests

# Run all tests
pytest

# Run a specific test
pytest tests/test_evaluator.py

# Coverage report
pytest --cov=src tests/

📝 License

This project is licensed under CC BY-NC 4.0 (Creative Commons Attribution-NonCommercial 4.0 International).

  • ✅ Non-commercial use: free to use for education and research
  • 📝 Attribution required: the original source must be credited whenever the project is used
  • 🚫 No commercial use: may not be used for commercial purposes (contact us separately)

See the LICENSE file for details.

Attribution Example

KSAT AI Benchmark by roboco-io
Licensed under CC BY-NC 4.0
Source: https://github.com/roboco-io/KSAT-AI-Benchmark

📮 Contact

🙏 Acknowledgments

  • Korea Institute for Curriculum and Evaluation - for providing a standardized, high-quality assessment tool (the KSAT)
  • OpenAI, Anthropic, Google, Upstage, Perplexity - for providing powerful AI models and APIs
  • The open-source community - for the many tools and libraries this project is built on
  • All contributors - for advancing the project with code, documentation, and feedback

💡 Inspiration and Motivation

This project started from the following question:

"How do we measure how smart an AI is?"

Synthetic benchmarks are made for AI, and it is hard for people to get a feel for how difficult they really are. So we chose an exam that people actually take.

The KSAT is the most standardized, fair, and comprehensive test of thinking ability in Korea. Every year 500,000 students take it under identical conditions, and the quality and difficulty of its questions are rigorously vetted.

Building this benchmark with Vibe Coding let us go beyond simply assigning scores and look into how AI thinks. Reading the AI's reasoning for each question reveals differences between human and AI thinking and offers hints about how AI should develop next.

We hope this project makes a small contribution to tracking AI progress, understanding AI capabilities, and building better AI.

🗺️ Roadmap

2025 Q3-Q4 ✅ Done

  • Initial project setup and architecture design
  • PDF parsing system (Vision API based)
  • YAML data format definition and validation
  • AI model evaluation system
  • GitHub Actions automation pipeline
  • Next.js + Mantine UI web interface
  • Leaderboard and question list pages
  • Answer detail modal and interactions
  • Parsing of three 2025 KSAT subjects completed (Korean, Math, English)
  • Benchmarking of 6 recent AI models

2025 Q4 - 2026 Q1 🚧 In Progress

  • 2026 KSAT benchmark (exam scheduled for November 13, 2025)
  • Expanding the database of past KSAT questions (2024, 2023...)
  • Detailed per-subject analysis pages
  • More advanced charts and visualization (performance trend graphs)
  • Strengths/weaknesses analysis report for each model
  • Year-over-year performance comparison
  • Performance optimization (evaluation speed, web loading)
  • Broader test coverage

2026 Q2 💡 Planned

  • Multilingual exam support (SAT, Japan's Center Test, etc.)
  • RESTful API service
  • Real-time model comparison
  • Support for community-contributed models
  • Evaluating a mobile app

📄 Copyright and Sources

Exam Question Copyright

All copyright in the Korean College Scholastic Ability Test (KSAT) questions used in this project belongs to the Korea Institute for Curriculum and Evaluation.

  • Source: Korea Institute for Curriculum and Evaluation, KSAT archive
  • Copyright holder: Korea Institute for Curriculum and Evaluation
  • Purpose of use: this project uses the exam questions only for non-profit research and education, to evaluate the performance of AI models.

All rights related to the exam questions belong to the Korea Institute for Curriculum and Evaluation, and this project uses publicly released materials for research purposes.

Project License

The source code and evaluation results of this project are licensed under CC BY-NC 4.0.

  • ✅ Free to share and adapt with attribution
  • ❌ No commercial use
  • ℹ️ The original exam questions are subject to the copyright policy of the Korea Institute for Curriculum and Evaluation

⭐ Star History

If you find this project useful, please give it a ⭐️!


Made with ❤️ by the KSAT AI Benchmark Team
