davidkimai
💭
optimizing my reward function

Pinned

  1. dash Public

    Dash - Agent Orchestration Platform

    TypeScript 1

  2. misalignment-monitoring Public

    SPAR AI. A minimal viable demo showcasing monitoring architecture for detecting deceptive AI behavior in the wild.

    Python

  3. sabotage Public

    Heron AI 90min Work Test | Project - Michael Chen METR Sabotage Threat Modeling. A small eval harness designed to measure Monitor Negligence. The script simulates a "Monitor" (the insider) reviewin…

    Python 2

  4. ralph-zero Public

    Ralph Zero - Your agents can now orchestrate Ralph using Skills! Ralph Zero is an orchestrator system wrapped in an Agent Skills package over Geoffrey Huntley's Ralph Loop that implements complex m…

    Python 7 2

  5. Context-Engineering Public

    "Context engineering is the delicate art and science of filling the context window with just the right information for the next step." — Andrej Karpathy. A frontier, first-principles handbook inspi…

    Python 8.3k 940

  6. RL101 Public

    Agentic Reinforcement Learning 101. A pragmatic course for AI/ML Engineers based on "The Landscape of Agentic Reinforcement Learning for LLMs: A Survey" https://arxiv.org/abs/2509.02547

    Roff 16 2