██╗██████╗ ██╗███████╗ ██║██╔══██╗██║██╔════╝ ██║██████╔╝██║███████╗ ██║██╔══██╗██║╚════██║ ██║██║ ██║██║███████║ ╚═╝╚═╝ ╚═╝╚═╝╚══════╝
A Lightweight Desktop GUI Agent via Dynamic Focus Vision and Hierarchical Memory
Lightweight • Minimal Code • Minimal Dependencies 🍃
Visual Perception • Infinite Memory • Long Interaction 💪🏻
Iris is an intelligent agent designed to navigate your operating system just like a human does. It doesn't just blindly run scripts; it sees the screen, thinks about what to do, and acts with precision.
Iris is lightweight, with minimal code and dependencies, requiring only a single API key. Yet, it packs a punch with:
- Visual Perception 👁️
- Infinite Memory 🧠
- Long Interaction 🔄
Powered by a robust ReAct (Reasoning + Acting) loop, Iris can handle complex workflows, recover from errors, and remember context over long periods thanks to its hierarchical memory system.
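The loop described above can be sketched in a few lines. This is a minimal illustration, not the code in this repository: `ReActAgent`, `Step`, `think`, and `act` are hypothetical names, and the LLM and the desktop controller are replaced by plain callables. Reflection is folded into the next `think` call, which sees the full history.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Step:
    thought: str
    action: str
    observation: str


@dataclass
class ReActAgent:
    # Hypothetical stand-ins: `think` plays the LLM, `act` plays the desktop controller.
    think: Callable[[str, list], tuple]  # (task, history) -> (thought, action)
    act: Callable[[str], str]            # action -> observation
    max_steps: int = 100
    history: list = field(default_factory=list)

    def run(self, task: str) -> list:
        for _ in range(self.max_steps):
            # Reason over the task and everything observed so far.
            thought, action = self.think(task, self.history)
            if action == "done":
                break
            # Act, then record the observation for the next reasoning step.
            observation = self.act(action)
            self.history.append(Step(thought, action, observation))
        return self.history


# Scripted demo: a fixed plan instead of a real model.
def scripted_think(task, history):
    plan = ["open_browser", "type_query", "done"]
    return f"step {len(history)} of {task!r}", plan[len(history)]


def fake_act(action):
    return f"executed {action}"


agent = ReActAgent(think=scripted_think, act=fake_act)
steps = agent.run("search weather")
```

Passing `think` and `act` in as callables keeps the loop itself independent of any particular model or input library.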
🚩 Update (2026-01-18): Iris-v1.0 is released.
Iris operates on a cycle of Reasoning, Action, Observation and Reflection. Here's how the magic happens:
A dynamic focusing view strategy is adopted to improve positioning accuracy and action efficiency.
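One way to picture the coarse-to-fine idea: a global view gives a rough guess, a smaller crop around that guess is inspected, and the position found inside the crop is mapped back to screen pixels. The sketch below only covers the geometry, under assumptions of my own; `Region`, `focus_region`, and `local_to_screen` are hypothetical names and the crop size is arbitrary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    left: int
    top: int
    width: int
    height: int


def focus_region(screen: Region, cx: int, cy: int, size: int) -> Region:
    """Return a crop of at most size x size around (cx, cy), clamped to the screen."""
    w = min(size, screen.width)
    h = min(size, screen.height)
    left = min(max(cx - w // 2, screen.left), screen.left + screen.width - w)
    top = min(max(cy - h // 2, screen.top), screen.top + screen.height - h)
    return Region(left, top, w, h)


def local_to_screen(region: Region, fx: float, fy: float) -> tuple[int, int]:
    """Map fractional coordinates (0..1) inside the crop to absolute screen pixels."""
    return (region.left + round(fx * region.width),
            region.top + round(fy * region.height))


screen = Region(0, 0, 1920, 1080)
# Coarse guess near the top-right corner; the crop is shifted to stay on screen.
local = focus_region(screen, cx=1900, cy=50, size=400)
# The fine view reports the element at 50% / 25% of the crop.
point = local_to_screen(local, 0.5, 0.25)
```

Clamping matters for targets near screen edges, where a naively centred crop would extend past the display.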
Hierarchical memory can effectively avoid context explosion and prevent task forgetting.
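A rough sketch of how a two-layer memory can keep the prompt small: recent steps stay verbatim, and older steps are folded into summaries. The class name, the window size, and the string-joining summarizer are all assumptions for illustration; a real agent would more likely ask the LLM to write the summary.

```python
from collections import deque


class HierarchicalMemory:
    def __init__(self, short_term_size=4, summarize=None):
        self.short_term = deque()   # recent steps, kept verbatim
        self.long_term = []         # compressed summaries of older steps
        self.short_term_size = short_term_size
        # Hypothetical summarizer: joins the steps into one line.
        self.summarize = summarize or (
            lambda steps: f"{len(steps)} steps: " + "; ".join(steps)
        )

    def add(self, step: str) -> None:
        self.short_term.append(step)
        if len(self.short_term) > self.short_term_size:
            # Fold the oldest half of the window into one long-term entry.
            n = self.short_term_size // 2
            old = [self.short_term.popleft() for _ in range(n)]
            self.long_term.append(self.summarize(old))

    def context(self) -> str:
        lines = ["[long-term] " + s for s in self.long_term]
        lines += ["[recent] " + s for s in self.short_term]
        return "\n".join(lines)


mem = HierarchicalMemory(short_term_size=4)
for i in range(1, 11):
    mem.add(f"s{i}")
```

After ten steps the context holds three short summaries plus the four most recent steps, rather than all ten steps in full.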
| Feature | Description |
|---|---|
| 🍃 Quick Installation | Install a few dependencies and configure a single API key. |
| 👁️ Dynamic Focus Vision | Uses Global (coarse) and Local (fine) views to locate elements with pixel-perfect accuracy. |
| 🧠 Hierarchical Memory | Smartly compresses history into Short-term and Long-term layers. No more token overflow! |
| 🔄 Long Interaction | Completes long real-world tasks of 100 steps or more. |
| 🛡️ Self-Correction | Verifies cursor position before clicking. If it misses, it adjusts and tries again. |
| 🎮 Human-Like Control | Smooth mouse movements, typing, scrolling, and even drag-and-drop support. |
| 📺 Live Debug Mode | Watch Iris think and act in real-time with a dedicated GUI dashboard. |
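The Self-Correction row can be illustrated with a small verify-then-click routine. This is a sketch under stated assumptions, not the project's implementation: `click_with_verification` is a hypothetical name, the mouse is injected as three callables, and `DriftingMouse` is a test double that simulates a constant positioning error.

```python
import math


def click_with_verification(target, move_to, get_position, click,
                            tolerance=3.0, max_attempts=3):
    """Move, check where the cursor actually landed, and click only if close enough."""
    aim_x, aim_y = target
    for _ in range(max_attempts):
        move_to(aim_x, aim_y)
        x, y = get_position()
        dx, dy = x - target[0], y - target[1]
        if math.hypot(dx, dy) <= tolerance:
            click()
            return True
        # Missed: shift the aim by the observed error and try again.
        aim_x -= dx
        aim_y -= dy
    return False


class DriftingMouse:
    """Test double whose cursor always lands a fixed offset away from the request."""

    def __init__(self, drift):
        self.drift = drift
        self.pos = (0, 0)
        self.clicks = []

    def move_to(self, x, y):
        self.pos = (x + self.drift[0], y + self.drift[1])

    def get_position(self):
        return self.pos

    def click(self):
        self.clicks.append(self.pos)


mouse = DriftingMouse(drift=(10, -5))
ok = click_with_verification((100, 100), mouse.move_to, mouse.get_position, mouse.click)
```

In this demo the first move lands at (110, 95), the aim is corrected to (90, 105), and the second move lands on the target before the click is issued.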
- Task: Play a round of Plants vs. Zombies (demo video: 3.mp4)
- Task: Open Google Chrome and search for Shanghai's weather (demo video: 1.mp4)
- Task: Open Story.txt and write a short story of 100 words (demo video: 2.mp4)
Ready to let Iris take the wheel? Follow these steps to get started in minutes!
Clone the repository:

    git clone https://github.com/black-yt/IrisGUI.git
    cd IrisGUI

Make sure you have Python 3.10+ installed, then install the dependencies:

    pip install -r requirements.txt

Create a .env file in the root directory (copy from .env.example) and add your LLM credentials:

    LLM_API_ENDPOINT="https://base-url/v1"
    LLM_API_KEY="sk-your-api-key-here"
    LLM_MODEL_NAME="gemini-3-pro"

Start Iris:

    python main.py

💡 Tip: To stop Iris in an emergency, press ESC three times quickly! 🛑
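Before the first run it can help to check that all three .env values are present. The sketch below is a dependency-free illustration only: `parse_env`, `missing_keys`, and `REQUIRED` are hypothetical names, and how Iris actually loads its configuration is not shown in this README.

```python
REQUIRED = ("LLM_API_ENDPOINT", "LLM_API_KEY", "LLM_MODEL_NAME")


def parse_env(text: str) -> dict:
    """Parse simple KEY="value" lines; blank lines and # comments are skipped."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values: dict) -> list:
    """Return the required keys that are absent or empty."""
    return [k for k in REQUIRED if not values.get(k)]


# Sample text with one key deliberately left out.
sample = '''
LLM_API_ENDPOINT="https://base-url/v1"
LLM_API_KEY="sk-your-api-key-here"
'''
config = parse_env(sample)
missing = missing_keys(config)
```

For a real file, the same check could be run on `pathlib.Path(".env").read_text()`.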
- 💬 GitHub Issues: Please open an issue for bug reports or feature requests
- 📧 Email: xu_wanghan@sjtu.edu.cn
If you find this work helpful, please consider starring ⭐ this repo. Thanks for your support! 🤩


