
A Lightweight Desktop GUI Agent via Dynamic Focus Vision and Hierarchical Memory. Your AI-powered hands and eyes for desktop automation.


black-yt/IrisGUI


██╗██████╗ ██╗███████╗
██║██╔══██╗██║██╔════╝
██║██████╔╝██║███████╗
██║██╔══██╗██║╚════██║
██║██║  ██║██║███████║
╚═╝╚═╝  ╚═╝╚═╝╚══════╝


A Lightweight Desktop GUI Agent via Dynamic Focus Vision and Hierarchical Memory

Lightweight • Minimal Code • Minimal Dependencies 🍃

Visual Perception • Infinite Memory • Long Interaction 💪🏻

🚀 What is Iris?

Iris is an intelligent agent designed to navigate your operating system just like a human does. It doesn't just blindly run scripts; it sees the screen, thinks about what to do, and acts with precision.

Iris is lightweight, with minimal code and dependencies, requiring only a single API key. Yet, it packs a punch with:

  • Visual Perception 👁️
  • Infinite Memory 🧠
  • Long Interaction 🔄

Powered by a robust ReAct (Reasoning + Acting) loop, Iris can handle complex workflows, recover from errors, and remember context over long periods thanks to its hierarchical memory system.
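As a rough illustration of a ReAct-style loop, here is a minimal sketch. It is not Iris's actual code: `Step`, `run_react_loop`, `scripted_think`, and `fake_act` are hypothetical names, and the "model" and "desktop" are replaced by scripted stand-ins so the example runs without an API key.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Step:
    thought: str       # what the agent reasoned
    action: str        # what it decided to do
    observation: str   # what it saw afterwards


def run_react_loop(
    task: str,
    think: Callable[[str, List[Step]], Tuple[str, str]],
    act: Callable[[str], str],
    max_steps: int = 10,
) -> List[Step]:
    """Alternate reasoning and acting until the agent says 'done' or the step budget runs out."""
    history: List[Step] = []
    for _ in range(max_steps):
        thought, action = think(task, history)
        if action == "done":
            history.append(Step(thought, action, "task finished"))
            break
        # Execute the action and record what the environment looks like now.
        history.append(Step(thought, action, act(action)))
    return history


# Scripted stand-ins (assumptions): a real agent would call an LLM and control the desktop.
def scripted_think(task: str, history: List[Step]) -> Tuple[str, str]:
    plan = ["open browser", "type query", "done"]
    action = plan[min(len(history), len(plan) - 1)]
    return f"step {len(history) + 1} toward: {task}", action


def fake_act(action: str) -> str:
    return f"screen after '{action}'"


trace = run_react_loop("search the weather", scripted_think, fake_act)
for step in trace:
    print(step.action, "->", step.observation)
```

The `max_steps` budget is there so a confused agent cannot loop forever; the value 10 is arbitrary.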


🆕 Latest News

🚩 Update (2026-01-18) We released Iris-v1.0.


🧠 Architecture

Iris operates on a cycle of Reasoning, Action, Observation and Reflection. Here's how the magic happens:

Iris Architecture

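Iris uses a dynamic focusing view strategy: it first locates a target in a coarse global view, then refines the position in a fine local view. This improves positioning accuracy and action efficiency.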

Dynamic Focusing View
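One way to picture the coarse-to-fine idea is as coordinate arithmetic: pick a crop around a rough guess, locate the target inside that crop, then map the result back to screen pixels. The sketch below is an assumption about how such a mapping could work, not Iris's implementation; `Region`, `focus_region`, and `local_to_screen` are hypothetical names, and the 400-pixel crop size is arbitrary.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Region:
    left: int
    top: int
    width: int
    height: int


def focus_region(cx: int, cy: int, screen_w: int, screen_h: int, size: int = 400) -> Region:
    """Return a square crop of `size` pixels centred on a coarse guess, clamped to the screen."""
    half = size // 2
    left = min(max(cx - half, 0), max(screen_w - size, 0))
    top = min(max(cy - half, 0), max(screen_h - size, 0))
    return Region(left, top, min(size, screen_w), min(size, screen_h))


def local_to_screen(region: Region, rel_x: float, rel_y: float) -> Tuple[int, int]:
    """Map a normalized point (0.0 to 1.0) inside the local view back to screen pixels."""
    x = region.left + int(rel_x * (region.width - 1))
    y = region.top + int(rel_y * (region.height - 1))
    return x, y


# Coarse guess near the bottom-right corner of a 1920x1080 screen.
region = focus_region(1900, 1000, 1920, 1080)
# Fine answer: the target sits at the bottom-right corner of the local view.
print(local_to_screen(region, 1.0, 1.0))  # (1919, 1079)
```

Clamping keeps the crop fully on screen, so a coarse guess near an edge still yields a full-size local view.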

Hierarchical memory compresses history into short-term and long-term layers, which avoids context explosion and prevents task forgetting.

Memory
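A two-layer memory of this kind can be sketched as follows. This is an illustrative assumption, not the project's code: `HierarchicalMemory` and its parameters are hypothetical, and the `summarize` callable stands in for whatever compression an agent would use (most likely an LLM call).

```python
from typing import Callable, List


class HierarchicalMemory:
    """Keep recent steps verbatim; fold older steps into compact long-term summaries."""

    def __init__(
        self,
        summarize: Callable[[List[str]], str],
        short_term_limit: int = 4,
        batch: int = 2,
    ) -> None:
        self.summarize = summarize
        self.short_term_limit = short_term_limit
        self.batch = batch
        self.short_term: List[str] = []
        self.long_term: List[str] = []

    def add(self, step: str) -> None:
        self.short_term.append(step)
        if len(self.short_term) > self.short_term_limit:
            # Move the oldest `batch` steps out of short-term memory as one summary.
            oldest = self.short_term[: self.batch]
            self.short_term = self.short_term[self.batch :]
            self.long_term.append(self.summarize(oldest))

    def context(self) -> List[str]:
        """What would be sent to the model: summaries first, then recent steps."""
        return self.long_term + self.short_term


memory = HierarchicalMemory(summarize=lambda steps: f"summary of {len(steps)} steps")
for i in range(1, 7):
    memory.add(f"step {i}")
print(memory.context())
```

Because old steps are replaced by summaries, the context grows far more slowly than the raw step count.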


✨ Key Features

| Feature | Description |
| --- | --- |
| 🍃 Quick Installation | Install a few dependencies and configure a single API key. |
| 👁️ Dynamic Focus Vision | Uses Global (coarse) and Local (fine) views to locate elements with pixel-perfect accuracy. |
| 🧠 Hierarchical Memory | Compresses history into Short-term and Long-term layers. No more token overflow! |
| 🔄 Long Interaction | Completes very long real-world tasks of 100 steps or more. |
| 🛡️ Self-Correction | Verifies the cursor position before clicking. If it misses, it adjusts and tries again. |
| 🎮 Human-Like Control | Smooth mouse movements, typing, scrolling, and even drag-and-drop support. |
| 📺 Live Debug Mode | Watch Iris think and act in real time with a dedicated GUI dashboard. |
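The Self-Correction behaviour (verify the cursor, adjust, retry) could look roughly like the sketch below. It is a guess at the general pattern rather than Iris's real logic: `click_with_verification` is a hypothetical function, the mouse is injected as plain callables so the example runs without touching a real desktop, and the tolerance and attempt count are arbitrary.

```python
from typing import Callable, Optional, Tuple

Point = Tuple[int, int]


def click_with_verification(
    target: Point,
    move_to: Callable[[int, int], None],
    get_position: Callable[[], Point],
    click: Callable[[], None],
    tolerance: int = 2,
    max_attempts: int = 3,
) -> Optional[int]:
    """Click only once the cursor is within `tolerance` pixels of `target`.

    Returns the attempt number that succeeded, or None if every attempt missed.
    """
    aim_x, aim_y = target
    for attempt in range(1, max_attempts + 1):
        move_to(aim_x, aim_y)
        x, y = get_position()
        dx, dy = target[0] - x, target[1] - y
        if abs(dx) <= tolerance and abs(dy) <= tolerance:
            click()
            return attempt
        # Missed: shift the aim point by the observed error and try again.
        aim_x, aim_y = aim_x + dx, aim_y + dy
    return None


class DriftingMouse:
    """Fake mouse (assumption for the demo) that always lands 5 px right of where it is sent."""

    def __init__(self) -> None:
        self.position: Point = (0, 0)
        self.clicks = 0

    def move_to(self, x: int, y: int) -> None:
        self.position = (x + 5, y)

    def get_position(self) -> Point:
        return self.position

    def click(self) -> None:
        self.clicks += 1


mouse = DriftingMouse()
attempts = click_with_verification((100, 100), mouse.move_to, mouse.get_position, mouse.click)
print(attempts, mouse.position)  # 2 (100, 100)
```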

🎞️ Demos

  • Task: Play a round of Plants vs. Zombies
3.mp4

  • Task: Open Google Chrome and search for Shanghai's weather
1.mp4

  • Task: Open Story.txt and write a short story of 100 words
2.mp4

⚡ Quick Start

Ready to let Iris take the wheel? Follow these steps to get started in minutes!

1. Clone the Repository

git clone https://github.com/black-yt/IrisGUI.git
cd IrisGUI

2. Install Dependencies

Make sure you have Python 3.10+ installed.

pip install -r requirements.txt

3. Configure Environment

Create a .env file in the root directory (copy from .env.example) and add your LLM credentials:

LLM_API_ENDPOINT="https://base-url/v1"
LLM_API_KEY="sk-your-api-key-here"
LLM_MODEL_NAME="gemini-3-pro"
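
To catch a missing or misspelled variable before launching, a small check like the one below can help. It is a standalone sketch using only the standard library and is not part of Iris: `parse_env_text` and `missing_keys` are hypothetical helpers, and the parser handles only simple `KEY="value"` lines like the three shown above.

```python
from typing import Dict, List

REQUIRED_KEYS = ("LLM_API_ENDPOINT", "LLM_API_KEY", "LLM_MODEL_NAME")


def parse_env_text(text: str) -> Dict[str, str]:
    """Parse simple KEY=value lines, ignoring blank lines and # comments."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding quotes so "abc" and abc are treated the same.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values: Dict[str, str]) -> List[str]:
    """Return the required keys that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not values.get(key)]


sample = '''
# LLM credentials
LLM_API_ENDPOINT="https://base-url/v1"
LLM_API_KEY="sk-your-api-key-here"
'''
print(missing_keys(parse_env_text(sample)))  # ['LLM_MODEL_NAME']
```

To check a real file, read it first, for example with `pathlib.Path(".env").read_text()`, and pass the text to `parse_env_text`.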

4. Run Iris

python main.py

💡 Tip: To stop Iris in an emergency, press ESC three times quickly! 🛑
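
For readers curious how a "press three times quickly" stop signal can be recognized, here is a minimal sketch of the timing logic only. It is not Iris's implementation: `TriplePressDetector` is a hypothetical class, the one-second window is an assumed value for "quickly", and capturing the actual ESC key events is left out.

```python
from collections import deque
from typing import Deque


class TriplePressDetector:
    """Report True when `presses` key presses arrive within `window` seconds."""

    def __init__(self, presses: int = 3, window: float = 1.0) -> None:
        self.presses = presses
        self.window = window
        # Only the most recent `presses` timestamps are ever needed.
        self._times: Deque[float] = deque(maxlen=presses)

    def press(self, timestamp: float) -> bool:
        self._times.append(timestamp)
        triggered = (
            len(self._times) == self.presses
            and self._times[-1] - self._times[0] <= self.window
        )
        if triggered:
            self._times.clear()  # start fresh after a successful trigger
        return triggered


detector = TriplePressDetector()
print([detector.press(t) for t in (0.0, 0.3, 0.6)])  # [False, False, True]
```

In a real program the timestamps would come from `time.monotonic()` inside a keyboard event callback.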


📬 Contact

  • 💬 GitHub Issues: Please open an issue for bug reports or feature requests

  • 📧 Email: xu_wanghan@sjtu.edu.cn


🌟 Star History

If you find this work helpful, please consider starring ⭐ this repo. Thanks for your support! 🤩

