Enterprise-Grade AI Agent Platform | High Performance | Low Cost | Out-of-the-Box
Quick Start • Architecture • API Docs • 中文
- Project Background
- Core Features
- Architecture Overview
- Tech Stack
- Quick Start
- Usage Examples
- Performance Metrics
- Roadmap
- Contribution Guidelines
- Community & Support
- License
- References
In the process of implementing enterprise AI applications, we face the following core challenges:
- High Inference Costs: LLM inference costs account for 60%-80% of the total AI application cost, severely restricting large-scale adoption.
- Performance Bottlenecks: High inference latency (P95 > 5s) makes it difficult to meet real-time interaction requirements.
- Low Development Efficiency: Building an Agent from scratch takes 3-6 months due to the lack of a standardized framework.
- Data Security Risks: Risks of sensitive data (PII) leakage and a lack of privacy protection mechanisms.
- Difficulty in Continuous Optimization: Lack of a feedback loop makes it impossible to continuously improve model performance.
OpenEAAP (Open Enterprise AI Agent Platform) is an enterprise-grade AI Agent platform designed to solve the above pain points:
```mermaid
graph LR
    subgraph PAIN[Business Pain Points]
        P1[High inference cost<br/>60-80% of total cost]
        P2[Performance bottleneck<br/>P95 latency>5s]
        P3[Long development cycle<br/>3-6 months]
        P4[Data risk<br/>PII leakage]
        P5[Hard to optimize<br/>No feedback loop]
    end
    subgraph SOLUTION[OpenEAAP Solution]
        S1[Three-level caching<br/>60% lower cost]
        S2[Efficient inference<br/>P95<1.5s]
        S3[Standard framework<br/>50% shorter cycle]
        S4[Privacy gateway<br/>Automatic masking]
        S5[Online learning<br/>Continuous optimization]
    end
    P1 --> S1
    P2 --> S2
    P3 --> S3
    P4 --> S4
    P5 --> S5
    style PAIN fill:#ffebee
    style SOLUTION fill:#e8f5e9
```
- Three-Level Smart Caching: L1 Local (<1ms) + L2 Redis (<10ms) + L3 Vector (<50ms), with a hit rate of 50%+.
- vLLM Integration: PagedAttention, KV-Cache sharing, and speculative decoding deliver up to 24x higher throughput.
- Smart Routing: Automatically selects the optimal model based on complexity, latency requirements, and cost budget.
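The three cache tiers behave like a simple fall-through lookup: check the fastest level first, fall back to the slower ones, and back-fill on a hit. Below is a minimal Go sketch of that idea; the `Tier` interface and `MultiLevel` type are illustrative assumptions, not the platform's actual cache API.

```go
package cache

import (
	"context"
	"errors"
)

// ErrMiss is returned when no tier holds the key.
var ErrMiss = errors.New("cache miss")

// Tier is the minimal behaviour each cache level must offer.
// (Illustrative interface; the real platform API may differ.)
type Tier interface {
	Get(ctx context.Context, key string) (string, error)
	Set(ctx context.Context, key, value string) error
}

// MultiLevel falls through L1 (local) -> L2 (Redis) -> L3 (vector/semantic),
// back-filling faster tiers on a hit so later requests stay cheap.
type MultiLevel struct {
	tiers []Tier // ordered fastest to slowest
}

func (m *MultiLevel) Get(ctx context.Context, key string) (string, error) {
	for i, t := range m.tiers {
		v, err := t.Get(ctx, key)
		if err != nil {
			continue // miss or tier error: try the next level
		}
		// Back-fill all faster tiers with the found value.
		for j := 0; j < i; j++ {
			_ = m.tiers[j].Set(ctx, key, v)
		}
		return v, nil
	}
	return "", ErrMiss
}
```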
Performance Comparison:
| Metric | Before Optimization | After Optimization | Improvement |
|---|---|---|---|
| P95 Latency | 5000ms | 1500ms | ⬇️ 70% |
| Inference Cost | $1.00/1K tokens | $0.40/1K tokens | ⬇️ 60% |
| GPU Utilization | 40% | 75% | ⬆️ 87.5% |
- Multi-Runtime Support: Seamless switching between Native, LangChain, and AutoGPT.
- Visual Orchestration: Low-code development by defining Agents and Workflows via YAML/JSON.
- Plugin Ecosystem: Supports dynamic loading of third-party plugins, offering unlimited extensibility.
```mermaid
graph TB
    subgraph ORCHESTRATOR[Orchestration Engine]
        REQ[User Request] --> PARSER[Request Parser]
        PARSER --> POLICY[Policy Check]
        POLICY --> ROUTER[Router]
        ROUTER -->|Native| NATIVE[Native Runtime]
        ROUTER -->|LangChain| LC[LangChain Adapter]
        ROUTER -->|AutoGPT| AG[AutoGPT Adapter]
        NATIVE --> EXEC[Executor]
        LC --> EXEC
        AG --> EXEC
        EXEC --> RESULT[Execution Result]
    end
    subgraph PLUGINS[Plugins]
        TOOL1[Tool Plugins]
        TOOL2[Memory Plugins]
        TOOL3[Custom Plugins]
    end
    EXEC -.invokes.-> PLUGINS
    style ORCHESTRATOR fill:#e3f2fd
    style PLUGINS fill:#fff3e0
```
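As the diagram shows, the router dispatches each request to whichever runtime adapter is configured. A common way to keep runtimes interchangeable is to hide every engine behind one interface; the sketch below illustrates that pattern, with the `Runtime` interface and `Router` type as assumed names rather than the project's real definitions.

```go
package runtimes

import "context"

// Request and Result are simplified stand-ins for the platform's real types.
type Request struct {
	AgentID string
	Input   string
}

type Result struct {
	Output string
}

// Runtime is the contract every execution-engine adapter (native, LangChain,
// AutoGPT, ...) would satisfy in this sketch.
type Runtime interface {
	Name() string
	Execute(ctx context.Context, req Request) (Result, error)
}

// Router returns the runtime registered under the requested name, falling
// back to a default (e.g. the native runtime) when the name is unknown.
type Router struct {
	byName   map[string]Runtime
	fallback Runtime
}

func (r *Router) Route(name string) Runtime {
	if rt, ok := r.byName[name]; ok {
		return rt
	}
	return r.fallback
}
```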
- Hybrid Retrieval: Three-way recall using Vector + Keyword + Knowledge Graph, improving accuracy by 30%.
- Smart Reranking: Optimization based on multiple dimensions including relevance, freshness, and authority.
- Answer Verification: Hallucination detection + citation checking to ensure generation quality.
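One standard way to merge the three recall channels is reciprocal rank fusion (RRF), where each channel scores a document by its rank. The Go sketch below shows the idea; it is an illustration rather than the engine's actual fusion logic, and `k = 60` is simply the constant commonly used with RRF.

```go
package retrieval

import "sort"

// fuseRRF merges ranked result lists (e.g. vector, keyword, and knowledge-graph
// recall) with reciprocal rank fusion: score(d) = sum over lists of 1/(k+rank).
func fuseRRF(lists [][]string, k float64) []string {
	scores := make(map[string]float64)
	for _, list := range lists {
		for rank, docID := range list {
			scores[docID] += 1.0 / (k + float64(rank) + 1.0)
		}
	}
	docs := make([]string, 0, len(scores))
	for id := range scores {
		docs = append(docs, id)
	}
	// Highest fused score first.
	sort.Slice(docs, func(i, j int) bool { return scores[docs[i]] > scores[docs[j]] })
	return docs
}
```

Calling `fuseRRF([][]string{vectorHits, keywordHits, graphHits}, 60)` yields a single ranked list that a reranker can then refine by relevance, freshness, and authority.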
- RLHF/DPO Training: Continuously optimize models based on human feedback.
- Prompt Auto-Optimization: Automatically discover and optimize Prompt templates.
- A/B Testing: Parallel testing of multiple versions to quickly verify optimization effects.
```mermaid
sequenceDiagram
    participant USER as User
    participant AGENT as Agent
    participant FEEDBACK as Feedback Collector
    participant OPTIMIZER as Optimizer
    participant TRAINER as Training Service
    USER->>AGENT: 1. Send request
    AGENT->>USER: 2. Return response
    USER->>FEEDBACK: 3. Submit feedback<br/>(rating/correction)
    FEEDBACK->>OPTIMIZER: 4. Aggregate feedback data
    OPTIMIZER->>OPTIMIZER: 5. Generate optimization suggestions<br/>(prompt tuning / model fine-tuning)
    OPTIMIZER->>TRAINER: 6. Trigger training job
    TRAINER->>AGENT: 7. Deploy new model version
    AGENT->>USER: 8. Serve with the optimized model
    Note over FEEDBACK,TRAINER: Continuous Optimization Loop
```
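The loop begins with a structured feedback event tied to a trace. The struct and endpoint below are assumptions about what such an event might carry (rating, correction, trace ID); they are not the platform's documented schema.

```go
package feedback

import (
	"bytes"
	"encoding/json"
	"net/http"
	"time"
)

// Event is one piece of user feedback tied to a traced Agent response.
// (Hypothetical schema for illustration only.)
type Event struct {
	TraceID    string    `json:"trace_id"`
	AgentID    string    `json:"agent_id"`
	Rating     int       `json:"rating"`               // e.g. 1-5
	Correction string    `json:"correction,omitempty"` // user-supplied fix, if any
	CreatedAt  time.Time `json:"created_at"`
}

// Submit posts the event to a feedback collector endpoint (the URL is an
// assumption, not a documented API path).
func Submit(url string, ev Event) error {
	body, err := json.Marshal(ev)
	if err != nil {
		return err
	}
	resp, err := http.Post(url, "application/json", bytes.NewReader(body))
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	return nil
}
```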
- Privacy Computing Gateway: Automatically detects and masks PII (Name, Email, Phone, etc.).
- Policy Engine: Fine-grained access control based on ABAC/RBAC.
- End-to-End Audit: Records all operations to meet SOC 2, GDPR, and PCI-DSS compliance requirements.
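At its simplest, masking can be applied with pattern matching before a prompt reaches the model or the logs. The sketch below covers only email and phone patterns and is purely illustrative; a production gateway would normally combine NER models with rules and handle many more PII types.

```go
package privacy

import "regexp"

var (
	emailRe = regexp.MustCompile(`[a-zA-Z0-9._%+\-]+@[a-zA-Z0-9.\-]+\.[a-zA-Z]{2,}`)
	phoneRe = regexp.MustCompile(`\+?\d[\d\- ]{7,14}\d`)
)

// MaskPII replaces detected email addresses and phone numbers with placeholders
// so that raw identifiers never reach the LLM or the audit logs.
func MaskPII(text string) string {
	text = emailRe.ReplaceAllString(text, "[EMAIL]")
	text = phoneRe.ReplaceAllString(text, "[PHONE]")
	return text
}
```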
OpenEAAP adopts the classic DDD (Domain-Driven Design) Layered Architecture, with clear division of responsibilities ensuring system maintainability and scalability.
```mermaid
graph TB
    subgraph API[Interface Layer]
        HTTP[HTTP API<br/>REST/GraphQL]
        GRPC[gRPC API<br/>High-performance RPC]
        CLI[CLI<br/>Command-line management]
    end
    subgraph APP[Application Layer]
        SERVICE1[Agent Service]
        SERVICE2[Workflow Service]
        SERVICE3[Model Service]
        SERVICE4[Data Service]
    end
    subgraph PLATFORM[Platform Layer]
        ORCH[Orchestrator]
        RUNTIME[Runtime]
        INFERENCE[Inference Engine]
        RAG[RAG Engine]
        LEARNING[Online Learning]
        TRAINING[Training Service]
    end
    subgraph DOMAIN[Domain Layer]
        AGENT_D[Agent Domain]
        WORKFLOW_D[Workflow Domain]
        MODEL_D[Model Domain]
        KNOWLEDGE_D[Knowledge Domain]
    end
    subgraph INFRA[Infrastructure Layer]
        REPO[Repository Implementations]
        VECTOR[Vector DB]
        STORAGE[Object Storage]
        MQ[Message Queue]
    end
    subgraph GOV[Governance Layer]
        POLICY[Policy Engine]
        AUDIT[Audit]
        COMPLIANCE[Compliance]
    end
    subgraph OBS[Observability Layer]
        TRACE[Distributed Tracing]
        METRICS[Metrics]
        LOGGING[Logging]
    end
    API --> APP
    APP --> PLATFORM
    APP --> DOMAIN
    PLATFORM --> DOMAIN
    DOMAIN --> INFRA
    GOV -.cross-cutting.-> PLATFORM
    GOV -.cross-cutting.-> APP
    OBS -.cross-cutting.-> PLATFORM
    OBS -.cross-cutting.-> APP
    style API fill:#e1f5fe
    style APP fill:#f3e5f5
    style PLATFORM fill:#fff9c4
    style DOMAIN fill:#c8e6c9
    style INFRA fill:#ffccbc
    style GOV fill:#ffebee
    style OBS fill:#f0f4c3
```
Layer Responsibilities:
| Layer | Responsibility | Example Components |
|---|---|---|
| Interface Layer | Exposes APIs, handles requests/responses | HTTP Handler, gRPC Server, CLI Commands |
| Application Layer | Orchestrates business processes, coordinates domain services | AgentService, WorkflowService |
| Platform Layer | Core capability components (Orchestration, Inference, RAG, etc.) | Orchestrator, Inference Engine, RAG Engine |
| Domain Layer | Core business logic, domain entities, and domain services | Agent, Workflow, Model Entities and Domain Services |
| Infrastructure Layer | Data persistence and external system integration | PostgreSQL, Redis, Milvus, MinIO |
| Governance Layer | Security, compliance, auditing | Policy Engine, Audit Logs, PII Detection |
| Observability Layer | Monitoring, tracing, logging | OpenTelemetry, Prometheus, Loki |
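In practice this layering means the domain layer owns the interfaces and the infrastructure layer implements them, so business logic never imports PostgreSQL, Redis, or Milvus directly. A minimal sketch with illustrative names (the real package layout may differ):

```go
// Domain layer: pure business types plus the persistence contract they need.
package agent

import "context"

type Agent struct {
	ID   string
	Name string
}

// Repository is declared in the domain layer. An infrastructure package
// (e.g. a PostgreSQL-backed repository) implements it and is injected into
// application services, keeping the domain free of storage details.
type Repository interface {
	FindByID(ctx context.Context, id string) (*Agent, error)
	Save(ctx context.Context, a *Agent) error
}
```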
The following sequence diagram shows the processing flow of a complete Agent execution request:
```mermaid
sequenceDiagram
    participant USER as User
    participant APIGW as API Gateway
    participant ORCH as Orchestrator
    participant PEP as Policy Enforcement Point
    participant RUNTIME as Runtime
    participant INF as Inference Engine
    participant CACHE as Cache
    participant LLM as LLM
    participant AUDIT as Audit
    USER->>APIGW: 1. Send Agent request
    APIGW->>APIGW: 2. Authentication & rate limiting
    APIGW->>ORCH: 3. Forward request
    ORCH->>PEP: 4. Policy check
    PEP->>ORCH: 5. Return policy decision (Permit/Deny)
    ORCH->>RUNTIME: 6. Select runtime and execute
    RUNTIME->>INF: 7. Call inference engine
    INF->>CACHE: 8. Query cache<br/>(L1→L2→L3)
    alt Cache Hit
        CACHE->>INF: 9a. Return cached result
    else Cache Miss
        INF->>LLM: 9b. Call the LLM
        LLM->>INF: 10. Return inference result
        INF->>CACHE: 11. Write to cache
    end
    INF->>RUNTIME: 12. Return result
    RUNTIME->>ORCH: 13. Return execution result
    ORCH->>AUDIT: 14. Write audit log
    ORCH->>APIGW: 15. Return response
    APIGW->>USER: 16. Return final result
    Note over CACHE,LLM: A cache hit can cut latency by 70% and cost by 60%
```
OpenEAAP supports flexible deployment: choose single-node or distributed cluster deployment according to your business scale.
```mermaid
graph TB
    subgraph INTERNET[Internet]
        USER[User]
    end
    subgraph EDGE[Edge Layer]
        CDN[CDN]
        WAF[WAF<br/>Web Application Firewall]
    end
    subgraph GATEWAY[Gateway Layer]
        LB[Load Balancer]
        APIGW[API Gateway]
    end
    subgraph APPLAYER[Application Layer]
        WEB1[Web Service 1]
        WEB2[Web Service 2]
        ORCH1[Orchestrator 1]
        ORCH2[Orchestrator 2]
    end
    subgraph PLATFORM[Platform Layer]
        AEP[AEP Cluster<br/>Agent Execution Platform]
        DIKF[DIKF Cluster<br/>Data & Inference]
        MSIG[MSIG Cluster<br/>Model Service]
        MTFS[MTFS Cluster<br/>Training & Feedback]
    end
    subgraph DATA[Data Layer]
        PG[(PostgreSQL<br/>Primary-Replica Cluster)]
        REDIS[(Redis<br/>Cluster)]
        MILVUS[(Milvus<br/>Vector DB)]
        MINIO[(MinIO<br/>Object Storage)]
    end
    subgraph K8S[Infrastructure Layer]
        CLUSTER[Kubernetes Cluster]
        GPU[GPU Node Pool]
    end
    USER --> CDN
    CDN --> WAF
    WAF --> LB
    LB --> APIGW
    APIGW --> APPLAYER
    APPLAYER --> PLATFORM
    PLATFORM --> DATA
    K8S -.orchestrates.-> APPLAYER
    K8S -.orchestrates.-> PLATFORM
    GPU -.provides compute.-> PLATFORM
    style INTERNET fill:#e3f2fd
    style EDGE fill:#fff3e0
    style GATEWAY fill:#f3e5f5
    style APPLAYER fill:#c8e6c9
    style PLATFORM fill:#fff9c4
    style DATA fill:#ffccbc
    style K8S fill:#f0f4c3
```
Deployment Highlights:
- Edge Layer: CDN for static resource acceleration, WAF for Web attack protection.
- Gateway Layer: Load Balancer + API Gateway as the unified entry point, integrating authentication, rate limiting, and routing.
- Application Layer: Stateless services supporting horizontal scaling.
- Platform Layer: Core capability components, split into microservices by business module.
- Data Layer: Primary-Replica architecture ensures high availability; distributed storage ensures performance.
- Infrastructure Layer: Kubernetes orchestration; GPU node pools support model inference and training.
| Category | Technology | Usage |
|---|---|---|
| Language | Go 1.24+ | High-performance backend services |
| Web Framework | Gin / Echo | HTTP API Services |
| RPC Framework | gRPC | High-performance internal communication |
| Database | PostgreSQL 14+ | Relational data storage |
| Cache | Redis 7+ | Distributed caching |
| Vector DB | Milvus 2.3+ | Vector retrieval |
| Object Storage | MinIO / S3 | File storage |
| Message Queue | Kafka / NATS | Asynchronous tasks and event-driven architecture |
| Inference Engine | vLLM | High-performance LLM inference |
| Training Framework | DeepSpeed / Megatron-LM | Distributed training |
| Containerization | Docker | Application packaging |
| Orchestration | Kubernetes | Container orchestration |
| Monitoring | Prometheus + Grafana | Metrics monitoring |
| Tracing | OpenTelemetry + Jaeger | Distributed tracing |
| Logging | Loki | Log aggregation |
- Go 1.24 or higher
- Docker and Docker Compose (for local development)
- Kubernetes 1.25+ (for production deployment)
- PostgreSQL 14+, Redis 7+, Milvus 2.3+ (can be started quickly via Docker Compose)
```bash
git clone https://github.com/openeeap/openeeap.git
cd openeeap
```
Use Docker Compose to quickly start dependent services like PostgreSQL, Redis, Milvus, and MinIO:
```bash
docker-compose up -d
```
Run the database migration script to create table structures:
```bash
make migrate-up
```
Copy the configuration file template and modify it according to your actual environment:
```bash
cp configs/development.yaml.example configs/development.yaml
# Edit configs/development.yaml to configure the DB connection, Redis, Milvus, etc.
```
```bash
# Start the HTTP service
make run-server

# Or run directly with the Go toolchain
go run cmd/server/main.go --config configs/development.yaml
```
After the service starts, you can access it via the following addresses:
- HTTP API: http://localhost:8080
- gRPC API: localhost:9090
- Swagger UI: http://localhost:8080/swagger/index.html
```bash
# Run all tests
make test

# Run unit tests
make test-unit

# Run integration tests
make test-integration

# View test coverage
make test-coverage
```
Create a GPT-4 based Customer Service Agent via HTTP API:
```bash
curl -X POST http://localhost:8080/api/v1/agents \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Customer Support Agent",
    "description": "Automatically answers common customer questions",
    "runtime_type": "native",
    "config": {
      "model": "gpt-4",
      "temperature": 0.7,
      "max_tokens": 500,
      "system_prompt": "You are a professional customer support assistant responsible for answering customer questions about the product."
    }
  }'
```
Response Example:
```json
{
  "id": "agent-123456",
  "name": "Customer Support Agent",
  "status": "active",
  "created_at": "2026-01-15T10:30:00Z"
}
```
Send a request to the Agent to get an intelligent response:
```bash
curl -X POST http://localhost:8080/api/v1/agents/agent-123456/execute \
  -H "Content-Type: application/json" \
  -d '{
    "input": "What payment methods does your product support?",
    "context": {
      "user_id": "user-789",
      "session_id": "session-abc"
    }
  }'
```
Response Example:
```json
{
  "output": "We support the following payment methods: 1. Alipay 2. WeChat Pay 3. Credit Cards (Visa, MasterCard) 4. Bank Transfer. You can choose the most convenient method at checkout.",
  "execution_time_ms": 1200,
  "cached": false,
  "trace_id": "trace-xyz789"
}
```
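The same call can be made from Go with the standard library; the snippet below mirrors the curl example above (the URL and JSON fields come from that example, and error handling is kept minimal).

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

type executeRequest struct {
	Input   string            `json:"input"`
	Context map[string]string `json:"context,omitempty"`
}

type executeResponse struct {
	Output          string `json:"output"`
	ExecutionTimeMS int    `json:"execution_time_ms"`
	Cached          bool   `json:"cached"`
	TraceID         string `json:"trace_id"`
}

func main() {
	body, _ := json.Marshal(executeRequest{
		Input:   "What payment methods does your product support?",
		Context: map[string]string{"user_id": "user-789", "session_id": "session-abc"},
	})
	resp, err := http.Post(
		"http://localhost:8080/api/v1/agents/agent-123456/execute",
		"application/json", bytes.NewReader(body),
	)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	var out executeResponse
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		panic(err)
	}
	fmt.Println(out.Output, out.Cached, out.TraceID)
}
```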
Define a multi-step Workflow (e.g., Document Analysis Flow):
```yaml
# workflow.yaml
name: Document Analysis Workflow
description: Automatically analyze uploaded contract documents and extract key information
steps:
  - id: step1
    name: Document Parsing
    agent: document-parser
    input: "${workflow.input.document_url}"
  - id: step2
    name: Information Extraction
    agent: info-extractor
    input: "${step1.output.text}"
    depends_on:
      - step1
  - id: step3
    name: Risk Assessment
    agent: risk-analyzer
    input: "${step2.output.entities}"
    depends_on:
      - step2
trigger:
  type: manual
```
Create Workflow via CLI:
```bash
openeeap workflow create -f workflow.yaml
```
Run Workflow:
```bash
openeeap workflow run document-analysis-workflow \
  --input '{"document_url": "https://example.com/contract.pdf"}'
```
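References such as `${step1.output.text}` imply that each step's output is exposed to later steps. Below is a minimal sketch of how that placeholder resolution could work; it is illustrative only, not the engine's actual templating.

```go
package workflow

import (
	"fmt"
	"regexp"
)

var placeholderRe = regexp.MustCompile(`\$\{([a-zA-Z0-9_.]+)\}`)

// Resolve replaces ${path.to.value} placeholders in a step input with values
// collected from previously executed steps (flattened into dotted keys).
func Resolve(input string, outputs map[string]string) (string, error) {
	var missing []string
	resolved := placeholderRe.ReplaceAllStringFunc(input, func(m string) string {
		key := placeholderRe.FindStringSubmatch(m)[1]
		if v, ok := outputs[key]; ok {
			return v
		}
		missing = append(missing, key)
		return m
	})
	if len(missing) > 0 {
		return "", fmt.Errorf("unresolved references: %v", missing)
	}
	return resolved, nil
}
```

For example, `Resolve("${step1.output.text}", map[string]string{"step1.output.text": "..."})` returns the interpolated input passed to step 2.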
Performance test results based on real business workloads (100 concurrent users, sustained for 10 minutes):
| Metric | OpenEAAP (Optimized) | Traditional Solution (Before) | Improvement |
|---|---|---|---|
| P50 Latency | 800ms | 2500ms | ⬇️ 68% |
| P95 Latency | 1500ms | 5000ms | ⬇️ 70% |
| P99 Latency | 2200ms | 8000ms | ⬇️ 72.5% |
| QPS | 120 | 45 | ⬆️ 166% |
| Cache Hit Rate | 52% | 0% | New Capability |
| Inference Cost | $0.40/1K tokens | $1.00/1K tokens | ⬇️ 60% |
| Resource | Utilization | Description |
|---|---|---|
| GPU | 75% | Significantly improved utilization via vLLM optimization |
| CPU | 60% | Efficient concurrent processing |
| Memory | 65% | KV-Cache sharing reduces memory footprint |
| Network | 40% | Streaming response reduces bandwidth consumption |
- Core framework setup (DDD Architecture)
- Orchestrator & Runtime (Native, LangChain Adapters)
- Inference Gateway (vLLM Integration)
- Three-level caching architecture
- RAG Engine (Hybrid retrieval, Reranking)
- Vector Database Integration (Milvus)
- Knowledge Graph Construction
- Document Processing Pipeline (Parsing, Chunking, Vectorization)
- Online Learning Engine
- RLHF/DPO Training Flow
- Prompt Auto-Optimization
- A/B Testing Framework
- Policy Engine (ABAC/RBAC)
- Privacy Computing Gateway (PII Detection & Masking)
- Audit & Compliance (SOC2, GDPR)
- Vulnerability Scanning & Security Hardening
- AutoGPT Adapter
- Plugin Marketplace
- Multimodal Support (Image, Voice)
- Edge AI Deployment
- Code Style: Follow the official Go code style; use `gofmt` and `golangci-lint` to check code.
- Commit Messages: Follow the Conventional Commits specification.
- Test Coverage: New code must include unit tests with coverage of at least 80%.
- Documentation Updates: If API changes or new features are involved, documentation must be updated accordingly.
| Type | Description | Example |
|---|---|---|
| `feat` | New feature | feat: Add RLHF training support |
| `fix` | Bug fix | fix: Fix cache penetration issue |
| `docs` | Documentation update | docs: Update API documentation |
| `style` | Code formatting (no logic change) | style: Unify code indentation |
| `refactor` | Code refactoring | refactor: Optimize orchestrator architecture |
| `perf` | Performance optimization | perf: Optimize vector retrieval performance |
| `test` | Testing related | test: Add inference engine integration tests |
| `chore` | Build/tooling related | chore: Upgrade Go version to 1.24 |
Before submitting a Pull Request, please ensure:
- ✅ Code passes all tests (`make test`)
- ✅ Code passes lint checks (`make lint`)
- ✅ New features have corresponding unit and integration tests
- ✅ Documentation is updated (README, API docs, architecture docs)
- ✅ Commit messages comply with the Conventional Commits specification
- ✅ The branch is based on the latest `main` branch
Recommended tools to improve development efficiency:
```bash
# Install development tools
make install-tools

# Includes:
# - golangci-lint (code linting)
# - protoc (gRPC code generation)
# - mockgen (mock generation)
# - swagger (API documentation generation)
```
OpenEAAP is licensed under the Apache License 2.0.
```text
Copyright 2026 OpenEAAP Authors

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
```
Key Points:
- ✅ Commercial Friendly: Can be used for commercial projects without paying copyright fees.
- ✅ Freedom to Modify: Source code can be freely modified and published.
- ✅ Patent Protection: Contributors grant patent licenses to protect users from patent litigation.
- ⚠️ Notice Obligation: Modified files must carry prominent notices stating that you changed them.
See the LICENSE file for the full license text.
The development of OpenEAAP relies on the support of the following open-source projects and communities:
- vLLM - High-performance LLM inference engine
- LangChain - Agent development framework
- Milvus - Vector database
- Gin - Go Web framework
- GORM - Go ORM library
- OpenTelemetry - Observability standard
- Prometheus - Monitoring system
Thanks to the following organizations and individuals for their contributions to the project:
- Early Contributors: Thanks to all contributors who submitted code, documents, and bug reports.
- Beta Users: Thanks to the enterprises and teams who piloted OpenEAAP in production environments.
- [1] Go Project Layout Standard. https://github.com/golang-standards/project-layout
- [2] LangChain Official Documentation. https://python.langchain.com/docs/get_started/introduction
- [3] vLLM Official Documentation. https://docs.vllm.ai/en/latest/
- [4] Milvus Official Documentation. https://milvus.io/docs
- [5] OpenTelemetry Specification. https://opentelemetry.io/docs/
- [6] RLHF Paper. https://arxiv.org/abs/2203.02155
- [7] DPO Paper. https://arxiv.org/abs/2305.18290
- [8] RAG Survey Paper. https://arxiv.org/abs/2312.10997
- [9] Kubernetes Best Practices. https://kubernetes.io/docs/concepts/
- [10] Apache License 2.0. https://www.apache.org/licenses/LICENSE-2.0
- [11] Conventional Commits Specification. https://www.conventionalcommits.org/
- [12] Domain-Driven Design (DDD). https://www.domainlanguage.com/ddd/