78 changes: 63 additions & 15 deletions README.md
@@ -9,9 +9,9 @@
| Module | Description |
|------|------|
| **Topic Search** | Multi-topic aggregated retrieval across three sources (papers.cool + arXiv API + Hugging Face Daily Papers), cross-query/branch deduplication and score-based ranking, `min_score` quality filtering |
| **DailyPaper** | Daily report generation (Markdown/JSON), optional LLM enrichment (summaries/trends/insights/relevance), scheduled delivery (Email/Slack/DingTalk) |
| **LLM-as-Judge** | 5-dimension scoring (Relevance/Novelty/Rigor/Impact/Clarity) + recommendation tiers (must_read/worth_reading/skim/skip), token-budget control, multi-round calibration |
| **Analyze SSE** | Judge + Trend analysis streamed in real time over SSE, incremental rendering on the frontend (Judge cards one by one / Trend analyses item by item) |
| **DailyPaper** | Daily report generation (Markdown/JSON), full pipeline progress streamed in real time over SSE, optional LLM enrichment (summaries/trends/insights/relevance), automatic filtering of low-quality papers after Judge scoring, scheduled delivery (Email/Slack/DingTalk) |
| **LLM-as-Judge** | 5-dimension scoring (Relevance/Novelty/Rigor/Impact/Clarity) + recommendation tiers (must_read/worth_reading/skim/skip), token-budget control, multi-round calibration, automatic filtering of skip/skim papers after scoring |
| **Analyze SSE** | Judge + Trend analysis streamed in real time over SSE, incremental rendering on the frontend (Judge cards one by one / Trend analyses item by item), full Judge log retained |
| **Scholar Tracking** | Periodic monitoring of scholars' papers, multi-agent collaboration (Research/Code/Quality/Reviewer), PIS impact scoring (citation velocity, trend momentum) |
| **Deep Review** | Simulated peer review (screening → in-depth critique → decision), outputs Summary/Strengths/Weaknesses/Novelty Score |
| **Paper2Code** | Paper-to-code skeleton (Planning→Analysis→Generation→Verification), self-healing debugging, Docker/E2B sandboxed execution |
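
For orientation, a Judge result can be pictured as a small record like the sketch below, mirroring the five dimensions and recommendation tiers listed above (and the `paper_judge_scores` columns added later in this PR). The class name and types are assumptions, not the project's actual model:

```python
from dataclasses import dataclass


@dataclass
class JudgeScore:
    # The five scoring dimensions from the table above.
    relevance: float
    novelty: float
    rigor: float
    impact: float
    clarity: float
    overall: float
    # One of: must_read / worth_reading / skim / skip.
    recommendation: str
    one_line_summary: str = ""
```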
@@ -82,6 +82,24 @@ Input Queries ──→ ├─── arXiv API (relevance sort)
└── Web UI (DAG + Tabs: Papers / Insights / Judge)
```

### DailyPaper SSE Streaming Pipeline

When LLM analysis or Judge scoring is enabled, the `/daily` endpoint returns an SSE streaming response, and the frontend displays the progress of each stage in real time:

```text
Search → Build Report → LLM Enrichment → Judge Scoring → Filter → Save → Notify → Result
  │           │               │                │            │
  │           │               │                │            └─ remove skip/skim papers
  │           │               │                └─ score each paper, push judge events in real time
  │           │               └─ per-paper summaries + trend analysis + insights
  │           └─ assemble the report structure
  └─ multi-source retrieval + dedup + scoring
```
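
As a rough illustration of consuming that stream outside the Web UI, here is a minimal sketch using `httpx`; the request fields and event payload keys are assumptions, not a documented contract:

```python
import json

import httpx  # third-party HTTP client: pip install httpx

# Hypothetical request body: topic queries plus the toggles that switch /daily into SSE mode.
payload = {"queries": ["LLM agents"], "use_llm": True, "use_judge": True}

with httpx.stream(
    "POST",
    "http://localhost:8000/api/research/paperscool/daily",
    json=payload,
    timeout=None,  # the pipeline can run for minutes
) as resp:
    for line in resp.iter_lines():
        # SSE frames arrive as "event: <name>" / "data: <json>" pairs.
        if line.startswith("data:"):
            event = json.loads(line[len("data:"):].strip())
            # e.g. {"stage": "judge", "message": "..."} — field names are illustrative only.
            print(event.get("stage"), event.get("message", ""))
```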

**Post-Judge filtering**: after Judge scoring completes, papers with a recommendation of `skip` or `skim` are automatically removed, keeping only `must_read` and `worth_reading` papers. The full Judge scoring log is retained in `report.filter.log`.
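
The filtering rule itself is straightforward; a minimal sketch (the `judge`/`recommendation` field names are assumed for illustration and may not match the real report schema):

```python
KEEP = {"must_read", "worth_reading"}


def filter_by_recommendation(papers: list[dict]) -> tuple[list[dict], list[dict]]:
    """Split Judge-scored papers into (kept, dropped) by recommendation tier."""
    kept: list[dict] = []
    dropped: list[dict] = []
    for paper in papers:
        rec = paper.get("judge", {}).get("recommendation")
        (kept if rec in KEEP else dropped).append(paper)
    return kept, dropped
```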

**Frontend config persistence**: all feature toggles (LLM/Judge/data sources/email, etc.) are enabled by default and saved in the browser's localStorage, so they survive page refreshes.

## UI Preview

### Terminal UI (Ink)
@@ -114,6 +132,10 @@ Input Queries ──→ ├─── arXiv API (relevance sort)
|---------------|-----------------|
| ![Judge Cards](asset/ui/9-4.png) | ![Judge Radar](asset/ui/9-5.png) |

### Email Notification

![Email Notification](asset/notify.png)

## Quick Start

### 1) Install
@@ -166,17 +188,34 @@ LLM_REASONING_MODEL=...
<details>
<summary>Daily notification configuration (click to expand)</summary>

```bash
# Notification channels
PAPERBOT_NOTIFY_ENABLED=true
PAPERBOT_NOTIFY_CHANNELS=email,slack,dingding
After a DailyPaper report is generated, a summary can be pushed automatically to Email/Slack/DingTalk. There are two ways to configure this:

**Option 1: Web UI configuration (recommended)**

In the Settings panel of the Topic Workflow page:
1. Check "Email Notification"
2. Enter the recipient email address (e.g. `you@example.com`)
3. When DailyPaper runs, the email is sent automatically at the end

> The email address entered in the UI overrides `PAPERBOT_NOTIFY_EMAIL_TO` from the environment variables.
> All settings (LLM/Judge/data sources/email, etc.) are automatically persisted to the browser's localStorage and survive page refreshes.

# Email (SMTP)
PAPERBOT_NOTIFY_SMTP_HOST=smtp.example.com
PAPERBOT_NOTIFY_SMTP_USERNAME=...
PAPERBOT_NOTIFY_SMTP_PASSWORD=...
PAPERBOT_NOTIFY_EMAIL_FROM=bot@example.com
PAPERBOT_NOTIFY_EMAIL_TO=you@example.com
**Option 2: environment variable configuration**

```bash
# Master switch
PAPERBOT_NOTIFY_ENABLED=true                   # enable notifications (must be true for anything to be sent)
PAPERBOT_NOTIFY_CHANNELS=email,slack           # enabled channels (comma-separated)

# Email (SMTP) — required before any email can be sent
PAPERBOT_NOTIFY_SMTP_HOST=smtp.qq.com          # SMTP server address
PAPERBOT_NOTIFY_SMTP_PORT=587                  # SMTP port (587 = STARTTLS, 465 = SSL)
PAPERBOT_NOTIFY_SMTP_USERNAME=your@qq.com      # SMTP login username
PAPERBOT_NOTIFY_SMTP_PASSWORD=your-auth-code   # SMTP password or authorization code
PAPERBOT_NOTIFY_SMTP_USE_TLS=true              # use STARTTLS (true for port 587)
PAPERBOT_NOTIFY_SMTP_USE_SSL=false             # use SSL (true for port 465)
PAPERBOT_NOTIFY_EMAIL_FROM=your@qq.com         # sender address
PAPERBOT_NOTIFY_EMAIL_TO=recipient@example.com # default recipient (can be overridden by the UI)

# Slack
PAPERBOT_NOTIFY_SLACK_WEBHOOK_URL=https://hooks.slack.com/...
@@ -185,14 +224,23 @@
PAPERBOT_NOTIFY_DINGTALK_WEBHOOK_URL=https://oapi.dingtalk.com/robot/send?access_token=...
PAPERBOT_NOTIFY_DINGTALK_SECRET=SEC...

# DailyPaper scheduled task
# DailyPaper scheduled task (ARQ Worker)
PAPERBOT_DAILYPAPER_ENABLED=true
PAPERBOT_DAILYPAPER_CRON_HOUR=8
PAPERBOT_DAILYPAPER_CRON_MINUTE=30
PAPERBOT_DAILYPAPER_NOTIFY_ENABLED=true
PAPERBOT_DAILYPAPER_NOTIFY_CHANNELS=email,slack
```
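
The cron hour/minute values above drive the ARQ worker's schedule. A minimal sketch of how such a job is typically wired up with arq's cron API (the job function and settings class here are illustrative, not the project's actual worker module):

```python
from arq import cron


async def run_dailypaper(ctx):
    # Placeholder for the real DailyPaper job; ctx is arq's per-job context dict.
    ...


class WorkerSettings:
    # Matches PAPERBOT_DAILYPAPER_CRON_HOUR=8 / PAPERBOT_DAILYPAPER_CRON_MINUTE=30 above.
    cron_jobs = [cron(run_dailypaper, hour=8, minute=30)]
```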

**QQ Mail configuration example:**
1. Log in to QQ Mail → Settings → Account → enable the POP3/SMTP service
2. Generate an authorization code (this is not your QQ password)
3. Set `SMTP_HOST=smtp.qq.com`, `SMTP_PORT=587`, `SMTP_USE_TLS=true`

**Gmail configuration example:**
1. Google Account → Security → 2-Step Verification → App passwords
2. Set `SMTP_HOST=smtp.gmail.com`, `SMTP_PORT=587`, `SMTP_USE_TLS=true`
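
Before wiring credentials into the bot, it can help to verify them with a quick standalone check using Python's standard library; the addresses below mirror the placeholders above:

```python
import smtplib
from email.message import EmailMessage

msg = EmailMessage()
msg["Subject"] = "paperbot SMTP test"
msg["From"] = "your@qq.com"          # PAPERBOT_NOTIFY_EMAIL_FROM
msg["To"] = "recipient@example.com"  # PAPERBOT_NOTIFY_EMAIL_TO
msg.set_content("If you can read this, the SMTP settings work.")

# Port 587 + STARTTLS, matching SMTP_USE_TLS=true above.
with smtplib.SMTP("smtp.qq.com", 587) as server:
    server.starttls()
    server.login("your@qq.com", "your-auth-code")  # authorization code, not the account password
    server.send_message(msg)
```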

</details>

### 3) Start
@@ -229,7 +277,7 @@ arq paperbot.infrastructure.queue.arq_worker.WorkerSettings
| `/api/review` | POST | Deep review (SSE) |
| `/api/chat` | POST | AI chat (SSE) |
| `/api/research/paperscool/search` | POST | Topic search (multi-source aggregation, supports `min_score` filtering) |
| `/api/research/paperscool/daily` | POST | DailyPaper report (supports `notify` delivery) |
| `/api/research/paperscool/daily` | POST | DailyPaper report (SSE stream when LLM/Judge is enabled, otherwise JSON; supports `notify` delivery) |
| `/api/research/paperscool/analyze` | POST | Judge + Trend streaming analysis (SSE) |
| `/api/research/tracks` | GET/POST | Research track management |
| `/api/research/memory/*` | GET/POST | Memory system (Inbox/review/retrieval) |
89 changes: 89 additions & 0 deletions alembic/versions/0003_paper_registry.py
@@ -0,0 +1,89 @@
"""paper registry

Revision ID: 0003_paper_registry
Revises: 0002_research_eval_runs
Create Date: 2026-02-10

Adds canonical papers table for persistent DailyPaper ingestion.
"""

from __future__ import annotations

import sqlalchemy as sa
from alembic import context, op


revision = "0003_paper_registry"
down_revision = "0002_research_eval_runs"
branch_labels = None
depends_on = None


def _is_offline() -> bool:
    # Offline mode ("alembic upgrade --sql") has no live connection, so the
    # inspection helpers below cannot run and we fall back to unconditional DDL.
    try:
        return bool(context.is_offline_mode())
    except Exception:
        return False


def _insp():
    # Inspector bound to the current migration connection.
    return sa.inspect(op.get_bind())


def _has_table(name: str) -> bool:
    return _insp().has_table(name)


def _get_indexes(table: str) -> set[str]:
    idx = set()
    for i in _insp().get_indexes(table):
        idx.add(str(i.get("name") or ""))
    return idx


def _create_index(name: str, table: str, cols: list[str]) -> None:
    # Idempotent index creation: skip indexes that already exist on a live database.
    if _is_offline():
        op.create_index(name, table, cols)
        return
    if name in _get_indexes(table):
        return
    op.create_index(name, table, cols)


def upgrade() -> None:
    if _is_offline() or not _has_table("papers"):
        op.create_table(
            "papers",
            sa.Column("id", sa.Integer(), primary_key=True, autoincrement=True),
            sa.Column("arxiv_id", sa.String(length=64), nullable=True),
            sa.Column("doi", sa.String(length=128), nullable=True),
            sa.Column("title", sa.Text(), server_default="", nullable=False),
            sa.Column("authors_json", sa.Text(), server_default="[]", nullable=False),
            sa.Column("abstract", sa.Text(), server_default="", nullable=False),
            sa.Column("url", sa.String(length=512), server_default="", nullable=False),
            sa.Column("external_url", sa.String(length=512), server_default="", nullable=False),
            sa.Column("pdf_url", sa.String(length=512), server_default="", nullable=False),
            sa.Column("source", sa.String(length=32), server_default="papers_cool", nullable=False),
            sa.Column("venue", sa.String(length=256), server_default="", nullable=False),
            sa.Column("published_at", sa.DateTime(timezone=True), nullable=True),
            sa.Column("first_seen_at", sa.DateTime(timezone=True), nullable=False),
            sa.Column("keywords_json", sa.Text(), server_default="[]", nullable=False),
            sa.Column("metadata_json", sa.Text(), server_default="{}", nullable=False),
            sa.Column("created_at", sa.DateTime(timezone=True), nullable=False),
            sa.Column("updated_at", sa.DateTime(timezone=True), nullable=False),
            sa.UniqueConstraint("arxiv_id", name="uq_papers_arxiv_id"),
            sa.UniqueConstraint("doi", name="uq_papers_doi"),
        )

    _create_index("ix_papers_arxiv_id", "papers", ["arxiv_id"])
    _create_index("ix_papers_doi", "papers", ["doi"])
    _create_index("ix_papers_title", "papers", ["title"])
    _create_index("ix_papers_source", "papers", ["source"])
    _create_index("ix_papers_published_at", "papers", ["published_at"])
    _create_index("ix_papers_first_seen_at", "papers", ["first_seen_at"])
    _create_index("ix_papers_created_at", "papers", ["created_at"])
    _create_index("ix_papers_updated_at", "papers", ["updated_at"])


def downgrade() -> None:
    op.drop_table("papers")
112 changes: 112 additions & 0 deletions alembic/versions/0004_paper_feedback_judge_links.py
@@ -0,0 +1,112 @@
"""paper feedback/judge links

Revision ID: 0004_paper_feedback_judge_links
Revises: 0003_paper_registry
Create Date: 2026-02-10

Adds:
- paper_judge_scores table
- paper_feedback.paper_ref_id nullable FK-like reference column
"""

from __future__ import annotations

import sqlalchemy as sa
from alembic import context, op


revision = "0004_paper_feedback_judge_links"
down_revision = "0003_paper_registry"
branch_labels = None
depends_on = None


def _is_offline() -> bool:
    try:
        return bool(context.is_offline_mode())
    except Exception:
        return False


def _insp():
    return sa.inspect(op.get_bind())


def _has_table(name: str) -> bool:
    return _insp().has_table(name)


def _get_columns(table: str) -> set[str]:
    cols = set()
    for c in _insp().get_columns(table):
        cols.add(str(c.get("name") or ""))
    return cols


def _get_indexes(table: str) -> set[str]:
    idx = set()
    for i in _insp().get_indexes(table):
        idx.add(str(i.get("name") or ""))
    return idx


def _create_index(name: str, table: str, cols: list[str]) -> None:
    if _is_offline():
        op.create_index(name, table, cols)
        return
    if name in _get_indexes(table):
        return
    op.create_index(name, table, cols)


def upgrade() -> None:
    if _is_offline() or not _has_table("paper_judge_scores"):
        op.create_table(
            "paper_judge_scores",
            sa.Column("id", sa.Integer(), primary_key=True, autoincrement=True),
            sa.Column("paper_id", sa.Integer(), sa.ForeignKey("papers.id"), nullable=False),
            sa.Column("query", sa.String(length=256), server_default="", nullable=False),
            sa.Column("overall", sa.Float(), server_default="0.0", nullable=False),
            sa.Column("relevance", sa.Float(), server_default="0.0", nullable=False),
            sa.Column("novelty", sa.Float(), server_default="0.0", nullable=False),
            sa.Column("rigor", sa.Float(), server_default="0.0", nullable=False),
            sa.Column("impact", sa.Float(), server_default="0.0", nullable=False),
            sa.Column("clarity", sa.Float(), server_default="0.0", nullable=False),
            sa.Column("recommendation", sa.String(length=32), server_default="", nullable=False),
            sa.Column("one_line_summary", sa.Text(), server_default="", nullable=False),
            sa.Column("judge_model", sa.String(length=128), server_default="", nullable=False),
            sa.Column("judge_cost_tier", sa.Integer(), nullable=True),
            sa.Column("scored_at", sa.DateTime(timezone=True), nullable=False),
            sa.Column("metadata_json", sa.Text(), server_default="{}", nullable=False),
            sa.UniqueConstraint("paper_id", "query", name="uq_paper_judge_scores_paper_query"),
        )

    _create_index("ix_paper_judge_scores_paper_id", "paper_judge_scores", ["paper_id"])
    _create_index("ix_paper_judge_scores_query", "paper_judge_scores", ["query"])
    _create_index("ix_paper_judge_scores_recommendation", "paper_judge_scores", ["recommendation"])
    _create_index("ix_paper_judge_scores_scored_at", "paper_judge_scores", ["scored_at"])

    if _is_offline():
        op.add_column("paper_feedback", sa.Column("paper_ref_id", sa.Integer(), nullable=True))
        op.create_index("ix_paper_feedback_paper_ref_id", "paper_feedback", ["paper_ref_id"])
        return

    if "paper_ref_id" not in _get_columns("paper_feedback"):
        with op.batch_alter_table("paper_feedback") as batch_op:
            batch_op.add_column(sa.Column("paper_ref_id", sa.Integer(), nullable=True))

    _create_index("ix_paper_feedback_paper_ref_id", "paper_feedback", ["paper_ref_id"])


def downgrade() -> None:
    with op.batch_alter_table("paper_feedback") as batch_op:
        try:
            batch_op.drop_index("ix_paper_feedback_paper_ref_id")
        except Exception:
            pass
        try:
            batch_op.drop_column("paper_ref_id")
        except Exception:
            pass

    op.drop_table("paper_judge_scores")