feat(cli): citation graph expansion via papi expand #48

@hummat

Problem or Motivation

When extending a related work section, finding relevant papers is manual and ad-hoc. A common strategy is citation graph traversal: for each paper you already cite, find influential papers it references and papers that cite it. This is tedious to do manually across many seed papers.

Tools like ResearchRabbit and Connected Papers do this via web UI, but there's no CLI option for:

  • Scripted/reproducible literature expansion
  • Integration with existing paper management workflows
  • Programmatic filtering (by year, venue, relevance)

Proposed Solution

Add a papi expand command that traverses the citation graph via the Semantic Scholar or OpenAlex APIs:

```bash
# Expand from papers in the database
papi expand [papers...] [--top-k=5] [--influential] [--since=2020] [--output=bibtex|json|table]

# Examples
papi expand if-net pointnet          # expand from specific papers
papi expand --all --top-k=3          # expand from entire database
papi expand nerf --influential       # only "highly influential" citations (S2 flag)
papi expand --since=2022 --output=bibtex >> candidates.bib
```
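To make the proposed flags concrete, here is a minimal sketch of how they could be parsed. It uses argparse from the standard library purely for illustration; paperpipe's actual CLI framework may differ, and `build_parser`, the `table` default for `--output`, and the help texts are assumptions, not part of the proposal.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags proposed for `papi expand`."""
    parser = argparse.ArgumentParser(prog="papi expand")
    parser.add_argument("papers", nargs="*", help="seed paper names from the database")
    parser.add_argument("--all", action="store_true", help="use every paper in the database as a seed")
    parser.add_argument("--top-k", type=int, default=5, help="number of top-ranked candidates to keep")
    parser.add_argument("--influential", action="store_true",
                        help="keep only citations flagged isInfluential by Semantic Scholar")
    parser.add_argument("--since", type=int, default=None, help="earliest publication year to keep")
    # Assumption: the proposal lists the choices but names no default.
    parser.add_argument("--output", choices=["bibtex", "json", "table"], default="table")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)
```

With this sketch, `papi expand nerf --influential --since=2022` yields `papers == ["nerf"]`, `influential == True`, and `since == 2022`.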

Core workflow:

  1. Resolve paper names → Semantic Scholar IDs (via DOI, arXiv ID, or title search)
  2. Fetch references (backward) and citations (forward) for each seed
  3. Filter by isInfluential flag, citation count, year
  4. Aggregate across seeds, deduplicate, rank by frequency + citation count
  5. Output with title, year, abstract snippet, citation count for triage
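
Steps 3 and 4 above (filter, aggregate, deduplicate, rank) could be sketched roughly as follows. The `Candidate` type and `rank_candidates` function are hypothetical names, and two choices here are assumptions rather than part of the proposal: `top_k` is applied to the merged list instead of per seed, and "frequency" is taken to mean the number of distinct seeds a candidate is connected to.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Candidate:
    """One paper reached from a seed via a reference or citation edge."""
    paper_id: str
    title: str
    year: Optional[int]
    citation_count: int
    is_influential: bool  # value of the S2 isInfluential flag on the edge


def rank_candidates(per_seed, since=None, influential_only=False, top_k=5):
    """Merge candidates from all seeds and rank them.

    per_seed maps a seed name to the list of Candidate objects found for it.
    Returns a list of (Candidate, seed_count) tuples, best first.
    """
    seeds_by_paper = defaultdict(set)
    papers = {}
    for seed, candidates in per_seed.items():
        for cand in candidates:
            if influential_only and not cand.is_influential:
                continue
            # Papers with an unknown year are dropped when a year filter is set.
            if since is not None and (cand.year is None or cand.year < since):
                continue
            seeds_by_paper[cand.paper_id].add(seed)  # deduplicate by paper ID
            papers[cand.paper_id] = cand
    ranked = sorted(
        papers.values(),
        # More shared seeds first, then higher citation count, then title.
        key=lambda c: (-len(seeds_by_paper[c.paper_id]), -c.citation_count, c.title),
    )
    return [(c, len(seeds_by_paper[c.paper_id])) for c in ranked[:top_k]]
```

A candidate that shows up for several seeds ranks above one that is highly cited but connected to a single seed, which matches the "frequency + citation count" ordering described in step 4.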

Research: Available Tools & APIs

APIs with Programmatic Access

| Service | Rate Limits | Key Features | Cost |
|---|---|---|---|
| Semantic Scholar | 1 RPS (higher with free key) | isInfluential flag, 225M papers, SPECTER embeddings | Free |
| OpenAlex | 100k/day | referenced_works, cited_by_api_url, related_works, venue filtering | Free |
| AI2 Asta MCP | Higher with key | Full-text search, get_citations, S2 wrapper | Free |
| Connected Papers | 5 builds/min | Similarity graph (not citation), visual clustering | Email for key |

Python Libraries

| Library | Backend | Install |
|---|---|---|
| semanticscholar | Semantic Scholar | pip install semanticscholar |
| pyalex | OpenAlex | pip install pyalex |
| connectedpapers-py | Connected Papers | pip install connectedpapers-py |

Web-Only Tools (No Public API)

| Tool | Notes |
|---|---|
| ResearchRabbit | Best UX, no API, BibTeX export only |
| Inciteful | PageRank-based discovery, Literature Connector for bridging domains |
| Litmaps | Semantic + citation hybrid, ChatGPT plugin exists |
| Citation Gecko | Open source, Zotero integration, uses OpenCitations |

MCP Servers

| Server | Endpoint | Notable |
|---|---|---|
| AI2 Asta | https://asta-tools.allen.ai/mcp/v1 | Official S2, full-text snippet search |
| S2 Graph API MCP | Self-hosted | 10+ tools, recommendations from examples |

Implementation Recommendation

Primary backend: Semantic Scholar

  • The isInfluential flag is the killer feature: it filters out superficial citations
  • Python library (semanticscholar) handles pagination, async, typing
  • Free, well-documented, 225M+ papers
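
As a rough illustration of the Semantic Scholar route, the sketch below calls the Graph API's `/paper/{id}/citations` and `/paper/{id}/references` endpoints over plain HTTP rather than through the semanticscholar library, so that every call is visible. The helper names (`s2_edges_url`, `parse_edges`, `fetch_edges`) and the chosen field list are assumptions for illustration only.

```python
import json
import urllib.parse
import urllib.request

S2_GRAPH = "https://api.semanticscholar.org/graph/v1"


def s2_edges_url(paper_id, direction="citations", limit=100, offset=0):
    """Build the URL for one page of citation or reference edges."""
    if direction not in ("citations", "references"):
        raise ValueError("direction must be 'citations' or 'references'")
    query = urllib.parse.urlencode({
        "fields": "isInfluential,title,year,citationCount,externalIds",
        "limit": limit,
        "offset": offset,
    })
    # IDs may carry a prefix such as ARXIV: or DOI:, so keep ':' and '/' unescaped.
    quoted = urllib.parse.quote(paper_id, safe=":/")
    return f"{S2_GRAPH}/paper/{quoted}/{direction}?{query}"


def parse_edges(payload, direction="citations", influential_only=False):
    """Flatten an API response into a list of paper dicts."""
    # Forward edges nest the other paper under citingPaper, backward under citedPaper.
    key = "citingPaper" if direction == "citations" else "citedPaper"
    papers = []
    for edge in payload.get("data", []):
        if influential_only and not edge.get("isInfluential"):
            continue
        paper = edge.get(key) or {}
        if paper.get("paperId"):  # skip entries the API could not resolve
            papers.append({**paper, "isInfluential": bool(edge.get("isInfluential"))})
    return papers


def fetch_edges(paper_id, direction="citations", api_key=None):
    """Fetch one page of edges; pagination and retry on HTTP 429 are omitted."""
    request = urllib.request.Request(s2_edges_url(paper_id, direction))
    if api_key:
        request.add_header("x-api-key", api_key)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

For example, `parse_edges(fetch_edges("ARXIV:2003.08934", "references"), "references", influential_only=True)` would list the references of the NeRF paper that carry the isInfluential flag. A real implementation would also need to page through results and respect the rate limit.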

OpenAlex as fallback/complement

  • Far more generous rate limit (100k requests/day vs 1 RPS)
  • Better venue/institution filtering
  • related_works uses algorithmic similarity, not just citations
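
A comparable sketch for OpenAlex: forward citations come from the `cites:` filter on the `/works` endpoint, and backward references are read from the `referenced_works` list on a work object. The helper names are hypothetical, sorting by `cited_by_count` is one possible choice, and pyalex could replace the hand-built URL.

```python
import urllib.parse
from typing import Optional

OPENALEX_WORKS = "https://api.openalex.org/works"


def openalex_citing_url(work_id: str, since: Optional[int] = None,
                        per_page: int = 50, mailto: Optional[str] = None) -> str:
    """Build a /works query for papers citing work_id (an OpenAlex ID such as W2741809807)."""
    filters = [f"cites:{work_id}"]
    if since is not None:
        # publication_year:>N is exclusive, so subtract one to include `since` itself.
        filters.append(f"publication_year:>{since - 1}")
    params = {
        "filter": ",".join(filters),
        "sort": "cited_by_count:desc",
        "per-page": per_page,
    }
    if mailto:
        params["mailto"] = mailto  # identifies the caller for OpenAlex's polite pool
    return f"{OPENALEX_WORKS}?{urllib.parse.urlencode(params)}"


def referenced_ids(work: dict) -> list:
    """Extract short OpenAlex IDs from a work's referenced_works URLs."""
    return [url.rsplit("/", 1)[-1] for url in work.get("referenced_works", [])]
```

The same `filter` parameter accepts further comma-separated conditions, which is where the venue and institution filtering mentioned above would plug in.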

Key differentiator from web tools

  • Scriptable + reproducible (version-controlled expansion)
  • Filters web tools lack: by venue, year range, influential-only
  • Direct output to paperpipe DB or BibTeX

Alternatives Considered

  • Web tools only (ResearchRabbit, Litmaps): Good for exploration but not scriptable or integrated with paperpipe workflow
  • Standalone CLI tool: Would duplicate paper resolution logic already in paperpipe
  • MCP-only approach: AI2 Asta MCP already wraps S2, but a CLI tool is more universally accessible
  • Manual search: The current approach; it works but doesn't scale

Area

CLI commands

Contribution

  • I would be willing to submit a PR for this feature
