Skip to content

Claude Code skill that searches the last 30 days of scientific literature across OpenAlex, Semantic Scholar, PubMed, arXiv, and HuggingFace Hub

License

Notifications You must be signed in to change notification settings

shandley/research30

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

18 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

research30

A skill for Claude Code (Anthropic's CLI coding agent) that searches the last 30 days of scientific literature and synthesizes findings. No API keys required.

/research30 CRISPR gene editing
/research30 microplastics health effects
/research30 large language model alignment

Claude searches 5 academic databases in parallel, scores and deduplicates results, then synthesizes key findings, trends, and gaps from the top 25 papers.

Quick Start

Prerequisites: Claude Code and Python 3.8+ (no pip packages needed — stdlib only)

1. Clone the repo:

git clone https://github.com/shandley/research30.git
cd research30

2. Install the skill (run from the repo root):

mkdir -p ~/.claude/skills
ln -s "$(pwd)/research30" ~/.claude/skills/research30

This creates a symlink so the skill stays in sync with the repo. The $(pwd) must expand to the repo root (the directory containing research30/, README.md, etc.).

3. Use it:

Start a new Claude Code session (skills are loaded at startup), then:

/research30 your topic here

This works from any directory — the skill is globally installed.

Standalone (no Claude Code): You can also run the script directly from the repo root:

python3 research30/scripts/research30.py "your topic here"

This outputs the raw ranked list. Claude Code adds the synthesis layer on top.

What It Searches

Source What Coverage
OpenAlex Journals, preprints, conference papers 250M+ works, topic-augmented full-text search
Semantic Scholar Embedding-based semantic search Conceptual matching beyond keywords (needs free API key)
PubMed Peer-reviewed journal articles TIAB field-tagged queries with MeSH metadata
arXiv Physics, math, CS, q-bio preprints Atom API keyword search
HuggingFace ML models, datasets, daily papers Hub API

Additional sources available via --sources=biorxiv or --sources=medrxiv (slower, direct API pagination).

Usage

python3 research30/scripts/research30.py <topic> [options]
Flag Description
--quick Faster: fewer API results, shows top 10
--deep Thorough: more API results, shows top 50
--refresh Bypass 24h cache, fetch fresh results
--sources=MODE all (default), preprints, pubmed, huggingface, openalex, semanticscholar, biorxiv, medrxiv, arxiv
--emit=MODE compact (default), html, json, md, context, path
--debug Verbose HTTP logs to stderr
--mock Use bundled fixtures (for testing)

Examples

# Default: top 25 results from all sources
python3 research30/scripts/research30.py "CRISPR gene editing"

# Quick scan — top 10
python3 research30/scripts/research30.py "protein folding" --quick

# Deep dive — top 50
python3 research30/scripts/research30.py "single-cell transcriptomics" --deep

# PubMed only
python3 research30/scripts/research30.py "Alzheimer's disease biomarkers" --sources=pubmed

# HTML report (opens in browser)
python3 research30/scripts/research30.py "protein folding" --emit=html

# JSON output for programmatic use
python3 research30/scripts/research30.py "protein folding" --emit=json

Optional: API Keys

Both keys are optional. The skill works without them.

Semantic Scholar (recommended — adds semantic/embedding search):

# Get a free key at https://www.semanticscholar.org/product/api#api-key-form
mkdir -p ~/.config/research30
echo 'S2_API_KEY=your_key_here' >> ~/.config/research30/.env
chmod 600 ~/.config/research30/.env

NCBI (faster PubMed — 10 req/s instead of 3):

# Get a free key at https://www.ncbi.nlm.nih.gov/account/settings/
echo 'NCBI_API_KEY=your_key_here' >> ~/.config/research30/.env

Output

See an example HTML report (download and open in your browser).

Default output is a flat ranked list sorted by score (0-100). Each item includes score, title, source, date, URL, metadata, abstract snippet (first 200 chars), and relevance explanation. When used as a Claude Code skill, Claude reads these and synthesizes key findings, research fronts, methods, and gaps.

Script outputs (written on every run to ~/.local/share/research30/out/):

  • report.html — interactive HTML with score badges, source tags, collapsible abstracts
  • report.json — all results (not just top N)
  • report.md — formatted markdown
  • context.md — condensed context snippet

Skill outputs (created by Claude when invoked via /research30):

  • ~/.local/share/research30/reports/{topic}-{date}.md — full synthesis report
  • ~/.local/share/research30/research-log.md — cumulative research journal

Results are cached for 24 hours at ~/.cache/research30/. Use --refresh to bypass.

How Scoring Works

Each result is scored 0-100:

Papers (50% relevance + 25% recency + 25% academic signal):

  • Keyword relevance: title match 2x weighted, bigram matching for multi-word queries, position boost for OpenAlex/S2 results
  • Recency: newer = higher
  • Academic: peer review, citations, journal, author count

HuggingFace (45% relevance + 25% recency + 30% engagement):

  • Downloads and likes as engagement signal

Cross-source deduplication uses DOI matching + Jaccard title similarity. Priority: PubMed > S2 > OpenAlex > bioRxiv > medRxiv > arXiv > HuggingFace.

How Search Works

OpenAlex discovers relevant topic clusters first (e.g., "virome" maps to topic T11048), then searches within them — combining ML-classified topics with keyword ranking.

PubMed uses explicit [TIAB] field tags instead of Automatic Term Mapping (which can misfire). Multi-word queries get both exact-phrase and AND-combination: ("CRISPR gene editing"[TIAB] OR (CRISPR[TIAB] AND gene[TIAB] AND editing[TIAB])). MeSH terms are extracted as metadata.

Semantic Scholar provides embedding-based search — catches papers that use different terminology for the same concept. Post-filtered with a higher relevance threshold (0.3 vs 0.1).

Development

Running Tests

cd research30
python3 -m pytest tests/ -v

All tests use bundled fixtures and make no network requests.

Project Structure

research30/
  SKILL.md              -- Claude Code skill manifest
  scripts/
    research30.py       -- Main orchestrator
    lib/
      openalex.py       -- OpenAlex API (topic discovery + works search)
      semanticscholar.py -- Semantic Scholar API (semantic search)
      pubmed.py         -- PubMed E-utilities (TIAB queries, MeSH)
      arxiv.py          -- arXiv Atom API
      huggingface.py    -- HuggingFace Hub API
      biorxiv.py        -- bioRxiv/medRxiv API (parallel pagination)
      schema.py         -- Data models
      normalize.py      -- Schema conversion + keyword relevance
      score.py          -- Academic-signal scoring
      dedupe.py         -- Cross-source deduplication
      render.py         -- Output formatting
      cache.py          -- 24-hour result caching
      http.py           -- HTTP client (stdlib only)
      xml_parse.py      -- XML parsers (arXiv Atom, PubMed)
      dates.py          -- Date utilities
      env.py            -- Configuration loading
      ui.py             -- Terminal progress display
  tests/                -- Unit tests (103 tests)
  fixtures/             -- Mock API responses

License

MIT

About

Claude Code skill that searches the last 30 days of scientific literature across OpenAlex, Semantic Scholar, PubMed, arXiv, and HuggingFace Hub

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Languages