ScholarMD - Google Scholar Knowledge Graph in Obsidian

A comprehensive conversion system that transforms Google Scholar author data into a richly interconnected Obsidian knowledge graph, creating a navigable web of academic research, researchers, and their relationships.

Overview

ScholarMD converts ~67,000 Google Scholar JSON profiles into beautifully formatted Obsidian markdown files with:

Scholar profiles with full metadata, publications, and metrics
Research interest pages automatically generated and linked
Co-author relationships with bidirectional wikilinks
Citation metrics including h-index and i10-index (temporal citation graphs are planned; see Not Yet Extracted)
Publication details with citation counts and links

Project Structure

scholarmd/
├── convert_scholar_to_markdown.py   # Main conversion script
├── test_convert.py                  # Fast single-author testing
├── rename_by_author.py              # Slug-based file renaming
├── scholars/                        # All 67k scholar profiles (by author_id)
├── scholars0/                       # Renamed profiles (by name)
├── scholar1/                        # Test output directory
├── research/                        # Research interest topic pages
├── research1/                       # Test research pages
├── docs/                           # Documentation
│   └── PHILOSOPHY.md               # Design philosophy and vision
└── README.md                       # This file

Source data:
../scholar/                         # 67k JSON files of Google Scholar profiles (retrieved via SerpAPI)

Quick Start

Test with a Single Author

Fast iteration for development:

python3 test_convert.py

This processes only Steven Randolph Ness (uU-cWxEAAAAJ) and outputs to scholar1/ and research1/.

Full Conversion

Process all 67,000 authors:

python3 convert_scholar_to_markdown.py

Output goes to scholars/ (files named by author_id) and research/.

Rename Files by Author Name

Convert author_id filenames to human-readable names:

python3 rename_by_author.py

Reads from scholars/, outputs to scholars0/ with filenames like steven_randolph_ness.md.

Features

Scholar Profiles

Each scholar gets a markdown file with:

Frontmatter:

Author ID
Name
Affiliations
Email
Website
Total articles count
Scholar URL (https://scholar.doi.bio/{author_id})
Tags (scholar, researcher)
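The frontmatter fields above could be rendered along these lines. This is a minimal sketch, not the code in convert_scholar_to_markdown.py: the function name `render_frontmatter`, the input dictionary, and the YAML key names (`author_id`, `total_articles`, `scholar_url`, and so on) are assumptions made for illustration.

```python
def render_frontmatter(profile: dict) -> str:
    """Render a YAML frontmatter block for one scholar (hypothetical key names)."""

    def quote(value) -> str:
        # Double-quote strings so colons or '#' in names and affiliations stay valid YAML.
        escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
        return f'"{escaped}"'

    author_id = profile["author_id"]
    lines = ["---", f"author_id: {quote(author_id)}"]
    for key in ("name", "affiliations", "email", "website"):
        if profile.get(key):  # skip fields that are missing or empty
            lines.append(f"{key}: {quote(profile[key])}")
    lines.append(f"total_articles: {int(profile.get('total_articles', 0))}")
    lines.append(f"scholar_url: {quote('https://scholar.doi.bio/' + author_id)}")
    lines.extend(["tags:", "  - scholar", "  - researcher", "---"])
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_frontmatter({
        "author_id": "uU-cWxEAAAAJ",
        "name": "Steven Randolph Ness",
        "total_articles": 3,  # made-up count for the demo
    }))
```

Quoting every string value is a conservative choice: it avoids a YAML parser dependency while keeping the block readable by Obsidian.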

Content Sections:

Profile: Contact information and links
Citation Metrics: Total citations, h-index, i10-index (all-time and since 2019)
Research Interests: Wikilinks to topic pages (e.g., [[research/machine_learning]])
Related Authors: Wikilinks to co-authors (e.g., [[scholars/steven_randolph_ness]])
Publications: Full publication list with citations and links

Research Interest Pages

Automatically generated topic pages for all research interests:

---
title: Machine Learning
tags:
  - research
  - interest
---

# Machine Learning

Research interest in machine learning.

## Researchers

This topic is studied by researchers in this database.

Currently ~45,000 unique research topics identified!
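A topic page like the one above could be produced by a small template function. The sketch below is illustrative only: `topic_slug` and `render_topic_page` are hypothetical names, and the slug rule (lowercase, runs of non-alphanumeric characters replaced by underscores) is an assumption inferred from the `research/machine_learning` example.

```python
import re


def topic_slug(topic: str) -> str:
    """Lowercase the topic and collapse anything that is not a-z or 0-9 into '_'."""
    return re.sub(r"[^a-z0-9]+", "_", topic.lower()).strip("_")


def render_topic_page(topic: str) -> tuple[str, str]:
    """Return (filename, markdown) for one research interest page."""
    title = topic.strip().title()
    content = "\n".join([
        "---",
        f"title: {title}",
        "tags:",
        "  - research",
        "  - interest",
        "---",
        "",
        f"# {title}",
        "",
        f"Research interest in {topic.strip().lower()}.",
        "",
        "## Researchers",
        "",
        "This topic is studied by researchers in this database.",
        "",
    ])
    return f"{topic_slug(topic)}.md", content


if __name__ == "__main__":
    filename, page = render_topic_page("machine learning")
    print(filename)
    print(page)
```

Under this slug rule, topics written entirely in non-ASCII characters would reduce to an empty slug, so a real implementation would need a fallback for them.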

Wikilink Strategy

All internal links use the Obsidian wikilink format with prefixes to avoid namespace collisions:

[[scholars/name]] - Links to scholar profiles
[[research/topic]] - Links to research interest pages

This allows seamless integration with other Obsidian vaults containing different types of linked information.
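As a small illustration of the prefixed link format, a single helper can build both kinds of link. The function name `wikilink` and the optional `display` argument are assumptions for this sketch; the `[[target|alias]]` form is standard Obsidian syntax, but the README does not say whether the conversion script uses it.

```python
from typing import Optional


def wikilink(prefix: str, slug: str, display: Optional[str] = None) -> str:
    """Build an Obsidian wikilink such as [[scholars/name]] or [[research/topic]]."""
    target = f"{prefix}/{slug}"
    # The alias form [[target|Display Text]] shows a readable label instead of the path.
    return f"[[{target}|{display}]]" if display else f"[[{target}]]"


if __name__ == "__main__":
    print(wikilink("scholars", "steven_randolph_ness"))
    print(wikilink("research", "machine_learning", display="Machine Learning"))
```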

Data Extraction

Currently Extracted

From Google Scholar JSON:

✅ Author name, affiliations, email, website
✅ Research interests
✅ Co-authors with author IDs
✅ Publications (title, authors, publication, year, citations)
✅ Citation metrics (total citations, h-index, i10-index)
✅ Profile thumbnail URL
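To make the list above concrete, here is a hedged sketch of how those fields might be pulled out of one parsed JSON file. The schema is an assumption: the keys used here (`author`, `interests`, `co_authors`, `articles`, `cited_by.table`) are modeled on the general shape of SerpAPI author results and should be checked against the actual files in ../scholar/ before relying on them. `extract_profile` is a hypothetical name.

```python
def extract_profile(data: dict) -> dict:
    """Collect the extracted fields from one parsed profile (assumed schema)."""
    author = data.get("author", {})

    # Assumed shape: cited_by.table is a list of single-key dicts,
    # e.g. {"h_index": {"all": 12, ...}}.
    metrics = {}
    for row in data.get("cited_by", {}).get("table", []):
        for metric, values in row.items():
            metrics[metric] = values.get("all")

    return {
        "name": author.get("name"),
        "affiliations": author.get("affiliations"),
        "email": author.get("email"),
        "website": author.get("website"),
        "thumbnail": author.get("thumbnail"),
        "interests": [i["title"] for i in author.get("interests", []) if i.get("title")],
        # Keep only co-authors that carry an author ID, since links need one.
        "co_authors": [
            {"name": c.get("name"), "author_id": c["author_id"]}
            for c in data.get("co_authors", [])
            if c.get("author_id")
        ],
        "publications": [
            {
                "title": a.get("title"),
                "authors": a.get("authors"),
                "publication": a.get("publication"),
                "year": a.get("year"),
                "citations": (a.get("cited_by") or {}).get("value"),
            }
            for a in data.get("articles", [])
        ],
        "metrics": metrics,
    }
```

Every lookup goes through `.get`, so a profile with missing sections yields `None` values or empty lists instead of raising a `KeyError`.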

Not Yet Extracted

⏳ Citation graph (yearly citation counts)
⏳ Public access publication counts
⏳ Individual co-author links in publications (planned)

Name Collision Handling

When two scholars' names produce the same filename, the later one gets its author_id (lowercased) appended:

steven_jones.md                     # First Steven Jones
steven_jones_8iw6kd4aaaaj.md       # Second Steven Jones (author_id appended)

This ensures:

All scholars are preserved (no overwrites)
Readable filenames when possible
Unique identifiers when needed
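The collision rule can be sketched as follows. This is an illustrative reconstruction based on the example filenames above, not the code in rename_by_author.py; `slugify` and `assign_slugs` are hypothetical names, and the ASCII-folding step is an assumption.

```python
import re
import unicodedata


def slugify(name: str) -> str:
    """Turn 'Steven Jones' into 'steven_jones' (accents folded to ASCII)."""
    ascii_name = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode("ascii")
    return re.sub(r"[^a-z0-9]+", "_", ascii_name.lower()).strip("_")


def assign_slugs(authors):
    """Map author_id -> unique slug for an iterable of (author_id, name) pairs.

    The first author to claim a slug keeps the plain name; later authors
    with the same slug get their lowercased author_id appended.
    """
    taken = set()
    mapping = {}
    for author_id, name in authors:
        slug = slugify(name) or author_id.lower()  # fall back to the ID for empty names
        if slug in taken:
            slug = f"{slug}_{author_id.lower()}"
        taken.add(slug)
        mapping[author_id] = slug
    return mapping


if __name__ == "__main__":
    # "firstSteveID" is a made-up ID for the demo.
    pairs = [("firstSteveID", "Steven Jones"), ("8iW6kD4AAAAJ", "Steven Jones")]
    for author_id, slug in assign_slugs(pairs).items():
        print(author_id, "->", slug + ".md")
```

Which scholar counts as "first" depends on iteration order, so processing the input in a sorted, stable order keeps filenames the same across runs.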

Development Workflow

1. Modify conversion script: Edit convert_scholar_to_markdown.py
2. Test quickly: Run python3 test_convert.py (processes 1 author in ~10 seconds)
3. Verify output: Check scholar1/steven_randolph_ness.md
4. Iterate: Repeat steps 1-3 until the output is correct
5. Full conversion: Run python3 convert_scholar_to_markdown.py when ready

This approach catches errors on a single profile before they can reach the full 67k-file conversion.

Performance

Single author test: ~10 seconds (builds mapping of all 67k for accurate co-author links)
Full conversion: ~5-10 minutes for all 67,000 authors

The conversion uses a 2-pass system:

Pass 1: Build author_id → slug mapping for all authors
Pass 2: Convert files with accurate co-author wikilinks
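The reason for two passes is that a co-author wikilink needs the target's slug, which is only known once every profile has been seen. A minimal sketch follows, under stated assumptions: each source file is named `<author_id>.json`, the name lives under `author.name`, and co-authors under `co_authors[].author_id`. The function names are hypothetical, and collision handling is omitted for brevity.

```python
import json
from pathlib import Path


def build_slug_map(source_dir: Path) -> dict:
    """Pass 1: read every profile once and record author_id -> slug."""
    mapping = {}
    for path in sorted(source_dir.glob("*.json")):
        data = json.loads(path.read_text(encoding="utf-8"))
        name = data.get("author", {}).get("name", "")
        # Simplified slug: lowercase words joined by underscores; ID if the name is empty.
        mapping[path.stem] = "_".join(name.lower().split()) or path.stem.lower()
    return mapping


def co_author_links(data: dict, slug_map: dict) -> list:
    """Pass 2 helper: link co-authors that exist in the dataset, else keep plain text."""
    links = []
    for co in data.get("co_authors", []):
        slug = slug_map.get(co.get("author_id"))
        links.append(f"[[scholars/{slug}]]" if slug else co.get("name", ""))
    return links


def convert_all(source_dir: Path) -> dict:
    """Run both passes and return author_id -> list of co-author links."""
    slug_map = build_slug_map(source_dir)  # pass 1
    result = {}
    for path in sorted(source_dir.glob("*.json")):  # pass 2
        data = json.loads(path.read_text(encoding="utf-8"))
        result[path.stem] = co_author_links(data, slug_map)
    return result
```

This also explains the single-author timing: even when only one profile is converted, pass 1 still has to scan all files to resolve that author's co-author links.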

Future Enhancements

See docs/PHILOSOPHY.md for the long-term vision. Planned features:

Link co-authors within publications
Extract citation graphs and visualize trends
Add public access indicators
Cross-reference publications between authors
Generate citation networks
Add research topic hierarchies
Integration with external knowledge bases

Data Source

The source data (../scholar/*.json) contains ~67,000 Google Scholar author profiles retrieved via the SerpAPI service.

Each JSON file includes:

Complete author profile
Full publication list (up to 1000 publications)
Co-author information
Citation metrics
Research interests

Requirements

Python 3.9+
Standard library only (no external dependencies!)

License

Data is sourced from Google Scholar. This tool is for personal knowledge management and research purposes.

Contributing

This is a personal knowledge base project. The conversion scripts are designed to be:

Idempotent: Can be run multiple times safely
Fast to test: Single-author testing for rapid iteration
Zero-error: Extensive validation to ensure perfect conversion
Well-documented: Clear philosophy and architecture

See docs/PHILOSOPHY.md for design principles.

Links

Scholar Profile Example: https://scholar.doi.bio/uU-cWxEAAAAJ
Obsidian: https://obsidian.md
Google Scholar: https://scholar.google.com

Generated as part of a comprehensive academic knowledge graph project.
