A comprehensive conversion system that transforms Google Scholar author data into a richly interconnected Obsidian knowledge graph, creating a navigable web of academic research, researchers, and their relationships.
ScholarMD converts ~67,000 Google Scholar JSON profiles into beautifully formatted Obsidian markdown files with:
- Scholar profiles with full metadata, publications, and metrics
- Research interest pages automatically generated and linked
- Co-author relationships with bidirectional wikilinks
- Citation metrics including h-index, i10-index, and temporal graphs
- Publication details with citation counts and links
scholarmd/
├── convert_scholar_to_markdown.py # Main conversion script
├── test_convert.py # Fast single-author testing
├── rename_by_author.py # Slug-based file renaming
├── scholars/ # All 67k scholar profiles (by author_id)
├── scholars0/ # Renamed profiles (by name)
├── scholar1/ # Test output directory
├── research/ # Research interest topic pages
├── research1/ # Test research pages
├── docs/ # Documentation
│ └── PHILOSOPHY.md # Design philosophy and vision
└── README.md # This file
Source data:
../scholar/ # 67k JSON files from Google Scholar API
Fast iteration for development:
python3 test_convert.py
This processes only Steven Randolph Ness (uU-cWxEAAAAJ) and outputs to scholar1/ and research1/.
Process all 67,000 authors:
python3 convert_scholar_to_markdown.py
Output goes to scholars/ (files named by author_id) and research/.
Convert author_id filenames to human-readable names:
python3 rename_by_author.py
Reads from scholars/, outputs to scholars0/ with filenames like steven_randolph_ness.md.
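A minimal sketch of how a display name could be turned into a filename slug like `steven_randolph_ness`. The `slugify` helper and its rules for accents and punctuation are assumptions for illustration; the actual logic in rename_by_author.py may differ.

```python
import re
import unicodedata


def slugify(name: str) -> str:
    """Turn a display name into a lowercase, underscore-separated slug (hypothetical helper)."""
    # Decompose accented characters, then drop anything that is not plain ASCII.
    ascii_name = (
        unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode("ascii")
    )
    # Collapse every run of non-alphanumeric characters into a single underscore.
    slug = re.sub(r"[^a-z0-9]+", "_", ascii_name.lower())
    # Remove underscores left over at either end.
    return slug.strip("_")


print(slugify("Steven Randolph Ness"))  # steven_randolph_ness
```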
Each scholar gets a markdown file with:
Frontmatter:
- Author ID
- Name
- Affiliations
- Website
- Total articles count
- Scholar URL (https://scholar.doi.bio/{author_id})
- Tags (scholar, researcher)
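The frontmatter fields above could be rendered roughly as follows. This is a sketch under assumptions: the YAML key names (`author_id`, `total_articles`, `scholar_url`, and so on) and the `render_frontmatter` helper are hypothetical, and the sample values are placeholders rather than real data.

```python
import json


def render_frontmatter(author_id, name, affiliations, website, total_articles):
    """Build a YAML frontmatter block for one scholar (key names are hypothetical)."""
    fields = {
        "author_id": author_id,
        "name": name,
        "affiliations": affiliations,
        "website": website,
        "total_articles": total_articles,
        "scholar_url": f"https://scholar.doi.bio/{author_id}",
    }
    lines = ["---"]
    for key, value in fields.items():
        # json.dumps double-quotes strings, so colons or quotes inside a
        # value cannot break the frontmatter block.
        lines.append(f"{key}: {json.dumps(value, ensure_ascii=False)}")
    lines += ["tags:", "- scholar", "- researcher", "---"]
    return "\n".join(lines)


# Placeholder values, for illustration only.
print(render_frontmatter("uU-cWxEAAAAJ", "Steven Randolph Ness",
                         "Example University", "https://example.org", 12))
```

Quoting every string is a conservative choice: affiliations often contain colons or commas that would otherwise need case-by-case escaping.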
Content Sections:
- Profile: Contact information and links
- Citation Metrics: Total citations, h-index, i10-index (all-time and since 2019)
- Research Interests: Wikilinks to topic pages (e.g., [[research/machine_learning]])
- Related Authors: Wikilinks to co-authors (e.g., [[scholars/steven_randolph_ness]])
- Publications: Full publication list with citations and links
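The h-index and i10-index in the Citation Metrics section are read from the source data. For reference, the all-time values follow from per-publication citation counts using the standard definitions, as in the sketch below. The function names and sample counts are made up for illustration. The "since 2019" variants cannot be derived this way, because they depend on when each citation was received.

```python
def h_index(citations):
    """Largest h such that h publications have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


def i10_index(citations):
    """Number of publications with at least 10 citations."""
    return sum(1 for count in citations if count >= 10)


counts = [25, 12, 10, 4, 3, 0]  # made-up citation counts
print(h_index(counts), i10_index(counts))  # 4 3
```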
Automatically generated topic pages for all research interests:
---
title: Machine Learning
tags:
- research
- interest
---
# Machine Learning
Research interest in machine learning.
## Researchers
This topic is studied by researchers in this database.
Currently ~45,000 unique research topics identified!
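A topic page like the one above could be produced from a small template function. This is a sketch: `render_topic_page` is a hypothetical name, and the real script may build the page differently.

```python
def render_topic_page(title: str) -> str:
    """Return the markdown for one research-interest page (hypothetical helper)."""
    lines = [
        "---",
        f"title: {title}",
        "tags:",
        "- research",
        "- interest",
        "---",
        f"# {title}",
        f"Research interest in {title.lower()}.",
        "## Researchers",
        "This topic is studied by researchers in this database.",
    ]
    return "\n".join(lines) + "\n"


print(render_topic_page("Machine Learning"))
```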
All internal links use the Obsidian wikilink format with prefixes to avoid namespace collisions:
- [[scholars/name]] - Links to scholar profiles
- [[research/topic]] - Links to research interest pages
This allows seamless integration with other Obsidian vaults containing different types of linked information.
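One way to keep the prefixes consistent is to route every link through a single helper. The sketch below assumes a hypothetical `wikilink` function. The optional `label` argument uses Obsidian's `[[target|label]]` alias syntax, which the generated files may or may not use.

```python
from typing import Optional

# The two namespaces used by this project's wikilinks.
ALLOWED_NAMESPACES = {"scholars", "research"}


def wikilink(namespace: str, slug: str, label: Optional[str] = None) -> str:
    """Format a prefixed Obsidian wikilink (hypothetical helper)."""
    if namespace not in ALLOWED_NAMESPACES:
        raise ValueError(f"unknown namespace: {namespace!r}")
    target = f"{namespace}/{slug}"
    # With a label, Obsidian displays the label instead of the raw target.
    return f"[[{target}|{label}]]" if label else f"[[{target}]]"


print(wikilink("research", "machine_learning"))
print(wikilink("scholars", "steven_randolph_ness", "Steven Randolph Ness"))
```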
From Google Scholar JSON:
- ✅ Author name, affiliations, email, website
- ✅ Research interests
- ✅ Co-authors with author IDs
- ✅ Publications (title, authors, publication, year, citations)
- ✅ Citation metrics (total citations, h-index, i10-index)
- ✅ Profile thumbnail URL
- ⏳ Citation graph (yearly citation counts)
- ⏳ Public access publication counts
- ⏳ Individual co-author links in publications (planned)
When two scholars share a name, the later file gets the lowercased author_id appended to its slug:
steven_jones.md # First Steven Jones
steven_jones_8iw6kd4aaaaj.md # Second Steven Jones (author_id appended)
This ensures:
- All scholars are preserved (no overwrites)
- Readable filenames when possible
- Unique identifiers when needed
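The collision rule can be sketched as follows, assuming slugs have already been computed and authors are processed in a fixed order. `assign_filenames` and the author IDs in the example are made up for illustration.

```python
def assign_filenames(authors):
    """Map author_id -> filename, appending the author_id on slug collisions.

    `authors` is an iterable of (author_id, name_slug) pairs in processing order.
    """
    taken = set()
    filenames = {}
    for author_id, slug in authors:
        candidate = slug
        if candidate in taken:
            # A scholar with this slug already exists: disambiguate with the ID.
            candidate = f"{slug}_{author_id.lower()}"
        taken.add(candidate)
        filenames[author_id] = f"{candidate}.md"
    return filenames


# Made-up author IDs, for illustration only.
print(assign_filenames([
    ("AbC123AAAAAJ", "steven_jones"),
    ("XyZ789AAAAAJ", "steven_jones"),
]))
```

Because the first scholar keeps the plain slug, the result depends on processing order. Sorting the input (for example by author_id) would keep filenames stable across runs.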
1. Modify conversion script: Edit convert_scholar_to_markdown.py
2. Test quickly: Run python3 test_convert.py (processes 1 author in ~10 seconds)
3. Verify output: Check scholar1/steven_randolph_ness.md
4. Iterate: Repeat steps 1-3 until perfect
5. Full conversion: Run python3 convert_scholar_to_markdown.py when ready
Iterating on a single author first catches formatting errors before the full 67k-file conversion.
- Single author test: ~10 seconds (builds mapping of all 67k for accurate co-author links)
- Full conversion: ~5-10 minutes for all 67,000 authors
The conversion uses a 2-pass system:
- Pass 1: Build author_id → slug mapping for all authors
- Pass 2: Convert files with accurate co-author wikilinks
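The two passes can be sketched like this. The simplified profile dictionaries (`author_id`, `name`, `co_author_ids`), the stand-in slug rule, and the choice to skip co-authors missing from the mapping are all assumptions for illustration, not the script's confirmed behavior.

```python
def build_slug_map(profiles):
    """Pass 1: map every author_id to a slug (simple stand-in slug rule)."""
    return {p["author_id"]: p["name"].lower().replace(" ", "_") for p in profiles}


def coauthor_links(profile, slug_map):
    """Pass 2: emit wikilinks only for co-authors present in the mapping."""
    links = []
    for co_id in profile["co_author_ids"]:
        slug = slug_map.get(co_id)
        if slug is not None:  # assumption: unknown co-authors are skipped
            links.append(f"[[scholars/{slug}]]")
    return links


# Made-up records, apart from the author_id/name pair mentioned in this README.
profiles = [
    {"author_id": "uU-cWxEAAAAJ", "name": "Steven Randolph Ness",
     "co_author_ids": ["EXAMPLE1AAAJ", "MISSING1AAAJ"]},
    {"author_id": "EXAMPLE1AAAJ", "name": "Example Coauthor",
     "co_author_ids": ["uU-cWxEAAAAJ"]},
]
slug_map = build_slug_map(profiles)
for profile in profiles:
    print(profile["name"], coauthor_links(profile, slug_map))
```

Building the full mapping first is what lets a single-author test run still produce correct co-author links, at the cost of reading every profile once.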
See docs/PHILOSOPHY.md for the long-term vision. Planned features:
- Link co-authors within publications
- Extract citation graphs and visualize trends
- Add public access indicators
- Cross-reference publications between authors
- Generate citation networks
- Add research topic hierarchies
- Integration with external knowledge bases
The source data (../scholar/*.json) contains ~67,000 Google Scholar author profiles retrieved via the SerpAPI service.
Each JSON file includes:
- Complete author profile
- Full publication list (up to 1000 publications)
- Co-author information
- Citation metrics
- Research interests
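A minimal sketch of reading one source file and pulling out the parts listed above. The JSON key names (`author`, `interests`, `title`, `co_authors`, `author_id`, `articles`) are assumptions about the response layout and should be checked against an actual file in ../scholar/. `summarize_profile` and `load_profile` are hypothetical helpers.

```python
import json
from pathlib import Path


def summarize_profile(data: dict) -> dict:
    """Extract a few fields from one parsed profile (key names are assumed)."""
    author = data.get("author", {})
    return {
        "name": author.get("name", ""),
        "interests": [i.get("title", "") for i in author.get("interests", [])],
        "co_author_ids": [c.get("author_id", "") for c in data.get("co_authors", [])],
        "article_count": len(data.get("articles", [])),
    }


def load_profile(path: Path) -> dict:
    """Read one JSON file from disk and summarize it."""
    with path.open(encoding="utf-8") as handle:
        return summarize_profile(json.load(handle))


# Inline sample with placeholder values instead of a real file.
sample = {
    "author": {"name": "Steven Randolph Ness",
               "interests": [{"title": "Machine Learning"}]},
    "co_authors": [{"author_id": "EXAMPLE1AAAJ"}],
    "articles": [{"title": "Placeholder title"}],
}
print(summarize_profile(sample))
```

Using `.get` with defaults keeps the sketch tolerant of profiles that lack interests, co-authors, or publications.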
- Python 3.9+
- Standard library only (no external dependencies!)
Data is sourced from Google Scholar. This tool is for personal knowledge management and research purposes.
This is a personal knowledge base project. The conversion scripts are designed to be:
- Idempotent: Can be run multiple times safely
- Fast to test: Single-author testing for rapid iteration
- Zero-error: Extensive validation to ensure perfect conversion
- Well-documented: Clear philosophy and architecture
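One way to get the idempotence described above is to write a file only when its content would change, so a rerun leaves existing output untouched. This is a sketch of that idea, not a description of what the scripts actually do; `write_if_changed` is a hypothetical helper.

```python
from pathlib import Path


def write_if_changed(path: Path, content: str) -> bool:
    """Write `content` to `path` unless the file already holds exactly that text.

    Returns True when the file was written, False when it was left alone.
    """
    if path.exists() and path.read_text(encoding="utf-8") == content:
        return False
    # Create the output directory on first use.
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(content, encoding="utf-8")
    return True
```

Skipping unchanged files also preserves modification times, which keeps sync tools from re-uploading the whole vault after each run.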
See docs/PHILOSOPHY.md for design principles.
- Scholar Profile Example: https://scholar.doi.bio/uU-cWxEAAAAJ
- Obsidian: https://obsidian.md
- Google Scholar: https://scholar.google.com
Generated as part of a comprehensive academic knowledge graph project.