sness23/scholarmd_vault

ScholarMD - Google Scholar Knowledge Graph in Obsidian

A comprehensive conversion system that transforms Google Scholar author data into a richly interconnected Obsidian knowledge graph, creating a navigable web of academic research, researchers, and their relationships.

Overview

ScholarMD converts ~67,000 Google Scholar JSON profiles into beautifully formatted Obsidian markdown files with:

  • Scholar profiles with full metadata, publications, and metrics
  • Research interest pages automatically generated and linked
  • Co-author relationships with bidirectional wikilinks
  • Citation metrics including h-index and i10-index (yearly citation graphs are not yet extracted)
  • Publication details with citation counts and links

Project Structure

scholarmd/
├── convert_scholar_to_markdown.py   # Main conversion script
├── test_convert.py                  # Fast single-author testing
├── rename_by_author.py              # Slug-based file renaming
├── scholars/                        # All 67k scholar profiles (by author_id)
├── scholars0/                       # Renamed profiles (by name)
├── scholar1/                        # Test output directory
├── research/                        # Research interest topic pages
├── research1/                       # Test research pages
├── docs/                           # Documentation
│   └── PHILOSOPHY.md               # Design philosophy and vision
└── README.md                       # This file

Source data:
../scholar/                         # 67k JSON files from Google Scholar (retrieved via SerpAPI)

Quick Start

Test with a Single Author

Fast iteration for development:

python3 test_convert.py

This processes only Steven Randolph Ness (uU-cWxEAAAAJ) and outputs to scholar1/ and research1/.

Full Conversion

Process all 67,000 authors:

python3 convert_scholar_to_markdown.py

Output goes to scholars/ (files named by author_id) and research/.

Rename Files by Author Name

Convert author_id filenames to human-readable names:

python3 rename_by_author.py

Reads from scholars/, outputs to scholars0/ with filenames like steven_randolph_ness.md.
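
The slug rule itself is not spelled out here, so the following is only a minimal sketch of how a display name could become a filename such as steven_randolph_ness.md. The function name `slugify` and the choice to strip accents are assumptions for illustration, not necessarily what rename_by_author.py does.

```python
import re
import unicodedata


def slugify(name: str) -> str:
    """Turn a display name into a lowercase, underscore-separated slug."""
    # Decompose accented characters and drop the combining marks (é -> e).
    ascii_name = (
        unicodedata.normalize("NFKD", name)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Collapse every run of non-alphanumeric characters into one underscore.
    slug = re.sub(r"[^a-z0-9]+", "_", ascii_name.lower())
    # Remove underscores left over at either end.
    return slug.strip("_")


print(slugify("Steven Randolph Ness") + ".md")  # steven_randolph_ness.md
```

Names written entirely in non-Latin scripts would produce an empty slug under this rule, so a real script would need a fallback such as the author_id.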

Features

Scholar Profiles

Each scholar gets a markdown file with:

Frontmatter:

Content Sections:

  • Profile: Contact information and links
  • Citation Metrics: Total citations, h-index, i10-index (all-time and since 2019)
  • Research Interests: Wikilinks to topic pages (e.g., [[research/machine_learning]])
  • Related Authors: Wikilinks to co-authors (e.g., [[scholars/steven_randolph_ness]])
  • Publications: Full publication list with citations and links

Research Interest Pages

Automatically generated topic pages for all research interests:

---
title: Machine Learning
tags:
  - research
  - interest
---

# Machine Learning

Research interest in machine learning.

## Researchers

This topic is studied by researchers in this database.

About 45,000 unique research topics are currently identified.
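
As a rough illustration, a topic page like the one above could be rendered from an interest string as follows. The function name `render_topic_page` is hypothetical, and the title-casing rule is an assumption inferred from the sample page.

```python
def render_topic_page(interest: str) -> str:
    """Render the markdown for one research interest page."""
    topic = interest.strip()
    title = topic.title()  # "machine learning" -> "Machine Learning"
    lines = [
        "---",
        f"title: {title}",
        "tags:",
        "  - research",
        "  - interest",
        "---",
        "",
        f"# {title}",
        "",
        f"Research interest in {topic.lower()}.",
        "",
        "## Researchers",
        "",
        "This topic is studied by researchers in this database.",
        "",
    ]
    return "\n".join(lines)


print(render_topic_page("machine learning"))
```

A title containing a colon or other YAML-special characters would need quoting in the frontmatter; this sketch does not handle that case.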

Wikilink Strategy

All internal links use the Obsidian wikilink format with prefixes to avoid namespace collisions:

  • [[scholars/name]] - Links to scholar profiles
  • [[research/topic]] - Links to research interest pages

This allows seamless integration with other Obsidian vaults containing different types of linked information.
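
A small helper can keep the prefixes consistent. This is a sketch only: the name `wikilink` and the optional `label` parameter are assumptions, and the `|label` form relies on Obsidian's standard display-text syntax for internal links.

```python
from typing import Optional


def wikilink(folder: str, slug: str, label: Optional[str] = None) -> str:
    """Build a folder-prefixed Obsidian wikilink, optionally with display text."""
    target = f"{folder}/{slug}"
    if label:
        # Obsidian shows the text after the pipe instead of the raw path.
        return f"[[{target}|{label}]]"
    return f"[[{target}]]"


print(wikilink("scholars", "steven_randolph_ness"))
print(wikilink("research", "machine_learning", "Machine Learning"))
```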

Data Extraction

Currently Extracted

From Google Scholar JSON:

  • ✅ Author name, affiliations, email, website
  • ✅ Research interests
  • ✅ Co-authors with author IDs
  • ✅ Publications (title, authors, publication, year, citations)
  • ✅ Citation metrics (total citations, h-index, i10-index)
  • ✅ Profile thumbnail URL
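
The fields above could be pulled from one parsed JSON record roughly as follows. This is a hedged sketch: the key names (`author`, `interests`, `co_authors`, `articles`, `cited_by.table`) are assumptions about the JSON layout and should be checked against the actual files, and `extract_profile` is a hypothetical name.

```python
def extract_profile(raw: dict) -> dict:
    """Pull the fields listed above out of one parsed author JSON record."""
    author = raw.get("author") or {}

    # Assumed layout: "cited_by.table" is a list of one-key dicts,
    # e.g. [{"citations": {"all": 10}}, {"h_index": {"all": 2}}].
    metrics = {}
    for row in (raw.get("cited_by") or {}).get("table", []):
        for metric_name, values in row.items():
            metrics[metric_name] = values.get("all")

    publications = [
        {
            "title": art.get("title"),
            "authors": art.get("authors"),
            "publication": art.get("publication"),
            "year": art.get("year"),
            "citations": (art.get("cited_by") or {}).get("value"),
        }
        for art in raw.get("articles", [])
    ]

    return {
        "name": author.get("name"),
        "affiliations": author.get("affiliations"),
        "email": author.get("email"),
        "website": author.get("website"),
        "thumbnail": author.get("thumbnail"),
        "interests": [i.get("title") for i in author.get("interests", [])],
        "co_authors": [
            (c.get("author_id"), c.get("name")) for c in raw.get("co_authors", [])
        ],
        "publications": publications,
        "metrics": metrics,
    }


# Made-up record for illustration only; not real Scholar data.
sample = {
    "author": {"name": "Example Author", "interests": [{"title": "Machine Learning"}]},
    "co_authors": [{"author_id": "EXAMPLEID001", "name": "Example Coauthor"}],
    "articles": [{"title": "An Example Paper", "year": "2020", "cited_by": {"value": 3}}],
    "cited_by": {"table": [{"citations": {"all": 3}}, {"h_index": {"all": 1}}]},
}
profile = extract_profile(sample)
print(profile["interests"], profile["metrics"])
```

In practice `raw` would come from `json.load` on one file under ../scholar/. Missing keys yield `None` or empty lists rather than raising, which suits profiles that omit an email or website.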

Not Yet Extracted

  • ⏳ Citation graph (yearly citation counts)
  • ⏳ Public access publication counts
  • ⏳ Individual co-author links in publications (planned)

Name Collision Handling

When two scholars share the same name slug, the first keeps the plain filename and later ones get their author_id appended:

steven_jones.md                     # First Steven Jones
steven_jones_8iw6kd4aaaaj.md       # Second Steven Jones (author_id appended)

This ensures:

  • All scholars are preserved (no overwrites)
  • Readable filenames when possible
  • Unique identifiers when needed
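
One way to get this behavior is to track which filename stems are already taken. The sketch below assumes the input is a sequence of (author_id, slug) pairs and that the appended author_id is lowercased, as the example filename suggests; `assign_filenames` and the IDs in the usage lines are hypothetical.

```python
def assign_filenames(authors):
    """Map each author_id to a unique markdown filename.

    `authors` is an iterable of (author_id, slug) pairs. The first author
    with a given slug keeps it; later ones get their author_id appended.
    """
    taken = set()
    filenames = {}
    for author_id, slug in authors:
        stem = slug
        if stem in taken:
            # Collision: fall back to slug plus the (unique) author_id.
            stem = f"{slug}_{author_id.lower()}"
        taken.add(stem)
        filenames[author_id] = f"{stem}.md"
    return filenames


# Placeholder IDs for illustration only.
print(assign_filenames([("AAAA1111", "steven_jones"), ("BBBB2222", "steven_jones")]))
```

Which scholar counts as "first" depends on iteration order, so processing the input in a fixed order (for example, sorted by author_id) keeps filenames stable across runs.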

Development Workflow

  1. Modify conversion script: Edit convert_scholar_to_markdown.py
  2. Test quickly: Run python3 test_convert.py (processes 1 author in ~10 seconds)
  3. Verify output: Check scholar1/steven_randolph_ness.md
  4. Iterate: Repeat steps 1-3 until perfect
  5. Full conversion: Run python3 convert_scholar_to_markdown.py when ready

This approach catches errors on a single profile before they can reach the full 67k-file conversion.

Performance

  • Single author test: ~10 seconds (builds mapping of all 67k for accurate co-author links)
  • Full conversion: ~5-10 minutes for all 67,000 authors

The conversion uses a 2-pass system:

  1. Pass 1: Build author_id → slug mapping for all authors
  2. Pass 2: Convert files with accurate co-author wikilinks
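
The two passes can be sketched in memory as follows. All names here are hypothetical, the simplified `slugify` ignores collisions, and skipping co-authors that are absent from the mapping is an assumption rather than documented behavior.

```python
import re


def slugify(name):
    """Simplified slug rule used only for this sketch."""
    return re.sub(r"[^a-z0-9]+", "_", name.lower()).strip("_")


def build_slug_map(profiles):
    """Pass 1: map every author_id to its slug."""
    return {author_id: slugify(p["name"]) for author_id, p in profiles.items()}


def related_author_links(profile, slug_map):
    """Pass 2 helper: wikilinks for co-authors that exist in the mapping."""
    links = []
    for co_id in profile["co_author_ids"]:
        slug = slug_map.get(co_id)
        if slug is not None:  # assumption: skip co-authors not in the dataset
            links.append(f"[[scholars/{slug}]]")
    return links


def convert_all(profiles):
    slug_map = build_slug_map(profiles)  # pass 1 over all authors
    return {  # pass 2: render links using the complete mapping
        slug_map[author_id]: related_author_links(p, slug_map)
        for author_id, p in profiles.items()
    }


# Placeholder profiles for illustration only.
profiles = {
    "A1": {"name": "Ada Example", "co_author_ids": ["B2", "ZZ"]},
    "B2": {"name": "Bo Sample", "co_author_ids": ["A1"]},
}
print(convert_all(profiles))
```

Because pass 2 needs the complete mapping, even a single-author test has to run pass 1 over every profile, which is consistent with the ~10 second test time noted above.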

Future Enhancements

See docs/PHILOSOPHY.md for the long-term vision. Planned features:

  • Link co-authors within publications
  • Extract citation graphs and visualize trends
  • Add public access indicators
  • Cross-reference publications between authors
  • Generate citation networks
  • Add research topic hierarchies
  • Integration with external knowledge bases

Data Source

The source data (../scholar/*.json) contains ~67,000 Google Scholar author profiles retrieved via the SerpAPI service.

Each JSON file includes:

  • Complete author profile
  • Full publication list (up to 1000 publications)
  • Co-author information
  • Citation metrics
  • Research interests

Requirements

  • Python 3.9+
  • Standard library only (no external dependencies!)

License

Data is sourced from Google Scholar. This tool is for personal knowledge management and research purposes.

Contributing

This is a personal knowledge base project. The conversion scripts are designed to be:

  • Idempotent: Can be run multiple times safely
  • Fast to test: Single-author testing for rapid iteration
  • Zero-error: Extensive validation aimed at catching conversion errors before the full run
  • Well-documented: Clear philosophy and architecture

See docs/PHILOSOPHY.md for design principles.
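
One common way to make a file-writing script safe to re-run is to write only when the rendered content differs from what is already on disk. The helper below is a sketch of that idea under the name `write_if_changed`, which is hypothetical; it is not taken from the conversion scripts.

```python
from pathlib import Path


def write_if_changed(path: Path, text: str) -> bool:
    """Write `text` to `path` unless the file already holds exactly that text.

    Returns True when the file was created or updated, False when untouched.
    """
    if path.exists() and path.read_text(encoding="utf-8") == text:
        return False
    # Create scholars/ or research/ on first run.
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(text, encoding="utf-8")
    return True
```

Running the conversion twice with such a helper leaves unchanged files untouched, so modification times only move for profiles whose output actually changed.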

Generated as part of a comprehensive academic knowledge graph project.
