1 change: 1 addition & 0 deletions .gitignore
@@ -1,2 +1,3 @@
generated_markdown/**
generated_indices/**
*.pyc
34 changes: 34 additions & 0 deletions Readme.md
@@ -6,6 +6,7 @@ These were created by using OCR to extract the text from the book, then manually
The following tasks are things I would consider useful for others, and would love help with.
* [x] Convert word lists into a machine friendly format, probably JSON.
* [x] Apply unicode normalization to NFD to both markdown and JSON formats.
* [x] Create a master word index showing first occurrence by chapter.
* [ ] Add line number and word index information for the location of the word in the book.
* [ ] Macronize vocab list.

@@ -14,5 +15,38 @@ To use the lists effectively, I recommend finding a tool that lets you perform d

At the top of every file are the page numbers for the exercises of that chapter.

## Scripts

### create_word_index.py
Creates master alphabetical indexes of all vocabulary words across all chapters, showing the chapter number where each word first appears. Sorting is case-insensitive with proper Unicode normalization.
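The first-occurrence lookup and the case-insensitive, normalization-aware sort described above could look roughly like the following. This is a minimal sketch, not the script's actual code: the function names `sort_key` and `build_first_occurrence_index`, the input shape (a dict mapping chapter number to a list of words), and the sample words are all assumptions made for illustration.

```python
import unicodedata


def sort_key(word: str) -> str:
    # Hypothetical helper: NFD-normalize, then casefold, so that
    # "Roma" and "roma" sort together regardless of case.
    return unicodedata.normalize("NFD", word).casefold()


def build_first_occurrence_index(chapters: dict[int, list[str]]) -> list[tuple[str, int]]:
    # Assumed input shape: {chapter_number: [word, ...]}.
    first_seen: dict[str, int] = {}
    for chapter in sorted(chapters):  # visit chapters in ascending order
        for word in chapters[chapter]:
            normalized = unicodedata.normalize("NFD", word)
            # setdefault keeps the earliest chapter and ignores later repeats
            first_seen.setdefault(normalized, chapter)
    # Sort case-insensitively; the word itself breaks ties deterministically
    return sorted(first_seen.items(), key=lambda item: (sort_key(item[0]), item[0]))


if __name__ == "__main__":
    sample = {2: ["Roma", "aqua"], 1: ["aqua", "am\u014d"]}
    for word, chapter in build_first_occurrence_index(sample):
        print(f"{word}\t{chapter}")
```

Normalizing before the lookup means a precomposed character and its decomposed equivalent count as the same word, which matches the NFD normalization already applied to the word lists.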

**Usage:**
```bash
# Create regular alphabetical index (all formats: JSON, Markdown, HTML)
python3 create_word_index.py

# Include both regular and sectioned indexes
python3 create_word_index.py --include-sectioned

# Create only the sectioned index (organized by grammatical sections)
python3 create_word_index.py --sectioned-only

# HTML only with custom page density
python3 create_word_index.py --format html --entries-per-page 90

# All formats with custom output filenames
python3 create_word_index.py --include-sectioned \
    --json-output my_index.json \
    --section-json-output my_index_by_section.json
```

**Output:**
- `word_index.json` — Flat alphabetical index
- `word_index.md` — Flat alphabetical markdown table
- `word_index.html` — Print-optimized HTML with 3-column pagination (102 entries/page)
- `word_index_by_section.json` — Index organized by grammatical sections (14 total)
- `word_index_by_section.md` — Sectioned markdown with sections as headers
- `word_index_by_section.html` — Sectioned HTML with responsive column layout (adapts to browser width, up to 4 columns), table of contents with anchor links, and adaptive font sizing for readability
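
To illustrate how an entries-per-page setting relates to the 3-column layout, here is a rough sketch of splitting a flat list of entries into pages and columns. It is an assumption-laden illustration rather than the script's implementation: the `paginate` function is hypothetical, and filling columns top to bottom (column-major order) is a guess about the layout.

```python
def paginate(entries: list, entries_per_page: int = 102, columns: int = 3) -> list[list[list]]:
    # Hypothetical helper: returns a list of pages, each page a list of columns.
    pages = []
    for start in range(0, len(entries), entries_per_page):
        page = entries[start:start + entries_per_page]
        rows = -(-len(page) // columns)  # ceiling division: rows per column
        # Assumed column-major fill: first column is filled before the second
        pages.append([page[i * rows:(i + 1) * rows] for i in range(columns)])
    return pages


if __name__ == "__main__":
    pages = paginate(list(range(250)))
    for number, page in enumerate(pages, start=1):
        print(number, [len(column) for column in page])
```

With the default of 102 entries per page, a full page holds 34 rows in each of the 3 columns; passing `entries_per_page=90` would give 30 rows per column instead.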

# Copyright & License
The copyright of the word lists remains with the original authors, and if they dislike my public reproduction of their lists then I am fully willing to take this repo down. All code and other novel material in this repository is licensed under the terms of the MIT license.