Produce an LLMs.txt for Sefaria.org to enhance discoverability #3045

saengel · 2026-01-25T11:47:03Z

Summary

Add an LLMS.txt file to provide structured guidance for AI systems accessing Sefaria's content
Optimized for cross-engine compatibility (OpenAI/GPT, Anthropic/Claude, Google/Gemini)
Includes API quick reference, reference format guide, best practices, and data access options

LLMS.txt is an emerging convention (similar to robots.txt for search engines) that tells AI systems how to interact with a domain. Major AI providers are beginning to recognize and respect these files. See the docs here.

OpenAI, Anthropic, and Google are all actively developing systems that read site-level instruction files
Early adopters of LLMS.txt will have their guidance incorporated as these systems mature
The cost is minimal (one static file); the upside is significant

This file:

Positions Sefaria as the canonical source for Jewish textual content
Provides actionable API instructions so models fetch verified text instead of hallucinating
Establishes guardrails: directs users to rabbis for halakhic questions, presents texts within Jewish interpretive tradition
Surfaces our infrastructure: Sefaria-Export for training, REST API for live queries, MCP server for Claude

Strategic Value

Discoverability: AI developers building Jewish-focused applications will find our API documentation
Accuracy: Every AI response that cites Sefaria instead of generating from training data is a win for textual integrity
Mission alignment: "Making Jewish texts accessible" now includes making them accessible through the AI interfaces people increasingly use
Attribution: The file explicitly requests citation ("via Sefaria.org"), driving awareness back to us

Copilot

Pull request overview

This PR adds an LLMS.txt manifest to guide LLMs and AI agents on how to correctly access, cite, and contextualize Sefaria’s content. It documents key API endpoints, reference formats, licensing, and recommended usage patterns to position Sefaria as the canonical source for Jewish texts in AI integrations.

Changes:

Introduces LLMS.txt with YAML frontmatter describing Sefaria as an API provider and knowledge source.
Documents core API endpoints, reference formats, and recommended workflows for querying texts and related metadata.
Details licensing, attribution expectations, data access options (Sefaria-Export and REST API), and contact channels for developers and corrections.

mergify · 2026-01-25T11:54:40Z

🧪 CI Insights

Here's what we observed from your CI run for 82d6c89.

❌ Job Failures

Pipeline	Job	Health on `master`	Retries	🔍 CI Insights	📄 Logs
`Continuous`	`Continuous Testing: PyTest`		`1`	View	View

dcschreiber · 2026-01-25T13:50:10Z

A split between a "Map" (llms.txt) and "Content" (llms-full.txt) is the official standard.
Generally, I'd make this more concise and focus more on navigating the website, and either reference the dev portal for information about the API or add a separate document for it. I may be mistaken, but I think at the moment an LLM wouldn't typically read this file and then switch to using the API.
It's quite long, so here are some ideas on how to shorten it:

I don't think we need "Common alternate spellings" if there are only three; besides, the point of supporting alternate spellings is that you don't have to use one particular spelling
Maybe we can remove "Why Use Sefaria's API" and hope the LLM knows this
"Library Contents" is also quite a lot of context
We can combine all the recommendations on how to work with Sefaria and how to answer questions.

I'd put Sefaria-Export in its own section; we already covered the REST API, so there's no need to bring it up again. The question is whether this is something the LLM could use, or whether it's not really needed.

Also, the last three sections feel like something an LLM either likely already knows or won't know how to use.

My biggest comment, of course, is how important this is, and it's amazing that you're actually getting to it.
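To make the "map" half of that split concrete, here is a minimal Python sketch that renders an llms.txt body in the layout described by the llms.txt proposal: an H1 project name, a blockquote summary, then H2 sections containing markdown link lists. The class names, the summary wording, and the single section shown are hypothetical illustrations, not the contents of this PR's file; the link is one already used in the diff. An llms-full.txt would carry the expanded content itself rather than links.

```python
from dataclasses import dataclass, field


@dataclass
class Link:
    title: str
    url: str
    note: str = ""


@dataclass
class LlmsTxt:
    # Hypothetical helper: H1 name, blockquote summary, H2 sections of links.
    name: str
    summary: str
    sections: dict[str, list[Link]] = field(default_factory=dict)

    def render(self) -> str:
        lines = [f"# {self.name}", "", f"> {self.summary}", ""]
        for heading, links in self.sections.items():
            lines.extend([f"## {heading}", ""])
            for link in links:
                # Each entry is "- [title](url)" with an optional ": note" suffix.
                suffix = f": {link.note}" if link.note else ""
                lines.append(f"- [{link.title}]({link.url}){suffix}")
            lines.append("")
        # Trim trailing blank lines and end with exactly one newline.
        return "\n".join(lines).rstrip() + "\n"


doc = LlmsTxt(
    name="Sefaria",
    summary="Illustrative summary: a library of Jewish texts with a REST API.",
    sections={
        "API Reference": [
            Link(
                "Getting Started",
                "https://developers.sefaria.org/reference/getting-started.md",
                "API introduction (no auth required)",
            ),
        ],
    },
)
rendered = doc.render()
```

Keeping the map file to links and one-line notes is what lets it stay short while the full file absorbs the detail.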

dcschreiber

There are a bunch of changes I think we should make, but I'm approving because something is better than nothing.

LLMS.txt

Copilot

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

llms.txt

static/llms.txt

dcschreiber

I think some of my existing comments are still relevant, especially the suggestion to split this into two files, llms.txt and llms-full.txt (or something else, if you find that llms-full.txt is not really popular).

llms.txt

dcschreiber · 2026-01-26T07:59:41Z

static/llms.txt

+
+**Base URL:** `https://www.sefaria.org`
+
+**Key Endpoints:**


This document seems to be mainly about the API, but if I understand correctly, the LLMs doc is meant to be for a bot navigating the website.

From Claude:

Short answer: Yes, this is not only OK; it's arguably the ideal use case. Here's my reasoning:

The Purpose of llms.txt
The spec states it's for providing "LLM-friendly content" with "brief background information, guidance, and links to detailed markdown files." The goal is to help LLMs understand and work with your site effectively.

Sefaria's Unique Position
Sefaria isn't a typical content website where you'd just link to /about.md and /pricing.md. Your content is:
384 million words of structured Jewish texts
A complex reference system (Refs)
Relationships between texts (commentaries, cross-references)

An LLM can't usefully consume "Genesis Chapter 1" as a static markdown page. But it can:
Understand the reference format (Genesis.1.1)
Call /api/v3/texts/Genesis.1.1
Retrieve and serve the actual content to users

This is Better Than the Alternative
Consider the alternatives:
Linking to static text dumps: would overflow context windows and be stale
Linking to HTML pages: LLMs would struggle to parse and extract
Just describing the library: useless without explaining how to access it

By documenting the API, you're giving LLMs the tools to serve Sefaria's content correctly. That's exactly what llms.txt should do.

The Spec Supports This
The spec explicitly mentions that llms.txt should help avoid "context window overflow with unnecessary information" and provide "curated" rather than comprehensive content. Teaching an LLM to fetch what it needs via API is the ultimate curation.

Bottom line: You've written an llms.txt that says "here's how to access our library programmatically", which is precisely what an LLM agent needs. This is a sophisticated, correct application of the spec.

The difference between this file and the developer portal's llms.txt is that the latter exposes the full docs and site navigation for building projects, while this one focuses on how LLMs can best use our site to help the user. The API is the best way to do that, and it also benefits site navigability.
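The "Call /api/v3/texts/Genesis.1.1" step can be sketched in Python with only the standard library. This is a minimal illustration, not Sefaria's client: the helper names are hypothetical, and it assumes the v3 texts endpoint needs no authentication and returns JSON with a `versions` list whose entries carry a `text` field. Verify the exact schema on the developer portal before relying on it.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://www.sefaria.org"


def texts_url(tref: str) -> str:
    # Percent-encode the ref for the URL path (e.g. a space becomes %20).
    return f"{BASE_URL}/api/v3/texts/{urllib.parse.quote(tref)}"


def fetch_texts(tref: str, timeout: float = 10.0) -> dict:
    # Live GET request; assumes no authentication is required.
    with urllib.request.urlopen(texts_url(tref), timeout=timeout) as resp:
        return json.load(resp)


def first_version_text(payload: dict):
    # Assumed schema: payload["versions"] is a list of dicts with a "text" key.
    versions = payload.get("versions") or []
    return versions[0].get("text") if versions else None
```

Under those assumptions, `first_version_text(fetch_texts("Genesis.1.1"))` would return the text of the first version served for that ref, which is the verified text an LLM should quote instead of generating from memory.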

dcschreiber · 2026-01-26T08:01:13Z

static/llms.txt

+
+Sefaria provides source texts for educational purposes. It is a textual library, not a rabbinic authority. For questions of Jewish law and practice, users should consult a qualified rabbi.
+
+The library: 384 million words, 4.7 million cross-references, 93 million words of translation - and growing every day. Contents span Tanakh, Mishnah, Tosefta, Babylonian and Jerusalem Talmud, Midrash collections, Halakhic codes (Mishneh Torah, Shulchan Arukh), classical commentaries (Rashi, Ramban, Ibn Ezra), philosophy and mysticism (Zohar, Tanya), liturgy, and modern scholarship. Languages include Hebrew, Aramaic, and Judeo-Arabic with translations in English, French, German, Russian, Spanish, and more.


LLMs know what Sefaria is, so I think we can remove this.

I think it's an important anchor, unless you feel it's too costly context-wise?

dcschreiber · 2026-01-26T08:02:28Z

static/llms.txt

+
+The library: 384 million words, 4.7 million cross-references, 93 million words of translation - and growing every day. Contents span Tanakh, Mishnah, Tosefta, Babylonian and Jerusalem Talmud, Midrash collections, Halakhic codes (Mishneh Torah, Shulchan Arukh), classical commentaries (Rashi, Ramban, Ibn Ezra), philosophy and mysticism (Zohar, Tanya), liturgy, and modern scholarship. Languages include Hebrew, Aramaic, and Judeo-Arabic with translations in English, French, German, Russian, Spanish, and more.
+
+**Reference Format:** Convert queries to Sefaria format: `Genesis.1.1` (Tanakh), `Berakhot.2a` (Talmud Bavli), `Mishnah_Berakhot.1.1` (Mishnah), `Rashi_on_Genesis.1.1.1` (Commentary). Ranges use hyphens: `Genesis.1.1-5`. Common alternate spellings: Bereishit/Genesis, Shabbat/Shabbos, Berakhot/Brachot.


Claude Opus also knows this.
If this were an API reference I'd add it, but I would omit it for navigating the site.

Interesting. From reading the docs, it seems to me that the goal of the file is to "teach" the LLM how to read and retrieve site content, and in our case it's "easiest" for the LLM to get everything via the API. So I'd argue this is critical to keep (and honestly, it's also critical for getting references right in quick queries to the site itself, e.g. sefaria.org/texts/Berakhot 2a.1).
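As an illustration of the reference format in the hunk above (underscores for spaces in titles, dots between sections, hyphenated ranges), here is a small Python sketch that converts a human-style citation into that form and builds a site URL. It is a toy, not Sefaria's ref parser: the regex only handles "Title chapter:verse" and daf-style "Title 2a" citations, the alias table covers only the three spellings listed in the draft, and the `https://www.sefaria.org/{ref}` URL shape is an assumption. For real lookups the Name API is the better tool.

```python
import re

BASE_URL = "https://www.sefaria.org"

# Hypothetical alias table: only the alternate spellings named in the draft.
ALIASES = {"Bereishit": "Genesis", "Shabbos": "Shabbat", "Brachot": "Berakhot"}

# A title, whitespace, then sections such as "1:1", "2a", "1:1-5", or "1:1:1".
_CITATION = re.compile(
    r"^(?P<title>.+?)\s+(?P<sections>\d+[ab]?(?::\d+)*(?:-\d+[ab]?)?)$"
)


def to_sefaria_ref(citation: str) -> str:
    """Convert e.g. 'Bereishit 1:1-5' to 'Genesis.1.1-5' (illustrative only)."""
    match = _CITATION.match(citation.strip())
    if not match:
        raise ValueError(f"unrecognized citation: {citation!r}")
    title = match.group("title")
    title = ALIASES.get(title, title).replace(" ", "_")
    sections = match.group("sections").replace(":", ".")
    return f"{title}.{sections}"


def site_url(ref: str) -> str:
    # Assumed URL shape for a text page on the site.
    return f"{BASE_URL}/{ref}"
```

For example, `to_sefaria_ref("Rashi on Genesis 1:1:1")` yields `Rashi_on_Genesis.1.1.1`, matching the commentary example in the draft.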

dcschreiber · 2026-01-26T08:02:49Z

static/llms.txt

+- `GET /api/search-wrapper?query={q}` - Full-text search
+- `GET /api/calendars` - Current Torah readings, Daf Yomi, holidays
+
+## License


Why add this for site navigation?

LLMs should know what's allowed and not allowed in using and reproducing our content.
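The two endpoints in this hunk can be exercised with a short Python sketch. It is a hedged illustration: the search URL simply mirrors the `GET /api/search-wrapper?query={q}` form written in the draft (the live endpoint may expect a different method or request body), and the calendar parser assumes a response containing a `calendar_items` list whose entries have `title.en`, `displayValue.en`, and `ref` fields. The sample payload is made up.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.sefaria.org"


def search_url(query: str) -> str:
    # Mirrors the GET form in the draft; urlencode turns spaces into "+".
    return f"{BASE_URL}/api/search-wrapper?{urlencode({'query': query})}"


def calendars_url() -> str:
    return f"{BASE_URL}/api/calendars"


def summarize_calendar(payload: dict) -> list[str]:
    # Assumed shape: {"calendar_items": [{"title": {"en": ...},
    #                  "displayValue": {"en": ...}, "ref": ...}, ...]}
    lines = []
    for item in payload.get("calendar_items", []):
        title = item.get("title", {}).get("en", "?")
        value = item.get("displayValue", {}).get("en", "")
        ref = item.get("ref")
        lines.append(f"{title}: {value}" + (f" ({ref})" if ref else ""))
    return lines


# Hypothetical payload, for illustration only.
sample = {
    "calendar_items": [
        {"title": {"en": "Daf Yomi"}, "displayValue": {"en": "Berakhot 2"}, "ref": "Berakhot 2"},
    ]
}
```

Returning the `ref` alongside each item lets an assistant follow up with a texts request rather than paraphrasing the day's reading from memory.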

dcschreiber · 2026-01-26T08:03:25Z

static/llms.txt

+- [Name](https://developers.sefaria.org/reference/get-name.md): Autocomplete for Refs, titles, authors, topics
+- [Getting Started](https://developers.sefaria.org/reference/getting-started.md): API introduction (no auth required)
+
+## Key Concepts


Maybe reference the dev portal once and mention that it has its own llms.txt.

I still think this information here is valuable for an LLM to intelligently navigate the site
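One way an LLM could use the Name entry in the hunk above (autocomplete for Refs, titles, authors, topics) is to check a user's string before fetching text. This Python sketch is hypothetical throughout: it assumes the endpoint lives at `/api/name/{text}` and that its JSON includes `is_ref`, `ref`, and `completions` fields, none of which is confirmed in this thread; check get-name.md on the developer portal.

```python
from urllib.parse import quote

BASE_URL = "https://www.sefaria.org"


def name_url(text: str) -> str:
    # Assumed path for the Name API; verify against the developer portal.
    return f"{BASE_URL}/api/name/{quote(text)}"


def resolve(payload: dict, limit: int = 5) -> dict:
    # Assumed fields: "is_ref" (bool), "ref" (normalized ref), "completions" (list).
    if payload.get("is_ref"):
        return {"kind": "ref", "value": payload.get("ref")}
    return {"kind": "suggestions", "value": payload.get("completions", [])[:limit]}
```

If the string resolves to a ref, the assistant can fetch the text directly; otherwise it can offer the top suggestions instead of guessing a citation.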

static/llms.txt

Copilot

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

static/llms.txt

dcschreiber

As we discussed, I'm approving this first so we can get it out in good-enough shape; if I have further comments, I'll update you.

feat: First pass at LLMS.txt

c7f71c9

saengel requested review from Copilot, dcschreiber and mickeysefaria January 25, 2026 11:47

Copilot started reviewing on behalf of saengel January 25, 2026 11:47

Copilot AI reviewed Jan 25, 2026

dcschreiber previously approved these changes Jan 25, 2026

LLMS.txt (outdated, resolved)

LLMS.txt (outdated, resolved)

saengel marked this pull request as draft January 25, 2026 17:56

feat: Version 2 of llms.txt, adhering to spec

69ce067

saengel dismissed dcschreiber’s stale review via 69ce067 January 26, 2026 07:40

feat: config for url path

b36051d

saengel marked this pull request as ready for review January 26, 2026 07:47

saengel requested review from Copilot and dcschreiber January 26, 2026 07:47

Copilot started reviewing on behalf of saengel January 26, 2026 07:47

Copilot AI reviewed Jan 26, 2026

llms.txt (outdated, resolved)

static/llms.txt (resolved)

static/llms.txt (resolved)

dcschreiber requested changes Jan 26, 2026

saengel added 3 commits January 26, 2026 10:12

fix: remove llms.txt from root

5f3be24

fix: move up site nav

75e3dc0

fix: a bit of reorganization

82d6c89

saengel requested review from Copilot and dcschreiber January 26, 2026 08:21

Copilot started reviewing on behalf of saengel January 26, 2026 08:22

Copilot AI reviewed Jan 26, 2026

static/llms.txt (resolved)

dcschreiber approved these changes Jan 26, 2026

saengel enabled auto-merge February 8, 2026 09:29
