Conversation

@saengel
Contributor

@saengel saengel commented Jan 25, 2026

Summary

  • Add LLMS.txt file to provide structured guidance for AI systems accessing Sefaria's content
  • Optimized for cross-engine compatibility (OpenAI/GPT, Anthropic/Claude, Google/Gemini)
  • Includes API quick reference, reference format guide, best practices, and data access options

LLMS.txt is an emerging convention (similar to robots.txt for search engines) that tells AI systems how to interact with a domain. Major AI providers are beginning to recognize and respect these files. See the docs here.

  • OpenAI, Anthropic, and Google are all actively developing systems that read site-level instruction files
  • Early adopters of LLMS.txt will have their guidance incorporated as these systems mature
  • The cost is minimal (one static file); the upside is significant

This file:

  • Positions Sefaria as the canonical source for Jewish textual content
  • Provides actionable API instructions so models fetch verified text instead of hallucinating
  • Establishes guardrails: directs users to rabbis for halakhic questions and presents texts within the Jewish interpretive tradition
  • Surfaces our infrastructure: Sefaria-Export for training, REST API for live queries, MCP server for Claude

Strategic Value

  • Discoverability: AI developers building Jewish-focused applications will find our API documentation
  • Accuracy: Every AI response that cites Sefaria instead of generating from training data is a win for textual integrity
  • Mission alignment: "Making Jewish texts accessible" now includes making them accessible through the AI interfaces people increasingly use
  • Attribution: The file explicitly requests citation ("via Sefaria.org"), driving awareness back to us

Contributor

Copilot AI left a comment

Pull request overview

This PR adds an LLMS.txt manifest to guide LLMs and AI agents on how to correctly access, cite, and contextualize Sefaria’s content. It documents key API endpoints, reference formats, licensing, and recommended usage patterns to position Sefaria as the canonical source for Jewish texts in AI integrations.

Changes:

  • Introduces LLMS.txt with YAML frontmatter describing Sefaria as an API provider and knowledge source.
  • Documents core API endpoints, reference formats, and recommended workflows for querying texts and related metadata.
  • Details licensing, attribution expectations, data access options (Sefaria-Export and REST API), and contact channels for developers and corrections.

@mergify

mergify bot commented Jan 25, 2026

🧪 CI Insights

Here's what we observed from your CI run for 82d6c89.

❌ Job Failures

| Pipeline | Job | Health on master | Retries | 🔍 CI Insights | 📄 Logs |
|---|---|---|---|---|---|
| Continuous | Continuous Testing: PyTest | Healthy | 1 | View | View |

@dcschreiber
Contributor

A split between a "Map" (llms.txt) and "Content" (llms-full.txt) is the official standard.
Generally, I'd make this more concise and focus more on navigating the website, and either reference the dev portal for information about the API or add a separate document for it. I may be mistaken, but I think at the moment an LLM wouldn't typically see this file and then switch to using the API.
It's quite long, so here are some ideas on how to shorten it:

  1. I don't think we need "Common alternate spellings" if there are only 3. Also, the point of supporting alternate spellings is that no single spelling is required.
  2. Maybe we can remove "Why Use Sefaria's API" and hope the LLM knows this.
  3. "Library Contents" is also quite a lot of context.
  4. We can combine all the recommendations on how to work with Sefaria and how to answer questions.

I'd put Sefaria-Export in its own section; we already covered the REST API, so there's no need to bring it up again. The question is whether this is something the LLM could use, or whether it's not really needed.

Also, I feel the last three sections cover things that an LLM either likely already knows or won't know how to use.

My biggest comment, of course, is how important this is, and it's amazing that you're actually getting to it.

dcschreiber previously approved these changes Jan 25, 2026
Contributor

@dcschreiber dcschreiber left a comment

There are a bunch of changes I think we should make, but I'm approving because something is better than nothing.

@saengel saengel marked this pull request as draft January 25, 2026 17:56
@saengel saengel marked this pull request as ready for review January 26, 2026 07:47
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.


Contributor

@dcschreiber dcschreiber left a comment

I think some of my existing comments are still relevant, especially the suggestion to split this into two files, llms.txt and llms-full.txt, or something else if you find that llms-full.txt is not really popular.


**Base URL:** `https://www.sefaria.org`

**Key Endpoints:**
Contributor

This document seems to be mainly about the API, but if I understand correctly, the llms.txt file is meant for a bot navigating the website.

Contributor Author

From Claude:

Short answer: Yes, this is not only OK—it's arguably the ideal use case.

Here's my reasoning:

The Purpose of llms.txt

The spec states it's for providing "LLM-friendly content" with "brief background information, guidance, and links to detailed markdown files." The goal is to help LLMs understand and work with your site effectively.

Sefaria's Unique Position

Sefaria isn't a typical content website where you'd just link to /about.md and /pricing.md. Your content is:

  • 384 million words of structured Jewish texts
  • A complex reference system (Refs)
  • Relationships between texts (commentaries, cross-references)

An LLM can't usefully consume "Genesis Chapter 1" as a static markdown page. But it can:

  • Understand the reference format (Genesis.1.1)
  • Call /api/v3/texts/Genesis.1.1
  • Retrieve and serve the actual content to users

This is Better Than the Alternative

Consider the alternatives:

  • Linking to static text dumps: Would overflow context windows and be stale
  • Linking to HTML pages: LLMs would struggle to parse and extract
  • Just describing the library: Useless without explaining how to access it

By documenting the API, you're giving LLMs the tools to serve Sefaria's content correctly. That's exactly what llms.txt should do.

The Spec Supports This

The spec explicitly mentions that llms.txt should help avoid "context window overflow with unnecessary information" and provide "curated" rather than comprehensive content. Teaching an LLM to fetch what it needs via API is the ultimate curation.

Bottom line: You've written an llms.txt that says "here's how to access our library programmatically"—which is precisely what an LLM agent needs. This is a sophisticated, correct application of the spec. 

Contributor Author

The difference between this and the developers' llms.txt is that the developer file exposes the full docs and site navigation for building projects, while this one focuses on how LLMs can best use our site to help the user. The API is the best way to do that, and it also benefits site navigability.
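
For readers following this thread, here is a minimal Python sketch of the access path being argued for: turning a ref into a request against the `/api/v3/texts/` route under the base URL quoted in the file. The helper names `text_url` and `fetch_text` are hypothetical, and the shape of the JSON response is deliberately not assumed.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Base URL as quoted in the file under review.
BASE_URL = "https://www.sefaria.org"


def text_url(ref: str) -> str:
    """Build the v3 texts URL for a ref such as 'Genesis.1.1' (hypothetical helper)."""
    # quote() leaves letters, digits, '.', '-', and '_' alone, so ordinary refs pass through unchanged.
    return f"{BASE_URL}/api/v3/texts/{quote(ref, safe='')}"


def fetch_text(ref: str, timeout: float = 10.0) -> dict:
    """Fetch and decode the JSON payload for a ref; the payload's fields are not assumed here."""
    with urlopen(text_url(ref), timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    # Only builds the URL; call fetch_text() to actually hit the network.
    print(text_url("Genesis.1.1"))
```

Keeping URL construction separate from the network call makes the ref handling easy to check without touching the live API.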


Sefaria provides source texts for educational purposes. It is a textual library, not a rabbinic authority. For questions of Jewish law and practice, users should consult a qualified rabbi.

The library: 384 million words, 4.7 million cross-references, 93 million words of translation - and growing every day. Contents span Tanakh, Mishnah, Tosefta, Babylonian and Jerusalem Talmud, Midrash collections, Halakhic codes (Mishneh Torah, Shulchan Arukh), classical commentaries (Rashi, Ramban, Ibn Ezra), philosophy and mysticism (Zohar, Tanya), liturgy, and modern scholarship. Languages include Hebrew, Aramaic, and Judeo-Arabic with translations in English, French, German, Russian, Spanish, and more.
Contributor

LLMs know what Sefaria is so I think we can remove this

Contributor Author

I think it's an important anchor, unless you feel it's too costly context-wise?


The library: 384 million words, 4.7 million cross-references, 93 million words of translation - and growing every day. Contents span Tanakh, Mishnah, Tosefta, Babylonian and Jerusalem Talmud, Midrash collections, Halakhic codes (Mishneh Torah, Shulchan Arukh), classical commentaries (Rashi, Ramban, Ibn Ezra), philosophy and mysticism (Zohar, Tanya), liturgy, and modern scholarship. Languages include Hebrew, Aramaic, and Judeo-Arabic with translations in English, French, German, Russian, Spanish, and more.

**Reference Format:** Convert queries to Sefaria format: `Genesis.1.1` (Tanakh), `Berakhot.2a` (Talmud Bavli), `Mishnah_Berakhot.1.1` (Mishnah), `Rashi_on_Genesis.1.1.1` (Commentary). Ranges use hyphens: `Genesis.1.1-5`. Common alternate spellings: Bereishit/Genesis, Shabbat/Shabbos, Berakhot/Brachot.
Contributor

Claude Opus also knows this.
If it were an API reference I'd add this, but I would omit it for navigating the site.

Contributor Author

Interesting. From reading the docs, it seems to me that the goal of the file is to "teach" the LLM how to read and retrieve site content, and in our case it's "easiest" for the LLM to get everything via the API. So I'd argue this is critical to keep (and, to be honest, critical for building references for quick queries to the site itself, e.g. sefaria.org/texts/Berakhot 2a.1).
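
To make the reference-format discussion concrete, here is a minimal Python sketch of the conversion the quoted line describes: a citation like "Genesis 1:1" becomes `Genesis.1.1`. The function name `to_sefaria_ref` and the `ALIASES` table are hypothetical. Which spelling in each pair counts as canonical is an assumption based on the examples in the file, and the API itself may well accept more forms than this handles.

```python
# Alternate spellings listed in the file; the canonical side of each pair is an assumption.
ALIASES = {
    "Bereishit": "Genesis",
    "Shabbos": "Shabbat",
    "Brachot": "Berakhot",
}


def to_sefaria_ref(citation: str) -> str:
    """Convert e.g. 'Rashi on Genesis 1:1:1' to 'Rashi_on_Genesis.1.1.1' (hypothetical helper)."""
    # Collapse repeated whitespace, then split the title from the trailing location token.
    normalized = " ".join(citation.split())
    title, _, location = normalized.rpartition(" ")
    if not title:
        raise ValueError(f"expected '<title> <location>', got {citation!r}")
    # Map alternate spellings word by word, then join title words with underscores.
    words = [ALIASES.get(word, word) for word in title.split(" ")]
    # Section separators become dots; range hyphens are left as-is.
    return f"{'_'.join(words)}.{location.replace(':', '.')}"


if __name__ == "__main__":
    for example in ("Genesis 1:1", "Berakhot 2a", "Mishnah Berakhot 1:1", "Genesis 1:1-5"):
        print(example, "->", to_sefaria_ref(example))
```

The sketch covers only the four shapes shown in the file (Tanakh, Talmud Bavli, Mishnah, commentary) plus hyphenated ranges; anything else would need the real API's own resolution.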

- `GET /api/search-wrapper?query={q}` - Full-text search
- `GET /api/calendars` - Current Torah readings, Daf Yomi, holidays

## License
Contributor

Why add this for site navigation?

Contributor Author

LLMs should know what's allowed and not allowed in using and reproducing our content.

- [Name](https://developers.sefaria.org/reference/get-name.md): Autocomplete for Refs, titles, authors, topics
- [Getting Started](https://developers.sefaria.org/reference/getting-started.md): API introduction (no auth required)

## Key Concepts
Contributor

Maybe reference the dev portal once and mention it has its own llms.txt.

Contributor Author

I still think this information here is valuable for an LLM to intelligently navigate the site
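
As a rough illustration of how an agent might consume the link entries quoted in this hunk, here is a Python sketch that parses lines of the form `- [title](url): description`. The names `LinkEntry` and `parse_link_line` are hypothetical, and the pattern covers only the entry shape visible in the diff, not the whole llms.txt format.

```python
import re
from typing import NamedTuple, Optional


class LinkEntry(NamedTuple):
    title: str
    url: str
    description: str


# Matches "- [title](url)" with an optional ": description" tail.
_LINK_LINE = re.compile(
    r"^-\s*\[(?P<title>[^\]]+)\]\((?P<url>[^)\s]+)\)(?::\s*(?P<description>.*))?$"
)


def parse_link_line(line: str) -> Optional[LinkEntry]:
    """Return the parsed entry, or None when the line is not a link entry."""
    match = _LINK_LINE.match(line.strip())
    if match is None:
        return None
    return LinkEntry(match["title"], match["url"], (match["description"] or "").strip())


if __name__ == "__main__":
    sample = (
        "- [Getting Started](https://developers.sefaria.org/reference/getting-started.md): "
        "API introduction (no auth required)"
    )
    print(parse_link_line(sample))
```

Headings such as `## Key Concepts` return `None`, so a caller can walk the file line by line and keep only the entries.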

Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.


Contributor

@dcschreiber dcschreiber left a comment

As we discussed, I'm first approving this so we get this out good enough, and if I have comments, I will update you with them.

@saengel saengel enabled auto-merge February 8, 2026 09:29

2 participants