Bitcoin Search API

Andreas edited this page Dec 11, 2024 · 2 revisions

Introduction

The Bitcoin Search API is a proxy layer in front of Elasticsearch. It offers endpoints to search, browse, and retrieve Bitcoin-related content from various sources, and it is the single access point for our data infrastructure.

Available Endpoints

The API currently exposes the following endpoints:

  1. /search - Primary search endpoint for querying documents
  2. /sources - Retrieve information about available data sources
  3. /sourceDocuments - Get documents for a specific source
  4. /getDocumentContent - Retrieve full content of a specific document

Initially, only the /search endpoint was implemented, serving as the core functionality for the Bitcoin Search product's main search interface. As the need for more detailed document exploration emerged, we developed the explore interface at bitcoinsearch.xyz/sources. This led to the creation of the supporting endpoints (/sources, /sourceDocuments, and /getDocumentContent) to provide access to the underlying data.

/search

Path: /api/elasticSearchProxy/search
Method: POST
Description: Primary search endpoint that provides full-text search capabilities across all documents with support for filtering, sorting, and aggregations.

Request Parameters

{
  queryString: string,         // Search query
  size: number,               // Number of results per page
  page: number,               // Page number
  filterFields?: [            // Optional filters
    {
      field: string,          // Field to filter on (domain, authors, tags)
      value: string           // Filter value
    }
  ],
  sortFields?: [              // Optional sorting
    {
      field: string,          // Field to sort by
      value: "asc" | "desc"   // Sort direction
    }
  ]
}

Supported Fields

  • Full-text search fields: authors, title, body
  • Filter fields: domain.keyword, authors.keyword, tags.keyword
  • Sort fields: Any indexed field (commonly indexed_at, created_at)

Response Format

{
  success: true,
  data: {
    result: {
      hits: {
        total: { value: number },
        hits: [/* documents */]
      },
      aggregations: {
        authors: { buckets: [/* author counts */] },
        domains: { buckets: [/* domain counts */] },
        tags: { buckets: [/* tag counts */] }
      }
    }
  }
}
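As an illustration of reading this response, the following sketch pulls out the total hit count and the most frequent authors, domains, and tags. The `Bucket` shape (`key`, `doc_count`) is an assumption based on the standard Elasticsearch terms-aggregation format; the response format above does not spell it out. The names `summarize` and `topKeys` are hypothetical helpers, not part of the API.

```typescript
// ASSUMPTION: buckets follow the usual Elasticsearch terms-aggregation shape.
interface Bucket {
  key: string;
  doc_count: number;
}

interface Aggregation {
  buckets: Bucket[];
}

// Transcribed from the documented response format.
interface SearchResponse {
  success: boolean;
  data: {
    result: {
      hits: { total: { value: number }; hits: unknown[] };
      aggregations: {
        authors: Aggregation;
        domains: Aggregation;
        tags: Aggregation;
      };
    };
  };
}

// Returns the keys of the n largest buckets, largest first.
// Sorts a copy so the response object is left untouched.
function topKeys(agg: Aggregation, n: number): string[] {
  return agg.buckets
    .slice()
    .sort((a, b) => b.doc_count - a.doc_count)
    .slice(0, n)
    .map((b) => b.key);
}

// Hypothetical helper: condenses a search response into a small summary.
function summarize(res: SearchResponse, n: number = 3) {
  const { hits, aggregations } = res.data.result;
  return {
    total: hits.total.value,
    topAuthors: topKeys(aggregations.authors, n),
    topDomains: topKeys(aggregations.domains, n),
    topTags: topKeys(aggregations.tags, n),
  };
}
```

A summary like this could drive the filter sidebar of a search page, since the aggregations arrive with every response.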

Example Requests

Basic Search:

{
  queryString: "bitcoin lightning network",
  size: 10,
  page: 1
}

Filtered Search:

{
  queryString: "taproot",
  filterFields: [
    { field: "domain", value: "https://delvingbitcoin.org" },
    { field: "authors", value: "Pieter Wuille" }
  ],
  size: 10,
  page: 1
}

Sorted Search:

{
  queryString: "segwit",
  sortFields: [
    { field: "indexed_at", value: "desc" }
  ],
  filterFields: [
    { field: "tags", value: "cryptography" }
  ],
  size: 20,
  page: 2
}
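The request bodies above could be sent from TypeScript roughly as follows. This is a minimal sketch under two assumptions that the page does not confirm: that the API is served from `https://bitcoinsearch.xyz`, and that the body is sent as JSON with a `Content-Type: application/json` header. `buildSearchCall`, `search`, and `FetchLike` are hypothetical names introduced for illustration.

```typescript
// Types transcribed from the documented request parameters.
interface FilterField {
  field: string;
  value: string;
}

interface SortField {
  field: string;
  value: "asc" | "desc";
}

interface SearchRequest {
  queryString: string;
  size: number;
  page: number;
  filterFields?: FilterField[];
  sortFields?: SortField[];
}

interface CallInit {
  method: string;
  headers: Record<string, string>;
  body: string;
}

// ASSUMPTION: host name; the documentation only gives the path.
const BASE_URL = "https://bitcoinsearch.xyz";

// Builds the URL and fetch options without sending anything,
// so the payload can be inspected or logged first.
function buildSearchCall(req: SearchRequest): { url: string; init: CallInit } {
  return {
    url: `${BASE_URL}/api/elasticSearchProxy/search`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" }, // ASSUMPTION: JSON body
      body: JSON.stringify(req),
    },
  };
}

// Minimal structural type so any fetch-compatible function can be injected.
type FetchLike = (
  url: string,
  init: CallInit
) => Promise<{ json(): Promise<unknown> }>;

// Sends the request and returns the parsed JSON response.
async function search(req: SearchRequest, fetchImpl: FetchLike): Promise<unknown> {
  const { url, init } = buildSearchCall(req);
  const res = await fetchImpl(url, init);
  return res.json();
}
```

In a runtime that provides a global `fetch`, the filtered example would be sent with `search({ queryString: "taproot", filterFields: [{ field: "authors", value: "Pieter Wuille" }], size: 10, page: 1 }, fetch)`. Injecting the fetch function keeps the sketch testable without network access.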

Special Behaviors

  • Automatically excludes "combined-summary" type documents
  • Includes aggregations for authors, domains, and tags in every response
  • Uses term-level queries for exact matching in filters
  • Supports pagination through size and page parameters
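For the pagination behavior, a client can derive the number of pages from `hits.total.value` and the `size` it requested. The sketch below assumes pages are numbered from 1, as the example requests suggest; the page does not state this explicitly. `pageCount` and `hasNextPage` are hypothetical helper names.

```typescript
// Number of pages needed to show totalHits results at `size` results per page.
function pageCount(totalHits: number, size: number): number {
  if (size <= 0) {
    throw new RangeError("size must be positive");
  }
  return Math.ceil(totalHits / size);
}

// ASSUMPTION: pages are 1-based, so the last page number equals pageCount.
function hasNextPage(page: number, totalHits: number, size: number): boolean {
  return page < pageCount(totalHits, size);
}
```

For example, 45 hits at `size: 20` span three pages, so a request for `page: 2` still has one page after it.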

/sources

Path: /api/elasticSearchProxy/sources
Method: POST
Description: Returns aggregated information about all data sources in the system.

Response Format

{
  success: true,
  data: {
    result: [
      {
        domain: string,              // Domain URL
        documentCount: number,       // Total documents from this source
        lastScraped: number,        // Timestamp of last indexing
        hasSummaries: boolean,      // Whether source has combined summaries
        hasThreads: boolean         // Whether source has threaded discussions
      }
    ]
  }
}

/sourceDocuments

Path: /api/elasticSearchProxy/sourceDocuments
Method: POST
Description: Retrieves documents for a specific source with support for different view modes and pagination.

Request Parameters

{
  domain: string,          // Required: Source domain
  page?: number,           // Optional: Page number (default: 1)
  viewMode?: string,       // Optional: View mode (flat/threaded/summaries)
  threadsPage?: number     // Optional: Page number for threaded view
}

View Modes

The viewMode parameter determines which documents are returned:

  • Flat view: Shows all documents except combined summaries
  • Threaded view: Groups documents by thread_url (if available)
  • Summaries view: Shows only combined-summary documents (if available)
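The view modes above can be modeled as a small union type on the client. The sketch below assumes that the `hasThreads` and `hasSummaries` flags returned by /sources indicate whether the threaded and summaries views are available for a source; that link is an inference, not something this page states. Falling back to the flat view is a choice made for this sketch, and `availableViewModes` and `buildSourceDocumentsRequest` are hypothetical names.

```typescript
type ViewMode = "flat" | "threaded" | "summaries";

// Subset of the /sources entry that this sketch relies on.
interface SourceCapabilities {
  hasSummaries: boolean;
  hasThreads: boolean;
}

// Request body transcribed from the documented parameters.
interface SourceDocumentsRequest {
  domain: string;
  page?: number;
  viewMode?: ViewMode;
  threadsPage?: number;
}

// ASSUMPTION: the flat view always exists; the other two depend on the flags.
function availableViewModes(source: SourceCapabilities): ViewMode[] {
  const modes: ViewMode[] = ["flat"];
  if (source.hasThreads) modes.push("threaded");
  if (source.hasSummaries) modes.push("summaries");
  return modes;
}

// Builds a request body, falling back to "flat" when the wanted
// view mode is not available for the source.
function buildSourceDocumentsRequest(
  domain: string,
  source: SourceCapabilities,
  wanted: ViewMode,
  page: number = 1
): SourceDocumentsRequest {
  const viewMode: ViewMode =
    availableViewModes(source).indexOf(wanted) !== -1 ? wanted : "flat";
  return { domain, page, viewMode };
}
```

The sketch leaves `threadsPage` unset because the page does not describe how it interacts with `page`.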

Response Format

{
  success: true,
  data: {
    documents: [
      {
        title: string,
        url: string,
        indexed_at: string,
        thread_url?: string,
        type?: string
      }
    ],
    total: number,
    viewMode: string
  }
}

/getDocumentContent

Path: /api/elasticSearchProxy/getDocumentContent
Method: POST
Description: Retrieves the full content and metadata of a specific document by its URL.

Request Parameters

{
  url: string    // Required: Full URL of the document to retrieve
}

Response Format

{
  success: true,
  data: {
    title: string,
    url: string,
    body: string,
    body_type: string,
    domain: string,
    indexed_at: string,
    created_at?: string,
    authors?: string[],
    tags?: string[],
    thread_url?: string,
    type?: string,
    summary?: string,
    body_formatted?: string
  }
}

Example Request

{
  url: "https://delvingbitcoin.org/t/htlc-endorsement-for-lightning-channels/171"
}

Example Response

{
  success: true,
  data: {
    title: "HTLC endorsement for Lightning channels",
    url: "https://delvingbitcoin.org/t/htlc-endorsement-for-lightning-channels/171",
    body: "## Introduction\nThis post discusses...",
    body_type: "markdown",
    domain: "delvingbitcoin.org",
    indexed_at: "2024-01-15T10:30:00Z",
    created_at: "2024-01-15T10:00:00Z",
    authors: ["John Doe"],
    tags: ["lightning", "htlc", "channels"],
    type: "original_post"
  }
}

Error Handling

  • Returns 404 if the document is not found
  • Returns 400 for invalid requests

Error responses use the following format:

{
  success: false,
  message: "Document not found" | "URL is required" | "An error occurred while fetching document content"
}

This endpoint is particularly useful for retrieving the complete content of a document after finding it through the search or source documents endpoints.
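On the client side, the success and error shapes of /getDocumentContent can be combined into a discriminated union on `success`. The sketch below maps an HTTP status and a parsed body to a small outcome type. The `Outcome` type and the `interpret` function are hypothetical, and treating any other failed status as a generic error is an assumption, since only 404 and 400 are documented.

```typescript
// Transcribed from the documented response format.
interface DocumentContent {
  title: string;
  url: string;
  body: string;
  body_type: string;
  domain: string;
  indexed_at: string;
  created_at?: string;
  authors?: string[];
  tags?: string[];
  thread_url?: string;
  type?: string;
  summary?: string;
  body_formatted?: string;
}

// The `success` flag discriminates between the two documented shapes.
type DocumentResponse =
  | { success: true; data: DocumentContent }
  | { success: false; message: string };

type Outcome =
  | { kind: "ok"; doc: DocumentContent }
  | { kind: "not_found" }
  | { kind: "bad_request"; message: string }
  | { kind: "error"; message: string };

// Maps an HTTP status and parsed body to an outcome the caller can switch on.
function interpret(status: number, body: DocumentResponse): Outcome {
  if (body.success) {
    return { kind: "ok", doc: body.data };
  }
  if (status === 404) {
    return { kind: "not_found" };
  }
  if (status === 400) {
    return { kind: "bad_request", message: body.message };
  }
  // ASSUMPTION: any other failure is reported as a generic error.
  return { kind: "error", message: body.message };
}
```

A caller can then `switch` on `kind` and let the compiler check that every case is handled.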