Refactors #6

@ta4tsering

Description

@ta4tsering
  • set base metadata with the set_base method
  • update PechaMetadata to support the metadata specs below
  • pedurma list
  • lotsawa list
  • google ocr list
  • bdrc etext list
  • rest of pecha

New Metadata specs

id: [pecha_id](https://github.com/OpenPecha-dev/openpecha-toolkit/blob/16c0bc7a3bbf61b25f3f79fc8fd662c51c4a2699/openpecha/core/ids.py#L13)
source: https://library.bdrc.io
source_file: <release_asset_url>
initial_creation_type: ocr
imported: 2021-12-16 15:34:06.604704
last_modified: 2021-12-16 15:34:06.604704
parser: https://github.com/OpenPecha-dev/openpecha-toolkit/blob/231bba39dd1ba393320de82d4d08a604aabe80fc/openpecha/formatters/google_orc.py
source_metadata:
  id: bdr:W1PD90121
  title: མའོ་རྫོང་གི་ས་ཆའི་མིང་བཏུས།
  author: author_name
  base:
    f3c9:
      image_group_id: I1PD90137
      title: Volume 1 of mao wen qiang zu zi zhi xian di ming lu
      total_pages: 220
      order: 1
      base_file: f3c9.txt
  access: http://purl.bdrc.io/admindata/AccessOpen
  restrictedInChina: false
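
The spec above could be modelled roughly as follows. This is only a sketch using standard-library dataclasses; the class names `BaseMetadata`, `SourceMetadata`, and `PechaMetadataSketch` and the helper `example_metadata` are hypothetical and are not the toolkit's actual `PechaMetadata` implementation. Placeholder values from the spec (`pecha_id`, `<release_asset_url>`, `author_name`) are kept as placeholders.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, Optional


@dataclass
class BaseMetadata:
    """One entry under source_metadata.base, keyed by base name (hypothetical)."""
    image_group_id: str
    title: str
    total_pages: int
    order: int
    base_file: str


@dataclass
class SourceMetadata:
    """Metadata copied from the source catalogue (hypothetical)."""
    id: str
    title: str
    author: Optional[str] = None
    access: Optional[str] = None
    restrictedInChina: bool = False
    base: Dict[str, BaseMetadata] = field(default_factory=dict)


@dataclass
class PechaMetadataSketch:
    """Top-level fields of the proposed spec (hypothetical, not PechaMetadata)."""
    id: str
    source: str
    source_file: str
    initial_creation_type: str
    imported: datetime
    last_modified: datetime
    parser: str
    source_metadata: SourceMetadata


def example_metadata() -> PechaMetadataSketch:
    """Build the example from the spec, keeping its placeholders as-is."""
    timestamp = datetime(2021, 12, 16, 15, 34, 6, 604704)
    source = SourceMetadata(
        id="bdr:W1PD90121",
        title="མའོ་རྫོང་གི་ས་ཆའི་མིང་བཏུས།",
        author="author_name",  # placeholder from the spec
        access="http://purl.bdrc.io/admindata/AccessOpen",
        restrictedInChina=False,
    )
    source.base["f3c9"] = BaseMetadata(
        image_group_id="I1PD90137",
        title="Volume 1 of mao wen qiang zu zi zhi xian di ming lu",
        total_pages=220,
        order=1,
        base_file="f3c9.txt",
    )
    return PechaMetadataSketch(
        id="pecha_id",  # placeholder from the spec
        source="https://library.bdrc.io",
        source_file="<release_asset_url>",  # placeholder from the spec
        initial_creation_type="ocr",
        imported=timestamp,
        last_modified=timestamp,
        parser="https://github.com/OpenPecha-dev/openpecha-toolkit/blob/231bba39dd1ba393320de82d4d08a604aabe80fc/openpecha/formatters/google_orc.py",
        source_metadata=source,
    )
```

Keying `base` by base name keeps each volume's metadata next to the file it describes, which matches the `f3c9` / `f3c9.txt` pairing in the spec.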

Setting a new base:

base_name = pecha.set_base(
    "base content",
    metadata={
        "id": "id",
        "title": "title",
        "total_pages": 220,
        "order": 1,
    },
)
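
One way `set_base` might behave internally is sketched below. The class `PechaSketch`, its attributes, and the 4-character hex naming scheme (inferred from the `f3c9` example in the spec) are assumptions for illustration, not the toolkit's real code.

```python
import secrets
from typing import Any, Dict, Optional


class PechaSketch:
    """Hypothetical stand-in for a pecha object; not the toolkit's class."""

    def __init__(self) -> None:
        self.bases: Dict[str, str] = {}
        self.base_metadata: Dict[str, Dict[str, Any]] = {}

    def set_base(self, content: str, metadata: Optional[Dict[str, Any]] = None) -> str:
        """Store base text and its metadata, returning the new base name."""
        # Assumption: base names are 4 hex characters, as in the "f3c9" example.
        base_name = secrets.token_hex(2)
        while base_name in self.bases:
            base_name = secrets.token_hex(2)

        self.bases[base_name] = content

        # Copy the caller's dict so later mutation does not leak in,
        # then record which file holds this base.
        entry = dict(metadata or {})
        entry["base_file"] = f"{base_name}.txt"
        self.base_metadata[base_name] = entry
        return base_name


pecha = PechaSketch()
base_name = pecha.set_base(
    "base content",
    metadata={"id": "id", "title": "title", "total_pages": 220, "order": 1},
)
```

Returning the generated name lets the caller refer to the base later, as in the call shown in this issue.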

Metadata documentation in the parser:

class GoogleOCRFormatter(BaseFormatter):
    """
    OpenPecha Formatter for Google OCR JSON output of scanned pecha.

    Notes:
        metadata description:
            id: <>
            title: <>
    """
