Local scrape_file failing for some PDFs with out-of-memory errors #36

@camrail

Hi there,

I've just switched over to the local version from the API, and I'm hitting a memory issue that kills my Celery worker (or shell session) when I run scrape_file. On some PDFs the worker dies with "OOMKilled": true, even though it should have plenty of resources with a ~22 GB limit.

It reliably fails on some PDFs and reliably succeeds on others.

Here is my setup code:

from openai import OpenAI
from thepipe.scraper import scrape_file

client = OpenAI()
results = scrape_file(
    filepath=local_file_path, openai_client=client, model="gpt-4o"
)

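In case it helps with reproduction, this is roughly how I've been comparing memory per file. It's only a sketch: peak_rss_mb, measure_scrape, and the sample paths are my own placeholders, and ru_maxrss is Unix-only (which matches my containerized setup).

import resource

from openai import OpenAI
from thepipe.scraper import scrape_file

client = OpenAI()

def peak_rss_mb():
    # High-water-mark RSS of this process (ru_maxrss is KB on Linux, bytes on macOS)
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024

def measure_scrape(path):
    before = peak_rss_mb()
    results = scrape_file(filepath=path, openai_client=client, model="gpt-4o")
    print(f"{path}: peak RSS {before:.0f} MB -> {peak_rss_mb():.0f} MB")
    return results

measure_scrape("succeeds.pdf")  # placeholder for a PDF that works
measure_scrape("fails.pdf")     # placeholder for a PDF that gets OOM-killed
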
thepipe-api==1.5.8
litellm==1.61.1
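
As a stopgap I'm considering isolating each scrape in a short-lived child process so an OOM kill only loses that one call rather than the whole worker. A minimal sketch (scrape_isolated and _scrape_worker are my own names, and it assumes scrape_file's results pickle cleanly):

import multiprocessing as mp

from openai import OpenAI
from thepipe.scraper import scrape_file

def _scrape_worker(path, queue):
    client = OpenAI()
    queue.put(scrape_file(filepath=path, openai_client=client, model="gpt-4o"))

def scrape_isolated(path, timeout=600):
    # If the child gets OOM-killed, queue.get raises queue.Empty after the
    # timeout instead of the parent Celery worker dying with it.
    queue = mp.Queue()
    proc = mp.Process(target=_scrape_worker, args=(path, queue))
    proc.start()
    try:
        return queue.get(timeout=timeout)
    finally:
        proc.join()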

Thank you 🙏
