Local scrape_file failing for some PDFs with out-of-memory errors #36

@camrail

Hi there,

I've just switched over to the local version from the API, and I'm hitting a memory issue that kills my Celery worker (or shell session) when I run scrape_file. On some PDFs the worker dies with "OOMKilled": true, even though it should have plenty of resources with a ~22 GB limit.

It reliably fails on some PDFs and reliably succeeds on others.

Here is my setup code:

from openai import OpenAI
from thepipe.scraper import scrape_file

client = OpenAI()
results = scrape_file(
    filepath=local_file_path, openai_client=client, model="gpt-4o"
)

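In case it helps with reproduction, this is roughly how I've been comparing memory per file. It's only a sketch: peak_rss_mb, measure_scrape, and the sample paths are my own placeholders, and ru_maxrss is Unix-only (which matches my containerized setup).

import resource

from openai import OpenAI
from thepipe.scraper import scrape_file

client = OpenAI()

def peak_rss_mb():
    # High-water-mark RSS of this process (ru_maxrss is KB on Linux, bytes on macOS)
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024

def measure_scrape(path):
    before = peak_rss_mb()
    results = scrape_file(filepath=path, openai_client=client, model="gpt-4o")
    print(f"{path}: peak RSS {before:.0f} MB -> {peak_rss_mb():.0f} MB")
    return results

measure_scrape("succeeds.pdf")  # placeholder for a PDF that works
measure_scrape("fails.pdf")     # placeholder for a PDF that gets OOM-killed
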
thepipe-api==1.5.8
litellm==1.61.1
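
As a stopgap I'm considering isolating each scrape in a short-lived child process so an OOM kill only loses that one call rather than the whole worker. A minimal sketch (scrape_isolated and _scrape_worker are my own names, and it assumes scrape_file's results pickle cleanly):

import multiprocessing as mp

from openai import OpenAI
from thepipe.scraper import scrape_file

def _scrape_worker(path, queue):
    client = OpenAI()
    queue.put(scrape_file(filepath=path, openai_client=client, model="gpt-4o"))

def scrape_isolated(path, timeout=600):
    # If the child gets OOM-killed, queue.get raises queue.Empty after the
    # timeout instead of the parent Celery worker dying with it.
    queue = mp.Queue()
    proc = mp.Process(target=_scrape_worker, args=(path, queue))
    proc.start()
    try:
        return queue.get(timeout=timeout)
    finally:
        proc.join()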

Thank you 🙏
