Can't parse PDF #6

@whitsey

I have a PDF file which I am extracting from an email attachment.
https://www.dropbox.com/s/yttcwtq1ii1nw4s/4267_STORE01_reports_20885337555325238928.pdf?dl=0

Whenever I try to parse() the file, it throws an exception, dumping the entire PDF contents:

Error processing PDF: PDF file not found: %PDF-1.3 %���� 1 0 obj << /Type /Pages /Count 39 /Kids [ 5 0 R 8 0 R 11 0 R 14 0 R 17 0 R 20 0 R 23 0 R 26 0 R 29 0 R 32 0 R 35 0 R 38 0 R 41 0 R 44 0 R 47 0 R 50 0 R 53 0 R 56 0 R 59 0 R 62 0 R 65 0 R 68 0 R 71 0 R 74 0 R 77 0 R 80 0 R 83 0 R 86 0 R 89 0 R 92 0 R 95 0 R 98 0 R 101 0 R 104 0 R 107 0 R 110 0 R 113 0 R 116 0 R 119 0 R ] /MediaBox [ 0 0 594 850 ] /Rotate 90 >> endobj 2 0 obj << /Type /Catalog /Pages 1 0 R >> endobj 3 0 obj << /Author (pctopdf) /Subject (/usr/Utools/PCL.templates/BASIC_report) /CreationDate (D:20250306070055-1700) /Creator

if ( $data = imap_fetchbody($mbox, $mid, $partno) ) {

  if ($partObject->encoding==3)$data=base64_decode($data);
  if ($partObject->encoding==4)$data=quoted_printable_decode($data);

  try {
    $parser = new \Wrseward\PdfParser\Pdf\PdfToTextParser(sys_get_temp_dir());
    $parser->parse($data); // <--- Throws exception here
    $pdfText = $parser->text();
  } 
  catch (Exception $e) 
  {
    error_log("Error processing PDF: " . $e->getMessage());
  }
}
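
The message "PDF file not found: %PDF-1.3 ..." suggests that parse() treats its argument as a file path rather than as the PDF bytes, so the decoded attachment probably needs to be written to disk first. Below is a minimal sketch of that workaround. `withTempPdf` is a hypothetical helper, not part of the library. The usage comment also assumes that the constructor expects the path to the pdftotext binary rather than a temp directory, and `/usr/bin/pdftotext` is only a guess at where it lives; both should be checked against the library's README.

```php
<?php
// Hypothetical helper (not part of the library): writes raw PDF bytes to a
// temporary file, hands the file path to $callback, then deletes the file.
function withTempPdf(string $data, callable $callback)
{
    $path = tempnam(sys_get_temp_dir(), 'pdf_');
    if ($path === false) {
        throw new RuntimeException('Could not create a temporary file');
    }

    try {
        if (file_put_contents($path, $data) === false) {
            throw new RuntimeException("Could not write to $path");
        }
        // The callback receives a real path on disk, not the PDF contents.
        return $callback($path);
    } finally {
        // Runs whether the callback returned or threw.
        unlink($path);
    }
}

// Usage sketch. Assumptions: parse() takes a file path, and the constructor
// takes the location of the pdftotext binary (the path below is a guess).
//
// $pdfText = withTempPdf($data, function (string $path) {
//     $parser = new \Wrseward\PdfParser\Pdf\PdfToTextParser('/usr/bin/pdftotext');
//     $parser->parse($path);
//     return $parser->text();
// });
```

If the callback throws, the exception still propagates to the existing try/catch, and the temporary file is removed either way.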
