Skip to content

Security audit: pickle RCE, path traversal, key leakage, and other findings #55

@hummat

Description

@hummat

Security Audit Summary

Manual audit of the codebase identified 5 verified security issues. All findings confirmed against source; severity adjusted where the audit overstated practical risk.


1. [CRITICAL] Unsafe pickle deserialization (RCE)

pickle.loads() on zlib-decompressed files.zip from user-influenced paths.

  • paperpipe/paperqa.py:184_paperqa_load_index_files_map
  • paperpipe/paperqa_mcp_server.py:135_load_files_zip_map
  • Called from CLI (cli/ask.py:400, cli/index.py:369) and MCP server (paperqa_mcp_server.py:255)

A crafted files.zip placed in any index directory achieves arbitrary code execution. The index path is constructed from CLI flags or MCP tool parameters.

Fix: Replace pickle.loads with a safe format (JSON, msgpack) or use RestrictedUnpickler that only allows dict/str types.


2. [HIGH] Path traversal in index name handling

index_name is used unsanitized in path construction:

  • paperpipe/paperqa.py:178index_directory / index_name / "files.zip"
  • paperpipe/paperqa_mcp_server.py:189-191index_root / index_name / "paperpipe_meta.json" with mkdir(parents=True) + write_text()

An index_name containing ../ allows reading/writing outside the intended index root. Especially relevant in the MCP server context where index_name originates from LLM tool calls, not direct user input.

Fix: Validate that the resolved path is a child of the index root (e.g., resolved.relative_to(index_root)).


3. [MEDIUM] File deletion outside managed staging on crash recovery

_paperqa_find_crashing_file (paperqa.py:142-143) returns absolute paths verbatim when crashing_doc contains one. The guard at cli/ask.py:732 checks that paper_dir equals the managed staging directory — but the returned file f can point anywhere. The guard is on the wrong variable.

Practical risk is lower than it appears: crashing_doc is parsed from PaperQA2's stdout (cli/ask.py:710-712), not direct user input. Exploitation requires PaperQA2 to emit a crafted absolute path in its "New file to index:" output.

Fix: After resolving f, verify f.resolve().is_relative_to(managed_staging_dir.resolve()) before unlinking.


4. [MEDIUM] API key leakage in logs and error output

The full LEANN command including --api-key is exposed in two ways:

  • paperpipe/leann.py:601, 707debug("Running LEANN: %s", shlex.join(cmd)) logs the key at debug level
  • paperpipe/leann.py:605echo_error(f"Command: {shlex.join(cmd)}") prints the key to stderr on failure

Key is appended to the command at leann.py:687-688.

Fix: Redact --api-key <value> from commands before logging/printing.


5. [LOW] Unbounded decompression / download size (DoS)

No size limits on:

  • zlib.decompress(path.read_bytes()) at paperqa.py:183 and paperqa_mcp_server.py:134 (zip bomb)
  • f.write(response.content) at paper.py:141 and tarball extraction at paper.py:147

Low practical risk for a CLI tool processing local index files and arXiv downloads. Slightly more relevant in MCP server mode.

Fix (defense-in-depth): Use zlib.decompressobj with maxlength, add Content-Length checks on downloads, limit tar extraction size.


Suggested fix priority

  1. P0: Pickle deserialization (Add logging controls and route CLI output through logger #1) + path traversal (Add Dependabot configuration for version updates #2) — highest impact, straightforward fix
  2. P1: Staging boundary check (Bump the all-actions group with 4 updates #3) + secret redaction (feat: Add TL;DR generation and import/export functionality #4)
  3. P2: Size limits (feat: v1.0.0 - BibTeX import and Semantic Scholar support #5)

All tests pass (make check, 445 tests). Bandit and pip-audit were not available in the audit environment.

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't workingsecurity

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions