-
-
Notifications
You must be signed in to change notification settings - Fork 1
Description
Security Audit Summary
Manual audit of the codebase identified 5 verified security issues. All findings confirmed against source; severity adjusted where the audit overstated practical risk.
1. [CRITICAL] Unsafe pickle deserialization (RCE)
pickle.loads() on zlib-decompressed files.zip from user-influenced paths.
paperpipe/paperqa.py:184—_paperqa_load_index_files_mappaperpipe/paperqa_mcp_server.py:135—_load_files_zip_map- Called from CLI (
cli/ask.py:400,cli/index.py:369) and MCP server (paperqa_mcp_server.py:255)
A crafted files.zip placed in any index directory achieves arbitrary code execution. The index path is constructed from CLI flags or MCP tool parameters.
Fix: Replace pickle.loads with a safe format (JSON, msgpack) or use RestrictedUnpickler that only allows dict/str types.
2. [HIGH] Path traversal in index name handling
index_name is used unsanitized in path construction:
paperpipe/paperqa.py:178—index_directory / index_name / "files.zip"paperpipe/paperqa_mcp_server.py:189-191—index_root / index_name / "paperpipe_meta.json"withmkdir(parents=True)+write_text()
An index_name containing ../ allows reading/writing outside the intended index root. Especially relevant in the MCP server context where index_name originates from LLM tool calls, not direct user input.
Fix: Validate that the resolved path is a child of the index root (e.g., resolved.relative_to(index_root)).
3. [MEDIUM] File deletion outside managed staging on crash recovery
_paperqa_find_crashing_file (paperqa.py:142-143) returns absolute paths verbatim when crashing_doc contains one. The guard at cli/ask.py:732 checks that paper_dir equals the managed staging directory — but the returned file f can point anywhere. The guard is on the wrong variable.
Practical risk is lower than it appears: crashing_doc is parsed from PaperQA2's stdout (cli/ask.py:710-712), not direct user input. Exploitation requires PaperQA2 to emit a crafted absolute path in its "New file to index:" output.
Fix: After resolving f, verify f.resolve().is_relative_to(managed_staging_dir.resolve()) before unlinking.
4. [MEDIUM] API key leakage in logs and error output
The full LEANN command including --api-key is exposed in two ways:
paperpipe/leann.py:601, 707—debug("Running LEANN: %s", shlex.join(cmd))logs the key at debug levelpaperpipe/leann.py:605—echo_error(f"Command: {shlex.join(cmd)}")prints the key to stderr on failure
Key is appended to the command at leann.py:687-688.
Fix: Redact --api-key <value> from commands before logging/printing.
5. [LOW] Unbounded decompression / download size (DoS)
No size limits on:
zlib.decompress(path.read_bytes())atpaperqa.py:183andpaperqa_mcp_server.py:134(zip bomb)f.write(response.content)atpaper.py:141and tarball extraction atpaper.py:147
Low practical risk for a CLI tool processing local index files and arXiv downloads. Slightly more relevant in MCP server mode.
Fix (defense-in-depth): Use zlib.decompressobj with maxlength, add Content-Length checks on downloads, limit tar extraction size.
Suggested fix priority
- P0: Pickle deserialization (Add logging controls and route CLI output through logger #1) + path traversal (Add Dependabot configuration for version updates #2) — highest impact, straightforward fix
- P1: Staging boundary check (Bump the all-actions group with 4 updates #3) + secret redaction (feat: Add TL;DR generation and import/export functionality #4)
- P2: Size limits (feat: v1.0.0 - BibTeX import and Semantic Scholar support #5)
All tests pass (make check, 445 tests). Bandit and pip-audit were not available in the audit environment.