
Conversation


@hummat hummat commented Feb 8, 2026

Summary

Fix all 5 security vulnerabilities reported in #55 (code audit findings), ordered by severity.

Closes: #55

Changes

  • [CRITICAL] Restricted pickle deserialization: _RestrictedUnpickler + _safe_unpickle replace raw pickle.loads() in paperqa.py and paperqa_mcp_server.py. Only basic Python builtins (dict, str, int, etc.) are allowed, blocking arbitrary code execution from crafted index files.
  • [HIGH] Path traversal validation: _validate_index_name() rejects .., /, \, and null bytes. Applied at 7 path-construction entry points across paperqa.py, paperqa_mcp_server.py, and leann.py.
  • [MEDIUM] Staging boundary check: _paperqa_find_crashing_file() now verifies all returned paths are within paper_directory via is_relative_to(). Additional guard in cli/ask.py before f.unlink().
  • [MEDIUM] API key redaction: _redact_cmd() in leann.py masks --api-key / --embedding-api-key values with *** in 3 debug/error log locations.
  • [LOW] Bounded decompression and download sizes: _safe_zlib_decompress() (100 MB limit), _MAX_DOWNLOAD_SIZE (500 MB) check on Content-Length + body, _MAX_TAR_MEMBER_SIZE / _MAX_TAR_TOTAL_SIZE for tarball extraction.
  • 25 new tests covering all 5 fixes across test_paperqa.py, test_leann.py, test_paper.py.
  • 1 existing test fix — added headers attribute to MockResponse in test_figure_extraction.py (required by new download size check).

Type of Change

  • Bug fix

Testing

  • Ran make check (required) — 470 passed, 0 failed, lint clean, 75.80% coverage
  • Added/updated tests — 25 new security tests
  • Tested CLI manually

Checklist

  • I have read the Contributing Guidelines
  • My code follows the existing style (ruff format + pyright)
  • I have updated documentation if needed (README, AGENT_INTEGRATION.md)
  • CLI changes are reflected in help text and docs

…limits

Address all 5 security findings from issue #55:

1. [CRITICAL] Restricted pickle deserialization via _RestrictedUnpickler
   (blocks arbitrary code execution from crafted index files)
2. [HIGH] Path traversal validation on index_name at all entry points
3. [MEDIUM] Staging boundary check before file deletion in crash handler
4. [MEDIUM] API key redaction in LEANN debug/error log output
5. [LOW] Bounded zlib decompression (100MB), download size (500MB),
   and tarball extraction limits (100MB/file, 500MB total)

Includes 25 new tests covering all fixes.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

@chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: fe427a03c3


Comment on lines 68 to 71:

    for i in range(0, len(data), block_size):
        chunk = dobj.decompress(data[i : i + block_size])
        total += len(chunk)
        if total > max_size:


P1: Enforce zlib size limit before allocating output

The new _safe_zlib_decompress still calls dobj.decompress(...) without a max_length, so a crafted zlib stream that expands massively in a single call will allocate the full output before the total > max_size check runs. That means a decompression bomb can still exhaust memory even with the limit in place when reading untrusted files.zip contents. To actually bound memory, pass a remaining-length cap (e.g., dobj.decompress(chunk, max_size - total)) and handle unconsumed_tail/unused_data accordingly. The same issue appears in paperpipe/paperqa_mcp_server.py’s _safe_zlib_decompress.


- Validate index_name at entry of get_pqa_index_status before path construction
- Use decompressobj max_length to truly bound memory during decompression
- Guard int(Content-Length) against malformed headers (ValueError)
- Use validated index_name in _write_index_metadata path construction

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@hummat hummat merged commit 9eb7a7a into main Feb 8, 2026
2 checks passed
@hummat hummat deleted the fix/security-hardening branch February 8, 2026 19:18

claude bot commented Feb 8, 2026


Code review

No issues found. Checked for bugs and CLAUDE.md compliance.




Development

Successfully merging this pull request may close these issues.

Security audit: pickle RCE, path traversal, key leakage, and other findings
