Security audit: pickle RCE, path traversal, key leakage, and other findings

## Security Audit Summary

Manual audit of the codebase identified 5 verified security issues. All findings confirmed against source; severity adjusted where the audit overstated practical risk.

---

### 1. [CRITICAL] Unsafe pickle deserialization (RCE)

`pickle.loads()` on zlib-decompressed `files.zip` from user-influenced paths.

- **`paperpipe/paperqa.py:184`** — `_paperqa_load_index_files_map`
- **`paperpipe/paperqa_mcp_server.py:135`** — `_load_files_zip_map`
- Called from CLI (`cli/ask.py:400`, `cli/index.py:369`) and MCP server (`paperqa_mcp_server.py:255`)

A crafted `files.zip` placed in any index directory achieves arbitrary code execution. The index path is constructed from CLI flags or MCP tool parameters.

**Fix:** Replace `pickle.loads` with a safe format (JSON, msgpack) or use `RestrictedUnpickler` that only allows `dict`/`str` types.

---

### 2. [HIGH] Path traversal in index name handling

`index_name` is used unsanitized in path construction:

- **`paperpipe/paperqa.py:178`** — `index_directory / index_name / "files.zip"`
- **`paperpipe/paperqa_mcp_server.py:189-191`** — `index_root / index_name / "paperpipe_meta.json"` with `mkdir(parents=True)` + `write_text()`

An `index_name` containing `../` allows reading/writing outside the intended index root. Especially relevant in the MCP server context where `index_name` originates from LLM tool calls, not direct user input.

**Fix:** Validate that the resolved path is a child of the index root (e.g., `resolved.relative_to(index_root)`).

---

### 3. [MEDIUM] File deletion outside managed staging on crash recovery

`_paperqa_find_crashing_file` (`paperqa.py:142-143`) returns absolute paths verbatim when `crashing_doc` contains one. The guard at `cli/ask.py:732` checks that `paper_dir` equals the managed staging directory — but the returned file `f` can point anywhere. The guard is on the wrong variable.

Practical risk is lower than it appears: `crashing_doc` is parsed from PaperQA2's stdout (`cli/ask.py:710-712`), not direct user input. Exploitation requires PaperQA2 to emit a crafted absolute path in its "New file to index:" output.

**Fix:** After resolving `f`, verify `f.resolve().is_relative_to(managed_staging_dir.resolve())` before unlinking.

---

### 4. [MEDIUM] API key leakage in logs and error output

The full LEANN command including `--api-key` is exposed in two ways:

- **`paperpipe/leann.py:601, 707`** — `debug("Running LEANN: %s", shlex.join(cmd))` logs the key at debug level
- **`paperpipe/leann.py:605`** — `echo_error(f"Command: {shlex.join(cmd)}")` prints the key to stderr on failure

Key is appended to the command at `leann.py:687-688`.

**Fix:** Redact `--api-key <value>` from commands before logging/printing.

---

### 5. [LOW] Unbounded decompression / download size (DoS)

No size limits on:

- `zlib.decompress(path.read_bytes())` at `paperqa.py:183` and `paperqa_mcp_server.py:134` (zip bomb)
- `f.write(response.content)` at `paper.py:141` and tarball extraction at `paper.py:147`

Low practical risk for a CLI tool processing local index files and arXiv downloads. Slightly more relevant in MCP server mode.

**Fix (defense-in-depth):** Use `zlib.decompressobj` with `maxlength`, add `Content-Length` checks on downloads, limit tar extraction size.

---

## Suggested fix priority

1. **P0:** Pickle deserialization (#1) + path traversal (#2) — highest impact, straightforward fix
2. **P1:** Staging boundary check (#3) + secret redaction (#4)
3. **P2:** Size limits (#5)

All tests pass (`make check`, 445 tests). Bandit and pip-audit were not available in the audit environment.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Security audit: pickle RCE, path traversal, key leakage, and other findings #55

Security Audit Summary

1. [CRITICAL] Unsafe pickle deserialization (RCE)

2. [HIGH] Path traversal in index name handling

3. [MEDIUM] File deletion outside managed staging on crash recovery

4. [MEDIUM] API key leakage in logs and error output

5. [LOW] Unbounded decompression / download size (DoS)

Suggested fix priority

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Uh oh!

Security audit: pickle RCE, path traversal, key leakage, and other findings #55

Description

Security Audit Summary

1. [CRITICAL] Unsafe pickle deserialization (RCE)

2. [HIGH] Path traversal in index name handling

3. [MEDIUM] File deletion outside managed staging on crash recovery

4. [MEDIUM] API key leakage in logs and error output

5. [LOW] Unbounded decompression / download size (DoS)

Suggested fix priority

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions