Skip to content

Robustness and efficiency issues from code audit #56

@hummat

Description

@hummat

Non-security findings from code audit

Verified bugs, packaging issues, and efficiency improvements.


1. [BUG] _find_paper_by_source_url crashes if PAPERS_DIR doesn't exist

paperpipe/cli/papers.py:75config.PAPERS_DIR.iterdir() is called without an existence check. Raises FileNotFoundError when invoked before the directory has been bootstrapped (e.g., first run, or when called outside the normal CLI entry point).

Fix: Guard with if not config.PAPERS_DIR.exists(): return None before iterating.


2. [BUG] Temp file leak on unexpected exceptions in download_source

paperpipe/paper.py:140-201tar_path.unlink() at line 201 is outside a finally block. The try/except at line 145 only catches tarfile.ReadError. Any other exception (e.g., MemoryError, OSError from tar.getmembers()) skips cleanup and leaks the temp file.

Fix: Move tar_path.unlink() into a finally block:

try:
    ...
except tarfile.ReadError:
    ...
finally:
    tar_path.unlink(missing_ok=True)

3. [PERF] Nested O(refs × members) scan in figure extraction

paperpipe/paper.py:305-318 — For each \includegraphics reference and each candidate extension, the code iterates all tar members (tar.getmembers()). This is O(refs × candidates × members).

Fix: Build a dict of tar members keyed by normalized name once, then do O(1) lookups per candidate:

member_index = {m.name.replace("\\", "/"): m for m in tar.getmembers() if m.isfile()}
# Also index by basename for partial matching

4. [MINOR] Duplicate URL check scans all paper directories on every add

paperpipe/cli/papers.py:75-93 — Every add call iterates the full PAPERS_DIR, opening and parsing each meta.json. O(n) per add. Not a bug per se — acceptable for personal paper collections — but worth noting if the DB grows large.

Lower priority. Could be addressed later with a URL index file if needed.


5. [BUG] _setup_debug_logging adds duplicate handlers on repeated calls

paperpipe/output.py:15-21 — Every call to _setup_debug_logging() unconditionally appends a new StreamHandler. In long-lived processes (MCP server) or tests, this duplicates log output and adds overhead.

Fix: Guard against re-adding:

def _setup_debug_logging() -> None:
    if _debug_logger.handlers:
        return
    ...

6. [BUG] Packaging: build includes wrong paths, ships stale backup file

  • pyproject.toml:85-86, 97-98 — Wheel and sdist only-include reference skill (singular) and prompts, but the actual directory is skills (plural) and no prompts directory exists. Skills are missing from built distributions, so papi install skill fails post-install.
  • paperpipe/install.py:23 — Runtime installer correctly looks for skills.
  • paperpipe/cli.py.backup — Stale backup file is tracked in git and shipped in the wheel.

Fix: Change skillskills, remove prompts in pyproject.toml build targets. Run git rm paperpipe/cli.py.backup and optionally add *.backup to .gitignore.

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't working

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions