Skip to content

Conversation

@hummat
Copy link
Owner

@hummat hummat commented Feb 8, 2026

Summary

Address 6 of 7 non-security findings from code audit (issue #56). Finding 4 (O(n) URL check) deferred as acceptable at current scale.

Closes: #56

Changes

  • Guard _find_paper_by_source_url against missing PAPERS_DIR (finding 1)
  • Wrap download_source temp file in try/finally for guaranteed cleanup (finding 2)
  • Replace O(n²) figure member lookup with O(1) dict indexes (finding 3)
  • Prevent duplicate log handlers; replace stale handlers on re-init (finding 5)
  • Fix hatch build targets: "skill""skills", remove nonexistent "prompts" (finding 6a)
  • Remove stale cli.py.backup, add *.backup to .gitignore (finding 6b)
  • Handle figure basename collisions with parent-dir prefix (finding 7)
  • Deduplicate repeated \includegraphics refs to same tar member

Type of Change

  • Bug fix
  • Refactoring (no functional changes)

Testing

  • Ran make check (required)
  • Added/updated tests
  • 475 tests pass, 75.86% coverage (above 75% threshold)

New tests:

  • TestDownloadSourceTempFileCleanup — verifies temp file cleanup on OSError
  • TestSetupDebugLogging — handler dedup + stale handler replacement
  • test_deduplicates_repeated_includegraphics_refs — same figure ref'd 3× → extracted 1×
  • Updated test_handles_duplicate_figure_names_from_different_dirs — asserts 2 distinct files

Checklist

  • I have read the Contributing Guidelines
  • My code follows the existing style (ruff format + pyright)
  • I have updated documentation if needed (README, AGENT_INTEGRATION.md)
  • CLI changes are reflected in help text and docs

hummat and others added 4 commits February 8, 2026 20:29
- Guard _find_paper_by_source_url against missing PAPERS_DIR (#56 finding 1)
- Wrap download_source temp file in try/finally for guaranteed cleanup (#56 finding 2)
- Prevent duplicate log handlers in _setup_debug_logging (#56 finding 5)
- Add tests for temp file cleanup and handler deduplication

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ions

- Build O(1) lookup indexes (full path + basename) from single getmembers()
  call, replacing O(n²) inner loop (#56 finding 3)
- Handle basename collisions by prefixing with parent dir name, e.g.
  figs/plot.png + results/plot.png → plot.png + results_plot.png (#56 finding 7)
- Update test to assert both files extracted with distinct contents

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Fix hatch build targets: "skill" → "skills" (matches actual directory)
- Remove nonexistent "prompts" from build targets (#56 finding 6a)
- Remove stale paperpipe/cli.py.backup from repo (#56 finding 6b)
- Add *.backup to .gitignore

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Deduplicate figure extraction by source member to prevent inflated
  count when same \includegraphics ref appears multiple times
- Re-check dest_name uniqueness after parent-dir prefixing
- Replace stale log handlers instead of early-return guard, fixing
  ValueError on closed streams (CliRunner/captured output scenarios)
- Add tests for duplicate-ref dedup and stale-handler replacement

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@hummat hummat merged commit 48aed32 into main Feb 8, 2026
2 checks passed
@hummat hummat deleted the fix/robustness-improvements branch February 8, 2026 19:43
@claude
Copy link

claude bot commented Feb 8, 2026

No issues found. Checked for bugs and CLAUDE.md compliance.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Robustness and efficiency issues from code audit

1 participant