@martsokha
Member

No description provided.

- Add TextExtractor trait for native text extraction from documents
  - extract_text() returns ExtractedText with raw, by-page, by-region text
  - extract_text_for_page() for single page extraction
  - needs_ocr() heuristic check for scanned documents
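A minimal sketch of how a trait with these three methods might fit together. Only the method names come from the commit message; the signatures, the `ExtractedText` fields, the `MIN_CHARS_PER_PAGE` threshold, and the `PlainPages` toy type are assumptions for illustration, and by-region text is left out for brevity.

```rust
/// Hypothetical result shape: raw text plus per-page text.
#[derive(Debug, Default, Clone, PartialEq)]
pub struct ExtractedText {
    pub raw: String,
    pub pages: Vec<String>,
}

/// Assumed threshold: fewer characters per page than this suggests a scan.
pub const MIN_CHARS_PER_PAGE: usize = 16;

/// Sketch of the trait; signatures are assumed, not taken from the PR.
pub trait TextExtractor {
    /// Extracts all natively available text.
    fn extract_text(&self) -> ExtractedText;

    /// Returns the text of one page (zero-based index is an assumption).
    fn extract_text_for_page(&self, page: usize) -> Option<String> {
        self.extract_text().pages.get(page).cloned()
    }

    /// Heuristic: pages exist, but they carry almost no native text.
    fn needs_ocr(&self) -> bool {
        let text = self.extract_text();
        if text.pages.is_empty() {
            return false;
        }
        let chars: usize = text.pages.iter().map(|p| p.trim().chars().count()).sum();
        chars / text.pages.len() < MIN_CHARS_PER_PAGE
    }
}

/// Toy document type used only to exercise the trait.
pub struct PlainPages(pub Vec<String>);

impl TextExtractor for PlainPages {
    fn extract_text(&self) -> ExtractedText {
        ExtractedText {
            raw: self.0.join("\n"),
            pages: self.0.clone(),
        }
    }
}
```

With this shape, an implementor only has to provide `extract_text()`; the per-page lookup and the OCR heuristic fall out as default methods that a real format could override.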

- Add TableExtractor trait for table extraction and normalization
  - extract_tables() returns Vec<NormalizedTable>
  - NormalizedTable, NormalizedRow, NormalizedCell types
  - CellDataType inference (Text, Number, Date, Boolean, Formula, Empty)
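The cell type inference listed above could look roughly like the following. The variant names come from the commit message; the `infer_cell_type` function, its rule order, and the ISO-date check are assumptions, not the crate's actual logic.

```rust
/// Variant names as listed in the commit message.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CellDataType {
    Text,
    Number,
    Date,
    Boolean,
    Formula,
    Empty,
}

/// Hypothetical inference from a raw cell string; the rule order is an assumption.
pub fn infer_cell_type(raw: &str) -> CellDataType {
    let s = raw.trim();
    if s.is_empty() {
        return CellDataType::Empty;
    }
    // Spreadsheet-style formulas start with '='.
    if s.starts_with('=') {
        return CellDataType::Formula;
    }
    if s.eq_ignore_ascii_case("true") || s.eq_ignore_ascii_case("false") {
        return CellDataType::Boolean;
    }
    // `f64::from_str` also accepts "inf" and "NaN"; treat only finite values as numbers.
    if s.parse::<f64>().map_or(false, |v| v.is_finite()) {
        return CellDataType::Number;
    }
    if looks_like_iso_date(s) {
        return CellDataType::Date;
    }
    CellDataType::Text
}

/// Checks the shape `YYYY-MM-DD` only; it does not validate month or day ranges.
fn looks_like_iso_date(s: &str) -> bool {
    let b = s.as_bytes();
    b.len() == 10
        && b[4] == b'-'
        && b[7] == b'-'
        && b.iter()
            .enumerate()
            .all(|(i, c)| i == 4 || i == 7 || c.is_ascii_digit())
}
```

Checking `Formula` and `Boolean` before `Number` keeps the cheap string tests first; a production version would likely accept more date layouts and locale-specific number formats.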

- Add prelude.rs to nvisy-archive crate

- Refactor nvisy-core:
  - ContentData now supports Bytes or HipStr via ContentBytes enum
  - Content struct wraps ContentData + optional ContentMetadata
  - Remove from_file_extension/common_extensions from ContentKind
  - Move extension-to-kind mapping to nvisy-archive
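As a rough illustration of the nvisy-core refactor described above, the sketch below models content that is either binary or textual, wrapped together with optional metadata. The PR names `Bytes` and `HipStr` as the backing types; `Vec<u8>` and `String` stand in for them here so the example has no dependencies, and every method and field name is an assumption.

```rust
use std::str::Utf8Error;

/// Stand-in for the PR's `ContentBytes`: `Vec<u8>` replaces `Bytes`,
/// `String` replaces `HipStr`.
#[derive(Debug, Clone, PartialEq)]
pub enum ContentBytes {
    Binary(Vec<u8>),
    Text(String),
}

/// Content payload backed by either representation.
#[derive(Debug, Clone, PartialEq)]
pub struct ContentData {
    bytes: ContentBytes,
}

impl ContentData {
    pub fn from_bytes(bytes: Vec<u8>) -> Self {
        Self { bytes: ContentBytes::Binary(bytes) }
    }

    pub fn from_text(text: impl Into<String>) -> Self {
        Self { bytes: ContentBytes::Text(text.into()) }
    }

    /// Borrows the payload as raw bytes regardless of variant.
    pub fn as_bytes(&self) -> &[u8] {
        match &self.bytes {
            ContentBytes::Binary(b) => b,
            ContentBytes::Text(s) => s.as_bytes(),
        }
    }

    /// Borrows the payload as UTF-8; text needs no re-validation.
    pub fn as_str(&self) -> Result<&str, Utf8Error> {
        match &self.bytes {
            ContentBytes::Binary(b) => std::str::from_utf8(b),
            ContentBytes::Text(s) => Ok(s),
        }
    }
}

/// Hypothetical metadata; the single field is illustrative only.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct ContentMetadata {
    pub source_name: Option<String>,
}

/// Wrapper pairing the payload with optional metadata.
#[derive(Debug, Clone, PartialEq)]
pub struct Content {
    pub data: ContentData,
    pub metadata: Option<ContentMetadata>,
}
```

Keeping a dedicated text variant means content that is already known to be valid UTF-8 never has to be validated again when a parser asks for a string.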

- Add runtime documentation:
  - docs/README.md - overview and crate structure
  - docs/PIPELINE.md - processing stages with pseudocode
  - docs/DATATYPES.md - core data structures
…l files

- Add nvisy-image stub crate with ImageDocument and ImageFormat
- Add dynamic FormatRegistry with type erasure for runtime format selection
- Split nvisy-text structured formats into separate modules (xml, yaml, toml, ini)
- Use csv 1.4 crate for CSV parsing
- Use markdown 1.0 crate for Markdown parsing
- Remove write operations from nvisy-document (operation module)
- Reformat all Cargo.toml files with grouped dependencies and comments
- Standardize package section format across all crates
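The "dynamic FormatRegistry with type erasure" item above can be pictured with boxed trait objects: each concrete format is stored behind `Box<dyn DocumentFormat>` and chosen at runtime by file extension. This is a sketch under assumptions; the trait methods, the `String` error type, and `PlainTextFormat` are invented for illustration and do not reflect the crate's real API.

```rust
use std::collections::HashMap;

/// Minimal erased document interface (hypothetical).
pub trait Document {
    fn text(&self) -> String;
}

/// Object-safe format trait: returning a boxed trait object erases the concrete type.
pub trait DocumentFormat {
    fn extensions(&self) -> &'static [&'static str];
    fn load(&self, data: &[u8]) -> Result<Box<dyn Document>, String>;
}

/// Registry mapping lowercase extensions to registered formats.
#[derive(Default)]
pub struct FormatRegistry {
    by_ext: HashMap<&'static str, usize>,
    formats: Vec<Box<dyn DocumentFormat>>,
}

impl FormatRegistry {
    /// Registers a format under every extension it declares.
    pub fn register(&mut self, format: impl DocumentFormat + 'static) {
        let idx = self.formats.len();
        for ext in format.extensions() {
            self.by_ext.insert(*ext, idx);
        }
        self.formats.push(Box::new(format));
    }

    /// Picks a format by extension at runtime and loads the data with it.
    pub fn load(&self, ext: &str, data: &[u8]) -> Result<Box<dyn Document>, String> {
        let key = ext.to_ascii_lowercase();
        let idx = self
            .by_ext
            .get(key.as_str())
            .ok_or_else(|| format!("unsupported extension: {ext}"))?;
        self.formats[*idx].load(data)
    }
}

/// Toy format used only to exercise the registry.
pub struct PlainTextFormat;

struct TextDoc(String);

impl Document for TextDoc {
    fn text(&self) -> String {
        self.0.clone()
    }
}

impl DocumentFormat for PlainTextFormat {
    fn extensions(&self) -> &'static [&'static str] {
        &["txt", "text"]
    }

    fn load(&self, data: &[u8]) -> Result<Box<dyn Document>, String> {
        let s = std::str::from_utf8(data).map_err(|e| e.to_string())?;
        Ok(Box::new(TextDoc(s.to_owned())))
    }
}
```

Callers then work only with `Box<dyn Document>`, so new formats such as the image formats mentioned in this PR can be registered without changing the code that consumes documents.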
@martsokha self-assigned this on Jan 16, 2026
@martsokha added the feat (request for or implementation of a new feature) and docs (improvements, updates or additions to docs) labels on Jan 16, 2026
- Remove nvisy-document error module, re-export from nvisy-core
- Add load_file method to DocumentFormat trait
- Use data.as_string()? instead of String::from_utf8_lossy in nvisy-text
- Split ImageFormat into JpegFormat and PngFormat
- Register image formats in nvisy-engine
- Rename all crate packages to avoid conflicts with consumer library
- Update all Cargo.toml dependencies and features
- Update all source file imports (nvisy_* -> nvisy_rt_*)
- Update README doc examples with new crate names
- Fix clippy lints and doc warnings
- Add diff module with Differ trait, Change, ChangeKind, RegionChange, Diff types
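The switch from `String::from_utf8_lossy` to `data.as_string()?` listed above changes behavior on invalid input: lossy decoding silently substitutes U+FFFD, while a fallible conversion surfaces the error to the caller. The sketch below contrasts the two using only the standard library; `as_string` here is a free-function stand-in whose signature is assumed, and `parse_line_count` is a hypothetical caller.

```rust
use std::str::Utf8Error;

/// Hypothetical stand-in for `data.as_string()`: strict UTF-8 decoding.
pub fn as_string(data: &[u8]) -> Result<String, Utf8Error> {
    std::str::from_utf8(data).map(str::to_owned)
}

/// Lossy decoding: invalid sequences become U+FFFD instead of an error.
pub fn as_string_lossy(data: &[u8]) -> String {
    String::from_utf8_lossy(data).into_owned()
}

/// Example caller: with `?`, invalid input stops parsing instead of being altered.
pub fn parse_line_count(data: &[u8]) -> Result<usize, Utf8Error> {
    let text = as_string(data)?;
    Ok(text.lines().count())
}
```

For parsers of structured text, failing early is usually the safer choice, since a replacement character inside a key or value would otherwise produce a document that differs from the input without any warning.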
@martsokha merged commit ebeab65 into main on Jan 17, 2026
5 checks passed
@martsokha deleted the feature/prerelease branch on January 17, 2026 07:29