Handle utf8 strings properly #34

Merged
redstrate merged 2 commits into redstrate:main from AzurIce:main
Feb 13, 2026

AzurIce (Contributor) commented Feb 13, 2026

closes: #33


Previously, all null-terminated string readers used `byte as char`, which treats each byte as a Latin-1 code point. This corrupts any multi-byte UTF-8 text (e.g. Chinese/Japanese item names in EXD sheets).
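
As a minimal illustration of the corruption described above (not the crate's actual code; `decode_bytewise` and `decode_utf8` are hypothetical names used only for this sketch), compare a per-byte cast with real UTF-8 decoding on the same input:

```rust
/// Old pattern: every byte becomes its own `char`, i.e. a Latin-1 reading.
fn decode_bytewise(bytes: &[u8]) -> String {
    bytes.iter().map(|&b| b as char).collect()
}

/// UTF-8 decoding: returns `None` when the bytes are not valid UTF-8.
fn decode_utf8(bytes: &[u8]) -> Option<String> {
    String::from_utf8(bytes.to_vec()).ok()
}

fn main() {
    // "日本" is two characters encoded as six UTF-8 bytes.
    let bytes = "日本".as_bytes();

    // The per-byte cast yields six unrelated Latin-1 characters.
    let wrong = decode_bytewise(bytes);
    assert_eq!(wrong.chars().count(), 6);

    // Proper decoding recovers the original two characters.
    let right = decode_utf8(bytes);
    assert_eq!(right.as_deref(), Some("日本"));

    println!("bytewise: {wrong:?}, utf8: {right:?}");
}
```

Pure ASCII input decodes identically both ways, which is likely why the problem only surfaced with non-English text.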

Replace every occurrence with proper UTF-8 decoding via `String::from_utf8`, and extract two reusable helpers in common_file_operations:

- `read_null_terminated_utf8` (reader-based)
- `null_terminated_utf8` (byte-slice-based)
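
The two helpers could look roughly like the sketch below. Only the function names come from this PR. The signatures, the return types, and the handling of invalid UTF-8 are assumptions made for illustration and may differ from what was merged.

```rust
use std::io::{self, Cursor, Read};

/// Assumed shape of the reader-based helper: collect bytes up to the first
/// NUL (or end of input), then decode the whole run as UTF-8 in one step.
fn read_null_terminated_utf8<R: Read>(reader: &mut R) -> io::Result<String> {
    let mut bytes = Vec::new();
    for byte in reader.by_ref().bytes() {
        let b = byte?;
        if b == 0 {
            break;
        }
        bytes.push(b);
    }
    // Assumption: invalid UTF-8 is reported as an `InvalidData` error.
    String::from_utf8(bytes).map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))
}

/// Assumed shape of the slice-based helper: decode everything before the
/// first NUL, or the whole slice when no NUL is present.
fn null_terminated_utf8(bytes: &[u8]) -> Option<String> {
    let end = bytes.iter().position(|&b| b == 0).unwrap_or(bytes.len());
    String::from_utf8(bytes[..end].to_vec()).ok()
}

fn main() -> io::Result<()> {
    // "あ" (U+3042) is E3 81 82 in UTF-8, followed by a NUL and trailing data.
    let mut cursor = Cursor::new(b"\xE3\x81\x82\0rest".to_vec());
    assert_eq!(read_null_terminated_utf8(&mut cursor)?, "あ");

    assert_eq!(null_terminated_utf8(b"abc\0def").as_deref(), Some("abc"));
    assert_eq!(null_terminated_utf8(&[0xFF, 0x00]), None);
    Ok(())
}
```

Decoding the collected bytes in one call, rather than character by character, is what lets multi-byte sequences survive intact.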

Also fix `dic.rs` where `as u8 as char` truncated full Unicode code points to 8 bits.
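
To make the truncation concrete, here is a small sketch. The actual source type and the exact fix in `dic.rs` are not shown in this conversation, so the `u32` input and the use of `char::from_u32` are assumptions, and both function names are hypothetical.

```rust
/// Old pattern: narrowing to `u8` first keeps only the low 8 bits.
fn truncating_cast(code_point: u32) -> char {
    code_point as u8 as char
}

/// Checked conversion: `None` for surrogates and out-of-range values.
fn checked_cast(code_point: u32) -> Option<char> {
    char::from_u32(code_point)
}

fn main() {
    // U+65E5 is '日'; its low byte is 0xE5, which is 'å' in Latin-1.
    let cp = 0x65E5;
    assert_eq!(truncating_cast(cp), '\u{E5}');
    assert_eq!(checked_cast(cp), Some('日'));

    // Surrogate code points are not valid `char` values.
    assert_eq!(checked_cast(0xD800), None);
}
```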

Add 8 unit tests covering ASCII, CJK, empty, and invalid UTF-8 inputs.

redstrate (Owner) left a comment

Thanks! Didn't think it would be that easy, also tested it locally with Novus and it prints the correct Japanese text.

redstrate merged commit f668799 into redstrate:main on Feb 13, 2026
1 of 2 checks passed