Description
FYI: During testing, I noticed that a couple of invalid or unexpected string inputs can cause lexer::tokenize() to panic (i.e. crash the process instead of returning an Err(LexerError) result). I believe the input string slice is valid UTF-8 in each case, but it is definitely not valid RTF. There may be other, similar occurrences, but I've posted a couple of simplified examples below in case you're interested in fixing it.
I should be able to work around this myself by sanitizing or validating the string input before calling into rtf_parser, but it would be great if this were addressed in the library itself.
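A minimal sketch of such a caller-side pre-check, using only the standard library. The function name `looks_safe_for_lexer` is hypothetical and not part of rtf_parser, and the two rules it applies are assumptions drawn only from the two panics reported in this issue (a non-ASCII character, and a `\'` escape without two hex digits), so it may not catch every panicking input:

```rust
/// Hypothetical pre-check (not part of rtf_parser): returns `false` for
/// inputs resembling the two panicking cases reported in this issue.
fn looks_safe_for_lexer(input: &str) -> bool {
    // Assumption: well-formed RTF is 7-bit ASCII, so any multi-byte
    // character is rejected before the lexer can slice through it.
    if !input.is_ascii() {
        return false;
    }
    let bytes = input.as_bytes();
    let mut i = 0;
    while i + 1 < bytes.len() {
        if bytes[i] == b'\\' && bytes[i + 1] == b'\'' {
            // A `\'` hex escape must be followed by exactly two hex digits.
            match bytes.get(i + 2..i + 4) {
                Some(hex) if hex.iter().all(u8::is_ascii_hexdigit) => i += 4,
                _ => return false,
            }
        } else if bytes[i] == b'\\' && bytes[i + 1] == b'\\' {
            // Skip an escaped backslash so its second byte is not
            // mistaken for the start of another control sequence.
            i += 2;
        } else {
            i += 1;
        }
    }
    true
}
```

A caller would run this check first and return its own error instead of calling `Lexer::scan` when it fails. Rejecting all non-ASCII input is deliberately conservative and may be too strict for some documents.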
Here is the first example:
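#[test]
fn rtf_lexer___scan_randombytes___panics() {
    // Decodes to `\a`, U+00A0 (a 2-byte character), `}`: valid UTF-8, invalid RTF.
    let bytes = [92u8, 97, 194, 160, 125];
    let utf8_string = String::from_utf8_lossy(&bytes).to_string();
    // Panics instead of returning Err(LexerError).
    let _tokens = Lexer::scan(utf8_string.as_str());
}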
...
thread 'text_extractors::rtf_text_extractor::tests::rtf_lexer___scan_randombytes___panics' panicked at /Users/jhoward/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/rtf-parser-0.4.2/src/utils.rs:28:26:
byte index 3 is not a char boundary; it is inside '\u{a0}' (bytes 2..4) of `\a `
Here is another example that panics in a different location:
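#[test]
fn rtf_lexer___scan_more_randombytes___panics() {
    // Decodes to `\'` followed by NUL, LF, NUL: a hex escape with no hex digits.
    let bytes = [92u8, 39, 0, 10, 0];
    let utf8_string = String::from_utf8_lossy(&bytes).to_string();
    // Panics instead of returning Err(LexerError).
    let _tokens = Lexer::scan(utf8_string.as_str());
}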
...
thread 'text_extractors::rtf_text_extractor::tests::rtf_lexer___scan_more_randombytes___panics' panicked at /Users/jhoward/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/rtf-parser-0.4.2/src/lexer.rs:108:56:
byte index 3 is out of bounds of `'�`
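Both panic messages are the ones Rust produces when a `&str` is sliced at a byte offset that is out of bounds or falls inside a multi-byte character. As a sketch of one possible direction for a fix, the standard library's checked accessors report these cases as `None`, which a lexer could then map to an error. The helper names below are hypothetical, and this does not reflect how utils.rs or lexer.rs actually slice their input:

```rust
/// Hypothetical helper: returns the first `n` bytes of `s` as a `&str`,
/// or `None` when `n` is out of bounds or splits a multi-byte character.
fn checked_prefix(s: &str, n: usize) -> Option<&str> {
    // `str::get` is the non-panicking counterpart of `&s[..n]`.
    s.get(..n)
}

/// Hypothetical helper: splits `s` at byte offset `mid`, or returns `None`
/// when `mid` is out of bounds or not on a character boundary.
fn checked_split(s: &str, mid: usize) -> Option<(&str, &str)> {
    // `is_char_boundary` returns false for offsets past the end of `s`,
    // so `split_at` cannot panic here.
    if s.is_char_boundary(mid) {
        Some(s.split_at(mid))
    } else {
        None
    }
}
```

With the first example's input, `checked_prefix("\\a\u{a0}}", 3)` yields `None` because byte 3 lies inside U+00A0, whereas `&s[..3]` would panic.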
A QuickCheck-based test (with quickcheck added as a test dependency) could be run several times with a high iteration count to increase confidence that no other panicking inputs remain:
// USAGE: $ QUICKCHECK_TESTS=10000 cargo test quickcheck___scan_randombytes___check_for_panic
use quickcheck_macros::quickcheck;
use rtf_parser::Lexer;

#[quickcheck]
fn quickcheck___scan_randombytes___check_for_panic(value: Vec<u8>) {
    // Print each generated input so a panicking case can be reproduced.
    eprintln!("{:?}", value);
    let utf8_string = String::from_utf8_lossy(&value).to_string();
    // The test fails if this call panics; the result itself is ignored.
    let _tokens = Lexer::scan(utf8_string.as_str());
}
Thanks for creating and maintaining this library.