Panic in lexer::tokenize() #18

@jrobhoward

FYI: During testing, I noticed that a couple of invalid/unexpected string inputs cause lexer::tokenize() to panic (i.e. crash the process instead of returning an Err(LexerError) result). I believe the input string slice is valid UTF-8, but it is definitely not valid RTF. There may be other, similar occurrences, but I've posted a couple of simplified examples below in case you're interested in fixing it.

I should be able to work around this myself by sanitizing or validating the string input before calling into rtf_parser, but it would be great if it were addressed in the library itself.
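
For anyone else hitting this in the meantime, another caller-side stopgap is to turn the panic into an error with std::panic::catch_unwind. The sketch below is only an illustration: `catch_panic` and `slice_at_byte_3` are hypothetical names of mine, not part of rtf_parser, and the slicing function merely stands in for a lexer call that indexes a string at a fixed byte offset. It also assumes the default unwinding panic strategy (it does nothing under panic = "abort").

```rust
use std::panic::{self, UnwindSafe};

/// Hypothetical helper (not part of rtf_parser): runs `f` and converts a
/// panic into `Err` carrying the panic message.
fn catch_panic<T, F>(f: F) -> Result<T, String>
where
    F: FnOnce() -> T + UnwindSafe,
{
    panic::catch_unwind(f).map_err(|payload| {
        // Panic payloads are normally a `&str` or a `String`.
        if let Some(msg) = payload.downcast_ref::<&str>() {
            (*msg).to_string()
        } else if let Some(msg) = payload.downcast_ref::<String>() {
            msg.clone()
        } else {
            "unknown panic".to_string()
        }
    })
}

/// Stand-in for a lexer call: slices at a fixed byte index and panics if
/// that index is out of bounds or not on a char boundary.
fn slice_at_byte_3(input: &str) -> &str {
    &input[..3]
}
```

Wrapping the real call would then look like `catch_panic(|| Lexer::scan(input))`, assuming the value it returns can cross the unwind boundary. The default panic hook still prints the panic message to stderr.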

Here is the first example:

    #[test]
    fn rtf_lexer___scan_randombytes___panics() {
        let bytes = [92u8, 97, 194, 160, 125];
        let utf8_string = String::from_utf8_lossy(&bytes).to_string();

        let tokens = Lexer::scan(utf8_string.as_str());
    }
...

byte index 3 is not a char boundary; it is inside '\u{a0}' (bytes 2..4) of `\a `
thread 'text_extractors::rtf_text_extractor::tests::rtf_lexer___scan_randombytes___panics' panicked at /Users/jhoward/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/rtf-parser-0.4.2/src/utils.rs:28:26:

Here is another example that panics in a different location:

    #[test]
    fn rtf_lexer___scan_more_randombytes___panics() {
        let bytes = [92u8, 39, 0, 10, 0];
        let utf8_string = String::from_utf8_lossy(&bytes).to_string();

        let tokens = Lexer::scan(utf8_string.as_str());
    }
...
byte index 3 is out of bounds of `'�`
thread 'text_extractors::rtf_text_extractor::tests::rtf_lexer___scan_more_randombytes___panics' panicked at /Users/jhoward/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/rtf-parser-0.4.2/src/lexer.rs:108:56:
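
Both messages look like unchecked string slicing at a byte index. I have not traced the exact expressions in utils.rs and lexer.rs, so the following is a guess at the shape of a fix rather than a patch: `checked_prefix` and `split_at_floor` are hypothetical helpers that use str::get and str::is_char_boundary, which report a bad index instead of panicking.

```rust
/// Hypothetical helper: returns the first `n` bytes of `input`, or `None`
/// if `n` is out of bounds or falls inside a multi-byte character.
fn checked_prefix(input: &str, n: usize) -> Option<&str> {
    input.get(..n)
}

/// Hypothetical helper: splits at the largest char boundary that is
/// less than or equal to `idx`, clamping `idx` to the string length.
fn split_at_floor(input: &str, idx: usize) -> (&str, &str) {
    let mut i = idx.min(input.len());
    // Index 0 is always a char boundary, so this loop terminates.
    while !input.is_char_boundary(i) {
        i -= 1;
    }
    input.split_at(i)
}
```

A lexer could map the `None` case to an Err(LexerError) instead of unwinding.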

A QuickCheck test (added as a test dependency) could be run several times with a high iteration count to increase confidence that no other panics remain:

    use quickcheck_macros::quickcheck;
    use rtf_parser::{Lexer};

    #[quickcheck]
    fn quickcheck___scan_randombytes___check_for_panic(value: Vec<u8>) {
        eprintln!("{:?}", value);
        let utf8_string = String::from_utf8_lossy(&value).to_string();
        let tokens = Lexer::scan(utf8_string.as_str());

        // USAGE: $ QUICKCHECK_TESTS=10000 cargo test quickcheck___scan_randombytes___check_for_panic
    }

Thanks for creating and maintaining this library.
