Consider using a flex-based scanner for better maintainability.
The current scanner is hand-written, which is harder to maintain but lets us scan the input through mmap(). Before switching to a flex-based scanner, we need to investigate whether it can still track the file offset. We should also benchmark the flex-based scanner to confirm it's fast enough (I think it should be).
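
To make the offset question concrete, here is a minimal sketch of why offset tracking comes for free with a hand-written scanner over an mmap()ed file: the whole file is one buffer, so an offset is just an index into it. This is not the actual scanner. The names struct token, next_token and map_file are hypothetical, and splitting on whitespace is an assumption made only to keep the example short.

```c
#include <assert.h>
#include <ctype.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical token record: where the token sits in the mapped file. */
struct token {
    size_t offset;  /* byte offset from the start of the file */
    size_t length;  /* token length in bytes */
};

/* Find the next whitespace-delimited token at or after *pos.
 * Returns 1 and fills *tok on success, 0 once the buffer is exhausted.
 * *pos is advanced past the token so the call can be repeated. */
int next_token(const char *buf, size_t size, size_t *pos, struct token *tok)
{
    assert(buf != NULL && pos != NULL && tok != NULL);

    size_t i = *pos;
    while (i < size && isspace((unsigned char)buf[i]))
        i++;                        /* skip leading whitespace */
    if (i >= size) {
        *pos = size;
        return 0;                   /* no more tokens */
    }

    size_t start = i;
    while (i < size && !isspace((unsigned char)buf[i]))
        i++;                        /* consume the token body */

    tok->offset = start;            /* the buffer index is the file offset */
    tok->length = i - start;
    *pos = i;
    return 1;
}

/* Map a whole file read-only. Returns NULL on error or for an empty file
 * (mmap() rejects zero-length mappings). */
const char *map_file(const char *path, size_t *size)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;

    struct stat st;
    if (fstat(fd, &st) < 0 || st.st_size == 0) {
        close(fd);
        return NULL;
    }

    void *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                      /* the mapping stays valid after close */
    if (p == MAP_FAILED)
        return NULL;

    *size = (size_t)st.st_size;
    return p;
}
```

A caller would map the file once with map_file() and then call next_token() in a loop, starting from pos = 0. With flex, the offset would probably have to be accumulated by hand, for example by adding yyleng to a counter in a YY_USER_ACTION hook. Whether that stays correct with constructs such as yyless() or yymore(), and whether flex can scan the mapping in place (yy_scan_buffer() expects two trailing NUL bytes), is part of what needs investigating.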