Fix LimitedInputStream reading too far #38
Draft
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Well, it fixes it sort of 99.99%...
Issue 1
The ability of the LimitedInputStream to throw an exception if reading beyond its limit introduces a small issue, and I believe this supported variance breaks the design of InputStream behavior. Reaching the limit should in all cases just be an EOF, not a thrown exception. The reason for this is that this makes it impossible to stop exactly after reading the last byte before the limit. If consuming the
LimitedInputStreamby exhausting it (which is the usual way), one must do at least one read beyond the limit to see if the underlying stream is also exactly at EOF, or if it contains more data (and followingly it should throw an exception because reading beyond the limit). See also commit msg of ecb3a8eIf e.g. wrapping a
BufferedInputStreamwith aLimitedInputStreamin order to initially consume a header portion of the bytes, and then afterwards.reset()the BufferedInputStream, the value passed to.mark(..)must be one more than the reading limit ofLimitedInputStream, because it needs to read one more byte to determine it's next action if reaching the limit, and this breaks the setmarkin theBufferedInputStream. It would be expected that these two values should be the same, that a LimitedInputStream in fact never reads beyond it's limit.The
LimitedInputStreamTest.rewindWhenReachingLimit()test demonstrates this.Issue 2
The second issue this PR fixes is:
If reading bytes in chunks (which is most common for performance reasons), if the last chunk of read bytes reads past the (and potentially way past) the limit, the last chunk would be treated as having reached the limit and either an EOF or exception would be yielded. For example:
Running
LimitedInputStreamTest.rewindWhenReachingLimit()against the old implementation also demonstrates this, becauseIOUtils.toByteArrayreads chunks of 8kB, and thus consuming the LimitedInputStream (limit=1024) yields zero bytes, because the first read way past the limit.Now, the reading from the source stream is guarded by calculating how many bytes is applicable to read.
Conclusion
LimitedInputStreamnow guards the delegated read operations to never read more than absolutely necessary.It is still however necessary to provide a value for
BufferedInputStream.mark(..)one larger than the limit of the wrapping LimitedInputStream in order for being able to safely rewind after exhausting the LimitedInputStream. This is unfortunate, but still better than theLimitedInputStreampotentially reading way past its limit.