Fix IndexError in scan_tag() and check_key() on empty input #913

Open
veeceey wants to merge 1 commit into yaml:main from veeceey:fix/issue-906-indexerror-empty-input

@veeceey veeceey commented Feb 13, 2026

Summary

Fixes #906 by adding buffer bounds checking before peek(1) calls in scanner methods.

When the input is empty or very short, calling peek(1) without validating buffer bounds raises IndexError: the buffer contains only the EOF marker (\0) at position 0, while peek(1) tries to read position 1.

Changes

Added bounds checking in three methods in lib/yaml/scanner.py:

  1. check_key() - Check if pointer + 1 < len(buffer) before peek(1)
  2. check_value() - Check if pointer + 1 < len(buffer) before peek(1)
  3. scan_tag() - Check buffer bounds before both peek(1) and peek() calls

When buffer bounds are insufficient, the code falls back to peek() which safely returns the EOF marker \0.
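
To illustrate the guard described above, here is a minimal, self-contained sketch. It is not the code from this PR or from lib/yaml/scanner.py: TinyReader and peek_next_or_eof are hypothetical names, and the sketch assumes the whole input is buffered with a single trailing EOF marker.

```python
EOF = "\0"


class TinyReader:
    """Hypothetical stand-in for the scanner's buffer; not PyYAML's Reader."""

    def __init__(self, text):
        # Assumption: the whole input is buffered, followed by one EOF marker.
        self.buffer = text + EOF
        self.pointer = 0

    def peek(self, index=0):
        # Unguarded lookahead: raises IndexError past the end of the buffer.
        return self.buffer[self.pointer + index]

    def peek_next_or_eof(self):
        # Guarded lookahead: read one character ahead only if it exists,
        # otherwise fall back to the current character.
        if self.pointer + 1 < len(self.buffer):
            return self.peek(1)
        return self.peek()


empty = TinyReader("")
try:
    empty.peek(1)
    unguarded_result = "no error"
except IndexError:
    unguarded_result = "IndexError"

guarded_result = empty.peek_next_or_eof()
```

In this sketch the fallback can only be reached when the pointer sits on the last buffer position, which is the EOF marker, so the guarded call returns \0 instead of raising.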

Testing

Verified the fix with multiple test cases:

import yaml

# Previously raised IndexError, now works correctly
loader = yaml.loader.SafeLoader("")
loader.check_key()    # Returns True
loader.scan_tag()     # Returns TagToken
loader.check_value()  # Returns True

All existing tests pass:

  • tests/test_dump_load.py - 3 tests passed
  • tests/legacy_tests/test_errors.py - 189 tests passed

Manual Test Results

1. check_key() with empty input: SUCCESS (no IndexError)
2. scan_tag() with empty input: SUCCESS (no IndexError)  
3. check_value() with empty input: SUCCESS (no IndexError)
4. Various YAML inputs parse correctly without IndexError

The fix handles the empty-input and short-input edge cases while remaining backward compatible with existing behavior.

Calling peek(1) without checking buffer bounds causes IndexError when
the buffer doesn't have enough characters available. This occurs with
empty or very short input where peek(1) tries to access buffer[pointer+1]
but only buffer[pointer] exists (the EOF marker '\0').

Fixed by adding bounds checking before peek() calls in:
- check_key(): Check if pointer+1 < len(buffer) before peek(1)
- check_value(): Check if pointer+1 < len(buffer) before peek(1)
- scan_tag(): Check buffer bounds before both peek(1) and peek() calls

When buffer bounds are insufficient, fall back to peek() which returns
the EOF marker '\0', ensuring proper handling of end-of-stream cases.

Fixes yaml#906

veeceey commented Feb 19, 2026

Friendly ping - any chance someone could take a look at this when they get a chance? Happy to make any changes if needed.
