Failed parsing overflow record offset #31

@Btibert3

I have found an MS Access database that this library cannot parse. The error is shown below.

WARNING:access_parser:Failed parsing overflow record offset
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-9-79c5ce1f7c6d> in <cell line: 0>()
----> 1 db = AccessParser("IPEDS201617.accdb")

5 frames
/usr/local/lib/python3.11/dist-packages/access_parser/access_parser.py in _parse_memo(self, relative_obj_data, return_raw)
    546             LOGGER.debug("LVAL type 2")
    547             rec_data = self._get_overflow_record(parsed_memo.record_pointer)
--> 548             next_page = struct.unpack("I", rec_data[:4])[0]
    549             # LVAL2 has data over multiple pages. The first 4 bytes of the page are the next record, then that data.
    550             # Concat the data until we get a 0 next_page.

TypeError: 'NoneType' object is not subscriptable
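
The traceback suggests that `_get_overflow_record` returned `None` and that the result was sliced without a check. Below is a minimal sketch of one possible guard, written independently of the library. `read_next_page` is a hypothetical helper, not part of access_parser, and returning `None` for a missing or short record is an assumption, not the maintainers' fix.

```python
import struct
from typing import Optional


def read_next_page(rec_data: Optional[bytes]) -> Optional[int]:
    """Hypothetical helper: decode the 4-byte next-page pointer of an LVAL type 2 record.

    Returns None instead of raising when the overflow record could not be
    read (rec_data is None) or is too short to hold the pointer.
    """
    if rec_data is None or len(rec_data) < 4:
        return None
    # Same format string as the failing line in the traceback:
    # a native-order unsigned 32-bit integer.
    return struct.unpack("I", rec_data[:4])[0]
```

With a guard like this, a caller could skip or log the affected memo field rather than abort the whole parse. Whether skipping is acceptable depends on how the library is expected to handle damaged or unsupported records.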

The reproducible code can be found in this Google Colab notebook: https://colab.research.google.com/drive/1Bs-8aPSFHveMwZN--y_fpSEciOp_-0tm?usp=sharing

Interestingly, the following Access files from the NCES website fail to open with this library:

  • 2016/17 (shown in the Colab notebook)
  • 2017/18
  • 2018/19
  • 2019/20

The remainder of the Access files are able to be processed by this library.
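
To check which files are affected, something like the following sketch could be used. `find_unparseable` and its `open_db` parameter are hypothetical names introduced here for illustration. The opener is passed in so that the loop does not depend on any particular parser.

```python
from typing import Callable, Dict, Iterable, Optional


def find_unparseable(
    paths: Iterable[str], open_db: Callable[[str], object]
) -> Dict[str, Optional[str]]:
    """Try to open each file; map path -> error text, or None on success."""
    results: Dict[str, Optional[str]] = {}
    for path in paths:
        try:
            open_db(path)
            results[path] = None
        except Exception as exc:  # broad on purpose: we only record the failure
            results[path] = f"{type(exc).__name__}: {exc}"
    return results
```

For example, `find_unparseable(["IPEDS201617.accdb"], AccessParser)` would be expected to report the `TypeError` above for that file, assuming `AccessParser` is imported from `access_parser` as in the notebook.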

I don't have easy access to a Windows machine, but I have opened this database file (and the others listed above) in several ways: in Access on a Windows machine, in KNIME, and via custom code using JDBC.
