I have found a case where a certain MS Access database cannot be parsed. The error is below:
```
WARNING:access_parser:Failed parsing overflow record offset
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-9-79c5ce1f7c6d> in <cell line: 0>()
----> 1 db = AccessParser("IPEDS201617.accdb")

... 5 frames omitted ...

/usr/local/lib/python3.11/dist-packages/access_parser/access_parser.py in _parse_memo(self, relative_obj_data, return_raw)
    546     LOGGER.debug("LVAL type 2")
    547     rec_data = self._get_overflow_record(parsed_memo.record_pointer)
--> 548     next_page = struct.unpack("I", rec_data[:4])[0]
    549     # LVAL2 has data over multiple pages. The first 4 bytes of the page are the next record, then that data.
    550     # Concat the data until we get a 0 next_page.

TypeError: 'NoneType' object is not subscriptable
```
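The warning logged just before the exception suggests that `_get_overflow_record` returned `None`, so `rec_data[:4]` fails. The comment in that frame describes the LVAL type 2 layout: each overflow record starts with a 4-byte pointer to the next record, followed by data, and the chain ends at a 0 pointer. Below is a minimal sketch of that walk with an explicit `None` check. `read_lval2` and `fetch_record` are hypothetical names, not the library's API; `fetch_record` stands in for the internal `_get_overflow_record`, and `"<I"` pins the little-endian 4-byte layout that native `"I"` also gives on x86.

```python
import struct
from typing import Callable, List, Optional, Set


def read_lval2(first_pointer: int, fetch_record: Callable[[int], Optional[bytes]]) -> bytes:
    """Hypothetical sketch: follow an LVAL type 2 chain and concatenate its data.

    fetch_record stands in for access_parser's internal _get_overflow_record
    and may return None when a record pointer cannot be resolved.
    """
    chunks: List[bytes] = []
    seen: Set[int] = set()
    pointer = first_pointer
    while pointer != 0:
        if pointer in seen:
            # Guard against a corrupt chain that loops back on itself
            raise ValueError(f"cycle in overflow chain at pointer {pointer:#x}")
        seen.add(pointer)
        rec_data = fetch_record(pointer)
        if rec_data is None or len(rec_data) < 4:
            # Fail with a descriptive error instead of
            # "'NoneType' object is not subscriptable"
            raise ValueError(f"could not read overflow record at pointer {pointer:#x}")
        # First 4 bytes: pointer to the next record (0 ends the chain); the rest is data
        pointer = struct.unpack("<I", rec_data[:4])[0]
        chunks.append(rec_data[4:])
    return b"".join(chunks)
```

With a dictionary of fake records, `read_lval2(1, records.get)` returns the concatenated payloads, and a pointer that resolves to nothing raises `ValueError` rather than a `TypeError`.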
Code to reproduce the issue is in this Google Colab notebook: https://colab.research.google.com/drive/1Bs-8aPSFHveMwZN--y_fpSEciOp_-0tm?usp=sharing
Interestingly, the following files from the NCES website fail to open:
- 2016/17 (shown in the Colab notebook)
- 2017/18
- 2018/19
- 2019/20

The remaining Access files from the NCES website are processed successfully by this library.
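To check which downloaded files trigger the error, a small loop like the following could be used. This is a sketch: `find_failing` is a hypothetical helper, and it relies only on the `AccessParser(path)` constructor already used in the traceback. The opener can be swapped out, which also makes the helper testable without the package installed.

```python
from pathlib import Path
from typing import Callable, Dict, Iterable, Optional


def find_failing(
    paths: Iterable[Path],
    open_db: Optional[Callable[[str], object]] = None,
) -> Dict[str, str]:
    """Try to open each Access file and return {file name: error} for the ones that fail."""
    if open_db is None:
        # Imported lazily so the helper can be used with a stand-in opener
        from access_parser import AccessParser
        open_db = AccessParser
    failures: Dict[str, str] = {}
    for path in paths:
        try:
            open_db(str(path))
        except Exception as exc:  # record any parse error, e.g. the TypeError above
            failures[path.name] = f"{type(exc).__name__}: {exc}"
    return failures
```

For example, assuming the NCES files were downloaded into a local `ipeds` directory (a hypothetical name), `find_failing(sorted(Path("ipeds").glob("*.accdb")))` would list each file that fails together with its exception.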
I don't have easy access to a Windows machine, but I have opened this database file (and the others listed above) in several ways: in Access on a Windows machine, in KNIME, and through custom code using JDBC. The files themselves therefore do not seem to be corrupt.