When I read the Svenskt Pressregister 1903-1911 dataset (direct download link: Dataset, 8991 kB) with access_parser, I get some unexpected results.
- One table is called "Konsert" when I open it in Microsoft Office Access 2007, but when parsed by access_parser it is called "Konsert춢".
- Some varchar columns are not decoded correctly for the characters … and ’. They come out as \0& (0x00 0x26) and \0\x19 \0 (0x00 0x19 0x20 0x00) respectively. In Microsoft Access they display as the expected … and ’.
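One possible explanation, not verified against this file: Jet 4 and later Access formats can store text with "Unicode compression", where a value starts with the marker 0xFF 0xFE and each 0x00 byte toggles between one-byte-per-character segments and UTF-16LE segments. The stray NUL bytes listed above fit that pattern, since … is 0x26 0x20 and ’ is 0x19 0x20 in UTF-16LE. The sketch below illustrates the scheme under that assumption. `decode_compressed_text` is a hypothetical helper, not part of the access_parser API, and the sample bytes are reconstructed from the values listed above rather than read from the database.

```python
def decode_compressed_text(raw: bytes) -> str:
    """Decode a text value, ASSUMING Jet4-style Unicode compression.

    Hypothetical helper for illustration only; not access_parser code.
    """
    if not raw.startswith(b"\xff\xfe"):
        # No compression marker: assume the value is plain UTF-16LE.
        return raw.decode("utf-16-le", errors="replace")

    parts = []
    compressed = True  # data after the marker starts in compressed mode
    # Each 0x00 byte ends the current segment and toggles the mode.
    # Limitation of this sketch: it assumes no UTF-16LE code unit inside an
    # uncompressed segment contains a 0x00 byte.
    for segment in raw[2:].split(b"\x00"):
        if compressed:
            # One byte per character, i.e. code points 0-255.
            parts.append(segment.decode("latin-1"))
        else:
            parts.append(segment.decode("utf-16-le", errors="replace"))
        compressed = not compressed
    return "".join(parts)


# Sample bytes reconstructed from the report (an assumption, not file data).
sample = b"\xff\xfeTell \x00\x19\x20\x00n Stanialus"
print(decode_compressed_text(sample))
```

If this assumption holds, a decoder that treats the whole value as single-byte text would produce exactly the \0& and \0\x19 \0 sequences described above.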
The following code
from access_parser import AccessParser
db_path = "SVEPDB.accdb"
db = AccessParser(db_path)
tables = db.catalog.keys()
concert_table = [x for x in tables if x.startswith("Konsert")][0]
if concert_table != "Konsert":
    print(f"Can't find Konsert table, found {concert_table}")
else:
    print("Found Konsert table")
table = db.parse_table("nskon")
ellipsis_value = table['ntit'][4338]
apostrophe_value = table['ntit'][14986]
if ellipsis_value != 'Det "fula" Stockholm …':
    print(f"Ellipsis not decoded correctly, got: {ellipsis_value}")
else:
    print("Ellipsis decoded correctly")
if apostrophe_value != 'Landsmålsbref. Tell ’n Stanialus':
    print(f"Apostrophe not decoded correctly, got: {apostrophe_value}")
else:
    print("Apostrophe decoded correctly")
outputs
WARNING:Could not find overflow record data page overflow pointer: 27
WARNING:Could not find overflow record data page overflow pointer: 27
Can't find Konsert table, found Konsert춢
Ellipsis not decoded correctly, got: Det "fula" Stockholm &
Apostrophe not decoded correctly, got: Landsmålsbref. Tell n Stanialus
instead of the expected
Found Konsert table
Ellipsis decoded correctly
Apostrophe decoded correctly
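Because the stray control characters are invisible in the printed output, it may help to dump the returned values as code points when reproducing this. The following is a small sketch using only Python built-ins. `describe_code_points` is a hypothetical helper name, and the sample strings are typed in from the values reported in this issue rather than read from the database.

```python
def describe_code_points(text: str) -> str:
    """Return the code points of text as space-separated U+XXXX tokens."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


# Tail of the ellipsis value as reported: a NUL followed by '&'.
print(describe_code_points("\x00&"))   # U+0000 U+0026
# The character Microsoft Access shows in the same position.
print(describe_code_points("…"))       # U+2026
```

Applied to `ellipsis_value` and `apostrophe_value` from the script above, this makes the NUL and control characters visible so they can be compared with the expected strings.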