UnicodeDecodeError: Can't decode from latin1 #8

@pabloab

I was getting

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd1 in position 24: invalid continuation byte

0xd1 is "Ñ" in latin1, which is consistent with the output of dbfview -t -b myfile.dbf > myfile.txt; dbfile myfile.txt.
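
The byte-level reasoning can be reproduced without the DBF file. The following is a minimal sketch, assuming an iconv that exits non-zero on invalid input (as GNU iconv does); the file names sample-latin1.txt and sample-utf8.txt are hypothetical and are not part of the original report.

```shell
# Hypothetical sample: "A", the byte 0xD1 (octal 321), "O", newline.
printf 'A\321O\n' > sample-latin1.txt

# In UTF-8, 0xD1 starts a two-byte sequence and must be followed by a
# continuation byte; "O" is not one, so decoding as UTF-8 should fail.
if iconv -f UTF-8 -t UTF-8 sample-latin1.txt > /dev/null 2>&1; then
    utf8_ok=yes
else
    utf8_ok=no
fi

# Decoding as latin1 (ISO-8859-1) succeeds: 0xD1 maps to U+00D1 ("Ñ"),
# which UTF-8 encodes as the two bytes C3 91.
iconv -f ISO-8859-1 -t UTF-8 sample-latin1.txt > sample-utf8.txt
utf8_hex=$(od -An -tx1 sample-utf8.txt | tr -d ' \n')

echo "valid as UTF-8: $utf8_ok"
echo "bytes after conversion: $utf8_hex"
```

If the first check reports "no" and the converted bytes contain c391, the data is most likely latin1 (or a close relative such as ISO-8859-15) rather than UTF-8.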

I tried

dbf2csv -ie 'latin1'  'myfile.dbf'

which I believe should fix the issue, but it doesn't.

Installed from source:

git clone https://github.com/akadan47/dbf2csv.git
cd dbf2csv
pip install -r requirements.txt
python setup.py install

dbf2csv --version
# dbf2csv 1.3

Workaround

I ended up converting with LibreOffice

sudo apt install libreoffice-base
libreoffice --headless --convert-to csv myfile.dbf  # Will generate myfile.csv
iconv -f ISO-8859-15 -t UTF-8 myfile.csv  > myfile-utf8.csv

I validated the CSV generated with frictionless.

For multiple files:

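# Convert every .dbf file under the current directory
find . -type f -iname '*.dbf' -execdir libreoffice --headless --convert-to csv "{}" +
# Strip only the .csv suffix, so file names containing other dots are preserved
for FILE in *.csv; do iconv -f ISO-8859-15 -t UTF-8 "$FILE" -o "${FILE%.csv}-utf8.csv"; done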
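
As a quick sanity check after the batch conversion, something like the following could confirm that each output file is valid UTF-8. This is only a sketch: count_invalid_utf8 is a hypothetical helper name, and it again assumes an iconv that exits non-zero on invalid input. It does not replace a full CSV validation such as the frictionless check mentioned above.

```shell
# Hypothetical helper: prints how many of the given files are not valid UTF-8.
count_invalid_utf8() {
    bad=0
    for FILE in "$@"; do
        # Skip the unexpanded pattern when no file matches.
        [ -e "$FILE" ] || continue
        if ! iconv -f UTF-8 -t UTF-8 "$FILE" > /dev/null 2>&1; then
            echo "not valid UTF-8: $FILE" >&2
            bad=$((bad + 1))
        fi
    done
    echo "$bad"
}

# Check every converted file in the current directory.
count_invalid_utf8 *-utf8.csv
```

The names of failing files go to stderr and the count goes to stdout, so the count can be captured in a script with $(count_invalid_utf8 *-utf8.csv).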
