UnicodeDecodeError when running codeviz on non-UTF-8 encoded files #15

@gismail

Hi,

I'm encountering a UnicodeDecodeError when running codeviz on a source directory that contains files not encoded in UTF-8.

Command:

codeviz -r myproject/

Error Traceback:

Traceback (most recent call last):
  File "/usr/local/bin/codeviz", line 33, in <module>
    sys.exit(load_entry_point('codeviz==1.0.0', 'console_scripts', 'codeviz')())
  File "/usr/local/lib/python3.10/dist-packages/codeviz-1.0.0-py3.10.egg/codeviz.py", line 346, in main
  File "/usr/local/lib/python3.10/dist-packages/codeviz-1.0.0-py3.10.egg/codeviz.py", line 117, in get_nodes
  File "/usr/local/lib/python3.10/dist-packages/codeviz-1.0.0-py3.10.egg/codeviz.py", line 65, in __init__
  File "/usr/local/lib/python3.10/dist-packages/codeviz-1.0.0-py3.10.egg/codeviz.py", line 47, in __init__
  File "/usr/local/lib/python3.10/dist-packages/codeviz-1.0.0-py3.10.egg/codeviz.py", line 53, in _get_included_headers
  File "/usr/lib/python3.10/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 5078: invalid continuation byte

Environment:

OS: Ubuntu
Python: 3.10

Possible Cause: It seems the tool assumes all files are UTF-8 encoded, but some files in the scanned directory are encoded in Latin-1 or another encoding.
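
For reference, the same error can be reproduced outside codeviz. This is only a small sketch with a made-up sample string: in Latin-1 the character "é" is the single byte 0xe9, which UTF-8 treats as the start of a multi-byte sequence, so decoding fails when the next byte is not a continuation byte.

```python
# Hypothetical sample text; "é" is encoded as the single byte 0xe9 in Latin-1.
latin1_bytes = "café au lait".encode("latin-1")

error = None
try:
    # UTF-8 expects continuation bytes after 0xe9, but finds a space instead.
    latin1_bytes.decode("utf-8")
except UnicodeDecodeError as exc:
    error = exc
    print(f"byte {latin1_bytes[exc.start]:#x} at position {exc.start}: {exc.reason}")

# Decoding the same bytes as Latin-1 succeeds.
print(latin1_bytes.decode("latin-1"))
```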

Suggested Fix: Add support for detecting or specifying file encodings, or fall back to a more permissive encoding such as latin-1 when UTF-8 decoding fails.
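
A minimal sketch of the fallback idea, in case it helps. I have not looked at how codeviz reads files internally, so `read_source` and its `encodings` parameter are hypothetical names, not part of the tool.

```python
from pathlib import Path


def read_source(path, encodings=("utf-8", "latin-1")):
    """Return the text of `path`, trying each encoding in order.

    Hypothetical helper: the name and signature are assumptions,
    not codeviz's actual API.
    """
    data = Path(path).read_bytes()
    for encoding in encodings:
        try:
            return data.decode(encoding)
        except UnicodeDecodeError:
            continue  # try the next candidate encoding
    # Only reached if no candidate worked (latin-1 itself accepts any byte):
    # decode as UTF-8 and substitute U+FFFD for undecodable bytes.
    return data.decode("utf-8", errors="replace")
```

Because latin-1 maps every byte to a character, this never raises on the default settings, though text in other encodings may come out with wrong characters. For scanning include lines, which are usually plain ASCII, that is probably acceptable.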

Thanks for your work on this tool!
