
Reading CSVs with UTF8 BOM #138

@mcoady

Description


This might be more of a cassava issue (there's a decade-old issue open there), but whenever I've used cassava in the past I've had to apply this "fix" when reading certain CSV files, usually ones created in Excel, which start with a UTF-8 byte order mark (BOM). It might be worth adding the change to dataframe from a UX perspective. Basic fix here.

Current behaviour (due to cassava)

dataframe> D.readCsv "cdr.csv"
--------------------
*** Exception: <stdout>: hPutChar: invalid argument (cannot encode character '\65279')

Behaviour after change

dataframe> D.readCsv "cdr.csv"
-------------------
period | percentage
-------|-----------
 Int   |   Double
-------|-----------
-1     | 0.38
0      | 0.76
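
For illustration, a minimal sketch of what such a BOM-stripping step could look like, assuming the file is read as a strict `ByteString` before being handed to the CSV decoder. The names `utf8Bom`, `stripUtf8Bom` and `readFileNoBom` are hypothetical and are not taken from dataframe or cassava; the linked fix may differ.

```haskell
import qualified Data.ByteString as BS
import Data.Maybe (fromMaybe)

-- | The UTF-8 encoding of U+FEFF (decimal 65279), the byte order mark.
utf8Bom :: BS.ByteString
utf8Bom = BS.pack [0xEF, 0xBB, 0xBF]

-- | Drop a leading UTF-8 BOM if present; otherwise return the input unchanged.
-- (Hypothetical helper, not part of dataframe or cassava.)
stripUtf8Bom :: BS.ByteString -> BS.ByteString
stripUtf8Bom bytes = fromMaybe bytes (BS.stripPrefix utf8Bom bytes)

-- | Read a file and remove a leading BOM before any CSV decoding happens.
-- (Hypothetical helper; the decoder call itself is left out on purpose.)
readFileNoBom :: FilePath -> IO BS.ByteString
readFileNoBom path = stripUtf8Bom <$> BS.readFile path
```

Only a BOM at the very start of the input is removed, so files without one pass through untouched.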

Lazy

The lazy version does load the file, but it treats the BOM bytes as valid characters and includes them in the first column name. I'm not sure that should be the case. I had a look at applying a similar "fix" there, but I would need to research the mechanics more.

dataframe> CSVL.readCsv "cdr.csv"
----------------------
´╗┐period | percentage
----------|-----------
   Int    |   Double
----------|-----------
-1        | 0.38
0         | 0.76
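
One possible direction for the lazy path, sketched under the assumption that the input is available as a lazy `ByteString` (the lazy reader's actual internals were not checked, so this may not map onto it directly). The names `utf8BomLazy` and `stripUtf8BomLazy` are hypothetical.

```haskell
import qualified Data.ByteString.Lazy as BL
import Data.Maybe (fromMaybe)

-- | The three-byte UTF-8 BOM as a lazy ByteString.
utf8BomLazy :: BL.ByteString
utf8BomLazy = BL.pack [0xEF, 0xBB, 0xBF]

-- | Drop a leading UTF-8 BOM from a lazy ByteString if present.
-- Only the first three bytes are inspected, so the rest of the
-- input is not forced. (Hypothetical helper, not part of dataframe.)
stripUtf8BomLazy :: BL.ByteString -> BL.ByteString
stripUtf8BomLazy bytes = fromMaybe bytes (BL.stripPrefix utf8BomLazy bytes)
```

With the BOM removed before header parsing, the first column name would come out as `period` rather than carrying the three extra bytes.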
