Minor suggestion: add option to sanitize atypical unicode #55

@owasow

I was working with survey data in which a number of open-ended text responses contained atypical Unicode characters that broke LaTeX compilation (though they could work with XeLaTeX). These strings tended to be nonsense input from users, so it wasn't essential to include them in the codebook, but I had a hard time finding the right code to strip them out before running dataReporter. Ultimately, it turned out to be easy with `gsub('[^\x20-\x7E]', '', text)`, but it took me a while to locate this particular solution on Stack Overflow (see link below).
https://stackoverflow.com/questions/38828620/how-to-remove-strange-characters-using-gsub-in-r
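
For anyone landing here, a minimal sketch of what that pattern does on a small character vector. The sample strings are made up for illustration. Note that the pattern drops every character outside printable ASCII (space through tilde), so legitimate accented letters are removed along with the junk.

```r
# Made-up responses: "\u00e9" is an accented e, "\U0001F600" is an emoji.
responses <- c("plain text", "caf\u00e9", "ok \U0001F600 then")

# Keep only printable ASCII (space through tilde); delete everything else.
cleaned <- gsub("[^\x20-\x7E]", "", responses)

print(cleaned)
```

Plain ASCII strings pass through unchanged; the others lose only the offending characters.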

I wonder if an option/argument to automatically sanitize character strings would make sense given that this will likely be an issue for a wide range of data sets.
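
In the meantime, something like the following could be run on a data frame before generating the report. This is only a sketch: `sanitize_text_columns` is a hypothetical helper name, not an existing dataReporter function, and the survey data is invented. It only touches character columns, so factor levels would need separate handling.

```r
# Hypothetical helper (not part of dataReporter): strip characters outside
# printable ASCII from every character column of a data frame.
sanitize_text_columns <- function(data, pattern = "[^\x20-\x7E]") {
  is_text <- vapply(data, is.character, logical(1))
  for (name in names(data)[is_text]) {
    data[[name]] <- gsub(pattern, "", data[[name]])
  }
  data
}

# Invented survey data for illustration.
survey <- data.frame(
  id = 1:2,
  comment = c("fine", "na\u00efve \u2603"),
  stringsAsFactors = FALSE
)

clean_survey <- sanitize_text_columns(survey)
print(clean_survey)
```

The cleaned data frame can then be passed to the report function in place of the original.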
