Wrong delimiter detection under certain conditions #6

@rusproject

Conditions under which the problem occurs:

  1. The CSV being parsed has the ; delimiter.
  2. The $delimiter parameter is not specified when calling the parse() or import() methods (i.e., it remains 'auto').
  3. The CSV contains a cell whose content includes the character sequence ", (a double quote immediately followed by a comma), like this:

Actual CSV contents:

"cell 1";"cell 2"
"cell 3";"cell ""4"", causing problem"

In these circumstances, the delimiter is auto-detected as , although the actual delimiter is ;. Each row is then parsed as a single cell, resulting in an array that looks like this:

array (
  0 => 
  array (
    0 => 'cell 1;cell 2',
  ),
  1 => 
  array (
    0 => 'cell 3;cell "4", causing problem',
  ),
)

Instead of this:

array (
  0 => 
  array (
    0 => 'cell 1',
    1 => 'cell 2',
  ),
  1 => 
  array (
    0 => 'cell 3',
    1 => 'cell "4", causing problem',
  ),
)
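
For comparison, here is a minimal sketch of what a correct parse of the second row looks like when the delimiter is given explicitly. It uses PHP's built-in str_getcsv() rather than the class discussed in this issue, so it only illustrates the expected result, not the class's own parsing path. The empty escape argument assumes PHP 7.4 or later.

```php
<?php
// Second row of the sample file, exactly as it appears in the CSV.
$line = '"cell 3";"cell ""4"", causing problem"';

// Parse with the real delimiter (;) and the double quote as enclosure.
// The empty escape string disables PHP's backslash escape mechanism (PHP >= 7.4).
$cells = str_getcsv($line, ';', '"', '');

var_export($cells);
// array (
//   0 => 'cell 3',
//   1 => 'cell "4", causing problem',
// )
```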


As far as I understand, the problem is that the following 'if' condition doesn't account for $this->_enclosure . ',' (i.e. ", in this case) appearing as actual cell content, where the quote is escaped by doubling it, giving $this->_enclosure . $this->_enclosure . ',' (i.e. "", in this case):

// detect delimiter
if ( strpos($this->_csv, $this->_enclosure . ',' ) !== false ) {
  $this->_delimiter = ',';
} // else ...
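
To make the false positive concrete, here is a small standalone sketch that runs the same strpos() test on the sample file. The variables $csv and $enclosure are stand-ins for $this->_csv and $this->_enclosure; nothing here calls the class itself.

```php
<?php
// Sample file contents from this report.
$csv = "\"cell 1\";\"cell 2\"\n\"cell 3\";\"cell \"\"4\"\", causing problem\"\n";
$enclosure = '"';

// Same test as in the 'if' condition above.
$pos = strpos($csv, $enclosure . ',');

// A match is found even though no cell is delimited by a comma.
var_dump($pos !== false);              // bool(true)

// The match is the second quote of the escaped pair "" plus the comma inside the cell.
echo substr($csv, $pos - 1, 3), "\n";  // "",
```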

So I made a quick fix that additionally checks that the character before the match is NOT the same as _enclosure:

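// detect delimiter

// quick fix for wrong delimiter detection in some files with doubled quotes (the `"",` case)
$pos = strpos($this->_csv, $this->_enclosure . ',' );
// read the previous character only if there is a match and it is not at offset 0;
// otherwise substr() with offset -1 would return the last character of the file
$prev_char = ( $pos !== false && $pos > 0 ) ? substr($this->_csv, $pos - 1, 1) : '';

if ( $pos !== false && $prev_char !== $this->_enclosure ) {
  $this->_delimiter = ',';
} // else ...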

It works for me and solves this one particular case, but it only inspects the first match, it doesn't cover other combinations of enclosures and delimiters inside cells, and it doesn't handle edge cases such as empty enclosed cells ("","cell 2"). Please consider implementing a proper fix in your class.
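
One possible direction, offered only as a rough sketch and not as the class's actual detection logic: count candidate delimiters that occur outside enclosed cells and pick the most frequent one. The function name guess_delimiter() and the candidate list are hypothetical, and the sketch assumes enclosures are escaped by doubling and PHP 7.3 or later (for array_key_first()).

```php
<?php
/**
 * Hypothetical helper, not part of the class discussed in this issue.
 * Counts candidate delimiters found outside enclosed cells and returns
 * the most frequent one, or null if none of them occurs.
 */
function guess_delimiter(string $csv, string $enclosure = '"', array $candidates = [',', ';', "\t", '|']): ?string
{
    $counts = array_fill_keys($candidates, 0);
    $in_quotes = false;
    $len = strlen($csv);

    for ($i = 0; $i < $len; $i++) {
        $ch = $csv[$i];
        if ($ch === $enclosure) {
            // An escaped pair ("") toggles the state twice, leaving it unchanged.
            $in_quotes = !$in_quotes;
        } elseif (!$in_quotes && isset($counts[$ch])) {
            // Only delimiters outside enclosed cells are counted.
            $counts[$ch]++;
        }
    }

    // Sort by count, highest first, and take the top candidate.
    arsort($counts);
    $best = array_key_first($counts);
    if ($best === null || $counts[$best] === 0) {
        return null;
    }
    return $best;
}

// The sample file from this report: the comma sits inside a cell and is ignored.
$sample = "\"cell 1\";\"cell 2\"\n\"cell 3\";\"cell \"\"4\"\", causing problem\"\n";
var_dump(guess_delimiter($sample));                // string(1) ";"

// The empty enclosed cell edge case mentioned above.
var_dump(guess_delimiter("\"\",\"cell 2\"\n"));    // string(1) ","
```

Because the scan tracks whether it is inside an enclosure, it handles both the `"",` case and empty enclosed cells. It does not try to deal with malformed files that contain unbalanced enclosures.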
