Access lines based on column name instead of its position #19

@Nicolas-Bouteille

Accessing a line's values by column name ($line['first_name']) is better than accessing them by position index ($line[0]), IMHO.
First, it is safer: if you ever get a position wrong, you may copy the wrong data into the wrong field. That can cause huge problems, and you will have to delete everything and import again. With column names, the worst case is a misspelled column name, so the data comes out blank, or you can stop execution right at the first line.
It is also much more convenient, since we can add a new column to the CSV file later on without having to put it at the end. We can insert the new column wherever it makes the most sense without rewriting and retesting all the import code.
It also makes the code more readable, easier to understand and easier to maintain.
Here's what I did to pass each line as an associative array (header name => line value):
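To illustrate the "stop execution right at the first line" point, here is a minimal sketch of a header check. The function name csvimport_missing_columns() and the sample column names are hypothetical, made up for this example; they are not part of the csvimport module.

```php
<?php
// Hypothetical helper, not part of the csvimport module: returns the
// required column names that are absent from the CSV header row.
function csvimport_missing_columns(array $headers, array $required): array {
  // array_diff() keeps the entries of $required that are not in $headers;
  // array_values() reindexes the result from 0.
  return array_values(array_diff($required, $headers));
}

// Sample data for illustration only ('emial' is a deliberate typo).
$headers = ['first_name', 'last_name', 'emial'];
$required = ['first_name', 'last_name', 'email'];

$missing = csvimport_missing_columns($headers, $required);
if ($missing) {
  // In a real import you would abort here, before processing any row.
  echo 'Missing columns: ' . implode(', ', $missing) . PHP_EOL;
}
// prints "Missing columns: email"
```

With positional indexes there is nothing comparable to check: a shifted column still yields a value at $line[2], just the wrong one.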

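$lines = [];
$is_headers_line = TRUE;
$headers = [];
// Compare against FALSE explicitly: that is what fgetcsv() returns at EOF.
while (($line = fgetcsv($handle, 4096, ';')) !== FALSE) {
  // fgetcsv() returns [NULL] for a blank line; skip it, otherwise
  // array_combine() below would get mismatched element counts.
  if ($line === [NULL]) {
    continue;
  }
  if ($is_headers_line) {
    // The first non-blank line holds the column names.
    $headers = $line;
    $is_headers_line = FALSE;
  }
  else {
    $lines[] = $line;
  }
}
$nb_lines = count($lines);
foreach ($lines as $line) {
  // Key each value by its column name (header => value).
  $line_assoc = array_combine($headers, $line);
  // Use base64_encode to ensure we don't overload the batch
  // processor by stuffing complex objects into it.
  // array_map() keeps the string keys when given a single array.
  $batch['operations'][] = [
    '\Drupal\csvimport\Batch\CsvImportBatch::csvimportImportLine',
    [array_map('base64_encode', $line_assoc), $csvupload, $nb_lines],
  ];
}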
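
For what it's worth, here is a self-contained sketch of the other side: decoding a line and reading it by column name, plus a check that reordering the columns does not change the result. Both function names and the sample CSV strings are hypothetical and only serve the example; the real csvimportImportLine callback may well look different.

```php
<?php
// Hypothetical stand-in for the decoding step in the batch callback.
function csvimport_decode_line(array $encoded_line): array {
  // array_map() preserves string keys when given a single array,
  // so the column names survive the round trip.
  return array_map('base64_decode', $encoded_line);
}

// Hypothetical helper: parse a CSV string into rows keyed by header name.
function csvimport_rows_from_string(string $csv): array {
  $handle = fopen('php://memory', 'r+');
  fwrite($handle, $csv);
  rewind($handle);
  $headers = fgetcsv($handle, 4096, ';', '"', '\\');
  $rows = [];
  while (($line = fgetcsv($handle, 4096, ';', '"', '\\')) !== FALSE) {
    // Skip blank lines, which fgetcsv() reports as [NULL].
    if ($line === [NULL]) {
      continue;
    }
    $rows[] = array_combine($headers, $line);
  }
  fclose($handle);
  return $rows;
}

// Same person, two different column layouts (sample data only).
$a = csvimport_rows_from_string("first_name;last_name\nAda;Lovelace\n");
$b = csvimport_rows_from_string("last_name;age;first_name\nLovelace;36;Ada\n");

// Encode as in the batch operation, then decode as the callback might.
$decoded = csvimport_decode_line(array_map('base64_encode', $b[0]));

echo $a[0]['first_name'], ' ', $decoded['first_name'], PHP_EOL;
// prints "Ada Ada"
```

The import code reads $line['first_name'] in both cases, so inserting the age column in the middle needs no change to it.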
