@hentzthename

Hi Nico, I think I found a bug while using this package.

Problem

When loading JSON data (where field order isn't guaranteed), subsequent loads could fail with:

ValueError: Target schema's field names are not matching the table's field names: ['a', 'b', 'c'], ['c', 'b', 'a']

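This occurs because PyArrow's table.cast() pairs columns by position rather than by name: it requires the table's field names to appear in exactly the same order as the target schema's. The cast therefore fails even though all field names and types match.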

Solution

Reorder the source table's columns to match the target schema's field order before casting.

Changes

  • tests/test_schema_casting.py: Add test for field ordering
  • src/dlt_iceberg/schema_casting.py: Add column reordering before table.cast() in cast_table_safe()

Add a regression test that demonstrates the field-ordering bug, where
cast_table_safe fails when the source table's fields are in a different
order than the target schema, even when all field names and types match.

Reorder source table columns to match the target schema order before
casting. PyArrow's cast() matches fields by position, not by name,
so tables with different field ordering would fail even when all
field names and types matched.