parquet-reader outputting invalid JSON #47668

@alexey-milovidov

Describe the bug, including details regarding any error messages, version, and platform.

Example output without the fix:

{
  "FileName": "/home/ubuntu/ClickHouse/tests/queries/0_stateless/data_parquet/68131.parquet",
  "Version": "1.0",
  "CreatedBy": "parquet-mr version 1.12.2 (build 77e30c8093386ec52c3cfa6c34b7ef3321322c94)",
  "TotalRows": "1",
  "NumberOfRowGroups": "1",
  "NumberOfRealColumns": "1",
  "NumberOfColumns": "1",
  "Columns": [
     { "Id": "0", "Name": "f", "PhysicalType": "INT32", "ConvertedType": "NONE", "LogicalType": {"Type": "None"} }
  ],
  "RowGroups": [
     {
       "Id": "0",  "TotalBytes": "43",  "TotalCompressedBytes": "43",  "Rows": "1",
       "ColumnChunks": [
          {"Id": "0", "Values": "2", "StatsSet": "True", "Stats": {"NumNulls": "0", "Max": "2", "Min": "1" },
           "Compression": "UNCOMPRESSED", "Encodings": "PLAIN", "UncompressedSize": "43", "CompressedSize": "43", ColumnIndex {"offset": "47", "length": "23"}", OffsetIndex {"offset": "70", "length": "10"}" }
        ]
     }
  ]
}

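It goes off the rails starting at ColumnIndex: the ColumnIndex and OffsetIndex keys are unquoted, have no colon before their value, and each value is followed by a stray double quote.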
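A quick way to confirm where a strict parser gives up is to feed it the offending fragment. The following is a minimal sketch in Python using only the standard json module. The `broken` string is a shortened copy of the column-chunk line from the output above. The `expected` string is only an assumption about what valid output would look like, not the actual output of the patched tool, and `first_json_error` is a hypothetical helper name.

```python
import json

# Shortened copy of the column-chunk object as printed by the tool.
broken = (
    '{"CompressedSize": "43", '
    'ColumnIndex {"offset": "47", "length": "23"}", '
    'OffsetIndex {"offset": "70", "length": "10"}" }'
)

# Assumed valid form: quoted keys, a colon before each value, no stray quotes.
expected = (
    '{"CompressedSize": "43", '
    '"ColumnIndex": {"offset": "47", "length": "23"}, '
    '"OffsetIndex": {"offset": "70", "length": "10"} }'
)


def first_json_error(text):
    """Return None if text parses as JSON, else (message, character offset)."""
    try:
        json.loads(text)
    except json.JSONDecodeError as exc:
        return exc.msg, exc.pos
    return None


broken_error = first_json_error(broken)
expected_error = first_json_error(expected)

if broken_error is not None:
    message, pos = broken_error
    # Show the text at the point where the parser stopped.
    print(f"invalid JSON: {message} at offset {pos}: {broken[pos:pos + 20]!r}")
print("assumed valid form parses:", expected_error is None)
```

The same check could be applied to the tool's full output in a test to catch regressions of this kind.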

ClickHouse uses this tool in test 00900_long_parquet_load (run manually to get table schemas).

The fix is here: ClickHouse#72

Component(s)

C++
