Description
Describe the bug, including details regarding any error messages, version, and platform.
Example output without the fix:
{
"FileName": "/home/ubuntu/ClickHouse/tests/queries/0_stateless/data_parquet/68131.parquet",
"Version": "1.0",
"CreatedBy": "parquet-mr version 1.12.2 (build 77e30c8093386ec52c3cfa6c34b7ef3321322c94)",
"TotalRows": "1",
"NumberOfRowGroups": "1",
"NumberOfRealColumns": "1",
"NumberOfColumns": "1",
"Columns": [
{ "Id": "0", "Name": "f", "PhysicalType": "INT32", "ConvertedType": "NONE", "LogicalType": {"Type": "None"} }
],
"RowGroups": [
{
"Id": "0", "TotalBytes": "43", "TotalCompressedBytes": "43", "Rows": "1",
"ColumnChunks": [
{"Id": "0", "Values": "2", "StatsSet": "True", "Stats": {"NumNulls": "0", "Max": "2", "Min": "1" },
"Compression": "UNCOMPRESSED", "Encodings": "PLAIN", "UncompressedSize": "43", "CompressedSize": "43", ColumnIndex {"offset": "47", "length": "23"}", OffsetIndex {"offset": "70", "length": "10"}" }
]
}
]
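}
It goes off the rails starting at ColumnIndex: the ColumnIndex and OffsetIndex keys are unquoted and have no colon before their objects, and each object is followed by a stray quote, so the output is not valid JSON.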
ClickHouse uses this tool in test 00900_long_parquet_load (run manually to get table schemas).
The fix is here: ClickHouse#72
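For illustration, here is a minimal sketch of how the two index entries could be written so that they parse as JSON. It is not the code from the linked fix: `FormatIndexLocation` is a hypothetical helper, and quoting the numbers as strings is an assumption made to match the style of the other fields in the output above.

```cpp
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical helper, not part of the Parquet C++ library.
// Formats one index location as a well-formed JSON member:
// the key is quoted, followed by a colon, and no stray quote
// is emitted after the closing brace.
std::string FormatIndexLocation(const std::string& key, int64_t offset,
                                int32_t length) {
  std::ostringstream out;
  out << "\"" << key << "\": {"
      << "\"offset\": \"" << offset << "\", "
      << "\"length\": \"" << length << "\"}";
  return out.str();
}
```

With the values from the example, `FormatIndexLocation("ColumnIndex", 47, 23)` would produce `"ColumnIndex": {"offset": "47", "length": "23"}`, which can follow `"CompressedSize": "43", ` without breaking the enclosing object.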
Component(s)
C++