@al13n321
Member

Example output without this change:

{
  "FileName": "/home/ubuntu/ClickHouse/tests/queries/0_stateless/data_parquet/68131.parquet",
  "Version": "1.0",
  "CreatedBy": "parquet-mr version 1.12.2 (build 77e30c8093386ec52c3cfa6c34b7ef3321322c94)",
  "TotalRows": "1",
  "NumberOfRowGroups": "1",
  "NumberOfRealColumns": "1",
  "NumberOfColumns": "1",
  "Columns": [
     { "Id": "0", "Name": "f", "PhysicalType": "INT32", "ConvertedType": "NONE", "LogicalType": {"Type": "None"} }
  ],
  "RowGroups": [
     {
       "Id": "0",  "TotalBytes": "43",  "TotalCompressedBytes": "43",  "Rows": "1",
       "ColumnChunks": [
          {"Id": "0", "Values": "2", "StatsSet": "True", "Stats": {"NumNulls": "0", "Max": "2", "Min": "1" },
           "Compression": "UNCOMPRESSED", "Encodings": "PLAIN", "UncompressedSize": "43", "CompressedSize": "43", ColumnIndex {"offset": "47", "length": "23"}", OffsetIndex {"offset": "70", "length": "10"}" }
        ]
     }
  ]
}

It goes off the rails starting at ColumnIndex.
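To make the failure concrete, here is a minimal sketch that feeds the column chunk entry to a strict JSON parser and reports where parsing stops. The helper name `first_json_error` is hypothetical, and the fragment is copied from the output above with the Stats fields trimmed for brevity; this is only an illustration of the symptom, not part of the patch.

```python
import json
from typing import Optional


def first_json_error(text: str) -> Optional[int]:
    """Return the character offset of the first JSON syntax error, or None if text parses."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        return err.pos
    return None


# Column chunk entry from the output above (Stats fields trimmed for brevity).
chunk = (
    '{"Id": "0", "Values": "2", "Compression": "UNCOMPRESSED", '
    '"UncompressedSize": "43", "CompressedSize": "43", '
    'ColumnIndex {"offset": "47", "length": "23"}", '
    'OffsetIndex {"offset": "70", "length": "10"}" }'
)

pos = first_json_error(chunk)
if pos is not None:
    # Show a short window of text starting at the point where the parser gave up.
    print(f"invalid JSON at offset {pos}: {chunk[pos:pos + 20]!r}")
```

The parser rejects the entry at the unquoted `ColumnIndex` key, which matches where the output visibly breaks.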

ClickHouse uses this tool in test 00900_long_parquet_load (it is run manually to get table schemas).

@github-actions

Thanks for opening a pull request!

If this is not a minor PR, could you open an issue for this pull request on GitHub? https://github.com/apache/arrow/issues/new/choose

Opening GitHub issues ahead of time contributes to the Openness of the Apache Arrow project.

Then could you also rename the pull request title in the following format?

GH-${GITHUB_ISSUE_ID}: [${COMPONENT}] ${SUMMARY}

or

MINOR: [${COMPONENT}] ${SUMMARY}


@al13n321
Member Author

We should stop using this tool instead.
