The Parquet serializer has two issues, both illustrated by the following API query:
https://data.ssb.no/api/pxwebapi/v2/tables/14216/data/?valuecodes[TettSted]=0801&valuecodes[ContentsCode]=Areal,Bosatte&valuecodes[Tid]=2025,2024&outputFormat=parquet
Resulting Parquet
| år | timestamp | tettsted | ContentsCode_Areal | ContentsCode_Areal_symbol | ContentsCode_Bosatte | ContentsCode_Bosatte_symbol |
|---|---|---|---|---|---|---|
| 2024 | 2024-01-01T00:00:00.000 | 0801 | 275,87 | | 1110887 | |
| 2024 | 2024-01-01T00:00:00.000 | 0801 | 1110887 | | 276,3 | |
| 2025 | 2025-01-01T00:00:00.000 | 0801 | 276,3 | | 1098061 | |
| 2025 | 2025-01-01T00:00:00.000 | 0801 | 1098061 | | 1098061 | |
- Selecting two or more contents (`Areal` and `Bosatte`) creates too many rows in the resulting Parquet file. In this case there should have been two rows.
- Selecting years `2025,2024` is not the same as selecting `2024,2025`. In this case the first row actually holds the 2025 figures. The reason is that the Parquet serializer uses `TIMEVAL`, and the px output below shows that `TIMEVAL` is the same when the years are swapped. The API does not sort any valuecodes. This is intentional in the new API.
$ curl "https://data.ssb.no/api/pxwebapi/v2/tables/14216/data/?valuecodes%5bTettSted%5d=0801&valuecodes%5bContentsCode%5d=Areal,Bosatte&valuecodes%5bTid%5d=2025,2024&outputFormat=px" -s -i | grep -E '(TIMEVAL|CODES|VALUES)'
VALUES("tettsted")="Oslo";
VALUES("statistikkvariabel")="Areal av tettsted (km?)","Bosatte";
VALUES("år")="2025","2024";
TIMEVAL("år")=TLIST(A1),"2024","2025";
CODES("tettsted")="0801";
CODES("statistikkvariabel")="Areal","Bosatte";
CODES("år")="2025","2024";

$ curl "https://data.ssb.no/api/pxwebapi/v2/tables/14216/data/?valuecodes%5bTettSted%5d=0801&valuecodes%5bContentsCode%5d=Areal,Bosatte&valuecodes%5bTid%5d=2024,2025&outputFormat=px" -s -i | grep -E '(TIMEVAL|CODES|VALUES)'
VALUES("tettsted")="Oslo";
VALUES("statistikkvariabel")="Areal av tettsted (km?)","Bosatte";
VALUES("år")="2024","2025";
TIMEVAL("år")=TLIST(A1),"2024","2025";
CODES("tettsted")="0801";
CODES("statistikkvariabel")="Areal","Bosatte";
CODES("år")="2024","2025";

The first issue, too many rows, is a clear bug. I have changed the tests and will try to fix the bug in PxTools/PCAxis.Serializers#181.
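
To make the expected shape concrete, here is a minimal Python sketch of the pivot I would expect: one row per (year, tettsted) combination, with each content code as its own column. It is not the serializer's code. The function name `pivot_rows` is hypothetical, and the flat value order (year varying slowest, content code varying fastest) is an assumption inferred from the values in the Parquet output above.

```python
from itertools import product


def pivot_rows(years, regions, contents, values):
    """Build one row per (year, region); each content code becomes a column.

    Assumes `values` is flat with the content code varying fastest
    (an assumption for this sketch, not a statement about the PX layout).
    """
    expected = len(years) * len(regions) * len(contents)
    if len(values) != expected:
        raise ValueError(f"expected {expected} values, got {len(values)}")

    rows = []
    flat = iter(values)
    for year, region in product(years, regions):
        row = {"år": year, "tettsted": region}
        for code in contents:
            # Consume exactly one value per content column.
            row[f"ContentsCode_{code}"] = next(flat)
        rows.append(row)
    return rows


# Years in the order requested by the query (2025 first).
rows = pivot_rows(
    years=["2025", "2024"],
    regions=["0801"],
    contents=["Areal", "Bosatte"],
    values=[275.87, 1110887, 276.3, 1098061],
)
for row in rows:
    print(row)
```

With four values and two content codes this yields two rows, not four, because every row consumes one value per content column.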
For the second issue, it is not clear whether the bug is in the Parquet serializer or in PxWebApi for not sorting time in ascending order.
Is this a valid PX file according to the TIMEVAL documentation?
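
The mismatch can be sketched in a few lines of Python. The lists are copied from the first curl output. The helper names `label_by_position` and `timeval_matches_codes` are hypothetical, and the data slices are placeholders. This only shows how positional labelling differs between the two keywords, not how the serializer is actually implemented.

```python
# Keyword values from the first px output (request order 2025,2024).
codes = ["2025", "2024"]      # CODES("år")
timeval = ["2024", "2025"]    # TIMEVAL("år"), identical for both request orders


def label_by_position(labels, slices):
    """Attach the i-th label to the i-th data slice."""
    return dict(zip(labels, slices))


def timeval_matches_codes(codes, timeval):
    """True when both keywords list the time periods in the same order."""
    return list(codes) == list(timeval)


# Placeholder slices in the order the data appears in the file.
slices = ["first data slice", "second data slice"]

by_codes = label_by_position(codes, slices)
by_timeval = label_by_position(timeval, slices)

print(by_codes["2025"])    # first data slice
print(by_timeval["2025"])  # second data slice
print(timeval_matches_codes(codes, timeval))  # False
```

A check along the lines of `timeval_matches_codes` would flag the file whichever side ends up owning the fix.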