Parquet decoder
The Parquet decoder stores the decoded output as Parquet files. For more information on the Parquet format itself, see: https://parquet.apache.org/
Note
All major programming languages have Parquet support, for example through the Apache Arrow libraries: https://arrow.apache.org/docs/
Tool support
Examples of tools that support the Parquet format:
Tad (Parquet file viewer)
ClickHouse (Parquet as database)
Grafana (requires a ClickHouse server)
Output
The Parquet output data-schema always uses the following structure:
One timestamp value (t) using datatype Int64 (MICROS) and snappy compression
One or more signal values using datatype double and snappy compression
The row group size is set to 1,000,000 (1e6) rows.
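Assuming the row group size counts rows and that every row group except the last one is filled completely, the number of row groups in a file follows from a ceiling division. The sketch below is illustrative only; the function names are hypothetical and not part of the decoder.

```python
# Row group size stated in the documentation (assumed to be a row count).
ROW_GROUP_SIZE = 1_000_000


def row_group_count(n_rows: int, row_group_size: int = ROW_GROUP_SIZE) -> int:
    """Number of row groups needed when all groups but the last are full."""
    if n_rows <= 0:
        return 0
    # Ceiling division without floating point.
    return -(-n_rows // row_group_size)


def row_group_bounds(n_rows: int, row_group_size: int = ROW_GROUP_SIZE):
    """Yield (start, stop) row index ranges, one pair per row group."""
    for start in range(0, n_rows, row_group_size):
        yield start, min(start + row_group_size, n_rows)


print(row_group_count(2_500_000))         # 3
print(list(row_group_bounds(2_500_000)))  # last group holds the remaining 500,000 rows
```

Under these assumptions, a single output file holding 10,000,000 records would contain 10 row groups.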
The signal names are derived from the database used for decoding, as shown in the example below:
t Speed SpeedAccuracy SpeedValid
____________________ _____ _____________ __________
22-Apr-2022 14:14:43 0.01 2.006 1
22-Apr-2022 14:14:44 0.01 2.152 1
22-Apr-2022 14:14:45 0.01 2.290 1
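The example above displays t as a date and time, while the file itself stores it as an Int64 count of microseconds. A minimal conversion sketch, assuming t counts microseconds since the Unix epoch in UTC (the epoch and time zone are assumptions, not stated above):

```python
from datetime import datetime, timedelta, timezone

# Assumption: t counts microseconds since the Unix epoch (1970-01-01), in UTC.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def micros_to_datetime(t_us: int) -> datetime:
    """Convert an Int64 microsecond timestamp into a timezone-aware datetime."""
    # Integer timedelta arithmetic keeps full microsecond precision,
    # unlike converting to float seconds first.
    return EPOCH + timedelta(microseconds=t_us)


def datetime_to_micros(dt: datetime) -> int:
    """Inverse conversion: timezone-aware datetime back to Int64 microseconds."""
    return (dt - EPOCH) // timedelta(microseconds=1)


def format_like_example(dt: datetime) -> str:
    """Render a datetime in the style of the example output."""
    return dt.strftime("%d-%b-%Y %H:%M:%S")


t_us = 1_650_636_883_000_000
print(format_like_example(micros_to_datetime(t_us)))  # 22-Apr-2022 14:14:43
```

Keeping the value as an integer until the final conversion avoids the rounding that float seconds would introduce at microsecond resolution.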
If a value falls outside the MIN/MAX range defined in the database, it is included in the output as NaN.
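The range rule above can be sketched as a small per-value check. This assumes the MIN and MAX limits are inclusive; `apply_limits` is a hypothetical helper, not a function of the decoder.

```python
import math


def apply_limits(value: float, minimum: float, maximum: float) -> float:
    """Return value if it lies within [minimum, maximum], otherwise NaN.

    Assumption: the database limits are inclusive, so values equal to
    MIN or MAX are kept. A NaN input also yields NaN, because every
    comparison with NaN is False.
    """
    if minimum <= value <= maximum:
        return value
    return math.nan


print(apply_limits(2.0, 0.0, 10.0))   # 2.0
print(apply_limits(11.0, 0.0, 10.0))  # nan
```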
Warning
Output records are skipped if all signal values are NaN.
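The skipping rule in the warning above amounts to a filter over output records. A minimal sketch, assuming each record is a timestamp paired with its signal values; the record layout and function name are hypothetical:

```python
import math
from typing import Iterable, Iterator, Sequence, Tuple

# Hypothetical record layout: (t in microseconds, signal values).
Record = Tuple[int, Sequence[float]]


def drop_all_nan_records(records: Iterable[Record]) -> Iterator[Record]:
    """Yield only records in which at least one signal value is not NaN."""
    for t, values in records:
        if not all(math.isnan(v) for v in values):
            yield t, values


records = [
    (1, [1.0, math.nan]),       # kept: one valid value
    (2, [math.nan, math.nan]),  # skipped: all values are NaN
    (3, [0.0, 2.0]),            # kept
]
print([t for t, _ in drop_all_nan_records(records)])  # [1, 3]
```

Note that a record is kept even when only some of its values are NaN, so downstream tools should still expect NaN entries in individual columns.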
Performance
The table below lists performance numbers for different input/output scenarios.
| Input records | Input size total (MB) | Input files (#) | Output size total (MB) | Output files (#) | Execution time (s) | Peak memory (MB) | Note |
|---|---|---|---|---|---|---|---|
| 10,000,000 | 190 | 1 | 117 | 1 | 7 | 309 | One input file, one output file |
| 10,000,000 | 292 | 1000 | 114 | 1 | 8 | 116 | Many input files, one output file |
| 10,000,000 | 190 | 1 | 151 | 1000 | 11 | 1131 | One input file, many output files |
Note
All tests were performed using default input arguments on an Intel Xeon @ 3.4 GHz running Linux (x86_64).
Note
All tests use random payload data, which makes Parquet compression less effective. With real-world data, output sizes can in some cases be significantly smaller.
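To compare the scenarios in the performance table, the raw numbers can be turned into approximate throughput and an output-to-input size ratio. The sketch below uses only values from the table; since execution times are given in whole seconds, the derived throughput is approximate.

```python
# (input records, input MB, output MB, execution time s) per scenario,
# copied from the performance table.
SCENARIOS = {
    "One input file, one output file": (10_000_000, 190, 117, 7),
    "Many input files, one output file": (10_000_000, 292, 114, 8),
    "One input file, many output files": (10_000_000, 190, 151, 11),
}


def records_per_second(records: int, seconds: float) -> float:
    """Average decoding throughput over the whole run."""
    return records / seconds


def size_ratio(output_mb: float, input_mb: float) -> float:
    """Output size as a fraction of input size (lower means smaller output)."""
    return output_mb / input_mb


for name, (records, in_mb, out_mb, secs) in SCENARIOS.items():
    print(f"{name}: {records_per_second(records, secs):,.0f} records/s, "
          f"output/input size {size_ratio(out_mb, in_mb):.2f}")
```

For the first scenario this gives roughly 1.43 million records per second and an output about 0.62 times the input size; as noted above, real data may compress better than the random payloads used here.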
Changelog
2.4.0
Change CLI version output
2.3.2
Fixed
Issue where merging PGN source/destination addresses sometimes resulted in invalid output files
2.3.1
Added
DBC, support for float and double data types
Option to merge PGN source-address outputs
Option to merge PGN destination-address outputs
Changed
DBC, messages named “VECTOR__INDEPENDENT_SIG_MSG” are now ignored
DBC, empty attribute “BusType” now defaults to CAN-bus
Improved error text on invalid output dir
Improved error text on colliding message keys
2.2.2
Added
Support for signed data type
2.2.1
Added
Input argument dbdump to dump the loaded database to a JSON file
Fixed
Better handling of special characters in DBC-file
2.1.1
First release