MF4 to Parquet decoder
The MF4 to Parquet decoder stores the decoded output as Parquet files. For more information on the Parquet format itself, see: https://parquet.apache.org/
Note
All major programming languages have Parquet support, see: https://arrow.apache.org/docs/
Tool support
Examples of specific tools supporting the Parquet format:
Tad (Parquet file viewer)
ClickHouse (Parquet as database)
Grafana (Requires ClickHouse server)
Output
The Parquet output data-schema always uses the following structure:
One timestamp value (t) using datatype Int64 (MICROS) and snappy compression
One or more signal values using datatype double / NULL and snappy compression
The row-group-size is set to 1 000 000 (1e6).
The signal names are constructed from the database used for decoding, as in the example below:
t Speed SpeedAccuracy SpeedValid
____________________ _____ _____________ __________
22-Apr-2022 14:14:43 0.01 2.006 1
22-Apr-2022 14:14:44 0.01 2.152 1
22-Apr-2022 14:14:45 0.01 2.290 1
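The example above displays t as a date/time, while the file itself stores t as an Int64 count of microseconds (MICROS). The following is a minimal sketch of converting between the two with the Python standard library. It assumes the microseconds are counted from the Unix epoch in UTC; the epoch and time zone are assumptions, since the schema description above only states Int64 (MICROS).

```python
from datetime import datetime, timedelta, timezone

# Assumption: t counts microseconds since the Unix epoch (1970-01-01 00:00:00 UTC).
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
ONE_MICROSECOND = timedelta(microseconds=1)


def micros_to_datetime(t_us: int) -> datetime:
    """Convert an Int64 microsecond timestamp to a timezone-aware UTC datetime."""
    # Integer timedelta arithmetic avoids float rounding of the microseconds
    return EPOCH + timedelta(microseconds=t_us)


def datetime_to_micros(dt: datetime) -> int:
    """Inverse conversion, e.g. to build a time filter on column t.

    dt must be timezone-aware, otherwise the subtraction raises TypeError.
    """
    # timedelta // timedelta yields an exact int
    return (dt - EPOCH) // ONE_MICROSECOND


if __name__ == "__main__":
    t_us = datetime_to_micros(datetime(2022, 4, 22, 14, 14, 43, tzinfo=timezone.utc))
    # Same display style as the example table (English month names assume the C locale)
    print(t_us, micros_to_datetime(t_us).strftime("%d-%b-%Y %H:%M:%S"))
```

If the column carries the Parquet TIMESTAMP (MICROS) logical type, readers such as pyarrow typically decode it to a timestamp directly, so manual conversion is mainly useful when handling the raw integer values.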
Signal values outside the MIN/MAX range defined in the database are included in the output as NULL.
Warning
Output records are skipped if all signal values are NULL.
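The two rules above (out-of-range values become NULL, and records whose signal values are all NULL are skipped) can be illustrated with a small Python sketch. This is a hypothetical illustration of the described behavior, not the decoder's actual implementation: the `SignalLimits` type and `build_row` function are invented for this example, `None` stands in for a Parquet NULL, and treating MIN/MAX as inclusive bounds is an assumption.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class SignalLimits:
    """Hypothetical MIN/MAX limits for one signal, as read from the database."""
    minimum: float
    maximum: float


def build_row(
    values: Dict[str, float], limits: Dict[str, SignalLimits]
) -> Optional[Dict[str, Optional[float]]]:
    """Apply the MIN/MAX rule to one record; return None if the record is skipped."""
    row: Dict[str, Optional[float]] = {}
    for name, value in values.items():
        lim = limits[name]
        # Assumption: bounds are inclusive. Out-of-range values are kept as NULL (None).
        row[name] = value if lim.minimum <= value <= lim.maximum else None
    # A record whose signal values are all NULL is not written
    if all(v is None for v in row.values()):
        return None
    return row
```

Keeping an out-of-range value as NULL (instead of dropping the whole record) preserves the other valid signals sampled at the same timestamp.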
Performance
The table below lists performance figures for different input/output scenarios.
| Input records | Input size total (MB) | Input files (#) | Output size total (MB) | Output files (#) | Execution time (s) | Peak memory (MB) | Note |
|---|---|---|---|---|---|---|---|
| 10 000 000 | 190 | 1 | 117 | 1 | 7 | 309 | One input file, one output file |
| 10 000 000 | 292 | 1000 | 114 | 1 | 8 | 116 | Many input files, one output file |
| 10 000 000 | 190 | 1 | 151 | 1000 | 11 | 1131 | One input file, many output files |
Note
All tests have been performed using default input arguments on an Intel Xeon @ 3.4 GHz running Linux (x86_64).
Note
All tests use random payload data, which makes Parquet compression less effective. With real-world data, output sizes can in some cases be significantly smaller.
Changelog
All notable changes to this project will be documented in this file.
## [24.12.19]
### Added
- Support for mdf2mdf output format from version 24.12.19
### Changed
- Updated MDF reader / writer based on "mdflib" by ihedvall
- Parquet writer updated to version 18.0.0
### Fixed
- Parquet writer performance restored (issue introduced in 24.10.17)
- Fix for packed error frames in MUX-TP
## [24.10.17]
### Added
- Support for transport protocols (ISO-TP, J1939-21, NMEA-TP, MUX-TP)
### Changed
- Versioning scheme changed from SemVer to CalVer
- Default verbosity level changed to 2
- Type of out-of-range signal values changed from double *NaN* to *NULL* (changes the Parquet schema)
- Restriction on max 5 DBC-files per interface/channel removed
### Known issues
- Reduced Parquet write speed
## [2.3.2]
### Fixed
- Issue with merge of PGN source/destination addresses sometimes resulting in invalid output files
## [2.3.1]
### Added
- Support for float and double data types (defined in DBC file)
- Option to merge PGN source-address outputs
- Option to merge PGN destination-address outputs
### Changed
- DBC file message with name "VECTOR__INDEPENDENT_SIG_MSG" ignored
- DBC file empty attribute "BusType" now defaults to "CAN-bus"
- Improved error text on invalid output dir
- Improved error text on colliding message keys
## [2.2.2]
### Added
- Support for "signed" data type
- Support for compacted JSON password file
## [2.2.1]
### Added
- Input argument "dbdump" to dump loaded database to JSON-file
### Fixed
- Better handling of special characters in DBC-file
Download
Windows AMD64 / x86-64 (64-bit)
- 24.12.19.zip (MD5: 25fd6978483e89d46bf97f6ea0f7bdad)
- 24.10.17.zip (MD5: 02aa987185ad1a7c9cd30bcadb10ca9b)
- 2.3.2.zip (MD5: fb0950bc0808df8af410634e536fa9bb)
- 2.3.1.zip (MD5: b754b0160de7c8bf9efec48553576641)
- 2.2.2.zip (MD5: 4261251ae62138db1561d37ba6da82c2)
- 2.2.1.zip (MD5: e34aa2359fd642d77794efbc071042b1)
- 2.1.1.zip (MD5: 4216d7aaffaf98a4e135c70eaae1325e)
Linux AMD64 / x86-64 (64-bit)
- 24.12.19.zip (MD5: b8fb2e988142ce72ec37e3019725ff52)
- 24.10.17.zip (MD5: 02b93dcd49e36c7656e1d78e98969cfd)
- 2.3.2.zip (MD5: e219f190a1d1613a8c6076121c91c625)
- 2.3.1.zip (MD5: 4752bc58a0e3a5f3b850af68ea8663ea)
- 2.2.2.zip (MD5: 463d995342a2fbf5a6d0b319f9cc0ade)
- 2.2.1.zip (MD5: 52cf0fcc3c31fd606a742d5176fa22f8)
- 2.1.1.zip (MD5: 15f959910cf0fef97d52f54c419e2882)
Linux ARM64 (64-bit)
- 24.12.19.zip (MD5: 86297415862c199f44e6a2823e2a2592)
- 24.10.17.zip (MD5: 1c42b0db5075c9e23373f37e327a44f3)
- 2.3.2.zip (MD5: e9f0ee32716bbcaafc3f93b13bd6d2f1)
- 2.3.1.zip (MD5: e956cba794066c85e72fe7d2cafbb4f0)
- 2.2.2.zip (MD5: 018922344724c49a521179ff24ca48a9)
- 2.2.1.zip (MD5: 3c20752cec36af96fdee4014c7f0fbac)
- 2.1.1.zip (MD5: a3853b90494c8e038fd275b9d5b49326)
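To check a downloaded archive against the MD5 checksums listed above, the file can be hashed locally. The following is a minimal sketch using Python's standard `hashlib`. The script name `verify_md5.py` in the usage line is hypothetical. MD5 is suitable for detecting accidental corruption, but it is not a strong protection against deliberate tampering.

```python
import hashlib
import sys
from pathlib import Path


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as f:
        # Fixed-size chunks keep memory use low for large archives
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_md5(path: Path, expected: str) -> bool:
    """Compare the file's MD5 with a published checksum (case/whitespace tolerant)."""
    return md5_of_file(path) == expected.strip().lower()


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python verify_md5.py <zip-file> <expected-md5>
    ok = verify_md5(Path(sys.argv[1]), sys.argv[2])
    print("OK" if ok else "MISMATCH")
    sys.exit(0 if ok else 1)
```

For example, `python verify_md5.py 24.12.19.zip 25fd6978483e89d46bf97f6ea0f7bdad` checks the Windows 24.12.19 archive. On Linux, `md5sum <file>` prints the same digest.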