MF4 to Parquet decoder

The MF4 to Parquet decoder stores the decoded output as Parquet files. For more information on the Parquet format itself, see: https://parquet.apache.org/

Note

Most major programming languages support Parquet, for example through Apache Arrow; see: https://arrow.apache.org/docs/

Tool support

Examples of some specific tools/languages supporting the Parquet format:

Tad (Parquet file viewer)
Matlab
Python
ClickHouse (Parquet as database)
Grafana (Requires ClickHouse server)
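As an illustration of the Python route, the following is a minimal sketch that loads one decoded file with the pyarrow package. It assumes pyarrow is installed; the helper names read_decoded and summarize are hypothetical and not part of the decoder.

```python
def read_decoded(path):
    """Load one decoded Parquet file into a pyarrow Table."""
    # Imported lazily so this module still loads when pyarrow is absent.
    import pyarrow.parquet as pq

    return pq.read_table(str(path))


def summarize(table):
    """Return (row count, column names) for a loaded table."""
    return table.num_rows, table.column_names
```

Calling summarize(read_decoded("output.parquet")) on a decoded file (the file name here is a placeholder) should list the timestamp column t followed by the signal columns.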

Output

The Parquet output data-schema always uses the following structure:

One timestamp value (t) using datatype Int64 (MICROS) and snappy compression
One or more signal values using datatype double / NULL and snappy compression

The row-group-size is set to 1 000 000 (1e6).
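Tools that do not interpret the timestamp type automatically will expose t as a raw Int64. The sketch below converts such a value to a readable time using only the standard library. It assumes t counts microseconds since the Unix epoch in UTC, which the schema description above does not state explicitly; the function names are illustrative.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def micros_to_datetime(t_us):
    """Convert an Int64 microsecond timestamp to an aware UTC datetime."""
    # timedelta arithmetic is exact; converting to float seconds could round.
    return EPOCH + timedelta(microseconds=t_us)


def datetime_to_micros(dt):
    """Inverse conversion: aware datetime to integer microseconds since the epoch."""
    return (dt - EPOCH) // timedelta(microseconds=1)
```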

The signal names are constructed from the database used for decoding, as in the example below:

         t              Speed    SpeedAccuracy    SpeedValid
____________________    _____    _____________    __________

22-Apr-2022 14:14:43    0.01         2.006            1
22-Apr-2022 14:14:44    0.01         2.152            1
22-Apr-2022 14:14:45    0.01         2.290            1

Values that fall outside the MIN/MAX range defined in the database are written to the output as NULL.

Warning

Output records are skipped if all values are NULL.
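The two rules above (out-of-range values become NULL, and records with only NULL values are dropped) can be modelled as follows. This is an illustration of the described behaviour, not the decoder's actual implementation. The limits dictionary, the function names, and the choice to treat the MIN/MAX bounds as inclusive are assumptions; None stands in for NULL.

```python
def apply_limits(value, lo, hi):
    """Return value if lo <= value <= hi, otherwise None (standing in for NULL)."""
    return value if lo <= value <= hi else None


def filter_records(records, limits):
    """Null out-of-range signals, then drop records whose signals are all None.

    records: iterable of (t, {signal_name: value})
    limits:  {signal_name: (min, max)}, hypothetical database limits
    """
    kept = []
    for t, signals in records:
        checked = {
            name: apply_limits(value, *limits[name])
            for name, value in signals.items()
        }
        # Keep the record only if at least one signal survived the range check.
        if any(v is not None for v in checked.values()):
            kept.append((t, checked))
    return kept
```

A record with one valid and one out-of-range signal is kept with the invalid signal set to None, while a record in which every signal is out of range is removed entirely.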

Performance

The table below provides performance numbers for different input/output scenarios.

Input records	Input size total (MB)	Input files (#)	Output size total (MB)	Output files (#)	Exe time (s)	Exe peak memory (MB)	Note
10 000 000	190	1	117	1	7	309	One input file one output file
10 000 000	292	1000	114	1	8	116	Many input files one output file
10 000 000	190	1	151	1000	11	1131	One input file many output files

Note

All tests have been performed using default input arguments on an Intel Xeon @ 3.4 GHz running Linux (x86_64).

Note

All tests use random payload data, which makes the Parquet compression less effective. With real-world data, output sizes can be significantly smaller.
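To compare the scenarios more directly, throughput and size ratio can be derived from the table above. The sketch below copies the record counts, sizes, and execution times from the table; the helper name derived_metrics is illustrative.

```python
# (records, input MB, output MB, seconds), copied from the performance table
SCENARIOS = {
    "one input, one output": (10_000_000, 190, 117, 7),
    "many inputs, one output": (10_000_000, 292, 114, 8),
    "one input, many outputs": (10_000_000, 190, 151, 11),
}


def derived_metrics(records, input_mb, output_mb, seconds):
    """Return (records per second, output size as a fraction of input size)."""
    return records / seconds, output_mb / input_mb


for name, row in SCENARIOS.items():
    rate, ratio = derived_metrics(*row)
    print(f"{name}: {rate:,.0f} records/s, output/input = {ratio:.2f}")
```

On these numbers, throughput lies roughly between 0.9 and 1.4 million records per second, and the output is smaller than the input in all three scenarios.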

Changelog

All notable changes to this project will be documented in this file.

## [24.12.19]

### Added

- Support for mdf2mdf output format

### Changed

- Updated MDF reader / writer based on "mdflib" by ihedvall
- Parquet writer updated to version 18.0.0

### Fixed

- Parquet writer performance restored (issue introduced in 24.10.17)
- Fix for packed error frames in MUX-TP

## [24.10.17]

### Added

- Support for transport protocols (ISO-TP, J1939-21, NMEA-TP, MUX-TP)

### Changed

- Versioning scheme changed from SemVer to CalVer
- Default verbosity level changed to 2
- Type of out-of-range signal values changed from double *NaN* to *NULL* (change to parquet schema)
- Restriction on max 5 DBC-files per interface/channel removed

### Known issues

- Reduced parquet write speed

## [2.3.2]

### Fixed

- Issue with merge of PGN source/destination addresses sometimes resulting in invalid output files

## [2.3.1]

### Added

- Support for float and double data types (defined in DBC file)
- Option to merge PGN source-address outputs
- Option to merge PGN destination-address outputs

### Changed

- DBC file message with name "VECTOR__INDEPENDENT_SIG_MSG" ignored
- DBC file empty attribute "BusType" now defaults to "CAN-bus"
- Improved error text on invalid output dir
- Improved error text on colliding message keys

## [2.2.2]

### Added

- Support for "signed" data type
- Support for compacted JSON password file

## [2.2.1]

### Added

- Input argument "dbdump" to dump loaded database to JSON-file

### Fixed

- Better handling of special characters in DBC-file

Download

Windows AMD64 / x86-64 (64-bit)

24.12.19.zip (MD5: 25fd6978483e89d46bf97f6ea0f7bdad)
24.10.17.zip (MD5: 02aa987185ad1a7c9cd30bcadb10ca9b)
2.3.2.zip (MD5: fb0950bc0808df8af410634e536fa9bb)
2.3.1.zip (MD5: b754b0160de7c8bf9efec48553576641)
2.2.2.zip (MD5: 4261251ae62138db1561d37ba6da82c2)
2.2.1.zip (MD5: e34aa2359fd642d77794efbc071042b1)
2.1.1.zip (MD5: 4216d7aaffaf98a4e135c70eaae1325e)

Linux AMD64 / x86-64 (64-bit)

24.12.19.zip (MD5: b8fb2e988142ce72ec37e3019725ff52)
24.10.17.zip (MD5: 02b93dcd49e36c7656e1d78e98969cfd)
2.3.2.zip (MD5: e219f190a1d1613a8c6076121c91c625)
2.3.1.zip (MD5: 4752bc58a0e3a5f3b850af68ea8663ea)
2.2.2.zip (MD5: 463d995342a2fbf5a6d0b319f9cc0ade)
2.2.1.zip (MD5: 52cf0fcc3c31fd606a742d5176fa22f8)
2.1.1.zip (MD5: 15f959910cf0fef97d52f54c419e2882)

Linux ARM64 (64-bit)

24.12.19.zip (MD5: 86297415862c199f44e6a2823e2a2592)
24.10.17.zip (MD5: 1c42b0db5075c9e23373f37e327a44f3)
2.3.2.zip (MD5: e9f0ee32716bbcaafc3f93b13bd6d2f1)
2.3.1.zip (MD5: e956cba794066c85e72fe7d2cafbb4f0)
2.2.2.zip (MD5: 018922344724c49a521179ff24ca48a9)
2.2.1.zip (MD5: 3c20752cec36af96fdee4014c7f0fbac)
2.1.1.zip (MD5: a3853b90494c8e038fd275b9d5b49326)
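
The MD5 values listed above can be used to check that a download is intact. A minimal sketch using Python's standard hashlib module follows; the function names are illustrative, and the archive is assumed to have been saved locally.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # iter() with a sentinel stops once read() returns an empty bytes object.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_md5):
    """Return True if the file's MD5 matches the published value."""
    return md5_of_file(path) == expected_md5.strip().lower()
```

For example, verify_download("24.12.19.zip", "25fd6978483e89d46bf97f6ea0f7bdad") should return True for an intact copy of the Windows 24.12.19 archive.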