MF4 to Parquet decoder

The MF4 to Parquet decoder stores the decoded output as Parquet files. For more information on the Parquet format itself, see: https://parquet.apache.org/

Note

All major programming languages have Parquet support, see: https://arrow.apache.org/docs/


Tool support

Examples of some specific tools/languages supporting the Parquet format:


Output

The Parquet output data-schema always uses the following structure:

  • One timestamp value (t) using datatype Int64 (MICROS) and snappy compression

  • One or more signal values using datatype double / NULL and snappy compression

The row-group-size is set to 1 000 000 (1e6).
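To illustrate the timestamp column, the sketch below converts a raw Int64 microsecond value into a datetime. It assumes that `t` counts microseconds since the Unix epoch in UTC, which is how a Parquet timestamp with MICROS precision is normally interpreted. The helper name `micros_to_datetime` and the sample value are made up for this example; the sample is chosen so that, read as UTC, it lines up with the first row of the example output in this section.

```python
from datetime import datetime, timedelta, timezone

# Assumption: t holds microseconds since 1970-01-01 00:00:00 UTC.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def micros_to_datetime(t_micros: int) -> datetime:
    """Convert an Int64 microsecond timestamp to a timezone-aware datetime."""
    # Integer timedelta arithmetic keeps full microsecond precision
    # (no float rounding, unlike datetime.fromtimestamp(t / 1e6)).
    return EPOCH + timedelta(microseconds=t_micros)


# Hypothetical sample value (not taken from a real log file).
sample_t = 1_650_636_883_000_000
print(micros_to_datetime(sample_t).isoformat())
```

Most Parquet readers perform this conversion automatically; the manual version is mainly useful when the column is read as a plain integer.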

The signal names are constructed from the database used for decoding, as in the example below:

         t              Speed    SpeedAccuracy    SpeedValid
____________________    _____    _____________    __________

22-Apr-2022 14:14:43    0.01         2.006            1
22-Apr-2022 14:14:44    0.01         2.152            1
22-Apr-2022 14:14:45    0.01         2.290            1

If a signal value falls outside the MIN/MAX range defined in the database, it is included in the output as NULL.

Warning

Output records are skipped if all values are NULL.
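The two rules above (out-of-range values become NULL, and records with only NULL values are skipped) can be sketched as follows. This is an illustration of the documented behavior, not the decoder's actual implementation. The function names, the signal limits, and the choice to treat the limits as inclusive are all assumptions made for the example.

```python
from typing import Dict, Iterable, Iterator, Optional, Tuple

Row = Dict[str, Optional[float]]
Limits = Dict[str, Tuple[float, float]]


def apply_limits(row: Row, limits: Limits) -> Row:
    """Replace values outside their (min, max) range with None (NULL)."""
    out: Row = {}
    for name, value in row.items():
        limit = limits.get(name)  # signals without limits pass through
        if value is None:
            out[name] = None
        elif limit is not None and not (limit[0] <= value <= limit[1]):
            out[name] = None  # assumption: limits are inclusive
        else:
            out[name] = value
    return out


def filter_records(rows: Iterable[Tuple[int, Row]], limits: Limits) -> Iterator[Tuple[int, Row]]:
    """Yield (t, row) pairs, skipping records whose values are all NULL."""
    for t, row in rows:
        checked = apply_limits(row, limits)
        if any(v is not None for v in checked.values()):
            yield t, checked


# Hypothetical limits and records, for illustration only.
limits = {"Speed": (0.0, 100.0), "SpeedAccuracy": (0.0, 10.0)}
rows = [
    (0, {"Speed": 0.01, "SpeedAccuracy": 2.006}),   # kept unchanged
    (1, {"Speed": 250.0, "SpeedAccuracy": 2.152}),  # Speed becomes NULL
    (2, {"Speed": -5.0, "SpeedAccuracy": 99.0}),    # all NULL, record skipped
]
for t, row in filter_records(rows, limits):
    print(t, row)
```

When reading the output, this means a missing timestamp does not necessarily indicate a missing input frame; the frame may have contained only out-of-range values.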


Performance

The table below provides performance numbers for different input/output scenarios.

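Input records    Input size total (MB)    Input files (#)    Output size total (MB)    Output files (#)    Exe time (s)    Exe peak memory (MB)    Note
_____________    _____________________    _______________    ______________________    ________________    ____________    ____________________    _________________________________

10 000 000       190                      1                  117                       1                   7               309                     One input file, one output file
10 000 000       292                      1000               114                       1                   8               116                     Many input files, one output file
10 000 000       190                      1                  151                       1000                11              1131                    One input file, many output files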

Note

All tests have been performed using default input arguments on an Intel Xeon @ 3.4 GHz running Linux (x86_64).

Note

All tests use random payload data, which makes the Parquet compression less effective. With real data, output sizes can in some cases be significantly smaller.
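For comparing the scenarios, it can help to derive throughput and size ratios from the numbers in the table above. The sketch below does only that arithmetic; the helper names are made up for this example, and the figures apply only to the test setup described in the notes.

```python
def records_per_second(records: int, seconds: float) -> float:
    """Decoding throughput in input records per second."""
    return records / seconds


def size_ratio(output_mb: float, input_mb: float) -> float:
    """Output size as a fraction of input size."""
    return output_mb / input_mb


# (input records, input MB, output MB, execution time in s), copied from the table above
scenarios = {
    "One input file, one output file": (10_000_000, 190, 117, 7),
    "Many input files, one output file": (10_000_000, 292, 114, 8),
    "One input file, many output files": (10_000_000, 190, 151, 11),
}

for name, (records, in_mb, out_mb, secs) in scenarios.items():
    rate = records_per_second(records, secs) / 1e6
    ratio = size_ratio(out_mb, in_mb)
    print(f"{name}: {rate:.2f} M records/s, output/input size {ratio:.2f}")
```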


Changelog

All notable changes to this project will be documented in this file.

## [24.12.19]

### Added

- Support for mdf2mdf output format from version 24.12.19

### Changed

- Updated MDF reader / writer based on "mdflib" by ihedvall
- Parquet writer updated to version 18.0.0

### Fixed

- Parquet writer performance restored (issue introduced in 24.10.17)
- Fix for packed error frames in MUX-TP

## [24.10.17]

### Added

- Support for transport protocols (ISO-TP, J1939-21, NMEA-TP, MUX-TP)

### Changed

- Versioning schema from SemVer to CalVer
- Default verbosity level changed to 2
- Type of out-of-range signal values changed from double *NaN* to *NULL* (change to parquet schema)
- Restriction on max 5 DBC-files per interface/channel removed

### Known-issues

- Reduced parquet write speed

## [2.3.2]

### Fixed

- Issue with merge of PGN source/destination addresses sometimes resulting in invalid output files

## [2.3.1]

### Added

- Support for float and double data types (defined in DBC file)
- Option to merge PGN source-address outputs
- Option to merge PGN destination-address outputs

### Changed

- DBC file message with name "VECTOR__INDEPENDENT_SIG_MSG" ignored
- DBC file empty attribute "BusType" now defaults to "CAN-bus"
- Improved error text on invalid output dir
- Improved error text on colliding message keys

## [2.2.2]

### Added

- Support for "signed" data type
- Support for compacted JSON password file

## [2.2.1]

### Added

- Input argument "dbdump" to dump loaded database to JSON-file

### Fixed

- Better handling of special characters in DBC-file

Download

Windows AMD64 / x86-64 (64-bit)

  • 24.12.19.zip (MD5: 25fd6978483e89d46bf97f6ea0f7bdad)
  • 24.10.17.zip (MD5: 02aa987185ad1a7c9cd30bcadb10ca9b)
  • 2.3.2.zip (MD5: fb0950bc0808df8af410634e536fa9bb)
  • 2.3.1.zip (MD5: b754b0160de7c8bf9efec48553576641)
  • 2.2.2.zip (MD5: 4261251ae62138db1561d37ba6da82c2)
  • 2.2.1.zip (MD5: e34aa2359fd642d77794efbc071042b1)
  • 2.1.1.zip (MD5: 4216d7aaffaf98a4e135c70eaae1325e)

Linux AMD64 / x86-64 (64-bit)

  • 24.12.19.zip (MD5: b8fb2e988142ce72ec37e3019725ff52)
  • 24.10.17.zip (MD5: 02b93dcd49e36c7656e1d78e98969cfd)
  • 2.3.2.zip (MD5: e219f190a1d1613a8c6076121c91c625)
  • 2.3.1.zip (MD5: 4752bc58a0e3a5f3b850af68ea8663ea)
  • 2.2.2.zip (MD5: 463d995342a2fbf5a6d0b319f9cc0ade)
  • 2.2.1.zip (MD5: 52cf0fcc3c31fd606a742d5176fa22f8)
  • 2.1.1.zip (MD5: 15f959910cf0fef97d52f54c419e2882)

Linux ARM64 (64-bit)

  • 24.12.19.zip (MD5: 86297415862c199f44e6a2823e2a2592)
  • 24.10.17.zip (MD5: 1c42b0db5075c9e23373f37e327a44f3)
  • 2.3.2.zip (MD5: e9f0ee32716bbcaafc3f93b13bd6d2f1)
  • 2.3.1.zip (MD5: e956cba794066c85e72fe7d2cafbb4f0)
  • 2.2.2.zip (MD5: 018922344724c49a521179ff24ca48a9)
  • 2.2.1.zip (MD5: 3c20752cec36af96fdee4014c7f0fbac)
  • 2.1.1.zip (MD5: a3853b90494c8e038fd275b9d5b49326)
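The MD5 values listed above can be used to check that a download arrived intact. The following is a minimal sketch using Python's standard `hashlib` module. The local file name is an assumption (it presumes the archive was saved to the working directory under its listed name), and the helper names are made up for this example.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # Read fixed-size chunks so large archives do not need to fit in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_md5(path, expected: str) -> bool:
    """Compare a file's MD5 digest with an expected hex string (case-insensitive)."""
    return md5_of_file(path) == expected.strip().lower()


if __name__ == "__main__":
    # Assumption: the Linux AMD64 archive was saved locally under this name.
    archive = Path("24.12.19.zip")
    expected = "b8fb2e988142ce72ec37e3019725ff52"  # Linux AMD64 value listed above
    if archive.exists():
        print("checksum OK" if matches_md5(archive, expected) else "checksum MISMATCH")
    else:
        print(f"{archive} not found; download it first")
```

An MD5 match indicates that the file was not corrupted in transit. MD5 is not collision-resistant, so it should not be relied on as protection against deliberate tampering.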