Parquet decoder

The Parquet decoder stores the decoded output as Parquet files. For more information on the Parquet format itself, see: https://parquet.apache.org/

Note

All major programming languages have Parquet support, see: https://arrow.apache.org/docs/


Tool support

Examples of some specific tools/languages supporting the Parquet format:


Output

The Parquet output data-schema always uses the following structure:

  • One timestamp value (t) using datatype Int64 (MICROS) and snappy compression

  • One or more signal values using datatype double and snappy compression

The row group size is set to 1,000,000 rows (1e6).
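As a rough illustration of the schema above, the sketch below converts a raw `t` value into a calendar timestamp and estimates how many row groups a file holds. It assumes `t` counts microseconds since the Unix epoch in UTC (the usual meaning of a Parquet `TIMESTAMP(MICROS)` column; the UTC time zone is an assumption). The helper names `t_to_datetime`, `datetime_to_t` and `row_group_count` are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Assumption: t is microseconds since 1970-01-01T00:00:00 UTC.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
ROW_GROUP_SIZE = 1_000_000  # rows per row group, as stated above


def t_to_datetime(t_us: int) -> datetime:
    """Convert a raw Int64 microsecond timestamp to a timezone-aware datetime."""
    # timedelta keeps full microsecond precision (no float rounding).
    return EPOCH + timedelta(microseconds=t_us)


def datetime_to_t(dt: datetime) -> int:
    """Inverse of t_to_datetime: integer microseconds since the epoch."""
    delta = dt - EPOCH
    return (delta.days * 86_400 + delta.seconds) * 1_000_000 + delta.microseconds


def row_group_count(n_rows: int, group_size: int = ROW_GROUP_SIZE) -> int:
    """Number of row groups needed for n_rows (the last group may be partial)."""
    # Integer ceiling division avoids float rounding for large row counts.
    return (n_rows + group_size - 1) // group_size


if __name__ == "__main__":
    # First row of the example output below, interpreted as UTC (assumption).
    t = datetime_to_t(datetime(2022, 4, 22, 14, 14, 43, tzinfo=timezone.utc))
    print(t, t_to_datetime(t).isoformat())
    print(row_group_count(10_000_000))
```

Integer arithmetic is used throughout, so microsecond values are never rounded through floating point.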

The signal names are constructed from the database used for decoding, as in the example below:

         t              Speed    SpeedAccuracy    SpeedValid
____________________    _____    _____________    __________

22-Apr-2022 14:14:43    0.01         2.006            1
22-Apr-2022 14:14:44    0.01         2.152            1
22-Apr-2022 14:14:45    0.01         2.290            1

If a value exceeds the MIN/MAX limits defined in the database, it is written to the output as NaN.

Warning

Output records are skipped if all values are NaN.
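The two rules above (out-of-range values become NaN, and records whose values are all NaN are dropped) can be modeled as follows. This is an illustrative sketch of the described behavior, not the decoder's actual implementation. The `limits` mapping, the record layout and the function names are hypothetical.

```python
import math
from typing import Dict, Optional, Tuple

# Hypothetical types: signal name -> (min, max) from the database,
# and signal name -> decoded physical value for one record.
Limits = Dict[str, Tuple[float, float]]
Record = Dict[str, float]


def apply_limits(values: Record, limits: Limits) -> Record:
    """Replace every value outside its [min, max] range with NaN."""
    limited: Record = {}
    for name, value in values.items():
        lo, hi = limits[name]
        # NaN compares False, so an incoming NaN also stays NaN.
        limited[name] = value if lo <= value <= hi else math.nan
    return limited


def to_output_row(t_us: int, values: Record, limits: Limits) -> Optional[Tuple[int, Record]]:
    """Return (t, limited values), or None when every signal value is NaN."""
    limited = apply_limits(values, limits)
    if all(math.isnan(v) for v in limited.values()):
        return None  # record skipped, as described in the warning above
    return t_us, limited


if __name__ == "__main__":
    # Hypothetical limits and records, for illustration only.
    limits = {"Speed": (0.0, 100.0), "SpeedValid": (0.0, 1.0)}
    records = [
        (1, {"Speed": 0.01, "SpeedValid": 1.0}),   # kept unchanged
        (2, {"Speed": 250.0, "SpeedValid": 1.0}),  # Speed becomes NaN
        (3, {"Speed": -5.0, "SpeedValid": 7.0}),   # all NaN, skipped
    ]
    rows = [r for r in (to_output_row(t, v, limits) for t, v in records) if r is not None]
    for t, values in rows:
        print(t, values)
```

The sketch treats the MIN/MAX limits as inclusive bounds; this is an assumption, not something stated for the decoder.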


Performance

The table below lists performance numbers for different input/output scenarios.

Input records   Input size total (MB)   Input files (#)   Output size total (MB)   Output files   Exe time (s)   Exe peak memory (MB)   Note
_____________   _____________________   _______________   ______________________   ____________   ____________   ____________________   ____

10,000,000      190                     1                 117                      1              7              309                    One input file, one output file
10,000,000      292                     1000              114                      1              8              116                    Many input files, one output file
10,000,000      190                     1                 151                      1000           11             1131                   One input file, many output files

Note

All tests were performed using default input arguments on an Intel Xeon @ 3.4 GHz running Linux (x86_64).

Note

All tests use random payload data, which makes Parquet compression less effective. With real-world data, output sizes can in some cases be significantly smaller.
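To compare the three scenarios on a common scale, the reported numbers can be turned into derived metrics such as records per second and the output-to-input size ratio. The sketch below only re-computes figures from the performance table; the `Run` structure and its field names are illustrative.

```python
from dataclasses import dataclass


@dataclass
class Run:
    scenario: str
    records: int
    input_mb: float
    output_mb: float
    exe_time_s: float
    peak_mem_mb: float

    @property
    def records_per_second(self) -> float:
        """Decoding throughput over the whole run."""
        return self.records / self.exe_time_s

    @property
    def size_ratio(self) -> float:
        """Output size divided by input size (lower means smaller output)."""
        return self.output_mb / self.input_mb


# Values copied from the performance table above.
RUNS = [
    Run("One input file, one output file", 10_000_000, 190, 117, 7, 309),
    Run("Many input files, one output file", 10_000_000, 292, 114, 8, 116),
    Run("One input file, many output files", 10_000_000, 190, 151, 11, 1131),
]

if __name__ == "__main__":
    for run in RUNS:
        print(f"{run.scenario}: {run.records_per_second:,.0f} records/s, "
              f"output/input = {run.size_ratio:.2f}")
```

Since all three runs decode the same 10,000,000 records, throughput on the test machine ranges from roughly 0.9 to 1.4 million records per second.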


Changelog

2.4.0

  • Changed CLI version output

2.3.2

  • Fixed

    • Issue with merge of PGN source/destination addresses sometimes resulting in invalid output files

2.3.1

  • Added

    • DBC, support for float and double data types

    • Option to merge PGN source-address outputs

    • Option to merge PGN destination-address outputs

  • Changed

    • DBC, message with name “VECTOR__INDEPENDENT_SIG_MSG” is now ignored

    • DBC, empty attribute “BusType” now defaults to CAN-bus

    • Improved error text on invalid output dir

    • Improved error text on colliding message keys

2.2.2

  • Added

    • Support for signed data type

2.2.1

  • Added

    • Input argument dbdump to dump the loaded database to a JSON file

  • Fixed

    • Better handling of special characters in DBC-file

2.1.1

  • First release


Download

Windows AMD64 / x86-64 (64-bit)

  • 2.3.2.zip (MD5: fb0950bc0808df8af410634e536fa9bb)
  • 2.3.1.zip (MD5: b754b0160de7c8bf9efec48553576641)
  • 2.2.2.zip (MD5: 4261251ae62138db1561d37ba6da82c2)
  • 2.2.1.zip (MD5: e34aa2359fd642d77794efbc071042b1)
  • 2.1.1.zip (MD5: 4216d7aaffaf98a4e135c70eaae1325e)

Linux AMD64 / x86-64 (64-bit)

  • 2.3.2.zip (MD5: e219f190a1d1613a8c6076121c91c625)
  • 2.3.1.zip (MD5: 4752bc58a0e3a5f3b850af68ea8663ea)
  • 2.2.2.zip (MD5: 463d995342a2fbf5a6d0b319f9cc0ade)
  • 2.2.1.zip (MD5: 52cf0fcc3c31fd606a742d5176fa22f8)
  • 2.1.1.zip (MD5: 15f959910cf0fef97d52f54c419e2882)

Linux ARM64 (64-bit)

  • 2.3.2.zip (MD5: e9f0ee32716bbcaafc3f93b13bd6d2f1)
  • 2.3.1.zip (MD5: e956cba794066c85e72fe7d2cafbb4f0)
  • 2.2.2.zip (MD5: 018922344724c49a521179ff24ca48a9)
  • 2.2.1.zip (MD5: 3c20752cec36af96fdee4014c7f0fbac)
  • 2.1.1.zip (MD5: a3853b90494c8e038fd275b9d5b49326)
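Each download is listed with an MD5 checksum. One way to check a downloaded archive against it is to hash the file in chunks with Python's standard `hashlib` module. The helper names below are illustrative, and the file path in the commented example is a placeholder.

```python
import hashlib
from pathlib import Path


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with path.open("rb") as f:
        # iter() with a sentinel stops when read() returns b"" at end of file.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_md5(path: Path, expected: str) -> bool:
    """Compare the file's MD5 digest with a published checksum (case-insensitive)."""
    return md5_of_file(path) == expected.strip().lower()


# Example (placeholder path; checksum taken from the list above):
# verify_md5(Path("2.3.2.zip"), "fb0950bc0808df8af410634e536fa9bb")
```

MD5 is adequate for catching corrupted or truncated downloads, but it is not collision-resistant, so it should not be relied on as proof that a file has not been tampered with.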