Local Parquet data lake

Local Parquet data lake interface

The simplest way to set up a Parquet data lake is to manually create and store it locally:

  1. Create a local folder (input/) next to your mdf2parquet_decode.exe and prefixed DBC files
  2. Copy your MDF log files into this folder (with the CANedge path structure[1])
  3. Drag & drop this folder onto the mdf2parquet_decode.exe to create your data lake
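
Step 2 can also be scripted. Below is a minimal Python sketch, not an official tool, that copies log files from an SD card's LOG/ folder into input/ while keeping the [DEVICE_ID]/[SESSION_NUMBER]/[SPLIT_NUMBER].[FILE_EXTENSION] layout. The function name copy_logs and the default extension set (.mf4 only) are assumptions made for illustration; adjust the extensions to match the files your device produces.

```python
import shutil
from pathlib import Path

# Assumption: plain MF4 log files only. Extend this set if your device
# writes other file extensions.
LOG_EXTENSIONS = {".mf4"}


def copy_logs(sd_log_dir, input_dir, extensions=LOG_EXTENSIONS):
    """Copy log files from sd_log_dir to input_dir, preserving the
    DEVICE_ID/SESSION_NUMBER/SPLIT_NUMBER.EXT relative paths."""
    src_root = Path(sd_log_dir)
    dst_root = Path(input_dir)
    copied = []
    for src in sorted(src_root.rglob("*")):
        # Skip folders and files with other extensions
        if not src.is_file() or src.suffix.lower() not in extensions:
            continue
        rel = src.relative_to(src_root)
        # Keep only files at the expected depth: device/session/file
        if len(rel.parts) != 3:
            continue
        dst = dst_root / rel
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)
        copied.append(dst)
    return copied
```

For example, copy_logs("E:/LOG", "input") would mirror the log files from an SD card mounted as E: (a hypothetical drive letter) into the input/ folder, ready for step 3.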

If you want to use the more advanced cloud automation functionality locally, clone the canedge-mdftoparquet-automation repo and follow the README to process your local input folder.

Open source interfaces like DuckDB and ClickHouse let you query data from your local Parquet data lake via SQL. They can be used in e.g. Grafana dashboards, Excel or Python.
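
As an illustration of the SQL access described above, here is a minimal Python sketch that uses DuckDB's read_parquet function with a recursive glob. It assumes the data lake stores Parquet files in a device/message/... folder hierarchy; verify this against your own output folder. The function names, the folder name datalake and the example device ID and message name are hypothetical. The duckdb package is a third-party dependency (pip install duckdb) and is only imported when a query is actually run.

```python
from pathlib import Path


def build_query(lake_dir, device_id, message, limit=10):
    """Build a SQL query that reads every Parquet file stored below
    <lake_dir>/<device_id>/<message>/ (assumed folder layout)."""
    pattern = (Path(lake_dir) / device_id / message / "**" / "*.parquet").as_posix()
    return f"SELECT * FROM read_parquet('{pattern}') LIMIT {int(limit)}"


def run_query(sql):
    """Execute the query with an in-memory DuckDB connection."""
    import duckdb  # third-party; imported lazily so build_query works without it

    con = duckdb.connect()
    try:
        return con.execute(sql).fetchall()
    finally:
        con.close()
```

Calling run_query(build_query("datalake", "2F6913DB", "CAN2_GnssSpeed")) would return the first 10 rows for that device and message, provided matching Parquet files exist.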

Local interfaces are beyond the scope of this intro, but we recommend the resources below:

  1. MF4 decoder Docs - learn how to set up Grafana-ClickHouse or use DuckDB in Python

[1] I.e. how the files are stored in the LOG/ folder on a CANedge SD: [DEVICE_ID]/[SESSION_NUMBER]/[SPLIT_NUMBER].[FILE_EXTENSION]