Parquet data lake
In this section we outline how to set up a Parquet data lake.
Parquet data lakes offer an efficient, low-cost, scalable, and interoperable way of storing DBC-decoded CAN/LIN data. The data lake can be analyzed directly via e.g. Python/MATLAB or through an interface, and it can be stored locally or in the cloud.
This is a prerequisite for dashboards and some MATLAB/Python script examples.
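To illustrate the direct-analysis route, the sketch below reads a decoded Parquet data lake into a pandas DataFrame. It is a minimal example, assuming a local data lake folder; the folder path, device ID, and signal name are hypothetical, and it requires `pandas` plus `pyarrow`.

```python
import pandas as pd

# Hypothetical path to one message folder inside a local Parquet data lake
# (the decoder writes Parquet files grouped by device and message)
path = "datalake/2F6913DB/CAN1_EngineData"

# Read all Parquet files under the folder into a single DataFrame
# (install dependencies via: pip install pandas pyarrow)
df = pd.read_parquet(path)

# Inspect the decoded signals, e.g. a hypothetical 'EngineSpeed' column
print(df.head())
print(df["EngineSpeed"].describe())
```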
Prepare & test DBC files
- Download the MF4 decoder `mdf2parquet_decode.exe` and review the documentation here
- Rename your DBC files to add the `<channel>` prefix (`can1-<name>.dbc`, `can9-<name>.dbc`, …)[1], e.g. via the script sketched after this list
- Verify that you can decode your log file by drag & dropping it onto `mdf2parquet_decode.exe`
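If you have many DBC files, the renaming can be scripted. Below is a minimal sketch, assuming the DBC files sit in a local `dbc_files` folder and should all be mapped to CAN CH1; the folder name and channel choice are assumptions, not part of the decoder.

```python
from pathlib import Path

# Hypothetical folder containing the DBC files to rename
dbc_folder = Path("dbc_files")

# Assumed channel prefix, e.g. 'can1' applies the DBC to CAN CH1 (see [1])
prefix = "can1"

for dbc in dbc_folder.glob("*.dbc"):
    if not dbc.name.startswith(prefix):
        # e.g. 'engine.dbc' becomes 'can1-engine.dbc'
        dbc.rename(dbc.with_name(f"{prefix}-{dbc.name}"))
```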
Note: You can easily open Parquet files on your PC via the free Parquet viewer Tad.

Note: If you have issues decoding your data, see our MF4 decoder troubleshooting guide.
Create a Parquet data lake
Once you have tested your setup locally, you can set up your Parquet data lake and automation.
You can set this up in multiple ways, depending on your existing environment[2]:
- Amazon - create a Parquet data lake stored in an AWS S3 bucket (incl. automation)
- Google - create a Parquet data lake stored in a Google bucket (incl. automation)
- Azure - create a Parquet data lake stored in an Azure container (incl. automation)
- Local - create a Parquet data lake stored locally with manual processing (see the sketch below)
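For the local option, the manual processing can be batched with a small script instead of drag & drop. The sketch below is a minimal example, assuming the decoder accepts an input file path as an argument (which is what Windows passes when you drag & drop a file onto the executable); the log folder name is hypothetical.

```python
import subprocess
from pathlib import Path

# Hypothetical folder containing the recorded MF4 log files
log_folder = Path("mf4_logs")

# Decode each log file; dragging a file onto the .exe is equivalent to
# invoking it with the file path as an argument on Windows
for mf4 in sorted(log_folder.glob("*.MF4")):
    subprocess.run(["mdf2parquet_decode.exe", str(mf4)], check=True)
```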
[1] As per the MF4 decoder docs, the prefix specifies whether a DBC is applied to CAN CH1, LIN CH2, etc. You can have multiple DBC files with the same type and channel prefix. Ensure that your DBC file names use only letters, numbers, and dashes.

[2] For example, if you are using a CANedge2/3 to upload data to Amazon or Azure, we recommend setting up a Parquet data lake in Amazon or Azure, respectively.