Customize your Lambda function

In some use cases you may need to add custom code to your Lambda function automation. This can e.g. be used to create custom Parquet tables with calculated signals, set up data-based alerts or write data to external endpoints (e.g. databases).


When to use custom Lambda functions

Before you start customizing your Lambda function, consider whether it is the best way to achieve your intended goal. Generally, a Lambda function is useful in the situations below:

  • If the processing has to be done immediately upon file upload
  • If the processing has to be done on all of your data
  • If the processing can be done file-by-file[1]

1: Create your custom processing function

To customize the Lambda function, it is useful to understand the ‘default’ workflow:

  1. It downloads the trigger MDF log file and DBC files
  2. It uses mdf2parquet_decode to DBC decode the data into Parquet files
  3. It runs the function process_decoded_data on the Parquet data
  4. The default version of this function simply uploads all the Parquet files to the S3 output bucket
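
As an illustration, the default behavior of process_decoded_data corresponds roughly to the sketch below. This is only a sketch: the exact signature and argument names in functions.py may differ, and the folder/bucket arguments here are assumptions.

    # Illustrative sketch of the default behavior (not the verbatim functions.py code)
    import boto3
    from pathlib import Path

    def process_decoded_data(local_output_dir, output_bucket):
        """Upload all decoded Parquet files to the S3 output bucket."""
        s3 = boto3.client("s3")
        for parquet_file in Path(local_output_dir).rglob("*.parquet"):
            # Preserve the local folder structure as the S3 object key
            key = parquet_file.relative_to(local_output_dir).as_posix()
            s3.upload_file(str(parquet_file), output_bucket, key)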

The last step can be modified by providing your own process_decoded_data function:

  1. Download the latest Lambda zip and unzip it
  2. Customize the functionality of process_decoded_data in functions.py (a rough sketch is shown after this list)
  3. Test that it works locally via the script below (download it into your Lambda folder)[3]
  4. Zip the files again, including your new custom functions.py[2]
  5. Update your deployed Lambda function with the new zip
  6. If your new function uses pandas/pyarrow, add the ARN layer (see below)
  7. Test your Lambda by uploading an MDF file to your S3 input bucket
  8. Monitor your Lambda via CloudWatch (alarms, logs) to verify it performs as expected
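
As a rough example of step 2, a customized process_decoded_data could add a calculated signal to each Parquet file before uploading it. The sketch below is illustrative only - the function signature, signal/column names and paths are assumptions; for production use, start from the example functions referenced in the note below.

    # Illustrative customization sketch (signal/column names are hypothetical)
    import boto3
    import pandas as pd  # requires the pandas/pyarrow ARN layer, see section 2
    from pathlib import Path

    def process_decoded_data(local_output_dir, output_bucket):
        """Add a calculated signal to each Parquet file, then upload to S3."""
        s3 = boto3.client("s3")
        for parquet_file in Path(local_output_dir).rglob("*.parquet"):
            df = pd.read_parquet(parquet_file)
            # Hypothetical calculated signal: engine power [kW] from speed [RPM] and torque [Nm]
            if {"EngineSpeed", "EngineTorque"}.issubset(df.columns):
                df["EnginePowerKw"] = df["EngineSpeed"] * df["EngineTorque"] * 2 * 3.14159 / 60000
            df.to_parquet(parquet_file)
            key = parquet_file.relative_to(local_output_dir).as_posix()
            s3.upload_file(str(parquet_file), output_bucket, key)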

Local test script

Note

For custom Parquet processing, we strongly recommend starting from one of our example functions - see the sections on custom signals and alerts


2: Add ARN layer for pandas/pyarrow support

To read/write Parquet files in the Lambda, you will need support for pandas and pyarrow. This can be achieved easily by adding a pre-built ‘ARN layer’:

  1. Open the AWS SDK for pandas project’s managed layers page
  2. Navigate to your region, e.g. eu-central-1
  3. Copy the ARN for your region, Python 3.11 and x86_64
  4. Go to your Lambda function overview page and click your function
  5. Scroll down, select ‘Add a layer/Specify an ARN’, paste your ARN and click ‘Add’
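
If you prefer to script this step instead of using the console, the layer can also be attached with boto3. The snippet below is a sketch with placeholder values - note that it overwrites the function's existing layer list.

    # Sketch: attach the layer programmatically (placeholders must be replaced)
    import boto3

    FUNCTION_NAME = "<your Lambda function name>"
    LAYER_ARN = "<layer ARN copied in step 3>"

    lambda_client = boto3.client("lambda", region_name="eu-central-1")
    # Note: the Layers parameter replaces the function's full layer list
    lambda_client.update_function_configuration(FunctionName=FUNCTION_NAME, Layers=[LAYER_ARN])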

Backlog processing

Once your Lambda is updated, you may also want to process your historical data using the new function. This can be done via the local test script by adding your S3 credentials[3]. In this case, you simply add all the Parquet data you wish to process to the relevant folder.
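
To pull a backlog of historical Parquet files from your S3 output bucket into the folder used by the local test script, something like the boto3 sketch below can be used. The bucket name and local folder are placeholders - see the README for the folder layout expected by the script.

    # Sketch: download historical Parquet files for local backlog processing (placeholders)
    import boto3
    from pathlib import Path

    BUCKET = "<your-s3-output-bucket>"
    LOCAL_DIR = Path("<folder expected by the local test script>")

    s3 = boto3.client("s3")
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=BUCKET):
        for obj in page.get("Contents", []):
            if obj["Key"].endswith(".parquet"):
                target = LOCAL_DIR / obj["Key"]
                target.parent.mkdir(parents=True, exist_ok=True)
                s3.download_file(BUCKET, obj["Key"], str(target))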


[1]For example, Lambda functions are not useful for performing analyses that need to aggregate data across devices, trips, days or similar, as this involves multiple log files (and may exceed the max Lambda function timeout). For such use cases, a periodic Glue job is better suited (see our trip summary section for an example)
[2]Make sure that you include all the required files in the zip and that you zip the files directly (not a folder containing the files)
[3]The zip includes a local testing script, sample Parquet data, a requirements.txt and a README.md. See the README for details on using the script.