Customize your Lambda function
In some use cases, you may need to add custom code to your Lambda function automation. For example, this can be used to create custom Parquet tables with calculated signals, set up data-based alerts or write data to external endpoints (e.g. databases).
When to use custom Lambda functions
Before you start customizing your Lambda function, consider whether it is the best way to achieve your intended goal. Generally, a Lambda function is useful in the situations below:
- If the processing has to be done immediately upon file upload
- If the processing has to be done on all of your data
- If the processing can be done file-by-file[1]
1: Create your custom processing function
To customize the Lambda function, it is useful to understand the ‘default’ workflow:
- It downloads the trigger MDF log file and DBC files
- It uses `mdf2parquet_decode` to DBC decode the data into Parquet files
- It runs the function `process_decoded_data` on the Parquet data
- The default version of this function simply uploads all the Parquet files to the S3 output bucket (sketched below)
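For orientation, the default upload step could look roughly like the sketch below. The function signature and parameter names (`parquet_dir`, `output_bucket`) are assumptions for illustration - the actual implementation in `functions.py` may differ:

```python
import glob
import os

import boto3


def process_decoded_data(parquet_dir, output_bucket):
    # Sketch of the default step: upload every decoded Parquet file to the
    # S3 output bucket, preserving the relative folder structure.
    # Signature and names are illustrative, not the exact ones in functions.py.
    s3 = boto3.client("s3")
    for path in glob.glob(os.path.join(parquet_dir, "**", "*.parquet"), recursive=True):
        key = os.path.relpath(path, parquet_dir).replace(os.sep, "/")
        s3.upload_file(path, output_bucket, key)
```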
The last step can be amended by providing your own `process_decoded_data` function:
- Download the latest Lambda zip and unzip it
- Customize the functionality of `process_decoded_data` in `functions.py`
- Test that it works locally via the local test script (download it into your Lambda folder)[3]
- Zip the files again, including your new custom `functions.py`[2]
- Update your deployed Lambda function with the new zip
- If your new function uses pandas/pyarrow, add the ARN layer (see below)
- Test your Lambda by uploading an MDF file to your S3 input bucket
- Monitor your Lambda via CloudWatch (alarms, logs) to verify it performs as expected
Note
For custom Parquet processing, we strongly recommend starting from one of our example functions - see the sections on custom signals and alerts, as well as the sketch below
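As a minimal sketch of a customization, the version below adds a calculated signal to each Parquet table before upload. The signature follows the assumed one above, and the signal names and unit conversion are hypothetical; the pandas usage requires the ARN layer described in the next section:

```python
import glob
import os

import boto3
import pandas as pd  # requires the pandas/pyarrow ARN layer (see next section)


def process_decoded_data(parquet_dir, output_bucket):
    # Sketch: add a calculated signal to each Parquet table, then upload.
    # Signature, bucket handling and signal names are illustrative.
    s3 = boto3.client("s3")
    for path in glob.glob(os.path.join(parquet_dir, "**", "*.parquet"), recursive=True):
        df = pd.read_parquet(path)
        # hypothetical calculated signal: speed converted from km/h to m/s
        if "WheelSpeed" in df.columns:
            df["WheelSpeedMs"] = df["WheelSpeed"] / 3.6
        df.to_parquet(path)
        key = os.path.relpath(path, parquet_dir).replace(os.sep, "/")
        s3.upload_file(path, output_bucket, key)
```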
2: Add ARN layer for pandas/pyarrow support
To read/write Parquet files in the Lambda, you will need support for `pandas` and `pyarrow`. This can be achieved easily by adding a pre-built ‘ARN layer’:
- Open the AWS SDK for pandas project’s managed ARN layers page
- Navigate to your region, e.g. `eu-central-1`
- Copy the ARN layer matching your region (e.g. `eu-central-1`), Python `3.11` and `x86_64`
- Go to your Lambda function overview page and click your function
- Scroll down, select ‘Add a layer/Specify an ARN’, paste your ARN and click ‘Add’
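If you prefer to script this step, the layer can also be attached via boto3 (shown below as a sketch - the function name and layer ARN are placeholders for your own values):

```python
import boto3

lambda_client = boto3.client("lambda", region_name="eu-central-1")

# Placeholders - use your own function name and the ARN you copied above.
# Note: the Layers list replaces the function's existing layers.
lambda_client.update_function_configuration(
    FunctionName="my-canedge-lambda",
    Layers=["arn:aws:lambda:eu-central-1:123456789012:layer:AWSSDKPandas-Python311:1"],
)
```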
Backlog processing
Once your Lambda is updated, you may also want to process your historical data using the new function. This can be done via the local test script by adding your S3 credentials[3]. In this case, you simply add all the Parquet data you wish to process to the relevant folder, as sketched below.
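As an illustrative sketch, the snippet below downloads historical Parquet data from an S3 bucket into a local folder for the test script to process. The bucket and folder names are placeholders - see the README in the zip for the expected layout:

```python
import os

import boto3

s3 = boto3.client("s3")

# Placeholders - replace with your own output bucket and local test folder
bucket = "my-canedge-bucket-parquet"
local_dir = "parquet_backlog"

# Paginate through the bucket and download all Parquet objects,
# mirroring the S3 key structure locally
paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=bucket):
    for obj in page.get("Contents", []):
        if obj["Key"].endswith(".parquet"):
            dest = os.path.join(local_dir, *obj["Key"].split("/"))
            os.makedirs(os.path.dirname(dest), exist_ok=True)
            s3.download_file(bucket, obj["Key"], dest)
```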
[1] For example, Lambda functions are not useful for performing analyses that need to aggregate data across devices, trips, days or similar, as this involves multiple log files (and may exceed the max Lambda function timeout). For such use cases, a periodic Glue job is better suited (see our trip summary section as an example).
[2] Make sure that you include all the required files in the zip and that you do not e.g. zip a folder containing the files.
[3] The zip includes a local testing script, sample Parquet data, a requirements.txt and a README.md. See the README for details on using the script.