Customize your Lambda function

In some use cases you may need to add custom code to your Lambda function automation. This can e.g. be used to create custom Parquet tables with calculated signals, set up data-based alerts or write data to external endpoints (e.g. databases).


When to use custom Lambda functions

Before you start customizing your Lambda function, consider whether it is the best way to achieve your intended goal. Generally, a Lambda function is useful in the situations below:

  • If the processing has to be done immediately upon file upload
  • If the processing has to be done on all of your data
  • If the processing can be done file-by-file[1]

1: Create your custom processing function

To customize the Lambda function, it is useful to understand the ‘default’ workflow:

  1. It downloads the trigger MDF log file and DBC files
  2. It uses mdf2parquet_decode to DBC decode the data into Parquet files
  3. It runs the function process_decoded_data on the Parquet data
  4. The default version of this function simply uploads all the Parquet files to the S3 output bucket
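
As an illustration, the default behavior of process_decoded_data corresponds roughly to the sketch below. This is only a sketch: the exact signature and argument names in functions.py may differ, and the folder/bucket arguments here are assumptions.

    # Illustrative sketch of the default behavior (not the verbatim functions.py code)
    import boto3
    from pathlib import Path

    def process_decoded_data(local_output_dir, output_bucket):
        """Upload all decoded Parquet files to the S3 output bucket."""
        s3 = boto3.client("s3")
        for parquet_file in Path(local_output_dir).rglob("*.parquet"):
            # Preserve the local folder structure as the S3 object key
            key = parquet_file.relative_to(local_output_dir).as_posix()
            s3.upload_file(str(parquet_file), output_bucket, key)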

The last step can be modified by providing your own process_decoded_data function:

  1. Download the latest Lambda zip and unzip it
  2. Customize the functionality of process_decoded_data in functions.py (a rough sketch is shown after this list)
  3. Test that it works locally via the script below (download it into your Lambda folder)[3]
  4. Zip the files again, including your new custom functions.py[2]
  5. Update your deployed Lambda function with the new zip
  6. If your new function uses pandas/pyarrow, add the ARN layer (see below)
  7. Test your Lambda by uploading an MDF file to your S3 input bucket
  8. Monitor your Lambda via CloudWatch (alarms, logs) to verify it performs as expected
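
As a rough example of step 2, a customized process_decoded_data could add a calculated signal to each Parquet file before uploading it. The sketch below is illustrative only - the function signature, signal/column names and paths are assumptions; for production use, start from the example functions referenced in the note below.

    # Illustrative customization sketch (signal/column names are hypothetical)
    import boto3
    import pandas as pd  # requires the pandas/pyarrow ARN layer, see section 2
    from pathlib import Path

    def process_decoded_data(local_output_dir, output_bucket):
        """Add a calculated signal to each Parquet file, then upload to S3."""
        s3 = boto3.client("s3")
        for parquet_file in Path(local_output_dir).rglob("*.parquet"):
            df = pd.read_parquet(parquet_file)
            # Hypothetical calculated signal: engine power [kW] from speed [RPM] and torque [Nm]
            if {"EngineSpeed", "EngineTorque"}.issubset(df.columns):
                df["EnginePowerKw"] = df["EngineSpeed"] * df["EngineTorque"] * 2 * 3.14159 / 60000
            df.to_parquet(parquet_file)
            key = parquet_file.relative_to(local_output_dir).as_posix()
            s3.upload_file(str(parquet_file), output_bucket, key)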

Local test script

Note

For custom Parquet processing, we strongly recommend starting from one of our example functions - see the sections on custom signals and alerts


2: Add ARN layer for pandas/pyarrow support

To read/write Parquet files in the Lambda, you will need support for pandas and pyarrow. This can be achieved easily by adding a pre-built ‘ARN layer’:

  1. Open the AWS SDK for pandas project’s managed layers page
  2. Navigate to your region, e.g. eu-central-1
  3. Copy the ARN for your region, Python 3.11 and x86_64
  4. Go to your Lambda function overview page and click your function
  5. Scroll down, select ‘Add a layer/Specify an ARN’, paste your ARN and click ‘Add’
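
If you prefer to script this step instead of using the console, the layer can also be attached with boto3. The snippet below is a sketch with placeholder values - note that it overwrites the function's existing layer list.

    # Sketch: attach the layer programmatically (placeholders must be replaced)
    import boto3

    FUNCTION_NAME = "<your Lambda function name>"
    LAYER_ARN = "<layer ARN copied in step 3>"

    lambda_client = boto3.client("lambda", region_name="eu-central-1")
    # Note: the Layers parameter replaces the function's full layer list
    lambda_client.update_function_configuration(FunctionName=FUNCTION_NAME, Layers=[LAYER_ARN])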

Backlog processing

Once your Lambda is updated, you may also want to process your historical data using the new function. This can be done via the local test script by adding your S3 credentials[3]. In this case, you simply add all the Parquet data you wish to process to the relevant folder.
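
To pull a backlog of historical Parquet files from your S3 output bucket into the folder used by the local test script, something like the boto3 sketch below can be used. The bucket name and local folder are placeholders - see the README for the folder layout expected by the script.

    # Sketch: download historical Parquet files for local backlog processing (placeholders)
    import boto3
    from pathlib import Path

    BUCKET = "<your-s3-output-bucket>"
    LOCAL_DIR = Path("<folder expected by the local test script>")

    s3 = boto3.client("s3")
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=BUCKET):
        for obj in page.get("Contents", []):
            if obj["Key"].endswith(".parquet"):
                target = LOCAL_DIR / obj["Key"]
                target.parent.mkdir(parents=True, exist_ok=True)
                s3.download_file(BUCKET, obj["Key"], str(target))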


[1]For example, Lambda functions are not useful for performing analyses that need to aggregate data across devices, trips, days or similar, as this involves multiple log files (and may exceed the max Lambda function timeout). For such use cases, a periodic Glue job is better suited (see our trip summary section for an example)
[2]Make sure that you include all the required files in the zip and that you zip the files directly (not a folder containing the files)
[3]The zip includes a local testing script, sample Parquet data, a requirements.txt and a README.md. See the README for details on using the script.