Set up Amazon Athena 

Athena makes it simple and fast to query data from an AWS S3 Parquet data lake. For example, it can be used in Grafana-Athena dashboards, Excel or Python scripts that process more data than fits in memory.

In this section, we explain how to set up Athena.


Prerequisites: AWS S3 data lake 

Set up AWS S3 Parquet data lake [~10 min]

Note

The above steps must be completed before proceeding.

Set up Athena 

Ensure that you have completed the above prerequisites
Download the Glue script below and upload it to the root of your input bucket via the AWS S3 console
Verify that your input bucket contains the latest Lambda zip
Log in to your AWS account, go to CloudFormation and select your data lake stack
Click ‘Stack actions/Create change set for current stack’
Click ‘Replace current template’ and enter the template URL below:

https://css-electronics-resources.s3.eu-central-1.amazonaws.com/stacks/glue-athena-v2.0.5-vG.1.0.json

Click ‘Acknowledge’ and ‘Submit’, wait ~1 min and click the refresh button in the upper-right corner
Click ‘Execute change set’ (and click it again in the popup), then wait ~1 min
Verify that the deployment succeeds, then go to the stack ‘Outputs’ tab
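If you prefer to script the console steps above, the change set can in principle be created and executed with boto3. This is a minimal sketch, not part of the official procedure: it assumes boto3 is installed and configured, that the stack name is supplied by you, that the new template keeps the existing parameter keys, and that the console ‘Acknowledge’ step corresponds to the IAM capabilities listed. The change set name is a hypothetical placeholder.

```python
# Sketch: replace the data lake stack template via a CloudFormation change set.
# Assumptions: `cfn` is a boto3 CloudFormation client (boto3.client("cloudformation")),
# the change set name is a placeholder, and the capabilities mirror the console 'Acknowledge'.

TEMPLATE_URL = (
    "https://css-electronics-resources.s3.eu-central-1.amazonaws.com/"
    "stacks/glue-athena-v2.0.5-vG.1.0.json"
)


def keep_existing_parameters(stack: dict) -> list:
    """Reuse every current parameter value, similar to 'Replace current template' in the console.

    Assumption: the new template still defines these parameter keys; CloudFormation
    rejects parameters that the new template no longer declares.
    """
    return [
        {"ParameterKey": p["ParameterKey"], "UsePreviousValue": True}
        for p in stack.get("Parameters", [])
    ]


def update_stack_template(cfn, stack_name: str, change_set_name: str = "glue-athena-update") -> None:
    """Create the change set, wait for it, execute it and wait for the stack update."""
    stack = cfn.describe_stacks(StackName=stack_name)["Stacks"][0]
    cfn.create_change_set(
        StackName=stack_name,
        ChangeSetName=change_set_name,
        ChangeSetType="UPDATE",
        TemplateURL=TEMPLATE_URL,
        Parameters=keep_existing_parameters(stack),
        # Assumption: the template creates IAM resources, hence these capabilities.
        Capabilities=["CAPABILITY_IAM", "CAPABILITY_NAMED_IAM"],
    )
    # Equivalent of waiting ~1 min and refreshing in the console.
    cfn.get_waiter("change_set_create_complete").wait(
        StackName=stack_name, ChangeSetName=change_set_name
    )
    cfn.execute_change_set(StackName=stack_name, ChangeSetName=change_set_name)
    # Blocks until the update succeeds (raises if the stack update fails).
    cfn.get_waiter("stack_update_complete").wait(StackName=stack_name)
```

The client is passed in rather than created inside the function so the same code can be pointed at a specific region or profile, and so it can be exercised without AWS access.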

The ‘Outputs’ tab contains the details required to use Athena as a data source.
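The same ‘Outputs’ details can also be read programmatically, e.g. when configuring a Python script. A minimal sketch, assuming a boto3 CloudFormation client; the specific output keys depend on the stack template and are not assumed here, so the function simply returns all of them.

```python
def stack_outputs(cfn, stack_name: str) -> dict:
    """Return the stack 'Outputs' tab as a {OutputKey: OutputValue} dict.

    `cfn` is assumed to be boto3.client("cloudformation"); stacks without
    outputs yield an empty dict.
    """
    stack = cfn.describe_stacks(StackName=stack_name)["Stacks"][0]
    return {o["OutputKey"]: o["OutputValue"] for o in stack.get("Outputs", [])}
```

Usage (stack name is a placeholder): `for key, value in stack_outputs(boto3.client("cloudformation"), "your-data-lake-stack").items(): print(key, value)`.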

Glue script | changelog

Trigger Glue job 

Open AWS Glue Triggers in a new tab
Select the ‘on-demand’ trigger and click ‘Action/Start trigger’
Open your database under AWS Glue Databases
Verify that your database tables show up (this may take a few minutes)
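The trigger and table check can also be scripted. This sketch assumes a boto3 Glue client; the trigger and database names are hypothetical placeholders (use the names shown in your Glue console). Because tables may take a few minutes to appear, it polls the database a limited number of times.

```python
import time


def list_table_names(glue, database_name: str) -> list:
    """Collect all table names in a Glue database, following pagination."""
    names = []
    for page in glue.get_paginator("get_tables").paginate(DatabaseName=database_name):
        names.extend(table["Name"] for table in page["TableList"])
    return names


def start_trigger_and_wait_for_tables(glue, trigger_name: str, database_name: str,
                                      attempts: int = 10, delay_s: float = 30.0,
                                      sleep=time.sleep) -> list:
    """Start an on-demand Glue trigger, then poll until the database lists tables.

    `glue` is assumed to be boto3.client("glue"). Note that 'any table present' is
    only a rough proxy for a finished job: a database that already holds tables
    from an earlier run returns immediately. Returns [] if none appear in time.
    """
    glue.start_trigger(Name=trigger_name)
    for _ in range(attempts):
        tables = list_table_names(glue, database_name)
        if tables:
            return tables
        sleep(delay_s)
    return []
```

The `sleep` argument is injectable mainly so the polling loop can be tested without waiting.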

Note

Glue adds metadata about your S3 output bucket. If new devices/messages are added to your Parquet data lake, the Glue job should be triggered again (manually or on a schedule)[1].

You are now ready to use Athena as a data source in e.g. Grafana-Athena dashboards.
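As an example of using Athena from a Python script, the sketch below submits a query, polls until it finishes and collects the rows. It assumes a boto3 Athena client; the database name (your Glue database), the S3 location for query results and the SQL are placeholders you must supply. The first returned row of a SELECT is the column header.

```python
import time

TERMINAL_STATES = {"SUCCEEDED", "FAILED", "CANCELLED"}


def run_athena_query(athena, sql: str, database: str, output_location: str,
                     poll_s: float = 1.0, sleep=time.sleep) -> list:
    """Run `sql` in Athena and return all result rows as lists of strings.

    `athena` is assumed to be boto3.client("athena"); `output_location` is an
    S3 URI (e.g. "s3://your-results-bucket/") where Athena writes results.
    """
    qid = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )["QueryExecutionId"]

    # Poll until the query reaches a terminal state.
    while True:
        status = athena.get_query_execution(QueryExecutionId=qid)["QueryExecution"]["Status"]
        if status["State"] in TERMINAL_STATES:
            break
        sleep(poll_s)
    if status["State"] != "SUCCEEDED":
        reason = status.get("StateChangeReason", "")
        raise RuntimeError(f"Athena query {qid} ended as {status['State']}: {reason}")

    # Results are paginated; NULL cells have no 'VarCharValue' and become None.
    rows = []
    for page in athena.get_paginator("get_query_results").paginate(QueryExecutionId=qid):
        for row in page["ResultSet"]["Rows"]:
            rows.append([cell.get("VarCharValue") for cell in row["Data"]])
    return rows
```

For large result sets, reading the CSV that Athena writes to `output_location` is usually more efficient than paging through `get_query_results`.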

[1]

New Parquet files added for existing devices/messages are automatically available for Athena queries. A new Glue job run is only required if the new Parquet data reflects a previously ‘unmapped’ device or table. For most use cases, the manual trigger will therefore suffice. However, a scheduled trigger is recommended if you expect new devices/messages to be added frequently over time. To activate the scheduled trigger, select it and click ‘Action/Activate trigger’. A Glue job run normally costs ~$0.03 (depending on data lake size), in which case a scheduled daily trigger would cost ~$10/year.
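The yearly estimate is simply the cost per run multiplied by the number of runs per year. A small sketch using the ~$0.03/run figure above (actual costs vary with data lake size); the schedules listed are illustrative. The daily case comes out at roughly $11, in line with the ~$10/year estimate.

```python
# Rough cost sketch based on the ~$0.03 per Glue job run quoted above.
COST_PER_RUN_USD = 0.03  # varies with data lake size

# Illustrative trigger schedules and their approximate runs per year.
RUNS_PER_YEAR = {"daily": 365, "weekly": 52, "monthly": 12}


def yearly_glue_cost(cost_per_run_usd: float, runs_per_year: int) -> float:
    """Annual cost of a scheduled Glue trigger: cost per run times runs per year."""
    return cost_per_run_usd * runs_per_year


for schedule, runs in RUNS_PER_YEAR.items():
    print(f"{schedule:>8}: ~${yearly_glue_cost(COST_PER_RUN_USD, runs):.2f}/year")
```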
