Set up Amazon Athena 

Athena makes it simple and fast to query data from your Amazon Parquet data lake via SQL. It can e.g. be used in Grafana-Athena dashboards, Excel or Python scripts.

In this section we explain how you can set up Athena.

Table of Contents

Set up Amazon Athena

Prerequisites 

Set up Azure Parquet data lake [~10 min]

Note

The above steps are required before proceeding

Deploy Athena and Glue 

Ensure that you have completed the prerequisites
Download below Glue script and upload it to your input bucket root via the AWS S3 console
Verify that your input bucket contains the latest Lambda zip
Log into your AWS account, go to CloudFormation and select your data lake stack
Click ‘Stack actions/Create change set for current stack’
Click ‘Replace current template’ and enter below:

https://css-electronics-resources.s3.eu-central-1.amazonaws.com/stacks/glue-athena-v2.0.7-vG.3.0.json

Click ‘Acknowledge’, ‘Submit’, wait ~1 min and click the upper-right refresh
Click ‘Execute change set’ (and click it again in the popup), then wait ~1 min
Verify that the deployment succeeds, then go to the stack ‘Outputs’ tab

The ‘Outputs’ tab contains the details required in using Athena as a data source.

Glue script | changelog

Note

If you later need to update the integration with a new revision, see this guide

Trigger Glue job 

Verify that your S3 output bucket contains Parquet files
Open AWS Glue Triggers in a new tab
Select the ‘on-demand’ trigger and click ‘Action/Start trigger’
Open your database under AWS Glue Databases
Verify that your database tables show up (this may take a few minutes)

Note

Glue adds ‘meta data’ about your S3 output bucket. If new devices/messages are added to your Parquet data lake, the Glue job should be triggered again (manually or by schedule)[1]

You can now use Athena as a data source in e.g. Grafana-Athena dashboards. You can also check out the advanced topics to learn how to set up periodic data aggregation in your data lake.

[1]

New Parquet files added for existing devices/messages will automatically be available for queries by Athena. A new Glue job run is only required if the new Parquet data reflects a previously ‘unmapped’ device or table. For most use cases, the manual trigger will therefore suffice. However, a scheduled trigger is recommended if you expect new devices/messages to be added frequently over time. To activate the scheduled trigger, select it and click ‘Action/Activate trigger’. A Glue job will normally cost ~0.03$/run (depending on data lake size), in which case a scheduled daily trigger would cost cost ~10$/year

Set up Amazon Athena

Prerequisites

Deploy Athena and Glue

Trigger Glue job

Set up Amazon Athena 

Prerequisites 

Deploy Athena and Glue 

Trigger Glue job 