Set up Google BigQuery
BigQuery makes it simple and fast to query data from your Google Parquet data lake via SQL. It can be used e.g. in Grafana-BigQuery dashboards or Python scripts.
In this section we explain how you can set up BigQuery.
Prerequisites
- Set up Google Parquet data lake [~10 min]
Note
The above steps are required before proceeding
1: Deploy BigQuery and mapping function
- Upload the below zip to your input bucket root via the console (storage overview) - if you prefer to script the upload, see the sketch after this list
- Open the canedge-google-cloud-terraform repository
- Go through the ‘setup instructions’ to open your Cloud Shell and clone the repository
- Go through step 3 (BigQuery) and reference the uploaded mapping function zip name
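The console upload is the simplest route, but the zip can also be uploaded programmatically. Below is a minimal sketch using the google-cloud-storage Python client; the project ID, bucket name and zip file name are hypothetical placeholders, not values from this guide.

```python
# Minimal sketch: upload the mapping function zip to the input bucket root.
# Assumes the google-cloud-storage client; all names below are placeholders.
from google.cloud import storage

client = storage.Client(project="my-project")  # hypothetical project ID
bucket = client.bucket("my-input-bucket")      # hypothetical input bucket

blob = bucket.blob("bq-map-tables.zip")        # hypothetical zip name
blob.upload_from_filename("bq-map-tables.zip") # local path to the zip
print(f"uploaded gs://{bucket.name}/{blob.name}")
```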
2: Map your Parquet data lake to tables
- Verify that your output bucket contains Parquet files[1] - a verification sketch follows this list
- Open your <id>-bq-map-tables function via the console (function overview)
- Click ‘Test’ (at the top), copy the ‘CLI test command’ and click ‘Test in cloud shell’
- Paste the command and run it, then verify that the script succeeds
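As referenced above, you can check for Parquet files via the console - or with a minimal sketch like the below, assuming the google-cloud-storage Python client (project and bucket names are hypothetical placeholders):

```python
# Minimal sketch: list Parquet objects in the output bucket.
# Assumes the google-cloud-storage client; names below are placeholders.
from google.cloud import storage

client = storage.Client(project="my-project")  # hypothetical project ID

parquet = [
    blob.name
    for blob in client.list_blobs("my-output-bucket")  # hypothetical output bucket
    if blob.name.endswith(".parquet")
]
print(f"found {len(parquet)} Parquet file(s)")
```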
Note
The mapping script adds metadata about your output bucket. If new devices/messages are added to your Parquet data lake, the script should be run again (manually or on a schedule)[2]
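One way to run the mapping function on a schedule is a Cloud Scheduler job that periodically invokes it over HTTP. Below is a hedged sketch assuming the google-cloud-scheduler Python client; the project, region, function URL and service account are hypothetical placeholders, and the service account must be permitted to invoke the function.

```python
# Minimal sketch: create a Cloud Scheduler job that triggers the mapping
# function daily. All names/URLs below are hypothetical placeholders.
from google.cloud import scheduler_v1

client = scheduler_v1.CloudSchedulerClient()
parent = client.common_location_path("my-project", "us-central1")

job = scheduler_v1.Job(
    name=f"{parent}/jobs/bq-map-tables-daily",
    schedule="0 3 * * *",  # every day at 03:00
    time_zone="Etc/UTC",
    http_target=scheduler_v1.HttpTarget(
        uri="https://us-central1-my-project.cloudfunctions.net/id-bq-map-tables",
        http_method=scheduler_v1.HttpMethod.POST,
        oidc_token=scheduler_v1.OidcToken(
            service_account_email="scheduler@my-project.iam.gserviceaccount.com"
        ),
    ),
)

client.create_job(parent=parent, job=job)
print("scheduler job created")
```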
You are now ready to use BigQuery as a data source, e.g. in Grafana-BigQuery dashboards or Python scripts.
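For example, below is a minimal sketch of querying a mapped table from a Python script via the official google-cloud-bigquery client; the project, dataset and table names are placeholders for whatever the mapping function created in your project.

```python
# Minimal sketch: run a SQL query against a mapped table from Python.
# Assumes the google-cloud-bigquery client; names below are placeholders.
from google.cloud import bigquery

client = bigquery.Client(project="my-project")  # hypothetical project ID

query = """
    SELECT *
    FROM `my-project.my_dataset.my_table`
    LIMIT 10
"""

for row in client.query(query).result():
    print(dict(row))
```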
[1] If your output bucket is empty, you can upload a test MDF file to your input bucket to create some Parquet data
[2] You only need to re-run the script if new tables are to be created, not if you simply add more data to an existing table