Set up Google BigQuery
BigQuery makes it simple and fast to query data from your Google Parquet data lake via SQL. It can e.g. be used in Grafana-BigQuery dashboards or Python scripts.
In this section we explain how you can set up BigQuery.
Prerequisites
- Set up Google Parquet data lake [~10 min]
Note
The above steps are required before proceeding
Create ‘admin’ service account (for BigQuery + Storage)
- Go to ‘IAM and admin/Service accounts/Create’
- Specify the name as bigquery-storage-admin
- Add the below roles and click ‘Done’[1]
  - BigQuery Admin
  - Storage Admin
- Open the service account and go to ‘Keys/Add key/Create new key/JSON’
- Download the key and name it bigquery-storage-admin-account.json
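Optionally, you can verify that the key works before proceeding. The sketch below (assuming the google-cloud-bigquery and google-cloud-storage Python packages are installed) loads the key and lists the project’s datasets and buckets:

```python
# Optional sanity check of the admin key (not part of the required steps)
# Assumes: pip install google-cloud-bigquery google-cloud-storage
from google.oauth2 import service_account
from google.cloud import bigquery, storage

# key file name as used in this guide
credentials = service_account.Credentials.from_service_account_file(
    "bigquery-storage-admin-account.json"
)

bq_client = bigquery.Client(credentials=credentials, project=credentials.project_id)
gcs_client = storage.Client(credentials=credentials, project=credentials.project_id)

print("Datasets:", [d.dataset_id for d in bq_client.list_datasets()])
print("Buckets:", [b.name for b in gcs_client.list_buckets()])
```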
Create ‘user’ service account (for BigQuery)
- Go to ‘IAM and admin/Service accounts/Create’
- Specify the name as bigquery-user
- Add the below roles and click ‘Done’[1]
  - BigQuery Data Viewer
  - BigQuery Job User
  - Storage Object Viewer
- Open the service account and go to ‘Keys/Add key/Create new key/JSON’
- Download the key and name it bigquery-user-account.json
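As with the admin key, you can optionally confirm that this read-only key is able to submit BigQuery jobs. A dry-run query (sketch below, assuming google-cloud-bigquery is installed) validates the permissions without scanning any data:

```python
# Optional sanity check of the read-only key via a dry-run query
from google.oauth2 import service_account
from google.cloud import bigquery

credentials = service_account.Credentials.from_service_account_file(
    "bigquery-user-account.json"
)
client = bigquery.Client(credentials=credentials, project=credentials.project_id)

# dry_run=True means the query is validated and priced, but never executed
job_config = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)
job = client.query("SELECT 1", job_config=job_config)
print("Dry run OK, bytes processed:", job.total_bytes_processed)
```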
Create BigQuery data set
- In the GCP console go to BigQuery
- In the Explorer view, click the ‘…’ next to your project ID and select ‘Create data set’
- Name it lakedataset1 and use the same region as your buckets (e.g. europe-west3)
Note
The dataset must be named as above for scripts/templates to work
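If you prefer scripting over the console, the data set can also be created with the admin key via the BigQuery Python client. Below is a minimal sketch, assuming the dataset name and region from this guide (lakedataset1, europe-west3):

```python
# Create the BigQuery data set programmatically (alternative to the console steps)
from google.oauth2 import service_account
from google.cloud import bigquery

credentials = service_account.Credentials.from_service_account_file(
    "bigquery-storage-admin-account.json"
)
client = bigquery.Client(credentials=credentials, project=credentials.project_id)

dataset = bigquery.Dataset(f"{credentials.project_id}.lakedataset1")
dataset.location = "europe-west3"  # must match the region of your buckets
client.create_dataset(dataset, exists_ok=True)
print("Data set lakedataset1 ready")
```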
Map your Parquet data lake to tables
- Verify that your output bucket contains Parquet files[2]
- Download and unzip the below script
- Place the bigquery-storage-admin-account.json next to the Python script
- Open the bigquery-map-tables.py via a text editor
- Add your details (project ID, data set ID, output bucket name) and save the script
- Run the script to map your current Parquet data lake to tables[3]
BigQuery map tables script | changelog
Note
The script adds metadata about your output bucket to BigQuery. If new devices/messages are added to your Parquet data lake, the script should be run again (manually or on a schedule)[4]
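For reference, the mapping itself boils down to creating BigQuery external tables that point at the Parquet files in your output bucket. The bundled script is the reference implementation; the snippet below is only an illustrative sketch, with the project ID, bucket name and device/message prefix as placeholders:

```python
# Illustrative sketch only - use the bundled bigquery-map-tables.py for the real mapping
from google.oauth2 import service_account
from google.cloud import bigquery

PROJECT_ID = "your-project-id"               # placeholder
DATASET_ID = "lakedataset1"
OUTPUT_BUCKET = "your-output-bucket"         # placeholder
DEVICE_MESSAGE = "AABBCCDD/CAN2_gnssspeed"   # hypothetical device/message prefix

credentials = service_account.Credentials.from_service_account_file(
    "bigquery-storage-admin-account.json"
)
client = bigquery.Client(credentials=credentials, project=PROJECT_ID)

# external table that reads the Parquet files directly from the output bucket
external_config = bigquery.ExternalConfig("PARQUET")
external_config.source_uris = [f"gs://{OUTPUT_BUCKET}/{DEVICE_MESSAGE}/*"]

table_id = f"{PROJECT_ID}.{DATASET_ID}.{DEVICE_MESSAGE.replace('/', '_')}"
table = bigquery.Table(table_id)
table.external_data_configuration = external_config
client.create_table(table, exists_ok=True)
print(f"Mapped gs://{OUTPUT_BUCKET}/{DEVICE_MESSAGE} -> {table_id}")
```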
You are now ready to use BigQuery as a data source in e.g. Grafana-BigQuery dashboards.
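For example, you can query one of the mapped tables from Python with the read-only key (sketch; the project and table names below are placeholders, replace them with your own):

```python
from google.oauth2 import service_account
from google.cloud import bigquery

credentials = service_account.Credentials.from_service_account_file(
    "bigquery-user-account.json"
)
client = bigquery.Client(credentials=credentials, project=credentials.project_id)

# placeholder table - use one of the tables created by the mapping script
query = "SELECT * FROM `your-project-id.lakedataset1.AABBCCDD_CAN2_gnssspeed` LIMIT 10"
for row in client.query(query).result():
    print(dict(row))
```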
[1] | Once you are done testing, you can optionally restrict the service account further if needed via e.g. IAM conditions
[2] | If your output bucket is empty, you can upload a test MDF file to your input bucket to create some Parquet data |
[3] | This assumes that you have installed Python 3.11 and the requirements.txt dependencies (we recommend using a virtual environment). You can set this up by running python -m venv env & env\Scripts\activate & pip install -r requirements.txt
[4] | You only need to re-run the script if new tables are to be created, not if you simply add more data to an existing table |