Set up Google BigQuery

Google BigQuery interface

BigQuery makes it simple and fast to query data from your Google Parquet data lake via SQL. It can be used e.g. in Grafana-BigQuery dashboards or Python scripts.

In this section we explain how you can set up BigQuery.


Prerequisites

  1. Set up Google Parquet data lake [~10 min]

Note

The above steps are required before proceeding


Create ‘admin’ service account (for BigQuery + Storage)

  1. Go to ‘IAM and admin/Service accounts/Create’
  2. Specify name as bigquery-storage-admin
  3. Add the below roles and click ‘Done’[1]
     BigQuery Admin
     Storage Admin
  4. Open the service account and go to ‘Keys/Add key/Create new key/JSON’
  5. Download the key and name it bigquery-storage-admin-account.json
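
Optionally, you can verify that the admin key works before proceeding. The sketch below is an optional check (not part of the official steps) and assumes you have installed the google-cloud-bigquery package; it authenticates with the key and lists the datasets the account can see:

    from google.cloud import bigquery
    from google.oauth2 import service_account

    # authenticate with the admin service account key downloaded above
    credentials = service_account.Credentials.from_service_account_file(
        "bigquery-storage-admin-account.json"
    )
    client = bigquery.Client(credentials=credentials, project=credentials.project_id)

    # list the datasets in the project (empty until you create one below)
    print([dataset.dataset_id for dataset in client.list_datasets()])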

Create ‘user’ service account (for BigQuery)

  1. Go to ‘IAM and admin/Service accounts/Create’
  2. Specify name as bigquery-user
  3. Add the below roles and click ‘Done’[1]
     BigQuery Data Viewer
     BigQuery Job User
     Storage Object Viewer
  4. Open the service account and go to ‘Keys/Add key/Create new key/JSON’
  5. Download the key and name it bigquery-user-account.json
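
As with the admin key, you can optionally test the user key. A minimal check (again assuming google-cloud-bigquery is installed) is to run a trivial query job, which exercises the ‘BigQuery Job User’ role:

    from google.cloud import bigquery
    from google.oauth2 import service_account

    # authenticate with the user service account key downloaded above
    credentials = service_account.Credentials.from_service_account_file(
        "bigquery-user-account.json"
    )
    client = bigquery.Client(credentials=credentials, project=credentials.project_id)

    # a trivial query job - this only succeeds if the key is valid and the
    # account is allowed to create query jobs in the project
    rows = client.query("SELECT 1 AS ok").result()
    print(list(rows)[0].ok)  # prints 1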

Create BigQuery data set

  1. In the GCP console go to BigQuery
  2. In the Explorer view, click the ‘…’ next to your project ID and select ‘Create data set’
  3. Name it lakedataset1 and use the same region as your buckets (e.g. europe-west3). For an optional scripted alternative, see the sketch after the note below

Note

The dataset must be named as above for scripts/templates to work
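
If you prefer to script this step instead of using the console, the dataset can also be created via the BigQuery Python client and the admin key. This is an optional sketch (not a required step); the dataset name and region must match the values described above:

    from google.cloud import bigquery
    from google.oauth2 import service_account

    credentials = service_account.Credentials.from_service_account_file(
        "bigquery-storage-admin-account.json"
    )
    client = bigquery.Client(credentials=credentials, project=credentials.project_id)

    # the dataset must be named lakedataset1 and use the same region as your buckets
    dataset = bigquery.Dataset(f"{credentials.project_id}.lakedataset1")
    dataset.location = "europe-west3"  # replace with your bucket region if different
    client.create_dataset(dataset, exists_ok=True)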


Map your Parquet data lake to tables

  1. Verify that your output bucket contains Parquet files[2]
  2. Download and unzip the below script
  3. Place the bigquery-storage-admin-account.json next to the Python script
  4. Open the bigquery-map-tables.py via a text editor
  5. Add your details (project ID, data set ID, output bucket name) and save the script
  6. Run the script to map your current Parquet data lake to tables[3] (a conceptual sketch of this step follows the note below)

BigQuery map tables script | changelog

Note

The script adds metadata about your output bucket. If new devices/messages are added to your Parquet data lake, the script should be run again (manually or on a schedule)[4]
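
To illustrate what the mapping step does conceptually: each device/message folder of Parquet files in your output bucket is exposed as a BigQuery table that reads the files directly from Cloud Storage. The sketch below shows this idea via an external table created with the Python client; the bucket, device and message names are hypothetical placeholders, and the actual bigquery-map-tables.py script may implement the mapping differently:

    from google.cloud import bigquery
    from google.oauth2 import service_account

    credentials = service_account.Credentials.from_service_account_file(
        "bigquery-storage-admin-account.json"
    )
    client = bigquery.Client(credentials=credentials, project=credentials.project_id)

    # hypothetical example: map one device/message folder to one table
    source_uris = ["gs://your-output-bucket/your_device/your_message/*.parquet"]
    table_id = f"{credentials.project_id}.lakedataset1.your_device_your_message"

    # external table: BigQuery reads the Parquet files in-place from the bucket
    external_config = bigquery.ExternalConfig("PARQUET")
    external_config.source_uris = source_uris

    table = bigquery.Table(table_id)
    table.external_data_configuration = external_config
    client.create_table(table, exists_ok=True)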


You are now ready to use BigQuery as a data source in e.g. Grafana-BigQuery dashboards.
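
For example, a Python script can query the mapped tables using the user service account key. The table name below is a placeholder; replace it with one of the tables created by the mapping script:

    from google.cloud import bigquery
    from google.oauth2 import service_account

    credentials = service_account.Credentials.from_service_account_file(
        "bigquery-user-account.json"
    )
    client = bigquery.Client(credentials=credentials, project=credentials.project_id)

    # placeholder table name - replace with a table created by the mapping script
    query = f"SELECT * FROM `{credentials.project_id}.lakedataset1.your_table` LIMIT 10"
    for row in client.query(query).result():
        print(dict(row))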


[1] Once you are done testing, you can optionally restrict the service account further via e.g. IAM conditions
[2] If your output bucket is empty, you can upload a test MDF file to your input bucket to create some Parquet data
[3] This assumes that you have installed Python 3.11 and the packages from requirements.txt (we recommend using a virtual environment). You can set this up by running python -m venv env & env\Scripts\activate & pip install -r requirements.txt
[4] You only need to re-run the script if new tables are to be created, not if you simply add more data to an existing table