Airflow: How to load data from a REST API to BigQuery?
Problem Description:
I am new to Airflow, and I am trying to write a Python method that inserts data from a REST API into BigQuery, but I am having no luck.
So far, I have read the documentation but have not found an example of what I am trying to accomplish. I have written the following code, but I am not sure whether it is correct:
def insert_from_api_to_bq():
    request1 = Request('[URL GOES HERE]', headers=headers)
    sale_list = urlopen(request1).read()
    dec_sale_list = json.loads(sale_list)
    for sale in dec_sale_list['SaleList']:
        print("sale ID: " + sale['SaleID'] + " Customer: " + sale['Customer'] + " Order Date: " + sale['OrderDate'])

with DAG("sales_data_pipeline", start_date=datetime(2021, 1, 1),
         schedule_interval="@daily", default_args=default_args, catchup=False) as dag:

    downloading_sales = PythonOperator(
        task_id="downloading_sales",
        python_callable=download_sales
    )
Solution – 1
You are on the right track. In the Python method that retrieves data from the API, you can transform the result and then load it into a BigQuery table with the Python client:
import json
from datetime import datetime
from urllib.request import Request, urlopen

from airflow import DAG
from airflow.operators.python import PythonOperator
from google.cloud import bigquery

# headers, dataset, table and default_args are assumed to be defined elsewhere

def insert_from_api_to_bq():
    # Retrieve the sales data from the REST API
    request1 = Request('[URL GOES HERE]', headers=headers)
    sale_list = urlopen(request1).read()
    dec_sale_list = json.loads(sale_list)

    # You can transform and adapt dec_sale_list before writing to BQ
    client = bigquery.Client()
    client.insert_rows_json(f'{dataset}.{table}', dec_sale_list)

with DAG("sales_data_pipeline", start_date=datetime(2021, 1, 1),
         schedule_interval="@daily", default_args=default_args, catchup=False) as dag:

    downloading_sales = PythonOperator(
        task_id="downloading_sales",
        python_callable=insert_from_api_to_bq
    )
The insert_rows_json method allows saving a Python list of dicts to BigQuery.
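For illustration, here is a minimal sketch of what the transform-and-load step could look like, assuming the API returns a SaleList array as in the question and that sales_dataset.sales is the destination table (both names are hypothetical). insert_rows_json returns one error entry per failed row, so you can check the result before finishing the task:

from google.cloud import bigquery

def load_sales_to_bq(dec_sale_list):
    # Keep only the fields matching the (hypothetical) table schema:
    # SaleID, Customer, OrderDate.
    rows_to_insert = [
        {
            "SaleID": sale["SaleID"],
            "Customer": sale["Customer"],
            "OrderDate": sale["OrderDate"],
        }
        for sale in dec_sale_list["SaleList"]
    ]

    client = bigquery.Client()
    # Stream the rows into the table; an empty list means every row was accepted.
    errors = client.insert_rows_json("sales_dataset.sales", rows_to_insert)
    if errors:
        raise RuntimeError(f"BigQuery streaming insert failed: {errors}")

Note that insert_rows_json uses BigQuery's streaming insert API, so the destination table and its schema must already exist before the DAG runs.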