
International Journal on Science and Technology
E-ISSN: 2229-7677
Impact Factor: 9.88
A Widely Indexed Open Access Peer Reviewed Multidisciplinary Bi-monthly Scholarly International Journal
Batch Loading Data to Google BigQuery using Google Data Fusion
Author(s) | Suhas Hanumanthaiah |
---|---|
Country | United States |
Abstract | Efficient data ingestion into cloud data warehouses is critical for timely analytics and informed decision-making in modern enterprises. This paper presents a practical, scalable approach to batch loading large volumes of structured and semi-structured data into Google BigQuery using Google Cloud Data Fusion (CDF). BigQuery, a serverless and highly scalable analytical data warehouse, can process petabyte-scale datasets, but realizing its full potential depends on efficient upstream data integration. CDF, built on the open-source Cask Data Application Platform (CDAP), provides a visual, code-free interface for designing, managing, and executing ETL pipelines. The paper describes how CDF works with other GCP services, in particular by executing pipelines on Dataproc clusters, to run resource-optimized data pipelines. Through an architectural framework and deployment model, it demonstrates how CDF can be used to build modular, reusable, autoscaling batch workflows that reduce operational cost and improve performance. Best practices including namespace segregation, transformation pushdown, cluster autoscaling, and failure alerting are presented to improve pipeline efficiency and governance. The study also identifies current limitations of CDF for real-time ingestion and proposes future work evaluating its streaming performance with Pub/Sub and Spark Streaming. Overall, the approach provides a robust, cost-effective foundation for enterprise-grade data integration on Google Cloud and points to hybrid batch-streaming models as a direction for future research. |
Keywords | Cloud Data Fusion (CDF), Google BigQuery, Batch Data Processing, Google Cloud Platform (GCP). |
Field | Engineering |
Published In | Volume 15, Issue 1, January-March 2024 |
Published On | 2024-03-06 |
DOI | https://doi.org/10.71097/IJSAT.v15.i1.7552 |
Short DOI | https://doi.org/g9v45g |
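The abstract describes batch pipelines that are designed visually in CDF, segregated by namespace, and monitored for failures. Once such a pipeline is deployed, it can also be triggered and checked programmatically. The Python sketch below shows one way to do this through the CDAP REST API that a CDF instance exposes. It assumes a deployed batch pipeline (which CDAP runs as the `DataPipelineWorkflow` workflow), the instance's API endpoint, and an OAuth access token (for example from `gcloud auth print-access-token`). The endpoint, the namespace `finance-dev`, the pipeline name `daily_orders_to_bq`, the runtime argument `input.path`, and the set of run states treated as failures are all hypothetical; this is an illustrative sketch, not the author's implementation.

```python
import json
import urllib.parse
import urllib.request
from typing import Dict, List, Optional

# CDAP runs each deployed CDF batch pipeline as a workflow with this name.
WORKFLOW = "DataPipelineWorkflow"

# Run states this sketch treats as failures worth alerting on (an assumption).
FAILED_STATES = {"FAILED", "KILLED", "REJECTED"}


def workflow_url(endpoint: str, namespace: str, pipeline: str, action: str) -> str:
    """Build the CDAP REST URL for one action on a pipeline's batch workflow.

    The namespace segment is what keeps, e.g., dev and prod pipelines apart.
    """
    ns = urllib.parse.quote(namespace, safe="")
    app = urllib.parse.quote(pipeline, safe="")
    return (f"{endpoint.rstrip('/')}/v3/namespaces/{ns}/apps/{app}"
            f"/workflows/{WORKFLOW}/{action}")


def build_start_request(endpoint: str, namespace: str, pipeline: str,
                        token: str,
                        runtime_args: Optional[Dict[str, str]] = None
                        ) -> urllib.request.Request:
    """Prepare (but do not send) a POST that starts one batch run.

    Runtime arguments are sent as a JSON object of string keys and values,
    which CDF can substitute into macros in the pipeline configuration.
    """
    body = json.dumps(runtime_args or {}).encode("utf-8")
    return urllib.request.Request(
        workflow_url(endpoint, namespace, pipeline, "start"),
        data=body,
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
        method="POST",
    )


def start_pipeline(endpoint: str, namespace: str, pipeline: str, token: str,
                   runtime_args: Optional[Dict[str, str]] = None) -> int:
    """Send the start request and return the HTTP status code.

    urlopen raises urllib.error.HTTPError for 4xx/5xx responses.
    """
    req = build_start_request(endpoint, namespace, pipeline, token, runtime_args)
    with urllib.request.urlopen(req) as resp:
        return resp.status


def failed_runs(run_records: List[dict]) -> List[str]:
    """Return the IDs of runs whose status is in FAILED_STATES."""
    return [r.get("runid", "?") for r in run_records
            if r.get("status") in FAILED_STATES]


if __name__ == "__main__":
    # Hypothetical endpoint, namespace, pipeline, and runtime argument.
    req = build_start_request(
        "https://cdf.example.com/api", "finance-dev", "daily_orders_to_bq",
        "TOKEN", {"input.path": "gs://example-bucket/orders/2024-03-01/"},
    )
    print(req.get_method(), req.full_url)
```

Building the request separately from sending it keeps the URL and payload testable without network access, while `start_pipeline` performs the actual call. Assuming a GET on the same workflow's `runs` path returns a JSON list of run records with `runid` and `status` fields, `failed_runs` could be applied to that list to drive a simple failure alert alongside CDF's built-in alerting.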
A CrossRef DOI is assigned to each research paper published in the journal. The IJSAT DOI prefix is 10.71097/IJSAT.
All research papers published on this website are licensed under the Creative Commons Attribution-ShareAlike 4.0 International License, and all rights belong to their respective authors/researchers.
