Google Cloud Platform Blog
Product updates, customer stories, and tips and tricks on Google Cloud Platform
Jump-start your data pipelining into Google BigQuery
Tuesday, October 1, 2013
Once you get your data into
Google BigQuery
, you don’t have to worry about running out of machine capacity, because you use Google’s machines as if they were your own. But what if you want to transform your source data before putting it into BigQuery and you don’t have the server capacity to handle the transformation? In this case, how about using Google Compute Engine to run your Extract, Transform and Load (ETL) processing? To learn how, read our paper
Getting Started With Google BigQuery
. To get started, download the sample ETL tool for Google Compute Engine from
GitHub
.
The sample ETL tool is an application that automates the steps of getting the Google Compute Engine instance up and running, and installing the software you need to rapidly design, create and execute the ETL workflow. The application includes a sample workflow that uses KNIME to help you understand the entire process, as shown here:
If you already have an established process for performing the ETL process to prepare the data and load it into Google Cloud Storage, but need a reliable way to load the data from there into BigQuery, we have a another sample application to help you. The
Automated File Loader for BigQuery
sample app demonstrates how to automate data loading from Google Cloud Storage to BigQuery.
This application uses the Cloud Storage Object Change Notification API to receive notifications that files have been uploaded to a bucket in Google Cloud Storage, then uses the BigQuery API to load the data from the bucket into BigQuery. Download it now from
GitHub
.
Both these sample applications accompany the article
Getting Started With Google BigQuery
, which provides an overview of the end-to-end process from loading data into BigQuery to visualization, and design practices that should be considered when using BigQuery.
-Posted by Wally Yau, Solutions Architect
Free Trial
GCP Blogs
Big Data & Machine Learning
Kubernetes
GCP Japan Blog
Firebase Blog
Apigee Blog
Popular Posts
World's largest event dataset now publicly available in BigQuery
A look inside Google’s Data Center Networks
Enter the Andromeda zone - Google Cloud Platform’s latest networking stack
Using labels to organize Google Cloud Platform resources
New in Google Cloud Storage: auto-delete, regional buckets and faster uploads
Labels
Announcements
193
Big Data & Machine Learning
134
Compute
271
Containers & Kubernetes
92
CRE
27
Customers
107
Developer Tools & Insights
151
Events
38
Infrastructure
44
Management Tools
87
Networking
43
Open
1
Open Source
135
Partners
102
Pricing
28
Security & Identity
85
Solutions
24
Stackdriver
24
Storage & Databases
164
Weekly Roundups
20
Feed
Subscribe by email
Demonstrate your proficiency to design, build and manage solutions on Google Cloud Platform.
Learn More
Technical questions? Check us out on
Stack Overflow
.
Subscribe to
our monthly newsletter
.
Google
on
Follow @googlecloud
Follow
Follow