Google Cloud Platform Blog
Product updates, customer stories, and tips and tricks on Google Cloud Platform
Announcing Google BigQuery and Datastore Connectors for Hadoop
Wednesday, April 16, 2014
Today, we are making it easier for you to run Hadoop jobs directly against your data in Google BigQuery and Google Cloud Datastore with the Preview release of the Google BigQuery connector and Google Cloud Datastore connector for Hadoop. The Google BigQuery and Google Cloud Datastore connectors implement Hadoop’s InputFormat and OutputFormat interfaces for accessing data. These two connectors complement the existing Google Cloud Storage connector for Hadoop, which implements the Hadoop Distributed File System interface for accessing data in Google Cloud Storage.
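To make the role of these interfaces more concrete, here is a simplified, dependency-free Java sketch of the idea behind an input format: a data source is divided into splits, and each split is read as a stream of records. Every name in it (`InputFormatSketch`, `SimpleInputFormat`, `InMemoryTableFormat`, `readAll`) is hypothetical and is not part of Hadoop or of the connectors. Hadoop's actual `InputFormat` works with `InputSplit` and `RecordReader` objects and has different signatures, so treat this as an illustration of the pattern only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

final class InputFormatSketch {

    // Hypothetical, simplified stand-in for the idea behind an input format.
    interface SimpleInputFormat<R> {
        // Divide the source into independent [start, end) ranges
        // that separate tasks could read in parallel.
        List<int[]> getSplits(int numSplits);

        // Read the records that belong to one split.
        Iterator<R> createRecordReader(int[] split);
    }

    // Toy source: an in-memory list of rows standing in for a table.
    static final class InMemoryTableFormat implements SimpleInputFormat<String> {
        private final List<String> rows;

        InMemoryTableFormat(List<String> rows) {
            this.rows = rows;
        }

        @Override
        public List<int[]> getSplits(int numSplits) {
            if (numSplits < 1) {
                throw new IllegalArgumentException("numSplits must be >= 1");
            }
            List<int[]> splits = new ArrayList<>();
            int size = rows.size();
            // Ceiling division so that at most numSplits ranges are produced.
            int chunk = Math.max(1, (size + numSplits - 1) / numSplits);
            for (int start = 0; start < size; start += chunk) {
                splits.add(new int[] {start, Math.min(size, start + chunk)});
            }
            return splits;
        }

        @Override
        public Iterator<String> createRecordReader(int[] split) {
            return rows.subList(split[0], split[1]).iterator();
        }
    }

    // Read every split in order, as a single-threaded job would.
    static <R> List<R> readAll(SimpleInputFormat<R> format, int numSplits) {
        List<R> records = new ArrayList<>();
        for (int[] split : format.getSplits(numSplits)) {
            Iterator<R> reader = format.createRecordReader(split);
            while (reader.hasNext()) {
                records.add(reader.next());
            }
        }
        return records;
    }

    public static void main(String[] args) {
        SimpleInputFormat<String> format =
            new InMemoryTableFormat(Arrays.asList("a", "b", "c", "d", "e"));
        System.out.println(format.getSplits(2).size() + " splits");
        System.out.println(readAll(format, 2));
    }
}
```

An output format is the mirror image of this pattern: instead of handing records to a job, it accepts the records a job produces and writes them to the destination.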
The connectors can be automatically installed and configured when deploying your Hadoop cluster using bdutil simply by including the extra “env” files:
./bdutil deploy bigquery_env.sh
./bdutil deploy datastore_env.sh
./bdutil deploy bigquery_env.sh datastore_env.sh
[Figure: Diagram of Hadoop on Google Cloud Platform]
These three connectors allow you to directly access data stored in Google Cloud Platform’s storage services from Hadoop and other open source Big Data software that uses Hadoop’s IO abstractions. As a result, your valuable data is available simultaneously to multiple Big Data clusters and other services, without duplication. This should dramatically simplify the operational model for your Big Data processing on Google Cloud Platform.
Here are some word-count MapReduce code samples to get you started:
Using the BigQuery connector
Using the Datastore connector
Using the Datastore connector for reading data and using the BigQuery connector for publishing results
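For readers new to the word-count pattern those samples are built around, here is a minimal plain-Java sketch of its map and reduce phases. It assumes in-memory strings as input and deliberately uses no Hadoop or connector classes; the names `WordCountSketch`, `map`, `reduce`, and `wordCount` are hypothetical and are not taken from the linked samples. In a real job, the records would arrive through a connector's input format and the counts would be written through an output format.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class WordCountSketch {

    // Map phase: emit a (word, 1) pair for every word in one input record.
    static List<Map.Entry<String, Integer>> map(String record) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : record.toLowerCase().split("\\W+")) {
            // split() can yield an empty leading token; skip it.
            if (!word.isEmpty()) {
                pairs.add(new AbstractMap.SimpleEntry<>(word, 1));
            }
        }
        return pairs;
    }

    // Reduce phase: sum the counts emitted for each distinct word.
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            counts.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return counts;
    }

    // Run both phases over a list of records, as a single-process stand-in
    // for what a cluster would distribute across many tasks.
    static Map<String, Integer> wordCount(List<String> records) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String record : records) {
            pairs.addAll(map(record));
        }
        return reduce(pairs);
    }

    public static void main(String[] args) {
        List<String> records = Arrays.asList("the quick fox", "The lazy dog, the end");
        System.out.println(wordCount(records));
    }
}
```

Keeping the map and reduce steps as separate functions mirrors how a MapReduce framework runs them independently, with a grouping step in between that this sketch folds into `reduce`.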
As always, we would love to hear your feedback and ideas on improving these connectors and making Hadoop run better on Google Cloud Platform.
-Posted by Pratul Dublish, Product Manager