Easier, faster and lower cost Big Data processing with the Google Cloud Storage Connector for Hadoop
Tuesday, January 14, 2014
Google Compute Engine VMs provide a fast and reliable way to run Apache Hadoop. Today, we’re making it easier to run Hadoop on Google Cloud Platform with the Preview release of the Google Cloud Storage connector for Hadoop, which lets you focus on your data processing logic instead of on managing a cluster and file system.
[Diagram: Hadoop on Google Cloud Platform. HDFS and the NameNode are optional when storing data in Google Cloud Storage.]
In the 10 years since we first introduced Google File System (GFS), the basis for the Hadoop Distributed File System (HDFS), Google has continued to improve our storage system for large data processing. The latest iteration is Colossus.
Today’s launch puts that same technology to work for your Hadoop jobs. Using a simple connector library, Hadoop can now run directly against Google Cloud Storage, an object store built on Colossus. That means you benefit from Google’s expertise in large data processing.
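To give a sense of what “running directly against Google Cloud Storage” looks like, here is a minimal Java sketch that uses the standard Hadoop FileSystem API to list objects in a bucket. It assumes the connector jar is on the classpath and that credentials are already configured; the project ID and bucket name are placeholders.

    import java.net.URI;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class GcsListingExample {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Register the connector as the handler for gs:// URIs
        // (normally set once in core-site.xml).
        conf.set("fs.gs.impl",
            "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem");
        // The Cloud Platform project that owns the bucket -- placeholder value.
        conf.set("fs.gs.project.id", "your-project-id");

        // From here on, Cloud Storage behaves like any other Hadoop file system.
        FileSystem fs = FileSystem.get(URI.create("gs://your-bucket/"), conf);
        for (FileStatus status : fs.listStatus(new Path("gs://your-bucket/input/"))) {
          System.out.println(status.getPath());
        }
      }
    }

Because the connector plugs into Hadoop’s FileSystem abstraction, existing tools and jobs see gs:// paths the same way they see hdfs:// paths.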
Here are a few other benefits of running Hadoop with Google Cloud Storage:
Compatibility: The Google Cloud Storage connector for Hadoop is code-compatible with Hadoop. Just change the URL to point to your data (see the job driver sketch after this list).
Quick startup: Your data is ready to process. You don’t have to wait minutes or more while your data is copied to HDFS and the NameNode comes out of safe mode, and you don’t have to pay for the VM time spent copying, either.
Greater availability and scalability: Google Cloud Storage is globally replicated and has higher availability than HDFS because it’s independent of the compute nodes and the NameNode. If the VMs are turned down (or, cloud forbid, crash), your data lives on.
Lower costs: Save on storage and compute: storage, because there’s no need to maintain two copies of your data, one for backups and one for running Hadoop; compute, because you don’t need to keep VMs running just to serve data. And with per-minute billing, you can run Hadoop jobs faster on more cores and know your costs aren’t getting rounded up to a whole hour.
No storage management overhead: Whereas HDFS requires routine maintenance -- like file system checks, rebalancing, upgrades, rollbacks and NameNode restarts -- Google Cloud Storage just works. Your data is safe and consistent with no extra effort.
Interoperability: By keeping your data in Google Cloud Storage, you can benefit from all of the other Google services that already play nicely together.
Performance: Google’s infrastructure delivers high performance from Google Cloud Storage that’s comparable to HDFS -- without the overhead and maintenance.
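To make the “just change the URL” point concrete, here is a hypothetical MapReduce job driver whose only Cloud Storage-specific detail is the gs:// scheme in its input and output paths. The bucket names are placeholders, and the mapper and reducer classes are omitted for brevity.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class GcsJobDriver {
      public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "job-on-gcs");
        job.setJarByClass(GcsJobDriver.class);
        // Set your mapper, reducer, and output types here as usual;
        // nothing about the job logic changes for Cloud Storage.

        // The only difference from an HDFS run: gs:// URIs instead of hdfs:// ones.
        FileInputFormat.addInputPath(job, new Path("gs://your-bucket/input"));
        FileOutputFormat.setOutputPath(job, new Path("gs://your-bucket/output"));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }

A job that already reads hdfs://namenode/input can switch to Cloud Storage by changing nothing but those two paths.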
To see the benefits for yourself, give Hadoop on Google Cloud Platform a try by following the simple tutorial.
We would love to hear your feedback and ideas on how to make Hadoop and MapReduce run even better on Google Cloud Platform.
-Posted by Jonathan Bingham, Product Manager