Google Cloud Platform Blog: January 2014

BigQuery in Practice - Loading Data Sets that are Terabytes and Beyond

Friday, January 31, 2014

Google BigQuery

Its hosting model allows you to quickly run your data analysis without having to set up a costly computing infrastructure.

The interactive speed allows your analysts to quickly validate hypothesis about their insights.

BigQuery in practice - Loading Data Sets that are Terabytes and Beyond

Scenarios for data ingestion into BigQuery

Announcing the Google Cloud Platform Developer Challenge 2013 Winners

Thursday, January 30, 2014

Google Cloud Developer ChallengeGoogle App EngineGoogle+YouTubeMaps

Enterprise/Small Business Solutions , Education, Not for Profit

Social / Personal Productivity/Games / Fun

Latin America

Sub Saharan Africa

South East Asia

Middle East & North Africa

India

Rest of the World

434 submissions80 judges from 26 countries worldwide98 finalistshere

Large Akka Cluster on Google Compute Engine

Wednesday, January 22, 2014

Our guest blog post today comes from Patrik Nordwall, Senior Software Engineer at Typesafe. His areas of expertise include Scala, Akka, and how to build reactive applications.AkkaTypesafeActor ModelGoogle Compute EngineRunning a 2400 Akka Nodes Clustersmall test applicationAkka Cluster

Starting Up 1000 Nodes
Testing Detailsakka/appsf86ba8db18read my post

Learn about Permissions on Google Cloud Platform

Tuesday, January 21, 2014

newly published guide to understanding permissions, projects and accounts on Google Cloud Platform

Performance advantages of the new Google Cloud Storage Connector for Hadoop

Wednesday, January 15, 2014

Our guest blog post today comes from Mike Wendt, R&D Associate Manager at Accenture Technology Labs, who recently published a study detailing the real world performance advantages of Hadoop on Google Compute Engine. His team utilized the recently launched Google Cloud Storage Connector for Hadoop and observed significant performance improvements over HDFS on local filesystemsAccenture Technology LabsGoogle Cloud Storage Connector for HadoophereOriginal experiment setup

Figure 1. Data-flow model using input and output copies to local disk HDFS

Modified experiment setup using Google Cloud Storage Connector for Hadoop

Figure 2. Data-flow model using Google Cloud Storage Connector for Hadoop

Result of experimentsRecommendation engine

Figure 3. Recommendation engine execution times

Sessionization

Figure 4. Sessionization execution times

Document clustering

Figure 5. Document clustering execution times

Conclusionrecent study

Easier, faster and lower cost Big Data processing with the Google Cloud Storage Connector for Hadoop

Tuesday, January 14, 2014

fastApache HadoopHadoop on Google Cloud PlatformGoogle Cloud Storage connector for Hadoop

Diagram of Hadoop on Google Cloud Platform. HDFS and the NameNode are optional when storing data in Google Cloud Storage

Google File System (GFS)ColossusGoogle Cloud Storage

Compatibility: The Google Cloud Storage connector for Hadoop code-compatible with Hadoop. Just change the URL to point to your data.

Quick startup: Your data is ready to process. You don’t have to wait for extra minutes or more while your data is copied over to HDFS and the NameNode comes out of safe mode, and you don’t have to pay for the VM time for data copying either.

Greater availability and scalability: Google Cloud Storage is globally replicated and has higher availability than HDFS because it’s independent of the compute nodes and the NameNode. If the VMs are turned down (or, cloud forbid, crash) your data lives on.

Lower costs: Save on storage and compute: storage, because there’s no need to maintain two copies of your data, one for backups and one for running Hadoop; compute, because you don’t need to keep VMs going just to serve data. And with per-minute billing, you can run Hadoop jobs faster on more cores and know your costs aren’t getting rounded up to a whole hour.

No storage management overhead: Whereas HDFS requires routine maintenance -- like file system checks, rebalancing, upgrades, rollbacks and NameNode restarts -- Google Cloud Storage just works. Your data is safe and consistent with no extra effort.

Interoperability: By keeping your data in Google Cloud Storage, you can benefit from all of the other Google services that already play nicely together.

Performance: Google’s infrastructure delivers high performance from Google Cloud Storage that’s comparable to HDFS -- without the overhead and maintenance.

simple tutorialfeedback and ideas

A better way to explore and learn on GitHub

Monday, January 13, 2014

launched our GitHub organization,

some clever naming schemesgooglecloudplatform.github.ioGoogle Cloud Platform

Multisite hosting using Protocol Forwarding, Compute Engine’s new forwarding capability

Thursday, January 9, 2014

Serving 1 million requests per second using Compute EngineProtocol ForwardingCompute EngineLayer-3 Load Balancingadvanced routingForwarding Rulesquickstart guidepricing and promotion details