Google Cloud Platform Blog
Product updates, customer stories, and tips and tricks on Google Cloud Platform
Announcing General Availability of Google Cloud Dataflow and Cloud Pub/Sub
Wednesday, August 12, 2015
By the time you are done reading this blog post, Google Cloud Platform customers will have processed hundreds of millions of messages and analyzed thousands of terabytes of data utilizing Cloud Dataflow, Cloud Pub/Sub, and BigQuery. These fully-managed services remove the operational burden found in traditional data processing systems. They enable you to build applications on a platform that can scale with the growth of your business and drive down data processing latency, all while processing your data efficiently and reliably.
Every day, customers use Google Cloud Platform to execute business-critical big data processing workloads, including: financial fraud detection, genomics analysis, inventory management, click-stream analysis, A/B user interaction testing and cloud-scale ETL.
Today we are removing our “beta” label and making Cloud Dataflow generally available. Cloud Dataflow is specifically designed to remove the complexity of developing separate systems for batch and streaming data sources by providing a unified programming model. Based on more than a decade of Google innovation, including MapReduce, FlumeJava, and MillWheel, Cloud Dataflow is built to free you from the operational overhead related to large-scale cluster management and optimization.
Cloud Dataflow provides a unified computation model for batch and streaming processing
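To make the idea of a unified model concrete, here is a minimal sketch in plain Python. It does not use the Cloud Dataflow SDK; `count_words` and `live_feed` are hypothetical names chosen for illustration. The point is only that one piece of processing logic can run unchanged over a bounded collection (batch) and over a lazily produced source (streaming).

```python
from collections import Counter
from typing import Iterable, Iterator, Tuple


def count_words(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Yield (word, running_count) for every word seen, in order.

    The function only assumes it can iterate over `lines`, so it behaves
    the same on a finite list (batch) and on a generator (stream).
    """
    counts = Counter()
    for line in lines:
        for word in line.lower().split():
            counts[word] += 1
            yield word, counts[word]


def live_feed() -> Iterator[str]:
    """Hypothetical stand-in for an unbounded source (finite here so the demo ends)."""
    yield "to be"
    yield "or not to be"


# Batch: a bounded, in-memory collection.
batch_result = dict(count_words(["to be", "or not to be"]))

# Streaming: the same function applied to a lazily produced source.
stream_result = dict(count_words(live_feed()))
```

Both calls end with the same final counts, because the logic never depends on whether the input is bounded. A real service also has to handle distribution, fault tolerance, and late data, which this sketch leaves out.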
With Cloud Dataflow GA you get:
A fully managed, fault-tolerant, highly available, SLA-backed service for batch and stream processing.
"We are utilizing Cloud Dataflow to overcome elasticity challenges with our current Hadoop cluster. Starting with some basic ETL workflow for BigQuery ingestion, we transitioned into full blown clickstream processing and analysis. This has helped us significantly improve performance of our overall system and reduce cost."
Sudhir Hasbe, Director of Software Engineering, Zulily.com
“The current iteration of Qubit’s real-time data supply chain was heavily inspired by the ground-breaking stream processing concepts described in Google’s MillWheel paper. Today we are happy to come full circle and build streaming pipelines on top of Cloud Dataflow - which has delivered on the promise of a highly-available and fault-tolerant data processing system with an incredibly powerful and expressive API.”
Jibran Saithi, Lead Architect, Qubit
A comprehensive model for balancing correctness, latency, and cost when dealing with unordered data at massive scale. These concepts power key elements of the Cloud Dataflow programming model.
"Streaming Google Cloud Dataflow perfectly fits requirements of time series analytics platform at Wix.com, in particular, its scalability, low latency data processing and fault-tolerant computing. Wide range of data collection transformations and grouping operations allow to implement complex stream data processing algorithms."
Gregory Bondar, Ph.D., Sr. Director of Data Services Platform, Wix.com
Great performance. Cloud Dataflow is 2-3x faster and cheaper than Hadoop when evaluating classic MapReduce-based pipelines, such as PageRank and WordCount. And with dynamic work rebalancing, Cloud Dataflow effectively optimizes resource utilization, which provides additional performance gains without requiring manual intervention.
An extensible SDK. We have expanded our technology partner, third-party connector, and service provider integration efforts, including Tamr, Salesforce, ClearStory, springML, Cloudera, and data Artisans. We also continue to support alternate runner enablement for Apache Spark and Apache Flink.
"We're excited to collaborate with Google Cloud Platform on integrations with Salesforce Wave. The integrations with Google Cloud Dataflow further enable Wave to deliver insights to business users. Businesses can now use vast, diverse datasets like machine-generated data to derive customer insights in near-real-time."
Olivier Pin, VP of Product Management, Wave Analytics, Salesforce.com
"Tamr and Google Cloud Dataflow are simplifying how people access and use crucial data and distributed computing assets in the enterprise. The combination of Cloud Dataflow and Tamr running on Google Cloud Platform enables organizations to connect and enrich their enterprise data at internet scale."
Andy Palmer, co-founder and CEO of Tamr, Inc.
Cloud Dataflow seamlessly integrates with Google Cloud Platform, third party services & data stores
Native Google Cloud Platform integration for Cloud Storage, Cloud Datastore, BigQuery, and Cloud Pub/Sub. You now get full query support for our BigQuery source. Our integration with Cloud Pub/Sub now provides source timestamp processing in addition to arrival time processing. Source timestamps, when combined with flexible Windowing and Triggering primitives, enable developers to produce more accurate windows of data output.
"We are very excited about the productivity benefits offered by Cloud Dataflow and Cloud Pub/Sub. It took half a day to rewrite something that had previously taken over six months to build using Spark"
Paul Clarke, Director of Technology, Ocado
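The difference between source timestamp processing and arrival time processing mentioned above can be sketched in a few lines of plain Python. This is an illustration only, not the Cloud Dataflow windowing API: `Message`, `fixed_windows`, and the field names are assumptions made for the example, and triggering is not modeled.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class Message(NamedTuple):
    payload: str
    source_ts: int   # seconds; when the event happened at the source
    arrival_ts: int  # seconds; when the message reached the pipeline


def fixed_windows(messages: List[Message], size: int, by: str) -> Dict[int, List[str]]:
    """Group payloads into fixed windows of `size` seconds.

    `by` names the timestamp field to use: "source_ts" or "arrival_ts".
    Each key in the result is the start time of a window.
    """
    windows = defaultdict(list)
    for msg in messages:
        ts = getattr(msg, by)
        window_start = ts - (ts % size)
        windows[window_start].append(msg.payload)
    return dict(windows)


messages = [
    Message("a", source_ts=5, arrival_ts=6),
    Message("b", source_ts=58, arrival_ts=61),  # delayed across a window boundary
    Message("c", source_ts=62, arrival_ts=63),
]

by_source = fixed_windows(messages, size=60, by="source_ts")
by_arrival = fixed_windows(messages, size=60, by="arrival_ts")
```

With 60-second windows, message `b` lands in the first window when grouped by source timestamp but in the second when grouped by arrival time. That shift is the inaccuracy that source timestamps help avoid.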
A decade of internal innovation also stands behind today’s general availability of Google Cloud Pub/Sub. Delivering over a trillion messages for our alpha and beta customers has helped tune our performance, refine our v1 API, and ensure a stable foundation for Cloud Dataflow’s streaming ingestion, Cloud Logging’s streaming export, Gmail’s Push API, and Cloud Platform customers streaming their own production workloads, at rates up to 1 million message operations per second.
Such diverse scenarios demonstrate how Cloud Pub/Sub is designed to deliver real-time and reliable messaging — in one global, managed service that helps you create simpler, more robust, and more flexible applications.
Cloud Pub/Sub connects your services to each other, to other Google APIs, and third parties.
Cloud Pub/Sub can help integrate applications and services reliably, as well as analyze big data streams in real-time. Traditional approaches require separate queueing, notification, and logging systems, each with their own APIs and tradeoffs between durability, availability, and scalability. Cloud Pub/Sub addresses a broad range of scenarios with a single API and a managed service that eliminates those tradeoffs, and it remains cost-effective as you grow, with pricing as low as 5¢ per million message operations for sustained usage.
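As a rough worked example of the quoted rate, the sketch below estimates cost from a message operation count. It assumes, purely for illustration, that the lowest quoted rate of 5¢ per million operations applies to every operation; actual pricing depends on usage tiers, so treat the result as a lower bound. `estimate_cost` is a hypothetical helper, not part of any Google API.

```python
from decimal import Decimal

# USD per million message operations; the lowest rate quoted in this post,
# assumed flat here for simplicity.
PRICE_PER_MILLION_OPS = Decimal("0.05")


def estimate_cost(message_operations: int) -> Decimal:
    """Return the estimated cost in USD for a number of message operations."""
    millions = Decimal(message_operations) / Decimal(1_000_000)
    return millions * PRICE_PER_MILLION_OPS


# Example: 2 billion message operations in a month.
monthly_cost = estimate_cost(2_000_000_000)
```

Under this flat-rate assumption, 2 billion message operations come to 100 USD. `Decimal` is used instead of floats so that currency arithmetic stays exact.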
General availability is a key milestone, though hardly the end of the road.
We are continuing to innovate with the alpha release of the gcloud pubsub tool and today’s beta release of our new Identity and Access Management (IAM) APIs and Permissions Editor in the Google Developers Console. These improvements allow users to control access down to the level of particular operations on specific topics and subscriptions. IAM ACLs make it easier to connect multiple Cloud Platform projects, either within the same organization or to third-party services.
Get Started
We’re looking forward to this next step for Google Cloud Platform as we continue to help developers and businesses everywhere benefit from Google’s technical and operational expertise in big data. Please visit Cloud Dataflow and Cloud Pub/Sub to learn more and contact us with your feedback, ideas for new connectors, or even new public data feeds we can help you share.
- Posted by Eric Schmidt (not that Eric), PM Cloud Dataflow & Rohit Khare, PM Cloud Pub/Sub