Google Cloud Platform Blog
Product updates, customer stories, and tips and tricks on Google Cloud Platform
Processing logs at scale using Cloud Dataflow
Tuesday, November 24, 2015
Logs generated by applications and services can provide an immense amount of information about how your deployment is running and the experiences your users are having as they interact with the products and services. But as deployments grow more complex, gleaning insights from this data becomes more challenging. Logs come from an increasing number of sources, so they can be hard to collate and query for useful information. And building, operating and maintaining your own infrastructure to analyze log data at scale requires extensive expertise in running distributed systems and storage. Today, we’re introducing
a new solution paper
and
reference implementation
that will show how you can process logs from multiple sources and extract meaningful information by using
Google Cloud Platform
and
Google Cloud Dataflow
.
Log processing typically involves some combination of the following activities:
Configuring applications and services
Collecting and capturing log files
Storing and managing log data
Processing and extracting data
Persisting insights
Each of those components has its own scaling and management challenges, and often calls for different approaches at different times. These sorts of challenges can slow down the generation of meaningful, actionable information from your log data.
Cloud Platform
provides a number of services that can help you to address these challenges. You can use
Cloud Logging
to collect logs from applications and services, and then store them in
Google Cloud Storage
buckets or stream them to Pub/Sub topics. Dataflow can read from Cloud Storage or
Pub/Sub
(and many more), process log data, extract and transform metadata and compute aggregations. You can persist the output from Dataflow in
BigQuery
, where it can be analyzed or reviewed anytime. These mechanisms are offered as managed services, meaning they can scale when needed and you don't need to worry about provisioning resources up front.
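To make the "extract, transform, and aggregate" step more concrete, here is a minimal sketch in plain Python. It is not the Dataflow SDK and not the code from the reference implementation; it only illustrates the shape of the work a pipeline stage would do. The JSON log format, the field names (`service`, `path`, `status`, `latency_ms`), and the helper names are all assumptions made up for this example.

```python
import json
from collections import defaultdict

# Hypothetical structured log entries, one JSON object per line.
# The field names are assumptions for illustration only.
SAMPLE_LINES = [
    '{"service": "frontend", "path": "/home", "status": 200, "latency_ms": 120}',
    '{"service": "frontend", "path": "/home", "status": 500, "latency_ms": 300}',
    '{"service": "backend", "path": "/api/items", "status": 200, "latency_ms": 80}',
    'not valid json',
]

REQUIRED_FIELDS = ("service", "path", "status", "latency_ms")


def parse_line(line):
    """Return a dict for a well-formed entry, or None for anything else."""
    try:
        entry = json.loads(line)
    except ValueError:
        return None
    if not isinstance(entry, dict):
        return None
    if not all(field in entry for field in REQUIRED_FIELDS):
        return None
    return entry


def aggregate(lines):
    """Group entries by (service, path) and compute simple per-group stats."""
    totals = defaultdict(lambda: {"count": 0, "errors": 0, "latency_sum": 0})
    for line in lines:
        entry = parse_line(line)
        if entry is None:
            continue  # skip malformed lines instead of failing the whole run
        group = totals[(entry["service"], entry["path"])]
        group["count"] += 1
        if entry["status"] >= 500:
            group["errors"] += 1
        group["latency_sum"] += entry["latency_ms"]

    # Flatten into row dicts, the kind of record you might load into a table.
    rows = []
    for (service, path), group in sorted(totals.items()):
        rows.append({
            "service": service,
            "path": path,
            "requests": group["count"],
            "errors": group["errors"],
            "avg_latency_ms": group["latency_sum"] / group["count"],
        })
    return rows


if __name__ == "__main__":
    for row in aggregate(SAMPLE_LINES):
        print(row)
```

In a real pipeline the parse and aggregate steps would run as distributed transforms, and the resulting rows would be written to BigQuery rather than printed.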
The
solution paper
and
reference implementation
describe how you can use Dataflow to process log data from multiple sources and persist findings directly in BigQuery. You’ll learn how to configure Cloud Logging to collect logs from applications running in
Container Engine
, how to export those logs to Cloud Storage, and how to execute the Dataflow processing job. In addition, the solution shows you how to reconfigure Cloud Logging to use Pub/Sub to stream data directly to Dataflow, so you can process logs in real time.
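When logs arrive as a stream rather than as files, aggregations are usually computed over time windows instead of over the whole data set. The sketch below shows that idea with fixed windows in plain Python. It is an illustration only, not the Dataflow API or the solution's code; the 60-second window size, the `(timestamp, service)` event shape, and the function names are assumptions.

```python
from collections import Counter

WINDOW_SECONDS = 60  # assumed window size for this example


def window_start(timestamp, size=WINDOW_SECONDS):
    """Return the start of the fixed window that contains the timestamp."""
    return timestamp - (timestamp % size)


def count_by_window(events, size=WINDOW_SECONDS):
    """Count events per (window start, service) pair.

    `events` is an iterable of (timestamp_in_seconds, service_name) tuples.
    """
    counts = Counter()
    for timestamp, service in events:
        counts[(window_start(timestamp, size), service)] += 1
    return dict(counts)


# Hypothetical events: (seconds since some epoch, emitting service).
EVENTS = [
    (0, "frontend"),
    (59, "frontend"),
    (60, "frontend"),
    (61, "backend"),
    (125, "backend"),
]

if __name__ == "__main__":
    for key, count in sorted(count_by_window(EVENTS).items()):
        print(key, count)
```

A streaming system also has to decide when a window is complete and how to treat late data, which this sketch ignores.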
Check out the
Processing Logs at Scale using Cloud Dataflow
solution to learn how to combine logging, storage, processing and persistence into a scalable log processing approach. Then take a look at the
reference implementation tutorial
on GitHub to deploy a complete end-to-end working example. Feedback is welcome and appreciated; comment here, submit a pull request, create an issue, or find me on Twitter
@crcsmnky
and let me know how I can help.
-
Posted by Sandeep Parikh, Google Solutions Architect