Google Cloud Platform Blog
What Lipstick can reveal about your Hadoop pipeline
Monday, March 10, 2014
It would be a strong statement to say that lipstick can change your life, but when it comes to supporting your data analysts who use Apache Pig, we think that Netflix Lipstick can make a big difference.
For some of us, when we see sights like this:
we get a little rush of adrenaline and know we are in the zone to debug, analyze, and optimize.
But for most, a little graphical visualization is in order. Lipstick makes it easier for all of us to understand our Pig data flows, recognize what's inefficient, and fix what is just plain incorrect.
Get a high-level look at the sequence of Hadoop jobs that your Pig script launches. Watch the flow of data in real time:
Pop up a sample of the output from one stage of the pipeline:
With these views, data analysts can quickly spot mistakes and inefficiencies in their Pig jobs. A common finding is that rows or columns are filtered out much later in the pipeline than necessary. Eliminating those data elements earlier produces more efficient Pig jobs, which reduces both time to completion and cost.
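To make the cost of late filtering concrete, here is a minimal sketch in Python. It is a toy model, not how Pig or Lipstick work internally: the stages `enrich`, `keep_us`, and `project` and the helper `run_pipeline` are hypothetical names, each stage is a plain function over a list of rows, and "rows read" is only a rough stand-in for the data a Hadoop job has to read and shuffle. Moving the filter to the front yields the same output while processing fewer rows.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical toy model: a "row" is a dict and a "stage" maps rows to rows.
Row = Dict[str, object]
Stage = Callable[[List[Row]], List[Row]]


def enrich(rows: List[Row]) -> List[Row]:
    """Stand-in for an expensive per-row computation: add a derived column."""
    return [{**r, "score": len(str(r["name"]))} for r in rows]


def keep_us(rows: List[Row]) -> List[Row]:
    """Filter: keep only rows whose country is "US"."""
    return [r for r in rows if r["country"] == "US"]


def project(rows: List[Row]) -> List[Row]:
    """Projection: keep only the columns the final output needs."""
    return [{"name": r["name"], "score": r["score"]} for r in rows]


def run_pipeline(rows: List[Row], stages: List[Stage]) -> Tuple[List[Row], int]:
    """Apply stages in order; return the output and the total rows read by all stages."""
    rows_read = 0
    for stage in stages:
        rows_read += len(rows)  # rows this stage must read
        rows = stage(rows)
    return rows, rows_read


if __name__ == "__main__":
    data = [
        {"name": f"user{i}", "country": "US" if i % 10 == 0 else "CA"}
        for i in range(1000)
    ]
    late_out, late_read = run_pipeline(data, [enrich, keep_us, project])
    early_out, early_read = run_pipeline(data, [keep_us, enrich, project])
    assert late_out == early_out  # same answer either way
    print("late filter rows read:", late_read)    # 2100
    print("early filter rows read:", early_read)  # 1200
```

With 1,000 input rows of which 100 pass the filter, the late-filter ordering reads 2,100 rows in total and the early-filter ordering reads 1,200, with identical output. In a real Pig script, the analogous change is moving a FILTER (or a FOREACH ... GENERATE projection) ahead of expensive operators such as JOIN or GROUP.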
I first saw Lipstick at a Netflix OSS Meetup and thought it was a great tool for boosting the productivity of data analysts and software engineers.
If you are running Pig jobs on Google Compute Engine, we've got instructions to help you run Netflix Lipstick on Google Compute Engine. If you are not yet using Hadoop on Google Compute Engine, we have resources to help you get started.
-Posted by Matt Bookman, Solutions Architect