Google Cloud Platform Blog
Product updates, customer stories, and tips and tricks on Google Cloud Platform
We think Germany will win. But don’t take our word for it...
Friday, July 11, 2014
We’ve had a great time giving you our predictions for the World Cup (check out our posts before the quarter-finals and semi-finals). So far, we’ve gotten 13 of 14 games correct. But this isn't about us picking winners in World Cup soccer; it’s about what you can do with Google Cloud Platform. Now we’re open-sourcing our prediction model and packaging it up so you can do your own analysis and predictions.
We used Google Cloud Dataflow to ingest raw, touch-by-touch gameplay data from Opta for thousands of soccer matches. The data includes World Cup matches going back to 2006, three seasons of the English Barclays Premier League, two seasons of Spanish La Liga, and two seasons of U.S. MLS. We then refined the raw data into predictive statistics using Google BigQuery.
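To give a feel for the kind of aggregation involved, here is a minimal sketch in plain Python rather than BigQuery SQL. It assumes a hypothetical, simplified event record (`match_id`, `team`, `kind`, `x`, where `x` is the position along the pitch from 0 to 100 in the team's direction of attack); Opta's real feed format is different and much richer, and this is not the query we actually ran.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Touch:
    """One hypothetical touch-by-touch event (not Opta's real schema)."""
    match_id: str
    team: str
    kind: str   # e.g. "pass", "shot", "goal"
    x: float    # 0-100 along the pitch, in the team's direction of attack


def aggregate(touches):
    """Roll raw touches up into per-(match, team) statistics."""
    stats = defaultdict(lambda: {"attacking_passes": 0, "shots": 0, "goals": 0})
    for t in touches:
        row = stats[(t.match_id, t.team)]
        if t.kind == "pass" and t.x >= 50:
            # Count only passes made in the attacking half of the field.
            row["attacking_passes"] += 1
        elif t.kind == "shot":
            row["shots"] += 1
        elif t.kind == "goal":
            # In this simplified schema, a goal also counts as a shot.
            row["shots"] += 1
            row["goals"] += 1
    return dict(stats)


# Hypothetical events for one match.
events = [
    Touch("m1", "GER", "pass", 62.0),
    Touch("m1", "GER", "pass", 30.0),
    Touch("m1", "GER", "goal", 94.0),
    Touch("m1", "ARG", "shot", 88.0),
]
print(aggregate(events))
```

The same group-by-and-count shape is what a SQL `GROUP BY match_id, team` with conditional sums would express at scale.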
You can see BigQuery engineer Jordan Tigani (+JordanTigani) and developer advocate Felipe Hoffa (@felipehoffa) talk about how we did it in this video from Google I/O.
Our prediction for the final
It’s a narrow call, but Germany has the edge: our model gives them a 55% chance of defeating Argentina, based on a number of factors. So far in the tournament, Germany has had better passing in the attacking half of the field, more shots (64 vs. 61), and more goals (17 vs. 8).
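One common way to turn comparative statistics like these into a single win probability is a logistic model over the difference between the two teams' features. The sketch below assumes that form. The weights are hypothetical placeholders, not our model's coefficients, so it will not reproduce the 55% figure; only the shot and goal totals come from this post, and a passing feature would be added the same way.

```python
import math

# Hypothetical weights per feature difference (team A minus team B).
# Placeholders for illustration only, not fitted values.
WEIGHTS = {"shots": 0.02, "goals": 0.05}


def win_probability(team_a, team_b, weights=WEIGHTS):
    """Probability that team_a beats team_b under a logistic model.

    With no intercept the model is symmetric:
    win_probability(a, b) + win_probability(b, a) == 1.
    """
    z = sum(w * (team_a[k] - team_b[k]) for k, w in weights.items())
    return 1.0 / (1.0 + math.exp(-z))


# Tournament totals quoted in the post.
germany = {"shots": 64, "goals": 17}
argentina = {"shots": 61, "goals": 8}
print(f"P(Germany beats Argentina) = {win_probability(germany, argentina):.2f}")
```

Modeling the difference rather than each team separately keeps the prediction consistent: swapping the teams always yields the complementary probability.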
But 55% is only a small edge. And although we've been trumpeting our 13-of-14 record, picking winners isn't exactly the same as predicting outcomes. If you'd asked us which scenario was more likely, a 7-1 win for Germany over Brazil or a 1-0 win for Brazil over Germany, we wouldn't have gotten that one quite right.
(Oh, and we think Brazil has a slight advantage in the third-place game. They may have suffered a disappointing defeat on Tuesday, but the numbers still look good.)
But don’t take our word for it...
Now it’s your turn to take a stab at predicting. We have provided an IPython notebook that shows exactly how we built our model and used it to predict matches. Because we had to aggregate the data we used, you can't compute additional statistics from the raw data. However, if you're a real data geek, you could see how well neural networks predict the same data, or try advanced techniques like principal component analysis. You could also add your own features, such as player salaries or team travel distance. We've only scratched the surface, and there are lots of other approaches you can take.
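As one example of a new feature, team travel distance could be computed as a great-circle distance with the haversine formula. This is a minimal sketch under stated assumptions: the feature definition (how much farther team A travels to the venue than team B) and the `travel_difference_km` helper are hypothetical, and the coordinates in the usage lines are approximate city locations chosen for illustration, not actual team base camps.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def travel_distance_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, using the haversine formula."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def travel_difference_km(base_a, base_b, venue):
    """Hypothetical feature: how much farther team A travels than team B.

    Each argument is a (latitude, longitude) pair in degrees.
    """
    return travel_distance_km(*base_a, *venue) - travel_distance_km(*base_b, *venue)


# Approximate coordinates, for illustration only.
sao_paulo = (-23.55, -46.63)
belo_horizonte = (-19.92, -43.94)
rio_de_janeiro = (-22.91, -43.17)
print(round(travel_difference_km(sao_paulo, belo_horizonte, rio_de_janeiro), 1))
```

Expressed as a difference, the new column slots directly into a model built on team-A-minus-team-B features.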
You might also try simulating how the USA would have done if they had beaten Belgium, or how Germany in 2014 would fare against the unstoppable Spanish team of 2010. Or you could figure out whether the USA team is getting better by simulating the 2006 team against the 2010 and 2014 teams.
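Simulating an alternate bracket comes down to chaining match probabilities. Assuming each knockout match is independent and that a team's win probability is its chance of advancing (simplifications, not features of our model), the chance of surviving a run is the product of the per-match probabilities. The sketch computes that exactly and checks it with a seeded Monte Carlo simulation; the probabilities in `usa_path` are hypothetical placeholders, not model outputs.

```python
import math
import random


def survive_probability(match_probs):
    """Exact chance of winning every match, assuming independent matches."""
    return math.prod(match_probs)


def simulate_runs(match_probs, trials=100_000, seed=0):
    """Monte Carlo estimate of the same quantity, with a fixed seed."""
    rng = random.Random(seed)
    wins = sum(
        all(rng.random() < p for p in match_probs)
        for _ in range(trials)
    )
    return wins / trials


# Hypothetical per-match probabilities for a USA run after beating Belgium:
# quarter-final, semi-final, final. Placeholders only.
usa_path = [0.45, 0.35, 0.30]
print(survive_probability(usa_path))
print(simulate_runs(usa_path))
```

The exact product is enough for a single path; simulation becomes useful once opponents depend on earlier results, as in a full bracket.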
Here’s how you can do it
We’ve put everything on GitHub. You’ll find the IPython notebook containing all of the code (using pandas and statsmodels) to build the same machine learning models that we've used to predict the games so far. We've packaged it all up in a Docker container so that you can run your own Google Compute Engine instance to crunch the data. For the most up-to-date step-by-step instructions, check out the readme on GitHub.
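The notebook relies on pandas and statsmodels. If you just want to see the mechanics of fitting a logistic model without those dependencies, the sketch below uses plain batch gradient descent on the mean log-loss. It is a swapped-in technique for illustration, not the notebook's code, and the training rows are toy values (shot and goal differences per match) rather than real match data.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(w, row):
    """Probability that team A wins, given a feature-difference row."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, row)))


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit weights (no intercept) by batch gradient descent on mean log-loss.

    X: feature-difference rows (team A minus team B).
    y: 1 if team A won, else 0.
    """
    n_features = len(X[0])
    w = [0.0] * n_features
    for _ in range(epochs):
        grad = [0.0] * n_features
        for row, label in zip(X, y):
            err = predict(w, row) - label
            for j, xj in enumerate(row):
                grad[j] += err * xj
        w = [wi - lr * g / len(X) for wi, g in zip(w, grad)]
    return w


# Toy rows: [shot difference, goal difference]; label 1 means team A won.
X = [[5, 2], [3, 1], [1, 0], [-4, -1], [-2, -2], [-1, 1], [2, 0]]
y = [1, 1, 1, 0, 0, 0, 0]
w = fit_logistic(X, y)
print(w, predict(w, [3, 1]))
```

A library such as statsmodels adds what this sketch lacks: standard errors, convergence checks, and optional regularization.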
-Posted by Benjamin Bechtolsheim, Product Marketing Manager