Google Cloud Platform Blog
Product updates, customer stories, and tips and tricks on Google Cloud Platform
Google Compute Engine helps Mendelics diagnose genetic diseases
Monday, December 2, 2013
Today’s guest blog comes from Rodrigo Borges, senior bioinformatician at Mendelics, a molecular diagnostics company based in São Paulo, Brazil. The company diagnoses genetic diseases by sequencing the human genome to identify mutations that cause inherited diseases. While the company currently has fewer than 20 employees, it tackles a multi-billion dollar problem by helping the more than 100,000 children born with genetic diseases in Brazil.
Next-generation sequencing, the reading of DNA and storing of that information as a digital file, generates a huge amount of data: 25 GB per exome and 150 GB per genome. These files contain hundreds of millions of short sequences, along with information such as where each read came from in the patient’s genome, or sequences present in the patient that aren’t found in healthy individuals. These short sequences then need to be assembled like a jigsaw puzzle and genotyped to determine all of the differences between the patient’s DNA sequence and a reference sequence. After that, we interpret the list of genetic variants or mutations, a process that typically uses a cluster for up to seven days for a single patient.
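To give a feel for how quickly these files add up, here is a minimal sketch that turns the per-sample sizes quoted above into a batch estimate. The function name and the example batch are hypothetical, chosen only for illustration; only the 25 GB and 150 GB figures come from this post.

```python
# Per-sample sizes quoted in the post, in gigabytes.
GB_PER_EXOME = 25
GB_PER_GENOME = 150


def estimate_storage_gb(exomes: int, genomes: int) -> int:
    """Return the approximate raw data volume, in GB, for a batch of samples."""
    if exomes < 0 or genomes < 0:
        raise ValueError("sample counts must be non-negative")
    return exomes * GB_PER_EXOME + genomes * GB_PER_GENOME


if __name__ == "__main__":
    # Hypothetical batch: 40 exomes and 2 genomes.
    print(estimate_storage_gb(40, 2))  # 40 * 25 + 2 * 150 = 1300
```

Even a modest batch lands in the terabyte range, which is part of why fixed on-premises capacity is hard to size for a workload that varies from day to day.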
Our workload for processing DNA sequencing requests varies from day to day. Thus, the ability to rapidly scale with increased demand for processing power was the main reason we migrated to the cloud. We have a web-based app in Google App Engine which controls the workflow of samples at Mendelics. At the end of the process, physicians can easily search across samples, with almost all necessary information one click away. In this dynamic app, several filters are automatically applied so that the more significant variants among millions of possibilities emerge for physicians. Before this app, physicians had to manually examine spreadsheets with thousands of genetic mutations; they are now thrilled to do real-time analysis with Google Cloud Platform.
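As a rough illustration of what automatic filtering can look like, the sketch below keeps only rare variants with a strong predicted effect. It is not the actual Mendelics filter set, which this post does not describe: the `Variant` fields, the frequency threshold, and the impact labels are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Variant:
    gene: str
    population_frequency: float  # hypothetical field: fraction of people carrying it
    impact: str                  # hypothetical labels: "HIGH", "MODERATE", "LOW"


def filter_variants(
    variants: Iterable[Variant],
    max_frequency: float = 0.01,                    # assumed threshold
    impacts: Tuple[str, ...] = ("HIGH", "MODERATE"),  # assumed labels to keep
) -> List[Variant]:
    """Keep variants that are both rare and predicted to be impactful."""
    return [
        v for v in variants
        if v.population_frequency <= max_frequency and v.impact in impacts
    ]


if __name__ == "__main__":
    # Placeholder data, not real variants.
    sample = [
        Variant("GENE_A", 0.0005, "HIGH"),
        Variant("GENE_B", 0.2500, "HIGH"),      # too common
        Variant("GENE_C", 0.0010, "LOW"),       # low predicted impact
        Variant("GENE_D", 0.0100, "MODERATE"),
    ]
    for variant in filter_variants(sample):
        print(variant.gene)
```

Chaining a few cheap predicates like these is what turns millions of candidate variants into a short list a physician can review.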
We moved to Google Compute Engine for better integration with App Engine, which we were using for our web-based workflow. We also use Google Cloud Storage for our bioinformatics pipeline, as well as Google BigQuery for extremely fast and flexible processing and interpretation of DNA variants. The migration to Compute Engine was straightforward and took only one month.
We’re currently using Compute Engine for our development and analysis processing. Our app in App Engine starts our pipeline, creating one instance for each test. The code for the processing pipeline is on a persistent disk attached to all instances running a test. Each instance lives only while its test is being processed, which lets us take advantage of Google’s per-minute pricing.
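The benefit of per-minute pricing for short-lived instances can be made concrete with a small calculation. This is a sketch under stated assumptions: the hourly price is a made-up number, and the ten-minute minimum charge is an assumption about the billing rules, exposed as a parameter so it can be changed.

```python
import math


def billed_minutes(runtime_seconds: float, minimum_minutes: int = 10) -> int:
    """Round runtime up to whole minutes, applying an assumed minimum charge."""
    if runtime_seconds < 0:
        raise ValueError("runtime_seconds must be non-negative")
    return max(minimum_minutes, math.ceil(runtime_seconds / 60))


def instance_cost(
    runtime_seconds: float,
    price_per_hour: float,       # hypothetical rate, not a published price
    minimum_minutes: int = 10,   # assumed minimum charge
) -> float:
    """Cost of one instance that exists only while its test is processed."""
    return billed_minutes(runtime_seconds, minimum_minutes) * price_per_hour / 60


if __name__ == "__main__":
    # A test that runs for 3 hours and 30 seconds at a hypothetical 0.60 per hour.
    runtime = 3 * 3600 + 30
    print(billed_minutes(runtime))                 # 181 billed minutes
    print(round(instance_cost(runtime, 0.60), 2))  # 1.81
```

Under hourly billing the same run would be charged as four full hours, so deleting each instance as soon as its test finishes is what makes per-minute billing pay off.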
The pipeline and App Engine exchange test status information during processing, and when the process is done, the results are uploaded to Cloud Storage so that App Engine can process and deliver them to the physicians. Finally, the instance is terminated automatically.
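The per-test lifecycle described above (create an instance, report status, run the pipeline, upload results, remove the instance) can be sketched as a single function. This is not our production code: every name here is hypothetical, and the cloud operations are passed in as plain callables rather than real Compute Engine or Cloud Storage calls, so the control flow can be read and tested on its own.

```python
from typing import Callable, Dict, List


def run_test(
    test_id: str,
    create_instance: Callable[[str], str],
    run_pipeline: Callable[[str, str], Dict[str, str]],
    report_status: Callable[[str, str], None],
    upload_results: Callable[[str, Dict[str, str]], None],
    delete_instance: Callable[[str], None],
) -> None:
    """Process one test on its own instance, then always remove the instance."""
    instance = create_instance(test_id)
    try:
        report_status(test_id, "RUNNING")
        results = run_pipeline(instance, test_id)
        upload_results(test_id, results)
        report_status(test_id, "DONE")
    except Exception:
        report_status(test_id, "FAILED")
        raise
    finally:
        # Runs on success and on failure, so no instance is left billing.
        delete_instance(instance)


def demo() -> List[str]:
    """Run the lifecycle against in-memory stand-ins and return the event log."""
    events: List[str] = []

    def create_instance(test_id: str) -> str:
        events.append("create")
        return f"instance-{test_id}"

    def run_pipeline(instance: str, test_id: str) -> Dict[str, str]:
        events.append("run")
        return {"variants": "placeholder"}

    def report_status(test_id: str, status: str) -> None:
        events.append(f"status:{status}")

    def upload_results(test_id: str, results: Dict[str, str]) -> None:
        events.append("upload")

    def delete_instance(instance: str) -> None:
        events.append("delete")

    run_test("test-001", create_instance, run_pipeline,
             report_status, upload_results, delete_instance)
    return events


if __name__ == "__main__":
    print(demo())
```

Putting the deletion in a `finally` block is the design choice worth noting: an instance that outlives a failed test would keep accruing charges without doing any work.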
We find that Compute Engine scales quickly, allowing us to easily meet the flow of new sequencing requests. In addition to scalability and integration with App Engine, it is simple to use, requires low maintenance and has high availability. Compute Engine also has great security management, custom metadata and friendly APIs.
Compute Engine has helped us scale with our demands and has been a key component in helping our physicians diagnose and cure genetic diseases in Brazil and around the world.
-Contributed by Rodrigo Borges, Senior Bioinformatician, Mendelics