Google Cloud Platform Blog
Product updates, customer stories, and tips and tricks on Google Cloud Platform
Leverage Google Cloud Platform features for your Hadoop deployments
Thursday, February 27, 2014
Accenture recently demonstrated Google Compute Engine as an excellent place to run Hadoop for performance and price. They showed both familiar HDFS-based deployments and the performance and cost advantages of using the Google Cloud Storage connector.
If you use Apache Hive with Hadoop, we want to show you another great feature of the Google Cloud Platform that will save you time and money. The default Hive installation uses a local single-session database as the Hive Metastore to store Hive table and column names, column types, and so on. This local database is fine for single-user/single-session usage, but to use Hive with multiple concurrent connections, you need to install and maintain a relational database such as MySQL.
Google Cloud SQL is managed MySQL in the cloud, which now supports the MySQL wire protocol, so you can connect to Google Cloud SQL using common tools from anywhere. Why not use it to run your MySQL database in the cloud too, and save yourself the time and cost of having to install and maintain MySQL? Let Google take care of it for you.
We provide a sample application to help you get started running Apache Hive and Pig on Google Compute Engine. The README includes detailed instructions for using Cloud SQL for the Hive Metastore.
The rest of this blog post hits the high points for what you need to do to use Google Cloud SQL for the Hive Metastore.
Create a Google Cloud SQL instance
In the Google Cloud Console, create a Google Cloud SQL instance. Make sure to:
Choose the Compute Engine zone of the Compute Engine instance running Hive as the preferred location for the Cloud SQL instance.
Select "Assign an IP Address" for the Cloud SQL instance.
Add the external IP address of the Compute Engine instance as an Authorized IP Address. Be sure to append "/32" to the IP address entered in this field.
Define a root password for MySQL (it's good practice).
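The "/32" suffix in the Authorized IP Address step is CIDR notation for a network containing exactly one address. As a small illustration, the Python sketch below validates an address and produces the value to enter in that field. The function name and the sample address (from a documentation-only range) are hypothetical, and the sketch assumes the Compute Engine instance has an IPv4 external address.

```python
import ipaddress


def authorized_network(external_ip: str) -> str:
    """Return the single-host CIDR form of an IPv4 address.

    Raises ValueError if external_ip is not a valid IPv4 address.
    """
    address = ipaddress.IPv4Address(external_ip)
    # A /32 prefix matches exactly one IPv4 address.
    network = ipaddress.ip_network(f"{address}/32")
    return str(network)


if __name__ == "__main__":
    # 203.0.113.7 is a placeholder; use your instance's external IP.
    print(authorized_network("203.0.113.7"))
```

Validating the address first catches typos before they reach the Cloud SQL access list, where a wrong entry would simply block the connection.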
Install MySQL client and MySQL JDBC Driver
Connect to the Google Compute Engine instance that is running Hive and install both the MySQL client and the MySQL JDBC driver. Add the JDBC driver JAR file to Hive's CLASSPATH. The simplest method is to copy the file to the hive/lib/ directory.
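The copy step can be scripted. The following Python sketch copies a driver JAR into the lib directory under a Hive installation. The function name and both paths are hypothetical; the actual JAR file name and Hive location depend on how the driver and Hive were installed on your instance.

```python
import shutil
from pathlib import Path


def install_jdbc_driver(jar_path, hive_home):
    """Copy the JDBC driver JAR into <hive_home>/lib and return the new path."""
    lib_dir = Path(hive_home) / "lib"
    if not lib_dir.is_dir():
        # Fail early rather than creating a lib directory Hive will not read.
        raise FileNotFoundError(f"Hive lib directory not found: {lib_dir}")
    # copy2 preserves file metadata and returns the destination file path.
    return Path(shutil.copy2(jar_path, lib_dir))
```

For example, `install_jdbc_driver("/path/to/driver.jar", "/path/to/hive")` would place the JAR where Hive picks it up at startup, assuming those placeholder paths are replaced with real ones.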
Create the hivemeta database and the user to access it
Use the mysql command line tool to:
Connect to the Google Cloud SQL instance.
Create a database called hivemeta (Hive requires the database to use latin1 character encoding, which you'll need to specify explicitly).
Create the database user (for example hdpuser).
Issue grants on the hivemeta database to the database user.
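The four steps above can be sketched as a mysql client invocation plus three SQL statements. The Python below only builds those strings so you can review them before running anything. The helper names are hypothetical, and granting ALL PRIVILEGES to the user from any host ('%') is an assumption; the post does not say which grants or host pattern to use, so tighten both to suit your setup.

```python
def mysql_client_command(cloud_sql_ip, user="root"):
    """Build the mysql client argument list; --password makes mysql prompt."""
    return ["mysql", f"--host={cloud_sql_ip}", f"--user={user}", "--password"]


def hivemeta_setup_statements(user="hdpuser", password="password",
                              database="hivemeta", host="%"):
    """Return SQL that creates the metastore database, user, and grants.

    Values are interpolated directly into the SQL text, so quotes and
    backslashes in the password are rejected instead of being escaped.
    """
    if "'" in password or "\\" in password:
        raise ValueError("password must not contain quotes or backslashes")
    return [
        # Hive needs latin1 here, so it is set explicitly.
        f"CREATE DATABASE {database} CHARACTER SET latin1;",
        f"CREATE USER '{user}'@'{host}' IDENTIFIED BY '{password}';",
        # Assumption: full privileges on the metastore database only.
        f"GRANT ALL PRIVILEGES ON {database}.* TO '{user}'@'{host}';",
    ]


if __name__ == "__main__":
    # 198.51.100.10 is a placeholder for the Cloud SQL instance's IP address.
    print(" ".join(mysql_client_command("198.51.100.10")))
    for statement in hivemeta_setup_statements():
        print(statement)
```

Run the printed command on the Compute Engine instance, enter the root password when prompted, and then paste the statements at the mysql prompt.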
Configure Hive to use Cloud SQL
Update your hive/conf/hive-site.xml file to connect to the Google Cloud SQL database. Add the metastore's JDBC connection settings (connection URL, driver name, user name, and password), replacing cloud-sql-ip-addr with the assigned IP address of the Cloud SQL instance, and replacing password with the hdpuser database user password.
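As a hedged sketch of what that configuration can look like, the Python below renders the four standard Hive metastore connection properties (the `javax.jdo.option.*` settings) as a hive-site.xml document. The helper names are hypothetical, and the driver class shown, `com.mysql.jdbc.Driver`, is the one used by older MySQL Connector/J releases; newer driver versions may expect a different class name, so check the driver you installed.

```python
import xml.etree.ElementTree as ET


def metastore_properties(cloud_sql_ip, password,
                         user="hdpuser", database="hivemeta"):
    """Return the JDBC connection properties for a MySQL-backed metastore."""
    return {
        "javax.jdo.option.ConnectionURL":
            f"jdbc:mysql://{cloud_sql_ip}/{database}",
        # Assumption: an older Connector/J release; verify for your driver.
        "javax.jdo.option.ConnectionDriverName": "com.mysql.jdbc.Driver",
        "javax.jdo.option.ConnectionUserName": user,
        "javax.jdo.option.ConnectionPassword": password,
    }


def render_hive_site(properties):
    """Render properties in the <configuration>/<property> layout Hive reads."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    if hasattr(ET, "indent"):  # pretty-printing is available on Python 3.9+
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Replace the placeholders with your Cloud SQL IP and hdpuser password.
    print(render_hive_site(metastore_properties("cloud-sql-ip-addr", "password")))
```

The output is a complete `<configuration>` element. If your hive-site.xml already contains other settings, copy only the four `<property>` entries into it instead of overwriting the file.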
Hive is now configured to use the Google Cloud SQL database as its metastore; there's no need for you to maintain a MySQL installation.
Other software that uses MySQL can similarly use Google Cloud SQL. Since Cloud SQL is MySQL, you won't need to migrate code - just establish connectivity and you are off and running.
-Posted by Matt Bookman, Cloud Solutions Architect