How to Run Hive Scripts?

Apache Hive, a data warehousing package built on top of Hadoop, is increasingly used for data analysis, data mining, and predictive modeling. Organizations are looking for professionals with strong Hive and Hadoop skills. In this post, let's look at how to run Hive scripts. In general, scripts are used to execute a set of statements at once, and Hive scripts work in much the same way: they reduce the time and effort spent writing and executing each command manually.

Apache Hive is an integral part of the Hadoop ecosystem. Hive can be described as data warehouse-like software that facilitates querying and managing large datasets on HDFS (Hadoop Distributed File System). Keep in mind that Hive is not a conventional data warehouse or relational database; rather, it provides a mechanism to manage data in a distributed environment and query it using an SQL-like language called Hive Query Language, or HiveQL. A Hive script is a group of Hive commands bundled into one file so that they can be run together instead of being typed one at a time.

Introduction

Hadoop Distributed File System, or HDFS, provides scalable, fault-tolerant data storage. Hive provides a simple SQL-like query language, HiveQL. HiveQL also allows traditional MapReduce developers to plug in their custom mappers and reducers for more sophisticated analysis.

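Limitations of Hive

Latency for Hive queries is usually high because of the substantial overhead in job submission and scheduling. Hive is not designed for real-time queries, and row-level updates are not available on ordinary tables (newer versions support them only on transactional tables). It is best suited to batch workloads such as log analysis.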

How to write and run a Hive script:

Step 1: Writing a Hive script.

A Hive script is a plain text file, conventionally saved with a .sql (or .hql) extension. Open a terminal and give the following command to create a Hive script. Command: gedit sample.sql (prefix it with sudo only if the target directory is not writable by your user)

Executing this command opens the file in gedit.

1. Creating the Data to Store in the Table:

Before loading data into the table, we need an input file that contains the records to be inserted.
Let us create the input file from the terminal.
Command: gedit input.txt
Add the records you want to store in the table, one record per line, with the fields separated by commas.
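
As an alternative to typing the records in an editor, the input file can be created directly from the terminal. The sketch below assumes three made-up product records (the values are purely illustrative) with the fields in the order productid, productname, price, category:

```shell
# Create input.txt with three hypothetical, comma-separated product records.
# Assumed field order: productid,productname,price,category
cat > input.txt <<'EOF'
1,Laptop,55000.00,Electronics
2,Notebook,45.50,Stationery
3,Headphones,1500.00,Electronics
EOF

# Print the number of records written (one record per line).
wc -l < input.txt
```

Each line becomes one row of the table, and each comma marks a column boundary.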

2. Creating the Table in Hive:

Command: create table product (productid int, productname string, price float, category string) row format delimited fields terminated by ',';
Here, product is the table name and productid, productname, price, and category are the columns of this table. Each column name is followed by its data type, with no colon in between.
The clause fields terminated by ',' indicates that the columns in the input file are separated by a comma. By default, the records in the input file are separated by a new line.

3. Describing the Table:

Command: describe product;

4. Retrieving the Data:

To retrieve the data, the select command is used.
Command: select * from product;
This command retrieves the values of all the columns in the table. Note that it returns rows only once the input file has been loaded into the table, for example with a load data local inpath statement.
We are now done writing the Hive script, and the file sample.sql can be saved.
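
Putting the statements above together, the finished script might look like the following sketch, written to sample.sql from the terminal. The load data statement and its path /home/input.txt are assumptions added for illustration; adjust the path to wherever your input file actually lives.

```shell
# Write the Hive script to sample.sql. Quoting 'EOF' stops the shell
# from expanding anything inside the HiveQL text.
cat > sample.sql <<'EOF'
-- Create the table; fields in the input file are comma-separated.
create table product (
  productid int,
  productname string,
  price float,
  category string
)
row format delimited fields terminated by ',';

-- Assumed step: load the input file (hypothetical path) into the table.
load data local inpath '/home/input.txt' into table product;

-- Show the table structure.
describe product;

-- Retrieve all rows and columns.
select * from product;
EOF
```

Every statement ends with a semicolon so that Hive can tell where one command stops and the next begins.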

Step 2: Running the Hive Script

The following is the command to run the Hive script:

Command: hive -f /home/sample.sql
When executing the script, make sure you provide the full path to the script file.
This is how Hive scripts are run and executed.
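
Since a wrong path is the most common reason for the command to fail, a small wrapper can check the path before calling Hive. This is only a sketch: run_hive_script is a hypothetical helper name, not part of Hive, and it assumes the hive command-line client is installed and on your PATH.

```shell
# run_hive_script: hypothetical helper, not part of Hive itself.
# It checks that the script file exists and that the hive CLI is available
# before running "hive -f" on the given path.
run_hive_script() {
  script_path="$1"
  if [ ! -f "$script_path" ]; then
    echo "Script not found: $script_path" >&2
    return 1
  fi
  if ! command -v hive >/dev/null 2>&1; then
    echo "hive command not found on PATH" >&2
    return 2
  fi
  hive -f "$script_path"
}

# Example call, using the path from this post:
# run_hive_script /home/sample.sql
```

The function returns a non-zero status when the file is missing or Hive is unavailable, so it can be used safely inside larger shell scripts.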

Hive is a critical component of the Hadoop ecosystem, and expertise in Hive can help you land top-paying Hadoop jobs. Mildaintrainings has a specially curated Big Data Analytics course that helps you master concepts such as MapReduce, YARN, Pig, Hive, HBase, Oozie, Flume, and Sqoop.