Monday, September 26, 2022

Hive - Q&A - Part 3

 

Hive Interview Questions


26. How do you add a partition to an existing table that was not created as a partitioned table?

Ans. You cannot add a partition to an existing table that was not partitioned when it was created.

Partitions can be added only if the table was declared with a “PARTITIONED BY” clause at creation time. For such a table, the ALTER TABLE command lets you add a partition.

So, here are the create and alter commands:

  • CREATE TABLE tab02 (foo INT, bar STRING) PARTITIONED BY (mon STRING);
  • ALTER TABLE tab02 ADD PARTITION (mon = '10') LOCATION '/home/hdadmin/hive-0.13.1-cdh5.3.2/examples/files';

Note that LOCATION must point to a directory that holds the partition's data files, not to a single file.
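For a table that was created without partitions, a common workaround is to create a new partitioned table and copy the data into it. The following is a minimal sketch of that approach, assuming a hypothetical unpartitioned table tab01 that stores the month in an ordinary column; all table and column names here are illustrative.

```sql
-- Hypothetical unpartitioned source table (mon is an ordinary column here)
CREATE TABLE tab01 (foo INT, bar STRING, mon STRING);

-- New table with the same data columns, partitioned by mon
CREATE TABLE tab01_part (foo INT, bar STRING) PARTITIONED BY (mon STRING);

-- Allow Hive to create partitions from the values found in the data
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;

-- Copy the rows; the partition column must come last in the SELECT list
INSERT OVERWRITE TABLE tab01_part PARTITION (mon)
SELECT foo, bar, mon FROM tab01;

-- Check which partitions were created
SHOW PARTITIONS tab01_part;
```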

27. Explain clustering in Hive.

Ans. Clustering in Hive decomposes table data sets into more manageable parts.
More specifically, a table can be divided into a number of partitions, and these partitions can be further subdivided into more manageable parts known as buckets or clusters.

The “CLUSTERED BY” clause is used to divide the table into buckets. Each row is assigned to a bucket by hashing the clustering column and taking the result modulo the number of buckets.
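As a rough illustration of the clause described above, the following HiveQL sketch creates a table that is both partitioned and bucketed. The table name, column names, and bucket count are hypothetical and chosen only for this example.

```sql
-- Hypothetical table: partitioned by month, then bucketed by foo within each partition
CREATE TABLE tab_bucketed (foo INT, bar STRING)
PARTITIONED BY (mon STRING)
CLUSTERED BY (foo) INTO 4 BUCKETS;
```

Within a partition, rows with the same foo value always land in the same bucket, which is what makes bucketed tables useful for sampling and some join optimizations.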

28. How do you write a UDF in Hive?

Ans. The steps are as follows:

  1. Create a Java class for the user-defined function that extends org.apache.hadoop.hive.ql.exec.UDF and implements one or more evaluate() methods. Put your desired logic in these methods.
  2. Package your Java class into a JAR file.
  3. In the Hive CLI, add your JAR and verify that it is on the Hive CLI classpath.
  4. Run CREATE TEMPORARY FUNCTION in Hive, pointing it to your Java class.
  5. Use the function in Hive SQL.
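The Hive-side steps (3 to 5) might look like the following sketch. The JAR path, the function name my_upper, and the class name com.example.hive.MyUpper are placeholders assumed for illustration; substitute the JAR and class you built in steps 1 and 2.

```sql
-- Step 3: register the JAR with the current session (placeholder path)
ADD JAR /path/to/my-udfs.jar;

-- Verify that the JAR is listed among the session resources
LIST JARS;

-- Step 4: bind a function name to the Java class (hypothetical class name)
CREATE TEMPORARY FUNCTION my_upper AS 'com.example.hive.MyUpper';

-- Step 5: call it like a built-in function
SELECT my_upper(bar) FROM tab02;
```

A temporary function exists only for the current session, so the ADD JAR and CREATE TEMPORARY FUNCTION statements need to be repeated in each new session.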

29. What are different modes of metastore deployment in Hive?

Ans. Hive offers three modes of metastore deployment.

1. Embedded metastore

Here, both the metastore service and the Hive service run in the same JVM, using an embedded Derby database.

2. Local Metastore

Here, the Hive metastore service runs in the same process as the main Hive Server process, but the metastore database runs in a separate process.

3. Remote Metastore

Here, the metastore service runs in its own separate JVM, not in the Hive service JVM.

30. Give the command to see the indexes on a table.

Ans. SHOW INDEX ON table_name;
This lists all the indexes created on any of the columns of the table table_name. Note that index support was removed in Hive 3.0, so the command applies to earlier versions.
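To see the command in context, here is a small sketch that first creates an index and then lists it. It assumes a Hive version older than 3.0 and reuses the tab02 table from question 26; the index name idx_tab02_foo is hypothetical.

```sql
-- Hypothetical compact index on column foo of tab02 (Hive versions before 3.0)
CREATE INDEX idx_tab02_foo ON TABLE tab02 (foo)
AS 'COMPACT' WITH DEFERRED REBUILD;

-- List the indexes on the table; FORMATTED adds column headers to the output
SHOW FORMATTED INDEX ON tab02;
```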

31. How do you specify the table creator name when creating a table in Hive?

Ans. The TBLPROPERTIES clause is used to add the creator name while creating a table.
It is added like this:
TBLPROPERTIES('creator' = 'Joan')
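A complete statement using this clause could look like the sketch below. The table name tab03 and its columns are assumptions made for the example; 'creator' is simply a user-chosen key, not a reserved property.

```sql
-- Hypothetical table that records its creator as a table property
CREATE TABLE tab03 (foo INT, bar STRING)
TBLPROPERTIES ('creator' = 'Joan');

-- List all properties of the table
SHOW TBLPROPERTIES tab03;

-- Show only the value of the creator property
SHOW TBLPROPERTIES tab03('creator');
```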

32. Difference between Hive and Impala?

Ans. The following is a feature-wise comparison of Impala and Hive:

1. Query Process

  • Hive

In Hive, every query suffers from the common “cold start” problem.

  • Impala

Being a native query engine, Impala avoids the startup overheads that are commonly observed in MapReduce-based jobs. Moreover, the Impala daemon processes are started at boot time, so they are always ready to process a query.

2. Intermediate Results

  • Hive

Hive materializes all intermediate results. This enables better scalability and fault tolerance, but it also slows down data processing.

  • Impala

Impala, in contrast, streams intermediate results between executors, which trades off some scalability.

3. During the Runtime

  • Hive

Hive generates query expressions at compile time.

  • Impala

Impala generates code for “big loops” at runtime.

