Sunday, August 21, 2022

Apache Hive Series: Architecture



Hive Architecture:

As you know, Hive is a query engine that is built on top of Hadoop and uses the MapReduce model for processing. Before starting the optimization series, we need to understand its architecture.

The main components of Hive are as follows:

UI (User Interface): It is the interface between the user and Hive. It lets the user submit queries and other operations.

Client: It is the command line interface (CLI) through which we can submit Hive queries.

Server: It is the Hive server, built on Apache Thrift, which accepts requests from Hive clients and submits them to the Hive driver.

Driver: The driver receives queries from various sources such as Thrift, JDBC, ODBC, the UI and the clients. After receiving a query, it passes it to the compiler.

Compiler: It parses the query and generates an execution plan based on the table and partition information in the metastore.

Metastore: This is where Hive stores the metadata about its databases, such as the schema of each table, the data types of the columns, the location of the data in HDFS, etc.

Executor: It executes the execution plan generated by the compiler. It manages the dependencies between the stages of the plan's DAG and runs them on the cluster.
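To make the executor's dependency handling a bit more concrete, here is a minimal Python sketch (Python 3.9+ for `graphlib`). It is not Hive's actual code: the stage names and the `run_stages` helper are made up for illustration. It only shows the general idea of running each stage of a DAG after the stages it depends on.

```python
from graphlib import TopologicalSorter

# Hypothetical stage graph: each stage maps to the set of stages it depends on.
stage_dependencies = {
    "Stage-1": set(),        # no dependencies, can run first
    "Stage-2": {"Stage-1"},  # needs the output of Stage-1
    "Stage-0": {"Stage-2"},  # final stage, needs Stage-2
}


def run_stages(dependencies):
    """Return the stages in an order that respects their dependencies."""
    executed = []
    for stage in TopologicalSorter(dependencies).static_order():
        # A real executor would launch a job here; this sketch only records the order.
        executed.append(stage)
    return executed


order = run_stages(stage_dependencies)
print(order)  # ['Stage-1', 'Stage-2', 'Stage-0']
```

A topological sort is used here simply because it is the standard way to order the nodes of a DAG; how Hive schedules its stages internally may differ.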

Job Execution flow:

The user submits a query from the CLI/UI. Once the server passes the query on, the driver creates a session handle for it and sends it to the compiler. The compiler gets the necessary metadata from the metastore, builds an execution plan and returns that plan to the driver, which sends it to the executor. The executor runs the plan (via MapReduce) and returns the results to the driver. Finally, the driver returns the results to the UI.
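The flow above can be sketched as a toy Python program. Every class, method, table name and path below is hypothetical and only mirrors the roles described in this post; none of it is the Hive API, and the "plan" is just a list of strings.

```python
class Metastore:
    """Toy metastore: maps a table name to its schema and HDFS location."""

    def __init__(self):
        # Hypothetical table, made up for this example.
        self.tables = {
            "sales": {
                "columns": {"id": "int", "amount": "double"},
                "location": "/warehouse/sales",
            }
        }

    def get_table(self, name):
        return self.tables[name]


class Compiler:
    """Builds a (very simplified) execution plan using metastore information."""

    def __init__(self, metastore):
        self.metastore = metastore

    def compile(self, table_name):
        meta = self.metastore.get_table(table_name)
        return [f"scan {meta['location']}", "map", "reduce"]


class Executor:
    """Pretends to run each step of the plan and reports the outcome."""

    def execute(self, plan):
        return [f"done: {step}" for step in plan]


class Driver:
    """Receives the query, asks the compiler for a plan, hands it to the executor."""

    def __init__(self, compiler, executor):
        self.compiler = compiler
        self.executor = executor

    def run(self, table_name):
        plan = self.compiler.compile(table_name)  # compiler consults the metastore
        return self.executor.execute(plan)        # results go back to the caller


driver = Driver(Compiler(Metastore()), Executor())
results = driver.run("sales")
print(results)
```

The driver is the only component the caller talks to, which matches the description above: the plan and the results both pass through it.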



