Inside Hive (for beginners)
Takeshi NAKANO / Recruit Co. Ltd.
Why?
Hive is a good tool for non-specialists. The number of M/R jobs largely determines Hive's processing time.
↓
How can we reduce that number? What can we do about it when writing HiveQL?
↓
How does Hive convert HiveQL to M/R jobs? Which optimizations are applied along the way?
7/6/2011 HIVE - A warehouse solution over Map Reduce Framework
Don't you have..
Facebook's paper has a lot of information, but it is a little old.

Component Level Analysis

Hive Architecture / Exec Flow
Components: Client, Metastore, Driver, Compiler, Hadoop
Hive Workflow
Hive has operators, which are its minimum processing units. Each operator's work is carried out as an HDFS operation or as an M/R job. The compiler converts HiveQL into a set of operators.
Hive Workflow: Operators
Hive Workflow
For M/R processing, Hive uses ExecMapper and ExecReducer. There are two processing modes:
1. Local processing mode
2. Distributed processing mode

In mode 1 (local mode), Hive forks a process with the hadoop command. The plan.xml is written on one node only, and that single node processes it.
In mode 2 (distributed mode), Hive sends the job to the existing JobTracker. The plan information is stored in the DistributedCache and processed on multiple nodes.
Compiler: How to Process HiveQL

"Plumbing" of the Hive compiler
Compiler Overview
    HiveQL
      → Parser             → AST
      → Semantic Analyzer  → QB
      → Logical Plan Gen.  → Operator Tree
      → Logical Optimizer  → Operator Tree
      → Physical Plan Gen. → Task Tree
      → Physical Optimizer → Task Tree
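The stage sequence above can be read as a chain in which each stage consumes the representation produced by the previous one. The following is a minimal Python sketch of that chain, not Hive's Java code: only the stage names and representation names come from the slide, while `STAGES` and `trace` are hypothetical names introduced for illustration.

```python
# Toy model of the compiler pipeline. Each entry is
# (stage name, representation it consumes, representation it produces).
# Only the names are taken from the slide; nothing here calls Hive.
STAGES = [
    ("Parser", "HiveQL", "AST"),
    ("Semantic Analyzer", "AST", "QB"),
    ("Logical Plan Gen.", "QB", "Operator Tree"),
    ("Logical Optimizer", "Operator Tree", "Operator Tree"),
    ("Physical Plan Gen.", "Operator Tree", "Task Tree"),
    ("Physical Optimizer", "Task Tree", "Task Tree"),
]


def trace(start="HiveQL"):
    """Walk the stages in order, checking that each stage accepts the
    representation produced by the previous one."""
    current = start
    steps = []
    for name, consumes, produces in STAGES:
        if consumes != current:
            raise ValueError(f"{name} expects {consumes}, got {current}")
        steps.append(f"{current} -> [{name}] -> {produces}")
        current = produces
    return steps


if __name__ == "__main__":
    for step in trace():
        print(step)
```

Note that the two optimizer stages keep the representation unchanged: they rewrite a tree into another tree of the same kind.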
Parser: HiveQL → AST

HiveQL:
    INSERT OVERWRITE TABLE access_log_temp2
    SELECT a.user, a.prono, p.maker, p.price
    FROM access_log_hbase a JOIN product_hbase p ON (a.prono = p.prono);

AST (the subtrees marked 1, 2 and 3 are handled one by one by the Semantic Analyzer):
    TOK_QUERY
        + TOK_FROM                                  (1)
            + TOK_JOIN
                + TOK_TABREF
                    + TOK_TABNAME
                        + "access_log_hbase"
                    + "a"
                + TOK_TABREF
                    + TOK_TABNAME
                        + "product_hbase"
                    + "p"
                + "="
                    + "."
                        + TOK_TABLE_OR_COL
                            + "a"
                        + "prono"
                    + "."
                        + TOK_TABLE_OR_COL
                            + "p"
                        + "prono"
        + TOK_INSERT
            + TOK_DESTINATION                       (2)
                + TOK_TAB
                    + TOK_TABNAME
                        + "access_log_temp2"
            + TOK_SELECT                            (3)
                + TOK_SELEXPR
                    + "."
                        + TOK_TABLE_OR_COL
                            + "a"
                        + "user"
                + TOK_SELEXPR
                    + "."
                        + TOK_TABLE_OR_COL
                            + "a"
                        + "prono"
                + TOK_SELEXPR
                    + "."
                        + TOK_TABLE_OR_COL
                            + "p"
                        + "maker"
                + TOK_SELEXPR
                    + "."
                        + TOK_TABLE_OR_COL
                            + "p"
                        + "price"
Semantic Analyzer (1/3): AST → QB

AST part 1:
    + TOK_FROM
        + TOK_JOIN
            + TOK_TABREF
                + TOK_TABNAME
                    + "access_log_hbase"
                + "a"
            + TOK_TABREF
                + TOK_TABNAME
                    + "product_hbase"
                + "p"
            + "="
                + "."
                    + TOK_TABLE_OR_COL
                        + "a"
                    + "prono"
                + "."
                    + TOK_TABLE_OR_COL
                        + "p"
                    + "prono"

QB:
    MetaData
        Alias To Table Info
            "a" = Table Info("access_log_hbase")
            "p" = Table Info("product_hbase")
    ParseInfo
        Join Node
            + TOK_JOIN
                + TOK_TABREF
                    …
                + TOK_TABREF
                    …
                + "="
                    …

Semantic Analyzer (2/3): AST → QB

AST part 2:
    + TOK_DESTINATION
        + TOK_TAB
            + TOK_TABNAME
                + "access_log_temp2"

QB:
    ParseInfo
        Name To Destination Node
            + TOK_TAB
                + TOK_TABNAME
                    + "access_log_temp2"

Semantic Analyzer (3/3): AST → QB

AST part 3:
    + TOK_SELECT
        + TOK_SELEXPR
            + "."
                + TOK_TABLE_OR_COL
                    + "a"
                + "user"
        + TOK_SELEXPR
            + "."
                + TOK_TABLE_OR_COL
                    + "a"
                + "prono"
        + TOK_SELEXPR
            + "."
                + TOK_TABLE_OR_COL
                    + "p"
                + "maker"
        + TOK_SELEXPR
            + "."
                + TOK_TABLE_OR_COL
                    + "p"
                + "price"

QB:
    ParseInfo
        Name To Select Node
            + TOK_SELECT
                + TOK_SELEXPR
                    …
                + TOK_SELEXPR
                    …
                + TOK_SELEXPR
                    …
                + TOK_SELEXPR
                    …
Logical Plan Generator (1/4): QB → OP Tree

QB:
    MetaData
        Alias To Table Info
            "a" = Table Info("access_log_hbase")
            "p" = Table Info("product_hbase")

OP Tree:
    TableScanOperator("access_log_hbase")
    TableScanOperator("product_hbase")

Logical Plan Generator (2/4): QB → OP Tree

QB:
    ParseInfo
        Join Node
            + TOK_JOIN
                + TOK_TABREF
                    + TOK_TABNAME
                        + "access_log_hbase"
                    + "a"
                + TOK_TABREF
                    + TOK_TABNAME
                        + "product_hbase"
                    + "p"
                + "="
                    + "."
                        + TOK_TABLE_OR_COL
                            + "a"
                        + "prono"
                    + "."
                        + TOK_TABLE_OR_COL
                            + "p"
                        + "prono"

OP Tree:
    ReduceSinkOperator("access_log_hbase")
    ReduceSinkOperator("product_hbase")
    JoinOperator

Logical Plan Generator (3/4): QB → OP Tree

QB:
    ParseInfo
        Name To Select Node
            + TOK_SELECT
                + TOK_SELEXPR
                    + "."
                        + TOK_TABLE_OR_COL
                            + "a"
                        + "user"
                + TOK_SELEXPR
                    + "."
                        + TOK_TABLE_OR_COL
                            + "a"
                        + "prono"
                + TOK_SELEXPR
                    + "."
                        + TOK_TABLE_OR_COL
                            + "p"
                        + "maker"
                + TOK_SELEXPR
                    + "."
                        + TOK_TABLE_OR_COL
                            + "p"
                        + "price"

OP Tree:
    SelectOperator

Logical Plan Generator (4/4): QB → OP Tree

QB:
    MetaData
        Name To Destination Table Info
            "insclause-0" = Table Info("access_log_temp2")

OP Tree:
    FileSinkOperator
Logical Plan Generator (result)

OP Tree:
    TableScanOperator TS_0, TableScanOperator TS_1
        ↓
    ReduceSinkOperator RS_2, ReduceSinkOperator RS_3
        ↓
    JoinOperator JOIN_4
        ↓
    SelectOperator SEL_5
        ↓
    FileSinkOperator FS_6
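The resulting operator tree can be pictured as a small graph in which two scan branches meet at the join. Below is a minimal Python sketch of that shape; it is an illustration only, not Hive's `Operator` classes. The operator names come from the slide, while `Operator`, `then`, `build_join_plan` and `paths` are hypothetical, and the pairing of TS_0 with RS_2 and TS_1 with RS_3 is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Operator:
    """A node of the toy operator tree; `children` are downstream operators."""
    name: str
    children: list = field(default_factory=list)

    def then(self, child):
        """Connect `child` downstream of this operator and return `child`."""
        self.children.append(child)
        return child


def build_join_plan():
    """Build the tree for the example INSERT ... SELECT ... JOIN query."""
    ts_0, ts_1 = Operator("TS_0"), Operator("TS_1")
    join = Operator("JOIN_4")
    # Assumed pairing: TS_0 feeds RS_2 and TS_1 feeds RS_3.
    ts_0.then(Operator("RS_2")).then(join)
    ts_1.then(Operator("RS_3")).then(join)
    join.then(Operator("SEL_5")).then(Operator("FS_6"))
    return [ts_0, ts_1]


def paths(op, prefix=()):
    """Return every path from `op` down to a leaf as a list of strings."""
    prefix = prefix + (op.name,)
    if not op.children:
        return [" -> ".join(prefix)]
    result = []
    for child in op.children:
        result.extend(paths(child, prefix))
    return result


if __name__ == "__main__":
    for root in build_join_plan():
        for path in paths(root):
            print(path)
```

Both printed paths share the same JOIN_4, SEL_5 and FS_6 nodes, which is why the structure has two roots but a single sink.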
Logical Optimizer

Logical Optimizer (Predicate Push Down)

Original query:
    INSERT OVERWRITE TABLE access_log_temp2
    SELECT a.user, a.prono, p.maker, p.price
    FROM access_log_hbase a JOIN product_hbase p ON (a.prono = p.prono);

Same query with a WHERE clause added:
    INSERT OVERWRITE TABLE access_log_temp2
    SELECT a.user, a.prono, p.maker, p.price
    FROM access_log_hbase a JOIN product_hbase p ON (a.prono = p.prono)
    WHERE p.maker = 'honda';

OP Tree for the WHERE query before predicate push down (the filter runs after the join):
    TableScanOperator TS_0, TableScanOperator TS_1
        ↓
    ReduceSinkOperator RS_2, ReduceSinkOperator RS_3
        ↓
    JoinOperator JOIN_4
        ↓
    FilterOperator FIL_5 (_col8 = 'honda')
        ↓
    SelectOperator SEL_6
        ↓
    FileSinkOperator FS_7

OP Tree after predicate push down (a new filter, FIL_8, is added between the table scan of product_hbase and its ReduceSinkOperator; FIL_5 stays in place):
    TableScanOperator TS_0, TableScanOperator TS_1
        ↓
    FilterOperator FIL_8 (maker = 'honda')
        ↓
    ReduceSinkOperator RS_2, ReduceSinkOperator RS_3
        ↓
    JoinOperator JOIN_4
        ↓
    FilterOperator FIL_5 (_col8 = 'honda')
        ↓
    SelectOperator SEL_6
        ↓
    FileSinkOperator FS_7
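The effect of pushing the `maker = 'honda'` predicate below the join can be checked on a few rows. The following Python sketch is a toy, not Hive code: the sample rows, the function names and the `shuffled` counter (a stand-in for the rows that pass through the ReduceSink operators) are all assumptions made for illustration. As on the slide, the filter after the join is kept even when the pushed-down filter is present.

```python
# Made-up sample rows for the two tables of the example query.
ACCESS_LOG = [
    {"user": "u1", "prono": 1},
    {"user": "u2", "prono": 2},
    {"user": "u3", "prono": 1},
]
PRODUCTS = [
    {"prono": 1, "maker": "honda", "price": 100},
    {"prono": 2, "maker": "toyota", "price": 200},
    {"prono": 3, "maker": "nissan", "price": 300},
]


def join(access_log, products):
    """Inner join on prono, producing (user, prono, maker, price) tuples."""
    return [
        (a["user"], a["prono"], p["maker"], p["price"])
        for a in access_log
        for p in products
        if a["prono"] == p["prono"]
    ]


def join_then_filter(access_log, products):
    """Plan without push down: all rows reach the join, then get filtered."""
    shuffled = len(access_log) + len(products)
    rows = [r for r in join(access_log, products) if r[2] == "honda"]
    return rows, shuffled


def filter_then_join(access_log, products):
    """Plan with push down: product rows are filtered before the join.
    The filter after the join is kept, mirroring FIL_5 on the slide."""
    hondas = [p for p in products if p["maker"] == "honda"]
    shuffled = len(access_log) + len(hondas)
    rows = [r for r in join(access_log, hondas) if r[2] == "honda"]
    return rows, shuffled


if __name__ == "__main__":
    print(join_then_filter(ACCESS_LOG, PRODUCTS))
    print(filter_then_join(ACCESS_LOG, PRODUCTS))
```

Both plans return the same rows, but the second one sends fewer rows into the join, which is the point of the optimization.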
Physical Plan Generator: OP Tree → Task Tree

Task Tree:
    MapRedTask (Stage-1/root)
        Op Tree
            TableScanOperator (TS_0)
            TableScanOperator (TS_1)
            ReduceSinkOperator (RS_2)
            ReduceSinkOperator (RS_3)
            JoinOperator (JOIN_4)
            SelectOperator (SEL_5)
            FileSinkOperator (FS_6)
    MoveTask (Stage-0)
        LoadTableDesc
    StatsTask (Stage-2)

Physical Plan Generator (result)

MapRedTask (Stage-1/root):
    Mapper
        TableScanOperator TS_0, TableScanOperator TS_1
        ReduceSinkOperator RS_2, ReduceSinkOperator RS_3
    Reducer
        JoinOperator JOIN_4
        SelectOperator SEL_5
        FileSinkOperator FS_6
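In the result above, the mapper holds everything up to and including the ReduceSink operators, and the reducer holds the rest. A minimal Python sketch of that cut follows. It is a simplification rather than Hive's actual task generation: `split_at_reduce_sink` is a hypothetical helper, operators are plain strings, and the pairing of scans with reduce sinks in `chains` is assumed.

```python
def split_at_reduce_sink(chains):
    """Split operator chains into map-side and reduce-side operator lists.

    Each chain lists operator names from a table scan down to the file sink.
    Operators up to and including the first RS_* operator go to the mapper.
    The remaining operators go to the reducer; they are shared by all
    chains, so each one is recorded only once. A chain without any RS_*
    operator is treated as map-only.
    """
    mapper, reducer = [], []
    for chain in chains:
        cut = next(
            (i + 1 for i, op in enumerate(chain) if op.startswith("RS_")),
            len(chain),
        )
        mapper.extend(chain[:cut])
        for op in chain[cut:]:
            if op not in reducer:
                reducer.append(op)
    return mapper, reducer


if __name__ == "__main__":
    chains = [
        ["TS_0", "RS_2", "JOIN_4", "SEL_5", "FS_6"],  # assumed pairing
        ["TS_1", "RS_3", "JOIN_4", "SEL_5", "FS_6"],  # assumed pairing
    ]
    mapper, reducer = split_at_reduce_sink(chains)
    print("Mapper: ", mapper)
    print("Reducer:", reducer)
```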
Physical Optimizer: Task Tree → Task Tree
See the classes under java/org/apache/hadoop/hive/ql/optimizer/physical/
Physical Optimizer (MapJoinResolver): Task Tree → Task Tree

Before:
    MapRedTask (Stage-1)
        Mapper
            TableScanOperator TS_0, TableScanOperator TS_1
            MapJoinOperator MAPJOIN_7
            SelectOperator SEL_8
            SelectOperator SEL_5
            FileSinkOperator FS_6

After:
    MapredLocalTask (Stage-7)
        TableScanOperator TS_0
        HashTableSinkOperator HASHTABLESINK_11
    MapRedTask (Stage-1)
        Mapper
            TableScanOperator TS_1
            MapJoinOperator MAPJOIN_7
            SelectOperator SEL_8
            SelectOperator SEL_5
            FileSinkOperator FS_6
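The split into a local task that ends in a HashTableSinkOperator and a mapper that runs the MapJoinOperator corresponds to the usual two phases of a hash join: build a hash table from one input, then stream the other input and probe the table. The Python sketch below shows only that idea. It is not Hive's implementation; the function names, the sample rows and the choice of which table gets hashed are assumptions.

```python
def build_hash_table(small_rows, key):
    """Local-task phase: index the rows of the hashed table by join key."""
    table = {}
    for row in small_rows:
        table.setdefault(row[key], []).append(row)
    return table


def map_join(big_rows, hash_table, key):
    """Mapper phase: stream the other table and probe the hash table.
    No reduce phase is needed because each row is joined where it is read."""
    for row in big_rows:
        for match in hash_table.get(row[key], []):
            yield {**match, **row}


if __name__ == "__main__":
    # Made-up rows; which table is hashed is an assumption of this sketch.
    products = [
        {"prono": 1, "maker": "honda", "price": 100},
        {"prono": 2, "maker": "toyota", "price": 200},
    ]
    access_log = [
        {"user": "u1", "prono": 1},
        {"user": "u2", "prono": 2},
        {"user": "u3", "prono": 1},
    ]
    table = build_hash_table(products, "prono")
    for joined in map_join(access_log, table, "prono"):
        print(joined)
```

This matches the motivation stated at the start of the deck: the join finishes on the map side, so the shuffle and reduce phase for it is avoided.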
In the end
Components: Client, Metastore, Driver, Compiler, Hadoop

In the end
    HiveQL
      → Parser             → AST
      → Semantic Analyzer  → QB
      → Logical Plan Gen.  → Operator Tree
      → Logical Optimizer  → Operator Tree
      → Physical Plan Gen. → Task Tree
      → Physical Optimizer → Task Tree

End
Appendix: What does Explain show?

hive> explain INSERT OVERWRITE TABLE access_log_temp2
    >  SELECT a.user, a.prono, p.maker, p.price
    >  FROM access_log_hbase a JOIN product_hbase p ON (a.prono = p.prono);
OK
ABSTRACT SYNTAX TREE:
  (TOK_QUERY (TOK_FROM (TOK_JOIN (TOK_TABREF (TOK_TABNAME access_log_hbase) a) (TOK_TABREF (TOK_TABNAME product_hbase) p) (= (. (TOK_TABLE_OR_COL a) prono) (. (TOK_TABLE_OR_COL p) prono)))) (TOK_INSERT (TOK_DESTINATION (TOK_TAB (TOK_TABNAME access_log_temp2))) (TOK_SELECT (TOK_SELEXPR (. (TOK_TABLE_OR_COL a) user)) (TOK_SELEXPR (. (TOK_TABLE_OR_COL a) prono)) (TOK_SELEXPR (. (TOK_TABLE_OR_COL p) maker)) (TOK_SELEXPR (. (TOK_TABLE_OR_COL p) price)))))

STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1
  Stage-2 depends on stages: Stage-0

STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Alias -> Map Operator Tree:
        a
          TableScan
            alias: a
            Reduce Output Operator
              key expressions:
                    expr: prono
                    type: int
              sort order: +
              Map-reduce partition columns:
                    expr: prono
                    type: int
              tag: 0
              value expressions:
                    expr: user
                    type: string
                    expr: prono
                    type: int
        p
          TableScan
            alias: p
            Reduce Output Operator
              key expressions:
                    expr: prono
                    type: int
              sort order: +
              Map-reduce partition columns:
                    expr: prono
                    type: int
              tag: 1
              value expressions:
                    expr: maker
                    type: string
                    expr: price
                    type: int
      Reduce Operator Tree:
        Join Operator
          condition map:
               Inner Join 0 to 1
          condition expressions:
            0 {VALUE._col0} {VALUE._col2}
            1 {VALUE._col1} {VALUE._col2}
          handleSkewJoin: false
          outputColumnNames: _col0, _col2, _col6, _col7
          Select Operator
            expressions:
                  expr: _col0
                  type: string
                  expr: _col2
                  type: int
                  expr: _col6
                  type: string
                  expr: _col7
                  type: int
            outputColumnNames: _col0, _col1, _col2, _col3
            File Output Operator
              compressed: false
              GlobalTableId: 1
              table:
                  input format: org.apache.hadoop.mapred.TextInputFormat
                  output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                  serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
                  name: default.access_log_temp2

  Stage: Stage-0
    Move Operator
      tables:
          replace: true
          table:
              input format: org.apache.hadoop.mapred.TextInputFormat
              output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
              serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
              name: default.access_log_temp2

  Stage: Stage-2
    Stats-Aggr Operator

Time taken: 0.1 seconds
hive>
Appendix: What does Explain show?

Reduced to its skeleton, the EXPLAIN output has this structure:

ABSTRACT SYNTAX TREE:
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1
  Stage-2 depends on stages: Stage-0
STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Map Operator Tree:
        TableScan
          Reduce Output Operator
        TableScan
          Reduce Output Operator
      Reduce Operator Tree:
        Join Operator
          Select Operator
            File Output Operator
  Stage: Stage-0
    Move Operator
  Stage: Stage-2
    Stats-Aggr Operator

This is roughly (≒) the Task Tree:

MapRedTask (Stage-1/root)
    Mapper
        TableScanOperator TS_0, TableScanOperator TS_1
        ReduceSinkOperator RS_2, ReduceSinkOperator RS_3
    Reducer
        JoinOperator JOIN_4
        SelectOperator SEL_5
        FileSinkOperator FS_6
MoveTask (Stage-0)
StatsTask (Stage-2)
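The STAGE DEPENDENCIES section is enough to work out the order in which the stages run. The Python sketch below reads the three lines shown in the output and orders the stages so that each one comes after the stages it depends on. It is a small illustration, not part of Hive: `parse_stage_dependencies` and `execution_order` are hypothetical helpers, and the handling of several comma-separated dependencies on one line is an assumption, since the example shows only single dependencies.

```python
import re


def parse_stage_dependencies(lines):
    """Return {stage: [stages it depends on]} from STAGE DEPENDENCIES lines."""
    deps = {}
    for line in lines:
        line = line.strip()
        root = re.fullmatch(r"(Stage-\d+) is a root stage", line)
        dep = re.fullmatch(r"(Stage-\d+) depends on stages: (.+)", line)
        if root:
            deps[root.group(1)] = []
        elif dep:
            # Assumption: multiple dependencies are separated by commas.
            deps[dep.group(1)] = [s.strip() for s in dep.group(2).split(",")]
    return deps


def execution_order(deps):
    """Order stages so that each one runs after everything it depends on."""
    order, done = [], set()
    while len(order) < len(deps):
        ready = [
            stage for stage in deps
            if stage not in done and all(d in done for d in deps[stage])
        ]
        if not ready:
            raise ValueError("cyclic or unresolved stage dependencies")
        for stage in ready:
            order.append(stage)
            done.add(stage)
    return order


if __name__ == "__main__":
    explain_lines = [
        "  Stage-1 is a root stage",
        "  Stage-0 depends on stages: Stage-1",
        "  Stage-2 depends on stages: Stage-0",
    ]
    print(execution_order(parse_stage_dependencies(explain_lines)))
```

For the example query this gives the M/R job (Stage-1) first, then the move of the result into the table (Stage-0), then the statistics update (Stage-2).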

The Value of Certifying Products for FDO _ Paul at FIDO Alliance.pdfThe Value of Certifying Products for FDO _ Paul at FIDO Alliance.pdf
The Value of Certifying Products for FDO _ Paul at FIDO Alliance.pdf
 
Designing for Hardware Accessibility at Comcast
Designing for Hardware Accessibility at ComcastDesigning for Hardware Accessibility at Comcast
Designing for Hardware Accessibility at Comcast
 
Buy Epson EcoTank L3210 Colour Printer Online.pdf
Buy Epson EcoTank L3210 Colour Printer Online.pdfBuy Epson EcoTank L3210 Colour Printer Online.pdf
Buy Epson EcoTank L3210 Colour Printer Online.pdf
 
UiPath Test Automation using UiPath Test Suite series, part 1
UiPath Test Automation using UiPath Test Suite series, part 1UiPath Test Automation using UiPath Test Suite series, part 1
UiPath Test Automation using UiPath Test Suite series, part 1
 
Enterprise Knowledge Graphs - Data Summit 2024
Enterprise Knowledge Graphs - Data Summit 2024Enterprise Knowledge Graphs - Data Summit 2024
Enterprise Knowledge Graphs - Data Summit 2024
 
Agentic RAG What it is its types applications and implementation.pdf
Agentic RAG What it is its types applications and implementation.pdfAgentic RAG What it is its types applications and implementation.pdf
Agentic RAG What it is its types applications and implementation.pdf
 
Speed Wins: From Kafka to APIs in Minutes
Speed Wins: From Kafka to APIs in MinutesSpeed Wins: From Kafka to APIs in Minutes
Speed Wins: From Kafka to APIs in Minutes
 
PLAI - Acceleration Program for Generative A.I. Startups
PLAI - Acceleration Program for Generative A.I. StartupsPLAI - Acceleration Program for Generative A.I. Startups
PLAI - Acceleration Program for Generative A.I. Startups
 
Free and Effective: Making Flows Publicly Accessible, Yumi Ibrahimzade
Free and Effective: Making Flows Publicly Accessible, Yumi IbrahimzadeFree and Effective: Making Flows Publicly Accessible, Yumi Ibrahimzade
Free and Effective: Making Flows Publicly Accessible, Yumi Ibrahimzade
 
Linux Foundation Edge _ Overview of FDO Software Components _ Randy at Intel.pdf
Linux Foundation Edge _ Overview of FDO Software Components _ Randy at Intel.pdfLinux Foundation Edge _ Overview of FDO Software Components _ Randy at Intel.pdf
Linux Foundation Edge _ Overview of FDO Software Components _ Randy at Intel.pdf
 
Buy Epson EcoTank L3210 Colour Printer Online.pptx
Buy Epson EcoTank L3210 Colour Printer Online.pptxBuy Epson EcoTank L3210 Colour Printer Online.pptx
Buy Epson EcoTank L3210 Colour Printer Online.pptx
 
How Red Hat Uses FDO in Device Lifecycle _ Costin and Vitaliy at Red Hat.pdf
How Red Hat Uses FDO in Device Lifecycle _ Costin and Vitaliy at Red Hat.pdfHow Red Hat Uses FDO in Device Lifecycle _ Costin and Vitaliy at Red Hat.pdf
How Red Hat Uses FDO in Device Lifecycle _ Costin and Vitaliy at Red Hat.pdf
 
FDO for Camera, Sensor and Networking Device – Commercial Solutions from VinC...
FDO for Camera, Sensor and Networking Device – Commercial Solutions from VinC...FDO for Camera, Sensor and Networking Device – Commercial Solutions from VinC...
FDO for Camera, Sensor and Networking Device – Commercial Solutions from VinC...
 
Where to Learn More About FDO _ Richard at FIDO Alliance.pdf
Where to Learn More About FDO _ Richard at FIDO Alliance.pdfWhere to Learn More About FDO _ Richard at FIDO Alliance.pdf
Where to Learn More About FDO _ Richard at FIDO Alliance.pdf
 

Internal Hive

  • 1. Inside Hive (for beginners). Takeshi NAKANO / Recruit Co. Ltd.
  • 2. Why? Hive is a good tool for non-specialists! The number of M/R jobs drives Hive's processing time. ↓ How can we reduce that number? What can we do about it when writing HiveQL? ↓ How does Hive convert HiveQL into M/R jobs, and which optimizations are applied along the way? 7/6/2011 HIVE - A warehouse solution over Map Reduce Framework
  • 3. Isn't this already documented? Facebook's paper has a lot of information, but it is a little old.
  • 4. Component Level Analysis
  • 5. Hive Architecture / Exec Flow. Diagram components: Client, Driver, Compiler, Metastore, Hadoop.
  • 6. Hive Workflow. Hive has operators, which are its minimum processing units. Each operator is executed either as an HDFS operation or as M/R jobs. The compiler converts HiveQL into sets of operators.
  • 7. Hive Workflow: Operators
  • 8. Hive Workflow. For M/R processing, Hive uses ExecMapper and ExecReducer. There are two processing modes: (1) local processing mode and (2) distributed processing mode.
  • 9. Hive Workflow. In mode 1 (local mode), Hive forks a process with the hadoop command. The plan.xml file is created only in this mode, and a single node processes it. In mode 2 (distributed mode), Hive sends the job to an existing JobTracker. The information is stored in the DistributedCache and processed on multiple nodes.
  • 10. Compiler: How to Process HiveQL
  • 11. “Plumbing” of HIVE compiler
  • 12. “Plumbing” of HIVE compiler
  • 13. Compiler Overview: Parser → Semantic Analyzer → Logical Plan Gen. → Logical Optimizer → Physical Plan Gen. → Physical Optimizer
  • 14. Compiler Overview: HiveQL → Parser → AST → Semantic Analyzer → QB → Logical Plan Gen. → Operator Tree → Logical Optimizer → Operator Tree → Physical Plan Gen. → Task Tree → Physical Optimizer → Task Tree
  • 15. Parser (HiveQL → AST). HiveQL: INSERT OVERWRITE TABLE access_log_temp2 SELECT a.user, a.prono, p.maker, p.price FROM access_log_hbase a JOIN product_hbase p ON (a.prono = p.prono); AST: (TOK_QUERY (TOK_FROM (TOK_JOIN (TOK_TABREF (TOK_TABNAME access_log_hbase) a) (TOK_TABREF (TOK_TABNAME product_hbase) p) (= (. (TOK_TABLE_OR_COL a) prono) (. (TOK_TABLE_OR_COL p) prono)))) (TOK_INSERT (TOK_DESTINATION (TOK_TAB (TOK_TABNAME access_log_temp2))) (TOK_SELECT (TOK_SELEXPR (. (TOK_TABLE_OR_COL a) user)) (TOK_SELEXPR (. (TOK_TABLE_OR_COL a) prono)) (TOK_SELEXPR (. (TOK_TABLE_OR_COL p) maker)) (TOK_SELEXPR (. (TOK_TABLE_OR_COL p) price)))))
  • 16. Parser (SQL → AST). The same query and AST as on slide 15, with three subtrees of the AST numbered for the steps that follow: 1 = TOK_FROM, 2 = TOK_DESTINATION, 3 = TOK_SELECT.
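The parenthesized AST that the parser produces can be explored with a few lines of code. The following is a minimal sketch, not Hive's own parser: it reads the AST text (in the form that EXPLAIN prints it) into nested Python lists and looks up the three subtrees that the semantic analyzer picks apart on the next slides. The helper names `parse_sexpr` and `find` are made up for this illustration.

```python
# Toy reader for the parenthesized AST text; names are illustrative only.
AST_TEXT = (
    "(TOK_QUERY (TOK_FROM (TOK_JOIN "
    "(TOK_TABREF (TOK_TABNAME access_log_hbase) a) "
    "(TOK_TABREF (TOK_TABNAME product_hbase) p) "
    "(= (. (TOK_TABLE_OR_COL a) prono) (. (TOK_TABLE_OR_COL p) prono)))) "
    "(TOK_INSERT (TOK_DESTINATION (TOK_TAB (TOK_TABNAME access_log_temp2))) "
    "(TOK_SELECT (TOK_SELEXPR (. (TOK_TABLE_OR_COL a) user)) "
    "(TOK_SELEXPR (. (TOK_TABLE_OR_COL a) prono)) "
    "(TOK_SELEXPR (. (TOK_TABLE_OR_COL p) maker)) "
    "(TOK_SELEXPR (. (TOK_TABLE_OR_COL p) price)))))"
)


def parse_sexpr(text):
    """Parse a parenthesized AST string into nested Python lists."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            node = []
            while tokens[pos] != ")":
                node.append(read())
            pos += 1  # consume the closing ")"
            return node
        return tok

    tree = read()
    if pos != len(tokens):
        raise ValueError("trailing tokens after the root node")
    return tree


def find(node, label):
    """Return the first subtree whose head token equals label, or None."""
    if isinstance(node, list) and node:
        if node[0] == label:
            return node
        for child in node[1:]:
            hit = find(child, label)
            if hit is not None:
                return hit
    return None


ast = parse_sexpr(AST_TEXT)
for label in ("TOK_FROM", "TOK_DESTINATION", "TOK_SELECT"):
    print(label, "->", find(ast, label))
```

Nested lists keep the sketch short; a real AST node would also carry token types and source positions.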
  • 17. Semantic Analyzer (1/2), AST → QB. AST subtree 1: (TOK_FROM (TOK_JOIN (TOK_TABREF (TOK_TABNAME access_log_hbase) a) (TOK_TABREF (TOK_TABNAME product_hbase) p) (= (. (TOK_TABLE_OR_COL a) prono) (. (TOK_TABLE_OR_COL p) prono)))). Resulting QB: MetaData, AliasToTableInfo: "a" = TableInfo("access_log_hbase"), "p" = TableInfo("product_hbase"). ParseInfo, Join Node: (TOK_JOIN (TOK_TABREF …) (TOK_TABREF …) (= …)).
  • 18. Semantic Analyzer (2/2), AST → QB. AST subtree 2: (TOK_DESTINATION (TOK_TAB (TOK_TABNAME access_log_temp2))). Resulting QB: ParseInfo, NameToDestinationNode: (TOK_TAB (TOK_TABNAME access_log_temp2)).
  • 19. Semantic Analyzer (2/2), AST → QB. AST subtree 3: (TOK_SELECT (TOK_SELEXPR (. (TOK_TABLE_OR_COL a) user)) (TOK_SELEXPR (. (TOK_TABLE_OR_COL a) prono)) (TOK_SELEXPR (. (TOK_TABLE_OR_COL p) maker)) (TOK_SELEXPR (. (TOK_TABLE_OR_COL p) price))). Resulting QB: ParseInfo, NameToSelectNode: (TOK_SELECT (TOK_SELEXPR …) (TOK_SELEXPR …) (TOK_SELEXPR …) (TOK_SELEXPR …)).
  • 20. Logical Plan Generator (1/4), QB → OP Tree. QB MetaData, AliasToTableInfo: "a" = TableInfo("access_log_hbase"), "p" = TableInfo("product_hbase"). OP Tree: TableScanOperator("access_log_hbase"), TableScanOperator("product_hbase").
  • 21. Logical Plan Generator (2/4), QB → OP Tree. QB ParseInfo, Join Node: (TOK_JOIN (TOK_TABREF (TOK_TABNAME access_log_hbase) a) (TOK_TABREF (TOK_TABNAME product_hbase) p) (= (. (TOK_TABLE_OR_COL a) prono) (. (TOK_TABLE_OR_COL p) prono))). OP Tree: ReduceSinkOperator("access_log_hbase") and ReduceSinkOperator("product_hbase"), both feeding JoinOperator.
  • 22. Logical Plan Generator (3/4), QB → OP Tree. QB ParseInfo, NameToSelectNode: (TOK_SELECT (TOK_SELEXPR (. (TOK_TABLE_OR_COL a) user)) (TOK_SELEXPR (. (TOK_TABLE_OR_COL a) prono)) (TOK_SELEXPR (. (TOK_TABLE_OR_COL p) maker)) (TOK_SELEXPR (. (TOK_TABLE_OR_COL p) price))). OP Tree: SelectOperator.
  • 23. Logical Plan Generator (4/4), QB → OP Tree. QB MetaData, NameToDestinationTableInfo: "insclause-0" = TableInfo("access_log_temp2"). OP Tree: FileSinkOperator.
  • 24. Logical Plan Generator (result). LCF OP Tree: TableScanOperator TS_0 and TableScanOperator TS_1 → ReduceSinkOperator RS_2 and ReduceSinkOperator RS_3 → JoinOperator JOIN_4 → SelectOperator SEL_5 → FileSinkOperator FS_6.
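To make the shape of the resulting operator tree concrete, here is a small sketch that wires up the same operators by hand and prints every path from a table scan to the file sink. It is a toy model and not Hive's `Operator` class. The `Operator` dataclass and the `build_join_plan` and `paths` helpers are hypothetical, and the pairing of TS_0 with RS_2 and TS_1 with RS_3 is an assumption, since the slides do not say which scan feeds which reduce sink.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Operator:
    """Toy operator node: an id such as 'TS_0' plus its child operators."""
    op_id: str
    children: list = field(default_factory=list)


def build_join_plan():
    """Build the two-table join plan and return its root table scans."""
    ids = count()

    def new(prefix):
        # Ids are numbered in creation order, giving TS_0 ... FS_6.
        return Operator(f"{prefix}_{next(ids)}")

    ts_a, ts_p = new("TS"), new("TS")    # one table scan per alias in the QB
    rs_a, rs_p = new("RS"), new("RS")    # both inputs are shuffled on the join key
    join, sel, fs = new("JOIN"), new("SEL"), new("FS")

    ts_a.children.append(rs_a)           # assumed pairing: TS_0 -> RS_2
    ts_p.children.append(rs_p)           # assumed pairing: TS_1 -> RS_3
    rs_a.children.append(join)
    rs_p.children.append(join)
    join.children.append(sel)
    sel.children.append(fs)
    return [ts_a, ts_p]


def paths(op, prefix=()):
    """List every root-to-leaf path starting at op as a readable string."""
    prefix = prefix + (op.op_id,)
    if not op.children:
        return [" -> ".join(prefix)]
    found = []
    for child in op.children:
        found.extend(paths(child, prefix))
    return found


roots = build_join_plan()
for root in roots:
    for path in paths(root):
        print(path)
```

Both paths share the JOIN_4, SEL_5 and FS_6 nodes, so the structure is a DAG with two roots and not a strict tree.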
  • 25. Logical Optimizer
  • 26. Logical Optimizer (Predicate Push Down). Query without a filter: INSERT OVERWRITE TABLE access_log_temp2 SELECT a.user, a.prono, p.maker, p.price FROM access_log_hbase a JOIN product_hbase p ON (a.prono = p.prono); Query with a filter: INSERT OVERWRITE TABLE access_log_temp2 SELECT a.user, a.prono, p.maker, p.price FROM access_log_hbase a JOIN product_hbase p ON (a.prono = p.prono) WHERE p.maker = 'honda';
  • 27. Logical Optimizer (Predicate Push Down). The same two queries as on slide 26. Operator tree: TableScanOperator TS_0 and TableScanOperator TS_1 → ReduceSinkOperator RS_2 and ReduceSinkOperator RS_3 → JoinOperator JOIN_4 → SelectOperator SEL_6 → FileSinkOperator FS_7.
  • 28. Logical Optimizer (Predicate Push Down). The same two queries as on slide 26. Operator tree with the WHERE clause, before push down: TableScanOperator TS_0 and TableScanOperator TS_1 → ReduceSinkOperator RS_2 and ReduceSinkOperator RS_3 → JoinOperator JOIN_4 → FilterOperator FIL_5 (_col8 = 'honda') → SelectOperator SEL_6 → FileSinkOperator FS_7.
  • 29. Logical Optimizer (Predicate Push Down). The same two queries as on slide 26. Operator tree after push down: TableScanOperator TS_0 and TableScanOperator TS_1, with FilterOperator FIL_8 (maker = 'honda') inserted between the table scan and the reduce sink → ReduceSinkOperator RS_2 and ReduceSinkOperator RS_3 → JoinOperator JOIN_4 → FilterOperator FIL_5 (_col8 = 'honda') → SelectOperator SEL_6 → FileSinkOperator FS_7.
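The effect of predicate push down can be shown with plain Python lists standing in for the two tables. This is a rough sketch under made-up data, not Hive's optimizer: the `hash_join` and `run` helpers and the sample rows are hypothetical. It filters `product` before the join (like FIL_8), keeps the original filter after the join (like FIL_5), and counts how many rows have to be sent into the join in each case.

```python
def hash_join(left, right, key):
    """Inner join two lists of dicts on key."""
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    joined = []
    for row in left:
        for match in index.get(row[key], []):
            joined.append({**row, **match})
    return joined


def run(access_log, product, maker, push_down):
    """Return (result rows, number of rows sent into the join)."""
    if push_down:
        # Extra filter below the join, playing the role of FIL_8.
        product = [p for p in product if p["maker"] == maker]
    rows_into_join = len(access_log) + len(product)
    joined = hash_join(access_log, product, "prono")
    # The original filter above the join, playing the role of FIL_5, stays.
    result = [r for r in joined if r["maker"] == maker]
    return result, rows_into_join


# Made-up sample rows for illustration only.
access_log = [
    {"user": "u1", "prono": 1},
    {"user": "u2", "prono": 2},
    {"user": "u3", "prono": 1},
]
product = [
    {"prono": 1, "maker": "honda", "price": 100},
    {"prono": 2, "maker": "toyota", "price": 200},
]

plain, plain_rows = run(access_log, product, "honda", push_down=False)
pushed, pushed_rows = run(access_log, product, "honda", push_down=True)
print(plain == pushed, plain_rows, pushed_rows)
```

The result is identical either way. Only the number of rows that reach the join changes, which in a real M/R job corresponds to less data shuffled between mappers and reducers.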
  • 30. Physical Plan Generator, OP Tree → Task Tree. OP Tree: TableScanOperator(TS_0), TableScanOperator(TS_1), ReduceSinkOperator(RS_2), ReduceSinkOperator(RS_3), JoinOperator(JOIN_4), SelectOperator(SEL_5), FileSinkOperator(FS_6). Task Tree: MapRedTask(Stage-1/root), MoveTask(Stage-0) with LoadTableDesc, StatsTask(Stage-2).
  • 31. Physical Plan Generator (result), OP Tree → Task Tree. LCF. MapRedTask (Stage-1/root). Mapper: TableScanOperator TS_0, TableScanOperator TS_1, ReduceSinkOperator RS_2, ReduceSinkOperator RS_3. Reducer: JoinOperator JOIN_4, SelectOperator SEL_5, FileSinkOperator FS_6.
  • 32. Physical Optimizer, Task Tree → Task Tree. See the code under java/org/apache/hadoop/hive/ql/optimizer/physical/.
  • 33. Physical Optimizer (MapJoinResolver), Task Tree → Task Tree. MapRedTask (Stage-1), Mapper: TableScanOperator TS_1 and TableScanOperator TS_0 → MapJoinOperator MAPJOIN_7 → SelectOperator SEL_8 → SelectOperator SEL_5 → FileSinkOperator FS_6.
  • 34. Physical Optimizer (MapJoinResolver), Task Tree → Task Tree. Before: MapRedTask (Stage-1), Mapper: TableScanOperator TS_1 and TableScanOperator TS_0 → MapJoinOperator MAPJOIN_7 → SelectOperator SEL_8 → SelectOperator SEL_5 → FileSinkOperator FS_6. After: MapredLocalTask (Stage-7): TableScanOperator TS_0 → HashTableSinkOperator HASHTABLESINK_11; MapRedTask (Stage-1), Mapper: TableScanOperator TS_1 → MapJoinOperator MAPJOIN_7 → SelectOperator SEL_8 → SelectOperator SEL_5 → FileSinkOperator FS_6.
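The split that MapJoinResolver produces, one local task that builds a hash table and one map-only task that probes it, can be imitated in a few lines. This is only a sketch of the idea, not Hive code: `build_hash_table` and `map_join` are hypothetical names, the rows are made up, and treating `product` as the small, hashed side is an assumption because the slides do not say which table TS_0 scans.

```python
def build_hash_table(small_rows, key):
    """Local step (compare HashTableSinkOperator): index the small table by key."""
    table = {}
    for row in small_rows:
        table.setdefault(row[key], []).append(row)
    return table


def map_join(big_rows, table, key):
    """Map-side step (compare MapJoinOperator): probe the table per input row.

    Each row is joined as it streams through the mapper, so no shuffle and
    no reducer are needed for the join itself.
    """
    for row in big_rows:
        for match in table.get(row[key], []):
            yield {**row, **match}


# Made-up sample rows; product is assumed to be the small side.
product = [
    {"prono": 1, "maker": "honda", "price": 100},
    {"prono": 2, "maker": "toyota", "price": 200},
]
access_log = [
    {"user": "u1", "prono": 1},
    {"user": "u2", "prono": 2},
    {"user": "u3", "prono": 3},
]

table = build_hash_table(product, "prono")
joined = list(map_join(access_log, table, "prono"))
for row in joined:
    print(row)
```

The trade-off is memory: the hashed table has to fit on every mapper, which is why this only suits joins where one side is small.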
  • 35. In the end. Diagram components: Client, Driver, Compiler, Metastore, Hadoop.
  • 36. In the end: HiveQL → Parser → AST → Semantic Analyzer → QB → Logical Plan Gen. → Operator Tree → Logical Optimizer → Operator Tree → Physical Plan Gen. → Task Tree → Physical Optimizer → Task Tree
  • 38. Appendix: What does Explain show?
  • 39. Appendix: What does Explain show?
      hive> explain INSERT OVERWRITE TABLE access_log_temp2
          > SELECT a.user, a.prono, p.maker, p.price
          > FROM access_log_hbase a JOIN product_hbase p ON (a.prono = p.prono);
      OK
      ABSTRACT SYNTAX TREE:
        (TOK_QUERY (TOK_FROM (TOK_JOIN (TOK_TABREF (TOK_TABNAME access_log_hbase) a) (TOK_TABREF (TOK_TABNAME product_hbase) p) (= (. (TOK_TABLE_OR_COL a) prono) (. (TOK_TABLE_OR_COL p) prono)))) (TOK_INSERT (TOK_DESTINATION (TOK_TAB (TOK_TABNAME access_log_temp2))) (TOK_SELECT (TOK_SELEXPR (. (TOK_TABLE_OR_COL a) user)) (TOK_SELEXPR (. (TOK_TABLE_OR_COL a) prono)) (TOK_SELEXPR (. (TOK_TABLE_OR_COL p) maker)) (TOK_SELEXPR (. (TOK_TABLE_OR_COL p) price)))))
      STAGE DEPENDENCIES:
        Stage-1 is a root stage
        Stage-0 depends on stages: Stage-1
        Stage-2 depends on stages: Stage-0
      STAGE PLANS:
        Stage: Stage-1
          Map Reduce
            Alias -> Map Operator Tree:
              a
                TableScan
                  alias: a
                  Reduce Output Operator
                    key expressions:
                      expr: prono
                      type: int
                    sort order: +
                    Map-reduce partition columns:
                      expr: prono
                      type: int
                    tag: 0
                    value expressions:
                      expr: user
                      type: string
                      expr: prono
                      type: int
              p
                TableScan
                  alias: p
                  Reduce Output Operator
                    key expressions:
                      expr: prono
                      type: int
                    sort order: +
                    Map-reduce partition columns:
                      expr: prono
                      type: int
                    tag: 1
                    value expressions:
                      expr: maker
                      type: string
                      expr: price
                      type: int
            Reduce Operator Tree:
              Join Operator
                condition map:
                  Inner Join 0 to 1
                condition expressions:
                  0 {VALUE._col0} {VALUE._col2}
                  1 {VALUE._col1} {VALUE._col2}
                handleSkewJoin: false
                outputColumnNames: _col0, _col2, _col6, _col7
                Select Operator
                  expressions:
                    expr: _col0
                    type: string
                    expr: _col2
                    type: int
                    expr: _col6
                    type: string
                    expr: _col7
                    type: int
                  outputColumnNames: _col0, _col1, _col2, _col3
                  File Output Operator
                    compressed: false
                    GlobalTableId: 1
                    table:
                      input format: org.apache.hadoop.mapred.TextInputFormat
                      output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                      serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
                      name: default.access_log_temp2
        Stage: Stage-0
          Move Operator
            tables:
              replace: true
              table:
                input format: org.apache.hadoop.mapred.TextInputFormat
                output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
                name: default.access_log_temp2
        Stage: Stage-2
          Stats-Aggr Operator
      Time taken: 0.1 seconds
      hive>
  • 40. Appendix: What does Explain show? The same EXPLAIN output as on slide 39, overlaid with its skeleton. ABSTRACT SYNTAX TREE. STAGE DEPENDENCIES: Stage-1 is a root stage; Stage-0 depends on stages: Stage-1; Stage-2 depends on stages: Stage-0. STAGE PLANS: Stage-1 Map Reduce (Map Operator Tree: TableScan → Reduce Output Operator, twice; Reduce Operator Tree: Join Operator → Select Operator → File Output Operator); Stage-0 Move Operator; Stage-2 Stats-Aggr Operator.
  • 41. Appendix: What does Explain show? The EXPLAIN skeleton corresponds (≒) to the task tree. Stage-1 Map Reduce ≒ MapRedTask (Stage-1/root): Map Operator Tree (TableScan → Reduce Output Operator, twice) ≒ Mapper (TableScanOperator TS_0, TableScanOperator TS_1, ReduceSinkOperator RS_2, ReduceSinkOperator RS_3); Reduce Operator Tree (Join Operator → Select Operator → File Output Operator) ≒ Reducer (JoinOperator JOIN_4, SelectOperator SEL_5, FileSinkOperator FS_6). Stage-0 Move Operator ≒ MoveTask (Stage-0). Stage-2 Stats-Aggr Operator ≒ StatsTask (Stage-2).
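The STAGE DEPENDENCIES section of the EXPLAIN output is enough to work out the order in which the stages run. The sketch below parses those three lines and sorts the stages topologically. It is an illustration and not part of Hive. The `stage_order` helper is hypothetical, it assumes that multiple dependencies would be comma-separated, and it needs Python 3.9 or later for `graphlib`.

```python
import re
from graphlib import TopologicalSorter

# The dependency lines as they appear in the EXPLAIN output above.
STAGE_DEPENDENCIES = """
Stage-1 is a root stage
Stage-0 depends on stages: Stage-1
Stage-2 depends on stages: Stage-0
"""


def stage_order(dep_text):
    """Return the stages in an order that respects their dependencies."""
    deps = {}
    for line in dep_text.strip().splitlines():
        line = line.strip()
        match = re.match(r"(Stage-\d+) depends on stages: (.+)", line)
        if match:
            # Assumption: several dependencies are separated by commas.
            deps[match.group(1)] = {s.strip() for s in match.group(2).split(",")}
            continue
        match = re.match(r"(Stage-\d+) is a root stage", line)
        if match:
            deps.setdefault(match.group(1), set())
    # TopologicalSorter takes a mapping of node -> set of predecessors.
    return list(TopologicalSorter(deps).static_order())


print(stage_order(STAGE_DEPENDENCIES))
```

For this query the order is the M/R job (Stage-1), then the move into the destination table (Stage-0), then the statistics update (Stage-2), so the stage numbers do not reflect execution order.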