Hive Interview Questions

What is Hive?

Hive is an ETL and data warehousing tool built on top of the Hadoop Distributed File System (HDFS). It is a data warehouse framework for querying and analyzing data stored in HDFS. Hive is open-source software that lets programmers analyze large data sets on Hadoop.

When to use Hive?

  • Hive is useful for building data warehouse applications
  • When you are dealing with static data instead of dynamic data
  • When the application can tolerate high latency (high response time)
  • When a large data set is maintained
  • When you are using queries instead of scripting

What are the different modes of Hive?

Depending on the size of the data and the number of data nodes in Hadoop, Hive can operate in two modes.

These modes are,

  • Local mode
  • MapReduce mode

When should you use MapReduce mode?

MapReduce mode is used when:

  • Queries must run on large data sets and execute in parallel
  • Hadoop has multiple data nodes and the data is distributed across them
  • Better performance is needed when processing large data sets

Mention key components of Hive Architecture?

Key components of the Hive architecture include:

  • User Interface
  • Compiler
  • Metastore
  • Driver
  • Execution Engine

What are the different types of tables available in Hive?

There are two types of tables available in Hive; a minimal DDL sketch follows the list.

  • Managed table: In a managed table, both the data and the schema are under the control of Hive.
  • External table: In an external table, only the schema is under the control of Hive.
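
A minimal sketch of the two forms (the table and path names are made up for illustration):

CREATE TABLE managed_emp (id INT, name STRING);

CREATE EXTERNAL TABLE external_emp (id INT, name STRING)
LOCATION '/user/data/emp';

Dropping managed_emp removes both the metadata and the data; dropping external_emp removes only the metadata.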

What is the Metastore in Hive?

The Metastore is a central repository in Hive. It is used for storing schema information (metadata) in an external database.

What is Hive composed of?

Hive consists of 3 main parts,

  • Hive Clients
  • Hive Services
  • Hive Storage and Computing

What types of database does Hive support?

For single-user metadata storage Hive uses the Derby database, and for multi-user or shared metadata Hive uses MySQL.

What are Hive's default read and write classes?

Hive's default read and write classes are:

  • TextInputFormat/HiveIgnoreKeyTextOutputFormat
  • SequenceFileInputFormat/SequenceFileOutputFormat

Why is Hive not suitable for OLTP systems?

Hive is not suitable for OLTP systems because it does not provide row-level insert and update operations.

Differentiate between Hive and HBase?

  • Hive enables most SQL queries; HBase does not allow SQL queries.
  • Hive does not support record-level insert, update, and delete operations on tables; HBase does.
  • Hive is a data warehouse framework; HBase is a NoSQL database.
  • Hive runs on top of MapReduce; HBase runs on top of HDFS.

Explain what a Hive variable is. What do we use it for?

A Hive variable is created in the Hive environment and can be referenced by Hive scripts. It is used to pass values to Hive queries when the query starts executing.
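
For example, a variable can be set in the hivevar namespace and referenced inside a query (the table and column names here are hypothetical):

SET hivevar:dept=sales;
SELECT * FROM employee WHERE department = '${hivevar:dept}';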

What is ObjectInspector functionality in Hive?

ObjectInspector functionality in Hive is used to analyze the internal structure of columns, rows, and complex objects. It allows access to the internal fields inside the objects.

What is HiveServer2 (HS2)?

It is a server interface that performs the following functions:

  • Allows remote clients to execute queries against Hive
  • Retrieves the results of those queries

Advanced features in its latest version, based on Thrift RPC, include:

  • Multi-client concurrency
  • Authentication

What does the Hive query processor do?

The Hive query processor converts SQL into a graph of MapReduce jobs together with the execution-time framework, so that the jobs can be executed in the order of their dependencies.

What are the components of a Hive query processor?

The components of a Hive query processor include,

  • Logical Plan Generation
  • Physical Plan Generation
  • Execution Engine
  • Operators
  • UDFs and UDAFs
  • Optimizer
  • Parser
  • Semantic Analyzer
  • Type Checking

What are partitions in Hive?

Hive organizes tables into partitions; a DDL sketch follows the list.

  • Partitioning is one of the ways of dividing a table into parts based on partition keys.
  • Partitioning is helpful when the table has one or more partition keys.
  • Partition keys are the basic elements that determine how the data is stored in the table.
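
A minimal sketch of a partitioned table (the names are illustrative):

CREATE TABLE sales (id INT, amount DOUBLE)
PARTITIONED BY (country STRING, year INT);

Each distinct (country, year) pair is stored in its own directory, so queries that filter on the partition keys scan only the matching directories.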

When should you choose “Internal Table” and when “External Table” in Hive?

In Hive, you can choose an internal table:

  • If the data to be processed is available in the local file system
  • If you want Hive to manage the complete lifecycle of the data, including deletion

You can choose an external table:

  • If the data to be processed is available in HDFS
  • When the files are being used outside of Hive

Can a view have the same name as a Hive table?

No. The name of a view must be unique among all the tables and views present in the same database.

What are views in Hive?

In Hive, views are similar to tables. They are generated based on requirements; an example follows the list.

  • We can save any result-set data as a view in Hive
  • Usage is similar to views in SQL
  • Queries can be run on a view just as on a table
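
A minimal sketch of creating a view (the table and column names are hypothetical):

CREATE VIEW senior_employees AS
SELECT name, salary FROM employee WHERE age > 50;

The view can then be queried like any table, e.g. SELECT * FROM senior_employees;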

Explain how Hive deserializes and serializes data.

While reading, the user first communicates with the InputFormat, which uses a RecordReader to read the record. The deserializer of the SerDe then converts the record into a row object, and the ObjectInspector is used to access the fields of that object. Writing follows the reverse path: the serializer converts the row object back into a record that the OutputFormat writes.

What are buckets in Hive?

  • The data present in partitions can be divided further into buckets (see the sketch below)
  • The division is performed based on the hash of a particular column selected in the table
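
A minimal sketch of a bucketed table (the names and bucket count are illustrative):

CREATE TABLE user_info (id INT, name STRING)
CLUSTERED BY (id) INTO 32 BUCKETS;

Rows are assigned to one of the 32 bucket files by hashing the id column.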

In Hive, how can you enable buckets?

In Hive, you can enable buckets by using the following command:

set hive.enforce.bucketing=true;

Can you overwrite the Hadoop MapReduce configuration in Hive?

Yes, you can overwrite the Hadoop MapReduce configuration in Hive.
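
For example, the number of reducers can be overridden for the session with a SET command (the value 8 is arbitrary):

SET mapred.reduce.tasks=8;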

Explain how can you change a column data type in Hive?

You can change a column data type in Hive by using the command:

ALTER TABLE table_name CHANGE column_name column_name new_datatype;

What is the difference between ORDER BY and SORT BY in Hive?

  • SORT BY sorts the data within each reducer. You can use any number of reducers for a SORT BY operation.
  • ORDER BY sorts all of the data together, which has to pass through one reducer. Thus, ORDER BY in Hive uses a single reducer, as illustrated below.
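
A quick illustration of the two clauses (employee and salary are hypothetical names):

SELECT * FROM employee SORT BY salary;   -- sorted within each reducer only
SELECT * FROM employee ORDER BY salary;  -- total order, single reducer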

When should you use explode in Hive?

Hadoop developers sometimes take an array as input and convert it into separate table rows. To convert complex data types into the desired table format, Hive uses explode, as shown below.
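
A minimal sketch, assuming a table employee with an ARRAY<STRING> column phone_numbers:

SELECT name, phone
FROM employee
LATERAL VIEW explode(phone_numbers) p AS phone;

Each element of the array becomes its own output row.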

How can you stop a partition from being queried?

You can stop a partition from being queried by using the ENABLE OFFLINE clause with the ALTER TABLE statement.

Compare Pig and Hive?

  • Architecture: Pig is a procedural data-flow language, while Hive is an SQL-type declarative language.
  • Application: Pig is used for programming purposes, while Hive is used for report creation.
  • Operational field: Pig operates on the client side, while Hive operates on the server side.
  • Support for Avro files: Pig supports them; Hive does not.

What is the definition of Hive? What is the present version of Hive and explain about ACID transactions in Hive?

Hive is an open-source data warehouse system. We can use Hive for analyzing and querying large data sets stored in Hadoop files. It is similar to SQL. The present version of Hive is 0.13.1. Hive supports ACID (Atomicity, Consistency, Isolation, Durability) transactions: ACID transactions are provided at the row level, with the following operations, so Hive supports ACID transactions.

  • Insert
  • Delete
  • Update

Explain what a Hive variable is. What do we use it for?

A Hive variable is created in the Hive environment and is referenced by Hive scripting languages. It is used to pass values to Hive queries when the query starts executing, for example from scripts run with the source command.

What kind of data warehouse application is suitable for Hive? What are the types of tables in Hive?

Hive is not considered a full database. The design rules and regulations of Hadoop and HDFS put restrictions on what Hive can do. Hive is most suitable for data warehouse applications, where:

  • Relatively static data is analyzed
  • Fast response times are not required
  • The data is not changing rapidly

Hive doesn't provide fundamental features required for OLTP (Online Transaction Processing). Hive is suitable for data warehouse applications on large data sets. There are two types of tables in Hive:

  1. Managed table
  2. External table

Can We Change settings within Hive Session? If Yes, How?

Yes, we can change the settings within a Hive session using the SET command. It helps to change Hive job settings for an exact query.
Example: The following command ensures that buckets are populated according to the table definition.

hive> SET hive.enforce.bucketing=true;

We can see the current value of any property by using SET with the property name. SET on its own will list all the properties with their values as set by Hive.

hive> SET hive.enforce.bucketing;

hive.enforce.bucketing=true

This list will not include the Hadoop defaults. To include them as well, use:

SET -v

It will list all the properties in the system, including the Hadoop defaults.

Is it possible to add 100 nodes when we already have 100 nodes in Hive? How?

Yes, we can add the nodes by following the steps below.

  1. Take a new system; create a new username and password.
  2. Install SSH and set up SSH connections with the master node.
  3. Add the SSH public_rsa id key to the authorized_keys file.
  4. Add the new data node's host name, IP address and other details to the /etc/hosts slaves file, e.g.
    168.1.102 slave3.in slave3.
  5. Start the DataNode on the new node.
  6. Log in to the new node, e.g. suhadoop or ssh -X [email protected]
  7. Start HDFS on the newly added slave node by using the following command:
    ./bin/hadoop-daemon.sh start datanode
  8. Check the output of the jps command on the new node.

Explain the concatenation function in Hive with an example.

The CONCAT function joins the input strings. We can specify any number of strings, separated by commas.
Example:

CONCAT('Intellipaat','-','is','-','a','-','eLearning','-','provider');

Output:

Intellipaat-is-a-eLearning-provider

Here the separator '-' has to be repeated between every pair of strings. If the separator is the same for every string, Hive provides another function, CONCAT_WS, in which the separator is specified first:

CONCAT_WS('-','Intellipaat','is','a','eLearning','provider');

Output: Intellipaat-is-a-eLearning-provider

Explain the TRIM and REVERSE functions in Hive with examples.

The TRIM function deletes the spaces surrounding a string.
Example:

TRIM('  INTELLIPAAT  ');

Output:

INTELLIPAAT

To remove the leading spaces:

LTRIM('  INTELLIPAAT');

To remove the trailing spaces:

RTRIM('INTELLIPAAT  ');

The REVERSE function reverses the characters in a string.

Example:

REVERSE('INTELLIPAAT');

Output:

TAAPILLETNI

How do you change a column data type in Hive? Explain RLIKE in Hive.

We can change the column data type by using ALTER and CHANGE.
The syntax is:

ALTER TABLE table_name CHANGE column_name column_name new_datatype;

Example: If we want to change the data type of the salary column from integer to bigint in the employee table:

ALTER TABLE employee CHANGE salary salary BIGINT;

RLIKE (regexp-like) is a special function in Hive. It examines whether one string matches another: if any substring of A matches B, it evaluates to true. B may be a Java regular expression.
Example:

'Intellipaat' RLIKE 'tell'  → true
'Intellipaat' RLIKE '^I.*'  → true (this is a regular expression)

What are the components used in Hive query processor?

The components of a Hive query processor include

  • Logical Plan Generation
  • Physical Plan Generation
  • Execution Engine
  • UDFs and UDAFs
  • Semantic Analyzer
  • Type Checking

What are buckets in Hive?

The data present in partitions can be divided further into buckets. The division is performed on the basis of the hash of particular table columns.

Explain the process to access subdirectories recursively in Hive queries.

By using the commands below, we can access subdirectories recursively in Hive:

hive> Set mapred.input.dir.recursive=true;

hive> Set hive.mapred.supports.subdirectories=true;

Hive tables can be pointed at a higher-level directory; this is suitable for a directory structure like /data/country/state/city/.

How do you skip header rows from a table in Hive?

Suppose a log file starts with header records such as:

System=....
Version=...
Sub-version=....

We do not want to include these three header lines in our Hive queries. To skip the header lines from our tables in Hive, set a table property that allows us to skip them:

CREATE EXTERNAL TABLE employee (
  name STRING,
  job STRING,
  dob STRING,
  id INT,
  salary INT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ' ' STORED AS TEXTFILE
LOCATION '/user/data'
TBLPROPERTIES("skip.header.line.count"="3");

What is the maximum size of the string data type supported by Hive? Which binary formats does Hive support?

The maximum size of the string data type supported by Hive is 2 GB.
Hive supports the text file format by default, and it supports the binary formats Sequence files, ORC files, Avro data files, and Parquet files.
Sequence files: The general binary format; splittable, compressible, and row oriented.
ORC files: ORC stands for Optimized Row Columnar. An ORC file is a record-columnar, column-oriented storage file. It divides the table into row splits, and within each split the data is stored column by column.
Avro data files: The same as Sequence files (splittable, compressible, and row oriented), except that they also support schema evolution and multilingual binding.

What is the precedence order of Hive configuration?

The following precedence hierarchy is used for setting properties:

  1. The SET command in Hive
  2. The command-line --hiveconf option
  3. hive-site.xml
  4. hive-default.xml
  5. hadoop-site.xml
  6. hadoop-default.xml

If you run a SELECT * query in Hive, why does it not run MapReduce?

The hive.fetch.task.conversion property of Hive lowers the latency of MapReduce overhead: when executing simple queries such as SELECT *, FILTER, LIMIT, etc., it skips the MapReduce step.
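
For example (the property takes the values none, minimal, or more; employee is a hypothetical table):

SET hive.fetch.task.conversion=more;
SELECT * FROM employee LIMIT 10;  -- served as a simple fetch task, with no MapReduce job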

How can Hive improve performance with ORC format tables?

We can store Hive data in a highly efficient manner in the Optimized Row Columnar (ORC) file format. It overcomes many limitations of other Hive file formats. Using ORC files improves performance when reading, writing, and processing data.

SET hive.compute.query.using.stats=true;
SET hive.stats.dbclass=fs;

CREATE TABLE orc_table (
  id INT,
  name STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\:'
LINES TERMINATED BY '\n'
STORED AS ORC;

Explain the functionality of ObjectInspector.

It helps to analyze the internal structure of a row object and the individual structure of columns in Hive. It also provides a uniform way to access complex objects that can be stored in multiple formats in memory, such as:

  • An instance of a Java class
  • A standard Java object
  • A lazily-initialized object

The ObjectInspector tells us the structure of the object and also the ways to access the internal fields inside the object.

Whenever we run a Hive query, a new metastore_db is created. Why?

A local metastore is created when we run Hive in embedded mode. Before creating it, Hive checks whether a metastore already exists, and this metastore property is defined in the configuration file hive-site.xml. The property is "javax.jdo.option.ConnectionURL" with the default value "jdbc:derby:;databaseName=metastore_db;create=true". To change this behavior, change the location to an absolute path, so that the metastore at that location will be used.
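
A sketch of the corresponding hive-site.xml entry with an absolute path (the path shown is only an example):

<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:derby:;databaseName=/home/hiveuser/metastore_db;create=true</value>
</property>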

What are the uses of explode in Hive?

Hadoop developers take an array as input and convert it into separate table rows. To convert complex data types into the desired table format, Hive essentially uses explode.

What mechanisms are available for connecting from applications when we run Hive as a server?

  1. Thrift Client: Using Thrift you can call Hive commands from various programming languages, e.g. C++, PHP, Java, Python, and Ruby.

  2. JDBC Driver: It supports the Type 4 (pure Java) JDBC protocol.

  3. ODBC Driver: It supports the ODBC protocol.

How do we write our own custom SerDe?

In most cases, end users want to read their own data format rather than write it, so it is usually enough to implement a Deserializer instead of a full SerDe.
Example: The RegexDeserializer deserializes the data using the configuration parameter 'regex' and a list of column names.
If our SerDe supports DDL, we probably want to implement a protocol based on DynamicSerDe; however, it is non-trivial to write a "thrift DDL" parser.

Mention the date data type in Hive. Name the Hive collection data types.

The TIMESTAMP data type stores dates in java.sql.Timestamp format.

There are three collection data types in Hive (a combined example follows the list):

  1. ARRAY
  2. MAP
  3. STRUCT
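
A minimal sketch showing all three collection types in one table definition (the names are illustrative):

CREATE TABLE emp_details (
  name STRING,
  skills ARRAY<STRING>,
  grades MAP<STRING, INT>,
  address STRUCT<city:STRING, zip:INT>
);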

Can we run UNIX shell commands from Hive? Can Hive queries be executed from script files? How? Give an example.

Yes, we can run UNIX shell commands from Hive using the ! mark before the command. For example, !pwd at the Hive prompt will list the current directory.
We can execute Hive queries from script files by using the source command.
Example:

hive> source /path/to/file/file_with_query.hql

How does Facebook use Hadoop, Hive and HBase?

Facebook's data is stored on HDFS; every day millions of photos are uploaded to Facebook with the help of Hadoop. Facebook Messages, Likes and status updates run on top of HBase, and Hive is used to generate reports for third-party developers and advertisers who need to track the success of their applications or campaigns.

What is the difference between HBase and Hive?

Hive and HBase are different technologies that are both based on Hadoop. Hive is a data warehouse infrastructure used on Hadoop, whereas HBase is a NoSQL key-value store that runs on Hadoop itself. Hive helps those who know SQL run MapReduce jobs, while HBase supports four operations: put, get, scan and delete. HBase is good for querying individual records, but Hive is good for analytical queries over data collected over a period of time.

What is the Hive Metastore?

The Hive Metastore is a database that stores metadata about your Hive tables, such as table name, column names, data types, table location, number of buckets in the table, etc.

Which Hadoop versions does the new Hive version support?

This release works with Hadoop 0.20.x, 0.23.x.y, 1.x.y and 2.x.y.

Which companies mostly use Hive?

Facebook and Netflix.

Wherever (in whatever directory) I run a Hive query, it creates a new metastore_db. Please explain the reason for it.

Whenever you run Hive in embedded mode, it creates a local metastore. Before creating the metastore it checks whether a metastore already exists or not. This property is defined in the configuration file hive-site.xml. The property is "javax.jdo.option.ConnectionURL" with the default value "jdbc:derby:;databaseName=metastore_db;create=true". So to change this behavior, change the location to an absolute path, and the metastore will be used from that location.

Is it possible for multiple users to use the same metastore in the case of embedded Hive?

No, it is not possible to use the metastore in sharing mode. It is recommended to use a standalone "real" database like MySQL or PostgreSQL.

What is the functionality of the Query Processor in Apache Hive?

This component implements the processing framework for converting SQL to a graph of map/reduce jobs, and the execution-time framework to run those jobs in the order of their dependencies.

Are multi-line comments supported in Hive scripts?

No.

What is a Hive Metastore?

The Hive Metastore is a central repository that stores metadata in an external database.

Explain the SMB join in Hive.

In an SMB join in Hive, each mapper reads a bucket from the first table and the corresponding bucket from the second table, and then a merge-sort join is performed. Sort Merge Bucket (SMB) join is mainly used because it places no limit on file, partition, or table size in a join. SMB join is best used when the tables are large. In an SMB join the columns are bucketed and sorted using the join columns, and all tables should have the same number of buckets.
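
A sketch of the session settings commonly used to let Hive convert a join into an SMB join (whether the conversion actually happens depends on the tables being bucketed and sorted on the join keys):

SET hive.auto.convert.sortmerge.join=true;
SET hive.optimize.bucketmapjoin=true;
SET hive.optimize.bucketmapjoin.sortedmerge=true;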

What is ObjectInspector functionality?

ObjectInspector is used to analyze the structure of individual columns and the internal structure of the row objects. ObjectInspector in Hive provides access to complex objects which can be stored in multiple formats.

Explain the different types of joins in Hive.

HiveQL has four different types of joins:

JOIN - similar to an inner join in SQL.

FULL OUTER JOIN – Combines the records of both the left and right outer tables that fulfil the join condition.

LEFT OUTER JOIN- All the rows from the left table are returned even if there are no matches in the right table.

RIGHT OUTER JOIN-All the rows from the right table are returned even if there are no matches in the left table.
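
A minimal illustration of one of these joins (the table and column names are hypothetical):

SELECT e.name, d.dept_name
FROM employee e
LEFT OUTER JOIN department d ON (e.dept_id = d.id);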

Is it possible to change the default location of managed tables in Hive? If so, how?

Yes, we can change the default location of managed tables by using the LOCATION keyword while creating the managed table. The user has to specify the storage path of the managed table as the value of the LOCATION keyword, as sketched below.
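
A minimal sketch (the path is only an example):

CREATE TABLE managed_emp (id INT, name STRING)
LOCATION '/user/hive/custom_warehouse/managed_emp';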

How can you connect an application if you run Hive as a server?

When running Hive as a server, an application can connect in one of three ways:

ODBC Driver - supports the ODBC protocol

JDBC Driver - supports the JDBC protocol

Thrift Client - can be used to make calls to all Hive commands using different programming languages like PHP, Python, Java, C++ and Ruby.

Which classes are used by the Hive to Read and Write HDFS Files?

The following classes are used by Hive to read and write HDFS files:

  • TextInputFormat/HiveIgnoreKeyTextOutputFormat: These two classes read/write data in plain text file format.
  • SequenceFileInputFormat/SequenceFileOutputFormat: These two classes read/write data in Hadoop SequenceFile format.

Is it possible to create multiple tables in Hive for the same data?

Yes. Hive creates a schema on top of an existing data file, so one can have multiple schemas for one data file; each schema is saved in Hive's metastore, and the data is not parsed or serialized to disk against a given schema. The schema is applied only when the data is retrieved. For example, if a data file has 5 columns (name, job, dob, id, salary), we can define multiple tables by choosing any number of columns from that list (a table with 3 columns, 5 columns or 6 columns), as sketched below.
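
A sketch of two schemas over the same file (the paths and names are made up):

CREATE EXTERNAL TABLE emp_full (name STRING, job STRING, dob STRING, id INT, salary INT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LOCATION '/user/data/emp';

CREATE EXTERNAL TABLE emp_short (name STRING, job STRING, dob STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LOCATION '/user/data/emp';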

What kind of datawarehouse application is suitable for Hive?

Hive is not a full database. The design constraints and limitations of Hadoop and HDFS impose limits on what Hive can do. Hive is most suited for data warehouse applications, where

1) Relatively static data is analyzed,

2) Fast response times are not required, and

3) When the data is not changing rapidly.

Hive doesn't provide crucial features required for OLTP (Online Transaction Processing). It's closer to being an OLAP tool (Online Analytic Processing). So, Hive is best suited for data warehouse applications, where a large data set is maintained and mined for insights, reports, etc.

What are the binary storage formats supported in Hive?

By default Hive supports the text file format; however, Hive also supports the binary formats below.

Sequence files, Avro data files, RCFiles, ORC files, Parquet files

Sequence files: A general binary format; splittable, compressible and row oriented. A typical example: if we have lots of small files, we may use a Sequence file as a container, where the file name is the key and the content is stored as the value. It supports compression, which enables huge gains in performance.

Avro data files: The same as Sequence files (splittable, compressible and row oriented), except that they also support schema evolution and multilingual bindings.

RCFiles: Record Columnar files; a column-oriented storage format. It breaks the table into row splits, and within each split stores the data column by column.

ORC files: Optimized Row Columnar files.

Describe the CONCAT function in Hive with an example.

The CONCAT function concatenates the input strings. You can specify any number of strings separated by commas, as in the example below.
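
For example:

CONCAT('Hadoop','-','Hive');

Output: Hadoop-Hive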

Is HQL case sensitive?

HQL is not case sensitive.

Describe the REPEAT function in Hive with an example.

The REPEAT function repeats the input string the number of times specified in the command, as shown below.
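
For example:

REPEAT('Hive', 3);

Output: HiveHiveHive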

Describe the REVERSE function in Hive with an example.

The REVERSE function reverses the characters in a string, as shown below.
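
For example:

REVERSE('Hive');

Output: eviH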

Describe the LOWER or LCASE function in Hive with an example.

The LOWER or LCASE function converts the input string to lower-case characters, as shown below.
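
For example:

LOWER('HADOOP');

Output: hadoop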

Describe the UPPER or UCASE function in Hive with an example.

The UPPER or UCASE function converts the input string to upper-case characters, as shown below.
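
For example:

UPPER('hadoop');

Output: HADOOP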

Rename a table in Hive - how do you do it?

Using the ALTER command, we can rename a table in Hive.

ALTER TABLE hive_table_name RENAME TO new_name;

What is RLIKE in Hive?

RLIKE (regexp-like) is a special function in Hive: if any substring of A matches B, it evaluates to true. It follows Java regular expression patterns, and users don't need to put a % symbol for a simple match when using RLIKE.

What is the difference between an external table and an internal table in Hive?

Hive has a relational database on the master node that it uses to keep track of state. For instance, when you run CREATE TABLE FOO(foo string) LOCATION 'hdfs://tmp/';, the table schema is stored in that database. If you have a partitioned table, the partitions are stored in the database as well (this allows Hive to use lists of partitions without going to the file system to find them). These sorts of things are the 'metadata'.

When you drop an internal table, it drops the data, and it also drops the metadata. When you drop an external table, it only drops the metadata; that means Hive is now ignorant of that data. It does not touch the data itself.

Does Hive support record-level insert, delete or update?

Hive does not provide record-level update, insert or delete. Hence, Hive does not provide transactions either. However, users can use CASE statements and built-in functions of Hive to approximate these DML operations. Thus, a complex update query in an RDBMS may need many lines of code in Hive.

Is Hive suitable to be used for OLTP systems? Why?

No. Hive does not provide insert and update at the row level, so it is not suitable for OLTP systems.

Can we change the data type of a column in a Hive table?

Yes, using the REPLACE COLUMNS option:

ALTER TABLE table_name REPLACE COLUMNS ……

Describe the TRIM function in Hive with an example.

The TRIM function removes the spaces surrounding a string. Example:

TRIM('  Hadoop  ');

Output: Hadoop

Why do we need Hive?

Hive is a tool in the Hadoop ecosystem that provides an interface to organize and query data in a database-like fashion and to write SQL-like queries. It is suitable for accessing and analyzing data in Hadoop using SQL syntax.

Is there a date data type in Hive?

Yes. The TIMESTAMP data type stores dates in java.sql.Timestamp format.

While loading data into a Hive table using the LOAD DATA clause, how do you specify that it is an HDFS file and not a local file?

By omitting the LOCAL clause in the LOAD DATA statement, as illustrated below.
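
A minimal illustration (the table and paths are hypothetical):

LOAD DATA INPATH '/user/data/emp.txt' INTO TABLE employee;   -- HDFS file
LOAD DATA LOCAL INPATH '/tmp/emp.txt' INTO TABLE employee;   -- local file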

What does the “USE” command in Hive do?

With the USE command you fix the database on which all subsequent Hive queries will run.

How can you delete a DBPROPERTY in Hive?

There is no way to delete a DBPROPERTY.

Does archiving Hive tables give any space saving in HDFS?

No. It only reduces the number of files, which makes them easier for the NameNode to manage.

What is the usefulness of the DISTRIBUTE BY clause in Hive?

It controls how the map output is divided among the reducers. It is useful in the case of streaming data.

Can a partition be archived? What are the advantages and disadvantages?

Yes, a partition can be archived. The advantage is that it decreases the number of files stored in the NameNode, and an archived file can still be queried using Hive. The disadvantage is that it makes queries less efficient and does not offer any space savings.
