Explain the term HCatalog.
Ans. In simple words, HCatalog is a table storage management tool for Hadoop. Its main function is to expose the tabular data of the Hive metastore to other Hadoop applications. It enables users with different data processing tools (Pig, MapReduce) to easily read and write data on the grid, without worrying about where or in what format the data is stored.
So, basically, HCatalog is a key component of Hive that enables users to store their data in any format and any structure.
Explain how HCatalog enables the right tool for the right job.
Ans. The Hadoop ecosystem contains different tools for data processing, such as Hive, Pig, and MapReduce. Although these tools do not require metadata, they can still benefit from it when it is present. Sharing a metadata store also enables users across tools to share data more easily. A workflow where data is loaded and normalized using Pig or MapReduce and then analyzed via Hive is very common. If all these tools share one metastore, users of each tool have immediate access to data created with another tool, with no loading or transfer steps needed.
How does HCatalog help capture processing states to enable sharing?
Ans. We can publish our analytics results via HCatalog, so other programmers can access our analytics platform through REST. The schemas we publish are also useful to other data scientists, who can use our discoveries as inputs into a subsequent discovery.
HCatalog helps to integrate Hadoop with everything. Explain.
Ans. Hadoop opens up a lot of opportunities for the enterprise, but it must work with and augment existing tools in order to fuel adoption. HCatalog's REST services open up the platform to the enterprise with a familiar API and a SQL-like language. In addition, enterprise data management systems use HCatalog to integrate more deeply with the Hadoop platform.
What are the general prerequisites to learn HCatalog?
Ans. In order to learn HCatalog, an individual must have a basic knowledge of Core Java along with the database concepts of SQL. In addition, one must know the Hadoop file system and any flavor of the Linux operating system.
Who is intended audience to learn HCatalog?
Ans. The professionals those are aspiring to make a career in Big Data Analytics by using Hadoop Framework, must go for this tutorial. Apart from them, all the ETL developers and professionals those are into analytics, in general, can learn through this tutorial for good effect.
Why HCatalog?
Ans. Some specific reasons for using HCatalog are:
Enabling the right tool for the right job
Capturing processing states to enable sharing
Integrating Hadoop with everything
Explain the HCatalog architecture in brief.
Ans. HCatalog supports reading and writing files in any format for which a SerDe (serializer-deserializer) can be written. Formats like RCFile, CSV, JSON, SequenceFile, and ORC are supported by default. To use a custom format, we must provide the InputFormat, OutputFormat, and SerDe.
HCatalog is built on top of the Hive metastore and incorporates Hive's DDL. It offers read and write interfaces for Pig and MapReduce, and it uses Hive's command line interface for issuing data definition and metadata exploration commands.
How to invoke the Command Line Interface?
Ans. The HCatalog Command Line Interface (CLI) can be invoked from the command $HIVE_HOME/HCatalog/bin/hcat, where $HIVE_HOME is the home directory of Hive. We use the hcat command to initialize the HCatalog server.
Command to initialize the HCatalog command line:
cd $HCAT_HOME/bin
./hcat
State some command line options.
Ans. Options supported by the HCatalog CLI are −
-g
Usage: hcat -g mygroup ...
The table to be created must have the group "mygroup".
-p
Usage: hcat -p rwxr-xr-x ...
The table to be created must have read, write, and execute permissions as specified.
-f
Usage: hcat -f myscript.HCatalog ...
myscript.HCatalog is a script file containing DDL commands to execute.
-e
Usage: hcat -e 'create table mytable(a int);' ...
Treats the following string as a DDL command and executes it.
-D
Usage: hcat -Dkey=value ...
Passes the key-value pair to HCatalog as a Java system property.
hcat
Without any options, prints a usage message.
Explain the ALTER TABLE statement in HCatalog.
Ans. In order to alter a table, we can use the ALTER TABLE statement.
Syntax −
There are various forms of the syntax; we can use any of them according to which attributes we want to modify in a table:
ALTER TABLE name RENAME TO new_name
ALTER TABLE name ADD COLUMNS (col_spec[, col_spec ...])
ALTER TABLE name DROP [COLUMN] column_name
ALTER TABLE name CHANGE column_name new_name new_type
ALTER TABLE name REPLACE COLUMNS (col_spec[, col_spec ...])
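For example, to rename a table (the table names here are hypothetical):
ALTER TABLE employee RENAME TO emp;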
How can we drop a table in HCatalog?
Ans. Dropping a table removes the table/column data and their metadata. The table can be of any type: either a normal (managed) table, whose data is stored in Hive's warehouse directory, or an external table, whose data is stored outside the warehouse. The DROP TABLE syntax is the same for both types; note, however, that dropping an external table removes only the metadata and leaves the data itself in place.
Syntax −
DROP TABLE [IF EXISTS] table_name;
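For example, from the HCatalog CLI (the table name is hypothetical):
./hcat -e "DROP TABLE IF EXISTS employee;"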
State some DDL commands with a brief description.
Ans. Some DDL commands are:
CREATE TABLE − creates a table in the Hive metastore
ALTER TABLE − modifies the attributes of an existing table
DROP TABLE − removes a table along with its data and metadata
CREATE/ALTER/DROP VIEW − creates, modifies, or removes a view
SHOW TABLES − lists the tables in a database
SHOW PARTITIONS − lists the partitions of a table
CREATE/DROP INDEX − creates or removes an index on a table column
DESCRIBE − displays the schema of a table
Explain the HCatalog CREATE TABLE statement along with its syntax.
Ans. In HCatalog, we use the CREATE TABLE statement to create a table in the Hive metastore.
Syntax −
CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.] table_name
[(col_name data_type [COMMENT col_comment], ...)]
[COMMENT table_comment]
[ROW FORMAT row_format]
[STORED AS file_format]
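For example (the table name and columns are hypothetical):
CREATE TABLE IF NOT EXISTS employee (eid INT, name STRING, salary STRING)
COMMENT 'Employee details'
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE;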
Which command do we use to insert data in HCatalog?
Ans. In SQL, we can generally insert data using the INSERT statement just after creating a table. In HCatalog, however, we use the LOAD DATA statement to insert data. LOAD DATA is the better option when storing bulk records. Data can be loaded in two ways: from the local file system or from the Hadoop file system.
Syntax −
LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] INTO TABLE tablename
[PARTITION (partcol1=val1, partcol2=val2 ...)]
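For example (the file path and table name are hypothetical):
LOAD DATA LOCAL INPATH '/home/user/sample.txt' OVERWRITE INTO TABLE employee;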
How to create and manage a view in HCatalog?
Ans. The CREATE VIEW statement creates a view with the given name. An error is thrown if a table or view with the same name already exists; we can skip the error by using the IF NOT EXISTS option.
Syntax −
CREATE VIEW [IF NOT EXISTS] [db_name.]view_name [(column_name [COMMENT column_comment], ...) ]
[COMMENT view_comment]
[TBLPROPERTIES (property_name = property_value, ...)]
AS SELECT ...;
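For example (the view, table, and column names are hypothetical):
CREATE VIEW IF NOT EXISTS emp_30000 AS
SELECT * FROM employee
WHERE salary > 30000;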
Explain the DROP VIEW statement along with its syntax.
Ans. The DROP VIEW statement in HCatalog removes the metadata for the specified view. Note that no warning is given when dropping a view that is referenced by other views.
Syntax −
DROP VIEW [IF EXISTS] view_name;
Explain the HCatLoader and HCatStorer APIs.
Ans. HCatLoader
We use HCatLoader with Pig scripts to read data from HCatalog-managed tables.
Syntax:
A = LOAD 'tablename' USING org.apache.hcatalog.pig.HCatLoader();
HCatStorer
Whereas to write data to HCatalog-managed tables, we can use HCatStorer along with Pig scripts.
Syntax:
A = LOAD ...
B = FOREACH A ...
...
...
my_processed_data = ...
STORE my_processed_data INTO 'tablename' USING org.apache.hcatalog.pig.HCatStorer();
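A minimal concrete sketch of this load-filter-store flow (the table names and the status field are hypothetical; in recent Hive releases the package is org.apache.hive.hcatalog.pig instead):
A = LOAD 'web_logs' USING org.apache.hcatalog.pig.HCatLoader();
B = FILTER A BY status == 200;
STORE B INTO 'processed_logs' USING org.apache.hcatalog.pig.HCatStorer();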
Name all HCatalog features.
Ans. Here is the list of key HCatalog features:
Table and storage management layer
Table abstraction layer
Any Format
Shared schema and data type
Integration with other tools
Expose the information
Binary format
Authentication
Adding columns to partitions
Support Hive tables
Name applications and use cases of HCatalog.
Ans. Some key uses are:
Enabling the right tool for the right job
Capturing processing states to enable sharing
Integrating Hadoop with everything
Some applications of HCatalog:
SQL interface for Hadoop? HCatalog as enabler
Hadoop developer productivity and HCatalog
Good for the ecosystem is good for you
Which command is used to list all the tables in a database or all the columns in a table?
Ans. The SHOW TABLES statement displays the names of all tables. By default, it lists tables from the current database; with the IN clause, it lists tables from a specified database.
The syntax of SHOW TABLES is −
SHOW TABLES [IN database_name] ['identifier_with_wildcards'];
A query which displays a list of tables −
./hcat -e "SHOW TABLES;"
Which command is used to list partitions in HCatalog?
Ans. To see the partitions that exist in a particular table, we use the SHOW PARTITIONS command.
Syntax −
SHOW PARTITIONS table_name;
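For example (the table name is hypothetical):
./hcat -e "SHOW PARTITIONS employee;"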
State the syntax of the command used to drop a partition.
Ans. In order to drop a partition, the syntax is −
./hcat -e "ALTER TABLE table_name DROP [IF EXISTS] PARTITION (partition_spec)
[, PARTITION (partition_spec), ...];"
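For example, to drop a single partition (the table and partition values are hypothetical):
./hcat -e "ALTER TABLE employee DROP IF EXISTS PARTITION (year='2012');"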
Explain creating an index.
Ans. In simple words, an index is a pointer on a particular column of a table. Creating an index, then, simply means defining such a pointer on a particular column so that lookups on that column are faster.
Syntax:
CREATE INDEX index_name
ON TABLE base_table_name (col_name, ...)
AS 'index.handler.class.name'
[WITH DEFERRED REBUILD]
[IDXPROPERTIES (property_name = property_value, ...)]
[IN TABLE index_table_name]
[PARTITIONED BY (col_name, ...)]
[
[ ROW FORMAT ...] STORED AS ...
| STORED BY ...
]
[LOCATION hdfs_path]
[TBLPROPERTIES (...)]
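For example, using Hive's compact index handler (the index, table, and column names are hypothetical):
CREATE INDEX emp_index
ON TABLE employee (salary)
AS 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler'
WITH DEFERRED REBUILD;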
State the syntax of the command to drop an index.
Ans. The syntax to drop an index is −
DROP INDEX <index_name> ON <table_name>
What is the role of the data transfer API in HCatalog?
Ans. HCatalog provides a data transfer API for parallel input and output without using MapReduce. The API uses a basic storage abstraction of tables and rows for reading data from and writing data into a Hadoop cluster.
What are the main classes of the Data Transfer API?
Ans. The Data Transfer API contains the following classes −
HCatReader − reads data from a Hadoop cluster.
HCatWriter − writes data into a Hadoop cluster.
DataTransferFactory − generates reader and writer instances.
Explain HCatReader.
Ans. HCatReader is an abstract class internal to HCatalog. It abstracts away the complexities of the underlying system from where the records are to be retrieved; instances are obtained through DataTransferFactory.
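A minimal sketch of the read flow, assuming the shape of the HCatalog ReaderWriter example and the org.apache.hive.hcatalog.data.transfer package layout; method signatures have varied across Hive releases, and the database and table names here are hypothetical:
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import org.apache.hive.hcatalog.data.HCatRecord;
import org.apache.hive.hcatalog.data.transfer.DataTransferFactory;
import org.apache.hive.hcatalog.data.transfer.HCatReader;
import org.apache.hive.hcatalog.data.transfer.ReadEntity;
import org.apache.hive.hcatalog.data.transfer.ReaderContext;

public class ReadSketch {
    // Master side: describe the table to read and prepare the read.
    public static ReaderContext prepareRead() throws Exception {
        ReadEntity entity = new ReadEntity.Builder()
                .withDatabase("mydb")   // hypothetical database
                .withTable("mytable")   // hypothetical table
                .build();
        Map<String, String> config = new HashMap<String, String>(); // metastore connection properties
        HCatReader masterReader = DataTransferFactory.getHCatReader(entity, config);
        return masterReader.prepareRead(); // context is serialized and shipped to slave nodes
    }

    // Slave side: each slave reads its own slice of the table in parallel.
    public static void readSlice(ReaderContext context, int sliceNumber) throws Exception {
        HCatReader reader = DataTransferFactory.getHCatReader(context, sliceNumber);
        Iterator<HCatRecord> records = reader.read();
        while (records.hasNext()) {
            HCatRecord record = records.next();
            // process the record here
        }
    }
}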
Explain HCatWriter.
Ans. HCatWriter is an abstraction internal to HCatalog. Its main function is to facilitate writing to HCatalog from external systems. Make sure to obtain instances through DataTransferFactory rather than instantiating HCatWriter directly.
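A matching sketch of the write flow, under the same assumptions as the read sketch above (package layout and signatures vary across Hive releases; the table name is hypothetical):
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import org.apache.hive.hcatalog.data.HCatRecord;
import org.apache.hive.hcatalog.data.transfer.DataTransferFactory;
import org.apache.hive.hcatalog.data.transfer.HCatWriter;
import org.apache.hive.hcatalog.data.transfer.WriteEntity;
import org.apache.hive.hcatalog.data.transfer.WriterContext;

public class WriteSketch {
    public static void write(Iterator<HCatRecord> records) throws Exception {
        // Master side: obtain the writer via DataTransferFactory, never by direct instantiation.
        WriteEntity entity = new WriteEntity.Builder()
                .withTable("mytable") // hypothetical table
                .build();
        Map<String, String> config = new HashMap<String, String>(); // metastore connection properties
        HCatWriter master = DataTransferFactory.getHCatWriter(entity, config);
        WriterContext context = master.prepareWrite();

        // Slave side: write records in parallel using the shipped context.
        HCatWriter slaveWriter = DataTransferFactory.getHCatWriter(context);
        slaveWriter.write(records);

        // Master side: commit once all slaves finish; call master.abort(context) on failure.
        master.commit(context);
    }
}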
Explain HCatInputFormat and HCatOutputFormat.
Ans. HCatInputFormat −
We use HCatInputFormat with MapReduce jobs to read data from HCatalog-managed tables. It exposes a Hadoop 0.20 MapReduce API for reading data as if it had been published to a table.
HCatOutputFormat −
Similarly, we use HCatOutputFormat with MapReduce jobs, but to write data to HCatalog-managed tables. It also exposes a Hadoop 0.20 MapReduce API, for the purpose of writing data to a table.
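A minimal driver sketch showing how both classes are typically wired into a job. The helper methods follow the org.apache.hive.hcatalog.mapreduce package, but signatures vary across Hive releases (older releases use setInput(job, InputJobInfo.create(...))), and the database and table names are hypothetical:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hive.hcatalog.mapreduce.HCatInputFormat;
import org.apache.hive.hcatalog.mapreduce.HCatOutputFormat;
import org.apache.hive.hcatalog.mapreduce.OutputJobInfo;

public class DriverSketch {
    public static Job configure(Configuration conf) throws Exception {
        Job job = Job.getInstance(conf, "hcat-example");
        // Read from an HCatalog-managed table.
        HCatInputFormat.setInput(job, "mydb", "input_table");
        job.setInputFormatClass(HCatInputFormat.class);
        // Write to another HCatalog-managed table; null means no static partition values.
        HCatOutputFormat.setOutput(job, OutputJobInfo.create("mydb", "output_table", null));
        HCatOutputFormat.setSchema(job, HCatOutputFormat.getTableSchema(job.getConfiguration()));
        job.setOutputFormatClass(HCatOutputFormat.class);
        return job;
    }
}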