What is Apache HBase?
One of the sub-project of Apache Hadoop, which was mainly designed for NoSql database(Hadoop Database), Big Data store and a distributed, scalable is what we call HBase. Especially, when we need random, real-time read/write access to our Big Data we use HBase. Moreover, after Google’s Bigtable it is an open-source, distributed, versioned, non-relational database modeled.
Should the region server be located on all DataNodes?
Absolutely yes because Region Servers run on the same servers as DataNodes.
Suppose that your data is stored in collections, for instance, some binary data, message data or metadata is all keyed on the same value. Will you use HBase for this?
Definitely yes. Since, whenever key-based access to data is required for storing and retrieving, it is ideal to use HBase.
Assume that an HBase table Student is disabled. So, how to access the student table once it is disabled, by using Scan command?
With the help of Scan command, any HBase table that is disabled cannot be accessed.
What do you understand by compaction?
By having one file per store, it is not possible to achieve optimal performance, during periods of heavy incoming writes so as to reduce the number of disk seeds for every read HBase combines all these HFiles. It is what we call Compaction in HBase.
Explain the various table design approaches in HBase.
There are two HBase table design approaches we can use. Tall-Narrow and Flat-Wide.
The major difference between flat-wide and tall-narrow approach is when there are less number of rows and a large number of columns we use a tall-narrow approach or when there is less number of columns and a large number of rows, we use flat-wide approach.
What is the best practice on deciding the number of column families for HBase table?
Since every column family in HBase is stored as a single file, it is ideal not to exceed the number of columns families per HBase table by 15. Hence, to read and merge multiple files, we will need a large number of columns families.
How will you implement joins in HBase?
By using MapReduce jobs join queries, HBase does support joins. Basically, it helps to retrieve data from various HBase tables.
Define MapReduce.
To solve the problem of processing in excess of terabytes of data in a scalable way, we use MapReduce as a process.
What is the full form of MSLAB?
MSLAB refers to Memstore-Local Allocation Buffer.
Which type of data HBase can store?
Any type of data that we can convert into the bytes, HBase can store.
Features of HBase.
Some best features of HBase:
Consistency
Atomic read and write
Sharding
High Availability
Client API
Scalability
Hadoop/HDFS integration
Distributed storage
Data Replication
Failover support and load sharing
API support
MapReduce support
Back up support
Sorted row keys
Real-time processing
faster lookups
Type of Data
Schema-less
High throughput
Easy to use Java API for client access
Thrift gateway and a REST-ful Web service
What is Client API?
In order to perform CRUD operations on HBase tables, we use Java client API for HBase. Because of Java Native API of HBase, it offers programmatic access to DML (Data Manipulation Language).
Explain MemStore?
On defining MemStore, updates in memory(sorted KeyValues) are stored in the MemStore. And, also the Data which consists of sorted key/values is stored in an HFile.
HBase Operational Commands are:
Get
Delete
Put
Increment
Scan
Which code do we use to open the connection in Hbase?
To open a connection, following code:
Configuration myConf = HBaseConfiguration.create();
HTableInterface usersTable = new HTable(myConf, “users”);
Which command do we use to show the version?
In order to show the version of HBase, we use version command:
Syntax –
hbase> version
What is the use of tools command?
To list the HBase surgery tools we use this command.
What is NoSQL?
“NoSQL” database means the database isn’t an RDBMS which supports SQL as its primary access language. HBase is a type of NoSQL Database.
However, we can say, HBase is really more a “Data Store” since it lacks many of the features we find in an RDBMS. For example, secondary indexes, triggers, typed columns, and advanced query languages, etc.
What do you understand by Filters in HBase?
Basically, by allowing users to add limiting selectors to a query and eliminate the data that is not required, HBase filters enhance the effectiveness of working with large data stored in tables.
There are 18 filters in HBase–
TimestampsFilter
PageFilter
MultipleColumnPrefixFilter
FamilyFilter
ColumnPaginationFilter
SingleColumnValueFilter
RowFilter
QualifierFilter
ColumnRangeFilter
ValueFilter
PrefixFilter
SingleColumnValueExcludeFilter
ColumnCountGetFilter
InclusiveStopFilter
DependentColumnFilter
FirstKeyOnlyFilter
KeyOnlyFilter
Explain about the data model operations in HBase.
Data model operations are:
Put Method – It helps to store data in HBase.
Get Method – In HBase, it helps to retrieve data which is stored.
Delete Method – It helps to delete the data from HBase tables.
Scan Method –This operation helps to iterate over the data with larger key ranges or the entire table.
How will you backup an HBase cluster?
In 2 ways we can perform HBase cluster backups –
Live Cluster Backup: In this method, to copy the data from one table to another on the same cluster or another cluster copy table utility is used. Also, to dump the contents of the table onto HDFS on the same cluster, the export utility can be used.
Full Shutdown Backup:
While here, a periodic complete shutdown of the HBase cluster is performed And, only for back-end analytic capacity this kind of approach can be used.
Does HBase support SQL like syntax?
There is no SQL like support in HBase
Is it possible to iterate through the rows of HBase table in reverse order?
No.
In order to shut down the cluster, we use it.
What is the use of truncate command?
we use it to disable, recreate and drop the specified tables.
Which command do we use to run HBase Shell?
To run the HBase shell, we use $ ./bin/HBase shell command.
Which command is available to show the current HBase user?
To show the current HBase user, HBase uses the whoami command.
How to delete the table with the HBase shell?
In order to delete a table, first, disable it then delete it.
What is the use of InputFormat in MapReduce process?
In MapReduce process, InputFormat the input data. Afterward, it returns a RecordReader instance which defines the classes of the key and value objects and provides a next() method which we use to iterate over each input record.
One of the sub-project of Apache Hadoop, which was mainly designed for NoSql database(Hadoop Database), Big Data store and a distributed, scalable is what we call HBase. Especially, when we need random, real-time read/write access to our Big Data we use HBase. Moreover, after Google’s Bigtable it is an open-source, distributed, versioned, non-relational database modeled.
Should the region server be located on all DataNodes?
Absolutely yes because Region Servers run on the same servers as DataNodes.
Suppose that your data is stored in collections, for instance, some binary data, message data or metadata is all keyed on the same value. Will you use HBase for this?
Definitely yes. Since, whenever key-based access to data is required for storing and retrieving, it is ideal to use HBase.
Assume that an HBase table Student is disabled. So, how to access the student table once it is disabled, by using Scan command?
With the help of Scan command, any HBase table that is disabled cannot be accessed.
What do you understand by compaction?
By having one file per store, it is not possible to achieve optimal performance, during periods of heavy incoming writes so as to reduce the number of disk seeds for every read HBase combines all these HFiles. It is what we call Compaction in HBase.
Apache HBase Freshers Basic Interview Questions and Answers |
Explain the various table design approaches in HBase.
There are two HBase table design approaches we can use. Tall-Narrow and Flat-Wide.
The major difference between flat-wide and tall-narrow approach is when there are less number of rows and a large number of columns we use a tall-narrow approach or when there is less number of columns and a large number of rows, we use flat-wide approach.
What is the best practice on deciding the number of column families for HBase table?
Since every column family in HBase is stored as a single file, it is ideal not to exceed the number of columns families per HBase table by 15. Hence, to read and merge multiple files, we will need a large number of columns families.
How will you implement joins in HBase?
By using MapReduce jobs join queries, HBase does support joins. Basically, it helps to retrieve data from various HBase tables.
Define MapReduce.
To solve the problem of processing in excess of terabytes of data in a scalable way, we use MapReduce as a process.
What is the full form of MSLAB?
MSLAB refers to Memstore-Local Allocation Buffer.
Which type of data HBase can store?
Any type of data that we can convert into the bytes, HBase can store.
Features of HBase.
Some best features of HBase:
Consistency
Atomic read and write
Sharding
High Availability
Client API
Scalability
Hadoop/HDFS integration
Distributed storage
Data Replication
Failover support and load sharing
API support
MapReduce support
Back up support
Sorted row keys
Real-time processing
faster lookups
Type of Data
Schema-less
High throughput
Easy to use Java API for client access
Thrift gateway and a REST-ful Web service
What is Client API?
In order to perform CRUD operations on HBase tables, we use Java client API for HBase. Because of Java Native API of HBase, it offers programmatic access to DML (Data Manipulation Language).
Explain MemStore?
On defining MemStore, updates in memory(sorted KeyValues) are stored in the MemStore. And, also the Data which consists of sorted key/values is stored in an HFile.
What are the operational commands of HBase?
HBase Operational Commands are:
Get
Delete
Put
Increment
Scan
Which code do we use to open the connection in Hbase?
To open a connection, following code:
Configuration myConf = HBaseConfiguration.create();
HTableInterface usersTable = new HTable(myConf, “users”);
Which command do we use to show the version?
In order to show the version of HBase, we use version command:
Syntax –
hbase> version
What is the use of tools command?
To list the HBase surgery tools we use this command.
What is NoSQL?
“NoSQL” database means the database isn’t an RDBMS which supports SQL as its primary access language. HBase is a type of NoSQL Database.
However, we can say, HBase is really more a “Data Store” since it lacks many of the features we find in an RDBMS. For example, secondary indexes, triggers, typed columns, and advanced query languages, etc.
What do you understand by Filters in HBase?
Basically, by allowing users to add limiting selectors to a query and eliminate the data that is not required, HBase filters enhance the effectiveness of working with large data stored in tables.
There are 18 filters in HBase–
TimestampsFilter
PageFilter
MultipleColumnPrefixFilter
FamilyFilter
ColumnPaginationFilter
SingleColumnValueFilter
RowFilter
QualifierFilter
ColumnRangeFilter
ValueFilter
PrefixFilter
SingleColumnValueExcludeFilter
ColumnCountGetFilter
InclusiveStopFilter
DependentColumnFilter
FirstKeyOnlyFilter
KeyOnlyFilter
Explain about the data model operations in HBase.
Data model operations are:
Put Method – It helps to store data in HBase.
Get Method – In HBase, it helps to retrieve data which is stored.
Delete Method – It helps to delete the data from HBase tables.
Scan Method –This operation helps to iterate over the data with larger key ranges or the entire table.
How will you backup an HBase cluster?
In 2 ways we can perform HBase cluster backups –
Live Cluster Backup: In this method, to copy the data from one table to another on the same cluster or another cluster copy table utility is used. Also, to dump the contents of the table onto HDFS on the same cluster, the export utility can be used.
Full Shutdown Backup:
While here, a periodic complete shutdown of the HBase cluster is performed And, only for back-end analytic capacity this kind of approach can be used.
Does HBase support SQL like syntax?
There is no SQL like support in HBase
Is it possible to iterate through the rows of HBase table in reverse order?
No.
What is the use of shutdown command?
In order to shut down the cluster, we use it.
What is the use of truncate command?
we use it to disable, recreate and drop the specified tables.
Which command do we use to run HBase Shell?
To run the HBase shell, we use $ ./bin/HBase shell command.
Which command is available to show the current HBase user?
To show the current HBase user, HBase uses the whoami command.
How to delete the table with the HBase shell?
In order to delete a table, first, disable it then delete it.
What is the use of InputFormat in MapReduce process?
In MapReduce process, InputFormat the input data. Afterward, it returns a RecordReader instance which defines the classes of the key and value objects and provides a next() method which we use to iterate over each input record.
Post a Comment