When HBase Shines
One place where HBase does really well is with records that are very sparse, which often means un- or semi-structured data. Unlike row-oriented RDBMSs, HBase is column-oriented, so nulls are stored for free: if a row has a value for only one of dozens of possible columns, literally only that single column is stored. This can mean huge savings in both disk space and read I/O time.
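To make the sparse-storage point concrete, here is a toy comparison in Python. It is only a sketch: the 24-column schema, the record contents, and both helper functions are made up for illustration, and neither layout reflects how HBase or any particular RDBMS actually arranges bytes on disk. It simply counts how many slots each approach has to materialize.

```python
# Toy model: count the values each layout materializes for sparse rows.
# The schema and data below are hypothetical.
COLUMNS = [f"attr{i}" for i in range(24)]  # 24 possible columns

# Each record sets only a few of the possible columns.
records = {
    "row1": {"attr3": "x"},
    "row2": {"attr0": "a", "attr17": "b"},
}

def row_oriented_slots(records, columns):
    """Fixed layout: every row reserves a slot per column, null or not."""
    return [
        (row_key, col, values.get(col))  # None stands in for NULL
        for row_key, values in records.items()
        for col in columns
    ]

def sparse_cells(records):
    """HBase-like layout: only cells that actually hold a value exist."""
    return [
        (row_key, col, value)
        for row_key, values in records.items()
        for col, value in values.items()
    ]

dense = row_oriented_slots(records, COLUMNS)
sparse = sparse_cells(records)
print(len(dense), len(sparse))  # prints: 48 3
```

Two rows across 24 possible columns cost 48 slots in the fixed layout but only 3 cells in the sparse one, and the gap widens as the number of possible columns grows.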
Another way that HBase suits un- or semi-structured data is in its treatment of column families. In HBase, individual pieces of data are called cells. Each cell is addressed by a (row key, column family, column qualifier, timestamp) tuple. When you define your schema, however, you specify only the column families you want; the qualifier portion is determined dynamically by consumers of the table at runtime. This means that you can store pretty much anything in a column family without having to know in advance what it will be. It also lets you store one-to-many relationships in a single row. Note that this is not denormalization in the traditional sense, as you aren’t storing one row per parent-child tuple. This can be very powerful: if your child entities are truly subordinate, they can be stored with their parent, eliminating the joins you would otherwise need to read them together.
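The addressing scheme and the one-to-many pattern can be sketched with a small in-memory model. To be clear, `ToyTable` and its methods are hypothetical stand-ins written for this example, not the HBase client API, and the customer/order data is invented. The sketch only shows the idea: column families are fixed up front, qualifiers are made up at write time, and each cell is keyed by the full tuple.

```python
import time

class ToyTable:
    """In-memory stand-in for an HBase table (hypothetical, not the real API)."""

    def __init__(self, families):
        self.families = set(families)  # fixed when the schema is defined
        self.cells = {}  # (row, family, qualifier, timestamp) -> value

    def put(self, row, family, qualifier, value, timestamp=None):
        # Only the family is checked against the schema; any qualifier is allowed.
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        ts = timestamp if timestamp is not None else time.time_ns()
        self.cells[(row, family, qualifier, ts)] = value

    def get_family(self, row, family):
        """Return the newest value per qualifier for one row and family."""
        latest = {}
        for (r, f, q, ts), value in self.cells.items():
            if r == row and f == family:
                if q not in latest or ts > latest[q][0]:
                    latest[q] = (ts, value)
        return {q: value for q, (ts, value) in latest.items()}

# A parent (customer) and its children (orders) live in the same row.
table = ToyTable(families=["info", "orders"])
table.put("customer42", "info", "name", "Ada", timestamp=1)
table.put("customer42", "orders", "order-1001", "2 widgets", timestamp=1)
table.put("customer42", "orders", "order-1002", "1 gadget", timestamp=1)
table.put("customer42", "orders", "order-1002", "3 gadgets", timestamp=2)

orders = table.get_family("customer42", "orders")
print(orders)  # {'order-1001': '2 widgets', 'order-1002': '3 gadgets'}
```

Reading all of a customer's orders is a single-row lookup here, with no second table and no join. The order IDs were never declared anywhere; they became qualifiers simply by being written.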