BigDW: HBase Conference 2013

At my current workplace we are using Hadoop and related technologies for data warehousing. The main advantages we get out of this are, of course, cost and scalability. But Hadoop and related technologies (especially Hive and Pig) are not perfect for DW and BI development projects. The major issues we faced are updates and capturing incremental data, since we are trying to build a dimensional model with slowly changing dimensions.
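
To make the update problem concrete, here is a minimal sketch of a Type 2 slowly changing dimension in plain Python. The row layout, the `apply_scd2` helper and the sample values are made up for illustration and are not taken from our warehouse. The point is only that every change needs one in-place update (closing the current row) plus one insert, and the update is the part that is awkward on append-only HDFS files.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class DimRow:
    customer_id: str                 # natural (business) key
    city: str                        # attribute whose history we track
    valid_from: date
    valid_to: Optional[date] = None  # None marks the current version


def apply_scd2(dim: List[DimRow], customer_id: str, city: str,
               as_of: date) -> List[DimRow]:
    """Return a new dimension with one change applied as SCD Type 2."""
    result = []
    needs_insert = True
    for row in dim:
        is_current = row.customer_id == customer_id and row.valid_to is None
        if is_current and row.city == city:
            needs_insert = False                         # no change at all
            result.append(row)
        elif is_current:
            result.append(replace(row, valid_to=as_of))  # the UPDATE
        else:
            result.append(row)
    if needs_insert:
        result.append(DimRow(customer_id, city, as_of))  # the INSERT
    return result


dim = [DimRow("c1", "Austin", date(2013, 1, 1))]
dim = apply_scd2(dim, "c1", "Boston", date(2013, 6, 14))
```

After the call, `dim` holds two rows for `c1`: the Austin row closed on the change date and a new open Boston row.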

I have heard many people talk about HBase, which can do updates in Hadoop. But my questions are: Is HBase suitable for DW-related updates? Is it suitable for dimensional modeling? How about maintenance, since DW systems constantly undergo change?

So I got an opportunity to attend the HBase conference to get my answer, and it is certainly a no. There are some use cases where HBase is used to serve reports or an online reporting portal, but it is certainly unsuitable for dimensional models based on a bus architecture.

Anyway, HBase seems to be serving online applications magnificently, and it was nice to see case studies from services we use every day, like Groupon and jws player.

Some notes from the sessions I was able to attend:

Apache Hive and HBase

Data retrieval and updates in HBase seem very programmatic and hard
Since Hive is becoming very popular because of its SQL nature, this presentation talks about how to use Hive on top of HBase
Hive over HBase: online, unstructured, and used by programmers
Hive over HDFS: offline, structured, and used by analysts
Use case 1: keep dimensions in HBase and facts in Hive, so a query can derive data from both
Use case 2: continuous updates to HBase and a periodic dump to HDFS
The above can be joined with a single Hive query
Actual presentation link
http://www.slideshare.net/hortonworks/integration-of-hive-and-hbase
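
The two use cases above could look roughly like the sketch below. It is not taken from the presentation. It is a small Python helper that only renders HiveQL strings, assuming Hive's HBase storage handler (`org.apache.hadoop.hive.hbase.HBaseStorageHandler`) is available. The table names, column names and the `d` column family are hypothetical.

```python
from typing import List

STORAGE_HANDLER = "org.apache.hadoop.hive.hbase.HBaseStorageHandler"


def hbase_dimension_ddl(hive_table: str, hbase_table: str, key_col: str,
                        columns: List[str], family: str = "d") -> str:
    """Hive DDL exposing an existing HBase table as an external Hive table."""
    col_defs = ", ".join(f"{c} STRING" for c in [key_col] + columns)
    # ':key' maps the first Hive column to the HBase row key; the remaining
    # columns map to 'family:qualifier' in declaration order.
    mapping = ",".join([":key"] + [f"{family}:{c}" for c in columns])
    return (
        f"CREATE EXTERNAL TABLE {hive_table} ({col_defs})\n"
        f"STORED BY '{STORAGE_HANDLER}'\n"
        f"WITH SERDEPROPERTIES ('hbase.columns.mapping' = '{mapping}')\n"
        f"TBLPROPERTIES ('hbase.table.name' = '{hbase_table}')"
    )


def join_fact_with_dimension(fact: str, dim: str, key_col: str) -> str:
    """Use case 1: one Hive query over an HDFS fact and an HBase dimension."""
    # 'city' and 'amount' are hypothetical columns chosen for the example.
    return (
        "SELECT d.city, SUM(f.amount) AS total_amount\n"
        f"FROM {fact} f JOIN {dim} d ON f.{key_col} = d.{key_col}\n"
        "GROUP BY d.city"
    )


def snapshot_to_hdfs(hbase_backed: str, hdfs_table: str) -> str:
    """Use case 2: periodic dump of the HBase-backed table to an HDFS table."""
    return f"INSERT OVERWRITE TABLE {hdfs_table} SELECT * FROM {hbase_backed}"


ddl = hbase_dimension_ddl("dim_customer", "customer", "customer_id",
                          ["name", "city"])
join_sql = join_fact_with_dimension("fact_sales", "dim_customer", "customer_id")
dump_sql = snapshot_to_hdfs("dim_customer", "dim_customer_snapshot")
print(ddl, join_sql, dump_sql, sep=";\n\n")
```

The generated statements would still have to be run through Hive itself. The snapshot target (`dim_customer_snapshot` here) is assumed to be an ordinary HDFS-backed Hive table created beforehand.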

Phoenix Project

A SQL skin for HBase
Embedded JDBC driver for running queries against HBase

presentation link
http://www.slideshare.net/dmitrymakarchuk/phoenix-h-basemeetup

Impala and HBase

Cloudera is trying to help the analyst community by returning query results faster
Efficiency could be a problem
It may not be usable for a batch process like ETL, as there is a possibility of failure
It can be used for analyst queries
Not the best yet: in other words, not production ready yet
I'm skeptical about the optimizers these tools use

I couldn't find the actual presentation, but this PPT has most of the info
http://www.slideshare.net/cloudera/cloudera-impala-a-modern-sql-engine-for-hadoop

Apache Drill

Another SQL layer for HBase. I liked this talk since the presenter from MapR acknowledged that the product is not yet production ready.
The design has a distributed cache on each data node, which means more data transfer to build this cache.
Maybe it is suitable for small, quick analytical queries. We have to wait and see for production use cases.

http://www.slideshare.net/ApacheDrill/apache-drill-technical-overview

HBase in Pinterest - Use case

Interesting talk about pinning, unpinning, following, etc. I couldn't find the actual presentation.

And there is WibiData's Kiji. Maybe this link will give some idea:
http://gigaom.com/2012/11/14/wibidata-open-sources-kiji-to-make-hbase-more-useful/


Friday, June 14, 2013
