I strongly believe that a dimensional model cannot, and should not, be built in Hive, at least not effectively. First of all, Hive is not meant for slicing and dicing (as expected of a data mart); Hive is a SQL wrapper that makes executing complex MapReduce jobs much easier.
Still, there are many projects (small and large) that try to accomplish dimensional modeling in Hive.
With an open framework like Hive, it's certainly not impossible. But the question is: how effective is it?
Effective in the sense of: how scalable is it, how flexible is it, and how easy is it for a BI team to implement?
Mostly, the answers to all of the above are negative. My belief is that a BI team should spend most of its time solving business problems, not technical problems. If we implement a dimensional model in Hive, we will end up writing code for everything Hive lacks compared with a regular database. That is a big overhead, especially for smaller BI teams facing high expectations.
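To make that overhead concrete, here is a minimal sketch of one such "not-have": overwriting changed dimension rows when no row-level update is available, so the whole table has to be rewritten. It is plain Python rather than Hive code, and the function name `rewrite_with_changes` and the sample customer data are hypothetical, chosen only for illustration.

```python
from typing import Dict, List

Row = Dict[str, str]


def rewrite_with_changes(current: List[Row], changes: List[Row], key: str) -> List[Row]:
    """Build a brand-new copy of a dimension with changed rows replaced.

    This mimics what a team has to script when rows cannot be updated in
    place: read everything, merge the changes, write everything back.
    """
    merged = {row[key]: row for row in current}
    for row in changes:
        # A new key is added; an existing key is overwritten (no history kept).
        merged[row[key]] = row
    return [merged[k] for k in sorted(merged)]


# Hypothetical sample data
customers = [
    {"customer_id": "1", "city": "Chennai"},
    {"customer_id": "2", "city": "Pune"},
]
changes = [
    {"customer_id": "2", "city": "Mumbai"},  # changed row
    {"customer_id": "3", "city": "Delhi"},   # new row
]

new_customers = rewrite_with_changes(customers, changes, "customer_id")
```

Even this simple overwrite needs a full read and a full write of the table; keeping history for changed rows would add yet more hand-written logic on top.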
This is a very good paper on how to implement this: http://dbtr.cs.aau.dk/DBPublications/DBTR-31.pdf
So what can Hive's role be in data warehouse reporting and ETL? Hive can play three roles.
1. It can act as a staging layer that closely resembles the source, with incrementally captured partitions. Data from this layer can later be ETLed into a data mart in the data warehouse.
2. It can hold a flattened table that merges all the information into a single wide table.
3. Every reporting database keeps only a limited span of history. The older dimensional model data can be flattened and imported into Hive partitions. This makes archiving far more scalable, and you can still query and report from this archive (certainly not in real time!).
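The flattening and partitioning in roles 2 and 3 can be sketched as follows. This is only an illustration in plain Python, not Hive code: the table and column names (`sales`, `products`, `stores`, `sale_date`) are hypothetical, and a dictionary keyed by date stands in for Hive partitions.

```python
from collections import defaultdict


def flatten(facts, products, stores):
    """Join each fact row to its dimension rows and emit one wide row."""
    rows = []
    for fact in facts:
        rows.append({
            "sale_date": fact["sale_date"],
            "amount": fact["amount"],
            "product_name": products[fact["product_id"]]["name"],
            "store_city": stores[fact["store_id"]]["city"],
        })
    return rows


def partition_by(rows, column):
    """Group rows by one column's value, the way a partitioned table would."""
    parts = defaultdict(list)
    for row in rows:
        parts[row[column]].append(row)
    return dict(parts)


# Hypothetical star schema: one fact table and two dimensions
products = {10: {"name": "Pen"}, 11: {"name": "Book"}}
stores = {1: {"city": "Chennai"}, 2: {"city": "Pune"}}
sales = [
    {"sale_date": "2013-01-01", "product_id": 10, "store_id": 1, "amount": 5},
    {"sale_date": "2013-01-01", "product_id": 11, "store_id": 2, "amount": 20},
    {"sale_date": "2013-01-02", "product_id": 10, "store_id": 2, "amount": 7},
]

flat_rows = flatten(sales, products, stores)
partitions = partition_by(flat_rows, "sale_date")
```

Once flattened, each row carries its own descriptive attributes, so a query against the archive needs no joins and can read only the date partitions it cares about.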
All these solutions build on Hive's biggest advantage, its scalability and performance, and take into account that Hive is not suitable for updates and historical revisits.