Tuesday, May 13, 2014

"Big Data" - I'm a believer!

What is the use case where Hadoop is "the" best technology to use? I have been working with Hadoop and big data for some years, using it for production analytics and learning and improving every day. I have attended numerous big data webinars and conferences. I have talked up the might of big data to people who had never used it or were scared of it. But apart from its cost-effectiveness, I hadn't felt the real benefit of using Hadoop or big data (I'm talking about my own case!). Most of what I did could be done with other existing technologies. Given the traction Hadoop is getting in the technology world, there must be a really BIG use case here. Yes, there are stories about Facebook, Twitter and other sites, but there must be something that only Hadoop can do. I couldn't find that indispensable purpose for the technology, until...
I recently enrolled in the "Machine Learning" course from Coursera, an awesome course from an awesome teacher (Andrew Ng). I never imagined I would understand why we had to learn loads of math in school and college (why the heck differentiation and integration?). I could have learned just languages and computer languages. But this course changed my perspective on math like a slap across the face. Everything we see, use and consume is developed using some form of math. This course is an eye-opener. I would recommend it to anyone with a data background. You will no longer see data as a waste of memory; you will see it as a gold mine waiting for the right explorer.

Now, in machine learning there are lots of techniques to predict something (give it 'x' and it gives you 'y', though it's not quite that simple :0). I'm still a beginner, but what I found is that we need to run thousands, and in some cases even millions, of iterations just to find a simple parameter. And in some cases there could be thousands of parameters. That was the aha! moment for me: Hadoop/big data is the only place "where you can store and process humongous data" in a cost-effective way. Previously, engineers may have had to limit the number of iterations because of resources. Not anymore: they have HDFS to store the data and MapReduce to process it.
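To make "thousands of iterations just to find a parameter" concrete, here is a minimal sketch of batch gradient descent for linear regression, one of the first techniques taught in the course. The post does not name a specific algorithm. The choice of gradient descent, the learning rate `alpha`, the iteration count, the helper names `predict` and `gradient_descent`, and the toy data are all assumptions for illustration. This runs in plain Python on one machine and does not use Hadoop.

```python
# Minimal sketch: batch gradient descent for linear regression in plain Python.
# Assumption: this is one example of the iterative parameter fitting the post
# describes; the learning rate, iteration count and toy data are illustrative.

def predict(theta, row):
    # theta[0] is the intercept; theta[1:] are the weights for features x1..xn
    return theta[0] + sum(t * x for t, x in zip(theta[1:], row))

def gradient_descent(X, y, alpha=0.1, iterations=10000):
    m = len(X)      # number of training examples
    n = len(X[0])   # number of features
    theta = [0.0] * (n + 1)
    for _ in range(iterations):
        # Prediction error for every training row under the current parameters
        errors = [predict(theta, row) - target for row, target in zip(X, y)]
        # Partial derivatives of the cost (1/2m) * sum(error^2)
        grad = [sum(errors) / m]
        for j in range(n):
            grad.append(sum(e * row[j] for e, row in zip(errors, X)) / m)
        # Update all parameters simultaneously
        theta = [t - alpha * g for t, g in zip(theta, grad)]
    return theta

# Toy data generated from y = 1 + 2*x1 + 3*x2
X = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]]
y = [1, 3, 4, 6, 8]
theta = gradient_descent(X, y)
print([round(t, 3) for t in theta])
```

On this toy data it should recover values very close to 1, 2 and 3. Every iteration reads every training row, so the total work grows with both the number of iterations and the size of the data.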

I have seen many compelling use cases in numerous webinars, conferences and whitepapers. For me, though, machine learning / predictive modeling is the most compelling reason that Hadoop is indispensable for the future analytics world. This is especially true because we live in a digital, social world where the number of factors (x1, x2, x3 ... xn) that could affect any outcome (y) keeps increasing. Now I'm no longer just a practitioner but a believer!
