Big data is no longer a fad that geeks, major enterprises and start-ups alike
are in love with; it is a reality driven by the dynamic and diverse nature of
channels, business lines, innovative products and customer behavior. All the
4Vs (Volume, Velocity, Variety and Veracity of data) are real, and we analysts,
data scientists, data professionals, strategists and business leaders have to
live with them. Investments are being made in Technology, Infrastructure and
Talent but, as a wise man once said, “all problems in the world cannot be
solved by throwing money at it”.
Reality Check:
It is not as simple as creating a data lake where everything can be dumped and
Data Scientists and Analysts can feed off it. Adoption decisions are
predominantly made on the Investment question (cost of data storage, data
preparation, management and retrieval), but that should not be the only
question. There is also a Returns question (Reports, Business Analytics,
Advanced Analytics, Data Products, Decision Engines, etc.), which is usually
ignored when making the decision. Investment-only decisions usually create a
sub-optimal experience for the end users: the platform may be efficient for
Reporting but very slow and inefficient when an Analyst has to use it, or vice
versa. Adoption and Engagement need a Strategic framework of key corporate
needs, a Tactical, Outcome-Focused delivery approach and an iterative learning
execution model.
Scalable Metrics Model:
The RDBMS structure is still one of the “go-to” frameworks for the Enterprise
Data Warehouse and has been so for decades. Its reliability, stability, speed
and ease of understanding make it optimal for many core services. The
downsides are limited flexibility and extensibility, the cost of modifications
and the rigidity of the structure, which is what the Hadoop Distributed File
System (HDFS) framework tries to address. But a lack of structure brings its
own problems of performance, reliability, error correction, etc., and just
forcing a structure via Metadata or Aggregates might not be sufficient for a
wide variety of users. We need a hybrid framework that combines the strengths
of RDBMS with the merits of HDFS, whose key objective is to serve the diverse
needs of users, and which is malleable enough to change efficiently and
effectively as those needs change. It has to be modular enough that each module
predominantly addresses one bucket of needs (e.g., Reports/Decision Engines by
function), but with connections between modules that help connect the dots
(e.g., Deep Dives into drivers). The Scalable Metrics Model is one such option
and we are discussing it at the Global Big Data Conference at Santa Clara on
Sep 2nd.
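To make the hybrid idea a little more concrete, here is a minimal sketch in Python. It is not the Scalable Metrics Model itself (the slides cover that); it only illustrates one possible way to pair a fixed, relational core with a flexible, schema-less part. Everything in it is an assumption made for illustration: the `metrics` table, its columns, the JSON `attributes` field and the helper names `build_store`, `add_metric`, `report` and `deep_dive` are all hypothetical, and an in-memory SQLite database stands in for a real warehouse.

```python
import json
import sqlite3


def build_store():
    """Create a hypothetical metrics store: fixed core columns plus a JSON bag."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE metrics ("
        " function TEXT NOT NULL,"   # module / bucket of needs, e.g. 'sales'
        " metric TEXT NOT NULL,"     # metric name, e.g. 'revenue'
        " period TEXT NOT NULL,"     # reporting period label
        " value REAL NOT NULL,"      # the measured value
        " attributes TEXT NOT NULL"  # free-form drivers, stored as JSON text
        ")"
    )
    return conn


def add_metric(conn, function, metric, period, value, **attributes):
    """Insert one row; any extra keyword arguments go into the flexible part."""
    conn.execute(
        "INSERT INTO metrics VALUES (?, ?, ?, ?, ?)",
        (function, metric, period, value, json.dumps(attributes)),
    )


def report(conn, function):
    """Structured path: aggregate the fixed columns for one function."""
    rows = conn.execute(
        "SELECT metric, period, SUM(value) FROM metrics"
        " WHERE function = ? GROUP BY metric, period"
        " ORDER BY metric, period",
        (function,),
    )
    return rows.fetchall()


def deep_dive(conn, metric, key):
    """Flexible path: break one metric down by an arbitrary attribute key."""
    totals = {}
    query = "SELECT value, attributes FROM metrics WHERE metric = ?"
    for value, attrs in conn.execute(query, (metric,)):
        driver = json.loads(attrs).get(key, "unknown")
        totals[driver] = totals.get(driver, 0.0) + value
    return totals
```

In this sketch, `report` plays the role of the structured, report-style access that favours the fixed columns, while `deep_dive` can slice by any attribute that happened to be recorded, without a schema change. A real implementation would have to deal with scale, indexing and data quality, none of which is attempted here.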
More about the Global Big Data Conference:
It brings together leaders and practitioners in the field of Big Data
and provides a platform for sharing ideas, getting feedback and learning about
the new trends and technologies in the industry today. We are excited
to be a part of it and hope to have a very good chat and learning session.
The slides on the Scalable Metrics Model can be found at: