Big Data: Data Science, an Emerging Field to Support New Levels of Understanding

note: lots of good embedded links this time 🙂

Today’s blog post is the first of a pair of essays I’m writing on two different perspectives on Big Data. I believe Big Data will have a significant impact on the immediate future of our culture. First, we’ll look at what Big Data is and at the dynamics of “dataology,” or what most people call Data Science.

I am a latent Data Scientist. It was probably my calling. Today I’m in the sales function of our business, but I started my career with a Master’s in Database Management. I was writing C++, implementing B-trees and Bloom filters, working with SAS data sets, and taking endless classes in statistics and economics. Frankly, I loved it. My passion was the data structures and the insight that could be gained by slicing the data into chunks. Organized data, in its prescribed context, is real. It’s not some pundit’s opinion; it’s based on facts.

For whatever reason, I ended up going down the “line of business” systems integration path. My career led me into workflow, document management, and supply chain technologies. These technologies connected systems and provided an immersive business process environment, but they still treated data inefficiently and without respect for its value.

Yes, I was branded a “purist” in a past life, but ultimately my focus was more on the data and the data models than on corporate profit. I consistently ran into the wall of “good enough,” and I realized that most of American industry didn’t consider rich data models a profitable endeavor. Instead, most companies wanted little snippets of data, transactional crap, when we knew so much more.

Several industry initiatives came along: “Business Intelligence” (BI), “Information Lifecycle Management” (ILM), and “Master Data Management” (MDM), all promising a better strategy for data. In practice, they were implemented as purpose-built tactical systems that addressed specific real-life problems such as sales growth, regulatory submissions, and company acquisitions. None of them truly structured and exploited the fundamental Intellectual Property (IP) in the data each company holds about its customers, employees, and processes.

On top of this, we all see “the disconnect” between data points in our daily lives. Banks don’t recognize that the “savings account you” is the same person as the “home loan you.” Or take the employee electronic health records system I use: it shows all my doctor visits and test results, yet when I take an online health risk assessment, it still asks me pages of questions about blood pressure and cholesterol.
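To make “the disconnect” concrete, here is a minimal sketch of the kind of record linkage that could connect the “savings account you” with the “home loan you.” It is illustrative only: the two account lists, their field names, and the `match_key` and `link` helpers are hypothetical, and the matching rule (normalized name plus date of birth) is a deliberately crude assumption. Real linkage systems use far more robust, often probabilistic, matching.

```python
# Hypothetical records from two separate bank systems (illustrative data only).
savings_accounts = [
    {"acct": "S-100", "name": "Jane Q. Doe", "dob": "1970-04-12"},
    {"acct": "S-101", "name": "Raj Patel", "dob": "1982-09-30"},
]
home_loans = [
    {"loan": "H-900", "name": "JANE Q DOE", "dob": "1970-04-12"},
]


def match_key(record):
    """Build a crude (hypothetical) linkage key: lowercased name with
    punctuation removed and whitespace collapsed, plus date of birth."""
    name = "".join(ch for ch in record["name"].lower() if ch.isalnum() or ch.isspace())
    return (" ".join(name.split()), record["dob"])


def link(savings, loans):
    """Return (savings account, loan id) pairs whose linkage keys agree."""
    # Index savings accounts by key so each loan lookup is a dict access.
    by_key = {}
    for s in savings:
        by_key.setdefault(match_key(s), []).append(s["acct"])
    pairs = []
    for loan in loans:
        for acct in by_key.get(match_key(loan), []):
            pairs.append((acct, loan["loan"]))
    return pairs


print(link(savings_accounts, home_loans))
```

Running it prints `[('S-100', 'H-900')]`: the two Jane Doe records link despite differences in capitalization and punctuation, while Raj Patel has no loan record to match.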

Well, I am an optimist, and I believe the current industry initiative called “Data Science” is structurally different from past trends and will likely come closer to my vision in several ways.

Data Science will treat data more scientifically because:

  • Data is growing so quickly that we are actually in trouble. We have multiple points of failure in people, process, and technology, and most industry leaders recognize it. If you don’t believe me, look at this page on the digital universe.
  • Three powerful movements are converging:
    • Devices, both personal mobile devices and industrial assets, are generating data.
    • Global data mining and analytics are on the rise.
    • Data warehouse and data analytics technologies have improved significantly, allowing for the next level of processing. Here’s EMC’s Big Data page as an example.
  • Big Data is the aggregation and analysis of heterogeneous data sets and collectors. The concept of Big Data is an important pillar of the new “Data Analytics” investments, which will be pervasive over the next several years. Big Data is not just about enormous purpose-built databases. To me, it is more like a hive of bees: broader, more dispersed, and hierarchical in nature. Funnily enough, we will know less about each individual piece of data, but en masse we’ll know more about ourselves in many more ways. For now, these initiatives will be funded by the standard engines of power and profit, but some of the most impressive data science I have seen so far comes from the scientific community. Spend an hour watching TED videos such as Hans Rosling leveraging MicroStrategy visuals in his HIV data analysis work. Or read one of my favorite books of all time, Freakonomics, based on the data analysis work of economist Steven Levitt.

So, I am excited about the advancement of “Science” and “Data” in this emerging field. EMC Corporation is showing industry leadership in this growing discipline, including funding studies and hosting an annual conference for data scientists. EMC’s Data Science Summit (EDSS11), held May 23, 2011, brought together an international consortium of data scientists to help define the field’s core fundamentals and highlight the growing need for resources in it. I applaud EMC for taking on a proactive mentorship role in the data industry. It’s a great fit for EMC and an area where we need to invest to advance the field. Additionally, EMC just published a survey from the summit.

Here are some of the summary findings:

  • Informed Decision-making: Only 1/3 of respondents are very confident in their company’s ability to make business decisions based on new data.
  • Looming Talent Shortage: 65% of data science professionals believe demand for data science talent will outpace the supply over the next 5 years, with most feeling that this supply will be most effectively sourced from new college graduates.
  • Customer Insights: Only 38% of business intelligence analysts and data scientists strongly agree that their company uses data to learn more about customers.
  • Lack of Data Accessibility: Only 12% of business intelligence professionals and 22% of data scientists strongly believe employees have the access to run experiments on data, undermining a company’s ability to rapidly test and validate ideas and thus its approach to innovation.
  • Advanced Degrees: Data scientists are 3 times as likely as business intelligence professionals to have a Master’s or Doctoral degree.
  • Higher-Level Skills: Data scientists require significantly greater business and technical skills than today’s business intelligence professionals. According to the Data Science Study, they are twice as likely to apply advanced algorithms to data, but also 37% more likely to make business decisions based on that data.

You’ll notice in the survey that data scientists are inherently different from BI professionals. This confirms my belief that we’re heading somewhere more all-encompassing than a “sales report,” and that we’ll spend more time and money on the submerged part of the iceberg.

If you’re interested in next year’s summit, click this link: EDSS12