Optimizing on the Question – Big Data Analytics

In life there are opportunity costs. I have spent many a Sunday fretting over whether to clean the basement, watch the game, or spend the afternoon on single track with one of my kids. Unfortunately, we can’t create a duality that lets us experience two events simultaneously; I always have to choose among my options. Companies face opportunity costs too. Do you hire headcount for Brazil, reduce inventory in the Midwest, manage COGS, or acquire a company? Every decision leaves alternative opportunities behind like dust on the floor. As a matter of fact, companies are really layers of opportunity-cost decisions that we call strategies. Strategies allow us to plan and execute with focus. Sometimes strategies are new and/or overt; sometimes they are organic and implied (akin to culture). Usually a company has an agenda that is known by most, and for this conversation I will call it a strategy. For the majority of companies, I would say, strategy isn’t democratic or open to everyone; a good part of it is defined at the top, disseminated, and trickles down. “This is a growth year for us, we need to penetrate the market…”, “We’re concerned about Europe…”, “Divisional goal is 500 million…” or something similar.

Many of us, at various levels in the organization, are bounded by constraints and are asked not to create and advance the cause aimlessly, but against a set of criteria. This is important for aligning the organization and executing with powerful sameness. Yet once in a while something happens: the market changes, or new tech shows up. When this happens, the technologists get geeky and the C-level gets hungry, scared, or both. Something “new” arises in the corporation that breaks the rules and clears paths for new ways of thinking. Think about the Dotcom wave; it is still the quintessential example of this scenario. All of a sudden, 20-year-olds were becoming executives, Dotcom companies had valuations that surpassed blue chips, and people who sold socks to dogs were getting millions in VC money. The “rule followers” were punished and mocked as dinosaurs. Companies came to view long-standing cultural strategies as broken, and many chose to replace process while revolutionizing their offerings. This worked for some, destroyed others, and embarrassed many in the ranks. We’re now entering a similar wave around Big Data Analytics.

My term for this wave/bubble/era… is Big Data Analytics (BDA). I use this inclusive term for the wave of change that is taking BI and departmental analytics and merging them with social media, global mobile users, cloud, and various forms of big data. Within this wave we again have many great examples of how knowing more about ourselves (from the data that exists around us) yields new insights and is ushering in a new age. There are examples like “The Human Face of Big Data” (#HFOBD) with Rick Smolan; the EMC/Greenplum Analytics Workbench, a 1,000-node cluster built to study big data (e.g., Twitter, Facebook); and SAP’s recent “Real-Time Race” (#RealTimeRace), which pitted two System Integrators against each other on stage developing live solutions on HANA. These bits of news get us all thinking “what if”. However, as a large group of capitalists, I’m not sure we know how to leverage BDA, or how to weigh its opportunity costs when making decisions. We’re in a bit of an innovation bubble, and the ultimate question is: what can we do to improve and prioritize our choices?

I have a few recommendations for everyone to consider in their business. It’s a short list of three things that I believe smart companies will consider, either organically or by reading this blog 😉

1)      Data is the new Information Technology (IT). Read “The Big Switch” by Nicholas Carr and you’ll see that since the birth of the “industrial company”, there has been a “technology” group made up of smart, well-paid resources who apply the latest tech to business. First it was electricity and machines, then business systems, then computers, then data centers, and I propose that the next wave will be whatever Big Data Analytics becomes. (I would call it “Knowledge Management”, but we already burned through that label in the ’90s…) This means BDA will be pervasive and will inject itself into many aspects of the business, not just create opportunities to increase revenue through traditional means. Companies need to build new disciplines that identify and develop uses for data. Picture the 1960s and 1970s. We used vacuum-tube mainframes, typewriters, and rotary phones, and businessmen took two-hour martini lunches. The world of iPhones, tablets, and Angry Birds is very different. I believe our future world will be thoroughly basted with data-driven wisdom, which will have even larger impacts.

2)      A Corporate Decision Strategy is needed. How do we make decisions, and what questions do we ask? We need to be much better at asking questions than we are today. I am underwhelmed by the long line of shopping basket, alternative offer, and Twitter sentiment studies I see. They apply old perspectives on the customer to a field with much greater opportunity. How do we use crowdsourcing, self-service, social media, and data analytics to change the customer experience, the corporate workforce, and ultimately what we consider our core competencies? I think a great way to start is to review and inventory the BDA capabilities within the business and to assess how the questions that create data analytics projects get asked. Questions and decisions don’t just serve the strategy; going forward, they are the strategy. Also note that the people who can define the right questions are more like artists, musicians, and song-writers. Many have noted a strong correlation between scientists and musicians. Just as computing jobs became loaded with creative people, we’ll need to migrate these skills away from programming and toward question development and decision strategy. The good news is that new recruits have grown up as “gamers”, which plays well into the foundational skill sets needed.

3)      Change Business Systems. I’ve written assembler code and I’ve run analytics models. Both are very difficult to conceptualize and navigate. Only a relatively small group of individuals can execute these skills, which creates a real problem of scale. AI and machine learning are the beginnings of techniques that help the average person become a contributing member of this new age. We need decision support within BDA to morph from manually running a “K Means” model or hand-developing “ordered pairs” into something that demands less academic rigor from the user, and we need data capture, management, and cleansing to become more intuitive and automated. If game development could move as far from assembler as today’s “Mario Kart”, we can do the same in the era of BDA.
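For readers who haven’t done it, here is roughly what “manually running a K Means model” involves: a minimal from-scratch sketch of k-means (Lloyd’s algorithm) in Python. The helper name `kmeans`, the toy points, the seed, and the iteration cap are illustrative assumptions, not anything from a particular product. Notice how many choices (k, initial centroids, distance measure, stopping rule) fall on the analyst; that is the friction this recommendation wants automated.

```python
import math
import random


def kmeans(points, k, iterations=100, seed=0):
    """Cluster points (tuples of numbers) into k groups with Lloyd's algorithm.

    Hypothetical helper for illustration. Returns (centroids, labels), where
    labels[i] is the cluster index assigned to points[i].
    """
    rng = random.Random(seed)
    # Initialize with k distinct input points chosen at random.
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster in place
        if new_centroids == centroids:
            break  # converged: another pass would assign the same labels
        centroids = new_centroids
    return centroids, labels


# Toy data: one group near (1, 1) and another near (8.5, 8.5).
points = [(1, 1), (1.5, 2), (1, 1.5), (8, 8), (9, 8.5), (8.5, 9)]
centroids, labels = kmeans(points, k=2)
print(centroids, labels)
```

Even on six points, the analyst had to pick k, seed the centroids, and decide when to stop; on real data they would also choose features and scaling, which is exactly the kind of expert handwork the recommendation hopes better tooling will absorb.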

As companies refine and develop their BI/Data Analytics programs into the era of BDA, I think there’s an equivalent need to rethink the questions they ask. Do we need fewer sales execs? Do we need more data security? Do we want to know what else we could sell someone buying running shoes, or do we want to help them design a new shoe specifically matched to their demographic and health needs? Can our customers sell to themselves? And if so, how do we need to change what we sell to be competitive? What are we good at? Which companies should merge to exploit inherent opportunities? These are the opportunity costs of the new era that will emerge from BDA. To gain insight from data, we must first ask the right questions of it…


Big Data: Data Science, an emerging field to support new levels of understanding

note: lots of good embedded links this time 🙂

Today’s blog is one of a pair of essays I’m going to write on two different perspectives on Big Data. I believe Big Data is important to, and will have a major impact on, the immediate future of our culture. First we’ll look at what Big Data is and at the dynamics of dataology, or what most people call Data Science.

I am a latent Data Scientist. It was probably my calling. Today I’m in the sales function of our business, but I started my career with a Master’s in Database Management. I was doing C++, B-trees, and Bloom filters, working with SAS data sets and taking endless classes in statistics and economics. Frankly, I loved it. My passion was the data structures and the insight that could be acquired by slicing the chunks. Organized data, in its prescribed context, is real. It’s not some pundit’s opinion on 247News.com. It is based on facts.

For whatever reason, I ended up going down the “line of business” systems integration path. My career led me into workflow, document management, and supply chain technologies. These were connective and provided an immersive business process environment, but the treatment of the data was still inefficient and disrespectful of its value.

Yes, I have been branded a “purist” in my past life, but ultimately my focus was more on the data and the data models than on corporate profit. I consistently ran into the walls of “good enough”, and I realized that most of American industry didn’t consider rich data models a profitable endeavor. Instead, most wanted little snippets of data, transactional crap, when we knew so much more.

Several industry initiatives showed up: “Business Intelligence” (BI), “Information Lifecycle Management” (ILM), and “Master Data Management” (MDM). All promised a better strategy for data, but they were realized as purpose-built tactical systems addressing specific real-life problems of sales growth, regulatory submissions, and company acquisitions. None of them truly structured and exploited the fundamental Intellectual Property (IP) that each company holds in its data about customers, employees, and processes.

Add to these points that we all see “the disconnect” between data points in our daily lives: banks that don’t recognize the “savings account you” as the “home loan you”, or my employer-provided electronic health records system, which shows all my doctor visits and tests, yet when I take an online health risk assessment it still asks me pages of questions about blood pressure and cholesterol.
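The “savings account you” versus “home loan you” problem is, at its simplest, record linkage. Below is a minimal sketch, assuming two hypothetical systems whose customer records carry `name` and `dob` fields; the function names and sample records are made up for illustration. It links records on a crudely normalized key. Real Master Data Management tools use fuzzy matching and survivorship rules, so treat this as an illustration of the idea rather than a matching strategy.

```python
from collections import defaultdict


def match_key(name, dob):
    """Crude matching key: lowercase name stripped of punctuation and spaces, plus date of birth."""
    cleaned = "".join(ch for ch in name.lower() if ch.isalnum())
    return (cleaned, dob)


def link_customers(*sources):
    """Merge records from several systems into one profile per matching key.

    Each source is a (system_name, records) pair; every record is assumed to be
    a dict with hypothetical 'name' and 'dob' fields.
    """
    profiles = defaultdict(dict)
    for system, records in sources:
        for record in records:
            key = match_key(record["name"], record["dob"])
            # If one system holds duplicates for a key, the last record wins.
            profiles[key][system] = record
    return dict(profiles)


# Hypothetical extracts from two systems that never talk to each other.
savings = [{"name": "Pat O'Neil", "dob": "1970-04-02", "balance": 5200}]
home_loans = [{"name": "pat oneil", "dob": "1970-04-02", "principal": 180000}]

unified = link_customers(("savings", savings), ("home_loan", home_loans))
for key, profile in unified.items():
    print(key, sorted(profile))
# prints: ('patoneil', '1970-04-02') ['home_loan', 'savings']
```

An exact key like this misses nicknames, typos, and changed addresses, which is why production matching tends to score candidate pairs instead of requiring identical keys.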

Well, I am an optimist, and I believe that the current industry initiative called “Data Science” is structurally different from past trends and will likely get closer to my vision in several ways.

Data Science will be more scientific with data because:

  • Data is growing in such a manner that we are actually in trouble. We have multiple points of failure in people, process, and technology, and most industry leaders recognize it. Look at this page on the digital universe if you don’t believe me.
  • There is a convergence of three powerful movements.
    • Data-generating devices, both personal mobile devices and industrial assets, are creating data.
    • Global data mining and analytics is on the rise.
    • Data warehouse and data analytics technologies have improved significantly, allowing for the next level of processing. Here’s EMC’s Big Data page as an example.
  • Big Data is the aggregation and analysis of heterogeneous data sets and collectors. The concept of big data is an important pillar of the new “Data Analytics” investments, which will be pervasive over the next several years. By definition, Big Data is not just about infinitely large purpose-built databases. Like a hive of bees, Big Data to me is broader, more dispersed, and hierarchical in nature. It’s funny: we will know less about each individual piece of data, but en masse we’ll know more about ourselves in many more ways. Today these initiatives will be funded by the standard engines of power and profit, but some of the most impressive data science I have seen so far is in the scientific community. Spend an hour watching TED videos, like Hans Rosling using MicroStrategy visuals in his HIV data analysis work. Or read one of my favorite books of all time, Freakonomics, based on the data analysis work of economist Steven Levitt.

So, I am excited about the advancement of “Science” and “Data” in this emerging field. EMC Corporation is showing industry leadership in this growing discipline, including funding studies and an annual conference for data scientists. EMC’s Data Science Summit (EDSS11) on May 23, 2011 brought together an international consortium of data scientists to help define core fundamentals and highlight the growing need for resources in this field. I applaud EMC for stepping into proactive mentorship of the data industry. It’s a great fit for EMC and an area where we need to invest. Additionally, EMC just published a survey from the summit.

Here are some of the summary findings:

  • Informed Decision-making: Only 1/3 of respondents are very confident in their company’s ability to make business decisions based on new data.
  • Looming Talent Shortage: 65% of data science professionals believe demand for data science talent will outpace the supply over the next 5 years, with most feeling that this supply will be most effectively sourced from new college graduates.
  • Customer Insights: Only 38% of business intelligence analysts and data scientists strongly agree that their company uses data to learn more about customers.
  • Lack of Data Accessibility: Only 12% of business intelligence professionals and 22% of data scientists strongly believe employees have the access to run experiments on data, undermining a company’s ability to rapidly test and validate ideas and thus its approach to innovation.
  • Advanced Degrees: Data scientists are 3 times as likely as business intelligence professionals to have a Master’s or Doctoral degree.
  • Higher-Level Skills: Data scientists require significantly greater business and technical skills than today’s business intelligence professional. According to the Data Science Study, they are twice as likely to apply advanced algorithms to data, but also 37% more likely to make business decisions based on that data.

You will note in the survey that data scientists are inherently different from BI professionals. This confirms my belief that we’re heading somewhere more all-encompassing than a “sales report”, and that we’ll spend more time and money on the submerged part of the iceberg.

If you’re interested in next year’s summit, see EDSS12.