Can I take Mission Critical Apps to the Cloud? – THE GOLDEN 7 CONSIDERATIONS

I have been in Minneapolis since Monday to participate in #SAPWeek.  #SAPWeek is a unique experience where EMC brings in their customers, partners and experts, from all over the globe, into a concentrated deep dive whiteboard jam session for a few days. I was at the first in Santa Clara in 2007 and have been attending many each year since. Very few things are as consistently rewarding as exploring the future of enterprise landscapes that run SAP, and that is exactly what #SAPWeek offers.

Right now, customers are re-platforming in droves. Here in the Midwest the topics of interest ranged from “Is it time to go to x86/Virtualization”, to “what is HANA TDI” (and should I use it), to “is the cloud real for mission critical”.

The quick answer to whether you should roll out HANA with an appliance or TDI is: TDI. You are going to save 20-30% or more of OPEX getting off the appliance model. If someone is advising you otherwise, get new advisers.  Check out my blog on “the HANA Puzzle” for more on that story.

As for the cloud, many customers here in the heartland were…well… shocked that “cloud” is a feasible option for mission critical apps like SAP HANA. I am here to tell you it is. There are many VERY LARGE companies aggressively adopting the cloud for SAP Traditional and HANA. Since our alignment with Virtustream in 2014, our field teams have been very active responding to this market migration.  (If you are not familiar with Virtustream, here’s a good level set video for you)

During the discussions this week, I referred to a slide I built with Christoph Streubert to help our customers navigate the questions as to whether they can get to the cloud. My laptop was not booting and I promised to post this via a blog. (Commitment DONE)

Golden7 for Cloud

I think the average IT org can take this list and build out a tailored profile/gap analysis to begin to determine the big questions of the cloud:

  • What – What workloads will I move to the cloud (or what environments in my landscape)
  • When – When does it make sense based on my cost and risk profile
  • Why Not – What about my environment hinders me from sending workloads to the cloud.  Make sure you socialize this item.  I am finding these “sacred relics” of the past are actually breaking down as you cross the lines of business. Cloud is compelling.
  • Who – Not all clouds are the same. Make sure your cloud partners are offering:
    • Your performance requirements/guarantees
    • Your long-term operational costs with significant reductions
    • Risk monitoring and management

As you dive into the “Golden 7” considerations, feel free to reach out for an interactive discussion on how to fill out your version of this story.


Optimizing on the Question – Big Data Analytics

In life there are opportunity costs. I have spent many a Sunday fretting over whether to clean the basement, watch the game, or spend the afternoon on single track with one of my kids. We unfortunately can’t create a duality that allows us to experience two simultaneous events; I always have to choose across my options. Companies face opportunity costs too. Do you hire head count for Brazil, reduce inventory in the Midwest, manage COGS, or acquire a company? There are constant decisions that leave alternative opportunities like dust on the floor. As a matter of fact, companies are really layers of opportunity cost decisions we call strategies.  Strategies allow us to plan and execute with focus. Sometimes strategies are new and/or overt; sometimes they are organic and implied (akin to culture). Usually companies have an agenda that is known by most and for this conversation I will call it a strategy. I would say for the majority of companies, strategy isn’t always democratic and out of the box for everyone. There’s a bit of trickle down defined and disseminated. “This is a growth year for us, we need to penetrate the market…”, “We’re concerned about Europe…”, “Divisional goal is 500 million…” or something similar.

Many of us, at various levels in the organization, are bounded by constraints and are asked not to create and advance the cause aimlessly, but to a set of criteria. This is important to align the organization and execute with powerful sameness.  Yet once in a while something happens: the market changes, new tech shows up. When this happens, the technologists get geeky and the C-level gets hungry, scared, or both.  Something “new” arises from the corporation that breaks the rules, and clears paths for new ways of thinking.  Think about the Dotcom wave; it is currently the quintessential example of this scenario.  All of a sudden, 20 year olds were becoming executives, Dotcom companies had valuations that surpassed blue chips, and people who sold socks to dogs were getting millions in VC money. The “rule followers” were punished and mocked as dinosaurs. Companies considered long-standing cultural strategies as broken and many chose to replace process while revolutionizing their offerings.  This worked for some, destroyed others, and embarrassed many in the ranks. We’re now entering a similar wave around Big Data Analytics.

My term for this wave/bubble/era… is Big Data Analytics (BDA). I use this inclusive term for the wave of change that is taking BI and departmental analytics and merging it with social media, the global mobile user, cloud, and various forms of big data. Here within this wave again we have many great examples of how knowing more about us (from what data exists around us) allows for new insights and is ushering in a new age. There are examples like “The Human Face of Big Data” (#HFOBD) with Rick Smolan, the EMC/Greenplum Analytics Workbench on a 1000 node cluster developed to study big data (e.g. Twitter, Facebook, etc.), or SAP’s recent “Real-Time Race” (#RealTimeRace) pitting two System Integrators against each other on stage developing live solutions on HANA. These bits of news get us all thinking “what if”.  However, as a large group of capitalists, I’m not sure we know how to leverage it, or how to make decisions around opportunity costs for BDA. We’re in a bit of an innovation bubble and the ultimate question is what can we do to improve and prioritize our choices?

I have a few recommendations for everyone to consider in their business. It’s a short list of 3 things that I believe the smart companies will consider either organically or by reading this blog 😉

1)      Data is the new Information Technology (IT). Read “The Big Switch” by Nicholas Carr and you’ll see that since the birth of the “industrial company”, there has been a “technology” group which consists of smart, well-paid resources who apply the latest tech to business. First it was electricity & machines, then business systems, then computers, then data centers, and I propose the next wave will be whatever Big Data Analytics becomes.  (I would call it “Knowledge Management”, but we already blew that logo in the 90’s…) What this means is BDA will be pervasive and inject itself in many aspects of business, not just creating opportunity to increase revenue through traditional means.  Companies need to build new disciplines that identify and develop data use. Picture the 1960s and 1970s. We used vacuum based mainframes, typewriters, rotary phones, and businessmen took 2 hour martini lunches. The world of iPhones, tablets and Angry Birds is very different. I believe our future world will be thoroughly basted with data driven wisdom which will have even larger impacts.

2)      A Corporate Decision Strategy is needed. How do we make decisions, and what questions do we ask? We need to be much better at asking questions than we are today. I am underwhelmed by the long line of shopping basket, alternative offer, or Twitter sentiment studies I see.  This is applying old perspectives of your customer to a field with much greater opportunity. How do we involve crowd sourcing, self-service, social media, and data analytics to change the customer experience, the corporate workforce, and ultimately what is considered a corporate core competency?  I think a great way to start is a review and inventory of BDA capabilities within the business and an assessment of how the questions that create data analytics projects get asked. Questions and decisions don’t serve the strategy, they are the strategy going forward.  Also note, the people who can define the right questions are more like artists, musicians, song-writers.  Many know there is a strong correlation between scientists and musicians. Similar to how computing jobs became loaded with creative people, we’ll need to migrate these skills away from programming and towards question development and decision strategy. Good news: the new recruits have grown up as “gamers”, which plays well into the foundational skill sets needed.

3)      Change Business Systems – I’ve written assembler code and I’ve run analytics models. They are both very difficult to conceptualize and navigate. It’s a relatively small group of individuals that can execute these skills, which creates a real problem of scale. AI and machine learning are the beginnings of techniques that help the average person become a contributing member in this new age. We need decision support within BDA to morph from manually running a “K Means” model or hand developing “ordered pairs” to something less academically rigorous, and data capture/management/cleansing to become more intuitive and automated.  If we can automate “Mario Kart” as far from assembler as it is today, we can do the same in the era of BDA.

As companies refine and develop their BI/Data Analytics programs into the era of BDA, I think there’s an equivalent need to rethink the questions they ask. Do we need fewer sales execs? Do we need more data security?  Do we want to know what else we could sell someone buying running shoes, or do we want to help them design a new shoe specifically to match their demographic/health needs? Can our customers sell themselves? And if so, how do we need to change what we sell to be competitive? What are we good at? What companies should merge to exploit inherent opportunity?  These are the opportunity costs of the new era that will emerge from BDA. To gain insight from data, we must first ask the right questions of it…

MAKING SENSE OUT OF ANALYTICS… It’s time to progress this!

“Big Data”.  A phrase destined to live a long life because it’s catchy and amorphous enough to mold itself into the current conversation. There is a great deal of literature, lesson and lore now floating around on this topic. We know Hadoop customers store petabytes of unstructured content, we know there are approximately 5 billion cell phones in the world producing both data and mobile consumers, we know Facebook is this important global mind meld of puppy-dog pictures, FarmVille, and patriotic prose. Today, I want to involve you in what has been speed walking through my brain: how do you take something as esoteric as analytics and apply big data to it?

If you’re not a data scientist already, go load the analytics package “R” on your laptop. Then, create yourself a list, an array or a matrix of data and run some of the analytic functions contained within the package (e.g. t.test(), lm(), kmeans()). If you do this, you’ll quickly learn that:

a)      This is a super deep, complex activity that combines statistics and programming

b)      It is not suited for the majority of brains in the IT industry today

c)      The data is manipulated, massaged, converted, and carefully transformed by humans from one analysis to the next, making decisions as you go.

d)      And… the data for each analysis isn’t the most aggressive volume of data (in no. of gigabytes) you’ve seen before.

If this is true, how do we answer the following questions:

1)      How does the data get so big?

2)      How do we apply Big IT to Analytics?


Ok, let me baseline you on linear regression.  Linear Regression (LR) is the gateway drug to predictive analytics. LR is the assessment of existing data that shows a linear pattern as you traverse a set of variables. The line it produces is called the “Best fit” or Least Squares Regression Line. With luck your data will provide a best fit that is so linear that, given one variable, it gives you an algorithm (or coefficients) to predict the other.  The link I reference (here) has an example of tracking two variables, “age” and “height”, across a sample.  The result is an algorithm where we could enter an “age” and get a suggested “height” back, thus providing predictive capabilities.
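To make the age/height idea concrete, here is a minimal sketch of a least squares fit in plain Python. The sample numbers are invented for illustration (they are not from the linked example), and the function names `least_squares_fit` and `predict_height` are my own placeholders, not part of any package.

```python
# Hypothetical sample: ages in years and heights in cm, invented for illustration.
ages = [2, 4, 6, 8, 10, 12]
heights = [86.0, 102.0, 115.0, 128.0, 138.0, 149.0]


def least_squares_fit(xs, ys):
    """Return (slope, intercept) of the least squares regression line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-style and variance-style sums (the 1/n factors cancel).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict_height(age, slope, intercept):
    """Apply the fitted line: enter an age, get a suggested height back."""
    return intercept + slope * age


slope, intercept = least_squares_fit(ages, heights)
print(f"height ~ {intercept:.1f} + {slope:.2f} * age")
print(f"suggested height at age 7: {predict_height(7, slope, intercept):.1f} cm")
```

The two coefficients are the whole “algorithm”: once you have them, prediction is a single multiply and add. In R, `lm()` does this fit for you and reports the same two coefficients for a one-variable model.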

There are other algorithms beyond LR. Shopping baskets are processed as key value pairs (KVP), combining each pair of items and looking for trends in buying patterns. KVP is in demand today, and it’s a culprit in creating large amounts of data. A classic example is predicting what consumers buy. Everyone always talks about groceries because everyone buys groceries and they buy lots of things each trip.  Being able to predict what people will purchase would allow grocery stores to better serve, while reducing costs to help their razor thin margins. This is why there are so many grocery store examples…

However, I’d prefer a new target for our discussion. It’s beach time, I’m heading to a North Carolina beach destination soon, so let’s do beach shop souvenirs. If you’ve been to a North Carolina beach you know it’s all about lighthouses, pirate legends, and casual fun.

In our example there are:

–          Pretty T-shirts

–          Rebel/Pirate T-shirts

–          Surfer/Cool T-shirts

–          Sea shells

–          Bow covered flip-flops

–          Pirate gear

–          Cheap surf/water gear

–          Sun tan lotion

–          Postcards

–          Lighthouse gifts

–          Boat-in-a-bottle gifts (BiaB)

Review my list; I think we can predict a few types of shoppers: the contemporary “southern belle”, the “pirate on the inside” and the “surfer dude wannabe”. I would speculate that these three shoppers would tend to buy in these patterns:

–          Southern Belle – Pretty T-shirts, bow flip-flops, sea shells, sun tan lotion, postcards, lighthouse gifts, and BiaB

–          Pirate Pete – Pirate/Rebel T-shirts, pirate gear, and BiaB

–          Spicoli Dude – Surfer/Cool T-shirts, Sun tan lotion, surf gear

Note, I speculate on my mental library of personal observations; KVP speculates based on data.  To come to a more definitive conclusion, we would use KVP to process a day’s worth of transactions. If we ran these tests, you would be able to appreciate the vast number of combinations to consider. If our goal is to identify correlation or causality in product purchase relationships (say, a person who buys a shell is likely to also buy a lighthouse, at a 95% confidence), you have to consider all the combinations of purchase relationships across a large number of receipts. This means comparing: one to one combinations, 2 to 1 combinations, up to N to 1 combinations where N is the number of items purchased (data scientists…yes this is a simplification, be kind with your technical assessment of it…). Now apply those combinations across hundreds of shoppers in a given day, across a chain of stores, across the summer season.  Now imagine you’re a global retail giant. What a Rubik’s Cube of potential value this could be, and how much data gets generated in the process. It’s the combinations of assessment that make the data growth skyrocket off the charts.  Now think about how you manage it, communicate it, leverage it, and do you ever throw it away?
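Here is a rough sketch of that combination counting in plain Python. The receipts are invented, the item names are my own shorthand for the beach shop list, and I am assuming “confidence” in the association-rule sense (the share of receipts with item A that also contain item B), which is a simplification rather than a formal statistical confidence level.

```python
from collections import Counter
from itertools import combinations

# Hypothetical receipts from one day at a beach shop (invented data).
receipts = [
    {"pretty_tshirt", "sea_shells", "lighthouse_gift", "postcards"},
    {"pirate_tshirt", "pirate_gear", "biab"},
    {"surfer_tshirt", "sun_tan_lotion", "surf_gear"},
    {"sea_shells", "lighthouse_gift", "sun_tan_lotion"},
    {"sea_shells", "postcards"},
]


def pair_counts(receipts):
    """Count how many receipts contain each pair of items (one to one)."""
    counts = Counter()
    for receipt in receipts:
        for pair in combinations(sorted(receipt), 2):
            counts[pair] += 1
    return counts


def confidence(receipts, antecedent, consequent):
    """Share of receipts containing `antecedent` that also contain `consequent`."""
    with_antecedent = [r for r in receipts if antecedent in r]
    if not with_antecedent:
        return 0.0
    hits = sum(1 for r in with_antecedent if consequent in r)
    return hits / len(with_antecedent)


def candidate_rules(n_items):
    """Count 'k items -> 1 other item' relationships among n_items products.

    Pick the single predicted item (n ways), then any non-empty subset of
    the remaining n - 1 items as the items already in the basket.
    """
    return n_items * (2 ** (n_items - 1) - 1)


print(pair_counts(receipts).most_common(3))
print(confidence(receipts, "sea_shells", "lighthouse_gift"))
print(candidate_rules(11))  # the 11 products in the beach shop list
```

Even with only 11 products, the number of candidate relationships runs into the thousands before a single receipt is scored, and every one of them has to be checked against every receipt. That is the growth engine behind the data volume.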


I know the term Big IT sounds like something we’re trying to get away from, right? Scale up is dead, scale out is hip. However, to me Big IT is the lessons we learned about mission criticality, scale, consolidation/virtualization, and service levels that run all our companies today. We have a horde of global IT professionals keeping the lights on, and analytics has to make the jump from academia and departmental solutions to Big IT to get to Big Data. If we really want to know infinitely more about us (US defined as myself, myself with others, others without myself, only men, only women, only women in Europe, only teens who play football, I think you get it…) we need the IP and assets we’ve developed in the last era of scale up.  We need the connectivity, we need the structure of things like ITIL, we need hiccup tolerant approaches that run on systems management tools, not hundreds of IT resources swapping out blown components they bought at a school auction. The “science project” has to become big business.

So how do we apply something that is as complex and focused as analytics to IT? I think it’s organizational changes that provide a vehicle for architectural changes.  Companies need to consider a “Chief Strategy Officer” (CSO) role as a binding force. Some companies may make this an engineering position or a marketing position based on their primary culture, but the role should exist and that role needs to set an analytics strategy for the company.  First they should define an analytics mission statement: “What are we going to do with analytics within the company?” Then they need to answer the basic questions about what we know of our employees, our customers and our processes. Additionally they need to ask what in the big “datascape” we want to bring into our analytics engines to accomplish our mission.  With this they can set an architectural strategy that leverages old tech and incorporates new tech to meet the mission objectives. Otherwise the company is locked in siloed perspectives and can’t get to the bigger order items; it’s hard to construct this monster bottom-up.  Many of the companies I talk to who are starting enterprise programs still seem to be searching for how to bring it together. The answer is, just like the CIO organized IT, companies need a C-level resource to define the charter.

With the correct organizational structure, a company can then look at an architecture that can index the outside bits for later use, adjudicate the data flow to peel off the useful content into higher functioning data stores and then apply analytics packages to distill insight. Techniques like machine learning will help automate the processing and allow the super smart operators to become more productive. And, the programmers will write applications to get the global-mobile user in active participation by both creating and consuming the information within the process.

Taking a Seat at the Big Data Table

Today EMC made a concerted effort to lay down the “table stakes” at the Big Data table, quietly demanding respect for their bid as an anchor player in the future of Big Data. At the start of day on the tech heavy west coast, word rang out of the EMC announcement in the industry press channels. Today, EMC announced their answer to expanding collaboration between data scientists, and the acquisition of the world-class development house of Pivotal Labs.  In a world inundated by an almost endless amount of corporate-generated press releases, this was noteworthy through the noise.

Why?  Let me give you a few points to consider. Ok first, have you ever had a need to execute a mental task or two on your computer only to find that your virus scanner was running, or the company was installing something behind the scenes, and you are relegated into feeling like you are living in the Slow-mo footage at ESPN? You wait seconds if not minutes, everything slows down, and somewhere in the process you realize you have forgotten EVERYTHING you were doing and your creative energy is all but drained from your person. You might as well just do email now…

Well I tell you this loss is pervasive in our IT world. Often we lose the ability to execute on our most brilliant ideas because of timing, multi-tasking churn, and the inability to effectively execute on collaborative activities, all before the constant rush of information again flows over our levees and we’re buried in the next rush of content. Like waves at the beach, we only have so long between the intervaled poundings to make real progress.  So if we imagine this to be true of the basics of our business lives, now imagine the pace of predictive and/or low-latency analytics where the provisioning, processing, analysis, decision and actions have to happen in a day, or an hour, or a second, or a micro-second.  We’ve got to get better at our ability to process in the cycle-times we have. Of course, addressing micro-second requirements will require machine based processes, but there is an immense opportunity to refine our cycles that fall in the realm of human processing.

One of the most critical tools of human invention is timely collaboration. Humans make better decisions faster when they can leverage readily available tools, data and peers. Imagine the perfect world where you can get data when you need it, you can get advice when you ask for it, you can push a result to another for review without technical limitations, and you don’t even have to look at an email to do it.  This is the big step EMC made today by announcing “Greenplum Chorus”.


Greenplum Chorus is a collaborative environment that allows the users (data scientists) to self-provision space, capture, transpose, assess, and share chunks of data without approvals, requests, or IT intervention.  We’ve seen how impactful social tools like Twitter or Yelp have been. The ability to connect, share and discover has profoundly changed the world, and now we have a similar social lever to use in the advancement of data analytics.  Chorus will additionally be a landing zone for an eco-system of 3rd party offerings allowing for freedom to advance the evolving strategies of its user community.


WOW, as if that wasn’t enough to talk about.  The acquisition of Pivotal Labs is the laser sharpened sword hidden in a gentleman’s cane.  These guys are extremely competent, experienced, and pedigreed. If you were to look at their customer and collaborative projects list, it’s the who’s-who of this IT age. Technically, Pivotal Labs brings to the equation their focus on agile development tools and an invaluable resume of cloud, analytics and mobile success stories.  In addition, Pivotal Tracker is an industry leading agile development platform with hundreds of thousands of active developers currently using the platform. Akin to SAP’s purchase of Business Objects, this provides EMC with not only great technology, but an existing eco-system to embolden the go-forward plan.


Easy as 1-2-3

EMC Greenplum is known for their high performance data driven solutions which are cloud ready. Now you add a collaborative platform that engages the users in a socially empowered self-service model, and an integration platform to pull the story together, and you can see an impact player in the making.

Big Data: The Quiet, Quick Death of Privacy

note: use the embedded links, it’s good stuff! 🙂

Last week I wrote the first of a two part blog on Big Data. I discussed the evolution of digital data, the series of initiatives that have appeared to manage data and the impact Big Data will have on our current world. Today I want to discuss another aspect of Big Data and that is information privacy.

I will start by saying I am a bit disturbed. Not by progress, because I enjoy evolving. My concerns are in the dichotomy between massive change and the blind involvement of the average human in this process. It is a common sight to see a mother driving a Lexus with a phone propped against her ear, kids pushing Facebook photos to friends, or a college student typing out a few keystrokes on Twitter in the local coffee shop. We’re all active participants in massive information usage. We are also the first generation in modern history to see a permanent revocation of privacy rights. Many would say these minor losses of privacy are needed to fight terrorism or to keep up with the criminals; but I would state the changes in technology and our newfound social obsessions are equal contributors to this loss of privacy.

Very few of us are considering the changes in privacy in a broad societal context. We leave most decisions to ill prepared politicians who poke their fingers in the shadows of the black box without understanding the ramifications of their actions, or to those who will capitalize on the opportunity and leverage this information, possibly for great or nefarious purposes. I hope you see in this article that I am not here to vilify anyone or to say we need the level of privacy we have had. Instead this is a call to action for us all to think about what this all means. I’m asking you to be the “cowboy”, not the “cow”. Big Data means data everywhere; it will undoubtedly permeate our lives for the foreseeable future.

This blog will be broken into three parts. The first part will quickly hit upon the historical rise of freedoms, the second part will focus on the current world of information collection, and finally the third part will ask the question should we care? Before diving in, I will tell you that I have no “legal chops”. This blog provides a dime store tour of some of the legality of privacy. Take it for what it’s worth.

PART ONE: A Brief History of Privacy
Today’s modern view of rights and liberties principally starts in 1215 with the “Great Charter” or Magna Carta. There was little in the way of specifics defined within the big “MC” related to personal privacy, but it kicked off a vein of thought that became our current global era, which has afforded more international freedoms to more people today than in any other known period of time.
From the 1200’s through the 1700’s most rights and liberties focused on the freedom of religion, incarceration, taxes, and property ownership. Virginia led the colonies in drafting the “Virginia Bill of Rights” in 1776, which later became the starter document for the federal Bill of Rights. Using the site, I was able to understand that much of the momentum for our privacy and protection in the constitution came from the findings associated with a handful of trials like “Wilkes v. Wood”, where Wilkes won a suit against the practice of issuing royal warrants that allowed appointed agents to ransack the designee’s homes and seize their books and papers. These warrants were issued against those who spoke ill of the King. The revolutionary war and our formative years as a country solidified our resolve for liberty and gave birth to a new foundation of human rights.

Going forward through the decades, America and England set many standards for personal freedoms. New legislation and case law grew in support of the concepts of privacy, “Invasion of Privacy” to be specific. Interestingly enough, technology played a big role in the development of privacy. Newspapers, cameras, and TVs allowed the common man to see what was happening en masse and to begin to influence the law makers and courts to better protect privacy. Examples include tort laws on invasion of privacy which allow you to sue those who intrude on your property. One of the biggest “rocks in the river” is the Fourth Amendment to the US Constitution. It was ratified in 1791, but much of its modern reach dates to 1961, when the Supreme Court’s decision in Mapp v. Ohio applied its exclusionary rule to state courts.

“The right of the people to be secure in their persons, houses, papers, and effects, against unreasonable searches and seizures, shall not be violated, and no Warrants shall issue, but upon probable cause, supported by Oath or affirmation, and particularly describing the place to be searched, and the persons or things to be seized.”                  – Fourth Amendment to the US Constitution

The Fourth Amendment protects against unwarranted search and seizure. To me what is interesting is the fact that our deep seated concept of privacy is actually a fairly recent addition, coming into its current form just before I was born.

It is important to point out that privacy law is subjective. Case in point: the term “of public interest” seems quite broad, but is a key point leveraged in the legal interpretation of “invasion of privacy”. For example, celebrities have fewer privacy rights than the average American because of such interpretations.

So I hope I didn’t bore you in the review of our collective, traditional perspective of privacy rights. I’ll finish part one with some related quotes through history:

“But no part of the property of any individual can, with justice, be taken from him, or applied to public uses, without his own consent, or that of the representative body of the people.” – John Adams 1776

“If men through fear, fraud or mistake, should in terms renounce and give up any essential natural right, the eternal law of reason and the great end of society, would absolutely vacate such renunciation;” – John Adams 1772

“All tyranny needs to gain a foothold is for people of good conscience to remain silent. “ -Thomas Jefferson

“It all comes from, I would argue, this right to privacy that doesn’t exist in my opinion in the United States Constitution . . .” – Rick Santorum 2003


PART TWO: Current Events in Information Capture
One can say that up until 1990, privacy was a physical thing. A search warrant protected privacy of one’s space. You were required to have proper cause and approval to invade someone’s personal space. Today the digital world has no analog to protect your digital and physical privacy from a new wave of technologies.
In this section, let’s look at several current events that relate to the capture and use of personal information.

Carrier IQ Article, Gigcom – Carrier IQ is a product that can be preloaded on your cell phone or device, can literally capture every action, and is designed to send that data to some collection zone. It was engineered for the cell business to create new knowledge, and arguably profit, from the data collected. Apple quit using it in iOS 5. My carrier Verizon has a policy against this product, but other carriers like Sprint and AT&T have used it in some form. This article included a poll on whether respondents would change purchase behavior based on this knowledge; at the time of writing, 82% voted they would.
House Committee Bill on Data Retention Mandate, CNET – Here’s an excerpt from this article:
“Internet providers would be forced to keep logs of their customers’ activities for one year… 19 to 10 vote represents a victory for conservative Republicans, who made data retention their first major technology initiative after last fall’s elections, and the Justice Department officials who have quietly lobbied for the sweeping new requirements… rewrite of the bill expands the information that commercial Internet providers are required to store to include customers’ names, addresses, phone numbers, credit card numbers, bank account numbers, and temporarily-assigned IP addresses, some committee members suggested. By a 7-16 vote, the panel rejected an amendment that would have clarified that only IP addresses must be stored. [so all the data would be stored…] It represents “a data bank of every digital act by every American” that would “let us find out where every single American visited Web sites,” said Rep. Zoe Lofgren of California, who led Democratic opposition”
There is also an interesting “Government Snooping Timeline” in the article. Popularity can only be judged by the comments that follow the blog. They were damning…
CIA Mood Ring: Monitors Twitter and Facebook, Computerworld  – The CIA monitors up to 5 million tweets a day, says Computerworld. They do this to rapidly assess the global sentiment, feeding results to various governmental recipients like the president. The article additionally states that Homeland Security is looking at guidelines for privacy rights while it monitors social media. This is self-regulating, and it is questionable whether it would be successful.
Town Center to Monitor Christmas Shoppers w/ their Cell Phones – These things hit close to home. My local town center “Short Pump” announced plans to track everyone who entered the town center via their cell phones and aggregate the data to determine shopping patterns. Though they did provide some news information on the web, the actual shopper would not be notified of this activity. Approximately 1 week later they announced they would be pulling the system out, at least temporarily, because of the backlash. However, they imply they will be working on enhancements. I bought my presents at the competing malls this year.

Surveillance Catalog Article, WSJ – This article and video provide a laundry list of  new hacking, monitoring, and intercept technologies. Hey maybe something for under the tree?
CNN News – Just this morning I watched CNN, where they talked about new surveillance companies providing police forces and other governmental agencies with data collection and mining tools that take information from public Wi-Fi networks by penetrating laptops and devices on the network, in some instances even if they were password protected. Their next report was a discussion of drone planes and whether we would ever use them in American skies.
SCAN VAN – Ok… if you have survived the others without a flinch, this one has to get you a little. The company American Science and Engineering has installed and is selling backscatter vans (or x-ray vans) that can be driven up and down the road visualizing all the contents of houses and/or cars. To be clear: they see through brick walls, and through clothes while you pass on the street. Here are 2 news reports that differ on whether such technologies are appropriate. (Young Turks, FoxNews; worth watching both…). Once again we all have to decide if there is a line and where that line is.

So, hopefully you now realize that data is collected in very new ways that are without a doubt invasive by traditional views of privacy, yet it feels so right as we leverage the interconnectedness of our personal device world.

PART THREE: What’s an Addict to Do?
In 1949 George Orwell wrote the epic book called Nineteen Eighty-Four about a society ruled by an authoritarian group called “the Party”. The people were constantly watched with little to no freedom, they were surrounded by pervasive surveillance, they were subjected to mind control, and they were perpetually in a state of war. With constant devices and 24×7 news, sometimes I feel like I am on mind control…

Whether you’re ready to grab an aluminum hat or if you’re an “out of sight out of mind” type, I hope you see the acute changes that technology has brought upon our actual privacy and thus our rights. The question is what do you do about it? Do you shut off your phone? Put down the iPad? Write your congressman? Write a blog? Change your shopping patterns? Start a new data mining company? The answer is to take action, whichever you see fit.

As a society we need to step up our societal skills. There is lots of “whitespace” to be defined today and most of us are too busy with kids, soccer games, quarterly reports, and the daily DOW to focus in on these issues. If Thomas Jefferson or George Mason had followed our lead, we wouldn’t have the America we have today. Our founding fathers were consistent in their concern about the fragility of our democracy and our freedoms. They were extremely worried we would become complacent and apathetic amid our luxuries. They were right. Form an opinion and share it. Maybe privacy is overrated… Maybe privacy and freedom are loosely correlated and do not represent a causal relationship. These are the things we should be discussing. If so, let’s reach that conclusion formally and proactively.

Big Data is coming; it is a positive force and will better our lives and our economy. Yet, I don’t want others defining my future while I’m watching a Hulu clip.

The opinions expressed in this blog are solely my own and not those of my associations.

Big Data: Data Science emerging field to support new levels of understanding

note: lots of good embedded links this time 🙂

Today’s blog is one of a pair of essays I’m going to write on two different perspectives associated with Big Data. I believe Big Data is notably important and impactful to the immediate future of our culture. First we’ll look at what Big Data is and the dynamics of dataology, or what most people call Data Science.

I am a latent Data Scientist. It was probably my calling. Today I’m in the sales function of our business, but I started my career with a Master’s in Database Management. I was doing C++, B-trees, Bloom filters, working with SAS data sets and taking endless classes in statistics/economics. Frankly, I loved it. My passion was the data structures and the insight that could be acquired by slicing the chunks. Organized data, in its prescribed context, is real. It’s not some pundit’s opinion. It is based on facts.

 For whatever reason, I ended up going down the “line of business” systems integration path. My career led me into workflow, document management, and supply chain technologies.  These were connective and provided an immersive business process environment, but the treatment of the data was still inefficient, and disrespectful to its value.

Yes, I have been branded a “purist” in my past life, but ultimately, my focus was more on the data and the data models than on corporate profit.  I consistently ran into the walls of “good enough”.  And I realized that rich data models weren’t considered profitable endeavors by the majority of the American industry.  Instead, most wanted little snippets of data, transactional crap, when we knew so much more.

Several industry initiatives showed up: “Business Intelligence” (BI), “Information Lifecycle Management” (ILM) and “Master Data Management” (MDM), all with the promise of a better strategy for data, but they were realized as purpose-built tactical systems to address specific real-life problems of sales growth, regulatory submissions and company acquisitions. None of them truly structured and exploited the fundamental Intellectual Property (IP) of the data that each company has about its customers, employees, and processes.

Add to these points that we all see “the disconnect” between data points in our daily lives. Banks that don’t recognize the “savings account you” from the “home loan you”. Or my employee electronic health records system, which shows all my doctor visit records and tests, but when I take an online health risk assessment, it still asks me pages of questions about blood pressure and cholesterol.

Well I am an optimist, and I believe that the current industry initiative called “Data Science” has structural differentiation from past trends and will likely get closer to my vision in several ways.

Data Science will be more scientific with data because:

  • Data is growing in such a manner that we are actually in trouble. We have multiple points of failure in people, process, and technology; and most industry leaders recognize it. Look at this page on the digital universe, if you don’t believe me. 
  • There is a convergence of three powerful movements.
    • Data-generating devices, both personal mobile devices and industrial assets, are creating data.
    • Global data mining and analytics is on the rise.
    • Significant improvements in data warehouse and data analytics technologies are allowing for the next level of processing.  Here’s EMC’s Big Data page as an example
  • Big Data is the aggregation and analysis of heterogeneous data sets/collectors.  The concept of big data is an important pillar in the new “Data Analytics” investments which will be pervasive over the next several years.  By definition, Big Data is not just about infinitely large purpose-built databases.  It’s like a hive of bees; Big Data to me is broader, more dispersed and hierarchical in nature.  It’s funny: we will know less about each individual piece of data, but en masse, we’ll know more about ourselves in many more ways.  Today these initiatives will be funded by the standard engines of power and profit, but some of the most impressive data science I have seen so far is in the scientific community.  Spend an hour watching TED Videos like Hans Rosling leveraging Microstrategy visuals on his HIV data analysis work.  Or read one of my favorite books of all time, Freakonomics, based on the data analysis work of economist Steven Levitt.

So, I am excited about the advancement of “Science” and “Data” in this emerging field.  EMC Corporation is showing industry leadership in this growing discipline, including funding studies and an annual conference for data scientists.   EMC’s Data Science Summit (EDSS11) on May 23, 2011 brought together an international consortium of data scientists to help define core fundamentals and highlight the growing need for resources in this field.  I applaud EMC for stepping into the proactive mentorship of the data industry. It’s a great fit for EMC and a place where we need to invest in advancement.  Additionally, EMC just published a survey from the summit.

Here are some of the summary findings:

       Informed Decision-making—Only 1/3 of respondents are very confident in their company’s ability to make business decisions based on new data.

       Looming Talent Shortage—65% of data science professionals believe demand for data science talent will outpace the supply over the next 5 years – with most feeling that this supply will be most effectively sourced from new college graduates.

       Customer Insights—Only 38% of business intelligence analysts and data scientists strongly agree that their company uses data to learn more about customers.

       Lack of Data Accessibility—Only 12% of business intelligence professionals and 22% of data scientists strongly believe employees have the access to run experiments on data – undermining a company’s ability to rapidly test and validate ideas and thus its approach to innovation.

       Advanced Degrees—Data scientists are 3 times as likely as business intelligence professionals to have a Master’s or Doctoral degree.

       Higher-Level Skills—Data scientists require significantly greater business and technical skills than today’s business intelligence professional. According to the Data Science Study, they are twice as likely to apply advanced algorithms to data, but also 37% more likely to make business decisions based on that data.

 You will note in the survey that data scientists are inherently different from BI professionals.  This confirms my beliefs that we’re going somewhere more all-encompassing than a “sales report” and that we’ll spend more time and money on the submerged part of the iceberg.

If you’re interested in next year’s summit click this link EDSS12

“One Throat to Choke” – Who’s Kidding? Your Hands Don’t Fit.

Well here I am, back from Oracle OpenWorld and banging out another week of work. OOW for me was 3 days, jam packed full of interactions with our customers, partners, and Oracle.   I found the show to be informative, and I definitely realized how invested EMC’s customers are in the Oracle-EMC solutions.  EMC and Oracle have been in an industry interlock through the last 2 decades together supporting some of the most impactful business processes in the world. As I talked to an endless stream of noteworthy customers, I felt that connection completely.  While there, I also poked around, attended keynotes, and talked to everyone who dared to look at me (a mistake they will not make twice!).

I’d like to tell you about all the insights I took away, but frankly, I can’t keep your interest for that long. So, let me work to edit and organize my thoughts. Today I will work in Vignettes. Why? Because, no one has told me I couldn’t, and I am feeling creative.

Vignette One: “Crazy Mixed Up [Open]World”

The scene opens in a super large room with bright lights and thousands of people, otherwise known as the Keynote hall.  Monday morning, I had the privilege to sit up front with a few of our customers and some serious players from EMC. Joe Tucci, CEO of EMC, kicked off the keynotes. After a strong rally of all the great things EMC is doing around the Cloud, he introduced Pat Gelsinger, who in turn introduced Chad Sakac. Pat and Chad had an entertaining presentation on EMC’s Big Data strategy. Data growth, EMC Greenplum, virtualization, analytics engines; many topics were reviewed in the context of EMC innovations. Pat also held up a new piece of EMC technology, Lightning flash cards. In beta, the Lightning flash cards have a CPU mounted on the blade, and they will provide a reported 320GB of lightning-fast flash per card. Chad followed Pat’s lead and began to demo VMware’s integrated vFabric Cloud Application Platform. This really showed how to take customer analytics requirements down through the software and hardware.  They also showed the card at work as it vaporized performance problems live on stage. The two ended by comparing the hypothetical auto insurance costs of 3 constituents you may have heard of: Gelsinger, Tucci, and Larry Ellison. Larry’s was the most expensive since he had a jet and a racing boat in his fleet of vehicles.  It was a humorous way to end, and a good time was had by all.

What struck me was how, in just a handful of years, EMC had gone from being a storage company to being a bundled solution company. Much of what was noteworthy in the presentation was the software and software integration that has been developed. And, other than the Lightning blade that Pat held up, there was little mention of the hardware.  Following the event, it was my job to take Pat to a meeting with a CIO from one of our customers. In that meeting and throughout the day, you could tell that the keynote message and the energy resonated.

Following the EMC keynote, Oracle took the stage for a couple of hours and presented on Exalytics and SPARC Super Cluster. They also reviewed Exadata and Exalogic updates. By circumstance, they reiterated many of the same functionalities EMC had discussed, but with an Oracle-specific platform to support Oracle apps. In their presentation, Oracle took shots at many of their new infrastructure competitors: “23x faster than”, “more gigabit capacity for”, and “2 more DRAM of that”. The dialog continued…

Whether I was bored or in a new enlightened state, there listening to the keynotes it hit me. Like an episode from the “Twilight Zone”, our two companies had switched places. “Freaky Friday”, but it’s only Monday… We were spending our time talking about software and Oracle was spending their time talking about hardware. I wonder how many of the thousands of people listening thought the exact same thing?  It goes to show, as Joe Tucci said, “Cloud is the most disruptive tech wave ever”. The vendors our customers have worked with for years are going through notable changes to provide for a new era of IT. The good news: customers have quality options to fulfill their requirements with, and they will vote with their wallets.

Vignette Two: “Congestion at the Intersection of Cloud Meets Big Data”

Ever been in a canyon in Arizona when a thundering horde of cattle came pounding in your direction? I’ve done some hiking in New Mexico and Arizona in my life and…well ok I saw a cow or two, but no stampede. The closest I ever came was last week at the EMC booth at OOW11. About every 5-10 minutes we would run a theater presentation and as the crowd left, you’d literally watch the booth staff step aside to avoid being trampled. I didn’t count them personally, but I know that way more than 13,000 people took a few minutes to talk to an expert or watch a show in our theater. Additionally we had EMC IT speaking about our transformation to virtualize our Oracle databases internally, we had EMC TV taping customer testimonials, and our meeting space was packed for 3 days straight. Unlike a traffic intersection, there’s always room for more.  Come join the movement!

Vignette Three:  “One Throat to Choke – if you have hands the size of Manhattan”

So let’s talk a little about clouds. The cloud is a lot like a Mainframe (MF), without ownership issues… What, you say?!?  Stay with me here… if you look at the systems in support of PaaS, IaaS, SaaS, etc., what are their major features? Virtualization, Scale, Consolidation, Multi-tenancy, Systems management, Chargeback, etc.  They are in many ways very similar to a big MF from the 1970s.  A major difference is that the MF was a vertically integrated, mostly proprietary, single-sourced product. The efficiency of the system was high, but the flexibility of the user to choose how she used the system was limited and costly.  It’s taken us 30 years to get back to the same concept with a small but massive innovation: choice.  The cloud is the cloud because it’s democratic. It’s made up of many providers providing a litany of options on open systems. You get the benefits of MF on a hyper scale.  These key concepts are the essence of the “tipping point” (Malcolm Gladwell) for the next wave of IT, and these concepts are what bothered me about Oracle’s strategy as they too join the cloud.

The keynote on Wednesday claimed that Oracle’s new public cloud offering is great because it’s standards based. This claim mainly hung on Java as the development platform. It was said many existing clouds and enterprise software are not valid because they are not based on these similar standards.

Yet now, for the third year in a row, Oracle announced new appliances and a proprietary version of Linux that continue to drive the Oracle apps and DB owners to a single-sourced, primarily proprietary solution. Luckily for the thundering horde, there are good alternatives that offer better alignment to their entire IT strategy.  However, the overwhelming message that this is somehow good for the industry is what I would call unproductive to the cause. A clear eye will see this as a trip “forward to the past”, back to a world Thomas J. Watson would recognize.

Vignette Four: “It’s Easier to Ask for Forgiveness than Permission”

A man walks into a doctor’s office and says “It hurts when I run my Oracle apps without Virtualization”; the doctor says “then virtualize”.  If there was a predominant dialog running through the entire show, it was customers asking if, when and how they can virtualize Oracle.  Oracle has traditionally tried to make it difficult to virtualize Oracle using VMware; [assumptive] because a lack of VMware drives demand to their appliances and thus OVM. However, this has been a small puddle in the path of progress that many have already crossed, for both non-production and more recently production DBs. With vSphere 5 the limitations have been removed and now it’s on a normal technology adoption cycle.  I already mentioned that a company as big as EMC is converting to an approximately 99% virtualized environment. We will see many customers virtualize the database in 2012, as described in this recent press release on American Tire Distributors.  Of the customers I spoke with, their primary concern was that the Oracle support contract states that if there is a problem that can’t be resolved, the customer may have to migrate to a physical environment to resolve it.   That’s not a crazy statement to have in a support contract, and it’s also not crazy for customers to be highly concerned about how this statement will be leveraged.  I appreciate that this big opportunity to better really important IT environments is also a risk, because they are so important. This is why a natural technology adoption cycle exists, and it is similar to the debate over virtualizing MS Exchange 5-6 years ago. We’re way past that one; the databases are next to be taken by the Virtual Tsunami.

Two recent surveys came out that I want to bring to your attention.

  • Storage Attach for VMware Environments (Source: Goldman Sachs IT Spending Survey, March 2011)
    • EMC went from 33% (Dec 2010), to 40% (Feb 2011)
    • next closest competitor was 17% (Feb 2011)
  • EMC #1 Choice for Application Storage (Source: IDC’s Worldwide Quarterly Storage Systems Tracker, Mar 2011, SUDS Survey)
    • Across seven categories including Oracle, SAP, SharePoint, Exchange, VDI, Analytics, EMC is #1. 
    • The 2nd and 3rd positions were not swept by any other vendor.

At EMC, we see this happening. There is no doubt the train has left the station. It’s your decision which car to jump on, or whether you’re taking alternate transportation.

 Vignette Five: “Tragically Upstaged”

On Wednesday, Larry Ellison held the keynote. If you’ve never seen Larry present, he’s casual, charismatic, and poisonous to his prey.  For Wednesday the prey was SAP & Like an Xbox shoot ’em up, there was gore everywhere; if you avoid the inaccuracies, it was a great demonstration of sleight of hand and showmanship. He also announced a few new offerings that I should spend some time on, but I’m going with a different angle here.

What really interested me happened close to the end of the keynote. Let me take you back to my blog “Shake Rattle and Roll”, where I talked about the different technologies I used versus my kids during the east coast earthquake. I wasn’t on social media and was thus less informed and connected than my daughters.   I am here to report I am reformed!  I was on Twitter during Larry’s speech, typing and reading. It is there that I got the sad news about Steve Jobs. I then watched people begin to get up and leave the keynote: first a trickle, then a flow, then a flood.  Those who were not on social media probably didn’t know what was happening.  I, however, was connected to my fellow techies at that moment, and although I was completely bummed by the news, I felt I had closed the gap just a little on the iGeneration.

Snakes Swallowing Hippos – Reducing the Gag reflex

I don’t want to imply the world’s corporations are “snakes”. However they remind me of a viral video I once saw of an African Rock Python who had swallowed a small hippo whole. The major growth into the world of “big data” resembles this metaphor. Many data analytics initiatives are like hungry snakes consuming colossal amounts of data at rates we did not conceive of five years ago. Ingestion, query processing, provisioning, indexing/data structures, backup, availability, retention, and governance requirements are all viciously attacking the traditional architectures, and we’re all learning the lessons that come with each innovation in the field.

Someone once explained to me that system performance is like plumbing. If you have a system full of partial clogs, but the water volume is 1% below capacity, everything works great. Once you reach capacity at any one spot, the system slows or stops (then your wife yells at you). If you remove the clog, that actually frees up more volume, which may create another delay later in the system. In other words, the software and hardware that make up a computer stack create a complex and interdependent system. Growth can outpace one or more capacities. Most of the challenges I see in our customers are related to limitations in capacity and throughput as the business world begins the era of big data.
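
For the hands-on readers, here is a minimal sketch of that plumbing idea in Python. It assumes the stack is a simple chain of stages in series, so the flow is capped by the tightest stage. The stage names and capacity numbers are made up for illustration, and the helper functions `bottleneck` and `effective_rate` are my own hypothetical names, not from any product.

```python
from typing import Dict, Tuple


def bottleneck(capacities: Dict[str, float]) -> Tuple[str, float]:
    """Return the name and capacity of the tightest stage in a serial pipeline."""
    name = min(capacities, key=capacities.get)
    return name, capacities[name]


def effective_rate(demand: float, capacities: Dict[str, float]) -> float:
    """Flow through stages in series is capped by demand and by the tightest stage."""
    return min(demand, bottleneck(capacities)[1])


# Hypothetical stage capacities, in arbitrary units per second.
stack = {"ingest": 120.0, "query": 90.0, "backup": 100.0}

print(bottleneck(stack))             # the "clog" is the query stage
print(effective_rate(80.0, stack))   # demand below every capacity: all is well
print(effective_rate(110.0, stack))  # demand above the clog: flow is capped

# Remove the clog by upgrading the query stage; the limit moves elsewhere.
stack["query"] = 150.0
print(bottleneck(stack))
print(effective_rate(110.0, stack))
```

Real stacks are not a straight pipe, of course, but the sketch shows the pattern: fixing one clog raises the flow only until the next tightest stage becomes the limit.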

 I want to point out that cloud, in theory alone, does not free us of this burden. These same agitants will transfer to the cloud. Cloud providers have to work diligently to tune capacity and utilization in multi-tenancy environments so that as you float around in your cloud, you don’t accidentally bump into the harsh side of an iron box. This has been witnessed in various news clips lately. Cloud will help through major consolidation, allowing for better utilization and for shared “slack”, but it is held to the same rules of physics. “If you run at a wall, eventually you will hit it”.

 OK, so back to data analytics. We are in a powerful growth phase of big data and there is a community of vendor companies on the edge of this movement. That includes traditional providers of data management software like Oracle. It also includes new innovations like EMC’s Greenplum data warehouse solutions, and Isilon for scale-out of unstructured data (e.g. videos, GIS maps).

I bring this up because there are only a few events a year where so many of the experts get together to discuss these topics, and one of those opportunities is in two weeks. If you’re game, and you can find a hotel room, dive head first into a sloppy big mess of big data @Oracle OpenWorld (#oow11). OOW will be held in San Francisco at the Moscone Center starting Oct 2nd with a reported attendee list of approximately 42,000 people. The majority will be bringing some of the freshest challenges and ideas to what will be a very interesting soirée.

So if you’re a snake looking to swallow a hippo, this show is a hotspot. EMC has been a long time player with a large customer base in support of Oracle and other mission critical applications. When it comes to big data, we help reduce the “gag reflex” of consuming large quantities of the stuff. Both the associated Oracle teams and the big data teams at EMC spend their time divining answers to the challenging questions of how to speed, backup, de-duplicate, copy, protect, and virtualize data systems to support the rate of growth. Imagine hundreds of guys as smart as Sam Lucido (@Sam_lucido) running around with the single mission to help customers “digest the hippo”. If you are going to Oracle OpenWorld here are some of the key topics we’ll be covering and their venue:

 o (Session ID 33580) Virtualized Oracle: More Performance, 80% Faster Provisioning and 50% less Cost Monday @ 11am

o (Session ID 33640) Oracle Data Warehouse meets Big Data: Blazing Fast Loads, Queries and Analytics Monday @3:30pm

o (Session ID 33600) Deduplication and Oracle RMAN: More Full Backups, Faster Recovery, Less Cost Tuesday @10:15am

o (Session ID 33660) Migrating Oracle to the Cloud: 10x More Performance and $7M saved on x86/Linux Tuesday @3:30pm

o (Session ID 33620) Clone Oracle Databases Online in Seconds with Oracle CloneDB and DirectNFS Wednesday @1:15pm

Ok let me especially highlight these 2 events where you can see my friend Ramesh Razdan from EMC IT present. Like “500 brake horsepower”, this is where “smoking rubber hits the road”. Ramesh is all facts when it comes to running one of the largest integrated and cloudifying IT organizations on the planet.

o (Session ID 3560) Ramesh will serve in a panel discussion as part of a Cisco session

o Ramesh Razdan will be presenting in Cisco Booth (#721) on 10/3 and 10/4 @3:30 pm.

Did I save the best for last? You bet. Joe Tucci, EMC’s CEO, is doing the Keynote on Monday at 8:00am (@oracleopenworld). Joe has been the mastermind of EMC’s growth through the acquisition of companies like VMware, Data Domain, Greenplum and Isilon. He’s at full pace as he takes EMC to the crossroads of Cloud Computing and Big Data. Super smart guy, and I think you’ll find inspiration in his session.

Ok, if you miss something while you’re there, you can always come by the booth (BOOTH 901) and be inundated with prizes, smiling faces, and #EMC experts to answer questions. There will be quite a few announcements, so keep your ear out that first week of October. Also keep a look out for the “Most Intriguing Man in IT”.

I look forward to a healthy slice of hippo…see you there!