Thursday, January 5, 2012

The Data Scientist


 
Recently the job title of "Data Scientist" has become in vogue.

Just like the buzzword "Business Intelligence" evolved from earlier concepts, the function of the "Data Scientist" is hardly new. In fact, one of its predecessors, the "Statistician", had the aura of being dull & boring, back when being a geek was not yet chic. So did we need a new title to justify the role suddenly being cool?

Language in social context can be a funny thing. All the more ironic that many challenges in business intelligence and data analytics these days revolve around semantics and ambiguity of source data, the very phenomenon that humans seem to intrinsically create.

It also seems convenient how suddenly an army of consultants, tool vendors and job seekers smoothly morphed their existing portfolios and previous experience into being "Experts" in Business Intelligence or, well, "Data Science". In the end, I find my friend Jeff's self-chosen title of "Data Archeologist" cooler.

To be fair, what emerges under this new moniker of Data Scientist is an amalgamation and refinement of previously distinct disciplines, such as Business Analyst, Database Administrator, Software Developer, Statistician, etc. Yet Information Science (not to be confused with Computer Science) has always concerned itself with topics only now hitting the limelight in terms of desirability in businesses. The primary factor in the role of Data Scientist going viral may just be businesses' realization that there is money to be made from insights into the data exhaust of our business processes and social interactions on the Internet. The classic "build it and they will come": now that this mass of data is available, we might as well use it, and "oh wow, you can actually monetize all this?!".

My hunch is that the role of the analytical and data integration specialist will eventually fade into an expected subset of skills for the modern knowledge worker, both as individual contributor and as manager. If you look at how much responsibility over a broad range of complex decisions the average white-collar worker has these days, it is as if they were running their own business within the larger organization, information tools and all. So it would make sense that eventually, between tool simplification, data-integration automation, and commodity pricing of related products/services, "data science" will be yet another expected business skill set, just as fluency with Microsoft Office products and PCs became the norm over the old business standards of 30 years ago.

Tuesday, November 1, 2011

Journey into Semantic Space

How context affects meaning and its transition through situational participants....

This is a more generic view on the pattern alluded to in previous posts....

B.I. Tools don't replace human judgment

Along the lines of the earlier point about Augmented Intelligence, it should be re-emphasized that even the fanciest Business Intelligence tool and infrastructure does not replace critical thinking by humans.

Currently there is a bubble developing around visualization techniques. The mere visualization of data is hailed as "Business Intelligence", while the semantic layer ("what does this data mean?") is still neglected.

Computers' main purpose is to empower human thinking, to speed up tedious tasks, and to help manage vast amounts of data that would overwhelm the human mind. But the actual judgment, as to what the data reveals, is still a human responsibility.

Traditional decision support systems wait for humans to pose questions for the DSS to answer. More current approaches let the software and data provide proactive insights human users were not even aware of. But how to use these insights is entirely up to the business user.
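As a toy illustration of such a proactive insight (the data and the simple z-score rule below are purely illustrative, not any particular BI product's method), software can scan a metric history and surface outliers without being asked:

```python
from statistics import mean, stdev

# A minimal sketch of "proactive insight": instead of waiting for a
# question, scan a metric history and flag anomalous values.
def surface_insights(history, threshold=2.0):
    """Return (index, value) pairs lying more than `threshold`
    standard deviations away from the mean."""
    mu, sigma = mean(history), stdev(history)
    return [(i, v) for i, v in enumerate(history)
            if sigma and abs(v - mu) / sigma > threshold]

daily_sales = [100, 98, 103, 101, 97, 155, 99, 102]  # toy data
print(surface_insights(daily_sales))  # flags the spike on day 5
```

What the user then does with a flagged spike (investigate, ignore, act) remains, as argued above, a human decision.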

If business analytics solutions were powerful enough to actually make the decisions as well, then business people wouldn't be needed anymore. And those decision engines had better be implemented by business-savvy developers.

Friday, April 22, 2011

Ambient Analytics

A while back I talked about Augmented Intelligence, tools that help us humans make ever more complex decisions in our increasingly detailed and networked world. Business Intelligence is just a sub-section of that, with businesses being early adopters of tools that help efficiency, because they have the funding and a sense for investing strategically. Eventually these tools go mainstream and become available to the masses. Just think about how computers evolved out of the corporate realm into everyday use by individuals.

Microsoft has this slogan, "Business Intelligence for the Masses", and was pretty successful in taking the tools & interfaces of traditional data warehousing & reporting to a more intuitive level, so more people could understand and adopt the technology. The Microsoft BI Stack doesn't really do anything groundbreakingly new from a functional solution perspective, but it shines in being fairly integrated and comparatively intuitive to use.

Along this philosophy, I believe that the next step in augmented intelligence will be a push for "Analytics for the Masses". This thought inspired my online identity as "Analytics-To-Go".

While the nitty-gritty under the covers is based on heavy statistical algorithms, usability will become a huge factor in success. And that does not just mean intuitive user interfaces. It also pertains to reliability, the confidence factor. A tool can spit out all kinds of recommendations and answers, but if the user doesn't trust it, there will be no use, no acceptance, no adoption.

Developments are already under way to address the intuitive user interface, for example car companies working with navigation solution providers on voice-controlled human/machine interaction. This in itself is a field full of opportunity: voice recognition (tone, mode, speed, dialect/accent, etc.).

Yet, there is a whole other layer of complexity to be solved for the results of the augmented intelligence system to be useful to the user. Keyword SEMANTICS.

Semantics is about context: "what does it mean?". A businessman tells his car computer, "Call my wife and tell her I am running late for dinner." The system will have to figure out who the wife is, how to reach her, and what "late" means (an inferred ETA?).
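The resolution steps involved can be sketched in a few lines. Everything here (the contact book, the calendar, the fixed clock time) is hypothetical scaffolding for illustration, not a real assistant API:

```python
from datetime import datetime, timedelta

# Hypothetical personal context the assistant would need access to.
contacts = {"wife": {"name": "Anna", "phone": "+1-555-0100"}}
calendar = {"dinner": datetime(2011, 4, 22, 19, 0)}   # dinner planned for 7 pm
now = datetime(2011, 4, 22, 18, 30)                   # fixed "now" for the example

def resolve_request(relation, event, minutes_until_arrival):
    """Resolve 'call my wife and tell her I am running late for dinner'."""
    contact = contacts[relation]            # who is "my wife"?
    planned = calendar[event]               # what does "dinner" refer to?
    eta = now + timedelta(minutes=minutes_until_arrival)
    minutes_late = int((eta - planned).total_seconds() // 60)
    return (contact["phone"],
            f"Running about {max(minutes_late, 0)} minutes late for {event}.")

phone, message = resolve_request("wife", "dinner", 45)
print(phone, "->", message)
```

The hard part, of course, is not this lookup logic but acquiring and maintaining that personal context reliably, which is exactly the semantic layer discussed here.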

One way to go is to store all possible information in a huge database, akin to Deep Blue when playing chess against Grand Master Kasparov. More advanced, and where the trend is heading, is a Watson/Jeopardy approach: interactive learning, just as humans do it. After all, intelligence is not just about putting many "facts" together, but about considering newly evolving information from different sources and putting it into proper context.

That's where sensors and higher-level machine/environment interfaces come in. Just as humans rely on their biological senses to take in information about their surroundings, technology-driven data acquisition will play a pivotal role in Ambient Analytics. Just consider how modern automobiles have sub-systems constantly acquiring environmental information, like proximity to surrounding cars, lane tracking, tire pressure, speed, location, deceleration/momentum, and driver drowsiness, and then take this information into account for active driver assistance in augmented vehicle control: proactive seatbelt tightening, brake priming, lane departure alerts, blind spot alerts, steering wheel vibration to keep drivers alert when they doze off, etc.

These systems need to be highly adaptable and cannot rely on heavy top-down configuration/programming, or even static update schedules. The science behind this approach is called machine learning. But the important part is that the science becomes usable in a practical scenario, not just functional but user-acceptable. Again, confidence level drives adoption.

Also worth considering: most corporate-driven analytics solutions tend to be rather top-heavy, aimed at strategic insight. This is in line with many businesses' investment horizon. The opportunity in the mass market, though, is more in tactical decision-making assistance (another angle on augmented intelligence). The "for the masses" market breakthrough will be in delivering "just enough, and just right", and not overbuilding or overengineering and losing perspective of the actual opportunity. Integration of all the tactical solutions will be an evolution that can only emerge from the bottom up, i.e. from adoption and how users apply it. Just like kids rarely ever use their toys the way they were intended, but put them together in new creative ways the adult toy designers never dreamt of. Give end users flexible tools and they will come up with new uses.

Tactical Analytics will be the new killer app!

So the high-level approach to augmented intelligence is to solve the systematic & market challenges of:
  • Usability (e.g. natural language interface, high confidence level)
  • Context (semantics, the situational meaning of things, the user's perspective)
  • Active Machine Learning (tracking changes, reconciling different information sources, applying to contexts)
  • Shift from Strategic to Tactical Analytics (focused opportunities will eventually merge into broader strategic patterns through social adaptation, exchange, and user creativity)

The concepts are becoming mature, and the technology is largely there. The opportunity moving forward will be integrating it into useful products, creating the next killer application, just as mobile device adoption exploded when the intuitive touch interface, built-in camera, location awareness (GPS), and network-connected multimedia hub all came together in a handy package. And mobile devices will most likely play a key role in the augmented intelligence evolution.

Thursday, April 14, 2011

Obsolete Top Down Data Modeling!

Just wanted to quickly capture a thought before it escapes me... will elaborate more later. Here goes...

So much effort has been going into designing and building relational data models, often in an information vacuum, without fully understanding (or even having seen) the source data intended to go into that model.

At best, considerable effort went into profiling the source data. Many tools exist to automate profiling. Why not have them generate the data model for us? We entrust tools to help us with predictive modeling. How about having them help us model the database for that which is already there?

If we can look into the future, shouldn't it be easy to look into the past? I say: let's automate data modeling, no more hand-crafted ERDs, no more ivory-tower modeling!
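The principle can be shown with a toy sketch: profile sample records, infer column types, and emit DDL. Real profiling tools would also derive keys, cardinalities, and relationships; the names and sample data below are made up for illustration:

```python
# Toy profiling-to-schema generator: infer SQL column types from
# sample records and emit a CREATE TABLE statement.

def infer_type(values):
    """Pick the narrowest SQL type that fits all sampled values."""
    def all_match(cast):
        try:
            for v in values:
                cast(v)
            return True
        except ValueError:
            return False
    if all_match(int):
        return "INTEGER"
    if all_match(float):
        return "REAL"
    return f"VARCHAR({max(len(v) for v in values)})"

def generate_ddl(table, rows):
    columns = {col: [row[col] for row in rows] for col in rows[0]}
    cols = ",\n  ".join(f"{name} {infer_type(vals)}"
                        for name, vals in columns.items())
    return f"CREATE TABLE {table} (\n  {cols}\n);"

sample = [
    {"order_id": "1001", "amount": "19.99", "customer": "Acme Corp"},
    {"order_id": "1002", "amount": "5.00",  "customer": "Globex"},
]
ddl = generate_ddl("orders", sample)
print(ddl)
```

A modeler would still review and refine the generated schema, but the hand-crafted starting-from-a-blank-page step goes away.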

Once again, the rift between intent & need. We master building machines that make things happen fast, and software that automates processes. Now let's focus on having them do the right thing.

Sunday, April 10, 2011

Testing Perspectives

Got to love the ambiguity in this headline. I don't mean testing some/the perspectives, rather: "Perspectives on Testing".

In my latest project I had the opportunity to get more involved in a formal testing strategy of our B.I. deliverable. As I tried to prepare myself for the best approach, I realized that there is not much literature on how to properly test business intelligence architectures. While there are plenty of treatises on general software development, most with emphasis on user interfaces or transactional systems, there is not much that covers the qualitative aspects of an end-to-end Business Intelligence solution.

One could argue that there is nothing new under the sun, even in the B.I. space. It basically just comprises user interface elements, database tiers, and some sort of data processing system (ETL can be seen as the transactional component).

And in the end, the biggest challenge is not the technical/functional component, but the process, the business.
That pattern is not much different from the requirements management arena. In fact, with the popularity of test-driven development (TDD), the tendency is to link requirements more closely with testing anyway.

If we narrow it down to the important aspects of testing, we arrive at something like this:



While test requirements are at the center, as always in life, what they mean, how they fit into the bigger picture is a matter of perspective.




A good testing process, or good testers as individual professionals, reconciles user needs with the project plan, the developed artifacts, and the impact they all have on the business, in B.I. as well as in other business solutions.

To be continued...



Friday, March 11, 2011

Augmented Intelligence

The current state of affairs in Business Intelligence is a rather top-down driven and manual process. 

Let's say a business user is interested in how a business process is performing. If they are lucky, a performance dashboard for monitoring key performance indicators (KPIs) already exists. If not, it will have to be developed: a request-response situation through the whole IT/developer/analyst organizational stack. In more sophisticated businesses, possibly with sophisticated tools, the business user can configure new KPIs on the fly, assuming they have the proper understanding of KPI logic and proper tool usage.

Three issues stand out from this approach:
  1. Turn-around time (lack of agility, due to reliance on human-driven process)
  2. Business people's focus gets side-tracked by required tool & database knowledge
  3. Potential for error (as always in overloaded human responsibilities)
So what can we do? How about automating the mundane tasks and leaving humans with the work they are good at? Welcome to "Augmented Intelligence". Instead of going from one extreme (manual process) to the other (Artificial Intelligence), how about a balance: leave the tedious work to computers, and leave the more fluid, ambiguous information that requires human judgment to people. But how do we make sure that computers don't override human judgment, and, vice versa, that humans don't second-guess proven algorithmic correlations of real data?

That's where feedback loops and an iterative approach come in. Delegate and monitor, trust but verify, do and review. And track the history of all decisions made. If business changes are applied, keep tracking the old reality and compare it to the new (something like A/B testing in marketing). You can only know whether something works by comparing it to the alternative(s).
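The "keep tracking the old reality" idea boils down to keeping both KPI histories and comparing them. A minimal sketch, with made-up cycle-time numbers purely for illustration:

```python
from statistics import mean

# Track the KPI under the old process alongside the new one,
# then compare the means (toy data, days per order).
old_cycle_times = [12.1, 11.8, 12.4, 12.0, 11.9]   # before the change
new_cycle_times = [10.2, 10.8, 10.1, 10.5, 10.4]   # after the change

def improvement(old, new):
    """Relative change of the mean; negative means the KPI went down."""
    return (mean(new) - mean(old)) / mean(old)

delta = improvement(old_cycle_times, new_cycle_times)
print(f"Cycle time changed by {delta:+.1%}")
```

A real comparison would also check statistical significance before declaring a winner, but the point stands: without the old series, the new one tells you nothing.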

Make things easy for people, use common interfaces: Web, Email, Wikis, News/Event formats.

Simulate a work-flow they are used to from their personal life: Send messages. Read how-to articles. Get News. Write letters to the editor. Publish their ideas.

Get buy-in from your organization by cultural popularity, make it another social engagement opportunity. Accountability & motivation will evolve naturally as people enjoy these more flexible and bottom-up driven opportunities to make an impact.

Don't have the computers try to do the complicated things humans can easily do, and don't have people do dull tasks that are much better performed by computers.


