
IBM Stream Computing

Just two weeks ago, our report on Semi-Structured data processing had a major gap: no real IBM presence. True enough, the IBM/Cognos, IBM FileNet and other IBM BI offerings certainly dealt with some aspects of semi-structured data, particularly FileNet’s positions in Content Management and Record Archiving. But integration across platforms through semi-structured data search, access, management and refinement just has not emerged yet from the IBM software phalanx.

Well, on the refining side that has changed. IBM System S puts them squarely in the continuous analytics marketplace. Here is what IBM says the System S mission is:

Data volumes are expected to double every two years over the next decade. The global economic slowdown is resulting in organizations seeking to become more nimble with their operations and more innovative with their decisions. In the face of exploding data volumes and shrinking batch time windows, these organizations are struggling to make ‘truly’ real time decisions and beat the competition. Existing tools and technologies that aid decision making by the Line of Business first require data to be recorded on a storage device and run queries after the fact to detect actionable insights. Savvy businesses are fast realizing that the time lost in this process leads to missed opportunity that might be the difference between success and failure. InfoSphere Streams addresses this gap effectively by providing a futuristic technology that can detect insights within data streams still in motion.

IBM has identified six key deliverables for System S – InfoSphere Streams as being:
1 – Perform complex analytics on data in motion;
2 – But still deliver sub-millisecond response time to events and changing requirements;
3 – Handle multiple structured and unstructured data types;
4 – Handle massive data volumes;
5 – Provide continuous analysis at rates higher than existing systems;
6 – Simplify development of streaming applications with the ability to seamlessly extend existing applications with new analysis types.
This is an ambitious set of performance goals considering what the competition (see below) can do. It is notable that IBM has chosen to deliver all these capabilities using Red Hat Linux (64 and 32 bit) and the Eclipse development environment. No word of other target platforms yet. Is this the slow but sure move away from the AIX and z/OS platforms?
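
To make the idea of analytics on "data in motion" a little more concrete, here is a minimal Python sketch. It is not InfoSphere Streams code and says nothing about how System S works internally; the function name, window size and threshold are all assumptions made up for illustration. It simply shows a rolling average being checked as each reading arrives, rather than after the data has been stored and queried.

```python
from collections import deque
from typing import Iterable, Iterator, Tuple


def rolling_alerts(
    readings: Iterable[float], window: int = 3, threshold: float = 100.0
) -> Iterator[Tuple[int, float]]:
    """Yield (index, rolling_mean) each time the mean of the last
    `window` readings exceeds `threshold`.

    Hypothetical helper: readings are consumed one at a time, so the
    check happens while the data is still arriving.
    """
    recent = deque(maxlen=window)  # holds only the current window
    total = 0.0
    for i, value in enumerate(readings):
        if len(recent) == window:
            total -= recent[0]  # oldest value is about to be evicted
        recent.append(value)
        total += value
        if len(recent) == window:
            mean = total / window
            if mean > threshold:
                yield i, mean


if __name__ == "__main__":
    # Made-up sample feed, for illustration only.
    feed = [90, 95, 100, 110, 120, 80]
    for index, mean in rolling_alerts(feed):
        print(f"reading {index}: rolling mean {mean:.1f} is above threshold")
```

Only the last few readings are held in memory, which is the essential difference from the record-first, query-later approach described in the IBM statement above.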

Note that streaming data analytics is not a new or underserved market, as this report from 2005 shows. And some major BI vendors like Attensity, MicroStrategy, SAS and SAP, among others, have a longstanding or newly emerging presence in real-time BI analytics. And both Microsoft and Oracle have staked a lot on their BI initiatives. For Microsoft, BI and data analytics is one bright spot in an otherwise challenging environment. So IBM will not be lofty and alone but rather will have plenty of competition in its Stream Computing initiatives.

BI Roadblocks?

BI has had mixed success out in the IT marketplace and Information Week is documenting that. Just 19% of business technology professionals report success in using BI to support business performance. As a BI supporter I was frankly aghast at how low this number is. But the survey cited a number of barriers:
1 – Complexity of BI tools and interfaces
2 – Cost of BI software and licenses
3 – Difficulty of assessing relevant, timely or reliable data
Source: Information Week, December 1, 2008 page 6

I am convinced these problems are addressable, especially with the new JavaScript UI frameworks (traditional languages such as C/C++ and Java are also workable, but the coding is much more demanding). Let's look at these factors one by one. First, the complexity of BI has several components:

1) There is the complexity of operating the interface and tools for gathering data and making decisions. The UIs for many BI systems have improved with more portals and user interfaces with drag and drop operations, etc. But what a lot of BI really needs is a Wizards option. This would take the user through each step of data preparation and analysis and explain what each step is intended to accomplish. This would be done in the context of a workline showing each major phase and reminding the user what that phase is intended to accomplish. The wizard assistant can be turned on and off by the BI user by showing or hiding the panel at the top of the screen. See the Coach system in Corel’s PaintShop Pro for one example of how this can be done. The key here is to never allow a BI user to get lost, yet make the Coach inconspicuous when not needed.

2) Next there is the complexity of the underlying query (pulling all the relevant and timely data together). This task, already difficult with structured data and its SQL and other data massaging, has just gotten worse with the addition of semi-structured and unstructured data sources. This is the realm of statistical, classification, and search methods which can quickly reach high sophistication. In turn, this can mean spinning wheels, amateur pitfalls, and simply getting lost in a forest of technologies and processes while bottom line conclusions get lost in the shuffle. This is often a critical step where a smart build or buy decision is required. Sometimes several steps of expert coaching can get organizations closer to where they want to be in decision making than completely outsourcing or doing a brute internal build.

3) Finally, the inherent complexity of the data means there are many contingencies and thus potential reversals of priorities in the analysis and decision making process. These changes in direction can sometimes easily defeat the coach steps outlined in 1) above. The best solution is to provide exit or breakout conditions when the results of analysis require a) turning to a new approach, b) stopping now and taking the results as likely the best that can be produced given the costs versus the benefits, or c) building an inherent agility into the basic analytics that allows users to analyze data in different and sometimes quite complex ways (just think of Monte Carlo risk analysis).

The key observation here is that though the range of analyses is potentially powerful, marshalling the data and following an agile process is still no guarantee of high ROI decision results. Yes, UIs can be made to offer inconspicuous coaching and contingent analysis steps, or processes with sophisticated branching which can provide further agility in matching evolving analysis to results. But every analysis step has an inherent first question: should I continue to the next stage of analysis, buy outside insights, or stop with the analysis being good enough given the costs incurred? Make, Buy or Stop.

What Will Moderate Costs of BI?

First and foremost, by constantly answering the Make, Buy or Stop question for every major step in a BI analysis. Prudent use of BI methods and tools will naturally control their costs. Second, the breadth and depth of Open Source BI software is just staggering, and the software is quite good:
R – for sophisticated statistical analysis and semi-structured data analysis
SugarCRM – for very good CRM analysis
Jedox – Palo is an OLAP engine linked with Excel for sophisticated OLAP analysis
Pentaho – for a complete suite of data collection and analysis tools
Birt and Jasper – for report writing and tabular analysis
Talend – ETL and basic data integration for basic data gathering
And in case you are worried, articles examining the viability of Open Source BI software from DMReview, Information Week, TDWI and BIguru provide the insights and appropriate cautions. But the bottom line is that Open Source BI, being inherently embeddable and a lower cost option, is now being offered by more consulting and VAR operations, with the result that users have a broader choice and a lid on prices for a broad range of BI services.

Return to Data Analysis

ETL is the basic process for building data sources or warehouses for analytic data. It is also at the forefront of breaking down the current incompatible silos of information that are associated with proprietary ERP, HR, Accounting and other first generation business systems, which were long on features except for sharing data with other systems. As noted above, this process of data gathering has been further complicated by the need to go into Semi-Structured data (think tables, spreadsheets, lists, and other first cut organizing of data) and Unstructured data (raw data feeds or articles and texts about a subject with miscellaneous citations, some semi-structured charts and analysis, etc).
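
As a rough illustration of the extract, transform and load steps, here is a minimal Python sketch that pulls records from two pretend silos (a CSV export and a JSON export), joins them on a shared key, and loads the result into an in-memory SQLite table. The sample data, the column names and the `staff_expenses` table are all invented for this example; real ETL tools such as Talend do far more than this.

```python
import csv
import io
import json
import sqlite3

# Hypothetical exports from two siloed systems (made-up data).
HR_CSV = "emp_id,name,dept\n1,Ada,Finance\n2,Grace,IT\n"
ACCOUNTING_JSON = (
    '[{"employee": 1, "expenses": 120.5}, {"employee": 2, "expenses": 80.0}]'
)


def extract():
    """Extract: read raw rows from each source format."""
    hr_rows = list(csv.DictReader(io.StringIO(HR_CSV)))
    acct_rows = json.loads(ACCOUNTING_JSON)
    return hr_rows, acct_rows


def transform(hr_rows, acct_rows):
    """Transform: bring both sources onto one schema keyed by employee id."""
    expenses = {row["employee"]: row["expenses"] for row in acct_rows}
    return [
        (int(r["emp_id"]), r["name"], r["dept"], expenses.get(int(r["emp_id"]), 0.0))
        for r in hr_rows
    ]


def load(rows):
    """Load: write the unified rows into an analytic store (in-memory here)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE staff_expenses "
        "(emp_id INTEGER, name TEXT, dept TEXT, expenses REAL)"
    )
    conn.executemany("INSERT INTO staff_expenses VALUES (?, ?, ?, ?)", rows)
    return conn


def run_etl():
    return load(transform(*extract()))


if __name__ == "__main__":
    conn = run_etl()
    for row in conn.execute("SELECT * FROM staff_expenses ORDER BY emp_id"):
        print(row)
```

The point of the sketch is only that each silo needs its own extraction step before the data can sit side by side; semi-structured and unstructured sources make the transform step considerably harder than this.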

Given these observations, it is well worth pointing out that information has at least 7 Critical Dimensions. This gives users an idea as to how complicated the management of data itself and the associated analyses can be. The value of the data transformed into information is measured by a) the reliability of the predictions/projections that information produces; b) the amount of risk and uncertainty the analysis removes compared with the no-analysis situation; and c) the inherent value add of the new actions predicated on the predictions/projections. These are the basic ingredients/criteria for the Make, Buy or Stop decision. Thus, data analysis can quickly become a wicked problem where optimal solutions are few and far between, and just doing better is often a winning strategy. Hence it is well to know that in some circumstances, Data Analysis is critical to good decision making but may not always be able to provide optimum decision making.
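
One way to picture the Make, Buy or Stop question is as a small cost-versus-benefit check run before each major analysis step. The Python sketch below is an assumption-laden toy, not a method from any BI product: the `NextStep` fields, the formula that weights risk reduction and action value by reliability, and the sample figures are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class NextStep:
    """Estimates for the next analysis step (all values are hypothetical)."""
    reliability: float     # a) expected reliability of the predictions, 0..1
    risk_reduction: float  # b) value of uncertainty removed vs. no analysis
    action_value: float    # c) value added by the actions the analysis supports
    make_cost: float       # cost of doing the step in-house
    buy_cost: float        # cost of buying outside insight


def make_buy_or_stop(step: NextStep) -> str:
    """Return 'make', 'buy' or 'stop' for the next step.

    Assumed scoring rule: benefits only count to the extent that the
    predictions can be relied on.
    """
    expected_benefit = step.reliability * (step.risk_reduction + step.action_value)
    net_make = expected_benefit - step.make_cost
    net_buy = expected_benefit - step.buy_cost
    if max(net_make, net_buy) <= 0:
        return "stop"  # neither option pays for itself
    return "make" if net_make >= net_buy else "buy"


if __name__ == "__main__":
    print(make_buy_or_stop(NextStep(0.8, 20_000, 50_000, 30_000, 45_000)))
    print(make_buy_or_stop(NextStep(0.8, 20_000, 50_000, 60_000, 45_000)))
    print(make_buy_or_stop(NextStep(0.3, 5_000, 10_000, 8_000, 6_000)))
```

In practice the three inputs are themselves rough estimates, which is exactly why "just doing better" rather than finding the optimum is often the realistic goal.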
