IBM Stream Computing

Just two weeks ago, our report on Semi-Structured data processing had a major gap: no real IBM presence. True enough, IBM/Cognos, IBM FileNet and other IBM BI offerings certainly dealt with some aspects of semi-structured data, particularly FileNet's positions in Content Management and Record Archiving. But integration across platforms through semi-structured data search, access, management and refinement had not yet emerged from the IBM software phalanx.

Well, on the refining side, that has changed. IBM System S puts the company squarely in the continuous analytics marketplace. Here is what IBM says the System S mission is:

Data volumes are expected to double every two years over the next decade. The global economic slowdown is resulting in organizations seeking to become more nimble with their operations and more innovative with their decisions. In the face of exploding data volumes and shrinking batch time windows, these organizations are struggling to make ‘truly’ real time decisions and beat the competition. Existing tools and technologies that aid decision making by the Line of Business first require data to be recorded on a storage device and run queries after the fact to detect actionable insights. Savvy businesses are fast realizing that the time lost in this process leads to missed opportunity that might be the difference between success and failure. InfoSphere Streams addresses this gap effectively by providing a futuristic technology that can detect insights within data streams still in motion.
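
To make the "data in motion" idea concrete, here is a minimal sketch in Python of the difference IBM is describing: instead of storing every record and querying afterwards, each reading is analyzed as it arrives and only a small window is kept in memory. This is purely illustrative and is not InfoSphere Streams code; the function name `rolling_alerts` and its `window` and `threshold` parameters are assumptions made up for this example.

```python
from collections import deque
from typing import Iterable, Iterator, Tuple


def rolling_alerts(
    readings: Iterable[float], window: int, threshold: float
) -> Iterator[Tuple[int, float]]:
    """Yield (index, mean) whenever the mean of the last `window` readings
    exceeds `threshold`. Hypothetical helper, not part of any IBM product."""
    recent = deque(maxlen=window)  # only the current window is held in memory
    total = 0.0
    for i, value in enumerate(readings):
        if len(recent) == window:
            total -= recent[0]  # the oldest reading is about to be evicted
        recent.append(value)
        total += value
        if len(recent) == window:
            mean = total / window
            if mean > threshold:
                # The alert is raised while the stream is still flowing.
                yield i, mean
```

Because `readings` can be any iterable, including a generator fed by a socket or a message queue, the alerts come out as the data flows past rather than after a batch load has finished.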

IBM has identified six key deliverables for System S – InfoSphere Streams:
1 – Perform complex analytics on data in motion;
2 – Still deliver sub-millisecond response time to events and changing requirements;
3 – Handle multiple structured and unstructured data types;
4 – Handle massive data volumes;
5 – Provide continuous analysis at rates higher than existing systems;
6 – Simplify development of streaming applications with the ability to seamlessly extend existing applications with new analysis types.
This is an ambitious set of performance goals considering what the competition (see below) can do. It is notable that IBM has chosen to deliver all these capabilities using Red Hat Linux (64-bit and 32-bit) and the Eclipse Development Studio environment. No word of other target platforms yet. Is this the slow but sure move away from the AIX and z/OS platforms?

Note that streaming data analytics is not a new or underserved market, as this report from 2005 shows. Some major BI vendors, among them Attensity, MicroStrategy, SAS and SAP, have a longstanding or newly emerging presence in real-time BI analytics. Both Microsoft and Oracle have also staked a lot on their BI initiatives. For Microsoft, BI and data analytics is one bright spot in an otherwise challenging environment. So IBM will not be lofty and alone but rather will have plenty of competition in its Stream Computing initiatives.