Thursday, October 31, 2013

Sample to Population - The Big Data Leap

Technology's capacity to churn through huge amounts of data has ushered in a new era in statistics.

Let's revisit the definition of sampling: "sampling is concerned with the selection of a subset of individuals from within a statistical population to estimate characteristics of the whole population". 'Subset' is the term that expresses the statistician's workaround to get as close as possible to the population spread. As we know, getting the sample right is of paramount importance.
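To make the idea concrete, here is a minimal Python sketch of estimating a population characteristic from a subset; the synthetic population here is just a stand-in for real data:

```python
import random
import statistics

# A synthetic "population"; in practice this would be the full dataset.
random.seed(42)
population = [random.gauss(100, 15) for _ in range(1_000_000)]

# The statistician's workaround: a random subset of the population.
sample = random.sample(population, 1_000)

print("population mean:", statistics.mean(population))
print("sample estimate:", statistics.mean(sample))
```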

But what if we could get the whole population's data? With the advent of Big Data moving in fast, that is now a clear possibility. With massively parallel processing of the Map-Reduce framework on a distributed file system, it's all possible. To top it off, the likes of Revolution R and Mahout are already here.
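As a toy illustration of the map-reduce idea, here is a Python sketch that computes a population mean in two phases, with a process pool standing in for a cluster; the chunk size and function names are purely illustrative:

```python
from functools import reduce
from multiprocessing import Pool

def map_phase(chunk):
    # The "map" step: emit a partial (sum, count) per chunk.
    return (sum(chunk), len(chunk))

def reduce_phase(a, b):
    # The "reduce" step: combine partial results.
    return (a[0] + b[0], a[1] + b[1])

if __name__ == "__main__":
    data = list(range(1_000_000))  # stand-in for the full population
    chunks = [data[i:i + 100_000] for i in range(0, len(data), 100_000)]
    with Pool() as pool:
        partials = pool.map(map_phase, chunks)
    total, count = reduce(reduce_phase, partials)
    print("population mean:", total / count)
```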

Statistics as we know it today is on a journey to accept the population, rather than the sample, as the base data for analysis. Better estimates and predictions are on their way....

Tuesday, September 3, 2013

Bus Parallel Processing Architecture - Cloud Ready

Business Intelligence conceptual design paves the way for BPPA (Bus Parallel Processing Architecture). The Bus Architecture is the epicenter of a well-knit dimensional model: it translates business processes and sub-processes into star schemas, with dimensions drawing on the master data and transactions finding their way into the facts. The knitting is taken care of through conformed/shared dimensions.

The logical layer takes the bus architecture onto implementation turf, developing the database ER model that leads to the creation of database tables and brings the star schema into existence in a relational database. The physical layer provides the infrastructure for the DW, i.e. servers, networks etc.

BPPA works on the principle of dimensional redundancy across servers (virtual servers). Each fact moves into a dedicated box, thus allowing the enterprise DW to scale out and enabling a cloud move for the DW. Conformed/shared dimensions are made redundant on each respective star schema. The cost, or overhead, of this approach is maintaining the redundant copies of dimensions. The ROI is a performance gain and the ability to port the scaled-out solution to multiple small servers, and eventually to the cloud.
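A minimal Python sketch of the redundancy principle, with hypothetical node and dimension names, might look like this:

```python
import copy

# Hypothetical conformed dimensions, shared across star schemas.
dim_customer = {1: "Acme Corp", 2: "Globex"}
dim_date = {20130901: "2013-09-01", 20130902: "2013-09-02"}

# Each fact table lives on its own (virtual) server; the conformed
# dimensions are copied onto every node so joins stay local.
nodes = {
    "sales_node":    {"fact": "fact_sales",    "dims": {}},
    "shipment_node": {"fact": "fact_shipment", "dims": {}},
}
for node in nodes.values():
    node["dims"]["customer"] = copy.deepcopy(dim_customer)  # redundant copy
    node["dims"]["date"] = copy.deepcopy(dim_date)          # redundant copy

# A query against fact_sales resolves its dimension keys locally,
# without a cross-server join -- the performance gain BPPA aims at.
print(nodes["sales_node"]["dims"]["customer"][1])  # -> Acme Corp
```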

To top it off, fact partitioning can be done, as most database vendors now provide it out of the box, thus reducing the IO footprint of fact access.
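As a rough illustration of how partitioning cuts the IO footprint, here is a toy Python sketch of range-partitioning fact rows by month; real databases implement this natively, and the keys and figures here are made up:

```python
from collections import defaultdict

# A tiny fact table with illustrative rows.
fact_sales = [
    {"date_key": 20130105, "amount": 120.0},
    {"date_key": 20130214, "amount": 75.5},
    {"date_key": 20130220, "amount": 42.0},
]

# Range-partition by month: partition key = YYYYMM.
partitions = defaultdict(list)
for row in fact_sales:
    partitions[row["date_key"] // 100].append(row)

# Partition pruning: a February query scans only the February partition.
feb_total = sum(row["amount"] for row in partitions[201302])
print("Feb 2013 sales:", feb_total)
```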


Tuesday, April 2, 2013

Futuring Now


History holds the seeds of the Future. Identifying these seeds is what a Business Leader needs to do.

The information age has made it possible to capture data on almost every activity. Unraveling the patterns in this data is the key to the future and to decision making. So where to start? Today there is a comprehensive discipline of business intelligence: dimensionalising our data structures and applying permutations and combinations to better understand our business processes.

But that's just the beginning. Actuarial science is one of the most evolved applications of peeping into the future on the basis of history. It's time for statistical pattern analysis and prediction to take center stage in converting business data into artificial intelligence.
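As a toy illustration of prediction from history, here is a sketch that fits a linear trend to made-up monthly figures and extrapolates one period ahead:

```python
import numpy as np

# Twelve months of illustrative history: a trend plus noise.
months = np.arange(1, 13)
revenue = 100 + 5 * months + np.random.default_rng(0).normal(0, 3, 12)

# Fit a linear trend to the history...
slope, intercept = np.polyfit(months, revenue, 1)

# ...and "peep" one month into the future.
forecast = slope * 13 + intercept
print(f"month 13 forecast: {forecast:.1f}")
```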

With today's computing power and technological breakthroughs (i.e. big data, the Hadoop distributed file system), huge amounts of data can be churned in a scale-out manner, and in the virtual space of the cloud at that. Artificial intelligence hits ground reality as the quantum of data used increases and comes closer to depicting the population (in the statistical sense).

Business empires will be built on futuring now, and technology organisations have a great opportunity in this.

Wednesday, March 13, 2013

Performance, Elasticity and Scalability


Performance tuning is done to improve system performance. The prerequisite to performance tuning is monitoring. Monitoring forms the basis for building a system performance baseline, and the baseline provides the reference for evaluating the system's performance.
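As a rough illustration, here is a Python sketch of building a baseline from monitored timings and evaluating a later run against it; the workload and the percentile threshold are placeholders:

```python
import statistics
import time

def timed(fn):
    # The monitoring probe: measure one execution of fn.
    start = time.perf_counter()
    fn()
    return time.perf_counter() - start

def workload():
    sum(i * i for i in range(100_000))  # stand-in for the real system call

# Build the baseline from repeated observations...
baseline = [timed(workload) for _ in range(50)]
p95 = statistics.quantiles(baseline, n=20)[-1]  # 95th-percentile reference

# ...then evaluate a later run against it.
latest = timed(workload)
print("regression" if latest > p95 else "within baseline")
```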

A system's ability to accommodate changing conditions or circumstances is called elasticity. If a system is resilient and requires minimal changes to accommodate new conditions, the system has elasticity.

In many cases system performance decreases, for example with increasing load volumes or a growing number of executed applications. The system's resilience to these pressures while maintaining its performance is termed scalability.

All three, i.e. performance, elasticity and scalability, are qualities of the system and are the cumulative result of designing it with a proven methodology. The architectural approach is the deciding factor here: the design of the system from the conceptual layer, through the logical, to the physical layer contributes to all of them.

Conceptual layer design needs a proper understanding of the business processes and their relative importance to the business top line. This ensures elasticity is built into the system, as the conceptual design delineates a detailed understanding of the business requirements. Towards this, the Bus Architecture is of prime importance, as it provides a top-down integrated view of the enterprise.

Logical layer design requires process and data modeling expertise, to ensure all the permutations and combinations of the business requirements are developed.

Physical layer design needs appropriate capacity planning of hardware and configuration of software/services.

Monday, October 8, 2012

UI Crossroads


User Interface: unification, and then disintegration. So, from browser to apps; WHY? It is the need-to-want journey, i.e. from generic to specialized to personalized User Interfaces (UI).

As technology evolves, it allows more customization and sophistication. Today we are standing at another crossroads, where ease of use is taking the front seat and standards are taking a hit; thus we now create more than one UI for our application, namely a Web UI, iPhone UI, Android UI, Windows UI....

Hang on; the verdict is not out yet. HTML is making a mega comeback; watch out for HTML5. A platform-independent UI is about to be unveiled.

The SOA thought process demands a platform-agnostic UI. But the business world thinks otherwise, at least until the technology proves its ROI. Yet we have seen time and again that technology eventually supersedes the profit machines (corporate houses). Case in point: the extensive use of SOA frameworks to integrate diverse systems with ease.

Effort turnaround time points towards Platform Independent UI.

Tuesday, August 7, 2012

Massive Parallel Processing In Cloud - "BIG Data"


How was the Pyramid of Giza built? What does it take to run a chip? What is the Universe made of?

The basic fabric of nature is this: 'the building block needs to be atomic and should be available in plenty'. The same applies to the computing world. There is a limit to scaling up, because as we scale up, maintenance overhead increases, and soon we reach a point where the scale-up advantage is nullified by the corresponding increase in maintenance overhead.

So the way forward is nature's way: scale out. The concept: when many atomic computing resources are leveraged in parallel, the impact is massive. And that's what we call MPP - Massive Parallel Processing.
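A toy Python sketch of the scale-out idea, splitting one CPU-bound job across many small workers, with a process pool standing in for atomic cloud resources:

```python
from multiprocessing import Pool

def is_prime(n):
    # Deliberately naive CPU-bound work: one atomic unit per number.
    if n < 2:
        return False
    return all(n % d for d in range(2, int(n ** 0.5) + 1))

def count_primes(bounds):
    lo, hi = bounds
    return sum(is_prime(n) for n in range(lo, hi))

if __name__ == "__main__":
    # Scale out: split one big range across many small chunks...
    step = 50_000
    chunks = [(lo, lo + step) for lo in range(2, 400_002, step)]
    # ...and let each worker process handle chunks in parallel.
    with Pool() as pool:
        print(sum(pool.map(count_primes, chunks)))
```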

With each passing day, internet infrastructure is getting stronger in terms of payload and speed. That forms the foundation for MPP in the cloud. The (literally) unlimited availability of atomic computing resources in the cloud, from many vendors, is another thumbs up for MPP in the cloud becoming a reality.

MPP in the cloud needs a scale-out architecture paradigm to be effective.

Big Data concept is a reality - “Think Scale Out”  


Friday, June 1, 2012

Economics of Compression


Compression is a mechanism to reduce the space required to keep a certain thing or entity. In the world of databases this word assumes a greater meaning. In the information age, data is growing at an exponential rate, as we have ever more avenues to capture it.

With this growth in data, organizations increasingly want to draw meaning and value out of it. That means data engineers now have to maintain huge volumes of data, and not only that, the need for performance is also growing.

Algorithms for compression have evolved, providing a reduction in the space required to maintain data. But that's just the beginning.

The trinity of computing resources, namely CPU, memory and IO, plays a vital role in performance. Of the three, IO is the weakest link in the chain when it comes to data management performance.

So with compression reducing the size of the data, the need for IO decreases, resulting in a performance increase. But there will be an increase in total CPU cost, as compression and subsequent decompression need CPU cycles.
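A quick Python sketch of this trade, using zlib on made-up repetitive data, shows the bytes saved against the CPU time spent:

```python
import time
import zlib

# Repetitive data, standing in for a well-compressible fact table.
data = b"order_id,customer,amount\n" * 200_000

start = time.perf_counter()
compressed = zlib.compress(data, level=6)
cpu_cost = time.perf_counter() - start

print(f"raw: {len(data):,} bytes, compressed: {len(compressed):,} bytes")
print(f"CPU spent compressing: {cpu_cost * 1000:.1f} ms")

# Reading the data back costs CPU too: decompression is not free.
assert zlib.decompress(compressed) == data
```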

A word of caution: high-cardinality data is difficult to compress, and frequent updates at a granular level will be slower due to the compression overhead.
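The cardinality effect is easy to demonstrate; in this sketch, random bytes stand in for a high-cardinality column:

```python
import os
import zlib

# Low cardinality: a column with few distinct values repeats heavily.
low_card = b"YES\nNO\n" * 100_000
# High cardinality: near-unique values leave little to deduplicate.
high_card = os.urandom(len(low_card))

for label, data in (("low cardinality", low_card),
                    ("high cardinality", high_card)):
    ratio = len(zlib.compress(data)) / len(data)
    print(f"{label}: compressed to {ratio:.0%} of original")
```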

When in need of performance, one needs to take the compression factor into consideration.