Greed is good – this investment maxim has been embraced by almost everyone. The enduring fascination with big data is a case in point. However, a level-headed approach will yield more efficient outcomes than fascination.

Any data is useful somewhere. It is hardly surprising, then, that many institutions and development practitioners have discovered that big data can be very useful in assisting development. Some of you might recall that before big data entered the tech lexicon, the much-used term was unstructured data, to distinguish it from the structured data that typically characterized (or was thought to characterize) the data captured in enterprise databases. This was largely textual data, for which there were then limited techniques available to create structured sets. Natural language processing has since matured enough to produce some structured data from such text. There was another term for large volumes of data – VLDB, very large database. I mention this because today big data subsumes even large data sets. I prefer the terms unstructured data and very large data sets, but one of the basic principles of communication is to listen, to listen to what has already transpired. Hence, I will use big data, but keep in mind what I have said.

What is unique to the IT and technology industry is that companies seek to create ‘products’, which may be sets of tools and techniques and not necessarily a full-fledged software system, with every breakthrough in understanding and using information. It was no different with big data. Among the earliest ‘products’ I can recall was Omni from IBM, which sought to help manage unstructured data. If I am not mistaken, it had a short life. I think there is an explanation.

There are two kinds of greed in the information business – one is to extract more than a set of information can yield and the other is to create ‘products’. The first greed may be accompanied by the second but has been known to occur on its own too. That has happened with big data, so much so that it is now treated as a panacea. This is rather disappointing because it will deprive the term of any credibility and reduce it to a cliché. We shall refrain from such misadventures and advise you to do the same.

The best way to appreciate the worth of big data is to see it as a necessary accompaniment to structured data, such that together they offer a more credible, more probable explanation. Although it was not expressly described as such, among the earliest uses of textual information to explain quantitative data was Steven Levitt's attempt to explain the decline in crime in New York by correlating it with the decriminalization of abortion. He examined many other possible factors before settling on the decriminalization of abortion as the most probable cause of the decline in crime in New York. Bloomberg, whose terminals are used extensively in the financial services industry, acquired the weekly magazine BusinessWeek as a way of enhancing the value of its terminals. A user of the Bloomberg terminal didn’t need to exit it to find answers to whatever queries may have arisen out of the data they encountered in the terminal, because they could find them in BusinessWeek, which became an integral part of Bloomberg. The ‘big data’ (understood here as unstructured data) of BusinessWeek would probably contain the answers to what needs to be explained in the quantitative data. Or consider the intelligent use of the hashtag on Twitter. You can actually use the hashtag to create primary data – you can gather the information you are seeking.

This is the huge promise of big data, that it has the potential to complete a picture. Used this way and refraining from greed, it is a very good marriage with quantitative data or even primary data. It can provide strong anecdotal evidence or can explain regional variations or furnish clues to product defects or problems in performance. Used this way, you might not even need any of the fancy techniques of managing big data because all that you will need is a set of efficient search clues. And those clues should come straight out of questions from quantitative data. Used wisely, a chunk of big data can actually be used to create a functioning knowledge management system. The taxonomy will be more difficult to create but that goes with the territory.

Let me cite an instance of a leading IT company which was given a project to cull out hidden intelligence in warranty claims submitted by customers. The claim form had 32 fields, of which 30 were structured, searchable information such as invoice number and date, machine number and so on. The ‘intelligence’, though, was in two fields filled in by customers in free-flowing text. While it did seem daunting initially, it became quite easy once the team focused on the verbs associated with the product – after all, in a warranty claim you are ‘doing’ something. The team ‘found’ the ‘intelligence’.
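To make the verb-focused idea concrete, here is a minimal sketch in Python. It is not the team's actual method, which I cannot reproduce here. The field names (problem_description, customer_remarks), the verb list and the sample claims are all hypothetical, made up purely for illustration. The sketch only shows how a small, product-specific list of verbs can turn two free-text fields into something you can count.

```python
import re
from collections import Counter

# Hypothetical verb lexicon: each base verb maps to the word forms we accept.
# A real project would build this list from the product and its failure modes.
ACTION_VERBS = {
    "leak": {"leak", "leaks", "leaked", "leaking"},
    "overheat": {"overheat", "overheats", "overheated", "overheating"},
    "crack": {"crack", "cracks", "cracked", "cracking"},
    "stall": {"stall", "stalls", "stalled", "stalling"},
}


def extract_actions(text):
    """Return the set of base verbs from ACTION_VERBS that occur in the text."""
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    return {verb for verb, forms in ACTION_VERBS.items() if tokens & forms}


def tally_actions(claims, text_fields=("problem_description", "customer_remarks")):
    """Count, per verb, how many claims mention it in any free-text field."""
    counts = Counter()
    for claim in claims:
        seen = set()
        for field in text_fields:
            seen |= extract_actions(claim.get(field, ""))
        counts.update(seen)  # each claim counts at most once per verb
    return counts


# Made-up sample claims; the structured fields are left untouched.
sample_claims = [
    {
        "machine_number": "M-001",
        "problem_description": "The pump leaked oil after two weeks.",
        "customer_remarks": "Motor overheats when running.",
    },
    {
        "machine_number": "M-002",
        "problem_description": "Housing cracked near the hinge.",
        "customer_remarks": "Also leaking slightly.",
    },
    {
        "machine_number": "M-003",
        "problem_description": "Engine stalls on startup.",
        "customer_remarks": "",
    },
]

for verb, count in tally_actions(sample_claims).most_common():
    print(verb, count)
```

The point of the sketch is the design choice, not the code: the verb list comes from a question you already have about the product, so the free text is searched with a purpose and no heavy technique is required.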

It is this focused approach you need to bring in instead of blindly looking for techniques. Yes, techniques are important but you need to know when to use them. We now turn our attention to big data and development.

Takeaways

Getting carried away by techniques is a real risk

Find what you want based on the information you need

Big data is useful in and for development

Photo by Karolina Grabowska from Pexels