Quite often, facts are simple, but they can be slippery too if you are not careful. We encounter all kinds of facts in the world, and some don’t become known until they are unearthed. We look at some historical events and the specific case of Covid-19 to reiterate the point.
There are many different kinds of facts. There are facts that repeat themselves, perhaps at a defined or definable frequency and magnitude; facts which jump at you at irregular intervals; facts that (almost) go unnoticed before someone accidentally trips on them (leading to a new quest); facts that are denied (by governments and bureaucrats) until they no longer can be; facts that become known when documents are declassified, leading in some cases to a rewriting of history; facts that can be understood with a simple explanation; facts that defy common sense or logic (as we understand it); facts that seem bizarre until they are explained, after which they seem perfectly clear (such as what you encounter in chaos theory); facts that have to be meticulously established (through documentation as proof, as in the case of the Holocaust, to confront the naysayers); facts that become so only as a result of measurement (such as inequality or air pollution); facts that were later shown to be falsehoods; and so on. Quite a collection, isn’t it?
In the book ‘Auschwitz’, Laurence Rees (creative director, BBC at the time the book was published in 2005) (https://www.publishersweekly.com/978-1-58648-303-6) takes enormous pains to demonstrate and ‘prove’ that the tragic events at Auschwitz did take place, as there were (and still are) people who deny the Holocaust ever happened. Haven’t you come across the phrase ‘they got the facts wrong’, especially in matters of history? The most ordinary documents can lead to proof of major events, as this book shows. It took more than a century for the world to accept that there was a massacre of more than a million Armenians in 1915 (https://www.britannica.com/event/Armenian-Genocide/Genocide). It is still being written about (https://www.thehindu.com/news/international/explained-what-happened-to-armenians-in-1915/article34406120.ece).
What kind of analysis we can perform with some degree of success is clearly a function of the facts available for analysis. Just consider the recent ‘revelations’ thanks to Pegasus and the Pandora Papers. No one knew of these ‘facts’ until they were unearthed and made available to the world. And we have had to revise our views of many people! And wonder about global banking! If all facts were easily available, there would be no need for forensic accountants or lawyers. In an age where camouflaging is common, facts have to be continuously discovered.
Finding facts
Finding facts is an arduous task, as one set of facts may well demand more if what has been found so far is more puzzling than clear, raising more questions than it answers. Historians are accustomed to this. The great historian, Fernand Braudel, spent 25 years researching for a study of the Mediterranean civilization between 1400 and 1800. What began as an essay grew as Braudel needed to ‘find more facts’ to help him understand his subject, and it took 25 years! He did not refer to a single secondary source! Finally published in three volumes, this treatise is a monument to ‘finding’ as a profession, not just an act. Or consider the way Karl Marx pored over vast amounts of work at the British Museum, which culminated in the incomplete Das Kapital. Neither accepted what seemed plausible explanations; both kept exploring, plodding through incredible amounts of data. And this was a time when good research was a function of how good the libraries and archives were: Marx was writing in the middle of the 19th century and Braudel in the first half of the 20th century.
Recently, there has been a revival of interest in the French Revolution, the Russian Revolution, Nazism and Fascism, where the availability and accessibility of more documents have enabled historians to offer new perspectives on what we thought were concluded subjects. It is instructive to study what has made it possible for them to question the existing framework of explanation and proffer, even if tentatively, an alternative explanation.
Impact of IT revolution
The information technology revolution taught us to ask the right questions to elicit the right kind of information before we pass judgment on any topic. It also taught us to ask specifics about the level of granularity of information that is required to ascertain something. In one of the recent studies of the French Revolution, the author attempted to find out the social and economic background of the members of the Constituent Assembly after the 1789 revolution, which has been widely considered to have been brought about by the rising industrial class. If this were true, it should be reflected in the composition of the Constituent Assembly, according to the author. The world of IT and information has forced us to ask a simple, straightforward question: what is the information that will confirm that the 1789 revolution was fought in the interests of the industrial class? I cannot recall such a simple but pertinent question in my earlier readings of the French Revolution.
If IT has taught us to ask the right kind of question, the new generation of analytics professionals too has a lot to learn from historians. Let me hasten to add that I am not suggesting they spend years; no company would allow such a luxury anyway. What is worth learning is the skill of spotting gaps, identifying the appropriate facts to be found, identifying ways of finding them and attempting to explain what it all means. I wish to emphasize this point because there is a reigning view that, given the vast amounts of big data and processing power available now, we need only correlation and nothing more. There is this dominant view that data reveals patterns and we just have to follow them. An extremely shortsighted view, in my opinion.
There is a lot of attention paid to ‘discovering’ patterns or ‘pattern analysis’ in the contemporary world’s obsession with ‘analytics’. Patterns are born of facts repeating themselves at certain frequencies and in some magnitude. To make sense of a pattern might mean resorting to some other facts that can throw light on the ‘why’ of the pattern, which will, hopefully, result in some plan of action. Many businesses thrive on this. Criminologists confront this while dealing with what are understood as ‘serial killers’. Contrary to what many think, detecting patterns is not easy, especially if the data is not all held in ‘one place’. Seemingly different things happening at different places might suddenly be revealed as connected when they are ‘explained’.
Covid-19
Some facts have to be captured. Let us look at the very current pandemic, which has raised serious questions about what facts are to be considered in evaluating the many dimensions of the virus and, more important, how they can be captured. In an article in The Times of India dated September 22, 2021 titled ‘We need better Covid data’, Gagandeep Kang, Professor, Christian Medical College, Vellore, has raised precisely this aspect of the level of granularity, among other things (https://timesofindia.indiatimes.com/india/we-need-better-covid-data/articleshow/86430117.cms). She refers to a document published by the US National Academy of Sciences titled ‘Evaluating Data Types’ (https://www.nap.edu/download/25826). I mention only the key points relevant here: “The seven data types are: the number of confirmed cases, hospitalizations, emergency department visits, reported confirmed COVID-19 deaths, excess deaths, fraction of viral tests that are positive, and representative prevalence surveys (including both viral and antibody tests). The five criteria are: representativeness; bias; uncertainty, and measurement and sampling error; time; and space. The importance of any of these five criteria depends on the nature of the decision being made, and each data type has different strengths and weaknesses”. (The document is freely downloadable.)
And then they make what to me are observations that apply not just to Covid but to understanding any data. “Each data type represents a piece of the puzzle, and when used in combination, the various types form a clearer picture of how the disease is spreading and its severity. Because any single data type is likely to yield an under- or over- estimate of the extent and spread of the disease, it is important to consider multiple data types and be cautious in relying on estimates without considering sources of bias”. (Italics mine)
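To make the point about combining data types a little more concrete, here is a rough sketch in Python of two of the seven data types mentioned above: the fraction of viral tests that are positive, and excess deaths. The function names, the dictionary keys and all the numbers are my own invention for illustration; they are not taken from the National Academies document or from any real dataset.

```python
def positivity_rate(positive_tests, total_tests):
    """Fraction of viral tests that came back positive."""
    if total_tests <= 0:
        raise ValueError("total_tests must be positive")
    return positive_tests / total_tests


def excess_deaths(observed_deaths, expected_deaths):
    """All-cause deaths above a historical baseline for the same period."""
    return observed_deaths - expected_deaths


# Hypothetical weekly figures for one district (invented numbers).
week = {
    "positive_tests": 400,
    "total_tests": 2000,
    "reported_covid_deaths": 12,
    "observed_all_cause_deaths": 150,
    "expected_all_cause_deaths": 110,
}

rate = positivity_rate(week["positive_tests"], week["total_tests"])
excess = excess_deaths(
    week["observed_all_cause_deaths"], week["expected_all_cause_deaths"]
)

# Gap between excess deaths and officially reported Covid deaths.
# A large gap does not prove under-reporting; it is a prompt to look further.
unexplained_gap = excess - week["reported_covid_deaths"]

print(f"Positivity rate: {rate:.1%}")
print(f"Excess deaths: {excess}, reported Covid deaths: {week['reported_covid_deaths']}")
print(f"Gap to be explained: {unexplained_gap}")
```

Neither number means much on its own. It is only when the reported deaths are placed next to the excess deaths that a question worth asking appears, which is the kind of gap a historian would want to chase.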
Taking a cue, Gagandeep Kang writes: “We need data at the individual level, who is being infected and where (age, sex, location, symptoms, vaccination status), by what, when and how (sequence information linked to other individual’s sequencing and preferably with information on the contacts between them)”. (Italics mine) She has also written about what information can serve as proxies for severe disease; the concept of proxy information is vital. She goes on to detail the many related pieces of information that should be understood, such as the period of hospitalization, the duration of protection after vaccination and so on. After admitting that a lot of this information is available but in different databases, she comes to the crux of the problem: “All we need is a willingness to link them together and monitor the quality of the data, preferably independently”.
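What ‘linking them together’ might look like can be sketched in a few lines of Python. This is only a toy illustration under a big assumption, namely that the different databases share a common person identifier. The three ‘databases’, the IDs, the field names and the two helper functions are all hypothetical and do not describe any real system.

```python
# Three hypothetical "databases", each keyed by the same (assumed) person ID.
testing = {
    "P001": {"test_date": "2021-08-01", "result": "positive"},
    "P002": {"test_date": "2021-08-03", "result": "positive"},
    "P003": {"test_date": "2021-08-04", "result": "negative"},
}
vaccination = {
    "P001": {"doses": 2},
    "P003": {"doses": 1},
}
hospital = {
    "P002": {"days_admitted": 9},
}


def link_records(*sources):
    """Merge per-person records from several sources into one view."""
    linked = {}
    for source in sources:
        for person_id, fields in source.items():
            linked.setdefault(person_id, {}).update(fields)
    return linked


def missing_fields(linked, required):
    """Simple quality check: list the required fields absent for each person."""
    gaps = {}
    for person_id, record in linked.items():
        absent = [field for field in required if field not in record]
        if absent:
            gaps[person_id] = absent
    return gaps


linked = link_records(testing, vaccination, hospital)
gaps = missing_fields(linked, ["result", "doses"])

print(linked["P002"])
print(gaps)
```

Once the records are linked, person P002 shows up as someone who tested positive and was hospitalized but has no vaccination record. Whether that means unvaccinated or simply not recorded is exactly the kind of data quality question that cannot even be asked while the databases sit apart.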
Summing up
Let me end this with a slightly cryptic observation. Software technology offers great number-crunching capabilities. But if you can bring a historian’s mind to these technological capabilities, along with a researcher’s keen eye for data at different levels of granularity, the result can be profoundly deep. Depth versus shallowness. The choice is clear.
Takeaways
There are different kinds of facts
Finding facts is a (productive) task
Analytics professionals should learn from historians
Some facts have to be captured
Evaluating data types is a critical requirement
Covid-19 is an illustrative case
A historian’s mind meshed with a software’s capabilities can be a winning combination