Data collection is arguably the most decisive part of any process of analysis. This article discusses this task with special reference to qualitative data.

Most of the discussions of data and its analysis that I have read talk a great deal about the integrity of data without discussing how it is collected. In this post, I examine just this, especially in the context of complex problems, understood as those that have more than one layer to pierce through and need data from multiple non-uniform sources. The qualifier ‘non-uniform’ is deliberate and important. To this we must add another dimension: formats. Increasingly, data in more than one format is used at the same time to analyse a single problem. By the way, I am using the word ‘problem’ in a broad sense. It could be a project, a business proposal, an event, a vexing and persistent problem or a contextual problem. What is important here is that data from multiple sources and in multiple formats will be required to make an intelligent analysis and take an informed decision.

We must also recognize a separation that lies at the heart of this and seriously influences the quality of analysis: those who work on the problem are either not the ones who collect the data or are not trained to do so. Implicit here is that we are talking of two different (and unequal) skills which together determine what the analysis will yield. In matters of health or public policy, the consequences can go either way. Never forget this aspect. And if the data is qualitative rather than quantitative, the difficulty is compounded because qualitative data is not ‘available by itself’ but has to be sought and gathered.

Consider, for example, the challenges faced by the Government of Victoria, Australia, in collecting data on family violence and from priority communities. Discussing inconsistent data collection standards (multiple agencies with multiple objectives, all converging on the same problem), its data collection framework elaborates: “Data collection from clients may occur in a variety of situations and settings where it can be difficult to obtain complete and accurate information, and the amount of information gathered may vary depending on the context of the situation. In most cases, the person responsible for collecting data has a primary role that focusses on the provision of a service (for example, as a police officer, support worker or medical practitioner) and, although they collect data as part of these roles, data collection is not necessarily the primary function of their role”. Not to mention the fact that these people are not trained in data collection (https://www.vic.gov.au/victorian-family-violence-data-collection-framework/data-collection-challenges-and-improvements). Think of the many instances where the people gathering data are not trained for it, or where it is not their function, and you can gauge what kind of ‘data’ would have been collected.

Let us look at another exacting instance of gathering qualitative data: “Describing Death in America: What We Need to Know”. This report from The National Academies of Sciences, Engineering, and Medicine, published by the National Academies Press, observes that “National expenditures for medical care in the months and days preceding death are enormous. But we do not know whether that money is buying good quality care or optimizing the quality of life of those dying, or whether the situation is getting better or worse over time. The information that exists “describing death” at a national level – though some of it is very informative – is fragmentary. This report recommends ways to fill the information gaps by better use of existing nationally-representative data, and through some new measures, in particular, a new, ongoing National Mortality Followback Survey. The aim is to allow us to benchmark where we are today as a society, and what goals we can set to minimize pain and suffering and maximize the quality of life of all of us who will die in the years to come” (https://www.nap.edu/read/10619/chapter/6).

Most people are quite careless about data gathering, since the obsession is with analysis itself, born of a fascination with computing power. In my experience over the last five years of interviewing candidates for various scholarships, and in my earlier professional life, I have encountered people with a very poor understanding of qualitative data who nonetheless show no qualms about drawing conclusions from it. Let me add that there is no understanding of appropriate sample size, the nature of the sample and so on.

Let me add one more example to reiterate my point. This is from a study, “Data collection challenges experienced while conducting the international alcohol control study (IAC) in Tshwane, South Africa”, by Elmarie Nel and others, in Drugs: Education, Prevention and Policy, Vol. 24, 2017, Issue 5. They describe the key challenges as “difficulties in identifying the full range of alcohol beverages during the adaptation of the questionnaire as manufacturers sometimes changed quantities and packaging to enhance sales; a resultant lengthy questionnaire which negatively impacted on participation; potential participants sometime being afraid to allow fieldworkers onto premises due to high crime rates in many participating areas and religious beliefs concerning alcohol consumption that prevented some parents from consenting to their adolescent children’s participation. Conclusion: When planning community surveys on alcohol use, it is important to anticipate and prepare for these and other challenging conditions that arise in the preparation for, and execution of fieldwork” (https://www.tandfonline.com/doi/full/10.1080/09687637.2016.1226774).

The problem becomes more serious when qualitative data is gathered to go together with quantitative data. And this is assuming that the quantitative data is correct! We will examine this in a later post.

Takeaways

Qualitative data accrues from multiple sources and in multiple formats

Data collection and analysis are often done by two different groups of people

The quality of data is often compromised because of poor data collection approaches