Zeba Siddiqui, Francis Rathinam
With a significant number of active cases globally, the novel coronavirus represents an extreme public health challenge. The outbreak of the pandemic has led to stringent travel restrictions, making data collection a highly challenging task for the development research community.
Traditional methods of data collection which require field visits can be risky during these times. In this context, if appropriate privacy and ethical safeguards are in place, then big data is both relevant and useful, now more than ever. For example, Global Positioning System coordinates obtained from cell phone records can be useful in tracking people’s movements. During the pandemic, there is immense potential for using this data to predict hotspots and arrest the spread of the virus. As another example, sentiment analysis through the mining of social media data might be able to provide useful insights to help design appropriate health messaging for the public.
A new CEDIL-supported systematic map developed by 3ie draws together a unique and comprehensive collection of studies that use big data to measure or evaluate development outcomes. The map covers impact evaluations that use big data to evaluate development outcomes, systematic reviews of big data impact evaluations, and other measurement studies that make innovative use of big data to measure and validate development outcomes. This blog highlights the role big data can play in addressing public health challenges. We provide a snapshot of our gap map findings and then discuss the potential use of big data in the field of health.
What did the map find about big data and health outcomes?
Out of the 437 studies included in the map, 63 examined health-related development outcomes. Twenty-eight studies looked at interventions which sought to reduce mortality, and there were another 28 that evaluated interventions which aimed to end the epidemic of a communicable disease. There is, however, an absence of impact evaluations that measured the impact of an epidemic outbreak, both in smaller units like districts and larger units like a state or country.
Satellite data was the most frequently used source of big data, appearing in 29 studies. For example, one of the included studies examined the coverage of measles outbreak vaccines in Niger by merging satellite-derived measurements of population distribution with high-resolution data on measles cases reported in the country. Cell phone call detail records (CDRs) followed closely and were used in 27 studies. A study in Haiti, for instance, aimed to assess whether CDRs could predict the early spatial evolution of the cholera epidemic. Furthermore, findings from our map show that the largest number of big data studies related to epidemics have been conducted in Sub-Saharan Africa, with the fewest in the Middle East and North Africa.
The map highlights a significant evidence gap when it comes to studies related to epidemics in fragile contexts. In terms of the overall evidence gaps in other health outcomes, fatalities due to road accidents, substance abuse and sexual and reproductive health services have seldom been studied.
Big data and COVID-19
With the exponential rise in the number of coronavirus cases, big data has the potential to help detect outbreaks. By bringing together data from a variety of sources, we can use algorithms to analyse health records and trace the contact history of patients to help identify patterns of virus spread. These applications may be able not only to demarcate current zones with a high number of cases but also to help predict future outbreaks through movement and contact tracing. One artificial intelligence technique, natural language processing (NLP), deserves mention here. By analysing regular human interactions in the form of text and speech, NLP can help extract meaning from human communication. It can be used to scan social media links and online news reports, and potentially raise an alarm when there are new COVID-19 related developments in the world. NLP and other big data techniques may also be used in incident detection so that, in the future, such health emergencies are tackled promptly. Despite significant advances in recent years, these technologies are still new, and many implementation challenges, such as information overload and data ambiguities, remain.
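As a rough illustration of the kind of scanning described above, the following minimal Python sketch counts posts that mention outbreak-related terms each day and flags days on which that count rises sharply above the earlier average. It is only a sketch under stated assumptions: the keyword list, the function names, the input format of (day, text) pairs and the spike threshold are all hypothetical, and simple keyword matching stands in for the far richer language models that real NLP systems use.

```python
import re
from collections import Counter
from statistics import mean

# Hypothetical keyword list; a real system would rely on a trained language model.
KEYWORDS = {"fever", "cough", "outbreak", "quarantine"}


def mentions_keyword(text):
    """Return True if the text contains any keyword as a whole word."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return bool(words & KEYWORDS)


def daily_mention_counts(posts):
    """Count keyword-mentioning posts per day; posts are (day, text) pairs."""
    counts = Counter()
    for day, text in posts:
        if mentions_keyword(text):
            counts[day] += 1
    return counts


def flag_spikes(counts, days, ratio=2.0):
    """Flag days whose count exceeds `ratio` times the mean of all earlier days."""
    flagged = []
    for i, day in enumerate(days):
        if i == 0:
            continue  # no earlier days to compare against
        baseline = mean(counts[d] for d in days[:i])
        if baseline > 0 and counts[day] > ratio * baseline:
            flagged.append(day)
    return flagged
```

Even in this toy form, the challenges mentioned above are visible: a word such as "fever" is ambiguous out of context, and a large volume of unrelated posts can easily drown out the signal.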
Many countries across the world are trying to blunt the pandemic curve with the help of smartphone applications (see table listing examples of mobile-based applications). These applications monitor people’s movements in order to trace whether they have been in high-risk places or have come into contact with high-risk people.
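To make the idea of tracing contacts from movement data concrete, here is a minimal Python sketch that finds people whose recorded location was close to that of a confirmed case at roughly the same time. It assumes location pings given as (person_id, timestamp in seconds, latitude, longitude); the function names, the 10-metre distance and the 5-minute window are illustrative assumptions, not a description of how any particular application works. Many real applications may rely on other signals, such as Bluetooth proximity, rather than location coordinates.

```python
from math import radians, sin, cos, asin, sqrt

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude/longitude points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlam = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))


def possible_contacts(pings, case_id, max_dist_m=10.0, max_gap_s=300):
    """Return IDs of people with a ping near the case in both space and time.

    pings: list of (person_id, timestamp_s, lat, lon) tuples (hypothetical format).
    """
    case_pings = [p for p in pings if p[0] == case_id]
    contacts = set()
    for pid, t, lat, lon in pings:
        if pid == case_id:
            continue
        for _, ct, clat, clon in case_pings:
            close_in_time = abs(t - ct) <= max_gap_s
            if close_in_time and haversine_m(lat, lon, clat, clon) <= max_dist_m:
                contacts.add(pid)
                break  # one matching ping is enough to flag this person
    return contacts
```

The sketch also shows why such applications raise privacy questions: even this simple matching requires time-stamped location histories for every user, not just for confirmed cases.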
The rising use of big data has, however, raised ethical concerns and posed legal challenges. These mobile phone applications have access to a significant amount of personal information. Ethical issues include compromised privacy, a lack of personal autonomy, and public demand for transparency and fairness in the use of big data. It is therefore very important to carefully consider and implement data privacy policies when using big data.
Despite the privacy challenges, big data has a promising future in the health sector. As travel restrictions persist in many countries, there may be more and more opportunities for using big data to compensate for the lack of in-person data collection. However, to make that happen, financial investments are required. If organisations working in the area of health want to utilise big data, they will need to invest in the necessary technology, infrastructure and staff training. They will need smartphone applications with robust privacy protection and the necessary computing infrastructure for safely working with large amounts of data. Importantly, they will also need to train staff in data analysis techniques. In order to explore the potential of big data in healthcare, a more systematic evaluation of the available methodologies by the research community is needed. Given the current public health crisis, it will be useful to see whether big data can be employed to predict disease outbreaks in the future.