It’s estimated that humans create a combined 2.5 quintillion bytes of data each day – and that around 90% of all data to ever exist was created in just the last few years. As difficult as it is to ignore the massive impact that connected technologies and social media have had on the way we experience our world, it’s somehow even more challenging to fully grasp the scale of it. For health researchers, this data explosion – and especially the rise of social media – has created tantalizing new opportunities to understand patient health, outcomes, and experiences outside of the confines of the clinical setting. But this promising source for real-world data represents only a tiny sliver of the potential ways we can use data to gain patient insights.
Traditionally, gathering patient data has run up against tangible resource limitations – it’s simply not possible to scale human-led collection of data from patients past a certain point. Factors like cost, geography, and patient bias can all affect the usefulness of this data. For this reason, it has become attractive to look to sources like social media to better understand how a patient or group of patients is faring. It’s perhaps not shocking that candid posts on Facebook or Twitter can be a better lens into a recently discharged patient’s emotional state than a filled-out form or a conversation that takes place in an explicitly clinical context.
Using technologies like natural language processing (NLP), researchers can sift through the mass of social media data created each day to surface relevant patient insights – for instance, early signs of adverse drug reactions that could affect adherence or even result in significant health complications. With this technology, it is increasingly possible to parse the informal language of social media and correlate it with meaningful, clinically relevant patterns.
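To make the idea concrete, here is a minimal sketch of how informal wording might be mapped to clinical terms. It swaps in plain phrase matching for the statistical models a production NLP pipeline would likely use. The lexicon, the function names, and the drug name "drugx" are all hypothetical and chosen only for illustration.

```python
import re
from collections import Counter

# Hypothetical lexicon mapping informal phrases to clinical terms.
# Illustrative only; this is not a validated medical vocabulary.
INFORMAL_TO_CLINICAL = {
    "threw up": "vomiting",
    "throwing up": "vomiting",
    "can't sleep": "insomnia",
    "dizzy": "dizziness",
    "rash": "rash",
}


def normalize(text):
    """Lowercase the text and straighten curly apostrophes."""
    return text.lower().replace("\u2019", "'")


def contains_phrase(phrase, text):
    """True if the phrase appears in the text on word boundaries."""
    return re.search(r"\b" + re.escape(phrase) + r"\b", text) is not None


def find_signals(post, drug_names):
    """Return sorted (drug, clinical_term) pairs co-mentioned in one post."""
    text = normalize(post)
    drugs = [d for d in drug_names if contains_phrase(d.lower(), text)]
    terms = {
        clinical
        for phrase, clinical in INFORMAL_TO_CLINICAL.items()
        if contains_phrase(phrase, text)
    }
    return sorted((d, t) for d in drugs for t in terms)


def count_signals(posts, drug_names):
    """Tally co-mentions across many posts to look for recurring patterns."""
    counts = Counter()
    for post in posts:
        counts.update(find_signals(post, drug_names))
    return counts
```

A co-mention is only a prompt for closer review, not evidence of a reaction. Counting pairs across many posts simply shows which combinations recur often enough to deserve a human look.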
But to make this data work in meaningful ways, it’s critical that we remember that social media is still a highly limited window into the full patient experience. It can provide us with pieces of the puzzle – but not the solution itself. On top of that, real and legitimate privacy concerns mean that use of social media data may be on shaky regulatory ground. The EU’s GDPR, along with state-level data privacy laws in the United States, limits how, when, and by whom this data can be collected and processed, even if it is publicly available.
It’s for these reasons that any organization looking to collect and analyze real-world evidence should ensure that social media is only one piece of its data equation. The good news is that the technologies that make social media analysis scalable and effective can be applied to a wider range of data sets to paint a more robust picture of patient experiences and outcomes. For instance, using natural language processing to automatically review and analyze the reams of academic and clinical literature available in databases like PubMed can add perspectives to your data set. Published studies often include large amounts of traditionally collected patient data that, in aggregate, can be used to uncover new insights.
This work is already bearing fruit – for example, researchers have used NLP to evaluate large quantities of data in published academic papers and studies on conditions like autism spectrum disorder and prostate cancer. In doing so, they can look for patterns in what is effectively a large, randomized group of patients far faster than any human could – in fact, the group of researchers looking at prostate cancer trends reported that using NLP to extract patient data from published studies reduced their workload by 95%.
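As a rough illustration of what pulling patient data out of published text can look like, the sketch below extracts reported cohort sizes from an abstract. It is not the method used by the researchers mentioned above. The regular expressions and the function name are assumptions made for this example, and real abstracts report sample sizes in many more ways than these two patterns cover.

```python
import re

# Assumed patterns for two common ways an abstract reports cohort size.
# Far from exhaustive; a real pipeline would need many more cases.
SAMPLE_SIZE_PATTERNS = [
    re.compile(r"\bn\s*=\s*(\d[\d,]*)", re.IGNORECASE),
    re.compile(
        r"\b(\d[\d,]*)\s+(?:patients|participants|men|women|children)\b",
        re.IGNORECASE,
    ),
]


def extract_sample_sizes(abstract):
    """Return every cohort size found in the text, in order of appearance."""
    hits = []
    for pattern in SAMPLE_SIZE_PATTERNS:
        for match in pattern.finditer(abstract):
            # Strip thousands separators before converting to an integer.
            value = int(match.group(1).replace(",", ""))
            hits.append((match.start(), value))
    return [value for _, value in sorted(hits)]
```

Running a function like this over thousands of abstracts is cheap, which is where the time savings come from. Anything it extracts would still need spot checks by a person.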
Beyond the realm of clinical studies, general-purpose media, public records, and other freely available data streams can also be a source of new insights. By looking at these sources, we can fill in larger demographic trends and provide a lens into real-world patient responses, experiences, and outcomes that would not be available if social media were used exclusively.
The promise of machine learning and data science is the ability not only to look at a lot of data but also to take in data from seemingly disparate sources, draw correlations, and detect trends that humans can turn into innovations that make us healthier and more enlightened. To have the maximum impact, any medical research organization using social media data to inform its real-world evidence activity should ensure that it does not do so to the exclusion of other available data sources that can benefit from similar approaches.
About Benjamin Holmes
Ben Holmes, Ph.D., has been with Syapse since 2019 and brings a wealth of bioinformatics and data science experience to the team. Ben has served as the lead author on peer-reviewed publications on the validation of the Syapse mortality composite score in oncology and on the use of natural language processing (NLP) to uncover valuable real-world data (RWD) in free-text notes and unstructured data fields. Ben lives in Ohio with his wife. If you want to learn more about Ben, check out his TEDxDayton talk on YouTube: Machines Learn Together and We Should Too.