Harnessing Big Data for Precision Medicine and Healthcare

Workshop Series Organized by Leibniz AI Lab

Thank you to our speakers and to all attendees. We hope you enjoyed the workshop as much as we did. The recordings will soon be uploaded to our website. Follow us on our social platforms for updates.

Medical research, diagnostics and therapy are increasingly IT-supported and networked. Huge amounts of heterogeneous routine care data are already being collected in hospitals (e.g., in intensive care units), in primary care institutions, and by citizens in their personal environments. In addition, genomics, metabolomics and other omics data can provide insights into how illnesses develop and which patients are suited to specific treatments, enabling individualized therapies. Through our workshop series at the Leibniz AI Lab, we aim to build a community of researchers in machine learning, AI and biology, medical professionals, and others to address these shared questions.

The main topics of our first workshop, held on June 10-11, 2021, include:

Computational Methods in Medicine
Mental Health
Databases / Data Integration / Medical Text and Data Mining
Epidemiology

Confirmed Speakers

Tim Althoff

University of Washington

Helge Frieling

Medizinische Hochschule Hannover

Surjya Ghosh

CWI, Netherlands

Prof. Dr. Robert Hoehndorf

King Abdullah University of Science and Technology, Saudi Arabia

Yang Li

Helmholtz Centre, Hannover

Madhav Marathe

University of Virginia

Gautam Menon

Ashoka University, India

Animesh Mukherjee

Indian Institute of Technology Kharagpur

Fernando Peruani

CY Cergy Paris University

Manuel Gomez Rodriguez

Max Planck Institute for Software Systems (MPI-SWS)

Prof. Dr.-Ing. Bodo Rosenhahn

Leibniz Universität Hannover

Bidisha Samanta

Google Research, India

Akane Sano

Electrical and Computer Engineering, Rice University

Stefan Schulz

Institute of Medical Informatics, Austria

Antje Wulff

TU Braunschweig and Hannover Medical School

Technical Support

Sophie Boneß

sophie.boness@l3s.de

Humaun Rashid

rashid@l3s.de

Schedule

Speaker	Video	Date	Time (CET)
Session: Computational Methods in Medicine
Introduction to Future Lab	L3S Future Lab June 2021 Intro	June 10	9:00-9:10 am
Yang Li	not available	June 10	9:15-9:45 am
Bodo Rosenhahn	Bodo Rosenhahn's talk	June 10	9:50-10:30 am
Bidisha Samanta	Bidisha Samanta's talk	June 10	10:35-11:10 am
Manuel Gomez Rodriguez	Manuel Gomez Rodriguez's talk	June 10	11:15 am-12:05 pm

Session: Mental Health
Helge Frieling	Helge Frieling's talk	June 10	3:00-3:35 pm
Surjya Ghosh	Surjya Ghosh's talk	June 10	3:40-4:15 pm
Akane Sano	Akane Sano's talk	June 10	4:20-5:10 pm
Tim Althoff	Tim Althoff's talk	June 10	5:15-6:05 pm

Session: Medical Text and Data Mining
Robert Hoehndorf	Robert Hoehndorf's talk	June 11	9:00-9:50 am
Stefan Schulz	Stefan Schulz's talk	June 11	9:55-10:45 am
Animesh Mukherjee	Animesh Mukherjee's talk	June 11	10:50-11:20 am
Antje Wulff	Antje Wulff's talk	June 11	11:25 am-12:00 pm

Session: Epidemiology
Fernando Peruani	Fernando Peruani's talk	June 11	3:10-4:00 pm
Gautam Menon	Gautam Menon's talk	June 11	4:05-4:55 pm
Madhav Marathe	Madhav Marathe's talk	June 11	5:00-5:55 pm

9:00 am - 9:10 am

Introduction to Future Lab

9:15 am - 9:45 am Session: Computational Methods in Medicine

A big data approach for personalised infection medicine

Immune function is highly variable between individuals. It is therefore important to capture the impact of genetics and biological molecules on immune function to better understand the immune system as a whole. In my talk, I will describe novel strategies to study these factors by simultaneously modeling information from genome, transcriptome, proteome, metabolome and environmental profiles. Such a study can reveal the mechanisms driving the immune response to pathogens and provide mathematical models for predicting individual variation in immune function, a crucial step towards personalized prevention and treatment of infectious diseases.

Yang Li

9:50 am - 10:30 am Session: Computational Methods in Medicine

Multi-Object Tracking for cells, microorganisms and human motion analysis

In this talk I will summarise recent results on multi-object tracking using the tracking-by-detection paradigm. The input is a set of detections (e.g., bounding boxes of objects in images), and the goal is to generate consistent trajectories over the entire sequence. This task can be cast as a network flow problem on a graph, which can be globally optimised using a linear programming (LP) formulation. I will present selected basics of (integer) linear programming and some simple discrete optimisation tasks (e.g., graph matching) with their respective LP formulations, and finally explain graph-based tracking and its optimisation. The talk continues with some applications and recent results, e.g., on the MOTChallenge benchmark, and concludes with future research strands.

Recent papers:
Andrea Hornakova*, Roberto Henschel*, Bodo Rosenhahn, Paul Swoboda (* equal contribution). Lifted Disjoint Paths with Application in Multiple Object Tracking. Proceedings of the 37th International Conference on Machine Learning (ICML), July 2020.
Roberto Henschel, Timo von Marcard, Bodo Rosenhahn. Simultaneous Identification and Tracking of Multiple People using Video and IMUs. Computer Vision and Pattern Recognition Workshops (CVPRW), June 2019.
Timo von Marcard, Roberto Henschel, Michael J. Black, Bodo Rosenhahn, Gerard Pons-Moll. Recovering Accurate 3D Human Pose in The Wild Using IMUs and a Moving Camera. European Conference on Computer Vision (ECCV), September 2018.
Weblink: https://motchallenge.net
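
The network-flow view of tracking-by-detection can be illustrated with a toy sketch. It is not the lifted disjoint paths method from the papers above. It is a plain min-cost flow model in the spirit of classical flow-based trackers, applied to 1D positions. Every unit of flow from the source to the sink is one trajectory. Entering, exiting and linking detections cost something, and each detection that is used earns a negative "reward". The costs, the `max_jump` gate and the names `MinCostFlow` and `track` are all hypothetical. The solver is not an LP package. It uses successive shortest paths with Bellman-Ford and stops as soon as no further path lowers the total cost. With integer capacities the flow LP has an integral optimum, so this combinatorial solver reaches an optimal flow.

```python
class MinCostFlow:
    """Successive-shortest-path min-cost flow; Bellman-Ford tolerates negative costs."""

    def __init__(self, n):
        self.graph = [[] for _ in range(n)]  # edge = [to, capacity, cost, reverse index]

    def add_edge(self, u, v, cap, cost):
        self.graph[u].append([v, cap, cost, len(self.graph[v])])
        self.graph[v].append([u, 0, -cost, len(self.graph[u]) - 1])
        return u, len(self.graph[u]) - 1  # handle for inspecting the edge later

    def _shortest_path(self, s):
        dist = [float("inf")] * len(self.graph)
        prev = [None] * len(self.graph)
        dist[s] = 0.0
        for _ in range(len(self.graph) - 1):
            changed = False
            for u, edges in enumerate(self.graph):
                if dist[u] == float("inf"):
                    continue
                for i, (v, cap, cost, _) in enumerate(edges):
                    if cap > 0 and dist[u] + cost < dist[v]:
                        dist[v], prev[v] = dist[u] + cost, (u, i)
                        changed = True
            if not changed:
                break
        return dist, prev

    def solve(self, s, t):
        """Augment one unit per path (all capacities are 1) while paths have negative cost."""
        while True:
            dist, prev = self._shortest_path(s)
            if dist[t] >= 0:  # no cost-reducing path left (or no path at all)
                return
            v = t
            while v != s:
                u, i = prev[v]
                edge = self.graph[u][i]
                edge[1] -= 1
                self.graph[v][edge[3]][1] += 1
                v = u


def track(frames, entry_cost=6.0, exit_cost=6.0, reward=-10.0, max_jump=3.0):
    """Link toy 1D detections into trajectories; frames[t] lists positions at time t."""
    dets = [(t, x) for t, frame in enumerate(frames) for x in frame]
    source, sink = 0, 1
    mcf = MinCostFlow(2 + 2 * len(dets))

    def node_in(k):  # each detection k is split into an "in" and an "out" node
        return 2 + 2 * k

    def node_out(k):
        return 3 + 2 * k

    take, links = {}, {}
    for k in range(len(dets)):
        mcf.add_edge(source, node_in(k), 1, entry_cost)             # trajectory starts at k
        take[k] = mcf.add_edge(node_in(k), node_out(k), 1, reward)  # k is kept as a detection
        mcf.add_edge(node_out(k), sink, 1, exit_cost)               # trajectory ends at k
    for k, (t, x) in enumerate(dets):
        for j, (t_next, y) in enumerate(dets):
            if t_next == t + 1 and abs(x - y) <= max_jump:          # plausible motion only
                links[k, j] = mcf.add_edge(node_out(k), node_in(j), 1, abs(x - y))
    mcf.solve(source, sink)

    def used(handle):  # a unit edge carries flow iff its residual capacity is 0
        u, i = handle
        return mcf.graph[u][i][1] == 0

    successor = {k: j for (k, j), h in links.items() if used(h)}
    has_pred = set(successor.values())
    tracks = []
    for k in range(len(dets)):
        if used(take[k]) and k not in has_pred:                     # first detection of a track
            path = [k]
            while path[-1] in successor:
                path.append(successor[path[-1]])
            tracks.append([dets[i][1] for i in path])
    return tracks
```

For example, `track([[0.0, 10.0], [1.0, 11.0], [2.0, 12.0]])` yields two trajectories. Taken alone, an isolated detection costs `entry_cost + reward + exit_cost`, which is positive here, so it is left out.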

Prof. Dr.-Ing. Bodo Rosenhahn

10:35 am - 11:10 am Session: Computational Methods in Medicine

A deep generative model for molecular graphs

Deep generative models have been praised for their ability to learn smooth latent representations of images, text, and audio, which can then be used to generate new, plausible data. However, current generative models are unable to work with molecular graphs due to their unique characteristics: their underlying structure is not Euclidean or grid-like, they remain isomorphic under permutation of the node labels, and they vary in their number of nodes and edges. In our work we propose a variational autoencoder for molecular graphs, whose encoder and decoder are specially designed to account for these properties by means of several technical innovations. We further develop a gradient-based algorithm to optimize the decoder of our model so that it learns to generate molecules that maximize the value of a given property of interest and, given a molecule of interest, can optimize the spatial configuration of its atoms for greater stability.
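
Two of the properties above, invariance to node relabeling and a varying number of nodes and edges, can be illustrated with a deliberately tiny encoder. This is not the variational autoencoder from the talk. It is a hypothetical sketch that uses scalar atom features and fixed weights (0.5 and 0.3), where a real model would learn vector-valued parameters. Message passing sums over neighbours and the readout uses symmetric functions (sum, max). As a result, renumbering the atoms leaves the encoding unchanged up to floating-point rounding.

```python
import math


def encode_graph(atom_features, bonds, rounds=2):
    """Toy permutation-invariant graph encoder with illustrative fixed weights.

    atom_features: one float per atom; bonds: undirected (i, j) index pairs.
    Works for any number of atoms and bonds.
    """
    neighbours = [[] for _ in atom_features]
    for i, j in bonds:
        neighbours[i].append(j)
        neighbours[j].append(i)
    h = list(atom_features)
    for _ in range(rounds):
        # Sum over neighbours, so the order in which they are listed does not matter.
        h = [math.tanh(0.5 * h[v] + 0.3 * sum(h[u] for u in neighbours[v]))
             for v in range(len(h))]
    # Symmetric readout: independent of how the nodes are numbered.
    return sum(h), max(h)


def relabel(atom_features, bonds, perm):
    """Renumber node v as perm[v], producing an isomorphic copy of the graph."""
    new_features = [0.0] * len(atom_features)
    for v, x in enumerate(atom_features):
        new_features[perm[v]] = x
    return new_features, [(perm[i], perm[j]) for i, j in bonds]
```

For example, `encode_graph(*relabel(feats, bonds, perm))` matches `encode_graph(feats, bonds)` for any permutation `perm`.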

Bidisha Samanta

11:15 am - 12:05 pm Session: Computational Methods in Medicine

Learning Under Algorithmic Triage

Under algorithmic triage, a machine learning model does not predict all instances but instead defers some of them to human experts. The motivation that underpins learning under algorithmic triage is the observation that, while there are high-stakes tasks where machine learning models have matched, or even surpassed, the average performance of human experts, they are still less accurate than human experts on some instances, where they make far more errors than average. The main promise is that, by working together, human experts and machine learning models can achieve considerably better performance than either would achieve on their own. In this talk, I will present several algorithms to learn under algorithmic triage that we have developed in recent years, discuss their theoretical properties, and present a variety of experimental results demonstrating their potential in improving medical diagnosis, content moderation and scientific discovery.
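
The deferral decision at the core of triage can be sketched in the simplest possible setting. Here the per-instance losses of the model and of the human are assumed to be known, and at most `budget` instances may be deferred. In practice these losses are not known at prediction time, and learning under that constraint is exactly what the talk's algorithms address. The function name and inputs are hypothetical. In this idealised setting, the optimal choice is to defer the instances where the model's loss exceeds the human's loss by the largest positive margin.

```python
def triage(model_loss, human_loss, budget):
    """Choose which instances to defer to human experts (toy: losses assumed known).

    Deferring instance i changes the total loss by human_loss[i] - model_loss[i],
    so with at most `budget` deferrals the best choice is the instances with the
    largest positive gain model_loss[i] - human_loss[i].
    """
    gains = sorted(
        ((m - h, i) for i, (m, h) in enumerate(zip(model_loss, human_loss))),
        reverse=True,
    )
    deferred = {i for gain, i in gains[:budget] if gain > 0}
    total = sum(
        human_loss[i] if i in deferred else model_loss[i]
        for i in range(len(model_loss))
    )
    return deferred, total
```

With `model_loss = [0.1, 0.9, 0.2, 0.8]`, `human_loss = [0.3, 0.2, 0.3, 0.4]` and `budget = 1`, only instance 1 is deferred. Raising the budget to 3 still defers only instances 1 and 3, because deferring the others would increase the loss.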

Manuel Gomez Rodriguez

3:00 pm - 3:35 pm Session: Mental Health

Big data and mental health - moving forward to precision psychiatry

One of the major obstacles in today's psychiatric research is the obvious mismatch between diagnostic categories that stem mainly from the end of the 19th century and modern neurobiological concepts of normal and disrupted brain function, which leads to a lag in the development of new and more effective therapies and therapeutics. One main research goal of my group is the use of (epi-)genetic markers to detect and categorize biologically distinct subgroups of psychiatric disorders, using therapy response as the primary phenotype. We were able to establish DNA methylation markers indicating an elevated risk of non-response to standard monoaminergic antidepressants (EU and US patent granted) or to electroconvulsive therapy (ECT). Further DNA methylation markers that predict positive response to ECT and to specific psychotherapeutic treatments are currently being tested. Large data sets derived from the different branches of basic and clinical research contain a wealth of information. To access and implement this information, new methods of data analysis are needed. Pattern recognition based on artificial intelligence and neural networks is a feasible approach to these research questions. In recent years, we have gained expertise in using these "big data" methods for the integrative analysis of molecular and clinical data. Self-learning algorithms not only help to discover new and unexpected relationships between molecular and clinical data but also foster the development of diagnostic and treatment algorithms in an iterative and evolutionary way (a plan-do-check-act (PDCA) cycle integrating patient care and research goals), paving the way for precision psychiatry.

Helge Frieling

3:40 pm - 4:15 pm Session: Mental Health

Developing a Smartphone Keyboard Interaction-Based Emotion Detection System

Keyboard interactions in communication applications such as WhatsApp and Facebook Messenger can involve emotional exchanges. Moreover, monitoring keyboard interaction is unobtrusive and requires few resources. As a result, exploring different keyboard interaction patterns (such as typing speed, touch pressure, error rate and special character usage) can reveal emotion cues and aid in developing non-intrusive, resource-friendly emotion tracking applications. Additionally, the effects of emotion in one communication session can persist across sessions; modeling this jointly with keyboard interaction patterns can further improve emotion detection performance. In this work, we discuss the design, development, and implementation of an Android application, TouchSense, which leverages keyboard interaction characteristics and implements a machine learning model to infer multiple emotion states (happy, sad, stressed, relaxed). We also discuss approaches for automatically learning the keyboard interaction representation and for adopting a multi-task learning strategy, so that similarities in keyboard interaction among users can be leveraged for superior performance.

Surjya Ghosh

4:20 pm - 5:10 pm Session: Mental Health

Multimodal sensor machine learning for mental health

Digital phenotyping and machine learning technologies have shown potential to measure objective behavioral and physiological markers, to provide risk assessments for people who might be at high risk of poor health and wellbeing, and to support better decisions or behavioral changes for health and wellbeing. I will introduce a series of studies, algorithms, and systems we have developed for measuring, predicting, and supporting personalized health and wellbeing. I will also discuss challenges, lessons learned, and potential future directions in health and wellbeing research.

Akane Sano

5:15 pm - 6:05 pm Session: Mental Health

Empathy in peer-to-peer mental health support

Access to mental health care is a global problem for hundreds of millions of people. Online mental health support may help mitigate this challenge. However, while mental health platforms attract millions of people, we lack a comprehensive understanding of how conversations on these platforms can most effectively support those in need. I will describe how we can measure empathy in mental health peer support and how we can give feedback that empowers peer supporters to express higher levels of empathy, using large-scale neural transformer architectures and reinforcement learning approaches. I will also share lessons from applying these methods to a large corpus of support conversations.

Tim Althoff

9:00 am - 9:50 am Session: Medical Text and Data Mining

Machine learning with biomedical ontologies for precision health

The life sciences have invested significant resources in the development and application of semantic technologies to make research data accessible and interlinked, and to enable the integration and analysis of data. However, utilizing the semantics associated with research data in data analysis is often challenging. Novel methods are now becoming available that combine symbolic and statistical methods in Artificial Intelligence. In my talk, I will show how to incorporate biological background knowledge into machine learning models to identify gene-disease associations and genomic variants that cause heritable disorders, and to predict protein functions. The methods I describe are generic and can be applied in other domains in which biomedical ontologies and structured knowledge bases exist.
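
A simple way to let an ontology shape a comparison is to propagate annotations along the is-a hierarchy (the "true path rule") before measuring overlap. This is a classic symbolic baseline, not the combined symbolic/statistical methods presented in the talk. The toy hierarchy and the term names below are hypothetical placeholders rather than real ontology identifiers. In the example, a gene annotated with one phenotype scores as more similar to a disease whose phenotype shares a parent class than to a disease from an unrelated branch.

```python
# Hypothetical toy is-a hierarchy: term -> list of direct superclasses.
PARENTS = {
    "seizure": ["nervous_abnormality"],
    "ataxia": ["nervous_abnormality"],
    "nervous_abnormality": ["phenotype"],
    "anemia": ["blood_abnormality"],
    "blood_abnormality": ["phenotype"],
}


def ancestors(term, parents):
    """All superclasses of `term`, including the term itself."""
    seen, stack = {term}, [term]
    while stack:
        for p in parents.get(stack.pop(), ()):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen


def propagate(annotations, parents):
    """True path rule: annotation with a class implies annotation with its ancestors."""
    closed = set()
    for term in annotations:
        closed |= ancestors(term, parents)
    return closed


def similarity(a, b, parents):
    """Jaccard similarity of the propagated annotation sets."""
    x, y = propagate(a, parents), propagate(b, parents)
    return len(x & y) / len(x | y)
```

Here `similarity({"seizure"}, {"ataxia"}, PARENTS)` is 0.5 and `similarity({"seizure"}, {"anemia"}, PARENTS)` is 0.2. Without propagation, both scores would be 0.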

Prof. Dr. Robert Hoehndorf

9:55 am - 10:45 am Session: Medical Text and Data Mining

Mining electronic health records: linguistic and ontological challenges

Despite great efforts to establish semantic standards for making clinical data interoperable, most of the electronic health record consists of narrative content written by clinicians in their own jargon, whereas its structured and coded parts are incomplete and geared towards specific use cases such as billing. This talk will focus on the challenges of extracting patient-related information from clinical information systems. Starting with an analysis of clinical language and the state of clinical text mining, I will then address the semantic post-processing and normalisation of text mining output, taking into account the problem of separating ontological meaning from context, against the background of current health IT standards such as SNOMED CT, LOINC and FHIR.
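
The separation of ontological meaning from context can be illustrated with a minimal dictionary-based extractor. It normalises synonyms to one concept and records negation as a separate attribute, in the spirit of NegEx-style trigger rules. The lexicon, the negation cues, the window size and the concept IDs ("C001" and so on) are hypothetical placeholders, not real SNOMED CT codes. A production pipeline would need far richer handling of clinical jargon, abbreviations and scope.

```python
import re

# Hypothetical mini-terminology: surface form -> placeholder concept id (not SNOMED CT).
LEXICON = {
    "chest pain": "C001",
    "dyspnea": "C002",
    "shortness of breath": "C002",  # synonym normalised to the same concept
    "fever": "C003",
}
NEGATION_CUES = ("no", "denies", "without")


def extract(sentence, window=3):
    """Return sorted (concept_id, negated) pairs found in one sentence."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    results = set()
    for term, concept in LEXICON.items():
        words = term.split()
        for start in range(len(tokens) - len(words) + 1):
            if tokens[start:start + len(words)] == words:
                # The concept stays the same; negation is kept as separate context.
                context = tokens[max(0, start - window):start]
                results.add((concept, any(cue in context for cue in NEGATION_CUES)))
    return sorted(results)
```

For "Patient denies chest pain but reports shortness of breath.", the sketch returns chest pain (C001) as negated and the dyspnea concept (C002) as affirmed.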

Stefan Schulz

10:50 am - 11:20 am Session: Medical Text and Data Mining

Characterizing the spread of exaggerated news content over social media

In this work, we consider a dataset comprising press releases about health research from different universities in the UK, along with a corresponding set of news articles. As a first step, we perform an exploratory data analysis to understand how basic information published in scientific journals gets exaggerated as it is reported in these press releases or news articles. This initial analysis shows that some news agencies exaggerate almost 60% of the articles they publish in the health domain, that more than 50% of the press releases from certain universities are exaggerated, and that articles on topics like lifestyle and childhood are heavily exaggerated. Motivated by these observations, the central objective of this work is to investigate how exaggerated news spreads over an online social network like Twitter. We next study the characteristics of users who never or rarely post exaggerated news content and compare them with those who post exaggerated news content more frequently. We observe that the latter class of users have fewer retweets/mentions per tweet, have significantly more followers, use more slang words and fewer hyperbolic words, and use fewer word contractions. We also observe that LIWC categories like 'bio', 'health', 'body', and 'negative emotion' are more pronounced in the tweets posted by users in the latter class. As a final step, we use these observations as features to automatically classify the two groups, achieving an F1-score of 0.83.
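
The reported F1-score is the harmonic mean of precision P and recall R, F1 = 2PR / (P + R). The sketch below computes the binary F1 for one positive class from toy labels. The abstract does not say whether its 0.83 is a per-class or an averaged (e.g., macro) score, so this shows only the basic formula.

```python
def f1_score(y_true, y_pred, positive=1):
    """Binary F1 = 2PR / (P + R) for the given positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # precision and recall are both 0 (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

For the toy labels `y_true = [1, 1, 1, 0, 0, 1]` and `y_pred = [1, 1, 0, 0, 1, 1]`, there are 3 true positives, 1 false positive and 1 false negative. This gives P = R = 0.75 and F1 = 0.75.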

Animesh Mukherjee

11:25 am - 12:00 pm Session: Medical Text and Data Mining

Clinical decision-support, clinical data repositories and reuse of clinical data

The amount of data produced in the healthcare system is increasing, and with it the possibilities of reusing existing data. Clinical routine data can provide valuable insights for care and research but are currently not used efficiently. This talk will provide insight into an open health data platform approach, established in the medical data integration centre of Hannover Medical School, that enhances the reuse of such routine data as pursued in the "Medical Informatics Initiative". The potential of this approach will be illustrated with examples of querying the integrated and standardised data sets and of using the data in applications such as clinical decision-support systems.
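
Querying standardised data sets can look like the sketch below. It assumes that the integrated data are exposed through an HL7 FHIR REST API. That is an assumption: the abstract does not name the platform's interface. The sketch builds a FHIR search URL for a patient's observations with a given LOINC code, sorted newest first. The base URL and patient ID are hypothetical, "718-7" is used only as an example LOINC code, and no request is sent.

```python
from urllib.parse import urlencode


def observation_search_url(base_url, patient_id, loinc_code, count=10):
    """Build a FHIR Observation search URL (hypothetical server; no request is made)."""
    params = {
        "subject": f"Patient/{patient_id}",
        "code": f"http://loinc.org|{loinc_code}",  # FHIR token search: system|code
        "_sort": "-date",                          # newest observations first
        "_count": str(count),                      # page size
    }
    return f"{base_url.rstrip('/')}/Observation?{urlencode(params)}"
```

`urlencode` percent-encodes the `/`, `:` and `|` characters. FHIR servers decode these before evaluating the search parameters.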

Antje Wulff

3:10 pm - 4:00 pm Session: Epidemiology

Beyond complex network models: epidemic models based on moving agent systems

Epidemic models canonically assume one of the following supports on top of which the spreading occurs: i) a well-mixed population, ii) a (regular) lattice, or iii) an underlying complex network over which the disease propagates. Is there anything beyond these three supports? In this talk we will discuss one alternative: systems of mobile agents, where the agents adopt different states (e.g., S, I, R). These models are of great theoretical importance: they interpolate between well-mixed populations and lattice models, and they exhibit behavior that shares some similarities with, but cannot be reduced to, the behavior observed on dynamical complex networks. At the application level, their advantage is that they allow one to evaluate the impact of human mobility at scales smaller than those of large-scale transportation networks. We will use these models to investigate different sources of fluctuations and to assess how predictable the evolution of an epidemic is. In particular, we will see how vaccination can, counterintuitively, lead to an increase in infections. Refs: Peruani, Sibona, Phys. Rev. Lett. 100, 168103 (2008); Soft Matter 15, 497-503 (2019); Marcolongo et al., preprint (2021).
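
A minimal mobile-agent SIR model can be sketched as follows. Agents move at constant speed in a periodic box, and their headings perform a random walk, which makes them persistent, self-propelled walkers. Infection requires physical proximity, so contacts arise from motion rather than from a fixed network. The motion rule and all parameter values are hypothetical and not taken from the cited papers.

```python
import math
import random


def mobile_sir(n_agents=100, box=20.0, radius=0.8, speed=0.3, noise=0.5,
               p_infect=0.5, p_recover=0.05, steps=150, seed=0):
    """Toy SIR epidemic carried by self-propelled agents in a periodic square box.

    Each agent moves at constant `speed`; its heading performs a random walk with
    Gaussian angular `noise`. A susceptible agent within `radius` of an infected
    agent is infected with probability `p_infect` per contact and step; infected
    agents recover with probability `p_recover` per step. Returns (S, I, R) per step.
    """
    rng = random.Random(seed)
    pos = [[rng.uniform(0, box), rng.uniform(0, box)] for _ in range(n_agents)]
    heading = [rng.uniform(0, 2 * math.pi) for _ in range(n_agents)]
    state = ["I"] + ["S"] * (n_agents - 1)  # a single initial infection

    def close(a, b):
        # Minimum-image distance: the box wraps around at its edges.
        dx = abs(pos[a][0] - pos[b][0])
        dy = abs(pos[a][1] - pos[b][1])
        dx, dy = min(dx, box - dx), min(dy, box - dy)
        return dx * dx + dy * dy < radius * radius

    history = []
    for _ in range(steps):
        for k in range(n_agents):  # move every agent
            heading[k] += rng.gauss(0.0, noise)
            pos[k][0] = (pos[k][0] + speed * math.cos(heading[k])) % box
            pos[k][1] = (pos[k][1] + speed * math.sin(heading[k])) % box
        infected = [k for k in range(n_agents) if state[k] == "I"]
        new_state = list(state)
        for k in range(n_agents):  # transmission S -> I
            if state[k] == "S" and any(
                    close(k, j) and rng.random() < p_infect for j in infected):
                new_state[k] = "I"
        for j in infected:  # recovery I -> R
            if rng.random() < p_recover:
                new_state[j] = "R"
        state = new_state
        history.append((state.count("S"), state.count("I"), state.count("R")))
    return history
```

Running the sketch with different seeds gives a first impression of run-to-run fluctuations in outbreak size. That question motivates the talk, but the sketch reproduces none of its results.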

Fernando Peruani

4:05 pm - 4:55 pm Session: Epidemiology

Modeling COVID-19 in India

I will describe several models for the spread of COVID-19 in India. These include a complex compartmental model (INDSCI-SIM), network models which we use to suggest policies regarding testing, as well as ultra-large-scale agent-based models (the BHARATSIM project) which we use to describe disease spread in large cities such as Mumbai and Delhi as well as in several Indian states. These models are also used to study vaccination scenarios, the effects of mutant strains of the virus and the impacts of non-pharmaceutical interventions. I will describe insights gained from this body of work.
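
For readers new to compartmental models, the sketch below integrates a basic SEIR model with forward Euler steps. It is far simpler than INDSCI-SIM. Its parameters (transmission rate `beta`, incubation rate `sigma`, recovery rate `gamma`) are illustrative and are not calibrated to India or taken from the talk.

```python
def seir(beta=0.3, sigma=1 / 5.2, gamma=1 / 7, population=1_000_000,
         initial_infected=10, days=180, dt=0.1):
    """Forward-Euler integration of a basic SEIR model (illustrative parameters).

    dS/dt = -beta*S*I/N,  dE/dt = beta*S*I/N - sigma*E,
    dI/dt = sigma*E - gamma*I,  dR/dt = gamma*I.
    Returns one (S, E, I, R) tuple per day, starting with day 0.
    """
    s, e, i, r = float(population - initial_infected), 0.0, float(initial_infected), 0.0
    steps_per_day = round(1 / dt)
    daily = [(s, e, i, r)]
    for _ in range(days):
        for _ in range(steps_per_day):
            new_exposed = beta * s * i / population * dt
            new_infectious = sigma * e * dt
            new_recovered = gamma * i * dt
            s -= new_exposed
            e += new_exposed - new_infectious
            i += new_infectious - new_recovered
            r += new_recovered
        daily.append((s, e, i, r))
    return daily
```

Each step moves people between compartments without creating or removing anyone, so S + E + I + R stays equal to the population up to rounding. With these values the basic reproduction number is beta / gamma = 2.1, so the epidemic grows before it declines.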

Gautam Menon

5:00 pm - 5:55 pm Session: Epidemiology

AI Driven Epidemic Science

The COVID-19 pandemic is the most significant pandemic since the 1918 influenza pandemic. It has had significant social, economic and health impacts globally. I will give an overview of the state of the art in computational epidemiology. I will then describe how scalable, data-driven AI and analytics play an important role in supporting policy makers as they respond to the COVID-19 pandemic. I will conclude by articulating the challenges encountered while developing analytical tools as a pandemic is unfolding.

Madhav Marathe

Contact

Niloy Ganguly

Email: ganguly@l3s.de

Wolfgang Nejdl

Email: nejdl@l3s.de

Megha Khosla

Email: khosla@l3s.de