Harnessing Big Data for Precision Medicine and Healthcare


Workshop Series Organized by Leibniz AI Lab

Thank you to our speakers and to all attendees. We hope you enjoyed the workshop as much as we did. The recordings will soon be uploaded to our website. Follow us on our social platforms for updates.

Scroll down to find the schedule.

Medical research, diagnostics and therapy are increasingly IT-supported and networked. Huge amounts of heterogeneous routine care data are already being collected in hospitals (e.g., in intensive care units), in primary care institutions, and by citizens in their personal environments. In addition, genomics, metabolomics and other omics data can provide insights into the development of illnesses and the suitability of specific treatments, which enables individualized therapies. Through our series of workshops at the Leibniz AI Lab, we aim to build a community of researchers in machine learning, AI and biology, medical professionals, and others to address these shared questions.

The main topics of our first workshop, on June 10-11, 2021, include:

Computational Methods in Medicine
Mental Health
Databases/Data Integration/Medical Text and Data Mining


Confirmed Speakers



Surjya Ghosh
Manuel Gomez Rodriguez
Prof. Dr.-Ing. Bodo Rosenhahn
Bidisha Samanta


Technical Support


Humaun Rashid


Schedule list

Speaker | Video | Date | Time (CET)

Session: Computational Methods in Medicine
Introduction to Future Lab | L3S Future Lab June 2021 Intro | June 10 | 9:00-9:10
Yang Li | not available | June 10 | 9:15-9:45
Bodo Rosenhahn | Bodo Rosenhahn's talk | June 10 | 9:50-10:30
Bidisha Samanta | Bidisha Samanta's talk | June 10 | 10:35-11:10
Manuel Gomez Rodriguez | Manuel Gomez Rodriguez' talk | June 10 | 11:15-12:05

Session: Mental Health
Helge Frieling | Helge Frieling's talk | June 10 | 3:00-3:35
Surjya Ghosh | Surjya Ghosh's talk | June 10 | 3:40-4:15
Akane Sano | Akane Sano's talk | June 10 | 4:20-5:10
Tim Althoff | Tim Althoff's talk | June 10 | 5:15-6:05

Session: Medical Text and Data Mining
Robert Hoehndorf | Robert Hoehndorf's talk | June 11 | 9:00-9:50
Stefan Schulz | Stefan Schulz' talk | June 11 | 9:55-10:45
Animesh Mukherjee | Animesh Mukherjee's talk | June 11 | 10:50-11:20
Antje Wulff | Antje Wulff's talk | June 11 | 11:25-12:00

Session: Epidemiology
Fernando Peruani | Fernando Peruani's talk | June 11 | 3:10-4:00
Gautam Menon | Gautam Menon's talk | June 11 | 4:05-4:55
Madhav Marathe | Madhav Marathe's talk | June 11 | 5:00-5:55


9:00 am - 9:10 am

Introduction to Future Lab

9:15 am - 9:45 am Session: Computational Methods in Medicine

A big data approach for personalised infection medicine

The immune function at an individual level is highly variable. It is therefore important to capture the impact of genetics and biological molecules on the immune function to better understand the immune system as a whole. In my talk, I will describe novel strategies to study these factors by simultaneously modeling information from genome, transcriptome, proteome, metabolome and environmental profiles. The results from such a study will reveal insights into the mechanisms driving the immune response to pathogens and provide mathematical models for predicting individual variation in immune functions, a crucial step towards personalized prevention/treatment of infectious diseases.

Yang Li
9:50 am - 10:30 am Session: Computational Methods in Medicine

Multi Object Tracking for cells, microorganisms and human motion analysis

In this talk I will summarise recent results on multi-object tracking using the tracking-by-detection paradigm. The challenge is to take detections (e.g. bounding boxes of objects in images) as input and to generate consistent trajectories over the entire sequence. This task can be cast as a network flow problem on a graph, which can be globally optimised using a linear programming (LP) formulation. I will present selected basics of (integer) linear programming, some simple discrete optimisation tasks (e.g. graph matching) with their respective LP formulations, and finally explain graph-based tracking and its optimisation. The talk continues with some applications and recent results, e.g. on the MOTA benchmark, and concludes with future research strands.

Recent papers:
Andrea Hornakova*, Roberto Henschel*, Bodo Rosenhahn, Paul Swoboda (* equal contribution). Lifted Disjoint Paths with Application in Multiple Object Tracking. Proceedings of the 37th International Conference on Machine Learning (ICML), July 2020.
Roberto Henschel, Timo von Marcard, Bodo Rosenhahn. Simultaneous Identification and Tracking of Multiple People using Video and IMUs. Computer Vision and Pattern Recognition Workshops (CVPRW), June 2019.
Timo von Marcard, Roberto Henschel, Michael J. Black, Bodo Rosenhahn, Gerard Pons-Moll. Recovering Accurate 3D Human Pose in The Wild Using IMUs and a Moving Camera. European Conference on Computer Vision, September 2018.

Weblink: https://motchallenge.net
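
As a rough illustration of the association step behind tracking-by-detection, the sketch below matches detections between two consecutive frames by minimising a total cost. It is not the speaker's method: the cost function (squared distance between box centres), the function names and the toy boxes are assumptions made for this example, and the optimum is found by brute-force enumeration rather than by the LP or network flow formulation the abstract refers to.

```python
from itertools import permutations


def box_center(box):
    """Return the (x, y) centre of a box given as (x_min, y_min, x_max, y_max)."""
    x_min, y_min, x_max, y_max = box
    return ((x_min + x_max) / 2.0, (y_min + y_max) / 2.0)


def association_cost(box_a, box_b):
    """Squared distance between box centres (an assumed, simplistic cost)."""
    (ax, ay), (bx, by) = box_center(box_a), box_center(box_b)
    return (ax - bx) ** 2 + (ay - by) ** 2


def match_detections(frame_t, frame_t1):
    """Find the one-to-one assignment with minimal total cost by enumeration.

    Assumes both frames contain the same number of detections. Enumeration is
    only feasible for a handful of objects; an LP or min-cost-flow solver
    would take the place of this loop for realistic problem sizes.
    """
    n = len(frame_t)
    best_cost, best_perm = float("inf"), None
    for perm in permutations(range(n)):
        cost = sum(association_cost(frame_t[i], frame_t1[perm[i]]) for i in range(n))
        if cost < best_cost:
            best_cost, best_perm = cost, perm
    # Each pair is (index in frame_t, index in frame_t1).
    return list(enumerate(best_perm)), best_cost


# Toy detections: the same three objects, slightly moved and listed in a new order.
frame_t = [(0, 0, 10, 10), (50, 50, 60, 60), (100, 0, 110, 10)]
frame_t1 = [(52, 51, 62, 61), (101, 2, 111, 12), (1, 1, 11, 11)]
matches, total_cost = match_detections(frame_t, frame_t1)
```

Chaining such frame-to-frame matches gives trajectories only in the easiest cases; a global formulation over the whole sequence is what allows missed detections and occlusions to be handled consistently.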

Prof. Dr.-Ing. Bodo Rosenhahn
10:35 am - 11:10 am Session: Computational Methods in Medicine

A deep generative model for molecular graphs

Deep generative models have been praised for their ability to learn smooth latent representations of images, text, and audio, which can then be used to generate new, plausible data. However, current generative models are unable to work with molecular graphs due to their unique characteristics: their underlying structure is not Euclidean or grid-like, they remain isomorphic under permutation of the node labels, and they come with a different number of nodes and edges. In our work we propose a variational autoencoder for molecular graphs, whose encoder and decoder are specially designed to account for the above properties by means of several technical innovations. We further develop a gradient-based algorithm to optimize the decoder of our model so that it learns to generate molecules that maximize the value of a certain property of interest and, given a molecule of interest, is able to optimize the spatial configuration of its atoms for greater stability.

Bidisha Samanta
11:15 am - 12:05 pm Session: Computational Methods in Medicine

Learning Under Algorithmic Triage

Under algorithmic triage, a machine learning model does not predict all instances but instead defers some of them to human experts. The motivation that underpins learning under algorithmic triage is the observation that, while there are high-stakes tasks where machine learning models have matched, or even surpassed, the average performance of human experts, they are still less accurate than human experts on some instances, where they make far more errors than average. The main promise is that, by working together, human experts and machine learning models are likely to achieve considerably better performance than either would achieve alone. In this talk, I will present several algorithms to learn under algorithmic triage that we have developed in recent years, discuss their theoretical properties, and present a variety of experimental results demonstrating their potential in improving medical diagnosis, content moderation and scientific discovery.
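
To make the idea of deferral concrete, here is a minimal sketch of a fixed-threshold triage rule for a binary task. It is only a baseline for illustration and not one of the algorithms presented in the talk, which learn the model and the triage policy rather than using a hand-set threshold. The function name, the threshold value and the toy inputs are assumptions.

```python
def triage(model_probabilities, human_labels, threshold=0.8):
    """Route each instance to the model or to a human expert.

    model_probabilities: predicted probability of the positive class per instance.
    human_labels: labels a (hypothetical) expert would give, used on deferral.
    threshold: assumed minimum confidence for the model to decide on its own.
    """
    decisions = []
    for prob, human_label in zip(model_probabilities, human_labels):
        # Confidence of a binary classifier in its most likely class.
        confidence = max(prob, 1.0 - prob)
        if confidence >= threshold:
            decisions.append(("model", int(prob >= 0.5)))
        else:
            decisions.append(("human", human_label))
    return decisions


# Toy data: two confident predictions and two uncertain ones.
probs = [0.95, 0.55, 0.10, 0.70]
expert = [1, 0, 0, 1]
decisions = triage(probs, expert)
deferred = sum(1 for source, _ in decisions if source == "human")
```

Raising the threshold sends more instances to the expert, which trades human effort against the risk of confident model errors.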

Manuel Gomez Rodriguez
3:00 pm - 3:35 pm Session: Mental Health

Big data and mental health - moving forward to precision psychiatry

One of the major obstacles in today’s psychiatric research is the obvious mismatch between diagnostic categories, which stem mainly from the end of the 19th century, and modern neurobiological concepts of normal and disrupted brain function. This mismatch has led to a lag in the development of new and more effective therapies and therapeutics. One main research goal of my group is the use of (epi-)genetic markers to detect and categorize biologically distinct sub-groups of psychiatric disorders, using therapy response as a primary phenotype. We were able to establish DNA methylation markers indicating an elevated risk of non-response to standard mono-aminergic antidepressants (EU and US patent granted) or of non-response to electro-convulsive therapy (ECT). Further DNA methylation markers that predict positive response to ECT and specific psychotherapeutic treatments are currently being tested. Large data sets derived from the different branches of basic or clinical research contain a wealth of information. To access and implement this information, new methods of data analysis are needed. Pattern recognition based on artificial intelligence and neural networks is a feasible approach to tackle these research questions. In recent years, we have gained expertise in the use of these “big-data” methods for the integrative analysis of molecular and clinical data. The use of self-learning algorithms not only helps to discover new and unexpected relationships between molecular and clinical data but also fosters the development of diagnostic and treatment algorithms in an iterative and evolutionary way (a plan-do-check-act (PDCA) cycle integrating patient care and research goals), paving the way for precision psychiatry.

Helge Frieling
3:40 pm - 4:15 pm Session: Mental Health

Developing Smartphone Keyboard Interaction based Emotion Detection System

Keyboard interactions on communication applications such as WhatsApp and FB Messenger can induce emotional exchanges. Moreover, monitoring keyboard interaction is unobtrusive and does not have high resource implications. As a result, exploring different types of keyboard interaction patterns (such as typing speed, touch pressure, error rate, and special character usage) can reveal emotion cues and aid in developing non-intrusive, resource-friendly emotion tracking applications. Additionally, the effects of emotion in one session of communication can persist across sessions, which, if modeled jointly with keyboard interaction patterns, can further improve emotion detection performance. In this work, we discuss the design, development, and implementation of an Android application, TouchSense, which leverages keyboard interaction characteristics and implements a machine learning model to infer multiple emotion states (happy, sad, stressed, relaxed). We also discuss approaches for automatically learning the keyboard interaction representation and adopting a multi-task learning strategy so that the keyboard interaction similarity among users can be leveraged for superior performance.
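
The kind of interaction features mentioned above can be sketched as follows. This is an illustrative assumption about how such features might be computed from a log of key events, not the TouchSense implementation: the event format, the function name and the use of backspaces as a proxy for errors are all hypothetical, and touch pressure is left out because it needs sensor data.

```python
def keyboard_features(events):
    """Summarise one typing session.

    events: list of (timestamp_seconds, key) tuples in chronological order.
    The key "BACKSPACE" is treated as an error correction (an assumption).
    """
    if len(events) < 2:
        raise ValueError("need at least two key events")
    duration = events[-1][0] - events[0][0]
    keys = [key for _, key in events]
    n_backspace = sum(1 for key in keys if key == "BACKSPACE")
    # Single characters that are neither letters, digits nor whitespace.
    n_special = sum(
        1 for key in keys
        if len(key) == 1 and not key.isalnum() and not key.isspace()
    )
    return {
        "keys_per_second": len(keys) / duration if duration > 0 else 0.0,
        "error_rate": n_backspace / len(keys),
        "special_char_rate": n_special / len(keys),
    }


# Toy session: six key events over two seconds, one correction, one "!".
session = [(0.0, "h"), (0.4, "i"), (0.9, "BACKSPACE"), (1.3, "e"),
           (1.6, "y"), (2.0, "!")]
features = keyboard_features(session)
```

A feature vector like this, computed per session, is the sort of input a classifier for self-reported emotion states could be trained on.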

Surjya Ghosh
4:20 pm - 5:10 pm Session: Mental Health

Multimodal sensor machine learning for mental health

Digital phenotyping and machine learning technologies have shown potential to measure objective behavioral and physiological markers, provide risk assessment for people who might be at high risk of poor health and wellbeing, and help people make better decisions or behavioral changes to support health and wellbeing. I will introduce a series of studies, algorithms, and systems we have developed for measuring, predicting, and supporting personalized health and wellbeing. I will also discuss challenges, lessons learned, and potential future directions in health and wellbeing research.

Akane Sano
5:15 pm - 6:05 pm Session: Mental Health

Empathy in peer-to-peer mental health support

Access to mental health care is a global problem for hundreds of millions of people. Online mental health support may be able to help mitigate this global challenge. However, while mental health platforms attract millions of people, we lack a comprehensive understanding of how conversations on these platforms could be most effective in supporting those in need. Specifically, I will describe how we can measure empathy in mental health peer support and how we can give feedback in order to empower peer supporters to increase expressed levels of empathy, using large-scale neural transformer architectures and reinforcement learning approaches. I will also share learnings from applying these methods to a large corpus of support conversations.

Tim Althoff
9:00 am - 9:50 am Session: Medical Text and Data Mining

Machine learning with biomedical ontologies for precision health

The life sciences have invested significant resources in the development and application of semantic technologies to make research data accessible and interlinked, and to enable the integration and analysis of data. Utilizing the semantics associated with research data in data analysis approaches is often challenging. Now, novel methods are becoming available that combine symbolic methods and statistical methods in Artificial Intelligence. In my talk, I will show how to incorporate biological background knowledge in machine learning models to identify gene-disease associations, find genomic variants that are causative for heritable disorders, and predict protein functions. The methods I describe are generic and can be applied in other domains in which biomedical ontologies and structured knowledge bases exist.

Prof. Dr. Robert Hoehndorf
9:55 am - 10:45 am Session: Medical Text and Data Mining

Mining the electronic health records. Linguistic and ontological challenges

Despite great efforts to establish semantic standards for making clinical data interoperable, most of the electronic health record consists of narrative content written by clinicians in their own jargon, whereas its structured and coded parts are incomplete and geared towards specific use cases such as billing. This talk will focus on challenges in extracting patient-related information from clinical information systems. Starting with the analysis of clinical language and presenting the state of clinical text mining, I will then address questions of the semantic post-processing and normalisation of the text mining output, taking into account the problem of separating the ontological meaning from the context, against the background of current health IT standards like SNOMED CT, LOINC and FHIR.

Stefan Schulz
10:50 am - 11:20 am Session: Medical Text and Data Mining

Characterizing the spread of exaggerated news content over social media

In this work, we consider a dataset comprising press releases about health research from different universities in the UK along with a corresponding set of news articles. As a first step we perform an exploratory data analysis to understand how the basic information published in scientific journals gets exaggerated as it is reported in these press releases or news articles. This initial analysis shows that some news agencies exaggerate almost 60% of the articles they publish in the health domain; more than 50% of the press releases from certain universities are exaggerated; and articles on topics like lifestyle and childhood are heavily exaggerated. Motivated by these observations, we set the central objective of this paper: to investigate how exaggerated news spreads over an online social network like Twitter. We next study the characteristics of the users who never or rarely post exaggerated news content and compare them with those who post exaggerated news content more frequently. We observe that the latter class of users have fewer retweets/mentions per tweet, have significantly more followers, and use more slang words, fewer hyperbolic words, and fewer word contractions. We also observe that LIWC categories like ‘bio’, ‘health’, ‘body’, and ‘negative emotion’ are more pronounced in the tweets posted by the users in the latter class. As a final step, we use these observations as features and automatically classify the two groups, achieving an F1-score of 0.83.

Animesh Mukherjee
11:25 am - 12:00 pm Session: Medical Text and Data Mining

Clinical decision-support, clinical data repositories and reuse of clinical data

The amount of data produced in the healthcare system has been increasing, and with it the possibilities of reusing existing data. Clinical routine data may yield valuable insights for care and research but is currently not used efficiently. This talk will provide an insight into an open health data platform approach, established in the medical data integration centre of the Hannover Medical School, to enhance the reuse of such routine data as pursued in the "Medical Informatics Initiative". The potential of this approach will be illustrated with examples of querying those integrated and standardised data sets and of using the data in applications such as clinical decision-support systems.

Antje Wulff
3:10 pm - 4:00 pm Session: Epidemiology

Beyond complex network models: epidemic models based on moving agent systems

Epidemic models canonically assume one of the following supports on top of which the spreading occurs: i) a well-mixed population, ii) a (regular) lattice, or iii) an underlying complex network over which the disease propagates. Is there anything else beyond these three supports, i.e. well-mixed populations, lattices, or complex networks? In this talk we will discuss one alternative: the use of systems of mobile agents, where the agents adopt different states (e.g. state S, I, R). The theoretical importance of these models is paramount: they interpolate between well-mixed populations and lattice models, and exhibit behavior that shares some similarities with, although it cannot be reduced to, that observed on dynamical complex networks. At the application level, the advantage of these models is that they allow one to evaluate the impact of human mobility at scales smaller than large-scale transportation networks. We will use these models to investigate different sources of fluctuations and assess how predictable the evolution of an epidemic is. In particular, we will see how vaccination can lead, counterintuitively, to an increase in infections. Refs: Peruani, Sibona, Phys. Rev. Lett. 100, 168103 (2008); Soft Matter 15, 497-503 (2019), and Marcolongo et al. preprint (2021)
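
A toy version of an epidemic on a system of mobile agents can be sketched as below. It is a generic illustration under stated assumptions, not the model from the cited references: agents perform a random walk in a periodic square box, infection passes on contact within a fixed radius, and every parameter value and name is chosen arbitrarily for the example.

```python
import random


def simulate_mobile_sir(n_agents=200, box=20.0, radius=1.0, step=0.5,
                        p_infect=0.5, p_recover=0.05, n_steps=100, seed=0):
    """Agents random-walk in a periodic square box; infection spreads on contact.

    All parameter values are illustrative assumptions.
    Returns a list of (S, I, R) counts, one entry per time step.
    """
    rng = random.Random(seed)
    pos = [(rng.uniform(0, box), rng.uniform(0, box)) for _ in range(n_agents)]
    state = ["S"] * n_agents
    state[0] = "I"  # a single initial infection
    history = []
    for _ in range(n_steps):
        # Move every agent by a small random displacement (periodic boundaries).
        pos = [((x + rng.uniform(-step, step)) % box,
                (y + rng.uniform(-step, step)) % box) for x, y in pos]
        infected = [i for i, s in enumerate(state) if s == "I"]
        new_state = list(state)  # synchronous update based on the old states
        for i in infected:
            xi, yi = pos[i]
            for j, s in enumerate(state):
                if s != "S":
                    continue
                # Shortest displacement between agents i and j on the torus.
                dx = min(abs(xi - pos[j][0]), box - abs(xi - pos[j][0]))
                dy = min(abs(yi - pos[j][1]), box - abs(yi - pos[j][1]))
                if dx * dx + dy * dy <= radius * radius and rng.random() < p_infect:
                    new_state[j] = "I"
            if rng.random() < p_recover:
                new_state[i] = "R"
        state = new_state
        history.append((state.count("S"), state.count("I"), state.count("R")))
    return history


history = simulate_mobile_sir()
```

In this sketch, a very large `step` scrambles positions between updates and approaches a well-mixed population, while `step=0` freezes the agents in place, which is one way to see the interpolation the abstract describes.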

Fernando Peruani
4:05 pm - 4:55 pm Session: Epidemiology

Modeling COVID-19 in India

I will describe several models for the spread of COVID-19 in India. These include a complex compartmental model (INDSCI-SIM), network models which we use to suggest policies regarding testing, as well as ultra-large-scale agent-based models (the BHARATSIM project) which we use to describe disease spread in large cities such as Mumbai and Delhi as well as in several Indian states. These models are also used to study vaccination scenarios, the effects of mutant strains of the virus and the impacts of non-pharmaceutical interventions. I will describe insights gained from this body of work.
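
For readers unfamiliar with compartmental models, the sketch below integrates the textbook three-compartment SIR equations with a forward Euler step. It is far simpler than INDSCI-SIM and is not derived from it; the function name and all parameter values are assumptions for illustration only and are not fitted to any data.

```python
def simulate_sir(beta=0.3, gamma=0.1, s0=0.99, i0=0.01, r0=0.0, days=160, dt=0.1):
    """Integrate dS/dt = -beta*S*I, dI/dt = beta*S*I - gamma*I, dR/dt = gamma*I.

    S, I and R are fractions of the population. beta (transmission rate) and
    gamma (recovery rate) are illustrative values, not estimates.
    Returns a list of (time, S, I, R) tuples.
    """
    s, i, r = s0, i0, r0
    trajectory = [(0.0, s, i, r)]
    steps = int(round(days / dt))
    for k in range(1, steps + 1):
        new_infections = beta * s * i * dt
        recoveries = gamma * i * dt
        s, i, r = s - new_infections, i + new_infections - recoveries, r + recoveries
        trajectory.append((k * dt, s, i, r))
    return trajectory


trajectory = simulate_sir()
# Time and size of the infection peak in this toy run.
peak_time, _, peak_infected, _ = max(trajectory, key=lambda row: row[2])
```

Models used for policy questions typically add many more compartments (for example exposed, asymptomatic, hospitalised or vaccinated groups) and calibrate the rates against reported data.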

Gautam Menon
5:00 pm - 5:55 pm Session: Epidemiology

AI Driven Epidemic Science

The COVID-19 pandemic is the most significant pandemic since the 1918 influenza pandemic. It has had a significant social, economic and health impact globally. I will give an overview of the state of the art in computational epidemiology. I will then describe how data-driven, scalable AI and analytics play an important role in supporting policy makers as they respond to the COVID-19 pandemic. I will conclude by articulating the challenges encountered while developing analytical tools as a pandemic is unfolding.

Madhav Marathe
  • Date : 10 Jun 2021 - 11 Jun 2021
  • Time : 9:00 am - 6:00 pm (UTC)


Niloy Ganguly

Email : ganguly@l3s.de

Wolfgang Nejdl

Email : nejdl@l3s.de

Megha Khosla

Email : khosla@l3s.de