Research
Use Cases
Team: Can Aykul, Jonas Wallat, Dr. Cameron Pierson, Prof. Dr. Maria-Esther Vidal
In the project "Breast Cancer Network Hannover", Prof. Tjoung-Won Park-Simon and Dr. Thilo Dörk-Bousset from the Department of Gynaecology (MHH) are cooperating with the Leibniz AI Lab to identify factors for therapeutic success in patients diagnosed with breast cancer. For this purpose, standardized data from about 5000 patients of the regional network "Network Breast Cancer" will be analysed. In a first step, medical history data of the patient and her family, tumor characteristics, therapy data, data on follow-up examinations and survival, genetic information and socioeconomic data of the patient will be integrated to enable a comprehensive analysis. Special emphasis will be placed on the association of socioeconomic aspects such as education and migration background with therapeutic success. Another focus is the identification of sub-populations of patients based on the success of different therapy options, to enable targeted, personalized therapy. In particular, the project aims to provide optimized suggestions as to which patients will benefit more from neo-adjuvant therapy and which will benefit more from surgery.
While the current approach to predicting relapse probability uses a logistic regression model, we aim to expand to more involved models such as decision trees, random forests and neural networks, and to introduce existing domain knowledge on breast cancer using knowledge graphs. To this end, a knowledge graph will be modeled and populated based on the obtained patient data. Building upon benchmark knowledge graph embedding models such as TransE [1], ComplEx [2] and RotatE [3], a framework that can incorporate existing biomedical ontologies (e.g. Gene Ontology) will be developed and used to predict the relapse probability of a treatment. In addition, to assist the clinician's decision making, a drug-drug interaction knowledge graph will be used to learn latent semantic representations of drugs/medications and to predict potentially harmful drug interactions that may occur if a patient is required to take multiple medications simultaneously. As we introduce more complex models, we will need to balance model performance against the interpretability of our approaches. Especially for neural networks, we will use existing interpretability techniques such as LIME [4] and Shapley values [5].
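To make the knowledge graph embedding idea concrete, the following minimal sketch shows how a TransE-style model [1] scores a triple: a relation is treated as a translation in embedding space, so a plausible triple (head, relation, tail) should satisfy head + relation ≈ tail. The entity names, the relation name and the three-dimensional vectors are hypothetical placeholders chosen for illustration; they are not taken from the project, and real embeddings would be learned from the populated knowledge graph rather than written by hand.

```python
import math

# Hypothetical, hand-written 3-dimensional embeddings (illustration only).
# In a real system these vectors are learned from the knowledge graph.
entity_emb = {
    "patient_1": [0.1, 0.2, 0.0],
    "therapy_A": [0.4, 0.1, 0.3],
    "therapy_B": [0.9, 0.9, 0.9],
}
relation_emb = {
    "responds_to": [0.3, -0.1, 0.3],
}


def transe_score(head, relation, tail):
    """Return -||h + r - t||_2; values closer to zero mean a more plausible triple."""
    h = entity_emb[head]
    r = relation_emb[relation]
    t = entity_emb[tail]
    distance = math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))
    return -distance


def rank_tails(head, relation, candidates):
    """Order candidate tail entities from most to least plausible."""
    return sorted(
        candidates,
        key=lambda tail: transe_score(head, relation, tail),
        reverse=True,
    )


ranking = rank_tails("patient_1", "responds_to", ["therapy_B", "therapy_A"])
```

Here `ranking` places `therapy_A` first, because the toy vectors were chosen so that `patient_1 + responds_to` lands on `therapy_A`. ComplEx [2] and RotatE [3] follow the same pattern of scoring and ranking triples but use different scoring functions in complex space.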
Given the ethical implications of developing and using machine learning models as healthcare decision support systems, we use this opportunity to evaluate an existing ethical framework in parallel to developing the solutions described above. The rapid and increasing development of machine learning in healthcare applications (ML-HCAs) requires ethical examination to assess the impact of novel medical devices and methods on patients and society. It is imperative that such examinations are carried out to elucidate the associated ethical considerations, whether known or new. As medical technology advances, so must the concurrent ethical examination of its use and scope, including the nature of the system's application, the data underlying the system, and its impacts on patients, society, and healthcare. Such ethical examination is also needed to avoid embedding or amplifying biases in machine learning tools used in healthcare.
While several ethical frameworks have been proposed (e.g., Floridi & Strait, 2020; Saltz & Dewar, 2019), Char and colleagues (2020) developed a framework that is thoroughly and clearly constructed from pre-existing literature to systematically identify ethical considerations specific to ML-HCAs. While some argue for an ‘ethicist-as-designer’ who audits the development process of machine learning tools (van Wynsberghe & Robbins, 2014), there is additional benefit in implementing such an ethical identification framework together with a research team. As has been suggested elsewhere (e.g., Armstrong, 2017; Blay et al., 2012), the development of AI in medicine ought to be interdisciplinary and/or by co-design. Therefore, implementing Char and colleagues’ (2020) framework with a research team provides the benefit of auditing (i.e., van Wynsberghe & Robbins, 2014) by the investigators of this study, while also promoting the identification and management of ethical considerations in situ within the research group. Such implementation would promote the ethical development of ML-HCAs. The proposed framework, however, has yet to be independently evaluated. Thus, we aim to evaluate Char and colleagues’ (2020) pipeline framework within the context of a research group seeking to develop machine learning techniques that identify biomarkers in breast cancer patients to predict the success of chemotherapy treatment.
References:
[1] Bordes, Antoine, et al. "Translating embeddings for modeling multi-relational data." Advances in Neural Information Processing Systems 26 (2013).
[2] Trouillon, Théo, et al. "Complex embeddings for simple link prediction." International Conference on Machine Learning. PMLR, 2016.
[3] Sun, Zhiqing, et al. "RotatE: Knowledge graph embedding by relational rotation in complex space." arXiv preprint arXiv:1902.10197 (2019).
[4] Ribeiro, M., et al. "“Why Should I Trust You?”: Explaining the Predictions of Any Classifier." https://dl.acm.org/doi/pdf/10.1145/2939672.2939778
[5] Lundberg, S., and Lee. "A Unified Approach to Interpreting Model Predictions." https://www.semanticscholar.org/paper/A-Unified-Approach-to-Interpreting-Model-Lundberg-Lee/442e10a3c6640ded9408622005e3c2a8906ce4c2
Team: Michelle Tang, PD Dr. Anke Bergmann
B-progenitor acute lymphoblastic leukemia (B-ALL) is the most common pediatric malignancy. Next Generation Sequencing (NGS) technologies have been incorporated into routine diagnostics. Among them, cost-effective targeted RNA sequencing is particularly appealing. We analyzed targeted RNA sequencing data from ~1,500 pediatric ALL patients from the German pediatric ALL study groups. We combine UMAP (Uniform Manifold Approximation and Projection) and supervised machine learning algorithms to build an interactive tool for the visualization and prediction of diagnostic subgroups. We explore a variety of machine learning techniques, including gene-network-informed neural networks, to build our predictive model. The tool helps to stratify patients without aberrant fusions or aneuploidy, validate conventional diagnostic methods and discover new subgroups. In the future, we plan to extend this AI-assisted diagnostic tool to more clinical, transcriptomic and epigenetic data. The proposed workflow will complement the current diagnostic routine, provide better treatment options for patients and pave the way for personalized oncology.
In the project "Big Data in Psychiatric Disorders", Prof. Dr. Helge Frieling of the Department of Psychiatry, Social Psychiatry and Psychotherapy (MHH) is working together with the Leibniz AI Lab on the focus areas of schizophrenia and neurodegenerative diseases. In the first sub-project, genetic information from around 50,000 patients diagnosed with schizophrenia is being evaluated using artificial intelligence in order to identify possible subtypes. The hypothesis is that schizophrenia as a phenotype is based on a wide variety of causes that require differentiated diagnosis and therapy. We will focus on this project and have completed the data request formalities. However, we have yet to receive the data from NIMH.
Therefore, we are currently working on patient subtyping of Parkinson's disease (PD), a neurodegenerative disease, using clinical and genetic data. Most work on PD patient subtyping focuses on motor symptoms and typically considers an older population (above the age of 60 years). Recently, researchers have also included non-motor symptoms to define patient subtypes, because non-motor symptoms often precede the development of classical motor signs and contribute significantly to overall prognosis. Specifically, we plan to identify patient subtypes in younger patients with PD (below the age of 60 years) in terms of clinical and genetic data. We are also interested in patients with comorbidities such as schizophrenia and severe depression. We have developed a binary classification model for predicting whether or not a patient has PD. We use the learnt decision tree to determine the patient subtypes; this is the first approach we take to overcome the limitation that ground-truth patient subtype labels are not available. Currently, we are performing a characterisation study of PD patient subtypes in terms of clinical data. In the future, we plan to further characterize these clinical patient subtypes in terms of their genotype data. Along the same lines, we are currently exploring a second approach to patient subtyping in which we directly cluster the patients in terms of their genotype data (SNP data).
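The text does not spell out how subtypes are derived from the learnt decision tree. One plausible reading, sketched below purely as an assumption, is that every leaf predicting PD defines a candidate subtype, since all patients reaching that leaf share the same sequence of feature decisions. The tree structure, the feature names (`tremor_score`, `sleep_disturbance`), the thresholds and the patient records are hypothetical placeholders, not project data.

```python
from collections import defaultdict

# Hypothetical stand-in for a learnt binary decision tree (PD vs. no PD).
# Internal nodes test "feature <= threshold"; leaves carry an id and a label.
TREE = {
    "feature": "tremor_score",
    "threshold": 2.0,
    "left": {"leaf": "L0", "label": "no_PD"},
    "right": {
        "feature": "sleep_disturbance",
        "threshold": 0.5,
        "left": {"leaf": "L1", "label": "PD"},
        "right": {"leaf": "L2", "label": "PD"},
    },
}


def find_leaf(node, patient):
    """Follow the tree's decisions for one patient and return the leaf reached."""
    while "leaf" not in node:
        go_left = patient[node["feature"]] <= node["threshold"]
        node = node["left"] if go_left else node["right"]
    return node


def subtype_candidates(tree, patients):
    """Group patients predicted as PD by the leaf they fall into."""
    groups = defaultdict(list)
    for patient_id, features in patients.items():
        leaf = find_leaf(tree, features)
        if leaf["label"] == "PD":
            groups[leaf["leaf"]].append(patient_id)
    return dict(groups)


# Synthetic example records (illustration only).
patients = {
    "p1": {"tremor_score": 1.0, "sleep_disturbance": 0.0},
    "p2": {"tremor_score": 3.0, "sleep_disturbance": 0.0},
    "p3": {"tremor_score": 4.0, "sleep_disturbance": 1.0},
    "p4": {"tremor_score": 2.5, "sleep_disturbance": 1.0},
}
subtypes = subtype_candidates(TREE, patients)
```

In this toy example the PD-predicting leaves `L1` and `L2` yield two candidate groups, one without and one with sleep disturbance. Such groups would still need clinical characterisation before being treated as subtypes.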
Team: Leonie Basso, Jingge Xiao, Seham Nasr, Dr. Zhao Ren, Prof. Antje Wulff, PD Dr. Thomas Jack, PD Dr. Henning Rathert, Marcel Mast, Prof. Michael Marschollek, Prof. Wolfgang Nejdl
In the “Pediatric Intensive Care Unit (PICU)” use case, Prof. Antje Wulff, PD Dr. Thomas Jack, PD Dr. Henning Rathert, Marcel Mast and Prof. Michael Marschollek from Hannover Medical School are working with the Leibniz AI Lab on automatically detecting organ dysfunction in PICUs. ICU wards are data-intensive environments in which clinicians must make immediate, high-risk decisions under considerable stress. It is therefore essential to develop automatic decision-making models based on state-of-the-art machine learning and deep learning architectures, which enable real-time decision making and reduce the pressure on clinicians. Decision making in PICUs faces additional difficulties: i) different diseases dominate specific age groups between 0 and 18 years, and ii) normative values vary widely across age groups. Nevertheless, only a few research studies analyse data collected from PICU wards. The PICU use case therefore focuses on predicting organ dysfunction based on PICU data. Two major branches are planned in this project:
i) We will focus on processing the clinical data, which mainly comprise vital signs (e.g., respiratory rate, heart rate, etc.), laboratory parameters (e.g., leucocytes), and patient data (e.g., height, weight, etc.).
ii) A new database of waveform data (e.g., electrocardiogram) from the bedside monitors will be collected. A benchmark will be set up once the data have been collected and pre-processed (e.g., anonymized), and a series of machine learning and deep learning approaches will be applied.
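Because normative values differ widely across pediatric age groups, a raw vital sign is hard to interpret without the patient's age. One common way to handle this, sketched here as an assumption rather than as the project's method, is to standardize each measurement within its age band before feeding it to a model. The age bands, the field names and the heart rate values below are synthetic placeholders and must not be read as clinical reference values.

```python
from collections import defaultdict
from statistics import mean, stdev

# Hypothetical age bands in years: (lower bound inclusive, upper bound exclusive, name).
AGE_BANDS = [
    (0, 1, "infant"),
    (1, 6, "young_child"),
    (6, 12, "child"),
    (12, 19, "adolescent"),
]


def age_band(age_years):
    """Map an age in years (0 to 18) to the name of its band."""
    for low, high, name in AGE_BANDS:
        if low <= age_years < high:
            return name
    raise ValueError(f"age outside the pediatric range: {age_years}")


def zscore_by_age_band(records, key):
    """Standardize records[i][key] using the mean and standard deviation of its age band.

    Returns None for a record whose band has fewer than two values or zero spread.
    """
    values_by_band = defaultdict(list)
    for rec in records:
        values_by_band[age_band(rec["age_years"])].append(rec[key])

    stats = {
        band: (mean(values), stdev(values))
        for band, values in values_by_band.items()
        if len(values) >= 2
    }

    scores = []
    for rec in records:
        band = age_band(rec["age_years"])
        if band not in stats or stats[band][1] == 0:
            scores.append(None)
        else:
            mu, sigma = stats[band]
            scores.append((rec[key] - mu) / sigma)
    return scores


# Synthetic heart rate measurements (illustration only, not clinical data).
records = [
    {"age_years": 0.5, "heart_rate": 120},
    {"age_years": 0.7, "heart_rate": 130},
    {"age_years": 0.9, "heart_rate": 140},
    {"age_years": 15, "heart_rate": 70},
    {"age_years": 16, "heart_rate": 80},
    {"age_years": 17, "heart_rate": 90},
]
z = zscore_by_age_band(records, "heart_rate")
```

In this toy data, a heart rate of 140 in an infant and 90 in an adolescent receive the same standardized score, although the raw values differ by 50 beats per minute.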
In summary, the research of this project is expected to facilitate related research studies on the application of AI in PICU wards.
COVID-19, a disease caused by SARS-CoV-2, can take many different forms, ranging in clinical severity from mild or asymptomatic illness to acute conditions such as ARDS (acute respiratory distress syndrome) and death. Several studies have already shown that, in addition to demographic factors and pre-existing conditions, genetic predisposition may play an important role in disease development. To better understand the pathophysiology and progression of COVID-19, clinicians and researchers at Hannover Medical School (MHH) have been collecting patient samples and data in the COVID-19 Biobank funded by the Lower Saxony Ministry of Science and Culture (MWK) since the beginning of the pandemic.
Broad molecular characterizations have been performed on the collected biospecimens, particularly on material from patients with severe clinical courses requiring intensive care and respiratory support. These global analyses include sequencing of the patient genome, gene expression, and the methylation state of specific bases in the genome (epigenome). These data are complemented by high-resolution optical analyses of structural DNA variants that may be associated with increased disease risk. In addition, a broad clinical dataset on all patients was collected by the Hannover Unified Biobank (HUB) in collaboration with the Pneumology Department of the MHH, which includes information on the COVID-19 patients' previous diseases, disease severity, therapeutic measures, complications, and disease outcome.
To bring together this extensive collection of molecular and clinical data, already comprising over 14 TB in its raw state, in an integrative analysis, the HUB is collaborating with scientists from the L3S Future Laboratory and Prof. Yang Li from the Helmholtz Centre for Infection Research (HZI). The integrative data analysis aims to bring together the different data layers and identify prognostic molecular markers or early disease patterns associated with further disease progression.
Publications
2023
- Describing and Organizing Semantic Web and Machine Learning Systems in the SWeMLS-KG. In The Semantic Web - 20th International Conference, ESWC 2023, Hersonissos, Crete, Greece, May 28 - June 1, 2023, Proceedings (Pesquita, C., Jiménez-Ruiz, E., McCusker, J. P., Faria, D., Dragoni, M., Dimou, A., Troncy, R., and Hertling, S., Eds.), pp. 372–389, Springer (2023).
- Evaluating Prompt-Based Question Answering for Object Prediction in the Open Research Knowledge Graph. In Database and Expert Systems Applications - 34th International Conference, DEXA 2023, Penang, Malaysia, August 28-30, 2023, Proceedings, Part I (Strauss, C., Amagasa, T., Kotsis, G., Tjoa, A. M., and Khalil, I., Eds.), pp. 508–515, Springer (2023).
- An Upper Ontology for Modern Science Branches and Related Entities. In The Semantic Web - 20th International Conference, ESWC 2023, Hersonissos, Crete, Greece, May 28 - June 1, 2023, Proceedings (Pesquita, C., Jiménez-Ruiz, E., McCusker, J. P., Faria, D., Dragoni, M., Dimou, A., Troncy, R., and Hertling, S., Eds.), pp. 436–453, Springer (2023).
- Increasing Reproducibility in Science by Interlinking Semantic Artifact Descriptions in a Knowledge Graph. In Leveraging Generative Intelligence in Digital Libraries: Towards Human-Machine Collaboration - 25th International Conference on Asia-Pacific Digital Libraries, ICADL 2023, Taipei, Taiwan, December 4-7, 2023, Proceedings, Part II (Goh, D. H.-L., Chen, S.-J., and Tuarob, S., Eds.), pp. 220–229, Springer (2023).
- Probing BERT for Ranking Abilities. In Advances in Information Retrieval - 45th European Conference on Information Retrieval, ECIR 2023, Dublin, Ireland, April 2-6, 2023, Proceedings, Part II (Kamps, J., Goeuriot, L., Crestani, F., Maistro, M., Joho, H., Davis, B., Gurrin, C., Kruschwitz, U., and Caputo, A., Eds.), pp. 255–273, Springer (2023).
- LLMs4OL: Large Language Models for Ontology Learning. In The Semantic Web - ISWC 2023 - 22nd International Semantic Web Conference, Athens, Greece, November 6-10, 2023, Proceedings, Part I (Payne, T. R., Presutti, V., Qi, G., Poveda-Villalón, M., Stoilos, G., Hollink, L., Kaoudi, Z., Cheng, G., and Li, J., Eds.), pp. 408–427, Springer (2023).
2022
- Overview of Touché 2022: Argument Retrieval: Extended Abstract. In Advances in Information Retrieval (Hagen, M., Verberne, S., Macdonald, C., Seifert, C., Balog, K., Nørvåg, K., and Setty, V., Eds.), Part 2, pp. 339–346, Springer Science and Business Media Deutschland GmbH, Germany (2022).
- MTLTS: A Multi-Task Framework To Obtain Trustworthy Summaries From Crisis-Related Microblogs. In Proceedings of the Fifteenth ACM International Conference on Web Search and Data Mining, pp. 755–763, Association for Computing Machinery, Virtual Event, AZ, USA (2022).
2020
- BERTnesia: Investigating the capture and forgetting of knowledge in BERT. In Proceedings of the Third BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP, BlackboxNLP@EMNLP 2020, Online, November 2020 (Alishahi, A., Belinkov, Y., Chrupala, G., Hupkes, D., Pinter, Y., and Sajjad, H., Eds.), pp. 174–183, Association for Computational Linguistics (2020).
Web App
Our work “A message passing framework with multiple data integration for miRNA-disease association prediction” has been published in Scientific Reports (https://www.nature.com/articles/s41598-022-20529-5).
We provide a web application accompanying this work to make the results easily accessible, and to foster assessments and future adoption. Using the web application, you can query verified information as well as the predictions of our model for specific miRNAs, diseases or pathways, covering 1618 miRNAs and 3679 diseases.
Web application: http://software.mpm.leibniz-ai-lab.de/