
Breast Cancer

Team: Can Aykul, Jonas Wallat, Dr. Cameron Pierson, Prof. Dr. Maria-Esther Vidal
In the project “Breast Cancer Network Hannover”, Prof. Tjoung-Won Park-Simon and Dr. Thilo Dörk-Bousset from the Department of Gynaecology (MHH) are cooperating with the Leibniz AI Lab to identify factors for therapeutic success in patients diagnosed with breast cancer. For this purpose, standardized data from about 5000 patients of the regional network “Network Breast Cancer” will be analysed. In a first step, the medical history of each patient and her family, tumor characteristics, therapy data, data on follow-up examinations and survival, genetic information, and socioeconomic data will be integrated to enable a comprehensive analysis. Special emphasis will be placed on the association between socioeconomic aspects, such as education and migration background, and therapeutic success. Another focus is the identification of patient sub-populations based on the success of different therapy options, to enable targeted, personalized therapy. In particular, the project aims to suggest which patients will benefit more from neo-adjuvant therapy and which will benefit more from surgery.
The current approach to predicting relapse probability uses a logistic regression model. We aim to expand to more involved models such as decision trees, random forests, and neural networks, and to introduce existing domain knowledge on breast cancer using knowledge graphs. To this end, a knowledge graph will be modeled and populated from the obtained patient data. Building upon benchmark knowledge graph embedding models such as TransE [1], ComplEx [2] and RotatE [3], we will develop a framework that can incorporate existing biomedical ontologies (e.g. Gene Ontology) and use it to predict the relapse probability of a treatment. In addition, to assist the clinician's decision making, a drug-drug interaction knowledge graph will be used to learn latent semantic representations of drugs and medications, in order to predict potentially harmful interactions that may occur when a patient has to take multiple medications simultaneously. As we introduce more complex models, we will need to balance model performance against the interpretability of our approaches. Especially for neural networks, we will use existing interpretability techniques such as LIME [4] and Shapley values [5].
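As a rough illustration of the embedding models named above, the following sketch scores triples in the style of TransE [1], where a triple (head, relation, tail) counts as plausible when head + relation lies close to tail. The entity and relation names, the two-dimensional vectors, and the helpers `transe_score` and `rank_tails` are hypothetical and hand-picked for illustration; in practice the embeddings would be learned from the populated knowledge graph.

```python
import math

# Hypothetical, hand-picked 2-d embeddings; real embeddings would be learned.
entity_emb = {
    "patient_1": [0.0, 0.0],
    "therapy_A": [1.0, 0.0],
    "therapy_B": [0.0, 2.0],
}
relation_emb = {
    "responds_to": [1.0, 0.0],
}

def transe_score(head, relation, tail):
    """Return -||h + r - t||_2; values closer to 0 mean a more plausible triple."""
    h, r, t = entity_emb[head], relation_emb[relation], entity_emb[tail]
    return -math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))

def rank_tails(head, relation, candidates):
    """Rank candidate tail entities from most to least plausible."""
    return sorted(candidates,
                  key=lambda t: transe_score(head, relation, t),
                  reverse=True)

ranking = rank_tails("patient_1", "responds_to", ["therapy_A", "therapy_B"])
print(ranking)
```

The same scoring idea could in principle be applied to a drug-drug interaction graph by ranking candidate interaction partners for a given drug, although the relation types and training procedure would differ.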
Given the ethical implications of developing and using machine learning models as healthcare decision support systems, we use this opportunity to evaluate an existing ethical framework in parallel to developing the solutions described above. The rapid and increasing development of machine learning healthcare applications (ML-HCAs) requires ethical examination to assess the impact of novel medical devices and methods on patients and society. It is imperative that such examinations are carried out to elucidate the associated ethical considerations, whether known or new. As medical technology advances, so must the concurrent ethical examination of its use and scope, including the nature of the system's application, the data underlying the system, and the impacts on patients, society, and healthcare. Such examination is also needed to avoid embedding or amplifying biases in machine learning tools used in healthcare.
While several ethical frameworks have been proposed (e.g., Floridi & Strait, 2020; Saltz & Dewar, 2019), Char and colleagues (2020) developed a framework that is thoroughly and clearly constructed from pre-existing literature to systematically identify ethical considerations specific to ML-HCAs. While some argue for an ‘ethicist-as-designer’ who audits the development process of machine learning tools (van Wynsberghe & Robbins, 2014), there is additional benefit in implementing such an ethical identification framework together with a research team. As has been suggested elsewhere (e.g., Armstrong, 2017; Blay et al., 2012), the development of AI in medicine ought to be interdisciplinary and/or by co-design. Therefore, implementing Char and colleagues’ (2020) framework with a research team provides the benefit of auditing (cf. van Wynsberghe & Robbins, 2014) by the investigators of this study, while also promoting the identification and management of ethical considerations in situ within the research group. Such implementation would promote the ethical development of ML-HCAs. The proposed framework, however, has yet to be independently evaluated. Thus, we aim to evaluate Char and colleagues’ (2020) pipeline framework within the context of a research group seeking to develop machine learning techniques that identify biomarkers in breast cancer patients to predict patient response to chemotherapy treatment.
References:
[1] Bordes, Antoine, et al. “Translating embeddings for modeling multi-relational data.” Advances in neural information processing systems 26 (2013).
[2] Trouillon, Théo, et al. “Complex embeddings for simple link prediction.” International conference on machine learning. PMLR, 2016.
[3] Sun, Zhiqing, et al. “RotatE: Knowledge graph embedding by relational rotation in complex space.” arXiv preprint arXiv:1902.10197 (2019).
[4] Ribeiro, Marco Tulio, et al. “‘Why Should I Trust You?’: Explaining the predictions of any classifier.” Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2016). https://dl.acm.org/doi/pdf/10.1145/2939672.2939778
[5] Lundberg, Scott M., and Su-In Lee. “A unified approach to interpreting model predictions.” Advances in neural information processing systems 30 (2017). https://www.semanticscholar.org/paper/A-Unified-Approach-to-Interpreting-Model-Lundberg-Lee/442e10a3c6640ded9408622005e3c2a8906ce4c2
Acute Lymphoblastic Leukemia

Team: Michelle Tang, PD Dr. Anke Bergmann
B-progenitor acute lymphoblastic leukemia (B-ALL) is the most common pediatric malignancy. Next Generation Sequencing (NGS) technologies have been incorporated into routine diagnostics; among them, cost-effective targeted RNA sequencing is particularly appealing. We analyzed targeted RNA sequencing data from ~1,500 pediatric ALL patients from the German pediatric ALL study groups. We combine UMAP (Uniform Manifold Approximation and Projection) with supervised machine learning algorithms to build an interactive tool for the visualization and prediction of diagnostic subgroups. We explore a variety of machine learning techniques, including gene-network-informed neural networks, to build our predictive model. The tool helps to stratify patients without aberrant fusions or aneuploidy, validate conventional diagnostic methods, and discover new subgroups. In the future, we plan to extend this AI-assisted diagnostic tool to more clinical, transcriptomic and epigenetic data. The proposed workflow will complement the current diagnostic routine, provide better treatment options for patients, and pave the way for personalized oncology.
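The prediction step described above can be sketched as a nearest-neighbour vote in a low-dimensional embedding. This is a minimal sketch under stated assumptions: the 2-d coordinates stand in for a precomputed UMAP embedding, the subgroup labels and the function `predict_subgroup` are hypothetical, and k-nearest neighbours is only one of many supervised methods that could be used here, not necessarily the one used in the project.

```python
import math
from collections import Counter

# Made-up 2-d coordinates standing in for a precomputed UMAP embedding.
reference = [
    ((0.0, 0.1), "subgroup_A"),
    ((0.2, 0.0), "subgroup_A"),
    ((0.1, 0.3), "subgroup_A"),
    ((5.0, 5.1), "subgroup_B"),
    ((5.2, 4.9), "subgroup_B"),
    ((4.8, 5.0), "subgroup_B"),
]

def predict_subgroup(point, labelled_points, k=3):
    """Majority vote among the k nearest labelled points (Euclidean distance)."""
    nearest = sorted(labelled_points,
                     key=lambda item: math.dist(point, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    label, count = votes.most_common(1)[0]
    # Return the predicted label and the fraction of neighbours that agree.
    return label, count / k

label, agreement = predict_subgroup((0.1, 0.1), reference)
print(label, agreement)
```

A low agreement fraction could serve as a simple signal that a sample does not fit any known subgroup well, which is one possible starting point when looking for new subgroups.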
Psychiatric Disorders

Team: Soumyadeep Roy, Salomon Kabongo Kabenamualu, Prof. Niloy Ganguly, Prof. Dr. Helge Frieling, Dr. Stefanie Mücke, Dominik Wolff
In the project “Big Data in Psychiatric Disorders”, Prof. Dr. Helge Frieling of the Department of Psychiatry, Social Psychiatry and Psychotherapy (MHH) is working together with the Leibniz AI Lab on the focus areas of schizophrenia and neurodegenerative diseases. In the first sub-project, genetic information from around 50,000 patients diagnosed with schizophrenia is being evaluated using artificial intelligence in order to identify possible subtypes. The hypothesis is that schizophrenia as a phenotype has a wide variety of causes that require differentiated diagnosis and therapy. This sub-project is our main focus, and we have completed the data request formalities; however, we have yet to receive the data from NIMH.
In the meantime, we are working on patient subtyping of Parkinson’s disease (PD), a neurodegenerative disease, using clinical and genetic data. Most work on PD patient subtyping is based on motor symptoms and typically considers an older population (above the age of 60 years). Recently, researchers have also included non-motor symptoms when defining patient subtypes, because non-motor symptoms often precede the development of classical motor signs and contribute significantly to overall prognosis. Specifically, we plan to identify patient subtypes among younger patients with PD (below the age of 60 years) in terms of clinical and genetic data. We are also interested in patients with comorbidities such as schizophrenia or severe depression. We have developed a binary classification model that predicts whether a patient has PD. We use the learnt decision tree to determine the patient subtypes; this is our first approach to overcoming the lack of ground-truth subtype labels. Currently, we are performing a characterisation study of PD patient subtypes in terms of clinical data. In the future, we plan to further characterize these clinical subtypes in terms of their genotype data. Along the same lines, we are exploring a second approach to patient subtyping in which we directly cluster the patients by their genotype data (SNP data).
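One way to read subtypes off a learnt decision tree is to treat each leaf that predicts PD as a candidate subtype, described by the conditions on the path to that leaf. The sketch below assumes a small hand-written tree; the feature names, thresholds, patient records, and the helpers `leaf_path` and `subtypes` are hypothetical and are not taken from the project's model.

```python
# Hypothetical hand-written tree: internal nodes test "feature <= threshold",
# leaves carry the predicted class (1 = PD, 0 = no PD).
tree = {
    "feature": "feature_a", "threshold": 2.0,
    "left": {"leaf": 0},
    "right": {
        "feature": "feature_b", "threshold": 5.0,
        "left": {"leaf": 1},
        "right": {"leaf": 1},
    },
}

def leaf_path(node, patient, path=()):
    """Follow a patient down the tree; return (predicted class, path conditions)."""
    if "leaf" in node:
        return node["leaf"], path
    name, thr = node["feature"], node["threshold"]
    if patient[name] <= thr:
        return leaf_path(node["left"], patient, path + (f"{name} <= {thr}",))
    return leaf_path(node["right"], patient, path + (f"{name} > {thr}",))

def subtypes(patients):
    """Group patients predicted as PD by the leaf (path) they fall into."""
    groups = {}
    for pid, features in patients.items():
        predicted, path = leaf_path(tree, features)
        if predicted == 1:
            groups.setdefault(path, []).append(pid)
    return groups

# Made-up patient records for illustration only.
patients = {
    "p1": {"feature_a": 1.0, "feature_b": 3.0},
    "p2": {"feature_a": 3.0, "feature_b": 4.0},
    "p3": {"feature_a": 4.0, "feature_b": 7.0},
    "p4": {"feature_a": 2.5, "feature_b": 1.0},
}
groups = subtypes(patients)
for path, members in groups.items():
    print(" AND ".join(path), "->", members)
```

Because each group is defined by explicit threshold conditions, the resulting subtypes stay readable, which is the main appeal of this approach when no ground-truth subtype labels exist.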
ICU

Team: Leonie Basso, Jingge Xiao, Seham Nasr, Dr. Zhao Ren, Prof. Antje Wulff, PD Dr. Thomas Jack, PD Dr. Henning Rathert, Marcel Mast, Prof. Michael Marschollek, Prof. Wolfgang Nejdl
In the project “Pediatric Intensive Care Unit (PICU) use case”, Prof. Antje Wulff, PD Dr. Thomas Jack, PD Dr. Henning Rathert, Marcel Mast and Prof. Michael Marschollek from Hannover Medical School are working with the Leibniz AI Lab on automatically detecting organ dysfunction in PICUs. ICU wards are data-intensive environments in which clinicians must make immediate, high-risk decisions under considerable stress. It is therefore essential to develop automatic decision-making models based on state-of-the-art machine learning and deep learning architectures, which can enable real-time decision making and reduce the pressure on clinicians. Decision making in PICUs poses additional difficulties: i) different diseases dominate specific age groups between 0 and 18 years, and ii) normative values vary widely across age groups. However, only a few research studies analyse data collected from PICU wards. The PICU use case therefore focuses on predicting organ dysfunction from PICU data. Two major branches are planned in this project and are introduced below.
i) We will focus on processing the clinical data, which mainly comprise vital signs (e.g., respiratory rate, heart rate), laboratory parameters (e.g., leucocytes), and patient data (e.g., height, weight).
ii) A new database of waveform data (e.g., electrocardiograms) from the bedside monitors will be collected. Once the data have been collected and pre-processed (e.g., anonymized), a benchmark will be set up and a series of machine learning and deep learning approaches will be applied.
In summary, this project is expected to facilitate related research on applications of AI in PICU wards.
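Because normative values differ widely between age groups, one common preprocessing option is to standardise each vital sign within an age group before passing it to a model. The sketch below assumes this approach; the age bands, the heart-rate readings, and the function names are made up for illustration and are neither clinical reference values nor the project's actual pipeline.

```python
from statistics import mean, pstdev

# Hypothetical age bands in years: (lower bound inclusive, upper bound exclusive).
AGE_BANDS = [(0, 1), (1, 6), (6, 12), (12, 19)]

def age_band(age_years):
    """Return the band that contains the given age."""
    for low, high in AGE_BANDS:
        if low <= age_years < high:
            return (low, high)
    raise ValueError(f"age {age_years} is outside the supported range")

def zscores_by_age(records):
    """Standardise heart rate within each age band; records are (age, heart_rate)."""
    by_band = {}
    for age, hr in records:
        by_band.setdefault(age_band(age), []).append(hr)
    stats = {band: (mean(v), pstdev(v)) for band, v in by_band.items()}
    result = []
    for age, hr in records:
        mu, sigma = stats[age_band(age)]
        # A band with no spread carries no information, so map it to 0.
        result.append(0.0 if sigma == 0 else (hr - mu) / sigma)
    return result

# Made-up readings: two infants and two adolescents.
records = [(0.5, 130), (0.8, 150), (14, 70), (16, 90)]
z = zscores_by_age(records)
print(z)
```

In this toy data, a heart rate of 130 in an infant and 70 in an adolescent map to the same standardised value, which illustrates why a single raw threshold would not transfer across age groups.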
2022
-
1.Geisler, S., Vidal, M.-E., Cappiello, C., Loscio, B. F., Gal, A., Jarke, M., Lenzerini, M., Missier, P., Otto, B., Paja, E., Pernici, B., and Rehof, J. (2022) Knowledge-Driven Data Ecosystems Toward Data Transparency, Journal of Data and Information Quality, Association for Computing Machinery (ACM) 14, 1–12.
-
1.Hinrichs, R., Jiang, N., Beltran, R., Krause, T., Käding, M., Lange, A., Schmidt, B., Ostermann, J., and Marx, S. (2022) Analysis of the Repeatability of the Pencil Lead Break in Comparison to the Ball Impact and Electromagnetic Body-Noise Actuator. In 20th World Conference on Non-Destructive Testing (WCNDT 2020).
-
1.Mukherjee, R., Vishnu, U., Peruri, H. C., Bhattacharya, S., Rudra, K., Goyal, P., and Ganguly, N. (2022) MTLTS: A Multi-Task Framework To Obtain Trustworthy Summaries From Crisis-Related Microblogs. In Proceedings of the Fifteenth ACM International Conference on Web Search and Data Mining, pp. 755–763, Association for Computing Machinery, Virtual Event, AZ, USA.
2021
-
1.Sheshadri, S., Saha, A., Patel, P., Datta, S., and Ganguly, N. (2021) Graph-based semi-supervised learning through the lens of safety. In Proceedings of the Thirty-Seventh Conference on Uncertainty in Artificial Intelligence (de Campos, C., and Maathuis, M. H., Eds.), pp. 1576–1586, PMLR.
-
1.Roy, S., Chakraborty, S., Mandal, A., Balde, G., Sharma, P., Natarajan, A., Khosla, M., Sural, S., and Ganguly, N. (2021) Knowledge-Aware Neural Networks for Medical Forum Question Classification. In Proceedings of the 30th ACM International Conference on Information & Knowledge Management, pp. 3398–3402, Association for Computing Machinery, New York, NY, USA.
-
1.Nandy, A., Sharma, S., Maddhashiya, S., Sachdeva, K., Goyal, P., and Ganguly, N. (2021) Question Answering over Electronic Devices: A New Benchmark Dataset and a Multi-Task Learning based QA Framework. In Findings of the Association for Computational Linguistics: EMNLP 2021, Association for Computational Linguistics.
-
1.Eggensperger, K., Müller, P., Mallik, N., Feurer, M., Sass, R., Klein, A., Awad, N., Lindauer, M., and Hutter, F. (2021) HPOBench: A Collection of Reproducible Multi-Fidelity Benchmark Problems for HPO. In Proceedings of the international conference on Neural Information Processing Systems (NeurIPS) (Datasets and Benchmarks Track).
-
1.Decker, M., Lammens, T., Ferster, A., Erlacher, M., Yoshimi, A., Niemeyer, C. M., Ernst, M. P. T., Raaijmakers, M. H. G. P., Duployez, N., Flaum, A., Steinemann, D., Schlegelberger, B., Illig, T., and Ripperger, T. (2021) Functional classification of RUNX1 variants in familial platelet disorder with associated myeloid malignancies, Leukemia.
-
1.Olatunji, I. E., Nejdl, W., and Khosla, M. (2021) Membership inference attack on graph neural networks. In IEEE International Conference on Trust, Privacy and Security in Intelligent Systems, and Applications (short version presented in ICLR-21 Workshop on Distributed and Private Machine Learning (DPML) ).
-
1.Souza, A., Nardi, L., Oliveira, L., Olukotun, K., Lindauer, M., and Hutter, F. (2021) Bayesian Optimization with a Prior for the Optimum. In Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD).
-
1.Rumberg, L., Ehlert, H., Lüdtke, U., and Ostermann, J. (2021) Age-Invariant Training for End-to-End Child Speech Recognition using Adversarial Multi-Task Learning. In Proceedings INTERSPEECH 2021 – 22nd Annual Conference of the International Speech Communication Association.
-
1.Guerrero-Viu, J., Hauns, S., Izquierdo, S., Miotto, G., Schrodi, S., Biedenkapp, A., Elsken, T., Deng, D., Lindauer, M., and Hutter, F. (2021) Bag of Baselines for Multi-objective Joint Neural Architecture Search and Hyperparameter Optimization. In Proceedings of the international workshop on Automated Machine Learning (AutoML) at ICML’21.
-
1.Booth, A., Reed, A. B., Ponzo, S., Yassaee, A., Aral, M., Plans, D., Labrique, A., and Mohan, D. (2021) Population risk factors for severe disease and mortality in COVID-19: A global systematic review and meta-analysis, PLOS ONE, Public Library of Science 16, 1–30.
-
1.Bellinghausen, C., Pletz, M. W., Rupp, J., Witzenrath, M., Welsch, C., Zeuzem, S., Trebicka, J., Rohde, G. G. U., and Members of the CAPNETZ study group (2021) Chronic liver disease negatively affects outcome in hospitalised patients with community-acquired pneumonia, Gut 70, 221–222.
-
1.Becker, M., Strengert, M., Junker, D., Kaiser, P. D., Kerrinnes, T., Traenkle, B., Dinter, H., Häring, J., Ghozzi, S., Zeck, A., Weise, F., Peter, A., Hörber, S., Fink, S., Ruoff, F., Dulovic, A., Bakchoul, T., Baillot, A., Lohse, S., Cornberg, M., Illig, T., Gottlieb, J., Smola, S., Karch, A., Berger, K., Rammensee, H.-G., Schenke-Layland, K., Nelde, A., Märklin, M., Heitmann, J. S., Walz, J. S., Templin, M., Joos, T. O., Rothbauer, U., Krause, G., and Schneiderhan-Marra, N. (2021) Exploring beyond clinical routine SARS-CoV-2 serology using MultiCoV-Ab to evaluate endemic coronavirus cross-reactivity, Nat Commun 12.
-
1.Hachmann, H., Krüger, B., Rosenhahn, B., and Nogueira, W. (2021) Localization Of Cochlear Implant Electrodes From Cone Beam Computed Tomography Using Particle Belief Propagation. In 2021 IEEE 18th International Symposium on Biomedical Imaging (ISBI), pp. 593–597.
-
1.Luo, C., Lin, J., Cai, S., Chen, X., He, B., Qiao, B., Zhao, P., Lin, Q., Zhang, H., Wu, W., Rajmohan, S., and Zhang, D. (2021) AutoCCAG: An Automated Approach to Constrained Covering Array Generation. In 2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE), pp. 201–212.
-
1.Holzapfel, C., Sag, S., Graf-Schindler, J., Fischer, M., Drabsch, T., Illig, T., Grallert, H., Stecher, L., Strack, C., Caterson, I., Jebb, S., Hauner, H., and Baessler, A. (2021) Association between single nucleotide polymorphisms and weight reduction in behavioural interventions—a pooled analysis, Nutrients, MDPI 13.
-
1.Warnstorf, D., Bawadi, R., Schienke, A., Strasser, R., Schmidt, G., Illig, T., Tauscher, M., Thol, F., Heuser, M., Steinemann, D., Davenport, C., Schlegelberger, B., Behrens, Y. L., and Göhring, G. (2021) Unbalanced translocation der(5;17) resulting in a TP53 loss as recurrent aberration in myelodysplastic syndrome and acute myeloid leukemia with complex karyotype, Genes Chromosomes Cancer 60, 452–457.
-
1.Liu, Z., Pavao, A., Xu, Z., Escalera, S., Ferreira, F., Guyon, I., Hong, S., Hutter, F., Ji, R., Junior, J. J., Li, G., Lindauer, M., Luo, Z., Madadi, M., Nierhoff, T., Niu, K., Pan, C., Stoll, D., Treguer, S., Jin, W., Wang, P., Wu, C., Youcheng, X., Zela, A., and Zhang, Y. (2021) Winning solutions and post-challenge analyses of the ChaLearn AutoDL challenge 2019, IEEE Transactions on Pattern Analysis and Machine Intelligence 1–18.
2020
-
1.Scheffner, I., Gietzelt, M., Abeling, T., Marschollek, M., and Gwinner, W. (2020) Patient Survival After Kidney Transplantation: Important Role of Graft-sustaining Factors as Determined by Predictive Modeling Using Random Survival Forest Analysis, Transplantation 104, 1095–1107.
-
1.Wallat, J., Singh, J., and Anand, A. (2020) BERTnesia: Investigating the capture and forgetting of knowledge in BERT. In Proceedings of the Third BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP, pp. 174–183, Association for Computational Linguistics, Online.
-
1.Liu, Z., Pavao, A., Xu, Z., Escalera, S., Ferreira, F., Guyon, I., Hong, S., Hutter, F., Ji, R., Jacques, J., Li, G., Lindauer, M., Luo, Z., Madadi, M., Nierhoff, T., Niu, K., Pan, C., Stoll, D., Treguer, S., Wang, J., Wang, P., Wu, C., Xiong, Y., Zela, A., and Zhang, Y. (2020) Winning solutions and post-challenge analyses of the ChaLearn AutoDL challenge 2019. In HAL.