Machine learning in medicine must be comprehensible and replicable
Scientists at Technische Universität Berlin and Charité are applying machine learning techniques in imaging diagnostics
Sep 13, 2019
The partnership between Klaus-Robert Müller, a professor of machine learning at Technische Universität Berlin, and Frederick Klauschen, a professor of molecular pathology at Charité – Universitätsmedizin Berlin, began with a rejection: Müller was not prepared to advise a researcher working for Klauschen on a topic at the interface of machine learning and medicine. His reason was that fundamental issues of this kind could only be resolved in close cooperation. Müller, who has a strong interest in cancer research, suggested instead that a collaborative project be set up. The two scientists have now been working closely together for more than six years.
“Innovative projects that involve more than one scientific discipline are often marathon projects,” says Müller. “The formulation of the common research question alone – such that it can be solved by the technologies in question – requires a long-term cooperative process that has to be agreed in fine detail. Germany lacks the support formats for this.”
A framework for tackling such interdisciplinary research questions is now in place: the Berlin Center for Machine Learning (BZML, Berliner Zentrum für Maschinelles Lernen), set up in August 2018. It is led by Müller, with Klauschen among its participants. Machine learning is a sub-field of artificial intelligence. Simply put, it comprises methods that recognize patterns in known data and then use those patterns to assess unknown data.
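To make that idea concrete, a minimal sketch might look like the following: a generic classifier is fitted to labelled ("known") data and then applied to data it has not seen before. The dataset and model here are purely illustrative and have nothing to do with the BZML systems themselves.

```python
# Minimal sketch: learn patterns from known (labelled) data,
# then assess previously unseen data. Illustrative only.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Toy stand-in for "known data": feature vectors with known labels.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_known, X_unknown, y_known, y_unknown = train_test_split(
    X, y, test_size=0.2, random_state=0
)

model = RandomForestClassifier(random_state=0)
model.fit(X_known, y_known)              # recognise patterns in known data

predictions = model.predict(X_unknown)   # assess previously unseen data
print("accuracy on held-out data:", (predictions == y_unknown).mean())
```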
The aim of BZML is to draw on the synergies afforded by the extensive Berlin scientific landscape and to pool together basic research in the field of machine learning. The center unites scientists at Freie Universität Berlin, Humboldt-Universität zu Berlin, Technische Universität Berlin, Charité – Universitätsmedizin Berlin, the University of Potsdam, and numerous non-university research institutions.
One of its core aims is to open up new scientific and technical applications of machine learning in other fields, including medicine. A joint project by Müller and Klauschen concerns image analysis in cancer diagnostics. As a rule, this involves examining wafer-thin tissue sections under the microscope. From these images, the doctor can assess what type of tumor is present. “Pathologists are able to make a qualitative judgement, i.e., whether a tumor is malignant or not,” says Klauschen. “But a quantitative analysis or description that gives a more detailed picture is considerably more of a challenge for the human eye. We would like to know, for instance, how quickly the tumor is growing, or how many immune cells (lymphocytes) have entered the tumor.”
“We know that in some carcinomas, such as breast cancer, the number of lymphocytes present in the tumor tissue affects the prognosis. We are also discussing whether that number can have a predictive value – in other words, allow us to say which therapy is working and how well,” Klauschen says. “For this reason, the number is particularly interesting for research.” Up to now, the pathologist has had two options: either estimate the number of lymphocytes or count them manually. The first method is imprecise; the second is far too cumbersome for routine clinical practice.
Klauschen and Müller began to consider machine learning as an answer to these questions some six years ago. A process now exists that not only determines the exact number of lymphocytes but also presents the result in visual form, so that doctors can verify it. “The ability to explain results is indispensable in medicine. For this alone, we had to set up a massive database of medical images from which the system can learn,” Klauschen explains.
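The article does not spell out the pipeline, but the counting step itself can be sketched in a few lines, assuming the system produces a per-pixel lymphocyte probability map; the function, threshold, and toy data below are illustrative stand-ins, not the published method.

```python
# Hedged sketch: counting cells from a model's per-pixel output.
# Illustrative only; the real pipeline is more involved.
import numpy as np
from scipy import ndimage

def count_cells(probability_map: np.ndarray, threshold: float = 0.5) -> int:
    """Count connected regions in a per-pixel lymphocyte probability map."""
    binary_mask = probability_map > threshold    # keep confident pixels
    _, num_cells = ndimage.label(binary_mask)    # connected-component count
    return num_cells

# Toy probability map containing two bright "cells" (synthetic data).
prob = np.zeros((64, 64))
prob[10:14, 10:14] = 0.9
prob[40:45, 50:55] = 0.8
print(count_cells(prob))  # -> 2
```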
“Currently we are working with Berlin Health Innovations on developing a prototype for clinical applications.” Frederick Klauschen
It is not enough to solve the mathematical problem – determining the number of specific cells from imaging data. The result must also be replicable by the people who use it. “Here, this happens thanks to Layer-wise Relevance Propagation (LRP), a process we have developed in the project,” says Müller. It offers a view into the so-called “black box” of machine learning, presenting the image points on which the algorithm bases its decisions in a so-called heat map.
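At a high level, LRP takes the network's output score and redistributes it backwards, layer by layer, onto the inputs in proportion to how much each contributed; rendered over an image, those per-pixel relevance scores form the heat map. A minimal NumPy sketch of the basic epsilon rule for a tiny fully connected ReLU network is shown below; the architecture and numbers are illustrative and not the published model.

```python
# Minimal sketch of the LRP epsilon rule on a toy dense ReLU network.
# Illustrative only: in practice LRP is applied to trained (convolutional) models.
import numpy as np

rng = np.random.default_rng(0)
weights = [rng.normal(size=(8, 6)), rng.normal(size=(6, 1))]  # toy 8 -> 6 -> 1 net

def forward(x):
    """Forward pass, keeping the activations of every layer."""
    activations = [x]
    for i, W in enumerate(weights):
        z = activations[-1] @ W
        if i < len(weights) - 1:
            z = np.maximum(z, 0.0)  # ReLU on hidden layers
        activations.append(z)
    return activations

def lrp(x, eps=1e-6):
    """Redistribute the output score back onto the inputs (epsilon rule)."""
    activations = forward(x)
    relevance = activations[-1]  # start from the network's output score
    for W, a in zip(reversed(weights), reversed(activations[:-1])):
        z = a @ W
        z = z + eps * np.where(z >= 0, 1.0, -1.0)  # stabilised denominators
        s = relevance / z                          # relevance share per neuron
        relevance = a * (s @ W.T)                  # redistribute to layer below
    return relevance                               # per-input relevance scores

x = rng.normal(size=(1, 8))
print(lrp(x))  # relevance of each input feature; for images, this is the heat map
```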
Recently, the system has been perfected to the stage where a heat map can be produced in just a few seconds. “Currently we are working with Berlin Health Innovations – the joint technology transfer unit of the Berlin Institute of Health (BIH) and Charité – on developing a prototype for clinical applications,” says Klauschen.
Close cooperation between the various Berlin institutions – in joint projects like BZML or the Berlin Big Data Center (BBDC), which is likewise supported by Technische Universität Berlin in cooperation with Charité and other partners in Berlin and beyond – is what makes such projects possible in the first place, Müller believes. The starting point is a scientific question, which must be honed to the point where the participating technologies can resolve it. Members of the BZML can then attempt to develop the appropriate new algorithms. “At the first stage, these do not meet any real-time requirements, but simply solve the mathematical problem. Such methods can be scaled up in the BBDC. In order to introduce them into medical practice, we need institutions like Berlin Health Innovations,” Müller states.
“The difficulty here – or one of them – is that we are dealing with very heterogeneous data and diverse information content.” Klaus-Robert Müller
Within the framework of the new Berlin Center for Machine Learning, the two scientists are already working on the further development of the technology. “Our next goal is to supplement image-based diagnostics with molecular data, for instance mutation profiles or the specific protein composition of a sample,” says Klauschen. “The difficulty here – or one of them – is that we are dealing with very heterogeneous data and diverse information content; not only that, but the spatial resolution and the underlying statistics also vary,” Müller says of the challenge they face.
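The article does not say how the two data types will be combined. One common, and here purely illustrative, approach to such heterogeneous inputs is to normalise each modality on its own scale and then fuse the feature vectors before feeding them to a single model; the data, features, and model below are assumptions for the sake of the sketch, not the BZML method.

```python
# Purely illustrative sketch: fusing heterogeneous modalities by normalising
# each one separately and concatenating the features. Not the BZML approach.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
n_samples = 200
image_features = rng.normal(loc=5.0, scale=3.0, size=(n_samples, 32))          # e.g. tissue-image statistics
molecular_features = rng.poisson(lam=2.0, size=(n_samples, 16)).astype(float)  # e.g. mutation counts
labels = rng.integers(0, 2, size=n_samples)                                    # toy outcome labels

# Bring each modality onto a comparable scale before fusing.
fused = np.hstack([
    StandardScaler().fit_transform(image_features),
    StandardScaler().fit_transform(molecular_features),
])

model = LogisticRegression(max_iter=1000).fit(fused, labels)
print("training accuracy:", model.score(fused, labels))
```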
At the same time, the topic is a perfect fit for BZML: the center is concerned with gathering and combining information from multi-modal data of divergent structure and from various sources. “Even if the subject matter differs from one area to another, the questions for machine learning are similar,” says Müller. The BZML aims to achieve fundamentally new research contributions in the interdisciplinary fields of biomedicine, communication, and the digital humanities.
The Berlin Center for Machine Learning
The German Federal Ministry of Education and Research (BMBF) has subsidized the creation of the Berlin Center for Machine Learning (BZML) to the tune of 8.5 million euros, beginning in August 2018 and continuing for four years. The interdisciplinary forum BZML is under the leadership of Klaus-Robert Müller, a professor of machine learning at Technische Universität Berlin. BZML has four focus areas:
- to continue to advance the theoretical and algorithmic bases of machine learning
- to identify new scientific and technical applications for machine learning
- to realize new research findings in the interdisciplinary areas of biomedicine, communication, and digital humanities
- to design machine learning in a manner that is comprehensible and replicable.