Prof. Dr. Ulf Leser
Profile
Research topics (43)
Bioinformatische Beschreibung von Ähnlichkeiten als Grundlage für verbesserte Empfehlungs-Algorithmen in der Präzisionsonkologie
Funder: Bundesministerium für Gesundheit · Period: 04/2023 - 01/2027 · Project lead: Prof. Dr. Ulf Leser
CellFinder - Informationsextraktion
Funder: DFG Sachbeihilfe · Period: 09/2010 - 12/2012 · Project lead: Prof. Dr. Ulf Leser
ColoNET: A Systems Biology Approach for Integrating Molecular Diagnostics and Targeted Therapy in Colorectal Cancer
Funder: Bundesministerium für Forschung, Technologie und Raumfahrt · Period: 03/2009 - 02/2012 · Project lead: Prof. Dr. Ulf Leser
easyTEM: Entwicklung von ressourceneffizienter Transmissionselektronenmikroskopie zur Demokratisierung ihres Einsatzes in der Materialforschung
Funder: DFG Sachbeihilfe · Period: 04/2026 - 03/2029 · Project lead: Prof. Dr. Thomas Kosch
Entwicklung von ressourceneffizienter Transmissionselektronenmikroskopie zur Demokratisierung ihres Einsatzes in der Materialforschung
Funder: DFG Sachbeihilfe · Period: 04/2026 - 03/2029 · Project lead: Prof. Christoph T. Koch, PhD, Prof. Dr. Thomas Kosch, Prof. Dr. Ulf Leser
EU: Scalable, Secure Storage and Analysis of Biobank Data (BioBankCloud)
Period: 12/2012 - 11/2015 · Project lead: Prof. Dr. Ulf Leser
Exist-Gründerstipendium: Webpgr
Funder: Bundesministerium für Wirtschaft und Energie · Period: 10/2013 - 09/2014 · Project lead: Prof. Dr. Ulf Leser
EXIST-Gründungsstipendium: HAUT_APP
Funder: BMWE: EXIST · Period: 01/2020 - 12/2020 · Project lead: Prof. Dr. Ulf Leser
EX: KPI-Wall
Funder: Bundesministerium für Wirtschaft und Energie · Period: 07/2011 - 06/2012 · Project lead: Prof. Dr. Ulf Leser
FG 1306/1: Stratosphere - Information Management on the Cloud (TP D)
Funder: DFG Forschungsgruppe · Period: 09/2010 - 10/2015 · Project lead: Prof. Dr. Ulf Leser
FOR 1306/1: Stratosphere II – Information Management on the Cloud (TP D)
Funder: DFG Forschungsgruppe · Period: 01/2014 - 12/2017 · Project lead: Prof. Dr. Ulf Leser
FOR 2841/1: Ein umfassendes Verzeichnis regulatorischer Elemente mit Relevanz für menschliche Krankheiten und ihrer Variationen (TP 05)
Funder: DFG Forschungsgruppe · Period: 03/2020 - 09/2023 · Project lead: Prof. Dr. Ulf Leser
FOR 2841/2: Jenseits des Exoms - Auffindung, Analyse und Vorhersage des Krankheitspotenzials nichtkodierender DNA Varianten (TP 05)
Funder: DFG Forschungsgruppe · Period: 07/2023 - 05/2027 · Project lead: Prof. Dr. Ulf Leser
GRK 1651/2: Service-orientierte Architekturen zur Integration Software-gestützter Prozesse am Beispiel des Gesundheitswesens und der Medizintechnik (SOAMED)
Funder: DFG Graduiertenkolleg · Period: 10/2014 - 12/2019 · Project lead: Prof. Dr. Ulf Leser
GRK 2424/1: Computermethoden für personalisierte Therapien in der Onkologie
Funder: DFG Graduiertenkolleg · Period: 06/2019 - 12/2024 · Project lead: Prof. Dr. Ulf Leser
GRK 2424: Computermethoden für personalisierte Therapien in der Onkologie
Funder: DFG Graduiertenkolleg · Period: 06/2019 - 05/2028 · Project lead: Prof. Dr. rer. nat. Nils Blüthgen
GSC Berlin School of Integrative Oncology
Period: 11/2014 - 10/2017 · Project lead: Prof. Dr. Ulf Leser
Helmholtz-Einstein International Berlin Research School in Data Science (HEIBRiDS)
Funder: Helmholtz-Gemeinschaft · Period: 04/2018 - 12/2024 · Project lead: Prof. Dr. Ulf Leser
Helmholtz-Einstein International Berlin Research School in Data Science (HEIBRiDS)
Funder: Helmholtz-Gemeinschaft · Period: 04/2018 - 12/2024 · Project lead: Prof. Johann-Christoph Freytag Ph.D., Prof. Dr. Ulf Leser, Prof. Dr. Björn Scheuermann
IVD-EUMeD: Expertensystem zur evidenzbasierten Unterstützung von Medizinern bei der Diagnosestellung
Funder: Bundesministerium für Wirtschaft und Energie · Period: 07/2012 - 06/2015 · Project lead: Prof. Dr. Ulf Leser
Komplexe Datenbasen TP 3 II
Period: 01/2007 - 12/2007 · Project lead: Prof. Dr. Ulf Leser
Komplexe Datenbasen zur Rekonstruktion und Simulation evolutionärer Prozesse (Teilprojekt 3)
Funder: Land Berlin - Andere · Period: 07/2003 - 12/2007 · Project lead: Prof. Dr. Ulf Leser
Lernen von Ähnlichkeitsfunktionen für Tabellen
Funder: DFG Sachbeihilfe · Period: 10/2018 - 08/2021 · Project lead: Prof. Dr. Ulf Leser
MAPTor-Net - Personalisierte Therapie pankreatischer neuroendokriner Tumoren basierend auf mathematischen Modellen des MAPK-mTOR Netzwerks
Funder: Bundesministerium für Forschung, Technologie und Raumfahrt · Period: 03/2015 - 12/2018 · Project lead: Prof. Dr. Ulf Leser
Modellierung und Proteom-Signaturen Therapie-relevanter Signalwege in Tumoren
Funder: Bundesministerium für Forschung, Technologie und Raumfahrt · Period: 02/2012 - 07/2015 · Project lead: Prof. Dr. Ulf Leser
OncoPath: Dissecting and Modelling Vulnerabilities of Oncogenic Pathways and Metabolism in Solid Cancers
Funder: Bundesministerium für Forschung, Technologie und Raumfahrt · Period: 01/2013 - 06/2016 · Project lead: Prof. Dr. Ulf Leser
Personalisierte Onkologie durch semantische Datenintegration
Funder: Bundesministerium für Forschung, Technologie und Raumfahrt · Period: 03/2016 - 02/2021 · Project lead: Prof. Dr. Ulf Leser
Procope: Sharing and Optimizing Scientific Workflows
Funder: DAAD · Period: 01/2013 - 12/2014 · Project lead: Prof. Dr. Ulf Leser
SeneSys (for iiLymTx) - Seneszenz-basierte systemmedizinische Stratifikation zur individualisierten Lymphomtherapie - Teilprojekt C
Funder: Bundesministerium für Forschung, Technologie und Raumfahrt · Period: 09/2019 - 03/2023 · Project lead: Prof. Dr. Ulf Leser
SFB 1404/1: Adaption von Datenanalyseworkflows der Genomforschung auf unterschiedliche Datenzugriffsmuster (TP A02)
Funder: DFG Sonderforschungsbereich · Period: 07/2020 - 06/2024 · Project lead: Prof. Dr. Ulf Leser
SFB 1404/1: Adaptive, verteilte und skalierbare Analyse massiver Satellitendaten (TP B05)
Funder: DFG Sonderforschungsbereich · Period: 07/2020 - 06/2024 · Project lead: Prof. Dr. Ulf Leser, Prof. Dr. Patrick Hostert
SFB 1404/1: FONDA – Grundlagen von Workflows für die Analyse großer naturwissenschaftlicher Daten
Funder: DFG Sonderforschungsbereich · Period: 07/2020 - 06/2024 · Project lead: Prof. Dr. Ulf Leser
SFB 1404/1: Testsysteme und Repositorien (S01)
Funder: DFG Sonderforschungsbereich · Period: 07/2020 - 06/2024 · Project lead: Prof. Dr. Ulf Leser, Malte Dreyer
SFB 1404/2: Energie-Optimierung von Workflows in der Bioinformatik (TP A02)
Funder: DFG Sonderforschungsbereich · Period: 07/2024 - 06/2028 · Project lead: Prof. Dr. Ulf Leser
SFB 1404/2: FONDA – Grundlagen von Workflows für die Analyse großer naturwissenschaftlicher Daten
Funder: DFG Sonderforschungsbereich · Period: 07/2024 - 06/2028 · Project lead: Prof. Dr. Ulf Leser
SFB 1404/2: Testsysteme und Repositorien (TP S01)
Funder: DFG Sonderforschungsbereich · Period: 07/2024 - 06/2028 · Project lead: Prof. Dr. Ulf Leser, Malte Dreyer
SFB 1404/2: Transparente Multi-Center Datenanalyseworkflows für die Erdbeobachtung (TP B05)
Funder: DFG Sonderforschungsbereich · Period: 07/2024 - 06/2028 · Project lead: Prof. Dr. Patrick Hostert, Prof. Dr. Ulf Leser
SFB 1404: FONDA – Grundlagen von Workflows für die Analyse großer naturwissenschaftlicher Daten
Funder: DFG Sonderforschungsbereich · Period: 07/2020 - 06/2028 · Project lead: Prof. Dr. Ulf Leser
SFB/TR 54 I: Data Management (TP INF)
Funder: DFG Sonderforschungsbereich · Period: 01/2009 - 12/2012 · Project lead: Prof. Dr. Ulf Leser
T-Sys: Systems biology of T helper cells
Funder: Bundesministerium für Forschung, Technologie und Raumfahrt · Period: 01/2013 - 05/2016 · Project lead: Prof. Dr. Ulf Leser
Umfassende Datenintegration zur Verbesserung onkologischer Therapien
Funder: Bundesministerium für Forschung, Technologie und Raumfahrt · Period: 03/2016 - 12/2021 · Project lead: Prof. Dr. Ulf Leser
Virtual Liver, Teilprojekt F2
Funder: Bundesministerium für Forschung, Technologie und Raumfahrt · Period: 04/2010 - 03/2015 · Project lead: Prof. Dr. Ulf Leser
WebMonitor
Funder: Bundesministerium für Wirtschaft und Energie · Period: 08/2010 - 07/2011 · Project lead: Prof. Dr. Ulf Leser
Possible industry partners (10)
As of: 26.4.2026, 19:48:44 (Top-K=20, Min-Cosine=0.4)
- 45 hits · 62.7%
  - Zuwendung im Rahmen des Programms „exist – Existenzgründungen aus der Wissenschaft“ aus dem Bundeshaushalt, Einzelplan 09, Kapitel 02, Titel 68607, Haushaltsjahr 2026, sowie aus Mitteln des Europäischen Strukturfonds (hier Europäischer Sozialfonds Plus – ESF Plus) Förderperiode 2021-2027 – Kofinanzierung für das Vorhaben: „exist Women“ · T · 62.7%
- 102 hits · 59.2%
  - EU: Simulation in Multiscale Physical and Biological Systems (STIMULATE) · P · 59.2%
  - EU: Bottom-Up Generation of atomicalLy Precise syntheTIc 2D MATerials for High Performance in Energy and Electronic Applications – A Multi-Site Innovative Training Action (ULTIMATE) · P · 49.9%
- NVIDIA GmbH · PT · 78 hits · 59.2%
  - EU: Simulation in Multiscale Physical and Biological Systems (STIMULATE) · P · 59.2%
- 76 hits · 59.2%
  - EU: Simulation in Multiscale Physical and Biological Systems (STIMULATE) · P · 59.2%
- 22 hits · 59.0%
  - Züchterische Erschließung und Nutzbarmachung pflanzengenetischer Ressourcen durch on-farm/insitu-Erhaltung und Positionierung von Produkten im Bio-Lebensmitteleinzelhandel · P · 59.0%
- 99 hits · 58.9%
  - WayIn – Der Inklusionswegweiser für Arbeitgeber: Technische Entwicklung und wissenschaftliche Begleitanalyse · P · 58.9%
- 97 hits · 58.9%
  - WayIn – Der Inklusionswegweiser für Arbeitgeber: Technische Entwicklung und wissenschaftliche Begleitanalyse · P · 58.9%
- 64 hits · 58.8%
  - Systematic Models for Biological Systems Engineering Training Network · P · 58.8%
- Protatuans-Etaireia Ereynas Viotechologias Monoprosopi Etaireia Periorisments Eythinis · PT · 62 hits · 58.8%
  - Systematic Models for Biological Systems Engineering Training Network · P · 58.8%
- 65 hits · 58.8%
  - Systematic Models for Biological Systems Engineering Training Network · P · 58.8%
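The parameters stated for this partner list (Top-K=20, Min-Cosine=0.4) suggest a ranking by cosine similarity between embedding vectors, with a cutoff on both the score and the number of results. The page does not describe the actual pipeline, so the following is only a minimal sketch under that assumption; the function names `cosine` and `top_matches`, the order of filtering and ranking, and the toy vectors are all hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_matches(
    query: Sequence[float],
    candidates: Dict[str, Sequence[float]],
    top_k: int = 20,
    min_cosine: float = 0.4,
) -> List[Tuple[str, float]]:
    """Return at most top_k candidates whose similarity to query is >= min_cosine.

    Assumption: the threshold is applied first, then the survivors are ranked
    by descending similarity. The real system may do this differently.
    """
    scored = [(name, cosine(query, vec)) for name, vec in candidates.items()]
    kept = [(name, score) for name, score in scored if score >= min_cosine]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:top_k]


if __name__ == "__main__":
    # Toy 2-dimensional vectors standing in for real embeddings.
    profile = [1.0, 0.0]
    projects = {
        "a": [1.0, 0.0],
        "b": [1.0, 1.0],
        "c": [0.0, 1.0],
        "d": [-1.0, 0.0],
    }
    for name, score in top_matches(profile, projects, top_k=2):
        print(f"{name}: {score:.1%}")
```

With these toy vectors, only candidates "a" and "b" pass the 0.4 threshold; "c" is orthogonal to the profile vector and "d" points in the opposite direction, so both are dropped before ranking.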
Publications (25)
Top 25 by citations. Source: OpenAlex (embedded with BAAI/bge-m3 for matching).
Bioinformatics · 561 citations · DOI
habibima@informatik.hu-berlin.de.
Lecture notes in computer science · 498 citations · DOI
The VLDB Journal · 431 citations · DOI
Bioinformatics · 270 citations · DOI
ChemSpot is freely available at: http://www.informatik.hu-berlin.de/wbi/resources.
JAMA Network Open · 264 citations · DOI
In this diagnostic study, treatment options of LLMs in precision oncology did not reach the quality and credibility of human experts; however, they generated helpful ideas that might have complemented established procedures. Considering technological progress, LLMs could play an increasingly important role in assisting with screening and selecting relevant biomedical literature to support evidence-based, personalized treatment decisions.
236 citations · DOI
Many applications work with graph-structured data. As graphs grow in size, indexing becomes essential to ensure sufficient query performance. We present the GRIPP index structure (GRaph Indexing based on Pre- and Postorder numbering) for answering reachability queries in graphs.
edoc Publication server (Humboldt University of Berlin) · 187 citations · DOI
A Comprehensive Benchmark of Kernel Methods to Extract Protein–Protein Interactions from Literature
2010 · PLoS Computational Biology · 182 citations · DOI
The most important way of conveying new findings in biomedical research is scientific publication. Extraction of protein-protein interactions (PPIs) reported in scientific publications is one of the core topics of text mining in the life sciences. Recently, a new class of such methods has been proposed - convolution kernels that identify PPIs using deep parses of sentences. However, comparing published results of different PPI extraction methods is impossible due to the use of different evaluation corpora, different evaluation metrics, different tuning procedures, etc. In this paper, we study whether the reported performance metrics are robust across different corpora and learning settings and whether the use of deep parsing actually leads to an increase in extraction quality. Our ultimate goal is to identify the one method that performs best in real-life scenarios, where information extraction is performed on unseen text and not on specifically prepared evaluation data. We performed a comprehensive benchmarking of nine different methods for PPI extraction that use convolution kernels on rich linguistic information. Methods were evaluated on five different public corpora using cross-validation, cross-learning, and cross-corpus evaluation. Our study confirms that kernels using dependency trees generally outperform kernels based on syntax trees. However, our study also shows that only the best kernel methods can compete with a simple rule-based approach when the evaluation prevents information leakage between training and test corpora. Our results further reveal that the F-score of many approaches drops significantly if no corpus-specific parameter optimization is applied and that methods reaching a good AUC score often perform much worse in terms of F-score. We conclude that for most kernels no sensible estimation of PPI extraction performance on new text is possible, given the current heterogeneity in evaluation data. 
Nevertheless, our study shows that three kernels are clearly superior to the other methods.
Briefings in Bioinformatics · 157 citations · DOI
The recognition of biomedical concepts in natural text (named entity recognition, NER) is a key technology for automatic or semi-automatic analysis of textual resources. Precise NER tools are a prerequisite for many applications working on text, such as information retrieval, information extraction or document classification. Over the past years, the problem has achieved considerable attention in the bioinformatics community and experience has shown that NER in the life sciences is a rather difficult problem. Several systems and algorithms have been devised and implemented. In this paper, the problems and resources in NER research are described, the principal algorithms underlying most systems sketched, and the current state-of-the-art in the field surveyed.
PLoS Genetics · 155 citations · DOI
Circadian rhythms are essential to the temporal regulation of molecular processes in living systems and as such to life itself. Deregulation of these rhythms leads to failures in biological processes and eventually to the manifestation of pathological phenotypes including cancer. To address the questions as to what are the elicitors of a disrupted clock in cancer, we applied a systems biology approach to correlate experimental, bioinformatics and modelling data from several cell line models for colorectal and skin cancer. We found strong and weak circadian oscillators within the same type of cancer and identified a set of genes, which allows the discrimination between the two oscillator-types. Among those genes are IFNGR2, PITX2, RFWD2, PPARγ, LOXL2, Rab6 and SPARC, all involved in cancer-related pathways. Using a bioinformatics approach, we extended the core-clock network and present its interconnection to the discriminative set of genes. Interestingly, such gene signatures link the clock to oncogenic pathways like the RAS/MAPK pathway. To investigate the potential impact of the RAS/MAPK pathway - a major driver of colorectal carcinogenesis - on the circadian clock, we used a computational model which predicted that perturbation of BMAL1-mediated transcription can generate the circadian phenotypes similar to those observed in metastatic cell lines. Using an inducible RAS expression system, we show that overexpression of RAS disrupts the circadian clock and leads to an increase of the circadian period while RAS inhibition causes a shortening of period length, as predicted by our mathematical simulations. Together, our data demonstrate that perturbations induced by a single oncogene are sufficient to deregulate the mammalian circadian clock.
Bioinformatics · 139 citations · DOI
http://alibaba.informatik.hu-berlin.de/
Information Systems · 134 citations · DOI
Nature Communications · 130 citations · DOI
Genetic heterogeneity between and within tumours is a major factor determining cancer progression and therapy response. Here we examined DNA sequence and DNA copy-number heterogeneity in colorectal cancer (CRC) by targeted high-depth sequencing of 100 most frequently altered genes. In 97 samples, with primary tumours and matched metastases from 27 patients, we observe inter-tumour concordance for coding mutations; in contrast, gene copy numbers are highly discordant between primary tumours and metastases as validated by fluorescent in situ hybridization. To further investigate intra-tumour heterogeneity, we dissected a single tumour into 68 spatially defined samples and sequenced them separately. We identify evenly distributed coding mutations in APC and TP53 in all tumour areas, yet highly variable gene copy numbers in numerous genes. 3D morpho-molecular reconstruction reveals two clusters with divergent copy number aberrations along the proximal-distal axis indicating that DNA copy number variations are a major source of tumour heterogeneity in CRC.
Geophysical Journal International · 110 citations · DOI
SUMMARY Precise real time estimates of earthquake magnitude and location are essential for early warning and rapid response. While recently multiple deep learning approaches for fast assessment of earthquakes have been proposed, they usually rely on either seismic records from a single station or from a fixed set of seismic stations. Here we introduce a new model for real-time magnitude and location estimation using the attention based transformer networks. Our approach incorporates waveforms from a dynamically varying set of stations and outperforms deep learning baselines in both magnitude and location estimation performance. Furthermore, it outperforms a classical magnitude estimation algorithm considerably and shows promising performance in comparison to a classical localization algorithm. Our model is applicable to real-time prediction and provides realistic uncertainty estimates based on probabilistic inference. In this work, we furthermore conduct a comprehensive study of the requirements on training data, the training procedures and the typical failure modes. Using three diverse and large scale data sets, we conduct targeted experiments and a qualitative error analysis. Our analysis gives several key insights. First, we can precisely pinpoint the effect of large training data; for example, a four times larger training set reduces average errors for both magnitude and location prediction by more than half, and reduces the required time for real time assessment by a factor of four. Secondly, the basic model systematically underestimates large magnitude events. This issue can be mitigated, and in some cases completely resolved, by incorporating events from other regions into the training through transfer learning. Thirdly, location estimation is highly precise in areas with sufficient training data, but is strongly degraded for events outside the training distribution, sometimes producing massive outliers. 
Our analysis suggests that these characteristics are not only present for our model, but for most deep learning models for fast assessment published so far. They result from the black box modeling and their mitigation will likely require imposing physics derived constraints on the neural network. These characteristics need to be taken into consideration for practical applications.
Bioinformatics · 108 citations · DOI
The code is available on request from the author.
Geophysical Journal International · 100 citations · DOI
SUMMARY Earthquakes are major hazards to humans, buildings and infrastructure. Early warning methods aim to provide advance note of incoming strong shaking to enable preventive action and mitigate seismic risk. Their usefulness depends on accuracy, the relation between true, missed and false alerts and timeliness, the time between a warning and the arrival of strong shaking. Current approaches suffer from apparent aleatoric uncertainties due to simplified modelling or short warning times. Here we propose a novel early warning method, the deep-learning based transformer earthquake alerting model (TEAM), to mitigate these limitations. TEAM analyses raw, strong motion waveforms of an arbitrary number of stations at arbitrary locations in real-time, making it easily adaptable to changing seismic networks and warning targets. We evaluate TEAM on two regions with high seismic hazard, Japan and Italy, that are complementary in their seismicity. On both data sets TEAM outperforms existing early warning methods considerably, offering accurate and timely warnings. Using domain adaptation, TEAM even provides reliable alerts for events larger than any in the training data, a property of highest importance as records from very large events are rare in many regions.
arXiv (Cornell University) · 92 citations
Multivariate time series (MTS) arise when multiple interconnected sensors record data over time. Dealing with this high-dimensional data is challenging for every classifier for at least two aspects: First, an MTS is not only characterized by individual feature values, but also by the interplay of features in different dimensions. Second, this typically adds large amounts of irrelevant data and noise. We present our novel MTS classifier WEASEL+MUSE which addresses both challenges. WEASEL+MUSE builds a multivariate feature vector, first using a sliding-window approach applied to each dimension of the MTS, then extracts discrete features per window and dimension. The feature vector is subsequently fed through feature selection, removing non-discriminative features, and analysed by a machine learning classifier. The novelty of WEASEL+MUSE lies in its specific way of extracting and filtering multivariate features from MTS by encoding context information into each feature. Still the resulting feature set is small, yet very discriminative and useful for MTS classification. Based on a popular benchmark of 20 MTS datasets, we found that WEASEL+MUSE is among the most accurate classifiers, when compared to the state of the art. The outstanding robustness of WEASEL+MUSE is further confirmed based on motion gesture recognition data, where it out-of-the-box achieved similar accuracies as domain-specific methods.
88 citations
Briefings in Bioinformatics · 82 citations · DOI
Differential network analysis (DiNA) denotes a recent class of network-based Bioinformatics algorithms which focus on the differences in network topologies between two states of a cell, such as healthy and disease, to identify key players in the discriminating biological processes. In contrast to conventional differential analysis, DiNA identifies changes in the interplay between molecules, rather than changes in single molecules. This ability is especially important in cases where effectors are changed, e.g. mutated, but their expression is not. A number of different DiNA approaches have been proposed, yet a comparative assessment of their performance in different settings is still lacking. In this paper, we evaluate 10 different DiNA algorithms regarding their ability to recover genetic key players from transcriptome data. We construct high-quality regulatory networks and enrich them with co-expression data from four different types of cancer. Next, we assess the results of applying DiNA algorithms on these data sets using a gold standard list (GSL). We find that local DiNA algorithms are generally superior to global algorithms, and that all DiNA algorithms outperform conventional differential expression analysis. We also assess the ability of DiNA methods to exploit additional knowledge in the underlying cellular networks. To this end, we enrich the cancer-type specific networks with known regulatory miRNAs and compare the algorithms performance in networks with and without miRNA. We find that including miRNAs consistently and considerably improves the performance of almost all tested algorithms. Our results underline the advantages of comprehensive cell models for the analysis of -omics data.
BMC Bioinformatics · 82 citations · DOI
The IAT was an informative exercise that advanced the dialog between curators and developers and increased the appreciation of challenges faced by each group. A major conclusion was that the intended users should be actively involved in every phase of software development, and this will be strongly encouraged in future tasks. The IAT Task provides the first steps toward the definition of metrics and functional requirements that are necessary for designing a formal evaluation of interactive curation systems in the BioCreative IV challenge.
Nucleic Acids Research · 75 citations · DOI
Research results are primarily published in scientific literature and curation efforts cannot keep up with the rapid growth of published literature. The plethora of knowledge remains hidden in large text repositories like MEDLINE. Consequently, life scientists have to spend a great amount of time searching for specific information. The enormous ambiguity among most names of biomedical objects such as genes, chemicals and diseases often produces too large and unspecific search results. We present GeneView, a semantic search engine for biomedical knowledge. GeneView is built upon a comprehensively annotated version of PubMed abstracts and openly available PubMed Central full texts. This semi-structured representation of biomedical texts enables a number of features extending classical search engines. For instance, users may search for entities using unique database identifiers or they may rank documents by the number of specific mentions they contain. Annotation is performed by a multitude of state-of-the-art text-mining tools for recognizing mentions from 10 entity classes and for identifying protein-protein interactions. GeneView currently contains annotations for >194 million entities from 10 classes for ∼21 million citations with 271,000 full text bodies. GeneView can be searched at http://bc3.informatik.hu-berlin.de/.
Proceedings of the VLDB Endowment · 74 citations · DOI
Much data in the Web is hidden behind Web query interfaces. In most cases the only means to "surface" the content of a Web database is by formulating complex queries on such interfaces. Applications such as Deep Web crawling and Web database integration require an automatic usage of these interfaces. Therefore, an important problem to be addressed is the automatic extraction of query interfaces into an appropriate model. We hypothesize the existence of a set of domain-independent "commonsense design rules" that guides the creation of Web query interfaces. These rules transform query interfaces into schema trees. In this paper we describe a Web query interface extraction algorithm, which combines HTML tokens and the geometric layout of these tokens within a Web page. Tokens are classified into several classes out of which the most significant ones are text tokens and field tokens. A tree structure is derived for text tokens using their geometric layout. Another tree structure is derived for the field tokens. The hierarchical representation of a query interface is obtained by iteratively merging these two trees. Thus, we convert the extraction problem into an integration problem. Our experiments show the promise of our algorithm: it outperforms the previous approaches on extracting query interfaces on about 6.5% in accuracy as evaluated over three corpora with more than 500 Deep Web interfaces from 15 different domains.
74 citations
We are currently witnessing the emergence of a new generation of software systems: federated information systems. Their main characteristic is that they are constructed as an integrating layer over existing legacy applications and databases. They can be broadly classified in three dimensions: the degree of autonomy they allow in integrated components, the degree of heterogeneity between components they can cope with, and whether or not they support distribution. Whereas the communication and interoperation problem has reached a stage of applicable solutions over the past decade, semantic data integration has not become similarly clear. This report
73 citations · DOI
Rule-based models are attractive for various tasks because they inherently lead to interpretable and explainable decisions and can easily incorporate prior knowledge. However, such systems are difficult to apply to problems involving natural language, due to its linguistic variability. In contrast, neural models can cope very well with ambiguity by learning distributed representations of words and their composition from data, but lead to models that are difficult to interpret. In this paper, we describe a model combining neural networks with logic programming in a novel manner for solving multi-hop reasoning tasks over natural language. Specifically, we propose to use a Prolog prover which we extend to utilize a similarity function over pretrained sentence encoders. We fine-tune the representations for the similarity function via backpropagation. This leads to a system that can apply rule-based reasoning to natural language, and induce domain-specific rules from training data. We evaluate the proposed system on two different question answering tasks, showing that it outperforms two baselines, BIDAF (Seo et al., 2016a) and FASTQA (Weissenborn et al., 2017b), on a subset of the WIKIHOP corpus and achieves competitive results on the MEDHOP data set.
Future Generation Computer Systems · 73 citations · DOI
Collaborations (20)
Confirmed researcher↔partner pairs from HU-FIS, used as gold-standard positives for the matching.
Helmholtz-Einstein International Berlin Research School in Data Science (HEIBRiDS)
other
GRK 2424: Computermethoden für personalisierte Therapien in der Onkologie
other
SFB 1404/2: FONDA – Grundlagen von Workflows für die Analyse großer naturwissenschaftlicher Daten
other
SFB 1404/2: FONDA – Grundlagen von Workflows für die Analyse großer naturwissenschaftlicher Daten
university
Helmholtz-Einstein International Berlin Research School in Data Science (HEIBRiDS)
other
Helmholtz-Einstein International Berlin Research School in Data Science (HEIBRiDS)
other
SFB 1404/2: FONDA – Grundlagen von Workflows für die Analyse großer naturwissenschaftlicher Daten
university
SFB 1404/2: FONDA – Grundlagen von Workflows für die Analyse großer naturwissenschaftlicher Daten
research_institute
Helmholtz-Einstein International Berlin Research School in Data Science (HEIBRiDS)
other
SFB 1404/2: FONDA – Grundlagen von Workflows für die Analyse großer naturwissenschaftlicher Daten
other
EU: Scalable, Secure Storage and Analysis of Biobank Data (BioBankCloud)
other
EU: Scalable, Secure Storage and Analysis of Biobank Data (BioBankCloud)
university
SFB 1404/2: FONDA – Grundlagen von Workflows für die Analyse großer naturwissenschaftlicher Daten
other
easyTEM: Entwicklung von ressourceneffizienter Transmissionselektronenmikroskopie zur Demokratisierung ihres Einsatzes in der Materialforschung
other
GRK 2424: Computermethoden für personalisierte Therapien in der Onkologie
other
SFB 1404/2: FONDA – Grundlagen von Workflows für die Analyse großer naturwissenschaftlicher Daten
university
SFB 1404/2: FONDA – Grundlagen von Workflows für die Analyse großer naturwissenschaftlicher Daten
university
EU: Scalable, Secure Storage and Analysis of Biobank Data (BioBankCloud)
university
SFB 1404/2: FONDA – Grundlagen von Workflows für die Analyse großer naturwissenschaftlicher Daten
university
SFB 1404/2: FONDA – Grundlagen von Workflows für die Analyse großer naturwissenschaftlicher Daten
research_institute
Master data
Identity, organization, and contact details from HU-FIS.
- Name: Prof. Dr. Ulf Leser
- Title: Prof. Dr.
- Faculty: Mathematisch-Naturwissenschaftliche Fakultät
- Institute: Institut für Informatik
- Research group: Wissensmanagement in der Bioinformatik
- Phone: +49 30 2093-41282
- Last scraped: 26.4.2026, 01:08:32