Overview

Doctoral study program: Biomedical Sciences (Faculty of Medicine, Masaryk University)

Study plan: Molecular Medicine

Form of study: doctoral, full-time or combined

Department: CEITEC CF Bioinformatics

Supervisor: Mgr. Vojtěch Bystrý, Ph.D.

Annotation

This Ph.D. project applies foundational encoding models to multiomics data integration and knowledge-based modeling for clinical applications. The primary goal is to develop computational tools and workflows that use models such as DNABERT, epi-GPT, DeepSNP, scGPT, and other foundational AI models to analyze both single-cell and bulk omics datasets. These models will be integrated with genomics, transcriptomics, proteomics, and epigenomics data to create predictive frameworks, with a focus on using AI to improve existing patient stratification models.

The Ph.D. candidate will start with single-cell transcriptomics models, which are currently the most mature, while significant advances in models for other omics are anticipated. The candidate will explore how the latent space representations produced by these foundational models can improve our understanding of multiomics data and help unravel the molecular mechanisms of various diseases. Collaborative research projects on cardiovascular diseases, triple-negative breast cancer, and prostate cancer will provide a strong foundation for testing and validating these approaches.

Additionally, the ACGT2 project, which focuses on hematology patients and long-read sequencing (covering small variants, structural variants, and methylation profiles), will serve as a core platform for further development and testing of these models and methods. The research is expected to advance predictive models for clinical applications and to result in first-author publications in bioinformatics for molecular medicine.

Recommended literature

  • Hao, M., Gong, J., Zeng, X., Liu, C., Guo, Y., Cheng, X., Wang, T., Ma, J., Song, L., & Zhang, X. (2023). “Large Scale Foundation Model on Single-cell Transcriptomics.” bioRxiv. https://doi.org/10.1101/2023.05.29.542705
  • Cui, H., Wang, C., Maan, H., Pang, K., Luo, F., Duan, N., & Wang, B. (2024). “scGPT: toward building a foundation model for single-cell multi-omics using generative AI.” Nature Methods.
  • Ji, Y., Zhou, Z., Liu, H., & Davuluri, R. V. (2021). “DNABERT: pre-trained Bidirectional Encoder Representations from Transformers model for DNA-language in genome.” Bioinformatics.

Research area: Bioinformatics

Funding of the PhD candidate:

  • CEITEC Bioinformatics Core Facility budget
  • TACR – Precizní medicína online pomocí AI a omických dat (Precision medicine online using AI and omics data)
  • ACGT2

Requirements on candidates

The ideal candidate should have a background in bioinformatics, data science, and machine learning, with experience in sequencing data analysis. Knowledge of foundational AI models (e.g., transformers such as GPT) and proficiency in developing bioinformatics tools will be beneficial. Experience with multiomics data integration and knowledge-based systems is highly advantageous.

Keywords: Bioinformatics, foundational models, multiomics, single-cell transcriptomics, molecular medicine

Information about the supervisor

Number of successfully finished students: 1
Number of current students: 3
Number of current students over 4 years: 0

Information about the application process: https://www.ceitec.eu/ls-mm-phd/

Application webpage: https://www.ceitec.eu/utilizing-foundational-encoding-models-for-multiomics-data-integration-and-knowledge-base-modeling-towards-clinical-applications/t11429