Overview

Doctoral study program
Life Sciences (Faculty of Science, Masaryk University)

Supervisor
Mgr. Vojtěch Bystrý, Ph.D.

Annotation
This Ph.D. project is part of a national initiative to build a cutting-edge platform for storing and analyzing omics data, spanning genomic, epigenomic, transcriptomic, and proteomic datasets. The platform will be integrated with European networks created through the Genomic Data Infrastructure (GDI) project, which will enable cross-border data sharing and analysis and give access to large volumes of multi-omics data. Collaboration at this scale requires a federated approach: data remain at local nodes, while computation and model training run across distributed systems, protecting both data privacy and security.

The primary goal of this Ph.D. is to develop and orchestrate bioinformatics tools that leverage federated learning. These tools will facilitate scalable, collaborative computation across multiple European institutions, allowing local nodes to train models independently and contribute to a global model without centralized data storage. The Ph.D. candidate will design and deploy these federated bioinformatics tools, focusing on the integration of long-read sequencing data, with an emphasis on detecting structural variants and modeling methylation patterns, together with short-read sequencing data for a comprehensive analysis.
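
As a rough illustration of local training combined with global aggregation, the following minimal sketch shows federated averaging on a toy linear model. Everything in it is an assumption made for illustration only: the function names (local_update, federated_average, train), the three hypothetical nodes, and the synthetic data generated from y = 2x + 1 are not part of the project or the platform. Each node fits the model on its own samples, and only the fitted weights and sample counts are sent for aggregation.

```python
from typing import List, Tuple

Sample = Tuple[float, float]   # (x, y) observation held by a single node
Weights = Tuple[float, float]  # (slope, intercept) of the toy linear model


def local_update(weights: Weights, data: List[Sample],
                 lr: float = 0.05, epochs: int = 20) -> Weights:
    """Run a few gradient-descent steps on one node's private data."""
    w, b = weights
    n = len(data)
    for _ in range(epochs):
        # Gradients of the mean squared error for y ~ w * x + b
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def federated_average(updates: List[Tuple[Weights, int]]) -> Weights:
    """Average node weights, weighted by each node's sample count."""
    total = sum(n for _, n in updates)
    w = sum(wt[0] * n for wt, n in updates) / total
    b = sum(wt[1] * n for wt, n in updates) / total
    return w, b


def train(nodes: List[List[Sample]], rounds: int = 50) -> Weights:
    """Alternate local training and aggregation; raw samples never leave a node."""
    global_weights: Weights = (0.0, 0.0)
    for _ in range(rounds):
        updates = [(local_update(global_weights, data), len(data))
                   for data in nodes]
        global_weights = federated_average(updates)
    return global_weights


# Three hypothetical nodes, each holding different samples of y = 2x + 1
nodes = [
    [(x, 2 * x + 1) for x in (0.0, 0.5, 1.0)],
    [(x, 2 * x + 1) for x in (1.5, 2.0)],
    [(x, 2 * x + 1) for x in (-1.0, -0.5, 0.25, 0.75)],
]

slope, intercept = train(nodes)
print(f"slope={slope:.4f}, intercept={intercept:.4f}")
```

In a realistic setting the local model would be far larger, and the exchanged weights would typically need additional protection such as secure aggregation or differential privacy; the sketch leaves these aspects out.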

Federated learning will be crucial for efficiently processing the distributed datasets, allowing the platform to securely compute over sensitive data while preserving its informative value. By developing novel algorithms and workflows that integrate federated computing with omics data, the Ph.D. candidate will push the boundaries of current bioinformatics approaches. The research will lead to first-author publications, making significant contributions to both national and European scientific advancements in genomics, epigenomics, and multi-omics data integration.

Recommended literature
Zhao, Y., et al. “Federated Learning with Non-IID Data.” Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics (2018).
Li, T., et al. “Federated Optimization in Heterogeneous Networks.” arXiv preprint arXiv:1812.06127 (2018).
Celi, L. A., et al. “Federated Learning Applications in Medicine: A Systematic Review.” PLOS Digital Health (2022).
Rieke, N., et al. “The Future of Digital Health with Federated Learning.” Nature Medicine 26 (2020): 1691–1700.
Wang, S., et al. “Privacy-Preserving Federated Learning for Bioinformatics Data Integration.” IEEE Transactions on Big Data (2022).

Research area
Bioinformatics

Keywords
Federated learning, genomics, epigenomics, multi-omics data, long-read sequencing, bioinformatics

Funding of the PhD candidate
The Ph.D. will be funded by the Genomic Data Infrastructure (GDI) project and the Open Science II (OSII) project.

Requirements for candidate
The ideal candidate should possess strong IT skills, particularly in coding, machine learning, and data science, with a solid understanding of bioinformatics. Experience with sequencing data analysis, especially long-read and short-read technologies, is highly advantageous. Familiarity with federated computing concepts is also a plus.

Information about the supervisor
Number of successfully finished students: 1
Number of current students: 3
Number of current students enrolled for more than 4 years: 0

Information about the application process
https://www.ceitec.eu/ls-mm-phd/

Application webpage
https://www.ceitec.eu/development-and-orchestration-of-bioinformatics-tools-for-federated-computing-within-a-european-omics-data-platform/t11433