The German Human Genome-Phenome Archive in an International Context: Toward a Federated Infrastructure for Managing and Analyzing Genomics and Health Data




Keywords: Genomics and Health Data, International Data Sharing, Federated Computing, German Human Genome-Phenome Archive


With the growing volume of human omics data, there is an urgent need for adequate data-sharing resources that also standardize and harmonize data processing. As part of the National Research Data Infrastructure (NFDI), the German Human Genome-Phenome Archive (GHGA) strives to connect the data of German researchers and their institutions to the international landscape of genome research. To achieve this, GHGA partners with international initiatives such as the federated European Genome-Phenome Archive (EGA) [1] and the recently funded European Genomic Data Infrastructure (GDI) project. These partnerships enable participation in international studies while ensuring proper protection of the sensitive patient data held in GHGA.


M. A. Freeberg et al., “The European Genome-phenome Archive in 2021,” Nucleic Acids Res., vol. 50, no. D1, pp. D980–D987, Jan. 2022, doi: 10.1093/nar/gkab1059.

M. D. Wilkinson et al., “The FAIR Guiding Principles for scientific data management and stewardship,” Sci. Data, vol. 3, p. 160018, Mar. 2016, doi: 10.1038/sdata.2016.18.

Z. Stark et al., “Integrating Genomics into Healthcare: A Global Responsibility,” Am. J. Hum. Genet., vol. 104, no. 1, pp. 13–20, Jan. 2019, doi: 10.1016/j.ajhg.2018.11.014.

G. Saunders et al., “Leveraging European infrastructures to access 1 million human genomes by 2022,” Nat. Rev. Genet., vol. 20, no. 11, pp. 693–701, Nov. 2019, doi: 10.1038/s41576-019-0156-9.

H. L. Rehm et al., “GA4GH: International policies and standards for data sharing across genomic research and healthcare,” Cell Genomics, vol. 1, no. 2, p. 100029, Nov. 2021, doi: 10.1016/j.xgen.2021.100029.

J. Rambla et al., “Beacon v2 and Beacon networks: A ‘lingua franca’ for federated data discovery in biomedical genomics, and beyond,” Hum. Mutat., p. humu.24369, Apr. 2022, doi: 10.1002/humu.24369.

J. O. B. Jacobsen et al., “The GA4GH Phenopacket schema defines a computable representation of clinical data,” Nat. Biotechnol., vol. 40, no. 6, pp. 817–820, Jun. 2022, doi: 10.1038/s41587-022-01357-4.

C. Voisin et al., “GA4GH Passport standard for digital identity and access permissions,” Cell Genomics, vol. 1, no. 2, p. 100030, Nov. 2021, doi: 10.1016/j.xgen.2021.100030.

J. Lawson et al., “The Data Use Ontology to streamline responsible access to human biomedical datasets,” Cell Genomics, vol. 1, no. 2, p. 100028, Nov. 2021, doi: 10.1016/j.xgen.2021.100028.

A. Senf et al., “Crypt4GH: a file format standard enabling native access to encrypted data,” Bioinformatics, vol. 37, no. 17, pp. 2753–2754, Sep. 2021, doi: 10.1093/bioinformatics/btab087.

P. A. Ewels et al., “The nf-core framework for community-curated bioinformatics pipelines,” Nat. Biotechnol., vol. 38, no. 3, pp. 276–278, Feb. 2020, doi: 10.1038/s41587-020-0439-x.

C. Goble et al., “FAIR Computational Workflows,” Data Intell., vol. 2, no. 1–2, pp. 108–121, Jan. 2020, doi: 10.1162/dint_a_00033.

M. Herschel, R. Diestelkämper, and H. Ben Lahmar, “A survey on provenance: What for? What form? What from?,” VLDB J., vol. 26, no. 6, pp. 881–906, Dec. 2017, doi: 10.1007/s00778-017-0486-1.

S. Cohen-Boulakia et al., “Scientific workflows for computational reproducibility in the life sciences: Status, challenges and opportunities,” Future Gener. Comput. Syst., vol. 75, pp. 284–298, Oct. 2017, doi: 10.1016/j.future.2017.01.012.

J. Ison et al., “EDAM: an ontology of bioinformatics operations, types of data and identifiers, topics and formats,” Bioinformatics, vol. 29, no. 10, pp. 1325–1332, May 2013, doi: 10.1093/bioinformatics/btt113.

A. Gray, C. Goble, and R. Jimenez, “Bioschemas: From Potato Salad to Protein Annotation,” in Proceedings of the ISWC 2017 Posters & Demonstrations and Industry Tracks co-located with 16th International Semantic Web Conference (ISWC 2017), 2017.



Received 2023-04-26
Accepted 2023-06-29
Published 2023-09-07