Projects related to the molecular evolution of primates
The Primate TFome
We recently determined that the human genome contains more than 3000 gene regulatory factors (GRFs), including ~1500 DNA-binding transcription factors (TFs) as well as co-factors, hormone receptors, histone-modifying enzymes, and others. The number of GRFs is still unknown in most other sequenced genomes. The main obstacle to determining the exact TF content of other genomes is the insufficient quality of their draft assemblies, which makes it difficult to identify TFs in complex genomic regions. Furthermore, the lack of transcript information (RNA-Seq data, mRNA, cDNA, or EST sequences) makes it difficult to determine the sequence of the transcribed genes, because promoters, open reading frames, and splice sites have to be predicted purely from genomic features and conservation with other species. We take advantage of improved genomic information and the increasing amount of transcript data provided by RNA-Seq to computationally identify all TFs in primate genomes and to manually curate gene models for TFs in a number of primate species. We are using our high-quality TF gene models to reveal lineage- and species-specific TFs, TFs that have lineage- or species-specific changes in functional domains, and TFs under positive selection.
Comparative Functional Characterization of Transcription Factors
Only a small proportion of TFs have been functionally characterized. Very little is known about many gene families, and this gap is especially striking for the biggest TF family in mammalian genomes: the KRAB-ZNFs. The importance of TFs for phenotypic differences and speciation has been established for various examples (e.g. PRDM9, FOXP2, EGR1, BMP4). Several KRAB-ZNFs have been implicated in brain and cognitive development. To determine their evolutionary impact experimentally, we are focusing on human-specific TFs, TFs with human-specific domain changes, and TFs that are connected in gene regulatory networks in a human-specific way. For instance, we perform ChIP-Seq experiments in human and non-human primate cell lines to identify the binding sites of the TFs in both species. Furthermore, we manipulate the expression levels of the TFs in cell lines of both species (knock-down and overexpression) and then use RNA-Seq to determine downstream targets. These experiments will give us insight not only into the function of the selected TFs but, more importantly, into their functional changes during evolution.
Evolution of Transcription Factor Networks
TFs regulate their target genes in a concerted, combinatorial fashion, forming gene regulatory networks that are often large and complex. Little is known about the evolution of such networks, about the amount of noise or redundancy in them, or about the importance of the gain or loss of nodes (genes) or links (interactions). Based on transcriptome information, we have previously identified a network of TFs that is active in the prefrontal cortex and is characterized by significant link changes between humans and chimpanzees. It appears that this network was involved in shaping some phenotypic differences, such as the larger human brain and its higher energy consumption. We are now investigating this TF network in other primates to reveal its evolutionary history. Furthermore, we are interested in network differences underlying cognitive disorders.
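The idea of gained and lost nodes and links can be illustrated with a small sketch. It is not the analysis used in this project: the function name `compare_networks`, the placeholder gene labels, and the representation of each network as a list of undirected gene pairs are all assumptions made for illustration.

```python
def compare_networks(links_a, links_b):
    """Compare two undirected networks given as iterables of (gene, gene) pairs.

    Hypothetical helper: it only contrasts which nodes and links are present
    in each network and says nothing about statistical significance.
    """
    # Store each link as a frozenset so that (x, y) and (y, x) are the same link.
    a = {frozenset(pair) for pair in links_a}
    b = {frozenset(pair) for pair in links_b}

    # Nodes are all genes that take part in at least one link.
    nodes_a = set().union(*a) if a else set()
    nodes_b = set().union(*b) if b else set()

    union = a | b
    return {
        "nodes_only_a": nodes_a - nodes_b,
        "nodes_only_b": nodes_b - nodes_a,
        "links_only_a": a - b,
        "links_only_b": b - a,
        "links_shared": a & b,
        # Jaccard index of the link sets: 1.0 means identical wiring.
        "link_jaccard": len(a & b) / len(union) if union else 1.0,
    }


# Toy example with placeholder gene names (not real data).
species_a_links = [("TF1", "TF2"), ("TF2", "TF3"), ("TF3", "TF4")]
species_b_links = [("TF2", "TF1"), ("TF2", "TF3"), ("TF1", "TF3")]
result = compare_networks(species_a_links, species_b_links)
print(result["links_only_a"], result["links_only_b"], result["link_jaccard"])
```

In practice the links themselves would first have to be inferred, for example from co-expression across samples, and any observed difference would need to be tested against a suitable null model.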
Long Non-Coding RNAs in Primate Brain Evolution
Long non-coding RNAs (lncRNAs) are emerging as key players in the nervous system. Many of the roughly 15,000 human lncRNAs are expressed in the brain, and multiple lines of evidence have linked them to important brain functions, such as neurogenesis and behavior, or have associated them with neurodegenerative and psychiatric diseases. Although several databases for lncRNAs exist, there is still a large gap in the structural and functional annotation of lncRNAs, which hinders a full understanding of their role in the nervous system. Many characteristics of the brain are human specific. Genes that evolve quickly, as lncRNAs do, are therefore prime candidates for driving the evolution of these innovations. Since biological function has to be studied in the light of evolution, we aim to establish a full catalog of human lncRNAs, including an annotation of their sequence, structure, expression, network integration, and evolutionary changes, by collating and coherently re-analyzing the wealth of already available high-throughput data.
Monoallelic Expression as a Potential Trigger of Cognitive Diseases
Random monoallelic expression (RMAE) is a mechanism in which only one allele of a gene is expressed. Since the allele is chosen randomly, this mode of gene expression can create variability between cells of the same cell type and might be one mechanism that renders some neurons more sensitive than others to developing pathologies. RMAE has been shown to be involved in cognitive diseases, such as schizophrenia and autism, and in neurodevelopmental disorders. Alzheimer's disease (AD)-associated genes are significantly enriched among RMAE genes, suggesting a link between AD and RMAE. Moreover, the AD-characteristic amyloid precursor protein (APP) is expressed monoallelically, potentially leading to different amounts of APP in different cells. We are using single-neuron sequencing to test the hypothesis that patterns of RMAE are altered in neurons of individuals with AD. Using state-of-the-art comparative transcriptome and co-expression network analyses, we aim to uncover functional consequences of changes in RMAE that might be related to AD.
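As a rough illustration of how monoallelic expression might be called from single-cell data, the sketch below classifies a gene in each cell from its allele-specific read counts. It is a simplified assumption, not the pipeline used in this project: the function names, the coverage threshold `min_reads`, and the allelic-fraction `cutoff` are hypothetical choices.

```python
def classify_cell(ref_reads, alt_reads, min_reads=10, cutoff=0.9):
    """Label one gene in one cell by the fraction of reads from the reference allele.

    min_reads and cutoff are arbitrary illustrative thresholds.
    """
    total = ref_reads + alt_reads
    if total < min_reads:
        return "low_coverage"  # too few reads to make a call
    ref_fraction = ref_reads / total
    if ref_fraction >= cutoff:
        return "ref_only"
    if ref_fraction <= 1 - cutoff:
        return "alt_only"
    return "biallelic"


def summarize_gene(cells, min_reads=10, cutoff=0.9):
    """Count the calls for one gene across cells.

    cells: list of (ref_reads, alt_reads) tuples, one per cell.
    A gene with both 'ref_only' and 'alt_only' cells is compatible with
    random allele choice; a gene with only one of them is not.
    """
    counts = {"ref_only": 0, "alt_only": 0, "biallelic": 0, "low_coverage": 0}
    for ref_reads, alt_reads in cells:
        counts[classify_cell(ref_reads, alt_reads, min_reads, cutoff)] += 1
    counts["compatible_with_rmae"] = counts["ref_only"] > 0 and counts["alt_only"] > 0
    return counts


# Toy counts for one heterozygous gene in five cells (not real data).
toy_cells = [(20, 0), (0, 15), (18, 1), (8, 9), (2, 1)]
print(summarize_gene(toy_cells))
```

A real analysis would also have to account for technical allelic dropout in single-cell sequencing, which can mimic monoallelic expression.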
Sex-Biased Gene Flow
The mtDNA and the Y chromosome are commonly used uniparental markers in population genetics that provide information on the history and relationships of populations and individuals. However, the genetic profiles of a population inferred from mtDNA and from the male-specific region of the Y chromosome (MSY) often differ from each other and from the genetic profile inferred from autosomal markers, which could be driven by differences in the maternal and paternal histories of human populations. Recently, many populations have been described using both uniparental and autosomal markers; however, we still know very little about the associations of uniparental (mtDNA and Y chromosome) haplogroups with autosomal ancestry components. By combining mtDNA and Y chromosome haplogroup compositions with autosomal ancestry components, we are trying to define “ancestry packages”, i.e. associated combinations of mtDNA haplogroups, Y chromosome haplogroups, and autosomal ancestry components that are indicative of ancestral genetic compositions. The ancestry packages approach can be used to objectively classify the likely geographic origin of haplogroups (and other markers for which population estimates are available) in accordance with autosomal ancestry components, and it can further be used to infer the potential direction and composition of sex-biased gene flow between different ancestral populations.
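One simple way to look for associations between haplogroups and autosomal ancestry components is to correlate their frequencies across populations. The sketch below does this with a plain Pearson correlation; it is only an assumed illustration, not the method used to define ancestry packages here, and the function names, haplogroup labels, and component labels are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equally long lists; 0.0 if either is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y) if sd_x and sd_y else 0.0


def best_component(haplogroup_freqs, component_props):
    """For each haplogroup, find the ancestry component it correlates with most.

    Both arguments map a label to a list of per-population values,
    with populations in the same order in every list.
    Returns {haplogroup: (component, correlation)}.
    """
    result = {}
    for hg, freqs in haplogroup_freqs.items():
        scores = {comp: pearson(freqs, props) for comp, props in component_props.items()}
        top = max(scores, key=scores.get)
        result[hg] = (top, scores[top])
    return result


# Toy values for four populations (placeholder labels, not real data).
haplogroups = {"hgX": [0.1, 0.2, 0.3, 0.4], "hgY": [0.4, 0.3, 0.2, 0.1]}
components = {"compA": [0.2, 0.4, 0.6, 0.8], "compB": [0.8, 0.6, 0.4, 0.2]}
for hg, (comp, r) in best_component(haplogroups, components).items():
    print(hg, comp, round(r, 2))
```

Running the same comparison separately for mtDNA and Y chromosome haplogroups would show whether maternal and paternal lineages point to the same autosomal component, which is the kind of contrast relevant to sex-biased gene flow.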