1. [MHC] Stanford University, Pritchard: Genetic diversity of the major histocompatibility complex (MHC)
Ancient Trans-Species Polymorphism at the Major Histocompatibility Complex in Primates
Classical genes within the Major Histocompatibility Complex (MHC) are responsible for peptide presentation to T cells, thus playing a central role in immune defense against pathogens. These genes are subject to strong selective pressures including both balancing and directional selection, resulting in exceptional genetic diversity, with thousands of alleles per gene. Moreover, some alleles appear to be shared between primate species, a phenomenon known as trans-species polymorphism (TSP) or incomplete lineage sorting, which is rare in the genome overall. However, despite the clinical and evolutionary importance of MHC diversity, we currently lack a full picture of primate MHC evolution. To start addressing this gap, we used Bayesian phylogenetic methods to determine the extent of TSP at six classical MHC genes. We find strong support for TSP in all six genes, including between humans and old-world monkeys in HLA-DRB1 and even, remarkably, between humans and new-world monkeys in HLA-DQB1. Despite the long-term persistence of ancient lineages, we additionally observe rapid evolution at amino acids within the peptide-binding domain. The most rapidly evolving positions are also strongly enriched for autoimmune and infectious disease associations. Together, these results suggest complex selective forces arising from differential peptide binding, which drive short-term allelic turnover within lineages while also maintaining deeply divergent lineages for at least 45 million years.
2. [Eel] The Japanese eel genome is released
A Chromosome-level Assembly of the Japanese Eel Genome, Insights into Gene Duplication and Chromosomal Reorganization
Japanese eels (Anguilla japonica) are a commercially important species that has been harvested extensively for food. Currently, this and related species (American and European eels) are difficult to breed on a commercial basis, so wild stock is used for aquaculture. Due to pollution, overfishing, and international trafficking, eel populations are declining. The International Union for Conservation of Nature lists Japanese eels as critically endangered on its Red List. Here we present a high-quality genome assembly for Japanese eels and demonstrate that large chromosome reorganizations occurred in the course of the third-round whole-genome duplication (3R-WGD). Following multiple chromosomal fusion and fission rearrangements, the Anguilla lineage reduced its haploid chromosome number to 19 from the ancestral proto-chromosomal number of 25. Phylogenetic analysis of expanded gene families showed that the olfactory receptor and voltage-gated Ca2+-channel gene families expanded significantly. The expansion of the olfactory receptor (group δ and ζ genes) and voltage-gated Ca2+-channel gene families is important for olfaction and neurophysiological functions. Following 3R-WGD, additional tandem (TD) and proximal (PD) duplications occurred to acquire immune-related genes for adaptation. The Japanese eel assembly presented here can be used to study the evolution and conservation of other Anguilla species.
3. [Errors] ETH Zürich (Switzerland): Deep learning for predicting metagenome misassemblies
ResMiCo: increasing the quality of metagenome-assembled genomes with deep learning
The number of published metagenome assemblies is rapidly growing due to advances in sequencing technologies. However, sequencing errors, variable coverage, repetitive genomic regions, and other factors can produce misassemblies, which are challenging to detect for taxonomically novel genomic data. Assembly errors can affect all downstream analyses of the assemblies. Accuracy for the state of the art in reference-free misassembly prediction does not exceed an AUPRC of 0.57, and it is not clear how well these models generalize to real-world data. Here, we present the Residual neural network for Misassembled Contig identification (ResMiCo), a deep learning approach for reference-free identification of misassembled contigs. To develop ResMiCo, we first generated a training dataset of unprecedented size and complexity that can be used for further benchmarking and developments in the field. Through rigorous validation, we show that ResMiCo is substantially more accurate than the state of the art, and the model is robust to novel taxonomic diversity and varying assembly methods. ResMiCo estimated 4.7% misassembled contigs per metagenome across multiple real-world datasets. We demonstrate how ResMiCo can be used to optimize metagenome assembly hyperparameters to improve accuracy, instead of optimizing solely for contiguity. The accuracy, robustness, and ease-of-use of ResMiCo make the tool suitable for general quality control of metagenome assemblies and assembly methodology optimization.
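The abstract reports accuracy as AUPRC, the area under the precision-recall curve. As a rough illustration of what that number measures, the sketch below computes the step-wise form of AUPRC (average precision) for a toy set of contigs. It is not ResMiCo code: the function name `average_precision`, the labels, and the scores are all made up for the example, and tied scores are simply ranked in input order.

```python
def average_precision(labels, scores):
    """Step-wise area under the precision-recall curve (average precision).

    labels: 1 for a misassembled contig, 0 for a correct one (toy data).
    scores: higher means the classifier is more confident of a misassembly.
    Assumes there are no tied scores; ties are ranked in input order.
    """
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("need at least one positive label")
    # Rank contigs from most to least suspicious.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    true_pos = 0
    ap = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            true_pos += 1
            # Precision at the rank where recall increases by 1 / total_pos.
            ap += true_pos / rank
    return ap / total_pos


# Hypothetical example: five contigs, two of them truly misassembled.
toy_labels = [1, 0, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.4, 0.2]
print(round(average_precision(toy_labels, toy_scores), 3))  # 0.833
```

A classifier that ranks every misassembled contig above every correct one scores 1.0, while a random ranking scores about the fraction of positives in the data, which is why AUPRC is a common choice when misassemblies are rare.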
4. [Spatial] Sloan Institute: A Bayesian probabilistic model-based analysis tool for spatial transcriptomics
BayesTME: A unified statistical framework for spatial transcriptomics
Spatial variation in cellular phenotypes underlies heterogeneity in immune recognition and response to therapy in cancer and many other diseases. Spatial transcriptomics (ST) holds the potential to quantify such variation, but existing analysis methods address only a small part of the analysis challenge, such as spot deconvolution or spatial differential expression. We present BayesTME, an end-to-end Bayesian method for analyzing spatial transcriptomics data. BayesTME unifies several previously distinct analysis goals under a single, holistic generative model. This unified approach enables BayesTME to (i) be entirely reference-free without any need for paired scRNA-seq, (ii) outperform a large suite of methods in quantitative benchmarks, and (iii) uncover a new type of ST signal: spatial differential expression within individual cell types. To achieve the latter, BayesTME models each phenotype as spatially adaptive and discovers statistically significant spatial patterns amongst coordinated subsets of genes within phenotypes, which we term spatial transcriptional programs. On human and zebrafish melanoma tissues, BayesTME identifies spatial transcriptional programs that capture fundamental biological phenomena like bilateral symmetry, differential expression between interior and surface tumor cells, and tumor-associated fibroblast and macrophage reprogramming. Our results demonstrate BayesTME's power in unlocking a new level of insight from spatial transcriptomics data and fostering a deeper understanding of the spatial architecture of the tumor microenvironment. BayesTME is open source and publicly available (https://github.com/tansey-lab/bayestme).
5. [Methylation] Austrian researchers: DNA methylome analysis of nearly 600 animal species
Comparative analysis of genome-scale, base-resolution DNA methylation profiles across 580 animal species
Methylation of cytosines is the prototypic epigenetic modification of the DNA. It has been implicated in various regulatory mechanisms throughout the animal kingdom and particularly in vertebrates. We mapped DNA methylation in 580 animal species (535 vertebrates, 45 invertebrates), resulting in 2443 genome-scale, base-resolution DNA methylation profiles of primary tissue samples from various organs. Reference-genome independent analysis of this comprehensive dataset quantified the association of DNA methylation with the underlying genomic DNA sequence throughout vertebrate evolution. We observed a broadly conserved link with two major transitions – once in the first vertebrates and again with the emergence of reptiles. Cross-species comparisons focusing on individual organs supported a deeply conserved association of DNA methylation with tissue type, and cross-mapping analysis of DNA methylation at gene promoters revealed evolutionary changes for orthologous genes with conserved DNA methylation patterns. In summary, this study establishes a large resource of vertebrate and invertebrate DNA methylomes, it showcases the power of reference-free epigenome analysis in species for which no reference genomes are available, and it contributes an epigenetic perspective to the study of vertebrate evolution.
6. [Giving up] From getting started to giving up: predict the difficulty before building a phylogenetic tree?
From Easy to Hopeless - Predicting the Difficulty of Phylogenetic Analyses
Phylogenetic analyses under the Maximum Likelihood model are time and resource intensive. To adequately capture the vastness of tree space, one needs to infer multiple independent trees. On some datasets, multiple tree inferences converge to similar tree topologies, on others to multiple, topologically highly distinct yet statistically indistinguishable topologies. At present, no method exists to quantify and predict this behavior. We introduce a method to quantify the degree of difficulty for analyzing a dataset and present Pythia, a Random Forest Regressor that accurately predicts this difficulty. Pythia predicts the degree of difficulty of analyzing a dataset prior to initiating Maximum Likelihood based tree inferences. Pythia can be used to increase user awareness with respect to the amount of signal and uncertainty to be expected in phylogenetic analyses, and hence inform an appropriate (post-)analysis setup. Further, it can be used to select appropriate search algorithms for easy-, intermediate-, and hard-to-analyze datasets.
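To make "topologically highly distinct" concrete, one standard way to compare two inferred trees is the Robinson-Foulds (RF) distance, which counts the bipartitions present in one tree but not the other. The sketch below is only an illustration of that general idea and is not taken from Pythia, whose difficulty measure is not spelled out in the abstract. Trees are written as nested tuples of leaf names, and the helper names (`clades`, `splits`, `rf_distance`, `mean_pairwise_rf`) are hypothetical.

```python
from itertools import combinations


def clades(node, out):
    """Collect the leaf set below every internal node of a nested-tuple tree."""
    if not isinstance(node, tuple):
        return frozenset([node])
    clade = frozenset().union(*(clades(child, out) for child in node))
    out.append(clade)
    return clade


def splits(tree):
    """Return the non-trivial bipartitions of the tree, ignoring the rooting."""
    found = []
    all_leaves = clades(tree, found)
    result = set()
    for clade in found:
        rest = all_leaves - clade
        # Skip trivial splits that separate one leaf (or nothing) from the rest.
        if len(clade) > 1 and len(rest) > 1:
            result.add(frozenset((clade, rest)))
    return result


def rf_distance(tree_a, tree_b):
    """Robinson-Foulds distance: size of the symmetric difference of splits."""
    return len(splits(tree_a) ^ splits(tree_b))


def mean_pairwise_rf(trees):
    """Average RF distance over all pairs of trees (needs at least two trees)."""
    pairs = list(combinations(trees, 2))
    return sum(rf_distance(a, b) for a, b in pairs) / len(pairs)


# Hypothetical five-taxon trees from three independent inferences.
t1 = (("A", "B"), ("C", ("D", "E")))
t2 = ("A", ("B", ("C", ("D", "E"))))  # same topology as t1, rooted differently
t3 = (("A", "C"), ("B", ("D", "E")))  # A and C grouped instead of A and B

print(rf_distance(t1, t2))  # 0
print(rf_distance(t1, t3))  # 2
```

On an easy dataset, repeated inferences would be expected to give a mean pairwise RF distance near zero; on a hard one, the distances stay large even though the trees have similar likelihoods.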
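7. [Inversion] King Abdullah University of Science and Technology (KAUST), Saudi Arabia: Pan-genome analysis of Asian rice shows the prevalence of large DNA inversions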
Pan-genome inversion index reveals evolutionary insights into the subpopulation structure of Asian rice (Oryza sativa)
Understanding and exploiting genetic diversity is key to the productive and stable cultivation of rice. Utilizing 16 high-quality genomes that represent the subpopulation structure of Asian rice (O. sativa), plus the genomes of two close relatives (O. rufipogon and O. punctata), we built a pan-genome inversion index of 1,054 non-redundant inversions that span an average of ~14% of the O. sativa cv. Nipponbare reference genome sequence. Using this index we estimated an inversion rate of 1,100 inversions per million years in Asian rice, which is 37 to 73 times higher than previously estimated for plants. Detailed analyses of these inversions showed evidence of their effects on gene regulation, recombination rate, linkage disequilibrium and agronomic trait performance. Our study uncovers the prevalence and scale of large inversions (≥100 bp) across the pan-genome of Asian rice, and hints at their largely unexplored role in functional biology and crop performance.
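The rate figures in this abstract can be unpacked with a little arithmetic. The sketch below only rearranges the two numbers quoted above (1,100 inversions per million years, and a 37- to 73-fold increase over earlier plant estimates); it assumes the earlier estimates are expressed in the same units, which the abstract does not state, and the function names are made up for the example.

```python
RATE_PER_MILLION_YEARS = 1100  # inversion rate quoted in the abstract
FOLD_RANGE = (37, 73)          # "37 to 73 times higher" than earlier estimates


def years_per_inversion(rate_per_million_years):
    """Average waiting time between inversions, in years."""
    return 1_000_000 / rate_per_million_years


def implied_earlier_rate(rate_per_million_years, fold):
    """Earlier rate implied by a fold difference (assumes identical units)."""
    return rate_per_million_years / fold


print(round(years_per_inversion(RATE_PER_MILLION_YEARS)))  # 909
for fold in FOLD_RANGE:
    implied = implied_earlier_rate(RATE_PER_MILLION_YEARS, fold)
    print(fold, round(implied, 1))
```

Under these assumptions, the quoted rate corresponds to roughly one new inversion every 900 years, and the earlier plant estimates it is compared against would be on the order of 15 to 30 inversions per million years.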
8. [Repeats] What secrets hide in repetitive DNA? A view from more than 600 insect genomes
Repetitive elements in the era of biodiversity genomics: insights from 600+ insect genomes
Repetitive elements (REs) are integral to the composition, structure, and function of eukaryotic genomes. Yet, RE dynamics remain understudied in many taxonomic groups, preventing holistic understanding of how genomes and species evolve. Here, we investigated REs across 601 insect species (20 orders) to better understand the RE landscape of insects and to evaluate automated RE annotation methods in the era of biodiversity genomics. We identified wide variation in the types and frequency of REs across insect groups. We quantified associations between REs and protein-coding genes and found an elevated frequency of associations in insects with abundant long interspersed nuclear elements (LINEs). Sequencing technology impacts RE detection; ~36% more REs could be identified in long-read versus short-read assemblies. Long terminal repeats (LTRs) showed markedly improved detection in long-read assemblies (162% more), while DNA transposons and LINEs showed less respective technology-related bias. We illustrate fundamental challenges to efficient study of REs in diverse groups, showing that in most insect lineages, 25–85% of repetitive sequences were “unclassified” compared to only ~13% of unclassified repeats in Drosophila species. Our findings suggest this RE-annotation bottleneck, driven largely by uneven taxonomic representation in RE reference databases, is worsening. Although the diversity of available insect genomes has rapidly expanded, the rate of community contributions to RE databases (essential for RE annotation) has not kept pace, preventing high resolution study of REs in most groups. We highlight the tremendous opportunity and need for the field of biodiversity genomics to embrace REs and suggest collective steps for making progress towards this goal.
9. [Regulation] Tongji University: SCRIP, an integrative method for inferring gene regulation from scATAC-seq
Single-cell Gene Regulation Network Inference by Large-scale Data Integration
Single-cell ATAC-seq (scATAC-seq) has proven to be a state-of-the-art approach to investigating gene regulation at the single-cell level. However, existing methods cannot precisely uncover cell-type-specific binding of transcription regulators (TRs) or construct gene regulation networks (GRNs) in single cells. ChIP-seq has been widely used to profile TR binding sites in the past decades. Here, we developed SCRIP, an integrative method to infer single-cell TR activity and targets based on the integration of scATAC-seq and a large-scale TR ChIP-seq reference. Our method showed improved performance in evaluating TR binding activity compared to existing motif-based methods and reached a higher consistency with matched TR expression. In addition, our method enables identifying TR target genes as well as building GRNs at single-cell resolution based on a regulatory potential model. We demonstrate SCRIP’s utility in accurate cell-type clustering, lineage tracing, and inferring cell-type-specific GRNs in multiple biological systems. SCRIP is freely available at https://github.com/wanglabtongji/SCRIP.
10. [Pathways] TU Braunschweig (Germany): New software for automatic annotation of biosynthesis pathways
KIPEs3: Automatic annotation of biosynthesis pathways
Findings: KIPEs3 is an improved version with additional features and the potential to identify not just the core biosynthesis players, but also candidates involved in the decoration steps and in the transport of flavonoids. Functionality of KIPEs3 is demonstrated through the analysis of the flavonoid biosynthesis in Arabidopsis thaliana Nd-1, Capsella grandiflora, and Dioscorea dumetorum. We demonstrate the applicability of KIPEs to other pathways by adding the carotenoid biosynthesis to the repertoire. As a proof of concept, the carotenoid biosynthesis was analyzed in the same species and Daucus carota. KIPEs3 is available as an online service to enable access without prior bioinformatics experience. Conclusion: KIPEs3 facilitates the automatic annotation and analysis of biosynthesis pathways with consistent and high quality in a large number of plant species. Numerous genome sequencing projects are generating a large number of datasets that can be analyzed to identify evolutionary patterns and promising candidate genes for biotechnological and breeding applications.