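For this end-of-2021 roundup of notable bioinformatics preprints on bioRxiv, our introduction naturally has to include a year in review. This time, the editor has decided to hand the stage to Richard Sever. Richard is a co-founder of the well-known bioRxiv and medRxiv, and he also works at Cold Spring Harbor Laboratory Press in the United States. At the very start of the new year, Richard posted a review of bioRxiv's past year on Twitter. In addition, a reader suggested that the articles recommended in each issue should come with translations, and the editor has decided to grant that wish in part: in the introduction, here is a (loose) translation of Richard's bioRxiv summary.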
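The pandemic was so severe last year that every industry went downhill. Only bioRxiv and medRxiv kept doing better and better, with more and more papers and frequent appearances in prominent spots across the major media. It can't be helped: COVID-19 papers all get submitted to us first. Let's see who dares look down on us now.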
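Take a look at the screenshot above. Even on the 25th you people did not know how to take a break and kept submitting to bioRxiv, which kept our back-office staff busy right through Christmas. While we are at it, we ask authors from China for their understanding: if you notice that bioRxiv handles manuscripts more slowly at the end of December, that is because Christmas in the United States is the equivalent of the Lunar New Year.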
In any case, it is the new year, so happy New Year to everyone. If you see this, hurry up and give me a like.
【Omicron】Montana State University: evolutionary analysis of SARS-CoV-2 suggests that spike protein mutations may weaken vaccine effectiveness
The rise and fall of SARS-CoV-2 variants and the mutational profile of Omicron
Omicron is the fifth SARS-CoV-2 variant to be designated a Variant of Concern (VOC) by the World Health Organization (WHO). Here we provide a retrospective analysis of SARS-CoV-2 variants and explain how the Omicron variant is distinct. Our work shows that the spike protein is a ‘hotspot’ for viral evolution in all variants, suggesting that existing vaccines and diagnostics that target this protein may become less effective against Omicron and that our therapeutic and public health strategies will have to evolve along with the virus.
【Giant Cuttlefish】Cuttlefish, the de Bruijn graph construction tool, gets an upgrade
Scalable, ultra-fast, and low-memory construction of compacted de Bruijn graphs with Cuttlefish 2
We present Cuttlefish 2, significantly advancing the existing state-of-the-art methods for construction of this graph. On a typical shared-memory machine, it reduces the construction of the compacted de Bruijn graph for 661K bacterial genomes (2.58 Tbp of input reference genomes) from about 4.5 days to 17–23 hours. Similarly on sequencing data, it constructs the graph for a 1.52 Tbp white spruce read set in about 10 hours, while the closest competitor, which also uses considerably more memory, requires 54–58 hours.
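To make the term "compacted de Bruijn graph" concrete, here is a minimal Python sketch of the idea: k-mers become nodes, k-mers that overlap by k-1 bases are linked, and every maximal non-branching path is merged into a single string (a unitig). This is a toy illustration only, not Cuttlefish 2's algorithm. It assumes a node-centric graph, looks at the forward strand only (no reverse complements), and holds everything in memory. The function names `build_graph` and `compact` are made up for this example.

```python
def build_graph(seqs, k):
    """Collect all k-mers and link those that overlap by k-1 bases (forward strand only)."""
    nodes = set()
    for s in seqs:
        for i in range(len(s) - k + 1):
            nodes.add(s[i:i + k])
    succ = {n: [n[1:] + c for c in "ACGT" if n[1:] + c in nodes] for n in nodes}
    pred = {n: [c + n[:-1] for c in "ACGT" if c + n[:-1] in nodes] for n in nodes}
    return nodes, succ, pred


def compact(nodes, succ, pred):
    """Merge each maximal non-branching path of k-mers into one unitig string."""

    def starts_unitig(n):
        # A node starts a unitig unless it has exactly one predecessor
        # and that predecessor has exactly one successor.
        if len(pred[n]) != 1:
            return True
        return len(succ[pred[n][0]]) != 1

    def extend(start, visited):
        path, cur = start, start
        visited.add(start)
        while len(succ[cur]) == 1:
            nxt = succ[cur][0]
            if len(pred[nxt]) != 1 or nxt in visited:
                break
            path += nxt[-1]  # each further k-mer adds one base
            visited.add(nxt)
            cur = nxt
        return path

    visited, unitigs = set(), []
    for n in sorted(nodes):
        if starts_unitig(n):
            unitigs.append(extend(n, visited))
    # Whatever is left lies on an isolated cycle with no natural start node.
    for n in sorted(nodes):
        if n not in visited:
            unitigs.append(extend(n, visited))
    return unitigs


nodes, succ, pred = build_graph(["ACGTT", "ACGTA"], k=3)
unitigs = compact(nodes, succ, pred)
print(unitigs)
```

The two toy reads share the prefix ACGT and then diverge, so the shared part collapses into one unitig and the two branches remain separate nodes. Real tools additionally treat a k-mer and its reverse complement as the same node and avoid holding the whole graph in memory, which is where the engineering effort described in the abstract goes.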
【Giant Viruses】A joint team from Université Paris-Saclay and Kyoto University (Japan) discovers novel giant viruses in marine metagenomic data
Discovery of a class of giant virus relatives displaying unusual functional traits and prevalent within plankton: the Mirusviricetes
Large and giant DNA viruses of the phylum Nucleocytoviricota have a profound influence on the ecology and evolution of planktonic eukaryotes. Recently, various Nucleocytoviricota genomes have been characterized from environmental metagenomes based on the occurrence of hallmark genes identified from cultures. However, lineages diverging from the culture genomics functional principles have been overlooked thus far. Here, we developed a phylogeny-guided genome-resolved metagenomic framework using a single hallmark gene as compass, a subunit of DNA-dependent RNA polymerase encoded by most Nucleocytoviricota. We applied this method to large metagenomic data sets from the surface of five oceans and two seas and characterized 697 non-redundant Nucleocytoviricota genomes up to 1.45 Mbp in length. This database expands the known diversity of the class Megaviricetes and revealed two additional putative classes we named Proculviricetes and Mirusviricetes. Critically, the diverse and prevalent Mirusviricetes population genomes seemingly lack several hallmark genes, in particular those related to viral particle morphogenesis. Instead, they share various genes of known (e.g., TATA-binding proteins, histones, proteases and viral rhodopsins) and unknown functions rarely detected if not entirely missing in other characterized Nucleocytoviricota classes. Phylogenomics, comparative genomics, functional trends and the signal among planktonic cellular size fractions point to Mirusviricetes being a major, functionally divergent class of large DNA viruses that actively infect eukaryotes in the sunlit ocean using an enigmatic functional life style. Finally, we built a comprehensive marine genomic database for Nucleocytoviricota by combining multiple environmental surveys that might contribute to future endeavors exploring the ecology and evolution of plankton.
【One Word Apart】DeepMed: the DeepMind of medical imaging research?
DeepMed: A unified, modular pipeline for end-to-end deep learning in computational pathology
The interpretation of digitized histopathology images has been transformed thanks to artificial intelligence (AI). End-to-end AI algorithms can infer high-level features directly from raw image data, extending the capabilities of human experts. In particular, AI can predict tumor subtypes, genetic mutations and gene expression directly from hematoxylin and eosin (H&E) stained pathology slides. However, existing end-to-end AI workflows are poorly standardized and not easily adaptable to new tasks. Here, we introduce DeepMed, a Python library for predicting any high-level attribute directly from histopathological whole slide images alone, or from images coupled with additional meta-data (https://github.com/KatherLab/deepmed). Unlike earlier computational pipelines, DeepMed is highly developer-friendly: its structure is modular and separates preprocessing, training, deployment, statistics, and visualization in such a way that any one of these processes can be altered without affecting the others. Also, DeepMed scales easily from local use on laptop computers to multi-GPU clusters in cloud computing services and therefore can be used for teaching, prototyping and for large-scale applications. Finally, DeepMed is user-friendly and allows researchers to easily test multiple hypotheses in a single dataset (via cross-validation) or in multiple datasets (via external validation). Here, we demonstrate and document DeepMed’s abilities to predict molecular alterations, histopathological subtypes and molecular features from routine histopathology images, using a large benchmark dataset which we release publicly. In summary, DeepMed is a fully integrated and broadly applicable end-to-end AI pipeline for the biomedical research community.
【Canopies in Bloom】Hungarian researchers: comparative genomics suggests that about 10% of the genes in mushroom genomes are involved in fruiting body development
Lessons on fruiting body morphogenesis from genomes and transcriptomes of Agaricomycetes
Altogether, our discussions cover 1480 genes of Coprinopsis cinerea, and their orthologs in Agaricus bisporus, Cyclocybe aegerita, Armillaria ostoyae, Auriculariopsis ampla, Laccaria bicolor, Lentinula edodes, Lentinus tigrinus, Mycena kentingensis, Phanerochaete chrysosporium, Pleurotus ostreatus, and Schizophyllum commune, providing functional hypotheses for ~10% of genes in the genomes of these species. Although experimental evidence for the role of these genes will need to be established in the future, our data provide a roadmap for guiding functional analyses of fruiting related genes in the Agaricomycetes. We anticipate that the gene compendium presented here, combined with developments in functional genomics approaches will contribute to uncovering the genetic bases of one of the most spectacular multicellular developmental processes in fungi.
【Surviving the Ice】How do plants adapt to polar environments? See what clues transcriptome analysis can offer. From the University of Oslo, Norway
What can the cold-induced transcriptomes of Arctic Brassicaceae tell us about the evolution of cold tolerance?
We found that the cold response is highly species-specific. Among thousands of differentially expressed genes, ~200 genes were shared among the three Arctic species and A. thaliana, and only ~100 genes were specific to the three Arctic species alone. This pattern was also reflected in the functional comparison. Our results show that the cold response of Arctic plant species has mainly evolved independently, although it likely builds on a conserved basis found across Brassicaceae. The findings also confirm that highly polygenic traits, such as cold tolerance, may show less repeatable patterns of adaptation than traits involving only a few genes.
【Cerebellar Development】Heidelberg University (Germany): single-nucleus RNA sequencing analysis offers new insight into the origins of brain tumors
Mapping pediatric brain tumors to their origins in the developing cerebellum
Understanding the cellular origins of childhood brain tumors is key for discovering novel tumor-specific therapeutic targets. Previous strategies mapping cellular origins typically involved comparing human tumors to murine embryonal tissues1,2, a potentially imperfect approach due to spatio-temporal gene expression differences between species3. Here we use an unprecedented single-nucleus atlas of the developing human cerebellum (Sepp, Leiss, et al) and extensive bulk and single-cell transcriptome tumor data to map their cellular origins with focus on three most common pediatric brain tumors – pilocytic astrocytoma, ependymoma, and medulloblastoma. Using custom bioinformatics approaches, we postulate the astroglial and glial lineages as the origins for posterior fossa ependymomas and radiation-induced gliomas (secondary tumors after medulloblastoma treatment), respectively. Moreover, we confirm that SHH, Group3 and Group4 medulloblastomas stem from granule cell/unipolar brush cell lineages, whereas we propose pilocytic astrocytoma to originate from the oligodendrocyte lineage. We also identify genes shared between the cerebellar lineage of origin and corresponding tumors, and genes that are tumor specific; both gene sets represent promising therapeutic targets. As a common feature among most cerebellar tumors, we observed compositional heterogeneity in terms of similarity to normal cells, suggesting that tumors arise from or differentiate into multiple points along the cerebellar “lineage of origin”.
【Amplicons】Another powerful tool for amplicon analysis, from the Quadram Institute (UK)
LotuS2: An ultrafast and highly accurate tool for amplicon sequencing analysis
In LotuS2, six different sequence clustering algorithms as well as extensive pre- and post-processing options allow for flexible data analysis by both experts, where parameters can be fully adjusted, and novices, where defaults are provided for different scenarios. We benchmarked three independent gut and soil datasets, where LotuS2 was on average 29 times faster compared to other pipelines - yet could better reproduce the alpha- and beta-diversity of technical replicate samples. Further benchmarking a mock community with known taxa composition showed that, compared to the other pipelines, LotuS2 recovered a higher fraction of correctly identified genera and species (98% and 57%, respectively). At ASV/OTU level, precision and F-score were highest for LotuS2, as was the fraction of correctly reconstructed 16S sequences.
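For readers less familiar with the metrics in this benchmark, here is a small Python sketch of how precision, recall and F-score could be computed when a pipeline's reported taxa are compared against a mock community of known composition. It is a generic illustration of the metrics, not the evaluation code used in the LotuS2 paper. The helper `precision_recall_f1` and the taxon labels are invented for this example.

```python
def precision_recall_f1(predicted, truth):
    """Score a set of reported taxa against the known mock-community taxa."""
    predicted, truth = set(predicted), set(truth)
    true_positives = len(predicted & truth)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical labels: four genera truly present, four reported by a pipeline.
truth = {"genus_a", "genus_b", "genus_c", "genus_d"}
reported = {"genus_a", "genus_b", "genus_c", "genus_x"}

precision, recall, f1 = precision_recall_f1(reported, truth)
print(precision, recall, f1)
```

Precision penalizes spurious taxa (here `genus_x`), recall penalizes missed ones (here `genus_d`), and the F-score is the harmonic mean of the two, so a pipeline only scores well if it does both jobs.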
【Well Read】Blog post by James Hadfield, a scientist at the UK biopharmaceutical company AstraZeneca: as 2022 begins, BGI moves into North America and Illumina finally faces a challenge
illumina are finally getting some NGS competition
http://enseqlopedia.com/2021/12/illumina-finally-getting-ngs-competition/?utm_campaign=coregenomicstwitter&utm_medium=twitter&utm_source=twitter
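【Beyond Bioinformatics】Stanford University: direct observation of ribosomes reveals how scanning proceeds and is regulated during eukaryotic translation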
Rapid 40S scanning and its regulation by mRNA structure during eukaryotic translation initiation
How the eukaryotic 43S preinitiation complex scans along the 5′ untranslated region (5′UTR) of a capped mRNA to locate the correct start codon remains elusive. Here, we directly track yeast 43S-mRNA binding, scanning, and 60S subunit joining by real-time single-molecule fluorescence spectroscopy. Once engaged with the mRNA, 43S scanning occurs at >100 nucleotides per second, independent of multiple cycles of ATP-hydrolysis by RNA helicases. The scanning ribosomes can proceed through RNA secondary structures, but 5′UTR hairpin sequences near start codons drive scanning ribosomes at start codons back in the 5′ direction, requiring rescanning to arrive once more at a start codon. Direct observation of scanning ribosomes provides a mechanistic framework for translational regulation by 5′UTR structures and upstream near-cognate start codons.