Investigating the viral dark matter of the rumen ecosystem using a global virome database

Nature Communications volume 14, Article number: 5254 (2023)

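The diverse rumen virome can modulate the rumen microbiome, but it remains largely unexplored. Here, we mine viral sequences from 975 published rumen metagenomes, create a global rumen virome database (RVD), and analyze the rumen virome for diversity, virus-host linkages, and potential roles in affecting rumen functions. RVD contains 397,180 species-level viral operational taxonomic units (vOTUs) and substantially increases the detection rate of rumen viruses from metagenomes compared with IMG/VR V3. Most of the classified vOTUs belong to Caudovirales and differ from those found in the human gut. The rumen virome is predicted to infect the core rumen microbiome, including fiber degraders and methanogens, and carries diverse auxiliary metabolic genes, so it likely affects the rumen ecosystem in both a top-down and a bottom-up manner. RVD and the findings provide a useful resource and a baseline framework for future research to investigate how viruses affect the rumen ecosystem and digestive physiology.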

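A recent wave of virus-focused metagenomic studies has generated very large viral genome catalogs and databases for several ecosystems, including the oceans1,2, the human gut3,4,5, and soil6. These studies have revealed highly diverse viruses, identified numerous auxiliary metabolic genes, and shed new light on the ecological impact of viruses. In addition, studies focused on model systems have begun to reveal how viruses can reprogram the metabolism of their prokaryotic hosts by forming distinct virocells, which alter the ecological fitness and metabolism of the host7. Emerging evidence supports the potential impact of viruses on ocean biogeochemistry1,8, human physiology4, and disease states9. Comparable studies of the rumen virome, or a rumen-specific virome database, are not available.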

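The rumen harbors a diverse multi-kingdom ecosystem comprising bacteria, archaea, fungi, protozoa, and viruses. Collectively, the rumen microbiome digests and ferments otherwise indigestible feed, providing most of the energy (in the form of volatile fatty acids) and metabolizable nitrogen (in the form of microbial protein) that ruminants need to grow and to produce meat and milk. Strong associations of rumen bacteria, archaea, and protozoa with feed efficiency, methane (CH4) emissions, and animal health have been documented10, but rumen viruses remain poorly understood despite their abundance and despite virus-focused studies that have contributed to characterizing the rumen virome11,12. Early studies using electron microscopy documented morphologically diverse bacteriophages and revealed that tailed phages were predominant13,14. Early culture-dependent studies found bacteriophages able to infect a wide range of species and strains of rumen bacteria, including prevalent species of Prevotella, Ruminococcus, and Streptococcus, and most of these phages were classified on the basis of morphology into the families Myoviridae, Siphoviridae, Podoviridae, and Inoviridae (reviewed by Gilbert and Klieve15). These studies provided valuable information on rumen viruses, but because the simple morphology of phages does not allow reliable taxonomic classification, the International Committee on Taxonomy of Viruses (ICTV: https://ictv.global/taxonomy) no longer recognizes morphology-based viral classification.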

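Genomics, metagenomics, and metatranscriptomics have become the primary technologies for studying viruses, including the rumen virome. Recent culture-dependent whole-genome sequencing identified 10 phages infecting Prevotella ruminicola, Ruminococcus albus, Streptococcus bovis, and Butyrivibrio fibrisolvens, which play important roles in feed digestion and fermentation16,17. These phage genomes exhibit a modular genome organization, conserved viral genes, and both lytic and lysogenic potential17. Rumen viruses have been studied using metagenomes of virus-like particles (VLPs) (reviewed in ref. 11). However, the reference genome databases used underrepresent rumen viruses, which limits the identification and classification of rumen viruses and the prediction of their hosts. For example, rumen viruses with diverse genotypes have been found, but most of them remain unclassified because they do not match reference viral sequences18,19,20. Miller et al.18 found clustered regularly interspaced short palindromic repeats (CRISPR)/CRISPR-associated protein (Cas) elements in some rumen microbial genomes and metagenomes, but few spacer sequences matched rumen viral sequences to allow host prediction. Therefore, it has been difficult to characterize the rumen virome, particularly with respect to novel viruses.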

[…] 12-fold) and IMG/VR V3 and improving the identification of viral sequences based on rumen metagenomics, RVD will be useful as a new community resource and will provide new insights for future studies on the rumen virome and its implication in feed digestion, microbial protein synthesis, feed efficiency, and CH4 emissions.

[…] >5 kb each and clustered them into 411,125 vOTUs. After validation with VIBRANT23, we constructed a rumen virome database (RVD, download available at https://zenodo.org/record/7412085#.ZDsE2XbMK5c) representing 397,180 vOTUs (Supplementary Fig. 1), with 193,327 vOTUs of >10 kb. Checking with CheckV21 revealed 4400 complete vOTUs, 4396 high-quality vOTUs, and 32,942 medium-quality vOTUs. The completeness and quality of the RVD vOTUs were probably underestimated because CheckV is database dependent, and the databases used are primarily derived from other ecosystems. All the vOTUs in RVD meet the Minimum Information about an Uncultivated Virus Genome (MIUViG) standards25.

[…] >50% completeness of the current study and the two largest human gut virome databases (MGV4 and GPD5). For better visualization, only one representative vOTU (the longest and most complete) was included for each genus-level vOTU (714 in total). The branches were color-coded: green, the Caudovirales lineages exclusively found in the human virome; red, the lineages exclusively found in the rumen virome of the current study; blue, the lineages found in both the rumen and the human viromes. Lysogeny rates (proportion) were calculated with VIBRANT and shown as the inner ring. The number of vOTUs representing each lineage was shown as a bar plot (red for rumen viruses, and black for human viruses). d Proportion of lineages of Caudovirales viruses unique to the human intestine, unique to the rumen, and shared. e A rarefaction curve of the vOTUs identified in the rumen virome. The upward trend of the rarefaction curve indicates that more rumen viruses remain to be identified at the species level.

[…] >1 phage per host genome.
The percentage of lysogenic viruses varied among the host genera, and it was low for most host genera (Fig. 3c). Most ciliate SAGs presented multiple EVEs, among which all five SAGs of Isotricha sp. YL-2021b and Dasytricha ruminantium presented the greatest number (>50) of EVEs per SAG (Supplementary Fig. 5). Little is known about viruses infecting ciliates, and no EVEs have been reported even for model ciliate species (e.g., Tetrahymena thermophila). However, EVEs have recently been found in Entamoeba and Giardia in human stool metagenomes32. Therefore, rumen ciliates probably carry EVEs. The large number of EVEs per ciliate SAG may correspond to the high polyploidy and the enormous numbers of chromosomes found in many rumen ciliates (e.g., >10,000 in Entodinium caudatum33).

[…] >12-fold). Based on the gene-sharing network, most rumen vOTUs were clustered into four groups (Fig. 3b). Groups I (the largest) and IV (the smallest) contained more classified vOTUs than groups II and III. Groups I and IV had a broader host range among bacterial phyla, including both gram-positive and gram-negative bacteria with different niches and capacities, but few of their genera or families were predominant in the rumen. Groups II and III mainly infected Bacteroidota and Methanobacteriota, respectively (Fig. 3c), and most viruses of these two groups could not be classified with any of the current virome databases; thus, they represent new viral lineages. The narrow host range (a single phylum) of groups II and III supports the notion that phages with a high degree of gene sharing generally infect phylogenetically related hosts.

[…] >2400) and bacteriophages (>40,000) down to the species level, and many of the host species are known to play important roles in feed digestion, fermentation, and methane emissions. Advancement in the prediction of hosts and virus-host linkages will aid in understanding the ecological roles of rumen viruses.
Such information will be especially useful when both the rumen metagenome and virome are investigated for their association with major rumen functions. Among the rumen vOTUs with a predicted host match, 99.5% were inferred to infect prokaryotes primarily found in the rumen, even though most of the reference prokaryote genomes that were used came from prokaryotes in other environments, demonstrating the rigor and low false positive rate of our host prediction pipeline.

[…] >5 kb were verified using VirSorter2 (ref. 22; option: --min-score 0.5), and the contigs that passed the verification procedure were input to CheckV21 to trim off host sequences flanking prophages. We only chose viral contigs >5 kb because the currently available bioinformatics tools show a relatively high false positive rate when identifying viral contigs <5 kb30. Only the contigs falling into categories Keep1 and Keep2 were retained as putative viral contigs (708,580 in total) for further analyses.

[…] >10 kb to genus-level viral taxa based on a gene-sharing network using vConTACT2 (ref. 26), which uses NCBI RefSeq Viral (release 88) as reference genomes. The vOTUs that could be clustered with the reference genomes of a viral genus were assigned to that genus according to the vConTACT2 workflow. We assigned the vOTUs that failed to be assigned to a viral genus and those <10 kb to family-level viral taxa using the majority rule, as applied previously4. Briefly, we predicted the ORFs of each vOTU using Prodigal56 and then aligned the ORF sequences with those of NCBI RefSeq Viral using BLASTp with a bit score of ≥50. The vOTUs that were aligned with the NCBI RefSeq Viral genomes of a viral family with >50% of their protein sequences were assigned to that family.
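The majority rule for family-level assignment described above can be illustrated with a minimal Python sketch. It assumes the BLASTp hits have already been parsed into a dictionary of (family, bit score) pairs per ORF, and it assumes that each ORF votes for the family of its best-scoring hit; the function name `assign_family`, the input layout, and the best-hit voting are illustrative assumptions, not the authors' implementation.

```python
from collections import Counter


def assign_family(orf_hits, n_orfs, min_bitscore=50.0, min_fraction=0.5):
    """Assign a vOTU to a viral family by majority rule.

    orf_hits: dict mapping ORF id -> list of (family, bitscore) BLASTp hits
              (hypothetical, pre-parsed input).
    n_orfs:   total number of predicted ORFs in the vOTU.
    Returns the family name, or None if no family is supported by more
    than `min_fraction` of the ORFs.
    """
    votes = Counter()
    for hits in orf_hits.values():
        # Keep only hits that reach the bit score threshold (>= 50).
        passing = [(score, family) for family, score in hits if score >= min_bitscore]
        if passing:
            # Assumption: each ORF votes for the family of its best hit.
            votes[max(passing)[1]] += 1
    if not votes or n_orfs == 0:
        return None
    family, count = votes.most_common(1)[0]
    # Strict majority: more than 50% of the vOTU's proteins.
    return family if count / n_orfs > min_fraction else None


example_hits = {
    "orf1": [("Siphoviridae", 120.0), ("Myoviridae", 60.0)],
    "orf2": [("Siphoviridae", 80.0)],
    "orf3": [("Myoviridae", 40.0)],  # below the bit score threshold
}
print(assign_family(example_hits, n_orfs=3))
```

In this toy input, two of three ORFs support Siphoviridae, so the vOTU is assigned to that family; with four ORFs in total the same two votes would not exceed 50% and the vOTU would stay unassigned.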
We identified crAss-like phages using BLASTn against 2,478 crAss-like phage genomes identified in previous studies57,58,59, with a threshold of ≥80% sequence identity along ≥50% of the length of previously identified crAss-like vOTUs.

[…] >50% were included in the search. We then aligned each of the marker genes from the three databases using MAFFT62, removed the positions with >50% gaps using trimAl63, concatenated the aligned marker genes, and filled the gaps where a marker gene was absent. Only the concatenated marker genes that each contained >3 marker genes and were found in >5% of all the aligned concatemers were retained, resulting in 10,203 Caudovirales marker gene concatemers, each with 13,573 alignment columns. These marker gene concatemers were clustered into genus-level vOTUs as described previously5, where benchmarking was performed to achieve high taxonomic homogeneity using NCBI RefSeq Viral genomes. We built a phylogenetic tree of Caudovirales viruses using FastTree v.2.1.9 (option: -mlacc 2 -slownni -wag)64 from the aligned concatenated marker genes of the representative vOTU sequences of all the genus-level vOTUs with genome completeness >50% (based on CheckV analysis). The Caudovirales tree was visualized using iTOL65. The vOTUs identified as prophages or encoding an integrase were considered lysogenic. The lysogenic rate (%) was calculated based on the VIBRANT results as the percentage of lysogenic viruses among all the viruses for each genus of their probable hosts.

[…] >2,500 bp of a host genome or MAG matched a vOTU sequence at >90% sequence identity over 75% of the vOTU sequence length4. We predicted probable protozoal hosts of the rumen viruses by searching the 52 high-quality ciliate SAGs68 for EVEs using BLASTn and the above criteria.

[…] >10 kb (5912 in total) for AMG identification using the criteria recommended in a benchmarking paper30.
The selected vMAGs were then subjected to AMG identification and genome annotation using DRAMv72 after processing with VirSorter2 with the option "--prep-for-dramv" applied. Second, the AMG-carrying vMAGs were removed if the AMGs were at an end of the vMAGs or if the AMGs were not flanked by both one viral hallmark gene and one viral-like gene or by two viral hallmark genes (category 1 and category 2 as determined by DRAMv). Third, the remaining vMAGs were further manually curated based on the criteria specified in the VirSorter2 SOP (https://doi.org/10.17504/protocols.io.bwm5pc86; also see https://github.com/yan1365/RVD/blob/main/vmags_check_helper/readme.txt). We eventually obtained 1,880 vMAGs. To further minimize false identification, we manually checked the genomic context of these vMAGs and found that some of them were still possible genomic islands. Therefore, we filtered the 1,880 vMAGs based on the criteria established by Sun and Pratama et al. (unpublished data). Briefly, vMAGs with only integrases/transposases, tail fiber genes, or any nonviral genes were removed. The remaining vMAGs were filtered again to remove those that did not have at least one of the viral structural genes (i.e., capsid protein, portal protein, phage coat protein, baseplate, head protein, tail protein, virion structural protein, and terminase) and those containing genes encoding an endonuclease, plasmid stability protein, lipopolysaccharide biosynthesis enzyme, glycosyltransferase (GT) families 11 and 25, nucleotidyltransferase, carbohydrate kinase, or nucleotide sugar epimerase. We eventually obtained 504 vMAGs free of genomic islands. To benchmark our curation pipeline, 100 of the vMAGs were randomly selected for detailed manual curation based on their genomic context. According to the benchmarking results, we were confident that we retained only complete vMAGs for AMG prediction.
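The second genomic-island filtering step described above (require at least one viral structural gene, reject vMAGs carrying any of the listed nonviral genes) might look like the following Python sketch. It assumes gene annotations are available as free-text product descriptions and that simple case-insensitive keyword matching is adequate; the keyword lists, the function name `passes_island_filter`, and the matching strategy are illustrative assumptions, since the actual criteria are cited as unpublished.

```python
# Keywords derived from the gene categories named in the text; matching
# annotation strings by substring is an assumption of this sketch.
STRUCTURAL_KEYWORDS = (
    "capsid", "portal", "coat protein", "baseplate",
    "head protein", "tail protein", "virion structural", "terminase",
)
EXCLUDED_KEYWORDS = (
    "endonuclease", "plasmid stability", "lipopolysaccharide biosynthesis",
    "glycosyltransferase family 11", "glycosyltransferase family 25",
    "nucleotidyltransferase", "carbohydrate kinase",
    "nucleotide sugar epimerase",
)


def passes_island_filter(annotations):
    """Return True if a vMAG's gene annotations pass the filter.

    annotations: list of product description strings, one per gene
                 (hypothetical input format).
    """
    lowered = [a.lower() for a in annotations]
    # Require at least one viral structural gene.
    has_structural = any(k in a for a in lowered for k in STRUCTURAL_KEYWORDS)
    # Reject if any gene belongs to an excluded category.
    has_excluded = any(k in a for a in lowered for k in EXCLUDED_KEYWORDS)
    return has_structural and not has_excluded


print(passes_island_filter(["Major capsid protein", "Terminase large subunit"]))
print(passes_island_filter(["Portal protein", "HNH endonuclease"]))
```

Keyword matching of this kind is only a first pass; the text indicates that the retained vMAGs were also checked manually against their genomic context.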
Detailed results of each curation step, the full annotation of the final vMAGs, and the annotation of the identified AMGs are presented in Supplementary Data 4. We compared the AMGs identified in the rumen virome to the previously identified AMGs from other viromes, which are available in an expert-curated AMG database (https://github.com/WrightonLabCSU/DRAM/blob/master/data/amg_database.tsv). For the newly identified AMGs, we double-checked the annotations and searched the literature to ensure that they were truly AMGs.

[…] >50% concentrate). First, we transformed the raw abundance table into a binary matrix (presence or absence). Then, the prevalence of each vOTU across samples was calculated. A vOTU was included in the core rumen virome if it was present in more than 50% of the samples of a given concentrate level or of all cattle. Based on prevalence, the vOTUs were categorized as individualized (observed in only one sample), one concentrate level (observed in more than one sample but exclusively from a single concentrate level), two concentrate levels (observed in animals from two concentrate levels), and three concentrate levels (observed in all three concentrate levels). The numbers of vOTUs shared by the core viromes among the three concentrate levels were visualized with a Venn diagram in R. We examined whether animals on the same diet or of the same breed share more vOTUs than animals fed different diets or of different breeds, using subsets of data from Stewart et al.78 and Li et al.79, respectively. The Kruskal–Wallis test was used to compare the numbers of shared vOTUs in different groups in R.

[…] >12 metagenomes were retained for the analysis. The number of vOTUs shared by two studies was compared for every study pair, and the results were subjected to hierarchical clustering. The hierarchical clustering results were visualized in R with the ComplexHeatmap package81 and annotated according to the metadata.
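The prevalence-based core virome definition and the vOTU categories described earlier in this section can be sketched in Python as follows. The analysis in the text was done in R; this sketch only illustrates the logic. The input layout (a nested dictionary of abundances), the concentrate level labels "low", "medium", and "high", and all function names are hypothetical.

```python
def to_presence(abundance):
    """Convert an abundance table (vOTU -> {sample: value}) into
    presence sets (vOTU -> set of samples where value > 0)."""
    return {v: {s for s, x in row.items() if x > 0} for v, row in abundance.items()}


def core_votus(presence, samples, threshold=0.5):
    """vOTUs present in more than `threshold` of the given samples."""
    samples = set(samples)
    if not samples:
        return set()
    return {
        v for v, present in presence.items()
        if len(present & samples) / len(samples) > threshold
    }


LEVEL_LABELS = {
    1: "one concentrate level",
    2: "two concentrate levels",
    3: "three concentrate levels",
}


def categorize(presence, sample_level):
    """Label each observed vOTU by how widely it is distributed.

    sample_level: dict mapping sample id -> concentrate level label.
    """
    categories = {}
    for v, present in presence.items():
        if not present:
            continue  # never observed; no category
        if len(present) == 1:
            categories[v] = "individualized"
        else:
            n_levels = len({sample_level[s] for s in present})
            categories[v] = LEVEL_LABELS[n_levels]
    return categories


abundance = {
    "v1": {"s1": 3, "s2": 0, "s3": 0, "s4": 0},
    "v2": {"s1": 1, "s2": 2, "s3": 0, "s4": 0},
    "v3": {"s1": 1, "s2": 1, "s3": 1, "s4": 0},
}
sample_level = {"s1": "low", "s2": "low", "s3": "medium", "s4": "high"}
presence = to_presence(abundance)
print(sorted(core_votus(presence, sample_level)))      # core across all samples
print(sorted(core_votus(presence, ["s1", "s2"])))      # core within the "low" level
print(categorize(presence, sample_level))
```

Calling `core_votus` once per concentrate level and once for all samples yields the sets whose overlaps would then be drawn as a Venn diagram.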