High-quality genome assemblies and complete gene structures are the primary goals of whole-genome projects. Developing new assembly strategies that improve genome completeness is an important methodological problem in genome assembly. Scaffolding with large-insert DNA libraries is the common way to improve completeness, but it has drawbacks, and it does not substantially improve completeness for highly polymorphic aquatic species. New assembly strategies are therefore needed. In low-quality assemblies, genes that are split across contigs can be used to scaffold the genome. However, few such methods exist, and their accuracy is either low or unknown, which limits their use in genome scaffolding. We previously proposed a scaffolding model based on the highest weight in a linkage matrix and applied it successfully in a high-accuracy algorithm that scaffolds genomes with single-end transcriptome reads. Building on this model, this project will implement two methods: scaffolding genomes with paired-end RNA-seq reads, and scaffolding genomes with proteins. We will systematically study how different parameters affect assembly efficiency and accuracy. Finally, both methods will be used to update the common carp genome to demonstrate their usability. The study will supplement current genome assembly methodology and provide reliable, easy-to-use solutions for obtaining high-quality genomes of aquatic species. Both methods are of practical value and have broad application prospects.
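The idea of scaffolding from the highest weight in a linkage matrix can be illustrated with a minimal sketch. The abstract does not give the algorithm's details, so everything below is an assumption made for illustration: the function names `linkage_weights` and `select_links`, the `min_weight` threshold, and the greedy acceptance rule (each contig joins at most two neighbours, no cycles) are hypothetical and are not taken from the project's software. The input is a list of contig pairs, one pair per read or protein whose alignment spans two contigs.

```python
from collections import Counter


def linkage_weights(spanning_pairs):
    """Count supporting reads/proteins for each unordered contig pair."""
    weights = Counter()
    for a, b in spanning_pairs:
        if a != b:  # an alignment within one contig gives no link
            weights[tuple(sorted((a, b)))] += 1
    return weights


def select_links(weights, min_weight=2):
    """Greedily accept links from highest to lowest weight (illustrative rule).

    A link is accepted only if both contigs still have a free end
    (fewer than two accepted links) and joining them creates no cycle.
    """
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    degree = Counter()
    chosen = []
    ordered = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))
    for (a, b), w in ordered:
        if w < min_weight:
            break  # remaining links are weaker still
        if degree[a] >= 2 or degree[b] >= 2:
            continue  # one contig already has both ends joined
        root_a, root_b = find(a), find(b)
        if root_a == root_b:
            continue  # would close a cycle within one scaffold
        parent[root_a] = root_b
        degree[a] += 1
        degree[b] += 1
        chosen.append((a, b, w))
    return chosen


# Toy input: each tuple stands for one read or protein spanning two contigs.
pairs = (
    [("c1", "c2")] * 5
    + [("c2", "c3")] * 3
    + [("c1", "c3")] * 2
    + [("c3", "c4")] * 1
)
links = select_links(linkage_weights(pairs))
```

In this toy input the weaker `c1`/`c3` link is rejected because it conflicts with the two stronger links, and the single-read `c3`/`c4` link falls below the threshold. A real scaffolder would also need contig orientation and ordering, which this sketch leaves out.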
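High-quality whole-genome assemblies and complete gene structures are the primary goals of structural genomics. Developing new assembly strategies that improve genome completeness is an important methodological problem in genome assembly. Building long-insert DNA libraries for assembly is the common way to improve completeness, but it has drawbacks, and for highly heterozygous aquatic species it does not substantially improve completeness. New assembly strategies are therefore needed. Building on our previously proposed linkage-matrix optimum model, this project developed two algorithms: PEP_scaffolder, which scaffolds whole genomes with proteins, and P_RNA_scaffolder, which scaffolds whole genomes with paired-end transcriptome sequencing data. PEP_scaffolder uses homologous proteins to scaffold the genome; it is more accurate than comparable algorithms and faster than existing ones. P_RNA_scaffolder uses paired-end RNA-seq data; among comparable algorithms it has the highest scaffolding accuracy and the fastest running time, and the completeness of transcribed regions in its output approaches that of a finished genome. Using these two tools, we updated the common carp genome assembly, raising contig N50 by 24%. With the support of this project, we published 4 SCI papers, including the PEP_scaffolder paper in Bioinformatics (IF=7.3), a top-quartile bioinformatics journal, applied for 2 invention patents, and obtained 3 software copyrights. The reassembled common carp genome resource is available at http://www.fishbrowser.org/database/Commoncarp_genome/. This study supplements existing genome scaffolding methods and provides reliable, easy-to-use solutions for obtaining high-quality genome assemblies of aquatic species, with important practical value and broad application prospects.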
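For readers unfamiliar with the metric behind the reported 24% gain, N50 is the length L such that sequences of length at least L cover at least half of the total assembly length. The sketch below shows one common way to compute it and to express a relative improvement. The function names and the toy lengths are illustrative assumptions, not data or code from the project.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths (0 if empty)."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Stop at the first length where the cumulative sum covers half.
        if running * 2 >= total:
            return length
    return 0


def relative_gain(before, after):
    """Fractional improvement of `after` over `before`."""
    return (after - before) / before


# Toy lengths only: joining the 2 and 3 units into one sequence
# of 5 leaves the total length unchanged at 20.
before = [2, 3, 4, 5, 6]
after = [4, 5, 5, 6]
gain = relative_gain(n50(before), n50(after))
```

Joining sequences does not always raise N50: in this toy case the value stays at 5, so `gain` is 0.0. A reported increase therefore reflects joins among the sequences that determine the half-coverage point.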
Data last updated: 2023-05-31