Nonparametric regression models relax the strict assumptions of classical regression models and accommodate data from any form of distribution. They do not fix the functional form of the model in advance; in particular, they relax the assumption of a linear relationship between the responses and the explanatory variables. They therefore extend linear models and make modeling more adaptable. To improve on the least squares (LS) estimate, a penalized sum of squares is set up. Minimizing it yields the penalized least squares estimator of the regression function, which is a compromise between goodness of fit and smoothness.
There are few quantitative indices that measure the level of gene expression. The distributions of these indices are unknown, and the patterns of dependence between the level of gene expression and the factors influencing it are not well defined. As a result, some of the strict assumptions underlying the classical theory of linear models are not satisfied. When data do not meet the conditions of classical statistical approaches, inferences drawn from those approaches are adversely affected to varying degrees and may even lead to erroneous conclusions. Nonparametric regression models can therefore help solve statistical problems in genome informatics.
This project aims to establish nonparametric regression models for analyzing gene expression regulation networks. Based on cubic splines and the roughness penalty approach, a set of theories and algorithms is proposed for various cases of nonparametric regression analysis. We explore the smoothing spline, the weighted nonparametric regression model, the semiparametric regression model and the multidimensional nonparametric regression model, taking weights, ties and covariates into account. We provide the cross-validation (CV) and generalized cross-validation (GCV) score functions. The optimal values of the parameters of interest can be obtained by a module form search method.
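The penalized least squares estimator and the CV and GCV scores described above can be illustrated with a minimal Python sketch. It is not the project's software. It assumes distinct, strictly increasing design points, equal weights, and the standard natural cubic spline roughness matrix K = Q R^{-1} Q^T, so that the fitted values are (I + alpha K)^{-1} y. The function names (`roughness_matrix`, `fit_smoothing_spline`, `select_alpha_by_gcv`) are hypothetical, and a plain grid search stands in for the search method named in the abstract.

```python
import numpy as np


def roughness_matrix(t):
    """Return K such that g^T K g equals the integral of g''(t)^2 for the
    natural cubic spline taking values g at the strictly increasing knots t."""
    t = np.asarray(t, dtype=float)
    n = t.size                      # needs n >= 3
    h = np.diff(t)                  # knot spacings, all assumed > 0
    Q = np.zeros((n, n - 2))        # second-difference operator
    R = np.zeros((n - 2, n - 2))    # symmetric tridiagonal band matrix
    for j in range(n - 2):
        Q[j, j] = 1.0 / h[j]
        Q[j + 1, j] = -1.0 / h[j] - 1.0 / h[j + 1]
        Q[j + 2, j] = 1.0 / h[j + 1]
        R[j, j] = (h[j] + h[j + 1]) / 3.0
        if j < n - 3:
            R[j, j + 1] = R[j + 1, j] = h[j + 1] / 6.0
    return Q @ np.linalg.solve(R, Q.T)


def fit_smoothing_spline(t, y, alpha):
    """Minimize sum (y_i - g_i)^2 + alpha * g^T K g for alpha > 0.

    Returns the fitted values and the CV and GCV scores."""
    y = np.asarray(y, dtype=float)
    n = y.size
    K = roughness_matrix(t)
    A = np.linalg.inv(np.eye(n) + alpha * K)    # hat matrix A(alpha)
    g = A @ y
    resid = y - g
    cv = np.mean((resid / (1.0 - np.diag(A))) ** 2)
    gcv = np.mean(resid ** 2) / (1.0 - np.trace(A) / n) ** 2
    return g, cv, gcv


def select_alpha_by_gcv(t, y, alphas):
    """Grid search (an assumption of this sketch): pick the alpha with the
    smallest GCV score among the candidates."""
    scores = [fit_smoothing_spline(t, y, a)[2] for a in alphas]
    return alphas[int(np.argmin(scores))]


# Simulated example: noisy sine curve on an even grid.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 40)
y = np.sin(2.0 * np.pi * t) + rng.normal(scale=0.2, size=t.size)
alphas = np.logspace(-8, 0, 30)
best = select_alpha_by_gcv(t, y, alphas)
g_hat, cv, gcv = fit_smoothing_spline(t, y, best)
print(best, cv, gcv)
```

A small alpha follows the data closely, while a large alpha pulls the fit toward a straight line, which is the compromise between goodness of fit and smoothness. Weights and ties, which the project also treats, are left out of this sketch.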
The nonparametric regression models are verified and assessed through statistical simulations and examples. A computational method for measuring codon usage bias is proposed, and the codon usage frequencies of two known yeasts are analyzed using Relative Synonymous Codon Usage (RSCU), from which the optimal codons of highly expressed genes are determined. An RSCU-based quantitative statistic, the Codon Adaptation Index (CAI), is proposed to measure the level of gene expression. The regression relationship between CAI for yeast and factors such as codon usage bias, third-base composition and the linear correlation of codon usage with tRNA abundance is examined. Software implementing the nonparametric regression models has been developed.
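The RSCU and CAI statistics mentioned above can be sketched in Python using their usual definitions: RSCU is a codon's observed count divided by the count expected if all synonymous codons were used equally, and CAI is the geometric mean of each codon's relative adaptiveness (its count divided by that of the most used synonymous codon in a reference set of highly expressed genes). This is a minimal illustration, not the project's computational method. It assumes the standard genetic code, the function names are hypothetical, and the pseudocount of 0.5 for codons absent from the reference set is an assumption.

```python
import math
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Synonymous families; stop codons and single-codon amino acids (Met, Trp)
# are excluded because they carry no usage-bias information.
FAMILIES = {}
for _codon, _aa in CODON_TABLE.items():
    if _aa != "*":
        FAMILIES.setdefault(_aa, []).append(_codon)
FAMILIES = {aa: cs for aa, cs in FAMILIES.items() if len(cs) > 1}


def split_codons(seq):
    """Split a coding sequence into codons, dropping any trailing partial codon."""
    seq = seq.upper().replace("U", "T")
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def rscu(counts):
    """RSCU = observed count / (family total / number of synonymous codons)."""
    out = {}
    for codons in FAMILIES.values():
        total = sum(counts.get(c, 0) for c in codons)
        if total == 0:
            continue                      # amino acid not present
        for c in codons:
            out[c] = counts.get(c, 0) * len(codons) / total
    return out


def relative_adaptiveness(reference_counts, pseudocount=0.5):
    """w = count / max count within each synonymous family of the reference set.

    Zero counts are replaced by `pseudocount` (an assumption of this sketch)."""
    w = {}
    for codons in FAMILIES.values():
        adj = {c: reference_counts.get(c, 0) or pseudocount for c in codons}
        top = max(adj.values())
        for c in codons:
            w[c] = adj[c] / top
    return w


def cai(seq, w):
    """Geometric mean of w over the codons of seq that belong to a family."""
    logs = [math.log(w[c]) for c in split_codons(seq) if c in w]
    if not logs:
        raise ValueError("sequence has no codons with synonymous alternatives")
    return math.exp(sum(logs) / len(logs))


# Toy usage with made-up sequences (not real yeast data).
reference = Counter(split_codons("GCTGCTGCCAAGAAGAAA"))
weights = relative_adaptiveness(reference)
print(rscu(reference)["GCT"], cai("GCTGCCAAG", weights))
```

In practice the reference counts would come from a set of genes known to be highly expressed, and the resulting CAI values would serve as the response in the regression analysis.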
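This project aims to establish a set of nonparametric regression models suited to analyzing gene expression regulation networks. The properties of the models are set out theoretically. The objective function of the models is constructed with the roughness penalty method, the regression relationship between the level of gene expression and the base composition at different codon positions (and the correlations among them) is demonstrated, and several statistics for analyzing gene expression regulation are proposed. Covariates are included in the models to control the interference of non-regulatory factors with the estimates of the core parameters, and high-dimensional nonparametric regression models are established.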
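For orientation, an objective function built with the roughness penalty and including covariates is commonly written in the following form. The notation is assumed here and is not taken from the project: y_i is the response, x_i the covariate vector with coefficients beta, g the smooth function of the explanatory variable t_i, w_i a weight, and alpha > 0 the smoothing parameter.

```latex
\[
S_W(\beta, g) \;=\; \sum_{i=1}^{n} w_i \,\bigl\{ y_i - x_i^{\mathsf T}\beta - g(t_i) \bigr\}^{2}
\;+\; \alpha \int \bigl\{ g''(t) \bigr\}^{2}\, dt
\]
```

Setting beta = 0 and all w_i = 1 gives the ordinary penalized sum of squares, whose minimizer over smooth g is a natural cubic spline with knots at the distinct t_i.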
Data updated: 2023-05-31
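Variable selection for semiparametric/nonparametric regression models
Nonparametric and semiparametric quantile regression models and their applications
Bayes analysis in variance component models and a study of extreme points in nonparametric regression
Monitoring methods for change points in nonparametric regression models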