Statistical models and techniques for analyzing spatio-temporal data, such as air-pollution data, have grown tremendously in recent years. Spatio-temporal data arise in many other contexts as well, for example disease mapping and economic monitoring of real estate prices. The primary interest in analyzing such data is often to smooth and predict the time evolution of response variables over a spatial domain. Linear mixed effects models are the main tools for estimating population effects and making statistical inference. However, the restriction that the covariance matrix is derived from the random effects included in the model not only limits their use in practice but also reduces their efficiency and, in some cases, leads to misleading conclusions. In this project we propose a family of random mean models to overcome this problem. Sophisticated temporal and spatial correlations can be specified in such models. Spatio-temporal data are prone to irregularities, including outliers, skewness, excess zeros and measurement errors. In our previous studies we proposed modelling the left-skewed distribution of random effects in a linear mixed effects model with a log-gamma distribution, which effectively predicted patient-specific disease progression rates. We propose to extend this framework to more sophisticated settings to facilitate modelling of more complex data. When the distributions of some random effects are skewed, the estimated correlations may be biased if the random effects are modelled by normal distributions, as is usually done in routine mixed effects model fitting. Assuming a log-gamma distribution for the skewed random effects and using an upper triangular matrix for the correlations can correctly estimate not only the subject-specific processes but also their correlations.
Another issue in spatio-temporal studies is tail dependence. Multivariate normal distributions are known to be tail independent, whereas multivariate t distributions are tail dependent, with the strength of the dependence determined by the correlation coefficient (together with the degrees of freedom). We will explore this property of the multivariate t (MT) model in addition to its robustness against outliers shown in our previous studies. Similar issues may arise in spatio-temporal data analysis, where some areas show markedly faster progression than most of the others, so tail dependence is relevant in this setting as well. Random mean models, a generalization of random effects models capable of modelling autocorrelations and spatial correlations with multivariate t distributions, are proposed for spatio-temporal data analysis. Excess zeros are often observed in spatio-temporal data. Measurements of the response are also prone to error, but this has not been discussed thoroughly in the literature. We propose to study a measurement error model with a zero-truncated Poisson or binomial distribution for the response variable in spatio-temporal data analysis with discrete responses.
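The contrast between tail-independent normal models and tail-dependent t models can be made concrete with a small numerical sketch. It uses the standard closed form for the tail dependence coefficient of a bivariate t distribution with correlation rho and nu degrees of freedom, lambda = 2 * T_{nu+1}(-sqrt((nu + 1)(1 - rho) / (1 + rho))), where T_{nu+1} is the Student t distribution function. The function names below are hypothetical and chosen only for illustration; the t distribution function is approximated by Simpson integration so that only the Python standard library is needed. This is not the project's own estimation code.

```python
import math


def student_t_pdf(x: float, df: float) -> float:
    """Density of the univariate Student t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def student_t_cdf_nonpositive(x: float, df: float, n: int = 4000) -> float:
    """CDF at x <= 0, using composite Simpson integration of the density on [x, 0].

    Illustrative helper (an assumption of this sketch); n must be even.
    """
    if x > 0:
        raise ValueError("this helper only handles x <= 0")
    if x == 0:
        return 0.5
    h = -x / n
    total = student_t_pdf(x, df) + student_t_pdf(0.0, df)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * student_t_pdf(x + i * h, df)
    # By symmetry, F(x) = 0.5 - integral of the density over [x, 0].
    return max(0.0, 0.5 - total * h / 3)


def t_tail_dependence(rho: float, df: float) -> float:
    """Tail dependence coefficient of a bivariate t distribution."""
    if not -1 < rho <= 1:
        raise ValueError("rho must lie in (-1, 1]")
    arg = -math.sqrt((df + 1) * (1 - rho) / (1 + rho))
    return 2 * student_t_cdf_nonpositive(arg, df + 1)


if __name__ == "__main__":
    for df in (2, 5, 20, 200):
        print(df, round(t_tail_dependence(0.5, df), 4))
```

For a fixed correlation below one, the coefficient shrinks toward zero as the degrees of freedom grow, which matches the tail independence of the normal limit.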
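Spatio-temporal data analysis is becoming increasingly important; applications ranging from air-pollution data to forecasting real estate development all rely on statistical analysis of spatio-temporal data. This project proposes random mean models, studies their application to spatio-temporal data analysis and, building on these models, develops robust models for the various irregularities that appear in spatio-temporal data. The nature of spatio-temporal data makes the correlations along the temporal and spatial dimensions highly complex. In random mean models the temporal and spatial correlations can be specified freely, which overcomes the limitation of mixed effects models that the covariance matrix is determined by the random effects. When outliers in spatio-temporal data violate the normality assumption, we propose a multivariate t distribution model to provide an analysis that is robust to outlying observations; when the distribution of the random effects is asymmetric, we explore a log-gamma random mean model for robust analysis. When discrete spatio-temporal data contain excess zeros and the observed response is measured with error, we propose a zero-truncated Poisson random mean model and combine it with measurement error modelling techniques to study properties such as estimation bias, thereby improving spatio-temporal data analysis.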
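To make the zero-truncated Poisson distribution mentioned above concrete, the following minimal sketch evaluates its probability mass function and mean for a rate parameter lam: P(Y = k | Y > 0) = exp(-lam) lam^k / (k! (1 - exp(-lam))) for k = 1, 2, ..., with mean lam / (1 - exp(-lam)). The function names are hypothetical, and the sketch covers only the distribution itself, not the proposed random mean or measurement error model.

```python
import math


def ztp_pmf(k: int, lam: float) -> float:
    """P(Y = k | Y > 0) for Y ~ Poisson(lam); returns 0 for k < 1."""
    if lam <= 0:
        raise ValueError("lam must be positive")
    if k < 1:
        return 0.0
    # Work on the log scale to avoid overflow in lam**k and k!.
    log_p = k * math.log(lam) - lam - math.lgamma(k + 1)
    # -expm1(-lam) equals 1 - exp(-lam), computed accurately for small lam.
    return math.exp(log_p) / (-math.expm1(-lam))


def ztp_mean(lam: float) -> float:
    """Mean of the zero-truncated Poisson distribution with rate lam."""
    if lam <= 0:
        raise ValueError("lam must be positive")
    return lam / (-math.expm1(-lam))


if __name__ == "__main__":
    lam = 2.5
    probs = [ztp_pmf(k, lam) for k in range(1, 60)]
    print(sum(probs))                                            # close to 1
    print(sum(k * p for k, p in enumerate(probs, start=1)), ztp_mean(lam))
```

Because the zero count is removed, the mean is always larger than lam; the gap is largest when lam is small, which is the regime where excess zeros matter most.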
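This project investigated the properties of robust mean models based on non-normal distributions, together with several problems in the analysis of complex spatio-temporal data. It mainly addressed the following problems: using random mean models to meet the need for complex correlation assumptions in spatio-temporal data analysis; the bias in parameter estimates caused by outlying observations and the tail dependence that is ignored when random field effects are assumed to be normal; and the bias in estimated individual effects and the resulting bias in the estimated correlations of random field effects. The details are as follows.
1. Non-normal distributions for continuous proportion data: in fitting continuous proportion data with the simplex distribution, a new method for generating simplex-distributed random numbers was proposed, along with four simplex regression models suited to different situations for fitting truncated data and longitudinal data. The paper was published in the Journal of Statistical Software (JSS), whose latest impact factor is as high as 22.7, ranking first among mathematics journals and 53rd among all academic journals. The study also proposed a mixed effects model based on the log-gamma distribution, which addresses the bias in estimated individual effects, and the resulting bias in the estimated correlations of random field effects, in longitudinal data analysis. The model was applied to data from a glaucoma treatment study with good results. This work was published in the Biometrical Journal.
2. Cluster analysis of sparse categorical data based on non-normal mixture distribution models: the study formulated and analyzed in depth the problem of clustering sparse categorical data with mixture distribution models. The results were published in the Journal of Computational Statistics & Data Analysis. A new statistical test was also proposed for the problem of testing homogeneity of mixture distributions that arises in clustering sparse categorical data; this work was published in Statistical Papers.
3. Statistical methods for educational evaluation: to analyze the effect of overall kindergarten quality on children's physical and mental development, a piecewise regression model was proposed. Fitting the data revealed a threshold effect of kindergarten quality on the development of children's cognitive, social-cognitive and motor abilities. The study was published in Early Childhood Research Quarterly, the second-ranked journal in the field of early childhood education.
During the 2015-2018 project period, the research team published four SCI-indexed papers, two as first author and two as corresponding author, one of them in a journal with an impact factor of 22.7. It also published one paper in an SSCI Q2 journal, which ranks second in the field of early childhood education. In addition, the team took part in two National Social Science Fund studies and two industry-sponsored collaborative projects, applying statistical expertise in interdisciplinary collaboration with satisfactory research results. In terms of international collaboration, team members attended three major international academic conferences and gave topical presentations.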
Data last updated: 2023-05-31
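On the Influence of the Big Data Environment on the Development of Information Science
Preparation and Properties of a Light- and Electricity-Driven Biochar/Stearic Acid Composite Phase Change Material
Nonlinear Analysis and Calculation Method for the Coefficient of Earth Pressure at Rest of Coarse-Grained Soil
Study on SSVEP-Based Direct Brain Control of Robot Direction and Speed
Effects of Vegetation Restoration Patterns on Major Soil Enzyme Activities, Microbial Diversity and Soil Nutrients in the Mountainous Area of Southern Ningxia
Robust Statistical Methods in Complex Data Analysis and Their Applications
Local-Model-Based Regression Analysis Methods for Spatio-Temporal Data and Their Applications
Nonparametric Random Effects Models Combined with Image Analysis and Their Application to Clinical Medical Data
Sample Average Approximation and Regularization Methods for a Class of Stochastic Optimization Problems with Equilibrium Constraints and Their Applications in Economic Models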