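Research on Fidelity-Preserving Dimensionality Reduction Methods for High-Dimensional Data

Basic information

Grant number: 61471182

Project type: General Program

Funding amount: 75.00

Principal investigator: 祁云嵩

Discipline classification:

Host institution: 江苏科技大学

Approval year: 2014

Completion year: 2018

Project period: 2015-01-01 - 2018-12-31

Project status: Completed

Project participants: 杨习贝, 于化龙, 史金龙, 束鑫, 王东升, 胡兴旺, 陈继磊

Keywords:

data mining; feature selection; data filtering

Final report abstract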

Existing feature dimensionality reduction methods fall roughly into two classes: feature extraction and feature selection. In feature extraction, the original features in the measurement space are mapped into a new, lower-dimensional space by a specified transformation. Although the significant variables found in the new space are related to the original variables, their physical interpretation in terms of the original variables may be lost, so feature extraction changes the description of the original data. Feature selection, by contrast, seeks an optimal or suboptimal subset of the original features that preserves the main information carried by the complete data, in order to support later analysis of high-dimensional problems. Because only a subset of the original features is kept, features judged insignificant or redundant are discarded. Almost none of the existing dimensionality reduction methods is therefore a high-fidelity method: their results suit only a specific downstream data analysis task, so the reduction amounts to preprocessing for that one task. This project studies the high-fidelity dimensionality reduction problem, in which the result should retain all useful information while eliminating irrelevant features from the original data. The project is carried out using multiple hypothesis testing. The research covers feature correlation analysis, threshold estimation for hypothesis tests, estimation of the proportion of null hypotheses, interval-valued data analysis, and related topics. The research has practical significance for big data analysis.

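Existing feature dimensionality reduction methods fall roughly into two classes: feature extraction and feature selection. In feature extraction, a data transformation maps the original features into a lower-dimensional space. The extracted features are related to the original ones but no longer carry their physical meaning; that is, feature extraction changes how the original data are represented. Feature selection, by contrast, chooses a subset of the original feature set and leaves out features that are redundant or only weakly relevant to the data analysis task, which may cause information loss. Almost none of the existing dimensionality reduction methods is therefore fidelity-preserving: the reduced data suit only a specific downstream analysis task, so the reduction amounts to preprocessing for that task. This project explores a class of high-fidelity dimensionality reduction methods whose results aim to retain all (desired) original features of the original data while removing irrelevant features as far as possible. The research relies on multiple hypothesis testing and covers key techniques including feature correlation analysis, threshold estimation for hypothesis tests, estimation of the proportion of null hypotheses, and the processing and analysis of interval values. The results are of practical value for big data cleaning and storage.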

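Project abstract

Existing feature dimensionality reduction methods fall roughly into two classes: feature extraction and feature selection. In feature extraction, a data transformation maps the original features into a lower-dimensional space. The extracted features are related to the original ones but no longer carry their physical meaning; that is, feature extraction changes how the original data are represented. Feature selection, by contrast, chooses a subset of the original feature set and leaves out features that are redundant or only weakly relevant to the data analysis task, which may cause information loss. Almost none of the existing dimensionality reduction methods is therefore fidelity-preserving: the reduced data suit only a specific downstream analysis task, so the reduction amounts to preprocessing for that task. This project is devoted to exploring high-fidelity dimensionality reduction methods whose results aim to retain all (desired) original features of the original data while removing irrelevant features as far as possible. The research relies mainly on multiple hypothesis testing and covers key techniques including feature correlation analysis, threshold estimation for hypothesis tests, estimation of the proportion of null hypotheses, and the processing and analysis of interval values. The results are of practical value for big data cleaning and storage. With the project's funding, team members published more than ten high-level theoretical research papers, applied the theoretical results to commissioned research projects such as radar signal processing, and filed related invention patents.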
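The abstract names multiple hypothesis testing, threshold estimation, and null-proportion estimation as the tools for removing irrelevant features, but it does not describe the actual procedure. The following is only a minimal sketch of how such a filter could look, under assumptions that are not taken from the project: a per-feature Pearson correlation test against a target variable (with a Fisher z approximation for the p-value), a Storey-type estimate of the null proportion, and the Benjamini-Hochberg step-up rule for the threshold. All function names and parameters here are hypothetical.

```python
import math
import random


def pearson_r(x, y):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        return 0.0  # a constant column carries no correlation signal
    return sxy / math.sqrt(sxx * syy)


def correlation_p_value(r, n):
    """Two-sided p-value for H0: rho = 0 via the Fisher z approximation (needs n > 3)."""
    r = max(min(r, 0.999999), -0.999999)  # keep atanh finite
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2.0))


def estimate_pi0(p_values, lam=0.5):
    """Storey-type estimate of the proportion of true null hypotheses (assumed choice)."""
    m = len(p_values)
    above = sum(1 for p in p_values if p > lam)
    return min(1.0, max(above / ((1.0 - lam) * m), 1.0 / m))


def bh_threshold(p_values, alpha=0.05, pi0=1.0):
    """Largest p-value rejected by the pi0-adjusted Benjamini-Hochberg step-up rule."""
    m = len(p_values)
    threshold = 0.0  # stays 0.0 when nothing is rejected
    for rank, p in enumerate(sorted(p_values), start=1):
        if p <= rank * alpha / (pi0 * m):
            threshold = p
    return threshold


def filter_features(columns, target, alpha=0.05):
    """Keep features whose null hypothesis 'uncorrelated with target' is rejected."""
    n = len(target)
    p_values = [correlation_p_value(pearson_r(col, target), n) for col in columns]
    pi0 = estimate_pi0(p_values)
    cutoff = bh_threshold(p_values, alpha, pi0)
    kept = [j for j, p in enumerate(p_values) if p <= cutoff]
    return kept, pi0, cutoff


if __name__ == "__main__":
    rng = random.Random(0)
    n = 200
    target = [rng.gauss(0, 1) for _ in range(n)]
    # Synthetic data: features 0-4 depend on the target, features 5-49 are noise.
    columns = [[t + rng.gauss(0, 1) for t in target] for _ in range(5)]
    columns += [[rng.gauss(0, 1) for _ in range(n)] for _ in range(45)]
    kept, pi0, cutoff = filter_features(columns, target)
    print("kept feature indices:", kept)
    print("estimated null proportion:", round(pi0, 3))
```

In this sketch a feature is kept whenever its test is rejected, so the filter errs toward retaining features rather than picking a small task-specific subset. A linear correlation test is only one possible relevance criterion, and the interval-valued analysis mentioned in the abstract is not covered here.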

Project outputs

No outputs of this type are available.

Data last updated: 2023-05-31

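Similar NSFC grants

Research on Resource-Constrained Dimensionality Reduction Methods for High-Dimensional Streaming Data

Grant number: 60973103

Approval year: 2009

Principal investigator: 靳晓明

Discipline classification: F0607

Funding amount: 31.00

Project type: General Program

Dimensionality Reduction for High-Dimensional Data with Complex Structure

Grant number: 11471030

Approval year: 2014

Principal investigator: 赵俊龙

Discipline classification: A0402

Funding amount: 60.00

Project type: General Program

Methods, Theory, and Applications of Dimensionality Reduction Analysis for High-Dimensional Data with Missing Data

Grant number: 11171331

Approval year: 2011

Principal investigator: 王启华

Discipline classification: A0403

Funding amount: 40.00

Project type: General Program

Research on Semi-Supervised Clustering Methods for Ensemble Dimensionality Reduction of High-Dimensional Data

Grant number: 61105048

Approval year: 2011

Principal investigator: 曾洪

Discipline classification: F0603

Funding amount: 24.00

Project type: Young Scientists Fund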
