Design of a Big-Data-Based Data Mining Algorithm: The K-Means Algorithm (task statement, proposal report, thesis of about 16,000 characters)
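Abstract (Chinese version)
Data mining is an effective approach to data analysis: it extracts and mines valuable information from large volumes of data. Among the many data mining methods, K-Means clustering is one of the most common and most classic. It is widely applied across many fields, and researchers continue to propose refined models and improved algorithms.
This thesis studies the K-Means clustering algorithm. It first briefly introduces the relevant data processing techniques and algorithms and explains the basic concepts. It then focuses on studying and improving K-Means: it analyzes the strengths and weaknesses of K-Means clustering, proposes an improved algorithm based on that analysis, and verifies it experimentally.
In the course of the study, the thesis draws on data mining theory to analyze in detail data mining technology and its applications in China and abroad. Building on cluster analysis theory and its algorithms, it then analyzes K-Means and its strengths and weaknesses in detail and proposes an improved algorithm that targets two known problems: sensitivity to isolated points (outliers) and the random choice of initial cluster centers. The improved algorithm uses outlier analysis, which rests on a rigorous mathematical and statistical foundation and spares the user from having to set a threshold. It also introduces an initial-clustering step that first groups the relatively concentrated data, so that the objects within each cluster are closely similar. The improved K-Means thus performs outlier detection and, through the initial clustering, reduces interference with the clustering result, lowers the chance of becoming trapped in a local optimum, and can reduce the number of iterations. Finally, experiments on the iris data set verify the improved algorithm: the results show that it achieves better accuracy and faster convergence than the original algorithm.
Keywords: data mining, clustering algorithm, K-Means algorithm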
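For orientation, the baseline that the thesis sets out to improve can be sketched as follows. This is a minimal illustration of standard K-Means (Lloyd's iteration) with randomly selected initial centers, not the thesis's own code; the function name `kmeans`, its parameters, and the choice to keep an empty cluster's previous center are assumptions made for this example.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Plain K-Means (Lloyd's iteration) with randomly chosen initial centers.

    Hypothetical helper for illustration only. Returns the final centers,
    the cluster label of each point, and the number of iterations run.
    """
    rng = random.Random(seed)
    # Random initial centers: the step the thesis identifies as a weakness.
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    iteration = 0
    for iteration in range(1, max_iter + 1):
        # Assignment step: each point joins its nearest center.
        labels = [
            min(range(k), key=lambda j: math.dist(p, centers[j]))
            for p in points
        ]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append([sum(c) / len(members) for c in zip(*members)])
            else:
                # Assumption: an empty cluster keeps its previous center.
                new_centers.append(centers[j])
        if new_centers == centers:
            break  # centers stopped moving, so the assignment is stable
        centers = new_centers
    return centers, labels, iteration
```

Calling `kmeans(points, k=3)` on a list of equal-length numeric tuples returns the centers, labels, and iteration count. Because the initial centers depend on `seed`, different seeds can give different results on less cleanly separated data, which is the sensitivity the abstract describes.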
Abstract
Data mining is the non-trivial process of extracting valid, novel, potentially useful, credible, and ultimately understandable patterns from large amounts of data. Cluster analysis is one analytical method of data mining, and K-Means is one of the most classic and most widely used clustering methods; many improved models based on it have since been proposed.
This thesis studies the K-Means clustering algorithm. It first introduces the concepts of clustering algorithms. It then analyzes K-Means in depth: it systematically presents the basic theory of clustering and cluster mining, and proposes an improved method that addresses the limitations of the K-Means algorithm.
First, the thesis describes the current state of research on clustering algorithms at home and abroad. It also briefly covers data mining theory, including the concept of data mining and its steps.
Next, building on the concepts of clustering and the theory of cluster analysis, the thesis explains the K-Means algorithm and analyzes its advantages and disadvantages. To address the original algorithm's sensitivity to isolated points and its randomly selected initial cluster centers, it proposes an improved K-Means clustering algorithm based on outlier analysis. The outlier analysis follows the statistical rule that "data whose Z-score (standard score) has an absolute value greater than 2 are treated as isolated points"; this approach has a rigorous mathematical basis and avoids requiring the user to set a threshold in advance. The strategy for determining the initial cluster centers is to carve out the relatively concentrated data first in each round, which ensures that the data objects assigned to each cluster are highly similar. Outlier detection reduces the effect of isolated points on the clustering result, and the initial-center strategy lowers the chance of converging to a local optimum and, to some extent, reduces the number of iterations. Finally, experiments on the iris data set verify the improved algorithm; they show that its performance is considerably better than that of the original K-Means algorithm.
Key Words: Data Mining, Clustering Algorithm, K-Means Clustering
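The Z-score rule quoted in the abstract can be illustrated with a short sketch. The abstract does not say how the score is computed for multi-dimensional data, so this example assumes a per-feature Z-score (using the population standard deviation) and treats a point as isolated if any feature exceeds the threshold. The function name `split_outliers` and that per-feature interpretation are assumptions for illustration, not the thesis's stated procedure.

```python
import statistics


def split_outliers(points, threshold=2.0):
    """Separate points with |Z-score| > threshold in any feature.

    Hypothetical helper: the per-feature interpretation is an assumption;
    the threshold of 2 follows the rule quoted in the abstract.
    """
    columns = list(zip(*points))
    means = [statistics.mean(col) for col in columns]
    stdevs = [statistics.pstdev(col) for col in columns]
    kept, outliers = [], []
    for p in points:
        # Z-score per feature; a zero-variance feature contributes 0.
        z = [(x - m) / s if s > 0 else 0.0 for x, m, s in zip(p, means, stdevs)]
        if any(abs(v) > threshold for v in z):
            outliers.append(p)
        else:
            kept.append(p)
    return kept, outliers
```

Under this reading, the points in `kept` would be passed on to clustering, while those in `outliers` would be set aside so that they do not pull the cluster centers away from the dense regions. Fixing the threshold at 2 is what removes the need for a user-supplied cutoff.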
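Contents
Abstract (Chinese version)
Abstract
Chapter 1 Introduction
1.1 Research Background and Significance
1.2 Research Developments at Home and Abroad
1.3 Main Content and Organization of the Thesis
Chapter 2 Data Mining
2.1 Overview
2.2 Significance of Data Mining
Chapter 3 Clustering Algorithms
3.1 Introduction to Cluster Analysis
3.1.1 The Concept of Clustering
3.1.2 Clustering Algorithms
Chapter 4 Analysis and Improvement of the K-Means Algorithm
4.1 The K-Means Algorithm
4.2 Advantages and Disadvantages of the K-Means Algorithm
4.3 The Improved K-Means Algorithm
4.3.1 Basic Idea
4.3.2 Isolated-Point Detection
4.3.3 Determining the Initial Centers
4.3.4 Description of the Improved K-Means Algorithm
4.3.5 Analysis of Experimental Results
Chapter 5 Summary and Outlook
5.1 Summary
5.2 Future Outlook
Acknowledgements
References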