Fifty Years of Data Clustering
[Published]: 2012-12-09
Jain, A.K. Data clustering: 50 years beyond K-means. Pattern Recognition Letters, 2010, 31(8): 651-666.

Abstract: Organizing data into sensible groupings is one of the most fundamental modes of understanding and learning. As an example, a common scheme of scientific classification puts organisms into a system of ranked taxa: domain, kingdom, phylum, class, etc. Cluster analysis is the formal study of methods and algorithms for grouping, or clustering, objects according to measured or perceived intrinsic characteristics or similarity. Cluster analysis does not use category labels that tag objects with prior identifiers, i.e., class labels. The absence of category information distinguishes data clustering (unsupervised learning) from classification or discriminant analysis (supervised learning). The aim of clustering is to find structure in data and is therefore exploratory in nature. Clustering has a long and rich history in a variety of scientific fields. One of the most popular and simple clustering algorithms, K-means, was first published in 1955. In spite of the fact that K-means was proposed over 50 years ago and thousands of clustering algorithms have been published since then, K-means is still widely used. This speaks to the difficulty of designing a general purpose clustering algorithm and the ill-posed problem of clustering. We provide a brief overview of clustering, summarize well known clustering methods, discuss the major challenges and key issues in designing clustering algorithms, and point out some of the emerging and useful research directions, including semi-supervised clustering, ensemble clustering, simultaneous feature selection during data clustering and large scale data clustering.

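Cluster analysis is arguably one of the most important branches of statistics, and it is applied very widely in data analysis. This paper systematically reviews the development of cluster analysis over the past fifty years, including the evolution of the K-means model, the strengths and weaknesses of various clustering methods, and practical issues such as how to determine the number of clusters. The writing treats the subject in depth yet remains accessible, making it suitable for students and teachers at all levels.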

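Editor's note: This short piece was written by faculty member Zhang Zhongyuan (张忠元). Graduate students and faculty who are interested in this article are welcome to write to the editor with comments or suggestions. I will promptly reflect your suggestions in this review, or post your comments after the article. Email: zhyuanzh@gmail.com.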