Implementing the k-means algorithm in Python
This post shares a complete Python implementation of the k-means algorithm for your reference.
It is also Exercise 9.4 from Zhou Zhihua's textbook Machine Learning (《機器學習》).
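The data is Watermelon Dataset 4.0. Its columns are 編號 (ID), 密度 (density) and 含糖率 (sugar content); the CSV header keeps the Chinese names because the code reads the columns by name: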
```csv
編號,密度,含糖率
1,0.697,0.46
2,0.774,0.376
3,0.634,0.264
4,0.608,0.318
5,0.556,0.215
6,0.403,0.237
7,0.481,0.149
8,0.437,0.211
9,0.666,0.091
10,0.243,0.267
11,0.245,0.057
12,0.343,0.099
13,0.639,0.161
14,0.657,0.198
15,0.36,0.37
16,0.593,0.042
17,0.719,0.103
18,0.359,0.188
19,0.339,0.241
20,0.282,0.257
21,0.784,0.232
22,0.714,0.346
23,0.483,0.312
24,0.478,0.437
25,0.525,0.369
26,0.751,0.489
27,0.532,0.472
28,0.473,0.376
29,0.725,0.445
30,0.446,0.459
```
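If you would rather not save the table as a CSV file first, a minimal alternative (not part of the original post) is to build the same 30 × 2 array directly in NumPy. The name `WATERMELON_4_0` is hypothetical; each row is (density, sugar content) for sample IDs 1 to 30 in order.

```python
import numpy as np

# Watermelon Dataset 4.0 from the table above.
# Columns: density (密度), sugar content (含糖率); row i holds sample ID i + 1.
WATERMELON_4_0 = np.array([
    [0.697, 0.460], [0.774, 0.376], [0.634, 0.264], [0.608, 0.318], [0.556, 0.215],
    [0.403, 0.237], [0.481, 0.149], [0.437, 0.211], [0.666, 0.091], [0.243, 0.267],
    [0.245, 0.057], [0.343, 0.099], [0.639, 0.161], [0.657, 0.198], [0.360, 0.370],
    [0.593, 0.042], [0.719, 0.103], [0.359, 0.188], [0.339, 0.241], [0.282, 0.257],
    [0.784, 0.232], [0.714, 0.346], [0.483, 0.312], [0.478, 0.437], [0.525, 0.369],
    [0.751, 0.489], [0.532, 0.472], [0.473, 0.376], [0.725, 0.445], [0.446, 0.459],
])
```

With this array in scope, `data = WATERMELON_4_0` can stand in for the `pd.read_csv(...)` line of the script.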
The algorithm is simple enough that I won't walk through it, and the code is short, so here it is:
```python
# -*- coding: utf-8 -*-
"""Exercise 9.4: k-means on Watermelon Dataset 4.0 (Python 3)."""
import random
import sys

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Load the two feature columns (密度 = density, 含糖率 = sugar content) as an (n, 2) array
data = pd.read_csv('../dataset/watermelon4.0.csv', sep=',')[["密度", "含糖率"]].values

########################################## K-means #######################################
k = int(sys.argv[1])
total_colors = ['r', 'y', 'g', 'b', 'c', 'm', 'k']
if not 1 <= k <= len(total_colors):
    sys.exit("k must be between 1 and %d" % len(total_colors))

# Randomly choose k distinct samples from data as the initial mean vectors
mean_vectors = [data[i] for i in random.sample(range(len(data)), k)]


def dist(p1, p2):
    """Euclidean distance between two points."""
    return np.sqrt(np.sum((p1 - p2) ** 2))


while True:
    print(mean_vectors)
    # Assignment step: put every sample into the cluster of its nearest mean vector
    clusters = [[] for _ in range(k)]
    for sample in data:
        distances = [dist(sample, m) for m in mean_vectors]
        min_index = distances.index(min(distances))
        clusters[min_index].append(sample)
    # Update step: recompute each mean vector from the samples in its cluster
    new_mean_vectors = []
    for c, v in zip(clusters, mean_vectors):
        if not c:
            # Empty cluster: keep the old mean vector
            new_mean_vectors.append(v)
            continue
        new_mean_vector = np.mean(c, axis=0)
        # If the relative change of every coordinate is below 0.0001,
        # do not update the mean vector
        if np.all(np.abs((new_mean_vector - v) / v) < 0.0001):
            new_mean_vectors.append(v)
        else:
            new_mean_vectors.append(new_mean_vector)
    if np.array_equal(mean_vectors, new_mean_vectors):
        break
    mean_vectors = new_mean_vectors

# Show the clustering result: one color per cluster, mean vectors marked with "+"
colors = random.sample(total_colors, k)
for cluster, mean, color in zip(clusters, mean_vectors, colors):
    if cluster:
        cluster = np.array(cluster)
        plt.scatter(cluster[:, 0], cluster[:, 1], c=color)
    plt.scatter(mean[0], mean[1], c=color, marker='+', s=200)
plt.xlabel('density')
plt.ylabel('sugar content')
plt.show()
```
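For comparison, here is a minimal vectorized sketch of the same loop (Lloyd's algorithm). It computes all sample-to-mean distances at once with NumPy broadcasting instead of a Python loop over samples, and stops when no mean vector moves by more than a tolerance. The function name `kmeans`, the `tol`, `max_iter` and `seed` parameters, and the absolute-shift stopping rule are my own assumptions, not the original author's.

```python
import numpy as np


def kmeans(data, k, tol=1e-4, max_iter=100, seed=None):
    """Plain k-means (Lloyd's algorithm) on an (n, d) array.

    Returns (means, labels): the (k, d) mean vectors and, for every sample,
    the index of the cluster it is assigned to.
    """
    rng = np.random.default_rng(seed)
    # Initialise with k distinct samples, as the script above does
    means = data[rng.choice(len(data), size=k, replace=False)].astype(float)
    for _ in range(max_iter):
        # Assignment step: (n, k) matrix of Euclidean distances via broadcasting
        distances = np.linalg.norm(data[:, None, :] - means[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Update step: mean of each cluster; an empty cluster keeps its old mean
        new_means = np.array([
            data[labels == j].mean(axis=0) if np.any(labels == j) else means[j]
            for j in range(k)
        ])
        shift = np.linalg.norm(new_means - means, axis=1)
        means = new_means
        # Stop once no mean vector moved by more than tol
        if np.all(shift <= tol):
            break
    # Final assignment so that labels are consistent with the returned means
    labels = np.linalg.norm(data[:, None, :] - means[None, :, :], axis=2).argmin(axis=1)
    return means, labels
```

Called as `means, labels = kmeans(data, 3, seed=0)`, the result can be plotted in one call with `plt.scatter(data[:, 0], data[:, 1], c=labels)`. Passing a fixed `seed` makes a run reproducible.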
To run it, enter `python k_means.py 4` at the command line, where 4 is the value of k.
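The script takes k from `sys.argv[1]` and can only color as many clusters as there are entries in `total_colors` (seven). A minimal sketch of validating k up front with the standard-library `argparse` module, which also produces a usage message for a missing or non-integer argument; the helper name `parse_k` and the `max_k=7` default are assumptions tied to that color list.

```python
import argparse


def parse_k(argv=None, max_k=7):
    """Parse the number of clusters k from the command line (hypothetical helper).

    argv=None makes argparse read sys.argv[1:], as the script does.
    """
    parser = argparse.ArgumentParser(description="k-means on Watermelon Dataset 4.0")
    parser.add_argument("k", type=int, help="number of clusters (1..%d)" % max_k)
    args = parser.parse_args(argv)
    if not 1 <= args.k <= max_k:
        # parser.error prints the usage message to stderr and exits with status 2
        parser.error("k must be between 1 and %d" % max_k)
    return args.k
```

In the script, `k = parse_k()` would then replace `k = int(sys.argv[1])`.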
Below are the results for k = 3, 4 and 5. Because the initial mean vectors are chosen at random, each run can produce a different clustering.
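A common remedy for this sensitivity to initialization (not used in the original script) is to run k-means several times and keep the run with the smallest sum of squared distances from each sample to its cluster mean, E = Σ_i Σ_{x∈C_i} ||x − μ_i||², which is the objective k-means minimizes. scikit-learn's `KMeans` does the same through its `n_init` parameter. The sketch below is self-contained, so it carries its own compact k-means loop; the names `assign`, `lloyd`, `sse`, `best_of_restarts` and the `n_init` default are hypothetical.

```python
import numpy as np


def assign(data, means):
    """Index of the nearest mean (squared Euclidean distance) for each sample."""
    return ((data[:, None, :] - means[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)


def lloyd(data, k, rng, max_iter=100):
    """One k-means run started from k randomly chosen samples; returns (means, labels)."""
    means = data[rng.choice(len(data), size=k, replace=False)].astype(float)
    for _ in range(max_iter):
        labels = assign(data, means)
        # An empty cluster keeps its previous mean
        new_means = np.array([
            data[labels == j].mean(axis=0) if np.any(labels == j) else means[j]
            for j in range(k)
        ])
        if np.allclose(new_means, means):
            break
        means = new_means
    return means, assign(data, means)


def sse(data, means, labels):
    """Sum of squared distances from each sample to the mean of its cluster."""
    return float(((data - means[labels]) ** 2).sum())


def best_of_restarts(data, k, n_init=10, seed=None):
    """Run k-means n_init times and keep the run with the lowest SSE."""
    rng = np.random.default_rng(seed)
    runs = [lloyd(data, k, rng) for _ in range(n_init)]
    return min(runs, key=lambda run: sse(data, *run))
```

With a fixed `seed` the whole procedure is reproducible, and a larger `n_init` lowers the chance of keeping a poor local optimum at the cost of more computation.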



That's all for this article. I hope it helps with your studies, and please keep supporting 腳本之家 (Script House).

