Implementing the k-means Algorithm in Python

Updated: 2018-02-23 10:25:06  Author: the_Chain_Warden

This article walks through a Python implementation of the k-means clustering algorithm and gives the complete code below.

This is also Exercise 9.4 in Zhou Zhihua's "Machine Learning".

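The dataset is watermelon dataset 4.0, shown below. The CSV header stays in Chinese because the code selects columns by these names: 編號 (ID), 密度 (density), 含糖率 (sugar content).

編號,密度,含糖率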
1,0.697,0.46
2,0.774,0.376
3,0.634,0.264
4,0.608,0.318
5,0.556,0.215
6,0.403,0.237
7,0.481,0.149
8,0.437,0.211
9,0.666,0.091
10,0.243,0.267
11,0.245,0.057
12,0.343,0.099
13,0.639,0.161
14,0.657,0.198
15,0.36,0.37
16,0.593,0.042
17,0.719,0.103
18,0.359,0.188
19,0.339,0.241
20,0.282,0.257
21,0.784,0.232
22,0.714,0.346
23,0.483,0.312
24,0.478,0.437
25,0.525,0.369
26,0.751,0.489
27,0.532,0.472
28,0.473,0.376
29,0.725,0.445
30,0.446,0.459

The algorithm is simple, so it is not explained here. The code is not complicated either, so here it is:

# -*- coding: utf-8 -*-
"""Exercise 9.4"""
import sys

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Load the density (密度) and sugar-content (含糖率) columns as an (n, 2) array
data = pd.read_csv('../dataset/watermelon4.0.csv', sep=',')[["密度", "含糖率"]].values

########################################## K-means #######################################
k = int(sys.argv[1])
# Randomly choose k distinct samples from data as the initial mean vectors
mean_vectors = data[np.random.choice(len(data), size=k, replace=False)]


def dist(p1, p2):
  """Euclidean distance between two points."""
  return np.sqrt(np.sum((p1 - p2) ** 2))


while True:
  print(mean_vectors)
  # Assign every sample to the cluster of its nearest mean vector
  clusters = [[] for _ in range(k)]
  for sample in data:
    distances = [dist(sample, m) for m in mean_vectors]
    min_index = int(np.argmin(distances))
    clusters[min_index].append(sample)
  new_mean_vectors = []
  for c, v in zip(clusters, mean_vectors):
    if not c:
      # An empty cluster has no mean, so keep the old mean vector
      new_mean_vectors.append(v)
      continue
    new_mean_vector = np.mean(c, axis=0)
    # If the relative change between the new and the old mean vector is
    # below 0.0001 in every coordinate, do not update the mean vector
    if np.all(np.abs((new_mean_vector - v) / v) < 0.0001):
      new_mean_vectors.append(v)
    else:
      new_mean_vectors.append(new_mean_vector)
  new_mean_vectors = np.array(new_mean_vectors)
  if np.array_equal(mean_vectors, new_mean_vectors):
    break
  mean_vectors = new_mean_vectors

# Show the clustering result
total_colors = ['r', 'y', 'g', 'b', 'c', 'm', 'k']
for i, cluster in enumerate(clusters):
  if not cluster:
    continue
  points = np.array(cluster)
  # Cycle through the colors so that k > 7 does not fail
  plt.scatter(points[:, 0], points[:, 1], c=total_colors[i % len(total_colors)])
plt.show()
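
The two steps inside the loop above (assign each sample to its nearest mean vector, then recompute each mean) can also be written in vectorized NumPy form. The following is only a minimal sketch, not the author's code: the helper name `kmeans_step` is hypothetical, and the input is just the first six samples of the dataset with samples 1 and 6 assumed as the starting mean vectors.

```python
import numpy as np


def kmeans_step(data, means):
    """One k-means iteration: assign samples, then recompute the means.

    Hypothetical helper for illustration; not part of the original script.
    """
    # Pairwise Euclidean distances, shape (n_samples, k)
    distances = np.linalg.norm(data[:, None, :] - means[None, :, :], axis=2)
    labels = distances.argmin(axis=1)
    new_means = means.copy()
    for j in range(len(means)):
        members = data[labels == j]
        if len(members) > 0:  # keep the old mean if a cluster is empty
            new_means[j] = members.mean(axis=0)
    return labels, new_means


# Samples 1-6 of watermelon dataset 4.0 (density, sugar content)
data = np.array([
    [0.697, 0.460], [0.774, 0.376], [0.634, 0.264],
    [0.608, 0.318], [0.556, 0.215], [0.403, 0.237],
])
means = data[[0, 5]].copy()  # assumed start: samples 1 and 6
labels, means = kmeans_step(data, means)
print(labels)
print(means)
```

With this starting point, samples 1 to 4 fall into the first cluster and samples 5 and 6 into the second. Calling `kmeans_step` repeatedly until the means stop changing would give the full algorithm.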

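To run it, enter python k_means.py 4 on the command line, where 4 is the value of k.
Below are the results for k = 3, 4, and 5. Because the initial mean vectors are chosen at random, each run can give a different result.

[Figures: clustering results for k = 3, 4, and 5]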
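
To make a run repeatable while experimenting, one option is to fix the seed of the random generator that draws the initial mean vectors. This is a minimal sketch under assumptions: it uses NumPy's `default_rng`, and the helper name `pick_initial_means` and the seed value are illustrative only and do not appear in the original script.

```python
import numpy as np


def pick_initial_means(data, k, seed=None):
    """Draw k distinct rows of data as initial mean vectors.

    Hypothetical helper; passing the same seed reproduces the same draw.
    """
    rng = np.random.default_rng(seed)
    indices = rng.choice(len(data), size=k, replace=False)
    return data[indices]


# Samples 1-6 of watermelon dataset 4.0 (density, sugar content)
data = np.array([
    [0.697, 0.460], [0.774, 0.376], [0.634, 0.264],
    [0.608, 0.318], [0.556, 0.215], [0.403, 0.237],
])
first = pick_initial_means(data, k=3, seed=0)
second = pick_initial_means(data, k=3, seed=0)
print(np.array_equal(first, second))  # True: same seed, same initial means
```

Leaving `seed` as `None` keeps the random behavior described above, so different runs can still converge to different clusterings.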
