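Examples of adding a new column to a PySpark DataFrame
Python users who know pandas will be aware that adding a column to a DataFrame is easy there: you simply assign it dictionary-style. PySpark works differently. After some experimenting, I found that the following approaches can be used to add a column.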
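For comparison, the pandas behaviour mentioned above can be sketched as follows. The sample data and the names `people`, `constant`, and `name_length` are made up for illustration and are not part of the original article.

```python
import pandas as pd

# Hypothetical sample data, loosely mirroring the columns used in this article
people = pd.DataFrame({"name": ["Alice", "Jane", "Mary"], "age": [19, 20, 21]})

# Dictionary-style assignment: a scalar is broadcast to every row
people["constant"] = 10

# A new column can also be derived from an existing one
people["name_length"] = people["name"].str.len()
```

The assignment modifies the pandas DataFrame in place. The PySpark approaches in the rest of this article each return a new DataFrame instead.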
from pyspark import SparkContext
from pyspark import SparkConf
from pyspark.sql import SparkSession
from pyspark.sql import functions

spark = SparkSession.builder.config(conf=SparkConf()).getOrCreate()

data = [['Alice', 19, 'blue', '["Alice", 19, "blue"]'],
        ['Jane', 20, 'green', '["Jane", 20, "green"]'],
        ['Mary', 21, 'blue', '["Mary", 21, "blue"]'],
        ]
frame = spark.createDataFrame(data, schema=["name", "age", "eye_color", "detail"])
frame.cache()
frame.show()
+-----+---+---------+--------------------+
| name|age|eye_color|              detail|
+-----+---+---------+--------------------+
|Alice| 19|     blue|["Alice", 19, "bl...|
| Jane| 20|    green|["Jane", 20, "gre...|
| Mary| 21|     blue|["Mary", 21, "blue"]|
+-----+---+---------+--------------------+
1. Adding a constant column
frame2 = frame.withColumn("constant", functions.lit(10))
frame2.show()
+-----+---+---------+--------------------+--------+
| name|age|eye_color|              detail|constant|
+-----+---+---------+--------------------+--------+
|Alice| 19|     blue|["Alice", 19, "bl...|      10|
| Jane| 20|    green|["Jane", 20, "gre...|      10|
| Mary| 21|     blue|["Mary", 21, "blue"]|      10|
+-----+---+---------+--------------------+--------+
2. Simple computation based on an existing column
2.1 Using withColumn
frame3_1 = frame.withColumn("name_length", functions.length(frame.name))
frame3_1.show()
+-----+---+---------+--------------------+-----------+
| name|age|eye_color|              detail|name_length|
+-----+---+---------+--------------------+-----------+
|Alice| 19|     blue|["Alice", 19, "bl...|          5|
| Jane| 20|    green|["Jane", 20, "gre...|          4|
| Mary| 21|     blue|["Mary", 21, "blue"]|          4|
+-----+---+---------+--------------------+-----------+
2.2 Using select
frame3_2 = frame.select(["name", functions.length(frame.name).alias("name_length")])
frame3_2.show()
+-----+-----------+
| name|name_length|
+-----+-----------+
|Alice|          5|
| Jane|          4|
| Mary|          4|
+-----+-----------+
2.3 Using selectExpr
frame3_3 = frame.selectExpr(["name", "length(name) as name_length"])
frame3_3.show()
+-----+-----------+
| name|name_length|
+-----+-----------+
|Alice|          5|
| Jane|          4|
| Mary|          4|
+-----+-----------+
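3. Custom computation based on an existing column
Suppose you want to apply a particular operation to a column, but no built-in function does it. What then? Build one yourself.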
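import json
# Note: UserDefinedFunction produces a string column unless a returnType is passed
frame4 = frame.withColumn("detail_length", functions.UserDefinedFunction(lambda obj: len(json.loads(obj)))(frame.detail))
# or
def length_detail(obj):
    # Parse the JSON array stored in the column and count its elements
    return len(json.loads(obj))
frame4 = frame.withColumn("detail_length", functions.UserDefinedFunction(length_detail)(frame.detail))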
frame4.show()
+-----+---+---------+--------------------+-------------+
| name|age|eye_color|              detail|detail_length|
+-----+---+---------+--------------------+-------------+
|Alice| 19|     blue|["Alice", 19, "bl...|            3|
| Jane| 20|    green|["Jane", 20, "gre...|            3|
| Mary| 21|     blue|["Mary", 21, "blue"]|            3|
+-----+---+---------+--------------------+-------------+
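Because a UDF wraps an ordinary Python function, that function can be checked on a few sample values before it is handed to Spark. The sketch below needs no Spark session. The list `samples` simply repeats the `detail` strings from the data at the top of this article.

```python
import json


def length_detail(obj):
    # Parse the JSON array held in the string and count its elements
    return len(json.loads(obj))


# The same strings that appear in the "detail" column
samples = [
    '["Alice", 19, "blue"]',
    '["Jane", 20, "green"]',
    '["Mary", 21, "blue"]',
]
lengths = [length_detail(s) for s in samples]
```

Checking the function this way catches problems such as malformed JSON early, which are harder to read when they surface inside a Spark job.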