Python Data Analysis: Handling Missing Values (Part 1)
统计与数据分析实战 (Statistics and Data Analysis in Practice)
2020-08-08 15:16
dropna is a method of the pandas DataFrame that drops missing values. Its basic parameters are:
dropna(self, axis=0, how='any', subset=None, inplace=False)
Let's go through them one by one.
# Preview the simulated data
> df
Out[1]:
    name   age    birthday  gender
0   Alan  17.0  1999-01-25    male
1          NaN         NaT  female
2  Black  18.0  1997-02-07    male
3   Cici   NaN  2000-01-18  female
4  David  25.0         NaT    male
5    NaN  22.0         NaT  female
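The article does not show how df was created. The sketch below is one possible reconstruction from the printed output, so the later examples can be reproduced. Treat it as an assumption: in particular, row 1's name prints as blank, and because dropna(subset=['name','gender']) later keeps that row, it is taken here to be an empty string rather than NaN.

```python
import numpy as np
import pandas as pd

# Assumed reconstruction of the sample data (the article does not show
# the original construction code). Row 1's name is assumed to be ''.
df = pd.DataFrame({
    "name": ["Alan", "", "Black", "Cici", "David", np.nan],
    "age": [17.0, np.nan, 18.0, np.nan, 25.0, 22.0],
    "birthday": pd.to_datetime(
        ["1999-01-25", None, "1997-02-07", "2000-01-18", None, None]
    ),
    "gender": ["male", "female", "male", "female", "male", "female"],
})

print(df)
# Number of missing values per column: name 1, age 2, birthday 3, gender 0
print(df.isna().sum())
```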
# With no arguments
> df.dropna()
Out[2]:
    name   age    birthday  gender
0   Alan  17.0  1999-01-25    male
2  Black  18.0  1997-02-07    male
# how='any' (the default): drop a row if any of its values is missing
> df.dropna(how = 'any')
Out[3]:
    name   age    birthday  gender
0   Alan  17.0  1999-01-25    male
2  Black  18.0  1997-02-07    male
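How does how='any' decide which rows to drop? A row survives only if none of its values is missing, which can be written as a boolean mask with notna().all(axis=1). The sketch below rebuilds the sample data (an assumed reconstruction, with row 1's blank name taken to be an empty string) and compares the two approaches. It illustrates the rule; it is not a copy of pandas' internal code.

```python
import numpy as np
import pandas as pd

# Assumed reconstruction of the article's sample data
df = pd.DataFrame({
    "name": ["Alan", "", "Black", "Cici", "David", np.nan],
    "age": [17.0, np.nan, 18.0, np.nan, 25.0, 22.0],
    "birthday": pd.to_datetime(
        ["1999-01-25", None, "1997-02-07", "2000-01-18", None, None]
    ),
    "gender": ["male", "female", "male", "female", "male", "female"],
})

# how='any': keep a row only if every value in it is present
kept_by_dropna = df.dropna(how="any")
kept_by_mask = df[df.notna().all(axis=1)]

print(kept_by_dropna.index.tolist())  # [0, 2]
print(kept_by_mask.index.tolist())    # [0, 2]
```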
# how='all': drop only rows in which every value is missing (no row here qualifies, so all six rows remain)
> df.dropna(how = 'all')
Out[4]:
    name   age    birthday  gender
0   Alan  17.0  1999-01-25    male
1          NaN         NaT  female
2  Black  18.0  1997-02-07    male
3   Cici   NaN  2000-01-18  female
4  David  25.0         NaT    male
5    NaN  22.0         NaT  female
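Since no row in the sample is entirely missing, how='all' removes nothing above. The sketch below adds a hypothetical all-missing row (label 6, created with reindex so each column keeps its dtype) purely to show what how='all' does remove. The sample data is an assumed reconstruction, with row 1's blank name taken to be an empty string. The mask form notna().any(axis=1) expresses the same rule: keep a row if at least one value is present.

```python
import numpy as np
import pandas as pd

# Assumed reconstruction of the article's sample data
df = pd.DataFrame({
    "name": ["Alan", "", "Black", "Cici", "David", np.nan],
    "age": [17.0, np.nan, 18.0, np.nan, 25.0, 22.0],
    "birthday": pd.to_datetime(
        ["1999-01-25", None, "1997-02-07", "2000-01-18", None, None]
    ),
    "gender": ["male", "female", "male", "female", "male", "female"],
})

# Hypothetical extra row 6 in which every value is missing
df_extra = df.reindex(range(7))

kept_by_dropna = df_extra.dropna(how="all")
kept_by_mask = df_extra[df_extra.notna().any(axis=1)]

print(kept_by_dropna.index.tolist())  # [0, 1, 2, 3, 4, 5]
print(kept_by_mask.index.tolist())    # [0, 1, 2, 3, 4, 5]
```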
# Apply how='all' to the age and birthday columns only: row 1 is missing both, so it is dropped
> df.iloc[:,1:3].dropna(how = 'all')
Out[5]:
    age    birthday
0  17.0  1999-01-25
2  18.0  1997-02-07
3   NaN  2000-01-18
4  25.0         NaT
5  22.0         NaT
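Note that df.iloc[:,1:3] returns only the age and birthday columns, so the result above has lost name and gender. If the goal is to check just those two columns but keep the whole row, the subset parameter does that. A sketch, again on an assumed reconstruction of the sample data (row 1's blank name taken to be an empty string):

```python
import numpy as np
import pandas as pd

# Assumed reconstruction of the article's sample data
df = pd.DataFrame({
    "name": ["Alan", "", "Black", "Cici", "David", np.nan],
    "age": [17.0, np.nan, 18.0, np.nan, 25.0, 22.0],
    "birthday": pd.to_datetime(
        ["1999-01-25", None, "1997-02-07", "2000-01-18", None, None]
    ),
    "gender": ["male", "female", "male", "female", "male", "female"],
})

# Slicing first: only age and birthday survive in the result
by_iloc = df.iloc[:, 1:3].dropna(how="all")
# subset: only age and birthday are checked, but all columns are returned
by_subset = df.dropna(how="all", subset=["age", "birthday"])

print(by_iloc.columns.tolist())    # ['age', 'birthday']
print(by_subset.columns.tolist())  # ['name', 'age', 'birthday', 'gender']
print(by_subset.index.tolist())    # [0, 2, 3, 4, 5]
```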
# Drop by column (axis = 1): every column that contains a missing value is removed
> df.dropna(axis = 1)
Out[6]:
   gender
0    male
1  female
2    male
3  female
4    male
5  female
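axis also accepts the name 'columns' in place of 1, and the column-wise rule can be written as a mask over columns: keep a column only if none of its values is missing. A sketch on an assumed reconstruction of the sample data (row 1's blank name taken to be an empty string):

```python
import numpy as np
import pandas as pd

# Assumed reconstruction of the article's sample data
df = pd.DataFrame({
    "name": ["Alan", "", "Black", "Cici", "David", np.nan],
    "age": [17.0, np.nan, 18.0, np.nan, 25.0, 22.0],
    "birthday": pd.to_datetime(
        ["1999-01-25", None, "1997-02-07", "2000-01-18", None, None]
    ),
    "gender": ["male", "female", "male", "female", "male", "female"],
})

by_number = df.dropna(axis=1)
by_name = df.dropna(axis="columns")
# Column mask: True only for columns with no missing values
by_mask = df.loc[:, df.notna().all(axis=0)]

print(by_number.columns.tolist())  # ['gender']
print(by_name.columns.tolist())    # ['gender']
print(by_mask.columns.tolist())    # ['gender']
```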
# Drop rows that have a missing value in the specified columns (name, gender)
> df.dropna(subset = ['name','gender'])
Out[7]:
    name   age    birthday  gender
0   Alan  17.0  1999-01-25    male
1          NaN         NaT  female
2  Black  18.0  1997-02-07    male
3   Cici   NaN  2000-01-18  female
4  David  25.0         NaT    male
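With subset, only the listed columns are inspected; missing values elsewhere (age, birthday) are ignored. The equivalent mask applies notna().all(axis=1) to just those columns. A sketch on an assumed reconstruction of the sample data (row 1's blank name taken to be an empty string):

```python
import numpy as np
import pandas as pd

# Assumed reconstruction of the article's sample data
df = pd.DataFrame({
    "name": ["Alan", "", "Black", "Cici", "David", np.nan],
    "age": [17.0, np.nan, 18.0, np.nan, 25.0, 22.0],
    "birthday": pd.to_datetime(
        ["1999-01-25", None, "1997-02-07", "2000-01-18", None, None]
    ),
    "gender": ["male", "female", "male", "female", "male", "female"],
})

by_subset = df.dropna(subset=["name", "gender"])
# Mask built from the chosen columns only
by_mask = df[df[["name", "gender"]].notna().all(axis=1)]

print(by_subset.index.tolist())  # [0, 1, 2, 3, 4]
print(by_mask.index.tolist())    # [0, 1, 2, 3, 4]
```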
Note that none of the operations above change the original data; each one returns a new copy. To modify the original DataFrame itself, pass inplace = True.
# Check the original data again to see whether it has changed
> df
Out[8]:
    name   age    birthday  gender
0   Alan  17.0  1999-01-25    male
1          NaN         NaT  female
2  Black  18.0  1997-02-07    male
3   Cici   NaN  2000-01-18  female
4  David  25.0         NaT    male
5    NaN  22.0         NaT  female
# With inplace = True, the original data is modified (the call itself returns None)
> df.dropna(inplace = True)
> df
Out[9]:
    name   age    birthday  gender
0   Alan  17.0  1999-01-25    male
2  Black  18.0  1997-02-07    male
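Because dropna(inplace=True) returns None, writing df = df.dropna(inplace=True) would replace df with None. The sketch below, on an assumed reconstruction of the sample data, shows that return value and the other common pattern: keep the default inplace=False and reassign the result.

```python
import numpy as np
import pandas as pd

# Assumed reconstruction of the article's sample data
df = pd.DataFrame({
    "name": ["Alan", "", "Black", "Cici", "David", np.nan],
    "age": [17.0, np.nan, 18.0, np.nan, 25.0, 22.0],
    "birthday": pd.to_datetime(
        ["1999-01-25", None, "1997-02-07", "2000-01-18", None, None]
    ),
    "gender": ["male", "female", "male", "female", "male", "female"],
})

# In-place: the DataFrame is changed and the call returns None
df_copy = df.copy()
result = df_copy.dropna(inplace=True)
print(result)                  # None
print(df_copy.index.tolist())  # [0, 2]

# Alternative: keep inplace=False (the default) and reassign
df = df.dropna()
print(df.index.tolist())       # [0, 2]
```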
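# Row 1 of the original df (this cell was run before the in-place drop above): name is blank rather than NaN, which is why dropna(subset = ['name','gender']) kept it
> df[1:2]
Out[182]:
   name  age  birthday  gender
1        NaN       NaT  female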
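A blank cell is not necessarily a missing one: pandas treats NaN, None and NaT as missing, but not an empty string. Assuming row 1's name is an empty string (the article does not say so explicitly), the sketch below shows that isna() does not flag it and that blank strings can be converted to NaN with replace before calling dropna.

```python
import numpy as np
import pandas as pd

# Assumed reconstruction of the article's sample data (row 1's name is '')
df = pd.DataFrame({
    "name": ["Alan", "", "Black", "Cici", "David", np.nan],
    "age": [17.0, np.nan, 18.0, np.nan, 25.0, 22.0],
    "birthday": pd.to_datetime(
        ["1999-01-25", None, "1997-02-07", "2000-01-18", None, None]
    ),
    "gender": ["male", "female", "male", "female", "male", "female"],
})

# The empty string is not counted as missing
print(df["name"].isna().tolist())  # [False, False, False, False, False, True]

# Treat blank names as missing: replace '' with NaN in the name column only
cleaned = df.replace({"name": {"": np.nan}})
print(cleaned.dropna(subset=["name", "gender"]).index.tolist())  # [0, 2, 3, 4]
```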