For Exploratory Data Analysis, These 8 Popular Python Visualization Tools Are All You Need
天作之程
· 2020-09-08
![](https://filescdn.proginn.com/5c79bde4b160fa6172d6d8286b866619/da3da515a00b387e51b155c933d5efb2.webp)
Matplotlib, Seaborn, and Pandas
ggplot(2)
Bokeh
Plotly
Pygal
Networkx
import seaborn as sns
import matplotlib.pyplot as plt

# top10 is assumed to be a DataFrame with one row per team and
# 'Team' and 'Salary' columns (the ten highest median salaries)
color_order = ['xkcd:cerulean', 'xkcd:ocean',
               'xkcd:black', 'xkcd:royal purple',
               'xkcd:royal purple', 'xkcd:navy blue',
               'xkcd:powder blue', 'xkcd:light maroon',
               'xkcd:lightish blue', 'xkcd:navy']
sns.barplot(x=top10.Team,
            y=top10.Salary,
            palette=color_order).set_title('Teams with Highest Median Salary')
# Show the y-axis (salary) in scientific notation
plt.ticklabel_format(style='sci', axis='y', scilimits=(0,0))
![](https://filescdn.proginn.com/09b536afd2d25c61438b16998f24f5b6/17565332b217fb52b99b33aba1409d55.webp)
import matplotlib.pyplot as plt
import scipy.stats as stats
# model2 is a fitted regression model; residuals are predictions minus held-out targets
log_resid = model2.predict(X_test)-y_test
stats.probplot(log_resid, dist="norm", plot=plt)
plt.title("Normal Q-Q plot")
plt.show()
![](https://filescdn.proginn.com/faee0b3c318399224045fdc61bcb8a49/95b5abfff0ca059ec6d6edc28f7207fd.webp)
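What `stats.probplot` draws can be approximated by hand: sort the residuals and pair each one with the standard-normal quantile of the same rank. The sketch below is a simplified illustration that uses only the standard library. The plotting positions `(i - 0.5) / n` and the sample residuals are assumptions made for illustration, and SciPy uses a slightly different estimate for the positions.

```python
from statistics import NormalDist

def normal_qq_points(residuals):
    """Pair sorted residuals with standard-normal quantiles of matching rank."""
    ordered = sorted(residuals)
    n = len(ordered)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    # Plotting position (i - 0.5) / n keeps every probability strictly in (0, 1)
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, ordered))

# Made-up residuals, for illustration only
sample_resid = [0.3, -1.2, 0.8, -0.1, 1.5, -0.6, 0.05]
qq_points = normal_qq_points(sample_resid)
for quantile, resid in qq_points:
    print(f"{quantile:+.3f}  {resid:+.3f}")
```

If the residuals are roughly normal, these points fall close to a straight line when plotted.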
# All salaries (R, ggplot2)
ggplot(data=df, aes(x=season_start, y=salary, colour=team)) +
  geom_point() +
  theme(legend.position="none") +
  labs(title = 'Salary Over Time', x='Year', y='Salary ($)')
![](https://filescdn.proginn.com/3e1cb8f03f616a2c32e7f6515f583832/589a60e67944c04d85353089aabae1cc.webp)
import pandas as pd
from bokeh.plotting import figure
from bokeh.io import show

# is_masc is a one-hot encoded dataframe of responses to the question:
# "Do you identify as masculine?"

# Dataframe prep
counts = is_masc.sum()
resps = is_masc.columns

# Bokeh
p2 = figure(title='Do You View Yourself As Masculine?',
            x_axis_label='Response',
            y_axis_label='Count',
            x_range=list(resps))
p2.vbar(x=resps, top=counts, width=0.6, fill_color='red', line_color='black')
show(p2)

# Pandas
counts.plot(kind='bar')
![](https://filescdn.proginn.com/0321a1d2271aa618bbd23465c66dfcea/09592681d234f1e7585780105607f6b3.webp)
![](https://filescdn.proginn.com/41c504bdca723941d73afec6e3ffc947/8d989c650d4a5d57c34f8e37f8432562.webp)
![](https://filescdn.proginn.com/8ff5b276aa5916ada81c05bee617cbb9/4f3eb4f01f116b5e92dbbb8d5435b6e9.webp)
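Installation requires an API key and registration; a plain pip install is not enough;
The data and layout objects that Plotly plots are specific to Plotly and not intuitive;
The plot layout did not work for me (40 lines of code did nothing at all!);
You can edit plots both on the Plotly website and in the Python environment;
It supports interactive plots and dashboards;
Plotly has partnered with Mapbox, so maps can be customized;
It has great potential for producing good-looking graphics.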
import chart_studio.plotly as py  # was plotly.plotly before Plotly 4.0
import plotly.graph_objs as go

# plot 1 - barplot
# **note** - the layout lines do nothing and trip no errors, most likely
# because layout is passed to py.iplot as a keyword argument instead of
# being combined with the data in go.Figure(data=data, layout=layout)
data = [go.Bar(x=team_ave_df.team,
               y=team_ave_df.turnovers_per_mp)]
layout = go.Layout(
    title=go.layout.Title(
        text='Turnovers per Minute by Team',
        xref='paper',
        x=0
    ),
    xaxis=go.layout.XAxis(
        title=go.layout.xaxis.Title(
            text='Team',
            font=dict(
                family='Courier New, monospace',
                size=18,
                color='#7f7f7f'
            )
        )
    ),
    yaxis=go.layout.YAxis(
        title=go.layout.yaxis.Title(
            text='Average Turnovers/Minute',
            font=dict(
                family='Courier New, monospace',
                size=18,
                color='#7f7f7f'
            )
        )
    ),
    autosize=True,
    hovermode='closest')
py.iplot(figure_or_data=data, layout=layout, filename='jupyter-plot', sharing='public', fileopt='overwrite')

# plot 2 - attempt at a scatterplot
data = [go.Scatter(x=player_year.minutes_played,
                   y=player_year.salary,
                   marker=go.scatter.Marker(color='red',
                                            size=3))]
layout = go.Layout(title="test",
                   xaxis=dict(title='why'),
                   yaxis=dict(title='plotly'))
py.iplot(figure_or_data=data, layout=layout, filename='jupyter-plot2', sharing='public')
![](https://filescdn.proginn.com/fe02bd31860c9cc2ca3544ccdc1426ca/0a94f3ee53ce6ea407e7635ccde46d92.webp)
![](https://filescdn.proginn.com/c5d71ec7312df06defe61ed821cb0a48/0ddd26cc28d85b20f012ac701160fee3.webp)
![](https://filescdn.proginn.com/b300f3667eef3a9b121a6af19c835289/a786449ed7afd1c4a550911a76a0b7a0.webp)
Instantiate the figure;
Format it using the figure object's attributes;
Add data to the figure with figure.add().
![](https://filescdn.proginn.com/fb60440181a4f63047233111acbc561f/5c8a7312f2a8041ca99942f1afaf5b88.webp)
![](https://filescdn.proginn.com/551f1de55980f8b8e8d8be8e24bbdd1d/3f848b3ccfb142854d4916cabaaa0e4d.webp)
import matplotlib.pyplot as plt
import networkx as nx

# G is an existing networkx graph
options = {
    'node_color': range(len(G)),
    'node_size': 300,
    'width': 1,
    'with_labels': False,
    'cmap': plt.cm.coolwarm
}
nx.draw(G, **options)
![](https://filescdn.proginn.com/df2dc5314fbe20e5901c7f76946feeb3/da84fa8af78e03e3ce9e32b21ea19aaf.webp)
import itertools
import networkx as nx
import matplotlib.pyplot as plt

# Each line of the circles file is a circle name followed by member node ids
with open('data/facebook/1684.circles', 'r') as f:
    circles = [line.split() for line in f]

# Drop the circle name and convert the member ids to integers
network = []
for circ in circles:
    cleaned = [int(val) for val in circ[1:]]
    network.append(cleaned)

G = nx.Graph()
for v in network:
    G.add_nodes_from(v)

# Connect every pair of nodes that share a circle
edges = [itertools.combinations(net, 2) for net in network]
for edge_group in edges:
    G.add_edges_from(edge_group)

options = {
    'node_color': 'lime',
    'node_size': 3,
    'width': 1,
    'with_labels': False,
}
nx.draw(G, **options)
![](https://filescdn.proginn.com/cb186507dbe557d00c937a6a7a341dca/64c3e195be060545c99d336dd335382e.webp)
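To make the edge-building step concrete, here is a small standard-library sketch of what `itertools.combinations` produces for each circle. The two circles are made-up stand-ins for lines of the `.circles` file; every pair of members in the same circle becomes one edge.

```python
import itertools

# Made-up stand-ins for lines of a .circles file: a name, then member ids
raw_lines = ["circle0\t1\t2\t3", "circle1\t3\t4"]

# Same cleaning as in the script: drop the circle name, convert ids to integers
network = [[int(val) for val in line.split()[1:]] for line in raw_lines]

# Every unordered pair of members within a circle becomes an edge
edges = set()
for members in network:
    edges.update(itertools.combinations(members, 2))

print(sorted(edges))  # [(1, 2), (1, 3), (2, 3), (3, 4)]
```

A circle with n members contributes n(n-1)/2 edges, which is why large circles make the drawn graph dense.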