Preface
How to make violin plots in Python with Plotly.
Plotly: https://plotly.com/python/
src link: https://plotly.com/python/violin/
Operating System: Ubuntu 22.04.4 LTS
Reference documentation
Installation
pip install plotly
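Violin plots with Plotly Express.
A violin plot is a statistical representation of numerical data. It is similar to a box plot, with a rotated kernel density plot added on each side.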
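To make the "rotated kernel density plot" concrete, here is a minimal sketch of a Gaussian kernel density estimate in plain Python. The function name `gaussian_kde_sketch`, the sample values, and the rule-of-thumb bandwidth are assumptions chosen for illustration; this is not Plotly's internal implementation.

```python
import math
from statistics import stdev


def gaussian_kde_sketch(samples, grid, bandwidth=None):
    """Estimate a density on `grid` by summing one Gaussian bump per sample."""
    n = len(samples)
    if bandwidth is None:
        # Assumption: a common rule-of-thumb bandwidth; other choices are valid.
        bandwidth = 1.06 * stdev(samples) * n ** (-1 / 5)
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)
        for x in grid
    ]


# Made-up sample values, for illustration only.
samples = [1, 2, 2, 3, 3, 3, 4, 4, 5]
step = 0.05
grid = [-5 + i * step for i in range(321)]  # covers -5.0 to 11.0
density = gaussian_kde_sketch(samples, grid)
```

A violin is this curve mirrored on both sides of a central axis, so the width of the violin at a given value reflects how many observations fall near that value.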
Alternatives to violin plots for visualizing distributions include histograms, box plots, ECDF plots, and strip charts.
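Basic violin plot with Plotly Express.
Plotly Express is the easy-to-use, high-level interface to Plotly. It operates on a variety of types of data and produces easy-to-style figures.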
import plotly.express as px
df = px.data.tips()
fig = px.violin(df, y="total_bill")
fig.show()
Violin plot with box and data points.
import plotly.express as px
df = px.data.tips()
fig = px.violin(df, y="total_bill",
                box=True,      # draw box plot inside the violin
                points='all',  # can be 'outliers', or False
                )
fig.show()
Multiple violin plots.
import plotly.express as px
df = px.data.tips()
fig = px.violin(df, y="tip", x="smoker", color="sex", box=True, points="all",
                hover_data=df.columns)
fig.show()
import plotly.express as px
df = px.data.tips()
fig = px.violin(df, y="tip", color="sex",
                violinmode='overlay',  # draw violins on top of each other
                # default violinmode is 'group' as in example above
                hover_data=df.columns)
fig.show()
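Violin plots with go.Violin.
If Plotly Express does not provide a good starting point, you can use the more generic go.Violin class from plotly.graph_objects. All the options of go.Violin are documented in the reference at https://plotly.com/python/reference/violin/.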
Basic violin plot
import plotly.graph_objects as go
import pandas as pd
df = pd.read_csv("https://raw.githubusercontent.com/plotly/datasets/master/violin_data.csv")
fig = go.Figure(data=go.Violin(y=df['total_bill'], box_visible=True, line_color='black',
                               meanline_visible=True, fillcolor='lightseagreen', opacity=0.6,
                               x0='Total Bill'))
fig.update_layout(yaxis_zeroline=False)
fig.show()
Multiple traces
import plotly.graph_objects as go
import pandas as pd
df = pd.read_csv("https://raw.githubusercontent.com/plotly/datasets/master/violin_data.csv")
fig = go.Figure()
days = ['Thur', 'Fri', 'Sat', 'Sun']
for day in days:
    fig.add_trace(go.Violin(x=df['day'][df['day'] == day],
                            y=df['total_bill'][df['day'] == day],
                            name=day,
                            box_visible=True,
                            meanline_visible=True))
fig.show()
Grouped violin plot
import plotly.graph_objects as go
import pandas as pd
df = pd.read_csv("https://raw.githubusercontent.com/plotly/datasets/master/violin_data.csv")
fig = go.Figure()
fig.add_trace(go.Violin(x=df['day'][df['sex'] == 'Male'],
                        y=df['total_bill'][df['sex'] == 'Male'],
                        legendgroup='M', scalegroup='M', name='M',
                        line_color='blue'))
fig.add_trace(go.Violin(x=df['day'][df['sex'] == 'Female'],
                        y=df['total_bill'][df['sex'] == 'Female'],
                        legendgroup='F', scalegroup='F', name='F',
                        line_color='orange'))
fig.update_traces(box_visible=True, meanline_visible=True)
fig.update_layout(violinmode='group')
fig.show()
Split violin plot
import plotly.graph_objects as go
import pandas as pd
df = pd.read_csv("https://raw.githubusercontent.com/plotly/datasets/master/violin_data.csv")
fig = go.Figure()
fig.add_trace(go.Violin(x=df['day'][df['smoker'] == 'Yes'],
                        y=df['total_bill'][df['smoker'] == 'Yes'],
                        legendgroup='Yes', scalegroup='Yes', name='Yes',
                        side='negative',
                        line_color='blue'))
fig.add_trace(go.Violin(x=df['day'][df['smoker'] == 'No'],
                        y=df['total_bill'][df['smoker'] == 'No'],
                        legendgroup='No', scalegroup='No', name='No',
                        side='positive',
                        line_color='orange'))
fig.update_traces(meanline_visible=True)
fig.update_layout(violingap=0, violinmode='overlay')
fig.show()
Advanced violin plot
import plotly.graph_objects as go
import pandas as pd
df = pd.read_csv("https://raw.githubusercontent.com/plotly/datasets/master/violin_data.csv")
pointpos_male = [-0.9,-1.1,-0.6,-0.3]
pointpos_female = [0.45,0.55,1,0.4]
show_legend = [True,False,False,False]
fig = go.Figure()
days = pd.unique(df['day'])  # compute the list of days once
for i, day in enumerate(days):
    # boolean masks selecting the rows for this day, per sex
    male = (df['sex'] == 'Male') & (df['day'] == day)
    female = (df['sex'] == 'Female') & (df['day'] == day)
    fig.add_trace(go.Violin(x=df['day'][male],
                            y=df['total_bill'][male],
                            legendgroup='M', scalegroup='M', name='M',
                            side='negative',
                            pointpos=pointpos_male[i],  # where to position points
                            line_color='lightseagreen',
                            showlegend=show_legend[i]))
    fig.add_trace(go.Violin(x=df['day'][female],
                            y=df['total_bill'][female],
                            legendgroup='F', scalegroup='F', name='F',
                            side='positive',
                            pointpos=pointpos_female[i],
                            line_color='mediumpurple',
                            showlegend=show_legend[i]))
# update characteristics shared by all traces
fig.update_traces(meanline_visible=True,
                  points='all',       # show all points
                  jitter=0.05,        # add some jitter on points for better visibility
                  scalemode='count')  # scale violin plot area with total count
fig.update_layout(
    title_text="Total bill distribution<br><i>scaled by number of bills per gender</i>",
    violingap=0, violingroupgap=0, violinmode='overlay')
fig.show()
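Ridgeline plot
A ridgeline plot (previously known as a Joy Plot) shows the distribution of a numerical value for several groups. Ridgeline plots can be used to visualize how distributions change over time or space.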
import plotly.graph_objects as go
from plotly.colors import n_colors
import numpy as np
np.random.seed(1)
# 12 sets of normal distributed random data, with increasing mean and standard deviation
data = (np.linspace(1, 2, 12)[:, np.newaxis] * np.random.randn(12, 200) +
        (np.arange(12) + 2 * np.random.random(12))[:, np.newaxis])
colors = n_colors('rgb(5, 200, 200)', 'rgb(200, 10, 10)', 12, colortype='rgb')
fig = go.Figure()
for data_line, color in zip(data, colors):
    fig.add_trace(go.Violin(x=data_line, line_color=color))
fig.update_traces(orientation='h', side='positive', width=3, points=False)
fig.update_layout(xaxis_showgrid=False, xaxis_zeroline=False)
fig.show()
Violin plot with only points
A strip chart is like a violin plot with the points showing, but without the violin.
import plotly.express as px
df = px.data.tips()
fig = px.strip(df, x='day', y='tip')
fig.show()
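Choosing the algorithm for computing quartiles
By default, quartiles for violin plots are computed using the linear method (for more about linear interpolation, see #10 listed on http://jse.amstat.org/v14n3/langford.html, and https://en.wikipedia.org/wiki/Quartile for more details).
However, you can also choose the exclusive or the inclusive algorithm to compute quartiles.
The exclusive algorithm uses the median to divide the ordered dataset into two halves. If the sample size is odd, the median is not included in either half. Q1 is then the median of the lower half and Q3 is the median of the upper half.
The inclusive algorithm also uses the median to divide the ordered dataset into two halves, but if the sample size is odd, it includes the median in both halves. Q1 is then the median of the lower half and Q3 is the median of the upper half.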
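The two descriptions above can be sketched in plain Python. This is a literal reading of the text, not Plotly's internal code; the function name `quartiles_sketch` and the sample data are assumptions for illustration.

```python
from statistics import median


def quartiles_sketch(values, method="exclusive"):
    """Return (Q1, median, Q3) using the median-of-halves rule described above."""
    if method not in ("exclusive", "inclusive"):
        raise ValueError("method must be 'exclusive' or 'inclusive'")
    data = sorted(values)
    n = len(data)
    half = n // 2
    if n % 2 == 0:
        # Even sample size: the two halves are the same for both methods.
        lower, upper = data[:half], data[half:]
    elif method == "exclusive":
        # Odd sample size: leave the median out of both halves.
        lower, upper = data[:half], data[half + 1:]
    else:
        # Odd sample size: include the median in both halves.
        lower, upper = data[:half + 1], data[half:]
    return median(lower), median(data), median(upper)


data = [1, 2, 3, 4, 5, 6, 7, 8, 9]  # made-up sample with an odd size
print(quartiles_sketch(data, "exclusive"))  # (2.5, 5, 7.5)
print(quartiles_sketch(data, "inclusive"))  # (3, 5, 7)
```

With an odd sample size the two methods give different Q1 and Q3, while for an even sample size they agree.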
import plotly.express as px
df = px.data.tips()
fig = px.violin(df, y="total_bill")
fig.update_traces(quartilemethod="exclusive") # or "inclusive", or "linear" by default
fig.show()
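Reference
See the function reference for px.violin() or https://plotly.com/python/reference/violin/ for more information and chart attribute options!
Closing remarks
My 203rd blog post is done, and I'm happy!!!!
Today is another day full of hope.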