Preface

How to make violin plots in Python with Plotly.

Plotly: https://plotly.com/python/

src link: https://plotly.com/python/violin/

Operating System: Ubuntu 22.04.4 LTS

References

  1. Violin Plots in Python
  2. Getting Started with Plotly in Python

Installation

pip install plotly

Violin plots with Plotly Express

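A violin plot is a statistical representation of numerical data. It is similar to a box plot, with the addition of a rotated kernel density plot on each side.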
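The "rotated kernel density plot" is a smoothed estimate of how densely the data is packed at each value, mirrored on both sides of the violin's center line. The NumPy sketch below computes such an outline, assuming a Gaussian kernel and a simplified Silverman's rule of thumb for the bandwidth. Plotly computes its own estimate internally, so the curve it actually draws may differ; the helper name `violin_outline` is hypothetical and the data is synthetic.

```python
import numpy as np


def violin_outline(samples, grid, bandwidth=None):
    """Return (left, right) half-widths of a violin evaluated over `grid`.

    Sketch only: a Gaussian kernel density estimate mirrored about 0.
    """
    x = np.asarray(samples, dtype=float)
    n = x.size
    if bandwidth is None:
        # Simplified Silverman's rule of thumb (assumption, not Plotly's exact rule)
        bandwidth = 1.06 * x.std(ddof=1) * n ** (-1 / 5)
    # Standardized distance from every grid point to every sample: shape (len(grid), n)
    u = (np.asarray(grid, dtype=float)[:, None] - x[None, :]) / bandwidth
    # Average of Gaussian bumps centered on the samples
    density = np.exp(-0.5 * u**2).sum(axis=1) / (n * bandwidth * np.sqrt(2 * np.pi))
    # The violin draws the same density on both sides of its center line
    return -density, density


rng = np.random.default_rng(0)
bills = rng.normal(loc=20, scale=8, size=200)  # synthetic stand-in for total_bill
grid = np.linspace(bills.min() - 10, bills.max() + 10, 500)
left, right = violin_outline(bills, grid)
print(f"widest point at y = {grid[right.argmax()]:.1f}")
```

Because a density integrates to 1, the area of one half of the violin is 1 regardless of how many samples were used; that is why Plotly offers separate scaling options when violins with different sample counts are compared.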

Alternatives to violin plots for visualizing distributions include histograms, box plots, ECDF plots and strip charts.

Basic violin plot with Plotly Express

Plotly Express is the easy-to-use, high-level interface to Plotly, which operates on a variety of types of data and produces easy-to-style figures.

import plotly.express as px

df = px.data.tips()
fig = px.violin(df, y="total_bill")
fig.show()

Violin plot with box and data points

import plotly.express as px

df = px.data.tips()
fig = px.violin(df, y="total_bill", box=True,  # draw box plot inside the violin
                points='all',  # can be 'outliers', or False
               )
fig.show()

Multiple violin plots

import plotly.express as px

df = px.data.tips()
fig = px.violin(df, y="tip", x="smoker", color="sex", box=True, points="all",
                hover_data=df.columns)
fig.show()
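With both `x` and `color` set, Plotly Express draws one violin for every combination of the `x` category and the `color` category: here smoker × sex, so four violins, placed side by side in the default 'group' mode. The pandas sketch below uses a small hypothetical table shaped like `tips` (the values are made up) to list those combinations and the number of points behind each violin.

```python
import pandas as pd

# Hypothetical rows shaped like px.data.tips(); values are made up for illustration
df = pd.DataFrame({
    "tip":    [1.0, 1.7, 3.5, 3.3, 4.7, 2.0, 3.0, 2.5],
    "smoker": ["No", "No", "Yes", "Yes", "No", "Yes", "No", "Yes"],
    "sex":    ["Female", "Male", "Male", "Female", "Male", "Male", "Female", "Female"],
})

# One violin per (x category, color category) pair
counts = df.groupby(["smoker", "sex"]).size()
print(counts)
```

Checking the group sizes this way is a quick sanity check before plotting: a violin built from only a handful of points can look deceptively smooth.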

import plotly.express as px

df = px.data.tips()
fig = px.violin(df, y="tip", color="sex",
                violinmode='overlay',  # draw violins on top of each other
                # default violinmode is 'group' as in example above
                hover_data=df.columns)
fig.show()

Violin plots with go.Violin

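If Plotly Express does not provide a good starting point, you can use the more generic go.Violin class from plotly.graph_objects. All of the options of go.Violin are documented in the reference at https://plotly.com/python/reference/violin/.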

Basic violin plot

import plotly.graph_objects as go

import pandas as pd

df = pd.read_csv("https://raw.githubusercontent.com/plotly/datasets/master/violin_data.csv")

fig = go.Figure(data=go.Violin(y=df['total_bill'], box_visible=True, line_color='black',
                               meanline_visible=True, fillcolor='lightseagreen', opacity=0.6,
                               x0='Total Bill'))

fig.update_layout(yaxis_zeroline=False)
fig.show()

Multiple traces

import plotly.graph_objects as go

import pandas as pd

df = pd.read_csv("https://raw.githubusercontent.com/plotly/datasets/master/violin_data.csv")

fig = go.Figure()

days = ['Thur', 'Fri', 'Sat', 'Sun']

# One violin trace per day, each built from that day's rows only
for day in days:
    fig.add_trace(go.Violin(x=df['day'][df['day'] == day],
                            y=df['total_bill'][df['day'] == day],
                            name=day,
                            box_visible=True,
                            meanline_visible=True))

fig.show()
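The loop above selects each day's rows with a boolean mask such as `df['day'] == day`. An equivalent split can be done with pandas `groupby`; the sketch below, on a small hypothetical frame with the same two columns as violin_data.csv, checks that both approaches give the same per-day `total_bill` values. Each group could then be passed to `go.Violin(y=..., name=day)`. This is an alternative way to write the split, not the code from the Plotly docs.

```python
import pandas as pd

# Hypothetical subset of violin_data.csv; values are made up
df = pd.DataFrame({
    "day": ["Thur", "Fri", "Sat", "Sun", "Sat", "Thur", "Sun", "Fri"],
    "total_bill": [12.5, 15.4, 20.1, 25.3, 18.0, 9.8, 30.2, 22.7],
})

days = ["Thur", "Fri", "Sat", "Sun"]

# Boolean-mask version, as in the loop above
by_mask = {day: df["total_bill"][df["day"] == day].tolist() for day in days}

# groupby version; iterating `days` explicitly keeps the same trace order
grouped = df.groupby("day")["total_bill"]
by_groupby = {day: grouped.get_group(day).tolist() for day in days}

print(by_mask == by_groupby)
```

`groupby` scans the column once instead of building a new mask per day, which matters little for four days but helps with many categories.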

Grouped violin plot

import plotly.graph_objects as go

import pandas as pd

df = pd.read_csv("https://raw.githubusercontent.com/plotly/datasets/master/violin_data.csv")

fig = go.Figure()

fig.add_trace(go.Violin(x=df['day'][df['sex'] == 'Male'],
                        y=df['total_bill'][df['sex'] == 'Male'],
                        legendgroup='M', scalegroup='M', name='M',
                        line_color='blue')
             )
fig.add_trace(go.Violin(x=df['day'][df['sex'] == 'Female'],
                        y=df['total_bill'][df['sex'] == 'Female'],
                        legendgroup='F', scalegroup='F', name='F',
                        line_color='orange')
             )

fig.update_traces(box_visible=True, meanline_visible=True)
fig.update_layout(violinmode='group')
fig.show()

Split violin plot

import plotly.graph_objects as go

import pandas as pd

df = pd.read_csv("https://raw.githubusercontent.com/plotly/datasets/master/violin_data.csv")

fig = go.Figure()

fig.add_trace(go.Violin(x=df['day'][df['smoker'] == 'Yes'],
                        y=df['total_bill'][df['smoker'] == 'Yes'],
                        legendgroup='Yes', scalegroup='Yes', name='Yes',
                        side='negative',
                        line_color='blue')
             )
fig.add_trace(go.Violin(x=df['day'][df['smoker'] == 'No'],
                        y=df['total_bill'][df['smoker'] == 'No'],
                        legendgroup='No', scalegroup='No', name='No',
                        side='positive',
                        line_color='orange')
             )
fig.update_traces(meanline_visible=True)
fig.update_layout(violingap=0, violinmode='overlay')
fig.show()

Advanced violin plot

import plotly.graph_objects as go

import pandas as pd

df = pd.read_csv("https://raw.githubusercontent.com/plotly/datasets/master/violin_data.csv")

pointpos_male = [-0.9,-1.1,-0.6,-0.3]
pointpos_female = [0.45,0.55,1,0.4]
show_legend = [True,False,False,False]

fig = go.Figure()

# One male (left half) and one female (right half) violin per day
for i, day in enumerate(pd.unique(df['day'])):
    male = (df['sex'] == 'Male') & (df['day'] == day)
    female = (df['sex'] == 'Female') & (df['day'] == day)
    fig.add_trace(go.Violin(x=df['day'][male],
                            y=df['total_bill'][male],
                            legendgroup='M', scalegroup='M', name='M',
                            side='negative',
                            pointpos=pointpos_male[i],  # where to position points
                            line_color='lightseagreen',
                            showlegend=show_legend[i])
                  )
    fig.add_trace(go.Violin(x=df['day'][female],
                            y=df['total_bill'][female],
                            legendgroup='F', scalegroup='F', name='F',
                            side='positive',
                            pointpos=pointpos_female[i],
                            line_color='mediumpurple',
                            showlegend=show_legend[i])
                  )

# update characteristics shared by all traces
fig.update_traces(meanline_visible=True,
                  points='all',  # show all points
                  jitter=0.05,  # add some jitter on points for better visibility
                  scalemode='count')  # scale violin plot area with total count
fig.update_layout(
    title_text="Total bill distribution<br><i>scaled by number of bills per gender",
    violingap=0, violingroupgap=0, violinmode='overlay')
fig.show()
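`scalemode='count'` sizes each violin by the number of samples behind it, and `scalegroup` decides which traces are compared with each other (all 'M' traces form one group, all 'F' traces another). The NumPy sketch below is a simplified model of that idea, not Plotly's implementation: it multiplies each group's density by its sample count and divides every curve by the largest value in the scale group, so the biggest group reaches full width and smaller groups come out narrower. The group sizes, the fixed bandwidth, and the `kde` helper are all assumptions made for illustration.

```python
import numpy as np


def kde(samples, grid, bandwidth):
    """Gaussian kernel density estimate of `samples`, evaluated on `grid`."""
    u = (grid[:, None] - samples[None, :]) / bandwidth
    return np.exp(-0.5 * u**2).sum(axis=1) / (samples.size * bandwidth * np.sqrt(2 * np.pi))


rng = np.random.default_rng(1)
# Hypothetical per-day samples with different sizes
groups = {
    "Thur": rng.normal(18, 6, 60),
    "Sat": rng.normal(21, 8, 90),
    "Sun": rng.normal(22, 7, 30),
}
grid = np.linspace(0, 50, 400)
bandwidth = 2.5  # fixed bandwidth, chosen arbitrarily for this sketch

densities = {day: kde(s, grid, bandwidth) for day, s in groups.items()}

# Model of scalemode='width': every violin reaches the same maximum width
width_mode = {day: d / d.max() for day, d in densities.items()}

# Model of scalemode='count': weight by sample count, then normalize across the scale group
weighted = {day: groups[day].size * d for day, d in densities.items()}
peak = max(curve.max() for curve in weighted.values())
count_mode = {day: curve / peak for day, curve in weighted.items()}

for day in groups:
    print(day, round(width_mode[day].max(), 2), round(count_mode[day].max(), 2))
```

In this model, the 30-sample "Sun" group comes out visibly narrower than the larger groups under 'count', while under 'width' all three violins are drawn equally wide.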

Ridgeline plot

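A ridgeline plot (previously known as a Joy Plot) shows the distribution of a numerical value for several groups. It can be used to visualize how a distribution changes over time or space.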

import plotly.graph_objects as go
from plotly.colors import n_colors
import numpy as np
np.random.seed(1)

# 12 sets of normal distributed random data, with increasing mean and standard deviation
data = (np.linspace(1, 2, 12)[:, np.newaxis] * np.random.randn(12, 200) +
(np.arange(12) + 2 * np.random.random(12))[:, np.newaxis])

colors = n_colors('rgb(5, 200, 200)', 'rgb(200, 10, 10)', 12, colortype='rgb')

fig = go.Figure()
# One horizontal violin per row of data, each with its own color
for data_line, color in zip(data, colors):
    fig.add_trace(go.Violin(x=data_line, line_color=color))

fig.update_traces(orientation='h', side='positive', width=3, points=False)
fig.update_layout(xaxis_showgrid=False, xaxis_zeroline=False)
fig.show()
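`n_colors` returns a list of colors that steps from the first color to the second by linear interpolation in RGB space, which is how each ridge gets its own shade. The pure-Python sketch below illustrates that interpolation; the helper `interpolate_rgb` is hypothetical, it is not Plotly's source code, and the string formatting is my own choice rather than the exact format `n_colors` returns.

```python
def interpolate_rgb(low, high, n):
    """Return n evenly spaced RGB triples from `low` to `high`, both included."""
    if n < 2:
        raise ValueError("need at least two colors")
    # Per-channel step size between consecutive colors
    steps = [(hi - lo) / (n - 1) for lo, hi in zip(low, high)]
    return [tuple(lo + i * s for lo, s in zip(low, steps)) for i in range(n)]


colors = interpolate_rgb((5, 200, 200), (200, 10, 10), 12)
# Format as CSS-style strings, a form line_color accepts
css = [f"rgb({r:.0f}, {g:.0f}, {b:.0f})" for r, g, b in colors]
print(css[0], css[-1])  # rgb(5, 200, 200) rgb(200, 10, 10)
```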

Violin plot with only points

A strip chart is like a violin plot with the points shown and no violin:

import plotly.express as px
df = px.data.tips()
fig = px.strip(df, x='day', y='tip')
fig.show()
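A strip chart places each observation at its category's position and spreads overlapping points sideways with a small random offset (jitter). The NumPy sketch below shows that idea for categorical x positions; `jitter_positions`, its `jitter` half-width, and the sample of days are hypothetical, and px.strip computes its own offsets internally.

```python
import numpy as np


def jitter_positions(categories, order, jitter=0.3, seed=0):
    """Map each label to its index in `order`, plus uniform noise in [-jitter, jitter)."""
    rng = np.random.default_rng(seed)
    index = {name: i for i, name in enumerate(order)}
    base = np.array([index[c] for c in categories], dtype=float)
    return base + rng.uniform(-jitter, jitter, size=base.size)


days = ["Thur", "Fri", "Sat", "Sun", "Sat", "Sun", "Thur"]  # hypothetical sample
x = jitter_positions(days, order=["Thur", "Fri", "Sat", "Sun"])
print(np.round(x, 2))
```

Keeping the jitter well below 0.5 ensures that points from neighboring categories never overlap.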

Choosing the algorithm for computing quartiles

By default, quartiles for violin plots are computed using the linear method (for more about linear interpolation, see method #10 listed at http://jse.amstat.org/v14n3/langford.html and https://en.wikipedia.org/wiki/Quartile).

However, you can also choose to compute quartiles with the exclusive or the inclusive algorithm.

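The exclusive algorithm uses the median to divide the ordered dataset into two halves. If the sample size is odd, the median is not included in either half. Q1 is the median of the lower half and Q3 is the median of the upper half.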

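The inclusive algorithm also uses the median to divide the ordered dataset into two halves, but if the sample size is odd, the median is included in both halves. Q1 is the median of the lower half and Q3 is the median of the upper half.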
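The two median-split rules can be written out in a few lines of pure Python, which is handy for checking by hand what `quartilemethod='exclusive'` or `'inclusive'` should report on a small sample. This sketch follows the definitions exactly as described; the helper name `median_split_quartiles` is hypothetical, and the default 'linear' method is not reproduced here.

```python
from statistics import median


def median_split_quartiles(values, method="exclusive"):
    """Return (Q1, Q3) by splitting the sorted data at its median."""
    data = sorted(values)
    n = len(data)
    if n < 2:
        raise ValueError("need at least two values")
    half = n // 2
    if n % 2 == 0:
        # Even size: both methods split into two equal halves
        lower, upper = data[:half], data[half:]
    elif method == "exclusive":
        # Odd size: leave the median out of both halves
        lower, upper = data[:half], data[half + 1:]
    elif method == "inclusive":
        # Odd size: put the median into both halves
        lower, upper = data[:half + 1], data[half:]
    else:
        raise ValueError(f"unknown method: {method!r}")
    return median(lower), median(upper)


data = [1, 2, 3, 4, 5, 6, 7]
print(median_split_quartiles(data, "exclusive"))  # (2, 6)
print(median_split_quartiles(data, "inclusive"))  # (2.5, 5.5)
```

The two methods only disagree when the sample size is odd; for an even number of values they give identical quartiles.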

import plotly.express as px

df = px.data.tips()
fig = px.violin(df, y="total_bill")
fig.update_traces(quartilemethod="exclusive") # or "inclusive", or "linear" by default

fig.show()

Reference

See the px.violin() function reference or https://plotly.com/python/reference/violin/ for more information and chart attribute options!

Closing remarks

My 203rd blog post is done. Happy!!!!

Today is another day full of hope.