Matplotlib | Violin Plot. Mean, Median, Extrema, and Color Explained (violinplot)

You want to understand how your data is distributed and to see its extremes and median at a glance, but you may be unsure how to draw a violin plot or how to read one.

This article explains how to draw violin plots in Matplotlib, how to customize them, and how to display the distribution, mean, median, and extrema of your data in an easy-to-understand way.

Improve your data analysis skills and grasp the essence of your data!

Axes.violinplot function

To draw a violin plot, pass the data as the first argument to Axes.violinplot.

Axes.violinplot
Parameters
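  • dataset (array or sequence of arrays) : The input data.
  • positions (array) : The positions of the violins along the axis. Default is [1, 2, ..., n].
  • vert (bool) : If True, draws vertical violins; if False, draws horizontal violins. Default is True.
  • widths (float, array) : Either a scalar or a vector that sets the maximal width of each violin. Default is 0.5.
  • showmeans (bool) : Whether to draw a line at the mean. Default is False.
  • showextrema (bool) : Whether to draw lines at the minimum and maximum. Default is True.
  • showmedians (bool) : Whether to draw a line at the median. Default is False.
  • quantiles (array) : For each violin, a list of floats in the interval [0, 1] giving the quantiles to mark. It must contain one list per violin in dataset.
  • points (int) : The number of points at which each Gaussian kernel density estimate is evaluated. Default is 100.
  • bw_method (str, scalar, callable) : The method used to calculate the estimator bandwidth. This can be 'scott', 'silverman', a scalar constant, or a callable. If not given, 'scott' is used.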
Returns

dict

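A dictionary that maps each component of the violin plot (bodies, cmeans, cmins, cmaxes, cbars, cmedians, cquantiles) to the artists created for it. A key is present only when the corresponding component is drawn.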

Official Documentation
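
To see what the returned dictionary actually contains, the short sketch below prints each key and the type of artist stored under it. The data is synthetic and the variable names (rng, data, parts) are illustrative, not taken from the article's examples.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to open a window
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
data = [rng.normal(0, std, 100) for std in (1, 2, 3)]

fig, ax = plt.subplots()
parts = ax.violinplot(data, showmeans=True, showmedians=True)

# 'bodies' holds a list with one filled shape per violin;
# the other keys each hold a single line collection.
for key, artist in parts.items():
    if isinstance(artist, list):
        print(key, "-> list of", len(artist), type(artist[0]).__name__)
    else:
        print(key, "->", type(artist).__name__)
```

Every customization later in this article works by looking up one of these keys and calling a setter on the artist.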

Basic Violin Plots (dataset)

A 1D array produces a single violin.
A 2D array produces one violin per column, and a list of 1D arrays produces one violin per list element.

import matplotlib.pyplot as plt
import numpy as np

# step1 Fix the random numbers generated
np.random.seed(19680801)
# step2 Create data
all_data = [np.random.normal(0, std, 100) for std in range(7, 10)]
labels = ['x1', 'x2', 'x3']
# step3 Create graph frames
fig, ax = plt.subplots()
# step4 Plot violin plots
ax.violinplot(all_data)

ax.set_xticks(np.arange(1, len(labels) + 1), labels=labels)
ax.set_xlabel('X label')
ax.set_ylabel('Y label')
ax.set_title('Basic Violin')

plt.show()
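
As a quick check of how the shape of dataset maps to the number of violins, the sketch below draws a 1D array, a 2D array, and a list of arrays of different lengths, and counts the bodies in each result. The array sizes are arbitrary choices made for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to open a window
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(19680801)

one_d = rng.normal(size=100)                                # 1D array: one violin
two_d = rng.normal(size=(100, 3))                           # 2D array: one violin per column
ragged = [rng.normal(size=n) for n in (50, 80, 120, 200)]   # list: one violin per item

fig, axes = plt.subplots(1, 3, figsize=(9, 3))
counts = []
for ax, data, title in zip(axes, (one_d, two_d, ragged), ("1D", "2D", "list")):
    parts = ax.violinplot(data)
    counts.append(len(parts["bodies"]))
    ax.set_title(f"{title}: {counts[-1]} violin(s)")

print(counts)
```

A list of arrays is the more flexible form, because the groups do not need to have the same number of samples.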

Place horizontally (vert)

Violin plots are vertical by default (vert=True). Pass vert=False to draw them horizontally.

# step4 Plot violin plots
ax.violinplot(all_data, vert=False)

Adjust maximum width (widths)

The maximum width of the data distribution is set by widths, which can be specified as a scalar or an array.

widths = scalar

If specified as a scalar, all violin plots are drawn with the same maximum width.

Default is widths=0.5

# step4 Plot violin plots
ax.violinplot(all_data, widths=0.5)

widths = array

The array must have one element per violin.

Each element sets the maximum width of the corresponding violin.

# step4 Plot violin plots
ax.violinplot(all_data, widths=[0.2, 0.5, 1])
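
The widths argument is often combined with positions from the parameter list above, since wider violins need more room along the axis. The following is a minimal sketch with made-up positions and widths; it also reads the drawn shapes back to show that each violin is centered on its position and spans exactly its width.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to open a window
import matplotlib.pyplot as plt
import numpy as np

np.random.seed(19680801)
all_data = [np.random.normal(0, std, 100) for std in range(7, 10)]
labels = ['x1', 'x2', 'x3']

positions = [1, 3, 6]        # illustrative, unevenly spaced positions
widths = [0.5, 1.0, 2.0]     # one maximum width per violin

fig, ax = plt.subplots()
parts = ax.violinplot(all_data, positions=positions, widths=widths)
ax.set_xticks(positions, labels=labels)

# Measure the horizontal extent of each violin body
extents = []
for body in parts['bodies']:
    xs = body.get_paths()[0].vertices[:, 0]
    extents.append((xs.min(), xs.max()))
print(extents)
```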

Display the mean, extrema, and median

Lines marking the mean, extrema, and median make a violin plot easier to read, just as they do in a box plot.

First, let's draw a violin plot with all of them hidden.

# step4 Plot violin plots
ax.violinplot(all_data, showmeans=False, showextrema=False, showmedians=False)

Mean value (showmeans)

Set the showmeans argument of violinplot to True.

In the code below, set_color changes the color of the mean line, which is stored under the cmeans key.

# step4 Plot violin plots
showmeans = ax.violinplot(all_data, showmeans=True, showextrema=False, showmedians=False)
showmeans['cmeans'].set_color('C1')

Extrema (showextrema)

Set the showextrema argument of violinplot to True.

In the code below, set_color changes the colors of the minimum (cmins) and maximum (cmaxes) lines.

# step4 Plot violin plots
showextrem = ax.violinplot(all_data, showmeans=False, showextrema=True, showmedians=False)
showextrem['cmins'].set_color('C2')
showextrem['cmaxes'].set_color('C3')

Median value (showmedians)

Set the showmedians argument of violinplot to True.

In the code below, set_color changes the color of the median line, which is stored under the cmedians key.

# step4 Plot violin plots
showmedians = ax.violinplot(all_data, showmeans=False, showextrema=False, showmedians=True)
showmedians['cmedians'].set_color('C1')
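
The quantiles parameter listed earlier follows the same pattern: the lines it draws are returned under the cquantiles key. The sketch below marks the 25% and 75% quantiles on every violin and colors each kind of line differently. The quantile levels and colors are arbitrary choices for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to open a window
import matplotlib.pyplot as plt
import numpy as np

np.random.seed(19680801)
all_data = [np.random.normal(0, std, 100) for std in range(7, 10)]

fig, ax = plt.subplots()
parts = ax.violinplot(
    all_data,
    showmeans=True,
    showmedians=True,
    quantiles=[[0.25, 0.75]] * len(all_data),  # one list of levels per violin
)

parts['cmeans'].set_color('C1')      # mean lines
parts['cmedians'].set_color('C2')    # median lines
parts['cquantiles'].set_color('C3')  # 25% and 75% quantile lines

n_quantile_lines = len(parts['cquantiles'].get_segments())
print(n_quantile_lines)
```

With the quartiles marked, the violin carries the same summary information as the box of a box plot.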

Set Colors (bodies)

To color the violins themselves, use the bodies key of the returned dict. It holds a list with one filled shape (a PolyCollection) per violin.

Various settings are available, but the most commonly used color settings are described below.

Color of data distribution area (set_color, set_facecolor)

Customize the fill color of the data distribution area.

Call set_color on each element of bodies. It sets both the face color and the edge color.

# step4 Plot violin plots
vio = ax.violinplot(all_data)

# Loop over the violin bodies stored under the 'bodies' key
for body in vio['bodies']:
    body.set_color('C1')

plt.show()

To change only the fill and leave the edge color as it is, call set_facecolor on each element of bodies instead.

# Loop over the violin bodies stored under the 'bodies' key
for body in vio['bodies']:
    body.set_facecolor('C2')

Color and width of the data distribution frame (set_edgecolor, set_linewidth)

Customize the color of the outline of the data distribution.

Call set_edgecolor on each element of bodies.

Setting the line width with set_linewidth at the same time makes the outline easier to see.

# step4 Plot violin plots
vio = ax.violinplot(all_data)

# Loop over the violin bodies stored under the 'bodies' key
for body in vio['bodies']:
    body.set_edgecolor('red')
    body.set_linewidth(3)

plt.show()
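
The setters above can be combined to give each violin its own color. The sketch below is one possible styling, not a recommended standard: the color list, the alpha value, and the black lines are arbitrary choices.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to open a window
import matplotlib.pyplot as plt
import numpy as np

np.random.seed(19680801)
all_data = [np.random.normal(0, std, 100) for std in range(7, 10)]

fig, ax = plt.subplots()
vio = ax.violinplot(all_data)

# One fill color per violin, with a shared outline and transparency
colors = ['C0', 'C1', 'C2']
for body, color in zip(vio['bodies'], colors):
    body.set_facecolor(color)
    body.set_edgecolor('black')
    body.set_alpha(0.6)

# The extrema lines and the vertical bar are separate artists
for key in ('cmins', 'cmaxes', 'cbars'):
    vio[key].set_color('black')
    vio[key].set_linewidth(1)
```

Note that the lines (cmins, cmaxes, cbars) are not affected by changes to bodies, so they have to be styled separately.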

Comparison of Violin and Box plot

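The violin plot is especially useful when the data distribution has multiple peaks, which a box plot cannot show. Many violin plots also mark the median (or mean) and the interquartile range inside the shape. In Matplotlib these markers are optional and are enabled with showmeans, showmedians, and quantiles.

Note that violin plots are less widely known than box plots, so readers who have not seen one before may find them hard to interpret.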

Violin plot

Shows the estimated density of the whole distribution, not just summary statistics.

# step4 Plot violin plots
vio = ax.violinplot(all_data, showmeans=True, showmedians=False)
vio['cmeans'].set_color('C1')

ax.set_xticks(np.arange(1, len(labels) + 1), labels=labels)

Box-and-Whisker plot

Shows only summary statistics: the median, the interquartile range, and whiskers with outliers.

# step4 Plot Box-and-Whisker plots
ax.boxplot(all_data)

ax.set_xticks(np.arange(1, len(labels) + 1), labels=labels)
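
To make the difference concrete, the sketch below draws a two-peaked sample and a single-peaked sample with a similar spread, once as violins and once as boxes. The data is synthetic and the parameters of the normal distributions are assumptions chosen only to make the two peaks obvious.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to open a window
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(1)
# Two peaks around -3 and +3
bimodal = np.concatenate([rng.normal(-3, 1, 200), rng.normal(3, 1, 200)])
# One peak around 0 with a similar overall spread
unimodal = rng.normal(0, 3.2, 400)

fig, (ax_v, ax_b) = plt.subplots(1, 2, sharey=True, figsize=(8, 4))

vio = ax_v.violinplot([bimodal, unimodal], showmedians=True)
ax_v.set_title('Violin plot')

box = ax_b.boxplot([bimodal, unimodal])
ax_b.set_title('Box-and-Whisker plot')

for ax in (ax_v, ax_b):
    ax.set_xticks([1, 2], labels=['two peaks', 'one peak'])
```

In the violin panel the first shape narrows near zero and widens around each peak, while the two boxes look much alike because their medians and quartiles are similar.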
