I have a few Pandas DataFrames sharing the same value scale, but having different columns and indices. When invoking df.plot(), I get separate plot images. what I really want is to have them all in the same plot as subplots, but I’m unfortunately failing to come up with a solution to how and would highly appreciate some help.
You can manually create the subplots with matplotlib, and then plot the dataframes on a specific subplot using the ax keyword. For example for 4 subplots (2×2):
Here axes is an array which holds the different subplot axes, and you can access one just by indexing axes.
If you want a shared x-axis, then you can provide sharex=True to plt.subplots.
You can see e.gs. in the documentation demonstrating joris answer. Also from the documentation, you could also set subplots=True and layout=(,) within the pandas plot function:
df.plot(subplots=True, layout=(1,2))
You could also use fig.add_subplot() which takes subplot grid parameters such as 221, 222, 223, 224, etc. as described in the post here. Nice examples of plot on pandas data frame, including subplots, can be seen in this ipython notebook.
plt.figure(1)
plt.subplot(2,2,1)
df.A.plot()#no need to specify for first axis
plt.subplot(2,2,2)
df.B.plot(ax=plt.gca())
plt.subplot(2,2,3)
df.C.plot(ax=plt.gca())
You can use the familiar Matplotlib style calling a figure and subplot, but you simply need to specify the current axis using plt.gca(). An example:
plt.figure(1)
plt.subplot(2,2,1)
df.A.plot() #no need to specify for first axis
plt.subplot(2,2,2)
df.B.plot(ax=plt.gca())
plt.subplot(2,2,3)
df.C.plot(ax=plt.gca())
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
# dataframe sample data
df1 = pd.DataFrame(np.random.rand(10,2)*100, columns=['A','B'])
df2 = pd.DataFrame(np.random.rand(10,2)*100, columns=['A','B'])
df3 = pd.DataFrame(np.random.rand(10,2)*100, columns=['A','B'])
df4 = pd.DataFrame(np.random.rand(10,2)*100, columns=['A','B'])
df5 = pd.DataFrame(np.random.rand(10,2)*100, columns=['A','B'])
df6 = pd.DataFrame(np.random.rand(10,2)*100, columns=['A','B'])#define number of rows and columns for subplots
nrow=3
ncol=2# make a list of all dataframes
df_list =[df1 ,df2, df3, df4, df5, df6]
fig, axes = plt.subplots(nrow, ncol)# plot counter
count=0for r in range(nrow):for c in range(ncol):
df_list[count].plot(ax=axes[r,c])
count=+1
You can plot multiple subplots of multiple pandas data frames using matplotlib with a simple trick of making a list of all data frame. Then using the for loop for plotting subplots.
Working code:
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
# dataframe sample data
df1 = pd.DataFrame(np.random.rand(10,2)*100, columns=['A', 'B'])
df2 = pd.DataFrame(np.random.rand(10,2)*100, columns=['A', 'B'])
df3 = pd.DataFrame(np.random.rand(10,2)*100, columns=['A', 'B'])
df4 = pd.DataFrame(np.random.rand(10,2)*100, columns=['A', 'B'])
df5 = pd.DataFrame(np.random.rand(10,2)*100, columns=['A', 'B'])
df6 = pd.DataFrame(np.random.rand(10,2)*100, columns=['A', 'B'])
#define number of rows and columns for subplots
nrow=3
ncol=2
# make a list of all dataframes
df_list = [df1 ,df2, df3, df4, df5, df6]
fig, axes = plt.subplots(nrow, ncol)
# plot counter
count=0
for r in range(nrow):
for c in range(ncol):
df_list[count].plot(ax=axes[r,c])
count=+1
Using this code you can plot subplots in any configuration. You need to just define number of rows nrow and number of columns ncol. Also, you need to make list of data frames df_list which you wanted to plot.
x = np.linspace(0,2*np.pi,400)
y = np.sin(x**2)
f, axes = plt.subplots(2,1)for c, i in enumerate(axes):
axes[c].plot(x, y)
axes[c].set_title('cats')
plt.tight_layout()
You may not need to use Pandas at all. Here’s a matplotlib plot of cat frequencies:
x = np.linspace(0, 2*np.pi, 400)
y = np.sin(x**2)
f, axes = plt.subplots(2, 1)
for c, i in enumerate(axes):
axes[c].plot(x, y)
axes[c].set_title('cats')
plt.tight_layout()
import pandas as pd
import numpy as np # used for random dataimport random # used for random dataimport matplotlib.pyplot as plt
from matplotlib.patches importPatch# for custom legendimport seaborn as sns
import math import ceil # determine correct number of subplot# synthetic data
df_dict = dict()for i in range(1,7):
np.random.seed(i)
random.seed(i)
data_length =100
data ={'cat':[random.choice(['A','B','C'])for _ in range(data_length)],'x': np.random.rand(data_length),'y': np.random.rand(data_length)}
df_dict[i]= pd.DataFrame(data)# display(df_dict[1].head())
cat x y
0 A 0.4170220.3266451 C 0.7203240.5270582 A 0.0001140.8859423 B 0.3023330.3572704 A 0.1467560.908535
创建颜色映射并绘制
# create color mapping based on all unique values of cat
unique_cat ={cat for v in df_dict.values()for cat in v.cat.unique()}# get unique cats
colors = sns.color_palette('husl', n_colors=len(unique_cat))# get a number of colors
cmap = dict(zip(unique_cat, colors))# zip values to colors# iterate through dictionary and plot
col_nums =3# how many plots per row
row_nums = math.ceil(len(df_dict)/ col_nums)# how many rows of plots
plt.figure(figsize=(10,5))# change the figure size as neededfor i,(k, v)in enumerate(df_dict.items(),1):
plt.subplot(row_nums, col_nums, i)# create subplots
p = sns.scatterplot(data=v, x='x', y='y', hue='cat', palette=cmap)
p.legend_.remove()# remove the individual plot legends
plt.title(f'DataFrame: {k}')
plt.tight_layout()# create legend from cmap
patches =[Patch(color=v, label=k)for k, v in cmap.items()]# place legend outside of plot; change the right bbox value to move the legend up or down
plt.legend(handles=patches, bbox_to_anchor=(1.06,1.2), loc='center left', borderaxespad=0)
plt.show()
How to create multiple plots from a dictionary of dataframes with long (tidy) data
Assumptions
There is a dictionary of multiple dataframes of tidy data
Created by reading in from files
Created by separating a single dataframe into multiple dataframes
The categories, cat, may be overlapping, but all dataframes may not contain all values of cat
hue='cat'
Because dataframes are being iterated through, there’s not guarantee that colors will be mapped the same for each plot
A custom color map needs to be created from the unique 'cat' values for all the dataframes
Since the colors will be the same, place one legend to the side of the plots, instead of a legend in every plot
Imports and synthetic data
import pandas as pd
import numpy as np # used for random data
import random # used for random data
import matplotlib.pyplot as plt
from matplotlib.patches import Patch # for custom legend
import seaborn as sns
import math import ceil # determine correct number of subplot
# synthetic data
df_dict = dict()
for i in range(1, 7):
np.random.seed(i)
random.seed(i)
data_length = 100
data = {'cat': [random.choice(['A', 'B', 'C']) for _ in range(data_length)],
'x': np.random.rand(data_length),
'y': np.random.rand(data_length)}
df_dict[i] = pd.DataFrame(data)
# display(df_dict[1].head())
cat x y
0 A 0.417022 0.326645
1 C 0.720324 0.527058
2 A 0.000114 0.885942
3 B 0.302333 0.357270
4 A 0.146756 0.908535
Create color mappings and plot
# create color mapping based on all unique values of cat
unique_cat = {cat for v in df_dict.values() for cat in v.cat.unique()} # get unique cats
colors = sns.color_palette('husl', n_colors=len(unique_cat)) # get a number of colors
cmap = dict(zip(unique_cat, colors)) # zip values to colors
# iterate through dictionary and plot
col_nums = 3 # how many plots per row
row_nums = math.ceil(len(df_dict) / col_nums) # how many rows of plots
plt.figure(figsize=(10, 5)) # change the figure size as needed
for i, (k, v) in enumerate(df_dict.items(), 1):
plt.subplot(row_nums, col_nums, i) # create subplots
p = sns.scatterplot(data=v, x='x', y='y', hue='cat', palette=cmap)
p.legend_.remove() # remove the individual plot legends
plt.title(f'DataFrame: {k}')
plt.tight_layout()
# create legend from cmap
patches = [Patch(color=v, label=k) for k, v in cmap.items()]
# place legend outside of plot; change the right bbox value to move the legend up or down
plt.legend(handles=patches, bbox_to_anchor=(1.06, 1.2), loc='center left', borderaxespad=0)
plt.show()