plt.figure(figsize=(10, 6))
sns.scatterplot(x='Year', y='ratings', color='hotpink', data=disney_df)
plt.title('Rotten Tomatoes rating for Disney movies by year')
plt.ylabel('Ratings out of 100')
plt.yticks(ticks=range(0, 100, 5))
plt.xticks(ticks=range(1920, 2024, 10))

# using regplot to add a regression line
# sns.regplot(x='Year',
#             y='ratings',
#             data=disney_df,
#             scatter=False,  # Disable scatter points to avoid duplicate points
#             color='skyblue')
# plt.ylabel('Ratings out of 100')
# plt.yticks(ticks=range(0,100,5))
plt.show()
A trend line is a visual representation of the relationship between x and y. It:
can be used to predict future values
summarises the overall direction and strength of the relationship
helps identify outliers
# Using sns.lmplot
sns.lmplot(x='Year',
           y='ratings',
           height=6,
           aspect=1.5,
           data=disney_df,
           scatter_kws={'color': 'turquoise'},
           line_kws={'color': 'hotpink'})  # Change the color of the regression line
plt.title('Rotten Tomatoes rating for Disney movies by year')
plt.ylabel('Ratings out of 100')
plt.yticks(ticks=range(0, 105, 5))
plt.xticks(ticks=range(1920, 2024, 10))
plt.show()
Regression Line: represents the trend of the data. The shaded area around the line represents the confidence interval, indicating the uncertainty of the regression estimate.
The regression line is relatively flat, suggesting that there is no strong trend in the ratings over time. This implies that the average Rotten Tomatoes rating for Disney movies has remained relatively stable over the decades.
There is significant variability in the ratings, especially in recent years (from the 1990s onwards), indicating that Disney has released movies with both very high and very low ratings.
Earlier years (1930s to 1950s) show fewer movies, with a tendency towards lower ratings compared to later years.
Recent Years: The dense clustering of data points in recent years indicates that Disney has released a higher number of movies. The ratings for these movies vary widely, but there is no clear upward or downward trend in average ratings.
Flat Regression Line: The lack of a strong slope in the regression line suggests that, on average, Disney movies’ Rotten Tomatoes ratings have not significantly improved or declined over time.
High Variability: The wide spread of points indicates that Disney has produced a diverse range of movies, with some receiving very high ratings and others very low ratings, particularly in recent decades.
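One way to put a number on "relatively flat" is to fit the least-squares line directly and read off its slope, together with the Pearson correlation. The sketch below uses made-up data (the `years` and `ratings` arrays are hypothetical stand-ins, not the Disney dataset); with the real data you would pass `disney_df['Year']` and `disney_df['ratings']` instead.

```python
import numpy as np

# Hypothetical stand-in data: release years and ratings (0-100) with no built-in trend
rng = np.random.default_rng(0)
years = rng.integers(1930, 2021, size=200)
ratings = np.clip(rng.normal(65, 20, size=200), 0, 100)

# Least-squares fit of: ratings = slope * year + intercept
slope, intercept = np.polyfit(years, ratings, deg=1)

# Pearson correlation: strength of the linear relationship, between -1 and 1
r = np.corrcoef(years, ratings)[0, 1]

print(f"slope: {slope:.3f} rating points per year")
print(f"change per decade: {slope * 10:.2f} rating points")
print(f"Pearson r: {r:.3f}")
```

A slope close to zero and an r close to zero would support the reading of the plot above: no clear upward or downward trend, with most of the variation coming from the spread of individual movies.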
Categorical Plots
These plots are used to show the distribution of data across different categories.
stripplot: sns.stripplot()
swarmplot: sns.swarmplot()
boxplot: sns.boxplot()
boxenplot (an enhanced box plot): sns.boxenplot()
violinplot: sns.violinplot()
catplot: A figure-level function for creating categorical plots.
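As a rough illustration of the figure-level function, the sketch below draws a box plot through `sns.catplot()`. The `toy` DataFrame and its column names are invented for the example; the `kind` argument selects which of the categorical plots is drawn.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs as a script; not needed in a notebook
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical toy data, not the penguin measurements
toy = pd.DataFrame({
    "group": ["a", "a", "a", "b", "b", "b", "c", "c", "c"],
    "value": [1.0, 2.0, 3.0, 2.5, 3.5, 4.5, 4.0, 5.0, 6.0],
})

# kind can be 'strip', 'swarm', 'box', 'boxen', 'violin', 'bar', 'point' or 'count'
grid = sns.catplot(data=toy, x="group", y="value", kind="box", height=4, aspect=1.5)
grid.set_axis_labels("Group", "Value")
plt.suptitle("catplot with kind='box'", y=1.02)
plt.show()
```

Because `catplot` is figure-level, it creates its own figure and returns a `FacetGrid`, so the size is set with `height` and `aspect` rather than `plt.figure(figsize=...)`.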
penguin_lter = pd.read_csv('/home/annie/Python/data/penguins_lter.csv')
penguin_df = penguin_lter.dropna(subset=['Sex'])  # drop NaNs from Sex column
penguin_df = penguin_df.drop(columns=['Sample Number', 'Individual ID', 'Stage',
                                      'Clutch Completion', 'Date Egg', 'Comments'])
penguin_df = penguin_df[penguin_df['Sex'] != '.']  # drop rows where Sex is recorded as '.'
penguin_df.head()
# penguin_df.info()
# penguin_df
sns.boxplot(data=penguin_df, x='Species', y='Body Mass (g)', palette='mako', hue='Species')
plt.xticks(rotation=45)
plt.xticks(ticks=[0, 1, 2], labels=['Adelie', 'Chinstrap', 'Gentoo'])  # rename tick labels
plt.title('Body Mass Distribution by species')
plt.show()
sns.boxenplot(data=penguin_df, x='Sex', y='Body Mass (g)', palette='viridis', hue='Sex')
plt.xticks(rotation=45)
plt.title('Body Mass Distribution by sex')
plt.show()
sns.violinplot(data=penguin_df, x='Species', y='Culmen Length (mm)', hue='Sex', palette='mako')
plt.xticks(ticks=[0, 1, 2], labels=['Adelie', 'Chinstrap', 'Gentoo'])  # rename tick labels
plt.legend(loc="upper left")
plt.title('Culmen Length Distribution by species and sex')
plt.show()
# sns.stripplot(x=disney_df['ratings'])
# plt.show()
sns.stripplot(x='Culmen Depth (mm)', y='Species', hue='Species', data=penguin_df)
plt.yticks(ticks=[0, 1, 2], labels=['Adelie', 'Chinstrap', 'Gentoo'])
plt.title('Culmen Depth Distribution by species')
plt.show()
# sns.swarmplot(x=disney_df['ratings'])
# plt.show()
sns.swarmplot(x='Flipper Length (mm)', y='Species', hue='Species', data=penguin_df)
plt.yticks(ticks=[0, 1, 2], labels=['Adelie', 'Chinstrap', 'Gentoo'])
plt.title('Flipper Length Distribution by species')
plt.show()
Distribution Plots
These plots show the distribution of a single variable.
histplot (aka histogram): sns.histplot()
kdeplot (Kernel Density Estimate plot): sns.kdeplot()
ecdfplot (Empirical Cumulative Distribution Function): sns.ecdfplot()
displot: A figure-level function for creating histograms and KDE plots.
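Of the plots listed above, the ECDF is probably the least familiar. The sketch below computes one by hand with NumPy to show what `sns.ecdfplot()` draws: for each sorted value, the proportion of observations at or below it. The `ecdf` helper and the sample values are made up for illustration.

```python
import numpy as np

def ecdf(values):
    """Return the sorted values and the cumulative proportion reached at each one.

    For tied values, the last of the ties carries the height of the ECDF step.
    """
    x = np.sort(np.asarray(values, dtype=float))
    y = np.arange(1, x.size + 1) / x.size
    return x, y

# Hypothetical ratings, for illustration only
sample = [76, 67, 90, 55, 82]
x, y = ecdf(sample)
for xi, yi in zip(x, y):
    print(f"{xi:5.1f} -> {yi:.1f}")
```

Unlike a histogram or KDE, the ECDF needs no bin width or bandwidth choice, which makes it a useful cross-check on the shape suggested by the other two.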
disney_df.head(2)
                   Title  Year  Age  Rotten Tomatoes  Disney+  Type  ratings
270           White Fang  2018   7+           76/100        1     0       76
712  Muppets Most Wanted  2014   7+           67/100        1     0       67
penguin_df.head(2)
# penguin_df['Island'].unique()
   studyName                              Species  Region     Island  Culmen Length (mm)  Culmen Depth (mm)  Flipper Length (mm)  Body Mass (g)     Sex  Delta 15 N (o/oo)  Delta 13 C (o/oo)
0    PAL0708  Adelie Penguin (Pygoscelis adeliae)  Anvers  Torgersen                39.1               18.7                181.0         3750.0    MALE                NaN                NaN
1    PAL0708  Adelie Penguin (Pygoscelis adeliae)  Anvers  Torgersen                39.5               17.4                186.0         3800.0  FEMALE            8.94956          -24.69454
sns.histplot(disney_df, x='Year', kde=True, color='hotpink')
plt.title('Distribution of Disney Movies between 1920-2020')
plt.show()
adelie_matrix = penguin_df[penguin_df['Species'] == 'Adelie Penguin (Pygoscelis adeliae)']
adelie_matrix = adelie_matrix.drop(columns=['Species', 'studyName', 'Region', 'Island', 'Sex',
                                            'Delta 15 N (o/oo)', 'Delta 13 C (o/oo)'])
adelie_matrix.head()
   Culmen Length (mm)  Culmen Depth (mm)  Flipper Length (mm)  Body Mass (g)
0                39.1               18.7                181.0         3750.0
1                39.5               17.4                186.0         3800.0
2                40.3               18.0                195.0         3250.0
4                36.7               19.3                193.0         3450.0
5                39.3               20.6                190.0         3650.0
# sns.heatmap(df)
plt.figure(figsize=(8, 6))
sns.heatmap(adelie_matrix.corr(), cmap='mako_r')
plt.title('Correlation Heatmap of Adelie Penguin Measurements')
plt.show()
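The heatmap colours the matrix returned by `DataFrame.corr()`, which holds pairwise Pearson correlations by default. To see what that matrix looks like, the sketch below rebuilds only the five rows displayed by `adelie_matrix.head()`; five rows are far too few for meaningful correlations, so treat the numbers purely as a shape check.

```python
import pandas as pd

# The five rows displayed by adelie_matrix.head(); a tiny sample, for illustration only
sample = pd.DataFrame(
    {
        "Culmen Length (mm)": [39.1, 39.5, 40.3, 36.7, 39.3],
        "Culmen Depth (mm)": [18.7, 17.4, 18.0, 19.3, 20.6],
        "Flipper Length (mm)": [181.0, 186.0, 195.0, 193.0, 190.0],
        "Body Mass (g)": [3750.0, 3800.0, 3250.0, 3450.0, 3650.0],
    },
    index=[0, 1, 2, 4, 5],
)

# Square, symmetric matrix with 1.0 on the diagonal (each column against itself)
corr = sample.corr()
print(corr.round(2))
```

Passing `annot=True` to `sns.heatmap()` writes these values into the cells, and fixing `vmin=-1, vmax=1` keeps the colour scale comparable between heatmaps.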
Multi-Plot Grids
These are used for plotting multiple plots in a grid layout.
peng = sns.FacetGrid(penguins,
                     col='species',  # creates separate columns for each unique value in the Species column
                     row='sex',
                     hue='sex',
                     palette="mako_r",
                     sharex=False)
peng.map(sns.histplot, 'bill_length_mm')
# plt.xticks(ticks=[0, 1, 2], labels=['Adelie', 'Chinstrap', 'Gentoo'])
plt.show()
penguin_df.head(2)
   studyName                              Species  Region     Island  Culmen Length (mm)  Culmen Depth (mm)  Flipper Length (mm)  Body Mass (g)     Sex  Delta 15 N (o/oo)  Delta 13 C (o/oo)
0    PAL0708  Adelie Penguin (Pygoscelis adeliae)  Anvers  Torgersen                39.1               18.7                181.0         3750.0    MALE                NaN                NaN
1    PAL0708  Adelie Penguin (Pygoscelis adeliae)  Anvers  Torgersen                39.5               17.4                186.0         3800.0  FEMALE            8.94956          -24.69454
sns.pairplot(x_vars=['Culmen Length (mm)', 'Culmen Depth (mm)', 'Flipper Length (mm)', 'Body Mass (g)'],
             y_vars=['Culmen Length (mm)', 'Culmen Depth (mm)', 'Flipper Length (mm)', 'Body Mass (g)'],
             hue='Species',
             data=penguin_df)
plt.suptitle('Pairplot of Penguin Measurements by Species', y=1.02)
plt.show()
# sns.PairGrid()
pg = sns.PairGrid(penguin_df,
                  x_vars=['Culmen Length (mm)', 'Culmen Depth (mm)', 'Flipper Length (mm)', 'Body Mass (g)'],
                  y_vars=['Culmen Length (mm)', 'Culmen Depth (mm)', 'Flipper Length (mm)', 'Body Mass (g)'],
                  hue='Species',
                  palette='cubehelix')
pg.map(sns.scatterplot)
pg.add_legend()
plt.suptitle('Pair Grid showing Penguin Measurements by Species', y=1.02)
plt.show()
pg2 = sns.PairGrid(penguin_df,
                   x_vars=['Culmen Length (mm)', 'Culmen Depth (mm)', 'Flipper Length (mm)', 'Body Mass (g)'],
                   y_vars=['Culmen Length (mm)', 'Culmen Depth (mm)', 'Flipper Length (mm)', 'Body Mass (g)'],
                   hue='Species',
                   palette='cubehelix')
pg2.map_upper(sns.scatterplot)
pg2.map_lower(sns.kdeplot)
pg2.map_diag(sns.histplot)
# pg.add_legend()
# plt.suptitle('Pair Grid showing Penguin Measurements by Species', y=1.02)
plt.show()
Joint Plots
These combine univariate and bivariate plots to show relationships between two variables.
Time Series Plots
These plots are used to visualise time series data.
lineplot: sns.lineplot()
disney_df.head(2)
                   Title  Year  Age  Rotten Tomatoes  Disney+  Type  ratings
270           White Fang  2018   7+           76/100        1     0       76
712  Muppets Most Wanted  2014   7+           67/100        1     0       67
plt.figure(figsize=(8, 6))
sns.lineplot(x='Year', y='ratings', data=disney_df, color='hotpink')
plt.title('Disney ratings over time')
plt.xticks(ticks=range(1920, 2024, 10))
plt.show()
Statistical Estimation
These plots are used to show statistical estimates of the data.
sns.barplot(data=penguin_df, x='Species', y='Culmen Length (mm)', color='lavender')
plt.xticks(ticks=[0, 1, 2], labels=['Adelie', 'Chinstrap', 'Gentoo'])
plt.title('Culmen Length of Penguins by species')
plt.show()
sns.barplot(data=penguin_df, x='Species', y='Culmen Length (mm)', hue='Sex', palette='mako')
plt.xticks(ticks=[0, 1, 2], labels=['Adelie', 'Chinstrap', 'Gentoo'])
plt.title('Culmen Length of Penguins by sex and species')
plt.show()
titanic = sns.load_dataset('titanic')
titanic.head()
   survived  pclass     sex   age  sibsp  parch     fare  embarked  class    who  adult_male  deck  embark_town  alive  alone
0         0       3    male  22.0      1      0   7.2500         S  Third    man        True   NaN  Southampton     no  False
1         1       1  female  38.0      1      0  71.2833         C  First  woman       False     C    Cherbourg    yes  False
2         1       3  female  26.0      0      0   7.9250         S  Third  woman       False   NaN  Southampton    yes   True
3         1       1  female  35.0      1      0  53.1000         S  First  woman       False     C  Southampton    yes  False
4         0       3    male  35.0      0      0   8.0500         S  Third    man        True   NaN  Southampton     no   True
sns.barplot(data=titanic, x='pclass', y='age', palette='viridis', hue='alive')
plt.title('Passenger Age by Class and Survival Status on the Titanic')
plt.show()
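In recent seaborn versions, `sns.barplot()` by default draws the mean of each group as the bar height and a bootstrapped 95% confidence interval as the error bar. The sketch below reproduces that idea by hand with NumPy; the helper `mean_with_bootstrap_ci` is a hypothetical name and only approximates what seaborn does internally. It is applied to the five ages shown in `titanic.head()`.

```python
import numpy as np

def mean_with_bootstrap_ci(values, n_boot=1000, level=95, seed=0):
    """Return the mean and a percentile bootstrap confidence interval for it."""
    values = np.asarray(values, dtype=float)
    values = values[~np.isnan(values)]  # missing values are dropped in this sketch
    rng = np.random.default_rng(seed)
    # Resample with replacement n_boot times and take the mean of each resample
    idx = rng.integers(0, values.size, size=(n_boot, values.size))
    boot_means = values[idx].mean(axis=1)
    tail = (100 - level) / 2
    low, high = np.percentile(boot_means, [tail, 100 - tail])
    return values.mean(), low, high

# The five ages shown in titanic.head(); far too few for a serious estimate
ages = [22.0, 38.0, 26.0, 35.0, 35.0]
mean, low, high = mean_with_bootstrap_ci(ages)
print(f"mean age: {mean:.1f}, 95% CI: [{low:.1f}, {high:.1f}]")
```

Narrow error bars on a bar plot therefore indicate a precisely estimated group mean, not a narrow spread of individual values; a box or violin plot is the better choice for showing the spread itself.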