What The Plot | Annie Boskova

Seaborn plots exploration

import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
import seaborn as sns

Relational Plots

These plots show relationships between two or more variables.

scatterplot: sns.scatterplot()

relplot: A figure-level function for creating scatter and line plots.

movies = pd.read_csv('/home/annie/Python/data/MoviesOnStreamingPlatforms.csv')
movies.head()

	Unnamed: 0	ID	Title	Year	Age	Rotten Tomatoes	Netflix
0	0	1	The Irishman	2019	18+	98/100	1
1	1	2	Dangal	2016	7+	97/100	1
2	2	3	David Attenborough: A Life on Our Planet	2020	7+	95/100	1
3	3	4	Lagaan: Once Upon a Time in India	2001	7+	94/100	1
4	4	5	Roma	2018	18+	94/100	1

movies.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9515 entries, 0 to 9514
Data columns (total 11 columns):
 #   Column           Non-Null Count  Dtype 
---  ------           --------------  ----- 
 0   Unnamed: 0       9515 non-null   int64 
 1   ID               9515 non-null   int64 
 2   Title            9515 non-null   object
 3   Year             9515 non-null   int64 
 4   Age              5338 non-null   object
 5   Rotten Tomatoes  9508 non-null   object
 6   Netflix          9515 non-null   int64 
 7   Hulu             9515 non-null   int64 
 8   Prime Video      9515 non-null   int64 
 9   Disney+          9515 non-null   int64 
 10  Type             9515 non-null   int64 
dtypes: int64(8), object(3)
memory usage: 817.8+ KB

movie_ratings_df = movies.copy().drop(columns=['Unnamed: 0', 'ID'])
movie_ratings_df['ratings'] = movie_ratings_df['Rotten Tomatoes'].str.replace('/100', '').fillna('0').astype(int)
movie_ratings_df.head()
# movie_ratings_df[movie_ratings_df['ratings'] == '0']

	Title	Year	Age	Rotten Tomatoes	Netflix	ratings
0	The Irishman	2019	18+	98/100	1	98
1	Dangal	2016	7+	97/100	1	97
2	David Attenborough: A Life on Our Planet	2020	7+	95/100	1	95
3	Lagaan: Once Upon a Time in India	2001	7+	94/100	1	94
4	Roma	2018	18+	94/100	1	94

# movie_ratings_df.info()

disney_df = movie_ratings_df.copy().drop(columns=['Netflix', 'Hulu', 'Prime Video'])
disney_df = disney_df[disney_df['Disney+'] == 1]

disney_df.head()

	Title	Year	Age	Rotten Tomatoes	Disney+	ratings
270	White Fang	2018	7+	76/100	1	76
712	Muppets Most Wanted	2014	7+	67/100	1	67
1330	Zapped	2014	all	59/100	1	59
1813	The Blue Umbrella	2005	NaN	54/100	1	54
2029	Sky High	2020	NaN	51/100	1	51

plt.figure(figsize=(10, 6))
sns.scatterplot(x='Year', 
                y='ratings', 
                color='hotpink',
                data=disney_df)
plt.title('Rotten Tomatoes rating for Disney movies by year')
plt.ylabel('Ratings out of 100')
plt.yticks(ticks=range(0,100,5))
plt.xticks(ticks=range(1920, 2024, 10))

#using regplot to add a regression line

# sns.regplot(x='Year', 
#             y='ratings', 
#             data=disney_df,
#             scatter=False, # Disable scatter points to avoid duplicate points
#             color='skyblue')
# plt.ylabel('Ratings out of 100')
# plt.yticks(ticks=range(0,100,5))
plt.show()

disney_df[disney_df['ratings']<20].sort_values(by='Year', ascending=False)

	Title	Year	Age	Rotten Tomatoes	Disney+	ratings
9503	Disney My Music Story: Sukima Switch	2021	16+	16/100	1	16
9509	Built for Mars: The Perseverance Rover	2021	NaN	14/100	1	14
9502	Sharkcano	2020	NaN	16/100	1	16
9507	Texas Storm Squad	2020	13+	14/100	1	14
9508	What the Shark?	2020	13+	14/100	1	14
9510	Most Wanted Sharks	2020	NaN	14/100	1	14
9511	Doc McStuffins: The Doc Is In	2020	NaN	13/100	1	13
9505	Great Shark Chow Down	2019	7+	14/100	1	14
9512	Ultimate Viking Sword	2019	NaN	13/100	1	13
9514	Women of Impact: Changing the World	2019	7+	10/100	1	10
9504	Big Cat Games	2015	NaN	15/100	1	15
9513	Hunt for the Abominable Snowman	2011	NaN	10/100	1	10
9506	In Beaver Valley	1950	NaN	14/100	1	14

Regression line

trends –> visual representaion of the relationship between x and y
can be used to predict future values
summarises the overall direction and strength of the relationship
identifies outliers

# Using sns.lmplot
sns.lmplot(x='Year', 
                y='ratings', 
                height=6,
                aspect=1.5,
                data=disney_df,
                scatter_kws={'color': 'turquoise'},
                line_kws={'color': 'hotpink'})  # Change the color of the regression line

plt.title('Rotten Tomatoes rating for Disney movies by year')
plt.ylabel('Ratings out of 100')
plt.yticks(ticks=range(0,105,5))
plt.xticks(ticks=range(1920, 2024, 10))
plt.show()

Regression Line: represents the trend of the data. The shaded area around the line represents the confidence interval, indicating the uncertainty of the regression estimate.
The regression line is relatively flat, suggesting that there is no strong trend in the ratings over time. This implies that the average Rotten Tomatoes rating for Disney movies has remained relatively stable over the decades.
There is significant variability in the ratings, especially in recent years (from the 1990s onwards), indicating that Disney has released movies with both very high and very low ratings. Earlier years (1930s to 1950s) show fewer movies, with a tendency towards lower ratings compared to later years.
Recent Years: The dense clustering of data points in recent years indicates that Disney has released a higher number of movies. The ratings for these movies vary widely, but there is no clear upward or downward trend in average ratings.
Flat Regression Line: The lack of a strong slope in the regression line suggests that, on average, Disney movies’ Rotten Tomatoes ratings have not significantly improved or declined over time.
High Variability: The wide spread of points indicates that Disney has produced a diverse range of movies, with some receiving very high ratings and others very low ratings, particularly in recent decades.

Categorical Plots

These plots are used to show the distribution of data across different categories.

stripplot: sns.stripplot()
swarmplot: sns.swarmplot()
boxplot: sns.boxplot()
boxenplot (an enhanced box plot): sns.boxenplot()
violinplot: sns.violinplot()

catplot: A figure-level function for creating categorical plots.

penguin_lter = pd.read_csv('/home/annie/Python/data/penguins_lter.csv')
penguin_df = penguin_lter.dropna(subset=['Sex']) #drop NaNs from Sex column
penguin_df = penguin_df.drop(columns=['Sample Number', 'Individual ID', 'Stage', 'Clutch Completion', 'Clutch Completion', 'Date Egg', 'Comments'])
penguin_df = penguin_df[penguin_df['Sex'] != '.']
penguin_df.head()

# penguin_df.info()

	studyName	Species	Region	Island	Culmen Length (mm)	Culmen Depth (mm)	Flipper Length (mm)	Body Mass (g)	Sex	Delta 15 N (o/oo)	Delta 13 C (o/oo)
0	PAL0708	Adelie Penguin (Pygoscelis adeliae)	Anvers	Torgersen	39.1	18.7	181.0	3750.0	MALE	NaN	NaN
1	PAL0708	Adelie Penguin (Pygoscelis adeliae)	Anvers	Torgersen	39.5	17.4	186.0	3800.0	FEMALE	8.94956	-24.69454
2	PAL0708	Adelie Penguin (Pygoscelis adeliae)	Anvers	Torgersen	40.3	18.0	195.0	3250.0	FEMALE	8.36821	-25.33302
4	PAL0708	Adelie Penguin (Pygoscelis adeliae)	Anvers	Torgersen	36.7	19.3	193.0	3450.0	FEMALE	8.76651	-25.32426
5	PAL0708	Adelie Penguin (Pygoscelis adeliae)	Anvers	Torgersen	39.3	20.6	190.0	3650.0	MALE	8.66496	-25.29805

# penguin_lter['Clutch Completion'].unique()
# penguin_lter['studyName'].unique()
# penguin_lter['Date Egg'].unique()
# penguin_lter['Stage'].unique()

# penguin_clean[penguin_clean['Sex']=='.']
penguin_df.head()

	studyName	Species	Region	Island	Culmen Length (mm)	Culmen Depth (mm)	Flipper Length (mm)	Body Mass (g)	Sex	Delta 15 N (o/oo)	Delta 13 C (o/oo)
0	PAL0708	Adelie Penguin (Pygoscelis adeliae)	Anvers	Torgersen	39.1	18.7	181.0	3750.0	MALE	NaN	NaN
1	PAL0708	Adelie Penguin (Pygoscelis adeliae)	Anvers	Torgersen	39.5	17.4	186.0	3800.0	FEMALE	8.94956	-24.69454
2	PAL0708	Adelie Penguin (Pygoscelis adeliae)	Anvers	Torgersen	40.3	18.0	195.0	3250.0	FEMALE	8.36821	-25.33302
4	PAL0708	Adelie Penguin (Pygoscelis adeliae)	Anvers	Torgersen	36.7	19.3	193.0	3450.0	FEMALE	8.76651	-25.32426
5	PAL0708	Adelie Penguin (Pygoscelis adeliae)	Anvers	Torgersen	39.3	20.6	190.0	3650.0	MALE	8.66496	-25.29805

penguin_df.sample(5)

	studyName	Species	Region	Island	Culmen Length (mm)	Culmen Depth (mm)	Flipper Length (mm)	Body Mass (g)	Sex	Delta 15 N (o/oo)	Delta 13 C (o/oo)
309	PAL0910	Gentoo penguin (Pygoscelis papua)	Anvers	Biscoe	52.1	17.0	230.0	5550.0	MALE	8.27595	-26.11657
294	PAL0809	Gentoo penguin (Pygoscelis papua)	Anvers	Biscoe	46.4	15.0	216.0	4700.0	FEMALE	8.47938	-26.95470
50	PAL0809	Adelie Penguin (Pygoscelis adeliae)	Anvers	Biscoe	39.6	17.7	186.0	3500.0	FEMALE	8.46616	-26.12989
157	PAL0708	Chinstrap penguin (Pygoscelis antarctica)	Anvers	Dream	45.2	17.8	198.0	3950.0	FEMALE	8.88942	-24.49433
111	PAL0910	Adelie Penguin (Pygoscelis adeliae)	Anvers	Biscoe	45.6	20.3	191.0	4600.0	MALE	8.65466	-26.32909

penguin_df['Sex'].unique()

array(['MALE', 'FEMALE'], dtype=object)

#penguin_df

sns.boxplot(data=penguin_df, x='Species', y='Body Mass (g)', palette='mako', hue='Species')
plt.xticks(rotation=45)  
plt.xticks(ticks=[0, 1, 2], labels=['Adelie', 'Chinstrap', 'Gentoo']) # rename ticker labels
plt.title('Body Mass Distribution by species')
plt.show()

sns.boxenplot(data=penguin_df, x='Sex', y='Body Mass (g)', palette='viridis', hue='Sex')
plt.xticks(rotation=45)  
plt.title('Body Mass Distribution by sex')
plt.show()

# #disney_df
# sns.boxplot(x=disney_df['ratings'])
# plt.show()

# sns.boxenplot(x=disney_df['ratings'])
# plt.show()

sns.violinplot(data=penguin_df, x='Species', y='Culmen Length (mm)', hue='Sex', palette='mako')
plt.xticks(ticks=[0, 1, 2], labels=['Adelie', 'Chinstrap', 'Gentoo']) # rename ticker labels
plt.legend(loc="upper left")
plt.title('Culmen Length Distribution by species and sex')
plt.show()

# sns.stripplot(x=disney_df['ratings'])
# plt.show()

sns.stripplot(x='Culmen Depth (mm)', y='Species', hue='Species', data=penguin_df)
plt.yticks(ticks=[0, 1, 2], labels=['Adelie', 'Chinstrap', 'Gentoo'])
plt.title('Culmen Depth Distribution by species')
plt.show()

# sns.swarmplot(x=disney_df['ratings'])
# plt.show()

sns.swarmplot(x='Flipper Length (mm)',y='Species', hue='Species', data=penguin_df)
plt.yticks(ticks=[0, 1, 2], labels=['Adelie', 'Chinstrap', 'Gentoo'])
plt.title('Flipper Length Distribution by species')
plt.show()

Distribution Plots

These plots show the distribution of a single variable.

histplot (aka histogram): sns.histplot()
kdeplot (Kernel Density Estimate plot): sns.kdeplot()
ecdfplot (Empirical Cumulative Distribution Function): sns.ecdfplot()

displot: A figure-level function for creating histograms and KDE plots.

disney_df.head(2)

	Title	Year	Age	Rotten Tomatoes	Disney+	Type	ratings
270	White Fang	2018	7+	76/100	1	0	76
712	Muppets Most Wanted	2014	7+	67/100	1	0	67

penguin_df.head(2)
# penguin_df['Island'].unique()

	studyName	Species	Region	Island	Culmen Length (mm)	Culmen Depth (mm)	Flipper Length (mm)	Body Mass (g)	Sex	Delta 15 N (o/oo)	Delta 13 C (o/oo)
0	PAL0708	Adelie Penguin (Pygoscelis adeliae)	Anvers	Torgersen	39.1	18.7	181.0	3750.0	MALE	NaN	NaN
1	PAL0708	Adelie Penguin (Pygoscelis adeliae)	Anvers	Torgersen	39.5	17.4	186.0	3800.0	FEMALE	8.94956	-24.69454

sns.histplot(disney_df, x='Year', 
             kde=True, 
             color='hotpink')
plt.title('Distribution of Disney Movies between 1920-2020')
plt.show()

sns.kdeplot(disney_df, x='Year', color='hotpink')
plt.show()

sns.histplot(data=penguin_df, x='Body Mass (g)')
plt.show()

sns.kdeplot(x=disney_df['ratings']) #'Year'
plt.show()

sns.ecdfplot(x=disney_df['ratings'])
plt.show()

Matrix Plots

These plots are used to visualize data in matrix form.

heatmap: sns.heatmap()
clustermap (hierarchically-clustered heatmap): sns.clustermap()

adelie_matrix = penguin_df[penguin_df['Species']=='Adelie Penguin (Pygoscelis adeliae)']
adelie_matrix = adelie_matrix.drop(columns=['Species','studyName', 'Region', 'Island', 'Sex', 'Delta 15 N (o/oo)', 'Delta 13 C (o/oo)'])

adelie_matrix.head()

	Culmen Length (mm)	Culmen Depth (mm)	Flipper Length (mm)	Body Mass (g)
0	39.1	18.7	181.0	3750.0
1	39.5	17.4	186.0	3800.0
2	40.3	18.0	195.0	3250.0
4	36.7	19.3	193.0	3450.0
5	39.3	20.6	190.0	3650.0

# sns.heatmap(df)
plt.figure(figsize=(8, 6))
sns.heatmap(adelie_matrix.corr(), cmap='mako_r')
plt.title('Correlation Heatmap of Adelie Penguin Measurements')
plt.show()

Multi-Plot Grids

These are used for plotting multiple plots in a grid layout.

FacetGrid: sns.FacetGrid()
PairGrid: sns.PairGrid()
pairplot: sns.pairplot()

penguins = sns.load_dataset('penguins')
penguins.head()

	species	island	bill_length_mm	bill_depth_mm	flipper_length_mm	body_mass_g	sex
0	Adelie	Torgersen	39.1	18.7	181.0	3750.0	Male
1	Adelie	Torgersen	39.5	17.4	186.0	3800.0	Female
2	Adelie	Torgersen	40.3	18.0	195.0	3250.0	Female
3	Adelie	Torgersen	NaN	NaN	NaN	NaN	NaN
4	Adelie	Torgersen	36.7	19.3	193.0	3450.0	Female

peng = sns.FacetGrid(penguins, col='species', #creates separate columns for each unique value in the Species column
                     row='sex', hue='sex', palette="mako_r", sharex=False)
peng.map(sns.histplot, 'bill_length_mm')
# plt.xticks(ticks=[0, 1, 2], labels=['Adelie', 'Chinstrap', 'Gentoo'])

plt.show()

penguin_df.head(2)

	studyName	Species	Region	Island	Culmen Length (mm)	Culmen Depth (mm)	Flipper Length (mm)	Body Mass (g)	Sex	Delta 15 N (o/oo)	Delta 13 C (o/oo)
0	PAL0708	Adelie Penguin (Pygoscelis adeliae)	Anvers	Torgersen	39.1	18.7	181.0	3750.0	MALE	NaN	NaN
1	PAL0708	Adelie Penguin (Pygoscelis adeliae)	Anvers	Torgersen	39.5	17.4	186.0	3800.0	FEMALE	8.94956	-24.69454

sns.pairplot(x_vars = ['Culmen Length (mm)', 'Culmen Depth (mm)', 'Flipper Length (mm)', 'Body Mass (g)'],
             y_vars = ['Culmen Length (mm)', 'Culmen Depth (mm)', 'Flipper Length (mm)', 'Body Mass (g)'],
             hue = 'Species',
             data = penguin_df)
plt.suptitle('Pairplot of Penguin Measurements by Species', y=1.02)
plt.show()

# sns.PairGrid()
pg = sns.PairGrid(penguin_df,
                 x_vars = ['Culmen Length (mm)', 'Culmen Depth (mm)', 'Flipper Length (mm)', 'Body Mass (g)'],
                 y_vars = ['Culmen Length (mm)', 'Culmen Depth (mm)', 'Flipper Length (mm)', 'Body Mass (g)'],
                 hue='Species',
                 palette='cubehelix')
pg.map(sns.scatterplot)
pg.add_legend()
plt.suptitle('Pair Grid showing Penguin Measurements by Species', y=1.02)

plt.show()

pg2 = sns.PairGrid(penguin_df,
                 x_vars = ['Culmen Length (mm)', 'Culmen Depth (mm)', 'Flipper Length (mm)', 'Body Mass (g)'],
                 y_vars = ['Culmen Length (mm)', 'Culmen Depth (mm)', 'Flipper Length (mm)', 'Body Mass (g)'],
                 hue='Species',
                 palette='cubehelix')
pg2.map_upper(sns.scatterplot)
pg2.map_lower(sns.kdeplot)
pg2.map_diag(sns.histplot)
# pg.add_legend()
# plt.suptitle('Pair Grid showing Penguin Measurements by Species', y=1.02)

plt.show()

Joint Plots

These combine univariate and bivariate plots to show relationships between two variables.

jointplot: sns.jointplot()
JointGrid: sns.JointGrid()

iris = sns.load_dataset('iris')

iris.sample(5)

	sepal_length	sepal_width	petal_length	petal_width	species
36	5.5	3.5	1.3	0.2	setosa
46	5.1	3.8	1.6	0.2	setosa
52	6.9	3.1	4.9	1.5	versicolor
140	6.7	3.1	5.6	2.4	virginica
106	4.9	2.5	4.5	1.7	virginica

iris.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 150 entries, 0 to 149
Data columns (total 5 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   sepal_length  150 non-null    float64
 1   sepal_width   150 non-null    float64
 2   petal_length  150 non-null    float64
 3   petal_width   150 non-null    float64
 4   species       150 non-null    object 
dtypes: float64(4), object(1)
memory usage: 6.0+ KB

# sns.jointplot()

ir = sns.jointplot(data=iris, x="sepal_length", y="petal_length", 
              hue='species', 
              palette='husl')
plt.show()

# sns.jointplot()

ir = sns.jointplot(data=iris, x="sepal_length", y="petal_length", kind='hex')
plt.show()

# sns.JointGrid()

ir2 = sns.JointGrid(data=iris, x="sepal_width", y="petal_width",
              hue='species', 
              palette='husl')

ir2.plot(sns.scatterplot, sns.histplot)
plt.show()

# sns.JointGrid()

ir2 = sns.JointGrid(data=iris, x="sepal_width", y="petal_width",
              hue='species')
                   # height=8)

ir2.plot_joint(sns.scatterplot, s=100, palette='husl', marker="+")
ir2.plot_marginals(sns.histplot, kde=True, palette='mako')
plt.show()

Time Series Plots

These plots are used to visualise time series data.

lineplot: sns.lineplot()

disney_df.head(2)

	Title	Year	Age	Rotten Tomatoes	Disney+	Type	ratings
270	White Fang	2018	7+	76/100	1	0	76
712	Muppets Most Wanted	2014	7+	67/100	1	0	67

plt.figure(figsize=(8,6))
sns.lineplot(x='Year', 
                y='ratings', 
                data=disney_df,
                color='hotpink')
plt.title('Disney ratings over time')
plt.xticks(ticks=range(1920, 2024, 10)) 
plt.show()

Statistical Estimation

These plots are used to show statistical estimates of the data.

barplot: sns.barplot()
pointplot: sns.pointplot()
countplot: sns.countplot()

penguin_df.head(2)

	studyName	Species	Region	Island	Culmen Length (mm)	Culmen Depth (mm)	Flipper Length (mm)	Body Mass (g)	Sex	Delta 15 N (o/oo)	Delta 13 C (o/oo)
0	PAL0708	Adelie Penguin (Pygoscelis adeliae)	Anvers	Torgersen	39.1	18.7	181.0	3750.0	MALE	NaN	NaN
1	PAL0708	Adelie Penguin (Pygoscelis adeliae)	Anvers	Torgersen	39.5	17.4	186.0	3800.0	FEMALE	8.94956	-24.69454

adelie = penguin_df[penguin_df['Species']=='Adelie Penguin (Pygoscelis adeliae)']
adelie

	studyName	Species	Region	Island	Culmen Length (mm)	Culmen Depth (mm)	Flipper Length (mm)	Body Mass (g)	Sex	Delta 15 N (o/oo)	Delta 13 C (o/oo)
0	PAL0708	Adelie Penguin (Pygoscelis adeliae)	Anvers	Torgersen	39.1	18.7	181.0	3750.0	MALE	NaN	NaN
1	PAL0708	Adelie Penguin (Pygoscelis adeliae)	Anvers	Torgersen	39.5	17.4	186.0	3800.0	FEMALE	8.94956	-24.69454
2	PAL0708	Adelie Penguin (Pygoscelis adeliae)	Anvers	Torgersen	40.3	18.0	195.0	3250.0	FEMALE	8.36821	-25.33302
4	PAL0708	Adelie Penguin (Pygoscelis adeliae)	Anvers	Torgersen	36.7	19.3	193.0	3450.0	FEMALE	8.76651	-25.32426
5	PAL0708	Adelie Penguin (Pygoscelis adeliae)	Anvers	Torgersen	39.3	20.6	190.0	3650.0	MALE	8.66496	-25.29805
...	...	...	...	...	...	...	...	...	...	...	...
147	PAL0910	Adelie Penguin (Pygoscelis adeliae)	Anvers	Dream	36.6	18.4	184.0	3475.0	FEMALE	8.68744	-25.83060
148	PAL0910	Adelie Penguin (Pygoscelis adeliae)	Anvers	Dream	36.0	17.8	195.0	3450.0	FEMALE	8.94332	-25.79189
149	PAL0910	Adelie Penguin (Pygoscelis adeliae)	Anvers	Dream	37.8	18.1	193.0	3750.0	MALE	8.97533	-26.03495
150	PAL0910	Adelie Penguin (Pygoscelis adeliae)	Anvers	Dream	36.0	17.1	187.0	3700.0	FEMALE	8.93465	-26.07081
151	PAL0910	Adelie Penguin (Pygoscelis adeliae)	Anvers	Dream	41.5	18.5	201.0	4000.0	MALE	8.89640	-26.06967

146 rows × 11 columns

sns.barplot(data=penguin_df, x='Species', y='Culmen Length (mm)', color='lavender')
plt.xticks(ticks=[0, 1, 2], labels=['Adelie', 'Chinstrap', 'Gentoo'])
plt.title('Culmen Length of Penguins by species')
plt.show()

sns.barplot(data=penguin_df, x='Species', y='Culmen Length (mm)', hue='Sex', palette='mako')
plt.xticks(ticks=[0, 1, 2], labels=['Adelie', 'Chinstrap', 'Gentoo'])
plt.title('Culmen Length of Penguins by sex and species')
plt.show()

titanic = sns.load_dataset('titanic')
titanic.head()

	survived	pclass	sex	age	sibsp	fare	embarked	class	who	adult_male	deck	embark_town	alive	alone
0	0	3	male	22.0	1	7.2500	S	Third	man	True	NaN	Southampton	no	False
1	1	1	female	38.0	1	71.2833	C	First	woman	False	C	Cherbourg	yes	False
2	1	3	female	26.0	0	7.9250	S	Third	woman	False	NaN	Southampton	yes	True
3	1	1	female	35.0	1	53.1000	S	First	woman	False	C	Southampton	yes	False
4	0	3	male	35.0	0	8.0500	S	Third	man	True	NaN	Southampton	no	True

sns.barplot(data=titanic, x='pclass', y='age', palette='viridis', hue='alive')
plt.title('Passenger Age by Class and Survival Status on the Titanic')
plt.show()

# sns.pointplot()

sns.pointplot(data=penguin_df, x='Species', y='Culmen Length (mm)', hue='Sex', palette='viridis')
plt.xticks(ticks=[0, 1, 2], labels=['Adelie', 'Chinstrap', 'Gentoo'])
plt.show()

# sns.countplot()
sns.countplot(data=penguin_df, x='Species', color='turquoise')
plt.xticks(ticks=[0, 1, 2], labels=['Adelie', 'Chinstrap', 'Gentoo'])
plt.show()

penguin_df.Species.value_counts()#.sum()

Species
Adelie Penguin (Pygoscelis adeliae)          146
Gentoo penguin (Pygoscelis papua)            119
Chinstrap penguin (Pygoscelis antarctica)     68
Name: count, dtype: int64

(152/344)*100

44.18604651162791

(124/344)*100

36.04651162790697

(68/344)*100

19.767441860465116