Monthly Provisional Counts Of Deaths

In this post I use Python packages in RStudio to perform exploratory data analysis (EDA) and compute summary statistics on causes of death in the United States of America in 2019 and 2020.

Jaykumar Patel
2021-05-06

Analysis of the Monthly Provisional Death Counts

Importing Required Libraries

import pandas as pd
import numpy as np

Reading Data

# Use the first CSV column as the row index.
death = pd.read_csv('Data/causes_of_death.csv', index_col=0)
death.head()
   Date.Of.Death.Year  ...  Jurisdiction.of.Occurrence
1                2019  ...               United States
2                2019  ...               United States
3                2019  ...               United States
4                2019  ...               United States
5                2019  ...               United States

[5 rows x 23 columns]
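
Because `head()` elides most of the 23 columns, it can help to list every column with its dtype and missing-value count before summarizing. The sketch below is only an illustration: `toy` is a made-up stand-in for `death`, and its values are invented rather than taken from the dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for `death`; the values are invented for illustration.
toy = pd.DataFrame({
    "Date.Of.Death.Year": [2019, 2019, 2020],
    "AllCause": [120.0, np.nan, 95.0],
    "Sex": ["F", "Female", "M"],
})

# One row per column: its dtype and how many values are missing.
overview = pd.DataFrame({
    "dtype": toy.dtypes.astype(str),
    "missing": toy.isna().sum(),
})
print(overview)
```

Running the same two lines on `death` should show which cause-of-death columns have fewer non-missing values than others, which is why the `count` rows in the summaries below differ.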

Summary Statistics for Numerical Columns

AllCause

death['AllCause'].describe()
count     2849.000000
mean      2280.159354
std       6028.864306
min         10.000000
25%         86.000000
50%        271.000000
75%       1498.000000
max      53242.000000
Name: AllCause, dtype: float64

This column gives the total number of people who died in a given month of a given year in the United States.

NaturalCause

death['NaturalCause'].describe()
count     2717.000000
mean      2194.944056
std       5945.745418
min          0.000000
25%         70.000000
50%        216.000000
75%       1294.000000
max      52054.000000
Name: NaturalCause, dtype: float64

This column gives the total number of people who died of natural causes in a given month of a given year in the United States.

Septicemia (A40-A41)

death['Septicemia..A40.A41.'].describe()
count    1736.000000
mean       44.580069
std        88.269978
min         0.000000
25%         0.000000
50%        10.000000
75%        36.000000
max       484.000000
Name: Septicemia..A40.A41., dtype: float64

This column gives the number of people who died of septicemia (ICD-10 codes A40-A41) in a given month of a given year in the United States.

Malignant neoplasms (C00-C97)

death['Malignant.neoplasms..C00.C97.'].describe()
count    2249.000000
mean      549.167185
std      1277.215975
min         0.000000
25%        20.000000
50%        69.000000
75%       338.000000
max      6498.000000
Name: Malignant.neoplasms..C00.C97., dtype: float64

This column gives the total number of people who died of malignant neoplasms (ICD-10 codes C00-C97) in a given month of a given year in the United States.

For the remaining columns
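
One way to get the same summary for every numeric column at once, rather than one `describe()` call per column, is sketched here. It assumes the cause-of-death columns are stored as numeric dtypes; `toy` is a made-up frame with invented values, standing in for `death`.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for `death`; the values are invented for illustration.
toy = pd.DataFrame({
    "AllCause": [10.0, 86.0, 271.0, 1498.0],
    "NaturalCause": [0.0, 70.0, np.nan, 1294.0],
    "Sex": ["F", "M", "F", "M"],
})

# Keep only numeric columns, summarize them, and transpose so that
# each row is a column of the data and each column is a statistic.
summary = toy.select_dtypes(include="number").describe().T
print(summary)
```

The transposed layout keeps the table readable when there are many cause-of-death columns.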

Summary Statistics for Categorical Data

Race/Ethnicity

print(death.groupby(['Race.Ethnicity']).size())
Race.Ethnicity
Hispanic                                         500
Non-Hispanic American Indian or Alaska Native    500
Non-Hispanic Asian                               500
Non-Hispanic Black                               500
Non-Hispanic White                               500
Other                                            500
dtype: int64

Sex

print(death.groupby(['Sex']).size())
Sex
F         720
Female    780
M         720
Male      780
dtype: int64

Here also:
* The data are uniformly distributed across sex once the duplicate labels are combined: F and Female together account for 1,500 rows, as do M and Male.
* A uniform distribution makes it easier to compare the effect of each category.
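
The output above shows four labels for two categories (F, Female, M, Male). A minimal sketch of merging them before grouping is shown here; the sample values are invented, and on the real data the same `replace` call would be applied to `death['Sex']`.

```python
import pandas as pd

# Hypothetical sample of the Sex column; the real column is death['Sex'].
sex = pd.Series(["F", "Female", "M", "Male", "F", "Male"], name="Sex")

# Map the short labels onto the long ones; whole values are matched, not substrings.
sex_clean = sex.replace({"F": "Female", "M": "Male"})

counts = sex_clean.value_counts().sort_index()
print(counts)
```

After this step only two categories remain, so later group-wise comparisons do not split each sex across two labels.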

AgeGroup

print(death.groupby(['AgeGroup']).size())
AgeGroup
0-4 years            300
15-24 years          300
25-34 years          300
35-44 years          300
45-54 years          300
5-14 years           300
55-64 years          300
65-74 years          300
75-84 years          300
85 years and over    300
dtype: int64

Like the columns above, this column is uniformly distributed, which is useful if we want to train a model to predict the cause of death.

We can use the following columns:

Exploratory Data Analysis: Numeric Data

Plotting histogram for AllCause Column

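library(tidyverse)   # ggplot2, dplyr and the %>% pipe
library(reticulate)  # exposes Python objects to R as py$<name>

# themes() is the author's custom theme helper; its definition is not shown in this post.
py$death %>%
  ggplot(aes(AllCause)) +
  geom_histogram(aes(y = after_stat(density)), color = "#13B4FA", fill = "#FF6F91", bins = 40) +
  geom_density(fill = "#845EC2", alpha = 0.5, color = NA) +
  labs(title = "Distribution of Total Death",
       x = "Total Death",
       y = "density") +
  theme_minimal() +
  themes()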

Removing Skewness

# grid.arrange() comes from the gridExtra package.
library(gridExtra)

# Original scale
p1 <- py$death %>%
  ggplot(aes(AllCause)) +
  geom_histogram(aes(y = after_stat(density)), bins = 28, color = "#13B4FA", fill = "#FF6F91") +
  geom_density(fill = "#845EC2", alpha = 0.5, color = NA) +
  labs(title = "Total Death",
       x = "Total Death",
       y = "density") +
  theme_minimal() + themes()

# log10 transform
p2 <- py$death %>%
  ggplot(aes(log10(AllCause))) +
  geom_histogram(aes(y = after_stat(density)), bins = 28, color = "#13B4FA", fill = "#FFC75F") +
  geom_density(fill = "#845EC2", alpha = 0.5, color = NA) +
  labs(title = "Total Death (log10 based)",
       x = "log10(Total Death)",
       y = "density") +
  theme_minimal() + themes()

# Square-root transform
p3 <- py$death %>%
  ggplot(aes(sqrt(AllCause))) +
  geom_histogram(aes(y = after_stat(density)), bins = 28, color = "#13B4FA", fill = "#845EC2") +
  geom_density(fill = "#845EC2", alpha = 0.5, color = NA) +
  labs(title = "Total Death (sqrt based)",
       x = "sqrt(Total Death)",
       y = "density") +
  theme_minimal() + themes()

# Show the three histograms side by side in a two-column grid.
grid.arrange(p1, p2, p3, ncol = 2)
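
The plots compare the transforms visually. A numeric check is also possible with the sample skewness from pandas. The sketch below uses an invented right-skewed series (`all_cause`), not the real `AllCause` column, so the printed numbers only illustrate the pattern.

```python
import numpy as np
import pandas as pd

# Hypothetical right-skewed counts; on the real data use death['AllCause'].dropna().
all_cause = pd.Series([10, 20, 40, 86, 150, 271, 600, 1498, 9000, 53242], dtype=float)

# Sample skewness before and after each transform (0 means symmetric).
skew = pd.Series({
    "raw": all_cause.skew(),
    "log10": np.log10(all_cause).skew(),
    "sqrt": np.sqrt(all_cause).skew(),
})
print(skew.round(2))
```

For this toy series the log10 transform brings the skewness closest to zero, with the square root in between. Note that `log10` is only defined for positive counts, which holds for `AllCause` because its minimum is 10.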

Checking for outliers in the AllCause column

py$death %>% 
  ggplot(aes(y = AllCause))+
  geom_boxplot(outlier.colour = "#651a34")+
  theme_minimal()+
  themes()

Outliers after applying log10

py$death %>% 
  ggplot(aes(y = log10(AllCause)))+
  geom_boxplot(outlier.colour = "#651a34")+
  theme_minimal()+
  themes()
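
To count the points a boxplot would flag, the usual 1.5 × IQR rule (the default whisker rule in `geom_boxplot`) can be applied directly. This is a sketch on invented values: `iqr_outliers` is a hypothetical helper written for this example, and `all_cause` stands in for the real `AllCause` column.

```python
import numpy as np
import pandas as pd

def iqr_outliers(values: pd.Series) -> pd.Series:
    """Boolean mask of points lying more than 1.5 * IQR outside the quartiles."""
    q1, q3 = values.quantile([0.25, 0.75])
    iqr = q3 - q1
    return (values < q1 - 1.5 * iqr) | (values > q3 + 1.5 * iqr)

# Hypothetical right-skewed counts; on the real data use death['AllCause'].dropna().
all_cause = pd.Series([10, 20, 40, 86, 150, 271, 600, 1498, 9000, 53242], dtype=float)

raw_mask = iqr_outliers(all_cause)            # outliers on the original scale
log_mask = iqr_outliers(np.log10(all_cause))  # outliers after the log10 transform
print(int(raw_mask.sum()), int(log_mask.sum()))
```

On this toy series the two largest values are flagged on the original scale and none are flagged after the log10 transform, which is the kind of effect the two boxplots above are meant to show.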

Conclusion of numerical data exploration

Exploratory Data Analysis: Categorical Data

Plotting Bar Graph for AgeGroup column

# Count rows per age group.
Age <- py$death %>%
  select(AgeGroup) %>%
  group_by(AgeGroup) %>%
  summarize(frequency = n())

# geom_col() plots the values as given, so it takes no `stat` argument.
Age %>%
  ggplot(aes(AgeGroup, frequency)) +
  geom_col(fill = "#651a34") +
  coord_flip() +
  theme_minimal() +
  labs(title = "Distribution of Age Group") +
  themes()
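
Because AgeGroup is stored as text, it sorts alphabetically, which is why "5-14 years" appears after "45-54 years" in the earlier `groupby` output. A sketch of putting the counts in age order on the Python side is shown here. The labels in `age_order` are copied from that output, while the `age` sample is invented.

```python
import pandas as pd

# Age groups in their natural order (labels taken from the groupby output above).
age_order = [
    "0-4 years", "5-14 years", "15-24 years", "25-34 years", "35-44 years",
    "45-54 years", "55-64 years", "65-74 years", "75-84 years", "85 years and over",
]

# Hypothetical sample of the AgeGroup column; the real column is death['AgeGroup'].
age = pd.Series(
    ["5-14 years", "0-4 years", "45-54 years", "5-14 years", "85 years and over"],
    name="AgeGroup",
)

# Count each label, then reorder by age; groups absent from the sample get 0.
counts = age.value_counts().reindex(age_order, fill_value=0)
print(counts)
```

The same ordering could be passed to the plot so that the bars run from youngest to oldest.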

Citation

For attribution, please cite this work as

Patel (2021, May 6). Jaykumar Patel: Monthly Provisional Counts Of Deaths. Retrieved from https://jaykumar-patel.netlify.app/python/2021-05-06-monthly-provisional-counts-of-deaths/

BibTeX citation

@misc{patel2021monthly,
  author = {Patel, Jaykumar},
  title = {Jaykumar Patel: Monthly Provisional Counts Of Deaths},
  url = {https://jaykumar-patel.netlify.app/python/2021-05-06-monthly-provisional-counts-of-deaths/},
  year = {2021}
}