Exploratory Data Analysis for Maternal and Child Health in Indiana: A Decadal Perspective (2010-2020)

This exploratory data analysis (EDA) examines maternal and infant health metrics in Indiana from 2010 to 2020. The dataset comprises 813,837 records covering health indicators, demographic characteristics, and regional descriptors. The analysis is organized around two grouping variables, MOTHER_RESID_COUNTY_TYPE and MOTHER_AGE_GRP, which break the data down by the type of county the mother lives in and by maternal age. Using descriptive statistics, data visualizations, and correlation analyses, the study aims to provide insights that can inform public health policy, healthcare programming, and further academic research in Indiana.

Introduction

Maternal and child health outcomes are key indicators of the quality of healthcare services, community well-being, and societal progress. Indiana, like many other states, faces diverse challenges in maternal and child healthcare, which may vary with factors such as age, geography, and socioeconomic conditions. This study uses a dataset spanning 2010 to 2020 to identify patterns and correlations that could serve as a foundation for public health initiatives and policy interventions.

Methodology

Data Source and Preprocessing

The dataset used in this study, Indiana Births and Infant Deaths from the Indiana Public Health Data Hub (Indiana State Department of Health, n.d.), comprises 19 variables, ranging from demographic data to specific health indicators for both mother and child. The data were cleaned and preprocessed before analysis. No missing values were found in the primary variables of interest, MOTHER_AGE_GRP and MOTHER_RESID_COUNTY_TYPE. Categorical variables were numerically encoded so that correlation matrices could be computed.
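
The preprocessing steps described above can be sketched in pandas as follows. This is a minimal illustration on a handful of made-up rows, not the study’s code: the age-group labels other than “25-34 Years”, the Y/N values of SMOKING_IND, and the column names AGE_CODE, SMOKER, and COUNTY_* are assumptions. It also makes an encoding choice the text does not specify, namely ordered integer codes for the age groups and one-hot (dummy) columns for the nominal county type. Because Pearson correlations are computed on these codes, the sign of any coefficient involving an encoded variable depends on which category receives the larger value.

```python
import pandas as pd

# Hypothetical toy rows; the real file has 813,837 records and 19 variables.
df = pd.DataFrame({
    "MOTHER_AGE_GRP": ["15-19 Years", "25-34 Years", "25-34 Years",
                       "35-44 Years", "20-24 Years", "25-34 Years"],
    "MOTHER_RESID_COUNTY_TYPE": ["Rural", "Urban", "Urban", "Mixed", "Urban", "Rural"],
    "SMOKING_IND": ["Y", "N", "N", "N", "Y", "N"],
})

# 1. Check the primary grouping variables for missing values.
missing = df[["MOTHER_AGE_GRP", "MOTHER_RESID_COUNTY_TYPE"]].isna().sum()

# 2. Ordinal encoding: larger code = older age group (assumed label set and order).
age_order = ["15-19 Years", "20-24 Years", "25-34 Years", "35-44 Years"]
age_code = pd.Categorical(df["MOTHER_AGE_GRP"], categories=age_order, ordered=True).codes

# 3. Binary encoding: 1 = smoker, 0 = non-smoker (assumed Y/N source values).
smoker = (df["SMOKING_IND"] == "Y").astype(int)

# 4. One-hot encoding for the nominal county type, so no artificial order is implied.
county = pd.get_dummies(df["MOTHER_RESID_COUNTY_TYPE"], prefix="COUNTY", dtype=int)

encoded = pd.concat(
    [pd.Series(age_code, index=df.index, name="AGE_CODE"), smoker.rename("SMOKER"), county],
    axis=1,
)
corr = encoded.corr()  # Pearson correlation matrix by default
print(missing)
print(corr.round(2))
```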

Analytical Tools and Techniques

The analysis was carried out with Python’s data science libraries: pandas for data wrangling, Matplotlib and seaborn for visualization, and scikit-learn for preliminary machine learning models.

Results and Discussion

Descriptive Statistics and Visualizations

Demographic Overview

The largest share of mothers falls in the “25-34 Years” age group, making it a critical demographic for targeted healthcare initiatives. This holds across urban, rural, and mixed counties. Urban counties, however, record more births than rural or mixed counties in every age bracket. Because these are raw counts, the difference likely reflects, at least in part, where Indiana’s population is concentrated, but it still points to greater absolute demand for maternal and newborn care in densely populated areas.
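
Separating within-group composition from overall volume makes this comparison easier to read. The sketch below, on invented rows with assumed category labels (“Urban”, “Rural”, “Mixed”, and age groups other than “25-34 Years”), uses pd.crosstab twice: once for raw birth counts and once with normalize="columns" to get the age distribution within each county type.

```python
import pandas as pd

# Hypothetical toy rows; labels other than "25-34 Years" are assumed spellings.
df = pd.DataFrame({
    "MOTHER_RESID_COUNTY_TYPE": ["Urban"] * 6 + ["Rural"] * 3 + ["Mixed"] * 3,
    "MOTHER_AGE_GRP": [
        "20-24 Years", "25-34 Years", "25-34 Years", "25-34 Years", "35-44 Years", "15-19 Years",
        "20-24 Years", "25-34 Years", "25-34 Years",
        "25-34 Years", "25-34 Years", "35-44 Years",
    ],
})

# Raw counts: birth volume by age group within each county type.
counts = pd.crosstab(df["MOTHER_AGE_GRP"], df["MOTHER_RESID_COUNTY_TYPE"])

# Column shares: the age distribution within each county type, which removes
# the effect of urban counties simply having more births overall.
shares = pd.crosstab(df["MOTHER_AGE_GRP"], df["MOTHER_RESID_COUNTY_TYPE"], normalize="columns")

# Modal (most common) age group in each county type.
modal_group = counts.idxmax()

print(counts)
print(shares.round(2))
print(modal_group)
```

Comparing the normalized table across columns shows whether the age mix differs by county type; the raw counts show only where births occur. Either table can be plotted directly, for example with DataFrame.plot.bar or seaborn.heatmap.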

Health Indicators and Risk Factors

The correlation analysis revealed several notable associations:

Maternal Age and Number of Births

A weak negative correlation of -0.04 was found between maternal age and the number of births per mother. The association is negligible in magnitude, so it at most hints that younger mothers may be having births more frequently, a pattern that, if confirmed, could have implications for family planning and maternal health programs.
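
Strength and statistical significance are different properties of a correlation. Strength is better judged by r², the share of variance the two variables have in common, while significance depends heavily on sample size through the test statistic t = r * sqrt((n - 2) / (1 - r²)) with n - 2 degrees of freedom. The helper below is a hypothetical illustration that assumes all 813,837 records entered the calculation, which may not hold if some records lack the birth-count variable. For r = -0.04 it gives r² = 0.0016 (about 0.16% of the variance) and t ≈ -36, so an association this weak would still be statistically significant while being practically negligible.

```python
import math


def correlation_summary(r: float, n: int) -> dict:
    """Shared variance (r squared) and the t statistic for testing H0: rho = 0."""
    r_squared = r ** 2
    # Under H0, t follows a t distribution with n - 2 degrees of freedom.
    t_stat = r * math.sqrt((n - 2) / (1 - r_squared))
    return {"r_squared": r_squared, "t": t_stat, "df": n - 2}


# Assumes every one of the 813,837 records entered the calculation.
summary = correlation_summary(-0.04, 813_837)
print(summary)
```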

Marital Status and Age

A moderate negative correlation of -0.38 was found between marital status and maternal age. Given how marital status was encoded, this indicates that younger mothers are generally less likely to be married at the time of childbirth, an important consideration for social support programs aimed at younger mothers.
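
Because Pearson’s r is computed on encoded values, the sign of a correlation involving marital status depends on its coding: if unmarried is the larger value (for example married = 1, unmarried = 2), a negative r means younger mothers are married less often; if married is the larger value, the same pattern yields a positive r. The sketch below uses made-up data and an assumed Y/N field named MARITAL_STATUS to show that reversing the coding flips the sign, and that the share of mothers married within each age group reports the same pattern without depending on the coding.

```python
import pandas as pd

# Hypothetical toy data; MARITAL_STATUS values and the age codes are assumed.
df = pd.DataFrame({
    "AGE_CODE": [0, 0, 1, 1, 2, 2, 2, 3, 3, 3],  # ordered age-group codes, 0 = youngest
    "MARITAL_STATUS": ["N", "N", "N", "Y", "Y", "N", "Y", "Y", "Y", "Y"],
})

married_1 = (df["MARITAL_STATUS"] == "Y").astype(int)  # coding A: married = 1, unmarried = 0
married_2 = 2 - married_1                              # coding B: married = 1, unmarried = 2

r_a = df["AGE_CODE"].corr(married_1)  # positive: older mothers more often married
r_b = df["AGE_CODE"].corr(married_2)  # same data, opposite sign

# A coding-free summary: share of mothers married within each age group.
share_married = married_1.groupby(df["AGE_CODE"]).mean()
print(round(r_a, 2), round(r_b, 2))
print(share_married)
```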

Smoking and Pregnancy

A strong positive correlation of 0.86 was observed between general smoking status (SMOKING_IND) and smoking during pregnancy (SMOKING_DURING_PREG_IND). This alarming finding suggests that most women who smoke continue to do so during pregnancy, a pattern that calls for prompt attention from public health officials.
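
For two binary indicators, Pearson’s r equals the phi coefficient of their 2x2 table. Phi measures overall agreement between the two indicators; the share of smokers who also smoked during pregnancy, P(SMOKING_DURING_PREG_IND = 1 | SMOKING_IND = 1), answers the “likely to continue” question more directly. The numpy sketch below assumes both indicators are coded 0/1, uses invented counts, and the function name phi_and_conditional is hypothetical.

```python
import numpy as np


def phi_and_conditional(n11: int, n10: int, n01: int, n00: int) -> tuple[float, float]:
    """Phi coefficient and P(smoked in pregnancy | smoker) from a 2x2 table.

    n11: smoker, smoked in pregnancy      n10: smoker, did not smoke in pregnancy
    n01: non-smoker, smoked in pregnancy  n00: non-smoker, did not smoke in pregnancy
    """
    phi = (n11 * n00 - n10 * n01) / np.sqrt(
        (n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00)
    )
    p_continue = n11 / (n11 + n10)
    return float(phi), p_continue


# Hypothetical counts for illustration only (not from the dataset).
phi, p_continue = phi_and_conditional(n11=80, n10=20, n01=5, n00=895)

# Cross-check: Pearson r on the expanded 0/1 indicators equals phi.
smoking = np.repeat([1, 1, 0, 0], [80, 20, 5, 895])
preg = np.repeat([1, 0, 1, 0], [80, 20, 5, 895])
r = np.corrcoef(smoking, preg)[0, 1]
print(round(phi, 3), round(r, 3), p_continue)
```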

Child Health Metrics

A correlation of -0.17 between low birth weight and child survival underscores the need for neonatal care interventions. Although the relationship is modest in magnitude, low birth weight is a well-established risk factor for infant mortality, so even a modest population-level association warrants attention from healthcare providers.
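
When two binary indicators have very different prevalences, as low birth weight and infant death do, the largest attainable correlation between them is well below 1 in magnitude, so a coefficient such as -0.17 can understate how much riskier low birth weight is. The sketch below compares relative risk with phi on invented counts, assuming low birth weight and survival are record-level 0/1 indicators; risk_summary is a hypothetical helper.

```python
import numpy as np


def risk_summary(deaths_lbw: int, births_lbw: int, deaths_normal: int, births_normal: int):
    """Relative risk of infant death for low vs. normal birth weight, and the phi
    coefficient between a 0/1 low-birth-weight indicator and a 0/1 survival indicator."""
    relative_risk = (deaths_lbw / births_lbw) / (deaths_normal / births_normal)
    a = births_lbw - deaths_lbw        # low birth weight, survived
    b = deaths_lbw                     # low birth weight, died
    c = births_normal - deaths_normal  # normal weight, survived
    d = deaths_normal                  # normal weight, died
    phi = (a * d - b * c) / np.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return relative_risk, float(phi), (a, b, c, d)


# Hypothetical counts: 2.5% mortality among low-birth-weight births vs. 0.2% otherwise.
rr, phi, (a, b, c, d) = risk_summary(deaths_lbw=200, births_lbw=8_000,
                                     deaths_normal=184, births_normal=92_000)

# Cross-check phi against Pearson r on the expanded 0/1 indicators.
lbw = np.repeat([1, 1, 0, 0], [a, b, c, d])
survived = np.repeat([1, 0, 1, 0], [a, b, c, d])
r = np.corrcoef(lbw, survived)[0, 1]
print(round(rr, 1), round(phi, 3), round(r, 3))
```

With these made-up counts the relative risk is 12.5 while phi is only about -0.1, which is why a risk ratio is often the more informative summary for a rare outcome.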

Conclusion

The EDA yielded a range of insights into maternal and child health, describing demographic distributions and identifying important correlations among key health indicators. Some findings, such as the strong link between general smoking and smoking during pregnancy, raise significant public health concerns, while others offer a more nuanced picture of Indiana’s demographic landscape. Together, these findings lay the groundwork for targeted healthcare interventions, policy formulation, and further academic research.

Limitations and Future Directions

The study is limited by the absence of additional socioeconomic variables, such as income and education, which could provide a more complete understanding of the factors affecting maternal and child health. Future research should augment these data with additional variables and apply more advanced statistical methods, such as logistic regression and machine learning algorithms, for predictive analytics.
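
As a pointer toward the logistic regression mentioned above, here is a minimal scikit-learn sketch on simulated data. Everything in it is an assumption for illustration: the predictors (an ordered age code and a 0/1 smoking indicator), the low-birth-weight outcome, and the effect sizes used to simulate it. LogisticRegression applies L2 regularization by default, which shrinks coefficients slightly; exponentiated coefficients are odds ratios.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic stand-in data: 1,000 hypothetical births with two encoded predictors.
n = 1_000
age_code = rng.integers(0, 4, size=n)  # ordered age-group code, 0 = youngest (assumed)
smoker = rng.integers(0, 2, size=n)    # 1 = smoker (assumed coding)
X = np.column_stack([age_code, smoker])

# Simulated outcome: low-birth-weight probability rises with smoking (made-up effect sizes).
logit = -2.5 + 1.0 * smoker - 0.1 * age_code
y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

model = LogisticRegression()
model.fit(X, y)

# Coefficients are on the log-odds scale; exponentiating gives odds ratios.
odds_ratios = np.exp(model.coef_[0])
risk = model.predict_proba(X[:5])[:, 1]  # predicted probability of the positive class
print(dict(zip(["AGE_CODE", "SMOKER"], odds_ratios.round(2))))
print(risk.round(3))
```

In a real analysis the encoded predictors would come from the dataset itself, and a held-out split (for example via sklearn.model_selection.train_test_split) would be needed to assess predictive performance.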

Acknowledgments

I am grateful to the healthcare institutions and data repositories in Indiana for making these data available for research. Code for this analysis will be posted on my GitHub account.

References

Indiana State Department of Health. (n.d.). Indiana Births and Infant Deaths. Indiana Public Health Data Hub. Retrieved July 14, 2023, from https://hub.mph.in.gov/dataset/indiana-births-and-infant-deaths