Exploring diabetes indicators across the US

First Post

This material is from a learning project. I wanted to learn some rmarkdown and how to work with the sf package and ggplot2’s geom_sf(). It is not a polished write up (or analysis) but it’s a good first post to help me set up a workflow for posting to this site. All the R code can be found on GitHub.

County level deprivation index

Details about the index can be found here. In short, estimates are collected from the American Community Survey and a principal component analysis extracts a deprivation score for each observation. Higher index scores represent higher area deprivation relative to all other counties in the US.

For this project, the index was calculated using 5-year ACS estimates from 2013, at the county-level, to link to the available diabetes dataset.

plot of chunk unnamed-chunk-3

Diabetes Indicators

National data on diabetes indciators are available from the CDC. Indicator estimates are the most up to date as of 2013 and available at the county-level only. Indicators (age-adjusted) are: 1. prevalence of diagnosed diabetes, 2. prevalence of obesity, and 3. prevalence of leisure-time physical inactivity, hereafter just inactivity.

The spatial distribution of diabetes prevalence in 2013 (below) parallels that of 2007, which led to the recognition of a “diabetes belt” in the southern United States.

plot of chunk unnamed-chunk-5

Data Analysis

Obesity and physical inactivity are well-known risk factors for developing diabetes. Therefore, in order to assess whether area deprivation is associated with diabetes prevalence, independent of obesity and inactivity, a blocked multiple regression was fitted. First we estimated the effects of obesity and inactivity; then, the deprivation variable was introduced.

Regression results

library(stargazer, quietly = TRUE)

stargazer(lm1,lm2, type ="html")
Dependent variable:
DiabPrev
(1)(2)
ObPrev0.188***0.165***
(0.008)(0.007)
InactPrev0.192***0.132***
(0.007)(0.007)
PC10.740***
(0.031)
Constant-0.912***1.311***
(0.166)(0.180)
Observations3,1423,142
R20.6010.661
Adjusted R20.6000.660
Residual Std. Error1.383 (df = 3139)1.275 (df = 3138)
F Statistic2,361.260*** (df = 2; 3139)2,035.008*** (df = 3; 3138)
Note:*p<0.1; **p<0.05; ***p<0.01

Increases in county-level deprivation predict increases in diabetes prevalence.

plot of chunk unnamed-chunk-7

plot of chunk unnamed-chunk-8

Interaction Term

Results summary

## MODEL INFO:
## Observations: 3142 (79 missing obs. deleted)
## Dependent Variable: DiabPrev
## Type: OLS linear regression 
## 
## MODEL FIT:
## F(4,3137) = 1668.00, p = 0.00
## R² = 0.68
## Adj. R² = 0.68 
## 
## Standard errors: OLS
##             Est. S.E. t val.    p    
## (Intercept) 9.50 0.02 400.02 0.00 ***
## InactPrev   0.14 0.01  19.71 0.00 ***
## ObPrev      0.16 0.01  22.99 0.00 ***
## PC1         0.68 0.03  22.08 0.00 ***
## ObPrev:PC1  0.06 0.00  13.90 0.00 ***
## 
## Continuous predictors are mean-centered.

plot of chunk unnamed-chunk-10

Spatial Auto-correlation of county prevalence

## 
## 	Moran I test under randomisation
## 
## data:  MapSP$diabetes  
## weights: nb2listw(neighbors) 
## omitted: 1543, 3118   
## 
## Moran I statistic standard deviate = 67.205, p-value < 2.2e-16
## alternative hypothesis: greater
## sample estimates:
## Moran I statistic       Expectation          Variance 
##      0.6962110037     -0.0003185728      0.0001074177
MapSP$diabetes <- ifelse(is.na(MapSP$diabetes), 0, MapSP$diabetes)
moran.plot(MapSP$diabetes,nb2listw(neighbors), zero.policy = FALSE, 
           labels = FALSE,main=c(" "),
           xlab="Diabetes Prevalence",ylab = "Spatially Lagged Diabetes Prevalence")

plot of chunk unnamed-chunk-12

Written on September 13, 2018