I wrote about mixed-level models before and I want to expand on it here. I sort-of finished Andrew Gelman’s Data Analysis Using Regression and Multilevel/Hierarchical Models and will continue to play with it. As far as I know Andrew uses the term multilevel models and avoids the terms fixed and random effect. This is a great book to have.
My next step in the next couple of months is to learn Bayesian Data Analysis since it is used with Multilevel/Hierarchical models.
Anyway, for this playbook I will continue to use sleepstudy dataset from lme4 package (which is used for fitting multilevel models). Sleepstudy is longitudinal study (n=18) over 10 days of sleep deprivation effects on reaction time.
library("lme4")
library("ggplot2")
library("googleVis")
library("stargazer")
library("sjPlot")
sleepstudy <- sleepstudy
As with the previous example we can show the “pooled” scatterplot where on x-axis are days and on y-axis is reaction time. Each dot is one individual’s reaction time (including repeated measures)
# "Pooled" (days ~ reaction, without groups) scatterplot and linear model
gg <- ggplot(sleepstudy, aes(x = Days, y = Reaction)) +
geom_smooth(method = "lm", se = FALSE) +
geom_point(alpha = 0.3, size = 3) +
theme_bw() +
scale_x_continuous(breaks = 0:9)
print(gg)
We can fit linear regression model over “pooled” data to get effect of days of sleep deprivation on reaction time
# Create pooled linear model and predictions
pooled.model <- lm(Reaction ~ Days, sleepstudy)
# Save the fitted values
sleepstudy$PooledPredictions <- fitted(pooled.model)
Reaction | |
Constant | 251.41^{***} |
(238.45, 264.36) | |
Days | 10.47^{***} |
(8.04, 12.89) | |
Observations | 180 |
R^{2} | 0.29 |
Residual Std. Error | 47.71 (df = 178) |
F Statistic | 71.46^{***} (df = 1; 178) |
Notes: | ^{***}Significant at the 1 percent level. |
^{**}Significant at the 5 percent level. | |
^{*}Significant at the 10 percent level. |
As can be seen, the (average) effect of days on reaction time is 10.47, which can be interpreted that for every day of sleep deprivation there is increase in reaction time for 10.47 ms.
Unfortunatelly, this is not a great model since it can predict only 28% of variances (R squared). Let’s plot the residuals (y-axis) versus fitted values (x-axis)
gg <- qplot(x = fitted(pooled.model), y = resid(pooled.model)) +
geom_smooth(method = "lm", se = FALSE) +
geom_point(alpha = 0.3, size = 3) +
theme_bw()
print(gg)