May 30, we spent our time coding to look at our specific questions. Cleaning up, sorting, filtering our data sets to better phrase our questions and look for patterns.
Towards the end of class time, Louie gave an introduction of statistical analyses in R. The first of the models Louie went over was the linear model.
#Simple linear model for the data set “d.s.merge”. The first part of the code is designated by response ~ terms. In this case, count.tot is the response variable being matched with block, date, and treatment terms. The linear model is then ‘saved’ as “m1”
m1 <- lm(count.tot ~ Block+Date.Collected+Treatment, d.s.merge)
This makes a simple linear model and does not output anything into the console for viewing. You can view the output with the “summary()” function.
While these statistics may be useful where the data fits to a linear model, those cases are likely outside three standard deviations of the normal distribution curve of data. Most data is best looked at with the next model.
The next model we spoke about was the ANOVA model.
#Using the previous “m1” made with the linear model, we can perform a simple ANOVA test with the following code.
An ANOVA is a good model for looking at multiple variables as you can look at responses between and within variables. This alone doesn’t tell us if any individual variable is significant, but whether there is an overall significance.
In order to determine if a variable (or term) is significant, we can perform a likelihood ratio test (drop one test) by dropping a term and seeing if it makes a significant difference.
#To determine if a factor is significant, say “treatment” we could remove the treatment effect from the model and save it as a new model using the linear model function we first used to define “m1” but excluding the “Treatment” term from the code. We will designate the new model as “m1.1”
m1.1 <- lm(count.tot ~ Block+Date.Collected, d.s.merge)
#Using “m1” and “m1.1” we can test the significance of the “Treatment” term by performing an ANOVA with both data sets.
The ANOVA test assumes that all terms are non-random. However, in our data, “Block” is random, and some other terms may also be considered random. We can use a mixed effects model to make note of the random terms, rather than using a simple linear model. This requires the lme4 library as below.
#lmer functions similar to the lm function but allows you to designate a term as random. In this case, we have Block designated as a categorical variable treated as a random effect, while date and treatment are linear fixed effects.
m3 <- lmer(count.tot ~ Date.Collected+Treatment+(1|Block), d.s.merge)
Another explanation for the lmer function can be found below, from Mike Lawrence on Stackexchange.
Say you have variable V1 predicted by categorical variable V2, which is treated as a random effect, and continuous variable V3, which is treated as a linear fixed effect. Using lmer syntax, simplest model (M1) is:
V1 ~ (1|V2) + V3
This model will estimate:
P1: A global intercept
P2: Random effect intercepts for V2 (i.e. for each level of V2, that level’s intercept’s deviation from the global intercept)
P3: A single global estimate for the effect (slope) of V3
The next most complex model (M2) is:
V1 ~ (1|V2) + V3 + (0+V3|V2)
This model estimates all the parameters from M1, but will additionally estimate:
P4: The effect of V3 within each level of V2 (more specifically, the degree to which the V3 effect within a given level deviates from the global effect of V3), while enforcing a zero correlation between the intercept deviations and V3 effect deviations across levels of V2.
This latter restriction is relaxed in a final most complex model (M3):
V1 ~ (1+V3|V2) + V3
In which all parameters from M2 are estimated while allowing correlation between the intercept deviations and V3 effect deviations within levels of V2. Thus, in M3, an additional parameter is estimated:
P5: The correlation between intercept deviations and V3 deviations across levels of V2
Usually model pairs like M2 and M3 are computed then compared to evaluate the evidence for correlations between fixed effects (including the global intercept).
Now consider adding another fixed effect predictor, V4. The model:
V1 ~ (1+V3*V4|V2) + V3*V4
P1: A global intercept
P2: A single global estimate for the effect of V3
P3: A single global estimate for the effect of V4
P4: A single global estimate for the interaction between V3 and V4
P5: Deviations of the intercept from P1 in each level of V2
P6: Deviations of the V3 effect from P2 in each level of V2
P7: Deviations of the V4 effect from P3 in each level of V2
P8: Deviations of the V3-by-V4 interaction from P4 in each level of V2
P9 Correlation between P5 and P6 across levels of V2
P10 Correlation between P5 and P7 across levels of V2
P11 Correlation between P5 and P8 across levels of V2
P12 Correlation between P6 and P7 across levels of V2
P13 Correlation between P6 and P8 across levels of V2
P14 Correlation between P7 and P8 across levels of V2