Regression Confidence Band With a Mix of Continuous and dummy Variables
In the previous blog post, I demonstrated how you can draw effect displays to render regression models more intelligible to yourself and to your audience. These effect displays did not contain information about the uncertainty inherent to estimating regression models, however. To that end, this blog post demonstrates how you can add confidence bands to effect displays for multiple regression, logistic regression, and logistic mixed-effects models, and explains how these confidence bands are constructed.
Multiple regression
The Data, Model, and Effect display subsections are identical to those in the previous post, so I left out the accompanying text. In the following subsections, I first show how you can easily add confidence bands to such effect displays and then explain how this method works under the hood.
Data
Model
Effect display
Add confidence bands using predict()
's built-in method
For regression models fitted using R
's lm()
function, confidence bands can easily be constructed using the predict()
function we already used to calculate the outcome values for the predictor variables in the previous subsection. In addition to specifying the model and a data frame containing the combinations of predictor variables for which you want to generate predictions, you need to explicitly set the parameter interval
to "confidence"
. If you don't want to construct 95% confidence bands (the default), you can specify the desired confidence level using the level
parameter. Below I set the desired confidence level to 90%, just to demonstrate the parameter's use.
When used in this fashion, predict()
generates a matrix with 3 columns. The first column contains the predictions themselves, i.e., what you'd get if you didn't specify interval
or level
, whereas the second and third column contain the lower (lwr
) and upper (upr
) boundaries of the confidence intervals about the predictions. The code below adds these two latter columns to nd1_eng
.
We can now draw ribbons showing the 90% confidence band around the lines showing the predictions. Using ggplot
, you can use either geom_ribbon()
or geom_smooth()
to this end, but I find geom_smooth()
easier.
How does predict()
construct confidence bands?
When you want to construct confidence bands for more complex models, such as the logistic mixed-effects model we'll discuss below, the predict()
function is of little help and you'll need to construct them 'by hand'. For this, it's useful to know how predict()
works under the hood so that we'll be able to apply the same principles to more complex models.
The main idea behind this approach is that you can compute the standard errors of the predicted values on the basis of (a) the combinations of predictor variables for which the predictions were computed and (b) the variance–covariance matrix of the estimated parameters of the model you fitted.
As for (a), we need to construct the model matrix for the data frame with the combinations of predictor variables (nd1_eng
). The model matrix is just an array of numbers which, when multiplied with the estimated model coefficients, yields the model's predictions. This matrix is very similar to the data frame with the predictor combinations itself, except that it contains an additional column containing 1's (representing the intercept in the model) and one for which the n.Sex
and c.Raven
values have been multiplied (representing the interaction between these two predictors in the model). Since all c.Raven
values in nd1_eng
are set to 0, this column, too, only contains 0's.
If you multiply this matrix by the estimated model coefficients, you get the predicted outcomes for these predictor combinations:
It so happens that the variances of the products of a matrix (such as mm
) and a random vector (such as coef(mod1)
) can be computed by multiplying the matrix by the variance–covariance matrix of the random vector and then by the transpose of the matrix. This is quite a mouthful, but what it means is that if we get the variance–covariance matrix of the model coefficients, which summarises the uncertainty about the model coefficients and their relationships, we can compute the variances of the predicted outcomes. Luckily, you can easily extract the variance–covariance matrix and calculate this three-way product:
In the present case, this yields a 50-by-50 matrix that has the variances of the 50 predicted outcomes on the main diagonal. The square roots of these variances are the standard errors of the predicted outcomes. Let's extract these:
These standard errors, in turn, can be used to compute confidence intervals about the predicted outcomes by multiplying them by the appropriate t-value. For a symmetric 90% confidence interval, we need the t-value corresponding to the 95th percentile of the t-distribution with the model's residual degrees of freedom (90% of the distribution is contained between the 5th and the 95th percentile; the t-value corresponding to the 5th percentile has the same absolute value as the one corresponding to the 95th but is negative, so we take the t-value corresponding to the 95th percentile):
Multiply the standard errors by this t-value (1.65) and add or subtract the product from the predicted outcomes to get the confidence intervals about the predictions:
As you can see, these manually computed boundaries correspond to the boundaries computed using predict()
:
By stringing together these confidence intervals, you get a confidence band. Or if you want to be more precise, a pointwise confidence band. What this is means is that the coverage probability of the confidence band is (in this case) 90% for each point on the line—which makes sense, because that's how the confidence band was constructed: by stringing together 90% confidence intervals.
(There also exists another type of confidence band: simultaneous confidence bands. I'm not going to discuss these here.)
The confidence bands for the Raven-by-sex interaction is left as an exercise to the reader :)
Some miscellaneous points about confidence bands
- Confidence bands are narrowest for the mean predictor value.
- If you fix non-focal predictors at a typical value for the purposes of drawing an effect display, fixing them at their sample mean yields narrower confidence bands than fixing them at any other value.
Logistic model
We can similarly construct confidence bands for logistic regression models. See the previous post for details about the data and the model, which I'll skip here.
Data, model and effect display
Note that the predicted values are expressed in log-odds, not in probabilities (type = "link"
).
Constructing confidence bands using predict()
For generalised linear models, such as logistic regression models, predict()
doesn't return confidence intervals for predicted values, but it can return standard errors. We can use these to construct the confidence bands ourselves.
These standard errors are expressed in log-odds. To construct confidence intervals, we need to find the appropriate multiplier. For ordinary least-squares regression (cf. above), there is an exact solution to this problem, namely by referring to a t-distribution with the appropriate degrees of freedom. For logistic regression, we have to make do with an approximation based on the normal distribution.
Now multiply the standard errors by this multiplier and add/subtract them from the predicted outcomes:
This yields confidence intervals in log-odds. Probabilities are easier to interpret, though, so we run these through the logistic function to transform the log-odds to probabilities:
Stringing the confidence intervals together, we get the 90% pointwise confidence band:
Mixed-effects logistic regression
Once you get the hang of it, constructing confidence bands for the fixed effects in a mixed-effects model isn't really that much more difficult than for the other models.
For details about the data, the model specification and the basic effect display, please refer to the previous blog post.
Data, model and effect display
Note that this time, the predictions are generated in log-odds, not in probabilities (type = "link"
).
Construct confidence band manually
predict()
doesn't generate confidence intervals or standard errors for mixed-effects models, so we'll have to compute them manually. The method is the same as for the ordinary least-squares model above; see http://glmm.wikidot.com/faq#predconf for a summary.
First, construct the model matrix for the combinations of predictor variables we generated the outcomes for.
Then extract the variance–covariance matrix of the model and use it to compute the standard errors for the predictions (in log-odds):
Now find the multiplier appropriate for 90% confidence intervals and carry out the same calculation as the previous two times:
Run the predicted values and their confidence bands through the logistic function to transform them to probabilities:
And plot:
Note that this confidence band is based solely on the fixed-effects. As such, it should be taken to reflect the uncertainty about the average participant's regression curve; the regression curves of individual participants can differ wildly from this average.
This post hardly makes for captivating reading, but I hope it helps some of you who want to visualise the uncertainty in their regression models.
welliverhenew1967.blogspot.com
Source: https://janhove.github.io/reporting/2017/05/12/visualising-models-2
0 Response to "Regression Confidence Band With a Mix of Continuous and dummy Variables"
Enregistrer un commentaire