Title: | Automatic Direct Variable Selection via Interrupted Coefficient Estimation |
---|---|
Description: | Accurate point and interval estimation methods for multiple linear regression coefficients, under classical normal and independent error assumptions, taking into account variable selection. |
Authors: | L. Tazik [aut, cre], W.J. Braun [aut] |
Maintainer: | L. Tazik <[email protected]> |
License: | Unlimited |
Version: | 1.0 |
Built: | 2024-11-01 11:22:23 UTC |
Source: | https://github.com/cran/ADVICE |
Computes confidence intervals for one or more parameters in a fitted model. There is a default method and a method for objects inheriting from class "QRS".
## S3 method for class 'QRS'
confint(object, parm, level, ...)
object |
a fitted model object from the QRS class. |
parm |
a specification of which parameters are to be given confidence intervals, either a vector of numbers or a vector of names. If missing, all parameters are considered. |
level |
a numeric value specifying the required confidence level. |
... |
additional argument(s) for the methods. |
This function computes t-based confidence intervals using n-p degrees of freedom, where n is the number of observations and p is the number of regression coefficients in the full model.
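The interval formula described above can be sketched in base R. This is only an illustration under stated assumptions: an ordinary `lm` fit stands in for a QRS object, and the variable names (`est`, `se`, `tcrit`, `ci`) are chosen here for the example and are not taken from the package source.

```r
# Sketch: t-based intervals from estimates and standard errors (base R only).
# ASSUMPTION: an ordinary least-squares fit stands in for a QRS fit.
set.seed(1)
n  <- 40
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 1 + 2 * x1 + rnorm(n)        # x2 has a true coefficient of 0
fit <- lm(y ~ x1 + x2)

est <- coef(fit)                   # point estimates
se  <- sqrt(diag(vcov(fit)))       # standard errors
p   <- length(est)                 # number of coefficients in the full model

level <- 0.95
alpha <- 1 - level
tcrit <- qt(1 - alpha / 2, df = n - p)   # t quantile on n - p degrees of freedom

# Lower and upper limits, one row per coefficient
ci <- cbind(est - tcrit * se, est + tcrit * se)
colnames(ci) <- paste(format(100 * c(alpha / 2, 1 - alpha / 2), trim = TRUE), "%")
ci
```

For an `lm` fit these limits agree with `confint(fit)`; for a QRS fit the estimates and standard errors would differ, but the n-p degrees of freedom enter in the same way.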
A 2-column matrix giving lower and upper confidence limits (corresponding to the given level) for each parameter. These will be labelled as (1-level)/2 and 1 - (1-level)/2, expressed as percentages.
Ladan Tazik, W.J. Braun
ices.R
myRegressionData <- rmultreg(100, k=20, p=.1, sdnoise = 1)
pairs(myRegressionData$data)
out <- ices(y ~ ., data = myRegressionData$data) # fit model to simulated data
confint(out) # calculate 95% confidence intervals for all coefficients
myRegressionData$coefficients # compare with true coefficients
This function provides an alternative multiple regression fitting procedure which simultaneously estimates and selects variables. The resulting coefficient estimates will tend to be slightly biased, but in a sparse setting, they can be quite accurate. A full regression model is specified by the user, and the function usually returns coefficient estimates for a reduced model, i.e., a model for which some of the coefficient estimates are exactly 0.
ices(formula, data, model = TRUE, x = FALSE, y = FALSE, qr = TRUE)
formula |
a formula object specifying the full regression model. |
data |
a data frame containing observations on the response variable and the predictor variables. |
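model, x, y, qr |
logicals. If TRUE, the corresponding components of the fit (the model frame, the model matrix, the response, the QR decomposition) are returned. |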
a QRS class object
coefficients |
a named numeric vector of coefficients |
residuals |
a numeric vector containing the response minus the fitted values. |
effects |
a numeric vector containing the projections of the response variable under the orthogonal Q matrix coming from the QR decomposition of the model matrix. |
rank |
the numeric rank of the fitted linear model. |
fitted.values |
the estimated response values according to the fitted interrupted coefficient estimation selection regression model. |
sigma2 |
the estimated noise variance based on the n-p residual effects, where p is the size of the full model. |
std_error |
a numeric vector of standard errors. |
df.residual |
residual degrees of freedom. |
x |
a numeric matrix containing the model matrix. |
y |
a numeric vector containing the response variable values. |
qr |
the QR decomposition object coming from the model matrix (after re-ordering columns). |
coefOrder |
permutation of the sequence 1:p which gives the ascending order of the coefficients of the linear model object, as a result of the pre-screening. |
call |
the matched call. |
terms |
the terms object used. |
names |
a character vector containing the column names of the model matrix. |
model |
if requested (the default), the model frame used in the case of the full regression model. |
Ladan Tazik, W.J. Braun
lm.R, QRS.R
myRegressionData <- rmultreg(50, k=10, p=.25, sdnoise = .5)
pairs(myRegressionData$data)
out <- ices(y ~ ., data = myRegressionData$data) # fit model to simulated data
confint(out) # calculate 95% confidence intervals for all coefficients
myRegressionData$coefficients # compare with true coefficients
This data frame contains the time (in weeks) between the initial symptoms (onset symptoms) and the decision time to visit a doctor in the case of 54 patients who eventually were diagnosed with multiple sclerosis. Interest centers on whether there are any factors which tend to be related to the delay time.
data(MSDecision)
A data frame with 54 observations on the following 16 variables.
Delay
numeric, time in weeks
ClinicalDiseaseCourse
factor, 2 levels
CodedGender
factor, 2 levels, 1 = Male, 2 = Female
AgeAtOnset
numeric, age in years
OnsetSymptom1
factor, 4 levels
OnsetSymptom2
factor, 5 levels
OnsetSymptomSeverity
factor, 2 levels, 0 = Low, 1 = High
TriggerSymptom1
factor, 4 levels
TriggerSymptom2
factor, 4 levels
TriggerSymptomSeverity
factor, 2 levels, 0 = Low, 1 = High
FamilyHistory
factor, 2 levels, yes = there is MS in the family history
FearOfWorseningSymptoms
factor, 2 levels
MoreThanOneSymptom
factor, 2 levels
EffectonResponsibilities
factor, 2 levels, yes = the symptoms are having an effect on the individual
UncertainResponse
logical, TRUE = recorded delay time is not accurate
The levels of the Clinical Disease Course variable are: Clinically Isolated Syndrome and Relapse-Remitting.
xy <- MSDecision
# recode the multi-level symptom factors as two-level indicators
xy$sensoryOnset1 <- factor(xy$OnsetSymptom1 == "SENSORY")
xy$brainstemOnset2 <- factor(xy$OnsetSymptom2 == "BRAINSTEM")
xy$sensoryTrigger1 <- factor(xy$TriggerSymptom1 == "SENSORY")
xy$brainstemTrigger2 <- factor(xy$TriggerSymptom2 == "BRAINSTEM")
xy <- xy[, -c(5, 6, 8, 9, 15)] # drop the original symptom columns and column 15
xy[, 1] <- log(xy[, 1]) # model the log of the delay time
names(xy)[1] <- "y"
out <- ices(y ~ ., data = xy)
summary(out)
plot(out)
plot(out, normqq = TRUE)
plot(out, scaleloc = TRUE)
By default, this function plots residuals from the interrupted coefficient estimation selection model versus the corresponding fitted values. Alternatively, options to obtain a normal QQ plot or a scale-location plot of the residuals are also available.
## S3 method for class 'QRS'
plot(x, normqq = FALSE, scaleloc = FALSE, ...)
x |
an object of QRS class |
normqq |
a logical value, if TRUE, a normal QQ plot of the residuals will be plotted. |
scaleloc |
a logical value, if TRUE, a scale-location plot of the residuals will be plotted. |
... |
arguments to be passed to plot methods, such as graphical parameters (see par). |
No return value
Ladan Tazik, W.J. Braun
plot.lm
Given a design matrix and a response variable, create a list containing the fitted model, the estimated regression coefficients and their standard errors, based on interrupted coefficient estimation selection.
QRS(x, y, Nsims)
x |
a numeric matrix; usually the model matrix for a multiple regression model. |
y |
a numeric vector; usually the values of the response variable in the regression model. |
Nsims |
number of simulation runs required for estimating the regression coefficient standard errors. |
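The role of a simulation count such as `Nsims` can be illustrated with a generic parametric bootstrap. The following is a minimal sketch, not the package's implementation: ordinary least squares stands in for the interrupted estimator, and all object names (`boot_coefs`, `boot_se`, and so on) are assumptions made for this example.

```r
# Sketch: parametric bootstrap standard errors with Nsims simulation runs.
# ASSUMPTION: OLS is used as a stand-in estimator; QRS() applies its own
# interrupted estimation inside each simulation run.
set.seed(7)
n <- 60
x <- rnorm(n)
y <- 0.5 + 1.5 * x + rnorm(n, sd = 0.8)

fit       <- lm(y ~ x)
X         <- model.matrix(fit)
mu_hat    <- fitted(fit)            # estimated mean response
sigma_hat <- summary(fit)$sigma     # estimated noise standard deviation

Nsims <- 500
boot_coefs <- replicate(Nsims, {
  # simulate a new response from the fitted normal model
  y_star <- mu_hat + rnorm(n, sd = sigma_hat)
  # re-estimate the coefficients on the simulated response
  qr.coef(qr(X), y_star)
})

# Standard error = standard deviation of each coefficient across runs
boot_se <- apply(boot_coefs, 1, sd)
rbind(bootstrap = boot_se, analytic = sqrt(diag(vcov(fit))))
```

Larger values of `Nsims` reduce the Monte Carlo variability of the standard error estimates at the cost of run time.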
The interrupted coefficient estimation selection procedure begins with consideration of a full model whereby a regression model with p terms is fit to n observations on a response and p-1 predictor variables. The variables are pre-screened by application of lm in order to cast the columns of the model matrix in increasing order of the p-values observed for the corresponding regression coefficients. The estimation then proceeds by the usual QR decomposition of the model matrix but is interrupted at the effects stage. The effects are classified as "different from 0" or "not different from 0", according to what is essentially a control chart procedure. The effects that are "not different from 0" are replaced with true 0's and the nonzero effects are left alone. The estimation is completed by backward-substitution solution of the zero and nonzero effects using the upper triangular matrix from the QR decomposition. The result is a set of coefficient estimates that will tend to be more accurate in a mean-squared-error sense than the original lm coefficient estimates, especially when some or all of the regression coefficients are 0. Coefficient standard error estimates are obtained by a parametric bootstrap procedure. This method is not recommended for strongly non-normal data, or where there is substantial multicollinearity.
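The steps described above can be sketched in base R. This is a simplified illustration under stated assumptions, not the package's code: in particular, a plain 3-sigma cutoff is swapped in for the control chart procedure, whose exact rule is not given here, and the object names are invented for the example.

```r
# Sketch: QR-based estimation interrupted at the effects stage (base R only).
set.seed(42)
n <- 100
X <- cbind(1, matrix(rnorm(n * 5), n, 5))        # intercept + 5 predictors
colnames(X) <- c("(Intercept)", paste0("x", 1:5))
beta <- c(1, 2, 0, 0, -1.5, 0)                    # sparse true coefficients
y <- drop(X %*% beta + rnorm(n))

# 1. Pre-screen: order columns by increasing p-value from a full lm fit.
pvals <- summary(lm(y ~ X - 1))$coefficients[, 4]
ord   <- order(pvals)
Xo    <- X[, ord]

# 2. QR decomposition of the re-ordered model matrix; stop at the effects Q'y.
qrX     <- qr(Xo)
effects <- qr.qty(qrX, y)
p       <- ncol(Xo)

# 3. Noise scale from the n - p residual effects.
sigma <- sqrt(sum(effects[(p + 1):n]^2) / (n - p))

# 4. ASSUMPTION: a simple 3-sigma cutoff stands in for the control chart rule.
keep <- abs(effects[1:p]) > 3 * sigma
z    <- ifelse(keep, effects[1:p], 0)             # small effects become true 0's

# 5. Complete the estimation by back-substitution with the triangular R factor.
b <- backsolve(qr.R(qrX), z)
names(b) <- colnames(Xo)
b[colnames(X)]                                    # report in the original column order
```

Because the columns are ordered by significance, effects that are set to 0 tend to sit at the end of the vector, and back-substitution then returns exact zeros for the corresponding coefficients.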
a QRS class object
coefficients |
a named numeric vector of coefficients |
residuals |
a numeric vector containing the response minus the fitted values. |
effects |
a numeric vector containing the projections of the response variable under the orthogonal Q matrix coming from the QR decomposition of the model matrix. |
rank |
the numeric rank of the fitted linear model. |
fitted.values |
the estimated response values according to the fitted interrupted coefficient estimation selection regression model. |
sigma2 |
the estimated noise variance based on the n-p residual effects, where p is the size of the full model. |
std_error |
a numeric vector of standard errors. |
qr |
the QR decomposition object coming from the model matrix (after re-ordering columns). |
df.residual |
the residual degrees of freedom. |
model |
if requested (the default), the model frame used. |
x |
a numeric matrix containing the model matrix. |
y |
a numeric vector containing the response variable values. |
coefOrder |
A permutation of the sequence 1:p which gives the ascending order of the coefficients of the linear model object, as a result of the pre-screening. |
names |
a character vector containing the column names of the model matrix. |
Ladan Tazik, W.J. Braun
ices.R, lm.R
Values of any number of predictor variables and a single response variable are simulated according to a model with randomly generated coefficients. Values of each predictor are simulated independently from standard normal distributions. The regression coefficients are generated independently from a uniform distribution on the interval (minimum, maximum), and each coefficient is multiplied by a Bernoulli (p) variate, independent of the other coefficients. This results in some of the coefficients being zeroed out. Noise is added to the regression response according to independent t variates with degrees of freedom equal to dfnoise.
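The simulation scheme described above can be sketched as follows. The helper `simulate_sparse_regression` is hypothetical and is not the source of `rmultreg`; two details are assumptions made for the example: the intercept is treated like any other coefficient, and the t noise is simply multiplied by `sdnoise`.

```r
# Sketch: simulate sparse regression data along the lines described above.
# Hypothetical helper; NOT the package's rmultreg() source code.
simulate_sparse_regression <- function(n, k = 1, minimum = 0, maximum = 1,
                                       p = 0.5, dfnoise = 100, sdnoise = 1) {
  # Independent standard normal predictors
  X <- matrix(rnorm(n * k), n, k,
              dimnames = list(NULL, paste0("x", seq_len(k))))
  # Uniform coefficients, each kept with probability p (Bernoulli mask).
  # ASSUMPTION: the intercept is generated like the other coefficients.
  beta <- runif(k + 1, minimum, maximum) * rbinom(k + 1, 1, p)
  # ASSUMPTION: t-distributed noise is scaled by multiplying by sdnoise.
  noise <- sdnoise * rt(n, df = dfnoise)
  y <- drop(beta[1] + X %*% beta[-1] + noise)
  list(data = data.frame(X, y = y), coefficients = beta)
}

set.seed(123)
sim <- simulate_sparse_regression(50, k = 4, p = 0.5, sdnoise = 0.25)
str(sim$data)
sim$coefficients   # zeros mark the coefficients removed by the Bernoulli mask
```

With a large `dfnoise` the t noise is close to normal; small values give heavier tails, which is useful for checking how a fitting method behaves when the normality assumption is strained.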
rmultreg(n, k = 1, minimum = 0, maximum = 1, p = 0.5, dfnoise = 100, sdnoise = 1)
n |
number of observations. |
k |
number of predictor variables in addition to the intercept. |
minimum |
minimum possible value for the regression coefficients, apart, possibly, from some zeroes. |
maximum |
maximum possible value for the regression coefficients, apart, possibly, from some zeroes. |
p |
probability that a given regression coefficient remains nonzero. |
dfnoise |
degrees of freedom for t-distributed additive noise. |
sdnoise |
standard deviation of the noise term. |
a list containing
data |
a dataframe containing n observations on k predictor variables and a response y. |
coefficients |
a numeric vector containing the true regression coefficients. |
W.J. Braun
myRegressionData <- rmultreg(50, k=3, p=.5, sdnoise = .25)
pairs(myRegressionData$data)
out <- ices(y ~ ., data = myRegressionData$data) # fit model to simulated data
confint(out) # calculate 95% confidence intervals for all coefficients
myRegressionData$coefficients # compare with true coefficients
Summary method for class "QRS".
## S3 method for class 'QRS'
summary(object, ...)
object |
an object of class "QRS" |
... |
additional arguments affecting the summary produced. |
The function computes and returns a list of summary statistics of the fitted linear model given in the QRS object.
Residuals |
the weighted residuals, the usual residuals rescaled by the square root of the weights specified in the call to qrs |
Coefficients |
a p x 4 matrix with columns for the estimated coefficient, its standard error, z-score and corresponding (two-sided) probabilities |
df |
degrees of freedom |
residualStandardError |
Residual standard error |
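The layout of the coefficient matrix can be illustrated with a small worked example. The numbers below are made up for illustration, and the column labels are assumptions; the labels produced by the package may differ.

```r
# Sketch: building a p x 4 coefficient table from estimates and standard errors.
# The estimates and standard errors are invented values, not package output.
est <- c("(Intercept)" = 1.10, x1 = 0.00, x2 = -0.42)
se  <- c(0.20, 0.15, 0.21)

z    <- est / se                    # z-score for each coefficient
pval <- 2 * pnorm(-abs(z))          # two-sided normal tail probability

# ASSUMPTION: column labels follow the usual R convention
tab <- cbind(Estimate = est, "Std. Error" = se, "z value" = z, "Pr(>|z|)" = pval)
tab
```

A coefficient that has been set exactly to 0 has a z-score of 0 and a two-sided probability of 1.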
Ladan Tazik, W.J. Braun
QRS.R
myRegressionData <- rmultreg(25, k=5, p=.15, sdnoise = .25)
pairs(myRegressionData$data)
out <- ices(y ~ ., data = myRegressionData$data) # fit model to simulated data
summary(out) # estimates and standard errors for all coefficients
myRegressionData$coefficients # compare with true coefficients