Title: | Statistical Methods for Interval-Censored Data |
---|---|
Description: | Functions that provide statistical methods for interval-censored (grouped) data. The package supports the estimation of linear and linear mixed regression models with interval-censored dependent variables. Parameter estimates are obtained by a stochastic expectation maximization algorithm. Furthermore, the package enables the direct (without covariates) estimation of statistical indicators from interval-censored data via an iterative kernel density algorithm. Survey and Organisation for Economic Co-operation and Development (OECD) weights can be included in the direct estimation (see Walter, P. (2019) <doi:10.17169/refubium-1621>). |
Authors: | Paul Walter |
Maintainer: | Paul Walter <[email protected]> |
License: | GPL-2 |
Version: | 1.1.5 |
Built: | 2025-02-03 04:41:38 UTC |
Source: | https://github.com/chiquadrat/smicd |
Exam scores of 4,059 students from 65 schools in Inner London, as in Exam.
A data frame with 4059 observations with the following 10 variables:
School ID - a factor.
Exam score.
School gender - a factor. Levels are mixed, boys, and girls.
School average of intake score.
Student level Verbal Reasoning (VR) score band at intake - a factor. Levels are bottom 25%, mid 50%, and top 25%
Band of student's intake score - a factor. Levels are bottom 25%, mid 50% and top 25%
Standardised LR test score.
Sex of the student - levels are F and M.
School type - levels are Mxd and Sngl.
Student id (within school) - a factor
Goldstein, H., Rasbash, J., et al. (1993). A multilevel analysis of school examination results. Oxford Review of Education 19: 425-433.
The function applies an iterative kernel density algorithm to estimate a variety of statistical indicators (e.g. mean, median, quantiles, Gini coefficient) from interval-censored data. Standard errors are estimated by a non-parametric bootstrap.
kdeAlgo(
  xclass,
  classes,
  threshold = 0.6,
  burnin = 80,
  samples = 400,
  bootstrap.se = FALSE,
  b = 100,
  bw = "nrd0",
  evalpoints = 4000,
  adjust = 1,
  custom_indicator = NULL,
  upper = 3,
  weights = NULL,
  oecd = NULL
)
xclass |
interval-censored values; factor with ordered factor values,
as in |
classes |
numeric vector of classes; Inf as last value is allowed,
as in |
threshold |
used for the Head-Count Ratio and Poverty Gap, default is 60%
of the median e.g. |
burnin |
burn-in sample size, as in |
samples |
sampling iteration size, as in |
bootstrap.se |
if |
b |
number of bootstrap iterations for the estimation of the standard errors |
bw |
bandwidth selector method, defaults to "nrd0", as in
|
evalpoints |
number of evaluation grid points, as in
|
adjust |
the user can multiply the bandwidth by a certain factor such
that bw=adjust*bw as in |
custom_indicator |
a list of functions containing the indicators to be
additionally calculated.
Such functions must only depend on the target variable |
upper |
if the upper bound of the upper interval is |
weights |
any kind of survey or design weights that will be used for the weighted estimation of the statistical indicators |
oecd |
weights for equivalized household size |
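The custom_indicator argument takes a named list of functions. As a minimal sketch in base R, assuming the two-argument form function(y, threshold) that the package example uses and assuming that threshold is a fraction of the median, such functions can be tried on an ordinary numeric vector before they are passed to kdeAlgo. The names my_indicators, p95 and share_below are hypothetical.

```r
# Hypothetical custom indicators. Each one depends only on the target
# variable y and on threshold (assumed here to be a fraction of the median).
my_indicators <- list(
  p95 = function(y, threshold) {
    unname(quantile(y, probs = 0.95))
  },
  share_below = function(y, threshold) {
    mean(y < threshold * median(y))
  }
)

# Try the functions on a plain numeric vector first
set.seed(1)
y <- rlnorm(1000, meanlog = 8, sdlog = 1)
checks <- sapply(my_indicators, function(f) f(y, threshold = 0.6))
print(checks)
```

Checking the functions this way catches errors early, since a failing indicator would otherwise only surface inside the iterative estimation.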
The statistical indicators are estimated using pseudo samples as a proxy for the interval-censored variable. The object resultX returns the pseudo samples for each iteration step of the KDE-algorithm.
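The idea of pseudo samples can be illustrated in base R. The following is a simplified sketch, not the package implementation: it alternates between a kernel density estimate of the current pseudo sample and a redraw of every observation from that density restricted to the observation's own interval. Capping the open-ended class at three times the last finite bound, starting from the interval midpoints, and the iteration count of 20 are assumptions made only for this sketch.

```r
set.seed(42)
x_true  <- rlnorm(500, meanlog = 8, sdlog = 1)
classes <- c(0, 1000, 2000, 4000, 8000, Inf)
xclass  <- cut(x_true, breaks = classes)

# Assumption for this sketch: cap the open-ended class at 3 times the
# last finite bound so that every interval has finite width.
bounds <- classes
bounds[length(bounds)] <- 3 * bounds[length(bounds) - 1]
idx   <- as.integer(xclass)
lower <- bounds[idx]
upper <- bounds[idx + 1]

# Start from the interval midpoints
x_pseudo <- (lower + upper) / 2
grid <- seq(min(bounds), max(bounds), length.out = 4000)

for (iter in 1:20) {
  # Step 1: kernel density estimate of the current pseudo sample on the grid
  dens <- density(x_pseudo, bw = "nrd0", from = min(grid), to = max(grid),
                  n = length(grid))
  # Step 2: redraw each observation from the density restricted to its interval
  for (k in seq_len(length(bounds) - 1)) {
    in_class <- which(idx == k)
    if (length(in_class) == 0) next
    in_grid <- dens$x >= bounds[k] & dens$x <= bounds[k + 1]
    x_pseudo[in_class] <- sample(dens$x[in_grid], size = length(in_class),
                                 replace = TRUE, prob = dens$y[in_grid])
  }
}

# Indicators can now be computed from the pseudo sample
print(median(x_pseudo))
```

Every pseudo value stays inside the interval in which the original observation was recorded, so the pseudo sample never contradicts the observed classes.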
An object of class "kdeAlgo" that provides estimates for statistical indicators and, optionally, corresponding standard error estimates. Generic functions such as print and plot have methods that can be used to obtain further information. See kdeAlgoObject for a description of the components of objects of class "kdeAlgo".
Walter, P. (2019). A Selection of Statistical Methods for Interval-Censored
Data with Applications to the German Microcensus, PhD thesis,
Freie Universitaet Berlin
Groß, M., U. Rendtel, T. Schmid, S. Schmon, and N. Tzavidis (2017).
Estimating the density of ethnic minorities and aged people in Berlin: Multivariate
Kernel Density Estimation applied to sensitive georeferenced administrative data
protected via measurement error. Journal of the Royal Statistical Society: Series A
(Statistics in Society), 180.
dclass, print.kdeAlgo, plot.kdeAlgo
## Not run:
# Generate data
x <- rlnorm(500, meanlog = 8, sdlog = 1)
classes <- c(0, 500, 1000, 1500, 2000, 2500, 3000, 4000, 5000,
             6000, 8000, 10000, 15000, Inf)
xclass <- cut(x, breaks = classes)
weights <- abs(rnorm(500, 0, 1))
oecd <- rep(seq(1, 6.9, 0.3), 25)

# Estimate statistical indicators with default settings
Indicator <- kdeAlgo(xclass = xclass, classes = classes)

# Include custom indicators
Indicator_custom <- kdeAlgo(
  xclass = xclass, classes = classes,
  custom_indicator = list(quant5 = function(y, threshold) {
    quantile(y, probs = 0.05)
  })
)

# Include survey and oecd weights
Indicator_weights <- kdeAlgo(
  xclass = xclass, classes = classes,
  weights = weights, oecd = oecd
)
## End(Not run)
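The relation between the classes vector and the factor produced by cut can be inspected directly in base R. This small sketch uses a handful of made-up values (the vector x and the data frame intervals are illustrative only) and recovers the lower and upper bound of each observation's interval from the factor codes.

```r
x <- c(250, 750, 1200, 5000, 20000)
classes <- c(0, 500, 1000, 1500, 2000, 2500, 3000, 4000, 5000,
             6000, 8000, 10000, 15000, Inf)
xclass <- cut(x, breaks = classes)

# Each factor level corresponds to one interval (lower, upper]
idx <- as.integer(xclass)
intervals <- data.frame(
  value = x,
  class = xclass,
  lower = classes[idx],
  upper = classes[idx + 1]
)
print(intervals)
```

By default cut builds right-closed intervals, so a value equal to the lowest break would become NA unless include.lowest = TRUE is set.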
An object of class "kdeAlgo" that represents the estimated statistical indicators and the estimated standard errors. Objects of this class have methods for the generic functions print and plot.
An object of class "kdeAlgo" is a list containing at least the following components.
Point_estimate |
the estimated statistical indicators: Mean, Gini, Head-Count Ratio, Quantiles (10%, 25%, 50%, 75%, 90%), Poverty-Gap, Quintile-Share Ratio and if specified the selected custom indicators. |
Standard_Error |
if |
Mestimates |
kde object containing the corrected density estimate,
as in |
resultDensity |
estimated density for each iteration,
as in |
resultX |
true latent values X estimates,
as in |
xclass |
classified values; factor with ordered factor values,
as in |
gridx |
grid on which density is evaluated,
as in |
classes |
classes; Inf as last value is allowed,
as in |
burnin |
burn-in sample size,
as in |
samples |
sampling iteration size,
as in |
Point_estimates.run |
the estimated statistical indicators: Mean, Gini, Head-Count Ratio, Quantiles (10%, 25%, 50%, 75%, 90%), Poverty-Gap, Quintile-Share Ratio and if specified the selected custom indicators for each iteration run of the KDE-algorithm |
oecd |
the weights used for the estimation of the equivalised household income |
weights |
any kind of survey or design weights that will be used for the weighted estimation of the statistical indicators |
upper |
if the upper bound of the upper interval is |
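For orientation, the default indicators listed under Point_estimate can be computed from a plain numeric vector with textbook formulas. This is a sketch under assumptions: the function name indicators is hypothetical, and the package may use different (for example weighted) definitions in detail. The poverty line is taken as threshold times the median, in line with the description of the threshold argument.

```r
# Textbook versions of some of the default indicators (unweighted)
indicators <- function(y, threshold = 0.6) {
  z    <- threshold * median(y)       # poverty line
  poor <- y < z
  n    <- length(y)
  ys   <- sort(y)
  q    <- quantile(y, probs = c(0.2, 0.8), names = FALSE)
  c(
    mean = mean(y),
    gini = sum((2 * seq_len(n) - n - 1) * ys) / (n * sum(ys)),
    hcr  = mean(poor),                            # Head-Count Ratio
    pgap = mean(ifelse(poor, (z - y) / z, 0)),    # Poverty Gap
    qsr  = sum(y[y > q[2]]) / sum(y[y <= q[1]])   # Quintile-Share Ratio
  )
}

set.seed(3)
y <- rlnorm(2000, meanlog = 8, sdlog = 1)
est <- indicators(y)
print(round(est, 3))
```

Applied to one column of pseudo samples, a function of this kind yields the per-iteration values that are summarised in Point_estimates.run.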
Walter, P. (2019). A Selection of Statistical Methods for Interval-Censored
Data with Applications to the German Microcensus, PhD thesis,
Freie Universitaet Berlin
Groß, M., U. Rendtel, T. Schmid, S. Schmon, and N. Tzavidis (2017).
Estimating the density of ethnic minorities and aged people in Berlin: Multivariate
Kernel Density Estimation applied to sensitive georeferenced administrative data
protected via measurement error. Journal of the Royal Statistical Society: Series A
(Statistics in Society), 180.
Plots the estimated density of the interval-censored variable. In addition, convergence plots are given for all estimated statistical indicators: the estimated indicator is plotted for each iteration step of the KDE-algorithm, together with the average up to iteration step M (without the burn-in iterations). A vertical line indicates the end of the burn-in period, and a horizontal line marks the value of the estimated statistical indicator.
## S3 method for class 'kdeAlgo'
plot(x, indicator = NULL, ...)
x |
an object of type "kdeAlgo", typical result of |
indicator |
a vector of indicator names specifying for which indicators
convergence plots are plotted, e.g. |
... |
optional arguments passed to generic function. |
Convergence and density plots.
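The elements of such a convergence plot can be reproduced in base R. The sketch below uses simulated per-iteration values (est is made up, not package output) and assumes that the final estimate is the average over the iterations after the burn-in period.

```r
set.seed(7)
burnin  <- 80
samples <- 400

# Hypothetical per-iteration estimates of one indicator: a drift during
# the burn-in phase followed by noise around a stable level
drift <- c(seq(400, 0, length.out = burnin), rep(0, samples))
est   <- 3000 + drift + rnorm(burnin + samples, sd = 50)

# Running average over the iterations after the burn-in period
keep         <- (burnin + 1):(burnin + samples)
running_mean <- cumsum(est[keep]) / seq_along(keep)
final        <- mean(est[keep])

plot(est, type = "l", xlab = "Iteration", ylab = "Estimate")
lines(keep, running_mean, col = "red")  # average up to iteration step M
abline(v = burnin, lty = 2)             # end of the burn-in period
abline(h = final, lty = 3)              # final estimate
```

A running average that flattens out well before the last iteration suggests that the chosen values of burnin and samples are large enough.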
Convergence plots are available for the estimated fixed-effects model parameters and the residual variance of the linear or linear mixed regression model. If the Box-Cox transformation is used for the dependent variable, a convergence plot of the transformation parameter lambda is also available. In each convergence plot, the estimated parameter is plotted for each iteration step of the SEM-algorithm. Furthermore, the average up to iteration step M is plotted (without the burn-in iterations). A vertical line indicates the end of the burn-in period. A horizontal line marks the value of the estimated parameter. Finally, the estimated density of the simulated dependent variable from the last iteration step is plotted with a histogram of the interval-censored true dependent variable in the background.
## S3 method for class 'sem'
plot(x, ...)
x |
|
... |
optional arguments passed to generic function. |
Convergence and density plots.
Walter, P. (2019). A Selection of Statistical Methods for Interval-Censored Data with Applications to the German Microcensus, PhD thesis, Freie Universitaet Berlin
Basic information of a kdeAlgo object is printed.
## S3 method for class 'kdeAlgo'
print(x, ...)
x |
an object of class |
... |
optional arguments passed to generic function |
Basic information of a sem object is printed.
## S3 method for class 'sem'
print(x, ...)
x |
an object of class |
... |
optional arguments passed to generic function |
The elements described in summary.sem are printed.
## S3 method for class 'summary.sem'
print(x, ...)
x |
an object of class "summary.sem". |
... |
additional arguments that are not used in this method. |
This function estimates the linear regression model when the dependent variable is interval-censored. Standard errors are estimated by a non-parametric bootstrap.
semLm(
  formula,
  data,
  classes,
  burnin = 40,
  samples = 200,
  trafo = "None",
  adjust = 2,
  bootstrap.se = FALSE,
  b = 100
)
formula |
an object of class |
data |
a data frame containing the variables of the model |
classes |
numeric vector of classes; |
burnin |
the number of burn-in iterations of the SEM-algorithm |
samples |
the number of additional iterations of the SEM-algorithm for parameter estimation |
trafo |
transformation of the dependent variable to fulfill the model assumptions
default is |
adjust |
extends the number of iteration steps of the SEM-algorithm
for finding the optimal lambda of the Box-Cox transformation. The number of
iterations
is extended in the following way: |
bootstrap.se |
if |
b |
number of bootstrap iterations for the estimation of the standard errors |
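The trafo argument refers to the logarithmic and the Box-Cox transformation. As a reminder of what the latter does, here is a sketch of the textbook Box-Cox transformation and its inverse in base R. The function names box_cox and box_cox_inv are hypothetical, and the package may use a shifted or scaled variant.

```r
# Textbook Box-Cox transformation for positive y; lambda = 0 gives the log
box_cox <- function(y, lambda) {
  if (abs(lambda) < 1e-8) log(y) else (y^lambda - 1) / lambda
}

# Inverse transformation, mapping back to the original scale
box_cox_inv <- function(z, lambda) {
  if (abs(lambda) < 1e-8) exp(z) else (lambda * z + 1)^(1 / lambda)
}

bounds <- c(1, 1.5, 2.5, 3.5, 4.5, 5.5, 6.5, 8.5)
z <- box_cox(bounds, lambda = 0.5)
print(z)
print(box_cox_inv(z, lambda = 0.5))
```

For positive values the transformation is strictly increasing, so transformed interval bounds keep their order and can still serve as interval limits on the transformed scale.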
The model parameters are estimated using pseudo samples as a proxy for the interval-censored dependent variable. The object pseudo.y returns the pseudo samples of each iteration step of the SEM-algorithm.
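How pseudo samples can drive the estimation is sketched below in base R for a simple linear model. This is an illustration of a stochastic EM iteration under normal errors, not the package code: each pseudo value is drawn from a normal distribution centred at the current fitted value and truncated to the observed interval, and the model is then refitted to the pseudo sample. The helper rtnorm, the simulated data and the midpoint starting values are assumptions of the sketch.

```r
set.seed(123)
n <- 300
x <- rnorm(n)
y_true  <- 2 + 1.5 * x + rnorm(n)
classes <- c(-Inf, 0, 1, 2, 3, 4, Inf)
idx   <- as.integer(cut(y_true, breaks = classes))
lower <- classes[idx]
upper <- classes[idx + 1]

# Draw from N(mu, sigma^2) truncated to (lower, upper) via the inverse CDF
rtnorm <- function(mu, sigma, lower, upper) {
  p_lo <- pnorm(lower, mu, sigma)
  p_hi <- pnorm(upper, mu, sigma)
  qnorm(runif(length(mu), p_lo, p_hi), mu, sigma)
}

# Starting values: interval midpoints, with a nearby finite value for
# the two open-ended classes
mid <- ifelse(is.finite(lower) & is.finite(upper), (lower + upper) / 2,
              ifelse(is.finite(lower), lower + 0.5, upper - 0.5))
fit <- lm(mid ~ x)

burnin  <- 40
samples <- 200
coefs <- matrix(NA_real_, nrow = burnin + samples, ncol = 2)
for (it in seq_len(burnin + samples)) {
  # Stochastic step: draw pseudo samples given the current parameters
  y_pseudo <- rtnorm(fitted(fit), summary(fit)$sigma, lower, upper)
  # Maximisation step: refit the model to the pseudo sample
  fit <- lm(y_pseudo ~ x)
  coefs[it, ] <- coef(fit)
}

# Final estimate: average over the iterations after the burn-in period
beta_hat <- colMeans(coefs[(burnin + 1):(burnin + samples), ])
print(beta_hat)
```

Averaging over the post burn-in iterations smooths out the simulation noise that each single draw of pseudo samples introduces.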
An object of class "sem" that provides parameter estimates for linear regression models with interval-censored dependent variable. Generic functions such as print, plot, and summary have methods that can be used to obtain further information. See semObject for a description of the components of objects of class "sem".
Walter, P. (2019). A Selection of Statistical Methods for Interval-Censored Data with Applications to the German Microcensus, PhD thesis, Freie Universitaet Berlin
lm, print.sem, plot.sem, summary.sem
## Not run:
# Load and prepare data
data <- Exam
classes <- c(1, 1.5, 2.5, 3.5, 4.5, 5.5, 6.5, 7.7, 8.5, Inf)
data$examsc.class <- cut(data$examsc, classes)

# Run model with default settings
model <- semLm(
  formula = examsc.class ~ standLRT + schavg,
  data = data, classes = classes
)
summary(model)
## End(Not run)
This function estimates the linear mixed regression model when the dependent variable is interval-censored. Standard errors are estimated by a parametric bootstrap.
semLme(
  formula,
  data,
  classes,
  burnin = 40,
  samples = 200,
  trafo = "None",
  adjust = 2,
  bootstrap.se = FALSE,
  b = 100
)
formula |
a two-sided linear formula object describing both the fixed-effects
and random-effects part of the model, with the response on the left of a ~ operator
and the terms, separated by + operators, on the right. Random-effects terms are
distinguished by vertical bars (|) separating expressions for design matrices from
grouping factors, as in |
data |
a data frame containing the variables of the model |
classes |
numeric vector of classes; |
burnin |
the number of burn-in iterations of the SEM-algorithm |
samples |
the number of additional iterations of the SEM-algorithm for parameter estimation |
trafo |
transformation of the dependent variable to fulfil the model assumptions
default is |
adjust |
extends the number of iteration steps of the SEM-algorithm
for finding the optimal lambda of the Box-Cox transformation. The number of iterations
is extended in the following way: |
bootstrap.se |
if |
b |
number of bootstrap iterations for the estimation of the standard errors |
The model parameters are estimated using pseudo samples of the interval-censored dependent variable. The object pseudo.y returns the pseudo samples of each iteration step of the SEM-algorithm.
An object of class "sem" that provides parameter estimates for linear regression models with interval-censored dependent variable. Generic functions such as print, plot, and summary have methods that can be used to obtain further information. See semObject for a description of the components of objects of class "sem".
Walter, P. (2019). A Selection of Statistical Methods for Interval-Censored Data with Applications to the German Microcensus, PhD thesis, Freie Universitaet Berlin
lmer, print.sem, plot.sem, summary.sem
## Not run:
# Load and prepare data
data <- Exam
classes <- c(1, 1.5, 2.5, 3.5, 4.5, 5.5, 6.5, 7.7, 8.5, Inf)
data$examsc.class <- cut(data$examsc, classes)

# Run model with random intercept and default settings
model1 <- semLme(
  formula = examsc.class ~ standLRT + schavg + (1 | school),
  data = data, classes = classes
)
summary(model1)
## End(Not run)
An object of class "sem" that represents the estimated model parameters and standard errors. Objects of this class have methods for the generic functions print, plot and summary.
An object of class "sem" is a list containing the following components. Some parameters are only estimated for linear mixed regression models (and vice versa).
pseudo.y |
a matrix containing the pseudo samples of the interval-censored variable from each iteration step |
coef |
the estimated regression coefficients (fixed effects) |
ranef |
the estimated regression random effects |
sigmae |
estimated variance |
VaCov |
estimated covariance matrix of the random effects |
se |
bootstrapped standard error of the coefficients |
ci |
bootstrapped 95% confidence interval of the coefficients |
lambda |
estimated lambda for the Box-Cox transformation |
bootstraps |
number of bootstrap iterations for the estimation of the standard errors |
r2 |
estimated coefficient of determination |
icc |
estimated intraclass correlation coefficient |
adj.r2 |
estimated adjusted coefficient of determination |
formula |
|
transformation |
the specified transformation "log" for logarithmic and "bc" for Box-Cox |
n.classes |
the number of classes into which the dependent variable is censored |
conv.coef |
estimated coefficients for each iteration step of the SEM-algorithm |
conv.sigmae |
estimated variance |
conv.VaCov |
estimated covariance matrix of the random effects for each iteration step of the SEM-algorithm |
conv.lambda |
estimated lambda for the Box-Cox transformation for each iteration step of the SEM-algorithm |
b.lambda |
the number of burn-in iterations the SEM-algorithm used to estimate lambda |
m.lambda |
the number of additional iterations the SEM-algorithm used to estimate lambda |
burnin |
the number of burn-in iterations of the SEM-algorithm |
samples |
the number of additional iterations of the SEM-algorithm |
classes |
specified intervals |
original.y |
the dependent variable of the regression model measured on an interval-censored scale |
call |
the function call |
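As background for the icc component: in a random-intercept model the intraclass correlation is commonly defined as the share of the total variance that is due to the grouping factor. The base R sketch below simulates balanced grouped data and estimates the two variance components with the classical one-way ANOVA estimator. The simulated data and the ANOVA estimator are assumptions of this sketch and are not how the package obtains its estimates.

```r
set.seed(99)
groups    <- 65
per_group <- 20
school <- factor(rep(seq_len(groups), each = per_group))

# Random intercept per group plus individual-level noise
u <- rnorm(groups, sd = 0.5)[as.integer(school)]
y <- 1 + u + rnorm(groups * per_group, sd = 1)

# One-way ANOVA estimator of the variance components (balanced design)
ms <- anova(lm(y ~ school))[["Mean Sq"]]
sigma_e2 <- ms[2]                                  # residual variance
sigma_u2 <- max((ms[1] - ms[2]) / per_group, 0)    # between-group variance

icc <- sigma_u2 / (sigma_u2 + sigma_e2)
print(icc)
```

With the simulated standard deviations of 0.5 and 1, the true value is 0.25 / 1.25 = 0.2, which the estimate should approach as the number of groups grows.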
Walter, P. (2019). A Selection of Statistical Methods for Interval-Censored Data with Applications to the German Microcensus, PhD thesis, Freie Universitaet Berlin
The package smicd supports the estimation of linear and linear mixed regression models (random slope and random intercept models) with an interval-censored dependent variable. Parameter estimates are obtained by a stochastic expectation maximization (SEM) algorithm (Walter, 2019). Standard errors are estimated by a non-parametric bootstrap in the linear regression model and by a parametric bootstrap in the linear mixed regression model. To handle departures from the model assumptions, transformations (log and Box-Cox) of the interval-censored dependent variable are incorporated into the algorithm (Walter, 2019). Furthermore, the package smicd implements a non-parametric kernel density algorithm for the direct (without covariates) estimation of statistical indicators from interval-censored data (Walter, 2019; Gross et al., 2017). The standard errors of the statistical indicators are estimated by a non-parametric bootstrap.
The two estimation functions for the linear and linear mixed regression model are called semLm and semLme. So far, only random intercept and random slope models are implemented. For both functions the following methods are available: summary.sem, print.sem and plot.sem.
The function for the direct estimation of statistical indicators is called kdeAlgo. The following methods are available: print.kdeAlgo and plot.kdeAlgo.
An overview of all currently provided functions can be requested by library(help=smicd).
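The principle of a non-parametric bootstrap standard error can be sketched in a few lines of base R. Here the median of a plain numeric vector stands in for the estimator. In the package, a bootstrap replicate presumably involves rerunning the estimation on resampled interval-censored data, which this simplified sketch does not do.

```r
set.seed(2024)
y <- rlnorm(500, meanlog = 8, sdlog = 1)
b <- 100   # number of bootstrap iterations

# Resample the observations with replacement and recompute the statistic
boot_medians <- replicate(b, median(sample(y, replace = TRUE)))

# The bootstrap standard error is the standard deviation of the replicates
se_median <- sd(boot_medians)
print(se_median)
```

Larger values of b reduce the simulation noise in the standard error at the cost of proportionally longer run time.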
Walter, P. (2019). A Selection of Statistical Methods for Interval-Censored
Data with Applications to the German Microcensus, PhD thesis,
Freie Universitaet Berlin
Gross, M., U. Rendtel, T. Schmid, S. Schmon, and N. Tzavidis (2017).
Estimating the density of ethnic minorities and aged people in Berlin: Multivariate
Kernel Density Estimation applied to sensitive georeferenced administrative data
protected via measurement error. Journal of the Royal Statistical Society: Series A
(Statistics in Society), 180.
summary method for class "sem".
## S3 method for class 'sem'
summary(object, ...)
object |
an object of class |
... |
additional arguments that are not used in this method. |
An object of type "summary.sem" with the following components:
call |
a list containing an image of the function call that produced the object. |
coefficients |
a table that returns the parameter estimates and, if standard errors are estimated, the standard errors and confidence intervals. |
standard errors |
bootstrapped standard errors |
confidence intervals |
bootstrapped confidence intervals |
two R2 measures |
a multiple and adjusted R-squared |