Title: | Recursive Partitioning for Structural Equation Models |
---|---|
Description: | SEM Trees and SEM Forests -- an extension of model-based decision trees and forests to Structural Equation Models (SEM). SEM trees hierarchically split empirical data into homogeneous groups each sharing similar data patterns with respect to a SEM by recursively selecting optimal predictors of these differences. SEM forests are an extension of SEM trees. They are ensembles of SEM trees each built on a random sample of the original data. By aggregating over a forest, we obtain measures of variable importance that are more robust than measures from single trees. A description of the method was published by Brandmaier, von Oertzen, McArdle, & Lindenberger (2013) <doi:10.1037/a0030001> and Arnold, Voelkle, & Brandmaier (2020) <doi:10.3389/fpsyg.2020.564403>. |
Authors: | Andreas M. Brandmaier [aut, cre], John J. Prindle [aut], Manuel Arnold [aut], Caspar J. Van Lissa [aut] |
Maintainer: | Andreas M. Brandmaier <[email protected]> |
License: | GPL-3 |
Version: | 0.9.20 |
Built: | 2024-10-31 05:38:56 UTC |
Source: | https://github.com/brandmaier/semtree |
SEM Tree Package
.SCALE_METRIC
An object of class numeric of length 1.
A function to calculate biodiversity of a semforest object.
biodiversity(x, aggregate.fun = median)
x |
A |
aggregate.fun |
Takes a function to apply to the vector of pairwise diversities. By default, this is the median. |
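The role of aggregate.fun can be illustrated with a small base R sketch. It assumes that the pairwise diversities are the off-diagonal entries of a symmetric matrix; the matrix values and the helper name aggregate_diversity are hypothetical and do not come from the package.

```r
# Hypothetical 4 x 4 symmetric matrix of pairwise tree diversities
div <- matrix(c(0, 2, 4, 6,
                2, 0, 1, 3,
                4, 1, 0, 5,
                6, 3, 5, 0), nrow = 4, byrow = TRUE)

# Sketch (assumption): aggregate each unordered pair of trees exactly once
aggregate_diversity <- function(div, aggregate.fun = median) {
  pairwise <- div[upper.tri(div)]  # vector of pairwise diversities
  aggregate.fun(pairwise)
}

med_div <- aggregate_diversity(div)        # default: median
max_div <- aggregate_diversity(div, max)   # any function of a numeric vector works
```

Passing a different function, such as mean or max, changes only how the vector of pairwise values is summarised.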
Andreas M. Brandmaier
Grows a series of SEM Forests following the Boruta algorithm to determine feature importance as moderators of the underlying model.
boruta( model, data, control = NULL, predictors = NULL, maxRuns = 30, pAdjMethod = "none", alpha = 0.05, verbose = FALSE, quant = 1, ... )
model |
A template SEM. Same as in semtree. |
data |
A dataframe to boruta on. Same as in semtree. |
control |
A semforest control object to set forest parameters. |
predictors |
An optional list of covariates. See semtree code example. |
maxRuns |
Maximum number of boruta search cycles |
pAdjMethod |
A value from stats::p.adjust.methods defining a multiple testing correction method |
alpha |
p-value cutoff for decision making. Default .05 |
verbose |
Verbosity level for boruta processing similar to the same argument in semtree.control and semforest.control |
... |
Optional parameters to undefined subfunctions |
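The pAdjMethod and alpha arguments refer to standard multiple-testing corrections from stats::p.adjust. The following sketch shows what such a correction does to a vector of p-values; the p-values are made up, and how boruta applies the correction internally is not shown here.

```r
# Hypothetical raw p-values for three predictors
p_raw <- c(x1 = 0.01, x2 = 0.02, x3 = 0.03)

# Valid choices for pAdjMethod are the entries of stats::p.adjust.methods
methods_available <- stats::p.adjust.methods

p_none <- p.adjust(p_raw, method = "none")        # no correction
p_bonf <- p.adjust(p_raw, method = "bonferroni")  # multiply by the number of tests
p_holm <- p.adjust(p_raw, method = "holm")        # step-down variant

# Compare each adjusted p-value against the cutoff alpha
alpha <- 0.05
significant_bonf <- p_bonf < alpha
```

With "none", the raw p-values are compared to alpha directly; stricter methods leave fewer predictors below the cutoff.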
A vim object with several elements that need work. Of particular note, '$importance' carries mean importance; '$decision' denotes Accepted/Rejected/Tentative; '$impHistory' has the entire varimp history; and '$details' has exit values for each parameter.
Priyanka Paul, Timothy R. Brick, Andreas Brandmaier
Return the parameter estimates of a given leaf of a SEM tree
## S3 method for class 'semtree'
coef(object, ...)
object |
semtree. A SEM tree node. |
... |
Extra arguments. Currently unused. |
Wrapper function for computing the maxLR corrected p value from strucchange
computePval_maxLR(maxLR, q, covariate, from, to, nrep)
maxLR |
maximum of the LR test statistics |
q |
number of free SEM parameters / degrees of freedom |
covariate |
covariate under evaluation. This is important to get the level of measurement from the covariate and the bin size for ordinal and categorical covariates. |
from |
numeric from interval (0, 1) specifying start of trimmed sample period. With the default from = 0.15 the first and last 15 percent of observations are trimmed. This is only needed for continuous covariates. |
to |
numeric from interval (0, 1) specifying end of trimmed sample period. By default, to is 1. |
nrep |
numeric. Number of replications used for simulating from the asymptotic distribution (passed to efpFunctional). Only needed for ordinal covariates. |
Numeric. p value for maximally selected LR statistic
Manuel Arnold
Computes a diversity matrix using a distance function between trees
diversityMatrix(forest, divergence = klsym, showProgressBar = TRUE)
forest |
A SEM forest |
divergence |
A divergence function such as hellinger or klsym |
showProgressBar |
Boolean. Show a progress bar. |
Evaluates the average deviance (-2LL) of a dataset given a forest.
evaluate(x, data = NULL, ...)
x |
A fitted semforest object. |
data |
A data.frame |
... |
No extra parameters yet. |
Average deviance
Andreas M. Brandmaier
Brandmaier, A.M., Oertzen, T. v., McArdle, J.J., & Lindenberger, U. (2013). Structural equation model trees. Psychological Methods, 18(1), 71-86.
evaluateDataLikelihood, semtree, semforest
This helper function is used in the semforest varimp and proximity aggregate functions.
evaluateDataLikelihood(model, data, data_type = "raw")
model |
|
data |
Data set to apply to a fitted model. |
data_type |
Type of data ("raw", "cov", "cor") |
Returns a -2LL model fit for the model
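As background for the -2LL value, the following base R sketch computes the negative two log-likelihood of raw data under a multivariate normal distribution with given mean vector and covariance matrix. It is an illustration of the quantity only, not the package's code path; the helper name m2ll_mvn is hypothetical.

```r
# -2 log-likelihood of raw data under a multivariate normal N(mu, sigma)
m2ll_mvn <- function(data, mu, sigma) {
  data <- as.matrix(data)
  k <- ncol(data)
  # Squared Mahalanobis distance of every row from the mean
  d2 <- mahalanobis(data, center = mu, cov = sigma)
  logdet <- as.numeric(determinant(sigma, logarithm = TRUE)$modulus)
  sum(k * log(2 * pi) + logdet + d2)
}

# Univariate check against dnorm()
x <- c(-1, 0, 2)
dev_sketch <- m2ll_mvn(matrix(x, ncol = 1), mu = 0, sigma = matrix(1))
dev_dnorm <- -2 * sum(dnorm(x, mean = 0, sd = 1, log = TRUE))
```

Lower values indicate that the data are more likely under the model-implied mean and covariance structure.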
Andreas M. Brandmaier, John J. Prindle
Brandmaier, A.M., Oertzen, T. v., McArdle, J.J., & Lindenberger, U. (2013). Structural equation model trees. Psychological Methods, 18(1), 71-86.
A helper function to evaluate the negative two log-likelihood (-2LL) of leaf (terminal) nodes for a dataset. When given a semtree and a unique dataset, the model estimates -2LL for the tree parameters and data subsets that fit the tree branching criteria.
evaluateTree(tree, test_set, data_type = "raw", leaf_ids = NULL)
tree |
A fitted semtree object. |
test_set |
Dataset to fit to a fitted semtree object. |
data_type |
type of data ("raw", "cov", "cor") |
leaf_ids |
Identifies which nodes are leaf nodes. Default is NULL, which checks model for leaf nodes and fills this information in automatically. |
A list with two elements:
deviance |
Combined -2LL for leaf node models of the tree. |
num_models |
Number of leaf nodes used for the deviance calculations. |
Andreas M. Brandmaier, John J. Prindle
Brandmaier, A.M., Oertzen, T. v., McArdle, J.J., & Lindenberger, U. (2013). Structural equation model trees. Psychological Methods, 18(1), 71-86.
evaluateDataLikelihood, semtree, semforest
Search tool to search nodes for alternative splitting values found during the semtree process. Given a particular node, competing split values are listed assuming they also meet the criteria for a significant splitting value as set by semtree.control.
findOtherSplits(node, tree)
node |
A node from a semtree object. |
tree |
A semtree object. |
A data.frame() with rows corresponding to the variable names and split values for alternative splits found in the node of interest.
...
Andreas M. Brandmaier, John J. Prindle
Brandmaier, A.M., Oertzen, T. v., McArdle, J.J., & Lindenberger, U. (2013). Structural equation model trees. Psychological Methods, 18(1), 71-86.
Fit multigroup model for evaluating a candidate split
fitSubmodels( model, subset1, subset2, control, invariance = NULL, return.models = FALSE )
model |
A model specification that is used as template for each of the two groups |
subset1 |
Dataset for the first group model |
subset2 |
Dataset for the second group model |
control |
a |
invariance |
fit models with invariant parameters if given. NULL otherwise (default). |
return.models |
Boolean. Return the fitted models. Returns NA if the fit fails. |
Returns the length of the longest path from a root node to a leaf node.
getDepth(tree)
tree |
A |
Andreas M. Brandmaier
Brandmaier, A.M., Oertzen, T. v., McArdle, J.J., & Lindenberger, U. (2013). Structural equation model trees. Psychological Methods, 18(1), 71-86.
Returns the height of a SEM tree, which equals the length of the longest path from the root to a terminal node.
getHeight(tree)
tree |
A SEM tree. |
Andreas M. Brandmaier, John J. Prindle
Brandmaier, A.M., Oertzen, T. v., McArdle, J.J., & Lindenberger, U. (2013). Structural equation model trees. Psychological Methods, 18(1), 71-86.
Get a list of all leafs in a tree by recursively searching the tree starting at the given node (if no data object is given). If data is given, the function returns the leafs that are predicted for each row of the given data.
getLeafs(tree, data = NULL)
tree |
A |
data |
A |
Andreas M. Brandmaier
Brandmaier, A.M., Oertzen, T. v., McArdle, J.J., & Lindenberger, U. (2013). Structural equation model trees. Psychological Methods, 18(1), 71-86.
Return a node matching a given node ID
getNodeById(tree, id)
tree |
A SEM Tree object. |
id |
Numeric. A Node id. |
Andreas M. Brandmaier, John J. Prindle
Brandmaier, A.M., Oertzen, T. v., McArdle, J.J., & Lindenberger, U. (2013). Structural equation model trees. Psychological Methods, 18(1), 71-86.
Counts the number of nodes in a tree.
getNumNodes(tree)
tree |
A SEM tree object. |
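The tree accessors described above (height, leaf collection, node count) all follow the same recursive pattern. The sketch below shows that pattern on a plain nested list. The field names node_id, left_child and right_child and all helper names are assumptions made for this illustration, and the convention that a single root node has height 0 is also an assumption.

```r
# Hypothetical binary tree as a nested list
leaf <- function(id) list(node_id = id, left_child = NULL, right_child = NULL)
toy_tree <- list(
  node_id = 1,
  left_child = leaf(2),
  right_child = list(node_id = 3, left_child = leaf(4), right_child = leaf(5))
)

is_leaf_sketch <- function(node) {
  is.null(node$left_child) && is.null(node$right_child)
}

# Longest path (counted in edges) from this node down to a leaf
height_sketch <- function(node) {
  if (is_leaf_sketch(node)) return(0)
  1 + max(height_sketch(node$left_child), height_sketch(node$right_child))
}

# Total number of nodes, inner nodes and leaves alike
count_nodes_sketch <- function(node) {
  if (is.null(node)) return(0)
  1 + count_nodes_sketch(node$left_child) + count_nodes_sketch(node$right_child)
}

# IDs of all terminal nodes, left to right
leaf_ids_sketch <- function(node) {
  if (is_leaf_sketch(node)) return(node$node_id)
  c(leaf_ids_sketch(node$left_child), leaf_ids_sketch(node$right_child))
}

toy_height <- height_sketch(toy_tree)
toy_count <- count_nodes_sketch(toy_tree)
toy_leaves <- leaf_ids_sketch(toy_tree)
```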
Andreas M. Brandmaier, John J. Prindle
Brandmaier, A.M., Oertzen, T. v., McArdle, J.J., & Lindenberger, U. (2013). Structural equation model trees. Psychological Methods, 18(1), 71-86.
Returns a list of tables with some measure of parameter differences between post-split nodes.
getParDiffForest(forest, measure = "wald", normalize = FALSE)
forest |
a semforest object. |
measure |
a character. "wald" (default) gives the squared parameter differences divided by their pooled standard errors. "test" gives the contributions of the parameters to the test statistic. "raw" gives the absolute values of the parameter differences. |
normalize |
logical value; if TRUE parameter differences of each split are divided by the sum of all differences of the corresponding split. Set to FALSE by default. |
A list with data.frames containing parameter differences for each tree of the forest. The rows of the data.frames correspond to the non-leaf nodes of the respective trees. The first column contains the name of the predictor variables and the remaining columns contain the parameter differences. The rows of the data.frames are named by the node IDs as given by getNodeById and the columns are named as in coef.
Manuel Arnold
Returns a table with some measure of parameter differences between post-split nodes.
getParDiffTree(tree, measure = "wald", normalize = FALSE)
tree |
a semtree object. |
measure |
a character. "wald" (default) gives the squared parameter differences divided by their pooled standard errors. "test" gives the contributions of the parameters to the test statistic. "raw" gives the absolute values of the parameter differences. |
normalize |
logical value; if TRUE parameter differences of each split are divided by the sum of all differences of the corresponding split. Set to FALSE by default. |
A matrix containing parameter differences. The matrix has n rows and p columns, where n is the number of non-leaf nodes of the tree and p is the number of model parameters. The rows are named by the node IDs as given by getNodeById and the columns are named as in coef.
Manuel Arnold
Returns all leafs (=terminal nodes) of a tree.
getTerminalNodes(tree)
tree |
A semtree object. |
Andreas M. Brandmaier, John J. Prindle
Brandmaier, A.M., Oertzen, T. v., McArdle, J.J., & Lindenberger, U. (2013). Structural equation model trees. Psychological Methods, 18(1), 71-86.
Tests whether a semtree object is a leaf. Returns TRUE or FALSE.
isLeaf(tree)
tree |
A semtree object. |
Andreas M. Brandmaier
Brandmaier, A.M., Oertzen, T. v., McArdle, J.J., & Lindenberger, U. (2013). Structural equation model trees. Psychological Methods, 18(1), 71-86.
Divergence measures for multivariate normal distributions as used in the diversityMatrix function.
kl(mu1, cov1, mu2, cov2)
mu1 |
Mean vector |
cov1 |
Covariance matrix |
mu2 |
Mean vector |
cov2 |
Covariance matrix |
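For orientation, the textbook Kullback-Leibler divergence between two multivariate normal distributions can be written in a few lines of base R. This is a sketch of the standard formula with hypothetical helper names (kl_mvn, kl_sym_sketch); whether the package's kl and klsym use exactly this scaling is not confirmed by this document.

```r
# KL divergence KL(N(mu1, cov1) || N(mu2, cov2)), standard closed form
kl_mvn <- function(mu1, cov1, mu2, cov2) {
  k <- length(mu1)
  inv2 <- solve(cov2)
  diff <- mu2 - mu1
  logdet1 <- as.numeric(determinant(cov1, logarithm = TRUE)$modulus)
  logdet2 <- as.numeric(determinant(cov2, logarithm = TRUE)$modulus)
  0.5 * (sum(diag(inv2 %*% cov1)) +
           as.numeric(t(diff) %*% inv2 %*% diff) -
           k + logdet2 - logdet1)
}

# One common symmetrised variant (assumption): the sum of both directions
kl_sym_sketch <- function(mu1, cov1, mu2, cov2) {
  kl_mvn(mu1, cov1, mu2, cov2) + kl_mvn(mu2, cov2, mu1, cov1)
}

# Two unit-variance normals whose means differ by 1
kl_example <- kl_mvn(0, diag(1), 1, diag(1))
kl_sym_example <- kl_sym_sketch(0, diag(1), 1, diag(1))
```

The plain KL divergence is not symmetric in its arguments in general, which is why a symmetrised version is convenient as a distance-like measure between trees.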
This data set provides simple data to fit with a LGCM.
lgcm is a matrix containing 400 rows and 8 columns of simulated data. Longitudinal observations are o1-o5. Covariates are agegroup, training, and noise.
Andreas M. Brandmaier [email protected]
This overrides generic base::merge() to merge two forests into one.
## S3 method for class 'semforest'
merge(x, y, ...)
x |
A SEM Forest |
y |
A second SEM Forest |
... |
Extra arguments. Currently unused. |
Andreas M. Brandmaier, John J. Prindle
Brandmaier, A.M., Oertzen, T. v., McArdle, J.J., & Lindenberger, U. (2013). Structural equation model trees. Psychological Methods, 18(1), 71-86.
Return model estimates of the tree.
modelEstimates(tree, ...)
tree |
A semtree object. |
... |
Optional arguments. |
Andreas M. Brandmaier, John J. Prindle
Brandmaier, A.M., Oertzen, T. v., McArdle, J.J., & Lindenberger, U. (2013). Structural equation model trees. Psychological Methods, 18(1), 71-86.
Compute outlier score based on proximity matrix.
outliers(prox)
prox |
A proximity matrix. |
Andreas M. Brandmaier, John J. Prindle
Brandmaier, A.M., Oertzen, T. v., McArdle, J.J., & Lindenberger, U. (2013). Structural equation model trees. Psychological Methods, 18(1), 71-86.
Returns a table of parameters with columns corresponding to freely estimated parameters and rows corresponding to nodes in the tree.
parameters(tree, leafs.only = TRUE)
tree |
A SEMtree object obtained from semtree. |
leafs.only |
Default = TRUE. Only the terminal nodes (leafs) are printed. If set to FALSE, all node parameters are written to the data.frame. |
The row names of the resulting data frame correspond to internal node ids and the column names correspond to parameters in the SEM. Standard errors of the estimates can be obtained from se.
Returns a data.frame with rows for parameters and columns for terminal nodes.
Andreas M. Brandmaier, John J. Prindle
Brandmaier, A.M., Oertzen, T. v., McArdle, J.J., & Lindenberger, U. (2013). Structural equation model trees. Psychological Methods, 18(1), 71-86.
Compute the partial dependence of a predictor, or set of predictors, on a model parameter.
partialDependence( x, data, reference.var, support = 20, points = NULL, mc = NULL, FUN = "median", ... )
x |
An object for which a method exists |
data |
Optional |
reference.var |
Character vector, referring to the (independent) reference variable or variables for which partial dependence is calculated. Providing two (or more) variables allows for probing interactions, but note that this is computationally expensive. |
support |
Integer. Number of grid points for interpolating the
|
points |
Named list, with elements corresponding to |
mc |
Integer. If |
FUN |
Character string with function used to integrate predictions
across all elements of |
... |
Extra arguments passed to |
Caspar J. Van Lissa, Andreas M. Brandmaier
Create a dataset with fixed values for reference.var for all other values of data, or using mc random samples from data (Monte Carlo integration).
partialDependence_data( data, reference.var, support = 20, points = NULL, mc = NULL, keep_id = FALSE )
data |
The |
reference.var |
Character vector, referring to the (independent) reference variable or variables for which partial dependence is calculated. Providing two (or more) variables allows for probing interactions, but note that this is computationally expensive. |
support |
Integer. Number of grid points for interpolating the
|
points |
Named list, with elements corresponding to |
mc |
Integer. If |
keep_id |
Boolean. Default is false. Should output contain a row id column? marginal dependency using Monte Carlo integration. This is less computationally expensive. |
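The idea behind a partial dependence dataset can be sketched in base R: the reference variable is fixed at each value of a grid while all other columns keep their observed values. The helper pd_data_sketch is hypothetical, handles a single reference variable only, omits the mc option, and assumes an evenly spaced grid between the observed minimum and maximum, which this document does not confirm.

```r
# Sketch: replicate the data once per grid value of the reference variable
pd_data_sketch <- function(data, reference.var, support = 20) {
  x <- data[[reference.var]]
  grid <- seq(min(x), max(x), length.out = support)  # assumed grid layout
  copies <- lapply(grid, function(g) {
    d <- data
    d[[reference.var]] <- g  # overwrite the reference variable with a fixed value
    d
  })
  do.call(rbind, copies)
}

# Hypothetical toy data
toy <- data.frame(age = c(20, 30, 40), score = c(1, 2, 3))
pd <- pd_data_sketch(toy, "age", support = 5)
```

Predictions made on such a dataset can then be aggregated per grid value to trace how the prediction changes with the reference variable. The number of rows grows with support times the number of cases, which is why sampling only mc rows is cheaper.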
Caspar J. Van Lissa
Compute the partial dependence of a predictor, or set of predictors, on the predicted trajectory of a latent growth model.
partialDependence_growth( x, data, reference.var, support = 20, points = NULL, mc = NULL, FUN = "median", times = NULL, parameters = NULL, ... )
x |
An object for which a method exists |
data |
Optional |
reference.var |
Character vector, referring to the (independent) reference variable or variables for which partial dependence is calculated. Providing two (or more) variables allows for probing interactions, but note that this is computationally expensive. |
support |
Integer. Number of grid points for interpolating the
|
points |
Named list, with elements corresponding to |
mc |
Integer. If |
FUN |
Character string with function used to integrate predictions
across all elements of |
times |
Numeric matrix, representing the factor loadings of a latent
growth model, with columns equal to the number of growth |
parameters |
Character vector of the names of the growth parameters;
defaults to |
... |
Extra arguments passed to |
Caspar J. Van Lissa
Visualizes parameter differences between post-split nodes in a forest with boxplots.
plotParDiffForest( forest, plot = "boxplot", measure = "wald", normalize = FALSE, predictors = NULL, title = TRUE )
forest |
a semforest object. |
plot |
a character that specifies the plot type. Available plot types are "boxplot" (default) and "jitter" for a jittered strip plot with mean and standard deviation. |
measure |
a character. "wald" (default) gives the squared parameter differences divided by their pooled standard errors. "test" gives the contributions of the parameters to the test statistic. "raw" gives the absolute values of the parameter differences. |
normalize |
logical value; if TRUE parameter differences of each split are divided by the sum of all differences of the corresponding split. Set to FALSE by default. |
predictors |
a character. Select predictors that are to be plotted. |
title |
logical value; if TRUE a title is added to the plot. |
Manuel Arnold
Visualizes parameter differences between post-split nodes with different plot types.
plotParDiffTree( tree, plot = "ballon", measure = "wald", normalize = FALSE, title = TRUE, structure = FALSE )
tree |
a semtree object. |
plot |
a character that specifies the plot type. Available plot types are "ballon" (default), "heatmap", and "bar". |
measure |
a character. "wald" (default) gives the squared parameter differences divided by their pooled standard errors. "test" gives the contributions of the parameters to the test statistic. "raw" gives the absolute values of the parameter differences. |
normalize |
logical value; if TRUE parameter differences of each split are divided by the sum of all differences of the corresponding split. Set to FALSE by default. |
title |
logical value; if TRUE a title is added to the plot. |
structure |
logical value; if TRUE the structure of the tree is plotted on the right side. |
Manuel Arnold
Plots the structure of a semtree object. This function is similar to plot.semtree, but it does not print the parameter values in the leaf nodes and labels the leaf nodes instead.
plotTreeStructure(tree, type = 2, no.plot = FALSE, ...)
tree |
a semtree object. |
type |
Type of plot. See |
no.plot |
logical value; if TRUE structure of the tree is printed to the console. |
... |
additional arguments passed to |
Manuel Arnold
Predict method for semtree and semforest
## S3 method for class 'semforest'
predict(object, data, type = "node_id", ...)
object |
Object of class |
data |
New test data of class |
type |
Type of prediction. One of c('node_id'). See Details. |
... |
further arguments passed to or from other methods. |
Object of class matrix.
Caspar J. van Lissa, Andreas Brandmaier
Compute an n by n matrix across all trees in a forest, where n is the number of rows in the data, reflecting the proportion of times two cases ended up in the same terminal node of a tree.
proximity(x, data, ...)
x |
An object for which a method exists. |
data |
A data.frame on which proximity is computed |
... |
Parameters passed to other functions. |
SEM Forest Case Proximity
A matrix with dimensions [i, j] whose elements reflect the proportion of times case i and j were in the same terminal node of a tree.
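The definition of the proximity matrix can be made concrete with a short base R sketch. It assumes a matrix of terminal node IDs with one row per case and one column per tree; this layout, the toy values and the helper name proximity_sketch are assumptions for illustration and may differ from the objects the package uses internally.

```r
# Sketch: proportion of trees in which two cases share a terminal node
proximity_sketch <- function(node_ids) {
  n <- nrow(node_ids)
  prox <- matrix(0, n, n)
  for (t in seq_len(ncol(node_ids))) {
    # TRUE where case i and case j fall into the same leaf of tree t
    prox <- prox + outer(node_ids[, t], node_ids[, t], "==")
  }
  prox / ncol(node_ids)
}

# Hypothetical leaf assignments of 3 cases in 2 trees
toy_ids <- cbind(tree1 = c(2, 2, 3), tree2 = c(4, 5, 5))
prox <- proximity_sketch(toy_ids)
```

Because every case always shares a leaf with itself, the diagonal is 1, and 1 minus the proximity can serve as a dissimilarity for clustering.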
Caspar J. Van Lissa, Andreas M. Brandmaier, John J. Prindle
Brandmaier, A.M., Oertzen, T. v., McArdle, J.J., & Lindenberger, U. (2013). Structural equation model trees. Psychological Methods, 18(1), 71-86.
nodeids <- structure(c(9, 3, 5, 7, 10, 4, 6, 8, 9, 3, 5, 7, 10, 4, 6, 8), .Dim = c(4L, 4L))
class(nodeids) <- "semforest_node_id"
sims <- proximity(nodeids)
dd <- as.dist(1-sims)
hc <- hclust(dd)
groups <- cutree(hc, 2)
Returns a new tree with a maximum depth selected by the user. Can be used in conjunction with plot commands to view various pruning levels.
prune(object, ...)
object |
A |
... |
Optional parameters, such as |
The returned tree is only modified by the number of levels for the tree. This function does not reevaluate the data, but provides alternatives to reduce tree complexity. If the user would like to alter the tree by increasing depth, then max.depth option must be adjusted in the semtree.control object (provided further splits are able to be computed).
Returns a semtree object.
Andreas M. Brandmaier, John J. Prindle
Brandmaier, A.M., Oertzen, T. v., McArdle, J.J., & Lindenberger, U. (2013). Structural equation model trees. Psychological Methods, 18(1), 71-86.
Returns a table of standard errors with columns corresponding to freely estimated standard errors and rows corresponding to nodes in the tree.
se(tree, leafs.only = TRUE)
tree |
A SEMtree object obtained from semtree. |
leafs.only |
Default = TRUE. Only the terminal nodes (leafs) are printed. If set to FALSE, all node standard errors are written to the data.frame. |
The row names of the resulting data frame correspond to internal node ids and the column names correspond to standard errors in the SEM. Parameter estimates can be obtained from parameters.
Returns a data.frame with rows for parameters and columns for terminal nodes.
Andreas M. Brandmaier, John J. Prindle
Brandmaier, A.M., Oertzen, T. v., McArdle, J.J., & Lindenberger, U. (2013). Structural equation model trees. Psychological Methods, 18(1), 71-86.
semtree, semtree.control, parameters
Grows a SEM Forest from a template model and a dataset. This may take some time.
semforest( model, data, control = NULL, predictors = NULL, constraints = NULL, ... )
model |
A template SEM. Same as in semtree. |
data |
A dataframe to create a forest from. Same as in semtree. |
control |
A semforest control object to set forest parameters. |
predictors |
An optional list of covariates. See semtree code example. |
constraints |
An optional constraints object created with semtree.constraints. See semtree code example. |
... |
Optional parameters. |
A semforest object.
Andreas M. Brandmaier, John J. Prindle
Brandmaier, A.M., Prindle, J. J., McArdle, J. J., & Lindenberger, U. (2016). Theory-guided exploration with structural equation model forests. Psychological Methods, 21(4), 566–582.
Brandmaier, A.M., Oertzen, T. v., McArdle, J.J., & Lindenberger, U. (2013). Structural equation model trees. Psychological Methods, 18(1), 71–86.
A SEM Forest control object to tune parameters of the forest learning algorithm.
semforest.control( num.trees = 5, sampling = "subsample", control = NA, mtry = 2, remove_dead_trees = TRUE )
num.trees |
Number of trees. |
sampling |
Sampling procedure. Can be subsample or bootstrap. |
control |
A SEM Tree control object. Will be generated by default. |
mtry |
Number of subsampled covariates at each node. |
remove_dead_trees |
Remove trees from forest that had runtime errors |
Andreas M. Brandmaier, John J. Prindle
Brandmaier, A.M., Oertzen, T. v., McArdle, J.J., & Lindenberger, U. (2013). Structural equation model trees. Psychological Methods, 18(1), 71-86.
Structural equation model (SEM) trees are a combination of SEM and decision trees (also known as classification and regression trees or recursive partitioning). SEM trees hierarchically split empirical data into homogeneous groups sharing similar data patterns with respect to a SEM by recursively selecting optimal predictors of these differences from a potentially large set of predictors.
semtree( model, data = NULL, control = NULL, constraints = NULL, predictors = NULL, ... )
model |
A template model specification from |
data |
Data.frame used in the model creation using
|
control |
|
constraints |
A |
predictors |
A vector of variable names matching variable names in
dataset. If NULL (default) all variables that are in dataset and not part of
the model are potential predictors. Optional function input to select a
subset of the unmodeled variables to use as predictors in the |
... |
Optional arguments passed to the tree growing function. |
Calling semtree with an OpenMx or lavaan model creates a tree that recursively partitions a dataset such that the partitions maximally differ with respect to the model-predicted distributions. Each resulting subgroup (represented as a leaf in the tree) is represented by a SEM with a distinct set of parameter estimates.
Predictors (yet unmodeled variables) can take on any form for the splitting algorithm to function (categorical, ordered categories, continuous). Care must be taken in choosing how many predictors to include in analyses because, for unordered categorical variables, the number of multigroup comparisons increases exponentially with the number of categories.
Currently available evaluation methods for assessing partitions:
1. "naive" selection method compares all possible split values to one another over all predictors included in the dataset.
2. "fair" selection uses a two step procedure for analyzing split values on predictors at each node of the tree. The first phase uses half of the sample to examine the model improvement for each split value on each predictor, and retains the value that presents the largest improvement for each predictor. The second phase then evaluates these best split points for each predictor on the second half of the sample. The best improvement for the c splits tested on c predictors is selected for the node and the dataset is split from this node for further testing.
3. "score" uses score-based test statistics. These statistics are much faster than the classic SEM tree approach while having favorable statistical properties.
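The two-phase "fair" procedure from item 2 can be sketched in base R. This is a minimal illustration under strong simplifications, not the package's implementation: a univariate normal model with free mean and variance stands in for the SEM, and the helper names m2ll, lr_split and fair_split as well as the minimum group size of 5 are hypothetical.

```r
# -2LL of a univariate normal at its ML estimates (additive constant dropped)
m2ll <- function(y) length(y) * log(mean((y - mean(y))^2))

# Likelihood-ratio improvement when y is split into x < cut and x >= cut
lr_split <- function(y, x, cut) {
  left <- y[x < cut]
  right <- y[x >= cut]
  if (length(left) < 5 || length(right) < 5) return(-Inf)  # skip tiny groups
  m2ll(y) - m2ll(left) - m2ll(right)
}

fair_split <- function(y, X, seed = 1) {
  set.seed(seed)
  n <- length(y)
  half1 <- sample(n, floor(n / 2))
  half2 <- setdiff(seq_len(n), half1)
  # Phase 1: best cut point per predictor, found on the first half only
  best_cuts <- sapply(names(X), function(v) {
    cuts <- sort(unique(X[half1, v]))
    lrs <- sapply(cuts, function(cc) lr_split(y[half1], X[half1, v], cc))
    cuts[which.max(lrs)]
  })
  # Phase 2: one test per predictor on the second half, at its phase-1 cut
  lrs2 <- sapply(names(X), function(v) {
    lr_split(y[half2], X[half2, v], best_cuts[[v]])
  })
  winner <- names(which.max(lrs2))
  list(predictor = winner, cut = best_cuts[[winner]], lr = max(lrs2))
}

# Simulated data: the mean of y depends on group but not on noise
set.seed(42)
n <- 400
X <- data.frame(group = rbinom(n, 1, 0.5), noise = rnorm(n))
y <- rnorm(n, mean = 3 * X$group)
res <- fair_split(y, X)
```

Because each predictor enters the second phase with exactly one candidate split, a continuous predictor with many possible cut points gains no advantage over a binary one.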
All other parameters controlling the tree growing process are available through a separate semtree.control object.
A semtree object is created which can be examined with summary, plot, and print.
Andreas M. Brandmaier, John J. Prindle, Manuel Arnold
Brandmaier, A.M., Oertzen, T. v., McArdle, J.J., & Lindenberger, U. (2013). Structural equation model trees. Psychological Methods, 18(1), 71-86.
Arnold, M., Voelkle, M. C., & Brandmaier, A. M. (2021). Score-guided structural equation model trees. Frontiers in Psychology, 11, Article 564403. https://doi.org/10.3389/fpsyg.2020.564403
semtree.control, summary.semtree, parameters, se, prune.semtree, subtree, OpenMx, lavaan
A SEM Tree constraints object holds information regarding specifics on how the tree is grown (similar to the control object). The SEM tree control object holds all information that is independent of a specific model whereas the constraints object holds information that is specific to a certain model (e.g., specifies differential treatment of certain parameters, e.g., by holding them constant across the forest).
semtree.constraints( local.invariance = NULL, global.invariance = NULL, focus.parameters = NULL )
local.invariance |
Vector of parameter names that are locally equal, that is, they are assumed to be equal when assessing a local split but allowed to differ subsequently. |
global.invariance |
Vector of parameter names that are globally equal, that is, estimated only once and then fixed in the tree. |
focus.parameters |
Vector of parameter names that exclusively are evaluated for between-group differences when assessing split candidates. If NULL all parameters add to the difference. |
Andreas M. Brandmaier, John J. Prindle
Brandmaier, A.M., Oertzen, T. v., McArdle, J.J., & Lindenberger, U. (2013). Structural equation model trees. Psychological Methods, 18(1), 71-86.
A semtree.control object contains parameters that determine the tree growing process. These parameters include choices of different split candidate selection procedures and hyperparameters of those. Calling the constructor without parameters creates a default control object. A number of tree growing methods are included with this package:
1. "naive" splitting takes the best split value of all possible splits on each covariate.
2. "fair" selection is so called because it tests all splits on half of the data, then tests the best split value for each covariate on the other half of the data. The equal footing of each covariate in this two phase test removes bias from testing variables with many possible splits compared to those with few.
3. "fair3" does the phases described above, with an additional step of retesting all of the split values on the best covariate found in the second phase. Variations in the sample from subsetting are removed and bias in split selection further reduced.
4. "score" implements modern score-based statistics.
semtree.control( method = c("naive", "score", "fair", "fair3"), min.N = NULL, max.depth = NA, alpha = 0.05, alpha.invariance = NA, folds = 5, exclude.heywood = TRUE, progress.bar = TRUE, verbose = FALSE, bonferroni = FALSE, use.all = FALSE, seed = NA, custom.stopping.rule = NA, mtry = NA, report.level = 0, exclude.code = NA, linear = TRUE, min.bucket = NULL, naive.bonferroni.type = 0, missing = "ignore", use.maxlm = FALSE, strucchange.from = 0.15, strucchange.to = NULL, strucchange.nrep = 50000, refit = TRUE, ctsem_sd = FALSE )
method |
Default: 'naive'. One out of
|
min.N |
Default: 10. Minimum sample size per node, used to determine whether to continue splitting a tree or establish a terminal node. |
max.depth |
Default: NA. Maximum levels per branch. Parameter for limiting tree growth. |
alpha |
Default: 0.05. Significance level for splitting at a given node. |
alpha.invariance |
Default: NA. Significance level for invariance tests. If NA, the value of alpha is used. |
folds |
Default: 5. Defines the number of folds for the |
exclude.heywood |
Default: TRUE. Reports whether there is an identification problem in the covariance structure of an SEM tested. |
progress.bar |
Default: TRUE. Set to FALSE to disable the progress bar for tree growth. |
verbose |
Default: FALSE. Option to turn on or off all model messages during tree growth. |
bonferroni |
Default: FALSE. Correct for multiple tests with Bonferroni type correction. |
use.all |
Treatment of missing variables. By default, missing values stay in a decision node. If TRUE, cases are distributed according to a maximum likelihood principle to the child nodes. |
seed |
Default: NA. Set a random number seed for repeating random fold generation in tree analysis. |
custom.stopping.rule |
Default: NA. Otherwise, this can be a boolean function with a custom stopping rule for tree growing. |
mtry |
Default: NA. Number of sample columns to use in SEMforest analysis. |
report.level |
Default: 0. Values up to 99 can be used to increase the number of onscreen reports for semtree analysis. |
exclude.code |
Default: NA. NPSOL error code for exclusion from model fit evaluations when finding the best split. By default, models with errors during fitting are retained. |
linear |
If TRUE (default), the structural equation model is assumed to not contain any nonlinear parameter constraints and scores are computed analytically, resulting in a shorter runtime. Only relevant for models fitted with OpenMx. |
min.bucket |
Minimum bucket size. This is the minimum size any node must have, such that a given split is considered valid. Minimum bucket size is a lower bound to the sample size in the terminal nodes of a tree. |
naive.bonferroni.type |
Default: 0. When set to zero, the Bonferroni correction for the naive test counts the number of dichotomous tests. When set to one, the Bonferroni correction counts the number of variables tested. |
missing |
Missing value treatment. Default is "ignore". |
use.maxlm |
Use maxLM statistic for split point selection (as proposed by Arnold et al., 2021). |
strucchange.from |
strucchange argument. See the strucchange package documentation. |
strucchange.to |
strucchange argument. See the strucchange package documentation. |
strucchange.nrep |
strucchange argument. See the strucchange package documentation. |
refit |
If TRUE (default) the initial model is fitted on the data
provided to |
ctsem_sd |
If FALSE (default) no standard errors of CT model parameters are computed. Requesting standard errors increases runtime. |
A control object containing a list of the above parameters.
Andreas M. Brandmaier, John J. Prindle, Manuel Arnold
Brandmaier, A.M., Oertzen, T. v., McArdle, J.J., & Lindenberger, U. (2013). Structural equation model trees. Psychological Methods, 18(1), 71-86.
Arnold, M., Voelkle, M. C., & Brandmaier, A. M. (2021). Score-guided structural equation model trees. Frontiers in Psychology, 11, Article 564403. https://doi.org/10.3389/fpsyg.2020.564403
# create a control object with an alpha level of 1%
my.control <- semtree.control(alpha=0.01)
# set the minimum number of cases per node to ten
my.control$min.N <- 10
# print contents of the control object
print(my.control)
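The difference between the two settings of naive.bonferroni.type can be illustrated with a small calculation in base R. This is a sketch under stated assumptions rather than the package's code: it assumes that a covariate with k distinct values contributes k - 1 dichotomous tests, and it applies the correction by dividing alpha, which may differ from how semtree applies it internally. The data frame and variable names are made up for the example.

```r
alpha <- 0.05

# Hypothetical covariates: 'age' has five distinct values, 'sex' has two
covs <- data.frame(age = c(20, 30, 40, 50, 60),
                   sex = c(0, 1, 0, 1, 1))

# Assumed count of dichotomous tests: k - 1 cut points for k distinct values
n_splits <- vapply(covs, function(x) length(unique(x)) - 1, numeric(1))

# Type 0: correct for the total number of dichotomous tests (4 + 1 = 5)
alpha_type0 <- alpha / sum(n_splits)

# Type 1: correct for the number of variables tested (2)
alpha_type1 <- alpha / ncol(covs)

print(c(type0 = alpha_type0, type1 = alpha_type1))
```

Counting dichotomous tests gives the stricter threshold whenever at least one covariate has more than two distinct values.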
Removes all elements of a semforest or semtree except for the tree structure and terminal node parameters. This is to reduce the heavy memory footprint of SEM trees and forests.
strip(x, parameters = NULL)
x |
An object for which a method exists. |
parameters |
Character vector, referencing parameters in the SEM model.
Defaults to |
Objects of class semforest and semtree are very large, which complicates downstream operations such as making partial dependence plots, or using the model in interactive contexts (like Shiny apps). Running strip removes all elements of the model except for the tree structure and terminal node parameters. Note that some methods are no longer available for the resulting object, e.g., varimp requires the terminal node SEM models to compute the likelihood ratio.
List
## Not run:
if(interactive()){
 #EXAMPLE1
}
## End(Not run)
Creates subsets of a forest. This can be used to select a range of trees, e.g. from:(from+num), to remove all null trees that resulted from errors (type="nonnull"), or to randomly select a subforest (type="random").
subforest(forest, num = NULL, type = "nonnull", from = 1)
forest |
A SEM Forest object. |
num |
Number of trees to select. |
type |
Either 'random', 'nonnull', or NULL. 'random' selects a random subset, 'nonnull' selects all non-null trees, and NULL allows subsetting trees by index. |
from |
Starting index if type=NULL. |
Andreas M. Brandmaier, John J. Prindle
Brandmaier, A.M., Oertzen, T. v., McArdle, J.J., & Lindenberger, U. (2013). Structural equation model trees. Psychological Methods, 18(1), 71-86.
The subtree function returns the tree that starts at a selected node of a tree returned by semtree.
subtree(tree, startNode = NULL, level = 0, foundNode = FALSE)
tree |
A SEMtree object obtained from |
startNode |
Node id, which will be future root node (0 to max node
number of |
level |
Ignore. Only used internally. |
foundNode |
Ignore. Only used internally. |
The row names of the resulting data frame correspond to internal node ids and the column names correspond to standard errors in the SEM. Standard errors of the estimates can be obtained from se.
Returns a semtree object which is a partitioned tree from the input semtree.
Andreas M. Brandmaier, John J. Prindle
Brandmaier, A.M., Oertzen, T. v., McArdle, J.J., & Lindenberger, U. (2013). Structural equation model trees. Psychological Methods, 18(1), 71-86.
Converts a tree into a tabular representation. This may be useful as a textual representation for use in manuscripts.
toTable(tree, added.param.cols = NULL, round.param = NULL)
tree |
A SEM Tree object. |
added.param.cols |
String. Add extra columns with parameter estimates. Pass a vector with the names of the parameters that should be rendered in the table. |
round.param |
Integer. Number of digits to round parameter estimates. Default is no rounding (NULL) |
Andreas M. Brandmaier
Brandmaier, A. M., Ram, N., Wagner, G. G., & Gerstorf, D. (in press). Terminal decline in well-being: The role of multi-indicator constellations of physical health and psychosocial correlates. Developmental Psychology.
A function to calculate relative variable importance for selecting node splits over a semforest object.
varimp( forest, var.names = NULL, verbose = F, eval.fun = evaluateTree, method = "permutation", conditional = FALSE, ... )
forest |
A |
var.names |
Covariates used in the forest creation process. NULL value will be automatically filled in by the function. |
verbose |
Boolean to print messages while function is running. |
eval.fun |
Default is |
method |
Experimental. Some alternative methods to compute importance. Default is "permutation". |
conditional |
Conditional variable importance if TRUE, otherwise marginal variable importance. |
... |
Optional arguments. |
Andreas M. Brandmaier, John J. Prindle
Brandmaier, A.M., Oertzen, T. v., McArdle, J.J., & Lindenberger, U. (2013). Structural equation model trees. Psychological Methods, 18(1), 71-86.
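The idea behind the default method = "permutation" can be sketched in base R. This is a simplified illustration, not what varimp does internally: a single split with leaf-specific Gaussian means stands in for a fitted SEM tree, the misfit is evaluated on the full data instead of out-of-bag samples, and there is no averaging over a forest. All names (`dat`, `leaf_means`, `misfit`, `perm_importance`) are made up for the example.

```r
set.seed(1)
n <- 300
dat <- data.frame(x1 = rbinom(n, 1, 0.5), x2 = rbinom(n, 1, 0.5))
dat$y <- rnorm(n, mean = 3 * dat$x1)  # only x1 moderates the model

# Stand-in for one fitted tree: a single split on x1 with leaf-specific means
leaf_means <- tapply(dat$y, dat$x1, mean)
leaf_sd <- sqrt(mean((dat$y - as.vector(leaf_means[as.character(dat$x1)]))^2))

# Misfit (-2 log-likelihood) of data routed through the stand-in tree
misfit <- function(d) {
  mu <- as.vector(leaf_means[as.character(d$x1)])
  -2 * sum(dnorm(d$y, mu, leaf_sd, log = TRUE))
}

# Importance of a covariate: increase in misfit after shuffling its values
perm_importance <- function(d, var) {
  permuted <- d
  permuted[[var]] <- sample(permuted[[var]])
  misfit(permuted) - misfit(d)
}

imp <- vapply(c("x1", "x2"), function(v) perm_importance(dat, v), numeric(1))
print(imp)
```

Shuffling a covariate that the tree never splits on leaves the routing of cases unchanged, so its importance is zero, while shuffling a split variable sends cases to the wrong leaves and increases the misfit.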