| Title: | Recursive Partitioning for Structural Equation Models |
|---|---|
| Description: | SEM Trees and SEM Forests -- an extension of model-based decision trees and forests to Structural Equation Models (SEM). SEM trees hierarchically split empirical data into homogeneous groups each sharing similar data patterns with respect to a SEM by recursively selecting optimal predictors of these differences. SEM forests are an extension of SEM trees. They are ensembles of SEM trees each built on a random sample of the original data. By aggregating over a forest, we obtain measures of variable importance that are more robust than measures from single trees. A description of the method was published by Brandmaier, von Oertzen, McArdle, & Lindenberger (2013) <doi:10.1037/a0030001> and Arnold, Voelkle, & Brandmaier (2020) <doi:10.3389/fpsyg.2020.564403>. |
| Authors: | Andreas M. Brandmaier [aut, cre], John J. Prindle [aut], Manuel Arnold [aut], Caspar J. Van Lissa [aut], Moritz John [ctb] |
| Maintainer: | Andreas M. Brandmaier <[email protected]> |
| License: | GPL-3 |
| Version: | 0.10.0 |
| Built: | 2026-05-25 07:13:49 UTC |
| Source: | https://github.com/brandmaier/semtree |
SEM Tree Package
.SCALE_METRIC
An object of class numeric of length 1.
This function aggregates variable importance estimates over trees. It is a helper function used when print() is called on a variable importance estimate from a SEM forest.
aggregateVarimp(vimp, aggregate = c("mean", "median"), scale = c("absolute", "relative.baseline"), omit.na = TRUE)
vimp |
Variable importance estimate from a SEM forest. |
aggregate |
Character. Either 'mean' or 'median' as function to aggregate estimates over a forest |
scale |
Character. Either 'absolute' or 'relative.baseline'. |
omit.na |
Boolean. By default TRUE, which ignores NA estimates when aggregating. Otherwise they are interpreted as zero. |
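The aggregation and the two ways of handling NA estimates can be illustrated in base R. This is a minimal sketch and not the package's implementation: the importance matrix `imp` (one row per tree, one column per predictor) and the helper `aggregate_importance()` are hypothetical, and the `scale` argument is not covered.

```r
# Hypothetical importance matrix: one row per tree, one column per predictor.
# NA marks a tree in which no estimate was available for that predictor.
imp <- rbind(
  tree1 = c(age = 4, noise = NA),
  tree2 = c(age = 6, noise = 1),
  tree3 = c(age = 8, noise = 2)
)

# Illustrative helper (an assumption, not part of semtree).
aggregate_importance <- function(imp, aggregate = c("mean", "median"), omit.na = TRUE) {
  aggregate <- match.arg(aggregate)
  if (!omit.na) imp[is.na(imp)] <- 0   # interpret missing estimates as zero
  fun <- if (aggregate == "mean") mean else median
  apply(imp, 2, fun, na.rm = TRUE)     # aggregate each predictor over trees
}

agg_omit <- aggregate_importance(imp, "mean", omit.na = TRUE)   # NA ignored
agg_zero <- aggregate_importance(imp, "mean", omit.na = FALSE)  # NA counted as 0
```

Ignoring NA averages only over trees that produced an estimate, whereas treating NA as zero pulls the aggregate towards zero for predictors that were rarely evaluated.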
A function to calculate biodiversity of a semforest object.
biodiversity(x, aggregate.fun = median)
x |
A |
aggregate.fun |
Takes a function to apply to the vector of pairwise diversities. By default, this is the median. |
Andreas M. Brandmaier
Grows a series of SEM Forests following the Boruta algorithm to determine the importance of features as moderators of the underlying model.
boruta(model, data, control = NULL, predictors = NULL, maxRuns = 30, pAdjMethod = "none", alpha = 0.05, verbose = FALSE, quant = 1, ...)
model |
A template SEM. Same as in |
data |
A dataframe to boruta on. Same as in |
control |
A semforest control object to set forest parameters. |
predictors |
An optional list of covariates. See semtree code example. |
maxRuns |
Maximum number of boruta search cycles |
pAdjMethod |
A value from p.adjust.methods defining a multiple testing correction method |
alpha |
p-value cutoff for decision making. Default .05 |
verbose |
Verbosity level for boruta processing similar to the same argument in semtree.control and semforest.control |
quant |
Quantile for selection. Default 1. |
... |
Optional parameters to undefined subfunctions |
A vim object with several elements that need work. Of particular note, '$importance' carries mean importance; '$decision' denotes Accepted/Rejected/Tentative; '$impHistory' has the entire varimp history; and '$details' has exit values for each parameter.
Priyanka Paul, Timothy R. Brick, Andreas Brandmaier
Return the parameter estimates of a given leaf of a SEM tree
## S3 method for class 'semtree'
coef(object, ...)
object |
semtree. A SEM tree node. |
... |
Extra arguments. Currently unused. |
Wrapper function for computing the maxLR corrected p value from strucchange
computePval_maxLR(maxLR, q, covariate, from, to, nrep)
maxLR |
maximum of the LR test statistics |
q |
number of free SEM parameters / degrees of freedom |
covariate |
covariate under evaluation. This is important to get the level of measurement from the covariate and the bin size for ordinal and categorical covariates. |
from |
numeric from interval (0, 1) specifying start of trimmed sample period. With the default from = 0.15 the first and last 15 percent of observations are trimmed. This is only needed for continuous covariates. |
to |
numeric from interval (0, 1) specifying end of trimmed sample period. By default, to is 1. |
nrep |
numeric. Number of replications used for simulating from the asymptotic distribution (passed to efpFunctional). Only needed for ordinal covariates. |
Numeric. p value for maximally selected LR statistic
Manuel Arnold
This function is a generic to count the number of occurrences of
predictors in either a
"semtree" or a "semforest". Note that this must not
be confused with the importance of those predictors. To estimate
importance, rather use permutation-based variable importance with
varimp.
countPredictors(x, ...)
## S3 method for class 'semtree'
countPredictors(x, ...)
## S3 method for class 'semforest'
countPredictors(x, ...)
x |
An object representing either a SEM tree or a forest. |
... |
Additional arguments passed to methods. |
A result depending on the method.
Computes a diversity matrix using a distance function between trees
diversityMatrix(forest, divergence = klsym, showProgressBar = TRUE)
forest |
A SEM forest |
divergence |
A divergence function such as hellinger or klsym |
showProgressBar |
Boolean. Show a progress bar. |
Evaluates the average deviance (-2LL) of a dataset given a forest.
evaluate(x, data = NULL, ...)
x |
A fitted |
data |
A data.frame |
... |
No extra parameters yet. |
Average deviance
Andreas M. Brandmaier
Brandmaier, A.M., Oertzen, T. v., McArdle, J.J., & Lindenberger, U. (2013). Structural equation model trees. Psychological Methods, 18(1), 71-86.
evaluateDataLikelihood, semtree,
semforest
This helper function is used
in the semforest varimp and
proximity aggregate functions.
evaluateDataLikelihood(model, data, data_type = "raw", loglik = c("default", "model", "mvn"))
model |
|
data |
Data set to apply to a fitted model. |
data_type |
Type of data ("raw", "cov", "cor") |
loglik |
Character. Either 'model' for model-based evaluation or 'mvn' for multivariate normal density. |
Returns a -2LL model fit for the model
Andreas M. Brandmaier, John J. Prindle
Brandmaier, A.M., Oertzen, T. v., McArdle, J.J., & Lindenberger, U. (2013). Structural equation model trees. Psychological Methods, 18(1), 71-86.
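The 'mvn' option evaluates the data under a multivariate normal density. As a rough illustration of what a -2LL computed that way looks like, the following base R sketch evaluates complete raw data against a given mean vector and covariance matrix. The helper `minus2ll_mvn()` is hypothetical, assumes no missing values, and is not the package's code.

```r
# -2 log-likelihood of raw data under a multivariate normal distribution
# with mean vector 'mu' and covariance matrix 'sigma' (complete data assumed).
minus2ll_mvn <- function(data, mu, sigma) {
  x <- as.matrix(data)
  k <- ncol(x)
  centered <- sweep(x, 2, mu)                            # subtract the mean from each row
  sigma_inv <- solve(sigma)
  maha <- rowSums((centered %*% sigma_inv) * centered)   # squared Mahalanobis distances
  log_det <- as.numeric(determinant(sigma, logarithm = TRUE)$modulus)
  sum(k * log(2 * pi) + log_det + maha)
}

set.seed(1)
x <- matrix(rnorm(200), ncol = 2)
m2ll <- minus2ll_mvn(x, mu = c(0, 0), sigma = diag(2))
```

In a SEM context, `mu` and `sigma` would be the model-implied mean vector and covariance matrix of the fitted model.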
A helper function to evaluate the negative two log-likelihood (-2LL) of leaf (terminal) nodes for a
dataset. When given a semtree and a unique dataset, the model
estimates -2LL for the tree parameters and data subsets that fit the tree
branching criteria.
evaluateTree(tree, test_set, data_type = "raw", leaf_ids = NULL, loglik = c("default", "model", "mvn"))
tree |
A fitted |
test_set |
Dataset to fit to a fitted |
data_type |
type of data ("raw", "cov", "cor") |
leaf_ids |
Identifies which nodes are leaf nodes. Default is NULL, which checks model for leaf nodes and fills this information in automatically. |
loglik |
Algorithm to compute log likelihood. The default is 'model' and refers to a model-based computation. This is preferable because it is more general. As an alternative, 'mvn' computes the log likelihood based on the multivariate normal density and the model-implied mean and covariance matrix. |
A list with two elements:
deviance |
Combined -2LL for leaf node models of the tree. |
num_models |
Number of leaf nodes used for the deviance calculations. |
Andreas M. Brandmaier, John J. Prindle
Brandmaier, A.M., Oertzen, T. v., McArdle, J.J., & Lindenberger, U. (2013). Structural equation model trees. Psychological Methods, 18(1), 71-86.
evaluateDataLikelihood, semtree,
semforest
Search tool to search nodes for alternative splitting values found during
the semtree process. Given a particular node, competing split
values are listed assuming they also meet the criteria for a significant
splitting value as set by semtree.control.
findOtherSplits(node, tree)
node |
A node from a |
tree |
A |
A data.frame() with rows corresponding to the variable names
and split values for alternative splits found in the node of interest.
...
Andreas M. Brandmaier, John J. Prindle
Brandmaier, A.M., Oertzen, T. v., McArdle, J.J., & Lindenberger, U. (2013). Structural equation model trees. Psychological Methods, 18(1), 71-86.
Fit multigroup model for evaluating a candidate split. This function has the fundamental logic for fitting two-group models in different variants
fitSubmodels(model, subset1, subset2, control, invariance = NULL, return.models = FALSE)
model |
A model specification that is used as template for each of the two groups |
subset1 |
Dataset for the first group model |
subset2 |
Dataset for the second group model |
control |
a |
invariance |
fit models with invariant parameters if given. NULL otherwise (default). |
return.models |
Boolean. Whether to return the fitted models. NA is returned if fitting fails. |
Returns the length of the longest path from a root node to a leaf node.
getDepth(tree)
tree |
A |
Andreas M. Brandmaier
Brandmaier, A.M., Oertzen, T. v., McArdle, J.J., & Lindenberger, U. (2013). Structural equation model trees. Psychological Methods, 18(1), 71-86.
Returns the height of a SEM Tree, which equals the length of the longest path from the root to a terminal node.
getHeight(tree)
tree |
A SEM tree. |
Example: A SEM tree with only the root node has depth 0. A SEM tree with one decision node has depth 1.
Andreas M. Brandmaier, John J. Prindle
Brandmaier, A.M., Oertzen, T. v., McArdle, J.J., & Lindenberger, U. (2013). Structural equation model trees. Psychological Methods, 18(1), 71-86.
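The definition of height as the longest root-to-leaf path can be sketched recursively. The node representation below (a list with optional `left_child` and `right_child` elements) is an assumption made for illustration, and `tree_height()` is a hypothetical helper rather than the package function.

```r
# Height of a binary tree stored as nested lists.
# Assumption: a leaf has neither 'left_child' nor 'right_child'.
tree_height <- function(node) {
  if (is.null(node$left_child) && is.null(node$right_child)) return(0)
  1 + max(tree_height(node$left_child), tree_height(node$right_child))
}

root_only <- list(id = 1)
one_split <- list(id = 1, left_child = list(id = 2), right_child = list(id = 3))
two_levels <- list(id = 1, left_child = one_split, right_child = list(id = 5))

h0 <- tree_height(root_only)
h1 <- tree_height(one_split)
h2 <- tree_height(two_levels)
```

This matches the example above: a tree with only a root node has height 0, and one decision node gives height 1.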
Get a list of all leafs in a tree by recursively searching the tree starting
at the given node (if no data object is given). If data is
given, the function returns the leafs that are predicted for each row of the
given data.
getLeafs(tree, data = NULL)
tree |
A |
data |
A |
Andreas M. Brandmaier
Brandmaier, A.M., Oertzen, T. v., McArdle, J.J., & Lindenberger, U. (2013). Structural equation model trees. Psychological Methods, 18(1), 71-86.
Return a node matching a given node ID
getNodeById(tree, id)
tree |
A SEM Tree object. |
id |
Numeric. A Node id. |
Andreas M. Brandmaier, John J. Prindle
Brandmaier, A.M., Oertzen, T. v., McArdle, J.J., & Lindenberger, U. (2013). Structural equation model trees. Psychological Methods, 18(1), 71-86.
Counts the number of nodes in a tree.
getNumNodes(tree)
tree |
A SEM tree object. |
Andreas M. Brandmaier, John J. Prindle
Brandmaier, A.M., Oertzen, T. v., McArdle, J.J., & Lindenberger, U. (2013). Structural equation model trees. Psychological Methods, 18(1), 71-86.
Returns a list of tables with some measure of parameter differences between post-split nodes.
getParDiffForest(forest, measure = "wald", normalize = FALSE)
forest |
a semforest object. |
measure |
a character. "wald" (default) gives the squared parameter differences divided by their pooled standard errors. "test" gives the contributions of the parameters to the test statistic. "raw" gives the absolute values of the parameter differences. |
normalize |
logical value; if TRUE, the parameter differences of each split are divided by the sum of all differences of the corresponding split. Set to FALSE by default. |
A list with data.frames containing parameter differences for each
tree of the forest. The rows of the data.frames correspond to the non-leaf
nodes of the respective trees. The first column contains the name of the
predictor variables and the remaining columns contain the parameter
differences. The rows of the data.frames are named by the node IDs as given
by getNodeById and the columns are named as in coef.
Manuel Arnold
Returns a table with some measure of parameter differences between post-split nodes.
getParDiffTree(tree, measure = "wald", normalize = FALSE)
tree |
a semtree object. |
measure |
a character. "wald" (default) gives the squared parameter differences divided by their pooled standard errors. "test" gives the contributions of the parameters to the test statistic. "raw" gives the absolute values of the parameter differences. |
normalize |
logical value; if TRUE, the parameter differences of each split are divided by the sum of all differences of the corresponding split. Set to FALSE by default. |
A matrix containing parameter differences. The
matrix has one row per non-leaf node of the tree and one column per
model parameter. The rows are named by the node IDs as given by
getNodeById and the columns are named as in coef.
Manuel Arnold
Returns all leafs (=terminal nodes) of a tree.
getTerminalNodes(tree)
tree |
A semtree object. |
Andreas M. Brandmaier, John J. Prindle
Brandmaier, A.M., Oertzen, T. v., McArdle, J.J., & Lindenberger, U. (2013). Structural equation model trees. Psychological Methods, 18(1), 71-86.
Tests whether a semtree object is a leaf. Returns TRUE or FALSE.
isLeaf(tree)
tree |
A |
Andreas M. Brandmaier
Brandmaier, A.M., Oertzen, T. v., McArdle, J.J., & Lindenberger, U. (2013). Structural equation model trees. Psychological Methods, 18(1), 71-86.
Divergence measures for multivariate normal distributions as used in the diversityMatrix function.
kl(mu1, cov1, mu2, cov2)
mu1 |
Mean vector |
cov1 |
Covariance matrix |
mu2 |
Mean vector |
cov2 |
Covariance matrix |
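For orientation, the Kullback-Leibler divergence between two multivariate normal distributions has a closed form, which the following base R sketch evaluates. The functions `kl_mvn()` and `kl_sym_sketch()` are hypothetical names; in particular, defining the symmetric variant as the sum of both directions is an assumption and may differ from how the package defines klsym.

```r
# KL divergence KL(N(mu1, cov1) || N(mu2, cov2)) in closed form.
kl_mvn <- function(mu1, cov1, mu2, cov2) {
  k <- length(mu1)
  cov2_inv <- solve(cov2)
  d <- mu2 - mu1
  log_det_ratio <- as.numeric(determinant(cov2, logarithm = TRUE)$modulus) -
    as.numeric(determinant(cov1, logarithm = TRUE)$modulus)
  0.5 * (sum(diag(cov2_inv %*% cov1)) +
           as.numeric(t(d) %*% cov2_inv %*% d) - k + log_det_ratio)
}

# Symmetrised variant (assumption: sum of both directions).
kl_sym_sketch <- function(mu1, cov1, mu2, cov2) {
  kl_mvn(mu1, cov1, mu2, cov2) + kl_mvn(mu2, cov2, mu1, cov1)
}

mu_a <- c(0, 0); cov_a <- diag(2)
mu_b <- c(1, 0); cov_b <- diag(c(2, 1))
d_ab <- kl_mvn(mu_a, cov_a, mu_b, cov_b)
d_sym <- kl_sym_sketch(mu_a, cov_a, mu_b, cov_b)
```

The plain divergence is not symmetric in its arguments, which is why a symmetrised version is the more natural choice for a distance-like diversity matrix.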
This data set provides simple data to fit with a LGCM.
lgcm is a matrix containing 400 rows and 8 columns of
simulated data. Longitudinal observations are o1-o5. Covariates are
agegroup, training, and noise.
Andreas M. Brandmaier [email protected]
This overrides generic base::merge() to merge two forests into one.
## S3 method for class 'semforest'
merge(x, y, ...)
x |
A SEM Forest |
y |
A second SEM Forest |
... |
Extra arguments. Currently unused. |
Andreas M. Brandmaier, John J. Prindle
Brandmaier, A.M., Oertzen, T. v., McArdle, J.J., & Lindenberger, U. (2013). Structural equation model trees. Psychological Methods, 18(1), 71-86.
Return model estimates of the tree.
modelEstimates(tree, ...)
tree |
A semtree object. |
... |
Optional arguments. |
Andreas M. Brandmaier, John J. Prindle
Brandmaier, A.M., Oertzen, T. v., McArdle, J.J., & Lindenberger, U. (2013). Structural equation model trees. Psychological Methods, 18(1), 71-86.
Compute outlier score based on proximity matrix.
outliers(prox)
prox |
A proximity matrix. |
Andreas M. Brandmaier, John J. Prindle
Brandmaier, A.M., Oertzen, T. v., McArdle, J.J., & Lindenberger, U. (2013). Structural equation model trees. Psychological Methods, 18(1), 71-86.
Returns a table of parameters with columns corresponding to freely estimated parameters and rows corresponding to nodes in the tree.
parameters(tree, leafs.only = TRUE)
tree |
A SEMtree object obtained from |
leafs.only |
Default = TRUE. Only the terminal nodes (leafs) are
printed. If set to FALSE, all node parameters are written to the
|
The row names of the resulting data frame correspond to internal node ids
and the column names correspond to parameters in the SEM. Standard errors of
the estimates can be obtained from se.
Returns a data.frame with rows for parameters and columns for
terminal nodes.
Andreas M. Brandmaier, John J. Prindle
Brandmaier, A.M., Oertzen, T. v., McArdle, J.J., & Lindenberger, U. (2013). Structural equation model trees. Psychological Methods, 18(1), 71-86.
Compute the partial dependence of a predictor, or set of predictors, on a model parameter.
partialDependence(x, data, reference.var, support = 20, points = NULL, mc = NULL, FUN = "median", ...)
x |
An object for which a method exists |
data |
Optional |
reference.var |
Character vector, referring to the (independent) reference variable or variables for which partial dependence is calculated. Providing two (or more) variables allows for probing interactions, but note that this is computationally expensive. |
support |
Integer. Number of grid points for interpolating the
|
points |
Named list, with elements corresponding to |
mc |
Integer. If |
FUN |
Character string with function used to integrate predictions
across all elements of |
... |
Extra arguments passed to |
Caspar J. Van Lissa, Andreas M. Brandmaier
Create a dataset with fixed values for reference.var for all other
values of data, or using mc random samples from data
(Monte Carlo integration).
partialDependence_data(data, reference.var, support = 20, points = NULL, mc = NULL, keep_id = FALSE)
data |
The |
reference.var |
Character vector, referring to the (independent) reference variable or variables for which partial dependence is calculated. Providing two (or more) variables allows for probing interactions, but note that this is computationally expensive. |
support |
Integer. Number of grid points for interpolating the
|
points |
Named list, with elements corresponding to |
mc |
Integer. If |
keep_id |
Boolean. Default is false. Should output contain a row id column? marginal dependency using Monte Carlo integration. This is less computationally expensive. |
Caspar J. Van Lissa
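The idea of a partial dependence data set, namely fixing the reference variable at each grid value while keeping all other columns as observed, can be sketched in base R. The helper `pd_grid_data()` is hypothetical and covers only a single numeric reference variable on an evenly spaced grid; it does not implement the `points`, `mc`, or `keep_id` options.

```r
# Build a stacked data set in which 'reference.var' is fixed to each of
# 'support' evenly spaced grid values, for every row of the original data.
pd_grid_data <- function(data, reference.var, support = 20) {
  rng <- range(data[[reference.var]], na.rm = TRUE)
  grid <- seq(rng[1], rng[2], length.out = support)
  pieces <- lapply(grid, function(value) {
    copy <- data
    copy[[reference.var]] <- value   # overwrite the reference variable
    copy
  })
  do.call(rbind, pieces)
}

toy <- data.frame(age = c(20, 40, 60), training = c(0, 1, 0))
pd <- pd_grid_data(toy, "age", support = 5)
```

The result has `nrow(data) * support` rows, which is why drawing a Monte Carlo sample of rows instead of using all of them reduces the computational cost.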
Compute the partial dependence of a predictor, or set of predictors, on the predicted trajectory of a latent growth model.
partialDependence_growth(x, data, reference.var, support = 20, points = NULL, mc = NULL, FUN = "median", times = NULL, parameters = NULL, ...)
x |
An object for which a method exists |
data |
Optional |
reference.var |
Character vector, referring to the (independent) reference variable or variables for which partial dependence is calculated. Providing two (or more) variables allows for probing interactions, but note that this is computationally expensive. |
support |
Integer. Number of grid points for interpolating the
|
points |
Named list, with elements corresponding to |
mc |
Integer. If |
FUN |
Character string with function used to integrate predictions
across all elements of |
times |
Numeric matrix, representing the factor loadings of a latent
growth model, with columns equal to the number of growth |
parameters |
Character vector of the names of the growth parameters;
defaults to |
... |
Extra arguments passed to |
Caspar J. Van Lissa
Visualizes parameter differences between post-split nodes in a forest with boxplots.
plotParDiffForest(forest, plot = "boxplot", measure = "wald", normalize = FALSE, predictors = NULL, title = TRUE)
forest |
a semforest object. |
plot |
a character that specifies the plot type. Available plot types are "boxplot" (default) and "jitter" for a jittered strip plot with mean and standard deviation. |
measure |
a character. "wald" (default) gives the squared parameter differences divided by their pooled standard errors. "test" gives the contributions of the parameters to the test statistic. "raw" gives the absolute values of the parameter differences. |
normalize |
logical value; if TRUE, the parameter differences of each split are divided by the sum of all differences of the corresponding split. Set to FALSE by default. |
predictors |
a character. Select predictors that are to be plotted. |
title |
logical value; if TRUE a title is added to the plot. |
Manuel Arnold
Visualizes parameter differences between post-split nodes with different plot types.
plotParDiffTree(tree, plot = "ballon", measure = "wald", normalize = FALSE, title = TRUE, structure = FALSE)
tree |
a semtree object. |
plot |
a character that specifies the plot type. Available plot types are "ballon" (default), "heatmap", and "bar". |
measure |
a character. "wald" (default) gives the squared parameter differences divided by their pooled standard errors. "test" gives the contributions of the parameters to the test statistic. "raw" gives the absolute values of the parameter differences. |
normalize |
logical value; if TRUE, the parameter differences of each split are divided by the sum of all differences of the corresponding split. Set to FALSE by default. |
title |
logical value; if TRUE a title is added to the plot. |
structure |
logical value; if TRUE the structure of the tree is plotted on the right side. |
Manuel Arnold
Plots the structure of a semtree object. This function is
similar to plot.semtree, but it does not print the parameter values in
the leaf nodes and labels the leaf nodes instead.
plotTreeStructure(tree, type = 2, no.plot = FALSE, ...)
tree |
a semtree object. |
type |
Type of plot. See |
no.plot |
logical value; if TRUE structure of the tree is printed to the console. |
... |
additional arguments passed to |
Manuel Arnold
Predict method for semtree and semforest
## S3 method for class 'semforest'
predict(object, data, type = "node_id", ...)
object |
Object of class |
data |
New test data of class |
type |
Type of prediction. One of c('node_id'). See Details. |
... |
further arguments passed to or from other methods. |
Object of class matrix.
Caspar J. van Lissa, Andreas Brandmaier
Compute an n by n matrix across all trees in a forest, where n is the number of rows in the data, reflecting the proportion of times two cases ended up in the same terminal node of a tree.
proximity(x, data, ...)
x |
An object for which a method exists. |
data |
A data.frame on which proximity is computed |
... |
Parameters passed to other functions. |
SEM Forest Case Proximity
A matrix with dimensions [i, j] whose elements reflect the proportion of times case i and j were in the same terminal node of a tree.
Caspar J. Van Lissa, Andreas M. Brandmaier, John J. Prindle
Brandmaier, A.M., Oertzen, T. v., McArdle, J.J., & Lindenberger, U. (2013). Structural equation model trees. Psychological Methods, 18(1), 71-86.
nodeids <- structure(c(9, 3, 5, 7, 10, 4, 6, 8, 9, 3, 5, 7, 10, 4, 6, 8), .Dim = c(4L, 4L))
class(nodeids) <- "semforest_node_id"
sims <- proximity(nodeids)
dd <- as.dist(1 - sims)
hc <- hclust(dd)
groups <- cutree(hc, 2)
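The proximity definition given above (the proportion of trees in which two cases share a terminal node) can also be written out directly in base R. This is a sketch under the assumption that node ids are stored with cases in rows and trees in columns; `proximity_sketch()` is a hypothetical helper, not the package's method.

```r
# Hypothetical node-id matrix: rows are cases, columns are trees.
node_ids <- cbind(
  tree1 = c(2, 2, 3, 3),
  tree2 = c(4, 5, 5, 5),
  tree3 = c(2, 2, 2, 3)
)

proximity_sketch <- function(node_ids) {
  n <- nrow(node_ids)
  prox <- matrix(0, n, n)
  for (t in seq_len(ncol(node_ids))) {
    # outer() marks every pair of cases that share a terminal node in tree t
    prox <- prox + outer(node_ids[, t], node_ids[, t], "==")
  }
  prox / ncol(node_ids)   # proportion of trees with a shared terminal node
}

prox <- proximity_sketch(node_ids)
```

Because the result is a symmetric similarity matrix with ones on the diagonal, `1 - prox` can be passed to `as.dist()` for clustering.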
Returns a new tree with a maximum depth selected by the user. Can be used in conjunction with plot commands to view various pruning levels.
prune(object, ...)
object |
A |
... |
Optional parameters, such as |
The returned tree is only modified by the number of levels for the tree.
This function does not reevaluate the data, but provides alternatives to
reduce tree complexity. If the user would like to alter the tree by
increasing depth, then max.depth option must be adjusted in the
semtree.control object (provided further splits are able to be
computed).
Returns a semtree object.
Andreas M. Brandmaier, John J. Prindle
Brandmaier, A.M., Oertzen, T. v., McArdle, J.J., & Lindenberger, U. (2013). Structural equation model trees. Psychological Methods, 18(1), 71-86.
## Not run:
# prunes a tree to a maximum depth of five
prune(tree, max.depth = 5)
# prunes a tree by removing all nodes that have at least one
# non-converged daughter node
prune(tree, converged = TRUE)
## End(Not run)
Returns a table of standard errors with columns corresponding to freely estimated standard errors and rows corresponding to nodes in the tree.
se(tree, leafs.only = TRUE)
tree |
A SEMtree object obtained from |
leafs.only |
Default = TRUE. Only the terminal nodes (leafs) are
printed. If set to FALSE, all node standard errors are written to the
|
The row names of the resulting data frame correspond to internal node ids
and the column names correspond to standard errors in the SEM. Parameter
estimates can be obtained from parameters.
Returns a data.frame with rows for parameters and columns for
terminal nodes.
Andreas M. Brandmaier, John J. Prindle
Brandmaier, A.M., Oertzen, T. v., McArdle, J.J., & Lindenberger, U. (2013). Structural equation model trees. Psychological Methods, 18(1), 71-86.
semtree, semtree.control,
parameters
Grows a SEM Forest from a template model and a dataset. This may take some time.
semforest(model, data, control = NULL, predictors = NULL, constraints = NULL, ...)
model |
A template SEM. Same as in |
data |
A dataframe to create a forest from. Same as in |
control |
A semforest control object to set forest parameters. |
predictors |
An optional list of covariates. See semtree code example. |
constraints |
An optional list of covariates. See semtree code example. |
... |
Optional parameters. |
A semforest object.
Andreas M. Brandmaier, John J. Prindle
Brandmaier, A.M., Prindle, J. J., McArdle, J. J., & Lindenberger, U. (2016). Theory-guided exploration with structural equation model forests. Psychological Methods, 21(4), 566–582.
Brandmaier, A.M., Oertzen, T. v., McArdle, J.J., & Lindenberger, U. (2013). Structural equation model trees. Psychological Methods, 18(1), 71–86.
A SEM Forest control object to tune parameters of the forest learning algorithm.
semforest.control(num.trees = 5, sampling = "subsample", control = NA, mtry = 2, remove_dead_trees = TRUE, logfile = FALSE)
num.trees |
Number of trees. |
sampling |
Sampling procedure. Can be subsample or bootstrap. |
control |
A SEM Tree control object. Will be generated by default. |
mtry |
Number of subsampled covariates at each node. |
remove_dead_trees |
Remove trees from forest that had runtime errors |
logfile |
Boolean/Character. If FALSE, no log file is written. Otherwise a logfile is written to the path given in this argument. |
Andreas M. Brandmaier, John J. Prindle
Brandmaier, A.M., Oertzen, T. v., McArdle, J.J., & Lindenberger, U. (2013). Structural equation model trees. Psychological Methods, 18(1), 71-86.
Structural equation model (SEM) trees combine SEM with decision trees (a paradigm also known as recursive partitioning).
semtree(model, data = NULL, control = NULL, constraints = NULL, predictors = NULL, ...)
model |
A template model specification from OpenMx using
the |
data |
A |
control |
A |
constraints |
A |
predictors |
A vector of variable names matching variable names in the data set. If NULL (default) all variables that are in data set and not part of the model are potential predictors. |
... |
Optional arguments. |
Core idea: Instead of assuming that one SEM fits all individuals equally well, SEM Trees recursively split the sample into subgroups based on covariates (e.g., age, gender, SES) such that model parameters differ between subgroups. This results in a tree where each node contains an SEM, revealing heterogeneity in model structure or parameters across groups.
The package supports model specification in lavaan and OpenMx.
Calling semtree with an mxModel or
lavaan model fits the template model to the entire data set and
then recurses over the following steps until no further meaningful partition
into subgroups is found:
1. Fit the model on the current node's data and compute model fit.
2. For each predictor, find the split point that best improves model fit. This is done either by likelihood ratio tests or by score-based tests, as determined in the semtree.control object.
3. Select the best-performing predictor/split combination unless a stopping rule applies, e.g., the split is not significant given a significance criterion (alpha), there are too few observations in a node (min.N, min.bucket), a maximum depth is reached (max.depth), or a custom stopping rule applies.
4. Continue the procedure independently on each resulting subgroup.
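The recursion described in these steps can be sketched in base R with a deliberately simple stand-in for the SEM: a univariate normal model with a free mean and variance. Everything below is an illustrative assumption (the functions `m2ll()`, `best_split()`, and `grow()`, the simulated data, a continuous outcome, and numeric predictors only); it is not how semtree is implemented.

```r
# -2 log-likelihood of a univariate normal model at its ML estimates.
m2ll <- function(y) {
  n <- length(y)
  s2 <- mean((y - mean(y))^2)          # ML variance estimate
  n * (log(2 * pi * s2) + 1)
}

# Exhaustive search for the cut point with the largest likelihood ratio.
best_split <- function(y, covs, min.bucket) {
  best <- list(lr = -Inf)
  for (name in names(covs)) {
    values <- sort(unique(covs[[name]]))
    cuts <- head(values, -1) + diff(values) / 2   # midpoints between values
    for (cutpoint in cuts) {
      left <- covs[[name]] < cutpoint
      if (sum(left) < min.bucket || sum(!left) < min.bucket) next
      lr <- m2ll(y) - (m2ll(y[left]) + m2ll(y[!left]))
      if (lr > best$lr) best <- list(lr = lr, name = name, cut = cutpoint)
    }
  }
  best
}

# Recursive growing with the stopping rules named in the text.
grow <- function(y, covs, alpha = 0.05, min.bucket = 10, max.depth = 3, depth = 0) {
  node <- list(n = length(y), mean = mean(y))
  if (depth >= max.depth) return(node)               # maximum depth reached
  split <- best_split(y, covs, min.bucket)
  if (!is.finite(split$lr)) return(node)             # no admissible split
  p <- pchisq(split$lr, df = 2, lower.tail = FALSE)  # two free parameters
  if (p >= alpha) return(node)                       # split not significant
  left <- covs[[split$name]] < split$cut
  node$split <- split
  node$left_child <- grow(y[left], covs[left, , drop = FALSE],
                          alpha, min.bucket, max.depth, depth + 1)
  node$right_child <- grow(y[!left], covs[!left, , drop = FALSE],
                           alpha, min.bucket, max.depth, depth + 1)
  node
}

set.seed(42)
covs <- data.frame(age = runif(200, 20, 80), noise = rnorm(200))
y <- rnorm(200, mean = ifelse(covs$age < 50, 0, 3))
tree <- grow(y, covs)
```

Note that the p value in this sketch is not corrected for the number of cut points tried, so it is optimistic.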
Predictors can be categorical (ordered or unordered) or continuous. When using unordered categorical predictors with many levels, the number of candidate partitions grows quickly, so limiting the predictor set can reduce computation and the number of multiple comparisons.
Splitting quality can be evaluated with three built-in strategies:
1. "naive" selection compares all possible split values across all predictors and chooses the best overall improvement.
2. "fair" selection uses a two-step procedure at each node: a first phase on half the sample identifies the best split value per predictor, and a second phase on the remaining data picks the most promising predictor among those candidates.
3. "score" relies on score-based statistics that provide faster evaluations while retaining favorable statistical properties for detecting parameter instabilities.
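The two-phase logic of "fair" selection can be illustrated with the same kind of toy model, a univariate normal with a free mean and variance in place of a SEM. The sketch below is an assumption-laden illustration (hypothetical helpers `m2ll()`, `lr_stat()`, `best_cut()`, and `fair_split()`, numeric predictors, a random half split) and not the package's implementation.

```r
# -2 log-likelihood of a univariate normal model at its ML estimates.
m2ll <- function(y) {
  n <- length(y)
  n * (log(2 * pi * mean((y - mean(y))^2)) + 1)
}

# Likelihood ratio statistic of splitting y into y[left] and y[!left].
lr_stat <- function(y, left) m2ll(y) - (m2ll(y[left]) + m2ll(y[!left]))

# Best cut point for a single predictor.
best_cut <- function(y, x, min.bucket = 10) {
  values <- sort(unique(x))
  cuts <- head(values, -1) + diff(values) / 2
  stats <- vapply(cuts, function(cutpoint) {
    left <- x < cutpoint
    if (sum(left) < min.bucket || sum(!left) < min.bucket) return(-Inf)
    lr_stat(y, left)
  }, numeric(1))
  cuts[which.max(stats)]
}

fair_split <- function(y, covs, min.bucket = 10) {
  half <- sample(length(y)) <= length(y) / 2   # random half used in phase 1
  # Phase 1: best cut point per predictor, found on the first half only.
  cuts <- vapply(covs, function(x) best_cut(y[half], x[half], min.bucket), numeric(1))
  # Phase 2: compare predictors on the held-out half using those cut points.
  stats <- vapply(names(covs), function(name) {
    lr_stat(y[!half], covs[[name]][!half] < cuts[[name]])
  }, numeric(1))
  winner <- names(which.max(stats))
  list(name = winner, cut = cuts[[winner]], lr = stats[[winner]])
}

set.seed(7)
covs <- data.frame(age = runif(200, 20, 80), noise = rnorm(200))
y <- rnorm(200, mean = ifelse(covs$age < 50, 0, 3))
res <- fair_split(y, covs)
```

Because each predictor enters phase 2 with exactly one candidate cut point, predictors with many possible split values gain no advantage from having been searched more extensively.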
All other parameters controlling the tree growing process are adjusted
in the semtree.control object.
To obtain robust estimates of the importance of predictors,
consider growing a semforest.
A semtree object. This can be further examined with
summary, plot, and print.
Andreas M. Brandmaier, John J. Prindle, Manuel Arnold
Brandmaier, A.M., Oertzen, T. v., McArdle, J.J., & Lindenberger, U. (2013). Structural equation model trees. Psychological Methods, 18(1), 71-86.
Arnold, M., Voelkle, M. C., & Brandmaier, A. M. (2021). Score-guided structural equation model trees. Frontiers in Psychology, 11, Article 564403. https://doi.org/10.3389/fpsyg.2020.564403
semtree.control, summary.semtree,
parameters, se, prune.semtree,
subtree, OpenMx,
lavaan
A semtree_control object contains parameters that determine the tree
growing process. These parameters include the choice of split candidate
selection procedure and its hyperparameters. Calling the constructor
without parameters creates a default control object. Several tree growing
methods are included with this package:
1. "naive" splitting takes the best split value of all possible splits on each covariate.
2. "fair" selection is so called because it tests all splits on half of the data, then tests the best split value for each covariate on the other half of the data. The equal footing of each covariate in this two-phase test removes bias from testing variables with many possible splits compared to those with few.
3. "fair3" performs the phases described above, with an additional step of retesting all of the split values on the best covariate found in the second phase. Variations in the sample from subsetting are removed and bias in split selection is further reduced.
4. "score" implements modern score-based statistics.
semtree_control( method = c("naive", "score", "fair", "fair3"), min.N = NULL, max.depth = NA, alpha = 0.05, alpha.invariance = NA, exclude.heywood = TRUE, progress.bar = TRUE, verbose = FALSE, bonferroni = FALSE, use.all = FALSE, seed = NA, custom.stopping.rule = NA, mtry = NA, report.level = 0, exclude.code = NA, linear = TRUE, min.bucket = NULL, missing = "ignore", use.maxlr = FALSE, strucchange.from = 0.15, strucchange.to = NULL, strucchange.nrep = 50000, refit = TRUE, ctsem_sd = FALSE, loglik = c("default", "model", "mvn"), check.convergence = TRUE, chunk.random.samples = 0 )
method |
Default: 'naive'. One of
|
min.N |
Integer. Default: 'NULL' heuristically selects this number based on the number of parameters in the model. Minimum sample size per node used to determine whether splitting can continue. It is recommended to set 'min.N' explicitly. |
max.depth |
Integer. Default: NA. Maximum number of levels per branch. Parameter for limiting tree growth. |
alpha |
Numeric. Default: 0.05. Significance level for splitting at a given node. |
exclude.heywood |
Default: TRUE. Reports whether there is an identification problem in the covariance structure of an SEM tested. |
progress.bar |
Boolean. Default: TRUE. Option to enable or disable the progress bar for tree growth. |
verbose |
Boolean. Default: FALSE. Option to turn on or off all model messages during tree growth. |
bonferroni |
Boolean. Default: FALSE. Correct for multiple tests with Bonferroni type correction. p-values are adjusted for the number of variables tested. |
use.all |
Boolean. Treatment of missing variables. By default, missing values stay in a decision node. If TRUE, cases are distributed according to a maximum likelihood principle to the child nodes. |
seed |
Default: 'NA'. Set a random-number seed to make randomized parts of tree analysis reproducible (for example, in fair splitting or subsampling procedures). |
custom.stopping.rule |
Default: NA. Otherwise, this can be a boolean function with a custom stopping rule for tree growing. |
mtry |
Default: NA. Number of sample columns to use in SEMforest analysis. |
report.level |
Integer. Default: 0. Values up to 99 increase console reporting detail during tree growth and can help diagnose fitting or split-selection issues. |
exclude.code |
Default: NA. NPSOL error code for exclusion from model fit evaluations when finding best split. Default: Models with errors during fitting are retained. |
linear |
If TRUE (default), the structural equation model is assumed to not contain any nonlinear parameter constraints and scores are computed analytically, resulting in a shorter runtime. Only relevant for models fitted with OpenMx. |
min.bucket |
Integer. Minimum bucket size. This is the minimum size any node must have, such that a given split is considered valid. Minimum bucket size is a lower bound to the sample size in the terminal nodes of a tree. |
missing |
Missing value treatment. Default is "ignore". |
use.maxlr |
Boolean. Use MaxLR statistic for split point selection (as proposed by Arnold et al., 2021). This corrects the bias in the LR statistics incurred by testing multiple split points within one variable. |
strucchange.from |
Strucchange argument. See their package documentation. |
strucchange.to |
Strucchange argument. See their package documentation. |
strucchange.nrep |
Strucchange argument. See their package documentation. |
refit |
If TRUE (default) the initial model is fitted on the data
provided to |
ctsem_sd |
If FALSE (default) no standard errors of CT model parameters are computed. Requesting standard errors increases runtime. |
loglik |
Character. Algorithm to compute the log-likelihood. '"default"' depends on the chosen SEM package: '"mvn"' for lavaan and '"model"' for all other packages. '"model"' refers to model-based computation and is more general. '"mvn"' computes the likelihood from the multivariate normal density using the model-implied mean vector and covariance matrix. |
check.convergence |
Boolean. Default: TRUE. Whether convergence is checked when growing a tree. |
chunk.random.samples |
Integer. Controls split-point subsampling for 'method = "naive"'. '0' (default) evaluates all eligible split points. Values '> 0' evaluate random chunks of split points, which can speed up tree growth on very large datasets at the cost of a less exhaustive search. |
alpha.invariance |
Numeric. Default: NA. Significance level for invariance tests. If NA, the value of alpha is used. |
A control object containing a list of the above parameters.
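The arguments above can be combined in a single constructor call. The following is a minimal sketch with arbitrarily chosen illustrative values (they are not recommendations from the package authors); it assumes, as in the package's own example, that the returned control object can be inspected with the `$` operator.

```r
library(semtree)

# Illustrative settings only: a 'fair' tree with a stricter alpha level,
# Bonferroni correction, a depth limit, and a fixed seed for reproducibility
ctrl <- semtree_control(
  method = "fair",
  alpha = 0.01,
  bonferroni = TRUE,
  max.depth = 3,
  min.N = 50,
  seed = 1234
)

# Inspect individual settings stored in the control object
ctrl$alpha
ctrl$max.depth
ctrl$bonferroni
```

Setting min.N explicitly follows the recommendation given in the argument description; the value 50 is only a placeholder.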
Andreas M. Brandmaier, John J. Prindle, Manuel Arnold
Brandmaier, A.M., Oertzen, T. v., McArdle, J.J., & Lindenberger, U. (2013). Structural equation model trees. Psychological Methods, 18(1), 71-86.
Arnold, M., Voelkle, M. C., & Brandmaier, A. M. (2021). Score-guided structural equation model trees. Frontiers in Psychology, 11, Article 564403. https://doi.org/10.3389/fpsyg.2020.564403
    # create a control object with an alpha level of 1%
    my_control <- semtree_control(alpha=0.01)
    # set the minimum number of cases per node to ten
    my_control$min.N <- 10
    # print contents of the control object
    print(my_control)
A SEM Tree constraints object holds information on how the tree is grown (similar to the control object). The SEM tree control object holds all information that is independent of a specific model, whereas the constraints object holds information that is specific to a certain model (e.g., it specifies differential treatment of certain parameters, such as holding them constant across the forest).
semtree.constraints(local.invariance = NULL, focus.parameters = NULL)
local.invariance |
Vector of parameter names that are locally equal, that is, they are assumed to be equal when assessing a local split but allowed to differ subsequently. |
focus.parameters |
Vector of parameter names that exclusively are evaluated for between-group differences when assessing split candidates. If NULL all parameters add to the difference. |
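A minimal sketch of building a constraints object follows. The parameter names "mu_slope" and "var_resid" are hypothetical and would have to match parameter labels in the user's own model; fitted_model and my_data in the commented-out call are likewise placeholders.

```r
library(semtree)

# Hypothetical parameter labels; replace with names from your own SEM
cnst <- semtree.constraints(
  local.invariance = c("var_resid"),  # treated as equal when assessing a split
  focus.parameters = c("mu_slope")    # only this parameter drives split evaluation
)

# Hypothetical usage; fitted_model and my_data are assumed to exist:
# tree <- semtree(model = fitted_model, data = my_data, constraints = cnst)
```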
Andreas M. Brandmaier, John J. Prindle
Brandmaier, A.M., Oertzen, T. v., McArdle, J.J., & Lindenberger, U. (2013). Structural equation model trees. Psychological Methods, 18(1), 71-86.
Removes all elements of a semforest or semtree
except for the tree structure and terminal node parameters. This is to
reduce the heavy memory footprint of sem trees and forests.
strip(x, parameters = NULL)
x |
An object for which a method exists. |
parameters |
Character vector, referencing parameters in the SEM model.
Defaults to |
Objects of class semforest and semtree are very
large, which complicates downstream operations such as making partial
dependence plots, or using the model in interactive contexts (like Shiny
apps). Running strip removes all elements of the model
except for the tree structure and terminal node parameters. Note that some
methods are no longer available for the resulting object - e.g.,
varimp requires the terminal node SEM models to compute the
likelihood ratio.
List
## Not run: if(interactive()){ #EXAMPLE1 } ## End(Not run)
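One way to see the effect of strip is to compare object sizes before and after. The helper below is a hypothetical sketch (compare_size is not part of the package) and expects an already grown semtree or semforest object as input.

```r
library(semtree)

# Hypothetical helper: report memory use of a tree or forest before and
# after removing everything except structure and terminal node parameters
compare_size <- function(x) {
  light <- strip(x)
  c(
    full_bytes = as.numeric(object.size(x)),
    stripped_bytes = as.numeric(object.size(light))
  )
}

# Hypothetical usage, assuming `tree` was grown with semtree():
# compare_size(tree)
```

As noted above, the stripped object no longer supports methods such as varimp, so it may be worth keeping the full object on disk.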
Creates subsets of a forest. This can be used to select a range of trees (from:(from+num)), to remove all null trees that resulted from errors (type="nonnull"), or to randomly select a subforest (type="random").
subforest(forest, num = NULL, type = "nonnull", from = 1)
forest |
A SEM Forest object. |
num |
Number of trees to select. |
type |
Either 'random', 'nonnull', or NULL. 'random' selects a random subset, 'nonnull' selects all non-null trees, and NULL allows subsetting trees by index. |
from |
Starting index if type=NULL. |
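The three selection modes can be sketched as follows. The wrapper function subset_examples is hypothetical and only groups the calls; it expects a grown SEM forest as input, and the numbers of trees are arbitrary.

```r
library(semtree)

# Hypothetical wrapper illustrating the three values of `type`
subset_examples <- function(forest) {
  list(
    # keep only trees that were grown without errors
    valid = subforest(forest, type = "nonnull"),
    # draw a random subforest of five trees
    random = subforest(forest, num = 5, type = "random"),
    # select a range of trees by index, starting at the first tree
    range = subforest(forest, num = 10, type = NULL, from = 1)
  )
}

# Hypothetical usage, assuming `forest` was grown with semforest():
# parts <- subset_examples(forest)
```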
Andreas M. Brandmaier, John J. Prindle
Brandmaier, A.M., Oertzen, T. v., McArdle, J.J., & Lindenberger, U. (2013). Structural equation model trees. Psychological Methods, 18(1), 71-86.
The subtree function returns the subtree rooted at a selected node of a
tree returned by semtree.
subtree(tree, startNode = NULL, level = 0, foundNode = FALSE)
tree |
A SEMtree object obtained from |
startNode |
Node id, which will be future root node (0 to max node
number of |
level |
Ignore. Only used internally. |
foundNode |
Ignore. Only used internally. |
The row names of the resulting data frame correspond to internal node ids
and the column names correspond to standard errors in the SEM. Standard
errors of the estimates can be obtained from se.
Returns a semtree object which is a partitioned tree
from the input semtree.
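A minimal sketch of extracting a branch: the helper branch_at is hypothetical, and the node id passed to it has to be an id that actually exists in the given tree. The internal arguments level and foundNode are left at their defaults, as advised above.

```r
library(semtree)

# Hypothetical helper: return the part of `tree` rooted at node `node_id`
branch_at <- function(tree, node_id) {
  subtree(tree, startNode = node_id)
}

# Hypothetical usage, assuming `tree` was grown with semtree() and
# contains a node with id 2:
# branch <- branch_at(tree, 2)
# plot(branch)
```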
Andreas M. Brandmaier, John J. Prindle
Brandmaier, A.M., Oertzen, T. v., McArdle, J.J., & Lindenberger, U. (2013). Structural equation model trees. Psychological Methods, 18(1), 71-86.
This function removes all "empty" trees, that is, those that contain only the root node and no splits.
thinOut(x)
x |
A SEM forest |
Converts a tree into a tabular representation. This may be useful as a textual representation for use in manuscripts.
toTable(tree, added.param.cols = NULL, round.param = 3)
tree |
A SEM Tree object. |
added.param.cols |
String. Add extra columns with parameter estimates. Pass a vector with the names of the parameters that should be rendered in the table. |
round.param |
Integer. Number of digits to round parameter estimates. Default is rounding to three digits after the decimal point. |
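A sketch of producing a table for a manuscript is given below. The helper tree_table is hypothetical, and the parameter name "mu" in the usage comment is a placeholder for a parameter label from the user's own model.

```r
library(semtree)

# Hypothetical helper: tabulate a tree and append selected parameter
# estimates, rounded to two digits
tree_table <- function(tree, params) {
  toTable(tree, added.param.cols = params, round.param = 2)
}

# Hypothetical usage, assuming `tree` was grown with semtree() on a model
# that has a parameter labelled "mu":
# tab <- tree_table(tree, params = c("mu"))
# print(tab)
```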
Andreas M. Brandmaier
Brandmaier, A. M., Ram, N., Wagner, G. G., & Gerstorf, D. (in press). Terminal decline in well-being: The role of multi-indicator constellations of physical health and psychosocial correlates. Developmental Psychology.
A function to calculate relative variable importance for selecting node
splits over a semforest object.
varimp( forest, var.names = NULL, verbose = FALSE, eval.fun = evaluateTree, method = NULL, conditional = FALSE, strict = TRUE, ... )
forest |
A |
var.names |
Covariates used in the forest creation process. NULL value will be automatically filled in by the function. |
verbose |
Boolean to print messages while function is running. |
eval.fun |
Default is |
method |
Character. Method used to compute importance. The default is NULL, which picks the appropriate permutation-based estimation method: "permutation" if no focus parameters are given, or "permutationFocus" if focus parameters are given. |
conditional |
Conditional variable importance if TRUE, otherwise marginal variable importance. |
strict |
Boolean. Default is TRUE. Only consider estimates from models if there were no model convergence problems. Otherwise, partial results are used, which may incur some downward bias. |
... |
Optional arguments. |
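A sketch of a typical workflow, combining varimp with the aggregateVarimp helper documented earlier in this manual. The wrapper importance_summary is hypothetical, and it expects a grown SEM forest as input; choosing the median as the aggregate is an illustrative choice, not a package recommendation.

```r
library(semtree)

# Hypothetical wrapper: compute marginal permutation importance and
# aggregate the per-tree estimates over the forest
importance_summary <- function(forest) {
  vim <- varimp(forest, conditional = FALSE, strict = TRUE)
  aggregateVarimp(
    vim,
    aggregate = "median",
    scale = "absolute",
    omit.na = TRUE
  )
}

# Hypothetical usage, assuming `forest` was grown with semforest():
# importance_summary(forest)
```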
Andreas M. Brandmaier, John J. Prindle
Brandmaier, A.M., Oertzen, T. v., McArdle, J.J., & Lindenberger, U. (2013). Structural equation model trees. Psychological Methods, 18(1), 71-86.