1. Introduction
Confirmatory factor analysis (CFA) and structural equation models (SEM) are amongst the most important statistical approaches for analyzing multivariate data in the social sciences [1,2,3,4,5,6,7]. These models relate a multivariate vector $\boldsymbol{X}$ of $I$ observed variables (also referred to as items) to a vector $\boldsymbol{\eta}$ of latent variables (i.e., factors) of lower dimension using a linear model. SEMs represent the mean vector $\boldsymbol{\mu} = \mathrm{E}(\boldsymbol{X})$ and the covariance matrix $\boldsymbol{\Sigma} = \mathrm{Var}(\boldsymbol{X})$ of the random variable $\boldsymbol{X}$ as a function of an unknown parameter vector $\boldsymbol{\theta}$. That is, the mean vector is represented as $\boldsymbol{\mu}(\boldsymbol{\theta})$, and the covariance matrix is given by $\boldsymbol{\Sigma}(\boldsymbol{\theta})$.
SEM and CFA, as particular cases, impose a measurement model that relates the observed variables $\boldsymbol{X}$ to the latent variables $\boldsymbol{\eta}$:
$$\boldsymbol{X} = \boldsymbol{\nu} + \boldsymbol{\Lambda}\boldsymbol{\eta} + \boldsymbol{\varepsilon},$$
where $\boldsymbol{\nu}$ denotes the vector of item intercepts, $\boldsymbol{\Lambda}$ the matrix of factor loadings, and $\boldsymbol{\varepsilon}$ the vector of residuals. In addition, we denote the covariance matrix of $\boldsymbol{\varepsilon}$ by $\boldsymbol{\Psi}$, and $\boldsymbol{\eta}$ as well as $\boldsymbol{\varepsilon}$ are multivariate normally distributed random vectors. Note that $\boldsymbol{\eta}$ and $\boldsymbol{\varepsilon}$ are assumed to be uncorrelated. In CFA, the multivariate normal (MVN) distribution is represented as $\boldsymbol{\eta} \sim \mathrm{MVN}(\boldsymbol{\alpha}, \boldsymbol{\Phi})$ and $\boldsymbol{\varepsilon} \sim \mathrm{MVN}(\boldsymbol{0}, \boldsymbol{\Psi})$. Hence, one can represent the mean and the covariance matrix in CFA as
$$\boldsymbol{\mu}(\boldsymbol{\theta}) = \boldsymbol{\nu} + \boldsymbol{\Lambda}\boldsymbol{\alpha} \quad \text{and} \quad \boldsymbol{\Sigma}(\boldsymbol{\theta}) = \boldsymbol{\Lambda}\boldsymbol{\Phi}\boldsymbol{\Lambda}^\top + \boldsymbol{\Psi}.$$
In SEM, more structured relationships among the latent variables can be imposed. A matrix $\boldsymbol{B}$ of regression coefficients is specified such that
$$\boldsymbol{\eta} = \boldsymbol{\alpha} + \boldsymbol{B}\boldsymbol{\eta} + \boldsymbol{\zeta} \quad \text{with} \quad \boldsymbol{\zeta} \sim \mathrm{MVN}(\boldsymbol{0}, \boldsymbol{\Phi}).$$
Hence, the mean vector and the covariance matrix are represented in SEM as
$$\boldsymbol{\mu}(\boldsymbol{\theta}) = \boldsymbol{\nu} + \boldsymbol{\Lambda}(\boldsymbol{I} - \boldsymbol{B})^{-1}\boldsymbol{\alpha} \quad \text{and} \quad \boldsymbol{\Sigma}(\boldsymbol{\theta}) = \boldsymbol{\Lambda}(\boldsymbol{I} - \boldsymbol{B})^{-1}\boldsymbol{\Phi}\big((\boldsymbol{I} - \boldsymbol{B})^{-1}\big)^\top\boldsymbol{\Lambda}^\top + \boldsymbol{\Psi},$$
where $\boldsymbol{I}$ denotes the identity matrix.
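As an illustrative numeric sketch (ours, not from the article; all parameter values are arbitrary), the model-implied moments can be computed directly from the matrices defined above:

```python
import numpy as np

# Arbitrary illustrative values (not taken from the article)
Lam = np.array([[1.0, 0.0],
                [0.8, 0.0],
                [0.0, 1.0],
                [0.0, 0.7]])          # factor loadings Lambda (4 items, 2 factors)
nu = np.zeros(4)                       # item intercepts nu
alpha = np.array([0.2, 0.5])           # latent means alpha
Phi = np.array([[1.0, 0.3],
                [0.3, 1.0]])           # latent covariance matrix Phi
Psi = np.diag([0.5, 0.5, 0.5, 0.5])    # diagonal residual covariance matrix Psi

# CFA: mu(theta) = nu + Lam alpha,  Sigma(theta) = Lam Phi Lam' + Psi
mu_cfa = nu + Lam @ alpha
Sigma_cfa = Lam @ Phi @ Lam.T + Psi

# SEM with a regression matrix B among the latent variables;
# here Phi plays the role of the disturbance covariance matrix
B = np.array([[0.0, 0.0],
              [0.4, 0.0]])             # second factor regressed on the first
Inv = np.linalg.inv(np.eye(2) - B)     # (I - B)^{-1}
mu_sem = nu + Lam @ Inv @ alpha
Sigma_sem = Lam @ Inv @ Phi @ Inv.T @ Lam.T + Psi
```

With $\boldsymbol{B} = \boldsymbol{0}$, the SEM expressions reduce to the CFA expressions.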
In practice, SEM parsimoniously parametrizes the mean vector and the covariance matrix using a parameter $\boldsymbol{\theta}$ as a summary. As is true with all statistical models, these restrictions are unlikely to hold in practice, and model assumptions in SEM are only an approximation of a true data-generating model. In SEM, model deviations (i.e., model errors) in covariances emerge as a difference between a population covariance matrix $\boldsymbol{\Sigma}$ and a model-implied covariance matrix $\boldsymbol{\Sigma}(\boldsymbol{\theta})$ (see [8,9]). Similarly, there can be model errors in the mean vector, leading to a difference between the population mean vector $\boldsymbol{\mu}$ and the model-implied mean vector $\boldsymbol{\mu}(\boldsymbol{\theta})$. As a consequence, the SEM is misspecified at the population level. Note that the model errors are defined at the population level (i.e., in infinite sample sizes). With finite samples in real-data applications, the empirical covariance matrix estimates the population covariance matrix $\boldsymbol{\Sigma}$, and the empirical mean vector estimates the population mean vector $\boldsymbol{\mu}$. In this article, we investigate estimators that possess some kind of resistance against model deviations. That is, the presence of some amount of model error does not impact the parameter estimate $\hat{\boldsymbol{\theta}}$. This kind of robustness is referred to as model robustness and follows the principles of robust statistics [10,11,12]. While in classical robust statistics, observations (i.e., cases or subjects) that do not follow an imposed statistical model should be treated as outliers, model errors in SEM occur as residuals in the modeled mean vector and the modeled covariance matrix. That is, an estimator in an SEM should automatically treat large deviations in $\boldsymbol{\mu} - \boldsymbol{\mu}(\boldsymbol{\theta})$ and $\boldsymbol{\Sigma} - \boldsymbol{\Sigma}(\boldsymbol{\theta})$ as outliers that should not (substantially) impact the estimated parameter $\hat{\boldsymbol{\theta}}$. We compare regularized maximum likelihood estimation with robust moment estimation approaches. Robust moment estimation has been scarcely used in SEM. We propose an alternative and more flexible loss function for robust moment estimation and demonstrate that it results in similar statistical performance compared to the computationally much more tedious regularized maximum likelihood estimation.
In this article, we discuss model-robust estimation of SEM when modeling means and covariance matrices in the more general case of multiple groups [4,13]. Model errors in the modeled covariance matrix can emerge due to incorrectly specified loadings in the matrix $\boldsymbol{\Lambda}$ or unmodeled residual error correlations in the matrix $\boldsymbol{\Psi}$. In the following, we only consider the case of unmodeled residual error correlations. Moreover, model errors in the modeled mean vectors are mainly due to incorrectly specified item intercepts in $\boldsymbol{\nu}$. In multiple-group SEM, this case is often referred to as a violation of measurement invariance [14,15,16]. The investigation of model-robust SEM estimators under such a measurement noninvariance situation is also the focus of this article. For discrete items, measurement noninvariance is also referred to as differential item functioning (DIF; [17,18]), and several approaches for model-robust estimation have been proposed (e.g., [19,20,21]).
The remainder of the article is organized as follows: Different model-robust estimation methods are discussed in Section 2. Section 3 presents results from a simulation study with unmodeled residual error correlations. In Section 4, we focus on standard error estimation methods in a selected number of conditions of the simulation study presented in Section 3. Section 5 presents results from a simulation study with unmodeled item intercepts, which indicate the presence of violations of measurement invariance (i.e., the presence of DIF). In Section 6, results from an empirical example involving data from the European Social Survey are offered. Finally, the article closes with a discussion in Section 7.
3. Simulation Study 1: Unmodeled Residual Error Correlation
In this Simulation Study 1, we compare the estimation approaches RME and RegML in a misspecified single-group two-dimensional factor model. It is of interest which of the estimation methods is resistant to model misspecification. It can be expected that RegML and RME with small powers $p$ will provide some resistance to model errors.
3.1. Method
The data-generating model is a two-dimensional factor model involving six manifest variables $X_1, \ldots, X_6$ and two latent (factor) variables $\eta_1$ and $\eta_2$. The data-generating model is graphically presented in Figure 1. The first three items load on the first factor, while the last three items load on the second factor. Model errors were introduced via two unmodeled residual correlations between pairs of items corresponding to the two factors (see Figure 1).
The means of all variables were set to be equal in the population. However, the mean structure was left saturated in estimation because it is not of interest in this simulation study. The specified factor loadings were all set to 0.7, and all error variances were set to 0.5. The latent correlation of the factors $\eta_1$ and $\eta_2$ was set to 0.5 in the simulation. The factor variables, as well as the residuals, were multivariate normally distributed.
We varied the size of both residual error correlations, with values of 0, 0.3, and 0.6, the nonzero values indicating moderate-sized and large-sized residual error correlations. The sample size N was chosen as 500, 1000, or 2500.
We estimated the two-dimensional factor model with unmodeled residual error correlations. That is, the analysis models were misspecified (except for RegML). The six factor loadings and the six residual variances were freely estimated. The variances of the latent factors were set to 1. We employed ML, ULS, RegML, and RME, utilizing the power loss function with powers $p = 0.25$, 0.5, and 1. ML and ULS can be considered non-robust SEM estimation approaches, while RegML and RME are considered robust approaches. In total, six different estimates were compared for each simulated dataset in each condition.
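To illustrate why small powers induce robustness, consider a minimal sketch (our own, not the article's implementation) that estimates a single location parameter by minimizing $\sum_i \rho(x_i - m)$ with the power loss $\rho(x) = |x|^p$. With $p = 2$ (least squares), one outlying value drags the estimate, whereas a small power such as $p = 0.5$ largely ignores it:

```python
import numpy as np

def power_loss_estimate(x, p, grid=None):
    """Minimize sum_i |x_i - m|^p over m by a dense grid search.
    Small powers p downweight large residuals, yielding robustness."""
    if grid is None:
        grid = np.linspace(x.min(), x.max(), 20001)
    loss = np.abs(x[:, None] - grid[None, :]) ** p
    return grid[np.argmin(loss.sum(axis=0))]

# Nine values near 0 plus one gross outlier (mimicking a single model error)
x = np.array([-0.2, -0.1, -0.1, 0.0, 0.0, 0.0, 0.1, 0.1, 0.2, 10.0])
m_ls = power_loss_estimate(x, p=2.0)    # least squares: pulled toward the outlier
m_rob = power_loss_estimate(x, p=0.5)   # robust power loss: stays near 0
```

Here the least-squares estimate lands near the mean (about 1.0), while the $p = 0.5$ estimate stays near 0, mirroring how sparse large deviations should not substantially impact the fit.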
For RegML, we only imposed the SCAD penalty on nondiagonal entries in the residual covariance matrix $\boldsymbol{\Psi}$. RegML was estimated on a grid of regularization parameters between 10 and 0.002 (see replication material at https://osf.io/v8f45 (accessed on 27 March 2023) for specification details). In this simulation study, the tuning parameter $a$ was fixed to 3.7. We chose estimates of the regularization approach using the optimal regularization parameter selected based on the BIC.
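For reference, the SCAD penalty with tuning parameter $a = 3.7$ can be written and checked as follows; this is a generic sketch of the standard SCAD definition, not code from the replication material:

```python
import numpy as np

def scad_penalty(x, lam, a=3.7):
    """SCAD penalty: linear near zero (like the lasso), a quadratic
    transition, and constant for |x| > a*lam, so large parameters
    are not shrunk further (reducing estimation bias)."""
    ax = np.abs(x)
    return np.where(
        ax <= lam,
        lam * ax,
        np.where(
            ax <= a * lam,
            (2 * a * lam * ax - ax**2 - lam**2) / (2 * (a - 1)),
            lam**2 * (a + 1) / 2,
        ),
    )

# The penalty flattens out: parameters beyond a*lam receive a constant penalty
lam = 0.1
penalties = scad_penalty(np.array([0.05, 0.2, 1.0]), lam)
```

The three branches meet continuously at $|x| = \lambda$ and $|x| = a\lambda$, which can be verified numerically.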
We studied the estimation recovery of the factor correlation, a factor loading, and a residual variance. We assessed the bias and root mean square error (RMSE) of a parameter $\theta$ of interest. In each of the $R$ replications in a simulation condition, the estimate $\hat{\theta}_r$ ($r = 1, \ldots, R$) was calculated. The bias was estimated by
$$\widehat{\mathrm{Bias}}(\hat{\theta}) = \frac{1}{R}\sum_{r=1}^{R} \hat{\theta}_r - \theta.$$
The RMSE was estimated by
$$\widehat{\mathrm{RMSE}}(\hat{\theta}) = \sqrt{\frac{1}{R}\sum_{r=1}^{R} \big(\hat{\theta}_r - \theta\big)^2}.$$
To ease the comparability of the RMSE between different methods across sample sizes, we used a relative RMSE in which we divided the RMSE of a particular method by the RMSE of the best-performing method in a simulation condition. Hence, a relative RMSE of 100 is the reference value for the best-performing method.
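The bias, RMSE, and relative RMSE computations can be sketched as follows (generic code with made-up numbers, not the replication scripts):

```python
import numpy as np

def bias(estimates, true_value):
    """Average deviation of the replication estimates from the true parameter."""
    return np.mean(estimates) - true_value

def rmse(estimates, true_value):
    """Root mean square error over replications."""
    return np.sqrt(np.mean((estimates - true_value) ** 2))

def relative_rmse(rmse_by_method):
    """Divide each method's RMSE by the smallest RMSE; the
    best-performing method then has the reference value 100."""
    best = min(rmse_by_method.values())
    return {m: 100 * v / best for m, v in rmse_by_method.items()}

# Tiny example with two hypothetical methods
true_theta = 0.5
est_a = np.array([0.48, 0.52, 0.50, 0.51])   # precise method
est_b = np.array([0.40, 0.60, 0.45, 0.58])   # noisier method
rel = relative_rmse({"A": rmse(est_a, true_theta),
                     "B": rmse(est_b, true_theta)})
```

In this toy example, method A is the reference (relative RMSE 100) and method B receives a value above 100.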
The statistical software R [60] was employed for all parts of the simulation. All estimators of the SEM were obtained using the mgsem() function in the sirt [61] package. Replication material can be found at https://osf.io/v8f45 (accessed on 27 March 2023).
3.2. Results
In Table 1, the bias and the relative RMSE for the factor correlation, the factor loading, and the residual variance are presented.
It turned out that in the condition of no unmodeled residual error correlations (i.e., no model misspecification), no biases occurred for any parameter. However, there were slight efficiency losses with RME, which were particularly pronounced for the smaller powers and for the residual variance. The efficiency loss for the factor loading under the robust estimation methods was slightly smaller than for the residual variance. Finally, there was almost no efficiency loss for the robust methods for the factor correlation.
In the conditions with model errors (i.e., in the case of unmodeled residual correlations), ML and ULS were severely biased for the factor correlation. The bias of ML and ULS turned out to be slightly smaller for the factor loading and the residual variance. RegML and RME, using the powers $p = 0.25$ or $p = 0.5$, were unbiased, while RME with $p = 1$ had a small bias for the latent correlation. Regarding efficiency, RME with $p = 0.5$ was superior to $p = 0.25$.
In general, RME with small powers performed quite competitively with RegML. Given that RME is computationally much less demanding than RegML, one could prefer model-robust SEM estimation using the robust power loss function in practical applications in the presence of unmodeled residual error correlations.
5. Simulation Study 2: Noninvariant Item Intercepts (DIF)
In this Simulation Study 2, we investigate the impact of unmodeled group-specific item intercepts in a multiple-group one-dimensional factor model. The presence of group-specific item intercepts indicates measurement noninvariance (i.e., the presence of DIF). This simulation study investigates whether robust estimation approaches can handle the occurrence of DIF.
5.1. Method
The setup of the simulation study closely follows [32]. Data were simulated from a one-dimensional factor model involving five items and three groups. The factor variable was normally distributed with group-specific means. The group variances were set to 1, 1.5, and 1.2, respectively. All factor loadings were set to 1, and all measurement error variances were set to 1 in all groups and were uncorrelated with each other. The factor variable, as well as the residuals, were normally distributed.
DIF effects were simulated in exactly one of the five items in each group. In the first group, the fourth item intercept had a DIF effect; in the second group, the first item; and in the third group, the second item. The size of the DIF effect was chosen as either 0 (no DIF, measurement invariance), 0.3 (small DIF), or 0.6 (moderate DIF). Three sample sizes per group were studied, including 1000 and 2000.
A multiple-group one-dimensional SEM was specified as the analysis model. The analysis model assumes invariant item intercepts and factor loadings. In the first group, the factor mean was set to 0, and the factor variance was set to 1 for identification reasons. Note that the data-generating model included some group-specific item intercepts that remained unmodeled in the analysis models (except for RegML). This, in turn, led to misspecified analysis models. ML, ULS, RegML with the SCAD penalty on group-specific item intercepts, and RME with powers $p = 0.25$, 0.5, and 1 were utilized. In RegML, the optimal regularization parameter was chosen based on the BIC.
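The data-generating scheme can be sketched as follows (our illustration: the group factor means and the exact DIF signs are hypothetical placeholders, while the variances, loadings, and DIF placement follow the description above):

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_group(n, factor_mean, factor_var, dif_item=None, dif_effect=0.0):
    """One-dimensional factor model with 5 items, loadings 1 and error
    variances 1; optionally adds a DIF effect to one item intercept."""
    eta = rng.normal(factor_mean, np.sqrt(factor_var), size=n)
    eps = rng.normal(0.0, 1.0, size=(n, 5))
    intercepts = np.zeros(5)
    if dif_item is not None:
        intercepts[dif_item] += dif_effect   # group-specific (noninvariant) intercept
    return intercepts + eta[:, None] + eps

# Three groups; DIF in exactly one item per group (0-indexed items);
# group means 0.0, 0.3, 0.8 are illustrative, variances are from the text
delta = 0.6
X1 = simulate_group(1000, 0.0, 1.0, dif_item=3, dif_effect=delta)  # fourth item
X2 = simulate_group(1000, 0.3, 1.5, dif_item=0, dif_effect=delta)  # first item
X3 = simulate_group(1000, 0.8, 1.2, dif_item=1, dif_effect=delta)  # second item
```

In the first group, for example, the mean of the fourth item exceeds that of the invariant items by roughly the DIF effect.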
We investigated the estimation recovery of the factor means of the second and the third group. Bias and relative RMSE (see Section 3.1) were again used for assessing the performance of the estimates. In total, 2000 replications were conducted. The models were estimated using the mgsem() function in the R [60] package sirt [61]. Replication material can be found at https://osf.io/v8f45 (accessed on 27 March 2023).
5.2. Results
In Table 3, the bias and the relative RMSE for the factor means of the second and the third group are presented.
In the condition of no DIF effects, all estimators were unbiased. There were minor efficiency losses for robust moment estimation with the smaller powers for the smallest sample size. However, in larger samples, there were almost no differences among the estimators.
In the simulation conditions with small or moderate DIF effects, ML and ULS were substantially biased. Notably, RME with $p = 1$ also produced a nonnegligible bias. RME with powers $p = 0.25$ or $p = 0.5$ performed similarly to RegML concerning bias. RME had only slightly increased efficiency losses. Interestingly, RME with $p = 0.5$ performed better than $p = 0.25$ in terms of RMSE.
Given the fact that there were only small efficiency losses with RME, the computationally more expensive RegML could perhaps be avoided in practice if the goal is estimating factor group means.
6. Empirical Example: ESS 2005 Data
We now present an empirical example to illustrate the performance of the different robust and non-robust SEM estimation approaches.
6.1. Method
In this empirical example, we use a dataset that was also analyzed in [32,63,64]. The data came from the European Social Survey (ESS) conducted in the year 2005 (ESS 2005), including subjects from 26 countries. The latent factor variable of tradition and conformity was assessed by four items presented in portrait format, where the scale of the items is such that a high value represents a low level of tradition and conformity. The wording of the four items was (see [32]): “It is important for him to be humble and modest. He tries not to draw attention to himself.” (item TR9), “Tradition is important to him. He tries to follow the customs handed down by his religion or family.” (item TR20), “He believes that people should do what they’re told. He thinks people should follow rules at all times, even when no one is watching.” (item CO7), and “It is important for him to always behave properly. He wants to avoid doing anything people would say is wrong.” (item CO16). The full dataset used in [32] was downloaded from https://www.statmodel.com/Alignment.shtml (accessed on 27 March 2023). For this empirical example, a subsample of 33,060 persons from 17 selected countries was included to restrict the range of variability of country factor means. The sample sizes per country ranged between 1358 and 2963, with an average of 1997.1. We only included participants who had no missing values on any of the four items.
We specified a one-dimensional factor model with 17 groups (i.e., 17 countries) assuming invariant item parameters (i.e., invariant intercepts, loadings, and residual variances) in the estimation approaches ML, ULS, and RME with powers $p = 0.25$, 0.5, and 1. In RegML, the SCAD penalty was imposed on group-specific item intercepts.
The obtained country means and country standard deviations of the factor variable were linearly transformed for all different estimators such that the total population comprising all persons from all 17 countries had a mean of 0 and a standard deviation of 1.
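This linear transformation can be sketched as follows (our illustration; the function name and the country values are made up): given country factor means, SDs, and sample sizes, compute the total-population mean and SD and rescale.

```python
import numpy as np

def standardize_group_means(means, sds, ns):
    """Linearly transform group factor means and SDs so that the pooled
    population over all groups has mean 0 and SD 1. The pooled variance
    combines within-group variance and between-group mean dispersion."""
    means, sds, ns = (np.asarray(a, dtype=float) for a in (means, sds, ns))
    w = ns / ns.sum()                                   # population shares
    total_mean = np.sum(w * means)
    total_var = np.sum(w * (sds**2 + (means - total_mean) ** 2))
    total_sd = np.sqrt(total_var)
    return (means - total_mean) / total_sd, sds / total_sd

# Made-up example with three countries (values are not from the article)
m_std, s_std = standardize_group_means([0.2, -0.1, 0.4],
                                       [1.0, 0.9, 1.1],
                                       [1500, 2000, 1800])
```

After the transformation, recomputing the pooled mean and variance from the rescaled values returns 0 and 1.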
The analysis was conducted using the mgsem() function from the R [60] package sirt [61]. The processed dataset and replication syntax can be found at https://osf.io/v8f45 (accessed on 27 March 2023).
6.2. Results
In the following, we present the estimates of RegML with the obtained optimal regularization parameter of 0.02.
In Table 4, the country means for the six different estimators are presented. The maximum absolute difference between country means stemming from the different models ranged between 0.032 and 0.337, with an average of 0.116. This indicates considerable variability. Hence, the different robust and non-robust estimators weighted DIF effects differently in the computation of country means.
We also computed the ranks of the 17 countries across the six different estimates. The maximum absolute country rank difference ranged between 0 (country C10, rank 1) and 8 (country C21, rank 2), with an average of 3.9. A large range of country means was observed for country C21 (rank 2). The means for this country ranged between −0.01 (RME) and 0.33 (ML). Note that the robust approaches provided similar results, but they substantially differed from RME with $p = 1$, ULS, and ML. Similarly, for country C06 (rank 3), there was a range of country means with values between 0.06 and 0.26. However, low dependency of country means on the choice of estimator was observed for countries C10 (rank 1), C09 (rank 13), and C13 (rank 14).
In Figure 2, average absolute differences in country means between the different models are displayed. Large absolute differences are displayed in a darker red-brown color, while small differences are colored in light yellow-orange. RME with $p = 0.25$ (i.e., RME0.25) and RME with $p = 0.5$ (i.e., RME0.5) provided similar country means, with an average distance of 0.01. Notably, RegML also resulted in estimates similar to RME with $p = 0.25$ or $p = 0.5$ (i.e., average distances of 0.02 or 0.03). In addition, ML and ULS resulted in similar estimates. Interestingly, RME with $p = 1$ performed slightly differently compared to the other approaches. As in the simulation studies, it did not show full robustness to outlying DIF effects in item intercepts.
In Figure 3, the regularization paths of the estimated country means as a function of the regularization parameter are displayed. The RegML estimates of the country means are displayed on the left side of the figure, and the ML estimates are displayed on the right side. Interestingly, the paths are not monotone, and there is some variability in country mean estimates depending on the choice of the regularization parameter.
In Table 5, the estimated country-specific item intercepts for regularized ML estimation are presented. Overall, three countries had three regularized item intercepts, 12 countries had two regularized item intercepts, and two countries had one regularized item intercept. In total, 35 out of 68 item intercepts were regularized (i.e., the country-specific item intercepts were estimated as zero).

The regularization paths for country-specific item intercepts for four selected countries (C01, C09, C13, and C21) are displayed in Figure 4. One immediately recognizes that more and stronger deviations from invariance were observed with smaller values of the regularization parameter.
7. Discussion
In this article, we have demonstrated that using model-robust SEM estimation methods, such as regularized maximum likelihood estimation and robust moment estimation, provides some resistance to model errors in modeled means and covariances. Importantly, this robustness property is only guaranteed if model errors are sparsely distributed; that is, most of the model errors are zero (or approximately zero), while only a few model errors are allowed to differ from zero. The plausibility of this assumption should be evaluated for each empirical application that potentially involves model error. Moreover, it is noted that robust moment SEM estimation approaches did not result in practically relevant efficiency losses compared to non-robust approaches (i.e., maximum likelihood or unweighted least squares estimation) in the absence of model errors. Hence, robust estimation approaches can be recommended in empirical applications with moderate to large sample sizes. Importantly, robust moment estimation is computationally much less demanding than regularized estimation. Moreover, it also provides valid standard errors, which are much more difficult to obtain with regularized ML estimation.
We applied robust moment estimation and regularized ML estimation to unmodeled item intercepts in Simulation Study 2 in Section 5. In this case, there is essentially no need to utilize the computationally more expensive regularization technique. The crucial assumption is that there are only a few model deviations in the modeled mean vectors and covariance matrices. This property is also referred to as the sparsity assumption [51], which means that the majority of the entries in the mean vectors and covariance matrices are correctly specified. The methods discussed in this article will typically fail if the majority of or all the elements in the mean vectors and covariance matrices are misspecified [65,66]. This will likely be the case if the item loadings in the confirmatory factor model vary across groups, a situation referred to as nonuniform DIF. Model deviations in item loadings lead to model deviations in the covariance matrix that are not as sparse as deviations in the mean structure. Hence, this is a situation in which regularized ML estimation might be preferred over robust moment estimation.
The treatment in this article was based on the multivariate normal distribution, utilizing mean vectors and covariance matrices as sufficient statistics. Sometimes, researchers prefer to model ordinal items by relying on underlying latent normally distributed variables [67,68]. In this case, thresholds and polychoric correlations replace mean vectors and covariance matrices in the weighted least squares fitting function. The approach based on a model-robust fitting function can also be readily transferred to the case of modeling ordinal items. Future research might compare the performance of robust moment estimation with limited information methods relying on thresholds and polychoric correlations against regularized maximum likelihood estimation. Clearly, the computational advantages of model-robust limited information methods would be even more pronounced than in the case of continuous items.
It should be noted that multivariate normality is not a requirement for obtaining consistent estimates in correctly specified structural equation models [37]. This article investigates the robustness of parameter estimates with respect to model misspecification in the mean and the covariance structure. There is a distinct literature that focuses on robust estimation of SEM under violations of multivariate normality [69,70]. These approaches might result in more efficient estimates in the case of heavy-tailed or contaminated distributions. However, outliers in this article are defined in the sense that some entries of the mean vector and the covariance matrix are misspecified in infinite sample sizes (i.e., at the population level).
As pointed out in this article, the multiple-group SEM with unmodeled item intercepts corresponds to estimation under violations of measurement invariance (i.e., in the presence of differential item functioning). Linking approaches, such as invariance alignment [32,71] or robust Haberman linking [30], also deal with the estimation of factor means in one- or multidimensional confirmatory factor models. They do so by estimating the model parameters of the factor model separately in each group in a first step. In a second step, the group-specific estimated item intercepts and factor loadings are transformed with a robust or non-robust linking function to identify factor means and factor standard deviations. This approach is advantageous if there is misfit in both the mean and the covariance structure. If the covariance structure is correctly specified, a joint robust moment estimation approach for the multiple-group SEM could have higher efficiency. In practical applications, it has been shown that misspecification occurs more frequently in the mean structure than in the covariance structure [72]. Hence, the robust moment estimation approach proposed in this article might be a viable alternative to the frequently employed invariance alignment technique. Future research could thoroughly compare robust moment estimation with invariance alignment and robust Haberman linking.
It has been argued that regularized estimation could also be applied with a fixed tuning parameter [73]. Recent research indicated that model parameter estimation could improve when using a fixed tuning parameter instead of an optimal tuning parameter based on the minimal information criterion [20]. Notably, regularized estimation in SEM is also referred to as penalized estimation [74]. In this framework, the penalty function is interpreted as a fixed prior distribution with a corresponding density [74]; a prominent example of such a fixed penalty is the so-called alignment loss function (see [30,32]).
An anonymous reviewer also suggested investigating the Huber loss function [10,38] in addition to the power loss function $\rho(x) = |x|^p$. As a disadvantage, the Huber loss function requires an additional tuning parameter. We have some limited empirical experience with the Huber loss in robust Haberman linking [30]. In this situation, the Huber loss function performed worse than the power loss function. This was particularly the case for asymmetrically distributed (i.e., asymmetrically structured) model deviations. Studying a wider range of robust loss functions in multiple-group SEM might be an interesting topic for future research, although we are somewhat skeptical that the Huber loss function would outperform the power loss function under the conditions of our simulation studies.
Finally, it is always a substantive question whether model errors should be allowed to contain outliers or not [75]. That is, should statistical models automatically remove particular means or covariances when estimating the parameter $\boldsymbol{\theta}$? We have argued against such a mechanistic approach to model estimation: model deviations, such as violations of measurement invariance, should not automatically result in model refinement [76]. Hence, we do not think that there is always an optimal loss function in every application [75]. If this were true, statistical inference would always be based on maximum likelihood estimation. Rather, we believe that researchers intentionally choose a statistical parameter of interest and apply M-estimation theory [77] for statistical inference that is not based on correctly specified models. In this sense, SEMs can be estimated with non-robust estimation approaches (such as maximum likelihood or unweighted least squares estimation) using an intentionally misspecified model [78,79]. Model errors that appear can then be included as additional uncertainty in the standard errors of the estimated parameters. In SEM, ref. [9] proposed such an approach by imposing a stochastic model for model errors. We think that this approach deserves more attention in empirical research.