Missing Values in SmartPLS

Missing values are common in empirical research and should be addressed before estimating a partial least squares structural equation modeling (PLS-SEM) model in SmartPLS (Ringle, Wende, & Becker, 2024). In SmartPLS, missing values are handled at the data level. The selected missing value treatment can affect the number of observations used, the variation in the data, and the stability of the PLS-SEM results (Hair, Hult, Ringle, & Sarstedt, 2027).
Before choosing a missing value treatment, inspect the amount and pattern of missing data in your dataset. Missing values may occur randomly, but they can also indicate systematic issues in the data collection process, such as skipped survey sections, misunderstood questions, or respondent fatigue.

Coding missing values in your data file

SmartPLS identifies missing values by a user-defined missing value marker. When preparing your data file (e.g., CSV or Excel format), code missing values consistently, for example, with a value that cannot occur in the data (such as -99) or by leaving the cell empty. After importing the dataset, specify the missing value marker in the data view settings so that SmartPLS correctly recognizes missing entries. An incorrectly defined marker is a frequent source of errors: unrecognized missing values are then treated as regular data points and distort the results.
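The effect of an unrecognized marker can be illustrated outside SmartPLS. The following minimal Python sketch uses only the standard library; the CSV content, the column names (`sat_1`, `sat_2`), and the helper `load_indicator` are hypothetical and serve only to show how a marker such as -99 distorts an indicator mean when it is read as a regular value.

```python
import csv
import io
from statistics import mean

# Hypothetical survey extract: -99 and empty cells both stand for "missing".
RAW_CSV = """id,sat_1,sat_2
1,5,4
2,-99,3
3,4,
4,3,5
"""

MISSING_MARKERS = {"-99", ""}  # assumed markers; adapt to your own coding


def load_indicator(csv_text, column, markers=MISSING_MARKERS):
    """Return the column as floats, with None for every missing entry."""
    rows = csv.DictReader(io.StringIO(csv_text))
    return [
        None if row[column].strip() in markers else float(row[column])
        for row in rows
    ]


sat_1 = load_indicator(RAW_CSV, "sat_1")
observed = [v for v in sat_1 if v is not None]

# Mean when the marker is recognized vs. when -99 is read as real data.
mean_recognized = mean(observed)
mean_unrecognized = mean(-99.0 if v is None else v for v in sat_1)

print("sat_1 values:", sat_1)
print("mean with marker recognized:", mean_recognized)
print("mean with marker ignored:", mean_unrecognized)
```

A quick check of this kind before the import can confirm that the number of missing entries per indicator matches what SmartPLS reports after the marker has been set.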

Types of missing data: MCAR, MAR, and MNAR

The methodological literature distinguishes three missing data mechanisms (see Hair et al., 2027, Chapter 2; Liu, Chin, Cheah, Hair, & Lyu, 2025):
  • Missing completely at random (MCAR): The probability of a missing value is unrelated to any observed or unobserved data. Simple treatments such as mean replacement or deletion are least problematic under MCAR.
  • Missing at random (MAR): Missingness is related to other observed variables (e.g., older respondents skip a specific question more often). Deletion methods can introduce bias; more advanced treatments should be considered.
  • Missing not at random (MNAR): Missingness depends on the unobserved value itself (e.g., respondents with extreme opinions refuse to answer). This is the most problematic case and requires careful assessment of the affected indicators and observations.
Diagnosing the missing data mechanism, at least descriptively, helps to select an appropriate treatment and to justify this choice in the research report.
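One simple descriptive check, sketched here in Python under stated assumptions, compares a fully observed variable between cases with and without a missing value on another variable. The records, the variable names (`age`, `income`), and the helper `compare_by_missingness` are hypothetical. A clear difference between the two groups suggests that missingness is related to observed data and is therefore unlikely to be MCAR; a check like this cannot separate MAR from MNAR, because MNAR depends on the unobserved values themselves.

```python
from statistics import mean

# Hypothetical records: age is fully observed, income has gaps (None).
records = [
    {"age": 23, "income": 3.0},
    {"age": 31, "income": 4.0},
    {"age": 38, "income": 5.0},
    {"age": 52, "income": None},
    {"age": 61, "income": None},
    {"age": 67, "income": 4.0},
]


def compare_by_missingness(rows, target, covariate):
    """Mean of `covariate` for rows where `target` is missing vs. observed."""
    missing = [r[covariate] for r in rows if r[target] is None]
    observed = [r[covariate] for r in rows if r[target] is not None]
    return {
        "n_missing": len(missing),
        "n_observed": len(observed),
        "mean_if_missing": mean(missing) if missing else None,
        "mean_if_observed": mean(observed) if observed else None,
    }


result = compare_by_missingness(records, target="income", covariate="age")
print(result)
```

In this made-up example, respondents with a missing income value are older on average, which mirrors the MAR illustration above.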

Missing value treatment options in SmartPLS

SmartPLS provides the following options for handling missing values:
  • None keeps missing values untreated. Use this option only if your dataset does not contain missing values or if you want to inspect the data before selecting a treatment method.
  • Mean replacement replaces missing values with the mean value of the corresponding indicator. In the PLS-SEM literature, this method is commonly referred to as mean value replacement. SmartPLS uses the shorter label Mean replacement in the software interface.
  • Casewise deletion (also known as listwise deletion) excludes observations that contain one or more missing values.
  • Pairwise deletion uses all available observations for each calculation, depending on which values are available.
Each option has advantages and limitations. Mean replacement (i.e., mean value replacement) is easy to apply, but it reduces the variance of the affected indicators and can attenuate relationships in the model. Casewise deletion is straightforward, but it can substantially reduce the sample size and may introduce bias when the missing values are not missing completely at random. Pairwise deletion retains more information, but different calculations may be based on different subsets of observations, which can produce inconsistent results. None should generally not be used when missing values are present and the analysis requires complete numerical input.
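These trade-offs can be made concrete with a small Python sketch. It does not reproduce SmartPLS internals; the three indicators `x`, `y`, `z` and all helper functions are hypothetical and only illustrate the general logic of the three treatments: mean replacement shrinks the variance, casewise deletion shrinks the sample, and pairwise deletion uses a different subset of rows for each pair of variables.

```python
from statistics import mean, variance

# Hypothetical data for three indicators; None marks a missing value.
x = [1.0, 2.0, None, 4.0, 5.0, 6.0]
y = [2.0, None, 3.0, 5.0, 6.0, 7.0]
z = [1.0, 3.0, 2.0, None, 4.0, 5.0]


def mean_replace(values):
    """Replace each None with the mean of the observed values."""
    fill = mean(v for v in values if v is not None)
    return [fill if v is None else v for v in values]


def casewise_rows(*columns):
    """Keep only the rows that are complete across all columns."""
    return [row for row in zip(*columns) if None not in row]


def pairwise_n(a, b):
    """Number of rows in which both variables of one pair are observed."""
    return sum(1 for u, v in zip(a, b) if u is not None and v is not None)


x_observed = [v for v in x if v is not None]
x_filled = mean_replace(x)

var_before = variance(x_observed)  # sample variance of the observed values
var_after = variance(x_filled)     # smaller: the filled value adds no spread

complete = casewise_rows(x, y, z)

print("variance before / after mean replacement:", var_before, var_after)
print("rows left after casewise deletion:", len(complete))
print("rows used per pair:", pairwise_n(x, y), pairwise_n(x, z), pairwise_n(y, z))
```

In this toy dataset, casewise deletion keeps three of six rows, while each pairwise calculation keeps four, although not the same four for every pair.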

Which missing value option should I choose?

As a rule of thumb, use mean replacement only when the amount of missing data is small, that is, less than 5% of the values per indicator are missing, and the missing values appear to be randomly distributed (Hair et al., 2027). Use casewise deletion with caution: if removing incomplete observations would eliminate a substantial share of the sample, or if missingness is concentrated in specific respondent groups, deletion can bias the results and undermine statistical power. Pairwise deletion should only be used when the researcher is aware of its implications and can justify that calculations based on different subsets of observations are acceptable.
The following decision aid summarizes these recommendations:
| Situation | Recommended approach |
|---|---|
| No missing values in the dataset | None |
| Less than 5% missing per indicator, randomly distributed (MCAR) | Mean replacement |
| Few affected observations, large sample, missingness random | Casewise deletion |
| More than 5% missing per indicator, or systematic patterns (MAR/MNAR) | Consider advanced treatment before import (e.g., multiple imputation); assess whether affected indicators or observations should be removed |
| More than 15% missing for a single indicator or observation | Consider removing the affected indicator or observation entirely |
Recent empirical comparisons support this cautious approach. Liu et al. (2025) discuss strategies and provide practical insights for handling missing data in PLS-SEM-based business research, and Amusa and Hossana (2024) empirically compare the performance of different missing data treatments in PLS-SEM. Both studies underline that the choice of treatment is not neutral: it can affect parameter estimates and the conclusions drawn from the model, especially when missing data are frequent or systematic.
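For readers who screen many indicators, the indicator-level rows of the decision aid can be sketched as a small Python function. This is an illustration, not a SmartPLS feature: the function name `recommend_treatment` and its parameters are hypothetical, the handling of values exactly at 5% or 15% is an assumption, and the casewise deletion row is left out because it depends on a sample size judgment that a simple rule does not capture.

```python
def recommend_treatment(share_missing, appears_random):
    """Map a missing share (0.0 to 1.0) to a row of the decision aid.

    Boundary handling (exactly 5% or 15%) is an assumption of this sketch;
    the rule of thumb itself only speaks of "less than" and "more than".
    """
    if not 0.0 <= share_missing <= 1.0:
        raise ValueError("share_missing must lie between 0 and 1")
    if share_missing == 0:
        return "None"
    if share_missing > 0.15:
        return "Consider removing the affected indicator or observation"
    if share_missing < 0.05 and appears_random:
        return "Mean replacement"
    return "Consider advanced treatment before import"


# Usage with made-up shares of missing values per indicator.
for share, random_pattern in [(0.0, True), (0.03, True), (0.08, True), (0.20, False)]:
    print(f"{share:.0%} missing, random={random_pattern}:",
          recommend_treatment(share, random_pattern))
```

The output is a starting point for a documented decision and does not replace an assessment of the missingness pattern.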

Advanced treatments before importing data into SmartPLS

The options available in SmartPLS cover the standard treatments used in most applied PLS-SEM studies. When missing data are more substantial or clearly not missing completely at random, researchers can apply more advanced procedures, such as multiple imputation or expectation-maximization (EM) based imputation, in statistical software (e.g., R or SPSS) before importing the completed dataset into SmartPLS. In this case, select None as the missing value treatment in SmartPLS, since the imported dataset no longer contains missing values. Document the imputation procedure and its settings in the research report.
When preparing data for a PLS-SEM analysis, we recommend the following workflow:
  1. Identify missing values in the data matrix before running the PLS-SEM algorithm, and verify that the missing value marker is correctly specified in SmartPLS.
  2. Assess the extent of missing data for each indicator and observation (e.g., percentage of missing values per indicator).
  3. Check whether missing values follow a systematic pattern, for example, whether they are concentrated in specific indicators, constructs, or respondent groups (MCAR vs. MAR/MNAR).
  4. Choose a treatment method that is appropriate for the amount and pattern of missing data, using the decision aid above.
  5. Document the selected approach in the research report, including the amount of missing data and the method used to handle it.
As a general rule, small amounts of randomly distributed missing data are less problematic than larger or systematic patterns of missingness. When missing data are substantial or appear to be systematic, researchers should carefully evaluate whether the affected indicators or observations should be retained, treated, or removed.
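Step 2 of this workflow can be carried out with a few lines of code before the import. The following Python sketch is a minimal example under stated assumptions: the data matrix, the indicator names (`q1` to `q3`), and both helper functions are hypothetical, and missing values are assumed to have been converted to `None` already.

```python
# Hypothetical data matrix: one dict per observation, None = missing.
data = [
    {"q1": 5, "q2": 4, "q3": 3},
    {"q1": None, "q2": 4, "q3": 2},
    {"q1": 4, "q2": None, "q3": None},
    {"q1": 3, "q2": 5, "q3": 4},
]


def pct_missing_per_indicator(rows):
    """Percentage of missing values for each indicator (column)."""
    return {
        name: 100 * sum(r[name] is None for r in rows) / len(rows)
        for name in rows[0]
    }


def pct_missing_per_observation(rows):
    """Percentage of missing values for each observation (row)."""
    return [100 * sum(v is None for v in r.values()) / len(r) for r in rows]


per_indicator = pct_missing_per_indicator(data)
per_observation = pct_missing_per_observation(data)

for name, pct in per_indicator.items():
    print(f"{name}: {pct:.1f}% missing")
for index, pct in enumerate(per_observation, start=1):
    print(f"observation {index}: {pct:.1f}% missing")
```

The resulting percentages can be compared with the 5% and 15% thresholds of the decision aid and reported alongside the chosen treatment.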

Reporting missing values in PLS-SEM studies

In academic reporting, researchers should be transparent about how missing values were handled. A concise report may include:
"The dataset was examined for missing values before estimating the PLS-SEM model in SmartPLS (Ringle, Wende, & Becker, 2024). Less than [X]% of the values per indicator were missing, and the pattern of missingness showed no systematic concentration in specific indicators or respondent groups. Missing values were handled using [selected method], following the recommendations of Hair et al. (2027). The final dataset included [number] observations for the analysis."
When the amount of missing data is relevant, also report the percentage of missing values per indicator or the number of observations removed.

Frequently asked questions

What does Mean replacement do in SmartPLS?

Mean replacement replaces each missing value with the mean value of the corresponding indicator. This method is commonly known as mean value replacement in the methodological literature. It is easy to apply, but it reduces the variance of the affected indicator and can attenuate the estimated relationships.

Is Mean replacement the same as mean value replacement?

Yes. In SmartPLS, the software option is labeled Mean replacement to keep the interface concise. The corresponding methodological term is mean value replacement.

How much missing data is acceptable in PLS-SEM?

As a rule of thumb, mean replacement is acceptable when less than 5% of the values per indicator are missing and the missing values are randomly distributed (Hair et al., 2027). When more data are missing, or when missingness follows a systematic pattern, researchers should consider deletion, advanced imputation before importing the data, or removing the affected indicators or observations.

What is the difference between Casewise deletion and Pairwise deletion?

Casewise deletion (listwise deletion) removes observations that contain one or more missing values. Pairwise deletion uses all available observations for each calculation, which can retain more data but may lead to calculations being based on different subsets of observations.

Should I use mean replacement or casewise deletion?

Use mean replacement when only a small share of values is missing (less than 5% per indicator) and the sample size should be preserved. Use casewise deletion when only a few observations are affected and the remaining sample remains sufficiently large. When missing data are frequent or systematic, neither simple method may be adequate; consider advanced imputation before importing the data into SmartPLS.

How do I code missing values in my data file for SmartPLS?

Code missing values consistently in your data file, for example, with a unique numeric marker such as -99 or by leaving cells empty, and specify this missing value marker in the SmartPLS data settings after importing the dataset. Verify in the data view that SmartPLS correctly reports the number of missing values per indicator.

Can I leave missing values untreated in SmartPLS?

The None option keeps missing values untreated. This option is useful for inspecting the data, when the dataset does not contain missing values, or when missing values were already imputed in external software before the import. If untreated missing values are present, researchers should select an appropriate treatment method before estimating the model.

Further reading

For a comprehensive introduction to data examination and missing value treatment in PLS-SEM, see Chapter 2 in Hair et al. (2027). For recent methodological discussions and empirical comparisons of missing data treatments in PLS-SEM, see Liu et al. (2025) and Amusa and Hossana (2024).

References

  • Hair, J. F., Hult, G. T. M., Ringle, C. M., & Sarstedt, M. (2027). A Primer on Partial Least Squares Structural Equation Modeling (PLS-SEM) (4th ed.). Sage.
  • Ringle, C. M., Wende, S., & Becker, J.-M. (2024). SmartPLS 4. In SmartPLS. https://www.smartpls.com/
  • Liu, Y., Chin, W. W., Cheah, J.-H., Hair, J. F., & Lyu, C. (2025). Tackling Missing Data in PLS-SEM: Strategies and Insights for Business Research. Journal of Business Research, 201, 115739.
  • Amusa, L. B., & Hossana, T. (2024). An Empirical Comparison of Some Missing Data Treatments in PLS-SEM. PLOS ONE, 19(1), e0297037.
  • Sarstedt, M., Ringle, C. M., & Hair, J. F. (2025). Partial Least Squares Structural Equation Modeling. In C. Homburg, M. Klarmann, & A. Vomberg (Eds.), Handbook of Market Research. Springer Nature Switzerland.