Article

Abstract:

Detecting outlying observations is an important step in any analysis, even when robust estimates are used. In particular, the robustified Mahalanobis distance is a natural measure of outlyingness if one focuses on ellipsoidal distributions. However, it is well known that the asymptotic chi-square approximation for the cutoff value of the Mahalanobis distance based on several robust estimates (like the minimum volume ellipsoid, the minimum covariance determinant and the S-estimators) is not adequate for detecting atypical observations in small samples from the normal distribution. In the multi-population setting and under a common principal components model, aggregated measures based on standardized empirical influence functions are used to detect observations with a significant impact on the estimators. As in the one-population setting, the cutoff values obtained from the asymptotic distribution of those aggregated measures are not adequate for small samples. More appropriate cutoff values, adapted to the sample sizes, can be computed by using a cross-validation approach. Cutoff values obtained from a Monte Carlo study using S-estimators are provided for illustration. A real data set is also analyzed. © 2010 Elsevier B.V. All rights reserved.
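
For orientation, the outlier rule the abstract refers to can be written out as follows. This is a sketch in generic notation that is not taken from the article: x_1, ..., x_n are the p-dimensional observations, \hat{\mu} and \hat{\Sigma} stand for any robust location and scatter estimates (for example the minimum volume ellipsoid, the minimum covariance determinant or an S-estimator), \alpha is a nominal level chosen by the analyst, and c_{n,p,\alpha} is a hypothetical symbol for a cutoff adapted to the sample size.

```latex
% Squared robustified Mahalanobis distance of observation x_i
d_i^2 = (x_i - \hat{\mu})^{\top} \hat{\Sigma}^{-1} (x_i - \hat{\mu}),
\qquad i = 1, \dots, n.

% Asymptotic rule: flag x_i as atypical when its distance exceeds the
% (1 - \alpha) quantile of a chi-square distribution with p degrees of freedom
d_i^2 > \chi^2_{p,\,1-\alpha}.

% Finite-sample rule: same comparison, with the chi-square quantile replaced
% by a cutoff c_{n,p,\alpha} calibrated for the actual sample size n
% (c_{n,p,\alpha} is notation assumed here, not the article's)
d_i^2 > c_{n,p,\alpha}.
```

The two rules differ only in the threshold. The abstract's point is that for small n the chi-square quantile is not an adequate threshold, so a cutoff adapted to the sample size is preferable.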

Record:

Document: Article
Title: Detecting influential observations in principal components and common principal components
Authors: Boente, G.; Pires, A.M.; Rodrigues, I.M.
Affiliation: Facultad de Ciencias Exactas y Naturales, Universidad de Buenos Aires, Argentina
CONICET, Argentina
Departamento de Matemática and CEMAT, Instituto Superior Técnico, Technical University of Lisbon (TULisbon), Lisboa, Portugal
Keywords: Common principal components; Detection of outliers; Influence functions; Robust estimation; Multivariable systems; Normal distribution; Asymptotic distributions; Influential observations; Mahalanobis distances; Minimum covariance determinant; Minimum volume ellipsoids; Principal components; Method of moments
Year: 2010
Volume: 54
Issue: 12
Start page: 2967
End page: 2975
DOI: http://dx.doi.org/10.1016/j.csda.2010.01.001
Journal title: Computational Statistics and Data Analysis
Abbreviated journal title: Comput. Stat. Data Anal.
ISSN: 01679473
CODEN: CSDAD
Record: https://bibliotecadigital.exactas.uba.ar/collection/paper/document/paper_01679473_v54_n12_p2967_Boente

References:

  • Becker, C., Gather, U., The masking breakdown point of multivariate outlier identification rules (1999) Journal of the American Statistical Association, 94, pp. 947-955
  • Becker, C., Gather, U., The largest nonidentifiable outlier: A comparison of multivariate simultaneous outlier identification rules (2001) Computational Statistics and Data Analysis, 36 (1), pp. 119-127. DOI 10.1016/S0167-9473(00)00032-3, PII S0167947300000323
  • Boente, G., Pires, A.M., Rodrigues, I.M., Influence functions and outlier detection under the common principal components model: A robust approach (2002) Biometrika, 89 (4), pp. 861-875. DOI 10.1093/biomet/89.4.861
  • Boente, G., Pires, A.M., Rodrigues, I.M., General projection-pursuit estimators for the common principal components model: Influence functions and Monte Carlo study (2006) Journal of Multivariate Analysis, 97 (1), pp. 124-147. DOI 10.1016/j.jmva.2004.11.007, PII S0047259X04002313
  • Chen, T., Martin, E., Montague, G., Robust probabilistic PCA with missing data and contribution analysis for outlier detection (2009) Computational Statistics and Data Analysis, 53, pp. 3706-3716
  • Critchley, F., Influence in principal components analysis (1985) Biometrika, 72, pp. 627-636
  • Croux, C., Haesbroeck, G., Empirical influence functions for robust principal component analysis (1999) Proceedings of the Statistical Computing Section of the American Statistical Association, pp. 201-206. Am. Statist. Assoc., Alexandria, VA
  • Croux, C., Haesbroeck, G., Principal component analysis based on robust estimators of the covariance or correlation matrix: Influence functions and efficiencies (2000) Biometrika, 87, pp. 603-618
  • Donoho, D.L., (1982) Breakdown Properties of Multivariate Location Estimators, Ph.D. Thesis. Harvard University (in English)
  • Filzmoser, P., Maronna, R., Werner, M., Outlier identification in high dimensions (2008) Computational Statistics and Data Analysis, 52, pp. 1694-1711
  • Flury, B.N., Common principal components in k groups (1984) Journal of the American Statistical Association, 79, pp. 892-898
  • Flury, B.N., (1988) Common Principal Components and Related Multivariate Models, John Wiley, New York
  • Hardin, J., Rocke, D., The distribution of robust distances (2005) Journal of Computational and Graphical Statistics, 14, pp. 928-946
  • Hubert, M., Rousseeuw, P., Verdonck, T., Robust PCA for skewed data and its outlier map (2009) Computational Statistics and Data Analysis, 53, pp. 2264-2274
  • Oliveira, I., (1995) Variedades de castanheiros em Trás-os-Montes: Uma Análise em Componentes Principais Dos Caracteres Morfológicos da Folha, Master Thesis. Universidade de Lisboa (in Portuguese)
  • Pison, G., Rousseeuw, P.J., Filzmoser, P., Croux, C., A robust version of principal factor analysis (2000) Compstat: Proceedings in Computational Statistics, pp. 385-390. Bethlehem, J., van der Heijden, P. (Eds.), Physica-Verlag, Heidelberg
  • Rousseeuw, P.J., Multivariate estimation with high breakdown point (1985) Mathematical Statistics and Applications, B, pp. 283-297. Grossmann, W., et al. (Eds.), Akadémiai Kiadó, Budapest
  • Rousseeuw, P.J., Van Zomeren, B.C., Unmasking multivariate outliers and leverage points (1990) Journal of the American Statistical Association, 85, pp. 633-639
  • Rousseeuw, P.J., Yohai, V.J., Robust regression by means of S-estimators (1984) Robust and Nonlinear Time Series Analysis, Lecture Notes in Statistics, 26, pp. 256-272. Franke, J., et al. (Eds.), Springer-Verlag, New York
  • Serneels, S., Verdonck, T., Principal component analysis for data containing outliers and missing elements (2008) Computational Statistics and Data Analysis, 52 (3), pp. 1712-1727. DOI 10.1016/j.csda.2007.05.024, PII S0167947307002241
  • Shi, L., Local influence in principal components analysis (1997) Biometrika, 84, pp. 175-186
  • Stahel, W.A., (1981) Robust Estimation: Infinitesimal Optimality and Covariance Matrix Estimators, Ph.D. Thesis. ETH, Zurich (in German)

Citations:

---------- APA ----------
Boente, G., Pires, A.M. & Rodrigues, I.M. (2010). Detecting influential observations in principal components and common principal components. Computational Statistics and Data Analysis, 54(12), 2967-2975.
http://dx.doi.org/10.1016/j.csda.2010.01.001
---------- CHICAGO ----------
Boente, G., Pires, A.M., Rodrigues, I.M. "Detecting influential observations in principal components and common principal components." Computational Statistics and Data Analysis 54, no. 12 (2010): 2967-2975.
http://dx.doi.org/10.1016/j.csda.2010.01.001
---------- MLA ----------
Boente, G., Pires, A.M., Rodrigues, I.M. "Detecting influential observations in principal components and common principal components." Computational Statistics and Data Analysis, vol. 54, no. 12, 2010, pp. 2967-2975.
http://dx.doi.org/10.1016/j.csda.2010.01.001
---------- VANCOUVER ----------
Boente, G., Pires, A.M., Rodrigues, I.M. Detecting influential observations in principal components and common principal components. Comput. Stat. Data Anal. 2010;54(12):2967-2975.
http://dx.doi.org/10.1016/j.csda.2010.01.001