Abstract:
Real data may contain both cellwise and casewise outliers. There is a vast literature on robust estimation under casewise outliers, but only a scant literature on cellwise outliers and almost none on both types together. Estimation of multivariate location and scatter matrix is a cornerstone of multivariate data analysis. A two-step approach was recently proposed for robust estimation of multivariate location and scatter matrix in the presence of cellwise and casewise outliers. In the first step, a univariate filter is applied to remove cellwise outliers. In the second step, a generalized S-estimator is used to downweight casewise outliers. This proposal can be improved in three main directions: first, by introducing a consistent bivariate filter to be used in combination with the univariate filter in the first step; second, by proposing a new fast subsampling procedure to generate starting points for the generalized S-estimator in the second step; third, by using a non-monotonic weight function for the generalized S-estimator to better handle casewise outliers in high dimension. A simulation study and a real data example show that, unlike the original two-step procedure, the modified two-step approach performs and scales well in high dimension. They also show that the modified procedure outperforms the original one and other state-of-the-art robust procedures under cellwise and casewise data contamination. © 2017 Elsevier B.V.
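The abstract names a univariate filter for the first step but does not define it. As a rough illustration only, the sketch below uses a simplified stand-in: a median/MAD rule that marks cells with a large robust z-score as missing. The function name `univariate_filter`, the `cutoff` value of 3.0, and the median/MAD rule itself are assumptions made for this example and are not taken from the paper.

```python
import numpy as np


def univariate_filter(X, cutoff=3.0):
    """Simplified stand-in for a cellwise filter (not the paper's definition).

    Each column is centred at its median and scaled by its normalised MAD.
    Cells whose robust z-score exceeds `cutoff` are replaced with NaN.
    """
    X = np.asarray(X, dtype=float)
    med = np.nanmedian(X, axis=0)
    # 1.4826 makes the MAD consistent with the standard deviation
    # under a normal model.
    mad = 1.4826 * np.nanmedian(np.abs(X - med), axis=0)
    # Guard against a zero scale in constant columns (assumption: fall back to 1).
    mad = np.where(mad > 0, mad, 1.0)
    z = np.abs(X - med) / mad
    filtered = X.copy()
    filtered[z > cutoff] = np.nan
    return filtered


if __name__ == "__main__":
    data = np.array([
        [1.0, 10.0],
        [2.0, 11.0],
        [3.0, 12.0],
        [4.0, 13.0],
        [100.0, 14.0],  # the first cell of this row is a cellwise outlier
    ])
    print(univariate_filter(data))
```

In a two-step scheme of the kind the abstract describes, the filtered matrix would then go to a second-step estimator that accepts missing cells and downweights casewise outliers. That step is not sketched here.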
Record:
Document: | Article |
Title: | Multivariate location and scatter matrix estimation under cellwise and casewise contamination |
Author: | Leung, A.; Yohai, V.; Zamar, R. |
Affiliation: | Department of Statistics, University of British Columbia, 3182-2207 Main Mall, Vancouver, British Columbia, V6T 1Z4, Canada; Departamento de Matemática, Facultad de Ciencias Exactas y Naturales, Universidad de Buenos Aires, Ciudad Universitaria, Pabellón 1, Buenos Aires, 1426, Argentina |
Keywords: | Cellwise outliers; Componentwise contamination; Multivariate location and scatter; Robust estimation; Location; Matrix algebra; Multivariant analysis; Componentwise; Multivariate data analysis; Robust procedures; Simulation studies; Two-step approach; Two-step procedure; Statistics |
Year: | 2017 |
Volume: | 111 |
Start page: | 59 |
End page: | 76 |
DOI: | http://dx.doi.org/10.1016/j.csda.2017.02.007 |
Journal title: | Computational Statistics and Data Analysis |
Abbreviated journal title: | Comput. Stat. Data Anal. |
ISSN: | 01679473 |
CODEN: | CSDAD |
Record: | https://bibliotecadigital.exactas.uba.ar/collection/paper/document/paper_01679473_v111_n_p59_Leung |
References:
- Agostinelli, C., Leung, A., Yohai, V.J., Zamar, R.H., Rejoinder on: Robust estimation of multivariate location and scatter in the presence of cellwise and casewise contamination (2015) TEST, 24 (3), pp. 484-488
- Agostinelli, C., Leung, A., Yohai, V.J., Zamar, R.H., Robust estimation of multivariate location and scatter in the presence of cellwise and casewise contamination (2015) TEST, 24 (3), pp. 441-461
- Alqallaf, F.A., Konis, K.P., Martin, R.D., Zamar, R.H., Scalable robust covariance and correlation estimates for data mining (2002) Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '02, pp. 14-23
- Alqallaf, F., Van Aelst, S., Yohai, V.J., Zamar, R.H., Propagation of outliers in multivariate data (2009) Ann. Statist., 37 (1), pp. 311-331
- Danilov, M., Yohai, V.J., Zamar, R.H., Robust estimation of multivariate location and scatter in the presence of missing data (2012) J. Amer. Statist. Assoc., 107, pp. 1178-1186
- Farcomeni, A., Robust constrained clustering in presence of entry-wise outliers (2014) Technometrics, 56, pp. 102-111
- Friedman, J., Hastie, T., Tibshirani, R., Sparse inverse covariance estimation with the graphical lasso (2008) Biostatistics, 9 (3), pp. 432-441
- Gnanadesikan, R., Kettenring, J.R., Robust estimates, residuals, and outlier detection with multiresponse data (1972) Biometrics, 28, pp. 81-124
- Hall, P., Marron, J., Neeman, A., Geometric representation of high dimension, low sample size data (2005) J. R. Stat. Soc. Ser. B Stat. Methodol., 67, pp. 427-444
- Leung, A., Danilov, M., Yohai, V., Zamar, R., GSE: Robust Estimation in the Presence of Cellwise and Casewise Contamination and Missing Data (2015), R package version 3.2.3
- Maronna, R.A., Comments on: Robust estimation of multivariate location and scatter in the presence of cellwise and casewise contamination (2015) TEST, 24 (3), pp. 471-472
- Maronna, R.A., Martin, R.D., Yohai, V.J., Robust Statistics: Theory and Methods (2006), John Wiley & Sons, Chichester
- Maronna, R.A., Yohai, V.J., Robust and efficient estimation of high dimensional scatter and location (2015)
- Martin, R., Robust covariances: Common risk versus specific risk outliers (2013), presented at the 2013 R-Finance Conference, Chicago, IL, www.rinfinance.com/agenda/2013/talk/DougMartin.pdf (visited 2016-08-24)
- Peña, D., Prieto, F.J., Multivariate outlier detection and robust covariance matrix estimation (2001) Technometrics, 43, pp. 286-310
- Rocke, D.M., Robustness properties of S-estimators of multivariate location and shape in high dimension (1996) Ann. Statist., 24, pp. 1327-1345
- Rousseeuw, P.J., Croux, C., Alternatives to the median absolute deviation (1993) J. Amer. Statist. Assoc., 88, pp. 1273-1283
- Rousseeuw, P.J., Van den Bossche, W., Comments on: Robust estimation of multivariate location and scatter in the presence of cellwise and casewise contamination (2015) TEST, 24 (3), pp. 473-477
- Rousseeuw, P.J., Van den Bossche, W., Detecting deviating data cells (2016), [stat.ME]
- Van Aelst, S., Vandervieren, E., Willems, G., A Stahel-Donoho estimator based on Huberized outlyingness (2012) Comput. Statist. Data Anal., 56, pp. 531-542
Citations:
---------- APA ----------
Leung, A., Yohai, V. & Zamar, R. (2017). Multivariate location and scatter matrix estimation under cellwise and casewise contamination. Computational Statistics and Data Analysis, 111, 59-76.
http://dx.doi.org/10.1016/j.csda.2017.02.007
---------- CHICAGO ----------
Leung, A., Yohai, V., Zamar, R. "Multivariate location and scatter matrix estimation under cellwise and casewise contamination." Computational Statistics and Data Analysis 111 (2017): 59-76.
http://dx.doi.org/10.1016/j.csda.2017.02.007
---------- MLA ----------
Leung, A., Yohai, V., Zamar, R. "Multivariate location and scatter matrix estimation under cellwise and casewise contamination." Computational Statistics and Data Analysis, vol. 111, 2017, pp. 59-76.
http://dx.doi.org/10.1016/j.csda.2017.02.007
---------- VANCOUVER ----------
Leung, A., Yohai, V., Zamar, R. Multivariate location and scatter matrix estimation under cellwise and casewise contamination. Comput. Stat. Data Anal. 2017;111:59-76.
http://dx.doi.org/10.1016/j.csda.2017.02.007