Abstract:
We discuss a Bayesian model selection approach to high-dimensional data in the deep under-sampling regime. The data is based on a representation of the possible discrete states s, as defined by the observer, and it consists of M observations of the state. This approach shows that, for a given sample size M, not all states observed in the sample can be distinguished. Rather, only a partition of the sampled states s can be resolved. Such a partition defines an emergent classification qs of the states that becomes finer and finer as the sample size increases, through a process of symmetry breaking between states. This allows us to distinguish between the resolution of a given representation of the observer-defined states s, which is given by the entropy of s, and its relevance, which is defined by the entropy of the partition qs. Relevance has a nonmonotonic dependence on resolution, for a given sample size. In addition, we characterise most relevant samples and we show that they exhibit power law frequency distributions, generally taken as signatures of criticality. This suggests that criticality reflects the relevance of a given representation of the states of a complex system, and does not necessarily require a specific mechanism of self-organisation to a critical point. © 2015 IOP Publishing Ltd and SISSA Medialab srl.
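The two entropies named in the abstract can be illustrated with a minimal sketch. It assumes, which the abstract does not state explicitly, that the partition qs groups together states observed the same number of times, so that relevance is the entropy of the observed frequency k. The function name `resolution_and_relevance` and the use of natural logarithms are illustrative choices, not taken from the paper.

```python
from collections import Counter
from math import log


def resolution_and_relevance(sample):
    """Return (H[s], H[k]) in nats for a list of observed discrete states.

    Assumption (not stated in the abstract): states seen the same number
    of times k fall in the same class of the partition, so relevance is
    taken to be the entropy of k.
    """
    M = len(sample)
    if M == 0:
        raise ValueError("sample must contain at least one observation")

    k_s = Counter(sample)        # k_s[s]: number of times state s was observed
    m_k = Counter(k_s.values())  # m_k[k]: number of states observed exactly k times

    # Resolution: entropy of the empirical distribution k_s / M over states.
    h_s = -sum((k / M) * log(k / M) for k in k_s.values())

    # Relevance: entropy of the distribution k * m_k / M over frequency classes.
    h_k = -sum((k * m / M) * log(k * m / M) for k, m in m_k.items())

    return h_s, h_k
```

Under this assumption, a sample in which every state appears once has maximal resolution (log M) but zero relevance, and a sample with a single repeated state has zero resolution and zero relevance. Relevance is positive only in between, which is consistent with the nonmonotonic dependence described above.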
Record:
Document: | Article |
Title: | Criticality of mostly informative samples: A Bayesian model selection approach |
Author: | Haimovici, A.; Marsili, M. |
Affiliation: | Departamento de Física, Facultad de Ciencias Exactas y Naturales, Universidad de Buenos Aires, Buenos Aires, 1053, Argentina; Abdus Salam International Centre for Theoretical Physics, Strada Costiera, 411 Trieste, Trieste, 34151, Italy |
Keywords: | data mining (theory); statistical inference |
Year: | 2015 |
Volume: | 2015 |
Issue: | 10 |
DOI: | http://dx.doi.org/10.1088/1742-5468/2015/10/P10013 |
Journal title: | Journal of Statistical Mechanics: Theory and Experiment |
Abbreviated journal title: | J. Stat. Mech. Theory Exp. |
ISSN: | 1742-5468 |
Record: | https://bibliotecadigital.exactas.uba.ar/collection/paper/document/paper_17425468_v2015_n10_p_Haimovici |
References:
- Cover, T.M., Thomas, J.A., (1991) Elements of Information Theory, (New York: Wiley)
- Marsili, M., Mastromatteo, I., Roudi, Y., On sampling and modeling complex systems (2013) J. Stat. Mech., 2013 (9)
- Clauset, A., Shalizi, C.R., Newman, M.E.J., Power-law distributions in empirical data (2009) SIAM Rev., 51, pp. 661-703
- Zipf, G.K., (1932) Selected Studies of the Principle of Relative Frequency in Language, (Cambridge, MA: Harvard University Press)
- Ruderman, D., Bialek, W., Statistics of natural images: Scaling in the woods (1994) Phys. Rev. Lett., 73, pp. 814-817
- Eguíluz, V.M., Chialvo, D.R., Cecchi, G.A., Baliki, M., Apkarian, A.V., Scale-free brain functional networks (2005) Phys. Rev. Lett., 94
- Schneidman, E., Berry, M.J., Segev, R., Bialek, W., Weak pairwise correlations imply strongly correlated network states in a neural population (2006) Nature, 440, pp. 1007-1012
- Gabaix, X., Zipf's law for cities: An explanation (1999) Q. J. Econ., 114, pp. 739-767
- Bak, P., (1996) How Nature Works: The Science of Self-Organized Criticality, (New York: Copernicus)
- Nemenman, I., Shafee, F., Bialek, W., Entropy and inference, revisited (2001) Adv. Neural Inf. Process., 14
- Orbanz, P., Teh, Y.W., (2010) Bayesian Nonparametric Models, Encyclopedia of Machine Learning, pp. 88-89, (Berlin: Springer)
- Good, I.J., The population frequencies of species and the estimation of population parameters (1953) Biometrika, 40, pp. 237-264
- Giada, L., Marsili, M., Algorithms of maximum likelihood data clustering with applications (2002) Physica, 315, pp. 650-664
- Marsili, M., Dissecting financial markets: Sectors and states (2002) Quant. Finance, 2, pp. 297-302
- Hubert, L., Arabie, P., Comparing partitions (1985) J. Classif., 2, pp. 193-218
- Grigolon, S., Franz, S., Marsili, M., (2015) Identifying Relevant Positions in Proteins by Critical Variable Selection, arXiv:1503.03815
- Klein, M.L., Carnevale, V., Palovcak, E., Delemotte, L., Evolutionary imprint of activation: The design principles of VSDs (2014) J. Gen. Physiol., 143, pp. 145-156
- Mora, T., Bialek, W., Are biological systems poised at criticality? (2011) J. Stat. Phys., 144, pp. 268-302
Citations:
---------- APA ----------
Haimovici, A. & Marsili, M. (2015). Criticality of mostly informative samples: A Bayesian model selection approach. Journal of Statistical Mechanics: Theory and Experiment, 2015(10).
http://dx.doi.org/10.1088/1742-5468/2015/10/P10013
---------- CHICAGO ----------
Haimovici, A., Marsili, M. "Criticality of mostly informative samples: A Bayesian model selection approach." Journal of Statistical Mechanics: Theory and Experiment 2015, no. 10 (2015).
http://dx.doi.org/10.1088/1742-5468/2015/10/P10013
---------- MLA ----------
Haimovici, A., Marsili, M. "Criticality of mostly informative samples: A Bayesian model selection approach." Journal of Statistical Mechanics: Theory and Experiment, vol. 2015, no. 10, 2015.
http://dx.doi.org/10.1088/1742-5468/2015/10/P10013
---------- VANCOUVER ----------
Haimovici, A., Marsili, M. Criticality of mostly informative samples: A Bayesian model selection approach. J. Stat. Mech. Theory Exp. 2015;2015(10).
http://dx.doi.org/10.1088/1742-5468/2015/10/P10013