Article

Abstract:

All known terrestrial proteins are coded as contiguous strings over an alphabet of ≈20 amino acids. The patterns formed by the repetitions of elements in groups of finite sequences describe the natural architectures of protein families. We present a method to search for patterns and groupings of patterns in protein sequences using a mathematically precise definition of "repetition", an efficient algorithmic implementation, and a robust scoring system with no adjustable parameters. We show that the sequence patterns can be well separated into disjoint classes according to their recurrence in nested structures. The statistics of pattern occurrences indicate that short repetitions are sufficient to separate natural families from randomized groups of sequences by more than 10 standard deviations, while contiguous sequence patterns shorter than 5 residues are effectively random in their occurrences. A small subset of patterns is sufficient to support a robust definition of "familiarity" between arbitrary sets of sequences. © 2018 American Chemical Society.

Record:

Document: Article
Title: On the Natural Structure of Amino Acid Patterns in Families of Protein Sequences
Authors: Turjanski, P.; Ferreiro, D.U.
Affiliations: KAPOW, Departamento de Computación, Facultad de Ciencias Exactas y Naturales, UBA-CONICET-ICC, Buenos Aires, Argentina
Protein Physiology Lab, Departamento de Química Biológica, Facultad de Ciencias Exactas y Naturales, UBA-CONICET-IQUIBICEN, Buenos Aires, Argentina
Keywords: Amino acids; Bioinformatics; Structure (composition); Adjustable parameters; Amino acid patterns; Natural architecture; Natural structures; Precise definition; Protein sequences; Sequence patterns; Standard deviation; Proteins
Year: 2018
Volume: 122
Issue: 49
First page: 11295
Last page: 11301
DOI: http://dx.doi.org/10.1021/acs.jpcb.8b07206
Journal title: Journal of Physical Chemistry B
Abbreviated journal title: J Phys Chem B
ISSN: 1520-6106
CODEN: JPCBF
Record: https://bibliotecadigital.exactas.uba.ar/collection/paper/document/paper_15206106_v122_n49_p11295_Turjanski

Citations:

---------- APA ----------
Turjanski, P. & Ferreiro, D.U. (2018). On the Natural Structure of Amino Acid Patterns in Families of Protein Sequences. Journal of Physical Chemistry B, 122(49), 11295-11301.
http://dx.doi.org/10.1021/acs.jpcb.8b07206
---------- CHICAGO ----------
Turjanski, P., Ferreiro, D.U. "On the Natural Structure of Amino Acid Patterns in Families of Protein Sequences." Journal of Physical Chemistry B 122, no. 49 (2018): 11295-11301.
http://dx.doi.org/10.1021/acs.jpcb.8b07206
---------- MLA ----------
Turjanski, P., Ferreiro, D.U. "On the Natural Structure of Amino Acid Patterns in Families of Protein Sequences." Journal of Physical Chemistry B, vol. 122, no. 49, 2018, pp. 11295-11301.
http://dx.doi.org/10.1021/acs.jpcb.8b07206
---------- VANCOUVER ----------
Turjanski, P., Ferreiro, D.U. On the Natural Structure of Amino Acid Patterns in Families of Protein Sequences. J Phys Chem B. 2018;122(49):11295-11301.
http://dx.doi.org/10.1021/acs.jpcb.8b07206