Article

Abstract:

Recent advances in the accessibility of databases containing representations of complex objects (exemplified by repositories of time-series data, information about biological macromolecules, or knowledge about metabolic pathways) have not been matched by the availability of tools that facilitate the retrieval of objects of particular interest and aid in understanding their structure and relations. In applications such as the analysis of DNA sequences, on the other hand, requirements to retrieve objects on the basis of qualitative characteristics are poorly met by descriptions that emphasize precision and detail rather than structural features. This paper presents a method for the identification of interesting qualitative features in biological sequences. Our approach relies on a generalized clustering methodology in which the features being sought correspond to the solutions of a multivariable, multiobjective optimization problem, with features generally corresponding to fuzzy subsets of the object being represented. Foremost among the optimization objectives being considered are measures of the degree to which features resemble prototypical structures deemed to be interesting by database users. Other objectives include feature size and, in some cases, performance criteria related to domain-specific constraints. Genetic-algorithm methods are employed to solve the multiobjective optimization problem. These optimization algorithms discover candidate features as subsets of the object being described that lie in the set of all Pareto-optimal solutions of that problem. These candidate features are then summarized, again employing evolutionary-computation methods, and interrelated by employing domain-specific relations of interest to the end users. We present results of the application of this two-step method to the recognition and summarization of interesting features in DNA sequences of Trypanosoma cruzi.
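
As a rough illustration of the Pareto-optimality idea mentioned in the abstract, the following minimal sketch scores candidate subsequences on two objectives (similarity to a prototype and feature size) and keeps the non-dominated ones. It is not the authors' method: it uses crisp windows instead of fuzzy subsets and exhaustive enumeration instead of a genetic algorithm, and the names `dominates`, `pareto_front`, `score_windows`, the similarity measure, and the example sequence are all hypothetical choices made for this example.

```python
from typing import List, Sequence, Tuple

# (similarity to prototype, feature size); both objectives are maximized.
Score = Tuple[float, int]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b on every objective and strictly better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(scores: Sequence[Sequence[float]]) -> List[int]:
    """Return the indices of the score vectors that no other vector dominates."""
    return [
        i
        for i, s in enumerate(scores)
        if not any(dominates(t, s) for j, t in enumerate(scores) if j != i)
    ]


def score_windows(seq: str, prototype: str, min_len: int) -> List[Tuple[int, int, Score]]:
    """Enumerate windows of seq and score each one as (start, length, score).

    Assumption: similarity is the fraction of window positions that match the
    prototype aligned at the window start (a stand-in for a real resemblance measure).
    """
    candidates = []
    for length in range(min_len, len(prototype) + 1):
        for start in range(len(seq) - length + 1):
            window = seq[start:start + length]
            matches = sum(a == b for a, b in zip(window, prototype))
            candidates.append((start, length, (matches / length, length)))
    return candidates


if __name__ == "__main__":
    # Hypothetical toy sequence and prototype motif.
    candidates = score_windows("GGTATAAAGG", "TATAAA", min_len=4)
    front = pareto_front([score for _, _, score in candidates])
    for i in front:
        start, length, score = candidates[i]
        print(start, length, score)
```

In this toy case the exact prototype occurrence scores best on both objectives, so the front collapses to a single window. With noisier sequences, shorter high-similarity windows and longer lower-similarity windows would typically coexist on the front, which is the trade-off a multiobjective search is meant to expose.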

Record:

Document: Article
Title: Automated biological sequence description by genetic multiobjective generalized clustering
Authors: Zwir, I.; Zaliz, R.R.; Ruspini, E.H.
Affiliations: Department of Molecular Microbiology, Howard Hughes Med. Inst. Res. Labs., Washington Univ. School of Medicine, Saint Louis, MO 63110-1093, United States
Department of Computer Science, Facultad de Cie. Exact. y Naturales, Universidad de Buenos Aires, Buenos Aires, Argentina
Artificial Intelligence Center, SRI International, Menlo Park, CA, United States
Keywords: Biological DNA sequences; Feature elicitation; Generalized clustering; Hierarchy of evolution programs; Multiobjective genetic algorithms; Pareto optimality; Qualitative description; accuracy; automation; calculation; conference paper; data base; DNA sequence; gene cluster; gene sequence; genetic algorithm; information retrieval; mathematical analysis; nonhuman; performance; qualitative analysis; sequence analysis; short interspersed repeat; Trypanosoma cruzi; Protozoa; Trypanosoma
Year: 2002
Volume: 980
First page: 65
Last page: 82
DOI: http://dx.doi.org/10.1111/j.1749-6632.2002.tb04889.x
Journal title: Annals of the New York Academy of Sciences
Abbreviated journal title: Ann. New York Acad. Sci.
ISSN: 00778923
CODEN: ANYAA
Record: https://bibliotecadigital.exactas.uba.ar/collection/paper/document/paper_00778923_v980_n_p65_Zwir

Citations:

---------- APA ----------
Zwir, I., Zaliz, R.R. & Ruspini, E.H. (2002). Automated biological sequence description by genetic multiobjective generalized clustering. Annals of the New York Academy of Sciences, 980, 65-82.
http://dx.doi.org/10.1111/j.1749-6632.2002.tb04889.x
---------- CHICAGO ----------
Zwir, I., Zaliz, R.R., Ruspini, E.H. "Automated biological sequence description by genetic multiobjective generalized clustering." Annals of the New York Academy of Sciences 980 (2002): 65-82.
http://dx.doi.org/10.1111/j.1749-6632.2002.tb04889.x
---------- MLA ----------
Zwir, I., Zaliz, R.R., Ruspini, E.H. "Automated biological sequence description by genetic multiobjective generalized clustering." Annals of the New York Academy of Sciences, vol. 980, 2002, pp. 65-82.
http://dx.doi.org/10.1111/j.1749-6632.2002.tb04889.x
---------- VANCOUVER ----------
Zwir, I., Zaliz, R.R., Ruspini, E.H. Automated biological sequence description by genetic multiobjective generalized clustering. Ann. New York Acad. Sci. 2002;980:65-82.
http://dx.doi.org/10.1111/j.1749-6632.2002.tb04889.x