Article

Consens, M.P.; Miller, R.J.; Rizzolo, F.; Vaisman, A.A. "Exploring XML Web collections with DescribeX" (2010) ACM Transactions on the Web. 4(3)

Abstract:

As Web applications mature and evolve, the nature of the semistructured data that drives these applications also changes. An important trend is the need for increased flexibility in the structure of Web documents. Hence, applications cannot rely solely on schemas to provide the complex knowledge needed to visualize, use, query and manage documents. Even when XML Web documents are valid with regard to a schema, the actual structure of such documents may exhibit significant variations across collections for several reasons: the schema may be very lax (e.g., RSS feeds), the schema may be large and different subsets of it may be used in different documents (e.g., industry standards like UBL), or open content models may allow arbitrary schemas to be mixed (e.g., RSS extensions like those used for podcasting). For these reasons, many applications that incorporate XPath queries to process a large Web document collection require an understanding of the actual structure present in the collection, and not just the schema. To support modern Web applications, we introduce DescribeX, a powerful framework that is capable of describing complex XML summaries of Web collections. DescribeX supports the construction of heterogeneous summaries that can be declaratively defined and refined by means of axis path regular expressions (AxPREs). AxPREs provide the flexibility necessary for declaratively defining complex mappings between instance nodes (in the documents) and summary nodes. These mappings are capable of expressing order and cardinality, among other properties, which can significantly help in the understanding of the structure of large collections of XML documents and enhance the performance of Web applications over these collections. DescribeX captures most summary proposals in the literature by providing (for the first time) a common declarative definition for them.
Experimental results demonstrate the scalability of DescribeX summary operations (summary creation, as well as refinement and stabilization, two key enablers for tailoring summaries) on multi-gigabyte Web collections. © 2010 ACM.

Record:

Document: Article
Title: Exploring XML Web collections with DescribeX
Authors: Consens, M.P.; Miller, R.J.; Rizzolo, F.; Vaisman, A.A.
Affiliation: University of Toronto, 40 St. George St., Toronto, Canada
University of Ottawa, SITE, 800 King Edward St., Ottawa, Canada
Universidad de Buenos Aires, Facultad de Ciencias Exactas y Naturales, Ciudad Universitaria, Buenos Aires, Argentina
Keywords: Semistructured data; Structural summaries; XML; XPath; Cardinalities; Complex mapping; Increased flexibility; Industry standards; Open content; Podcasting; Regular expressions; Schemas; Semi structured data; Structural summary; Web application; Web collections; Web document; Web document collection; XPath queries; Markup languages; Rough set theory; World Wide Web
Year: 2010
Volume: 4
Issue: 3
DOI: http://dx.doi.org/10.1145/1806916.1806920
Journal title: ACM Transactions on the Web
Abbreviated journal title: ACM Trans. Web
ISSN: 1559-1131
Record: https://bibliotecadigital.exactas.uba.ar/collection/paper/document/paper_15591131_v4_n3_p_Consens

Citations:

---------- APA ----------
Consens, M.P., Miller, R.J., Rizzolo, F. & Vaisman, A.A. (2010). Exploring XML Web collections with DescribeX. ACM Transactions on the Web, 4(3).
http://dx.doi.org/10.1145/1806916.1806920
---------- CHICAGO ----------
Consens, M.P., Miller, R.J., Rizzolo, F., Vaisman, A.A. "Exploring XML Web collections with DescribeX." ACM Transactions on the Web 4, no. 3 (2010).
http://dx.doi.org/10.1145/1806916.1806920
---------- MLA ----------
Consens, M.P., Miller, R.J., Rizzolo, F., Vaisman, A.A. "Exploring XML Web collections with DescribeX." ACM Transactions on the Web, vol. 4, no. 3, 2010.
http://dx.doi.org/10.1145/1806916.1806920
---------- VANCOUVER ----------
Consens, M.P., Miller, R.J., Rizzolo, F., Vaisman, A.A. Exploring XML Web collections with DescribeX. ACM Trans. Web. 2010;4(3).
http://dx.doi.org/10.1145/1806916.1806920