Conference paper

Garbervetsky, D.; Pavlinovic, Z.; Barnett, M.; Musuvathi, M.; Mytkowicz, T.; Zoppi, E.; Zisman A.; Bodden E.; Schafer W.; van Deursen A.; Special Interest Group on Software Engineering (ACM SIGSOFT) "Static analysis for optimizing big data queries" (2017) 11th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering, ESEC/FSE 2017. Part F130154:932-937

Abstract:

Query languages for big data analysis provide user extensibility through a mechanism of user-defined operators (UDOs). These operators allow programmers to write proprietary functionalities on top of a relational query skeleton. However, achieving effective query optimization for such languages is extremely challenging since the optimizer needs to understand data dependencies induced by UDOs. SCOPE, the query language from Microsoft, allows for hand-coded declarations of UDO data dependencies. Unfortunately, most programmers avoid using this facility since writing and maintaining the declarations is tedious and error-prone. In this work, we designed and implemented two sound and robust static analyses for computing UDO data dependencies. The analyses can detect which columns of an input table are never used or pass through a UDO unchanged. This information can be used to significantly improve execution of SCOPE scripts. We evaluate our analyses on thousands of real-world queries and show we can catch many unused and pass-through columns automatically without relying on any manually provided declarations. © 2017 Association for Computing Machinery.
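
To make the two properties named in the abstract concrete, the following is a minimal sketch and not the authors' analysis: it assumes a hypothetical UDO written in Python (3.9 or later) as a function from one dict row to another, and inspects its syntax tree to report which input columns are never read and which are copied to the output unchanged. The names `UDO_SOURCE`, `INPUT_COLUMNS`, `column_read`, and `analyze` are illustrative assumptions. Whenever the row is used in a way the sketch does not recognize, it falls back to the conservative answer (every column used, none passed through).

```python
import ast
import textwrap

# Hypothetical UDO over dict rows (illustrative only, not SCOPE code).
UDO_SOURCE = textwrap.dedent('''
    def process(row):
        return {
            "id": row["id"],
            "total": row["price"] * row["qty"],
        }
''')

# Hypothetical schema of the input table.
INPUT_COLUMNS = {"id", "price", "qty", "comment"}


def column_read(node, param):
    """Return the column name if node is a read of param["<name>"], else None."""
    if (isinstance(node, ast.Subscript)
            and isinstance(node.ctx, ast.Load)
            and isinstance(node.value, ast.Name)
            and node.value.id == param
            and isinstance(node.slice, ast.Constant)
            and isinstance(node.slice.value, str)):
        return node.slice.value
    return None


def analyze(source, input_columns):
    """Return (unused, passthrough) column sets for a one-parameter row function."""
    func = ast.parse(source).body[0]
    param = func.args.args[0].arg
    nodes = list(ast.walk(func))

    reads = [c for c in (column_read(n, param) for n in nodes) if c is not None]
    mentions = sum(isinstance(n, ast.Name) and n.id == param for n in nodes)
    if mentions != len(reads):
        # The row escapes or is accessed dynamically: assume nothing.
        return set(), set()

    # A column is pass-through only if every return copies it under the same name.
    per_return = []
    for n in nodes:
        if isinstance(n, ast.Return):
            if not isinstance(n.value, ast.Dict):
                per_return = [set()]
                break
            per_return.append({
                k.value
                for k, v in zip(n.value.keys, n.value.values)
                if isinstance(k, ast.Constant)
                and isinstance(k.value, str)
                and column_read(v, param) == k.value
            })
    passthrough = set.intersection(*per_return) if per_return else set()
    unused = set(input_columns) - set(reads)
    return unused, passthrough


unused, passthrough = analyze(UDO_SOURCE, INPUT_COLUMNS)
print(sorted(unused), sorted(passthrough))  # prints ['comment'] ['id']
```

In this toy setting, an optimizer could avoid reading `comment` before the operator runs and could treat `id` as unchanged by it, which is the kind of information the abstract says the analyses recover automatically.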

Record:

Document: Conference paper
Title: Static analysis for optimizing big data queries
Author: Garbervetsky, D.; Pavlinovic, Z.; Barnett, M.; Musuvathi, M.; Mytkowicz, T.; Zoppi, E.; Zisman A.; Bodden E.; Schafer W.; van Deursen A.; Special Interest Group on Software Engineering (ACM SIGSOFT)
Affiliation: Universidad de Buenos Aires, FCEyN, DC ICC, CONICET, Argentina
New York University, United States
Microsoft Research, United States
Keywords: Big Data; Query optimization; Static analysis; UDOs; Query languages; Software engineering; Data dependencies; Data query; Error prones; Optimizers; Real-world; Relational queries
Year: 2017
Volume: Part F130154
First page: 932
Last page: 937
DOI: http://dx.doi.org/10.1145/3106237.3117774
Journal title: 11th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering, ESEC/FSE 2017
Abbreviated journal title: Proc ACM SIGSOFT Symp Found Software Eng
Record: https://bibliotecadigital.exactas.uba.ar/collection/paper/document/paper_97814503_vPartF130154_n_p932_Garbervetsky

Citations:

---------- APA ----------
Garbervetsky, D., Pavlinovic, Z., Barnett, M., Musuvathi, M., Mytkowicz, T., Zoppi, E., Zisman A., ..., Special Interest Group on Software Engineering (ACM SIGSOFT) (2017). Static analysis for optimizing big data queries. 11th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering, ESEC/FSE 2017, Part F130154, 932-937.
http://dx.doi.org/10.1145/3106237.3117774
---------- CHICAGO ----------
Garbervetsky, D., Pavlinovic, Z., Barnett, M., Musuvathi, M., Mytkowicz, T., Zoppi, E., et al. "Static analysis for optimizing big data queries." 11th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering, ESEC/FSE 2017 Part F130154 (2017): 932-937.
http://dx.doi.org/10.1145/3106237.3117774
---------- MLA ----------
Garbervetsky, D., Pavlinovic, Z., Barnett, M., Musuvathi, M., Mytkowicz, T., Zoppi, E., et al. "Static analysis for optimizing big data queries." 11th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering, ESEC/FSE 2017, vol. Part F130154, 2017, pp. 932-937.
http://dx.doi.org/10.1145/3106237.3117774
---------- VANCOUVER ----------
Garbervetsky, D., Pavlinovic, Z., Barnett, M., Musuvathi, M., Mytkowicz, T., Zoppi, E., et al. Static analysis for optimizing big data queries. Proc ACM SIGSOFT Symp Found Software Eng. 2017;Part F130154:932-937.
http://dx.doi.org/10.1145/3106237.3117774