AQPrius: Offline Approximate Query Processing Enhanced by
Error Assessment using Bootstrap Sampling
Feng Yu, Sabin Maharjan, Lucy Kerns, Xiangjia Min, Abdu Arslanyilmaz, Michelle Zhu
Pages 30-47 | Revised 31-08-2024 | Published 01-10-2024
KEYWORDS
Approximate Query Processing, Bootstrap Sampling, Big Data.
ABSTRACT
In this work, we present AQPrius, an offline approximate query processing (AQP) engine that can
efficiently answer complex analytic queries on large datasets. Unlike existing systems that
employ online AQP schemes, AQPrius employs an offline AQP scheme, which has two
advantages: (1) it does not require high-end hardware or expensive auxiliary data structures such
as indices or hash tables; (2) the collected synopses are reusable for future queries on the same
database, which can significantly save computing resources. However, error assessment for
offline AQP systems remains a challenging problem. The contributions of this research are fourfold.
First, AQPrius is an offline AQP engine that can quickly answer common analytic queries involving
selection conditions, join conditions, and aggregate functions, speeding up complex query
processing on big data. Second, AQPrius enables error assessment using a non-parametric
statistical method, namely bootstrap sampling, which provides the standard error of a query
estimate. Third, using the standard error obtained by bootstrap sampling, we extend the traditional
offline AQP system from providing a single-point query estimate to a range estimate, that is, a
bounded answer presented as a confidence interval (CI). Finally, the system is developed in
the Rust programming language, which can prevent many security issues and potential
vulnerabilities. We evaluate AQPrius using the well-known TPC-H benchmark. The experimental
results show that AQPrius can rapidly generate accurate bounded query answers for various test
queries with selection and join conditions.
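The abstract combines three pieces: an offline synopsis, a point estimate computed from it, and a bootstrap standard error that turns the point estimate into a confidence interval. The Python sketch below illustrates that pipeline for a single-table SUM query with a selection predicate. It is not the AQPrius implementation (which the authors wrote in Rust). The uniform random sample used as the synopsis, the N/n scale-up estimator, the hypothetical helper names `build_synopsis`, `estimate_sum`, and `bootstrap_ci`, and the normal-approximation interval (estimate ± z·SE, with z = 1.96 for roughly 95%) are all assumptions made for illustration.

```python
import random
import statistics
from typing import Callable, Sequence

# Hypothetical row representation: column name -> numeric value.
Row = dict


def build_synopsis(table: Sequence[Row], n: int, seed: int = 0) -> list:
    """Offline step (assumed): keep a uniform random sample of n rows, drawn once and reused."""
    rng = random.Random(seed)
    return rng.sample(list(table), n)


def estimate_sum(sample: Sequence[Row], population_size: int,
                 predicate: Callable[[Row], bool], column: str) -> float:
    """Point estimate of SUM(column) WHERE predicate, scaled up by N/n."""
    scale = population_size / len(sample)
    return scale * sum(row[column] for row in sample if predicate(row))


def bootstrap_ci(sample: Sequence[Row], population_size: int,
                 predicate: Callable[[Row], bool], column: str,
                 b: int = 1000, z: float = 1.96, seed: int = 1):
    """Return (estimate, bootstrap standard error, (lower, upper))."""
    rng = random.Random(seed)
    estimate = estimate_sum(sample, population_size, predicate, column)
    replicates = []
    for _ in range(b):
        # Resample the synopsis with replacement, keeping the original sample size.
        resample = rng.choices(sample, k=len(sample))
        replicates.append(estimate_sum(resample, population_size, predicate, column))
    # The standard deviation of the replicate estimates is the bootstrap standard error.
    se = statistics.stdev(replicates)
    # Normal-approximation interval (assumption): estimate +/- z * SE.
    return estimate, se, (estimate - z * se, estimate + z * se)


if __name__ == "__main__":
    # Synthetic table; the exact answer to the query below is 590000.
    table = [{"price": float(i % 100), "qty": i % 50} for i in range(100_000)]
    synopsis = build_synopsis(table, n=2_000)
    est, se, (lo, hi) = bootstrap_ci(synopsis, len(table),
                                     lambda r: r["qty"] < 10, "price", b=500)
    print(f"SUM(price) WHERE qty < 10 ~ {est:.0f}, SE {se:.0f}, 95% CI [{lo:.0f}, {hi:.0f}]")
```

Because the synopsis is drawn once and the bootstrap only resamples it in memory, later queries on the same table can reuse it without touching the base data, which mirrors the reusability argument above. Join queries and finite-population corrections are deliberately left out of this sketch.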
Dr. Feng Yu
Computer Science and Information Systems, Youngstown State University, Youngstown, OH 44555 - United States of America
fyu@ysu.edu
Mr. Sabin Maharjan
Computer Science and Information Systems, Youngstown State University, Youngstown, OH 44555 - United States of America
Dr. Lucy Kerns
Statistics and Mathematics, Youngstown State University, Youngstown, OH 44555 - United States of America
Dr. Xiangjia Min
Bioinformatics and Plant Biology, Youngstown State University, Youngstown, OH 44555 - United States of America
Dr. Abdu Arslanyilmaz
Computer Science and Information Systems, Youngstown State University, Youngstown, OH 44555 - United States of America
Dr. Michelle Zhu
School of Computing, Montclair State University, Montclair, NJ 07043 - United States of America