AQPrius: Offline Approximate Query Processing Enhanced by Error Assessment using Bootstrap Sampling

Feng Yu, Sabin Maharjan, Lucy Kerns, Xiangjia Min, Abdu Arslanyilmaz, Michelle Zhu

Pages 30-47 | Revised 31-08-2024 | Published 01-10-2024

Published in International Journal of Computer Science and Security (IJCSS)

Volume 18, Issue 3 | Publication Date: October 2024

KEYWORDS

Approximate Query Processing, Bootstrap Sampling, Big Data.

ABSTRACT

In this work, we present AQPrius, an offline approximate query processing (AQP) engine that can efficiently answer complex analytic queries on large datasets. Unlike existing systems that employ online AQP schemes, AQPrius employs an offline AQP scheme, which has two advantages: (1) it does not require high-end hardware or expensive auxiliary data structures such as indices or hash tables; and (2) the synopses it collects can be reused for future queries on the same database, which can significantly save computing resources. However, error assessment for offline AQP systems remains a challenging problem. The contributions of this research are fourfold. First, AQPrius is an offline AQP engine that can quickly answer common analytic queries involving selection conditions, join conditions, and aggregate functions, speeding up complex query processing on big data. Second, AQPrius enables error assessment using a nonparametric statistical method, bootstrap sampling, which provides the standard error of a query estimate. Third, using this bootstrap standard error, we extend the traditional offline AQP system from a single-point query estimate to a range estimate: a bounded answer presented as a confidence interval (CI). Finally, the system is developed in the Rust programming language, which can prevent many security issues and potential vulnerabilities. We evaluate AQPrius using the well-known TPC-H benchmark. The experimental results show that AQPrius can rapidly generate accurate bounded query answers for various test queries with selection and join conditions.
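The error-assessment idea described in the abstract (a reusable offline sample, a scaled-up estimate, a bootstrap standard error, and a confidence interval built from that error) can be illustrated with a small sketch. This is not AQPrius's implementation, which is written in Rust. It assumes a hypothetical single table with `region` and `price` columns, a simple random sample as the synopsis, and a normal-approximation interval of the form estimate ± 1.96·SE; every function name below is hypothetical and chosen for illustration only.

```python
import random
import statistics


# Hypothetical fact table: each row is (region, price). It stands in for a
# large relation; the column names are illustrative, not taken from AQPrius.
def make_table(n_rows, seed=7):
    rng = random.Random(seed)
    regions = ["ASIA", "EUROPE", "AFRICA"]
    return [(rng.choice(regions), rng.uniform(1.0, 100.0)) for _ in range(n_rows)]


def build_synopsis(table, sample_size, seed=42):
    """Offline step (assumed): draw a simple random sample without replacement once.

    The stored sample is then reused for many later queries.
    """
    rng = random.Random(seed)
    return rng.sample(table, sample_size)


def estimate_sum(sample, population_size, predicate):
    """Estimate SELECT SUM(price) WHERE predicate(region) by scaling the sample total."""
    scale = population_size / len(sample)
    return scale * sum(price for region, price in sample if predicate(region))


def bootstrap_se(sample, population_size, predicate, n_boot=200, seed=1):
    """Bootstrap standard error: resample the synopsis with replacement,
    rerun the estimator on each resample, and return the standard deviation
    of the replicate estimates."""
    rng = random.Random(seed)
    replicates = []
    for _ in range(n_boot):
        resample = rng.choices(sample, k=len(sample))
        replicates.append(estimate_sum(resample, population_size, predicate))
    return statistics.stdev(replicates)


def confidence_interval(estimate, se, z=1.96):
    """Normal-approximation interval estimate +/- z * SE (z = 1.96 gives about 95%)."""
    return estimate - z * se, estimate + z * se


def in_asia(region):
    return region == "ASIA"


def in_europe(region):
    return region == "EUROPE"


table = make_table(100_000)
synopsis = build_synopsis(table, sample_size=2_000)  # built once, offline

est = estimate_sum(synopsis, len(table), in_asia)
se = bootstrap_se(synopsis, len(table), in_asia)
low, high = confidence_interval(est, se)
exact = sum(price for region, price in table if in_asia(region))
print(f"estimate={est:.0f} SE={se:.0f} 95% CI=({low:.0f}, {high:.0f}) exact={exact:.0f}")

# The same synopsis answers a different query without rescanning the base table.
est_eu = estimate_sum(synopsis, len(table), in_europe)
```

The bootstrap here treats the synopsis as a stand-in for the full table; with a 2% sample the finite-population correction is negligible. A percentile interval (the 2.5th and 97.5th percentiles of the replicate estimates) is a common alternative to the normal-approximation interval used in this sketch.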

MANUSCRIPT AUTHORS

Dr. Feng Yu

Computer Science and Information Systems, Youngstown State University, Youngstown, OH 44555 - United States of America

fyu@ysu.edu

Mr. Sabin Maharjan

Computer Science and Information Systems, Youngstown State University, Youngstown, OH 44555 - United States of America

Dr. Lucy Kerns

Statistics and Mathematics, Youngstown State University, Youngstown, OH 44555 - United States of America

Dr. Xiangjia Min

Bioinformatics and Plant Biology, Youngstown State University, Youngstown, OH 44555 - United States of America

Dr. Abdu Arslanyilmaz

Computer Science and Information Systems, Youngstown State University, Youngstown, OH 44555 - United States of America

Dr. Michelle Zhu

School of Computing, Montclair State University, Montclair, NJ 07043 - United States of America
