A Comprehensive Survey on Security Features and Vulnerabilities in Data Science Tools
Fatema Islam Meem, Imran Hussain Mahdy, Sabiha Jannath Tisha, Shahidur Rahoman Sohag
Pages - 17 - 38 | Revised - 31-12-2024 | Published - 01-02-2025
KEYWORDS
Data Science Security, Data Science Tools, Data Protection, Secure Data Analysis.
ABSTRACT
Data science tools have grown quickly, transforming many industries by enabling advanced data analysis, predictive modeling, and better-informed decisions. However, their rapid development has also introduced significant security challenges and vulnerabilities. This study investigates the security features and weaknesses commonly found in widely used data science tools. The analysis focuses on key security mechanisms and identifies frequent vulnerabilities. By examining these aspects, the research aims to provide a comprehensive understanding of the security landscape within the data science domain. The findings underline the critical need for robust security protocols to safeguard data integrity, confidentiality, and privacy in data-driven processes. This work aims to guide users in adopting better security strategies and enhancing the overall safety of their data science workflows.
Miss Fatema Islam Meem
Department of Computer Science, University of Idaho, Moscow, 83844 - United States of America
meem7822@vandals.uidaho.edu
Mr. Imran Hussain Mahdy
Department of Chemical and Biological Engineering, University of Idaho, Moscow, 83844 - United States of America
Miss Sabiha Jannath Tisha
Department of Computer Science, University of Idaho, Moscow, 83844 - United States of America
Mr. Shahidur Rahoman Sohag
Department of Computer Science, University of Idaho, Moscow, 83844 - United States of America