Optimizing IoT Data Pipelines Using Oracle Autonomous Databases and AI Analytics
Keywords:
IoT Data Pipeline, Oracle ADW, AI/ML
Abstract
The rapid growth of Internet of Things (IoT) devices demands architectures that are scalable, efficient, and cost-effective at handling high-frequency data streams. Traditional data pipelines are inefficient and costly to operate because their storage lacks speed and scalability and their support for real-time analytics is limited. In this paper, we present an end-to-end IoT data pipeline built on Oracle Cloud Infrastructure (OCI) that addresses these challenges by integrating Oracle Autonomous Database (ADW), OCI Streaming, serverless functions, and AI-driven analytics.
The proposed architecture begins by ingesting sensor data from more than 10,000 industrial IoT devices through a secure API gateway. Serverless functions process the data in real time, validating, enriching, and routing it to OCI Streaming for efficient queuing. The processed data then lands in Oracle Autonomous Database, a self-driving, self-securing database optimized for both transactional and analytical workloads. The novelty lies in applying AI-driven analytics inside the database layer, using Oracle Machine Learning for anomaly detection and predictive maintenance: an in-database approach that requires no data movement.
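The validation, enrichment, and routing steps could look roughly like the following minimal sketch. It is an illustration under stated assumptions, not the authors' implementation: the payload fields (`device_id`, `metric`, `value`), the `site_by_device` lookup, and the in-memory lists standing in for the stream and a rejection queue are all hypothetical, and no OCI SDK calls are shown.

```python
from datetime import datetime, timezone

# Hypothetical required fields; the source does not specify a payload schema.
REQUIRED_FIELDS = ("device_id", "metric", "value")


def validate(reading: dict) -> bool:
    """Return True when all required fields are present and the value is numeric."""
    if not all(field in reading for field in REQUIRED_FIELDS):
        return False
    value = reading["value"]
    # bool is a subclass of int, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def enrich(reading: dict, site_by_device: dict) -> dict:
    """Return a copy of the reading with a site label and an ingestion timestamp."""
    enriched = dict(reading)
    enriched["site"] = site_by_device.get(reading["device_id"], "unknown")
    enriched["ingested_at"] = datetime.now(timezone.utc).isoformat()
    return enriched


def route(reading: dict, site_by_device: dict, stream: list, rejected: list) -> bool:
    """Validate, enrich, and append to the stream; park invalid readings separately."""
    if not validate(reading):
        rejected.append(reading)
        return False
    stream.append(enrich(reading, site_by_device))
    return True
```

In a deployment along the lines the abstract describes, the `stream` list would be replaced by a publish call to the queuing service, and the rejection list by whatever dead-letter handling the operator prefers.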
The processed data is exposed through a Flask-based REST API running on load-balanced, fault-tolerant OCI Compute instances. The architecture delivers 99.99 percent uptime, end-to-end latencies under 100 milliseconds, and a 40 percent reduction in infrastructure costs compared with traditional ETL pipelines.
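To put the 99.99 percent uptime figure in concrete terms, the short calculation below converts an availability percentage into a downtime budget. The function name and the 365-day window are illustrative assumptions; the source does not state the measurement period.

```python
def downtime_budget_minutes(availability_percent: float, days: int = 365) -> float:
    """Minutes of allowed downtime over `days` at the given availability."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability_percent must be between 0 and 100")
    total_minutes = days * 24 * 60
    return total_minutes * (100 - availability_percent) / 100


print(round(downtime_budget_minutes(99.99), 2))  # 52.56
```

Under these assumptions, 99.99 percent availability leaves roughly 53 minutes of downtime per year.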
We demonstrate the architecture in a real-world industrial IoT scenario, where predictive maintenance reduced unplanned downtime by 72 percent. The paper offers prescriptive guidance for organizations building scalable, cost-efficient IoT data pipelines, focusing on the intersection of autonomous cloud services, serverless computing, and embedded AI.