Roopa is a seasoned IT professional with more than 17 years of experience across multiple domains, including Banking, Airlines, and Insurance. Throughout her career, she has held diverse roles in development, DevOps, presales, and consulting, gaining insight into industry-specific challenges and solutions. She has contributed to organisations such as IBM, Qantas, Cevo, AMP, and Macquarie Bank, and currently works at Interactive, where she applies her expertise to drive innovation and operational efficiency. As an AWS Community Builder specialising in Machine Learning and AI, Roopa actively fosters knowledge sharing and community engagement within the AWS ecosystem. She is also a prolific technical blogger who regularly shares insights and best practices through her publications, and she was a finalist in the 2021 ARN Women in ICT Awards. Roopa's upcoming talk at Data Engg Bytes, "Transactional Data Lake on AWS", will cover practical strategies and considerations for implementing robust data lake solutions tailored to transactional environments.
In today's data-driven world, building a transactional data lake that ensures consistency and accuracy poses many challenges. I set out to architect a state-of-the-art data lake, evaluating various table formats and architectural layers to find the best approach. After extensive consideration, I chose Apache Iceberg as the table format, particularly for its robust support for SQL analytics and incremental upserts, where it outperformed the other open table formats I evaluated. This talk, aimed at intermediate-level practitioners, covers how I built a system that maintains data integrity and accuracy while managing transactional data at scale. I will explore the design and implementation of the two-layer architecture, consisting of raw and curated data layers, that forms the foundation of this platform. I’ll also cover the architectural decisions, the challenges faced, the trade-offs made, and the solutions that led to the success of this project. Join me as I share the strategies, techniques, and technologies behind building a robust, scalable, and reliable data lake, with insights you can apply to your own data engineering projects.
Starts: 3:05 PM
Ends: 3:30 PM