Main track
Registration
Please collect your lanyards from the front counter.
Introduction
Lars Klint will welcome us to the conference.
Keynote: Battle of Centralized vs Personal Compute
Being part of this industry means getting comfortable holding on to an endlessly swinging pendulum of trends. Centralized vs Personal. Single-Node vs Distributed. Small Data vs Big Data. Structured Data vs Unstructured Data. Cloud Compute vs On-Premise Compute. As engineers, we care about these trends because we want to learn new things and advance our careers. But our users don't care as long as we can get the job done well. And sometimes our industry gets stuck on a trend and the pendulum doesn't swing fast enough as constraints change. This talk will explore some of these trends and hopefully make us a bit more introspective as we choose which technologies and trends to advocate within and outside our orgs.
MLOps Beyond LLMs
Companies & organizations know they shouldn’t build for Google, but they also don’t know how NOT to build for Google scale. The MLOps tooling ecosystem is fragmented, and companies that are just starting on their journeys to becoming ML-native or ML-fluent are confused by MLOps maturity models that don't account for their particular organizational goals or trajectory, especially if they're not "on the road" to Google maturity. Toss in the emergence and (seemingly widespread) adoption of LLMs, and companies (and teams) are lost and looking for clarity on: - How can existing ML platforms be extended to account for new use cases involving LLMs? - Does the team composition change? Do we now need to start hiring “prompt engineers”? - Should we stop existing initiatives? Do we need to pivot? My goal in this session is to help cut through the noise and cover: - What are the main problems MLOps tries to solve? - What does the archetypal MLOps platform look like? - What are the most common components of an MLOps platform? - Where do LLM-based applications fit in?
Morning Tea
Enjoy some delicious refreshments to keep you going towards lunch.
Data Contracts: Data Quality for AI
In this session, we'll dive into the world of data contracts - API-based agreements between data producers and consumers that capture the schema, semantics, distribution, and enforcement policies of the data. Learn how data contracts provide a single surface for collaboration on data in a shared language, allow the data model to evolve in an agile, iterative way, and apply data governance incrementally where it's needed for AI and ML systems. We'll explore how organizations can leverage data contracts to ensure Artificial Intelligence systems are trained on trustworthy and well-governed datasets.
Transactional Data Platform at Atlassian
A shift from your traditional OLTP systems: how Atlassian built a new-generation, multi-region and massively scalable storage platform from the ground up for its transactional data. Sprinkled with some scale challenges and tradeoffs along the journey.
Data Vault Engineering
Snowflake breathes new life into Data Vault! A Data Vault model is intended to represent the business through Enterprise Architecture as repeatable patterns. These are repeatable patterns in: • Data modelling • Data engineering and model testing • Data architecture • Information consumption. This talk covers what those patterns are and how you can use Snowflake native features to simplify your Data Vault patterns.
Lunch
Relax and network with your peers.
Super charge data engineering productivity in era of AI
Learn how simplifying the data platform and leveraging data mesh patterns and generative AI can enhance data engineering productivity.
Build your own electric vehicle charging map with PostGIS
An educational live demo of a map server showing electric vehicle charging stations so you'll never be stuck without a charge again. Your map will show the available charging stations, which ones you can reach with your current range and the most efficient route to each one. Come along and see what you can build with PostGIS and pgRouting in the context of a topical real world use case. We will also take a look at some ETL and geocoding examples using PL/Python running natively in Postgres. Finally, we will look at why PostGIS is so powerful for both performance and integration - including functions to work with GeoJSON, KML and MVT, built-in 3D and topology support, and advanced spatial indexing enhancements. This is a beginner to intermediate session for PostGIS and assumes a rudimentary knowledge of SQL and spatial data.
Getting serious about transformations in the modern data stack
Data transformations, processing, and modelling serve as the fundamental elements in constructing a data platform for analytics, machine learning, and AI purposes. In this session, learn how Coalesce helps Snowflake customers to implement data engineering best practices and data modelling best practices with the speed of an intuitive graphical user interface (GUI), the flexibility of code, and the efficiency of automation for data transformations.
Afternoon Tea
Enjoy some delicious refreshments to keep you going into the afternoon.
Sydney Data Engineering Panel
Hear from the greatest minds locally and internationally about the latest trends in data engineering!
Unleashing the Power of Real-Time Machine Learning: A Journey of Rokt's Architecture evolution
This talk delves into the evolution of Rokt's architecture as we built a web-scale real-time machine learning system. Discover how we harnessed the power of machine learning to drive business growth and success, and learn about the valuable lessons we learned along the way. You'll come away with a deeper understanding of the key factors that led to our success, and how you can leverage similar strategies to drive innovation in your own projects. This should be an exciting exploration of real-time machine learning architecture!
Data Modeling is Dead! Long Live Data Modeling!
Data modeling is on life support. Some say it’s dead. The traditional practices are increasingly ignored and forgotten. The result is often a loss of structure and a shared understanding of business rules and vocabulary. At the same time, data modeling is more critical than ever. With AI's rising popularity, many organizations rush to incorporate it into their infrastructure. Without consideration of the underlying data framework, the result will be unpleasant for many organizations. In this talk, I argue that data modeling is a key enabler for success with AI. We must return to basics and revamp data modeling to work with modern business workflows and technologies. Long live data modeling!
Closing Remarks
Lars Klint will close out the conference.
After Party
Join us for some late refreshments and further networking to close out the conference.
ML Track
Vector Database: The What, The Why, and The How
This talk dives into the world of vector databases, providing a concise overview of their fundamental concepts, growing importance, and practical application. Attendees will: - Gain a clear understanding of what vector databases are and how they differ from traditional relational databases. - Discover the reasons behind their increasing popularity, driven by the need for efficient storage and retrieval of high-dimensional data in fields such as AI, machine learning, and advanced analytics. - Explore some architectural considerations, indexing strategies, and data modelling techniques that enable effective usage of vector databases. - Uncover the potential connection between vector databases and large language models, such as OpenAI’s ChatGPT, and how their integration can enhance natural language understanding and generation. By the end of this presentation, attendees will be equipped with valuable insights into leveraging vector databases to unlock new frontiers in data-driven applications, power the next generation of AI applications with advanced, high-performance vector similarity search technology, and stay ahead in this era of industrial revolution.
LLMs: A Data Engineer's Game
We have a controversial opinion, but please hear us out: LLMs have become a data engineering problem rather than a data science problem! In this session we'll explore how data engineering teams can leverage LLMs with ease within the Databricks Data+AI platform. We'll explore how we can utilise commercial LLMs and build-your-own LLMs to bring natural language querying to your data. Best of both worlds: fine-tuned LLMs + commercial LLMs. We'll look at fine-tuning and applying Hugging Face models thanks to the ease of MLflow's integration with Hugging Face. We'll also use SQL AI Functions to apply OpenAI GPT-3.5 text capabilities to unstructured data. Call to action: We hope to demonstrate that it's possible for DEs to leverage LLMs today with a combination of fine-tuned LLMs and commercial LLMs. You don't need to wait for your data science team to start creating value from your data.
Navigating the MLOps Journey: Key Considerations for Successful Implementations
Embarking on the MLOps journey can be a game-changer for those seeking to unlock the full potential of their machine learning teams and solutions. Explore the critical considerations required with an end-to-end overview of an MLOps implementation. In this session, we will delve into the essential components of MLOps and how they streamline the development, deployment, and maintenance of machine learning models. We will discuss key considerations such as infrastructure requirements, data management, model versioning, and the team roles and responsibilities that work best. Whether you're just starting your MLOps journey or looking to optimize your existing processes, see how MLOps can enhance model performance, improve productivity, ensure regulatory compliance, and foster a culture of collaboration and innovation within your team.
Lunch
Relax and network with your peers.
Crossing the River by Feeling the --Stones-- Data
The importance of data in building AI applications is well understood. But often, chasing deadlines, we end up de-prioritising making data a first-class citizen. We have all been there and done that. Can we perhaps change our AI practices to develop iteratively and cross the proverbial river by feeling the stones... err, data? This fun talk is all about the role of data in AI and a different perspective on incrementally growing it!
What does MLOps look like for Large Language Models (LLMs) in production?
In the BC (Before ChatGPT!) era, MLOps primarily encompassed the processes related to model serving, inference and monitoring. During that time, considerable effort was dedicated to data collection and model training, with MLOps serving as a means to operationalize and manage those models effectively. However, the landscape has shifted post-BC, and the traditional practices of model training and fine-tuning have become less common due to extended context window sizes. While all of these core processes still remain crucial, in this talk we dive deep into the emerging dynamics of operationalizing LLMs at scale, along with insights gleaned from real-world industrial applications.
Sydney MLOps Panel Discussion
Hear from the greatest minds locally and internationally in the MLOps space!
Afternoon Tea
Enjoy some delicious refreshments to keep you going into the afternoon.
DE Track
Delta Lake UniForm - Open data formats FTW!
We will take a look at the Open Source Delta Lake storage format and in particular the recently announced universal format (UniForm) feature that can be enabled to allow access from tools that support Iceberg. Let's examine how this will help users access their data from an expanded set of tools.
Processing 40 TB of code from ~10 million projects with a dedicated server and Go for $100
The problem? We have some software that can tell you how complex a project is. However, in order to gauge what that number means, we need to compare it to other projects. But which projects? How about all of them! Learn why using AWS was not the right approach here, and how we ended up processing millions of repositories to find the answers to such questions as: YAML vs YML? Which group of developers has the biggest potty mouth? How many files does an average repository have? How many lines of code are in a typical file per language? How many repositories appear to be missing a license? Why does that even matter?! And more!
Can we get a Connection?
Optimised data flows using Neo4j Fabric. Learn how we discover patterns and insights across billions of data connections deeply, easily, and quickly with Neo4j GraphDB. We'll walk through a scalable big data model implementation using a Fabric Architecture to optimise real-time ingestion and graph search.
Lunch
Relax and network with your peers.
Compilation is coming to a data stack near you
Abstract: In the 1960s, software engineering went through a radical shift. Instead of writing programs, people started writing programs that wrote programs. The same is about to happen to data engineering. Description: Remember when people wrote low-level code with a basic instruction set that was quite similar across vendors but slightly different in annoying ways that made you want to tear your hair out? That's the life of a data engineer today, but it doesn't need to be this way. Software engineering went through a similar shift back in the 1960s. This made their programs less buggy, more portable, easier to debug and enabled more people to "come to the party", driving further innovation. Data engineering stands on the brink of the same revolution, and the semantic layer is the key to this. In this talk, Abhinav will demonstrate how compilers will change the field in the years to come.
Supercharging SQL Query Performance: Unleash the Power Within!
In a world of diverse database technologies, relational databases continue to reign supreme when it comes to managing and storing data. However, as data volumes skyrocket, backend engineers often find themselves grappling with sluggish SQL query performance. Instead of relying solely on overburdened DBAs, why not take the reins and unlock the secrets to optimizing queries yourself? Prepare to embark on a thrilling journey into the heart of SQL query execution within relational databases. From uncovering the hidden potential of various index types in SQL Server to diving deep into their implementation details, this talk leaves no stone unturned. But that's not all: brace yourself for a mind-blowing exploration of the intricate SQL query execution process. Witness the magic behind the scenes as a database engine springs into action, executing your queries step by step. Armed with this newfound knowledge, you'll be equipped with practical optimization techniques to supercharge your queries like never before! Calling all engineers and data enthusiasts who yearn to conquer query performance challenges! Join us in this dynamic session as Lisa Li, a Lead Staff Software Engineer, takes you on an adrenaline-pumping ride through the realm of SQL query optimization. Get ready to collaborate, learn, and emerge victorious with strategies to revolutionize your database performance.
The impact of Native Applications
Snowflake have launched their Native Applications framework, and this has huge implications for many parts of the industry. In light of this new model, this talk will discuss areas such as: Open source projects - With Snowflake-managed infrastructure, how can open source data product companies commercialise their offering? Product design - What does a data product look like? Instead of a complete end-to-end solution, can it be a component that users can mix in? Go-to-market strategy - Will the Snowflake marketplace take off as a distribution channel? If so, will developers be able to find customers more directly than via traditional means?
Afternoon Tea
Enjoy some delicious refreshments to keep you going into the afternoon.
Abhinav Goyal
Founder/CTO at Ordinatim
Akanksha Malik
Data Consultant and Microsoft AI MVP
Arezou Soltani
Atlassian, Senior Machine Learning Engineer
Benjamin Boyter
Principal at Kablamo
James Weakley
Omnata CEO
John Cosgrove
Chief Executive Officer at Lightfold
Kanishka Mohaia
VP of Engineering | AI/ML & Data
Lisa Li
The Trade Desk - Lead Staff Software Engineer
Lizzie Macneill
Solutions Engineer at EDB
Patrick Cuba
Solution Architect
Rene Essomba
Cevo Australia - Data Consultant
Rishu Saxena
Principal Solutions Architect
Sana Sanai
Cloud Scale Analytics Global Black Belt at Microsoft
Scott Eade
Sr. Solutions Architect at Databricks
Suneeta Mall
Head of AI Engineering (Harrison AI)
Suzanne Nieuwenhuizen
Customer Success Architect - APAC
Vinny Vijeyakumaar
Senior Solutions Architect, Retail at Databricks
Yash Sharma
Engineering Manager, Data Platform at Atlassian
Zak Sheikh
Technical Lead, Endeavour Group