Designing better data models: simple solutions for maximum impact

Artificial intelligence has evolved from advanced statistics into systems that can gather insights from large data lakes and make decisions to achieve a specific goal. In the near future, AI is anticipated to grow more complex and approach an intelligence explosion that could surpass human intelligence and decision-making. This has sparked a greater need among data scientists to build complex decision-making models to drive business; as a result, there is an increased focus on piping and cascading machine learning models to meet complex business needs, along with a demand for large-scale computation.

Working for the biggest energy distributor in Victoria to ensure customer safety and optimize asset management has pushed me to build advanced use cases that establish an efficient and secure electricity network. Over the last five years I have sharpened my data science skills, building efficient, conservative, automated models that are well regarded within the business. My role has enabled me to operate with low risk and high impact when influencing business decisions.

This talk emphasizes best practices that help a data scientist plan data flow and compute resources effectively. Using a basic ML architecture with a strategic data flow structure and model monitoring, I will give a brief overview of my approach to the issues listed below:

• How to foster test-and-learn techniques in statistical model development, empowering new strategies or paths without impacting the ongoing business workflow (see the shadow-scoring sketch below).
• How to scale and score a model over a large data set, with a resilient architectural solution that optimizes the use of database and compute resources (see the chunked-scoring sketch below).
• How to enable reusability of processed data using a feature store implementation in the architecture (see the feature store sketch below).

I will talk through the best practices I have incorporated and tailored when architecting data science models to demonstrate continuous improvement using ML architecture. In particular, I plan to emphasize my learnings on "data for machine learning" and to cover data artefacts, model monitoring, and feature stores that enable reusability and efficient use of compute resources.
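For the test-and-learn point, one common pattern (my assumption of the approach, not necessarily the talk's exact setup) is champion/challenger shadow scoring: a candidate model is scored alongside the production model, and its output is logged for offline comparison but never returned to the business workflow. The minimal sketch below assumes scikit-learn-style models and synthetic data purely for illustration.

```python
# Minimal champion/challenger shadow-scoring sketch. The challenger's
# predictions are logged only; the workflow keeps seeing champion output.
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
X_train, X_live, y_train, y_live = train_test_split(X, y, random_state=0)

champion = LogisticRegression(max_iter=1000).fit(X_train, y_train)
challenger = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

shadow_log = []  # challenger outputs captured here, never acted upon

def score(batch):
    """Serve champion predictions; shadow-score the challenger silently."""
    live_pred = champion.predict(batch)
    shadow_log.append(challenger.predict_proba(batch)[:, 1])  # logged only
    return live_pred  # the business workflow sees champion output alone

preds = score(X_live)
# shadow_log can later be compared against outcomes before any switch-over
```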
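For scoring at scale, one simple way to keep database and compute load bounded is to pull rows in fixed-size chunks and write scores back in batches of the same size. The sketch below is illustrative only: the in-memory SQLite warehouse and the table and column names (customer_features, customer_id, model_scores) are assumptions standing in for a real stack.

```python
# Chunked batch-scoring sketch: memory and DB load stay flat regardless
# of table size because only CHUNK_SIZE rows are in flight at once.
import sqlite3
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import make_classification

CHUNK_SIZE = 500  # small for the demo; tens of thousands in practice

# Hypothetical warehouse: an in-memory SQLite table of customer features.
conn = sqlite3.connect(":memory:")
X, y = make_classification(n_samples=2000, n_features=3,
                           n_informative=3, n_redundant=0, random_state=0)
features = pd.DataFrame(X, columns=["f1", "f2", "f3"])
features.insert(0, "customer_id", range(len(features)))
features.to_sql("customer_features", conn, index=False)

model = LogisticRegression().fit(X, y)

# Stream the table in bounded chunks and append scores back in batches.
query = "SELECT * FROM customer_features ORDER BY customer_id"
for chunk in pd.read_sql_query(query, conn, chunksize=CHUNK_SIZE):
    chunk["score"] = model.predict_proba(chunk[["f1", "f2", "f3"]])[:, 1]
    chunk[["customer_id", "score"]].to_sql(
        "model_scores", conn, if_exists="append", index=False
    )
```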
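For feature reuse, the idea behind a feature store is that engineered features are computed once, registered under a stable name, and fetched by any downstream model instead of being recomputed. The toy file-backed version below is only a sketch of the concept with illustrative names and paths; production implementations typically sit on a shared database or a dedicated feature store tool.

```python
# Toy file-backed feature store: register once, fetch many times.
from pathlib import Path
import pandas as pd

STORE = Path("feature_store")  # illustrative local path
STORE.mkdir(exist_ok=True)

def register(name: str, df: pd.DataFrame) -> None:
    """Persist an engineered feature set once, keyed by a stable name."""
    df.to_parquet(STORE / f"{name}.parquet")  # needs pyarrow or fastparquet

def fetch(name: str) -> pd.DataFrame:
    """Reuse stored features downstream without recomputing them."""
    return pd.read_parquet(STORE / f"{name}.parquet")

# Compute once... (hypothetical engineered features)
usage = pd.DataFrame({"meter_id": [1, 2], "monthly_kwh": [310.5, 287.2]})
register("meter_usage_monthly", usage)
# ...then every downstream model fetches instead of recomputing.
reused = fetch("meter_usage_monthly")
```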