Juan Sequeda

Principal Scientist and Head of the AI Lab at data.world; Co-Host of Catalog & Cocktails, the honest, no-bs data podcast

Juan Sequeda is the Principal Scientist and Head of the AI Lab at data.world. He holds a PhD in Computer Science from The University of Texas at Austin. Juan’s research and industry work sits at the intersection of data and AI, with the goal of reliably creating knowledge from inscrutable data, specifically by designing and building Knowledge Graphs for enterprise data and metadata management. Juan is the co-author of the book “Designing and Building Enterprise Knowledge Graphs” and the co-host of Catalog & Cocktails, an honest, no-bs, non-salesy data podcast. Juan has researched and developed technology on semantic data virtualization, graph data modeling, schema mapping and data integration methodologies. He pioneered technology to construct knowledge graphs from relational databases, resulting in W3C standards, research awards, patents, software and his startup Capsenta, which was acquired by data.world in 2019. Juan is the recipient of the NSF Graduate Research Fellowship, 2nd Place in the 2013 Semantic Web Challenge for his work on ConstituteProject.org, Best Student Research Paper at the 2014 International Semantic Web Conference (ISWC), the 2015 Best Transfer and Innovation Project awarded by the Institute for Applied Informatics, and the 2023 Best Industry Paper at SIGMOD, and he has been nominated two additional times for best paper at ISWC. Juan strives to build bridges between academia and industry as former co-chair of the LDBC Property Graph Schema Working Group, member of the LDBC Graph Query Languages task force, and standards editor at the World Wide Web Consortium (W3C). Juan continues to be an active member of the scientific community, serving on the editorial boards and program committees of scientific journals and conferences in Semantic Web, Knowledge Graphs, Databases and AI, and organizing various academic and industry conferences, most recently as General Chair of The ACM Web Conference 2023.

Sessions

Increasing the LLM Accuracy for Question Answering on Structured Data: Knowledge Graphs to the Rescue

Our research aims to understand the accuracy of LLM-powered question answering systems in the context of enterprise questions, SQL databases, and knowledge graphs. We introduce a benchmark comprising an enterprise SQL schema in the insurance domain, a range of enterprise queries spanning reporting to metrics, and a contextual layer incorporating an ontology and mappings that define a knowledge graph. Our first finding reveals that question answering using GPT-4, with zero-shot prompts directly on SQL databases, achieves an accuracy of 16%. Notably, this accuracy increases to 54% when questions are posed over a Knowledge Graph representation of the enterprise SQL database, a 3X increase in accuracy. The question remains: how can we further improve the accuracy? Building on the observation from our first work that the inaccurate LLM-generated SPARQL queries followed incorrect paths, we present an Ontology-based Query Check (OBQC) approach which 1) leverages the ontology of the knowledge graph to check whether the LLM-generated SPARQL query matches the semantics of the ontology, in order to detect errors, and 2) uses the explanations of the errors with an LLM to repair them. Our next finding is that this approach increases the overall accuracy to 72%, with an additional 8% of results reported as unknown and an overall error rate of 20%. This is an overall 4X accuracy improvement over the 16% baseline. Our call to action is to invest in Knowledge Graphs to provide higher accuracy for LLM-powered question answering systems over structured data.
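
To make the first step of the query check concrete, here is a minimal Python sketch. It is not the speakers' implementation. It assumes the SPARQL query has already been parsed into triple patterns, that each variable's class is known, and that the ontology is reduced to a property-to-(domain, range) table. The insurance-style names (Policy, Agent, Claim, soldByAgent, hasClaim) and the function check_query are hypothetical, chosen only for illustration.

```python
# Toy ontology (hypothetical): property -> (domain class, range class).
ONTOLOGY = {
    "soldByAgent": ("Policy", "Agent"),
    "hasClaim": ("Policy", "Claim"),
}


def check_query(patterns, var_types, ontology=ONTOLOGY):
    """Return explanations for triple patterns that contradict the ontology.

    patterns:  list of (subject, property, object) tuples, assumed to be
               already extracted from the generated SPARQL query.
    var_types: mapping from variable name to its declared class, if any.
    """
    errors = []
    for subj, prop, obj in patterns:
        if prop not in ontology:
            errors.append(f"Property '{prop}' is not defined in the ontology.")
            continue
        domain, rng = ontology[prop]
        subj_type = var_types.get(subj)
        obj_type = var_types.get(obj)
        # Domain check: the subject must be an instance of the property's domain.
        if subj_type is not None and subj_type != domain:
            errors.append(
                f"Subject {subj} is a {subj_type}, but '{prop}' expects a {domain}."
            )
        # Range check: the object must be an instance of the property's range.
        if obj_type is not None and obj_type != rng:
            errors.append(
                f"Object {obj} is a {obj_type}, but '{prop}' expects a {rng}."
            )
    return errors


# A query that follows an incorrect path: a Claim is used where a Policy is expected.
bad_patterns = [("?claim", "soldByAgent", "?agent")]
bad_types = {"?claim": "Claim", "?agent": "Agent"}
for explanation in check_query(bad_patterns, bad_types):
    print(explanation)
```

In the approach described in the abstract, explanations like these would be handed back to the LLM so it can repair the query. That repair step is not sketched here.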

Starts: 11:10 AM

Ends: 11:55 AM

Future of Machine Learning Panel

Our "Future of Machine Learning" panel will explore cutting-edge developments and emerging trends that are shaping the field. Leading experts will discuss advancements in areas such as deep learning architectures, reinforcement learning, and federated learning. The panel will delve into practical applications of ML. Join us for a fascinating look into the future of machine learning and its potential to revolutionize industries, scientific research, and our daily lives.

Starts: 1:40 PM

Ends: 2:30 PM