Filling the LLM Gap: From Natural Language To Structured Outputs With Pydantic

Description

By now, most of us developers have experienced the power of Large Language Models. They are remarkably fast and can tackle an astonishing range of problems. Most companies and individuals are eager to harness this technology and have crafted ambitious plans to adopt generative AI. However, most of these plans overlook a silent problem that researchers have long known about: LLMs cannot reliably produce structured outputs. Yet the rewards of achieving such a feat are significant: no more brittle JSON parsing or KeyError exceptions, but a reliable, deterministic function we can trust, and a modular building block for any pipeline.

Jason Liu's Instructor package aims to solve the problem of structured outputs. Instructor patches the OpenAI client and leverages function calling and Pydantic to enforce a pre-defined output schema from LLMs. It is lightweight, handles retries, and keeps the familiar OpenAI interface. This talk explains what Instructor is, what it offers, and how to start using it right away to extract structured data from LLMs. It will showcase a real-world example: a recruiting bot that captures candidate data, schedules interviews, and integrates seamlessly with Google Calendar and a PostgreSQL database.

The outline of the talk is as follows:

1. Introduce the problem of extracting structured data from LLMs.
2. Share a brief story about Jason Liu and Instructor, inspired by the open-source spirit of solving problems, as championed by Sebastián Ramírez (Tiangolo).
3. Demonstrate building the recruiting bot step by step.
4. Run an interactive demo of the bot.
5. Deliver a closing message.
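
The core pattern the talk builds on can be sketched in a few lines. The snippet below is a minimal, hypothetical example of extracting a Pydantic model from a chat completion with Instructor; the Candidate fields, model name, and prompt are illustrative assumptions, not the exact code shown in the talk.

```python
# Minimal sketch (assumptions noted inline): Instructor patches the OpenAI
# client so that chat completions return a validated Pydantic instance.
import instructor
from openai import OpenAI
from pydantic import BaseModel


class Candidate(BaseModel):
    # Hypothetical fields a recruiting bot might capture.
    name: str
    email: str
    years_of_experience: int


# Patch the client; newer Instructor versions also offer instructor.from_openai(OpenAI()).
client = instructor.patch(OpenAI())

candidate = client.chat.completions.create(
    model="gpt-4o-mini",          # any function-calling-capable model (assumed here)
    response_model=Candidate,     # Instructor enforces this schema on the output
    max_retries=2,                # re-ask the model if validation fails
    messages=[
        {
            "role": "user",
            "content": "Hi, I'm Jane Doe (jane@example.com), "
                       "a Python developer with six years of experience.",
        }
    ],
)

print(candidate)  # e.g. Candidate(name='Jane Doe', email='jane@example.com', years_of_experience=6)
```

Because the return value is a validated Pydantic object rather than a raw JSON string, downstream steps, such as inserting the candidate into PostgreSQL or creating a Google Calendar event, can treat it as an ordinary typed value.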