Apply Now Apply Now Apply Now
header_logo
Post thumbnail
ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING

What is DSPy? Build a Text-to-SQL App with Python (Complete Guide)

By Vaishali

Quick Answer: DSPy (Declarative Self-improving Python) is a framework that helps developers build reliable LLM applications by structuring prompts as programs instead of raw text. It enables automatic optimization, modular design, and evaluation-driven improvements. Using DSPy, you can build a Text-to-SQL app in Python that converts natural language queries into accurate SQL by defining modules, signatures, and optimization pipelines.

How do you convert a simple question like “Show total revenue last month” into a precise SQL query without manually crafting fragile prompts every time? Traditional prompt engineering often breaks in production due to inconsistencies and lack of evaluation.

This is where DSPy changes the game. Instead of treating prompts as static text, it treats them as structured programs that can be optimized, tested, and improved over time.

In this guide, you will learn how to build a Text-to-SQL app using DSPy in Python, step by step. By the end, you will understand how to create modular LLM pipelines that generate accurate SQL queries from natural language inputs.

Table of contents


  1. What is DSPy?
  2. What is a Text-to-SQL Application?
  3. Step-by-Step Guide: Build a Text-to-SQL App with DSPy
    • Step 1: Install Dependencies
    • Step 2: Define the Signature
    • Step 3: Create a DSPy Module
    • Step 4: Configure the Language Model
    • Step 5: Add Training Examples
    • Step 6: Apply Optimizer
    • Step 7: Run the Application
  4. Full Code Example
  5. DSPy vs Traditional Prompt Engineering
  6. Best Practices for Building a Text-to-SQL App with DSPy
  7. Conclusion
  8. FAQs
    • What is DSPy used for?
    • Is DSPy better than prompt engineering?
    • Do I need SQL knowledge for Text-to-SQL apps?
    • Can DSPy be used with any LLM?

What is DSPy?

DSPy (Declarative Self-improving Python) is a framework for building reliable LLM applications by treating prompts as structured programs instead of raw text. It uses signatures (input-output definitions), modules (reusable components), and optimizers (automatic improvements) to separate logic from prompt design. 

What is a Text-to-SQL Application?

A Text-to-SQL application converts natural language queries into SQL statements.

Example:

  • Input: “Show top 5 products by sales”
  • Output:
SELECT product_name, SUM(sales)

FROM products

GROUP BY product_name

ORDER BY SUM(sales) DESC

LIMIT 5;

These applications are widely used in:

  • Business intelligence dashboards
  • Data analytics platforms
  • Chat-based database querying tools

Step-by-Step Guide: Build a Text-to-SQL App with DSPy

Step 1: Install Dependencies

The first step is to install the required Python packages. At minimum, you need the DSPy library. The official DSPy getting-started docs show installation with pip install -U dspy. If you want typed output validation, adding pedantic is also useful in many DSPy workflows.  

You also need to prepare your LLM setup. DSPy works by connecting your program to a language model backend, so you must have access to a supported provider and an API key. In practice, this means selecting a model, exporting the provider key in your environment, and ensuring your Python environment can call that model successfully. DSPy’s tutorials show model setup through dspy.LM(…) and global configuration with dspy.configure(lm=lm).  

pip install -U dspy

Step 2: Define the Signature

In DSPy, a signature defines the contract between the input and output of a task. For a Text-to-SQL app, the input is a natural language query, and the output is a SQL statement. This step is important because DSPy encourages you to declare what the system should do before worrying about the exact wording of prompts. That is a core part of its programming model.  

A clean signature makes the Text-to-SQL pipeline easier to test, optimize, and reuse. You can keep it simple at first with just a user question and SQL output, then later extend it with database schema, business rules, or SQL dialect hints.

import dspy

class TextToSQL(dspy.Signature):

    """Convert a natural language question into a valid SQL query."""

    question: str = dspy.InputField(desc="User's natural language database question")

    sql: str = dspy.OutputField(desc="Executable SQL query")

Step 3: Create a DSPy Module

Once the signature is defined, wrap the logic in a reusable DSPy module, the core building block of the application. Instead of relying on a single prompt, modules enable reusable, inspectable, and independently improvable components, making the system more robust than brittle prompt strings. 

For a Text-to-SQL app, use dspy.Predict inside a custom module to take a user query and schema context as input and generate SQL. This modular design supports scalability, allowing easy addition of validation, query explanation, and execution checks later.

class TextToSQLModule(dspy.Module):

    def __init__(self):

        super().__init__()

        self.generator = dspy.Predict(TextToSQL)

    def forward(self, question: str):

        return self.generator(question=question)
MDN

Step 4: Configure the Language Model

After creating the module, configure the language model DSPy will use. DSPy tutorials show this pattern clearly: initialize the LM, then register it using dspy.configure. This central configuration allows the rest of the DSPy program to call the model consistently.  

You must set your API key before creating the model object. In a real application, keep credentials in environment variables rather than hardcoding them. You should also choose a model that is good at structured generation, because SQL tasks depend on syntax precision and schema alignment.

import os

import dspy

os.environ["OPENAI_API_KEY"] = "your_api_key_here"

lm = dspy.LM("openai/gpt-4o-mini")

dspy.configure(lm=lm)

Step 5: Add Training Examples

DSPy becomes much more powerful when you give it examples. Instead of manually rewriting prompts, you provide a small training set of natural-language questions and their expected SQL outputs. DSPy can then use those examples to improve how the program behaves. Its FAQ and documentation emphasize that DSPy programs can be compiled into optimized prompts or weights based on your pipeline and your data.  

For a Text-to-SQL app, your examples should cover different query types such as filtering, aggregation, sorting, grouping, and date-based conditions. Better example coverage generally leads to better SQL generation quality.

trainset = [

    dspy.Example(

        question="Show all customers from Mumbai",

        sql="SELECT * FROM customers WHERE city = 'Mumbai';"

    ).with_inputs("question"),

    dspy.Example(

        question="Count total orders placed in 2025",

        sql="SELECT COUNT(*) FROM orders WHERE YEAR(order_date) = 2025;"

    ).with_inputs("question"),

    dspy.Example(

        question="Get the top 5 products by sales",

        sql="SELECT product_name, SUM(sales) AS total_sales FROM products GROUP BY product_name ORDER BY total_sales DESC LIMIT 5;"

    ).with_inputs("question"),

]

Step 6: Apply Optimizer

This is the step where DSPy stands out. The framework is designed to automatically optimize prompts and program behavior using your examples and evaluation signals, instead of relying only on hand-tuned prompt engineering. DSPy explicitly positions this capability as a key difference from prompt-only workflows.  

In practical terms, you define your module, prepare examples, and then compile or optimize the program so DSPy can learn a better prompting strategy for the task. This is especially useful in Text-to-SQL, where small changes in wording can greatly affect SQL correctness.

from dspy.teleprompt import BootstrapFewShot

optimizer = BootstrapFewShot()

compiled_text_to_sql = optimizer.compile(

    student=TextToSQLModule(),

    trainset=trainset

)

Step 7: Run the Application

Once optimized, test the app with real queries. The DSPy Text-to-SQL system converts plain English into SQL, which can be displayed, validated, or executed after safety checks. Since SQL is executable, validation is essential in production. DSPy also emphasizes debugging and transparency as systems scale.

question = "List the names of customers who placed more than 3 orders"

result = compiled_text_to_sql(question=question)

print("Generated SQL:")

print(result.sql)

Full Code Example

import os

import dspy

from dspy.teleprompt import BootstrapFewShot

# Step 1: Set API key

os.environ["OPENAI_API_KEY"] = "your_api_key_here"

# Step 2: Configure LM

lm = dspy.LM("openai/gpt-4o-mini")

dspy.configure(lm=lm)

# Step 3: Define signature

class TextToSQL(dspy.Signature):

    """Convert a natural language question into a valid SQL query."""

    question: str = dspy.InputField(desc="User's natural language database question")

    sql: str = dspy.OutputField(desc="Executable SQL query")

# Step 4: Create module

class TextToSQLModule(dspy.Module):

    def __init__(self):

        super().__init__()

        self.generator = dspy.Predict(TextToSQL)

    def forward(self, question: str):

        return self.generator(question=question)

# Step 5: Add training examples

trainset = [

    dspy.Example(

        question="Show all customers from Mumbai",

        sql="SELECT * FROM customers WHERE city = 'Mumbai';"

    ).with_inputs("question"),

    dspy.Example(

        question="Count total orders placed in 2025",

        sql="SELECT COUNT(*) FROM orders WHERE YEAR(order_date) = 2025;"

    ).with_inputs("question"),

    dspy.Example(

        question="Get the top 5 products by sales",

        sql="SELECT product_name, SUM(sales) AS total_sales FROM products GROUP BY product_name ORDER BY total_sales DESC LIMIT 5;"

    ).with_inputs("question"),

]

# Step 6: Apply optimizer

optimizer = BootstrapFewShot()

compiled_text_to_sql = optimizer.compile(

    student=TextToSQLModule(),

    trainset=trainset

)

# Step 7: Run the application

question = "List the names of customers who placed more than 3 orders"

result = compiled_text_to_sql(question=question)

print("Generated SQL:")

print(result.sql)

DSPy vs Traditional Prompt Engineering

FeatureDSPyPrompt Engineering
StructureProgrammatic (signatures, modules)Manual prompt writing
OptimizationAutomated via optimizersManual iteration
ScalabilityHigh, modular and reusableLimited and hard to scale
ReliabilityStrong, evaluation-drivenInconsistent and fragile
MaintainabilityEasy to update components independentlyDifficult to manage large prompts
DebuggingStructured and traceableHard to debug prompt behavior
ReusabilityHigh, supports modular pipelinesLow, prompts are often task-specific

Build production-ready AI applications like Text-to-SQL systems with structured learning and expert guidance. Join HCL GUVI’s Artificial Intelligence and Machine Learning Program to learn from industry experts and Intel engineers through live online classes, master Python, SQL, ML, MLOps, Generative AI, and Agentic AI, and gain hands-on experience with 20+ industry-grade projects, 1:1 doubt sessions, and placement support with 1000+ hiring partners.

Best Practices for Building a Text-to-SQL App with DSPy

  • Use Clear and Structured Schema Context: Always provide the database schema (tables, columns, relationships) in a clean and structured format so the model generates accurate SQL without hallucinating fields.
  • Design Strong Signatures: Define precise input-output contracts in your DSPy signatures to ensure consistent and predictable SQL generation across different queries.
  • Evaluate Outputs Regularly: Use test datasets and evaluation metrics to continuously monitor accuracy and identify failure patterns.
  • Keep Modules Modular and Reusable: Break your pipeline into smaller DSPy modules so you can debug and scale components independently.
  • Handle Edge Cases and Ambiguity: Add fallback logic or clarification steps when user queries are vague or incomplete.
  • Optimize for SQL Dialect Compatibility: Ensure generated queries match your database system (MySQL, PostgreSQL, etc.) to avoid syntax errors in production.

Conclusion

Building a Text-to-SQL app with DSPy demonstrates how structured LLM programming outperforms traditional prompt engineering. By using signatures, modules, and optimizers, you can create scalable and reliable AI systems. As AI adoption grows, frameworks like DSPy will become essential for production-ready applications.

FAQs

What is DSPy used for?

DSPy is used to build structured, reliable LLM applications such as chatbots, RAG systems, and Text-to-SQL tools.

Is DSPy better than prompt engineering?

Yes, for production use cases because it enables automation and scalability.

Do I need SQL knowledge for Text-to-SQL apps?

Basic SQL understanding is recommended to validate outputs.

MDN

Can DSPy be used with any LLM?

Yes, DSPy supports multiple LLM providers through configurable interfaces.

Success Stories

Did you enjoy this article?

Schedule 1:1 free counselling

Similar Articles

Loading...
Get in Touch
Chat on Whatsapp
Request Callback
Share logo Copy link
Table of contents Table of contents
Table of contents Articles
Close button

  1. What is DSPy?
  2. What is a Text-to-SQL Application?
  3. Step-by-Step Guide: Build a Text-to-SQL App with DSPy
    • Step 1: Install Dependencies
    • Step 2: Define the Signature
    • Step 3: Create a DSPy Module
    • Step 4: Configure the Language Model
    • Step 5: Add Training Examples
    • Step 6: Apply Optimizer
    • Step 7: Run the Application
  4. Full Code Example
  5. DSPy vs Traditional Prompt Engineering
  6. Best Practices for Building a Text-to-SQL App with DSPy
  7. Conclusion
  8. FAQs
    • What is DSPy used for?
    • Is DSPy better than prompt engineering?
    • Do I need SQL knowledge for Text-to-SQL apps?
    • Can DSPy be used with any LLM?