What is DSPy? Build a Text-to-SQL App with Python (Complete Guide)
Apr 10, 2026 5 Min Read 21 Views
(Last Updated)
Quick Answer: DSPy (Declarative Self-improving Python) is a framework that helps developers build reliable LLM applications by structuring prompts as programs instead of raw text. It enables automatic optimization, modular design, and evaluation-driven improvements. Using DSPy, you can build a Text-to-SQL app in Python that converts natural language queries into accurate SQL by defining modules, signatures, and optimization pipelines.
How do you convert a simple question like “Show total revenue last month” into a precise SQL query without manually crafting fragile prompts every time? Traditional prompt engineering often breaks in production due to inconsistencies and lack of evaluation.
This is where DSPy changes the game. Instead of treating prompts as static text, it treats them as structured programs that can be optimized, tested, and improved over time.
In this guide, you will learn how to build a Text-to-SQL app using DSPy in Python, step by step. By the end, you will understand how to create modular LLM pipelines that generate accurate SQL queries from natural language inputs.
Table of contents
- What is DSPy?
- What is a Text-to-SQL Application?
- Step-by-Step Guide: Build a Text-to-SQL App with DSPy
- Step 1: Install Dependencies
- Step 2: Define the Signature
- Step 3: Create a DSPy Module
- Step 4: Configure the Language Model
- Step 5: Add Training Examples
- Step 6: Apply Optimizer
- Step 7: Run the Application
- Full Code Example
- DSPy vs Traditional Prompt Engineering
- Best Practices for Building a Text-to-SQL App with DSPy
- Conclusion
- FAQs
- What is DSPy used for?
- Is DSPy better than prompt engineering?
- Do I need SQL knowledge for Text-to-SQL apps?
- Can DSPy be used with any LLM?
What is DSPy?
DSPy (Declarative Self-improving Python) is a framework for building reliable LLM applications by treating prompts as structured programs instead of raw text. It uses signatures (input-output definitions), modules (reusable components), and optimizers (automatic improvements) to separate logic from prompt design.
What is a Text-to-SQL Application?
A Text-to-SQL application converts natural language queries into SQL statements.
Example:
- Input: “Show top 5 products by sales”
- Output:
SELECT product_name, SUM(sales)
FROM products
GROUP BY product_name
ORDER BY SUM(sales) DESC
LIMIT 5;
These applications are widely used in:
- Business intelligence dashboards
- Data analytics platforms
- Chat-based database querying tools
Step-by-Step Guide: Build a Text-to-SQL App with DSPy
Step 1: Install Dependencies
The first step is to install the required Python packages. At minimum, you need the DSPy library. The official DSPy getting-started docs show installation with pip install -U dspy. If you want typed output validation, adding pedantic is also useful in many DSPy workflows.
You also need to prepare your LLM setup. DSPy works by connecting your program to a language model backend, so you must have access to a supported provider and an API key. In practice, this means selecting a model, exporting the provider key in your environment, and ensuring your Python environment can call that model successfully. DSPy’s tutorials show model setup through dspy.LM(…) and global configuration with dspy.configure(lm=lm).
pip install -U dspy
Step 2: Define the Signature
In DSPy, a signature defines the contract between the input and output of a task. For a Text-to-SQL app, the input is a natural language query, and the output is a SQL statement. This step is important because DSPy encourages you to declare what the system should do before worrying about the exact wording of prompts. That is a core part of its programming model.
A clean signature makes the Text-to-SQL pipeline easier to test, optimize, and reuse. You can keep it simple at first with just a user question and SQL output, then later extend it with database schema, business rules, or SQL dialect hints.
import dspy
class TextToSQL(dspy.Signature):
"""Convert a natural language question into a valid SQL query."""
question: str = dspy.InputField(desc="User's natural language database question")
sql: str = dspy.OutputField(desc="Executable SQL query")
Step 3: Create a DSPy Module
Once the signature is defined, wrap the logic in a reusable DSPy module, the core building block of the application. Instead of relying on a single prompt, modules enable reusable, inspectable, and independently improvable components, making the system more robust than brittle prompt strings.
For a Text-to-SQL app, use dspy.Predict inside a custom module to take a user query and schema context as input and generate SQL. This modular design supports scalability, allowing easy addition of validation, query explanation, and execution checks later.
class TextToSQLModule(dspy.Module):
def __init__(self):
super().__init__()
self.generator = dspy.Predict(TextToSQL)
def forward(self, question: str):
return self.generator(question=question)
Step 4: Configure the Language Model
After creating the module, configure the language model DSPy will use. DSPy tutorials show this pattern clearly: initialize the LM, then register it using dspy.configure. This central configuration allows the rest of the DSPy program to call the model consistently.
You must set your API key before creating the model object. In a real application, keep credentials in environment variables rather than hardcoding them. You should also choose a model that is good at structured generation, because SQL tasks depend on syntax precision and schema alignment.
import os
import dspy
os.environ["OPENAI_API_KEY"] = "your_api_key_here"
lm = dspy.LM("openai/gpt-4o-mini")
dspy.configure(lm=lm)
Step 5: Add Training Examples
DSPy becomes much more powerful when you give it examples. Instead of manually rewriting prompts, you provide a small training set of natural-language questions and their expected SQL outputs. DSPy can then use those examples to improve how the program behaves. Its FAQ and documentation emphasize that DSPy programs can be compiled into optimized prompts or weights based on your pipeline and your data.
For a Text-to-SQL app, your examples should cover different query types such as filtering, aggregation, sorting, grouping, and date-based conditions. Better example coverage generally leads to better SQL generation quality.
trainset = [
dspy.Example(
question="Show all customers from Mumbai",
sql="SELECT * FROM customers WHERE city = 'Mumbai';"
).with_inputs("question"),
dspy.Example(
question="Count total orders placed in 2025",
sql="SELECT COUNT(*) FROM orders WHERE YEAR(order_date) = 2025;"
).with_inputs("question"),
dspy.Example(
question="Get the top 5 products by sales",
sql="SELECT product_name, SUM(sales) AS total_sales FROM products GROUP BY product_name ORDER BY total_sales DESC LIMIT 5;"
).with_inputs("question"),
]
Step 6: Apply Optimizer
This is the step where DSPy stands out. The framework is designed to automatically optimize prompts and program behavior using your examples and evaluation signals, instead of relying only on hand-tuned prompt engineering. DSPy explicitly positions this capability as a key difference from prompt-only workflows.
In practical terms, you define your module, prepare examples, and then compile or optimize the program so DSPy can learn a better prompting strategy for the task. This is especially useful in Text-to-SQL, where small changes in wording can greatly affect SQL correctness.
from dspy.teleprompt import BootstrapFewShot
optimizer = BootstrapFewShot()
compiled_text_to_sql = optimizer.compile(
student=TextToSQLModule(),
trainset=trainset
)
Step 7: Run the Application
Once optimized, test the app with real queries. The DSPy Text-to-SQL system converts plain English into SQL, which can be displayed, validated, or executed after safety checks. Since SQL is executable, validation is essential in production. DSPy also emphasizes debugging and transparency as systems scale.
question = "List the names of customers who placed more than 3 orders"
result = compiled_text_to_sql(question=question)
print("Generated SQL:")
print(result.sql)
Full Code Example
import os
import dspy
from dspy.teleprompt import BootstrapFewShot
# Step 1: Set API key
os.environ["OPENAI_API_KEY"] = "your_api_key_here"
# Step 2: Configure LM
lm = dspy.LM("openai/gpt-4o-mini")
dspy.configure(lm=lm)
# Step 3: Define signature
class TextToSQL(dspy.Signature):
"""Convert a natural language question into a valid SQL query."""
question: str = dspy.InputField(desc="User's natural language database question")
sql: str = dspy.OutputField(desc="Executable SQL query")
# Step 4: Create module
class TextToSQLModule(dspy.Module):
def __init__(self):
super().__init__()
self.generator = dspy.Predict(TextToSQL)
def forward(self, question: str):
return self.generator(question=question)
# Step 5: Add training examples
trainset = [
dspy.Example(
question="Show all customers from Mumbai",
sql="SELECT * FROM customers WHERE city = 'Mumbai';"
).with_inputs("question"),
dspy.Example(
question="Count total orders placed in 2025",
sql="SELECT COUNT(*) FROM orders WHERE YEAR(order_date) = 2025;"
).with_inputs("question"),
dspy.Example(
question="Get the top 5 products by sales",
sql="SELECT product_name, SUM(sales) AS total_sales FROM products GROUP BY product_name ORDER BY total_sales DESC LIMIT 5;"
).with_inputs("question"),
]
# Step 6: Apply optimizer
optimizer = BootstrapFewShot()
compiled_text_to_sql = optimizer.compile(
student=TextToSQLModule(),
trainset=trainset
)
# Step 7: Run the application
question = "List the names of customers who placed more than 3 orders"
result = compiled_text_to_sql(question=question)
print("Generated SQL:")
print(result.sql)
DSPy vs Traditional Prompt Engineering
| Feature | DSPy | Prompt Engineering |
| Structure | Programmatic (signatures, modules) | Manual prompt writing |
| Optimization | Automated via optimizers | Manual iteration |
| Scalability | High, modular and reusable | Limited and hard to scale |
| Reliability | Strong, evaluation-driven | Inconsistent and fragile |
| Maintainability | Easy to update components independently | Difficult to manage large prompts |
| Debugging | Structured and traceable | Hard to debug prompt behavior |
| Reusability | High, supports modular pipelines | Low, prompts are often task-specific |
Build production-ready AI applications like Text-to-SQL systems with structured learning and expert guidance. Join HCL GUVI’s Artificial Intelligence and Machine Learning Program to learn from industry experts and Intel engineers through live online classes, master Python, SQL, ML, MLOps, Generative AI, and Agentic AI, and gain hands-on experience with 20+ industry-grade projects, 1:1 doubt sessions, and placement support with 1000+ hiring partners.
Best Practices for Building a Text-to-SQL App with DSPy
- Use Clear and Structured Schema Context: Always provide the database schema (tables, columns, relationships) in a clean and structured format so the model generates accurate SQL without hallucinating fields.
- Design Strong Signatures: Define precise input-output contracts in your DSPy signatures to ensure consistent and predictable SQL generation across different queries.
- Evaluate Outputs Regularly: Use test datasets and evaluation metrics to continuously monitor accuracy and identify failure patterns.
- Keep Modules Modular and Reusable: Break your pipeline into smaller DSPy modules so you can debug and scale components independently.
- Handle Edge Cases and Ambiguity: Add fallback logic or clarification steps when user queries are vague or incomplete.
- Optimize for SQL Dialect Compatibility: Ensure generated queries match your database system (MySQL, PostgreSQL, etc.) to avoid syntax errors in production.
Conclusion
Building a Text-to-SQL app with DSPy demonstrates how structured LLM programming outperforms traditional prompt engineering. By using signatures, modules, and optimizers, you can create scalable and reliable AI systems. As AI adoption grows, frameworks like DSPy will become essential for production-ready applications.
FAQs
What is DSPy used for?
DSPy is used to build structured, reliable LLM applications such as chatbots, RAG systems, and Text-to-SQL tools.
Is DSPy better than prompt engineering?
Yes, for production use cases because it enables automation and scalability.
Do I need SQL knowledge for Text-to-SQL apps?
Basic SQL understanding is recommended to validate outputs.
Can DSPy be used with any LLM?
Yes, DSPy supports multiple LLM providers through configurable interfaces.



Did you enjoy this article?