Evaluating LLM text-to-SQL performance

Promptfoo is a command-line tool that allows you to test and validate text-to-SQL conversions.

This guide will walk you through setting up an eval harness that will help you improve the quality of your text-to-SQL prompts.

The end result is a view that looks like this:

text to sql evaluation

Configuration

Start by creating a blank promptfooconfig.yaml file (optionally, generate a placeholder using npx promptfoo@latest init).

Step 1: Define the Prompt(s)

Specify the text prompts that will be used to generate the SQL queries. Use {{placeholders}} for variables that will be replaced with actual values during testing.

prompts:
  - |
    Output a SQL query that returns the number of {{product}} sold in the last month.

    Database schema:
    {{database}}

    Only output SQL code.

If you'd like, you can reference prompts in an external file:

prompts:
  - file://path/to/my_prompt.txt
  - file://path/to/another_prompt.json

Step 2: Specify the Providers

Define one or more language model providers to use. For example, here we compare the performance between GPT 3.5 and GPT 4:

providers:
  - openai:gpt-4o-mini
  - openai:gpt-4o

A wide variety of LLM APIs are supported, including local models. See providers for more information.

Step 3: Define the Tests

Create test cases to validate the generated SQL queries. Each test case includes:

vars: Variables used in the prompt template.
assert: Assertions to used to validate the output.

Basic SQL Validation

This test checks produces a query for bananas (remember our prompt above) and confirms that the generated output is valid SQL.

- vars:
    product: bananas
    database: file://database.sql
  assert:
    - type: is-sql

tip

Use contains-sql instead of is-sql to allow responses that contain text with SQL code blocks.

Table-Specific SQL Validation

This test ensures the SQL query only uses specified tables (Products and Shipments).

- vars:
    product: apples
    database: file://database.sql
  assert:
    - type: is-sql
      value:
        databaseType: 'MySQL'
        allowedTables:
          - select::null::Products
          - select::null::Shipments

The format for allowed notation is {type}::{tableName}::{columnName}, and null can be used to allow any.

Column-Specific SQL Validation

This test is expected to fail since the DoesntExist column is not present in the database:

- vars:
    product: oranges
    database: file://database.sql
  assert:
    - type: is-sql
      value:
        databaseType: 'MySQL'
        allowedColumns:
          - select::null::DoesntExist

Step 4: Define the Database Schema

Define the structure of your database in a separate SQL file (database.sql).

CREATE DATABASE IF NOT EXISTS ShipmentSystem;

USE ShipmentSystem;

CREATE TABLE IF NOT EXISTS Products (
    product_id INT AUTO_INCREMENT PRIMARY KEY,
    name VARCHAR(255) NOT NULL,
    description TEXT,
    price DECIMAL(10, 2) NOT NULL,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

CREATE TABLE IF NOT EXISTS Shipments (
    shipment_id INT AUTO_INCREMENT PRIMARY KEY,
    product_id INT NOT NULL,
    quantity INT NOT NULL,
    shipment_date DATE NOT NULL,
    status VARCHAR(50) NOT NULL,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    FOREIGN KEY (product_id) REFERENCES Products(product_id)
);

CREATE TABLE IF NOT EXISTS ShipmentDetails (
    detail_id INT AUTO_INCREMENT PRIMARY KEY,
    shipment_id INT NOT NULL,
    location VARCHAR(255) NOT NULL,
    status VARCHAR(50) NOT NULL,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
    FOREIGN KEY (shipment_id) REFERENCES Shipments(shipment_id)
);

Final Configuration

Combine all the steps into a final configuration file (promptfooconfig.yaml):

description: 'Is-SQL example'

prompts:
  - |
    Output a SQL query that returns the number of {{product}} sold in the last month.

    Database schema:
    {{database}}

    Only output SQL code.

providers:
  - openai:gpt-4o-mini

tests:
  - vars:
      product: bananas
      database: file://database.sql
    assert:
      - type: is-sql

  - vars:
      product: apples
      database: file://database.sql
    assert:
      - type: is-sql
        value:
          databaseType: 'MySQL'
          allowedTables:
            - select::null::Products
            - select::null::Shipments

  - vars:
      product: oranges
      database: file://database.sql
    assert:
      - type: is-sql
        value:
          databaseType: 'MySQL'
          allowedColumns:
            - select::null::DoesntExist

Running Tests

Run your tests:

npx promptfoo@latest eval

This will generate a summary of outputs in your terminal.

Review results

Use the web viewer:

npx promptfoo@latest view

This will open your test results and allow you to refine your prompts and compare model performance.

text to sql evaluation

Configuration​

Step 1: Define the Prompt(s)​

Step 2: Specify the Providers​

Step 3: Define the Tests​

Basic SQL Validation​

Table-Specific SQL Validation​

Column-Specific SQL Validation​

Step 4: Define the Database Schema​

Final Configuration​

Running Tests​

Review results​