Mastering Prompt Optimization and Model Migration with Amazon Bedrock's Advanced Tool

Overview

Amazon Bedrock's Advanced Prompt Optimization helps you refine prompts for any supported model while evaluating performance across up to five models at once. Whether you're migrating from one model to another or trying to improve your current model's output, the tool runs a metric-driven feedback loop and compares the original and optimized prompts. It supports multimodal inputs, including PNG, JPG, and PDF files, which makes it suitable for document and image analysis tasks. You can guide the optimization with a natural language description, an AWS Lambda function, or a custom LLM-as-a-judge rubric. The tool reports evaluation scores, cost estimates, and latency comparisons, so you can see how each prompt and model combination performs.

Source: aws.amazon.com

Prerequisites

Before you begin, ensure you have the following:

An AWS account with access to Amazon Bedrock and the necessary IAM permissions to create prompt optimizations.
Basic familiarity with JSONL format (each JSON object on a single line).
Your prompt templates prepared with example user inputs, ground truth answers, and an evaluation metric (or rewriting guidance).
(Optional) An AWS Lambda function for custom evaluation, or a custom LLM-as-a-judge configuration.
(Optional) Multimodal files (PNG, JPG, PDF) if your prompt template includes image or document inputs.

Step-by-Step Guide

Preparing Your Prompt Templates (JSONL Format)

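The core of the optimization process is a JSONL file in which each line is one JSON object representing a prompt template. The example below is expanded across several lines for readability; in the actual file, each object must sit on a single line. Follow this schema: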

{
    "version": "bedrock-2026-05-14",
    "templateId": "my-template-1",
    "promptTemplate": "Answer the question based on the context: {{context}}\nQuestion: {{question}}",
    "steeringCriteria": ["Be concise", "Use bullet points if listing"],
    "customEvaluationMetricLabel": "accuracy",
    "customLLMJConfig": {
        "customLLMJPrompt": "Evaluate if the answer matches the ground truth. Score 1 if correct, 0 otherwise.",
        "customLLMJModelId": "amazon.nova-pro-v1:0"
    },
    "evaluationSamples": [
        {
            "inputVariables": {
                "context": "Amazon Bedrock is a fully managed service...",
                "question": "What is Amazon Bedrock?"
            },
            "referenceResponse": "Amazon Bedrock is a fully managed service that makes foundation models accessible via an API."
        }
    ]
}

version: Must be bedrock-2026-05-14 (fixed).
templateId: A unique identifier for your prompt template.
promptTemplate: The prompt text, with placeholders such as {{context}} and {{question}} that are filled from each sample's inputVariables.
steeringCriteria (optional): Additional instructions for the optimizer (e.g., style, tone).
customEvaluationMetricLabel: Required if you provide a custom LLM judge or Lambda ARN.
customLLMJConfig (optional): Define a custom LLM judge with its own prompt and model ID.
evaluationMetricLambdaArn (optional): ARN of a Lambda function that performs evaluation.
evaluationSamples: An array of objects, each containing inputVariables and referenceResponse. For multimodal inputs, include file paths or base64-encoded data in the variables.
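
Because the file must hold one object per line, it can help to build each template in code and let a JSON serializer handle escaping. The following is a minimal sketch using the field names listed above; the helper names build_template_record and to_jsonl are hypothetical, and the {{...}} placeholder syntax is an assumption made for illustration.

```python
import json

SCHEMA_VERSION = "bedrock-2026-05-14"  # fixed value stated in the field list above


def build_template_record(template_id, prompt_template, samples, steering_criteria=None):
    """Assemble one prompt-template object from (input_variables, reference) pairs."""
    record = {
        "version": SCHEMA_VERSION,
        "templateId": template_id,
        "promptTemplate": prompt_template,
        "evaluationSamples": [
            {"inputVariables": variables, "referenceResponse": reference}
            for variables, reference in samples
        ],
    }
    if steering_criteria:
        record["steeringCriteria"] = list(steering_criteria)
    return record


def to_jsonl(records):
    """Serialize each record onto exactly one line; json.dumps escapes embedded newlines."""
    return "".join(json.dumps(record, ensure_ascii=False) + "\n" for record in records)


record = build_template_record(
    template_id="my-template-1",
    # Assumption: placeholders use the {{name}} form and match the inputVariables keys.
    prompt_template="Answer the question based on the context: {{context}}\nQuestion: {{question}}",
    samples=[
        (
            {
                "context": "Amazon Bedrock is a fully managed service...",
                "question": "What is Amazon Bedrock?",
            },
            "Amazon Bedrock is a fully managed service that makes foundation models accessible via an API.",
        )
    ],
    steering_criteria=["Be concise", "Use bullet points if listing"],
)
jsonl_text = to_jsonl([record])
```

Writing jsonl_text to a file then gives one line per template, even when the prompt text itself contains line breaks.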

Tip: If you use images, store them in S3 and reference them by URL or use base64 encoding. The tool supports PNG, JPG, and PDF formats.
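
If you go the base64 route, the encoding step might look like the sketch below. The variable name "image" and the helper names are hypothetical; how the tool expects encoded files to be laid out inside inputVariables is not specified here, so treat the sample shape as an assumption to verify.

```python
import base64
from pathlib import Path


def encode_bytes(data: bytes) -> str:
    """Return the base64 text form of raw file bytes."""
    return base64.b64encode(data).decode("ascii")


def encode_file(path) -> str:
    """Read a PNG, JPG, or PDF from disk and base64-encode its contents."""
    return encode_bytes(Path(path).read_bytes())


def build_multimodal_sample(question, image_b64, reference):
    """Build one evaluation sample; "image" is a hypothetical variable name."""
    return {
        "inputVariables": {"question": question, "image": image_b64},
        "referenceResponse": reference,
    }


# The 8-byte PNG signature stands in for a real file so the example runs anywhere.
png_signature = b"\x89PNG\r\n\x1a\n"
sample = build_multimodal_sample(
    "What does the chart show?",
    encode_bytes(png_signature),
    "A placeholder reference answer.",
)
```

Base64 inflates data by roughly a third, which is one reason large files are better referenced from S3.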

Defining Evaluation Metrics

You must specify how the optimization will measure success. Choose one of these methods:

Natural language description: Provide a simple text goal (e.g., “Answers should be factual and under 100 words”). The optimizer interprets this goal internally.
AWS Lambda function: Write a Lambda that accepts the prompt, model response, and reference answer, returning a score. This is ideal for domain-specific metrics.
Custom LLM-as-a-judge: Configure an LLM (using Bedrock model IDs) to evaluate responses. You define the judge’s prompt.

Whichever method you choose, the optimizer uses the feedback loop to iteratively improve the prompt until the evaluation metric converges.
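
To make the Lambda option more concrete, here is a rough sketch of an evaluation function that scores a response by token overlap (F1) with the reference answer. The event keys "modelResponse" and "referenceResponse" and the {"score": ...} return shape are assumptions, not the documented contract; check the Bedrock documentation for the actual payload before deploying anything like this.

```python
import re
from collections import Counter


def _tokens(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def token_f1(response, reference):
    """Token-overlap F1 between a model response and the reference answer."""
    resp, ref = _tokens(response), _tokens(reference)
    if not resp or not ref:
        return 0.0
    overlap = sum((Counter(resp) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(resp)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def lambda_handler(event, context):
    """Assumed event shape: {"prompt": ..., "modelResponse": ..., "referenceResponse": ...}."""
    score = token_f1(
        event.get("modelResponse", ""),
        event.get("referenceResponse", ""),
    )
    return {"score": round(score, 4)}
```

Token overlap is only a stand-in; a domain-specific metric (exact match on extracted fields, schema validation, and so on) would slot into the same handler.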

Selecting Models for Optimization

In the Bedrock console, navigate to Advanced Prompt Optimization and click Create prompt optimization. You can select up to five inference models:

Baseline model: Your current model (used for comparison).
Up to 4 target models: The models you’re considering migrating to, or additional models whose optimized prompts you want to compare.

If you are not migrating, simply select your current model alone. The tool will then evaluate your original prompt alongside an optimized version for that model.
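
If you plan model selections in a script before opening the console, a small check like the following can enforce the limit of one baseline plus up to four targets. This is a local planning aid only; build_model_selection is a hypothetical helper, not part of any Bedrock SDK, and the second model ID is just an example.

```python
MAX_MODELS = 5  # one baseline plus up to four target models


def build_model_selection(baseline, targets=()):
    """Validate a planned selection and return it as a plain dict."""
    if not baseline:
        raise ValueError("a baseline model is required")
    # Drop duplicates and the baseline itself while preserving order.
    unique_targets = list(dict.fromkeys(t for t in targets if t != baseline))
    if len(unique_targets) > MAX_MODELS - 1:
        raise ValueError(f"at most {MAX_MODELS - 1} target models are allowed")
    return {"baseline": baseline, "targets": unique_targets}


# Optimizing in place: the current model alone, no targets.
in_place = build_model_selection("amazon.nova-pro-v1:0")

# Migration: compare the baseline against one candidate model.
migration = build_model_selection("amazon.nova-pro-v1:0", ["amazon.nova-lite-v1:0"])
```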

Running the Optimization

Once your JSONL file is ready and your models are selected:

Upload the JSONL file in the console or provide an S3 path.
Set the evaluation method (choose from the three options above).
Click Start optimization.

The process runs a metric-driven loop: it tests variations of your prompt, evaluates the results using your defined metric, and keeps refining until the metric converges. The duration depends on the number of samples, models, and iterations. You can monitor progress in the console.
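
The general shape of such a loop can be illustrated with a toy version. This is not how Bedrock implements it: the propose and score functions below are deliberately simplistic stand-ins, and every name is hypothetical. The sketch only shows the propose, evaluate, keep-the-best, stop-on-convergence pattern.

```python
def optimize_prompt(initial_prompt, propose, score, max_iterations=5, min_gain=0.01):
    """Greedy loop: keep the best-scoring candidate until gains fall below min_gain."""
    best_prompt, best_score = initial_prompt, score(initial_prompt)
    for _ in range(max_iterations):
        candidates = propose(best_prompt)
        if not candidates:
            break  # nothing left to try
        top = max(candidates, key=score)
        top_score = score(top)
        if top_score - best_score < min_gain:
            break  # metric has converged
        best_prompt, best_score = top, top_score
    return best_prompt, best_score


CRITERIA = ["Be concise", "Use bullet points if listing"]


def toy_propose(prompt):
    """Toy variation step: append one steering criterion that is not yet present."""
    return [f"{prompt} {c}." for c in CRITERIA if c not in prompt]


def toy_score(prompt):
    """Toy metric: fraction of steering criteria mentioned in the prompt."""
    return sum(c in prompt for c in CRITERIA) / len(CRITERIA)


best_prompt, best_score = optimize_prompt("Answer the question.", toy_propose, toy_score)
```

In the real tool, the scoring step is whichever evaluation method you configured, applied to model responses over your evaluation samples rather than to the prompt text itself.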

Interpreting Results

After completion, you’ll see a comparison dashboard showing:

Original prompt and optimized prompt for each model.
Evaluation scores for both versions.
Cost estimates (per 1K inference requests).
Latency (average response time in milliseconds).

Use these metrics to decide whether to adopt the optimized prompt or migrate to a different model. You can also download the optimized prompt templates for further experimentation.
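
One way to turn the dashboard numbers into a decision is to filter by cost and latency budgets, then take the highest score. The sketch below assumes you have copied the results into a script by hand; the Result fields, the model names, and all numbers are made up for illustration and do not come from a real run.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    model_id: str
    prompt_version: str  # "original" or "optimized"
    score: float
    cost_per_1k_requests: float  # in your billing currency
    avg_latency_ms: float


def pick_best(results, max_cost=None, max_latency_ms=None):
    """Return the highest-scoring result within budget; ties go to cheaper, then faster."""
    eligible = [
        r for r in results
        if (max_cost is None or r.cost_per_1k_requests <= max_cost)
        and (max_latency_ms is None or r.avg_latency_ms <= max_latency_ms)
    ]
    if not eligible:
        return None
    return max(eligible, key=lambda r: (r.score, -r.cost_per_1k_requests, -r.avg_latency_ms))


# Hypothetical values, not output from an actual optimization job.
results = [
    Result("model-a", "original", 0.71, 4.00, 900.0),
    Result("model-a", "optimized", 0.83, 4.20, 950.0),
    Result("model-b", "optimized", 0.80, 1.50, 600.0),
]

best_overall = pick_best(results)
best_on_budget = pick_best(results, max_cost=2.00)
```

With these made-up numbers, the optimized prompt on model-a wins on score alone, while a cost ceiling of 2.00 shifts the choice to model-b.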

Common Mistakes and Troubleshooting

Incorrect JSONL formatting: Ensure each object is on a single line and that you don’t have trailing commas. Validate your JSONL with a linter before uploading.
Missing evaluation metric: If you omit both customLLMJConfig and evaluationMetricLambdaArn, the optimizer requires a natural language description in the steeringCriteria field—otherwise it will fail.
Reference responses not representative: Your ground truth answers must match the expected output format. Inconsistent or vague references confuse the optimizer.
Overly complex steering criteria: Too many contradictory instructions can degrade optimization. Stick to 2-3 clear criteria.
Ignoring multimodal limits: While the tool supports PNG, JPG, and PDF, very large files may exceed request limits. Compress images or use smaller resolutions.
Model selection mismatch: Not all models support the same input modalities (e.g., some models don’t accept images). Check model compatibility beforehand.
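
Several of these mistakes can be caught locally before uploading. The following is a small validation sketch; the set of required fields and the metric rules are inferred from this guide rather than from an official schema, so adjust them if the service documentation says otherwise.

```python
import json

# Inferred from the field descriptions in this guide; an assumption, not an official schema.
REQUIRED_FIELDS = ("version", "templateId", "promptTemplate", "evaluationSamples")
CUSTOM_METRIC_FIELDS = ("customLLMJConfig", "evaluationMetricLambdaArn")


def validate_jsonl(text):
    """Return a list of (line_number, message) problems; an empty list means none were found."""
    problems = []
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            # Trailing commas and objects split across lines end up here.
            problems.append((number, f"invalid JSON: {exc.msg}"))
            continue
        if not isinstance(record, dict):
            problems.append((number, "line is not a JSON object"))
            continue
        for field in REQUIRED_FIELDS:
            if field not in record:
                problems.append((number, f"missing field: {field}"))
        has_custom = any(f in record for f in CUSTOM_METRIC_FIELDS)
        if not has_custom and not record.get("steeringCriteria"):
            problems.append((number, "no evaluation metric or steeringCriteria provided"))
        if has_custom and "customEvaluationMetricLabel" not in record:
            problems.append((number, "customEvaluationMetricLabel is required with a custom metric"))
        samples = record.get("evaluationSamples")
        for index, sample in enumerate(samples if isinstance(samples, list) else []):
            if not isinstance(sample, dict) or not {"inputVariables", "referenceResponse"} <= sample.keys():
                problems.append((number, f"sample {index} lacks inputVariables or referenceResponse"))
    return problems
```

Calling validate_jsonl(open("templates.jsonl").read()) on your own file (the filename is a placeholder) returns an empty list when none of these checks fail.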

Summary

Amazon Bedrock Advanced Prompt Optimization empowers you to systematically improve prompts and compare models with minimal guesswork. By preparing well-structured JSONL templates, choosing a suitable evaluation method, and selecting up to five models, you can quickly identify the best combination of prompt and model for your use case. The tool’s built-in metrics—score, cost, and latency—give you data-driven confidence for migration or optimization decisions. Avoid common pitfalls like malformed JSON or missing evaluation metrics, and you’ll be on your way to more reliable, performant AI applications on Bedrock.
