# Bedrock

The `bedrock` provider lets you use Amazon Bedrock in your evals. This is a common way to access Anthropic's Claude, Meta's Llama 3.1, AI21's Jamba, and other models. The complete list of available models can be found in the Amazon Bedrock documentation.
## Setup

1. Ensure you have access to the desired models under the Providers page in Amazon Bedrock.

2. Install `@aws-sdk/client-bedrock-runtime`:

   ```sh
   npm install -g @aws-sdk/client-bedrock-runtime
   ```

3. The AWS SDK will automatically pull credentials from the following locations:

   - IAM roles on EC2
   - `~/.aws/credentials`
   - `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` environment variables

   See the AWS guide on setting Node.js credentials for more details.

4. Edit your configuration file to point to the AWS Bedrock provider. Here's an example:

   ```yaml
   providers:
     - id: bedrock:anthropic.claude-3-5-sonnet-20241022-v2:0
   ```

   Note that the provider id is `bedrock:` followed by the ARN or model id of the model.

5. Additional config parameters are passed like so:

   ```yaml
   providers:
     - id: bedrock:anthropic.claude-3-5-sonnet-20241022-v2:0
       config:
         accessKeyId: YOUR_ACCESS_KEY_ID
         secretAccessKey: YOUR_SECRET_ACCESS_KEY
         region: 'us-west-2'
         max_tokens: 256
         temperature: 0.7
   ```
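To make the provider id format concrete, here is a minimal sketch of how a `bedrock:` id decomposes into its prefix and model id. This is illustrative only, not promptfoo's internal parsing; the key point is that the model id itself may contain colons (such as the trailing version segment), so only the first colon separates the prefix.

```python
# Illustrative sketch of the "bedrock:<model id>" provider id format.
# This is NOT promptfoo's actual parsing code.
provider_id = "bedrock:anthropic.claude-3-5-sonnet-20241022-v2:0"

# Split on the first colon only; the model id keeps its own ":0" version suffix.
prefix, model_id = provider_id.split(":", 1)

print(prefix)    # bedrock
print(model_id)  # anthropic.claude-3-5-sonnet-20241022-v2:0
```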
## Example

See GitHub for full examples of Claude, AI21, Llama 3.1, and Titan model usage.

```yaml
prompts:
  - 'Write a tweet about {{topic}}'

providers:
  - id: bedrock:meta.llama3-1-405b-instruct-v1:0
    config:
      region: 'us-east-1'
      temperature: 0.7
      max_tokens: 256
  - id: bedrock:ai21.jamba-1-5-large-v1:0
    config:
      region: 'us-east-1'
      temperature: 0.7
      max_tokens: 256
  - id: bedrock:anthropic.claude-3-5-sonnet-20241022-v2:0
    config:
      region: 'us-east-1'
      temperature: 0.7
      max_tokens: 256

tests:
  - vars:
      topic: Our eco-friendly packaging
  - vars:
      topic: A sneak peek at our secret menu item
  - vars:
      topic: Behind-the-scenes at our latest photoshoot
```
## Model-specific Configuration

Different models may support different configuration options. Here are some model-specific parameters:
### AI21 Models

For AI21 models (e.g., `ai21.jamba-1-5-mini-v1:0`, `ai21.jamba-1-5-large-v1:0`), you can use the following configuration options:

```yaml
config:
  max_tokens: 256
  temperature: 0.7
  top_p: 0.9
  frequency_penalty: 0.5
  presence_penalty: 0.3
```
### Claude Models

For Claude models (e.g., `anthropic.claude-3-5-sonnet-20241022-v2:0`), you can use the following configuration options:

```yaml
config:
  max_tokens: 256
  temperature: 0.7
  anthropic_version: 'bedrock-2023-05-31'
  tools: [...] # Optional: Specify available tools
  tool_choice: { ... } # Optional: Specify tool choice
```
### Titan Models

For Titan models (e.g., `amazon.titan-text-express-v1`), you can use the following configuration options. Note that Titan parameters use camelCase names (`maxTokenCount`, `topP`, `stopSequences`), unlike most other model families:

```yaml
config:
  maxTokenCount: 256
  temperature: 0.7
  topP: 0.9
  stopSequences: ['END']
```
### Llama Models

For Llama models (e.g., `meta.llama3-1-70b-instruct-v1:0`), you can use the following configuration options:

```yaml
config:
  max_gen_len: 256
  temperature: 0.7
  top_p: 0.9
```
### Cohere Models

For Cohere models (e.g., `cohere.command-text-v14`), you can use the following configuration options:

```yaml
config:
  max_tokens: 256
  temperature: 0.7
  p: 0.9
  k: 0
  stop_sequences: ['END']
```
### Mistral Models

For Mistral models (e.g., `mistral.mistral-7b-instruct-v0:2`), you can use the following configuration options:

```yaml
config:
  max_tokens: 256
  temperature: 0.7
  top_p: 0.9
  top_k: 50
```
## Model-graded tests

You can use Bedrock models to grade outputs. By default, model-graded tests use OpenAI and require the `OPENAI_API_KEY` environment variable to be set. When using AWS Bedrock, however, you can override the grader for model-graded assertions to point to AWS Bedrock or another provider.

Note that because of how model-graded evals are implemented, the grading model must support chat-formatted prompts (except for embedding or classification models).

To set this for all your test cases, add the `defaultTest` property to your config:

```yaml
defaultTest:
  options:
    provider:
      id: provider:chat:modelname
      config:
        temperature: 0
        # Other provider config options
```
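For example, to grade with Claude on Bedrock, the override might look like the following (assuming the model is enabled in your account and region):

```yaml
defaultTest:
  options:
    provider:
      id: bedrock:anthropic.claude-3-5-sonnet-20241022-v2:0
      config:
        region: us-east-1
        temperature: 0
```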
You can also do this for individual assertions:

```yaml
# ...
assert:
  - type: llm-rubric
    value: Do not mention that you are an AI or chat assistant
    provider:
      text:
        id: provider:chat:modelname
        config:
          region: us-east-1
          temperature: 0
          # Other provider config options...
```
Or for individual tests:

```yaml
# ...
tests:
  - vars:
      # ...
    options:
      provider:
        id: provider:chat:modelname
        config:
          temperature: 0
          # Other provider config options
    assert:
      - type: llm-rubric
        value: Do not mention that you are an AI or chat assistant
```
## Embeddings

To override the embeddings provider for all assertions that require embeddings (such as similarity), use `defaultTest`:

```yaml
defaultTest:
  options:
    provider:
      embedding:
        id: bedrock:embeddings:amazon.titan-embed-text-v2:0
        config:
          region: us-east-1
```
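For context, one assertion type that relies on this embeddings provider is `similar`, which compares the output against a reference string by embedding similarity. A minimal example (the reference text and threshold here are placeholders):

```yaml
assert:
  - type: similar
    value: The expected output text
    threshold: 0.8
```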
## Environment Variables

The following environment variables can be used to configure the Bedrock provider:

- `AWS_BEDROCK_REGION`: Default region for Bedrock API calls
- `AWS_BEDROCK_MAX_TOKENS`: Default maximum number of tokens to generate
- `AWS_BEDROCK_TEMPERATURE`: Default temperature for generation
- `AWS_BEDROCK_TOP_P`: Default top_p value for generation
- `AWS_BEDROCK_STOP`: Default stop sequences (as a JSON string)
- `AWS_BEDROCK_FREQUENCY_PENALTY`: Default frequency penalty (for supported models)
- `AWS_BEDROCK_PRESENCE_PENALTY`: Default presence penalty (for supported models)

Model-specific environment variables:

- `MISTRAL_MAX_TOKENS`, `MISTRAL_TEMPERATURE`, `MISTRAL_TOP_P`, `MISTRAL_TOP_K`: For Mistral models
- `COHERE_TEMPERATURE`, `COHERE_P`, `COHERE_K`, `COHERE_MAX_TOKENS`: For Cohere models

These environment variables can be overridden by the configuration specified in the YAML file.
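The precedence described above (YAML config wins over environment-variable defaults) and the JSON format of `AWS_BEDROCK_STOP` can be sketched as follows. This is an illustrative model of the behavior, not promptfoo's actual resolution code:

```python
import json
import os

# Simulated environment defaults (AWS_BEDROCK_STOP is a JSON string).
os.environ["AWS_BEDROCK_TEMPERATURE"] = "0.5"
os.environ["AWS_BEDROCK_STOP"] = '["END"]'

def resolve(config, key, env_name, parse=lambda v: v):
    """Return the explicit config value if present, else fall back to the env var."""
    if key in config:
        return config[key]
    raw = os.environ.get(env_name)
    return parse(raw) if raw is not None else None

# YAML config sets temperature explicitly, but not stop sequences.
config = {"temperature": 0.7}

temperature = resolve(config, "temperature", "AWS_BEDROCK_TEMPERATURE", float)
stop = resolve(config, "stop", "AWS_BEDROCK_STOP", json.loads)

print(temperature)  # 0.7 -- config value wins over the env default of 0.5
print(stop)         # ['END'] -- falls back to the env var, parsed as JSON
```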