Bedrock

The bedrock provider lets you use Amazon Bedrock in your evals. This is a common way to access Anthropic's Claude and other models. The complete list of available models can be found in the Amazon Bedrock documentation.

Setup

  1. Ensure you have access to the desired models under the Providers page in Amazon Bedrock.

  2. Install @aws-sdk/client-bedrock-runtime:

    npm install -g @aws-sdk/client-bedrock-runtime
  3. The AWS SDK will automatically pull credentials from the following locations:

    • IAM roles on EC2
    • ~/.aws/credentials
    • AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables

    See Setting Node.js credentials (AWS) in the AWS documentation for more details. A minimal sketch of the environment-variable approach follows this list.

  4. Edit your configuration file to point to the AWS Bedrock provider. Here's an example:

    providers:
      - id: bedrock:anthropic.claude-3-5-sonnet-20240620-v1:0

    Note that the provider id is bedrock: followed by the model ID or ARN of the model.

  5. Additional config parameters are passed like so:

    providers:
      - id: bedrock:anthropic.claude-3-5-sonnet-20240620-v1:0
        config:
          region: 'us-west-2'
          temperature: 0.7
          max_tokens: 256
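
If you use environment variables for credentials, you can export them before running your eval. This is a minimal sketch with placeholder values; AWS_REGION is optional here, since the region can also be set in the provider's config as shown above:

export AWS_ACCESS_KEY_ID=your-access-key-id
export AWS_SECRET_ACCESS_KEY=your-secret-access-key
export AWS_REGION=us-west-2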

Example

See GitHub for full examples of Claude and Titan model usage.

prompts:
  - 'Write a tweet about {{topic}}'

providers:
  - id: bedrock:anthropic.claude-instant-v1
    config:
      region: 'us-east-1'
      temperature: 0.7
      max_tokens_to_sample: 256
  - id: bedrock:anthropic.claude-3-haiku-20240307-v1:0
    config:
      region: 'us-east-1'
      temperature: 0.7
      max_tokens: 256
  - id: bedrock:anthropic.claude-3-5-sonnet-20240620-v1:0
    config:
      region: 'us-east-1'
      temperature: 0.7
      max_tokens: 256

tests:
  - vars:
      topic: Our eco-friendly packaging
  - vars:
      topic: A sneak peek at our secret menu item
  - vars:
      topic: Behind-the-scenes at our latest photoshoot
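
With this config saved as promptfooconfig.yaml (the default filename promptfoo looks for), you can run the eval from the same directory:

npx promptfoo@latest eval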

Model-graded Tests

By default, model-graded tests use OpenAI and require the OPENAI_API_KEY environment variable to be set. When using AWS Bedrock, you can override the grader for model-graded assertions to point to AWS Bedrock or another provider.

Note that because of how model-graded evals are implemented, the LLM grading models must support chat-formatted prompts (except for embedding or classification models).

To set this for all your test cases, add the defaultTest property to your config:

defaultTest:
  options:
    provider:
      id: provider:chat:modelname
      config:
        # Provider config options
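
For example, here is a sketch that points the grader at the Claude 3.5 Sonnet model used earlier; substitute any Bedrock chat model you have access to:

defaultTest:
  options:
    provider:
      id: bedrock:anthropic.claude-3-5-sonnet-20240620-v1:0
      config:
        region: us-east-1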

You can also do this for individual assertions:

# ...
assert:
  - type: llm-rubric
    value: Do not mention that you are an AI or chat assistant
    provider:
      text:
        id: provider:chat:modelname
        config:
          region: us-east-1
          # Other provider config options...

Or for individual tests:

# ...
tests:
  - vars:
      # ...
    options:
      provider:
        id: provider:chat:modelname
        config:
          # Provider config options
    assert:
      - type: llm-rubric
        value: Do not mention that you are an AI or chat assistant

Embeddings

To override the embeddings provider for all assertions that require embeddings (such as similarity), use defaultTest:

defaultTest:
  options:
    provider:
      embedding:
        id: bedrock:embeddings:amazon.titan-embed-text-v2:0
        config:
          region: us-east-1
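
With that override in place, assertions that rely on embeddings are scored using the Titan model above. Here is a minimal sketch of such a test using the similar assertion type; the expected value and threshold are illustrative:

tests:
  - vars:
      topic: Our eco-friendly packaging
    assert:
      - type: similar
        value: A tweet about eco-friendly packaging
        threshold: 0.8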