# IBM BAM

The `bam` provider integrates with IBM's BAM API, giving access to models such as `meta-llama/llama-2-70b-chat` and `ibm/granite-13b-chat-v2`.

## Setup

This provider requires the IBM Generative AI Node SDK:

```sh
npm install @ibm-generative-ai/node-sdk
```

## Configuration

Configure the BAM provider by specifying the model and generation parameters in your configuration file. For example:

```yaml
providers:
  - id: bam:chat:meta-llama/llama-2-70b-chat
    config:
      temperature: 0.01
      max_new_tokens: 1024
      prompt:
        prefix: '[INST] '
        suffix: '[/INST] '
  - id: bam:chat:ibm/granite-13b-chat-v2
    config:
      temperature: 0.01
      max_new_tokens: 1024
      prompt:
        prefix: '[INST] '
        suffix: '[/INST] '
```

## Authentication

To use the BAM provider, set the `BAM_API_KEY` environment variable or specify `apiKey` directly in the provider configuration. The key can also be read from a custom environment variable named in the `apiKeyEnvar` field.

```sh
export BAM_API_KEY='your-bam-api-key'
```
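
Alternatively, the key can live in the provider config via the `apiKey` or `apiKeyEnvar` fields mentioned above. A sketch (values below are placeholders):

```yaml
providers:
  - id: bam:chat:ibm/granite-13b-chat-v2
    config:
      apiKey: your-bam-api-key # inline key; avoid committing real keys
      # or resolve the key from a custom environment variable:
      # apiKeyEnvar: MY_BAM_API_KEY
```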

## API Client Initialization

The BAM provider initializes an API client using the IBM Generative AI Node SDK. The endpoint for the BAM API is configured to `https://bam-api.res.ibm.com/`.
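
For reference, here is a minimal sketch of the initialization the provider performs internally, assuming the SDK's `Client` export and its `apiKey`/`endpoint` constructor options (check the SDK documentation for the exact API):

```typescript
import { Client } from '@ibm-generative-ai/node-sdk';

// Sketch only: assumes `Client` accepts `apiKey` and `endpoint` options.
const client = new Client({
  apiKey: process.env.BAM_API_KEY,
  endpoint: 'https://bam-api.res.ibm.com/',
});
```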

## Configuration Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| `top_k` | number | Controls diversity via random sampling: lower values make sampling more deterministic. |
| `top_p` | number | Nucleus sampling: higher values cause the model to consider more candidates. |
| `typical_p` | number | Controls the "typicality" during sampling, balancing between `top_k` and `top_p`. |
| `beam_width` | number | Sets the beam width for beam search decoding, controlling the breadth of the search. |
| `time_limit` | number | Maximum time in milliseconds the model should take to generate a response. |
| `random_seed` | number | Seed for the random number generator, ensuring reproducibility of the output. |
| `temperature` | number | Controls randomness. Lower values make the model more deterministic. |
| `length_penalty` | object | Adjusts the length of the generated output. Includes `start_index` and `decay_factor`. |
| `max_new_tokens` | number | Maximum number of new tokens to generate. |
| `min_new_tokens` | number | Minimum number of new tokens to generate. |
| `return_options` | object | Options for additional information to return with the output, such as token probabilities. |
| `stop_sequences` | string[] | Array of strings that, if generated, will stop the generation. |
| `decoding_method` | string | Specifies the decoding method, e.g., `greedy` or `sample`. |
| `repetition_penalty` | number | Penalty applied to discourage repetition in the output. |
| `include_stop_sequence` | boolean | Whether to include stop sequences in the output. |
| `truncate_input_tokens` | number | Maximum number of tokens to consider from the input text. |
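
These parameters go in the provider's `config` block alongside the options shown earlier. As an illustrative sketch (parameter names come from the table above; the values are arbitrary), a sampling-based setup might look like:

```yaml
providers:
  - id: bam:chat:ibm/granite-13b-chat-v2
    config:
      decoding_method: sample # sample instead of greedy decoding
      temperature: 0.7
      top_k: 50
      top_p: 0.9
      max_new_tokens: 512
      random_seed: 42 # fixed seed for reproducible output
      stop_sequences: ["\n\n"]
```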

## Moderation Parameters

Moderation settings can also be specified to manage content safety and compliance:

| Parameter | Type | Description |
| --- | --- | --- |
| `hap` | object | Settings for handling hate speech. Can be enabled/disabled and configured with thresholds. |
| `stigma` | object | Settings for handling stigmatizing content. Includes similar configurations as `hap`. |
| `implicit_hate` | object | Settings for managing implicitly hateful content. |

Each moderation parameter can include the sub-parameters `input`, `output`, `threshold`, and `send_tokens` to customize moderation behavior.

Here's an example:

```yaml
providers:
  - id: bam:chat:ibm/granite-13b-chat-v2
    config:
      moderations:
        hap:
          input: true
          output: true
          threshold: 0.9
```
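
Multiple moderation categories can be combined, each with its own sub-parameters. A sketch (the thresholds here are illustrative, not recommendations):

```yaml
providers:
  - id: bam:chat:ibm/granite-13b-chat-v2
    config:
      moderations:
        hap:
          input: true # moderate the prompt
          output: true # moderate the completion
          threshold: 0.9
          send_tokens: true # include token-level details (assumed semantics)
        stigma:
          input: true
          output: true
          threshold: 0.8
```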