Replicate
Replicate is an API for machine learning models. It currently hosts models like Llama v2, Gemma, and Mistral/Mixtral.
To run a model, specify the Replicate model name and version, like so:
replicate:replicate/llama70b-v2-chat:e951f18578850b652510200860fc4ea62b3b16fac280f83ff32282f87bbd2e48
Examples
Here's an example of using Llama on Replicate. For Llama, the version hash and everything under config are optional:
providers:
  - id: replicate:meta/llama-2-7b-chat
    config:
      temperature: 0.01
      max_length: 1024
      prompt:
        prefix: '[INST] '
        suffix: ' [/INST]'
Here's an example of using Gemma on Replicate. Note that unlike Llama, it does not have a default version, so we specify the model version:
providers:
  - id: replicate:google-deepmind/gemma-7b-it:2790a695e5dcae15506138cc4718d1106d0d475e6dca4b1d43f42414647993d5
    config:
      temperature: 0.01
      max_new_tokens: 1024
      prompt:
        prefix: "<start_of_turn>user\n"
        suffix: "<end_of_turn>\n<start_of_turn>model"
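The two provider entries above can also sit in one config file, so the same prompt runs against both models. This is a minimal sketch, assuming the prompts and tests keys behave as they do in the image example further down this page. The question variable and its value are made up for illustration.

```yaml
# Sketch: compare Llama and Gemma on the same prompt.
# The `question` variable and its value are illustrative only.
prompts:
  - '{{question}}'

providers:
  # Llama: version hash omitted, so the default version is used.
  - id: replicate:meta/llama-2-7b-chat
    config:
      temperature: 0.01
      max_length: 1024
      prompt:
        prefix: '[INST] '
        suffix: ' [/INST]'
  # Gemma: no default version, so the version hash is required.
  - id: replicate:google-deepmind/gemma-7b-it:2790a695e5dcae15506138cc4718d1106d0d475e6dca4b1d43f42414647993d5
    config:
      temperature: 0.01
      max_new_tokens: 1024
      prompt:
        prefix: "<start_of_turn>user\n"
        suffix: "<end_of_turn>\n<start_of_turn>model"

tests:
  - vars:
      question: What is the capital of France?
```

Each provider keeps its own prompt wrapper, so the same raw prompt is formatted the way each model expects.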
Configuration
The Replicate provider supports several configuration options for customizing model behavior:
Parameter | Description |
---|---|
temperature | Controls randomness in the generation process. |
max_length | Specifies the maximum length of the generated text. |
max_new_tokens | Limits the number of new tokens to generate. |
top_p | Nucleus sampling: a float between 0 and 1. |
top_k | Top-k sampling: number of highest probability tokens to keep. |
repetition_penalty | Penalizes repetition of words in the generated text. |
system_prompt | Sets a system-level prompt for all requests. |
stop_sequences | Specifies stopping sequences that halt the generation. |
seed | Sets a seed for reproducible results. |
Not every model supports every completion parameter. Be sure to review the API provided by the model beforehand.
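As a rough illustration of how several of these options might be combined, here is a sketch using the Llama model from the earlier example. The values are arbitrary. Whether this particular model accepts every parameter shown is an assumption to check against its API on Replicate.

```yaml
# Sketch only: values are illustrative, and parameter support varies by model.
providers:
  - id: replicate:meta/llama-2-7b-chat
    config:
      temperature: 0.7          # higher values give more varied output
      top_p: 0.9                # nucleus sampling threshold, between 0 and 1
      top_k: 50                 # keep the 50 highest-probability tokens
      repetition_penalty: 1.1   # discourage repeated words
      max_new_tokens: 256       # cap on newly generated tokens
      system_prompt: You are a concise assistant.
      seed: 42                  # fixed seed for reproducible results
```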
These parameters are supported for all models:
Parameter | Description |
---|---|
apiKey | The API key for authentication with Replicate. |
prompt.prefix | String added before each prompt. Useful for instruction/chat formatting. |
prompt.suffix | String added after each prompt. Useful for instruction/chat formatting. |
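A minimal sketch of these always-available options in a provider entry follows. The apiKey value is a placeholder, not a real token.

```yaml
providers:
  - id: replicate:meta/llama-2-7b-chat
    config:
      # Placeholder value: substitute your own Replicate API key.
      apiKey: YOUR_REPLICATE_API_TOKEN
      prompt:
        prefix: '[INST] '    # added before each prompt
        suffix: ' [/INST]'   # added after each prompt
```

With this wrapper, a prompt of Hello would be sent as [INST] Hello [/INST]. Keeping the key in an environment variable rather than in the file avoids committing it to version control.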
Supported environment variables:
- REPLICATE_API_TOKEN: Your Replicate API key.
- REPLICATE_API_KEY: An alternative to REPLICATE_API_TOKEN for your API key.
- REPLICATE_MAX_LENGTH: Specifies the maximum length of the generated text.
- REPLICATE_TEMPERATURE: Controls randomness in the generation process.
- REPLICATE_REPETITION_PENALTY: Penalizes repetition of words in the generated text.
- REPLICATE_TOP_P: Controls nucleus sampling: a float between 0 and 1.
- REPLICATE_TOP_K: Controls top-k sampling: the number of highest probability vocabulary tokens to keep for top-k filtering.
- REPLICATE_SEED: Sets a seed for reproducible results.
- REPLICATE_STOP_SEQUENCES: Specifies stopping sequences that halt the generation.
- REPLICATE_SYSTEM_PROMPT: Sets a system-level prompt for all requests.
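One possible way to set the non-secret variables is from the config file itself. This sketch assumes the config loader accepts a top-level env map, as promptfoo-style configs do. That key is not described on this page, so treat it as an assumption and export the variables in your shell if it is not available.

```yaml
# Assumption: a top-level `env` map is supported by the config loader.
# If it is not, export these variables in the shell instead.
env:
  REPLICATE_TEMPERATURE: '0.2'
  REPLICATE_MAX_LENGTH: '512'
  REPLICATE_TOP_P: '0.9'
  REPLICATE_SEED: '42'

providers:
  - id: replicate:meta/llama-2-7b-chat
```

This page does not say how environment variables interact with values set under a provider's config, so verify the precedence before relying on both.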
Images
Image generators such as SDXL can be used like so:
prompts:
  - 'Generate an image: {{subject}}'
providers:
  - id: replicate:image:stability-ai/sdxl:7762fd07cf82c948538e41f63f77d685e02b063e37e496e96eefd46c929f9bdc
    config:
      width: 768
      height: 768
      num_inference_steps: 50
tests:
  - vars:
      subject: fruit loops
Supported Parameters for Images
These parameters are supported for image generation models:
Parameter | Description |
---|---|
width | The width of the generated image. |
height | The height of the generated image. |
refine | The refine style to use. |
apply_watermark | Apply a watermark to the generated image. |
num_inference_steps | The number of inference steps to use during image generation. |
Not every model supports every image parameter. Be sure to review the API provided by the model beforehand.
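The sketch below shows the image parameters from the table applied to the SDXL model used earlier. The refine value is an assumption based on the options SDXL lists in its Replicate schema, and the other values are illustrative. Confirm all of them against the model's API page.

```yaml
providers:
  - id: replicate:image:stability-ai/sdxl:7762fd07cf82c948538e41f63f77d685e02b063e37e496e96eefd46c929f9bdc
    config:
      width: 1024
      height: 1024
      # Assumed value: check the model's schema for the accepted refine styles.
      refine: expert_ensemble_refiner
      apply_watermark: false
      num_inference_steps: 25   # fewer steps are faster, usually at some cost in detail
```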
Supported environment variables for images:
- REPLICATE_API_TOKEN: Your Replicate API key.
- REPLICATE_API_KEY: An alternative to REPLICATE_API_TOKEN for your API key.