Google Vertex

The vertex provider is compatible with Google's Vertex AI platform, which provides access to models such as Gemini and Bison.

You can use it by specifying any of the available stable or latest model versions offered by Vertex AI. These include:

  • vertex:chat-bison
  • vertex:chat-bison@001
  • vertex:chat-bison@002
  • vertex:chat-bison-32k
  • vertex:chat-bison-32k@001
  • vertex:chat-bison-32k@002
  • vertex:codechat-bison
  • vertex:codechat-bison@001
  • vertex:codechat-bison@002
  • vertex:codechat-bison-32k
  • vertex:codechat-bison-32k@001
  • vertex:codechat-bison-32k@002
  • vertex:gemini-pro
  • vertex:gemini-ultra
  • vertex:gemini-1.0-pro-vision
  • vertex:gemini-1.0-pro-vision-001
  • vertex:gemini-1.0-pro
  • vertex:gemini-1.0-pro-001
  • vertex:gemini-1.0-pro-002
  • vertex:gemini-pro-vision
  • vertex:gemini-1.5-pro-latest
  • vertex:gemini-1.5-pro-preview-0409
  • vertex:gemini-1.5-pro-preview-0514
  • vertex:gemini-1.5-pro
  • vertex:gemini-1.5-pro-001
  • vertex:gemini-1.5-flash-preview-0514
  • vertex:gemini-1.5-flash-001
  • vertex:aqa

Embedding models such as vertex:embedding:text-embedding-004 are also supported.
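For example, a minimal promptfoo configuration using one of these models might look like the following sketch (the prompt and test values are illustrative placeholders):

prompts:
  - 'Summarize the following text: {{text}}'

providers:
  - vertex:gemini-1.5-pro

tests:
  - vars:
      text: 'Vertex AI provides access to Gemini models.'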

Tip: If you are using Google AI Studio, see the google provider.

Authenticating with Google

To call Vertex AI models in Node, you'll need to install Google's official auth client as a peer dependency:

npm i google-auth-library

Make sure the Vertex AI API is enabled for the relevant project in Google Cloud. Then, ensure that you've selected that project in the gcloud CLI:

gcloud config set project PROJECT_ID

Next, make sure that you've authenticated to Google Cloud using one of these methods:

  • You are logged into an account using gcloud auth application-default login.
  • You are running on a machine that uses a service account with the appropriate role.
  • You have downloaded the credentials for a service account with the appropriate role and set the GOOGLE_APPLICATION_CREDENTIALS environment variable to the path of the credentials file.
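For local development, the first option is typically the simplest:

gcloud auth application-default login

Or, when using a service account key file (the path below is a placeholder):

export GOOGLE_APPLICATION_CREDENTIALS=/path/to/service-account.json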

Environment variables

  • VERTEX_API_KEY - gcloud API token. The easiest way to get an API key is to run gcloud auth print-access-token.
  • VERTEX_PROJECT_ID - gcloud project ID
  • VERTEX_REGION - gcloud region, defaults to us-central1
  • VERTEX_PUBLISHER - model publisher, defaults to google
  • VERTEX_API_HOST - used to override the full Google API host, e.g. for an LLM proxy, defaults to {region}-aiplatform.googleapis.com
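For example, you might set these before running an eval (the project ID is a placeholder):

export VERTEX_PROJECT_ID=my-project-id
export VERTEX_REGION=us-central1
export VERTEX_API_KEY="$(gcloud auth print-access-token)"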

The Vertex provider also supports configuration options such as context, examples, temperature, maxOutputTokens, and more, which can be used to customize the behavior of the model like so:

providers:
  - id: vertex:chat-bison-32k
    config:
      generationConfig:
        temperature: 0
        maxOutputTokens: 1024

Safety settings

AI safety settings can be configured using the safetySettings key. For example:

- id: vertex:gemini-pro
  config:
    safetySettings:
      - category: HARM_CATEGORY_HARASSMENT
        threshold: BLOCK_ONLY_HIGH
      - category: HARM_CATEGORY_DANGEROUS_CONTENT
        threshold: BLOCK_MEDIUM_AND_ABOVE

See Google's SafetySetting API documentation for more details.

Overriding graders

To use Vertex models for model grading (e.g. llm-rubric and factuality assertions), or to override the embeddings provider for the similar assertion, use defaultTest to override providers for all tests:

defaultTest:
  options:
    provider:
      # Use gemini-pro for model-graded evals (e.g. assertions such as llm-rubric)
      text: vertex:chat:gemini-pro
      # Use vertex embeddings for similarity
      embedding: vertex:embedding:text-embedding-004

Other configuration options

| Option | Description | Default Value |
| --- | --- | --- |
| apiKey | gcloud API token. | None |
| apiHost | Full Google API host, e.g., for an LLM proxy. | {region}-aiplatform.googleapis.com |
| projectId | gcloud project ID. | None |
| region | gcloud region. | us-central1 |
| publisher | Model publisher. | google |
| context | Context for the model to consider when generating responses. | None |
| examples | Examples to prime the model. | None |
| safetySettings | Safety settings to filter generated content. | None |
| generationConfig.temperature | Controls randomness. Lower values make the model more deterministic. | None |
| generationConfig.maxOutputTokens | Maximum number of tokens to generate. | None |
| generationConfig.topP | Nucleus sampling: higher values cause the model to consider more candidates. | None |
| generationConfig.topK | Controls diversity via random sampling: lower values make sampling more deterministic. | None |
| generationConfig.stopSequences | Set of string outputs that will stop output generation. | [] |
| toolConfig | Configuration for tool usage. | None |
| systemInstruction | System prompt. Nunjucks template variables {{var}} are supported. | None |
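For example, here is a sketch combining several of the options above (the model, system instruction, and parameter values are illustrative, not recommendations):

providers:
  - id: vertex:gemini-1.5-pro
    config:
      systemInstruction: 'You are a concise assistant helping with {{topic}}.'
      generationConfig:
        temperature: 0.2
        maxOutputTokens: 512
        topP: 0.95
        stopSequences:
          - 'END'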

Note that not all models support all parameters. Please consult the Google documentation on how to use and format parameters.