本站内所有文档均为中英对照文档，点击中文可以显示英文。此提醒连续关闭5次后，将不再显示。

文档内容来源于 spring.io，由 springdoc.tech 翻译，版权归属于 SPRING.IO (Broadcom. Inc)。可供个人学习、研究，未经许可，不得进行转载或用于商业行为。

NVIDIA 聊天

DeepSeek V3 中英对照 NVIDIA NVIDIA Chat

NVIDIA LLM API 是一个代理 AI 推理引擎，提供来自各种供应商的广泛模型。

Spring AI 通过复用现有的 OpenAI 客户端与 NVIDIA LLM API 进行集成。为此，你需要将 base-url 设置为 [integrate.api.nvidia.com](https://integrate.api.nvidia.com)，选择提供的 LLM 模型之一，并获取其 api-key。

spring ai nvidia llm api 1

备注

NVIDIA LLM API 要求显式设置 max-tokens 参数，否则会抛出服务器错误。

查看 NvidiaWithOpenAiChatModelIT.java 测试，了解如何使用 Spring AI 与 NVIDIA LLM API 的示例。

先决条件

创建一个 NVIDIA 账户，确保有足够的积分。
选择一个要使用的 LLM 模型。例如下图中的 meta/llama-3.1-70b-instruct。
从所选模型的页面中，你可以获取访问该模型的 api-key。

spring ai nvidia 注册

自动配置

Spring AI 为 OpenAI Chat Client 提供了 Spring Boot 自动配置。要启用它，请将以下依赖项添加到项目的 Maven pom.xml 文件中：

<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-openai-spring-boot-starter</artifactId>
</dependency>
xml

或添加到你的 Gradle build.gradle 构建文件中。

dependencies {
    implementation 'org.springframework.ai:spring-ai-openai-spring-boot-starter'
}
groovy

提示

请参考依赖管理部分，将 Spring AI BOM 添加到你的构建文件中。

聊天属性

重试属性

前缀 spring.ai.retry 用作属性前缀，允许你为 OpenAI 聊天模型配置重试机制。

属性	描述	默认值
spring.ai.retry.max-attempts	最大重试次数。	10
spring.ai.retry.backoff.initial-interval	指数退避策略的初始休眠时间。	2 秒
spring.ai.retry.backoff.multiplier	退避间隔的倍数。	5
spring.ai.retry.backoff.max-interval	最大退避时间。	3 分钟
spring.ai.retry.on-client-errors	如果为 false，则抛出 NonTransientAiException，并且不会对 `4xx` 客户端错误代码进行重试。	false
spring.ai.retry.exclude-on-http-codes	不应触发重试的 HTTP 状态代码列表（例如，抛出 NonTransientAiException）。	空
spring.ai.retry.on-http-codes	应触发重试的 HTTP 状态代码列表（例如，抛出 TransientAiException）。	空

连接属性

前缀 spring.ai.openai 用作属性前缀，允许你连接到 OpenAI。

属性	描述	默认值
spring.ai.openai.base-url	连接的 URL。必须设置为 `[integrate.api.nvidia.com](https://integrate.api.nvidia.com)`	-
spring.ai.openai.api-key	NVIDIA API 密钥	-

配置属性

前缀 spring.ai.openai.chat 是一个属性前缀，允许你为 OpenAI 的聊天模型实现进行配置。

属性	描述	默认
spring.ai.openai.chat.enabled	启用 OpenAI 聊天模型。	true
spring.ai.openai.chat.base-url	可选地覆盖 `spring.ai.openai.base-url` 以提供聊天特定的 URL。必须设置为 `[integrate.api.nvidia.com](https://integrate.api.nvidia.com)`。	-
spring.ai.openai.chat.api-key	可选地覆盖 `spring.ai.openai.api-key` 以提供特定于聊天的 API 密钥。	-
spring.ai.openai.chat.options.model	使用的 NVIDIA LLM 模型	-
spring.ai.openai.chat.options.temperature	用于控制生成补全内容表现出的创造性的采样温度。较高的值会使输出更加随机，而较低的值会使结果更加集中和确定性。不建议在同一个补全请求中同时修改 `temperature` 和 `top_p`，因为这两个设置的交互作用难以预测。	0.8
spring.ai.openai.chat.options.frequencyPenalty	数值介于 -2.0 和 2.0 之间。正值会根据新 token 在文本中已有的出现频率对其进行惩罚，从而降低模型逐字重复相同内容的可能性。	0.0f
spring.ai.openai.chat.options.maxTokens	在聊天补全中生成的最大 token 数量。输入 token 和生成 token 的总长度受到模型上下文长度的限制。	注意：NVIDIA LLM API 要求显式设置 `max-tokens` 参数，否则会抛出服务器错误。
spring.ai.openai.chat.options.n	为每个输入消息生成多少个聊天完成选项。请注意，您将根据所有选项中生成的 token 数量进行计费。将 `n` 保持为 1 以最小化成本。	1
spring.ai.openai.chat.options.presencePenalty	数值在 -2.0 到 2.0 之间。正值会根据新 token 是否已出现在当前文本中对其进行惩罚，从而增加模型讨论新话题的可能性。	-
spring.ai.openai.chat.options.responseFormat	一个指定模型必须输出的格式的对象。设置为 `{ "type": "json_object" }` 可以启用 JSON 模式，该模式保证模型生成的消息是有效的 JSON。	-
spring.ai.openai.chat.options.seed	此功能处于测试阶段（Beta）。如果指定了此功能，我们的系统将尽最大努力进行确定性采样，以便使用相同的种子和参数重复请求应返回相同的结果。	-
spring.ai.openai.chat.options.stop	最多 4 个序列，API 将在生成这些序列后停止生成更多的 token。	-
spring.ai.openai.chat.options.topP	一种替代基于温度的采样方法，称为核心采样（nucleus sampling），在这种方法中，模型会考虑概率质量最高的前 `top_p` 部分的 token。例如，`top_p=0.1` 表示只考虑概率质量最高的前 10% 的 token。我们通常建议调整 `top_p` 或温度参数之一，而不是同时调整两者。	-
spring.ai.openai.chat.options.tools	模型可以调用的工具列表。目前，仅支持函数作为工具。使用此选项提供模型可能生成 JSON 输入的函数列表。	-
spring.ai.openai.chat.options.toolChoice	控制模型调用哪个（如果有的话）函数。`none` 表示模型不会调用函数，而是生成一条消息。`auto` 表示模型可以选择生成消息或调用函数。通过指定 `{"type": "function", "function": {"name": "my_function"}}` 可以强制模型调用特定函数。如果没有函数存在，默认值为 `none`。如果存在函数，默认值为 `auto`。	-
spring.ai.openai.chat.options.user	一个代表您终端用户的唯一标识符，可以帮助 OpenAI 监控和检测滥用行为。	-
spring.ai.openai.chat.options.functions	在单个提示请求中启用的函数列表，通过函数名称标识。这些名称的函数必须存在于 `functionCallbacks` 注册表中。	-
spring.ai.openai.chat.options.stream-usage	（仅适用于流式传输）设置为在整个请求中添加一个包含令牌使用统计信息的附加块。此块的 `choices` 字段为空数组，所有其他块也将包含一个 `usage` 字段，但其值为 `null`。	false
spring.ai.openai.chat.options.proxy-tool-calls	如果为 `true`，Spring AI 将不会在内部处理函数调用，而是将其代理给客户端。此时，客户端负责处理函数调用，将其分派到适当的函数并返回结果。如果为 `false`（默认值），Spring AI 将在内部处理函数调用。仅适用于支持函数调用的聊天模型。	false

提示

所有以 spring.ai.openai.chat.options 为前缀的属性都可以在运行时通过在 Prompt 调用中添加特定请求的运行时选项来覆盖。

运行时选项

OpenAiChatOptions.java 提供了模型的配置选项，例如使用的模型、温度（temperature）、频率惩罚（frequency penalty）等。

在启动时，默认选项可以通过 OpenAiChatModel(api, options) 构造函数或 spring.ai.openai.chat.options.* 属性进行配置。

在运行时，你可以通过向 Prompt 调用添加新的、特定于请求的选项来覆盖默认选项。例如，要为特定请求覆盖默认的模型和温度设置：

ChatResponse response = chatModel.call(
    new Prompt(
        "Generate the names of 5 famous pirates.",
        OpenAiChatOptions.builder()
            .withModel("mixtral-8x7b-32768")
            .withTemperature(0.4)
        .build()
    ));
java

提示

除了特定于模型的 OpenAiChatOptions，你还可以使用一个可移植的 ChatOptions 实例，该实例是通过 ChatOptionsBuilder#builder() 创建的。

函数调用

NVIDIA LLM API 在选择支持工具/函数调用的模型时，支持工具/函数调用功能。

spring ai nvidia 函数调用

你可以将自定义的 Java 函数注册到你的 ChatModel 中，并让提供的模型智能地选择输出一个包含参数的 JSON 对象，以调用一个或多个已注册的函数。这是一种强大的技术，可以将 LLM 的能力与外部工具和 API 连接起来。

工具示例

以下是一个简单的示例，展示如何在 Spring AI 中使用 NVIDIA LLM API 函数调用：

spring.ai.openai.api-key=${NVIDIA_API_KEY}
spring.ai.openai.base-url=https://integrate.api.nvidia.com
spring.ai.openai.chat.options.model=meta/llama-3.1-70b-instruct
spring.ai.openai.chat.options.max-tokens=2048
application.properties

@SpringBootApplication
public class NvidiaLlmApplication {

    public static void main(String[] args) {
        SpringApplication.run(NvidiaLlmApplication.class, args);
    }

    @Bean
    CommandLineRunner runner(ChatClient.Builder chatClientBuilder) {
        return args -> {
            var chatClient = chatClientBuilder.build();

            var response = chatClient.prompt()
                .user("What is the weather in Amsterdam and Paris?")
                .functions("weatherFunction") // reference by bean name.
                .call()
                .content();

            System.out.println(response);
        };
    }

    @Bean
    @Description("Get the weather in location")
    public Function<WeatherRequest, WeatherResponse> weatherFunction() {
        return new MockWeatherService();
    }

    public static class MockWeatherService implements Function<WeatherRequest, WeatherResponse> {

        public record WeatherRequest(String location, String unit) {}
        public record WeatherResponse(double temp, String unit) {}

        @Override
        public WeatherResponse apply(WeatherRequest request) {
            double temperature = request.location().contains("Amsterdam") ? 20 : 25;
            return new WeatherResponse(temperature, request.unit);
        }
    }
}
java

在这个示例中，当模型需要天气信息时，它会自动调用 weatherFunction bean，然后该 bean 可以获取实时天气数据。预期的响应如下所示：“阿姆斯特丹的当前气温为 20 摄氏度，巴黎的当前气温为 25 摄氏度。”

了解更多关于 OpenAI 函数调用的信息。

示例控制器

创建一个新的 Spring Boot 项目，并将 spring-ai-openai-spring-boot-starter 添加到你的 pom（或 gradle）依赖中。

在 src/main/resources 目录下添加一个 application.properties 文件，以启用并配置 OpenAi 聊天模型：

spring.ai.openai.api-key=${NVIDIA_API_KEY}
spring.ai.openai.base-url=https://integrate.api.nvidia.com
spring.ai.openai.chat.options.model=meta/llama-3.1-70b-instruct

# The NVIDIA LLM API doesn't support embeddings, so we need to disable it.
spring.ai.openai.embedding.enabled=false

# The NVIDIA LLM API requires this parameter to be set explicitly or server internal error will be thrown.
spring.ai.openai.chat.options.max-tokens=2048
application.properties

提示

将 api-key 替换为你的 NVIDIA 凭证。

备注

NVIDIA LLM API 要求明确设置 max-token 参数，否则会抛出服务器错误。

以下是一个简单的 @Controller 类示例，该类使用聊天模型进行文本生成。

@RestController
public class ChatController {

    private final OpenAiChatModel chatModel;

    @Autowired
    public ChatController(OpenAiChatModel chatModel) {
        this.chatModel = chatModel;
    }

    @GetMapping("/ai/generate")
    public Map generate(@RequestParam(value = "message", defaultValue = "Tell me a joke") String message) {
        return Map.of("generation", this.chatModel.call(message));
    }

    @GetMapping("/ai/generateStream")
	public Flux<ChatResponse> generateStream(@RequestParam(value = "message", defaultValue = "Tell me a joke") String message) {
        Prompt prompt = new Prompt(new UserMessage(message));
        return this.chatModel.stream(prompt);
    }
}
java

先决条件​

自动配置​

聊天属性​

重试属性​

连接属性​

配置属性​

运行时选项​

函数调用​

工具示例​

示例控制器​