
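This documentation originates from spring.io and was translated by springdoc.tech; copyright belongs to SPRING.IO (Broadcom Inc.). It may be used for personal study and research. Reproduction or commercial use without permission is prohibited.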

Evaluation Testing


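Testing an AI application requires evaluating the generated content to ensure that the AI model has not produced a hallucinated response.

One way to evaluate a response is to use an AI model as the evaluator. Choose the AI model best suited to evaluation, which may differ from the model that generated the response.

The Spring AI interface for evaluating responses is Evaluator, defined as follows: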

@FunctionalInterface
public interface Evaluator {
    EvaluationResponse evaluate(EvaluationRequest evaluationRequest);
}

The input to an evaluation is an EvaluationRequest, defined as follows:

public class EvaluationRequest {

	private final String userText;

	private final List<Content> dataList;

	private final String responseContent;

	public EvaluationRequest(String userText, List<Content> dataList, String responseContent) {
		this.userText = userText;
		this.dataList = dataList;
		this.responseContent = responseContent;
	}

  ...
}

userText: the raw input from the user, as a String
dataList: contextual data appended to the raw input, such as data from Retrieval Augmented Generation, as a List<Content>
responseContent: the AI model's response content, as a String
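
Because Evaluator is a functional interface, an evaluation rule can be supplied as a lambda. The following is a minimal sketch rather than Spring AI code: the EvaluationRequest and EvaluationResponse records are simplified stand-ins declared only so the example compiles on its own (the stand-in dataList holds plain strings instead of Content objects), and the NON_BLANK rule is hypothetical.

```java
import java.util.List;

// Simplified stand-ins for illustration only; these are NOT the Spring AI classes.
record EvaluationRequest(String userText, List<String> dataList, String responseContent) {}

record EvaluationResponse(boolean pass, String feedback) {}

@FunctionalInterface
interface Evaluator {
    EvaluationResponse evaluate(EvaluationRequest evaluationRequest);
}

class EvaluatorSketch {

    // Hypothetical rule-based evaluator: passes when the response is not blank.
    static final Evaluator NON_BLANK = request -> {
        String response = request.responseContent();
        boolean pass = response != null && !response.isBlank();
        return new EvaluationResponse(pass, pass ? "" : "Response is empty");
    };

    public static void main(String[] args) {
        // Placeholder values; a real request would carry the retrieved documents.
        EvaluationRequest request = new EvaluationRequest(
                "What is the purpose of Carina?",
                List.of("<retrieved passage>"),
                "");
        System.out.println(NON_BLANK.evaluate(request).pass()); // prints false
    }
}
```

A model-backed evaluator has the same shape: it reads the three fields of the request, asks a model for a verdict, and wraps that verdict in an EvaluationResponse.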

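Relevancy Evaluator

One implementation is RelevancyEvaluator, which uses an AI model for evaluation. More implementations will be available in future releases.

RelevancyEvaluator fills the following prompt with the input (userText), the AI model's output (chatResponse), and the context data, and asks the model the resulting question: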

Your task is to evaluate if the response for the query
is in line with the context information provided.\n
You have two options to answer. Either YES/ NO.\n
Answer - YES, if the response for the query
is in line with context information otherwise NO.\n
Query: \n {query}\n
Response: \n {response}\n
Context: \n {context}\n
Answer: "
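
To make the placeholder substitution concrete, here is a minimal sketch of filling the {query}, {response}, and {context} slots. It uses plain string replacement, which is an assumption made for illustration: this document does not show how Spring AI renders the template internally. The class name, the helper methods, and the shortened template are all hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class RelevancyPromptSketch {

    // Shortened template for illustration; the placeholder names match the prompt above.
    static final String TEMPLATE =
            "Query: {query}\nResponse: {response}\nContext: {context}\nAnswer: ";

    // Joins the retrieved passages into one context string, one passage per line.
    static String joinContext(List<String> passages) {
        return String.join("\n", passages);
    }

    // Replaces every {name} placeholder with its value using plain string substitution.
    // Limitation: a value that itself contains "{name}" text could be substituted again.
    static String render(String template, Map<String, String> values) {
        String result = template;
        for (Map.Entry<String, String> entry : values.entrySet()) {
            result = result.replace("{" + entry.getKey() + "}", entry.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        // Placeholder values stand in for a real query, response, and retrieved documents.
        Map<String, String> values = new LinkedHashMap<>();
        values.put("query", "What is the purpose of Carina?");
        values.put("response", "<model response>");
        values.put("context", joinContext(List.of("<retrieved passage 1>", "<retrieved passage 2>")));
        System.out.println(render(TEMPLATE, values));
    }
}
```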

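Below is an example of a JUnit test that performs a RAG query over a PDF document loaded into the Vector Store and then evaluates whether the response is relevant to the user text.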

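@Test
void testEvaluation() {

    // Clear and reload the data (dataController comes from the example application and is not shown here)
    dataController.delete();
    dataController.load();

    String userText = "What is the purpose of Carina?";

    // Run the RAG query; the QuestionAnswerAdvisor retrieves documents from the Vector Store
    ChatResponse response = ChatClient.builder(chatModel)
            .build().prompt()
            .advisors(new QuestionAnswerAdvisor(vectorStore))
            .user(userText)
            .call()
            .chatResponse();
    String responseContent = response.getResult().getOutput().getContent();

    // The evaluator issues its own model call through this ChatClient.Builder
    var relevancyEvaluator = new RelevancyEvaluator(ChatClient.builder(chatModel));

    // Pass the question, the retrieved documents, and the answer to the evaluator
    EvaluationRequest evaluationRequest = new EvaluationRequest(userText,
            (List<Content>) response.getMetadata().get(QuestionAnswerAdvisor.RETRIEVED_DOCUMENTS), responseContent);

    EvaluationResponse evaluationResponse = relevancyEvaluator.evaluate(evaluationRequest);

    // Fail the test when the evaluator does not judge the response relevant
    assertTrue(evaluationResponse.isPass(), "Response is not relevant to the question");

}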

The code above is from an example application linked from the original page.

FactCheckingEvaluator

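FactCheckingEvaluator is another implementation of the Evaluator interface, designed to assess the factual accuracy of AI-generated responses against provided context. It helps detect and reduce hallucinations in AI output by verifying whether a given statement (the claim) is logically supported by the provided context (the document).

The 'claim' and 'document' are submitted to an AI model for evaluation. Smaller, more efficient AI models are available for this purpose, such as Bespoke's Minicheck, which helps reduce the cost of performing these checks compared to flagship models like GPT-4. Minicheck is also available through Ollama.

Usage

The FactCheckingEvaluator constructor takes a ChatClient.Builder as a parameter: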

public FactCheckingEvaluator(ChatClient.Builder chatClientBuilder) {
  this.chatClientBuilder = chatClientBuilder;
}

The evaluator uses the following prompt template for fact-checking:

Document: {document}
Claim: {claim}


Here {document} is the context information and {claim} is the AI model's response to be evaluated.
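
The flow can be sketched without any Spring AI dependency: build the prompt from the document and the claim, send it to a model, and map the reply to a pass or fail result. In this minimal sketch the model is a plain Function<String, String> stand-in, and treating a reply of "yes" (ignoring case and surrounding whitespace) as "supported" is an assumption, not something this document states about FactCheckingEvaluator. The class and method names are hypothetical.

```java
import java.util.function.Function;

class FactCheckSketch {

    // Same placeholders as the template above.
    static final String TEMPLATE = "Document: {document}\nClaim: {claim}";

    // Fills the two placeholders with plain string replacement.
    static String buildPrompt(String document, String claim) {
        return TEMPLATE.replace("{document}", document).replace("{claim}", claim);
    }

    // Assumption: the model answers "yes" when the document supports the claim.
    static boolean isSupported(String document, String claim, Function<String, String> model) {
        String reply = model.apply(buildPrompt(document, claim));
        return reply != null && reply.trim().equalsIgnoreCase("yes");
    }

    public static void main(String[] args) {
        String document = "The Earth is the third planet from the Sun.";
        String claim = "The Earth is the fourth planet from the Sun.";

        // Fake model used only so the sketch runs offline; it always answers "No".
        Function<String, String> alwaysNo = prompt -> "No";

        System.out.println(buildPrompt(document, claim));
        System.out.println(isSupported(document, claim, alwaysNo)); // prints false
    }
}
```

Keeping the model behind a function makes the yes/no mapping testable without a running model server.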

Example

Here is an example of using the FactCheckingEvaluator with an Ollama-based ChatModel, specifically the Bespoke-Minicheck model:

@Test
void testFactChecking() {
  // Set up the Ollama API
  OllamaApi ollamaApi = new OllamaApi("http://localhost:11434");

  // BESPOKE_MINICHECK is assumed to be a constant holding the Ollama model name; it is not defined in this snippet
  ChatModel chatModel = new OllamaChatModel(ollamaApi,
      OllamaOptions.builder().model(BESPOKE_MINICHECK).numPredict(2).temperature(0.0d).build());

  // Create the FactCheckingEvaluator
  var factCheckingEvaluator = new FactCheckingEvaluator(ChatClient.builder(chatModel));

  // Example context and claim
  String context = "The Earth is the third planet from the Sun and the only astronomical object known to harbor life.";
  String claim = "The Earth is the fourth planet from the Sun.";

  // Create an EvaluationRequest
  EvaluationRequest evaluationRequest = new EvaluationRequest(context, Collections.emptyList(), claim);

  // Perform the evaluation
  EvaluationResponse evaluationResponse = factCheckingEvaluator.evaluate(evaluationRequest);

  // The claim contradicts the context, so the evaluation is expected to fail
  assertFalse(evaluationResponse.isPass(), "The claim should not be supported by the context");

}
