@@ -0,0 +1,34 @@
{
"createdAt": "2025-08-18 09:38:01.990700",
"name": "groundedness",
"scenario": "genai-evaluations",
"version": "0.0.1",
"evaluationMethod": "llm-as-a-judge",
"metricType": "evaluation",
"managedBy": "imperative",
"systemPredefined": false,
"spec": {
"promptType": "free-form",
"configuration": {
"modelConfiguration": {
"name": "gpt-4o",
"version": "2024-08-06",
"parameters": [
{
"key": "temperature",
"value": "0.1"
},
{
"key": "max_tokens",
"value": "110"
}
]
},
"promptConfiguration": {
"systemPrompt": "You should strictly follow the instruction given to you. Please act as an impartial judge and evaluate the quality of the responses based on the prompt and following criteria:",
"userPrompt": "You are an expert evaluator. Your task is to evaluate the quality of the responses generated by AI models. We will provide you with a reference and an AI-generated response. You should first read the user input carefully for analyzing the task, and then evaluate the quality of the responses based on the criteria provided in the Evaluation section below. You will assign the response a rating following the Rating Rubric and Evaluation Steps. Give step-by-step explanations for your rating, and only choose ratings from the Rating Rubric.\n\n## Metric Definition\nYou are an INFORMATION OVERLAP classifier providing the overlap of information between a response and reference.\n\n## Criteria\nGroundedness: The of information between a response generated by AI models and provided reference.\n\n## Rating Rubric\n5: (Fully grounded). The response and the reference are fully overlapped.\n4: (Mostly grounded). The response and the reference are mostly overlapped.\n3: (Somewhat grounded). The response and the reference are somewhat overlapped.\n2: (Poorly grounded). The response and the reference are slightly overlapped.\n1: (Not grounded). There is no overlap between the response and the reference.\n\n## Evaluation Steps\nSTEP 1: Assess the response in aspects of Groundedness. Identify any information in the response and provide assessment according to the Criteria.\nSTEP 2: Score based on the rating rubric. Give a brief rationale to explain your evaluation considering Groundedness.\n\nReference: {{?reference}}\nResponse: {{?aicore_llm_completion}}\n\nBegin your evaluation by providing a short explanation. Be as unbiased as possible. After providing your explanation, please rate the response according to the rubric and outputs STRICTLY following this JSON format:\n\n{ \"explanation\": string, \"rating\": integer }\n\nOutput:\n",
"dataType": "numeric"
}
}
}
}
@@ -0,0 +1 @@
{"createdAt":"2025-08-18 09:38:01.990700","name":"groundedness","scenario":"genai-evaluations","version":"0.1.6","evaluationMethod":"llm-as-a-judge", "metricType":"evaluation", "managedBy":"imperative","systemPredefined":false,"spec":{"promptType":"free-form","configuration":{"modelConfiguration":{"name":"gpt-4o","version":"2024-08-06","parameters":[{"key":"temperature","value":"0.1"},{"key":"max_tokens","value":"110"}]},"promptConfiguration":{"systemPrompt":"You should strictly follow the instruction given to you. Please act as an impartial judge and evaluate the quality of the responses based on the prompt and following criteria:","userPrompt":"You are an expert evaluator. Your task is to evaluate the quality of the responses generated by AI models. We will provide you with a reference and an AI-generated response. You should first read the user input carefully for analyzing the task, and then evaluate the quality of the responses based on the criteria provided in the Evaluation section below. You will assign the response a rating following the Rating Rubric and Evaluation Steps. Give step-by-step explanations for your rating, and only choose ratings from the Rating Rubric.\n\n## Metric Definition\nYou are an INFORMATION OVERLAP classifier providing the overlap of information between a response and reference.\n\n## Criteria\nGroundedness: The of information between a response generated by AI models and provided reference.\n\n## Rating Rubric\n5: (Fully grounded). The response and the reference are fully overlapped.\n4: (Mostly grounded). The response and the reference are mostly overlapped.\n3: (Somewhat grounded). The response and the reference are somewhat overlapped.\n2: (Poorly grounded). The response and the reference are slightly overlapped.\n1: (Not grounded). There is no overlap between the response and the reference.\n\n## Evaluation Steps\nSTEP 1: Assess the response in aspects of Groundedness. Identify any information in the response and provide assessment according to the Criteria.\nSTEP 2: Score based on the rating rubric. Give a brief rationale to explain your evaluation considering Groundedness.\n\nReference: {{?reference}}\nResponse: {{?aicore_llm_completion}}\n\nBegin your evaluation by providing a short explanation. Be as unbiased as possible. After providing your explanation, please rate the response according to the rubric and outputs STRICTLY following this JSON format:\n\n{ \"explanation\": string, \"rating\": integer }\n\nOutput:\n","dataType":"numeric"}}}}