1. Set up the evaluation environment: ensure ADK is configured and the agent code is accessible.
2. Create or select an evalset: define test cases that cover the agent's core capabilities.
3. Configure evaluation metrics: choose metrics that match the evaluation goals (e.g., `tool_trajectory_avg_score`, `final_response_match_v2`).
4. Run the evaluation: execute it with `make eval` or `adk eval`.
5. Analyze the results: identify the failing test cases and the metrics they fail on.
6. Diagnose the cause: investigate the agent's behavior to find the root cause of each failure.
7. Implement fixes: adjust the agent instructions, tool logic, or evalset based on the diagnosis.
8. Re-run the evaluation: verify that the fixes have improved the scores.
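
The "configure evaluation metrics" and "analyze the results" steps can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not ADK's own API: it assumes the metric thresholds live in a JSON file under a `criteria` key, and that per-case scores have already been extracted from the evaluation output into plain dictionaries. The file name `test_config.json`, the threshold values, the case IDs, and the helpers `write_config` and `failing_cases` are all illustrative; check the config format your ADK version expects before relying on it.

```python
import json
import tempfile
from pathlib import Path

# Assumed thresholds; tune them to your evaluation goals.
CRITERIA = {
    "tool_trajectory_avg_score": 1.0,
    "final_response_match_v2": 0.8,
}


def write_config(path: Path, criteria: dict[str, float]) -> None:
    """Write the metric thresholds as JSON under an assumed `criteria` key."""
    path.write_text(json.dumps({"criteria": criteria}, indent=2))


def failing_cases(
    scores: dict[str, dict[str, float]],
    criteria: dict[str, float],
) -> dict[str, list[str]]:
    """Map each failing case ID to the metrics that scored below threshold.

    A metric missing from a case's scores is treated as failing.
    """
    failures: dict[str, list[str]] = {}
    for case_id, case_scores in scores.items():
        missed = [
            metric
            for metric, threshold in criteria.items()
            if case_scores.get(metric, float("-inf")) < threshold
        ]
        if missed:
            failures[case_id] = missed
    return failures


# Hypothetical scores, hand-written for illustration (not real results).
example_scores = {
    "lookup_order": {
        "tool_trajectory_avg_score": 1.0,
        "final_response_match_v2": 0.9,
    },
    "refund_flow": {
        "tool_trajectory_avg_score": 0.5,
        "final_response_match_v2": 0.85,
    },
}

# Write the config to a temporary directory so the sketch has no side effects.
config_path = Path(tempfile.mkdtemp()) / "test_config.json"
write_config(config_path, CRITERIA)

failures = failing_cases(example_scores, CRITERIA)
for case_id, metrics in failures.items():
    print(f"{case_id}: below threshold on {', '.join(metrics)}")
```

Grouping failures by metric in this way gives the diagnosis step a starting point: a case that misses only the trajectory threshold likely needs a look at tool selection or arguments, while a case that misses only the response threshold more likely points at the agent instructions or the expected answer in the evalset.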
- evaluate my AI agent
- run ADK evaluation
- debug agent evaluation failures
- improve agent evaluation scores
- create an evalset for my agent
- configure ADK evaluation
- analyze tool trajectory failures
- fix agent hallucination issues