Agents · Advanced

Evaluation Methods for Claude Code Agents

by neolabhq · v1.0.0 · updated 2026-04-10

Evaluate and improve Claude Code commands, skills, and agents. Use when testing prompt effectiveness, validating context engineering choices, or measuring improvement quality.

llm-evaluation, agent-testing, prompt-validation, quality-assurance, evaluation-metrics, bias-mitigation, llm-as-judge

Languages

TypeScript, JavaScript, Python, Java, C#

Skill Document

SKILL.md · customaize-agent-agent-evaluation/workflow

1. Define evaluation criteria (instruction following, completeness, tool efficiency, reasoning quality, coherence).

2. Create test cases covering simple, medium, complex, and edge case scenarios.

3. Run direct scoring evaluation using LLM-as-judge with chain-of-thought justification.

4. Mitigate position bias using techniques like position swapping.

5. Perform human evaluation to catch edge cases and subtle misunderstandings.

6. Analyze evaluation results to identify areas for improvement.

7. Iterate on prompts, context, and agent architecture based on evaluation feedback.
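Steps 1, 3, and 4 of the workflow can be illustrated with a minimal Python sketch. It is not taken from the skill itself. The criterion weights, the 1-5 scale, the `Judge` signature, and both stub judges are hypothetical; in practice the judge would wrap an LLM call that returns a chain-of-thought justification along with its verdict. The sketch shows a weighted rubric score and a pairwise comparison that is run twice with the order swapped, so that a verdict driven only by position is discarded.

```python
from typing import Callable

# Criterion names come from step 1; the weights are illustrative assumptions.
CRITERION_WEIGHTS = {
    "instruction_following": 0.30,
    "completeness": 0.25,
    "tool_efficiency": 0.15,
    "reasoning_quality": 0.20,
    "coherence": 0.10,
}


def aggregate(scores: dict[str, float]) -> float:
    """Weighted mean of per-criterion scores (assumed to be on a 1-5 scale)."""
    missing = set(CRITERION_WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing criteria: {sorted(missing)}")
    return sum(CRITERION_WEIGHTS[name] * scores[name] for name in CRITERION_WEIGHTS)


# Hypothetical judge interface: (task, first_response, second_response)
# -> "first", "second", or "tie". A real judge would call an LLM here.
Judge = Callable[[str, str, str], str]


def compare_with_swap(judge: Judge, task: str, a: str, b: str) -> str:
    """Judge (a, b) and then (b, a); keep the verdict only if both runs agree."""
    first_pass = judge(task, a, b)
    second_pass = judge(task, b, a)
    # Translate positional verdicts back to the labels "A" and "B".
    verdict_1 = {"first": "A", "second": "B"}.get(first_pass, "tie")
    verdict_2 = {"first": "B", "second": "A"}.get(second_pass, "tie")
    return verdict_1 if verdict_1 == verdict_2 else "tie"


def position_biased_judge(task: str, first: str, second: str) -> str:
    """Stub that always prefers whichever response it sees first."""
    return "first"


def length_judge(task: str, first: str, second: str) -> str:
    """Stub that prefers the longer response regardless of position."""
    if len(first) == len(second):
        return "tie"
    return "first" if len(first) > len(second) else "second"


if __name__ == "__main__":
    print(aggregate({name: 4 for name in CRITERION_WEIGHTS}))
    print(compare_with_swap(position_biased_judge, "task", "short", "longer answer"))
    print(compare_with_swap(length_judge, "task", "short", "longer answer"))
```

Treating inconsistent verdicts as a tie is one conservative choice; another is to average the two runs or send the disagreement to human review (step 5).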

Cognitive Phases (5)

1. SENSE

2. CONTEXTUALIZE

3. EVALUATE

4. RECOMMEND

5. REFLECT

Triggers (8)

evaluate Claude Code agent performance

improve agent skills and commands

test prompt effectiveness

validate context engineering choices

measure agent improvement quality

assess agent output

run regression tests on agents

compare agent performance across different prompts

Score Breakdown

Human Rating: 0%

Agent Success: 0%

Recency: 100%

Dependency Health: 100%

Graph Centrality: 0%

Security: 50%

CompositeScore = α·Human + β·Agent + γ·Recency + δ·Deps + ε·Centrality + ζ·Security
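The composite score formula above is a weighted sum of the six components. The page does not publish the coefficients α through ζ, so the sketch below uses equal weights purely as a placeholder. The function name and dictionary keys are hypothetical, and the result should not be read as the score the registry actually computes.

```python
# Component values from the score breakdown, expressed as fractions of 1.
COMPONENTS = {
    "human": 0.0,
    "agent": 0.0,
    "recency": 1.0,
    "deps": 1.0,
    "centrality": 0.0,
    "security": 0.5,
}

# Assumption: equal weights stand in for the unpublished coefficients.
SCORE_WEIGHTS = {name: 1 / 6 for name in COMPONENTS}


def composite_score(components: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted sum of component scores; both dicts must share the same keys."""
    if set(components) != set(weights):
        raise ValueError("components and weights must have identical keys")
    return sum(weights[name] * components[name] for name in weights)


if __name__ == "__main__":
    print(f"{composite_score(COMPONENTS, SCORE_WEIGHTS):.3f}")
```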

Installation

$ synaptic mcp download customaize-agent-agent-evaluation

$ synaptic skills detail customaize-agent-agent-evaluation

$ synaptic skills live customaize-agent-agent-evaluation
