Agents · Advanced

Evaluation Methods for Claude Code Agents

by neolabhq · v1.0.0 · updated 2026-04-10
Score: 80

Evaluate and improve Claude Code commands, skills, and agents. Use when testing prompt effectiveness, validating context engineering choices, or measuring improvement quality.

Tags: llm-evaluation, agent-testing, prompt-validation, quality-assurance, evaluation-metrics, bias-mitigation, llm-as-judge
Languages: TypeScript, JavaScript, Python, Java, C#
Stars: 0 · Forks: 0 · Uses: 0

Skill Document

SKILL.md · customaize-agent-agent-evaluation/workflow
1. Define evaluation criteria (instruction following, completeness, tool efficiency, reasoning quality, coherence).
2. Create test cases covering simple, medium, complex, and edge case scenarios.
3. Run direct scoring evaluation using LLM-as-judge with chain-of-thought justification.
4. Mitigate position bias using techniques like position swapping.
5. Perform human evaluation to catch edge cases and subtle misunderstandings.
6. Analyze evaluation results to identify areas for improvement.
7. Iterate on prompts, context, and agent architecture based on evaluation feedback.
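
Steps 1, 3, and 4 of the workflow can be sketched in Python as follows. This is a minimal illustration, not the skill's own implementation: the criterion weights, the 1-5 scoring scale, the `Judgment` type, and the `compare_with_position_swap` helper are all assumptions introduced here, and the judge is passed in as a plain callable so that any LLM client (or a stub) can stand behind it.

```python
from dataclasses import dataclass
from typing import Callable

# Criterion names come from step 1 of the workflow. The weights are
# illustrative assumptions, not values defined by the skill.
CRITERIA_WEIGHTS = {
    "instruction_following": 0.30,
    "completeness": 0.25,
    "tool_efficiency": 0.15,
    "reasoning_quality": 0.20,
    "coherence": 0.10,
}


@dataclass
class Judgment:
    """One direct-scoring verdict from an LLM judge (hypothetical shape)."""
    justification: str        # chain-of-thought, written before the scores
    scores: dict[str, float]  # criterion name -> score on an assumed 1-5 scale


def weighted_score(judgment: Judgment) -> float:
    """Collapse per-criterion scores into a single weighted number."""
    return sum(
        weight * judgment.scores[name]
        for name, weight in CRITERIA_WEIGHTS.items()
    )


# A pairwise judge receives (task, first_output, second_output) and
# returns "first", "second", or "tie".
PairwiseJudge = Callable[[str, str, str], str]


def compare_with_position_swap(
    judge: PairwiseJudge, task: str, a: str, b: str
) -> str:
    """Query the judge in both orders; keep a winner only if both agree.

    Returns "a", "b", or "tie". A judge that always favors whichever
    output is shown first contradicts itself across the two passes,
    so its verdict collapses to "tie".
    """
    first_pass = judge(task, a, b)   # a is shown first
    second_pass = judge(task, b, a)  # b is shown first
    winner_1 = {"first": "a", "second": "b"}.get(first_pass, "tie")
    winner_2 = {"first": "b", "second": "a"}.get(second_pass, "tie")
    return winner_1 if winner_1 == winner_2 else "tie"
```

In use, a stub such as `lambda task, x, y: "first"` stands in for a position-biased judge and yields `"tie"`, while a judge whose preference does not depend on ordering returns the same winner in both passes. Treating disagreement as a tie is one possible policy; escalating those cases to human review (step 5) is another.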

Agent Telemetry

Runs: 0 (total)
Success rate: 0% (last 30 days)
Average latency: 0.0s (p50)
Hallucination: 0.0% (detection)
Input tokens: 0 (avg 0/exec)
Output tokens: 0 (avg 0/exec)

Related Skills

Similar to Steve Jobs Perspective: 60% · Hebbian Synapse composite 0.600 · score 82
Similar to Amazon Working Backwards: 60% · Hebbian Synapse composite 0.600 · score 83
Similar to Roundtable Discussion: 60% · Hebbian Synapse composite 0.600 · score 79
Hebbian Synapse weight shown for each link: w = 0.3·α + 0.5·β + 0.2·γ

Skill Tree

Evaluation Methods for Claude Code Agents
customaize-agent-agent-evaluation
Cognitive Phases (5)
1. SENSE
2. CONTEXTUALIZE
3. EVALUATE
4. RECOMMEND
5. REFLECT
Triggers (8)
evaluate Claude Code agent performance
improve agent skills and commands
test prompt effectiveness
validate context engineering choices
measure agent improvement quality
assess agent output
run regression tests on agents
compare agent performance across different prompts

Score Breakdown

⭐ Human rating: 0%
🤖 Agent success: 0%
🕐 Freshness: 100%
🔗 Dependency health: 100%
🕸️ Graph centrality: 0%
🛡️ Security: 50%
CompositeScore = α·Human + β·Agent + γ·Recency + δ·Deps + ε·Centrality + ζ·Security
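
The CompositeScore formula above is a weighted sum of the six breakdown signals. The page does not state the values of α through ζ, so the sketch below takes the weights as a parameter and, purely for illustration, plugs in equal weights. That choice is an assumption and is not expected to reproduce the score shown for this skill. The `composite_score` function and the dictionary keys are hypothetical names.

```python
def composite_score(signals: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted sum of the breakdown signals, each on a 0-100 scale."""
    if signals.keys() != weights.keys():
        raise ValueError("signals and weights must name the same components")
    return sum(weights[name] * signals[name] for name in signals)


# Signal values as listed in the score breakdown.
signals = {
    "human": 0.0,
    "agent": 0.0,
    "recency": 100.0,
    "deps": 100.0,
    "centrality": 0.0,
    "security": 50.0,
}

# ASSUMPTION: equal weights, chosen only because the real
# coefficients (alpha..zeta) are not given on the page.
equal_weights = {name: 1 / len(signals) for name in signals}

score = composite_score(signals, equal_weights)
print(f"composite under equal weights: {score:.1f}")
```

Keeping the weights as an explicit argument makes it easy to substitute the platform's actual coefficients if they are published.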

Installation

$ synaptic mcp download customaize-agent-agent-evaluation
$ synaptic skills detail customaize-agent-agent-evaluation
$ synaptic skills live customaize-agent-agent-evaluation

Links

GitHub Repository