Synaptic Skills
Development · Advanced

Evaluation-Driven Development for Python LLM Applications

by github · v1.0.0 · updated 2026-04-10
Score: 83

Add instrumentation, build golden datasets, write eval-based tests, run them, root-cause failures, and iterate to ensure your Python LLM application works correctly. Use this skill whenever a user is developing, testing, QA-ing, evaluating, or benchmarking a Python project that calls an LLM: to verify that the application behaves correctly, catch regressions after prompt changes, fix unexpected behavior, or validate output quality before shipping.

Tags: llm-testing, eval-driven-development, python, quality-assurance, llm-integration, test-automation, prompt-engineering
Stars: 0 · Forks: 0 · Uses: 0

Skill Document

SKILL.md · eval-driven-dev/workflow
1. Startup checks: Upgrade the `pixie-qa` package using the appropriate package manager.
2. Project setup: Create a `pixie_qa` directory at the project root to store all generated files.
3. Code analysis: Read the user's code to identify the core LLM-calling function and its dependencies.
4. Run harness creation: Build a script that invokes the LLM-calling function and captures traces.
5. Trace capture: Run the app with representative inputs to generate real traces.
6. Test creation: Write eval-based tests that assess LLM response quality and agent routing decisions.
7. Dataset creation: Build golden datasets to compare against the LLM's output.
8. Test execution: Run the tests and analyze the results.
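The workflow above can be sketched in plain Python. This is a minimal, hypothetical illustration of steps 4 to 8 (harness, trace capture, golden dataset, eval-based tests, execution), not the `pixie-qa` API: `traced`, `answer_question`, `GoldenCase`, `contains_all`, and `run_evals` are names invented for this example, and the LLM call is stubbed with canned answers so the sketch runs offline.

```python
"""Minimal eval-driven loop: harness, traces, golden dataset, scored tests.

Everything here is hypothetical and stdlib-only; it does NOT use the
pixie-qa API. Replace the stubbed LLM call with your application's real one.
"""
import functools
from dataclasses import dataclass, field
from typing import Callable

TRACES: list[dict] = []  # captured (input, output) records from the harness


def traced(fn: Callable[[str], str]) -> Callable[[str], str]:
    """Wrap the LLM-calling function so every call is recorded as a trace."""
    @functools.wraps(fn)
    def wrapper(question: str) -> str:
        output = fn(question)
        TRACES.append({"function": fn.__name__, "input": question, "output": output})
        return output
    return wrapper


@traced
def answer_question(question: str) -> str:
    """Stand-in for the app's core LLM-calling function (hypothetical stub)."""
    canned = {"What is 2 + 2?": "2 + 2 = 4", "Capital of France?": "Paris"}
    return canned.get(question, "I don't know")


@dataclass
class GoldenCase:
    question: str
    must_contain: list[str] = field(default_factory=list)  # required substrings


GOLDEN_DATASET = [
    GoldenCase("What is 2 + 2?", ["4"]),
    GoldenCase("Capital of France?", ["Paris"]),
    GoldenCase("Capital of Japan?", ["Tokyo"]),
]


def contains_all(output: str, expected: list[str]) -> float:
    """Score in [0, 1]: fraction of required substrings present (case-insensitive)."""
    if not expected:
        return 1.0
    hits = sum(1 for s in expected if s.lower() in output.lower())
    return hits / len(expected)


def run_evals(threshold: float = 1.0) -> list[dict]:
    """Run every golden case through the traced app and mark pass/fail."""
    results = []
    for case in GOLDEN_DATASET:
        output = answer_question(case.question)
        score = contains_all(output, case.must_contain)
        results.append({"question": case.question, "output": output,
                        "score": score, "passed": score >= threshold})
    return results


if __name__ == "__main__":
    for r in run_evals():
        status = "PASS" if r["passed"] else "FAIL"
        print(f"{status} {r['score']:.2f} {r['question']!r} -> {r['output']!r}")
```

A substring check is the simplest possible evaluator; a real setup would more likely use a semantic-similarity or LLM-as-judge scorer. The failing `Capital of Japan?` case is the kind of result step 8 asks you to root-cause, using the captured entry in `TRACES`.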

Agent Telemetry

Executions: 0 total
Success rate: 0% (last 30 days)
Average latency: 0.0 s (p50)
Hallucination detection: 0.0%
Input tokens: 0 (avg 0 per execution)
Output tokens: 0 (avg 0 per execution)

Usage by Platform

Related Skills

Hebbian synapse weight: w = 0.3·α + 0.5·β + 0.2·γ

Similar to API Test Generator · 60% · composite 0.600 · score 88
Similar to Test Data Generator · 60% · composite 0.600 · score 83
Similar to Autoresearch: Autonomous Iterative Experimentation · 60% · composite 0.600 · score 83
Co-executed with Test Data Generator · 49% · composite 0.488 · score 83
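The Hebbian synapse weight shown for each related skill is a weighted sum with fixed coefficients. A minimal sketch, assuming α, β, and γ are signals already normalized to [0, 1] (the page does not define what they measure); the function and argument names are hypothetical.

```python
def synapse_weight(alpha: float, beta: float, gamma: float) -> float:
    """Hebbian synapse weight w = 0.3*alpha + 0.5*beta + 0.2*gamma.

    Assumes each signal is normalized to [0, 1]; because the coefficients
    sum to 1.0, w also stays in [0, 1].
    """
    for name, value in (("alpha", alpha), ("beta", beta), ("gamma", gamma)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    return 0.3 * alpha + 0.5 * beta + 0.2 * gamma


# Equal signals of 0.6 yield the 0.600 composite listed above, although
# the page does not say which inputs actually produced that value.
print(round(synapse_weight(0.6, 0.6, 0.6), 3))  # 0.6
```

Because β carries the largest coefficient, it dominates: raising β alone from 0 to 1 moves w by 0.5.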

Skill Tree

Evaluation-Driven Development for Python LLM Applications (eval-driven-dev)

Cognitive phases (6): 1. SENSE · 2. CONTEXTUALIZE · 3. HYPOTHESIZE · 4. EVALUATE · 5. ACT · 6. REFLECT

Triggers (8):
"set up evals for my LLM application"
"add tests to my Python LLM project"
"create a QA pipeline for my LLM app"
"debug why my LLM application is failing"
"improve the quality of my LLM application"
"fix failing tests in my LLM project"
"benchmark my LLM application"
"evaluate my LLM application"

Score Breakdown

⭐ Human rating: 0%
🤖 Agent success: 0%
🕐 Recency: 100%
🔗 Dependency health: 100%
🕸️ Graph centrality: 0%
🛡️ Security: 48%

CompositeScore = α·Human + β·Agent + γ·Recency + δ·Deps + ε·Centrality + ζ·Security
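The page gives the composite-score formula but not the weights α through ζ. The sketch below applies the formula to the breakdown values, assuming hypothetical weights that sum to 1 and component scores on the same 0 to 100 scale shown in the breakdown; `composite_score` and `EXAMPLE_WEIGHTS` are invented names, and the result will not match the displayed score of 83 unless the platform's real weights are substituted.

```python
# Component scores from the Score Breakdown above (percent, 0 to 100).
BREAKDOWN = {
    "human": 0.0,
    "agent": 0.0,
    "recency": 100.0,
    "deps": 100.0,
    "centrality": 0.0,
    "security": 48.0,
}

# HYPOTHETICAL weights (alpha..zeta); the page does not publish the real ones.
EXAMPLE_WEIGHTS = {
    "human": 0.20,       # alpha
    "agent": 0.25,       # beta
    "recency": 0.15,     # gamma
    "deps": 0.15,        # delta
    "centrality": 0.10,  # epsilon
    "security": 0.15,    # zeta
}


def composite_score(components: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted sum over the six components; weights must match keys and sum to 1."""
    if set(components) != set(weights):
        raise ValueError("components and weights must have the same keys")
    total_weight = sum(weights.values())
    if abs(total_weight - 1.0) > 1e-9:
        raise ValueError(f"weights must sum to 1, got {total_weight}")
    return sum(weights[k] * components[k] for k in components)


print(round(composite_score(BREAKDOWN, EXAMPLE_WEIGHTS), 1))
```

Requiring the weights to sum to 1 keeps the composite on the same 0 to 100 scale as its components, so it can be read directly as a percentage.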

Installation

$ synaptic mcp download eval-driven-dev
$ synaptic skills detail eval-driven-dev
$ synaptic skills live eval-driven-dev

Dependencies

pixie-qa

Links

GitHub Repository