Agents · Medium

Site Crawler Skill

by mindmorass · v1.0.0 · updated 2026-04-11

Crawl and extract content from websites

web-crawling, content-extraction, rag-pipeline, data-ingestion, site-scraping, document-processing, information-retrieval

Languages

Python

0 Stars · 0 Forks · 0 Uses

Skill Document

SKILL.md · site-crawler/workflow

Identify Target Website: Determine the base URL and scope of the website to be crawled.

Check Robots.txt: Parse the site's robots.txt file and honor the paths it disallows.
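The robots.txt check can be sketched with Python's standard `urllib.robotparser`. This is a minimal illustration, not the skill's own code: the robots.txt body, the `USER_AGENT` string, and the `build_robots_checker` helper are all hypothetical, and a real crawler would download the file from the target site first.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt body (assumption); a real crawler would download it
# from <base_url>/robots.txt before fetching any other page.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""

USER_AGENT = "site-crawler-example"  # hypothetical user-agent string


def build_robots_checker(robots_txt: str) -> RobotFileParser:
    """Parse a robots.txt body into a checker object."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


robots = build_robots_checker(ROBOTS_TXT)

# Only URLs that pass can_fetch() should ever enter the crawl queue.
allowed = robots.can_fetch(USER_AGENT, "https://example.com/docs/intro")
blocked = robots.can_fetch(USER_AGENT, "https://example.com/private/report")
delay = robots.crawl_delay(USER_AGENT)  # None when no Crawl-delay is declared
print(allowed, blocked, delay)
```

When the site declares a `Crawl-delay`, it is a natural lower bound for the delay between requests in the crawl step.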

Discover URLs: Use sitemaps and initial URLs to build a queue of pages to crawl.
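One way to build that queue, sketched with the standard library only. The helper names (`urls_from_sitemap`, `build_queue`), the sample sitemap, and the same-host scoping rule are assumptions made for illustration; the skill may scope and deduplicate differently.

```python
import xml.etree.ElementTree as ET
from collections import deque
from urllib.parse import urljoin, urlparse

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical sitemap body used only for this example.
SAMPLE_SITEMAP = """\
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/docs/intro</loc></url>
  <url><loc>https://example.com/docs/api</loc></url>
  <url><loc>https://other.example.org/page</loc></url>
</urlset>
"""


def urls_from_sitemap(sitemap_xml: str) -> list[str]:
    """Return every <loc> value found in a sitemap document."""
    root = ET.fromstring(sitemap_xml)
    return [loc.text.strip() for loc in root.findall(".//sm:loc", SITEMAP_NS) if loc.text]


def build_queue(base_url: str, seeds: list[str], sitemap_xml: str) -> deque[str]:
    """Merge seed URLs and sitemap URLs into a deduplicated, same-host queue."""
    host = urlparse(base_url).netloc
    seen: set[str] = set()
    queue: deque[str] = deque()
    for url in [*seeds, *urls_from_sitemap(sitemap_xml)]:
        absolute = urljoin(base_url, url)  # resolves relative seeds like "/docs"
        if urlparse(absolute).netloc == host and absolute not in seen:
            seen.add(absolute)
            queue.append(absolute)
    return queue


queue = build_queue("https://example.com/", ["/", "/docs/intro"], SAMPLE_SITEMAP)
print(list(queue))
```

Seeds come first so that explicitly requested pages are crawled before anything discovered through the sitemap.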

Crawl Pages: Fetch each page, respecting rate limits, and extract content.
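A rate-limited crawl loop might look like the sketch below. It is not the skill's implementation: the `crawl` function, its parameters, and the fixed minimum delay are assumptions. The HTTP call is passed in as a `fetch` callable (for example, a small function wrapping `httpx.get(url).text`) so the loop can be exercised without network access.

```python
import time
from collections import deque
from typing import Callable, Iterable


def crawl(
    urls: Iterable[str],
    fetch: Callable[[str], str],
    min_delay: float = 1.0,
    max_pages: int = 100,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> dict[str, str]:
    """Fetch URLs one at a time, waiting at least min_delay seconds between requests."""
    queue = deque(urls)
    pages: dict[str, str] = {}
    last_request = None
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        if url in pages:
            continue  # already fetched
        if last_request is not None:
            wait = min_delay - (clock() - last_request)
            if wait > 0:
                sleep(wait)
        last_request = clock()
        try:
            pages[url] = fetch(url)
        except Exception as exc:  # one bad page should not abort the crawl
            print(f"skipping {url}: {exc}")
    return pages


def stub_fetch(url: str) -> str:
    """Stand-in for a real HTTP client call (assumption, no network needed)."""
    return f"<html><body><h1>{url}</h1></body></html>"


demo_pages = crawl(["https://example.com/a", "https://example.com/b"], stub_fetch, min_delay=0.0)
print(len(demo_pages))
```

Injecting `sleep` and `clock` keeps the pacing logic testable; in production the defaults apply and `min_delay` can be raised to match any `Crawl-delay` the site declares.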

Extract Content: Use trafilatura and BeautifulSoup to extract the main content, headings, and metadata.

Convert to Markdown: Convert the extracted content to Markdown for RAG ingestion.

Store Results: Save the extracted content and metadata for use in a RAG pipeline.
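The skill document does not specify a storage format. As one plausible option, the sketch below writes one JSON object per page to a JSON Lines file, which most RAG ingestion tools can stream. The `PageRecord` fields and the `save_records` / `load_records` helpers are hypothetical.

```python
import json
import tempfile
from dataclasses import asdict, dataclass, field
from pathlib import Path
from typing import Iterable


@dataclass
class PageRecord:
    """One crawled page; the field names are illustrative assumptions."""
    url: str
    title: str
    markdown: str
    metadata: dict = field(default_factory=dict)


def save_records(records: Iterable[PageRecord], path: Path) -> int:
    """Write records as JSON Lines and return how many were written."""
    path.parent.mkdir(parents=True, exist_ok=True)
    count = 0
    with path.open("w", encoding="utf-8") as fh:
        for record in records:
            fh.write(json.dumps(asdict(record), ensure_ascii=False) + "\n")
            count += 1
    return count


def load_records(path: Path) -> list[PageRecord]:
    """Read the JSON Lines file back into PageRecord objects."""
    with path.open(encoding="utf-8") as fh:
        return [PageRecord(**json.loads(line)) for line in fh if line.strip()]


# Round-trip demo in a temporary directory so nothing is left on disk.
with tempfile.TemporaryDirectory() as tmp:
    out_path = Path(tmp) / "crawl" / "pages.jsonl"
    written = save_records(
        [PageRecord("https://example.com/docs/intro", "Intro", "# Intro\n\nHello.", {"depth": 1})],
        out_path,
    )
    restored = load_records(out_path)
print(written, restored[0].title)
```

Keeping the source URL and metadata next to the Markdown lets a downstream retriever cite where each chunk came from.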

Agent Telemetry

Executions (total)

Success Rate (last 30d)

Average Latency (p50): 0.0s

Hallucination (detection): 0.0%

Input Tokens: avg 0/exec

Output Tokens: avg 0/exec

Related Skills

Similar to Byted Web Search (60%)

Skill Tree

Site Crawler Skill

site-crawler

Cognitive Phases (4)

1. SENSE

2. CONTEXTUALIZE

3. ACT

4. REFLECT

Triggers (8)

crawl a website for content

extract content from a URL

scrape a website for RAG

ingest data from a website

crawl documentation sites

extract structured content from a website

harvest web content for RAG

crawl a site and extract markdown

Score Breakdown

⭐ Human Rating: 0%

🤖 Agent Success: 0%

🕐 Recency: 100%

🔗 Dependency Health: 100%

🕸️ Graph Centrality: 0%

🛡️ Security: 50%

CompositeScore = α·Human + β·Agent + γ·Recency + δ·Deps + ε·Centrality + ζ·Security
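The formula is a plain weighted sum. The page does not publish the weights α through ζ, so the sketch below uses equal weights of 1/6 purely as a placeholder assumption; the component values are the percentages listed in the score breakdown, and the `composite_score` helper is hypothetical.

```python
# Component scores in percent, taken from the score breakdown on this page.
components = {
    "human": 0.0,
    "agent": 0.0,
    "recency": 100.0,
    "deps": 100.0,
    "centrality": 0.0,
    "security": 50.0,
}

# Placeholder weights (assumption): the real α..ζ values are not published here.
equal_weights = {name: 1 / 6 for name in components}


def composite_score(values: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted sum of component scores; every weighted component must be present."""
    missing = set(weights) - set(values)
    if missing:
        raise KeyError(f"missing components: {sorted(missing)}")
    return sum(weights[name] * values[name] for name in weights)


score = composite_score(components, equal_weights)
print(f"{score:.2f}")  # (0 + 0 + 100 + 100 + 0 + 50) / 6, about 41.67
```

With different weights the result changes accordingly; for example, weighting Security more heavily would pull this skill's score toward 50.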

Installation

$ synaptic mcp download site-crawler

$ synaptic skills detail site-crawler

$ synaptic skills live site-crawler

Dependencies

httpx beautifulsoup4 lxml trafilatura markdownify

Links

GitHub Repository