2026-04-03 · 14 min read

Web Content Capture Compared: 4 OpenClaw Skills, Head-to-Head

From WeChat articles to Feishu docs and Twitter threads: we benchmark markdown-proxy, agent-fetch, built-in WebFetch, and defuddle CLI on install friction, success rate, Markdown quality, special platforms, and ease of use.

From WeChat articles to Feishu docs, from Twitter to ordinary web pages: which tool should be your default "fetch this URL for my agent" assistant? As content creators and heavy AI users, we constantly juggle popular WeChat Official Account articles, Feishu team wikis, viral tweets, and technical blogs. Yet when we paste a bare link into an LLM, the results often disappoint: anti-bot walls return empty pages, or we get noisy HTML stuffed with ads and navigation chrome.

In this column we benchmark four mainstream web-capture solutions—so you can pick what fits your workflow, not chase a mythical one-size-fits-all winner.

Contenders

| Skill / tool | Author | Positioning | Install |
|---|---|---|---|
| markdown-proxy | joeseesun | Multi-channel intelligent routing | npx skills add joeseesun/markdown-proxy |
| agent-fetch | teng-lin | Lightweight local fetch | npx agent-fetch |
| WebFetch | OpenClaw (built-in) | Baseline HTTP fetch | No install |
| defuddle CLI | defuddle | Article body extraction | npm install -g defuddle |

What we score

Five editorial dimensions (more stars = better):

  1. Install friction — easier is better
  2. Success rate — broader real-world coverage
  3. Output quality — cleanliness of Markdown / main text
  4. Special platforms — WeChat articles, Feishu, Twitter/X, etc.
  5. Convenience — time-to-first-success for daily use

Deep dive

1. markdown-proxy ⭐⭐⭐⭐⭐

One-liner: the well-rounded pick—a "routing-aware" fetch stack.

The standout feature is intelligent routing: it classifies the URL first, then picks the best backend instead of hammering every site with the same strategy.

Incoming URL
├── WeChat public article → Playwright headless
├── Feishu doc            → Feishu API
├── YouTube               → Dedicated parser
└── Everything else       → r.jina.ai → defuddle.md → agent-fetch → defuddle CLI
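
The routing tree above can be sketched as a small hostname classifier. This is a minimal illustration, not markdown-proxy's actual code: the hostname suffixes and route labels are assumptions chosen to mirror the branches in the tree, and the real skill may match URLs differently.

```python
from urllib.parse import urlparse

# Hostname suffixes are illustrative assumptions, not markdown-proxy's real rules.
ROUTES = [
    (("mp.weixin.qq.com",), "playwright-headless"),   # WeChat public article
    (("feishu.cn", "larksuite.com"), "feishu-api"),   # Feishu doc
    (("youtube.com", "youtu.be"), "youtube-parser"),  # YouTube
]

# Order of the generic chain as listed in the tree above.
GENERIC_CHAIN = ["r.jina.ai", "defuddle.md", "agent-fetch", "defuddle CLI"]


def classify(url: str) -> str:
    """Return a route label for the URL based on its hostname."""
    host = (urlparse(url).hostname or "").lower()
    for suffixes, route in ROUTES:
        # Match the exact host or any subdomain of it.
        if any(host == s or host.endswith("." + s) for s in suffixes):
            return route
    return "generic-chain"
```

In this sketch, anything without a dedicated branch (an ordinary blog, for example) falls through to the generic chain.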

Observed results:

| Scenario | Outcome | Latency |
|---|---|---|
| WeChat article | ✅ Clean article + images | 8–12s |
| Feishu doc | ✅ API path, structure preserved | 3–5s |
| Twitter / X | ✅ Tweet text complete | 2–3s |
| Typical tech blog | ✅ Clean body, low ad noise | 1–2s |

Pros: layered fallbacks; optional YAML front-matter (title, author, time); Playwright path for tough anti-bot pages; no third-party API key required for the happy path.

Cons: WeChat flow expects Python + Playwright setup; Feishu needs App ID / Secret configuration.

Best for: creators and researchers who routinely mix "special" Chinese-ecosystem URLs with normal sites.
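
The "layered fallbacks" idea listed under Pros can be sketched as a loop that tries each backend in order and returns the first usable result. This is a hedged sketch under stated assumptions: `fetch_with_fallback`, the `min_chars` threshold, and the two stub fetchers are hypothetical names invented for illustration, not part of markdown-proxy.

```python
from typing import Callable, List, Optional, Tuple

Fetcher = Callable[[str], Optional[str]]


def fetch_with_fallback(
    url: str,
    fetchers: List[Tuple[str, Fetcher]],
    min_chars: int = 20,  # hypothetical threshold for "usable" output
) -> Tuple[str, str]:
    """Try each (name, fetcher) in order; return the first usable result."""
    failures = []
    for name, fetch in fetchers:
        try:
            text = fetch(url)
        except Exception as exc:  # a real chain would catch narrower errors
            failures.append(f"{name}: {exc}")
            continue
        if text and len(text.strip()) >= min_chars:
            return name, text
        failures.append(f"{name}: empty or too short")
    raise RuntimeError("all fetchers failed: " + "; ".join(failures))


# Stub fetchers standing in for real backends (hypothetical behavior).
def hosted_reader(url: str) -> Optional[str]:
    raise TimeoutError("upstream timed out")


def local_fetch(url: str) -> Optional[str]:
    return "# Title\n\nBody text long enough to count as a hit."


name, text = fetch_with_fallback(
    "https://example.com/post",
    [("hosted-reader", hosted_reader), ("local-fetch", local_fetch)],
)
```

Here the first stub fails, so the result comes from the second. Treating empty or very short output as a failure matters because anti-bot walls often return a page that is technically a success but contains no article text.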

2. agent-fetch ⭐⭐⭐⭐

One-liner: the local, privacy-first fallback when cloud helpers flake.

Runs entirely on your machine—ideal when you want predictable behavior without routing traffic through hosted extractors.

Observed results:

| Scenario | Outcome | Latency |
|---|---|---|
| Tech blog | ✅ Clean body | 2–4s |
| News site | ✅ Usable extraction | 2–4s |
| WeChat article | ❌ Blocked by anti-bot | - |
| Feishu doc | ❌ Not reachable | - |

Pros: local execution; no account wall; fits naturally as a downstream step in markdown-proxy's chain.

Cons: no magic for hard anti-bot or authenticated sessions; requires Node.

Best for: developers who mostly fetch public documentation and care about data locality.

3. WebFetch (OpenClaw built-in) ⭐⭐⭐

One-liner: the zero-install baseline—fast, but rarely "publication ready".

Ships with OpenClaw: great for a quick sanity check, not for polished article Markdown.

Observed results:

| Scenario | Outcome | Latency |
|---|---|---|
| Tech blog | ⚠️ Ads / nav noise | 1–2s |
| News site | ⚠️ Sidebars bleed in | 1–2s |
| WeChat article | ❌ Blank or captcha | - |
| Twitter / X | ❌ Login wall | - |

Pros: zero setup; very fast for simple pages.

Cons: inconsistent cleanliness; no answer for anti-bot or gated content; weaker main-text focus.

Best for: light users who only peek at pages occasionally.
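
To see why a baseline fetch tends to leak navigation and ad text, consider a naive converter that keeps every text node. This is an illustrative sketch using only Python's standard library; it is not how WebFetch is implemented, and the sample HTML is made up for the demonstration.

```python
from html.parser import HTMLParser


class TextDump(HTMLParser):
    """Collects every text node, skipping only <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip = 0  # depth inside script/style elements

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.chunks.append(data.strip())


def naive_text(html: str) -> str:
    """Flatten HTML to text with no notion of a 'main article' region."""
    parser = TextDump()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


# Made-up page: nav and a sponsored aside surround the real article.
SAMPLE = """<html><body>
<nav>Home | Pricing | Login</nav>
<article><h1>Release notes</h1><p>Version 2 ships today.</p></article>
<aside>Sponsored: buy now</aside>
</body></html>"""

output = naive_text(SAMPLE)
```

The article text survives, but so do the nav bar and the sponsored block, which is the kind of noise a readability-style extractor is meant to remove.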

4. defuddle CLI ⭐⭐⭐⭐

One-liner: minimalist CLI obsessed with main article text.

Excels at stripping chrome from HTML; pair it with a fetch layer that can actually retrieve the bytes.

Observed results:

| Scenario | Outcome | Latency |
|---|---|---|
| Tech blog | ✅ Sharp body text | 1–3s |
| News article | ✅ Ads reduced | 1–3s |
| Complex layout | ⚠️ May drop formatting | 1–3s |
| Anti-bot page | ❌ Cannot bypass alone | - |

Pros: strong readability-first extraction; selector hooks; CLI-friendly for batch pipelines.

Cons: images/links may need a second pass; not a full browser automation story; separate install.

Best for: editors batching articles who live in the terminal.

Scorecard

| Tool | Total (Install + Success + Quality + Platforms + Ease) |
|---|---|
| markdown-proxy | 24 / 25 |
| agent-fetch | 19 / 25 |
| WebFetch | 19 / 25 |
| defuddle CLI | 18 / 25 |

Closing thoughts

After a week of focused testing, our editorial call is clear: markdown-proxy currently offers the strongest all-around web capture story for mixed Chinese + global content. The routing-first design—pick the right engine per URL—scales far better than forcing every page through a single generic fetcher.

If you can only install one skill for "turn URLs into agent-ready Markdown", start there—then layer defuddle or agent-fetch when you need specialized pipelines. Always verify against the latest upstream README, terms of service, and your own compliance policies.

This article reflects editorial opinion. Tooling changes quickly; commands and scores are synthesized from community notes and hands-on trials, not a certified benchmark. Confirm install steps with the official repositories before running anything in production.

Third-party names (markdown-proxy, agent-fetch, defuddle, etc.) belong to their respective authors. Search GitHub for the latest docs.