Tomo1912 commented Jan 5, 2026

Description

Add an optional distill parameter to the fetch server that aggressively cleans HTML content to minimize token usage. When enabled, it removes scripts, styles, navigation, headers, footers, ads, and other non-essential elements before conversion to markdown. Across the three sites tested, the total token count fell by 72.8%; per-site reductions ranged from 5.0% to 99.4%.
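
As a rough illustration of the kind of cleaning described above, here is a minimal sketch that drops text nested inside non-content tags. It is not the code in this PR: it uses only the standard library's `html.parser` rather than readabilipy, it emits plain text rather than markdown, and the names `NOISE_TAGS`, `DistillParser`, and `distill_html` are hypothetical.

```python
from html.parser import HTMLParser

# Hypothetical tag list; the PR's actual selectors are not shown in this description.
NOISE_TAGS = {"script", "style", "nav", "header", "footer", "aside"}


class DistillParser(HTMLParser):
    """Collect text that is not nested inside any noise tag."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0  # > 0 while inside at least one noise tag
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in NOISE_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in NOISE_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that sits outside every noise tag.
        if self._skip_depth == 0 and data.strip():
            self._chunks.append(data.strip())

    def text(self):
        return "\n".join(self._chunks)


def distill_html(html: str) -> str:
    """Return the visible text of `html` with noise elements removed."""
    parser = DistillParser()
    parser.feed(html)
    parser.close()
    return parser.text()
```

A tag-name filter like this will miss ads, popups, and cookie banners that are marked only by class or id, so a real implementation would likely need attribute-based rules as well.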

Server Details

  • Server: fetch
  • Changes to: tools (added distill parameter to fetch tool)

Motivation and Context

LLM token costs are a significant operational expense. The current fetch tool returns the full page, including navigation menus, ads, scripts, and UI clutter, which wastes tokens on non-content elements.

Test Results:

| Website | Standard Tokens | Distilled Tokens | Reduction |
|---|---|---|---|
| MCP Docs | 2,154 | 13 | 99.4% |
| TechCrunch | 506 | 263 | 48.0% |
| Python Docs | 662 | 629 | 5.0% |
| Average | 1,107 | 302 | 72.8% |
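
The 72.8% figure in the Average row is the reduction computed from the summed (or, equivalently, averaged) token counts, which is not the same as the mean of the three per-site percentages. The short check below recomputes both from the numbers in the table; the variable names are illustrative only.

```python
# (standard tokens, distilled tokens) per site, copied from the table above.
results = {
    "MCP Docs": (2154, 13),
    "TechCrunch": (506, 263),
    "Python Docs": (662, 629),
}


def reduction(standard: int, distilled: int) -> float:
    """Percentage of tokens removed by distilling."""
    return 100.0 * (standard - distilled) / standard


per_site = {name: reduction(s, d) for name, (s, d) in results.items()}

# Reduction over the pooled token counts (what the Average row reports).
total_standard = sum(s for s, _ in results.values())
total_distilled = sum(d for _, d in results.values())
aggregate = reduction(total_standard, total_distilled)

# Unweighted mean of the per-site percentages.
mean_of_sites = sum(per_site.values()) / len(per_site)

print(f"aggregate: {aggregate:.1f}%")          # aggregate: 72.8%
print(f"mean of sites: {mean_of_sites:.1f}%")  # mean of sites: 50.8%
```

The pooled figure is dominated by the MCP Docs page, which contributes most of the tokens removed.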

How Has This Been Tested?

  • Tested with Claude Desktop as MCP client
  • Tested against multiple real-world websites (documentation sites, news sites, technical docs)
  • Verified backward compatibility - existing calls without distill parameter work unchanged

Breaking Changes

None. The distill parameter defaults to False, maintaining full backward compatibility.
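
The backward-compatibility claim rests on the default value: a caller that never mentions distill gets the existing behaviour. A minimal sketch of that pattern follows, using a plain dataclass as a hypothetical stand-in for the fetch tool's argument model; `FetchArgs` and `parse_args` are assumed names, not the server's actual code.

```python
from dataclasses import dataclass


@dataclass
class FetchArgs:
    """Hypothetical stand-in for the fetch tool's argument model."""

    url: str
    distill: bool = False  # opt-in: omitting it keeps the existing behaviour


def parse_args(raw: dict) -> FetchArgs:
    """Build arguments from a tool-call payload; missing keys use defaults."""
    return FetchArgs(**raw)


# An existing call that does not know about the new parameter still parses.
legacy = parse_args({"url": "https://example.com"})
# A new call opts in explicitly.
opted_in = parse_args({"url": "https://example.com", "distill": True})
```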

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation update

Checklist

  • I have read the MCP Protocol Documentation
  • My changes follow MCP security best practices
  • I have updated the server's README accordingly
  • I have tested this with an LLM client
  • My code follows the repository's style guidelines
  • New and existing tests pass locally
  • I have added appropriate error handling
  • I have documented all environment variables and configuration options

Additional context

No new dependencies required - uses existing readabilipy for content extraction.

Tomo1912 force-pushed the feature/distill-token-optimization branch 5 times, most recently from 4322b3e to a3cff79 on January 8, 2026 21:40
Add distill parameter to aggressively clean HTML before processing:
- Remove scripts, styles, navigation, headers, footers
- Remove ads, sidebars, popups, cookie banners
- Remove social widgets and non-content elements
- Normalize whitespace

Typical token reduction: 60-85%

This is an opt-in feature (distill=false by default) to maintain
backward compatibility.
Tomo1912 force-pushed the feature/distill-token-optimization branch from a3cff79 to 98c0dd3 on January 8, 2026 22:04