EU Parliament Monitor — API Documentation - v0.9.24

    Module Aggregator/Metadata/SeoBudgets

    Per-script SEO byte budgets and a script-aware clamp.

    Background. Google Search Central and Bing Webmaster Guidelines both document SERP snippet limits in pixels, not characters. Latin glyphs render at roughly half the pixel width of CJK glyphs, while Arabic/Hebrew letterforms fall between the two. A single character budget for <title> and <meta description> is therefore wrong for at least one of the 14 publishing languages: it typically over-truncates Latin copy and lets CJK copy overrun its pixel envelope by a factor of two.
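    To make the width argument concrete, the short TypeScript sketch that follows converts the 580 px Google title envelope into a character count per script family. The average glyph widths (10, 20 and 15 px) are assumptions chosen only to reproduce the rough ratios described above. They are not published Google or Bing metrics, and charsThatFit is a hypothetical helper, not part of this module.

```typescript
type ScriptFamily = "latin" | "cjk" | "rtl";

// ASSUMPTION: illustrative average advance widths in px at one SERP font size.
// Only the ratios matter here (CJK ~ 2x Latin, RTL in between).
const ASSUMED_AVG_GLYPH_PX: Record<ScriptFamily, number> = {
  latin: 10,
  cjk: 20,
  rtl: 15,
};

// Hypothetical helper: how many characters fit in a pixel envelope.
function charsThatFit(pixelBudget: number, family: ScriptFamily): number {
  return Math.floor(pixelBudget / ASSUMED_AVG_GLYPH_PX[family]);
}

const TITLE_PX = 580; // Google title envelope cited in the module notes

for (const family of ["latin", "cjk", "rtl"] as const) {
  console.log(family, charsThatFit(TITLE_PX, family));
}
// latin 58, cjk 29, rtl 38
```

    Under these assumptions a single 58-character budget fits Latin copy but lets CJK copy run to twice its pixel allowance, while a single 29-character budget throws away half of the usable Latin title.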

    This module provides:

    • classifyScript: three-way latin | cjk | rtl script-family classifier driven by the locale code. It does no glyph inspection; the BCP-47 language tag is authoritative because every publishing pipeline emits one full output per language.
    • SEO_BUDGETS: per-surface × per-script byte caps derived from the documented platform envelopes (Google: ≤580 px title, ≤155-character description; Bing: slightly more generous; Facebook: ≤95 characters for og:title; Twitter: ≤70-character title, ≤200-character description; LinkedIn: reuses the Open Graph caps).
    • budgetFor: typed accessor returning the byte cap for a (lang, surface) pair, with a uniform fallback to the strictest Latin budget when the locale is unknown.
    • clampForBudget: script-aware truncator that prefers natural clause boundaries (CJK full-width punctuation, RTL sentence punctuation, Latin clause separators) before falling back to whitespace breaks. Returns the input verbatim when it already fits.
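    A minimal sketch of how classifyScript, SEO_BUDGETS and budgetFor could fit together is shown next. It is not the module's source: the SeoSurface member names, the locale sets, the 60-character stand-in for Google's 580 px title, and the 1/2 and 3/4 scaling ratios for CJK and RTL are all hypothetical placeholders.

```typescript
type ScriptFamily = "latin" | "cjk" | "rtl";

// HYPOTHETICAL surface names; the real SeoSurface union may differ.
type SeoSurface =
  | "googleTitle"
  | "googleDescription"
  | "ogTitle"
  | "twitterTitle"
  | "twitterDescription";

// ASSUMPTION: primary language subtags treated as non-Latin families.
const CJK_LANGS = new Set(["zh", "ja", "ko"]);
const RTL_LANGS = new Set(["ar", "he", "fa", "ur"]);

function classifyScript(lang: string): ScriptFamily {
  // Only the BCP-47 primary language subtag decides ("zh-Hant" -> "zh").
  const primary = lang.trim().toLowerCase().split(/[-_]/)[0];
  if (CJK_LANGS.has(primary)) return "cjk";
  if (RTL_LANGS.has(primary)) return "rtl";
  return "latin"; // Latin-script and unknown locales
}

// Placeholder Latin caps loosely following the listed platform envelopes.
const LATIN_CAPS: Record<SeoSurface, number> = {
  googleTitle: 60, // ASSUMPTION: stand-in for the 580 px envelope
  googleDescription: 155,
  ogTitle: 95,
  twitterTitle: 70,
  twitterDescription: 200,
};

// Derive another family's row by an assumed width ratio relative to Latin.
function scaled(ratio: number): Record<SeoSurface, number> {
  const out = {} as Record<SeoSurface, number>;
  for (const surface of Object.keys(LATIN_CAPS) as SeoSurface[]) {
    out[surface] = Math.floor(LATIN_CAPS[surface] * ratio);
  }
  return out;
}

const SEO_BUDGETS: Record<ScriptFamily, Record<SeoSurface, number>> = {
  latin: LATIN_CAPS,
  cjk: scaled(0.5),
  rtl: scaled(0.75),
};

function budgetFor(lang: string, surface: SeoSurface): number {
  return SEO_BUDGETS[classifyScript(lang)][surface];
}

console.log(budgetFor("ja", "googleTitle"));
console.log(budgetFor("xx", "googleDescription"));
```

    With these placeholder values, budgetFor("ja", "googleTitle") yields 30, and an unrecognised tag such as "xx" resolves to the Latin row, mirroring the fallback described for budgetFor.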

    Pure leaf module: no I/O and no dependencies on other aggregator modules beyond the existing text-utils.ts clause-boundary vocabulary.
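    The clampForBudget strategy (clause boundary first, then whitespace, then a hard cut) can be sketched as follows. This is an illustrative implementation, not the module's code. The boundary sets are hypothetical stand-ins for the text-utils.ts vocabulary, budgets are counted in Unicode code points rather than bytes, and ignoring boundaries in the first half of the window is an assumed heuristic to avoid returning a stub.

```typescript
type ScriptFamily = "latin" | "cjk" | "rtl";

// HYPOTHETICAL boundary sets; the real text-utils.ts vocabulary may differ.
const CLAUSE_BREAKS: Record<ScriptFamily, readonly string[]> = {
  cjk: ["。", "、", "，", "；", "：", "！", "？"], // full-width punctuation
  rtl: ["،", "؛", "؟", ".", ":"], // Arabic comma/semicolon/question mark, plus Latin marks used in Hebrew
  latin: [";", ":", ","], // clause separators
};

function clampForBudget(text: string, maxChars: number, family: ScriptFamily): string {
  const chars = Array.from(text); // count code points, not UTF-16 units
  if (chars.length <= maxChars) return text; // already fits: return verbatim
  const head = chars.slice(0, maxChars).join("");

  // 1. Latest clause boundary inside the window; the mark itself is dropped.
  let cut = -1;
  for (const mark of CLAUSE_BREAKS[family]) {
    cut = Math.max(cut, head.lastIndexOf(mark));
  }
  // ASSUMPTION: ignore boundaries in the first half so the result is not a stub.
  if (cut > head.length / 2) return head.slice(0, cut).trimEnd();

  // 2. Whitespace break (common in Latin and RTL copy, rare in CJK).
  const space = head.lastIndexOf(" ");
  if (space > head.length / 2) return head.slice(0, space).trimEnd();

  // 3. Hard cut at the budget.
  return head.trimEnd();
}

console.log(clampForBudget("MEPs adopt 2025 budget, cut farm subsidies and raise defence spending", 40, "latin"));
console.log(clampForBudget("欧州議会は予算案を可決し、農業補助金を削減した", 15, "cjk"));
```

    For example, clamping the English headline above to 40 characters returns "MEPs adopt 2025 budget", cutting at the comma instead of mid-word, and the Japanese headline clamped to 15 characters stops at the full-width comma, returning "欧州議会は予算案を可決し".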

    Functions

    classifyScript
    budgetFor
    clampForBudget
    clampTitleForSurface

    Interfaces

    TitleSurfaceOptions

    Type Aliases

    ScriptFamily
    SeoSurface

    Variables

    ALL_SCRIPT_FAMILIES
    SEO_BUDGETS