Data Structures & Relationships for European Parliament Intelligence
Entity Models • Data Relationships • Schema Documentation
Document Owner: CEO | Version: 1.1 | Last Updated: 2026-03-19 (UTC)
Review Cycle: Quarterly | Next Review: 2026-06-19
This document defines the data structures and relationships used in the EU Parliament Monitor platform for news generation, storage, and delivery.
| Document | Focus | Description | Documentation Link |
|---|---|---|---|
| Architecture | Architecture | C4 model showing current system structure | View Source |
| Future Architecture | Architecture | C4 model showing future system structure | View Source |
| Mindmaps | Concept | Current system component relationships | View Source |
| Future Mindmaps | Concept | Future capability evolution | View Source |
| SWOT Analysis | Business | Current strategic assessment | View Source |
| Future SWOT Analysis | Business | Future strategic opportunities | View Source |
| Data Model | Data | Current data structures and relationships | View Source |
| Future Data Model | Data | Enhanced European Parliament data architecture | View Source |
| Flowcharts | Process | Current data processing workflows | View Source |
| Future Flowcharts | Process | Enhanced AI-driven workflows | View Source |
| State Diagrams | Behavior | Current system state transitions | View Source |
| Future State Diagrams | Behavior | Enhanced adaptive state transitions | View Source |
| Security Architecture | Security | Current security implementation | View Source |
| Future Security Architecture | Security | Security enhancement roadmap | View Source |
| Threat Model | Security | STRIDE threat analysis | View Source |
| Classification | Governance | CIA classification & BCP | View Source |
| CRA Assessment | Compliance | Cyber Resilience Act | View Source |
| Workflows | DevOps | CI/CD documentation | View Source |
| Future Workflows | DevOps | Planned CI/CD enhancements | View Source |
| Business Continuity Plan | Resilience | Recovery planning | View Source |
| Financial Security Plan | Financial | Cost & security analysis | View Source |
| End-of-Life Strategy | Lifecycle | Technology EOL planning | View Source |
| Unit Test Plan | Testing | Unit testing strategy | View Source |
| E2E Test Plan | Testing | End-to-end testing | View Source |
| Performance Testing | Performance | Performance benchmarks | View Source |
| Security Policy | Security | Vulnerability reporting & security policy | View Source |
This data model aligns with Hack23 ISMS policies to ensure secure data handling, classification, and development practices:
| Policy | Relevance | Implementation in Data Model |
|---|---|---|
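| Data Classification Policy | High | All data classified as Public (Level 1) per CLASSIFICATION.md. European Parliament data is publicly available open data. No user/customer PII or special-category data is processed; the only personal data is publicly available information about MEPs in their official capacity. |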
| Cryptography Policy | Medium | TLS 1.3 for data in transit from European Parliament API. At-rest encryption via GitHub repository storage. Planned SHA-256 hashes for data integrity verification in future generator updates. |
| Secure Development Policy | High | Planned schema validation for EP API responses and planned HTML sanitization (e.g., DOMPurify) in future generator/client updates. Input validation for external data where implemented. Git-based audit trail for all changes. |
ISO 27001:2022 Controls:
GDPR Compliance:
NIST CSF 2.0:
erDiagram
NEWS_ARTICLE ||--o{ METADATA : has
NEWS_ARTICLE ||--o{ SOURCE : references
NEWS_ARTICLE }o--|| ARTICLE_TYPE : "belongs to"
NEWS_ARTICLE }o--|| LANGUAGE : "written in"
PLENARY_SESSION ||--o{ NEWS_ARTICLE : "mentioned in"
COMMITTEE_MEETING ||--o{ NEWS_ARTICLE : "mentioned in"
PARLIAMENTARY_QUESTION ||--o{ NEWS_ARTICLE : "mentioned in"
DOCUMENT ||--o{ NEWS_ARTICLE : "referenced in"
NEWS_ARTICLE {
string slug PK "Unique article identifier"
string category "ArticleCategory enum value"
string language "en, sv, da, no, fi, de, fr, es, nl, ar, he, ja, ko, zh"
string date "Publication date string"
string title "Article title"
string subtitle "Article subtitle"
string content "Full HTML content"
int readTime "Estimated read time (minutes)"
array keywords "SEO keywords (optional)"
array sources "ArticleSource references (optional)"
}
METADATA {
string filename "Article filename"
string date "Publication date"
string slug "Article slug"
string lang "Language code"
string title "Article title"
string type "ArticleCategory value (optional)"
}
SOURCE {
string title "Source title"
string url "Source URL"
}
ARTICLE_TYPE {
string code PK "ArticleCategory enum value"
string perspective "ArticlePerspective (prospective, retrospective, real-time, analytical)"
string label_en "English label"
string label_de "German label"
string label_fr "French label"
}
LANGUAGE {
string code PK "ISO 639-1 code"
string name "Language name"
string direction "ltr or rtl"
}
PLENARY_SESSION {
string session_id PK "EP session identifier"
date session_date "Session date"
string title "Session title"
array agenda_items "Agenda item IDs"
}
COMMITTEE_MEETING {
string meeting_id PK "EP meeting identifier"
string committee_code "Committee code (LIBE, ECON, etc.)"
date meeting_date "Meeting date"
string title "Meeting title"
}
PARLIAMENTARY_QUESTION {
string question_id PK "EP question identifier"
date submission_date "Date submitted"
string question_type "Written, Oral, Priority"
string author_mep "MEP name"
}
DOCUMENT {
string document_id PK "EP document identifier"
string document_type "Report, Resolution, Opinion"
date publication_date "Publication date"
string title "Document title"
}
File Location: news/YYYY-MM-DD-{slug}-{lang}.html
HTML Structure:
<!DOCTYPE html>
<html lang="en" dir="ltr">
<head>
<meta charset="UTF-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>Article Title - EU Parliament Monitor</title>
<!-- SEO Meta Tags -->
<meta name="description" content="Article subtitle" />
<meta name="keywords" content="european parliament, keyword1, keyword2" />
<meta name="author" content="EU Parliament Monitor" />
<meta name="publication-date" content="2026-03-01" />
<meta name="article-type" content="prospective" />
<meta name="language" content="en" />
<!-- Open Graph -->
<meta property="og:title" content="Article Title" />
<meta property="og:description" content="Article subtitle" />
<meta property="og:type" content="article" />
<meta
property="og:url"
content="https://euparliamentmonitor.com/news/2026-03-01-week-ahead-en.html"
/>
<!-- Schema.org structured data -->
<script type="application/ld+json">
{
"@context": "https://schema.org",
"@type": "NewsArticle",
"headline": "Article Title",
"description": "Article subtitle",
"datePublished": "2026-03-01T06:15:32Z",
"author": {
"@type": "Organization",
"name": "EU Parliament Monitor"
},
"publisher": {
"@type": "Organization",
"name": "EU Parliament Monitor",
"logo": {
"@type": "ImageObject",
"url": "https://euparliamentmonitor.com/logo.png"
}
}
}
</script>
<link rel="stylesheet" href="../styles.css" />
</head>
<body>
<article class="news-article">
<header>
<span class="article-type">Week Ahead</span>
<h1>Article Title</h1>
<p class="subtitle">Article subtitle</p>
<div class="meta">
<time datetime="2026-03-01">March 1, 2026</time>
<span class="read-time">5 min read</span>
</div>
</header>
<main class="content">
<!-- Generated HTML content -->
</main>
<footer>
<section class="sources">
<h3>Sources</h3>
<ul>
<li>
<a href="https://data.europarl.europa.eu/...">EP Source 1</a>
</li>
<li>
<a href="https://data.europarl.europa.eu/...">EP Source 2</a>
</li>
</ul>
</section>
<section class="languages">
<h3>Available Languages</h3>
<ul>
<li><a href="2026-03-01-week-ahead-de.html">Deutsch</a></li>
<li><a href="2026-03-01-week-ahead-fr.html">Français</a></li>
</ul>
</section>
</footer>
</article>
</body>
</html>
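The file-location pattern `news/YYYY-MM-DD-{slug}-{lang}.html` can be sketched as a small helper. This is a minimal illustration, not the generator's code: `buildArticlePath` and `ArticleFileInput` are hypothetical names, and the slug rule is borrowed from the validation table later in this document.

```typescript
// Hypothetical helper illustrating the news/YYYY-MM-DD-{slug}-{lang}.html pattern.
// Names and input shape are assumptions, not the generator's actual API.
interface ArticleFileInput {
  date: string; // ISO date, e.g. "2026-03-01"
  slug: string; // e.g. "week-ahead"
  lang: string; // ISO 639-1 code, e.g. "en"
}

function buildArticlePath(input: ArticleFileInput): string {
  // Shape check only; calendar validity is not verified here
  if (!/^\d{4}-\d{2}-\d{2}$/.test(input.date)) {
    throw new Error(`Invalid date: ${input.date}`);
  }
  // Alphanumeric characters and hyphens, at most 100 characters
  if (!/^[A-Za-z0-9-]{1,100}$/.test(input.slug)) {
    throw new Error(`Invalid slug: ${input.slug}`);
  }
  if (!/^[a-z]{2}$/.test(input.lang)) {
    throw new Error(`Invalid language code: ${input.lang}`);
  }
  return `news/${input.date}-${input.slug}-${input.lang}.html`;
}
```

Rejecting unexpected characters before building the path also keeps slugs from introducing path separators.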
File Location: articles-metadata.json
TypeScript Interface (NewsMetadataDatabase):
{
"lastUpdated": "2026-03-01T06:15:32Z",
"articles": [
{
"filename": "2026-03-01-week-ahead-en.html",
"date": "2026-03-01",
"slug": "week-ahead",
"lang": "en",
"title": "Week Ahead: European Parliament March Session",
"type": "week-ahead"
},
{
"filename": "2026-03-01-week-ahead-de.html",
"date": "2026-03-01",
"slug": "week-ahead",
"lang": "de",
"title": "Woche Voraus: Europäisches Parlament Märzsitzung",
"type": "week-ahead"
}
]
}
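The JSON above implies a shape along the following lines. This is a sketch inferred from the example only: `ArticleMetadataEntry`, `isArticleMetadataEntry`, and `parseMetadataDatabase` are hypothetical names, and the real `NewsMetadataDatabase` interface may carry additional fields.

```typescript
// Shape inferred from the articles-metadata.json example; helper names are assumptions.
interface ArticleMetadataEntry {
  filename: string;
  date: string;
  slug: string;
  lang: string;
  title: string;
  type?: string; // ArticleCategory value (optional)
}

interface NewsMetadataDatabase {
  lastUpdated: string;
  articles: ArticleMetadataEntry[];
}

// Runtime type guard for a single entry
function isArticleMetadataEntry(value: unknown): value is ArticleMetadataEntry {
  if (typeof value !== 'object' || value === null) return false;
  const entry = value as Record<string, unknown>;
  const required = ['filename', 'date', 'slug', 'lang', 'title'];
  const hasRequired = required.every((key) => typeof entry[key] === 'string');
  const typeField = entry['type'];
  return hasRequired && (typeField === undefined || typeof typeField === 'string');
}

// Parse and check the whole database, throwing on the first structural problem
function parseMetadataDatabase(json: string): NewsMetadataDatabase {
  const parsed: unknown = JSON.parse(json);
  if (typeof parsed !== 'object' || parsed === null) {
    throw new Error('Metadata database must be a JSON object');
  }
  const db = parsed as Record<string, unknown>;
  const lastUpdated = db['lastUpdated'];
  const articles = db['articles'];
  if (typeof lastUpdated !== 'string' || !Array.isArray(articles)) {
    throw new Error('Missing lastUpdated or articles');
  }
  if (!articles.every(isArticleMetadataEntry)) {
    throw new Error('Malformed article entry');
  }
  return { lastUpdated, articles };
}
```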
File Location: src/types/index.ts (ArticleCategory enum)
{
"article_types": [
{
"code": "week-ahead",
"perspective": "prospective",
"labels": {
"en": "Week Ahead",
"sv": "Vecka Framåt",
"da": "Ugen Fremover",
"no": "Uken Fremover",
"fi": "Viikko Eteenpäin",
"de": "Woche Voraus",
"fr": "Semaine à Venir",
"es": "Semana Próxima",
"nl": "Week Vooruit",
"ar": "الأسبوع القادم",
"he": "השבוע הקרוב",
"ja": "今週の展望",
"ko": "주간 전망",
"zh": "一周展望"
},
"description": "Preview of upcoming parliamentary events and committee meetings"
},
{
"code": "committee-reports",
"perspective": "retrospective",
"labels": {
"en": "Committee Reports",
"de": "Ausschussberichte",
"fr": "Rapports de Commission"
},
"description": "Analysis of committee activities and decisions"
},
{
"code": "breaking",
"perspective": "real-time",
"labels": {
"en": "Breaking News",
"de": "Eilmeldung",
"fr": "Dernières Nouvelles"
},
"description": "Rapid-response coverage of significant developments"
},
{
"code": "deep-analysis",
"perspective": "analytical",
"labels": {
"en": "Deep Analysis",
"de": "Tiefenanalyse",
"fr": "Analyse Approfondie"
},
"description": "Multi-perspective deep dive analysis"
}
]
}
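Because some article types above define labels for only a few languages, a lookup needs a fallback. The sketch below models `ArticleCategory` as a string-literal union rather than the enum in `src/types/index.ts`, and falling back to English is an assumption, not documented behaviour. `getCategoryLabel` and the two constant tables are hypothetical.

```typescript
// Values copied from the JSON example above. The real ArticleCategory is an enum in
// src/types/index.ts; a string-literal union is used here for illustration only.
type ArticleCategory = 'week-ahead' | 'committee-reports' | 'breaking' | 'deep-analysis';
type ArticlePerspective = 'prospective' | 'retrospective' | 'real-time' | 'analytical';

const PERSPECTIVE_BY_CATEGORY: Record<ArticleCategory, ArticlePerspective> = {
  'week-ahead': 'prospective',
  'committee-reports': 'retrospective',
  breaking: 'real-time',
  'deep-analysis': 'analytical',
};

// Partial label table (English and German only, to keep the sketch short)
const LABELS: Record<ArticleCategory, Record<string, string>> = {
  'week-ahead': { en: 'Week Ahead', de: 'Woche Voraus' },
  'committee-reports': { en: 'Committee Reports', de: 'Ausschussberichte' },
  breaking: { en: 'Breaking News', de: 'Eilmeldung' },
  'deep-analysis': { en: 'Deep Analysis', de: 'Tiefenanalyse' },
};

// Assumed fallback order: requested language, then English, then the raw code
function getCategoryLabel(category: ArticleCategory, lang: string): string {
  const labels = LABELS[category];
  return labels[lang] ?? labels['en'] ?? category;
}
```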
File Location: src/constants/languages.ts
{
"languages": [
{
"code": "en",
"name": "English",
"native_name": "English",
"direction": "ltr",
"flag": "🇬🇧"
},
{
"code": "sv",
"name": "Swedish",
"native_name": "Svenska",
"direction": "ltr",
"flag": "🇸🇪"
},
{
"code": "de",
"name": "German",
"native_name": "Deutsch",
"direction": "ltr",
"flag": "🇩🇪"
},
{
"code": "fr",
"name": "French",
"native_name": "Français",
"direction": "ltr",
"flag": "🇫🇷"
},
{
"code": "ar",
"name": "Arabic",
"native_name": "العربية",
"direction": "rtl",
"flag": "🇸🇦"
},
{
"code": "he",
"name": "Hebrew",
"native_name": "עברית",
"direction": "rtl",
"flag": "🇮🇱"
}
],
"language_groups": {
"eu-core": ["en", "de", "fr", "es", "nl"],
"nordic": ["en", "sv", "da", "no", "fi"],
"all": [
"en", "sv", "da", "no", "fi",
"de", "fr", "es", "nl",
"ar", "he", "ja", "ko", "zh"
]
}
}
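The language groups and direction values above could be consumed roughly as follows. This is a sketch: `resolveLanguages` and `getDirection` are hypothetical helpers, and accepting a comma-separated list of codes in addition to a preset name is an assumption.

```typescript
// Group contents mirror the language_groups object above; helper names are hypothetical.
type LanguagePreset = 'eu-core' | 'nordic' | 'all';
type TextDirection = 'ltr' | 'rtl';

const LANGUAGE_GROUPS: Record<LanguagePreset, readonly string[]> = {
  'eu-core': ['en', 'de', 'fr', 'es', 'nl'],
  nordic: ['en', 'sv', 'da', 'no', 'fi'],
  all: ['en', 'sv', 'da', 'no', 'fi', 'de', 'fr', 'es', 'nl', 'ar', 'he', 'ja', 'ko', 'zh'],
};

// Arabic and Hebrew are the only right-to-left languages in the supported set
const RTL_LANGUAGES: readonly string[] = ['ar', 'he'];

function getDirection(lang: string): TextDirection {
  return RTL_LANGUAGES.indexOf(lang) !== -1 ? 'rtl' : 'ltr';
}

// Accepts a preset name or (assumption) a comma-separated list of language codes
function resolveLanguages(selection: string): string[] {
  if (Object.prototype.hasOwnProperty.call(LANGUAGE_GROUPS, selection)) {
    return LANGUAGE_GROUPS[selection as LanguagePreset].slice();
  }
  const codes = selection
    .split(',')
    .map((code) => code.trim().toLowerCase())
    .filter((code) => code.length > 0);
  const unknown = codes.filter((code) => LANGUAGE_GROUPS.all.indexOf(code) === -1);
  if (unknown.length > 0) {
    throw new Error(`Unsupported language code(s): ${unknown.join(', ')}`);
  }
  return codes;
}
```

The returned direction maps directly onto the `dir` attribute of the `<html>` element in the article template.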
EP API Endpoint:
https://data.europarl.europa.eu/api/v2/sessions/{session_id}
{
"session_id": "PS-2026-03-01",
"session_date": "2026-03-01",
"session_type": "Plenary",
"title": "March 2026 Plenary Session I",
"location": "Strasbourg",
"agenda": [
{
"item_id": "AGI-2026-03-001",
"order": 1,
"title": "Commission statement: European Green Deal progress",
"speaker": "European Commission",
"duration_minutes": 60,
"voting_required": false
},
{
"item_id": "AGI-2026-03-002",
"order": 2,
"title": "Vote: Digital Services Act amendments",
"rapporteur": "MEP Name",
"duration_minutes": 30,
"voting_required": true
}
],
"attendees": 705,
"status": "scheduled"
}
EP API Endpoint:
https://data.europarl.europa.eu/api/v2/committees/{committee_code}/meetings/{meeting_id}
{
"meeting_id": "LIBE-2026-02-25",
"committee_code": "LIBE",
"committee_name": "Committee on Civil Liberties, Justice and Home Affairs",
"meeting_date": "2026-02-25",
"meeting_time": "14:00:00",
"location": "Brussels",
"agenda": [
{
"item_id": "LIBE-AGI-001",
"title": "Artificial Intelligence Act implementation review",
"type": "Discussion",
"documents": ["DOC-2026-001", "DOC-2026-002"]
}
],
"chair": "MEP Name",
"status": "completed"
}
EP API Endpoint:
https://data.europarl.europa.eu/api/v2/questions/{question_id}
{
"question_id": "PQ-2026-000123",
"question_type": "Written",
"priority": false,
"submission_date": "2026-02-20",
"author": {
"mep_id": "MEP-12345",
"name": "MEP Name",
"political_group": "EPP",
"country": "Germany"
},
"addressee": "European Commission",
"subject": "Implementation of GDPR enforcement",
"question_text": "What measures is the Commission taking to...",
"answer": {
"answer_date": "2026-03-05",
"answer_text": "The Commission has undertaken the following actions...",
"answered_by": "Commissioner Name"
},
"languages": ["en", "de"]
}
EP API Endpoint:
https://data.europarl.europa.eu/api/v2/documents/{document_id}
{
"document_id": "DOC-2026-001",
"document_type": "Report",
"title": "Report on the implementation of the Digital Services Act",
"publication_date": "2026-02-15",
"rapporteur": {
"mep_id": "MEP-67890",
"name": "MEP Name",
"political_group": "S&D"
},
"committee": "LIBE",
"procedure": "INI",
"languages": ["en", "de", "fr", "es", "it"],
"documents": [
{
"language": "en",
"format": "PDF",
"url": "https://data.europarl.europa.eu/documents/DOC-2026-001-EN.pdf"
}
],
"status": "published"
}
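The four endpoint templates listed above can be filled in with small URL builders. The path templates are copied from this document as written; the function names are hypothetical, and whether the MCP server constructs URLs this way is not confirmed by the source.

```typescript
// Base URL and path templates are taken from the "EP API Endpoint" lines above.
// The builder functions themselves are illustrative assumptions.
const EP_API_BASE = 'https://data.europarl.europa.eu/api/v2';

function sessionUrl(sessionId: string): string {
  return `${EP_API_BASE}/sessions/${encodeURIComponent(sessionId)}`;
}

function committeeMeetingUrl(committeeCode: string, meetingId: string): string {
  const committee = encodeURIComponent(committeeCode);
  const meeting = encodeURIComponent(meetingId);
  return `${EP_API_BASE}/committees/${committee}/meetings/${meeting}`;
}

function questionUrl(questionId: string): string {
  return `${EP_API_BASE}/questions/${encodeURIComponent(questionId)}`;
}

function documentUrl(documentId: string): string {
  return `${EP_API_BASE}/documents/${encodeURIComponent(documentId)}`;
}
```

Encoding each identifier keeps an unexpected character in an ID from changing the request path.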
erDiagram
MEP ||--o{ COMMITTEE_MEMBERSHIP : "serves on"
MEP ||--o{ VOTING_RECORD : "casts"
MEP ||--o{ PARLIAMENTARY_QUESTION : "authors"
MEP }o--|| POLITICAL_GROUP : "belongs to"
MEP }o--|| COUNTRY : "represents"
MEP }o--|| NATIONAL_PARTY : "member of"
POLITICAL_GROUP ||--o{ MEP : "has members"
COUNTRY ||--o{ MEP : "has representatives"
COMMITTEE ||--o{ COMMITTEE_MEMBERSHIP : "has members"
MEP {
string id PK "MEP-xxxxx"
string name "Full name"
string email "Contact email"
string photoUrl "Photo URL"
date termStart "Term start date"
date termEnd "Term end date"
boolean active "Active status"
}
POLITICAL_GROUP {
string code PK "PPE, S&D, Renew, Greens/EFA, ECR, etc."
string name "Full group name"
string abbreviation "Short name"
int memberCount "Number of MEPs"
string politicalOrientation "Left, Center, Right"
}
COUNTRY {
string code PK "ISO 3166-1 alpha-2"
string name "Country name"
int seatCount "EP seats allocated"
string region "EU region"
}
NATIONAL_PARTY {
string id PK "Party identifier"
string name "Party name"
string country FK "Country code"
string europeanAffiliation FK "Political group code"
}
COMMITTEE_MEMBERSHIP {
string mepId FK
string committeeCode FK
string role "Member, Chair, Vice-Chair"
date joinDate
date leaveDate
}
VOTING_RECORD {
string id PK
string mepId FK
string documentReference
string vote "FOR, AGAINST, ABSTAIN"
date voteDate
string sessionId
}
PARLIAMENTARY_QUESTION {
string questionId PK
string authorMepId FK
string questionType "Written, Oral, Priority"
date submissionDate
string subject
string addressee
}
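As an illustration of how the `VOTING_RECORD` entity above might be used, the sketch below tallies votes for one document. The field names follow the diagram; `tallyVotes` is a hypothetical helper, and dates are represented as ISO strings for simplicity.

```typescript
// Field names follow the VOTING_RECORD entity above; tallyVotes is hypothetical.
type Vote = 'FOR' | 'AGAINST' | 'ABSTAIN';

interface VotingRecord {
  id: string;
  mepId: string;
  documentReference: string;
  vote: Vote;
  voteDate: string; // ISO date string (assumption; the diagram only says "date")
  sessionId: string;
}

// Count FOR / AGAINST / ABSTAIN votes cast on a single document
function tallyVotes(records: VotingRecord[], documentReference: string): Record<Vote, number> {
  const tally: Record<Vote, number> = { FOR: 0, AGAINST: 0, ABSTAIN: 0 };
  for (const record of records) {
    if (record.documentReference === documentReference) {
      tally[record.vote] += 1;
    }
  }
  return tally;
}
```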
erDiagram
MCP_SERVER ||--o{ MCP_TOOL : "provides"
MCP_TOOL ||--o{ API_ENDPOINT : "calls"
API_ENDPOINT }o--|| EP_API : "endpoint of"
MCP_TOOL ||--o{ TOOL_RESPONSE : "returns"
TOOL_RESPONSE ||--o{ CACHED_RESPONSE : "cached as"
NEWS_GENERATOR ||--o{ MCP_CLIENT : "uses"
MCP_CLIENT ||--o{ MCP_TOOL : "invokes"
MCP_CLIENT ||--o{ RESPONSE_VALIDATOR : "validates with"
MCP_SERVER {
string version "1.2.3"
string connectionType "stdio, SSE"
string status "running, stopped"
datetime lastHealthCheck
}
MCP_TOOL {
string name PK "get_meps, get_plenary_sessions"
string description "Tool description"
json inputSchema "JSON Schema for parameters"
json outputSchema "JSON Schema for response"
string endpoint FK "EP API endpoint"
}
API_ENDPOINT {
string url PK "https://data.europarl.europa.eu/..."
string method "GET, POST"
json parameters "Query parameters"
int rateLimitPerMinute
int cacheTTL "Seconds"
}
EP_API {
string baseUrl "https://data.europarl.europa.eu"
string version "v2"
string authentication "None (public API; field reserved for future use such as API key, OAuth)"
boolean requiresAuth "false for current EP MCP; reserved for future use"
}
TOOL_RESPONSE {
string id PK
string toolName FK
json data "Response data"
datetime timestamp
string dataHash "SHA-256 hash"
int statusCode
}
CACHED_RESPONSE {
string cacheKey PK
string toolName FK
json cachedData
datetime cachedAt
datetime expiresAt
int hitCount
}
MCP_CLIENT {
string clientId PK
string version
string connectionType
int timeoutSeconds
int retryAttempts
}
RESPONSE_VALIDATOR {
string toolName FK
json schema "JSON Schema"
array requiredFields
boolean strictMode
}
NEWS_GENERATOR {
string version
string mode "daily, manual"
array supportedLanguages
}
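The `CACHED_RESPONSE` entity above suggests a time-to-live cache keyed by tool and parameters. The following is a minimal sketch under stated assumptions: it implements TTL expiry and hit counting only (no LRU eviction), stores timestamps as epoch milliseconds, and the `ResponseCache` class and its key format are hypothetical.

```typescript
// Field names follow the CACHED_RESPONSE entity; timestamps are epoch milliseconds here.
interface CachedResponse<T> {
  cacheKey: string;
  toolName: string;
  cachedData: T;
  cachedAt: number;
  expiresAt: number;
  hitCount: number;
}

// TTL-only cache sketch; eviction by size or recency is deliberately left out
class ResponseCache<T> {
  private readonly entries = new Map<string, CachedResponse<T>>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  // The clock is injectable so expiry can be tested deterministically
  constructor(ttlMs: number, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  // Assumed key format: tool name plus serialized parameters
  static keyFor(toolName: string, params: unknown): string {
    return `${toolName}:${JSON.stringify(params)}`;
  }

  set(toolName: string, params: unknown, data: T): void {
    const cacheKey = ResponseCache.keyFor(toolName, params);
    const cachedAt = this.now();
    this.entries.set(cacheKey, {
      cacheKey,
      toolName,
      cachedData: data,
      cachedAt,
      expiresAt: cachedAt + this.ttlMs,
      hitCount: 0,
    });
  }

  get(toolName: string, params: unknown): T | undefined {
    const cacheKey = ResponseCache.keyFor(toolName, params);
    const entry = this.entries.get(cacheKey);
    if (entry === undefined) return undefined;
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(cacheKey); // expired entries are dropped on access
      return undefined;
    }
    entry.hitCount += 1;
    return entry.cachedData;
  }
}
```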
erDiagram
ARTICLE ||--o{ TRANSLATION : "has"
TRANSLATION }o--|| LANGUAGE : "written in"
ARTICLE ||--o{ ARTICLE_METADATA : "has"
TRANSLATION ||--o{ SEO_METADATA : "has"
LANGUAGE ||--o{ TRANSLATION : "used for"
LANGUAGE ||--o{ INDEX_PAGE : "has"
ARTICLE {
string slug PK "2026-01-01-week-ahead"
string category "ArticleCategory enum value"
datetime generatedAt
string commitSha "Git commit hash"
array sourceIds "EP data source IDs"
}
TRANSLATION {
string id PK
string articleSlug FK
string languageCode FK
string title "Translated title"
string subtitle "Translated subtitle"
string contentHtml "Full HTML content"
int wordCount
int readTimeMinutes
array keywords
}
LANGUAGE {
string code PK "ISO 639-1"
string name "Native language name"
string flag "Flag emoji"
string direction "ltr or rtl"
string preset "all, eu-core, nordic"
}
ARTICLE_METADATA {
string articleSlug FK
string generatorVersion
string workflowRunId
string mcpServerVersion
json sources "Array of source data"
json statistics "Word counts, read times"
}
SEO_METADATA {
string translationId FK
string metaDescription
array metaKeywords
string ogTitle "Open Graph title"
string ogDescription
string ogImage
string canonicalUrl
array hreflangLinks
}
INDEX_PAGE {
string languageCode FK
string filename "index-{lang}.html"
array articleList "Ordered article references"
datetime lastUpdated
int articleCount
}
erDiagram
SITEMAP ||--o{ SITEMAP_ENTRY : "contains"
SITEMAP_ENTRY }o--|| TRANSLATION : "references"
SITEMAP_ENTRY ||--o{ HREFLANG_LINK : "has"
INDEX_PAGE ||--o{ INDEX_ENTRY : "lists"
INDEX_ENTRY }o--|| TRANSLATION : "links to"
SITEMAP {
string filename "sitemap.xml"
datetime lastModified
int urlCount
string xmlns "XML namespace"
}
SITEMAP_ENTRY {
string loc PK "Full URL"
datetime lastmod "Last modified"
string changefreq "always, daily, weekly"
float priority "0.0 to 1.0"
string translationId FK
}
HREFLANG_LINK {
string sourceUrl FK
string targetUrl "Alternate language URL"
string hreflang "Language code or x-default"
string rel "alternate"
}
INDEX_PAGE {
string languageCode PK
string filename "index-{lang}.html"
string title "Page title"
string metaDescription
datetime lastUpdated
}
INDEX_ENTRY {
string indexLanguage FK
string articleUrl "Relative URL"
string articleTitle
string articleSubtitle
string articleType
date publicationDate
int displayOrder
}
TRANSLATION {
string id PK
string articleSlug
string languageCode
string title
string filename
}
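The `HREFLANG_LINK` entity above can be illustrated with a generator for alternate-language links. In this sketch the site base URL is taken from the `og:url` example earlier in the document, the filename pattern from the File Location line, and pointing `x-default` at the English version is an assumption. `buildHreflangLinks` and `renderHreflangTag` are hypothetical names.

```typescript
// Field names follow the HREFLANG_LINK entity above.
interface HreflangLink {
  sourceUrl: string;
  targetUrl: string;
  hreflang: string; // language code or "x-default"
  rel: 'alternate';
}

const SITE_BASE = 'https://euparliamentmonitor.com';

function buildHreflangLinks(
  date: string,
  slug: string,
  sourceLang: string,
  languages: string[],
  defaultLang: string = 'en' // assumption: x-default points at the English page
): HreflangLink[] {
  const urlFor = (lang: string): string => `${SITE_BASE}/news/${date}-${slug}-${lang}.html`;
  const sourceUrl = urlFor(sourceLang);
  const links = languages.map(
    (lang): HreflangLink => ({ sourceUrl, targetUrl: urlFor(lang), hreflang: lang, rel: 'alternate' })
  );
  links.push({ sourceUrl, targetUrl: urlFor(defaultLang), hreflang: 'x-default', rel: 'alternate' });
  return links;
}

// Render one link as an HTML <link> element for the page head
function renderHreflangTag(link: HreflangLink): string {
  return `<link rel="${link.rel}" hreflang="${link.hreflang}" href="${link.targetUrl}" />`;
}
```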
flowchart TB
subgraph "European Parliament"
EP_API["European Parliament<br/>Open Data API"]
EP_PLENARY["Plenary Sessions<br/>API Endpoint"]
EP_COMMITTEE["Committee Meetings<br/>API Endpoint"]
EP_MEP["MEPs Data<br/>API Endpoint"]
EP_DOCUMENTS["Documents<br/>API Endpoint"]
end
subgraph "MCP Server Layer"
MCP_SERVER["European Parliament<br/>MCP Server"]
TOOL_GET_MEPS["Tool: get_meps"]
TOOL_PLENARY["Tool: get_plenary_sessions"]
TOOL_COMMITTEE["Tool: get_committee_info"]
TOOL_DOCUMENTS["Tool: search_documents"]
MCP_CACHE["LRU Response Cache<br/>TTL: 24h"]
end
subgraph "Generator Layer"
GENERATOR["News Generator<br/>TypeScript Script"]
MCP_CLIENT["MCP Client<br/>stdio connection"]
VALIDATOR["Schema Validator<br/>(Planned)"]
SANITIZER["HTML Sanitizer<br/>(Planned: DOMPurify)"]
end
subgraph "Template Layer"
TEMPLATE_ENGINE["Template Module<br/>src/templates/article-template.ts"]
TEMPLATE_WEEK["Article Template<br/>(TS-based)"]
TEMPLATE_COMMITTEE["Committee Reports Template<br/>(TS-based)"]
LANGUAGE_PROCESSOR["Multi-Language<br/>Processor"]
end
subgraph "Output Layer"
ARTICLE_HTML["Article HTML<br/>news/*.html"]
METADATA_JSON["Metadata JSON<br/>articles-metadata.json"]
INDEX_HTML["Index Pages<br/>index-*.html"]
SITEMAP_XML["sitemap.xml"]
end
subgraph "Deployment"
GIT_COMMIT["Git Commit<br/>& Push"]
GHA_DEPLOY["GitHub Actions<br/>Deploy Workflow"]
GH_PAGES["GitHub Pages<br/>Static Hosting"]
end
EP_API --> EP_PLENARY
EP_API --> EP_COMMITTEE
EP_API --> EP_MEP
EP_API --> EP_DOCUMENTS
EP_PLENARY -->|"HTTPS GET<br/>TLS 1.3"| MCP_SERVER
EP_COMMITTEE -->|"HTTPS GET<br/>TLS 1.3"| MCP_SERVER
EP_MEP -->|"HTTPS GET<br/>TLS 1.3"| MCP_SERVER
EP_DOCUMENTS -->|"HTTPS GET<br/>TLS 1.3"| MCP_SERVER
MCP_SERVER --> TOOL_GET_MEPS
MCP_SERVER --> TOOL_PLENARY
MCP_SERVER --> TOOL_COMMITTEE
MCP_SERVER --> TOOL_DOCUMENTS
TOOL_GET_MEPS --> MCP_CACHE
TOOL_PLENARY --> MCP_CACHE
TOOL_COMMITTEE --> MCP_CACHE
TOOL_DOCUMENTS --> MCP_CACHE
MCP_CACHE -->|"stdio protocol"| MCP_CLIENT
MCP_CLIENT --> GENERATOR
GENERATOR --> VALIDATOR
VALIDATOR -->|"Valid JSON"| SANITIZER
VALIDATOR -->|"Invalid"| GENERATOR
SANITIZER --> TEMPLATE_ENGINE
TEMPLATE_ENGINE --> TEMPLATE_WEEK
TEMPLATE_ENGINE --> TEMPLATE_COMMITTEE
TEMPLATE_WEEK --> LANGUAGE_PROCESSOR
TEMPLATE_COMMITTEE --> LANGUAGE_PROCESSOR
LANGUAGE_PROCESSOR -->|"14 languages"| ARTICLE_HTML
LANGUAGE_PROCESSOR --> METADATA_JSON
LANGUAGE_PROCESSOR --> INDEX_HTML
LANGUAGE_PROCESSOR --> SITEMAP_XML
ARTICLE_HTML --> GIT_COMMIT
METADATA_JSON --> GIT_COMMIT
INDEX_HTML --> GIT_COMMIT
SITEMAP_XML --> GIT_COMMIT
GIT_COMMIT -->|"Push triggers<br/>deploy workflow"| GHA_DEPLOY
GHA_DEPLOY -->|"Deploy to<br/>GitHub Pages"| GH_PAGES
style EP_API fill:#fff4e1
style MCP_SERVER fill:#e8f5e9
style GENERATOR fill:#e1f5ff
style TEMPLATE_ENGINE fill:#f3e5f5
style ARTICLE_HTML fill:#e3f2fd
style GH_PAGES fill:#e0f2f1
euparliamentmonitor/
├── news/                          # Generated articles
│   ├── 2026-01-01-week-ahead-en.html
│   ├── 2026-01-01-week-ahead-de.html
│   ├── 2026-01-01-week-ahead-fr.html
│   └── ...
│
├── articles-metadata.json         # News metadata database
│
├── index-{lang}.html              # Language-specific indexes
│   ├── index.html
│   ├── index-de.html
│   └── index-fr.html
│
├── sitemap.xml                    # SEO sitemap
├── robots.txt                     # Crawler rules
├── styles.css                     # Global styles
└── favicon.ico                    # Site icon
flowchart LR
subgraph "External Sources"
EP_API[European Parliament<br/>Open Data API]
end
subgraph "MCP Layer"
MCP[MCP Server]
CACHE[Response Cache]
end
subgraph "Generation Layer"
CLIENT[MCP Client]
VALIDATE[Data Validator]
SANITIZE[HTML Sanitizer]
end
subgraph "Template Layer"
TEMPLATE[Article Template]
META[Metadata Generator]
HTML[HTML Builder]
end
subgraph "Storage Layer"
FS[File System]
ARTICLE[Article HTML]
METADATA[Metadata JSON]
end
EP_API -->|JSON Response| MCP
MCP --> CACHE
CACHE --> CLIENT
CLIENT --> VALIDATE
VALIDATE --> SANITIZE
SANITIZE --> TEMPLATE
TEMPLATE --> META
TEMPLATE --> HTML
HTML --> ARTICLE
META --> METADATA
ARTICLE --> FS
METADATA --> FS
style EP_API fill:#fff4e1
style MCP fill:#e8f5e9
style CLIENT fill:#e8f5e9
style VALIDATE fill:#e1f5ff
style SANITIZE fill:#e1f5ff
style TEMPLATE fill:#e8f5e9
style ARTICLE fill:#f0f0f0
style METADATA fill:#f0f0f0
flowchart LR
subgraph "Input"
ARTICLES[Generated Articles<br/>news/*.html]
end
subgraph "Scanner"
SCAN[File Scanner]
PARSE[Metadata Parser]
end
subgraph "Processor"
GROUP[Group by Language]
SORT[Sort by Date]
FILTER[Filter by Type]
end
subgraph "Generator"
TEMPLATE[Index Template]
HTML[HTML Builder]
end
subgraph "Output"
INDEX[index-{lang}.html]
end
ARTICLES --> SCAN
SCAN --> PARSE
PARSE --> GROUP
GROUP --> SORT
SORT --> FILTER
FILTER --> TEMPLATE
TEMPLATE --> HTML
HTML --> INDEX
style ARTICLES fill:#f0f0f0
style SCAN fill:#e8f5e9
style PARSE fill:#e8f5e9
style GROUP fill:#e1f5ff
style SORT fill:#e1f5ff
style FILTER fill:#e1f5ff
style TEMPLATE fill:#e8f5e9
style INDEX fill:#f0f0f0
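The group, sort, and filter steps in the flowchart above can be sketched as one function over parsed metadata entries. Several details are assumptions: newest-first ordering, filtering before grouping (which yields the same result as filtering last), and English mapping to `index.html`, as the directory listing suggests. `buildIndexes` and `indexFilename` are hypothetical names.

```typescript
// Minimal entry shape for index generation; mirrors the metadata fields used above.
interface ParsedArticle {
  filename: string;
  date: string; // ISO date, so string comparison orders entries chronologically
  lang: string;
  title: string;
  type?: string;
}

// Group by language, optionally filter by type, and sort each group newest first
function buildIndexes(entries: ParsedArticle[], typeFilter?: string): Record<string, ParsedArticle[]> {
  const byLang: Record<string, ParsedArticle[]> = {};
  for (const entry of entries) {
    if (typeFilter !== undefined && entry.type !== typeFilter) continue;
    const bucket = byLang[entry.lang] ?? [];
    bucket.push(entry);
    byLang[entry.lang] = bucket;
  }
  for (const lang of Object.keys(byLang)) {
    const bucket = byLang[lang];
    if (bucket !== undefined) {
      bucket.sort((a, b) => (a.date < b.date ? 1 : a.date > b.date ? -1 : 0));
    }
  }
  return byLang;
}

// Assumption: the English index is index.html, other languages use index-{lang}.html
function indexFilename(lang: string): string {
  return lang === 'en' ? 'index.html' : `index-${lang}.html`;
}
```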
| Data Type | Classification | Storage | Encryption |
|---|---|---|---|
| News Articles | Public | Git repository | At-rest (GitHub) |
| Metadata | Public | Git repository | At-rest (GitHub) |
| EP API Responses | Public | Ephemeral (runtime) | In-transit (TLS 1.3) |
| Generation Logs | Public | GitHub Actions | At-rest (GitHub) |
All data in EU Parliament Monitor is classified according to CLASSIFICATION.md and the Hack23 ISMS Classification Policy:
| Data Type | Classification | Confidentiality | Integrity | Availability | Rationale |
|---|---|---|---|---|---|
| News Articles | Public (Level 1) | Public | Medium | Medium | Derived from public EP data, accuracy critical for democratic transparency |
| Generation Metadata | Public (Level 1) | Public | Medium | Low | Technical provenance data, publicly accessible |
| EP API Responses | Public (Level 1) | Public | Medium | Medium | Public European Parliament data, temporary runtime storage |
| MCP Tool Responses | Public (Level 1) | Public | Medium | Medium | Cached EP data, integrity critical |
| GitHub Actions Logs | Public (Level 1) | Public | Low | Low | Actions logs are visible to anyone with read access to this public repo and contain technical build details but no secrets |
PII Status: No User/Customer PII Collected
EU Parliament Monitor processes publicly available European Parliament data only. MEP names, affiliations, and official contact details are publicly available personal data about public officials in their official capacity.
Note: Publicly available personal data about public officials (MEP names, affiliations, official emails) processed in their official capacity is handled under GDPR Article 6 lawful basis (e.g., Art. 6(1)(e) public task and/or Art. 6(1)(f) legitimate interests). No special category data under Article 9 is processed. No user or private personal data is collected.
GDPR Article 5 Alignment:
| GDPR Principle | Implementation | Status |
|---|---|---|
| Art. 5(1)(a) - Lawfulness | Processing of publicly available personal data of MEPs from official EP sources under GDPR Art. 6 lawful basis (public task/legitimate interests); no user/customer personal data processed | Compliant |
| Art. 5(1)(b) - Purpose Limitation | Data used only for news generation about parliamentary activities | Compliant |
| Art. 5(1)(c) - Data Minimization | Only necessary public EP data collected, no excessive data | Compliant |
| Art. 5(1)(d) - Accuracy | EP data used as-is from official sources; planned schema validation and HTML sanitization to ensure accurate representation | Compliant |
| Art. 5(1)(e) - Storage Limitation | Articles immutable, no unnecessary retention, git history for audit | Compliant |
| Art. 5(1)(f) - Integrity & Confidentiality | TLS 1.3 encryption, GitHub encryption at rest, planned SHA-256 hashes | Compliant |
Control Statement: Information shall be classified in terms of legal requirements, value, criticality, and sensitivity to unauthorized disclosure or modification.
Implementation:
Classification Labels:
Handling Requirements:
Review Process:
Evidence:
| Control | Implementation | Purpose |
|---|---|---|
| TLS 1.3 Encryption | All EP API calls use HTTPS | Protect data in transit |
| At-Rest Encryption | GitHub repository encryption | Protect stored data |
| Schema Validation | Planned: JSON Schema validation for EP API responses | Prevent malformed data |
| HTML Sanitization | Planned: DOMPurify-based sanitization for rendered HTML | Prevent XSS attacks |
| Input Validation | Planned: Whitelist-based validation for all configurable inputs | Prevent injection attacks |
| SHA-256 Hashing | Planned: SHA-256 checksums for source data integrity | Detect data tampering |
| Git Audit Trail | Complete commit history | Track all changes |
| Immutable Articles | Articles never modified post-generation | Preserve integrity |
The EU Parliament Monitor data model has evolved through multiple phases to support enhanced functionality and multi-language content:
timeline
title Data Model Evolution Timeline
section v1.0 - Foundation (2026-Q1)
Basic Article Schema : Simple HTML generation
: Single language (English)
: Manual EP data entry
File Storage : Git repository
: Static HTML files
No Metadata : No generation tracking
section v1.1 - Multi-Language (2026-Q1)
14 Languages : en, sv, da, no, fi, de, fr, es, nl, ar, he, ja, ko, zh
: Language-specific index pages
: Hreflang SEO optimization
MCP Integration : European Parliament MCP Server
: Automated data fetching
: Tool-based API access
Generation Metadata : Provenance tracking
: Source data hashing
: Workflow run IDs
section v1.2 - Current (2026-Q1)
Enhanced ER Diagrams : MEP entity model
: MCP integration model
: Multi-language content model
: Sitemap & SEO model
ISMS Alignment : Data classification documented
: GDPR compliance verified
: ISO 27001 controls mapped
Data Flow : Comprehensive data flow diagrams
: European Parliament to GitHub Pages
section v2.0 - Future (2026-Q3)
Real-Time Updates : WebSocket data streams
: Live plenary session updates
: Instant breaking news
Enhanced Analytics : Article performance metrics
: Reader engagement tracking
: SEO optimization insights
AI-Driven Content : LLM-based content generation
: Automated fact-checking
: Sentiment analysis
Database Backend : PostgreSQL for metadata
: Elasticsearch for search
: Redis for caching
| Version | Release Date | Key Changes | Diagrams Added |
|---|---|---|---|
| v1.0 | 2026-02-01 | Initial release, basic article generation | 1 (Main ER diagram) |
| v1.1 | 2026-02-15 | Multi-language support, MCP integration | 2 (Article & Index generation flows) |
| v1.2 | 2026-02-20 | Enhanced diagrams, ISMS alignment, data security | 4 (MEP, MCP, Multi-language, Sitemap models) + 1 (EP data flow) |
| v2.0 | 2026-Q3 (Planned) | Real-time updates, database backend | TBD (Real-time state diagrams, DB schema) |
No breaking changes to date; all schema changes have been backward-compatible.
Planned enhancement: responses from the European Parliament API will be validated against JSON Schemas before processing:
MEP Data Schema:
{
"$schema": "http://json-schema.org/draft-07/schema#",
"type": "object",
"required": ["id", "name", "country", "politicalGroup"],
"properties": {
"id": { "type": "string", "pattern": "^MEP-[0-9]+$" },
"name": { "type": "string", "minLength": 1, "maxLength": 200 },
"country": { "type": "string", "pattern": "^[A-Z]{2}$" },
"party": { "type": "string", "maxLength": 200 },
"politicalGroup": { "type": "string", "enum": ["PPE", "S&D", "Renew", "Greens/EFA", "ID", "ECR", "The Left", "NI"] },
"committees": { "type": "array", "items": { "type": "string" } },
"email": { "type": "string", "format": "email" },
"photoUrl": { "type": "string", "format": "uri", "pattern": "^https://" }
}
}
Plenary Session Schema:
{
"$schema": "http://json-schema.org/draft-07/schema#",
"type": "object",
"required": ["session_id", "session_date", "title"],
"properties": {
"session_id": { "type": "string", "pattern": "^PS-[0-9]{4}-[0-9]{2}-[0-9]{2}$" },
"session_date": { "type": "string", "format": "date" },
"title": { "type": "string", "minLength": 5, "maxLength": 500 },
"location": { "type": "string", "enum": ["Strasbourg", "Brussels"] },
"agenda": { "type": "array", "items": { "type": "object" } },
"status": { "type": "string", "enum": ["scheduled", "ongoing", "completed", "cancelled"] }
}
}
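To show what enforcing the plenary session schema involves, here is a dependency-free stand-in that checks the same rules by hand. It is a sketch, not the planned implementation: a real validator would more likely load the JSON Schema into a schema validation library, and `validatePlenarySession` is a hypothetical name.

```typescript
// Hand-written checks mirroring the Plenary Session schema above (illustration only).
const SESSION_ID_PATTERN = /^PS-[0-9]{4}-[0-9]{2}-[0-9]{2}$/;
const SESSION_LOCATIONS = ['Strasbourg', 'Brussels'];
const SESSION_STATUSES = ['scheduled', 'ongoing', 'completed', 'cancelled'];

function isPlainObject(value: unknown): boolean {
  return typeof value === 'object' && value !== null && !Array.isArray(value);
}

// Returns a list of human-readable errors; an empty list means the input passed
function validatePlenarySession(input: unknown): string[] {
  if (!isPlainObject(input)) return ['session must be an object'];
  const session = input as Record<string, unknown>;
  const errors: string[] = [];

  const id = session['session_id'];
  if (typeof id !== 'string' || !SESSION_ID_PATTERN.test(id)) {
    errors.push('session_id must match PS-YYYY-MM-DD');
  }
  // Shape check only; it does not reject impossible dates such as 2026-13-45
  const date = session['session_date'];
  if (typeof date !== 'string' || !/^\d{4}-\d{2}-\d{2}$/.test(date)) {
    errors.push('session_date must be an ISO date');
  }
  const title = session['title'];
  if (typeof title !== 'string' || title.length < 5 || title.length > 500) {
    errors.push('title must be 5 to 500 characters');
  }
  // Optional fields are validated only when present
  const location = session['location'];
  if (location !== undefined && (typeof location !== 'string' || SESSION_LOCATIONS.indexOf(location) === -1)) {
    errors.push('location must be Strasbourg or Brussels');
  }
  const agenda = session['agenda'];
  if (agenda !== undefined && !(Array.isArray(agenda) && agenda.every(isPlainObject))) {
    errors.push('agenda must be an array of objects');
  }
  const status = session['status'];
  if (status !== undefined && (typeof status !== 'string' || SESSION_STATUSES.indexOf(status) === -1)) {
    errors.push('status is not a recognised value');
  }
  return errors;
}
```

Returning every error at once, rather than throwing on the first, makes it easier to log a complete picture of a malformed response.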
Validation Process (Planned Enhancements):
Generated Article Validation (Planned Enhancements):
| Field | Validation Rule | Error Handling |
|---|---|---|
| `slug` | Alphanumeric + hyphens, max 100 chars | Planned: generation fails and alert is sent |
| `title` | Min 10 chars, max 200 chars | Planned: generation retries with adjusted prompt |
| `subtitle` | Min 20 chars, max 500 chars | Planned: optional, can be empty |
| `content_html` | Valid HTML5, no `<script>` tags | Planned: HTML sanitization with DOMPurify |
| `language` | ISO 639-1 code, must be in supported list | Planned: generation fails for that language |
| `keywords` | Array of strings, max 10 keywords | Planned: truncated to 10 if exceeded |
| `read_time` | Integer >= 1, <= 60 minutes | Planned: calculated from word count |
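A few of the planned field rules can be sketched as pure functions. The limits come from the validation rules listed here; the reading speed of 200 words per minute is an assumption chosen for illustration, and all function names are hypothetical.

```typescript
// Limits taken from the planned validation rules; function names are hypothetical.
const MAX_KEYWORDS = 10;

function isValidTitle(title: string): boolean {
  return title.length >= 10 && title.length <= 200;
}

// The subtitle is optional: empty is accepted, otherwise 20 to 500 characters
function isValidSubtitle(subtitle: string): boolean {
  return subtitle.length === 0 || (subtitle.length >= 20 && subtitle.length <= 500);
}

// Keep only the first 10 keywords instead of failing
function normalizeKeywords(keywords: string[]): string[] {
  return keywords.slice(0, MAX_KEYWORDS);
}

// Read time derived from word count and clamped to the 1..60 minute range.
// wordsPerMinute = 200 is an assumed default, not a documented value.
function estimateReadTime(wordCount: number, wordsPerMinute: number = 200): number {
  const minutes = Math.ceil(wordCount / wordsPerMinute);
  return Math.min(60, Math.max(1, minutes));
}
```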
Note: HTML sanitization via DOMPurify is a planned security enhancement. The current generator (src/templates/article-template.ts) produces HTML from EP API data. The configuration below documents the intended future implementation.
Planned DOMPurify Configuration:
import DOMPurify from 'dompurify';

// In Node.js, DOMPurify needs a DOM implementation (for example jsdom) to run.
const clean = DOMPurify.sanitize(dirtyHtml, {
  ALLOWED_TAGS: ['p', 'h1', 'h2', 'h3', 'h4', 'h5', 'h6', 'ul', 'ol', 'li', 'a', 'strong', 'em', 'blockquote', 'code', 'pre'],
  ALLOWED_ATTR: ['href', 'title', 'class', 'id'],
  // Restrict URI attributes such as href to European Parliament hosts
  ALLOWED_URI_REGEXP: /^https?:\/\/(data\.europarl\.europa\.eu|europarl\.europa\.eu|www\.europarl\.europa\.eu)\/.*/,
  ALLOW_DATA_ATTR: false,
  KEEP_CONTENT: true,
  RETURN_DOM: false,
  RETURN_DOM_FRAGMENT: false,
});
Sanitization Rules:
- Allowed attributes: href, title, class, id
- <script>, <style>, onclick, etc. removed

Policy: Once generated, articles are never modified.
Note: Source data hashing is a planned integrity enhancement. The metadata structure below shows the intended future implementation; SHA-256 hashing of EP/MCP responses is not yet implemented in the current generator code.
Planned Source Data Hashing Pattern:
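import { createHash } from 'node:crypto';

// Hash the serialized EP API response; note that key order affects the digest
const sourceHash = createHash('sha256')
  .update(JSON.stringify(epApiResponse))
  .digest('hex');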
Metadata Storage:
{
"sources": [
{
"type": "plenary_session",
"id": "PS-2026-03-01",
"data_hash": "a1b2c3d4e5f6...",
"timestamp": "2026-03-01T06:00:00Z"
}
]
}
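The hashing pattern above serializes with `JSON.stringify`, whose output depends on property order, so two responses with identical content but different key order would produce different hashes. One way to avoid that is to serialize with sorted keys before hashing. The sketch below is a simplified canonical form, not a full implementation of a canonicalization standard, and `canonicalize` is a hypothetical helper.

```typescript
// Simplified canonical JSON: object keys are sorted recursively so that equal
// content always serializes to the same string. Intended for JSON-parsed data.
function canonicalize(value: unknown): string {
  if (value === undefined) return 'null'; // matches JSON.stringify inside arrays
  if (value === null || typeof value !== 'object') return JSON.stringify(value);
  if (Array.isArray(value)) {
    return `[${value.map((item) => canonicalize(item)).join(',')}]`;
  }
  const record = value as Record<string, unknown>;
  const members = Object.keys(record)
    .sort()
    .filter((key) => record[key] !== undefined) // undefined members are omitted
    .map((key) => `${JSON.stringify(key)}:${canonicalize(record[key])}`);
  return `{${members.join(',')}}`;
}
```

The string returned by `canonicalize(epApiResponse)` could then be passed to the SHA-256 pattern in place of `JSON.stringify(epApiResponse)`.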
Integrity Verification (future):
Every change tracked:
- Author (github-actions[bot])

Example Metadata:
{
"generator": {
"version": "1.0.0",
"commit_sha": "abc123def456...",
"workflow_run_id": "12345678",
"workflow_url": "https://github.com/Hack23/euparliamentmonitor/actions/runs/12345678"
}
}
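Metadata of this shape could be assembled from the default environment variables that GitHub Actions exposes to every workflow run (`GITHUB_SHA`, `GITHUB_RUN_ID`, `GITHUB_REPOSITORY`, `GITHUB_SERVER_URL`). Whether the generator actually reads them this way is an assumption, and `buildGeneratorMetadata` is a hypothetical helper. The environment is passed in as a plain record so the function stays testable.

```typescript
// Output field names follow the example metadata above; the builder is hypothetical.
interface GeneratorMetadata {
  version: string;
  commit_sha: string;
  workflow_run_id: string;
  workflow_url: string;
}

function buildGeneratorMetadata(
  version: string,
  env: Record<string, string | undefined>
): GeneratorMetadata {
  const sha = env['GITHUB_SHA'];
  const runId = env['GITHUB_RUN_ID'];
  const repository = env['GITHUB_REPOSITORY']; // "owner/repo"
  const serverUrl = env['GITHUB_SERVER_URL'] ?? 'https://github.com';
  if (!sha || !runId || !repository) {
    throw new Error('Missing GitHub Actions environment variables');
  }
  return {
    version,
    commit_sha: sha,
    workflow_run_id: runId,
    workflow_url: `${serverUrl}/${repository}/actions/runs/${runId}`,
  };
}
```

In a workflow step this would typically be called with the process environment as the second argument.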
Audit Capabilities:
- git log shows complete history
- git blame identifies when each line was added
- git diff shows exact changes between versions

Document Status: Active
Next Review: 2026-05-24
Owner: Development Team, Hack23 AB