10th Parliament · 154 sittings on record · 30,475 speeches · latest 10 June 2026

Methodology

Data & Sources — how Politick works

This page is generated from the pipeline's own metadata — coverage, models and freshness — so it cannot drift from reality. Politick is built from machine OCR, translation and AI tagging of public documents; it will have errors, and they are shown here, not hidden.

The pipeline

  1. Collect: parliament.lk Hansard PDFs, polled every 6h
  2. OCR: Tesseract sin + tam + eng, 300 DPI
  3. Translate: GPT-5 → unified English Markdown
  4. Identify: speaker → MP roster matching
  5. Enrich: AI summaries + topic tags (labelled)
  6. Publish: database · search · this site
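
Step 4 (speaker → MP roster matching) can be illustrated with a minimal sketch. This is not Politick's actual matcher: the roster entries are placeholder names, the role list, the honorific list and the 0.85 similarity cutoff are all assumptions, and fuzzy matching via Python's `difflib` is simply one plausible way to tolerate small OCR errors in names.

```python
import difflib
import re
from typing import Optional

# Hypothetical roster with placeholder names (not real MPs).
ROSTER = {"mp-001": "A. B. Perera", "mp-002": "S. Kumar"}

# Assumed examples of role-only attributions: an office, not a person.
ROLE_ONLY = {"speaker", "deputy speaker", "presiding member"}


def normalise(label: str) -> str:
    """Lowercase, drop honorifics and punctuation, collapse whitespace."""
    label = label.lower()
    label = re.sub(r"\b(the|hon|mr|mrs|ms|dr)\b\.?", " ", label)
    label = re.sub(r"[^a-z\s]", " ", label)
    return " ".join(label.split())


def match_speaker(label: str, cutoff: float = 0.85) -> Optional[str]:
    """Return a roster id, or None for role-only or unrecognised labels."""
    name = normalise(label)
    if not name or name in ROLE_ONLY:
        return None
    by_name = {normalise(full): mp_id for mp_id, full in ROSTER.items()}
    # get_close_matches tolerates small spelling differences, e.g. OCR slips.
    hits = difflib.get_close_matches(name, list(by_name), n=1, cutoff=cutoff)
    return by_name[hits[0]] if hits else None


print(match_speaker("The Hon. A. B. Perera"))  # mp-001
print(match_speaker("The Speaker"))            # None
```

Returning `None` for role-only labels, rather than forcing a best guess, is what keeps procedural attributions out of the matched share.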

Coverage

154 sittings
30,475 speeches
2,052 debates
21,786 AI summaries
36,939 topic tags
225 MPs

10th Parliament · 21 November 2024 to 10 June 2026. 2 sitting dates published upstream are not yet transcribed. Dead-letter queue: 0 open of 0 ever recorded. Pipeline runs: 2 succeeded, most recent on 22 June 2026.
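
The "published upstream but not yet transcribed" count is, in effect, a set difference between two lists of sitting dates. A minimal sketch, assuming both lists are available as Python `date` values; the function name and the example dates are hypothetical and are not the actual missing sittings.

```python
from datetime import date


def untranscribed_dates(upstream, transcribed):
    """Sitting dates listed upstream that have no transcript yet, oldest first."""
    return sorted(set(upstream) - set(transcribed))


# Hypothetical example dates, for illustration only.
upstream = [date(2026, 1, 6), date(2026, 1, 7), date(2026, 1, 8)]
transcribed = [date(2026, 1, 6)]

missing = untranscribed_dates(upstream, transcribed)
print(len(missing), [d.isoformat() for d in missing])
# prints: 2 ['2026-01-07', '2026-01-08']
```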

Stage | Done | Model

Field coverage

The share of speeches that carry each field, generated from the data and reported as-is. A 0% means the field isn't captured yet.

  • Speeches matched to an MP: 64.3% (the rest are role-only / procedural attributions)
  • Speeches with an AI summary: 71.5%
  • Speeches with a topic tag: 71.5%
  • Speeches with page/column anchors: 0% (known gap: not captured yet; the source PDF is the citable location)
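
The percentages above can be reproduced from the counts published on this page. A minimal sketch: the function name is illustrative, the unmatched count is the 10,872 figure listed under Known gaps, and treating the 21,786 AI summaries as one per speech is an assumption.

```python
def coverage_pct(count: int, total: int) -> float:
    """Share of `total` covered by `count`, as a percentage to one decimal."""
    if total == 0:
        return 0.0
    return round(100 * count / total, 1)


TOTAL_SPEECHES = 30_475
UNMATCHED_SPEECHES = 10_872   # role-only / procedural, from Known gaps
AI_SUMMARIES = 21_786         # assumed to be one summary per speech

matched = TOTAL_SPEECHES - UNMATCHED_SPEECHES  # 19,603

print(coverage_pct(matched, TOTAL_SPEECHES))       # 64.3
print(coverage_pct(AI_SUMMARIES, TOTAL_SPEECHES))  # 71.5
print(coverage_pct(0, TOTAL_SPEECHES))             # 0.0
```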

AI use

Beyond the GPT-5 translation stage, AI does disambiguation, segmentation, summarisation and topic tagging, and nothing else: no scoring, no prediction, no interpretation. Every AI output is labelled in the UI. The models behind each stage:

Stage | Model

Accuracy & verification

Stage | Benchmark | Result
OCR | vs professional ground truth (validation) | ~1.8% char error
Translation | vs professional translator | benchmark pending
Speaker matching | human-reviewed sample | benchmark pending

The OCR figure comes from the feasibility validation. The translation and speaker-matching benchmarks are not yet published, so they are shown as pending rather than estimated.
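
For readers unfamiliar with the metric, a character error rate is conventionally the edit distance between the OCR output and the ground truth, divided by the length of the ground truth. The sketch below shows that conventional definition; this page does not state exactly how the ~1.8% figure was computed, so treat the formula as an assumption. Counting Python code points is also an assumption: Sinhala and Tamil text might reasonably be scored per grapheme cluster instead.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def char_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance divided by reference length (assumed CER definition)."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)


# A classic OCR confusion: "m" read as "rn" costs two edits over ten characters.
print(char_error_rate("parliament", "parliarnent"))  # 0.2
```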

Dataset register

Dataset | Source | Coverage | Cadence | Last updated | Status
Hansard (speeches) | parliament.lk | 154 sittings · 2024-11-21 to 2026-06-10 | 6h | 20 June 2026 | live
MP roster + profiles | parliament.lk | 225 members | daily | 22 June 2026 | live
Attendance | parliament.lk house-attendance | 8 sitting-days recorded | per sitting | 17 June 2026 | live
Questions · Bills · Votes · Gazettes · Cabinet · Budget | | | | | planned

Known gaps

  • OCR and translation are machine processes. Only OCR has a measured error rate so far (~1.8% character error); the translation benchmark is still pending.
  • Page/column anchors are not yet captured — the source PDF is the citable location.
  • 10,872 speeches are role-only / procedural and not matched to an MP.
  • Original Sinhala/Tamil text is not yet displayed alongside the English.
  • Votes/divisions, questions and committee reports are not yet extracted.

Corrections

Anyone can suggest a correction or comment on any record. Submissions are public, reviewed against the source, and logged.

0 open · 0 under review · 0 resolved