Political Platforms Costa Rica 2026

A comprehensive platform for analyzing and comparing political party platforms for Costa Rica's 2026 elections.

Project Intent

This project aims to make political information more accessible to Costa Rican voters by:

Automated Analysis: Using AI to extract and structure information from lengthy government plan PDFs (100+ pages each)
Easy Comparison: Providing a clean, side-by-side comparison of up to 3 parties across 13 key policy categories
Transparency: Making all data and methodologies open source
Accessibility: Delivering a fast, mobile-friendly website with dark mode support
Non-partisan: Presenting factual information without editorial bias

The system processes official TSE documents through an LLM pipeline to generate structured summaries, making it easier for voters to understand where parties stand on issues that matter to them.

Project Structure

Elecciones2026/
├── data/              # Shared data (SQLite DB and PDFs)
│   ├── database.db    # Database with complete analyses
│   └── partidos/      # Government plan PDFs
├── pipeline/          # Python analysis pipeline
│   ├── src/           # Pipeline source code
│   ├── scripts/       # Download and processing scripts
│   ├── config/        # Category configuration
│   └── main.py        # Main CLI
└── web/               # Next.js application
    ├── app/           # Application pages
    ├── components/    # Reusable components
    ├── lib/           # Utilities and DB connection
    └── package.json   # Website dependencies

Analysis Pipeline (Python)

Installation

cd pipeline
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt

Configuration

Create a .env file with your OpenAI API key:

OPENAI_API_KEY=your-api-key-here

Usage

# Initialize database
python main.py init

# Process all parties
python main.py process

# View processing status
python main.py status

# Show specific party platform
python main.py show PLN

# Add new category (backfill)
python main.py backfill category_name

# Generate embeddings for semantic search (run after processing)
python scripts/generate_embeddings.py

Current Status

20 political parties downloaded
13 categories defined
260 complete analyses (20 parties × 13 categories)
~2,000 document pages extracted and embedded
Vector embeddings generated for semantic search
AI chat with RAG-based question answering

Website (Next.js)

Installation

cd web
bun install

Configuration

Create a .env file in the web/ directory with the following variables:

# Required for AI chat feature
OPENAI_API_KEY=your-openai-api-key

# Optional: PostHog analytics and feature flags
NEXT_PUBLIC_POSTHOG_KEY=your-posthog-key
NEXT_PUBLIC_POSTHOG_HOST=https://us.i.posthog.com

# Optional: Feature flag to enable/disable chat (default: enabled in dev)
NEXT_PUBLIC_CHAT_ENABLED=true

Note: Without OPENAI_API_KEY, the chat feature will not work, but all other features (party listings, comparisons, etc.) will function normally.

Development

# Start development server
bun run dev

# Build for production
bun run build

# Run production build
bun run start

The application will be available at http://localhost:3000 (or next available port).

Features

Home Page

Grid of cards for all political parties
Logo placeholders with party initials and colors
Links to full platform and comparison
Dark mode support with theme toggle

Party Detail Page

Complete party platform view
Accordion with 13 categories
Summary, key proposals, ideology position, and budget
Button to compare with other parties

Comparison Page

Select up to 3 parties
Side-by-side comparison of all categories
Filter by specific category
URL-based persistent state

AI-Powered Chat (New!)

Semantic Search with RAG: Ask natural language questions about party platforms
Multi-Party Support: Query specific parties or compare across all parties
Vector Search: Powered by OpenAI embeddings and sqlite-vec for accurate context retrieval
Streaming Responses: Real-time AI answers using GPT-4
Source Attribution: Responses include page numbers from official documents
Party Selector: Choose one or more parties to focus your questions
Example Questions: Quick-start prompts for common queries
Responsive UI: Full-screen mobile support with sliding sidebar

Technologies

Frontend & Build

Next.js 14 - React framework with App Router
Bun - Package manager and runtime
Tailwind CSS - Utility-first CSS
TypeScript - Static typing
Biome - Linter and formatter

Data & AI

better-sqlite3 - SQLite database access
sqlite-vec - Vector similarity search extension
OpenAI API - GPT-4 for chat, text-embedding-3-small for semantic search
Vercel AI SDK - Streaming chat responses
PostHog - Feature flags and analytics

UI Components

next-themes - Dark mode support
lucide-react - Icon library
react-markdown - Markdown rendering in chat

Architecture & Design Decisions

Three-Stage Pipeline

The project uses a multi-stage approach to process and serve political data:

Analysis Pipeline (Python):
- Downloads PDFs from TSE
- Extracts text from PDFs (PyMuPDF + OCR fallback)
- Processes documents through OpenAI's GPT-4
- Extracts structured data (summaries, proposals, ideology, budget)
- Stores results in SQLite database
- Run once, can backfill new categories
Embedding Generation (Python):
- Chunks document text using adaptive strategy (1500-3500 chars)
- Generates vector embeddings using OpenAI text-embedding-3-small
- Stores embeddings in SQLite with sqlite-vec extension
- Enables semantic search across all party documents
- Run once per document set
Web Application (Next.js):
- Static Pages: Reads from SQLite at build time, generates static HTML
- Dynamic Chat: Server-side API routes for real-time AI interactions
- Hybrid Rendering: Fast static pages + dynamic AI features
- Vector Search: Retrieval-Augmented Generation (RAG) for accurate answers

This separation allows expensive AI processing to happen once offline, while the website combines instant static pages with dynamic AI-powered features.

Hybrid Architecture: Static + Dynamic

The website combines static generation with dynamic features:

Static Pages (Party listings, comparisons, detail views):

Pre-rendered HTML loads instantly
All party data embedded at build time
Client-side interactivity (filtering, theme switching)
Can be deployed to static CDN

Dynamic Features (AI Chat):

Server-side API routes for real-time interactions
Streaming responses from OpenAI
Vector similarity search across documents
Requires Node.js runtime (Vercel, Railway, etc.)

Benefits:

Performance: Static pages load instantly, chat responds in real-time
Cost-Effective: Most traffic hits static pages, only chat uses compute
Reliability: Static pages always available, even if chat is down
Scalability: CDN handles static traffic, serverless handles chat bursts
User Experience: Best of both worlds - speed + intelligent features

Docker Deployment

The project includes production-ready Docker infrastructure:

Multi-stage build: Bun for dependencies → Node.js for build → nginx for serving
Security: Non-root user, minimal attack surface
Performance: Optimized caching, gzip compression, aggressive asset caching
Health checks: Built-in monitoring
CI/CD: GitHub Actions workflow for automated builds and deployment to GHCR

# Build and run locally
docker build -t elecciones2026 .
docker run -d -p 8080:8080 elecciones2026

# Or use GitHub Actions to deploy automatically on push to main

Configuration

Theme & Dark Mode

The site supports both light and dark themes with automatic system preference detection:

Default: Light theme (with system preference detection enabled)
Theme Toggle: Click the sun/moon icon in the header to switch themes
Powered by: next-themes for seamless theme switching
Persistence: Theme preference is saved in localStorage
Configuration:
- Tailwind config: tailwind.config.ts (darkMode: 'class')
- Base styles: app/globals.css
- Theme provider: components/ThemeProvider.tsx
- Toggle button: components/ThemeToggle.tsx

All components support dark mode with Tailwind's dark: variant classes.

AI Chat Feature

The chat feature uses Retrieval-Augmented Generation (RAG) for accurate answers:

How it works:

User asks a question - e.g., "What does PLN propose for education?"
Query embedding - Question is converted to vector using OpenAI embeddings
Semantic search - sqlite-vec finds most relevant chunks from party documents
Context injection - Top 10 relevant chunks are added to GPT-4 prompt with page numbers
Streaming response - AI generates answer based only on retrieved context
Source attribution - Response includes page numbers for verification

Key Features:

Accurate: Only answers based on actual document content
Transparent: Includes page numbers from source PDFs
Fast: Vector search typically returns results in <100ms
Multi-party: Can search across specific parties or all parties
Context-aware: Maintains conversation history for follow-up questions

Configuration:

Controlled by PostHog feature flag chat-sidebar
Can be enabled/disabled per environment
Gracefully degrades if OpenAI API key is missing
All chat data is ephemeral (not stored in database)

Party Colors

Party colors are configured in lib/party-colors.ts. Currently uses placeholders with different Tailwind colors.

Static Site Generation (SSG)

Pages are statically generated at build time
To update site after running pipeline: bun run build
Static output is in the out/ folder

Useful Commands

Linting and Formatting

cd web
bun run lint      # Check for errors
bun run format    # Format code
bun run check     # Lint and auto-format

Database

# View parties in DB
sqlite3 data/database.db "SELECT * FROM parties;"

# View statistics
sqlite3 data/database.db "SELECT COUNT(*) FROM party_positions;"

# Export data
sqlite3 data/database.db ".mode csv" ".output export.csv" "SELECT * FROM party_positions;"

Future Ideas & Enhancements

Completed ✅

Dark mode support with theme persistence
Party flag images
Docker deployment infrastructure
CI/CD pipeline with GitHub Actions
Responsive mobile design
AI-powered chat with semantic search (RAG)
Vector embeddings for document search
Multi-party comparison in chat
Real-time streaming responses
PostHog analytics and feature flags

In Progress 🚧

Performance monitoring and optimization
Error tracking and logging
Chat conversation analytics

Potential Enhancements

Content & Features:

Advanced Filters: Filter parties by ideology on home page
Social Sharing: Generate shareable comparison images
PDF Export: Download comparison as PDF
Historical Data: Compare with 2022 platforms
Candidate Profiles: Add presidential candidate information
Chat Improvements: Save conversations, share chat links, export Q&A

Technical:

SEO: Enhanced metadata, Open Graph tags, structured data
PWA: Progressive Web App with offline support (cache static pages)
Internationalization: English translation
Testing: E2E tests with Playwright
Rate Limiting: Protect chat API from abuse
Caching: Redis cache for frequently asked questions

Analysis Pipeline:

More Categories: Environmental policy, foreign policy, etc.
Improved Embeddings: Fine-tune chunking strategy based on chat usage
Fact Checking: Cross-reference proposals with budget data
Timeline Extraction: Extract implementation timelines
Cost Analysis: Detailed budget breakdown per proposal
Multi-language Support: Generate embeddings for English translations

Data Source

Data comes from the Tribunal Supremo de Elecciones (TSE) of Costa Rica:

URL: https://www.tse.go.cr/2026/planesgobierno.html
Last updated: November 2025

Data Access & Structure

All data is publicly available for exploration, analysis, and building upon:

Database Structure

The SQLite database (data/database.db) contains:

Core Tables:

parties - Basic party information (name, abbreviation, colors)
categories - Policy categories (economy, health, education, etc.)
party_positions - Analysis results linking parties to categories with:
- summary - Brief overview of party's position
- key_proposals - JSON array of specific proposals
- ideology_position - Ideological classification
- budget_priority - Budget allocation level

Document & Embedding Tables:

documents - PDF metadata for each party
document_text - Extracted text from each PDF page
document_embeddings - Vector embeddings for semantic search
- Uses sqlite-vec virtual table for vector similarity
- 1536-dimensional embeddings from text-embedding-3-small
- Adaptive chunking strategy (1500-3500 chars per chunk)
- Cosine distance for similarity scoring

Search Indexes:

party_positions_fts - Full-text search index (FTS5) for party positions
vec_document_embeddings - Vector index for semantic similarity

Raw PDFs:

Original government plans: data/partidos/*.pdf
Direct downloads from TSE website

Accessing the Data

# Query all party positions
sqlite3 data/database.db "SELECT * FROM party_positions;"

# Export to CSV
sqlite3 data/database.db <<EOF
.mode csv
.headers on
.output party_data.csv
SELECT
  p.name as party_name,
  c.name as category,
  pp.summary,
  pp.ideology_position,
  pp.budget_priority
FROM party_positions pp
JOIN parties p ON pp.party_id = p.id
JOIN categories c ON pp.category_id = c.id;
EOF

Use Cases

The structured data enables:

Academic research on political platforms
Data journalism and visualization
Comparative policy analysis
Machine learning on political text
Alternative interfaces and tools

All data processing code is open source in the pipeline/ directory.

Contributing

Contributions are welcome! Here's how you can help:

Adding New Categories

# 1. Add category to config
vim pipeline/config/categories.yaml

# 2. Backfill analysis for all parties
cd pipeline
python main.py backfill new_category_name

# 3. Rebuild website
cd ../web
bun run build

Improving Analysis

Enhance prompts in pipeline/src/analyzer.py
Add validation rules
Improve data extraction

Website Enhancements

Fix bugs or add features
Improve mobile experience
Add visualizations
Enhance accessibility

Reporting Issues

Found incorrect data or analysis? Please open an issue with:

Party name and category
What's incorrect
Link to source in PDF (page number)

Pull requests are appreciated but please open an issue first to discuss major changes.

License

Public data from TSE Costa Rica. Project code is open source.

Name		Name	Last commit message	Last commit date
Latest commit History 128 Commits
.github/workflows		.github/workflows
.husky		.husky
.private-journal		.private-journal
backups		backups
data		data
pipeline		pipeline
web		web
.dockerignore		.dockerignore
.gitignore		.gitignore
DEPLOYMENT.md		DEPLOYMENT.md
IDEAS.md		IDEAS.md
IMPROVEMENTS.md		IMPROVEMENTS.md
LICENSE		LICENSE
Makefile		Makefile
README.md		README.md
RELEASING.md		RELEASING.md
SYSTEM_PROMPT.md		SYSTEM_PROMPT.md

Folders and files

Latest commit

History

Repository files navigation

Political Platforms Costa Rica 2026

Project Intent

Project Structure

Analysis Pipeline (Python)

Installation

Configuration

Usage

Current Status

Website (Next.js)

Installation

Configuration

Development

Features

Home Page

Party Detail Page

Comparison Page

AI-Powered Chat (New!)

Technologies

Frontend & Build

Data & AI

UI Components

Architecture & Design Decisions

Three-Stage Pipeline

Hybrid Architecture: Static + Dynamic

Docker Deployment

Configuration

Theme & Dark Mode

AI Chat Feature

Party Colors

Static Site Generation (SSG)

Useful Commands

Linting and Formatting

Database

Future Ideas & Enhancements

Completed ✅

In Progress 🚧

Potential Enhancements

Data Source

Data Access & Structure

Database Structure

Accessing the Data

Use Cases

Contributing

Adding New Categories

Improving Analysis

Website Enhancements

Reporting Issues

License

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 15

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages