Spatial Distribution of Council Decisions

Mining locational data via OParl from the City of Augsburg Council Information System

Published

December 12, 2025

Abstract

This project demonstrates the spatial distribution of council decisions in Augsburg, Germany. By extracting location references from council papers (OParl data) and matching them against a street gazetteer, we identify where council activities are geographically focused. The study combines natural language processing (fuzzy string matching) with interactive mapping to visualize the political negotiation of urban space.

Project Goal

Making spatial information in council records visible enables better understanding of political priorities, equity in resource allocation, and patterns in urban development discourse.

Introduction

Background

Municipal councils shape urban development through formal decisions documented in official records. Understanding where political attention is concentrated provides insights into urban governance priorities and spatial inequalities.

In Augsburg (population ~295,000), the city council approves development plans (Bebauungspläne), building permits, and infrastructure projects. These decisions are documented in the city's council information system and published via OParl, a standardized API for municipal legislative information.

Research Question

Which geographically relevant topics, such as development plans and infrastructure projects, were discussed politically in the Augsburg council minutes, and where in the city are they located?

Objectives

  1. Develop a reproducible pipeline to extract location references from council papers
  2. Create a street gazetteer for Augsburg with geocoded coordinates
  3. Match council papers to geographic locations using fuzzy string matching
  4. Visualize results on an interactive map with temporal metadata

Study Area

Augsburg, Bavaria, Germany

  • Population: ~295,000 (city proper)
  • Metropolitan area: ~800,000
  • Historical context: Roman foundation (~2,000 years), medieval trading hub, industrial heritage
  • Administrative structure: City Council (Stadtrat) with 60 members
  • Data source: ALLRIS OParl API

The analysis is limited to streets and locations within the Augsburg city boundaries. A gazetteer of street names and locations was extracted from OpenStreetMap (Overpass API) for matching against council papers.

Data and Methods

Data Sources

OParl Council Records

  • Source: ALLRIS OParl API (Augsburg municipal system)
  • Content: Council papers (Drucksachen), agendas, decisions
  • Format: JSON API responses with full-text PDF access
  • Coverage: sample limited to 10 papers for this analysis
  • Metadata: Paper ID, title, date, main file (PDF), URL references

OpenStreetMap Street Data

  • Source: Overpass API query for ways tagged with ["highway"]["name"] within Augsburg bounding box
  • Processing: 12,641 street segments extracted → 3,120 unique street names after deduplication
  • Data quality: Coordinates taken from segment midpoints (representative, not exact boundaries)

PDF Text Extraction

  • Tool: pdftools R package (wrapper around pdftotext)
  • Filter: PDFs ≤10 pages (to avoid overly large documents)
  • Sample: 8 papers downloaded and processed

Methods

1. Gazetteer

Creation of the Location Database (Script: 01_gazetteer.R)

1: Bounding Box Retrieval - Query Nominatim API with city name and country - Extract bounding box coordinates (south, north, west, east)

2: Street Data Acquisition - Query Overpass API for all highway features with names within bounding box - OverpassQL query: way["highway"]["name"](bbox) - Response format: GeoJSON with street names and center coordinates

3: Deduplication - Group by street name, retain first occurrence - Remove duplicates while preserving geographic coordinates - Sort alphabetically for consistency

4: Output - CSV with columns: street_name, latitude, longitude - Used as reference database for subsequent location matching
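The query construction and deduplication steps above can be sketched in a few lines of base R. Note that `build_overpass_query()` and `dedupe_streets()` are hypothetical helpers for illustration, not the exact code of 01_gazetteer.R:

```r
# Hypothetical helpers illustrating steps 2-3 (not the exact script code)
build_overpass_query <- function(south, west, north, east) {
  # OverpassQL: all named highway ways inside the bounding box,
  # returned with their center coordinates
  sprintf('[out:json];way["highway"]["name"](%.4f,%.4f,%.4f,%.4f);out center;',
          south, west, north, east)
}

dedupe_streets <- function(streets) {
  # keep the first segment per street name, then sort alphabetically
  out <- streets[!duplicated(streets$street_name), ]
  out[order(out$street_name), , drop = FALSE]
}

# Toy example: two segments of the same street collapse to one entry
segments <- data.frame(
  street_name = c("Maximilianstraße", "Bahnhofstraße", "Maximilianstraße"),
  latitude    = c(48.366, 48.365, 48.367),
  longitude   = c(10.898, 10.885, 10.899),
  stringsAsFactors = FALSE
)
gazetteer <- dedupe_streets(segments)  # 2 rows: Bahnhofstraße, Maximilianstraße
```

Keeping the first occurrence per name means each street is represented by one arbitrary segment midpoint, which is sufficient for city-scale mapping.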

2. PDF

Download and Metadata (Script: 02_fetch_papers.R)

1: OParl Navigation - Fetch System → Body → Paper endpoints - Extract paper metadata: title, date, main file URL

2: PDF Download - Iterate through papers, fetch mainFile via HTTP - Check file size constraints (max 10 pages via pdf_info) - Store in data/pdfs/ with paper ID as filename

3: Metadata Logging - Record: paper_id, paper_title, paper_date, pdf_url, pdf_path, page_count - Export to CSV for downstream processing
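The metadata logging in step 3 can be illustrated on a mock OParl paper object. The field names follow the OParl Paper schema, but the values and the flat structure are invented for this sketch; the real script fetches these objects from the ALLRIS endpoint:

```r
# Mock OParl Paper object (values invented; field names per the OParl schema)
paper <- list(
  id       = "https://example.org/oparl/paper/42",
  name     = "Bebauungsplan Nr. 123 Maximilianstraße",
  date     = "2024-03-14",
  mainFile = list(accessUrl = "https://example.org/files/42.pdf")
)

paper_id <- basename(paper$id)  # last URL segment, e.g. "42"
meta <- data.frame(
  paper_id    = paper_id,
  paper_title = paper$name,
  paper_date  = paper$date,
  pdf_url     = paper$mainFile$accessUrl,
  pdf_path    = file.path("data/pdfs", paste0(paper_id, ".pdf")),
  stringsAsFactors = FALSE
)
# write.csv(meta, "data/papers_metadata.csv", row.names = FALSE)
```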

3. Location Extraction

NLP and Fuzzy Matching (Script: 03_extract_locations.R)

1: Text Extraction - Use pdftools::pdf_text() to extract full text from each PDF - Concatenate all pages into single string

2: Fuzzy String Matching - Split extracted text into words - For each street in the gazetteer, compute the Levenshtein distance to the text words - Threshold: normalized distance ≤ 0.2 (i.e., at least 80% similarity) - Captures variations: “Maximilianstr.” → “Maximilianstraße”

3: Match Quality Assessment - Calculate match score: 1 - (distance / street_name_length) - Retain only matches with score ≥ 0.8 - Log all matches with metadata

4: Output - CSV: paper_id, paper_title, paper_date, street_name, latitude, longitude, match_score, pdf_url
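Steps 2-3 can be sketched with base R's `adist()` (Levenshtein distance); `match_streets()` is a hypothetical helper for illustration, not the exact code of 03_extract_locations.R:

```r
# Sketch of fuzzy matching with a normalized score threshold
match_streets <- function(text, gazetteer, min_score = 0.8) {
  # split text into word tokens (keep German umlauts, ß, dots, hyphens)
  words <- unlist(strsplit(text, "[^[:alnum:]äöüÄÖÜß.-]+"))
  hits <- lapply(seq_len(nrow(gazetteer)), function(i) {
    street <- gazetteer$street_name[i]
    d <- min(adist(street, words, ignore.case = TRUE))
    score <- 1 - d / nchar(street)  # 1 = exact, 0.8 = 80% similarity
    if (score >= min_score)
      data.frame(street_name = street, match_score = score,
                 stringsAsFactors = FALSE)
  })
  do.call(rbind, hits)  # NULLs (non-matches) are dropped by rbind
}

gaz <- data.frame(street_name = c("Maximilianstraße", "Bahnhofstraße"),
                  stringsAsFactors = FALSE)
text <- "Der Ausbau der Maximilianstrasse wurde beschlossen."
match_streets(text, gaz)  # matches "Maximilianstraße" with score 0.875
```

The "ss"/"ß" spelling variant costs an edit distance of 2 over 16 characters, so it passes the 0.8 threshold, while unrelated words stay well below it.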

4. Visualization

Interactive Mapping (Script: 04_visualize.R)

1: Map Creation - Use Leaflet JS library (via R htmlwidgets) - Base layer: OpenStreetMap + CartoDB Dark/Light options - Center: Median of all extracted coordinates

2: Markers and Clusters - Circle markers for each location match - Size: Fixed 8px radius - Color: Red (#f44336) with 0.7 opacity - Clustering: Markercluster plugin (maxClusterRadius=50)

3: Popups - Interactive popups with: - Street name - Paper title (truncated to 100 chars) - Paper date - Match score (85-100%) - Clickable link to PDF

4: Output - HTML file: figures/interactive_map.html - Self-contained or with assets folder (depending on pandoc availability) - Browser-compatible: all modern browsers supported

Results

Interactive Map

Code
library(dplyr)    # provides %>% and mutate()
library(leaflet)  # interactive maps

# Load the extracted locations
if (file.exists("data/extracted_locations.csv")) {
  locations <- read.csv("data/extracted_locations.csv",
                        stringsAsFactors = FALSE,
                        fileEncoding = "UTF-8")

  # Filter: keep only matches ≥ 85%
  min_match_score <- 0.85
  locations_filtered <- locations[locations$match_score >= min_match_score, ]

  if (nrow(locations_filtered) > 0) {
    # Compute the map center
    center_lat <- median(locations_filtered$latitude, na.rm = TRUE)
    center_lon <- median(locations_filtered$longitude, na.rm = TRUE)

    # Build the popup HTML
    locations_filtered <- locations_filtered %>%
      mutate(
        popup_text = sprintf(
          "<div style='font-family: Arial, sans-serif; max-width: 300px;'>
            <h4 style='margin: 0 0 10px 0; color: #d32f2f;'>%s</h4>
            <p style='margin: 5px 0;'><strong>Paper:</strong><br>%s</p>
            <p style='margin: 5px 0;'><strong>Date:</strong> %s</p>
            <p style='margin: 5px 0;'><strong>Match:</strong> %.0f%%</p>
            <a href='%s' target='_blank' style='display: inline-block; margin-top: 10px; padding: 5px 10px; background: #2c3e50; color: white; text-decoration: none; border-radius: 3px;'>
              View PDF
            </a>
          </div>",
          street_name,
          substr(paper_title, 1, 100),
          ifelse(is.na(paper_date), "n/a", paper_date),
          match_score * 100,
          pdf_url
        )
      )

    # Leaflet map
    leaflet(locations_filtered) %>%
      addTiles(group = "OpenStreetMap") %>%
      addProviderTiles(providers$CartoDB.Positron, group = "Light") %>%
      addProviderTiles(providers$CartoDB.DarkMatter, group = "Dark") %>%
      setView(lng = center_lon, lat = center_lat, zoom = 13) %>%
      addCircleMarkers(
        lng = ~longitude,
        lat = ~latitude,
        popup = ~popup_text,
        radius = 8,
        color = "#d32f2f",
        fillColor = "#f44336",
        fillOpacity = 0.7,
        stroke = TRUE,
        weight = 2,
        clusterOptions = markerClusterOptions(
          maxClusterRadius = 50,
          spiderfyOnMaxZoom = TRUE
        )
      ) %>%
      addLayersControl(
        baseGroups = c("OpenStreetMap", "Light", "Dark"),
        options = layersControlOptions(collapsed = FALSE)
      ) %>%
      addControl(
        html = sprintf(
          "<div style='background: white; padding: 10px; border-radius: 5px; box-shadow: 0 2px 5px rgba(0,0,0,0.2);'>
            <h3 style='margin: 0 0 5px 0;'>Augsburg Policy Map</h3>
            <p style='margin: 0; font-size: 14px;'>%d Locations from %d Papers<br>(Match ≥ 85%%)</p>
          </div>",
          nrow(locations_filtered),
          length(unique(locations_filtered$paper_id))
        ),
        position = "topright"
      )
  } else {
    cat("::: {.callout-warning}\nNo locations with ≥85% match found in the current dataset.\n:::")
  }
} else {
  cat("::: {.callout-important}\nData file `data/extracted_locations.csv` not found. Please run the extraction script.\n:::")
}

Map interaction: - Zoom: scroll or the +/- buttons - Pan: click and drag - Clusters: click a cluster to expand it - Layers: switch between OpenStreetMap, Light, and Dark - Popups: click a marker for details and the PDF link

Discussion & Outlook

Strengths & Limitations

  1. Reproducibility: All steps documented and automated via scripts
  2. Transparency: Fuzzy-matching parameters are explicit and tunable
  3. Data availability: OParl (partially) and OpenStreetMap are public and reusable for other cities
  4. Scalability: The pipeline can in principle handle hundreds of papers
  5. PDF text quality: Scanned PDFs without an OCR layer yield no text (mitigation: OCR not yet implemented)
  6. Fuzzy matching and gazetteer: Some location references are missed while some irrelevant matches are added; the parameters need further tuning
  7. Size and clarity: This pilot uses only 8 papers; a full analysis of 100+ papers for statistical power would require adjustments to the application, e.g. a drop-down menu in the map layout

Interpretation

Given these challenges, it is difficult to draw a definitive conclusion. Focusing on metadata in combination with scaling up could be one option. Nevertheless, thanks to technological advances, working with political data at the local level holds clear potential for human geography, for example in fields such as urban development.

Future Work

A few ideas and examples:

  • Scale up: Process all 2,000+ available papers
  • Temporal analysis: Map locations over time (heatmaps, animations)
  • OCR integration: Handle scanned PDFs
  • Network analysis: Connect related papers and locations
  • Comparative study: Replicate the methodology in other German cities
  • Policy outcomes: Link council discussions to actual development outcomes
  • NLP enhancement: Use named entity recognition (NER) instead of fuzzy matching

I see the greatest potential in the connection with Linked Open Data, both for scientific purposes and for the benefit of citizens. Keywords here are RDF and GeoSPARQL.

Conclusion

This study demonstrates a practical pipeline for extracting and visualizing geographic information embedded in municipal council documents. By combining OParl APIs, OpenStreetMap data, and fuzzy string matching, we create a reproducible workflow for understanding spatial dimensions of urban governance.

The resulting interactive map could provide city planners, researchers, and citizens with a novel tool for analyzing where and when policy discussions occur in the future. While this pilot analysis is limited in scope, the methodology is scalable and adaptable to other municipalities.

Key takeaway: Making spatial information in council records visible enables better understanding of political priorities, equity in resource allocation, and patterns in urban development discourse.