Module 02: The Cast of Characters — EUDR Forest Analyzer Course

Meet Each Component by Reading Its Code

This tutorial walks you through the major components of the system, one at a time. By the end you will know what each piece does, where it lives, and how it connects to everything else.

Step 1

The Front Door — `app.py`

app.py is where the application starts. It creates the FastAPI application object, imports 13 routers, registers each one with app.include_router(), sets up CORS middleware, mounts static files for the frontend, and defines a startup event that initializes the database, alert scheduler, queue worker, and stuck analysis fixer.

Open the file and look at the top section:

backend/app.py
# Router imports
from api.analysis import router as analysis_router
from api.auth import router as auth_router
from api.reports import router as reports_router
from api.dashboard import router as dashboard_router
from api.batch import router as batch_router
from api.queue import router as queue_router
from api.alerts import router as alerts_router
from api.suppliers import router as suppliers_router
from api.plots import router as plots_router
from api.organizations import router as organizations_router
from api.metrics import router as metrics_router
from api.supply_chain import router as supply_chain_router
from routes.enhanced_analysis import router as enhanced_analysis_router

# Each router is registered with a URL prefix
app.include_router(analysis_router, prefix="/api/analysis")
app.include_router(auth_router, prefix="/api/auth")
app.include_router(alerts_router, prefix="/api/alerts")
# ... and so on for all 13

Now look at the startup event:

backend/app.py
@app.on_event("startup")
async def startup_event():
    # 1. Connect to PostgreSQL + PostGIS
    await init_database()

    # 2. Start the alert scheduler (periodic GLAD/RADD checks)
    start_alert_scheduler()

    # 3. Start the async analysis queue worker
    start_queue_worker()

    # 4. Start the stuck analysis fixer (auto-retries hung jobs)
    start_stuck_analysis_fixer()

What you should see: app.py is purely a wiring file. It does not contain business logic — it just connects all the pieces and starts background services.

Step 2

The File Processor

file_processor_simple.py is responsible for validating and parsing uploads. When a user submits a geospatial file, this component takes over. It accepts three file formats: GeoJSON, KML, and Shapefile ZIP.

One of its most important features is auto-repair. If an uploaded geometry is invalid (self-intersecting polygons, unclosed rings), the processor calls shapely.make_valid() to fix it silently rather than rejecting the upload.

The processor returns a PlotData object containing:

geometry — the validated/repaired GeoJSON geometry
feature_count — how many features were in the file
total_area_hectares — total area of all features
bounds — bounding box of the geometry

Key insight: The file processor is a gatekeeper. Nothing enters the system without passing through it, and it ensures every geometry is valid before analysis begins.
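The gatekeeper idea can be sketched with the standard library alone. This is not the project's implementation: `summarize_polygon` and `close_ring` are hypothetical helpers, the `PlotData` class here is an illustrative stand-in for the real model, ring closing stands in for the fuller repair that shapely.make_valid() performs, and the area field is left unset because computing hectares needs a projection step.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PlotData:
    """Illustrative stand-in for the real PlotData model (fields from the list above)."""
    geometry: dict
    feature_count: int
    bounds: tuple  # (min_lon, min_lat, max_lon, max_lat)
    total_area_hectares: Optional[float] = None  # not computed in this sketch


def close_ring(ring):
    """Return the ring with its first point repeated at the end if it was left open."""
    if ring and ring[0] != ring[-1]:
        return ring + [ring[0]]
    return ring


def summarize_polygon(geometry):
    """Hypothetical helper: repair unclosed rings and compute the bounding box."""
    if geometry.get("type") != "Polygon":
        raise ValueError("this sketch only handles Polygon geometries")
    rings = [close_ring(list(r)) for r in geometry["coordinates"]]
    if not rings or len(rings[0]) < 4:
        raise ValueError("a polygon ring needs at least 4 positions once closed")
    # The exterior ring (index 0) determines the bounding box.
    lons = [pt[0] for pt in rings[0]]
    lats = [pt[1] for pt in rings[0]]
    repaired = {"type": "Polygon", "coordinates": rings}
    return PlotData(
        geometry=repaired,
        feature_count=1,
        bounds=(min(lons), min(lats), max(lons), max(lats)),
    )


# An unclosed square: the last point does not repeat the first.
plot = summarize_polygon({
    "type": "Polygon",
    "coordinates": [[[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]]],
})
```

The caller gets back a repaired geometry instead of an error, which mirrors the "fix silently, do not reject" behavior described above.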

Step 3

The Forest Analyzer

forest_analyzer_with_alerts.py (45KB) is the brain of the entire system. It holds the core compliance logic and is the component that turns raw data into EUDR compliance verdicts. At 45KB it is the second-largest service file, after report_generator.py.

It combines four data sources into a single result:

GEE forest data — Hansen Global Forest Change for forest coverage percentage
GLAD alerts — optical deforestation detection (Landsat, 30m)
RADD alerts — radar deforestation detection (Sentinel-1, 10m)
Country risk — high-risk country list (BR, ID, CD, PE, CO, BO, VE, MY)

Thresholds are loaded from a configuration file rather than hardcoded, so they can be adjusted without changing code. The main entry point is:

async def analyze_plot(
    geometry_data,       # GeoJSON geometry
    plot_id,             # database ID
    check_alerts=True,   # whether to query GLAD + RADD
    country_code=None,   # ISO country code for risk scoring
):
    ...  # body omitted in this excerpt

Key insight: The analyzer does not fetch its own data. It delegates to GLAD and RADD services, then combines their answers. This is the orchestrator, not the data layer.
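To make the orchestration concrete, here is a minimal sketch of how configured thresholds and the four inputs could be folded into one result. Everything except the country list is an assumption: the threshold names and values, the `combine` function, and the rule it applies are placeholders, not the analyzer's actual logic.

```python
import json

# Country list taken from the text above.
HIGH_RISK_COUNTRIES = {"BR", "ID", "CD", "PE", "CO", "BO", "VE", "MY"}

# Placeholder threshold names and values; the real configuration file may differ.
EXAMPLE_CONFIG = '{"min_alert_area_ha": 0.5, "min_forest_pct": 10.0}'


def load_thresholds(raw_json):
    """Parse thresholds from a JSON string (reading from a file works the same way)."""
    return json.loads(raw_json)


def combine(forest_pct, glad, radd, country_code, thresholds):
    """Hypothetical rule: flag forested plots whose alert area exceeds the threshold."""
    # Take the larger of the two areas so overlapping detections are not double counted.
    alert_area = max(glad["total_area_ha"], radd["total_area_ha"])
    forested = forest_pct >= thresholds["min_forest_pct"]
    flagged = forested and alert_area >= thresholds["min_alert_area_ha"]
    return {
        "forest_pct": forest_pct,
        "alert_area_ha": alert_area,
        "high_risk_country": country_code in HIGH_RISK_COUNTRIES,
        "flagged": flagged,
    }


thresholds = load_thresholds(EXAMPLE_CONFIG)
result = combine(
    forest_pct=62.0,
    glad={"has_alerts": True, "total_area_ha": 3.45},
    radd={"has_alerts": False, "total_area_ha": 0.0},
    country_code="BR",
    thresholds=thresholds,
)
```

Because the thresholds arrive as data, changing a cutoff means editing the configuration, not the function.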

Step 4

The Alert Services

The system has two alert services that work independently but produce compatible results:

GLAD (glad_alert_service.py, 35KB) — Uses the Hansen Global Forest Change dataset via Google Earth Engine. Resolution: 30m. Coverage: global, since 2001. This is optical (Landsat-based), which means clouds can block detection.

RADD (radd_alert_service.py) — Uses Sentinel-1 radar imagery via Google Earth Engine. Resolution: 10m. Coverage: tropical regions, since 2019. Because it uses radar, it works through clouds and at night.

Both services query GEE and return the same shape of data:

{
    "has_alerts": True,
    "alert_count": 12,
    "total_area_ha": 3.45,
    "loss_by_year": { "2021": 1.2, "2022": 2.25 }
}

Why two systems? GLAD has a longer track record and global coverage. RADD sees through clouds. When both detect deforestation on the same plot, confidence is highest. The analyzer cross-validates them.
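Since both services return the same shape, cross-validation can be a small function over two dictionaries. The sketch below is an assumption about how that might look: `cross_validate` and its confidence labels are illustrative names, not the analyzer's real API.

```python
def cross_validate(glad, radd):
    """Label agreement between the two services; the labels are illustrative."""
    hits = [name for name, res in (("GLAD", glad), ("RADD", radd)) if res["has_alerts"]]
    if len(hits) == 2:
        confidence = "high"    # optical and radar agree
    elif len(hits) == 1:
        confidence = "medium"  # only one system detected a change
    else:
        confidence = "none"
    return {"detected_by": hits, "confidence": confidence}


glad = {
    "has_alerts": True,
    "alert_count": 12,
    "total_area_ha": 3.45,
    "loss_by_year": {"2021": 1.2, "2022": 2.25},
}
radd = {
    "has_alerts": True,
    "alert_count": 4,
    "total_area_ha": 1.1,
    "loss_by_year": {"2022": 1.1},
}

both = cross_validate(glad, radd)
only_glad = cross_validate(glad, {**radd, "has_alerts": False})
```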

Step 5

The Report Generator

report_generator.py (76KB) is the largest file in the entire codebase. It uses ReportLab to build multi-page PDF reports that serve as the primary compliance document.

A single report includes:

Compliance summary — verdict, risk level, key metrics
Satellite imagery — three time periods (2018 baseline, alert period, current)
NDVI maps — vegetation health change over time
GLAD/RADD comparison table — side-by-side alert analysis
Land cover classification — ESA WorldCover breakdown

Why is it 76KB? PDF generation is inherently verbose. Each page requires precise coordinate calculations, color definitions, font sizing, and layout logic. This file is large because it does a lot of visual work, not because it is poorly organized.

Working with Components

How to find where a feature is implemented

The codebase follows a consistent folder convention. Knowing the convention gets you to the right file within seconds.

For business logic (the actual work), look in backend/services/. This is where analysis, alerts, reports, and email are implemented.
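For HTTP endpoints (what URLs exist, what parameters they accept), look in backend/api/. Each file here is a router that calls into services. The one exception is the WebSocket-enabled enhanced analysis router, which lives in backend/routes/.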
For data shapes (what fields an object has), look in backend/models/. You will find PlotData, AnalysisResult, ComplianceStatus, and others here.
For external data source integrations (GEE, Planetary Computer, CDSE), look in backend/providers/. Each provider implements the same interface.
For infrastructure (database connections, configuration), look in backend/utils/.
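The "same interface" convention in backend/providers/ is the classic abstract base class plus factory pattern. A minimal sketch follows; the class names, the `fetch_forest_cover` method, and the canned return values are all hypothetical, and the real interface in base.py may expose different methods.

```python
from abc import ABC, abstractmethod


class ImageryProvider(ABC):
    """Hypothetical shared interface; the real base class lives in providers/base.py."""

    name: str

    @abstractmethod
    def fetch_forest_cover(self, geometry: dict) -> float:
        """Return forest cover for the geometry as a percentage."""


class FakeGEEProvider(ImageryProvider):
    name = "gee"

    def fetch_forest_cover(self, geometry):
        return 62.0  # canned value; a real provider would query Earth Engine


class FakePlanetaryProvider(ImageryProvider):
    name = "planetary"

    def fetch_forest_cover(self, geometry):
        return 60.5  # canned value


_REGISTRY = {cls.name: cls for cls in (FakeGEEProvider, FakePlanetaryProvider)}


def get_provider(name):
    """Factory: look up a provider class by name and instantiate it."""
    try:
        return _REGISTRY[name]()
    except KeyError:
        raise ValueError(f"unknown provider: {name!r}") from None


provider = get_provider("gee")
cover = provider.fetch_forest_cover({"type": "Polygon", "coordinates": []})
```

Callers depend only on the base class, so switching data sources is a one-word change in the factory call.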

How to add a new API endpoint

Create your endpoint function in the appropriate file inside backend/api/. If your endpoint relates to analysis, add it to api/analysis.py. If it is a new domain, create a new file.
Add a decorator to register the route: @router.get("/your-path") or @router.post("/your-path").
If you created a new file, import its router in app.py and register it with app.include_router(your_router, prefix="/api/your-domain").
If you used an existing file, the router is already registered; your new endpoint is available as soon as the server reloads.

How to trace a bug through the layers

Start at the API layer. Find the endpoint in backend/api/ that handles the failing request. Check what parameters it receives and what service function it calls.
Follow into the service. Open the service function in backend/services/. Read the logic. Check what data it expects and what it returns.
Check the data model. Open the relevant class in backend/models/. Verify the field names and types match what the service produces and what the API returns.
Check the database query. If the bug involves stored data, look at the SQL in backend/utils/database.py or inline queries in the service. Verify column names match the model.
Add logging at each layer. Python's logging module is already imported in most files. Add logger.info() calls at each boundary to see where data changes shape unexpectedly.
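The last step can be sketched with two stand-in functions, one per layer. The logger names and both functions are hypothetical; the point is only the placement of the logger.info() calls at each boundary, so that the log shows where a payload changes shape.

```python
import logging

logging.basicConfig(level=logging.INFO, format="%(name)s %(message)s")

# One logger per layer makes it obvious which boundary a line came from.
api_log = logging.getLogger("api.analysis")
svc_log = logging.getLogger("services.analyzer")


def analyze_service(geometry):
    """Stand-in service: logs what it received and what it returns."""
    svc_log.info("service in: type=%s", geometry.get("type"))
    result = {"forest_pct": 62.0, "status": "ok"}
    svc_log.info("service out: keys=%s", sorted(result))
    return result


def analyze_endpoint(payload):
    """Stand-in endpoint: logs at the API boundary, then calls the service."""
    api_log.info("api in: keys=%s", sorted(payload))
    result = analyze_service(payload["geometry"])
    api_log.info("api out: status=%s", result["status"])
    return result


response = analyze_endpoint({"geometry": {"type": "Polygon", "coordinates": []}})
```

Logging the keys and types, not the full payload, keeps the output readable when geometries are large.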

Why This Structure?

Separation of Concerns

The codebase uses a layered architecture with four distinct layers:

API Layer

HTTP handling. Receives requests, validates input, calls services, formats responses. Lives in backend/api/.

Services Layer

Business logic. Does the actual work — analyzes forests, generates alerts, builds PDFs. Lives in backend/services/.

Models Layer

Data shapes. Defines what a plot, analysis result, or alert looks like. Lives in backend/models/.

Database Layer

Persistence. PostGIS connections, schema setup, queries. Lives in backend/utils/.

Think of it like a hospital. The reception desk (API layer) does not perform surgery — it checks your appointment and directs you to the right department. The surgical team (services layer) does the actual work, using standardized patient charts (models) and a records room (database). Each department has clear responsibilities, and mixing them would create chaos.

This separation means you can change how a PDF looks without touching the API endpoints. You can swap the database without rewriting business logic. Each layer has one job, and it does that job well.

The Services Layer Is Where the Weight Is

If you look at file sizes, the pattern is clear:

| File | Size | Purpose |
| --- | --- | --- |
| `report_generator.py` | 76KB | PDF creation with satellite imagery, NDVI, land cover |
| `forest_analyzer_with_alerts.py` | 45KB | Core compliance engine |
| `glad_alert_service.py` | 35KB | Optical deforestation detection via GEE |
| `analysis_queue_worker.py` | 28KB | Background job processing |

These service files are the product. The API layer is a thin wrapper — most endpoint functions are under 30 lines. They validate input, call a service method, and return the result. All the domain knowledge, all the rules about what constitutes deforestation, all the logic for combining GLAD and RADD data — it lives in services.

This is deliberate. If you moved business logic into the API layer, you could not reuse it from the background queue worker. By keeping it in services, both the request-handling API endpoints and the background queue worker can call the same analyze_plot() method.
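A small asyncio sketch shows the reuse. The `analyze_plot` here is a stub with the same signature as the real entry point, and `analyze_endpoint`, `queue_worker`, and the `None` stop sentinel are hypothetical; the real worker's loop is likely more involved.

```python
import asyncio


async def analyze_plot(geometry_data, plot_id, check_alerts=True, country_code=None):
    """Stand-in for the real service method; returns a canned result."""
    await asyncio.sleep(0)  # placeholder for awaiting GEE, GLAD, and RADD calls
    return {"plot_id": plot_id, "checked_alerts": check_alerts, "country": country_code}


async def analyze_endpoint(payload):
    """Request path: the endpoint awaits the service and returns its result."""
    return await analyze_plot(payload["geometry"], payload["plot_id"])


async def queue_worker(queue, results):
    """Background path: the worker pulls jobs and calls the same service."""
    while True:
        job = await queue.get()
        if job is None:  # sentinel: stop the worker
            queue.task_done()
            break
        results.append(await analyze_plot(job["geometry"], job["plot_id"]))
        queue.task_done()


async def main():
    queue, results = asyncio.Queue(), []
    worker = asyncio.create_task(queue_worker(queue, results))
    await queue.put({"geometry": {"type": "Polygon"}, "plot_id": 2})
    await queue.put(None)
    direct = await analyze_endpoint({"geometry": {"type": "Polygon"}, "plot_id": 1})
    await worker
    return direct, results


direct, queued = asyncio.run(main())
```

Neither caller knows how the analysis works; both only know the service's signature.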

Why 13 Routers?

Each router handles one domain: analysis, authentication, reports, alerts, queue management, suppliers, plots, organizations, metrics, supply chain, batch processing, dashboard, and enhanced analysis (WebSocket-enabled).

This is standard FastAPI practice. Putting all endpoints in one file would create a 3000-line monster that is impossible to navigate. Separate routers mean:

Each file is focused and readable (typically 100-300 lines)
Different developers can work on different domains without merge conflicts
Tests can target one router at a time
URL prefixes are explicit: /api/analysis, /api/auth, /api/alerts

Component Directory

Key Files

| File | Role | Size | Key Method |
| --- | --- | --- | --- |
| `app.py` | Entry point | ~300 lines | `startup_event()` |
| `forest_analyzer_with_alerts.py` | Core analysis engine | 45KB | `analyze_plot()` |
| `glad_alert_service.py` | Optical deforestation detection | 35KB | `get_alerts_for_geometry_gee()` |
| `radd_alert_service.py` | Radar deforestation detection | — | `get_alerts_for_geometry()` |
| `report_generator.py` | PDF creation | 76KB | `generate_compliance_report()` |
| `analysis_queue_worker.py` | Background job processor | 28KB | `_worker_loop()` |
| `email_service.py` | Notifications | — | `send_analysis_complete_email()` |
| `file_processor_simple.py` | Upload parsing | — | `process_file()` |

Folder Structure

| Folder | Purpose |
| --- | --- |
| `backend/api/` | HTTP endpoints |
| `backend/services/` | Business logic |
| `backend/models/` | Data classes |
| `backend/providers/` | External data source adapters |
| `backend/utils/` | Infrastructure (DB, config) |
| `frontend/public/` | Browser UI (HTML, JS) |

File Tree

backend/                              Server application root
  app.py                              Entry point, router registration, startup
  api/                                HTTP endpoints (12 of the 13 routers)
    analysis.py                       Upload and analyze
    auth.py                           Login, register, tokens
    reports.py                        PDF/Excel generation
    alerts.py                         GLAD, RADD, combined
    queue.py                          Async job management
    dashboard.py                      Analytics summary
    batch.py                          Batch processing
    suppliers.py                      Supplier CRUD
    plots.py                          Plot CRUD
    organizations.py                  Organization CRUD
    metrics.py                        Prometheus endpoint
    supply_chain.py                   Supplier relationships
  services/                           Business logic (the heavy lifting)
    forest_analyzer_with_alerts.py    Core compliance engine (45KB)
    glad_alert_service.py             Optical alerts via GEE (35KB)
    radd_alert_service.py             Radar alerts via GEE
    report_generator.py               PDF reports (76KB)
    analysis_queue_worker.py          Background queue (28KB)
    file_processor_simple.py          Upload validation
    email_service.py                  SMTP notifications
    sentinel_imagery_service.py       Satellite imagery + NDVI
    land_cover_service.py             ESA WorldCover
    biodiversity_service.py           KBA/EBA/IBA/GBIF
    alert_scheduler.py                Periodic alert checks
    stuck_analysis_fixer.py           Auto-retry hung jobs
  models/                             Data classes
    __init__.py                       PlotData, AnalysisResult, ComplianceStatus
    alerts.py                         DeforestationAlert, AlertSubscription
    user.py                           User model with subscription tiers
    queue.py                          AnalysisQueueItem, QueueStatus
  providers/                          External data source adapters
    base.py                           Abstract interface
    factory.py                        Provider factory
    gee_provider.py                   Google Earth Engine
    planetary_provider.py             Microsoft Planetary Computer
    cdse_provider.py                  Copernicus Data Space
  utils/                              Infrastructure
    database.py                       PostgreSQL + PostGIS connection
  routes/                             WebSocket-enabled routes (the 13th router)
    enhanced_analysis.py              Batch analysis with real-time progress
frontend/public/                      Browser UI
  index.html                          Main application page
  admin.html                          Admin dashboard
  dashboard.html                      User dashboard