Meet Each Component by Reading Its Code

This tutorial walks you through the major components of the system, one at a time. By the end you will know what each piece does, where it lives, and how it connects to everything else.

Step 1

The Front Door — app.py

app.py is where the application starts. It creates the FastAPI application object, imports 13 routers, registers each one with app.include_router(), sets up CORS middleware, mounts static files for the frontend, and defines a startup event that initializes the database, alert scheduler, queue worker, and stuck analysis fixer.

Open the file and look at the top section:

backend/app.py

    # Router imports
    from api.analysis import router as analysis_router
    from api.auth import router as auth_router
    from api.reports import router as reports_router
    from api.dashboard import router as dashboard_router
    from api.batch import router as batch_router
    from api.queue import router as queue_router
    from api.alerts import router as alerts_router
    from api.suppliers import router as suppliers_router
    from api.plots import router as plots_router
    from api.organizations import router as organizations_router
    from api.metrics import router as metrics_router
    from api.supply_chain import router as supply_chain_router
    from routes.enhanced_analysis import router as enhanced_analysis_router

    # Each router is registered with a URL prefix
    app.include_router(analysis_router, prefix="/api/analysis")
    app.include_router(auth_router, prefix="/api/auth")
    app.include_router(alerts_router, prefix="/api/alerts")
    # ... and so on for all 13

Now look at the startup event:

backend/app.py

    @app.on_event("startup")
    async def startup_event():
        # 1. Connect to PostgreSQL + PostGIS
        await init_database()
        # 2. Start the alert scheduler (periodic GLAD/RADD checks)
        start_alert_scheduler()
        # 3. Start the async analysis queue worker
        start_queue_worker()
        # 4. Start the stuck analysis fixer (auto-retries hung jobs)
        start_stuck_analysis_fixer()

What you should see: app.py is purely a wiring file. It does not contain business logic; it just connects all the pieces and starts background services.

Step 2

The File Processor

file_processor_simple.py is responsible for validating and parsing uploads. When a user submits a geospatial file, this component takes over. It accepts three file formats: GeoJSON, KML, and Shapefile ZIP.

One of its most important features is auto-repair. If an uploaded geometry is invalid (self-intersecting polygons, unclosed rings), the processor calls shapely.make_valid() to fix it silently rather than rejecting the upload.

The processor returns a PlotData object containing:

  • geometry — the validated/repaired GeoJSON geometry
  • feature_count — how many features were in the file
  • total_area_hectares — total area of all features
  • bounds — bounding box of the geometry
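
As a rough illustration of the fields listed above, here is a minimal sketch of a PlotData container together with a bounds calculation over raw GeoJSON coordinates. The dataclass layout, the helper names iter_positions and compute_bounds, and the (min_lon, min_lat, max_lon, max_lat) ordering are assumptions made for this example; the real class lives in backend/models/ and may differ. Area calculation is left out because it needs a projection step.

```python
from dataclasses import dataclass
from typing import Any, Iterator


@dataclass
class PlotData:
    """Hypothetical stand-in for the real PlotData model."""
    geometry: dict[str, Any]
    feature_count: int
    total_area_hectares: float
    # Assumed order: (min_lon, min_lat, max_lon, max_lat)
    bounds: tuple[float, float, float, float]


def iter_positions(coords: Any) -> Iterator[tuple[float, float]]:
    """Yield (lon, lat) pairs from arbitrarily nested GeoJSON coordinate arrays."""
    if coords and isinstance(coords[0], (int, float)):
        yield coords[0], coords[1]
    else:
        for part in coords:
            yield from iter_positions(part)


def compute_bounds(geometry: dict[str, Any]) -> tuple[float, float, float, float]:
    """Return the bounding box of a GeoJSON geometry with a 'coordinates' member."""
    lons, lats = zip(*iter_positions(geometry["coordinates"]))
    return min(lons), min(lats), max(lons), max(lats)


square = {
    "type": "Polygon",
    "coordinates": [[[10.0, 1.0], [10.2, 1.0], [10.2, 1.1], [10.0, 1.1], [10.0, 1.0]]],
}
plot = PlotData(
    geometry=square,
    feature_count=1,
    total_area_hectares=0.0,  # placeholder: area is not computed in this sketch
    bounds=compute_bounds(square),
)
```

The recursive walk handles Polygon and MultiPolygon coordinates alike, because it only cares about reaching the innermost [lon, lat] pairs.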

Key insight: The file processor is a gatekeeper. Nothing enters the system without passing through it, and it ensures every geometry is valid before analysis begins.

Step 3

The Forest Analyzer

forest_analyzer_with_alerts.py (45KB) is the brain of the entire system. It is the core of the business logic: the component that turns raw data into EUDR (EU Deforestation Regulation) compliance verdicts.

It combines four data sources into a single result:

  • GEE forest data — Hansen Global Forest Change for forest coverage percentage
  • GLAD alerts — optical deforestation detection (Landsat, 30m)
  • RADD alerts — radar deforestation detection (Sentinel-1, 10m)
  • Country risk — high-risk country list (BR, ID, CD, PE, CO, BO, VE, MY)

Thresholds are loaded from a configuration file rather than hardcoded, so they can be adjusted without changing code. The main entry point is:

    async def analyze_plot(
        geometry_data,       # GeoJSON geometry
        plot_id,             # database ID
        check_alerts=True,   # whether to query GLAD + RADD
        country_code=None,   # ISO country code for risk scoring
    ):
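
To make the combining step concrete, here is a minimal sketch of an orchestrating function built around that signature. It is not the real implementation: the name analyze_plot_sketch, the injected glad and radd callables, the stub services, and the shape of the returned dict are all assumptions for illustration. Only the country list comes from the text above.

```python
import asyncio
from typing import Any, Awaitable, Callable, Optional

# Country list taken from the tutorial text; everything else here is illustrative.
HIGH_RISK_COUNTRIES = {"BR", "ID", "CD", "PE", "CO", "BO", "VE", "MY"}

AlertFetcher = Callable[[dict], Awaitable[dict]]


async def analyze_plot_sketch(
    geometry_data: dict,
    plot_id: int,
    glad: AlertFetcher,   # assumption: services are passed in for the example
    radd: AlertFetcher,
    check_alerts: bool = True,
    country_code: Optional[str] = None,
) -> dict[str, Any]:
    """Combine alert results and country risk without fetching anything directly."""
    glad_result: dict = {"has_alerts": False}
    radd_result: dict = {"has_alerts": False}
    if check_alerts:
        # The two services are independent, so they can be awaited concurrently.
        glad_result, radd_result = await asyncio.gather(
            glad(geometry_data), radd(geometry_data)
        )
    return {
        "plot_id": plot_id,
        "glad": glad_result,
        "radd": radd_result,
        "high_risk_country": country_code in HIGH_RISK_COUNTRIES,
    }


async def fake_glad(geometry: dict) -> dict:
    """Stub standing in for the GLAD service."""
    return {"has_alerts": True, "alert_count": 12}


async def fake_radd(geometry: dict) -> dict:
    """Stub standing in for the RADD service."""
    return {"has_alerts": False, "alert_count": 0}


result = asyncio.run(
    analyze_plot_sketch(
        {"type": "Polygon", "coordinates": []}, 42, fake_glad, fake_radd, country_code="BR"
    )
)
```

Passing the services in as arguments is a choice made here so the sketch can run with stubs; the real analyzer may construct or import its services differently.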

Key insight: The analyzer does not fetch its own data. It delegates to GLAD and RADD services, then combines their answers. This is the orchestrator, not the data layer.

Step 4

The Alert Services

The system has two alert services that work independently but produce compatible results:

GLAD (glad_alert_service.py, 35KB) — Uses the Hansen Global Forest Change dataset via Google Earth Engine. Resolution: 30m. Coverage: global, since 2001. This is optical (Landsat-based), which means clouds can block detection.

RADD (radd_alert_service.py) — Uses Sentinel-1 radar imagery via Google Earth Engine. Resolution: 10m. Coverage: tropical regions, since 2019. Because it uses radar, it works through clouds and at night.

Both services query GEE and return the same shape of data:

    {
        "has_alerts": True,
        "alert_count": 12,
        "total_area_ha": 3.45,
        "loss_by_year": {
            "2021": 1.2,
            "2022": 2.25
        }
    }
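
Because both services return the same shape, a caller can compare their results field by field. The sketch below types that shape and shows one possible comparison. The AlertResult name, the confidence labels, the helper years_in_common, and the RADD sample values are hypothetical; only the field names and the GLAD sample come from the example above.

```python
from typing import TypedDict


class AlertResult(TypedDict):
    """Shape shared by both alert services, per the example above."""
    has_alerts: bool
    alert_count: int
    total_area_ha: float
    loss_by_year: dict[str, float]


def cross_validate(glad: AlertResult, radd: AlertResult) -> str:
    """Return an illustrative confidence label (labels are not from the codebase)."""
    if glad["has_alerts"] and radd["has_alerts"]:
        return "high"    # both sensors agree
    if glad["has_alerts"] or radd["has_alerts"]:
        return "medium"  # only one sensor reports loss
    return "none"


def years_in_common(glad: AlertResult, radd: AlertResult) -> list[str]:
    """List the years in which both services report loss."""
    return sorted(set(glad["loss_by_year"]) & set(radd["loss_by_year"]))


glad_sample: AlertResult = {
    "has_alerts": True,
    "alert_count": 12,
    "total_area_ha": 3.45,
    "loss_by_year": {"2021": 1.2, "2022": 2.25},
}
radd_sample: AlertResult = {  # made-up values for the example
    "has_alerts": True,
    "alert_count": 4,
    "total_area_ha": 0.9,
    "loss_by_year": {"2022": 0.9},
}
confidence = cross_validate(glad_sample, radd_sample)
overlap = years_in_common(glad_sample, radd_sample)
```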

Why two systems? GLAD has a longer track record and global coverage. RADD sees through clouds. When both detect deforestation on the same plot, confidence is highest. The analyzer cross-validates them.

Step 5

The Report Generator

report_generator.py (76KB) is the largest file in the entire codebase. It uses ReportLab to build multi-page PDF reports that serve as the primary compliance document.

A single report includes:

  • Compliance summary — verdict, risk level, key metrics
  • Satellite imagery — three time periods (2018 baseline, alert period, current)
  • NDVI maps — vegetation health change over time
  • GLAD/RADD comparison table — side-by-side alert analysis
  • Land cover classification — ESA WorldCover breakdown

Why is it 76KB? PDF generation is inherently verbose. Each page requires precise coordinate calculations, color definitions, font sizing, and layout logic. This file is large because it does a lot of visual work, not because it is poorly organized.

Working with Components

How to find where a feature is implemented

The codebase follows a consistent folder convention. Knowing the convention gets you to the right file within seconds.

  1. For business logic (the actual work), look in backend/services/. This is where analysis, alerts, reports, and email are implemented.
  2. For HTTP endpoints (what URLs exist, what parameters they accept), look in backend/api/. Each file here is a router that calls into services.
  3. For data shapes (what fields an object has), look in backend/models/. You will find PlotData, AnalysisResult, ComplianceStatus, and others here.
  4. For external data source integrations (GEE, Planetary Computer, CDSE), look in backend/providers/. Each provider implements the same interface.
  5. For infrastructure (database connections, configuration), look in backend/utils/.
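
Item 4 says that each provider implements the same interface. A minimal sketch of that pattern, an abstract base class plus a small factory, might look like the following. The class names, the fetch_imagery method, and the registry are assumptions for illustration; the real interface is defined in backend/providers/ and may expose different methods.

```python
from abc import ABC, abstractmethod


class BaseProvider(ABC):
    """Hypothetical common interface for external data sources."""

    name: str

    @abstractmethod
    def fetch_imagery(self, geometry: dict, date: str) -> dict:
        """Return imagery metadata for a geometry (method name is an assumption)."""


class GEEProvider(BaseProvider):
    name = "gee"

    def fetch_imagery(self, geometry: dict, date: str) -> dict:
        # A real provider would call Google Earth Engine here.
        return {"source": self.name, "date": date}


class PlanetaryProvider(BaseProvider):
    name = "planetary"

    def fetch_imagery(self, geometry: dict, date: str) -> dict:
        # A real provider would call Microsoft Planetary Computer here.
        return {"source": self.name, "date": date}


_REGISTRY = {cls.name: cls for cls in (GEEProvider, PlanetaryProvider)}


def get_provider(name: str) -> BaseProvider:
    """Factory: look a provider up by name and instantiate it."""
    try:
        return _REGISTRY[name]()
    except KeyError:
        raise ValueError(f"unknown provider: {name!r}") from None


imagery = get_provider("gee").fetch_imagery({"type": "Point", "coordinates": [0, 0]}, "2024-01-01")
```

Callers depend only on BaseProvider, so switching data sources becomes a one-word change at the factory call.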

How to add a new API endpoint

  1. Create your endpoint function in the appropriate file inside backend/api/. If your endpoint relates to analysis, add it to api/analysis.py. If it is a new domain, create a new file.
  2. Add a decorator to register the route: @router.get("/your-path") or @router.post("/your-path").
  3. If you created a new file, import its router in app.py and register it with app.include_router(your_router, prefix="/api/your-domain").
  4. If you used an existing file, the router is already registered — your new endpoint is available immediately.

How to trace a bug through the layers

  1. Start at the API layer. Find the endpoint in backend/api/ that handles the failing request. Check what parameters it receives and what service function it calls.
  2. Follow into the service. Open the service function in backend/services/. Read the logic. Check what data it expects and what it returns.
  3. Check the data model. Open the relevant class in backend/models/. Verify the field names and types match what the service produces and what the API returns.
  4. Check the database query. If the bug involves stored data, look at the SQL in backend/utils/database.py or inline queries in the service. Verify column names match the model.
  5. Add logging at each layer. Python's logging module is already imported in most files. Add logger.info() calls at each boundary to see where data changes shape unexpectedly.
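
Step 5 can be sketched as follows: log a compact summary of the payload on the way into and out of each layer, so a change in shape shows up between two adjacent log lines. The function names, the logger name, and the sample payload are made up for this example, and the str conversion of plot_id is a deliberately planted shape change.

```python
import logging

logger = logging.getLogger("trace_demo")  # hypothetical logger name


def describe(payload: dict) -> str:
    """Summarize a dict as 'key:type' pairs so shape changes stand out in logs."""
    return ", ".join(
        f"{key}:{type(value).__name__}" for key, value in sorted(payload.items())
    )


def service_layer(plot: dict) -> dict:
    logger.info("service in  -> %s", describe(plot))
    # Planted bug for the demo: plot_id silently becomes a string here.
    result = {"plot_id": str(plot["plot_id"]), "forest_pct": 81.5}
    logger.info("service out -> %s", describe(result))
    return result


def api_layer(request_body: dict) -> dict:
    logger.info("api in      -> %s", describe(request_body))
    response = service_layer(request_body)
    logger.info("api out     -> %s", describe(response))
    return response


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
    api_layer({"plot_id": 7})
```

Logging key names and types rather than full values keeps the output short and avoids writing user data to the logs.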

Why This Structure?

Separation of Concerns

The codebase uses a layered architecture with four distinct layers:

API Layer

HTTP handling. Receives requests, validates input, calls services, formats responses. Lives in backend/api/.

Services Layer

Business logic. Does the actual work — analyzes forests, generates alerts, builds PDFs. Lives in backend/services/.

Models Layer

Data shapes. Defines what a plot, analysis result, or alert looks like. Lives in backend/models/.

Database Layer

Persistence. PostGIS connections, schema setup, queries. Lives in backend/utils/.

Think of it like a hospital. The reception desk (API layer) does not perform surgery — it checks your appointment and directs you to the right department. The surgical team (services layer) does the actual work, using standardized patient charts (models) and a records room (database). Each department has clear responsibilities, and mixing them would create chaos.

This separation means you can change how a PDF looks without touching the API endpoints. You can swap the database without rewriting business logic. Each layer has one job, and it does that job well.

The Services Layer Is Where the Weight Is

If you look at file sizes, the pattern is clear:

File                            | Size | Purpose
report_generator.py             | 76KB | PDF creation with satellite imagery, NDVI, land cover
forest_analyzer_with_alerts.py  | 45KB | Core compliance engine
glad_alert_service.py           | 35KB | Optical deforestation detection via GEE
analysis_queue_worker.py        | 28KB | Background job processing

These service files are the product. The API layer is a thin wrapper — most endpoint functions are under 30 lines. They validate input, call a service method, and return the result. All the domain knowledge, all the rules about what constitutes deforestation, all the logic for combining GLAD and RADD data — it lives in services.

This is deliberate. If you moved business logic into the API layer, you could not reuse it from the background queue worker. By keeping it in services, both the synchronous API and the async queue can call the same analyze_plot() method.
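
A minimal sketch of that reuse, with a stub in place of the real service: one handler calls analyze_plot() directly, and a worker loop pulls jobs from a queue and calls the same function. The handler name, the worker loop, the None sentinel, and the job format are assumptions for illustration, not the project's actual code.

```python
import asyncio


async def analyze_plot(geometry_data: dict, plot_id: int) -> dict:
    """Stub standing in for the real service method."""
    return {"plot_id": plot_id, "status": "analyzed"}


async def analyze_endpoint(body: dict) -> dict:
    """What a thin API handler might do: validate, delegate, return."""
    if "geometry" not in body:
        return {"error": "geometry is required"}
    return await analyze_plot(body["geometry"], body["plot_id"])


async def worker_loop(queue: asyncio.Queue, results: list) -> None:
    """What a queue worker might do: pull jobs and call the same service."""
    while True:
        job = await queue.get()
        if job is None:  # sentinel used in this sketch to stop the worker
            break
        results.append(await analyze_plot(job["geometry"], job["plot_id"]))


async def main() -> tuple[dict, list]:
    queue: asyncio.Queue = asyncio.Queue()
    results: list = []
    for job in ({"geometry": {}, "plot_id": 1}, {"geometry": {}, "plot_id": 2}, None):
        queue.put_nowait(job)
    await worker_loop(queue, results)
    direct = await analyze_endpoint({"geometry": {}, "plot_id": 3})
    return direct, results


direct, queued = asyncio.run(main())
```

Neither caller knows anything about how the analysis works, so a change to the service reaches both paths at once.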

Why 13 Routers?

Each router handles one domain: analysis, authentication, reports, alerts, queue management, suppliers, plots, organizations, metrics, supply chain, batch processing, dashboard, and enhanced analysis (WebSocket-enabled).

This is standard FastAPI practice. Putting all endpoints in one file would create a 3000-line monster that is impossible to navigate. With separate routers, each domain's endpoints live in a file of their own.

Component Directory

Key Files

File                            | Role                            | Size       | Key Method
app.py                          | Entry point                     | ~300 lines | startup_event()
forest_analyzer_with_alerts.py  | Core analysis engine            | 45KB       | analyze_plot()
glad_alert_service.py           | Optical deforestation detection | 35KB       | get_alerts_for_geometry_gee()
radd_alert_service.py           | Radar deforestation detection   |            | get_alerts_for_geometry()
report_generator.py             | PDF creation                    | 76KB       | generate_compliance_report()
analysis_queue_worker.py        | Background job processor        | 28KB       | _worker_loop()
email_service.py                | Notifications                   |            | send_analysis_complete_email()
file_processor_simple.py        | Upload parsing                  |            | process_file()

Folder Structure

Folder              | Purpose
backend/api/        | HTTP endpoints
backend/services/   | Business logic
backend/models/     | Data classes
backend/providers/  | External data source adapters
backend/utils/      | Infrastructure (DB, config)
frontend/public/    | Browser UI (HTML, JS)

File Tree

backend/                                    Server application root
    app.py                                  Entry point, router registration, startup
    api/                                    HTTP endpoints (12 of the 13 routers)
        analysis.py                         Upload and analyze
        auth.py                             Login, register, tokens
        reports.py                          PDF/Excel generation
        alerts.py                           GLAD, RADD, combined
        queue.py                            Async job management
        dashboard.py                        Analytics summary
        batch.py                            Batch processing
        suppliers.py                        Supplier CRUD
        plots.py                            Plot CRUD
        organizations.py                    Organization CRUD
        metrics.py                          Prometheus endpoint
        supply_chain.py                     Supplier relationships
    services/                               Business logic (the heavy lifting)
        forest_analyzer_with_alerts.py      Core compliance engine (45KB)
        glad_alert_service.py               Optical alerts via GEE (35KB)
        radd_alert_service.py               Radar alerts via GEE
        report_generator.py                 PDF reports (76KB)
        analysis_queue_worker.py            Background queue (28KB)
        file_processor_simple.py            Upload validation
        email_service.py                    SMTP notifications
        sentinel_imagery_service.py         Satellite imagery + NDVI
        land_cover_service.py               ESA WorldCover
        biodiversity_service.py             KBA/EBA/IBA/GBIF
        alert_scheduler.py                  Periodic alert checks
        stuck_analysis_fixer.py             Auto-retry hung jobs
    models/                                 Data classes
        __init__.py                         PlotData, AnalysisResult, ComplianceStatus
        alerts.py                           DeforestationAlert, AlertSubscription
        user.py                             User model with subscription tiers
        queue.py                            AnalysisQueueItem, QueueStatus
    providers/                              External data source adapters
        base.py                             Abstract interface
        factory.py                          Provider factory
        gee_provider.py                     Google Earth Engine
        planetary_provider.py               Microsoft Planetary Computer
        cdse_provider.py                    Copernicus Data Space
    utils/                                  Infrastructure
        database.py                         PostgreSQL + PostGIS connection
    routes/                                 WebSocket-enabled routes
        enhanced_analysis.py                Batch analysis with real-time progress
frontend/public/                            Browser UI
    index.html                              Main application page
    login.html                              Authentication page
    admin.html                              Admin dashboard
    dashboard.html                          User dashboard