Understand Each External Service

The EUDR Forest Analyzer does not generate satellite data itself. It relies on several external services to fetch imagery, detect deforestation, classify land cover, and store geospatial results. This tutorial introduces each one.

Step 1

Google Earth Engine

GEE is a cloud platform with petabytes of satellite data. Instead of downloading images to your own server, you send computation requests to Google’s infrastructure and get back just the results you need.

This application uses four datasets from GEE:

UMD/hansen/global_forest_change_2023_v1_11 (Hansen Global Forest Change): 30 m resolution, global coverage. The lossyear band tells you which year trees disappeared (values 1–23 for 2001–2023).
projects/radar-wur/raddalert/v1 (RADD Alerts): Sentinel-1 radar, 10 m resolution, tropical regions. Works through clouds because radar does not rely on optical light.
ESA/WorldCover/v200 (ESA WorldCover): Land cover classification at 10 m. Tells you if a pixel is tree cover, cropland, water, built-up area, and so on.
COPERNICUS/S2_SR_HARMONIZED (Sentinel-2 L2A): Optical imagery for true-color visualization and NDVI analysis.
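To make the lossyear encoding concrete, here is a minimal sketch of decoding band values into calendar years. The function names and the plain list of pixel values are illustrative assumptions, not code from the application; treating 0 as "no loss" follows the Hansen dataset's documented convention.

```python
from collections import Counter
from typing import Iterable, Optional

BASE_YEAR = 2000  # lossyear value 1 means 2001, value 23 means 2023


def lossyear_to_year(value: int) -> Optional[int]:
    """Convert a Hansen lossyear pixel value to a calendar year (None = no loss)."""
    if value == 0:
        return None
    if not 1 <= value <= 23:
        raise ValueError(f"lossyear value out of range: {value}")
    return BASE_YEAR + value


def loss_pixels_per_year(values: Iterable[int]) -> dict:
    """Count loss pixels per calendar year, skipping no-loss pixels."""
    years = (lossyear_to_year(v) for v in values)
    return dict(Counter(y for y in years if y is not None))


# Hypothetical pixel values sampled from one plot.
print(loss_pixels_per_year([0, 0, 21, 21, 23, 5]))
```

Turning pixel counts into hectares additionally needs the pixel area, which varies with latitude, so that step is left out of the sketch.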

Authentication happens via a service account key file. The backend calls ee.Initialize(credentials) at startup, and every subsequent GEE call reuses that session.

Step 2

PostgreSQL + PostGIS

PostGIS turns a regular PostgreSQL database into a spatial database. Plot boundaries are stored as geometry columns, so you can run queries like “does this polygon overlap a protected area?”

The app uses connection pooling to avoid the cost of opening a new connection for every request:

Python
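import os
from psycopg2 import pool
from psycopg2.extras import RealDictCursor

pool_size = int(os.getenv("DB_POOL_SIZE", "5"))  # connections kept ready
max_overflow = int(os.getenv("DB_MAX_OVERFLOW", "10"))  # extra under load

_connection_pool = pool.ThreadedConnectionPool(
    minconn=pool_size,
    maxconn=pool_size + max_overflow,
    host=os.getenv("DB_HOST", "localhost"),
    database=os.getenv("DB_NAME", "eudr_forest_analyzer"),
    cursor_factory=RealDictCursor,  # rows come back as dictionaries
)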
English
Create a pool of pre-opened database connections. Keep pool_size (default 5) ready at all times. Allow up to pool_size + max_overflow (default 15) when traffic is heavy.
host / database — Read from environment variables, falling back to localhost and the default database name.
RealDictCursor — Return query rows as Python dictionaries instead of tuples, so you can write row["plot_id"] instead of row[0].

Step 3

The Provider Abstraction

The codebase supports multiple satellite data backends. A factory pattern in providers/factory.py decides which one to use based on a single environment variable:

Python — providers/factory.py
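import os

def get_provider() -> DataProvider:
    # DATA_PROVIDER selects the backend; "gee" is the default.
    provider_type = os.getenv("DATA_PROVIDER", "gee")
    if provider_type == "gee":
        return GEEProvider()
    elif provider_type == "planetary":
        return PlanetaryComputerProvider()
    elif provider_type == "cdse":
        return CDSEProvider()
    # Fail loudly rather than silently returning None for an unknown value.
    raise ValueError(f"Unknown DATA_PROVIDER: {provider_type!r}")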
English
Read DATA_PROVIDER from the environment. Default is "gee" (Google Earth Engine).
Return the matching provider class. Each class implements the same interface (DataProvider), so the rest of the application does not care which backend it is talking to.
Three options today: GEE (fully implemented), Copernicus CDSE (fully implemented, currently active), and Microsoft Planetary Computer (partial — imagery works, forest loss stubbed).

All analysis services call get_provider() and use the returned object. They never import GEEProvider directly.

Step 3a

GEE Provider: How It Works Under the Hood

The GEE provider (providers/gee_provider.py, 392 lines) is a delegation layer. It doesn't call GEE directly — instead, it wraps existing specialized services that were built before the provider abstraction was added:

Forest Loss Alerts

Delegates to hansen_tile_service which downloads Hansen GFC GeoTIFF tiles directly and analyzes the lossyear band. Returns yearly breakdown of tree cover loss in hectares.

Radar Alerts (RADD)

Delegates to radd_alert_service which queries the RADD ImageCollection via GEE's Python API. Filters by date, selects the Alert band, composites with .max().

Land Cover

Delegates to land_cover_service which queries ESA WorldCover via GEE. Returns per-class area statistics (tree cover, cropland, water, etc.) and a color-coded classification image.

Satellite Imagery & NDVI

Delegates to sentinel_imagery_service which queries Sentinel-2 L2A. Applies cloud masking using the SCL band, generates three-period comparisons and NDVI change maps.
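The cloud masking and NDVI steps can be illustrated on plain numbers. This is a minimal sketch, not the service's actual code: the set of masked SCL classes is an assumption (the tutorial does not say which classes the service excludes), and real processing runs on whole rasters rather than Python tuples. The NDVI formula itself is the standard one, using Sentinel-2 band B8 as near-infrared and B4 as red.

```python
# SCL classes treated as unusable in this sketch: cloud shadow (3),
# cloud medium/high probability (8, 9), thin cirrus (10).
# Which classes the real service masks is an assumption.
MASKED_SCL_CLASSES = {3, 8, 9, 10}


def ndvi(nir: float, red: float) -> float:
    """NDVI = (NIR - Red) / (NIR + Red)."""
    total = nir + red
    if total == 0:
        return 0.0  # avoid division by zero on empty pixels
    return (nir - red) / total


def mean_ndvi(pixels):
    """Average NDVI over (b8, b4, scl) tuples, ignoring masked pixels."""
    clear = [ndvi(b8, b4) for b8, b4, scl in pixels if scl not in MASKED_SCL_CLASSES]
    return sum(clear) / len(clear) if clear else None


def ndvi_change(baseline_pixels, current_pixels):
    """Difference in mean NDVI between two periods (None if either is fully masked)."""
    before, after = mean_ndvi(baseline_pixels), mean_ndvi(current_pixels)
    if before is None or after is None:
        return None
    return after - before
```

A negative change between two periods suggests vegetation loss, which is what the NDVI change maps visualize.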

Authentication: Uses a GEE service account key file. At startup, calls ee.Initialize(credentials). All subsequent GEE queries reuse this authenticated session.

Architecture Note

The GEE provider is the thinnest layer (392 lines) because the real complexity lives in the specialized services it delegates to. This is a common pattern: wrap legacy code behind a new interface without rewriting it.

Step 3b

CDSE Provider: The Active Production System

The CDSE provider (providers/cdse_provider.py, 1,920 lines — the most comprehensive) is what this system currently runs in production (DATA_PROVIDER=cdse). Unlike GEE, it's a self-contained implementation that talks to multiple external APIs directly:

Three Internal Helper Classes

GFWDataAPI

Wrapper for the Global Forest Watch Data API. Queries Hansen tree cover loss via POST /dataset/umd_tree_cover_loss/v1.12/query and integrated alerts (GLAD-L + GLAD-S2 + RADD combined) via /dataset/gfw_integrated_alerts/latest/query.

ESAWorldCoverSTAC

Complex land cover processor with a multi-step fallback chain: first tries WMS from Terrascope, then falls back to STAC catalog search + COG windowed reads. Computes per-class area statistics.
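A fallback chain of this kind can be sketched generically. The helper below is an illustration under assumptions: first_successful, fetch_from_wms, and fetch_from_stac are hypothetical names, and the stub functions simulate an outage instead of calling Terrascope.

```python
from typing import Callable, Iterable, Tuple, TypeVar

T = TypeVar("T")


def first_successful(sources: Iterable[Tuple[str, Callable[[], T]]]) -> Tuple[str, T]:
    """Try each (name, fetch) pair in order; return the first result that works."""
    errors = []
    for name, fetch in sources:
        try:
            return name, fetch()
        except Exception as exc:  # any failure moves on to the next source
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all sources failed: " + "; ".join(errors))


def fetch_from_wms():
    raise ConnectionError("WMS timed out")  # simulated outage


def fetch_from_stac():
    return {"tree_cover_ha": 12.5}  # made-up value for the demo


source, stats = first_successful([("wms", fetch_from_wms), ("stac", fetch_from_stac)])
print(source, stats)
```

Collecting the error messages from every attempt makes the final failure easier to diagnose than re-raising only the last exception.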

CDSEProvider (main class)

Orchestrates everything. Authenticates via OAuth2 with Copernicus Data Space. Fetches Sentinel-2 imagery via the SentinelHub Process API with custom evalscripts for RGB and NDVI rendering.

API Endpoints It Calls

data-api.globalforestwatch.org (GFW Data API): Hansen tree cover loss (yearly breakdown) and integrated deforestation alerts (GLAD + RADD combined)
identity.dataspace.copernicus.eu (CDSE OAuth2): Token endpoint for authenticating with Copernicus Data Space
sh.dataspace.copernicus.eu/api/v1/process (SentinelHub Process API): Fetches Sentinel-2 imagery with custom evalscripts (RGB, NDVI, cloud masking)
services.terrascope.be/wms/v2 (Terrascope WMS): ESA WorldCover land cover map rendering
services.terrascope.be/stac (Terrascope STAC): Catalog search for WorldCover COG tiles (fallback)
Resilience: Deterministic Fallback

When the GFW API is unavailable, CDSE generates reproducible simulated data using an MD5 hash of the geometry as a random seed. This means the same plot always gets the same simulated result — useful for testing and demos, and it prevents the system from crashing when an external API is down.
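The idea of seeding a random generator from a geometry hash can be sketched as follows. This is an assumption-laden illustration, not the provider's code: the function names, the JSON canonicalization, the use of the first 8 hex digits, and the 0 to 5 hectare range are all made up for the example.

```python
import hashlib
import json
import random


def seed_from_geometry(geometry: dict) -> int:
    """Derive a stable integer seed from a GeoJSON-like geometry."""
    # sort_keys and fixed separators keep the serialization stable across runs.
    canonical = json.dumps(geometry, sort_keys=True, separators=(",", ":"))
    digest = hashlib.md5(canonical.encode("utf-8")).hexdigest()
    return int(digest[:8], 16)


def simulated_loss_hectares(geometry: dict) -> float:
    """Return a reproducible placeholder value (NOT real forest loss data)."""
    rng = random.Random(seed_from_geometry(geometry))  # private RNG, no global state
    return round(rng.uniform(0.0, 5.0), 2)


plot = {"type": "Polygon", "coordinates": [[[0, 0], [0, 1], [1, 1], [0, 0]]]}
# The same plot always yields the same simulated value.
assert simulated_loss_hectares(plot) == simulated_loss_hectares(plot)
```

Using a private random.Random instance instead of random.seed() avoids disturbing any other code that relies on the global generator. Simulated results should be clearly flagged wherever they are shown so they are not mistaken for real measurements.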

Step 4

SMTP Email Service

The system sends emails at key moments: account activation, analysis-complete notifications, batch summaries with PDF attachments, and failure alerts. The email service uses SMTP (typically Gmail) with HTML templates styled to match the application branding.

Key methods include send_activation_email(), send_analysis_complete_email(), send_batch_complete_email(), and send_analysis_failed_email(). Enterprise-tier users receive enhanced PDF attachments with satellite imagery and NDVI analysis, while free-tier users get a simpler summary report.
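A message of this shape can be assembled with Python's standard email package. The sketch below is a hypothetical stand-in for a method like send_analysis_complete_email(): the function name, subject line, HTML body, and attachment filename are assumptions, and the actual sending step is shown only as a comment because it needs real SMTP credentials.

```python
from email.message import EmailMessage


def build_analysis_complete_email(sender, recipient, plot_name, pdf_bytes=None):
    """Build (but do not send) an analysis-complete notification."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = f"Analysis complete: {plot_name}"
    # Plain-text part first, then an HTML alternative for rich clients.
    msg.set_content(f"The analysis for {plot_name} has finished.")
    msg.add_alternative(
        f"<html><body><p>The analysis for <b>{plot_name}</b> has finished.</p></body></html>",
        subtype="html",
    )
    if pdf_bytes is not None:
        # Optional PDF report, e.g. for tiers that receive attachments.
        msg.add_attachment(
            pdf_bytes, maintype="application", subtype="pdf", filename="report.pdf"
        )
    return msg


# Sending (not executed here) would look roughly like:
# import smtplib
# with smtplib.SMTP("smtp.gmail.com", 587) as smtp:
#     smtp.starttls()
#     smtp.login(user, password)
#     smtp.send_message(msg)
```

Separating message construction from sending keeps the templates testable without a mail server.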

Working with External Services

How to switch satellite data providers

  1. Open your .env file (or create one from .env.example).
  2. Change the DATA_PROVIDER variable. Options: gee, planetary, or cdse.
    DATA_PROVIDER=planetary
  3. If switching to Planetary Computer, add PLANETARY_COMPUTER_API_KEY and install extra packages:
    pip install planetary-computer pystac-client stackstac rioxarray
  4. Restart the backend server. All analysis code uses the same DataProvider interface, so nothing else needs to change.

How to add a new data provider

  1. Create a new file in backend/providers/, e.g. my_provider.py.
  2. Define a class that extends DataProvider (from providers/base.py).
  3. Implement the required methods: initialize(), get_forest_loss_alerts(), get_radar_alerts(), get_land_cover(), get_satellite_imagery(), and get_ndvi_analysis().
  4. Each method must return the correct dataclass from providers/base.py (AlertResult, LandCoverResult, ImageryResult, or NDVIResult).
  5. Register it in providers/factory.py by adding an elif branch in get_provider().
  6. Set DATA_PROVIDER=my_provider in .env and restart.
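The steps above can be sketched as a skeleton. Everything here is illustrative: the DataProvider and AlertResult definitions are stand-ins so the example runs on its own, the AlertResult fields are guesses (this tutorial does not show the real fields in providers/base.py), and only two of the six required methods are included for brevity.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field


# Stand-ins for providers/base.py; the real field names may differ.
@dataclass
class AlertResult:
    total_loss_ha: float
    yearly_loss_ha: dict = field(default_factory=dict)
    source: str = ""


class DataProvider(ABC):
    @abstractmethod
    def initialize(self) -> bool: ...

    @abstractmethod
    def get_forest_loss_alerts(self, geometry, start, end) -> AlertResult: ...


class MyProvider(DataProvider):
    """Hypothetical provider that translates a vendor response into AlertResult."""

    def initialize(self) -> bool:
        # A real provider would authenticate against the vendor API here.
        return True

    def get_forest_loss_alerts(self, geometry, start, end) -> AlertResult:
        # Fake vendor rows standing in for an API response.
        vendor_rows = [{"year": 2021, "ha": 1.5}, {"year": 2022, "ha": 0.5}]
        yearly = {row["year"]: row["ha"] for row in vendor_rows}
        return AlertResult(sum(yearly.values()), yearly, source="my_provider")


provider = MyProvider()
if provider.initialize():
    print(provider.get_forest_loss_alerts({}, "2021-01-01", "2022-12-31"))
```

Because the base class is abstract, forgetting to implement a required method fails at instantiation time rather than in the middle of an analysis.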

How to configure GEE authentication

  1. Create a service account in the Google Cloud Console with Earth Engine access.
  2. Download the JSON key file for that service account.
  3. Set the following in your .env:
    GEE_SERVICE_ACCOUNT=true
    GEE_PROJECT_ID=your-project-id
    GEE_SERVICE_ACCOUNT_EMAIL=sa@your-project.iam.gserviceaccount.com
    GEE_SERVICE_ACCOUNT_KEY_FILE=/path/to/keyfile.json
  4. Restart the backend. The GEE provider calls ee.Initialize(credentials) at startup using these values.
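How these variables might feed into ee.Initialize can be sketched as below. The helper names load_gee_settings and initialize_gee are hypothetical, and the validation logic is an assumption; ee.ServiceAccountCredentials and ee.Initialize are the Earth Engine Python API calls for service-account authentication.

```python
import os


def load_gee_settings(env=os.environ) -> dict:
    """Collect the GEE settings from the environment and validate them."""
    settings = {
        "use_service_account": env.get("GEE_SERVICE_ACCOUNT", "false").lower() == "true",
        "project_id": env.get("GEE_PROJECT_ID"),
        "email": env.get("GEE_SERVICE_ACCOUNT_EMAIL"),
        "key_file": env.get("GEE_SERVICE_ACCOUNT_KEY_FILE"),
    }
    if settings["use_service_account"]:
        missing = [k for k in ("project_id", "email", "key_file") if not settings[k]]
        if missing:
            raise ValueError(f"missing GEE settings: {', '.join(missing)}")
    return settings


def initialize_gee(settings: dict) -> None:
    """Authenticate with a service account key file (requires earthengine-api)."""
    import ee  # imported lazily so the settings loader works without the package

    credentials = ee.ServiceAccountCredentials(settings["email"], settings["key_file"])
    ee.Initialize(credentials, project=settings["project_id"])
```

Validating the settings before calling ee.Initialize turns a missing variable into a clear startup error instead of an authentication failure later on.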

How to change database connection settings

  1. Open your .env file.
  2. Set any combination of these variables:
    DB_HOST=localhost
    DB_PORT=5432
    DB_NAME=eudr_forest_analyzer
    DB_USER=postgres
    DB_PASSWORD=secret
    DB_POOL_SIZE=5
    DB_MAX_OVERFLOW=10
  3. Restart the backend. The connection pool is recreated with the new settings.
  4. To verify the connection, check the startup logs for “Database connection pool created” (no errors).

Why These Design Choices?

The Provider Pattern

Universal Power Adapter

Think of the provider abstraction like a universal power adapter. When you travel to a different country, you plug in a different adapter, but your laptop works the same way. The factory pattern in factory.py selects the adapter; the rest of the application just plugs in.

GEE, Microsoft Planetary Computer, and Copernicus Data Space all offer similar satellite data — Sentinel-2 imagery, land cover maps, forest loss indicators — but with completely different APIs and authentication schemes. The provider abstraction normalizes them into a shared set of dataclasses: AlertResult, LandCoverResult, ImageryResult, and NDVIResult.

This means a new provider can be added (say, a commercial satellite vendor) without touching any business logic. You only write the translation layer between their API and the standard dataclasses.

Why Connection Pooling?

Without connection pooling, every HTTP request would open a fresh TCP connection to PostgreSQL. That means a TCP handshake, SSL negotiation, authentication, and connection setup — all before a single query runs. Under load, this can add hundreds of milliseconds per request and exhaust the database’s connection limit.

With pooling, a set of connections is created once at startup and kept alive. When a request needs the database, it borrows a connection from the pool, runs its queries, and returns it. The pool also handles stale connections: it can test each one with a lightweight SELECT 1 before handing it out. If the test fails (connection dropped, network blip), the pool silently creates a fresh one.
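The borrow, test, and return cycle can be sketched as a context manager. This is an illustration, not the application's code: borrowed_connection is a hypothetical helper, and it assumes a pool object with psycopg2-style getconn() and putconn(conn, close=...) methods. psycopg2's built-in pools do not appear to run a liveness query on their own, so the check is written out explicitly here.

```python
from contextlib import contextmanager


@contextmanager
def borrowed_connection(conn_pool):
    """Borrow a connection, verify it with SELECT 1, and always return it."""
    conn = conn_pool.getconn()
    try:
        with conn.cursor() as cur:
            cur.execute("SELECT 1")  # cheap liveness probe
        conn.rollback()  # end the transaction the probe implicitly opened
    except Exception:
        # Stale connection: discard it and take a replacement from the pool.
        conn_pool.putconn(conn, close=True)
        conn = conn_pool.getconn()
    try:
        yield conn
    finally:
        conn_pool.putconn(conn)  # hand the connection back even on errors
```

Request handlers would then write `with borrowed_connection(pool) as conn:` and never call putconn themselves, which prevents leaked connections when a query raises.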

The configuration has two knobs: pool_size (minimum connections always ready, default 5) and max_overflow (how many extra connections can be created under burst traffic, default 10). This keeps memory usage predictable while still handling spikes.

GEE Gotchas

ImageCollection vs. Image — the #1 GEE mistake

RADD alerts (projects/radar-wur/raddalert/v1) and ESA WorldCover (ESA/WorldCover/v200) are both ImageCollections, not single Images. Loading them with ee.Image("...") causes the error: “Image.load: Asset is not an Image.”

The fix: use ee.ImageCollection("...").first() for WorldCover (since it contains a single global mosaic) or filter by date range and reduce with .max() for RADD.

Another common trap: RADD’s ImageCollection contains mixed image types — some have alert bands, others have baseline bands. Calling .max() on the full collection triggers “Expected a homogeneous image collection.” The fix: .select('Alert') before .max() so all images have the same band structure.
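Both fixes can be written down in a few lines. In this sketch the function names are hypothetical and the optional ee_api parameter exists only so the logic can be exercised without the earthengine-api package; the chained calls (ImageCollection, filterDate, select, max, first) are the standard Earth Engine Python API methods.

```python
WORLDCOVER_ID = "ESA/WorldCover/v200"
RADD_ID = "projects/radar-wur/raddalert/v1"


def load_worldcover(ee_api=None):
    """Return WorldCover as a single image instead of calling ee.Image(...)."""
    if ee_api is None:
        import ee as ee_api  # requires earthengine-api and an initialized session
    # The collection holds one global mosaic, so first() yields that image.
    return ee_api.ImageCollection(WORLDCOVER_ID).first()


def radd_alert_composite(start, end, ee_api=None):
    """Composite RADD alerts for a date range into one image."""
    if ee_api is None:
        import ee as ee_api
    return (
        ee_api.ImageCollection(RADD_ID)
        .filterDate(start, end)
        .select("Alert")  # select before max() so every image has the same band
        .max()
    )
```

With a real session, `radd_alert_composite("2023-01-01", "2023-12-31")` returns a server-side image object; nothing is computed until a result is requested.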

GEE vs CDSE: When to Use Which

| Aspect | GEE Provider | CDSE Provider |
| --- | --- | --- |
| Architecture | Thin wrapper (392 lines) delegating to 4 specialized services | Self-contained (1,920 lines) with 3 internal helper classes |
| Forest Loss | Hansen GFC via GEE tiles (direct GeoTIFF) | Hansen GFC via GFW Data API (REST POST) |
| Radar Alerts | RADD via GEE Python API (ee.ImageCollection) | GFW Integrated Alerts (GLAD-L + GLAD-S2 + RADD combined) |
| Land Cover | ESA WorldCover via GEE | ESA WorldCover via Terrascope WMS/STAC with fallback chain |
| Imagery | Sentinel-2 via GEE | Sentinel-2 via SentinelHub Process API (evalscripts) |
| Authentication | Service account key file (ee.Initialize) | OAuth2 client credentials + GFW API key |
| Offline Fallback | None (requires live GEE connection) | Deterministic simulation via MD5 hash seed |
| Best For | Custom geospatial computation, GEE ecosystem | EU-hosted data, GFW integration, resilience |

The current production deployment uses CDSE because it provides a single integration point for both European Copernicus data and GFW's global alert system, with built-in resilience when external APIs are unavailable.

External Service Reference

Satellite Datasets

| Dataset | ID | Resolution | Coverage | Band Used |
| --- | --- | --- | --- | --- |
| Hansen GFC | UMD/hansen/global_forest_change_2023_v1_11 | 30 m | Global (2001+) | lossyear |
| RADD | projects/radar-wur/raddalert/v1 | 10 m | Tropical (2019+) | Alert |
| ESA WorldCover | ESA/WorldCover/v200 | 10 m | Global (2021) | Map |
| Sentinel-2 L2A | COPERNICUS/S2_SR_HARMONIZED | 10 m | Global | B2-B4, B8 (RGB+NIR) |

Provider Interface Methods

| Method | Returns | Description |
| --- | --- | --- |
| initialize() | bool | Connect to the data source and authenticate |
| get_forest_loss_alerts(geometry, start, end) | AlertResult | Forest loss alerts (optical/Landsat-based) |
| get_radar_alerts(geometry, start, end) | AlertResult | Radar-based deforestation alerts (Sentinel-1) |
| get_land_cover(geometry, year) | LandCoverResult | Land cover classification (tree, crop, water, etc.) |
| get_satellite_imagery(geometry, date) | ImageryResult | True-color satellite image for a given date |
| get_ndvi_analysis(geometry, baseline, current) | NDVIResult | Vegetation index comparison between two years |

Environment Variables

| Variable | Default | Description |
| --- | --- | --- |
| DATA_PROVIDER | gee | Which satellite provider to use (gee, planetary, cdse) |
| GEE_SERVICE_ACCOUNT | false | Use service account authentication for GEE |
| GEE_PROJECT_ID | (none) | Google Earth Engine project ID |
| DB_HOST | localhost | PostgreSQL database host |
| DB_POOL_SIZE | 5 | Minimum connections kept in the connection pool |
| SMTP_HOST | smtp.gmail.com | SMTP email server hostname |

CDSE Provider API Endpoints

| Service | URL | Method | Purpose |
| --- | --- | --- | --- |
| GFW Tree Cover Loss | data-api.globalforestwatch.org/dataset/umd_tree_cover_loss/v1.12/query | POST | Hansen annual forest loss by year |
| GFW Integrated Alerts | data-api.globalforestwatch.org/dataset/gfw_integrated_alerts/latest/query | POST | Combined GLAD + RADD alerts |
| CDSE OAuth2 Token | identity.dataspace.copernicus.eu/.../token | POST | Access token for CDSE APIs |
| SentinelHub Process | sh.dataspace.copernicus.eu/api/v1/process | POST | Sentinel-2 imagery with evalscripts |
| Terrascope WMS | services.terrascope.be/wms/v2 | GET | WorldCover map rendering |
| Terrascope STAC | services.terrascope.be/stac | GET | WorldCover catalog search |

Provider Implementation Comparison

| Feature | GEE | CDSE | Planetary |
| --- | --- | --- | --- |
| Forest Loss Alerts | ✔ Full | ✔ Full (GFW API) | ❌ Stub |
| Radar Alerts | ✔ Full (RADD) | ✔ Full (GFW Integrated) | ❌ Stub |
| Land Cover | ✔ Full | ✔ Full | ✔ Full |
| Land Cover Image | ✔ Full | ✔ Full | ❌ Stub |
| Satellite Imagery | ✔ Full | ✔ Full | ✔ Full |
| NDVI Analysis | ✔ Full | ✔ Full | ✔ Full |
| Offline Fallback | ❌ No | ✔ Simulated | ❌ No |
| Lines of Code | 392 | 1,920 | 746 |
| Status | Production-ready | Active in production | Partial |

CDSE Environment Variables

| Variable | Description |
| --- | --- |
| DATA_PROVIDER | Set to cdse to use Copernicus provider |
| CDSE_CLIENT_ID | OAuth2 client ID from Copernicus Data Space |
| CDSE_CLIENT_SECRET | OAuth2 client secret |
| GFW_API_KEY | Global Forest Watch API key for tree cover loss and alerts |