Satellites & External Services
Copernicus CDSE, Google Earth Engine, PostGIS, and the provider abstraction.
Understand Each External Service
The EUDR Forest Analyzer does not generate satellite data itself. It relies on several external services to fetch imagery, detect deforestation, classify land cover, and store geospatial results. This tutorial introduces each one.
Google Earth Engine
GEE is a cloud platform with petabytes of satellite data. Instead of downloading images to your own server, you send computation requests to Google’s infrastructure and get back just the results you need.
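This application uses four datasets from GEE: Hansen Global Forest Change, RADD alerts, ESA WorldCover, and Sentinel-2 L2A. Their asset IDs, resolutions, and bands are listed under Satellite Datasets in the External Service Reference.
In the Hansen dataset, the lossyear band tells you in which year tree cover was lost (values 1–23 for 2001–2023).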
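As a small illustration of the lossyear encoding, the sketch below converts band values into calendar years. The function name, the input shape (a mapping from band value to hectares), and the sample numbers are assumptions made for this example, not the application's code.

```python
from typing import Dict


def loss_by_year(hectares_by_value: Dict[int, float]) -> Dict[int, float]:
    """Map Hansen lossyear values (1-23) to calendar years (2001-2023).

    A value of 0 means no loss was detected, so it is skipped.
    """
    yearly = {}
    for value, hectares in sorted(hectares_by_value.items()):
        if value == 0:
            continue
        if not 1 <= value <= 23:
            raise ValueError(f"unexpected lossyear value: {value}")
        yearly[2000 + value] = hectares
    return yearly


# Hypothetical per-value areas for one plot, in hectares.
sample = {0: 41.5, 5: 0.8, 23: 2.1}
print(loss_by_year(sample))  # {2005: 0.8, 2023: 2.1}
```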
Authentication happens via a service account key file. The backend calls ee.Initialize(credentials) at startup, and every subsequent GEE call reuses that session.
PostgreSQL + PostGIS
PostGIS turns a regular PostgreSQL database into a spatial database. Plot boundaries are stored as geometry columns, so you can run queries like “does this polygon overlap a protected area?”
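A query of that kind might look like the sketch below. ST_Intersects is a standard PostGIS function, but the table names (plots, protected_areas), the geom columns, and the helper function are hypothetical placeholders, since the schema is not shown here.

```python
# Hypothetical schema: plots(plot_id, geom) and protected_areas(geom).
OVERLAP_SQL = """
    SELECT EXISTS (
        SELECT 1
        FROM plots p
        JOIN protected_areas a ON ST_Intersects(p.geom, a.geom)
        WHERE p.plot_id = %s
    ) AS overlaps
"""


def plot_overlaps_protected_area(cursor, plot_id) -> bool:
    """Return True if the plot's polygon intersects any protected area.

    Expects a cursor that returns rows as dictionaries (for example one
    created with RealDictCursor), so the result is read by column name.
    """
    cursor.execute(OVERLAP_SQL, (plot_id,))
    row = cursor.fetchone()
    return bool(row["overlaps"])
```

The plot ID is passed as a query parameter rather than formatted into the SQL string, which keeps the query safe against injection.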
The app uses connection pooling to avoid the cost of opening a new connection for every request:
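    import os
    from psycopg2 import pool
    from psycopg2.extras import RealDictCursor

    # pool_size and max_overflow are set before this call (defaults 5 and 10).
    _connection_pool = pool.ThreadedConnectionPool(
        minconn=pool_size,
        maxconn=pool_size + max_overflow,
        host=os.getenv("DB_HOST", "localhost"),
        database=os.getenv("DB_NAME", "eudr_forest_analyzer"),
        cursor_factory=RealDictCursor,
    )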
- Keep pool_size connections (default 5) ready at all times. Allow up to pool_size + max_overflow (default 15) when traffic is heavy.
- RealDictCursor returns each row as a dictionary, so you can write row["plot_id"] instead of row[0].
The Provider Abstraction
The codebase supports multiple satellite data backends. A factory pattern in providers/factory.py decides which one to use based on a single environment variable:
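    import os

    def get_provider() -> DataProvider:
        # Choose the backend from DATA_PROVIDER; "gee" is the default.
        provider_type = os.getenv("DATA_PROVIDER", "gee")
        if provider_type == "gee":
            return GEEProvider()
        elif provider_type == "planetary":
            return PlanetaryComputerProvider()
        elif provider_type == "cdse":
            return CDSEProvider()
        # Fail loudly instead of silently returning None for an unknown value.
        raise ValueError(f"Unknown DATA_PROVIDER: {provider_type!r}")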
- The factory reads DATA_PROVIDER from the environment. Default is "gee" (Google Earth Engine).
- Every provider implements the same interface (DataProvider), so the rest of the application does not care which backend it is talking to.
All analysis services call get_provider() and use the returned object. They never import GEEProvider directly.
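The same selection logic can also be written as a lookup table, which some teams find easier to extend. The sketch below is a variant for illustration, not the project's factory: the classes are empty stand-ins for the real ones, and the _PROVIDERS registry is an assumption.

```python
import os


class DataProvider:
    """Stand-in for the shared interface (the real one is in providers/base.py)."""


class GEEProvider(DataProvider):
    pass


class PlanetaryComputerProvider(DataProvider):
    pass


class CDSEProvider(DataProvider):
    pass


# Hypothetical registry mapping DATA_PROVIDER values to provider classes.
_PROVIDERS = {
    "gee": GEEProvider,
    "planetary": PlanetaryComputerProvider,
    "cdse": CDSEProvider,
}


def get_provider() -> DataProvider:
    """Instantiate the provider named by DATA_PROVIDER (default "gee")."""
    provider_type = os.getenv("DATA_PROVIDER", "gee").strip().lower()
    try:
        provider_cls = _PROVIDERS[provider_type]
    except KeyError:
        raise ValueError(f"Unknown DATA_PROVIDER: {provider_type!r}") from None
    return provider_cls()


os.environ["DATA_PROVIDER"] = "cdse"
provider = get_provider()
print(type(provider).__name__)  # CDSEProvider
```

Callers only ever see a DataProvider, so swapping the environment variable is the only change needed to run against a different backend.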
GEE Provider: How It Works Under the Hood
The GEE provider (providers/gee_provider.py, 392 lines) is a delegation layer. It doesn't call GEE directly — instead, it wraps existing specialized services that were built before the provider abstraction was added:
Forest Loss Alerts
Delegates to hansen_tile_service which downloads Hansen GFC GeoTIFF tiles directly and analyzes the lossyear band. Returns yearly breakdown of tree cover loss in hectares.
Radar Alerts (RADD)
Delegates to radd_alert_service which queries the RADD ImageCollection via GEE's Python API. Filters by date, selects the Alert band, composites with .max().
Land Cover
Delegates to land_cover_service which queries ESA WorldCover via GEE. Returns per-class area statistics (tree cover, cropland, water, etc.) and a color-coded classification image.
Satellite Imagery & NDVI
Delegates to sentinel_imagery_service which queries Sentinel-2 L2A. Applies cloud masking using the SCL band, generates three-period comparisons and NDVI change maps.
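To make the NDVI and cloud-masking step concrete, here is a minimal pure-Python sketch. NDVI is computed as (NIR - Red) / (NIR + Red), using Sentinel-2 bands B8 and B4. Which SCL classes the service actually masks is not stated here, so the MASKED_SCL set is an assumption, as are the function names and the sample reflectances.

```python
from typing import Iterable, Optional, Tuple

# Assumed mask: 3 = cloud shadow, 8/9 = cloud (medium/high probability),
# 10 = thin cirrus. The real service may mask a different set.
MASKED_SCL = {3, 8, 9, 10}


def ndvi(red: float, nir: float) -> Optional[float]:
    """NDVI = (NIR - Red) / (NIR + Red); None when both bands are zero."""
    total = nir + red
    if total == 0:
        return None
    return (nir - red) / total


def mean_ndvi(pixels: Iterable[Tuple[float, float, int]]) -> Optional[float]:
    """Average NDVI over (red B4, nir B8, scl) pixels, skipping masked ones."""
    values = []
    for red, nir, scl in pixels:
        if scl in MASKED_SCL:
            continue
        value = ndvi(red, nir)
        if value is not None:
            values.append(value)
    return sum(values) / len(values) if values else None


# Hypothetical reflectances: a vegetated baseline and a cleared current period.
baseline = [(0.05, 0.45, 4), (0.06, 0.40, 4), (0.30, 0.32, 9)]  # last is cloud
current = [(0.15, 0.25, 5), (0.14, 0.22, 5)]
change = mean_ndvi(current) - mean_ndvi(baseline)
print(f"NDVI change: {change:+.2f}")
```

A clearly negative change between periods is the signal an NDVI change map highlights.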
Authentication: Uses a GEE service account key file. At startup, calls ee.Initialize(credentials). All subsequent GEE queries reuse this authenticated session.
The GEE provider is the thinnest layer (392 lines) because the real complexity lives in the specialized services it delegates to. This is a common pattern: wrap legacy code behind a new interface without rewriting it.
CDSE Provider: The Active Production System
The CDSE provider (providers/cdse_provider.py, 1,920 lines — the most comprehensive) is what this system currently runs in production (DATA_PROVIDER=cdse). Unlike GEE, it's a self-contained implementation that talks to multiple external APIs directly:
Three Internal Helper Classes
GFWDataAPI
Wrapper for the Global Forest Watch Data API. Queries Hansen tree cover loss via POST /dataset/umd_tree_cover_loss/v1.12/query and integrated alerts (GLAD-L + GLAD-S2 + RADD combined) via /dataset/gfw_integrated_alerts/latest/query.
ESAWorldCoverSTAC
Complex land cover processor with a multi-step fallback chain: first tries WMS from Terrascope, then falls back to STAC catalog search + COG windowed reads. Computes per-class area statistics.
CDSEProvider (main class)
Orchestrates everything. Authenticates via OAuth2 with Copernicus Data Space. Fetches Sentinel-2 imagery via the SentinelHub Process API with custom evalscripts for RGB and NDVI rendering.
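The fallback chain described for ESAWorldCoverSTAC follows a general pattern: try each source in order and return the first one that succeeds. The sketch below shows that pattern only. The helper name, the stand-in fetchers, and the returned statistics are hypothetical and do not reproduce the provider's internals.

```python
from typing import Callable, Sequence, Tuple

Source = Tuple[str, Callable[[dict], dict]]


def first_successful(geometry: dict, sources: Sequence[Source]) -> dict:
    """Try each source in order and return the first result that succeeds."""
    errors = []
    for name, fetch in sources:
        try:
            stats = fetch(geometry)
        except Exception as exc:  # a real chain would catch narrower errors
            errors.append(f"{name}: {exc}")
            continue
        return {"source": name, "stats": stats}
    raise RuntimeError("all land cover sources failed: " + "; ".join(errors))


# Stand-in fetchers (hypothetical): the first one simulates an outage.
def fetch_from_wms(geometry: dict) -> dict:
    raise ConnectionError("WMS unavailable")


def fetch_from_stac(geometry: dict) -> dict:
    return {"tree_cover_ha": 12.4, "cropland_ha": 3.1}


result = first_successful(
    {"type": "Polygon", "coordinates": []},
    [("terrascope_wms", fetch_from_wms), ("stac_cog", fetch_from_stac)],
)
print(result["source"])  # stac_cog
```

Recording which source answered makes it possible to tell from the result whether the primary path or a fallback was used.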
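API Endpoints It Calls
The endpoints are listed in the CDSE Provider API Endpoints table under External Service Reference.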
When the GFW API is unavailable, CDSE generates reproducible simulated data using an MD5 hash of the geometry as a random seed. This means the same plot always gets the same simulated result — useful for testing and demos, and it prevents the system from crashing when an external API is down.
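The idea of seeding a random generator from an MD5 hash of the geometry can be sketched as follows. How the provider serializes the geometry, and what values it simulates, is not shown here, so the canonical-JSON step, the function names, and the value ranges are all assumptions.

```python
import hashlib
import json
import random


def geometry_seed(geometry: dict) -> int:
    """Derive a stable integer seed from a GeoJSON-like geometry."""
    # Sorting keys makes the hash independent of dictionary key order.
    canonical = json.dumps(geometry, sort_keys=True, separators=(",", ":"))
    digest = hashlib.md5(canonical.encode("utf-8")).hexdigest()
    return int(digest[:8], 16)


def simulate_loss_hectares(geometry: dict, years=range(2021, 2024)) -> dict:
    """Return reproducible placeholder loss figures (hypothetical ranges)."""
    rng = random.Random(geometry_seed(geometry))
    return {year: round(rng.uniform(0.0, 2.0), 2) for year in years}


plot = {"type": "Polygon", "coordinates": [[[0, 0], [0, 1], [1, 1], [0, 0]]]}
first = simulate_loss_hectares(plot)
second = simulate_loss_hectares(plot)
print(first == second)  # True
```

Using a private random.Random instance, rather than seeding the global generator, keeps the simulation from affecting other code that relies on randomness.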
SMTP Email Service
The system sends emails at key moments: account activation, analysis complete notifications, batch summaries with PDF attachments, and failure alerts. The email service uses SMTP (typically Gmail) with HTML templates styled to match the application branding.
Key methods include send_activation_email(), send_analysis_complete_email(), send_batch_complete_email(), and send_analysis_failed_email(). Enterprise-tier users receive enhanced PDF attachments with satellite imagery and NDVI analysis, while free-tier users get a simpler summary report.
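An HTML email with an optional PDF attachment can be assembled with Python's standard library, roughly as below. This is a sketch under assumptions: the function name, subject line, body text, and the report.pdf filename are placeholders, and the service's real templates are not shown.

```python
from email.message import EmailMessage
from typing import Optional


def build_analysis_complete_message(
    sender: str, recipient: str, plot_name: str, pdf_bytes: Optional[bytes] = None
) -> EmailMessage:
    """Build a plain-text + HTML message, attaching a PDF when one is given."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = f"Analysis complete: {plot_name}"
    msg.set_content(f"The analysis for {plot_name} has finished.")
    msg.add_alternative(
        f"<html><body><p>The analysis for <b>{plot_name}</b> has finished."
        "</p></body></html>",
        subtype="html",
    )
    if pdf_bytes is not None:
        msg.add_attachment(
            pdf_bytes, maintype="application", subtype="pdf", filename="report.pdf"
        )
    return msg


message = build_analysis_complete_message(
    "noreply@example.com", "user@example.com", "Plot 7", b"%PDF-1.4 placeholder"
)
print(message.get_content_type())  # multipart/mixed
```

Sending would then go through smtplib, for example smtplib.SMTP(host, port) followed by starttls(), login(), and send_message(message), with host and credentials taken from configuration.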
Working with External Services
How to switch satellite data providers
- Open your .env file (or create one from .env.example).
- Change the DATA_PROVIDER variable. Options: gee, planetary, or cdse. For example:
    DATA_PROVIDER=planetary
- If switching to Planetary Computer, add PLANETARY_COMPUTER_API_KEY and install the extra packages:
    pip install planetary-computer pystac-client stackstac rioxarray
- Restart the backend server. All analysis code uses the same DataProvider interface, so nothing else needs to change.
How to add a new data provider
- Create a new file in backend/providers/, e.g. my_provider.py.
- Define a class that extends DataProvider (from providers/base.py).
- Implement the required methods: initialize(), get_forest_loss_alerts(), get_radar_alerts(), get_land_cover(), get_satellite_imagery(), and get_ndvi_analysis().
- Each method must return the correct dataclass from providers/base.py (AlertResult, LandCoverResult, ImageryResult, or NDVIResult).
- Register it in providers/factory.py by adding an elif branch in get_provider().
- Set DATA_PROVIDER=my_provider in .env and restart.
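A skeleton for such a provider might look like the sketch below. To keep it self-contained, it defines its own stand-ins for DataProvider and AlertResult. The AlertResult fields are placeholders, and only three of the six methods are shown, so the real definitions in providers/base.py will differ.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class AlertResult:
    # Placeholder fields; the real dataclass lives in providers/base.py.
    alert_count: int = 0
    area_hectares: float = 0.0


class DataProvider(ABC):
    """Stand-in for the shared interface, reduced to three methods."""

    @abstractmethod
    def initialize(self) -> bool: ...

    @abstractmethod
    def get_forest_loss_alerts(self, geometry, start, end) -> AlertResult: ...

    @abstractmethod
    def get_radar_alerts(self, geometry, start, end) -> AlertResult: ...


class MyProvider(DataProvider):
    """Hypothetical provider that translates a vendor API into AlertResult."""

    def initialize(self) -> bool:
        # Authenticate against the vendor API here.
        self._ready = True
        return self._ready

    def get_forest_loss_alerts(self, geometry, start, end) -> AlertResult:
        # Call the vendor API, then map its response onto the dataclass.
        return AlertResult(alert_count=0, area_hectares=0.0)

    def get_radar_alerts(self, geometry, start, end) -> AlertResult:
        return AlertResult(alert_count=0, area_hectares=0.0)


provider = MyProvider()
print(provider.initialize())  # True
```

Because the base class is abstract, forgetting to implement one of the required methods raises a TypeError at instantiation rather than failing later during an analysis.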
How to configure GEE authentication
- Create a service account in the Google Cloud Console with Earth Engine access.
- Download the JSON key file for that service account.
- Set the following in your .env:
    GEE_SERVICE_ACCOUNT=true
    GEE_PROJECT_ID=your-project-id
    GEE_SERVICE_ACCOUNT_EMAIL=sa@your-project.iam.gserviceaccount.com
    GEE_SERVICE_ACCOUNT_KEY_FILE=/path/to/keyfile.json
- Restart the backend. The GEE provider calls ee.Initialize(credentials) at startup using these values.
How to change database connection settings
- Open your .env file.
- Set any combination of these variables:
    DB_HOST=localhost
    DB_PORT=5432
    DB_NAME=eudr_forest_analyzer
    DB_USER=postgres
    DB_PASSWORD=secret
    DB_POOL_SIZE=5
    DB_MAX_OVERFLOW=10
- Restart the backend. The connection pool is recreated with the new settings.
- To verify the connection, check the startup logs for “Database connection pool created” (no errors).
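To show how these variables translate into pool limits, here is a minimal sketch that reads them with defaults. The DbSettings class and load_db_settings function are hypothetical. The defaults for host, database name, pool size, and overflow follow the values given in this tutorial, while the 5432 default is simply PostgreSQL's standard port.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class DbSettings:
    host: str
    port: int
    name: str
    pool_size: int
    max_overflow: int

    @property
    def max_connections(self) -> int:
        """Upper bound on open connections: pool_size + max_overflow."""
        return self.pool_size + self.max_overflow


def load_db_settings(env: Mapping[str, str] = os.environ) -> DbSettings:
    """Read DB_* variables, falling back to defaults when they are unset."""
    return DbSettings(
        host=env.get("DB_HOST", "localhost"),
        port=int(env.get("DB_PORT", "5432")),
        name=env.get("DB_NAME", "eudr_forest_analyzer"),
        pool_size=int(env.get("DB_POOL_SIZE", "5")),
        max_overflow=int(env.get("DB_MAX_OVERFLOW", "10")),
    )


settings = load_db_settings({})
print(settings.max_connections)  # 15
```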
Why These Design Choices?
The Provider Pattern
Think of the provider abstraction like a universal power adapter. When you travel to a different country, you plug in a different adapter, but your laptop works the same way. The factory pattern in factory.py selects the adapter; the rest of the application just plugs in.
GEE, Microsoft Planetary Computer, and Copernicus Data Space all offer similar satellite data — Sentinel-2 imagery, land cover maps, forest loss indicators — but with completely different APIs and authentication schemes. The provider abstraction normalizes them into a shared set of dataclasses: AlertResult, LandCoverResult, ImageryResult, and NDVIResult.
This means a new provider can be added (say, a commercial satellite vendor) without touching any business logic. You only write the translation layer between their API and the standard dataclasses.
Why Connection Pooling?
Without connection pooling, every HTTP request would open a fresh TCP connection to PostgreSQL. That means a TCP handshake, SSL negotiation, authentication, and connection setup — all before a single query runs. Under load, this can add hundreds of milliseconds per request and exhaust the database’s connection limit.
With pooling, a set of connections is created once at startup and kept alive. When a request needs the database, it borrows a connection from the pool, runs its queries, and returns it. Stale connections are handled by testing a connection with a lightweight SELECT 1 before it is handed out. If the test fails (connection dropped, network blip), the stale connection is discarded and a fresh one takes its place. Note that psycopg2's ThreadedConnectionPool does not run this check on its own, so it has to be done by the code that borrows the connection.
The configuration has two knobs: pool_size (minimum connections always ready, default 5) and max_overflow (how many extra connections can be created under burst traffic, default 10). This keeps memory usage predictable while still handling spikes.
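The borrow, test, and return cycle can be sketched as a context manager. It relies only on the getconn() and putconn(conn, close=...) methods that psycopg2 pools provide. The helper names are hypothetical, and catching a bare Exception is a simplification of what production code would do.

```python
from contextlib import contextmanager


def _is_alive(conn) -> bool:
    """Run a lightweight SELECT 1 to check that the connection still works."""
    try:
        with conn.cursor() as cur:
            cur.execute("SELECT 1")
            cur.fetchone()
        conn.rollback()  # end the implicit transaction opened by the check
        return True
    except Exception:  # real code would catch the driver's specific errors
        return False


@contextmanager
def borrowed_connection(pool):
    """Borrow a connection, replacing it once if it turns out to be stale."""
    conn = pool.getconn()
    if not _is_alive(conn):
        # Discard the dead connection and ask the pool for another one.
        pool.putconn(conn, close=True)
        conn = pool.getconn()
    try:
        yield conn
    finally:
        # Always hand the connection back, even if the caller raised.
        pool.putconn(conn)
```

A caller would write `with borrowed_connection(_connection_pool) as conn:` and run its queries inside the block. This sketch retries only once, so a second stale connection would still reach the caller.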
GEE Gotchas
RADD alerts (projects/radar-wur/raddalert/v1) and ESA WorldCover (ESA/WorldCover/v200) are both ImageCollections, not single Images. Loading them with ee.Image("...") causes the error: “Image.load: Asset is not an Image.”
The fix: use ee.ImageCollection("...").first() for WorldCover (since it contains a single global mosaic) or filter by date range and reduce with .max() for RADD.
Another common trap: RADD’s ImageCollection contains mixed image types — some have alert bands, others have baseline bands. Calling .max() on the full collection triggers “Expected a homogeneous image collection.” The fix: .select('Alert') before .max() so all images have the same band structure.
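The two fixes above can be written as short call chains. In the sketch below, the ee module is passed in as an argument so the chains can be read and exercised without Earth Engine credentials. That parameter and the function names are choices made for this example. ImageCollection, first, filterDate, select, and max are the Earth Engine Python API calls named in the text.

```python
RADD_ID = "projects/radar-wur/raddalert/v1"
WORLDCOVER_ID = "ESA/WorldCover/v200"


def worldcover_image(ee):
    """Load WorldCover as an ImageCollection and take its single mosaic."""
    return ee.ImageCollection(WORLDCOVER_ID).first()


def radd_alert_composite(ee, start_date: str, end_date: str):
    """Filter RADD by date, keep only the Alert band, then reduce with max().

    Selecting 'Alert' before max() gives every image the same band
    structure, which avoids the homogeneous-collection error.
    """
    return (
        ee.ImageCollection(RADD_ID)
        .filterDate(start_date, end_date)
        .select("Alert")
        .max()
    )


# Usage with the real library (requires an authenticated session):
#   import ee
#   ee.Initialize(credentials)
#   alerts = radd_alert_composite(ee, "2024-01-01", "2024-06-30")
```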
GEE vs CDSE: When to Use Which
| Aspect | GEE Provider | CDSE Provider |
|---|---|---|
| Architecture | Thin wrapper (392 lines) delegating to 4 specialized services | Self-contained (1,920 lines) with 3 internal helper classes |
| Forest Loss | Hansen GFC via GEE tiles (direct GeoTIFF) | Hansen GFC via GFW Data API (REST POST) |
| Radar Alerts | RADD via GEE Python API (ee.ImageCollection) | GFW Integrated Alerts (GLAD-L + GLAD-S2 + RADD combined) |
| Land Cover | ESA WorldCover via GEE | ESA WorldCover via Terrascope WMS/STAC with fallback chain |
| Imagery | Sentinel-2 via GEE | Sentinel-2 via SentinelHub Process API (evalscripts) |
| Authentication | Service account key file (ee.Initialize) | OAuth2 client credentials + GFW API key |
| Offline Fallback | None — requires live GEE connection | Deterministic simulation via MD5 hash seed |
| Best For | Custom geospatial computation, GEE ecosystem | EU-hosted data, GFW integration, resilience |
The current production deployment uses CDSE because it provides a single integration point for both European Copernicus data and GFW's global alert system, with built-in resilience when external APIs are unavailable.
External Service Reference
Satellite Datasets
| Dataset | ID | Resolution | Coverage | Band Used |
|---|---|---|---|---|
| Hansen GFC | UMD/hansen/global_forest_change_2023_v1_11 | 30 m | Global (2001+) | lossyear |
| RADD | projects/radar-wur/raddalert/v1 | 10 m | Tropical (2019+) | Alert |
| ESA WorldCover | ESA/WorldCover/v200 | 10 m | Global (2021) | Map |
| Sentinel-2 L2A | COPERNICUS/S2_SR_HARMONIZED | 10 m | Global | B2-B4, B8 (RGB+NIR) |
Provider Interface Methods
| Method | Returns | Description |
|---|---|---|
| initialize() | bool | Connect to the data source and authenticate |
| get_forest_loss_alerts(geometry, start, end) | AlertResult | Forest loss alerts (optical/Landsat-based) |
| get_radar_alerts(geometry, start, end) | AlertResult | Radar-based deforestation alerts (Sentinel-1) |
| get_land_cover(geometry, year) | LandCoverResult | Land cover classification (tree, crop, water, etc.) |
| get_satellite_imagery(geometry, date) | ImageryResult | True-color satellite image for a given date |
| get_ndvi_analysis(geometry, baseline, current) | NDVIResult | Vegetation index comparison between two years |
Environment Variables
| Variable | Default | Description |
|---|---|---|
| DATA_PROVIDER | gee | Which satellite provider to use (gee, planetary, cdse) |
| GEE_SERVICE_ACCOUNT | false | Use service account authentication for GEE |
| GEE_PROJECT_ID | (none) | Google Earth Engine project ID |
| DB_HOST | localhost | PostgreSQL database host |
| DB_POOL_SIZE | 5 | Minimum connections kept in the connection pool |
| SMTP_HOST | smtp.gmail.com | SMTP email server hostname |
CDSE Provider API Endpoints
| Service | URL | Method | Purpose |
|---|---|---|---|
| GFW Tree Cover Loss | data-api.globalforestwatch.org/dataset/umd_tree_cover_loss/v1.12/query | POST | Hansen annual forest loss by year |
| GFW Integrated Alerts | data-api.globalforestwatch.org/dataset/gfw_integrated_alerts/latest/query | POST | Combined GLAD + RADD alerts |
| CDSE OAuth2 Token | identity.dataspace.copernicus.eu/.../token | POST | Access token for CDSE APIs |
| SentinelHub Process | sh.dataspace.copernicus.eu/api/v1/process | POST | Sentinel-2 imagery with evalscripts |
| Terrascope WMS | services.terrascope.be/wms/v2 | GET | WorldCover map rendering |
| Terrascope STAC | services.terrascope.be/stac | GET | WorldCover catalog search |
Provider Implementation Comparison
| Feature | GEE | CDSE | Planetary |
|---|---|---|---|
| Forest Loss Alerts | ✔ Full | ✔ Full (GFW API) | ❌ Stub |
| Radar Alerts | ✔ Full (RADD) | ✔ Full (GFW Integrated) | ❌ Stub |
| Land Cover | ✔ Full | ✔ Full | ✔ Full |
| Land Cover Image | ✔ Full | ✔ Full | ❌ Stub |
| Satellite Imagery | ✔ Full | ✔ Full | ✔ Full |
| NDVI Analysis | ✔ Full | ✔ Full | ✔ Full |
| Offline Fallback | ❌ No | ✔ Simulated | ❌ No |
| Lines of Code | 392 | 1,920 | 746 |
| Status | Production-ready | Active in production | Partial |
CDSE Environment Variables
| Variable | Description |
|---|---|
| DATA_PROVIDER | Set to cdse to use Copernicus provider |
| CDSE_CLIENT_ID | OAuth2 client ID from Copernicus Data Space |
| CDSE_CLIENT_SECRET | OAuth2 client secret |
| GFW_API_KEY | Global Forest Watch API key for tree cover loss and alerts |
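The client ID and secret feed an OAuth2 client-credentials request. The sketch below builds such a request with the standard library without sending it. The function name is hypothetical, and the token URL is left as a parameter because the full endpoint path is not given here. The example.invalid address is a placeholder only.

```python
import urllib.parse
import urllib.request


def build_token_request(
    token_url: str, client_id: str, client_secret: str
) -> urllib.request.Request:
    """Build a form-encoded POST for the OAuth2 client-credentials grant."""
    body = urllib.parse.urlencode(
        {
            "grant_type": "client_credentials",
            "client_id": client_id,
            "client_secret": client_secret,
        }
    ).encode("ascii")
    return urllib.request.Request(
        token_url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


# Placeholder values; real ones come from CDSE_CLIENT_ID / CDSE_CLIENT_SECRET.
request = build_token_request(
    "https://example.invalid/token", "my-client-id", "my-client-secret"
)
print(request.get_method())  # POST
```

Sending the request with urllib.request.urlopen(request) would return a JSON body whose access_token field is then passed as a Bearer token on later API calls.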