Data Flow & Communication
How components talk and data moves through the system.
Trace Data Through a Complete Analysis
Below is a simulated conversation between the system's components as a real analysis unfolds. Pay attention to how each component transforms the data before handing it off.
Uploading plantation_borneo.geojson to /api/analysis/upload
Parsed 3 features. One polygon was self-intersecting — auto-repaired. Area: 247.3 ha.
Stored 3 plots with PostGIS geometries. Upload ID: a7f3b...
Starting analysis. Coordinates detected in Indonesia (ID) — HIGH risk per Article 29. +20 points.
Hansen GFC via GEE: 3.2 ha tree cover loss in 2022. Post-cutoff alert!
Sentinel-1 radar confirms change in same area. 2.8 ha. High confidence.
Risk score: 82/100. Forest loss: 3.1%. Verdict: NON_COMPLIANT
PDF built: 8 pages with satellite imagery, NDVI maps, alert table. Emailing user.
Each component receives structured data, adds its own context, and passes an enriched result downstream. The User sends a file; the ReportGenerator receives an AnalysisResult containing everything accumulated along the way.
Concurrent Fetching with asyncio.gather
GLAD and RADD queries are independent: neither needs the other's result. The analyzer starts both at the same time with asyncio.gather and awaits them together:
glad_task = self._fetch_glad_alerts(geometry, plot_id)
radd_task = self._fetch_radd_alerts(geometry, plot_id)
glad_alerts, radd_data = await asyncio.gather(
    glad_task, radd_task, return_exceptions=True
)
gather runs both calls concurrently and returns any exception as a result instead of raising it (return_exceptions=True). Unpack the two results into glad_alerts and radd_data.
Data Flow Tasks
How to add a new step to the analysis pipeline
Say you want to add a biodiversity check alongside the existing GLAD and RADD queries.
- Open backend/services/forest_analyzer_with_alerts.py and find the analyze_plot() method.
- Create a new async method (e.g., _fetch_biodiversity_risk(geometry, plot_id)) that calls your data source and returns a result object.
- Add your new task to the asyncio.gather call alongside the GLAD and RADD tasks so it runs concurrently.
- Unpack the new result and add it to the details dict that gets attached to the AnalysisResult.
- If the new data should affect the risk score, update the scoring logic in the same method.
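The steps above might look like the following sketch. Only the method names _fetch_glad_alerts, _fetch_radd_alerts, _fetch_biodiversity_risk and analyze_plot and the asyncio.gather pattern come from this page. The class name, the stubbed fetch bodies, the placeholder values, and the shape of each result are assumptions made for illustration, not the project's actual code.

```python
import asyncio


class ForestAnalyzerSketch:
    """Hypothetical stand-in for the real analyzer; every fetch body is a stub."""

    async def _fetch_glad_alerts(self, geometry, plot_id):
        await asyncio.sleep(0)  # placeholder for the real GLAD query
        return {"has_alerts": True, "area_ha": 3.2}  # placeholder values

    async def _fetch_radd_alerts(self, geometry, plot_id):
        await asyncio.sleep(0)  # placeholder for the real RADD query
        return {"has_alerts": True, "area_ha": 2.8}  # placeholder values

    async def _fetch_biodiversity_risk(self, geometry, plot_id):
        # New step: call your own data source here (result shape is assumed).
        await asyncio.sleep(0)
        return {"protected_area_overlap": False}

    async def analyze_plot(self, geometry, plot_id):
        # All three queries run concurrently; failures come back as values.
        glad, radd, bio = await asyncio.gather(
            self._fetch_glad_alerts(geometry, plot_id),
            self._fetch_radd_alerts(geometry, plot_id),
            self._fetch_biodiversity_risk(geometry, plot_id),
            return_exceptions=True,
        )
        details = {}
        for name, result in (("glad", glad), ("radd", radd), ("biodiversity", bio)):
            if isinstance(result, BaseException):
                details[name] = {"error": str(result)}  # keep the other results
            else:
                details[name] = result
        return details


details = asyncio.run(
    ForestAnalyzerSketch().analyze_plot({"type": "Polygon"}, "plot-1")
)
```

Because the new task sits in the same gather call, a slow or failing biodiversity source neither delays nor discards the GLAD and RADD results.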
How to track data through the system
When debugging, follow this chain to trace how a piece of data moves:
- API endpoint: Find the route in backend/api/. This is where the HTTP request arrives and parameters are validated.
- Service method: The endpoint calls a method in backend/services/. This is where business logic lives.
- Data model: The service creates or updates a dataclass from backend/models/.
- Database table: The model is persisted via SQL in backend/utils/database.py.
- Response: The result flows back up: database row → model → service → API response JSON. Each layer adds context.
How to add a new data model
- Create a new dataclass in the appropriate file under backend/models/ (e.g., models/alerts.py for alert-related data).
- Add a to_dict() method so the model can be serialized to JSON for API responses.
- Use the model in your service layer: import it and return instances from service methods.
- Create the corresponding database table in backend/utils/database.py inside the ensure_tables_exist() function, using SQL CREATE TABLE IF NOT EXISTS.
- Add any necessary indexes, especially spatial indexes if the model includes geometry columns.
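As a rough illustration of these steps, the sketch below defines a hypothetical BiodiversityRisk model with a to_dict() method and creates its table idempotently. The model name, its fields, the table name, and the index name are invented for the example. sqlite3 stands in for the project's PostgreSQL/PostGIS database so the block runs without a server.

```python
import json
import sqlite3
from dataclasses import asdict, dataclass
from datetime import date


@dataclass
class BiodiversityRisk:
    """Hypothetical model; field names are illustrative only."""
    plot_id: str
    assessed_on: date
    risk_level: str
    overlap_hectares: float

    def to_dict(self):
        data = asdict(self)
        # date objects are not JSON-serializable, so emit an ISO string.
        data["assessed_on"] = self.assessed_on.isoformat()
        return data


CREATE_TABLE_SQL = """
CREATE TABLE IF NOT EXISTS biodiversity_risk (
    plot_id TEXT NOT NULL,
    assessed_on TEXT NOT NULL,
    risk_level TEXT NOT NULL,
    overlap_hectares REAL NOT NULL
)
"""
# A plain index on plot_id; in PostGIS a geometry column would instead
# get a spatial index (CREATE INDEX ... USING GIST (geom)).
CREATE_INDEX_SQL = (
    "CREATE INDEX IF NOT EXISTS idx_biodiversity_risk_plot "
    "ON biodiversity_risk (plot_id)"
)


def ensure_tables_exist(conn):
    # Safe to call on every startup: both statements are no-ops if present.
    conn.execute(CREATE_TABLE_SQL)
    conn.execute(CREATE_INDEX_SQL)
    conn.commit()


conn = sqlite3.connect(":memory:")
ensure_tables_exist(conn)
ensure_tables_exist(conn)  # second call does nothing and raises no error

risk = BiodiversityRisk("plot-1", date(2024, 1, 15), "LOW", 0.0)
payload = json.dumps(risk.to_dict())
```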
Communication Patterns
Synchronous vs Asynchronous
The system offers two paths for running an analysis:
Synchronous (real-time): The user uploads a file, the server analyzes it immediately, and the response comes back in the same HTTP connection. Upload → analyze → wait → results. The user holds the connection open the entire time.
Asynchronous (queue-based): The user uploads a file and gets back a queue ID immediately. A background worker picks up the job, runs the analysis, saves results to the database, and sends an email notification. Upload → queue → background worker processes → email notification.
The sync path lives in /api/analysis/analyze/{upload_id}. The async path uses /api/queue/submit, with a background worker in analysis_queue_worker.py polling for pending jobs. Real-time progress is available via WebSocket connections managed by websocket_manager.py.
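The difference between the two paths can be sketched in a few lines. Everything here is hypothetical: an in-memory dict stands in for the jobs table, run_analysis is a stub, and the function names are not taken from analysis_queue_worker.py. The real worker also saves results to the database and sends the email, which this sketch omits.

```python
import asyncio
import uuid

jobs = {}  # queue_id -> job record; stands in for the database table


async def run_analysis(upload_id):
    await asyncio.sleep(0.01)  # placeholder for the real analysis
    return {"upload_id": upload_id, "verdict": "UNKNOWN"}


async def analyze_sync(upload_id):
    # Synchronous path: the caller waits until the result exists.
    return await run_analysis(upload_id)


def submit(upload_id):
    # Asynchronous path: record the job and hand back an ID immediately.
    queue_id = str(uuid.uuid4())
    jobs[queue_id] = {"upload_id": upload_id, "status": "pending", "result": None}
    return queue_id


async def worker_poll_once():
    # One polling pass: process every job that is still pending.
    for job in jobs.values():
        if job["status"] == "pending":
            job["result"] = await run_analysis(job["upload_id"])
            job["status"] = "done"


async def main():
    direct = await analyze_sync("upload-a")
    queue_id = submit("upload-b")
    status_before = jobs[queue_id]["status"]
    await worker_poll_once()
    return direct, status_before, jobs[queue_id]["status"]


direct, status_before, status_after = asyncio.run(main())
```

The sync caller gets its result as the return value, while the queued caller only holds an ID and must look the result up (or be notified) later.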
Why asyncio.gather?
When two operations do not depend on each other, running them one after the other wastes time. The GLAD service queries Landsat optical data from Google Earth Engine. The RADD service queries Sentinel-1 radar data from GEE. Neither needs the other's result to do its work.
asyncio.gather() fires both requests concurrently. If each takes 3 seconds, sequential execution would take 6 seconds. With gather, both run at the same time and the total is roughly 3 seconds, about half the time for this step.
The return_exceptions=True parameter is critical: if RADD fails (say, the plot is outside tropical coverage), the exception is returned in RADD's slot and the GLAD result is still preserved. Without it, the first exception would propagate out of the await and the GLAD result would be lost, even though the GLAD request itself is not cancelled.
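A small self-contained sketch of this behavior follows. The two fetch functions are stubs with simulated latency and a simulated RADD failure; their names and return values are assumptions for illustration.

```python
import asyncio
import time


async def fetch_glad():
    await asyncio.sleep(0.2)  # simulated network latency
    return {"source": "GLAD", "area_ha": 3.2}  # placeholder values


async def fetch_radd():
    await asyncio.sleep(0.2)  # simulated network latency
    raise RuntimeError("plot outside RADD coverage")  # simulated failure


async def main():
    start = time.perf_counter()
    # Both coroutines run concurrently; the RADD exception is returned
    # in its result slot instead of being raised here.
    glad, radd = await asyncio.gather(
        fetch_glad(), fetch_radd(), return_exceptions=True
    )
    elapsed = time.perf_counter() - start
    return glad, radd, elapsed


glad, radd, elapsed = asyncio.run(main())

# Check each slot before using it: a failed call yields an exception object.
radd_failed = isinstance(radd, BaseException)
```

With return_exceptions=True the caller must check each result, as the last line does, before treating it as data. The measured elapsed time should be close to one call's latency rather than the sum of both.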
Alert Deduplication
GLAD and RADD may detect the same deforestation event. GLAD sees it through optical satellite imagery (Landsat, 30m resolution). RADD sees it through radar (Sentinel-1, 10m resolution). The affected areas often overlap but rarely match exactly because the two sensors have different resolutions and detection methods.
To avoid double-counting, the system takes max(glad_area, radd_area) as the reported deforestation area rather than summing them, so an event seen by both sensors is counted once.
When both systems detect alerts in the same area, the system flags this as cross-validated, which means higher confidence. A single-source detection might be a false positive (cloud shadow misread as loss, for instance), but two independent sensors agreeing makes the finding much more reliable.
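The rule can be sketched as below. The function name and the returned keys are hypothetical, and treating "both areas are non-zero" as "same area" is a simplification: the real check presumably compares alert locations.

```python
def combine_alerts(glad_area_ha, radd_area_ha):
    """Merge two area estimates for one plot without double-counting."""
    # Take the larger estimate instead of the sum of both.
    area = max(glad_area_ha, radd_area_ha)
    # Simplified stand-in for a spatial overlap test (assumption).
    cross_validated = glad_area_ha > 0 and radd_area_ha > 0
    return {"deforestation_area_ha": area, "cross_validated": cross_validated}


both = combine_alerts(3.2, 2.8)
single = combine_alerts(1.5, 0.0)
```

Here both reports 3.2 ha and is cross-validated, while single reports 1.5 ha from one source only and is not.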
Data Flow Reference
Analysis Pipeline Steps
| Step | Component | Input | Output |
|---|---|---|---|
| 1. File upload | FileProcessor | GeoJSON / KML / Shapefile | PlotData (geometry, area, features) |
| 2. Storage | PostGIS | PlotData | plot_id in database |
| 3. Forest coverage | GEE Provider | geometry | coverage_2020, coverage_current, loss% |
| 4. GLAD alerts | GLAD Service | geometry + date range | has_alerts, count, area_ha, loss_by_year |
| 5. RADD alerts | RADD Service | geometry + date range | has_alerts, count, area_ha |
| 6. Risk scoring | ForestAnalyzer | all above + country code | risk_score (0–100) |
| 7. Compliance | ForestAnalyzer | risk_score + thresholds | ComplianceStatus enum |
| 8. Report | ReportGenerator | AnalysisResult | PDF file |
Data Models
| Model | File | Key Fields |
|---|---|---|
| PlotData | models/__init__.py | geometry, feature_count, total_area_hectares, bounds |
| AnalysisResult | models/__init__.py | plot_id, forest_coverage_percent, compliance_status, risk_score, details |
| ComplianceStatus | models/__init__.py | COMPLIANT, NON_COMPLIANT, NEEDS_REVIEW, UNKNOWN |
| RiskLevel | models/__init__.py | LOW, MEDIUM, HIGH, CRITICAL |
| DeforestationAlert | models/alerts.py | plot_id, alert_date, alert_type, confidence, affected_area_hectares |