An All-Weather AI for Landslide Detection

Fusing Optical and Radar Satellite Data for Reliable Disaster Monitoring


Problem Statement

The challenge I tackled is a critical paradox in disaster management: the heavy rain that often triggers landslides also creates thick cloud cover. That cloud cover blinds traditional optical satellites, leaving agencies like the National Disaster Operations Center unable to assess damage or direct resources at the exact moment they most need to see the ground. My goal was to build an AI system that overcomes this limitation by functioning in any weather condition.

Methodology & Approach

My strategy centered on data fusion. I combined two powerful data sources from Europe's Sentinel satellites: standard optical imagery (Sentinel-2) for visual analysis, and cloud-penetrating Synthetic Aperture Radar (SAR) data (Sentinel-1) to measure ground texture and disturbance. This radar data was the key to creating a true 'all-weather' capability.
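The fusion step itself can be sketched as stacking the two sensors' bands into a single multi-channel patch. The 10/2 band split below is an assumption for illustration (the write-up only states 12 channels in total); the arrays are random placeholders standing in for real Sentinel-2 and Sentinel-1 rasters:

```python
import numpy as np

# Sketch of channel fusion: stack Sentinel-2 optical bands with Sentinel-1
# SAR bands into one (H, W, 12) patch. The band counts are assumptions --
# e.g. 10 optical bands plus the VV and VH radar polarisations.
h, w = 64, 64
rng = np.random.default_rng(42)
s2_optical = rng.random((h, w, 10))   # placeholder Sentinel-2 bands
s1_sar = rng.random((h, w, 2))        # placeholder Sentinel-1 VV/VH
fused = np.concatenate([s2_optical, s1_sar], axis=-1)
print(fused.shape)  # (64, 64, 12)
```

Every downstream model then sees both sensors at once, which is what lets the radar channels carry the prediction when the optical ones are clouded out.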

I developed and rigorously tested two distinct machine learning approaches:

1. The Statistical Approach (Champion Model): Rather than feeding raw pixels to the model, I engineered a rich set of statistical features. For each of the 12 data channels, I calculated the mean, standard deviation, skewness, and kurtosis to capture the area's texture and consistency. I also engineered scientific indices such as NDVI (a measure of vegetation health) to detect landslide scars. After comparing several models, a tuned LightGBM classifier proved the most effective, delivering the best balance of performance and speed.
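A minimal NumPy sketch of this feature extraction (the band indices passed to the NDVI helper are hypothetical, and the moments are computed by hand rather than with the exact library calls used in the project):

```python
import numpy as np

def channel_stats(patch: np.ndarray) -> np.ndarray:
    """Mean, std, skewness, and excess kurtosis for each channel of a
    (H, W, C) patch, flattened into a 4*C feature vector."""
    flat = patch.reshape(-1, patch.shape[-1]).astype(np.float64)  # (H*W, C)
    mean = flat.mean(axis=0)
    std = flat.std(axis=0)
    centered = flat - mean
    safe_std = np.where(std == 0, 1.0, std)   # guard against flat channels
    skew = (centered ** 3).mean(axis=0) / safe_std ** 3
    kurt = (centered ** 4).mean(axis=0) / safe_std ** 4 - 3.0
    return np.concatenate([mean, std, skew, kurt])

def ndvi(patch: np.ndarray, red_idx: int, nir_idx: int) -> np.ndarray:
    """NDVI = (NIR - Red) / (NIR + Red), per pixel."""
    red = patch[..., red_idx].astype(np.float64)
    nir = patch[..., nir_idx].astype(np.float64)
    return (nir - red) / (nir + red + 1e-9)  # epsilon avoids divide-by-zero

# A random 64x64 patch with 12 channels yields 4 * 12 = 48 features.
rng = np.random.default_rng(0)
patch = rng.random((64, 64, 12))
features = channel_stats(patch)
print(features.shape)  # (48,)
```

Each patch thus collapses into a short, fixed-length vector, which is exactly the kind of tabular input a gradient-boosted model like LightGBM handles well.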

2. The Deep Learning Approach: To explore a different angle, I built a Convolutional Neural Network (CNN) using a pre-trained EfficientNetV2B0 model. A major challenge was the severe class imbalance in the data (only 18% of examples were landslides). I solved this by implementing a custom Focal Loss function, which forces the model to focus on these rare, hard-to-find cases, and by building a balanced tf.data pipeline that used oversampling and data augmentation.
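The project's Focal Loss was written in TensorFlow; the underlying math can be sketched in plain NumPy as below. The gamma and alpha values are the common defaults from the focal loss literature, not necessarily the values tuned in the project:

```python
import numpy as np

def focal_loss(y_true: np.ndarray, p_pred: np.ndarray,
               gamma: float = 2.0, alpha: float = 0.25) -> np.ndarray:
    """Binary focal loss: -alpha_t * (1 - p_t)^gamma * log(p_t).

    p_t is the predicted probability of the true class. The (1 - p_t)^gamma
    factor shrinks the loss on easy, well-classified examples, so training
    concentrates on the rare, hard ones (here: landslide patches)."""
    p_pred = np.clip(p_pred, 1e-7, 1 - 1e-7)  # numerical safety for log()
    p_t = np.where(y_true == 1, p_pred, 1 - p_pred)
    alpha_t = np.where(y_true == 1, alpha, 1 - alpha)
    return -alpha_t * (1 - p_t) ** gamma * np.log(p_t)

# An easy negative (p=0.05) contributes far less than a hard positive (p=0.10).
y = np.array([0, 1])
p = np.array([0.05, 0.10])
losses = focal_loss(y, p)
print(losses)
```

With an 18% positive rate, this down-weighting of confident negatives is what stops the network from collapsing into always predicting "no landslide".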

Explainability (XAI): A core goal was to build a trustworthy system. To prevent a 'black box' model, I implemented two different explainability techniques. For the LightGBM model, I used SHAP (SHapley Additive exPlanations) to precisely measure the contribution of each statistical feature to every prediction. For the CNN, I used LIME (Local Interpretable Model-agnostic Explanations) to generate heatmaps showing which parts of an image the model was 'looking' at.

Deployment: Finally, I completed the project's lifecycle by deploying the champion LightGBM model as a full application. I built a robust FastAPI backend to handle the data processing and serve the model's predictions. The user interface is a clean Streamlit web app that not only shows the prediction but also displays the real-time SHAP analysis, making the model's reasoning transparent to the user.

Results & Insights

The final LightGBM model achieved a strong F1-Score of 0.86 on the validation data. In practical terms, it successfully identified 82% of actual landslides while keeping the rate of false alarms very low.

The most important insight came from the SHAP analysis. It proved that the model had learned to be a true 'all-weather' system. Its predictions were not just based on visual features like vegetation loss (44% importance); they relied almost equally on the cloud-penetrating radar data (45% combined importance) that measured changes in ground texture and debris fields. This confirmed the model could make accurate decisions even without a clear visual.
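A grouped breakdown like this can be derived from per-feature SHAP outputs by summing mean absolute SHAP values per sensor group. In the toy sketch below, the feature names and importance values are purely illustrative, chosen to mirror the reported optical/radar split:

```python
# Hypothetical mean |SHAP| values per feature -- illustrative numbers only,
# not the project's actual outputs.
feature_importance = {
    "ndvi_mean": 0.30, "ndvi_std": 0.14,                       # optical
    "vv_backscatter_mean": 0.21, "vh_backscatter_std": 0.24,   # radar (SAR)
    "elevation_mean": 0.11,                                    # other
}
groups = {
    "optical": ["ndvi_mean", "ndvi_std"],
    "radar": ["vv_backscatter_mean", "vh_backscatter_std"],
    "other": ["elevation_mean"],
}

# Share of total importance contributed by each sensor group.
total = sum(feature_importance.values())
shares = {name: sum(feature_importance[f] for f in feats) / total
          for name, feats in groups.items()}
for name, share in shares.items():
    print(f"{name}: {share:.0%}")
```

Rolling feature-level attributions up to the sensor level is what turns a raw SHAP summary into the "all-weather" claim: it shows the model would retain most of its evidence even if the optical group were uninformative.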

Furthermore, the LIME analysis on the CNN provided a striking confirmation: when shown an image completely covered by clouds, the model ignored the uninformative optical channels entirely and based its correct prediction solely on the underlying radar channels, which are unaffected by cloud cover.

Impact & Value

This project is a successful proof-of-concept for a real-world tool that can provide reliable and trustworthy information to disaster response teams. By automating the initial screening of satellite imagery and providing clear, explainable insights, this system can help free up human experts to focus on the most critical, high-risk cases. This project is a complete, end-to-end data science workflow from a raw business problem to a deployed, value-driven AI application.

Tech Stack

Python, LightGBM, TensorFlow, Pandas, Scikit-learn, SHAP, Streamlit, FastAPI