6.1. Convolutional Neural Networks and Remote Sensing#
6.1.1. Learning Objectives:#
Define a convolutional neural network
Discuss the main applications of convolutional neural networks
Know the steps to design and train a convolutional neural network
Define transfer learning and understand how to re-use pre-trained models
6.1.2. What are Convolutional Neural Networks (CNNs)?#
CNNs are a type of artificial neural network inspired by the structure and function of the human visual system. They excel at tasks related to pattern recognition and image analysis. CNNs are particularly well-suited for handling grid-like data, making them ideal for processing images and spatial data. CNNs are more efficient in handling larger images due to their use of partially connected layers and weight sharing, which reduces the number of parameters and computational complexity. In contrast, deep neural networks with fully connected layers may become impractical and risk network instability when dealing with large images.
6.1.3. Key components of CNNs:#
Convolutional Layers: These layers apply convolutional operations to the input data using learnable filters (kernels). These filters scan the input image, capturing features like edges, corners, textures, and more. Convolutional layers are responsible for feature extraction.
Filters in a CNN act as specialized detectors for image features such as edges and textures. Filters in early layers typically respond to simple patterns, such as vertical and horizontal edges, while multiple filters across deeper layers combine these into more complex features, enabling tasks like image recognition.
Padding involves adding zeros to the input data to control output size. “Valid” padding (no padding) reduces output size, while “Same” padding maintains it.
Stride determines how much the convolutional filter moves across the input: a stride of 1 moves one pixel at a time, preserving spatial dimensions, while a larger stride reduces feature map size.
Activation Functions: After each convolution operation, an activation function (commonly ReLU - Rectified Linear Unit) is applied to introduce non-linearity into the model, allowing it to learn complex relationships in the data.
Pooling Layers: Pooling layers downsample the spatial dimensions of the data. Max-pooling and average-pooling are commonly used techniques to reduce the size of feature maps while retaining essential information.
Fully Connected Layers (Dense Layers): These layers are usually placed at the end of the CNN architecture. They take the flattened output from the previous layers and perform classification or regression tasks. Fully connected layers capture high-level patterns and relationships.
Softmax Layer: In classification tasks, a softmax layer is often used to convert the network’s output into class probabilities. It assigns a probability score to each class, indicating the likelihood of the input belonging to that class.
Fig.1 Scheme of a CNN (Kattenborn et al. 2021)
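To make these components concrete, the sketch below stacks them into a small network using the Keras Sequential API. The framework choice, input size, filter counts, and the 10-class output are illustrative assumptions, not values prescribed by this section.

```python
# A minimal sketch of the components described above (illustrative, not a prescribed model).
import tensorflow as tf
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(64, 64, 3)),                       # small RGB input (assumed size)
    # Convolutional layer: 32 learnable 3x3 filters; "same" padding keeps the spatial
    # size, and ReLU introduces non-linearity after the convolution.
    layers.Conv2D(32, kernel_size=3, padding="same", activation="relu"),
    # Pooling layer: max-pooling halves the spatial dimensions of the feature maps.
    layers.MaxPooling2D(pool_size=2),
    # Second convolutional block: stride 2 downsamples instead of an extra pooling layer.
    layers.Conv2D(64, kernel_size=3, strides=2, padding="same", activation="relu"),
    # Fully connected (dense) head: flatten the feature maps and learn high-level patterns.
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    # Softmax layer: converts the 10 output scores into class probabilities.
    layers.Dense(10, activation="softmax"),
])

model.summary()
```

Calling model.summary() prints the layer-by-layer output shapes and parameter counts, which is a quick way to see how padding, stride, and pooling change the feature-map size.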
6.1.4. CNN Architecture#
CNN architectures can vary in complexity and depth, depending on the task and dataset. Some well-known CNN architectures include:
LeNet-5: One of the earliest CNN architectures, designed for handwritten digit recognition.
AlexNet: Introduced in 2012, it played a significant role in popularizing deep learning. It won the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) with a significantly reduced error rate.
VGGNet: Known for its simplicity and uniform architecture, VGGNet uses small 3x3 convolutional filters repeatedly to learn features.
Fig.2 Scheme of the VGG-16 architecture (Kattenborn et al. 2021)
GoogLeNet (Inception): Introduced the concept of inception modules, which are multiple parallel convolutional layers of different sizes.
ResNet (Residual Network): Addressed the vanishing gradient problem by introducing skip connections or residual blocks. It enabled training very deep networks.
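As an illustration of the skip connections mentioned for ResNet, the following is a minimal sketch of a residual block written with the Keras functional API; the input shape, filter count, and layer choices are illustrative assumptions rather than the original ResNet configuration.

```python
# A minimal sketch of a ResNet-style residual block (illustrative, assuming Keras).
import tensorflow as tf
from tensorflow.keras import layers

def residual_block(x, filters):
    shortcut = x  # the skip connection carries the input past the convolutions
    y = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    y = layers.Conv2D(filters, 3, padding="same")(y)
    # Adding the shortcut lets gradients flow directly through the block,
    # which is what mitigates the vanishing gradient problem in very deep networks.
    y = layers.Add()([shortcut, y])
    return layers.Activation("relu")(y)

inputs = tf.keras.Input(shape=(64, 64, 32))   # assumed feature-map shape
outputs = residual_block(inputs, 32)
model = tf.keras.Model(inputs, outputs)
```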
6.1.5. CNN applications#
Image Classification: CNNs are widely used for image classification tasks, where they can accurately categorize objects, animals, or scenes within images.
Object Detection: CNNs can locate and classify objects within images, making them valuable for tasks like autonomous driving, surveillance, and more.
Semantic Segmentation: In this task, CNNs assign a class label to each pixel in an image, enabling the identification of object boundaries within scenes.
Face Recognition: CNNs are used for face detection and recognition in security systems and social media applications.
Remote Sensing: In remote sensing, CNNs are used for tasks like land cover classification, change detection, and object recognition in satellite imagery.
Fig.3 Different CNN tasks (Kattenborn et al. 2021)
Tips and tricks
Data Augmentation is a technique that artificially increases the size of the training dataset by creating realistic variations of each training instance, and it serves as a regularization technique to reduce overfitting. To be effective, the generated instances should be as realistic as possible, typically involving slight shifts, rotations, and resizing of the images in the training set. This makes the model more tolerant to variations in object position, orientation, and size.
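A minimal sketch of such augmentations, assuming Keras preprocessing layers; the specific transformations and their ranges are illustrative choices.

```python
# Illustrative on-the-fly data augmentation with Keras preprocessing layers.
import tensorflow as tf
from tensorflow.keras import layers

augment = tf.keras.Sequential([
    layers.RandomFlip("horizontal"),       # mirror images horizontally
    layers.RandomRotation(0.1),            # small random rotations
    layers.RandomTranslation(0.1, 0.1),    # slight shifts in height and width
    layers.RandomZoom(0.1),                # slight resizing
])

# Applied during training, e.g. augmented = augment(images, training=True)
```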
CNNs can require a lot of memory during training because the values computed in the forward pass must be stored for backpropagation. Possible remedies include reducing the mini-batch size (at the cost of slower convergence), lowering dimensionality through larger strides or by removing layers, and switching to 16-bit floats, which saves memory but risks precision issues. Distributed computing allows larger models to be trained but requires specialized hardware, and pooling layers, while useful for downsampling, may sacrifice spatial detail. The right combination depends on the task, the available resources, and the balance between memory efficiency and model performance.
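As a small illustration, the snippet below shows two of these options in TensorFlow/Keras (an assumed framework): switching to 16-bit mixed precision and shrinking the mini-batch size.

```python
# Illustrative memory-saving settings, assuming TensorFlow/Keras.
import tensorflow as tf

# Compute activations in float16 while keeping variables in float32
# (reduces memory, but watch for precision/stability issues).
tf.keras.mixed_precision.set_global_policy("mixed_float16")

batch_size = 16  # smaller mini-batches lower peak memory at the cost of slower convergence
```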
6.1.6. CNN and remote sensing#
CNNs have significantly impacted remote sensing by automating the analysis of satellite and aerial imagery:
Land Cover Classification: CNNs classify land cover types (e.g., forests, urban areas, water bodies) in satellite images with high accuracy. They automatically learn relevant features and spatial relationships, improving classification results.
Object Detection: CNNs locate and classify objects of interest (e.g., buildings, vehicles) within remote sensing images. They enable precise object identification for applications like urban planning and disaster response.
Change Detection: CNNs identify changes between images captured at different times. This is vital for monitoring deforestation, urban expansion, and environmental shifts over time.
Semantic Segmentation: CNNs assign labels to each pixel in an image, allowing for detailed mapping of land cover, vegetation, and infrastructure. This information is valuable for environmental assessments.
Transfer Learning: Pre-trained CNN models, fine-tuned for remote sensing tasks, reduce the need for extensive labeled data, making CNNs accessible for smaller-scale projects.
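A minimal transfer-learning sketch, assuming TensorFlow/Keras: a ResNet50 pre-trained on ImageNet is reused as a frozen feature extractor and only a new classification head is trained. The input size and the 10-class head are illustrative assumptions.

```python
# Illustrative transfer learning for a remote-sensing classification task.
import tensorflow as tf
from tensorflow.keras import layers

base = tf.keras.applications.ResNet50(weights="imagenet", include_top=False,
                                       input_shape=(224, 224, 3))
base.trainable = False  # freeze the pre-trained convolutional layers

inputs = tf.keras.Input(shape=(224, 224, 3))
x = base(inputs, training=False)
x = layers.GlobalAveragePooling2D()(x)
outputs = layers.Dense(10, activation="softmax")(x)  # e.g. 10 land-cover classes (assumed)

model = tf.keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```

Freezing the base keeps training cheap and reduces the amount of labeled data needed; the base can later be unfrozen for fine-tuning at a low learning rate if more labeled imagery is available.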
Reading: Review on Convolutional Neural Networks (CNN) in vegetation remote sensing - Kattenborn et al. 2021
CNNs are employed in vegetation remote sensing to extract information directly from spatial data. Unlike traditional ML algorithms, which depend on demanding and limiting manual feature engineering, CNNs learn the relevant transformations iteratively during optimization, rendering hand-crafted features largely obsolete. This approach signifies a shift from dictating what a model should learn to how it should learn.
Fig.4 CNN applications in vegetation remote sensing (Kattenborn et al. 2021)
Exercise notebook: Land Cover Classification using Convolutional Neural Networks (CNNs)
Construct a CNN that classifies EuroSAT images into one of the dataset's 10 classes.
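One possible starting point, assuming the EuroSAT RGB images are available through TensorFlow Datasets (the exercise notebook may load the data differently):

```python
# Illustrative data loading for the exercise, assuming TensorFlow Datasets provides EuroSAT.
import tensorflow as tf
import tensorflow_datasets as tfds

(train_ds, test_ds), info = tfds.load("eurosat/rgb",
                                      split=["train[:80%]", "train[80%:]"],
                                      as_supervised=True, with_info=True)

def preprocess(image, label):
    return tf.cast(image, tf.float32) / 255.0, label  # scale pixel values to [0, 1]

train_ds = train_ds.map(preprocess).shuffle(1000).batch(32)
test_ds = test_ds.map(preprocess).batch(32)

print(info.features["label"].num_classes)  # the 10 EuroSAT land-cover classes
```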