Backbone network

1/30/2024

While ResNet-50 is used in some object detection frameworks such as BlitzNet and RetinaNet. ResNet-50 and ResNet-101 are used widely in object detection models. There are many variants of ResNets, for instance, ResNet-34, ResNet-50 which is composed of 26 million parameters, ResNet-101 with 44 million parameters and ResNet-152 which is deeper with 152 layers. ResNets are mainly consisting of convolutional and identity blocks. in 2015 developed ResNets which are based on residuals or skip connections. 2.4 ResNetsĬonvolutional neural networks have become more and more deeper with the addition of layers, but once the accuracy gets saturated, it quickly drops off. GoogLeNet mainly is used in YOLO object detection model. The network is made of 22 layers with 5 million parameters. It employs a 1 × 1 convolution in the middle of the network to reduce dimensionality and they opted to use global average pooling instead of fully connected layers. The inception block includes filters of varying sizes 1 × 1, 3 × 3 and 5 × 5. They came up with a new notion known as blocks of inception, where it embeds multi-scale convolutional transformations. Their method is different from that of VGGNet and AlexNet. 2.3 GoogLeNetĪlso called Inception V1, GoogLeNet is a small network developed by Szegedy et al. VGG-16 is one of the most used architectures in object detection and achieved interesting performances it’s used for instance in algorithms like Fast R-CNN, Faster R-CNN, HyperNet, RON384, SSD and RefineDet. A deeper version of VGG called VGG-19 is available. VGG-16 provides more layers compared to AlexNet and uses smaller filters of 2 × 2 and 3 × 3. In 2014 a network called VGG-16 was released, composed of 13 convolutional and 3 fully connected layers with ReLU activation. AlexNet is used in object detection models such as R-CNN, and HyperNet. Rectified Linear Units (ReLUs) are used for the first time as activations in AlexNet instead of sigmoid and tanh activations to add non-linearity. In comparison to LeNet-5, AlexNet has more layers and contains around 60 million parameters.

in 2012, developed a convolutional neural network composed of 8 layers, where 5 are convolutional and 3 are fully connected. The selection of CNN architectures to be covered in this article is not made randomly, but according to their popularity and performance in different state of the art object detection models. This shows that the detection of objects can be more difficult than the classification of images. Whereas in object detection, the model must be able to recognize several objects in a single image and provides the coordinates that identify the location of the objects. In image classification or image recognition, the classifier classifies a single object in the image, outputs a single category per image, and gives the probability of matching a class. These networks are mainly used for object classification task and have evaluated on some widely used benchmarks and datasets such as ImageNet (Fig. Several CNNs are available, for instance, AlexNet, VGGNet, and ResNet. The convolutional neural networks (CNNs) represent the heart of state-of-the-art object detection methods.

Keywordsĭetecting objects in a scene proved to be a very difficult task, which has been investigated for a variety of applications in recent years, such as face detection, self-driving cars, medical disease detection, video surveillance, and for natural disaster protection. The results have surpassed all the traditional methods, and in some cases, outperformed the human being’s performance. We demonstrate that the application of some convolutional neural network architectures has yielded very promising state-of-the-art results in image classification in the first place and then in the object detection task. We Also outline the main features of each architecture. We test and evaluate them in the common datasets and benchmarks up-to-date. We analyze and focus on the various state-of-the-art convolutional neural networks serving as a backbone in object detection models.

In this paper, we aim to highlight the important role of deep learning and convolutional neural networks in particular in the object detection task. Object detection is considered as one of the main challenges in the field of computer vision, which focuses on identifying and locating objects of different classes in an image.

Detecting objects in images is an extremely important step in many image and video analysis applications.

0 Comments

Backbone network

Leave a Reply.

Author

Archives

Categories