Oct 25, 2024
A foundation model for promptable visual segmentation in images and videos, built on a simple transformer architecture with streaming memory for real-time video processing.
Introduces a novel image segmentation task, model, and dataset, aiming to enable promptable, zero-shot transfer in computer vision.
Employs Vision Transformers, CLIP-style contrastive pre-training, and a bipartite matching loss for open-vocabulary detection, using image-level pre-training, multihead attention pooling, and mosaic image augmentation.
A novel transformer-based object detection model that treats detection as a set prediction problem, eliminating the need for hand-designed components such as anchors and non-maximum suppression.
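The set-prediction view hinges on a one-to-one matching between predictions and ground-truth objects. As a toy sketch, the minimum-cost assignment below is found by brute force over permutations; the actual model uses the Hungarian algorithm, and its matching cost combines class probability, L1 box distance, and generalized IoU rather than the simple center distance assumed here.

```python
from itertools import permutations

def match_cost(pred, gt):
    # Toy pairwise cost: L1 distance between box centers.
    # (The real cost mixes classification score, L1 box loss, and GIoU.)
    return sum(abs(p - g) for p, g in zip(pred, gt))

def bipartite_match(preds, gts):
    """Brute-force minimum-cost one-to-one assignment of predictions to
    ground-truth boxes; perm[j] is the prediction index matched to gt j."""
    best_cost, best_perm = float("inf"), None
    for perm in permutations(range(len(preds)), len(gts)):
        cost = sum(match_cost(preds[i], gts[j]) for j, i in enumerate(perm))
        if cost < best_cost:
            best_cost, best_perm = cost, perm
    return best_perm, best_cost

preds = [(0.9, 0.9), (0.1, 0.1), (0.5, 0.5)]
gts = [(0.0, 0.0), (1.0, 1.0)]
assignment, cost = bipartite_match(preds, gts)
# Prediction 1 is matched to the first ground truth, prediction 0 to the second.
```

Unmatched predictions (here, prediction 2) are supervised toward a "no object" class, which is what removes the need for NMS.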
Proposes a multi-stage approach where detectors are trained with progressively higher IoU thresholds, improving selectivity against false positives.
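A minimal sketch of the idea: as the positive-sample IoU threshold rises across stages, fewer and better-localized proposals qualify as positives. The `iou` helper and the toy boxes below are illustrative, not from the paper.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

gt = (0, 0, 10, 10)
proposals = [(0, 0, 10, 10), (1, 1, 11, 11), (2, 2, 12, 12)]

# Each cascade stage treats only proposals above its (increasing) IoU
# threshold as positives, so later stages train on better-localized boxes.
kept = {t: [p for p in proposals if iou(p, gt) >= t] for t in (0.5, 0.6, 0.7)}
```

The third proposal (IoU about 0.47) is a positive for no stage, while the second (about 0.68) drops out at the 0.7 stage.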
Addresses class imbalance in dense object detectors by down-weighting the loss assigned to well-classified examples.
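The down-weighting is done by the focal loss, which scales standard cross-entropy by a factor of (1 - p_t)^gamma. A minimal binary version (the values of alpha and gamma below are the paper's defaults):

```python
import math

def focal_loss(p, y, alpha=0.25, gamma=2.0):
    """Binary focal loss: down-weights well-classified examples by
    (1 - p_t)**gamma. p is the predicted probability of the positive
    class, y is the 0/1 label."""
    p_t = p if y == 1 else 1 - p
    alpha_t = alpha if y == 1 else 1 - alpha
    return -alpha_t * (1 - p_t) ** gamma * math.log(p_t)

# A well-classified example (p_t = 0.95) contributes orders of magnitude
# less loss than a hard one (p_t = 0.3), which is how one-stage dense
# detectors cope with the flood of easy background examples.
easy = focal_loss(0.95, 1)
hard = focal_loss(0.3, 1)
```

With gamma = 0 the expression reduces to alpha-weighted cross-entropy; raising gamma shifts ever more of the total loss onto hard examples.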
Extends Faster R-CNN to solve instance segmentation tasks, by adding a branch for predicting an object mask in parallel with the existing branch.
Leverages the inherent multi-scale hierarchy of deep convolutional networks to efficiently construct feature pyramids.
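The core merge step can be sketched with plain nested lists: each top-down map is upsampled 2x and added to the same-resolution bottom-up map. This is a simplification; the real pathway applies 1x1 convolutions to the lateral maps and 3x3 convolutions after each sum, omitted here.

```python
def upsample2x(fmap):
    """Nearest-neighbor 2x upsampling of a 2D feature map (list of lists)."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out

def merge(top, lateral):
    """Top-down merge: upsample the coarser map and add the lateral
    (same-resolution) bottom-up map elementwise."""
    up = upsample2x(top)
    return [[u + l for u, l in zip(ur, lr)] for ur, lr in zip(up, lateral)]

# Toy bottom-up maps at 4x4, 2x2, and 1x1 resolution (coarsest = most semantic).
c2 = [[1] * 4 for _ in range(4)]
c3 = [[2] * 2 for _ in range(2)]
c4 = [[3]]

p4 = c4
p3 = merge(p4, c3)  # 2x2 map; semantics from c4 flow into finer levels
p2 = merge(p3, c2)  # 4x4 map; every level now mixes coarse and fine features
```

Each output level thus carries high-level semantics at its own spatial resolution, which is what makes the pyramid useful for detecting objects across scales.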
Discretizes the output space of bounding boxes into a set of default boxes over different aspect ratios and scales per feature-map location.
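Generating that discretized box set is mechanical; a hedged sketch for a single square feature map (the scale and aspect-ratio values are illustrative, and the multi-map scale schedule is omitted):

```python
import math

def default_boxes(fmap_size, scale, aspect_ratios):
    """SSD-style default boxes as (cx, cy, w, h) in normalized [0, 1]
    coordinates: one box per aspect ratio, centered on each cell of an
    fmap_size x fmap_size feature map."""
    boxes = []
    for i in range(fmap_size):
        for j in range(fmap_size):
            cx = (j + 0.5) / fmap_size
            cy = (i + 0.5) / fmap_size
            for ar in aspect_ratios:
                # Equal-area boxes: width/height ratio is ar, area is scale^2.
                w = scale * math.sqrt(ar)
                h = scale / math.sqrt(ar)
                boxes.append((cx, cy, w, h))
    return boxes

# A 4x4 map with 3 aspect ratios yields 4 * 4 * 3 = 48 default boxes.
boxes = default_boxes(4, 0.2, (1.0, 2.0, 0.5))
```

At prediction time the network regresses offsets relative to each default box and scores it per class, so this fixed grid is what turns detection into dense per-box classification.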
A region proposal network (RPN) and a Fast R-CNN detector collaboratively predict object regions by sharing convolutional features.
Processes the entire image through a CNN, employs RoI pooling to extract fixed-length feature vectors from region proposals, then performs classification and bounding-box regression.
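RoI pooling max-pools each (arbitrarily sized) region of the shared feature map into a fixed grid so the downstream fully connected layers always see the same input size. A simplified sketch, assuming the RoI divides evenly into bins; the real operation handles fractional bin boundaries:

```python
def roi_pool(fmap, roi, out_size=2):
    """Max-pool a rectangular region of interest (x1, y1, x2, y2, in
    feature-map cells) into a fixed out_size x out_size grid.
    Simplification: the RoI is assumed to divide evenly into bins."""
    x1, y1, x2, y2 = roi
    bin_w = (x2 - x1) // out_size
    bin_h = (y2 - y1) // out_size
    out = []
    for by in range(out_size):
        row = []
        for bx in range(out_size):
            cells = [fmap[y1 + by * bin_h + dy][x1 + bx * bin_w + dx]
                     for dy in range(bin_h) for dx in range(bin_w)]
            row.append(max(cells))  # max over every cell in this bin
        out.append(row)
    return out

# An 8x8 feature map where cell (r, c) holds the value 10*r + c.
fmap = [[r * 10 + c for c in range(8)] for r in range(8)]
pooled = roi_pool(fmap, (0, 0, 4, 4), out_size=2)  # -> 2x2 output
```

Because pooling happens on the shared feature map, the expensive convolutional pass runs once per image rather than once per proposal, which is the main speedup over the original R-CNN.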
Uses selective search for region proposals, a CNN for feature extraction, and SVMs for classification, followed by bounding-box offset regression.