Abstract:
In this thesis, I propose a framework based on a convolutional neural network to perform
multiple object tracking. In particular, I extend the architecture of the convolutional
neural network used in YOLOv3, an object detection algorithm, to perform short (two-frame) tracking. The proposed network takes two image frames as input, detects objects
in one frame, and outputs the locations of the objects in the other frame. Short tracks
are combined in a post-processing step to generate long tracks. The network tracks multiple objects simultaneously in a single forward pass over the two image frames. This
makes the tracking framework more efficient than neural-network-based methods that follow a traditional tracking-by-detection strategy, which requires repeated
comparison of two sets of detections to score similarities when performing data association. Experimental results on real-world data, a quantitative evaluation, and a comparison
with other methods are also included.
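
To make the post-processing step concrete, the following is a minimal sketch of one way two-frame tracks could be chained into long tracks. It is an illustration under stated assumptions, not the procedure used in the thesis: the abstract does not specify the linking rule, so the greedy IoU matching, the box format (x1, y1, x2, y2), the threshold min_iou, and the names iou and link_short_tracks are all hypothetical.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # assumed format: (x1, y1, x2, y2)
ShortTrack = Tuple[Box, Box]             # (box in frame k, box in frame k+1)
LongTrack = List[Tuple[int, Box]]        # list of (frame index, box)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def link_short_tracks(pairs: List[List[ShortTrack]],
                      min_iou: float = 0.5) -> List[LongTrack]:
    """Chain short tracks into long tracks (hypothetical linking rule).

    pairs[k] holds the short tracks produced for frames k and k+1.
    A long track ending in frame k is extended by the short track whose
    frame-k box overlaps its last box the most (greedy, best IoU first).
    """
    finished: List[LongTrack] = []
    active: List[LongTrack] = []  # tracks whose last box lies in frame k
    for k, shorts in enumerate(pairs):
        # Score every (active track, short track) combination.
        candidates = sorted(
            ((iou(track[-1][1], s[0]), ti, si)
             for ti, track in enumerate(active)
             for si, s in enumerate(shorts)),
            reverse=True,
        )
        matched_tracks, used_shorts = set(), set()
        next_active: List[LongTrack] = []
        for score, ti, si in candidates:
            if score < min_iou:
                break  # remaining candidates overlap even less
            if ti in matched_tracks or si in used_shorts:
                continue
            matched_tracks.add(ti)
            used_shorts.add(si)
            active[ti].append((k + 1, shorts[si][1]))
            next_active.append(active[ti])
        # Tracks with no continuation end here.
        finished.extend(t for ti, t in enumerate(active)
                        if ti not in matched_tracks)
        # Unmatched short tracks start new long tracks.
        for si, s in enumerate(shorts):
            if si not in used_shorts:
                next_active.append([(k, s[0]), (k + 1, s[1])])
        active = next_active
    return finished + active


# Two objects followed over three frames (two frame pairs).
pairs = [
    [((0, 0, 10, 10), (1, 0, 11, 10)), ((50, 50, 60, 60), (51, 50, 61, 60))],
    [((51, 50, 61, 60), (52, 50, 62, 60)), ((1, 0, 11, 10), (2, 0, 12, 10))],
]
tracks = sorted(link_short_tracks(pairs))
```

In this sketch, matching happens only between the end of one short track and the start of the next, so each frame pair still needs just one network pass; the linking itself operates on box coordinates alone.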