The tracking module provides various algorithms for tracking objects across video frames, from classical methods like Kalman filtering to modern deep learning-based approaches.
Tracker Base Class
Base abstract class for long-term object trackers.
class Tracker {
public:
    virtual void init(InputArray image, const Rect& boundingBox) = 0;
    virtual bool update(InputArray image, Rect& boundingBox) = 0;
};
init
Initialize the tracker with a known bounding box that surrounds the target.
virtual void init(
    InputArray image,
    const Rect& boundingBox
);
image: The initial frame containing the object to track.
boundingBox: The initial bounding box surrounding the target object.
update
Update the tracker and find the new most likely bounding box for the target.
virtual bool update(
    InputArray image,
    Rect& boundingBox
);
image: The current frame to process.
boundingBox: Output parameter for the new target location. Updated only if the function returns true.
Returns: true if the target was located, false if the tracker cannot locate the target. Note that false does not necessarily mean the tracker has failed—the target may be temporarily out of view.
KalmanFilter
Implements a standard Kalman filter for state estimation.
class KalmanFilter {
public:
    KalmanFilter();
    KalmanFilter(int dynamParams, int measureParams, int controlParams = 0, int type = CV_32F);
    void init(int dynamParams, int measureParams, int controlParams = 0, int type = CV_32F);
    const Mat& predict(const Mat& control = Mat());
    const Mat& correct(const Mat& measurement);

    // State vectors and matrices
    Mat statePre;            // Predicted state (x'(k))
    Mat statePost;           // Corrected state (x(k))
    Mat transitionMatrix;    // State transition matrix (A)
    Mat controlMatrix;       // Control matrix (B)
    Mat measurementMatrix;   // Measurement matrix (H)
    Mat processNoiseCov;     // Process noise covariance (Q)
    Mat measurementNoiseCov; // Measurement noise covariance (R)
    Mat errorCovPre;         // A priori error covariance (P'(k))
    Mat gain;                // Kalman gain (K(k))
    Mat errorCovPost;        // A posteriori error covariance (P(k))
};
Constructor
KalmanFilter(
    int dynamParams,
    int measureParams,
    int controlParams = 0,
    int type = CV_32F
);
dynamParams: Dimensionality of the state vector.
measureParams: Dimensionality of the measurement vector.
controlParams: Dimensionality of the control vector. Default: 0 (no control).
type: Type of the created matrices. Should be CV_32F or CV_64F. Default: CV_32F.
predict
Computes a predicted state.
const Mat& predict(const Mat& control = Mat());
control: Optional input control vector.
Returns: Reference to the predicted state vector (statePre).
correct
Updates the predicted state from the measurement.
const Mat& correct(const Mat& measurement);
measurement: The measured system parameters.
Returns: Reference to the corrected state vector (statePost).
The Kalman filter operates in two steps: prediction (using the system model) and correction (using measurements). The filter maintains estimates of the state and its uncertainty through covariance matrices.
Example
// Create Kalman filter: 4D state (x, y, dx, dy), 2D measurement (x, y)
KalmanFilter kf(4, 2, 0);

// Initialize state transition matrix (constant velocity model)
kf.transitionMatrix = (Mat_<float>(4, 4) <<
    1, 0, 1, 0,
    0, 1, 0, 1,
    0, 0, 1, 0,
    0, 0, 0, 1);

// Initialize measurement matrix
kf.measurementMatrix = (Mat_<float>(2, 4) <<
    1, 0, 0, 0,
    0, 1, 0, 0);

// Set process and measurement noise
setIdentity(kf.processNoiseCov, Scalar::all(1e-5));
setIdentity(kf.measurementNoiseCov, Scalar::all(1e-1));

// Tracking loop
while (true) {
    Mat prediction = kf.predict();
    Mat measurement = getMeasurement(); // Your measurement function
    Mat estimated = kf.correct(measurement);
}
TrackerMIL
Multiple Instance Learning (MIL) tracker that trains a classifier online to separate object from background.
class TrackerMIL : public Tracker {
public:
    struct Params {
        float samplerInitInRadius;   // Radius for positive samples during init
        int samplerInitMaxNegNum;    // # negative samples during init
        float samplerSearchWinSize;  // Search window size
        float samplerTrackInRadius;  // Radius for positive samples during tracking
        int samplerTrackMaxPosNum;   // # positive samples during tracking
        int samplerTrackMaxNegNum;   // # negative samples during tracking
        int featureSetNumFeatures;   // # features
    };
    static Ptr<TrackerMIL> create(const Params& parameters = Params());
};
By training on "bags" of overlapping patches rather than a single exact positive crop, MIL reduces the drift caused by slightly misaligned training samples, which makes tracking more robust. The implementation is based on "Visual Tracking with Online Multiple Instance Learning" by Babenko et al.
Example
Ptr<TrackerMIL> tracker = TrackerMIL::create();
VideoCapture cap(0);
Mat frame;
cap >> frame;
Rect bbox = selectROI(frame); // User selects initial bounding box
tracker->init(frame, bbox);
while (true) {
    cap >> frame;
    if (frame.empty()) break;
    if (tracker->update(frame, bbox)) {
        rectangle(frame, bbox, Scalar(0, 255, 0), 2);
    }
    imshow("Tracking", frame);
    if (waitKey(1) == 27) break; // Esc to quit
}
TrackerGOTURN
Generic Object Tracking Using Regression Networks - a CNN-based tracker trained offline.
class TrackerGOTURN : public Tracker {
public:
    struct Params {
        std::string modelTxt; // Path to .prototxt file
        std::string modelBin; // Path to .caffemodel file
    };
    static Ptr<TrackerGOTURN> create(const Params& parameters = Params());
};
GOTURN is much faster than online-training CNN trackers due to its offline training approach. It handles viewpoint changes, lighting changes, and deformations well, but does not handle occlusions. Requires pre-trained models (goturn.prototxt and goturn.caffemodel).
Example
TrackerGOTURN::Params params;
params.modelTxt = "goturn.prototxt";
params.modelBin = "goturn.caffemodel";
Ptr<TrackerGOTURN> tracker = TrackerGOTURN::create(params);

VideoCapture cap(0);
Mat frame;
cap >> frame;
Rect bbox = selectROI(frame);
tracker->init(frame, bbox);
while (true) {
    cap >> frame;
    if (frame.empty()) break;
    if (tracker->update(frame, bbox)) {
        rectangle(frame, bbox, Scalar(255, 0, 0), 2);
    }
    imshow("GOTURN Tracking", frame);
    if (waitKey(1) == 27) break; // Esc to quit
}
TrackerDaSiamRPN
Deep learning-based tracker using Siamese Region Proposal Networks.
class TrackerDaSiamRPN : public Tracker {
public:
    struct Params {
        std::string model;       // SiamRPN model path
        std::string kernel_cls1; // CLS kernel path
        std::string kernel_r1;   // R1 kernel path
        int backend;             // DNN backend
        int target;              // DNN target device
    };
    static Ptr<TrackerDaSiamRPN> create(const Params& parameters = Params());
    virtual float getTrackingScore() = 0;
};
getTrackingScore
Returns the tracking confidence score for the current frame.
TrackerNano
Super lightweight DNN-based tracker with model size of only 1.9 MB.
class TrackerNano : public Tracker {
public:
    struct Params {
        std::string backbone; // Backbone model for feature extraction
        std::string neckhead; // Neckhead model for localization
        int backend;          // DNN backend
        int target;           // DNN target device
    };
    static Ptr<TrackerNano> create(const Params& parameters = Params());
    virtual float getTrackingScore() = 0;
};
Nano tracker is extremely lightweight and fast due to its special model structure. Requires two models: one for feature extraction (backbone) and another for localization (neckhead).
TrackerVit
Vision Transformer (ViT) based tracker, extremely lightweight at approximately 767KB.
class TrackerVit : public Tracker {
public:
    struct Params {
        std::string net;                // Model path
        int backend;                    // DNN backend
        int target;                     // DNN target device
        Scalar meanvalue;               // Mean for preprocessing
        Scalar stdvalue;                // Std for preprocessing
        float tracking_score_threshold; // Score threshold
    };
    static Ptr<TrackerVit> create(const Params& parameters = Params());
    virtual float getTrackingScore() = 0;
};
meanvalue: Mean values for image preprocessing. Default: (0.485, 0.456, 0.406).
stdvalue: Standard deviation values for image preprocessing. Default: (0.229, 0.224, 0.225).
tracking_score_threshold: Minimum confidence threshold for tracking. Default: 0.20.
Comparison of Trackers
Classical
MIL (Multiple Instance Learning)
- Pros: Robust, handles appearance changes
- Cons: Slower than modern methods
- Use case: General purpose tracking
Deep Learning (Medium)
GOTURN
- Pros: Fast, no online training
- Cons: Doesn't handle occlusions
- Model size: ~500MB
- Use case: Real-time tracking without occlusions
DaSiamRPN
- Pros: High accuracy, robust
- Cons: Larger model size
- Use case: High-accuracy tracking
Deep Learning (Fast)
Nano, Vit
- Pros: Extremely lightweight, very fast
- Model size: 1-2MB
- Use case: Embedded systems, mobile devices