The tracking module provides various algorithms for tracking objects across video frames, from classical methods like Kalman filtering to modern deep learning-based approaches.
Tracker Base Class
Base abstract class for long-term object trackers.
class Tracker {
public:
    virtual void init(InputArray image, const Rect& boundingBox) = 0;
    virtual bool update(InputArray image, Rect& boundingBox) = 0;
};
init
Initialize the tracker with a known bounding box that surrounds the target.
virtual void init(
    InputArray image,
    const Rect& boundingBox
);
image: The initial frame containing the object to track.
boundingBox: The initial bounding box surrounding the target object.
update
Update the tracker and find the new most likely bounding box for the target.
virtual bool update(
    InputArray image,
    Rect& boundingBox
);
image: The current frame to process.
boundingBox: Output parameter for the new target location. Updated only if the function returns true.
Returns: true if the target was located, false if the tracker cannot locate the target. Note that false does not necessarily mean the tracker has failed—the target may be temporarily out of view.
KalmanFilter
Implements a standard Kalman filter for state estimation.
class KalmanFilter {
public:
    KalmanFilter();
    KalmanFilter(int dynamParams, int measureParams, int controlParams = 0, int type = CV_32F);
    void init(int dynamParams, int measureParams, int controlParams = 0, int type = CV_32F);
    const Mat& predict(const Mat& control = Mat());
    const Mat& correct(const Mat& measurement);

    // State vectors and matrices
    Mat statePre;            // Predicted state (x'(k))
    Mat statePost;           // Corrected state (x(k))
    Mat transitionMatrix;    // State transition matrix (A)
    Mat controlMatrix;       // Control matrix (B)
    Mat measurementMatrix;   // Measurement matrix (H)
    Mat processNoiseCov;     // Process noise covariance (Q)
    Mat measurementNoiseCov; // Measurement noise covariance (R)
    Mat errorCovPre;         // A priori error covariance (P'(k))
    Mat gain;                // Kalman gain (K(k))
    Mat errorCovPost;        // A posteriori error covariance (P(k))
};
Constructor
KalmanFilter(
    int dynamParams,
    int measureParams,
    int controlParams = 0,
    int type = CV_32F
);
dynamParams: Dimensionality of the state vector.
measureParams: Dimensionality of the measurement vector.
controlParams: Dimensionality of the control vector. Default: 0 (no control).
type: Type of the created matrices. Should be CV_32F or CV_64F. Default: CV_32F.
predict
Computes a predicted state.
const Mat& predict(const Mat& control = Mat());
control: Optional input control vector.
Returns: Reference to the predicted state vector (statePre).
correct
Updates the predicted state from the measurement.
const Mat& correct(const Mat& measurement);
measurement: The measured system parameters.
Returns: Reference to the corrected state vector (statePost).
The Kalman filter operates in two steps: prediction (using the system model) and correction (using measurements). The filter maintains estimates of the state and its uncertainty through covariance matrices.
Example
// Create Kalman filter: 4D state (x, y, dx, dy), 2D measurement (x, y)
KalmanFilter kf(4, 2, 0);

// Initialize state transition matrix (constant velocity model)
kf.transitionMatrix = (Mat_<float>(4, 4) <<
    1, 0, 1, 0,
    0, 1, 0, 1,
    0, 0, 1, 0,
    0, 0, 0, 1);

// Initialize measurement matrix
kf.measurementMatrix = (Mat_<float>(2, 4) <<
    1, 0, 0, 0,
    0, 1, 0, 0);

// Set process and measurement noise
setIdentity(kf.processNoiseCov, Scalar::all(1e-5));
setIdentity(kf.measurementNoiseCov, Scalar::all(1e-1));

// Tracking loop
while (true) {
    Mat prediction = kf.predict();
    Mat measurement = getMeasurement(); // Your measurement function
    Mat estimated = kf.correct(measurement);
}
TrackerMIL
Multiple Instance Learning (MIL) tracker that trains a classifier online to separate object from background.
class TrackerMIL : public Tracker {
public:
    struct Params {
        float samplerInitInRadius;   // Radius for positive samples during init
        int samplerInitMaxNegNum;    // # negative samples during init
        float samplerSearchWinSize;  // Search window size
        float samplerTrackInRadius;  // Radius for positive samples during tracking
        int samplerTrackMaxPosNum;   // # positive samples during tracking
        int samplerTrackMaxNegNum;   // # negative samples during tracking
        int featureSetNumFeatures;   // # features
    };
    static Ptr<TrackerMIL> create(const Params& parameters = Params());
};
By training on "bags" of overlapping patches rather than a single exact positive crop, MIL reduces the drift caused by slightly misaligned training samples, which makes tracking more robust. The implementation is based on "Visual Tracking with Online Multiple Instance Learning" by Babenko et al.
Example
Ptr<TrackerMIL> tracker = TrackerMIL::create();
VideoCapture cap(0);
Mat frame;
cap >> frame;
Rect bbox = selectROI(frame); // User selects initial bounding box
tracker->init(frame, bbox);
while (true) {
    cap >> frame;
    if (frame.empty()) break;
    if (tracker->update(frame, bbox)) {
        rectangle(frame, bbox, Scalar(0, 255, 0), 2);
    }
    imshow("Tracking", frame);
    if (waitKey(1) == 27) break; // Esc to quit
}
TrackerGOTURN
Generic Object Tracking Using Regression Networks - a CNN-based tracker trained offline.
class TrackerGOTURN : public Tracker {
public:
    struct Params {
        std::string modelTxt; // Path to .prototxt file
        std::string modelBin; // Path to .caffemodel file
    };
    static Ptr<TrackerGOTURN> create(const Params& parameters = Params());
};
GOTURN is much faster than online-training CNN trackers due to its offline training approach. It handles viewpoint changes, lighting changes, and deformations well, but does not handle occlusions. Requires pre-trained models (goturn.prototxt and goturn.caffemodel).
Example
TrackerGOTURN::Params params;
params.modelTxt = "goturn.prototxt";
params.modelBin = "goturn.caffemodel";
Ptr<TrackerGOTURN> tracker = TrackerGOTURN::create(params);

VideoCapture cap(0);
Mat frame;
cap >> frame;
Rect bbox = selectROI(frame);
tracker->init(frame, bbox);
while (true) {
    cap >> frame;
    if (frame.empty()) break;
    if (tracker->update(frame, bbox)) {
        rectangle(frame, bbox, Scalar(255, 0, 0), 2);
    }
    imshow("GOTURN Tracking", frame);
    if (waitKey(1) == 27) break; // Esc to quit
}
TrackerDaSiamRPN
Deep learning-based tracker using Siamese Region Proposal Networks.
class TrackerDaSiamRPN : public Tracker {
public:
    struct Params {
        std::string model;       // SiamRPN model path
        std::string kernel_cls1; // CLS kernel path
        std::string kernel_r1;   // R1 kernel path
        int backend;             // DNN backend
        int target;              // DNN target device
    };
    static Ptr<TrackerDaSiamRPN> create(const Params& parameters = Params());
    virtual float getTrackingScore() = 0;
};
getTrackingScore
Returns the tracking confidence score for the current frame.
TrackerNano
Super lightweight DNN-based tracker with model size of only 1.9 MB.
class TrackerNano : public Tracker {
public:
    struct Params {
        std::string backbone; // Backbone model for feature extraction
        std::string neckhead; // Neckhead model for localization
        int backend;          // DNN backend
        int target;           // DNN target device
    };
    static Ptr<TrackerNano> create(const Params& parameters = Params());
    virtual float getTrackingScore() = 0;
};
Nano tracker is extremely lightweight and fast due to its special model structure. Requires two models: one for feature extraction (backbone) and another for localization (neckhead).
TrackerVit
Vision Transformer (ViT) based tracker, extremely lightweight at approximately 767KB.
class TrackerVit : public Tracker {
public:
    struct Params {
        std::string net;                // Model path
        int backend;                    // DNN backend
        int target;                     // DNN target device
        Scalar meanvalue;               // Mean for preprocessing
        Scalar stdvalue;                // Std for preprocessing
        float tracking_score_threshold; // Score threshold
    };
    static Ptr<TrackerVit> create(const Params& parameters = Params());
    virtual float getTrackingScore() = 0;
};
meanvalue: Mean values for image preprocessing. Default: (0.485, 0.456, 0.406).
stdvalue: Standard deviation values for image preprocessing. Default: (0.229, 0.224, 0.225).
tracking_score_threshold: Minimum confidence threshold for tracking. Default: 0.20.
Comparison of Trackers
Classical
MIL (Multiple Instance Learning)
- Pros: Robust, handles appearance changes
- Cons: Slower than modern methods
- Use case: General purpose tracking
Deep Learning (Medium)
GOTURN
- Pros: Fast, no online training
- Cons: Doesn't handle occlusions
- Model size: ~500MB
- Use case: Real-time tracking without occlusions
DaSiamRPN
- Pros: High accuracy, robust
- Cons: Larger model size
- Use case: High-accuracy tracking
Deep Learning (Fast)
Nano, Vit
- Pros: Extremely lightweight, very fast
- Model size: 1-2MB
- Use case: Embedded systems, mobile devices