Object Detection

Learn how to detect objects in images and video using classical computer vision techniques including Haar cascades and Histogram of Oriented Gradients (HOG) detectors.

Haar Cascade Classifiers

Haar cascades are machine learning-based classifiers trained to detect specific objects. OpenCV comes with pre-trained models for faces, eyes, pedestrians, and more.

Loading Cascade Classifiers

import cv2 as cv

# Load pre-trained cascade classifier
face_cascade = cv.CascadeClassifier(cv.data.haarcascades + 
                                   'haarcascade_frontalface_default.xml')
eye_cascade = cv.CascadeClassifier(cv.data.haarcascades + 
                                  'haarcascade_eye.xml')

# Alternative: load from file path
# face_cascade = cv.CascadeClassifier('haarcascade_frontalface_alt.xml')

# Check that the cascade loaded; empty() returns True when loading failed
if face_cascade.empty():
    raise SystemExit('Error loading cascade classifier')

Basic Object Detection

import cv2 as cv

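# Load image (cv.imread returns None if the file cannot be read)
img = cv.imread('group_photo.jpg')
if img is None:
    raise SystemExit('Could not read group_photo.jpg')
gray = cv.cvtColor(img, cv.COLOR_BGR2GRAY)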

# Load cascade
face_cascade = cv.CascadeClassifier(cv.data.haarcascades + 
                                   'haarcascade_frontalface_default.xml')

# Detect faces
faces = face_cascade.detectMultiScale(
    gray,
    scaleFactor=1.1,      # How much image size is reduced at each scale
    minNeighbors=5,       # How many neighbors each candidate should have
    minSize=(30, 30),     # Minimum object size
    flags=cv.CASCADE_SCALE_IMAGE
)

print(f"Found {len(faces)} faces")

# Draw rectangles around detected faces
for (x, y, w, h) in faces:
    cv.rectangle(img, (x, y), (x+w, y+h), (0, 255, 0), 2)

cv.imshow('Face Detection', img)
cv.waitKey(0)
cv.destroyAllWindows()

Nested Detection (Faces and Eyes)

Based on OpenCV’s facedetect.py sample:
import cv2 as cv

def detect(img, cascade):
    """Detect objects using cascade classifier"""
    rects = cascade.detectMultiScale(img, scaleFactor=1.3, 
                                    minNeighbors=4, minSize=(30, 30),
                                    flags=cv.CASCADE_SCALE_IMAGE)
    if len(rects) == 0:
        return []
    rects[:,2:] += rects[:,:2]  # Convert to (x1, y1, x2, y2)
    return rects

def draw_rects(img, rects, color):
    """Draw rectangles on image"""
    for x1, y1, x2, y2 in rects:
        cv.rectangle(img, (x1, y1), (x2, y2), color, 2)

# Load cascades
face_cascade = cv.CascadeClassifier(cv.samples.findFile(
    'haarcascades/haarcascade_frontalface_alt.xml'))
eye_cascade = cv.CascadeClassifier(cv.samples.findFile(
    'haarcascades/haarcascade_eye.xml'))

# Load and process image
img = cv.imread('face.jpg')
gray = cv.cvtColor(img, cv.COLOR_BGR2GRAY)
gray = cv.equalizeHist(gray)  # Improve contrast

# Detect faces
faces = detect(gray, face_cascade)
vis = img.copy()
draw_rects(vis, faces, (0, 255, 0))  # Green for faces

# Detect eyes within each face
if not eye_cascade.empty():
    for x1, y1, x2, y2 in faces:
        roi = gray[y1:y2, x1:x2]
        vis_roi = vis[y1:y2, x1:x2]
        eyes = detect(roi.copy(), eye_cascade)
        draw_rects(vis_roi, eyes, (255, 0, 0))  # Blue for eyes

cv.imshow('Face and Eye Detection', vis)
cv.waitKey(0)
cv.destroyAllWindows()
Key parameters for detectMultiScale():
  • scaleFactor: How much the image is shrunk at each step of the image pyramid (1.1 shrinks it by a factor of 1.1, roughly 9% per step). Values closer to 1 search more scales, which is more thorough but slower.
  • minNeighbors: How many neighboring detections a candidate rectangle must have to be retained. Higher values give fewer detections with fewer false positives, at the risk of missing some objects.
  • minSize: Minimum object size as (width, height) in pixels. Objects smaller than this are ignored.
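
To get a feel for how scaleFactor and minSize trade thoroughness for speed, the sketch below counts the detection window sizes a cascade would try on a 640x480 image. It is a simplified model, not OpenCV's actual implementation: the helper name `window_sizes` and the 24x24 base window are assumptions made for illustration.

```python
def window_sizes(image_size, base_window=(24, 24), scale_factor=1.1,
                 min_size=(30, 30)):
    """Return the window sizes tried under a simplified pyramid model.

    image_size, base_window and min_size are (width, height) tuples.
    The 24x24 base window is an assumed value, not read from a cascade file.
    """
    if scale_factor <= 1.0:
        raise ValueError("scale_factor must be greater than 1")
    img_w, img_h = image_size
    base_w, base_h = base_window
    sizes = []
    factor = 1.0
    while True:
        win_w, win_h = round(base_w * factor), round(base_h * factor)
        # Stop once the window no longer fits inside the image.
        if win_w > img_w or win_h > img_h:
            break
        # Windows smaller than min_size are skipped, not searched.
        if win_w >= min_size[0] and win_h >= min_size[1]:
            sizes.append((win_w, win_h))
        factor *= scale_factor
    return sizes


for sf in (1.05, 1.1, 1.3):
    count = len(window_sizes((640, 480), scale_factor=sf))
    print(f"scaleFactor={sf}: {count} window sizes")
```

Each extra window size is another full pass over the image, which is why values of scaleFactor close to 1 cost noticeably more time.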

HOG (Histogram of Oriented Gradients) Detector

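HOG descriptors capture the distribution of local gradient directions in an image. Combined with a linear SVM, they are a classic approach to pedestrian detection, and OpenCV ships a pre-trained people detector for this purpose.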

People Detection with HOG

Based on OpenCV’s peopledetect.py sample:
import cv2 as cv

def inside(r, q):
    """Check if rectangle r is inside rectangle q"""
    rx, ry, rw, rh = r
    qx, qy, qw, qh = q
    return rx > qx and ry > qy and rx + rw < qx + qw and ry + rh < qy + qh

def draw_detections(img, rects, thickness=1):
    """Draw detection rectangles"""
    for x, y, w, h in rects:
        # HOG detector returns slightly larger rectangles
        # so we shrink them a bit
        pad_w, pad_h = int(0.15*w), int(0.05*h)
        cv.rectangle(img, (x+pad_w, y+pad_h), 
                    (x+w-pad_w, y+h-pad_h), (0, 255, 0), thickness)

# Load image
img = cv.imread('people.jpg')

# Create HOG descriptor
hog = cv.HOGDescriptor()
# Set default people detector
hog.setSVMDetector(cv.HOGDescriptor_getDefaultPeopleDetector())

# Detect people
found, weights = hog.detectMultiScale(img, 
                                     winStride=(8, 8),
                                     padding=(32, 32),
                                     scale=1.05)

# Filter overlapping detections
found_filtered = []
for ri, r in enumerate(found):
    for qi, q in enumerate(found):
        if ri != qi and inside(r, q):
            break
    else:
        found_filtered.append(r)

print(f"Found {len(found_filtered)} people (from {len(found)} detections)")

# Draw all detections
draw_detections(img, found)
# Highlight filtered detections
draw_detections(img, found_filtered, 3)

cv.imshow('People Detection', img)
cv.waitKey(0)
cv.destroyAllWindows()
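
The containment check in the example above only drops boxes that lie entirely inside another box. A common alternative is non-maximum suppression (NMS) based on intersection over union (IoU), which also removes partially overlapping duplicates. The following is a minimal pure-Python sketch of that swapped-in technique, not part of the OpenCV sample; the function names and the 0.5 threshold are assumptions.

```python
def iou(a, b):
    """Intersection over union of two (x, y, w, h) boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    inter_w = min(ax + aw, bx + bw) - max(ax, bx)
    inter_h = min(ay + ah, by + bh) - max(ay, by)
    if inter_w <= 0 or inter_h <= 0:
        return 0.0  # boxes do not overlap
    inter = inter_w * inter_h
    return inter / (aw * ah + bw * bh - inter)


def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Keep the highest-scoring box from each group of overlapping boxes."""
    # Visit boxes from most to least confident.
    order = sorted(range(len(boxes)), key=lambda i: float(scores[i]),
                   reverse=True)
    kept = []
    for i in order:
        # Keep a box only if it does not overlap too much with a kept one.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in kept):
            kept.append(i)
    return [boxes[i] for i in kept]
```

With the HOG example, the detector's boxes and their flattened weights could be passed as `boxes` and `scores`.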

Real-time Detection on Video

import cv2 as cv
import time

# Initialize HOG detector
hog = cv.HOGDescriptor()
hog.setSVMDetector(cv.HOGDescriptor_getDefaultPeopleDetector())

# Open video or camera
cap = cv.VideoCapture(0)  # or 'video.mp4'
if not cap.isOpened():
    raise SystemExit('Could not open video source')

while True:
    ret, frame = cap.read()
    if not ret:
        break
    
    # Resize for faster processing
    frame = cv.resize(frame, (640, 480))
    
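    # Measure detection time with a monotonic, high-resolution clock
    start_time = time.perf_counter()
    
    # Detect people
    found, weights = hog.detectMultiScale(frame, 
                                         winStride=(8, 8),
                                         padding=(8, 8),
                                         scale=1.05)
    
    elapsed_time = time.perf_counter() - start_time
    # Detection-only rate; guard against a zero interval
    fps = 1.0 / elapsed_time if elapsed_time > 0 else 0.0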
    
    # Draw detections
    for (x, y, w, h) in found:
        cv.rectangle(frame, (x, y), (x+w, y+h), (0, 255, 0), 2)
    
    # Display FPS and count
    cv.putText(frame, f'People: {len(found)}', (10, 30),
              cv.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)
    cv.putText(frame, f'FPS: {fps:.1f}', (10, 70),
              cv.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)
    
    cv.imshow('HOG People Detection', frame)
    
    if cv.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv.destroyAllWindows()
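
The FPS value in the loop above is computed from a single frame, so the on-screen number tends to jitter. One option is to smooth it with an exponential moving average. The sketch below shows one way to do that; the class name `FpsMeter` and the smoothing factor are chosen purely for illustration, and the `sum(...)` call is only a stand-in for the detector.

```python
import time


class FpsMeter:
    """Exponentially smoothed frames-per-second estimate."""

    def __init__(self, alpha=0.1):
        self.alpha = alpha  # weight given to the newest measurement
        self.fps = None     # no estimate until the first update

    def update(self, elapsed_seconds):
        """Fold one frame's processing time into the running estimate."""
        # Guard against a zero interval on very coarse timers.
        instant = 1.0 / max(elapsed_seconds, 1e-9)
        if self.fps is None:
            self.fps = instant
        else:
            self.fps = self.alpha * instant + (1 - self.alpha) * self.fps
        return self.fps


meter = FpsMeter()
start = time.perf_counter()
sum(range(100_000))  # stand-in for hog.detectMultiScale(frame, ...)
print(f"FPS: {meter.update(time.perf_counter() - start):.1f}")
```

A smaller `alpha` gives a steadier reading that reacts more slowly to real changes in speed.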

Available Pre-trained Cascades

OpenCV includes many pre-trained cascade classifiers:
  • haarcascade_frontalface_default.xml - General frontal face detection
  • haarcascade_frontalface_alt.xml - Alternative frontal face
  • haarcascade_frontalface_alt2.xml - Another alternative
  • haarcascade_profileface.xml - Profile (side) faces
  • lbpcascade_frontalface.xml - LBP-based face detection (faster; kept in OpenCV's lbpcascades data directory rather than haarcascades)
  • haarcascade_eye.xml - General eye detection
  • haarcascade_eye_tree_eyeglasses.xml - Eyes with glasses
  • haarcascade_lefteye_2splits.xml - Left eye
  • haarcascade_righteye_2splits.xml - Right eye
  • haarcascade_fullbody.xml - Full body detection
  • haarcascade_upperbody.xml - Upper body
  • haarcascade_lowerbody.xml - Lower body
  • haarcascade_smile.xml - Smile detection
  • haarcascade_frontalcatface.xml - Cat face detection
  • haarcascade_frontalcatface_extended.xml - Extended cat face
  • haarcascade_licence_plate_rus_16stages.xml - Russian license plates
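
To check which of these files a particular installation actually provides, the cascade directory can be listed. The sketch below uses only the standard library; `list_cascades` is a hypothetical helper, and the directory passed in is assumed to be `cv.data.haarcascades` (as in the loading examples) or wherever the XML files live on your system.

```python
from pathlib import Path


def list_cascades(directory):
    """Return the sorted names of the cascade XML files in a directory."""
    return sorted(p.name for p in Path(directory).glob("*.xml"))


# Example use (assumes OpenCV is installed):
#   import cv2 as cv
#   print(list_cascades(cv.data.haarcascades))
```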

Custom Cascade Training

You can train custom cascade classifiers for specific objects:
1. Collect Training Data
   Gather positive samples (images containing the object) and negative samples (images without the object).
2. Create Sample Description
   Create text files listing the positive images with the object locations in each, and the paths to the negative images.
3. Generate Samples
   Use opencv_createsamples to generate training samples from your positive images.
4. Train Cascade
   Use opencv_traincascade to train the classifier. This can take hours or days depending on data size.
5. Test and Refine
   Test the classifier and collect more samples if needed to improve accuracy.

Note that opencv_createsamples and opencv_traincascade are not built in OpenCV 4.x. They are available from the 3.4 branch, and the resulting model format is the same.

Training custom cascades requires:
  • Hundreds to thousands of positive samples
  • Even more negative samples
  • Significant computation time (can take days)
  • Careful parameter tuning
For most modern applications, consider using deep learning-based detection instead.
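
For the sample description step, the positive description file read by opencv_createsamples has one line per image: the image path, the number of objects, and then an `x y width height` box for each object. The negative (background) file is simply one image path per line. Below is a minimal sketch of writing both files; the helper names and any file names you pass in are hypothetical.

```python
def positive_line(image_path, boxes):
    """Format one positive line: path, object count, then x y w h per box."""
    coords = " ".join(f"{x} {y} {w} {h}" for x, y, w, h in boxes)
    return f"{image_path} {len(boxes)} {coords}"


def write_description_files(positives, negatives, info_path, bg_path):
    """Write the positive and negative description files.

    positives maps an image path to its list of (x, y, w, h) boxes;
    negatives is a list of paths to images without the object.
    """
    with open(info_path, "w") as f:
        for path, boxes in positives.items():
            f.write(positive_line(path, boxes) + "\n")
    with open(bg_path, "w") as f:
        for path in negatives:
            f.write(path + "\n")
```

Keeping the line formatting in its own function makes it easy to check a few annotations by eye before generating samples.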

Performance Optimization

import cv2 as cv

img = cv.imread('image.jpg')
gray = cv.cvtColor(img, cv.COLOR_BGR2GRAY)

# Resize for faster detection
scale = 0.5
small = cv.resize(gray, None, fx=scale, fy=scale)

face_cascade = cv.CascadeClassifier(cv.data.haarcascades + 
                                   'haarcascade_frontalface_default.xml')

# Detect on smaller image
faces = face_cascade.detectMultiScale(small, 1.1, 5)

# Scale coordinates back to original size
faces = [[int(x/scale), int(y/scale), 
         int(w/scale), int(h/scale)] for (x, y, w, h) in faces]

# Draw on original image
for (x, y, w, h) in faces:
    cv.rectangle(img, (x, y), (x+w, y+h), (0, 255, 0), 2)
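Performance tips:
  • Process at lower resolution (0.5x or 0.25x scale) and map the detections back to the original size
  • Apply histogram equalization (cv.equalizeHist) to the grayscale image; this does not speed up detection but makes it more robust to uneven lighting
  • Increase scaleFactor to search fewer scales (faster, but objects whose size falls between scales may be missed)
  • Increase minNeighbors to reduce false positives
  • Set minSize to the smallest object you care about so that smaller windows are skipped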

Next Steps