Assignment Title

CS 585 HW 2
Zhongping Zhang
02/16/2021


Problem Definition

The goal of this homework is to design and implement algorithms that recognize hand shapes or gestures, and to create a graphical display that responds to the recognized hand shapes or gestures. Such algorithms are useful because they can be applied to videos to extract the information conveyed by gestures or hand shapes; a system that works reliably could spare people the tedious work of reviewing the videos manually. Since our method is based on skin-color detection, we assume that the background contains few skin-colored regions. Because template matching is sensitive to illumination, we also assume that the scene is lit by adequate natural light.


Method and Implementation

We mainly applied the following techniques in this project:
1. horizontal and vertical projections to find bounding boxes of "movement blobs" or "skin-color blobs"
2. size and center of region of interest
3. template matching
4. frame-to-frame differencing: D’(x,y,t) = |I(x,y,t)-I(x,y,t-1)|
5. skin-color detection (e.g., thresholding red and green pixel values)
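The frame-differencing formula, the red/green thresholding, and the projection-based bounding box listed above can be sketched in a few lines of NumPy. This is a minimal illustration and not the code used in the assignment: the function names and the threshold values (r_min, g_min, rg_gap) are assumptions chosen for the example, and frames are assumed to be uint8 arrays in OpenCV's B, G, R channel order.

```python
import numpy as np

def skin_mask(frame_bgr, r_min=95, g_min=40, rg_gap=15):
    """Boolean mask of skin-like pixels (threshold values are assumed, not from the report)."""
    frame = frame_bgr.astype(np.int32)       # avoid uint8 wrap-around in r - g
    g, r = frame[..., 1], frame[..., 2]      # OpenCV channel order is B, G, R
    return (r > r_min) & (g > g_min) & (r - g > rg_gap)

def frame_difference(curr_gray, prev_gray):
    """D'(x, y, t) = |I(x, y, t) - I(x, y, t-1)| for two grayscale frames."""
    diff = curr_gray.astype(np.int32) - prev_gray.astype(np.int32)
    return np.abs(diff).astype(np.uint8)

def bounding_box(mask):
    """(top, bottom, left, right), inclusive, from row and column projections.

    Returns None when the mask contains no foreground pixel.
    """
    rows = np.flatnonzero(mask.sum(axis=1))  # projection onto the vertical axis (row sums)
    cols = np.flatnonzero(mask.sum(axis=0))  # projection onto the horizontal axis (column sums)
    if rows.size == 0 or cols.size == 0:
        return None
    return int(rows[0]), int(rows[-1]), int(cols[0]), int(cols[-1])
```

The projection approach returns a single box around all foreground pixels, so it suits images with one clean blob, such as the templates.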

Specifically, the video is read frame by frame. For each frame, we use skin-color detection to extract the region of interest (ROI). The ROI is then compared with templates to recognize the hand shape. In this experiment, the method recognizes 4 hand shapes (rock, six, thumbup, and thumbdown) and 1 gesture (shaking hand). For hand shape recognition, 2 templates are provided for each hand shape, and the template with the highest normalized cross-correlation (NCC) score determines the final result. Gesture recognition is based on frame-to-frame differencing. To extract the ROI from the template images, we use horizontal and vertical projections to find the bounding boxes. To extract the ROI from the videos, where the background can be much more complex than in the templates, we use the cv2.findContours() function instead.
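The template-selection step can be sketched as follows. This is an illustration under stated assumptions, not the assignment's implementation: it uses the zero-mean form of normalized cross-correlation (the report does not say which NCC variant was used), it assumes every template has already been resized to the shape of the ROI, and the names ncc, classify, and templates are hypothetical.

```python
import numpy as np

def ncc(patch, template):
    """Zero-mean normalized cross-correlation of two equal-sized arrays, in [-1, 1]."""
    a = patch.astype(np.float64) - patch.mean()
    b = template.astype(np.float64) - template.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    # A constant patch or template has zero variance; treat it as no match.
    return float((a * b).sum() / denom) if denom > 0 else 0.0

def classify(roi, templates):
    """Return (label, score) of the template with the highest NCC score.

    `templates` maps a hand-shape label to a list of arrays with roi's shape
    (for example, two templates per hand shape).
    """
    best_label, best_score = None, -np.inf
    for label, variants in templates.items():
        for template in variants:
            score = ncc(roi, template)
            if score > best_score:
                best_label, best_score = label, score
    return best_label, best_score
```

Because the mean is subtracted and the result is normalized, a uniform change in brightness or contrast leaves the score unchanged, although uneven lighting still affects it.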


Experiments

In this system, we provide two ways to read data. The first is to load data from a video file, and the second is to use the laptop's front camera to capture data in real time. We used 2 different templates (shown below) for each hand shape. The performance of the system is evaluated mainly by visual inspection and by confusion matrices. To demonstrate the performance of our method, we provide a demo video "demo.mp4". The outputs of the demo video are saved as "demo_output.avi", "demo_output_skin_mask.avi", and "demo_output_frame_diff.avi".

[Template images: Front and Back templates for each hand shape (Rock, Six, Thumbup, Thumbdown)]

Results

Screenshots

[Screenshots: rock; six; thumbup (shaking hand); thumbdown (shaking hand)]

Videos

Hand Shapes and Gestures Recognition
Skin Color Detection
Frame-to-frame Difference

Confusion matrix for hand shape recognition

Hand Shape   Rock   Six   Thumbup   Thumbdown
Rock         10     4     1         2
Six          0      6     0         0
Thumbup      0      0     7         0
Thumbdown    0      0     2         8
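As a worked summary of the hand-shape matrix above, the overall accuracy is the sum of the diagonal divided by the total number of trials. The sketch below reads the counts as a 4x4 grid whose first row is 10, 4, 1, 2; that reading is an assumption, supported by every column then summing to 10. The variable names are illustrative.

```python
import numpy as np

labels = ["Rock", "Six", "Thumbup", "Thumbdown"]
confusion = np.array([
    [10, 4, 1, 2],
    [0, 6, 0, 0],
    [0, 0, 7, 0],
    [0, 0, 2, 8],
])

correct = int(np.trace(confusion))  # diagonal entries: correctly recognized trials
total = int(confusion.sum())        # all trials
accuracy = correct / total
print(f"{correct}/{total} correct, accuracy = {accuracy:.3f}")
```

With these counts, 31 of 40 trials fall on the diagonal, an overall accuracy of 0.775.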

Confusion matrix for motion recognition (shaking hand)

Motion    Shaking   Stable
Shaking   10        0
Stable    0         10

Discussion


Conclusions

In this assignment, we built a system that recognizes hand shapes and gestures well under certain conditions. The preliminary results look good overall, but template matching makes the system sensitive to background interference, lighting, viewing angle, and similar factors. To be practical, the system needs to become more robust. Potential improvements include, but are not limited to, collecting more templates, adding more preprocessing steps, and collecting a dataset to train deep learning models.


Credits and Bibliography

This work is based on content I learned in class. Specifically, I referred to these two links:
https://www.cs.bu.edu/faculty/betke/cs585/restricted/lectures/cs585-Feb9-2021.pdf
http://homepages.inf.ed.ac.uk/rbf/CVonline/LOCAL_COPIES/OWENS/LECT2/node3.html