Over the years, video analytics has gained a reputation for over-promising and under-delivering on performance. One of the biggest complaints has been its inability to correctly identify objects in situations that appear trivial to a human observer. In many cases, this has led to substantial numbers of false alarms, while actual events are missed or detected inaccurately. This, together with complex set-up procedures and the need for extensive manual fine-tuning, has prevented video analytics from becoming a mainstream application deployed on large numbers of cameras.

Machine Learning and video analytics

Machine Learning is a well-established field of research that has existed for decades and is already present in many products and applications. It is based on collecting large amounts of data specific to a particular problem, training a model on this data and then using that model to process new data. In video analytics, one of the most critical problems affecting accuracy is object classification. Fundamental to improving performance is the ability to teach the algorithm to distinguish between people, animals, different types of vehicles and sources of noise with a very high level of accuracy.
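The collect, train and apply workflow described above can be illustrated with a deliberately tiny sketch. It uses a nearest-centroid classifier, chosen only because it fits in a few lines; it is not the method used by any particular product. The two features (object height and speed), the sample values and the function names `train` and `predict` are all assumptions made for illustration.

```python
from collections import defaultdict
from math import dist


def train(samples):
    """Training step: compute one centroid (mean feature vector) per label."""
    grouped = defaultdict(list)
    for features, label in samples:
        grouped[label].append(features)
    return {
        label: tuple(sum(column) / len(rows) for column in zip(*rows))
        for label, rows in grouped.items()
    }


def predict(model, features):
    """Inference step: return the label whose centroid is nearest."""
    return min(model, key=lambda label: dist(model[label], features))


# Hypothetical labelled data: (height in metres, speed in m/s) -> label
training_data = [
    ((1.7, 1.4), "person"),
    ((1.8, 1.2), "person"),
    ((1.5, 13.0), "car"),
    ((1.6, 15.0), "car"),
]

model = train(training_data)
new_object = (1.75, 1.0)  # unseen data the model was not trained on
print(predict(model, new_object))
```

A real system would learn from image pixels rather than two hand-picked numbers, but the three stages are the same.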

Deep Learning algorithms

Until recently, Machine Learning has seen minimal use in video analytics products, largely because its high complexity and resource usage made such products too costly for mainstream deployment. However, the last couple of years have seen a tremendous surge in research and advances in a branch of Machine Learning called Deep Learning. Deep Learning is the name given to a family of algorithms based on neural networks. Very loosely speaking, these algorithms try to emulate the way the brain’s neurons work, enabling them to learn efficiently from examples and then apply what they have learned to new data.

The recent increase in interest in Deep Learning is largely due to the availability of graphics processing units (GPUs). GPUs can train and run Deep Learning algorithms efficiently, and have allowed the scientific community to accelerate both research into these algorithms and their application, to the point where they outperform most traditional Machine Learning algorithms in several categories.

Solving object classification with Deep Learning

This means that Deep Learning can now be used to tackle the most crucial problem facing video analytics, object classification. Doing so requires collecting many thousands of images from hundreds of surveillance cameras. Each image must first be manually labelled by a human with one of a range of categories, including person, car, bus, truck, bird, vegetation, dog and many more. To achieve the required accuracy, this large database must be collected and labelled from actual surveillance footage.

A Deep Learning algorithm trained on images collected from YouTube, Google Search and elsewhere on the Internet will generally fail when analysing images from surveillance cameras, because of the differences in viewing angle, resolution and image quality. Once enough surveillance images have been collected, a Deep Learning classifier can be trained and deployed as part of a video analytics solution, enabling it to eliminate most of the existing causes of false alarms.
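To make the link between classification and false alarms concrete, the sketch below filters detections by the class a classifier has assigned to them. It assumes the classifier returns a label and a confidence score for each detected object; the `Detection` type, the set of alarm-worthy classes and the 0.8 threshold are hypothetical choices, not values from any real product.

```python
from dataclasses import dataclass

# Assumption: only these classes should ever trigger an alarm.
ALARM_CLASSES = {"person", "car", "bus", "truck"}
# Assumption: ignore classifications the model is unsure about.
MIN_CONFIDENCE = 0.8


@dataclass
class Detection:
    label: str         # class assigned by the (hypothetical) classifier
    confidence: float  # classifier score between 0.0 and 1.0


def should_raise_alarm(detection):
    """Alarm only on confident detections of a class of interest."""
    return (
        detection.label in ALARM_CLASSES
        and detection.confidence >= MIN_CONFIDENCE
    )


detections = [
    Detection("person", 0.95),      # genuine event
    Detection("vegetation", 0.99),  # swaying branches: suppressed
    Detection("bird", 0.90),        # suppressed
    Detection("car", 0.55),         # too uncertain: suppressed
]

alarms = [d for d in detections if should_raise_alarm(d)]
print([d.label for d in alarms])
```

Moving vegetation and birds still produce motion in the image, but once they are classified correctly they no longer reach the operator as alarms.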

Because the algorithms need GPUs to run efficiently, video analytics solutions using Deep Learning will initially need to run on a server. A few such solutions are already available and show a dramatic leap in performance compared with traditional video analytics, with a drastic reduction in false alarm rates and a significant increase in detection accuracy. In addition, these new solutions do not require manual tweaking by the user and are essentially plug-and-play, making mass deployment a realistic prospect.

Surveillance applications of Deep Learning

Basic classification and false alarm reduction are the first applications of Deep Learning for video analytics, but they are by no means the only ones. In the not-too-distant future, Deep Learning will enable video analytics applications that are not yet possible, such as identifying objects carried by people (for example a gun, a handbag or a knife) or quickly finding people and vehicles with similar appearances across multiple cameras.
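One common way to approach the appearance search mentioned above is to have a network describe each detected object as a numeric vector and then compare vectors. The sketch below shows only the comparison step, using cosine similarity. The three-number vectors, the camera names and the 0.9 threshold are invented for illustration; real appearance vectors would come from a trained model and be much longer.

```python
from math import sqrt


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def find_similar(query, gallery, threshold=0.9):
    """Return (score, camera) pairs above the threshold, best match first."""
    scored = ((cosine_similarity(query, vec), cam) for cam, vec in gallery)
    return sorted((pair for pair in scored if pair[0] >= threshold),
                  reverse=True)


# Hypothetical appearance vectors for objects seen on three cameras.
gallery = [
    ("cam-1", (0.9, 0.1, 0.3)),
    ("cam-2", (0.1, 0.9, 0.2)),
    ("cam-3", (0.8, 0.2, 0.35)),
]
query = (0.85, 0.15, 0.3)  # the person or vehicle being searched for

for score, camera in find_similar(query, gallery):
    print(camera, round(score, 3))
```

In this toy data the objects on cam-1 and cam-3 resemble the query and the one on cam-2 does not, so only the first two are returned.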

Over the next few years, video analytics using Deep Learning will move from running on servers to running inside cameras, as powerful, low-cost hardware capable of running Deep Learning becomes more widely available and a standard feature of newer surveillance camera models. This will push the acceptance of video analytics even further, eventually making it a fundamental element of every surveillance camera deployed.

Increasing accuracy with updates

A crucial component in achieving and maintaining the high performance of Deep Learning-based applications is the ability to continuously update the models as more data is collected, so that their accuracy keeps improving. This gives an advantage to cloud-based video analytics services, since they can collect vast amounts of data from the cameras connected to the service, train new models in the cloud on this data and then push the new models to cameras at the edge. This continuous improvement cycle will be instrumental in helping video analytics fulfil its promise of improving people’s safety and security, by giving surveillance cameras human-level accuracy and a comprehensive understanding of their environment.
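The update cycle described above can be sketched as a simple loop between a cloud service and a camera. Everything here is a simplified assumption: the `CloudService` and `Camera` classes are hypothetical, "training" is reduced to incrementing a version number, and retraining after every three uploaded samples is an arbitrary choice.

```python
from dataclasses import dataclass, field


@dataclass
class CloudService:
    """Collects samples from cameras and periodically produces a new model."""
    samples: list = field(default_factory=list)
    model_version: int = 1
    retrain_every: int = 3  # assumption: retrain after every 3 new samples

    def upload(self, sample):
        self.samples.append(sample)
        if len(self.samples) % self.retrain_every == 0:
            # Stand-in for real training on the accumulated data.
            self.model_version += 1


@dataclass
class Camera:
    """Edge device that runs whichever model version it last received."""
    name: str
    model_version: int = 1

    def sync(self, cloud):
        """Adopt the cloud's model if it is newer; report whether it changed."""
        if cloud.model_version > self.model_version:
            self.model_version = cloud.model_version
            return True
        return False


cloud = CloudService()
camera = Camera("gate")

for frame_id in range(7):
    cloud.upload(f"frame-{frame_id}")  # hypothetical labelled sample

updated = camera.sync(cloud)
print(updated, camera.model_version)
```

The point of the sketch is the direction of flow: data moves from cameras to the cloud, and improved models move from the cloud back to the cameras.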