Unrealistic promises by over-enthused marketers and under-delivery by R&D departments have damaged the cause of video analytics almost since its inception. For me, the exaggeration reached its worst point when industry pundits suggested that we would soon be able to identify and alert on anomalous behaviour of the kind demonstrated by the Tsarnaev brothers in the moments before the 2013 Boston Marathon Bombings. Mainstream journalists speculated that the two Chechen brothers might have stood out in a crowd because they were wearing … err … baseball caps.
Irresponsible claims are widespread in the analytics field, and real technological advances will only come about when product developers engage with institutions that conduct research with academic rigour. A cynic might be forgiven for believing that, with the exception of capturing license plate numbers, CCTV needs human beings to monitor it, period. The reality is more subtle. As Algernon notes in The Importance of Being Earnest, “The truth is rarely pure and never simple.”
Cardiff University collaboration with Airbus Defence and Space
SourceSecurity.com spoke to David Marshall, Professor of Computer Vision at Cardiff University, about his own research, his collaboration with applied clinical researchers and the current work of his doctoral students. Prof Marshall acknowledges that his current modelling is in essence a research project, but Cardiff University has reached out to industry by collaborating with Airbus Defence and Space on programmes with commercial potential.
Flocking birds model
The background to the analytics algorithms being developed at Cardiff is remarkably broad: Marshall is working with a psychologist (Prof Simon Moore, also of Cardiff University), and one of the starting points has been the simulation of drunken behaviour based on models of flocking birds.
He explains: “My colleague Simon was keen to bring standard modelling patterns for group movements of birds and even shoals of fish to bear on research into how crowds of people flow around obstacles at times when many of them may be under the influence of alcohol. Drunkenness immediately introduces an element of randomness to what would otherwise be a tendency to form regular and natural lines.”
Personal space as a simple two-dimensional radius
He continues: “The other thing we modelled was personal space as a simple two-dimensional radius. Drunk people are less sensitive to occupation of space: they don’t demand a larger radius around them though they are more likely to encroach on what others will consider their own personal space. But get too close to somebody under the influence, and they will tend to lash out. Under normal conditions, people rarely bump into each other, though, of course, if you add alcohol to the mix then collisions become frequent. We wanted to validate our models with real data and obtained surveillance camera streams from police forces in South Wales. We’re also active in police science social research here at Cardiff.”
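The personal-space idea can be illustrated with a minimal sketch. This is not the Cardiff model: the function name `encroachments`, the single shared `radius` and the sample coordinates are all assumptions made for illustration. It simply reports which pairs of people stand closer together than the chosen radius.

```python
import math
from itertools import combinations


def encroachments(positions, radius):
    """Return index pairs (i, j) whose 2D positions are closer than `radius`.

    `positions` is a list of (x, y) tuples in arbitrary units (hypothetical data).
    """
    return [
        (i, j)
        for (i, a), (j, b) in combinations(enumerate(positions), 2)
        if math.dist(a, b) < radius  # Euclidean distance between the two people
    ]


# Three people: the first two stand 0.4 units apart, the third is far away.
people = [(0.0, 0.0), (0.4, 0.0), (3.0, 3.0)]
print(encroachments(people, radius=0.5))  # prints [(0, 1)]
```

A simulation of the kind described in the interview would presumably vary how strongly each person avoids such encroachments; that behaviour is not modelled here.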
As with any experimentation of this kind, the footage was separated into comparison groups: in this case, footage of violent incidents and footage of normal behaviour. The aim is to learn about the warning signs for a potential confrontation, and the algorithm looks for unusual behavioural traits. Night-time illumination with low contrast is always a challenge, and camera resolution varies. Picture quality, even from local councils with limited budgets, is improving rapidly, but street lighting levels are likely to remain the same.
The university is sceptical as to how much the improved resolution from megapixel cameras will help its work. “High resolution will not necessarily help us with people’s heads and torsos moving around in a seething crowd. The nature of the data remains challenging.” So is Prof Marshall willing to use the adjective ‘intelligent’ of the scene analysis here, this being a crucial distinction for many analytics providers in the commercial sector?
Artificial intelligence machinery and techniques
“We’re certainly using recognised artificial intelligence machinery and techniques for our classifications. We’ve looked at what features can be used to describe the data so we can get discrimination, and then train a standard classifier. But problems with occlusion [obscuring of objects] mean we treat the crowd as a texture flow. When violence erupts, the texture of the crowd changes and otherwise unified movement becomes random and high-frequency.”
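One way to picture the “high-frequency” cue is to measure how much of a motion signal's energy sits in its faster components. The sketch below is an assumption-laden illustration, not the researchers' feature set: `high_frequency_ratio`, the `cutoff_bin` parameter and the synthetic sine-wave signals are all hypothetical, and a naive discrete Fourier transform stands in for whatever texture-flow descriptors the Cardiff team actually uses.

```python
import cmath
import math


def high_frequency_ratio(signal, cutoff_bin):
    """Fraction of non-DC spectral energy in frequency bins >= `cutoff_bin`.

    `signal` is a list of per-frame motion magnitudes (hypothetical input);
    `cutoff_bin` must be at least 1.
    """
    n = len(signal)
    mean = sum(signal) / n
    centred = [x - mean for x in signal]  # remove the constant (DC) component
    energy = []
    # Naive DFT over the positive-frequency bins 1 .. n // 2.
    for k in range(1, n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(centred)
        )
        energy.append(abs(coeff) ** 2)
    total = sum(energy)
    if total == 0:
        return 0.0  # a perfectly flat signal has no motion energy at all
    return sum(energy[cutoff_bin - 1:]) / total  # energy[0] holds bin 1


n_frames = 32
# Synthetic stand-ins: one slow oscillation versus one rapid oscillation.
smooth = [math.sin(2 * math.pi * 1 * t / n_frames) for t in range(n_frames)]
erratic = [math.sin(2 * math.pi * 10 * t / n_frames) for t in range(n_frames)]
print(high_frequency_ratio(smooth, cutoff_bin=4))   # close to 0
print(high_frequency_ratio(erratic, cutoff_bin=4))  # close to 1
```

In a real pipeline a ratio like this would be only one of several features passed to a trained classifier, and the cutoff would have to be chosen from labelled footage rather than set by hand.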
Video analytics - A tool for assessing social behaviour
Arguments from those who are sceptical about video analytics as a tool for assessing social behaviour in public spaces can be overwhelming. Early algorithms for street observation proved incapable of distinguishing between the movement involved in somebody taking off their coat and somebody attacking the person next to them. Similarly, for artificial intelligence, greeting a friend with a hug can resemble trying to throttle them. Prof Marshall remains optimistic about the potential. “A hug is lower frequency in movement than a scuffle and will only occur once! Yes, we do of course get some false positives when people use their arms to gesticulate during an animated but non-violent argument.”
Evaluating footage of crowd dynamics
He continues: “We’re never going to accommodate all the eccentricities of human behaviour, but the goal is to develop systems that can alert operators in control centres if violence may be brewing while accepting a small rate of false positives. The quicker camera control centres can detect a threat, the quicker police officers and first responders can be on the scene. We really want to procure substantial video streams that we can study for flashpoints and incidents as they develop rather than footage that has been taken once a situation has unfolded.”
If researchers in computer vision can be given more opportunities to evaluate real rather than simulated footage of crowd dynamics then video analytics may begin to live up to what until now has been seen by many as a series of false dawns.