AI Has Endless Possibilities for AV
In January I wrote about the emergence of ChatGPT and how it is making waves across the country. I wrote about my belief that ChatGPT, in particular, was going to have a major effect on how we teach students. ChatGPT has allowed people to expand their thinking about what computing can actually do. Previously, we thought of computers as simply doing what a programmer told them to do. With the launch of this product, however, we can see a different dimension, with computers “thinking” on their own and “learning” as they interact with humans.
Now, with this column, I want to focus more on the concept of artificial intelligence in general, and what the future is going to look like. Let’s start with a definition of artificial intelligence. The Oxford Dictionary defines artificial intelligence as “the theory and development of computer systems able to perform tasks that normally require human intelligence, such as visual perception, speech recognition, decision-making, and translation between languages.” In AV, the concept of something that “normally requires human intelligence” is where we will find the power.
A product rAVe [PUBS] highlighted from ISE allows us to start thinking about this. The Sony SRG-A40 and SRG-A12 highlighted here use AI. The AI allows the camera to auto-track, pan, tilt and zoom without any of the techniques we used in the past (tracking badges, pressure sensors, etc.). This is real AI because it does what only a human could do before: make decisions about tracking, panning, tilting and zooming. It does this by recognizing the speaker and making decisions based on their movement in the room. Yes, these products need more development before they can replace a human in a broadcast environment, but for recording a speaker in a room, they are a huge leap forward.
Now imagine how this product could develop even further, and how that would apply to other AV technologies. What if a device were linked to the camera and microphones in the room so it could understand ordinary language and combine that with the image it is capturing? Unlike already ubiquitous assistants like Siri or Alexa, which require specific commands, what if the camera could understand a presenter who was speaking to the audience and not to the control system? For example, as a presenter talks while writing on a whiteboard, the AI understands from their language (both verbal and body) what they are doing and decides to zoom in on the section of the whiteboard being written on. In the past, a human would have needed to make that decision.
These concepts can be applied to just about every device in a room, and every type of space. Think of a conference room in a corporate environment. People walk into the room and start taking their seats. Someone asks, “Is Jim on Zoom today?” and another person replies, “Yes.” The AI interprets this conversation, closes the shades, turns on the monitor, switches to the Zoom Room input and activates the appropriate mic settings for where people are sitting. It then continues to manage the cameras and microphones during the conference. Finally, when someone says, “Goodbye, Jim,” the system shuts down and raises the shades. Or think of house-of-worship applications, in which an AI system determines whether the choir is standing and singing or sitting down, or whether someone is at the pulpit, and moves cameras and sets microphones appropriately.
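To make the conference-room scenario concrete, here is a minimal Python sketch of intent-driven room control. All class and method names here are invented for illustration, and the keyword matching is a crude stand-in for the speech-to-text plus AI intent recognition a real system would use.

```python
# Hypothetical sketch: an AI room controller inferring actions from
# overheard conversation. Names and logic are illustrative only.

class ConferenceRoom:
    """Tracks simple room state and applies actions inferred from speech."""

    def __init__(self):
        self.shades_down = False
        self.display_on = False
        self.source = None

    def handle_utterance(self, text: str) -> list[str]:
        """Map an overheard phrase to room actions. Keyword rules stand in
        for a real language model's intent recognition."""
        actions = []
        lowered = text.lower()
        if "on zoom" in lowered:  # e.g., "Is Jim on Zoom today?"
            self.shades_down = True
            self.display_on = True
            self.source = "Zoom Room"
            actions += ["close shades", "display on", "select Zoom Room input"]
        elif lowered.startswith("goodbye"):  # e.g., "Goodbye, Jim"
            self.shades_down = False
            self.display_on = False
            self.source = None
            actions += ["end meeting", "display off", "raise shades"]
        return actions

room = ConferenceRoom()
print(room.handle_utterance("Is Jim on Zoom today?"))
print(room.handle_utterance("Goodbye, Jim"))
```

The point of the sketch is the architecture, not the rules: the same state-and-actions shape holds whether the trigger is a hard-coded keyword or a model that genuinely understands conversational language.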
I am incredibly bullish on the possibilities that AI will bring to the AV environment, for a few reasons. First, it is what we have been imagining for years when we discussed the need for systems to understand “natural language.” But we were not quite as forward-looking as we could have been: natural language always meant a control system being a little more forgiving in the commands given to it (today you still have to say “Hey Siri,” not “Hello Siri”).

Second, beyond the capabilities of AI alone, there are possibilities for machine learning. Think of a home theater. In the middle of a movie, someone gets out of their chair; the AI recognizes this, so the movie pauses. The homeowner is only getting water from the fridge in the room, so they simply say, “I am getting water, and when I get water I don’t want you to pause.” The next time they get up and head to the door to go to the bathroom, the movie pauses again, and the owner lets it. The system has now learned the difference and behaves the way the owner desires.

Third, I am excited because the AI systems we are seeing today are already so powerful that we can envision expansive growth in a short time. No doubt, the big holdup is going to be money. AI systems need to connect to extremely powerful, fast and expensive supercomputers in order to process language quickly; at the moment, that is where the expense of AI lies. This will lead to more “as-a-service” options in the industry, letting providers lower the cost by sharing resources among many customers.
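The home-theater learning loop can be sketched in a few lines of Python. The “destination” labels and the feedback mechanism are my own invented stand-ins; a real system would infer where the person is heading from camera tracking and take the correction from spoken language.

```python
# Hypothetical sketch: a pause policy that learns, per destination,
# whether getting up should pause playback. Illustrative names only.

class PausePolicy:
    """Starts by pausing for everything, then learns owner preferences."""

    def __init__(self):
        # No preferences learned yet; default is to pause.
        self.pause_for: dict[str, bool] = {}

    def should_pause(self, destination: str) -> bool:
        # Unknown destinations fall back to the safe default: pause.
        return self.pause_for.get(destination, True)

    def feedback(self, destination: str, want_pause: bool) -> None:
        """Owner correction, e.g. 'when I get water, don't pause.'"""
        self.pause_for[destination] = want_pause

policy = PausePolicy()
print(policy.should_pause("fridge"))  # True: default behavior
policy.feedback("fridge", False)      # "I don't want you to pause"
print(policy.should_pause("fridge"))  # False: learned preference
print(policy.should_pause("door"))    # True: still pauses for the bathroom trip
```

The design choice worth noticing is the default: the system pauses until told otherwise, so a mistaken inference costs the owner a moment of inconvenience rather than a missed scene.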