Mankind is on the verge of another breakthrough in the realm of artificial intelligence. A new AI system can now "predict" how a scene will unfold, generating what it thinks will happen next -- from a single image, at least.
Scientists at MIT have created a deep learning algorithm that generates a "mini video" showing what can happen after being shown a picture. For instance, an image of a train station can lead to a clip of a train pulling away, while a beach scene prompts the AI to "animate" rolling waves.
According to New Scientist, "teaching" the AI how to anticipate the future can help it understand present activities. The article added that while humans understand that preparing a meal means eating it afterward, AI finds this hard to grasp.
A system like this can help AI assistants recognize threats, such as when someone is about to fall, or help self-driving cars avoid accidents.
Carl Vondrick of MIT said that any robot that plans to operate in our world will need some ability to predict the future. Vondrick and his colleagues plan to present the paper on December 5 at a neural computing conference.
According to Motherboard, the team used two million videos from Flickr to help the AI "learn" to predict the future. These videos featured beaches, golf courses, train stations and even babies in hospitals. Because the videos are unlabelled, the AI has no guide to help it interpret them. Afterward, the researchers gave the AI still images and let it produce its own "micro-movie" of what may happen next.
They taught the AI how to make videos using "adversarial networks." According to Engadget, one network generates the videos, while the other "judges" whether they look real or fake. The two are in constant competition: the generator tries its best to make realistic videos, while the judge tries its best to tell real footage from generated clips.
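To illustrate the general idea (this is a minimal sketch, not the MIT team's actual code), an adversarial setup pairs a generator, which turns random noise into a small "video" tensor, with a discriminator, which scores whether a clip looks real. All network sizes, names and data here are toy assumptions.

```python
# Illustrative sketch of adversarial training; shapes and networks are toy assumptions.
import torch
import torch.nn as nn

FRAMES, H, W = 32, 8, 8          # toy clip: 32 frames of 8x8 grayscale
NOISE_DIM = 16

generator = nn.Sequential(        # noise -> flattened clip
    nn.Linear(NOISE_DIM, 128), nn.ReLU(),
    nn.Linear(128, FRAMES * H * W), nn.Tanh(),
)
discriminator = nn.Sequential(    # flattened clip -> probability it is "real"
    nn.Linear(FRAMES * H * W, 128), nn.LeakyReLU(0.2),
    nn.Linear(128, 1), nn.Sigmoid(),
)

g_opt = torch.optim.Adam(generator.parameters(), lr=2e-4)
d_opt = torch.optim.Adam(discriminator.parameters(), lr=2e-4)
bce = nn.BCELoss()

for step in range(200):
    real = torch.rand(64, FRAMES * H * W) * 2 - 1   # stand-in for real clips
    noise = torch.randn(64, NOISE_DIM)
    fake = generator(noise)

    # Discriminator: push real clips toward label 1, generated clips toward 0.
    d_loss = bce(discriminator(real), torch.ones(64, 1)) + \
             bce(discriminator(fake.detach()), torch.zeros(64, 1))
    d_opt.zero_grad(); d_loss.backward(); d_opt.step()

    # Generator: try to make the discriminator label its clips as real.
    g_loss = bce(discriminator(fake), torch.ones(64, 1))
    g_opt.zero_grad(); g_loss.backward(); g_opt.step()
```

The competition is visible in the two losses: the discriminator is rewarded for separating real from generated clips, and the generator is rewarded for fooling it.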
The videos are currently low-resolution and only 32 frames long, lasting about a second. However, they are sharp enough to show the right kind of movement, such as trains pulling forward or babies making faces.
The videos appear to be an impressive feat in the field of AI, but the system still has a lot to learn. For instance, the AI needs to learn that a train pulling away should eventually leave the frame. This is because the AI has no prior knowledge of "how the world works," or common sense.
Regardless, the work illustrates the strides that become possible when computer vision is combined with machine learning. John Daugman at the University of Cambridge said a key requirement for effective systems is recognizing that events unfold according to a causal structure over time.
Vondrick is now working to make the system generate larger and longer videos. Mankind may soon develop systems that can "hallucinate" reasonable and plausible futures from images and videos.