If you walk by a cat sitting on its owner's lap on a lawn chair under a tree next to a picket fence, you don't need to think about it to figure out what is going on. Science is still working out exactly how our brain and eyes cooperate to accomplish such tasks. By combining fields such as computer science, psychology, artificial intelligence, physics, pattern recognition, and image processing, along with a great deal of math, researchers are trying to teach computers to do the same thing.
Computer vision refers to feeding images to a computer, which, in turn, identifies one or more specific objects in that image. The applications for enhanced computer vision technology are boundless. For example, these systems can be used in self-driving cars to identify other road users, traffic light changes, the extent of the road, and so on. Another application is facial recognition software; for example, a phone with facial recognition can recognize its owner and grant them access. Similarly, medical practitioners can use computer vision to help them accurately identify anomalies in x-rays and scans.
Here are some of the most fantastic computer vision techniques:
Image detection
It is one thing to input an image of a cat and ask the computer to classify it. It is quite another to ask a computer to identify multiple items in an image. The former technique is called image classification; when there are multiple objects to be identified in an image, it is called image detection (often referred to as object detection).
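The difference between the two can be sketched in a few lines of Python. This is a toy illustration, not a real model: the class names, scores, and boxes are all made up, and in practice the score vectors would come from a trained neural network.

```python
import numpy as np

# Hypothetical class names, for illustration only.
CLASSES = ["cat", "dog", "bicycle"]

def classify(scores):
    """Image classification: one score vector for the whole image,
    so the answer is a single label."""
    return CLASSES[int(np.argmax(scores))]

def detect(candidates, threshold=0.5):
    """Image detection: many candidate boxes, each with its own score
    vector; keep every box whose best score clears the threshold."""
    detections = []
    for box, scores in candidates:
        best = int(np.argmax(scores))
        if scores[best] >= threshold:
            detections.append((CLASSES[best], box))
    return detections

# One image, one answer:
print(classify(np.array([0.9, 0.05, 0.05])))  # -> cat

# One image, several answers, each with a bounding box (x1, y1, x2, y2):
print(detect([((10, 10, 60, 60), np.array([0.8, 0.1, 0.1])),
              ((70, 20, 120, 90), np.array([0.2, 0.1, 0.7]))]))
# -> [('cat', (10, 10, 60, 60)), ('bicycle', (70, 20, 120, 90))]
```

The key point is the shape of the output: classification returns one label per image, while detection returns a list of (label, box) pairs, one per object found.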
"For a computer to identify a specific object, it has to be fed thousands or even more images of this object. The computer should then be able to identify this object despite different lighting, background, scales, image deformations, etc. This means that image detection is more complicated than image classification because the computer has to create profiles for the different objects to be identified to recognize them easily in the future."
Andrei Alkhouski, Machine Learning R&D Engineer from InData Labs, a computer vision company.
Semantic segmentation
Semantic segmentation is similar to image detection in that the computer is expected to accurately detect several classes or types of objects in an image. The difference is that, in this instance, rather than identifying each item with a box, the image is broken down into pixels, and each object is delineated by grouping the pixels that make it up. After the pixels are grouped, they are labeled and classified just as they are during image detection.
Semantic segmentation is a significant improvement over image detection. To make such delineations, pixel prediction models are crucial. Imagine you are in a self-driving car. It is not enough that the vehicle can identify the car in front of it. It needs to identify the boundaries of this car to keep a safe distance from it. The car should watch out for not just one car but multiple cars, pedestrians on curbs, trees, bicycles, and even road signs. Semantic segmentation allows the car to identify all these items and act accordingly.
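At its core, semantic segmentation labels every pixel with its highest-scoring class. The toy sketch below makes that concrete on a tiny hand-made 4x4 "image"; in a real system the per-pixel scores would come from a neural network, and the class names here are invented for illustration.

```python
import numpy as np

# Hypothetical per-pixel class scores: shape (num_classes, height, width).
CLASSES = ["road", "car", "tree"]
scores = np.zeros((3, 4, 4))
scores[0] = 1.0            # every pixel looks a bit like road...
scores[1, 1:3, 1:3] = 2.0  # ...except a 2x2 patch that looks more like a car

# Semantic segmentation: label each pixel with its best class.
label_map = np.argmax(scores, axis=0)

# Group the pixels by label to see the segmentation.
for idx, name in enumerate(CLASSES):
    print(name, int(np.sum(label_map == idx)))
# -> road 12
#    car 4
#    tree 0
```

Instead of a coarse box, the output is a label map the same size as the image, so the exact boundary of the car is known down to the pixel.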
Instance segmentation
Instance segmentation is similar to semantic segmentation in that identified objects are demarcated at the pixel level. The difference is that in this case, the computer distinguishes separate instances within a class. If you take an image of a car lot with several 2018 Porsche 911 Carrera cars, semantic segmentation will mark all the vehicles as cars, but it will not tell them apart from each other.
However, with instance segmentation, the computer will distinguish between the individual cars. So, if there are five different colored Porsches, you will be able to tell them apart. Instance segmentation handles difficult cases such as overlapping objects and varied backgrounds, identifying each instance's boundaries and how the instances relate to one another.
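One simplified way to see what instance segmentation adds on top of semantic segmentation is to take a single-class pixel mask and split it into connected blobs, one per object. The sketch below does this with a plain flood fill; real instance segmentation models are far more sophisticated, and the mask here is hand-made for illustration.

```python
import numpy as np

# Semantic mask for one class ("car"): 1 where a car pixel is, 0 elsewhere.
# The two separate blobs stand for two different Porsches in the lot.
mask = np.array([
    [1, 1, 0, 0, 1],
    [1, 1, 0, 0, 1],
    [0, 0, 0, 0, 0],
])

def label_instances(mask):
    """Split a one-class mask into instances via 4-connected flood fill."""
    labels = np.zeros_like(mask)
    next_id = 0
    h, w = mask.shape
    for sy in range(h):
        for sx in range(w):
            if mask[sy, sx] and not labels[sy, sx]:
                next_id += 1               # a new, unvisited blob: new instance
                stack = [(sy, sx)]
                while stack:
                    y, x = stack.pop()
                    if 0 <= y < h and 0 <= x < w and mask[y, x] and not labels[y, x]:
                        labels[y, x] = next_id
                        stack += [(y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)]
    return labels, next_id

labels, count = label_instances(mask)
print(count)   # -> 2: the same class, but two distinct car instances
```

Semantic segmentation stops at "these pixels are car"; instance segmentation goes on to say "these pixels are car #1 and those are car #2".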
Object tracking
This is a fantastic development in computer vision. Object tracking refers to identifying and tracking moving objects in a scene. Using object tracking, the computer can monitor the interaction between objects once they have been identified. Object tracking is a crucial part of self-driving cars because most objects on the road aren't static.
Object tracking applies segmentation in that it finds the boundaries of the objects on or around the road. However, since these objects are moving (in relation to you), the boundaries are continually changing, and object tracking monitors how they change. For example, suppose you are back in the self-driving car and a cyclist is trying to overtake you: the vehicle will use its various cameras to spot the cyclist and navigate the situation correctly so that you can pass each other safely.
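The core idea of following a detection from frame to frame can be sketched with a simple greedy tracker: match each new box to the previous box it overlaps most (by intersection-over-union), and keep the same ID when the match is good enough. This is a minimal illustration, not a production tracker, and the boxes are invented.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter)

def track(prev_tracks, detections, threshold=0.3):
    """Greedy IoU tracker: reuse the ID of the most-overlapping previous
    track; unmatched detections start new tracks."""
    tracks, used = {}, set()
    next_id = max(prev_tracks, default=0) + 1
    for box in detections:
        best_id, best_iou = None, threshold
        for tid, old in prev_tracks.items():
            if tid not in used and iou(old, box) > best_iou:
                best_id, best_iou = tid, iou(old, box)
        if best_id is None:
            best_id, next_id = next_id, next_id + 1
        used.add(best_id)
        tracks[best_id] = box
    return tracks

# Frame 1: a cyclist appears. Frame 2: the same cyclist, shifted slightly.
frame1 = track({}, [(0, 0, 10, 10)])
frame2 = track(frame1, [(2, 0, 12, 10)])
print(frame1)  # -> {1: (0, 0, 10, 10)}
print(frame2)  # -> {1: (2, 0, 12, 10)}  same ID: the object was tracked
```

Because the cyclist's new box still overlaps the old one heavily, it keeps ID 1; a box with no good overlap would be treated as a new object entering the scene.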
Thanks to how much information, in the form of images, people upload to the internet every day, computer vision techniques are becoming progressively more accurate over time. The applications of this technology will have a significant impact on humanity. One can only wait and see what it will be applied to next.