Computer Vision, mmm… Come to think of it, in all the time I’ve been studying and writing about Artificial Intelligence, I’ve never written an article on Computer Vision: a simple article explaining what this discipline is, how it works and why it’s so cool.
That’s why today I decided to start from the beginning and talk to you about one of the subjects I’m most passionate about.
Also, so as not to keep repeating the keyword of this article, I’m going to abbreviate it as CV (so don’t confuse it with Curriculum Vitae 😅😂).
Before I start, though, I want to give you a little taste of how big the CV market is. It is estimated that by 2030 this market (hardware and software) will reach a value of $41.11 billion (source).
That much? How is it possible?
I’ll tell you why in a moment, so read on.
What is Computer Vision
Computer Vision is a branch of Artificial Intelligence that today relies largely on Deep Learning to replicate one specific human ability: sight, that is, understanding what we “see”.
Don’t get it? Here’s an example.
Do you know Tesla, Elon Musk’s car company? Well, Tesla uses advanced CV systems to identify objects, people, the road and everything else around the car (just like humans do). The car uses this information to assist the driver or, in some cases, to take over the driving.
Hmm… OK, maybe this isn’t enough to describe the subject.
Let me give you another example.
Are you familiar with technologies like Augmented Reality (AR), Virtual Reality (VR) and Mixed Reality (MR)? Yes? Well, at the foundation of all of them there is Computer Vision, that is, CV algorithms and systems.
Now, do you see why its market is estimated to reach $41.11 billion by 2030?
Computer Vision is the basis of much of what exists today and of what we’ll see more and more of in the future, such as self-driving cars and virtual worlds like the Metaverse (a much-discussed topic at Facebook and beyond).
Do you understand what Computer Vision is? Well, then let’s move on and get into a little more detail.
How does CV work?
For centuries, humans have tried to build machines that replicate human behavior. There have been many failures along the way, but today, with new technologies and far more computing power, we are much closer to that goal.
In the specific case of CV, the goal is to make the computer recognize objects of various kinds.
This is very cool, but what actually happens? How does a computer manage to recognize objects in images?
To answer these questions I’ll try to give you a practical example starting with the basics.
Let’s take an image of a dog. You and I, as humans, simply see a dog, while the computer does not. Before it can “see” the dog, all the computer sees is numbers.
Well yes, numbers! Each pixel of a color image is described by three numbers, one per channel, each ranging from 0 to 255. The three channels are the primary colors of the RGB (Red, Green, Blue) model: the higher the value, the more of that color the pixel contains.
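To make this concrete, here is a minimal sketch in plain Python of what those numbers look like for a tiny 2×2 image. The pixel values are made up for illustration. Many libraries store the same numbers in a height × width × 3 array (note that some, such as OpenCV, order the channels as BGR instead of RGB), but the idea is the same.

```python
# A tiny 2x2 RGB image as a grid of pixels (purely illustrative values).
# Each pixel is a tuple (R, G, B); each value is an integer from 0 to 255.
image = [
    [(255, 0, 0), (0, 255, 0)],      # top row: pure red, pure green
    [(0, 0, 255), (255, 255, 255)],  # bottom row: pure blue, white
]

height = len(image)        # number of rows of pixels
width = len(image[0])      # number of pixels per row
channels = len(image[0][0])  # 3 numbers per pixel: R, G, B

def is_valid(img):
    """Check that every channel value lies in the 0-255 range."""
    return all(0 <= v <= 255 for row in img for pixel in row for v in pixel)

print(height, width, channels)  # → 2 2 3
print(image[1][0])              # → (0, 0, 255)
print(is_valid(image))          # → True
```

So where we see “a blue square”, the computer only has the triple (0, 0, 255).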

Not clear to you? I’ll make it easier for you.
Let’s consider an image of Abraham Lincoln. To represent this image in black and white, the computer treats the whole picture as a grid of pixels.
To each pixel, the computer assigns a single value from 0 (black) to 255 (white) that represents its brightness, this time on only one channel, because the image is in black and white.
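If you’re curious how a color pixel becomes a single brightness value, here is a minimal sketch. It assumes the classic ITU-R BT.601 weights (0.299, 0.587, 0.114), which are one common convention among several, and the function name to_grayscale is just illustrative.

```python
# Convert an RGB image (grid of (R, G, B) tuples) to grayscale:
# one brightness value per pixel, on a single channel, from 0 (black) to 255 (white).
# Assumption: ITU-R BT.601 luma weights, one common choice among several.

def to_grayscale(rgb_image):
    gray = []
    for row in rgb_image:
        gray_row = []
        for r, g, b in row:
            # Weighted sum: green counts most because the eye is most sensitive to it.
            luma = 0.299 * r + 0.587 * g + 0.114 * b
            gray_row.append(round(luma))
        gray.append(gray_row)
    return gray

rgb = [
    [(255, 255, 255), (0, 0, 0)],     # white, black
    [(255, 0, 0), (128, 128, 128)],   # pure red, mid gray
]
print(to_grayscale(rgb))  # → [[255, 0], [76, 128]]
```

Notice that a bright, pure red ends up fairly dark (76) once it’s squeezed into a single channel.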

Once it has the value of every pixel, the computer can form and display the image.
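As a sketch of how an image can be “formed” from those values alone, the snippet below writes a grid of brightness values in the plain PGM (“P2”) format from the Netpbm family. I picked it only because it’s simple text; real software usually stores images in compressed formats like PNG or JPEG. The gradient values and the to_pgm name are invented for illustration.

```python
# Turn a grid of grayscale values (0-255) into a plain-text PGM ("P2") image.

def to_pgm(gray):
    height = len(gray)
    width = len(gray[0])
    # Header: magic number, width and height, maximum brightness value.
    lines = ["P2", f"{width} {height}", "255"]
    for row in gray:
        lines.append(" ".join(str(v) for v in row))
    return "\n".join(lines) + "\n"

# A 3x3 gradient: dark top-left corner, bright bottom-right corner.
gradient = [
    [0, 64, 128],
    [64, 128, 192],
    [128, 192, 255],
]
pgm_text = to_pgm(gradient)
print(pgm_text)
```

If you save pgm_text to a file with a .pgm extension, image editors that support Netpbm (GIMP, for example) will open it as a small gray gradient: just numbers, turned back into a picture.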
In a few words, this is what happens for a single image.
Complex, isn’t it? I know, and that’s why I’d rather not go into even more detail: it would only confuse you.
But how do you build real Computer Vision software?
A moment ago I explained how a computer sees an image, but how do you create real CV software?
Well, creating it takes time: less than it did a decade ago, but still months, if not years. Let’s look at the main steps.
- The first thing to do is collect data. For projects that start from scratch and want their own algorithm, you need millions of high-quality images, because without good data the algorithm will make wrong predictions. Alternatively, as many businesses do today, you can use transfer learning, which starts from a pre-trained model and lets you get by with far fewer images (but that’s another story).
- Once the images have been collected, they need to be labeled. This means that if my goal is to identify people’s faces, in almost every image I collect I will have to select and label the faces.
- With the images ready, I create the algorithm and train it on the collected and labeled images. Then I test it on labeled images that I set aside and never fed to the algorithm during training, to check how well it works on data it has never seen.
- If the software still works after several rounds of testing, it’s time to apply it in the field.
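The steps above can be sketched with a deliberately tiny, made-up example. The “images” are 2×2 grids of grayscale values, and the “algorithm” is just a brightness threshold, standing in for a real model such as a neural network. What matters is the workflow: label the data, set part of it aside, train on the rest, then measure accuracy on images the model never saw. Names like make_image, train and predict are hypothetical.

```python
import random

# Hypothetical toy dataset: each "image" is a 2x2 grid of grayscale values,
# paired with its label (in a real project, assigned by a person during labeling).
def make_image(base, rng):
    return [[base + rng.randint(-20, 20) for _ in range(2)] for _ in range(2)]

rng = random.Random(0)  # fixed seed so the run is reproducible
dataset = [(make_image(200, rng), "bright") for _ in range(20)] + \
          [(make_image(50, rng), "dark") for _ in range(20)]

# Hold out 20% of the labeled data: the model never sees it during training.
rng.shuffle(dataset)
split = int(0.8 * len(dataset))
train_set, test_set = dataset[:split], dataset[split:]

def mean_brightness(img):
    values = [v for row in img for v in row]
    return sum(values) / len(values)

# "Training": learn a threshold halfway between the two class averages.
def train(samples):
    by_label = {"bright": [], "dark": []}
    for img, label in samples:
        by_label[label].append(mean_brightness(img))
    bright_avg = sum(by_label["bright"]) / len(by_label["bright"])
    dark_avg = sum(by_label["dark"]) / len(by_label["dark"])
    return (bright_avg + dark_avg) / 2

def predict(threshold, img):
    return "bright" if mean_brightness(img) > threshold else "dark"

# Testing: compare predictions on held-out images with their true labels.
threshold = train(train_set)
correct = sum(predict(threshold, img) == label for img, label in test_set)
accuracy = correct / len(test_set)
print(f"threshold={threshold:.1f}, test accuracy={accuracy:.0%}")
```

A real project swaps the threshold for a neural network and the 40 toy grids for thousands or millions of photos, but the train/test split works the same way.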
This, in short, is how a CV algorithm comes to life.
I’ve kept it very simple, but all that glitters is not gold: behind it lie hours and hours of work.
OK, all very cool, but what are the fields of application? Where is Computer Vision used?
Fields of application of Computer Vision
There are several fields where we can see CV at work: some I’ve already mentioned, while others are always in front of your eyes and you don’t even realize it.
Wait, Antonio, what do you mean by “they are always in front of your eyes”?
Facial Detection & Recognition
Do you have an iPhone X or later, or a recent Android smartphone?
If the answer is “Yes”, then you know it has a “face unlock” feature. Well, that feature is Computer Vision, and the technique behind it is called “Facial Recognition”.
Here the algorithm learns to detect people’s faces (find where a face is in the image) and to recognize them (tell whose face it is).
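Real face-unlock systems are proprietary, so the sketch below only illustrates one common idea behind the “recognize” step, under clearly stated assumptions: a model (not shown) turns each face into a list of numbers, called an embedding, and two faces count as the same person when their lists are similar enough. The vectors, the 0.9 threshold and the function names are all invented for illustration.

```python
import math

def cosine_similarity(a, b):
    """Return a value in [-1, 1]; closer to 1 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def is_same_person(enrolled, candidate, threshold=0.9):
    # Hypothetical decision rule: unlock only if the faces are similar enough.
    return cosine_similarity(enrolled, candidate) >= threshold

# Made-up 4-number embeddings (real ones are much longer, often 128 values or more).
owner_face = [0.8, 0.1, 0.5, 0.3]            # saved when the phone was set up
owner_face_today = [0.78, 0.12, 0.52, 0.28]  # same person, slightly different photo
stranger_face = [0.1, 0.9, 0.2, 0.6]

print(is_same_person(owner_face, owner_face_today))  # → True
print(is_same_person(owner_face, stranger_face))     # → False
```

The threshold is the design trade-off: set it too low and a stranger might unlock your phone, too high and it won’t recognize you in bad lighting.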
Healthcare
The medical field is one of the many fields of application. Computer Vision systems are already starting to be used to help identify diseases of the human body; a practical example is spotting lung cancer in medical images.
Self-driving cars
I’ve mentioned this before: more and more automakers are using these systems to assist drivers on the road, or even replace them. It won’t be an easy challenge, though, because of laws that don’t exist yet and open ethical issues.
Virtual Reality, Augmented Reality and Mixed Reality
Snapchat, Instagram and Facebook are just a few of the companies using these technologies. The 3D filters popularized by Snapchat and then copied by Facebook on Instagram are proof of it.
These kinds of realities will likely play a key role in the future, precisely because companies like Facebook are racing to build the first metaverse, just like in the movie Ready Player One.
Sports (and more) Analysis
Computer Vision can also be used to analyze video and return real-time insights into how a game is going.
Or, a little more scary, your employer could use it to monitor you and estimate how much you actually work.
Conclusions
These are just a few clear examples of the subject I’ve explained today. In the future there may be even more fields of application, ones we can’t yet imagine.
Technological progress will certainly have a big influence on Computer Vision and on the other fields of Artificial Intelligence, such as Natural Language Processing (NLP). The important thing is that, together with technological progress, come laws and ethical standards that protect users’ privacy.
With that said, thank you for making it all the way through. I hope everything is clear now!
Are you excited about what this technology can do or are you a little terrified? Let me know in the comments!
See you next time,
Antonio.