Vision Capabilities Are at the Heart of AI's Evolution Toward Intelligence

Published: 06 July 2022 | Last Updated: 06 July 2022
Computer vision is one of the fundamental areas of AI research and has contributed to huge advances in areas such as deep learning.

 

According to Li Feifei, a professor in the Department of Computer Science at Stanford University, almost all of these advances have relied on the pursuit of the 'North Star', which refers to a key problem in scientific research.

 

In a recent article in Daedalus, entitled "Searching for Computer Vision North Stars", she explains the latest developments in object recognition in computer vision, along with a brief history of the ImageNet dataset and related work.

 


Figure | Li Feifei (front row, second from right) with her team (Source: Stanford University)

 

According to Li Feifei, the formulation of key questions will advance the development of computer vision, and indeed the AI field as a whole.

 

The AI field is currently evolving rapidly, with successful implementations of AI everywhere, from spam filters to personalized retail to autonomous driving. As Albert Einstein said, "It is often more important to ask a question than to solve a problem."

 

But it is not always obvious which scientific problems lie behind these applications, or which problems most need solving. Once a fundamental problem in a field is formulated - once a 'North Star' is identified - it can drive the field forward in leaps and bounds.

 

Li Feifei mentioned that her research in computer vision has been driven by her own series of 'North Stars'.

 

The ability to see is at the heart of intelligence, just as the evolution of the eye has been key to the creation of many different species, including humans. Humans can use visual perception to make sense of the world and interact with it. So, how do you get AI to see? There are many questions to be addressed here, and the choice of fundamental questions is an important part of the scientific exploration of computer vision.

 

"Initially, a particular problem we wanted to understand was how to get a computer to correctly recognize what is in a given image. The rapid growth of the internet and digital cameras in the early 2000s led to an explosion in the number of digital images, giving rise to needs such as automatically cataloging photo collections and enabling users to search them, which required object recognition," Li Feifei says in the article.

 

Recognizing objects requires understanding what digital images represent in the visual world, but computers do not grasp these concepts on their own. To a computer, a digital image is nothing more than a collection of pixels and has no meaning.

 

Teaching a computer to recognize objects requires somehow having it connect each collection of numbers to a meaningful concept.
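As a toy illustration of this point (not an example from Li Feifei's article), here is how a tiny grayscale image looks to a computer: a grid of intensity numbers, with the meaningful concept existing only in the human-supplied label.

```python
# To a computer, a digital image is just numbers. This toy 4x4 grayscale
# "image" is a grid of pixel intensities (0 = black, 255 = white); the
# concept it depicts exists only in the label a human attaches to it.
image = [
    [  0,   0,   0,   0],
    [  0, 255, 255,   0],
    [  0, 255, 255,   0],
    [  0,   0,   0,   0],
]
label = "white square"  # the meaningful concept behind the pixels

# Object recognition means learning the mapping from pixel grids to labels.
pixel_count = sum(len(row) for row in image)
bright_pixels = sum(value == 255 for row in image for value in row)
```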

 

Computers learn by being exposed to examples, which is the essence of machine learning. Specifically, this means that significant progress in object recognition can only be made by accessing large, diverse, high-quality training data.
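The idea of learning from examples can be sketched with one of the simplest possible learners - a nearest-neighbour classifier (an illustrative toy, not the method used by Li Feifei's team). It labels a new image with the label of the most similar training example, so its behaviour is determined entirely by the data it is shown.

```python
# Toy sketch of learning from labeled examples: a nearest-neighbour
# classifier, whose "knowledge" is nothing but its training data.
def distance(a, b):
    """Squared Euclidean distance between two flattened pixel lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest_neighbour(training_data, query):
    """training_data: list of (pixels, label) pairs.
    Returns the label of the training example closest to the query."""
    _, best_label = min(training_data, key=lambda ex: distance(ex[0], query))
    return best_label

# Tiny training set: bright patches labelled "day", dark patches "night".
training_data = [([250, 240, 245], "day"), ([10, 5, 20], "night"),
                 ([230, 220, 235], "day"), ([30, 15, 25], "night")]
```

Because the classifier has no knowledge beyond its examples, its quality rises and falls with the scale, diversity, and quality of the data, which is exactly the motivation behind ImageNet's design goals.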

As a result, Li Feifei et al. created a dataset called ImageNet in 2009 to achieve the following three design goals: scale (large amounts of data), diversity (richly varied objects), and quality (high-resolution, accurately labeled objects).

 

"In focusing on these three goals, we moved away from the general 'North Star' (image recognition) to a more specific problem formulation," Li Feifei said.

 

ImageNet contains tens of millions of labeled images that can be used to train machine learning models. Today, algorithms descended from ImageNet power image search on the Internet and the automatic grouping of photos by face on our smartphones.

 

In addition, the researchers have made ImageNet open source and free to use. They have also created the ImageNet Large Scale Visual Recognition Challenge (the ImageNet Challenge).

 

Notably, at the 2012 ImageNet Challenge, a team applied convolutional neural networks - an algorithm inspired by the way the human brain works - to object recognition for the first time, achieving an error rate roughly 41% lower than that of the second-place entry. In 2015, machines recognized images with an accuracy of 97.3%, surpassing human recognition (which is around 95% accurate).

 

Although neural networks had existed for decades as a machine learning method, it was not until the 2012 ImageNet Challenge that they became widely used; within a year, almost every AI paper involved neural networks, and large tech companies such as Google and Meta (formerly Facebook) began deploying neural network-based technologies.

 

There are, moreover, important similarities between object recognition and other computer vision tasks, such as object detection and activity recognition.

 

This similarity means that computers do not need to start from scratch when processing new tasks. In theory, a computer should be able to exploit these similarities and apply what it has learned from one task to perform a slightly different one. For both computers and humans, this process of generalizing knowledge from one task to a similar one is known as transfer learning. For example, a person who has learned French will find it relatively easy to learn Spanish. Indeed, the ability to spot similarities between tasks and to use this shared knowledge to help us learn new tasks is one of the hallmarks of human intelligence.

 

One way in which computers perform transfer learning is through pre-training. That is, before giving a machine learning model a new challenge, it is first trained to do something similar using existing labeled data. Today, almost every computer vision approach uses models pre-trained on ImageNet. Object detection was the first attempt to apply ImageNet data to uses other than object recognition.
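The pre-training idea can be sketched in a few lines (a toy stand-in, not an actual ImageNet pipeline): the "pretrained" feature extractor below is frozen, and only a small new classifier head is trained on the new task's few examples.

```python
# Toy sketch of transfer learning via pre-training: reuse a frozen
# feature extractor, train only a small new head on the new task.
def pretrained_features(image):
    """Stand-in for a network pre-trained on a large dataset such as
    ImageNet: its weights are frozen, and it maps raw pixels to a small
    feature vector (here: normalized mean and peak brightness)."""
    return [sum(image) / len(image) / 255.0, max(image) / 255.0]

def train_head(examples, epochs=10, lr=1.0):
    """Train only the new task head (a simple perceptron) on top of the
    frozen pretrained features; the extractor itself is never updated."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for image, label in examples:  # label is +1 or -1
            f = pretrained_features(image)
            if (w[0] * f[0] + w[1] * f[1] + b) * label <= 0:  # mistake
                w[0] += lr * label * f[0]
                w[1] += lr * label * f[1]
                b += lr * label
    return w, b

def predict(w, b, image):
    f = pretrained_features(image)
    return 1 if w[0] * f[0] + w[1] * f[1] + b > 0 else -1

# New task with only a few labeled examples: bright (+1) vs dark (-1).
examples = [([200, 220, 210, 230], 1), ([10, 30, 20, 25], -1),
            ([180, 190, 200, 210], 1), ([40, 20, 35, 15], -1)]
w, b = train_head(examples)
```

Because the heavy lifting was done during pre-training, the new task needs only a handful of examples and a tiny amount of training, which is why pre-trained ImageNet models became the starting point for so many vision systems.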

 

There are far wider applications for computer vision (or visual intelligence): doctors can use it to help diagnose and treat patients; machine learning can assess crop yields and monitor the environment and climate change by analyzing large volumes of satellite imagery; and scientists can discover new species, better materials, and unknown frontiers with the help of machines.


Finally, in the field of computer vision, what are the next "North Stars"?

 

Li Feifei said that one of the biggest is embodied AI: tangible, intelligent machines that move through space and perform tasks such as navigation and manipulation, from humanoid robots and robotic vacuum cleaners to factory robotic arms, self-driving cars, and more.

 

She also spoke of "another one, visual reasoning. For example, the understanding of 3D relationships in a 2D scene. Asking an AI to perform a simple task like moving a glass of water on the table to the right side of a plate also requires visual reasoning. Beyond this, understanding human social relationships and intentions is more complex, and basic social intelligence is another key issue. For example, if a woman is holding a little girl on her lap, it is easy to guess that the two could be mother and daughter, but it is still difficult for a computer to determine this type of situation."

