Vision Is at the Heart of AI's Evolution Toward Intelligence

Stanford's AI Legacy
Computer vision is one of the fundamental areas of AI research and has contributed to huge advances in areas such as deep learning.
According to Li Feifei, a professor in the Department of Computer Science at Stanford University, almost all of these advances have relied on the pursuit of the 'North Star', which refers to a key problem in scientific research.
In a recent article in Daedalus, entitled "Searching for Computer Vision North Stars", she explains the latest developments in object recognition in computer vision, gives a brief history of the ImageNet dataset, and reviews related work.
Figure | Li Feifei (second from right, front row) and her team (Source: Stanford University)
According to Li Feifei, the formulation of key questions will advance the development of computer vision, and indeed the AI field as a whole. As Albert Einstein said, "It is often more important to ask a question than to solve a problem."
The AI field is currently evolving rapidly, with successful implementations of AI everywhere, from spam filters to personalized retail to autonomous driving.
But it is not always obvious which scientific problems lie behind these applications, or which of them most need solving. Once a fundamental problem in a field is formulated (that is, once a 'North Star' is identified), it can drive the field forward in leaps and bounds.
Li Feifei mentioned that her research in computer vision has been driven by her own series of 'North Stars'.
The ability to see is at the heart of intelligence, just as the evolution of the eye has been key to the creation of many different species, including humans. Humans can use visual perception to make sense of the world and interact with it. So, how do you get AI to see? There are many questions to be addressed here, and the choice of fundamental questions is an important part of the scientific exploration of computer vision.
"Initially, how to get a computer to correctly recognize what is in a given image was a particular problem we wanted to understand. The rapid growth of the internet and digital cameras in the early 2000s led to an explosion in the number of digital images, giving rise to needs such as automatically cataloging collections of photographs and enabling users to search these collections, which required the use of object recognition," Li Feifei says in the article.
Recognizing objects requires an understanding of what digital images mean in the visual world, and computers cannot understand these concepts. To a computer, a digital image is nothing more than a collection of pixels and has no meaning.
Teaching a computer to recognize objects requires somehow having it connect each collection of numbers to a meaningful concept.
Computers make this connection by learning from examples, which is the essence of machine learning. In practice, this means that significant progress in object recognition can only be made with access to large, diverse, high-quality training data.
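To make the idea concrete, here is a minimal sketch in Python of what "connecting a collection of numbers to a concept" can look like. It is not the method used for ImageNet or described in the article: the tiny 2x2 "images", the labels, and the nearest-prototype rule are all hypothetical choices made for illustration.

```python
# Toy illustration: to a computer, an image is only a grid of numbers.
# The images, labels, and the nearest-prototype rule are hypothetical.

def flatten(image):
    """Turn a 2D grid of pixel intensities into a flat list of numbers."""
    return [pixel for row in image for pixel in row]

def mean_vector(vectors):
    """Average several equal-length lists of numbers, position by position."""
    n = len(vectors)
    return [sum(values) / n for values in zip(*vectors)]

def train(examples):
    """Learn one average 'prototype' per label from (image, label) pairs."""
    by_label = {}
    for image, label in examples:
        by_label.setdefault(label, []).append(flatten(image))
    return {label: mean_vector(vs) for label, vs in by_label.items()}

def predict(prototypes, image):
    """Return the label whose prototype is closest to the image's pixels."""
    pixels = flatten(image)

    def squared_distance(prototype):
        return sum((a - b) ** 2 for a, b in zip(pixels, prototype))

    return min(prototypes, key=lambda label: squared_distance(prototypes[label]))

# Labeled examples: the only place where "meaning" enters the system.
training_examples = [
    ([[0, 0], [255, 255]], "bright_bottom"),
    ([[10, 5], [240, 250]], "bright_bottom"),
    ([[255, 255], [0, 0]], "bright_top"),
    ([[250, 245], [5, 10]], "bright_top"),
]

prototypes = train(training_examples)
new_image = [[20, 30], [200, 220]]  # not seen during training
predicted_label = predict(prototypes, new_image)
print(predicted_label)
```

Even in this toy setting, the program knows nothing about what "bright" means. It only learns which patterns of numbers tend to go with which label, which is why the quantity and variety of labeled examples matter so much.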
As a result, Li Feifei et al. created a dataset called ImageNet in 2009 to achieve the following three design goals: scale (large amounts of data), diversity (richly varied objects), and quality (high-resolution, accurately labeled objects).
"In focusing on these three goals, we have moved away from the general 'North Star' (image recognition) to a more specific problem formulation," Li Feifei said.
ImageNet contains more than 14 million labeled images that can be used to train machine learning models. Today, we use algorithms related to ImageNet when searching for images on the Internet and when our smartphones automatically group photos based on the faces in them.
In addition, the researchers made ImageNet openly available and free to use. They also created the ImageNet Large Scale Visual Recognition Challenge (the ImageNet Challenge).
Notably, at the 2012 ImageNet Challenge, a team applied convolutional neural networks, an algorithm inspired by the way the human brain works, to object recognition for the first time, recognizing images 41% more accurately than the second-place finisher. In 2015, the winning models recognized images with an accuracy of 97.3%, surpassing human recognition (which is around 95% accurate).
Although neural networks have been around for decades as a method of machine learning, it wasn't until that year's ImageNet Challenge that they became widely used. Within a year, almost every AI paper was about neural networks, and large tech companies like Google and Meta (formerly Facebook) were deploying neural network-based technologies.
There are also important similarities between object recognition and other tasks in computer vision, such as object detection and activity recognition.
This similarity means that computers do not need to start from scratch to process new tasks. In theory, a computer should be able to exploit these similarities and apply what it has learned from one task to perform a slightly different one. For both computers and humans, this process of generalizing knowledge from one task to a similar task is known as transfer learning. For example, a person who has learned French will find it relatively easy to learn Spanish afterwards. Indeed, the ability to spot similarities between tasks and to use this shared knowledge to help us learn new tasks is one of the hallmarks of human intelligence.
One way in which computers perform transfer learning is through pre-training. That is, before a machine learning model is given a new challenge, it is first trained to do something similar using existing data. Today, almost every computer vision approach uses models pre-trained on ImageNet. Object detection was the first attempt to apply ImageNet data to uses other than object recognition.
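The benefit of pre-training can be sketched with a deliberately tiny example. This is only an illustration under stated assumptions, not how ImageNet models are trained: the "model" is a straight line fitted by plain gradient descent, and the two tasks, the learning rate, and the step counts are hypothetical.

```python
# Toy sketch of pre-training followed by fine-tuning.
# The model is a line y = w * x + b; both tasks are hypothetical.

def mse(w, b, data):
    """Mean squared error of the line (w, b) on a list of (x, y) pairs."""
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)

def train(w, b, data, steps, lr=0.05):
    """Run `steps` iterations of gradient descent starting from (w, b)."""
    n = len(data)
    for _ in range(steps):
        grad_w = 2 * sum((w * x + b - y) * x for x, y in data) / n
        grad_b = 2 * sum((w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

xs = [0.0, 1.0, 2.0, 3.0]
source_task = [(x, 2.0 * x + 1.0) for x in xs]  # the existing, related task
target_task = [(x, 2.0 * x + 1.5) for x in xs]  # the new, slightly different task

# Pre-training: learn the source task thoroughly.
pre_w, pre_b = train(0.0, 0.0, source_task, steps=2000)

# Fine-tuning: only a few steps on the new task, starting from the
# pre-trained parameters, compared with the same budget from scratch.
ft_w, ft_b = train(pre_w, pre_b, target_task, steps=5)
scratch_w, scratch_b = train(0.0, 0.0, target_task, steps=5)

loss_pretrained = mse(ft_w, ft_b, target_task)
loss_scratch = mse(scratch_w, scratch_b, target_task)
print(f"fine-tuned from pre-trained start: {loss_pretrained:.4f}")
print(f"trained from scratch:              {loss_scratch:.4f}")
```

With the same small training budget on the new task, the model that starts from pre-trained parameters ends with a lower error than the one that starts from zero, because most of what it needs was already learned on the related task.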
Computer vision (or visual intelligence) has much wider applications. For example, doctors can use computer vision to help them diagnose and treat patients; machine learning can be used to assess crop yields, the environment, and climate change by analyzing large amounts of satellite imagery; and scientists can discover new species, better materials, and unknown frontiers with the help of machines.
Finally, in the field of computer vision, what are the next "North Stars"?
Li Feifei said that one of the biggest is embodied AI: robots for tasks such as navigation and manipulation. This includes not only humanoid robots but any tangible, intelligent machine that moves through space, such as robotic vacuum cleaners, robotic arms in factories, and self-driving cars.
She also spoke of "another one, visual reasoning. For example, the understanding of 3D relationships in a 2D scene. Asking an AI to perform a simple task like moving a glass of water on the table to the right side of a plate also requires visual reasoning. Beyond this, understanding human social relationships and intentions is more complex, and basic social intelligence is another key issue. For example, if a woman is holding a little girl on her lap, it is easy to guess that the two could be mother and daughter, but it is still difficult for a computer to determine this type of situation."