AI Masters Language Learning With Just 1% of A Child’s Experience

A team of researchers at New York University has achieved a remarkable breakthrough in the field of artificial intelligence by training a multimodal AI system using a fraction of the data typically deemed necessary.

Traditionally, AI models like GPT-4 learn from enormous datasets, often on the order of trillions of words. This experiment instead used only the input received by a single child between six months and roughly two years of age, challenging the conventional belief that AI requires large amounts of data to learn language.

The NYU team used headcam recordings of a child, referred to as Sam, made between six months and two years of age. Although the footage covered only about 1% of the child’s waking hours, the AI model trained on it learned a substantial number of words and concepts.

Wai Keen Vong, a research scientist at NYU’s Center for Data Science and the study’s lead author, says, “Our results demonstrate how recent algorithmic advances paired with one child’s naturalistic experience has the potential to reshape our understanding of early language and concept acquisition.”


Child-Like Learning Process


To understand the child’s language acquisition process, the NYU researchers equipped Sam with a head-mounted camera capturing his experiences from six to 25 months.

This unique dataset, spanning over 60 hours of footage, provided a valuable glimpse into the child’s world. The footage covered various activities such as mealtimes, reading books, and playtime, presenting a rich source for AI learning.
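The "about 1%" figure can be sanity-checked with some back-of-the-envelope arithmetic. The sketch below is illustrative only: the recording window (six to 25 months) and the roughly 60 hours of footage come from the article, while the figure of 12 waking hours per day is an assumption made here, not a number reported by the researchers.

```python
# Rough check of the "about 1% of waking hours" figure.
# ASSUMPTION: about 12 waking hours per day (not stated in the article).
MONTHS_RECORDED = 25 - 6      # recording window from the article: 6 to 25 months
DAYS_PER_MONTH = 30.4         # average month length
WAKING_HOURS_PER_DAY = 12     # assumed value, for illustration only
FOOTAGE_HOURS = 60            # article: "over 60 hours of footage"


def footage_share(footage_hours, months, waking_hours_per_day,
                  days_per_month=DAYS_PER_MONTH):
    """Return the footage as a fraction of all waking hours in the window."""
    total_waking_hours = months * days_per_month * waking_hours_per_day
    return footage_hours / total_waking_hours


share = footage_share(FOOTAGE_HOURS, MONTHS_RECORDED, WAKING_HOURS_PER_DAY)
print(f"{share:.1%}")  # prints 0.9%
```

Under these assumptions the footage comes to a little under 1% of Sam's waking hours, which is in line with the article's figure.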

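The researchers employed a multimodal neural network with separate vision and language encoders. The network was trained with contrastive learning, an approach that pulls together the representations of a video frame and the speech that accompanied it while pushing apart mismatched pairs. This allowed the model to link words with their visual counterparts.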

According to Brenden Lake, an assistant professor at NYU, “Combining these cues is what enables contrastive learning to gradually determine which words belong with which visuals and to capture the learning of a child’s first words.”
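To make the idea of contrastive learning more concrete, here is a minimal sketch of a symmetric contrastive loss over paired frame and utterance embeddings. It is not the NYU team's implementation. The random vectors stand in for the outputs of the vision and language encoders, and the function names and the temperature value of 0.07 are assumptions chosen for illustration.

```python
import math
import random


def normalize(v):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def log_softmax_at(scores, target):
    """Log-probability of scores[target] under a softmax over scores."""
    top = max(scores)
    log_total = top + math.log(sum(math.exp(s - top) for s in scores))
    return scores[target] - log_total


def contrastive_loss(image_embs, text_embs, temperature=0.07):
    """Symmetric contrastive loss; image k is assumed to pair with text k."""
    images = [normalize(v) for v in image_embs]
    texts = [normalize(v) for v in text_embs]
    n = len(images)
    # sims[r][c] = scaled similarity between image r and text c
    sims = [[dot(i, t) / temperature for t in texts] for i in images]
    image_to_text = -sum(log_softmax_at(sims[k], k) for k in range(n)) / n
    columns = [[sims[r][c] for r in range(n)] for c in range(n)]
    text_to_image = -sum(log_softmax_at(columns[k], k) for k in range(n)) / n
    return (image_to_text + text_to_image) / 2


# Toy data (hypothetical): random vectors in place of real encoder outputs.
random.seed(0)
frames = [[random.gauss(0, 1) for _ in range(16)] for _ in range(8)]
matched = [[x + 0.1 * random.gauss(0, 1) for x in f] for f in frames]
mismatched = matched[::-1]  # reversing the order breaks every pairing

matched_loss = contrastive_loss(frames, matched)
mismatched_loss = contrastive_loss(frames, mismatched)
print(matched_loss < mismatched_loss)
```

The loss is low when each frame is most similar to its own utterance and high when the pairings are scrambled. Training adjusts the encoders to reduce this loss, which is how co-occurring words and visuals gradually end up close together.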

The model, after training, showcased the ability to associate words with images, mimicking the early language learning process of a child.


Word Learning Feasibility In AI


The findings of this study challenge the conventional belief that AI requires massive datasets for effective language learning. By training an AI model on the naturalistic input received by a single child, the researchers demonstrated the feasibility of word learning with minimal input.

Lake says, “These findings suggest that this aspect of word learning is feasible from the kind of naturalistic data that children receive while using relatively generic learning mechanisms such as those found in neural networks.”

The implications extend beyond AI language learning. The study prompts a reevaluation of classic debates about what children need in order to learn words: language-specific biases, innate knowledge, or simply associative learning. It shows how much can be learned from the child’s own perspective with relatively generic mechanisms.


Linking Visual And Linguistic Cues


The study shows the importance of associative learning in both children and AI. When a parent speaks in the presence of a child, some of the words are likely to refer to something the child can see. The model exploits this co-occurrence to create associations between linguistic and visual elements. Vong explains, “This provides the model a clue as to which words should be associated with which objects.”

The model, through contrastive learning, learns to determine which visuals correspond to specific words, mirroring the way children comprehend language through linking visual and linguistic cues.

The results of the evaluation further supported the effectiveness of this approach, showing the model’s ability to generalise learned words to different visual instances. This aspect of generalisation is a crucial element in both AI and child language learning.
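The article does not describe the evaluation protocol in detail. One common way to test this kind of generalisation, assumed here for illustration, is a forced-choice trial: the model is given a word and several candidate images it has not seen before, and it picks the image whose embedding is most similar to the word's embedding. The three-dimensional toy embeddings and all names below are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def pick_image(word_emb, candidate_image_embs):
    """Return the index of the candidate image most similar to the word."""
    scores = [cosine(word_emb, img) for img in candidate_image_embs]
    return max(range(len(scores)), key=scores.__getitem__)


def accuracy(trials):
    """Fraction of trials in which the top-scoring image is the target."""
    correct = sum(pick_image(word, candidates) == target
                  for word, candidates, target in trials)
    return correct / len(trials)


# Hypothetical toy embeddings; a trained model would produce these.
ball_word = [1.0, 0.1, 0.0]
cat_word = [0.0, 1.0, 0.2]
new_ball_photo = [0.9, 0.2, 0.1]   # a ball image not seen in training
new_cat_photo = [0.1, 0.8, 0.3]    # a cat image not seen in training
car_photo = [0.0, 0.1, 1.0]        # distractor

# Each trial: (word embedding, candidate images, index of the correct image)
trials = [
    (ball_word, [new_cat_photo, new_ball_photo, car_photo], 1),
    (cat_word, [new_cat_photo, car_photo, new_ball_photo], 0),
]
print(accuracy(trials))  # prints 1.0
```

Accuracy above chance on images that differ from the training footage would indicate that the model has learned something about the word's category rather than memorising particular frames.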


Towards More Human-Like AI Abilities


The researchers acknowledge that further work is needed before AI models closely replicate the language learning abilities of children. Lake notes, “There’s more work to be done to try to get a model with fully two-year-old-like abilities.”

This might involve providing additional data or incorporating elements such as parental gaze or an understanding of object solidity to enhance the model’s learning.

Achieving AI models with more human-like learning capabilities could lead to improved language understanding, responsiveness to new situations, and the ability to learn from diverse experiences.

As Howard Shrobe from the Defense Advanced Research Projects Agency (DARPA) notes, “AI that could learn like a child might be capable of understanding meaning, responding to new situations, and learning from new experiences.” The goal is to bring AI one step closer to human intelligence, using child-like learning as a model for future AI development.