[TOC]
- Title: Dissociating Language and Thought in Large Language Models: A Cognitive Perspective
- Author: Kyle Mahowald et al.
- Publish Date: 16 Jan 2023
- Review Date: Tue, Jan 31, 2023
- url: https://arxiv.org/pdf/2301.06627.pdf
Summary of paper
Motivation
- the authors challenge the “good at language $\implies$ good at thought” fallacy.
- they also challenge the converse fallacy: “bad at thought $\implies$ bad at language”.
Contribution
- the authors argue that LLMs hold promise as scientific models of one piece of the human cognitive toolbox – formal language processing – but fall short of modelling human thought.
- in Section 4, the paper considers several domains required for functional linguistic competence: formal reasoning, world knowledge, situation modelling, and social cognitive abilities.
Some key terms
deep learning models in linguistics
- The view that deep learning models are not of scientific interest remains common in linguistics and psycholinguistics.
two kinds of linguistic competence
- formal linguistic competence (the knowledge of rules and statistical regularities of language)
- example problem: deciding what counts as a valid string of a language
- functional linguistic competence (the ability to use language in the real world, which often draws on non-linguistic capacities)
separated machinery
- in the human brain, the machinery dedicated to processing language is separate from the machinery responsible for memory, reasoning, and social skills.
functional linguistic competence
- a formal language system in isolation is useless to a language user unless it can interface with the rest of perception, cognition, and action.
- despite a nearly complete loss of linguistic abilities, some individuals with severe aphasia retain intact non-linguistic cognitive abilities.
LLMs learn hierarchical structure
- we review evidence that LLMs learn two features that are argued by many to be central to human linguistic processing: hierarchical structure and abstraction. Both of these features address primarily the syntactic aspect of formal linguistic competence.
- In human languages, words combine to make compositional meanings. When a sentence has multiple words, their meanings do not simply get added linearly one by one; instead, they are combined hierarchically.
- some approaches have shown that distances between LLM representations of individual words in a sentence align with the sentence’s hierarchical structure rather than with the linear distance between the words, thereby recovering the close structural relationship between the subject and verb of a sentence even when they are linearly far apart (see the agreement probe sketched below).
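A minimal illustration of this kind of hierarchical-agreement probing, assuming the Hugging Face `transformers` library and `bert-base-uncased`; the sentence and probe are our own toy example, not an evaluation from the paper. The hierarchically related subject (“keys”) is plural while the linearly closer distractor noun (“cabinet”) is singular, so a model that tracks structure should prefer “are”.

```python
# Toy probe (our illustration, not from the paper): does a masked LM agree the verb
# with the hierarchical subject ("keys") rather than the nearer noun ("cabinet")?
from transformers import pipeline

fill = pipeline("fill-mask", model="bert-base-uncased")

sentence = "The keys to the cabinet [MASK] on the table."
scores = {r["token_str"]: r["score"] for r in fill(sentence, targets=["are", "is"])}

print(scores)
# A higher score for "are" than for "is" suggests the model uses the hierarchical
# subject-verb relation rather than linear proximity.
```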
LLMs learn abstraction
- overall, it seems clear that LLMs achieve at least some degree of abstraction. The degree of that abstraction remains a matter of debate.
LLMs are great at pretending to think
- language models trained on gigantic text corpora acquire large amounts of factual knowledge, succeed at some types of mathematical reasoning and reproduce many stereotypes and social biases.
- how LLMs fail: any test of LLMs’ ability to reason must account for their ability to exploit word co-occurrence patterns to “hack” the task (a toy illustration of such a shallow heuristic follows below).
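A toy sketch of the kind of co-occurrence heuristic the authors warn about; the corpus, question, and candidate answers are invented for illustration and are not from the paper. If a baseline like this scores well on a benchmark, the benchmark is not isolating reasoning from surface statistics.

```python
# Toy co-occurrence "hack" (our illustration, not from the paper): pick the candidate
# answer whose words co-occur most often with the question words in a reference corpus.
from collections import Counter
from itertools import product

corpus = [
    "birds can fly and planes can fly",
    "fish live in water",
]

# count within-line word-pair co-occurrences
cooc = Counter()
for line in corpus:
    words = line.split()
    for a, b in product(words, repeat=2):
        cooc[(a, b)] += 1

def cooccurrence_score(question: str, candidate: str) -> int:
    return sum(cooc[(q, c)] for q, c in product(question.split(), candidate.split()))

question = "can fish fly"
candidates = ["fish cannot fly", "fish can fly"]
print(max(candidates, key=lambda c: cooccurrence_score(question, c)))
# prints "fish can fly": the wrong answer wins purely because "can" and "fly"
# co-occur often in the corpus -- no reasoning about fish is involved.
```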
Situation modelling
- people can easily follow the plot of a story that spans multiple chapters or, sometimes, multiple book volumes. We can also have a three-hour conversation with a friend, and the next day the friend will expect us to remember most of what was said. We accomplish this impressive feat not by having a dedicated memory slot for every word that we read or heard, but by abstracting linguistic information away into a situation model – a mental model of the entities, the relations between them, and the sequence of states they have been in (a toy sketch of such a model follows below).
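A toy data-structure sketch of what a situation model tracks; the class and its fields are our own illustration of the concept, not a mechanism proposed in the paper.

```python
# Toy situation model (our illustration): keep entities, relations, and a sequence of
# states, rather than a verbatim memory slot for every word that was read or heard.
from dataclasses import dataclass, field

@dataclass
class SituationModel:
    entities: set = field(default_factory=set)    # who/what is in the situation
    relations: set = field(default_factory=set)   # (subject, relation, object) triples
    states: list = field(default_factory=list)    # time-ordered (entity, state) pairs

    def update(self, entity: str, state: str) -> None:
        self.entities.add(entity)
        self.states.append((entity, state))

model = SituationModel()
model.update("friend", "met us at the cafe")
model.relations.add(("friend", "talked_with", "us"))
model.update("friend", "went home")

# the next day we recall the gist (entities, relations, latest states), not every word
print(model.entities, model.relations, model.states[-1])
```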
Social reasoning (pragmatics and intent)
- work in cognitive science and linguistics has come to recognise that these kinds of grounded, context-dependent aspects of language are not peripheral but a central part of human language production and understanding.
- LLMs struggle on theory-of-mind tasks, which require inferring the intentions behind others’ actions.
- moreover, LLMs themselves lack communicative intent.
Solution
Architectural modularity and emergent modularity approach
- A modular language model architecture is much better aligned with the fact that real-life language use is a complex capability, requiring both language-specific knowledge (formal competence) and various non-language-specific cognitive abilities (functional competence). Whether built in or induced to emerge, modularity can lead the models to mirror the functional organization of the human brain and, consequently, make their behavior much more humanlike (a minimal architectural sketch follows below).
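A minimal sketch, assuming PyTorch, of what an architecturally modular model could look like: a core language encoder (formal competence) whose representations are routed to separate non-linguistic modules (functional competence). The module names, sizes, and routing scheme are our own illustrative assumptions, not the paper’s concrete proposal.

```python
# Illustrative modular model (our sketch, not the paper's architecture): a shared
# language encoder feeds separate modules for reasoning, memory, and social cognition.
import torch
import torch.nn as nn

class ModularLM(nn.Module):
    def __init__(self, vocab_size=32000, d_model=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        # core language module: formal linguistic competence
        self.language_encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True),
            num_layers=2,
        )
        # non-linguistic modules: functional competence
        self.specialists = nn.ModuleDict({
            "reasoning": nn.Linear(d_model, d_model),
            "memory": nn.GRU(d_model, d_model, batch_first=True),
            "social": nn.Linear(d_model, d_model),
        })

    def forward(self, token_ids, route="reasoning"):
        h = self.language_encoder(self.embed(token_ids))
        out = self.specialists[route](h)
        return out[0] if isinstance(out, tuple) else out  # nn.GRU returns (output, hidden)

tokens = torch.randint(0, 32000, (1, 16))
print(ModularLM()(tokens, route="memory").shape)  # torch.Size([1, 16, 256])
```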
Curated data and diverse objective functions
- We believe that a model that succeeds at real-world language use would include, in addition to the core language component, a successful problem solver, a grounded experiencer, a situation modeler, a pragmatic reasoner, and a goal setter.
- the machinery required to simulate intelligence will include both domain-general components and domain-specific components (such as navigation and social reasoning). This modularity could be baked in by training modular models on a mixture of carefully curated datasets using diverse objective functions (e.g., how ChatGPT combines a pure language modelling objective with an additional human-feedback objective); a toy multi-objective training loop is sketched below.
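A toy sketch, assuming PyTorch, of training a single model on a mixture of datasets with different objective functions. The dataset names, loss functions, weights, and the tiny stand-in model are illustrative placeholders, not the paper’s recipe.

```python
# Toy multi-objective training loop (our illustration): one model, several curated
# data sources, each with its own objective and mixing weight.
import torch
import torch.nn as nn

model = nn.Linear(16, 16)                      # stand-in for a full LLM
optimizer = torch.optim.Adam(model.parameters())

objectives = [
    # (dataset name, loss function, mixing weight) -- all placeholders
    ("language_modelling", nn.CrossEntropyLoss(), 1.0),
    ("reasoning_tasks",    nn.MSELoss(),          0.5),
    ("dialogue_feedback",  nn.MSELoss(),          0.5),
]

for step in range(3):
    optimizer.zero_grad()
    total_loss = 0.0
    for name, loss_fn, weight in objectives:
        x = model(torch.randn(8, 16))          # placeholder batch from this dataset
        if isinstance(loss_fn, nn.CrossEntropyLoss):
            target = torch.randint(0, 16, (8,))  # next-token-style class targets
        else:
            target = torch.randn(8, 16)          # task-specific regression targets
        total_loss = total_loss + weight * loss_fn(x, target)
    total_loss.backward()
    optimizer.step()
```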