Close Menu
    DevStackTipsDevStackTips
    • Home
    • News & Updates
      1. Tech & Work
      2. View All

      The Case For Minimal WordPress Setups: A Contrarian View On Theme Frameworks

      June 4, 2025

      How To Fix Largest Contentful Paint Issues With Subpart Analysis

      June 4, 2025

      How To Prevent WordPress SQL Injection Attacks

      June 4, 2025

      Smashing Animations Part 4: Optimising SVGs

      June 4, 2025

      I test AI tools for a living. Here are 3 image generators I actually use and how

      June 4, 2025

      The world’s smallest 65W USB-C charger is my latest travel essential

      June 4, 2025

      This Spotlight alternative for Mac is my secret weapon for AI-powered search

      June 4, 2025

      Tech prophet Mary Meeker just dropped a massive report on AI trends – here’s your TL;DR

      June 4, 2025
    • Development
      1. Algorithms & Data Structures
      2. Artificial Intelligence
      3. Back-End Development
      4. Databases
      5. Front-End Development
      6. Libraries & Frameworks
      7. Machine Learning
      8. Security
      9. Software Engineering
      10. Tools & IDEs
      11. Web Design
      12. Web Development
      13. Web Security
      14. Programming Languages
        • PHP
        • JavaScript
      Featured

      Beyond AEM: How Adobe Sensei Powers the Full Enterprise Experience

      June 4, 2025
      Recent

      Beyond AEM: How Adobe Sensei Powers the Full Enterprise Experience

      June 4, 2025

      Simplify Negative Relation Queries with Laravel’s whereDoesntHaveRelation Methods

      June 4, 2025

      Cast Model Properties to a Uri Instance in 12.17

      June 4, 2025
    • Operating Systems
      1. Windows
      2. Linux
      3. macOS
      Featured

      My Favorite Obsidian Plugins and Their Hidden Settings

      June 4, 2025
      Recent

      My Favorite Obsidian Plugins and Their Hidden Settings

      June 4, 2025

      Rilasciata /e/OS 3.0: Nuova Vita per Android Senza Google, Più Privacy e Controllo per l’Utente

      June 4, 2025

      Rilasciata Oracle Linux 9.6: Scopri le Novità e i Miglioramenti nella Sicurezza e nelle Prestazioni

      June 4, 2025
    • Learning Resources
      • Books
      • Cheatsheets
      • Tutorials & Guides
    Home»Development»Artificial Intelligence»Like human brains, large language models reason about diverse data in a general way

    Like human brains, large language models reason about diverse data in a general way

    February 19, 2025

    While early language models could only process text, contemporary large language models now perform highly diverse tasks on different types of data. For instance, LLMs can understand many languages, generate computer code, solve math problems, or answer questions about images and audio.   

    MIT researchers probed the inner workings of LLMs to better understand how they process such assorted data, and found evidence that they share some similarities with the human brain.

    Neuroscientists believe the human brain has a “semantic hub” in the anterior temporal lobe that integrates semantic information from various modalities, like visual data and tactile inputs. This semantic hub is connected to modality-specific “spokes” that route information to the hub. The MIT researchers found that LLMs use a similar mechanism by abstractly processing data from diverse modalities in a central, generalized way. For instance, a model that has English as its dominant language would rely on English as a central medium to process inputs in Japanese or reason about arithmetic, computer code, etc. Furthermore, the researchers demonstrate that they can intervene in a model’s semantic hub by using text in the model’s dominant language to change its outputs, even when the model is processing data in other languages.

    These findings could help scientists train future LLMs that are better able to handle diverse data.

    “LLMs are big black boxes. They have achieved very impressive performance, but we have very little knowledge about their internal working mechanisms. I hope this can be an early step to better understand how they work so we can improve upon them and better control them when needed,” says Zhaofeng Wu, an electrical engineering and computer science (EECS) graduate student and lead author of a paper on this research.

    His co-authors include Xinyan Velocity Yu, a graduate student at the University of Southern California (USC); Dani Yogatama, an associate professor at USC; Jiasen Lu, a research scientist at Apple; and senior author Yoon Kim, an assistant professor of EECS at MIT and a member of the Computer Science and Artificial Intelligence Laboratory (CSAIL). The research will be presented at the International Conference on Learning Representations.

    Integrating diverse data

    The researchers based the new study upon prior work which hinted that English-centric LLMs use English to perform reasoning processes on various languages.

    Wu and his collaborators expanded this idea, launching an in-depth study into the mechanisms LLMs use to process diverse data.

    An LLM, which is composed of many interconnected layers, splits input text into words or sub-words called tokens. The model assigns a representation to each token, which enables it to explore the relationships between tokens and generate the next word in a sequence. In the case of images or audio, these tokens correspond to particular regions of an image or sections of an audio clip.

    The researchers found that the model’s initial layers process data in its specific language or modality, like the modality-specific spokes in the human brain. Then, the LLM converts tokens into modality-agnostic representations as it reasons about them throughout its internal layers, akin to how the brain’s semantic hub integrates diverse information.

    The model assigns similar representations to inputs with similar meanings, despite their data type, including images, audio, computer code, and arithmetic problems. Even though an image and its text caption are distinct data types, because they share the same meaning, the LLM would assign them similar representations.

    For instance, an English-dominant LLM “thinks” about a Chinese-text input in English before generating an output in Chinese. The model has a similar reasoning tendency for non-text inputs like computer code, math problems, or even multimodal data.

    To test this hypothesis, the researchers passed a pair of sentences with the same meaning but written in two different languages through the model. They measured how similar the model’s representations were for each sentence.

    Hostinger

    Then they conducted a second set of experiments where they fed an English-dominant model text in a different language, like Chinese, and measured how similar its internal representation was to English versus Chinese. The researchers conducted similar experiments for other data types.

    They consistently found that the model’s representations were similar for sentences with similar meanings. In addition, across many data types, the tokens the model processed in its internal layers were more like English-centric tokens than the input data type.

    “A lot of these input data types seem extremely different from language, so we were very surprised that we can probe out English-tokens when the model processes, for example, mathematic or coding expressions,” Wu says.

    Leveraging the semantic hub

    The researchers think LLMs may learn this semantic hub strategy during training because it is an economical way to process varied data.

    “There are thousands of languages out there, but a lot of the knowledge is shared, like commonsense knowledge or factual knowledge. The model doesn’t need to duplicate that knowledge across languages,” Wu says.

    The researchers also tried intervening in the model’s internal layers using English text when it was processing other languages. They found that they could predictably change the model outputs, even though those outputs were in other languages.

    Scientists could leverage this phenomenon to encourage the model to share as much information as possible across diverse data types, potentially boosting efficiency.

    But on the other hand, there could be concepts or knowledge that are not translatable across languages or data types, like culturally specific knowledge. Scientists might want LLMs to have some language-specific processing mechanisms in those cases.

    “How do you maximally share whenever possible but also allow languages to have some language-specific processing mechanisms? That could be explored in future work on model architectures,” Wu says.

    In addition, researchers could use these insights to improve multilingual models. Often, an English-dominant model that learns to speak another language will lose some of its accuracy in English. A better understanding of an LLM’s semantic hub could help researchers prevent this language interference, he says.

    “Understanding how language models process inputs across languages and modalities is a key question in artificial intelligence. This paper makes an interesting connection to neuroscience and shows that the proposed ‘semantic hub hypothesis’ holds in modern language models, where semantically similar representations of different data types are created in the model’s intermediate layers,” says Mor Geva Pipek, an assistant professor in the School of Computer Science at Tel Aviv University, who was not involved with this work. “The hypothesis and experiments nicely tie and extend findings from previous works and could be influential for future research on creating better multimodal models and studying links between them and brain function and cognition in humans.”

    This research is funded, in part, by the MIT-IBM Watson AI Lab.

    Source: Read More 

    Facebook Twitter Reddit Email Copy Link
    Previous ArticleMicrosoft’s secret Windows 11 tool lets you move data between two PCs
    Next Article MIT spinout maps the body’s metabolites to uncover the hidden drivers of disease

    Related Posts

    Artificial Intelligence

    Markus Buehler receives 2025 Washington Award

    June 4, 2025
    Artificial Intelligence

    3 Questions: Visualizing research in the age of AI

    June 4, 2025
    Leave A Reply Cancel Reply

    Continue Reading

    “Create a replica of this image. Don’t change anything” AI trend takes off

    Artificial Intelligence

    mogwai – view library for creating GUI applications

    Development

    Alibaba Qwen Team just Released ‘Lessons of Developing Process Reward Models in Mathematical Reasoning’ along with a State-of-the-Art 7B and 72B PRMs

    Machine Learning

    CVE-2025-28202 – Victure RX1800 SSH/Telnet Authentication Bypass Vulnerability

    Common Vulnerabilities and Exposures (CVEs)

    Highlights

    Development

    A Comprehensive Overview of Data Engineering Pipeline Tools

    June 13, 2024

    The paper “A Survey of Pipeline Tools for Data Engineering” thoroughly examines various pipeline tools…

    ChatGPT will remember everything you tell it now – like a real personal assistant

    April 10, 2025

    CVE-2025-1093 – WordPress AIHub Theme Remote Code Execution File Upload Vulnerability

    April 22, 2025

    CVE-2025-40633 – Koibox Stored Cross-Site Scripting (XSS)

    May 20, 2025
    © DevStackTips 2025. All rights reserved.
    • Contact
    • Privacy Policy

    Type above and press Enter to search. Press Esc to cancel.