🧪 MeTHanol: Modularized Thinking Language Models with Intermediate Layer Thinking, Decoding and Bootstrapping Reasoning

Ningyuan Xi¹, Xiaoyu Wang², Yetao Wu³, Teng Chen³, Qingqing Gu³, Luo Ji³
¹Beihang University, Beijing, China; ²Beijing Institute of Technology, Beijing, China; ³Geely AI Lab, Beijing, China
2025 International Joint Conference on Neural Networks (IJCNN)

Abstract

Current research focuses on enhancing the thinking and reasoning capabilities of large language models (LLMs) through prompting, data-driven emergence, and inference-time computation. In this study, we consider stimulating language models' thinking and cognitive abilities from a modular perspective that mimics the architecture of the human brain. We select a specific intermediate attention layer and equip it with newly implemented language heads. We conduct dual-layer fine-tuning with annotated (query, thought, response) samples and show that the intermediate layer can also learn to decode fluent and reasonable language tokens. We design a two-pass inference mechanism that first generates thoughts and then formal responses. We call the entire framework the modularized thinking language model (MeTHanol); it enhances LLMs' cognitive behaviors, as indicated by Theory of Mind (ToM) and vignette-based experiments. Case studies also show that MeTHanol can plan, self-reflect, and generate human-like thoughts and answers, even on unseen and open-domain tasks. MeTHanol can also adapt to a personalized prompt and behave as the specified character. Our study suggests that a modular perspective holds promise for significant cognitive gains.

Overview

MeTHanol paradigm overview

An overview of MeTHanol with modular correspondence to human brain architecture.

Framework

MeTHanol framework comparison

Comparison of the MeTHanol framework to standard LLM fine-tuning.
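
The two-pass inference described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the tiny random "model", the mean-pooling stand-in for attention, and all names (`THINK_LAYER`, `THINK_HEAD`, `RESPONSE_HEAD`, `two_pass_generate`) are hypothetical. It also assumes that the thought decoded from the intermediate layer is appended to the query as context for the second pass, which the page does not spell out.

```python
import math
import random

# Toy sizes; all of these are illustrative assumptions, not values from the paper.
VOCAB, DIM, N_LAYERS = 16, 8, 4
THINK_LAYER = 1   # hypothetical 0-based index of the intermediate "thinking" layer
EOS = 0           # hypothetical end-of-sequence token id

rng = random.Random(0)

def rand_matrix(rows, cols):
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]

def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]

EMBED = rand_matrix(VOCAB, DIM)
LAYERS = [rand_matrix(DIM, DIM) for _ in range(N_LAYERS)]
THINK_HEAD = rand_matrix(VOCAB, DIM)      # language head on the intermediate layer
RESPONSE_HEAD = rand_matrix(VOCAB, DIM)   # standard language head on the final layer

def hidden_states(tokens):
    """Return the hidden vector after every layer for the given context."""
    # Mean-pooled embeddings stand in for attention over the context.
    h = [sum(EMBED[t][i] for t in tokens) / len(tokens) for i in range(DIM)]
    states = []
    for layer in LAYERS:
        h = [math.tanh(x) for x in matvec(layer, h)]
        states.append(h)
    return states

def decode(context, layer_index, head, max_new_tokens):
    """Greedy decoding from the hidden state at layer_index through head."""
    out = []
    for _ in range(max_new_tokens):
        h = hidden_states(context + out)[layer_index]
        logits = matvec(head, h)
        token = max(range(VOCAB), key=logits.__getitem__)
        if token == EOS:
            break
        out.append(token)
    return out

def two_pass_generate(query, max_thought=5, max_response=5):
    # Pass 1: decode thought tokens from the intermediate layer.
    thought = decode(query, THINK_LAYER, THINK_HEAD, max_thought)
    # Pass 2: decode the formal response from the final layer,
    # conditioned on the query plus the generated thought (assumption).
    response = decode(query + thought, N_LAYERS - 1, RESPONSE_HEAD, max_response)
    return thought, response
```

Calling `two_pass_generate([3, 5, 7])` returns a `(thought, response)` pair of token-id lists. With random weights the tokens are meaningless; the point is only the control flow of decoding from two different depths.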

Training Result

MeTHanol training loss curves

Training loss curves and special case performance across training steps.
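
As a rough illustration of what dual-layer fine-tuning on (query, thought, response) samples could optimize, the sketch below sums a cross-entropy term on thought tokens (from the intermediate-layer head) and one on response tokens (from the final-layer head). The exact objective and any weighting are not given on this page, so `dual_layer_loss` and `think_weight` are hypothetical names and the equal-weight sum is an assumption.

```python
import math

def cross_entropy(logits, target):
    """Negative log-likelihood of target under softmax(logits)."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]

def mean_cross_entropy(logits_seq, targets):
    """Average cross-entropy over a sequence of positions."""
    losses = [cross_entropy(l, t) for l, t in zip(logits_seq, targets)]
    return sum(losses) / len(losses)

def dual_layer_loss(think_logits, thought_tokens,
                    response_logits, response_tokens,
                    think_weight=1.0):
    """Hypothetical combined objective: weighted thought loss plus response loss.

    think_logits    : per-position logits from the intermediate-layer head
    response_logits : per-position logits from the final-layer head
    """
    think = mean_cross_entropy(think_logits, thought_tokens)
    resp = mean_cross_entropy(response_logits, response_tokens)
    return think_weight * think + resp, think, resp
```

With uniform logits over a vocabulary of size V, each term equals log V, which gives a quick sanity check for the two components before plugging in real model outputs.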

Benchmark

Sally-Anne false-belief benchmark results

Fine-tuned results of Sally-Anne false-belief experiments. Values are percentages.

Vignette-based benchmark results

Zero-shot results of Vignette-based experiments. Values are percentages.

BibTeX

@INPROCEEDINGS{11229297,
  author={Xi, Ningyuan and Wang, Xiaoyu and Wu, Yetao and Chen, Teng and Gu, Qingqing and Ji, Luo},
  booktitle={2025 International Joint Conference on Neural Networks (IJCNN)},
  title={MeTHanol: Modularized Thinking Language Models with Intermediate Layer Thinking, Decoding and Bootstrapping Reasoning},
  year={2025},
  volume={},
  number={},
  pages={1-9},
  keywords={Training;Adaptation models;Inference mechanisms;Large language models;MIMICs;Computer architecture;Oral communication;Cognition;Decoding;Methanol;modularity;LLM;latent space;reasoning},
  doi={10.1109/IJCNN64981.2025.11229297}}