🧠 Mapping the Minds of LLMs: A Graph-Based Analysis of Reasoning LLM

1University of Southern California, 2The University of Queensland,
3University of California, San Diego, 4University of California, Merced
*Corresponding Author
A conceptual overview of our framework for modeling the long reasoning CoT with a graph structure.

Our framework converts long, verbose Chain-of-Thought (CoT) outputs into structured, analyzable reasoning graphs, improving readability, interpretability, and quantifiability.

Abstract

Recent advances in test-time scaling have enabled Large Language Models (LLMs) to exhibit sophisticated reasoning abilities via extended Chain-of-Thought (CoT) generation. Despite these abilities, Large Reasoning Models (LRMs) frequently display unstable behaviors, e.g., hallucinating unsupported premises, overthinking simple tasks, and showing heightened sensitivity to prompt variations. This raises a deeper research question: How can we represent the reasoning process of LRMs to map their minds?

To address this, we propose a unified graph-based analytical framework for fine-grained modeling and quantitative analysis of LRM reasoning dynamics. Our method first clusters long, verbose CoT outputs into semantically coherent reasoning steps, then constructs directed reasoning graphs to capture contextual and logical dependencies among these steps.

Through a comprehensive analysis of the derived reasoning graphs, we reveal that key structural properties, such as exploration density and the branching and convergence ratios, correlate strongly with model performance. The proposed framework enables quantitative evaluation of internal reasoning structure and quality beyond conventional metrics, and it offers practical insights for prompt engineering and the cognitive analysis of LLMs. Code and resources will be released to facilitate future research in this direction.

Framework

Our pipeline provides a systematic method for transforming raw, unstructured reasoning tokens from LLMs into interpretable graph structures. We begin with the raw token output and first segment it into an ordered list of reasoning units based on natural delimiters. Then, a logical clustering step combines these fine-grained units into cohesive reasoning steps, which become the nodes of our graph. Finally, we detect the semantic relationships between these steps to establish directed edges, revealing the high-level reasoning structure adopted by the LLM.

Pipeline for building the graph structure from reasoning LLM output.
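
To make the pipeline concrete, the sketch below shows one way such a reasoning graph could be assembled. It is an illustrative outline, not our released implementation: the sentence-level delimiters, the TF-IDF cosine-similarity clustering (via scikit-learn), and the similarity-threshold edge rule are placeholder choices standing in for the semantic clustering and relation detection described above, and the graph itself is a networkx DiGraph.

# Illustrative sketch of the CoT-to-reasoning-graph pipeline
# (placeholder heuristics, not the released code).
import re
import networkx as nx
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def segment_units(cot_text):
    """Split raw CoT text into an ordered list of reasoning units on natural delimiters."""
    units = re.split(r"\n\n+|(?<=[.!?])\s+(?=[A-Z])", cot_text)
    return [u.strip() for u in units if u.strip()]

def cluster_steps(units, threshold=0.35):
    """Greedily merge adjacent units into semantically coherent reasoning steps."""
    if len(units) < 2:
        return list(units)
    sims = cosine_similarity(TfidfVectorizer().fit_transform(units))
    steps, current = [], [0]
    for i in range(1, len(units)):
        if sims[i, i - 1] >= threshold:   # similar enough to stay in the same step
            current.append(i)
        else:
            steps.append(" ".join(units[j] for j in current))
            current = [i]
    steps.append(" ".join(units[j] for j in current))
    return steps

def build_graph(steps, threshold=0.2):
    """Link each earlier step to the later steps that appear semantically related to it."""
    g = nx.DiGraph()
    g.add_nodes_from(range(len(steps)))
    if len(steps) < 2:
        return g
    sims = cosine_similarity(TfidfVectorizer().fit_transform(steps))
    for i in range(len(steps)):
        for j in range(i + 1, len(steps)):
            if sims[i, j] >= threshold:   # later step depends on / revisits the earlier one
                g.add_edge(i, j)
    return g

Feeding a model's raw CoT string through segment_units, cluster_steps, and build_graph yields a directed graph whose nodes are reasoning steps and whose edges encode contextual and logical dependencies, mirroring the three stages described above.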

Analysis and Results

Our analysis reveals that higher task accuracy is consistently associated with a richer reasoning graph structure, characterized by increased exploration density, higher branching, and greater convergence. In contrast, few-shot prompting, particularly with verbose examples, tends to reduce branching and convergence, leading to more linear and less effective reasoning paths. Zero-shot prompting encourages more complex and adaptive graph structures, suggesting that models engage in more active exploration and synthesis when not constrained by demonstrations. These structural signatures of effective reasoning persist across different model scales, highlighting the explanatory power of our graph-based framework.

Analysis of reasoning graph metrics for correct and incorrect answers across different numbers of few-shot examples.
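
The exact definitions of these structural metrics are not spelled out on this page, so the snippet below only illustrates one plausible way to read them off a reasoning graph: edges per node as exploration density, the fraction of nodes with multiple outgoing edges as the branching ratio, and the fraction with multiple incoming edges as the convergence ratio. Treat these formulas as assumptions for illustration, not the paper's formal definitions.

# Hypothetical metric definitions over a reasoning graph
# (illustrative only; the paper's exact formulations may differ).
import networkx as nx

def reasoning_graph_metrics(g: nx.DiGraph) -> dict:
    n = g.number_of_nodes()
    if n == 0:
        return {"exploration_density": 0.0, "branching_ratio": 0.0, "convergence_ratio": 0.0}
    return {
        # How many dependencies the model laid down relative to the number of steps.
        "exploration_density": g.number_of_edges() / n,
        # Fraction of steps that fan out into more than one later step.
        "branching_ratio": sum(1 for v in g if g.out_degree(v) > 1) / n,
        # Fraction of steps that synthesize more than one earlier step.
        "convergence_ratio": sum(1 for v in g if g.in_degree(v) > 1) / n,
    }

Computed per example and compared between correct and incorrect answers, metrics of this kind are what the figure above contrasts across different numbers of few-shot examples.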

BibTeX

@article{xiong2025mapping,
  title={Mapping the Minds of LLMs: A Graph-Based Analysis of Reasoning LLM},
  author={Xiong, Zhen and Cai, Yujun and Li, Zhecheng and Wang, Yiwei},
  journal={arXiv preprint arXiv:2505.13890},
  year={2025}
}