Explainability of Large Language Models

Figure: Illustration of how natural language queries from users are parsed into executable operations.
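
To give a rough sense of the parsing step the figure depicts, here is a minimal, hypothetical Python sketch that maps keywords in a user's query to executable explainability operations. The operation names, trigger keywords, and stub implementations are illustrative assumptions, not the group's actual system.

```python
# Hypothetical sketch: keyword-based parsing of a natural language query
# into an executable explainability operation. All names are illustrative.
from typing import Callable

def feature_importance(text: str) -> str:
    return f"[feature importance scores for: {text!r}]"

def counterfactual(text: str) -> str:
    return f"[counterfactual example for: {text!r}]"

def rationalize(text: str) -> str:
    return f"[free-text rationale for: {text!r}]"

# Each operation is registered with the keywords that trigger it.
OPERATIONS = {
    "feature_importance": (("important", "attribution", "which words"), feature_importance),
    "counterfactual": (("counterfactual", "flip", "change the prediction"), counterfactual),
    "rationalize": (("why", "explain", "reason"), rationalize),
}

def parse_query(query: str) -> Callable[[str], str]:
    """Return the first operation whose trigger keywords appear in the query."""
    q = query.lower()
    for keywords, op in OPERATIONS.values():
        if any(k in q for k in keywords):
            return op
    return rationalize  # fall back to a free-text explanation

if __name__ == "__main__":
    op = parse_query("Which words were most important for this prediction?")
    print(op("The movie was surprisingly good."))
```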

Development of explanation methods for transparent AI models, such as post-hoc explanations, causal reasoning, and chain-of-thought prompting. Human-centered XAI is prioritized: explanations should be personalizable to user needs at different levels of abstraction and detail. A further focus is the development of methods to verify model faithfulness, i.e., that explanations and predictions accurately reflect the model's actual internal decision-making process.
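
To make the faithfulness notion concrete, the following is a minimal sketch of one common perturbation-based check (a comprehensiveness/occlusion test in the spirit of the ERASER benchmark, not a method from the publications below): the tokens an explanation marks as decisive are removed, and the drop in the model's confidence is measured. The toy classifier `model_predict` is a hypothetical stand-in for any model returning a probability for the predicted class.

```python
# Minimal sketch of a perturbation-based faithfulness check: if an
# explanation marks certain tokens as decisive, removing them should
# lower the model's confidence. All functions here are illustrative.
from typing import Sequence

def model_predict(tokens: Sequence[str]) -> float:
    """Hypothetical sentiment classifier: P(positive) from a naive word count."""
    positive = {"good", "great", "excellent"}
    hits = sum(t.lower() in positive for t in tokens)
    return min(1.0, 0.5 + 0.25 * hits)

def comprehensiveness(tokens: list[str], important_idx: set[int]) -> float:
    """Drop the tokens the explanation marked as important and measure how
    much the predicted probability falls. A larger drop suggests the
    explanation is more faithful to the model's decision process."""
    full = model_predict(tokens)
    reduced = [t for i, t in enumerate(tokens) if i not in important_idx]
    return full - model_predict(reduced)

if __name__ == "__main__":
    tokens = "The movie was surprisingly good".split()
    # An explanation claiming token 4 ("good") drove the prediction:
    print(f"faithfulness drop: {comprehensiveness(tokens, {4}):.2f}")  # 0.25
```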

  • Wang, Qianli, et al. “Cross-Refine: Improving Natural Language Explanation Generation by Learning in Tandem.” arXiv preprint arXiv:2409.07123 (2024).

  • Villa-Arenas, Luis Felipe, et al. “Anchored Alignment for Self-Explanations Enhancement.” arXiv preprint arXiv:2410.13216 (2024).

  • Wang, Qianli, et al. “FitCF: A Framework for Automatic Feature Importance-guided Counterfactual Example Generation.” arXiv preprint arXiv:2501.00777 (2025).

Dr. Veronika Solopova
Senior Researcher