Raconteur: A Knowledgeable, Insightful, and Portable
LLM-Powered Shell Command Explainer

Accepted at NDSS 2025 Symposium

(CCF-A, Core-A*, Big4 Security Conf.)

Jiangyi Deng✲,1, Xinfeng Li✲,1, Yanjiao Chen✉︎,1, Yijie Bai1, Haiqin Weng2, Yan Liu2, Tao Wei2, Wenyuan Xu1

Zhejiang University1, Ant Group2




Abstract

Malicious shell commands are linchpins of many cyber-attacks, but security analysts may find them hard to understand because of their complicated and often disguised code structures. Advances in large language models (LLMs) have unlocked the possibility of generating understandable explanations for shell commands. However, existing general-purpose LLMs lack expert knowledge and tend to hallucinate when explaining shell commands. In this paper, we present Raconteur, a knowledgeable, expressive and portable shell command explainer powered by an LLM. Raconteur is infused with professional knowledge to provide comprehensive explanations of shell commands, covering not only what the command does (i.e., behavior) but also why the command does it (i.e., purpose). To shed light on the high-level intent of the command, we also translate the natural-language explanation into the standard techniques and tactics defined by MITRE ATT&CK, the worldwide knowledge base of cybersecurity. To enable Raconteur to explain unseen private commands, we further develop a documentation retriever that obtains relevant information from complementary documentation to assist the explanation process. We have created a large-scale dataset for training and conducted extensive experiments to evaluate the capability of Raconteur in shell command explanation. The experiments verify that Raconteur provides high-quality explanations and in-depth insight into the intent of the command.

The Motivation behind Raconteur

Q1: What is Raconteur and what problem does it solve?

A1: Raconteur is a shell command explainer built on a large language model (LLM), designed to help security analysts understand shell commands and identify potential cyberattacks. Shell commands, particularly malicious ones, are often used in cyberattacks, and their complexity or obfuscation makes them difficult for analysts to interpret. With advances in LLMs, it has become possible to generate clear, understandable explanations of these commands.


Q2: Why not commercial LLMs (e.g., GPT-4) for this problem?

A2: Shell commands are integral to corporate operations and frequently contain proprietary information that must not be exposed to third-party LLM providers (e.g., OpenAI). Raconteur is fine-tuned as a standalone, portable solution that can be deployed on-premises, which keeps the data private and secure. In addition, Raconteur uses retrieval-augmented generation (RAG) to deliver accurate and contextually relevant explanations of shell commands.

What are the core challenges that Raconteur seeks to address?

Q1: How can domain-specific expertise be integrated into a general-purpose LLM to provide accurate interpretations of shell commands?

A1: Existing general-purpose LLMs often struggle to interpret shell commands, especially malicious ones. Raconteur addresses this by fine-tuning the LLM with a custom dataset of (prompt, response) pairs. The responses incorporate expert knowledge from authoritative code repositories to enhance accuracy.

Q2: How can the intent behind shell commands be explained effectively?

A2: Beyond describing the syntax, analysts need to understand the intent of a command, i.e., what an attacker aims to achieve (tactic) and how they plan to achieve it (technique). Raconteur achieves this by embedding its natural language descriptions into the same space as the technique and tactic descriptions of the MITRE ATT&CK framework, so that a shell command can be matched with the most relevant techniques and tactics.

Q3: How can Raconteur provide accurate information for private, unseen shell commands?

A3: Many organizations use proprietary commands or custom scripts that are not publicly documented. LLMs may provide inaccurate or fabricated information when confronted with these unfamiliar commands. To handle this, Raconteur includes a document retriever that gathers supplemental documentation to aid in interpreting private shell commands.

In summary, Raconteur assists security analysts by providing semantic and intent-based insights into both Unix shell and PowerShell commands, helping them identify potential cyberattacks.

System Overview: Three Key Components


① Command Behavior Explainer

The Command Behavior Explainer provides a detailed, step-by-step interpretation of the command. It describes the action of each step within the command and summarizes the overall behavior, highlighting any potential malicious intent. This component is fine-tuned on a carefully curated dataset of (prompt, response) pairs to ensure accurate and comprehensive explanations.
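To make the (prompt, response) format concrete, the following is a minimal sketch of how one fine-tuning record could be assembled. The prompt template, the field names `prompt` and `response`, and the helper `build_pair` are assumptions for illustration; the page does not specify the actual template or record layout used by Raconteur.

```python
import json

# Hypothetical prompt template; the real template is not given in the source.
PROMPT_TEMPLATE = (
    "Explain the following shell command step by step, "
    "then summarize its overall behavior.\n\nCommand: {command}"
)


def build_pair(command, steps, summary):
    """Return one (prompt, response) training record as a dict."""
    lines = [f"Step {i}: {text}" for i, text in enumerate(steps, start=1)]
    lines.append(f"Summary: {summary}")
    return {
        "prompt": PROMPT_TEMPLATE.format(command=command),
        "response": "\n".join(lines),
    }


# Illustrative record written by hand, not taken from the Raconteur dataset.
example = build_pair(
    command="curl -s http://example.com/a.sh | bash",
    steps=[
        "curl -s silently downloads a.sh from example.com.",
        "The pipe passes the downloaded script to bash, which executes it.",
    ],
    summary="Downloads a remote script and runs it without saving it to disk.",
)

# One JSON object per line is a common on-disk layout for such pairs.
jsonl_line = json.dumps(example)
```

Keeping the per-step lines and the summary in a single response string mirrors the two outputs described above: the action of each step and the overall behavior.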

② Intent Identifier

The Intent Identifier translates the summarized behavior into the techniques and tactics defined in the MITRE ATT&CK framework, which helps analysts quickly grasp the attacker's objectives. It does so with a novel embedding model, BD2Vec, which maps Raconteur's natural language descriptions into the same embedding space as the MITRE ATT&CK framework's standardized descriptions.
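The matching step can be sketched as a nearest-neighbor search in a shared vector space. The example below is only an illustration of that idea: it replaces BD2Vec with a toy bag-of-words vector, and the three entry descriptions are abbreviated paraphrases rather than the official MITRE ATT&CK text. The names `embed`, `cosine`, and `identify_intent` are hypothetical.

```python
import math
import re
from collections import Counter

# Abbreviated paraphrases of three MITRE ATT&CK entries (not the official text).
ATTACK_ENTRIES = {
    "T1105 Ingress Tool Transfer (Command and Control)":
        "transfer download tools or files from an external system "
        "into the compromised environment",
    "T1070.004 Indicator Removal: File Deletion (Defense Evasion)":
        "delete files left behind by intrusion activity "
        "to remove traces and evade detection",
    "T1059.004 Command and Scripting Interpreter: Unix Shell (Execution)":
        "abuse unix shell commands and scripts to execute payloads",
}


def embed(text):
    """Toy stand-in for a learned embedding: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def identify_intent(behavior_summary, top_k=1):
    """Rank ATT&CK entries by similarity to the behavior summary."""
    query = embed(behavior_summary)
    ranked = sorted(
        ATTACK_ENTRIES,
        key=lambda name: cosine(query, embed(ATTACK_ENTRIES[name])),
        reverse=True,
    )
    return ranked[:top_k]


download_match = identify_intent(
    "downloads a file from an external server into the compromised host"
)
cleanup_match = identify_intent("deletes log files to remove traces")
```

A learned model such as BD2Vec would place paraphrases close together even when they share no words, which this word-overlap stand-in cannot do; only the ranking logic carries over.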


③ Doc-Augmented Enhancer

The Doc-Augmented Enhancer improves the performance of both the Behavior Explainer and the Intent Identifier by retrieving relevant information from supplementary documents. This allows Raconteur to interpret proprietary commands that were not encountered during training. A document retriever based on a Text2Vec model (CD2Vec) matches user-prompted commands with pertinent content within the documents.
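The retrieve-then-explain flow can be sketched as follows. This is a simplified illustration under stated assumptions: token overlap (Jaccard similarity) stands in for CD2Vec, the `acme-*` commands and their documentation snippets are invented placeholders for private tooling, and `retrieve` and `augment_prompt` are hypothetical helper names.

```python
import re

# Invented documentation snippets for hypothetical private commands.
DOCS = [
    "acme-sync: synchronizes a local directory with the internal artifact "
    "store. --purge deletes local files that are absent from the store.",
    "acme-audit: prints the access log for a given user. "
    "--since limits output to recent entries.",
    "acme-backup: archives a directory to cold storage. "
    "--verify checks the archive checksum.",
]


def tokens(text):
    """Lowercased alphanumeric tokens; 'acme-sync' yields {'acme', 'sync'}."""
    return set(re.findall(r"[a-z0-9_]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity of two token sets (stand-in for embedding similarity)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def retrieve(command, docs, top_k=1):
    """Return the top_k documentation snippets most similar to the command."""
    query = tokens(command)
    ranked = sorted(docs, key=lambda d: jaccard(query, tokens(d)), reverse=True)
    return ranked[:top_k]


def augment_prompt(command, docs, top_k=1):
    """Prepend retrieved documentation to the explanation request."""
    context = "\n".join(retrieve(command, docs, top_k))
    return (
        f"Reference documentation:\n{context}\n\n"
        f"Explain the following shell command: {command}"
    )


prompt = augment_prompt("acme-sync --purge /data/cache", DOCS)
```

Because the retrieved text is supplied at inference time, the explainer can describe a flag such as `--purge` from the documentation instead of guessing its meaning.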

Dataset Construction

To train and fine-tune Raconteur, we construct a comprehensive dataset of both malicious and benign shell command samples, each paired with a natural language explanation. This dataset is used to train the Behavior Explainer and serves as the foundation for training and evaluating the Intent Identifier and the Doc-Augmented Enhancer.



LLM Question & Answer