Input Chat Log
Output Dataset
About the Chat to Prompt Dataset Generator
Fine-tuning Large Language Models (LLMs) like GPT-3.5, GPT-4, LLaMA 3, or Mistral requires high-quality data in a specific format. Often, developers have valuable conversation logs in plain text files but struggle to convert them into the structured JSONL (JSON Lines) format required by training APIs.
This Chat to Prompt Dataset Generator is a free, client-side tool designed to bridge that gap. It parses your raw chat logs, identifies user and assistant turns based on custom prefixes, and automatically formats them into the OpenAI-compatible chat format.
How to Use
- Step 1: Paste your raw chat log into the input area.
- Step 2: Define your "User Prefix" (e.g., "User:", "Human:") and "Assistant Prefix" (e.g., "AI:", "Bot:").
- Step 3: (Optional) Add a System Prompt to guide the model's behavior for all samples.
- Step 4: Click "Convert to JSONL". Review the output in the JSON view or Visual Preview.
- Step 5: Download the file or copy the code to start fine-tuning.
Features & Benefits
- 100% Client-Side: Your data never leaves your browser. Perfect for sensitive or private chat logs.
- Multi-Sample Support: Use a separator (like "### NEW CHAT ###") to process multiple conversations at once.
- Visual Validation: Switch to "Visual Preview" to ensure your parsing logic is correct before exporting.
- Token Estimation: Get a quick estimate of token usage to plan your fine-tuning budget.
Frequently Asked Questions
What format does this tool export?
It exports in the standard JSONL (JSON Lines) format used by OpenAI's fine-tuning API and many open-source libraries (like Axolotl or Unsloth). Each line represents one full conversation object containing a list of messages.
Can I use this for LLaMA 3 or Mistral?
Yes! Most modern fine-tuning pipelines for LLaMA and Mistral now support the "ShareGPT" or OpenAI-style chat format directly. If your training script expects this standard format, this tool is perfect.
Is my data sent to a server?
No. This tool runs entirely in your web browser using JavaScript. No data is transmitted to any server, ensuring your proprietary or private datasets remain secure.
Action successful
Read Also: