Iranian Researchers Develop a Synthetic Persian Dataset for E-commerce Sales Bots
In Iran, small and medium-sized enterprises increasingly do business on the Telegram messaging service, where real-time conversation with customers is crucial for closing sales. Iranian researchers have now introduced MegaChat, a dataset designed to improve the evaluation of Persian-language sales chatbots built for this setting.
AI-based chatbots need large numbers of question-answer pairs to learn from. Producing such datasets is labor-intensive and expensive, especially for languages with limited digital resources. Persian is one of these so-called low-resource languages, and this scarcity has slowed the development of advanced sales bots for Iranian companies.
According to the researchers, MegaChat is the first fully synthetic Persian question-answer dataset designed specifically for evaluating Telegram-based e-commerce bots. Here, synthetic means that the data is generated automatically with AI methods rather than taken directly from real customer conversations.
The dataset is built with a multi-agent system in which different AI agents take on distinct tasks: some generate questions, others review and revise them, and others check the quality of the answers. The system draws background information from active Telegram shopping channels, but the question-answer pairs themselves are reconstructed synthetically. The conversations are also tailored to the roles and personalities of both the customer and the seller.
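To make the division of labor easier to picture, here is a minimal sketch of a generate, review, answer, and check loop. It is not the MegaChat authors' implementation. Every function name, the `Persona` and `QAPair` types, and the product snippets are hypothetical. The "agents" are simple deterministic stubs standing in for language-model calls, and the text is in English rather than Persian.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Persona:
    role: str   # hypothetical: "customer" or "seller"
    style: str  # hypothetical personality trait, e.g. "hurried", "polite"


@dataclass
class QAPair:
    question: str
    answer: str


# Hypothetical stub agents; a real pipeline would call a language model in each.
def question_agent(context: str, customer: Persona) -> str:
    # Drafts a customer question about a product snippet taken from a channel.
    return f"[{customer.style} customer] Is the {context} available, and what does it cost?"


def review_agent(question: str) -> str:
    # Toy revision step: collapse extra whitespace and make sure it ends with '?'.
    q = " ".join(question.split())
    return q if q.endswith("?") else q + "?"


def answer_agent(question: str, context: str, seller: Persona) -> str:
    # Drafts a seller reply in the seller persona's style.
    return f"[{seller.style} seller] Yes, the {context} is in stock; please message us to order."


def quality_agent(pair: QAPair, context: str) -> bool:
    # Toy quality gate: reject blank contexts, answers that ignore the product,
    # and questions without a question mark.
    return bool(context.strip()) and context in pair.answer and pair.question.endswith("?")


def generate_pairs(contexts: list[str], customer: Persona, seller: Persona) -> list[QAPair]:
    pairs = []
    for ctx in contexts:
        question = review_agent(question_agent(ctx, customer))
        pair = QAPair(question, answer_agent(question, ctx, seller))
        if quality_agent(pair, ctx):  # keep only pairs that pass the check
            pairs.append(pair)
    return pairs
```

For example, `generate_pairs(["red scarf", "  "], Persona("customer", "hurried"), Persona("seller", "polite"))` keeps one pair, because the blank context is rejected by the quality check. Separating the agents this way lets each role be swapped or tightened independently.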
The researchers aim to provide a comprehensive, high-quality test dataset that allows for more systematic comparison and development of Persian-language sales bots.
Source: MegaChat: A Synthetic Persian Q&A Dataset for High-Quality Sales Chatbot Evaluation, ArXiv (AI).
This text was generated with AI assistance and may contain errors. Please verify details from the original source.
Original research: MegaChat: A Synthetic Persian Q&A Dataset for High-Quality Sales Chatbot Evaluation
Publisher: ArXiv (AI)
Authors: Mahdi Rahmani, AmirHossein Saffari, Reyhane Rahmani
December 27, 2025