
A Small Language Model Was Taught Persian Without Massive Computational Resources

A new method shows that a small language model, initially trained only in English, can be adapted with minimal resources to also work in Persian. The Persian-Phi model by Iranian researchers challenges the assumption that strong multilingual capabilities require massive models or pre-existing multilingual foundations.

The starting point was Microsoft's Phi-3 Mini, a compact language model with approximately 3.8 billion parameters that was trained only in English. The researchers built a curriculum-style training process around it, introducing the model to the new language gradually rather than simply flooding it with Persian data.

In the initial warm-up phase, the model was fed simple bilingual narratives from the TinyStories dataset in both English and Persian. The purpose was to align the model's numerical representations, or embeddings, of English and Persian vocabulary before the heavier continued training.
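The article does not describe the data pipeline in detail; as a rough illustration of the general idea, the sketch below pairs English stories with their Persian counterparts into single training examples. The file name, field names, and pairing scheme are assumptions for illustration only.

```python
# Minimal sketch of assembling a bilingual warm-up corpus (assumed format).
import json


def load_story_pairs(path: str) -> list[dict]:
    """Load parallel English/Persian story pairs from a JSONL file
    with the (assumed) fields 'english' and 'persian'."""
    pairs = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            pairs.append(json.loads(line))
    return pairs


def build_warmup_examples(pairs: list[dict]) -> list[str]:
    """Concatenate each English story with its Persian counterpart so the
    model sees aligned content in both languages within one context."""
    return [
        pair["english"].strip() + "\n\n" + pair["persian"].strip()
        for pair in pairs
    ]


if __name__ == "__main__":
    pairs = load_story_pairs("tiny_stories_en_fa.jsonl")  # hypothetical file
    warmup_corpus = build_warmup_examples(pairs)
    print(f"{len(warmup_corpus)} bilingual warm-up examples prepared")
```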

After the warm-up, the model was further trained on Persian text and then instruction-tuned using parameter-efficient fine-tuning (PEFT). In this approach, only a small fraction of the model's parameters is updated, which reduces the need for compute and memory.
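The article does not give the exact fine-tuning configuration. As a rough illustration of what PEFT with LoRA adapters typically looks like in practice, here is a minimal sketch using the Hugging Face peft library; the rank, target modules, and checkpoint name are assumptions, not the values used for Persian-Phi.

```python
# Minimal PEFT/LoRA sketch (illustrative hyperparameters, not the paper's).
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base_model_id = "microsoft/Phi-3-mini-4k-instruct"  # assumed base checkpoint

tokenizer = AutoTokenizer.from_pretrained(base_model_id)
model = AutoModelForCausalLM.from_pretrained(base_model_id)

# LoRA inserts small trainable matrices next to selected weight matrices;
# the original weights stay frozen, so only a tiny fraction of parameters trains.
lora_config = LoraConfig(
    r=16,                                   # adapter rank (assumed)
    lora_alpha=32,                          # scaling factor (assumed)
    target_modules=["qkv_proj", "o_proj"],  # attention projections (assumed)
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of all parameters
```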

The researchers reported that Persian-Phi achieved strong multilingual performance despite its small size. The results suggest that developing Persian-language AI does not necessarily require massive supercomputers or models with tens of billions of parameters, but rather a carefully designed learning path.

Source: Persian-Phi: Efficient Cross-Lingual Adaptation of Compact LLMs via Curriculum Learning, arXiv (AI).

This text was generated with AI assistance and may contain errors. Please verify details from the original source.

Original research: Persian-Phi: Efficient Cross-Lingual Adaptation of Compact LLMs via Curriculum Learning
Publisher: arXiv (AI)
Authors: Amir Mohammad Akhlaghi, Amirhossein Shabani, Mostafa Abdolmaleki, Saeed Reza Kheradpisheh
December 27, 2025