A Microsoft-backed project aimed at strengthening the Maltese language in the digital sphere has been launched, with $51,000 in funding to develop AI and language-processing tools.
Named SaqWI-QA/SaqWI, the project has been awarded $51,000, including $15,000 in Azure credits, through Microsoft’s Lingua funding scheme, which supports European languages with limited resources on AI platforms.
The announcement was made during a meeting between Culture Minister Owen Bonnici and Microsoft General Manager for Malta, Greece and Cyprus Yana Andronopoulou.
Bonnici described the SaqWI project as “an important step forward” in the technological development of the Maltese language, saying it would help ensure the language’s presence and relevance for future generations.
He added that collaborations with public and private entities would continue to be strengthened so that Maltese can advance further in the digital sphere.
The project will make use of the extensive audiovisual archives of Public Broadcasting Services (PBS) to create an open dataset including at least 5,000 question-and-answer sets and between 100 and 150 hours of aligned transcription snippets.
These resources, based on local broadcasts such as news programmes, documentaries and cultural shows, will ensure the dataset is culturally and historically relevant to the Maltese population.
Azure AI services will be used to transcribe the broadcasts, with human annotators refining the output for cultural and linguistic accuracy.
SaqWI will provide tools for future educational and technological applications. The dataset will be released under open licences to encourage long-term sustainability and further research on Maltese as an underrepresented language.
It will also serve to benchmark and audit the quality of automated speech recognition (ASR) systems and improve transcription for underrepresented languages.
In addition, the project will collaborate with TVM and Microsoft to enhance digital ASR tools for Maltese, using the 6pm news bulletin as a pilot project.
The pilot will address challenges such as code-switching, spelling and intonation variation, and pave the way for inclusive subtitling solutions for people with hearing impairments, the elderly, language learners and the general public.
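
For readers curious how ASR output is typically benchmarked against human-refined transcripts, one widely used measure is word error rate (WER): the number of word substitutions, deletions and insertions needed to turn the machine transcript into the reference, divided by the number of reference words. The article does not say which metric SaqWI will use, so the sketch below is only an illustration under that assumption. The function name `word_error_rate` and the example sentences are hypothetical placeholders, not project data.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER: word-level edit distance divided by reference length.

    Illustrative only. Real evaluations usually normalise punctuation,
    casing and spelling variants before comparing transcripts.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from output
                curr[j - 1] + 1,     # insertion: extra word in output
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Placeholder sentences: one substituted word out of five reference words.
print(word_error_rate("the news starts at six", "the news start at six"))
```

A lower score means the automated transcript is closer to the human-checked one. Because WER counts insertions as well as substitutions and deletions, it can exceed 1.0 when the system outputs many extra words.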
