Why a Berber Corpus?

Current developments in general linguistics and typology, as well as the needs of researchers in speech technology, give a prominent place to corpus-based research. Until recently, most corpus-building initiatives concerned well-known and widely studied languages such as English. However, corpora for less documented languages are increasingly being promoted.

It is in this context that the project "A Corpus for Berber Languages" was launched. The history of the project, which dates back to December 2000, is linked to that of another project, also proposed by A. Mettouchi and currently in progress, on the intonation systems of Berber languages. The two are related because Berber languages are primarily spoken. Building a corpus of audio and video recordings, each accompanied by a transcript and a translation, will hopefully allow a thorough study of the links between information structure, intonation, pragmatics and morpho-syntax. For dialects that have developed a written output, texts (press and literature) will also be included.

In our project, the emphasis is on variety of genres, dialects and speakers. Special attention is devoted to conversational corpora and to culture-specific interactions.

The first stage of this project, which is in progress, consists of contacting interested partners inside and outside France, selecting appropriate transcription systems, and setting up norms for data collection (recording standards suitable for prosodic analysis, basic and unified information on speakers, recording conditions, archiving system, access...).


A workshop on this project took place on Friday 6 December 2002 at INALCO, 2 rue de Lille, 75007 Paris.

A web page dedicated to the collection and transcription of the corpus is accessible from this page.

In addition to researchers from INALCO, this project brings together researchers from various institutions who collaborate with or are associated with the CRB.



Salem Chaker, Professor at INALCO

Abdallah Bounfour, Professor at INALCO

Rachid Bellil, Associate Researcher at INALCO

Amina Mettouchi, Associate Professor (Maître de Conférences) at the University of Nantes, Associate Researcher at INALCO

Hakim Smaïl, PhD candidate at Paris 5; lecturer (chargé de cours) at INALCO/Paris 5

Johanna Kuningas, PhD candidate at Paris 3

Yasmina Oubouzar, DEA student at INALCO

Farida Chekri, DEA student at INALCO


Abdallah Boumalk, Assistant Lecturer (Maître-Assistant), University of Oujda, Morocco

Mena Lafkioui, Research Assistant, Leiden University, Netherlands

Naïma Louali, Research Fellow (Chargée de Recherche) at CNRS, DDL-Lyon 2

Nora Tigziri, Assistant Lecturer (Maître-Assistante), University of Tizi-Ouzou, Algeria

Rachid Ridouane, PhD candidate at ILPGA, Paris 3


This list is not exhaustive. If you are interested, please contact: