FRE 1149H-S

Computational Tools for Linguists

Course delivery mode

LEC0101 - In person


Monday 9h - 13h

Instructor:

E. Dunbar


Odette Hall


OH 224

Description:

Computational corpus analysis and computational modelling have grown in importance in linguistics over the past decade. They allow linguists to work with much larger data sets and to carry out much more nuanced analyses than ever before, and they are becoming increasingly essential for linguistic research. This course provides a friendly introduction to computational tools relevant to research on language, spanning phonetics and phonology, syntax and morphology, and semantics. Python programming will be introduced in the course, so no prior programming experience is required to enroll.

The main objectives are to provide students with new tools, to give them inspiration for new projects, and to give them the opportunity to carry out a mini-project using computational tools. The course will also provide the background needed to follow current research papers in computational linguistics, and we will examine articles that use or develop the underlying techniques and theories. In addition to core problems in linguistics, the course will examine familiar problems in language acquisition, language education, and language change, with priority given to problems related to French.

The course will cover in detail: tools for the automatic analysis of acoustic corpora (extraction of acoustic features) and for the calculation of acoustic/phonetic similarity; tools for the automatic annotation of morphological and syntactic properties; tools for the analysis of lexical semantics; and common linguistic applications of each of these types of tools. The order in which these topics are presented will be adapted to the group, so that each student can develop a final project relevant to their research interests.

The course will be taught in English.

Bibliography:

Jurafsky, D., & Martin, J. H. (forthcoming). Speech and language processing: An introduction to natural language processing, computational linguistics, and speech recognition (3rd ed.).

Kübler, S., McDonald, R., & Nivre, J. (2009). Dependency parsing. Synthesis lectures on human language technologies, 1(1), 1-127.

Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013). Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781.

Mohamed, A. R. (2014). Deep neural network acoustic models for ASR (Doctoral dissertation, University of Toronto).

Gaume, B., Ho-Dac, L.-M., Tanguy, L., Fabre, C., Pierrejean, B., Hathout, N., Farinas, J., Pinquier, J., Danet, L., Péran, P., & De Boissezon, X. (2019). Toward a Computational Multidimensional Lexical Similarity Measure for Modeling Word Association Tasks in Psycholinguistics. In Proceedings of the Workshop on Cognitive Modeling and Computational Linguistics (pp. 71-76).

Gupta, S., & DiPadova, A. (2019). Deep Learning and Sociophonetics: Automatic Coding of Rhoticity Using Neural Networks. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Student Research Workshop (pp. 92-96).

Hale, J. (2001). A probabilistic Earley parser as a psycholinguistic model. In Second Meeting of the North American Chapter of the Association for Computational Linguistics.

Jimerson, R., & Prud’hommeaux, E. (2018). ASR for documenting acutely under-resourced indigenous languages. In Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018).

Mickus, T., Constant, M., & Paperno, D. (2020). Génération automatique de définitions pour le français. In Actes de la 6e conférence conjointe Journées d'Études sur la Parole (JEP, 31e édition), Traitement Automatique des Langues Naturelles (TALN, 27e édition), Rencontre des Étudiants Chercheurs en Informatique pour le Traitement Automatique des Langues (RÉCITAL, 22e édition). Volume 2: Traitement Automatique des Langues Naturelles (pp. 66-80). ATALA.

Evaluation:

Homework related to fundamental techniques (30%); discussion of articles (10%); final project (60%, distributed over a proposal, a draft, a final version, and a presentation).