Experienced multidisciplinary AI/ML researcher specializing in Natural Language Processing (NLP), with a track record of automating the processing of language data (text and audio) and modelling low-resource and minority languages. Developed and deployed NLP systems using methods ranging from finite-state to Machine Learning (ML) approaches, resulting in more than 10 peer-reviewed publications within 5 years at leading NLP venues such as the Association for Computational Linguistics (ACL) and Empirical Methods in Natural Language Processing (EMNLP). Creative and self-motivated professional with a demonstrated ability to integrate into cross-functional environments. Organized the Field Matters Workshop at ACL, fostering collaboration among industry partners, academics, and language community members.
– The project “The building blocks of language: Words and phrases in Central Australian languages,” funded by an Australian Research Council (ARC) Discovery grant, achieved the following milestones:
– Advocated for and successfully presented a business case to automate processing rather than relying on manual methods for each language.
– Gathered user requirements from diverse stakeholders, including non-technical end users.
– Developed ETL (Extract, Transform, Load) tools to construct specialized databases of phonotactic structures in Kaytetye, Warlpiri, Warumungu, and Pitjantjatjara, resulting in a more efficient workflow and saving 0.5 FTE.
– Conducted statistical modelling to analyze phonotactic distribution in each language.
– Produced technical documentation and provided training to researchers on the use of these tools.
– Developing and presenting theoretical questions and practical coding problem sets.
– Evaluating assignments and conducting assessments for the final exam.
– Teaching subject content encompassing Information Retrieval (IR), Introductory Machine Learning (ML), and Natural Language Processing (NLP).
– Collaborating on course design and structure adjustments based on student performance and feedback.
– Assessing assignments and providing constructive guidance and feedback on final projects.
– Teaching subject content focused on research methodologies, encompassing both qualitative and quantitative approaches.
– Conducting industry-tailored workshops for one of Australia’s top four banks and a global insurance company, which involved:
– Designing and delivering workshops tailored for non-technical middle managers.
– Covering fundamental aspects of data (basics, analysis, and visualization), with an emphasis on practical applications in AI and machine learning.
– Reorganized collected raw data into a usable format.
– Generated metadata retrospectively for inherited data.
– Developed and maintained the database for the Nen language, which included over 30 hours of audio/video recordings, contributing to the CoEDL Corpora Collection.
– Presented derived insights through reports and presentations.
– Collaborated with key stakeholders to optimize processes and ensure adherence to archiving standards.
– Identified opportunities for data acquisition and assisted in preparing data for fieldwork.
– Assisted in managing and preparing data for the online Nen Dictionary.
– Established a database for Nen and Nambo languages as part of the Transdisciplinary and Innovation Grant (CoEDL) funded project.
– Tested four principal aligners commonly used in socio-phonetics (MAUS, LaBB-CAT, FAVE, and MFA), resulting in a poster titled “Enhancing forced-alignment in minority languages,” presented at CoEDL Fest, Western Sydney University, Australia.
– Led the Grambank project, the world’s largest database of language structure, involving collaboration with over 105 global partners.
– Coordinated an international effort among prestigious institutions including Max Planck Institutes, Australian National University, University of Auckland, Harvard University, Yale University, and others.
– Developed and managed a database encompassing 195 independent features across 2,467 languages, totaling over 800,000 data points.
– Coded typological surveys for under-described languages of Oceania and the Pacific, utilizing sketch/full grammars and published language resources.
– Contributed to a Science Advances publication highlighting the impact of genealogical constraints on linguistic diversity and the consequences of language loss.