Saliha ********** AI/ML Researcher

$720 / day


Experienced multidisciplinary researcher specializing in Natural Language Processing (NLP) and Artificial Intelligence (AI), successfully automating the processing of language data (both text and audio) and modelling of low-resource/minority languages. Developed and deployed NLP systems using a spectrum of methods from finite-state to Machine Learning (ML) approaches, resulting in over 10 peer-reviewed scientific publications in prestigious NLP venues such as the Association for Computational Linguistics (ACL) and Empirical Methods in Natural Language Processing (EMNLP) within a span of 5 years. Creative and self-motivated professional with a demonstrated ability to integrate seamlessly into cross-functional environments. Organized the Field Matters Workshop at ACL, fostering collaboration among industry partners, academics, and language community members.


Audacity, Bash, Data Processing and Management, Data visualisation, Deep Learning, ELAN, FLEx, GitHub, Google Cloud, Igor, LaTeX, Lexique Pro, Linux, Mathematica, MATLAB, Overleaf, Praat, Python, R Studio, SQL, Statistical Analysis, Transformer models
AI Consultant, AI Research Scientist, Computer Vision Engineer, Data Scientist, Deep Learning Engineer, Machine Learning Engineer


2024 PhD in Computational Linguistics at Australian National University
2017 Advanced Masters of General and Applied Linguistics at Australian National University
2015 Honours in Physics at Australian National University
2014 Bachelor of Science at Australian National University


Jan 2024 - Present Research Assistant/ETL developer at UNIVERSITY OF NEWCASTLE

– The project “The building blocks of language: Words and phrases in Central Australian languages,” funded by an Australian Research Council (ARC) Discovery grant, achieved the following milestones:
– Advocated for and successfully presented a business case to automate processing rather than relying on manual methods for each language.
– Gathered user requirements from diverse stakeholders, including non-technical end users.
– Developed ETL (Extract, Transform, Load) tools to construct specialized databases for phonotactic structures in the Kaytetye, Warlpiri, Warumungu, and Pitjantjatjara languages, resulting in a more efficient workflow and reducing manual effort by 0.5 FTE.
– Conducted statistical modelling to analyze phonotactic distribution in each language.
– Produced technical documentation and provided training to researchers on the use of these tools.

July 2022 - Dec 2022 Course Facilitator for COMP4650/COMP6940: Document Analysis at COLLEGE OF ENGINEERING AND COMPUTER SCIENCE

– Developed and presented theoretical questions and practical coding problem sets.
– Evaluated assignments and conducted assessments for the final exam.
– Taught subject content encompassing Information Retrieval (IR), Introductory Machine Learning (ML), and Natural Language Processing (NLP).

July 2021 - Dec 2022 Course Facilitator for ASIA8022: Approaching Asia and the Pacific: Concepts, Tools, and Methods at COLLEGE OF ASIA AND THE PACIFIC

– Collaborated on course design and structure adjustments based on student performance and feedback.
– Assessed assignments and provided constructive guidance and feedback on final projects.
– Taught subject content focused on research methodologies, encompassing both qualitative and quantitative approaches.

Jun 2022 - Sept 2022 Data Science Facilitator at ACADEMY XI

– Conducted industry-tailored workshops for one of Australia’s top four banks and a global insurance company:
– Designed and delivered workshops tailored for non-technical middle managers.
– Covered data fundamentals, analysis, and visualization, emphasizing practical applications in AI and machine learning.

Feb 2017 - Apr 2022 Research Assistant/Data Analyst at ARC CENTRE OF EXCELLENCE FOR THE DYNAMICS OF LANGUAGE

– Reorganized collected raw data into a usable format.
– Generated metadata retrospectively for inherited data.
– Developed and maintained the database for the Nen language, which included over 30 hours of audio/video recordings, contributing to the CoEDL Corpora Collection.
– Presented derived insights through reports and presentations.
– Collaborated with key stakeholders to optimize processes and ensure adherence to archiving standards.
– Identified opportunities for data acquisition and assisted in preparing data for fieldwork.
– Assisted in managing and preparing data for the online Nen Dictionary.

July 2018 - March 2019 Research Assistant/Data Engineer at ARC CENTRE OF EXCELLENCE FOR THE DYNAMICS OF LANGUAGE

– Established a database for the Nen and Nambo languages as part of a project funded by the Transdisciplinary and Innovation Grant (CoEDL).
– Conducted testing on four principal aligners commonly used in socio-phonetics (MAUS, LaBB-CAT, FAVE, and MFA), leading to the development of a poster titled “Enhancing forced-alignment in minority languages,” presented at CoEDL Fest, Western Sydney University, Australia.

Dec 2017 - Apr 2018 Research Assistant/Analyst at MAX PLANCK INSTITUTE FOR THE SCIENCE OF HUMAN HISTORY

– Led the Grambank project, the world’s largest database of language structure, involving collaboration with over 105 global partners.
– Coordinated an international effort between prestigious institutions including Max Planck Institutes, Australian National University, University of Auckland, Harvard University, Yale University, and others.
– Developed and managed a database encompassing 195 independent features across 2,467 languages, totaling over 800,000 datapoints.
– Coded typological surveys for under-described languages of Oceania and the Pacific, utilizing sketch/full grammars and published language resources.
– Contributed to a Science Advances publication highlighting the impact of genealogical constraints on linguistic diversity and the consequences of language loss.