Journaling of Selected Activities
- Talk – Chris Hokamp (Senior Research Scientist at AYLIEN) – “NLP Careers for Linguists”
During the first week of classes, on January 16th, I attended a talk by senior research scientist Chris Hokamp of AYLIEN (Dublin), an AI, NLP, and machine learning startup. This talk piqued my interest specifically because Chris followed a path similar to the one I’m on: he received a BA in Music and an MA in Linguistics (at UNT), and then decided to “jump ship” by doing a Computer Science Ph.D., specifically in Machine Translation. He wanted to talk to linguists and encourage them to do the same should they be so inclined. Specifically, he talked about how the skills that linguists have built can be segued into Data Analyst and Data Scientist careers, and what skills to learn to make this happen. The talk had some enjoyable and informative moments, but I was honestly hoping for much more pragmatic nitty-gritty detail than we received, since this is something I plan on doing within the next 3 years. I asked some questions regarding what Computer Science departments and research startups look for in someone with a Linguistics background, and I guess the answer really comes down to “program, program, program.” Will do! At any rate, thanks go out to Mary Burke for organizing this talk and to Alex Smith for being Chris’s point of contact. This talk was important to me, since it exposed me to someone with the same goals who has met with substantial success in research.
- Talk – Antonios Anastasopoulos – “NLP for Everyone”
On February 20th, I attended a talk given by Computer Science candidate Antonios Anastasopoulos, who is currently a postdoctoral researcher at Carnegie Mellon University. His primary research interest is building language models for low-resource languages; in this talk, he shared the results of a system he has built to perform machine translation on low-resource languages.
Dr. Anastasopoulos’s model uses multiple sources (speech data and matching translations) as input to begin to offset the lack of resources, and uses attention in its translations (attention in the ML sense means that the system is capable of focusing on certain weighted factors when processing data). The model is also capable of working without translation data, since some low-resource languages do not even have this; in this case, it uses the output of the first task (audio –> translation) as additional input for the second. Another approach that works well for extremely low-resource languages is training data augmentation, including hallucination (i.e. using nonsensical but morphologically correct words for the language such as “wugs”) and using data from closely related languages. Hallucination improved Dr. Anastasopoulos’s model accuracy by 12%, and using closely related source data led to a 14% improvement.
There is still plenty of work to do on making the model robust to noise (currently, data “noise” or messiness such as a typo in a word can cause a significant loss of accuracy); the end goal is a framework with a simple API that can take language data of any type as input and produce any other type as output. The end product will ideally be accessible even to those communities and researchers without computer science backgrounds.
This was a fascinating talk, and Dr. Anastasopoulos’s passion about his research is clear. His system is a remarkable achievement, especially when you consider that it achieved decent translation results on Ainu (the language of the native people of Japan; I believe the training data had speech from only one person)!
- Talk – Pam Dukes – How to find your voice and passion as well as achieve your goals
This morning, I attended an online talk by U.S. Olympian and motivational coach Pam Dukes. Despite her obviously outstanding accomplishments (being an Olympic-level track and field athlete, for one), Pam self-described as the former queen of minimization: her coworkers didn’t even know what she was training for, and once the internet became widespread, she begged anyone who happened to Google her name to stay silent and not let word of her accomplishments get out.
She says that this all changed once she discovered the Harada method, created by Takashi Harada, which promotes self-reliance and a spirit of optimism while working on one’s personal and professional goals in a systematic way. The Harada method purports to work on goals related to each individual’s body, mind, and spirit all at once. This method taught Pam that everyone has a superpower, and that she didn’t need to hide hers.
A few other mainstays of the Harada method include:
- Let yourself listen to the people who tell you you’re good at something
- Surround yourself with those who are on your “team”
- Learn how to be grateful
- Help others
- Never stop learning
- Work on becoming more optimistic
- Set goals and find a way to achieve them
Personally, I was hoping that Pam would go into the specifics of the method a bit more so we could see an example at work, but she mainly focused on the ways in which the method has changed her life. This is quite possibly because adherents to the system make money on coaching others, and this is considered more or less proprietary information.
- Talk – Richard Wolf – “Ethnomusicology and the Musical Poetry of Endangered Languages”
Another Discovery Series talk, which took place today, March 20th, presented Dr. Richard Wolf’s research on ethnomusicology as it is seen in the native cultures of two endangered languages: Kota and Wakhi. Kota is a language in the Dravidian family spoken by about 930 individuals (as of the latest census) in a remote part of Tamil Nadu, while Wakhi is in the East Iranian branch of IE languages and is spoken in Northern Afghanistan and Tajikistan, as well as Pakistan and parts of China.
This was a fascinating talk, as Dr. Wolf is both extremely knowledgeable and passionate about his subject material; he has been gathering records of native speakers of both Kota and Wakhi singing indigenous songs for decades. We were exposed to a few native song genres, and shown how the poetry of their rhyme and meter differ from and compare to each other as well as other relatively high-resource genres such as songs from Tamil-language films.
One story that I found particularly interesting was that of the composer A.K. Rangan, a member of the Kota who traveled outside the community for some time and picked up other musical influences, as his later songs bore some similarity to Tamil film music in their melodic range. Unfortunately, despite the beauty of these songs, showing this marker of outside identity led to Rangan being socially ostracized. This has happened to a number of individuals from both the Kota and Wakhi cultures.
The song genre that stood out to me was the lament; there seem to be a few subtypes of laments, depending on the specific content and/or metrical qualities of the song. The bulbulik is a form of Wakhi lament sung without any instruments other than droning vocals. A variation, the bayd, touches on similar themes but can be sung with other instruments as well. A Kota form, the atl, is often used in a funerary context but takes a relatively uplifting tone, reflecting upon the fact that not everything about celebrating a lost loved one need be a lamentation.
This talk was information-dense; the topic was one I had no prior familiarity with, so I was grateful to be exposed to a musical form new to me.
- Workshop – Toulouse Graduate School – “Preparing Materials for the Academic Job Market”
On March 24th, I attended a graduate workshop given primarily by Brian Clifton (a Ph.D. candidate in the English department) in which we took a look at materials commonly requested and submitted for academic jobs. The workshop was geared most specifically to those in the course of their Ph.D.s who will be looking for academic jobs in the near future.
First, we took a look at a few example job letters (a critical part of an academic job application, the job letter is essentially a cover letter, but is focused more specifically on the applicant’s research and research interests and why those might be a good fit for the university), including one letter that was highly successful and a couple of others that were less dazzling. One piece of advice Brian gave was to avoid “writing like a graduate student,” by which he meant from a primarily class-centered, completion-based perspective (e.g. “After writing 3 out of 4 sections, I found that…”), instead re-centering on a research focus and treating your audience as though they are potential colleagues as opposed to remote superiors.
Second, we looked at teaching philosophy statements, and examined some common pitfalls that job applicants fall into. One of the pitfalls described was the overuse of pedagogical buzzwords that have lost all meaning, such as repeatedly referring to a “student-centered” classroom. Another was tailoring a teaching philosophy to a class size that is atypical of the university; while there is nothing per se wrong with this approach, it might make the reviewers question whether you are able to successfully handle a more typical class at that university. A question asked in this section was whether a successful teaching philosophy necessarily had to align with and discuss a formal pedagogical theory, and the answer was that this level of formality is not needed, but that reviewers want to see that a candidate is being thoughtful about pedagogy.
I appreciated the opportunity to see some real-world examples of these documents and their receptions; it helps to form a mental representation of what success on an academic job application looks like.
- SLANT (Student Linguistic Association of North Texas) Meetings
I did my best to attend SLANT meetings, especially as they applied to computational topics. Occasionally there were talks that related to NLP, both from an emerging research perspective and an employment perspective.
- Workshops – Using the TALON3 Supercomputer in Research
I attended a series of workshops on how to use the TALON3 supercomputer in our research. I didn’t take detailed notes at the time of these workshops, but what I learned from them came in handy and paved the way for me to use TALON3 in my own final-semester research. The amount of data I was working with would have far exceeded the capabilities of my own machine.
- Summer of Python 2019
Every Tuesday for a period of time over the summer, several other students, Dr. Alexis Palmer, and I got together to discuss various aspects of using Python in NLP. We took turns presenting on topics that would be especially useful to us in future years; my topic was Python 2 vs. Python 3, since we knew that a professor we’d be studying with preferred the former and we had initially learned the latter. This was a great way to keep our skills sharp over the summer!