Over the course of my time at UNT, I’ve had the privilege of taking many fascinating courses that have all contributed to the creation of a strong linguistic and technical foundation. Here I relate a few highlights, only a fraction of the value I have received from them.
LING 5070 – Qualitative Research Methods
Dr. Xian Zhang
This course, taken with the invaluable Xian, was my very first at UNT. After twelve years out of the academic game, and an undergraduate career not especially steeped in scientific thinking, it was a welcome reintroduction to thinking like a grad student. Prominent among our assignments was the requirement to read many academic journal articles, critique each from a methodological perspective, and present in detail on at least one of them. We also developed our own research proposals (although the time limits of the semester meant we were not required to carry out the research), which were critiqued and edited for feasibility and experimental design. I enjoyed this class so very much, and was delighted to be asked to think in this way – like a linguist – again.
LING 5300 – Phonology
Dr. Alexander Smith
Also taken in my first semester, Alex’s Phonology class was an equally fun reintroduction to a subject I had taken and enjoyed as an undergraduate. The class was centered on a number of challenging problem sets, beginning with a crash course in the International Phonetic Alphabet and including detailed phonological data from first language acquisition, diachronic vowel shifts, and phonologically motivated allomorphy. As a final project, I undertook a phonological description of Hindi, a language I do not speak but am often exposed to through media. This paper refined my sense of descriptive precision, especially on points where scholars differed slightly and in the finer details of description (e.g. generalizing certain phonotactic constraints from data). Alex’s teaching style was approachable and enjoyable while still being information-dense, so class was always a pleasure to attend.
LING 5500 – Corpus Linguistics
Dr. Xian Zhang
This hands-on introduction to using corpora to perform linguistic analyses was perfectly timed. Coupled with Research Methods, it gave me the space to practice the principles I was learning in that course. I used COCA (the Corpus of Contemporary American English) to perform a painstaking analysis with which I attempted to answer the question, “Is the News Really Becoming More Negative?” Looking back, this would have been a much faster process if I had known Python and been able to use more computational tools. But I hadn’t yet really begun the programming part of my degree, so I searched for and obtained all the figures I needed by hand, compiling them in a lovingly color-coded spreadsheet. There were definitely some design flaws in this study, but it yielded some interesting results: American news has not come to use more negative words over time, but it has come to use words with stronger affect.
LING 5310 – Syntax
Dr. Haj Ross
We started off on the right foot by drawing over 70 syntax trees, which I found soothing and enjoyable. More than anything, this class taught me to “interrogate” sentences to persuade them to expose their internal workings, which wasn’t something that I had been asked to do before. That was strange to me: somehow, I earned my B.A. without a single Syntax course. It wasn’t by design; it just worked out that way. There was a beautiful synergy between this class and my TAing for the undergraduate Syntax course. I was reading all of the undergraduates’ assigned readings in order to grade their work knowledgeably, so I was building a strong syntactic foundation from those introductory readings in parallel with graduate Syntax. Since that semester, I’ve been so much stronger in Syntax, which has been a great boon to my study of NLP algorithms.
LING 5410 – Computational Linguistics I
Dr. Alexis Palmer
Computational Linguistics was my favorite course. I got to really “get down to it” and start building a foundation in Python that would pave the way for my remaining semesters. We began with an introduction to NLP concepts alongside introductory programming exercises, and I was extremely grateful for the gentle introduction. By semester’s end, I was creating working programs that were able to complete textual analyses (not one to be serious all the time, I worked with a friend to build a program that replaced only the dialogue of Pride and Prejudice with txtspeak, mildly successfully), and while there was quite a distance to go in my Pythonic journey, I had fun literally every step of the way (there’s nothing like that thrill that says, “I am so happy I get to learn this!”). In retrospect, this course held additional appeal due to the state of blissful ignorance we were still in: we got to have fun creating and learning without yet being conscious of the vastness of the things we didn’t know.
LING 5075 – Quantitative Research Methods
Dr. Xian Zhang
Quantitative Research Methods took me deeper than I anticipated into the world of statistical tests: which ones it is possible to run, and how to ensure the suitability of one’s data for a given test. Since I eventually want to go into research, this was much appreciated; it can be all too easy to run some numbers on one’s results and draw unjustified conclusions. Even in the professional world, this is not uncommon. We learned how to use SPSS for a number of analyses, but not before running through the calculations in Excel first, to make sure we thoroughly understood the test itself (something I really liked – when considering math, I want to know why calculations are done the way they are, and how they relate to the attributes of the data). Xian also played a legendary April Fools’ prank on us – he had us thinking we had to write a last-minute 25-page paper for a good 45 seconds. My heart still isn’t okay from that.
INFO 5502 – Tools and Methods for Data Analysis
Dr. Junhua Ding
This course, which primarily used the Python packages pandas and DataScience (a package created by the data science team at Berkeley specifically for the purpose of teaching their introductory courses), was perfectly timed for me. Having both the statistical foundation and the programming one, I was very comfortable with the level of technical acumen required and was able to focus on learning the concepts of data science and analytics. We worked with a number of huge datasets and, through the rigors of statistical analysis, were able to draw conclusions about the world from them. For instance, one of our homework assignments had us working with a huge dataset of the number of murders per state across a 40-year timespan, and from this our task was to deduce whether or not the death penalty helped decrease the number of murders, as its supporters would claim (hint: it did not). There is a bit of an adrenaline rush to being able to come to a broader conclusion like this on one’s own, using only the power of numbers.
INFO 5709 – Data Visualization
Dr. Jian Yang
In this incarnation, Data Visualization was an online class that dovetailed nicely with Data Analytics. Most of the class focused on the design principles of good visualizations and on visualization literature, all of which was enjoyable; I used Microsoft Power BI to create a number of visualizations about stock prices and box office figures. As my final project, I undertook an analysis of the language used in movie synopses; the idea was to see if the frequency of the language used in the synopses could be statistically related to other factors, such as genre or success at the box office. It was an ambitious project – I came up with my own (rudimentary) frequency metric and painstakingly transformed my data before visualizing the results. In the end, while I was unable to reject the null hypothesis, I learned a lot, and I plan on producing an updated version of this project based on what I know now.
CSCE 5290 – Natural Language Processing
Dr. Eduardo Blanco
Natural Language Processing pushed us over the high initial peak of the Dunning-Kruger curve, making us fully aware of how little we knew about programming. I had been simultaneously looking forward to this class and dreading it for the entirety of my coursework. I looked forward to it because the subject matter is of huge interest to me and is directly applicable to what I want to do. I dreaded it because in my first semester word got round that we had to build, from scratch, a Hidden Markov Model that uses the Viterbi algorithm to do supervised POS tagging, and I had less than a year of programming experience under my belt (some of it from a summer in which I didn’t do a lot of programming). Linguistics students described this experience as a gauntlet, a sink-or-swim type of situation, and I can confirm this. There were some dark moments where I feared that despite almost 100 hours of effort I might not get my code working in time – but, fortunately, I did, and while the results may not be elegant, they’re quite functional. After that, I felt like a real programmer, and like someone might actually pay me to do this one day (a very good feeling). Dr. Blanco is a great professor, and guided us through this experience with a lot of help and a lot of compassion. This was a difficult experience at times given the gulf between what I knew at the time and what was required of me, but it is not one I would have traded for anything.
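For readers who have not met the algorithm, here is a minimal sketch of the Viterbi decoding step for an HMM tagger. It is not my course submission: the function name, the tiny tag set, and every probability below are made up for illustration, and a supervised tagger would estimate those tables from counts in a tagged corpus rather than writing them by hand. Unseen events get a small floor probability here as a stand-in for proper smoothing.

```python
import math

def viterbi(words, tags, start_p, trans_p, emit_p, floor=1e-12):
    """Return the most probable tag sequence for `words` under an HMM."""
    if not words:
        return []

    def lp(p):
        # Work in log space; floor zero probabilities (a crude stand-in for smoothing).
        return math.log(max(p, floor))

    # best[i][t]: log probability of the best tag path that ends in tag t at word i.
    best = [{t: lp(start_p.get(t, 0)) + lp(emit_p[t].get(words[0], 0)) for t in tags}]
    # back[i][t]: the previous tag on that best path.
    back = [{}]

    for i in range(1, len(words)):
        best.append({})
        back.append({})
        for t in tags:
            prev, score = max(
                ((p, best[i - 1][p] + lp(trans_p[p].get(t, 0))) for p in tags),
                key=lambda pair: pair[1],
            )
            best[i][t] = score + lp(emit_p[t].get(words[i], 0))
            back[i][t] = prev

    # Follow the back-pointers from the best final tag.
    path = [max(tags, key=lambda t: best[-1][t])]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]

# Hypothetical toy model (hand-picked numbers, not estimated from any corpus).
TAGS = ["DET", "NOUN", "VERB"]
START = {"DET": 0.8, "NOUN": 0.1, "VERB": 0.1}
TRANS = {
    "DET": {"DET": 0.05, "NOUN": 0.9, "VERB": 0.05},
    "NOUN": {"DET": 0.1, "NOUN": 0.1, "VERB": 0.8},
    "VERB": {"DET": 0.6, "NOUN": 0.3, "VERB": 0.1},
}
EMIT = {
    "DET": {"the": 0.9},
    "NOUN": {"dog": 0.5, "barks": 0.1},
    "VERB": {"dog": 0.05, "barks": 0.6},
}

print(viterbi(["the", "dog", "barks"], TAGS, START, TRANS, EMIT))
```

The dynamic-programming table is what keeps this tractable: each word only needs the best score per tag from the previous word, not every possible tag sequence.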
CSCE 5380 – Data Mining
Dr. Eduardo Blanco
For the second Computer Science course in my arsenal, I picked Data Mining. While Data Mining isn’t related to language directly (most of the data we worked with were numeric), the process of inducing attributes of data using machine learning and then deducing from what the system has learned is closely applicable to NLP. Classification using machine learning is everything nowadays, and building classifiers is one of the best possible skills I can learn as a computational linguist. Among other algorithms and projects, we got to implement a decision tree using Hunt’s algorithm and create a simple k-NN (k-nearest neighbors) classifier. The structure of this class was similar to NLP in that we were taken on a tour of many common algorithms at a step-by-step level, giving us a greater understanding of their structure. Dr. Blanco also has a knack for explaining complexity in an accessible way; I couldn’t imagine a better professor to teach these courses. My experience with computer science courses has been so positive that it has influenced how I plan to approach my career and my future doctoral research.
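To give a flavor of how simple a k-nearest neighbors classifier can be, here is a minimal sketch. It assumes numeric feature vectors, Euclidean distance, and an unweighted majority vote; the function name and the toy data are hypothetical, not taken from the course assignment.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    `train` is a list of (feature_vector, label) pairs.
    """
    # Sort training points by Euclidean distance to the query and keep the closest k.
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    # Count the labels among those neighbors and return the most common one.
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Hypothetical toy data: two well-separated clusters.
TRAIN = [
    ((1, 1), "a"), ((1, 2), "a"), ((2, 1), "a"),
    ((8, 8), "b"), ((9, 8), "b"), ((8, 9), "b"),
]

print(knn_predict(TRAIN, (1.5, 1.5)))
print(knn_predict(TRAIN, (8.5, 8.5)))
```

There is no training step at all: the "model" is the stored data, and all of the work happens at prediction time.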
LING 5415 – Computational Linguistics II
Dr. Taraka Kasicheyanula
This Computational Linguistics course was taught more as a computer science course than a linguistics course, which was appropriate at this stage in my studies. There were only 4 computational linguists to 8 Computer Science PhD students, and the course was structured primarily as a seminar, meaning that we got to learn about new research and advanced concepts in machine learning from individuals who had studied these from an engineering perspective for multiple years. The exposure to this different context was extremely helpful for my own understanding. There were certainly moments where the mathematical content of the presented research flew over my head a bit, but this only engendered curiosity and, consequently, intellectual growth. My machine learning project for this course, which involved the automatic detection of propaganda in American news articles, was also a primary source of intellectual growth on the level of NLP. Not only did I grapple with building a system of intimidating scope and difficulty, but I also worked with Dr. Palmer to write and submit a paper to SemEval-2020 on this topic.
LING 5990 – Professional Development
Dr. Patricia Cukor-Avila
In Professional Development, we explored several aspects of how to prepare our work for submission and review in the wider academic world. We looked at conference abstracts and grant proposals, which will come in handy when I decide to pursue a PhD. The resume/CV assignment and accompanying workshop were also very helpful; while I already had both documents, I was able to refine them and tailor them to several types of desired positions. Early in the class we were able to get together and have peer editing sessions; unfortunately, later in the semester external conditions kept these sessions from continuing! I also wrote my first annotated bibliography as part of this course. This was an enjoyable assignment that allowed me to explore metrics for calculating word frequency in textual data, from simple frequency counts to tf-idf.
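As a rough illustration of the two ends of that range, the sketch below computes raw frequency counts and one common tf-idf variant (relative term frequency times the natural log of N over document frequency). Many other weightings exist, and this is an assumption for illustration rather than a formula drawn from the bibliography; the function names and sample documents are hypothetical.

```python
import math
from collections import Counter

def raw_counts(tokens):
    """Simple frequency count: how many times each token occurs."""
    return Counter(tokens)

def tf_idf(docs):
    """Return one {term: score} dict per document.

    `docs` is a list of token lists. tf is the term's share of the document;
    idf is log(N / df), where df is the number of documents containing the term.
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc))  # count each term once per document

    scores = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        scores.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores

# Hypothetical sample documents, already tokenized.
DOCS = [["the", "cat", "sat"], ["the", "dog", "barked"]]

print(raw_counts(DOCS[0]))
print(tf_idf(DOCS)[0])
```

A word that appears in every document, like "the" here, scores zero under this variant, which is exactly the correction that raw counts lack.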