Under the Microscope – Professor Karin Verspoor | RMIT University
03 March 2024
Why did you choose your field of research?
My work is interdisciplinary, combining computer science with linguistics, and now I work primarily on applications of natural language processing in biomedicine. I did a double major during my undergraduate studies, studying both computer science and cognitive sciences, which inspired me to try to combine the two. I started programming at a young age: at around 9 years old I began teaching myself BASIC, and then I was lucky enough to attend a high school for Science and Technology where I was able to learn more programming (Pascal! Yes, I’m old.). I enjoyed the challenge of solving problems with the step-by-step, logical approach needed to take advantage of the speed and processing power of a computer. I have also always been fascinated by language, probably thanks to my early childhood exposure to Dutch, French, and English – and perhaps even Wolof or Chichewa – while living in Africa (Senegal and Malawi) with my Dutch parents. I find it amazing that we can communicate complex ideas from one person’s head to another through language. The expressive power of human language is tremendous. I was fascinated by the idea of trying to develop computational systems that could make sense of language, and hoped that we would gain insight into how language works through this process.
Do you have a surprising or fun fact to share from your research?
These days, especially in anything related to artificial intelligence, there is the common perception that all you need to solve problems is data. If your model doesn’t quite work, you need to find more data, or possibly label more examples, and try again. However, we are finding that more data isn’t always the answer – it has to be the right kind of data to address whatever the underlying problem with the model is – and labelling more examples to guide the model requires a lot of human resources that we may not have. One particularly intriguing result from recent experiments done by my PhD student Thinh Hung Truong at the University of Melbourne is that large language models have a lot of trouble with negation (words like “not” and “don’t” or prefixes like “un-”). The models essentially don’t “see” the negation, e.g. giving the same responses to positive and negative versions of a question. They also don’t understand how words are related via negation; for instance, even huge models cannot reliably predict the antonym of a given word.
Your work focuses on artificial intelligence in medicine – can you walk us through what is involved in your research?
My work is primarily about transforming linguistic data into actionable information. In biomedicine, huge amounts of data are captured in textual form. In scientific contexts, there are millions of publications that describe scientific findings and report the results of research such as clinical trials. In clinical contexts, there are referral letters, nursing notes, and reports describing histopathology or radiology findings. People even use social media to discuss their health. My research involves the development of methods and approaches to “machine read” these texts, usually looking for specific kinds of information – concepts and relationships between them – but also to summarise or synthesise facts. The tools we build have to understand the language of biomedicine – how symptoms and diseases are referred to, what genes and proteins are, and how interactions between these concepts/entities might be described – and have to be robust to linguistic variability, or all the creative ways that words might be used and combined to express core facts. The objective is to build a system that can cut through that variability and home in on the core facts. In the process we can transform very complex natural language sentences to a structured representation that can be stored in a database. This in turn makes the data much easier to search, or to analyse further for patterns. As an example, we can take a prescription statement such as “Take one or two 325 mg Aspirin tablets twice a day” and structure it as, e.g.:
- [Drug: Aspirin]
- [Dose: [Unit: mg] [Value: 325]]
- [Form: tablet]
- [Quantity: [Min: 1] [Max: 2]]
- [Frequency: [Window: Day] [Repeats: 2]]
Superficially it seems more complicated, but because each of these fields has a consistent semantics, the information can be utilised more effectively in a computational system. In addition, other ways of expressing the same thing such as “Take 1-2 tablets of Aspirin, 325mg, 2xdaily” can be mapped to the same representation – normalising away the linguistic variation.
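As a rough illustration of this kind of normalisation, the sketch below maps both phrasings of the Aspirin example to the same record. It is a toy under stated assumptions, not the approach used in the research described here: the hand-written regular expressions, the tiny drug lexicon `KNOWN_DRUGS`, the lookup tables, and the function name `parse_prescription` are all hypothetical, and a real system would need far broader coverage of clinical language.

```python
import re

# Hypothetical lookups for illustration only; real coverage would be far larger.
KNOWN_DRUGS = ("aspirin", "ibuprofen", "paracetamol")
NUMBER_WORDS = {"one": 1, "two": 2, "three": 3, "four": 4, "once": 1, "twice": 2}
WINDOWS = {"day": "Day", "daily": "Day", "week": "Week", "weekly": "Week"}

NUM = r"(\d+|one|two|three|four)"


def to_int(token):
    """Convert a digit string or a known number word to an int."""
    return NUMBER_WORDS[token] if token in NUMBER_WORDS else int(token)


def parse_prescription(text):
    """Map one prescription sentence to a structured record (toy rules)."""
    t = text.lower()
    record = {"Drug": None, "Dose": None, "Form": None,
              "Quantity": None, "Frequency": None}

    # Drug: first lexicon entry that appears as a whole word.
    for drug in KNOWN_DRUGS:
        if re.search(rf"\b{drug}\b", t):
            record["Drug"] = drug.capitalize()
            break

    # Dose: a number followed by a unit, e.g. "325 mg" or "325mg".
    m = re.search(r"\b(\d+)\s*(mg|g|ml)\b", t)
    if m:
        record["Dose"] = {"Unit": m.group(2), "Value": int(m.group(1))}

    # Form: dosage form, reduced to the singular.
    m = re.search(r"\b(tablet|capsule)s?\b", t)
    if m:
        record["Form"] = m.group(1)

    # Quantity: a range such as "one or two" or "1-2".
    m = re.search(rf"\b{NUM}\s*(?:-|or|to)\s*{NUM}\b", t)
    if m:
        record["Quantity"] = {"Min": to_int(m.group(1)),
                              "Max": to_int(m.group(2))}

    # Frequency: e.g. "twice a day", "2xdaily", "3 times per week".
    m = re.search(
        r"\b(once|twice|\d+)\s*(?:x|times)?\s*(?:a|per)?\s*"
        r"(day|daily|week|weekly)\b", t)
    if m:
        record["Frequency"] = {"Window": WINDOWS[m.group(2)],
                               "Repeats": to_int(m.group(1))}

    return record


a = parse_prescription("Take one or two 325 mg Aspirin tablets twice a day")
b = parse_prescription("Take 1-2 tablets of Aspirin, 325mg, 2xdaily")
print(a == b)
```

Because both sentences reduce to the same dictionary, a query such as "all prescriptions with a dose value of 325 mg" can be answered without knowing how each one was originally worded.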
What do you believe is the most important element of the intriguing work you undertake within RMIT University?
I do the research that I do because I think it matters. There is so much information trapped in scientific and clinical texts, and it is important that we find it and organise it, because it only really has value if it can be reused. We don’t really “know” anything unless we share it, and in turn other people find it and build on it. Because people are limited in terms of how much information they can read or truly absorb – we have limited time and limited memory capacity – it is very useful to lean on computers to help us with this. Working in the biomedical domain also means that my work has broader impact beyond the immediate methodological contributions. It can support scientists to find relevant findings in the literature to inform their research, or to have a more comprehensive knowledge base as a starting point for hypothesis generation. It can enable clinical decision making based on deeper synthesis of data about patients. Ultimately, I see my work as playing a role in accelerating our ability to understand, model, and make predictions about human health. Since health is one of the most valuable things we have in our lives, I feel that this work has real impact.
How much time do you spend on a university campus? What excites you about working within a university campus?
I am on campus every day. I actually moved with my family to the city after all the COVID lockdowns so that I could be here more often, and avoid a long commute. I wanted to be present for our staff and students, and to be able to take the “pulse” of the activities on campus. I love the intellectual environment on campus – the big questions that people are asking, the insights that they are gaining, and the learning that happens. I pop in on classes occasionally, and really enjoy seeing staff explaining concepts, and students working together to solve problems. Sometimes you can almost see a lightbulb going on in someone’s head. Watching people grow and learn from each other is really rewarding.
What do you think is the biggest challenge facing higher education in Australia?
This is a big question! There are many challenges that we are facing, including the impact of “virtualisation” of our world as so many activities can now be done online, the (reasonable) expectations of flexibility that students now have for their time and their learning experiences, and a host of factors beyond our control such as the economic and policy environments that we work within.
We are also currently grappling with the impact of genAI on authentic student learning and even on how we do research. What do students really need to know? How can we best support their learning when there are machines that know so much? When is it okay to use AI tools to help you study or do research, and when is it not? The answers to these questions are not black and white. As someone who deeply understands how these models work, I am acutely aware of both the opportunities they represent and their limitations. I don’t think we should simply bury our heads in the sand and tell people not to use them – that would be pointless, and they obviously do have huge value! However, I also don’t think they change the fundamentals of how people learn. We need to find how AI tools can best help us, while also being aware of their limitations and where they can go wrong.
Do you have a hobby?
To be honest, I don’t really have a hobby in the traditional sense, unless you count coffee. I love a good coffee. But I do really enjoy travel, discovering new cultures, tasting new foods, experiencing the energy of a place, exploring nature, and learning about how history has shaped our world.
What’s your dream holiday destination and why?
I’m not really a sit-on-the-beach kind of person; while I can appreciate a beautiful tropical beach, I prefer holidays where I can explore the diversity of nature, as well as learn or experience something new. Our family recently had a holiday in New Zealand, exploring both the North and South Islands. It was pretty dreamy! New Zealand has a mix of incredible natural landscapes, including glaciers, amazingly blue waters, and geothermal pools. It also has a number of very interesting museums where we learned about Māori culture as well as stories such as the Australia/New Zealand involvement in the Gallipoli campaign in World War I (and even how that connects to the Māori population). We walked a lot, learned a lot, and saw lots of cute animals and birds. I’m sure we’ll be back as we certainly didn’t see everything.
What would you say to a student about to begin their higher education journey in 2024?
I actually had the pleasure of welcoming over 1000 new computing technologies undergraduates to RMIT this week. I told them that it was an exciting time to be studying computing, as digital technology has become ubiquitous in our world, and integral to our lives and nearly every business or activity that you can imagine. But it is also a confusing time, because artificial intelligence is suddenly everywhere, and we don’t know yet exactly what this will mean for us all. Is generative AI the end of coding? (I don’t think so; there is a reason we still teach children how to add and subtract even though we have calculators.) How will the role of humans shift now that we have massive models that have encoded so much information?
Most importantly – and this applies to all students – I told them to make friends, and that studying is a team sport. I suppose this is timeless – university is a time to challenge yourself, to stimulate your mind, and to gain skills that will allow you to have a meaningful career. But it is also a time to find “your people”, including the people that will be in your professional network for years to come.