GenAI synth dialogues: potential learning tool — or disinformation WMD?
AI-synthesised dialogues are stunning the first time you hear them.
If you haven’t heard this in action, let’s take the Algenie biotech startup here at UTS and listen to this synthesised dialogue between a male and female presenter, helping you learn all about Algenie’s dynamic vision and strategy. Or when I upload one of my papers to Google’s NotebookLM, you get this inviting feature story all about it.
So overall, it’s the kind of easy listening ‘deep dive’ story you get on talk-radio/podcasts — an engaging way for the audience to get into the topic. The synthesised voices are indistinguishable from humans (notwithstanding the occasional glitch), and the male and female presenters joke, laugh, change tone, and interact quite compellingly. Everyone I know is impressed the first time they hear this. Yes, those fixed personas and accents might start to grate after a while — but the voices will of course be infinitely tuneable once this takes off. Apparently Spotify is being swamped already with AI-generated podcast interviews.
This is hardly a coincidental genre design choice if you’re looking to make a splash with your first release. Give it inoffensive material, and the presenters big up the content and authors in exactly the way you’d expect from its training material of podcast chats. In the chat about our paper, we authors are now “rockstars”… Nothing like a feature story about how awesome your work is! I did give it more challenging material, such as the executive summary of the World Economic Forum’s Global Risks Report 2023 (not exactly laugh-a-minute stuff), to see whether the AI adapted its tone of voice or genre to suit. Interestingly, the presenters do sober up noticeably, though the guy still jokes that “we’ve got to keep it light”.
(As a side-note, when you think about it, it’s a little odd that chatbots came first and synthesised dialogues only afterwards. One might have expected it to be the other way round: surely it is far easier to engineer a frozen dialogue about fixed content than to respond in real time to an infinite variety of users and topics.)
Can we harness AI synth dialogues for learning?
Once I picked my jaw off the ground on hearing it for the first time, my immediate reactions were to try to understand the genre better, wonder what the system prompt was (maybe this will come out in due course), hunt for the backstory (there’s an interesting interview with Raiza Martin, the product manager), and then ponder what we could do with this educationally: right now, and in the future if only it were tuneable (more on that shortly).
Starting with its default talk-radio format, does this open new educational possibilities for engaging with complex content?…
- Vicarious learning? Listening in on a conversation in order to understand a topic is of course a form of vicarious learning, and learning by listening to a skilful conversation goes back at least to the Greek philosophers. We all know that excitement when you’re interested in a topic and get to listen in on two informed people exploring the questions and issues from different angles, in the process reducing the chance of being dazzled by a persuasive monologue. This may be even more compelling if you can identify with the interlocutors, and if they’re stretching you within your zone of proximal development (ZPD).
- Inclusion for neurodiversity? For neurodivergent students, including those with ADHD or ASD, could this be an accessibility and inclusion aid that piques interest and sustains attention? A few colleagues in this area have mused on the possibility.
- Critique the conversation? We could ask students to either listen to a conversation provided to them, or generate their own, and critique it, demonstrating their competence in whatever knowledge, skills and dispositions you’re teaching (argumentation; media literacy; gender roles…).
But my next question was: how could this become more pedagogically tuneable?
Tuneable conversations
Then Google released an update that provides a system-prompt window to customise the conversation. This is exactly what I had been hoping to see, although I’d imagined a GUI to guide the user around specific parameters. (As I noted early last year, in terms of UX, prompt engineering is a retrograde return to the command-line interface, a point Meredith Ringel Morris has recently argued in Prompting Considered Harmful.) Perhaps that will follow, but meantime, what can we do?
More serious tone of voice for serious material. Our students need to learn about some very serious matters, and I leave it to your imagination how inappropriate a jolly talk-show chat would be for so many of the societal issues we confront. So I added an explicit prompt asking for a suitably serious tone, in the style of a news-hour broadcast, for the Global Risks Report.
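To give a flavour, the customisation prompt was along these lines (illustrative wording, reconstructed rather than verbatim):

```
Adopt a serious, measured tone throughout, suitable for a news-hour
current affairs broadcast. This material concerns grave global
risks: no banter, no jokes.
```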
I think you can hear the change in tone clearly. I also asked them to deal with the health risks first, which you can hear 40 seconds in. However, the meaning shifts slightly: the male presenter says that the report “leads” with health, while the woman says “the most immediate risk they highlight is with healthcare systems”… Subtle changes such as this could be significant and are something to watch for.
Connecting the material to local studies. Let’s jump back to the Algenie biotech startup. Now I want the presenters to refer the listener back to introductory biology courses here at UTS, and to opportunities for tutorial discussions. I also made the male presenter lack confidence, to see if he might voice the confusions and questions that students are too afraid to raise.
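Again illustratively (not my exact wording), the customisation prompt was in this spirit:

```
Refer listeners back to the introductory biology courses at UTS, and
point them to the tutorial discussions where they can take these
ideas further. Make the male presenter noticeably less confident: he
should voice the confusions and naive questions that students are
often too afraid to ask.
```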
The new conversation really does reflect this; for instance, jump to 6:30 and listen to the minute that follows.
Adding scepticism. Until now, the presenters have never challenged the content; they just describe it in what seems like helpful, accessible language. As with sycophantic chatbots trained to please, it takes explicit prompting to give the LLM ‘permission’ to push back. For my first example, let’s upload the report we wrote here at UTS on our strategy for assessment reform in the age of AI. The default rah-rah conversation heaps praise on every word, so I then prompt for a more curious, sceptical response.
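The extra prompt needed was minimal, something in this spirit (illustrative wording):

```
Do not simply praise the report. Respond with curiosity and
scepticism: question whether the strategy is realistic, and push
back where its ambitions seem to outrun what busy academics can do.
```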
1:18 into the chat, she asks (yes the AI swaps the gender roles), “But I’m sure there are some people who think that they’re being a tad, you know…” (him) “Overly ambitious.” Later (1:55), she goes on, “And let’s be honest most academics I know are stretched pretty thin.” (him) “Tell me about it: grading, research, admin, it never ends.” (her) “Right — so where are they going to find the time to redesign entire courses?”
Now, I was impressed with this. These are exactly the sorts of reactions we are encountering from some academics, and universities the world over can attest to the same. Systemic transitions are far from simple. The AI presenters are voicing precisely the doubts and worries that sit at the heart of our assessment crisis, and all from a minimal prompt.
But let’s switch back to the Global Risks Report. This time the explicit prompt invokes scepticism, casting doubt on the authority of the experts and their risk rankings.
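Illustratively, the sceptical prompt ran along these lines (not verbatim):

```
Be sceptical of this report. Question the authority of the experts
behind it, and cast doubt on their risk rankings and on how those
rankings were produced.
```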
Here’s how they handle this. 15 seconds in, she asks “what’s actually worth worrying about?”, and a minute in, “This is where I get a little suspicious…” “Where’s the data in this report?” “How likely is this to cause major issues?” He quickly joins in: “Should we be taking this report with a grain of salt?” She confirms, “Maybe a whole shaker full, honestly.” “Who are these experts anyway?” “Are we just seeing the risks that fit a particular world view?” “So we need to act on climate change, but let’s not panic.”
And so it unfolds, exactly as I prompted. The report’s polycrisis of interacting systemic risks is somewhat undermined with a straw man: “but are these guarantees?” No, they’re “possibilities”.
It’s not all negative. It’s quite reasonable to pose questions such as “Are these diverse experts, or just the usual suspects?” They appeal to human resilience and ingenuity, and bemoan the negativity of the risk analysis. Around the 6min mark, she offers an astute reflection: “So instead of dwelling on what can go wrong, I think the real value of a report like this is to get people talking, to ask tough questions, challenge what we think we know, and get ahead of the curve.” However, this also forms part of the presenters’ compulsion to end every podcast with a motivational call to action: we can all make a difference, let’s all pull together, etc.
Pedagogically, it would be fascinating to ask students to critique the strengths and weaknesses of a sceptical analysis, to help them learn that there is good critique and bad critique. This particular risk report does of course have its critics, and were this the actual material students needed to work with, one would hope that they would engage with those critiques.
From critical thinking, to disinformation engine?
But, if you’re anything like me, listening to this last example is unsettling. The presenters make no reference to the fact that the report details its methodology, partner organisations and expert panel (nothing in my prompt told them to ignore these); instead, they cast doubt on the experts and accuse the report of being vague.
So — we now have the ability to generate completely biased dialogue with the superficial appearance of a “deep dive report”, undermining analysis that others would consider authoritative. This is dual-use technology for sure, and looks like a gift to disinformation campaigns.