Conversational GenAI for argument analysis

History: making thinking visible

As you can tell from a quick scan of my site, I’ve spent a lot of time fascinated by how computers can make thinking visible (books Visualizing Argumentation and Knowledge Cartography), plus many papers and blog posts (on Argument Mapping and Dialogue Mapping), using experimental open source  software we built (such as Compendium and Cohere). A big picture account can be found in this 2007 keynote, Hypermedia Discourse: Contesting Networks of Ideas and Arguments.

So, all that’s to say that making arguments visible so that you and others can — in a very real sense — “see what you’re saying” has been a career-long passion. A key challenge in this long field of research has been that rigorous thinking is hard work. Bad luck, welcome to university! Argument Mapping and its related techniques use the affordances of visual trees/networks as an extended, external memory to augment personal and collective intelligence. Making one’s ideas visible as coherent diagrams is also hard work — but it’s good pain — the cognitive and discursive effort this entails is designed to clarify one’s thinking by revealing visually where the weaknesses are, in ways that writing and reading chunks of prose cannot tell you at a glance.

Enter NLP and rhetorical parsing

In 2012, we were now in the Web 2.0 era, and an exciting collaboration with NLP and linguistics expert Ágnes Sándor (Xerox) led to a new conception of Contested Collective Intelligence. For the first time in my work, machines could identify argumentative moves in sentences, complementing the argumentative moves that our web annotation tools enabled for people — who unlike machines, can of course can ‘read between the lines’ and see connections between ideas that may not even be in the texts.

This was extremely exciting, and the ideas and open source code carried through to our current Academic Writing Analytics project and web apps. I reflected  on the impact of encountering NLP colleagues, in the context of The Future of Text book.

Conversational generative AI

And so we arrive at generative AI based on large language models, which advances the state of the art in language processing and generation in so many ways. Moreover, the conversational paradigm, when a chat application is overlaid, opens so many interesting human-computer/personal-collective intelligence possibilities. I’ve been intrigued to play with GPT-4 to see what its argument analysis capabilities are.

Previously, I’ve shared some early experiments on ChatGPT-3.5’s ability to identify implicit premises in prose arguments, and critique a flawed argument by analogy. I’ve now had the chance to experiment a little with the version of GPT-4 that is Bing Chat, accessed via Microsoft Edge browser. I was dying to see how far I could get in generating an Argument Map from a written argument.

The task is a typical analysis workflow, as prep for teaching:

  • search for relevant sources
  • select one for analysis
  • extract key elements of the argument and their relationships (described using a structured markdown notation called ArgDown)
  • diagram them to show their key relationships (in the ArgDown web app)
  • discuss (with the AI)
  • start thinking about student activities to help them learn

I don’t mind admitting that watching a machine do this for the first time was startling! I tell the story here…

U21 2023 Educational Innovation Symposium Keynote from McMaster University (OFFICIAL) on Vimeo.

Deeper dive

Let’s take a closer look at what Bing Chat did, because it wasn’t perfect.

  • The gold stars signal what in my view are good summaries of what the authors said, correctly linked.
  • The blue info circles are “commentary” from Bing Chat about the arguments
  • The red crosses signal that the authors did not say this, it is a false reconstruction by Bing Chat.
  • The red underline signals classification of a premise using incorrect, or indeed made-up argument schemes. There is to my knowledge no such argument type as Argument from responsibility, or Argument from precaution. Argument from omission seems to be a jumbling of Fallacy of omission and Argument from ignorance. 

If we take this node for example, it reads well as a summary:

However, the authors do not talk about researchers at all, they say:

“The letter addresses none of the ongoing harms from these systems, including 1) worker exploitation and massive data theft to create products that profit a handful of entities, 2) the explosion of synthetic media in the world, which both reproduces systems of oppression and endangers our information ecosystem, and 3) the concentration of power in the hands of a few people which exacerbates social inequities.”

As an amusing sidenote, Bing Chat was curiously resistant to recognising this, insisting that it was correct, first “quoting” a fabricated passage from the article to me, and then saying that this implied that the authors meant researchers. I thought that this sort of stubbornness had been ironed out after Bing Chat’s earlier escapades! More seriously, this points to the value of dialogic learning, with a partner who can be conversed with 24/7 — but who must still be treated with some caution, certainly at this stage of maturity.

To summarise:

  • Bing Chat showed intriguing capability, for a machine, to analyse an argumentative article:
    • extracting the key claim and underlying premises, summarising them in own words
    • generating markdown (ArgDown) showing supporting/challenging relationships
    • (and without being asked to) attempting to classify some nodes using Walton’s Argumentation Schemes.
  • However it also introduced fallacious nodes (incorrect summaries of the authors, and incorrect commentary nodes), incorrect links, and argument classifications (inventing argument types, and/or misclassifying nodes).

This is an exploratory example, and more systematic evaluations are required, of the sort we see in the growing Argument Mining literature.


It does feel to me that we’ve turned a corner in the long, wintry history of AI. Perhaps this is a passing summer, which will fade like the others. But in my own career, punctuated by eureka moments such as seeing my first Apple Mac, my first web page load, and an iPhone — this is up there.

University is to teach you to think. Argument analysis is serious intellectual work, of the sort that we would hope to see from our students. Nor is there always “one map to rule them all’ — a correct map, since like in spatial cartography, design decisions are made about scale and purpose. The point about knowledge cartography is that it provokes productive reflection and discourse. So even if the AI gets the map wrong (and it will), the conversation this should provoke should be useful. With colleagues Kirsty Kitty and Andrew Gibson, I’ve argued that embracing imperfection in tech can be productive if it promotes deeper critical thinking in learners, e.g., learning by correcting the automated output, or reflecting on questions it asks, or why it seems wrong. Students must, however, be scaffolded to engage in such activity.

Informal learning? This is feasible in formal education, but may be less attractive in other informal learning contexts where we want to promote critical deliberation, e.g.  citizens engaged in a policy deliberation, many of whom lack the internal or external motivation to think that hard. But assuming future tools give more accurate argument maps/outlines, that require less debugging, perhaps we can see use-cases including:

  • assisting facilitators/educators to prepare learning resources for civic deliberations
  • assisting very engaged citizens to dissect complex arguments, and perhaps lowering the entry threshold for others who might otherwise not engage with such structured, critical deliberation
  • an article is very different to a multi-author conversation, but we can envisage summarising online discussions (NB: Teams is starting to summarise topics and actions in meeting transcripts)

Did we just supplant student cognition? From a learning sciences perspective, an overriding concern with generative AI is that it does too much cognitive work for the learner. Editing an AI-generated draft is not the same as wrestling with the blank page yourself. Ditto for reviewing an AI-generated argument map.

I have just done what many professionals have enjoyed doing in recent months: putting GPT through its paces to test its technical capability. But learners are not professionals: they don’t know what they don’t know. As I argue elsewhere, they may lack the knowledge, skills and dispositions to engage critically with AI output. They will require suitable scaffolding from mentors and teachers to learn what we mean by critical thinking and argument analysis, in order then to be equipped to use a power tool such as an argument mapping tool. Much empirical research awaits to test the affordances of generative AI like this, to establish when they are most useful to use developmentally, with a given age/stage of learner.

But we do know that argument mapping has struggled to gain traction (in formal education and among professionals) because it’s hard intellectual work. It could be that by generating full or intentionally incomplete argument maps, AI provides a step up for many learners to quickly get feedback on their work, or see examples of arguments about topics they are knowledgeable about — and thus better equipped to critique — compared to examples chosen by the teacher or textbook. Generative AI may open new possibilities because it can generate examples tuned to the interests of each learner, activating their curiosity to go deeper.

Your feedback is welcome, which is hosted on LinkedIn…

Comments are closed.