Should predictive models of student outcome be “colour-blind”?

This post was sparked by the international condemnation of George Floyd’s death, and the many others who came before him. Many communities and institutions are now reflecting on how structural racism manifests in their work (e.g. see SoLAR’s BLM statement and resources to help members learn more).

This is a tentative step into issues of race, about which I should declare I have no academic grounding. Nonetheless, it is important to ask what the implications are for a specific form of Learning Analytics, namely the predictive modelling of student outcomes. Should demographic attributes such as ethnicity be explicitly modelled, or should the models be “colour-blind”? While all categories have politics, this struck me as an interesting question, given that such techniques are demonstrating their value specifically in levelling the university playing field for all students. 

With thanks to Madi Whitman, Bart Rienties, Marti Hlosta and Paul Prinsloo for initial fact-checking and feedback. All comments are welcomed via this blog (moderated), the twitter thread or the LA Google Group thread.


Be more white. Be more male. Be wealthier. Those are the biggest correlations with success. It’s terrible, but it’s the truth.
[12] (p.1)

Classification systems provide both a warrant and a tool for forgetting […] what to forget and how to forget it […] The argument comes down to asking not only what gets coded in but what gets coded out of a given scheme.
[13] (pp. 277, 278, 281)

Since the emergence of Learning Analytics (c.2011) as both an intellectual community and commercial marketplace, an influential strand of work in higher education has been the use of predictive analytics, that is, developing computational models to identify students who look statistically likely (i.e. on the evidence of similar past cohorts) to be struggling, at risk of failing, or even dropping out. This is a dominant form of analytics inherited from the business world and machine learning, where it is highly lucrative to be able to predict the likelihood of, for instance, a customer buying a product or switching service provider — and take anticipatory action to change that possible future. So why not do the same for education?
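Stripped of the institutional context, the basic recipe is short, which is part of the appeal. Below is a minimal sketch of that recipe in Python (hypothetical file and column names, not any institution's actual pipeline): a classifier is fitted on a past, labelled cohort, then used to score the current cohort for risk.

```python
# Minimal sketch of the predictive-modelling recipe described above.
# Data files, column names and the 'completed' label are all hypothetical.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

past = pd.read_csv("past_cohort.csv")        # historical cohort with known outcomes
current = pd.read_csv("current_cohort.csv")  # in-progress cohort, outcomes unknown

features = ["vle_clicks_wk1_4", "assignments_submitted", "forum_posts"]
X_train, X_test, y_train, y_test = train_test_split(
    past[features], past["completed"], test_size=0.25, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("Held-out AUC:", roc_auc_score(y_test, model.predict_proba(X_test)[:, 1]))

# 'Risk' is simply the predicted probability of not completing, which is then
# surfaced to advisors (or nudge engines) who decide whether and how to act.
current["risk_of_not_completing"] = 1 - model.predict_proba(current[features])[:, 1]
```

Almost everything contested in what follows comes down to which columns go into the feature list, and what happens downstream of that final line.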

Debate surrounds the ethics of such models in higher education, a particular version of broader concerns around the “datafication” of education through analytics, and now AI. The issues are complex, but examples of constructive dialogue are emerging, in which Learning Analytics and AI in Education engage with such critiques (e.g. these recent edited collections [2-4]).

Predictive modelling intersects with questions around the profiling of students. One profiling attribute is ethnicity, which is what I want to focus on here, given the times we're in, just a few weeks after the death of George Floyd at the hands of the police.

High-profile success stories serve as iconic posters for the use of predictive modelling of student outcomes. Consider Georgia State University's Graduation and Progression Success (GPS) Advising program. It's not called GPS by accident: the predictive model alerts student support teams when students look like they've 'missed a turning' (to push the metaphor) and gone off-course. An example screen from the system is shown below.

Discipline-level cohort summaries of Low, Medium and High risk levels in Georgia State University's Graduation and Progression Success (GPS) Advising program.

Intriguingly, with regard to the question of racial colour-blindness, there’s a strong social justice angle that challenges head-on the demographically-related achievement gaps that many universities know only too well. Tim Renick, VP (Enrollment) at Georgia State University is unapologetic about GSU’s mission, and the GPS Advise website proclaims the sophistication of the analytics that help to power this:

“We have eliminated achievement gaps. For the last four years, we have been the only national university at which black, Hispanic, first-generation and low-income students graduated at rates at or above the rate of the student body overall. Georgia State is showing, contrary to what experts have said for decades, that demographics are not destiny.

Students from all backgrounds can succeed at comparable rates. Predictive analytics have helped all demographic groups graduate at higher rates from Georgia State, but just as critically, they have helped to level the playing field for all of our students.”

The irony will not be lost on those concerned about the datafication of education. Here we have analytics helping to level what historically has not been a level playing field for all students. When tools such as this are used intelligently, as aids for student support teams who are very much in the intervention loop, and produce impressive outcomes for historically minoritized groups (evidence which, to my knowledge, is not contested) — well, what's not to like?

Another mature example of embedding a predictive modelling tool into work practices comes from The Open University UK (webinar / paper / paper [6, 7]). Working with online distance-learning students, most of them mature students returning to academic study long after leaving high school, and including a high proportion of students with accessibility needs, the OU team has shown that staff who used OU Analyse to monitor student progress contacted their students more often than staff who did not, with higher success rates [5]. Again, here we have analytics helping traditionally disenfranchised cohorts.

A screenshot from the OU Analyse dashboard, showing the risk of each student not submitting an assignment, their predicted grade, and their probability of passing or failing the course. (Figure 2 from [7])

Having set the scene, I want to focus on a specific decision that has to be made in such work, which I’m framing as follows:

Should predictive models of student outcome be “colour-blind”?

Two sides of the debate go something like this:

YES: MODELS SHOULD IGNORE HISTORIC INJUSTICES. Predictive models should ignore demographic attributes: they are well known to be highly predictive of outcomes, but students obviously have no control over their ethnicity, their high school, being first-generation-in-family at university, and so on. It's clearly unethical to classify students as higher risk from day 1 for those reasons, immediately placing them in the shadow of inequitable historical patterns. They've got to university, possibly demonstrating greater resilience than their more privileged peers, so we wipe the slate clean. What counts is what they do when they walk through the door, some of which can be tracked by analytics through digital activity traces. Such models can therefore be declared to be “colour-blind”: ethnicity is not modelled explicitly, and nor are any other known proxies (e.g. zip code, high school).

NO: MODELS SHOULD REFLECT BUT NOT PERPETUATE ALL KNOWN FACTORS. Predictive models of student success/risk should include demographic variables, since they greatly improve the model’s performance. It is myopic to ignore this, just as we should not ignore science and social science when they provide solid evidence of other difficult truths about societal inequities. The student’s demographics are not held against them, but rather, used to improve their chances. We should thus model student risk as comprehensively as possible, with our ethical ‘eyes wide open’, forearmed to use this knowledge in the students’ best interests, with strong ethical principles to ensure that competing interests are not allowed to influence decisions (e.g. a student’s need for extra support has resource implications).
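To make the disagreement concrete: in code, the two camps differ only in which columns are handed to the learner, and, separately, in whether demographic data are still retained to audit the predictions. Here is a hedged sketch, with hypothetical feature names rather than anything taken from the systems discussed above:

```python
# Hedged sketch: the YES camp's 'colour-blind' feature set vs. the NO camp's
# full feature set. All file and column names are hypothetical.
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score, recall_score

df = pd.read_csv("student_records.csv")      # hypothetical labelled dataset
behavioural = ["vle_clicks", "assignments_submitted", "library_visits"]
demographic = ["ethnicity_code", "first_in_family", "deprivation_index"]  # incl. known proxies

train, test = train_test_split(df, test_size=0.25, random_state=0)

def fit_and_score(features, label="at_risk"):
    model = GradientBoostingClassifier().fit(train[features], train[label])
    auc = roc_auc_score(test[label], model.predict_proba(test[features])[:, 1])
    return model, auc

blind_model, blind_auc = fit_and_score(behavioural)               # YES camp
full_model, full_auc = fit_and_score(behavioural + demographic)   # NO camp
print(f"colour-blind AUC = {blind_auc:.3f}, full-model AUC = {full_auc:.3f}")

# Even the colour-blind model can only be audited for equity by group if
# demographic data are retained somewhere, outside the predictor set.
preds = pd.Series(blind_model.predict(test[behavioural]), index=test.index)
for group, rows in test.groupby("ethnicity_code"):
    print(group, "recall among truly at-risk students:",
          recall_score(rows["at_risk"], preds.loc[rows.index]))
```

Whether the marginal difference in accuracy justifies either position is, of course, exactly what the rest of this post is about.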

Until recently, I thought of these positions as rather polarised. But a third analysis struggles with an unequivocal yes or no. This view problematises the goal of even trying to achieve colour-blindness:

BEING “COLOUR-BLIND” ≠ BEING ETHICAL

I’ll state very clearly that I’m brand new to reading anything academic about racism. As a result of reading sparked by George Floyd’s murder, I only just became aware of the work of people like Eduardo Bonilla-Silva on the nature of white privilege and structural racism, and at this point, have only managed to read various summaries and reviews of his influential book, Racism without racists: Color-blind racism and the persistence of racial inequality in the United States [1]. He argues:

“Whereas Jim Crow racism explained blacks’ social standing as the result of their biological and moral inferiority, color-blind racism avoids such facile arguments. Instead, whites rationalize minorities’ contemporary status as the product of market dynamics, naturally occurring phenomena, and blacks’ imputed cultural limitations” (p.2).

“Much as Jim Crow racism served as the glue for defending a brutal and overt system of racial oppression in the pre-Civil Rights era, color-blind racism serves today as the ideological armor for a covert and institutionalized system in the post-Civil Rights era” (p.3)

Colour-blind racism operates through:

  1. liberalism (markets are open to all and do not discriminate)
  2. naturalization (people “naturally” segregate themselves from other racial groups)
  3. cultural racism (minorities participate in self-defeating behavior) and
  4. minimization of racism (discrimination is no longer a central factor shaping minorities’ life chances).

I found another article fascinating (it won a CHI’20 Best Paper award): it introduces critical race theory to reflect on how academia functions, specifically HCI, a sister field to Learning Analytics [9]. In their summary of critical race theory, the authors also note Bonilla-Silva’s point (1) above:

“Liberalism itself can hinder anti-racist progress [34]. Liberalism’s very aspirations to color-blindness and equality – while admirable – can impede its goals, as they prohibit race-conscious attempts to right historical wrongs. In addition, liberalism’s tendency to focus on high-minded abstractions can lead to neglect of discrimination in practice.” (p.3)

These ideas raised a question in my mind: does making our computational infrastructure “colour-blind” merely perpetuate systemic discrimination in universities? So I was delighted to read the work of Madi Whitman [12], who presents an ethnographic account of how a university made its modelling decisions. There are some interesting quotes from the data science team, which I suspect might be echoed by many others trying to make ethical decisions. First, they are aware of the uncomfortable truth, as are many universities:

“Be more white. Be more male. Be wealthier. Those are the biggest correlations with success. It’s terrible, but it’s the truth.”

—Excerpt from interview with Don, a university administrator [12] (p.1)

Since the predictive model drives automated nudges to the students, the team tries to do the right thing (for the YES camp) — exclude demographic attributes over which students have no control:

“Socioeconomic status things. Demographic markers. But they’re all things that either because it’s too late in the game, we can’t tell a student, “Boy, it would have been great if you would have studied harder in high school.” And we certainly can’t tell a student on a demographic or socioeconomic thing, we can’t say, “Hey, it’d be good if you weren’t so poor.” There’s nothing a student can do with that. Even though it does put ‘em in a higher risk category. So we took those things that were malleable by the students. Things like, how much time they were spending on campus. Whether they were a proxy for whether we believed they were paying attention in class by how much data they were downloading in a class.” (p. 6)

Note the strong argument for student agency, which is a principle valued in much ethical discourse in Learning Analytics, and Human-Centred Design thinking. The student should be in control:

“I guess that we assume that what [students] did in the course of the day, they had control over. Right, so they chose whether they were gonna eat or not . . . they chose the gym or not, being on campus or not . . . They chose living where they chose to live. I think they have some say in that…So it seemed to me that any time that they had an opportunity to make a decision about what they were going to be doing, we called that a behavior.” (p.7)

Whitman helps us understand that while the analytics team sees this as the ethical response, it’s a double-edged sword: do students really have that level of control? She argues that:

“Because attributes are removed from the model and nudging, the reliance on behaviors suggests that students’ choices are at the heart of their success at the institution. Because demographic data are not incorporated into the predictive model at all, success is linked with behaviors and students’ choices. The purposeful presentation of data to students encourages students to internalize those data and act on them. As such, responsibility now rests on the students to take hold of their success.” (p.10)

If you are in the YES camp, this is exactly the goal. Level the playing field; we don’t care what colour you are; everyone must take responsibility for their study habits, level of engagement, assignment submission, etc.

However, might this not also resonate with items 1, 3 and 4 in Bonilla-Silva’s work introduced above? The university and its learning platforms are framed as “open markets”, with opportunity for all (1); if students do not make wise choices, they only have themselves to blame (3), because we’ve erased racism from the algorithms (4):

  1. liberalism (markets are open to all and do not discriminate)
  2. naturalization (people “naturally” segregate themselves from other racial groups)
  3. cultural racism (minorities participate in self-defeating behavior) and
  4. minimization of racism (discrimination is no longer a central factor shaping minorities’ life chances).

So Whitman with her modelling case study, and Bonilla-Silva in general, are questioning whether students from historically marginalised groups are really as autonomous and agentic as their more privileged peers. Whitman concludes:

“The visualizations of certain kinds of data—namely data students ought to use to inform their everyday decision-making—and obscuring of demographic data place the burden of responsibility and success on students. By minimizing the role that race, class, and gender play on graduation outcomes, the institution, through the model, can present behaviors as major factors in the likelihood of a student graduating within four years. If students do not attend class, a low GPA is a consequence of that decision.

Thus, the constraints around choices become invisible. The university and its existing inequalities start to vanish because success is placed in the hands of students. Social climate problems, structural barriers, issues of belongingness, and resource shortages disappear. A student cannot cite external factors in this model of success dominated by behaviors. The result is a shift in a locus of responsibility, wherein nudging is meant to give students tools to manage themselves and regulate their own behavior based on insights they ought to draw from their data.” (p.10)

WAYS FORWARD?

There are some questions we could ask as a way to move this forward.

Does anyone contest the positive outcomes for students from the use of predictive models?

For instance, when GSU reports the startling impact of the GPS Advising initiative, is anybody questioning the figures? Is anyone questioning the claim that the algorithm has a pivotal role to play in this, rather than the impressive level of human support available to students? At the Open University, we knew that simply calling a student increased the chances of a positive outcome.

What is the purpose of the modelling?

If you’re designing automated nudges for students (as in the Whitman case study), clearly, there’s no point nudging them based on their static demographic history, so removing such attributes from the model seems uncontroversial in modelling terms. Whitman, of course, is concerned about this erasure (but see next section as to whether this is justified).

If you’re designing a model to understand the spectrum of challenges students face, in order to understand how to support them, then ignoring demographics becomes problematic. The UK Open University’s Student Probability Model [7] was developed for financial forecasting, assessing the likelihood of a student still being enrolled as the course unfolded (sometimes over years for part-time students). This took into account deprivation indices, which could of course be a proxy for race in some contexts, but erasing this would simply lead to more erroneous financial forecasts. We should ask (perhaps even more so in these straitened times for universities) whether it is in anybody’s interests for universities not to budget as accurately as possible.

The OU Analyse predictive model also takes a range of demographic variables into consideration, including socio-economic status and ethnicity, when making the initial predictions before a course starts. However, nearly all of the demographic factors quickly lose relevance once actual engagement and behavioural data are gathered after the course begins, in particular once the first assessment deadline has passed. Furthermore, previously obtained credits are usually more predictive than any demographic variable. Interestingly, while the OU Analyse team has wanted to remove demographics given the limited additional variance they explain, those teaching on the front line apparently prefer to retain them, since they help to ‘colour in’ their picture of a student. Ethical arguments for both the Yes and No camps? (One rough way to test how quickly demographics lose their predictive value is sketched below.)
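Here is one way such a claim could be tested (a hedged sketch with hypothetical column names, not OU Analyse's actual pipeline): re-fit the model at successive points in the course and track how much predictive lift the demographic variables still add once each week's behavioural data are available.

```python
# Hedged sketch: does the predictive lift from demographics decay as
# behavioural data accumulate? File and column names are hypothetical.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

df = pd.read_csv("course_snapshot.csv")      # one row per student, with final outcome
demographic = ["ethnicity_code", "ses_band"]
baseline = ["prior_credits"]                 # reportedly more predictive than demographics
weekly = {1: ["clicks_wk1"], 2: ["clicks_wk2"], 4: ["clicks_wk4", "assessment1_submitted"]}

def mean_auc(features):
    # mean cross-validated AUC for a simple model on the given feature set
    model = LogisticRegression(max_iter=1000)
    return cross_val_score(model, df[features], df["passed"],
                           scoring="roc_auc", cv=5).mean()

available = list(baseline)
for week, cols in weekly.items():
    available += cols
    lift = mean_auc(available + demographic) - mean_auc(available)
    print(f"Week {week}: extra AUC from demographic variables = {lift:+.3f}")
```

If the printed lift shrinks towards zero after the first assessment deadline, that is the pattern the OU Analyse team describes.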

Given this tension between quant and qual drivers, it seems particularly important to understand when and why predictive models fail, through close qualitative analysis (see this recent example from the OU team [8]), as well as to understand in detail the experiences of the student support teams who use – or are expected to use – the outputs of predictive models (e.g. [5]).

Is any real harm caused by colour-blind modelling?

Whitman argues that, in principle, an unfair burden is imposed on marginalised students if we assume they have the same capacity as their more privileged peers to respond to nudges and make wise choices. There is plenty of evidence that marginalised groups are not as free to make the same life-choices as more privileged whites, but is there any empirical evidence yet regarding student choices in response to automated nudges? I don’t know of any yet.

One size does not fit all: students with the same demographics may still be very diverse

A black student may be working from home in very poor physical and emotional conditions, with poor computing and network access, struggling financially, commuting long hours, with dependents to care for. That student is clearly battling constraints that others are not, which will seriously affect how much “control” they have over their choices, through no fault of their own.

  • This is all invisible in the colour-blind model (YES camp). It is visible when we model such metadata (NO camp) and could be taken into account.

Another black student may have a generous scholarship, be living on campus, be free from carer responsibilities, and be able to seize every opportunity that comes their way.

  • This seems to be the default assumption behind colour-blind student modelling — and that is precisely the point.

Should we just stop using predictive models in education?

Despite the flagship examples, perhaps the potential for poorly implemented predictive modelling is so high that such models are best steered clear of. It’s complex both technically and ethically. A range of ethical concerns not yet covered includes:

  • One size does not fit all. A body of evidence now demonstrates that a predictive model for one course does not translate smoothly to other courses. Differences in discipline, cohort, pedagogy and learning design introduce myriad variables.
    But within a given course, things are simpler, surely?
  • We don’t necessarily want to teach the way we always have. Predictive models assume that historically stable patterns are a reliable predictor of the future. But even within a course, this is not always true, since teaching staff, curriculum and pedagogies change. Indeed, many universities are trying to shift the way their staff teach and assess towards more future-focused pedagogies. Innovations by definition break from the past, and so will likely break the predictive model, and the last thing we want is for our analytics to act as a brake on improving teaching. In our pandemic-afflicted world, predictive models based on a blended pedagogy with on-campus students are unlikely to translate smoothly to 100% online students working from diverse timezones (but that is, ultimately, an empirically testable question).
  • Risk of misclassification. As in all areas of society where algorithms are classifying people, there is growing concern over the risk of being misclassified. Who wants a High Risk of Failure flag on their record, even before they start their studies? Is that flag really deleted, or saved to help validate future models? And could that classification be leaked to other entities, who could use it inappropriately?
  • University lacks the capacity to act. Prinsloo and Slade argue that a university has at least a moral, if not legal, obligation to act if it believes a student is at risk of failure. Predictive models, when valid, thus place a new burden on universities [11]. A key take-home from mature case studies such as GSU and the OU is the investment in people, processes and tools required to deliver on this.

So there are significant risks that universities buy predictive modelling products like any other ed-tech but either use them badly or, even when the models are well tuned, still cannot act on what the dashboards are telling them, thus opening themselves up to charges of negligence. Perhaps it’s better not to know tens of thousands of students’ risk profiles in such precise terms…

Many universities choose instead to focus on other forms of analytics that make student activity visible in helpful ways to both educators and students, and that provide educators with tools to intervene with personalised feedback at scale [10], but make no attempt to build a risk profile. That profile is left implicit, inferred by (hopefully well trained) student support mentors and educators.

What do students think?

I’ll close with this obvious question, though it is not one for which I know of any empirical evidence. Let’s bring diverse students into the conversation and consult with them on these matters. Learning Analytics is beginning to introduce human-centred design methods that give a voice to students, and as with any co-design process, this requires learning, and listening, by all stakeholders. However, I do not know of any work that engages students around predictive models in particular, and issues of race specifically.

How do students from diverse backgrounds engage with questions such as these?…

  • Do you want to be treated by the university just like any other student? Or should the university be recognising that you come from very different backgrounds, live in very different conditions, and face very different challenges day-to-day?
  • This extends into our IT systems: what do you think about analytics that continuously predict your likelihood of success, to maximise the support we can give you? Demographics including ethnicity and postcode can help improve such models, and help us ensure that outcomes are equitable for all students – does that seem reasonable? 
  • Are you surprised or shocked, or would you expect no less from a technically advanced university?
  • Are you happy to trust that the university will behave ethically, or do you want more transparency? How much do you want to know about the data we have and how we use it, and how much control do you want over this data?

References

[1] Bonilla-Silva, E. Racism without racists: Color-blind racism and the persistence of racial inequality in the United States. Rowman & Littlefield Publishers, 2006.

[2] Buckingham Shum, S. Critical Data Studies, Abstraction & Learning Analytics: Editorial to Selwyn’s LAK keynote and invited commentaries. Journal of Learning Analytics, 6, 3 (2019), 5-10 https://doi.org/10.18608/jla.2019.63.2

[3] Buckingham Shum, S., Ferguson, R. and Martinez-Maldonado, R. Human-Centred Learning Analytics. Journal of Learning Analytics, 6, 2 (2019), 1-9 https://doi.org/10.18608/jla.2019.62.1

[4] Buckingham Shum, S. and Luckin, R. Learning analytics and AI: Politics, pedagogy and practices. British Journal of Educational Technology, 50, 6 (2019), 2785-2793 https://doi.org/10.1111/bjet.12880

[5] Herodotou, C., Rienties, B., Boroowa, A. and Zdrahal, Z. A large‑scale implementation of predictive learning analytics in higher education: the teachers’ role and perspective. Educational Technology Research and Development, 67 (2019), 1273–1306 https://doi.org/10.1007/s11423-019-09685-0

[6] Herodotou, C., Rienties, B., Hlosta, M., Boroowa, A., Mangafa, C. and Zdrahal, Z. The scalable implementation of predictive learning analytics at a distance learning university: Insights from a longitudinal case study. The Internet and Higher Education, 45 (2020), 100725 https://doi.org/10.1016/j.iheduc.2020.100725

[7] Herodotou, C., Rienties, B., Verdin, B. and Boroowa, A. Predictive Learning Analytics ’At Scale’: Guidelines to Successful Implementation in Higher Education. Journal of Learning Analytics, 6, 1 (2019), 85-95 https://doi.org/10.18608/jla.2019.61.5

[8] Hlosta, M., Papathoma, T. and Herodotou, C. (2020). Explaining Errors in Predictions of At-Risk Students in Distance Learning Education. Proc. International Conference on Artificial Intelligence in Education (AIED 2020), pp 119-123. https://link.springer.com/chapter/10.1007/978-3-030-52240-7_22

[9] Ogbonnaya-Ogburu, I. F., Smith, A. D. R., To, A. and Toyama, K. Critical Race Theory for HCI. In Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems (Honolulu, HI, USA, 2020). Association for Computing Machinery. https://doi.org/10.1145/3313831.3376392

[10] Pardo, A., Bartimote, K., Buckingham Shum, S., Dawson, S., Gao, J., Gašević, D., Leichtweis, S., Liu, D., Martínez-Maldonado, R., Mirriahi, N., Moskal, A. C. M., Schulte, J., Siemens, G. and Vigentini, L. OnTask: Delivering Data-Informed, Personalized Learning Support Actions. Journal of Learning Analytics, 5, 3 (2018), 235-249 https://doi.org/10.18608/jla.2018.53.15

[11] Prinsloo, P. and Slade, S. An elephant in the learning analytics room: the obligation to act. In Proceedings of the Seventh International Learning Analytics & Knowledge Conference (Vancouver, British Columbia, Canada, 2017). Association for Computing Machinery. https://doi.org/10.1145/3027385.3027406

[12] Whitman, M. “We called that a behavior”: The making of institutional data. Big Data & Society, 7, 1 (2020), 1-13 https://doi.org/10.1177/2053951720932200

[13] Bowker, G. C. and Star, S. L. (1999). Sorting Things Out: Classification and Its Consequences. MIT Press, Cambridge, MA.

 

11 Responses to “Should predictive models of student outcome be “colour-blind”?”

  1. This is a great summation+reflection that got me thinking about our current practice, and how we might add additional methods to further validate how well a predictive algorithm not only equitably predicts the achievement of students across known demographic subgroups, but also about ways we might learn about the students’ approach to learning as the learning events that inform models accrue, and whether this approach might be related to these past experiences (and their interconnection to the students’ race, gender, generational status, etc. in ways we could actually understand). I posted this Twitter thread on a RT initially, but am reposting here after seeing a request to another poster to do so:

    Really captures the complexity and stakes of the debate: should students’ demographic data be excluded in prediction models? Our (@ UNC Chapel Hill, NC, USA) models currently fall in the “yes” camp, then we slice and check equity of prediction. What should be next? Different? 1/n

    Modeling agnostic to race, gender, first generation status, and other critical contextual information about the learner’s prior experience deprives models of predictive power, but ensures models are informed by behavioral choices students can control.

    What is not clear is whether those behaviors are similarly likely for students based on prior experiences/characteristics, or when conducted, mean the same thing for their learning. I’m hopeful that data-driven methods (differential sequence mining w/ latent profiles?) can inform this, or that corroborating methods such as capturing think-aloud data as students engage in learning, or retrospective interviewing, can help us understand how clicks do and do not provide an unequivocal record of students’ learning events that scale across known subgroups.

    Validation efforts to understand the variability in what the key features of a predictive algorithm mean to the students as they engage in those learning events seems critical to determining whether behaviors alone can represent the student experience.

    We’ve begun this in the lab, but it seems like a combination of data-driven experience sampling methods (ESM; to ask students about their learning as they click) paired with stratified sampling (to ask members of subgroups) needs to be conducted in the educational “wild.”

  2. @Matt Bernacki, thanks for your encouragement, and thoughts.

    Yes, if you can find the resource to investigate this, you would be among the first to provide evidence, quant and qual, of how different kinds of students experience and act on automated recommendations such as these, as I call for.

    The only relevant work that I have come across to date is in relation to the student experience of feedback messages written by educators, but personalised with the aid of (basically) rule-driven ‘mail merge’ technology, not from predictive models such as Course Signals etc. In fact I mention this work in my post, from the OnTask project: https://OnTaskLearning.org

    Abelardo Pardo and his team are leading efforts to illuminate how these messages are experienced, e.g.

    Iraj, H., Fudge, A., Faulkner, M., Pardo, A. and Kovanović, V. Understanding students’ engagement with personalised feedback messages. Proceedings of the Tenth International Conference on Learning Analytics & Knowledge (Frankfurt, Germany, 2020). Association for Computing Machinery. https://doi.org/10.1145/3375462.3375527

    Lim, L.-A., Dawson, S., Gašević, D., Joksimović, S., Pardo, A., Fudge, A. and Gentili, S. (2020). Students’ Perceptions of, and Emotional Responses to, Personalised Learning Analytics-Based Feedback: An Exploratory Study of Four Courses. Assessment & Evaluation in Higher Education (2020): 1-21. https://doi.org/10.1080/02602938.2020.1782831.

  3. @Carlo Perrotta — Thanks for your blog responding to this.

    You provide some broader (depressing) US educational context in which to understand the claims that GSU are making — but unless I misunderstand, this does not in any way detract from their results. If anything, it shows how important it is to level the playing field for all students?

    You describe the GSU work as promoting a “political narrative of their colour-blind approach to reducing attrition” — but I’m not sure that’s right. They are quite explicit about tracking ethnicity, hence their ability to make claims about whether it dictates a student’s future. So, I think they are firmly in what I dubbed the “No Camp”. I’ve invited them to join this conversation…

    You point us helpfully to the body of critical race + technology work that undermines a basic assumption in data science, with learning analytics being no exception: that the “race” or “ethnicity” field in a dataset can be unproblematically treated as an “attribute” (e.g. Benjamin, 2019; Hanna et al 2020). For most data scientists, it’s certainly not a “behaviour”, but a simple metadata field, and they rarely have the intellectual background or resource capacity to question what it means. If the client provides this data, that’s what we work with. Now, on the one hand we can dismiss that as an inadequate response. Or, we can sympathise with the poor data scientist, who is suddenly being burdened with an extraordinary challenge (the whole point being that such categories are highly contested, and the product of hundreds of years of white discrimination).

    Practical solutions are being proposed that move beyond demanding that all data scientists simply refuse to deal with any racial metadata, but it seems to me (as a newcomer) that there aren’t many practical guidelines in place yet. It falls to the academic and professional communities to devise those, but I’d love pointers to current examples, so that the Learning Analytics community can engage. Hanna et al. provide some clues as to which definition of race may be best suited for what analytical purposes.

    You conclude by asking whether this is all too complex, and perhaps we should just avoid building predictive models — an option many institutions have taken, as I discuss, for reasons that include but go beyond the perils of racial profiling. The further thought I would add, as I’ve read the responses and other material, is that there does seem to be a critical difference between student support systems that operate through *automated* feedback to students, and those in which there is a human in the loop — the student support teams that GSU and OU have invested in. Sadly for those seeking economic shortcuts through automation, those skilled people need paying, but have a critical role in mediating algorithmic output to students with wisdom and sensitivity.

    What I would really like to see are more closely documented examples of learning analytics design rationales, which can be examined from multiple disciplinary perspectives, including the ones you shared.

    Benjamin, R. (2019). Race After Technology: Abolitionist Tools for the New Jim Code. Polity. ISBN: 978-1-509-52643-7

    Hanna, A., Denton, E., Smart, A. and Smith-Loud, J. (2020). Towards a critical race methodology in algorithmic fairness. Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency (Barcelona, Spain, 2020). Association for Computing Machinery. DOI:10.1145/3351095.3372826

  4. You might find this article helpful in our exploration. While it is about incarceration, they do a good job of illustrating the challenges you are interested in for the context of education. https://www.technologyreview.com/2019/10/17/75285/ai-fairer-than-judge-criminal-risk-assessment-algorithm/

  5. Thnx Garron, yes, the ProPublica analysis of COMPAS is a major reference point in this debate, and that article has some nice interactive data viz to bring it alive.

    The debate around predictive policing is even closer in flavour to predictive risk modelling of students.

    What the GSU and OU examples show is that this can be done responsibly, with good outcomes for students. GSU students are, for instance, graduating a semester earlier than they used to: that’s a lot of tuition fees not coming into GSU, but savings for students. Other examples show that ‘at risk’ data can be used for nefarious purposes, in the interests of the institution but not the students.

    Consider this report on “The promise and peril of predictive analytics in higher education” [PDF] opening with this startling example:

    “Mount Saint Mary’s University, a small, private Catholic college in Emmitsburg, Maryland, made the news in 2016 when its president, Simon Newman, reportedly said about its freshmen, “You just have to drown the bunnies … put a Glock to their heads.” Newman was referring to a plan he had come up with to artificially improve the school’s retention rate. He wanted to achieve this by requiring new students to take a survey, and then use their answers to identify those who were likely to drop out. Those students would then be encouraged to leave before they were included in the retention data the institution reports to the federal government. In an email sent to staff, Newman wrote that he wanted “20-25 people to leave by the 25th of September to boost retention by 4-5 percent,” according to the school’s newspaper, The Mountain Echo.”

    That’s a pretty extreme example of course. Much more damage will be done unwittingly through poor use of the tools. Analytics are invariably associated with economics, since they are obviously about automation. There’s no way to dodge that. The fact that machines can calculate in ways and at scale that humans can’t is not an intrinsic evil. We can take the view that all data gathering and analytics are so dangerous that we simply shouldn’t touch it, or we can engage with these new tools, and co-design the human and technical systems to enable their responsible deployment. Even if an educational institution has a strong set of values, and good leaders, it has to be smart enough to realise how its infrastructure can unwittingly perpetuate injustices of the sort we’re talking about here.

    Hence the need for a very good dialogue between learning analytics and critical race experts, and the need for constructive approaches that ground ethical discussions in tangible design dilemmas, e.g. that surface ethical edge cases for close analysis.

  6. I think this dialogue should include discussions about both intention and harm. Even without analytics involved, we know that good intentions can lead to harm. In the example you provide you illustrate (I think) harmful intentions based on metric-focused evaluations. You also acknowledge the poor use of tools. There are also a fair number of poorly designed tools. I would not consider a software designer who does not know what they are doing as necessarily having bad intentions, but they certainly have a capacity to harm.

    One of the first meetings I had in Boston on Learning Analytics included participants who calculated predictions of “at-risk” students and claimed they could not tell why one student was predicted to be at risk while another student they considered near identical was not. Yet, they continued to produce the reports. So in this case it could be that the prediction was bad (possibly not updated or maintained) and yet the algorithm was still in use and potentially causing harm. I do not think those generating the reports had bad intentions. However, who is accountable for the potential harm caused by (potentially) outdated algorithms making poor predictions that can have real consequences for students?

    I would hope that the dialogue you mention would include discussions of accountability for harm.

  7. Just published, in response to this blog post:

    Should College Dropout Prediction Models Include Protected Attributes?

    Renzhe Yu, Hansol Lee, René F. Kizilcec

    In Proceedings of the ACM Conference on Learning at Scale (L@S) 2021

    Preprint: https://arxiv.org/abs/2103.15237

    Early identification of college dropouts can provide tremendous value for improving student success and institutional effectiveness, and predictive analytics are increasingly used for this purpose. However, ethical concerns have emerged about whether including protected attributes in the prediction models discriminates against underrepresented student groups and exacerbates existing inequities. We examine this issue in the context of a large U.S. research university with both residential and fully online degree-seeking students. Based on comprehensive institutional records for this entire student population across multiple years, we build machine learning models to predict student dropout after one academic year of study, and compare the overall performance and fairness of model predictions with or without four protected attributes (gender, URM, first-generation student, and high financial need). We find that including protected attributes does not impact the overall prediction performance and it only marginally improves algorithmic fairness of predictions. While these findings suggest that including protected attributes is preferred, our analysis also offers guidance on how to evaluate the impact in a local context, where institutional stakeholders seek to leverage predictive analytics to support student success.

  8. I wrote this response during LAK22, but didn’t get a chance to send it until now….

    Overall, reading the original post and the comments, I think both intent and effect are important in planning and evaluating systems that use models. I say “systems” because the models themselves may generate predictions, but what we do with those predictions forms a system— models by themselves may contribute less to the effects than the other components of the system, e.g. notification criteria and contents. As noted by others, the infamous Mount St. Mary’s case is an example of bad intent; the continued use of models that make poor predictions or systems that send messages to participants (learners, teachers, advisors, etc.) that have negative consequences are bad effects, regardless of intent.

    1 – One thing we can and should ask is whether our models work equally well for people with different demographics. (e.g. ABROCA slicing, see Gardner, Brooks & Baker 2019 https://dl.acm.org/doi/10.1145/3303772.3303791) This is similar to acknowledging that facial recognition systems don’t work well on people of color, for example. When our models don’t work equally well, we need to fix them. Granted, this doesn’t avoid the problematic treatment of “race” and “ethnicity” as static data points, but it seems more ethical to use this data to evaluate our models/systems than to make assumptions about our learners.

    2 – Going beyond model prediction accuracy, we can ask if interventions (e.g. nudges to learners, recommendations to teachers, etc.) have equal effects for different populations, and if not, we may need to use race/ethnicity/SES data to inform what our systems advise participants to do. (An obvious simple example would be systems that send messages in English to those who don’t speak that language equally well, but word choice, typography, etc. can also affect recipients differently.)

    3 – While our nudges to students should be based on things students can do (and nudges to others can recommend things those others can be encouraged to do for them, like outreach), it is reasonable to include factors students have no control over in a model if they help to identify students who need specific kinds of help. If I suspect a student is missing meals (not by choice), the most helpful nudge I could send might be information about free meals. Maybe I can also help them to keep up with assignments, which hopefully will help their resource scarcity needs in the long run. Or maybe I can’t — people who are dealing with scarcity often don’t have enough bandwidth to address non-critical tasks like school assignments. Ideally a model should help us to identify ways to actually help someone, not just try to tell a student with challenges to act like one without challenges.

    4 – People ought to have choice about whether others see predictions about them, even when those predictions are assumed to be for their benefit. A student may be educationally at risk purely through life circumstances they are unable to change. They should be able to decide if they want that risk to be visible to others, e.g. if they trust those others to help them. People also ought to be able to choose whether to be reminded of risks they may already be aware of, whether or not the creators of systems think it would be helpful for them to be reminded. (This is distinct from saying people ought to be able to opt out of participating in such systems at all. An institution might argue that other students benefit by collecting aggregate data, and might be able to show that a system on the whole is beneficial to participants and justifies the data collection. That might also be an adequate argument for opt-out as a default rather than opt-in.)

    5 – Models ought to take the previous effects of their systems into account when generating future predictions. If we make a prediction and teachers or students change their behavior as a result of that prediction and the outcome is not what was predicted (hopefully a risk is avoided), that needs to be part of retraining the model and part of future predictions. (It should also be part of evaluating the value of the model/system over time.) If people don’t change behavior as a result of system nudges or other recommendations, and the predicted risk comes to pass, maybe those demographic factors should help tell us that the changes we advised are not practical for that population and we need to come up with more effective interventions. If those who received the nudge/recommendation actually did worse than those who were not notified, and that effect is correlated with a demographic value, we may need to remove that output from the system for that demographic.

    6 – Models and systems should not only be about what students do and can do. They need to also be about what faculty and institutions do. If we want faculty to teach in a different way, maybe we should make those teaching actions part of the models and systems too, and the “nudges” should include helping faculty to change— and we should evaluate whether those changes seem to be helping, empirically.

    In general, this means I’m arguing against “color blind” models and systems, but I think the marginal (if any) improvement in accuracy is not the point. We need to consider models in the context of the systems we build around them, whether they are automated or human processes, and use the demographic data to evaluate the effects those systems have on participants. If we can’t make our systems benefit people in all demographic groups, we need to ask ourselves if we are only helping to contribute to inequality. But I think an honest look at the data will let us do much better than that.

  9. Thanks @Liz, I (and I’m sure others) appreciate your reflections! And sorry that I only just spotted your comment…

  10. My colleagues and I now have a paper in press that argues that including demographic variables as predictors reduces model actionability.

    Baker, R.S., Esbenshade, L., Vitale, J.M., Karumbaiah, S. (in press) Using Demographic Data as Predictor Variables: a Questionable Choice. To appear in Journal of Educational Data Mining.

    https://learninganalytics.upenn.edu/ryanbaker/demographic-predictors.pdf

    We also critique arguments that excluding these variables is a form of color-blind racism, connecting to the original work introducing this term by Bonilla-Silva (as you do in your post here, Simon!)
