Firefighting the ChatGPT storm

Categorised: Pedagogy, Teaching tips | Tags: AI
Posted by Sam Bell. Last updated: September 11, 2024

Samantha Bell is a senior lecturer in accounting at the University of Bristol.

Sam Bell considers the impact of AI on accounting coursework assessments. She explores how questions and rubrics can be adapted to deter improper use of Large Language Models (LLMs) such as ChatGPT, Google Bard and Bing Chat. She also provides practical suggestions for supporting learning activities and reducing overreliance on LLMs.

Image generated by Bing Image Creator (Dall-E). Prompt: “woman, wearing a trouser suit and mortar board, extinguishing a fire, piles of documents on the floor, minimalist cartoon”

Both of my units for final-year undergraduates have individual summative coursework elements. As firefighting measures, I made some small changes this academic year, but as I plan for next year, I’m preoccupied with how my assessments should change to accommodate the availability of LLMs.

Some academics feel the integrity threat that LLMs pose to coursework is so great that coursework should be abandoned in favour of proctored exams or in-person presentations. However, it is too late in the planning cycle for me to change my assessment for the next academic year. So, I was left looking for an approach to essay writing that I could incorporate within my existing course and assessment structure. Ideally, it would preserve academic integrity and encourage students who opt to use LLM tools to apply them responsibly.

The questions we ask

It seems that we can no longer set a single generic question for an entire cohort. Instead, we need a wider brief from which students formulate their own bespoke questions and lines of enquiry. Following suggestions from Anna Mills and Lauren Goodlad (2023) [Note 7], I advocate setting question briefs which “articulate nuanced relationships between ideas”.

Mills and Goodlad explain that, at present, LLMs analyse such questions only superficially. They tend to produce “shallow” answers which are “formulaic” and “repetitive”.

We must ensure students don’t choose a problem that has already been widely explored in the literature. In my field of auditing and financial reporting, using a conceptual or theoretical framework to explore a current, emerging issue could be fruitful.

The cut-off date for ChatGPT-3.5’s training data was September 2021. However, don’t assume that more recent topics are ‘ChatGPT-proof’: other LLMs can draw on current information from the internet. Even so, building some recency into your brief may make it less likely that LLMs will find sources that link the elements of your chosen brief well enough.

A consequence of this approach is that students need more support, particularly to ensure they develop a question that addresses the brief. I tackled this by encouraging students to consider suitable current issues. Each student brings a recent press article to class that covers a development in, or criticism of, audit practice. We then co-create a shared Padlet resource for cohort discussion and reference.

I also require students to submit a coursework proposal or tentative abstract. This forms part of the overall coursework mark, and I give feedback on it. Your cohort may be too large for this. In that case, you might adopt peer feedback, or use a questionnaire or rubric to diagnose whether a proposed question will meet the brief. Alternatively, you could offer short advice appointments with a tutor. For large cohorts, another option would be to set the coursework brief as a group project.

Some scaffolding for the ‘stochastic parrot’

Image generated by Bing Image Creator (Dall-E). Prompt: “woman, wearing a trouser suit and mortar board, extinguishing a fire, colorful parrots are flying about, minimalist cartoon”

Generative AI large language models have been described as ‘stochastic parrots’ (Bender et al., 2021 [Note 1]). Essentially, given a user’s prompt, the model generates a response by predicting natural, “human-like” language, using probabilities derived from its training data and, in some cases, information retrieved from the internet.

However plausible their responses may seem, LLMs have no ‘understanding’ of them. This theme and the ethics of LLMs were discussed in the QAA’s May 2023 publication, ‘Maintaining Quality and Standards in the ChatGPT era’ [Note 8]. In essence, we must help students see how LLMs can support the academic writing process and where caution and critique need to be applied.

I suggest you avoid a didactic approach. Rather, encourage students to take a critical approach to using such tools. For example, incorporate class activities that illustrate the flaws and strengths of LLMs. I’ve enjoyed reading suggestions from Anna Mills [Note 6] and am keen to adopt her approach: not simply to evangelise about these tools, but to encourage their cautious use.

Careful — AI ‘hallucinates’

Who’s to say? Things may change over the next 12 months, but for now students should be aware that some LLMs, ChatGPT-3.5 in particular, ‘hallucinate’. The algorithms produce responses based on their training data. Where information is incomplete or unclear, the models ‘guess’ responses based on patterns in the data. This can produce plausible but factually inaccurate responses. Consequently, LLM output can include fictitious yet plausible references, theories and facts.

To demonstrate the hallucination phenomenon, you might ask students to enter a prompt that you have designed. Your prompt might explore a relationship between at least two elements, ideally in a ‘current’ context. It should also require citations from relevant peer-reviewed academic literature, plus a list of references.

Now ask your students to critique the generated response. Ask them to use Google Scholar to check whether the references are genuine. They should then evaluate whether the sources provide good-quality evidence to support the arguments made. They could even assess the response as a marker would and grade it using a rubric.

I changed my approach to academic scholarship last semester and placed a cap on the number of sources that students could use in their essays. Because of the hallucination issue in ChatGPT, I told students that I would check their references. I hope this encouraged them to be more considered in their source selection and to think carefully about the quality of their sources.

Recommended use of AI in accounting coursework

Finally, it is imperative to set some ground rules for accommodating AI in accounting coursework. These will be informed by what you and your institution define as acceptable use of LLMs and other AI tools. For example, is it acceptable for students to use AI in the planning process to generate ideas or to provide style suggestions? I allow students to use LLMs to draft their essay introductions if they wish, which mirrors how these tools might be used in the world of work.

Appropriate academic practice for acknowledging and referencing LLMs is still emerging. For example, there are traceability issues because prompts and responses do not have specific URLs.

If we allow the use of LLMs but are concerned about students sharing prompts, should we (as suggested by the University of Queensland [Note 11]) require the submission of screenshots of the prompts and responses used? Tools are emerging to address this; ShareGPT, for example, can generate a URL for a specific ChatGPT prompt and response.

Authorship is another thorny issue. Can an LLM be credited as an author, or should we follow many leading academic journals and prohibit this? How should a student cite and reference their ChatGPT use? Guidance is emerging for the major academic referencing systems. I’ve been using advice from the Generative AI pages of the Bloomsbury Cite Them Right tool [Note 2] in my own materials.

Marking in the era of AI in accounting coursework

Image generated by Bing Image Creator (Dall-E). Prompt: “woman, wearing a trouser suit and a mortar board sat at her desk writing, a pile of documents are on fire, three big security cameras watching her, minimalist cartoon”

Assessment rubrics which reward higher-level skills . . . are less susceptible to influence by Generative AI tools.
The Quality Assurance Agency for Higher Education [Note 8]

Even before AI in accounting coursework was an issue, I shared marking rubrics with my students. They guide students to self-assess the development of their essays, with a focus on higher-level skills such as critical thinking. See Miihkinen and Virtanen (2018) [Note 5] for a thorough discussion of assessment rubrics for written assignments in accounting.

I am adjusting my rubrics to encourage students to engage with the process of academic writing rather than the finished product. I believe this will reduce the incentive to over-rely on LLMs. This past semester, I awarded marks for the quality and suitability of sources (scholarship), and for originality in both the question crafted from the brief and the (critical) argument made in the essay.

LLMs are, by their nature, language processing models, and their strength lies in proofreading and writing style. I therefore now award fewer marks for presentation.

Using AI detectors in accounting coursework

And what about AI detectors? The evidence points to using them with extreme caution, if at all. Last semester, I reviewed the Turnitin AI writing detection scores for my coursework assignments. Since then, there have been widespread reports cautioning against their use. The Financial Times reported that “Universities express doubt over tool to detect AI-powered plagiarism” [Note 10], and Sarah Eaton (University of Calgary) [Note 4] reports that OpenAI, the developer of ChatGPT, admitted its own detection tool correctly identified only 26% of AI-written text.

I’m also mindful that AI is developing so quickly that, as fast as AI detectors are developed, AI tools may evolve to evade them.

We should also consider the impact that a false accusation might have on a student’s wellbeing. Jim Dickinson, writing in WONKHE [Note 3], suggested that bias might occur if a marker sees an AI detection score, even if they do not act on or investigate it. I agree that suspicion of AI use could inadvertently but unfairly bias my marking of those particular scripts. Therefore, in future, I intend to mark coursework without viewing the Turnitin AI reports.

What next?

AI in accounting coursework raises many other questions, such as ethical concerns and privacy issues. We must also consider the impact on academic workloads. Given the rapid advancement of these technologies, how close are we to achieving artificial general intelligence? How soon will AI be able to complete any task a human can?

Let me know your thoughts . . .

Part of the Pedagogy series

References and resources

Note 1: Bender, E.M., Gebru, T., McMillan-Major, A. and Shmitchell, S. (2021) ‘On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?’, Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, pp. 610-623. Available at: https://dl.acm.org/doi/abs/10.1145/3442188.3445922

Note 2: Bloomsbury Cite Them Right > Harvard > Digital and Internet > Generative AI. Subscription required for access: https://www.citethemrightonline.com/sourcetype?docid=b-9781350927964&tocid=b-9781350927964-217. (Accessed: 23 June 2023).

Note 3: Dickinson, J. (2023) ‘It looks like you’re trying to assess a student’, WONKHE (29 March). Available at: https://wonkhe.com/wonk-corner/it-looks-like-youre-trying-to-assess-a-student/ (Accessed: 23 June 2023).

Note 4: Eaton, S. (2023) ‘The Use of AI-Detection Tools in the Assessment of Student Work’, Learning, Teaching and Leadership (6 May). Available at: https://drsaraheaton.wordpress.com/2023/05/06/the-use-of-ai-detection-tools-in-the-assessment-of-student-work/ (Accessed: 23 June 2023).

Note 5: Miihkinen, A. and Virtanen, T. (2018) ‘Development and application of assessment standards to advanced written assignments’, Accounting Education, 27(2), pp. 121-159. Available at: https://doi.org/10.1080/09639284.2017.1396480

Note 6: Mills, A. (@EnglishOER) Twitter profile. Available at: https://twitter.com/EnglishOER

Note 7: Mills, A. and Goodlad, L. (2023) ‘Adapting College Writing for the Age of Large Language Models such as ChatGPT: Some Next Steps for Educators’ Critical AI (17 April). Available at: https://criticalai.org/2023/01/17/critical-ai-adapting-college-writing-for-the-age-of-large-language-models-such-as-chatgpt-some-next-steps-for-educators/ (Accessed: 23 June 2023).

Note 8: Quality Assurance Agency for Higher Education (2023) Maintaining quality and standards in the ChatGPT era: QAA advice on the opportunities and challenges posed by Generative Artificial Intelligence. Available at: https://www.qaa.ac.uk/docs/qaa/members/maintaining-quality-and-standards-in-the-chatgpt-era.pdf?sfvrsn=2408aa81_10 (Accessed: 23 June 2023).

Note 9: Rose, S. (2023) ‘Five ways AI could improve the world’, The Guardian (6 July). Available at: https://www.theguardian.com/technology/2023/jul/06/ai-artificial-intelligence-world-diseases-climate-scenarios-experts (Accessed: 23 June 2023).

Note 10: Staton, B. (2023) ‘Universities express doubt over tool to detect AI-powered plagiarism’, Financial Times (3 April). Available at: https://www.ft.com/content/d872d65d-dfd0-40b3-8db9-a17fea20c60c (Accessed: 23 June 2023).

Note 11: University of Queensland Library ‘ChatGPT and other generative AI tools – how to cite or acknowledge generative AI tools in your assignments and publications’ Available at: https://guides.library.uq.edu.au/referencing/chatgpt-and-generative-ai-tools (Accessed: 23 June 2023).