Article updated Aug 2023.

Status update

Please note: Macquarie University has disabled the Turnitin AI detection feature as of Session 2, 2023. This article remains for historical information purposes only.

Historical information follows below.

On 5 April 2023, Turnitin launched a new reporting feature that highlights text suspected of being written by a generative artificial intelligence (Gen AI) tool such as ChatGPT.

The feature is only available to educators and administrators. Students are not able to view the AI writing report.

Where to find the AI writing report and what does it look like?

At Macquarie University the feature has been added to “Turnitin Feedback Studio with Originality” via the Similarity report toolbar, for both iLearn integrated assessments and direct submission assessments. It will not be available in iThenticate or Authorship (Originality).

Sections of text will be flagged (highlighted, as in the example below) in long-form student submissions where a Turnitin algorithm predicts a high probability that the section was written by a Gen AI tool.

[Image: Similarity report toolbar showing the AI writing button]
[Image: Turnitin AI writing detection screen (Source: Turnitin)]

Turnitin have provided a slide set with more detail on what the report looks like and its main features. See this video of a preview demonstration from Turnitin showing a development version of the detection tool; the production version is shown in the image above.

Detection is limited to long-form English text of the kind found in an essay or report. It will not check short snippets of text, tables, bullet points or lists. It will not work on computer programming code and will not produce useful results on writing such as poetry.

It is important to note that Turnitin have stated that the feature as launched on 5 April 2023 is in ‘preview’ status. A flagged section of text is not definitive in the same way as a text match. See ‘How does the new AI detection feature compare to text matching?’ below.

How does AI writing detection work?

An algorithm is used to estimate the probability that a section of text was written by a Gen AI tool. Turnitin have not disclosed exactly how this estimate is produced, but some hints are available in a set of AI writing FAQs provided by Turnitin.

Expand the box below to read about two possible methods of detecting AI writing, based on research conducted by groups such as OpenAI and Stanford University.

Explore possible methods to detect AI writing.

Where Gen AI tools such as ChatGPT produce original content, there is no existing database against which Turnitin could compare incoming samples of student work.

Two possible approaches to detecting unique text written by artificial intelligence are outlined below. These explanations have been simplified.

A pattern-matching approach based on the linguistic features of text. There are variations on how this can be done. One approach involves training a model on the features and patterns found in human writing and in samples of writing produced by Gen AI tools. These are compared and contrasted to build a statistical model that classifies writing by author type. Once the characteristics of each type of writing have been modelled, an incoming sample can be analysed for the patterns it contains, and that pattern compared to the model of the differences between Gen AI and human writing. The output is a probability score that the sample matches either set of patterns; a score above a certain threshold results in that section of text being highlighted as writing produced by a Gen AI tool.
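To make this concrete, here is a minimal sketch of a feature-based classifier in Python using scikit-learn. Everything in it – the training texts, the labels, and the choice of TF-IDF word patterns as features – is an illustrative assumption, not a description of Turnitin’s actual model or data.

```python
# Minimal sketch of a pattern-based AI writing classifier (illustrative only).
# A real detector would train on large corpora of human and Gen AI writing.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Placeholder training data: writing samples paired with author-type labels.
texts = [
    "I reckon the results were a bit all over the place, to be honest.",
    "Honestly, some parts of the method confused me at first.",
    "The findings demonstrate a statistically significant correlation.",
    "In conclusion, the aforementioned factors collectively underscore the outcome.",
]
labels = ["human", "human", "ai", "ai"]

# Learn word and word-pair patterns (linguistic features), then fit a
# statistical model that separates the two classes of writing.
model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    LogisticRegression(),
)
model.fit(texts, labels)

# Score an incoming sample: the output is a probability per class,
# not a match to any source document.
sample = ["The analysis reveals a robust and significant relationship."]
for label, p in zip(model.classes_, model.predict_proba(sample)[0]):
    print(f"P({label}) = {p:.2f}")

# A detector would only flag the sample if P(ai) exceeded a set threshold,
# analogous to the 98% certainty threshold Turnitin describes in its FAQs.
```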

OpenAI appear to have used a similar approach in their AI writing classifier, training on paired sets of human and AI writing. Their classifier had a 26% detection rate and a 9% false positive rate.

A similar, but more targeted, approach is to use previous samples of the student’s writing as a baseline against which to compare an incoming sample. This approach is used as part of the tool set available in Turnitin Authorship Investigate. However, this method relies on having a reliable baseline of material for each individual. Early testing of the Turnitin approach used in Authorship Investigate showed that markers were able to increase their detection rate of suspected contract cheating from 48% to 59% with minimal change in the false positive rate (though that testing was on a very small sample of work).
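As a toy illustration of the baseline idea (and emphatically not Authorship Investigate’s actual feature set), the sketch below builds a crude stylometric profile from a student’s earlier writing and measures how closely an incoming sample resembles it. All features and texts are invented for illustration.

```python
# Toy stylometric baseline comparison (illustrative assumptions throughout).
import numpy as np

def style_features(text: str) -> np.ndarray:
    """Crude style profile: mean word length, mean sentence length (in words),
    and commas per 100 words."""
    words = text.split()
    sentences = [s for s in text.replace("!", ".").replace("?", ".").split(".") if s.strip()]
    n_words = max(len(words), 1)
    return np.array([
        sum(len(w) for w in words) / n_words,   # mean word length
        n_words / max(len(sentences), 1),       # mean sentence length
        100 * text.count(",") / n_words,        # commas per 100 words
    ])

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Baseline averaged over the student's earlier submissions (placeholders).
baseline = np.mean([
    style_features("Short sentences. I like them. They read fast."),
    style_features("My essays are brief. I keep each point simple."),
], axis=0)

incoming = style_features(
    "Furthermore, the multifaceted ramifications of this phenomenon, "
    "considered holistically, necessitate a comprehensive re-evaluation."
)

# A low similarity would prompt a closer look by a marker, not a verdict.
print(f"Style similarity to baseline: {cosine(baseline, incoming):.2f}")
```

Real authorship tools use far richer feature sets, but the shape of the comparison – an author profile versus an incoming sample – is the same.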

The ‘Do I like it’ approach. A method used by Stanford researchers was to ask the Gen AI tool (specifically, a large language model, or LLM) to provide a probability score that it would itself have produced the writing. This avoids the need for large data sets of human and Gen AI writing. However, if it is unclear which LLM produced the writing, then many LLMs must be asked, because each model is different. Overall, the method produced a 95% detection rate.
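A simplified version of this scoring step can be sketched with the Hugging Face transformers library, using GPT-2 as a stand-in model. The published Stanford work adds more machinery on top of plain likelihood scoring, so treat this purely as an illustration of the core idea: text the model finds highly predictable scores higher.

```python
# Sketch of likelihood-based scoring: ask a language model how probable it
# finds a text. GPT-2 is a stand-in; a real detector would query the
# suspected source model(s).
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def avg_log_likelihood(text: str) -> float:
    """Average log-probability the model assigns to each token of `text`."""
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        # With labels=input_ids the model returns the mean cross-entropy
        # over tokens; negating it gives the mean log-likelihood.
        loss = model(**inputs, labels=inputs["input_ids"]).loss
    return -loss.item()

# A detector would compare scores against a threshold calibrated on
# known human and model-generated samples.
print(avg_log_likelihood("The quick brown fox jumps over the lazy dog."))
print(avg_log_likelihood("Zebra quantum marmalade vortex dances sideways."))
```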

Summary

The descriptions above are oversimplifications, and Turnitin may well be using one of these methods, a combination of them, or some other approach entirely. It is important to highlight that these methods produce a probability score, not a direct one-to-one match to a source.

Note that internet-connected Gen AI tools such as Microsoft’s Bing Chatbot and Google Bard integrate internet search results into their output. Where such a tool does not acknowledge the sources it drew on, its output may constitute plagiarism. A recent example is the case of Tom’s Hardware and Google Bard. In such cases, Turnitin’s regular text matching may detect the match.

It is also worth highlighting the arguments of AI researchers at the University of Maryland, who recently (17 March 2023) cast doubt on the efficacy of AI writing detectors that use signature (pattern matching) and watermarking techniques.

How reliable is Turnitin’s AI writing detector? 

Turnitin have claimed in a recent article that their detector achieved a 97% detection rate with a 1% false positive rate in the tests they conducted. Their FAQs explain that the detector will only highlight text as written by a Gen AI tool when it meets a 98% certainty threshold; as such, some Gen AI written text may be missed. A recent video by Turnitin outlined that they had tuned reporting to favour reliability, but that false positives were still possible. Remember that a 1% false positive rate still means 1 student in a class of 100, or 10 students in a class of 1,000, could have their work incorrectly flagged. Geoffrey A. Fowler, a technology journalist at The Washington Post, tested Turnitin’s preview detector on a small sample of 16 papers and found it wanting, with one false positive and mixed results on others.

As of 5 April 2023, Turnitin state in their FAQs that their detector covers text-focused Gen AI tools based on GPT-3, GPT-3.5 and derivatives such as ChatGPT. Eric Wang (Vice President, Artificial Intelligence, Turnitin) said in a webinar on 28 March 2023 [recording] that Turnitin aims to improve their detection capability following the release of GPT-4, but that the detector should still be capable of detecting outputs from the GPT-4 model (although a success rate was not mentioned).

Few details had been supplied as of the launch date regarding the nature or size of the sample used to develop their approach. Their FAQs do mention that they drew on their existing database of student writing and that writing from non-English speaking background students and minorities has been included in their development program.

At this point, caution is still warranted because the reliability claims have not been independently verified at scale. Turnitin will likely use the free preview period (5 April 2023 to January 2024) to further refine the detection approach.

It is also important to note that writing given a clean bill of health by the Turnitin AI writing detector is not definitive either. The detector can miss ChatGPT-generated work, mixed human and AI work, and work produced by other Gen AI tools such as Microsoft Bing Chat and Google Bard. Work passed through a paraphrasing tool such as QuillBot may also go undetected. This is true of many approaches to detecting AI writing that are trained to recognise the text generated by particular Gen AI tools. At the end of the day it is an arms race.

For a brief comparison with other AI writing detectors, see the discussion in the ‘How does AI writing detection work?’ expansion box above.

Can the AI writing detection be configured or disabled?

[Note as of Session 2 2023 Macquarie has disabled the AI detection feature in Turnitin]

No. Turnitin has advised MQ that the new feature is standard* and there are currently no settings to change.

Turnitin sent an email to clients on 22 March 2023 stating that the AI detection feature is in ‘preview’ status; as such, features may change in the future. Turnitin said the preview version would be free until January 2024, after which only certain products would be covered. MQ holds a site licence for Turnitin Feedback Studio with Originality, which is one of the covered products.

It is unclear if Turnitin will allow configuration of the feature in the future. At this point in time, it is a matter of ‘watch this space’. 

*Update: Turnitin subsequently advised that it was possible for an institution to request the feature be disabled but that it could not be re-enabled.


How does the new AI detection feature compare to text matching?

A predictive highlighting approach departs from the ‘text matching’ behaviour we have become accustomed to in Turnitin. The AI writing detection feature is akin to a prediction or ‘educated guess’ based on linguistic or stylistic features, so no ‘source’ is provided along with the highlighted text. This contrasts with Turnitin’s text matching, which ties a match definitively to a source (e.g. a sentence in the student’s assignment is matched to a sentence found in a journal article because the string of words is identical).
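A toy sketch of the difference: text matching can point at a definitive source string, whereas AI detection can only report a probability. The ‘source database’ and texts below are invented for illustration.

```python
# Toy contrast between text matching and AI detection (illustrative only).
source_db = {
    "Smith (2021), Journal of Examples": "the mitochondria is the powerhouse of the cell",
}

submission = "As we know, the mitochondria is the powerhouse of the cell."

# Text matching: an identical string of words points directly at a source.
for source, passage in source_db.items():
    if passage in submission.lower():
        print(f"Match: '{passage}' -> {source}")

# AI detection has no source to point at; at best it reports something like
# P(ai-written) = 0.87 from a statistical model (see the sketches above).
```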

What can educators do with flagged text? 

First, remember that text flagged as being written by a Gen AI tool is not definitive evidence that it was written by a Gen AI tool – it might have been, or it might not.

Caution is warranted. 

“Educators should be very mindful that what Turnitin is offering is not proof. We strongly encourage you to have a conversation with your students and make your own assessment of your student’s understanding of their work.” 

Kane Murdoch (Manager, Complaints, Appeals and Misconduct at Macquarie University).

As such, a highlighted section of text should be treated as a prompt to consider the work more carefully in the context of what educators know about that student and the specifics of the task. Remember that flagged text is not, in and of itself, evidence that a breach of integrity has occurred. Before reporting a suspected academic integrity breach, consider: a) what students were told about whether and how AI could be used for the task, and b) other possible signs of AI-generated writing.

Please follow the advice presented in the ‘AI of AI’ Teche post on articulating a rationale based on the context and sources of evidence before submitting a report.

Where can I find out more about AI writing detection in Turnitin? 

[Note as of Session 2 2023 Macquarie has disabled the AI detection feature in Turnitin]

The iLearn Quick guide for Turnitin Feedback Studio has been updated with links to resources provided by Turnitin.

Turnitin also have a number of resources related to AI writing on their website.

Share your experience

We welcome your thoughts in the comments below with respect to AI writing detection in Turnitin. You can also contribute your ideas by emailing professional.learning@mq.edu.au.

See also – other posts in the series on Generative Artificial Intelligence.

Acknowledgements: Banner image: Stable Diffusion “view through magnifying glass to look at tiny robots. plain background.” (24 March 2023). Further edited by M. Hillier. CC0 1.0 Universal Public Domain.

Posted by Mathew Hillier

Mathew has been engaged by Macquarie University as an e-Assessment Academic in Residence and is available to answer questions from MQ staff. Mathew specialises in digital assessment (e-assessment) in higher education. He has held positions as an advisor and academic developer at the University of New South Wales, University of Queensland, Monash University and University of Adelaide. He has also held academic teaching roles in areas such as business information systems, multimedia arts and engineering project management. Mathew recently led a half-million-dollar federal government funded grant on e-exams across ten university partners and is co-chair of the international 'Transforming Assessment' webinar series, the e-assessment special interest group of the Australasian Society for Computers in Learning in Tertiary Education. He is also an honorary academic at the University of Canberra.
