AI-drawn fish and chips

Using GPT-4 to grade & linguistically analyse text

In my final experiment of the day with GPT-4, I am using the earlier AI-generated blog post, Debunking the myth of Learning Styles, as the base. Peter Hartley spurred this post, noting there was more fun yet to be had with GPT-4 and this base material. He was right (and I promise the image at the top of this article will make sense by the end!).

First up – let’s see how the AI grades itself.

Using GPT-4 to grade writing produced by GPT-4

As I still have the entire conversation detailed earlier in AI-led blogging: An experiment with GPT-4, DALL-E & Grammarly, I was able to prompt the grading directly. I asked:

Grade your earlier message that is Titled: Debunking the Learning Styles Myth: A Call for Evidence-Based Educational Practices

You are a Lecturer working at a prestigious university in the United Kingdom. You are to grade this work on a number scale from 0 to 100. A score below 40 represents a fail. You will need to provide detailed feedback to justify your grade, and you should identify areas of improvement to help your student take their work further next time.

Prompt

And the grade it gave itself?

Title: Debunking the Learning Styles Myth: A Call for Evidence-Based Educational Practices

Grade: 85/100

Feedback:

Overall, this article is well-written, presenting a coherent and compelling argument against the concept of learning styles. The language is formal and appropriate for a specialist audience, and the structure is clear and organized. The article also demonstrates a good understanding of the relevant literature.

Areas of improvement:

  1. Depth of analysis: While the article covers the main arguments against learning styles, it would benefit from a more in-depth analysis of specific studies that debunk the concept, providing a more comprehensive critique.
  2. Counterarguments: Addressing potential counterarguments and providing refutations can strengthen your argument. Consider discussing the reasons behind the popularity of learning styles and possible alternative explanations for the observed effects.
  3. Practical implications: It would be beneficial to discuss specific examples of how evidence-based practices can be implemented in educational settings, giving readers a clearer understanding of the alternatives to learning styles.
  4. Source integration: While the article cites relevant sources, it could benefit from a more seamless integration of these sources into the text. Using direct quotes or paraphrasing specific ideas from the sources can help build a more robust argument.
  5. Originality: Although the article presents a well-established critique of learning styles, try to incorporate more original ideas or unique perspectives to make the argument more engaging and thought-provoking for your audience.

In summary, this article demonstrates strong writing skills, a clear structure, and an understanding of the relevant literature. To improve the article, focus on deepening the analysis, addressing counterarguments, discussing practical implications, integrating sources more effectively, and incorporating original ideas. Keep up the good work and continue to develop your critical thinking and writing skills.

GPT-4

Evaluating the feedback

Not a bad evaluation. In reality, an actual university assignment presenting only one view probably wouldn’t score this well (unless that was the task). It is right that counterarguments are needed.

It is also interesting to see it suggest the use of direct quotes – something we’d generally ask students to avoid in order to demonstrate their own understanding.

The nods towards originality and depth of analysis very much capture the goals required for higher-level grades.


A robot writing at a desk

Can a blog post written by an AI trick an AI?

In my latest experiment with the GPT-3 text-davinci-003 model, I asked it to write a blog on AI. I then wanted to run that content through some of the tools designed to detect AI-generated content.

Step 1: Adjusting the AI model’s settings

  • Temperature: 1 (This controls randomness. The closer the number is to 0, the more deterministic and repetitive the content will be. I wanted this as far away from 0 as it would accept.)
  • Maximum length: 1000 tokens (I want to give it the space to generate a longer article)
  • Frequency penalty: 0.5 (Decreases likelihood of a repeated line)
  • Presence penalty: 0.5 (Increases likelihood of talking about new topics)
  • Best of: 3 (This will run the completion three times server side and display the best)
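For illustration, here is a minimal sketch of how the settings above might be passed to the completions endpoint using the pre-1.0 openai Python library. I adjusted these values in the model's settings interface rather than in code, so treat this as an assumed equivalent; the API key handling and the truncated prompt are placeholders.

```python
# A rough sketch (not my exact workflow) of the settings above, expressed as a
# call to the legacy OpenAI completions endpoint (pre-1.0 openai library).
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder: supply your own key/config

response = openai.Completion.create(
    model="text-davinci-003",
    prompt="Write a high quality, but witty and amusing blog post ...",  # truncated placeholder
    temperature=1,          # as far from deterministic as it would accept
    max_tokens=1000,        # room for a longer article
    frequency_penalty=0.5,  # decrease the likelihood of repeated lines
    presence_penalty=0.5,   # increase the likelihood of new topics
    best_of=3,              # run three completions server-side, keep the best
)

print(response["choices"][0]["text"])
```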

Step 2: Prompts for the AI

To produce the blog article, I sent the following prompts:

Write a high quality, but witty and amusing blog post that explains the benefits of artificial intelligence in the context of university-level education
The above copy is great. But nowhere near long enough. Can you rewrite the piece to be longer, and include some Harvard style references?
That's almost perfect. Can you make it even longer and funnier - in a dry humor kind of way.
Now give this blog post a clickbait title
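As text-davinci-003 is a completion model rather than a chat model, each follow-up prompt builds on the text already produced. Purely as a hedged sketch (an assumed programmatic equivalent of the prompts above, not my actual process), the chaining might look like this:

```python
# Illustrative only: feed each previous output back in with the next instruction
# so the completion model can "rewrite" and extend its own earlier draft.
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder

prompts = [
    "Write a high quality, but witty and amusing blog post that explains the "
    "benefits of artificial intelligence in the context of university-level education",
    "The above copy is great. But nowhere near long enough. Can you rewrite the "
    "piece to be longer, and include some Harvard style references?",
    "That's almost perfect. Can you make it even longer and funnier - in a dry "
    "humor kind of way.",
    "Now give this blog post a clickbait title",
]

draft = ""
for instruction in prompts:
    response = openai.Completion.create(
        model="text-davinci-003",
        prompt=f"{draft}\n\n{instruction}".strip(),  # carry the current draft forward
        temperature=1,                               # same settings as in Step 1
        max_tokens=1000,
        frequency_penalty=0.5,
        presence_penalty=0.5,
        best_of=3,
    )
    draft = response["choices"][0]["text"].strip()

print(draft)
```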

I then used Grammarly to tidy up the grammar (just a few commas!).

Step 3: The final result – AI output:

Discover the Surprising Benefits of AI in University-Level Education!

University-level education is becoming increasingly advanced, in part thanks to the use of artificial intelligence (AI) in a variety of capacities. AI has revolutionized how we approach teaching, assessment and student support services, making learning more personalized and efficient than ever before. But what really are the benefits of artificial intelligence in the university context? Well, let me explain…

To begin with, AI technology can be used to assess student work quickly and accurately by offering automatic marking tools that give instant feedback on assignments (Lee & Voyer, 2019). This means students don’t have to wait for days or weeks for their grades – they can thank AI for avoiding those dreaded Sunday night existential crises! Additionally, AI software can scan through large volumes of material quickly, such as research papers or books, which helps professors save time when preparing lectures and course materials (Christie et al., 2019). This means their cake recipe development endeavours become even more prosperous – it’s a win-win!

Furthermore, AIing also helps personalize learning by recognizing individual learner educational needs and providing tailored content accordingly (Chen et al., 2018). For instance, it can detect a student’s point of difficulty in a particular subject matter based on their previous habits or searches online. It can then offer up educational materials at those points which are best suited to that person’s level of understanding and help them overcome any difficulty they may be having with the material. The best part? You don’t need to worry about remembering anything – so no more cramming before your exams whilst eating questionable microwaveable meals!

Finally, Artificial Intelligence (AI) technology offers up endless opportunities for teaching virtual classes where all students learn remotely from the comfort of their own homes while still benefiting from real-time interaction with their professor/tutor (Ahmed et al., 2017). With AI systems such as CourseSmart assisting instructors by preparing session material remotely before each lesson, no students will ever miss out again – not even if you forget your laptop charger because who could possibly forget that!?

All in all, these benefits make it clear why so many universities today are happily embracing AI technologies for university-level education purposes! By lowering costs associated with traditional methods as well as offering new stimulating learning environments; Artificial Intelligence presents an exciting frontier awaiting exploration. So why not get started today? Who knows what kind of convenient new features you can look forward to introducing into your life!?

References:

Ahmed I., Ullah K. A., Khan S., & Abbas M. (2017). Exploring the Potential Effects of Virtual Learning Environment among Undergraduate Learners: A Systematic Literature Review Using Thematic Analysis Approach. Computers & Education Journal. 112:204-222.

Chen C.-S., Chen T.-H., Chiu H.-Y., & Tsai C.-C.(2018). Cognitive Apprenticeship System Based on PrePA—Personalized Portfolio Assessment System for Learner Self-Development Environments Cloud Computing. International Journal Of Engineering And Technology Innovation 8(1):84–94.

Christie T., Wright C., Matthaei J ., Mureithi P .& Hobbs V .(2019). OERHub Final Report – Compendium Of Tools And Resources For Open Educational Resources Practitioner Support In Higher Education Institutions Around The World: Progress Report, Version 1 Harvard Graduate School Of Education

Lee S.-K& Voyer v.(2019). Soft Computing In Education: New Trends And Approaches: Learning Through Technology For All | Qsearch Ltd

NOTE: All of those references are garbage. Those sources don’t exist – though they look convincing…


Testing the above article: Was it written by AI?

Well – we know the answer. It was, indeed, written by an AI. However, I wanted to run that text through some of the popular tools being used to detect AI-generated content to see how they did.

AI Text Classifier

AI Text Classifier is made by OpenAI, the same company that makes the GPT-3 text-davinci-003 model I used above. In OpenAI’s defence, they make it clear the tool is limited, and they urge caution over its use in educational settings.

I can see why. Its verdict for the above blog article was: “The classifier considers the text to be very unlikely AI-generated.”

GPTZero

GPTZero works by measuring the text’s perplexity and randomness. In this case, it did a better job than AI Text Classifier, though it still failed to detect the whole piece as written by AI: it flagged only four sentences at the top as AI-generated and nothing in the rest of the text.

Your text may include parts written by AI.
Screenshot from GPTZero
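GPTZero’s internals aren’t public, so purely to illustrate the idea of perplexity: a rough sketch of scoring a text with a small language model (GPT-2) via the Hugging Face transformers library might look like the following. Lower perplexity means the text is more predictable to the model, which is one signal of AI authorship.

```python
# Illustrative sketch of perplexity scoring - not GPTZero's actual method.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def perplexity(text: str) -> float:
    # Ask the model to predict each token from its context; the exponentiated
    # average loss is the perplexity (lower = more predictable text).
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        out = model(enc.input_ids, labels=enc.input_ids)
    return torch.exp(out.loss).item()

print(perplexity("University-level education is becoming increasingly advanced."))
```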

GPT-2 Output Detector Demo

In fairness, this was designed for GPT-2, and the model used for the above article is substantially more advanced. Unsurprisingly, GPT-2 Output Detector scored the piece as real:

Prediction based on the first 510 tokens among the total 729. Scored 99.98% real.
Screenshot from GPT-2 Output Detector

Writer

Next, I tried Writer AI Content Detector. This tool is designed to identify AI text so that authors can tweak their content to avoid detection. It isn’t aimed at unfair means; rather, it helps authors avoid search engines penalising their page rankings, as content produced entirely by an AI can stop a website from reaching the top spots in Google searches. Writer AI Content Detector is limited to 1,500 characters, so I had to split the article into two. Both halves were scored 100% human-generated content…

100% Human-generated content. Fantastic!
Screenshot from AI Content Detector

Giant Language model Test Room (GLTR)

GLTR (glitter) “enables forensic inspection of the visual footprint of a language model on input text to detect whether a text could be real or fake”. It is built through a collaboration between Harvard NLP and the MIT-IBM Watson AI Lab. Like the GPT-2 Output Detector Demo, it was designed for GPT-2. It analyses how likely each word is to be predicted given the context before it. It is pretty cool, as you can see the word-by-word predictions for what should come next:

Discover the surprising benefits of AI in - the predictions of what should follow this are: Your, the, a, your, Business
Screenshot from GLTR

Words highlighted in green are among the model’s top 10 most likely predictions, yellow words are in the top 100, red words in the top 1,000, and violet words fall outside even that. In essence, while green should be the most common colour in both AI- and human-written pieces, a human-written text should have a proportionally higher number of yellow/red/violet words, as we are more random.
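To make those colour bands concrete, here is an illustrative sketch (not GLTR’s actual code) that ranks each token by how highly GPT-2 predicted it from the preceding context and assigns the corresponding bucket:

```python
# Illustration of the GLTR idea: where does each actual token rank in the
# model's predictions, and which colour bucket does that rank fall into?
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def token_ranks(text: str):
    ids = tokenizer(text, return_tensors="pt").input_ids[0]
    with torch.no_grad():
        logits = model(ids.unsqueeze(0)).logits[0]
    ranks = []
    for i in range(1, len(ids)):
        # rank of the actual token among predictions made from the prior context
        order = torch.argsort(logits[i - 1], descending=True)
        rank = (order == ids[i]).nonzero().item() + 1
        ranks.append((tokenizer.decode(int(ids[i])), rank))
    return ranks

for word, rank in token_ranks("Discover the surprising benefits of AI in education"):
    bucket = "green" if rank <= 10 else "yellow" if rank <= 100 else "red" if rank <= 1000 else "violet"
    print(f"{word!r}: rank {rank} ({bucket})")
```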

In this case, I was really shocked by the output. In my previous tests, I’d always seen a high proportion of green in AI-generated content. This time with the above blog post, I think it is fair to say there is a broader use of yellow/red/violet. To better explain the significance of this, I compared the above AI-generated content to my last blog post. You’ll see an almost identical spread of green/yellow/red/violet – though perhaps my content does have slightly more of the last two.

AI-written content (above blog post)
Human-written content (my last blog post)

I hadn’t expected that. In this case, I think GPT-3 text-davinci-003 and the above prompts produced a decent output – one that the detectors above all failed to identify as AI-generated.

Human detection

I think it’s important to consider human detection. As we can see above, all of the tools failed to identify AI-generated content. In fairness, this is a new field – and just like the AI tools, these will develop too. In this case, human detection certainly wins. As you can see from the article, it has made up a load of references. They look convincing – and those journals, volumes and issues exist. The articles, however, do not.

Aside from the obvious errors in the AI-generated content, I question if anyone would write something so overwhelmingly positive in an academic context. Even a positive argument acknowledges there are alternative positions out there that should be refuted and rebutted. Part of this was down to my prompt, perhaps – but it is still something important the model missed in its generation.

I do think it did the funny, witty part well. The line “This means their cake recipe development endeavours become even more prosperous – it’s a win-win!” is one I would have thought highly unlikely to come from an AI like the model above.

Conclusion: AI-generators can trick AI-detectors

As you can see, there is still a long way to go in developing GPT-3 detection – which is perhaps concerning given that GPT-4 is on the way. I don’t think we can rely on tools to automate this process for us, and as you can see above, the human detectability of my output depended heavily on my prompts. This will become more complicated as other AI tools emerge, requiring text to be checked against each of them. Until there is a paradigm shift in this technology, I think the answer to the above question is YES! An AI generation tool can certainly trick an AI detection tool.

There has never been a more important time to ensure AI literacy is a core aspect of the curriculum at every level of education.

Using Artificial Intelligence to summarise emails: A use case for OpenAI’s text-davinci-003

There have been some interesting discussions around the use of Artificial Intelligence (AI) – and the fear of how it will impact assessment in HE. I will reflect on this myself in time, but for now, I want to muse on a possible use case. I’ve been following an email thread on AI in higher education via a publicly accessible JiscMail list. One participant had acknowledged how sprawling the debate had become – and asked if ChatGPT could lend a hand. I decided to give this a go using OpenAI‘s text-davinci-003 model (and it should be noted that this post’s feature image was drawn by OpenAI’s DALL·E 2)!

First, I downloaded the text content of 30 publicly accessible emails on this topic of ChatGPT. Secondly, I removed all personally identifiable information (i.e. names, emails, links, institutions) from the text. This was an ethical consideration: while the email text is in the public domain via jiscmail.ac.uk, the tool I used is allowed to retain text to train the algorithm further. Thirdly, I split the text into two separate queries to fit within the 4,000-token limit of the text-davinci-003 model. Finally, I took the two text-davinci-003 outputs and asked text-davinci-003 to summarise the discussion from each text sample and re-write it for a knowledgeable audience.
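For anyone curious, a minimal sketch of that chunk-and-summarise step might look like the following, using the pre-1.0 openai Python library. The file name, prompt wording and parameter values here are assumptions rather than my exact setup.

```python
# A hedged sketch of splitting the anonymised email text into chunks that fit
# text-davinci-003's context window and summarising each chunk.
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder

def summarise(chunk: str) -> str:
    prompt = (
        "Summarise the following email discussion for a knowledgeable audience:\n\n"
        + chunk
    )
    response = openai.Completion.create(
        model="text-davinci-003",
        prompt=prompt,
        max_tokens=500,    # leave room within the ~4,000-token limit
        temperature=0.3,   # keep the summary close to the source material
    )
    return response["choices"][0]["text"].strip()

# emails.txt is assumed to hold the anonymised text of the 30 public emails
with open("emails.txt") as f:
    text = f.read()

# Crude split into two halves so each query stays within the token limit
midpoint = len(text) // 2
summaries = [summarise(text[:midpoint]), summarise(text[midpoint:])]
print("\n\n".join(summaries))
```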

Artificial Intelligence summary of the discussion:

This email thread is discussing a variety of topics related to ChatGPT, a new technology, in higher education assessment. Ideas have been suggested to mitigate any short-term disruption to assessment this semester, such as informing teachers of the technology, revising current assessment regulations, teaching students about academic misconduct, incorporating the model into assessment design, encouraging the greater use of oral-based assessment, group project work, and classroom-based assessment, using browser lockdown capabilities, testing the model against a variety of tasks, and promoting assessment (re)design that requires the creation of multi-media artefacts. There has also been discussion about curating resources on the topic, as well as different types of assessment that could be used, such as handwritten assignments and orals, and the ethics of the technology, with suggestions to use Turnitin to cross-reference essay submissions and to teach students to question the production and consumption of their cultural landscape. Additionally, it has been suggested to have an event which focuses on the implications of GPT for educational development and developers, as well as a SIG on AI and Academic Development. It was also noted that OpenAI (the company behind ChatGPT) is likely to become an organisation within the Microsoft ‘empire’, and that there is software that can detect ChatGPT-generated text.

text-davinci-003 summary of academic development discussion

Evaluating text-davinci-003’s output

… and do you know what? It isn’t a bad summary. It has missed some of the nuance and detail – and, obviously, the weight of some arguments over others. Some of the more powerful arguments (that were buried in the thread) were lost. For me, the summary was much poorer for excluding some of the deeper ethical concerns, for example, OpenAI Used Kenyan Workers on Less Than $2 Per Hour.

Yet, for someone who is ‘time-poor’, such a summary presents an exciting way to engage with significant chunks of information in little time. This summary took about five minutes – including both my time editing the text and that of text-davinci-003. It cost around 20 US cents ($0.20). The original text was estimated to take 18 minutes to read – the refined output took 60 seconds. Paying 20 cents to save 17 minutes of reading… not a bad cost for some.

Caution is, however, needed. When I asked text-davinci-003 to summarise the above for a 3rd grader – it was quite off the mark!

ChatGPT is a new technology that can help teachers assess students. It can help teachers check if students are cheating and can help them create different types of assessments. It can also help teachers create projects and activities that involve creating multi-media artefacts. There are also some ethical considerations to think about when using this technology.

text-davinci-003 3rd grader summary of academic development discussion

At this point, it has over-simplified the discussion to the point it is incorrect. At no point was ChatGPT pitched as something to help assess students or help create multi-media artefacts.

It was quite right on the ethical mark, though. I took great care to only use public-domain information with no personal information in this experiment. But even then – I imagine some people would feel uncomfortable about this.

Artificial Intelligence summaries as an automated process

This was a manual process – but I wonder how far off an automated future could be. If you use Microsoft 365/Outlook for email, you may have received Microsoft Viva briefings. Microsoft Viva emails bring together data from across your Microsoft 365 account (including your email) to offer connections and insights. This all uses a form of Artificial Intelligence to provide things like helpful reminders on outstanding actions, advice on how to manage your time in the context of your diary, and other information it feels might be helpful. Given Microsoft is rumoured to want OpenAI’s technology further integrated into Windows and Microsoft 365, perhaps email summaries (on this scale) would be a natural extension of Microsoft Viva.

What are your thoughts? Let me know in the comments.