In online Problem-Based Learning (PBL), being able to provide immediate feedback to learners is invaluable, yet difficult to achieve. Using spaCy, we examine how well an off-the-shelf Natural Language Processing (NLP) framework can detect the completeness of free-form short user input. More precisely, we address two classification tasks: detecting 1) the absence of an identified responsible stakeholder (Who) and 2) the absence of a relevant necessary action (How) in ideas generated during security training.
To do this, we apply Part-of-Speech Tagging and Dependency Parsing to contextualised short written learner contributions, collected during the use of the CCO Toolkit, a publicly accessible experiential PBL environment. We test our heuristics on a dataset of 1174 contributions from 91 graduate students enrolled in courses at University College London and Ruhr Universität Bochum, working on three distinct problem scenarios within two different security domains.
We compare the results of the classification against a ground truth annotated by two security experts. Our results suggest that, for the purposes of providing feedback in free-input problem-solving exercises, generic transformer pipelines without fine-tuning can achieve good performance on the identification of a missing stakeholder and only moderately satisfactory performance on a missing relevant action.
The data was annotated for Who and How by two security experts, and differences in annotation were discussed between the experts until agreement was reached. According to the annotation, 17.3% of the dataset is missing Who and 74.9% is missing How. Additionally, the experts annotated ungrammatical (12.3%) and ambiguous (30.5%) contributions.
Full expert-annotated dataset. Playable at cco.works. Alternatively, consider watching the complete walkthrough.
| Scenario | Domain | Ideas |
|---|---|---|
| Meltdown | information security | 900 |
| Phone2U | information security | 141 |
| Moonshine | community safety | 133 |
Our classifiers implement the following grammatical rules:
| Task | Voice | PoS | Dependency |
|---|---|---|---|
| How | Active | VERB | ROOT |
| How | Passive | VERB | parent of agent |
| Who | Active | Not PRON | nsubj |
| Who | Passive | NOUN | child of agent |
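The rules in the table can be illustrated with a minimal sketch. This is not the authors' classification code: it assumes each token already carries a coarse PoS tag, a dependency label, and a head index (in spaCy these would correspond to `token.pos_`, `token.dep_`, and `token.head`), and it uses a hypothetical `Tok` record with hand-written parses so that it runs without a language model. Combining the active and passive rule of each task with a logical OR is also an assumption, since the table does not say how voice is determined.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Tok:
    """Hypothetical stand-in for a parsed token (not a spaCy class)."""
    text: str
    pos: str   # coarse PoS tag, e.g. "VERB", "NOUN", "PRON"
    dep: str   # dependency label, e.g. "ROOT", "nsubj", "agent"
    head: int  # index of the syntactic parent; the root points to itself


def has_how(tokens: List[Tok]) -> bool:
    """How: a VERB that is ROOT (active) or the parent of an agent (passive)."""
    for t in tokens:
        if t.pos == "VERB" and t.dep == "ROOT":
            return True
        if t.dep == "agent" and tokens[t.head].pos == "VERB":
            return True
    return False


def has_who(tokens: List[Tok]) -> bool:
    """Who: a non-pronoun nsubj (active) or a NOUN child of an agent (passive)."""
    for t in tokens:
        if t.dep == "nsubj" and t.pos != "PRON":
            return True
        if t.pos == "NOUN" and tokens[t.head].dep == "agent":
            return True
    return False


def missing_who(tokens: List[Tok]) -> bool:
    # Positive class: the stakeholder is absent.
    return not has_who(tokens)


def missing_how(tokens: List[Tok]) -> bool:
    # Positive class: the action is absent.
    return not has_how(tokens)


# Hand-annotated toy parse of the fragment "Better passwords".
fragment = [
    Tok("Better", "ADJ", "amod", 1),
    Tok("passwords", "NOUN", "ROOT", 1),
]
print(missing_who(fragment), missing_how(fragment))  # True True
```

With a real parser, the `Tok` list would be built from the parsed document instead of being written by hand; the rule functions themselves would stay the same.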
The classification code is available on GitHub. Note that, for the purposes of this research, a classification is positive when the response does not contain the identified vocabulary, that is, when the Who or the How is missing.
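Because the positive class here is "missing", it is easy to read evaluation figures the wrong way round. The sketch below shows how this convention plays out when comparing classifier output with expert labels. Precision and recall are assumed purely for illustration, as the metrics actually reported are not stated here, and the function name and the toy labels are hypothetical rather than taken from the dataset.

```python
from typing import List, Tuple


def precision_recall(predicted_missing: List[bool],
                     expert_missing: List[bool]) -> Tuple[float, float]:
    """Precision and recall where True means 'the element is missing'."""
    pairs = list(zip(predicted_missing, expert_missing))
    tp = sum(1 for p, e in pairs if p and e)        # correctly flagged as missing
    fp = sum(1 for p, e in pairs if p and not e)    # flagged, but experts found it present
    fn = sum(1 for p, e in pairs if not p and e)    # not flagged, but experts found it missing
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy labels for four contributions (illustrative only).
predicted = [True, True, False, False]
expert = [True, False, True, False]
print(precision_recall(predicted, expert))  # (0.5, 0.5)
```

Under this convention, a false positive means a learner would be told that a stakeholder or action is missing when the experts judged it to be present.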
This research would not have been possible without the continuous collaboration with and support of Bilyana Taneva-Popova. We truly regret that she is not part of the official author list.