On March 11th, 2025, RM's Gwyneth Toolan hosted an enlightening webinar titled 'Educational Assessment in a Changing World'. This event was a deep dive into the evolving landscape of educational assessment, sparked by the launch of Isabel Nisbet and Stuart Shaw's latest book.
Meet the speakers
Isabel Nisbet, Author
Stuart Shaw, Author
Gwyneth Toolan, RM Host
The webinar was a rich discussion, covering the authors' approach, lessons learned, and counter-arguments, followed by an engaging Q&A session.
Key themes explored
The tension between traditional standardised testing and more holistic methods was a central theme. The discussions revolved around four key lessons and counter-arguments to current assessment practices, prompting thought about a balanced path forward that maintains fairness while acknowledging individual contexts.
Highlights from the session include:
- Why context always matters in educational assessment
- The challenge of measuring modern skills, like collaboration and emotional intelligence
- Breaking down false dichotomies (knowledge vs skills, formative vs summative)
- Public confidence in differing assessment methods
Why this matters now
Stuart Shaw highlighted how Massachusetts is grappling with fundamental questions about what credentials should signify and how to measure student achievement fairly. This is a crucial conversation for educators, assessment specialists, policymakers, and concerned parents alike, offering valuable insights into the moral and practical dimensions of educational assessment in the 21st century.
Your questions from the session answered
Question:
You said formative assessment can have a big impact on the student's self-belief and future choices. What do you think needs to change in the learning journey? Should formative be more about feedback and less about "marks"?
Feedback and assessment, however informally delivered, can have life-changing effects on recipients. Anecdotally, examples include “guidance” that particular areas of study or avenues for further education are not appropriate for particular students. Musicians cite examples of young children being (wrongly) told that they “can’t sing”, and children keen on sport being put off physical exercise for life by negative feedback from gym teachers. Sometimes words can be as high stakes as “marks”. And tests intended to inform future teaching and learning may be interpreted by learners simply as rehearsals for final exams with more overtly summative purposes. It could be argued that any assessment (formative or summative) that influences a student’s learning or influences assessment practice in a school, college or university is ‘high-stakes’ (Popham, 2010).
Popham, W. J. (2010). Everything School Leaders Need to Know About Assessment. SAGE.
Question:
Thank you for painting a most vivid ‘big picture’.
Much emphasis was rightly placed on the importance of fairness, so may I ask a question, please, about the fairness of the grades currently awarded for GCSE, AS and A level exams in England?
According to OCR’s recent report “Striking the Balance”*, these grades are reliable** only to one grade either way at best. Since life-determining decisions can be taken on the basis of the (single) grade shown on a candidate’s certificate, are grades “reliable to one grade either way” reliable enough? And if not, what should be done now?
We discussed this briefly at the session. It would be a false quest to search for “absolute” reliability in assessments in areas where the marking inherently involves some judgement. A better course is to educate users of assessment outcomes about how to interpret them. Statisticians can tell us interesting things about explaining confidence limits in statistical measures (a brief illustrative sketch follows this answer).
It is worth noting that “absolute fairness to every examinee is impossible to attain, if for no other reasons than the facts that tests have imperfect reliability and that validity in any particular context is a matter of degree. But neither is any alternative selection or evaluation mechanism perfectly fair.” Standards for Educational and Psychological Testing (AERA, NCME, APA), 1999, p.73.
American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (1999). Standards for Educational and Psychological Testing. Washington, DC: American Educational Research Association.
* Note 18, page 20, https://www.ocr.org.uk/Images/717919-striking-the-balance.pdf?hsCtaAttrib=177138440350
** For clarity, “reliability” here refers to the probability that different examiners will award the same grade to the same script.
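As one illustration of what the statisticians can tell us, classical test theory converts a reliability coefficient into a standard error of measurement (SEM), which in turn gives confidence limits around an observed score. The figures in this sketch are assumptions chosen for demonstration, not values from OCR’s report.

```python
import math

# Assumed figures for illustration only.
sd_of_scores = 10.0   # spread of scores on the test
reliability = 0.90    # reliability coefficient

# Classical test theory: SEM = SD * sqrt(1 - reliability).
sem = sd_of_scores * math.sqrt(1.0 - reliability)   # about 3.16 marks

observed_score = 62.0
low, high = observed_score - 1.96 * sem, observed_score + 1.96 * sem
print(f"95% confidence limits: {low:.1f} to {high:.1f}")  # roughly 55.8 to 68.2
```

Read this way, a mark near a grade boundary can plausibly fall either side of it, which helps explain why grades are described as reliable only to one grade either way.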
Question:
One of your questions for further consideration was around whether it is REALLY possible to move to a more individualised form of assessment.
I suspect that at least part of the challenge here will be one of scale. Standardisation is usually borne of necessity, particularly when resources are limited.
Is there an element of 'scale' that you've considered in terms of how the means of assessment may be constrained?
There certainly are resource constraints on what is possible for assessments taken by large numbers. There are also limits to how results can be classified and described so that users of the outcomes can understand them. However, there are good examples of computer-supported adaptive tests that do allow some individual routes (within limits); a minimal sketch of how such routing can work follows this answer. There is an up-front cost in designing and producing these tests, but once they are in operation, offering different individual routes and producing individualised reports does not cost more.
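To make the idea of “individual routes (within limits)” concrete, here is a minimal, illustrative sketch of adaptive item selection under a Rasch model. The item bank, step size and update rule are assumptions chosen for demonstration, not a description of any system discussed in the webinar.

```python
import math
import random

# Illustrative item bank: names and difficulties are assumed values.
ITEM_BANK = {f"item{i}": d for i, d in enumerate([-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0])}

def p_correct(ability: float, difficulty: float) -> float:
    """Rasch model: probability that a candidate of this ability answers correctly."""
    return 1.0 / (1.0 + math.exp(difficulty - ability))

def next_item(ability: float, asked: set) -> str:
    """Route the candidate to the unused item closest to the current ability estimate."""
    candidates = {name: d for name, d in ITEM_BANK.items() if name not in asked}
    return min(candidates, key=lambda name: abs(candidates[name] - ability))

def run_test(true_ability: float, n_items: int = 5, step: float = 0.6) -> float:
    """Simulate one candidate's individual route through the bank."""
    ability, asked = 0.0, set()
    for _ in range(n_items):
        item = next_item(ability, asked)
        asked.add(item)
        correct = random.random() < p_correct(true_ability, ITEM_BANK[item])
        # Simple stepwise update: nudge the estimate up after a correct answer, down otherwise.
        ability += step if correct else -step
    return ability

print(run_test(true_ability=0.8))
```

Two candidates with different response patterns follow different routes through the same bank, which is the sense in which such a test is individualised while remaining comparable.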
Question:
Are there any conclusions or any evidence to be drawn from the local marking of GCSEs and A levels during lockdown? Did this show any evidence of synthesis?
Certainly, the teachers awarding the marks locally would know about the contexts in which the tests were taken (for example, the hardship some of the pupils were suffering at home) in a way that external markers would not. One of the problems was the absence of guidance about how/whether they should take such factors into account, or whether they were supposed to be estimating what the outcome of an external (context-free) assessment would be.
Question:
What was your working definition for ‘feedback’ in your research? And hence for ‘individualised feedback’?
Our comments here about feedback were generalised, and we had in mind a range of responses, from informal oral comments to written marking of coursework or tests. There is a lot of interesting work being done about this in higher education.
Question:
I was particularly pleased to hear your thoughts about formative and summative assessment. I completely agree. Both are extremely valuable, and they are not necessarily distinct. If, for example, we can use improvements in data science to certify a detailed test-taker profile, is that summative or formative? Presumably both.
Agreed – the same information can be used for formative and summative purposes.
Question:
Thank you for your answers to this question. What are the implications as regards decisions about which students are, and are not, within “The Forgotten Third”?
Very interesting question. In the book we brought together outcomes of international measures and tests of educational attainment and participation, and discussed their implications. In summary, there is a long way to go in addressing the educational needs of whole populations, and we need assessment tools to support that enterprise. Perhaps removing assessment-based barriers (e.g. a prescribed grade in a standardised maths test) to educational progression and participation would help?
Continuing the Q&A
Although the live webinar has concluded, you can still watch the recording and ask any questions. Our speakers will be happy to respond.
Feel free to reach out to Stuart and Isabel directly:
nisbet.isabel@gmail.com
shawstuartd@gmail.com
Stay tuned for more thought-provoking discussions and insights from RM!
Would you like to discuss how RM assessment can help you to adopt digital assessment? Get in touch: