Giving Feedback to Students: Instructor vs. Machine

“edX, a nonprofit enterprise founded by Harvard and the Massachusetts Institute of Technology, will release automated software that uses artificial intelligence to grade student essays and short written answers.” John Markoff, New York Times

There has been much discussion this week among educators about the idea of robo-grading, or machine grading, prompted by the New York Times article Essay-Grading Software Offers Professors a Break, from which the quote above is excerpted. To date, over 1,000 comments have been posted to the article, most vehemently opposing the idea of automated grading. Quite by coincidence, I posted an article on this blog, Four Reasons Why Students Need Instructor Feedback in Online Courses, that emphasizes the value of instructor feedback specifically in online courses, and I stressed why MOOCs won’t cut it.

My argument is that undergraduate students need constructive and specific feedback to develop their writing and critical thinking skills, and a massive course such as a MOOC cannot provide it. My view contrasts starkly with that of the president of edX, Dr. Agarwal. Agarwal is convinced that students can learn and develop writing skills in a MOOC setting with feedback via automated grading. It’s the immediate feedback that is useful, states Agarwal, because students are able to “take tests and write essays over and over and improve the quality of their answers” (Markoff, 2013). Hmmm. While I do agree that immediate feedback supports the conditions required for learning, I don’t see students being motivated to rewrite an essay again and again.

How Does Automated Grading Affect Student Motivation?

In response to the NYT article, Elijah Mayfield, founder of LightSIDE Labs, which developed a computer program that uses “machine learning to automatically assess written text”, wrote a post for e-Literate discounting the claims outlined in the article. His post generated over 50 comments, mostly from university professors opposing the robo-grader concept. I have minimal experience with machine grading, and my comments on Mayfield’s post took a different (perhaps less informed) approach, focusing more on the conditions of learning. My concerns center on students’ perception of automated grading and their willingness to consider it valuable, and on its effect on student motivation and thus on potential learning. Two of my recent posts, here and here, reference research studies that support explanatory and constructive feedback from instructors.

Below is the comment I posted in response to Mayfield’s post Six Ways the edX Announcement Gets Automated Essay Grading Wrong on e-Literate.

Thank you Elijah for this in-depth post. Questions I have: how do students perceive machine grading? And how much research has been done on the impact on learning performance and motivation?

I wonder what the implications are (or will be) for students’ motivation, and for the quality of their effort and work? Students spend time on writing essays, some more than others, yet knowing that a real person will not be reading their essay could impact many processes. My teenagers have been exposed to automated grading periodically at their high school and they both strongly dislike it (despise it is a more fitting term). They discount its value completely. I predict that teenagers and young college students will not be receptive to this type of grading. Why should they spend hours researching, writing and re-writing an essay when they know no one (a real person) will even read it? Even more so in a MOOC that is not for credit, why on earth would you write an essay for an automated grader?

For large-scale classes, as you discuss in your post, peer grading would be a far more valuable exercise and learning experience for students than machine grading. Two studies I have read show that there is 20 to 25% grade inflation with peer grading, but the learning for both sides, peer and student, is far more meaningful in my opinion.

I am all for technological advancements, yet at some point are we not going too far, and when will that be? (A rhetorical question). However, I do look forward to reading further and learning more about this method. Thank you for the thought-provoking post. Debbie

Response from Elijah Mayfield:

Debbie – There are mixed results in the literature, but most of all they point to a negative impression from students if they’re working purely alone, even if writing skill does go up. However, if automated technology is being used in a collaborative setting, scaffolding the interaction, we see almost the opposite effect – compared to a control it increases student satisfaction with the learning experience, and their own self-efficacy, even if the learning gains on top of that collaborative process are modest…

Mayfield’s response is fair and honest, and I appreciate his willingness to engage in discussion with readers who commented and expressed skepticism, if not criticism, of his program. I encourage readers who are interested in learning more about the topic to read the post and the discussion that follows it.

Let’s Think about This More…

I want to learn more about the idea of machine grading, and am eager to review feedback from students after edX implements the grading software that Agarwal speaks of in the NYT article. Though I remain skeptical, I’m keeping my mind open. As mentioned, I am most concerned about its implications for student motivation, and the potential long-term effects on learning should machine grading become the norm. There is an emotional side to this story: the idea of students making personal connections and feeling that their writing is of value when writing to a real person. Can the joy of writing be fostered when writing for a machine?

Further Reading:

Six Ways the edX Announcement Gets Automated Essay Grading Wrong, Elijah Mayfield, (2013), e-Literate
Robots Eyes as Good as Humans when Grading Essays, Melissa Block, (2012), NPR
Tossing Sabots into the Automated Essay Grading Machine, Audrey Watters, (2012)
Four Reasons Why Students Need Instructor Feedback in Online Courses, Online Learning Insights
Better Tests, More Writing, Deeper Learning, (2012), GettingSmarter.com
Essay-Grading Software Offers Professors a Break, John Markoff, (2013), New York Times

Image credit: Mike Licht, NotionsCapital.com’s photostream (Flickr)

27 thoughts on “Giving Feedback to Students: Instructor vs. Machine”

jamharl April 16, 2013 at 10:53 am

I prefer to be graded by a human being. However, they must follow some conditions to avoid being biased. They should always be conscience free.

That is because I’m afraid I will be given grades by a hacked machine.

Tim Hunt April 14, 2013 at 9:13 am

You say: “I don’t see students being motivated to rewrite an essay again and again.” That seems rather unfair to your students. If they are anything like students I know, then they will work through numerous drafts before they submit their essay. This applies not only to students, but to anyone doing serious writing. How many drafts did you go through in writing the above blog post?

The question is, how efficient is that drafting process for different people. An experienced writer, such as yourself, has very good powers of self-criticism. You can spot the flaws in your own work, and so direct your own re-drafting efforts. Novice writers, on the other hand, cannot do that so well, which is one of the reasons they are novices. Now, a computer may be able to provide some of that necessary criticism more quickly and conveniently (though less well) than a teacher, which might be a good thing.

Can I ask, in your own writing, do you have spell-checkers and grammar checkers turned on or off? When you are advising your students about preparing their essays, what do you say to them in relation to these tools? (My own view is that grammar checkers are helpful, even though they are far from 100% accurate. Using a spell-checker is a no-brainer.) Perhaps computers are capable of going at least one step beyond what grammar checkers can do? Perhaps, even if it is not perfect, that is helpful in letting novice writers get further on their own, before showing their work to a teacher?

One person who has researched how students engage with immediate feedback from computer-marking is Sally Jordan, from the Open University (UK). See, for example, http://www.sciencedirect.com/science/article/pii/S036013151100251X or search for some of her other papers. (She also has a blog at http://www.open.ac.uk/blogs/SallyJordan/.)

1. Laura Gibbs April 14, 2013 at 3:11 pm
  
  Thanks for the link to the research paper – although the abstract explains that this is not exactly the kind of
  
  “writing” that is envisioned in the edX announcement – 20 words or fewer is what I would call a “short answer,” a kind of glorified “fill in the blank.” When people talk about actual student writing, I think they have in mind something that is longer than 20 words.
  This comment, for example, is already 70 words long. I have added some line breaks to indicate where the first 20 words stop. And now this comment is 100 words long.
  
2. Debbie Morrison Post author April 14, 2013 at 11:48 pm
  
  Hi Tim,
  
  Thank you for your thoughtful response and link to the article. I agree with Laura here, though: the article you link to refers to short answer responses, not essays. There is little research that suggests that students prefer or benefit from machine grading, or that it can provide meaningful feedback that develops writing skills specifically. Research has been done that indicates there is value in immediate feedback, but these studies refer to multiple choice and, in the paper you provided, short answer responses, not to developmental and constructive feedback provided for essay assignments.
  
  I should have clarified my statement – “I don’t see students being motivated to rewrite an essay again and again for a machine.” My comment referred to Dr. Agarwal’s comment in the NYT article where he suggested that with machine grading students could rewrite their essays again and again.
  
  To answer your question, yes I do use grammar checkers and spell check when writing, and I encourage students to do the same. I write for a human audience and write drafts in anticipation that one or more individuals will read what I write. Students should have the same opportunity, that they are writing for an audience, whether it be one person or many.
  
  Ask any high school student who has experienced machine grading in language arts or literature classes – they already know how to ‘beat the system’; they know what the machine is looking for. They discount its value and integrity completely.
  
  Thank you again Tim for sharing your views. It is most helpful to engage in discussion to explore issues from various perspectives.
  Debbie
  
Jessica April 13, 2013 at 6:55 pm

Debbie, I see what you’re trying to get across, yet you’ve got to admit that feedback can go both ways. I can think of at least half a dozen cases when sour relations with a teacher or a professor guaranteed me a bad mark in class despite my academic performance. So yes, robots / PCs can hardly provide students with the feedback that a talented and genuinely helpful teacher can provide, but they also won’t get back at you when grading your work.

1. Debbie Morrison Post author April 14, 2013 at 1:27 am
  
  Hi Jessica. I agree with you – teachers are not perfect – and with that comes feelings of animosity etc. that may come through when grading student work. This is unfortunate – and I like to think the exception not the rule. But you bring up a worthy point. Thanks for sharing. Debbie
  
2. Laura Gibbs April 14, 2013 at 3:13 pm
  
  Jessica, all the more reason to make sure we do a better job in helping teachers do a better job in teaching writing – designing better writing assignments, learning how to give good feedback, etc. All the time and energy we might spend “teaching machines” (as is required by this software) is time and effort that would be better spent teaching the teachers I think.
  
  1. Tony Demetriou April 16, 2013 at 8:45 am
    
    … maybe.
    
    The tricky thing about “teaching a machine” is that it doesn’t have the same intuitive grasp of human concepts that humans have. So it’s a lot harder, and there are a lot more pitfalls on the way to getting it right. The nice thing about “teaching a machine” is that once you’ve figured it out, it’s a “solved problem” forever more. We don’t have to teach each individual computer, like we have to teach each individual teacher.
    
    And clearly we already know that. Which is why we have textbooks, video instruction etc. so that we don’t have to deliver the same teaching to each individual teacher.
    
    Am I saying we should stop producing good teachers? Absolutely not!
    
    I’m saying there are strengths and weaknesses to both teaching machines and teaching humans. They both provide different benefits.
    
    As for the “time and energy we might spend” – you’ll probably find that the people teaching the machines have an IT background, and want to be doing that. We’re talking a handful of individuals, who would only be able to teach a handful of teachers – assuming that clear communication and teaching is their specialty, and assuming that they are in a position to be teaching those teachers.
    
    In reality I think you’ll find that we can have both. We can let the programmers think about how to teach the machines, and let the educators think about how to teach the people, and let the programmers then apply what the educators preach.
    
    Automated marking is a tool, nothing more. When used well, it could be a big help for both teachers and students. When used as a shortcut to save a teacher having to do work, well, then it’ll be detrimental to the student. And the students will know they’re being cheated.
    
Robert Connolly April 13, 2013 at 6:30 pm

Interesting discussion. When it comes to the robo-grading I experience now, I respond not with what I think is the correct answer or an insightful argument, but with what I think the robot will want. So, I am playing a game by trying to guess what the robot has been programmed to look for, if I am taking a course for a grade. If I am taking the course to learn, develop skills, etc., then I am going to feel cheated if I am trying to expand the argument beyond the box in which the robo-grader is programmed, not because I am not getting the grade, which is not my reason for taking the course, but because I am not getting feedback on my innovative or otherwise idea that was generated in part by the course content.

I remember in 1968 taking a standardized test to determine my suitability for military service. In my countercultural mindset of the time, I answered the numbered questions in order a, b, c, d, e, d, c, b, a, b, c, d, e . . . I failed that robo-graded test!

1. Debbie Morrison Post author April 14, 2013 at 1:25 am
  
  Robert – Exactly! This is the mindset when students are writing to the machine – beat the system as he or she knows they are smarter than the machine – the machine has no integrity! Thanks for sharing. Debbie
  
  1. Tim Hunt April 14, 2013 at 8:43 am
    
    Sorry, but there is nothing in what you wrote that is specific to machine-grading. When writing essays to be marked by humans, I have had to write the answer that matched the ideas taught in the course, rather than what I really believe, to make sure I got the marks.
    
    (In that case, one could try to argue that the purpose of the essay was to demonstrate that I had understood what was taught, and could apply it, rather than for me to sound off with my opinions, but still.)
    
2. Laura Gibbs April 14, 2013 at 3:14 pm
  
  Tim, the difference is that a computer can never be better than this. A human teacher, on the other hand, can be better than that – and should be. See my comment about professional development above (or below – I’m not quite sure how comments and replies are ordered here).
  
  1. Tony Demetriou April 16, 2013 at 9:08 am
    
    Laura, I don’t understand why a computer can never be “better than this”
    
    When an essay is being graded, someone (or something) reads through the text to extract meaning. That meaning is then checked to make sure that:
    – It follows a logical structure, making an argument or creating an explanation
    – Where appropriate, the assertions made are explained or referenced
    – It touches on all the important points necessary for the essay
    – (Bonus!) It expands on some of the points, introduces new points, or introduces a novel view of the material
    
    The grader then indicates the high points of the essay, so the student knows what they did right, and indicates the bits that could be better, with an explanation of how they could be improved.
    
    Nothing about that process necessarily requires a human. At the moment, humans are better than machines at extracting the meaning from the essay, and the human marking the essay tends to be an expert in the field, so they already have the required information.
    
    But there’s nothing inherently “limiting” about having a machine do this task. We’ll still need a human to say what the appropriate structure for an essay is, what the necessary points are, what additional points are valid (or invalid) and so on. We still need a “human teacher” – but we don’t necessarily need that human teacher to spend all their time with a red pen.
    
    There are also all sorts of possible hybrid solutions. I could imagine a marking system that tags the essay with points that it thinks are right or wrong, and highlights points that it doesn’t understand. The teacher could then look through and tick the tags that the machine put in (so the machine learns that it’s doing it right) or cross them out (so the machine learns that it did it wrong) – and could then tag the unknown section, so the machine can learn how to grade that, should a future student bring up a similar topic.
    
    My experience marking papers is very limited (I’m an IT guy, but I have taught university-level IT courses when the usual professors weren’t available) – and I don’t think a machine could have marked the papers as well as I did. Even with my inexperience. However, I don’t think it’s inconceivable that the machine could have marked them. I was teaching from a textbook – I was looking in the essays to see that the students had brought up the important points from the textbook, and had demonstrated that they *understood* those points. I’d tick each time they did explain something, and I’d make a quick list of the things they didn’t. Then looking at those two lists I’d assign a mark, and write comments. Maybe that was a terrible way of marking, maybe not. But a machine could *conceivably* have done it.
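
    As a rough illustration of the tick-list marking described above, here is a minimal Python sketch. The marking guide, the phrases in it, and the names `mark_essay`, `MarkingResult`, and `SAMPLE_GUIDE` are all hypothetical, made up for this example. It only checks whether a point is mentioned by simple phrase matching, so it says nothing about whether the student understood the point; it is not how any real grading product works.

```python
import re
from dataclasses import dataclass, field


@dataclass
class MarkingResult:
    covered: list = field(default_factory=list)   # points the essay mentions
    missing: list = field(default_factory=list)   # points the essay never mentions
    comments: list = field(default_factory=list)  # feedback lines for the student

    @property
    def mark(self) -> float:
        """Percentage of the guide's points that the essay mentions."""
        total = len(self.covered) + len(self.missing)
        return 100.0 * len(self.covered) / total if total else 0.0


def mark_essay(essay: str, guide: dict) -> MarkingResult:
    """Tick off each point in the guide that the essay mentions.

    `guide` maps a point name to the phrases that count as mentioning it.
    This is plain substring matching: an assumption made for the sketch.
    """
    text = re.sub(r"\s+", " ", essay.lower())
    result = MarkingResult()
    for point, phrases in guide.items():
        if any(phrase.lower() in text for phrase in phrases):
            result.covered.append(point)
        else:
            result.missing.append(point)
            result.comments.append(f"You didn't mention anything about {point}.")
    return result


# Hypothetical marking guide for an imaginary IT assignment.
SAMPLE_GUIDE = {
    "user engagement": ["user engagement", "engage users"],
    "accessibility": ["accessibility", "screen reader"],
    "usability testing": ["usability testing", "user testing"],
}

if __name__ == "__main__":
    essay = "Good sites plan for accessibility and run usability testing early."
    outcome = mark_essay(essay, SAMPLE_GUIDE)
    print(outcome.mark, outcome.comments)
```

    The list of missing points is what a student could receive instantly on submission; the mark itself would still need a human to judge whether each point was actually understood.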
    
    What’s more, my comments of “You didn’t mention anything about user engagement” etc. would have been way more useful if the students got those instantly on submitting the assignments (since we’re already doing online submission anyway) – with the class I had, I know that every single one of them, bar only one, would have rewritten the assignment to include the missing information. Which means they probably would have gone back to the textbook and read up on the bit they were unclear on, or had forgotten. Which means they would have been educating themselves more – and that’s what I care about. I’d be thrilled if a machine could do the marking, and let the students re-write, until by the time I mark the papers I have to give every student 100% marks. That would make me very happy.
    
    And if we got to that point? Well, we could have experts adding in rules for the automated marking that cover subtleties in essay structure, or proper referencing, or adding tangential points that the student might bring up. And we could build on this information, to a point where no single human could have reasonably done that in their marking.
    
    Let’s be honest, when a student references a textbook, how many markers will go out, grab that book, check the page, check the argument made in the book, and make sure the student has referenced properly? And that the reference is providing the information the student claims? Current-day computers probably could do that with pretty high accuracy (assuming there is an online version of the book being referenced)
    
    So yeah, I’m not saying computers will be better markers than humans, but I am saying that I reject the argument that they “can never be better than this”
    
  2. Laura Gibbs April 16, 2013 at 1:21 pm
    
    Tony, it sounds like you have a very fanciful view of just what it means to teach the machine to assess a piece of human writing. The process is purely statistical and has nothing to do with meaning. Humans, when they write, do not write based on statistical models (most humans, in fact, do not even know what statistical models are); they do so based only on meaning. The systems are completely incommensurate and will never fit.
    
    So, while it is possible to teach a machine to compare, statistically, hundreds of thousands of pieces of writing to some pieces of writing (hundreds or thousands) that have been graded by humans, it is all just statistical processing. It works for grading if/when there is minimal/no deviation from the pre-marked papers. In essence, it is an expanded form of “fill in the blank.”
    
    As a result, when the writing prompt is soliciting the return of factual information, it is feasible… and that sounds like the type of marking you were doing. You were not evaluating writing; you were checking for information regurgitation. That kind of assignment should be a machine-graded test to begin with; it should not have been a writing assignment at all, based anyway on the description you have provided of how you marked it.
    
    But when a writing prompt asks a human being to think creatively in any way, providing answers that go beyond the pre-graded writing used to teach the machine, the machine will fail. “Does Not Compute.”
    
    Are there problems with how we teach writing now? Sure there are. I have provided some long comments about that here – http://mfeldstein.com/si-ways-the-edx-announcement-gets-automated-essay-grading-wrong/. Those are problems we need to fix, and computers are not going to help us fix them.
    
    I am a big believer in the use of computers to provide personalized tutorials for students, helping students diagnose gaps in their knowledge and skills and then giving them guided practice to fill in those gaps. Computers are incredibly good at that, better than humans. That is where our efforts should go, rather than in this misguided effort to have computers evaluate human writing.
    
    1. Tony Demetriou April 17, 2013 at 1:46 am
      
      Hi Laura,
      
      I’m not an NLP expert, but it is a field I’m interested in, and I do follow blogs/discussions. So yeah, my views might be quite fanciful.
      
      I’m not sure that the gap between “meaning” and “statistical analysis” is as large as you think, though. Humans are doing statistical analysis all the time – they just don’t realize it. Our “intuition” is really just a whole lot of statistical heuristics based on our past experience, and like a computer statistical model, they aren’t perfect either.
      
      People don’t write to statistical models, but they do – as a group – regularly fit within various statistical models. It’s amazing how useful it can be to model traffic movement, crowd movement etc. – and then apply that to the real world, to end up with better designed areas.
      
      What does that have to do with teaching? Possibly nothing. But possibly, it shows that we don’t have to understand the “meaning” of everything we do – like how humans build up “intuition” from experiences, it’s conceivable that a machine can do the same. But I’m a reductionist, so my biases might be showing here 🙂
      
      Can a machine do this right now? I honestly don’t know. I suspect not, but there are some very clever people out there working on this problem. And we have techniques available that are a lot more targeted than just giving the machine a corpus of “good essay answers” and “bad essay answers” and making it figure everything else out statistically. We can have domain experts put in specific information for specific assignments. Many of the machine learning competitions have two tracks, one which allows prior knowledge about the domain, so domain-specific knowledge is clearly something that is already being used.
      
      All that said, I do agree with you. It’s feasible when the writing is soliciting the return of factual information. And that was the type of marking I was doing. My grading wasn’t based on the grammar, or quality of the argument, but on the learning displayed. So mostly on information regurgitation, but the reason to have writing tasks rather than “fill in the blank” or multiple choice is because we want to try and assess the understanding, not just the repetition. I don’t feel like I was fully successful in that. (I also didn’t write the assessments, they were handed over by the professor before he left)
      
      When a writing prompt asks a human being to think creatively – yes, the computer will fail to mark correctly if this goes beyond the pre-graded information. Absolutely.
      The same thing happens to a human, though. If I think creatively, and go beyond your knowledge of this topic, you’ll also hit a “do not compute” point. You’ll have to go out and seek more information – by reading up on the topic – by asking me for clarification, etc. – Is there any reason a computer shouldn’t be able to do the same? Is there any reason the computer shouldn’t flag this essay that expresses creativity, so that a human expert can look at it, give it a grade, and potentially add additional marking information to the computer?
      
      My follow-up question to this is, how much creativity do we expect to see? And how creative is it? I’m not intending to diminish my expectations of the students – as the article you linked to claims, machine grading isn’t useful to grade creative writing, or small groups of students working with a teacher. It’s useful when grading assignments for large groups of students. Of that large group there will be a few creative gems. There will also be a few “creative” students who write a lot of unexpected nonsense. And (depending on the topic and course) many who write exactly what is expected. Of the creative students, who see something beyond the text as-presented, how many of them will spot the same addition? It’s possible that we’re able to hand-mark these creative students (at both ends of the spectrum) while still teaching the machine domain-specific information. So the next round, the machine will have a slightly broader understanding, and flag a slightly lower percent as creative. Even though it’ll probably never get to 0%.
      
      When we’re talking about that much volume, in the real-life examples I know of, we’re not talking about an expert professor hand-grading each student’s essay and providing thoughtful comments. We’re often talking about the professor carefully writing out a marking guide, and giving that to a handful of tutors, who each (carefully) grade their stack of papers to the best of their abilities. So in the scenarios where there is machine grading, we’re not expecting the machine to be better than the professor – only better than the assistant, who is grading to a marking guide. Chances are the assistant is a high-grading student in the field, or similar, and still has a lot of useful knowledge, but even so the bar is a little lower.
      
      And on top of that, for me at least, I’m not even thinking that it should replace the assistant markers – just support the students. Give them instant feedback on their assignment, let them know if there are any glaring holes that they should rewrite, give them an expected mark range (50-60%, 60-80%, 80-100%) – that predicted range could even be biased to be low, so the students are only likely to receive pleasant surprises when the assistant marker gets to their paper.
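
      The "biased low" mark range could be sketched like this in Python. The band boundaries come from the ranges mentioned above; the function name `predicted_band`, the 5-point safety margin, and the "below 50%" label are assumptions made for the example, and the estimated mark is taken as given rather than computed.

```python
# Bands from highest to lowest: (lower bound in percent, label shown to the student).
BANDS = [(80, "80-100%"), (60, "60-80%"), (50, "50-60%")]


def predicted_band(estimated_mark: float, safety_margin: float = 5.0) -> str:
    """Report a mark range for an estimated mark, deliberately biased low.

    The margin is subtracted before the band is chosen, so a student near
    the bottom of a band is told the band below and can only be pleasantly
    surprised by the final human mark.
    """
    conservative = max(0.0, estimated_mark - safety_margin)
    for lower_bound, label in BANDS:
        if conservative >= lower_bound:
            return label
    return "below 50%"  # hypothetical label for anything under the lowest band


if __name__ == "__main__":
    for estimate in (90, 83, 52):
        print(estimate, predicted_band(estimate))
```

      A larger margin makes pleasant surprises more likely, at the cost of telling more students they are in a lower band than they will end up in.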
      
      I currently have an adult friend doing distance education, and his biggest frustration is the seemingly-erratic marking. On one assignment he gets comments saying he isn’t referencing properly, and then on the next one, where he referenced as instructed, he gets comments saying to do it differently again. Sometimes he barely scrapes through with a pass, and when he asked what he did wrong he was told “You’re doing fine, it’s just a hard course, and you didn’t go above and beyond the material”, yet he got 27/30 for his latest assignment, and can’t see what he did differently in this one. As far as we can tell, it’s just a matter of having so many different markers that each have their own preferences and expectations. In a scenario like this, surely machine marking could be useful. It could have told him why his referencing wasn’t good (or could have told the marker that his referencing was fine), it could have told him that he hit all the important points (or that he didn’t), it could have told him that his writing style was generally looking good or bad. And he could have then re-worked the essay with that feedback – minimal as it might be. Since that would still be better than the current (lack of) feedback. And hey, if he did put in something creative that the computer didn’t understand, it could have flagged that for someone to look at. That person could have then given him personalized feedback, which he might have received before submitting. (Because of the size of the class, the students are discouraged from asking for personalized feedback on their drafts.)
      
      For myself, I’ve done some free online courses, that come with video lectures & homework. Homework that I always did, but never submitted (because I was doing it free, there was no marking available) – it’d have been cool if I could have submitted it. Even if the machine marking was substandard, hey, it’d still be more than nothing. Not because I wanted any qualification, but because I wanted confirmation that I was on the right track.
      
      So, yeah, nobody is saying that a machine can mark better than you, or that professors will be out of a job. That just isn’t the problem that we’re trying to solve at the moment. But if we want to improve education across the board, not just in our countries but worldwide, we need to find strategies to mass-distribute good teaching, and to keep the prices low to nonexistent. I am so thrilled that there are people who can’t afford university who get to learn to program iPhones due to free courses. That’s awesome. It’s also an “easy problem” because the students can self-assess, since they can see whether their code works or not. Their education isn’t as good as they would likely get at a university – but it’s a lot better than what was available a decade ago. I’m hoping that machine marking will open up the possibility of that sort of teaching for other fields, too. Those that can pay for teaching that includes human marking still will, and those that are happy with the machine marking quality, or can’t afford hand-marking, well, it’d be great to give them the option. Right?
      
      It seems impossible that a machine will be able to do this, I know. But we’re achieving the impossible all the time.
      It also seemed impossible that a machine could do something as human and personal as recommend music to me. Seriously, my good friends are pretty hit-or-miss when they recommend music (I’m picky!), and they know me, and know the music I like. Yet Pandora, after about half an hour of training, is consistently giving me songs I enjoy, and introducing me to new bands. Based on statistical analysis combined with human tagging of the song elements.
      
      It’s a great example of the combination of human knowledge with statistical analysis. And what it learned about my musical tastes was really interesting, and included things that I hadn’t realized myself (back when I first used it, it’d tell you why it picked out that song for you.)
      
      So yeah, that was an entirely different domain, but if a computer can learn to recommend music without even knowing what makes humans enjoy music… well… that just shows how useful statistical models can be. And how those models can apply to us, even if we don’t “write based on statistical models”
      
      So I agree with what you’re saying. My views are pretty fanciful for the technology at the moment, but I don’t think they are unreasonable for near-future technology. Computers will not be able to mark as well as a human expert, and aren’t able to assess creativity.
      
      But I also think that you’re comparing the computer to the top percent of markers, when it’ll be used to support or replace the bottom percent of markers. It’ll also be used to provide marking to students that otherwise wouldn’t be getting any feedback at all.
      
      Even if the machine isn’t perfect, I don’t think that the effort is “misguided” – I’d argue that it might even become essential, if we’re going to improve education across the poorer sectors of society, and across the poorer countries.
      
      And that won’t, in any way, diminish the value of an expert teacher hand-marking student assessments. It’ll just “fill the gaps” for the people who can’t have that.
      
      1. Laura Gibbs April 17, 2013 at 4:34 am
        
        You’ve raised so many issues here, Tony – some of them are kind of far afield from the question of robograding, but they are all important questions for the future of education, that’s for sure.
        
        I really don’t see the way I read my students’ writing as being anything statistical at all – I don’t even compare one student to another (which is the entire basis for statistical grading); my goal is for each student to do something that is really unique and individual… so I guess you could say I want them to all be statistical anomalies! All writing should be creative writing… otherwise, what’s the point really? We should just make Powerpoint slides instead. And such creative writing is done in order to COMMUNICATE something, not just to get a grade. A computer can indeed assign a grade – but it cannot communicate with the human being in return.
        
        Meanwhile, the type of marking scenario you have described with the professor and the TAs is the kind of meaningless marking process that seems to me far better served by a well-designed exam. Especially if there is no revision as part of the assignment, I’m not convinced that it really has any value in terms of writing – for students to improve their writing, they need feedback (meaningful feedback) AND revision. Otherwise, it’s just better to do a test and have done with it, in my opinion. Especially in courses where there is limited or no instructor feedback, a test is much better – with a test at least students really can be confident in why their work is marked as it is. They will never have confidence in machine mark-up of their writing because they will never understand the statistical algorithms that drive that process; even the professor would be hard-pressed to grasp the computer’s algorithms – it’s a process entirely different from marking a test based on a key.
        
        Pandora is an interesting comparison but very different from what we are talking about here – Pandora is based not on analyzing the music but on analyzing the music CONSUMPTION patterns of lots of users. So Pandora offers a great model for what a robo-librarian could be, a research assistant who could help you find really valuable online resources – which would be REALLY useful, especially in a massive class where lots of students are consuming online materials that could be tracked as Pandora tracks people’s listening patterns. That’s quite different from robograding… and I sure would be glad if people spending all this time, money and energy to build robograders were building really user-friendly robo-librarians instead!
        
        Tony Demetriou April 17, 2013 at 5:07 am
        
        Hey Laura,
        
        This post is going to be mostly “I agree, but it’s nice to dream…” 🙂
        
        Yeah, I was rambling a bit far afield from the topic – mostly to throw out examples of “you wouldn’t think statistics could do this but…”
        
        The way you describe marking writing, where you’re explicitly after the students to do something creative, is inspiring. It also explains your attitudes towards robo-marking. And I agree with them, I don’t think a robot can (now or in the near future) do marking like that, while also providing meaningful feedback.
        
        I don’t entirely agree that a robot can’t communicate with a human in return – the level of the robot’s communication skills will depend on what the robot “understands” of the information it’s trying to communicate. I enjoy the prose from Alan Dean Foster’s later novels, but not from his earlier novels. I’d be interested in finding out why – I bet any linguistic or writing expert would be able to tell me. But I also don’t find it implausible that a computer could tell me: it might be able to spot cadence or rhythms, or use of alliteration, or other features that might be what I enjoy from his later writing and that weren’t as prevalent earlier, assuming the robot has been programmed to look for those features. Potentially, that same robot would be able to critique writing, and say “Hey, you know Tony really prefers prose that has a good cadence, read up about it here” which might help the writer, albeit not in a very targeted way. What’s more (following your robo-librarian example), once it knows what I enjoy, it could suggest other authors, and steer me away from their individual books I might not like.
        But it’s equally plausible that what I enjoy from the later books is better dialogue that feels more natural, or a deeper exploration of interesting themes, or maybe just more imaginative plots. And I doubt a computer could pick that out, even with domain-specific programming.
        
        My suspicion is that, when it comes to communication, a computer will be only as good at communicating as the person writing the program. I do think good user interface is REALLY important, and a huge part of that is how you choose for the program to communicate with the user.
        
        As for my marking scenario with the TAs – yeah, I agree that it’s pretty meaningless, and would be much better served as a series of tests. Even better, interactive tests that could ask follow-up questions if the student seems unclear. I suspect in a lot of environments, a professor who only gives fill-in-the-blank or multiple choice assessments will be viewed with suspicion, even if they’re doing it because it’s the right answer to the problem. So there’s probably some social change that needs to go along with that. I personally promote the automated quizzes whenever talking to the academics – I love the idea that they can write the quiz once, and have it available for future students. Once it’s written, the only work is to ensure it stays relevant if the source material or course requirements change. And I certainly promote the “you can put in an explanation of why the answer was wrong” so students can use these quizzes to self-assess or study. I’m certainly not at the point where I’d be recommending that they use software to mark essays.
        
        Maybe we can entirely replace the need for robograders with properly structured assessments. It might take a bit of attitude shift in how we structure the courses & learning, and how students expect to engage with the material. Right now we’re “just” trying to get computers to take over one part of the existing process, instead of rethinking the whole process with the new technology in mind – and that’s the wrong attitude, in my opinion. But others argue that writing essays is valuable, either because it’s a skill that needs to be practiced, or because it allows students to demonstrate understanding in a way they can’t with other tasks. I’m not sure I subscribe to each of those attitudes – if you could write a multiple choice test, as long as it requires solid understanding of the material to be able to pass, then I’m happy with that. If you can take that further, and give immediate feedback & follow-up questions whenever I get something wrong, even better! The problem to solve is “how do we cheaply mass-assess student work and give useful, educational feedback” not “how do we write software that marks essays”
        
        Your comments about Pandora being about consumption are very canny. It’s true, I want Pandora to tell me what I’ll enjoy, and why. I don’t want Pandora to tell me what I could change about my musical taste so I can be a better listener. I love the idea of a robo-librarian, who could suggest books that other high-scoring students found useful for an assignment.
        
Laura Gibbs April 12, 2013 at 5:44 pm

Thanks, Debbie, for starting up the conversation here! I left a lot of comments over at Elijah’s post, and am really glad to comment here too in the context of your other fabulous posts about MOOCs, teaching, and learning.
As someone who has worked a lot on this problem, I can assure you it goes far far far beyond nuance, as Online Learning mentioned above. Computers cannot “understand” anything about what students write – they cannot “understand” even the most basic and obvious things about human language (obvious to humans that is), so their feedback is always going to have a slightly SURREAL quality. Anybody who has played around with Google Translate has experienced this same surreal quality, as has anyone who has played around with Turing-Test machines that attempt to carry on conversations with humans. The results provided by these AI systems are useful, yes, but decidedly odd… and the things that make the results odd (not just inaccurate, but downright weird) seem to me an impossible obstacle to overcome if the machine responses are designed to help students reflect on and improve their own writing.
Of course, we are talking about this in the abstract. So far, I have not seen anyone offering any kind of sample of the kind of feedback these automated scoring programs will provide. Up until now, these automatic text processing systems have been used only for grading (or to “grade the graders,” as College Board uses similar software in order to standardize its human readers and keep them in line). The idea of using these systems to provide meaningful feedback for students is, to my knowledge, unprecedented. So, it’s really up to the programmers to show us what they can do – right now, we are debating about this in the abstract without a single sample of the kind of feedback that these programs supposedly will provide.

1. Debbie Morrison (Post author) April 12, 2013 at 6:02 pm
  
  Hi Laura,
  
  Thank you for this excellent comment – you provide deep insight into this knotty issue of machine grading. This makes me wonder then, what DO these statements made by Anant Agarwal (edX) and others refer to when they discuss the valuable feedback that machine grading can provide? You are setting the record straight, that at this point there is little instructive feedback generated from these software programs that is of value to students. Thank you for the clarification.
  
  This is good – we need more of this dialogue before others jump to, and support, machine grading as a solution for grading in MOOCs and other learning scenarios.
  
  A note to other readers: I highly recommend reviewing Laura’s comments on the e-Literate post if your institution is discussing or considering this type of software – with the right information, better decisions can be made on how to address and/or implement such programs. I have benefited greatly from Laura’s comments – she provides the perspective of an experienced university instructor. Thank you Laura!
  
  1. Laura Gibbs April 12, 2013 at 6:14 pm
    
    I’m personally very curious to see what it will be like. My guess is that the feedback will be both bizarre and bad, and therefore quite confusing to students – and perhaps even likely to do more harm than good. But that’s all just speculation on my part. We really need to see just what these programs deliver! I’m glad Elijah is participating in a public discussion there at e-Literate – I will be following it closely.
    Also, do you know about my plans to create a robowriter? At first I was doing that just because I was angry (and, when I get angry, my usual response is to MAKE something, ha ha – it’s therapeutic!). Anyway, I now see that this robowriter is going to be very useful for me – in order to test these systems, you have to be prepared to give them literally THOUSANDS of essays. That is not the kind of thing a teacher can instantly and easily come up with. Well, with my robowriter, I will be able to give them THOUSANDS of essays, no problem at all – and I can introduce specific errors into those essays to test the machine software. So, while I did not foresee that initially, I now realize that my robowriter will come in very handy in the coming months. Plus, I am going to have a blast creating it!
    https://sites.google.com/site/mycourseraportfolio/
    
    1. Debbie Morrison (Post author) April 12, 2013 at 6:37 pm
      
      “…feedback will be both bizarre and bad, and therefore quite confusing to students” LOL. Yes – I am sure, and the younger generation, teenagers at least, will be able to see right through anything inauthentic – which they will discount immediately (I’ve got two skeptics living in my house right now, a 16-year-old and an 18-year-old).
      
      I love your robowriter. This is awesome. I look forward to seeing it evolve in the coming months. 🙂
      
      Thanks Laura for your (fabulous) comments.
      
  2. Tony Demetriou April 16, 2013 at 9:18 am
    
    I can’t speak for edX, but I’ve worked with someone who had a working prototype for a marking machine, that did pre-marking for the human.
    
    What it did was twofold – firstly, it looked through for any obvious mistakes in the structure or syntax (e.g. incorrect referencing, too far from the desired wordcount) – secondly, it looked through for keywords or keyword combinations, and flagged those. The marker puts in those keywords beforehand, of course, as they need to vary depending on the essay topic.
    
    When the marker opens the essay, they can see which paragraphs refer to which keyword set (and therefore which expected point or argument the student is making) – and it flags any sets of keywords that are not found.
    
    The marker then has to read the essay (of course!) and if there is a keyword set that wasn’t found, they can scan to see if the student mentioned that topic. If not, there’s a pre-written comment about that aspect of the essay explaining that the student missed it, and why it’s important.
    
    Similarly, for the other topics, there can be pre-written comments about it, that clarify the topic in case the student didn’t clearly display their understanding. The marker can also write their own comments, and either just add the comment for that essay, or save it for future essays, since we’ve found that if one student needs clarification, chances are there are other students who will also need the same clarification.
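    
    The pre-marking pass described above could be sketched roughly as follows. This is only an illustration, not the prototype's actual code: the names `KeywordSet` and `premark`, the 10% word-count tolerance, and the simple substring matching are all assumptions made for the example, and the referencing check is left out.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class KeywordSet:
    """Hypothetical container: one expected point, entered by the marker beforehand."""
    topic: str
    keywords: List[str]
    missing_comment: str  # pre-written comment used when the topic is absent


def premark(essay: str, keyword_sets: List[KeywordSet],
            target_words: int, tolerance: float = 0.1) -> Dict:
    """Flag structural issues and map keyword sets to paragraphs for a human marker."""
    # Treat blank-line-separated blocks as paragraphs (an assumption).
    paragraphs = [p.strip() for p in essay.split("\n\n") if p.strip()]
    word_count = len(essay.split())

    topics_by_paragraph: Dict[str, List[int]] = {}
    missing: List[Tuple[str, str]] = []

    for ks in keyword_sets:
        # Case-insensitive substring match; a real system would need something smarter.
        hits = [i for i, p in enumerate(paragraphs)
                if any(k.lower() in p.lower() for k in ks.keywords)]
        if hits:
            topics_by_paragraph[ks.topic] = hits
        else:
            # The marker still reads the essay and decides whether this comment applies.
            missing.append((ks.topic, ks.missing_comment))

    return {
        "word_count": word_count,
        "word_count_ok": abs(word_count - target_words) <= tolerance * target_words,
        "topics_by_paragraph": topics_by_paragraph,
        "missing": missing,
    }
```

    The output is a report for the marker, not a grade: the human still reads the essay and chooses which pre-written comments to attach.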
    
    The end result is… a clunky system that some people hate using. Or a beautiful system that other people love using. Entirely depending on how they prefer to work, and how comfortable they are with computers. For the people who love it, it means they can spend a lot more time writing detailed, well thought out comments, which are then useful for a number of students, so they can be more efficient with their marking time. It also helps flag things the student might have missed, when it’s easy for a human to also miss that when they’re on their 50th essay.
    
    I know I’m talking about assisted marking for humans, not entirely automated marking, but I think there are clear advantages here, even if the technology isn’t properly matured yet.
    
Online Learning April 12, 2013 at 5:02 pm

While the idea of machine grading is quite fascinating due to its ability to scale to meet a large class, it’s hard to believe that a machine will ever learn to account for nuance or intention that can be found in writing. The computers that would be needed to do this would essentially become sentient, which I think is still far off.

1. Debbie Morrison (Post author) April 12, 2013 at 5:36 pm
  
  I agree – it is the student voice and tone, as well as the depth of feeling that the writing evokes, that computer grading is not able to consider. Though, in fairness, the companies that create the software do acknowledge this. Thanks for your comment.
  