Stanford Law School researchers say AI-generated tutoring answers beat responses written by law professors in a blind test. The finding gives legal education another reason to take AI tools seriously, rather than treat them only as classroom shortcuts.
The study, “Law Professors Prefer AI Over Peer Answers,” asked 16 U.S. law professors to create 40 representative contracts-law questions. The professors also wrote their own answers. They later judged anonymized comparisons between human answers and AI responses.
Across 2,918 comparisons, Stanford says, professors preferred the large language model answers 75.33% of the time on average. According to the paper’s abstract, the AI responses also performed close to the best human instructor in the study.
The surprise was not just accuracy
Legal reasoning is not a clean multiple-choice test. Stanford Law professor Julian Nyarko said the team focused on law because it requires judgment and nuance. It also requires people to work through ambiguity.
The paper says reviewers flagged AI answers as harmful 3.53% of the time. Professor-written answers were flagged 12.06% of the time, more than three times as often. Stanford’s press release frames that gap as evidence that AI tutors may help students when schools deploy them carefully.
That does not mean law schools should hand the classroom to chatbots. The researchers still caution that responsible deployment matters. Students need to learn how to argue, spot weak reasoning, and understand ethical limits.
Why it matters for students
For students, the bigger takeaway is access. A strong AI tutor could answer office-hour-style questions on demand. Professors could then keep the deeper teaching role.
That convenience carries a tension that runs through the wider AI ethics debate: institutions are trying to balance usefulness with overreliance.
The study also tested specific systems, including commercial tutoring tools and Google’s NotebookLM. Performance varied by model. So the headline is not that every AI assistant is ready for legal education. It is that some AI systems can now meet a professional teaching standard in a judgment-heavy subject.
Stanford’s paper is posted through SSRN. The school says the work included researchers from Stanford, Yale, NYU, the University of Chicago, and other institutions.