New York City finally released its school AI policy last week, nearly three years after briefly banning ChatGPT. Two rigorous studies show AI tutors can outperform human instruction - but only under specific conditions. And research confirms what many suspected: AI detection tools disproportionately flag non-native English speakers as cheaters.
Here’s what actually matters from the past two weeks in AI education.
NYC Gets a Rulebook (Sort Of)
The NYC Department of Education’s preliminary AI guidelines use a traffic light system:
Red (Prohibited): AI cannot make decisions about student placement, discipline, graduation, or program access. IEPs and 504 plans require qualified humans.
Yellow (Proceed with Caution): Students can use AI for research, exploration, and creative projects - with teacher supervision.
Green (Encouraged): Teachers can use AI for lesson planning, drafting communications, training materials, and scheduling.
The guidelines explicitly state that AI cannot replace the work teachers do. It cannot grade student work. It cannot make disciplinary decisions.
What’s missing: comprehensive privacy protections, specific guidance on student use, and clarity on the many edge cases that actually matter. The city is accepting public comment until May 8, with a fuller “playbook” expected in June.
Five Community Education Councils have already passed resolutions calling for a two-year moratorium on AI in schools. The guidelines land in a divided landscape.
The Research: AI Tutoring Actually Works
Two randomized controlled trials published in recent months provide some of the first rigorous evidence on AI tutoring:
The Harvard study (published in Scientific Reports) tested AI tutoring against active learning classes with 194 undergraduate physics students. The AI group learned more, in less time, and reported higher engagement and motivation.
The UK classroom study tested Google’s LearnLM tutoring system in secondary school math, with human tutors directly supervising every AI-generated message. Students who received AI tutoring answered subsequent questions correctly 66.2% of the time, compared to 60.7% for students with human-only tutoring.
The critical detail: the UK study had experts reviewing and revising every AI message before students saw it. This wasn’t a chatbot set loose in a classroom. It was AI as a first draft, with human judgment as the final check.
Both studies point toward the same conclusion: AI tutoring can work, but the “how” matters enormously. Pedagogical guardrails - systems that ask guiding questions rather than provide answers - produce better outcomes than general-purpose chatbots.
The Detection Problem Gets Worse
AI detection tools continue to produce discriminatory outcomes.
A 2026 analysis found a 61.3% false positive rate for TOEFL essays written by Chinese students, versus 5.1% for essays by US students - consistent with earlier Stanford research, which found that while detectors performed “near-perfect” on essays by US-born eighth-graders, they misclassified over 61% of essays by non-native English speakers as AI-generated.
The reason: many detectors score text by how statistically predictable it looks to a language model - its “perplexity.” Non-native speakers often use clearer, more standardized sentence structures - formal grammar learned through instruction - and that predictable prose happens to resemble the low-perplexity patterns detection systems associate with machine-generated text.
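To make the failure mode concrete, here is a minimal sketch of that perplexity heuristic. It is an illustration under stated assumptions - a small open model and an arbitrary cutoff - not any vendor’s actual pipeline:

```python
# Minimal sketch of perplexity-based AI detection.
# Assumptions: GPT-2 as the scoring model, THRESHOLD as an arbitrary
# illustrative cutoff - neither reflects a real detector's pipeline.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def perplexity(text: str) -> float:
    """Return how 'surprising' the text is to the model (lower = more predictable)."""
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        # Passing input_ids as labels yields the average
        # next-token cross-entropy loss over the text.
        loss = model(enc.input_ids, labels=enc.input_ids).loss
    return torch.exp(loss).item()

THRESHOLD = 50.0  # hypothetical cutoff, for illustration only

def flagged_as_ai(text: str) -> bool:
    # Low perplexity -> "too predictable" -> flagged as machine-written.
    return perplexity(text) < THRESHOLD
```

Two essays making the same argument can land on opposite sides of that cutoff simply because one uses more idiomatic, varied phrasing - which is the bias in a nutshell.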
Several universities, including Arizona, have disabled the AI detection features in their plagiarism software entirely, citing reliability concerns. The technology simply isn’t accurate enough for high-stakes decisions.
The alternative gaining traction: process documentation rather than product detection. Having students submit drafts, outlines, and reflection notes makes it harder to pass off AI-generated work while supporting genuine learning.
The Equity Problem
The optimistic case for AI in education was that it could democratize access to personalized tutoring - the equivalent of a private tutor for every student.
The emerging evidence suggests the opposite may happen.
Preliminary research indicates that affluent suburban districts are about twice as likely to train teachers to use AI as high-poverty urban or rural districts. Students from wealthier households use AI more often. And the cognitive offloading risk - where students use AI as a replacement for thinking rather than a supplement - appears to hit disadvantaged students harder.
Researchers from the University of Technology Sydney warn that unstructured AI use will likely widen existing achievement gaps. Students with strong background knowledge can use AI for beneficial offloading and accelerate learning. Novice or disadvantaged students are more likely to outsource thinking and fall further behind.
The irony: the students who could benefit most from AI tutoring are least likely to use it effectively without guidance that their schools often can’t provide.
Teaching Teachers
85% of K-12 teachers used AI during the 2024-2025 school year, and use among both students and educators has grown by more than 15 percentage points in the past year. Training and policy haven’t kept pace.
A Global AI Faculty Survey found 40% of faculty feel they’re “just beginning” their AI literacy journey. Only 17% describe themselves as advanced or expert. And 80% say their institutions haven’t clarified how AI should be applied in teaching.
The Chronicle of Higher Education offered practical strategies last week for teachers trying to use AI without enabling cognitive offloading:
- AI as Socratic partner: Systems designed to question and respond with follow-ups that require students to explain, defend, and engage with different viewpoints
- AI verification: Students prompt AI to produce something, then evaluate the output, putting them in the evaluator role
- Cognitive mirror: AI that asks students to explain rather than providing explanations
The common thread: keeping the cognitive work with the student while using AI to structure or prompt that work. The sketch below shows one way the Socratic-partner pattern can be wired up.
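As a concrete illustration of the first strategy, here is a minimal sketch of a Socratic guardrail expressed as a system prompt. The model name and prompt wording are assumptions for illustration, not how any of the studied systems were built:

```python
# Minimal sketch of the "Socratic partner" pattern via a chat API.
# Assumptions: OpenAI's Python client, an illustrative model name,
# and prompt wording invented for this example.
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

SOCRATIC_GUARDRAIL = (
    "You are a tutor. Never state the final answer and never write "
    "the student's work. Respond only with guiding questions, requests "
    "for the student to explain their reasoning, or hints toward the "
    "next step. If asked for the answer directly, redirect the student "
    "to the part they can work out themselves."
)

def socratic_reply(student_message: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative choice; any chat model works
        messages=[
            {"role": "system", "content": SOCRATIC_GUARDRAIL},
            {"role": "user", "content": student_message},
        ],
    )
    return response.choices[0].message.content

print(socratic_reply("Just tell me: what is the derivative of x**3?"))
```

The design point is that the guardrail lives in the system prompt, so every exchange inherits it - a crude, automated version of what the UK study’s human reviewers enforced by hand.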
What This Means
The NYC guidelines represent progress - official acknowledgment that AI exists and needs rules. But the guidelines’ yellow zone for student use reflects the genuine uncertainty that remains. Nobody has figured this out yet.
The RCT evidence is encouraging for supervised AI tutoring but says nothing about what happens when students use ChatGPT unsupervised for homework. The detection bias problem means schools can’t reliably catch misuse even if they wanted to. And the equity gap suggests AI may widen rather than narrow educational inequality without significant intervention.
The phrase from researchers that captures the current moment: “teach, not tell.” AI tools designed with pedagogical guardrails that guide reasoning rather than provide answers show promise. General-purpose chatbots that just answer questions may do more harm than good.
What Educators Can Do
Stop relying on AI detection software. The false positive rates are unacceptably high, especially for non-native speakers. Redesign assignments instead.
Adopt process-based assessment. Require drafts, outlines, and reflection notes. This is harder to fake and better pedagogy anyway.
Make AI use explicit. If students can use AI, specify how: have them document their prompts and explain their reasoning. Keep the cognitive work with the student.
Get trained. If your institution isn’t providing AI literacy training, seek it out. The Chronicle, EDUCAUSE, and professional associations are offering resources.
What Students Should Know
The research is becoming clearer: students who become dependent on AI and lose access perform worse than those who never used it. The “performance paradox” is real.
If you’re using AI as a replacement for thinking rather than a supplement to it, you’re building a dependency that will cost you when it matters. The skills you don’t develop now - critical thinking, writing, analysis - are exactly the skills that matter when the test is proctored or the job interview is live.
Use AI to explain concepts. Use it to check your work. Use it to brainstorm. But if you can’t do the work without it, you don’t have a tool - you have a dependency.