As educators, we often rely on simplified classroom texts to make complex topics accessible for less seasoned and younger readers. But over the past two to three years, one of my biggest concerns when using AI for my classroom has been how much nuance is lost in the process, especially when it comes to historical topics involving discrimination and systemic injustice.
Tools meant to help us scaffold texts can sometimes be accidentally racist because they reinforce oversimplified or even misleading narratives. In particular, I’ve seen AI-generated leveled texts erase important context around institutional racism in the name of lowering the reading level.
After attending ISTE this year, I felt like the tools had matured enough to run a new test.
SPOILER ALERT: Overall, Diffit gave me an article I would be comfortable sharing with students, but it ended up longer than the original because it kept the context while lowering the reading level. ChatGPT produced a version that was less comprehensive but still solid, while BrainFreeze kept all the important context but at a reading level that would still need additional scaffolding. Sadly, Brisk, which has been a favorite tool for other tasks, came out at the bottom.
What I Did:
I used a Library of Congress article about Rosa Parks—one that highlights her decades of civil rights activism beyond the famous moment on the bus—and asked several AI tools to rewrite it at a 4th grade reading level. Then I analyzed the results with support from Claude, one of the leading large language models, to help compare tone, nuance, and depth. I also ran each output through Lexile.com’s analyzer to get a clearer picture of the readability levels.
Key Takeaways:
1. Many tools oversimplify history to focus only on the bus boycott.
Most tools zoomed in on Rosa Parks’ decision to stay seated and skipped over her broader activism before, after and even during the Montgomery Bus Boycott. In doing so, they not only minimized Mrs. Parks’ lifelong contributions but also flattened the story of institutional racism. Some even suggested that segregation ended cleanly after one event and a Supreme Court decision—undermining the ongoing resistance and activism that followed.
2. Length matters more than you think.
Most rewritten versions came back at about 400–600 characters. One exception was Diffit, which produced a much longer version (around 1,500 characters) that preserved far more context. It’s a reminder: if you want to keep depth, a longer passage may be necessary once the language is simplified.
3. “4th grade level” means different things to different tools.
The Lexile levels varied significantly, and effective readability also depended on how much background knowledge and emotional maturity the text required. Some outputs hit lower Lexile scores but still demanded quite a bit of inference. Claude helped assess reading level with attention to both cognitive and contextual demands, similar to how you might use Fountas & Pinnell levels in your own classroom.
4. Tool design really impacts classroom usability.
Not all tools were created equal when it came to ease of use. Some required manual copying and pasting, which led to formatting headaches. Diffit was the only tool that pulled an image from the original article and included vocabulary support and summaries automatically. These features made it far more classroom-ready.
What I Found:

- Diffit: Best balance of readability and depth—but much longer. Ideal if you’re willing to give students more text to get better context.
- ChatGPT & BrainFreeze: Strong contenders for balanced tone and scope, especially when word count is limited. Each kept its focus beyond the bus boycott and used clear language to explain broader systemic issues, but both came back at higher Lexile levels.
- School AI, MagicSchool, Gemini and Claude
- Brisk: Produced the shortest and most surface-level rewrite. It was quick and simple but lost clarity and nuance, with a real risk of reinforcing oversimplified thinking about historical oppression.
- After tweaking the outputs manually: BrainFreeze could get to a simpler level but lost historical context. ChatGPT offered the best mix of simplicity and substance after revision.
Ultimately, teaching with AI can be powerful—but it requires careful guardrails and prompt engineering. When simplifying history, we’re not just shortening words. We’re shaping how students understand the world.
Going through this analysis felt so useful that I want to take it to my students. I created my own version of the original article for them to read, and they will have a chance to compare it with the versions produced by various AI tools to build their AI literacy. This activity is available in my Teachers Pay Teachers store.
As you explore tools like Diffit, ChatGPT, Brisk, and BrainFreeze, be sure to test for more than just Lexile level. Look for whether the soul of the story still shines through. Ask what’s missing, and let your students ask, too.
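If you are comfortable with a little Python, here is a minimal sketch of one way to start asking what’s missing. It is not the process I used for this comparison and it does not replace a Lexile analysis or a careful read. It only counts length and flags which context terms never appear in a rewrite. The function name `audit_rewrite`, the sample sentence, and the term list are all hypothetical placeholders; you would choose your own terms from the original article.

```python
import re


def audit_rewrite(rewrite: str, context_terms: list[str]) -> dict:
    """Report basic length stats and which teacher-chosen terms are missing.

    This is a rough screening aid, not a readability formula.
    """
    lowered = rewrite.lower()
    # A term counts as "missing" if it never appears, ignoring case.
    missing = [term for term in context_terms if term.lower() not in lowered]
    # Very rough sentence split on ., ! and ? (good enough for a quick check).
    sentences = [s for s in re.split(r"[.!?]+", rewrite) if s.strip()]
    words = rewrite.split()
    return {
        "characters": len(rewrite),
        "words": len(words),
        "avg_words_per_sentence": round(len(words) / max(len(sentences), 1), 1),
        "missing_terms": missing,
    }


# Hypothetical sample text and terms, for illustration only.
sample = "Rosa Parks stayed in her seat on the bus. The boycott ended segregation."
terms = ["NAACP", "boycott", "voting"]
report = audit_rewrite(sample, terms)
print(report)
```

A missing term is only a prompt for a closer look. A rewrite can mention every term and still flatten the story, so the human read still matters most.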
I hope this gives you a thoughtful starting point as you continue teaching with AI in ways that stay true to our dearest values and pedagogical hopes.


