Why EdTech demands significantly stricter AI constraints and architectural guardrails.
When we talk about the reliability of AI in business, we tend to cite the same examples: finance, healthcare, and law. These are strictly regulated sectors where failures have significant and visible consequences.
Education is not included in these discussions. It should be.
Over the past year, I’ve been asking myself a question: what does it mean for AI to be reliable in an environment where the end users are twelve years old, the content has pedagogical implications, and a failure doesn’t lead to a lawsuit, but rather to a child receiving incorrect instruction?
The more I think about it, the more convinced I am that the reliability standards we apply to other business systems are inadequate for primary and secondary education.
The relationship between user and AI is reversed
In most discussions about AI reliability, the user is an adult with expertise in the field. A lawyer reviewing an AI-generated contract analysis will notice when the AI gets a clause wrong. A financial analyst reviewing AI-processed data will notice when a number is off. In other words, the user acts as a check on the AI: if the AI fails, the user is likely to catch the error.
In primary and secondary education, the relationship is reversed. A twelve-year-old cannot judge whether the information an AI provides is accurate. They are not in a position to verify the AI; their role is to learn from it. The AI is the authority. If the AI fails, the student does not detect the error and learns the wrong thing.
And this is worse than it seems. An adult expert who spots an error can discard it and move on. A student, by contrast, internalizes a flawed explanation and uses it as the basis for everything they learn next.
“Generally correct” is an acceptable standard when users can detect the exception. But in primary and secondary education, “generally correct” means “incorrect a certain percentage of the time, without any detection mechanism.” This is a problem of a different nature.
Factual accuracy is necessary. It is not sufficient.
The standard measure of AI reliability has only one dimension: was the answer factually accurate? In primary and secondary education, this is the floor, not the ceiling.
The answer must also be pedagogically accurate: its level of detail must suit the student’s age and current understanding. It must build on what the student already knows and present concepts in a way that supports later learning, without shortcuts that will have to be unlearned.
LLMs fail here in a subtle but characteristic way: they offer simple explanations that are accurate enough to answer the immediate question but convey a model of how things work that is wrong in a broader sense. The student gets the correct answer but the wrong model, and standard accuracy metrics miss this because the answer itself was correct.
For example, asked “Why does water boil?”, an AI might say that boiling is caused by applying heat. That may be good enough in the kitchen, but from a thermodynamic perspective it is misleading because it omits pressure: water boils when its vapor pressure equals the surrounding pressure, which is why it boils at a lower temperature at high altitude.
To catch this problem, the evaluation has to ask: “Is this explanation appropriate for this age group and this curriculum level?” Answering that requires expertise in education, not just in AI, and many AI teams lack it.
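To make the distinction concrete, here is a minimal sketch of such an evaluation record in Python. It assumes a hypothetical rubric (`PedagogicalReview`, `passes_review`) whose criteria are filled in by an education expert, not computed automatically; it only illustrates the gating logic, in which a factually correct answer still fails if any pedagogical criterion does not hold.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PedagogicalReview:
    """Verdicts an education expert assigns to one AI explanation (hypothetical rubric)."""
    factually_correct: bool     # the immediate question is answered correctly
    age_appropriate: bool       # level of detail fits the student's age
    curriculum_aligned: bool    # builds on what this grade has already learned
    no_misleading_model: bool   # nothing the student will have to unlearn later

def passes_review(review: PedagogicalReview) -> bool:
    """Factual accuracy is necessary but not sufficient: every criterion must hold."""
    return all((
        review.factually_correct,
        review.age_appropriate,
        review.curriculum_aligned,
        review.no_misleading_model,
    ))

# "Boiling is caused by heat": right for the kitchen, but it omits pressure.
heat_only = PedagogicalReview(True, True, True, no_misleading_model=False)
print(passes_review(heat_only))  # False
```

The rubric fields are illustrative; the point is that the release decision depends on all of them, so a standard accuracy score alone cannot approve an explanation.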
The data protection layer is really different
The GDPR applies to everyone. However, data concerning minors is subject to special provisions: higher protection requirements, specific rules on consent, and stricter limits on what data may be processed and how.
The EU AI Act classifies key education use cases as high-risk, including systems that determine access to education or evaluate learning outcomes. For these systems, the full set of obligations applies: risk management, data governance, technical documentation, activity logs, and human oversight.
An educational AI system’s knowledge base may contain student-generated data. Its logs will record sensitive interactions between students and teachers. Its assessment data will reveal how individual students perform. All of this data must be handled with more rigor than adult business data. None of it is optional, and none of it can be bolted on later as an afterthought in the system design.
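One way to apply that rigor to activity logs is to pseudonymize student identifiers before anything is written. The sketch below assumes a hypothetical log schema (`log_interaction`) and a secret key (`PSEUDONYM_KEY`) held outside the logs; it uses HMAC-SHA256 from Python’s standard library so the same student always maps to the same token without the raw ID appearing in the log.

```python
import hashlib
import hmac
import json
from datetime import datetime, timezone

# Hypothetical placeholder: a real system would load this from a secrets
# manager and store it separately from the logs it protects.
PSEUDONYM_KEY = b"replace-with-a-secret-key"

def pseudonymize(student_id: str, key: bytes = PSEUDONYM_KEY) -> str:
    """Replace a student ID with a keyed hash (HMAC-SHA256, hex-encoded).

    The same student always maps to the same token, so logs stay linkable
    for audits, but the token cannot be tied back to the student without the key.
    """
    return hmac.new(key, student_id.encode("utf-8"), hashlib.sha256).hexdigest()

def log_interaction(student_id: str, grade_level: int, prompt: str, response: str) -> str:
    """Build one JSON log line for an AI tutoring interaction (hypothetical schema).

    Only fields needed for oversight are kept; the raw student ID, name, and
    school are never written to the log.
    """
    record = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "student": pseudonymize(student_id),
        "grade_level": grade_level,
        "prompt": prompt,
        "response": response,
    }
    return json.dumps(record, ensure_ascii=False)

line = log_interaction("student-0042", 7, "Why does water boil?", "...")
print(line)
```

Pseudonymization is not anonymization: under the GDPR, pseudonymized data is still personal data, and the free-text prompt and response fields may themselves contain personal information, so retention limits and access controls still apply.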
Bias has a different impact when users cannot object
The quality of an AI system’s responses can vary across demographic groups. Bias is a serious problem in enterprise AI too, but there the users are adults who can recognize discriminatory treatment, report it, and object to it.
In the case of AI in primary and secondary education, the users are children who lack this ability.
For primary and secondary education platforms, this means the system must explicitly check whether its performance differs across student demographics: not just average accuracy, but accuracy broken down by language, culture, and learning style. And this evaluation must be continuous, not a one-off check before launch that is never repeated.
This is not just good practice; it is becoming a legal requirement for AI systems in the European education sector.
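A disaggregated check of this kind can be sketched in a few lines. The example assumes a hypothetical `EvalRecord` with a single `group` label and a reviewer’s pass/fail verdict, plus illustrative thresholds (`max_gap`, `min_samples`) that a real platform would need to set with its own statisticians and educators. It reports groups whose accuracy trails the overall figure and separately lists groups too small to judge.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class EvalRecord:
    """One reviewed AI response from an evaluation run (hypothetical schema)."""
    group: str      # e.g. the student's home language or learning profile
    correct: bool   # reviewer's verdict: accurate and age-appropriate?

def group_stats(records):
    """Return {group: (sample_count, accuracy)} for each group separately."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for r in records:
        totals[r.group] += 1
        hits[r.group] += int(r.correct)
    return {g: (totals[g], hits[g] / totals[g]) for g in totals}

def flag_gaps(records, max_gap=0.05, min_samples=30):
    """Compare each group's accuracy with the overall accuracy.

    Returns (overall, flagged, too_small): flagged lists (group, accuracy)
    pairs that trail the overall figure by more than max_gap; too_small lists
    groups with fewer than min_samples records, whose estimates are too noisy.
    """
    if not records:
        raise ValueError("no evaluation records")
    overall = sum(int(r.correct) for r in records) / len(records)
    flagged, too_small = [], []
    for group, (n, acc) in sorted(group_stats(records).items()):
        if n < min_samples:
            too_small.append(group)
        elif overall - acc > max_gap:
            flagged.append((group, round(acc, 3)))
    return overall, flagged, too_small

# Illustrative data only: three language groups with made-up outcomes.
records = (
    [EvalRecord("es", i < 95) for i in range(100)]
    + [EvalRecord("ar", i < 80) for i in range(100)]
    + [EvalRecord("uk", i < 5) for i in range(10)]
)
overall, flagged, too_small = flag_gaps(records)
print(round(overall, 3), flagged, too_small)  # 0.857 [('ar', 0.8)] ['uk']
```

Here the average (about 86%) looks acceptable, yet one group sits well below it; only the per-group breakdown reveals that. Running the check on a schedule against fresh evaluation data, rather than once before launch, is what makes it continuous.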
Human oversight involves teachers and parents, not just engineers
What effective human oversight means is also very different in primary and secondary education.
The human actors involved are not just other engineers or internal reviewers. They are also the teachers who need to understand what AI is communicating to their students. They are the administrators who need to understand how the system behaves across all their students. And they are the parents who need to understand, to some extent, what AI is doing in their children’s lives.
This is a requirement of trust, not just of regulatory compliance. Educational institutions rest on relationships of trust: between the institution and the student, the institution and the parents, and the institution and the government. An AI system that is opaque to teachers and parents undermines these relationships, however accurate it is.
What is the actual standard?
Reliable educational AI requires pedagogical accuracy, consistent performance across student groups, protection of minors’ data, transparency for teachers, and expert human oversight.
This standard is more demanding than most enterprise AI reliability frameworks. But it is also achievable. Each of these properties can be designed, implemented, and measured, provided it is treated as a priority requirement from the outset.
Platforms that meet this standard will earn the trust of schools, parents, and regulators. Platforms that apply generic enterprise standards to a domain that needs more will learn the cost of that gap.
Education powerfully influences how people think. Artificial intelligence used in this process must be subject to a standard that takes this into account.
Conclusion
AI in classrooms demands a higher standard: trust. It is not enough for the system simply not to fail; it must be designed for an environment where the user cannot audit its errors. For AI to be a legitimate ally in primary and secondary education, it must prioritize pedagogical integrity and data governance from the outset. Any lower standard risks not only legal repercussions but also students’ learning.