ClarixAI
A misconception-pattern radar for teachers · open-ended student answers in, the reasoning errors dominating a cohort out.
ClarixAI reads open-ended student answers (Dutch, English, Hungarian) and surfaces the misconception patterns dominating a class (concept confusion, surface-pattern matching, missing precondition, and five others), with a concrete pedagogical action card per pattern. It does not grade students; it grades the class's reasoning. A per-language multi-task transformer ensemble plus HDBSCAN clustering finds structural patterns across questions.
A wrong multiple-choice answer tells you nothing about the reasoning behind it. We built a radar that reads open-ended answers and tells the teacher which misconceptions are running the room.
ClarixAI processes open-ended student answers in Dutch, English and Hungarian through a per-language transformer ensemble (RobBERT-2023, RoBERTa-base, huBERT) with a 1-vs-rest sigmoid head over eight misconception meta-tags. HDBSCAN clustering on encoder embeddings surfaces structural patterns across questions; the dashboard turns each tag into a what-it-means / why-it-happens / what-to-do card. The studio shipped the training pipeline, the inference service, and the teacher dashboard.
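A 1-vs-rest sigmoid head scores every tag independently, so one answer can carry several misconceptions or none at all. Below is a minimal stdlib sketch of that decision step, starting from the eight raw logits such a head would emit. The three tag names come from this page; the other five identifiers are hypothetical placeholders, and the single 0.5 threshold is an assumption (the shipped NL model uses tuned per-tag thresholds).

```python
import math

# Three names come from this page; tag_4..tag_8 are hypothetical placeholders.
TAGS = [
    "concept_confusion",
    "surface_pattern",
    "missing_precondition",
    "tag_4", "tag_5", "tag_6", "tag_7", "tag_8",
]


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def one_vs_rest(logits: list[float], threshold: float = 0.5) -> dict[str, float]:
    """Score each tag independently and keep the tags at or above the threshold.

    Unlike a softmax, the probabilities do not compete with each other:
    an answer can be flagged with zero, one, or several tags.
    """
    if len(logits) != len(TAGS):
        raise ValueError(f"expected {len(TAGS)} logits, got {len(logits)}")
    probs = {tag: sigmoid(z) for tag, z in zip(TAGS, logits)}
    return {tag: p for tag, p in probs.items() if p >= threshold}
```

For example, `one_vs_rest([2.0, 1.2, -3.0, -4.0, -2.5, -1.0, -5.0, -0.3])` keeps both `concept_confusion` (about 0.88) and `surface_pattern` (about 0.77), which a softmax head would force into a single winner.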
Our open-ended questions used to be the part of the assessment we'd skim, because reading 60 answers for a pattern was a whole evening. The team built a system that does the reading for us and tells us, per question, which misconception is running the room — and what to do about it, not just that it's there. We stopped over-prescribing 'more practice'. We started reteaching the right thing.
What's on screen
Frame breakdown
- 01 · User surface
The whole experience the user sees
This frame shows the live product: a misconception-pattern radar for teachers · open-ended student answers in, the reasoning errors dominating a cohort out. Every component is ours · scope, design, code, deploy.
- 02 · Stack behind the screen
What's powering it: Python, PyTorch, Transformers
Five stack components run behind this frame: Python, PyTorch and Transformers power the model layer, FastAPI serves inference, and React renders the dashboard the teacher sees. All studio-owned.
- 03 · What we shipped
Multi-task misconception classifier · NL · EN · HU
The class's dominant reasoning errors are visible per question
- 04 · Status
Private deploy · under NDA.
Per the client's request the URL stays private · the build, architecture, and lessons can be shared in a scoping call.
How it shipped
Timeline
- 01 · BRIEF
Pin the unit of analysis as the reasoning behind the answer — not the answer itself.
Decision the product rests on: the system never grades students. It surfaces the misconceptions running through the class so the teacher knows which question to re-explain, and how.
- 02 · ARCHITECTURE
Stack decisions before any code.
The decision doc captured the data flow, the role split across Python, PyTorch, Transformers and FastAPI, and the failure modes we'd handle in v1 versus defer. Cross-service boundaries (where the AI ends and the web app begins) were drawn here so neither side leaked into the other later.
- 03 · BUILD
Per-language transformer + anti-overfit recipe + ensemble for prod.
RobBERT-2023 for Dutch, RoBERTa-base for English, huBERT for Hungarian — trained with the v5.0 anti-overfit recipe (frozen lower layers, focal loss, SupCon, weighted gold). NL ships as a 4-checkpoint ensemble (v6.2) with bounded per-tag thresholds + a k-NN prototype blend; EN matches Dutch on gold F1 and beats it on adversarial argmax.
- 04 · POLISH
Performance, accessibility, and observability.
PSI / a11y / coverage budgets enforced as launch gates. Logging + metrics wired before cut-over · the team can answer 'is it working?' from a dashboard, not a Slack thread. Threat-model checklist signed off before traffic hits the box.
- 05 · SHIP
FastAPI inference + Vite/React dashboard, three languages live.
Inference clusters per-question answers via HDBSCAN over the encoder's embeddings, surfaces structural cross-question patterns and per-cohort trends, and feeds the pedagogical knowledge base that turns each tag into a teacher action card.
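The BUILD step's inference path (a 4-checkpoint ensemble, bounded per-tag thresholds and a k-NN prototype blend) might be sketched roughly as follows. This is a minimal illustration under our own assumptions, not the shipped v6.2 code: we assume the checkpoints' sigmoid probabilities are averaged, that the k-NN component scores each tag by the share of the k most cosine-similar labelled prototypes carrying it, and that the blend is linear with a hypothetical weight `alpha`. The clamp bounds `lo`/`hi` are illustrative as well.

```python
import math


def mean_probs(checkpoint_probs):
    """Average per-tag sigmoid probabilities across ensemble checkpoints."""
    n = len(checkpoint_probs)
    return [sum(col) / n for col in zip(*checkpoint_probs)]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def knn_tag_probs(query, prototypes, k=3):
    """Share of the k most similar labelled prototypes that carry each tag.

    prototypes: list of (embedding, multi-hot label list) pairs (assumed format).
    """
    ranked = sorted(prototypes, key=lambda p: cosine(query, p[0]), reverse=True)[:k]
    if not ranked:
        raise ValueError("need at least one prototype")
    n_tags = len(ranked[0][1])
    return [sum(lbl[t] for _, lbl in ranked) / len(ranked) for t in range(n_tags)]


def bounded(threshold, lo=0.3, hi=0.7):
    """Clamp a tuned per-tag threshold into [lo, hi] (hypothetical bounds)."""
    return min(max(threshold, lo), hi)


def predict(checkpoint_probs, query, prototypes, thresholds, alpha=0.3, k=3):
    """Blend ensemble and k-NN scores, then apply bounded per-tag thresholds."""
    ens = mean_probs(checkpoint_probs)
    knn = knn_tag_probs(query, prototypes, k)
    blended = [(1 - alpha) * e + alpha * p for e, p in zip(ens, knn)]
    return [int(b >= bounded(t)) for b, t in zip(blended, thresholds)]
```

The clamp is the part that "keeps the radar honest" in this sketch: an over-tuned threshold of 0.9 cannot silence a tag entirely, and an under-tuned 0.1 cannot let a tag fire on weak evidence.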
What shipped
- 01 · Model
Per-language ensemble · 8 misconception tags
RobBERT / RoBERTa / huBERT with a 1-vs-rest sigmoid head — concept confusion, surface pattern, missing precondition + five more.
- 02 · Pipeline
Classify → cluster → trend
Classify each answer, HDBSCAN-cluster the encoder embeddings per question, then roll up the cross-question and cohort-level patterns.
- 03 · Dashboard
Per-question + cohort trend
Vite/React with i18n (nl/hu/en) — teachers see misconception density per question and the trend across cohorts side by side.
- 04 · Pedagogy
Action card per misconception tag
Every tag links to a what-it-means · why-it-happens · what-to-do card — the radar surfaces the pattern, the card tells the teacher how to respond.
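Once every answer carries its tags, the per-question density that the dashboard shows is a small roll-up. This sketch assumes "density" means the share of a question's answers that carry a given tag, and that a question's dominant tag is the highest share above a hypothetical `min_share` floor. `tag_density` and `dominant_tag` are illustrative names, not the studio's API.

```python
from collections import Counter, defaultdict


def tag_density(answers):
    """answers: iterable of (question_id, set_of_tags), one pair per student answer.

    Returns {question_id: {tag: share of that question's answers carrying the tag}}.
    Answers with no tags still count toward the question's total.
    """
    totals = Counter()
    hits = defaultdict(Counter)
    for qid, tags in answers:
        totals[qid] += 1
        hits[qid].update(tags)
    return {
        qid: {tag: n / totals[qid] for tag, n in hits[qid].items()}
        for qid in totals
    }


def dominant_tag(density, min_share=0.25):
    """Highest-share tag for one question, or None if nothing clears min_share."""
    if not density:
        return None
    tag, share = max(density.items(), key=lambda kv: kv[1])
    return tag if share >= min_share else None
```

Counting untagged answers in the denominator is a deliberate choice in this sketch: a tag that shows up in 3 of 4 answers reads as 0.75, not 1.0, so a clean class does not look saturated.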
From the video
Frame by frame
Frame 01 · Per-question panel · misconception tags lit up
The teacher sees, per question, which misconception meta-tags dominate this cohort — not who answered wrongly, but the reasoning pattern under it.
Frame 02 · Thresholded tags · bounded per-tag scoring
Bounded per-tag thresholds keep the radar honest: a noisy answer doesn't light up every tag, and the dominant tag stays dominant.
Frame 03 · Cluster view · structural patterns across questions
HDBSCAN on the encoder's embeddings finds the misconception that keeps reappearing across different questions — the curriculum-level signal a per-question view misses.
Frame 04 · Cohort trend · the curriculum signal
Trends across cohorts make curriculum gaps obvious — if the same misconception dominates three classes in a row, the textbook chapter is the lever, not extra practice.
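The "three classes in a row" rule from the cohort-trend frame can be expressed as a run-length check. This sketch assumes each cohort has already been reduced to a single dominant tag (for example, the most frequent tag across that cohort's answers), listed in chronological order, with `None` for a cohort where nothing dominates. `persistent_tags` and `run_length` are hypothetical names.

```python
def persistent_tags(dominant_by_cohort, run_length=3):
    """Return tags that were dominant in at least run_length consecutive cohorts.

    dominant_by_cohort: chronological list of each cohort's dominant tag (or None).
    A None entry breaks any running streak.
    """
    flagged = set()
    current, streak = None, 0
    for tag in dominant_by_cohort:
        if tag is not None and tag == current:
            streak += 1
        else:
            current, streak = tag, (1 if tag is not None else 0)
        if current is not None and streak >= run_length:
            flagged.add(current)
    return flagged
```

A tag flagged here is the candidate for the curriculum-level response the frame describes (revisit the textbook chapter) rather than another round of practice.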
THE PROBLEM
- −Multiple-choice tells you 'wrong' · not WHY they were wrong
- −Open-ended answers are too time-consuming to read for patterns
- −'More practice' is the default action when teachers can't see the pattern
WHAT THE CLIENT GOT
- The class's dominant reasoning errors are visible per question
- Every error pattern comes with a concrete teacher response
- Cohort trends make curriculum gaps obvious
WHAT WE DELIVERED
- +Multi-task misconception classifier · NL · EN · HU
- +8 reasoning-error meta-tags · with a teacher action per tag
- +HDBSCAN clustering for cross-question structural patterns
- +Cohort-level trend dashboard
- +Pedagogy library · what it means, why it happens, what to do
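To make the cross-question clustering concrete: HDBSCAN assigns each embedding a cluster label and marks points it cannot place as noise with the label `-1`. Assuming answers from several questions are embedded and clustered together (for instance with scikit-learn's `HDBSCAN(...).fit_predict(...)`, available since version 1.3), a cluster whose members come from several different questions is a candidate structural pattern. The function below is a hypothetical post-processing step on those labels, not the studio's pipeline.

```python
from collections import defaultdict

NOISE = -1  # HDBSCAN labels points it cannot assign to any cluster as -1


def cross_question_clusters(question_ids, labels, min_questions=2):
    """Keep clusters whose members span at least min_questions distinct questions.

    question_ids[i] and labels[i] describe the same answer; labels are assumed to
    come from one HDBSCAN fit over encoder embeddings pooled across questions.
    Returns {cluster_label: set_of_question_ids}.
    """
    if len(question_ids) != len(labels):
        raise ValueError("question_ids and labels must have the same length")
    spans = defaultdict(set)
    for qid, label in zip(question_ids, labels):
        if label != NOISE:
            spans[label].add(qid)
    return {label: qs for label, qs in spans.items() if len(qs) >= min_questions}
```

Dropping noise before grouping matters here: without it, every unclusterable answer would pile into one fake "pattern" spanning all questions.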
STACK
- Python
- PyTorch
- Transformers
- FastAPI
- React