SHIPPED WORK · 2026


ClarixAI

A misconception-pattern radar for teachers · open-ended student answers in, the reasoning errors dominating a cohort out.

ClarixAI reads open-ended student answers (Dutch, English, Hungarian) and surfaces the misconception patterns dominating a class - concept confusion, surface-pattern matching, missing precondition, and five others - with a concrete pedagogical action card per pattern. It does not grade students; it grades the class's reasoning. A per-language multi-task transformer ensemble plus HDBSCAN clustering finds structural patterns across questions.


CASE STUDY · 2026

A wrong multiple-choice answer tells you nothing about the reasoning behind it. We built a radar that reads open-ended answers and tells the teacher which misconceptions are running the room.

ClarixAI processes open-ended student answers in Dutch, English and Hungarian through a per-language transformer ensemble (RobBERT-2023, RoBERTa-base, huBERT) with a 1-vs-rest sigmoid head over eight misconception meta-tags. HDBSCAN clustering on encoder embeddings surfaces structural patterns across questions; the dashboard turns each tag into a what-it-means / why-it-happens / what-to-do card. The studio shipped the training pipeline, the inference service, and the teacher dashboard.
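
A 1-vs-rest sigmoid head scores each tag independently, so one answer can carry several misconception tags or none. A minimal sketch in plain Python, assuming the raw logits have already come out of the encoder; the five tag names not listed on this page, the example logits, and the 0.5 cut-off are placeholders, not the shipped values:

```python
import math

# Three tag names come from the case study; the other five are not listed
# there, so they stay as neutral placeholders here.
TAGS = [
    "concept_confusion",
    "surface_pattern",
    "missing_precondition",
    "tag_4", "tag_5", "tag_6", "tag_7", "tag_8",
]


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def score_answer(logits, threshold=0.5):
    """Return every tag whose independent sigmoid probability clears the threshold."""
    if len(logits) != len(TAGS):
        raise ValueError("expected one logit per tag")
    probs = {tag: sigmoid(z) for tag, z in zip(TAGS, logits)}
    return {tag: p for tag, p in probs.items() if p >= threshold}


# Hypothetical logits for one student answer.
active = score_answer([2.0, -1.0, 0.3, -3.0, -2.5, -4.0, -1.5, -2.0])
```

Because each tag gets its own sigmoid rather than sharing a softmax, the probabilities do not have to sum to one, which is what lets an answer show two reasoning errors at once.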

DELIVERY · TRAINED
STACK · PyTorch · Transformers · FastAPI · React
LANGUAGES · NL · EN · HU

Our open-ended questions used to be the part of the assessment we'd skim, because reading 60 answers for a pattern was a whole evening. The team built a system that does the reading for us and tells us, per question, which misconception is running the room - and what to do about it, not just that it's there. We stopped over-prescribing 'more practice'. We started reteaching the right thing.

Anonymous · Researcher · education-AI platform (under NDA)

3 · Languages · NL EN HU

8 · Misconception meta-tags

4-ckpt · Ensemble · NL prod

HDBSCAN · Cross-question clustering

What's on screen

Frame breakdown

ClarixAI · misconception-pattern radar for teachers

01 · User surface
The whole experience the user sees
This frame shows the live product: a misconception-pattern radar for teachers, with open-ended student answers in and the reasoning errors dominating a cohort out. Every component is ours: scope, design, code, deploy.
02 · Stack behind the screen
What's powering it: Python, PyTorch, Transformers
Five stack components run behind this frame: PyTorch and Transformers (on Python) power the models, FastAPI serves inference, and React renders the dashboard. All studio-owned.
03 · What we shipped
Multi-task misconception classifier · NL · EN · HU
The class's dominant reasoning errors are visible per question
04 · Status
Private deploy · under NDA.
At the client's request the URL stays private; the build, architecture, and lessons can be shared in a scoping call.

How it shipped

Timeline

01 · BRIEF
Pin the unit of analysis as the reasoning behind the answer - not the answer itself.
The decision the product rests on: the system never grades students. It surfaces the misconceptions running through the class so the teacher knows which question to re-explain, and how.
02 · ARCHITECTURE
Stack decisions before any code.
The decision doc captured the data flow, the role split between Python, PyTorch, Transformers and FastAPI, and the failure modes we'd handle in v1 versus defer. Cross-service boundaries (where AI ends and the web app begins) were drawn here so neither side leaked into the other later.
03 · BUILD
Per-language transformer + anti-overfit recipe + ensemble for prod.
RobBERT-2023 for Dutch, RoBERTa-base for English, huBERT for Hungarian - trained with the v5.0 anti-overfit recipe (frozen lower layers, focal loss, supervised contrastive loss (SupCon), weighted gold). NL ships as a 4-checkpoint ensemble (v6.2) with bounded per-tag thresholds plus a k-NN prototype blend; EN matches Dutch on gold F1 and beats it on adversarial argmax.
04 · POLISH
Performance, accessibility, and observability.
PageSpeed Insights (PSI), accessibility (a11y), and coverage budgets enforced as launch gates. Logging and metrics were wired before cut-over, so the team can answer 'is it working?' from a dashboard, not a Slack thread. Threat-model checklist signed off before traffic hits the box.
05 · SHIP
FastAPI inference + Vite/React dashboard, three languages live.
Inference clusters per-question answers via HDBSCAN over the encoder's embeddings, surfaces structural cross-question patterns and per-cohort trends, and feeds the pedagogical knowledge base that turns each tag into a teacher action card.
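
The build step above names a 4-checkpoint ensemble with bounded per-tag thresholds and a k-NN prototype blend. One way these pieces could fit together is sketched below. The averaging rule, the blend weight `alpha`, the bounds `0.3` and `0.7`, and every number are assumptions for illustration, not the v6.2 configuration; the k-NN scores are taken as given inputs, and three tags stand in for eight to keep the example short:

```python
from statistics import fmean


def ensemble_mean(checkpoint_probs):
    """Average per-tag probabilities across checkpoints (simple soft voting)."""
    return [fmean(tag_probs) for tag_probs in zip(*checkpoint_probs)]


def blend(model_probs, knn_probs, alpha=0.8):
    """Mix ensemble probabilities with per-tag k-NN prototype scores.

    alpha is a hypothetical weight on the model side.
    """
    return [alpha * m + (1.0 - alpha) * k for m, k in zip(model_probs, knn_probs)]


def bound_thresholds(raw_thresholds, lo=0.3, hi=0.7):
    """Clamp tuned per-tag thresholds into [lo, hi] so none drifts to an extreme."""
    return [min(max(t, lo), hi) for t in raw_thresholds]


def decide(probs, thresholds):
    """One boolean per tag: does the blended score clear that tag's threshold?"""
    return [p >= t for p, t in zip(probs, thresholds)]


# Four checkpoints, three tags each (made-up probabilities).
checkpoints = [
    [0.9, 0.2, 0.5],
    [0.8, 0.1, 0.6],
    [0.7, 0.3, 0.4],
    [0.8, 0.2, 0.5],
]
mean_probs = ensemble_mean(checkpoints)
blended = blend(mean_probs, knn_probs=[1.0, 0.0, 0.5])
thresholds = bound_thresholds([0.95, 0.05, 0.4])
flags = decide(blended, thresholds)
```

Under these assumptions, clamping matters most for rare tags: a threshold tuned on a handful of validation examples can land near 0 or 1, and the bounds keep such a tag from firing on everything or on nothing.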

What shipped

01 · Model
Per-language ensemble · 8 misconception tags
RobBERT / RoBERTa / huBERT with a 1-vs-rest sigmoid head - concept confusion, surface pattern, missing precondition + five more.
02 · Pipeline
Classify → cluster → trend
Classify each answer, HDBSCAN-cluster the encoder embeddings per question, then roll up the cross-question and cohort-level patterns.
03 · Dashboard
Per-question + cohort trend
Vite/React with i18n (nl/hu/en) - teachers see misconception density per question and the trend across cohorts side by side.
04 · Pedagogy
Action card per misconception tag
Every tag links to a what-it-means · why-it-happens · what-to-do card - the radar surfaces the pattern, the card tells the teacher how to respond.
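
The roll-up after classification and the action-card lookup can be sketched with plain dictionaries. The clustering step is left out here; this covers only the counting that follows it. The record layout, the question ids, and the card text are hypothetical; only the three-part card shape (what it means, why it happens, what to do) comes from the description above:

```python
from collections import Counter, defaultdict

# One record per classified answer: (question_id, tags assigned to that answer).
answers = [
    ("q1", ["concept_confusion"]),
    ("q1", ["concept_confusion", "missing_precondition"]),
    ("q1", []),
    ("q2", ["surface_pattern"]),
    ("q2", ["concept_confusion"]),
]


def tag_density(records):
    """Share of answers per question that carry each tag."""
    totals = Counter()
    hits = defaultdict(Counter)
    for qid, tags in records:
        totals[qid] += 1
        hits[qid].update(set(tags))  # count a tag at most once per answer
    return {
        qid: {tag: n / totals[qid] for tag, n in hits[qid].items()}
        for qid in totals
    }


def dominant_tag(density_for_question):
    """Most frequent tag for one question, or None if no tag fired."""
    if not density_for_question:
        return None
    return max(density_for_question, key=density_for_question.get)


def cross_question_tags(density, min_questions=2):
    """Tags that show up in at least `min_questions` different questions."""
    seen = Counter(tag for tags in density.values() for tag in tags)
    return sorted(tag for tag, n in seen.items() if n >= min_questions)


# Placeholder card; real card text would come from the pedagogy library.
ACTION_CARDS = {
    "concept_confusion": {
        "what_it_means": "(placeholder) short definition of the pattern",
        "why_it_happens": "(placeholder) likely cause",
        "what_to_do": "(placeholder) suggested teacher response",
    },
}

density = tag_density(answers)
top_q1 = dominant_tag(density["q1"])
card = ACTION_CARDS.get(top_q1)
recurring = cross_question_tags(density)
```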

From the video

Frame by frame

01 · Frame
Per-question panel · misconception tags lit up
The teacher sees, per question, which misconception meta-tags dominate this cohort - not who answered wrongly, but the reasoning pattern under it.
02 · Frame
Thresholded tags · bounded per-tag scoring
Bounded per-tag thresholds keep the radar honest - a noisy answer doesn't smear every tag, the dominant one stays the dominant one.
03 · Frame
Cluster view · structural patterns across questions
HDBSCAN on the encoder's embeddings finds the misconception that keeps reappearing across different questions - the curriculum-level signal a per-question view misses.
04 · Frame
Cohort trend · the curriculum signal
Trends across cohorts make curriculum gaps obvious - if the same misconception dominates three classes in a row, the textbook chapter is the lever, not extra practice.
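
The "three classes in a row" rule from the cohort-trend frame amounts to a streak check over each cohort's dominant tag. A small sketch, assuming the dominant tag per cohort is already known and listed in chronological order; the function names and the `min_run` default of 3 are illustrative choices, not the dashboard's actual logic:

```python
def longest_run(dominant_by_cohort):
    """Return (tag, length) of the longest streak of consecutive cohorts
    that share the same dominant tag. None entries break a streak."""
    best_tag, best_len = None, 0
    run_tag, run_len = None, 0
    for tag in dominant_by_cohort:
        if tag is not None and tag == run_tag:
            run_len += 1
        else:
            run_tag, run_len = tag, (1 if tag is not None else 0)
        if run_len > best_len:
            best_tag, best_len = run_tag, run_len
    return best_tag, best_len


def curriculum_flag(dominant_by_cohort, min_run=3):
    """Tag to raise as a curriculum-level signal, or None if no streak is long enough."""
    tag, length = longest_run(dominant_by_cohort)
    return tag if length >= min_run else None


# Hypothetical history: dominant tag for four consecutive cohorts.
history = [
    "surface_pattern",
    "concept_confusion",
    "concept_confusion",
    "concept_confusion",
]
flag = curriculum_flag(history)
```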

2026 · YEAR

03 · SERVICES

05 · TECHNOLOGIES

PRIVATE · STATUS

THE PROBLEM

− Multiple-choice tells you an answer is wrong, not why it is wrong
− Open-ended answers are too time-consuming to read for patterns
− 'More practice' is the default action when teachers can't see the pattern

WHAT THE CLIENT GOT

The class's dominant reasoning errors are visible per question
Every error pattern comes with a concrete teacher response
Cohort trends make curriculum gaps obvious

WHAT WE DELIVERED

+ Multi-task misconception classifier · NL · EN · HU
+ 8 reasoning-error meta-tags · with a teacher action per tag
+ HDBSCAN clustering for cross-question structural patterns
+ Cohort-level trend dashboard
+ Pedagogy library · what it means, why it happens, what to do

STACK

Python
PyTorch
Transformers
FastAPI
React

