DField Solutions · Engineering studio · Budapest

ClarixAI

A misconception-pattern radar for teachers · open-ended student answers in, the reasoning errors dominating a cohort out.

ClarixAI reads open-ended student answers (Dutch, English, Hungarian) and surfaces the misconception patterns dominating a class (concept confusion, surface-pattern matching, missing precondition, and five others), with a concrete pedagogical action card per pattern. It does not grade students; it grades the class's reasoning. A per-language multi-task transformer ensemble plus HDBSCAN clustering finds structural patterns across questions.

CASE STUDY · 2026

A wrong multiple-choice answer tells you nothing about the reasoning behind it. We built a radar that reads open-ended answers and tells the teacher which misconceptions are running the room.

ClarixAI processes open-ended student answers in Dutch, English and Hungarian through a per-language transformer ensemble (RobBERT-2023, RoBERTa-base, huBERT) with a 1-vs-rest sigmoid head over eight misconception meta-tags. HDBSCAN clustering on encoder embeddings surfaces structural patterns across questions; the dashboard turns each tag into a what-it-means / why-it-happens / what-to-do card. The studio shipped the training pipeline, the inference service, and the teacher dashboard.
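
As a rough illustration of the scoring step described above, the sketch below averages per-tag sigmoid probabilities over several checkpoints and keeps every tag whose probability clears its threshold. It is a minimal sketch under stated assumptions, not the shipped code: only three of the eight tag names appear in this case study, so `tag_4` to `tag_8` are placeholders, and the clamp band in `bounded` is one possible reading of "bounded per-tag thresholds" with made-up limits.

```python
import math
from statistics import fmean

# Only the first three names come from the case study; the rest are placeholders.
TAGS = [
    "concept_confusion",
    "surface_pattern_matching",
    "missing_precondition",
    "tag_4", "tag_5", "tag_6", "tag_7", "tag_8",
]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def ensemble_probs(logits_per_checkpoint: list[list[float]]) -> list[float]:
    """Average the per-tag sigmoid probabilities over all checkpoints."""
    per_ckpt = [[sigmoid(z) for z in logits] for logits in logits_per_checkpoint]
    return [fmean(column) for column in zip(*per_ckpt)]


def bounded(threshold: float, lo: float = 0.3, hi: float = 0.7) -> float:
    """Clamp a tuned threshold into [lo, hi]; the bounds are hypothetical."""
    return min(max(threshold, lo), hi)


def active_tags(probs: list[float], thresholds: list[float]) -> list[str]:
    """1-vs-rest decision: each tag is switched on independently of the others."""
    return [
        tag
        for tag, p, th in zip(TAGS, probs, thresholds)
        if p >= bounded(th)
    ]


# Two hypothetical checkpoints scoring one answer (raw logits per tag).
example_logits = [
    [2.0, -1.0, 0.0, -3.0, -3.0, -3.0, -3.0, -3.0],
    [1.0, -2.0, 0.5, -3.0, -3.0, -3.0, -3.0, -3.0],
]
example_probs = ensemble_probs(example_logits)
example_tags = active_tags(example_probs, [0.5] * len(TAGS))
```

Because each tag has its own sigmoid rather than sharing a softmax, one answer can carry several tags or none, which fits a setting where a single answer may show more than one reasoning error.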

DELIVERY · TRAINED
STACK · PyTorch · Transformers · FastAPI · React
LANGUAGES · NL · EN · HU
Anonymous client

Our open-ended questions used to be the part of the assessment we'd skim, because reading 60 answers for a pattern was a whole evening. The team built a system that does the reading for us and tells us, per question, which misconception is running the room — and what to do about it, not just that it's there. We stopped over-prescribing 'more practice'. We started reteaching the right thing.

Anonymous · Researcher · education-AI platform (under NDA)
3 · Languages · NL EN HU
8 · Misconception meta-tags
4-ckpt · Ensemble · NL prod
HDBSCAN · Cross-question clustering

What's on screen

Frame breakdown
ClarixAI · misconception-pattern radar for teachers
  • 01 · User surface

    The whole experience the user sees

    This frame shows the live product: a misconception-pattern radar for teachers · open-ended student answers in, the reasoning errors dominating a cohort out. Every component is ours · scope, design, code, deploy.

  • 02 · Stack behind the screen

    What's powering it: Python, PyTorch, Transformers

    Five stack components run behind this frame · React renders the dashboard, FastAPI serves inference, and Python, PyTorch and Transformers power the models. All studio-owned.

  • 03 · What we shipped

    Multi-task misconception classifier · NL · EN · HU

    The class's dominant reasoning errors are visible per question

  • 04 · Status

    Private deploy · under NDA.

    Per the client's request the URL stays private · the build, architecture, and lessons can be shared in a scoping call.

How it shipped

Timeline
  • 01 · BRIEF

    Pin the unit of analysis as the reasoning behind the answer — not the answer itself.

    Decision the product rests on: the system never grades students. It surfaces the misconceptions running through the class so the teacher knows which question to re-explain, and how.

  • 02 · ARCHITECTURE

    Stack decisions before any code.

    The decision doc captured the data flow, the role split between Python, PyTorch, Transformers and FastAPI, and the failure modes we'd handle in v1 versus defer. Cross-service boundaries (where AI ends and the web app begins) were drawn here so neither side leaked into the other later.

  • 03 · BUILD

    Per-language transformer + anti-overfit recipe + ensemble for prod.

    RobBERT-2023 for Dutch, RoBERTa-base for English, huBERT for Hungarian — trained with the v5.0 anti-overfit recipe (frozen lower layers, focal loss, SupCon, weighted gold). NL ships as a 4-checkpoint ensemble (v6.2) with bounded per-tag thresholds + a k-NN prototype blend; EN matches Dutch on gold F1 and beats it on adversarial argmax.

  • 04 · POLISH

    Performance, accessibility, and observability.

    PSI / a11y / coverage budgets enforced as launch gates. Logging + metrics wired before cut-over · the team can answer 'is it working?' from a dashboard, not a Slack thread. Threat-model checklist signed off before traffic hits the box.

  • 05 · SHIP

    FastAPI inference + Vite/React dashboard, three languages live.

    Inference clusters per-question answers via HDBSCAN over the encoder's embeddings, surfaces structural cross-question patterns and per-cohort trends, and feeds the pedagogical knowledge base that turns each tag into a teacher action card.
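
The per-question clustering step can be pictured with a small roll-up like the one below. It is a hedged sketch, not the production service: it assumes the cluster labels have already been produced by HDBSCAN over the encoder embeddings (HDBSCAN marks unclustered points with the label -1) and that each answer already carries its predicted tags. The function name `summarize_clusters` and the example data are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Optional

NOISE = -1  # HDBSCAN assigns -1 to answers that belong to no cluster


def summarize_clusters(
    cluster_labels: list[int],
    answer_tags: list[list[str]],
) -> dict[int, dict[str, Optional[object]]]:
    """For each non-noise cluster, report its size and most frequent tag."""
    tag_counts: dict[int, Counter] = defaultdict(Counter)
    sizes: Counter = Counter()
    for label, tags in zip(cluster_labels, answer_tags):
        if label == NOISE:
            continue  # noise answers are left out of the summary
        sizes[label] += 1
        tag_counts[label].update(tags)

    summary: dict[int, dict[str, Optional[object]]] = {}
    for label, size in sizes.items():
        top = tag_counts[label].most_common(1)
        summary[label] = {
            "size": size,
            "dominant_tag": top[0][0] if top else None,
        }
    return summary


# Six hypothetical answers to one question.
example_labels = [0, 0, 1, -1, 0, 1]
example_answer_tags = [
    ["concept_confusion"],
    ["concept_confusion", "missing_precondition"],
    ["surface_pattern_matching"],
    ["missing_precondition"],
    ["concept_confusion"],
    [],
]
example_summary = summarize_clusters(example_labels, example_answer_tags)
```

Running the same summary for every question and comparing the dominant tags is one simple way to see a pattern that repeats across questions.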

What shipped

04
  • 01 · Model

    Per-language ensemble · 8 misconception tags

    RobBERT / RoBERTa / huBERT with a 1-vs-rest sigmoid head — concept confusion, surface pattern, missing precondition + five more.

  • 02 · Pipeline

    Classify → cluster → trend

    Classify each answer, HDBSCAN-cluster the encoder embeddings per question, then roll up the cross-question and cohort-level patterns.

  • 03 · Dashboard

    Per-question + cohort trend

    Vite/React with i18n (nl/hu/en) — teachers see misconception density per question and the trend across cohorts side by side.

  • 04 · Pedagogy

    Action card per misconception tag

    Every tag links to a what-it-means · why-it-happens · what-to-do card — the radar surfaces the pattern, the card tells the teacher how to respond.
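
One way to picture the pedagogy library is as a lookup from tag to a three-part card. The sketch below is an assumption about shape only: the field names mirror the what-it-means / why-it-happens / what-to-do structure described above, while the class name `ActionCard`, the helper `cards_for`, and the card wording are illustrative placeholders, not content from the client's library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ActionCard:
    tag: str
    what_it_means: str
    why_it_happens: str
    what_to_do: str


# Illustrative placeholder text; the real card content is not part of this case study.
CARD_LIBRARY = {
    "missing_precondition": ActionCard(
        tag="missing_precondition",
        what_it_means="A rule is applied without checking that its conditions hold.",
        why_it_happens="The rule was practised only on cases where it always applies.",
        what_to_do="Contrast one case where the rule applies with one where it does not.",
    ),
}


def cards_for(tags: list[str], library: dict[str, ActionCard]) -> list[ActionCard]:
    """Return the cards for the given tags, skipping tags without a card."""
    return [library[tag] for tag in tags if tag in library]


example_cards = cards_for(["missing_precondition", "tag_without_card"], CARD_LIBRARY)
```

Keeping the cards as plain data, separate from the model, means the wording can be revised by educators without retraining anything.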

From the video

Frame by frame
  • ClarixAI dashboard with analysed student-answer panels
    01 · Frame

    Per-question panel · misconception tags lit up

    The teacher sees, per question, which misconception meta-tags dominate this cohort — not who answered wrongly, but the reasoning pattern under it.

  • Second analysed-answers panel with thresholded tags
    02 · Frame

    Thresholded tags · bounded per-tag scoring

    Bounded per-tag thresholds keep the radar honest — a noisy answer doesn't smear every tag, the dominant one stays the dominant one.

  • Cluster view across questions
    03 · Frame

    Cluster view · structural patterns across questions

    HDBSCAN on the encoder's embeddings finds the misconception that keeps reappearing across different questions — the curriculum-level signal a per-question view misses.

  • Cohort trend dashboard
    04 · Frame

    Cohort trend · the curriculum signal

    Trends across cohorts make curriculum gaps obvious — if the same misconception dominates three classes in a row, the textbook chapter is the lever, not extra practice.
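
The "three classes in a row" rule from the cohort-trend frame can be expressed as a simple streak check. This is a minimal sketch under assumptions: the input is the single dominant tag per cohort in chronological order, and the function name `dominant_streak` and the default run length are hypothetical rather than taken from the dashboard.

```python
def dominant_streak(dominant_by_cohort: list[str], min_run: int = 3) -> set[str]:
    """Return tags that dominate at least `min_run` consecutive cohorts."""
    flagged: set[str] = set()
    run_tag, run_len = None, 0
    for tag in dominant_by_cohort:
        if tag == run_tag:
            run_len += 1
        else:
            run_tag, run_len = tag, 1  # a different tag starts a new run
        if run_len >= min_run:
            flagged.add(tag)
    return flagged


# Dominant tag per cohort, oldest first (hypothetical data).
example_history = [
    "surface_pattern_matching",
    "missing_precondition",
    "missing_precondition",
    "missing_precondition",
    "concept_confusion",
]
example_flagged = dominant_streak(example_history)
```

A flagged tag points at a curriculum-level cause, while a tag that dominates only one cohort is more likely specific to that class.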

2026 · YEAR
03 · SERVICES
05 · TECHNOLOGIES
PRIVATE · STATUS

THE PROBLEM

  • Multiple-choice tells you 'wrong' · not WHY they were wrong
  • Open-ended answers are too time-consuming to read for patterns
  • 'More practice' is the default action when teachers can't see the pattern

WHAT THE CLIENT GOT

  • The class's dominant reasoning errors are visible per question
  • Every error pattern comes with a concrete teacher response
  • Cohort trends make curriculum gaps obvious

WHAT WE DELIVERED

  • Multi-task misconception classifier · NL · EN · HU
  • 8 reasoning-error meta-tags · with a teacher action per tag
  • HDBSCAN clustering for cross-question structural patterns
  • Cohort-level trend dashboard
  • Pedagogy library · what it means, why it happens, what to do

STACK

  • Python
  • PyTorch
  • Transformers
  • FastAPI
  • React