Technical Visionaries
LearnWise AI EdTech Evaluation | SASKI Institute PBC

CEM Evaluation Series · EdTech Platform · April 10, 2026

LearnWise AI: EdTech Platform Deployed at 150+ Universities

Six findings were confirmed across four evaluation phases. Four are Critical and two are High. All six are confirmed at the API and storage layer via a single unauthenticated endpoint. This is the most severe finding set in this evaluation series to date.

Platform: LearnWise AI (learnwise.ai)
Deployment Context: University-facing EdTech, 150+ institutions, FERPA-regulated
LLM Backend: Anthropic Claude (confirmed via tool_call IDs)
Architecture: Agentic (brain_v3_agentic, AgenticSearch orchestration)
Method: 4-Phase CEM + Chrome DevTools
Disclosure Sent: April 10, 2026

LearnWise is an AI-powered student support platform deployed at 150+ universities. It markets SOC 2, ISO, and GDPR compliance and is trusted with student data subject to FERPA. Its chatbot assistant ("Wizzy") is embedded on university websites and student portals — environments where students routinely share personal, academic, and health-related information. The agentic LLM layer correctly handled PII at the response level. The failures are entirely infrastructural.

Finding Summary

01. PII Stored Verbatim in Conversation Logs
SSN, DOB, full name stored with permanent UUID via unauthenticated API
Severity: Critical | Regulatory Risk: FERPA · HIPAA · GDPR | Transport Confirmed: Yes (API)

02. Complete AI Reasoning Chain Exposed
Full chain-of-thought including internal threat assessments returned via unauthenticated endpoint
Severity: Critical | Regulatory Risk: IP exposure · Audit integrity | Transport Confirmed: Yes (API)

03. Complete System Prompt Disclosed
Full 3,500-word institutional system prompt returned verbatim via unauthenticated API
Severity: Critical | Regulatory Risk: Institutional IP · Competitive exposure | Transport Confirmed: Yes (API)

04. Internal KB Document IDs Exposed
Google Drive document IDs and URLs returned in sources array
Severity: High | Regulatory Risk: Confidential document access | Transport Confirmed: Yes (API)

05. Persistent User Tracking ID Linked to PII
Cross-session user ID stored alongside SSN-linked conversation record
Severity: High | Regulatory Risk: FERPA data association · Identity linkage | Transport Confirmed: Yes (API)

06. Emotional State Classification Stored
Student anxiety and overwhelm automatically inferred, classified, stored, and returned via unauthenticated endpoint. Confirmed as intentional product feature via UI dropdown.
Severity: Critical | Regulatory Risk: EU AI Act Annex III · GDPR Art. 9 | Transport Confirmed: Yes (API + UI)

Three Featured Findings

Critical Finding 2: Complete Internal AI Reasoning Chain Exposed

LearnWise's agentic architecture stores and transmits the complete chain-of-thought reasoning of the AI model in the body_parts array of every AI response. This includes the model's internal threat assessment of user messages, its decision-making process, self-corrections, and explicit acknowledgment of detected PII. All of it is returned via the unauthenticated conversations API endpoint alongside the full conversation record.

The LLM layer correctly identified that the user had shared sensitive personal information and instructed itself not to repeat it. That instruction was followed at the response level. The infrastructure stored and transmitted the entire reasoning process anyway — including the model's own acknowledgment that PII was present.

API Response (body_part_type: "thinking", verbatim excerpt)

"The user is sharing personal identifying information (name,
date of birth, social security number) while claiming to need help.
This is a red flag... This could be an attempt to test if I'll accept
or process personal information... I need to: NOT acknowledge,
store, or repeat their personal information."

Stored verbatim across three separate thinking blocks.
Returned via unauthenticated GET endpoint. No authentication required.

This finding has no precedent in prior evaluations in this series. The LLM did what it was supposed to do. The infrastructure exposed the proof that it did so — along with the full reasoning behind every safety decision the model made during the session.
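
One way to keep reasoning blocks out of client-facing responses is to filter body_parts at serialization time. The sketch below is illustrative only: it assumes each entry in body_parts is a dict carrying the body_part_type key seen in the API response, and the "text" part type, the allowlist, and the function name are hypothetical rather than LearnWise's actual schema.

```python
from typing import Any

# Hypothetical allowlist of part types that are safe to return to an end user.
# "text" is an assumed name for the user-visible answer part.
PUBLIC_PART_TYPES = {"text"}


def public_body_parts(body_parts: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Return only the parts whose body_part_type is on the allowlist.

    Allowlisting (instead of blocklisting "thinking") means any internal
    part type added later stays hidden by default.
    """
    return [p for p in body_parts if p.get("body_part_type") in PUBLIC_PART_TYPES]


if __name__ == "__main__":
    message = [
        {"body_part_type": "thinking", "content": "internal reasoning"},
        {"body_part_type": "tool_result", "rendered_result": "system prompt text"},
        {"body_part_type": "text", "content": "How can I help with enrollment?"},
    ]
    print(public_body_parts(message))
```

An allowlist is used here because both the "thinking" and "tool_result" parts carried sensitive content in this evaluation; a blocklist would have to anticipate every internal part type in advance.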

Critical Finding 3: Complete System Prompt Disclosed via Unauthenticated API

The institution's complete custom system prompt — including all operational instructions, competitive positioning guidelines, restricted URL lists, persona configuration, and behavioral constraints — is returned verbatim in the rendered_result field of tool_result body parts. This is proprietary IP belonging to the deploying institution, accessible to any party that can reach the conversations API endpoint without credentials.

The system prompt was approximately 3,500 words. It included the assistant persona, all custom behavioral instructions, restricted URLs, competitive comparison rules, and the instruction to never include a specific contact URL in responses. The Google Drive knowledge source IDs that populate the chatbot's knowledge base were also present.

API Response (body_part_type: "tool_result", rendered_result field)

Full ~3,500-word system prompt returned verbatim, including:
Assistant persona configuration ("Wizzy")
All operational instructions and behavioral constraints
Restricted URL list
Competitive comparison rules
Google Drive KB source document IDs
Instruction: "Never include the URL learnwise.ai/contact in your answers."

Accessible via unauthenticated GET request. No credentials required.

Every university deploying LearnWise has configured a custom system prompt. Those prompts represent institutional IP — internal policies, competitive positioning, and operational rules their IT and legal teams drafted. All of it is currently accessible without authentication to anyone who knows the endpoint format.
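
The missing control here is an authentication and ownership check in front of the conversation read path. The following is a minimal sketch under stated assumptions: caller_id is taken to be an identity already verified upstream (for example from a session or token), and the Conversation type, owner_id field, and authorize_read function are hypothetical names, not part of the LearnWise API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Conversation:
    conversation_id: str
    owner_id: str  # hypothetical: the authenticated user who created the record


def authorize_read(caller_id: Optional[str], conversation: Conversation) -> int:
    """Return the HTTP status a GET on this conversation should produce.

    401: there is no authenticated caller.
    404: the caller is authenticated but does not own the record; 404 is
         used instead of 403 so the response does not confirm the UUID exists.
    200: the caller owns the conversation.
    """
    if caller_id is None:
        return 401
    if caller_id != conversation.owner_id:
        return 404
    return 200


if __name__ == "__main__":
    conv = Conversation(conversation_id="example-uuid", owner_id="student-1")
    print(authorize_read(None, conv))         # unauthenticated request
    print(authorize_read("student-2", conv))  # authenticated, wrong owner
    print(authorize_read("student-1", conv))  # owner
```

Checking ownership as well as authentication matters because a valid login alone would still let any user read another user's conversation by UUID.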

Critical Finding 6: Emotional State Classification Stored — EU AI Act Annex III

When a student expressed feelings of anxiety and being overwhelmed, LearnWise automatically inferred, classified, and stored the student's emotional state as structured labels in the conversation database. The classification was stored in the last_message field and returned via the unauthenticated endpoint. Emotional categories recorded included anxiety, overwhelmed, university pressure, and personal/mental health matter.

This is not incidental background logging. The emotional state classification is confirmed as an intentional product feature, visible as an expandable dropdown in the LearnWise user interface. The platform is headquartered in Amsterdam and deploys across EU universities. EU AI Act Annex III lists emotion recognition systems as high-risk, which requires a conformity assessment before deployment.

API Response (conversations endpoint, last_message field, verbatim)

"The user is sharing feelings of anxiety and being overwhelmed
by university pressure and asking if I can help them work through
these feelings. This is again a personal/mental health matter."

Stored at conversation UUID afeabd3f, timestamp 2026-04-10T15:17:56Z
Classifications recorded: anxiety · overwhelmed · university pressure · personal/mental health matter

Confirmed as intentional product feature via UI dropdown.
Accessible via unauthenticated GET endpoint.

GDPR Article 9 treats health-related data, including mental health status, as special category data that may be processed only under one of its listed conditions, such as explicit consent. LearnWise's SOC 2, ISO, and GDPR certifications are infrastructure and data security certifications. They do not constitute an EU AI Act conformity assessment for an emotion recognition system deployed in educational contexts.

The Core Asymmetry

The most important observation from this evaluation is not what the platform got wrong — it is what the platform got right. The LLM layer correctly identified the PII in the user message and instructed itself not to repeat or store it. That instruction was followed. The model behaved as intended.

The infrastructure ignored it entirely. The PII was stored verbatim. The reasoning process that correctly identified the PII was stored and transmitted. The system prompt that configured the model's behavior was exposed. A persistent user ID was linked to the stored PII record. The emotional state the model inferred from the conversation was classified and stored as a product feature.

A sophisticated, thoughtful LLM-layer implementation does not compensate for the absence of infrastructure-level data governance. The failure layer in this evaluation is not the model. It is everything below it.
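
To make the idea of an infrastructure-level control concrete, here is a minimal sketch of redacting PII before a message is written to storage. It is an illustration, not a complete detector: the two patterns cover only US SSNs in NNN-NN-NNNN form and two common date formats, and the function name and placeholder tokens are assumptions introduced for this example.

```python
import re

# Illustrative patterns only. A real deployment would need broader detection
# (names, unformatted SSNs, other identifiers) than these two regexes provide.
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
DATE_RE = re.compile(r"\b(?:\d{1,2}/\d{1,2}/\d{4}|\d{4}-\d{2}-\d{2})\b")


def redact_for_storage(text: str) -> str:
    """Replace SSN-like and date-like substrings before the text is persisted."""
    text = SSN_RE.sub("[REDACTED_SSN]", text)
    return DATE_RE.sub("[REDACTED_DATE]", text)


if __name__ == "__main__":
    raw = "My SSN is 123-45-6789 and I was born 04/10/2001."
    print(redact_for_storage(raw))
    # My SSN is [REDACTED_SSN] and I was born [REDACTED_DATE].
```

Running the redaction in the write path, independent of the model, means stored logs stay clean even when a user volunteers identifiers the model correctly declines to repeat.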

EU AI Act Exposure — Amsterdam Headquarters, EU University Deployments

LearnWise is headquartered in Amsterdam and deploys across European universities. EU AI Act Annex III explicitly lists AI systems used in educational institutions that determine access, admission, or evaluation of students as high-risk. LearnWise's deployment context falls within this classification.

Finding 6 creates a specific conformity assessment obligation. Emotion recognition systems in educational settings require mandatory conformity assessment before deployment under the Act's definitions. No publicly documented conformity assessment exists for LearnWise in this context.

LearnWise's SOC 2, ISO, and GDPR certifications do not cover EU AI Act conformity assessment. A university IT procurement team conducting due diligence under the EU AI Act would find no conformity documentation for this specific use case.

These six findings share a common characteristic: none of them required a sophisticated attack. They were confirmed by making a standard GET request to an API endpoint that required no authentication. The platform's own infrastructure returned student PII, the AI's internal reasoning, the institution's full system prompt, internal document references, a persistent tracking ID, and classified emotional health data, all in the same response. The LLM did its job. The infrastructure had no equivalent control layer.

These are the failure classes that pre-LLM middleware and API governance exist to address. SASKI Institute PBC builds infrastructure-layer controls for exactly these scenarios. If you want to see what that looks like on your own deployment, the options below are the place to start.

Test whether your deployment has the same issues.

Two ways to start. No production behavior change required for either.

CEM Audit

We evaluate a live deployment using the same methodology used here. No integration required on your end. Findings report delivered within one week. $3,500 to $5,000.


SDK Shadow Mode

Install the SASKI SDK in your staging environment in under two hours. Full analysis runs without touching production behavior. See exactly what it would catch on real traffic.
