wemeet-uni-bench — AI Benchmark for Korean University Students

§01 / Vision

It starts with two questions.

Question · 01

Does this AI actually help with university-level assignments, projects, and papers?

Question · 02

Does this AI guide students well so they learn and grow on their own?

M/01

Student-Centered Evaluation

We measure not "how smart it is" but "how much it actually helps me."

M/02

Right Cost, Right Performance

The most expensive model isn't always the best. We find the optimal value for your situation.

M/03

Empirical Verification via SAM

We directly compare and expose the gap between public benchmark scores and real student experience.

M/04

A Public Good, Built by Students

The problems I struggled with become part of the benchmark — helping other students too.

M/05

Korean Educational Context

We capture Korean academic writing style, Korean textbooks, and the way Korean students actually ask questions.

M/06

A Living Dataset

New problems flow in every semester — evolving with the times and curriculum.

§02 / Problem

Existing benchmarks have a blind spot.

On the difficulty spectrum, the segment for undergraduate years 2-4 is structurally empty. Between high school problems and PhD-level research, no one measures how students actually learn and work.

[Figure: Benchmark difficulty spectrum, illustrating the missing undergraduate segment]

01

Multiple choice only.

Real assignments are essays, coding projects, and reports — yet benchmarks only test picking the right answer from four options.

02

Only the final answer is graded.

Students want step-by-step solutions and explanations, but benchmarks call it done if the final answer is correct.

03

English only.

No benchmark measures Korean academic writing style or the context of Korean textbooks.

04

"Helpfulness" is never asked.

A correct answer ≠ a helpful answer. Explanatory power, references, and error acknowledgment — no one measures these.

§03 / Two Tracks

Two tracks, one question.

Student helpfulness is measured across two dimensions — Outcome and Process.

Track / 01 · NOW RECORDING

Academic Task Benchmark

Real university assignment benchmark

"Can I actually use this AI for my assignments?"

Outcome-Oriented

Major-specific problem-solving accuracy
Quality of step-by-step explanations
Coding assignment resolution
Paper summarization & structuring
Report writing assistance
Korean academic writing style

Track / 02 · NEXT PHASE

Learning Assistant Benchmark

AI as a learning tutor benchmark

"Does this AI help me learn better?"

Process-Oriented

Ability to explain concepts simply
Adapting to the student's level
Prompting independent thinking
Providing hints (rather than spoon-feeding answers)
Study plan creation
Motivation and feedback

Score Architecture

Student Helpfulness Score: composite helpfulness metric

Academic Task Score: real assignment helpfulness

Learning Assistant Score: learning guidance helpfulness
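
The page does not say how the two track scores combine into the composite. As a minimal sketch, assuming both track scores share a 0-100 scale and are combined by a weighted average, it could look like this. The names `TrackScores`, `student_helpfulness`, and `task_weight` are hypothetical, and the default equal weighting is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrackScores:
    academic_task: float       # Academic Task Score, assumed to be on a 0-100 scale
    learning_assistant: float  # Learning Assistant Score, assumed to be on a 0-100 scale


def student_helpfulness(scores: TrackScores, task_weight: float = 0.5) -> float:
    """Weighted average of the two track scores.

    task_weight is a hypothetical parameter: 0.5 weights both tracks equally,
    1.0 counts only the Academic Task Score.
    """
    if not 0.0 <= task_weight <= 1.0:
        raise ValueError("task_weight must be between 0 and 1")
    return (task_weight * scores.academic_task
            + (1.0 - task_weight) * scores.learning_assistant)
```

A weighted average keeps the composite on the same scale as its inputs, which makes it easy to read next to the two track scores. Other aggregations are equally plausible.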

§04 / Platform

What is SAM.

SAM (Smart AI Multiplexer) is an AI routing platform developed by SoonsoonFactory (순순팩토리). Through a single API, it calls 30+ AI models — GPT, Claude, Gemini, DeepSeek, and more — automatically selecting the optimal model based on purpose, cost, and performance.

This project runs all benchmark evaluations through SAM. SAM lets us submit the same problem to multiple models simultaneously under identical conditions, then compare and analyze the results.
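
The fan-out described above can be sketched as follows. SAM's actual API is not documented on this page, so the sketch takes the model call as a parameter: `call_model` is a hypothetical callable standing in for whatever client the platform provides, and `temperature` and `max_tokens` are assumed examples of the "identical conditions" held fixed across models.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

# Hypothetical signature: (model_name, prompt, temperature, max_tokens) -> answer text.
ModelCaller = Callable[[str, str, float, int], str]


def run_on_models(prompt: str, models: List[str], call_model: ModelCaller,
                  temperature: float = 0.0, max_tokens: int = 1024) -> Dict[str, str]:
    """Send one prompt to every model with identical settings, in parallel."""
    if not models:
        return {}
    with ThreadPoolExecutor(max_workers=len(models)) as pool:
        # Submit all requests first so they run concurrently.
        futures = {
            name: pool.submit(call_model, name, prompt, temperature, max_tokens)
            for name in models
        }
        # Collect answers in the order the models were listed.
        return {name: future.result() for name, future in futures.items()}
```

Passing the caller in keeps the comparison logic independent of any particular client library, so the same function can be exercised with a stub during testing.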


F/01

Single API, 30+ Models

GPT · Claude · Gemini · DeepSeek · Kimi and more — all through one endpoint

F/02

Automatic Routing

Automatically selects the optimal model based on request type (coding · reasoning · creative) and budget

F/03

Real-time Model Ranking

Continuously updated OVR · Chat · Code · Reason scores based on public benchmarks

F/04

Cost Transparency

Real-time comparison of per-model input/output token pricing — enabling budget-conscious choices for students

30+ Models

1 API Endpoint

4 Score Dimensions

Real-time Price Comparison

§05 / Verification

The gap between SAM rankings and real experience.

Through SoonsoonFactory's AI routing service SAM, we directly call and evaluate virtually every available AI API. SAM already maintains rankings based on public benchmarks — we measure the gap between those scores and what undergraduate students actually experience.
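
One simple way to quantify such a gap is to compare each model's rank under public scores with its rank under student-measured scores. The sketch below is an illustration only: the function names and the rank-difference metric are assumptions, not the project's stated method, and no measured scores are supplied here.

```python
from typing import Dict


def ranks(scores: Dict[str, float]) -> Dict[str, int]:
    """Rank models from 1 (highest score) downward; ties are broken by name."""
    ordered = sorted(scores, key=lambda name: (-scores[name], name))
    return {name: position for position, name in enumerate(ordered, start=1)}


def rank_gaps(public: Dict[str, float], measured: Dict[str, float]) -> Dict[str, int]:
    """Public rank minus measured rank, for models present in both inputs.

    A positive value means the model ranks better on student tasks than its
    public rank suggests; a negative value means it ranks worse.
    """
    shared = sorted(set(public) & set(measured))
    public_rank = ranks({m: public[m] for m in shared})
    measured_rank = ranks({m: measured[m] for m in shared})
    return {m: public_rank[m] - measured_rank[m] for m in shared}
```

Ranks are used instead of raw score differences because public and student-measured scores need not share a scale.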

30+ Models accessible via single API

#	Model	OVR	Chat	Code	Reason	Price (USD per 1M tokens, input / output)
01	Claude Opus 4.7	94.0	95	96	93	$5.5 / 27.5
02	GPT-5.4 Pro	92.7	94	94	96	$15 / 120
04	Gemini 3.1 Pro	90.8	94	90	93	$2 / 12
07	Kimi K2.6	84.0	87	93	75	$0.6 / 2.5
10	DeepSeek V4 Flash	72.7	77	78	65	$0.14 / 0.28
12	DeepSeek V3.2	68.6	72	73	61	$0.62 / 1.85

Ranking basis: SAM Platform · Last updated: May 12, 2025 · Check latest data at sam.soonsoon.ai ↗
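
To make the price column concrete, here is a small sketch that estimates the cost of a single request from the table's figures. It assumes each price pair reads as input / output in USD per one million tokens; `PRICES` and `request_cost` are illustrative names, not part of SAM.

```python
from typing import Dict, Tuple

# (input, output) prices in USD per 1M tokens, copied from the table above.
# Reading each pair as input / output is an assumption.
PRICES: Dict[str, Tuple[float, float]] = {
    "Claude Opus 4.7": (5.5, 27.5),
    "GPT-5.4 Pro": (15.0, 120.0),
    "Gemini 3.1 Pro": (2.0, 12.0),
    "Kimi K2.6": (0.6, 2.5),
    "DeepSeek V4 Flash": (0.14, 0.28),
    "DeepSeek V3.2": (0.62, 1.85),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated cost in USD of one request to the given model."""
    price_in, price_out = PRICES[model]
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000
```

For a hypothetical request of 2,000 input tokens and 1,000 output tokens, this gives about $0.0385 for Claude Opus 4.7 and about $0.00056 for DeepSeek V4 Flash, a difference of roughly 70x per request.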

Probe / 01

Is the model with OVR 94 really #1 for undergraduate assignments too?

Probe / 02

Isn't the $0.14 model good enough for simple assignments?

Probe / 03

Does a Code score of 93 translate to equal performance on real undergraduate coding tasks?

Core Question

Which model is the best value for my specific situation?

Student's first principle

Can I choose the right resource for my purpose, and study or do research at an 'appropriate' cost?
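
One way to turn the "best value" question into a rule is to pick the cheapest model that clears a minimum score on the dimension that matters. The sketch below applies that rule to the table above. The selection rule, the names `Model`, `MODELS`, and `cheapest_meeting`, and the use of input price plus output price as the cost measure are all assumptions for illustration. It also relies on public scores only, which is exactly what this benchmark sets out to check.

```python
from typing import List, NamedTuple, Optional


class Model(NamedTuple):
    name: str
    ovr: float
    chat: int
    code: int
    reason: int
    price_in: float   # USD per 1M input tokens (assumed reading of the table)
    price_out: float  # USD per 1M output tokens (assumed reading of the table)


# Rows copied from the ranking table above.
MODELS: List[Model] = [
    Model("Claude Opus 4.7", 94.0, 95, 96, 93, 5.5, 27.5),
    Model("GPT-5.4 Pro", 92.7, 94, 94, 96, 15.0, 120.0),
    Model("Gemini 3.1 Pro", 90.8, 94, 90, 93, 2.0, 12.0),
    Model("Kimi K2.6", 84.0, 87, 93, 75, 0.6, 2.5),
    Model("DeepSeek V4 Flash", 72.7, 77, 78, 65, 0.14, 0.28),
    Model("DeepSeek V3.2", 68.6, 72, 73, 61, 0.62, 1.85),
]


def cheapest_meeting(dimension: str, min_score: float,
                     models: List[Model] = MODELS) -> Optional[Model]:
    """Cheapest model whose score on `dimension` is at least `min_score`.

    `dimension` is one of "ovr", "chat", "code", "reason". Cost is measured as
    input price plus output price. Returns None if no model qualifies.
    """
    eligible = [m for m in models if getattr(m, dimension) >= min_score]
    if not eligible:
        return None
    return min(eligible, key=lambda m: m.price_in + m.price_out)
```

With the table's figures, `cheapest_meeting("code", 90)` selects Kimi K2.6 and `cheapest_meeting("reason", 90)` selects Gemini 3.1 Pro, so the answer changes with the task rather than following the OVR ranking.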

§06 / Program

Seoul National University WE-Meet

WE-Meet industry-academia program — mentors and students collaborating on research

WE-Meet (위밋) is an industry-academia collaboration project where companies and universities partner so that students solve real industry problems, build practical skills, and earn academic credit.

Format: Corporate mentor guidance + team project execution → performance evaluation (S/U grading)
Features: Official course credit, hands-on mentoring, real industry problem-solving
Domains: Next-gen Semiconductors · Big Data · Green Bio · AI
This Project: SoonsoonFactory (순순팩토리) × SNU — AI utilization mentoring & SAM service research

§07 / Team

1st Cohort Members.

1 mentor + 2 student researchers. All records are public via GitHub issues and PRs. At cohort completion, the dataset and results will be published as an open benchmark.

Mentor Lead · 01

송용성 Song Yongsung

CEO, SoonsoonFactory (순순팩토리)

Adjunct Professor, Dept. of AI Content Engineering, Kangwon National University

Drawing on experience in AI service development and operations, he provides students with practical, industry-perspective AI mentoring. He leads SAM platform development and oversees technical advisory and direction-setting for this project.

soonsoon@soonsoons.com soonsoon.ai ↗ LinkedIn ↗

Student Researcher Cohort · 01

02/03 · STUDENT · SNU · CSE · Y2

김태운 Kim Taewoon

Seoul National University, Computer Science & Engineering, Year 2

Track — Academic Coding

Responsible for benchmark research, execution script development, and coding category problem design.

listro002@snu.ac.kr

Student Researcher Cohort · 01

03/03 · STUDENT · SNU · MATSE · Y2

김호윤 Kim Hoyoon

Seoul National University, Materials Science & Engineering, Year 2

Track — STEM / Visualization

Responsible for benchmark research, STEM problem design, and results analysis & visualization.

khoyun007@gmail.com