DOSSIER №001 · WE-Meet 1st SEOUL · KR · 2025-2026

AI is smart, we get it. But does it actually help with my coursework?

An AI model benchmark built for Korean university students. We evaluate what existing benchmarks fail to measure ("does it actually help the student?") and guide you toward the AI that delivers the right performance at the right cost for your situation.

Program
SNU WE-Meet 1st Cohort
Host
SoonsoonFactory (순순팩토리)
Tracks
Academic · Learning Assistant
Stack
SAM Platform · Python · GitHub Actions
Status
collecting problems

It starts with two questions.

Question · 01

Does this AI actually help with university-level assignments, projects, and papers?

Question · 02

Does this AI guide students well so they learn and grow on their own?

M/01

Student-Centered Evaluation

We measure not "how smart it is" but "how much it actually helps me."

M/02

Right Cost, Right Performance

The most expensive model isn't always the best. We find the model that offers the best value for your situation.

M/03

Empirical Verification via SAM

We directly compare public benchmark scores with real student experience and expose the gap between them.

M/04

A Public Good, Built by Students

The problems I struggled with become part of the benchmark — helping other students too.

M/05

Korean Educational Context

We capture Korean academic writing style, Korean textbooks, and the way Korean students actually ask questions.

M/06

A Living Dataset

New problems are added every semester, so the dataset evolves with the times and the curriculum.

Existing benchmarks have a blind spot.

On the difficulty spectrum, the segment covering undergraduate years 2-4 is structurally empty. Benchmarks test either high-school-level problems or PhD-level research, and nothing in between measures what students actually face while learning.

[Figure: Benchmark difficulty spectrum, illustrating the missing undergraduate segment]

The Undergraduate Gap

An undergraduate-level benchmark does not exist.

01

Multiple choice only.

Real assignments are essays, coding projects, and reports — yet benchmarks only test picking the right answer from four options.

02

Only the final answer is graded.

Students want step-by-step solutions and explanations, but benchmarks treat a response as complete as long as its final answer is correct.

03

English only.

No benchmark measures Korean academic writing style or the context of Korean textbooks.

04

"Helpfulness" is never asked.

A correct answer ≠ a helpful answer. No benchmark measures explanatory power, use of references, or acknowledgment of errors.

Two tracks, one question.

Student helpfulness is measured across two dimensions — Outcome and Process.

Track / 01 NOW RECORDING

Academic Task Benchmark

Real university assignment benchmark

"Can I actually use this AI for my assignments?"

Outcome-Oriented
  • Major-specific problem-solving accuracy
  • Quality of step-by-step explanations
  • Coding assignment resolution
  • Paper summarization & structuring
  • Report writing assistance
  • Korean academic writing style

Track / 02 NEXT PHASE

Learning Assistant Benchmark

AI-as-tutor benchmark

"Does this AI help me learn better?"

Process-Oriented
  • Ability to explain concepts simply
  • Adapting to the student's level
  • Prompting independent thinking
  • Giving hints instead of spoon-feeding answers
  • Study plan creation
  • Motivation and feedback

Score Architecture

Student Helpfulness Score: composite helpfulness metric, combining
  • Academic Task Score: helpfulness on real assignments
  • Learning Assistant Score: helpfulness in guiding learning
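The page does not say how the two track scores roll up into the Student Helpfulness Score. A minimal sketch, assuming both track scores are normalized to 0-100 and combined by a weighted average; the 0.5/0.5 weighting and the names `TrackScores` and `student_helpfulness` are hypothetical placeholders, not the project's formula:

```python
from dataclasses import dataclass


@dataclass
class TrackScores:
    """Per-model track scores, assumed to be normalized to the 0-100 range."""
    academic_task: float       # Track / 01: Academic Task Score
    learning_assistant: float  # Track / 02: Learning Assistant Score


def student_helpfulness(scores: TrackScores, w_academic: float = 0.5) -> float:
    """Hypothetical composite: weighted average of the two track scores.

    The 0.5 default weight is an assumption for illustration only.
    """
    if not 0.0 <= w_academic <= 1.0:
        raise ValueError("w_academic must be between 0 and 1")
    for value in (scores.academic_task, scores.learning_assistant):
        if not 0.0 <= value <= 100.0:
            raise ValueError("track scores must be in [0, 100]")
    return w_academic * scores.academic_task + (1.0 - w_academic) * scores.learning_assistant


# Example with made-up scores (not benchmark results).
print(student_helpfulness(TrackScores(academic_task=80.0, learning_assistant=60.0)))  # 70.0
```

A weighted average keeps the composite on the same 0-100 scale as its inputs and exposes the Outcome/Process trade-off as a single parameter; the real weighting, if any, would have to come from the project itself.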

What is SAM.

SAM (Smart AI Multiplexer) is an AI routing platform developed by SoonsoonFactory (순순팩토리). Through a single API, it calls 30+ AI models — GPT, Claude, Gemini, DeepSeek, and more — automatically selecting the optimal model based on purpose, cost, and performance.

This project runs all benchmark evaluations through SAM, which lets us submit the same problem to multiple models simultaneously under identical conditions and then compare and analyze the results.
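Submitting one problem to several models under identical conditions is a fan-out pattern. The sketch below is not SAM's API, which this page does not document: it assumes a hypothetical `call_model(model, prompt, temperature, max_tokens)` function standing in for one SAM request, and the model IDs are placeholders.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

# Hypothetical signature for a single model call; a real SAM client would replace this.
ModelCall = Callable[[str, str, float, int], str]


def fan_out(call_model: ModelCall, models: List[str], prompt: str,
            temperature: float = 0.0, max_tokens: int = 2048) -> Dict[str, str]:
    """Send the same prompt with identical settings to every model concurrently."""
    with ThreadPoolExecutor(max_workers=max(1, len(models))) as pool:
        futures = {
            model: pool.submit(call_model, model, prompt, temperature, max_tokens)
            for model in models
        }
        # .result() waits for each call and re-raises any exception from its thread.
        return {model: future.result() for model, future in futures.items()}


# Offline stub standing in for a real API call, so the sketch runs as-is.
def fake_call(model: str, prompt: str, temperature: float, max_tokens: int) -> str:
    return f"[{model}] answer to: {prompt}"


answers = fan_out(fake_call, ["model-a", "model-b"], "Derive the Bragg condition.")
for model, text in answers.items():
    print(model, "->", text)
```

Fixing the generation settings once and passing them to every call is what keeps the comparison fair; to run against live models, `fake_call` would be swapped for a real client with the same signature.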

F/01

Single API, 30+ Models

GPT · Claude · Gemini · DeepSeek · Kimi and more — all through one endpoint

F/02

Automatic Routing

Automatically selects the optimal model based on request type (coding · reasoning · creative) and budget

F/03

Real-time Model Ranking

Continuously updated Overall (OVR), Chat, Code, and Reason scores derived from public benchmarks

F/04

Cost Transparency

Real-time comparison of per-model input/output token pricing — enabling budget-conscious choices for students
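SAM's actual routing algorithm is not described on this page. As an illustration only, the sketch below combines automatic routing (F/02) with per-token pricing (F/04). It estimates each model's cost for one request as input tokens × input price + output tokens × output price (prices per 1M tokens), then picks the highest-scoring model in the relevant dimension that fits the budget. The request-type → dimension mapping, the tie-break toward the cheaper model, and the token counts are assumptions; the scores and prices are a snapshot copied from the SAM ranking excerpt on this page.

```python
from typing import Dict, Optional, Tuple

# Snapshot from the SAM ranking excerpt on this page (not live data):
# per-dimension scores and (input, output) prices in USD per 1M tokens.
MODELS: Dict[str, Dict] = {
    "Claude Opus 4.7":   {"Chat": 95, "Code": 96, "Reason": 93, "price": (5.5, 27.5)},
    "GPT-5.4 Pro":       {"Chat": 94, "Code": 94, "Reason": 96, "price": (15.0, 120.0)},
    "Gemini 3.1 Pro":    {"Chat": 94, "Code": 90, "Reason": 93, "price": (2.0, 12.0)},
    "Kimi K2.6":         {"Chat": 87, "Code": 93, "Reason": 75, "price": (0.6, 2.5)},
    "DeepSeek V4 Flash": {"Chat": 77, "Code": 78, "Reason": 65, "price": (0.14, 0.28)},
}

# Assumed mapping from request type to the score dimension that matters most.
DIMENSION = {"coding": "Code", "reasoning": "Reason", "creative": "Chat"}


def request_cost(input_tokens: int, output_tokens: int, price: Tuple[float, float]) -> float:
    """USD cost of one request, with prices quoted per 1M tokens."""
    in_price, out_price = price
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


def route(request_type: str, input_tokens: int, output_tokens: int,
          budget_usd: float) -> Optional[str]:
    """Best-scoring model for the request type whose estimated cost fits the budget."""
    dim = DIMENSION[request_type]
    costs = {name: request_cost(input_tokens, output_tokens, m["price"])
             for name, m in MODELS.items()}
    affordable = [name for name, cost in costs.items() if cost <= budget_usd]
    if not affordable:
        return None
    # Higher score wins; on a tie, the cheaper model wins.
    return max(affordable, key=lambda name: (MODELS[name][dim], -costs[name]))


# Hypothetical coding assignment: 20,000 input tokens, 5,000 output tokens.
for name, m in MODELS.items():
    print(f"{name:<18} ${request_cost(20_000, 5_000, m['price']):.4f}")
print(route("coding", 20_000, 5_000, budget_usd=0.05))    # Kimi K2.6
print(route("reasoning", 20_000, 5_000, budget_usd=0.5))  # Gemini 3.1 Pro
```

With a $0.05 budget the coding request goes to Kimi K2.6 on the strength of its Code score of 93; whether that score holds up on real undergraduate coding tasks is exactly what Probe / 03 asks.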

30+ Models
1 API Endpoint
4 Score Dimensions
Real-time Price Comparison

The gap between SAM rankings and real experience.

Through SAM, SoonsoonFactory's AI routing service, we directly call and evaluate the 30+ models available through its single API. SAM already maintains rankings based on public benchmarks; we measure the gap between those scores and what undergraduate students actually experience.

30+ models accessible via a single API (selected rows shown)

#   Model               OVR   Chat  Code  Reason  Price (USD / 1M tokens, input / output)
01  Claude Opus 4.7     94.0  95    96    93      $5.5 / $27.5
02  GPT-5.4 Pro         92.7  94    94    96      $15 / $120
04  Gemini 3.1 Pro      90.8  94    90    93      $2 / $12
07  Kimi K2.6           84.0  87    93    75      $0.6 / $2.5
10  DeepSeek V4 Flash   72.7  77    78    65      $0.14 / $0.28
12  DeepSeek V3.2       68.6  72    73    61      $0.62 / $1.85

Source: SAM Platform rankings. Last updated: May 12, 2025. Check the latest data at sam.soonsoon.ai.
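One concrete way to measure the gap between SAM rankings and student experience is to rank the same models twice, once by public-benchmark OVR and once by student-rated helpfulness, then compare the two orderings with Spearman's rank correlation (for untied ranks, 1 - 6Σd²/(n(n²-1))) and list each model's rank shift. This is a suggested approach, not the project's stated method. The OVR values come from the table above; the student scores are invented placeholders, since the project is still collecting problems.

```python
from statistics import mean
from typing import List


def average_ranks(values: List[float]) -> List[float]:
    """Rank from highest (rank 1) to lowest; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # mean of 1-based positions i+1..j+1
        i = j + 1
    return ranks


def spearman(x: List[float], y: List[float]) -> float:
    """Spearman's rho, computed as the Pearson correlation of tie-averaged ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rank correlation is undefined when all values are equal")
    return cov / (var_x * var_y) ** 0.5


# OVR scores from the SAM ranking excerpt above.
OVR = {"Claude Opus 4.7": 94.0, "GPT-5.4 Pro": 92.7, "Gemini 3.1 Pro": 90.8,
       "Kimi K2.6": 84.0, "DeepSeek V4 Flash": 72.7, "DeepSeek V3.2": 68.6}
# PLACEHOLDER student-helpfulness scores, invented for illustration (not results).
STUDENT = {"Claude Opus 4.7": 81.0, "GPT-5.4 Pro": 83.0, "Gemini 3.1 Pro": 79.0,
           "Kimi K2.6": 80.0, "DeepSeek V4 Flash": 70.0, "DeepSeek V3.2": 66.0}

models = list(OVR)
sam_rank = average_ranks([OVR[m] for m in models])
stu_rank = average_ranks([STUDENT[m] for m in models])
for m, a, b in zip(models, sam_rank, stu_rank):
    # Positive shift: students rank the model higher than SAM does.
    print(f"{m:<18} SAM #{a:g}  student #{b:g}  shift {a - b:+g}")
rho = spearman([OVR[m] for m in models], [STUDENT[m] for m in models])
print(f"Spearman rho = {rho:.3f}")  # 0.886 for these placeholder numbers
```

A rho near 1 would mean public rankings already predict student experience well; a low rho, or a model moving several places, is the kind of gap Probe / 01 through Probe / 03 are looking for.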

Probe / 01

Is the model with OVR 94 really #1 for undergraduate assignments too?

Probe / 02

Isn't the $0.14 model good enough for simple assignments?

Probe / 03

Does a Code score of 93 translate to equal performance on real undergraduate coding tasks?

Core Question

Which model is the best value for my specific situation?

A student's first principle

Can I choose the right resource for my purpose, and study or do research at an 'appropriate' cost?

Seoul National University WE-Meet

[Image: WE-Meet industry-academia program, mentors and students collaborating on research]

WE-Meet (위밋) is an industry-academia collaboration project where companies and universities partner so that students solve real industry problems, build practical skills, and earn academic credit.

Format
Corporate mentor guidance + team project execution → performance evaluation (S/U grading)
Features
Official course credit, hands-on mentoring, real industry problem-solving
Domains
Next-gen Semiconductors · Big Data · Green Bio · AI
This Project
SoonsoonFactory (순순팩토리) × SNU — AI utilization mentoring & SAM service research

1st Cohort Members.

1 mentor + 2 student researchers. All records are public via GitHub issues and PRs. At cohort completion, the dataset and results will be published as an open benchmark.

Mentor Lead · 01
01/03 · MENTOR

송용성 Song Yongsung

CEO, SoonsoonFactory (순순팩토리)

Adjunct Professor, Dept. of AI Content Engineering, Kangwon National University

Drawing on experience in AI service development and operations, he provides students with practical AI mentoring from an industry perspective. He leads SAM platform development and provides technical advice and overall direction for this project.

Student Researcher Cohort · 01
02/03 · STUDENT SNU · CSE · Y2

김태운 Kim Taewoon

Seoul National University, Computer Science & Engineering, Year 2

Track — Academic Coding

Responsible for benchmark research, execution script development, and coding category problem design.

Student Researcher Cohort · 01
03/03 · STUDENT SNU · MATSE · Y2

김호윤 Kim Hoyoon

Seoul National University, Materials Science & Engineering, Year 2

Track — STEM / Visualization

Responsible for benchmark research, STEM problem design, and results analysis & visualization.