CMU-CS-25-112
Computer Science Department
School of Computer Science, Carnegie Mellon University




Democratizing On-Device LLM Inference with
Machine Learning Compilers and Web Technologies

Charlie F. Ruan

M.S. Thesis

May 2025

CMU-CS-25-112.pdf


Keywords: Large Language Models, LLM Inference, Machine Learning Compiler, WebAssembly, WebGPU, Edge Computing

Large language models (LLMs) have traditionally relied on cloud-based inference due to their high computational and memory demands. However, recent advances in small LLMs and consumer hardware capabilities have made on-device inference increasingly practical. Among potential deployment targets, the web browser stands out as a uniquely compelling platform: it is universally accessible, abstracts away hardware heterogeneity, requires no dependency installation for web applications, and provides a natural agentic environment for task automation.

WebLLM is a high-performance TypeScript framework that enables LLM inference entirely within client-side web browsers. WebLLM compiles LLMs ahead of time using the MLC-LLM and Apache TVM compiler stack to generate optimized WebGPU kernels and a portable WebAssembly runtime. WebLLM exposes a familiar OpenAI-style API, supports efficient GPU acceleration, and integrates seamlessly with browser environments using Web Workers and WebAssembly. To enable structured generation, which is especially challenging for small LLMs, WebLLM incorporates XGrammar, an efficient grammar-constrained decoding engine, allowing developers to enforce output formats such as JSON or DSLs with near-zero overhead. Together, these components demonstrate a path toward democratizing LLM access, making intelligent, private, and responsive AI experiences universally available through the web.
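As an illustration of the OpenAI-style API and JSON-constrained generation described above, the following is a minimal sketch, not WebLLM's actual implementation. The types are declared locally and mirror the OpenAI chat-completions convention; the `schema` field on `response_format`, the `ChatEngine` interface, the helper names `buildJsonRequest` and `askForJson`, and the `stubEngine` stand-in are all assumptions made for this example so that it runs without a browser or GPU.

```typescript
// Minimal OpenAI-style types, declared locally for this sketch.
type Role = "system" | "user" | "assistant";

interface ChatMessage {
  role: Role;
  content: string;
}

interface ChatRequest {
  messages: ChatMessage[];
  temperature?: number;
  // Asks the engine to constrain decoding to JSON. The `schema` field
  // (a JSON Schema serialized as a string) is an assumption of this sketch.
  response_format?: { type: "json_object"; schema?: string };
}

interface ChatResponse {
  choices: { message: ChatMessage }[];
}

// Hypothetical engine shape: anything exposing chat.completions.create().
interface ChatEngine {
  chat: { completions: { create(req: ChatRequest): Promise<ChatResponse> } };
}

// Build a request that asks for JSON output matching `schema`.
function buildJsonRequest(prompt: string, schema: object): ChatRequest {
  return {
    messages: [
      { role: "system", content: "Reply with JSON only." },
      { role: "user", content: prompt },
    ],
    temperature: 0,
    response_format: { type: "json_object", schema: JSON.stringify(schema) },
  };
}

// Send the request and parse the reply. With grammar-constrained decoding
// the reply is expected to be valid JSON, so JSON.parse should not throw.
async function askForJson(
  engine: ChatEngine,
  prompt: string,
  schema: object,
): Promise<unknown> {
  const res = await engine.chat.completions.create(
    buildJsonRequest(prompt, schema),
  );
  return JSON.parse(res.choices[0].message.content);
}

// Stand-in engine that returns a canned reply; a real in-browser engine
// would run the model on the GPU instead.
const stubEngine: ChatEngine = {
  chat: {
    completions: {
      create: async (_req: ChatRequest): Promise<ChatResponse> => ({
        choices: [
          { message: { role: "assistant", content: '{"city":"Pittsburgh"}' } },
        ],
      }),
    },
  },
};
```

In an application, `stubEngine` would be replaced by an engine object created by the framework, and `askForJson(engine, prompt, schema)` would resolve to the parsed object. Keeping the request in the OpenAI shape means code written against a cloud endpoint can be pointed at a local engine with few changes.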

42 pages

Thesis Committee:
Tianqi Chen (Chair)
Zhihao Jia

Srinivasan Seshan, Head, Computer Science Department
Martial Hebert, Dean, School of Computer Science

