In October 2025, a new data format optimized for AI/LLMs called “TOON (Token-Oriented Object Notation)” was released. Created by developer Johann Schopplich, this format achieves 30-60% token reduction while maintaining full JSON compatibility, attracting attention as a game-changer for AI cost optimization.
What Is the TOON Format?
Basic Concepts
TOON stands for “Token-Oriented Object Notation” and is a data serialization format designed with token efficiency as the top priority.
Key features:
- Token Efficiency: 30-60% fewer tokens than JSON (40% average reduction)
- Reversibility: 100% lossless conversion between JSON⇔TOON
- Human Readable: YAML-like readable syntax
- LLM-Friendly: Explicit structure improves AI comprehension accuracy
Design Philosophy
TOON is designed based on three core principles:
- Eliminate Redundancy: Minimize repetition of brackets, commas, and quotes
- Explicit Structure: Declare array lengths and field definitions to support LLM parsing
- Tabular Representation: Declare fields only once for data with uniform structure
Key Features
1. YAML-Style Indentation for Hierarchy
TOON uses indentation instead of JSON’s curly braces {} to represent hierarchy:
JSON (32 tokens):
```json
{
  "user": {
    "id": 123,
    "name": "Ada",
    "email": "ada@example.com"
  }
}
```
TOON (20 tokens):
```
user:
  id: 123
  name: Ada
  email: ada@example.com
```
Token reduction: 37.5%
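Here is a minimal sketch of how the indentation rule could be applied to nested objects. It assumes two-space indentation and ignores quoting rules. `encodeNested` is a hypothetical helper for illustration and is not part of the official library.

```typescript
// Hypothetical helper for illustration; not the @toon-format/toon API.
// Handles only nested objects whose leaf values are primitives.
type Primitive = string | number | boolean | null;
type Nested = { [key: string]: Primitive | Nested };

function encodeNested(obj: Nested, indent = 0): string[] {
  const pad = "  ".repeat(indent); // two spaces per nesting level (assumed)
  const lines: string[] = [];
  for (const [key, value] of Object.entries(obj)) {
    if (value !== null && typeof value === "object") {
      // Nested object: write "key:" and indent its fields one level deeper
      lines.push(`${pad}${key}:`);
      lines.push(...encodeNested(value, indent + 1));
    } else {
      // Primitive: write "key: value" (quoting is omitted in this sketch)
      lines.push(`${pad}${key}: ${String(value)}`);
    }
  }
  return lines;
}

const nestedToon = encodeNested({
  user: { id: 123, name: "Ada", email: "ada@example.com" },
}).join("\n");
console.log(nestedToon);
```

The recursion follows JSON's nesting directly. Each level of braces turns into one level of indentation, and that is where the brace and quote tokens are saved.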
2. CSV-Style Table Format
TOON’s standout feature is representing arrays with uniform structure in tabular format.
JSON (125 tokens):
```json
{
  "users": [
    {"id": 1, "name": "Alice", "role": "admin"},
    {"id": 2, "name": "Bob", "role": "user"},
    {"id": 3, "name": "Carol", "role": "user"}
  ]
}
```
TOON (54 tokens):
```
users[3]{id,name,role}:
  1,Alice,admin
  2,Bob,user
  3,Carol,user
```
Token reduction: 56.8%
Syntax explanation:
- users[3]: Array name and element count (3 items)
- {id,name,role}: Field definitions (declared only once)
- Following lines: Data for each record (CSV-style)
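The header-plus-rows layout can be sketched as follows. `encodeTable` is a hypothetical helper and not the official @toon-format/toon API. It assumes that every row has the same keys, that no value needs quoting, and that rows are indented by two spaces.

```typescript
// Hypothetical helper (not the official library API): encodes an array of flat
// objects that share the same keys into TOON's tabular form.
type Row = Record<string, string | number | boolean | null>;

function encodeTable(name: string, rows: Row[]): string {
  // Field names come from the first row and are declared once in the header
  const fields = rows.length > 0 ? Object.keys(rows[0]) : [];
  const header = `${name}[${rows.length}]{${fields.join(",")}}:`;
  // Each record becomes one indented, comma-separated line in field order
  const body = rows.map((row) => "  " + fields.map((f) => String(row[f])).join(","));
  return [header, ...body].join("\n");
}

const users: Row[] = [
  { id: 1, name: "Alice", role: "admin" },
  { id: 2, name: "Bob", role: "user" },
  { id: 3, name: "Carol", role: "user" },
];
console.log(encodeTable("users", users));
// users[3]{id,name,role}:
//   1,Alice,admin
//   2,Bob,user
//   3,Carol,user
```

Keys are written once in the header, so each additional record costs only its values. This is why savings grow with the number of rows.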
3. Minimal Quotation
TOON can omit quotes for most strings, including ones with inner spaces, as long as the value cannot be confused with TOON syntax:
JSON:
```json
{
  "status": "active",
  "message": "Hello World"
}
```
TOON:
```
status: active
message: Hello World
```
Use quotes only when necessary. A value containing a comma (the default delimiter) must be quoted, while slashes need no quotes:
```
title: "Hello, World!"
path: /home/user/file
```
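A simplified quoting check might look like the sketch below. `needsQuotes` and `formatValue` are hypothetical helpers. The rule list is an assumption that covers only common cases: empty strings, surrounding whitespace, the delimiter, colons, quotes, backslashes, and values that would read back as numbers, booleans, or null. The official spec is the authority on the full list. Quoted values reuse JSON-style escaping, which is also an assumption.

```typescript
// Hypothetical, simplified quoting rules for illustration only; the TOON spec
// defines the authoritative list, which this sketch does not fully cover.
function needsQuotes(value: string, delimiter = ","): boolean {
  if (value === "") return true;                                      // an empty value would disappear
  if (value !== value.trim()) return true;                            // leading/trailing whitespace
  if (value.includes(delimiter) || value.includes(":")) return true;  // structural characters
  if (value.includes('"') || value.includes("\\")) return true;       // need escaping
  if (["true", "false", "null"].includes(value)) return true;         // would read back as a literal
  if (!Number.isNaN(Number(value))) return true;                      // would read back as a number
  return false;
}

// Quote with JSON-style escaping (an assumption) only when the rules require it
function formatValue(value: string, delimiter = ","): string {
  return needsQuotes(value, delimiter) ? JSON.stringify(value) : value;
}

console.log(`message: ${formatValue("Hello World")}`);  // message: Hello World
console.log(`title: ${formatValue("Hello, World!")}`);  // title: "Hello, World!"
console.log(`path: ${formatValue("/home/user/file")}`); // path: /home/user/file
```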
4. Concise Primitive Arrays
Simple arrays can be expressed efficiently:
JSON:
```json
{
  "tags": ["admin", "ops", "dev"]
}
```
TOON:
```
tags[3]: admin,ops,dev
```
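The inline form takes only a few lines to sketch. `encodePrimitiveArray` is a hypothetical helper that assumes no value needs quoting.

```typescript
// Hypothetical helper: writes an array of primitives inline as key[N]: v1,v2,...
function encodePrimitiveArray(key: string, values: Array<string | number | boolean>): string {
  const header = `${key}[${values.length}]:`;
  // Values share the header's line, separated by the delimiter (comma here)
  return values.length === 0 ? header : `${header} ${values.map(String).join(",")}`;
}

console.log(encodePrimitiveArray("tags", ["admin", "ops", "dev"])); // tags[3]: admin,ops,dev
```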
5. Alternative Delimiters for Further Optimization
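Using tabs (\t) or pipes (|) instead of commas can reduce tokens even further, depending on the tokenizer:
Tab-delimited (each gap in the header and rows is a single tab character):
```
items[2	]{sku	qty	price}:
  A1	2	9.99
  B2	1	14.5
```
Pipe-delimited:
```
items[2|]{sku|qty|price}:
  A1|2|9.99
  B2|1|14.5
```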
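Generalizing the tabular encoder to other delimiters mostly affects the header. Following the examples above, the header carries the delimiter inside the brackets and between field names. `encodeTableWith` is a hypothetical helper that assumes uniform rows whose values never contain the chosen delimiter.

```typescript
// Hypothetical sketch: tabular encoding with a configurable delimiter.
// A non-comma delimiter is written inside the brackets and between field names,
// so a reader knows how to split each row.
type Delimiter = "," | "\t" | "|";
type Cell = string | number | boolean;

function encodeTableWith(name: string, rows: Record<string, Cell>[], delim: Delimiter): string {
  const fields = rows.length > 0 ? Object.keys(rows[0]) : [];
  const marker = delim === "," ? "" : delim; // comma is the default and is not written
  const header = `${name}[${rows.length}${marker}]{${fields.join(delim)}}:`;
  const body = rows.map((row) => "  " + fields.map((f) => String(row[f])).join(delim));
  return [header, ...body].join("\n");
}

const items = [
  { sku: "A1", qty: 2, price: 9.99 },
  { sku: "B2", qty: 1, price: 14.5 },
];
console.log(encodeTableWith("items", items, "|"));
// items[2|]{sku|qty|price}:
//   A1|2|9.99
//   B2|1|14.5
```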
Comparison with JSON and YAML
Real Data Comparison
User list example:
| Format | Token Count | Reduction |
|---|---|---|
| JSON | 125 | - |
| YAML | 98 | 21.6% |
| TOON | 54 | 56.8% |
E-commerce product data (100 items):
| Format | Token Count | Reduction |
|---|---|---|
| JSON | 4,200 | - |
| YAML | 3,800 | 9.5% |
| TOON | 1,800 | 57.1% |
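The Reduction column is the token saving relative to JSON, computed as (JSON tokens - format tokens) / JSON tokens. A quick check against both tables:

```typescript
// Percentage reduction relative to the JSON baseline
function reductionPercent(jsonTokens: number, otherTokens: number): number {
  return ((jsonTokens - otherTokens) / jsonTokens) * 100;
}

console.log(reductionPercent(125, 54).toFixed(1));    // 56.8 (user list, TOON)
console.log(reductionPercent(125, 98).toFixed(1));    // 21.6 (user list, YAML)
console.log(reductionPercent(4200, 1800).toFixed(1)); // 57.1 (e-commerce, TOON)
```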
Characteristics of Each Format
JSON:
- ✅ Most widely adopted
- ✅ Excellent tool support
- ❌ Poor token efficiency
- ❌ Redundant brackets and commas
YAML:
- ✅ Human readable
- ✅ Popular for configuration files
- ⚠️ Repeats keys even for tabular data
- ❌ Complex indentation rules
TOON:
- ✅ Best token efficiency
- ✅ High LLM comprehension accuracy
- ✅ Especially strong for tabular data
- ⚠️ New format (ecosystem still developing)
- ❌ Limited compatibility with legacy tools
Benchmark Results
Token Reduction Rate
According to official benchmarks, reduction rates vary by data structure type:
- Uniform array data: 50-60% reduction
- Nested objects: 30-40% reduction
- Primitive arrays: 25-35% reduction
- Complex mixed structures: 20-30% reduction
LLM Comprehension Accuracy Test
Testing 209 data retrieval questions across 4 major LLMs (Claude, GPT-4, Gemini, Grok):
| Format | Average Accuracy |
|---|---|
| TOON | 73.9% |
| JSON | 69.7% |
| YAML | 71.2% |
In this benchmark, TOON showed not only token reduction but also higher LLM comprehension accuracy. This is attributed to explicit structure (array length, field definitions) helping LLMs validate data.
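The sketch below shows why declared lengths make validation possible. It reads a tabular block and checks it against its header. `parseTable` is a hypothetical helper: it handles only a single comma-delimited block with no quoted values and returns every cell as a string. The official decoder is far more complete.

```typescript
// Hypothetical, minimal reader for one comma-delimited tabular block.
// It rejects input whose row count or cell count contradicts the header.
function parseTable(text: string): { name: string; rows: Record<string, string>[] } {
  const [headerLine, ...rowLines] = text.trim().split("\n");
  const match = /^(\w+)\[(\d+)\]\{([^}]*)\}:$/.exec(headerLine.trim());
  if (!match) throw new Error(`Not a tabular header: ${headerLine}`);
  const [, name, declared, fieldList] = match;
  const fields = fieldList.split(",");

  const rows = rowLines.map((line, i) => {
    const cells = line.trim().split(",");
    if (cells.length !== fields.length) {
      throw new Error(`Row ${i + 1} has ${cells.length} cells, expected ${fields.length}`);
    }
    // Pair each declared field with the cell in the same position
    return Object.fromEntries(fields.map((f, j) => [f, cells[j]] as [string, string]));
  });

  // The declared [N] lets a reader detect truncated or padded data
  if (rows.length !== Number(declared)) {
    throw new Error(`Declared ${declared} rows but found ${rows.length}`);
  }
  return { name, rows };
}

const parsed = parseTable("users[2]{id,name,role}:\n  1,Alice,admin\n  2,Bob,user");
console.log(parsed.rows[1].name); // Bob
```

If a row goes missing, for example because a prompt was cut off, the header no longer matches and the mismatch is reported instead of passing silently.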
Cost Reduction Impact
Calculation using GPT-4 (assuming $30 per 1M tokens and 30-day months):
Scenario: 10,000 API calls/day, average 1,000 tokens/request, with TOON cutting tokens by the 40% average
| Format | Daily Tokens | Daily Cost | Monthly Cost | Annual Cost |
|---|---|---|---|---|
| JSON | 10M | $300 | $9,000 | $108,000 |
| TOON | 6M | $180 | $5,400 | $64,800 |
| Savings | 4M | $120 | $3,600 | $43,200 |
At this volume, annual savings reach tens of thousands of dollars, and they scale linearly with token usage.
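The scenario's arithmetic can be reproduced with a short sketch. The constants are illustrative assumptions, not measurements: $30 per 1M tokens, 30-day months, 12 months per year, and TOON using 60% of JSON's tokens.

```typescript
// Assumptions (illustrative): $30 per 1M tokens, 30-day months, 12 months/year,
// and TOON using 60% of JSON's tokens (the 40% average reduction).
const PRICE_PER_MILLION_TOKENS = 30;
const CALLS_PER_DAY = 10_000;
const TOKENS_PER_CALL = 1_000;

interface Cost { daily: number; monthly: number; annual: number; }

function costFor(dailyTokens: number): Cost {
  const daily = (dailyTokens / 1_000_000) * PRICE_PER_MILLION_TOKENS;
  const monthly = daily * 30;
  return { daily, monthly, annual: monthly * 12 };
}

const jsonTokensPerDay = CALLS_PER_DAY * TOKENS_PER_CALL; // 10M tokens/day
const toonTokensPerDay = (jsonTokensPerDay * 60) / 100;   // 6M tokens/day

const jsonCost = costFor(jsonTokensPerDay);
const toonCost = costFor(toonTokensPerDay);
console.log(jsonCost);                          // { daily: 300, monthly: 9000, annual: 108000 }
console.log(toonCost);                          // { daily: 180, monthly: 5400, annual: 64800 }
console.log(jsonCost.annual - toonCost.annual); // 43200
```

Because cost is linear in tokens, the savings are always 40% of the JSON bill under these assumptions. Doubling traffic doubles the savings.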
Ecosystem and Adoption Status
GitHub Statistics
- Stars: 18.1k+
- Initial Release: October 2025
- Specification Version: v2.0
Official Implementations
Official SDKs in development:
- TypeScript/JavaScript: @toon-format/toon (official implementation)
- Python
- Rust
- Go
- .NET
Community Implementations
Implementations in progress for 20+ languages:
- C++, Clojure, Crystal, Dart, Elixir, Gleam
- Java, Kotlin, Lua, OCaml, PHP, R
- Ruby, Scala, Swift, and more
Industry Adoption
- Openapi.com: Announced TOON support for its APIs within weeks of TOON's release
- Startups: AI-related startups experimenting with adoption
- LLM Developer Community: Rapidly growing awareness
Tools and Resources
Official Resources:
- Official Spec: https://github.com/toon-format/spec
- TypeScript SDK: https://github.com/toon-format/toon
- Official Site: https://toonformat.dev
- NPM Package: @toon-format/toon
Development Tools:
- Format Tokenization Playground: Compare token counts
- TOON Tools: Online conversion tools
- ToonParse: Parser and validator
Recommended Use Cases
Where TOON Excels
- Data Exchange with LLMs
  - Prompts to ChatGPT, Claude, Gemini, etc.
  - AI application responses
- Uniform Array Data
  - User lists, product catalogs
  - Database query results
  - Log entries
- Time-Series Data
  - Analytics data
  - Sensor data
  - Transaction logs
- Scenarios Where Token Cost Is Critical
  - High-volume API calls
  - Maximizing context windows
  - Real-time processing
When to Avoid TOON
- Deeply Nested Structures (3+ levels)
  - JSON may be more efficient
  - Indentation becomes too complex
- Traditional REST APIs
  - JSON is the standard
  - Tool support is required
- Non-uniform/Irregular Data
  - Cannot leverage tabular format benefits
  - Records with varying fields
- Database or File Storage
  - Should leverage existing JSON support
  - Compatibility is important
- Universal Tool Compatibility Required
  - JSON or YAML is safer
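Before choosing a format, you can check whether an array qualifies for the tabular form. `isTabular` below is a hypothetical heuristic and not part of any official tool. It may be stricter than the official encoder, since it also requires keys to appear in the same order in every object.

```typescript
// Hypothetical heuristic: an array suits TOON's tabular form when every element
// is an object with the same keys (same order, in this sketch) and only
// primitive values.
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

function isTabular(value: Json): boolean {
  if (!Array.isArray(value) || value.length === 0) return false;
  let expectedKeys: string | null = null;
  for (const item of value) {
    // Every element must be a plain object
    if (item === null || typeof item !== "object" || Array.isArray(item)) return false;
    // Every element must have the same key signature as the first one
    const signature = JSON.stringify(Object.keys(item));
    if (expectedKeys !== null && signature !== expectedKeys) return false;
    expectedKeys = signature;
    // Nested arrays or objects cannot be flattened into one CSV-style row
    if (Object.values(item).some((v) => v !== null && typeof v === "object")) return false;
  }
  return true;
}

console.log(isTabular([{ id: 1, name: "Alice" }, { id: 2, name: "Bob" }]));           // true
console.log(isTabular([{ id: 1, name: "Alice" }, { id: 2, email: "bob@example.com" }])); // false
```

Data that fails this check falls into the "non-uniform/irregular" case above, where JSON is often the better choice.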
Getting Started
Using with TypeScript/JavaScript
Installation:
```bash
npm install @toon-format/toon
```
Basic usage example:
```typescript
import { encode, decode } from '@toon-format/toon';

// Convert a JavaScript object (JSON data) to TOON
const data = {
  users: [
    { id: 1, name: "Alice", role: "admin" },
    { id: 2, name: "Bob", role: "user" }
  ]
};
const toonString = encode(data);
console.log(toonString);
// Output:
// users[2]{id,name,role}:
//   1,Alice,admin
//   2,Bob,user

// Convert TOON back to a JavaScript object
const parsedData = decode(toonString);
console.log(parsedData);
// Deep-equals the original `data` object
```
OpenAI API Usage Example
```typescript
import OpenAI from 'openai';
import { encode } from '@toon-format/toon';

const client = new OpenAI(); // reads OPENAI_API_KEY from the environment

const userData = {
  users: [/* large amount of user data */]
};

const response = await client.chat.completions.create({
  model: "gpt-4",
  messages: [
    {
      role: "user",
      content: `Analyze the following user data:\n\n${encode(userData)}`
    }
  ]
});

// Check token usage (usage may be absent, hence the optional chaining)
console.log(`Prompt tokens used: ${response.usage?.prompt_tokens}`);
// Compare with the same prompt built from JSON.stringify(userData);
// for uniform arrays, TOON typically cuts prompt tokens by 30-60%
```
Technyan’s Comment
“TOON format is truly revolutionary! Being able to reduce AI conversation costs by up to 60% is game-changing, especially for applications handling large amounts of data!
What surprised me most isn’t just the token reduction, but the improvement in LLM comprehension accuracy as well. TOON’s 73.9% vs. JSON’s 69.7% is a difference of about 4 points! This is likely because explicitly declaring the array length and fields, as in users[3]{id,name,role}, makes it easier for LLMs to validate the structure.
The tabular data representation is especially brilliant! In JSON, you have to repeat the same keys (id, name, role) over and over, but in TOON, you declare them once and just list the data CSV-style. This is super powerful for uniform structures like database query results, log data, or product lists!
The actual cost reduction impact can’t be ignored either. The annual savings in the cost scenario above are significant for startups, and for enterprises the numbers would be even bigger. Plus, saving tokens means you can pack more information into the context window, which can improve AI performance.
However, since it’s a new format, there are some things to consider before adopting it:
- Ecosystem Still Developing: Unlike JSON, not everything supports it yet, so tool compatibility needs attention.
- Learning Cost: Team members need to learn the new format. But the syntax itself is simple, so the learning cost should be low.
- Identify Optimal Use Cases: It’s not suited for deeply nested complex structures, so choosing where to use it matters.
What I find interesting is using tabs or pipes as delimiters. Commas show up often in real text, so comma-delimited values frequently need quotes. Tabs rarely appear in data, so tab-delimited values need quoting less often, which can improve efficiency further. However, tabs are hard to see, which could make debugging harder.
18.1k+ GitHub stars within 2 months of release is impressive! You can feel the community momentum. With implementations progressing in 20+ languages, developer community interest is clearly high.
Openapi.com announcing support within weeks of release is notable too. If major platforms start adopting it, it could spread rapidly. However, standardization will take time, so it’s wise to use it “for LLM communication only” at this stage.
In conclusion, TOON should be used right now in these scenarios:
- ✅ Sending data to LLMs: When frequently using APIs like ChatGPT, Claude, Gemini
- ✅ Uniform data structures: User lists, product catalogs, log data, etc.
- ✅ Cost reduction: When token usage is high and costs are a concern
- ✅ Context maximization: When wanting to pack more information into limited context windows
Conversely, continue using traditional JSON in these cases:
- ❌ REST APIs: JSON is standard for public APIs
- ❌ Complex nesting: When 3+ levels of nesting are common
- ❌ Irregular data: When structure varies across records
- ❌ Existing tool compatibility: When JSON parsers or validators are essential
TOON has the potential to become a new standard in the AI-first era. It’s still in the early adopter phase, but LLM developers should definitely keep this technology on their radar! 🚀📊”
Summary
TOON format holds great potential as a new data representation method for the AI era:
- Remarkable Token Reduction: 30-60% reduction from JSON, 40% average
- Improved Accuracy: LLM comprehension accuracy improved to 73.9% (JSON: 69.7%)
- Cost Savings: Thousands of dollars in annual API cost reduction for large-scale applications
- Rapid Adoption: 18.1k+ GitHub stars in 2 months, implementations progressing in 20+ languages
- Practical: Major platforms like Openapi.com planning adoption
TOON truly shines when exchanging uniformly structured data with LLMs. Beyond token cost reduction, benchmarks also point to a side benefit of improved AI comprehension accuracy, making it a noteworthy technology for AI/LLM application developers.
While still a new format, considering its clear advantages and rapidly expanding ecosystem, it’s likely to become one of the standard choices in future AI development. Particularly for large-scale AI applications where token efficiency is critical, early adoption could provide competitive advantages.