Semantic Parser

The @lokascript/semantic package parses multilingual
hyperscript directly into executable structures — without
requiring an English intermediate step.

Semantic vs i18n

The two packages serve different purposes:

	Semantic Parser	i18n Grammar
Purpose	Parse for execution	Transform for display
Direction	Any language → AST	Any language ↔ any language
Output	Structured semantic nodes	Translated text string
Confidence	Scored (0–1)	Deterministic
Use case	Runtime, adapter plugin	Documentation, teaching

Use the semantic parser when you need to execute
multilingual code. Use i18n when you need to translate code
for display or documentation.

How It Works

The parser uses a three-phase pipeline:

1. Tokenization

Input is split into tokens using language-aware rules:

English: "toggle .active"
→ [toggle (keyword), .active (selector)]

Japanese: ".active を 切り替え"
→ [.active (selector), を (particle), 切り替え (keyword)]

Each language has its own tokenization strategy: Japanese uses
particles (を, に, で) as word boundaries, English splits on
spaces, and the Arabic tokenizer handles right-to-left text.
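
As a rough illustration of language-aware tokenization, the sketch
below reproduces the two examples above. It is not the package's
tokenizer: the `Token` shape, the `tokenize` name, and the particle
list are assumptions made for this example, and it relies on the
spaces present in the sample input rather than on real Japanese
word segmentation.

```typescript
// Hypothetical token categories, mirroring the labels in the examples above.
type TokenKind = 'keyword' | 'selector' | 'particle';

interface Token {
  text: string;
  kind: TokenKind;
}

// Assumed particle list, taken from the Japanese examples in this document.
const JA_PARTICLES = new Set(['を', 'に', 'で', 'から']);

// Splits on whitespace, then classifies each piece as a selector,
// a particle (Japanese only), or a keyword.
function tokenize(input: string, language: 'en' | 'ja'): Token[] {
  return input
    .trim()
    .split(/\s+/)
    .filter((text) => text.length > 0)
    .map((text): Token => {
      if (text.startsWith('.') || text.startsWith('#')) {
        return { text, kind: 'selector' };
      }
      if (language === 'ja' && JA_PARTICLES.has(text)) {
        return { text, kind: 'particle' };
      }
      return { text, kind: 'keyword' };
    });
}
```

Anything that is neither a selector nor a known particle is treated
as a keyword here; a real tokenizer would check it against the
language's keyword vocabulary.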

2. Pattern Matching

Tokens are matched against language-specific command patterns:

English pattern (SVO):

[COMMAND_KEYWORD] [SELECTOR]
→ action=toggle, patient=.active

Japanese pattern (SOV):

[SELECTOR] [PARTICLE:を] [COMMAND_KEYWORD]
→ patient=.active, action=toggle

The same semantic roles are extracted regardless of language —
the patterns just account for different word orders.
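
The two patterns above can be sketched as a pair of small matchers
that return the same role object. This is a minimal sketch, not the
package's matcher: the `Token` and `Roles` shapes, the function
names, and the keyword table are all assumptions for illustration.

```typescript
type TokenKind = 'keyword' | 'selector' | 'particle';
interface Token { text: string; kind: TokenKind }
interface Roles { action: string; patient: string }

// Hypothetical keyword table: surface form -> canonical action name.
const ACTIONS = new Map<string, string>([
  ['toggle', 'toggle'],
  ['切り替え', 'toggle'],
]);

// English (SVO): [COMMAND_KEYWORD] [SELECTOR]
function matchEnglish(tokens: Token[]): Roles | null {
  const [verb, object] = tokens;
  if (tokens.length !== 2 || !verb || !object) return null;
  if (verb.kind !== 'keyword' || object.kind !== 'selector') return null;
  const action = ACTIONS.get(verb.text);
  return action === undefined ? null : { action, patient: object.text };
}

// Japanese (SOV): [SELECTOR] [PARTICLE:を] [COMMAND_KEYWORD]
function matchJapanese(tokens: Token[]): Roles | null {
  const [object, particle, verb] = tokens;
  if (tokens.length !== 3 || !object || !particle || !verb) return null;
  if (object.kind !== 'selector' || verb.kind !== 'keyword') return null;
  if (particle.kind !== 'particle' || particle.text !== 'を') return null;
  const action = ACTIONS.get(verb.text);
  return action === undefined ? null : { action, patient: object.text };
}
```

Both matchers produce `{ action: 'toggle', patient: '.active' }` for
the example inputs; only the expected token order differs.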

3. Confidence Scoring

Each match produces a confidence score (0–1):

≥ 0.7: High confidence. Use the semantic result.
0.5 to < 0.7: Medium. May fall back to the traditional parser.
< 0.5: Low. Falls back to the original text.

SOV languages (Japanese, Korean, Turkish) tend to produce
lower scores due to greater structural ambiguity. Set
per-language thresholds to compensate.
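
One way to picture per-language thresholds is a small decision
function that maps a score to one of the three bands above. The
sketch below is an assumption-laden illustration: the `decide`
name, the `Decision` labels, and the 0.6 cutoffs for SOV languages
are invented here and are not values taken from the package.

```typescript
type Decision = 'semantic' | 'maybe-fallback' | 'fallback';

// Hypothetical overrides for the high-confidence cutoff.
// The 0.6 values are illustrative only.
const HIGH_THRESHOLDS = new Map<string, number>([
  ['ja', 0.6],
  ['ko', 0.6],
  ['tr', 0.6],
]);

const DEFAULT_HIGH = 0.7; // "high confidence" band from the list above
const LOW = 0.5;          // below this, fall back to the original text

function decide(confidence: number, language: string): Decision {
  const high = HIGH_THRESHOLDS.get(language) ?? DEFAULT_HIGH;
  if (confidence >= high) return 'semantic';
  if (confidence >= LOW) return 'maybe-fallback';
  return 'fallback';
}
```

With these illustrative numbers, a score of 0.65 is accepted for
Japanese but only lands in the medium band for English.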

API

`parse(code, options?)`

Parse hyperscript with language-aware semantic analysis:

import { parse } from '@lokascript/semantic';

const result = parse('on click toggle .active on me', {
  language: 'en',
  confidenceThreshold: 0.7
});

console.log(result.confidence); // 0.98
console.log(result.language);   // 'en'
console.log(result.ast);        // Parsed AST

`detect(code)`

Auto-detect the language of hyperscript code:

import { detect } from '@lokascript/semantic';

const lang = detect('クリック で 私 の .active を 切り替え');
// { language: 'ja', confidence: 0.95 }
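
How `detect` decides is not described here. For intuition only, a
script-based heuristic can separate languages written in different
scripts. The sketch below is such a heuristic, not the package's
algorithm; `guessLanguage` and its ordering of checks are
assumptions, and it cannot tell apart languages that share a script
(for example English and Turkish) or recognize Japanese written
only in kanji.

```typescript
// Each entry pairs a language code with a Unicode range test.
const SCRIPT_TESTS: Array<[string, RegExp]> = [
  ['ja', /[\u3040-\u30FF]/], // Hiragana and Katakana
  ['ko', /[\uAC00-\uD7A3]/], // precomposed Hangul syllables
  ['ar', /[\u0600-\u06FF]/], // Arabic block
];

// Returns the first language whose script appears in the input,
// defaulting to English when none does.
function guessLanguage(code: string): string {
  for (const [language, pattern] of SCRIPT_TESTS) {
    if (pattern.test(code)) return language;
  }
  return 'en';
}
```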

`translate(code, fromLang, toLang)`

Parse in one language and render in another:

import { translate } from '@lokascript/semantic';

const english = translate('.active を 切り替え', 'ja', 'en');
// → 'toggle .active'
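
Conceptually, translation is parsing into semantic roles followed by
rendering those roles in the target language's word order. The
rendering half might look like the sketch below; the `Roles` shape,
the `render` function, and the keyword dictionary are hypothetical
and cover only the toggle example.

```typescript
interface Roles { action: string; patient: string }

// Hypothetical per-language keyword dictionaries.
const ACTION_WORDS: Record<'en' | 'ja', Map<string, string>> = {
  en: new Map([['toggle', 'toggle']]),
  ja: new Map([['toggle', '切り替え']]),
};

// Renders the roles in SVO order for English and SOV order
// (object, particle を, verb) for Japanese.
function render(roles: Roles, language: 'en' | 'ja'): string {
  const verb = ACTION_WORDS[language].get(roles.action) ?? roles.action;
  return language === 'ja'
    ? `${roles.patient} を ${verb}`
    : `${verb} ${roles.patient}`;
}
```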

Language-Specific Features

Japanese

Particles (を, に, から, で) mark semantic roles
No spaces — word boundaries from particle positions
Morphological normalization handles conjugated verb forms (e.g., 切り替えて → 切り替え)
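
The normalization example above can be approximated by stripping a
known suffix. This is a deliberately naive sketch: the function name
and suffix list are assumptions, and plain suffix stripping only
works for verbs whose stem is unchanged in the て-form. Verbs with
sound changes (書く → 書いて) need real morphological rules.

```typescript
// Hypothetical suffix list; only the て-form from the example is covered.
const JA_SUFFIXES = ['て'];

// Removes a trailing suffix if one is present and something remains.
function normalizeJapaneseVerb(word: string): string {
  for (const suffix of JA_SUFFIXES) {
    if (word.length > suffix.length && word.endsWith(suffix)) {
      return word.slice(0, -suffix.length);
    }
  }
  return word;
}
```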

Korean

Similar particle system to Japanese (을/를, 에, 에서)
Particle form depends on the preceding syllable (을 after a final consonant, 를 after a vowel)
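
The 을/를 choice can be computed from the Unicode layout of
precomposed Hangul syllables, where `(code - 0xAC00) % 28` is zero
exactly when the syllable has no final consonant. The sketch below
shows that rule in isolation; the function names are made up for
this example and are not part of the package.

```typescript
// True when the last character is a Hangul syllable ending in a consonant.
function hasFinalConsonant(word: string): boolean {
  const code = word.charCodeAt(word.length - 1);
  // Also rejects NaN, which charCodeAt returns for an empty string.
  if (!(code >= 0xac00 && code <= 0xd7a3)) return false;
  return (code - 0xac00) % 28 !== 0;
}

// Picks the object particle: 을 after a consonant, 를 otherwise.
function objectParticle(word: string): '을' | '를' {
  return hasFinalConsonant(word) ? '을' : '를';
}
```

A parser would more likely accept either form after non-Hangul
tokens such as `.active`, since there is no syllable to inspect.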

Arabic

Right-to-left text processing
VSO word order (verb first)
Diacritics stripped during normalization
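
Diacritic stripping can be sketched as a single regular-expression
replace over the Arabic combining marks. Which marks the package
actually removes is not stated here, so the range below is an
assumption covering the common vowel marks.

```typescript
// U+064B to U+0652: tanwin, short vowels, shadda, and sukun.
// U+0670: superscript alef.
const ARABIC_DIACRITICS = /[\u064B-\u0652\u0670]/g;

// Returns the text with those combining marks removed.
function stripDiacritics(text: string): string {
  return text.replace(ARABIC_DIACRITICS, '');
}
```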

Turkish

Agglutinative suffixes instead of separate particles
Vowel harmony in suffix selection
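
Vowel harmony means a suffix's vowel follows the last vowel of the
stem. The plural suffix (-lar after back vowels, -ler after front
vowels) is the simplest case and is used below purely as an
illustration; the suffixes the parser actually handles are not
listed in this document, and `pluralSuffix` is a hypothetical name.

```typescript
// Upper- and lower-case forms are listed explicitly to avoid
// locale-dependent case conversion (I/ı and İ/i differ in Turkish).
const BACK_VOWELS = 'aıouAIOU';
const FRONT_VOWELS = 'eiöüEİÖÜ';

// Scans from the end of the word for the last vowel and picks
// the matching suffix form.
function pluralSuffix(word: string): 'lar' | 'ler' {
  for (let i = word.length - 1; i >= 0; i--) {
    const ch = word.charAt(i);
    if (BACK_VOWELS.includes(ch)) return 'lar';
    if (FRONT_VOWELS.includes(ch)) return 'ler';
  }
  return 'ler'; // arbitrary default when the word has no vowel
}
```

A parser works in the opposite direction, recognizing either form
as the same suffix, but the harmony rule that links them is the same.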

When the Semantic Parser Is Used

LokaScript runtime: the core runtime uses the semantic parser to handle _="..." attributes in non-English languages
Adapter plugin: @lokascript/hyperscript-adapter uses semantic parsing to translate code before the original _hyperscript runtime processes it
Programmatic use: call parse() or translate() from code analysis tools, linters, or documentation generators

Next Steps

Writing in Your Language: practical guide to writing multilingual hyperscript
Grammar Transformation: how the i18n grammar system transforms between languages
API Reference: full semantic parser API documentation
