Semantic Parser

The @lokascript/semantic package parses multilingual
hyperscript directly into executable structures — without
requiring an English intermediate step.

Semantic vs i18n

The two packages serve different purposes:

Aspect      Semantic Parser            i18n Grammar
Purpose     Parse for execution        Transform for display
Direction   Any language → AST         Any language ↔ any language
Output      Structured semantic nodes  Translated text string
Confidence  Scored (0–1)               Deterministic
Use case    Runtime, adapter plugin    Documentation, teaching

Use the semantic parser when you need to execute
multilingual code. Use i18n when you need to translate code
for display or documentation.

How It Works

The parser uses a three-phase pipeline:

1. Tokenization

Input is split into tokens using language-aware rules:

English: "toggle .active"
→ [toggle (keyword), .active (selector)]

Japanese: ".active を 切り替え"
→ [.active (selector), を (particle), 切り替え (keyword)]

Each language has its own tokenization strategy. Japanese uses
particles (such as を and で) as word boundaries. English uses
spaces. Arabic tokenization handles right-to-left text.
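
To make the tokenization step concrete, here is a minimal sketch that splits on whitespace and classifies each token with a per-language vocabulary. The Token and LanguageProfile types, the tokenize function, and the tiny vocabularies are assumptions made for illustration; they are not the package's actual internals.

```typescript
// Hypothetical token model; the real package's types may differ.
type TokenKind = "keyword" | "selector" | "particle" | "identifier";

interface Token {
  value: string;
  kind: TokenKind;
}

interface LanguageProfile {
  keywords: string[];
  particles: string[];
}

// Tiny illustrative vocabularies taken from the examples above.
const profiles: Record<string, LanguageProfile> = {
  en: { keywords: ["toggle"], particles: [] },
  ja: { keywords: ["切り替え"], particles: ["を"] },
};

function tokenize(input: string, language: string): Token[] {
  const profile = profiles[language];
  if (!profile) {
    throw new Error("No profile for language: " + language);
  }
  return input
    .trim()
    .split(/\s+/)
    .filter((part) => part.length > 0)
    .map((value): Token => {
      // CSS-style selectors are recognized the same way in every language.
      if (value.charAt(0) === "." || value.charAt(0) === "#") {
        return { value, kind: "selector" };
      }
      if (profile.particles.indexOf(value) !== -1) {
        return { value, kind: "particle" };
      }
      if (profile.keywords.indexOf(value) !== -1) {
        return { value, kind: "keyword" };
      }
      return { value, kind: "identifier" };
    });
}
```

This sketch only handles whitespace-separated input like the examples shown here; splitting unspaced Japanese at particle positions would need a more involved scanner.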

2. Pattern Matching

Tokens are matched against language-specific command patterns:

English pattern (SVO):

[COMMAND_KEYWORD] [SELECTOR]
→ action=toggle, patient=.active

Japanese pattern (SOV):

[SELECTOR] [PARTICLE:を] [COMMAND_KEYWORD]
→ patient=.active, action=toggle

The same semantic roles are extracted regardless of language —
the patterns just account for different word orders.
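
The role extraction described above can be sketched as a slot-by-slot comparison between a token list and a per-language pattern. Everything named here (Slot, PATTERNS, KEYWORD_TO_ACTION, matchPattern) is hypothetical and only mirrors the two example patterns; the package's real pattern format is not shown in this document.

```typescript
type Kind = "keyword" | "selector" | "particle";

interface Tok {
  value: string;
  kind: Kind;
}

// A pattern is an ordered list of slots; a slot may bind a semantic role.
interface Slot {
  kind: Kind;
  literal?: string; // require an exact token value, e.g. the particle を
  role?: "action" | "patient";
}

interface SemanticNode {
  action: string;
  patient: string;
}

// Illustrative mapping from surface keywords to a canonical action name.
const KEYWORD_TO_ACTION: Record<string, string> = {
  toggle: "toggle",
  "切り替え": "toggle",
};

const PATTERNS: Record<string, Slot[]> = {
  en: [
    { kind: "keyword", role: "action" },
    { kind: "selector", role: "patient" },
  ],
  ja: [
    { kind: "selector", role: "patient" },
    { kind: "particle", literal: "を" },
    { kind: "keyword", role: "action" },
  ],
};

function matchPattern(tokens: Tok[], language: string): SemanticNode | null {
  const pattern = PATTERNS[language];
  if (!pattern || pattern.length !== tokens.length) return null;

  const roles: { action?: string; patient?: string } = {};
  for (let i = 0; i < pattern.length; i++) {
    const slot = pattern[i];
    const token = tokens[i];
    if (slot.kind !== token.kind) return null;
    if (slot.literal !== undefined && slot.literal !== token.value) return null;
    if (slot.role === "action") {
      const action = KEYWORD_TO_ACTION[token.value];
      if (action === undefined) return null;
      roles.action = action;
    } else if (slot.role === "patient") {
      roles.patient = token.value;
    }
  }
  if (roles.action === undefined || roles.patient === undefined) return null;
  return { action: roles.action, patient: roles.patient };
}
```

Under these assumptions, the English and Japanese token lists from the tokenization examples both produce a node with action "toggle" and patient ".active".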

3. Confidence Scoring

Each match produces a confidence score (0–1):

  • ≥ 0.7 — High confidence. Use the semantic result.
  • 0.5–0.7 — Medium. May fall back to traditional parser.
  • < 0.5 — Low. Falls back to original text.

SOV languages (Japanese, Korean, Turkish) naturally produce
lower scores due to greater structural ambiguity. Use
per-language thresholds to tune this.
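
As an illustration of per-language thresholds, the sketch below maps a confidence score to one of the three outcomes listed above. The decide function, the Decision labels, and the override values for Japanese, Korean, and Turkish are assumptions chosen for the example, not defaults shipped by the package.

```typescript
type Decision = "semantic" | "traditional-parser" | "original-text";

interface Thresholds {
  high: number; // at or above: use the semantic result
  low: number;  // below: fall back to the original text
}

// Defaults taken from the bands described above.
const DEFAULT_THRESHOLDS: Thresholds = { high: 0.7, low: 0.5 };

// Hypothetical overrides: SOV languages get a slightly lower bar.
const PER_LANGUAGE: Record<string, Thresholds> = {
  ja: { high: 0.6, low: 0.4 },
  ko: { high: 0.6, low: 0.4 },
  tr: { high: 0.6, low: 0.4 },
};

function decide(confidence: number, language: string): Decision {
  const t = PER_LANGUAGE[language] || DEFAULT_THRESHOLDS;
  if (confidence >= t.high) return "semantic";
  if (confidence >= t.low) return "traditional-parser";
  return "original-text";
}
```

With these illustrative values, a score of 0.65 is accepted for Japanese but routed to the traditional parser for English.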

API

parse(code, options?)

Parse hyperscript with language-aware semantic analysis:

import { parse } from '@lokascript/semantic';

const result = parse('on click toggle .active on me', {
  language: 'en',
  confidenceThreshold: 0.7
});

console.log(result.confidence); // 0.98
console.log(result.language);   // 'en'
console.log(result.ast);        // Parsed AST

detect(code)

Auto-detect the language of hyperscript code:

import { detect } from '@lokascript/semantic';

const lang = detect('クリック で 私 の .active を 切り替え');
// { language: 'ja', confidence: 0.95 }
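
How detect() works internally is not described here. One plausible ingredient is a script check based on Unicode ranges, sketched below under that assumption. The detectByScript name, the range table, and the way confidence is computed (the winning script's share of all recognized script characters) are hypothetical.

```typescript
interface Detection {
  language: string;
  confidence: number;
}

// Unicode ranges for scripts that identify a language fairly reliably.
const SCRIPT_RANGES: { language: string; ranges: [number, number][] }[] = [
  { language: "ja", ranges: [[0x3040, 0x309f], [0x30a0, 0x30ff]] }, // hiragana, katakana
  { language: "ko", ranges: [[0xac00, 0xd7a3]] },                   // Hangul syllables
  { language: "ar", ranges: [[0x0600, 0x06ff]] },                   // Arabic block
];

function detectByScript(code: string): Detection | null {
  const counts: Record<string, number> = {};
  let total = 0;
  for (let i = 0; i < code.length; i++) {
    const cp = code.charCodeAt(i);
    for (const entry of SCRIPT_RANGES) {
      if (entry.ranges.some(([lo, hi]) => cp >= lo && cp <= hi)) {
        counts[entry.language] = (counts[entry.language] || 0) + 1;
        total++;
        break;
      }
    }
  }
  // No distinctive script found (e.g. English or Turkish, both Latin-based).
  if (total === 0) return null;

  let best = "";
  let bestCount = 0;
  for (const language of Object.keys(counts)) {
    if (counts[language] > bestCount) {
      best = language;
      bestCount = counts[language];
    }
  }
  return { language: best, confidence: bestCount / total };
}
```

A script check alone cannot separate languages that share a script, so a real detector would presumably also look at keywords.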

translate(code, fromLang, toLang)

Parse in one language and render in another:

import { translate } from '@lokascript/semantic';

const english = translate('.active を 切り替え', 'ja', 'en');
// → 'toggle .active'
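
Conceptually, translation is parsing to semantic roles and then rendering those roles in the target language's word order. The sketch below covers only the rendering half for the toggle example; SemanticNode, KEYWORDS, and render are hypothetical names, and the real translate() may work differently.

```typescript
interface SemanticNode {
  action: string;
  patient: string;
}

// Hypothetical keyword tables: canonical action name to surface keyword.
const KEYWORDS: Record<string, Record<string, string>> = {
  en: { toggle: "toggle" },
  ja: { toggle: "切り替え" },
};

function render(node: SemanticNode, language: string): string {
  const table = KEYWORDS[language];
  const keyword = table ? table[node.action] : undefined;
  if (keyword === undefined) {
    throw new Error("Cannot render '" + node.action + "' in " + language);
  }
  switch (language) {
    case "ja":
      // SOV: patient, object particle, then the verb.
      return node.patient + " を " + keyword;
    default:
      // SVO: verb first, then the patient.
      return keyword + " " + node.patient;
  }
}
```

Rendering the same node for "en" and "ja" yields the two surface forms used in the examples in this document.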

Language-Specific Features

Japanese

  • Particles (such as を, で, and から) mark semantic roles
  • No spaces — word boundaries from particle positions
  • Morphological normalization handles conjugated verb forms
    (e.g., 切り替えて → 切り替え)
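
A very small sketch of the normalization idea: strip a conjugation suffix only when the remaining stem is a known keyword. The keyword list, the single て suffix, and the normalizeKeyword name are assumptions for illustration; real morphological handling covers many more forms.

```typescript
// Illustrative data only: one keyword stem and one conjugation suffix.
const KNOWN_KEYWORDS = ["切り替え"];
const CONJUGATION_SUFFIXES = ["て"];

function normalizeKeyword(word: string): string {
  if (KNOWN_KEYWORDS.indexOf(word) !== -1) return word;
  for (const suffix of CONJUGATION_SUFFIXES) {
    const endsWithSuffix =
      word.length > suffix.length && word.slice(-suffix.length) === suffix;
    if (endsWithSuffix) {
      const stem = word.slice(0, word.length - suffix.length);
      // Only accept the stem if it is a keyword we actually know.
      if (KNOWN_KEYWORDS.indexOf(stem) !== -1) return stem;
    }
  }
  // Unknown words pass through unchanged.
  return word;
}
```

Checking the stem against the keyword list keeps the function from mangling unrelated words that happen to end in the same character.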

Korean

  • Similar particle system to Japanese (for example 을/를 and 에서)
  • Particle form depends on the preceding syllable: 을 after a
    final consonant, 를 after a vowel
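
The 을/를 choice follows from whether the preceding syllable ends in a consonant, which can be read directly from a Hangul syllable's code point. The sketch below relies on the standard Unicode layout of precomposed Hangul syllables; the function names are made up for this example and say nothing about how the package implements it.

```typescript
const HANGUL_BASE = 0xac00; // first precomposed syllable, 가
const HANGUL_LAST = 0xd7a3; // last precomposed syllable, 힣
const FINALS_PER_BLOCK = 28; // 27 final consonants plus "no final"

function hasFinalConsonant(syllable: string): boolean {
  const code = syllable.charCodeAt(0);
  if (isNaN(code) || code < HANGUL_BASE || code > HANGUL_LAST) {
    return false; // not a precomposed Hangul syllable
  }
  // Index 0 within each block of 28 means the syllable has no final consonant.
  return (code - HANGUL_BASE) % FINALS_PER_BLOCK !== 0;
}

function objectParticle(noun: string): string {
  const last = noun.charAt(noun.length - 1);
  return hasFinalConsonant(last) ? "을" : "를";
}
```

For non-Hangul input such as a CSS selector, this sketch defaults to 를; a parser can avoid that question entirely by accepting either form.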

Arabic

  • Right-to-left text processing
  • VSO word order (verb first)
  • Diacritics stripped during normalization
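
Diacritic stripping can be sketched with a single regular expression over the Arabic combining marks. Which marks the package actually removes is not stated here, so the range below (U+064B to U+0652 plus the superscript alef U+0670) and the stripDiacritics name are assumptions.

```typescript
// Short-vowel marks, tanwin, shadda, and sukun, plus superscript alef.
const ARABIC_DIACRITICS = /[\u064B-\u0652\u0670]/g;

function stripDiacritics(text: string): string {
  return text.replace(ARABIC_DIACRITICS, "");
}
```

Base letters, Latin text, and selectors pass through unchanged, so vocalized and unvocalized spellings of a keyword compare equal after normalization.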

Turkish

  • Agglutinative suffixes instead of separate particles
  • Vowel harmony in suffix selection
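
To show what vowel harmony means for suffix selection, the sketch below builds the Turkish accusative suffix, whose vowel is chosen from ı, i, u, ü according to the last vowel of the noun. It assumes lowercase input and ignores consonant softening and other irregularities; the function names are hypothetical and unrelated to the package's internals.

```typescript
// Last vowel of the stem mapped to the matching high vowel of the suffix.
const HARMONY: Record<string, string> = {
  a: "ı", "ı": "ı",
  e: "i", i: "i",
  o: "u", u: "u",
  "ö": "ü", "ü": "ü",
};

function isVowel(ch: string): boolean {
  return Object.prototype.hasOwnProperty.call(HARMONY, ch);
}

function lastVowel(word: string): string | null {
  for (let i = word.length - 1; i >= 0; i--) {
    const ch = word.charAt(i);
    if (isVowel(ch)) return ch;
  }
  return null;
}

function accusative(noun: string): string {
  const vowel = lastVowel(noun);
  if (vowel === null) return noun; // nothing to harmonize with
  const suffixVowel = HARMONY[vowel];
  // Vowel-final stems take a buffer consonant "y" before the suffix.
  const buffer = isVowel(noun.charAt(noun.length - 1)) ? "y" : "";
  return noun + buffer + suffixVowel;
}
```

A parser working in the other direction would need to recognize all four suffix variants, with or without the buffer consonant, as the same role marker.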

When the Semantic Parser Is Used

  • LokaScript runtime — The core runtime uses the semantic
    parser to handle _="..." attributes in non-English
    languages
  • Adapter plugin — The @lokascript/hyperscript-adapter
    uses semantic parsing to translate code before the original
    _hyperscript runtime processes it
  • Programmatic use — Call parse() or translate() for
    code analysis tools, linters, or documentation generators
