Personal Translation Builder — Research Notes

Last modified March 22, 2026

Referenced from ideas.md → Personal Translation Builder

ShortGloss Matching Bugs (2026-02-23, Matt 24:5)

Three bugs in the original ShortGloss matching were identified and fixed:

1. Comma-inside-parens bug

The gloss entry "far (passed, spent)" was being split into 3 items instead of 1, which pushed "many" for G4183 past the 8-item cap.

Fix: Paren-aware comma splitter + cap raised to 12.
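A minimal sketch of a paren-aware comma splitter, to illustrate the fix. The function name `split_glosses` and the `cap` parameter are hypothetical; the actual PopupBuilder code may be structured differently.

```python
def split_glosses(text: str, cap: int = 12) -> list[str]:
    """Split a gloss list on commas, ignoring commas inside parentheses."""
    items: list[str] = []
    current: list[str] = []
    depth = 0  # current parenthesis nesting level
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth = max(depth - 1, 0)  # tolerate a stray ")"
        if ch == "," and depth == 0:
            # Top-level comma: close the current item.
            items.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    items.append("".join(current).strip())
    # Drop empty pieces, then apply the item cap (12 after the fix).
    return [item for item in items if item][:cap]
```

With this approach "far (passed, spent)" stays a single item, so later entries are not pushed past the cap by spurious splits.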

2. First-match-wins greedy bug

The candidates for γάρ were ["and","as","because","but","even","for"...]. "and" came first in the list and also occurred in the sentence (at position 12), so it was returned instead of "for" (the correct positional match).

Fix: Position-aware second-pass refinement assigns each ShortGloss to the candidate closest to its expected English position.
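One way the position-aware refinement could work is sketched below. It assumes the expected English position is the Greek word's relative index scaled to the English token count; the function name `pick_gloss`, its parameters, and that scaling rule are illustrative assumptions, not the project's actual code.

```python
import re
from typing import Optional


def pick_gloss(candidates: list[str], sentence: str,
               word_index: int, word_count: int) -> Optional[str]:
    """Return the candidate whose occurrence lies closest to the expected position."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    if not tokens or word_count <= 0:
        return None
    # Assumption: a Greek word at relative position p lands near p in English.
    expected = (word_index / max(word_count - 1, 1)) * (len(tokens) - 1)
    best: Optional[str] = None
    best_dist: Optional[float] = None
    for cand in candidates:
        needle = cand.lower()
        for pos, tok in enumerate(tokens):
            if tok == needle:
                dist = abs(pos - expected)
                # Strict "<" keeps the earlier candidate on ties.
                if best_dist is None or dist < best_dist:
                    best, best_dist = cand, dist
    return best
```

Under this rule a candidate that merely appears somewhere in the sentence no longer wins by list order; it has to be the nearest occurrence to where the Greek word sits.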

3. ShortGloss not in displayed Meanings

Fix: Safety-net insert added so the highlighted meaning is always visible.
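The safety net can be pictured as follows. This is only a sketch: `ensure_visible` is a hypothetical name, and placing the missing gloss at the front of the list (rather than elsewhere) is an assumption.

```python
def ensure_visible(meanings: list[str], short_gloss: str) -> list[str]:
    """Return the meanings list, adding short_gloss if it is not already present."""
    present = {m.lower() for m in meanings}
    if short_gloss and short_gloss.lower() not in present:
        # Assumption: the inserted gloss goes first so it is never truncated.
        return [short_gloss] + list(meanings)
    return list(meanings)
```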

Remaining hard case

BSB/KJV translation divergence. λέγοντες (G3004) = "claiming" in BSB but "say" in KJV usage — no amount of matching can bridge this without BSB-specific word-level alignment data.


BSB Word-Level Alignment Import (2026-02-23)

bereanbible.com/bsb_tables.tsv (CC0/public domain) was imported via 10_import_bsb_alignment.py. 112,747 NT Greek words now carry an authoritative BSB gloss in greek_words.bsb_gloss. PopupBuilder uses this as the primary ShortGloss, falling back to the KJV heuristic only when bsb_gloss is null (TR-only/untranslated words).

Result: λέγοντες (Matt 24:5) → "claiming" ✓, γάρ → "For" ✓.


Greek→English Word Absorption (2026-02-23, Matt 24:1)

Investigated why some Greek words appear in the interlinear with no visible English counterpart. Two root causes identified:

1. Greek grammatical machinery absorbed by English structure

Greek externalizes relationships as standalone words that English handles implicitly:

  • Definite articles (ὁ, τοῦ, οἱ, τὰς) — Greek requires articles before proper nouns and generic nouns; English omits them or folds them into possessives ("His disciples" = οἱ μαθηταὶ αὐτοῦ — the article οἱ is invisible in English)
  • Prepositions — ἀπὸ ("from") in "left from the temple" gets absorbed into BSB's "was walking away"; no standalone English token
  • Conjunctions restructured — Καὶ ("and") at sentence start recast as "As" (temporal subordinator) in BSB, embedded in clause structure
  • Repeated nouns replaced by pronouns — τοῦ ἱεροῦ ("of the temple") at verse end → BSB uses "its" buildings; two Greek words collapse into one English pronoun

2. BSB deliberately absorbs dative/accusative objects into verb phrases

E.g. αὐτῷ ("to him", indirect object of ἐπιδεῖξαι) has bsb_gloss = NULL because BSB renders "came up to Him to point out its buildings" — the "to him" is already implied by the verb phrase and not given its own token.

When this happens, the KJV heuristic fires and produces wrong results (e.g. "the other" for αὐτῷ — a valid but rare KJV rendering of αὐτός that happens to score best against the sentence).


Absorbed Word Detection — Design (2026-02-23)

PopupBuilder now detects "absorbed" words: if the verse has BSB data but a word's gloss is null → the word was deliberately not tokenized by BSB.

Visual treatment:

  • Absorbed words show "–" (en-dash) as ShortGloss in dim purple-gray (Theme.AbsorbedGloss)
  • KJV meanings list still shown for exploration, but none highlighted in gold — all dim
  • For rare verses with NO BSB data (18 NT verses): KJV heuristic kept but rendered in muted amber (Theme.GuessGloss)

Color key: gold = authoritative BSB | muted amber = KJV guess | dim = absorbed/alternatives

Impact: Fixes "the other" for αὐτῷ (Matt 24:1) and ~28,973 other null-gloss words.
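The three-way decision described above can be sketched like this. The `Word` record, the function name, and the rule that a verse "has BSB data" when at least one of its words carries a non-null gloss are assumptions made for illustration; the color labels follow the color key.

```python
from dataclasses import dataclass
from typing import Optional

ABSORBED_MARK = "–"  # en-dash shown for absorbed words


@dataclass
class Word:
    greek: str
    bsb_gloss: Optional[str]  # None when BSB gave this word no token
    kjv_guess: str            # result of the KJV heuristic


def short_glosses(verse: list[Word]) -> list[tuple[str, str]]:
    """Return (ShortGloss text, color label) for each word in a verse."""
    # Assumption: any non-null gloss means the verse has BSB alignment data.
    verse_has_bsb = any(w.bsb_gloss for w in verse)
    result = []
    for w in verse:
        if w.bsb_gloss:
            result.append((w.bsb_gloss, "gold"))          # authoritative BSB
        elif verse_has_bsb:
            result.append((ABSORBED_MARK, "dim"))         # absorbed word
        else:
            result.append((w.kjv_guess, "muted amber"))   # KJV guess
    return result
```

The key point is that the KJV heuristic is only consulted when the whole verse lacks BSB data, so a null gloss inside an aligned verse can no longer produce a misleading guess.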


Composed Sentence Word-Order Fix (2026-02-23)

Two issues discovered when the "Your translation" footer was first tested:

1. Hebrew composed sentence is nonsensical

Hebrew lacks BSB word-level alignment; ShortGloss comes from Strong's dictionary definitions (grammatical tags like "(Qal)", generic entries like "day"). Not verse-specific enough to compose sentences.

Fix: Hebrew footer shows disclaimer instead of composed sentence.

2. Greek follows Greek word order, not English

Greek clause order differs from English (e.g. Mark 3:2: Greek puts ἵνα κατηγορήσωσιν "In order to accuse" at the end, BSB moves it to the front).

Fix: Reorder composed parts by finding each BsbGloss's character position in the English sentence text. Uses closest-match heuristic for words appearing multiple times (e.g. "the"). Falls back to proportional position estimate for words without a match.
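A rough sketch of the reordering step, assuming plain case-insensitive substring search against the sentence text. The function name `reorder_by_english` and the exact form of the proportional estimate are hypothetical. Substring search can also match inside longer words, which a real implementation would likely need to guard against.

```python
def reorder_by_english(glosses: list[str], sentence: str) -> list[str]:
    """Sort glosses (given in Greek word order) by their position in the English text."""
    lowered = sentence.lower()
    n = len(glosses)
    keyed = []
    for i, gloss in enumerate(glosses):
        # Proportional estimate: where the word would fall if order were unchanged.
        estimate = (i / max(n - 1, 1)) * len(lowered)
        needle = gloss.lower()
        hits = []
        start = lowered.find(needle) if needle else -1
        while start != -1:
            hits.append(start)
            start = lowered.find(needle, start + 1)
        if hits:
            # Repeated words (e.g. "the"): take the occurrence nearest the estimate.
            position = min(hits, key=lambda h: abs(h - estimate))
        else:
            # No match in the sentence: fall back to the estimate itself.
            position = estimate
        keyed.append((position, i, gloss))
    # Sort by position; the original index breaks ties deterministically.
    return [gloss for _, _, gloss in sorted(keyed)]
```

For example, a purpose clause that Greek places last moves to the front when the English sentence opens with it, while unmatched glosses stay roughly where Greek order put them.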