Pre-vowel Accents in Chabad’s CTR

The Chabad website has an edition of the Hebrew Bible called The Complete Tanach with Rashi (CTR). (See my document, “On the Provenance of Chabad’s CTR.”)

The CTR edition distinguishes three pairs of accents using a nonstandard mechanism rather than the code points dedicated to making these distinctions. On a letter that carries a vowel mark (including HOLAM), CTR distinguishes the following prepositives from their impositive “lookalikes” by the logical order of the accent code point relative to the vowel code point:

The code point TIPEHA means either deḥi (!) or tarḥa / tipeḥa, depending on context.
- Before a vowel, it means deḥi (!).
- After a vowel (the normal order), it means tarḥa / tipeḥa (the normal meaning of TIPEHA).
The code point GERESH means either geresh muqdam (!) or geresh, depending on context.
- Before a vowel, it means geresh muqdam (!).
- After a vowel (the normal order), it means geresh (the normal meaning of GERESH).
The code point YETIV means either yetiv or mahapakh (!), depending on context.
- Before a vowel, it means yetiv (the normal meaning of YETIV).
- After a vowel (the normal order), it means mahapakh (!).
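
As a minimal sketch, the six rules above can be written as a lookup table in Python. The names `CTR_MEANING` and `ctr_meaning` and the position labels `"before"` and `"after"` are my own, chosen for illustration. Nothing here comes from CTR itself.

```python
# Code points involved (Unicode names given in the comments).
TIPEHA = "\u0596"  # HEBREW ACCENT TIPEHA
YETIV = "\u059A"   # HEBREW ACCENT YETIV
GERESH = "\u059C"  # HEBREW ACCENT GERESH

# Key: (accent code point, logical position relative to the vowel on the
# same letter). Value: the accent that CTR appears to intend.
CTR_MEANING = {
    (TIPEHA, "before"): "deḥi",
    (TIPEHA, "after"): "tarḥa / tipeḥa",
    (GERESH, "before"): "geresh muqdam",
    (GERESH, "after"): "geresh",
    (YETIV, "before"): "yetiv",
    (YETIV, "after"): "mahapakh",
}


def ctr_meaning(accent: str, position: str) -> str:
    """Return the intended accent; position is 'before' or 'after'."""
    return CTR_MEANING[(accent, position)]
```

Laid out this way, the table makes it easy to see that YETIV is the odd one out: it is the only code point whose surprising meaning sits in the "after" row.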

There are four levels of strangeness here.

As mentioned above, there are code points dedicated to making these distinctions. These code points have been around since the introduction of Hebrew accent code points, in Unicode 2.0.
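The accent-before-vowel order will be normalized away in some environments, notably in most web browsers. Unicode normalization sorts the marks on a letter by canonical combining class, and every Hebrew vowel point has a lower class than every Hebrew accent, so after normalization the vowel always comes first.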
The use of YETIV breaks the pattern established by TIPEHA and GERESH, where the impositive code point does double duty, and the prepositive code point is not used. If the pattern were followed, MAHAPAKH rather than YETIV would be used: MAHAPAKH before a vowel would mean yetiv (surprising but at least following the pattern) and MAHAPAKH after a vowel (the normal order) would mean mahapakh (the normal meaning of MAHAPAKH).
No font I am aware of will “understand” this encoding.
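
The normalization problem can be checked with Python's standard `unicodedata` module. The sketch below is only an illustration: it builds a he with TIPEHA before SEGOL (CTR's way of writing deḥi) and shows that NFC normalization turns it into the ordinary vowel-then-accent order. The variable names and the helper `combining_classes` are mine.

```python
import unicodedata

HE = "\u05D4"      # HEBREW LETTER HE
TIPEHA = "\u0596"  # HEBREW ACCENT TIPEHA (canonical combining class 220)
SEGOL = "\u05B6"   # HEBREW POINT SEGOL (canonical combining class 16)

ctr_style = HE + TIPEHA + SEGOL  # accent before vowel: CTR's deḥi
usual = HE + SEGOL + TIPEHA      # vowel before accent: tarḥa / tipeḥa


def combining_classes(text):
    """Return the canonical combining class of each code point in text."""
    return [unicodedata.combining(ch) for ch in text]


print(combining_classes(ctr_style))  # [0, 220, 16]

# Normalization reorders marks by combining class, so the distinction
# that CTR relies on disappears.
normalized = unicodedata.normalize("NFC", ctr_style)
print(normalized == usual)  # True
```

The same thing happens with NFD, since both forms apply the same canonical ordering step.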

In some but not all cases, the logical order of these code points reflects a desired horizontal visual order. Even when it does reflect a desired visual order, this visual order is very unlikely to be achieved, except in the case of YETIV before a vowel. In all other cases, few if any fonts will render the marks in the desired visual order, and in normalizing contexts like most web browsers, the font won’t even get a chance to try. In detail:

The code point TIPEHA before a vowel means deḥi (!), e.g. to encode the segol and deḥi under the letter he in הֶ֭חרשתי (Psalm 32:3) (fully: הֶ֭חֱרַשְׁתִּי ),
- Instead of ‹POINT SEGOL, DEHI›, CTR uses ‹TIPEHA, POINT SEGOL›.
- I.e., instead of הֶ+ה֭, CTR uses ה֖+הֶ. (The plus sign expression indicates a kind of concatenation, and is meant to be read right to left.)
- This is very unlikely to have the desired appearance in most fonts. Plus, in normalizing contexts like most web browsers, the font won’t even get a chance to try. In this document’s context, it will look like this: הֶ֖חרשתי.
- Above I have ignored a געיה that CTR also has under the he. I have ignored it because (1) it is not relevant to the issue at hand and (2) it is one of many “extra” געיה marks that CTR has compared to many other editions.
The code point TIPEHA after a vowel means tarḥa / tipeḥa, e.g. וְאֵ֖ין (Psalm 32:2).
The code point GERESH before a vowel means geresh muqdam (!), e.g. to encode the tsere and geresh muqdam on the letter alef in אֵ֝ליו (Psalm 32:6) (fully: אֵ֝לָ֗יו ),
- Instead of ‹TSERE, GERESH MUQDAM›, CTR uses ‹GERESH, TSERE›.
- I.e., instead of אֵ+א֝, CTR uses א֜+אֵ. (The plus sign expression indicates a kind of concatenation, and is meant to be read right to left.)
- This is very unlikely to have quite the desired appearance in most fonts, though the appearance will likely be close to what is desired. In normalizing contexts like most web browsers, the font won’t even get a chance to try. In this document’s context, it will look like this: אֵ֜ליו. Because the two marks in question are not both below-marks, this looks pretty close to the desired appearance. But it is still not quite what is desired.
The code point GERESH after a vowel means geresh, e.g. הַמַּ֜יִם (Genesis 1:9).
The code point YETIV before a vowel means yetiv, e.g. to encode the ḥiriq and yetiv under the letter kaf in כִּ֚י (Joshua 2:11),
- Instead of ‹HIRIQ, YETIV›, CTR uses ‹YETIV, HIRIQ›.
- I.e., instead of כִּ+כּ֚, CTR uses כּ֚+כִּ. (The plus sign expression indicates a kind of concatenation, and is meant to be read right to left.)
- Although this is a strange order to encode it in, this is very likely to have the desired appearance in most fonts.
The code point YETIV after a vowel means mahapakh (!), e.g. to encode the qamats and mahapakh under the letter tav in אתָּ֤ה (Psalm 32:7) (fully: אַתָּ֤ה׀ ),
- Instead of ‹QAMATS, MAHAPAKH›, CTR uses ‹QAMATS, YETIV›.
- I.e., instead of תָּ+תּ֤, CTR uses תָּ+תּ֚. (The plus sign expression indicates a kind of concatenation, and is meant to be read right to left.)
- This is very unlikely to have the desired appearance in most fonts. Plus, in normalizing contexts like most web browsers, the font won’t even get a chance to try. In this document’s context, it will look like this: אתָּ֚ה.
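
For anyone who wants to convert CTR text to the dedicated code points, here is a rough sketch in Python. It rests on assumptions of mine that the discussion above does not settle: that "vowel" means the code points U+05B0 through U+05BB plus U+05C7, that "before a vowel" means before the first vowel on the same letter, and that the input has not already been normalized. The function names are hypothetical. Bare accents are left untouched, since order cannot disambiguate them.

```python
import unicodedata

TIPEHA, YETIV, GERESH = "\u0596", "\u059A", "\u059C"
DEHI, GERESH_MUQDAM, MAHAPAKH = "\u05AD", "\u059D", "\u05A4"

BEFORE_VOWEL = {TIPEHA: DEHI, GERESH: GERESH_MUQDAM}  # YETIV stays YETIV
AFTER_VOWEL = {YETIV: MAHAPAKH}  # TIPEHA and GERESH keep their meaning


def is_vowel(ch):
    # Assumption: SHEVA..QUBUTS (which includes HOLAM) and QAMATS QATAN.
    return "\u05B0" <= ch <= "\u05BB" or ch == "\u05C7"


def recode_cluster(cluster):
    """Recode one base letter plus its marks using the dedicated code points."""
    vowel_at = next((i for i, ch in enumerate(cluster) if is_vowel(ch)), None)
    if vowel_at is None:
        return cluster  # bare accent (or no accent): leave as is
    out = []
    for i, ch in enumerate(cluster):
        table = BEFORE_VOWEL if i < vowel_at else AFTER_VOWEL
        out.append(table.get(ch, ch))
    return "".join(out)


def recode_ctr(text):
    """Split text into base-plus-marks clusters and recode each one."""
    clusters, current = [], ""
    for ch in text:
        if current and unicodedata.category(ch) == "Mn":
            current += ch  # nonspacing mark: belongs to the current letter
        else:
            if current:
                clusters.append(current)
            current = ch
    if current:
        clusters.append(current)
    return "".join(recode_cluster(c) for c in clusters)
```

For example, `recode_ctr("\u05D4\u0596\u05B6")` (he, TIPEHA, SEGOL) yields he, DEHI, SEGOL. Once the dedicated code points are in place, the result can be normalized without losing anything.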

As noted above, CTR’s strange vowel-relative distinctions apply not only to the below-vowels but also to the one above-vowel, HOLAM.

The TIPEHA code point before HOLAM means deḥi, e.g. (rendering CTR’s contents in this document’s context) אֹ֖זֶן, אֹ֖מֶר, and כָּל־רֹ֖אַי (Psalm 18:45, 19:4, and 22:8). But, consistent with the general sloppiness of CTR, sometimes TIPEHA appears after HOLAM even when a deḥi is (or should be) intended, e.g. (rendering CTR’s contents in this document’s context) in בֹּ֖קֶר and עֹ֖ז (Psalm 5:4 and 22:11). Note that TIPEHA before HOLAM is somewhat analogous to GERESH before a below-vowel. In both cases, the logical order does not reflect a desired horizontal visual order, since in each case one of the marks is a below-mark and the other is an above-mark. Rather, the logical order reflects at most a desired horizontal visual alignment (right-biased rather than centered) of the accent relative to a vowel-free area of its letter. (That area is the letter’s top for GERESH and its bottom for TIPEHA.) Because this is completely nonstandard, the desired visual alignment is very unlikely to be achieved in most or all fonts.
The GERESH code point before HOLAM means geresh muqdam, e.g. (rendering CTR’s contents in this document’s context) מִכָּל־רֹ֜דְפַ֗י, פֹּ֜רֵ֗ק, and כָּל־אֹ֜יְבָ֗יו (Psalm 7:2, 7:3, and 18:1). Note that GERESH before HOLAM, like GERESH before a below-vowel, does not reflect a desired distinction in horizontal visual order, since both geresh and geresh muqdam should, visually, appear before ḥolam. Rather, the logical order reflects at most a desired distinction in horizontal visual alignment (right-biased rather than centered) of the accent relative to its letter. Or, if you like, you can think of GERESH logically before HOLAM as meaning a geresh visually far before ḥolam, as opposed to GERESH logically after HOLAM, which means a geresh still visually before ḥolam, but not so far before it.
The YETIV code point before HOLAM means yetiv, e.g. כֹּ֚ל (Joshua 1:4), and YETIV after HOLAM means mahapakh, e.g. (rendering CTR’s contents in this document’s context) יֵ֘בֹ֚שׁוּ (Psalm 6:11). Note that YETIV before HOLAM is yet another case where the logical order does not reflect a desired horizontal visual order, since YETIV is a below-mark and HOLAM an above-mark.

One might naturally wonder how CTR encodes the six accents of these three “lookalike” pairs when they are “bare,” i.e. when they sit on a letter that has no vowel mark.

The TIPEHA code point is used for both bare deḥi and bare tarḥa. This results in an ambiguity. E.g. CTR codes the bare deḥi in כִּי־ה֭וּא (Psalm 24:2) as TIPEHA. In this document’s context, that TIPEHA will look like this: ה֖וּא, i.e. it will look like a tarḥa. This makes it indistinguishable, for example, from the bare tarḥa in ע֖וּרָה (Psalm 59:5).
The GERESH code point is used for both bare geresh muqdam and bare geresh. These accents are exclusive to the poetic and prose systems respectively, so even when they are bare there is no ambiguity (assuming we know which accent system the word belongs to). As always, it is important to be aware that although Job is, for the most part, a poetically accented book, its introduction and conclusion are prose-accented. So a bare GERESH in Job could be either a geresh muqdam or a geresh, depending on whether its verse is in the range 3:2 to 42:6 (inclusive).
The YETIV code point is used for both bare yetiv and bare mahapakh. Yetiv is exclusive to the prose system, so if we know that the word belongs to the poetic system, there is no ambiguity: a bare YETIV must mean mahapakh. If the word belongs to the prose system, a bare YETIV is ambiguous.
With no pattern I can discern, sometimes MAHAPAKH is used for a bare mahapakh, as in שִׂמְח֤וּ (Psalm 32:11).
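
The reasoning about bare accents can be sketched in Python as a small lookup keyed on the accent system. Several things here are my own additions rather than anything stated above: the book labels `"Psalms"` and `"Proverbs"` (only Job is discussed above), the premise that deḥi occurs only in the poetic system (so a bare TIPEHA in prose can only be tipeḥa), and all function and variable names. Treat it as an illustration of the logic, not as a tested tool.

```python
TIPEHA, YETIV, GERESH, MAHAPAKH = "\u0596", "\u059A", "\u059C", "\u05A4"


def is_poetic(book, chapter, verse):
    """Return True if the verse uses the poetic accent system."""
    if book == "Job":
        # Job is prose-accented outside the range 3:2 to 42:6 (inclusive).
        return (3, 2) <= (chapter, verse) <= (42, 6)
    # Assumption: the other poetically accented books are Psalms and Proverbs.
    return book in {"Psalms", "Proverbs"}


# Key: (bare accent code point, is the verse poetic?).
# Value: every accent the code point could stand for in CTR.
BARE_CANDIDATES = {
    (TIPEHA, True): ("deḥi", "tarḥa"),      # ambiguous
    (TIPEHA, False): ("tipeḥa",),           # assumes deḥi is poetic-only
    (GERESH, True): ("geresh muqdam",),
    (GERESH, False): ("geresh",),
    (YETIV, True): ("mahapakh",),
    (YETIV, False): ("yetiv", "mahapakh"),  # ambiguous
    (MAHAPAKH, True): ("mahapakh",),
    (MAHAPAKH, False): ("mahapakh",),
}


def bare_candidates(accent, book, chapter, verse):
    """List what a bare accent code point could mean at this location."""
    return BARE_CANDIDATES[(accent, is_poetic(book, chapter, verse))]
```

A result with more than one candidate marks a case that cannot be resolved from the encoding alone, such as a bare TIPEHA in Psalms.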

In conclusion, CTR uses and abuses Unicode in strange ways that in most environments (font, browser, etc.) will not have the desired effect.