<ruby>: The Ruby Annotation element

dalke · on Dec 4, 2021

Interesting!

I noticed "It can also be used for annotating other kinds of text" and wanted to experiment with being able to number specific letters in a string.

More specifically, SMILES is a linear molecular structure notation. "O" is water, "COO" is ethyl alcohol, "c1ccccc1" is a benzene ring, and much more. (See https://en.wikipedia.org/wiki/Simplified_molecular-input_lin... .)

I want to annotate atom positions in the SMILES string. I currently do this with smiview ("pip install smiview"), which prints the atom numbers as text over the string, as in this example using phenol:

  1 23456 7
  c1ccccc1O
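
A rough sketch of how such a numbering line can be built, for anyone who wants to play along. This is not smiview's actual code: `ATOM_RE` and `number_line` are hypothetical names, and the regex is a deliberately simplified tokenizer (bracket atoms, Cl/Br, and the one-letter organic subset), not a full SMILES parser.

```python
import re

# Simplified atom pattern (an assumption, not a complete SMILES grammar):
# bracket atoms first, then two-letter halogens, then one-letter atoms.
ATOM_RE = re.compile(r"\[[^\]]+\]|Cl|Br|[BCNOPSFI]|[bcnops]")


def number_line(smiles):
    """Return a line of 1-based atom numbers aligned over the atoms."""
    line = [" "] * len(smiles)
    for index, match in enumerate(ATOM_RE.finditer(smiles), start=1):
        label = str(index)
        start = match.start()
        # Write the label starting at the atom's first character.
        line[start:start + len(label)] = label
    return "".join(line).rstrip()


smiles = "c1ccccc1O"
print(number_line(smiles))
print(smiles)
```

Labels of 10 and above would collide with their neighbours in this sketch, so it only works as-is for small molecules.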

I wanted to try it with ruby so I used:

  <ruby>
  c<rt>1</rt>1c<rt>2</rt>c<rt>3</rt>c<rt>4</rt>c<rt>5</rt>c<rt>6</rt>1
  </ruby>

The "2" is located over the center of "1c" instead of over the second "c" like I wanted.

How do I get it to center only over the "c"?

I tried changing the CSS too, using this catch-all:

  ruby {
      font-size: 2em;
      ruby-align: center;
      text-align: center;
  }
  * {ruby-align: center;}

No luck. I also tried wrapping things in a span, like:

  c<rt>1</rt>1<span>c<rt>2</rt></span>c<rt>3</rt>

but got the 3 as a new ruby line, centered over the "1cc", itself with a ruby "2" between the second and third "c".

I tried other combinations of <span>, to no avail.

vore · on Dec 4, 2021

Multiple ruby spans, perhaps?

  <ruby>
   c<rt>1</rt>
  </ruby>1<ruby>
   c<rt>2</rt>
  </ruby><ruby>
   c<rt>3</rt>c<rt>4</rt>c<rt>5</rt>c<rt>6</rt>1
  </ruby>

dalke · on Dec 4, 2021

Yes, that was the solution. Thanks!

twic · on Dec 4, 2021

CCO is ethanol. COO is methyl hydroperoxide.

SMILES is a lovely standard, really simple and easy enough to write by hand, but powerful enough to describe real molecules in detail. Not long ago, i saw a chemical structure used as part of an illustration, and wondered what it was. I transcribed it as SMILES, put it into a chemical search engine, and found out what it was (nothing interesting!).

dalke · on Dec 5, 2021

D'oh! Yup. You'd think I would know that by now. I'll attribute it to trying this out over the weekend, in the downtime around the kids' bedtime routine.

gvx · on Dec 4, 2021

As far as I can tell, the only content inside of a ruby tag should be annotated text or its annotation. The 1, which should not be annotated, should not be inside the <ruby> tag.

I get a good result with this:

    <ruby>c<rt>1</rt></ruby>1<ruby>c<rt>2</rt>c<rt>3</rt>c<rt>4</rt>c<rt>5</rt>c<rt>6</rt></ruby>1
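
If you want to generate this kind of markup rather than type it by hand, something like the following might do. It is only a sketch: `ATOM_RE` and `smiles_to_ruby` are made-up names, the atom regex is a simplified assumption rather than a real SMILES parser, and it wraps every atom in its own <ruby> (a slight variation on the markup above) so that ring-closure digits, bonds and parentheses always stay outside.

```python
import re
from html import escape

# Simplified atom pattern (assumption): bracket atoms, Cl/Br, one-letter atoms.
ATOM_RE = re.compile(r"\[[^\]]+\]|Cl|Br|[BCNOPSFI]|[bcnops]")


def smiles_to_ruby(smiles):
    """Wrap each atom in its own <ruby>, numbering atoms from 1."""
    parts = []
    pos = 0
    for index, match in enumerate(ATOM_RE.finditer(smiles), start=1):
        # Everything between atoms is copied through, outside any <ruby>.
        parts.append(escape(smiles[pos:match.start()]))
        parts.append(f"<ruby>{escape(match.group())}<rt>{index}</rt></ruby>")
        pos = match.end()
    parts.append(escape(smiles[pos:]))
    return "".join(parts)


print(smiles_to_ruby("c1ccccc1"))
```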

alin23 · on Dec 4, 2021

Indeed, this looks quite good: https://cln.sh/5MWsIR

dalke · on Dec 4, 2021

Thanks to you for the working demo, and to gvx for figuring it out!

UPDATE: Here's an example for theobromine - https://jsfiddle.net/j84z1kyb/ .

It looks great!

But it's not as useful as I hoped it would be. Copy&paste captures the numeric annotations. I probably should have expected that, but didn't.

And highlighting is wonky. In Safari and Firefox, I seem to get a character and the ruby annotation for the character next to it, more often than I do the one overhead.

twic · on Dec 4, 2021

If you write the annotation like this:

  <ruby>C<rt class="ruby-1"></rt></ruby>

With this CSS:

  .ruby-1::after {
    content: "1";
  }

(with corresponding treatment for other numbers)

Then you should get an annotation that resists copying. This renders correctly for me in Firefox; no idea about other browsers.

lelandfe · on Dec 5, 2021

Nice one. A slight change that will result in less CSS:

  <ruby class="smiles">C<rt data-atom="1"></rt></ruby>

With this CSS:

  .smiles rt::after {
    content: attr(data-atom);
  }
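
A possible generator for that data-atom markup, as a sketch only: `ATOM_RE` and `smiles_to_ruby_attr` are hypothetical names and the atom regex is a simplified assumption, not a full SMILES parser. Because the numbers live only in attributes, stripping the tags gives back the original SMILES; whether a given browser keeps the generated ::after content out of the clipboard is a separate question.

```python
import re
from html import escape

# Simplified atom pattern (assumption): bracket atoms, Cl/Br, one-letter atoms.
ATOM_RE = re.compile(r"\[[^\]]+\]|Cl|Br|[BCNOPSFI]|[bcnops]")


def smiles_to_ruby_attr(smiles, css_class="smiles"):
    """Emit one <ruby> per atom, with the atom number in data-atom."""
    parts = []
    pos = 0
    for index, match in enumerate(ATOM_RE.finditer(smiles), start=1):
        # Non-atom characters (digits, bonds, parentheses) stay outside <ruby>.
        parts.append(escape(smiles[pos:match.start()]))
        parts.append(
            f'<ruby class="{escape(css_class)}">{escape(match.group())}'
            f'<rt data-atom="{index}"></rt></ruby>'
        )
        pos = match.end()
    parts.append(escape(smiles[pos:]))
    return "".join(parts)


print(smiles_to_ruby_attr("C=O"))
```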

dalke · on Dec 5, 2021

Thanks twic and lelandfe! Using CSS ::after resolves those highlighting and copy&paste issues I noticed.

New jsfiddle at https://jsfiddle.net/hnw1zbed/2/ .

alin23 · on Dec 4, 2021

Whoa that's incredible!

Yes, selection does also highlight the annotations but the good part is that copying ignores them.

I just selected the whole formula, pressed Cmd-C, Cmd-V and got this:

    Cn1cnc2c1c(=O)[nH]c(=O)n2C

dalke · on Dec 4, 2021

Interesting. My Cmd-C, Cmd-V in Safari gives:

  C1n21c3n4c52c61c7(=O8)[nH]9c10(=O11)n122C13

but in Firefox gives:

  Cn1cnc2c1c(=O)[nH]c(=O)n2C

ahmedfromtunis · on Dec 4, 2021

Not with chrome for Android though. Here's what the clipboard captured instead: C1n21c3n4c52c61c7(=O8)[nH]9c10(=O11)n122C13

derefr · on Dec 4, 2021

> Copy&paste captures the numeric annotations. I probably should have expected that, but didn't.

That's very likely a browser bug. Think about the accessibility implications (for e.g. screen readers) if ruby text was supposed to be modelled in the DOM as being interpolated into the text it's annotating.

I'm not sure what the standard says about how it should be treated, but my guess is that each annotation should be thought of as an alternative for the text it annotates, à la the image "alt" and "srcset" attributes, or à la videos with multiple audio tracks in the same language, where one of those is Described Video or director's commentary or whatever.

In other words, the "correct" behavior would likely be that your browser knows the user's language prefs, and then chooses to select (or copy, or speak, etc.) either the text or its annotation, depending on which one the user is more likely to be able to read/understand.

yorwba · on Dec 4, 2021

> Think about the accessibility implications (for e.g. screen readers) if ruby text was supposed to be modelled in the DOM as being interpolated into the text it's annotating.

The <rp> tag exists explicitly for the purpose of interpolating the annotations into the text. So e.g. <ruby>漢字<rp>（</rp><rt>かんじ</rt><rp>）</rp></ruby> will look like 漢字（かんじ） to a client that doesn't support ruby text and a screenreader could read it the same way it would read any other text with parenthetical annotations in it.

The standard doesn't actually say what screenreaders are supposed to do, so I guess they could also try something fancy. But they don't have to.

kingcharles · on Dec 4, 2021

My guess is that only the paste is broken, not the copy.

If the box you were pasting into supported annotations then it would paste perfectly. Pasting into a plain text text-area field leaves the browser with a hard choice to make on how to interpret the data in the clipboard when transliterating it into plain text.

dalke · on Dec 4, 2021

I pasted into an iTerm2 terminal window, cat > /dev/null.

I just now tried pasting into a Jupyter notebook, and into the HTML entry box of the JSFiddle I linked to.

Again, copy&paste from Safari into those elements includes the annotations.

Firefox does not.

Pasting to the terminal and pasting to a Jupyter notebook are my two primary expected paste destinations.

lygaret · on Dec 5, 2021

you can target the `rt` with css for no copy:

    ruby rt { user-select: none; }

OJFord · on Dec 4, 2021

I know nothing about this, but use a fixed width font perhaps (if you weren't)? Sounds like it could be because the '1' is narrower.

danschuller · on Dec 4, 2021

This is cool.

It's something I've toyed with putting into my toy font renderers, but it always seemed to have a lot of edge cases. One is the ruby text overflowing the width of its parent: in most cases a little overflow is OK, but that's certainly not guaranteed. Scaling down the ruby text isn't an ideal solution because it quickly becomes unreadable. The other option is to scale the spacing in the parent text, which seems to be done for <ruby>境界面<rt>インターフェース</ruby> in the specification https://html.spec.whatwg.org/multipage/text-level-semantics.... but then that's going to impact line wrapping and so on. Kudos to the implementers!

robin_reala · on Dec 4, 2021

Weirdly I’ve only used this in anger outside of Japanese text, to replicate a semantic layout in the original printing of Tristram Shandy for Standard Ebooks (see book 3, chapter 11, the Latin version: https://standardebooks.org/ebooks/laurence-sterne/the-life-a...).

kingcharles · on Dec 4, 2021

What does the "ruby" gloss mean in the linked text? Why do some words have it? It's been many a decade since I took Latin...

hcayless · on Dec 5, 2021

At a quick glance, it looks like it’s giving plural forms as alternatives. So the document can refer to a person or a group.

kazinator · on Dec 4, 2021

I used that for the furigana over the made-up words in a Jabberwocky translation.

http://www.kylheku.com/~kaz/gayabōkin.html

wodenokoto · on Dec 4, 2021

The `<rp>` tag shown in the examples isn't explained on the page, but it is a fallback: something that should be rendered if the ruby tag is not understood.

Sadly, the rp page doesn't show any examples of what fallback behavior might look like.

https://developer.mozilla.org/en-US/docs/Web/HTML/Element/rp

kelnos · on Dec 5, 2021

In general, browsers will just drop tags they don't understand, while still rendering the tags' contents (as normal in-line text). So if, like in the example, you did:

    <ruby>
    明日 <rp>(</rp><rt>Ashita</rt><rp>)</rp>
    </ruby>

But the browser didn't support <ruby>, it would just render as:

明日 (Ashita)

The idea is that a <ruby>-supporting browser wouldn't need to render the parentheses, because it's going to display "Ashita" in a special way that sets it off from the regular text (so you "annotate" the parens with <rp>). But in a browser that doesn't support <ruby>, you'd want it to still display in a sane, understandable way, where it would still be easy to understand that the added text is a pronunciation hint.

shawnz · on Dec 4, 2021

Simply remove all the tags to see how the fallback behaviour would look, for example data:text/html;charset=utf-8,%E6%BC%A2%20(kan)%20%E5%AD%97%20(ji)
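
If you'd rather check this programmatically, here is a small sketch using Python's standard html.parser. It keeps only the text content, which approximates what a browser that ignores <ruby>, <rt> and <rp> would show. `fallback_text` and `_TextOnly` are just illustrative names, and layout details such as whitespace collapsing are not modelled.

```python
from html.parser import HTMLParser


class _TextOnly(HTMLParser):
    """Collect text content and ignore every tag."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def fallback_text(markup):
    """Return the markup's text with all tags removed."""
    parser = _TextOnly()
    parser.feed(markup)
    parser.close()  # flush any buffered text
    return "".join(parser.chunks)


print(fallback_text("<ruby>漢字<rp>（</rp><rt>かんじ</rt><rp>）</rp></ruby>"))
```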

kingcharles · on Dec 4, 2021

I assume this was added as a way to implement language features such as Furigana, which is a minor, but useful, feature of written Japanese: https://en.wikipedia.org/wiki/Furigana

Would this make sense for putting romanized text above non-roman languages?

Currently the standard is just to write the native text and then the romanized. See, e.g.: https://en.wikipedia.org/wiki/Weekly_Sh%C5%8Dnen_Jump

I'm thinking Hepburn above the kana: https://en.wikipedia.org/wiki/Hepburn_romanization

Anyone with more language knowledge want to cuss me out for this idea?

divbzero · on Dec 5, 2021

MDN describes <ruby>’s typical usage for showing pronunciation of East Asian characters but only gives examples for Japanese.

Wikipedia [1] offers a few additional examples for other languages.

Chinese (pinyin):

  <ruby>
    北 <rp>(</rp><rt>běi</rt><rp>)</rp>
    京 <rp>(</rp><rt>jīng</rt><rp>)</rp>
  </ruby>

Chinese (zhuyin):

  <ruby>
    北 <rp>(</rp><rt>ㄅㄟˇ</rt><rp>)</rp>
    京 <rp>(</rp><rt>ㄐ丨ㄥ</rt><rp>)</rp>
  </ruby>

Korean (hangul):

  <ruby>
    韓 <rp>(</rp><rt>한</rt><rp>)</rp>
    國 <rp>(</rp><rt>국</rt><rp>)</rp>
  </ruby>

Vietnamese (chữ Quốc ngữ):

  <ruby>
    河 <rp>(</rp><rt>Hà</rt><rp>)</rp>
    內 <rp>(</rp><rt>Nội</rt><rp>)</rp>
  </ruby>

[1]: https://en.wikipedia.org/wiki/Ruby_character
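
All four examples follow the same base / <rp> / <rt> / <rp> pattern, so they are easy to generate. A minimal sketch, assuming you already have (base, reading) pairs; `ruby_with_fallback` is a hypothetical helper, not something from MDN or Wikipedia, and unlike the examples above it emits no whitespace between parts.

```python
from html import escape


def ruby_with_fallback(pairs, open_paren="(", close_paren=")"):
    """Build one <ruby> element from (base, reading) pairs, with <rp> fallback."""
    items = []
    for base, reading in pairs:
        items.append(
            f"{escape(base)}<rp>{escape(open_paren)}</rp>"
            f"<rt>{escape(reading)}</rt>"
            f"<rp>{escape(close_paren)}</rp>"
        )
    return "<ruby>" + "".join(items) + "</ruby>"


print(ruby_with_fallback([("北", "běi"), ("京", "jīng")]))
```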

antonkar · on Dec 4, 2021

I used it in my old and free iOS web browser to put translation (Spanish, French…) or Pinyin on top of English or Chinese words https://apps.apple.com/app/id932996489

akaBruce · on Dec 5, 2021

For those studying a language that might benefit from this, I have this CSS in my Anki cards. I use the ruby tag to remind me of readings for things that aren't the main focus of the card I'm working with. For example, if a vocab word is used in an example sentence, but one of the other words in the example is unfamiliar to me.

It shows the rt tag on hover or focus and works for me for both mouse and touch on Anki and AnkiDroid. Maybe this or some variation might help others as well.

  ruby {
   text-decoration: underline dotted;
  }
  ruby rt {
   visibility: hidden;
  }
  ruby:hover rt, ruby:focus rt {
   visibility: visible;
  }

alexiaya · on Dec 4, 2021

When I read the headline I thought it's some weird new syntax for embedding Ruby code snippets in HTML.

This is definitely not going to be confusing... /s

alin23 · on Dec 4, 2021

My initial title was:

    The <ruby> HTML element

But the ruby tag got stripped by HN and I ended up with

    The  HTML element

Izkata · on Dec 4, 2021

Hum... Does it accept

  &lt;ruby&gt;

?

alin23 · on Dec 4, 2021

Not really, it renders:

    The andlt;Ruby> HTML element

https://cln.sh/LteC0S

makach · on Dec 4, 2021

I thought the same. Now that I've read the documentation, I think I will be fine and confusion is less likely.

7c7599bfe5df · on Dec 4, 2021

It's been around for a decade in browsers[1], and the terminology itself predates the Ruby language.

You haven't been confused for the last 11 years, so this doesn't seem to have been a problem.

[1] The W3C spec is even older.

notreallyserio · on Dec 4, 2021

> You haven't been confused for the last 11 years

You underestimate me! Or over.

kingcharles · on Dec 4, 2021

Now I'm even more confused!

thrashh · on Dec 4, 2021

I think I used it in 2005 so it’s definitely olddd.

jhvkjhk · on Dec 4, 2021

Although Ruby element is basically a Japanese thing, it can be used to display both original text and translated text. I think using ruby rather than two-column-view is far better.

For example, this script[1] shows the original English word above Japanese loanwords, using the ruby element.

[1]: https://greasyfork.org/en/scripts/33268-katakana-terminator

LAC-Tech · on Dec 4, 2021

> Although Ruby element is basically a Japanese thing

It's commonly used in textbooks for different Chinese languages. I have ministry of education textbooks from Taiwan, and ruby characters are used for both Hokkien and Mandarin (the Hokkien one has two different ruby character scripts which is quite visually busy).

I would imagine it would be handy for Hindi learners as well. And probably for hundreds of other languages, though I can't speak to whether it is used.

philsnow · on Dec 5, 2021

This is neat, and immediately made me think of the annotations that show up when you hit the play button on https://lowerquality.com/gentle/ , but it turns out those are made with absolutely-positioned divs and a lot of offline-precalculated px math.

fnord77 · on Dec 5, 2021

> The <ruby> HTML element represents small annotations that are rendered above, below, or next to base text, usually used for showing the pronunciation of East Asian characters.

I got a little worried that browsers were going to support that horrible programming language with the same name as a script tag or something.

jagger27 · on Dec 4, 2021

Is that support table right? IE5 supports it but it took until Firefox 38? Those came out in 1999 and 2015.

deaddodo · on Dec 4, 2021

Yes. The Ruby tag was introduced by Microsoft in IE5, and then rolled into HTML5 during the standardization process.

soheil · on Dec 5, 2021

Not to be confused with the programming language.