* `<speak>` - wrap around SSML text
    * `lang` - set language for document
* `<p>` - paragraph
    * `lang` - set language for paragraph
* `<s>` - sentence (disables automatic sentence breaking)
    * `lang` - set language for sentence
* `<w>` / `<token>` - word (disables automatic tokenization)
    * `lang` - set language for word
    * `role` - set word role (see [word roles](#word-roles))
* `<lang lang="...">` - set language of inner text
* `<voice name="...">` - set voice of inner text
* `<say-as interpret-as="...">` - force interpretation of inner text
    * `interpret-as` - one of "spell-out", "date", "number", "time", or "currency"
    * `format` - way to format text depending on `interpret-as`
        * number - one of "cardinal", "ordinal", "digits", "year"
        * date - string with "d" (cardinal day), "o" (ordinal day), "m" (month), or "y" (year)
* `<break time="...">` - pause for given amount of time
    * `time` - seconds ("123s") or milliseconds ("123ms")
* `<mark name="...">` - user-defined mark (`marks_before` and `marks_after` attributes of words/sentences)
    * `name` - name of mark
* `<sub alias="...">` - substitute `alias` for inner text
* `<phoneme ph="...">` - supply phonemes for inner text
    * `ph` - phonemes for each word of inner text, separated by whitespace
* `<lexicon id="...">` - inline or external pronunciation lexicon
    * `id` - unique id of lexicon (used in `<lookup ref="...">`)
    * `uri` - if empty or missing, lexicon is inline
    * One or more `<lexeme>` child elements with:
        * Optional `role="..."` ([word roles](#word-roles) separated by whitespace)
        * `<grapheme>WORD</grapheme>` - word text
        * `<phoneme>P H O N E M E S</phoneme>` - word pronunciation (phonemes separated by whitespace)
* `<lookup ref="...">` - use pronunciation lexicon for child elements
    * `ref` - id from a `<lexicon id="...">`
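
The attribute formats listed above can be checked with nothing but the standard library. The following is a minimal sketch, not gruut's own parser: `break_time_ms` and `date_fields` are hypothetical helpers that read `<break time="...">` and `<say-as format="...">` values exactly as the list describes them.

```python
import re
import xml.etree.ElementTree as ET
from typing import List

SSML_NS = "{http://www.w3.org/2001/10/synthesis}"

# Meaning of each letter in a <say-as interpret-as="date" format="..."> value
DATE_FIELDS = {"d": "cardinal day", "o": "ordinal day", "m": "month", "y": "year"}


def break_time_ms(time: str) -> float:
    """Hypothetical helper: convert a <break time="..."> value ("123s" or "123ms") to milliseconds."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*(ms|s)\s*", time)
    if match is None:
        raise ValueError(f"unsupported break time: {time!r}")
    value, unit = float(match.group(1)), match.group(2)
    return value * 1000.0 if unit == "s" else value


def date_fields(fmt: str) -> List[str]:
    """Hypothetical helper: expand a date format such as "mdy" into the fields it names, in order."""
    return [DATE_FIELDS[letter] for letter in fmt]


doc = """<speak xmlns="http://www.w3.org/2001/10/synthesis" version="1.1" xml:lang="en-US">
  <s>Wait <break time="1.5s"/> then read
     <say-as interpret-as="date" format="mdy">4/1/2021</say-as></s>
</speak>"""

root = ET.fromstring(doc)
for elem in root.iter(SSML_NS + "break"):
    print("break:", break_time_ms(elem.get("time")), "ms")  # break: 1500.0 ms
for elem in root.iter(SSML_NS + "say-as"):
    print(elem.get("interpret-as"), date_fields(elem.get("format")))  # date ['month', 'cardinal day', 'year']
```

Raising `ValueError` on anything other than the two documented units keeps malformed documents from silently producing a zero-length pause.
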
#### Word Roles
During phonemization, word roles are used to disambiguate pronunciations. Unless manually specified, a word's role is derived from its part-of-speech tag as `gruut:<TAG>`. For initialisms and `spell-out`, the role `gruut:letter` is used to indicate that, e.g., "a" should be spoken as `/eɪ/` instead of `/ə/`.
For `en-us`, the following additional roles are available from the part-of-speech tagger:
* `gruut:CD` - number
* `gruut:DT` - determiner
* `gruut:IN` - preposition or subordinating conjunction
* `gruut:JJ` - adjective
* `gruut:NN` - noun
* `gruut:PRP` - personal pronoun
* `gruut:RB` - adverb
* `gruut:VB` - verb
* `gruut:VBD` - verb (past tense)
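
The role rule above can be illustrated with the "a" example from the text. This is a minimal sketch under stated assumptions, not gruut's internals: `LEXICON`, `word_role`, and `lookup` are hypothetical, and falling back to an untagged default entry when no entry carries the word's role is an assumption made for the example.

```python
from typing import List, Optional, Set, Tuple

# Hypothetical toy lexicon: each word maps to (roles, phonemes) entries.
# An empty role set marks the word's default pronunciation.
LEXICON = {
    "a": [
        (set(), ["ə"]),
        ({"gruut:letter"}, ["eɪ"]),
    ],
}


def word_role(pos_tag: Optional[str], spell_out: bool = False) -> Optional[str]:
    """Role for a word: gruut:letter when spelled out, else gruut:<TAG> from its POS tag."""
    if spell_out:
        return "gruut:letter"
    return f"gruut:{pos_tag}" if pos_tag else None


def lookup(word: str, role: Optional[str]) -> Optional[List[str]]:
    entries: List[Tuple[Set[str], List[str]]] = LEXICON.get(word.lower(), [])
    # Prefer an entry tagged with the word's role...
    for roles, phonemes in entries:
        if role in roles:
            return phonemes
    # ...and otherwise fall back to the untagged default entry (an assumption)
    for roles, phonemes in entries:
        if not roles:
            return phonemes
    return None


print(lookup("a", word_role("DT")))                  # ['ə']
print(lookup("a", word_role("DT", spell_out=True)))  # ['eɪ']
```
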
#### Inline Lexicons
Inline [pronunciation lexicons](https://www.w3.org/TR/2008/REC-pronunciation-lexicon-20081014/) are supported via the `<lexicon>` and `<lookup>` tags. gruut diverges slightly from the [SSML standard](https://www.w3.org/TR/speech-synthesis11/) here by allowing lexicons to be defined within the SSML document itself (`uri` is blank or missing). Additionally, the `id` attribute of the `<lexicon>` element can be left off to indicate a "default" inline lexicon that does not require a corresponding `<lookup>` tag.
For example, the following document will yield three different pronunciations for the word "tomato":
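``` xml
<speak xmlns="http://www.w3.org/2001/10/synthesis" version="1.1" xml:lang="en-US">
  <lexicon xml:id="test" alphabet="ipa">
    <lexeme>
      <grapheme>tomato</grapheme>
      <phoneme>t ə m ˈɑ t oʊ</phoneme>
    </lexeme>
    <lexeme>
      <grapheme role="fake-role">tomato</grapheme>
      <phoneme>t ə m ˈi t oʊ</phoneme>
    </lexeme>
  </lexicon>
  <w>tomato</w>
  <lookup ref="test"><w>tomato</w> <w role="fake-role">tomato</w></lookup>
</speak>
```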
The first "tomato" will be looked up in the U.S. English lexicon (`/t ə m ˈeɪ t oʊ/`). Within the `<lookup>` tag's scope, the second and third "tomato" words will be looked up in the inline lexicon. The third "tomato" word has a [role](#word-roles) attached, which selects a made-up pronunciation in this case.
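
The scoping rule can be reproduced with a standard-library sketch. It is not gruut's implementation: `BUILTIN` stands in for the real U.S. English lexicon, `load_lexicons` and `resolve` are hypothetical names, and the sketch assumes the `role` attribute sits on `<grapheme>` and that the `id` is written as `xml:id`.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional, Tuple

NS = "{http://www.w3.org/2001/10/synthesis}"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

DOC = """<speak xmlns="http://www.w3.org/2001/10/synthesis" version="1.1" xml:lang="en-US">
  <lexicon xml:id="test" alphabet="ipa">
    <lexeme><grapheme>tomato</grapheme><phoneme>t ə m ˈɑ t oʊ</phoneme></lexeme>
    <lexeme><grapheme role="fake-role">tomato</grapheme><phoneme>t ə m ˈi t oʊ</phoneme></lexeme>
  </lexicon>
  <w>tomato</w>
  <lookup ref="test"><w>tomato</w> <w role="fake-role">tomato</w></lookup>
</speak>"""

# Hypothetical stand-in for gruut's built-in U.S. English lexicon
BUILTIN = {"tomato": "t ə m ˈeɪ t oʊ"}

# lexicon id -> {(word, role or None): phonemes}
Lexicons = Dict[str, Dict[Tuple[str, Optional[str]], str]]


def load_lexicons(root: ET.Element) -> Lexicons:
    lexicons: Lexicons = {}
    for lexicon in root.iter(NS + "lexicon"):
        entries = lexicons.setdefault(lexicon.get(XML_ID, ""), {})
        for lexeme in lexicon.iter(NS + "lexeme"):
            grapheme = lexeme.find(NS + "grapheme")
            phoneme = lexeme.find(NS + "phoneme")
            key = (grapheme.text.strip(), grapheme.get("role"))
            # Phonemes are whitespace-separated; normalize the spacing
            entries[key] = " ".join(phoneme.text.split())
    return lexicons


def resolve(root: ET.Element, lexicons: Lexicons) -> List[str]:
    """Pronounce every <w>, using the inline lexicon of the enclosing <lookup> if any."""
    results: List[str] = []

    def visit(elem: ET.Element, ref: Optional[str]) -> None:
        if elem.tag == NS + "lookup":
            ref = elem.get("ref")
        elif elem.tag == NS + "w":
            word, role = elem.text.strip(), elem.get("role")
            inline = lexicons.get(ref, {}) if ref is not None else {}
            results.append(inline.get((word, role)) or BUILTIN[word])
            return
        for child in elem:
            visit(child, ref)

    visit(root, None)
    return results


root = ET.fromstring(DOC)
for pronunciation in resolve(root, load_lexicons(root)):
    print(f"/{pronunciation}/")
```

The three printed pronunciations follow the order of the `<w>` elements: the built-in `/t ə m ˈeɪ t oʊ/`, then the inline `/t ə m ˈɑ t oʊ/`, then the role-selected `/t ə m ˈi t oʊ/`.
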
Going even further from the SSML standard, gruut allows you to leave off the `<lexicon>` id entirely. With no `id`, a `<lookup>` tag is no longer needed, allowing you to override the pronunciation of any word in the document:
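``` xml
<speak xmlns="http://www.w3.org/2001/10/synthesis" version="1.1" xml:lang="en-US">
  <lexicon alphabet="ipa">
    <lexeme>
      <grapheme>tomato</grapheme>
      <phoneme>t ə m ˈɑ t oʊ</phoneme>
    </lexeme>
  </lexicon>
  <w>tomato</w>
</speak>
```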
This will yield a pronunciation of `/t ə m ˈɑ t oʊ/` for all instances of "tomato" in the document (unless they are inside a `<lookup>`).
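
The precedence between the id-less default lexicon, a named lexicon, and the built-in lexicon can be summarized in a few lines. This is a hypothetical sketch, not gruut's code: the dictionaries and `pronounce` are illustrative, the `uk` lexicon and its entry are made up, and falling back to the built-in lexicon when an override has no entry is an assumption.

```python
from typing import Dict, Optional

# Hypothetical stand-ins for the three kinds of lexicon discussed above
BUILTIN: Dict[str, str] = {"tomato": "t ə m ˈeɪ t oʊ"}        # gruut's own lexicon
DEFAULT_INLINE: Dict[str, str] = {"tomato": "t ə m ˈɑ t oʊ"}  # <lexicon> with no id
NAMED: Dict[str, Dict[str, str]] = {"uk": {"tomato": "t ə m ˈɑː t əʊ"}}  # <lexicon xml:id="uk">


def pronounce(word: str, lookup_ref: Optional[str] = None) -> Optional[str]:
    """Pick a pronunciation, given the ref of the enclosing <lookup> (None if there is none)."""
    if lookup_ref is not None:
        # Inside <lookup ref="...">: the referenced lexicon wins
        override = NAMED.get(lookup_ref, {})
    else:
        # Outside any <lookup>: the id-less default lexicon wins
        override = DEFAULT_INLINE
    # Assumption: words missing from the override fall back to the built-in lexicon
    return override.get(word, BUILTIN.get(word))


print(pronounce("tomato"))        # t ə m ˈɑ t oʊ
print(pronounce("tomato", "uk"))  # t ə m ˈɑː t əʊ
```
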
## Intended Audience
gruut is useful for transforming raw text into phonetic pronunciations, similar to [phonemizer](https://github.com/bootphon/phonemizer). Unlike phonemizer, gruut looks up words in a pre-built lexicon (pronunciation dictionary) or guesses word pronunciations with a pre-trained grapheme-to-phoneme model. Phonemes for each language come from a [carefully chosen inventory](https://en.wikipedia.org/wiki/Template:Language_phonologies).
For each supported language, gruut includes:

* A word pronunciation lexicon built from open source data
    * See [pron_dict](https://github.com/Kyubyong/pron_dictionaries)
* A pre-trained grapheme-to-phoneme model for guessing word pronunciations

Some languages also include:

* A pre-trained part of speech tagger built from open source data
    * See [universal dependencies](https://universaldependencies.org/)
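
The lexicon-first, model-second order described above can be sketched as follows. This is not gruut's API: `LEXICON`, `naive_guess`, and `phonemize` are hypothetical, and the letter table is only a stand-in for a pre-trained grapheme-to-phoneme model so that the sketch runs.

```python
from typing import Callable, Dict, List

# Hypothetical lexicon entry; gruut's real lexicons are built from open source data
LEXICON: Dict[str, List[str]] = {"hello": ["h", "ə", "l", "ˈoʊ"]}

# Hypothetical stand-in for a pre-trained grapheme-to-phoneme model:
# a naive letter-to-phoneme table that silently skips unknown letters
LETTER_PHONEMES: Dict[str, str] = {"b": "b", "l": "l", "o": "ɑ"}


def naive_guess(word: str) -> List[str]:
    return [LETTER_PHONEMES[letter] for letter in word if letter in LETTER_PHONEMES]


def phonemize(word: str, guess: Callable[[str], List[str]] = naive_guess) -> List[str]:
    """Look the word up first and only guess when it is missing from the lexicon."""
    word = word.lower()
    if word in LEXICON:
        return LEXICON[word]
    return guess(word)


print(phonemize("Hello"))  # ['h', 'ə', 'l', 'ˈoʊ'] from the lexicon
print(phonemize("blob"))   # ['b', 'l', 'ɑ', 'b'] guessed
```

Passing a different `guess` callable is where a real trained model would plug in; the lookup step stays unchanged.
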