Phonetic symbols in Unicode

From Wikipedia, the free encyclopedia

This is an old revision of this page, as edited by Wikifresc (talk | contribs) at 18:20, 16 February 2020 (Vowels: image to table). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

Unicode supports several phonetic scripts and notations through its existing writing systems and through additional blocks of phonetic characters. These phonetic additions are derived from an existing script, usually Latin, Greek or Cyrillic. In Unicode there is no "IPA script". Apart from the IPA, extensions to the IPA, and obsolete and nonstandard IPA symbols, these blocks also contain characters from the Uralic Phonetic Alphabet and the Americanist Phonetic Alphabet.

Phonetic scripts

The International Phonetic Alphabet (IPA), like most phonetic scripts, makes use of letters from other writing systems. IPA notably uses Latin, Greek and Cyrillic characters. Combining diacritics also add meaning to phonetic text. Finally, these phonetic alphabets make use of modifier letters, which are specially constructed to carry a phonetic meaning. A "modifier letter" is strictly intended not as an independent grapheme but as a modification of the preceding character[1], resulting in a distinct grapheme, notably in the context of the International Phonetic Alphabet. For example, ʰ should not occur on its own but modifies the preceding or following symbol. Thus, tʰ is a single IPA symbol, distinct from t. In practice, however, several of these "modifier letters" are also used as full graphemes, e.g. ʿ transliterating Semitic ayin, ʻ for the Hawaiian ʻokina, or ˚ transliterating Abkhaz ә.
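The distinction between base letters, modifier letters and combining diacritics can be inspected in the Unicode character database. The following is a minimal sketch using Python's standard `unicodedata` module; the helper name `describe` is an illustration only. Modifier letters report the general category `Lm`, and nonspacing combining marks report `Mn`.

```python
import unicodedata


def describe(ch: str) -> str:
    """Return 'U+XXXX <general category> <character name>' for one character."""
    return f"U+{ord(ch):04X} {unicodedata.category(ch)} {unicodedata.name(ch)}"


# A base letter, a modifier letter, and a combining diacritic.
for ch in ("t", "\u02b0", "\u032a"):
    print(describe(ch))
# U+0074 Ll LATIN SMALL LETTER T
# U+02B0 Lm MODIFIER LETTER SMALL H
# U+032A Mn COMBINING BRIDGE BELOW
```

The sequence t + ʰ is therefore two code points, even though it is read as a single IPA symbol.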

From IPA to Unicode

Consonants

The following tables indicate the Unicode code point sequences for phonemes as used in the International Phonetic Alphabet. A bold code point indicates that the Unicode chart provides an application note such as "voiced retroflex lateral" for U+026D ɭ LATIN SMALL LETTER L WITH RETROFLEX HOOK. An entry in bold italics indicates that the character name itself refers to a phoneme, such as U+0298 ʘ LATIN LETTER BILABIAL CLICK.

Bilabial, Labiodental, Dental, Alveolar, Postalveolar, Retroflex, Labialized palatal, Postalveolar-velar
Plosive: p 0070, b 0062, p̪ 0070 032A, b̪ 0062 032A, t̪ 0074 032A, d̪ 0064 032A, t 0074, d 0064, ʈ 0288, ɖ 0256
Implosive: ɓ̥ 0253 0325, ɓ 0253, ɗ̪ 0257 032A, ɗ 0257, *
Ejective: pʼ 0070 02BC, t̪ʼ 0074 032A 02BC, tʼ 0074 02BC, ʈʼ 0288 02BC
Nasal: m̥ 006D 0325, m 006D, ɱ̊ 0271 030A, ɱ 0271, n̪̊ 006E 032A 030A, n̪ 006E 032A, n̥ 006E 0325, n 006E, ɳ̊ 0273 030A, ɳ 0273
Trill: ʙ 0299, r̥ 0072 0325, r 0072, *
Tap or Flap: ⱱ̟ 2C71 031F, ⱱ 2C71, ɾ 027E, ɽ 027D
Lateral flap: ɺ 027A, *
Fricative: ɸ 0278, β 03B2, f 0066, v 0076, θ 03B8, ð 00F0, s 0073, z 007A, ʃ 0283, ʒ 0292, ʂ 0282, ʐ 0290, ɧ 0267
Lateral fricative: ɬ 026C, ɮ 026E, ꞎ A78E
Ejective fricative: sʼ 0073 02BC, ʃʼ 0283 02BC
Ejective lateral fricative: ɬʼ 026C 02BC
Percussive: ʬ 02AC, ʭ 02AD
Approximant: β̞̊ 03B2 031E 030A, β̞ 03B2 031E, ʋ̥ 028B 0325, ʋ 028B, ð̞ 00F0 031E, ɹ̥ 0279 0325, ɹ 0279, ɻ̊ 027B 030A, ɻ 027B, ɥ̊ 0265 030A, ɥ 0265
Lateral approximant: l̥ 006C 0325, l 006C, ɭ 026D
Click consonant: ʘ 0298, ǀ 01C0, ǃ 01C3, ǃ / ǂ 01C3 / 01C2
Lateral click: *, ǁ 01C1

Alveolo-palatal, Palatal, Labial-velar, Velar, Uvular, Pharyngeal, Epiglottal, Glottal
Plosive: ȶ 0236, ȡ 0221, c 0063, ɟ 025F, k͡p 006B 0361 0070, ɡ͡b 0261 0361 0062, k 006B, ɡ 0261, q 0071, ɢ 0262, ʡ 02A1, ʔ 0294
Implosive: ʄ 0284, ɠ 0260, ʛ 029B
Ejective: cʼ 0063 02BC, kʼ 006B 02BC, qʼ 0071 02BC
Nasal: ȵ 0235, ɲ 0272, ŋ͡m 014B 0361 006D, ŋ 014B, ɴ 0274
Trill: ʀ 0280, *
Tap or Flap: *
Lateral flap: *, *
Fricative: ɕ 0255, ʑ 0291, ç 0063 0327, ʝ 029D, x 0078, ɣ 0263, χ 03C7, ʁ 0281, ħ 0127, ʕ 0295, ʜ 029C, ʢ 02A2, h 0068, ɦ 0266
Approximant: j 006A, ʍ 028D, w 0077, ɰ 0270
Lateral approximant: ȴ 0234, ʎ 028E, ʟ 029F
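
Each entry in the consonant tables pairs a symbol with a space-separated sequence of hexadecimal code points. As a rough sketch of how such a sequence maps to text and back (the function names `from_hex_sequence` and `to_hex_sequence` are illustrative, not taken from any library), in Python:

```python
def from_hex_sequence(seq: str) -> str:
    """Build a string from space-separated hex code points, e.g. '0074 032A 02BC'."""
    return "".join(chr(int(cp, 16)) for cp in seq.split())


def to_hex_sequence(text: str) -> str:
    """List the code points of a string as four-digit uppercase hex values."""
    return " ".join(f"{ord(ch):04X}" for ch in text)


# Dental ejective: t + COMBINING BRIDGE BELOW + MODIFIER LETTER APOSTROPHE.
dental_ejective = from_hex_sequence("0074 032A 02BC")
print(len(dental_ejective))               # 3 code points, displayed as one symbol
print(to_hex_sequence(dental_ejective))   # 0074 032A 02BC
```

A single displayed symbol can therefore span several code points, which matters when counting, slicing or comparing phonetic strings.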

Vowels

The following table lists the phonetic vowels and their Unicode / UCS code points. Vowels appearing in pairs are the unrounded and rounded variants respectively. Again, characters with Unicode names referring to phonemes are indicated by bold text. Those with explicit application notes are indicated by bold italic text. Those borrowed unchanged from another script (Latin, Greek or Cyrillic) are indicated by italics.

Unicode code points for phonetic vowels
This table represents the phonetic vowel trapezium.

In each pair, the unrounded vowel stands before the bullet and the rounded vowel after it.

Close: i · y (0069 · 0079), ɨ · ʉ (0268 · 0289), ɯ · u (026F · 0075)
Near-close: ɪ · ʏ (026A · 028F), ɪ̈ · ʊ̈ (026A 0308 · 028A 0308), · ʊ (028A)
Close-mid: e · ø (0065 · 00F8), ɘ · ɵ (0258 · 0275), ɤ · o (0264 · 006F)
Mid: ə (0259)
Open-mid: ɛ · œ (025B · 0153), ɜ · ɞ (025C · 025E), ʌ · ɔ (028C · 0254)
Near-open: æ · (00E6), ɐ (0250)
Open: a · ɶ (0061 · 0276), ɑ · ɒ (0251 · 0252)

Diacritics

Diacritic  Function  Hex (Modifier)  Hex (Combining)
˳  Voiceless  0x02F3  0x0325
◌̤  Breathy Voiced  -  0x0324
◌̪  Dental  -  0x032A
ˬ  Voiced  0x02EC  0x032C
˷  Creaky Voiced  0x02F7  0x0330
˽  Apical  0x02FD  0x033A
ʰ  Aspirated  0x02B0  -
◌̼  Linguolabial  -  0x033C
◌̻  Laminal  -  0x033B
◌̹  More Rounded  -  0x0339
ʷ  Labialized  0x02B7  -
◌̃  Nasalized  -  0x0303
◌̜  Less Rounded  -  0x031C
ʲ  Palatalized  0x02B2  -
ⁿ  Nasal release  0x207F  -
˖  Advanced  0x02D6  0x031F
ˠ  Velarized  0x02E0  -
ˡ  Lateral release  0x02E1  -
ˍ  Retracted  0x02CD  0x0320
ˤ  Pharyngealized  0x02E4  -
˺  No audible release  0x02FA  0x031A
◌̈  Centralized  -  0x0308
◌̴  Velarized or Pharyngealized  -  0x0334
ː  Lengthened  0x02D0  -
˟  Mid-Centralized  0x02DF  0x033D
˔  Raised  0x02D4  0x031D
ˌ  Syllabic  0x02CC  0x0329
˕  Lowered  0x02D5  0x031E
◌̯  Non-syllabic  -  0x032F
◌̘  Advanced Tongue Root  -  0x0318
˞  Rhoticity  0x02DE  -
◌̙  Retracted Tongue Root  -  0x0319
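
A combining diacritic from the table is applied by placing its code point directly after the base letter. The sketch below assumes Python's standard `unicodedata` module; the helper `with_diacritic` is a hypothetical name used for illustration. It also shows that normalization to NFC merges a base and diacritic only where Unicode defines a precomposed character, so most IPA combinations remain sequences.

```python
import unicodedata


def with_diacritic(base: str, combining_hex: str) -> str:
    """Append the combining mark given as a hex value such as '0x0325'."""
    return base + chr(int(combining_hex, 16))


# Voiceless n: base letter followed by COMBINING RING BELOW.
voiceless_n = with_diacritic("n", "0x0325")
print(len(unicodedata.normalize("NFC", voiceless_n)))  # 2: no precomposed form exists

# c + COMBINING CEDILLA does have a precomposed form, U+00E7.
c_cedilla = with_diacritic("c", "0x0327")
print(unicodedata.normalize("NFC", c_cedilla) == "\u00e7")  # True
```

Comparing phonetic strings is safer after normalizing both sides to the same form, since 0063 0327 and 00E7 display identically.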

Unicode blocks

From Unicode blocks to scripts

Phonetic scripts are encoded in the following six Unicode blocks.

IPA Extensions (U+0250–02AF)

IPA Extensions[1]
Official Unicode Consortium code chart (PDF)
  0 1 2 3 4 5 6 7 8 9 A B C D E F
U+025x ɐ ɑ ɒ ɓ ɔ ɕ ɖ ɗ ɘ ə ɚ ɛ ɜ ɝ ɞ ɟ
U+026x ɠ ɡ ɢ ɣ ɤ ɥ ɦ ɧ ɨ ɩ ɪ ɫ ɬ ɭ ɮ ɯ
U+027x ɰ ɱ ɲ ɳ ɴ ɵ ɶ ɷ ɸ ɹ ɺ ɻ ɼ ɽ ɾ ɿ
U+028x ʀ ʁ ʂ ʃ ʄ ʅ ʆ ʇ ʈ ʉ ʊ ʋ ʌ ʍ ʎ ʏ
U+029x ʐ ʑ ʒ ʓ ʔ ʕ ʖ ʗ ʘ ʙ ʚ ʛ ʜ ʝ ʞ ʟ
U+02Ax ʠ ʡ ʢ ʣ ʤ ʥ ʦ ʧ ʨ ʩ ʪ ʫ ʬ ʭ ʮ ʯ
Notes
1.^ As of Unicode version 16.0
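
A chart in this layout can be regenerated from the block's code point range. The following Python sketch prints one row per sixteen code points; the function names `chart_row` and `chart` are illustrative. It assumes every code point in the range is assigned, which holds for IPA Extensions but not for every block.

```python
def chart_row(start: int) -> str:
    """Format the 16 code points beginning at `start` as one chart row."""
    cells = " ".join(chr(cp) for cp in range(start, start + 16))
    return f"U+{start >> 4:03X}x {cells}"


def chart(first: int, last: int) -> list[str]:
    """Return the chart rows covering the inclusive range first..last."""
    return [chart_row(s) for s in range(first & ~0xF, last + 1, 16)]


# IPA Extensions, U+0250 to U+02AF: six rows of sixteen characters.
for row in chart(0x0250, 0x02AF):
    print(row)
```

The same call with another range, for example `chart(0x02B0, 0x02FF)`, yields the rows for the Spacing Modifier Letters block.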

Spacing Modifier Letters (U+02B0–02FF)

The characters in the "Spacing Modifier Letters" block are intended to form a unit with the preceding letter (which they "modify"). For example, the character U+02B0 ʰ MODIFIER LETTER SMALL H is not intended simply as a superscript h, but as the mark of aspiration placed after the letter being aspirated, as in pʰ, the aspirated voiceless bilabial plosive. The block contains:

  • Latin superscript modifier letters: (U+02B0–U+02B8): ʰ aspiration; ʱ breathy voice, murmured; ʲ palatalization; ʳ, ʴ, ʵ, ʶ r-coloring or r-offglides; ʷ labialization; ʸ palatalization, Americanist usage for U+02B2
  • Miscellaneous phonetic modifiers: (U+02B9–U+02D7): ʹ ʺ ʻ ʼ ʽ ʾ ʿ ˀ ˁ ˂ ˃ ˄ ˅ ˆ ˇ ˈ ˉ ˊ ˋ ˌ ˍ ˎ ˏ ː ˑ ˒ ˓ ˔ ˕ ˖ ˗
  • Spacing clones of diacritics: (U+02D8–U+02DD): ˘ breve; ˙ dot above; ˚ ring above; ˛ ogonek; ˜ small tilde; ˝ double acute accent
  • Additions based on 1989 IPA: (U+02DE–U+02E4): ˞ ˟ ˠ ˡ ˢ ˣ ˤ
  • Tone letters: (U+02E5–U+02E9): ˥ ˦ ˧ ˨ ˩
  • Extended Bopomofo tone marks: U+02EA ˪ MODIFIER LETTER YIN DEPARTING TONE MARK; U+02EB ˫ MODIFIER LETTER YANG DEPARTING TONE MARK
  • IPA modifiers: U+02EC ˬ MODIFIER LETTER VOICING; U+02ED ˭ MODIFIER LETTER UNASPIRATED
  • Other modifier letters: U+02EE ˮ MODIFIER LETTER DOUBLE APOSTROPHE for Nenets
  • Uralic Phonetic Alphabet (UPA) modifiers: (U+02EF–U+02FF): ˯ ˰ ˱ ˲ ˳ ˴ ˵ ˶ ˷ ˸ ˹ ˺ ˻ ˼ ˽ ˾ ˿
Spacing Modifier Letters[1]
Official Unicode Consortium code chart (PDF)
  0 1 2 3 4 5 6 7 8 9 A B C D E F
U+02Bx ʰ ʱ ʲ ʳ ʴ ʵ ʶ ʷ ʸ ʹ ʺ ʻ ʼ ʽ ʾ ʿ
U+02Cx ˀ ˁ ˂ ˃ ˄ ˅ ˆ ˇ ˈ ˉ ˊ ˋ ˌ ˍ ˎ ˏ
U+02Dx ː ˑ ˒ ˓ ˔ ˕ ˖ ˗ ˘ ˙ ˚ ˛ ˜ ˝ ˞ ˟
U+02Ex ˠ ˡ ˢ ˣ ˤ ˥ ˦ ˧ ˨ ˩ ˪ ˫ ˬ ˭ ˮ ˯
U+02Fx ˰ ˱ ˲ ˳ ˴ ˵ ˶ ˷ ˸ ˹ ˺ ˻ ˼ ˽ ˾ ˿
Notes
1.^ As of Unicode version 16.0

Phonetic Extensions (U+1D00–1D7F)

This block, together with Phonetic Extensions Supplement below, contains:

  • Small capitals "ɢ ɪ ɴ ɶ ʀ ʏ ʙ ʜ ʟ"
  • Turned small letters "ɐ ɥ ɯ ɹ ɺ ɻ ʇ ʌ ʍ ʎ ʞ ʮ ʯ"
  • Extra small capitals "ʁ ʛ ᴀ ᴁ ᴃ ᴄ ᴅ ᴆ ᴇ ᴊ ᴋ ᴌ ᴍ ᴎ ᴏ ᴐ ᴘ ᴙ ᴚ ᴛ ᴜ ᴠ ᴡ ᴢ ᴣ ᴦ ᴧ ᴨ ᴩ ᴪ"
  • Letters with palatal hooks "ƫ ᶀ ᶁ ᶂ ᶃ ᶄ ᶅ ᶆ ᶇ ᶈ ᶉ ᶊ ᶋ ᶌ ᶍ ᶎ ᶪ ᶵ"
  • Letters with retroflex hooks "ᶏ ᶐ ᶒ ᶓ ᶔ ᶕ ᶖ ᶗ ᶘ ᶙ ᶚ ᶩ ᶯ ᶼ"
Phonetic Extensions[1]
Official Unicode Consortium code chart (PDF)
  0 1 2 3 4 5 6 7 8 9 A B C D E F
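U+1D0x ᴀ ᴁ ᴂ ᴃ ᴄ ᴅ ᴆ ᴇ ᴈ ᴉ ᴊ ᴋ ᴌ ᴍ ᴎ ᴏ
U+1D1x ᴐ ᴑ ᴒ ᴓ ᴔ ᴕ ᴖ ᴗ ᴘ ᴙ ᴚ ᴛ ᴜ ᴝ ᴞ ᴟ
U+1D2x ᴠ ᴡ ᴢ ᴣ ᴤ ᴥ ᴦ ᴧ ᴨ ᴩ ᴪ ᴫ ᴬ ᴭ ᴮ ᴯ
U+1D3x ᴰ ᴱ ᴲ ᴳ ᴴ ᴵ ᴶ ᴷ ᴸ ᴹ ᴺ ᴻ ᴼ ᴽ ᴾ ᴿ
U+1D4x ᵀ ᵁ ᵂ ᵃ ᵄ ᵅ ᵆ ᵇ ᵈ ᵉ ᵊ ᵋ ᵌ ᵍ ᵎ ᵏ
U+1D5x ᵐ ᵑ ᵒ ᵓ ᵔ ᵕ ᵖ ᵗ ᵘ ᵙ ᵚ ᵛ ᵜ ᵝ ᵞ ᵟ
U+1D6x ᵠ ᵡ ᵢ ᵣ ᵤ ᵥ ᵦ ᵧ ᵨ ᵩ ᵪ ᵫ ᵬ ᵭ ᵮ ᵯ
U+1D7x ᵰ ᵱ ᵲ ᵳ ᵴ ᵵ ᵶ ᵷ ᵸ ᵹ ᵺ ᵻ ᵼ ᵽ ᵾ ᵿ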
Notes
1.^ As of Unicode version 16.0

Phonetic Extensions Supplement (U+1D80–1DBF)

Phonetic Extensions Supplement[1]
Official Unicode Consortium code chart (PDF)
  0 1 2 3 4 5 6 7 8 9 A B C D E F
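U+1D8x ᶀ ᶁ ᶂ ᶃ ᶄ ᶅ ᶆ ᶇ ᶈ ᶉ ᶊ ᶋ ᶌ ᶍ ᶎ ᶏ
U+1D9x ᶐ ᶑ ᶒ ᶓ ᶔ ᶕ ᶖ ᶗ ᶘ ᶙ ᶚ ᶛ ᶜ ᶝ ᶞ ᶟ
U+1DAx ᶠ ᶡ ᶢ ᶣ ᶤ ᶥ ᶦ ᶧ ᶨ ᶩ ᶪ ᶫ ᶬ ᶭ ᶮ ᶯ
U+1DBx ᶰ ᶱ ᶲ ᶳ ᶴ ᶵ ᶶ ᶷ ᶸ ᶹ ᶺ ᶻ ᶼ ᶽ ᶾ ᶿ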
Notes
1.^ As of Unicode version 16.0

Modifier Tone Letters (U+A700–A71F)

Modifier Tone Letters[1]
Official Unicode Consortium code chart (PDF)
  0 1 2 3 4 5 6 7 8 9 A B C D E F
U+A70x ꜀ ꜁ ꜂ ꜃ ꜄ ꜅ ꜆ ꜇ ꜈ ꜉ ꜊ ꜋ ꜌ ꜍ ꜎ ꜏
U+A71x ꜐ ꜑ ꜒ ꜓ ꜔ ꜕ ꜖ ꜗ ꜘ ꜙ ꜚ ꜛ ꜜ ꜝ ꜞ ꜟ
Notes
1.^ As of Unicode version 16.0

Superscripts and Subscripts (U+2070–209F)

Superscripts and Subscripts[1][2][3]
Official Unicode Consortium code chart (PDF)
  0 1 2 3 4 5 6 7 8 9 A B C D E F
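U+207x ⁰ ⁱ n/a n/a ⁴ ⁵ ⁶ ⁷ ⁸ ⁹ ⁺ ⁻ ⁼ ⁽ ⁾ ⁿ
U+208x ₀ ₁ ₂ ₃ ₄ ₅ ₆ ₇ ₈ ₉ ₊ ₋ ₌ ₍ ₎ n/a
U+209x ₐ ₑ ₒ ₓ ₔ ₕ ₖ ₗ ₘ ₙ ₚ ₛ ₜ n/a n/a n/a
Notes
1.^ As of Unicode version 16.0
2.^ Cells marked n/a indicate non-assigned code points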
3.^ Refer to the Latin-1 Supplement Unicode block for characters ¹ (U+00B9), ² (U+00B2) and ³ (U+00B3)
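
The six block ranges listed above can be used to check where a given phonetic character is encoded. This is a minimal Python sketch: the table `PHONETIC_BLOCKS` and the function `phonetic_block` are illustrative names, and the ranges are copied from the section headings of this article.

```python
from typing import Optional

# (block name, first code point, last code point), from the headings above.
PHONETIC_BLOCKS = [
    ("IPA Extensions", 0x0250, 0x02AF),
    ("Spacing Modifier Letters", 0x02B0, 0x02FF),
    ("Phonetic Extensions", 0x1D00, 0x1D7F),
    ("Phonetic Extensions Supplement", 0x1D80, 0x1DBF),
    ("Superscripts and Subscripts", 0x2070, 0x209F),
    ("Modifier Tone Letters", 0xA700, 0xA71F),
]


def phonetic_block(ch: str) -> Optional[str]:
    """Return the name of the phonetic block containing `ch`, or None."""
    cp = ord(ch)
    for name, first, last in PHONETIC_BLOCKS:
        if first <= cp <= last:
            return name
    return None


print(phonetic_block("\u0259"))  # IPA Extensions
print(phonetic_block("\u02b0"))  # Spacing Modifier Letters
print(phonetic_block("t"))       # None: plain Latin letters live in other blocks
```

Many IPA symbols, such as t or θ, are ordinary Latin or Greek letters and fall outside these six blocks, which is why the lookup returns `None` for them.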


Font support for IPA

IPA font support is increasing, and is now included in several fonts such as the Times New Roman versions that come with various recent computer operating systems. Diacritics are not always properly rendered, however. Freely available IPA fonts include Gentium, several from SIL (such as Charis SIL and Doulos SIL), DejaVu Sans, and TITUS Cyberbit. Commercial typefaces include Brill, available from Brill Publishers, and Lucida Sans Unicode and Arial Unicode MS, which ship with various Microsoft products. These all include several ranges of characters in addition to the IPA. Modern web browsers generally do not need any configuration to display these symbols, provided that a font capable of doing so is available to the operating system.


Input by selection from a screen

Further Information: Unicode input#Selection from a screen

Many systems provide a way to select Unicode characters visually. ISO/IEC 14755 refers to this as a screen-selection entry method.

Microsoft Windows has provided a Unicode version of the Character Map program since version NT 4.0, and in the consumer editions since XP (to open it, press ⊞ Win+R, type charmap, then press ↵ Enter). It is limited to characters in the Basic Multilingual Plane (BMP). Characters are searchable by Unicode character name, and the table can be limited to a particular code block. More advanced third-party tools of the same type are also available (a notable freeware example is BabelMap).

macOS provides a "character palette" with much the same functionality, along with searching by related characters, glyph tables in a font, etc. It can be enabled in the input menu in the menu bar under System Preferences → International → Input Menu (or System Preferences → Language and Text → Input Sources) or can be viewed under Edit → Emoji & Symbols in many programs.

Equivalent tools – such as gucharmap (GNOME) or kcharselect (KDE) – exist on most Linux desktop environments.

References

  1. ^ "Spacing modifier letters". Everything2.com. 2002-08-29. Retrieved 2016-01-23.