Searching for Non-English Language Materials: Scripts, Diacritics, Numerals and Dates
Connect with a subject specialist
Use the library’s "Find a Specialist" tool to identify a librarian with expertise in your language. Search by language and then click through to the librarian's profile.
Scripts
Non-English language materials may be written in Latin (also called Roman) characters or in non-Latin script characters. Knowing which script you are working with affects the strategy you use to find materials in the online catalog.
- Basic Latin characters are the letters of the English alphabet. Extended Latin characters comprise the English alphabet plus additional letters and diacritical (accent) marks; they are used in languages such as Croatian, French, German, Polish, Spanish, Turkish, and Vietnamese.
- Languages written in non-Latin scripts include Arabic, Armenian, Chinese, Greek, Hebrew, Hindi, Japanese, Korean, Persian, Russian, Thai, Tamil, and Urdu, among many others.
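If you are unsure which script a title or name is written in, a short script can give a rough answer. The sketch below is only an approximation: it assumes that the first word of each character's Unicode name (for example "CYRILLIC" or "ARABIC") is a good enough stand-in for the script, and the function name guess_scripts is hypothetical, chosen for illustration.

```python
import unicodedata
from collections import Counter


def guess_scripts(text: str) -> Counter:
    """Count letters by the first word of their Unicode character name.

    Assumption: the first word of the name (LATIN, CYRILLIC, ARABIC, CJK,
    HANGUL, ...) approximates the script. It is not the official Unicode
    Script property.
    """
    counts = Counter()
    for ch in text:
        if not ch.isalpha():
            continue  # skip spaces, digits, punctuation, combining marks
        name = unicodedata.name(ch, "")
        if name:
            counts[name.split()[0]] += 1
    return counts


if __name__ == "__main__":
    for sample in ["Москва", "Łódź", "القاهرة", "東京"]:
        print(sample, dict(guess_scripts(sample)))
```

Note that a script is not a language: Persian, Arabic, and Urdu text would all be reported as "ARABIC" by this sketch.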
One script serving multiple languages
When the same script is used for more than one language (such as Arabic script for Persian, Arabic and Ottoman Turkish), the letters may look the same, but vocabularies, pronunciation and romanization will be different.
One language in multiple scripts
A single language can be written in more than one script. For example, some Central Asian Turkic languages switched between Arabic, Latin, and Cyrillic alphabets four times in the 20th century.
Directionality
Non-Latin script languages may be written horizontally with the characters reading left to right, horizontally with the characters reading right to left, or vertically with each column read from top to bottom. Regardless of the directionality of the original non-Latin script, romanization that follows the American Library Association-Library of Congress (ALA-LC) tables is always recorded and read horizontally, from left to right.
Pro tip: Always enter romanized words into a search so that they read from left to right.
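The difference between right-to-left and left-to-right text can be checked programmatically. The following is a minimal sketch, assuming that the first strongly directional character decides the direction of a whole string; the function name base_direction is hypothetical. It relies on the Unicode bidirectional classes "R" and "AL" (right to left) and "L" (left to right), and it does not detect vertical writing.

```python
import unicodedata


def base_direction(text: str) -> str:
    """Return 'rtl' or 'ltr' based on the first strongly directional character."""
    for ch in text:
        bidi = unicodedata.bidirectional(ch)
        if bidi in ("R", "AL"):  # Hebrew-type and Arabic-type letters
            return "rtl"
        if bidi == "L":  # Latin, Cyrillic, Greek, CJK, and so on
            return "ltr"
    # Assumption: fall back to left-to-right when no strong character is found.
    return "ltr"


if __name__ == "__main__":
    print(base_direction("שלום"))    # Hebrew script
    print(base_direction("shalom"))  # a romanized form, typed in Latin characters
```

A romanized search term should always come out as "ltr" in a check like this.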
Transliteration & romanization
To enable communication across languages or interoperability between technologies, it is sometimes necessary to convert from one script to another by means of transliteration or romanization.
- Transliteration is the conversion of one script, alphabet, or character set into another, often in a way that approximates the pronunciation of the original words in the target script. (Not to be confused with translation.)
- Romanization is specifically the conversion of a non-Latin/Roman script into Latin/Roman script.
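At its simplest, romanization can be pictured as a character-by-character table lookup. The sketch below is an illustration only: SAMPLE_TABLE is a hypothetical six-letter subset of Cyrillic, not a romanization table from any standard, and real tables also cover diacritics, multi-letter equivalents, and context-dependent rules that this code ignores.

```python
# Illustrative subset only: six Cyrillic letters and their usual Latin equivalents.
SAMPLE_TABLE = {"м": "m", "о": "o", "с": "s", "к": "k", "в": "v", "а": "a"}


def romanize(text: str, table: dict[str, str]) -> str:
    """Replace each character found in the table; leave everything else as-is."""
    out = []
    for ch in text:
        latin = table.get(ch.lower())
        if latin is None:
            out.append(ch)  # unmapped characters stay visible in the result
        elif ch.isupper():
            out.append(latin.capitalize())  # preserve initial capitals
        else:
            out.append(latin)
    return "".join(out)


if __name__ == "__main__":
    print(romanize("Москва", SAMPLE_TABLE))  # Moskva
```

Leaving unmapped characters untouched makes gaps in the table easy to spot, which is useful when checking a romanized search term against a published table.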
There are many romanization standards used worldwide by scholars, in printed citations, or in library catalogs or databases from other countries. The University of Michigan Library (along with most North American libraries) uses the American Library Association-Library of Congress (ALA-LC) Romanization Tables to provide consistent and predictable romanization of non-Latin script languages within its own catalog.
Pro tip: Become familiar with the ALA-LC Romanization Tables corresponding to any non-Latin script language material you wish to find. This will allow you to correctly enter romanization in a North American online library catalog for a more successful search.
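Romanized forms often carry diacritics, so it can help to see exactly which marks a search term contains before entering it. The sketch below uses Unicode canonical decomposition (NFD) to list and to remove combining marks; the function names are hypothetical, and whether a given catalog treats marked and unmarked forms as equivalent is not something this code can tell you.

```python
import unicodedata


def list_marks(text: str) -> list[str]:
    """Name each combining mark found after canonical decomposition (NFD)."""
    decomposed = unicodedata.normalize("NFD", text)
    return [unicodedata.name(ch) for ch in decomposed if unicodedata.combining(ch)]


def strip_marks(text: str) -> str:
    """Return the text with combining marks removed.

    Letters with no canonical decomposition (for example Polish "ł")
    are left unchanged.
    """
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


if __name__ == "__main__":
    term = "Tōkyō"
    print(list_marks(term))
    print(strip_marks(term))
```

Comparing the two forms side by side makes it easier to try a search both with and without the marks.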
Searching by language
Search for collection material in a specific language or language grouping from the Advanced Search page in Library Search. To find the name that the Advanced Search language list uses for a language or language grouping, refer to the MARC Code List for Languages. For instance, instead of “Pashto,” use “Pushto.” If you are looking for books in “Gagauz” or “Bulgaro-Turkic,” use “Altaic (Other).”
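If you search by language often, a small personal lookup table can save repeated trips to the code list. The sketch below contains only the three examples given above; the dictionary and function names are hypothetical, and any further entries should be checked against the MARC Code List for Languages rather than guessed.

```python
# Common name (lowercase) -> name used in the Advanced Search language list.
# Only the examples from this guide are included.
CATALOG_LANGUAGE_NAMES = {
    "pashto": "Pushto",
    "gagauz": "Altaic (Other)",
    "bulgaro-turkic": "Altaic (Other)",
}


def catalog_language_name(common_name: str) -> str:
    """Return the catalog's name for a language, or the input if none is recorded."""
    cleaned = common_name.strip()
    return CATALOG_LANGUAGE_NAMES.get(cleaned.lower(), cleaned)


if __name__ == "__main__":
    for name in ["Pashto", "Gagauz", "French"]:
        print(name, "->", catalog_language_name(name))
```

Returning the input unchanged for unknown names is a deliberate choice here: it signals that the name has not been checked yet, not that it is correct.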