Searching for Non-English Language Materials: Scripts, Diacritics, Numerals and Dates

Find strategies and tools to help make your search for non-English language materials more effective, including guidance on using scripts, diacritics, numerals and dates.

Scripts

Non-English language materials may be written in Latin (also called Roman) characters or non-Latin script characters. Knowing what script you are working with can impact the strategy you use to find materials in the online catalog.

Latin characters are the letters of the English alphabet. Extended Latin characters add further letters and diacritical (accent) marks to that alphabet, and are used in languages such as Croatian, French, German, Polish, Spanish, Turkish, and Vietnamese.
Languages written in non-Latin scripts include Arabic, Armenian, Chinese, Greek, Hebrew, Hindi, Japanese, Korean, Persian, Russian, Tamil, Thai, and Urdu, to name just a few.
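
If you are unsure which script a title or name is written in, the Unicode names of its characters give a quick hint. The sketch below is an illustration only, not part of Library Search: the function name guess_scripts is hypothetical, and it uses the first word of each character's Unicode name as a rough script label. That heuristic is coarse (for example, Japanese text is reported as a mix of labels such as HIRAGANA, KATAKANA and CJK), so treat the result as a starting point.

```python
import unicodedata
from collections import Counter


def guess_scripts(text: str) -> Counter:
    """Count the letters in `text` by a rough script label.

    The label is the first word of the character's Unicode name,
    e.g. 'LATIN', 'CYRILLIC', 'ARABIC'. This is a heuristic, not an
    official script property lookup.
    """
    counts: Counter = Counter()
    for ch in text:
        if not ch.isalpha():
            continue  # skip digits, spaces, punctuation and combining marks
        name = unicodedata.name(ch, "")
        if name:
            counts[name.split()[0]] += 1
    return counts


if __name__ == "__main__":
    # Extended Latin letters with diacritics are still labeled LATIN.
    print(guess_scripts("\u0141\u00f3d\u017a"))                  # a Polish place name
    print(guess_scripts("\u041c\u043e\u0441\u043a\u0432\u0430"))  # a Russian place name
```

A result that mixes labels usually means the text combines scripts, for instance a romanized title followed by the original-script form.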

One script serving multiple languages

When the same script is used for more than one language (such as Arabic script for Persian, Arabic and Ottoman Turkish), the letters may look the same, but vocabularies, pronunciation and romanization will be different.

One language in multiple scripts

A single language can be written in more than one script. For example, some Central Asian Turkic languages switched between Arabic, Latin, and Cyrillic alphabets several times in the 20th century.

Directionality

Non-Latin script languages may be written horizontally with the characters reading left to right, horizontally with the characters reading right to left, or vertically with each column read from top to bottom. Regardless of the directionality of the original non-Latin script, ALA-LC Romanization is always recorded and read horizontally and left to right.

Pro tip: Always enter romanized words into a search so that they read from left to right.
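
One way to confirm that a search term is fully romanized, and therefore reads left to right, is to check the bidirectional class that Unicode assigns to each character. The following is a minimal sketch under that assumption; the helper name has_rtl is hypothetical and is not a feature of any catalog.

```python
import unicodedata

# Unicode bidirectional classes for strongly right-to-left characters:
# "R" covers scripts such as Hebrew, "AL" covers Arabic-script letters.
RTL_CLASSES = {"R", "AL"}


def has_rtl(text: str) -> bool:
    """Return True if any character in `text` is strongly right-to-left."""
    return any(unicodedata.bidirectional(ch) in RTL_CLASSES for ch in text)


if __name__ == "__main__":
    print(has_rtl("shalom"))                    # romanized form
    print(has_rtl("\u05e9\u05dc\u05d5\u05dd"))  # the same word in Hebrew script
```

A term for which has_rtl returns False contains no right-to-left letters, which is what you would expect of a romanized search string.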

Transliteration & romanization

To enable communication across languages or interoperability between technologies, it is sometimes necessary to convert from one script to another by means of transliteration or romanization.

Transliteration is the conversion of one script, alphabet, or character set into another, often in a way that approximates the pronunciation of the original words in the non-native script. (It is not to be confused with translation.)
Romanization is specifically the conversion of a non-Latin/Roman script into Latin/Roman script.

There are many romanization standards used worldwide by scholars, in printed citations, or in library catalogs or databases from other countries. The University of Michigan Library (along with most North American libraries) uses the American Library Association-Library of Congress (ALA-LC) Romanization Tables to provide consistent and predictable romanization of non-Latin script languages within its own catalog.

Pro tip: Become familiar with the ALA-LC Romanization Tables corresponding to any non-Latin script language material you wish to find. This will allow you to correctly enter romanization in a North American online library catalog for a more successful search.
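
Conceptually, a romanization table is a lookup from characters in the original script to Latin characters. The sketch below illustrates that idea with a deliberately tiny sample of six Cyrillic letters; the names SAMPLE_TABLE and romanize are hypothetical, and the sample is an assumption chosen for illustration rather than a copy of any ALA-LC table. Real tables also contain context-dependent rules and diacritics, so always consult the published table for the language you are searching.

```python
# Illustrative sample only: six Cyrillic letters, NOT a complete
# romanization table. Keys are written as escapes; the comment shows
# the letter each escape stands for.
SAMPLE_TABLE = {
    "\u043c": "m",  # м
    "\u043e": "o",  # о
    "\u0441": "s",  # с
    "\u043a": "k",  # к
    "\u0432": "v",  # в
    "\u0430": "a",  # а
}


def romanize(text: str, table: dict[str, str]) -> str:
    """Replace each character found in `table`, preserving capitalization.

    Characters that are not in the table are passed through unchanged,
    so leftover original-script letters show where the table is incomplete.
    """
    out = []
    for ch in text:
        latin = table.get(ch.lower())
        if latin is None:
            out.append(ch)
        elif ch.isupper():
            out.append(latin.capitalize())
        else:
            out.append(latin)
    return "".join(out)


if __name__ == "__main__":
    print(romanize("\u041c\u043e\u0441\u043a\u0432\u0430", SAMPLE_TABLE))
```

Because the result reads left to right in Latin characters, it can be typed directly into a catalog search box.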

Searching by language

Search for collection material in a specific language or language grouping from the Advanced Search page in Library Search. The list of languages on that page follows the names in the MARC Code List for Languages, so refer to that list to find the name used for a language or language grouping. For instance, instead of “Pashto” use “Pushto.” If you are looking for books in “Gagauz” or “Bulgaro-Turkic,” use “Altaic (Other).”
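
If you keep your own notes on which catalog name corresponds to a language, a simple lookup is enough. The sketch below records only the three examples given above; the names CATALOG_LANGUAGE_NAMES and catalog_language_name are hypothetical, and any further entries should be taken from the MARC Code List for Languages rather than guessed.

```python
# Common name (lowercase) -> name used in the catalog's language list.
# Only the examples mentioned in this guide are included.
CATALOG_LANGUAGE_NAMES = {
    "pashto": "Pushto",
    "gagauz": "Altaic (Other)",
    "bulgaro-turkic": "Altaic (Other)",
}


def catalog_language_name(name: str) -> str:
    """Return the catalog's name for a language, or the input if unknown."""
    cleaned = name.strip()
    return CATALOG_LANGUAGE_NAMES.get(cleaned.lower(), cleaned)


if __name__ == "__main__":
    for language in ("Pashto", "Gagauz", "French"):
        print(language, "->", catalog_language_name(language))
```

Names that are not in the lookup are returned unchanged, which signals that you should check the MARC list yourself.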

Last Updated: Dec 4, 2025 1:11 PM
