Alphabetical order

"Alphabetization" redirects here. For the creation of an alphabetic writing system, which in instances of Latin script is called romanization, see Romanization.

Alphabetical order is a system whereby strings of characters are placed in order based on the position of the characters in the conventional ordering of an alphabet. It is one of the methods of collation.

To determine which of two strings comes first in alphabetical order, their first letters are compared. If they differ, then the string whose first letter comes earlier in the alphabet comes before the other string. If the first letters are the same, then the second letters are compared, and so on. If a position is reached where one string has no more letters to compare while the other does, then the first (shorter) string is deemed to come first in alphabetical order.
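
The comparison rule above can be sketched in Python as follows. This is a minimal illustration rather than a standard routine: the function name comes_before is hypothetical, and the sketch assumes both strings contain only lowercase letters a to z, so that comparing characters directly matches their position in the alphabet.

```python
def comes_before(a: str, b: str) -> bool:
    """Return True if a precedes b under the letter-by-letter rule.

    Assumption: both strings use only lowercase letters a-z.
    """
    for x, y in zip(a, b):
        if x != y:
            # The first position where the letters differ decides the order.
            return x < y
    # No difference found: one string is a prefix of the other
    # (or they are equal), so the shorter string comes first.
    return len(a) < len(b)


print(comes_before("as", "aster"))       # True: "as" runs out of letters first
print(comes_before("attack", "ataman"))  # False: third letter t comes after a
```

Two equal strings return False in both directions, since neither precedes the other.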

Capital letters (upper case) are generally considered to be identical to their corresponding lower case letters for the purposes of alphabetical ordering, though conventions may be adopted to handle situations where two strings differ only in capitalization. Various conventions also exist for the handling of strings containing spaces, modified letters (such as those with diacritics), and non-letter characters such as marks of punctuation.
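
One possible convention for strings that differ only in capitalization can be sketched with a two-part sort key. The name case_insensitive_key is hypothetical, and the tie-break used here (falling back to the original spelling, which places the capitalized form first) is only one of several conventions that could be adopted.

```python
def case_insensitive_key(s: str) -> tuple:
    # Primary key: the text with case differences removed.
    # Secondary key: the original text, consulted only when the primary keys tie.
    return (s.casefold(), s)


words = ["polish", "Zebra", "Polish", "apple"]
ordered = sorted(words, key=case_insensitive_key)
print(ordered)  # ['apple', 'Polish', 'polish', 'Zebra']
```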

The result of placing a set of words or strings in alphabetical order is that all the strings beginning with the same letter are grouped together; and within that grouping all words beginning with the same two-letter sequence are grouped together; and so on. The system thus tends to maximize the number of common initial letters between adjacent words.

History

Alphabetical order was first used in the 1st millennium BCE by Northwest Semitic scribes using the Abjad system.[1] The first effective use of alphabetical order as a cataloging device among scholars may have been in ancient Alexandria.[2] In the 1st century BCE, Roman writer Varro compiled alphabetic lists of authors and titles.[3] In the 2nd century CE, Sextus Pompeius Festus wrote an encyclopedic epitome of the works of Verrius Flaccus, De verborum significatu, with entries in alphabetic order.[4] In the 3rd century CE, Harpocration wrote a Homeric lexicon alphabetized by all letters.[5] In the 10th century, the author of the Suda used alphabetic order with phonetic variations. In the 14th century, the author of the Fons memorabilium universi used a classification, but used alphabetical order within some of the books.[6]

In 1604 Robert Cawdrey had to explain in Table Alphabeticall, the first monolingual English dictionary, "Nowe if the word, which thou art desirous to finde, begin with (a) then looke in the beginning of this Table, but if with (v) looke towards the end."[7] Although as late as 1803 Samuel Taylor Coleridge condemned encyclopedias with "an arrangement determined by the accident of initial letters",[8] many lists are today based on this principle.

Ordering in the Latin script

Basic order and example

The standard order of the modern ISO basic Latin alphabet is:

A-B-C-D-E-F-G-H-I-J-K-L-M-N-O-P-Q-R-S-T-U-V-W-X-Y-Z

An example of straightforward alphabetical ordering follows:

As, Aster, Astrolabe, Astronomy, Astrophysics, At, Ataman, Attack, Baa

The above words are ordered alphabetically. As comes before Aster because they begin with the same two letters and As has no more letters after that whereas Aster does. The next three words come after Aster because their fourth letter (the first one that differs) is r, which comes after e (the fourth letter of Aster) in the alphabet. Those words themselves are ordered based on their sixth letters (l, n and p respectively). Then comes At, which differs from the preceding words in the second letter (t comes after s). Ataman comes after At for the same reason that Aster came after As. Attack follows Ataman based on comparison of their third letters, and Baa comes after all of the others because it has a different first letter.

Treatment of multiword strings

When some of the strings being ordered consist of more than one word, i.e. they contain spaces or other separators such as hyphens, then two basic approaches may be taken. In the first approach, all strings are ordered initially according to their first word, as in the sequence:

In the second approach, strings are alphabetized as if they had no spaces, giving the sequence:
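
The two approaches can be compared with a short Python sketch. The place names used here are illustrative assumptions and are not the article's own example sequences; the function names word_by_word_key and letter_by_letter_key are likewise hypothetical. Spaces and hyphens are both treated as separators.

```python
import re


def word_by_word_key(s: str) -> list:
    # First approach: compare the first words, then the second words, and so on.
    return re.split(r"[ -]+", s.lower())


def letter_by_letter_key(s: str) -> str:
    # Second approach: compare as if the separators were not there.
    return re.sub(r"[ -]+", "", s.lower())


names = ["New York", "Newark", "New Jersey", "Newton"]

first_approach = sorted(names, key=word_by_word_key)
second_approach = sorted(names, key=letter_by_letter_key)

print(first_approach)   # ['New Jersey', 'New York', 'Newark', 'Newton']
print(second_approach)  # ['Newark', 'New Jersey', 'Newton', 'New York']
```

In the first result every string whose first word is "New" is grouped ahead of "Newark", because the shorter word "New" precedes any longer word that begins with it.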

Special cases

Modified letters

In English, modified letters (such as those with diacritics) are treated the same as the base letter for alphabetical ordering purposes. For example, rôle comes between rock and rose, as if it were written role. However, languages that use such letters systematically generally have their own ordering rules. See Language-specific conventions below.
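
The English convention can be approximated by removing combining marks before comparison. The sketch below uses Python's standard unicodedata module; the helper name strip_diacritics is hypothetical, and the approach only covers letters that decompose into a base letter plus combining marks (it would leave a letter such as ø unchanged).

```python
import unicodedata


def strip_diacritics(s: str) -> str:
    # NFD splits a letter such as "ô" into "o" plus a combining circumflex.
    decomposed = unicodedata.normalize("NFD", s)
    # Keep only characters that are not combining marks.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


words = ["rose", "rôle", "rock"]
ordered = sorted(words, key=lambda w: strip_diacritics(w).lower())
print(ordered)  # ['rock', 'rôle', 'rose']
```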

Ordering by surname

In cultures where family names are written after given names, it is usually still desired to sort lists of names (as in telephone directories) by family name first. In this case, names need to be reordered to be sorted properly. For example, Juan Hernandes and Brian O'Leary should be sorted as "Hernandes, Juan" and "O'Leary, Brian" even if they are not written this way. Capturing this rule in a computer collation algorithm is difficult, and simple attempts will necessarily fail. For example, unless the algorithm has at its disposal an extensive list of family names, there is no way to decide if "Gillian Lucille van der Waal" is "van der Waal, Gillian Lucille", "Waal, Gillian Lucille van der", or even "Lucille van der Waal, Gillian".
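
The difficulty can be seen in a deliberately naive Python sketch that treats the last word of a name as the family name. The function name naive_surname_key is hypothetical and the heuristic is an assumption made for illustration, not a recommended method; it assumes a non-empty name.

```python
def naive_surname_key(full_name: str) -> str:
    # Heuristic (assumption): the last word is the family name.
    parts = full_name.split()
    family, given = parts[-1], parts[:-1]
    if not given:
        return family
    return family + ", " + " ".join(given)


print(naive_surname_key("Juan Hernandes"))  # Hernandes, Juan
print(naive_surname_key("Brian O'Leary"))   # O'Leary, Brian
# The heuristic silently picks one of the three possible readings:
print(naive_surname_key("Gillian Lucille van der Waal"))  # Waal, Gillian Lucille van der
```

The heuristic handles the first two names but has no basis for its choice in the third, which is the failure the paragraph above describes.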

The and other common words

Sometimes if a phrase begins with a very common word (such as "the" or "a"), that word is ignored or moved to the end of the phrase, but this is not always the case. The book title "The Shining" might be treated as "Shining", or "Shining, The" and therefore would be ordered before the book title "Summer of Sam", although it may also be treated as simply "The Shining" and therefore would be ordered after "Summer of Sam". Similarly, the book title "A Wrinkle in Time" might be treated as "Wrinkle in Time", "Wrinkle in Time, A", or simply "A Wrinkle in Time", depending on whom you ask. All three alphabetization methods are fairly easy to create by algorithm, but many programs rely instead on simple lexicographic ordering.
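
The difference between ignoring a leading common word and plain lexicographic ordering can be sketched as follows. The name title_key and the short list of ignored words are assumptions made for this example; a real catalogue would define its own list.

```python
IGNORED_LEADING_WORDS = ("the ", "a ")  # assumption: only these two words are skipped


def title_key(title: str) -> str:
    lowered = title.lower()
    for word in IGNORED_LEADING_WORDS:
        if lowered.startswith(word):
            # Drop the leading word so that "The Shining" sorts under "shining".
            return lowered[len(word):]
    return lowered


titles = ["Summer of Sam", "The Shining", "A Wrinkle in Time"]

ignoring_articles = sorted(titles, key=title_key)
plain = sorted(titles, key=str.lower)

print(ignoring_articles)  # ['The Shining', 'Summer of Sam', 'A Wrinkle in Time']
print(plain)              # ['A Wrinkle in Time', 'Summer of Sam', 'The Shining']
```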

Mac prefixes

Main article: Mac and Mc together

The prefixes M' and Mc in Irish and Scottish surnames are abbreviations for Mac, and are sometimes alphabetized as if the spelling is Mac in full. Thus McKinley might be listed before Mackintosh (as it would be if it had been spelled out as "MacKinley"). Since the advent of computer-sorted lists, this type of alphabetization is less frequently encountered, though it is still used in British telephone directories.
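
A sketch of this convention in Python expands the abbreviated prefix before comparing. The function name mac_key is hypothetical, and the rule applied (any leading "Mc" or "M'" is read as "Mac") is a simplifying assumption.

```python
import re


def mac_key(surname: str) -> str:
    # Read a leading "Mc" or "M'" as if it were spelled "Mac" in full.
    expanded = re.sub(r"^(Mc|M')", "Mac", surname)
    return expanded.lower()


names = ["Mackintosh", "McKinley"]

with_mac_rule = sorted(names, key=mac_key)
plain = sorted(names, key=str.lower)

print(with_mac_rule)  # ['McKinley', 'Mackintosh']
print(plain)          # ['Mackintosh', 'McKinley']
```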

Treatment of numerals

Main article: Lexicographical order

When some of the strings contain numerals (or other non-letter characters), various approaches are possible. Sometimes such characters are treated as if they came before or after all the letters of the alphabet. Another method is for numbers to be sorted alphabetically as they would be spelled: for example 1776 would be sorted as if spelled out "seventeen seventy-six", and 24 heures du Mans as if spelled "vingt-quatre..." (French for "twenty-four"). When numerals or other symbols are used as special graphical forms of letters, as 1337 for leet or the movie Seven (which was stylised as Se7en), they may be sorted as if they were those letters. Natural sort order orders strings alphabetically, except that multi-digit numbers are treated as a single character and ordered by the value of the number encoded by the digits.
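
Natural sort order can be sketched in Python by splitting each string into runs of digits and runs of other characters. The name natural_key is hypothetical, and placing numbers before letters when the two meet at the same position is an assumption of this sketch, corresponding to one of the approaches mentioned above.

```python
import re


def natural_key(s: str) -> list:
    # Splitting on a captured digit run yields text at even indices
    # and digit runs at odd indices.
    parts = re.split(r"([0-9]+)", s.lower())
    key = []
    for i, part in enumerate(parts):
        if part == "":
            continue
        if i % 2 == 1:
            # A multi-digit number is compared as a single unit, by value.
            key.append((0, int(part), ""))
        else:
            key.append((1, 0, part))
    return key


names = ["file10", "file2", "file1"]

natural = sorted(names, key=natural_key)
plain = sorted(names)

print(natural)  # ['file1', 'file2', 'file10']
print(plain)    # ['file1', 'file10', 'file2']
```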

Language-specific conventions

Languages which use an extended Latin alphabet generally have their own conventions for treatment of the extra letters. Also, in some languages certain digraphs are treated as single letters for collation purposes. For example, the 29-letter alphabet of Spanish treats ñ as a basic letter following n, and formerly treated the digraphs ch and ll as basic letters following c and l, respectively. Ch and ll are still considered letters, but are now alphabetized as two-letter combinations. (The new alphabetization rule was issued by the Royal Spanish Academy in 1994.) On the other hand, the digraph rr follows rqu as expected, and did so even before the 1994 alphabetization rule.
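
The effect of treating digraphs as single letters can be sketched in Python by ranking each word against an explicit alphabet. The helper make_key and the example words are assumptions made for illustration; the sketch handles unaccented words only and places any character missing from the alphabet after all listed letters.

```python
def make_key(alphabet):
    rank = {letter: i for i, letter in enumerate(alphabet)}
    digraphs = {letter for letter in alphabet if len(letter) == 2}

    def key(word):
        word = word.lower()
        out, i = [], 0
        while i < len(word):
            pair = word[i:i + 2]
            if pair in digraphs:
                # A digraph such as "ch" counts as one letter.
                out.append(rank[pair])
                i += 2
            else:
                out.append(rank.get(word[i], len(alphabet)))
                i += 1
        return out

    return key


# 29 letters: ch follows c, ll follows l, and ñ follows n.
TRADITIONAL = ["a", "b", "c", "ch", "d", "e", "f", "g", "h", "i", "j", "k", "l", "ll",
               "m", "n", "ñ", "o", "p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z"]
# Rule issued in 1994: ch and ll are ordered as ordinary two-letter combinations.
CURRENT = [letter for letter in TRADITIONAL if len(letter) == 1]

words = ["cuna", "chico", "cosa", "luz", "llama"]

former_order = sorted(words, key=make_key(TRADITIONAL))
current_order = sorted(words, key=make_key(CURRENT))

print(former_order)   # ['cosa', 'cuna', 'chico', 'luz', 'llama']
print(current_order)  # ['chico', 'cosa', 'cuna', 'llama', 'luz']
```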

In a few cases, such as Kiowa, the alphabet has been completely reordered.

Alphabetization rules applied in various languages are listed below.

Kiowa: A, AU, E, I, O, U, B, F, P, V, D, J, T, TH, G, C, K, Q, CH, X, S, Z, L, Y, W, H, M, N

Automation

Collation algorithms (in combination with sorting algorithms) are used in computer programming to place strings in alphabetical order. A standard example is the Unicode Collation Algorithm, which can be used to put strings containing any Unicode symbols into (an extension of) alphabetical order. It can be made to conform to most of the language-specific conventions described above by tailoring its default collation table. Several such tailorings are collected in the Common Locale Data Repository.

For more details see Collation § Automated collation.

Similar orderings

The principle behind alphabetical ordering can still be applied in languages that do not strictly speaking use an alphabet – for example, they may be written using a syllabary or abugida – provided the symbols used have an established ordering.

For logographic writing systems, such as Chinese hanzi or Japanese kanji, the method of radical-and-stroke sorting is frequently used as a way of defining an ordering on the symbols. Japanese sometimes uses pronunciation order, most commonly with the Gojūon order but sometimes with the older Iroha ordering.

In mathematics, lexicographical order is a means of ordering sequences in a manner analogous to that used to produce alphabetical order.

Some computer applications use a version of alphabetical order that can be achieved using a very simple algorithm, based purely on the ASCII or Unicode codes for characters. This may have non-standard effects such as placing all capital letters before lower-case ones. See ASCIIbetical order.
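
The effect is easy to reproduce in Python, whose default string comparison is based on code points. The word list is an illustrative assumption.

```python
words = ["apple", "Zebra", "banana", "Apple"]

# Default comparison uses code points, so every capital letter precedes
# every lower-case letter.
by_code_point = sorted(words)

# Folding case first gives the conventional alphabetical grouping; the sort
# is stable, so "apple" stays ahead of "Apple" as in the input.
case_insensitive = sorted(words, key=str.casefold)

print(by_code_point)     # ['Apple', 'Zebra', 'apple', 'banana']
print(case_insensitive)  # ['apple', 'Apple', 'banana', 'Zebra']
```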

A rhyming dictionary is based on sorting words in alphabetical order starting from the last to the first letter of the word.
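
This reverse ordering can be sketched in Python by sorting on each word spelled backwards. The word list is an illustrative assumption.

```python
words = ["nation", "cat", "station", "hat", "motion"]

# Compare words from their last letter towards their first.
rhyming_order = sorted(words, key=lambda w: w[::-1])

print(rhyming_order)  # ['nation', 'station', 'motion', 'cat', 'hat']
```

Words with the same ending become neighbours, just as ordinary alphabetical order groups words with the same beginning.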

References

  1. Lehmann, Reinhard G. "27-30-22-26. How Many Letters Needs an Alphabet? The Case of Semitic", in: The idea of writing: Writing across borders, edited by Alex de Voogt and Joachim Friedrich Quack. Leiden: Brill, 2012, pp. 11–52.
  2. Daly, Lloyd. Contributions to the History of Alphabetization in Antiquity and the Middle Ages. Brussels, 1967, p. 25.
  3. O'Hara, James (1989). "Messapus, Cycnus, and the Alphabetical Order of Vergil's Catalogue of Italian Heroes". 43: 35–38. JSTOR 1088539.
  4. LIVRE XI – texte latin – traduction + commentaires.
  5. Gibson, Craig (2002). Interpreting a classic: Demosthenes and his ancient commentators.
  6. Yeo, Richard (2001). Encyclopaedic visions: scientific dictionaries and enlightenment culture. Cambridge University Press. ISBN 0521651913.
  7. Cawdrey, Robert. A Table Alphabeticall (1604).
  8. Coleridge's Letters, No.507.
  9. "Unicode Technical Standard #10". Unicode, Inc. (unicode.org). 2008-03-28. Retrieved 2008-08-27.

This article is issued from Wikipedia, version of 11/29/2016. The text is available under the Creative Commons Attribution/Share Alike license, but additional terms may apply to the media files.