ARTS1691 - Language Classification Systems

There are three main systems for classifying languages: areal linguistics, typological linguistics and genetic linguistics.

Areal Linguistics

Areal linguistics classifies languages by their regional location. Not every language can be fitted into a family structure, yet some languages share common features that can be put down to borrowing (brought about by close proximity or prolonged exposure).


Some European languages do not fit into the 9-branch Indo-European family tree structure.

Languages such as Hungarian, Finnish and Estonian can be explained as later arrivals that belong to the Finno-Ugric family.

On the other hand, a language like Euskara (spoken by the Basques in Spain) is an isolate and doesn't appear to fit anywhere.


Typological Linguistics

Typological classification is done by describing details of language structure: sound patterns, syntactic and morphological rules, and semantic distribution.

Studying typological differences and similarities shows us the range of possibilities: how far languages can differ, and where each scale begins and ends. It can also tell us about language universals, the features that are constant across languages.

Absolute Universals
Features that occur in every language.
Universal Tendencies
Patterns that are likely to occur in most languages.
Implicational Universals
A feature which operates only under a specific condition.

E.g. If a language has two distinct terms for arm and hand, then it will also have two for leg and foot. Similarly, if arm and hand share one term, then leg and foot share one as well, e.g. Russian.

Non-implicational Universals
A feature which operates without specific conditions.

E.g. All languages have pronouns for at least first and second person.
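
The "if X then Y" form of a conditional universal can be checked mechanically against a feature table. The following is a minimal sketch in Python. The feature names, the `counterexamples` helper and the two-language table are illustrative assumptions, not a real typological database.

```python
# Toy feature table (illustrative only): does the language use
# separate words for arm vs hand, and for leg vs foot?
LANGUAGES = {
    "English": {"arm_hand_distinct": True, "leg_foot_distinct": True},
    "Russian": {"arm_hand_distinct": False, "leg_foot_distinct": False},
}


def counterexamples(languages, condition, consequence):
    """Return the names of languages that break 'if condition then consequence'."""
    return [
        name
        for name, features in languages.items()
        if features[condition] and not features[consequence]
    ]


# An empty list means no language in the table contradicts the universal.
print(counterexamples(LANGUAGES, "arm_hand_distinct", "leg_foot_distinct"))
```

A single counterexample is enough to demote a proposed universal to a tendency, which is why the check returns the offending languages rather than a plain yes or no.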

Factors in Explaining Universals

The study of typological classification is still reasonably new. Given how important universals are to understanding the brain and the principles that drive all communication, we try to be cautious about making assumptions about them. Explanations that rest on logical reasoning about the social and cognitive effects of universals, rather than on empirical proof, cannot be guaranteed to be correct.

Often it is perceptual distinctiveness that determines whether a feature is universal. It makes sense for the core features of every language to be those that allow for the most clarity and distinctness in understanding.

Sometimes phonological reasons play a part in determining why a feature is universal. For instance, the minimal vowel system in any language is [i], [a], [u]. This makes sense because these three vowels are the furthest apart and hence allow for the greatest distinction between sounds, making them easier to tell apart.
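
The "furthest apart" argument can be illustrated with a small calculation. The sketch below, in Python, places five vowels on a schematic grid and picks the three-vowel set whose closest pair is as far apart as possible. The coordinates are illustrative assumptions on a 0 to 1 scale, not measured acoustic values, and the function names are hypothetical.

```python
from itertools import combinations
from math import dist

# Schematic vowel positions as (backness, height) on a 0-1 scale.
# These coordinates are illustrative assumptions, not measured formants.
VOWELS = {
    "i": (0.0, 1.0),  # high front
    "e": (0.0, 0.5),  # mid front
    "a": (0.5, 0.0),  # low central
    "o": (1.0, 0.5),  # mid back
    "u": (1.0, 1.0),  # high back
}


def min_pairwise_distance(vowels):
    """Smallest distance between any two vowels in the given set."""
    return min(dist(VOWELS[a], VOWELS[b]) for a, b in combinations(vowels, 2))


def most_dispersed(size):
    """Vowel set of the given size whose closest pair is furthest apart."""
    return max(combinations(VOWELS, size), key=min_pairwise_distance)


print(most_dispersed(3))  # → ('i', 'a', 'u')
```

On this schematic grid the corner vowels win because any set containing a mid vowel has a neighbour sitting closer to it.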

Similarly, morphological reasons can be important. The differences between Old English “heo” (she), “he” (he) and “hie” (they) were not large enough to guarantee a distinction, hence the introduction of the current forms.

Syntactic patterns can be explained with reference to how the brain processes sentence structure.

Genetic Linguistics

Genetic classification sorts languages into family branches. It explains similarities between languages that are too numerous to be caused purely areally (by borrowing).

There are a number of problems with this type of classification:

Lack of Data

  • We don't have data available for every language, and for those we do have, there is not always much of it
  • Some languages (or dialects within languages) differ in their written forms but not in their linguistic features

Ensuring the Link is Genetic

  • Deciding that the cause of a similarity is genetic rather than areal can be quite murky, as can agreeing on the number of cognates required to count two languages as related.
  • Coincidental similarities have to be separated out. Because every language draws on a limited system of sounds and structures, some duplicates will arise by pure probability, so when comparing similar languages we have to consider whether they might be completely unconnected.
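
The point about coincidental similarities can be made concrete with a rough probability estimate. The sketch below, in Python, treats each word pair in a comparison list as an independent trial with a fixed chance of looking alike by accident, and computes how likely it is to see at least a given number of such look-alikes. The independence assumption, the `chance_match_probability` name, the list size of 100 and the 5% match rate are all illustrative assumptions, not established figures.

```python
from math import comb


def chance_match_probability(n_words, n_matches, p_match):
    """Probability of at least n_matches accidental resemblances among
    n_words comparisons, each matching independently with probability p_match."""
    return sum(
        comb(n_words, k) * p_match**k * (1 - p_match) ** (n_words - k)
        for k in range(n_matches, n_words + 1)
    )


# Hypothetical numbers: 100 compared word pairs, 5% accidental match rate,
# so about 100 * 0.05 = 5 look-alikes are expected with no relationship at all.
p_few = chance_match_probability(100, 5, 0.05)
p_many = chance_match_probability(100, 20, 0.05)
print(p_few > p_many)  # a handful of matches is far likelier by chance than many
```

Under these assumptions a few resemblances say little about relatedness, while a large number is much harder to attribute to chance alone.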

Finding Obscured Links

  • Languages evolve over time, so two languages that were once quite closely connected (e.g. Russian and Old English, which both came from Indo-European roots) can come to appear quite distinct (e.g. Russian and Present Day English).
  • Sound changes and cognate dropping (words disappearing from the language) can make it quite difficult to find the similarities between languages.

Tracking Language Evolution Without Data

Robert Dixon proposed a method of tracing the 100,000 years of language development that we cannot recreate. He followed the lines of Stephen Jay Gould's Punctuated Equilibrium evolutionary model.
The idea is that, from a given starting point, changes in periods of peace would have been areal, with lots of borrowing and diffusion between languages. In active periods of change (religious, political, disease, war, natural disaster etc.), languages would have split as communities moved around, leading to the creation of new language families.