ARTS1691 - Computational Linguistics

Computational linguistics is a very broad field that combines programming with the study of language. Key areas are:

  • Analysis of written and spoken discourse
  • Translation of text and speech
  • Use of human language for interaction between computers and people
  • Modelling and testing of linguistic theories
  • Frequency analysis
  • Information retrieval and summary

Automatic Machine Translation
Machine translation has been around since the 1940s. Its main difficulty is that text cannot simply be translated word for word.

Why word-for-word translation fails:

  • Different word orders between languages (e.g. SVO, SOV)
  • Syntactic/semantic ambiguity (sentence structure/word meaning)
  • Idioms (literal translations lose the intended meaning)
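
The word-order problem can be illustrated with a minimal Python sketch. The five-word English-to-German lexicon and the function name translate_word_for_word are assumptions made up for this example, not part of any real translation system.

```python
# Toy English-to-German lexicon (hypothetical, for illustration only).
LEXICON = {
    "i": "ich",
    "have": "habe",
    "read": "gelesen",
    "the": "das",
    "book": "Buch",
}


def translate_word_for_word(sentence):
    """Replace each word with its lexicon entry, keeping the source word order."""
    return " ".join(LEXICON.get(word.lower(), word) for word in sentence.split())


naive = translate_word_for_word("I have read the book")
print(naive)  # ich habe gelesen das Buch

# German places the participle at the end of the clause, so the
# correct word order differs from the English one.
correct = "ich habe das Buch gelesen"
print(naive == correct)  # False
```

Every word is translated correctly, yet the sentence is wrong because the word order of the source language has been carried over unchanged.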

How AMT currently works:
Speech Source → Speech Recognition System → Text Source → Translation → Text Target → Speech Synthesis System → Speech Target
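
The pipeline above can be read as a chain of three functions. The sketch below only shows that structure: the Speech class and all function names are hypothetical placeholders, and each stage is a stub that does no real recognition, translation or synthesis.

```python
from dataclasses import dataclass


@dataclass
class Speech:
    """Stand-in for an audio signal; here it just carries the words spoken."""
    language: str
    words: str


def recognise_speech(speech: Speech) -> str:
    # Stub: a real recogniser would decode audio into text.
    return speech.words


def translate_text(text: str, target_language: str) -> str:
    # Stub: tags the text with the target language instead of translating it.
    return f"[{target_language}] {text}"


def synthesise_speech(text: str, language: str) -> Speech:
    # Stub: a real synthesiser would generate audio from text.
    return Speech(language=language, words=text)


def speech_to_speech(source: Speech, target_language: str) -> Speech:
    """Speech source -> text source -> text target -> speech target."""
    source_text = recognise_speech(source)
    target_text = translate_text(source_text, target_language)
    return synthesise_speech(target_text, target_language)


result = speech_to_speech(Speech("en", "good morning"), "de")
print(result)
```

Because each stage only passes text to the next, an error made in speech recognition is carried through translation and into the synthesised output.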

Frequency Analysis

  • The 10 most frequent English words used in texts are 'the', 'of', 'and', 'to', 'a', 'in', 'that', 'is', 'was' and 'he'
  • Concordance: each occurrence of a search term shown with its immediate context
  • Collocations: multiple search terms and how often they appear together
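
The three ideas above can be sketched in a few lines of Python using only the standard library. The sample text and the function names are assumptions chosen for this example, and collocations are simplified here to counts of adjacent word pairs.

```python
import re
from collections import Counter

# Made-up sample text for illustration.
TEXT = "The cat sat on the mat. The dog sat on the log."


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def concordance(tokens, term, width=2):
    """Return each occurrence of term with up to `width` tokens of context on each side."""
    lines = []
    for i, token in enumerate(tokens):
        if token == term:
            left = tokens[max(0, i - width):i]
            right = tokens[i + 1:i + 1 + width]
            lines.append(" ".join(left + [f"[{token}]"] + right))
    return lines


def collocations(tokens):
    """Count adjacent word pairs (a simplification of collocation analysis)."""
    return Counter(zip(tokens, tokens[1:]))


tokens = tokenize(TEXT)

# Frequency analysis: how often each word occurs.
frequencies = Counter(tokens)
print(frequencies.most_common(1))   # [('the', 4)]

# Concordance: the search term in its immediate context.
print(concordance(tokens, "sat"))   # ['the cat [sat] on the', 'the dog [sat] on the']

# Collocations: how often two words appear next to each other.
print(collocations(tokens)[("sat", "on")])  # 2
```

On a real corpus the same counting approach applies; only the tokenisation and the size of the context window would normally need more care.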