19/03/2023
It has been quite a journey to arrive at a ChatGPT model! It took some time before we thought about modeling language as a probabilistic generative process. Natural Language Processing (NLP) studies the interactions between computers and human language, and it is about as old as computers themselves.
Warren Weaver was the first to suggest an algorithmic approach to machine translation (MT) in 1949, and this led to the Georgetown experiment, the first computer application to MT, in 1954. In 1957, Chomsky published Syntactic Structures, which laid the foundation for generative grammar. ELIZA (1964) and SHRDLU (1968) can be considered the first natural-language understanding computer programs.
The 60s and early 70s marked the era of grammar theories. Chomsky's Transformational-Generative Grammar (1965) and Fillmore's Case Grammar (1968) formalized the relationships among the elements of a sentence. In 1969, Collins and Quillian proposed the semantic network, a knowledge structure that depicts how concepts relate to and connect with one another. In 1970, Woods introduced Augmented Transition Networks, a graph-theoretic structure used in the operational definition of formal languages. In 1972, Schank developed Conceptual Dependency Theory to represent knowledge for natural language input into computers.
During the 70s, conceptual ontologies became quite fashionable. Conceptual ontologies resemble knowledge graphs: concepts are linked to each other according to how they are associated. You can imagine generating sentences by following paths between concepts in an ontology. Well-known systems from this period include MARGIE (1975), TaleSpin (1976), QUALM (1977), SAM (1978), PAM (1978), Politics (1979) and Plot Units (1981).
The 80s were a period of great success for symbolic methods. In 1983, Charniak proposed Passing Markers, a mechanism for resolving ambiguities in language comprehension by indicating the relationship between adjacent words. In 1986, Riesbeck and Martin proposed Uniform Parsing, an approach to natural language processing that combines parsing and inferencing in a uniform framework for language learning. In 1987, Hirst proposed a new approach to resolving ambiguity: Semantic Interpretation.
The late 80s and the 90s saw the advent of statistical models for NLP. It was the beginning of thinking about language as a probabilistic process. In 1989, Bahl proposed a tree-based method to predict the next word in a sentence, and IBM presented a series of models for statistical machine translation. In 1990, Chitrao and Grishman demonstrated the potential of statistical parsing techniques for processing messages, and Brill et al. introduced a method for automatically inducing a part-of-speech tagger by training on a large corpus of text. In 1991, Brown proposed a method for aligning sentences in parallel corpora for machine translation applications.
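To make "language as a probabilistic process" concrete, here is a minimal sketch of next-word prediction with a count-based bigram model. It is not Bahl's tree-based method or any of the IBM models; it is the simplest illustration of the same idea, and the toy corpus and the function names `train_bigram_model` and `next_word_probabilities` are assumptions made up for this example.

```python
from collections import Counter, defaultdict


def train_bigram_model(sentences):
    """Count how often each word follows each other word (hypothetical helper)."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        # <s> and </s> mark the start and end of a sentence.
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts


def next_word_probabilities(counts, prev):
    """Estimate P(next word | previous word) by relative frequency."""
    followers = counts.get(prev, Counter())
    total = sum(followers.values())
    if total == 0:
        return {}  # the previous word was never seen in training
    return {word: count / total for word, count in followers.items()}


# Toy corpus, invented for illustration only.
corpus = [
    "the cat sat on the mat",
    "the cat ate the fish",
    "the dog sat on the rug",
]

model = train_bigram_model(corpus)
probs = next_word_probabilities(model, "the")

# Print candidate next words after "the", most likely first.
for word, p in sorted(probs.items(), key=lambda item: -item[1]):
    print(f"P({word} | the) = {p:.2f}")
```

In this toy corpus "the" is followed by "cat" in two of its six occurrences, so the model assigns "cat" a probability of one third. Real statistical models of the era added smoothing and longer contexts, which this sketch leaves out.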
In 2003, Bengio proposed the first neural language model, a simple feed-forward model. In 2008, Collobert and Weston applied multi-task learning with convolutional networks. In 2011, Hinton built a generative text model with Recurrent Neural Networks. In 2013, Mikolov introduced Word2Vec, which completely changed the way we approach NLP with neural networks. In 2014, Sutskever proposed a model for sequence-to-sequence learning. In 2017, Vaswani gave us the Transformer architecture, which led to a revolution in model performance. In 2018, Devlin presented BERT, which popularized Transformers. And in 2022, we finally got to experience ChatGPT, which completely changed the way the public perceives AI.