Natural Language Processing (NLP) is a major topic in AI. It is an active, interdisciplinary field at the intersection of computer science, AI, and computational linguistics.
NLP works with human language in the form of text.
In fact, a large share of the available data is text; it is estimated that hundreds of thousands of petabytes of text are available.
Why NLP?
Because humans want to communicate with computers using natural language.
One important application is statistical machine translation.
The goal is to use statistical techniques and probability to infer the right translation of a text. One example is Google Translate.
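One common way to make "infer the right translation" concrete is the noisy-channel formulation from classical statistical machine translation. The notation below is introduced here for illustration and is not taken from the text above: f is the source sentence, e is a candidate translation, P(e) is a language model that scores how fluent e is, and P(f | e) is a translation model that scores how well e explains f.

```latex
\hat{e} = \arg\max_{e} P(e \mid f) = \arg\max_{e} P(f \mid e)\, P(e)
```

The second equality follows from Bayes' rule, because P(f) is the same for every candidate e and can be dropped from the maximization.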
Another important application is information extraction, where the aim is to automatically extract structured information from unstructured text.
Examples include Columbia Newsblaster and News in Essence.
A third application is dialog systems, delivered via automated chatbots or online assistants.
An example is a call-center system: a user calls in and talks with the system before finally being connected to a human agent. Sometimes the user does not need a human agent at all, because the system is good enough to handle the request.
NLP is an important topic, but it is also one of the hardest problems in AI. Why?
Because language itself is too complex for a system to handle easily, with properties such as:
- Ambiguity
- Metonymy (example: in “play Mozart”, “Mozart” stands for Mozart's music)
- Metaphors
Progress of NLP
Below are topics where huge progress has been made:
- Tagging, which turns a text into a tagged text.
Example: I take the ball. “I” could be tagged as a pronoun, “take” as a verb, and “ball” as a noun.
- Spam filtering, which can achieve 99.9% accuracy using Naive Bayes with some variations. Note: Naive Bayes is a probabilistic classification method that is often used for text.
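To show how Naive Bayes classifies text, here is a minimal sketch of a multinomial Naive Bayes spam filter with add-one (Laplace) smoothing. The toy messages, the "spam"/"ham" labels, and the function names are all made up for illustration; a real filter would train on a large labeled corpus and use a better tokenizer.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


def train(examples):
    """Count documents per label and words per label from (text, label) pairs."""
    label_counts = Counter()
    word_counts = {}  # label -> Counter of word frequencies
    vocab = set()
    for text, label in examples:
        label_counts[label] += 1
        counts = word_counts.setdefault(label, Counter())
        for word in tokenize(text):
            counts[word] += 1
            vocab.add(word)
    return label_counts, word_counts, vocab


def classify(text, model):
    """Return the label with the highest log-probability under Naive Bayes."""
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, float("-inf")
    for label in label_counts:
        # Log prior: how common this label is in the training data.
        score = math.log(label_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            if word not in vocab:
                continue  # skip words never seen during training
            # Log likelihood with add-one smoothing to avoid zero probabilities.
            score += math.log(
                (word_counts[label][word] + 1) / (total_words + len(vocab))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy training set, for illustration only.
TOY_DATA = [
    ("win cash prize now", "spam"),
    ("cheap pills win money", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch on monday with the team", "ham"),
]

model = train(TOY_DATA)
print(classify("win money now", model))
print(classify("agenda for the team meeting", model))
```

Working in log space keeps the product of many small probabilities from underflowing, which matters once messages get longer than a few words.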
Topics that are still works in progress:
- Summarization: given a large amount of text, news, or other information, how do we capture its essence?
For example, a text discussing the disadvantages of smoking could be summarized as “smoking is unhealthy”.
- Question answering and dialog systems, although there has been some progress in commercial applications such as Siri and Echo.
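To give a feel for why summarization is still hard, here is a rough sketch of a very simple baseline: extractive summarization by word frequency. It only picks existing sentences whose words are frequent in the text; it cannot produce a new sentence such as “smoking is unhealthy”, which is what the example above would need. The stopword list, the sample text, and the function names are assumptions made for illustration.

```python
import re
from collections import Counter

# Small, hand-picked stopword list (an assumption; real systems use larger ones).
STOPWORDS = {"a", "an", "and", "the", "is", "are", "of", "to", "in", "it"}


def split_sentences(text):
    """Naively split after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def content_words(text):
    """Lowercased words with stopwords removed."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]


def summarize(text, max_sentences=1):
    """Keep the sentences whose words are, on average, most frequent in the text."""
    sentences = split_sentences(text)
    freq = Counter(content_words(text))

    def score(sentence):
        ws = content_words(sentence)
        return sum(freq[w] for w in ws) / len(ws) if ws else 0.0

    ranked = sorted(
        range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True
    )
    keep = sorted(ranked[:max_sentences])  # restore original sentence order
    return " ".join(sentences[i] for i in keep)


# Hypothetical sample text, for illustration only.
DOC = (
    "Smoking damages the lungs. "
    "Smoking raises the risk of heart disease. "
    "Some people enjoy coffee in the morning."
)

print(summarize(DOC))
```

A baseline like this can surface a sentence about smoking, but condensing several sentences into one new statement requires generating text, which is part of why summarization remains a work in progress.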