At the Tech Research Showcase I attended a few weeks ago, someone asked me to point them to materials for getting started with natural language processing and text analysis as a beginner. Text analysis, text mining, and natural language processing (NLP) are fields that all deal with mining and analyzing textual data to discover interesting patterns, extract useful insights, or learn more about the structure of language. The terms are sometimes used interchangeably, even though the fields have different goals and work at different levels of language analysis. They all draw on techniques from linguistics, mathematics, statistics, and computer science.
In this post, I will not introduce the fields themselves, as they have been covered in several resources and, frankly, are too large to cover well in one post. They include a vast number of techniques and methods for language detection, translation, document summarization, sentiment analysis, part-of-speech tagging, topic modeling, question answering, etc. A good, practically oriented background book is Practical Text Mining and Statistical Analysis for Non-structured Text Data Applications.
Because these are large fields that have gained increased interest in both academia and industry, there are several sources and ways one can start. However, for someone who just wants to get their feet wet with text analysis, I would recommend the following links and courses as a practical starting point. Working through them also shows you what is involved and helps you identify which questions you want to answer or which insights you want to get from analyzing or mining the text, e.g., What is the sentiment in the text? What is the genre? Is it spam?
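To give a feel for what a first hands-on step can look like, here is a minimal sketch that tokenizes a piece of text and counts its most frequent words using only the Python standard library. The stop-word list, the sample sentence, and the helper names `tokenize` and `top_words` are made up for illustration; the resources below use proper toolkits with far better tokenizers and word lists.

```python
import re
from collections import Counter

# Tiny, hypothetical stop-word list for illustration only;
# real toolkits ship much larger ones.
STOP_WORDS = {"the", "a", "an", "is", "and", "of", "to", "in", "it"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def top_words(text, n=3):
    """Return the n most frequent non-stop-word tokens with their counts."""
    tokens = [t for t in tokenize(text) if t not in STOP_WORDS]
    return Counter(tokens).most_common(n)


# Made-up sample text.
sample = "The movie is great and the cast is great. It is a great movie."
print(top_words(sample, 2))  # [('great', 3), ('movie', 2)]
```

Even a crude count like this hints at how questions such as "what is the sentiment?" get approached: the words that dominate a text carry much of the signal that later techniques build on.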
Recommendations:
–Natural Language Processing with Python: Analyzing Text with the Natural Language Toolkit – A practical introduction to NLP. You will learn by example, write real programs, and grasp the value of being able to test an idea through implementation.
–Python for text analysis – A practical course in Python, geared towards those who want to get some hands-on experience working with language data.
Apart from the Python-based courses, there are also R-based practical courses that make use of the tm: Text Mining Package, for example, Introduction to Text Mining with R for Information Professionals.
For online courses that provide both theory and practical materials:
–Introduction to Natural Language Processing – Provides an introduction to the field of NLP. The programming assignments are in Python.
–Text Mining and Analytics – The course covers the major techniques for mining and analyzing text data to discover interesting patterns, extract useful knowledge, and support decision making, with an emphasis on statistical approaches that can be applied to arbitrary text data in any natural language with little or no human effort.
Besides understanding the field itself, it has become important to understand how machine learning models power NLP applications. In recent years, deep learning approaches have achieved very high performance across many different NLP tasks.
–Deep Learning for Natural Language Processing – Online course with lectures and reading materials. Lecture videos are available here.
–Oxford Deep NLP 2017 course – An advanced course on natural language processing containing lecture slides.
I will continue updating this page as I come across better or additional materials. If you have other good suggestions for beginners, please feel free to share.