NLP Before the Transformer Boom
Traditional machine learning has experienced major leaps and bounds, but only in certain fields. Advancements in machine learning (ML) today are associated with semantic search, data labeling, and vector databases. Language AI is one of the most heavily impacted areas; knowing how to leverage this will dramatically differentiate companies that take advantage of modern advancements from those that remain constrained by antiquated technologies. Conversational data has long been limited to keyword search and keyword-based sentiment analysis. Natural Language Processing (NLP), as a subfield of ML, produced libraries that made text analysis more approachable. We will dive into where ML and NLP have come from and how LLMs are being leveraged to take conversational data analytics to a genuinely useful and insightful level.
Traditional NLP
Traditional NLP is most often characterized by a number of methods for compartmentalizing and analyzing language data in the pre-LLM era. While neural networks were employed for language tasks before LLMs became viable, they did not offer the reliability or scope of 2024's technology. Recurrent neural networks, long short-term memory (LSTM) networks, and even the convolutional neural networks first made famous in image processing were not substantial enough improvements to escape the academic world and enter the commercial space.
Instead, traditional NLP was most successful when applied to a number of very specific tasks, such as named entity recognition, sentiment analysis, part-of-speech tagging, and some degree of language translation. Practitioners of traditional NLP will no doubt recall using basic statistical techniques like TF-IDF to gain insight into the word distribution of texts, or using regular expressions to inspect and modify them.
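A minimal sketch of that workflow, using scikit-learn and Python's built-in re module (the library choices and sample texts are purely illustrative), looks like this:

```python
import re
from sklearn.feature_extraction.text import TfidfVectorizer

docs = [
    "The agent resolved my billing issue quickly.",
    "Billing support was slow and the agent was unhelpful.",
    "Great support experience, issue resolved on first contact.",
]

# Strip digits and collapse whitespace with regular expressions before vectorizing.
cleaned = [re.sub(r"\s+", " ", re.sub(r"\d+", "", d)).strip() for d in docs]

# TF-IDF weights words by how distinctive they are to each document.
vectorizer = TfidfVectorizer(stop_words="english")
matrix = vectorizer.fit_transform(cleaned)

# Show the highest-weighted term in each document.
terms = vectorizer.get_feature_names_out()
for i, row in enumerate(matrix.toarray()):
    print(i, terms[row.argmax()], round(row.max(), 3))
```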
Perhaps one of the most obvious examples of the limitations of traditional NLP is comparing current-day ChatGPT to a Hidden Markov Model or a basic LSTM network. These older models are largely incapable of retaining context over long spans of text (or barely at all, in the case of the Markov model) and therefore cannot produce even a few paragraphs of text without extreme deviation from the starting point.
The Limits & Use Cases of Traditional NLP
Still, traditional NLP did prove highly useful in a number of niche fields: spam detection was critical for email providers, and topic modeling could be used to organize text into the semantic categories it contained or to search loosely within it. The advent of word vectors was a foundational advancement in NLP: the semantic content of a word or phrase could be encoded as numbers. This freed NLP practitioners from the constraints of basic word-frequency approaches to tasks like similarity detection and added a great deal of resolution to semantic calculations.
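As a rough illustration, pretrained word vectors (here GloVe, loaded through gensim's downloader; the model and word choices are just examples) capture similarity that raw word counts cannot:

```python
import gensim.downloader as api

# Loads a small set of pretrained GloVe vectors (downloads on first run).
vectors = api.load("glove-wiki-gigaword-50")

# Semantically related words score high even though they share no characters.
print(vectors.similarity("refund", "reimbursement"))
print(vectors.similarity("refund", "banana"))

# Nearest neighbors in vector space.
print(vectors.most_similar("angry", topn=3))
```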
Named Entity Recognition (NER), which predates word embeddings at least in terms of viable commercial application, was practical for information extraction tasks such as knowledge graph construction, or more generally for identifying relationships between people, places, organizations, and so on.
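A typical traditional pipeline for this, sketched here with spaCy's small English model (an illustrative choice), pulls out the entities that could then be linked into a graph:

```python
import spacy

# Requires: python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")

text = "Tim Cook met with OpenAI in San Francisco to discuss Apple's iPhone plans."
doc = nlp(text)

# Each detected entity comes back with a type label (PERSON, ORG, GPE, ...).
for ent in doc.ents:
    print(ent.text, ent.label_)
```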
Sentiment Analysis was another well-known NLP tool, used for gauging brand performance, monitoring customer feedback, tracking market trends, and generally performing what is now often referred to as ‘social listening’.
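A lexicon-based approach such as NLTK's VADER (shown here as one common example) scores each message without any model training at all:

```python
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer

nltk.download("vader_lexicon")  # one-time lexicon download
sia = SentimentIntensityAnalyzer()

reviews = [
    "I love this product, support was fantastic!",
    "Terrible experience, I want my money back.",
]

# 'compound' ranges from -1 (very negative) to +1 (very positive).
for review in reviews:
    print(review, "->", sia.polarity_scores(review)["compound"])
```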
The Advent of Modern Language Models
The single greatest advancement in language AI in recent years has been the transformer architecture. Google’s pivotal paper, ‘Attention Is All You Need’, introduced it, and virtually every useful LLM since has copied or iterated on the concepts laid out in that paper. At the very core of LLM utility is the word embedding, or word vector: the encapsulation of the semantic content of language. This store of meaning is why LLMs can process information and seemingly (but not actually) understand us.
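The mechanism at the heart of that paper is scaled dot-product attention; a minimal NumPy sketch of the formula softmax(QK^T / sqrt(d_k))V, with toy shapes and no learned weights, looks like this:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                   # how strongly each token attends to every other token
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # softmax over the key dimension
    return weights @ V                                # weighted mix of value vectors

# 4 tokens with 8-dimensional embeddings (random stand-ins for learned projections).
rng = np.random.default_rng(0)
Q = K = V = rng.normal(size=(4, 8))
print(scaled_dot_product_attention(Q, K, V).shape)    # (4, 8)
```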
Traditional ML was heavily constrained. Training a model to predict which category something falls into typically required explicitly deciding on the number and description of every possible category, along with enough training data per category for the model to learn features sufficient to recognize it in practice. Even something as simple as detecting whether an image contained a cat or a dog required tuning model parameters and supplying thousands of images of cats and dogs. LLMs are not constrained in the same way as these older, often rules-based, and more obviously statistical models. For example, instead of training a model to detect the tone of an author, you can simply ask an LLM to tell you, with no task-specific training process or training data.
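A zero-shot tone check can be a single prompt. The sketch below assumes the OpenAI Python client with an API key in the environment, and the model name is only an example:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

passage = "Frankly, I expected better. Three emails and still no answer."

# No labeled data, no training loop: the task is described in plain language.
response = client.chat.completions.create(
    model="gpt-4o-mini",  # example model name
    messages=[
        {"role": "system", "content": "Classify the author's tone in one word."},
        {"role": "user", "content": passage},
    ],
)
print(response.choices[0].message.content)
```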
One of the greatest benefits of transfer learning is that the vast data LLMs were already trained on is typically sufficient to give them general prowess. Going further and fine-tuning an LLM (not always necessary, and sometimes a waste of money) can push linguistic and analytical competence further; GPT-4 passed the bar exam, and another version passed a medical licensing exam.
Use Cases of Large Language Models
The November 2022 release of ChatGPT spawned countless novel applications of LLMs to real problems. Initially, an individual developer with ChatGPT API access could create new solutions to old problems with little competition; a pseudo-therapist that answers questions based on your own digital diary was one of the easier ones any decent AI dev could spin up in a day. Now the industry is moving so quickly that Apple and Samsung, some of the biggest names in the smartphone business, tailor their hardware to run AI on-device. And it's not just any type of AI in the spotlight: LLMs are a major threat to search, and companies like Microsoft, Apple, and Google are racing to integrate them into their systems. Right now, Apple is in talks with OpenAI about integrating ChatGPT into iPhones. If you deal with any kind of language or conversational data, you should be equally attentive to your analytics and data enrichment processes.
Even for niche tasks that LLMs were never explicitly trained on, they prove not only useful but commercially viable. Not only can an LLM perform traditional tasks like sentiment analysis, it can take on new ones, such as deciding what label to give a set of texts; automated labeling is a sought-after ability, and you can have an LLM do it according to specific instructions. Dashbot, for example, approaches topic modeling by combining effective traditional NLP techniques with LLMs to get a 360-degree view of the data. This produces far more natural-sounding and representative topic labels that anyone can understand. Weaving modern LLMs into existing data pipelines is not straightforward, but it is a major differentiating factor between companies that do data well and companies that slap ChatGPT onto the first problem they encounter.
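The general pattern, shown below as a simplified sketch (not Dashbot's actual pipeline), is to group texts with a traditional technique and then ask an LLM to name each group in plain language; model name and sample texts are illustrative:

```python
from openai import OpenAI
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

texts = [
    "I can't log into my account",
    "Password reset link never arrived",
    "When will my order ship?",
    "My package is late again",
]

# Step 1: traditional NLP groups similar texts together.
X = TfidfVectorizer(stop_words="english").fit_transform(texts)
labels = KMeans(n_clusters=2, n_init="auto", random_state=0).fit_predict(X)

# Step 2: an LLM names each cluster in plain language.
client = OpenAI()
for cluster in set(labels):
    members = [t for t, l in zip(texts, labels) if l == cluster]
    reply = client.chat.completions.create(
        model="gpt-4o-mini",  # example model name
        messages=[{"role": "user",
                   "content": "Give a short topic label for these messages:\n" + "\n".join(members)}],
    )
    print(cluster, "->", reply.choices[0].message.content)
```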
Using LLMs at commercial scale isn't limited to niche tasks like topic labeling. Even domain-specific fields already see commercial value from modern language AI; in law, LLMs are used for contract drafting, document review, legal research, and for converting legalese into layman’s terms for stakeholder communication.
LLMs and the Future of Conversation Analysis
Perhaps one of the larger bridges between traditional ML/NLP and modern language AI is the traditional, intent-based chatbot: a model trained to detect an explicitly defined set of categories of user input. While possibly soon to be anachronistic, its reliability in category detection and its production of a predetermined output give it some advantage over pure LLM chatbots (as of May 2024).
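A bare-bones version of such an intent model (with made-up intents and utterances for illustration) can be as simple as TF-IDF features feeding a linear classifier:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Every intent must be decided on up front and given example utterances.
utterances = [
    "I want my money back", "please refund this order",
    "where is my package", "track my shipment",
    "talk to a human", "connect me with an agent",
]
intents = ["refund", "refund", "tracking", "tracking", "escalate", "escalate"]

model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(utterances, intents)

# In-scope inputs map to their intent; out-of-scope inputs are still
# forced into one of the predefined categories.
print(model.predict(["please refund this order", "what's the weather like?"]))
```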
Outside of the chatbot-style interactions that OpenAI has made famous, one of the major advantages of deploying LLMs is their ability to enrich existing data. In customer service, for example, hundreds of thousands of transcripts between users and chatbots or human agents can be organized and analyzed by sentiment, user frustration, user effort expended, which topics are resolved most appropriately, and many more dimensions besides. Dashbot introduces this kind of data enrichment in its data transformation pipeline, which is why it's possible to view your customer interactions by broad (or specific) semantic categories and to project metrics like CSAT across your entire dataset - all analyses based on data that never existed prior to the enrichment portion of the pipeline.
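In the abstract, enrichment means attaching new structured fields to each transcript. A simplified sketch (again, not Dashbot's implementation; model name, fields, and transcripts are placeholders) might prompt an LLM for a small JSON record per conversation:

```python
import json
from openai import OpenAI

client = OpenAI()

transcripts = [
    "User: My card was charged twice!\nAgent: I see the duplicate, refunding now.",
    "User: How do I export my data?\nAgent: Settings > Privacy > Export.",
]

enriched = []
for transcript in transcripts:
    reply = client.chat.completions.create(
        model="gpt-4o-mini",  # example model name
        response_format={"type": "json_object"},
        messages=[
            {"role": "system",
             "content": "Return JSON with keys: topic, sentiment, user_frustration (low/medium/high), resolved (true/false)."},
            {"role": "user", "content": transcript},
        ],
    )
    enriched.append(json.loads(reply.choices[0].message.content))

# These fields never existed in the raw transcripts and can now be aggregated.
print(enriched)
```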
The transformer-based architecture of LLMs has also rippled across lesser-known domains: in the field of gene editing, the new OpenCRISPR-1 was open-sourced after leveraging LLMs “trained on biological diversity at scale…” Its creators claim to “demonstrate the first successful precision editing of the human genome with a programmable gene editor designed with AI.” At least for now, LLMs are outshining more traditional ML approaches in multiple fields.
The general expectation for LLM development is that wherever language exists digitally, language models will interact with humans or with each other. Multimodality is one of the major directions of LLM advancement, where models can reference and ‘understand’ images. Modern phones already offer image search, but truly multimodal systems will allow deeper search and questioning. Feats like passing the bar exam suggest possibilities for more personalized care in fields like psychology or teaching, as these models develop greater domain knowledge, deeper memory, and more sophisticated analyses of personal interactions. Finally, inference time will go down as compute becomes more efficient, enabling more private (hopefully), on-device, or cloud-based interactions with AI.