“Any sufficiently advanced technology is indistinguishable from magic.”
- Arthur C. Clarke
When I compose an article, I try to come at it from a different perspective. If you want conventional reporting, you can always read the newspaper and get a journalist’s perspective on a complicated topic. I find that those articles don’t really tell you all that much. When it comes to describing complex technology, I first like to play with it, use it, and break it to see its limitations. I need to understand what it is doing, and then put it in my own words so I can describe it to a non-technical person.
From a very young age I was one of those kids who wanted to know how things worked. I took everything apart. Literally everything. My grandfather gave me a very old pocket watch when I was still in single digits, and the first thing I did when I got home was take it apart. Electronic games for Christmas? Yes, I took them apart. By ten I was putting brakes on dad’s car and doing all sorts of science experiments in my bedroom that these days would make most parents freak out. I am, and always have been, fascinated with technology and science.
Recently I have started using two technologies that I find incredibly interesting. One is diffusion models (e.g., Stable Diffusion, DALL-E, Midjourney), which are text-to-image generation systems trained on large collections of images paired with text. The other is ChatGPT, an interactive chat application that has been pre-trained on large amounts of text using OpenAI’s GPT-3.
Large Language Models
I asked ChatGPT to define itself and this is how it responded:
“As I mentioned earlier, GPT-3, or Generative Pretrained Transformer 3, is a large language model trained by OpenAI. It is designed to generate human-like text based on the input it receives. Because of its ability to generate high-quality text, GPT-3 has a wide range of potential applications, including natural language processing, machine translation, chatbots, and text summarization. It is called a "transformer" because it uses a type of neural network architecture called a transformer, which allows it to process input text more efficiently than previous models.”
Now I would expect it to generate a good response to that specific question, but I asked quite a few different questions. Suffice it to say the technology is impressive in its ability to generate responses that are comprehensible and articulate, albeit not always correct. I’ve worked with a lot of rule-based interactive systems in the past, and this is light years ahead. Rule-based systems, from the early days of AI, are very limited in their responses and can only give back what has been encoded in rules. That works well for many domains, but it is not a general-purpose approach to open-ended problems such as building a chatbot. The painful interactive voice response system you encounter when you call a company’s helpline is an example of a traditional rule-based system. Even chatbots from five years ago were highly limited in their ability to respond and were very domain specific. Text generation was a combination of rules and statistical analysis of text to predict what should come next. There was no context. What LLMs have given us, using transformers, is context. Context is critical in any conversation, and it is what has made these models incredibly powerful.
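To make the limitation concrete, here is a minimal sketch of the kind of rule-based system described above. The keywords and canned replies are invented for illustration; the point is that every response must be hand-encoded, and each turn is handled in isolation with no memory of the conversation:

```python
# A minimal rule-based responder: hand-written keyword rules, a fallback
# for everything else, and no conversational context whatsoever.
RULES = {
    "billing": "Please press 1 for the billing department.",
    "support": "Please press 2 for technical support.",
    "hours": "We are open 9am to 5pm, Monday through Friday.",
}

def respond(utterance: str) -> str:
    """Return the first canned reply whose keyword appears in the input."""
    text = utterance.lower()
    for keyword, reply in RULES.items():
        if keyword in text:
            return reply
    return "Sorry, I didn't understand. Please hold for an operator."

print(respond("I have a question about my billing statement"))
print(respond("What did I just ask you about?"))  # no memory of the prior turn
```

The second call falls straight through to the fallback: the system cannot refer back to the previous turn, which is exactly the context that transformer-based LLMs supply.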
Let’s take that interactive voice response system and implement it using the technology in ChatGPT, augmented with a custom, domain-specific model, and now you have replaced humans in many functions. Combine it with the ability to convert speech to text in real time and you have an incredibly powerful tool that in most cases will appear human to the person on the other end of the phone. Of course, there is the obvious replacement of the technical support desk, but what about more complicated job functions? How about your wealth manager? Crazy sounding? Yes, but plausible, very plausible, and at some point likely. Let’s go even further: you can now get a very human response in the voice that you choose. Maybe you like a female voice from a specific country, or you want your response translated into the language of your choice. Imagine combining ChatGPT with a very human-like avatar and text-to-speech conversion, and you have one very powerful tool.
For many very complex domains, the elimination of the human element is a very real possibility, especially where there are massive amounts of readily available data to train the models. For many everyday questions, why not replace your doctor? Given the state of the technology, this is not technically far off. Now, a regulator might have a very different view, but this is, again, very plausible.
Generative Models
Many years ago, pre-modern AI and pre-GPU, I was working with a small company that built a sophisticated analysis tool using Bayesian and generative models. The easy definition of a generative model is a model that generates plausible data based on an underlying dataset. Using generative models on small datasets, they could find subtle relationships that were otherwise undetectable because the sample sizes were too small for the data to exhibit them directly. Fast forward to today: generative models have grown in sophistication, and with modern GPUs we can train them on massive amounts of data and use them to create new, plausible data. The most visible application has been the creation of AI-generated images from text using diffusion models, which are a type of generative model. These tools can create incredible images that are seemingly lifelike and limited only by your imagination. Some of them, or many of them, come out looking exceedingly odd, but the technology is impressive nonetheless, even if it is early days.
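The “easy definition” above can be shown in a few lines. This is a deliberately toy example, far simpler than a diffusion model or GAN: “training” is just fitting the parameters of a normal distribution to a small dataset, and “generation” is sampling new points that never appeared in the original observations. The numbers are invented for illustration:

```python
# A toy generative model: fit a simple distribution to a small dataset,
# then sample new, plausible data points that were never observed.
import random
import statistics

observed = [102.0, 98.5, 101.2, 99.8, 100.4, 97.9, 103.1, 100.9]

# "Training" here is just estimating the parameters of a normal distribution.
mu = statistics.mean(observed)
sigma = statistics.stdev(observed)

random.seed(7)
# "Generation" is drawing fresh samples from the fitted distribution.
synthetic = [random.gauss(mu, sigma) for _ in range(5)]
print(f"fitted mean={mu:.2f}, stdev={sigma:.2f}")
print("plausible new samples:", [round(x, 1) for x in synthetic])
```

Modern generative models replace the hand-picked normal distribution with a learned, far richer distribution over images or text, but the idea is the same: learn the data’s structure, then sample from it.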
I’ve been using a few of these tools, Midjourney, DALL-E and Stable Diffusion, and all of them are fascinating. One can easily see the practical applications once the technology matures: creating a custom advertisement for a specific individual from a few phrases and their history, the dynamic creation of video, games, AR/VR, interactive classrooms, and other domains where imagery is critical.
But there are other areas where generative models are useful. When GANs (Generative Adversarial Networks) were all the rage, I hypothesized that they could be used in finance to generate real-time pricing data for equity markets. Or imagine if modern generative models had been around at the time of the housing crisis: rather than using traditional mortgage prepayment models, we could have used generative models to model the housing market. Quite possibly, many of those previously unforeseen market problems might have been foreseen, because you can model millions, if not billions, of potential outcomes that don’t explicitly exist in the underlying data.
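A hedged sketch of that idea: simulate many synthetic price paths that don’t exist in the historical record, then examine the tail outcomes across them. A learned generative model (such as a GAN) would replace the simple geometric-Brownian-motion dynamics assumed here purely for illustration, and the drift, volatility, and path count are invented numbers:

```python
# Generate many synthetic price paths and inspect tail scenarios that
# may never appear in a limited historical dataset.
import math
import random

def sample_path(s0, mu, sigma, days, rng):
    """Generate one synthetic daily price path under GBM dynamics."""
    dt = 1 / 252  # one trading day as a fraction of a year
    path = [s0]
    for _ in range(days):
        z = rng.gauss(0, 1)
        path.append(path[-1] * math.exp((mu - 0.5 * sigma ** 2) * dt
                                        + sigma * math.sqrt(dt) * z))
    return path

rng = random.Random(42)
# Simulate 10,000 one-year paths starting at 100, then sort the endpoints
# to read off the 5th-percentile (worst-case-ish) outcome.
finals = sorted(sample_path(100.0, 0.05, 0.2, 252, rng)[-1]
                for _ in range(10_000))
worst_5pct = finals[len(finals) // 20]
print(f"5th-percentile year-end price: {worst_5pct:.2f}")
```

The point is not the particular dynamics but the workflow: once you can cheaply generate plausible scenarios, stress-testing against outcomes absent from the historical data becomes routine.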
One of the more interesting applications of diffusion models is the creation of training data for medical applications. For instance, suppose you want to train a model to detect breast cancer but lack access to a large dataset of medical imagery, having only a physician’s notes about the cancer. Generating imagery from those notes and using it to train a model to identify cancer would dramatically reduce the time and cost of creating new models.
Both technologies have the potential to radically change how we interact with other systems. Properly trained, and with the correct set of controls, applications that have been talked about for decades are now well within the realm of possibility.
While we are not replacing human ingenuity any time soon, if ever, these tools will fundamentally alter how we approach a large class of interactions whether we like it or not. We are in for one helluva interesting ride.
And to close, I asked ChatGPT what the future of large language models was, and here is its response:
“The future of large language models is difficult to predict with certainty, but they are likely to continue to play an important role in a variety of applications. As the technology behind these models continues to improve, they may become even more effective at tasks such as language translation, summarization, and text generation. They may also be used in new and creative ways, such as for creating personalized content or helping with the development of new artificial intelligence systems. It is also possible that large language models will become more integrated with other technologies, such as virtual and augmented reality, to create more immersive and interactive experiences.”