Last week, a product called
flowRL caught my attention. Check out the GIF.
The product is meant to help app makers improve their sales (or another conversion metric of their choosing) by delivering personalized UIs tailored to each user.
This type of real-time personalization with AI is driven by a machine-learning concept called Reinforcement Learning (RL).
I spoke to the makers of flowRL,
Fred Kurdov and
Alexey Primechaev, to get an introductory lesson in Reinforcement Learning and to learn more about what makes flowRL so much more powerful than traditional methods of finding a better UI.
You’ve already seen Reinforcement Learning in action
Unbeknownst to me before speaking with Fred and Alex, RL is not a new concept; I’d already engaged with technologies that leverage it.
GPT-4, for example, uses RL (specifically, reinforcement learning from human feedback, or RLHF) as part of its training process. Autonomous vehicles, robotics, video games, and chatbots have all applied RL concepts to their products and models to deliver an optimal result to users who are performing a specific action or seeking information.
That’s exactly what RL seeks to accomplish.
Reinforcement Learning is an ML technique where an agent learns to make decisions by interacting with an environment, with the goal of obtaining the maximum cumulative reward. Through trial and error, the agent receives feedback in the form of rewards or penalties to help it learn and improve over time.
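To make that loop concrete, here’s a minimal, purely illustrative sketch of the trial-and-error cycle using a simple epsilon-greedy bandit. The reward probabilities, epsilon value, and step count are all invented for the example; they have nothing to do with flowRL or any production system.

```python
# A toy RL loop: an epsilon-greedy agent learning which of three actions
# pays off most often. The "environment" rewards (1) or penalizes (0) each choice.
import random

TRUE_REWARD_PROB = [0.2, 0.5, 0.8]  # hidden from the agent
EPSILON = 0.1                        # how often the agent explores at random
estimates = [0.0, 0.0, 0.0]          # the agent's learned value of each action
counts = [0, 0, 0]

for step in range(10_000):
    # Mostly exploit the best-known action, occasionally explore.
    if random.random() < EPSILON:
        action = random.randrange(3)
    else:
        action = max(range(3), key=lambda a: estimates[a])

    # Feedback from the environment: reward or penalty.
    reward = 1 if random.random() < TRUE_REWARD_PROB[action] else 0

    # Update the running average reward for the chosen action.
    counts[action] += 1
    estimates[action] += (reward - estimates[action]) / counts[action]

print(estimates)  # converges roughly toward [0.2, 0.5, 0.8]
```

Over many steps, the agent’s estimates converge toward the true payoff rates and it spends most of its choices on the best action, which is the “maximum cumulative reward” idea in miniature.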
It’s probably pretty easy to see how this might help your interactions with an AI agent, but we asked ChatGPT to explain to us how RL improves its performance. I’ve summarized a few benefits of using RLHF below (ChatGPT can be a little long-winded sometimes).
- Alignment with Humans: Responses are more aligned with human values and preferences and are more likely to be relevant, appropriate, and useful in a human context.
- Specificity: Human trainers can provide feedback on specific tasks or scenarios. For instance, if the model consistently makes errors in a particular domain, targeted feedback can help correct these mistakes.
- Biases and more: Human feedback is crucial in teaching the model to avoid generating misleading, offensive, or biased information.
- Reading between the lines: Through iterative feedback, the model gets better at understanding context and nuance in conversations.
- Reading the room: Human feedback helps the model understand and replicate the subtleties of human communication, such as tone, humor, empathy, and politeness, which are difficult to learn from text data alone.
Again, keep in mind, this type of RL includes human feedback as part of the training, but it helps me understand what OpenAI means when they say “ChatGPT now understands more context.”
RL beats A/B testing
So, Reinforcement Learning isn’t new, but we are still regularly seeing the paradigm being applied to new industries, technologies, and products.
As a product analyst who had worked at several large companies, Fred saw an opportunity to use Reinforcement Learning to tackle one of the largest pain points across analytics and product development: A/B testing.
Fred and Alexey – friends from school who had talked about making flowRL for years – decided to finally build a plug-and-play UI personalization product using RL/AI to drive target metrics for customers.
“You might envision different UI elements that might be needed for different users, like a banner on a homepage, a rearrangement of components on a checkout page…” explained Fred. “Our algorithm predicts which UI variant is best for a given particular user. We use deep reinforcement learning with a multi-binary action space. So basically, [the agent] just chooses a UI variant in each experiment with a user.”
“We also leverage transformers a bit,” he continued. “We leverage click and event data that our customers provide to us to make an embedding. It’s like how ChatGPT is trained on token sequences where each token is a word. In our context, each token is an event that is happening inside an app. We basically compress these events into tight representations called embeddings, which are fed to the model and help the model predict which variant to serve, with the objective [being] to optimize the given metric. That might be subscriptions, bookings, orders, or something else.”
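To picture the shape of what Fred is describing (though not his actual model), here’s a hypothetical sketch of a contextual policy: a user’s recent in-app events are compressed into an embedding, and a scoring head picks a UI variant for that user. The event names, dimensions, and weights are all invented for illustration.

```python
# Hypothetical sketch: embed a user's event sequence, then score UI variants.
# In a real system the embeddings and weights would be learned from data;
# here they are random placeholders.
import numpy as np

rng = np.random.default_rng(0)

EVENT_VOCAB = {"app_open": 0, "view_paywall": 1, "add_to_cart": 2, "checkout": 3}
EMBED_DIM = 8
N_VARIANTS = 3  # e.g., three candidate paywall layouts

event_embeddings = rng.normal(size=(len(EVENT_VOCAB), EMBED_DIM))
variant_weights = rng.normal(size=(N_VARIANTS, EMBED_DIM))

def embed_user(events: list[str]) -> np.ndarray:
    """Compress an event sequence into one vector (simple mean pooling here)."""
    ids = [EVENT_VOCAB[e] for e in events]
    return event_embeddings[ids].mean(axis=0)

def choose_variant(events: list[str], epsilon: float = 0.1) -> int:
    """Pick a UI variant for this user: usually the top-scoring one, sometimes random."""
    if rng.random() < epsilon:               # exploration, like the early random serving
        return int(rng.integers(N_VARIANTS))
    scores = variant_weights @ embed_user(events)
    return int(scores.argmax())              # exploit what the model has learned

print(choose_variant(["app_open", "view_paywall", "add_to_cart"]))
```

In a real system, the reward signal (a subscription, a booking, an order) would flow back to update those weights, which is the reinforcement part.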
If you’re wondering, “Is this really just A/B testing by another name, where you’re still picking the preferred outcome?” you wouldn’t be alone, but the answer is no.
“There are several comparison points against A/B testing, but the first and biggest is that it’s not arbitrary,” explained Alex, who himself is a seasoned product designer. “A/B testing separates the user base in an arbitrary manner and serves different variants to them to test how they perform on average and then determines the winner. We do not do that. We start serving randomly to users, and then we learn and allow the [tool] to target specific individual users who would benefit from those specific variants.”
"In A/B testing, even the winning variant — the one that had 60% of users who preferred it — the other 40% didn't. So rolling out that variant for them would actually lower the performance, and that's what ends up happening.”
I’ll be honest. I had trouble perfectly understanding Alexey and Fred’s second main point on how RL beats A/B testing and similar testing frameworks (mathematics and statistics were never my strength), but the gist is that flowRL lets customers optimize directly for their end goal (e.g., subscriptions or purchases). A/B tests, on the other hand, require isolated experiments that have to be very limited in scope to produce usable results.
While there are additional statistical practices more advanced than A/B testing, Fred says Reinforcement Learning is “the end game” for real-time personalization.
Always-on optimization
We are still at the beginning of seeing how Reinforcement Learning shapes our online interactions and engagements with brands and with each other, and this kind of personalization still feels out of reach for most companies.
Menlo Ventures recently backed another startup working on a similar solution called
OfferFit, and partner Jean-Paul Sanday told VentureBeat the concept initially seemed too good to be true.
But always-on optimization and hyper-personalization are certainly “take all my money” concepts when proven true. They completely change the game in marketing, customer acquisition, UX design, and advertising, where I'd venture to say most time is wasted figuring out what people want to see.
The answer is never simple. The current approach typically means crafting different communications for several persona types across different phases of their journey. Plus, there’s the achingly slow pace of progress as professionals in these fields work to adapt their strategies and assets with only bits of information at a time.
As a career marketer and content creator, it’s exciting to get a glimpse of a future where I don’t have to wonder “Did I choose the right headline?” because the right headline was chosen for the user at hand.