Attribution Model Patterns with Yorgos Moschovis
Guests
Resources
Shane Gibson and Yorgos Moschovis discuss the complex and evolving world of attribution, particularly in the context of marketing. The conversation delves into the definitions, challenges, and methodologies of attribution, exploring both online and offline channels, as well as the impact of privacy regulations and third-party cookies.
They discuss:
- Attribution Defined: Attribution in marketing helps in understanding the contribution of each channel to a purchase or conversion event, determining which marketing channel is most effective in achieving specific financial objectives.
- Online vs. Offline Channels: The distinction between online and offline channels is becoming more complex with privacy regulations and the demise of third-party cookies.
- Complexity of Customer Behavior: Customers often have complex and non-linear journeys, requiring sophisticated analysis and different models for short and long consumer journeys.
- Data Constraints and Attribution Complexity: Tracing the first touch in a customer’s journey can be complex, especially with data constraints. Accurate attribution often faces challenges due to incomplete data.
- Investment in Channels and Attribution Modeling: Understanding which channels drive behavior allows organisations to allocate money more effectively. If a channel changes, it may require retraining the attribution model.
- Evolving Landscape and Adaptation to New Trends: The conversation hints at the ever-changing nature of social media platforms and the need for businesses to adapt their strategies accordingly.
- Personalised Engagement: Tailored communication, like Woody’s engaging Shane through personalised newsletters, emphasises the power of personalised engagement in enhancing the customer journey.
- Agile Approaches and Practical Learning: Both speakers emphasise the importance of agile approaches in marketing and the importance of practical learning and thinking about oneself as a consumer.
- Attribution as a “Dark Art”: Shane’s comment about attribution seeming like a “dark art” underscores the complexity and often misunderstood nature of this essential aspect of marketing analytics.
Podcast Transcript
Read along you will
Shane: Welcome to the Agile Data Podcast. I’m Shane Gibson.
Yorgos: I’m Yorgos Moschovis.
Shane: Hey, Yorgos, great to have you on the show. You have the privilege of being the first partner from the AgileData Network coming on the podcast. We’ve been working together, AgileData and Data Cenex, building out this consent management offering. But today we’re not here to talk about that.
Today we are here to talk about attribution. I think we’re talking about marketing attribution, but actually I’ve got no idea what attribution is. So we’ll probably rip in with that question first. But before we do that, why don’t you tell the audience a little bit about yourself, your background, how you got into this lovely world of data and analytics.
Yorgos: Yeah. Thanks, Shane. It’s good to be here. Look, I started in New Zealand, and that’s how we met, at the Office of the Auditor General. And then I discovered that I had an inclination to do a lot more analytics, joined a c, then Telecom, now Spark New Zealand.
Then suddenly I discovered that I could do all of this as a consultant and joined Silicon Graphics as a consultant, which took me to Asia. Once Silicon Graphics got into some real financial problems, luckily I got tapped on the shoulder to join a SAS person who was exiting from SAS and starting an Asia Pacific version of Unica, the campaign management software, which was later sold to IBM. Past that, I joined one of our major clients in Singapore, which became my base for 12 years. Then I became chief of analytics for a major bank there, then moved to Singtel to unify all of their analytics. And that’s how I came across the problem of attribution, since Singtel had a hell of a lot of challenges in introducing digital. After 12 years in Singapore, I was headhunted to join Fairfax Media in Australia, which had a hell of a lot more attribution problems, but in a good way, because they are a mechanism to sell the media offerings to various clients.
After Fairfax Media merged with Nine Entertainment, I discovered that I wanted to do a lot more of this and joined a small company that did only attribution, as the chief of analytics there and chief of operations later on. I must have done maybe 20 or 30 different projects on attribution in various industries.
And that’s how I started liking the kind of challenge that attribution came with. And that’s where we are at today. The attribution problems have become a little bit more confronting since the demise of third-party cookies, and a little bit more complex since the invasion of walled gardens. It’s all good stuff. Nothing stays the same in this space, that’s for sure.
Shane: Before we get into the attribution, effectively you’ve gone from being a doer, to a leader, to a consultant. And then you’ve pretty much done most industries that have been around, so you’ve seen a lot of it. Think back though, what the hell is it, 30 years ago, when you first started out doing the analytics at the Office of the Auditor General.
What was the term back then? Were we back in the wave of data mining, or were we just back in the days when it was called statistics? I can’t remember. What was the term for people that combined analytics and data together using stats back then?
Yorgos: I think the killer term was data mining in those days. Most of those terms are coined by big organizations, software or hardware, that want to market their stuff around the planet, right? Data mining was the term.
I don’t think that statistical modeling was sexy enough, because everybody hated statistics in those days.
Shane: I remember doing stats at school and going, this ain’t for me. And then at one stage I joined SAS, the statistical software company, and went, oh my God, actually I was wrong. When I said to my teacher, I’m never gonna use this crap. I was like, damn.
It’s funny how even back in the Unica days the dream, the vision from the software industry, was self-service. Anybody can model, anybody can do stats, anybody can do analytics. And what are we, 15, 20 years on, and we’re still trying to sell that dream. Maybe the large language models are the true realization, but who knows?
So let’s get back to attribution. Let’s start out with a definition. What the hell is it?
Yorgos: Yeah, look, in simple terms it’s a mechanism to figure out which marketing channel is probably the better one to achieve a specific financial objective. And usually when I say financial objective, it’s because attribution models have to have some kind of purchase or some kind of non-purchase or whatever.
That’s the way that all modeling works. It just requires some kind of well-defined response. Once you have an event like this, attribution looks at finding out what the contribution of each participating channel might be, and that helps you fund it.
So if you feel that TikTok is the channel that has the greatest contribution to purchases of your product or service, then it should be funded a little bit differently from,
Shane: So let me play that back to you in the language I tend to use. So what you’re saying is there’s a core business event: who does what. Let’s just use the e-commerce one and then we can go into other industries: customer orders product.
A customer comes on the web store, they add a product into the shopping cart and they buy it: the core business process. And what you are saying is, as an organization, that’s where we get value from. People buy more stuff. So we want more of that to happen. So we wanna understand which channel drove that process,
which channel was the thing that influenced that person to buy that product? Did I get that part of it right?
Yorgos: Yes, that’s right. So the initial reaction is that the last step was the one that created all the value. That’s the Amazon model: look at a specific ad somewhere and with one click you buy. And therefore, in this simple example, the ad and the e-commerce software will share the spoils in terms of contribution to the purchase event.
In real life, somebody maybe came to your website via search, via Facebook, via TikTok, via a variety of other channels, and they should get some kind of credit.
Shane: Okay, so before we get onto that complexity around last step, first step and all those other words I’ve read lots about but understand little, we go back to that core idea of attribution.
We see people come in and do the core business processes or events that we care about as an organization. And so we wanna answer the question when they did that, where did they come from? What contributed to them achieving that process? So it’s kinda like the core business question we’re asking.
And then quite importantly, what you said was, the action we want to take is we wanna put more money, or we wanna focus on those channels, to achieve more people following that process. So these channels we wanna put money into because we want, our outcome is more people buy more things and therefore we get more money.
So that’s the flow: understand what drove the behavior for them to do that thing that’s important to us. And then the action is, how can we encourage more people via that channel to do that again? Did I get it right?
Yorgos: Yep, that’s right. That’s the way it works.
Shane: And then we get into the complexity, so that’s where we start using specific terms, like last step. Before we do that though, does attribution only ever work with marketing? When you talk attribution, are we always talking a marketing use case, or is the term attribution like a pattern that can be used for other industries, other use cases?
It’s just typically used in marketing.
Yorgos: Yeah, it can be used in other situations as well, when you have a series of events that culminate in a particular objective that you are tracking.
Shane: And you could probably apply it to product analytics, so if I know that when people come into a software as a service and there’s a core feature or core outcome that they need to do from that then the way they flow through the events they do within the application, I could attribute which ones are driving them to that core feature that I know makes ’em sticky or not,
so this idea is there’s just a series of events. Those events are done in certain orders, and we want to attribute which event is the thing that’s driving the value for us, the process or the action that we’re expecting that user or customer to take, and therefore we know that one’s working and these other ones aren’t.
So we either pump more money into the ones that are working or we fix the ones that aren’t. That’s the core of the process.
Yorgos: Yes, that’s right. So analytically they are the same.
Shane: Let’s go into that idea of last step so I can see that pattern in my head, I think. So if I play it back, the last thing we saw them do was the thing that drove them doing the thing, so the last thing they used to get to the shopping cart or the website to put product in the cart and buy it.
We attribute that last step was the thing that drove that behavior, so that’s the easy model because we know there’s only one jump, they did this thing, then they did this thing. We’re gonna attribute that last step to the thing that drove the value. Is that right?
Yorgos: That’s right. Typically in a digital event in particular, the last few steps are digital. They happen on the site, so it’s easy to say the last step is my site. That’s,
Shane: All right, so that one’s easy, as long as you can figure out how to get all the event data, rack and stack it, figure out what was a customer or a unique person, figure out what the last one done was, make sure your time zones and your UTCs are lined up, that you’re not dropping any events, that you’re tracking all the events, that they can’t get to the shopping cart through an event that you’re not tracking. All those simple data problems, right?
Which nobody has.
Yorgos: Yes.
Shane: So how much of your time when you were doing those attribution models was dealing with that data mess, all those things I just flippantly said aren’t a problem, which we know are? I’m assuming, like all data problems, a lot of the time was sorting out the event data, the sequencing, UTC dates, quality, all those things.
Yorgos: Yeah, there is a fair bit going on there. But most of these problems are now solvable a lot more than they were just two or three years ago. Just sequencing events or creating a big data pipeline, if you want to approach it in that sort of detail,
is not as hard to solve as it used to be. Nobody should be scared in those circumstances. I think it gets a little bit more terrifying when you have data from different sources, like when you’re mixing channels that are online and offline. That can become really complicated because there are different rules and not the same gradient of detail.
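To make the sequencing problem concrete, here is a minimal Python sketch, not taken from the conversation, of lining events up in UTC before picking the last touch. The field names (`visitor`, `channel`, `ts`) and the `"purchase"` marker are assumptions made for illustration; real event data would look different.

```python
from collections import defaultdict
from datetime import datetime, timezone


def to_utc(ts):
    """Parse an ISO 8601 timestamp with an explicit offset and convert it to UTC."""
    dt = datetime.fromisoformat(ts)
    if dt.tzinfo is None:
        # Refuse to guess a time zone; ambiguous timestamps corrupt the ordering.
        raise ValueError(f"timestamp has no UTC offset: {ts}")
    return dt.astimezone(timezone.utc)


def sequence_journeys(events):
    """Group events per visitor and sort each journey by UTC time."""
    journeys = defaultdict(list)
    for event in events:
        journeys[event["visitor"]].append((to_utc(event["ts"]), event["channel"]))
    return {visitor: sorted(touches) for visitor, touches in journeys.items()}


def last_touch(journey, conversion="purchase"):
    """Return the channel seen immediately before the first conversion event."""
    previous = None
    for _, channel in journey:
        if channel == conversion:
            return previous
        previous = channel
    return None  # the journey never converted


# Hypothetical events recorded with different local offsets.
events = [
    {"visitor": "v1", "channel": "search", "ts": "2023-08-01T09:00:00+12:00"},
    {"visitor": "v1", "channel": "facebook", "ts": "2023-08-01T01:00:00+00:00"},
    {"visitor": "v1", "channel": "purchase", "ts": "2023-08-01T14:00:00+12:00"},
]
journeys = sequence_journeys(events)
print(last_touch(journeys["v1"]))
```

Sorting the raw timestamp strings would put the Facebook touch first and credit search; converting to UTC first shows that Facebook was actually the last touch before the purchase.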
Shane: So before we jump into that one, let’s just focus on online, ’cause that’s not easy anymore. In the past we had little evil cookies that were dropped onto my device, and I could go in through many channels and many sites, and you’d always be able to see it was me. I couldn’t hide from you.
And so therefore you could assign events that happen to me, knowing it’s me, whereas now with all the privacy stuff coming in, those cookies aren’t readily available. Identifying a person and then attributing, an event or a channel or a set of channels to them, even in the online world has become incredibly difficult all of a sudden.
Yorgos: That’s true. There is a glimmer of hope in trying to do this on a per-channel basis, because most of them will provide enough data for you to be able to do that. Not PII data, I might add, but enough events and enough date-time stamps for you to create something.
It is almost hopeless to try to stitch a journey from one channel to the next while identifying even a single device, let alone a person. I will say identifying a person is impossible. But even the device is tricky.
Shane: One thing I am seeing though, is a bigger push by organizations to create first-party data and assign it against the event. And what I mean by that is that when you’re coming in through a single online channel, for example, we know there’s some way we can differentiate
your event or your session from somebody else’s. So maybe you are logging into something that we control and we know about, or maybe there’s some other way of identifying it. Organizations are starting to tag these identifiers themselves against those events so that they have this first-party record where they can align these events and attribute them to a person or an individual or a device.
That first-party data’s become more and more valuable. And therefore there’s also more work for them to tag each of the things they’re doing, each of the channels, with this first-party data so that they have a little bit more data to group by. Is that what you are seeing?
Yorgos: Yeah. Not only that, but there is a whole new industry that has emerged on identity management. The job that it’s doing is exactly that. They specialize in trying to figure out what is a device that can remain constant across different digital consumer journeys.
Shane: When we talk about online channels gimme some examples of what you would call an online channel.
Yorgos: Now obviously an organization’s own website is an online channel, but also social media. You can bunch them all up together if you want to have some kind of summing up, or identify Facebook, TikTok, Reddit, whatever you want, separately. All these can be done. But these are the majority of online channels. I like thinking of email as a kind of offline channel. It’s still big in situations where you have almost real-time events. It’s almost shameful not to class email as an online channel, since you get an email as soon as you have behaved in a particular way.
Shane: And then when we talk about those social media channels, do we differentiate between an advert and an organic view? Do we treat that as a sub-channel? What’s the term that people normally use if they’re trying to differentiate attribution to ad versus organic?
Yorgos: It’s exactly that. Usually this conversation is not necessarily based on technical considerations. It starts from, let’s say, the CFO or the chief marketing officer, going backwards. If this particular person wants to fund organic search separately from ads, then you should differentiate between the two. If internally there is no budgetary line to differentiate between the two, then there isn’t the same kind of pressure to do that. Although I will suggest that the insights on consumer behavior will come if you do differentiate, whether there is separate funding or not.
I err towards more detail rather than less detail.
Shane: We all do. I remember the days when, we had to buy expensive Teradata boxes and we had to be really careful about what we load. And then the whole, big data bullshit wave arrived. But one of the benefits outta big data was the idea of data lakes and relatively cheap storage where we can, store most things and keep ourselves comfortable that it’s all sitting there in that messy cupboard if we ever wanna try and find it.
But I just wanna come back to that key theme. So that key thing there is we attribute to a channel when we want to actually be able to invest in that channel. So if we don’t care about breaking our investment down between ads on Facebook and organic Facebook, that’s not the way we run our operating model, and we don’t care about that.
Then there’s some value from an analytics point of view, but from an attribution point of view, we don’t care, we are figuring out the boundary of where we want to attribute somebody came from, and we wanna invest in that attribution, that channel to say, do more of it or less of it,
that’s always the key outcome that we’re looking for.
Yorgos: Yeah, that’s right. I’ve worked with some organizations that were adamant that they weren’t interested in separating out a lot of social channels. And that meant a lot more simplicity in the model because you didn’t have to distinguish all of these social media channels. And this type of attitude changed within six months. From a situation where there was no desire to fund separately TikTok or Reddit there was a request to understand what is going on with a view to maybe funding those in the future.
It’s actually a lot more useful to differentiate them, since you can anyway technically, rather than just bunch them all up together.
Shane: So what drove that? In the BI world, we have the standing joke that the first question you ask data for is never your actual question. How many customers do we have? That’s actually never the question. It’s just to give you a boundary to go, okay, what type of customers are they?
Ah, so those types of customers, where do they live and what do they buy? It’s the first question you’re gonna answer to frame the next question. So in that scenario, they went, okay, we only care about these channels at a mega level. And then all of a sudden, okay, now we need to be a bit more fine-grained.
What drove that change?
Yorgos: I think the dominant marketing thinking became the realization that you had to be wherever your customers were. If you thought your customers and prospects were on TikTok, then you had to be there. And that was the winning argument within organizations: hey, why don’t we start by finding out what’s happening with that channel and maybe identify how many of our customers or prospects live there? And that’s what drove it. In the end, we just couldn’t ignore that there were more and more channels around. But the reality also is that it’s so easy to start a channel now. And, going back to your thinking about big data and data lakes, you have a lot of different data points. Are they all going to be useful six months from now?
Shane: But if we take that example the value stream I heard was, okay, there’s a new channel out there. Let’s say that we are not doing anything in the TikTok space, it’s just a channel that we haven’t touched yet. So we go and say, okay, there’s a channel out there.
There seems to be a large number of people on that channel. Everybody’s going to TikTok, so we should go there too. Effectively the first question then is, is the audience that we care about there? And then you do some research. You go, yep, there’s some people there that look like the type of people that are our customers.
And then you’re probably gonna go, okay, we need a bit of an experiment. So let’s take a small sum of money, let’s experiment with some creative, let’s push some content out. But then what happens is we’ve thrown some money at this channel.
We’ve got no idea whether we’ve got traction. So attribution tells us, attribution says, we’ve funded this amount of money on that channel and these customers have been attributed they’ve done this action. Yeah, they bought this product. And we attribute back that channel is what drove them to do that.
Yorgos: Yeah. In a couple of our projects I did see small traffic for some of these channels, no doubt about that. But when you track the journeys all the way to the financial objective and the conversion event, it was rather significant. So you’d say, okay, I paid $1,000 to have some creative there.
Then I looked at the clicks and blah, blah, blah. Why don’t I make it $2,000? So that’s doubling the budget in a way, proportionately speaking. If your budget is $2 billion, 2000 is not a big deal, but analytically you have just doubled the budget for one particular job.
Shane: That takes us on to offline and online. If I think back to my data mining days, is that the same as when we used to call it, above the line and below the line? Is that the same term or is that different?
Yorgos: I think you got me here. I didn’t think of offline, online, on those terms, but yeah, you could say that,
Shane: We used to talk about where we did ads on television. It was above the line because we had no data; we couldn’t attribute anybody actually watching it, let alone anybody buying off it. And then below the line was something like a website click, where we could always see a behavior.
We actually had, some data that we could, in theory, attribute to a person.
Yorgos: You can use that analogy. And yes, these problems still remain, although a lot more smart TVs are in the market and you can figure out who’s watching what anyway, or a lot more than just a few years ago. But yeah, you can use different techniques.
They’re not exact techniques, but you can use different techniques, just like you would’ve done in the old days with out of home. Say you have an ad that is in a particular location, so you’re thinking that everyone who drives past that location will have seen the billboard. You have no idea whether they did or didn’t,
Shane: Is that the definition? A standard definition of online and offline. So online is when it’s a digital device and it’s a website, social app, that kind of thing. And offline is when there is no digital device. Is that the definition between online and offline?
From an attribution point of view?
Yorgos: No, it can still be a digital type of channel, but you just cannot get hold of the data in real time. And therefore it behaves just like a channel that could be like an out-of-home billboard.
You get some data from what happened a month ago in a particular digital location. So you can’t use this in real time.
Shane: So it’s about the frequency of that data. The availability of it is, can we make a decision on it right now because the person’s doing something right now, or is it something that we’ll use to train some models later that might make a recommendation, but it’s not a near-real-time recommendation for them.
Yorgos: Yeah. In my head, if I can create a big data pipeline with the date and timestamps, that would make it an online channel.
Shane: Last step, this is the last thing they do. We attribute that channel where they took that step is the thing that got us success. What are some of the other attribution models that are out there and, how are they different?
Yorgos: The opposite of that is first step. First step, last step, right? Full marks for authenticity. Who came up with this genius?
Shane: Okay. But the question is there a quick step?
Yorgos: Must be something that you can do on Excel. I know you like Excel spreadsheets, so I thought I’d throw it in there.
Shane: Oh, I’m modern now. I’ve got Google Sheets, mate. We’re in a Google stack now. Excel, Microsoft’s dead. Alrighty. So, first step.
Yorgos: First step is very useful in understanding whether there is a channel at the top of your marketing funnel, when you are playing for reach, that is dominant. For example, is it Facebook or TikTok or Reddit or one of the news websites in the country? First step is important in understanding what is almost always the entry point into the consumer journey that I’m interested in.
Shane: And then I’m guessing there’s some stuff in the middle though I guess that people don’t just go to one channel, which is their first channel, go to a second channel, which is their last channel, and then, buy the product. There’s a whole lot of messy behavior in the middle, which is where the complexity comes in.
Yorgos: Yeah, that’s right. What customers do and what consumers do is quite complex, because they don’t even move in a particular linear fashion from the top channel to the converting channel. They can be to-ing and fro-ing for a decent number of days. And on some occasions, like in some industries,
think about cruises, your consumer journey might be months in the making, as you try to figure out where to go, what is the right route, whether you have enough friends to go with you. It can take months to plan a trip like this. And this is reflected in the consumer journey.
Shane: Do you end up doing some kind of cohort analysis or segmentation where there are short consumer journeys with a small number of touches, and then there are long consumer journeys with a large number of touches? And therefore you’re grouping those into different models to say these types of customers take the long path.
These types of customers take the short path. Therefore, we may pump money into the channel that’s the first step in the short path because the time between us spending the money and recovering that money from product purchases is what we care about right now. Or you could look at it and say, okay, in the cruise case, people that do the short journey take a seven-day cruise on a cheap boat.
Somebody that does that long journey, they’re investing a lot of research. They’re doing the 28-day cruise on the luxury boats. So therefore, we don’t get our money back from that channel as quickly, but the margin we are getting off it is much higher. So that’s kinda what you’re doing: using that event data and that behavioral data to then decide where your bang for buck is.
Yorgos: Yeah, that’s right. That’s exactly how and why you do it. I am more interested in the shorter consumer journeys because they don’t have the same data complexity. Even first-party cookies disappear after a few days now. If you wanted to track very long consumer journeys, you have to have some marketing approaches in there to get people to authenticate themselves, log in, have some goodies along the way, create some interactions to keep the interest going.
You just cannot think that you’re going to track a seven-month consumer journey across different types of jobs.
Shane: A little bit of complexity I hadn’t thought about. So as well as having a bunch of online channels and no safe way of saying it’s the same person engaging with those channels, and then a bunch of offline channels, where we may or may not get data at the right time to help us with that attribution,
we’re also saying that, unlike a corporate system where that data pretty much survives forever, that ability to reference that data disappears after a certain point in time. After a certain number of days we lose the trail, so to speak, of that person, even if we are able to figure out what they’re doing.
So that urgency brings in some complexity in terms of the way we create these models.
Yorgos: Yes, that’s right. And in fact it gives birth to a different thinking about what model to work with, simply because some work a lot better with continuous touches. Others work a lot better with deeper histories, but not a lot of touches.
Shane: Explain a little bit the difference between a continuous touch and a deeper history, ’cause I think there are two patterns sitting in there with those two different terms.
Yorgos: Maybe you think about a consumer who is buying, let’s say, health insurance. Typically you’d buy health insurance once a year, maybe twice a year if you are unhappy with your provider. In order to understand the customer there, you have to look at deeper histories, see what the type of product that they have bought has been
this year, last year, three years ago, and so on, and see how their circumstances have changed. So it’s that: few touches, deeper histories.
And if you compare this to buying, say, supermarket goods, you would do this twice a week, maybe every day in some situations for fresh fruit and veg. So you have shorter consumer journeys and a lot more touches, so you don’t need deep histories. You don’t need to analyze what you bought at the supermarket 12 months ago. You only need to understand what you bought last Sunday.
Shane: So it goes back to that volume, if I am constantly doing something and I’m doing it quite often, then the window of time for a richness of data is shorter, ’cause I’m doing it more often, whereas if I do it once a year, I have to go back over multiple periods to get any kind of representative data to infer something.
And then deeper touches sounds as though we’re bringing in some more of the demographic attributes about people and things, versus the behavioral ones around what they’re actually doing. Is that right? So we’re augmenting the lack of volume of data with some more demographic data that’s valuable.
Yorgos: Yes. It’s always safe to do that. I have seen in recent times that static demographic data don’t play such a major role anymore. I think the reality here is that as more and more people, or nearly everyone, is on a digital channel these days, the static demographics aren’t really a differentiator.
It’s just what happens with your digital behavior that is predictive of the next type of touch that you may have.
But whether you are male or female, or you live in an affluent neighborhood or not, or the locale or age, I don’t think that it’s as crucial as it used to be.
Shane: What you’re saying is that we are moving away from segmentation models that are based on demographics. How many people in your family, what ethnicity, where you live, what your income is, where you go for a holiday, all those kind of things that used to infer propensity to buy.
And now we’re moving much more towards that consumer journey and that channel and that attribution, to say the journey they have to doing this action, so buying that product, is this, so therefore that’s where we’re gonna pump the money, regardless of the segmentation and demographics. Maybe we’d use the segmentation and demographics for the type of content we’re producing and the type of ad or the type of thing we’re putting onto those channels to attract them.
Yorgos: And even the communication style perhaps, that also matters. And I can stretch this to specific consumer experiences as well, some different service models. But in order to figure out when somebody’s going to turn up in the supermarket to buy five to 10 different products, I don’t think that it matters whether they are ethnically this or that, older or younger or whatever.
Shane: So first step, first time we saw them top of the channel last step, last time we saw them before they took that action or the converting channel. What other attribution models are popular
Yorgos: There are models that are about equal type of credits, as in, if you have four different channels, each one will get 25% of the credit for a conversion. That’s useful in the sense that at least you know what channels are participating in the process. Others create a gradient, so a higher gradient towards the conversion for obvious reasons, or a higher gradient towards the first touch, again, for obvious reasons.
So it really depends on what it is that you’re going to use your funding for. Are you going to use it to build the upper funnel or are you going to use it in order to boost your revenue line?
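The rules discussed so far (first touch, last touch, equal credit, and a gradient in either direction) can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `attribute`, the model labels, the channel names and the `decay` factor of 0.5 are all assumptions made for the example, not something named in the conversation.

```python
from collections import defaultdict

def attribute(journey, model="linear", decay=0.5):
    """Split one conversion's credit across the channels in `journey`.

    `journey` is an ordered list of channel names, first touch first.
    Returns a dict of channel -> share of credit (shares sum to 1.0).
    """
    if not journey:
        return {}
    n = len(journey)
    if model == "first_touch":
        weights = [1.0] + [0.0] * (n - 1)
    elif model == "last_touch":
        weights = [0.0] * (n - 1) + [1.0]
    elif model == "linear":
        # Equal credit: four channels get 25% each.
        weights = [1.0] * n
    elif model == "toward_conversion":
        # Each step back from the conversion is worth `decay` times the next.
        weights = [decay ** (n - 1 - i) for i in range(n)]
    elif model == "toward_first_touch":
        # Mirror image: the earliest touch carries the most weight.
        weights = [decay ** i for i in range(n)]
    else:
        raise ValueError(f"unknown model: {model}")

    total = sum(weights)
    credit = defaultdict(float)
    for channel, weight in zip(journey, weights):
        credit[channel] += weight / total  # a repeated channel accumulates
    return dict(credit)

# Hypothetical four-touch journey.
journey = ["social", "search", "email", "direct"]
equal = attribute(journey, "linear")
late = attribute(journey, "toward_conversion")
early = attribute(journey, "toward_first_touch")
```

Which rule to use follows the funding question above: a gradient toward the first touch rewards upper-funnel channels, while a gradient toward the conversion rewards the channels closest to revenue.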
Shane: But each of those channels are influential. So how the hell do you work out that there’s a converting channel, a last step, but actually for a certain number of people, if they didn’t go through these first steps, or they didn’t go through these three channels, then they never get to conversion?
Because that, in my head, that becomes a fairly large matrix model of multiple customer journey paths, multiple touch points over multiple periods of time, over multiple channels in different orders, different times of day. There’s just so much variability in that model. Is that where the magic is,
is that where the actual hard part is, doing that rather than just going first step, last step? Easy, fairly simple business rule. Anything else is an optimization problem, quite a large complex one.
Yorgos: Exactly. And also I find an analogy to what you have just said, ’cause it was actually quite interesting. You think about anonymous journeys, let’s forget about attribution for a moment. If you are browsing anonymously and three or four days later you authenticate yourself, then pretty much all software that is available today will stitch the journey back to you and say, okay, now that you’ve authenticated yourself, I can actually say that you have been here as a repeated visitor three or four times. It’s the same kind of process. Once you start looking at somebody who has converted, and therefore they have had to authenticate themselves at the last step, then you can go backwards and figure out, okay, now I know what it is that you’ve done before you got here.
Shane: But how? The only way you could do that is IP address.
Yorgos: There’s IP address, and particularly for your own site, your first-party cookie is still alive.
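The stitching Yorgos describes can be illustrated with a small Python sketch. It assumes each event carries a first-party cookie ID and, once the visitor logs in, a user ID; the field names (`cookie_id`, `user_id`, `ts`, `page`) and the sample data are hypothetical, and real tools will do considerably more than this.

```python
def stitch_journeys(events):
    """Re-key anonymous events to a known user once their cookie authenticates.

    `events` is a list of dicts with 'cookie_id', 'ts', 'page' and an
    optional 'user_id' that is only present after login.
    """
    # Pass 1: learn which cookie belongs to which authenticated user.
    cookie_to_user = {}
    for event in events:
        if event.get("user_id"):
            cookie_to_user[event["cookie_id"]] = event["user_id"]

    # Pass 2: go backwards over everything, including the earlier
    # anonymous visits, and attach each event to its owner.
    journeys = {}
    for event in sorted(events, key=lambda e: e["ts"]):
        owner = cookie_to_user.get(event["cookie_id"], "anon:" + event["cookie_id"])
        journeys.setdefault(owner, []).append(event["page"])
    return journeys

# Hypothetical events: cookie c1 browses twice, then logs in at checkout.
events = [
    {"cookie_id": "c1", "ts": 1, "page": "home"},
    {"cookie_id": "c1", "ts": 2, "page": "product"},
    {"cookie_id": "c2", "ts": 3, "page": "home"},
    {"cookie_id": "c1", "ts": 4, "page": "checkout", "user_id": "u42"},
]
journeys = stitch_journeys(events)
```

In this sketch cookie `c2` never authenticates, so it stays an anonymous silo. That is the same effect a cleared browser or a new account has on the journey.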
Shane: For a period of time,
Yorgos: For a period of time.
Shane: And again, that’s getting locked down more and more. Apple and the other people that actually honor privacy, they’re locking down from a device point of view to stop people tracking you for longer and longer.
Yorgos: So there is a lot of complexity there, and that’s why it’s not.
Shane: Real world example I’m just thinking about. I love coffee. I work from home now. I’m not gonna go and walk down to my local café and spend $5 or $6 for a coffee. We’re bootstrapping. We can’t afford that. We’re incredibly frugal.
So I have a Nespresso machine. I buy the pods. Nespresso New Zealand. I have an account. I get pods delivered, all good. Last year we spent a month in the UK. The UK coffee is far worse than New Zealand coffee. So I was like, okay, I need a Nespresso machine. I go onto my Nespresso account, I go buy me a machine, deliver it to where I’m staying in the UK.
It goes, oh, you can’t do that. You’re in New Zealand. And I’m like, okay, so I go in, I create a new account, use a UK address. So I’ve now got two accounts and I go, cool, buy that machine, get it delivered to the UK address, and it comes back and it goes, oh, there’s a problem with your order. So I do the old service help desk call, and actually Nespresso, I gotta say, they have one of the best responses in terms of the people answering your questions, no matter which channel you contact them on.
It’s awesome. However, they came back and said, oh, it’s ’cause you’re using a New Zealand credit card. And I’m like, yeah, I, you can’t do that. Okay, so we’ve actually got a UK credit card, so I go back in, I use a UK credit card, it goes no, you can’t do that. And you look at the help and it goes, oh, it’s a debit card.
And I go, okay, so you go through and you use a credit card and it goes, eh no. So in the end, I got somebody in the UK to create an account, log on, buy the bloody machine, get it delivered, that’s cool. Going back to the UK again this year. And I’m like, all right, I need some pods. Go back on with my UK account.
Eh? I was like, F you. So then I hop on and I go, okay, I turn on my VPN, and I go, right, order it. So with the VPN I’m in the UK using my UK account with a UK credit card and nope. Still blocks me. Now I’m getting a bit grumpy. So I’m like, okay. So I clear my browser, I go into an anonymous mode.
I turn my VPN on, I go in and create a new account in the UK with a UK address and a UK credit card. And the order went through. But think about that. How does it know it’s me now? They’ve actually put a whole lot of things in the way, and I’m assuming it’s around fraud and all that kind of stuff,
Yorgos: Yeah. Yeah. That,
Shane: overseas buying stuff and I get that, but now they’ve lost an attribution model because in theory, there’s no way of them stitching me back together in my buying behavior.
Yorgos: Yes, that’s correct. And I have to admire the anti-money laundering techniques that Nespresso UK is using. You have to admire all this. Essentially they’re saying, yeah, I want you to be somehow traceable to the UK authorities.
Shane: As a data person and as a systems person, utmost respect for the systems they have in there. As a consumer? No, they’re a bunch of dicks.
Yorgos: That’s right. You’ve made it so difficult. And I’m sure that everyone has a story about this because it’s so difficult to behave like yourself. If you’re going from one country to the other, it’s just not happening.
Shane: Which is interesting given, COVID and global working and everything else we do has become remote, not location based, which is interesting.
Yorgos: In situations like this your attribution would be neither available in the UK nor in New Zealand. It would be in different silos,
which is incredible because that’s what attribution sought to do in the first instance, not to use each particular channel as a silo. And as soon as the third-party cookies disappeared, then everything became a silo, or a walled garden in sexier terms.
Shane: The good benefit of that is when I went onto one of my connected TV apps that has ads embedded in it because it’s free to use, I didn’t get a Nespresso ad pop up. So at least their attribution model wasn’t working, because they weren’t spamming me with ads about the thing I had already just bought, because they didn’t know who I was.
But that’s one of the outcomes of attribution, isn’t it? As well as figuring out where to throw money, which channel’s the best, you can then actually say, are there any other complementary channels that potentially that person is seeing? So there’s some benefit there because they’re giving you the message through another medium, another channel, that has potentially some influence, but they’re also just making it more complex because now I have yet another channel with yet more events to go into the model to figure out which ones drove me to the action they wanted, and therefore, which one they should invest more in.
Yorgos: This happens, yes. And I think that is the promise. If you were in a position to understand the consumer journey and its length and how many touches it had, then you could also figure out what the messaging would be to accelerate somebody who was at the beginning of the journey or the middle, and that would be a tremendous insight for any organization, to say I don’t need to put a special in front of you for a cruise because you have just started your journey.
You’re not in a position to buy now anyway. But what I can do is I can build a story that is going to stick in your head for months. And when the time comes then I will be number one choice for you.
Shane: Let’s just talk about the value stream to do this. What I can see is you grab a bunch of events data from a bunch of channels. You put those events together in one place.
You try and infer the concept of a unique ID or identifier for something you wanna bind to those events. Typically a person, but maybe a device. Then you just run ChatGPT or a black-box neural net against it, and it tells you the answer, right? That’s being a little bit flippant.
But that’s the kind of high level process, isn’t it? Is grab the events, put them together, bind it to a concept, run some models to figure out how we attribute which event or which channel is driving that behavior. And then make an investment decision on that, that’s the core steps in the process.
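Those core steps (grab the events, put them together, bind them to an identity, run a model) can be sketched end to end in Python. Everything here is an assumption made for illustration: the three feeds, the `(person_id, ts, channel, converted)` tuple layout, and the choice of a simple last-touch rule standing in for "the model".

```python
from collections import Counter, defaultdict

def build_journeys(*sources):
    """Merge several event feeds and group them into time-ordered journeys.

    Each source yields (person_id, ts, channel, converted) tuples, where
    person_id is whatever identity the events have been bound to.
    """
    by_person = defaultdict(list)
    for source in sources:
        for person_id, ts, channel, converted in source:
            by_person[person_id].append((ts, channel, converted))
    # Sorting the tuples orders each journey by timestamp.
    return {person: sorted(evts) for person, evts in by_person.items()}

def last_touch_credit(journeys):
    """Credit each conversion to the channel of the converting event."""
    credit = Counter()
    for evts in journeys.values():
        for _ts, channel, converted in evts:
            if converted:
                credit[channel] += 1
    return credit

# Hypothetical feeds from three channels.
email_feed = [("p1", 1, "email", False), ("p2", 2, "email", False)]
search_feed = [("p1", 3, "search", True), ("p2", 5, "search", False)]
social_feed = [("p2", 7, "social", True)]

journeys = build_journeys(email_feed, search_feed, social_feed)
credit = last_touch_credit(journeys)
```

The resulting per-channel counts are the input to the investment decision. Swapping `last_touch_credit` for a richer model changes the numbers but not the shape of the pipeline.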
Yorgos: Yes. And it opens up a lot of other different avenues for operational improvement. Like I said, the messaging can be consistent with where you are at along the value chain. Or it can be an alert, as in, it’s taking you too long.
Usually the customers that convert do so in a day or two days. And now you are on the 31st day of fiddling with various options and you haven’t converted. You optimize your contact process as a byproduct of all this.
Shane: And then if we look at that value stream, where are the problems? So we’ve talked about a couple already. So we’ve talked about identifying the core concept, the core entity, the person or the device that we wanna bind all these events together to get a journey for, that’s incredibly difficult.
We talked about the fact that there are multiple channels that have multiple events and multiple touches and can be done in any order. And therefore that is an area of complexity. Are there , any other areas of complexity that make attribution difficult?
Yorgos: What is changing all the time is what data you can get out of the big social media giants where even when you put your ads on quite rightly, they don’t want to identify the person who has clicked on your ad and ended up on your website. But the type of detail that they release varies between different organizations as well.
So that makes the scientific process a little bit more complicated to figure out. I have a lot of detail from one channel, not a lot of detail from another one. How do I marry the two?
Shane: And stop it overweighting. If we have a large number of events that are quite high quality coming from one channel and then we have a small number of events that are relatively summarized coming from another channel, then when we’re modeling, depending on the techniques we use, it’s gonna skew the model to where there’s that richness of data.
Yorgos: Yeah, that’s right. You have to make them comparable, and that’s just understanding what it is that you’re going to get out of each one of your marketing partners. If you get 100,000 events that have the 31st of March as the date, and then you have another data pipeline that has exactly the same volume, but it’s daily, you have to do something about that.
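One simple way to "do something about that" is to roll the finer-grained feed up to the coarser grain before comparing or modeling. The Python sketch below assumes each feed is a list of `(date, event_count)` pairs; the year and the volumes are made up for the example (a smaller number than the 100,000 mentioned, chosen so it divides evenly across 31 days).

```python
from collections import Counter
from datetime import date

def to_monthly(events):
    """Roll (date, count) pairs up to one total per (year, month)."""
    monthly = Counter()
    for day, count in events:
        monthly[(day.year, day.month)] += count
    return monthly

# Hypothetical feeds with the same total volume at different grains:
# one partner stamps the whole month with the month-end date...
summarised_feed = [(date(2023, 3, 31), 3_100)]
# ...while another pipeline reports the same volume day by day.
daily_feed = [(date(2023, 3, day), 100) for day in range(1, 32)]

monthly_a = to_monthly(summarised_feed)
monthly_b = to_monthly(daily_feed)
```

Aggregating to the common grain throws away the daily detail, which is the trade-off for not letting the richer channel dominate the model simply because it has more rows.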
Shane: That’s where the magic happens.
Yorgos: The magic happens.
Shane: And also explaining it. So I always remember, back in the days when neural nets were the bee’s knees, the data miners or data scientists would often run a bunch of models against some data. So maybe we were doing, I don’t know, prediction or a next best offer or a churn prediction,
and they would get the data, they’d clean it up. It’s always 80% of their time. They’d run a bunch of models. I always have the standard joke that 90% of analytics is a GROUP BY, the next 5% is linear regression. And then we sometimes get into k-means or something else. When I watched experts, they were going in, they were running a bunch of different models.
They were then comparing the models, maybe looking at a lift chart or some other way of saying what models are most effective. They were tweaking the variables on a couple of models to see if they could train it on that data better. And then they’d come out with this magic answer that says, okay, this model, bang for buck is the best model to run against that data to answer these questions.
And nobody could understand it. One of the techniques they did that I really liked was they then typically ran a decision tree and they would say to the stakeholder, this was not the model we are running, but this model is gonna give you a representative idea of the factors that are influencing the model.
Maybe it’s a churn model, and then the decision tree’s saying it’s people who have rung the call center five times in the last month who have had a bill change where the prices have gone up and defaulted on one payment. That’s the tree. They’re the ones that are highly likely to churn.
We’re not actually using that model, but that’s a, model that could visualize that flow for us. I’m imagining that with attribution models, we’ve got the same problem. There’s so many variables, so many demographic behavior variables, so many time-bound variables so many different ways of modeling it that actually explaining how we are attributing is gonna be a fricking nightmare to anybody that didn’t create the model or even the people who did create the model.
Is that true?
Yorgos: It’s true, but the market has moved on from that. There isn’t that much emphasis anymore on explaining what this particular statistic does or how the model operates. And the reason for this, in my view, is that it has become so easy to do A/B testing. And it is actually a much more efficient process. So for example, you say that this group of customers is your best probability of conversion. You just run with this for a few days to see if your control group versus the target group have any notable differences. That’s your A/B testing. Within a few days, or even a month, of putting it in market,
you’ll know whether the insight was correct or not. That’s the same with attribution. You just say, okay, I’ll probably increase the funding for a particular channel by 5%. And then I’ll see how I went after a few days or a few weeks, and then I will know whether I got it right or not. It’s a lot more pointless today to get in a room with 10 people and flood the screen with statistics. Say, oh, this statistic is 0.34, which is a lot less than 0.9. Why is it a lot less? It doesn’t sound like a lot less. And therefore x, y, or z is going to happen.
I haven’t had to do it in the last four or five years.
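For readers who want to see what "notable differences" between a control group and a target group can look like in practice, here is a minimal Python sketch of a two-proportion z-test using only the standard library. The test choice, the function name and the sample counts are assumptions for illustration; the conversation does not say which comparison Yorgos uses.

```python
from math import erf, sqrt

def two_proportion_z(conv_control, n_control, conv_target, n_target):
    """Compare conversion rates of a control and a target group.

    Returns (lift, z, p_value), where lift is the difference in
    conversion rate and p_value is two-sided under a normal approximation.
    """
    p_control = conv_control / n_control
    p_target = conv_target / n_target
    pooled = (conv_control + conv_target) / (n_control + n_target)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_control + 1 / n_target))
    z = (p_target - p_control) / std_err
    # Standard normal CDF via the error function.
    cdf = 0.5 * (1 + erf(abs(z) / sqrt(2)))
    p_value = 2 * (1 - cdf)
    return p_target - p_control, z, p_value

# Hypothetical result after a few weeks in market:
# control converts 200 of 10,000, the target group 260 of 10,000.
lift, z, p_value = two_proportion_z(200, 10_000, 260, 10_000)
```

The same comparison applies to a 5% funding increase on one channel: treat the period or audience without the increase as the control, and check whether the difference in conversions is larger than noise would explain.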
Shane: That means we’re getting agile approaches to that kind of work. We no longer should sit in a room for nine months building an enterprise data model that nobody can execute. We need other ways of doing things quickly, iterating it, showing the value, and then
testing it.
Yorgos: Yeah. And it is a much easier engagement process with all the stakeholders, to say, hey, look, I tell you what, I’ve done the best that I could here. But the real test is when we put some specific offers in front of the consumer and see how it goes.
Shane: But that also means that model has to move from being a research model to a production model. Let’s take your example. We’re gonna go and say that there’s four channels. We’re gonna use a gradient model where there’s a percentage assigned to each channel.
That’s how we inferred the most valuable channel, where we should do the next investment to achieve that outcome, which is people buying more products. We’re gonna go do some testing, we’re gonna throw a bit of money in that channel for a while and see whether we get an uplift in the number of people buying our product.
That means that model actually has to go and be flipped into a model that we can use to measure whether we’ve seen an increase in people buying products from that channel. That is really important now, ’cause we have to measure whether that investment achieved the goal that we thought it might.
Yorgos: That’s been the promise all along. The promise all along has been that you build a model for a particular operational improvement, not to satisfy yourself that you’ve built a great model.
Shane: Sometimes in the past models have been built to justify the investment that’s already been made,
Yorgos: Yes, that’s right. That’s why I’m laughing ’cause I’ve been there too, because it’s build this particular model, $3 million later and say it’s fantastic. But I tell you what, we’re not going to implement it because something else has happened and I don’t really.
Shane: I always wondered, when we saw the analytical maturity model, where we always talked about descriptive and predictive and all that kind of stuff, I was always amazed that there wasn’t something on the axis at minus zero, that was post-rationalization models.
We’d already done the action and achieved nothing, but now we wanted to create a model to prove that it was the right thing that was done.
Yorgos: Yeah, I must say, this type of meticulous work, and let me be clear, this is how I learned my craft as well. I haven’t turned my back on that. But it’s just not appropriate for marketing and consumer type of events. It’s more appropriate for when getting it right before you have actually implemented it is 10,000 times more important. But here we are talking about what is your downside risk. Your downside risk is that you can potentially annoy people by putting in front of them an ad that is not appropriate. Okay.
It’s true. But the reality here is that you can also stop doing that a month later after you have figured out whether it works or not.
Shane: I can see there’s a spectrum of rigor that you need, depending on what action you’re gonna take and the outcome from that action.
Yorgos: Everyone is talking about actionable insights. Here it is, you can action them by actually going to market with a specific approach and then come back a month later and say, okay, something didn’t work. I’m going to have to relook at how I derived my understanding of the consumer journey and conversion events. Or change the way I have made assumptions about funding my 5% or 10% increments.
Shane: All right, so just to close it out, I often still think about that maturity model for analytics. The idea of descriptive, predictive.
in the past, I’ve always said optimization is one of the most complex models, to deal with. Simple example that’s horrible to try and model is you have a school it’s an in-person school. You have a bunch of classrooms, you have a bunch of teachers, you have a bunch of subjects, and you have a bunch of students.
And now you’re trying to optimize that so the classroom’s always full with a certain number of students and the right teacher for the right subject. And actually when you try and do that algorithm, it’s a nightmare, ’cause there’s so many moving parts. It’s an optimization problem. So attribution to me sounds like an optimization problem, in terms of complexity of variables coming in, the complexity of demographic versus behavior, the events, all that kind of stuff.
It is quite a complex area to actually build models on. Is that true? It is one of the harder ones to do.
Yorgos: Yes. And that’s one of the reasons why I wanted to talk about this a little bit, ’cause it’s a great learning experience. It keeps you sharp. It is because of the fact that it has a lot of moving parts, not despite that. And the stakes are high.
Even if you get it right for six months out of 12 in a year, think about it like this. For some organizations, in the US in particular, their marketing budget is $50 million. Just getting the same number of sales with 10% less is a big deal.
Shane: I’m a great fan of agile as you would expect, and small bits of work, value, early feedback, iterate and build on it as you go. So I’m assuming that if you wanted to start from somewhere, starting with a last step model, a last step attribution model would be a good starting point,
build that one first. Understand what that looks like, make some small investments based on that, and then move into some of the more complex attribution models over time when there’s value in that. And you’re gonna return it based on more targeted spend.
Yorgos: I understand your point. Your point is how can I be practical about this? Because I cannot or will not dive into this straightaway. So I will say that first and last touch are probably the two specific data points that are going to give you a lot more understanding of your consumers.
But if you do only one or the other, you’re not going to be better off than where you are today. Even if it is just a line between first touch and last touch, at least you have a line, with just one point. You don’t have a line. I will never advocate building a model based on just one touch.
It’s just too random. Start with two and see how you go. Maybe the entire consumer journey is three, maybe your two is good enough. But don’t just do one.
Shane: Out of interest, how often is it just a three-step or three-touch journey? Is that common or is it rare?
Yorgos: I was surprised to find out that for one of the organizations that I looked at it was two or three touches within a week. And people would buy articles between $5 and $50. So for this type of retail consumer goods I was quite surprised that it could conclude so quickly. Maybe not so surprised about the number of touches, but how quickly it concluded.
Shane: So again, it’s gonna be dependent on the industry, the product, the business model, the customer demographics, the market, the country. There’s gonna be a whole bunch of variables that influence, your customer journey, how many touches there are and therefore the complexity of the attribution model.
Yorgos: yeah. But an organization can figure out the touches even if it is within its own site anyway, so it gives you an idea of whether even within your own site, it’s a complex journey anyway. So if it is complex within your own website, right there, there is a learning there to start with.
But think about yourself as a consumer, how many touches have you needed for your car insurance?
Shane: It’s one of the ones I don’t change very often
Yorgos: There you go.
Shane: I’m happy with the company. And so that one, not so much. My internet provider annoys the hell out of me, that one’s constant touching. I’m a great fan of charcoal barbecuing, long barbecues.
And so there’s a remote meat delivery company in New Zealand, Woody’s. Call out to Woody’s. And every time they email me with a newsletter with some meat, that touch, my customer journey becomes short. I’m, oh yeah, I could try that with that, and then I’m on and I buy it,
Yorgos: So there you go. Is it $5 to $50?
Shane: They’ve got this good trick that it is free delivery over a certain amount, so you always want to get above that because somehow you think you get value because you’re saving $15 while you add $30 onto the shopping cart. So I think my total cart value probably is normally $50 to $80. I can’t remember what the limit is, but they definitely have some good marketing and sales techniques to increase the value of the cart.
Yorgos: So there you go, you can do 50 to $80 within two or three touches within one week,
Shane: probably one touch,
Yorgos: yeah.
There you go. So it’s consistent with what I saw. The journey is always influenced by the buying price as well. So if you’re buying a car, which is expensive, then you want a few more touches.
That’s why the initial approach would be to figure out how many touches are required, when you can count those touches. Let’s say that’s your GROUP BY statement right there.
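That "GROUP BY" starting point, counting how many touches each converting customer needed, might look like this in Python. The tuple layout `(customer_id, ts, is_conversion)` and the sample events are hypothetical, and the sketch counts the converting event itself as a touch, which is a design choice rather than anything stated in the conversation.

```python
from collections import Counter

def touches_before_conversion(events):
    """Return a Counter of {number of touches: number of converting customers}.

    `events` is a list of (customer_id, ts, is_conversion) in any order.
    Touches after a customer's first conversion are ignored, and
    customers who never convert are left out of the distribution.
    """
    touches = Counter()
    converted = {}
    for customer_id, _ts, is_conversion in sorted(events, key=lambda e: e[1]):
        if customer_id in converted:
            continue  # journey already concluded
        touches[customer_id] += 1
        if is_conversion:
            converted[customer_id] = touches[customer_id]
    return Counter(converted.values())

# Hypothetical events for four customers.
events = [
    ("a", 1, False), ("a", 2, False), ("a", 3, True),
    ("b", 1, True),
    ("c", 1, False), ("c", 4, True),
    ("d", 2, False),          # never converts
    ("a", 5, False),          # after conversion, ignored
]
distribution = touches_before_conversion(events)
```

If most of the distribution sits at two or three touches, a first-touch plus last-touch view may already cover most journeys. A long tail suggests a richer model is worth the effort.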
Shane: Although it’s interesting because with that example, my first touch was actually somebody I follow on Twitter who is better at long cooks than I am, and he constantly, publishes photos of those long cooks and recipes, and he was the one that mentioned them. For them to attribute it the first touch they have to know that I follow this person on Twitter.
That person mentioned them and as a result of that, I then hit their website and did my first trial order. So again, if you think about the complexity of the events with that half that data, they probably don’t have. So again, as we know, often data is the biggest constraint to anything we wanna do in the data and analytics world.
And in that case, they’re probably never gonna guess what the first touch for me was.
Yorgos: That’s a fair point. But also it may be that by just funding Twitter a little bit more, they can get a lot more of you and that’s all they need.
Shane: That’s a good point. They can’t anymore because it’s now X and it’s locked down and there’s no APIs and nobody’s actually on there. It’s a ghost town. So yeah, probably attribution, from x and taking some of your budget and pushing it out through that channel, probably not happening so much.
If your attribution model is still heavily weighted towards the thing formerly called Twitter, you probably need to contact Yorgos and get him in to retrain that model for you. But anyway, on that note, if people wanted to get a hold of you, where do you live when talking analytics? How do people get a hold of you?
Yorgos: LinkedIn, I write there quite a bit. I started liking doing that. At data.
Shane: Excellent. Thanks for taking me through that. Attribution is definitely one we’ve talked about a few times. One that I never quite understood seemed like another one of those dark arts. For me I like this idea that you just want to figure out which channels are driving the business process you want, so if you’re an e-commerce company, it’s, customers buying your products.
And then while we’re doing that, we’re doing that ’cause we want to figure out where to pump our money, so which channel is the best channel to put more money in, which will help us achieve that process happening more often, and therefore more business value. And then after that, everything else is a data and analytics problem.
And in terms of this one, it is a pretty horrible one. Yeah, maybe get some help from people who have done it before, or take a while. That was a great session. I hope everybody has a simply magical day.
Yorgos: Likewise, take care.