Podcast

Impact of dirty data and how to address it

In this episode of Procurement Unplugged, we’re joined by Susan Walsh, the CEO of The Classification Guru, who demystifies the often-overlooked world of data classification and quality in procurement. Susan recounts her journey from running a fashion business to becoming a leading expert in data management, highlighting the critical role clean data plays in procurement processes. She explains how her company helps organizations by creating customized taxonomies and tackling data issues that are frequently ignored, providing valuable insights into the challenges of maintaining and improving data quality.

Susan dives deep into the future of procurement, emphasizing the importance of data accuracy before implementing automation. She discusses the limitations of current off-the-shelf tools and stresses that while automation can streamline processes, it cannot replace the need for meticulous data classification and cleansing. With a focus on both immediate and long-term strategies, this episode offers a comprehensive look at how procurement professionals can enhance their operations through better data management and what to expect as the field evolves.

Our Speakers

Fabian Heinrich
CEO & Co-Founder of Mercanis
Susan Walsh
Founder & MD of The Classification Guru

Fabian | 00:08.26
Hello everyone, and a warm welcome from my side to another episode of Procurement Unplugged. It is a true pleasure to have Susan Walsh with us today, the CEO of The Classification Guru, who has truly mastered the holy grail of data, which remains a mystery to most people working in procurement. So I'm truly excited to have you here, Susan.

Susan | 00:33.07
Thanks so much for having me on. I'm always happy to talk about data, especially in procurement.

Susan Walsh's Journey into Procurement

Fabian | 00:39.79
So Susan, how did you first end up in procurement, and what was your path from there to building The Classification Guru? I think that's a truly interesting story, and our listeners would be really intrigued to hear it.

Susan | 00:56.10
Yeah, so it's one of those things. Now I'd call it a happy accident, but at the time it was not so happy. My first business was a women's clothes shop here in the UK, and it didn't work out. I was desperate for some work; I would do anything to pay my bills. So I found an ad online for a spend analytics company, and I went to speak to them. They said, "Oh, we need data classified." I thought, well, I've worked in a few organizations, I'm sure I could do that. And that's how it started.

And after five years with them, I ended up managing a team of 14 people. I was managing projects. And I could see that the clients were paying for this expensive dashboard and fancy analytics. But the real issue behind it was the quality of the data. That's what we were spending most of the time fixing. But nobody was talking about it. It's like a dirty little secret you don't talk about. And so I felt like I'd got as far as I could with that company. And I hadn't come from a procurement or a data background.

I didn't know where I could get a job doing the same thing, so my only option, really, was to set up another business. That's where The Classification Guru came from, and the business has just hit its fourth birthday. That in itself was hugely challenging, because the procurement people I was speaking to all thought my business was a great idea. But the reality was that they maybe didn't need my services right away, or people weren't looking for me because they didn't know I existed.

And so I've spent the last four years building a presence, particularly on LinkedIn, and establishing myself as the fixer of dirty data and the expert in my field. That's kind of how I ended up talking to you today, I guess.

Challenges in Procurement Data Quality

Fabian | 03:01.41
Amazing, yeah, that’s quite a journey. "Fixer of data" is also an interesting term. Is there something that really inspires you about procurement, and about that particular data topic? Because over my years in procurement, I’ve found the same thing you were saying: no one wants to talk about the data, and if the data is presented the wrong way, the standard excuse is always "shit in, shit out." So how did you get so excited about a topic that no one wants to look at?

Susan | 03:39.48
I guess I saw an opportunity because nobody was talking about it, but genuinely, I love the work that I do. For me, no matter what task I'm doing, I'm always looking to improve my processes, be more efficient, work smarter, save money and increase profitability. And the really easy way to do that is investing in your data quality. You can save so many people hours in a week just by having clean data. You can find cost savings, and you can drive profitability on projects, because you don't have people spending as much time fixing bad data or looking for information.

It just makes so much sense to me, so I’m just trying to spread the word. But, of course, the reality is that it’s a boring subject, and a lot of people within data who are immensely talented, highly skilled, and very intelligent don’t necessarily convey the importance of data to the business world. So I’m trying to bridge that gap.

Fabian | 04:50.41
No, and I think data is also closely related to taxonomies, so maybe you can elaborate a bit more on that. I think it's quite an emerging topic, which has also been overlooked for many years. Of course, everyone in the industry knows the NAICS code and the UNSPSC code. But maybe you can tell our listeners a bit more about that topic, which is, in my opinion, heavily related to data.

Susan | 05:17.23
Yeah. So I'm seeing a huge shift away from taxonomies like the UNSPSC, because they're so big and so detailed and don't necessarily fit the needs of the business. Over the four years, for all but one client, maybe two, I have built customised taxonomies, normally about four levels deep, depending on the level of detail. Now, your taxonomy should fit your data.

So what that means is if you have detailed information in your spend, like pens, pencils, paper, paper clips, keyboards, mice, then you should have that level of detail in your taxonomy. But if it only says office supplies or IT equipment, then you only need a two-level taxonomy. You can't go into any further detail than that, because it's not there.

So you should have a taxonomy that fits your data. And something I find really frustrating with the UNSPSC is that it's not chart friendly. When you try to put level one of the UNSPSC on a chart, the wording is longer than the chart itself. So keep your wording simple: IT, HR, professional services.

Fabian | 06:40.17
That's a very good point: keep things simple.

Susan | 06:42.93
Yeah.

Fabian | 06:44.05
I get asked a lot by our customers: how do you even classify services? They really struggle with it. So I'll pass that question to you, as you are the classification guru.

Classifying Services in Procurement

Susan | 07:00.67
Yeah. So most services, well, it depends on the service. There's professional services, which would cover your legal, your accounting, engineering, surveying. Consultancy can be a tricky one: HR and IT consultancy can sit either in professional services or under HR and IT.

Fabian | 07:21.60
Yeah, at that level most people are comfortable. But when it really comes to the capabilities and the kind of products a service provider can provide, they struggle, because the product category is missing.

Susan | 07:39.21
Yeah, well, that's an important part of your taxonomy.

Fabian | 07:42.05
It's more like a skill or a capability they provide. That's where people struggle and it gets tricky.

Susan | 07:51.06
What can end up happening is misclassification, because someone tries to fit it into whatever category happens to be there. I normally build taxonomies as I'm classifying the data. I don't start with a taxonomy; I add to it as I go through the data. Then at the end, if there are any items in the taxonomy that have, say, only one row classified against them in the whole data file, I'll delete those.

But if you've got hundreds or thousands of rows against one classification, then it makes sense to keep it. If it's one or two rows, just move them up into the next level. And maybe that's the mistake some people make: they think they need the taxonomy before they start classifying, but not necessarily, if you know what you're doing.
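Susan's rule of thumb (keep categories that hold many rows, fold categories with only one or two rows into the level above) can be sketched in a few lines of Python. This is a minimal illustration, not her actual tooling: the tuple-path representation of categories, the hypothetical function `roll_up_sparse_categories`, and the `min_rows=3` cut-off are all assumptions made for the example.

```python
from collections import Counter

# A category is a path from level 1 downwards, e.g. ("IT", "Hardware", "Keyboards").
CategoryPath = tuple[str, ...]


def roll_up_sparse_categories(
    rows: list[CategoryPath], min_rows: int = 3
) -> list[CategoryPath]:
    """Re-assign rows whose category holds fewer than `min_rows` rows to the parent level.

    Hypothetical helper: `min_rows` is an assumed cut-off, not a fixed rule.
    Level-1 categories are kept as they are, since there is no level above them.
    This is a single pass; a parent that is still sparse afterwards is not re-checked.
    """
    counts = Counter(rows)
    pruned: list[CategoryPath] = []
    for path in rows:
        if counts[path] < min_rows and len(path) > 1:
            pruned.append(path[:-1])  # fold into the level above
        else:
            pruned.append(path)
    return pruned


if __name__ == "__main__":
    rows = [("IT", "Hardware", "Keyboards")] * 120 + [("IT", "Hardware", "Webcams")]
    print(Counter(roll_up_sparse_categories(rows)))
```

In this example the 120 keyboard rows keep their level-3 category, while the single webcam row is reported under IT > Hardware. Deleting the emptied category from the taxonomy itself would be a separate, manual step.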

Fabian | 08:45.72
Then it's okay. Okay, that's very interesting because I mean also we ourselves, if we have own service provider, we struggle because I mean we see a lot of cases in our own company but also with people like who are our customers that for example the sales team is using a headhunter. And then they're not sure how to classify that. They don't know the name of the supplier whatsoever. And then it's just classified as sales, whereas it's actually an HR service, right? Yeah.

Susan | 09:18.57
Here's a great tip for you, and this is how I train anyone who works for me. When you have a data set, look at the supplier. If you don't know what it is, Google it and find out what they do. Then think: what would my company be buying from this supplier? Like you just said, it's a headhunter. Yes, the sales department bought it, but they're a headhunter, so headhunting is the service you would be buying from them. It's the same with deliveries.

So if you buy something from Dell or IBM, you might have a delivery charge on the invoice. If it's one of the invoice line descriptions, it might accidentally get classified as courier services.

However, the reality is that that delivery charge is part of the computer or IT equipment that you just purchased. You're not paying Dell or IBM for a delivery service or a courier service. You know, it's part of the computer. So anything like that, I would always train my team up to classify as IT, not delivery.
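The "classify by what you are buying from the supplier" rule can be sketched as a lookup in which the supplier's researched category wins over the buying department and over generic line wording such as a delivery charge. This is an illustration under assumptions: the `InvoiceLine` record, the `SUPPLIER_CATEGORY` table, the supplier "Acme Search Partners" and the function `classify_line` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class InvoiceLine:
    supplier: str
    description: str
    department: str


# Hypothetical supplier table, built once by researching what each supplier does.
SUPPLIER_CATEGORY = {
    "Dell": "IT",
    "IBM": "IT",
    "Acme Search Partners": "HR",  # hypothetical headhunter
}


def classify_line(line: InvoiceLine) -> str:
    """Classify an invoice line by what the company buys from the supplier.

    The buying department and the line wording are deliberately not used to pick
    the category: a headhunter engaged by sales is still an HR service, and a
    "Delivery charge" line on a Dell invoice is part of the IT purchase, not a
    courier service. Unknown suppliers go to manual research instead of a guess.
    """
    return SUPPLIER_CATEGORY.get(line.supplier, "UNCLASSIFIED: research supplier")


if __name__ == "__main__":
    print(classify_line(InvoiceLine("Dell", "Delivery charge", "Finance")))
    print(classify_line(InvoiceLine("Acme Search Partners", "Placement fee", "Sales")))
```

One category per supplier is a simplification; suppliers that genuinely sell across several categories would need line-level rules on top of this default.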

Fabian | 10:28.99
And what are, in your opinion, the biggest challenges with regards to classification?

Susan | 10:36.53
Well, I think the first is knowing where to start, and actually starting. I think that's a big challenge for a lot of people. They find it intimidating. They think they don't have the time or the resources to do it, and so they settle for less than good data.

They might have to rely on GL codes, which are notoriously unreliable. Then they get a rough idea of what they're spending their money on, but nowhere near an accurate picture. The second thing is that those who do have classification don't maintain it.

They don't look after it, and they don't update or refresh their data, so it goes out of date very quickly. People can accidentally cut and paste, people change their minds, and if nobody's managing that, it becomes a big problem. Within a few months you could be back to where you were before you had clean data.

Fabian | 11:37.30
So the key challenge is actually maintaining the data and keeping it up to date. I'm always surprised, because in sales, on the other side of the business, people have been using tools like Salesforce for 10 or 15 years.

So I'm always wondering why the problem of data maintenance in a single-source-of-truth system hasn't been solved. If you think about the sales org, there are many users, many people, so it should face similar problems and challenges.

Why haven't we carried the knowledge of how they tackled and solved those challenges over the years into the other side of the business?

Procurement’s Struggle with Clean Data

Susan | 12:20.09
Well, I'll tell you a secret. Another part of my business is cleansing databases like CRMs, so Salesforce, etc. They are so messy. There are so many duplicate records for the same person.

So a step back from that is actually training people within the organization, and that should be anybody who works with data, not just data people or procurement people. Anybody using a spreadsheet or a CRM system should have some kind of data quality training, so that they understand why the data needs to be clean. Otherwise they just keep setting up new accounts everywhere. I see it with suppliers as well: you can have five versions of PwC, you know, PricewaterhouseCoopers, PwC, P.W.C, and it goes on and on.
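The supplier-name problem Susan describes (PricewaterhouseCoopers, PwC, P.W.C) is usually tackled by normalising names to a match key and mapping known keys to one canonical name. The sketch below assumes a manually curated alias table and a short list of legal-form suffixes; `LEGAL_SUFFIXES`, `ALIASES`, `match_key`, `canonical_supplier` and `group_variants` are hypothetical names, not part of any specific tool.

```python
import re
from collections import defaultdict

# Assumed list of legal-form suffixes to ignore when matching; extend for your data.
LEGAL_SUFFIXES = {"ltd", "limited", "llp", "llc", "inc", "plc", "gmbh"}

# Hypothetical, manually curated alias table: match key -> canonical supplier name.
ALIASES = {
    "pricewaterhousecoopers": "PwC",
    "pwc": "PwC",
}


def match_key(name: str) -> str:
    """Lower-case, drop punctuation and legal suffixes, and join the remaining tokens."""
    tokens = re.findall(r"[a-z0-9]+", name.lower())
    return "".join(t for t in tokens if t not in LEGAL_SUFFIXES)


def canonical_supplier(name: str) -> str:
    """Return the canonical name for a known alias, otherwise the trimmed input."""
    return ALIASES.get(match_key(name), name.strip())


def group_variants(names: list[str]) -> dict[str, list[str]]:
    """Group raw supplier names under their canonical name so a person can review them."""
    groups: dict[str, list[str]] = defaultdict(list)
    for name in names:
        groups[canonical_supplier(name)].append(name)
    return dict(groups)


if __name__ == "__main__":
    raw = ["PricewaterhouseCoopers", "PwC", "P.W.C", "PricewaterhouseCoopers LLP", "Acme Ltd"]
    print(group_variants(raw))
```

Joining tokens makes "P.W.C" and "PwC" collide on purpose, but it can also merge genuinely different suppliers, which is why the alias table is curated by hand rather than generated.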

Fabian | 13:18.714
Yeah, I mean, I hear a lot about in the market. I mean, also when I was in Scalpy now with mechanics like self-service procurement, autonomous procurement and kind of those buzzwords. But I mean, I am always asking myself, I mean, does the foundation not need to be clean data? I mean, how to achieve.

Susan | 13:38.96
Autonomous procurement without clean data—so, wouldn’t the first step be something like cleansing all the data? Absolutely. But, you know, the thing about automation is that it’s great if you have clean data to start with and it can learn from that. However, you should always have an experienced person in that area checking the data because, if you just trust the automation, it might be auto-classifying something wrong. And it could do that for months or years before it’s picked up.

But again, if you're doing your maintenance and you're checking, then that would get flagged and it would avoid a lot of those errors.

Fabian | 14:19.79
So if we could paint the journey towards autonomous procurement, you would advise any procurement organization to tackle data classification first, then data cleansing, and from there start automating things in order to reach autonomous procurement at some stage.

Susan | 14:47.18
Yeah, but don’t blindly trust the automation; it always needs to be checked. It will do 90% of the work, but there’s probably 10% that needs to be checked, tweaked, and changed to make sure it stays accurate. If you just trust the automation and leave it, that’s when you could end up with a lot of problems.
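The "automation does most of the work, a person checks the rest" advice can be sketched as a routing step: low-confidence predictions always go to a human, and a regular slice of confident ones is spot-checked so a systematic misclassification cannot run unnoticed for months. The `Prediction` record, the hypothetical function `route_for_review`, the 0.9 confidence threshold and the every-20th spot-check rate are illustrative assumptions, not figures from the episode.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    row_id: int
    category: str
    confidence: float  # score in [0, 1] from a hypothetical auto-classifier


def route_for_review(
    predictions: list[Prediction],
    threshold: float = 0.9,
    spot_check_every: int = 20,
) -> tuple[list[Prediction], list[Prediction]]:
    """Split predictions into (auto_accepted, needs_review).

    Predictions below `threshold` always go to a person. In addition, every
    `spot_check_every`-th confident prediction is sent for review, so
    confident-but-wrong classifications surface during routine maintenance.
    Set `spot_check_every` to 0 to disable spot checks.
    """
    accepted: list[Prediction] = []
    review: list[Prediction] = []
    confident_seen = 0
    for p in predictions:
        if p.confidence < threshold:
            review.append(p)
            continue
        confident_seen += 1
        if spot_check_every > 0 and confident_seen % spot_check_every == 0:
            review.append(p)
        else:
            accepted.append(p)
    return accepted, review
```

A deterministic "every Nth row" spot check keeps the example reproducible; random sampling would serve the same purpose.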

Fabian | 15:16.32
And I think you advise clients every day, and you're a true expert in that area. What do your clients struggle with the most? Or what's the main reason they call you?

Susan | 15:30.51
The main reason is they don't have visibility on their spend at all.

Fabian | 15:36.06
Okay, so spend visibility is kind of the key topic.

Susan | 15:39.22
Yeah.

Fabian | 15:39.94
Okay, that’s very, very interesting. And as far as I’m concerned, your business depends on your exceptional skills and your brain, and not necessarily on automated software?

Susan | 16:02.25
So, I have a team, and we do use some software; it’s called Omniscope, and it’s a data modeling and visualization tool. I've developed a methodology to classify and cleanse the data. The first time around, for a new data set, it's all done 100% manually; we do every single row. But if there's a refresh, we can semi-automate that process. That helps us be more efficient, because if we know that the existing set of data is already accurate, we can map it over to future data.

Fabian | 16:31.52
Yeah. Do you think any of the big software players we've seen over the last 20 years will incorporate that kind of data modeling? Imagine if, in one of those big software platforms, I could press a button to clean my data, maintain my data or update my data.

Do you think that's possible in the near future? Or do we need to wait for a best-of-breed solution that really tackles it? Or will it stay like the status quo, where a smart, software-enabled team like yours solves it with certain models?

Susan | 17:07.67
There's a couple of different answers to this. So if you're talking about a global off-the-shelf classification tool that's automated, we're probably a decade or more away from that, I think. However...

Fabian | 17:19.50
I mean, in that regard, you could theoretically use systems like Vaunt.

Susan | 17:24.14
Yeah. You have to use what's best for your business, but be aware that it has limitations right now. Having said that, if you are using an industry-specific tool, or have even built your own in-house automation tool, that's likely to be far more accurate than something you would buy off the shelf, because it's tailored to your business or your industry.

The problem is when you try to classify for everybody across industries, because the same thing can mean different classifications in different industries. That's where the problems lie. But if you kind of...

Fabian | 18:06.84
Siloed automation for industries or company-specific contexts is more likely to be successful. Very interesting. Yeah, I mean, since we’ve almost reached the end of our podcast, I think one super interesting question I would have is, I mean, you’ve seen many stages of procurement maturity, right? Across your career and across your business now with your customers, so...

What do you think could be the future of procurement in general, regardless of the data or anything? Like, if you could really see how procurement has evolved over the last 10 or 15 years, what is, in your opinion, kind of the future? I mean, it would be super interesting if you could give a quick overview of where you’ve seen procurement coming from and where you could see it evolving to.

Future of Procurement

Susan | 19:00.11
So I think we've moved from pivot tables and Excel to dashboards, Qlik, Tableau, Power BI; people are reporting in far more detail than they ever were before. In terms of the future, I think it's about streamlining the classification process, maintaining it, and automating it where appropriate and where possible. And then, on the other side of procurement, there's a lot of advancement in things like RPA and contracts.

And so I think a lot of the laborious, error-prone tasks that are carried out by humans will be automated and be far more accurate. It's not about taking jobs away, because those people will be needed in other areas. But I think that's the way it's going, and I don't think that's a bad thing. I mean, automation is always...

Fabian | 20:01.73
Sorry to interrupt here, but automation is always a big buzzword. Where do you see automation happening?

Susan | 20:12.72
Well, I think that’s where we need to be realistic about what it can achieve. Things like classification? No, that's far off. But more black-and-white tasks, like scanning contracts, S2P and P2P processes: an invoice comes in electronically, goes through a system and never gets printed off. That kind of thing is going to be really great.

Relying on automation to classify your data, though, is a minefield, and that is not good. It will end up taking you more time to fix than it would to just start manually. That's my argument. It will come, but I just think we are quite far off from it at the moment.

Fabian | 20:56.28
Yeah, look, that was a very interesting episode. I also learned a lot from you, and it was an extremely interesting talk, diving into classification, clean data, dirty data, and how we can automate it. So thanks a lot. Maybe we could do another episode in a couple of months and dive into one specific topic, because I think we could speak for hours about it.

Susan | 21:34.29
And, yeah, I don’t know if you know, but I have a book coming out in about two or three weeks, and there are lots of topics in there. In fact, there’s a whole chapter on taxonomies, so you would love it. So, yeah, there’s lots to talk about.

Fabian |
Maybe you can give our listeners a quick shoutout. What’s the title of the book?

Susan |
It’s called Between the Spreadsheets: Classifying and Fixing Dirty Data. There’s a chapter on what dirty data is and what the consequences are for businesses, and then there’s also how to classify, how to normalize, and how to build a taxonomy.

There's a data horror stories chapter, where I share some stories that people have donated. So it covers pretty much all the procurement data and top-line information you might need.

Fabian | 22:19.24
Yeah, it sounds like an absolutely great read. So, thanks a lot. Given what I’ve experienced with procurement data, I truly think it’s the foundation of every procurement organization. Even though I’ve unfortunately not read your book yet, I feel like what I’ve heard now will make me blindly recommend it when it comes. Thanks a lot for your time.

Also available on
Procurement Unplugged on Apple Podcasts
