Using machine learning to predict churn
SIDN Labs uses AI to solve problems, not for its own sake
SIDN Labs uses AI – or, to be precise, machine learning – to come up with smart solutions to real-world problems. With the aim of making the DNS and the .nl zone more secure and keeping them secure, the team has previously used the technology for the logo-recognition tool LogoMotive, which is now incorporated into SIDN BrandGuard, and for the RegCheck tool developed with DNS Belgium. Research Engineers Thijs van den Hout and Thymen Wabeke are now working with our marketing data team to investigate the possibility of using machine learning to predict whether a domain name registration is liable to be cancelled in the near future. Knowing that would be very helpful to everyone who works with .nl domain names.
AI is one of the fields that Thijs and Thymen are involved with at SIDN Labs, but they prefer to use the term 'machine learning' when describing their work. "'AI' is a bit of a catch-all phrase. If you ask 10 people what AI is, you're likely to get 10 different answers. That makes discussing the subject difficult," says Thijs.
Thymen adds, "We usually say that artificial intelligence is the objective: you want your computer to do something intelligent. You might want it to generate a good text, for example, or suggest a relevant domain name. One way of doing that is by analysing patterns in data, and machine learning is a technology you can use for that." Another important distinction is that SIDN Labs works mainly with predictive AI, whereas most people associate the term 'AI' with systems such as OpenAI's ChatGPT, which use generative AI.
Machine learning is very good for analysing patterns in large volumes of data. Take DNS traffic. SIDN processes more than 4 billion queries a day about .nl domains. "If you want to learn anything from that amount of data, there's no way you can do it by analysing the data yourself," says Thymen. "You need machine learning to do it for you. Then, for example, you can look into ways of optimising your network configuration. That's what we're trying to do with our Autocast project: we're using machine learning to identify the best places to locate our name servers."
Machine learning-enabled analysis of the large volumes of data we have can create opportunities for registrars as well. SIDN BrandGuard's logo recognition tool is a good example of machine learning technology being used for a practical application. RegCheck is another: it uses machine learning to assign risk scores to domain name registrations, so we know how likely it is that a given registration is malicious. The use of such tools to detect suspected abuse is very useful to most registrars, because it enables them to nip problems in the bud. And that means fewer people falling victim to phishing, and less work for domain name operators cleaning up after incidents.
Abuse prevention isn't the only opportunity that the research engineers are investigating. Once a month, SIDN Labs crawls all .nl websites and analyses the HTML content to see how the sites are used. How many have marketing content? How many are webshops?
As well as being used to categorise sites, the crawler data enables SIDN Labs to look for pointers as to the likelihood of a domain name being cancelled. "If a domain name is inactive for a long time, the probability of it being dropped when it next comes up for renewal is probably higher. And there may be other cancellation risk flags we can identify. So we're currently working with our Business & Support colleagues to see what we can come up with," says Thijs.
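To make the idea concrete, here is a minimal Python sketch of how a single signal, such as months of inactivity, could be turned into a cancellation probability. It is an illustration only: the toy data is invented for the example, and the feature, the choice of logistic regression and the training settings are assumptions, not a description of what SIDN Labs has built or found.

```python
import math

# Hypothetical toy data: (months the domain has been inactive, cancelled at renewal?).
# These values are invented for illustration only; they are not SIDN data.
TOY_HISTORY = [
    (0, 0), (1, 0), (2, 0), (3, 0), (4, 0), (6, 0),
    (5, 1), (8, 1), (10, 1), (12, 1), (18, 1), (24, 1),
]


def sigmoid(z: float) -> float:
    """Map a real-valued score to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(data, learning_rate=0.02, epochs=20000):
    """Fit a weight and a bias for one feature using batch gradient descent."""
    weight, bias = 0.0, 0.0
    n = len(data)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for months, cancelled in data:
            # Difference between the predicted probability and the observed label.
            error = sigmoid(weight * months + bias) - cancelled
            grad_w += error * months
            grad_b += error
        weight -= learning_rate * grad_w / n
        bias -= learning_rate * grad_b / n
    return weight, bias


def churn_probability(months_inactive: float, weight: float, bias: float) -> float:
    """Estimated probability that a domain is cancelled at its next renewal."""
    return sigmoid(weight * months_inactive + bias)


WEIGHT, BIAS = fit_logistic(TOY_HISTORY)

if __name__ == "__main__":
    for months in (1, 6, 18):
        print(months, round(churn_probability(months, WEIGHT, BIAS), 2))
```

In practice, a single feature would rarely be enough. The sketch only shows the general shape of the approach: collect candidate signals, fit a model on past renewals and cancellations, and check whether the signal actually separates the two groups.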
The research aimed at identifying churn prediction factors is at a very early stage. "Through CENTR, we're also in contact with 12 other registries – half of all CENTR members – that are interested in the project and want to be involved. So it's clear that the issue is widely relevant in our industry. For now, though, we're still at the stage of trying to find things that might predict whether a domain name is at high risk of cancellation."
One positive spinoff of the project has been to promote collaboration between SIDN's 2 data teams. "At Labs, we look at our datasets from a security perspective, whereas the Business & Support data team is mainly interested in marketing analysis. It's great to see the appetite for interaction grow through the project, with each team trying to learn from the other's outlook. Closer cooperation amongst teams is very much encouraged at SIDN."
While offering opportunities, the use of AI does involve potential hazards as well. SIDN Labs therefore always applies the 'human-in-the-loop' principle. Our systems don't make automated decisions about things like the modification of registrations: a human is always involved.
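The human-in-the-loop principle can be sketched in a few lines of Python. This is an illustration under assumptions: the `Suggestion` type, the `REVIEW_THRESHOLD` value, the function names and the placeholder domain names are all hypothetical and do not describe SIDN's actual systems. The point is that the model only proposes, and nothing happens until a person has decided.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

REVIEW_THRESHOLD = 0.8  # hypothetical cut-off, chosen only for this example


@dataclass
class Suggestion:
    domain: str
    risk_score: float                # model output between 0 and 1
    approved: Optional[bool] = None  # stays None until a human has decided


def queue_for_review(scores: Dict[str, float]) -> List[Suggestion]:
    """Turn high scores into suggestions for an analyst; never act on them directly."""
    return [
        Suggestion(domain, score)
        for domain, score in sorted(scores.items())
        if score >= REVIEW_THRESHOLD
    ]


def record_decision(suggestion: Suggestion, approved: bool) -> Suggestion:
    """Store the decision made by a human reviewer."""
    suggestion.approved = approved
    return suggestion


def actions_to_carry_out(suggestions: List[Suggestion]) -> List[str]:
    """Only suggestions that a human explicitly approved lead to any action."""
    return [s.domain for s in suggestions if s.approved is True]


# Placeholder names under the reserved .example TLD, with made-up scores.
EXAMPLE_SCORES = {"shop.example": 0.93, "blog.example": 0.15, "bank.example": 0.85}

if __name__ == "__main__":
    queue = queue_for_review(EXAMPLE_SCORES)
    print(actions_to_carry_out(queue))   # nothing has been approved yet
    record_decision(queue[0], approved=True)
    print(actions_to_carry_out(queue))
```

Keeping `approved` as a separate field that only `record_decision` sets makes it hard for a model score, however high, to trigger a change on its own.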
Thijs and Thymen also emphasise the need to keep sight of the true objective. "I think it's important to work with AI as a means to an end, not an end in itself. Many organisations like the idea of using AI, because it goes down well with their stakeholders. What really matters is using AI as a tool for resolving problems, not using it for its own sake and creating solutions in search of a problem."
Thymen agrees, and also flags up the importance of careful case-by-case analysis to decide whether AI is the right tool for the job. "People often see AI as a magic solution. We'll use AI, and it'll be fine. But that's not the way it is." For many problems, Thymen doesn't see AI as the best solution. "Here in the Netherlands, we've had 2 high-profile scandals, where data-based and algorithmic decision-making and risk assessment went badly wrong." Thymen therefore sees it as his personal mission to make sure that we do the right things, and that we get a return on our investments. "At SIDN, we're careful to define the problem we're trying to solve and decide on the best approach before we reach for AI. That way, we avoid mis-investment, and I think we can be proud of some of the excellent results we've achieved by using machine learning."
Our churn research is also intended to help .nl registrars. We'll keep you updated about the results and any relevant follow-up work.