Methodology
In the 21st century, every business is an online business. That is to say, whether or not they sell digital services, the overwhelming majority of companies own and maintain a digital presence to represent and promote their business.
The premise behind our approach is simple:
The digital presence of every company contains signals that we can observe and analyse to learn about the characteristics and behaviour of that company.
For instance, a business with a sophisticated website that incorporates chatbots, video and e-commerce is likely to have a higher overall level of digital maturity than a business with a basic site consisting of a landing page and email contact.
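As a toy illustration of this idea, the Python sketch below scores a page by the share of a few advanced features it appears to contain. The feature names, the substring "signatures" used to detect them and the equal weighting are all hypothetical assumptions made for illustration; they are not the detectors or weights used by our tools.

```python
# Hypothetical feature signatures: substrings that might indicate a
# capability in a page's HTML. Illustrative only, not real detectors.
FEATURE_SIGNATURES = {
    "chatbot": ("livechat", "chat-widget", "intercom"),
    "video": ("<video", "youtube.com/embed", "player.vimeo.com"),
    "ecommerce": ("add-to-cart", "checkout", "basket"),
}


def detect_features(html: str) -> set[str]:
    """Return the names of features whose signatures appear in the HTML."""
    lowered = html.lower()
    return {
        name
        for name, signatures in FEATURE_SIGNATURES.items()
        if any(sig in lowered for sig in signatures)
    }


def maturity_score(html: str) -> float:
    """Share of known features present: 0.0 (none) up to 1.0 (all)."""
    return len(detect_features(html)) / len(FEATURE_SIGNATURES)


basic_site = "<h1>Welcome</h1><a href='mailto:info@example.com'>Email us</a>"
rich_site = (
    "<div class='chat-widget'></div>"
    "<video src='tour.mp4'></video>"
    "<button class='add-to-cart'>Buy</button>"
)
print(maturity_score(basic_site))  # 0.0
print(maturity_score(rich_site))   # 1.0
```

Real technology detection needs far more robust signals than substring matching, but the principle is the same: observable features of a website are mapped to a score.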
There are many different types of signal contained in a company’s digital presence that could plausibly inform us about commercially significant aspects of its behaviour.
Our current toolset monitors over a hundred different variables, and we’re sure there are many more we’re yet to discover. Some of the signals we currently collect include:
- The rate and scope of the updates companies make to their website
- The sophistication of the technologies embedded in a company’s website
- The reading age of the language used on a website
- The semantic content of the language used on a website
- The presence of policies and accreditations
- The density of outbound links
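To make one of these signals concrete, here is a minimal Python sketch of an outbound link measure using only the standard library. It assumes, for illustration, that "density" means the share of a page's web links that point to a different host; the class and function names are hypothetical, and this is not necessarily how our own metric is defined.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag on a page."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def outbound_link_density(html: str, page_url: str) -> float:
    """Fraction of http(s) links on the page that point to another host."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    page_host = urlparse(page_url).netloc.lower()
    web_links = outbound = 0
    for href in collector.hrefs:
        # Resolve relative links such as "/about" against the page URL.
        absolute = urlparse(urljoin(page_url, href))
        if absolute.scheme not in ("http", "https"):
            continue  # skip mailto:, tel:, javascript: and similar
        web_links += 1
        if absolute.netloc.lower() != page_host:
            outbound += 1
    return outbound / web_links if web_links else 0.0


page_url = "https://www.acme.example/"
page_html = (
    '<a href="/about">About</a>'
    '<a href="https://twitter.com/acme">Twitter</a>'
    '<a href="mailto:hi@acme.example">Email</a>'
    '<a href="https://www.acme.example/shop">Shop</a>'
)
print(round(outbound_link_density(page_html, page_url), 3))  # 0.333
```

One design choice worth flagging: hosts are compared exactly, so a link from www.acme.example to shop.acme.example counts as outbound. A fuller version might compare registered domains instead.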
Using tools like automated web scraping and machine learning, it is possible to collect and process these data at scale, allowing us to analyse the digital behaviour of many thousands of businesses simultaneously.
This allows us to differentiate and compare performance on our various metrics across large populations of businesses, sorted by size, geography, sector and so on.
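Once each company has a score, comparing populations is essentially a group-by operation. The sketch below assumes hypothetical per-company records (the field names and the 0 to 100 score scale are illustrative, and the data are made up) and reports the mean score and group size for each size band, region and sector.

```python
from collections import defaultdict
from statistics import mean

# Made-up records for illustration: one metric score per company (on a
# hypothetical 0-100 scale) plus the attributes used to segment them.
companies = [
    {"size": "micro", "region": "Cardiff", "sector": "Retail", "score": 40},
    {"size": "micro", "region": "Gwynedd", "sector": "Retail", "score": 30},
    {"size": "small", "region": "Cardiff", "sector": "Manufacturing", "score": 70},
    {"size": "medium", "region": "Gwynedd", "sector": "Manufacturing", "score": 80},
]


def mean_score_by(records, key):
    """Group records by one attribute; return {group: (mean score, count)}."""
    groups = defaultdict(list)
    for record in records:
        groups[record[key]].append(record["score"])
    return {group: (mean(scores), len(scores)) for group, scores in groups.items()}


for attribute in ("size", "region", "sector"):
    print(attribute, mean_score_by(companies, attribute))
```

Reporting the count alongside each mean makes it easy to spot groups that are too small for their averages to be reliable.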
Using this data, we’ve conducted national-level business surveys, studied the impact of major societal events such as the COVID-19 pandemic and evaluated the impact of policy interventions on target company cohorts.
We have also used it to build this website.
Over time, we’ve built up a broad portfolio of metrics (see our Glossary for more details). Each metric is based on a different digital tool and measures a distinct set of signals. However, they all share the same basic underlying methodology:
- We build a structured sample of companies, designed to accurately represent the target population. In this case, our sample is structured to represent the distribution of different sizes and types of business by SIC code across Wales’s 22 unitary authorities.
- Each month, our web crawler visits the website of each company in our sample to collect the data we need to run our analysis. Note that all the data we collect is from public-facing websites and open databases, and we limit the scope of data collection to avoid placing an undue burden on company websites. For more details, see our Data Collection Policy.
- We process these data in various ways to produce a score for each metric. In some cases, this might involve the use of a neural network trained to detect certain patterns in the data; in others, a natural language processing module will analyse website text for particular concepts or themes.
- We package the scores into a new edition and publish it to the site.
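The first step, building a structured sample, can be illustrated with proportional stratified allocation. The sketch below assumes strata keyed by (unitary authority, SIC section, size band) with hypothetical population counts. It shares a fixed sample size across the strata in proportion to their populations using the largest-remainder method, then draws companies at random within each stratum. It is a simplified illustration under those assumptions, not a description of our exact sampling design.

```python
import random


def allocate_sample(population_counts, sample_size):
    """Allocate sample_size across strata in proportion to their population
    counts, using the largest-remainder method so the total is exact."""
    total = sum(population_counts.values())
    if not 0 < sample_size <= total:
        raise ValueError("sample_size must be between 1 and the population size")
    allocation, remainders = {}, {}
    for stratum, count in population_counts.items():
        # Integer floor of sample_size * count / total, plus the exact remainder.
        allocation[stratum], remainders[stratum] = divmod(sample_size * count, total)
    shortfall = sample_size - sum(allocation.values())
    # Hand the leftover places to the strata with the largest remainders.
    for stratum in sorted(remainders, key=remainders.get, reverse=True)[:shortfall]:
        allocation[stratum] += 1
    return allocation


def draw_sample(companies_by_stratum, allocation, seed=0):
    """Draw the allocated number of companies at random from each stratum."""
    rng = random.Random(seed)
    return {
        stratum: rng.sample(companies_by_stratum[stratum], k)
        for stratum, k in allocation.items()
    }


# Hypothetical strata: (unitary authority, SIC 2007 section, size band).
population = {
    ("Cardiff", "G", "micro"): 500,
    ("Cardiff", "C", "small"): 300,
    ("Gwynedd", "G", "micro"): 200,
}
# Placeholder company identifiers standing in for a real business register.
companies_by_stratum = {
    stratum: [f"{'-'.join(stratum)}-{i}" for i in range(count)]
    for stratum, count in population.items()
}
allocation = allocate_sample(population, 7)
sample = draw_sample(companies_by_stratum, allocation)
print(allocation)
```

Using integer arithmetic via `divmod` keeps the remainders exact, so the allocations always add up to the requested sample size.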
Our methodology emerges from a decade-long programme of research undertaken by academic teams across Europe, who have used a similar approach to predict a range of business characteristics, including likelihood of default, e-commerce adoption and export capacity. Please see our references for more detail – including papers published by our team delving deeper into our tools.
What you can see on this site today is just the beginning. Looking forward, our ambition is to expand the range of company behaviours we can measure by developing new metrics, as well as extending our coverage to new regions. If you’d like to be a part of this journey, don’t hesitate to get in touch.