What is a keyword ontology?

A keyword ontology is a knowledge graph describing relationships between the keywords your target audiences frequently use in search queries and the products and services you sell. It can include other relationships as well, such as taxonomies, information architectures, social listening queries, competitors, etc.

Not that your audience always comes to the content that sells your products through search. But a large enough sample of them do that the keywords they use tell you a lot about the information they need to ultimately buy your products. A keyword ontology helps you understand which words tend to go with which products, and how you can outthink your competition to deliver the content your prospects need.

And it can do so much more. In a previous post on marketing AI, I made the bold claim:

“Keywords are the life’s blood of your marketing enterprise.”

Why? Keyword research is not just about choosing the right search words for your paid search campaigns. Keywords can serve as a foundation for external organic search, internal search, navigation, and social listening. All the data you gather when you use keywords to learn what your audience needs and your position against your competitors can transform the way you make products for the market and build content that influences customers to buy your products.

Why build a keyword ontology?

If keyword research is so important, why do so few companies do it well? Because it’s really hard. Anyone can go to Google Keyword Planner and get 100 words for every seed word they put in. But when you go through the search results for those 100 words, you might find five that are relevant to your business. Many businesses just use what Google gives them and only find out after the fact that they chose a lot of junk keywords. Done well, keyword research is typically a manual process of checking each search result on the top page in Google for each keyword. This doesn’t scale well if every brand or business unit within your enterprise is doing it in isolation. If every brand used Google Keyword Planner, they would all choose the same or similar keywords, and their content would just compete for the limited shelf space in Google.

The primary benefits of a keyword ontology are scale, collaboration, and automation. If everyone in your company is using the same keyword data model, you can govern keyword usage across all the teams vying for shelf space in Google and attention in social media. Teams can work together to create common experiences that attract and engage with common audiences. And if you use AI to help you sort and filter keywords, and assign them to the right product marketing teams, you can evolve your keyword ontology. Your performance metrics can be a feedback loop for your ontology, proving out your research or helping your system learn to get ever more accurate.

How do you build a keyword ontology?

You can only build a keyword ontology if you have a keyword tool, such as Moz Keyword Explorer, BrightEdge, Conductor Searchlight, or SearchMetrics. Each of these tools has keyword research as a component. They all have their strengths and weaknesses. Depending on the size and scale of your business, one of them will be the best fit. I will leave that review to you.

The most important thing to know is that none of the search tools by themselves can serve as the primary database of your keyword ontology. First of all, none of them generate graph databases of keywords. They are, in some cases, rigidly hierarchical in the way they organize keywords. So you will need to export the keywords into a graph database. Depending on the system, you can bring over associated pages and even key metrics related to those pages as part of the export. Those can be important facets of the keyword ontology. And because the keywords in your search tool are always evolving, it’s best to try to build an API between your keyword tool and your graph database, which updates at least weekly.

The other reason to build your keyword ontology outside of your search tool is taxonomy. The primary nodes in your ontology that are not keywords are the parts of your corporate data that necessarily sit outside your keyword data: your products and services, and other aspects of your classification schemes. The ontology consists of a set of relationships between keyword clusters and nodes in these taxonomies. For example, keyword x is most frequently associated with product y.
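To make the "keyword x goes with product y" idea concrete, here is a minimal sketch in Python. The edge list, the counts, and the function name are all made up for illustration; a real ontology would live in a graph database rather than in a Python list.

```python
from collections import defaultdict

# Hypothetical edges: (keyword, product, how often they were seen together).
edges = [
    ("keyword x", "product y", 12),
    ("keyword x", "product z", 3),
    ("keyword w", "product z", 7),
]

def strongest_product(edges):
    """Return, for each keyword, the product it is most often associated with."""
    counts = defaultdict(dict)
    for keyword, product, n in edges:
        counts[keyword][product] = counts[keyword].get(product, 0) + n
    # For each keyword, pick the product with the highest total count.
    return {kw: max(prods, key=prods.get) for kw, prods in counts.items()}

best = strongest_product(edges)
print(best)
```

Keeping the counts on the edges, instead of storing only the winner, means the weaker associations are still there when a product family gets reorganized.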

I recommend using the system where you manage your taxonomies as the home for your keyword ontology. Assuming it is built in standard formats governed by W3C Semantic Web standards, it will be able to handle the complex relationships between the nodes in the database. And it will be able to generate graphs that help you visualize the data, and otherwise export the data in useful formats such as JSON-LD. The best one we found, after an extensive RFP, is TopBraid EDG.
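For readers who have not seen JSON-LD, here is a rough sketch of what one keyword-to-product relationship might look like when exported. It uses the W3C SKOS vocabulary as one plausible choice; the example.com base address, the slug scheme, and the function name are assumptions for illustration, not what TopBraid EDG or any other tool actually emits.

```python
import json

def to_jsonld(keyword, product, base="http://example.com/ontology/"):
    """Describe one keyword and its related product as a JSON-LD node."""
    def slug(text):
        return text.lower().replace(" ", "-")

    return {
        "@context": {"skos": "http://www.w3.org/2004/02/skos/core#"},
        "@id": base + "keyword/" + slug(keyword),
        "@type": "skos:Concept",
        "skos:prefLabel": keyword,
        # skos:related is a generic association; a real model would
        # probably define a more specific property.
        "skos:related": {"@id": base + "product/" + slug(product)},
    }

doc = to_jsonld("keyword x", "product y")
print(json.dumps(doc, indent=2))
```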

When you have your environment set up, you’re ready to start building associations between keyword clusters and product families. But, without AI, this could take a long time. We did this manually at IBM and it took one person a full year to get a rough mapping. A few days after he finished, our company reorganized from four business units to 12, so he had to do the mapping again. We find that brand managers change things more often than the market changes. It’s a little like Heraclitus’s paradox: by the time you think you’re done, the whole job has changed under your feet.

The only way to build a keyword ontology at the speed of business is to use AI. At IBM, we use the Watson family of AI products, of course. Namely, Watson Natural Language Understanding (NLU) (née Alchemy) and Watson Knowledge Studio. Watson NLU lets you extract the entities from large text repositories and sort them by relevance to a query. Watson Knowledge Studio helps you manage and iterate on this data set with machine learning (supervised or unsupervised).

The standard training set for Watson NLU is DBpedia, a structured database of the information in Wikipedia. This was too generic for our needs. We wanted our keyword ontology to be a database of all and only the keywords that are highly relevant to our business (0.75 or better on a scale of 0 to 1). Instead of DBpedia, we needed a custom training set: a large repository of content that was about our products. Fortunately, we have the IBM Knowledge Center, a repository of 200 million pieces of information related to IBM products. We used Watson NLU to extract entity pairs: for each keyword, what products were mentioned in close proximity or with strong prominence in the repository? The results were a good first pass on what keywords are relevant to our products, and we found many strong correlations. But they were only about 70 percent accurate.
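The "all and only the highly relevant keywords" rule boils down to a threshold filter. Here is a minimal sketch in Python, assuming each extracted pair carries a relevance score between 0 and 1. The records and field names are invented for illustration and are not the output format of Watson NLU.

```python
RELEVANCE_THRESHOLD = 0.75  # the cutoff mentioned above

# Hypothetical extracted pairs with made-up scores.
pairs = [
    {"keyword": "keyword a", "product": "product y", "relevance": 0.91},
    {"keyword": "keyword b", "product": "product y", "relevance": 0.42},
    {"keyword": "keyword c", "product": "product z", "relevance": 0.75},
]

def highly_relevant(pairs, threshold=RELEVANCE_THRESHOLD):
    """Keep only the pairs that score at or above the threshold."""
    return [p for p in pairs if p["relevance"] >= threshold]

kept = highly_relevant(pairs)
print([p["keyword"] for p in kept])
```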

To get the accuracy we needed (80 percent or better), we ran the data through a machine learning process that included people with strong domain knowledge in each of our units. They can easily see mistakes that machines miss, and teach the system to exclude those mistakes in future runs of the data. Eventually, we got better than 80 percent recall and precision (the two metrics AI engineers use to measure accuracy). The result was a set of relationships between keywords and product names.
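If precision and recall are new to you: precision is the share of the system's keyword-product pairs that are right, and recall is the share of the right pairs that the system found. A small sketch in Python, using made-up pairs where the "correct" set stands in for what the domain experts approved:

```python
def precision_recall(predicted, correct):
    """Compare the system's pairs against the expert-approved pairs."""
    predicted, correct = set(predicted), set(correct)
    true_positives = len(predicted & correct)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(correct) if correct else 0.0
    return precision, recall

# Hypothetical data: the system proposes four pairs, the experts approve four,
# and three pairs appear in both sets.
predicted = {("kw a", "p y"), ("kw b", "p y"), ("kw c", "p z"), ("kw d", "p z")}
correct = {("kw a", "p y"), ("kw c", "p z"), ("kw d", "p z"), ("kw e", "p y")}

precision, recall = precision_recall(predicted, correct)
print(precision, recall)
```

In this toy case both numbers come out at 0.75, just short of the 80 percent bar, which is the kind of gap the expert review is meant to close.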

Once we had the two key node types of the ontology, filling in the rest was a matter of using existing taxonomies. For example, we have a taxonomy called the Global Brand Table (GBT) consisting of products, product families, brands, and business units. Each product fits in such a tree. So we can associate all the levels of GBT to the keywords related to the products. This helps us assign the most relevant keywords to the units, governing them as we go.
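Because each product sits in a tree, associating a keyword with every level is a matter of walking up from the product. Here is a minimal sketch, assuming the hierarchy is stored as a simple child-to-parent map. The node names are placeholders, not real GBT entries.

```python
# Hypothetical slice of a product hierarchy: child -> parent.
parent = {
    "product y": "family f",
    "family f": "brand b",
    "brand b": "unit u",
}

def ancestors(node, parent):
    """Walk up the tree from a node and collect every level above it."""
    chain = []
    while node in parent:
        node = parent[node]
        chain.append(node)
    return chain

def levels_for_keyword(keyword, keyword_to_product, parent):
    """Return the product and all of its ancestors for a given keyword."""
    product = keyword_to_product[keyword]
    return [product] + ancestors(product, parent)

levels = levels_for_keyword("keyword x", {"keyword x": "product y"}, parent)
print(levels)
```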

We can also cluster the keywords into buckets that are more relevant to some parts of our business than others. The result is the Topic taxonomy, which sits between keywords and business units in the ontology. Topic is used for navigation labels, information architecture, internal search facets, and a bunch of related content tagging work that helps users find the content they need, and the products related to it. And because it was built with Watson, the Topic taxonomy can be used in auto-tagging tech powered by Watson. This takes the tagging effort away from the content producers, helping them focus on writing authoritative content.

There really is no end to the nodes we can add to this ontology. A key one we are working on is social listening queries. Social media managers often don’t know what keywords are most relevant to their brands. Listening queries are composed of keywords separated by Boolean operators. If you don’t craft the right queries, you won’t listen to the right conversations, and your whole analysis will be skewed. A common mistake is to use brand names in social queries. This only captures conversations between people who know about your brands, not all the conversations related to your product categories.

What if you built a library of unbranded social queries that are associated with brands via the keyword ontology? This can become a framework for measuring conversations and understanding how each brand should engage in them. Most importantly, it can be a consistent data structure to measure your reach, engagement, and referrals from social campaigns.
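As a rough illustration, here is how an unbranded query could be assembled from the keywords the ontology ties to a brand. The OR syntax and the quoting are assumptions, since every listening tool has its own query language, and the keywords and brand name are placeholders.

```python
def build_unbranded_query(keywords, brand_names):
    """Join keywords with OR, dropping any keyword that mentions a brand name."""
    brands = [b.lower() for b in brand_names]
    unbranded = [
        k for k in keywords
        if not any(b in k.lower() for b in brands)
    ]
    return " OR ".join(f'"{k}"' for k in unbranded)

# Hypothetical library entry: one query per brand, keyed by brand.
library = {
    "brand b": build_unbranded_query(
        ["keyword x", "brand b keyword", "keyword w"],
        brand_names=["brand b"],
    )
}
print(library["brand b"])
```

The brand is still the key in the library, so results can be reported per brand even though the brand name never appears in the query itself.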

I could go on. But I’ve taken enough of your attention for the moment. Congratulations on making it this far! If you thirst for more, I’ll be giving a workshop at MinneWebCon in Minneapolis in April, with the architects of the keyword ontology. It should be fun.