
The three hard parts of manually tagging subjects on your web pages

Everyone loves faceted search: drilling down on your search results by selecting filters. We all use it in eCommerce to pick colors and prices, but we are also starting to see more of it on content websites. Most informational websites that tackle the top of the funnel would love to let you drill down by subject (or topic, or theme, or whatever your favorite word is). They want searchers to be able to narrow their search results based on what the results are about.

On the search side, that’s easy. Just about every search engine can drill down on facets, and a facet can be just about any value in one of your tags. So you can filter by year published, by industry, or by anything else you can tag.
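To make the facet idea concrete, here is a minimal sketch in Python. It assumes each search result is a simple record whose tags (the hypothetical `year` and `industry` fields below) double as facets. `facet_counts` tallies the values a filter sidebar would show, and `drill_down` keeps only the results that match every selected value. Real search engines do this inside their index; the function names and sample data here are illustrative only.

```python
from collections import Counter

# Hypothetical search results; each tag on a document can act as a facet.
results = [
    {"title": "Cloud pricing guide", "year": 2019, "industry": "retail"},
    {"title": "Supply chain analytics", "year": 2020, "industry": "manufacturing"},
    {"title": "Store analytics primer", "year": 2020, "industry": "retail"},
]


def facet_counts(docs, field):
    """Count how many documents carry each value of a tag (the facet sidebar)."""
    return Counter(doc[field] for doc in docs if field in doc)


def drill_down(docs, **filters):
    """Keep only documents whose tags match every selected facet value."""
    return [d for d in docs if all(d.get(k) == v for k, v in filters.items())]


print(facet_counts(results, "industry"))
print([d["title"] for d in drill_down(results, industry="retail", year=2020)])
```

Selecting industry "retail" and year 2020 leaves only the store analytics primer. The subject facet this article is about works the same way; the hard part is getting a trustworthy subject tag onto every document in the first place.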

The hard part is tagging each page with its subject. A few problems bedevil us:

  • Deciding the subjects. I’m not sure if you have ever tried to do this in a medium-to-large company, but it’s a beast of a process. Everyone has their own idea of what the right subjects should be and how they should be arranged into some kind of hierarchical taxonomy or ontology. It takes forever to reach agreement, and there is no guarantee that your site visitors will recognize the subjects your experts chose. Nor is there any way to know whether those subjects reflect the actual composition of your content.
  • Tagging the documents. If anything, this is harder. You probably chose several dozen subjects, and now you must ask the authors to correctly classify each web page into the proper subject category (or categories). That might sound easy, but it is actually quite hard for people to do. Some experiments show that not only do people disagree with each other on the right answer, but if you give the same person the same job a couple of days later, they often disagree with themselves. So it’s a hard job that is rarely done consistently.
  • Making changes. You might see a pattern here, but this is the hardest job of all. To change your taxonomy, perhaps to reflect changes in your industry, you must reconvene the same experts and get them to agree again on what the new subjects and hierarchy should be, which usually isn’t any easier than it was the first time. Then comes the real fun: you have to go back, manually review every single document, and potentially retag it with the new and changed subjects. If you have more than a few thousand documents, you can imagine how excruciating that is. There might be ways to automate some of the task, but the work that remains is more than enough to dissuade anyone from ever changing their taxonomy.

So, what do you do instead? Add a lot more automation.

I have helped clients use natural language processing and machine learning to examine their content and suggest a subject taxonomy based on what that content actually covers. The experts can provide feedback, moving some categories around and combining others, but in the end the machine has done most of the work. It is far easier for them to agree on a picture that has already been painted than to agree on how to paint a blank canvas.
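The article does not say which technique generates the draft taxonomy, so the following is only a minimal sketch of one common option, k-means clustering over bag-of-words vectors, written in plain Python. It groups similar documents and names each group by its two most frequent terms, producing a rough subject list that experts can rename, merge, or split. The stopword list, the farthest-first seeding, and the two-term labels are all assumptions made for illustration; real projects typically use richer text representations and a proper NLP library.

```python
import math
import re
from collections import Counter

# Hypothetical stopword list; a real pipeline would use a fuller one.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "for", "in", "on", "with", "your"}


def vectorize(text):
    """Turn text into lowercase term counts, ignoring stopwords."""
    return Counter(w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS)


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def centroid(vectors):
    """Average several term vectors into one."""
    total = Counter()
    for v in vectors:
        total.update(v)
    return Counter({t: c / len(vectors) for t, c in total.items()})


def suggest_subjects(texts, k, iterations=10):
    """Cluster documents with k-means (k <= len(texts)) and label each cluster
    by its two most frequent terms, as a draft subject list for experts."""
    vectors = [vectorize(t) for t in texts]
    # Farthest-first seeding: start from document 0, then repeatedly add the
    # document least similar to every seed chosen so far.
    seeds = [0]
    while len(seeds) < k:
        candidates = [i for i in range(len(vectors)) if i not in seeds]
        seeds.append(min(candidates, key=lambda i: max(cosine(vectors[i], vectors[s]) for s in seeds)))
    centroids = [vectors[s] for s in seeds]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for idx, v in enumerate(vectors):
            best = max(range(k), key=lambda c: cosine(v, centroids[c]))
            clusters[best].append(idx)
        # Keep the previous centroid if a cluster ends up empty.
        centroids = [centroid([vectors[i] for i in members]) if members else centroids[c]
                     for c, members in enumerate(clusters)]
    return [(" / ".join(t for t, _ in centroids[c].most_common(2)), clusters[c]) for c in range(k)]
```

Run on four short documents, two about cloud backup and two about retail analytics, with `k=2`, this sketch separates them into two groups. The machine-generated labels are crude on purpose: the point is to hand the experts a picture to react to rather than a blank slate.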

Once we have a machine-generated taxonomy, we also have the training data to tag all the documents with those subjects automatically. The tags might not always be correct, but we know humans don’t tag correctly, either. At least the machine’s tags will be applied consistently, and we can improve their accuracy by analyzing and correcting the errors.
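Here is a minimal sketch of how grouped documents can become an automatic tagger, assuming each subject in the taxonomy comes with a few example documents. The technique is swapped in purely for illustration: a nearest-centroid classifier that averages each subject’s term vectors and tags a new page with every subject whose average is similar enough. The `threshold` value and the sample subjects are hypothetical, and production systems usually rely on stronger classifiers. The consistency point still holds, though: the same text always receives the same tags.

```python
import math
import re
from collections import Counter

# Hypothetical stopword list; a real pipeline would use a fuller one.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "for", "in", "on", "with", "your"}


def vectorize(text):
    """Turn text into lowercase term counts, ignoring stopwords."""
    return Counter(w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS)


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def train(labeled):
    """Build one averaged term vector (centroid) per subject from its example documents."""
    model = {}
    for subject, texts in labeled.items():
        total = Counter()
        for t in texts:
            total.update(vectorize(t))
        model[subject] = Counter({w: c / len(texts) for w, c in total.items()})
    return model


def tag(model, text, threshold=0.2):
    """Return every subject whose centroid is at least `threshold` similar, best first."""
    v = vectorize(text)
    scores = {s: cosine(v, c) for s, c in model.items()}
    ranked = sorted(scores.items(), key=lambda kv: -kv[1])
    return [s for s, score in ranked if score >= threshold]


model = train({
    "Cloud storage": ["cloud storage pricing", "cloud backup and storage tiers"],
    "Retail analytics": ["retail store analytics", "shopper analytics for retail stores"],
})
print(tag(model, "comparing cloud backup pricing"))
```

Because tagging is just a function of the trained model, fixing a systematic error means correcting the labeled examples and calling `train` again, rather than asking authors to revisit every page.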

Lastly, it’s a lot easier to make changes. You can rerun the taxonomy generation process on any newly added content that might introduce new subjects. And you can retrain the automation and relabel everything far more easily than by hand, which makes you more willing to change your subject taxonomy as your content changes.

If it sounds too good to be true, well, it isn’t perfect. But I am confident that it is a whole lot better than what you are doing now. And if the pain of manual subject taxonomies has kept you from even trying them, this might be a simple way to give your customers a much easier search experience.


Mike Moran

Mike Moran is an expert in internet marketing, search technology, social media, text analytics, web personalization, and web metrics who, as a Certified Speaking Professional, regularly speaks at events, including keynote appearances worldwide. Mike serves as a senior strategist for Converseon, a leading digital media marketing consultancy based in New York City. He is also a senior strategist for SoloSegment, a marketing automation software solutions and services firm. Mike also serves as a member of the Board of Directors of SEMPO. Mike spent 30 years at IBM, rising to Distinguished Engineer, an executive-level technical position. Mike held various roles in his IBM career, including eight years at IBM’s customer-facing website, ibm.com, most recently as the Manager of ibm.com Web Experience, where he led 65 information architects, web designers, webmasters, programmers, and technical architects around the world. Mike’s newest book is Outside-In Marketing, written with world-renowned author James Mathewson. He is co-author of the best-selling Search Engine Marketing, Inc. (with fellow search marketing expert Bill Hunt), now in its Third Edition. Mike is also the author of the acclaimed internet marketing book Do It Wrong Quickly: How the Web Changes the Old Marketing Rules, named one of the best business books of 2007 by the Miami Herald. Mike founded and writes for Biznology® and writes regularly for other blogs. In addition to Mike’s broad technical background, he holds an Advanced Certificate in Market Management Practice from the Royal UK Charter Institute of Marketing and is a Visiting Lecturer at the University of Virginia’s Darden School of Business. He also teaches at Rutgers Business School. He is a Senior Fellow at the Society for New Communications Research. Mike worked at ibm.com from 1998 through 2006, pioneering IBM’s successful search marketing program.
IBM’s website of over two million pages was a classic “big company” website, the kind that has traditionally been difficult to optimize for search marketing. Mike, working with Bill Hunt, developed a strategy for search engine marketing that works for any business, large or small. Moran and Hunt spearheaded IBM’s content improvement effort, which resulted in dramatic gains in traffic from Google and other internet portals.
