I had a conversation this week that reminded me of my old days of thinking that the Semantic Web was going to make the world a much better place. Well, I can’t say that I was right there. Before I go further on why, let me step back and explain what I thought the Semantic Web was going to accomplish:
One of the problems of the internet is that you build these linked silos (sometimes not even linkable) of information and activity. Let’s say that you want to go to a coffee shop that is within walking distance, has good reviews, and that a friend of yours has recommended. Today the only way you can do it is to hope that there is a service out there that has merged all those features and that is nice enough to allow you to filter things that way. But there are services out there that tell you what is within walking distance (by your definition of how long you are willing to walk), there are reviews, there are communications with friends. It’s just that they are not connected in any way.
What the Semantic Web was going to do was to give you a “meta-internet” that allows people to build features on top of other people’s data and activities, tailored to specific goals. The original owners would still maintain “ownership” of the data (e.g. it wouldn’t require somebody to crawl and keep a copy of all the data); the new system would just abstract the presentation of the data into something specialized. And that system then becomes the source for even more specialized systems.
Sounds cool? Well, I thought it did (and actually part of me still thinks it’s cool). But it didn’t work (there are still Semantic Web researchers out there, so maybe some people think it just hasn’t worked yet and may still work one day). Why didn’t it work? There are many really hard problems:
1. There are many technical challenges in running these distributed dynamic queries efficiently, considering that you don’t own pieces of the system, so you have to constantly deal with potential availability and latency issues.
2. Representing data in a way that is both expressive and reusable is very hard. In my days working on the Amazon catalog, I learned the cost of getting high-quality, domain-focused data at scale. Imagine getting data that is “complete” enough to be correctly connected to other things… It is so expensive that companies would need a very strong financial motivation to do it. Without owning the experience and guaranteeing stickiness, ad impressions, and constant feedback on how you can improve things, it’s very hard to show that financial motivation.
3. Building the higher level applications also isn’t easy. The logic-based language used to represent knowledge and navigate through it isn’t for the faint of heart.
4. There are lots of security considerations around interacting through data.
That’s it. Can’t be done? Well, maybe there could be some partial light at the end of the tunnel. The key piece that is changing is that we now have much higher level “programming languages” that might be able to at least solve for (3) by moving away from this complex logic-based language into more of a natural language approach. Behind a lot of what voice assistants do today is technology that was researched by the Semantic Web people.
Another thing that has continuously evolved is making data available through APIs. It’s not the same as (1), where everybody is talking the “same language” and you can connect concepts almost “out of the box” by just adding configuration. For every API that you integrate, you need to write code specifically for it. But back to the higher level programming languages idea, maybe it’s becoming pretty easy to consume those APIs more consistently, so we can build that joint knowledge.
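To make the "write code specifically for every API" point concrete, here is a minimal sketch in Python of the coffee shop example from earlier. Everything in it is made up for illustration: the three payloads, their field names, the shop names, and the helper functions are assumptions, and no real service is being modeled. Each source gets its own small adapter, and only after that can the three be joined.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical payloads, shaped differently on purpose, standing in for
# three unrelated services: maps, reviews, and messages from friends.
MAPS_RESPONSE = [
    {"place": "Bean There", "walk_minutes": 8},
    {"place": "Daily Grind", "walk_minutes": 25},
    {"place": "Cup o' Joe", "walk_minutes": 5},
]
REVIEWS_RESPONSE = {
    "Bean There": {"stars": 4.6},
    "Daily Grind": {"stars": 4.8},
    "Cup o' Joe": {"stars": 3.1},
}
FRIEND_MESSAGES = [
    ("Ana", "You should try Bean There"),
    ("Raj", "Daily Grind is great"),
]


@dataclass
class Shop:
    """The shared shape that every source is normalized into."""
    name: str
    walk_minutes: Optional[int] = None
    stars: Optional[float] = None
    recommended_by: tuple = ()


# One adapter per source: this is the code you write "specifically for it".
def from_maps(payload):
    return {row["place"]: row["walk_minutes"] for row in payload}


def from_reviews(payload):
    return {name: info["stars"] for name, info in payload.items()}


def from_messages(messages, known_names):
    # Naive matching: a friend "recommends" a shop if the message names it.
    recs = {}
    for friend, text in messages:
        for name in known_names:
            if name.lower() in text.lower():
                recs.setdefault(name, []).append(friend)
    return recs


def build_shops():
    """Join the three normalized sources on the shop name."""
    walks = from_maps(MAPS_RESPONSE)
    stars = from_reviews(REVIEWS_RESPONSE)
    recs = from_messages(FRIEND_MESSAGES, walks.keys())
    return [
        Shop(name, walks[name], stars.get(name), tuple(recs.get(name, ())))
        for name in walks
    ]


def find_shops(shops, max_walk, min_stars):
    """Walkable, well reviewed, and recommended by at least one friend."""
    return [
        s.name
        for s in shops
        if s.walk_minutes is not None and s.walk_minutes <= max_walk
        and s.stars is not None and s.stars >= min_stars
        and s.recommended_by
    ]


print(find_shops(build_shops(), max_walk=10, min_stars=4.0))
```

Note that the join only works here because all three fake sources happen to spell the shop names the same way. In real life that matching is exactly the expensive data quality problem described in the list above.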
Now we are left with financial incentives for companies to open up their data and security considerations. On the security side there are some pieces of support for it, but doing it efficiently, with some kind of transitive permission model, is still an open question (which points towards things like OpenID, which didn’t get much traction either). On the financial incentives, well… That’s not my area of expertise. If it weren’t so expensive to do, I’d harbor some hope that a Wikipedia-like solution would eventually happen (like Wikidata), but I’m not so sure it can.
Maybe it will be built internally at a company to power its voice assistant and search, and then the US government will come along and force the company to break up, and the only way to keep powering both from the same source will be to open it up for everybody to use. Yeah, wouldn’t that be cool?