Long Live the Semantic Web

I had a conversation this week that reminded me of my old days of thinking that the Semantic Web was going to make the world a much better place. Well, I can’t say I turned out to be right. Before I go into why, let me step back and explain what I thought the Semantic Web was going to accomplish:

One of the problems of the internet is that it ends up as a collection of silos of information and activity, loosely linked at best (and sometimes not even linkable). Let’s say that you want to go to a coffee shop that is within walking distance, has good reviews, and has been recommended by a friend of yours. Today, the only way you can do it is to hope that there is a service out there that has merged all those features and is nice enough to let you filter things that way. Services exist that tell you what is within walking distance (for your own definition of how far you are willing to walk), there are reviews, and there are communications with friends. It’s just that they are not connected in any way.

What the Semantic Web was going to do was give you a “meta-internet” that allows people to build features, tailored to specific goals, on top of other people’s data and activities. The original services would still maintain “ownership” of the data (e.g., nobody would need to crawl and keep a copy of it all); the presentation of the data would just be abstracted so that a specialized system could consume it. And that system would then become the source for even more specialized systems.
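To make that concrete, here is a minimal sketch of what the coffee-shop question above could look like as a single federated query. SPARQL’s SERVICE keyword (federation is part of SPARQL 1.1) and the SPARQLWrapper Python library are real; every endpoint URL and the ex: vocabulary below are hypothetical, made up for illustration.

```python
# A sketch of the Semantic Web vision: one federated SPARQL query
# joining three independently owned "silos". No service here is real;
# the endpoints and the ex: vocabulary are placeholders.
from SPARQLWrapper import SPARQLWrapper, JSON

QUERY = """
PREFIX ex: <https://example.org/ns#>
SELECT ?shop ?rating WHERE {
  # Silo 1: places and distances (the endpoint we send the query to)
  ?shop a ex:CoffeeShop ;
        ex:walkingMinutesFromMe ?minutes .
  FILTER(?minutes <= 15)

  # Silo 2: reviews, owned by a different service
  SERVICE <https://reviews.example.org/sparql> {
    ?shop ex:averageRating ?rating .
    FILTER(?rating >= 4.0)
  }

  # Silo 3: my social graph, owned by yet another service
  SERVICE <https://social.example.org/sparql> {
    ?friend ex:friendOf <https://me.example.org/profile#me> ;
            ex:recommends ?shop .
  }
}
"""

sparql = SPARQLWrapper("https://places.example.org/sparql")
sparql.setQuery(QUERY)
sparql.setReturnFormat(JSON)
for row in sparql.query().convert()["results"]["bindings"]:
    print(row["shop"]["value"], row["rating"]["value"])
```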

Sounds cool? Well, I thought it did (and part of me still thinks it’s cool). But it didn’t work (there are still Semantic Web researchers out there, so maybe it just hasn’t worked yet and still will one day). Why didn’t it work? There are many really hard problems:

  1. Running these distributed, dynamic queries efficiently is a technical challenge, considering you don’t own the pieces, so you constantly have to deal with potential availability and latency issues.
  2. Representing data in a way that is expressive and reusable is very hard. From my days working on the Amazon catalog, I learned the cost of getting high-quality, domain-focused data at scale. Imagine getting data that is “complete” enough that it can be correctly connected to other things… It’s so expensive that companies would need a very strong financial motivation to do it. Without owning the experience, and guaranteeing stickiness, ad impressions, and constant feedback on how to improve things, that financial motivation is very hard to show.
  3. Building the higher-level applications isn’t easy either. The logic-based languages used to represent knowledge and navigate through it aren’t for the faint of heart (see the sketch after this list for a taste).
  4. There are lots of security considerations around interacting through data.
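To give a flavor of (2) and (3), here is a tiny sketch, using the real rdflib library, of what representing even a single trivial fact looks like. The verbosity isn’t the worst part; it’s that the vocabulary (the ex: namespace is invented here) has to be agreed upon by everybody who wants to connect to the data.

```python
# Representing "this coffee shop has an average rating of 4.5" as RDF
# triples with rdflib. The ex: vocabulary is invented for illustration;
# a real deployment would need a shared, agreed-upon ontology.
from rdflib import Graph, Literal, Namespace, RDF
from rdflib.namespace import XSD

EX = Namespace("https://example.org/ns#")
g = Graph()
g.bind("ex", EX)

g.add((EX.shop1, RDF.type, EX.CoffeeShop))
g.add((EX.shop1, EX.name, Literal("Corner Roasters")))
g.add((EX.shop1, EX.averageRating, Literal(4.5, datatype=XSD.decimal)))

print(g.serialize(format="turtle"))
```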

So that’s it? It can’t be done? Well, maybe there is some partial light at the end of the tunnel. The key piece that is changing is that we now have much higher-level “programming languages” that might at least solve (3), by moving away from complex logic-based languages toward more of a natural-language approach. Behind a lot of what voice assistants do today is technology that was researched by the Semantic Web people.

Another thing that has continuously evolved is making data available through APIs. It’s not the full solution to (1), where everybody would be talking the “same language” and you could connect concepts almost “out of the box” by just adding configuration: every API you integrate needs code written specifically for it (a sketch of that problem is below). But going back to the higher-level programming languages idea, maybe it’s becoming easy enough to consume those APIs consistently that we can build that joint knowledge.
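As a sketch of that bespoke glue code (both services and their payload shapes are invented for illustration): even when two APIs expose the same kind of fact, each one needs its own adapter before the data can be joined.

```python
# Two hypothetical review APIs returning the same fact in different
# shapes; each needs hand-written adapter code to reach a common model.
from dataclasses import dataclass

@dataclass
class Review:
    place: str
    rating: float  # normalized to a 0-5 scale

def from_service_a(payload: dict) -> Review:
    # Hypothetical service A: {"name": "...", "stars": 4.5}
    return Review(place=payload["name"], rating=float(payload["stars"]))

def from_service_b(payload: dict) -> Review:
    # Hypothetical service B: {"venue": "...", "score": 90}  (0-100)
    return Review(place=payload["venue"], rating=payload["score"] / 20.0)
```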

Now we are left with the financial incentives for companies to open up their data, and the security considerations. On the security side there are some pieces of support, but doing it efficiently, with some transitive permission model, is still an open question (it points towards things like OpenID, which didn’t get much traction either). On the financial incentives, well… That’s not my area of expertise. If it weren’t so expensive to do, I’d harbor some hope that a Wikipedia-like solution (like Wikidata) would eventually happen, but I’m not so sure it can.

Maybe it will be built internally at a company to power their voice assistant and search, and then the US government will come along and force the company to break apart, and the only way to keep powering both from the same source will be to open it up for everybody to use. Yeah, wouldn’t that be cool?

Reading to learn

I’ve read a lot of different books and articles that highlight the importance of reading all the time to increase your knowledge, and how the most successful people are the ones who are constantly learning.

I’m going to agree that learning is important. But I think I have a slightly different take on what learning really is and how to do it. Actually, it’s not really novel, just something that I feel a lot of these articles forget to emphasize.

The key concept, I believe, is that we need to learn by doing. Just reading something and not putting it into use does not produce actual learning, and I don’t think it is really that useful. Otherwise, considering the hours that the world spends reading Facebook, Twitter, WhatsApp, etc., we would be a very enlightened society. I don’t think we are quite there.

One consequence is that we can’t measure learning by the quantity of books read per day/week/month. More books can actually mean the opposite, as I’m going to claim (and I’ll weaken this claim a little below) that you are probably wasting a good amount of your time skimming through important information that you will just end up forgetting and never taking advantage of.

There are a couple of exceptions to this statement that come to mind:

  1. Reading for empathy’s sake: there is a good amount of research supporting that reading fiction improves your empathy, your ability to put yourself in the shoes of somebody else (because that’s what you are doing when reading those books). For that use, actually reading is all you need to do.
  2. Reading for building connections: there are cases where you don’t really have a way to put something into practice directly, but reading can still nudge your brain to connect concepts you hadn’t connected before. You may be working through the concepts of book A, and by reading book B while continuing to work through book A’s ideas, you find yourself able to connect A to ideas in yet another book C, because B provided that mental bridge.

I think the most important thing to think about is really to spend time doing things.

Even if what you read is only fiction, spend time with it. Think about how the author develops the story or the relationships, and consider how that could help you develop the relationships around you. Maybe take some time and rewrite a chapter of the book the way you think it could have gone. I’ve also seen people spend time drawing or building things that happen in the book, or cooking the food from the book, which takes you even deeper into the empathy you get from it.

Yes, that will most certainly take time away from the time you’d otherwise spend reading. My thesis here is that this is a good thing. Quality beats quantity.

Another side of it is that it keeps making the world better in aggregate. If most of us are just sinks for knowledge, or just filters of knowledge, then knowledge is not being built as quickly as it could be if we were all producing our own dimension of it, adding a little bit of insight and connection to the world.

Background music playlist-making frustrations

From time to time I decide that I want to listen to music while working. It’s not that often, because my calendar is usually filled with meetings, so often the best I can do is listen to music at the beginning of the day.

But today I had a big block without meetings, so I decided to go for it and build new playlists of things to listen to. And that, for me, means classical music. So I went to Spotify and did some research on what to listen to. When I don’t have a specific idea in mind, I often open their “new releases” playlist and then select a couple of albums from there to listen to.

Let me step back and explain something that annoys me deeply about some (many) classical music playlists, especially the ones created by algorithms: they usually don’t include whole pieces, just a movement of one piece, and then a movement of a different piece. That’s not how I listen to classical music, and I think that’s not how anybody should listen to classical music.

So back to my method: I’ll go to each track that seems interesting and open its album to get the whole piece (and sometimes the whole album). So far so good, right? Well, not really. Thank you, Spotify! Here are some examples:

(Screenshots went here: I select an entry from the playlist… and I get a single? And then another example of the same.)

I had a lot of examples I could place here. I don’t know who is to blame for this, but some process is setting up tracks from actual albums as singles, so even if I wanted to listen to the whole piece, I can’t!

I’ve been frustrated by things like this (and by auto-generated playlists) before, and I did try some classical-music-focused services in the past, but I wasn’t excited enough about them to stick around. They were more expensive, I don’t listen that often, and they don’t offer as many integrations as Spotify (and Amazon Music), so I ended up giving up.

So, well, today I’m ending my day still frustrated.

The spiral

It’s hard not to notice how much this year has forced us to adapt. March brought us (at least here in Seattle) COVID-19 and staying at home with the kids almost all the time. We survived the kindergarten part of it; then came summer, and some sort of rhythm was built at home: going out for a walk in the morning, then starting the day. Then came first grade, which made mornings a little more complicated. Then came the smoke and bad air quality, which cancelled the morning walks completely.

But it’s fine, we adapt. Especially considering that I’m very fortunate to have the flexibility to choose my own path to adapt. I still have my income, everybody in my family is healthy, and my house is big enough that I can work while my older son is in his school Teams meeting and my wife is playing with my youngest, with nobody interrupting anybody. I can’t say the same for a very large portion of the population.

Happy 2020!

It’s 2020 already and I feel quite old… For somebody born in the 1970s (yes, towards the end of that decade), 2020 always felt so far in the future!

The 2010s were a decade with a lot of changes. Marriage, 2 kids, 2 houses purchased (and 1 sold)… They were, for the most part, additive. So it was a pretty good decade (actually, the 2000s were pretty good too, but mostly centered on moving to the US and settling down here). What I expect (or don’t expect, or hope won’t happen) in this decade:

  • Kids will continue growing. The oldest will be in high school at the end of the decade! (yikes)
  • Hopefully no more house buying or selling (but can’t be sure of that…)
  • I’ll probably still be a software developer at the end of the decade, or potentially doing more work on the management side. It’s unlikely I’ll retire or “semi-retire” (i.e. choose a lower-paying job that is more like a hobby-turned-business).
    • I will not make any statements on how many different companies I’ll have worked at through the decade, because I think that’s something that can be interpreted incorrectly. I currently have no plans to change jobs in any particular window of time.
  • Looking back at the end of the decade, the world and how we interact with it will have changed a lot. I think a lot of that will be due to climate change and the things we will have to do to slow it down. It may be gradual, and we may only be able to notice it in aggregate, but it will happen.
    • Some of it will be good: us being more aware of the resources we are using and the secondary effects of their use.
    • Some of it will be bad: many places and ecologies will be lost forever. Some activities will have to move indoors, or further inland where there will be less flooding.

I could go into technologies that I think will actually change in the 2020s, but I’m sure there are better lists on that out there.

Anyway, I’m happy we made it this far. Happy 2020!

Probability can be confusing (maybe…)

I’ll skip the long story that leads to the following statement, but I currently have a 1-year subscription to Blinkist. I think it’s a pretty good service that provides some interesting (but sometimes a little biased) summaries of nonfiction books.

So I was reading one of them, on The Drunkard’s Walk by Leonard Mlodinow, and it had the following statement:

Imagine you start rolling a dice and recording numbers. Would you expect the results to be perfectly random? If they were, each number would appear exactly once every six rolls.

I stopped and re-read this a couple of times to make sure that I had it right. Well, setting aside a pet peeve of mine about using “dice” as the singular, where it should be “die”, the problem is that this is not a definition of “perfectly random”. It’s a definition of “very biased, where every 6th roll is deterministic”. They are treating dice rolls as sampling without replacement, and that’s not how dice work: every side is still there after you roll it once.
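A quick brute-force check of how wrong that is: with a fair die (sampling with replacement), seeing each face exactly once in six rolls is actually rare, happening only about 1.5% of the time.

```python
# Enumerate all 6^6 equally likely sequences of six fair-die rolls and
# count how many contain each face exactly once.
from itertools import product

outcomes = list(product(range(1, 7), repeat=6))         # 46656 sequences
balanced = sum(len(set(seq)) == 6 for seq in outcomes)  # 720 of them

print(balanced / len(outcomes))  # 0.0154... — far from "always"
```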

To be clear, I don’t think this is a problem with the original book, but with the summary provided by Blinkist. So why did the summary author make this mistake? Well, I think it’s because probabilistic thinking is just not natural to people. The book even talks about how recently it was formalized, even for simple things like “why is it more likely to roll a 10 than a 9 when you roll 3 dice?” It’s fascinating to think about that from the other side, having studied statistics for a while.
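That 3-dice question can be settled the same brute-force way: out of the 216 equally likely ordered outcomes, 27 sum to 10 but only 25 sum to 9, which is why a 10 comes up slightly more often.

```python
# Count ordered 3-dice outcomes that sum to 9 vs. 10.
from itertools import product

counts = {9: 0, 10: 0}
for dice in product(range(1, 7), repeat=3):
    if sum(dice) in counts:
        counts[sum(dice)] += 1

print(counts)  # {9: 25, 10: 27}
```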

But don’t give up on the summary just because of a mistake like this, even if it’s a bad one. The rest is actually pretty good. It doesn’t fully convey what is in the book, but it has a good number of interesting bits that make it worth reading.

A new website

So I have a new website… You may be wondering why, considering that I rarely update it. Well, basically I decided I didn’t want to get rid of this website and, at the same time, I didn’t want to pay Squarespace prices to maintain it. So I moved to the slightly cheaper WordPress, and let’s see how it works.

Don’t get me wrong, I really like Squarespace. It was easy to use, had much more intuitive navigation and formatting, and it made my website look better without a lot of effort. So I do feel a little bad that I left, but I couldn’t convince myself to pay almost $200 a year to keep a website that I’m not using.

So, do I have plans for this website? I’ll get back to you on that. I do want to write more and share my ideas more often, but right now I’m not convinced I have the time to do it. Maybe 2020…

Classical Music Streaming

It has been observed by many people that “traditional” streaming services are terrible at classical music.

PCmag: Primephonic Wants to Save Classical Music, 1 Stream at a Time

Forbes: Meet Primephonic, The Streaming Company On A Mission To Save Classical Music

LifeHacker: The Best Classical Music Streaming Service Is Idagio

TechAcute: IDAGIO: The Streaming Service For Classical Music Fans

And they are pretty terrible. Classical music is complicated… There are lots of dimensions along which you can slice and dice the music, because of the stronger dissociation between composer and performer and the greater diversity of well-defined genres. I’m not saying that popular music doesn’t have as much diversity (although it might be true that it doesn’t), just that its taxonomy is not as well developed. Another dimension of complexity is the fact that a piece of classical music often needs to be taken as a complete unit, even though it might be split into multiple tracks (but not always, and not for everybody).

So enter the latest contenders, Primephonic and Idagio. How do they stack up?

TL;DR: they are certainly better at handling classical music than the “standard” streaming tools. But they are far from perfect, and their “imperfection” is inconsistent. IDAGIO has a big defect (at least in the iOS client, as of today): it doesn’t support gapless playback, so between tracks you get a short gap, which is very annoying. This makes Primephonic the winner in my opinion, but not by much.

I started a trial on each and decided to run some comparisons based on how I would use them. Here are the use cases that I tested:

  1. Query for a standard piece with an expected high number of recordings

  2. Query for a standard piece with fewer recordings

  3. Query for a series of pieces with a low number of recordings

  4. Query for an ensemble

I also discuss a couple of features related to listening to long pieces, making playlists, and discovery.

I did not go into sound quality. That would require a fair amount of time in an environment where I could actually judge it, so I just accepted what they advertise (Primephonic 24-bit FLAC and IDAGIO 16-bit FLAC) and assumed that, if you pay extra for lossless (both charge extra for it), you will get reasonably equivalent quality, limited more by the source recording than by the streaming format.

1. Query for a standard piece

beethoven string quartet 14

beethoven string quartet 131

(refers to Ludwig van Beethoven’s String Quartet No. 14 in C# minor op. 131)

IDAGIO

“131” returns two works

  • String Quartet No. 14 in C sharp minor op. 131 (arr. for Piano) – single recording

  • String Quartet No. 14 in C sharp minor op. 131 – 53 recordings

Oddly, while I was still typing and had only “beethoven string quartet 13”, it showed another work:

  • String Quartet No. 14 in C sharp minor op. 131 (Arr. for String Orchestra) – 5 recordings

“14” returns:

  • String Quartet No. 14 in C sharp minor op. 131 (arr. for Piano) – single recording

  • Sonata for Piano No. 9 in E major op. 14/1 (Version for String Quartet)

This inconsistency prepared me for what was coming in the next searches.

Primephonic

“131” returns 0 works! But quite a few albums. That’s what made me decide to also try “14”.

“14” returns 1 work:

  • String Quartet No. 14 in C-sharp minor – 93 recordings, but when you click on it, it shows only 82 recordings, mixing orchestral and quartet versions. It had a lot of duplicates, but it seems to have more recordings than IDAGIO.

2. Query for a standard piece with fewer recordings

“part festina lente”

for Arvo Pärt’s Festina Lente

IDAGIO

Not marked as a “work”, but returns “Festina Lente, for Harp and String Orchestra (1986, rev. 1990)” – 5 recordings

Primephonic

“Festina lente” – 8 recordings

Interestingly, the two services’ recordings have very little overlap.

3. Query for a series of pieces with a low number of recordings

“bachianas”

Heitor Villa-Lobos “Bachianas Brasileiras No. 1-9”

IDAGIO

  • Shows (out of order) Bachianas Brasileiras 1-8 with dates

  • At the bottom, it has 1 & 5 repeated (twice)

Primephonic

  • Shows (out of order) Bachianas Brasileiras 1-9, but missing 6

  • Repeated 4 & 5

4. Query for an ensemble

“roomful of teeth”

IDAGIO

  • Only had one match, for Berio’s Sinfonia

Primephonic

  • 3 albums

  • Does have the Berio recording above, but it doesn’t show up in the search results (although it is in the metadata)

Other use flows

Recordings

They both have a concept of a “recording”, which pairs a work with a specific performance of it and lets you easily add all movements of a single piece to a playlist. That does sometimes make it a little confusing to know what you are looking at while navigating. Moreover, IDAGIO has a concept of a “collection”, and you can add “tracks”, “recordings”, and “albums” to it. Some albums contain a single work, so sometimes you may think you are adding a “recording”, but it ends up under “albums” because you were actually in the album view.

I think it’s a necessary concept; I just don’t think they’ve cracked the UI side of it.
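To make the confusion concrete, here is roughly the entity model as I picture it; this is my guess, for illustration, not either service’s actual schema.

```python
# My mental model of the catalog entities (a guess, not the services'
# actual schema): a Recording ties one Work to an ordered list of
# Tracks, while an Album is a list of Tracks that may span one
# Recording or several.
from dataclasses import dataclass

@dataclass
class Work:
    composer: str
    title: str                # e.g. "String Quartet No. 14 in C-sharp minor"

@dataclass
class Track:
    title: str                # usually a single movement
    duration_seconds: int

@dataclass
class Recording:
    work: Work
    performers: list[str]
    tracks: list[Track]       # all movements of the work, in order

@dataclass
class Album:
    title: str
    tracks: list[Track]       # may hold one recording's tracks, or many

def add_recording(playlist: list[Track], recording: Recording) -> None:
    """Adding a 'recording' adds every movement at once, in order."""
    playlist.extend(recording.tracks)
```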

Playlists

I have this odd use case of creating a “today” playlist, where I accumulate the pieces that I want to listen to today. “Today” doesn’t mean I’ll manage to listen to all those pieces in a single day, and that’s where I have an unsolved issue: I’d love to know the last piece I listened to on my playlist. Spotify (before my account started being shared with other people in my household) was pretty good at showing me where I left off, so, unless I had switched to another playlist or album, I could just continue from there. Neither IDAGIO nor Primephonic seems to do that. Often when I open their apps, I start from scratch and have to figure out where I was. I have a similar note below on playback.

Beyond that, playlists work as expected. You can add tracks, albums, or recordings to them. But if you are on an album and you want to add a recording from that album (i.e., multiple tracks at once for the same work), you first have to navigate to the “recording” view for those tracks and then add them to your playlist.

Playback

This is probably IDAGIO’s weakest point, and what made me decide that today (when I tested it) they are not the winner: no gapless playback. That’s a big sin, especially for classical music, where many works have multiple movements that sometimes, by the composer’s design, have no break between them. On playback, IDAGIO adds one.

Beyond that, I also felt that IDAGIO took a little longer to start playing than Primephonic.

Finally, related to the “starting from scratch” I mentioned under playlists, I can never get either service to autoplay when a new Bluetooth connection starts. For example, my car connects to my phone over Bluetooth and I play music through the car’s sound system, like in most cars today. Let’s say I select something on IDAGIO or Primephonic to play. Then I park the car, go get something, and when I get back I expect Bluetooth to reconnect and resume where I was, but that never worked in my tests for either of the two services. Much like the “start from scratch” issue, they just don’t know what to play; I have to go to the app and start playback from there.

Other Notes

IDAGIO definitely has the most features of the two:

  • Music by mood

  • A good “music by instrument” feature, with a very large selection of instruments

  • A bigger selection of pre-built playlists

But I don’t think that’s enough to make it the winner. In technical terms, though, gapless playback is probably a smaller feature to build than the ones above, so it might not take very long for IDAGIO to become my recommendation.

Ph.D. flashback of the day: scale-free networks

I was reading an article today from Quanta Magazine: Scant Evidence of Power Laws Found in Real-World Networks, which refers to an article posted on arXiv: Scale-free Networks are Rare by Anna D. Broido and Aaron Clauset. And that gave me flashbacks to my Ph.D…

While scale-free/power-law-distributed networks weren’t really the main focus of my research, they did influence what I was doing, which related to graph-structured databases, where a lot of that structure exists and affects the scalability of your analysis. More importantly, they influenced a collaboration I had with another researcher in the same department, Steve Morris. His real interest was power-law-distributed networks, and his belief was that there was signal to be observed when a network deviated from being power-law distributed.

During one summer, we sat together and observed that the way researchers were claiming that everything was power-law distributed was to plot the data on a log-log scale and draw a line through the points. We hypothesized that this was not a very good way to show that the data follows the distribution, and that there should be an actual statistical test. My proposal was to use bootstrapping and the Kolmogorov–Smirnov test. So we co-wrote a paper on it: Problems with Fitting to the Power-Law Distribution.
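For the curious, here is a minimal sketch of the general idea (not the exact procedure from our paper): fit the exponent by maximum likelihood, measure the Kolmogorov–Smirnov distance between the data and the fitted distribution, then bootstrap synthetic samples from that fit to see how unusual the observed distance is.

```python
# Sketch of a KS + bootstrap goodness-of-fit test for a continuous
# power law with a known xmin. This follows the general recipe, not
# our paper's exact procedure.
import numpy as np

rng = np.random.default_rng(0)

def fit_alpha(x, xmin):
    """Maximum-likelihood exponent of a continuous power law."""
    return 1.0 + len(x) / np.sum(np.log(x / xmin))

def ks_stat(x, xmin, alpha):
    """Max distance between the empirical CDF and the fitted CDF."""
    x = np.sort(x)
    fitted_cdf = 1.0 - (x / xmin) ** (1.0 - alpha)
    empirical_cdf = np.arange(1, len(x) + 1) / len(x)
    return np.max(np.abs(empirical_cdf - fitted_cdf))

def sample_power_law(n, xmin, alpha):
    """Inverse-transform sampling from the fitted power law."""
    u = 1.0 - rng.random(n)  # uniform in (0, 1]
    return xmin * u ** (-1.0 / (alpha - 1.0))

def bootstrap_p_value(x, xmin, n_boot=1000):
    alpha = fit_alpha(x, xmin)
    observed = ks_stat(x, xmin, alpha)
    hits = 0
    for _ in range(n_boot):
        synth = sample_power_law(len(x), xmin, alpha)
        if ks_stat(synth, xmin, fit_alpha(synth, xmin)) >= observed:
            hits += 1
    return hits / n_boot  # a small p-value rejects the power law

# Data drawn from a true power law should usually not be rejected.
data = sample_power_law(500, xmin=1.0, alpha=2.5)
print(bootstrap_p_value(data, xmin=1.0))
```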

We didn’t have as much data to play with as the paper I mentioned above, but we also concluded that almost nothing was actually power-law distributed. And to this date, it’s my most-cited paper (622 citations, according to Google Scholar).

We were onto something back then… Oh, well…

AI stigma

The other day I was noticing how many things out there tout using AI to solve problems, and that’s a great thing. It gave me flashbacks to when I started at Amazon, in 2004. Back then I was just finishing my Ph.D. in a machine learning area, dealing with feature extraction on graph-structured databases, and keeping a “fond” eye on the future of the web, the Semantic Web, where computers would be able to interact with the web as well as humans could; that would be the turning point for what we imagined as AI back then.

But we didn’t call it AI. AI wasn’t seen as a very good term. We talked about expert systems, or sometimes we did mention machine learning. But AI evoked two negative reactions:

  1. For the general population, AI was 2001’s HAL 9000 or the Terminator: a robot/computer that was going to kill us all.
  2. For the research population, and especially the professional one, AI was that dream people had in the past that just didn’t work. “Expert systems”, on the other hand, were showing some great results (i.e., very bounded applications of algorithms developed for, or inspired by, this “AI” field).

That even caused some awkward relationships at work. My manager at the time was a former IBMer who had worked on the AI team at IBM. A lot of people around me disregarded some of the work we were doing because it was based on the ideas of this “AI guy”, so it wasn’t going to work.

Fast-forward to today, and AI is everywhere. And perspectives have changed:

  1. For the general population, the biggest fear is no longer that it will kill us all, but that it will get rid of all the jobs: a much more sensible fear, based on things that go beyond Hollywood.
  2. In the research field, there is still some reluctance to call it AI, as we are not at the “AI” imagined in the 80s and 90s, the one that failed miserably. We are just at slightly less specific “expert systems” (or really, “expert systems” that can also learn across applications, but are still applied to specific ones).
  3. In the professional field, people want to say they do AI to signal that they are at the edge of R&D and are not going to be one of those companies replaced by another company’s AIs. Yes, just like humans are afraid of being replaced by AIs, companies are afraid of the same risk.

So I think we are at a more sensible place. I personally don’t like that people now call any machine learning system AI, but maybe that softens the expectations of where we are going, making us forget a little bit that goal of a human-like machine that can out-think us. I can live with that!