Impressions from AI NEXTCon 2018

Today I went to AI NEXTCon, partly as a recruiting effort (Sift had a table there) and partly to go to the talks and see what people are talking about. It’s a fairly small conference and today was day 2 with 3 keynote speeches and two talk breakout sessions with 3-4 talks each.

It’s not a cheap conference. I think that attendance costs around $250/day. There are two conference days and two workshop days. So who attends those conferences? It was a very mixed bunch, actually. I saw some people that were fresh out of college looking for a job or to be inspired by machine learning. I saw some people that were there to sell their services (not only the ones that had a booth). There were some software professionals that were not in the AI field but curious to get into it. And there were actual AI professionals from many companies trying to see what is going on. And I think this last class is the one that was probably the most disappointed.

Talks were around 50 minutes long, so pretty long. That actually reduced the quality of the talks, in my opinion. Presenters either tried to cover a lot of ground and gave no examples, or tried to give “real” examples and ended up wasting time talking through code that probably went way too quickly to be really understood, but still took time. But I’m not going to mention names.

The only talk that I want to highlight was a talk by Amy Unruh, from Google. In that talk, she presented Google’s new AutoML. It’s still early stages for it, but I think it is a great new direction where ML-as-a-service should be going. I give some company my data and they give me a model that internally is trained with more data than the one I gave them. Hopefully that is done through transfer learning, but maybe there are other tricks that might work in fields where there is no reliable transfer learning solution.

I don’t think Google has the right product for it yet. I think there are some knobs that need to be provided for customers to do things like balancing errors between classes and other things like that, but it does have some good features:

  • Automatic separation of test set and visualization of quality in a lot of different useful ways
  • Support for asking Google to come up with people to manually label the data (apparently this is going to be staffed by Google employees/internal contractors)

The reason why I think it’s the future of ML-as-a-service is because this is where the value really is and scales. The pre-trained models are nice, but they are always hard to use in real life. The classes are either too granular, granular in the wrong places, or just not granular enough. Also it has sometimes puzzling errors (the example in the presentation above had actually a label that repeated twice with different scores – unfortunately it’s covered in the PDF version of the presentation). So you probably want to focus on the labels you care about for your application and bias training there.

If companies can provide you with a way to get the labels you want without having to spend the money to acquire the amount of data necessary to train a strong model, that’s the most sticky feature that you can provide people, and still give them what they want. I hope to see more on it soon. Image classification is an “easy” field for it (the problem is pretty well-defined and the inputs are consistent across domains, making it easier to get transfer learning to work). I will cheer more when companies start providing solutions for other fields, like many NLP tasks. Certainly something to keep an eye out for. Great start, Google!

The economics of personal data on the Internet

I was reading an article in The Economist:

Should internet firms pay for the data users currently give away?

It is an analysis of the paper “Should We Treat Data as Labor? Moving Beyond ‘Free’” (although the article never mentions where it’s from).

While I have not read the original paper yet, here are my thoughts on the subject, which I also posted as a comment on the article. Note that this is a reply to somebody saying that we are already being paid for this data because we are getting things for free that we used to pay for in the past (like GPS).

I completely agree that we are already implicitly paying for those services by providing our data. And that should be highlighted in the paper (it might be, but there were no references to the actual paper in the article).

I think the component that would make things more interesting is if we can make that explicit and then allow people to opt out of it (and then have to pay for the service). Interesting, but I don’t think it would be helpful at the end of the day. Data gathering from users is unbalanced. Some users give a lot of data because they use the service a lot, or have a device that is always collecting the data. And some users don’t give much data at all and don’t have the opportunity to do so. That means that the internet will become more expensive for some users, who might end up not being able to afford access, with all the negative things that entails.

I think there are many other dimensions that might be more interesting to investigate from a legal perspective (some of them technically challenging): data portability (whatever I give to Google, I can ask them to export to GoogleNext so that they can be competitive and provide me personalized information), data sharing visibility (who gets that data and what are people outside the company allowed to query on), and data security (guarantees on what is available from a person’s raw data). That would help foster more competition while keeping collected data transparent and safe, and that’s better than just going the route of adding more complexity to our interactions with the internet.

Economics is hard and I don’t really have enough background to say that the recommendation from the article is a bad one, but it feels like we are attacking the problem from the wrong direction.

At some point in the past I was asked to join a team that was handling ads on the Amazon website. My first reaction was that I was fundamentally opposed to the concept because we are muddling the relationship with customers by effectively taking something that we get from the customer (their page view) and giving a piece of it to a different company to profit from. But the reasoning was that this allowed Amazon to monetize those page views more effectively which, in turn, would allow for lower prices on the Amazon website.

That made me think about whether you can get the same effective result without the feeling that we are cheating customers by showing them something other than what they asked for (an unbiased view that only contains the things that they are searching/browsing on the website). And I wasn’t able to really come up with anything better.

No, I didn’t end up going to the ads team. I decided to join a different initiative within Kindle and my path never crossed again with an opportunity doing ads, so I didn’t have to think about it anymore. Correct monetization of things is very complex, so we sometimes have to accept sub-optimal experiences in order for things to move on and, in aggregate, for you to have a better experience.

6 months away from Amazon… How does it feel?

As I’ve mentioned before, after 12.5 years at Amazon, I decided to move on and do something different. You may ask first why I left Amazon. I’ll get to that some other time. I want to start with how life is outside Amazon. More specifically, how life is at a small, mid-aged startup (6 years, past Series C) where the headquarters is in a different city from where I’m working, and most people in the company have been doing software for less time than I have. That’s Sift Science, if you don’t quite know what I’m talking about.

TL;DR: it’s great! But this answer has quite a few dimensions to it:

  1. People: Sift is made of a lot of genuine people, open to feedback and learning. Amazon had a lot of those too, but for the most part Amazon’s culture focused the attention on people that were aggressive owners: they cared a lot about what they were trying to deliver and that sometimes created a little bit of us vs. them behavior.
  2. Visibility: Sift’s business is “simpler” and way more transparent to all employees than Amazon’s. Part of it is some level of maturity of the processes and staffing that Amazon has that led to a more complicated interaction between the parts of the business. And part of it is also the size of the product and the required siloing of the decision process. But, for the most part, there is some institutional distrust of people having too much information and that information potentially leaking to competitors. Back to the “people” aspect above, there is no distrust anywhere, so information flows everywhere (sometimes a little too freely, as I can see all closed deals, and ones that fell through).
  3. Software Maturity: that’s where probably there is a lot more to do at Sift than there was at Amazon. That’s expected for a fairly new company with reasonably young software developers. While the overall landscape of knowledge around availability, scalability and development best practices in the software industry has improved tremendously in the last decade, there are some components of the way software is being written that remind me of Amazon 12 years ago: monolithic structures focused on standardization and code reuse with the cost of complexity and unexpected dependencies (e.g. a table around metrics was broken and that prevented work on a job that only did a backfill for ip-to-geo mapping data).
  4. Problem space: this one is a little hard to compare. At Amazon I was either working on projects that affected other teams (e.g. catalog projects to support category launches, or website feature launches), or things that directly affected customers (Amazon Go). At Sift I’m helping other companies scale their operations by externalizing the concerns about fraud detection. While we have the ability to have very close contact with those customers, in the end the impact to their business is only something we can guess or approximate. So measuring impact can be a little harder. Qualitatively, though, it’s a very important problem space, probably more important than any project that I’ve ever worked on at Amazon (unless Amazon Go actually takes off and causes a change in the way physical retail works; no signs of that yet). Getting feedback from some of our customers saying that they were only able to expand to different geographies because they trusted Sift to detect fraud they were not experienced with, letting them keep their focus on the actual “positive side” of their business, is invigorating, and it is not rare feedback to receive.

I’m not going to deny that Amazon is an amazing company. It is honed with building processes and investments necessary to deliver things and learn from it. If that’s what you like to do and you are ready to sell your soul to get things done, I strongly recommend Amazon. Yes, it’s a large company, which means that there are pockets of everything, but in general it’s a relentless environment focused on business optimization and delivery.

Sift is not that. It is passionate, quirky, but in general sees problems in a more software-centric route. There is no strong business and process culture, but a culture of looking at the software that we’ve developed and the models that we’ve created, and making them better. That sometimes means trying some things that don’t necessarily take you anywhere, but there is no feeling of loss when that happens. We just try something else.

I don’t think I can claim right now that one is better or worse. It just opened my eyes to doing things differently. Internally I’m still struggling with the adjustment to it, but I can’t deny that we have very happy customers at the end of the day, so it’s certainly not the wrong way to do things.

Maybe I’ll checkpoint my thought process again in another 6 months!


To-do lists, calendars, emails to self, etc.

I was reading the article “Why Calendars are More Effective than To Do Lists” by Srinivas Rao and that reminded me of what I just wrote about how I was able to improve my productivity by using a few tools, including to-do lists. So I decided to think a little bit more about it and give my perspective on it.

Let me take an abstract approach first and then I’ll try to map it to tools:

  1. Write things down in a place that is easy to write to and read from
  2. Establish a routine of doing work based on what is written on the aforementioned place with daily/weekly/whatever-frequency objectives
  3. Build artificial scarcity to help prioritize long term goals
  4. Be as rigid as you can be, but make sure that your tool easily supports flexibility where you can be flexible (e.g. you want to water the plants once a week, it doesn’t have to be stuck to every Monday just because that’s what you can schedule on your tool)
  5. Use recurring tasks as much as possible. They prevent you from having to think about the task twice every time you are going to do it: the first time to write it down and the second time to do it.
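
Points 4 and 5 can be illustrated with a few lines of Python. This is only a sketch under my own assumptions, not how any particular tool works: the `RecurringTask` class and its fields are hypothetical names, and the idea is simply that a recurring task becomes due some number of days after it was last completed rather than being pinned to a weekday.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class RecurringTask:
    """A task that repeats relative to when it was last done (hypothetical model)."""
    name: str
    every_days: int
    last_done: date

    def next_due(self) -> date:
        # The due date floats with the last completion instead of a fixed weekday.
        return self.last_done + timedelta(days=self.every_days)

    def is_due(self, today: date) -> bool:
        return today >= self.next_due()

    def complete(self, today: date) -> None:
        # Completing the task reschedules it; nothing has to be written down again.
        self.last_done = today


plants = RecurringTask("Water the plants", every_days=7, last_done=date(2018, 4, 2))
print(plants.is_due(date(2018, 4, 8)))  # six days after the last watering
print(plants.is_due(date(2018, 4, 9)))  # seven days after the last watering
plants.complete(date(2018, 4, 11))      # actually done two days late
print(plants.next_due())                # the next due date moves with it
```

The design choice here is that the schedule follows what actually happened: watering two days late pushes the next watering out by two days instead of snapping back to a fixed day of the week.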

So now let’s think about common tools people use for doing this:

  1. Calendars
  2. To-Do lists
  3. Email inbox
  4. Paper notepad
  5. Sticky notes

So let’s go one at a time:

Calendar (the digital type)

It’s a great contender. You usually have it close to you at all times, and it’s where you track the things that have specific time to happen (like meetings), so it would make sense to be the place to look at things to do when you don’t have scheduled things. Also calendars have been great at doing recurring events (every day, every Tuesday, etc.).

However, calendars are not good at “fuzzy” scheduling. They don’t provide prioritization for things that need to happen. They also don’t allow for vague deadlines, like within the next couple of days. Finally, calendars are very day or week oriented, while things to do are actually oriented towards the now and the near future. So having a good view of what your availability is in one place is not that easy.

To-Do Lists

Those are great for providing you a view of specific tasks that need to be done, and they allow you to quickly prioritize, mark done, and postpone. They give you a clear list of what needs to be done now and in the future and don’t really “sweat” about the past. Also, some to-do lists support recurring tasks.

On the other hand, they are not very good at supporting the “rigid” part of a person’s day. Meetings are still on a calendar, forcing you to look at two different places in order to decide what to do next. To-do lists also don’t necessarily integrate very well with other tools (like email, to link you directly to the email you need to reply to, or the document you want to finish), forcing you to do a lot of extra work to do work.

Email inbox

For people whose lives revolve around email, having an email represent a task and using something like Google’s Inbox, which allows you to “snooze” emails for a day, lets you postpone tasks and not lose track of them. Also, email provides you with good integration and plenty of space to add as much information as you’d want (including images).

On the other hand, email clients are not very good at allowing you to easily prioritize things in your inbox. They tend to be time-sorted and that’s it. Also, like to-do lists, they don’t capture the rigid part of your day (calendar). Emails don’t support recurring “emails”. Finally, there are a lot of distractions that arrive in your inbox, which is out of your control, so cleaning your inbox gets added to the job of planning your things to do. You can have an independent inbox just for it, but that can be harder to manage.

Paper notepad

Paper notepads are very fast for entering new ideas. They can be textual or diagram-based. They also work when your phone has no battery.

On the other hand, you need to carry them around along with a pen, while, in general, it can be taken for granted that you will have your phone around. Re-prioritizing things is difficult. There is no support for recurring events. Moving events to the next day means scratching the event from one page and moving it to the next (or something like that). They don’t integrate with anything else, so you will find yourself doing a lot of moving around between environments. Searchability is an issue, so you have to keep your list short and focused only on the things to be done in the next couple of days.

Sticky notes

They provide the same advantages as notepads, plus the ability to quickly reprioritize things. They allow you to color-code tasks, giving some idea of context/priority. They are a very powerful visual solution for keeping track of things (that’s why they are used by a lot of project planning approaches).

On the other hand, it works best with a board, which makes it less mobile. Also it has the searchability issue. It has some level of recurring approach, but you need to just move the sticky note somewhere else when done.


Overall, I actually don’t think there is a best solution. I think it depends a lot on how rigid your calendar is. If your day is 80% scheduled meetings, keeping things all on the calendar might be very advantageous. However, if your day is only 10% meetings and your tasks are less scheduled with pressing deadlines, then maybe a to-do list might work best for you.

None of them exactly handles artificial scarcity. Maybe there is some scarcity around how many hours you have in a day (real scarcity) or how much you can see on a page, but that’s very limiting and doesn’t give you a two-directional approach (you cannot do task Y until you do X). For that I actually use a database, but I wish I could solve it all with a single tool.
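
To make the “cannot do task Y until you do X” idea concrete, here is a minimal Python sketch. It is not the database setup I mentioned, just an illustration: the `Task` class, the `actionable` function, and the `max_active` cap are all hypothetical names. The cap plays the role of artificial scarcity, and the `blocked_by` list captures the dependency between tasks.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Task:
    """A to-do item that may be blocked by other tasks (hypothetical model)."""
    name: str
    blocked_by: list[str] = field(default_factory=list)
    done: bool = False


def actionable(tasks: dict[str, Task], max_active: int = 3) -> list[str]:
    """Return at most max_active unfinished tasks whose blockers are all done."""
    ready = [
        t.name
        for t in tasks.values()
        if not t.done and all(tasks[b].done for b in t.blocked_by)
    ]
    # The cap is the artificial scarcity: only a few tasks are visible at once.
    return ready[:max_active]


tasks = {
    t.name: t
    for t in [
        Task("X"),
        Task("Y", blocked_by=["X"]),  # Y cannot start until X is done
        Task("Z"),
    ]
}

first = actionable(tasks, max_active=2)   # Y is hidden while X is open
tasks["X"].done = True
second = actionable(tasks, max_active=2)  # Y shows up once X is done
print(first, second)
```

Tasks are offered in the order they were entered, so putting the long-term goals first is what makes the cap work as a prioritization tool.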

Maybe one day I’ll take on this project (again) and see what I can develop. And attitudes like this are what make me agree with the theory that everybody needs to learn to write software. Everybody can have ideas about what works best for them that would never really be translated exactly to software that is available out-of-the-box. This is because software that can adapt to every possible user’s needs would just be too complicated to use. That is, until we figure out how to use AI tools to simplify software use. Yes… there goes another project idea…

Maybe back here again?

I’ve been going through a phase of challenging myself to do things that I haven’t been doing often and, oddly enough, it has been working pretty well. I’ve been going to the gym twice a week (well, it’s been a week now that I haven’t gone, but it’s because I decided to hurt myself doing something yet unknown, probably gym-related), working on core strength, improving the health of my back, reading more books and scientific articles, and working on projects at home.

I think the main trick that I’ve been using is based on the following logic:

  1. Keep things simple and consistent
  2. Set recurring goals with some level of adjustment
  3. Track them (I’m using both a to-do list with fairly powerful capabilities for setting recurring to-dos, Todoist; and a web-based database, Airtable)
  4. Compensate for work done – basically I get to watch 2 hours of some TV show/movie for every hour that I go to the gym for

Unfortunately I still have more things that I want to change! Things are right now very inward-focused (either just me, or my immediate family, or my work). The next step is to expand my horizons, make sure that I don’t forget important friendships. I just need to figure out how to keep to the same logic and accomplish those things. Challenges are exciting!

Anyway, what does that have to do with blogging? We’ll see… I’m actually working now at a company that is not very strict about what I can and can’t write about, like Amazon was, so that’s one of my outward-focused things that I’ll try to cultivate. First thing probably will be a blog post about being away from Amazon for 6 months and how that feels. Soon!

The amazing power of Comcast

A week ago or so I received a robo call from Comcast/Xfinity saying that my current cable modem was too old and would not support the speed improvements that they were making to their network. But I was eligible for a free upgrade and I had to reply to some mail that I was going to receive or go to some website for more information.

When I receive a message implying that my internet could be faster, of course I comply, so I requested a new cable modem. And that cable modem arrived yesterday.

Before I installed it, I decided to do a speed check and then compare with the speed of the new modem. Surprise: nothing changed! I still have 30Mbps down and 6Mbps up after installing the new modem (which, by the way, is about double the size of the old one). So, besides the size, what is new? A couple of “Trojan Horse” things:

  1. Support for their phone service: if I decide to use the Xfinity phone service, I don’t need a new box, they just need to activate it and I connect a phone to the back of it and I’m good to go.
  2. Expansion of their “free” WiFi system: basically everybody that has their router gets an “xfinitywifi” network. I actually don’t care too much about it. I do believe that they could have gotten the technology right and created an isolated network that does not use the same IP that I have and will not affect my bandwidth much. My concerns with it are:
    1. It adds even more WiFi networks around me – see the list below
    2. It’s not trivial to turn it off. I can’t even turn off the WiFi that comes with it to use internally. I already have WiFi at home and I spent a lot of time having to expand my network to put the WiFi in a place that the whole house is able to work, and that’s not anywhere close to where the cable modem is.

Thanks, Comcast…

Thinking of the new Squarespace 7

It’s actually interesting what is going on with user interfaces… Basically little by little, everything migrates to WYSIWYG-style. Squarespace 6 was a step in that direction but created a very strange environment in which you could edit the site “inline” or on a different UI. And some things would be editable in one place, others in another (e.g. sidebars would be configured in one place (show/hide/left-right), and populated in another). It was very strange, but, at the same time, it would make it cleaner to see the preview without a lot of menus appearing when you mouse over things, etc.

Now Squarespace moved to their version 7 and pretty much got rid of the non-inline editing mode. Now all my edits happen directly on the preview. It’s pretty cool as a technology, but it does bring an interesting set of challenges. For example, on the SJC website if I hover on the menu on top in edit mode, there is an overlay of “Navigation | Edit” that actually covers part of my menu! Also sometimes my mouse is hovering on something and I don’t notice and suddenly there is “extra content” on my page that I didn’t expect.

But it does streamline editing. I haven’t played with it that much, but I think it’s a step in the right direction. The most important thing that they did right this time, that they couldn’t do with the Squarespace 5 to 6 transition, is that it’s a feature that I can turn on for my website and not a matter of redoing the whole website as they required for the previous transition. Great job, Squarespace!

New year!

It’s now 5775, huh? It’s amazing how a year can go by that quickly. 5774 was an amazing year. A lot of stress, a lot of learning, a lot of changes. 5775 is likely to be completely different from any other year. Work will still be there and still be busy. Outside that, nothing will stay the same. But that’s a good thing! Looking forward to it.

Besides that, I don’t really have anything new to report. A lot of ideas going through my head right now, but no way to actually act on any of them. It’s probably the sleep deprivation causing my brain to go into an overdrive of sorts. Probably they are all really bad ideas, which is something that also happens when you are sleep-deprived. Anyway, Shanah Tovah everybody! And ready for a week of reflecting about my past year. I think this one is going to be easier than the last couple.

Funny blog comment spam

Apparently I still have old blogs lying around out there that allow people to spam the comments. Most of the time those spam messages are boring. I get things like “I really like your post. You should check my blog”. But this one, while in the same class, was funny because of a number of things. Before I enumerate them, though, let me paste the comment (links removed to not drive people to and from their site, as I don’t know what it is):


Thai recipes commented on Challenged by real-world ontologies – recipes

One of my apparently never-ending projects that I’ve spent a lot of time thinking about lately is how to build a system to …

Have you ever thought about creating an ebook or guest authoring on other blogs?
I have a blog based on the same topics yoou discuss and would love tto have you share some stories/information. I know
my readers would apprecioate yiur work. If you’re even remotely interested, feel free to shoott me an e mail.

So, what makes it funny in my opinion:

  1. Spelling: if you want to try to get somebody to guest author in your blog, or something like that, make sure you are a good writer so that people want to be “seen” with your posts.
  2. Topic: my blog post is about ontologies about recipes and not recipes themselves. The “person” that commented comes from a blog called “Thai recipes”, which doesn’t seem to be very related.
  3. Lack of specificity: if you are trying to convince somebody to join you, you should be a little bit more specific about how you think each side can help the other.
  4. ebook? I didn’t get the reference to writing one. Why would I be flattered if somebody asks me if I want to create an ebook?

Anyway, I don’t even know my password to access that blog anymore (probably I could recover it if I really wanted to), so I don’t plan on doing anything else about it.

I should get back to thinking about recipe ontologies, though. It was a great source of entertainment. I just need to first find the time. Today I did have time, but it was spent dealing with my backlog at work from my 3-week paternity leave/vacation.