One of the things I’ve been poking around with recently is a hosting solution for APIs on this site (AKA the reverse proxy I’ve mentioned before). I’ve seen things that I’ve liked (mostly Sandstorm), and while I’m certainly going to take ideas from a bunch of other places, it looks like I’m gonna have to do my own. Again. Which I totally don’t regret at all.
But, that’s not what this post is about. Once I’d decided I needed to do my own, I realized that I needed to figure out authn and authz for any system I’d build. And because there are other things I’m planning on that could use those same systems, I should build it as a separate thing.
So I did. But what I built was… rudimentary. It works, but it’s an outgrowth of the work I’ve done so far for Prose (see “The Prose Problem” I, II, and III), and it shows. It’s basically a login form that gives you a valet key (a bearer token that means pretty much nothing other than “I’ve logged in”), and while that works for now, it’s pretty “meh” for functionality and concerning for security.
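To make the "valet key" point concrete, the rudimentary scheme amounts to something like the sketch below. This is not my actual code; every name here (the token store, the `check_credentials` callback) is invented for illustration, and a real system would persist and expire tokens.

```python
import secrets
import time

# In-memory token store; a real system would persist and expire these.
TOKENS = {}

def log_in(username, password, check_credentials):
    """Issue a 'valet key': an opaque bearer token that means nothing
    beyond 'someone logged in' -- no scopes, no claims."""
    if not check_credentials(username, password):
        return None
    token = secrets.token_urlsafe(32)
    TOKENS[token] = {"user": username, "issued": time.time()}
    return token

def authenticate(token):
    # Bearer semantics: whoever presents the token is treated as the user.
    entry = TOKENS.get(token)
    return entry["user"] if entry else None
```

The security concern falls straight out of that last comment: the token is the whole credential, so leaking it leaks the account.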
So I decided to research what’s out there for authn specifically, though authz came up a lot (and boy is that a little concerning). And I got demoralized. This was a while ago, so I decided last week to look into it again. And I got demoralized. Again.
This time, though, I’m not gonna let it beat me, so you get to see what I see, and hopefully we’ll all be a little better off for it. Or at the very least, I get to spread my misery around.
So, let’s start at the start. When I began my research, I had a vague recollection from years ago of being able to log in to all sorts of sites using livejournal. I’d never used it, but I’d made note, and it seemed like a place to start. And it was, and that place was called OpenID 1.1.
OpenID is a fairly simple protocol (by comparison to the other options, mostly) whereby a user can prove that they own a URL without giving out any privileged information (e.g., login credentials) to the site they’re proving it to. And that’s all it does: the provider (the site that the URL is part of) gives the relying party (standards-speak for the site you’re trying to log in to) a yes|no response, and the RP (relying party) gets no information other than the yes|no and the URL.
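The RP side of that exchange can be sketched roughly as follows. This follows the spec's "dumb mode" (no shared association), where the RP confirms the provider's signed response with a direct back-channel call; the URL handling and parameters are heavily simplified, and `post_to_provider` is a stand-in for an HTTP POST.

```python
from urllib.parse import urlencode

def begin_login(claimed_url, provider_endpoint, return_to):
    """Step 1: redirect the user to their provider with a checkid request."""
    params = {
        "openid.mode": "checkid_setup",
        "openid.identity": claimed_url,
        "openid.return_to": return_to,
    }
    return provider_endpoint + "?" + urlencode(params)

def verify_response(params, post_to_provider):
    """Step 2: the provider redirected the user back; confirm the response
    by asking the provider directly (check_authentication).  The RP learns
    only yes/no plus the URL -- no credentials ever change hands."""
    if params.get("openid.mode") != "id_res":
        return None
    check = dict(params, **{"openid.mode": "check_authentication"})
    body = post_to_provider(check)        # direct POST, not via the browser
    if "is_valid:true" in body:
        return params["openid.identity"]  # the URL the user proved they own
    return None
```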
OpenID has its warts. 1.1 specifically allowed for cleartext communication, and all that entails, and the secure option was difficult for web developers to implement, so they didn’t. 2.0 seems to have gotten a case of XML, though I haven’t looked into it enough to see how far the complexity grew. All said, though, it doesn’t seem like it should be fatally crippled when used with TLS.
So why isn’t it used? Well, it is. But even at its heyday, it never took over the world. Why not? I’m not sure, but there are some obvious possibilities. First off, the protocol, when done right, seems reasonably secure, which means that it’s hard* to implement. As far as I can tell, there’s no open-source reference implementation, which no doubt hindered adoption. The standard is written pretty much in RFC style, which helps pretty much no one actually implement the damn thing. All together, though? I’m still not sure why, beyond the fact that people seemed to dislike it. I wasn’t active in the web space at the time, and I’ve never worked with OpenID, so I can’t really comment further.
It’s largely irrelevant now, though, because the latest form of OpenID is Connect, which is based on OAuth 2.0.
Oh, OAuth. My first real introduction to you was the editor of 2.0 ragequitting, so my first impressions were… not good. But, being the nerd that I am, I’ve followed along and read up on the old version and the story along the way. It’s an interesting tale, but also kinda tragic.
OAuth 1.0 seems to have started as an extension of OpenID, and is heavily inspired by it. That said, it seems one of the design decisions that wound up defining it was “we’re not OpenID”. The aforementioned editor, Eran Hammer-Lahav, has explained the criteria that went into the design of OAuth, and, well, it’s a perverse set of incentives. While the original idea was to build an authz protocol to work alongside OpenID, it wound up being something else entirely.
So if it’s not an authn system, why am I talking about it? Two reasons: because OAuth wound up being used for pseudo-authentication, and because of OAuth 2.0.
The first point is an interesting one to consider, and is basically how my first attempt at a login system works: pseudo-authentication. Why is it pseudo-authentication and not authentication? Because the user never actually proves that they are who they say they are, just that they can give permission to access a given resource. It may seem like a technicality, but that sort of thing can have serious repercussions, especially when using bearer tokens like OAuth 2 does.
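The repercussion is easiest to see in code. Below is a sketch of the classic vulnerable "log in with OAuth" pattern (all names invented, `fetch_profile` standing in for an API call with the bearer token): the RP treats "this token can fetch a profile" as "this user is logging in", but the token never names which site it was granted to, so a token handed to a malicious third-party app works just as well against this RP.

```python
def naive_oauth_login(token, fetch_profile):
    """Vulnerable pattern: 'the token fetches a profile, so you must be
    that user'.  Nothing here proves the user intended to log in HERE;
    a token obtained by any other app can be replayed against this check."""
    profile = fetch_profile(token)  # e.g. GET /me with the bearer token
    return profile["user_id"] if profile else None
```

This is the hole OpenID Connect's ID token tries to close by adding an audience (`aud`) claim that names the intended relying party.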
So I’ve brought up OAuth 2 multiple times, and it’s worth discussing the primary differences. The first one, from an architecture standpoint, is bearer tokens vs signatures. OAuth 1 used signatures to avoid handing the RP any easily abused credential, but web developers hated it. Real cryptography (or anything close, really) turns out to be pretty hard. OAuth libraries were broken (and OAuth 1 was designed for some really limited scenarios, think PHP 4 and terrible hosting, which made the design just a little bit more awkward). Even when they did work, the design was hard for some to wrap their heads around.
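The two approaches side by side, as a rough sketch. The OAuth 1 function follows the spirit of the HMAC-SHA1 scheme (the real spec's percent-encoding and parameter-normalization rules are fussier than this); the OAuth 2 one is the entire bearer scheme, which is rather the point.

```python
import base64
import hashlib
import hmac
from urllib.parse import quote, urlencode

def oauth1_style_signature(method, url, params, consumer_secret, token_secret):
    """OAuth 1's idea: sign every request, so intercepting one request
    doesn't yield a reusable credential.  Simplified from the spec."""
    base = "&".join([
        method.upper(),
        quote(url, safe=""),
        quote(urlencode(sorted(params.items())), safe=""),
    ])
    key = quote(consumer_secret, safe="") + "&" + quote(token_secret, safe="")
    digest = hmac.new(key.encode(), base.encode(), hashlib.sha1).digest()
    return base64.b64encode(digest).decode()

def oauth2_style_header(access_token):
    """OAuth 2's idea: the token IS the credential.  Whoever holds it
    wins, so everything rides on TLS."""
    return {"Authorization": "Bearer " + access_token}
```

Signing every request was the part developers kept getting wrong; sending one opaque string is the part nobody can get wrong, which is exactly why it won and exactly why it worries me.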
But perhaps the big killer is that apparently some major corporations didn’t like the way it scaled. I dunno what their issues were, specifically, but they made some extensions to OAuth 1 to switch to bearer tokens, and then pushed for OAuth 2.
And from everything I’ve seen and heard (admittedly, mostly from Eran Hammer), those corporations drove the entire process into the ground. OAuth 2 doesn’t even call itself a protocol; instead it’s a “framework”. In practice this means that Facebook, Google, Yahoo, and so on each have their own incompatible versions of authz systems. One cannot log into Facebook with a Google account, and never will be able to.
This would only be a limited problem if it weren’t for a few things.
First, the world of the web is coasting towards centralization right now. Open protocols are stagnating (though for sometimes legitimate reasons, it’s a hard problem), and companies are realizing that cooperation doesn’t buy them much of anything in the short run, so they don’t care. In fact, it goes against them walling in their gardens, so they tend to actively work against it. Anything to help that quarter’s bottom line.
Second, OpenID decided to base their latest standard around OAuth 2. Honestly, it doesn’t seem too terrible on the surface, but basing it on OAuth seems like it’s going to defeat the purpose. I don’t know for certain, but I don’t have high hopes.
So what else is out there? Well, as far as I’ve found, there are three interesting takes on the problem: Persona, Oz, and WebID.
Mozilla created Persona, and it has some quite intriguing ideas behind it. It’s based around email addresses as identifiers, rather than usernames (or URLs), plus public-key cryptography. One of the more generally interesting design points is that the Identity Provider (IdP, the site that proves you are who you say you are) doesn’t even get to know where you’re signing in to. That’s not really relevant to what I want to do, but it’s an interesting point. I can’t pretend to fully understand the design, both because it’s pretty heavy on crypto and because it’s not particularly well documented, but it seems like there are interesting tidbits to be had there.
Eran Hammer went on to create Oz, which is more directly an authz library and deliberately leaves authn out of scope. It’s something I may wind up drawing inspiration from when I get to authz (or hell, maybe even using directly). Unfortunately, he seems to be so thoroughly done with the entire standardization system that he doesn’t even want to write up the protocol, so I’d most likely have to reimplement the server side, as I never intend to run Node on my hardware.
Finally, there’s WebID, which I found when looking through how Diaspora* intends to do what they do. Unfortunately it looks like it’s not really going anywhere, since the last draft was in early 2014 and there hasn’t been any news since. I’m still looking it over, but aside from its extensive use of RDF, I don’t really have much of an impression of it. Who knows, though, it might have some good ideas.
And where does all this leave me? I’m not sure. There’s a significant part of me that wants to implement all of them to experiment with. I doubt that’s feasible, but I’ll probably make a crack at experimenting with several of them.
Will I wind up writing my own? I don’t know. No matter what, it seems likely that I’ll have to implement it myself, as all the implementations I could find don’t quite line up with my system requirements.
I’ll have more news once I’ve tried this stuff out.
* “Hard” as defined by the lowest common denominator of web developers.