Andrew Godwin about Channels at Django: Under The Hood 2016


(chimes ringing) – So, there’s a lot to cover, and I’m gonna try and give you a reasonable idea of both the
rough scale of the problem, and then dive deep in the
Django-specific part of it. I can probably speak for a
whole conference on this topic, but you don’t want that, trust me. So, let’s get diving right in. First of all, I’m Andrew Godwin. If I didn’t say earlier, I
am a Django core developer. I work at Eventbrite, where I’m one of the senior software engineers there. I used to complain about migrations a lot, but apparently they’re
solved now, so, it’s all, no. (audience laughing) Marcus, thankfully, is doing
a lot of the heavy-lifting on that stuff now, so
I’ve been a bit more free to look at channels. And, really, a lot of you I presume want to know how channels works. Well, I have one answer
for you; it’s magic. Thank you, it’s been a great night. (audience laughing) It’s not magic; it may
seem like it at first. But, basically, I’m gonna
try and outline to you what the problem is,
or sort of the context that this sits in because for
some of you it’ll be familiar, for others it won’t be a familiar problem. And then go through both my
solution and thought process, which some people find interesting, and then look at the actual
code that came of that process, and some of the edge cases,
and problems we have. But first of all, what is the problem? Well, the weather’s changing. Back in 2005 when Django was started, the web was a simpler, happier place. The websites were static HTML with a couple of interactive bits. JavaScript was, sort of, a thing you used for making the status bar animate, or having snow on the page at Christmas; it wasn’t very exciting. But since then things have changed. We have rich web applications,
we have data binding, we have WebSockets. And WebSockets was really the thing that drove me to this in the first place. About four years ago, I started getting involved
with WebSocket stuff, initially as sort of a web games thing, but, also, just ’cause I like things that are overly complicated and pointless. But since then it’s developed
into a more interesting thing. And the first thing I
want to tell you all about is that unlike common conception, channels is not just WebSockets; it’s one of many things in the modern web. And all these things exist to today, and there’s more coming, that exist that are not
just request response. Django, and WSGI, and a
lot of Python web stuff is built around this request response, synchronous worker pool
kind of architecture. And all of these things, like long polling for example,
you cannot do with that because if you have that architecture, and you have a long poll that holds open for, say, 30 seconds, then
all of your worker threads are just idle for 30
seconds doing nothing. And the same goes for things
like server-sent events where you hold connections open,
send multi-part chunks back. So, we have all these issues, and we have Django that is very capable of doing normal web pages, and has done that very well, but isn’t really suited
in its current form to these kinds of challenges. And this is down to a
couple of key things. The first is very obvious. Python, at least for a
long time, was synchronous. It’s only been in the last couple of years, with the newer Python 3 versions, that asynchronous handling and code, and the ways we can do async libraries, have come into being. And even then Django is still
a synchronous framework. The problem with async
libraries in the way that they are best written, is that you need an entirely separate API for synchronous and
asynchronous libraries. You, basically, need a whole
second standard library. And going back to the
thing we heard earlier about Django being the
standard library for the web, like, we would have to rewrite
most of the important parts of Django to be asynchronous. And, so, rewriting Django,
unless you can find me a team of 10 amazing developers
in a couple of years, is probably not gonna
happen in the near future. And it’s more on top of that, too. So, while asynchronous code is lovely… I would heavily advise you to
only write asynchronous code where you have to. I can write it; I try
not to whenever I can ’cause it’s easy to screw up, it’s easy to get horrible
mistakes and problems, deadlocks or livelocks abound. Synchronous code is easy to think about, easy to reason about. And on top of this, if you
think about the true meaning of what it means to be an
asynchronous framework or system, often that means, like, oh,
inside a single process. So, say you look at something
like Gevent for Python; it can do lots of green
threads in a single process. But once you’ve got a WebSocket server that lives in a single process, what happens if you make it
two processes, or two servers? How do you extend beyond
that machine boundary? And, so, I see this much more as: how do we take this problem of there being these individual units of work,
and spread them across not just multiple threads or processes, but across a network of machines? Because that’s how modern websites work. We have clusters and data
centers full of stuff. And, finally, important thing for sanity, it has to be a proven design pattern. I, In my younger, more naive years, loved inventing brand new exciting things that didn’t work very well. I’m now a bit older and a bit wiser, and things that have been proven to work, and have research, and have
write-ups on how to do them is very important. You won’t get very far without building on the shoulders of the
giants that came before you. And, so, that’s a very
important thing for me because I don’t want to have to do all the years, and years
of research and testing to prove that something
works reasonably well. And, finally, I have to
be able to explain it because it’s no good if it’s an amazing system that’s elegant, and one person in the
world understands it. And it’s the same for your code. Code written in this system
should be easy to reason about, easy to think about, easy to just understand what’s going on. So, given all these constraints, and many more that I’ll, sort of, touch on over the rest of the talk, what fits those constraints? This is a thing I sat down
a couple of years ago, and I had been researching on and off for a few years before that. How can we solve this within Django? And one of the big things here is the idea of Loose Coupling. Now, Django I think used to
throw out in the front page the idea that Django has
this ideal of loose coupling, the design pattern of the way things are done. And it means a couple of things. Well, the first thing is it means not being too tied to WebSockets. As I said earlier, they’re
not the only problem; there’s more of a general problem here with the web not being this
purely synchronous flow anymore. And, also, shouldn’t
be too tied to Django. Django has been, for a
long time in the past, this, sort of, standalone
monolith in the Python community. And it’s important to have a solution that’s a bit more open,
a bit more welcoming, and then that we can have our friends in the rest of Python help us with. Because there are many
people in the rest of Python who are way better than me at networking, and they have their own projects; so, having them included
is a massive help. And the second part of loose coupling is the idea of the fact that
you can swap out components. That there’s a well-defined,
minimal interface that you can take and
you can program against, and then if you want to change
what’s on either side of it, you’re free to. And that’s, kind of, a key
part of what Django is, right? The template system now, you
can use Django templates, but if you want to, you can
just pop Jinja in there; it’s not too difficult. You gotta rewrite the templates, but it’s an option we give you. And Django has this philosophy of you build on Django to start with, and as you grow pieces fall away, and you can replace them
with your own thing. But at any given point, it’s
a slow, gradual process. Django is modular; you can
pick and choose what you want. So, let’s look at a proven
design pattern first. There’s not many options for
talking between machines. There’s been a lot of research into it. And the one I have had a lot
of positive experience with, and generally prefer, is The Message Bus. So, The Message Bus is the
idea where you have a single, centralized bus, it could be a server, it could be a cluster, it
could even just be a file. But it’s a single place
you point every server, or every process, to. In particular, none of the processes know where the other ones are; all they talk to is the bus. And you put something on
the bus with an address, you say, “Hey, send this
to,” and you give a name, and then something
else, it’ll listen, say, “Hey, I want things for the name.” And it delivers them across. This is a heavily-proven
thing in research. It has some issues; there’s only tradeoffs with large system design. You’ll have to trust me
on that in some sense. And, so, that’s kind
of the basic style of, there’s some good solutions there. And then there’s the question as well, okay, you have a message bus, clearly you send messages over it, but what do the messages look like? How are you encoding things? If we’re gonna be doing
WebSockets, what does that mean? And, secondly, as well, you have a bus, but how do you talk to it? What does that interface look like? So, you know, how do you send it? And this is the basis of what channels is. Channels is rooted in a specification called ASGI, which stands for the Asynchronous Server Gateway Interface. Definitely not based on WSGI;
definitely not the case. And what this is is a specification for both of those things. Both how you talk to the message bus, and how you send things over it. So, this is the five functions of how you talk to that message bus. You can send and receive, and, in particular, the
send is always non-blocking, and the receive is blocking
or polling; you get to choose. And, so, that’s how you
do a basic communication. And then the second
part is a group system. And this is the key difficult
part of the problem because doing messages
between different machines is not too difficult, you
can do it pretty easily, but having a broadcast system
that works across a network is a difficult problem. And, so, that’s based into
the very core of channels is part of this specification, and part of the thing
that gets implemented is the idea of a group. You have a group; you can
add and remove channels from that group, and then
you can send to the group, and other channels in the
group get the message. And this is a very simple,
basic broadcast system, but it’s enough of that minimal interface that you can build on top of it. And then what are the messages? Well, the actual semantics,
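To make the shape of that five-function interface concrete, here is a toy, single-process sketch of it. This is an illustration of the idea only, not the real ASGI channel layer code; real layers do this work across processes and machines.

```python
import queue
from collections import defaultdict

class ToyChannelLayer:
    """A toy sketch of the five-function interface described in the talk:
    send, receive, group_add, group_discard, group_send."""

    def __init__(self, capacity=100):
        self.capacity = capacity
        self.channels = defaultdict(queue.Queue)  # channel name -> queue
        self.groups = defaultdict(set)            # group name -> channel names

    def send(self, channel, message):
        # Send is always non-blocking; a real layer raises ChannelFull
        # when a channel is over capacity.
        if self.channels[channel].qsize() >= self.capacity:
            raise RuntimeError("ChannelFull")
        self.channels[channel].put(message)

    def receive(self, channels):
        # A non-blocking poll: returns (channel, message), or (None, None).
        for channel in channels:
            try:
                return channel, self.channels[channel].get_nowait()
            except queue.Empty:
                continue
        return None, None

    def group_add(self, group, channel):
        self.groups[group].add(channel)

    def group_discard(self, group, channel):
        self.groups[group].discard(channel)

    def group_send(self, group, message):
        # The minimal broadcast primitive: deliver a copy of the message
        # to every channel that has been added to the group.
        for channel in self.groups[group]:
            self.send(channel, message)
```

Sending to a group just delivers a copy of the message to every channel currently in it, which is exactly the minimal broadcast interface the talk describes.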
the thing inside them, is defined in a little bit, but, like, in terms of format,
what’s the restriction? Is it binary, is it text? Is it just numbers? Is it Morse code? What I ended up with is a JSON-compatible, dictionary-based format: it only permits the types that JSON can represent, and the top level of a message on a channel is always a Python dictionary. And in the API that you have in ASGI, all you do is send and receive dictionaries with these basic types in them. The encoding and serialization
is all done for you by the layer underneath. And that pluggable layer, the thing underneath those five functions, is one of the swappable components. Oh, apparently I’m a
very excitable speaker, and that pushed the force this way. That’s fine, we okay? Yes, excellent. And, so, that’s part of the specification that you can swap out. If you want to encode
things as JSON, you can. If you want to use MessagePack, you can do that, as well. The key thing is as long as
you stick to this interface and provide dictionaries out
to Python, it doesn’t care. And, so, let’s dig a
bit more with an example because talking in
abstract is very difficult with this kind of thing. I prefer using very concrete
examples for things, so that you have an
idea of what’s going on. And, so, let’s look at WebSockets. If you’re not familiar, they are a stateful socket that, basically, starts off as an HTTP connection, and then you send a header saying, “I, actually, wanted to
upgrade to WebSocket.” And then if it supports it,
the server sends back, “Okay.” And then it, sort of, the
nice text protocol vanishes and it turns into binary, and
then frames start happening. But, essentially, you’ve got
a couple of different parts of the life cycle. You have a connection phase, and in the connection phase
the client talks to the server, says, “I’d like to open a socket. “My origin is this, I’m
coming from this frame.” And then the server gets to go either, “No, we’re denying the
connection based on security, “or overbearing, overloading policies.” Or it could say, “Yes, I accept this.” And it sends a handshake
back and negotiate an open. Once you’ve opened, you can send and receive another direction. You can send or receive whenever you like, as many times as you like. You can just wait, you can open a socket, and then leave it open for a day, and then send one thing and close it. That’s perfectly acceptable. You can open a socket and
just receive things down it, and then close it. There’s no particular
ordering there at all. And then, finally, at the end of a socket’s life, it disconnects. And there is a thing in the spec that says, when you disconnect there’s a code, a bit like the HTTP error code, that says why. It’s, like, code 1005 means it closed unexpectedly, for example. And, so, this is kind of the framework of what the socket is as a protocol. And the job is how do
we take this protocol as an abstract concept, and
map it into what channels is? And this, kind of, is part of the design of channels. What you do is, because you have this messaging system, you have this bus: what we’re doing is decomposing protocols into a series of events. And what happens is for each one of those events, you send a message onto the message bus under a certain named channel. So, for example, here are those three. When the connection
happens, a message gets sent onto websocket.connect. When a message comes in from the client, it goes under websocket.receive. And when something disconnects, it goes under websocket.disconnect. And what’s happening is on
one end of that message bus, there is a server sitting
there terminating the sockets. It’s a separate process, and
it’s handling that, sort of, all the handshaking, and
all the decoding for you, and just turning it into these
nice messages onto the bus. And then your job as a Django program is to receive those
messages and act upon them. But, of course, to act upon them, you need to do things the other way. Like, you see here the arrows
go both ways on the slide. And, so, what you have to
do is you have send back to that socket on a channel, as well. And, so, not only do you send
from the client to the server on these connection-received
disconnection channels, you, also, send back from
the server to the client on a special channel. Now, you may notice there is
a different kind of formatting on this channel; and there’s
a good reason for that. So, if we look at the idea of how this bus might be laid out in a bigger system, let’s say here’s one. So, we have two of these separate servers, and both of these servers, imagine, like, Gunicorn or Nginx, like, a separate process
that terminates WebSockets. The socket goes into
there, it gets terminated, decoded, parsed, and then
all that comes out that is events onto the message bus. Separately you have a couple
Django Worker processes listening on the bus for channels they’re configured to work on, and then when they get
the message, they go, “Oh, wake up.” Take the message, run the code that you’ve written for them, send things back, and go back again. Now, when someone connects,
or sends a message in, any of these Django
processes can handle it; they all have the same code, in theory. And, so, what we can say is, “We’ll just send on to
the connection channel, “and whoever’s ready can
pick it up and go with it.” It’s great, that’s how general
worker pool handling works; it’s fantastic. But when you send something back, you can’t just say, “Oh, send this back. “Either server can handle it.” Because the socket is only
on one of those servers. The client has a TCP connection to just one of those two processes. So, you have to route back that socket to the particular process that has it. And that’s kind of the distinction
we have inside channels. You have this top set, which are general, well-routable, anyone-can-handle-them channels, and when you have an exclamation mark, it means no: this channel has a specific process that’s listening out for it to receive it, and exactly that process can handle it. And that’s in there so that
the message bus implementation can, actually, handle
them in different ways for more optimal performance. It can understand that, “Oh,
if I see this character, “I can, actually, do a different
kind of routing for it.” And all this comes down to
system has a set of tradeoffs. If somebody tells you
their system is perfect, they are lying to you. They should not do this to you at all. And, so, there are a series of tradeoffs. The first one is how does delivery fail? So, say I send a message and the system is in a non-perfect state, what happens to my message? You have two options. Either the message doesn’t deliver, or the message delivers more than once. And the architecture, you can
imagine the different ways: like, you know, not delivering is pretty easy, it drops in the network somewhere. But you can have systems, too, Kafka is a good example of this, where in a failure state, you’ll get multiple duplicates of the message; that’s their failure state: you always get at least one. Channels is an at-most-once system, where you get the message at most once. So, it means either you get it, or it drops. This is generally a lot
easier to reason about, and often how you design
websites in the first place. It is, also, first-in, first-out. That means if you send a
message onto a channel, and then another message on the channel, the first one there gets handled first. This is a queue versus stack distinction you may, if you’ve done computer science, you may have heard of already. You then have things like back pressure. So, a channel has a capacity; so, by default websocket.connect
has a capacity of 100. So, if 100 people try and
connect, and the server is behind, the 101st connection event
tries to get added in, and send goes, “Whoa,
the channel is full!” And then what that means is it tells the thing that’s handling the socket, “Oh, well, the backend system
is full and overloaded. “I should drop this connection, “and send back an error code saying, “Oh, 503, or 501, the system
isn’t working very well. “Please, come and try again later.” There’s a few other things
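That backpressure behaviour can be illustrated with a bounded queue standing in for a channel. This is a toy sketch; the capacity and the idea of responding with a 503 are just the talk’s examples.

```python
import queue

# A bounded queue stands in for a channel with capacity 2; the third
# send fails, and the caller sheds load instead of blocking.
channel = queue.Queue(maxsize=2)

def send(message):
    try:
        channel.put_nowait(message)
        return "accepted"
    except queue.Full:
        # A real channel layer raises ChannelFull here; an interface
        # server reacts by rejecting the connection with an error code.
        return "rejected: 503"

results = [send({"order": i}) for i in range(3)]
```

The point of the design is that the failure happens at send time, immediately, so the thing holding the socket can tell the client to come back later rather than silently queueing forever.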
to highlight, as well. Because of the way it’s
designed, it’s not sticky. When you have all these different workers listening on a channel, any
one of them can get that particular message at a time. Whoever’s ready and gets it first. And, so, if you have a socket that opens, and it goes, “Oh, connect,
receive, receive, receive, “disconnect.” It might be a series of
events you have on the channel that can go to five different servers. And, in fact, it can go
to different servers, and happen at the same time. You would have connect,
and then receive, receive, as if it’s handled by two
different servers simultaneously, and in parallel with each other. And this is hard to reason about. So, that’s, kind of, an annoying
thing of this architecture. And, as I said, it’s a tradeoff. If you want, you can have
both of these things. You can have guaranteed-ordering, you can have serial-processing, and you’ll have a massive
speed drop as a result. But we’ll come to one
of those in a second. So, let’s look at what
they actually look like, these messages. Well, here’s an example
of a receive message. When you have an open WebSocket, someone sends you a frame. Now, WebSockets are a framed protocol; that means that you get
distinct sets of data, so, like, as a chunk. And they can be either text, or they can be bytes, basically, binary. So, imagine someone sends in a text frame. So, it comes into this server that’s on the edge of the network, has a socket tied to it. That server goes, “Okay.” Decodes the binary framing. And then goes, “Okay,
well, that’s a text frame.” So, it sends this message. For the first part, you can see here, is that, yes, it is a text frame. There is a text, part of
this message that says, text: “Hello, world!” That’s what we got sent. And the second two part, well,
the last two parts of this are the meta-data. And they tell the server what this means because in this system
I showed you before, it’s an entirely shared-nothing system. This is great for scaling ’cause you could just launch more of them, but what it means is
that none of the workers keep any state, or know about each other. And, so, your only way of understanding that these things are
part of the same socket is to have something in there that says, “Oh, no, these all come
from the same socket.” And what we do is reuse that channel that you send things back
to it on as the identifier. So, the reply channel here is both the way of sending things back to the socket. So, you can say, “Oh, I’ll take that “and send them back to the person.” But, also, your way of saying, “Oh, “this is the same socket “as that one I saw three seconds ago “that had the same reply channel.” And you can use this as a
key in another data store, say, a cache or a database,
to tie those things together. Much like you would with a
cookie, for example, in Django. And, finally, you have a path. And then comes in, as
we’ll see, in the routing later in the talk. But when you have in Django code that handles these messages, there’s kind of a URL-like
structure in there, where you say, “Oh, well the
things under this path do this, “and things on this path do this.” Because consumers can only act on
things in the message, what we do is you bundle
the path with the message when it comes through. We didn’t bundle up the
headers, or anything else, just the path. This was a later addition onto the thing, but it’s very useful, as you’ll see later. And it’s not just WebSocket. So, as I said before,
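Putting those pieces together, a websocket.receive message of the shape just described might look like this. The field values here are illustrative, not taken from a real session.

```python
# An illustrative websocket.receive message: the frame's content plus
# the metadata described above.
message = {
    "text": "Hello, world!",                   # the text frame sent in
    "reply_channel": "websocket.send!a1b2c3",  # send back / identify the socket
    "path": "/chat/socket/",                   # bundled in for routing
}

# The reply channel doubles as a key for tying state to a socket, much
# like a cookie; a plain dict stands in for a cache or database here.
session_store = {}
session_store[message["reply_channel"]] = {"username": "andrew"}
```

Two messages that carry the same reply channel came from the same socket; that is the only continuity a shared-nothing worker gets.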
you can take a protocol, you can decompose it into events, and you can say, “Well,
what does this mean? “How do we approach this in
this event-driven system?” It’s an event-system where
all you can do is write code that happens against things
that happen in the protocol. And, so, we have a full spec. We have proper musts and
shoulds, and everything for HTTP and for WebSocket. And, so, you saw there
part of the WebSocket spec. I’ll just go forward here. This is an example of HTTP
connection in this case. So, for the HTTP you
get a request channel, and then you send back
in a response channel; it’s very simple, but it
still fits the same model. And, so, you see here you get, well, here’s our reply channel
to send our response back on. But familiar things like the method, and the path, and the
query string are all here. And the fact this is just a
super-set of what WSGI provides, so it’s sort of inter-compatible, as well. But it’s not just web stuff; you can, also, apply
this to other protocols. And we have rough drafts
and people looking at, and interested in working
on both IRC and Slack as messaging protocols. So, the idea being that
you could write chat bot where, well, things appear on
the chat dot receive channel, and you can just write code
that acts on received messages, and just send them out
in the same framework. There’s an email rough
draft I’ve been working on where you could write things
that handle incoming email in the same framework. Where you go, “Okay,
well, we have a message “on the email-receive
channel, let’s look at it. “It has these headers, we
can respond in the same way.” You could do, in theory, any protocol. Two I have done in the past,
and recommend you never do, Minecraft which is you have to do special Java-only serialization in Python. I have a whole library for
that if you’re interested. (audience laughing) And screens ripping mainframe terminals over serial connections. Both possible, don’t do them. But in general, like, there’s more, we’re trying to expand the number of specs just as a health
management we can do them. But try and get a few more things than just these top two in there. And going back to the list, again, so, I highlighted these two earlier; and they are problematic
because WebSockets are an ordered set of frames that you’re expected to handle, basically, one after the other. And, so, it’s not a great user experience if I tell you, “Oh, it’s fine, just write your code. “It works perfectly running parallel “on five machines at once, “and handles ordering all on its own.” No, as I said at the beginning, writing asynchronous code is hard. And, so, what we’ve done
messages have a fourth key, they have an order key; and that key tells you, “Oh,
this was message number five.” And the second part of it is that what we can do is we can
use that order number, and then we can use the reply
channel with a caching store, and just have the system tell you, and it’s built into
channels as you’ll see, “Oh, enforce ordering on
this on this channel.” And what it’ll do is it
will trade off some speed, obviously, you lose some
speed to synchronization and this collaboration algorithm, but you do get your things executed perfectly serially in order. So, if you want it, you can
do that tradeoff yourself with a single decorator we have for you, and just have it work. And the second thing that is, in fact, a very recent development in channels, is that what we’ve done
is rather than before where you’d connect to the server, it would negotiate, and
then send a connect message. It would just always accept, and then just start
sending receive messages. What we’ve changed now is that it does a connect message,
and then it holds the socket in the handshaking phase, sends through your application and says, “Hey, do you wanna accept
this, or reject this?” And once you’ve sent back to reject, then it continues to either
open the socket, or close it. And what this does is
give you implicit ordering of the connect always comes
before your first receive. And that’s the most
common problem people had. So, it’s, kind of, hard
to understand all this. And, as I said, I’ll give you help. So, let’s go through some of
the components of the system. So, I’ve been talking about
the server that terminates, and the socket channels. So, this is the five packages that come and make up the Channels project. There is Channels, the autonomous package that is the Django integration that ties ASGI, the thing I’ve
been telling you about, into a nice Django-ish framework, and we’ll get to that in a second. But there’s, also, the other parts of what makes ASGI run. The Django part is just one
part of the whole system. There’s, also, the central
part, the message bus. Like, there has to be a piece of code that plugs in and does
all this delivery for you. And the idea is it’s a bit
like DBAPI 2 in Python, like the, sort of,
common database interface where there’s a series of pluggable things for whatever you feel like doing. So, there’s a Redis backend, which is used for redis delivery. There is an IPC one, which is
just shared memory segments if you want to be on one machine. There’s an in-memory one for testing. And we, actually, have
some people looking at licensing one on RabbitMQ,
as well, now, as well. So, the idea is you need to
choose from this set of options based on your deployment, your experience, maybe you have some of
these deployed already. Whatever you want, or fits the situation, you can have that particular
delivery mechanism. And if you want, it’s
not a very long spec, if you have a personal team who really likes asynchronous code, let them write one that fits your scheme, or fits your custom networking protocol, and that works, as well. And, finally, on the
other end of all of us, is the mystical server that
turns WebSockets into events; and that is called Daphne. Daphne is a, basically, I
took Twisted and Autobahn, two excellent pieces of code, put some hot-glue in the middle, and then just tacked on
ASGI on the back of them. All it does it uses the
Twisted web-standard HTTP-handling library. Autobahn, which is very
well-proven WebSocket library, and then just takes all the things where you plug in, like,
where you handle those, and just puts in send to
channel there instead. And whenever it gets
received from channel, it just then responds onto the socket. And that server is the thing that handles taking these connections
and turning them into events on the channel for you. If you want, you can just run Daphne; it will handle HTTP and
WebSocket all on the same socket. It will auto-negotiate all
this path-prefixing stuff you may have seen. But if you want, you can also just run your standard Nginx or
Gunicorn on one side, and then just send WebSockets to Daphne, if you’d like, as well. So, I mentioned previously
channels is the Django-ish part, so let’s go into some deep
detail on that Django-ish part ’cause that’s kind of
what you’re here for. So, crucially I want to highlight that what is here has been a lot of revisions. It takes a lot of tries to get a nice API. It’s still not fully there, yet, well, it’s mostly the way there now. And, so, what is here is
based on a lot of thought, and what fuels Django-ish? Nothing is, there’s no specification
for what is Django-ish. There’s not a big list the
core team has of, like, “Oh, well, it means this,
this, this, and this “stamp of approval.” That’s not how it works. It’s more so of a nuanced thing of what feels good for the users, what matches expectations, what’s not surprising, what’s boring? Boring is good in Django world. And, so, the first thing is, well, everything every Django
developer does is write views. We’ve all written views; they’re great. So, what if the fundamental
unit of work in channels looks, basically, like a view? That was my, sort of, guiding
principle of this project. So, consumers are the things that you write against these channels; and they look like views. Here is the code. What happens is you write a callable that takes a single thing, so it’s not a request, it’s a message, but it’s very similar. You do some stuff with
the thing you got given, and then you can send responses back. There’s a couple of differences. So, first of all, you can send responses anywhere in the consumer. It’s not a return, like it is with a view. You get that reply channel
back in the message. So, as you can see here, you have both message.reply_channel.send, in which case, what this does is it, this is tied to a channel. It gets the message, it then says, “OK.” It sends the message text
OK on the reply channel of the message. Then it uses the group system. Now, this is just a very concise example, but say in the connect handler, we said, “Oh, when someone
connects, here’s a consumer “that says add them to this group.” And when someone receives a message, here’s a consumer that
says send to this group. Tada, you’ve got a chat application. Like, those two things give
you broadcast text chat between all the clients connected. Or you can just do RM things,
you can whatever you like. It’s a normal Django process. The normal Django
database clearly applies. If you want, you can wrap
transactions around in there; it’s just, basically, like a view, but a bit more complicated. And that also carries over to the way the routing works. Now, I like the Django URL system; it has its problems, but I’ve grown to know and love it. Maybe a little bit of Stockholm syndrome over the past 10 years. And, so, keeping it to look like that is kind of important. And, so, what you have
is you have this list, sounds familiar, of regex-based matches of what to do. So, imagine this example. You have, basically, a list of, well, this channel: this gets websocket.receive, and if the path is the chat-socket path, send it to this function.
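As a sketch, a routing file in that style might look like this, assuming the Channels 1.x routing API (`route`, `include`); the consumer dotted paths are made up for illustration.

```python
# routing.py -- a sketch, assuming the Channels 1.x routing API.
# The consumer dotted paths are illustrative.
from channels.routing import route, include

# chat/routing.py would hold this list. The "^/chat" prefix is
# stripped by the include() below, so it is not repeated here.
chat_routing = [
    route("websocket.receive", "chat.consumers.ws_receive",
          path=r"^/socket/$"),
]

channel_routing = [
    route("websocket.connect", "chat.consumers.ws_connect",
          path=r"^/chatsocket/$"),
    include("chat.routing.chat_routing", path=r"^/chat"),
]
```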
You can include, like in URLs: if you include with a regex path, then that prefix is stripped from the next routing file down. So, you could say, “Oh, this prefix maps to this file,” and then just not have the prefix in the file at all. And the nice thing is, as well, you can have class-based consumers in the same way you can
have class-based views. And as well as having everything, sort of, more generic, there are generic class-based consumers, as well. Those consumers can, in fact, emit their own routes. ‘Cause, in particular, the one change you see here from URLs is that URLs don’t have a channel. In URLs, everything is on
the HTTP-request channel. And, so, this is a
little extra information. And, so, what we can do is
those class-based consumers can tell the routing file, “Oh, no, I need these three channels.” And the route_class entry
takes that information from the class and uses it for you, so there’s no duplication of content in this file.
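Sketched against the Channels 1.x generic-consumer API, a class-based consumer that declares its own channels and groups might look like this; the class name, group name, and path are illustrative.

```python
# consumers.py -- a sketch, assuming the Channels 1.x generic
# consumers; names and paths here are illustrative.
from channels.generic.websockets import WebsocketConsumer
from channels.routing import route_class

class ChatConsumer(WebsocketConsumer):
    # The class declares which groups to join on connect...
    def connection_groups(self, **kwargs):
        return ["chat"]

    # ...and handles incoming frames like a view handles requests.
    def receive(self, text=None, bytes=None, **kwargs):
        self.group_send("chat", text=text)

# route_class pulls the channel names out of the class itself,
# so the routing file does not have to repeat them.
channel_routing = [
    route_class(ChatConsumer, path=r"^/chatsocket/$"),
]
```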
And, as I said before, it is a shared-nothing system. These things could happen on different machines, in different parts of the world, even if you have a weird message bus set up. And, so, how do you share state? How do I have a thing
where I can send a message over the socket to set my username, and then have another
message come through, and then that remembers my
name and comes back again? So, you could use database;
databases are great. But you can, also, use Sessions. We all love sessions; they’re
a good temporary storage. And there are two kinds of sessions in channels. There’s the good, old-fashioned
session you know and love, from HTTP, from the cookie. And what happens is
when you open a socket, it’s very convenient
’cause it starts HTTP, the cookie comes through in
that initial connection request. So, we can actually take the
session from the real site, drag it through into
the WebSocket framework, and then stash it. But why do we stash it? Because, of course, that connection handler is on just one of the machines; where would it store the cookie for the rest of the connection’s lifetime? And, so, there’s also a
second kind of session called a channel session; and that channel session is basically your state for the lifetime of a socket. So, imagine if you’re writing a simple application in Python: you’d use your local variables for this, or a handler, or a thread it all sits in. Instead, the channel session is a place where you can put things that happen during the socket, and
for the socket only. When someone connects, set their username, set their permissions,
and that kind of thing. So, that gives you, sort of, your general, over-the-lifetime set of what you can do with the socket. And this is one of the things that powers the enforced-ordering thing I mentioned before. So, you know, I said
you can take a socket, and you can enforce ordering
on it using these sessions. And what this does is: this decorator, which ships with channels, just looks at the message, uses the channel session, and says, “Okay, I have packet number five. Was the last packet handled packet number four?” And at the end of the decorator it says, “I just handled packet number five,” and writes it into the channel session. So, what you get is you get the
thing where the first thing, packet number one just runs,
it doesn’t look for anything. It says, “Okay, number one runs.” Saves number one into the session, then number two goes, “Oh,
number one’s been run.” And that can then pick up and
do it again, and keep going.
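The check that decorator performs can be sketched in plain Python; the names and the session dict here are illustrative, not the real `channels.sessions` internals.

```python
# Plain-Python sketch of what the enforce_ordering decorator does.
# The real one lives in channels and stores the counter in the
# channel session; this version is illustrative only.

class ConsumeLater(Exception):
    """Tells the worker: requeue this message and try again later."""

def enforce_ordering(consumer):
    def wrapper(message, channel_session):
        expected = channel_session.get("last_order", -1) + 1
        if message["order"] != expected:
            # Packet N arrived before N-1 was handled: put it back.
            raise ConsumeLater()
        consumer(message, channel_session)
        # Record that this packet has now been handled.
        channel_session["last_order"] = message["order"]
    return wrapper
```

Feeding packets out of order shows the behaviour: packet 2 raises `ConsumeLater` until packet 1 has been recorded in the session.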
How this works: there’s a little bit of interior information here. So, inside the decorator that runs this, this is the sort of code we have. So, this is in particular the session decorator, to give you channel sessions. So, the Django session framework, if you’ve ever used it, is interesting and quite old by now. And it, kind of, wants
to do its own thing. The session framework goes,
“Okay, here’s your session key. “Put this in the cookie.” Now, we can’t do that; there’s
no cookies on top of this. Well, you can receive them,
but you can’t set them. So, what we do is we go, “No,
no, no, no, here’s the key; deal with it.” And the key is made using a deterministic hash of that reply channel name, that identifier that we have for every channel. And, so, we take that identifier; the first function on the slide here, basically, takes the reply channel name, hashes it, then forces the session framework to look it up by key. And you get back a session. And then the second part says, “Well, okay, we have a session; we need to make sure it’s saved, so it actually exists
in the database first, so there’s no collisions.” And then when it’s saved, it can then pass it back and continue.
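A toy version of that key derivation, assuming an MD5-style deterministic hash (the exact hash Channels uses may differ; this is illustrative):

```python
# Sketch of deriving the session key from the reply channel name.
# Hashing makes the key deterministic: every worker that handles a
# message for this socket computes the same key, so they all load
# the same session, on any machine.
import hashlib

def session_key_for(reply_channel_name):
    # Same channel name in -> same session key out.
    return hashlib.md5(reply_channel_name.encode("utf8")).hexdigest()
```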
Now, the one interesting thing here is that raise ConsumeLater. The way that channels works is that it is a separate worker loop from Django. So, normal Django runs out of a WSGI container most of the time; that handles your loop and your requests. Because channels runs separately, there’s a separate process; it has its own loop. It has a thing called runworker that sits there and just spins around handling stuff. And the flow of that
is, well, it looks for all the channels it can listen on. So, connect, and receive, and disconnect. It says, “Give me any message
from any of these channels “that arrives first.” And then the loop goes,
“Okay, I got this thing.” It then looks in the routing to try and find the right consumer, finds it, then just runs the consumer, and then just finishes
and then loops back around to the beginning again. But in there, inside that code, there’s a little thing that says, “Ah, but if the consumer throws
the exception ConsumeLater, “take the message and put
it on the back of the queue and continue.” And, so, this is your way you can go, “Ah, I don’t wanna handle this one just yet,” and put it back on.
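That loop can be sketched like this (illustrative; the real loop lives in channels’ runworker command):

```python
# Minimal sketch of the worker loop: pull the next message off any
# channel we listen to, route it to a consumer, run it, and requeue
# the message when the consumer raises ConsumeLater.
from collections import deque

class ConsumeLater(Exception):
    pass

def run_worker(queue, routing, max_steps=100):
    for _ in range(max_steps):
        if not queue:
            break
        channel, message = queue.popleft()
        consumer = routing.get(channel)
        if consumer is None:
            continue  # no route for this channel; skip it
        try:
            consumer(message)
        except ConsumeLater:
            # "I don't want to handle this one just yet":
            # back of the queue, try again later.
            queue.append((channel, message))
```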
There’s a couple of these exceptions inside channels; they’re, sort of, a bit like control flow: I’ll throw this away, consume it later; sort of, a bit more of that worker loop that we now have in place. There is no middleware. And this is a somewhat
controversial thing. There were some issues recently over this. So, middleware is great. I agree with this; I love middleware. However, middleware is built with a model where the request flows
in through the middleware; the response flows out
through the middleware. Now, because of the way
consumers are written, which I remind you is like this… You don’t return the response, you send it via a couple
of different functions. You can do a group send, you
can do reply channel send. You can just send to an
arbitrary channel if you want to. And, so, what that means is you can’t capture those easily at all. And, so, your middleware
can’t do the exit portion. And, so, for that reason,
right now my position is decorators replace most of
the cases of middleware. And they are more obvious
because with decorators you look at your consumer and go, “Oh, there’s a decorator right there.” And then that’s being applied. With middleware, it’s
hidden away in another file. And I personally like explicit,
discoverable interfaces. I may, eventually, concede
on this at some point, but right now there is no
middleware; it’s all decorators. And things like, you know,
authentication and sessions are all handled in decorators. Those things are there for you.
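The entry-only wrapping that replaces middleware can be sketched as a plain decorator; `attach_user` and `lookup_user` here are made-up names, standing in for the real decorators such as those in `channels.auth`.

```python
# Sketch of a consumer decorator: it can wrap the "entry" side of a
# consumer (there is no single "exit" to hook, since responses can be
# sent from anywhere inside it), and it is visible right at the
# consumer's definition. The names here are illustrative.
import functools

def attach_user(lookup_user):
    def decorator(consumer):
        @functools.wraps(consumer)
        def wrapper(message):
            # Entry portion: enrich the message before the consumer runs.
            message["user"] = lookup_user(message)
            return consumer(message)
        return wrapper
    return decorator
```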
Andrew, what about the rest “of Django that still exists?” That’s still there; Django
views is still there, Django HTTP is all well and good. But the fun thing is because
what I described is a super-set of what Django does now, it actually folds into the current system. The entirety of Django, as you know it now, is a single consumer inside channels. It’s called the view consumer. If you want to, you can
route requests to it, and they’ll come through the view system, into URL routing, and come out the back with a response attached. And this means that inside channels you can just, if you want to, mix and match HTTP handling
with, say, long polling, which you can’t do inside views. Say, “Oh, yeah, in my routing, if it’s on the long-poll path, send it off to this special HTTP consumer. If it’s on the WebSocket path, do it over here.” And, finally, if you want to, and this is on by default, just drop down to the Django view system.
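A routing table in that spirit might look like this; it is a sketch in the Channels 1.x style, the consumer paths are made up, and the import location of the view consumer is my assumption.

```python
# routing.py -- sketch; consumer names and paths are illustrative.
from channels.routing import route
from channels.handler import ViewConsumer  # assumed location, Channels 1.x

channel_routing = [
    # Long-poll requests go to a hand-written HTTP consumer...
    route("http.request", "polls.consumers.long_poll",
          path=r"^/longpoll/"),
    # ...WebSocket frames go to their own consumers...
    route("websocket.receive", "chat.consumers.ws_receive"),
    # ...and everything else drops down to normal Django views.
    route("http.request", ViewConsumer()),
]
```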
And this way you can mix and match different kinds of HTTP handling with standard Django views. This is what that consumer looks like, in a very condensed form. It’s only about 20 lines of code. But, essentially, what it does is: right now inside Django there’s a thing called WSGIHandler. That is a piece of code where you go,
there’s an AsgiHandler that can decode things from a message rather than WSGI environment-variables. So, we use that, and then
we just take the replies, and then we just send the
replies to the reply channel, and then handle them off. The one difference here, as you can see, another one of those exceptions. As I said before, channels have capacity. So, imagine you’re streaming
a very large file to a client, and it’s got a very slow down-link: they won’t be able to receive messages as fast as you’re sending them. And we don’t want the channel layer to have to sit there absorbing 100 megabytes of file into whatever buffer it has, and then feeding it out, again. And, so, what we have is,
again, if a channel is full, in this case, rather than disconnecting, we just wait. We stop the iteration; and, in particular, if you have, like, a chunked response, or a yielding response, this works as you’d expect: it pauses at that yield, just waits, and then when the channel is empty again, it goes, “Okay, it’s empty again,” and continues going through.
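The wait-and-retry behaviour can be sketched like this; `ChannelFull` is the exception Channels’ channel layers raise on a full channel, but the `stream_response` function itself is illustrative, not the real AsgiHandler code.

```python
# Sketch of streaming back-pressure: when the reply channel is at
# capacity, wait and retry instead of buffering the whole response
# in the channel layer.
import time

class ChannelFull(Exception):
    pass

def stream_response(reply_channel, chunks, retry_delay=0.01):
    for chunk in chunks:
        while True:
            try:
                reply_channel.send({"content": chunk,
                                    "more_content": True})
                break
            except ChannelFull:
                # Client is reading slowly; pause at this yield point.
                time.sleep(retry_delay)
    reply_channel.send({"more_content": False})
```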
So, you can do streaming responses in the same fashion you can with WSGI. Finally, in sort of a quick whistle-stop tour of this, there are more signals and commands. So, inside Django you have a
couple of important signals. Request-started and request-finished, in particular, are your lifecycle of how a request works. And a lot of Django is built around, well, when the request starts, do some stuff; when the request finishes, clean up some stuff, so the next request is clean. In particular, closing database connections is done in the request-finished handler. So, channels introduces new signals. The old ones are still in
there for the view system, but there’s, also, a consumer-started and a consumer-finished signal. And what that does is tie database cleanup to consumer-finished, as well, so your consumers won’t leak database handles. And it, also, overrides runserver. So, runserver bundles in
Daphne and runs everything. So, runserver just has WebSockets now; it just works as you want. And then, finally,
things like static files which have like special
development overrides in Django. You may not know this, but
when you run run-server, a piece of code flips on this, “Oh, in debug mode,
override this URL config “and inject static files
in the middle of it.” And we have to replicate that, as well. And then, finally, you know,
this is what channels is now, but what is in the future for channels? So, hopefully you’ve got
some idea of what it is. And, kind of, what underlies this is a generalized system for
communicating between nodes. Channel design is, perhaps not surprisingly, based on CSP, a formalism I was taught at university for proving things about asynchronous, communicating systems. It is the idea that you can have a message-passing system, and that’s how you can reason about things, prove things, and prove the absence of deadlocks. And, so, I’ve tried to make a system that is not just for handling WebSockets, but for handling generalized
asynchronous problems in a larger system. Now, websites aren’t just HTML anymore; they are different kinds of problems, different kinds of systems. You might have MQTT, which is an Internet of Things binary protocol for reporting sensor data. The Internet of Things is growing; it’s kind of part of the web, as well. We should be able to handle that kind of thing. And, so, what I’m trying to lay down here is not only a good Django
thing that looks like Django and feels like Django, but, also, an underlying system for Django and your wider projects, and maybe the wider projects of Python in general, for: how do I write different processes that talk to each other in a way I can reason about effectively? And it doesn’t even have to fit inside Django. You know, you can do, say
you got microservices. How do you send things
through microservices? Tada, channels! You can do that, you can do things like separating things by CPU. So, you can say, “Oh, I want
to run all the thumbnailing “on this machine for these channels. “I want to run all the things that are “have my sign-in keys on this one machine “locked in a closet behind 25 gates.” If you want, you can mix
different Python versions. This is a conversion path to Python 3, because you can say, “Oh, well, these consumers run on Python 3, and the rest of the site runs on Python 2.” And, of course, you can run
different parts sync and async. Underlying channels is the
fact that what I’ve done is I’ve kept Django synchronous. Django still runs fully
synchronous in a worker loop. And you still have all the things, and all the same guarantees
you had with Django before. And all the asynchronous stuff is then kept in a separate process where, if you want to, you
can go and play with that. I recommend reading up before you do. Asynchronous stuff is horrible,
and WebSockets doubly so, but it keeps them apart. And if you want to, you can write your own synchronous processes, your own asynchronous processes, and tie them into that bigger system. Or you can just be like what I want to be, and ignore it and be happy, and just go, “I can write synchronous
code; this great.” And part of this is, you know, Channels is still a young project, especially by Django’s standards. And a big part of it is
diversity of implementations. So, I mentioned before there are currently three different channel layers. There is a conformance test suite, as well, I mentioned that. I quite like those in specifications. And a fourth one hopefully
being worked on very soon. And that’s, kind of, part of proving this is I want at least two
production-ready channel layers. Only one of them, the Redis one, is properly cross-network. I want at least two web servers that support this specification. You know, having Daphne is great, but the Django project is
not really in the business of running a web server, as well. So, they’re talking to other servers, trying to get them to
integrate support, as well, and having that be part of a wider push. And more optimization: like, channels is not the best at having lots, and lots, and lots of channels being listened on, because it sends them with every request. It’s not particularly good at bulk-sending to groups. Both of these are solvable problems. And, so, I believe in
getting it right first, and fast second. So, you know, we’re right
and a little bit fast. But there’s a lot of room there. It’s gonna get faster
and, hopefully, match or exceed the speed of current Django. ‘Cause at some point I
would like to have it be a no-brainer of like, “Oh,
we should run everything “via this, rather than just
have a split-WSGI thing.” And, also, finally, more maintainers. I am, in many ways, the person who talks a lot about channels. I was very grateful that there are a couple of other talks
at DjangoCon US this year that talked about it. There are not that many
people helping me to maintain the core project as it stands right now. And channels has a lot of problems that are fresh, and
range from easy to solve, to almost impossible to solve, depending on how you’re feeling. But, also, we still have Mozilla funding thanks to the very
generous Mozilla program that I applied for after
this conference last year, and then they just gave us
some money to build this. So, if you are at all interested in channels and want to help fix a few bugs, or write some documentation, or a tutorial, or write some examples up, or fix some horrific, nasty, reality-tearing
things in the middle of it, all those are possible. Please, come and talk to me, and I’d happily talk you
through some of this stuff. And, finally, channels has been here for a while. I am very close to calling what we have 1.0; it’s not quite there, yet. You may not be aware, but channels over the summer became an official Django project, so it’s now at github.com/django/channels. And, so, soon we’ll reach a stable stage where you can rely on the API properly. It’s not gonna change much anyway; it’s pretty stable as it is. And then from there, plan more,
like, what is the future? Maybe, sort of, start moving towards more of a date-based release system like Django has, as well. And with that, that’s the end of my talk. Thank you, very much. (audience applauding) – Thank you, very much, Andrew. Predictably we have a lot of questions. – Oh, great, how many hours we got left? – According to the schedule
we have four minutes. But that’s mostly my
fault for running over. So, we’ll run slightly over ’cause there’s a lot of
interesting questions. Let’s start off with something
relatively straightforward. What are some sort of common-use
cases you can think of for why you want to
start writing channels? – Yeah, that’s a good question. So, not everyone has a need
to write channel stuff. In particular, WebSockets
are still somewhat new, and not everyone wants to use them. So, I would say that if you have any part of the site that currently
uses, like, long polling, or sort of any kind of polling, or any live data, that’s an obvious first choice. What I’m looking to do,
actually, in the near future is implement something like SockJS, which is a layer that sort
of encompasses long polling and sockets for fallback purposes. And, so, what you can do
is, oh, we’ll just use the existing JavaScript that someone else is writing, with a spec they have, plug it into Django, and then you can have easy live data on your site straight away. And a lot of richer apps want that kind of stuff. So, that’s a good first case, I would say. – [Interviewer] So, if you
want to get involved in channels, is there, like, an equivalent
with the Django tutorial, yet? Is that a work in progress? – So, there is a getting-started guide. The documentation, which
is linked right here, has a first document that introduces the concepts in a, sort of, more proper way than I have here on stage. And then the second document is a basic tutorial that goes, oh, let’s write a consumer, let’s fiddle with HTTP responses, let’s see what happens. It’s not as good as any of the
tutorials we have in Django, but it is a good start. I would love to see that
improved if you’re interested in helping out with that, as well. – So, having gone through
those documents at some point, and built my own little
channels and testing things, this is a question I’m
particularly fond of. Why do channel paths start with a slash? – Oh, you mean the URL routing? – In the URL routing. – So, the reason they start with a slash; so, if you can help me, let’s get this guy off the slide here. Ah, there we go. So, paths in channels start with a slash; in URLs in Django they don’t. The reason for this is that this, actually, is not URL routing. It is routing on an attribute in the message. It happens that all the messages have a path in them, and so you can use that. But you can, also, do method; if this was HTTP, you could do method there. You could do reply channel; you can do anything you like. Header-based routing. It’s because it’s not just URLs; I didn’t wanna put in a special case of, well, if the name you look for is called path, then strip the first slash. So, this is more of a general routing thing, rather than a URL-routing thing. – So, I mean, there’s a DEP
in progress at the moment about URL routing, and improving that system in general. Do you think that this, sort of, more advanced system that you built might be applicable to reworking Django’s own URL routing? – I would love to see the two of them come together as one system. You know, I had to write
a, basically, brand new one because the current one
is not very flexible; but hopefully we can fix all
of that in a new rewrite, and have the same systems
shared between both of them, with a flag that says,
“Starts with a slash yes, no?” (chuckling)
(audience laughing) – How is this different to… To Celery, or, sort of, quote-unquote normal message queuing? – So, Celery is a task queue, and, in particular, it
is for offloading things to be done at some point in the future. Channels is a high-speed,
basically, communication layer. So, the latency is a big difference. Celery is reliable and trustworthy, and if you put a message on it, it’s probably gonna get delivered, but it might take, like, a second, or half a second, to get there. Channels is, you know, under a millisecond, really quick, designed for very rapid chat. And, so, they’re different
kinds of systems. They both are kind of message buses, but, you know, many things
are the same underneath. For example, I would say sending an email is probably best done via
Celery if it’s important because Celery has more guarantees. But something like running some stats code is probably better done via
a background channels task because it’s much lighter weight. So, there’s a balance there between different design patterns. – I guess, kind of, building on from that, if you’re gonna do some things
in the background like that, is there a way of making sure
that particular connections, or particular channels, or so on, are only handled by certain
parts of your system, so that you could load balance, you can run it all in one system, but you can load balance your
different areas independently? – Yeah, so, that is definitely in there. So, the command you run
to run a channels worker process, which is called runworker, there are options on that. So, you can say, like, “Oh, only run this pattern of channels.” Or, like, only run websocket.*. Or, don’t run these channels; don’t run http.*. And, so, if you want, you
can balance the process, say, “Oh, this process handles
everything but this channel.” And, sort of, load balance that way. Oh, these servers have more RAM, so we’ll put thumbnailing over here, and then we’ll put queuing over here, and we’ll put handling over there. So, you have that kind
of flexibility in there. – Okay. Has the HTTP over
channels been load-tested? Do we know whether this is better, worse than whatever the
situation we’ve got? – It has thanks to one of our people who signed up to do some help. It has been load-tested; it
came out very slightly slower than Gunicorn, which
as far as I’m concerned is not bad considering. And it’s mostly because
we are taking the request, re-encoding it, sending it over a network, decoding it, and doing handling on it, again. So, it is a bit slower. I would love to see it reach equivalent speed, at least in a similar configuration. It, also, isn’t as great at, say, 1,000 simultaneous connections; it doesn’t handle that as well yet, either. So, there’s room for improvement there. But I fully believe it could get as good as that. – But if you’re looking
at the performance of it, what profiling tools do we have available for distributed systems like this? How do we track how long it actually takes for a message to come in, go on the message bus, get processed, get sent out to something else; how do we profile the system? – We have disturbingly few tools for this. So, for HTTP you can obviously run the standard HTTP benchmarking tools. For WebSockets there is one from the community we’ve been using. There’s, also, one I’ve written, as well, which is a bit more mean: sort of, throws in random non-UTF-8 bytes, and then changes the order around, and tries to see if they
come back in the right order, and stuff like that. Part of this optimization
is writing those tools, writing that sort of
metric stuff to hook in, and go, like, well, what is the latency inside the ASGI layer? What is the end-to-end
latency of the whole system? Is it slow in one direction? Like, that needs a lot of work. – [Interviewer] So,
debug toolbar for ASGI? – Yeah, it’s on the ambition list. There’s an item that says debugging tools for ASGI, yeah. – Within the spec, is there the capacity that it could, at some
point, have an at-least-once mode, instead of an at-most-once mode? And how many components are there that work on this principle? Could there be a different version, swapping some of those things in? – So, it’s not part of
the system, necessarily. At-most-once is a guarantee that you have to code against. And, so, it’s the way things are designed. You could do it; you could reuse the existing
layers and interfaces, but all the code on top,
all of the decorators, all of your consumers
would have to be rewritten. So, you know, imagine you have
a slightly expected failure, so, right now channels expects
failure, it’s built into it. Like the session-handling
cleans up after a while, ’cause it expects failure. And now you have to deal with duplicates. So, now you need a, well,
what happens if you send the same response back
to the client twice? Do we need to write new
client-side code to dedupe things? Do we have a dedupe store
locally to handle this stuff? It’s possible, and a lot
of the same things apply. It’s just like one of those base decisions that if you flip it,
you change the universe of code that comes after it. Maybe you could, and
certainly you could just write a layer that did that, but you wouldn’t be able to reuse a lot of the existing code. – So, I had a bunch of related questions around the choice of particular things. There’s lots of different
things that you can use for transport layer, of course,
there’s lots of different infrastructure stuff that
you could feed into this at the front end. You know, if you got
slack-input, or whatever, is this expected that you’d need, what would be the process of
implementing one of those? And kind of related to that, what do you currently
recommend that you use as your default stack
to run this thing on? – Of course, so, what you
have is basically you have what’s called a protocol server. That’s to, sort of, sit
between the internet and the protocol handling, be it Slack, or IRC, or HTTP. And then that talks to the ASGI backend, and the channel layer behind it. And, so, for each of the things you want to talk to, you need one of those. So, if you were doing a Slack integration, you’d write a Slack protocol server. It sits there with its own process, with its own async framework; I think Slack has WebSocket and HTTP APIs. It opens those connections, and
does all that stuff for you. But the other part of it,
also, is writing a spec because, I mean, you don’t have to, but I highly recommend that
you write a specification that you code the protocol server against. And that’s, kind of, part
of what I’m trying to do with email and that
kind of stuff, as well. As for the default stack,
right now, personally, I would run Daphne for WebSockets only, and then route WSGI stuff,
also, the HTTP, to a WSGI server. So, what you can do is, you have Nginx; you can do a path separation: say, “This path goes to this backend. This path goes to Gunicorn.” That’s how I run it. My personal site, I actually run
this just in full Daphne-mode because there’s nothing
important on there. But once it hits 1.0, and
we get a bit more stable, I would start recommending
just run the whole thing through Daphne. And, hopefully, in the
future I can then say, “Or Server X also has this
support for this natively.” I’ve been talking with
a couple of maintainers of different Python web servers to try and get that conversation going. – Great, thank you. Final question, although there are at least a dozen more
questions I could ask. – [Andrew] I’ll read through them. – So, please do go through them all. I’m going to ask this one because it’s coming from the livestream. – [Andrew] Oh, hello! (Andrew chuckling)
(audience laughing) – How do you send a message from a client through the socket to a new route? It’s not documented
anywhere in said question– – So, how do you send the
message from the client? Say that, again. – How do you send a message from a client through the socket to a new route? – I don’t know what they’re asking there. I think the problem we need more context. – You’ll need to go and
ask that one on YouTube then. – I’ll go in the channel
and try and answer that and get some more background on that one. – Great. Thank you, very much, Andrew! – Thank you!
(audience applause)

