Designing Quality APIs (Cloud Next ’18)


[MUSIC PLAYING] MARTIN NALLY: Welcome
to Google Next, and welcome to this talk
on designing quality APIs. I’m glad to see a
reasonably good turnout. I did once give a presentation
here in San Francisco, and I was scheduled in a
back room at the same time as the keynote presentation with
some reasonably famous speaker. I can’t remember who it was. And when it came time
to start the session, there was exactly one
person in the audience sitting in the front row. And I was just
sort of considering what to do at this point
when some fellow wandered in through the back
door, and it was fairly obvious from
the way he was behaving that he was completely lost. He didn’t know where he was. So I gave him an
extremely warm welcome and thanked him for
showing up to my session. So of course at this point,
this poor fellow’s stuck. I’ve sort of cut off
his line of retreat. So anyway, the three of us sat
in the front row and just sort of chit-chatted for an hour. So I’m very happy
that so many of you have turned up for this talk. So I assume you’re here because
you’re interested in APIs. And if you’re
interested in APIs, you’ll know that there’s lots
of controversy around APIs, and there’s lots of conflicting
advice on what to do. My experience is
when that happens, it’s often the case that
people aren’t really disagreeing about technology. The reason that they disagree
is that they come to the problem with different ideas
and different values. They have different objectives. They’re trying to solve
different problems. And so since they’re
actually trying to solve different
problems, they tend to disagree
about the solutions. So I thought I’d take a
moment to talk about some of the different things
that people are actually trying to achieve
when they approach the problem of
developing an API. This is a picture
of Richard Hamming. You’ve probably heard
of Richard Hamming. He’s a famous person
in computer circles. He was a sort of
mathematician and scientist of the 20th century. He is famous for
Hamming codes, which are error correction and
detection codes which are actually still used today. They’re used in RAM
memory, and they’re used in wireless protocols. One of the things that
Hamming was famous for, in addition to Hamming
windows, Hamming distance, Hamming codes, he was famous
for a series of lectures that he gave. And you can find those
lectures on the internet now. There’s versions on YouTube
as well as written versions. And one of the most famous is
called “You and Your Research,” which is his advice
to researchers on how to maximize their impact
and manage their careers. And Hamming himself says that
although his experience is from research, his advice
is useful in different kinds of domains. So Hamming makes some important
points, some useful points. If you really want
to have an impact, then you should make
sure that you’re working on problems
that are really important problems
in your field. If you’re not working on
the important problems, it’s unlikely that you
will do important work. Another point that he makes–
in fact, this is a great talk. If you’re not familiar
with Hamming’s lecture, then you should go and spend
some time looking at it. In fact, the truth is if you
only had one hour to spend, you’d probably do better to
spend it listening to Hamming than listening to me. But I’m going to assume that
you have two hours to spend. So you’re going to spend
one hour listening to me and then one hour
listening to Hamming. The other thing that
Hamming talks about is this idea of having
an attack on a problem. And for him, an attack is kind
of a plausible approach that might lead to a solution to
the problem or, at least, make a dent in the problem. And in fact, he says that
the three greatest problems in physics were problems
that he never worked on and really nobody else ever
worked on during his career. And those three problems were
time travel, teleportation, and anti-gravity. And it’s not that those weren’t
problems for which a solution would have had an impact. It would have actually
changed the world, and it would have given
you a reputation that would have lasted centuries,
along with your Nobel Prize. But the problem is
nobody has the faintest idea how to attack
those problems, how to approach those problems. And therefore, it’s a waste
of time working on them. So good problems have
the characteristic that they are really
important in their impact, and you have some idea
how to tackle them. So if we bring this back
to the world of APIs, one very common way
to think about an API and the purpose of
an API is that it’s a way that two
pieces of software can communicate
across the network. I have a client and a server. And the client
needs to get at data or to invoke actions
on the server. And we want the communication
to be fast, secure, and easy to program. So many people come at APIs
with that as sort of the problem definition. And that’s actually not, at
least by Hamming’s definition, an important problem
in our field. That doesn’t mean that
it’s not important. I’ve worked on many
projects where those were– solving that problem
was important. But this is actually
a solved problem. There are several good
technologies for doing this. You just have to
pick one, if you think that this is what
your primary problem is. Here are some different
goals that you might have if you’re defining an API. And these are important
problems in our field. And these are unsolved problems. So one important problem is that
software is just basically hard to change. And every enterprise
that I’ve ever worked in or every enterprise that I’ve
had contact with and worked alongside with, which
is a much larger number, has had the problem that they
are essentially prisoners of their existing software. Unless you are in the first
few months of a startup, you have a legacy of
existing software. And it’s never what you want. It’s never what you want
for your business needs. The business scenarios change. The business needs change. And it’s never really what you
want from a technical point of view either. Once you’ve done
something, you always immediately learn how you
could have done it better. But you’re stuck with it. And it’s difficult to change
because it’s complicated. And it’s difficult to
change because in almost all the systems that
we have, they’re permeated with assumptions. We have difficulty containing
the assumptions that we make, whether those are technical
assumptions about which databases or which
technologies or which implementation languages
we use or whether those are business assumptions about
what actually the scenarios are that the system has
to support, which is also something that changes. Software is also very
hard to integrate. And integration is a huge
issue, especially in– well, in all of the enterprises
that I have worked in. Almost all the
enterprises that I’ve worked with have some
kind of initiative to get a 360-degree view of the
problem, which is basically, how do I integrate
all the information that I have about a customer? Or they’re trying to integrate
the total experience. They have many points of
contact with a customer, and they’re trying to integrate
those things together. Many of you will have traveled
here to the conference. When you made your
booking, that’s a big integration app dealing
with hotel reservations, your credit card, your flight
reservations, and things like that. And when you go back and
do your expense report– assuming that you have
a reasonably modern sort of expense report tool, which
I’m sure most of you have– that’s another big
integration app. It’s integrating your
credit card bills. If it’s a nice
modern application, your flight receipt
and your hotel receipt will already be known
in your expense report and associated with your
trip, that kind of thing. It’s all a big kind
of integration app. So a huge amount of the
development that I see is, in fact,
integration development. We build up these little
silos, and then we realize that there’s a
large amount of value if we can integrate those silos. Software’s hard to integrate
often because basic APIs are not available. So it is difficult when
you can’t get at the data that you want. And also, because it’s
an n-squared problem, every system has
its own unique API. So if you’re trying to
do an integration thing, you get to learn the APIs
of all of the systems that you’re trying to integrate. I believe that
some styles of API go beyond the first problem. They go beyond simply an
efficient, easy-to-program communications between two
distributed components, and they offer a
significant attack on those two major problems. Of course, there are other
major problems in software. Like software quality
is a huge problem. But I don’t have an
attack on that one, so I left that off my list. But I think APIs
provide an attack on those two major problems. There are two dominant
styles of API. One is the entity-oriented
style, sometimes called REST. I’ll come back to
that word in a minute. And in that style,
the client code manipulates the entities that
are exposed by the server, and the behaviors
of the system are hidden behind those entities. What I deal with is the
entities themselves. I can create entities. I can update entities. I can relate entities
to each other. What the system actually
does is a side effect of those entity-based
sort of operations. An older model is the
procedure-oriented model, the sort of RPC model. And that one is the
perfect inverse. In that one, what I interface
with is the procedures that you provide. And the entities that those
procedures are talking about are hidden behind
the procedures. So one puts the entities
in front of the behaviors. The other one puts the behaviors
in front of the entities. They’re inverses of each other. Here’s my kind of
stylized thing. And actually, you’re probably
annoyed at this chart, and I’m a little
annoyed at this chart. Because it’s trying to make a
point which I actually believe is true. And the point that
it’s trying to make is that in the
procedure-oriented style, typically what
you end up with is a very large number
of these procedures that you have to master. So there’s a lot of these
basic procedures or functions that you call, that you
learn a long list of. The entity-oriented
style has more structure. And the problem with this
chart is that it’s not really balanced on each side. The procedures that
I’m showing actually imply a number of
different entities. There’s the dog. There is the owner of the dog. There’s the veterinary
surgeon that deals with the health of the dog. So if this were
really a balanced one, I would have multiple entities
on the left-hand side. But the point that
it’s trying to make is that there’s
much more structure on the left-hand side. The left-hand side is like an ER
Diagram, an Entity Relationship Diagram. I understand what the
basic entities are. I understand the
relationships between them. But the actual methods that
I call are very limited, and they’re always the same. The procedure-oriented
style is very similar to learning the
libraries of a programming language. And I think this is
one of the reasons why RPC has been perennially
popular with programmers, because it’s very familiar. You learn a
programming language. You learn the libraries of
the programming language. And learning
a distributed API that is written in the
RPC style is just like learning another library. It’s not really very,
very different from that. But one of the characteristics
is all of those APIs are different. The entity-oriented style
is much more similar to learning to use a database. So one of the characteristics
is if you use whatever your favorite database is– MySQL, or Postgres, or
Oracle, or whatever it is– a fundamental characteristic
is that there is only one API. It doesn’t matter how
many databases you have. The database management system
that hosts that database defines a single API that’s
used for all of them. So you only ever learn one API. Now, when you go from
database to database, you have to learn the
schema of that database. You have to learn the tables
and what are the tables. And then, you learn, what are
the columns in that table? Or if you’re using
a NoSQL database, the terminology
changes a little bit, but the concepts don’t
change very radically. So the entity-oriented
approach has this very interesting
characteristic of solving the
n-squared problem. Instead of having an
individual API for each system, I have one universal API that’s
used with all of the systems. And I gave this talk– I practiced this talk. It’s not very common for
me to practice talks, but Google wants us to do the
best that we can for you guys. So I was encouraged
to practice this talk. And when I gave this
talk internally, somebody said, are
you really telling me that my API should
just show– it should just be a view onto my database? And the answer is no,
that’s not what I’m saying. What you actually
store in your database is probably more complicated. You may have multiple
tables for each entity. You may not even have
tables for your entity. Your entities may be assembled
from information that you get from other services. So it’s not about taking
your database schema and just exposing
it as your API, but it is about using the
same kind of concepts, the same conceptual
model for an API that you see used
for a database. So if we’re going to attack
these important problems, our problem was
it’s hard to change, and it’s hard to integrate. Some of the
characteristics from APIs, especially the
entity-oriented APIs that will help us here is– one
is having a uniform API across all applications. That helps with our
n-squared problem. Another one is having a uniform
model for relationships. I’ll talk a little bit more
about why relationships are so important in this. And then the third
one is staying free of implementation assumptions. So how do we get that from this
kind of entity-oriented API? I said I’d come back
to the REST term. REST API is actually a
little bit of an oxymoron, because one of the primary rules
of REST is– what was REST? REST was one man’s
attempt to write down the architectural principles
that underpinned the HTTP web. So it’s really a description
of the architectural style or the architectural
principles of the HTTP web. And one of those principles
is uniform interface, which means that it really
only is one interface. So when we talk
about our REST API, there’s only one important
REST API in the world, and that’s HTTP. So when we talk about REST
APIs, what we’re really talking about is,
how do we just use HTTP– just use HTTP
exactly as it is and apply it to
the domain of APIs? Really what we’re
saying is don’t layer concepts on top of HTTP. Just use it as is. It is the universal API. Unfortunately, HTTP
has some gaps here. HTTP actually tells you how
to do CRUD, in a nice way. HTTP does not tell you how
to do query, for example. And query turns out to be very
important for almost all APIs. And neither does it tell
you how to do versioning. So there is some amount
of invention required beyond the sort of universal API
that is provided to us by HTTP. How do you actually leverage
this uniform API, this HTTP API? Well, again, it’s
a lot like what you would do with a database. The first thing you have to
do is you have to define a few well-known URLs– for example, /customers,
/accounts, /orders, which is very analogous to
defining what the tables are if I’m defining a
database schema. And then, I have to define the
actual resource schema, which is very analogous
to defining, what are the columns for each
one of those entities? And then the rest of the problem
is don’t mess up the basics. So when we have to start adding
query and adding versioning and things like
that, do it in a way that’s compatible with this
standard API that we want to– that we want to show. So here’s something
that’s very typical. So here is a simplified version. And if you’ve worked at all
in sort of HTTP JSON APIs, I shouldn’t think
there’s very much here that’s not familiar to you. But the thing that I’m
highlighting is this id:12345. And when you do
that, you’ve already stepped outside the
uniform interface. And the reason is that
the uniform interface that we’re trying to use, the
HTTP interface, requires URLs. It doesn’t require
simple numeric IDs. So if I’m going to write
my identifiers this way, I’m going to require extra
information about my API. I’m going to have to
describe in the documentation how you take one of those
IDs and how you transform it into a URL that can
then be used in the API. And this was unfortunate,
because it would have been just as easy to do that. And if I’d done that
instead, then now, I stay within the uniform API. I don’t require you to have
extra out-of-band information about my API in order to use the
universal API, which is HTTP. So very small change,
but these things sort of make a difference
when they’re accumulated. Relationships are
extremely important. Relationships are
extremely important when you’re designing a
database, and they’re also extremely important when
you’re designing an API. One of the reasons
is that APIs– the fundamental problem in
most integration projects is creating relationships
between entities that are stored in
different applications. Sometimes, I’m trying
to get at function through a particular API. The integration examples
that I gave earlier– for example, your expense
reports or your travel booking, are all about relating
entities to each other. I’m taking this flight along
with this hotel reservation. And I’m linking them all
together with this trip ID. So I’m relating entities
in different systems. Relationships are
very fundamental to the way we think. Thinking about
the world in terms of entities and
relationships is an idea that goes at least back
to Plato and Aristotle. I put their pictures
here just in case you come across them
at the conference and want to talk to them. Relationships
depend on identity. I cannot reason about a
relationship unless I can reason about identity. I can’t say John loves
Mary unless I can reliably identify John and
reliably identify Mary. And so one of the
primary obstacles to these kind of
relationship-based integrations through APIs is the struggle
that we have with identity. And here, the universal
API, the HTTP API, makes a huge contribution by
defining the concept of a URI– or a URL, which gives
us the ability to, across all of our
systems, write identifiers in a way that’s standard and
repeatable and storable and all of those kind of things. But again, we tend to
make mistakes here. So here’s a particularly– I’ve got the wrong pointer here. [INAUDIBLE] Didn’t
mean to do that. So here, focus on
this last line. So this is a very typical
way that you’ll see people writing or attempting
to write relationships in a typical sort of JSON API. They write ownerID. OwnerID is the field that I’ve
used to capture the owner. And I give its
numeric identifier. Now, the problem with this
is that this identifier, I need a lot of
contextual information. I cannot just use the uniform
API to follow this link, because I need
contextual information. I need to know
which other systems can I put this value into? And how do I plug that
in a way that creates a URL that I can then use? And in fact, this breaks
down in different ways. I mean, supposing I had
different kinds of owners– you know, some owners
are police dogs, right? Or rather, police dogs have
a different kind of owner. In that case, the owner of
the dog might not be a person. It might be the police
department or something like that. So how do I know to
resolve this number by looking in the system of
institutional owners as opposed to personal owners,
those kind of things? And we could have avoided
all of those problems if we’d simply
written it this way. In other words, owner’s the
relationship, and give a URL. If the URL’s in the same system,
you can write it relatively. I can resolve a
relative URL knowing only the standard specifications
published by IETF. I don’t have to know anything
specific about your URLs or your systems to do that. Or if this references
something in another system, then I write the URL
in its full form. And you might think about how
do you do this kind of example with RPC? With RPC, it’s probably
even more complicated. With RPC, you have to
take something like this. And then, you have
to know, amongst all of the remote procedures
that I could call, which are the ones that
will take something like this as an argument? So having a uniform
model for identities is extremely important. And the simplest way
to do that, to leverage the universal interface, HTTP,
is to make every ID a URL. Another thing you
should do is not to confuse identity with name. So thinking about identity
is very important. It’s very important if you’re
going to make this thing work. So think carefully
about identity, and don’t confuse
identities with names. So what is a name? Oh, here’s a nice description
of what names mean in English going back to 1597. This is “Romeo and Juliet.” You all know the story, right? In fact, this is one of
the most famous passages in all of the English language. And so Juliet would have
made a terrific API designer. She thinks very carefully and
very clearly about concepts. Now, what she’s
saying to Romeo here is that your name
doesn’t really matter. Your name does not
say who you are. And you can cast off your name
if your name’s not convenient– you know, “doff
thy name,” Romeo– which is very different
from your identity. And so names are terrifically
important things. They do have a place in APIs. Many well-designed
APIs support names. But they do not confuse
names with identities. So here’s an example. Here’s a good identity. This is a pretty good identity. This is a very simple– doesn’t matter what kind of
resource it is. No amount of change– this pet– well, maybe there is
a problem with this, actually. So for example, if
I have a goat that’s part of a herd that’s really
just one of my possessions, and then I take pity on one
of these poor sick goats and take it into my
house, it becomes a pet. So something changed here. So this actually
might not be a pet, might not be a
very good identity, because it refers to
a characteristic that might actually change. Something that was
a pet might not be a pet if it’s just put out in
a field with the other animals. The important thing
about identities is that they are immutable. They’re not based on
some characteristic that might change. I see lots of people
trying to make identities that look like this. And it’s very attractive,
because this kind of identity– one, two, three, four, five– nobody can remember that. But you can remember
that Lassie was a dog, if you remember the
Lassie story from the movie. Lassie, though, was a dog that
belonged to Joe Carraclough. Unfortunately, Lassie was
taken away by the Duke of– I forget, whatever. So not a very good
identity, because Lassie changed ownership in the movie. But it is useful to
have URLs like this, because it’s very useful to be
able to look things up by name. But you have to be aware
that names can change. And in fact, that
brings us to query. So one of the things
that I warned you about is basing APIs on HTTP is
a very powerful technique, but there are gaps in
HTTP, particularly around how you deal with query. And in fact, this URL, which is
basically saying, find the dog, or the pet, called Lassie that
belongs to Joe Carraclough, can be rewritten this way. It’s really just a
stylistic difference. But both of these are queries. This is not a URI in the sense. Well, it’s a URI of a
search, not the URI of a dog. And if you write
that URL this way, it doesn’t really
change the meaning. But it does make it a
little bit more obvious that, in fact, this is
just the URL of a query. It’s not the URL
of a particular dog. And because it’s
a query, it’s not guaranteed to get the
same result twice. Let’s talk about representation. So representation
in HTTP is the word that is used to mean the
actual format of the resources. So if I do a GET on a particular
resource, what I get back is the representation. In the HTTP world, JSON is the
most common sort of language that we use for writing
down those representations. But one of our design
problems as an API designer is, what should
the JSON look like? And there are lots of options. If your focus is the first
focus that I talked about, that your primary reason
for building an API is for efficient
communication between two components on the
network, then the kinds of things that you’re
going to focus on is you want things
that are efficient. You want things that
are easy to parse. You want a good match for
the programming language on each end. Those are going to be your
sort of your primary values. If your focus is, how do I
make things easy to integrate, and how do I make things
easy to change in the future, your focus is going
to be different. Your focus is actually
going to be independence from technology,
because you may want to anticipate different
clients of the same API that will show up at different
points in history. And resilience to change is
much more important to you than things like
ease of parsing. And sometimes, these
are in conflict. So you may have to make up your
mind what’s important to you. If you’re working directly
with the implementation teams, they have a lot of biases. And they will push
for certain things. They want homogeneous
collections, for example. They want things that
are easy to parse. And especially if you’re working
in a typed language, like Java or Go or something like that,
it’s easier to parse things. It’s not impossible to parse
things if you don’t have this, but it’s easier to parse
things if you know ahead of time what the types will be. So don’t give me
a collection that has a lot of different
types of things in it. I don’t want the famous
“Sound of Music”– you know, “My Favorite
Things,” “Raindrops on roses and whiskers on kittens.” So those are all
different kinds of things. Don’t put them all in
the same collection, because I find that
harder to parse. Closed content– don’t tell
me that you don’t know ahead of time what all the fields are. Because in my parsing
routines, I really like to know what all the fields
are going to be ahead of time. So you’ll get these
kind of things. I’ll show you what I mean
by indexed collections. Union types are unpopular. Don’t give me a field
that’s sometimes a string and
sometimes an object. I don’t really like that. Makes it harder for me to parse. So you get lots of these
kind of constraints coming from developers of things
that they don’t really like. As an API designer,
if your goals are the big problems
that I talked about, I think you have to push
back on some of these things. Some of these things are fine. It’s not always a
good idea to have a single variable
that’s sometimes a string and sometimes a number. But if you think it makes
conceptual sense, then maybe you should push back on
the implementation teams who’ve got specific constraints. Here’s an example of what I
mean by indexed collection bias. And this is actually– I copied this out of
an old Facebook API. This may not be current. But this is a Facebook API
for listing photographs. And you can see the way
the data is organized is actually reasonably
complicated when I do it with my pointer. What happens is that the
outer name in the JSON thing is actually the user ID. And then, the inner
stuff is a set of photographs for that user. And then, we start with
a second user here. And then we have a block
that’s the list of photographs for that user. And when you design JSON this
way, what you’re really saying is that you’re favoring
one particular access pattern for the data. You’re assuming that the
primary usage by the client will be that they’ll want to
access the photographs based on the user ID. If they wanted to access
the photograph based on creation time or
something like that, this is actually not a very
friendly data structure. So within this data
structure, you’ve sort of advantaged one
set of access patterns and disadvantaged another
set, which might be OK, but it might have been better to
do something simpler like this. Flatten the whole thing out. So if I had flattened the whole
thing out– well first of all, it’s simpler, because
it’s much flatter. And what I really did was
I took these user IDs, and I moved them into the block. This is going to be
less convenient to use if my primary access method is
accessing the data by user ID. But if I’m accessing it
by some other criteria, it’s actually better. I actually prefer this. So I prefer the second form. Because I’m more open to change. If I have different users that
have different access patterns, I haven’t penalized
the second set of users because I optimized
my data representation for the first set of users,
even though the first set of programmers were
probably pushing me to do something like this. Because they wanted
something that was convenient to
their access pattern. I’m very public-spirited. And I have taken the
trouble to write down a set of possible rules. I’m not claiming that these
are the only rules that you can– that need to follow if
you’re going to use HTTP well. But I tried to write
down a set of rules that explain how you can
use JSON in, perhaps, the most simple form possible
that might be helpful. And the reason I like it
is it’s more abstract. It’s less geared towards one
particular access pattern. So query’s not
dealt with by HTTP. Neither is versioning. And versioning is a
very confusing topic. One reason is there’s at least
three meanings of versioning. And it’s very
common when you hear people talking about
versioning for people to talk past each
other, because they’re each thinking about
versioning a different way. They’re each thinking about a
different aspect of versioning. And you sometimes find people
even getting themselves confused by using
the word versioning in two different ways
in the same sentence and getting muddled. So one kind of versioning,
I have dubbed here entity versioning, which says,
there are multiple versions. So I have a particular
kind of an entity. I have a travel
trip or something, to stick with our
previous example. And I introduce version
one of travel trip. And then, that was successful
for a couple of years. And then, I had a major upgrade,
and we completely reworked our data model. Because we reworked
our data model to accommodate some
more advanced scenarios, the old trips and the
new trips are different. So I introduced a
version two of my system, but when I introduced
version two, I started creating
version two trips. So every trip is either a
version one trip or a version two trip, but they’re not
both at the same time. So they’re a different
set of entities. So that’s a very
common thing that happens when you make a major
change to your data model. Another kind of
versioning is what I have dubbed– this is
just a personal vocabulary– but I have dubbed
format versioning. And what happened there
was I introduced trips, and I gave them one
JSON representation. And then I realized from
feedback from my customers, they didn’t like the names that
I chose for the properties, or they didn’t like the way
that I organized the JSON. Maybe from the previous example,
I’d organized it the first way, and they really preferred I
organized it the second way, or the other way around. So I decided to
introduce a version two, but the important thing
is that each trip can be viewed either through
the version one format or the version two format. They’re not two
different sets of trips. They’re the same
trips that can be viewed in two different ways. The third kind of versioning
that you hear about is where you keep prior
revisions of the same thing. So if I have a document,
a Google Doc, for example, and I keep changing
my Google Doc, I may have a long revision
history for that thing. And you use these
in different places. You use entity
versioning if you made a significant structural
change to your data model. You use format
versioning if you’re making small changes
to the representation of individual entities. And you use
historical versioning if you’re doing things like
creating an audit trail or doing undo/redo. And they’re all different. So having got clarity
on that, let’s look at format versioning,
because that’s the one that I think people
do most commonly with APIs. And let’s look at some of
the mistakes that they make. So I’m doing format versioning. So one thing that you’ll see
people commonly doing is this. They put a version
number in a URL that they then use to
try to make a link. And that doesn’t
really work well. And the reason that
it doesn’t work well is that just because I
wanted version one of Lassie does not mean that I want
version one of Lassie’s owner. And in fact, there
may not even be a version– let’s say I asked
for version two of Lassie. There may not even be a
version two of Lassie’s owner. Owners may only
have a version one. So it’s not possible
for the server to select the version
to use in a link. So what that means is you’ve
got one of two choices. Either you don’t put version
numbers in links at all. That’s one possibility. So that will solve the problem. Or for each entity, you
need to provide two URLs, one that does have the version
number in it and one that doesn’t. And the one that does not
have the version number is the one that you
will use in the links. And then, customers or
clients can transform that into a URL that does have a
version number if they want to select a particular version. So either of the
strategies will work. But what will not
work is trying to use URLs that have version
IDs in them in links. So you might protest
that if I do this– OK, let’s say I take
all of this stuff out. So I’ve taken all the
version numbers, or version identifiers, out of the URLs. But then, I’ve lost
some information. Well, you can put
it back if you like. Invent a property instead for– my pointer just stopped working. Has it? Yes. Invent a property that tells
you which version number– or which version
format you’re looking at when you’re looking at this. And if you want
to give out an ID and also tell me exactly
which version I’m looking at of the current one
or URL of the current version, you can use something like this. And these values
actually probably fit better with
the style of HTTP if we put these in headers,
but there’s certainly no harm in having them
in properties also. A lot of what we’re doing here
is just not messing things up, right? We’ve got this universal
API called HTTP. We know how to work it. It works based on URLs. Just follow the pattern. So as you are designing
things and you’re having to add things that
are missing out of HTTP, do it in a way that doesn’t
mess up the basics of how HTTP is actually used. So the right way
of doing versioning is every entity has a
version-free identity URI. You can allow clients to use
the version-free identity URI directly by having
an accept version header. Or you can allow them
to transform the URL into a different
URL that does have the version identifier in it. If you do that,
then you’re having to provide basically a
transformation algorithm that’s specific to your API. It’s not a generic part
of the protocol anymore. So I’m going to wrap up here
and invite you to ask questions. But my conclusion
slide is decide what you really care about. If you care primarily– in
this particular application that you’re doing,
if you care primarily about efficient communication
between tightly-coupled components, there are lots
of great choices out there. gRPC is the one that’s
favored by Google, and it’s a great choice. If what you’re focused on
is latency and throughput and efficiency and
ease of programming, RPC does all of those
things very well. If you’re aiming for something
more abstract that you’re creating your APIs
because you’re primarily interested in decoupling
systems so that they’re more adaptable to
change in the future, or you’re primarily interested
in creating APIs that will enable integration scenarios–
and actually one of the most important integration
scenarios which I failed to mention
earlier is the one when you’re publishing an API
to your partners or customers. So if you’re creating APIs
that will be consumed only internally within
your own– or you think will only be
consumed internally within your own
organization, it may be that the top set of
concerns might be the ones that dominate your thinking. Although, actually, I would
challenge that, but anyway. But if you’re creating APIs that
will be used by other people, that will be consumed by
development organizations over which you have no control– you can’t control the rhythm
at which they change things. You can’t control the way in
which they version things. And you cannot anticipate all
the different ways in which they will access your data. If that is what
you’re designing for, my recommendation is
don’t use an RPC model. Use an entity-oriented model. And in particular, use HTTP
and JSON as directly and as simply as possible, requiring as little
extra information as possible about your specific API that
they cannot glean directly from the internet standards. So that’s my talk. [APPLAUSE] [MUSIC PLAYING]
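
The format-versioning advice in the talk (version-free identity URLs in links, with the version chosen either by a header or by a client-side URL transformation) can be sketched roughly as follows. This is only an illustration under stated assumptions: the URL shapes, the property names (`self`, `formatVersion`, `ownerLink`), the `Accept-Version` header spelling, and the `/v2/...` prefix are all hypothetical and are not taken from the talk's slides.

```python
# Hypothetical data and names throughout; nothing here comes from a real API.

# One stored entity per identity URL. The owner link is version-free.
DOGS = {"/dogs/lassie": {"name": "Lassie", "owner": "/persons/joe"}}

# Formats available per entity type; owners (persons) only have version 1.
SUPPORTED_VERSIONS = {"dogs": {"1", "2"}, "persons": {"1"}}


def render_dog(identity_url, version):
    """Render the same dog in the requested format; links stay version-free."""
    dog = DOGS[identity_url]
    if version == "1":
        body = {"name": dog["name"], "owner": dog["owner"]}
    elif version == "2":
        # Same entity, different property names (format versioning).
        body = {"displayName": dog["name"], "ownerLink": dog["owner"]}
    else:
        raise ValueError(f"unsupported version: {version}")
    body["self"] = identity_url      # version-free identity URL
    body["formatVersion"] = version  # tells the client which format it received
    return body


def handle_get(path, headers):
    """Option 1: the client sends the identity URL plus a version header."""
    version = headers.get("Accept-Version", "1")
    return render_dog(path, version)


def to_versioned_url(identity_url, version):
    """Option 2: an API-specific transformation the client applies itself."""
    entity_type = identity_url.strip("/").split("/")[0]
    if version not in SUPPORTED_VERSIONS[entity_type]:
        raise ValueError(f"{entity_type} has no version {version}")
    return f"/v{version}{identity_url}"
```

In this sketch a client that asked for version two of Lassie still receives `/persons/joe` as the owner link, and it decides for itself which version of the owner to request. Asking `to_versioned_url` for version two of the owner fails, which mirrors the point that the server cannot pick a version on the client's behalf when it builds a link.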


2 thoughts on “Designing Quality APIs (Cloud Next ’18)”

  1. I use JAX-RS to automatically marshal and unmarshal JSON into Java objects. Having a string like "/pet/12345" as the ID instead of just 12345 would mean the parser won't work, or I will have to create an additional property in every entity to hold the string value and then extract the integer ID in post-processing. Same for the other direction. Plus I would probably have to name the string version of the ID "id" and the native integer ID "int_Id" or "_id". So the string hack takes up the proper name "id" and the actual integer ID gets a second-class name. The clean flow from database to JSON using generic code is also disturbed. Any suggestions?
