This is the million-dollar question: what kind of a service do I need for my next project? REST is cool, but RPC is familiar. JSON is lighter, but the client already works with XML. The API will be used by mobile consumers, or web consumers, or a reporting engine, or all of these.
There’s rarely a clear-cut “one true way” when picking the best solution for a given API, but there are some key elements that can influence how to choose a solution that will be a good fit. API design is mostly engineering, with a generous dash of common sense required as well.
The big questions you need to ask at each step are these:
Who will be using this API?
What are they trying to achieve?
Which technologies do they use?
A few key points are worth considering when planning an API; they will help us answer these questions and deliver an effective API.
The API will be needed and utilized by users, not developers. Start out with a comprehensive set of user stories about how users intend to use the API, the kind of people that will be using it, and the technologies they are likely to want to use. Beginning from the point of view of what a user wants to achieve, rather than a developer’s preferred toolchain, is more likely to give a good API outcome.
The term “dogfood” is a bit of a strange one but it means to incorporate the tools you make and publish into your own workflows where you can. This means not building an application with its own business logic and a separate API for outsiders to use, but instead building an API that is used both internally and externally. If you don’t need to make use of the API, it’s a good idea to at least build a sample application and think about the challenges a user might encounter.
Don’t plan a huge API and publish the whole thing at once. Decide which of your user stories constitute the MVP (Minimum Viable Product) for your consumers, and start with that. In Chapter 14 we’ll cover all the other various parts of delivering the API besides just the API itself. Starting small gives the development team a great chance to make sure that the documentation, deployment, testing, and all the other associated tools are also in place and working well.
The first decision to make when designing any API is one that can’t be changed: decide what kind of a service you will offer. This depends on a combination of the audience and the features that will be offered.
If the service mostly deals with creating, fetching, and manipulating data, then a RESTful service should definitely be a candidate in your design decision. In Chapter 8 we discussed how everything in a RESTful service is either a collection or a resource, and if the service you have in mind mostly deals with things or groups of things then REST is going to be a great fit. It’s increasingly widely used and there are some excellent resources available specifically about designing RESTful services, independent of which language you implement it in (but PHP is a great choice).
For an API that does actions rather than working with things, an RPC service might be a more logical choice to make. RPC services are a very familiar paradigm for developers of all kinds and so they can offer a shallower learning curve to people integrating with external APIs for the first time. It might be useful to look at some of the existing services cited in Chapter 7 to get an idea of whether those feel like the kind of thing you are aiming for.
Another consideration is how easy your chosen type of service will be to build, deliver, and maintain, given the technology stack and abilities of the consumers. Some technology communities are much more familiar with SOAP, for example, and would prefer a service with a WSDL that will bolt in nicely to their existing platform, whereas a mobile developer would probably prefer something more lightweight such as a RESTful JSON API.
A SOAP service will always use XML, but for RESTful or RPC services, the data format that fits best can be chosen. The most common options are JSON and XML, but there are also services that handle incoming form-encoded data formats, outgoing HTML formats, serialized PHP formats, YAML, and even plain text.
We saw in Chapter 6 some examples of XML being used with an RPC service, and SOAP is XML underneath. However, XML has plenty more applications than just SOAP, and can be used as the data format (or a data format) in any one of a number of different styles of service. XML allows us to mark up elements with child elements, character data, and also attributes, but produces quite a large data size in return. Therefore, XML would do well when the bandwidth used for the transfers isn’t slow or expensive, and the devices consuming the data have enough memory and processing power to handle and parse the data.
JSON is great for JavaScript applications, but they’re not the only target market for this format. The majority of scripting languages have built-in support for JSON and will be able to serve and consume this format easily. JSON is also a great choice for mobile applications, where the smaller overall data size and simplicity of parsing the format are very useful for less powerful devices on potentially slow, patchy, or expensive connections.
HTML as a data format is an idea that isn’t found in many textbooks, but certainly shows up in the real world on a regular basis. In its simplest form, we might return HTML in response to an AJAX request from a webpage, perhaps showing some new content in HTML on the page (something that you may already feature in your applications). It doesn’t take a huge leap of faith from this to providing HTML as an optional output format for an API, if only for reading data. An example of this is found in the RESTful Joind.in API, where HTML is offered as an output format; if you request http://api.joind.in from your browser, the API reads your Accept headers and returns the data as HTML, with the hypermedia presented as clickable hyperlinks. This serves as excellent documentation for your service.
Accepting incoming requests from a web form, or in that format, can also be very web-friendly if the users of the API are mostly web developers and it is likely to be used mostly with or from a web page. This is a step away from the pure idea of exchanging data between machines, but can be a valuable option depending on the audience of the API.
If the user stories show that different consumers will want different data formats, then the API will need to return multiple formats such as XML, JSON, and perhaps HTML as well. This approach has major advantages because every consumer of your service will be able to ask for the data in the format that is right for their scenario, using content negotiation via HTTP headers to indicate what the right format is. An application that takes care to make use of common templates or output handlers for each data format, used by every response sent, will be able to consistently return data in multiple formats.
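The idea of shared output handlers can be sketched in a few lines. This is only an illustration, not how any particular API does it: the handler names, the `response` root element, and the `render` function are all assumptions made for the example, and the XML handler only copes with flat data whose keys are valid element names.

```python
import json
from html import escape
from xml.etree import ElementTree as ET


def to_json(data: dict) -> str:
    return json.dumps(data)


def to_xml(data: dict) -> str:
    # Flat data only; each key must be a valid XML element name.
    root = ET.Element("response")
    for key, value in data.items():
        ET.SubElement(root, key).text = str(value)
    return ET.tostring(root, encoding="unicode")


def to_html(data: dict) -> str:
    rows = "".join(
        f"<dt>{escape(str(key))}</dt><dd>{escape(str(value))}</dd>"
        for key, value in data.items()
    )
    return f"<dl>{rows}</dl>"


# One handler per media type, used for every response the API sends.
OUTPUT_HANDLERS = {
    "application/json": to_json,
    "application/xml": to_xml,
    "text/html": to_html,
}


def render(data: dict, media_type: str) -> str:
    # Unknown media types fall back to JSON (an assumed default).
    handler = OUTPUT_HANDLERS.get(media_type, to_json)
    return handler(data)
```

Because every endpoint goes through `render`, adding a new format means adding one handler rather than touching each response.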
The links to resources and related data in an API are called hypermedia and are an excellent feature to include. In a RESTful service particularly, every resource is identified by its URI and so this data can be given as part of the response data. In this way, consuming clients can follow links, rather like a user clicking links on the Web, instead of assembling the next URL from the instructions and concatenating ID fields into it. Hypermedia makes the whole experience smoother and easier for consumers by offering the ability to find their way around easily. For example, using the previous data set, the following actions are available:
Look at this resource, and then visit the comments_uri to see the comments made on this talk.
See more information about the event this talk belongs to by visiting the event_uri.
From there, follow another piece of hypermedia in the talks_uri field to see a list of other talks at the event.
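The steps above can be sketched as code. To keep the example runnable without a network connection, a dictionary stands in for the HTTP client; the `follow` helper, the `FAKE_API` contents, and the exact talks_uri value are assumptions made for illustration rather than real Joind.in responses.

```python
from typing import Callable


def follow(resource: dict, link_field: str, fetch: Callable[[str], dict]) -> dict:
    """Fetch whatever the named hypermedia field of `resource` points at."""
    return fetch(resource[link_field])


# Stand-in for real HTTP requests: URL -> decoded response body (placeholder data).
FAKE_API = {
    "http://api.joind.in/v2.1/events/1056": {
        "talks_uri": "http://api.joind.in/v2.1/events/1056/talks",  # assumed URL
    },
    "http://api.joind.in/v2.1/events/1056/talks": {"talks": []},
}

talk = {
    "comments_uri": "http://api.joind.in/v2.1/talks/7660/comments",
    "event_uri": "http://api.joind.in/v2.1/events/1056",
}

# The client never builds a URL itself; it only follows links it was given.
event = follow(talk, "event_uri", FAKE_API.__getitem__)
other_talks = follow(event, "talks_uri", FAKE_API.__getitem__)
```

In a real client, `fetch` would make a GET request and decode the body; the calling code would look the same.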
Another consideration when designing and working with RESTful APIs is whether or not it is useful to send additional nested data with the response to avoid a consumer having to make too many “round trips,” or requests and responses, to get the rest of the information desired. While GitHub and Joind.in both offer user information at their own locations, they also include some nested data in the responses shown here, which the consumer is likely to need.
On the other hand, sometimes too much information can lead to unnecessarily large amounts of data to transfer, and different APIs handle this in different ways. One common pattern is that, by default, a subset of the information is returned, but functionality to retrieve more information is also offered; this is what the Joind.in verbose_uri offers. Alternatively, the extra information may be made available as a separate resource, such as offering /article/42 as the data about a blog post, but excluding the (potentially large) body of the post, which can then be found at /article/42/body. Either approach shows consideration to the consumer, but which one is the right fit will depend on any particular scenario.
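As a rough sketch of the first pattern, a service might keep a list of summary fields and only return everything when the consumer asks for the verbose form. The field list and the `represent` function here are hypothetical, and `talk_description` is an assumed example of a larger field.

```python
# Fields included by default; anything else needs the verbose representation.
SUMMARY_FIELDS = ("uri", "verbose_uri", "talk_title", "start_date")


def represent(talk: dict, verbose: bool = False) -> dict:
    """Return the full record when verbose, otherwise only the summary fields."""
    if verbose:
        return dict(talk)
    return {name: talk[name] for name in SUMMARY_FIELDS if name in talk}
```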
Some of the common API formats have provisions for controlling how much detail is returned when a consumer requests data, or which related data should be included with the response. Let’s look more at some of those data formats.
A single web service can offer a selection of data types, and it’s very common to offer multiple types. Often, these will be JSON or XML, but there can be others; for example, Joind.in will respond to GET requests with an HTML data type if the Accept header requests it. The format decision will be made on the server, usually on the basis of the Accept header (you can read more about content negotiation in “Headers for Content Negotiation”).
Some services will allow a content indicator to be present in the URL itself, but this mixes up the identification of the resource with information about the representation desired. In general, the Accept header is the “right” way to indicate the preferred format, while supporting an additional URL parameter may lower the barrier of entry, depending on your consumers.
So you might offer two ways of requesting JSON data:
By setting the Accept header to “application/json”; this would be the preferred method.
Not all clients (or all developers) are capable of setting custom headers, so you might also allow the appending of ?format=json or the equivalent to a request.
Don’t be tempted to add a .json suffix to your URL to allow clients to request JSON, especially in a RESTful service. The URI should point to a resource without specific format information, and then headers or additional parameters can be used to give extra information to the server about how we’d like that response served.
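A minimal sketch of how a server might combine the two approaches is shown here. The `SUPPORTED` table, the function names, and the choice to let an explicit ?format= parameter win over the Accept header are assumptions for illustration; wildcards such as */* are not expanded and simply fall through to the default.

```python
from urllib.parse import parse_qs, urlsplit

SUPPORTED = {
    "json": "application/json",
    "xml": "application/xml",
    "html": "text/html",
}
DEFAULT = "application/json"


def parse_accept(header: str) -> list[str]:
    """Return the media types in an Accept header, most preferred first."""
    weighted = []
    for position, part in enumerate(header.split(",")):
        pieces = [piece.strip() for piece in part.split(";")]
        media_type = pieces[0].lower()
        if not media_type:
            continue
        quality = 1.0
        for param in pieces[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0
        if quality > 0:  # q=0 means "not acceptable"
            weighted.append((-quality, position, media_type))
    return [media_type for _, _, media_type in sorted(weighted)]


def choose_format(url: str, accept: str = "") -> str:
    """Pick a response media type from ?format= first, then the Accept header."""
    query = parse_qs(urlsplit(url).query)
    requested = query.get("format", [""])[0].lower()
    if requested in SUPPORTED:
        return SUPPORTED[requested]
    for media_type in parse_accept(accept):
        if media_type in SUPPORTED.values():
            return media_type
    return DEFAULT
```

Note that the resource URL itself stays the same in both cases; only the header or the extra parameter changes.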
There are some very good prescribed data formats that you might like to consider for your applications. Beyond touching on JSON-RPC and XML-RPC earlier in the book, two other good candidates to consider are HAL and JSON-API. They do all differ in some ways, but each of them aims to make a scalable and consistent API for users by describing how data and hypermedia can be presented. The following sections have some examples of these formats to give you an idea of what to expect.
HAL (Hypertext Application Language) is a standard that aims to make it easy to traverse an API that you haven’t seen before and find your way around. It can be used with both XML and JSON and aims to describe hypermedia in a useful and, importantly, consistent way across APIs.
The best features of HAL are the _links collections that it uses to bring hypermedia into a known location and a known format. The _links collections are used both at the top level of a response to include some metadata such as pagination, and also within representations to give related links to the item. The _links collection has a named key that indicates what this link is, and then an array containing at least an href value, although other information can also be included.
The result looks something like Example 12-1, which uses the same talk resource as mentioned in the hypermedia section earlier.
{
    "_links": {
        "self": {"href": "http://api.joind.in/v2.1/talks/7660?start=0&resultsperpage=20"}
    },
    "talks": [
        {
            "_links": {
                "self": {"href": "http://api.joind.in/v2.1/talks/7660"},
                "verbose_self": {"href": "http://api.joind.in/v2.1/talks/7660?verbose=yes"},
                "website_uri": {"href": "http://joind.in/talk/view/7660"},
                "comments_uri": {"href": "http://api.joind.in/v2.1/talks/7660/comments"},
                "event_uri": {"href": "http://api.joind.in/v2.1/events/1056"}
            },
            "talk_title": "Everything You Ever Wanted to Know About Deployment But Were Afraid to Ask",
            "start_date": "2012-11-08T13:00:00-05:00",
            "average_rating": 5,
            "comment_count": 4
        }
    ]
}
HAL also supports embedding related resources within a result set; this is a very common situation to encounter so a standard way of presenting this is very useful. The other big feature in HAL is “curies,” which are a way of describing where the documentation can be found for the endpoints in use. Being able to reach documentation related to the request currently being made can be very helpful to users so this is a nice feature to include. You can read more about HAL in the specification.
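Because the links always live in the same place, a consumer can use one small helper for every HAL response. The sketch below assumes each link is an object with an href member (or a list of such objects), as the HAL specification describes; the `hal_link` name and the sample document are made up for the example.

```python
def hal_link(document: dict, rel: str) -> str:
    """Return the href for the named link relation in a HAL document."""
    link = document["_links"][rel]
    # A relation may map to a single link object or to a list of them.
    if isinstance(link, list):
        link = link[0]
    return link["href"]


talk = {
    "_links": {
        "self": {"href": "http://api.joind.in/v2.1/talks/7660"},
        "comments_uri": {"href": "http://api.joind.in/v2.1/talks/7660/comments"},
    },
    "comment_count": 4,
}

comments_url = hal_link(talk, "comments_uri")
```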
Another format that works very well, especially on RESTful APIs, is JSON-API. As the name suggests, it’s for JSON data and much like HAL it sets out some very standard ways of organizing information so that all APIs built along these lines will be familiar. JSON-API describes a series of top-level data elements to use.
errors is a collection of error items
meta gives information not directly related to the data being transferred
data is where the actual data goes, always (JSON-API can handle either a resource or a collection)
jsonapi may be used to describe the server implementation
links holds links related to the top level of this endpoint and always includes a self link
included can contain related resources, often used to save the consumer from having to make multiple calls
It is required to return at least one of errors, meta, and data, but beyond that this data format can be adapted to fit your needs. Each individual resource should contain an id and a type, with its main data in the attributes section. Resources can also have their own links and meta properties.
Using the same talk resource as before, Example 12-2 shows how it might look if returned by a JSON-API service. As you can see, the two are very similar.
{
    "links": {
        "self": "http://api.joind.in/v2.1/talks/7660?start=0&resultsperpage=20"
    },
    "data": [
        {
            "type": "talk",
            "id": "7660",
            "attributes": {
                "talk_title": "Everything You Ever Wanted to Know About Deployment But Were Afraid to Ask",
                "start_date": "2012-11-08T13:00:00-05:00",
                "average_rating": 5,
                "comment_count": 4
            },
            "links": {
                "self": "http://api.joind.in/v2.1/talks/7660",
                "verbose_self": "http://api.joind.in/v2.1/talks/7660?verbose=yes",
                "website_uri": "http://joind.in/talk/view/7660",
                "comments_uri": "http://api.joind.in/v2.1/talks/7660/comments",
                "event_uri": "http://api.joind.in/v2.1/events/1056"
            }
        }
    ]
}
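To show how the rules described above might be enforced on the server side, here is a small sketch of a document builder. The function names are hypothetical; the checks reflect the requirements that at least one of data, errors, and meta is present, that data and errors do not appear together, and that a resource id is sent as a string.

```python
from typing import Optional


def jsonapi_resource(resource_type: str, resource_id, attributes: dict,
                     links: Optional[dict] = None) -> dict:
    """Build one resource object; the id is always serialized as a string."""
    resource = {
        "type": resource_type,
        "id": str(resource_id),
        "attributes": attributes,
    }
    if links:
        resource["links"] = links
    return resource


def jsonapi_document(data=None, errors=None, meta=None,
                     self_link: Optional[str] = None) -> dict:
    """Assemble the top-level document from whichever members are supplied."""
    if data is None and errors is None and meta is None:
        raise ValueError("need at least one of data, errors, or meta")
    if data is not None and errors is not None:
        raise ValueError("data and errors must not appear together")
    document = {}
    if self_link:
        document["links"] = {"self": self_link}
    if data is not None:
        document["data"] = data
    if errors is not None:
        document["errors"] = errors
    if meta is not None:
        document["meta"] = meta
    return document
```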
Picking a common format for your API data can be very helpful to allow people to quickly get up to speed and feel “at home.” The various API formats are evolving all the time, so it’s worth taking a look around at the options and reflecting on your goals for the service you want to publish before you choose. If there’s a standard that could fit, always try to use it—standards are always good!
Versioning your API is highly recommended, but of course it isn’t obvious that you need it until you want to start work on v2.0! Including a version number in your URL is a matter of taste. It is a very practical way to offer a service while identifying the current version of that service and opening the door to offering new versions of the service in the future. However, there are alternatives, and an elegant alternative is to use media types. These are invented content types that specifically describe the structure of the resource that will be returned, and can also include version information, so if the structure of a particular resource changes between versions, that change can be conveyed without a URL change.
Not all APIs will support media types, but they are a good way to version representation structures for users who want to be certain that the representations they receive will never change. GitHub has some media type support (their reference page explains the detail very well) that goes beyond the usual application/json levels. They support media types specific to GitHub (application/vnd.github+json) and also support using the media type to specify the version of representation that should be returned (application/vnd.github.v3).
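On the server, reading the version out of such a media type can be as simple as a regular expression. The pattern below is a deliberately simplified sketch that covers only the vendor, an optional version, and an optional +format suffix; it is not GitHub’s full grammar, and the function name is an assumption.

```python
import re
from typing import Optional

VENDOR_TYPE = re.compile(
    r"^application/vnd\.(?P<vendor>[a-z0-9-]+)"  # e.g. github
    r"(?:\.(?P<version>v\d+))?"                  # optional version, e.g. v3
    r"(?:\+(?P<format>[a-z]+))?$"                # optional suffix, e.g. +json
)


def parse_vendor_media_type(value: str) -> Optional[dict]:
    """Split a vendor media type into parts, or return None if it isn't one."""
    match = VENDOR_TYPE.match(value.strip().lower())
    if match is None:
        return None
    return match.groupdict()
```

A missing version (None) can then be mapped to whatever the service treats as its current default.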
As well as choosing data formats, there are other variables for which the “right” choice to make will differ between the consumers of the API. An easy example is the number of entries you return. Returning all the data is fine…until the application becomes terribly popular, and suddenly the API is returning four thousand records instead of forty! To improve this experience for everyone, APIs often offer pagination of data. As well as giving a way to specify which range of results to return, it is good practice to allow the number of results returned to be customized. A reporting server on a fast network might want all the data, whereas the mobile device with a patchy signal might only want the newest five records.
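A pagination helper might look something like the following sketch. The start and resultsperpage parameter names are borrowed from the Joind.in URLs shown earlier; the `paginate` function, the shape of its return value, and the example.com URL used when trying it out are assumptions.

```python
from urllib.parse import urlencode


def paginate(items: list, base_url: str, start: int = 0, resultsperpage: int = 20) -> dict:
    """Return one page of items plus links to the neighbouring pages."""
    if start < 0 or resultsperpage < 1:
        raise ValueError("start must be >= 0 and resultsperpage >= 1")

    def link(offset: int) -> str:
        query = urlencode({"start": offset, "resultsperpage": resultsperpage})
        return f"{base_url}?{query}"

    links = {"self": link(start)}
    if start + resultsperpage < len(items):
        links["next"] = link(start + resultsperpage)
    if start > 0:
        links["prev"] = link(max(0, start - resultsperpage))

    return {
        "results": items[start:start + resultsperpage],
        "total": len(items),
        "links": links,
    }
```

Including the next and prev links in the response means consumers can page through results by following hypermedia rather than calculating offsets themselves.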
Another big variable is how much information to return with each request, and this decision usually manifests in two forms. When returning information about a particular item, should all the information be returned? And the follow-up question: should any related data be returned also? Including data means we’ll sometimes be returning more information than needed, a bit like doing SELECT * FROM… in SQL. But if you omit data, then some consumers will have to make a large number of requests to obtain what they need.
Since we already used GitHub as an example and they do this rather nicely, their gist data format will be a nice example to use. You can see it in Example 12-3.
{
    "url": "https://api.github.com/gists/ed972482e08ccddfc993",
    "forks_url": "https://api.github.com/gists/ed972482e08ccddfc993/forks",
    "commits_url": "https://api.github.com/gists/ed972482e08ccddfc993/commits",
    "id": "ed972482e08ccddfc993",
    "git_pull_url": "https://gist.github.com/ed972482e08ccddfc993.git",
    "git_push_url": "https://gist.github.com/ed972482e08ccddfc993.git",
    "html_url": "https://gist.github.com/ed972482e08ccddfc993",
    "files": {
        "text.txt": {
            "filename": "text.txt",
            "type": "text/plain",
            "language": "Text",
            "raw_url": "https://gist.githubusercontent.com/lornajane/ed972482e08ccddfc993/raw/336516c8e23e55265245bf589ae56aafa9cbbcf2/text.txt",
            "size": 18,
            "truncated": false,
            "content": "Some riveting text"
        }
    },
    "public": true,
    "created_at": "2015-07-23T18:30:11Z",
    "updated_at": "2015-08-29T14:25:41Z",
    "description": "Gist created by API",
    "comments": 0,
    "user": null,
    "comments_url": "https://api.github.com/gists/ed972482e08ccddfc993/comments",
    "owner": {
        "login": "lornajane",
        "id": 172607,
        "avatar_url": "https://avatars.githubusercontent.com/u/172607?v=3",
        "gravatar_id": "",
        "url": "https://api.github.com/users/lornajane",
        "html_url": "https://github.com/lornajane",
        "followers_url": "https://api.github.com/users/lornajane/followers",
        "following_url": "https://api.github.com/users/lornajane/following{/other_user}",
        "gists_url": "https://api.github.com/users/lornajane/gists{/gist_id}",
        "starred_url": "https://api.github.com/users/lornajane/starred{/owner}{/repo}",
        "subscriptions_url": "https://api.github.com/users/lornajane/subscriptions",
        "organizations_url": "https://api.github.com/users/lornajane/orgs",
        "repos_url": "https://api.github.com/users/lornajane/repos",
        "events_url": "https://api.github.com/users/lornajane/events{/privacy}",
        "received_events_url": "https://api.github.com/users/lornajane/received_events",
        "type": "User",
        "site_admin": false
    },
    "forks": [],
    "history": [
        {
            "user": {
                "login": "lornajane",
                "id": 172607,
                "avatar_url": "https://avatars.githubusercontent.com/u/172607?v=3",
                "gravatar_id": "",
                "url": "https://api.github.com/users/lornajane",
                "html_url": "https://github.com/lornajane",
                "followers_url": "https://api.github.com/users/lornajane/followers",
                "following_url": "https://api.github.com/users/lornajane/following{/other_user}",
                "gists_url": "https://api.github.com/users/lornajane/gists{/gist_id}",
                "starred_url": "https://api.github.com/users/lornajane/starred{/owner}{/repo}",
                "subscriptions_url": "https://api.github.com/users/lornajane/subscriptions",
                "organizations_url": "https://api.github.com/users/lornajane/orgs",
                "repos_url": "https://api.github.com/users/lornajane/repos",
                "events_url": "https://api.github.com/users/lornajane/events{/privacy}",
                "received_events_url": "https://api.github.com/users/lornajane/received_events",
                "type": "User",
                "site_admin": false
            },
            "version": "a06b78d732925104c79256e58e84f74af8c579f2",
            "committed_at": "2015-07-23T18:30:11Z",
            "change_status": {
                "total": 1,
                "additions": 1,
                "deletions": 0
            },
            "url": "https://api.github.com/gists/ed972482e08ccddfc993/a06b78d732925104c79256e58e84f74af8c579f2"
        }
    ]
}
This example is a nice combination of nested and linked data. It’s quite long, but it also means that consumers don’t need to make a lot of requests to get the data they are likely to want. For example, the owner information is included and has most of the data you get from the user URL itself, but the comments aren’t nested and instead we get a count and a link to them.
Some APIs allow the user to specify which nested items should be included, or which fields should be returned, which can be handy if, for example, one of the fields is very large. A great example of this is JSON-API, which supports both of these via its include and fields parameters.
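As a sketch of how a server could honor these parameters, the code below parses a query string in the JSON-API style (include=a,b and fields[type]=x,y) and trims a resource’s attributes accordingly. The function names and the sample query are assumptions, and actually loading the included resources is left out.

```python
import re
from urllib.parse import parse_qs


def parse_selection(query: str):
    """Extract the include list and per-type field sets from a query string."""
    params = parse_qs(query)
    include = [
        name
        for value in params.get("include", [])
        for name in value.split(",")
        if name
    ]
    fields = {}
    for key, values in params.items():
        match = re.fullmatch(r"fields\[(.+)\]", key)
        if match:
            fields[match[1]] = {name for name in values[-1].split(",") if name}
    return include, fields


def apply_fields(resource: dict, fields: dict) -> dict:
    """Keep only the requested attributes; untouched if none were requested."""
    wanted = fields.get(resource["type"])
    if wanted is None:
        return resource
    trimmed = dict(resource)
    trimmed["attributes"] = {
        name: value
        for name, value in resource["attributes"].items()
        if name in wanted
    }
    return trimmed
```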
For each service that is built, an important part of the design process is to make decisions about some of these elements. Whatever you decide for your own applications, make sure that you are consistent across your API and that any optional ways of customizing output are well documented.
It’s important to offer users some choice, but also to offer a simpler path so that people can jump straight in and use your API without having to set up too many options. Every customizable option should have a default value that is returned if no preference is stated. Are you missing the Accept header? Send JSON. You don’t have any pagination settings? Send the first 25 results. This approach allows people to get the best of the API very quickly and easily, and they can delve deeper to change the defaults if their requirements don’t fit well with the defaults chosen.
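One simple way to apply this is to keep all the defaults in a single place and overlay whatever the consumer supplied. The option names and the `with_defaults` function are hypothetical; the values match the examples in the text (JSON, and the first 25 results).

```python
DEFAULTS = {
    "format": "application/json",
    "start": 0,
    "resultsperpage": 25,
}


def with_defaults(requested: dict) -> dict:
    """Overlay the consumer's choices on the defaults, ignoring unknown options."""
    options = dict(DEFAULTS)
    for name, value in requested.items():
        if name in DEFAULTS and value is not None:
            options[name] = value
    return options
```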
Consider whether or not you will comply with all requests, though; if a consumer requests 1,000 results that might be expensive for your API to generate, you may still only send the first 200 (or whatever makes sense for your system). Similarly, some APIs will benefit from having rate limits. This means that each client can only make a certain number of requests in a given time period. Many APIs allow a very limited number of requests for unregistered users, and may allow differing levels of access to different customers, particularly for paid-for apps. Rate limiting is a way of making sure that you guarantee an expected level of service to all users by managing the load on your servers and allowing different users to have a level of access that suits them.
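Both ideas can be sketched briefly. The cap of 200 comes from the example above; the fixed-window limiter is just one simple strategy among several, and the class name, its parameters, and the in-memory storage are assumptions (a real deployment would usually keep the counters in shared storage).

```python
import time

MAX_RESULTS = 200


def clamp_results(requested: int) -> int:
    """Honor the requested page size, but never exceed the server's cap."""
    return max(1, min(requested, MAX_RESULTS))


class FixedWindowRateLimiter:
    """Allow each client at most `limit` requests per `window_seconds`."""

    def __init__(self, limit: int, window_seconds: float, clock=time.monotonic):
        self.limit = limit
        self.window = window_seconds
        self.clock = clock
        self._windows = {}  # client id -> (window start time, request count)

    def allow(self, client_id: str) -> bool:
        now = self.clock()
        started, count = self._windows.get(client_id, (now, 0))
        if now - started >= self.window:
            # The previous window has expired; start counting afresh.
            started, count = now, 0
        if count >= self.limit:
            return False
        self._windows[client_id] = (started, count + 1)
        return True
```

Different customers can be given different limiter instances (or different `limit` values) to reflect their level of access.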
This philosophy of making things easy and useful to users, with minimal effort on their part, makes the barrier to entry much lower for your application and makes the experience of using a new API one of tolerance and welcome.