
My Argument for GraphQL

First things first: I obviously haven't been around. How obsessed with programming can I be if I never share my thoughts? Really obsessed, I guess. The reason I rarely come here to talk about what I've learned is that I'm too busy consuming more of it. Enough about that, though. I'm here today to explain my reasons for getting on the GraphQL bandwagon. This is nothing new; I've been a fan for at least the last two years, and I even tried selling it in an interview. Heh. Now I'm putting my thoughts out there so I have a resource to point friends and coworkers to when I'm making the big pitch.

Graphs

I don't think anyone really harps on this aspect much; maybe it's not that important. But it's very important to me. Early in my career I was assigned an experimental project: put a web server into a graph database. It sounds crazy, and it is crazy, but it's pretty damn cool (I may be biased, but still). This was my first time really working with graphs. I had written plenty of code for myself and for class before this job, but this was the first real project I led, and I was totally green to graphs and graph databases. That's the beauty of it, because graphs just make sense, at least at a high level; optimizing graph operations underneath isn't always a cakewalk. Building a website by linking nodes together just made a lot of sense. Each node served a purpose, such as HTML, CSS, backend code, client code, or client libraries. When the POC was up and demonstrable, the other team members were pretty blown away by how much sense it made. Need jQuery on this page? Link the jQuery library node. Need some code behind it? Link some code nodes. Build your router out of route nodes. Data was stored right alongside the website's structure. I'm not going to say it was genius, but it was awesome.

This tangent is relevant because that project is where I grew to appreciate the simplicity a graph can bring to some types of problems (as with anything, graphs aren't for everyone and everything). Learning how to use a graph for storage and access, and the kinds of operations you can do on one, was eye-opening. Moving beyond Neo4j's basic REST API to the Cypher query language let me more than halve the server's response times. Before, I was doing relatively slow node-by-node queries to gather everything that was connected. With Cypher, I could query from the route all the way down to every connected view node in one pass and immediately start building the response. It was awesome how I could just ask for what I needed, and each query could be customized.

When another team member joined the project, he started building the UI while I built a set of endpoints he could use to fetch node data. It was amazing how much of the query code we could reuse while responding with only what was needed. When the UI loaded, it fetched the root route node, which showed its type, some data, and a few bits of connection info. Clicking it expanded the node and fetched basic information (like type) for the nodes connected to it. Those nodes could be previewed, and we could later make full requests for their full data. All of this happened with very few actual endpoints. The queries were largely the same; the only major difference was what was being fetched.

This mirrors the main thing I like about GraphQL: it is your API. REST is awesome, I'm not here to trash it, and I'm not a GraphQL purist. If it makes sense for your project to have REST endpoints and a GraphQL endpoint, more power to you. GraphQL is not the solution to every problem (after all, not every problem is a nail; sometimes you need a screwdriver). With GraphQL you don't design the API the traditional way, endpoint by endpoint. Instead, you describe your data and what you can do to it. The query language (the QL part of GraphQL) then lets your API's consumers ask for exactly the data they want. Your profile menu can fetch just the user's display name and avatar URL, while a user show page fetches more information about that user, and you don't have to write these as two different things at all. You can have a single user field that returns a user object, with the fields that can be fetched on that object defined once.
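Here is a toy Python sketch of the idea: the server describes a user once, and each caller names the fields it wants. This is not how a GraphQL server is implemented internally. The nested-dictionary selection format, the `select` helper, and the field names are all hypothetical. They are chosen only to show one user object serving both the profile menu and the user page.

```python
from typing import Any


def select(obj: Any, selection: dict[str, Any]) -> Any:
    """Return only the requested fields of obj.

    A selection maps each field name to None (a plain value) or to a nested
    selection for a related object or list of objects. Hypothetical format.
    """
    if isinstance(obj, list):
        return [select(item, selection) for item in obj]
    return {
        field: obj[field] if sub is None else select(obj[field], sub)
        for field, sub in selection.items()
    }


# One user object on the server (hypothetical sample data).
user = {
    "id": "10",
    "display_name": "imauser",
    "avatar_url": "https://example.com/avatar.png",
    "bio": "I write about graphs.",
    "posts": [{"id": "100", "title": "Graphs"}, {"id": "110", "title": "GraphQL"}],
}

# The profile menu asks for two fields...
menu = select(user, {"display_name": None, "avatar_url": None})
# ...while the user page asks for more, including nested posts.
page = select(user, {"display_name": None, "bio": None, "posts": {"title": None}})
print(menu)
print(page)
```

Neither caller needed its own endpoint or renderer. The shape of the response is decided by the request.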

What GraphQL Solves

Let's look at how you might develop a REST endpoint and compare that to how it would evolve in GraphQL. I'm aware this is anecdotal and slanted toward my pitch for GraphQL, but it's based on experiences I've had at many places I've worked.

So let's assume we have a blogging site, since its data layout is easy to picture. A user can own a blog with posts, posts can have comments, users write those comments, etc.

We would most likely start with some core routes, GET/PATCH /posts/:id and POST /posts, and that works out pretty well. Our GET /posts/:id will return a basic set of post information: the "title" and "body", some "author" details (for now just "username"), and the post's comments, each with a "body" and "author" information. We'll just toss it into a JSON object:

{
  "title": "This is a title",
  "body": "This is the content for the blog post.",
  "author": {
    "username": "imauser"
  },
  "comments": [
    {
      "body": "This is the body of a comment",
      "author": {
        "username": "imnotthesameuser"
      }
    }
  ]
}

This is a great start. It gets us by with our post show page, and everything works. Then we need to add our index page, and we don't need the full body of each post there. The body could be massive, and we only want a small portion of it. So we'll assume we've thrown in some summarization tooling that runs when a post is created, and we create our index response:

[
  {
    "title": "Blog Title",
    "summary": "Blog sumar...",
    "author": {
      "username": "imathirduser"
    }
  }
]

Now we start to see two issues with our budding JSON API. First, one request returns an object while the other returns an array. A quick solution is to make them all return objects like {"post": ...} and {"posts": ...}, so we can get what we need off the root key. With that solved, we notice the author object is repeated in three places: the post author, the comment authors, and the index authors. So we set up the backend so that all three use the same JSON renderer for author data.
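Below is a rough Python sketch of where this leaves the backend. It assumes in-memory dictionaries in place of a database, and the names `render_author`, `get_post`, and so on are hypothetical. Every author, whether on a post, a comment, or an index entry, goes through one renderer. Every response is also an object with a root key.

```python
import json

# Hypothetical in-memory rows standing in for database tables.
USERS = {
    1: {"id": 1, "username": "imauser", "email": "a@example.com"},
    2: {"id": 2, "username": "imnotthesameuser", "email": "b@example.com"},
}
POSTS = {
    100: {"id": 100, "title": "This is a title",
          "body": "This is the content for the blog post.",
          "summary": "This is the...", "author_id": 1},
}
COMMENTS = [
    {"id": 1, "post_id": 100, "body": "This is the body of a comment", "author_id": 2},
]


def render_author(user_id: int) -> dict:
    """The one shared author renderer: exposes only the public username."""
    return {"username": USERS[user_id]["username"]}


def render_post_show(post: dict) -> dict:
    """Full post for the show page, including its comments."""
    return {
        "title": post["title"],
        "body": post["body"],
        "author": render_author(post["author_id"]),
        "comments": [
            {"body": c["body"], "author": render_author(c["author_id"])}
            for c in COMMENTS
            if c["post_id"] == post["id"]
        ],
    }


def render_post_index(post: dict) -> dict:
    """Summarized post for the index page (no body)."""
    return {
        "title": post["title"],
        "summary": post["summary"],
        "author": render_author(post["author_id"]),
    }


def get_post(post_id: int) -> str:
    # GET /posts/:id -> a single resource under the "post" root key.
    return json.dumps({"post": render_post_show(POSTS[post_id])})


def get_posts() -> str:
    # GET /posts -> a collection under "posts", so both responses are objects.
    return json.dumps({"posts": [render_post_index(p) for p in POSTS.values()]})
```

Each new page tends to mean another renderer, or another tweak to a shared one. That is the pattern the rest of this story keeps running into.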

The next big feature we add is private posts. This requires users to be authenticated to read blog articles (okay, not that private, but still).
So now, when unauthenticated requests come through, we need to display an error message. Our first attempt, just to test that our logic works, returns something like:

"You must be logged in to view this post."

Now, we've already established that responses should follow some kind of pattern (always an object), so we work that into our current flow:

{
  "post": {
    "error": "You must be logged in to view this post."
  }
}

But that feels ugly. Now the front-end code has to know which key the happy path expects just to find the error, so we'd have to write custom error handling for every request we make. Let's make this more generic:

{
  "error": "You must be logged in to view this post."
}

All right. If a response has an error key, we render our error message. If it doesn't, we have a successful response and can move on down the happy path.
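On the client, that convention boils down to one check. Here is a minimal Python sketch. It assumes the response body arrives as a JSON string and that the caller knows which root key it expects. `handle_response` is a hypothetical name, not part of any library.

```python
import json
from typing import Any, Optional, Tuple


def handle_response(raw: str, root_key: str) -> Tuple[Optional[str], Optional[Any]]:
    """Hypothetical client helper for the {"error": ...} envelope.

    Returns (error_message, payload); error_message is None on success.
    """
    body = json.loads(raw)
    if "error" in body:
        # Error path: the same check works for every endpoint.
        return body["error"], None
    # Happy path: pull the resource off its root key ("post", "posts", ...).
    return None, body[root_key]


error, post = handle_response('{"error": "You must be logged in to view this post."}', "post")
if error is not None:
    print(f"Show error banner: {error}")
```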

Now we want to add profile pages, where you can view information about a user. We already have a basic user renderer that renders the username, and we don't want to bloat it with more data. So we add a second user renderer (similar to how we have two post renderers) to render more data about a user for the profile page. We also want to reuse our index post renderer, but only for the posts of the user being viewed. So we make the index post renderer a bit more generic and usable from any context, and we have our profile page data.

Our API is starting to grow, and we haven't made any hard decisions about its structure. We're about to expose this API to the public, so we decide to pick a clear standard we can point to when writing responses. That standard should serve not only our own developers but also everyone consuming our API, so we select something like JSON:API. That changes our responses to something like:

{
  "data": {
    "type": "post",
    "id": "100",
    "attributes": {
      "title": "...",
      "body": "...",
      "summary": "..."
    },
    "relationships": {
      "author": {
        "data": { "type": "user", "id": "10" }
      }
    }
  },
  "included": [
    {
      "type": "user",
      "id": "10",
      "attributes": {
        "username": "imauser"
      },
      "relationships": {
        "posts": {
          "data": [
            { "type": "post", "id": "100" },
            { "type": "post", "id": "110" }
          ]
        }
      }
    }
  ]
}

Whew, that's a big one. But we're standardized. Notice that we combined our small and large renderers, because for the initial release it was too much trouble to support more than one renderer per type. We'll add that back later, and when we do, we want clients to be able to optionally include related data too. So we add a global include query parameter that lets clients decide what they need. For example, GET /posts/:id?include=comments,author fetches both, while the listing page only needs the author: GET /posts?include=author. Things start to get quite a bit more complex at this point.
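Even a small version of that include handling shows how much machinery creeps in. Here is a rough Python sketch built on in-memory tables and a hand-rolled handler; `get_post`, `RELATIONSHIPS`, and the data are hypothetical. A real JSON:API library would do this for you. This sketch also ignores the nested include paths (such as comments.author) that the spec allows.

```python
from urllib.parse import parse_qs, urlparse

# Hypothetical in-memory tables keyed by id (JSON:API ids are strings).
POSTS = {"100": {"title": "This is a title", "author": "10", "comments": ["1"]}}
USERS = {"10": {"username": "imauser"}}
COMMENTS = {"1": {"body": "This is the body of a comment"}}

# Relationship name on a post -> (resource type, table it points into).
RELATIONSHIPS = {"author": ("user", USERS), "comments": ("comment", COMMENTS)}


def get_post(url: str) -> dict:
    """Sketch of GET /posts/:id?include=... returning a JSON:API-style document."""
    parsed = urlparse(url)
    post_id = parsed.path.rstrip("/").split("/")[-1]
    # "include=comments,author" -> ["comments", "author"]
    includes = [
        name
        for value in parse_qs(parsed.query).get("include", [])
        for name in value.split(",")
        if name
    ]
    post = POSTS[post_id]

    # Relationship linkage (type + id only) is always present.
    relationships = {}
    for name, (rtype, _) in RELATIONSHIPS.items():
        value = post[name]
        if isinstance(value, list):
            relationships[name] = {"data": [{"type": rtype, "id": i} for i in value]}
        else:
            relationships[name] = {"data": {"type": rtype, "id": value}}

    document = {
        "data": {
            "type": "post",
            "id": post_id,
            "attributes": {"title": post["title"]},
            "relationships": relationships,
        }
    }

    # Full related resources are only sideloaded when the client asks for them.
    included = []
    for name in includes:
        if name not in RELATIONSHIPS:
            raise ValueError(f"unsupported include: {name}")
        rtype, table = RELATIONSHIPS[name]
        value = post[name]
        ids = value if isinstance(value, list) else [value]
        included.extend({"type": rtype, "id": i, "attributes": table[i]} for i in ids)
    if included:
        document["included"] = included
    return document
```

Every new relationship means another entry in the map and another case to test, on every endpoint that supports include.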

Now with more Graphs

So now let's rewind time. We're building a blog hosting site, and we've chosen GraphQL. Right off the bat this works out great because we have

A) A format for requests (the query language) and
B) a format for responses: GraphQL returns a "data" key with our requested fields (more specifics soon) and, if anything went wrong, an "errors" key with the errors.

Next, we need to enable the blog show page. At this stage we have three resources and we'll throw them into our schema.

type User {
  id: ID!
  username: String!
  posts: [Post]
  comments: [Comment]
}

type Post {
  id: ID!
  title: String!
  body: String!
  summary: String!
  author: User!
  comments: [Comment]
}

type Comment {
  id: ID!
  body: String!
  author: User!
}

type Query {
  post(id: ID!): Post
  posts: [Post]
  user(id: ID!): User
}

type Mutation {
  createPost(title: String!, body: String!, author: ID!): Post
}

schema {
  query: Query
  mutation: Mutation
}

(Forgive me if this isn't technically 100% the correct schema definition structure for GraphQL. I've only ever used libraries with DSLs/methods for defining schemas, but the notation should be correct.)

First things first: by merely defining this schema, we've exceeded feature parity with the REST API we ended up with. This assumes we've already written our resolver functions on the backend. These functions are pretty simple, no different from a REST handler. For example, suppose you run the query:

query getPost {
  post(id: 10) {
    title
  }
}

The resolver for the post(id: ID!) field would essentially look like SELECT * FROM posts WHERE id = ?, substituting the value given for the id argument. From there, resolving the title field on the post just looks up that key in the post's data, and most frameworks handle this automatically. The specifics depend heavily on the language and framework in use. These basic field resolvers are so common, though, that they're usually the default behavior when you don't write a resolver yourself.
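Here is roughly what that looks like as a minimal Python sketch using the standard library's sqlite3 module. The `(parent, args)` resolver signature, the `RESOLVERS` map, and the `resolve` helper are hypothetical stand-ins for what a GraphQL library would provide. Real libraries have their own resolver signatures and do the dispatching for you.

```python
import sqlite3
from typing import Any, Callable, Optional

# In-memory database with hypothetical sample rows.
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.executescript(
    """
    CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT NOT NULL);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT NOT NULL,
                        body TEXT NOT NULL, author_id INTEGER NOT NULL);
    INSERT INTO users VALUES (1, 'imauser');
    INSERT INTO posts VALUES (10, 'This is a title', 'This is the content.', 1);
    """
)

Resolver = Callable[[Any, dict], Any]


def resolve_query_post(parent: Any, args: dict) -> Optional[dict]:
    """Query.post(id: ID!): a single parameterized SELECT."""
    # GraphQL ID values are serialized as strings, so convert before querying.
    row = conn.execute("SELECT * FROM posts WHERE id = ?", (int(args["id"]),)).fetchone()
    return dict(row) if row is not None else None


def resolve_post_author(parent: dict, args: dict) -> dict:
    """Post.author: posts store author_id, so this field needs its own lookup."""
    row = conn.execute("SELECT * FROM users WHERE id = ?", (parent["author_id"],)).fetchone()
    return dict(row)


def default_resolver(field: str) -> Resolver:
    """The 'zero-value' resolver: read the key named after the field off the parent."""
    return lambda parent, args: parent[field]


# Hypothetical resolver map: (type name, field name) -> resolver.
RESOLVERS: dict[tuple[str, str], Resolver] = {
    ("Query", "post"): resolve_query_post,
    ("Post", "author"): resolve_post_author,
}


def resolve(type_name: str, field: str, parent: Any, args: Optional[dict] = None) -> Any:
    """Use the registered resolver if there is one, otherwise the default."""
    fn = RESOLVERS.get((type_name, field), default_resolver(field))
    return fn(parent, args or {})


post = resolve("Query", "post", None, {"id": "10"})
title = resolve("Post", "title", post)          # default resolver
author = resolve("Post", "author", post)        # custom resolver
username = resolve("User", "username", author)  # default resolver
```

Only the fields that cross a foreign key need hand-written resolvers. Plain columns fall through to the default.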

So what did this buy us? We now have everything we need for both our own use and public use. Need a post for the show page?

query getPost {
  post(id: 10) {
    title
    body
    author {
      username
    }
    comments {
      body
      author {
        username
      }
    }
  }
}

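You can even get fancy. We ask for the author's username in more than one place, so we can pull that selection into a fragment on the User type:

fragment authorName on User {
  username
}

query getPost {
  post(id: 10) {
    title
    body
    author {
      ...authorName
    }
    comments {
      body
      author {
        ...authorName
      }
    }
  }
}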

Need all the posts for the index page?

query getAllPosts {
  posts {
    title
    summary
    author {
      username
    }
  }
}

And so on.

Defining the objects and their corresponding resolver functions lets us handle any type of request we might need on those objects. We're also rather seamlessly turning our basic SQL database into a fancy graph database. How so? Look (with our current schema):

query fancyGraphs {
  post(id: 10) {
    author {
      posts {
        comments {
          author {
            posts {
              title
            }
          }
        }
      }
    }
  }
}

This is essentially: "Give me every post by every comment author on all the posts by the user who wrote this post." This is most definitely a non-performant query without some intense optimizations, and probably not useful at all. But thanks to GraphQL's expressiveness, within the confines of its schema, we were able to easily build this graph-like request on top of an ordinary, non-graph datastore.
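To make "non-performant" concrete, here is a Python sketch that resolves the query by hand. The rows are held in lists as if they came from three SQL tables, and each relationship resolver issues one simulated SELECT. The data and helper names are hypothetical, and none of the intense optimizations mentioned above (batching lookups, for example) are applied.

```python
from collections import Counter

# Hypothetical rows, as if they came from three SQL tables.
USERS = [{"id": 1}, {"id": 2}, {"id": 3}]
POSTS = [
    {"id": 10, "title": "Graphs", "author_id": 1},
    {"id": 11, "title": "GraphQL", "author_id": 1},
    {"id": 20, "title": "Cypher", "author_id": 2},
    {"id": 30, "title": "REST", "author_id": 3},
]
COMMENTS = [
    {"id": 100, "post_id": 10, "author_id": 2},
    {"id": 101, "post_id": 11, "author_id": 3},
    {"id": 102, "post_id": 11, "author_id": 2},
]

queries: Counter = Counter()  # simulated SELECT statements issued, per table


def select_where(table: str, rows: list, **conditions) -> list:
    """Stand-in for SELECT * FROM table WHERE col = value AND ...; counts each call."""
    queries[table] += 1
    return [r for r in rows if all(r[k] == v for k, v in conditions.items())]


# One resolver per relationship field; each turns a foreign key into a lookup.
def post_by_id(post_id: int) -> dict:
    return select_where("posts", POSTS, id=post_id)[0]


def author_of(row: dict) -> dict:
    return select_where("users", USERS, id=row["author_id"])[0]


def posts_of(user: dict) -> list:
    return select_where("posts", POSTS, author_id=user["id"])


def comments_of(post: dict) -> list:
    return select_where("comments", COMMENTS, post_id=post["id"])


def fancy_graphs(post_id: int) -> dict:
    """post(id) { author { posts { comments { author { posts { title } } } } } }"""
    author = author_of(post_by_id(post_id))
    return {"author": {"posts": [
        {"comments": [
            {"author": {"posts": [{"title": p["title"]} for p in posts_of(author_of(c))]}}
            for c in comments_of(post)
        ]}
        for post in posts_of(author)
    ]}}


result = fancy_graphs(10)
print(sum(queries.values()), dict(queries))
```

Even on this toy data, the single request turns into 11 lookups, and every extra level of nesting multiplies the work. Notice, too, that nothing here is graph-specific. The graph shape comes entirely from following foreign keys.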

In Conclusion

So, while this isn't exactly a GraphQL 101, I hope you've learned something about why GraphQL is beneficial and how it works. Our schema did more than define our data types and their fields; through the query and mutation types, it also exposed our root endpoints. The schema has the added benefit of being largely self-documenting, and tools like GraphiQL can use it to present you with documentation for your project. On top of that, a standardized JSON response format is defined for us up front, so we don't end up with a pile of ad hoc formats. And our queries can grow and change as our data grows and changes. Adding that "private" post feature? Add a private: Boolean! field to the Post type (plus the resolver logic that enforces it) and you're good to go. All of your queries can now start leveraging the new data.
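Here is a minimal Python sketch of what the post resolver might do once `private` exists, which ties the private-post feature back to GraphQL's response format. It builds the response by hand instead of using a GraphQL library. The `viewer` argument, the `POSTS` dictionary, and the function names are hypothetical. The error travels in the standard "errors" list, with the "message" and "path" keys the GraphQL spec describes, while the field itself resolves to null. The client therefore never needs the per-endpoint error handling from the REST story.

```python
from typing import Any, Optional

# Hypothetical posts; "private" is the new Boolean! field on Post.
POSTS = {
    "10": {"id": "10", "title": "Public post", "private": False},
    "11": {"id": "11", "title": "Members only", "private": True},
}


def resolve_post(post_id: str, viewer: Optional[str], errors: list) -> Optional[dict]:
    """Query.post resolver that hides private posts from anonymous viewers."""
    post = POSTS.get(post_id)
    if post is not None and post["private"] and viewer is None:
        # Report through the standard "errors" list; the field itself becomes null.
        errors.append({"message": "You must be logged in to view this post.",
                       "path": ["post"]})
        return None
    return post


def execute_post_query(post_id: str, viewer: Optional[str] = None) -> dict[str, Any]:
    """Assemble a GraphQL-shaped response: "data", plus "errors" only if any occurred."""
    errors: list = []
    response: dict[str, Any] = {"data": {"post": resolve_post(post_id, viewer, errors)}}
    if errors:
        response["errors"] = errors
    return response


print(execute_post_query("11"))
```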

I think it's important to end on when GraphQL isn't the answer. As you may have noticed, GraphQL hinges on a predefined schema. Most libraries I've used treat the schema as relatively static at runtime, and honestly I think that should be honored. If you find yourself dynamically adding fields to your schema at runtime, that's a sign GraphQL is not the answer to your problem. Likewise, you might be building something that takes in a large set of unknown or dynamic data, like a CMS with a map of user-defined keys and values. That doesn't really fit GraphQL's model either. You can define custom scalars and handle it that way if you really want to, but I don't recommend it. My recommendation is to supplement GraphQL with a REST endpoint that fetches the dynamic data. It's fine for GraphQL to return a URL or an ID of some kind instead of the object, followed by a REST call. So don't try to force everything you need into GraphQL.

I hope this has provided some insight into the REST vs. GraphQL debate. While it may not be perfect for everything, I think GraphQL is a great solution to 99% of the problems we face writing JSON APIs.