Lesson 33. Capstone: SQL-like queries in Haskell

Using the Monad type class to create SQL-like queries on lists
Generalizing functions written for one Monad (for example, List) to many
Organizing functions with types

In the preceding lesson, you saw how List as a Monad can also be understood as a list comprehension, which is used heavily in Python. In this capstone, you’re going to take your use of lists and monads one step further to create a SQL-like interface to lists (and other monads). SQL is used as the primary means to query relational databases. It has a clean syntax for representing the relationships between data. For example, if you had data about teachers teaching classes, you could query the teachers teaching English like this:

select teacherName from
teacher inner join course
on teacher.id = course.teacherId
where course.title = "English";

This allows you to easily combine two data sets, the teacher table and course table, to extract relevant information. You’ll build a set of tools you’ll call HINQ (borrowing its name from the similar .NET tool LINQ). HINQ will allow you to query your data relationally. You’ll make extensive use of the Monad type class to make this possible. In the end, you’ll have a query tool that

Provides a familiar interface for querying relational data in Haskell
Is strongly typed
Uses lazy evaluation to allow you to pass around queries without executing them
Can be used seamlessly with other Haskell functions

You’ll start by making some select queries on a list, learn to filter your queries with a where function, and finally build a join function that allows you to easily combine complex data inside a monad.

33.1. Getting started

Let’s start with some basic data. You can put everything you need in a file named hinq.hs. Because you want to see how well you can treat lists such as tables in a relational database, you’ll use an example involving students, teachers, courses, and enrollments. Figure 33.1 illustrates this setup.

Figure 33.1. The relationships among the basic data you’ll be working with

You’ll start with modeling your student. Each student has a name, which you’ll keep to firstName and lastName.

Listing 33.1. A simple `Name` data type with its `Show` instance

data Name = Name
            { firstName ::String
            , lastName :: String }

instance Show Name where
   show (Name first last) = mconcat [first," ",last]

Then each student has a grade level.

Listing 33.2. `GradeLevel` represents the student’s grade

data GradeLevel = Freshman
   | Sophmore
   | Junior
   | Senior deriving (Eq,Ord,Enum,Show)

In addition to these two things, you’ll include a unique student ID. Here’s your Student data type.

Listing 33.3. The `Student` data type

data Student = Student
   { studentId :: Int
   , gradeLevel :: GradeLevel
   , studentName :: Name } deriving Show

And you want a list of students you can play with.

Listing 33.4. A list of example students that you can query

students :: [Student]
students = [(Student 1 Senior (Name "Audre" "Lorde"))
           ,(Student 2 Junior (Name "Leslie" "Silko"))
           ,(Student 3 Freshman (Name "Judith" "Butler"))
           ,(Student 4 Senior (Name "Guy" "Debord"))
           ,(Student 5 Sophmore (Name "Jean" "Baudrillard"))
           ,(Student 6 Junior (Name "Julia" "Kristeva"))]

With just this list, you can move on to building out your basic operations, select and where. In addition to select and where, you’ll also want a join function that will allow you to combine two lists based on a common property.

Thinking in types, you can reason about these three functions as follows: Your select function needs to take a function representing the property being selected and a list of items to select from. Then the result will be a list of the selected property. Figure 33.2 shows the type signature of select.

Figure 33.2. The type signature of your select function

Your where function will take a test function (which is just a function from a -> Bool) and a list and will return only the remaining values in the list. The type signature of where is shown in figure 33.3.

Figure 33.3. The type signature for your where function

Finally, your join function will take two lists of potentially different types, and then two functions that extract a property from each list. It’s important that both properties are of the same type and instances of Eq so they can be compared. The result of join will be a list of tuples of the matching values from the original lists. Here’s the type signature for the join function (we’ll explain this in more detail when you implement it):

Eq c => [a] -> [b] -> (a -> c) -> (b -> c) -> [(a,b)]

Next you’ll get started with implementing your select and where functions, which will make it easy to perform simple queries on lists.

33.2. Basic queries for your list: select and where

The functions you’ll start with are select and where. The select clause in SQL allows you to select properties from a table:

select studentName from students;

This query gives you all the student names from the students table. In the case of your HINQ queries, you’d expect select to give you all the names from a list of students. The where clause in SQL allows you to condition your select on a given value:

select * from students where gradeLevel = 'Senior';

In this SQL statement, you’d select all the entries in the student table that have a grade level of Senior. Notice that in most databases, you’d have to represent the grade level as a String, but in Haskell you get the benefit of using a specific type.

Now you can move on to implementing these as functions. You’ll preface all the functions for HINQ with an underscore (_), not only because you haven’t covered modules and want to avoid collision, but also because where is a reserved keyword.

33.2.1. Implementing _select

The easiest operation to implement is _select. The _select function works just like fmap, only you’ll use the Monad syntax in this case.

Listing 33.5. The `_select` function is just `fmap`

_select :: (a -> b) -> [a] -> [b]
_select prop vals = do
   val <- vals
   return (prop val)

Here are a few examples of selecting properties from your students in GHCi:

GHCi> _select (firstName . studentName) students
["Audre","Leslie","Judith","Guy","Jean","Julia"]
GHCi> _select gradeLevel students
[Senior,Junior,Freshman,Senior,Sophmore,Junior]

This example may make it seem like _select can choose only a single property, but you can easily use a lambda to make a single function that selects two properties:

GHCi> _select (\x -> (studentName x, gradeLevel x)) students
[(Audre Lorde,Senior),(Leslie Silko,Junior),(Judith Butler,Freshman),
(Guy Debord,Senior),(Jean Baudrillard,Sophmore),(Julia Kristeva,Junior)]

Even after all the tricks of functional programming you’ve learned so far, it’s easy to forget how much power combining first-class functions with lambda functions can provide.

One more thing to notice is that your _select function is strictly less powerful than fmap solely because of its type signature. If you had literally defined _select as _select = fmap, your _select function would work on all members of the Functor type class. Later in this capstone, you’ll refactor your code (though for monads), but it’s worth realizing just how powerful a type signature can be.

33.2.2. Implementing _where

Your _where function will also be surprisingly simple. You’ll create a simple wrapper around guard, which will take a test function and your list. Remember, to use guard, you’ll have to import Control.Monad. The _where function is more complicated than _select because it’s not just the guard function (whereas _select could be defined as fmap). You’ll use the <- assignment to treat your list like a single value and then use your test function with guard to filter out the results that don’t pass your test.

Listing 33.6. `_where` allows you to filter your queries

_where :: (a -> Bool) -> [a] -> [a]
_where test vals = do
         val <- vals
         guard (test val)
         return val

To show off _where, you’ll start with a helper function that tests whether a String starts with a specific letter.

Listing 33.7. Check whether a `String` starts with a particular character using `startsWith`

startsWith :: Char -> String -> Bool
startsWith char string = char == (head string)

Now you can use where and select together to select only the students with names starting with J:

GHCi> _where (startsWith 'J' . firstName) (_select studentName students)
[Judith Butler,Jean Baudrillard,Julia Kristeva]

With the basics of _select and _where down, you can start to look at the heart of relational queries: _join.

33.3. Joining Course and Teacher data types

Before you join two data sets, you need to create more data. You’ll look at teachers and courses next. Your Teachers have a teacherId and a teacherName.

Listing 33.8. A `Teacher` data type

data Teacher = Teacher
 { teacherId :: Int
 , teacherName :: Name } deriving Show

Here are some example teachers.

Listing 33.9. A list of example teachers

teachers :: [Teacher]
teachers = [Teacher 100 (Name "Simone" "De Beauvior")
           ,Teacher 200 (Name "Susan" "Sontag")]

Your courses have a courseId, a courseTitle, and a teacher. The teacher is an Int that represents the teacherId of the Teacher leading the Course.

Listing 33.10. `Course` data type references a `Teacher` by ID

data Course = Course
 { courseId :: Int
 , courseTitle :: String
 , teacher :: Int } deriving Show

And you need some examples.

Listing 33.11. A list of example courses

courses :: [Course]
courses = [Course 101 "French" 100
          ,Course 201 "English" 200]

Now what you want is to join these two data sets. In SQL terms, the join you’re describing is an inner join, meaning that you care only about matching pairs. In SQL, the following query returns pairs of teachers and the class they teach:

select * from
teachers inner join courses
on (teachers.teacherId = courses.teacher);

You’re going to assume that for your _join, you’ll be checking to see whether a given property of data in one list is equal to another property in another list. This will have a rather large type signature. What you want to pass into your _join function is two lists, and then a function to select the property to join those lists on, and finally it will return those lists combined. Figure 33.4 shows the type signature to help you understand the process.

Figure 33.4. Reading the type signature for `_join`

You’ll create your _join the same way a join works in the relational algebra used to create databases. You’ll start by computing the Cartesian product of your two lists (by itself, this is a cross join in SQL). The Cartesian product is the combination of all possible pairs. Before you return the result, you’ll filter these pairs out by matching property values (based on the properties you passed in, as shown in figure 33.5).

Figure 33.5. The `_join` function joins two data sets on matching properties.

You can use _join to combine your teachers and courses:

GHCi> _join teachers courses teacherId teacher
[(Teacher {teacherId = 100, teacherName = Simone De Beauvior},
Course {courseId = 101, courseTitle = "French", teacher = 100}),
(Teacher {teacherId = 200, teacherName = Susan Sontag},Course
{courseId = 201, courseTitle = "English", teacher = 200})]

With the three major parts of your query language together, you can start packaging _select, _where, and _join into an easier-to-use format.

33.4. Building your HINQ interface and example queries

You want to make it a bit easier to stick together the pieces of your query. Here’s an example of using all three functions to find a list of English teachers (there’s only one).

Listing 33.12. You need a way to pleasantly combine `_join`, `_select` , and `_where`

joinData = (_join teachers courses teacherId teacher)
whereResult = _where ((== "English") . courseTitle . snd) joinData
selectResult = _select (teacherName . fst) whereResult

This solution is okay, but what you want is your query to feel like a SQL query. Typically, SQL queries are structured like this:

select <elements> from <data> where <tests>

Your data can be either a list or data created from joining two lists together. You want to be able to restructure your query so it looks like this:

(_select (teacherName . fst))
(_join teachers courses teacherId teacher)
(_where ((== "English") .courseTitle . snd))

You can achieve this by using lambda functions to restructure your code so it looks the way you want it to. You’ll create a function called _hinq that will take the _select,_join, and _where queries in the order you expect and then use lambdas behind the scenes to restructure everything.

Listing 33.13. `_hinq` function allows you to restructure your query

_hinq selectQuery joinQuery whereQuery = (\joinData ->
                                          (\whereResult ->
                                            selectQuery whereResult)
                                          (whereQuery joinData)
                                          ) joinQuery

The _hinq function can be used to run your query. Obviously, this code isn’t a perfect replication of SQL or LINQ, but it’s pretty close and allows you to think about combining two lists in the same way you would a relational query. Here’s the previous query restructured using your _hinq function.

Listing 33.14. Using `_hinq` allows you to approximate SQL syntax in Haskell

finalResult :: [Name]
finalResult = _hinq (_select (teacherName . fst))
                    (_join teachers courses teacherId teacher)
                    (_where ((== "English") .courseTitle . snd))

There’s one small annoyance left. Suppose you want to get the first names from your finalResult for all teachers, not just those teaching English. To do this, you wouldn’t need a _where clause. You can solve this by using (\_ -> True), which will automatically make everything True.

Listing 33.15. One possible solution to a missing `_where`

teacherFirstName :: [String]
teacherFirstName = _hinq (_select firstName)
                         finalResult
                         (_where (\_ -> True))

This works, but it’s no fun to have to remember to pass in this universally true statement. And Haskell doesn’t support default arguments. How can you make an easier way to deal with cases with missing where clauses? You’ll use a HINQ type that will have two constructors.

33.5. Making a HINQ type for your queries

In this section, you’ll create a HINQ type that represents a query. You know that a query can be a select clause, join/data clause, and a where clause, or just the first two. This will allow you to run queries with and without _where statements. Before moving on, though, you need to make one more improvement in your _select, _where, and _join functions. Currently, these all operate on lists, but you can generalize this so they work on other monads. To fix this, you don’t need to change your code at all, only make your type signatures less restrictive. But you’ll have to add a type class constraint. The guard function works on types of the Alternative type class. Alternative is a subtype of Applicative, and includes defining an empty element for the type (a lot like Monoid). Both List and Maybe are members of Alternative, but IO isn’t. To use the Alternative type class, you need to import Control.Applicative. Here are your refactored type signatures that will extend the power of your HINQ queries.

Listing 33.16. `_select`, `_where`, and `_join` functions can work for all monads

_select :: Monad m => (a -> b) -> m a -> m b
_where :: (Monad m, Alternative m) => (a -> Bool) -> m a -> m a
_join :: (Monad m, Alternative m, Eq c) => m a -> m b ->
                                            (a -> c) -> (b -> c) -> m (a,b)

This is a great example of why writing monadic code is so useful. You start by solving your problem for just the List type. But you can make your code dramatically more generalized by changing the type signatures! If you had used list-specific functions, such as map and filter, this would require much more work to refactor. Now that your types are refactored, you can make a generic HINQ type to represent the queries you’re interested in running, as shown in figure 33.6.

Figure 33.6. Understanding the `HINQ` data type

This constructor uses the types of your _selector, _join, and possibly _where functions. With this data type, you can write a runHINQ function that takes a HINQ type and runs the query.

Listing 33.17. `runHINQ` function allows you to execute HINQ queries

runHINQ :: (Monad m, Alternative m) => HINQ m a b -> m b
runHINQ (HINQ sClause jClause wClause) = _hinq sClause jClause wClause
runHINQ (HINQ_ sClause jClause) = _hinq sClause jClause
                                                (_where (\_ -> True))

The other benefit of having the HINQ type is that it clarifies the originally long type you were working with. Let’s run a few queries to see how it does!

33.6. Running your HINQ queries

With your HINQ type, you can start exploring different queries you might want to run. You’ll start by revisiting your English teacher query. Here’s the full HINQ query with a type signature:

query1 :: HINQ [] (Teacher, Course) Name
query1  = HINQ (_select (teacherName . fst))
               (_join teachers courses teacherId teacher)
               (_where ((== "English") .courseTitle . snd))

Because Haskell uses lazy evaluation, simply defining this query doesn’t run it. This is great because it emulates the behavior of .NET LINQ (which also uses lazy evaluation) and it means you can pass around expensive computation without worrying about running the queries until you need the result. Another great thing about HINQ is that it’s strongly typed. You’ll easily be able to find bugs in your query because Haskell’s type checker will yell at you. Because of type inference, you can always choose to leave the type out for quick queries. Run query1 and look at the result:

GHCi> runHINQ query1
[Susan Sontag]

If you want to select teacher names from a similar data set, you can omit the where clause and use HINQ_:

query2 :: HINQ [] Teacher Name
query2 = HINQ_  (_select teacherName)
                teachers

This is the same as using _select on teachers by itself, but it shows that your query type works even for extremely simple cases. You can see that you get the results you expect in GHCi:

GHCi> runHINQ query2
[Simone De Beauvior,Susan Sontag]

Lists are the most common use of something like HINQ, but remember that you refactored it to work with all members of Monad and Alternative. Next you’ll look at an example of querying a Maybe.

33.6.1. Using HINQ with Maybe types

It’s not hard to imagine that you could end up with a Maybe Teacher and a Maybe Course. Just because you don’t have a list of values doesn’t mean you don’t want to join your teacher with the course. Here’s an example of a possible Teacher and a possible Course.

Listing 33.18. Because it’s written for monads you can query `Maybe` types

possibleTeacher :: Maybe Teacher
possibleTeacher = Just (head teachers)

possibleCourse :: Maybe Course
possibleCourse = Just (head courses)

Running a query with a Maybe type means that you’ll get results only if the query doesn’t fail. It can fail from missing data or because it doesn’t find a match. Here’s your English teacher query again for Maybe types.

Listing 33.19. An example of a `Maybe` query

maybeQuery1 :: HINQ Maybe (Teacher,Course) Name
maybeQuery1  = HINQ (_select (teacherName . fst))
                    (_join possibleTeacher possibleCourse
                                           teacherId teacher)
                    (_where ((== "French") .courseTitle . snd))

Even in a Maybe context, you can still think relationally, run queries, and get your results:

GHCi> runHINQ maybeQuery1
Just Simone De Beauvior

If you had a missing course, you can still safely run the query.

Listing 33.20. You can join `Maybe` data and easily handle the case of missing data

missingCourse :: Maybe Course
missingCourse = Nothing

maybeQuery2 :: HINQ Maybe (Teacher,Course) Name
maybeQuery2  = HINQ (_select (teacherName . fst))
               (_join possibleTeacher missingCourse teacherId teacher)
               (_where ((== "French") .courseTitle . snd))

In GHCi, you can see that the missing data is still handled safely:

GHCi> runHINQ maybeQuery2
Nothing

You’ll end this capstone by using HINQ to solve a more complicated problem involving multiple joins.

33.6.2. Joining multiple lists to get all enrollments

Next you’ll look at querying your data to determine course enrollment. To do this, you need another data type to represent an enrollment. Enrollment is a student ID and a course ID. Here’s the type you’ll use to represent this.

Listing 33.21. `Enrollment` relates a `Student` to a `Course`

data Enrollment = Enrollment
   { student :: Int
   , course :: Int } deriving Show

You can represent all of your student enrollments by creating a list of them, each pairing a student’s ID with a course ID.

Listing 33.22. A list of example enrollments

enrollments :: [Enrollment]
enrollments = [(Enrollment 1 101)
              ,(Enrollment 2 101)
              ,(Enrollment 2 201)
              ,(Enrollment 3 101)
              ,(Enrollment 4 201)
              ,(Enrollment 4 101)
              ,(Enrollment 5 101)
              ,(Enrollment 6 201) ]

Suppose you want to get a list of all the students’ names paired with the name of the course they’re enrolled in. To do this, you need to join students with enrollments, and then join the result of that with courses. You can get all of the student enrollments in one go by using a HINQ_ query. This is a great example of the occasional times you may want to take advantage of type inference. How your queries combine types can get complicated, so writing out the full type signature can be tough. Thankfully, type inference takes care of all the work for you! This query will join students and enrollments to get a list of courses the students are enrolled in.

Listing 33.23. Queries students and the course they’re enrolled in

studentEnrollmentsQ = HINQ_ (_select (\(st,en) ->
                                       (studentName st, course en))
                            (_join students enrollments studentId student)

Even though you didn’t want to have to worry about the type signature of the query, you know the result should be a Name and an Id. When you run this query, you can make sure that this is the type of your result.

Listing 33.24. Running a query for `studentEnrollments`

studentEnrollments :: [(Name, Int)]
studentEnrollments = runHINQ studentEnrollmentsQ

In GHCi, you can double-check that your query ran as expected.

GHCi> studentEnrollments
[(Audre Lorde,101),(Leslie Silko,101),(Leslie Silko,201),
(Judith Butler,101),(Guy Debord,201),(Guy Debord,101),
(Jean Baudrillard,101),(Julia Kristeva,201)]

Now suppose you want to get a list of all English students. To do this, you need to join studentEnrollments with courses. Here’s your query for selecting the name of students enrolled in an English course.

Listing 33.25. Joining `studentEnrollments` with courses

englishStudentsQ = HINQ  (_select (fst . fst))
                         (_join studentEnrollments
                                courses
                                snd
                                courseId)
                         (_where ((== "English") . courseTitle . snd))

Notice that your _where clause used data in courses, but your select is concerned only about which students are in the course. Now you can run your query and get a list of englishStudents.

Listing 33.26. Running the `englishStudentsQ` query to list English students

englishStudents :: [Name]
englishStudents = runHINQ englishStudentsQ

With HINQ, you were able to join three lists together just as though they were tables in a relational database.

You can also use HINQ inside a function to make generic tools for querying your data. Suppose you want a function getEnrollments that would list all the students enrolled in a class. You can pass in the course’s name to the query you used last.

Listing 33.27. `getEnrollments` queries your data for enrollments

getEnrollments :: String -> [Name]
getEnrollments courseName =  runHINQ courseQuery
 where courseQuery = HINQ  (_select (fst . fst))
                           (_join studentEnrollments
                                courses
                                snd
                                courseId)
                           (_where ((== courseName) . courseTitle . snd))

In GHCi, you can see that this function works as expected:

GHCi> getEnrollments "English"
[Leslie Silko,Guy Debord,Julia Kristeva]
GHCi> getEnrollments "French"
[Audre Lorde,Leslie Silko,Judith Butler,Guy Debord,Jean Baudrillard]

And there you have it! With the power of monads, you’ve been able to successfully approximate a relational query engine that’s reasonably similar to both SQL and LINQ. Not only are your queries easier to read, but you also get lazy evaluation and a powerful type system to make your system more efficient and robust. Furthermore, for any new types you have, if you implement Monad and Alternative, you can use HINQ on those data types for free! Nearly all the code you wrote to implement used the Monad type class. By combining monads, sum types (your HINQ type), lazy evaluation, and first-class functions, you were able to build a powerful query engine from scratch!

Summary

In this capstone you

Learned how to easily implement _select and _where for lists
Used the Cartesian product of two lists to replicate a database join
Easily changed functions on lists to functions on monads in general
Saw how lambda functions can allow you to restructure the way functions are called
Made working with HINQ queries easier by using a HINQ data type

Extending the exercise

Now that you have the basics of your HINQ queries down, try to extend them the Haskell way! See if you can implement Semigroup and Monoid for HINQ. For Monoid, you might have to refactor your HINQ type to include the empty query. If you can define Monoid for HINQ, you can concatenate a list of HINQ queries into a single query!