In the preceding lesson, you saw how List as a Monad can also be understood as a list comprehension, which is used heavily in Python. In this capstone, you’re going to take your use of lists and monads one step further to create a SQL-like interface to lists (and other monads). SQL is used as the primary means to query relational databases. It has a clean syntax for representing the relationships between data. For example, if you had data about teachers teaching classes, you could query the teachers teaching English like this:
select teacherName from teacher inner join course on teacher.id = course.teacherId where course.title = "English";
This allows you to easily combine two data sets, the teacher table and course table, to extract relevant information. You’ll build a set of tools you’ll call HINQ (borrowing its name from the similar .NET tool LINQ). HINQ will allow you to query your data relationally. You’ll make extensive use of the Monad type class to make this possible. In the end, you’ll have a query tool that
You’ll start by making some select queries on a list, learn to filter your queries with a where function, and finally build a join function that allows you to easily combine complex data inside a monad.
Let’s start with some basic data. You can put everything you need in a file named hinq.hs. Because you want to see how well you can treat lists such as tables in a relational database, you’ll use an example involving students, teachers, courses, and enrollments. Figure 33.1 illustrates this setup.
You’ll start with modeling your student. Each student has a name, which you’ll keep to firstName and lastName.
data Name = Name { firstName ::String , lastName :: String } instance Show Name where show (Name first last) = mconcat [first," ",last]
Then each student has a grade level.
data GradeLevel = Freshman | Sophmore | Junior | Senior deriving (Eq,Ord,Enum,Show)
In addition to these two things, you’ll include a unique student ID. Here’s your Student data type.
data Student = Student { studentId :: Int , gradeLevel :: GradeLevel , studentName :: Name } deriving Show
And you want a list of students you can play with.
students :: [Student] students = [(Student 1 Senior (Name "Audre" "Lorde")) ,(Student 2 Junior (Name "Leslie" "Silko")) ,(Student 3 Freshman (Name "Judith" "Butler")) ,(Student 4 Senior (Name "Guy" "Debord")) ,(Student 5 Sophmore (Name "Jean" "Baudrillard")) ,(Student 6 Junior (Name "Julia" "Kristeva"))]
With just this list, you can move on to building out your basic operations, select and where. In addition to select and where, you’ll also want a join function that will allow you to combine two lists based on a common property.
Thinking in types, you can reason about these three functions as follows: Your select function needs to take a function representing the property being selected and a list of items to select from. Then the result will be a list of the selected property. Figure 33.2 shows the type signature of select.
Your where function will take a test function (which is just a function from a -> Bool) and a list and will return only the remaining values in the list. The type signature of where is shown in figure 33.3.
Finally, your join function will take two lists of potentially different types, and then two functions that extract a property from each list. It’s important that both properties are of the same type and instances of Eq so they can be compared. The result of join will be a list of tuples of the matching values from the original lists. Here’s the type signature for the join function (we’ll explain this in more detail when you implement it):
Eq c => [a] -> [b] -> (a -> c) -> (b -> c) -> [(a,b)]
Next you’ll get started with implementing your select and where functions, which will make it easy to perform simple queries on lists.
The functions you’ll start with are select and where. The select clause in SQL allows you to select properties from a table:
select studentName from students;
This query gives you all the student names from the students table. In the case of your HINQ queries, you’d expect select to give you all the names from a list of students. The where clause in SQL allows you to condition your select on a given value:
select * from students where gradeLevel = 'Senior';
In this SQL statement, you’d select all the entries in the student table that have a grade level of Senior. Notice that in most databases, you’d have to represent the grade level as a String, but in Haskell you get the benefit of using a specific type.
Now you can move on to implementing these as functions. You’ll preface all the functions for HINQ with an underscore (_), not only because you haven’t covered modules and want to avoid collision, but also because where is a reserved keyword.
The easiest operation to implement is _select. The _select function works just like fmap, only you’ll use the Monad syntax in this case.
_select :: (a -> b) -> [a] -> [b] _select prop vals = do val <- vals return (prop val)
Here are a few examples of selecting properties from your students in GHCi:
GHCi> _select (firstName . studentName) students ["Audre","Leslie","Judith","Guy","Jean","Julia"] GHCi> _select gradeLevel students [Senior,Junior,Freshman,Senior,Sophmore,Junior]
This example may make it seem like _select can choose only a single property, but you can easily use a lambda to make a single function that selects two properties:
GHCi> _select (\x -> (studentName x, gradeLevel x)) students [(Audre Lorde,Senior),(Leslie Silko,Junior),(Judith Butler,Freshman),(Guy Debord,Senior),(Jean Baudrillard,Sophmore),(Julia Kristeva,Junior)]
Even after all the tricks of functional programming you’ve learned so far, it’s easy to forget how much power combining first-class functions with lambda functions can provide.
One more thing to notice is that your _select function is strictly less powerful than fmap solely because of its type signature. If you had literally defined _select as _select = fmap, your _select function would work on all members of the Functor type class. Later in this capstone, you’ll refactor your code (though for monads), but it’s worth realizing just how powerful a type signature can be.
Your _where function will also be surprisingly simple. You’ll create a simple wrapper around guard, which will take a test function and your list. Remember, to use guard, you’ll have to import Control.Monad. The _where function is more complicated than _select because it’s not just the guard function (whereas _select could be defined as fmap). You’ll use the <- assignment to treat your list like a single value and then use your test function with guard to filter out the results that don’t pass your test.
_where :: (a -> Bool) -> [a] -> [a] _where test vals = do val <- vals guard (test val) return val
To show off _where, you’ll start with a helper function that tests whether a String starts with a specific letter.
startsWith :: Char -> String -> Bool startsWith char string = char == (head string)
Now you can use where and select together to select only the students with names starting with J:
GHCi> _where (startsWith 'J' . firstName) (_select studentName students) [Judith Butler,Jean Baudrillard,Julia Kristeva]
With the basics of _select and _where down, you can start to look at the heart of relational queries: _join.
Before you join two data sets, you need to create more data. You’ll look at teachers and courses next. Your Teachers have a teacherId and a teacherName.
data Teacher = Teacher { teacherId :: Int , teacherName :: Name } deriving Show
Here are some example teachers.
teachers :: [Teacher] teachers = [Teacher 100 (Name "Simone" "De Beauvior") ,Teacher 200 (Name "Susan" "Sontag")]
Your courses have a courseId, a courseTitle, and a teacher. The teacher is an Int that represents the teacherId of the Teacher leading the Course.
data Course = Course { courseId :: Int , courseTitle :: String , teacher :: Int } deriving Show
And you need some examples.
courses :: [Course] courses = [Course 101 "French" 100 ,Course 201 "English" 200]
Now what you want is to join these two data sets. In SQL terms, the join you’re describing is an inner join, meaning that you care only about matching pairs. In SQL, the following query returns pairs of teachers and the class they teach:
select * from teachers inner join courses on (teachers.teacherId = courses.teacher);
You’re going to assume that for your _join, you’ll be checking to see whether a given property of data in one list is equal to another property in another list. This will have a rather large type signature. What you want to pass into your _join function is two lists, and then a function to select the property to join those lists on, and finally it will return those lists combined. Figure 33.4 shows the type signature to help you understand the process.
You’ll create your _join the same way a join works in the relational algebra used to create databases. You’ll start by computing the Cartesian product of your two lists (by itself, this is a cross join in SQL). The Cartesian product is the combination of all possible pairs. Before you return the result, you’ll filter these pairs out by matching property values (based on the properties you passed in, as shown in figure 33.5).
You can use _join to combine your teachers and courses:
GHCi> _join teachers courses teacherId teacher [(Teacher {teacherId = 100, teacherName = Simone De Beauvior},Course {courseId = 101, courseTitle = "French", teacher = 100}),
(Teacher {teacherId = 200, teacherName = Susan Sontag},Course
{courseId = 201, courseTitle = "English", teacher = 200})]
With the three major parts of your query language together, you can start packaging _select, _where, and _join into an easier-to-use format.
You want to make it a bit easier to stick together the pieces of your query. Here’s an example of using all three functions to find a list of English teachers (there’s only one).
joinData = (_join teachers courses teacherId teacher) whereResult = _where ((== "English") . courseTitle . snd) joinData selectResult = _select (teacherName . fst) whereResult
This solution is okay, but what you want is your query to feel like a SQL query. Typically, SQL queries are structured like this:
select <elements> from <data> where <tests>
Your data can be either a list or data created from joining two lists together. You want to be able to restructure your query so it looks like this:
(_select (teacherName . fst)) (_join teachers courses teacherId teacher) (_where ((== "English") .courseTitle . snd))
You can achieve this by using lambda functions to restructure your code so it looks the way you want it to. You’ll create a function called _hinq that will take the _select,_join, and _where queries in the order you expect and then use lambdas behind the scenes to restructure everything.
_hinq selectQuery joinQuery whereQuery = (\joinData -> (\whereResult -> selectQuery whereResult) (whereQuery joinData) ) joinQuery
The _hinq function can be used to run your query. Obviously, this code isn’t a perfect replication of SQL or LINQ, but it’s pretty close and allows you to think about combining two lists in the same way you would a relational query. Here’s the previous query restructured using your _hinq function.
finalResult :: [Name] finalResult = _hinq (_select (teacherName . fst)) (_join teachers courses teacherId teacher) (_where ((== "English") .courseTitle . snd))
There’s one small annoyance left. Suppose you want to get the first names from your finalResult for all teachers, not just those teaching English. To do this, you wouldn’t need a _where clause. You can solve this by using (\_ -> True), which will automatically make everything True.
teacherFirstName :: [String] teacherFirstName = _hinq (_select firstName) finalResult (_where (\_ -> True))
This works, but it’s no fun to have to remember to pass in this universally true statement. And Haskell doesn’t support default arguments. How can you make an easier way to deal with cases with missing where clauses? You’ll use a HINQ type that will have two constructors.
In this section, you’ll create a HINQ type that represents a query. You know that a query can be a select clause, join/data clause, and a where clause, or just the first two. This will allow you to run queries with and without _where statements. Before moving on, though, you need to make one more improvement in your _select, _where, and _join functions. Currently, these all operate on lists, but you can generalize this so they work on other monads. To fix this, you don’t need to change your code at all, only make your type signatures less restrictive. But you’ll have to add a type class constraint. The guard function works on types of the Alternative type class. Alternative is a subtype of Applicative, and includes defining an empty element for the type (a lot like Monoid). Both List and Maybe are members of Alternative, but IO isn’t. To use the Alternative type class, you need to import Control.Applicative. Here are your refactored type signatures that will extend the power of your HINQ queries.
_select :: Monad m => (a -> b) -> m a -> m b _where :: (Monad m, Alternative m) => (a -> Bool) -> m a -> m a _join :: (Monad m, Alternative m, Eq c) => m a -> m b -> (a -> c) -> (b -> c) -> m (a,b)
This is a great example of why writing monadic code is so useful. You start by solving your problem for just the List type. But you can make your code dramatically more generalized by changing the type signatures! If you had used list-specific functions, such as map and filter, this would require much more work to refactor. Now that your types are refactored, you can make a generic HINQ type to represent the queries you’re interested in running, as shown in figure 33.6.
This constructor uses the types of your _selector, _join, and possibly _where functions. With this data type, you can write a runHINQ function that takes a HINQ type and runs the query.
runHINQ :: (Monad m, Alternative m) => HINQ m a b -> m b runHINQ (HINQ sClause jClause wClause) = _hinq sClause jClause wClause runHINQ (HINQ_ sClause jClause) = _hinq sClause jClause (_where (\_ -> True))
The other benefit of having the HINQ type is that it clarifies the originally long type you were working with. Let’s run a few queries to see how it does!
With your HINQ type, you can start exploring different queries you might want to run. You’ll start by revisiting your English teacher query. Here’s the full HINQ query with a type signature:
query1 :: HINQ [] (Teacher, Course) Name query1 = HINQ (_select (teacherName . fst)) (_join teachers courses teacherId teacher) (_where ((== "English") .courseTitle . snd))
Because Haskell uses lazy evaluation, simply defining this query doesn’t run it. This is great because it emulates the behavior of .NET LINQ (which also uses lazy evaluation) and it means you can pass around expensive computation without worrying about running the queries until you need the result. Another great thing about HINQ is that it’s strongly typed. You’ll easily be able to find bugs in your query because Haskell’s type checker will yell at you. Because of type inference, you can always choose to leave the type out for quick queries. Run query1 and look at the result:
GHCi> runHINQ query1 [Susan Sontag]
If you want to select teacher names from a similar data set, you can omit the where clause and use HINQ_:
query2 :: HINQ [] Teacher Name query2 = HINQ_ (_select teacherName) teachers
This is the same as using _select on teachers by itself, but it shows that your query type works even for extremely simple cases. You can see that you get the results you expect in GHCi:
GHCi> runHINQ query2 [Simone De Beauvior,Susan Sontag]
Lists are the most common use of something like HINQ, but remember that you refactored it to work with all members of Monad and Alternative. Next you’ll look at an example of querying a Maybe.
It’s not hard to imagine that you could end up with a Maybe Teacher and a Maybe Course. Just because you don’t have a list of values doesn’t mean you don’t want to join your teacher with the course. Here’s an example of a possible Teacher and a possible Course.
possibleTeacher :: Maybe Teacher possibleTeacher = Just (head teachers) possibleCourse :: Maybe Course possibleCourse = Just (head courses)
Running a query with a Maybe type means that you’ll get results only if the query doesn’t fail. It can fail from missing data or because it doesn’t find a match. Here’s your English teacher query again for Maybe types.
maybeQuery1 :: HINQ Maybe (Teacher,Course) Name maybeQuery1 = HINQ (_select (teacherName . fst)) (_join possibleTeacher possibleCourse teacherId teacher) (_where ((== "French") .courseTitle . snd))
Even in a Maybe context, you can still think relationally, run queries, and get your results:
GHCi> runHINQ maybeQuery1 Just Simone De Beauvior
If you had a missing course, you can still safely run the query.
missingCourse :: Maybe Course missingCourse = Nothing maybeQuery2 :: HINQ Maybe (Teacher,Course) Name maybeQuery2 = HINQ (_select (teacherName . fst)) (_join possibleTeacher missingCourse teacherId teacher) (_where ((== "French") .courseTitle . snd))
In GHCi, you can see that the missing data is still handled safely:
GHCi> runHINQ maybeQuery2 Nothing
You’ll end this capstone by using HINQ to solve a more complicated problem involving multiple joins.
Next you’ll look at querying your data to determine course enrollment. To do this, you need another data type to represent an enrollment. Enrollment is a student ID and a course ID. Here’s the type you’ll use to represent this.
data Enrollment = Enrollment { student :: Int , course :: Int } deriving Show
You can represent all of your student enrollments by creating a list of them, each pairing a student’s ID with a course ID.
enrollments :: [Enrollment] enrollments = [(Enrollment 1 101) ,(Enrollment 2 101) ,(Enrollment 2 201) ,(Enrollment 3 101) ,(Enrollment 4 201) ,(Enrollment 4 101) ,(Enrollment 5 101) ,(Enrollment 6 201) ]
Suppose you want to get a list of all the students’ names paired with the name of the course they’re enrolled in. To do this, you need to join students with enrollments, and then join the result of that with courses. You can get all of the student enrollments in one go by using a HINQ_ query. This is a great example of the occasional times you may want to take advantage of type inference. How your queries combine types can get complicated, so writing out the full type signature can be tough. Thankfully, type inference takes care of all the work for you! This query will join students and enrollments to get a list of courses the students are enrolled in.
studentEnrollmentsQ = HINQ_ (_select (\(st,en) -> (studentName st, course en)) (_join students enrollments studentId student)
Even though you didn’t want to have to worry about the type signature of the query, you know the result should be a Name and an Id. When you run this query, you can make sure that this is the type of your result.
studentEnrollments :: [(Name, Int)] studentEnrollments = runHINQ studentEnrollmentsQ
In GHCi, you can double-check that your query ran as expected.
GHCi> studentEnrollments [(Audre Lorde,101),(Leslie Silko,101),(Leslie Silko,201),(Judith Butler,101),(Guy Debord,201),(Guy Debord,101),
(Jean Baudrillard,101),(Julia Kristeva,201)]
Now suppose you want to get a list of all English students. To do this, you need to join studentEnrollments with courses. Here’s your query for selecting the name of students enrolled in an English course.
englishStudentsQ = HINQ (_select (fst . fst)) (_join studentEnrollments courses snd courseId) (_where ((== "English") . courseTitle . snd))
Notice that your _where clause used data in courses, but your select is concerned only about which students are in the course. Now you can run your query and get a list of englishStudents.
englishStudents :: [Name] englishStudents = runHINQ englishStudentsQ
With HINQ, you were able to join three lists together just as though they were tables in a relational database.
You can also use HINQ inside a function to make generic tools for querying your data. Suppose you want a function getEnrollments that would list all the students enrolled in a class. You can pass in the course’s name to the query you used last.
getEnrollments :: String -> [Name] getEnrollments courseName = runHINQ courseQuery where courseQuery = HINQ (_select (fst . fst)) (_join studentEnrollments courses snd courseId) (_where ((== courseName) . courseTitle . snd))
In GHCi, you can see that this function works as expected:
GHCi> getEnrollments "English" [Leslie Silko,Guy Debord,Julia Kristeva] GHCi> getEnrollments "French" [Audre Lorde,Leslie Silko,Judith Butler,Guy Debord,Jean Baudrillard]
And there you have it! With the power of monads, you’ve been able to successfully approximate a relational query engine that’s reasonably similar to both SQL and LINQ. Not only are your queries easier to read, but you also get lazy evaluation and a powerful type system to make your system more efficient and robust. Furthermore, for any new types you have, if you implement Monad and Alternative, you can use HINQ on those data types for free! Nearly all the code you wrote to implement used the Monad type class. By combining monads, sum types (your HINQ type), lazy evaluation, and first-class functions, you were able to build a powerful query engine from scratch!
In this capstone you
Now that you have the basics of your HINQ queries down, try to extend them the Haskell way! See if you can implement Semigroup and Monoid for HINQ. For Monoid, you might have to refactor your HINQ type to include the empty query. If you can define Monoid for HINQ, you can concatenate a list of HINQ queries into a single query!