CHAPTER 19
LINQ

LINQ is without question one of the most exciting features in C#. It was added by C# 3.0, and it represented a major addition to the language. Not only did it add an entirely new syntactic element, several new keywords, and a powerful new capability, but also it significantly increased the scope of the language, expanding the range of tasks to which C# can be applied. Simply put, the addition of LINQ was a pivotal event in the evolution of C#.

LINQ stands for Language-Integrated Query. It encompasses a set of features that lets you retrieve information from a data source. As you may know, the retrieval of data constitutes an important part of many programs. For example, a program might obtain information from a customer list, look up product information in a catalog, or access an employee’s record. In many cases, such data is stored in a database that is separate from the application. For example, a product catalog might be stored in a relational database. In the past, interacting with such a database would involve generating queries using SQL (Structured Query Language). Other sources of data, such as XML, required their own approaches. In other words, prior to C# 3.0, support for such queries was not built into C#. The addition of LINQ changed this.

LINQ gives to C# the ability to generate queries for any LINQ-compatible data source. Furthermore, the syntax used for the query is the same—no matter what data source is used. This means the syntax used to query data in a relational database is the same as that used to query data stored in an array, for example. It is not necessary to use SQL or any other non-C# mechanism. The query capability is fully integrated into the C# language.

In addition to using LINQ with SQL, LINQ can be used with XML files and ADO.NET Datasets. Perhaps equally important, it can also be used with C# arrays and collections (described in Chapter 25). Therefore, LINQ gives you a uniform way to access data in general. This is a powerful, innovative concept in its own right, but the benefits of LINQ do not stop there. LINQ also offers a different way to think about and approach many types of programming tasks—not just traditional database access. As a result, many solutions can be crafted in terms of LINQ.

LINQ is supported by a set of interrelated features, including the query syntax added to the C# language, lambda expressions, anonymous types, and extension methods. Lambda expressions are described in Chapter 15. The others are examined here.


NOTE LINQ in C# is essentially a language within a language. As a result, the subject of LINQ is quite large, involving many features, options, and alternatives. Although this chapter describes LINQ in significant detail, it is not possible to explore all facets, nuances, and applications of this powerful feature. To do so would require an entire book of its own. Instead, this chapter focuses on the core elements of LINQ and presents numerous examples. Going forward, LINQ is definitely a subsystem that you will want to study in greater detail.

LINQ Fundamentals

At LINQ’s core is the query. A query specifies what data will be obtained from a data source. For example, a query on a customer mailing list might request the addresses of all customers that reside in a specific city, such as Chicago or Tokyo. A query on an inventory database might request a list of out-of-stock items. A query on a log of Internet usage could ask for a list of the websites with the highest hit counts. Although these queries differ in their specifics, all can be expressed using the same LINQ syntactic elements.

After a query has been created, it can be executed. One way this is done is by using the query in a foreach loop. Executing a query causes its results to be obtained. Thus, using a query involves two key steps. First, the form of the query is created. Second, the query is executed. Therefore, the query defines what to retrieve from a data source. Executing the query actually obtains the results.

In order for a source of data to be used by LINQ, it must implement the IEnumerable interface. There are two forms of this interface: one generic, one not. In general, it is easier if the data source implements the generic version, IEnumerable<T>, where T specifies the type of data being enumerated. The rest of the chapter assumes that a data source implements IEnumerable<T>. This interface is declared in System.Collections.Generic. A class that implements IEnumerable<T> supports enumeration, which means that its contents can be obtained one at a time, in sequence. All C# arrays implicitly support IEnumerable<T>. Thus, arrays can be used to demonstrate the central concepts of LINQ. Understand, however, that LINQ is not limited to arrays.

A Simple Query

At this point, it will be helpful to work through a simple LINQ example. The following program uses a query to obtain the positive values contained in an array of integers:

// Create a simple LINQ query.
using System;
using System.Linq;

class SimpQuery {
 static void Main() {

   int[] nums = { 1, -2, 3, 0, -4, 5 };
   // Create a query that obtains only positive numbers.
   var posNums = from n in nums
         where n > 0
         select n;

   Console.Write("The positive values in nums: ");

   // Execute the query and display the results.
   foreach(int i in posNums) Console.Write(i + " ");

   Console.WriteLine();
 }

}

This program produces the following output:

The positive values in nums: 1 3 5

As you can see, only the positive values in the nums array are displayed. Although quite simple, this program demonstrates the key features of LINQ. Let’s examine it closely.

The first thing to notice in the program is the using directive:

using System.Linq;

To use the LINQ features, you must include the System.Linq namespace.

Next, an array of int called nums is declared. All arrays in C# are implicitly convertible to IEnumerable<T>. This makes any C# array usable as a LINQ data source.

Next, a query is declared that retrieves those elements in nums that are positive. It is shown here:

var posNums = from n in nums
              where n > 0
              select n;

The variable posNums is called the query variable. It refers to the set of rules defined by the query. Notice it uses var to implicitly declare posNums. As you know, this makes posNums an implicitly typed variable. In queries, it is often convenient to use implicitly typed variables, although you can also explicitly declare the type (which must be some form of IEnumerable<T>). The variable posNums is then assigned the query expression.

All queries begin with from. This clause specifies two items. The first is the range variable, which will receive elements obtained from the data source. In this case, the range variable is n. The second item is the data source, which in this case is the nums array. The type of the range variable is inferred from the data source. In this case, the type of n is int. Generalizing, here is the syntax of the from clause:

from range-variable in data-source

The next clause in the query is where. It specifies a condition that an element in the data source must meet in order to be obtained by the query. Its general form is shown here:

where boolean-expression

The boolean-expression must produce a bool result. (This expression is also called a predicate.) There can be more than one where clause in a query. In the program, this where clause is used:

where n > 0

It will be true only for an element whose value is greater than zero. This expression will be evaluated for every n in nums when the query executes. Only those values that satisfy this condition will be obtained. In other words, a where clause acts as a filter on the data source, allowing only certain items through.

All queries end with either a select clause or a group clause. This example employs the select clause. It specifies precisely what is obtained by the query. For simple queries, such as the one in this example, the range variable is selected. Therefore, it returns those integers from nums that satisfy the where clause. In more sophisticated situations, it is possible to fine-tune what is selected. For example, when querying a mailing list, you might return just the last name of each recipient, rather than the entire address. Notice that the select clause ends with a semicolon. Because select ends a query, it ends the statement and requires a semicolon. Notice, however, that the other clauses in the query do not end with a semicolon.

At this point, a query variable called posNums has been created, but no results have been obtained. It is important to understand that a query simply defines a set of rules. It is not until the query is executed that results are obtained. Furthermore, the same query can be executed two or more times, with the possibility of differing results if the underlying data source changes between executions. Therefore, simply declaring the query posNums does not mean that it contains the results of the query.

To execute the query, the program uses the foreach loop shown here:

foreach(int i in posNums) Console.Write(i + " ");

Notice that posNums is specified as the collection being iterated over. When the foreach executes, the rules defined by the query specified by posNums are executed. With each pass through the loop, the next element returned by the query is obtained. The process ends when there are no more elements to retrieve. In this case, the type of the iteration variable i is explicitly specified as int because this is the type of the elements retrieved by the query. Explicitly specifying the type of the iteration variable is fine in this situation, since it is easy to know the type of the value selected by the query. However, in more complicated situations, it will be easier (or in some cases, necessary) to implicitly specify the type of the iteration variable by using var.

A Query Can Be Executed More Than Once

Because a query defines a set of rules that are used to retrieve data, but does not, itself, produce results, the same query can be run multiple times. If the data source changes between runs, then the results of the query may differ. Therefore, once you define a query, executing it will always produce the most current results. Here is an example. In the following version of the preceding program, the contents of the nums array are changed between two executions of posNums:

// Create a simple query.
using System;
using System.Linq;
using System.Collections.Generic;

class SimpQuery {
 static void Main() {

   int[] nums = { 1, -2, 3, 0, -4, 5 };

   // Create a query that obtains only positive numbers.
   var posNums = from n in nums
                 where n > 0
                 select n;

   Console.Write("The positive values in nums: ");

   // Execute the query and display the results.
   foreach(int i in posNums) Console.Write(i + " ");
   Console.WriteLine();

   // Change nums.
   Console.WriteLine("\nSetting nums[1] to 99.");
   nums[1] = 99;

   Console.Write("The positive values in nums after change: ");

   // Execute the query a second time.
   foreach(int i in posNums) Console.Write(i + " ");
   Console.WriteLine();
 }
}

The following output is produced:

The positive values in nums: 1 3 5

Setting nums[1] to 99.
The positive values in nums after change: 1 99 3 5

As the output confirms, after the value in nums[1] was changed from -2 to 99, the result of rerunning the query reflects the change. This is a key point that must be emphasized. Each execution of a query produces its own results, which are obtained by enumerating the current contents of the data source. Therefore, if the data source changes, so, too, might the results of executing a query. The benefits of this approach are quite significant. For example, if you are obtaining a list of pending orders for an online store, then you want each execution of your query to produce all orders, including those just entered.
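The pending-orders scenario can be sketched in a few lines. This is only an illustration: the order strings, the "pending:" prefix, and the PendingIn( ) helper are hypothetical names invented for this sketch. It uses a List<T>, which implements IEnumerable<T>, as the data source.

```csharp
// A sketch of re-executing a query after the data source grows.
// The order strings and the PendingIn() helper are hypothetical.
using System;
using System.Collections.Generic;
using System.Linq;

class PendingOrdersDemo {
  // Builds the query. Nothing is retrieved until the result is enumerated.
  public static IEnumerable<string> PendingIn(List<string> orders) {
    return from o in orders
           where o.StartsWith("pending:", StringComparison.Ordinal)
           select o;
  }

  static void Main() {
    List<string> orders = new List<string> {
      "pending:1001", "shipped:1002", "pending:1003"
    };

    // Define the query once.
    var pending = PendingIn(orders);

    // First execution.
    foreach(string o in pending) Console.Write(o + " ");
    Console.WriteLine();

    // A new order arrives after the query was defined.
    orders.Add("pending:1004");

    // Second execution enumerates the current contents of orders.
    foreach(string o in pending) Console.Write(o + " ");
    Console.WriteLine();
  }
}
```

The first loop displays pending:1001 and pending:1003. The second loop also displays pending:1004, even though the query itself was never redefined.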

How the Data Types in a Query Relate

As the preceding examples have shown, a query involves variables whose types relate to one another. These are the query variable, the range variable, and the data source. Because the correspondence among these types is both important and a bit confusing at first, they merit a closer look.

The type of the range variable must agree with the type of the elements stored in the data source. Thus, the type of the range variable is dependent upon the type of the data source. In many cases, C# can infer the type of the range variable. As long as the data source implements IEnumerable<T>, the type inference can be made because T describes the type of the elements in the data source. However, if the data source implements the non-generic version of IEnumerable, then you will need to explicitly specify the type of the range variable. This is done by specifying its type in the from clause. For example, assuming the preceding examples, this shows how to explicitly declare n to be an int:

var posNums = from int n in nums
  // ...

Of course, the explicit type specification is not needed here because all arrays are implicitly convertible to IEnumerable<T>, which enables the type of the range variable to be inferred.
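To see a case in which the explicit type is actually required, consider a sketch that uses ArrayList, which implements only the non-generic IEnumerable. The class and method names here are hypothetical, chosen just for this illustration.

```csharp
// A sketch of querying a non-generic data source.
// NonGenericSourceDemo and PositiveValues() are hypothetical names.
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

class NonGenericSourceDemo {
  public static IEnumerable<int> PositiveValues(ArrayList values) {
    // ArrayList stores its elements as object, so the type of the
    // range variable cannot be inferred and must be stated.
    return from int n in values
           where n > 0
           select n;
  }

  static void Main() {
    ArrayList values = new ArrayList { 1, -2, 3, 0, -4, 5 };

    foreach(int i in PositiveValues(values)) Console.Write(i + " ");
    Console.WriteLine();
  }
}
```

Without the int in the from clause, this query would not compile. Note that each element is cast to int when the query executes, so an ArrayList that holds something other than int values would cause a run-time exception.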

The object returned by a query is an instance of IEnumerable<T>, where T is the type of the elements. Thus, the type of the query variable must be some form of IEnumerable<T>. The value of T is determined by the type of the value specified by the select clause. In the case of the preceding examples, T is int because n is an int. (As explained, n is an int because int is the type of elements stored in nums.) Therefore, the query could have been written like this, with the type explicitly specified as IEnumerable<int> (which requires a using directive for System.Collections.Generic):

IEnumerable<int> posNums = from n in nums
                           where n > 0
                           select n;

The key point is that the type of the item selected by select must agree with the type argument passed to IEnumerable<T> used to declare the query variable. Often query variables use var rather than explicitly specifying the type because this lets the compiler infer the proper type from the select clause. As you will see, this approach is particularly useful when select returns something other than an individual element from the data source.
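The following sketch shows a case in which the type selected differs from the element type of the data source. The names QueryTypeDemo and Labels( ) are hypothetical. Here, the range variable is an int, but select produces a string, so the query variable must be declared as IEnumerable<string>.

```csharp
// A sketch in which T is determined by select, not by the data source.
// QueryTypeDemo and Labels() are hypothetical names.
using System;
using System.Collections.Generic;
using System.Linq;

class QueryTypeDemo {
  public static IEnumerable<string> Labels(int[] nums) {
    // n is an int, but the select expression is a string,
    // so the query yields IEnumerable<string>.
    IEnumerable<string> labels = from n in nums
                                 where n > 0
                                 select "#" + n;
    return labels;
  }

  static void Main() {
    int[] nums = { 1, -2, 3, 0, -4, 5 };

    foreach(string s in Labels(nums)) Console.Write(s + " ");
    Console.WriteLine();
  }
}
```

The output is #1 #3 #5. Declaring labels as IEnumerable<int> would produce a compile-time error because the selected type is string.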

When a query is executed by the foreach loop, the type of the iteration variable must be the same as the type specified by the select clause. In the preceding examples, this type was explicitly specified as int, but you can let the compiler infer the type by declaring this variable as var. As you will see, there are also some cases in which var must be used because the data type has no name.
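As a brief preview of a case in which var is required, the following sketch selects an anonymous type. The names UnnamedTypeDemo and Describe( ), and the property names Value and Square, are hypothetical. Because the type created by select has no name, both the query variable and the iteration variable must be declared with var.

```csharp
// A sketch of a query whose element type has no name.
// UnnamedTypeDemo, Describe(), Value, and Square are hypothetical names.
using System;
using System.Linq;

class UnnamedTypeDemo {
  public static string Describe(int[] nums) {
    // select creates objects of an anonymous type.
    var pairs = from n in nums
                select new { Value = n, Square = n * n };

    string result = "";

    // The element type cannot be written out, so var is required.
    foreach(var p in pairs)
      result += p.Value + ":" + p.Square + " ";

    return result.Trim();
  }

  static void Main() {
    int[] nums = { 1, -2, 3 };
    Console.WriteLine(Describe(nums));
  }
}
```

The output is 1:1 -2:4 3:9.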

The General Form of a Query

All queries share a general form, which is based on a set of contextual keywords, shown here:

ascending    by         descending    equals    from
group        in         into          join      let
on           orderby    select        where

Of these, the following begin query clauses:

from    group    join    let    orderby    select    where

A query must begin with the keyword from and end with either a select or group clause. The select clause determines what type of value is enumerated by the query. The group clause returns the data by groups, each of which can be enumerated individually. As the preceding examples have shown, the where clause specifies criteria that an item must meet in order for it to be returned. The remaining clauses help you fine-tune a query. The following sections examine each query clause.

Filter Values with where

As explained, where is used to filter the data returned by a query. The preceding examples have shown only its simplest form, in which a single condition is used. A key point to understand is that you can use where to filter data based on more than one condition. One way to do this is through the use of multiple where clauses. For example, consider the following program that displays only those values in the array that are both positive and less than 10:

// Use multiple where clauses.
using System;
using System.Linq;

class TwoWheres {
 static void Main() {

   int[] nums = { 1, -2, 3, -3, 0, -8, 12, 19, 6, 9, 10 };

   // Create a query that obtains positive values less than 10.
   var posNums = from n in nums
                 where n > 0
                 where n < 10
                 select n;

   Console.Write("The positive values less than 10: ");
 
   // Execute the query and display the results.
   foreach(int i in posNums) Console.Write(i + " ");
   Console.WriteLine();
 }
}

The output is shown here:

The positive values less than 10: 1 3 6 9

As you can see, only positive values less than 10 are retrieved. This outcome is achieved by the use of the following two where clauses:

where n > 0
where n < 10

The first where requires that an element be greater than zero. The second requires the element to be less than 10. Thus, an element must be between 1 and 9 (inclusive) to satisfy both clauses.

Although it is not wrong to use two where clauses as just shown, the same effect can be achieved in a more compact manner by using a single where in which both tests are combined into a single expression. Here is the query rewritten to use this approach:

var posNums = from n in nums
              where n > 0 && n < 10
              select n;
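One way to convince yourself that the two forms are interchangeable is to run both against the same data. The following is a minimal sketch, and the names WhereFormsDemo, TwoClauses( ), and OneClause( ) are hypothetical.

```csharp
// A sketch comparing two where clauses with a single combined where.
// WhereFormsDemo, TwoClauses(), and OneClause() are hypothetical names.
using System;
using System.Collections.Generic;
using System.Linq;

class WhereFormsDemo {
  public static IEnumerable<int> TwoClauses(int[] nums) {
    return from n in nums
           where n > 0
           where n < 10
           select n;
  }

  public static IEnumerable<int> OneClause(int[] nums) {
    return from n in nums
           where n > 0 && n < 10
           select n;
  }

  static void Main() {
    int[] nums = { 1, -2, 3, -3, 0, -8, 12, 19, 6, 9, 10 };

    foreach(int i in TwoClauses(nums)) Console.Write(i + " ");
    Console.WriteLine();

    foreach(int i in OneClause(nums)) Console.Write(i + " ");
    Console.WriteLine();
  }
}
```

Both lines of output are 1 3 6 9.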

In general, a where condition can use any valid C# expression that evaluates to a Boolean result. For example, the following program defines an array of strings. Several of the strings define Internet addresses. The query netAddrs retrieves only those strings that have more than four characters and that end with “.net”. Thus, it finds those strings that contain Internet addresses that use the .net top-level domain name.

// Demonstrate another where clause.
using System;
using System.Linq;

class WhereDemo2 {

 static void Main() {

   string[] strs = { ".com", ".net", "hsNameA.com", "hsNameB.net",
                   "test", ".network", "hsNameC.net", "hsNameD.com" };

   // Create a query that obtains Internet addresses that
   // end with .net.
   var netAddrs = from addr in strs
                  where addr.Length > 4 && addr.EndsWith(".net",
                        StringComparison.Ordinal)
                  select addr;
 
   // Execute the query and display the results.
   foreach(var str in netAddrs) Console.WriteLine(str);
 }
}

The output is shown here:

hsNameB.net
hsNameC.net

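Notice that the program makes use of one of string’s methods called EndsWith( ) within the where clause. It returns true if the invoking string ends with the character sequence specified as its first argument. The second argument, StringComparison.Ordinal, requests a character-by-character comparison that ignores culture-specific rules.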

Sort Results with orderby

Often you will want the results of a query to be sorted. For example, you might want to obtain a list of past-due accounts, in order of the remaining balance, from greatest to least. Or, you might want to obtain a customer list, alphabetized by name. Whatever the purpose, LINQ gives you an easy way to produce sorted results: the orderby clause.

You can use orderby to sort on one or more criteria. We will begin with the simplest case: sorting on a single item. The general form of orderby that sorts based on a single criterion is shown here:

orderby sort-on how

The item on which to sort is specified by sort-on. This can be as inclusive as the entire element stored in the data source or as restricted as a portion of a single field within the element. The value of how specifies the direction of the sort, and it must be either the keyword ascending or the keyword descending. It is optional. The default direction is ascending, so you won’t normally specify ascending.

Here is an example that uses orderby to retrieve the values in an int array in ascending order:

// Demonstrate orderby.
using System;
using System.Linq;

class OrderbyDemo {

 static void Main() {

   int[] nums = { 10, -19, 4, 7, 2, -5, 0 };

   // Create a query that obtains the values in sorted order.
   var posNums = from n in nums
                 orderby n
                 select n;

   Console.Write("Values in ascending order: ");

   // Execute the query and display the results.
   foreach(int i in posNums) Console.Write(i + " ");

   Console.WriteLine();
 }
}

The output is shown here:

Values in ascending order: -19 -5 0 2 4 7 10

To change the order to descending, simply specify the descending option, as shown here:

var posNums = from n in nums
              orderby n descending
              select n;

If you try this, you will see that the order of the values is reversed.
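For reference, here is a sketch of the descending version as a complete program. It reuses the nums array from the preceding example. The names OrderbyDescendingDemo and Descending( ) are hypothetical.

```csharp
// A sketch of orderby with the descending option.
// OrderbyDescendingDemo and Descending() are hypothetical names.
using System;
using System.Collections.Generic;
using System.Linq;

class OrderbyDescendingDemo {
  public static IEnumerable<int> Descending(int[] nums) {
    return from n in nums
           orderby n descending
           select n;
  }

  static void Main() {
    int[] nums = { 10, -19, 4, 7, 2, -5, 0 };

    Console.Write("Values in descending order: ");
    foreach(int i in Descending(nums)) Console.Write(i + " ");
    Console.WriteLine();
  }
}
```

This version displays the values as 10 7 4 2 0 -5 -19.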

Although sorting on a single criterion is often what is needed, you can use orderby to sort on multiple items by using this form:

orderby sort-onA direction, sort-onB direction, sort-onC direction, ...

In this form, sort-onA is the item on which the primary sorting is done. Then, each group of equivalent items is sorted on sort-onB, and each of those groups is sorted on sort-onC, and so on. Thus, each subsequent sort-on specifies a “then by” item on which to sort. In all cases, direction is optional, defaulting to ascending. Here is an example that uses three sort criteria to sort bank account information by last name, then by first name, and finally by account balance:

// Sort on multiple criteria with orderby.
using System;
using System.Linq;

class Account {
 public string FirstName { get; private set; }
 public string LastName { get; private set; }
 public double Balance { get; private set; }
 public string AccountNumber { get; private set; }

 public Account(string fn, string ln, string accnum, double b) {
   FirstName = fn;
   LastName = ln;
   AccountNumber = accnum;
   Balance = b;
 }

}

class OrderbyDemo {

 static void Main() {

   // Create some data.
   Account[] accounts = { new Account("Tom", "Smith", "132CK", 100.23),
                          new Account("Tom", "Smith", "132CD", 10000.00),
                          new Account("Ralph", "Jones", "436CD", 1923.85),
                          new Account("Ralph", "Jones", "454MM", 987.132),
                          new Account("Ted", "Krammer", "897CD", 3223.19),
                          new Account("Ralph", "Jones", "434CK", -123.32),
                          new Account("Sara", "Smith", "543MM", 5017.40),
                          new Account("Sara", "Smith", "547CD", 34955.79),
                          new Account("Sara", "Smith", "843CK", 345.00),
                          new Account("Albert", "Smith", "445CK", 213.67),
                          new Account("Betty", "Krammer","968MM",5146.67),
                          new Account("Carl", "Smith", "078CD", 15345.99),
                          new Account("Jenny", "Jones", "108CK", 10.98)
                        };

   // Create a query that obtains the accounts in sorted order.
   // Sorting first by last name, then within same last names sorting
   // by first name, and finally by account balance.
   var accInfo = from acc in accounts
                 orderby acc.LastName, acc.FirstName, acc.Balance
                 select acc;

   Console.WriteLine("Accounts in sorted order: ");

   string str = "";

   // Execute the query and display the results.
   foreach(Account acc in accInfo) {
     if(str != acc.FirstName) {
       Console.WriteLine();
       str = acc.FirstName;
     }

     Console.WriteLine("{0}, {1}\tAcc#: {2}, {3,10:C}",
                       acc.LastName, acc.FirstName,
                       acc.AccountNumber, acc.Balance);
   }
   Console.WriteLine();

 }
}

The output is shown here:

Accounts in sorted order:

Jones, Jenny Acc#: 108CK, $10.98

Jones, Ralph Acc#: 434CK, ($123.32)
Jones, Ralph Acc#: 454MM, $987.13
Jones, Ralph Acc#: 436CD, $1,923.85

Krammer, Betty Acc#: 968MM, $5,146.67

Krammer, Ted Acc#: 897CD, $3,223.19

Smith, Albert Acc#: 445CK, $213.67

Smith, Carl Acc#: 078CD, $15,345.99

Smith, Sara Acc#: 843CK, $345.00
Smith, Sara Acc#: 543MM, $5,017.40
Smith, Sara Acc#: 547CD, $34,955.79

Smith, Tom Acc#: 132CK, $100.23
Smith, Tom Acc#: 132CD, $10,000.00

In the query, look closely at how the orderby clause is written:

var accInfo = from acc in accounts
              orderby acc.LastName, acc.FirstName, acc.Balance
              select acc;

Here is how it works. First, the results are sorted by last name, and then entries with the same last name are sorted by the first name. Finally, groups of entries with the same first and last name are sorted by the account balance. This is why the list of accounts under the name Jones is shown in this order:

Jones, Jenny Acc#: 108CK, $10.98

Jones, Ralph Acc#: 434CK, ($123.32)
Jones, Ralph Acc#: 454MM, $987.13
Jones, Ralph Acc#: 436CD, $1,923.85

As the output confirms, the list is sorted by last name, then by first name, and finally by account balance.

When using multiple criteria, you can reverse the direction of any sort by applying the descending option to it. For example, this query still sorts by last name and then by first name, but it orders the accounts that share a name by decreasing balance:

var accInfo = from acc in accounts
              orderby acc.LastName, acc.FirstName, acc.Balance descending
              select acc;

When using this version, the list of Jones entries will be displayed like this:

Jones, Jenny Acc#: 108CK,    $10.98

Jones, Ralph Acc#: 436CD, $1,923.85
Jones, Ralph Acc#: 454MM,   $987.13
Jones, Ralph Acc#: 434CK, ($123.32)

As you can see, now the accounts for Ralph Jones are displayed from greatest to least.
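The descending option applies only to the criterion that it follows, so directions can be mixed freely. The following sketch, which uses a hypothetical array of words rather than the account data, sorts by length from longest to shortest and then alphabetically among words of the same length. The names MixedOrderDemo and ByLengthThenName( ) are also hypothetical.

```csharp
// A sketch that mixes a descending criterion with an ascending one.
// MixedOrderDemo, ByLengthThenName(), and the word list are hypothetical.
using System;
using System.Collections.Generic;
using System.Linq;

class MixedOrderDemo {
  public static IEnumerable<string> ByLengthThenName(string[] words) {
    return from w in words
           orderby w.Length descending, w  // w defaults to ascending
           select w;
  }

  static void Main() {
    string[] words = { "pear", "fig", "apple", "kiwi", "plum" };

    foreach(string w in ByLengthThenName(words)) Console.Write(w + " ");
    Console.WriteLine();
  }
}
```

The output is apple kiwi pear plum fig. The longest word comes first, and the three four-letter words appear in alphabetical order.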

A Closer Look at select

The select clause determines what type of elements are obtained by a query. Its general form is shown here:

select expression

So far we have been using select to return the range variable. Thus, expression has simply named the range variable. However, select is not limited to this simple action. It can return a specific portion of the range variable, the result of applying some operation or transformation to the range variable, or even a new type of object that is constructed from pieces of the information retrieved from the range variable. This is called projecting.

To begin examining the other capabilities of select, consider the following program. It displays the square roots of the positive values contained in an array of double values.

// Use select to return the square root of all positive values
// in an array of doubles.
using System;
using System.Linq;

class SelectDemo {

  static void Main() {
    double[] nums = { -10.0, 16.4, 12.125, 100.85, -2.2, 25.25, -3.5 };

    // Create a query that returns the square roots of the
    // positive values in nums.
    var sqrRoots = from n in nums
                   where n > 0
                   select Math.Sqrt(n);

    Console.WriteLine("The square roots of the positive values" +
                      " rounded to two decimal places:");

    // Execute the query and display the results.
    foreach(double r in sqrRoots) Console.WriteLine("{0:#.##}", r);
 }

}

The output is shown here:

The square roots of the positive values rounded to two decimal places:
4.05
3.48
10.04
5.02

In the query, pay special attention to the select clause:

select Math.Sqrt(n);

It returns the square root of the range variable. It does this by obtaining the result of passing the range variable to Math.Sqrt( ), which returns the square root of its argument. This means that the sequence obtained when the query is executed will contain the square roots of the positive values in nums. If you generalize this concept, the power of select becomes apparent. You can use select to generate any type of sequence you need, based on the values obtained from the data source.

Here is a program that shows another way to use select. It creates a class called EmailAddress that contains two properties. The first holds a person’s name. The second contains an e-mail address. The program then creates an array that contains several EmailAddress entries. The program uses a query to obtain a list of just the e-mail addresses by themselves.

// Return a portion of the range variable.
using System;
using System.Linq;

class EmailAddress {
  public string Name { get; set; }
  public string Address { get; set; }

  public EmailAddress(string n, string a) {
    Name = n;
    Address = a;
  }
}

class SelectDemo2 {
  static void Main() {

    EmailAddress[] addrs = {
         new EmailAddress("Herb", "Herb@HerbSchildt.com"),
         new EmailAddress("Tom", "Tom@HerbSchildt.com"),
         new EmailAddress("Sara", "Sara@HerbSchildt.com")
    };

    // Create a query that selects e-mail addresses.
    var eAddrs = from entry in addrs
                 select entry.Address;

    Console.WriteLine("The e-mail addresses are");

    // Execute the query and display the results.
    foreach(string s in eAddrs) Console.WriteLine("  " + s);
  }
}

The output is shown here:

The e-mail addresses are
  Herb@HerbSchildt.com
  Tom@HerbSchildt.com
  Sara@HerbSchildt.com

Pay special attention to the select clause:

select entry.Address;

Instead of returning the entire range variable, it returns only the Address portion. This fact is evidenced by the output. This means the query returns a sequence of strings, not a sequence of EmailAddress objects. This is why the foreach loop specifies s as a string. As explained, the type of sequence returned by a query is determined by the type of value returned by the select clause.

One of the more powerful features of select is its ability to return a sequence that contains elements created during the execution of the query. For example, consider the following program. It defines a class called ContactInfo, which stores a name, e-mail address, and telephone number. It also defines the EmailAddress class used by the preceding example. Inside Main( ), an array of ContactInfo is created. Then, a query is declared in which the data source is an array of ContactInfo, but the sequence returned contains EmailAddress objects. Thus, the type of the sequence returned by select is not ContactInfo, but rather EmailAddress, and these objects are created during the execution of the query.

// Use a query to obtain a sequence of EmailAddresses
// from a list of ContactInfo.
using System;
using System.Linq;

class ContactInfo {
  public string Name { get; set; }
  public string Email { get; set; }
  public string Phone { get; set; }

  public ContactInfo(string n, string a, string p) {
     Name = n;
     Email = a;
     Phone = p;
  }
}

class EmailAddress {
  public string Name { get; set; }
  public string Address { get; set; }

  public EmailAddress(string n, string a) {
    Name = n;
    Address = a;
  }
}

class SelectDemo3 {
  static void Main() {

    ContactInfo[] contacts = {
          new ContactInfo("Herb", "Herb@HerbSchildt.com", "555-1010"),
          new ContactInfo("Tom", "Tom@HerbSchildt.com", "555-1101"),
          new ContactInfo("Sara", "Sara@HerbSchildt.com", "555-0110")
    };

    // Create a query that creates a list of EmailAddress objects.
    var emailList = from entry in contacts
                    select new EmailAddress(entry.Name, entry.Email);

    Console.WriteLine("The e-mail list is");

    // Execute the query and display the results.
    foreach(EmailAddress e in emailList)
      Console.WriteLine("  {0}: {1}", e.Name, e.Address);
  }
}

The output is shown here:

The e-mail list is
  Herb: Herb@HerbSchildt.com
  Tom: Tom@HerbSchildt.com
  Sara: Sara@HerbSchildt.com

In the query, pay special attention to the select clause:

select new EmailAddress(entry.Name, entry.Email);

It creates a new EmailAddress object that contains the name and e-mail address obtained from a ContactInfo object in the contacts array. The key point is that new EmailAddress objects are created by the query in its select clause, during the query’s execution.

Use Nested from Clauses

A query can contain more than one from clause. Thus, a query can contain nested from clauses. One common use of a nested from clause is found when a query needs to obtain data from two different sources. Here is a simple example. It uses two from clauses to iterate over two different character arrays. It produces a sequence that contains all possible combinations of the two sets of characters.

// Use two from clauses to create a list of all
// possible combinations of the letters A, B, and C
// with the letters X, Y, and Z.
using System;
using System.Linq;

// This class holds the result of the query.
class ChrPair {
  public char First;
  public char Second;

  public ChrPair(char c, char c2) {
    First = c;
    Second = c2;
  }
}

class MultipleFroms {
  static void Main() {

    char[] chrs = { 'A', 'B', 'C' };
    char[] chrs2 = { 'X', 'Y', 'Z' };

    // Notice that the first from iterates over chrs and
    // the second from iterates over chrs2.
    var pairs = from ch1 in chrs
                  from ch2 in chrs2
                  select new ChrPair(ch1, ch2);

    Console.WriteLine("All combinations of ABC with XYZ: ");

    foreach(var p in pairs)
      Console.WriteLine("{0} {1}", p.First, p.Second);
  }
}

The output is shown here:

All combinations of ABC with XYZ:
A X
A Y
A Z
B X
B Y
B Z
C X
C Y
C Z

The program begins by creating a class called ChrPair that will hold the results of the query. It then creates two character arrays, called chrs and chrs2. It uses the following query to produce all possible combinations of the two sequences:

var pairs = from ch1 in chrs
            from ch2 in chrs2
            select new ChrPair(ch1, ch2);

The nested from clauses cause both chrs and chrs2 to be iterated over. Here is how it works. First, a character is obtained from chrs and stored in ch1. Then, the chrs2 array is enumerated. With each iteration of the inner from, a character from chrs2 is stored in ch2 and the select clause is executed. The result of the select clause is a new object of type ChrPair that contains the character pair ch1, ch2 produced by each iteration of the inner from. This process repeats for each character in chrs. Thus, the query produces a sequence of ChrPair objects that contains every possible combination of the characters.

Another common use of a nested from is to iterate over a data source that is contained within another data source. An example of this is found in the section, “Use let to Create a Variable in a Query,” later in this chapter.
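One way to picture the nested from clauses described earlier is to compare them with nested foreach loops. The following is a minimal sketch, not a description of how the compiler implements the query. The class name NestedFromSketch and the methods WithQuery( ) and WithLoops( ) are assumptions made for illustration.

```csharp
// Compare nested from clauses with nested foreach loops.
using System;
using System.Collections.Generic;
using System.Linq;

class NestedFromSketch {
  // Query version: two from clauses.
  public static List<string> WithQuery(char[] a, char[] b) {
    var pairs = from ch1 in a
                from ch2 in b
                select ch1.ToString() + ch2.ToString();
    return pairs.ToList();
  }

  // Loop version: the inner loop runs in full for each outer element.
  public static List<string> WithLoops(char[] a, char[] b) {
    var result = new List<string>();
    foreach(char ch1 in a)
      foreach(char ch2 in b)
        result.Add(ch1.ToString() + ch2.ToString());
    return result;
  }

  static void Main() {
    char[] chrs = { 'A', 'B', 'C' };
    char[] chrs2 = { 'X', 'Y', 'Z' };

    // Both versions produce the same pairs in the same order.
    bool same = WithQuery(chrs, chrs2).SequenceEqual(WithLoops(chrs, chrs2));
    Console.WriteLine("Query and loops agree: " + same);
  }
}
```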

Group Results with group

One of the most powerful query features is provided by the group clause because it enables you to create results that are grouped by keys. Using the sequence obtained from a group, you can easily access all of the data associated with a key. This makes group an easy and effective way to retrieve data that is organized into sequences of related items. The group clause is one of only two clauses that can end a query. (The other is select.)

The group clause has the following general form:

group range-variable by key

It returns data grouped into sequences, with each sequence sharing the key specified by key.

The result of group is a sequence that contains elements of type IGrouping<TKey, TElement>, which is declared in the System.Linq namespace. It defines a collection of objects that share a common key. The type of query variable in a query that returns a group is IEnumerable<IGrouping<TKey, TElement>>. IGrouping defines a read-only property called Key, which returns the key associated with each sequence.

Here is an example that illustrates the use of group. It declares an array that contains a list of websites. It then creates a query that groups the list by top-level domain name, such as .org or .com.

// Demonstrate the group clause.
using System;
using System.Linq;

class GroupDemo {

  static void Main() {
    string[] websites = { "hsNameA.com", "hsNameB.net", "hsNameC.net",
                           "hsNameD.com", "hsNameE.org", "hsNameF.org",
                           "hsNameG.tv",  "hsNameH.net", "hsNameI.tv" };

    // Create a query that groups websites by top-level domain name.
    var webAddrs = from addr in websites
                   where addr.LastIndexOf('.') != -1
                   group addr by addr.Substring(addr.LastIndexOf('.'));

    // Execute the query and display the results.
    foreach(var sites in webAddrs) {
      Console.WriteLine("Web sites grouped by " + sites.Key);
      foreach(var site in sites)
        Console.WriteLine("  " + site);
      Console.WriteLine();
    }
  }
}

The output is shown here:

Web sites grouped by .com
  hsNameA.com
  hsNameD.com

Web sites grouped by .net
  hsNameB.net
  hsNameC.net
  hsNameH.net

Web sites grouped by .org
  hsNameE.org
  hsNameF.org

Web sites grouped by .tv
  hsNameG.tv
  hsNameI.tv

As the output shows, the data is grouped based on the top-level domain name of a website. Notice how this is achieved by the group clause:

var webAddrs = from addr in websites
               where addr.LastIndexOf('.') != -1
               group addr by addr.Substring(addr.LastIndexOf('.'));

The key is obtained by use of the LastIndexOf( ) and Substring( ) methods defined by string. (These are described in Chapter 7. The version of Substring( ) used here returns the substring that starts at the specified index and runs to the end of the invoking string.) The index of the last period in a website name is found using LastIndexOf( ). Using this index, the Substring( ) method obtains the remainder of the string, which is the part of the website name that contains the top-level domain name. One other point: Notice the use of the where clause to filter out any strings that don’t contain a period. The LastIndexOf( ) method returns –1 if the specified string is not contained in the invoking string.

Because the sequence obtained when webAddrs is executed is a list of groups, you will need to use two foreach loops to access the members of each group. The outer loop obtains each group. The inner loop enumerates the members within the group. The iteration variable of the outer foreach loop must be an IGrouping instance compatible with the key and element type. In the example both the keys and elements are string. Therefore, the type of the sites iteration variable of the outer loop is IGrouping<string, string>. The type of the iteration variable of the inner loop is string. For brevity, the example implicitly declares these variables, but they could have been explicitly declared as shown here:

foreach(IGrouping<string, string> sites in webAddrs) {
  Console.WriteLine("Web sites grouped by " + sites.Key);
  foreach(string site in sites)
    Console.WriteLine("  " + site);
  Console.WriteLine();
}
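The IGrouping type can also be written out when a grouped query is returned from a method. Here is a minimal sketch in which the key is a char and the elements are strings. The class name GroupTypeSketch, the ByFirstLetter( ) method, and the word list are assumptions made for illustration.

```csharp
// Return a grouped query using an explicit IGrouping type.
using System;
using System.Collections.Generic;
using System.Linq;

class GroupTypeSketch {
  // The key type is char (the first letter) and the
  // element type is string (the word itself).
  public static IEnumerable<IGrouping<char, string>> ByFirstLetter(string[] words) {
    return from w in words
           group w by w[0];
  }

  static void Main() {
    string[] words = { "apple", "avocado", "banana", "blueberry", "cherry" };

    foreach(IGrouping<char, string> g in ByFirstLetter(words)) {
      Console.WriteLine("Key " + g.Key + " has " + g.Count() + " element(s)");
    }
  }
}
```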

Use into to Create a Continuation

When using select or group, you will sometimes want to generate a temporary result that will be used by a subsequent part of the query to produce the final result. This is called a query continuation (or just a continuation for short), and it is accomplished through the use of into with a select or group clause. It has the following general form:

into name query-body

where name is the name of the range variable that iterates over the temporary result and is used by the continuing query, specified by query-body. This is why into is called a query continuation when used with select or group—it continues the query. In essence, a query continuation embodies the concept of building a new query that queries the results of the preceding query.


NOTE There is also a form of into that can be used with join, which creates a group join. This is described later in this chapter.

Here is an example that uses into with group. The following program reworks the GroupDemo example shown earlier, which creates a list of websites grouped by top-level domain name. In this case, the initial results are queried by a range variable called ws. This result is then filtered to remove all groups that have fewer than three elements.

// Use into with group.
using System;
using System.Linq;

class IntoDemo {

  static void Main() {

    string[] websites = { "hsNameA.com", "hsNameB.net", "hsNameC.net",
                          "hsNameD.com", "hsNameE.org", "hsNameF.org",
                          "hsNameG.tv",  "hsNameH.net", "hsNameI.tv" };

    // Create a query that groups websites by top-level domain name,
    // but select only those groups that have more than two members.
    // Here, ws is the range variable over the set of groups
    // returned when the first half of the query is executed.
    var webAddrs = from addr in websites
                   where addr.LastIndexOf('.') != -1
                   group addr by addr.Substring(addr.LastIndexOf('.'))
                              into ws
                   where ws.Count() > 2
                   select ws;

    // Execute the query and display the results.
    Console.WriteLine("Top-level domains with more than 2 members.\n");

    foreach(var sites in webAddrs) {
      Console.WriteLine("Contents of " + sites.Key + " domain:");
      foreach(var site in sites)
         Console.WriteLine("  " + site);
      Console.WriteLine();
    }
  }
}

The following output is produced:

Top-level domains with more than 2 members.

Contents of .net domain:
  hsNameB.net
  hsNameC.net
  hsNameH.net

As the output shows, only the .net group is returned because it is the only group that has more than two elements.

In the program, pay special attention to this sequence of clauses in the query:

group addr by addr.Substring(addr.LastIndexOf('.'))
           into ws
where ws.Count() > 2
select ws;

First, the results of the group clause are stored (creating a temporary result), and ws ranges over each group obtained by group. Next, the where clause filters this temporary result so that the final result contains only those groups that have more than two members. This determination is made by calling Count( ), which is an extension method that is implemented for all IEnumerable<T> objects. It returns the number of elements in a sequence. (You’ll learn more about extension methods later in this chapter.) The resulting sequence of groups is returned by the select clause.
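A continuation can follow select as well as group. The following minimal sketch uses into after a select clause so that the remainder of the query operates on the selected values. The class name SelectIntoSketch, the LargeSquares( ) method, and the sample data are assumptions made for illustration.

```csharp
// Use into with select.
using System;
using System.Linq;

class SelectIntoSketch {
  // Squares each value, then continues the query over the squares.
  public static int[] LargeSquares(int[] nums) {
    var q = from n in nums
            select n * n
              into sq
            where sq > 10
            orderby sq
            select sq;
    return q.ToArray();
  }

  static void Main() {
    int[] nums = { 5, 1, 4, 2, 3 };

    // The squares are 25, 1, 16, 4, and 9. Only 16 and 25 exceed 10.
    foreach(int v in LargeSquares(nums)) Console.Write(v + " ");
    Console.WriteLine();
  }
}
```

After the into, only sq is in scope. The original range variable n is no longer available to the continuing query.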

Use let to Create a Variable in a Query

In a query, you will sometimes want to retain a value temporarily. For example, you might want to create an enumerable variable that can, itself, be queried. Or, you might want to store a value that will be used later on in a where clause. Whatever the purpose, these types of actions can be accomplished through the use of let.

The let clause has this general form:

let name = expression

Here, name is an identifier that is assigned the value of expression. The type of name is inferred from the type of the expression.

Here is an example that shows how let can be used to create another enumerable data source. The query takes as input an array of strings. It then converts those strings into char arrays. This is accomplished by use of another string method called ToCharArray( ), which returns an array containing the characters in the string. The result is assigned to a variable called chrArray, which is then used by a nested from clause to obtain the individual characters in the array. The query then sorts the characters and returns the resulting sequence.

// Use a let clause and a nested from clause.
using System;
using System.Linq;

class LetDemo {

  static void Main() {

     string[] strs = { "alpha", "beta", "gamma" };

     // Create a query that obtains the characters in the
     // strings, returned in sorted order. Notice the use
     // of a nested from clause.
     var chrs = from str in strs
                let chrArray = str.ToCharArray()
                   from ch in chrArray
                   orderby ch
                   select ch;

    Console.WriteLine("The individual characters in sorted order:");

    // Execute the query and display the results.
    foreach(char c in chrs) Console.Write(c + " ");

    Console.WriteLine();
  }
}

The output is shown here:

The individual characters in sorted order:
a a a a a b e g h l m m p t

In the program, notice how the let clause assigns to chrArray a reference to the array returned by str.ToCharArray( ):

let chrArray = str.ToCharArray()

After the let clause, other clauses can make use of chrArray. Furthermore, because all arrays in C# are convertible to IEnumerable<T>, chrArray can be used as a data source for a second, nested from clause. This is what happens in the example. It uses the nested from to enumerate the individual characters in the array, sorting them into ascending sequence and returning the result.

You can also use a let clause to hold a non-enumerable value. For example, the following is a more efficient way to write the query used in the IntoDemo program shown in the preceding section.

var webAddrs = from addr in websites
               let idx = addr.LastIndexOf('.')
               where idx != -1
               group addr by addr.Substring(idx)
                          into ws
               where ws.Count() > 2
               select ws;

In this version, the index of the last occurrence of a period is assigned to idx. This value is then used by Substring( ). This prevents the search for the period from having to be conducted twice.
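The claim that the search is conducted only once per element can be checked with a small sketch. It routes the search through a counting helper. The class name LetOnceSketch, the LastDot( ) helper, the Calls field, and the sample strings are assumptions made for illustration.

```csharp
// Show that a let expression is evaluated once per element.
using System;
using System.Linq;

class LetOnceSketch {
  public static int Calls = 0;

  // Hypothetical helper that counts how many times it is invoked.
  static int LastDot(string s) {
    Calls++;
    return s.LastIndexOf('.');
  }

  public static string[] Domains(string[] sites) {
    var q = from addr in sites
            let idx = LastDot(addr)
            where idx != -1
            select addr.Substring(idx);
    return q.ToArray();
  }

  static void Main() {
    string[] sites = { "hsNameA.com", "noDotHere", "hsNameB.net" };

    string[] result = Domains(sites);

    // idx is used by both where and select, but LastDot()
    // is called only three times, once for each string.
    Console.WriteLine(string.Join(" ", result));
    Console.WriteLine("LastDot() calls: " + Calls);
  }
}
```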

Join Two Sequences with join

When working with databases, it is common to want to create a sequence that correlates data from two different data sources. For example, an online store might have one database that associates the name of an item with its item number, and a second database that associates the item number with its in-stock status. Given this situation, you might want to generate a list that shows the in-stock status of items by name, rather than by item number. You can do this by correlating the data in the two databases. Such an action is easy to accomplish in LINQ through the use of the join clause.

The general form of join is shown here (in context with the from):

from range-varA in data-sourceA
join range-varB in data-sourceB
on range-varA.property equals range-varB.property

The key to using join is to understand that each data source must contain data in common, and that the data can be compared for equality. Thus, in the general form, data-sourceA and data-sourceB must have something in common that can be compared. The items being compared are specified by the on section. Thus, when range-varA.property is equal to range-varB.property, the correlation succeeds. In essence, join acts like a filter, allowing only those elements that share a common value to pass through.

When using join, often the sequence returned is a composite of portions of the two data sources. Therefore, join lets you generate a new list that contains elements from two different data sources. This enables you to organize data in a new way.

The following program creates a class called Item, which encapsulates an item’s name with its number. It creates another class called InStockStatus, which links an item number with a Boolean property that indicates whether or not the item is in stock. It also creates a class called Temp, which has two properties: one string and one bool. Objects of this class will hold the result of the query. The query uses join to produce a list in which an item’s name is associated with its in-stock status.

// Demonstrate join.
using System;
using System.Linq;

// A class that links an item name with its number.
class Item {
  public string Name { get; set; }
  public int ItemNumber { get; set; }

  public Item(string n, int inum) {
    Name = n;
    ItemNumber = inum;
  }
}

// A class that links an item number with its in-stock status.
class InStockStatus {
  public int ItemNumber { get; set; }
  public bool InStock { get; set; }

  public InStockStatus(int n, bool b) {
    ItemNumber = n;
    InStock = b;
  }
}

// A class that encapsulates a name with its status.
class Temp {
  public string Name { get; set; }
  public bool InStock { get; set; }

  public Temp(string n, bool b) {
    Name = n;
    InStock  = b;
  }
}

class JoinDemo {
  static void Main() {

    Item[] items = {
         new Item("Pliers", 1424),
         new Item("Hammer", 7892),
         new Item("Wrench", 8534),
         new Item("Saw", 6411)
    };

    InStockStatus[] statusList = {
         new InStockStatus(1424, true),
         new InStockStatus(7892, false),
         new InStockStatus(8534, true),
         new InStockStatus(6411, true)
    };

    // Create a query that joins Item with InStockStatus to
    // produce a list of item names and availability. Notice
    // that a sequence of Temp objects is produced.
    var inStockList = from item in items
                      join entry in statusList
                        on item.ItemNumber equals entry.ItemNumber
                      select new Temp(item.Name, entry.InStock);
  
    Console.WriteLine("Item\tAvailable\n");

    // Execute the query and display the results.
    foreach(Temp t in inStockList)
      Console.WriteLine("{0}\t{1}", t.Name, t.InStock);
  }
}

The output is shown here:

Item    Available

Pliers  True
Hammer  False
Wrench  True
Saw     True

To understand how join works, let’s walk through each line in the query. The query begins in the normal fashion with this from clause:

var inStockList = from item in items

This clause specifies that item is the range variable for the data source specified by items. The items array contains objects of type Item, which encapsulate a name and a number for an inventory item.

Next comes the join clause shown here:

join entry in statusList
  on item.ItemNumber equals entry.ItemNumber

This clause specifies that entry is the range variable for the statusList data source. The statusList array contains objects of type InStockStatus, which link an item number with its status. Thus, items and statusList have a property in common: the item number. This is used by the on/equals portion of the join clause to describe the correlation. Thus, join matches items from the two data sources when their item numbers are equal.

Finally, the select clause returns a Temp object that contains an item’s name along with its in-stock status:

select new Temp(item.Name, entry.InStock);

Therefore, the sequence obtained by the query consists of Temp objects.

Although the preceding example is fairly straightforward, join supports substantially more sophisticated operations. For example, you can use into with join to create a group join, which creates a result that consists of an element from the first sequence and a group of all matching elements from the second sequence. (You’ll see an example of this a bit later in this chapter.) In general, the time and effort needed to fully master join are well worth the investment because it gives you the ability to reorganize data at runtime. This is a powerful capability, and it is made even more so by the use of anonymous types, described in the next section.
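The earlier observation that join acts like a filter can be seen in a small sketch in which some elements have no match. The class name JoinFilterSketch, the Matched( ) method, and the sample numbers are assumptions made for illustration.

```csharp
// Show that join passes through only the elements that match.
using System;
using System.Linq;

class JoinFilterSketch {
  // Returns the values in left that also occur in right.
  public static int[] Matched(int[] left, int[] right) {
    var q = from a in left
            join b in right
              on a equals b
            select a;
    return q.ToArray();
  }

  static void Main() {
    int[] ordered = { 1424, 7892, 9999 };
    int[] inStock = { 7892, 5555, 1424 };

    // 9999 and 5555 have no counterpart, so neither appears.
    // Displays: 1424 7892
    foreach(int n in Matched(ordered, inStock)) Console.Write(n + " ");
    Console.WriteLine();
  }
}
```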

Anonymous Types

C# provides a feature called the anonymous type that directly relates to LINQ. As the name implies, an anonymous type is a class that has no name. Its primary use is to create an object returned by the select clause. Often, the outcome of a query is a sequence of objects that are either a composite of two (or more) data sources (such as in the case of join) or include a subset of the members of one data source. In either case, often the type of the object being returned is needed only because of the query and is not used elsewhere in the program. In this case, using an anonymous type eliminates the need to declare a class that will be used simply to hold the outcome of the query.

An anonymous type is created through the use of this general form:

new { nameA = valueA, nameB = valueB, ... }

Here, the names specify identifiers that translate into read-only properties that are initialized by the values. For example,

new { Count = 10, Max = 100, Min = 0 }

This creates a class type that has three public read-only properties: Count, Max, and Min. These are given the values 10, 100, and 0, respectively. These properties can be referred to by name by other code. Notice that an anonymous type uses object initializers to initialize the properties. As explained in Chapter 8, object initializers provide a way to initialize an object without explicitly invoking a constructor. This is necessary in the case of anonymous types because there is no way to explicitly call a constructor. (Recall that constructors have the same name as their class. In the case of an anonymous class, there is no name. So, how would you invoke the constructor?)

Because an anonymous type has no name, you must use an implicitly typed variable to refer to it. This lets the compiler infer the proper type. For example,

var myOb = new { Count = 10, Max = 100, Min = 0 };

creates a variable called myOb that is assigned a reference to the object created by the anonymous type expression. This means that the following statements are legal:

Console.WriteLine("Count is " + myOb.Count);

if(i <= myOb.Max && i >= myOb.Min) // ...

Remember, when an anonymous type is created, the identifiers that you specify become read-only public properties. Thus, they can be used by other parts of your code.

Although the term anonymous type is used, it is not entirely accurate. The type is anonymous relative to you, the programmer. However, the compiler does give it an internal name. Thus, anonymous types do not violate C#’s strong type checking rules.
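The following minimal sketch shows that anonymous types are real, strongly typed classes. It relies on two behaviors that are not described in the text above: within an assembly, anonymous type expressions that specify the same property names and types in the same order produce the same type, and Equals( ) on an anonymous type compares property values. The class name AnonTypeSketch and its methods are assumptions made for illustration.

```csharp
// Examine the type created by an anonymous type expression.
using System;

class AnonTypeSketch {
  // Same property names and types, in the same order.
  public static bool SameType() {
    var a = new { Count = 10, Max = 100, Min = 0 };
    var b = new { Count = 1, Max = 2, Min = 3 };
    return a.GetType() == b.GetType();
  }

  // Same type and the same property values.
  public static bool EqualValues() {
    var a = new { Count = 10, Max = 100, Min = 0 };
    var b = new { Count = 10, Max = 100, Min = 0 };
    return a.Equals(b);
  }

  static void Main() {
    Console.WriteLine("Same type: " + SameType());       // True
    Console.WriteLine("Equal values: " + EqualValues()); // True

    // The properties are read-only. A statement such as
    //   myOb.Count = 5;
    // would not compile.
    var myOb = new { Count = 10, Max = 100, Min = 0 };
    Console.WriteLine("Count is " + myOb.Count);
  }
}
```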

To fully understand the value of anonymous types, consider this rewrite of the previous program that demonstrated join. Recall that in the previous version, a class called Temp was needed to encapsulate the result of the join. Through the use of an anonymous type, this “placeholder” class is no longer needed and no longer clutters the source code to the program. The output from the program is unchanged from before.

// Use an anonymous type to improve the join demo program
using System;
using System.Linq;

// A class that links an item name with its number.
class Item {
  public string Name { get; set; }
  public int ItemNumber { get; set; }

  public Item(string n, int inum) {
    Name = n;
    ItemNumber = inum;
  }
}

// A class that links an item number with its in-stock status.
class InStockStatus {
  public int ItemNumber { get; set; }
  public bool InStock { get; set; }

  public InStockStatus(int n, bool b) {
    ItemNumber = n;
    InStock = b;
  }
}

class AnonTypeDemo {
  static void Main() {

    Item[] items = {
         new Item("Pliers", 1424),
         new Item("Hammer", 7892),
         new Item("Wrench", 8534),
         new Item("Saw", 6411)
    };

    InStockStatus[] statusList = {
         new InStockStatus(1424, true),
         new InStockStatus(7892, false),
         new InStockStatus(8534, true),
         new InStockStatus(6411, true)
    };

    // Create a query that joins Item with InStockStatus to
    // produce a list of item names and availability.
    // Now, an anonymous type is used.
    var inStockList = from item in items
                      join entry in statusList
                        on item.ItemNumber equals entry.ItemNumber
                      select new { Name = item.Name,
                                   InStock = entry.InStock };

    Console.WriteLine("Item\tAvailable\n");

    // Execute the query and display the results.
    foreach(var t in inStockList)
      Console.WriteLine("{0}\t{1}", t.Name, t.InStock);
  }
}

Pay special attention to the select clause:

select new { Name = item.Name,
             InStock =  entry.InStock };

It returns an object of an anonymous type that has two read-only properties, Name and InStock. These are given the values specified by the item’s name and availability. Because of the anonymous type, there is no longer any need for the Temp class.

One other point. Notice the foreach loop that executes the query. It now uses var to declare the iteration variable. This is necessary because the type of the object contained in inStockList has no name. This situation is one of the reasons that C# includes implicitly typed variables. They are needed to support anonymous types.

Before moving on, there is one more aspect of anonymous types that warrants a mention. In some cases, including the one just shown, you can simplify the syntax of the anonymous type through the use of a projection initializer. In this case, you simply specify the name of the initializer by itself. This name automatically becomes the name of the property. For example, here is another way to code the select clause used by the preceding program:

select new { item.Name, entry.InStock };

Here, the property names are still Name and InStock, just as before. The compiler automatically “projects” the identifiers Name and InStock, making them the property names of the anonymous type. Also as before, the properties are given the values specified by item.Name and entry.InStock.
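A projection initializer can be mixed with an explicitly named property in the same anonymous type. Here is a minimal sketch. The class name ProjectionInitSketch, the Describe( ) method, and the Upper property are assumptions made for illustration.

```csharp
// Mix a projection initializer with an explicitly named property.
using System;
using System.Linq;

class ProjectionInitSketch {
  public static string Describe() {
    string[] words = { "alpha", "beta" };

    // w.Length is projected, so the property is called Length.
    // Upper is named explicitly because it is the result of a
    // method call, which has no name to project.
    var q = from w in words
            select new { w.Length, Upper = w.ToUpper() };

    var first = q.First();
    return first.Length + " " + first.Upper;
  }

  static void Main() {
    Console.WriteLine(Describe()); // Displays: 5 ALPHA
  }
}
```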

Create a Group Join

As mentioned earlier, you can use into with join to create a group join, which creates a sequence in which each entry in the result consists of an entry from the first sequence and a group of all matching elements from the second sequence. No example was presented then because often a group join makes use of an anonymous type. Now that anonymous types have been covered, an example of a simple group join can be given.

The following example uses a group join to create a list in which various transports, such as cars, boats, and planes, are organized by their general transportation category, which is land, sea, or air. The program first creates a class called Transport that links a transport type with its classification. Inside Main( ), it creates two input sequences. The first is an array of strings that contains the names of the general means by which one travels, which are land, sea, and air. The second is an array of Transport, which encapsulates various means of transportation. It then uses a group join to produce a list of transports that are organized by their category.

// Demonstrate a simple group join.
using System;
using System.Linq;

// This class links the name of a transport, such as Train,
// with its general classification, such as land, sea, or air.
class Transport {
  public string Name { get; set; }
  public string How { get; set; }

  public Transport(string n, string h) {
    Name = n;
    How = h;
  }
}

class GroupJoinDemo {
  static void Main() {

    // An array of transport classifications.
    string[] travelTypes = {
          "Air",
          "Sea",
          "Land"
     };

     // An array of transports.
     Transport[] transports = {
          new Transport("Bicycle", "Land"),
          new Transport("Balloon", "Air"),
          new Transport("Boat", "Sea"),
          new Transport("Jet", "Air"),
          new Transport("Canoe", "Sea"),
          new Transport("Biplane", "Air"),
          new Transport("Car", "Land"),
          new Transport("Cargo Ship", "Sea"),
          new Transport("Train", "Land")
     };

    // Create a query that uses a group join to produce
    // a list of transports organized by travel type.
    var byHow = from how in travelTypes
                join trans in transports
                  on how equals trans.How
                  into lst
                select new { How = how, Tlist = lst };

    // Execute the query and display the results.
    foreach(var t in byHow) {
      Console.WriteLine("{0} transportation includes:", t.How);

      foreach(var m in t.Tlist)
        Console.WriteLine("  " + m.Name);

      Console.WriteLine();
    }
  }
}

The output is shown here:

Air transportation includes:
  Balloon
  Jet
  Biplane

Sea transportation includes:
  Boat
  Canoe
  Cargo Ship

Land transportation includes:
  Bicycle
  Car
  Train

The key part of the program is, of course, the query, which is shown here:

var byHow = from how in travelTypes
            join trans in transports
              on how equals trans.How
              into lst
            select new { How = how, Tlist = lst };

Here is how it works. The from clause uses how to range over the travelTypes array. Recall that travelTypes is an array of the general travel classifications: air, sea, and land. The join clause joins each travel type with those transports that use that type. For example, the type Land is joined with Bicycle, Car, and Train. However, because of the into clause, for each travel type, the join produces a list of the transports that use that travel type. This list is represented by lst. Finally, select returns an anonymous type that encapsulates each value of how (the travel type) with a list of transports. This is why the two foreach loops shown here are needed to display the results of the query:

foreach(var t in byHow) {
  Console.WriteLine("{0} transportation includes:", t.How);

  foreach(var m in t.Tlist)
    Console.WriteLine("  " + m.Name);

  Console.WriteLine();
}

The outer loop obtains an object that contains the name of the travel type and the list of the transports for that type. The inner loop displays the individual transports.
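One consequence of a group join is worth noting: an element from the first sequence appears in the result even when nothing in the second sequence matches it. In that case, its group is simply empty. The following minimal sketch illustrates this. The class name EmptyGroupSketch, the Summaries( ) method, and the "Space" category are assumptions made for illustration.

```csharp
// Show that a group join keeps elements that have no matches.
using System;
using System.Linq;

class Transport {
  public string Name { get; set; }
  public string How { get; set; }

  public Transport(string n, string h) {
    Name = n;
    How = h;
  }
}

class EmptyGroupSketch {
  public static string[] Summaries() {
    // No transport uses "Space".
    string[] travelTypes = { "Air", "Sea", "Space" };

    Transport[] transports = {
      new Transport("Jet", "Air"),
      new Transport("Boat", "Sea"),
      new Transport("Balloon", "Air")
    };

    var byHow = from how in travelTypes
                join trans in transports
                  on how equals trans.How
                  into lst
                select how + ": " + lst.Count();

    return byHow.ToArray();
  }

  static void Main() {
    // Displays Air: 2, Sea: 1, and Space: 0, each on its own line.
    foreach(string s in Summaries())
      Console.WriteLine(s);
  }
}
```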

The Query Methods

The query syntax described by the preceding sections is the way you will probably write most queries in C#. It is convenient, powerful, and compact. It is, however, not the only way to write a query. The other way is to use the query methods. These methods can be called on any enumerable object, such as an array.

The Basic Query Methods

The query methods are defined by System.Linq.Enumerable and are implemented as extension methods that extend the functionality of IEnumerable<T>. (Query methods are also defined by System.Linq.Queryable, which extends the functionality of IQueryable<T>, but this interface is not used in this chapter.) An extension method adds functionality to another class, but without the use of inheritance. Support for extension methods is relatively new, being added by C# 3.0, and we will look more closely at them later in this chapter. For now, it is sufficient to understand that query methods can be called only on an object that implements IEnumerable<T>.

The Enumerable class provides many query methods, but at the core are those that correspond to the query keywords described earlier. These methods are shown here, along with the keywords to which they relate. Understand that these methods have overloaded forms and only their simplest form is shown. However, this is also the form that you will often use.

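Query Keyword    Equivalent Query Method
select           Select(selector)
where            Where(predicate)
orderby          OrderBy(keySelector) or OrderByDescending(keySelector)
join             Join(inner, outerKeySelector, innerKeySelector, resultSelector)
group            GroupBy(keySelector)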

Except for Join( ), these query methods take one argument, which is an object of some form of the generic type Func<T, TResult>. This is a built-in delegate type that is declared like this:

delegate TResult Func<in T, out TResult>(T arg)

Here, TResult specifies the result type of the delegate and T specifies the element type. In these query methods, the selector, predicate, or keySelector argument determines what action the query method takes. For example, in the case of Where( ), it determines how the query filters the data. Each of these query methods returns an enumerable object. Thus, the result of one can be used to execute a call on another, allowing the methods to be chained together.

The Join( ) method takes four arguments. The first is a reference to the second sequence to be joined. The first sequence is the one on which Join( ) is called. The key selector for the first sequence is passed via outerKeySelector, and the key selector for the second sequence is passed via innerKeySelector. The result of the join is described by resultSelector. The type of outerKeySelector is Func<TOuter, TKey>, and the type of innerKeySelector is Func<TInner, TKey>. The resultSelector argument is of type Func<TOuter, TInner, TResult>. Here, TOuter is the element type of the invoking sequence; TInner is the element type of the passed sequence; and TResult is the type of the resulting elements. An enumerable object is returned that contains the result of the join.
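As a minimal sketch of how the four arguments line up, the following program joins an array of strings with an array of integers, matching each string's length against the integers. The class name JoinMethodSketch, the Matches( ) method, and the sample data are assumptions made for illustration.

```csharp
// Call Join() directly.
using System;
using System.Linq;

class JoinMethodSketch {
  public static string[] Matches() {
    string[] words = { "apple", "kiwi", "fig" }; // outer sequence
    int[] lengths = { 3, 5 };                    // inner sequence

    var q = words.Join(lengths,                  // second sequence
                       w => w.Length,            // outerKeySelector
                       n => n,                   // innerKeySelector
                       (w, n) => w + ":" + n);   // resultSelector

    return q.ToArray();
  }

  static void Main() {
    // "kiwi" has length 4, which matches nothing in lengths.
    // Displays apple:5 and fig:3, each on its own line.
    foreach(string s in Matches())
      Console.WriteLine(s);
  }
}
```

Here, TOuter is string, TInner is int, TKey is int, and TResult is string.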

Although an argument to a query method such as Where( ) is a method compatible with the specified form of the Func delegate, it does not need to be an explicitly declared method. In fact, most often it won’t be. Instead, you will usually use a lambda expression. As explained in Chapter 15, a lambda expression offers a streamlined, yet powerful way to define what is, essentially, an anonymous method. The C# compiler automatically converts a lambda expression into a form that can be passed to a Func parameter. Because of the convenience offered by lambda expressions, they are used by all of the examples in this section.
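To make the relationship between Func and a lambda expression concrete, here is a minimal sketch that passes the same test to Where( ) in two ways: through a Func<int, bool> instance that refers to an explicitly declared method, and through a lambda expression. The class name NamedPredicateSketch and the methods IsPositive( ), ViaMethod( ), and ViaLambda( ) are assumptions made for illustration.

```csharp
// Pass a declared method and a lambda expression to Where().
using System;
using System.Linq;

class NamedPredicateSketch {
  // An explicitly declared method compatible with Func<int, bool>.
  static bool IsPositive(int n) {
    return n > 0;
  }

  public static int[] ViaMethod(int[] nums) {
    Func<int, bool> pred = IsPositive;
    return nums.Where(pred).ToArray();
  }

  public static int[] ViaLambda(int[] nums) {
    return nums.Where(n => n > 0).ToArray();
  }

  static void Main() {
    int[] nums = { 1, -2, 3, 0, -4, 5 };

    // Both approaches select the same values.
    bool same = ViaMethod(nums).SequenceEqual(ViaLambda(nums));
    Console.WriteLine("Method and lambda agree: " + same);
  }
}
```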

Create Queries by Using the Query Methods

By using the query methods in conjunction with lambda expressions, it is possible to create queries that do not use the C# query syntax. Instead, the query methods are called. Let’s begin with a simple example. It reworks the first program in this chapter so that it uses calls to Where( ) and Select( ) rather than the query keywords.

// Use the query methods to create a simple query.
// This is a reworked version of the first program in this chapter.
using System;
using System.Linq;

class SimpQuery {
  static void Main() {

    int[] nums =  { 1, -2, 3, 0, -4, 5 };

    // Use Where() and Select() to create a simple query.
    var posNums = nums.Where(n => n > 0).Select(r => r);

    Console.Write("The positive values in nums: ");

    // Execute the query and display the results.
    foreach(int i in posNums) Console.Write(i + " ");
    Console.WriteLine();
  }
}

The output, shown here, is the same as the original version:

The positive values in nums: 1 3 5

In the program, pay special attention to this line:

var posNums = nums.Where(n => n > 0).Select(r => r);

This creates a query called posNums that produces a sequence of the positive values in nums. It does this by use of the Where( ) method (to filter the values) and Select( ) (to select the values). The Where( ) method can be invoked on nums because all arrays implement IEnumerable<T>, which supports the query extension methods.

Technically, the Select( ) method in the preceding example is not necessary because in this simple case, the sequence returned by Where( ) already contains the result. However, you can use more sophisticated selection criteria, just as you did with the query syntax. For example, this query returns the positive values in nums increased by an order of magnitude:

var posNums = nums.Where(n => n > 0).Select(r => r * 10);

As you might expect, you can chain together other operations. For example, this query selects the positive values, sorts them into descending order, and returns the resulting sequence:

var posNums = nums.Where(n => n > 0).OrderByDescending(j => j);

Here, the expression j => j specifies that the ordering is dependent on the input parameter, which is an element from the sequence obtained from Where( ).
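The fragments just shown can be combined into one chain. The following sketch, which assumes the same nums array used by the preceding program, filters, sorts, and then transforms the values. The names ChainDemo and PositiveDescendingTimesTen are hypothetical.

```csharp
// Chain Where(), OrderByDescending(), and Select().
using System;
using System.Collections.Generic;
using System.Linq;

class ChainDemo {
  // Return the positive values, largest first, each multiplied by 10.
  public static IEnumerable<int> PositiveDescendingTimesTen(int[] nums) {
    return nums.Where(n => n > 0)
               .OrderByDescending(j => j)
               .Select(r => r * 10);
  }

  static void Main() {
    int[] nums = { 1, -2, 3, 0, -4, 5 };

    // Execute the query and display the results: 50 30 10
    foreach(int i in PositiveDescendingTimesTen(nums)) Console.Write(i + " ");
    Console.WriteLine();
  }
}
```

Because each method returns an enumerable object, the order of the calls is the order in which the operations are applied: first the filter, then the sort, and finally the projection.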

Here is an example that demonstrates the GroupBy( ) method. It reworks the group example shown earlier.

// Demonstrate the GroupBy() query method.
// This program reworks the earlier version that used
// the query syntax.
using System;
using System.Linq;

class GroupByDemo {

  static void Main() {

    string[] websites = { "hsNameA.com", "hsNameB.net", "hsNameC.net",
                          "hsNameD.com", "hsNameE.org", "hsNameF.org",
                          "hsNameG.tv",  "hsNameH.net", "hsNameI.tv" };

    // Use query methods to group websites by top-level domain name.
    var webAddrs = websites.Where(w => w.LastIndexOf('.') != -1).
           GroupBy(x => x.Substring(x.LastIndexOf(".")));

    // Execute the query and display the results.
    foreach(var sites in webAddrs) {
      Console.WriteLine("Web sites grouped by " + sites.Key);
      foreach(var site in sites)
        Console.WriteLine("  " + site);
      Console.WriteLine();
    }
  }
}

This version produces the same output as the earlier version. The only difference is how the query is created. In this version, the query methods are used.

Here is another example. Recall the join query used in the JoinDemo example shown earlier:

var inStockList = from item in items
                  join entry in statusList
                    on item.ItemNumber equals entry.ItemNumber
                  select new Temp(item.Name, entry.InStock);

This query produces a sequence that contains objects that encapsulate the name and the in-stock status of an inventory item. This information is synthesized from joining the two lists items and statusList. The following version reworks this query so that it uses the Join( ) method rather than the C# query syntax:

// Use Join() to produce a list of item names and status.
var inStockList = items.Join(statusList,
                    k1 => k1.ItemNumber,
                    k2 => k2.ItemNumber,
                    (k1, k2) => new Temp(k1.Name, k2.InStock) );

Although this version uses the named class called Temp to hold the resulting object, an anonymous type could have been used instead. This approach is shown next:

var inStockList = items.Join(statusList,
                    k1 => k1.ItemNumber,
                    k2 => k2.ItemNumber,
                    (k1, k2) => new { k1.Name, k2.InStock } );
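For readers who want to run the Join( ) call on its own, the following is a self-contained sketch. The Item and InStockStatus classes here are simplified, hypothetical stand-ins for the classes used by JoinDemo, and the inventory data is invented purely for illustration.

```csharp
// A self-contained sketch of Join() with an anonymous result type.
using System;
using System.Linq;

// Hypothetical stand-in: associates a name with an item number.
class Item {
  public string Name { get; set; }
  public int ItemNumber { get; set; }
}

// Hypothetical stand-in: associates an item number with in-stock status.
class InStockStatus {
  public int ItemNumber { get; set; }
  public bool InStock { get; set; }
}

class JoinMethodDemo {
  // Join the two sequences on ItemNumber and describe each match.
  public static string[] Describe(Item[] items, InStockStatus[] statusList) {
    var inStockList = items.Join(statusList,
                        k1 => k1.ItemNumber,
                        k2 => k2.ItemNumber,
                        (k1, k2) => new { k1.Name, k2.InStock });

    return inStockList.Select(t => t.Name + ": " + t.InStock).ToArray();
  }

  static void Main() {
    Item[] items = {
      new Item { Name = "Pliers", ItemNumber = 1424 },
      new Item { Name = "Hammer", ItemNumber = 7892 }
    };

    InStockStatus[] statusList = {
      new InStockStatus { ItemNumber = 7892, InStock = false },
      new InStockStatus { ItemNumber = 1424, InStock = true }
    };

    foreach(string line in Describe(items, statusList))
      Console.WriteLine(line);
  }
}
```

Notice that the results follow the order of the invoking (outer) sequence, items, even though statusList lists the entries in a different order.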

Query Syntax vs. Query Methods

As the preceding section has explained, C# has two ways of creating queries: the query syntax and the query methods. What is interesting, and not readily apparent by simply looking at a program’s source code, is that the two approaches are more closely related than you might at first assume. The reason is that the query syntax is compiled into calls to the query methods. Thus, when you write something like

where x < 10

the compiler translates it into

Where(x => x < 10)

Therefore, the two approaches to creating a query ultimately lead to the same place.
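One way to convince yourself of this equivalence is to write the same query both ways and compare the results. The following sketch does so, using SequenceEqual( ), an Enumerable extension method that compares two sequences element by element. The class and method names are hypothetical.

```csharp
// Compare a query written with the query syntax to the
// same query written with the query methods.
using System;
using System.Collections.Generic;
using System.Linq;

class SyntaxVsMethods {
  public static IEnumerable<int> WithQuerySyntax(int[] nums) {
    return from x in nums
           where x < 10
           orderby x
           select x * 2;
  }

  public static IEnumerable<int> WithQueryMethods(int[] nums) {
    return nums.Where(x => x < 10)
               .OrderBy(x => x)
               .Select(x => x * 2);
  }

  static void Main() {
    int[] nums = { 12, 3, 7, 15, 1 };

    // Both queries yield 2 6 14, so this displays True.
    Console.WriteLine(WithQuerySyntax(nums).SequenceEqual(WithQueryMethods(nums)));
  }
}
```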

Given that the two approaches are ultimately equivalent, the following question naturally arises: Which approach is better for a C# program? The answer: In general, you will want to use the query syntax. It is fully integrated into the C# language, is supported by keywords and syntax, and usually reads more cleanly.

More Query-Related Extension Methods

In addition to the methods that correspond to the query keywords supported by C#, the .NET Framework provides several other query-related extension methods that are often helpful in a query. These query-related methods are defined for IEnumerable<T> by Enumerable. Here is a sampling of several commonly used methods. Because many of the methods are overloaded, only their general form is shown.

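[Table not reproduced: general forms of All( ), Any( ), Average( ), Contains( ), Count( ), First( ), Last( ), Max( ), Min( ), and Sum( ).]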

You have already seen Count( ) in action earlier in this chapter. Here is a program that demonstrates the others:

// Use several of the extension methods defined by Enumerable.
using System;
using System.Linq;

class ExtMethods {
  static void Main() {

    int[] nums =  { 3, 1, 2, 5, 4 };

    Console.WriteLine("The minimum value is " + nums.Min());
    Console.WriteLine("The maximum value is " + nums.Max());

    Console.WriteLine("The first value is " + nums.First());
    Console.WriteLine("The last value is " + nums.Last());

    Console.WriteLine("The sum is " + nums.Sum());
    Console.WriteLine("The average is " + nums.Average());

    if(nums.All(n => n > 0))
      Console.WriteLine("All values are greater than zero.");

    if(nums.Any(n => (n % 2) == 0))
      Console.WriteLine("At least one value is even.");

    if(nums.Contains(3))
      Console.WriteLine("The array contains 3.");
  }
}

The output is shown here:

The minimum value is 1
The maximum value is 5
The first value is 3
The last value is 4
The sum is 15
The average is 3
All values are greater than zero.
At least one value is even.
The array contains 3.

You can also use the query-related extension methods within a query based on the C# query syntax. In fact, it is quite common to do so. For example, this program uses Average( ) to obtain a sequence that contains only those values that are less than the average of the values in an array.

// Use Average() with the query syntax.
using System;
using System.Linq;

class ExtMethods2 {
  static void Main() {

    int[] nums =  { 1, 2, 4, 8, 6, 9, 10, 3, 6, 7 };

    var ltAvg = from n in nums
                let x = nums.Average()
                where n < x
                select n;

    Console.WriteLine("The average is " + nums.Average());

    Console.Write("These values are less than the average: ");

    // Execute the query and display the results.
    foreach(int i in ltAvg) Console.Write(i + " ");

    Console.WriteLine();
  }
}

The output is shown here:

The average is 5.6
These values are less than the average: 1 2 4 3

Pay special attention to the query:

    var ltAvg = from n in nums
                let x = nums.Average()
                where n < x
                select n;

Notice that in the let clause, x is set equal to the average of the values in nums. This value is obtained by calling Average( ) on nums.
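One point worth knowing about this query is that a let clause is evaluated for each element in the sequence, so nums.Average( ) is recomputed on every pass. For a small array this does not matter. For a large one, a possible alternative is to compute the average once, before the query is created, as the following sketch shows. The names AvgOnce and LessThanAverage are hypothetical.

```csharp
// Compute the average once, outside the query.
using System;
using System.Collections.Generic;
using System.Linq;

class AvgOnce {
  // Return the values in nums that are less than the average.
  public static IEnumerable<int> LessThanAverage(int[] nums) {
    double avg = nums.Average(); // computed one time, here

    return from n in nums
           where n < avg
           select n;
  }

  static void Main() {
    int[] nums = { 1, 2, 4, 8, 6, 9, 10, 3, 6, 7 };

    Console.Write("These values are less than the average: ");

    // Displays: 1 2 4 3
    foreach(int i in LessThanAverage(nums)) Console.Write(i + " ");

    Console.WriteLine();
  }
}
```

Be aware of the trade-off: here the average is fixed when LessThanAverage( ) is called, whereas the let version computes it when the query executes. If the array changes between those two points, the two versions can produce different results.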

Deferred vs. Immediate Query Execution

In LINQ, queries have two different modes of execution: immediate and deferred. As explained early in this chapter, a query defines a set of rules that are not actually executed until a foreach statement executes. This is called deferred execution.

However, if you use one of the extension methods that produces a non-sequence result, then the query must be executed to obtain that result. For example, consider the Count( ) method. In order for Count( ) to return the number of elements in the sequence, the query must be executed, and this is done automatically when Count( ) is called. In this case, immediate execution takes place, with the query being executed automatically in order to obtain the result. Therefore, even though you don’t explicitly use the query in a foreach loop, the query is still executed.

Here is a simple example. It obtains the number of positive elements in the sequence.

// Use immediate execution.
using System;
using System.Linq;

class ImmediateExec {
  static void Main() {

    int[] nums =  { 1, -2, 3, 0, -4, 5 };

    // Create a query that obtains the number of positive
    // values in nums.
    int len = (from n in nums
               where n > 0
               select n).Count();

    Console.WriteLine("The number of positive values in nums: " + len);
  }
}

The output is

The number of positive values in nums: 3

In the program, notice that no explicit foreach loop is specified. Instead, the query automatically executes because of the call to Count( ).

As a point of interest, the query in the preceding program could also have been written like this:

var posNums = from n in nums
              where n > 0
              select n;

int len = posNums.Count(); // query executes here

In this case, Count( ) is called on the query variable. At that point, the query is executed to obtain the count.
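A consequence of deferred execution is that a query variable does not hold results. It holds the rules for obtaining them, so each execution reflects the current contents of the data source. The following sketch illustrates this by calling Count( ) on the same query before and after the array is changed. The names DeferredDemo and CountsBeforeAndAfter are hypothetical.

```csharp
// Show that a query is re-executed each time its results are needed.
using System;
using System.Linq;

class DeferredDemo {
  // Return the count of positive values before and after nums is changed.
  public static int[] CountsBeforeAndAfter() {
    int[] nums = { 1, -2, 3, 0, -4, 5 };

    // Create the query. Nothing is executed yet.
    var posNums = from n in nums
                  where n > 0
                  select n;

    int before = posNums.Count(); // query executes here: 3

    nums[1] = 99; // change the data source

    int after = posNums.Count();  // query executes again: 4

    return new int[] { before, after };
  }

  static void Main() {
    int[] counts = CountsBeforeAndAfter();

    Console.WriteLine("Count before the change: " + counts[0]);
    Console.WriteLine("Count after the change: " + counts[1]);
  }
}
```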

Other methods that cause immediate execution of a query include ToArray( ) and ToList( ). Both are extension methods defined by Enumerable. ToArray( ) returns the results of a query in an array. ToList( ) returns the results of a query in the form of a List<T> collection. (See Chapter 25 for a discussion of collections.) In both cases, the query is executed to obtain the results. For example, the following sequence obtains an array of the results generated by the posNums query just shown. It then displays the results.

int[] pnums = posNums.ToArray(); // query executes here

foreach(int i in pnums)
  Console.Write(i + " ");

Expression Trees

Another LINQ-related feature is the expression tree. An expression tree is a representation of a lambda expression as data. Thus, an expression tree, itself, cannot be executed. It can, however, be converted into an executable form. Expression trees are encapsulated by the System.Linq.Expressions.Expression<TDelegate> class. Expression trees are useful in situations in which a query will be executed by something outside the program, such as a database that uses SQL. By representing the query as data, the query can be converted into a format understood by the database. This process is used by the LINQ to SQL feature provided by Visual C#, for example. Thus, expression trees help C# support a variety of data sources.

You can obtain an executable form of an expression tree by calling the Compile( ) method defined by Expression. It returns a reference that can be assigned to a delegate and then executed. You can declare your own delegate type or use one of the predefined Func delegate types defined within the System namespace. Two forms of the Func delegate were mentioned earlier, when the query methods were described, but there are several others.

Expression trees have one key restriction: Only expression lambdas can be represented by expression trees. They cannot be used to represent statement lambdas.

Here is a simple example of an expression tree in action. It creates an expression tree whose data represents a method that determines if one integer is a factor of another. It then compiles the expression tree into executable code. Finally, it demonstrates the compiled code.

// A simple expression tree.
using System;
using System.Linq;
using System.Linq.Expressions;

class SimpleExpTree {
  static void Main() {

    // Represent a lambda expression as data.
    Expression<Func<int, int, bool>>
      IsFactorExp = (n, d) => (d != 0) ? (n % d) == 0 : false;

    // Compile the expression data into executable code.
    Func<int, int, bool> IsFactor = IsFactorExp.Compile();

    // Execute the expression.
    if(IsFactor(10, 5))
      Console.WriteLine("5 is a factor of 10.");

    if(!IsFactor(10, 7))
      Console.WriteLine("7 is not a factor of 10.");

    Console.WriteLine();
  }
}

The output is shown here:

5 is a factor of 10.
7 is not a factor of 10.

The program illustrates the two key steps in using an expression tree. First, it creates an expression tree by using this statement:

Expression<Func<int, int, bool>>
  IsFactorExp = (n, d) => (d != 0) ? (n % d) == 0 : false;

This constructs a representation of a lambda expression in memory. As explained, this representation is data, not code. This representation is referred to by IsFactorExp. The following statement converts the expression data into executable code:

Func<int, int, bool> IsFactor = IsFactorExp.Compile();

After this statement executes, the IsFactor delegate can be called to determine if one value is a factor of another.

One other point: Notice that Func<int, int, bool> indicates the delegate type. This form of Func specifies two parameters of type int and a return type of bool. This is the form of Func that is compatible with the lambda expression used in the program because that expression requires two parameters. Other lambda expressions may require different forms of Func, based on the number of parameters they require. In general, the specific form of Func must match the requirements of the lambda expression.
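To see that an expression tree really is data, you can examine it before compiling it. The following sketch uses a one-parameter lambda, so the matching form is Func<int, bool>. It reads the Parameters and Body properties of the expression and the NodeType property of the body, and then compiles and calls the result. The names ExpTreeInspect and MakeIsEven are hypothetical.

```csharp
// Examine an expression tree as data, then compile and execute it.
using System;
using System.Linq.Expressions;

class ExpTreeInspect {
  // Build an expression tree for a lambda that tests for an even value.
  public static Expression<Func<int, bool>> MakeIsEven() {
    return n => (n % 2) == 0;
  }

  static void Main() {
    Expression<Func<int, bool>> isEvenExp = MakeIsEven();

    // Inspect the tree. No code has been executed at this point.
    Console.WriteLine("Parameter count: " + isEvenExp.Parameters.Count);
    Console.WriteLine("Parameter name: " + isEvenExp.Parameters[0].Name);
    Console.WriteLine("Body node type: " + isEvenExp.Body.NodeType);

    // Compile the expression data into executable code.
    Func<int, bool> isEven = isEvenExp.Compile();

    Console.WriteLine("4 is even: " + isEven(4));
    Console.WriteLine("7 is even: " + isEven(7));
  }
}
```

The body of this lambda is an equality comparison, so its node type is reported as Equal. A provider such as LINQ to SQL walks this kind of structure in order to translate the query instead of running it.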

Extension Methods

As mentioned earlier, extension methods provide a means by which functionality can be added to a class without using the normal inheritance mechanism. Although you won’t often create your own extension methods (because the inheritance mechanism offers a better solution in many cases), it is still important that you understand how they work because of their integral importance to LINQ.

An extension method is a static method that must be contained within a static, non-generic class. The type of its first parameter determines the type of objects on which the extension method can be called. Furthermore, the first parameter must be modified by this. The object on which the method is invoked is passed automatically to the first parameter. It is not explicitly passed in the argument list. A key point is that even though an extension method is declared static, it can still be called on an object, just as if it were an instance method.

Here is the general form of an extension method:

static ret-type name (this invoked-on-type ob, param-list)

Of course, if there are no arguments other than the one passed implicitly to ob, then param-list will be empty. Remember, the first parameter is automatically passed the object on which the method is invoked. In general, an extension method will be a public member of its class.

Here is an example that creates three simple extension methods:

// Create and use some extension methods.
using System;
using System.Globalization;

static class MyExtMeths {

  // Return the reciprocal of a double.
  public static double Reciprocal(this double v) {
    return 1.0 / v;
  }

  // Reverse the case of letters within a string and
  // return the result.
  public static string RevCase(this string str) {
    string temp = "";

    foreach(char ch in str) {
      if(Char.IsLower(ch)) temp += Char.ToUpper(ch, CultureInfo.CurrentCulture);
      else temp += Char.ToLower(ch, CultureInfo.CurrentCulture);
    }
    return temp;
  }

  // Return the absolute value of n / d.
  public static double AbsDivideBy(this double n, double d) {
    return Math.Abs(n / d);
  }
}

class ExtDemo {
  static void Main() {
    double val = 8.0;
    string str = "Alpha Beta Gamma";

    // Call the Reciprocal() extension method.
    Console.WriteLine("Reciprocal of {0} is {1}",
                      val, val.Reciprocal());
  
    // Call the RevCase() extension method.
    Console.WriteLine(str + " after reversing case is " +
                      str.RevCase());

    // Use AbsDivideBy();
    Console.WriteLine("Result of val.AbsDivideBy(-2): " +
                       val.AbsDivideBy(-2));
  }
}

The output is shown here:

Reciprocal of 8 is 0.125
Alpha Beta Gamma after reversing case is aLPHA bETA gAMMA
Result of val.AbsDivideBy(-2): 4

In the program, notice that each extension method is contained in a static class called MyExtMeths. As explained, an extension method must be declared within a static class. Furthermore, this class must be in scope in order for the extension methods that it contains to be used. (This is why you needed to include the System.Linq namespace to use the LINQ-related extension methods.) Next, notice the calls to the extension methods. They are invoked on an object, in just the same way that an instance method is called. The main difference is that the invoking object is passed to the first parameter of the extension method. Therefore, when the expression

val.AbsDivideBy(-2)

executes, val is passed to the n parameter of AbsDivideBy( ) and -2 is passed to the d parameter.

As a point of interest, because the methods Reciprocal( ) and AbsDivideBy( ) are defined for double, it is legal to invoke them on a double literal, as shown here:

8.0.Reciprocal()
8.0.AbsDivideBy(-1)

Furthermore, RevCase( ) can be invoked like this:

"AbCDe".RevCase()

Here, the reversed-case version of a string literal is returned.
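Because the LINQ query methods are themselves extension methods on IEnumerable<T>, you can write your own in the same style. The following sketch defines a hypothetical generic extension method called EveryOther( ) that returns every other element of a sequence. It is not part of the .NET library. Although the containing class must be non-generic, the method itself can be generic. The method uses yield return, so, like the built-in query methods, it does its work only when the results are enumerated.

```csharp
// A user-defined extension method that can be chained
// with the LINQ query methods.
using System;
using System.Collections.Generic;
using System.Linq;

static class SeqExtensions {
  // Hypothetical extension method: return every other element
  // of a sequence, beginning with the first.
  public static IEnumerable<T> EveryOther<T>(this IEnumerable<T> source) {
    bool take = true;

    foreach(T item in source) {
      if(take) yield return item;
      take = !take;
    }
  }
}

class SeqExtDemo {
  static void Main() {
    int[] nums = { 1, 2, 3, 4, 5, 6, 7 };

    // Where() yields 3 4 5 6 7. EveryOther() then yields 3 5 7.
    var result = nums.Where(n => n > 2).EveryOther();

    foreach(int i in result) Console.Write(i + " ");
    Console.WriteLine();
  }
}
```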

PLINQ

The .NET Framework 4.0 adds a new capability to LINQ called PLINQ. PLINQ provides support for parallel programming. This feature enables a query to automatically take advantage of multiple processors. PLINQ and other features related to parallel programming are described in Chapter 24.
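As a brief preview, and only as a sketch, the following program applies AsParallel( ) to a sequence so that the remainder of the query can be executed by PLINQ. It computes a sum because a sum does not depend on the order in which the elements are processed, and a parallel query does not guarantee ordering by default. The names PlinqSketch and SumOfEvens are hypothetical.

```csharp
// A minimal preview of PLINQ.
using System;
using System.Linq;

class PlinqSketch {
  // Return the sum of the even values from 1 through limit.
  public static int SumOfEvens(int limit) {
    return Enumerable.Range(1, limit)
                     .AsParallel()           // request parallel execution
                     .Where(n => n % 2 == 0)
                     .Sum();
  }

  static void Main() {
    // Displays: 250500
    Console.WriteLine(SumOfEvens(1000));
  }
}
```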