You can find the wrox.com code downloads for this chapter at www.wrox.com/go/beginningvisualc#2015programming on the Download Code tab. The code is in the Chapter 20 download and individually named according to the names throughout the chapter.
This chapter introduces Language INtegrated Query (LINQ). LINQ is an extension to the C# language that integrates data query directly into the programming language itself.
Before LINQ, querying data required writing a lot of looping code, and additional processing such as sorting or grouping the found objects required even more code that would differ depending on the data source. LINQ provides a portable, consistent way of querying, sorting, and grouping many different kinds of data (XML, JSON, SQL databases, collections of objects, web services, corporate directories, and more).
First you'll build on the previous chapter by learning the additional capabilities that the System.Xml.Linq namespace adds for creating XML. Then you'll get into the heart of LINQ by using query syntax, method syntax, lambda expressions, sorting, grouping, and joining related results.
LINQ is large enough that complete coverage of all its facilities and methods is beyond the scope of a beginning book. However, you will see examples of each of the different types of statements and operators you are likely to need as a user of LINQ, and you will be pointed to resources for more in-depth coverage as appropriate.
LINQ to XML is an alternate set of classes for XML that enables the use of LINQ for XML data and also makes certain operations with XML easier even if you are not using LINQ. We will look at a couple of specific cases where LINQ to XML has advantages over the XML DOM (Document Object Model) introduced in the previous chapter.
While you can create XML documents in code with the XML DOM, LINQ to XML provides an easier way to create XML documents called functional construction. In functional construction the constructor calls can be nested in a way that naturally reflects the structure of the XML document. In the following Try It Out, you use functional constructors to make a simple XML document containing customers and orders.
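As a rough sketch of what nested constructor calls look like, the following builds a tiny customers-and-orders document. The element names, attribute names, and values are illustrative assumptions and are not necessarily those used in the chapter's download code.

```csharp
using System;
using System.Xml.Linq;

static class FunctionalConstructionSketch
{
    // Builds a small document; all names and values are made up for illustration.
    public static XDocument BuildCustomers()
    {
        return new XDocument(
            new XElement("customers",
                new XElement("customer",
                    new XAttribute("ID", "A"),
                    new XAttribute("City", "New York"),
                    new XElement("order",
                        new XAttribute("Item", "Widget"),
                        new XAttribute("Price", 100))),
                new XElement("customer",
                    new XAttribute("ID", "B"),
                    new XAttribute("City", "Mumbai"),
                    new XElement("order",
                        new XAttribute("Item", "Tire"),
                        new XAttribute("Price", 200)))));
    }

    static void Main()
    {
        // The nesting of the constructor calls mirrors the nesting of the XML.
        Console.WriteLine(BuildCustomers());
    }
}
```

Each XElement constructor takes the element name followed by its content, so child elements and attributes are simply passed as further arguments.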
Unlike the XML DOM, LINQ to XML works with XML fragments (partial or incomplete XML documents) in very much the same way as complete XML documents. When working with a fragment, you simply work with XElement as the top-level XML object instead of XDocument.
In the following Try It Out, you load, save, and manipulate an XML element and its child nodes, just as you did for an XML document.
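A minimal sketch of working with a fragment might look like the following. The XML content is hypothetical; the point is only that XElement serves as the top-level object, with no XDocument involved.

```csharp
using System;
using System.Xml.Linq;

static class FragmentSketch
{
    // Parses a fragment directly into an XElement and appends a child element.
    // The customer/order content is invented for this illustration.
    public static XElement BuildFragment()
    {
        XElement customer = XElement.Parse(
            "<customer ID=\"A\"><order Item=\"Widget\" /></customer>");
        customer.Add(new XElement("order", new XAttribute("Item", "Tire")));
        return customer;
    }

    static void Main()
    {
        Console.WriteLine(BuildFragment());
    }
}
```

XElement also offers Load() and Save() methods, so a fragment can be read from and written to a file in the same way as a full document.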
LINQ to XML is just one example of a LINQ provider. Visual Studio 2015 and the .NET Framework 4.5 come with a number of built-in LINQ providers that provide query solutions for different types of data. One of them targets the DataSet object, which was introduced in the first version of the .NET Framework; this variety of LINQ enables legacy .NET data to be queried easily with LINQ.
With so many varieties of LINQ, it is impossible to cover them all in a beginning book, but the syntax and methods you will see apply to all. Let's next look at the LINQ query syntax using the LINQ to Objects provider.
In the following Try It Out, you use LINQ to create a query to find some data in a simple in-memory array of objects and print it to the console.
Follow these steps to create the example in Visual Studio 2015:
1. Create a new console application called FirstLINQquery in the directory C:\BegVCSharp\Chapter20, and then open the main source file Program.cs.
2. Notice that Visual Studio 2015 includes the System.Linq namespace by default in Program.cs:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using static System.Console;
using System.Threading.Tasks;
3. Add the following code to the Main() method in Program.cs:
static void Main(string[] args)
{
    string[] names = { "Alonso", "Zheng", "Smith", "Jones", "Smythe",
        "Small", "Ruiz", "Hsieh", "Jorgenson", "Ilyich", "Singh", "Samba", "Fatimah" };
    var queryResults =
        from n in names
        where n.StartsWith("S")
        select n;
    WriteLine("Names beginning with S:");
    foreach (var item in queryResults) {
        WriteLine(item);
    }
    Write("Program finished, press Enter/Return to continue:");
    ReadLine();
}
4. Compile and execute the program (you can press F5 for Start Debugging). The names beginning with S appear in the order in which they were declared in the array:
Names beginning with S:
Smith
Smythe
Small
Singh
Samba
Program finished, press Enter/Return to continue:
Simply press Enter/Return to finish the program and make the console screen disappear. If you used Ctrl+F5 (Start Without Debugging), you may need to press Enter/Return twice. That finishes the program run.
The first step is to reference the System.Linq namespace, which is done automatically by Visual Studio 2015 when you create a project:
using System.Linq;
All the underlying base system support classes for LINQ reside in the System.Linq namespace. If you create a C# source file outside of Visual Studio 2015 or edit a project created from a previous version, you may have to add the using System.Linq directive manually.
The next step is to create some data, which is done in this example by declaring and initializing the array of names:
string[] names = { "Alonso", "Zheng", "Smith", "Jones", "Smythe", "Small",
"Ruiz", "Hsieh", "Jorgenson", "Ilyich", "Singh", "Samba", "Fatimah" };
This is a trivial set of data, but it is good to start with an example for which the result of the query is obvious. The actual LINQ query statement is the next part of the program:
var queryResults =
from n in names
where n.StartsWith("S")
select n;
That is an odd-looking statement, isn't it? It almost looks like something from a language other than C#, and the from…where…select syntax is deliberately similar to that of the SQL database query language. However, this statement is not SQL; it is indeed C#, as you saw when you typed in the code in Visual Studio 2015: from, where, and select were highlighted as keywords, and the odd-looking syntax is perfectly fine to the compiler.
The LINQ query statement in this program uses the LINQ declarative query syntax:
var queryResults =
from n in names
where n.StartsWith("S")
select n;
The statement has four parts: the result variable declaration beginning with var, which is assigned using a query expression consisting of the from clause, the where clause, and the select clause. Let's look at each of these parts in turn.
The LINQ query starts by declaring a variable to hold the results of the query, which is usually done by declaring a variable with the var keyword:
var queryResults =
var is a C# keyword that declares an implicitly typed local variable, which makes it ideal for holding the results of LINQ queries. The var keyword tells the C# compiler to infer the type of the result based on the query. That way, you don't have to declare ahead of time what type of objects will be returned from the LINQ query; the compiler takes care of it for you. If the query can return multiple items, then it acts like a collection of the objects in the query data source (technically, it is not a collection; it just looks that way).
If you want to know the details, the query result will be a type that implements the IEnumerable<T> interface. The angle brackets with T (<T>) following IEnumerable indicate that it is a generic type. Generics are described in Chapter 12.
In this particular case, the compiler creates a special LINQ data type that provides an ordered list of strings (strings because the data source is a collection of strings).
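To make the inferred type concrete, here is a small sketch that spells out IEnumerable<string> instead of using var. The class and method names are hypothetical and the array is shortened for brevity.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class ExplicitTypeSketch
{
    public static IEnumerable<string> NamesStartingWithS(string[] names)
    {
        // Same shape of query as in the example, but with the result type
        // written out explicitly rather than inferred with var.
        IEnumerable<string> queryResults =
            from n in names
            where n.StartsWith("S")
            select n;
        return queryResults;
    }

    static void Main()
    {
        string[] names = { "Alonso", "Smith", "Jones", "Singh" };
        foreach (string item in NamesStartingWithS(names))
        {
            Console.WriteLine(item); // prints Smith, then Singh
        }
    }
}
```

Because the query compiles against IEnumerable<string>, the foreach loop variable can be declared as string rather than var.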
By the way, the name queryResults is arbitrary; you can name the result anything you want. It could be namesBeginningWithS or anything else that makes sense in your program.
The next part of the LINQ query is the from clause, which specifies the data you are querying:
from n in names
Your data source in this case is names, the array of strings declared earlier. The variable n is just a stand-in for an individual element in the data source, similar to the variable name following a foreach statement. By specifying from, you are indicating that you are going to query a subset of the collection, rather than iterate through all the elements.
Speaking of iteration, a LINQ data source must be enumerable; that is, it must be an array or collection of items from which you can pick one or more elements to iterate through.
Enumerable means the data source must support the IEnumerable<T> interface, which is supported for any C# array or collection of items.
The data source cannot be a single value or object, such as a single int variable. You already have such a single item, so there is no point in querying it!
In the next part of the LINQ query, you specify the condition for your query using the where clause, which looks like this:
where n.StartsWith("S")
Any Boolean (true or false) expression that can be applied to the items in the data source can be specified in the where clause. The where clause is optional, but in almost all cases you will want to specify a where condition to limit the results to only the data you want. The where clause is called a restriction operator in LINQ because it restricts the results of the query.
Here, you specify that the name string starts with the letter S, but you could specify anything else about the string instead; for example, a length greater than 10 (where n.Length > 10) or containing a Q (where n.Contains("Q")).
Finally, the select clause specifies which items appear in the result set. The select clause looks like this:
select n
The select clause is required because you must specify which items from your query appear in the result set. For this set of data, it is not very interesting because you have only one item, the name, in each element of the result set. You'll look at some examples with more complex objects in the result set where the usefulness of the select clause will be more apparent, but first, you need to finish the example.
Now you print out the results of the query. Like the array used as the data source, the results of a LINQ query like this are enumerable, meaning you can iterate through the results with a foreach statement:
WriteLine("Names beginning with S:");
foreach (var item in queryResults) {
WriteLine(item);
}
In this case, you matched five names (Smith, Smythe, Small, Singh, and Samba), so that is what you display in the foreach loop.
You may be thinking that the foreach loop really isn't part of LINQ itself; it's only looping through your results. While it's true that the foreach construct is not itself part of LINQ, it is nevertheless the part of your code that actually executes the LINQ query! The assignment of the query results variable only saves a plan for executing the query; with LINQ, the data itself is not retrieved until the results are accessed. This is called deferred query execution or lazy evaluation of queries. Execution will be deferred for any query that produces a sequence (that is, a list) of results.
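Deferred execution can be observed with a small experiment, sketched here under a few assumptions: the three-name array is invented, and ToList() (a standard LINQ method not otherwise covered at this point in the chapter) is used to force the query to run immediately so that a snapshot can be compared with a later enumeration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class DeferredExecutionSketch
{
    public static string Run()
    {
        string[] names = { "Smith", "Jones", "Singh" };

        // Defining the query does not run it; it only records what to do.
        IEnumerable<string> query =
            from n in names
            where n.StartsWith("S")
            select n;

        // ToList() enumerates the query now and copies the current results.
        List<string> snapshot = query.ToList();

        // Change the data source after the query was defined.
        names[1] = "Samba";

        // Enumerating the query again re-reads the array and sees the change.
        return string.Join(",", snapshot) + " | " + string.Join(",", query);
    }

    static void Main()
    {
        Console.WriteLine(Run()); // Smith,Singh | Smith,Samba,Singh
    }
}
```

The snapshot reflects the array as it was when ToList() ran, while the query itself is re-evaluated every time it is enumerated.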
Now, back to the code. You've printed out the results; it's time to finish the program:
Write("Program finished, press Enter/Return to continue:");
ReadLine();
These lines just ensure that the results of the console program stay on the screen until you press Enter/Return, even if you press F5 instead of Ctrl+F5. You'll use this construct in most of the other LINQ examples as well.
There are multiple ways of doing the same thing with LINQ, as is often the case in programming. As noted, the previous example was written using the LINQ query syntax; in the next example, you will write the same program using LINQ's method syntax (also called explicit syntax, but the term method syntax is used here).
LINQ is implemented as a series of extension methods to collections, arrays, query results, and any other object that implements the IEnumerable<T> interface. You can see these methods with the Visual Studio IntelliSense feature. For example, in Visual Studio 2015, open the Program.cs file in the FirstLINQquery program you just completed and type in a new reference to the names array just below the array declaration:
string[] names = { "Alonso", "Zheng", "Smith", "Jones", "Smythe", "Small",
"Ruiz", "Hsieh", "Jorgenson", "Ilyich", "Singh", "Samba", "Fatimah" };
names.
Just as you type the period following names, you will see the methods available for names listed by the Visual Studio IntelliSense feature.
The Where<T> method and most of the other available methods are extension methods (as shown in the documentation appearing to the right of the Where<T> method, it begins with extension). You can see that they are LINQ extensions by commenting out the using System.Linq directive at the top; you will find that Where<T>, Union<T>, Take<T>, and most of the other methods in the list no longer appear. The from…where…select query expression you used in the previous example is translated by the C# compiler into a series of calls to these methods. When using the LINQ method syntax, you call these methods directly.
The query syntax is the preferred way of programming queries in LINQ, as it is generally easier to read and is simpler to use for the most common queries. However, it is important to have a basic understanding of the method syntax because some LINQ capabilities either are not available in the query syntax, or are just easier to use in the method syntax.
As the Visual Studio 2015 online help recommends, use query syntax whenever possible, and method syntax whenever necessary.
In this chapter, you will mostly use the query syntax, but the method syntax is pointed out in situations where it is needed, and you'll learn how to use the method syntax to solve the problem.
Most of the LINQ methods that use the method syntax require that you pass a method or function to evaluate the query expression. The method/function parameter is passed in the form of a delegate, which typically references an anonymous method.
Luckily, LINQ makes doing this much easier than it sounds! You create the method/function by using a lambda expression, which encapsulates the delegate in an elegant manner.
A lambda expression is a simple way to create a method on-the-fly for use in your LINQ query. It uses the => operator, which declares the parameters for your method followed by the method logic all on a single line!
The term “lambda expression” comes from lambda calculus, which is a mathematical field important in programming language theory. Look it up if you're mathematically inclined. Luckily you don't need the math in order to use lambdas in C#!
For example, consider the lambda expression:
n => n < 0
This declares a method with a single parameter named n. The method returns true if n is less than zero, otherwise false. It's dead simple. You don't have to come up with a method name, put in a return statement, or wrap any code with curly braces.
Returning a true/false value like this is typical for methods used in LINQ lambdas, but it doesn't have to be done. For example, here is a lambda that creates a method that returns the sum of two variables. This lambda uses multiple parameters:
(a, b) => a + b
This declares a method with two parameters named a and b. The method logic returns the sum of a and b. You don't have to declare what type a and b are. They can be int or double or string; the C# compiler infers the types from the context in which the lambda is used.
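One way to see where the inferred types come from is to assign lambdas to delegate variables. The sketch below uses the standard Func<> delegate types, which are an assumption here since the chapter has not introduced them at this point; the field names are hypothetical.

```csharp
using System;

static class LambdaSketch
{
    // The delegate type on the left supplies the parameter and return types
    // that the compiler applies to the lambda on the right.
    public static readonly Func<int, bool> IsNegative = n => n < 0;
    public static readonly Func<int, int, int> AddInts = (a, b) => a + b;
    public static readonly Func<string, string, string> Concat = (a, b) => a + b;

    static void Main()
    {
        Console.WriteLine(IsNegative(-5));      // True
        Console.WriteLine(AddInts(2, 3));       // 5
        Console.WriteLine(Concat("LI", "NQ"));  // LINQ
    }
}
```

The same lambda text, (a, b) => a + b, becomes integer addition in one case and string concatenation in the other, purely because of the delegate type it is assigned to.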
Finally, consider this lambda expression:
n => n.StartsWith("S")
This method returns true if n starts with the letter S, otherwise false. Try this out in an actual program to see this more clearly.
Follow these steps to create the example in Visual Studio 2015:
1. Create a new console application in the directory C:\BegVCSharp\Chapter20. Open the main source file Program.cs.
2. As before, Visual Studio 2015 includes the System.Linq namespace automatically in Program.cs:
using System.Linq;
3. Add the following code to the Main() method in Program.cs:
static void Main(string[] args)
{
    string[] names = { "Alonso", "Zheng", "Smith", "Jones", "Smythe",
        "Small", "Ruiz", "Hsieh", "Jorgenson", "Ilyich", "Singh", "Samba", "Fatimah" };
    var queryResults = names.Where(n => n.StartsWith("S"));
    WriteLine("Names beginning with S:");
    foreach (var item in queryResults) {
        WriteLine(item);
    }
    Write("Program finished, press Enter/Return to continue:");
    ReadLine();
}
4. Compile and execute the program. The output is the same as in the previous example:
Names beginning with S:
Smith
Smythe
Small
Singh
Samba
Program finished, press Enter/Return to continue:
As before, the System.Linq namespace is referenced automatically by Visual Studio 2015:
using System.Linq;
The same source data as before is created again by declaring and initializing the array of names:
string[] names = { "Alonso", "Zheng", "Smith", "Jones", "Smythe", "Small", "Ruiz",
"Hsieh", "Jorgenson", "Ilyich", "Singh", "Samba", "Fatimah" };
The part that is different is the LINQ query, which is now a call to the Where() method instead of a query expression:
var queryResults = names.Where(n => n.StartsWith("S"));
The C# compiler compiles the lambda expression n => n.StartsWith("S") into an anonymous method that is executed by Where() on each item in the names array. If the lambda expression returns true for an item, that item is included in the result set returned by Where(). The C# compiler infers that the Where() method should accept string as the input type for each item from the definition of the input source (the names array, in this case).
Well, a lot is going on in that one line, isn't it? For the simplest type of query like this, the method syntax is actually shorter than the query syntax because you do not need the from or select clauses; however, most queries are more complex than this.
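For a slightly less trivial query, the two syntaxes can be compared side by side. The following sketch assumes a shortened names array and projects each matching name to its length; Select() is the standard method counterpart of a select clause that does more than return the item unchanged. The class and method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class SyntaxEquivalenceSketch
{
    static readonly string[] Names = { "Alonso", "Smith", "Jones", "Smythe", "Singh" };

    public static IEnumerable<int> ViaQuerySyntax()
    {
        return from n in Names
               where n.StartsWith("S")
               select n.Length;
    }

    public static IEnumerable<int> ViaMethodSyntax()
    {
        // Each query clause corresponds to one chained method call.
        return Names.Where(n => n.StartsWith("S"))
                    .Select(n => n.Length);
    }

    static void Main()
    {
        Console.WriteLine(string.Join(",", ViaQuerySyntax()));                // 5,6,5
        Console.WriteLine(ViaQuerySyntax().SequenceEqual(ViaMethodSyntax())); // True
    }
}
```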
The rest of the example is the same as the previous one: you print out the results of the query in a foreach loop and pause the output so you can see it before the program finishes execution:
foreach (var item in queryResults) {
WriteLine(item);
}
Write("Program finished, press Enter/Return to continue:");
ReadLine();
An explanation of these lines isn't repeated here because that was covered in the “How It Works” section following the first example in the chapter. Let's move on to explore how to use more of LINQ's capabilities.
Once you have located some data of interest with a where clause (or Where() method invocation), LINQ makes it easy to perform further processing, such as reordering the results, on the resulting data. In the following Try It Out, you put the results from your first query in alphabetical order.
Follow these steps to create the example in Visual Studio 2015:
1. Create a new console application in the directory C:\BegVCSharp\Chapter20.
2. Open the main source file Program.cs. As before, Visual Studio 2015 includes the using System.Linq; namespace directive automatically in Program.cs.
3. Add the following code to the Main() method in Program.cs:
static void Main(string[] args)
{
    string[] names = { "Alonso", "Zheng", "Smith", "Jones", "Smythe",
        "Small", "Ruiz", "Hsieh", "Jorgenson", "Ilyich", "Singh", "Samba", "Fatimah" };
    var queryResults =
        from n in names
        where n.StartsWith("S")
        orderby n
        select n;
    WriteLine("Names beginning with S ordered alphabetically:");
    foreach (var item in queryResults) {
        WriteLine(item);
    }
    Write("Program finished, press Enter/Return to continue:");
    ReadLine();
}
4. Compile and execute the program. The names beginning with S appear in alphabetical order:
Names beginning with S ordered alphabetically:
Samba
Singh
Small
Smith
Smythe
Program finished, press Enter/Return to continue:
This program is nearly identical to the previous example, except for one additional line added to the query statement:
var queryResults =
from n in names
where n.StartsWith("S")
orderby n
select n;
The orderby clause looks like this:
orderby n
Like the where clause, the orderby clause is optional. Just by adding one line, you can order the results of any arbitrary query, which would otherwise require at least several lines of additional code and probably additional methods or collections to store the results of the reordered result, depending on the sorting algorithm you chose to implement. If multiple types needed to be sorted, you would have to implement a set of ordering methods for each one. With LINQ, you don't need to worry about any of that; just add one additional clause in the query statement and you're done.
By default, orderby orders in ascending order (A to Z), but you can specify descending order (from Z to A) simply by adding the descending keyword:
orderby n descending
This orders the example results as follows:
Smythe
Smith
Small
Singh
Samba
Plus, you can order by any arbitrary expression without having to rewrite the query; for example, to order by the last letter in the name instead of normal alphabetical order, you just change the orderby clause to the following:
orderby n.Substring(n.Length - 1)
This results in the following output:
Samba
Smythe
Smith
Singh
Small
The last letters are in alphabetical order (a, e, h, h, l). The last letter is the only part of the name considered, so names with the same last letter are not ordered any further by the orderby clause. In this case, Smith came before Singh because LINQ to Objects sorts stably, keeping elements with equal keys in their source order; other LINQ providers may not make that guarantee, so don't rely on any order beyond what the orderby clause specifies.
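The ordering variations above can also be written with the method syntax. This sketch assumes the five S names are already filtered into an array, and uses the standard OrderBy() and OrderByDescending() methods; the class and method names are hypothetical.

```csharp
using System;
using System.Linq;

static class OrderBySketch
{
    // The five names in the order they appear in the chapter's source array.
    static readonly string[] SNames = { "Smith", "Smythe", "Small", "Singh", "Samba" };

    public static string Descending()
    {
        // Counterpart of: orderby n descending
        return string.Join(",", SNames.OrderByDescending(n => n));
    }

    public static string ByLastLetter()
    {
        // Counterpart of: orderby n.Substring(n.Length - 1)
        return string.Join(",", SNames.OrderBy(n => n.Substring(n.Length - 1)));
    }

    static void Main()
    {
        Console.WriteLine(Descending());   // Smythe,Smith,Small,Singh,Samba
        Console.WriteLine(ByLastLetter()); // Samba,Smythe,Smith,Singh,Small
    }
}
```

In the second result, Smith and Singh share the key "h" and stay in their source order, which is the stable-sort behavior of LINQ to Objects.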
All this LINQ syntax is well and good, you may be saying, but what is the point? You can see the expected results clearly just by looking at the source array, so why go to all this trouble to query something that is obvious by just looking? As mentioned earlier, sometimes the results of a query are not so obvious. In the following Try It Out, you create a very large array of numbers and query it using LINQ.
Follow these steps to create the example in Visual Studio 2015:
1. Create a new console application in the directory C:\BegVCSharp\Chapter20. As before, when you create the project, Visual Studio 2015 already includes the System.Linq namespace in Program.cs:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using static System.Console;
2. Add the following code to the Main() method, followed by the GenerateLotsOfNumbers() helper method:
static void Main(string[] args)
{
    int[] numbers = GenerateLotsOfNumbers(12345678);
    var queryResults =
        from n in numbers
        where n < 1000
        select n;
    WriteLine("Numbers less than 1000:");
    foreach (var item in queryResults)
    {
        WriteLine(item);
    }
    Write("Program finished, press Enter/Return to continue:");
    ReadLine();
}
private static int[] GenerateLotsOfNumbers(int count)
{
    Random generator = new Random(0);
    int[] result = new int[count];
    for (int i = 0; i < count; i++)
    {
        result[i] = generator.Next();
    }
    return result;
}
3. Compile and execute the program. You will see a list of numbers less than 1,000:
Numbers less than 1000:
714
24
677
350
257
719
584
Program finished, press Enter/Return to continue:
As before, the first step is to reference the System.Linq namespace, which is done automatically by Visual Studio 2015 when you create the project:
using System.Linq;
The next step is to create some data, which is done in this example by creating and calling the GenerateLotsOfNumbers() method:
int[] numbers = GenerateLotsOfNumbers(12345678);
private static int[] GenerateLotsOfNumbers(int count)
{
    Random generator = new Random(0);
    int[] result = new int[count];
    for (int i = 0; i < count; i++)
    {
        result[i] = generator.Next();
    }
    return result;
}
This is not a trivial set of data: there are more than 12 million numbers in the array! In one of the exercises at the end of the chapter, you will change the count parameter passed to the GenerateLotsOfNumbers() method to generate variously sized sets of random numbers and see how this affects the query results. As you will see when doing the exercises, the size shown here of 12,345,678 is just large enough for the program to generate some random numbers less than 1,000, in order to have results to show for this first query.
The values should be randomly distributed over the range of a signed integer (from zero to more than two billion). By creating the random number generator with a seed of 0, you ensure that the same set of random numbers is created each time and is repeatable, so you get the same query results as shown here, but what those query results are is unknown until you try some queries. Luckily, LINQ makes those queries easy!
The query statement itself is similar to what you did with the names before, selecting some numbers that meet a condition (in this case, numbers less than 1,000):
var queryResults =
from n in numbers
where n < 1000
select n;
The orderby clause isn't needed here and would add extra processing time (not noticeably for this query, but more so as you vary the conditions in the next example).
You print out the results of the query with a foreach statement, just as in the previous example:
WriteLine("Numbers less than 1000:");
foreach (var item in queryResults) {
WriteLine(item);
}
Again, output to the console and read a character to pause the output:
Write("Program finished, press Enter/Return to continue:");
ReadLine();
The pause code appears in all the following examples but isn't shown again because it is the same for each one.
It is very easy with LINQ to change the query conditions to explore different characteristics of the data set. However, depending on how many results the query returns, it may not make sense to print all the results each time. In the next section you'll see how LINQ provides aggregate operators to deal with that issue.
Often, a query returns more results than you might expect. For example, if you were to change the condition of the large-number query program you just created to list the numbers greater than 1,000, rather than the numbers less than 1,000, there would be so many query results that the numbers would not stop printing!
Luckily, LINQ provides a set of aggregate operators that enable you to analyze the results of a query without having to loop through them all. Table 20.1 shows the most commonly used aggregate operators for a set of numeric results such as those from the large-number query. These may be familiar to you if you have used a database query language such as SQL.
Table 20.1 Aggregate Operators for Numeric Results
Operator | Description
Count() | Count of results
Min() | Minimum value in results
Max() | Maximum value in results
Average() | Average value of numeric results
Sum() | Total of all numeric results
There are more aggregate operators, such as Aggregate(), for executing arbitrary code in a manner that enables you to code your own aggregate function. However, those are for advanced users and therefore beyond the scope of this book.
Because the aggregate operators return a simple scalar value instead of a sequence, calling one forces immediate execution of the query; there is no deferred execution.
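Before applying these operators to millions of numbers, it can help to try them on a data set small enough to check by hand. The six-element array in this sketch is an invented example, not data from the chapter.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class AggregateSketch
{
    static readonly int[] Numbers = { 4, 8, 15, 16, 23, 42 };

    public static IEnumerable<int> Over10()
    {
        // Matches 15, 16, 23, and 42.
        return from n in Numbers
               where n > 10
               select n;
    }

    static void Main()
    {
        var queryResults = Over10();
        // Each call below runs the query immediately and returns one value.
        Console.WriteLine(queryResults.Count());   // 4
        Console.WriteLine(queryResults.Min());     // 15
        Console.WriteLine(queryResults.Max());     // 42
        Console.WriteLine(queryResults.Average()); // 24
        Console.WriteLine(queryResults.Sum());     // 96
    }
}
```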
In the following Try It Out, you modify the large-number query and use aggregate operators to explore the result set from the greater-than version of the large-number query using LINQ.
Follow these steps to create the example in Visual Studio 2015:
1. Create a new console application in the directory C:\BegVCSharp\Chapter20.
2. As before, Visual Studio 2015 already includes the System.Linq namespace in Program.cs. You just need to modify the Main() method as shown in the following code and in the rest of this Try It Out. As with the previous example, the orderby clause is not used in this query. However, the condition on the where clause is the opposite of the previous example (the numbers are greater than 1,000 (n > 1000), instead of less than 1,000):
static void Main(string[] args)
{
    int[] numbers = GenerateLotsOfNumbers(12345678);
    WriteLine("Numeric Aggregates");
    var queryResults =
        from n in numbers
        where n > 1000
        select n;
    WriteLine("Count of Numbers > 1000");
    WriteLine(queryResults.Count());
    WriteLine("Max of Numbers > 1000");
    WriteLine(queryResults.Max());
    WriteLine("Min of Numbers > 1000");
    WriteLine(queryResults.Min());
    WriteLine("Average of Numbers > 1000");
    WriteLine(queryResults.Average());
    WriteLine("Sum of Numbers > 1000");
    WriteLine(queryResults.Sum(n => (long) n));
    Write("Program finished, press Enter/Return to continue:");
    ReadLine();
}
3. Add the same GenerateLotsOfNumbers() method used in the previous example:
private static int[] GenerateLotsOfNumbers(int count)
{
    Random generator = new Random(0);
    int[] result = new int[count];
    for (int i = 0; i < count; i++)
    {
        result[i] = generator.Next();
    }
    return result;
}
4. Compile and execute the program. You will see the count, maximum, minimum, average, and sum:
Numeric Aggregates
Count of Numbers > 1000
12345671
Max of Numbers > 1000
2147483591
Min of Numbers > 1000
1034
Average of Numbers > 1000
1073643807.50298
Sum of Numbers > 1000
13254853218619179
Program finished, press Enter/Return to continue:
This query produces many more results than the previous example (more than 12 million). Using orderby on this result set would definitely have a noticeable impact on performance! The largest number (maximum) in the result set is over two billion and the smallest (minimum) is just over one thousand, as expected. The average is around one billion, near the middle of the range of possible values. Looks like the Random class generates a good distribution of numbers!
The first part of the program is exactly the same as the previous example, with the reference to the System.Linq namespace, and the use of the GenerateLotsOfNumbers() method to generate the source data:
int[] numbers = GenerateLotsOfNumbers(12345678);
The query is the same as the previous example, except for changing the where condition from less than to greater than:
var queryResults =
from n in numbers
where n > 1000
select n;
As noted before, this query using the greater-than condition produces many more results than the less-than query (with this particular data set). By using the aggregate operators, you are able to explore the results of the query without having to print out each result or do a comparison in a foreach loop. Each one appears as a method that can be called on the result set, similar to methods on a collection type.
Look at the use of each aggregate operator:
Count():
WriteLine("Count of Numbers > 1000");
WriteLine(queryResults.Count());
Count() returns the number of rows in the query results; in this case, 12,345,671 rows.
Max():
WriteLine("Max of Numbers > 1000");
WriteLine(queryResults.Max());
Max() returns the maximum value in the query results; in this case, a number larger than two billion: 2,147,483,591, which is very close to the maximum value of an int (int.MaxValue or 2,147,483,647).
Min():
WriteLine("Min of Numbers > 1000");
WriteLine(queryResults.Min());
Min() returns the minimum value in the query results; in this case, 1,034.
Average():
WriteLine("Average of Numbers > 1000");
WriteLine(queryResults.Average());
Average() returns the average value of the query results, which in this case is 1,073,643,807.50298, a value very close to the middle of the range of possible values from 1,000 to more than two billion. This is rather meaningless with an arbitrary set of large numbers, but it shows the kind of query result analysis that is possible. You'll look at a more practical use of these operators with some business-oriented data in the last part of the chapter.
Sum():
WriteLine("Sum of Numbers > 1000");
WriteLine(queryResults.Sum(n => (long) n));
You passed the lambda expression n => (long) n to the Sum() method call to get the sum of all the numbers. Although Sum() has a no-parameter overload, like Count(), Min(), Max(), and so on, using that version of the method call would cause an overflow error because there are so many large numbers in the data set that the sum of all of them would be too large to fit into a standard 32-bit int, which is what the no-parameter version of Sum() returns. The lambda expression converts each number to a 64-bit long before it is added, so Sum() returns a long, which is what you need to hold the total of over 13 quadrillion (13,254,853,218,619,179) without overflow. Lambda expressions enable you to perform this kind of fix-up easily.
In addition to Count(), which returns a 32-bit int, LINQ also provides a LongCount() method that returns the count of query results in a 64-bit integer. That is a special case, however: all the other operators require a lambda or a call to a conversion method if a 64-bit version of the number is needed.
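The overflow behavior is easy to reproduce without millions of numbers. This sketch assumes a two-element array chosen so that the total exceeds int.MaxValue by exactly one; the class and method names are hypothetical.

```csharp
using System;
using System.Linq;

static class SumOverflowSketch
{
    // int.MaxValue + 1 cannot be represented in a 32-bit int.
    static readonly int[] Values = { int.MaxValue, 1 };

    public static bool IntSumOverflows()
    {
        try
        {
            Values.Sum();   // no-parameter overload: adds as int
            return false;
        }
        catch (OverflowException)
        {
            return true;
        }
    }

    public static long LongSum()
    {
        // Each element is widened to long before it is added.
        return Values.Sum(n => (long)n);
    }

    static void Main()
    {
        Console.WriteLine(IntSumOverflows());  // True
        Console.WriteLine(LongSum());          // 2147483648
        Console.WriteLine(Values.LongCount()); // 2
    }
}
```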
Another type of query that those of you familiar with the SQL data query language will recognize is the SELECT DISTINCT query, in which you search for the unique values in your data; that is, the query removes any repeated values from the result set. This is a fairly common need when working with queries.
Suppose you need to find the distinct regions in the customer data used in the previous examples. There is no separate region list in the data you just used, so you need to find the unique, nonrepeating list of regions from the customer list itself. LINQ provides a Distinct() method that makes it easy to find this data. You'll use it in the following Try It Out.
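As a minimal illustration of Distinct() on its own, here is a sketch that works on a plain array of region strings rather than on Customer objects. The array contents are invented, and the result is sorted with OrderBy() only so that the printed order is predictable.

```csharp
using System;
using System.Linq;

static class DistinctSketch
{
    public static string DistinctRegions()
    {
        string[] regions = { "Asia", "North America", "Asia", "Europe", "Asia", "Europe" };

        // Distinct() removes repeated values; OrderBy() is added here only
        // to make the output order deterministic for the illustration.
        var unique = regions.Distinct().OrderBy(r => r);
        return string.Join(",", unique);
    }

    static void Main()
    {
        Console.WriteLine(DistinctRegions()); // Asia,Europe,North America
    }
}
```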
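Follow these steps to create the example in Visual Studio 2015:
Create a new console application called BegVCSharp_20_8_SelectDistinctQuery in the directory C:\BegVCSharp\Chapter20.
Enter this code to create the Customer class and the initialization of the customers list (List<Customer> customers):
class Customer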
{
    public string ID { get; set; }
    public string City { get; set; }
    public string Country { get; set; }
    public string Region { get; set; }
    public decimal Sales { get; set; }
    public override string ToString()
    {
        return "ID: " + ID + " City: " + City +
            " Country: " + Country +
            " Region: " + Region +
            " Sales: " + Sales;
    }
}
class Program
{
    static void Main(string[] args)
    {
        List<Customer> customers = new List<Customer> {
            new Customer { ID="A", City="New York", Country="USA",
                Region="North America", Sales=9999 },
            new Customer { ID="B", City="Mumbai", Country="India",
                Region="Asia", Sales=8888 },
            new Customer { ID="C", City="Karachi", Country="Pakistan",
                Region="Asia", Sales=7777 },
            new Customer { ID="D", City="Delhi", Country="India",
                Region="Asia", Sales=6666 },
            new Customer { ID="E", City="São Paulo", Country="Brazil",
                Region="South America", Sales=5555 },
            new Customer { ID="F", City="Moscow", Country="Russia",
                Region="Europe", Sales=4444 },
            new Customer { ID="G", City="Seoul", Country="Korea",
                Region="Asia", Sales=3333 },
            new Customer { ID="H", City="Istanbul", Country="Turkey",
                Region="Asia", Sales=2222 },
            new Customer { ID="I", City="Shanghai", Country="China",
                Region="Asia", Sales=1111 },
            new Customer { ID="J", City="Lagos", Country="Nigeria",
                Region="Africa", Sales=1000 },
            new Customer { ID="K", City="Mexico City", Country="Mexico",
                Region="North America", Sales=2000 },
            new Customer { ID="L", City="Jakarta", Country="Indonesia",
                Region="Asia", Sales=3000 },
            new Customer { ID="M", City="Tokyo", Country="Japan",
                Region="Asia", Sales=4000 },
            new Customer { ID="N", City="Los Angeles", Country="USA",
                Region="North America", Sales=5000 },
            new Customer { ID="O", City="Cairo", Country="Egypt",
                Region="Africa", Sales=6000 },
            new Customer { ID="P", City="Tehran", Country="Iran",
                Region="Asia", Sales=7000 },
            new Customer { ID="Q", City="London", Country="UK",
                Region="Europe", Sales=8000 },
            new Customer { ID="R", City="Beijing", Country="China",
                Region="Asia", Sales=9000 },
            new Customer { ID="S", City="Bogotá", Country="Colombia",
                Region="South America", Sales=1001 },
            new Customer { ID="T", City="Lima", Country="Peru",
                Region="South America", Sales=2002 }
        };
In the Main() method, following the initialization of the customers list, enter (or modify) the query as shown here:
        var queryResults = customers.Select(c => c.Region).Distinct();
Finish the remaining code in the Main() method as shown here:
        foreach (var item in queryResults)
        {
            WriteLine(item);
        }
        Write("Program finished, press Enter/Return to continue:");
        ReadLine();
Compile and execute the program. The output lists each region once:
North America
Asia
South America
Europe
Africa
Program finished, press Enter/Return to continue:
The Customer class and customers list initialization are the same as in the previous example. In the query statement, you call the Select() method with a simple lambda expression to select the region from the Customer objects, and then call Distinct() to return only the unique results from Select():
var queryResults = customers.Select(c => c.Region).Distinct();
Because Distinct() is available only in method syntax, you make the call to Select() using method syntax. However, you can call Distinct() to modify a query made in the query syntax as well:
var queryResults = (from c in customers select c.Region).Distinct();
Because query syntax is translated by the C# compiler into the same series of LINQ method calls as used in the method syntax, you can mix and match if it makes sense for readability and style.
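To make the equivalence concrete, here is a small sketch that builds the same distinct list both ways and compares the results. The DistinctDemo class, its method names, and the five-row customer sample are hypothetical; only the Region and City values are borrowed from the chapter's data.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class Customer
{
    public string City { get; set; }
    public string Region { get; set; }
}

static class DistinctDemo
{
    // A small sample standing in for the chapter's customers list.
    public static List<Customer> Sample() => new List<Customer>
    {
        new Customer { City = "Mumbai",  Region = "Asia" },
        new Customer { City = "Moscow",  Region = "Europe" },
        new Customer { City = "Karachi", Region = "Asia" },
        new Customer { City = "Lagos",   Region = "Africa" },
        new Customer { City = "London",  Region = "Europe" }
    };

    // Pure method syntax.
    public static List<string> ViaMethodSyntax() =>
        Sample().Select(c => c.Region).Distinct().ToList();

    // Query syntax for the select, method syntax for Distinct().
    public static List<string> ViaQuerySyntax() =>
        (from c in Sample() select c.Region).Distinct().ToList();

    static void Main()
    {
        Console.WriteLine(ViaMethodSyntax().Count);
        Console.WriteLine(ViaMethodSyntax().SequenceEqual(ViaQuerySyntax()));
    }
}
```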
Now that you are dealing with objects with multiple properties, you might be able to envision a situation where ordering the query results by a single field is not enough. What if you wanted to query your customers and order the results alphabetically by region, but then order alphabetically by country or city name within a region? LINQ makes this very easy, as you will see in the following Try It Out.
Follow these steps to create the example in Visual Studio 2015:
Create a new console application in the directory C:\BegVCSharp\Chapter20.
Enter the Customer class and the initialization of the customers list (List<Customer> customers) as shown in the BegVCSharp_20_8_SelectDistinctQuery example; this code is exactly the same as in previous examples.
In the Main() method, following the initialization of the customers list, enter the following query:
        var queryResults =
            from c in customers
            orderby c.Region, c.Country, c.City
            select new { c.ID, c.Region, c.Country, c.City }
            ;
The results processing loop and the remaining code in the Main() method are the same as in previous examples.
Compile and execute the program. Here is the output:
{ ID = O, Region = Africa, Country = Egypt, City = Cairo }
{ ID = J, Region = Africa, Country = Nigeria, City = Lagos }
{ ID = R, Region = Asia, Country = China, City = Beijing }
{ ID = I, Region = Asia, Country = China, City = Shanghai }
{ ID = D, Region = Asia, Country = India, City = Delhi }
{ ID = B, Region = Asia, Country = India, City = Mumbai }
{ ID = L, Region = Asia, Country = Indonesia, City = Jakarta }
{ ID = P, Region = Asia, Country = Iran, City = Tehran }
{ ID = M, Region = Asia, Country = Japan, City = Tokyo }
{ ID = G, Region = Asia, Country = Korea, City = Seoul }
{ ID = C, Region = Asia, Country = Pakistan, City = Karachi }
{ ID = H, Region = Asia, Country = Turkey, City = Istanbul }
{ ID = F, Region = Europe, Country = Russia, City = Moscow }
{ ID = Q, Region = Europe, Country = UK, City = London }
{ ID = K, Region = North America, Country = Mexico, City = Mexico City }
{ ID = N, Region = North America, Country = USA, City = Los Angeles }
{ ID = A, Region = North America, Country = USA, City = New York }
{ ID = E, Region = South America, Country = Brazil, City = São Paulo }
{ ID = S, Region = South America, Country = Colombia, City = Bogotá }
{ ID = T, Region = South America, Country = Peru, City = Lima }
Program finished, press Enter/Return to continue:
The Customer class and customers list initialization are the same as in previous examples. In this query you have no where clause because you want to see all the customers; you simply list the fields you want to sort by, in order of priority, as a comma-separated list in the orderby clause:
orderby c.Region, c.Country, c.City
Couldn't be easier, could it? It seems a bit counterintuitive that a simple list of fields is allowed in the orderby clause but not in the select clause, but that is how LINQ works. It makes sense if you realize that the select clause is creating a new object but the orderby clause, by definition, operates on a field-by-field basis.
You can add the descending keyword to any of the fields listed to reverse the sort order for that field. For example, to order this query by ascending region but descending country, simply add descending following Country in the list, like this:
orderby c.Region, c.Country descending, c.City
With descending added, you see the following output:
{ ID = J, Region = Africa, Country = Nigeria, City = Lagos }
{ ID = O, Region = Africa, Country = Egypt, City = Cairo }
{ ID = H, Region = Asia, Country = Turkey, City = Istanbul }
{ ID = C, Region = Asia, Country = Pakistan, City = Karachi }
{ ID = G, Region = Asia, Country = Korea, City = Seoul }
{ ID = M, Region = Asia, Country = Japan, City = Tokyo }
{ ID = P, Region = Asia, Country = Iran, City = Tehran }
{ ID = L, Region = Asia, Country = Indonesia, City = Jakarta }
{ ID = D, Region = Asia, Country = India, City = Delhi }
{ ID = B, Region = Asia, Country = India, City = Mumbai }
{ ID = R, Region = Asia, Country = China, City = Beijing }
{ ID = I, Region = Asia, Country = China, City = Shanghai }
{ ID = Q, Region = Europe, Country = UK, City = London }
{ ID = F, Region = Europe, Country = Russia, City = Moscow }
{ ID = N, Region = North America, Country = USA, City = Los Angeles }
{ ID = A, Region = North America, Country = USA, City = New York }
{ ID = K, Region = North America, Country = Mexico, City = Mexico City }
{ ID = T, Region = South America, Country = Peru, City = Lima }
{ ID = S, Region = South America, Country = Colombia, City = Bogotá }
{ ID = E, Region = South America, Country = Brazil, City = São Paulo }
Program finished, press Enter/Return to continue:
Note that the cities in India and China are still in ascending order even though the country ordering has been reversed.
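In method syntax, a multilevel ordering of this kind is typically written with OrderBy() followed by ThenBy() or ThenByDescending() for each additional field. The sketch below is an illustration rather than code from the chapter's download: the OrderingDemo class and its method names are hypothetical, and the data is a four-row subset of the customers list.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class Customer
{
    public string ID { get; set; }
    public string City { get; set; }
    public string Country { get; set; }
    public string Region { get; set; }
}

static class OrderingDemo
{
    // A four-row subset of the chapter's customers list.
    public static List<Customer> Sample() => new List<Customer>
    {
        new Customer { ID="A", City="New York", Country="USA", Region="North America" },
        new Customer { ID="K", City="Mexico City", Country="Mexico", Region="North America" },
        new Customer { ID="N", City="Los Angeles", Country="USA", Region="North America" },
        new Customer { ID="J", City="Lagos", Country="Nigeria", Region="Africa" }
    };

    // Method-syntax counterpart of:
    //   orderby c.Region, c.Country descending, c.City
    public static List<string> OrderedIds() =>
        Sample()
            .OrderBy(c => c.Region)
            .ThenByDescending(c => c.Country)
            .ThenBy(c => c.City)
            .Select(c => c.ID)
            .ToList();

    static void Main()
    {
        Console.WriteLine(string.Join(" ", OrderedIds()));  // J N A K
    }
}
```

Calling OrderBy() a second time instead of ThenBy() would re-sort the whole sequence by the new key rather than refine the existing order, which is why the follow-up fields use the ThenBy family.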
A group query divides the data into groups and enables you to sort, calculate aggregates, and compare by group. These are often the most interesting queries in a business context (the ones that really drive decision-making). For example, you might want to compare sales by country or by region to decide where to open another store or hire more staff. You'll do that in the next Try It Out.
Follow these steps to create the example in Visual Studio 2015:
Create a new console application in the directory C:\BegVCSharp\Chapter20.
Enter the Customer class and the initialization of the customers list (List<Customer> customers), as shown in the BegVCSharp_20_8_SelectDistinctQuery example; this code is exactly the same as in previous examples.
In the Main() method, following the initialization of the customers list, enter two queries:
        var queryResults =
            from c in customers
            group c by c.Region into cg
            select new { TotalSales = cg.Sum(c => c.Sales), Region = cg.Key }
            ;
        var orderedResults =
            from cg in queryResults
            orderby cg.TotalSales descending
            select cg
            ;
Continuing in the Main() method, add the following print statement and foreach processing loop:
        WriteLine("Total\t: By\nSales\t: Region\n-----\t ------");
        foreach (var item in orderedResults)
        {
            WriteLine($"{item.TotalSales}\t: {item.Region}");
        }
The remaining lines of the Main() method are the same as in previous examples. Compile and execute the program. Here are the group results:
Total   : By
Sales   : Region
-----    ------
52997   : Asia
16999   : North America
12444   : Europe
8558    : South America
7000    : Africa
The Customer class and customers list initialization are the same as in previous examples.
The data in a group query is grouped by a key field, the field for which all the members of each group share a value. In this example, the key field is Region:
group c by c.Region
You want to calculate a total for each group, so you group into a new result set named cg:
group c by c.Region into cg
In the select clause, you project a new anonymous type whose properties are the total sales (calculated by referencing the cg result set) and the key value of the group, which you reference with the group's special Key property:
select new { TotalSales = cg.Sum(c => c.Sales), Region = cg.Key }
The group result set implements the LINQ IGrouping interface, which supports the Key property. You almost always want to reference the Key property in some way in processing group results, because it represents the criterion by which each group in your data was created.
You want to order the result in descending order by the TotalSales field so you can see which region has the highest total sales, next highest, and so on. To do that, you create a second query to order the results from the group query:
var orderedResults =
from cg in queryResults
orderby cg.TotalSales descending
select cg
;
The second query is a standard select query with an orderby clause, as you have seen in previous examples; it does not make use of any LINQ group capabilities except that the data source comes from the previous group query.
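The two-query approach is not the only way to get this result. As a sketch of one alternative, the grouping, totaling, and ordering can be combined in a single query, because an orderby clause is allowed after the into continuation. The GroupDemo and Sale names are hypothetical, and the four rows reuse the Region and Sales values of customers A, B, C, and F.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class Sale
{
    public string Region { get; set; }
    public decimal Sales { get; set; }
}

static class GroupDemo
{
    // Region and Sales values of customers A, B, C, and F.
    public static List<Sale> Sample() => new List<Sale>
    {
        new Sale { Region = "North America", Sales = 9999 },
        new Sale { Region = "Asia",          Sales = 8888 },
        new Sale { Region = "Asia",          Sales = 7777 },
        new Sale { Region = "Europe",        Sales = 4444 }
    };

    // Group, total, and order in one query. The let clause stores each
    // group's total so it is computed once and reused by orderby and select.
    public static List<string> RegionsByTotalSales() =>
        (from s in Sample()
         group s by s.Region into cg
         let total = cg.Sum(x => x.Sales)
         orderby total descending
         select cg.Key + ": " + total).ToList();

    static void Main()
    {
        foreach (var line in RegionsByTotalSales())
        {
            Console.WriteLine(line);
        }
    }
}
```

Whether one combined query or two separate ones reads better is largely a matter of style; the separate ordering query keeps each step simple, which suits a first example.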
Next, you print out the results, with a little bit of formatting code to display the data with column headers and some separation between the totals and the group names:
WriteLine("Total\t: By\nSales\t: Region\n-----\t ------");
foreach (var item in orderedResults)
{
    WriteLine($"{item.TotalSales}\t: {item.Region}");
}
This could be formatted in a more sophisticated way with field widths and by right-justifying the totals, but this is just an example so you don't need to bother — you can see the data clearly enough to understand what the code is doing.
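For readers who do want aligned columns, one possible approach is the alignment component of an interpolated string, where a positive width after a comma right-justifies the value. This is only a sketch: the FormatDemo class, the FormatRow() helper, and the column width of 10 are arbitrary choices made for this illustration.

```csharp
using System;

static class FormatDemo
{
    // Right-justifies the total in a 10-character column.
    public static string FormatRow(decimal totalSales, string region) =>
        $"{totalSales,10} : {region}";

    static void Main()
    {
        Console.WriteLine("Total".PadLeft(10) + " : Region");
        Console.WriteLine(FormatRow(52997m, "Asia"));
        Console.WriteLine(FormatRow(7000m, "Africa"));
    }
}
```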
A data set such as the customers and orders lists you just created, with a shared key field (ID), enables a join query, whereby you can query related data in both lists with a single query, joining the results together with the key field. This is similar to the JOIN operation in the SQL data query language; and as you might expect, LINQ provides a join command in the query syntax, which you will use in the following Try It Out.
Follow these steps to create the example in Visual Studio 2015:
Create a new console application in the directory C:\BegVCSharp\Chapter20.
Copy the code to create the Customer class, the Order class, and the initialization of the customers list (List<Customer> customers) and orders list (List<Order> orders) from the previous example; this code is the same.
In the Main() method, following the initialization of the customers and orders lists, enter this query:
        var queryResults =
            from c in customers
            join o in orders on c.ID equals o.ID
            select new { c.ID, c.City, SalesBefore = c.Sales, NewOrder = o.Amount,
                         SalesAfter = c.Sales+o.Amount };
Finish the program using the standard foreach query processing loop you used in earlier examples:
        foreach (var item in queryResults)
        {
            WriteLine(item);
        }
Compile and execute the program. Here is the output:
{ ID = P, City = Tehran, SalesBefore = 7000, NewOrder = 100, SalesAfter = 7100 }
{ ID = Q, City = London, SalesBefore = 8000, NewOrder = 200, SalesAfter = 8200 }
{ ID = R, City = Beijing, SalesBefore = 9000, NewOrder = 300, SalesAfter = 9300 }
{ ID = S, City = Bogotá, SalesBefore = 1001, NewOrder = 400, SalesAfter = 1401 }
{ ID = T, City = Lima, SalesBefore = 2002, NewOrder = 500, SalesAfter = 2502 }
Program finished, press Enter/Return to continue:
The code declaring and initializing the Customer class, the Order class, and the customers and orders lists is the same as in the previous example.
The query uses the join keyword to unite the customers with their corresponding orders using the ID fields from the Customer and Order classes, respectively:
var queryResults =
from c in customers
join o in orders on c.ID equals o.ID
The on keyword is followed by the key field from the first collection (c.ID), and the equals keyword introduces the corresponding field in the other collection (o.ID). The query result includes only the pairs of objects whose ID field values match; a customer with no matching order, or an order with no matching customer, is left out of the result.
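The matching behavior is easiest to see with a handful of rows. In the following sketch, the Order class is assumed to have just the ID and Amount properties that the query references; the JoinDemo class, its method name, and the unmatched order with ID "Z" are hypothetical additions for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class Customer
{
    public string ID { get; set; }
    public decimal Sales { get; set; }
}

// Assumed shape: the chapter's query uses only o.ID and o.Amount.
class Order
{
    public string ID { get; set; }
    public decimal Amount { get; set; }
}

static class JoinDemo
{
    public static List<string> SalesAfterOrders()
    {
        var customers = new List<Customer>
        {
            new Customer { ID = "A", Sales = 9999 },  // no order: dropped
            new Customer { ID = "P", Sales = 7000 },
            new Customer { ID = "Q", Sales = 8000 }
        };
        var orders = new List<Order>
        {
            new Order { ID = "P", Amount = 100 },
            new Order { ID = "Q", Amount = 200 },
            new Order { ID = "Z", Amount = 999 }      // no customer: dropped
        };

        // Only customer/order pairs with equal IDs reach the select clause.
        return (from c in customers
                join o in orders on c.ID equals o.ID
                select c.ID + ": " + (c.Sales + o.Amount)).ToList();
    }

    static void Main()
    {
        foreach (var line in SalesAfterOrders())
        {
            Console.WriteLine(line);
        }
    }
}
```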
The select clause projects a new data type with properties named so that you can clearly see the original sales total, the new order, and the resulting new total:
select new { c.ID, c.City, SalesBefore = c.Sales, NewOrder = o.Amount,
SalesAfter = c.Sales+o.Amount };
Although you do not increment the sales total in the customer object in this program, you could easily do so in the business logic of your program.
The logic of the foreach loop and the display of the values from the query are exactly the same as in previous programs in this chapter.
20.1 Modify the third example program (BegVCSharp_20_3_QuerySyntax) to order the results in descending order.
20.2 Modify the number passed to the GenerateLotsOfNumbers() method in the large number program example (BegVCSharp_20_6_LargeNumberQuery) to create result sets of different sizes and see how query results are affected.
20.3 Add an orderby clause to the query in the large number program example (BegVCSharp_20_6_LargeNumberQuery) to see how this affects performance.
20.4 Modify the query conditions in the large number program example (BegVCSharp_20_6_LargeNumberQuery) to select larger and smaller subsets of the number list. How does this affect performance?
20.5 Modify the method syntax example (BegVCSharp_20_4_MethodSyntax) to eliminate the where clause entirely. How much output does it generate?
20.6 Add aggregate operators to the third example program (BegVCSharp_20_3_QuerySyntax). Which simple aggregate operators are available for this non-numeric result set?
Answers to the exercises can be found in Appendix A.
Topic | Key Concepts |
What LINQ is and when to use it | LINQ is a query language built into C#. Use LINQ to query data from large collections of objects, XML, or databases. |
Parts of a LINQ query | A LINQ query includes the from, where, select, and orderby clauses. |
How to get the results of a LINQ query | Use the foreach statement to iterate through the results of a LINQ query. |
Deferred execution | LINQ query execution is deferred until the foreach statement is executed. |
Method syntax and query syntax | Use the query syntax for most LINQ queries and the method syntax when required. For any given query, the query syntax or the method syntax will give the same result. |
Lambda Expressions | Lambda expressions let you declare a method on-the-fly for use in a LINQ query using the method syntax. |
Aggregate operators | Use LINQ aggregate operators to obtain information about a large data set without having to iterate through every result. |
Group queries | Use group queries to divide data into groups, then sort, calculate aggregates, and compare by group. |
Ordering | Use the orderby operator to order the results of a query. |
Joins | Use the join operator to query related data in multiple collections with a single query. |