Imperial Library

Index
Goal of the Book

Conventions Used in This Book

Part I: Introducing the Apache Web Server

Chapter List

Chapter 1: Introduction to Web Servers
Overview
In the fast-paced world of e-commerce, every minute witnesses new technologies that help the world knit closer by way of the Internet. This chapter discusses an inseparable part of the Internet: Web servers. Simply put, a Web server is a dedicated repository for Web site files and is used to host Web sites on the Internet. In more technical terms, a Web server accepts, processes, and responds to HyperText Transfer Protocol (HTTP) requests. These requests are sent by Web browsers, which are used by client computers to communicate, send, and receive information on the Internet. The relationship between a Web server and a Web client is called a client/server relationship. In this chapter, I discuss the basics of networking software. Then I discuss the basic concept of Web servers and how they work. Finally, I touch upon the concepts of HTTP/1.1.

Chapter 1: Introduction to Web Servers
Basic Concepts of Networking Software
Before I begin discussing the more technical aspects of a Web server and how it operates, I will first discuss the underlying networking terms that you will come across time and again while reading this book. The following are the important networking terms that you should be familiar with while working with Web servers:
  • IP addresses
  • Name resolution
  • Ports
  • Sockets
  • Protocol
IP Addresses
When working with Web servers and managing networks, the most important items that system administrators manage and use are Internet Protocol (IP) addresses. An IP address is a unique numeric identifier that is set for each computer on the network. The IP address is 32 bits long and is written as four numeric values separated by periods. For example, 172.17.68.130 is a valid IP address.
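To make the 32-bit claim concrete, the following Python sketch packs the four values of a dotted address into one integer. The function name dotted_quad_to_int is my own and is only illustrative; the cross-check at the end relies on Python's standard ipaddress module.

```python
import ipaddress


def dotted_quad_to_int(address: str) -> int:
    """Pack a dotted IPv4 string such as 172.17.68.130 into one integer."""
    parts = address.split(".")
    if len(parts) != 4:
        raise ValueError(f"expected four values, got {len(parts)}")
    value = 0
    for part in parts:
        octet = int(part)          # raises ValueError for non-numeric text
        if not 0 <= octet <= 255:  # each value has to fit in 8 bits
            raise ValueError(f"value out of range: {octet}")
        value = (value << 8) | octet  # shift in 8 bits per value: 4 x 8 = 32
    return value


packed = dotted_quad_to_int("172.17.68.130")
print(packed, "fits in 32 bits:", packed.bit_length() <= 32)

# Cross-check against the standard library's own parser.
assert packed == int(ipaddress.IPv4Address("172.17.68.130"))
```

Each of the four values contributes 8 bits, which is where the total of 32 bits comes from.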
However, each numeric value used for the IP address should be a number in the range of zero to 255, with both the lower

Understanding How Web Servers Work
Before learning how to make Web servers work in a desirable way, it is important to understand how Web servers work internally. Most Web servers follow the same logic while operating internally. The basic understanding of the networking terms that were discussed in the previous section will help you understand how Web servers work. The process can be divided into four basic steps, and the components involved in this process are a Web browser and a Web server. The steps are as follows:
  1. The client computer uses a Web browser to connect to the Web server and requests a Web page.
  2. Upon receiving the request, the Web server locates the corresponding file or program in the file system.
  3. After locating the file, the Web server retrieves the file from the file system.
  4. The Web server sends the retrieved file to the Web browser. The requested file is then rendered on the client computer.
This request and response process takes place as shown in Figure 1.1.

Types of Web Servers Available
Software is either free (open source) or proprietary (closed source). Free software is not merely available free of cost; its source code is also available for redistribution, modification, or customization. On the other hand, proprietary software can be acquired free of cost (in some cases), but the source code is still not available for modification or customization. In most cases, you will need to make a payment to acquire a license before using proprietary software. Web servers can be classified as free or proprietary. The first category comprises Web servers that can be acquired without paying for a license and whose source code is also available for modification and redistribution.
The second category includes Web servers for which users must make a payment in order to acquire a license (in most cases) and whose source code is not available for modification or customization.
Tip  At times, software tha

Role of a Web Server
Web servers perform various functions apart from accepting HTTP requests. They can be summed up as follows:
  • Access control. Users should be able to access only those resources on the Web server for which they have permissions. Each user should be able to access only those sets of files that belong to that user. Web servers allow only authorized users to access files on the server if configured accordingly. This mechanism is called access control. Access control can be practiced in numerous ways, such as setting appropriate permissions on files and directories and implementing hostname/IP address restrictions. Access control can also be implemented using authentication, in which the user is required to enter a valid username and password in order to access the resources on the server.
  • Server-side page processing/parsing. Parsing is a process by which a Web server substitutes field names with appropriate values depending upon user input. After parsing the document, the Web

All about the HTTP Protocol
By now you have a fair idea of what Web servers are and how they operate. However, for a better understanding of Web servers and how they function, you need to be familiar with how the HTTP protocol works. The concept is so detailed that a whole book could be written on its specification. This section covers important concepts pertaining to the HTTP protocol, but the scope is limited to what you need to understand for the functioning of Web servers.
A Brief History of HTTP
The first version of HTTP, HTTP/0.9, was released in 1991. However, this was not an official release. This protocol was simple and was used to transfer raw data over the Internet.
The Web servers that accepted HTTP requests at that time responded to simple requests such as this one:
    GET /index.html
In this scenario, if a file with the name index.html were in the document root (the directory that stores the Web site-related files) of the Web server, the contents would be displayed on a Web p

Summary
In this chapter, you learned about the basic concepts that you need to be familiar with before considering working with Web servers. The first section provided an overview of the general networking terminology such as IP addresses, name resolution, ports, sockets, and protocols. The next section presented a detailed explanation on how the Web server works. In this section, you learned how a Web server accepts and processes requests and then sends responses to the client computer. The next section discussed the difference between dynamic and static content. Next, you learned about the types of Web servers, including commercial and noncommercial Web servers, and about the most commonly used commercial and noncommercial Web servers. The chapter then moved on to discuss the role of Web servers, including a brief discussion about the main tasks a Web server performs. Finally, you read about the HTTP protocol, including the history and features of the HTTP protocol, followed by the

Check Your Understanding
Multiple Choice Questions and Answers
1.  Which of the following are not valid IP addresses for a host? (Choose all that apply.)
    a. 172.17.68.222
    b. 134.147.10
    c. 172.17.10.258
    d. 127.0.0.1
2.  Which of the following Web servers is an open source, kernel-based, multithreaded, powerful HTTP server and is also a product of Red Hat?
    a. Internet Information Server
    b. Apache
    c. TUX
    d. iPlanet
3.  Which of the following roles is not performed by Web servers?
    a. Maintaining logs
    b. Page processing/parsing
    c. Accessing Web site-related files
    d. Requesting documents from the browser
4.  Which of the following request methods were introduced with HTTP/1.1?
    a. HEAD
    b. POST
    c. TRACE
    d. GET
5.  
Which of the following request methods is most commonly used to send information entered in Web forms?
    a. HEAD
    b. PUT
    c. DELETE
    d. POST
6.  Which of the following types of HTTP headers sent by the Web server contain information about the content or the message body that a server sends to the client?
    a. General
    b. Response
    c. Entity
    d. Request
Answers
1. 

Chapter 2: Introduction to Apache
Technology is rapidly evolving, and the result is a plethora of software solutions, which means a plethora of options. Therefore, software developers have a tough time keeping up with competition and are perpetually working to improve the quality of software. In such a fast-paced technical environment, the ability of a Web server to occupy 61 percent of the market share is commendable. I am talking about none other than the Apache Web server. Over the years, there has been a surge in the use of Apache as a Web server on various platforms. This chapter covers the features of the Apache Web server in detail. It also sheds light on how the Apache Web server works and the architecture that makes it an obvious choice for millions of organizations worldwide.
Features of the Apache Web Server
The design of Apache is robust, secure, and stable, and by the time you finish this section, you may well agree with this claim! The fact is that Apache has numerous fea

Chapter 2: Introduction to Apache
Architecture of Apache
In order to understand how Apache functions, it is essential to understand how it works internally. This section, which discusses the architecture of the Apache Web server, will help you understand what happens internally while client computers send HTTP requests to Apache. Apache has a modular design (a design that allows you to extend functionality, if needed, by using add-on modules) that makes it more powerful than Web servers with a monolithic design (a rigid design that doesn't allow you to add functionality easily).
Because many Web servers are definitely faster than Apache, I was initially surprised to discover that Apache is the most widely used Web server. However, after closer examination, I understood the reason for the high speed of those Web servers. Those Web servers stripped out a lot of functionality to achieve high speed. This is not the case with Apache. The Apache Web server has a full-fledged list of functions and features.
How Apache Functio

TCP/IP and Apache
Another important concept is the relationship between TCP/IP and Apache. Consider a situation in which you are administering a live server and the Web site hosted on your server has several thousand hits every day. There is a limit to how many connections a Web server can handle at a given point. Have you ever wondered what happens when the number of requests at a given time crosses the permissible limit? This is a point worth thinking about. The first possible thought is that the requests exceeding the permissible limit will be rejected. However, doing so will adversely affect the reputation of the site. The solution is to configure Apache in such a way that the requests are not rejected. You can maintain a queue for requests that cannot be processed immediately. These requests can be processed later when the Web server is ready to handle the queued requests. You can modify TCP/IP parameters on the host computer to achieve this functionality (which I cover in Chapter 9,

Summary
In this chapter, I first focused on the salient features of Apache that make it a strong and secure Web server. Next, I discussed the architecture of the Apache Web server to spell out how it works internally. Finally, I discussed the relationship between TCP/IP and Apache.

Check Your Understanding
Multiple Choice Questions and Answers
1.  Which one of the following phrases best defines the Apache distribution?
    a. Software can be obtained free of cost.
    b. Software and source code can be obtained free of cost.
    c. Software and source code are free, but the source code cannot be modified.
    d. Software and source code are free, and the source code can be modified.
2.  Why don't expert users like to install Apache using the RPM file?
    a. It is easy to install.
    b. It has several configuration options that are confusing.
    c. Only a few configuration options can be specified at the time of installation.
    d. No configuration options can be specified because it contains precompiled options.
3.  Which of the following best define the nature of the Apache Web server? (Choose all that apply.)
    a. Multiprocessed Web server
    b. Multithreaded Web server
    c. Single-threaded Web server
    d. Preforked Web server
4.  Why is Apache Web server's request handling considered secure? (Choose all that apply.)
    All the server

Chapter 3: Installing Apache
The previous chapter covered the salient features of the Apache Web server. It also discussed the working and architecture of Apache. Now you need to install Apache. There are several install options that you may want to omit and a few that you can't afford to do away with. The installation of Apache is flexible and can be a cakewalk or as intricate as open-heart surgery. You need to know how you will use Apache so that you can customize its installation based on your requirements. This chapter focuses on the installation of the Apache Web server on both Linux and Windows platforms in a stepwise, systematic manner.
Types of Installation
In general, you can install the Apache Web server in two ways: by using a binary distribution or by compiling from the source code. These types of installation apply to Linux and Windows systems.
Binary Installation
Installing Apache by using binaries is the most common and easiest method. Binaries are precompiled and can be

Chapter 3: Installing Apache
Binary Installation versus Source Compilation
Choosing the ideal installation type might appear to be a perplexing task.
To help you decide which type of installation to use, I will compare the two types based on the following features:
  • Ease of installation
  • Customization options
  • Support for third-party modules
  • Ability to apply patches
  • Compiler optimization
Ease of Installation
If you are a novice user or a Linux newbie, you will probably look for ease of installation. In that case, go for binary installation. Installing Apache by compiling the source code can appear to be complex to novice users.
Customization Options
As discussed earlier, compiling Apache from source provides several customization options at the time of installation. Compiling by using source code keeps your Apache installation open to extensions and alterations, which appeals to system administrators who are abreast of technological enhancements and want the ability to compile the source code in a way that allows

Installing Apache on Linux
You can install Apache on Linux by using the RPM file or by compiling the source code. This section focuses on both these installation types and helps you install Apache.
RPM Installation
You can handle RPM installation with a single command. However, you should follow a few guidelines within the context of RPMs. Before you begin downloading the RPM, ensure the following:
  • The RPM is for the latest version and is specific to the operating system that you are using.
  • The RPM is built with the precompiled options that will be most beneficial to you.
You can obtain RPMs from http://www.rpmfind.net, the official Web site of Apache, or from a Red Hat mirror site. Follow these steps to install Apache from an RPM file:
  1. Download the RPM file from the site.
  2. Type the following command while logged on as the root user.
     # rpm -ivh apache-1.3.xx-y.i386.rpm
This command installs the RPM version that you specify. However, you may already have an older version of Apache

Installing Apache on Windows
In the previous sections, I discussed the installation of Apache on Red Hat Linux systems.
This section focuses on installing Apache on Windows 2000. I am using the binary distribution of Apache for Windows to install Apache. As discussed earlier, the Apache distribution for Windows is available as a .msi file. This file can be obtained from the official Web site of Apache. The steps for installing Apache on Windows 2000 are as follows:
  1. Obtain Apache for Windows. The Apache binary file for Windows, apache_1.3.24-win32-x86-src.msi, can be downloaded from http://www.apache.org/dist/httpd/binaries/win32/.
  2. Double-click the installation file to begin the installation. The HTTP Server - Installation Wizard is initiated, and a Welcome screen appears. Figure 3.4 shows the Welcome screen.
     Figure 3.4: The Welcome screen
  3. Click Next to proceed. The License Agreement screen appears. In this screen, select the "I accept the terms

Summary
This chapter focused on the installation of the Apache Web server. First, I discussed the steps for installing the Apache RPM file on Red Hat Linux systems. Next, I talked about the steps for installing Apache on Linux by compiling the source code. Finally, I provided step-by-step directions for installing Apache on Windows 2000 Server.

Check Your Understanding
Multiple Choice Questions and Answers
1.  For which of the following reasons will an experienced Apache administrator prefer to install Apache by using the source code rather than the RPM file? (Choose all that apply.)
    a. Ease of installation
    b. Ability to apply patches after installation
    c. Ability to customize the installation using install-time options
    d. Ability to install third-party modules using DSO support
2.  Which of the following options is used with the configure script while installing Apache to specify the inclusion of all modules in the standard distribution of Apache that are supported by all platforms?
    a. --enable-module=all
    b. --enable-module=most
    c. --enable-module=max
    d. --enable-module=NAME
3.  
Which of the following rpm commands is used to install a new version of Apache when an older version is already running?
    a. rpm -vh package_name
    b. rpm -Uvh package_name
    c. rpm -e package_name
    d. rpm -i package_name
4.  Which of the following scripts contains information that Apache us

Chapter 4: Configuring Apache Server
Overview
In the previous chapter, I discussed the steps for installing Apache Web server and helped you perform the installation on Linux and Windows platforms. I also discussed the steps to start Apache and test whether Apache is working. After you install and test your Apache installation, you need to configure Apache in a customized way that best suits your requirements. Believe me, you can configure Apache in numerous ways, and you can alter the Apache configuration to a great extent to ensure maximum performance and security. However, in this chapter, I merely acquaint you with the configurable options, because it is practically impossible to suggest the best solution for configuring Apache server. The way you choose to configure Apache is based entirely on the use and benefits you want to derive. I have tried to explain each configuration option so that it will be easy for you to decide which options should be configured and how. First, I cover th

Chapter 4: Configuring Apache Server
The Apache Configuration File
The Apache configuration file is httpd.conf. If you performed a Custom-Everything installation of Red Hat Linux 7.2, the Apache Web server is installed automatically. In this case, you can find the Apache configuration file in the /etc/httpd/conf/ directory. However, the location of this file can differ if you compile the source code of Apache and specify a separate location in the --prefix argument while installing.
Tip  If you followed the installation steps specified in the previous chapter, you will find the httpd.conf file in the /usr/local/apache/conf directory.
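As a small illustration, the following Python sketch checks the two locations mentioned above and reports the first httpd.conf it finds. The function name find_httpd_conf is hypothetical, and the list assumes only the two layouts named in this section; a build with a different --prefix value would need its own entry.

```python
import os

# The two locations named in this section; other layouts are not covered.
CANDIDATE_PATHS = [
    "/etc/httpd/conf/httpd.conf",         # installed with Red Hat Linux 7.2
    "/usr/local/apache/conf/httpd.conf",  # source build from the previous chapter
]


def find_httpd_conf(candidates=CANDIDATE_PATHS):
    """Return the first candidate path that exists as a file, or None."""
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None


print(find_httpd_conf() or "httpd.conf not found in the listed locations")
```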
The httpd.conf file is a text file and has numerous options that can be configured by adding, commenting, uncommenting, and deleting the options in it. This file is based on the configuration files of the NCSA server that were originally developed by Rob McCool. It contains plenty of options and, at first sight, might appear complex to novice Web server administrators.
Editing the Apache conf

Directives
Before examining the options in this file, you need to be familiar with the concept of directives. The httpd.conf file uses directives to configure Apache. Directives follow a specific syntax that is similar to any programming language. However, directives can't be called commands and don't operate like programming languages. They can be referred to as instructions to the Apache server that help Apache behave in a particular manner and help it locate the resources to be used. It is important, though, to understand that directives don't directly control the actions of Apache. The purpose and use of Apache directives should be clear to every individual who takes up Apache Web server administration. As a matter of fact, only with experience do Apache administrators gain expertise in using these directives. The more hands-on experience you have with Apache, the better your chances are of "sharpening your saw" with it. Directives can be broadly classified as core directives and

Layout of the httpd.conf File
Before beginning, you need to understand the layout of the httpd.conf file. The configuration directives in the httpd.conf file are grouped under three main sections. Each section contains directives that help you configure Apache. You will also see many comments that help you understand the configuration options in the httpd.conf file. The sections make it convenient for the user to locate the configuration options while editing the httpd.conf file.
Sometimes, experienced Web server administrators prefer maintaining different configuration files for each of these sections for their convenience. The three main sections are:
  • The Global Environment section
  • The Main server configuration section
  • The Virtual Hosts section
The Global Environment Section
This section contains those directives that globally affect the functioning of the Apache Web server. These directives are essential for the overall operation of Apache. The directives in this section determine h

Examining the Configuration Options
By now, you know how to examine the options of the httpd.conf file. In this section, I include extracts of the httpd.conf file and discuss the relevance and use of each option. By uncommenting or commenting out existing lines, you can handle most of the Apache configuration. However, most options work best if they are not altered.
Tip  I suggest that you open the httpd.conf file using the vim text editor. The vim text editor applies different colors to the lines in the httpd.conf file, depending on the nature of the lines. For example, the commented entries will have a different color than the uncommented configurable options. This way, you can easily identify the configuration options in the midst of hundreds of lines of text. To open the file using vim, specify the command vim /usr/local/apache/conf/httpd.conf at the console.
I divided the configurable options into three different sections, as shown in the following subsections.
Global Environment C

Using Apacheconf
By now, you have some idea about the directives and configurable options in the httpd.conf file. However, this is not the only way you can configure Apache Web server. Besides editing the httpd.conf file, you can use a GUI tool named Apacheconf to configure Apache. This tool is available with the standard Red Hat 7.2 distribution.
You can invoke Apacheconf from the GNOME or KDE desktop environments by typing the command apacheconf in the Terminal Emulation Program window. Though this utility can't do everything that editing the httpd.conf file can, Apacheconf is a good GUI tool for novice administrators.
Note  Leave the httpd.conf file untouched if you want to use Apacheconf to configure your Apache Web server.
The Apache Configuration window contains four tabs: Main, Virtual Hosts, Server, and Performance Tuning.
The Main Tab
You use the Main tab to specify the general server settings, such as the Server Name and the Webmaster email address. Figure 4.1 displays the Ma

Summary
In this chapter, I focused on the different ways you can configure the Apache Web server. I discussed the httpd.conf file, which is the conventional way of configuring Apache. Finally, I discussed the GUI utility named Apacheconf, which you can also use to configure Apache.

Check Your Understanding
Multiple Choice Questions and Answers
1.  Considering that you have performed a standard installation of Apache, in which directory is the httpd.conf file located?
    a. /root
    b. /etc/httpd/conf
    c. /home/www
    d. /usr/local/apache/conf
2.  Which of the following directives do you use to specify the default location for all Web site files hosted by Apache Web server?
    a. ServerRoot
    b. ResourceConfig
    c. DocumentRoot
    d. AccessConfig
3.  Which of the following directives do you use to specify the location for error logs?
    a. ErrorLog
    b. LogLevel
    c. LogFormat
    d. CustomLog
4.  Which of the following directives do you use to check whether a particular module is installed?
    a. LoadModule
    b. AddModule
    c. Directory
    d. IfModule
5.  Which of the following commands do you use to invoke Apacheconf?
    a. apacheconf
    b. apachectl
    c. service httpd start
    d. /etc/rc.d/init.d/httpd restart
Answers
1.  b. If you used the standard installation for Apache, the httpd.conf file will be located in the /etc/httpd/conf directory.
You can specify a different

Chapter 5: Implementing Access Control in Apache
Overview
The Internet is a huge repository of information that is accessible to millions of users across the world. Numerous Web servers host a wide range of information on the Internet. In such a scenario, it is important to keep a check on who is allowed to access resources of a Web site on a Web server. When you request a resource from a site, chances are that your request is either accepted or rejected outright. The latter might happen because you are not entitled to access the document you requested. It is quite possible that the Web server you are accessing contains certain sensitive information that is meant only for a few specific people and excludes you. From the perspective of a Web server administrator, it is extremely important to protect this information from anybody who is not entitled to access it. Now, the question that arises is how all this is managed in Apache. Restricting or allowing access to resources in Apache can

Chapter 5: Implementing Access Control in Apache
Introducing Access Control
An integral part of any Web server is its capability to implement access control on the resources in it. Access control is the basic method for ensuring that the resources on the Web server are safe. I define access control as a regular check performed by the Web server on the basis of predefined criteria that determine whether a request should be accepted or rejected. This definition can be split into three parts:
  • A regular check performed by the Web server. The first part of the definition suggests that access control is initiated when Apache performs a check on the request. This helps Apache determine the nature of the request.
  • Check on the basis of predefined criteria. The next part of the definition suggests that Apache checks the requests on the basis of predefined criteria.
This means that, as an Apache administrator, you can specify the condition or criteria that should be satisfied before the request is sent a positive response. A wide variety of cri

Types of Access Control
There are various ways of implementing access control to restrict access to files and directories on the host computer. The three most common methods are as follows:
  • Using host-based access control
  • Using authentication-based access control
  • Using .htaccess files
Host-Based Access Control
Host-based access control signifies that the access control is dependent on the computer from which a user is sending a request. In other words, a request sent from a user will be accepted only if the computer sending the request is allowed to access the resource on the server computer. Host-based access control can be implemented by specifying IP addresses or domain names in the configuration file. If any portion of the specified IP address or domain name is restricted, the server will deny access to the requested resource. You can implement host-based access control using the following:
  • Domain names
  • IP addresses
Domain Names
Hostnames can be used to restrict or allow access to

Putting Your Knowledge into Practice
In this section, I discuss a scenario for protecting content on your Web site. Imagine that you are a system administrator responsible for administering the Apache Web server. As a system administrator, you are responsible for the security of your site. You have been assigned the task of setting up password protection for the site in such a way that only valid users can access its resources. To accomplish this task, and assuming that you want to use basic authentication, you perform the steps in the following sections.
Create the .htaccess File
To create the .htaccess file, follow these steps:
  1. Use the vi editor to create a file named .htaccess in the document root directory.
     # vi /opt/web/html/.htaccess
  2. Add the following directives in the .htaccess file.
     AuthName "Please Enter Credentials"
     AuthType Basic
     AuthUserFile /usr/local/apache/passwd/.htpasswd
     Require valid-user
Create the Password File
To create the password file, follow these steps: If the

Summary
In this chapter, you learned how to configure Apache so that you can implement access control. You learned how host-based and authentication-based access control work and how to use the .htaccess file.

Check Your Understanding
Multiple Choice Questions and Answers
1.  Which of the following are fully qualified domain names?
    a. .com
    b. .org
    c. .welcome.com
    d. colors.rednblue.com
2.  Consider a situation in which IP addresses 134.147.100.0 to 134.147.100.50 are specified in the Allow directive and the IP address 137.147.100.6 is specified in the Deny directive. Which of the following configurations will allow the IP address 134.147.100.6 to access resources on the Apache Web server? (Choose all that apply.)
    a. Order deny,allow
    b. Order allow,deny
    c. Order mutual-failure
    d. Order allow,deny Allow from all
3.  Which of the following is not a feature of digest authentication?
    a. Credentials of the user are sent in encrypted format.
    b. The Web server sends resources to the client in encrypted form.
    c. A few browsers don't support digest authentication.
    d. A nonce is generated in digest authentication.
4.  Which of the following is not used to derive a digest that is sent to the Web server when digest authentication is use

Part II: Implementing Basic Features of Apache

Chapter List

Chapter 6: Configuring Virtual Hosts
Overview
Virtual hosting enables a Web server to host more than one Web site at a time. The Apache Web server was one of the first Web servers to incorporate this functionality. Many Web-hosting companies today use the Apache Web server to host several Web sites for each of their customers. The Web site that you plan to host on Apache could be a part of the same domain or of different domains.
With the ever-increasing number of Web sites on the Internet, virtual hosting has become a necessity in order for a Web server to qualify as a good Web server. In this chapter, I discuss the concepts related to virtual hosting in Apache and the types of virtual hosting options that are available. Finally, I discuss how virtual host support can be added to Apache.

Chapter 6: Configuring Virtual Hosts
The Virtual Hosts Section in httpd.conf
Before delving into a discussion on virtual hosts, you need to become familiar with the Virtual Hosts section in the httpd.conf file. The following section (Section 3) contains a few directives that you should be familiar with before you begin:
    # Section 3: Virtual Hosts
    #
    # VirtualHost: If you want to maintain multiple domains/hostnames on your
    # machine you can setup VirtualHost containers for them. Most configurations
    # use only name-based virtual hosts so the server doesn't need to worry about
    # IP addresses. This is indicated by the asterisks in the directives below.
    #
    # Please see the documentation at <URL:http://www.apache.org/docs/vhosts/>
    # for further details before you try to setup virtual hosts.
    #
    # You may use the command line option '-S' to verify your virtual host
    # configuration.
    #
    # Use name-based virtual hosting.
    #
    #NameVirtualHost *
    #
    # VirtualHost example:
    # Almost any Apache directive may go into a VirtualHost container.
    # Th

The Virtual Hosts Section in httpd.conf
Types of Virtual Hosts
Broadly speaking, there are three types of virtual hosts: name-based virtual hosts, IP-based virtual hosts, and dynamic virtual hosts. However, the most common methods of configuring Apache to use virtual hosts involve using name-based and IP-based virtual hosts. The mod_virtual module is the standard Apache module used for virtual hosting. Both IP-based and name-based virtual hosts can be implemented using the mod_virtual module.
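The difference between the two common types can be sketched in a few lines of Python. This is a simplified illustration and not how Apache itself performs the lookup; the VirtualHost class, the select_host function, the addresses, and the hostnames are all hypothetical. The sketch assumes that a name-based host is chosen by the Host header of the request and that an IP-based host is chosen by the address on which the request arrived.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class VirtualHost:
    document_root: str
    ip: Optional[str] = None           # set for IP-based hosts
    server_name: Optional[str] = None  # set for name-based hosts


def select_host(hosts: List[VirtualHost], local_ip: str,
                host_header: Optional[str] = None) -> Optional[VirtualHost]:
    """Pick the virtual host for a request, or None if nothing matches."""
    # Name-based: several sites share one address, so the Host header decides.
    if host_header:
        name = host_header.split(":")[0].lower()  # drop an optional port
        for host in hosts:
            if host.server_name and host.server_name.lower() == name:
                return host
    # IP-based: the address on which the request arrived decides.
    for host in hosts:
        if host.ip is not None and host.ip == local_ip:
            return host
    return None


hosts = [
    VirtualHost("/www/site-a", server_name="www.site-a.example"),
    VirtualHost("/www/site-b", server_name="www.site-b.example"),
    VirtualHost("/www/site-c", ip="192.168.0.20"),
]

print(select_host(hosts, "192.168.0.10", "www.site-b.example:80"))
print(select_host(hosts, "192.168.0.20"))
```

The first two sites share one address and differ only by name, whereas the third site needs an address of its own.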
IP-Based Virtual Hosts IP-based virtual hosts are virtual hosts that are identified by using unique IP addresses. These virtual hosts receive client requests on an assigned IP address. All responses sent to the client are also sent using the same IP address assigned to the IP-based virtual host. Note  IP addresses are configured on the network interface card that the server is using. However, when you want to configure several virtual hosts, it might not be feasible to have that many network interface cards. To counter this, you can use Summary In this chapter, I discussed all the concepts related to configuring virtual hosts in Apache. First, I discussed the Virtual Hosts section of the httpd.conf file to familiarize you with the directives used in relation to virtual hosts. Then I explained the concept of IP-based virtual hosts and how they are implemented. Next, I examined the steps for configuring name-based virtual hosts in Apache. Finally, I discussed the concept of dynamic virtual hosting and the directives that are used in conjunction with it. Check Your Understanding Multiple Choice Questions and Answers 1.  Which of the following directives should be used only if you are implementing name-based virtual hosts? NameVirtualHost DocumentRoot ServerRoot <VirtualHost> 2.  Which of the following directives should have the same value specified for all name-based virtual hosts configured for an Apache Web server? DocumentRoot ServerName ErrorLog <VirtualHost IP_address> 3.  Which of the following are the features of IP-based virtual hosts? (Choose all that apply.) More than one virtual host can be configured on a single IP address. Are used for Web sites that use SSL connections. Every virtual host should have a unique IP address. Are not supported by HTTP /1.1-compliant browsers. 4.  For what purpose is interface aliasing used? (Choose all that apply.) Assign multiple IP addresses to a single network interface card. 
Assign a single IP address to multiple network interface cards. Configure multiple IP-based virtual hosts on a singl Chapter 7: Dynamic Content with CGI Gone are the days when all Web pages were created by using plain HTML. Web pages created using HTML were static and lacked interactivity. A Web server that is capable of displaying only static Web pages is not of much use in today's Internet scenario, which demands interactive, dynamic, and user-friendly Web pages. Times have changed, and the Internet has evolved to a point where even personal Web pages (for example, guest books, newsletters, user registrations, form mailers, and so on) require more than static outputs. Thanks to Common Gateway Interface (CGI), you can multiply the functionality of a Web server. You achieve this in Apache by executing external programs when required and receiving their output in a format that can reach your intended audience through a Web browser. Such a mechanism allows a Webmaster to write customized programs that work on the Internet. That is, a Webmaster can write a program in C or C++ that displays output in Int Chapter 7: Dynamic Content with CGI Accepting User Input in HTML Forms Whether it is filling out an online application form or signing up for free e-mail on a Web site, HTML forms are used everywhere. HTML forms provide an interface for the user to enter information that is processed by CGI programs. You create these forms using the <FORM> and </FORM> tags in the HTML pages. These forms can accept data entered by a user on the client side using HTML input fields, such as text fields, multiple line text areas, radio buttons, check boxes, drop down menus, multiple selection boxes, and hidden fields. Each element on a form contains a NAME tag that is used to uniquely identify the element on the form and also to reference the information stored in the element. 
The information stored in each element is known as its value and is encoded with the rest of the form data when the information is sent to the calling CGI program. The information is passed in the form of pairs, each pair consisting of the element's name and its value. Configuring Apache to Use CGI Although configuring Apache to use CGI scripts requires little effort, you can do a lot to configure Apache to use CGI scripts securely. The following sections provide information that you need when using Apache with CGI. The sections explain the CGI related directives and how they can be used to configure Apache to use CGI. The ScriptAlias Directive You need to configure the ScriptAlias directive on Apache before you can execute CGI scripts on the server. The scripts are stored in a specific directory that is used exclusively by CGI. You then store all CGI programs in this directory, and searches for all client requests occur in this directory. All the ScriptAlias directives are stored in the httpd.conf configuration file. Consider the following example: ScriptAlias /cgi-bin/ /usr/local/apache/cgi-bin/ If all the CGI files are stored in the directory specified by the DocumentRoot directive, they are easily accessible. However, if the files are stored in o Learning to Script This section discusses the important concepts related to CGI programming. Before I discuss the specifics of CGI programming and write a simple CGI program, though, it is imperative that you understand how a CGI script is different from other conventional Web-based programs. The following are the key features of CGI programs that differentiate CGI from other Web-based applications: MIME-type header. All CGI programs also send MIME header information along with the regular output returned by the program. This information generally informs the client browser about the type of information being returned and is stored along with the HTTP header information. Output in HTML format. 
All output that is returned by CGI scripts is sent to the client browser in HTML format. Although all the information is displayed in HTML, you can configure the programs to return the output in other formats that the browser supports. For example, in addition to displaying HTML information, brow Troubleshooting If the CGI script works correctly, it will display the correct output in the browser window. However, sometimes the browser does not display the correct output and, instead, displays an error. In such circumstances, you need to know how to troubleshoot the problem. The following problems commonly occur in CGI scripts: Permission problem. One of the most common problems is related to access permissions. If an administrator allows access to only certain users, other users will not be able to access the CGI programs stored on the Web server. In restricted access, a log on attempt made by a user is logged onto the Web server. If a user needs to be assigned access to the Web server, appropriate file permissions will need to be assigned (you will learn about file permission later in this chapter). Most error messages begin with the word Forbidden. Errors on the server. Sometimes access to CGI programs is refused or there is no output because of certain errors on the Web serve CGI Environment Variables An environment variable is a variable that is maintained by the operating system and Web servers. These variables exist in memory and are used by applications for obtaining specific information about the system. Environment variables are used primarily to store information about the current date, time, and log-in information. You can see a list of all environment variables configured on the operating system by using the env command. 
Check out the following output of the env command: [root@linux root]# env PWD=/root HOSTNAME=linux.server.com PVM_RSH=/usr/bin/rsh QTDIR=/usr/lib/qt-2.3.1 LESSOPEN=|/usr/bin/lesspipe.sh %s XPVM_ROOT=/usr/share/pvm3/xpvm KDEDIR=/usr USER=root LS_COLORS=no=00:fi=00:di=01;34:ln=01;36:pi=40;33:so=01:35:bd=40;33;01:cd= 40;33;01:or=01;05;37;41:mi=01:05;37;41:ex=01;32:*.cmd=01;32:*.exe= 01;32:*.com=01:32:*.btm=01:32:*.bat=01;32:*.sh=01:32:*.csh=01;32:*.tar= 01;31:*.tgz=01;31:*.arj=01;31:*.taz=01;31:*.lzh=01;31:*.zip=01;31:*.z= 01;31:*.cpi STDIN and STDOUT STDIN stands for standard input, and STDOUT stands for standard output. You may recall that Linux has three types of standard files: the standard input file, the standard output file, and the standard error file. In the context of the Linux operating system, a standard input file is a keyboard that can be used to enter data, and the standard output file is the screen where you see the output of the data entered. Although the logic for STDIN and STDOUT is the same with respect to using CGI scripts, the way in which you apply STDIN and STDOUT is different. The following list describes what happens when a Web form is sent to a CGI program using the POST method: The data present in the form is combined into a special format. This format is delivered to the CGI program using STDIN. The program processes this data as though it were entered from the keyboard and sends the output using STDOUT. CGI Modules and Libraries All system administrators should use CGI code modules and libraries when configuring Apache to use CGI programs. These modules and libraries help you develop efficient CGI code. The main benefits of using CGI modules and libraries are that they allow you to develop CGI programs with fewer errors and also lead to faster development. If you are considering using Perl to write most of your CGI programs, you might want to use the modules available on Comprehensive Perl Archive Network (CPAN). 
CPAN is a wide network of resources related to Perl and the Perl community. It contains numerous resources related to Perl, such as utilities developed in Perl, books, documentation, and the entire Perl distribution. The most popular libraries available with Perl on CPAN are the following: CGI Lite. A library with a minimum set of functionality. CGI.pm. A popular and more comprehensive library. Improving Performance of CGI Scripts A variety of third-party tools are available that enable you to write CGI programs faster and more efficiently. Two of the important tools are FastCGI and CGI.pm. In this section, I'll discuss the benefits of using these tools and how they help developers write CGI scripts faster and more efficiently. CGI.pm CGI.pm is a Perl 5 CGI library that helps improve development speed of CGI scripts. This Perl library helps parse and interpret query strings before passing them to CGI scripts. When you use CGI.pm, almost every task is accomplished using a CGI object. You can use a CGI object for the following purposes: To examine the environment of a query string, to parse it, and to store its results. To return and modify query values. To handle POST and GET methods effectively. FastCGI FastCGI is an open source extension to CGI. The most important benefit of using FastCGI is that it facilitates high performance and it doesn't have limitations of server-speci Securing CGI Although CGI scripts are easy to use and serve many useful purposes, they are vulnerable to security attacks. As a result, CGI is sometimes considered too insecure to run on the server side. On the other hand, CGI has been on the Internet scene for quite some time, which is probably why its shortcomings are so globally evident. The truth is that if CGI scripts are written properly, they are as secure as any other application. Security Tips Following are a few tips for securing CGI applications: Never run the Web server as a privileged user. 
It is more than dangerous to run the Web server as a privileged user such as the root user. Only the root user should run the main Apache daemon. This is because the main Apache daemon doesn't handle client requests directly. All child processes directly handle client requests and should be run with the permissions of an unprivileged user. Avoid passing input directly to the Linux shell. Perl scripts can potentially pass data to the Lin Summary In this chapter, I introduced concepts related to the Common Gateway Interface. Then I discussed CGI in relation to the Apache Web server and how you can configure Apache to work with CGI. Next, I discussed how to create a basic CGI program and troubleshoot CGI related problems. I also discussed the environment variables. Finally, I described a few tools that you can use to improve the performance and security of CGI applications. Check Your Understanding Multiple Choice Questions and Answers 1.  Which of the following directives is used to instruct the Web server regarding which files should be considered as CGI files? Directory AddHandler Options ScriptAlias 2.  Which of the following statements is not true with respect to FastCGI? All tasks are accomplished using CGI objects. Simplifies the migration of existing CGI applications. Independent of the Web server architecture. Doesn't have language dependencies. 3.  Which of the following environment variables is used to store the value of the IP address of a remote computer that sent a request? REMOTE_HOST REMOTE_USER REMOTE_ADDR REMOTE_IDENT 4.  Which of the following environment variables is used to specify the complete URL that includes information such as the protocol used for connection, the hostname, the port used, and the request? SCRIPT_URI SCRIPT_URL SCRIPT_FILENAME DOCUMENT_PATH_INFO 5.  
Which of the following tools is primarily used to improve the sec Chapter 8: Working with Apache Modules Overview The Apache Web server services clients' requests over the HTTP protocol. In the request-response mechanism, a client requests a service from the server, and the server sends a response back to the client. The response can be a static Web page or a dynamic Web page containing results of programs executed at the server. In addition to serving a client's requests, the Web server performs tasks such as authentication of the client, authorization of the clients for the services sought, and so on. The Apache Web server is built on a modular concept, which ensures ongoing scalability of the server. This modular architecture means that you can easily enhance the Apache Web server by either expanding it or shrinking it to meet your needs. Here are some of the specific benefits you derive by using a modular architecture: You can easily add features to the server. Because of Apache's open source nature, programmers around the globe are able to devel Chapter 8: Working with Apache Modules Apache Standard Modules As I said earlier, the Apache core provides the basic features of the Apache server, while a standard set of modules supplements the core's basic functionality. All of this combined functionality is required in order for the Apache server to work. Modules are required to handle each stage in the request-response cycle that takes place between a Web server and Web browser. Various stages of the request-response cycle that are managed by modules are listed here: Determination of the physical file requested using the URI sent by the HTTP client Authentication Access control File permissions MIME type mapping of the object requested Response to the client Request logging In the upcoming sections of this chapter, you'll discover that modules are available to control and manage each stage. 
In the following sections, you'll learn about the standard set of Apache modules, which are categorized based on their functions. Environment Creation Modules Modules in this catego Dynamic Shared Objects You need to load a module to make it available to the Apache server. You can do so in two ways: statically or dynamically. After you statically compile a module to link it to the Apache core daemon file (httpd), the module can't be removed or added. Apache supports dynamic loading by using the Dynamic Shared Object (DSO) mechanism. The DSO method was introduced in Apache 1.3. The support for DSO makes it possible for you to load modules into the Apache Web server without having to recompile the server. With this method, you load the DSO module into memory when needed and link it dynamically to the Apache core. This course of action uses the Shared Object mechanism in Linux and the DLL mechanism in Windows. Many third-party modules are available in the DSO compatible format. Before you can dynamically load a module, you need to declare the module as dynamically loadable. The dynamically loadable modules are specified in the httpd.conf file. You declare all such mod Application and Language Support Modules One important feature of Apache is that it is possible to add programming language support by simply installing the respective module. For example, to build support for the PHP scripting language in Apache, you can simply install the respective module and you are ready to use PHP. You can broadly categorize some modules as those that support applications and languages. Some such commonly used modules are listed here: mod_cgi. Provides support for CGI. mod_perl. Provides the server API with the Perl language interpreters and the Perl interface. mod_python. Provides support for the Python language interpreter. mod_php. Provides support for the PHP server-side scripting language. mod_javascript. Incorporates support for JavaScript. mod_serv. 
Provides support for Java servlet programs. According to a survey posted at https://secure1.securityspace.com/s_survey/data/man.200207/apachemods.html, the following are the five most popular modules: mod_php mod_pe Installing Modules You can install Apache modules in several ways. First, you can install Apache modules either within the Apache source code tree or outside it. The modules installed within the source tree are basically a part of the Apache standard distribution. These modules are usually placed in the /src/modules directory. To install the module this way you must pass some configuration directives to the Apache AutoConf-style Interface (APACI). APACI compiles the modules with the Apache server, and the modules are linked during runtime. You need to make the required changes to the httpd.conf to enable the module. Second, you can configure modules as dynamically linked modules (refer to the section "Dynamic Shared Objects" for the details on dynamically linked modules). Third, you can install modules by using the Apache Extension, which is covered in the next section. Using the Apache Extension (apxs) The Apache Extension (apxs) is a Perl script that you can use to compile and install A Summary In this chapter, you learned about Apache modules. Modules are the core of the Apache server, and they provide added functionality that enables the server to function properly. I discussed the various standard modules that are shipped with the default Apache installation. A benefit of using Apache modules is that you can enable or disable the modules during runtime without having to recompile the Apache server. The LoadModule and AddModule directives in the httpd.conf file play an important role in determining which modules are available to the Apache server. You found detailed coverage of these directives in this chapter. Finally, you learned how to install two of the most popular modules, mod_php and mod_perl. Check Your Understanding Questions and Answers 1.  
Which module uses the Action and the Script directives to execute CGI scripts based on media type or request method? mod_actions mod_include mod_cgi mod_negotiate 2.  Select the advantages of using Apache modules. (Choose all that apply.) Low memory usage Flexibility Extensibility High performance during the first-time loading of modules Easy to manage 3.  The _________ file helps you ensure that PHP is installed successfully. 4.  Which of the following directives in the httpd.conf file do you use to specify the PHP extensions? AddType AddExtension AddPHPType AddPHPExtension 5.  Which of the following modules do you use for URL mapping? (Choose all that apply). mod_rewrite mod_alias mod_access mod_negotiation Answers 1.  a. mod_actions module uses the Action and the Script directives to execute CGI scripts based on media type or request method. 2.  a, b, c, e. The advantages of the Apache modules are low memory usage, flexibility, exte Part III: Advanced Configuration Chapter List Chapter 9: Improving Apache's Performance Overview Performance is always one of the major determining factors of the effectiveness of any software program. As a system administrator, you want to optimize performance as much as possible. In regard to Web servers, performance can be a cumulative measure of availability, time taken to retrieve resources from the server, or the number of requests handled in a given time. However, every setup is unique, and performance will be considered differently in different situations. Apache is a full-feature Web server, and you can tweak it to a great extent. If you like experimenting with software, you might design your own plan for optimizing Apache's performance. However, years of research and testing have resulted in Apache's performing at the highest possible level. In this chapter, I'll begin with a few directives that you can tweak to ensure maximum performance according to your needs. 
Then I'll discuss how you can use Apache as a proxy server Chapter 9: Improving Apache's Performance Performance-Related Directives As discussed, you can configure a couple of directives to improve the performance of Apache. This will depend entirely on your setup and how you want to use Apache. In this section, I'll discuss how the main performance directives work and how to configure them for maximum performance in different situations. Configuring Performance-Related Directives One of the main elements of performance is speed. The higher the speed, the faster you are able to address client requests. In a production environment, thousands of clients might be concurrently connected to a Web server at a given point in time. Things are different when you are using Apache on a local area network, or LAN, where the client requests are minimal compared to the requests handled by a public Web server. Depending on the requirement and how you want to use Apache, you can configure the performance-related directives in Apache. A method frequently used to improve Apache's speed and availability Proxying Using Apache One of the many benefits that the Apache Web server can provide you is the ability to use it as a proxy server. A proxy server enables you to use one and only one network address to external sites and handles exchanges between the Internet and your LAN, essentially protecting your network's security and when used as a caching server, also enhancing its performance. In order to configure Apache to act as a proxy server, use the mod_proxy module. In this section, I'll discuss the directives you use to configure Apache as a proxy server. I'll also discuss how to configure Apache to act as a proxy server. Benefits of a Proxy Server As I just mentioned, using a proxy server provides several benefits, particularly the following three: Improves performance. The first and foremost benefit of using a proxy server is increased performance. 
With proxies, you can cache frequently requested documents; then, as you need the data, you can download it directly from the proxy. For Logging in Apache Maintaining logs is an integral part of Apache server administration. An efficient administrator has the ability to analyze logs and maintain a healthy server. Logs are analyzed from time to time to ensure that the system is safe from intrusion. Logs are also examined whenever there is a problem. Log files help you determine what went wrong, frame a solution for the problem, and later troubleshoot it. The Apache Web server uses several built-in log files. During runtime, Apache appends statements to the log files that help determine how the Web server is functioning. Apache uses two main types of log files. The first type records the network activity of the Web server, information about the client requests, and information that can be gathered about the client sending the request. These logs are called request logs. Apache also uses error logs. Error logs, as the name suggests, record errors that take place while the Apache Web server is running. The error log records Miscellaneous Tips for Performance Tuning I've discussed most, but not all, of the steps you can take to improve the performance of the Apache Web server. Following are a few more suggestions for improving performance: Choose adequate RAM. The most important hardware component that directly affects the performance of a Web server is the RAM. Be sure to select the appropriate amount of RAM for your server. Another option is to set the MaxClients directive in a way that is optimum for the RAM you are using. Inadequate RAM can force the server to respond to requests by using the SWAP memory. This is not recommended because it increases the time it takes the Web server to answer the request. Use the AllowOverride directive. Wherever possible, avoid specifying the value all for the AllowOverride directive. 
When using .htaccess files, you need to set this directive as all. As a result, for every request, the .htaccess files are opened and checked by the Apache Web server. Avoid content negotiati Summary In this chapter, I discussed concepts that help you tune the performance of your Apache Web server. I began by discussing the most important performance-related directives and how you configure them for optimum results and according to how you want to use your Web server. Next, I discussed using Apache as a proxy server and as a cache server. Then I discussed how to manage logs to ensure maximum performance and security. Finally, I presented a few miscellaneous tips that will help you achieve peak performance. Check Your Understanding Multiple Choice Questions and Answers 1.  Which of the following directives is used to specify the maximum number of headers that should be allowed for an incoming request? LimitRequestBody LimitRequestFields LimitRequestFieldSize LimitRequestLine 2.  Which of the following statements will not hold true with respect to configuring Apache as a proxy server? Apache can be configured as a forward proxy. Apache can be configured as a reverse proxy. Apache can be configured to act as a forward proxy and at the same time also configured to act as a reverse proxy. Apache can be configured either as a forward proxy or as a reverse proxy at a given time. 3.  Which of the following directives is used to share the workload of the proxy with another computer? The ProxyReceiveBufferSize directive The ProxyDomain directive The ProxyRemote directive The ProxyBlock directive 4.  Which of the following directives is used to specify the maximum limit for the cache that can be st Chapter 10: Using Server-Side Includes (SSI) Overview Gone are the days when entire Web sites were created using HTML. Those sites were static and inflexible, making the content on the Web pages appear dull because there was no user interaction possible. 
Times have changed, and now you can use several client-side and server-side scripting languages to generate dynamic Web pages. Apache supports a wide range of scripting languages such as Perl, PHP, and ASP. These scripting languages can be used to create full-fledged, feature-full, and dynamic Web sites. In addition to supporting popular scripting languages, Apache supports a mechanism called server-side includes (SSI). Server-side includes are used in Apache to generate dynamic content on HTML Web pages. In this chapter, I focus on the meaning and use of server-side includes. I then discuss the important directives that are used in conjunction with server-side includes. Next, I discuss the tags that are used with SSI. Finally, I consi Chapter 10: Using Server-Side Includes (SSI) Introduction to SSI Server-side includes are nothing but simple tags used by the Apache Web server. These tags are placed within HTML code of a Web page as comment statements. When Apache encounters these statements at runtime, it interprets them and performs the specified action, thereby making HTML Web pages dynamic and flexible. In order to configure Apache to use SSI, you need to either edit the directives in the httpd.conf file or include the appropriate configuration directives in the .htaccess file. A few directives must be configured in order to direct Apache to parse Web pages and to detect statements within the HTML code as server-side includes. Server-side includes are named as such because the tags that you specify within the HTML file are processed at the server-side before delivering the requested resource to the client. Purposes of SSI You can use SSI to perform a wide range of activities at the server-side. A few server-side activities that you can perform include the f Tags Used with SSI The best part about SSI is that you implement it using SSI tags that are easy to use. 
SSI tags are embedded in HTML in the form of comments and have special meaning only for the SSI parser. The Web server deletes the SSI tags and their enclosing comments. Therefore, the SSI tags are not visible to the client browsers. Syntax for Using SSI The syntax for using SSI tags is as follows: <!--#command_name attribute=value attribute=value…--> In the preceding syntax, note the following: command_name is the command/tag that you specify. attribute is the option used with the command/tag. value is an element that is specified for each attribute. Now that you are familiar with the syntax of using SSI, turn your attention to the tags that are used with SSI. The <config> Tag You use the config tag to specify how certain elements of SSI should be displayed or formatted. Three attributes can be used with this tag. They are as follows: errmsg. The value of this attribute is displaye Using Flow Control Elements in SSI SSI is not a powerful programming language. It is a simple mechanism to generate dynamic Web pages. Therefore, not all of the control structures that are available with a full-fledged programming language are available with SSI. However, a basic level of flow control can be implemented in SSI. SSI supports simple statements such as if and else. But these statements/operators don't support the nesting that is available with other programming languages. The syntax for using these operators is as follows: <!--#if expr="any_condition"--> HTML code <!--#elif expr="any_condition"--> HTML code <!--#else --> HTML code <!--#endif--> In the preceding syntax, note the following: The if and else operators are used. The expr keyword is specified. This keyword should always be specified when you use these operators. The if expr element performs the same task as any other if statement in any programming language. You use it to check whether the specified c Summary In this chapter, I discussed server-side includes. 
I started with a brief introduction to server-side includes and how they are used. Next, I discussed the SSI tags and how they are embedded in HTML to generate dynamic Web pages. Finally, I discussed the concept of flow control in SSI where I explained how conditionals, such as if and else, are used with SSI. Check Your Understanding Multiple Choice Questions and Answers 1.  Which of the following is not possible with respect to SSI? Access environment variables Create dynamic Web pages Create complex programs using nested operators Call a CGI program 2.  Which of the following methods for configuring SSI for Apache is not recommended if you already have an existing Web site? Uncommenting the AddType and AddHandler directives in httpd.conf to allow server-side parsing Adding the XBitHack directive manually in the httpd.conf file Using MIME type configuration Using .htaccess files to specify the configuration directives 3.  Which of the following strings is used with the config timefmt option to display a weekday in abbreviated form? %a %A %b %B 4.  Which of the following SSI tags are used to display the last modified information about a file? U printenv set flastmod 5.  Which of the following SSI tags is used to assign a name and value to a variable? exec set include config Answers 1.  c. I Chapter 11: Securing Apache Overview Security is an integral part of life. Anything important in your life needs to be secured in some way. This could be your house, which you keep locked when you're away, or money that you keep safe in a bank. In simple language, security is a basic activity that you need to perform in order to safeguard your resources. The same holds true for a Web server. In today's Internet age, growing awareness and technical expertise have allowed malicious people to find and exploit system vulnerabilities. 
Security stays a step ahead by performing a systematic check to discover vulnerabilities and loopholes and then taking appropriate corrective actions. An essential part of Web server administration is ensuring that the resources on the server are not tampered with. Remember the adage, "An ounce of prevention is worth a pound of cure"-that's what security is all about. Sec Chapter 11: Securing Apache Securing Web Server Files A Web server is a repository of numerous resources. These resources need to be protected from unauthorized access at all times. Implementing a strong security policy is the only logical solution to this state of affairs. The issue is how to protect resources. All Web server files are essentially disk files, which, if accessed by an unauthorized user, could provide important information to an unintended participant. The root cause for several security problems is poorly managed files. You can address this problem by taking the following measures: Setting appropriate access rights Setting appropriate symbolic links Using the Indexes option with care Setting Appropriate Access Rights The way the Apache Web server operates and the access rights allowed to the server depend on the access rights of the user ID running the Apache service. In Linux systems, the root user starts the Apache Web server. However, when the client requests need to be answered, a non-root us Safeguarding URLs In addition to securing Web server files, another important aspect of security is to safeguard URLs. This is essential because merely setting access rights for a file doesn't necessarily mean that the filename is hidden from the user. A user might not have access to a file, but may be allowed to read the name of the file. In some situations, even revealing the name of the file can cause potential harm to system resources, because the name of the file might indicate the information's degree of sensitivity. 
Protecting URLs helps to prevent the divulging of even small pieces of information and ensures that this information is restricted to specific users only. Ensuring security at this level is mandated when you undertake Web hosting on a large scale, when potentially millions of users have access to your site at any point in time. Mandatory and Discretionary Access To ensure that you maintain a high level of security for your Web site, you must be familiar with the conc Often-Neglected Security Concerns In this section, I introduce the most common security concerns that you'll face as a system administrator. The concepts discussed are simple guidelines that, in my experience, have proven useful. No threat is as harmful as a system administrator's negligence; however, the following information can help you avoid making mistakes that might damage your system. Always Back Up the Configuration File No matter how experienced a Web server administrator you are, it is wise to always make a backup of the original configuration file before making any changes to it. You might make a mistake while configuring the server and then be unable to track it down later. Consider a situation in which you start making changes to the configuration file without making a backup. After making the required changes, you restart Apache but the changes you made aren't taking effect. Instead, the Web pages are displaying improperly, and you are encountering error messa Vulnerabilities in Server Side Includes I discussed server side includes in Chapter 10, "Using Server Side Includes." Although using server side includes is a good idea for generating dynamic content on the Web site, you must, as a system administrator, ensure that using server side includes doesn't leave any security loopholes. There are many reasons for using server side includes. However, doing so poses potential security risks, including the following: Increased load on the server. 
When you enable server side includes, the Apache Web server tries to parse each of the Web pages to look for SSI tags. This results in a load increase on the server. The load could be very heavy if you are running Apache Web server in a shared server environment. Using the "exec cmd" element in SSI-enabled files. This element allows the execution of CGI scripts or other programs under the permissions of the user specified in the User directive and the group specified in the Group directive. The user and CGI-Related Security Issues This section covers a few CGI-related issues that can help you ensure maximum security for your Web server. CGI scripts are perhaps the most vulnerable feature in the Apache Web server. A poor configuration or even fragile (poorly written) CGI code can result in users gaining unauthorized access and, as a result, damaging the setup. Your best approach is to take precautions that make the Web server less vulnerable to attacks. The following topics discuss a few guidelines that you should always keep in mind. Script Aliased CGI Script Aliased CGI requires all the CGI scripts to be stored in a single directory. This option is considered to be more secure than the non-script aliased CGI option. Only the root user (administrator) and a few trusted users are given write access to the directory containing all the CGI scripts. However, using script aliased CGI would be more beneficial if the administrator could afford to check each and every script added to the dire Security for .htaccess Files To always be on the safe side, you should ensure that users are not allowed to set up .htaccess files that could override the original settings in the Apache configuration file. This could have a negative impact on the security of the site. You need to prevent the use of .htaccess files in all directories other than those specifically enabled. 
You can do so by setting the AllowOverride directive within the Directory directive, as shown here: <Directory /somedir> AllowOverride None </Directory> Setting the AllowOverride directive to None prevents the use of .htaccess files in all directories other than those specifically enabled. Disabling Unnecessary Services As a Web server administrator, you can disable or eliminate the services that you don't really use. These services will differ based on your requirements. To make your system safer, also close open ports that are not used. In this section, I focus on the services that can be disabled based on how you want to use the Apache Web server. In short, you can use the Apache Web server for the following purposes: Public Web host. This type of Web host provides Internet connections and is available 24 hours a day. Anybody connected to the Internet can access the resources on a public Web host. Private and extranet Web host. The hosts that provide Internet connectivity to a limited group of people fall in the category of private and extranet Web hosts. Intranet Web host. Set up an intranet Web host to allow access to the resources on the Web server only to users on the local intranet. This type of host doesn't have Internet connectivity. Depending on the type of host you are OS Level Security All software is installed on operating systems, and Apache is no exception. Regardless of what you do to secure Apache, you cannot rest until you're sure that the operating system on which you installed Apache is also secure. This section provides a few concepts related to Linux security. Physical Security Although this might sound a little offbeat while discussing technical content, the fact is that physical security is an absolute essential. Exercise care when deciding on the location of the server, and keep it locked so that only the administrator has access to it. 
The following list describes precautions to take in order to ensure that the server is not vulnerable to physical threats: Lock up the CPU of your server and retain personal responsibility for the keys. This would help secure the hardware present in the CPU. If it is not possible to lock all hardware connected to the server, set up a surveillance camera that continuously monitors people's activity near t Network Security The final phase of ensuring security is to secure your network. No matter how safe your computer is, if the network is down, you can't use network resources. Consider the following list when setting up your network's security: Using secure protocols Securing a network using firewalls Securing FTP Securing DNS Securing NIS Securing network configuration files Network monitoring Securing a network using TCP wrappers Using Secure Protocols Protocols are a set of rules that are used by two computers on a network to interact with each other. To ensure maximum network security, use a secure means of communication on the network. This section focuses on the secure protocols that you can use to ensure security of data while in transit. Two of the most popular secure protocols are Secure Shell (SSH) and Secure Sockets Layer (SSL). SSH The SSH protocol is ideal for establishing secure connections over a network. The SSH protocol allows a user to access a server remotely. The stren Using Security Tools Available with Linux are a wide range of security tools that you can use to ensure Web server, operating system, and network security. In this section, I discuss a few tools that can help you maintain a secure server. Using Nmap Nmap is a very useful, open source, command-line tool that you can use to scan large networks and determine which hosts are available on the network and what services they are running. You can also use Nmap to identify the operating system that a host is running. 
The open ports detected by this utility determine the services the remote host is running. The following use of the nmap command detects the open ports on a Linux system: [root@linux home]# nmap -sT 172.17.68.181 Starting nmap V. 2.54BETA22 ( www.insecure.org/nmap/ ) Interesting ports on linux.server.com (172.17.68.181): (The 1533 ports scanned but not shown below are in state: closed) Port State Service 21/tcp open ftp 22/tcp open ssh 23/tcp open telnet 80/tcp open http 111/tcp open sunrpc 1 Other Third-Party Tools Used for Security If you are curious enough to search the Internet for newly released security tools, you will find dozens of them. The tools you choose will depend on your exact requirements, but only after learning about the various tools can you make the right choices. In this section, I discuss a few third-party tools that are used for securing the network, the operating system, and the Apache Web server. SATAN System Administrator Tool for Analyzing Networks (SATAN) is specifically designed for system administrators who must keep an eye on their networks. SATAN is a powerful tool developed by Dan Farmer and Wietse Venema. It scans a system connected to the network to detect often-exploited vulnerabilities. SATAN offers a tutorial for each of the vulnerabilities that it detects. System administrators can refer to this tutorial for in-depth information about a vulnerability and the corrective actions that can be taken to get rid of it. You can d Other Third-Party Tools Used for Security Summary In this chapter, I discussed security-related issues and how they are addressed in Apache. I began by discussing the steps used for securing files and URLs. Then I discussed the common mistakes made by administrators with respect to security and how they can be rectified. Next, I discussed SSI and CGI vulnerabilities followed by securing .htaccess files. I also discussed the services that should be disabled if you are using the Apache Web server. 
Then I discussed certain concepts on OS level security and network security. Finally, I discussed the security tools that can be used with Apache to constantly monitor whether there is any possibility of intrusion. Check Your Understanding Multiple Choice Questions and Answers 1.  Which of the following options will ensure maximum security for CGI scripts? (Choose all that apply). Use script-aliased CGI. Use non-script-aliased CGI. Allow offsite execution of CGI scripts. Pass CGI input to the command line. 2.  Which of the following activities can lead to a security threat? (Choose all that apply). Making a backup of your configuration file Running the Apache Web server as the root user Never checking log files Running the Apache Web server as a non-root user 3.  Which of the following remote services is used to establish a secure connection over a network and at the same time allow the user to use a separate shell after logging in? rsh rlogin ssh rwho 4.  Which of the following utilities comes bundled with Red Hat Linux and is used to scan large networks and determine which hosts are available on the network and the services they are running? Whisker Md-webscan Nmap Netcat 5.  Which of the follo Chapter 12: URL Mapping By now, you know that the files served by Apache are stored in the document root directory. Generally, when a client specifies the URL of a site, the site's index page is retrieved from the document root and displayed on the client browser. However, sometimes the documents that clients need to access are not in the document root directory. This is where URL mapping comes in; it helps locate resources that are not necessarily located in the document root directory. In this chapter, I discuss URL mapping and how a request can be redirected to a directory in the file system other than the document root directory. I also discuss the various modules and directives that you use to achieve URL mapping. 
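The mapping described above can be sketched with a short httpd.conf fragment. This is only an illustration under stated assumptions: the paths /var/www/html and /usr/local/share/manuals and the URL prefix /manuals are hypothetical, and the access-control lines use the older Order/Allow syntax that matches the Apache versions this book appears to cover (Apache 2.4 and later express the same permission with Require all granted).

```apache
# Hypothetical document root: ordinary requests are served from here.
DocumentRoot "/var/www/html"

# Map the URL prefix /manuals to a directory outside the document root.
# (Both the prefix and the target path are assumptions for illustration.)
Alias /manuals "/usr/local/share/manuals"

# The aliased directory needs its own access settings, because it does
# not inherit them from the document root's <Directory> block.
<Directory "/usr/local/share/manuals">
    Options None
    AllowOverride None
    Order allow,deny
    Allow from all
</Directory>
```

With a fragment like this in place, a request for /manuals/index.html would be served from /usr/local/share/manuals/index.html rather than from a manuals subdirectory of the document root.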
Aliasing and Redirection Before I discuss how URL mapping can be implemented in Apache, it is essential that you understand the methods that make URL mapping possible. In this section, I discuss aliasing and redirection, the most basic methods used for URL mapping. In Ap Chapter 12: URL Mapping Modules Used for URL Mapping Apache supports several modules for URL mapping. Each module consists of directives that can be configured in the httpd.conf file or in .htaccess files. In this section, I will discuss the modules that can be used with Apache for URL mapping. The modules are: mod_alias. The directives that use this module manipulate and control URLs whenever a request is sent to the Apache Web server. mod_rewrite. This module provides a mechanism for rewriting URLs. It uses a rule-based rewriting engine to accomplish this task. mod_userdir. This module is used for providing user-specific directories. mod_speling. This module is used to check the spellings of URLs and correct them as needed. Mistakes related to capitalization are also checked. mod_vhost_alias. This module is used when you implement dynamically configured mass virtual hosting. Among the preceding modules, the most important ones for URL mapping are mod_alias and mod_rewrite. I explain these modules in detail The mod_alias Module The mod_alias module is an important module in URL mapping. Both aliasing and URL redirection are possible using this module. The mod_alias module is a standard module, which means that it is compiled into the Apache Web server by default. This module can handle most of the tasks related to URL redirection and aliasing and is easy to use from an administrator's point of view. Using mod_alias for Aliasing Using the mod_alias module for aliasing is similar to creating symbolic links in Linux. This is the module that you will use most often for aliasing URLs in Apache. 
When you create a symbolic link, you can point to a file using the name of the file (which might be different from the original file) that you have created as a symbolic link. Similarly, in Apache, URL aliasing helps you point to a resource present in a directory other than the document root directory. Before learning how to implement URL aliasing using the mod_alias directive, you need to know about the f The mod_rewrite Module Another module that you can use for URL mapping is the mod_rewrite module. As I discussed earlier, most of the tasks related to URL redirection can be handled by the mod_alias module, except a few tasks that are more complicated and are best handled using the mod_rewrite module. The powerful mod_rewrite module can be used in the following situations: When URLs need to be rewritten on the basis of certain conditions When URLs need to be rewritten on the basis of information contained outside the URL When database lookups are involved When file lookups are involved The mod_rewrite module is defined as a rule-based rewriting engine. It is called so because it operates on the basis of certain predefined rules. When a request is received, these rules are applied, and the request is redirected to a newly rewritten URL. The ability to use rules takes URL redirection to a new dimension of complexity and dynamism. The mod_rewrite module enables a system administrator to b Summary In this chapter, I discussed the importance of URL mapping within the context of the Apache Web server. I discussed the concept of aliasing and redirection and later explained the mod_alias and mod_rewrite modules that are used for URL aliasing and redirection. I also discussed the directives that are associated with these modules and how they are used to implement URL aliasing and redirection. Check Your Understanding Multiple Choice Questions and Answers 1.  Which of the following modules uses a rule-based rewriting engine to provide a mechanism for rewriting URLs? 
mod_alias mod_rewrite mod_speling mod_userdir 2.  Which of the following directives is associated with the mod_alias module and is used to map a URL to a location other than the document root using regular expressions? The Alias directive The ScriptAlias directive The AliasMatch directive The ScriptAliasMatch directive 3.  Which of the following directives associated with the mod_alias module is used to redirect a URL to a temporary location? The Redirect directive The RedirectTemp directive The RedirectPermanent directive The RedirectMatch directive 4.  Which of the following directives are mandatory directives that need to be used with the mod_rewrite module? (Choose all that apply.) The RewriteOptions directive The RewriteEngine directive The RewriteLog directive The RewriteRule directive 5.  Which of the foll Chapter 13: Metainformation and Content Negotiation Overview A Web server is a repository of thousands of resources. To closely knit the world, it is important to make a resource available in multiple representations. For example, a user in France would probably prefer a Web page to appear in French, whereas a user in Japan might want it to appear in Japanese. One way to ensure that a resource is delivered as preferred by a user is to display an index page with all the possible options so that the user can choose the one he wants. However, this approach doesn't look professional, and it requires the user to read through and choose the preferred option. Allowing the user to use his discretion is the way to go, but the difficulty lies in knowing what a particular user prefers. That's where HTTP/1.1 can help. HTTP/1.1 introduced request headers that allow users to send their preferences with their requests. 
Then the Web server interprets the requests and "negotiates" with the browser to display Chapter 13: Metainformation and Content Negotiation Understanding the Role of Metainformation Because metainformation helps servers process data and transfer particular kinds of data to an alternate application as needed, it is essential to Web servers and browsers. It is important that you understand the different types of metainformation before I discuss the use of metainformation in detail. Metainformation can be categorized as: The type of content of the resource The language in which the text will appear The character set used for the text The encoding or the compression scheme used for the text You know that metainformation assists in the interaction between a Web server and a Web browser. The following list discusses the uses of metainformation, which fall into three broad categories: Instruct the browser. Later, in the section "Related HTTP/1.1 Headers," I discuss a series of HTTP headers that you use to instruct browsers and help them handle content sent by the Web server. The primary role of these headers is to convey importa Using Metainformation As discussed earlier, metainformation has three main uses with respect to Apache. In this section, you find a detailed discussion of those uses. You'll examine how Apache uses metainformation to communicate with the browser, assigns a handler for any server-side processing that is required, and negotiates content by analyzing the client preferences before sending the requested document to the browser. Instructing the Browser HTTP is a binary transfer protocol that transfers data without requiring specific encoding. Each HTTP transaction uses certain headers that help identify the type of data and how it should be handled at the client and server sides. One important and mandatory header is the Content-Type header. You need to include this header with every HTTP transaction. 
Like most Web servers, Apache supports a wide range of standard MIME types; so do various browsers. However, at the browser end, you can make appropriate settings to associate a particular Summary In this chapter, I discussed metainformation and its importance. I also discussed the types of metainformation and the associated HTTP headers and directives that facilitate the use of metainformation. I then explained how Apache uses metainformation to instruct Web browsers and set a handler. Finally, I discussed content negotiation and how it is implemented in Apache to deliver content based on the preferences specified by the client. Check Your Understanding Multiple Choice Questions and Answers 1.  Which of the following metainformation is stored in the mime.types file that is located in the conf directory? MIME types MIME encoding MIME charset MIME Language 2.  Which of the following request headers is sent by the client browser to specify the language in which the Apache Web server should deliver the content? The Accept request header The Accept-Language request header The Accept-Charset request header The Accept-Encoding request header 3.  Which of the following directives is used to associate a handler with a particular type of file? AddHandler SetHandler AddEncoding AddCharset 4.  Which of the following terms is used to define each of the multiple representations of a resource? Representation Resource Negotiable resource Variant 5.  Which of the following statements are not true with respect to the source quality (qs) of a variant? (Choose all that apply.) The qs is a number in the range of 0.000 to 1.000. 
A Part IV: Appendixes Chapter 1: Introduction to Web Servers Chapter 2: Introduction to Apache Chapter 3: Installing Apache Chapter 4: Configuring Apache Server Chapter 5: Implementing Access Control in Apache Chapter 6: Configuring Virtual Hosts Chapter 8: Working with Apache Modules Chapter 9: Improving Apache's Performance Chapter 10: Using Server-Side Includes (SSI) Chapter 13: Metainformation and Content Negotiation Chapter 10: Using Server-Side Includes (SSI) Chapter 2: Introduction to Apache Chapter 12: URL Mapping