Owen needs to get the word out about Fang

One of the big problems facing any web site is keeping people coming back. It’s one thing to snare a visitor, but quite another to get them to come back again. Even sites with the most engaging content can fall off a person’s radar simply because it’s hard to remember to go visit a web site regularly. Knowing this, Owen wants to offer an alternative means of viewing alien abduction reports—he wants to “push” the reports to people, as opposed to them having to visit his site on a regular basis.

Push alien abduction data to the people

By pushing alien abduction content to users, Owen effectively creates a virtual team of people who can help him monitor abduction reports. With more people on the case, the odds of identifying more Fang sightings and hopefully homing in on Fang’s location increase.

Pushing web content to users is a great way to help gain more exposure for a web site.

RSS pushes web content to the people

The idea behind posting HTML content to the Web is that it will be viewed by people who visit a web site. But what if we want users to receive our web content without having to ask for it? This is possible with RSS, a data format that allows users to find out about web content without actually having to visit a site.

RSS is kinda like the web equivalent of a digital video recorder (DVR). DVRs allow you to “subscribe” to certain television shows, automatically recording every episode as it airs. Why flip channels looking for your favorite show when you can just let the shows come to you by virtue of the DVR? While RSS doesn’t actually record anything, it is similar to a DVR in that it brings web content to you instead of you having to go in search of it.

By creating an RSS feed for his alien abduction data, Owen wants to notify users when new reports are posted. This will help ensure that people stay interested, resulting in more people combing through the data. The cool thing is that the same database can drive both the web page and the RSS feed.

HTML is for viewing; RSS is for syndicating.

RSS offers a view on web data that is delivered to users automatically as new content is made available. An RSS view on a particular set of data is called an RSS feed, or newsfeed. Users subscribe to the feed and receive new content as it is posted to the web site—no need to visit the site and keep tabs.

To view an RSS feed, all a person needs is an RSS newsreader. Most popular web browsers and email clients can subscribe to RSS feeds. You just provide the newsreader with the URL of the feed, and it does all the rest.

RSS is really XML

RSS is like HTML in that it is a plain text markup language that uses tags and attributes to describe content. RSS is based on XML, which is a general markup language that can be used to describe any kind of data. XML’s power comes from its flexibility—it doesn’t define any specific tags or attributes; it just sets the rules for how tags and attributes are created and used. It’s up to specific languages such as HTML and RSS to establish the details regarding what tags and attributes can be used, and how.

In order to be proficient with RSS, you must first understand the ground rules of XML. These rules apply to all XML-based languages, including RSS and the modern version of HTML known as XHTML. These rules are simple but important—your XML (RSS) code won’t work if you violate them! Here goes:

RSS is a markup language used to describe web content for syndication.

XML is a markup language used to describe any kind of data.

Yes, sort of. But you don’t typically create XML code by hand, and it often doesn’t get stored in files.

It’s true that XML can and often does get stored in files. But with RSS we’re talking about dynamic data that is constantly changing, so it doesn’t make sense to store it in files—it would quickly get outdated and we’d have to continually rewrite the file. Instead, we want XML code that is generated on-the-fly from a database, which is how the HTML version of the main Aliens Abducted Me page already works. So we want to use PHP to dynamically generate RSS (XML) code and return it directly to an RSS newsreader upon request.

From database to newsreader

In order to provide a newsfeed for alien abduction data, Owen needs to dynamically generate RSS code from his MySQL database. This RSS code forms a complete RSS document that is ready for consumption by RSS newsreaders. So PHP is used to format raw alien abduction data into the RSS format, which is then capable of being processed by newsreaders and made available to users. The really cool part of this process is that once the newsfeed is made available in RSS form, everything else is automatic—it’s up to newsreaders to show updated news items as they appear.

An RSS newsreader is designed to consume the data made available by an RSS newsfeed.

Visualizing XML RSS

You already learned that XML code consists of tags, which are also sometimes referred to as elements, that form parent-child relationships within the context of a complete XML document. It is very helpful to be able to visualize this parent-child relationship as you work with XML code. As an example, the RSS document on the facing page can be visualized as a hierarchy of elements, kind of like a family tree for newsfeed data, with parent elements at the top fanning out to child elements as you work your way down.

Dynamically generate an RSS feed

Understanding the RSS data format is all fine and good, but Owen still needs a newsfeed to take alien abduction reports to the people. It’s time to break out PHP and dynamically generate a newsfeed full of alien abduction data that has been plucked from Owen’s MySQL database. Fortunately, this can be accomplished by following a series of steps:

The resulting newsfeed isn’t stored in a file but it is an XML document.

Set the content type of the document to XML.
Generate the XML directive to indicate that this is an XML document.
Generate the static RSS code that doesn’t come from the database, such as the <rss> tag and the channel information.
Query the aliens_abduction database for alien abduction data.
Loop through the data generating RSS code for each news item.
Generate the static RSS code required to finish up the document, including closing </channel> and </rss> tags.
```
  </channel>
</rss>
```

Link to the RSS feed

It’s important to provide a prominent link to the newsfeed for a web site because a lot of users will appreciate that you offer such a service. To help aid users in quickly finding an RSS feed for a given site, there is a standard icon you can use to visually call out the feed. We can use this icon to build a newsfeed link at the bottom of Owen’s home page (index.php).

A standard RSS icon is available to make it clear to users that you offer an RSS newsfeed.

A picture video is worth a thousand million words

After a newsfeed subscriber alerted Owen to a YouTube video with a dog in it that resembles Fang, Owen realized that he’s going to have to use additional technology to expand his search for Fang. But how? If Owen could incorporate YouTube videos into Aliens Abducted Me, his users could all be on the lookout for Fang. Not only that, but he really needs to come up with a way to avoid constantly doing manual video searches on YouTube.

Try this!

Visit Owen’s YouTube videos at www.youtube.com/user/aliensabductedme.
Watch a few of the alien abduction videos that Owen has found. Do you think the dog in the videos is Fang?

Pulling web content from others

The idea behind an RSS newsfeed is that it pushes your content to others so that they don’t have to constantly visit your web site for new content. This is a great way to make it more convenient for people to keep tabs on your site, as Owen has found out. But there’s another side to the web syndication coin, and it involves pulling content from another site to place on your site. So you become the consumer, and someone else acts as the content provider. In Owen’s case of showing YouTube videos on his site, YouTube becomes the provider.

YouTube is the provider of videos.

Aliens Abducted Me is the consumer of videos.

It’s important to understand that Owen doesn’t just want to embed a specific YouTube video or a link to a video. That’s easy enough to accomplish by simply cutting and pasting HTML code from YouTube. He wants to actually perform a search on YouTube videos and display the results of that search. So Aliens Abducted Me needs to perform a real-time query on YouTube data, and then dynamically display the results. This allows Owen and his legion of helpful Fang searchers to keep up-to-the-minute tabs on alien abduction videos that have been posted to YouTube.

Syndicating YouTube videos

In order to source videos from YouTube, we must learn exactly how YouTube makes videos available for syndication. YouTube offers videos for syndication through a request/response communication process where you make a request for certain videos and then receive information about those videos in a response from YouTube’s servers. You are responsible for both issuing a request in the format expected by YouTube and handling the response, which includes sifting through response data to get at the specific video data you need (video title, thumbnail image, link, etc.).

Following are the steps required to pull videos from YouTube and display them:

Syndicating videos from YouTube involves issuing requests and handling responses.

Build a request for YouTube videos.
Note
This request is often in the form of a URL.
Issue the video request to YouTube.
Receive YouTube’s response data containing information about the videos.
Note
YouTube uses XML to respond to video requests.
Process the response data and format it as HTML code.

Make a YouTube video request

Pulling videos from YouTube and incorporating them into your own web pages begins with a request. YouTube expects videos to be queried through the use of a REST request, which is a custom URL that leads to specific resources, such as YouTube video data. You construct a URL identifying the videos you want, and then YouTube returns information about them via an XML document.

The details of the URL for a YouTube request are determined by what videos you want to access. For example, you can request the favorite videos of a particular user. In Owen’s case, the best approach is probably to perform a keyword search on all YouTube videos. The URL required for each of these types of video REST requests varies slightly, but the base of the URL always starts like this:

Request videos by user

Requesting the favorite videos for a particular YouTube user involves adding onto the base URL, and also providing the user’s name on YouTube.

To request the favorite videos for the user elmerpriestley, use the following URL:

Request videos with a keyword search

A more powerful and often more useful YouTube video request is to carry out a keyword search that is independent of users. You can use more than one keyword as long as you separate them by forward slashes at the end of the URL.

To request the favorite videos for the keywords “elvis” and “impersonator,” use the following URL:

Owen is ready to build a REST request

Since Owen’s goal is to scour YouTube for alien abduction videos that might have Fang in them, a keyword search makes the most sense as the type of REST request to submit to YouTube. There are lots of different keyword combinations that could be used to search for possible Fang videos, but one in particular will help home in on videos related specifically to Fang:

While you probably wouldn’t reference the title of a book series when carrying out a normal YouTube video search, it just so happens to be a good idea in this particular case. Let’s just say it’s a coincidence that a lot of alien abduction videos have been made by Head First fans! With a REST request URL in hand, Owen can scratch off Step 1 of the YouTube video syndication process.

Build a request for YouTube videos.
Note
The first step is knocked out thanks to the YouTube request URL.
Issue the video request to YouTube.
Receive YouTube’s response data containing information about the videos.
Process the response data and format it as HTML code.

We can, we just need a PHP function that allows us to submit a REST request and receive a response.

Note

The SimpleXML extension to PHP, which offers the simplexml_load_file() function, was added to PHP in version 5. So prior versions of PHP don’t have built-in support for XML processing.

The built-in PHP function simplexml_load_file() lets us submit REST requests that result in XML responses, such as YouTube requests/responses. The function actually loads an XML document into a PHP object, which we can then use to drill down into the XML data and extract whatever specific information is needed. So how does that impact Owen’s YouTube video request? Check out this code, which creates a constant to hold a YouTube URL, and then issues a REST request using the simplexml_load_file() function:

Relax

Don’t sweat it if you don’t know what an object is, especially in the context of PHP.

A PHP object is a special data type that allows data to be packaged together with functions in a single construct. All you need to know for now is that it’s much easier to process XML data in PHP by using objects. You’ll learn more about how this is possible in just a bit.

Oh, but there is! The XML code returned by YouTube isn’t really as messy as it looks... you just have to know where to look.

YouTube speaks XML

The video response from YouTube isn’t exactly a DVD packaged up in a shiny box and delivered to your front door. No, it’s an XML document containing detailed information about the videos you requested, not the videos themselves.

YouTube responds to video requests with XML data that describes the videos.

The unusual XML code uses namespaces and entities, which help organize tags and encode special characters.

When you see an XML tag that has two names separated by a colon, you’re looking at a namespace, which is a way of organizing a set of tags into a logical group. The purpose of namespaces is to keep tags with the same name from clashing when multiple XML vocabularies are used in the same document. As an example, consider the following two XML tags:

Without the media namespace in the second <title> tag, it would be impossible to tell the two tags apart if they appeared in the same XML code. So you can think of a namespace as a surname for tags—it helps keep an XML document full of “first names” from clashing by hanging a “last name” on related tags. The YouTube response code uses several different namespaces, which means it is using several different XML languages at once—namespaces allow us to clearly tell them apart.

To ensure uniqueness, an XML namespace is always associated with a URL. For example, the media namespace used in YouTube XML data is established within the <feed> tag like this:

Namespaces are named groups of XML tags, while entities are used to encode special characters within XML documents.

The other strange thing in the YouTube XML code is &, which is XML’s way of representing the ampersand character (&). This is an XML entity, a symbolic way of referencing a special character, such as &, <, or >, all of which have special meaning within XML code. Following are the five predefined XML entities that you will likely encounter as you delve deeper into XML code:

Deconstruct a YouTube XML response

Once you get to know the structure of a YouTube response, extracting the video data you need is pretty straightforward. In addition to understanding what tags and attributes store what data, it’s also important to understand how the tags relate to one another. If you recall from earlier in the chapter when analyzing an RSS feed, an XML document can be viewed as a hierarchy of elements. The same is true for the XML data returned in a YouTube video response.

One important clue toward understanding the video data buried in this XML code is the different namespaces being used. The media namespace accompanies most of the tags specifically related to video data, while the yt namespace is used solely with the <statistics> tag. Finally, comments are enclosed within the <comments> tag, which falls under the gd namespace. These namespaces will matter a great deal when you begin writing PHP code to find specific tags and their data.

Visualize the XML video data

Earlier in the chapter when working with RSS code, it was revealed that an XML document can be visualized as a hierarchy of elements (tags) that have a parent-child relationship. This relationship becomes increasingly important as you begin to process XML code and access data stored within it. In fact, it can be an invaluable skill to be able to look at an XML document and immediately visualize the relationship between the elements. Just remember that any element enclosed within another element is a child, and the enclosing element is its parent. Working through the XML code for the YouTube video on the facing page results in the following visualization.

An element is just an abstract way of thinking of an XML tag and the data it contains.

The significance of this hierarchy of elements is that you can navigate from any element to another by tracing its path from the top of the hierarchy. So, for example, if you wanted to obtain the title of the video, you could trace its path like this:

Access XML data with objects

There are lots of different ways to work with XML data with PHP, and one of the best involves objects. An object is a special PHP data type that combines data and functions into a single construct. But what does that have to do with XML? The entire hierarchy of elements in an XML document is contained within a single variable, an object. You can then use the object to drill down into the data and access individual elements. Objects also have methods, which are functions that are tied to an object, and let us further manipulate the object’s data. For an object that contains XML data, methods let us access the collection of child elements for an element, as well as its attributes.

Objects are a special PHP data type that combine data and functions together.

You’ve already seen how to create this XML object for Owen’s alien abduction YouTube keyword search:

This code results in a variable named $xml that contains all of the XML YouTube video data packaged into a PHP object. To access the data you use object properties, which are individual pieces of data stored within an object. Each property corresponds to an XML element. Take a look at the following example, which accesses all of the entry elements in the document:

This code accesses all the entry elements in the XML data using a property. Since there are multiple entry elements in the data, the $entries variable stores an array of objects that you can use to access individual video entries. And since we’re now dealing with an array, each video <entry> tag can be accessed by indexing the array. For example, the first <entry> tag in the document is the first item in the array, the second tag is the second item, etc.

From XML elements to PHP objects

When it comes to XML data and PHP objects, you’re really dealing with a collection of objects. Remember that stuff about visualizing an XML document as a hierarchy of elements? Well, that same hierarchy is realized as a collection of objects in PHP. Take a look:

This element hieararchy/object collection stuff forms the basis of understanding how to dig through XML data in PHP. With the relationship between individual pieces of XML data in mind, it becomes possible to write code that navigates through the data. Then we can isolate the content stored in a particular tag or attribute down deep in an XML document.

Drill into XML data with objects

Getting back to Owen, our goal is to pull out a few pieces of information for videos that are returned as part of the XML YouTube response. We know how to retrieve the XML data into a PHP object using the simplexml_load_file() function, but most of the interesting data is found down deeper in this data. How do we navigate through the collection of objects? The answer is the -> operator, which is used to reference a property or method of an object. In the case of an XML object, the -> operator accesses each child object. So this code displays the title of a video entry stored in a variable named $entry:

This code relies heavily on the relationship between the title, group, and entry objects, which form a parent-child relationship from one to the next.

The -> operator references a child object from a parent object. So title is a child of group, which is a child of entry. Remember that the -> operator can be used to access both properties and methods. One method that comes in particularly handy is the attributes() method, which is able to pluck out the value of an XML attribute for a given element.

This code drills down to the duration element and then grabs all of its attributes and stores them in the $attrs variable, which is an array of all the attributes. The value of the seconds attribute is then retrieved from the array.

Not without a namespace!

There’s a small problem with the code on the facing page that accesses XML data using objects, and it has to do with namespaces. If you recall, namespaces act as surnames for tags by organizing them into meaningful collections. So in a YouTube response, the <duration> tag is actually coded as <yt:duration>, and the title for a video is coded as <media:title>, not <title>. When an element is associated with a namespace, you can’t just reference it by tag name in PHP code. Instead, you have to first isolate it by namespace by calling the children() method on the parent object.

Namespaces make it a bit trickier to access elements within XML data.

This code retrieves all the child objects of the video entry whose namespace is http://search.yahoo.com/mrss/. But that’s the URL for a namespace, not the namespace itself. This URL is located in the <feed> tag at the start of the XML document. This is where you’ll find all the namespaces being used.

This code reveals how each namespace is associated with a URL. More specifically, it shows how the media and yt namespaces are specified for use in the document. This is all you need to find tags related to these two namespaces.

Once you’ve isolated the child elements for a particular namespace by calling the children() method on the parent element, you can then resume accessing child objects with the -> operator. For example, this code obtains the video title from the <media:group> tag:

Use the children() method to isolate all elements associated with a namespace.

Fang sightings are on the rise

While Owen has been busy brushing up on XML and figuring out how to communicate with YouTube, Fang has been busy. Numerous video sightings have turned up with the little guy apparently serving as a tour guide for his alien abductors. Owen is ready to finish up the YouTube script, get some videos showing on the Aliens Abducted Me home page, and find his lost dog.

The good news is that Owen is almost finished with the YouTube script. In fact, all that’s left is to finish processing the XML data and format it as HTML code.

Build a request for YouTube videos.
Issue the video request to YouTube.
Receive YouTube’s response data containing information about the videos.
Process the response data and format it as HTML code.
Note
Knock out this step and the youtube.php script will be done!

Draw how you would format YouTube response data as videos along the bottom of the main Aliens Abducted Me page:

Lay out videos for viewing

The idea behind the youtube.php script is that it will be included in the main index.php script for Aliens Abducted Me. This means that the youtube.php script needs to take care of submitting a video request, processing the XML response, and formatting the individual videos so that they are displayed via HTML in such a way that they can coexist with the alien abduction reports that are already on the main page. A good way to accomplish this is to arrange the videos horizontally along the bottom of the page.

The youtube.php script will be included so that the videos appear just below the alien abduction reports.

Arranging the videos horizontally on the main page keeps them from detracting too much from the alien abduction reports. Also, we’re talking about arranging the video thumbnail images, not the videos themselves, so users will have to click a thumbnail to visit YouTube and see the actual video. It would eat up too much screen real estate to attempt to show multiple videos large enough to be embedded directly on the Aliens Abducted Me page.

Format video data for display

Although a video thumbnail image is certainly one of the most important pieces of information when assessing whether or not a video is worth watching, it isn’t the only data useful for Owen’s YouTube script. For example, the title of a video could hold some important information about the nature of the video—like whether it might include a dog. The length of the video could also be helpful. And of course, we need the URL of the video link to YouTube so that the user can click on a video thumbnail to actually view a video. So the following information is what we need to extract from the XML data in the YouTube response:

This data forms the basis for the HTML code that displays a horizontal row of videos. In fact, each video in the row ends up looking like this:

In the YouTube response data, the length of a video is specified in the seconds attribute of the <yt:duration> tag. Unfortunately, most people don’t think in terms of total seconds because we’re accustomed to times being specified in minutes and seconds. For example, it isn’t immediately obvious that 330 seconds is a five-and-a-half-minute video—you have to do the math for the value to make sense as a length of time. Knowing this, it’s a good idea to go ahead and do the math for users when displaying the length of a video, converting seconds into minutes and seconds.

Geek Bits

It isn’t necessary to factor in hours in the video length calculation because YouTube doesn’t currently allow videos longer than 10 minutes to be posted.

That is, unless you’re part of the YouTube Director program, in which case you can post videos longer than 10 minutes.

Your PHP & MySQL Toolbox

With Fang now accounted for, it’s possible to reflect a little on what all it took to track him down. As it turns out, PHP and MySQL required some help from a few other technologies.

The End

Q:	Q: Why is RSS so much better than someone just coming to my web site?
A:	A: If people regularly visited your web site to seek out the latest content, then RSS wouldn’t be any better than simply displaying content on your web site. But most people forget about web sites, even ones they like. So RSS provides an effective means of taking your web content directly to people, as opposed to requiring them to seek it out.
Q:	Q: What does RSS stand for?
A:	A: Nowadays RSS stands for Really Simple Syndication. Throughout its storied history there have been several different versions, but the latest incarnation of RSS (version 2.0) stands for Really Simple Syndication, which is all you need to worry about.
Q:	Q: So what does RSS consist of?
A:	A: RSS is a data format. So just as HTML is a data format that allows you to describe web content for viewing in a web browser, RSS is a data format that describes web content that is accessible as a news feed. Similar to HTML, the RSS data format is pure text, and consists of tags and attributes that are used to describe the content in a newsfeed.
Q:	Q: Where do I get an RSS reader?
A:	A: Most web browsers have a built-in RSS reader. Some email clients even include RSS readers, in which case RSS news items appear as email messages in a special news feed folder. There are also stand-alone RSS readers available.

<rss>	This tag has nothing to do with RSS. But it sure sounds like a cool name for a piece of news data!
<channel>	The publication date is an important piece of information for any news item, and this tag is used to specify it.
<cronkite>	This tag represents a single channel in an RSS feed, and acts as a container for descriptive data and individual news items.
<title>	Represents an individual news item, or story, which is further described by child elements.
<language>	This tag always contains a URL that serves as the link for a channel or news item.
<link>	Encloses an entire RSS feed—all other tags must appear inside of this tag.
<description>	This tag stores the title of a channel or news item, and is typically used within the `<channel>` and `<item>` tags.
<pubDate>	Used to provide a brief description of a channel or news item, appearing within either the `<channel>` and `<item>` tags.
<item>	This tag applies to a channel, and specifies the language used by the channel, such as `en-us` (U.S. English).

Q:	Q: Is XML case-sensitive?
A:	A: Yes, the XML language is case-sensitive, so it matters whether text is uppercase or lowercase when specifying XML tags and attributes. A good example is the RSS `<pubDate>` tag, which must appear in mixed case with the capital D. Most XML tags are either all lowercase or mixed-case.
Q:	Q: What about whitespace? How does it fit into XML?
A:	A: First of all, whitespace in XML consists of carriage returns (`\r`), newlines (`\n`), tabs (`\t`), and spaces (`' '`). The majority of whitespace in most XML documents is purely for aesthetic formatting purposes, such as indenting child tags. This “insignificant” whitespace is typically ignored by applications that process XML data, such as RSS news readers. However, whitespace that appears inside of a tag is considered “significant,” and is usually rendered exactly as it appears. This is what allows things like poems that have meaningful spacing to be accurately represented in XML.
Q:	Q: Can an RSS feed contain images?
A:	A: Yes. Just keep in mind that not every news reader is able to display images. Also, in RSS 2.0 you can only add an image to a channel, not individual news items. You add an image to a channel using the `<image>` tag, which must appear inside the `<channel>` tag. Here’s an example: <image> <url>http://www.aliensabductedme.com/fang.jpg</url> <title>My dog Fang</title> <link>http://www.aliensabductedme.com</link> </image> It is technically possible to include an image in a news item in RSS 2.0; the trick is to use the HTML `<img>` tag within the description of the item. While this is possible, it requires you to encode the HTML tag using XML entities, and in many ways, it goes against the premise of an RSS item being pure text content.

Q:	Q: How is REST different than, say, a GET request?
A:	A: It’s not. Any time you’ve used a GET request, such as simply requesting a web page, you’re using REST. You can think of a normal web page as a REST resource in that it can be accessed via a URL, and GET is the REST “action” used to access the resource. Where REST gets more interesting is when it is used to build queries, such as YouTube video requests. In this case you’re still dealing with REST requests but they are querying a database for data instead of simply requesting a static web page.
Q:	Q: Does the order of arguments matter when performing a YouTube keyword search?
A:	A: Yes. The first keywords are given a higher precedence than later keywords, so make sure to list them in order of decreasing importance.
Q:	Q: When there are multiple matches for a video search, how does YouTube determine what videos to return?
A:	A: YouTube keyword video requests return videos based on search relevance, meaning that you will get the videos that best match the keywords, regardless of when the videos were posted to YouTube.

Q:	Q: Why do I even need to worry about namespaces?
A:	A: Because XML code generated by others often involves namespaces, which affects how you access XML elements programmatically. As you’re about to find out, the namespace associated with an element directly affects how you find the element when writing PHP code that processes XML data. So the namespace must be factored into code that is attempting to grab the data for a given element.
Q:	Q: How do I know if a tag is part of a namespace?
A:	A: Although it’s possible to have a default namespace that doesn’t explicitly appear in the code for a tag, in most cases you’ll see the namespace right there in the tag name, so the tag is coded as `<media:title>` instead of just `<title>`. The name to the left of the colon is always the namespace.