A SAX filter sits between the parser and the client
application and intercepts the messages that these two objects pass to
each other. It can pass these messages unchanged or modify, replace,
or block them. To a client application, the filter looks like a
parser, that is, an XMLReader. To the parser, the filter looks like a
client application, that is, a ContentHandler.
SAX filters are implemented by subclassing the org.xml.sax.helpers.XMLFilterImpl
class.[1] This class implements all the required interfaces of SAX
for both parsers and client applications. Its signature is as
follows (XMLFilter itself extends XMLReader, so the parser side is covered too):
public class XMLFilterImpl implements XMLFilter, EntityResolver, DTDHandler, ContentHandler, ErrorHandler
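As a quick check of this dual role, the following minimal sketch tests a plain XMLFilterImpl instance against both interfaces. The class name FilterRoles and its two helper methods are hypothetical and exist only for this illustration.

```java
import org.xml.sax.ContentHandler;
import org.xml.sax.XMLFilter;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical demo class; the name FilterRoles is not part of SAX.
class FilterRoles {

  // True if the object can be handed to client code as a parser.
  static boolean looksLikeParser(Object o) {
    return o instanceof XMLReader;
  }

  // True if the object can be handed to a parser as a content handler.
  static boolean looksLikeClient(Object o) {
    return o instanceof ContentHandler;
  }

  public static void main(String[] args) {
    XMLFilter filter = new XMLFilterImpl();
    System.out.println("parser side: " + looksLikeParser(filter));
    System.out.println("client side: " + looksLikeClient(filter));
  }
}
```

Both checks succeed for any subclass of XMLFilterImpl as well, which is why a filter can be dropped in wherever an XMLReader or a ContentHandler is expected.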
Your own filters will extend this class and override those
methods that correspond to the messages you want to filter. For
example, if you wanted to filter out all processing instructions, you would write a filter that
would override the processingInstruction()
method to do nothing, as shown in Example 20-5.
Example 20-5. A SAX filter that removes processing instructions
import org.xml.sax.helpers.XMLFilterImpl;

public class ProcessingInstructionStripper extends XMLFilterImpl {

  public void processingInstruction(String target, String data) {
    // Because this does nothing, processing instructions read in the
    // document are *not* passed to client application
  }

}
If instead you wanted to replace a processing instruction with
an element whose name was the same as the processing instruction's
target and whose text content was the processing instruction's data,
you'd call the startElement(), characters(), and endElement()
methods from inside the processingInstruction() method
after filling in the arguments with the relevant data from the
processing instruction, as shown in Example 20-6.
Example 20-6. A SAX filter that converts processing instructions to elements
import org.xml.sax.*;
import org.xml.sax.helpers.*;

public class ProcessingInstructionConverter extends XMLFilterImpl {

  public void processingInstruction(String target, String data)
   throws SAXException {

    // AttributesImpl is an adapter class in the org.xml.sax.helpers
    // package for precisely this case. We don't really want to add any
    // attributes here, but we need to pass something as the fourth
    // argument to startElement().
    Attributes emptyAttributes = new AttributesImpl();

    // We won't use any namespace for the element
    startElement("", target, target, emptyAttributes);

    // Convert the String data to a char array. SAX passes null when
    // the processing instruction has no data, so guard against that.
    char[] text = (data == null) ? new char[0] : data.toCharArray();
    characters(text, 0, text.length);

    endElement("", target, target);

  }

}
We applied this filter to Example 20-2 before passing it into a program
that echoes an XML document onto System.out and were a little surprised to
see this come out:
<xml-stylesheet>type="text/css" href="person.css"</xml-stylesheet>
<person xmlns="http://xml.oreilly.com/person">
  <name:name xmlns:name="http://xml.oreilly.com/name">
    <name:first>Sydney</name:first>
    <name:last>Lee</name:last>
  </name:name>
  <assignment project_id="p2"></assignment>
</person>
This document is not well-formed! The specific problem is that there are two independent root elements. However, on further consideration, that's really not too surprising. Well-formedness checking is normally done by the underlying parser when it reads the text form of an XML document. SAX filters should, but are not absolutely required to, provide well-formed XML data to client applications. Indeed, they can produce substantially more malformed data than this by including start-tags that are not matched by end-tags; text that contains illegal characters, such as the formfeed or the vertical tab; and XML names that contain non-name characters, such as * and §. You need to be very careful before assuming data you receive from a filter is valid or well-formed.
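One way to act on that caution is to place a small checking handler between the filter and the real client code. The sketch below is a hypothetical guard (the class name StructureGuard is ours) that rejects only two of the problems mentioned above: a second root element and start-tags and end-tags that do not match. It makes no attempt to check characters or names.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical guard; it checks element structure only, not full
// well-formedness.
class StructureGuard extends DefaultHandler {

  private final Deque<String> open = new ArrayDeque<>();
  private int roots = 0;

  @Override
  public void startDocument() {
    // Reset state so the same guard can be reused across documents.
    open.clear();
    roots = 0;
  }

  @Override
  public void startElement(String uri, String localName, String qName,
   Attributes atts) throws SAXException {
    // An element that starts while no other element is open is a root.
    if (open.isEmpty() && ++roots > 1) {
      throw new SAXException("More than one root element: " + qName);
    }
    open.push(qName);
  }

  @Override
  public void endElement(String uri, String localName, String qName)
   throws SAXException {
    // The end-tag must match the most recently opened element.
    if (open.isEmpty() || !open.pop().equals(qName)) {
      throw new SAXException("Unmatched end-tag: " + qName);
    }
  }

  @Override
  public void endDocument() throws SAXException {
    if (!open.isEmpty()) {
      throw new SAXException("Unclosed element: " + open.peek());
    }
  }
}
```

Fed the event stream shown above, this guard would throw a SAXException at the start of the person element, because xml-stylesheet has already been opened and closed as a root.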
If you want to pass a message along without filtering it, or you
want to invoke the same method in the underlying handler after modifying
its arguments, you can prefix a call to it with the super keyword. This
invokes the variant of the method from the superclass. By default, each
callback method in XMLFilterImpl just passes the same arguments to the
equivalent method of the handler installed in the filter. Example
20-7 demonstrates with a filter that changes all character data
to uppercase by overriding the characters() method.
Example 20-7. A SAX filter that converts text to uppercase
import org.xml.sax.*;
import org.xml.sax.helpers.*;

public class UpperCaseFilter extends XMLFilterImpl {

  public void characters(char[] text, int start, int length)
   throws SAXException {

    String temp = new String(text, start, length);
    temp = temp.toUpperCase();
    text = temp.toCharArray();
    super.characters(text, 0, text.length);

  }

}
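The same super call also makes selective filtering straightforward: block the messages you don't want and forward everything else untouched. The following sketch is a hypothetical variation (the class name and the choice of the xml-stylesheet target are assumptions made for illustration) that drops only one kind of processing instruction.

```java
import org.xml.sax.SAXException;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical filter: blocks one processing instruction target and
// forwards all others unchanged.
class StylesheetPIStripper extends XMLFilterImpl {

  @Override
  public void processingInstruction(String target, String data)
   throws SAXException {
    if ("xml-stylesheet".equals(target)) {
      return; // blocked: never reaches the client application
    }
    // Everything else takes the default route to the installed handler.
    super.processingInstruction(target, data);
  }
}
```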
Using a filter involves these steps:
1. Create a filter object, normally by invoking its own constructor.
2. Create the XMLReader that will actually parse the document, normally by calling XMLReaderFactory.createXMLReader().
3. Attach the filter to the parser using the filter's setParent() method.
4. Install a ContentHandler in the filter.
5. Parse the document by calling the filter's parse() method.
Details can vary a little from application to application. For
instance, you might install other handlers besides the ContentHandler
or change the parent between documents. However, once the filter has
been attached to the underlying XMLReader, you should
not directly invoke any methods on this underlying parser; you should
only talk to it through the filter. For example, this is how you'd use
the filter in Example 20-7 to parse a document:
XMLFilter filter = new UpperCaseFilter();
filter.setParent(XMLReaderFactory.createXMLReader());
filter.setContentHandler(yourContentHandlerObject);
filter.parse(document);
Notice specifically that you invoke the filter's parse()
method, not the underlying parser's parse() method.
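For readers who want to run the five steps end to end, here is a minimal self-contained sketch. It makes several assumptions that are not in the text above: the underlying parser is obtained through JAXP's SAXParserFactory rather than XMLReaderFactory, the document is read from an in-memory string, and the names FilterDemo, UpperCaser, TextCollector, and filteredText are hypothetical. UpperCaser repeats the technique of Example 20-7 only so that the sketch compiles on its own.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical end-to-end demo of the five steps.
class FilterDemo {

  // Same technique as Example 20-7, repeated to keep this self-contained.
  static class UpperCaser extends XMLFilterImpl {
    @Override
    public void characters(char[] text, int start, int length)
     throws SAXException {
      char[] upper
       = new String(text, start, length).toUpperCase().toCharArray();
      super.characters(upper, 0, upper.length);
    }
  }

  // Client-side handler that collects all character data it receives.
  static class TextCollector extends DefaultHandler {
    final StringBuilder text = new StringBuilder();
    @Override
    public void characters(char[] ch, int start, int length) {
      text.append(ch, start, length);
    }
  }

  static String filteredText(String xml) throws Exception {
    // Step 1: create the filter.
    XMLFilterImpl filter = new UpperCaser();
    // Step 2: create the underlying parser (via JAXP, an assumption).
    SAXParserFactory factory = SAXParserFactory.newInstance();
    factory.setNamespaceAware(true);
    XMLReader parser = factory.newSAXParser().getXMLReader();
    // Step 3: attach the filter to the parser.
    filter.setParent(parser);
    // Step 4: install the client's ContentHandler in the filter.
    TextCollector collector = new TextCollector();
    filter.setContentHandler(collector);
    // Step 5: parse through the filter, never through the parser directly.
    filter.parse(new InputSource(new StringReader(xml)));
    return collector.text.toString();
  }

  public static void main(String[] args) throws Exception {
    String xml = "<name><first>Sydney</first> <last>Lee</last></name>";
    System.out.println(filteredText(xml));
  }
}
```

Run as written, main() prints SYDNEY LEE: the client handler only ever sees the uppercased text that the filter forwards.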
[1] There's also an org.xml.sax.XMLFilter interface. However, this
interface is arranged exactly backward for most use cases. It filters
messages from the client application to the parser, but not the much
more important messages from the parser to the client application.
Furthermore, implementing the XMLFilter interface directly requires a
lot more work than subclassing XMLFilterImpl. Experienced SAX
programmers almost always subclass XMLFilterImpl rather than
implementing XMLFilter directly.