Input Filters

An input filter is called when input is required. It is handed a brigade to fill, a mode parameter (blocking, nonblocking, or peek), and the number of bytes to read, where 0 means “read a line.” Most input filters will, of course, call the filter below them to get data, process it in some way, and then fill the brigade with the resulting data.
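The calling convention can be illustrated with a small standalone model in plain C. Nothing in it comes from httpd or APR. `toy_filter`, `toy_mode`, `TOY_EOF` and the functions are hypothetical, and a flat character buffer stands in for the brigade. The bottom filter serves bytes from a fixed string, so blocking and nonblocking reads behave identically here. A byte count of 0 asks for one line, and peek only reports whether data is available without consuming it.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the httpd types; this is not the httpd API. */
typedef enum { TOY_BLOCKING, TOY_NONBLOCKING, TOY_PEEK } toy_mode;
enum { TOY_SUCCESS = 0, TOY_EOF = 1 };

typedef struct toy_filter toy_filter;
struct toy_filter {
    /* Fill 'out' (capacity 'cap'); on return *nbytes holds the count. */
    int (*fn)(toy_filter *f, char *out, size_t cap,
              toy_mode mode, size_t *nbytes);
    toy_filter *next;   /* the filter below this one */
    const char *src;    /* used only by the bottom filter: unread input */
};

/* Bottom of the chain: serves bytes from a fixed string. The data is
   already in memory, so blocking and nonblocking behave the same here. */
static int toy_source(toy_filter *f, char *out, size_t cap,
                      toy_mode mode, size_t *nbytes)
{
    size_t avail = strlen(f->src);
    size_t want;

    if (avail == 0)
        return TOY_EOF;
    if (mode == TOY_PEEK) {          /* only report that data is there */
        *nbytes = 0;
        return TOY_SUCCESS;
    }
    if (*nbytes == 0) {              /* 0 means "read a line" */
        const char *nl = strchr(f->src, '\n');
        want = nl ? (size_t)(nl - f->src) + 1 : avail;
    } else {
        want = *nbytes < avail ? *nbytes : avail;
    }
    if (want > cap)
        want = cap;
    memcpy(out, f->src, want);
    f->src += want;                  /* consume what was handed out */
    *nbytes = want;
    return TOY_SUCCESS;
}

/* A filter higher up: it asks the one below for data, in the same mode. */
static int toy_passthrough(toy_filter *f, char *out, size_t cap,
                           toy_mode mode, size_t *nbytes)
{
    return f->next->fn(f->next, out, cap, mode, nbytes);
}
```

With `toy_filter src = { toy_source, NULL, "GET\nbody" }` and `toy_filter top = { toy_passthrough, &src, NULL }`, calling `top.fn(&top, buf, sizeof buf, TOY_BLOCKING, &n)` with `n` set to 0 returns the first line, `"GET\n"`. A peek that follows leaves `"body"` in place for the next real read.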

As with output filters, the filter must be registered:

ap_register_input_filter("filter name", filter_function, AP_FTYPE_CONTENT);

where the parameters are as described earlier for output filters. Note that there is currently no attempt to avoid collisions in filter names, which is probably a mistake. As with output filters, you have to insert the filter at the right moment. The procedure is the same as before, except that the functions say “input” instead of “output,” of course.

Naturally, input filters are similar to, but not the same as, output filters. The differences are probably simplest to illustrate with an example. The following filter converts the case of request data (only the body, not the headers, so to see anything happen you need to make a POST request). It should be available in modules/experimental/mod_case_filter_in.c. (Note that the comments follow the line(s) of code to which they refer.)

#include "httpd.h"
#include "http_config.h"
#include "apr_general.h"
#include "util_filter.h"
#include "apr_buckets.h"
#include "http_request.h"

#include <ctype.h>
#include <stdlib.h>

As always, we start with the headers we need.

static const char s_szCaseFilterName[]="CaseFilter";

And then we see the name of the filter. Note that this is the same name that the example output filter uses. That is fine, because there is never any ambiguity between input and output filters.

module case_filter_in_module;

This is just the usual required forward declaration.

typedef struct
{
    int bEnabled;
} CaseFilterInConfig;

This structure records whether the filter is enabled.

typedef struct
{
    apr_bucket_brigade *pbbTmp;
} CaseFilterInContext;

Unlike the output filter, we need a context — this is to hold a temporary bucket brigade. We keep it in the context to avoid recreating it each time we are called, which would be inefficient.

static void *CaseFilterInCreateServerConfig(apr_pool_t *p,server_rec *s)
{
    CaseFilterInConfig *pConfig=apr_pcalloc(p,sizeof *pConfig);

    pConfig->bEnabled=0;

    return pConfig;
}

This is just standard code to create the server config structure (note that apr_pcalloc() sets the whole structure to zeros anyway, so the explicit initialization of bEnabled is redundant, but useful for documentation purposes).

static void CaseFilterInInsertFilter(request_rec *r)
{
    CaseFilterInConfig *pConfig=ap_get_module_config(r->server->module_config,
                                                     &case_filter_in_module);
    CaseFilterInContext *pCtx;

    if(!pConfig->bEnabled)
        return;

If the filter is enabled (by the CaseFilterIn directive), then...

    pCtx=apr_palloc(r->pool,sizeof *pCtx);
    pCtx->pbbTmp=apr_brigade_create(r->pool);

Create the filter context discussed previously, and...

    ap_add_input_filter(s_szCaseFilterName,pCtx,r,NULL);

insert the filter. Note that because of where we’re hooked, this happens after the request headers have been read.

}

Now we move on to the actual filter function.

static apr_status_t CaseFilterInFilter(ap_filter_t *f,
                                       apr_bucket_brigade *pbbOut,
                                       ap_input_mode_t eMode,
                                       apr_size_t *pnBytes)
{
    CaseFilterInContext *pCtx=f->ctx;

First we get the context we created earlier.

    apr_status_t ret;

    ap_assert(APR_BRIGADE_EMPTY(pCtx->pbbTmp));

Because we’re reusing the temporary bucket brigade each time we are called, it’s a good idea to ensure that it’s empty — it should be impossible for it not to be, hence the use of an assertion instead of emptying it.

    ret=ap_get_brigade(f->next,pCtx->pbbTmp,eMode,pnBytes);

Get the next filter down to read some input, using the same parameters as we got, except it fills the temporary brigade instead of ours.

    if(eMode == AP_MODE_PEEK || ret != APR_SUCCESS)
        return ret;

If we are in peek mode, all we have to do is return success if there is data available. Since the next filter down has to do the same, and we only have data if it has, we can simply return at this point. This may not be true for more complex filters, of course! Also, if there was an error in the next filter, we should return now regardless of mode.

    while(!APR_BRIGADE_EMPTY(pCtx->pbbTmp)) {

Now we loop over all the buckets read by the filter below.

        apr_bucket *pbktIn=APR_BRIGADE_FIRST(pCtx->pbbTmp);
        apr_bucket *pbktOut;
        const char *data;
        apr_size_t len;
        char *buf;
        apr_size_t n;

        // It is tempting to do this...
        //APR_BUCKET_REMOVE(pbktIn);
        //APR_BRIGADE_INSERT_TAIL(pbbOut,pbktIn);
        // and change the case of the bucket data, but that would be wrong
        // for a file or socket buffer, for example...

As the comment says, the approach above would be tempting. We could also use a hybrid: move buckets whose data is allocated in memory and copy buckets that refer to external resources. This would make the code considerably more complex, though it might be more efficient as a result.

        if(APR_BUCKET_IS_EOS(pbktIn)) {
            APR_BUCKET_REMOVE(pbktIn);
            APR_BRIGADE_INSERT_TAIL(pbbOut,pbktIn);
            continue;
        }

When we encounter an EOS (end-of-stream) bucket, we simply pass it on.

        ret=apr_bucket_read(pbktIn,&data,&len,
                            eMode == AP_MODE_BLOCKING ? APR_BLOCK_READ
                                                      : APR_NONBLOCK_READ);
        if(ret != APR_SUCCESS)
            return ret;

Again, we read the bucket in the same mode in which we were called (which, at this point, is either blocking or nonblocking, but definitely not peek) to ensure that we don’t block if we shouldn’t, and do if we should.

        buf=malloc(len);
        if(buf == NULL && len != 0)
            return APR_ENOMEM;
        for(n=0 ; n < len ; ++n)
            buf[n]=toupper((unsigned char)data[n]);

We allocate the new buffer on the heap, because it will be consumed and destroyed by the layers above us — if we used a pool buffer, it would last as long as the request does, which is likely to be wasteful of memory.

        pbktOut=apr_bucket_heap_create(buf,len,0,NULL);

As always, the bucket for the buffer needs to have a matching type (note that we could ask the bucket to copy the data onto the heap, but we don’t).

        APR_BRIGADE_INSERT_TAIL(pbbOut,pbktOut);

Add the new bucket to the output brigade.

        apr_bucket_delete(pbktIn);

And delete the one we got from below.

    }

    return APR_SUCCESS;

If we get here, everything must have gone fine, so return success.

}
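For readers who want to experiment with the loop outside Apache, the following standalone C sketch replays its logic on a toy brigade. `toy_bucket`, `toy_brigade` and the helper functions are hypothetical stand-ins that only mimic the shape of the APR types; they are not the APR API. Every toy bucket already holds its bytes in memory, so there is no read step and no blocking. The sketch shows EOS being moved across unchanged, each data bucket being converted into a new malloc()ed buffer that the output brigade then owns, and the original being deleted. As a result, the scratch brigade is empty again when the loop ends, which is what the assertion at the top of the real filter relies on.

```c
#include <ctype.h>
#include <stdlib.h>

/* Hypothetical toy types; they only mirror the shape of the APR ones. */
enum { TOY_SUCCESS = 0, TOY_ENOMEM = 1 };

typedef struct toy_bucket {
    int is_eos;                 /* end-of-stream marker */
    char *data;                 /* malloc()ed bytes owned by the bucket */
    size_t len;
    struct toy_bucket *next;
} toy_bucket;

typedef struct { toy_bucket *head, *tail; } toy_brigade;

static toy_bucket *toy_remove_first(toy_brigade *bb)
{
    toy_bucket *b = bb->head;

    bb->head = b->next;
    if (bb->head == NULL)
        bb->tail = NULL;
    b->next = NULL;
    return b;
}

static void toy_insert_tail(toy_brigade *bb, toy_bucket *b)
{
    b->next = NULL;
    if (bb->tail != NULL)
        bb->tail->next = b;
    else
        bb->head = b;
    bb->tail = b;
}

static void toy_bucket_delete(toy_bucket *b)
{
    free(b->data);              /* the bucket frees the buffer it owns */
    free(b);
}

/* 'tmp' plays the scratch brigade kept in the context, 'out' the caller's. */
static int toy_case_loop(toy_brigade *tmp, toy_brigade *out)
{
    while (tmp->head != NULL) {
        toy_bucket *in = tmp->head;
        toy_bucket *up;
        size_t n;

        if (in->is_eos) {       /* pass EOS on unchanged */
            toy_insert_tail(out, toy_remove_first(tmp));
            continue;
        }
        up = calloc(1, sizeof *up);
        if (up == NULL)
            return TOY_ENOMEM;
        up->data = malloc(in->len ? in->len : 1);
        if (up->data == NULL) {
            free(up);
            return TOY_ENOMEM;
        }
        for (n = 0; n < in->len; ++n)   /* convert into the new buffer */
            up->data[n] = (char)toupper((unsigned char)in->data[n]);
        up->len = in->len;
        toy_insert_tail(out, up);       /* 'out' now owns the new buffer */
        toy_bucket_delete(toy_remove_first(tmp));   /* drop the original */
    }
    return TOY_SUCCESS;
}
```

On an allocation failure the toy returns while the unconverted bucket is still in `tmp`, which mirrors the early `return ret;` paths in the real filter.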

static const char *CaseFilterInEnable(cmd_parms *cmd, void *dummy, int arg)
{
    CaseFilterInConfig *pConfig
      =ap_get_module_config(cmd->server->module_config,&case_filter_in_module);
    pConfig->bEnabled=arg;

    return NULL;
}

This simply sets the Boolean enable flag in the configuration for this module. Note that we’ve used per-server configuration, but we could equally well use per-directory configuration, since the filter is added only after the request has been mapped to its configuration.

static const command_rec CaseFilterInCmds[] = 
{
    AP_INIT_FLAG("CaseFilterIn", CaseFilterInEnable, NULL, RSRC_CONF,
                 "Run an input case filter on this host"),

Associate the configuration command with the function that sets it.

    { NULL }
};


static void CaseFilterInRegisterHooks(apr_pool_t *p)
{
    ap_hook_insert_filter(CaseFilterInInsertFilter,NULL,NULL,APR_HOOK_MIDDLE);

Register our function with the insert_filter hook. It is called after the request headers have been processed, but before any response is written or the request body is read.

    ap_register_input_filter(s_szCaseFilterName,CaseFilterInFilter,
                             AP_FTYPE_RESOURCE);

This is a convenient point to register the filter.

}

module case_filter_in_module =
{
    STANDARD20_MODULE_STUFF,
    NULL,
    NULL,
    CaseFilterInCreateServerConfig,
    NULL,
    CaseFilterInCmds,
    CaseFilterInRegisterHooks
};

Finally, we associate the various functions with the correct slots in the module structure. Incidentally, some people prefer to put the module structure at the beginning of the source — I prefer the end because it avoids having to predeclare all the functions used in it.
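The hybrid approach mentioned in the commentary on the loop (move buckets whose data is in memory, copy buckets that refer to external resources) can be sketched with another standalone toy. `toy_bucket`, `toy_kind` and `toy_hybrid_convert()` are hypothetical and not part of APR. A real implementation would have to inspect APR bucket types. It would also have to make sure that in-memory data is not shared with another bucket (for example, after a split or copy) before modifying it in place, which this toy does not attempt.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical toy bucket; these names are not part of APR. */
typedef enum { TOY_MEMORY, TOY_EXTERNAL } toy_kind;

typedef struct {
    toy_kind kind;
    char *data;            /* TOY_MEMORY: private heap bytes we may modify */
    const char *backing;   /* TOY_EXTERNAL: stands in for a file or socket */
    size_t len;
} toy_bucket;

static void toy_upper_in_place(char *p, size_t len)
{
    size_t n;

    for (n = 0; n < len; ++n)
        p[n] = (char)toupper((unsigned char)p[n]);
}

/* Hybrid strategy: convert a memory bucket in place (so it can then simply
   be moved to the output brigade), but first copy an external bucket's bytes
   into a new heap buffer, so the file or socket data is never modified.
   Returns 0 on success, -1 if the copy cannot be allocated. */
static int toy_hybrid_convert(toy_bucket *b)
{
    if (b->kind == TOY_EXTERNAL) {
        char *copy = malloc(b->len ? b->len : 1);

        if (copy == NULL)
            return -1;
        memcpy(copy, b->backing, b->len);
        b->kind = TOY_MEMORY;      /* from now on it holds private heap data */
        b->data = copy;
        b->backing = NULL;
    }
    toy_upper_in_place(b->data, b->len);
    return 0;
}
```

The saving is that memory buckets cost no allocation or copy. The price is the extra branching, plus the sharing check that a real version would need.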