Using a lookup with wildcards

Splunk lookups also support wildcards, which we can use in this case.

One advantage is that we can define arbitrary fields for grouping, independent of the values of url.

For a lookup wildcard to work, first we need to set up our url field and the lookup:

  1. Extract the url field. The rex pattern we used before should work:

\s[A-Z]+\s(?P<url>.*?)\s

See Chapter 5, Tables, Charts, and Fields, for detailed instructions on setting up a field extraction. Don't forget to set permissions on the extraction.

  2. Create our lookup file. Let's call the lookup file flatten_summary_lookup.csv. Use the following contents for our example log:

url,section
/about/*,about
/contact/*,contact
/*/*,unknown_non_root
/*,root
*,nomatch

If you create your lookup file in Excel on a Mac, be sure to save the file using the Windows comma-separated values (.csv) format.

  3. Upload the lookup table file and create our lookup definition and automatic lookup. See the Using lookups to enrich the data section in Chapter 7, Extending Search, for detailed instructions. The automatic lookup definition should look like the following screenshot (the value of Name doesn't matter):
  4. Set the permissions on all the objects. I usually opt for All Apps for Lookup table files and Lookup definitions, and This app only for Automatic lookups. See Chapter 7, Extending Search, for details.
  5. Edit transforms.conf. In this version (actually as of Splunk 4.3), not all the features of lookups can be defined through the admin interface. To access these features, the configuration files that actually drive Splunk must be edited manually.

We will cover configuration files in greater detail in Chapter 11, Configuring Splunk, but for now, let's add two lines to one file and move on:

  1. Edit $SPLUNK_HOME/etc/apps/is_app_one/local/transforms.conf. The name of the directory is_app_one may be different depending on what app was active when you created your lookup definition. If you can't find this file, check your permissions and the app column in the admin interface.
  2. You should see these two lines, or something similar, depending on what you named your Lookup table file and Lookup definition instances:
[flatten_summary_lookup] 
filename = flatten_summary_lookup.csv 

If you do not see these lines in this file, check your permissions.

  3. Add two more lines below filename:
match_type = WILDCARD(url) 
max_matches = 1 

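These two lines effectively say the following: match_type = WILDCARD(url) tells Splunk to honor wildcard characters in the url column of the lookup file when matching the url field (without this setting, matches are always exact), and max_matches = 1 tells it to stop at the first matching row, so that the lookup behaves like a case statement.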

If everything is wired up properly, we should now be able to run the search:

sourcetype=impl_splunk_web | stats count by section 

This should give us the following simple report:

To see in greater detail what is really going on, let's try the following search:

sourcetype=impl_splunk_web
| rex field=url "(?P<url>.*)\?"
| stats count by section url

The rex statement is included to remove the query string from the value of url created by our extracted field. This gives us the following report:
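The effect of that rex statement can be checked outside Splunk. The following is a minimal Python sketch, not Splunk's own implementation: it applies the same pattern with Python's re module, assuming the question mark in the pattern is escaped (\?) so that it matches a literal ? character. The function name strip_query_string and the sample values are invented for illustration. The sketch also assumes that, as with rex, a value that does not match the pattern is left unchanged.

```python
import re

# Same pattern as the rex statement, with the question mark escaped so that
# it matches a literal "?" rather than making the group optional.
QUERY_STRING = re.compile(r"(?P<url>.*)\?")


def strip_query_string(url):
    """Return url without its query string; leave it unchanged if there is none."""
    match = QUERY_STRING.search(url)
    # Assumption: like rex, keep the original value when the pattern
    # does not match (that is, when there is no "?" in the value).
    return match.group("url") if match else url


# Hypothetical sample values, not taken from the example log.
for sample in ["/products/x/?q=1", "/about/", "/foo?a=1&b=2"]:
    print(sample, "->", strip_query_string(sample))
```

Because .* is greedy, everything up to the last ? is kept, which is enough for URLs that contain a single query string.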

Looking back at our lookup file, our matches appear to be as follows:

url            pattern      section
/about/        /about/*     about
/contact/      /contact/*   contact
/bar           /*           root
/foo           /*           root
/products/     /*/*         unknown_non_root
/products/x/   /*/*         unknown_non_root
/products/y/   /*/*         unknown_non_root
If you read the lookup file from top to bottom, the first pattern that matches wins.
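This first-match-wins behavior can be sketched in a few lines of Python. It is only an illustration of the matching rules described above, not how Splunk implements lookups: it assumes that * in a pattern matches any run of characters (including /) and that the rows are tried in file order. The names wildcard_to_regex and lookup_sections are invented for this example.

```python
import re

# Rows of flatten_summary_lookup.csv, in file order.
LOOKUP_ROWS = [
    ("/about/*", "about"),
    ("/contact/*", "contact"),
    ("/*/*", "unknown_non_root"),
    ("/*", "root"),
    ("*", "nomatch"),
]


def wildcard_to_regex(pattern):
    """Translate a lookup pattern into a regex where * matches any characters."""
    literal_parts = (re.escape(part) for part in pattern.split("*"))
    return re.compile(".*".join(literal_parts), re.DOTALL)


def lookup_sections(url, max_matches=1):
    """Return the sections of the first max_matches rows whose pattern matches url."""
    sections = []
    for pattern, section in LOOKUP_ROWS:
        # fullmatch: the pattern must cover the whole value, not just a prefix.
        if wildcard_to_regex(pattern).fullmatch(url):
            sections.append(section)
            if len(sections) >= max_matches:
                break
    return sections


for url in ["/about/", "/contact/", "/bar", "/foo", "/products/", "/products/x/"]:
    print(url, lookup_sections(url))

# With more matches allowed, /about/ also matches the broader patterns below it.
print(lookup_sections("/about/", max_matches=10))
```

The last call shows why the order of the rows matters: /about/ also matches /*/*, /*, and *, so the more specific patterns have to come first for max_matches = 1 to pick the intended section.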