Splunk provides an extensive HTTP REST interface, which allows searching, adding data, adding inputs, managing users, and more. Documentation and SDKs are provided by Splunk at http://dev.splunk.com/.
To get an idea of how this REST interaction happens, let's walk through a sample conversation to run a query and retrieve the results. The steps are essentially as follows:
- Start the query (POST)
- Poll for status (GET)
- Retrieve results (GET)
We will use the command-line program curl to illustrate these steps. The SDKs make this interaction much simpler.
The command to start a query is as follows:
curl -u user:pass -k https://yourserver:8089/services/search/jobs -d "search=search query"
This sends an HTTP POST whose body is search=search query. If you are familiar with HTTP, you will recognize this as a standard POST from an HTML form.
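As a rough illustration of what that form-style POST looks like outside of curl, the following sketch builds the request with Python's standard library without sending it. The helper name build_search_request is hypothetical, and the host and credentials are the placeholders from the curl command above. Note that urlencode writes a space as +, which is an equally valid form encoding.

```python
import base64
from urllib.parse import urlencode
from urllib.request import Request


def build_search_request(base_url, user, password, query):
    """Build (but do not send) the POST that starts a search job.

    Hypothetical helper for illustration; it is not part of any Splunk SDK.
    """
    # Form-encode the body the same way an HTML form would.
    body = urlencode({"search": query}).encode("ascii")
    # Equivalent of curl's -u user:pass (HTTP basic authentication).
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return Request(
        base_url + "/services/search/jobs",
        data=body,  # supplying a body makes urllib issue a POST
        headers={"Authorization": "Basic " + token},
    )


req = build_search_request("https://yourserver:8089", "user", "pass", "search query")
print(req.get_method(), req.full_url)
print(req.data)
```

Sending the request would additionally require an SSL context that skips certificate verification, which is what curl's -k flag does.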
To run the query earliest=-1h index="_internal" warn | stats count by host, we need to URL-encode the query. The command, then, is as follows:
$ curl -u admin:changeme -k \
    https://localhost:8089/services/search/jobs \
    -d "search=search%20earliest%3D-1h%20index%3D%22_internal%22%20warn%20%7C%20stats%20count%20by%20host"
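Encoding the query by hand is error-prone. As a small sketch, Python's urllib.parse.quote can produce the same percent-encoded string; passing safe="" is a choice made here so that every reserved character is escaped.

```python
from urllib.parse import quote

# The query from the text, prefixed with the "search" command.
query = 'search earliest=-1h index="_internal" warn | stats count by host'

# safe="" percent-encodes every reserved character, including "=", '"' and "|".
encoded = quote(query, safe="")
print("search=" + encoded)
```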
If the query is accepted, we will receive XML code that contains our search ID:
<?xml version='1.0' encoding='UTF-8'?>
<response><sid>1352061658.136</sid></response>
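A script would normally extract the search ID rather than copy it by hand. The sketch below parses the response shown above with Python's standard xml.etree.ElementTree module; the variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

# The response body from the text, as bytes (as it would arrive over HTTP).
response_body = (
    b"<?xml version='1.0' encoding='UTF-8'?>\n"
    b"<response><sid>1352061658.136</sid></response>"
)

# <sid> is a direct child of the root <response> element.
sid = ET.fromstring(response_body).findtext("sid")

# The job URL is the jobs endpoint followed by the search ID.
job_url = "https://localhost:8089/services/search/jobs/" + sid
print(job_url)
```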
The contents of <sid> are then used to reference this job. To check the status of our job, we run the following command:
curl -u admin:changeme -k https://localhost:8089/services/search/jobs/1352061658.136
This returns a large document with copious amounts of information about our job, as follows:
<entry ...>
  <title>search earliest=-1h index="_internal" warn | stats count by host</title>
  <id>https://localhost:8089/services/search/jobs/1352061658.136</id>
  ...
  <link href="/services/search/jobs/1352061658.136/events" rel="events"/>
  <link href="/services/search/jobs/1352061658.136/results" rel="results"/>
  ...
  <content type="text/xml">
    <s:dict>
      ...
      <s:key name="doneProgress">1.00000</s:key>
      ...
      <s:key name="eventCount">67</s:key>
      ...
      <s:key name="isDone">1</s:key>
      ...
      <s:key name="resultCount">1</s:key>
Interesting fields include doneProgress, eventCount, resultCount, and the field we are most interested in at this point, isDone. If isDone is not 1, we should wait and poll again later. Once isDone=1, we can retrieve our results from the URL specified in <link rel="results">.
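The wait-and-poll logic can be sketched as follows. The functions job_keys and wait_until_done are hypothetical names, and the canned responses stand in for real GET requests. The namespace URIs in the sample document are assumptions, because the listing above elides them, so the code matches on the local tag name key to avoid depending on them.

```python
import time
import xml.etree.ElementTree as ET


def job_keys(status_xml):
    """Collect every <s:key name="..."> value from a job status document."""
    keys = {}
    for element in ET.fromstring(status_xml).iter():
        # Compare on the local tag name so the namespace URI does not matter.
        if element.tag.rsplit("}", 1)[-1] == "key" and "name" in element.attrib:
            keys[element.get("name")] = (element.text or "").strip()
    return keys


def wait_until_done(fetch_status, interval=0.5, max_polls=120):
    """Poll until isDone is 1; fetch_status is any callable returning the XML."""
    for _ in range(max_polls):
        keys = job_keys(fetch_status())
        if keys.get("isDone") == "1":
            return keys
        time.sleep(interval)
    raise TimeoutError("search job did not finish in time")


# Canned responses standing in for two GET requests (namespace URIs are assumed).
TEMPLATE = (
    '<entry xmlns="http://www.w3.org/2005/Atom" '
    'xmlns:s="http://dev.splunk.com/ns/rest"><content type="text/xml"><s:dict>'
    '<s:key name="doneProgress">{progress}</s:key>'
    '<s:key name="isDone">{done}</s:key>'
    '<s:key name="resultCount">{count}</s:key>'
    "</s:dict></content></entry>"
)
responses = iter([
    TEMPLATE.format(progress="0.50000", done="0", count="0"),
    TEMPLATE.format(progress="1.00000", done="1", count="1"),
])

final = wait_until_done(lambda: next(responses), interval=0)
print(final["isDone"], final["resultCount"])
```

Passing the fetch step in as a callable keeps the polling loop independent of how the HTTP request is made, and the max_polls limit prevents a stuck job from blocking the caller forever.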
To retrieve our results, we make the following call:
curl -u admin:changeme -k https://localhost:8089/services/search/jobs/1352061658.136/results
This returns the following XML output:
<?xml version='1.0' encoding='UTF-8'?>
<results preview='0'>
  <meta>
    <fieldOrder>
      <field>host</field>
      <field>count</field>
    </fieldOrder>
  </meta>
  <result offset='0'>
    <field k='host'>
      <value><text>vlb.local</text></value>
    </field>
    <field k='count'>
      <value><text>67</text></value>
    </field>
  </result>
</results>
The list of fields is contained in meta/fieldOrder. Each result will then follow this field order.
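To make the relationship between meta/fieldOrder and the individual results concrete, here is a minimal sketch that turns the XML above into a list of dictionaries. The function name parse_results is hypothetical. The sketch reads only the first <value> of each field, so a multivalued field would lose its extra values.

```python
import xml.etree.ElementTree as ET

RESULTS_XML = b"""<?xml version='1.0' encoding='UTF-8'?>
<results preview='0'>
  <meta>
    <fieldOrder>
      <field>host</field>
      <field>count</field>
    </fieldOrder>
  </meta>
  <result offset='0'>
    <field k='host'>
      <value><text>vlb.local</text></value>
    </field>
    <field k='count'>
      <value><text>67</text></value>
    </field>
  </result>
</results>"""


def parse_results(results_xml):
    """Return (field_order, rows), with each row keyed in field order."""
    root = ET.fromstring(results_xml)
    field_order = [field.text for field in root.findall("meta/fieldOrder/field")]
    rows = []
    for result in root.findall("result"):
        # Map each field name (the k attribute) to its first value.
        values = {
            field.get("k"): field.findtext("value/text")
            for field in result.findall("field")
        }
        # Emit the fields in the order announced by meta/fieldOrder.
        rows.append({name: values.get(name) for name in field_order})
    return field_order, rows


field_order, rows = parse_results(RESULTS_XML)
print(field_order)
print(rows)
```

All values arrive as text, so a numeric field such as count has to be converted by the caller.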
Though not necessary (since jobs expire on their own), we can save disk space on our Splunk servers by cleaning up after ourselves. Simply calling the DELETE method on the job URL will delete the results and reclaim the used disk space:
curl -u admin:changeme -k -X DELETE https://localhost:8089/services/search/jobs/1352061658.136
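For comparison with the curl call, a DELETE request can be sketched with Python's standard library as well. The helper build_delete_request is a hypothetical name, and the request is only constructed here, not sent.

```python
import base64
from urllib.request import Request


def build_delete_request(job_url, user, password):
    """Build (but do not send) the DELETE that removes a finished job."""
    # Equivalent of curl's -u user:pass (HTTP basic authentication).
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return Request(
        job_url,
        method="DELETE",  # equivalent of curl's -X DELETE
        headers={"Authorization": "Basic " + token},
    )


delete_req = build_delete_request(
    "https://localhost:8089/services/search/jobs/1352061658.136",
    "admin",
    "changeme",
)
print(delete_req.get_method(), delete_req.full_url)
```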
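To show the Python API in action, here's a simple script:
import sys
import time

import splunk.auth as auth
import splunk.search as search

username = sys.argv[1]
password = sys.argv[2]
q = sys.argv[3]

# Log in and obtain a session key for subsequent calls.
sk = auth.getSessionKey(username, password)

# Start the search job; the query must begin with the "search" command.
job = search.dispatch("search " + q, sessionKey=sk)

# Poll until the job has finished.
while not job.isDone:
    print("Job is still running.")
    time.sleep(.5)

# Print each result as field=value pairs, separated by a divider line.
for r in job.results:
    for f in r.keys():
        print("%s=%s" % (f, r[f]))
    print("----------")

# Cancel the job now that we have read its results.
job.cancel()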
This script uses the Python modules included with Splunk, so we must run it using Splunk's included Python, as follows:
$ /opt/splunk/bin/splunk cmd python simplesearch.py admin changeme 'earliest=-7d index="_internal" warn | timechart count by source'
This produces the following output:
_time=2012-10-31T00:00:00-0500
/opt/splunk/var/log/splunk/btool.log=0
/opt/splunk/var/log/splunk/searches.log=0
/opt/splunk/var/log/splunk/splunkd.log=31
/opt/splunk/var/log/splunk/web_service.log=0
_span=86400
_spandays=1
----------
_time=2012-11-01T00:00:00-0500
/opt/splunk/var/log/splunk/btool.log=56
/opt/splunk/var/log/splunk/searches.log=0
/opt/splunk/var/log/splunk/splunkd.log=87
/opt/splunk/var/log/splunk/web_service.log=2
_span=86400
_spandays=1
----------
...
For more examples and extensive documentation, check out http://dev.splunk.com.