In addition to reading files, Splunk can listen on network ports. In inputs.conf, the stanzas take the following form:
[protocol://<remote host>:<local port>]
The remote host portion is rarely used, but the idea is that you can specify different input configurations for specific hosts. The usual stanzas look like this:
- [tcp://1234]: Listen on port 1234 for TCP connections. Any host can connect to this port and send data in.
- [tcp-ssl://importanthost:1234]: Listen on TCP using SSL, and apply this stanza to the importanthost host. Splunk will generate self-signed certificates the first time it is launched.
- [udp://514]: This is generally used to receive syslog events. While this does work, it is generally considered a best practice to use a dedicated syslog receiver, such as rsyslog or syslog-ng. See Chapter 12, Advanced Deployments, for a discussion on this subject.
- [splunktcp://9997] or [splunktcp-ssl://9997]: In a distributed environment, your indexers will receive events from forwarders on the specified port. splunktcp is a custom protocol used between Splunk instances. This stanza is created for you when you go to Settings | Forwarding and receiving | Receive data and configure your Splunk instance to receive data forwarded from other Splunk instances.
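As a rough illustration of how the stanza forms above might sit together in inputs.conf, here is a minimal sketch. The port numbers and importanthost come from the examples above. The file location in the comment, and the idea of pairing a general stanza with a host-specific one on the same port, are assumptions for illustration only.

```ini
# inputs.conf (sketch; for example under $SPLUNK_HOME/etc/system/local/)

# Plain TCP listener on port 1234, open to any remote host
[tcp://1234]

# Settings placed here would apply only to connections from importanthost
[tcp://importanthost:1234]

# Syslog over UDP (a dedicated syslog receiver is usually preferable)
[udp://514]

# Receive data forwarded from other Splunk instances
[splunktcp://9997]
```

Only the stanza headers are shown here. In practice, each stanza would carry its own attributes.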
For TCP and UDP inputs, the following attributes apply:
- source: If it is not specified, the source will default to protocol:port, for instance, udp:514.
- sourcetype: If it is not specified, sourcetype will also default to protocol:port, but this is generally not what you want. It is best to specify a source type and create a corresponding stanza in props.conf.
- connection_host: With network inputs, choosing what value to capture for host is somewhat tricky. Your options essentially are:
  - connection_host = dns uses reverse DNS to determine the hostname from the incoming connection. When reverse DNS is configured properly, this is usually your best bet. This is the default setting for TCP inputs; UDP inputs default to ip.
  - connection_host = ip sets the host field to the IP address of the remote machine. This is your best choice when reverse DNS is unreliable.
  - connection_host = none uses the hostname of the Splunk instance receiving the data. This option can make sense when all traffic is going to an interim host.
  - host = foo sets the hostname statically.
  - It is also common to reset the value of the host using a transform, for instance, with syslog events. This happens after parsing, though, so it is too late to change things such as time zone based on the host.
- queueSize: This value specifies how much memory Splunk is allowed to set aside for an input queue. A common use for a queue is to capture spiky data until the indexers can catch up.
- persistentQueueSize: This value specifies a persistent queue that can be used to capture data to the disk if the in-memory queue fills up.

If you find yourself building a particularly complicated setup around network ports, I would encourage you to talk to Splunk support as there may be a better way to accomplish your goals.
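To make the attributes above more concrete, the following is a minimal sketch of how they might be combined. The source type name my_app_events, the transform name my_app_set_host, the queue sizes, and the assumption that events contain a token such as host=web01 are all hypothetical and chosen only for illustration.

```ini
# inputs.conf (sketch; all values are illustrative assumptions)
[tcp://1234]
# Hypothetical source type; a matching stanza is defined in props.conf below
sourcetype = my_app_events
# Use the sender's IP address as host (useful when reverse DNS is unreliable)
connection_host = ip
# Size of the in-memory input queue
queueSize = 1MB
# Spill to disk if the in-memory queue fills up
persistentQueueSize = 100MB

[udp://514]
sourcetype = syslog
connection_host = ip

# props.conf
[my_app_events]
# Hypothetical class name and transform name
TRANSFORMS-sethost = my_app_set_host

# transforms.conf
[my_app_set_host]
# Assumes each event contains a token like "host=web01"; adjust for your data
REGEX = \bhost=(\S+)
DEST_KEY = MetaData:Host
FORMAT = host::$1
```

The transform overwrites the host value that connection_host assigned, but it does so after parsing, so settings that depend on the host, such as time zone, are still resolved against the original value.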