Subsearching
A subsearch is a search pipeline that is used as an argument to another search or command. In Splunk, subsearches are enclosed in square brackets and are evaluated first. Think of a subsearch as being similar to a SQL subquery (a SQL query nested inside a larger query).
Subsearches are mainly used for three purposes:
- To parameterize one search using the output of another search
- To run a separate search and stitch its output to the first search using the append command
- To create a conditional search, where you see the results of your search only if they meet the criteria or threshold set by the subsearch
Generally, you use a subsearch to take the results of one search and use them in another search, all in a single Splunk search pipeline. Because the subsearch's output is handed to the outer search, the outer search (or command) must be able to accept that output as an argument, as the append command does.
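To make the evaluation order concrete, here is a minimal Python sketch (not Splunk code) of the parameterization pattern. It assumes a tiny hypothetical in-memory event list. The inner function plays the role of the bracketed subsearch and runs first, finding the busiest day of the month. Its output rows then become filters for the outer function, which keeps only error events from that day. Rows are OR-ed together and fields within a row are AND-ed. All names here (EVENTS, inner_search, outer_search) are illustrative only.

```python
from collections import Counter

# Hypothetical events; a real search would read these from an index.
EVENTS = [
    {"sourcetype": "TM1_server", "date_mday": 3, "_raw": "ERROR lock timeout"},
    {"sourcetype": "TM1_server", "date_mday": 3, "_raw": "INFO login ok"},
    {"sourcetype": "TM1_audit", "date_mday": 3, "_raw": "ERROR rule failed"},
    {"sourcetype": "TM1_server", "date_mday": 7, "_raw": "ERROR disk full"},
    {"sourcetype": "access_combined", "date_mday": 7, "_raw": "GET /cart"},
]


def inner_search(events):
    """Subsearch stand-in: return the single most common date_mday as one row."""
    counts = Counter(e["date_mday"] for e in events)
    value, _count = counts.most_common(1)[0]
    return [{"date_mday": value}]


def outer_search(events, sourcetype_prefix, term, filter_rows):
    """Main search stand-in: apply its own terms plus the subsearch's rows."""

    def matches(event):
        # Rows are OR-ed; fields inside one row are AND-ed.
        return any(
            all(event.get(field) == value for field, value in row.items())
            for row in filter_rows
        )

    return [
        e
        for e in events
        if e["sourcetype"].startswith(sourcetype_prefix)
        and term in e["_raw"]
        and matches(e)
    ]


# The inner search runs first; its output parameterizes the outer search.
rows = inner_search(EVENTS)
hits = outer_search(EVENTS, "TM1", "ERROR", rows)
print(rows)                       # [{'date_mday': 3}]
print([e["_raw"] for e in hits])  # ['ERROR lock timeout', 'ERROR rule failed']
```

The design point is only the ordering: the outer search cannot be built until the inner one has produced its rows.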
Some examples of subsearching are as follows:
- Parameterization: Consider the following code:
sourcetype=TM1* ERROR [search earliest=-30d | top limit=1 date_mday | fields + date_mday]
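The preceding Splunk search uses a subsearch to parameterize a search of all the TM1 logs indexed in the Splunk instance for error events. The subsearch (enclosed in square brackets) runs first. It looks at the past 30 days, finds the day of the month (date_mday) that has the most events, and returns only that field. Splunk turns this result into a filter of the form date_mday=<value>, so the main search (which looks for the ERROR character string in all the data of the sourcetype TM1*) returns only the errors logged on that day of the month. Note that earliest=-30d applies only to the subsearch; the main search uses whatever time range you select.
- Appending: Splunk's append command can be used to append the results of a subsearch to the results of the current search:
sourcetype=TM1* ERROR | stats dc(date_year), count by sourcetype | append [search sourcetype=TM1* | top 1 sourcetype by date_year]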
The preceding Splunk search uses a subsearch with the append command to combine two searches of the TM1 server logs. The main search looks through all the indexed TM1 sources for ERROR events and, for each sourcetype, counts the number of distinct years (dc(date_year)) and the number of events. The subsearch returns the top (most active) TM1 sourcetype for each year. The results of the subsearch are then appended to the results of the main search.
- Conditions: Consider the following code:
sourcetype=access_* | where action="addtocart" | stats dc(clientip), count by method | append [search sourcetype=access_* | where action="addtocart" | top 1 clientip by method]
The preceding Splunk search counts the number of different IP addresses that accessed the web server. It also finds the user (client IP address) that accessed the web server the most for each type of page request (method). The search was modified with the where clause to limit the counts to addtocart actions only. In other words, it finds which user added the most to their online shopping cart, whether or not they actually purchased anything. Note that the where command treats single-quoted strings as field names, so the value addtocart must be enclosed in double quotes.
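To understand the preceding search better, we can dissect it into smaller sections as follows:
- sourcetype=access_*: retrieves the web server access events
- stats dc(clientip), count by method: for each method, counts the distinct client IP addresses and the total number of events
- append [search ...]: runs the bracketed subsearch separately and adds its results after the results of the main search
- the subsearch: finds the client IP address that appears most often for each method
- the where clause: limits the events to addtocart actions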
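A Python model of the addtocart search, step by step, may help show that append simply stacks the subsearch's rows under the main search's rows rather than joining them on a field. This is a sketch under assumptions. The ACCESS events are hypothetical, and the real top command also returns a percent field, which is omitted here.

```python
from collections import Counter, defaultdict

# Hypothetical web access events (only the fields the search uses).
ACCESS = [
    {"clientip": "10.0.0.1", "method": "GET", "action": "addtocart"},
    {"clientip": "10.0.0.1", "method": "GET", "action": "addtocart"},
    {"clientip": "10.0.0.2", "method": "GET", "action": "addtocart"},
    {"clientip": "10.0.0.3", "method": "POST", "action": "addtocart"},
    {"clientip": "10.0.0.2", "method": "GET", "action": "view"},
]


def main_search(events):
    """Models: where action="addtocart" | stats dc(clientip), count by method"""
    ips = defaultdict(set)
    counts = Counter()
    for e in events:
        if e["action"] != "addtocart":
            continue
        ips[e["method"]].add(e["clientip"])
        counts[e["method"]] += 1
    return [
        {"method": m, "dc(clientip)": len(ips[m]), "count": counts[m]}
        for m in sorted(counts)
    ]


def subsearch(events):
    """Models: where action="addtocart" | top 1 clientip by method (no percent)"""
    per_method = defaultdict(Counter)
    for e in events:
        if e["action"] == "addtocart":
            per_method[e["method"]][e["clientip"]] += 1
    rows = []
    for m in sorted(per_method):
        ip, n = per_method[m].most_common(1)[0]
        rows.append({"method": m, "clientip": ip, "count": n})
    return rows


def append(main_rows, sub_rows):
    """Models append: subsearch rows go after the main rows; nothing is joined."""
    return main_rows + sub_rows


for row in append(main_search(ACCESS), subsearch(ACCESS)):
    print(row)
```

Because the two result sets have different fields, the combined table has gaps where a row lacks a field, which is what you would expect from appending rather than joining.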
Output settings for subsearches
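When performing a Splunk subsearch, you will often use the format command, which takes the results of a subsearch and formats them into a single result: a search string that the outer search can use. Splunk also applies format implicitly to subsearch output.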
Depending on the search pipeline, a subsearch might return a large number of results, which can hurt the performance of your search. To remedy this, you can limit the number of results that the format command operates on by appending the following to the end of your subsearch:
| format maxresults=<integer>
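The following Python function is a rough imitation of the default output shape of format. It is written to show what maxresults does: it keeps only the first N result rows before building the search string. It is an assumption-laden toy, not Splunk's implementation. The exact spacing and the handling of empty results in the real command may differ, and treating maxresults=0 as "no limit" is an assumption here.

```python
def format_results(rows, maxresults=0):
    """Toy imitation of format's default output shape (not Splunk's code).

    maxresults=0 is treated as "no limit" (an assumption in this sketch).
    """
    if maxresults > 0:
        rows = rows[:maxresults]  # keep only the first N result rows
    if not rows:
        # The real command emits a placeholder string here; this toy does not.
        return ""
    row_strings = []
    for row in rows:
        # Fields within one row are joined with AND ...
        columns = " AND ".join(f'{field}="{value}"' for field, value in row.items())
        row_strings.append(f"( {columns} )")
    # ... and rows are joined with OR, all wrapped in one outer pair of parentheses.
    return "( " + " OR ".join(row_strings) + " )"


results = [{"date_mday": 15}, {"date_mday": 16}, {"date_mday": 17}]
print(format_results(results))
print(format_results(results, maxresults=1))  # ( ( date_mday="15" ) )
```

The fewer rows survive the cap, the shorter the generated search string, which is where the performance benefit comes from.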
From a Splunk administrator's perspective, a more conservative approach is to use Splunk's limits.conf file to enforce limits on your subsearches.
The default copy of this file is in the $SPLUNK_HOME/etc/system/default/ folder. Do not edit that copy. Instead, put your changes in a copy in the $SPLUNK_HOME/etc/system/local/ folder (create the file if it does not exist). Settings there take precedence over the defaults. The file sets limits for many kinds of Splunk searches and also contains a stanza specific to subsearches, titled [subsearch]. Within this stanza, there are three important settings:
- maxout: The maximum number of results to return from a subsearch. In current Splunk releases, the default is 10000.
- maxtime: The maximum number of seconds to run a subsearch before finalizing it. The default is 60.
- ttl: The time, in seconds, to cache a given subsearch's results. The default is 300.
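As a mental model only, the three settings can be pictured as in the following Python sketch. Here maxout caps how many rows are kept, maxtime stops collecting once the time budget is spent, and ttl lets a repeated subsearch reuse its cached rows. The class, its names, and the injectable clock (used so the behavior can be tested) are hypothetical. This is not Splunk's implementation.

```python
import time
from typing import Callable, Dict, Iterable, List, Tuple

Row = Dict[str, object]


class SubsearchRunner:
    """Toy model of the [subsearch] limits; not Splunk's implementation."""

    def __init__(self, maxout: int, maxtime: float, ttl: float,
                 clock: Callable[[], float] = time.monotonic):
        self.maxout = maxout    # max rows kept from one subsearch
        self.maxtime = maxtime  # seconds allowed before finalizing
        self.ttl = ttl          # seconds a cached result stays valid
        self.clock = clock
        self._cache: Dict[str, Tuple[float, List[Row]]] = {}

    def run(self, query: str, produce: Callable[[], Iterable[Row]]) -> List[Row]:
        now = self.clock()
        cached = self._cache.get(query)
        if cached is not None and now - cached[0] < self.ttl:
            return cached[1]  # ttl: reuse the earlier results
        results: List[Row] = []
        for row in produce():
            if len(results) >= self.maxout:
                break  # maxout: enough rows collected
            if self.clock() - now > self.maxtime:
                break  # maxtime: finalize with what we have
            results.append(row)
        self._cache[query] = (now, results)
        return results
```

Passing a fake clock makes the time-based rules deterministic, which is why the sketch does not call time.monotonic directly inside run.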
The following is a sample [subsearch] stanza from a limits.conf file:
[subsearch]
maxout = 250
maxtime = 120
ttl = 400
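If you want to sanity-check such a stanza with a script, Python's standard configparser module can read it, as in this minimal sketch. It assumes the stanza uses only simple key = value lines. Splunk's .conf syntax is close to, but not guaranteed identical with, the INI dialect that configparser accepts. The function name read_subsearch_limits is hypothetical.

```python
import configparser

# The sample stanza from above, as a string.
LIMITS_CONF = """
[subsearch]
maxout = 250
maxtime = 120
ttl = 400
"""


def read_subsearch_limits(text: str) -> dict:
    """Parse the [subsearch] stanza and return its three settings as integers."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    section = parser["subsearch"]
    return {key: section.getint(key) for key in ("maxout", "maxtime", "ttl")}


print(read_subsearch_limits(LIMITS_CONF))
# {'maxout': 250, 'maxtime': 120, 'ttl': 400}
```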
Search Job Inspector
After running a Splunk search, you can click on the Job menu and select Inspect Job to open the Search Job Inspector dialog.
In the Search Job Inspector dialog, you can view a summary of the returned events and the search execution costs. Under Search job properties, you can also scroll down to the remoteSearch property to see the actual search query that resulted from expanding your subsearch.
The Search Job Inspector can help you find performance bottlenecks in your search pipeline, such as which component has the greatest "cost" (takes the most time). By breaking down the behavior of your searches, it helps you understand how to optimize them.