Channel: SecuritySynapse

Spelunking your Splunk – Part IV (User Metrics)

By Tony Lee

Welcome to the fourth article of the Spelunking your Splunk series, all designed to help you understand your Splunk environment at a quick glance.  Here is a quick recap of the previous articles:



This article focuses on understanding the users within the environment, even when they are spread across a search head cluster. We will show that it is possible to check the number of concurrent Splunk users, how much they are searching, successful and failed logins, and aged accounts. This information is useful not only from an accountability perspective, but also from a resource perspective: when a search head (or cluster) becomes overloaded with users, it may be a good time to consider horizontal scaling.

Finding and understanding user information

There are at least two places within Splunk to discover user information. The first requires a RESTful call and provides information about authenticated users. The second is a search against the _audit index filtering on user activity. Try copying and pasting the following two searches into your Splunk search bar one at a time to see what data is returned:

| rest /services/authentication/httpauth-tokens splunk_server=*

Figure 1:  Current authenticated users via httpauth-tokens


index=_audit user=*

Figure 2:  _audit index with a focus on user activity
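If you prefer to work with the same authenticated-user data outside of Splunk's search bar, the httpauth-tokens endpoint can be queried over REST directly. Below is a minimal, hypothetical Python sketch: the counting helper mirrors the "Current Active Users" logic used in the dashboard later in this article, and the commented-out fetch assumes Splunk's default management port (8089) and a valid auth token for your environment.

```python
# Sketch: count distinct active users from Splunk's
# /services/authentication/httpauth-tokens REST endpoint.
# The fetch below is illustrative; adapt host, port, and auth to your setup.
import json
from urllib.request import Request, urlopen

def count_active_users(entries):
    """Count distinct userName values, excluding Splunk's internal user,
    mirroring: | stats dc(userName) with splunk-system-user filtered out."""
    users = {e.get("userName") for e in entries
             if e.get("userName") and e.get("userName") != "splunk-system-user"}
    return len(users)

# Example fetch (assumes default management port 8089 and a bearer token):
# req = Request("https://localhost:8089/services/authentication/"
#               "httpauth-tokens?output_mode=json")
# req.add_header("Authorization", "Bearer <your-token>")
# entries = [e["content"] for e in json.load(urlopen(req))["entry"]]
# print(count_active_users(entries))
```

The helper is deliberately separated from the fetch so the same filtering can be applied to data pulled by any client.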

Now that you understand the basics, the sky is the limit. You can audit each user or display the statistics for all users. Take a look at our dashboard below to see what is possible. If you find it useful, we provide the code for it at the bottom of this article. Give it a try and let us know what you think.

Figure 3:  User Metrics dashboard with all panels



Conclusion

Splunk provides decent visibility into various features within the Monitoring Console / DMC (Distributed Management Console), but we found this flexible, customizable dashboard quite helpful for gaining additional insight.  We hope this helps you too.  Enjoy!


Dashboard XML code


Below is the dashboard code needed to enumerate your user metrics.  Feel free to modify the dashboard as needed:

<form>
  <label>User Metrics</label>
  <description>Displays Interesting Usage Metrics</description>
  <!-- Add time range picker -->
  <fieldset autoRun="true">
    <input type="time" searchWhenChanged="true">
      <default>
        <earliestTime>-24h@h</earliestTime>
        <latestTime>now</latestTime>
      </default>
    </input>
    <input type="text" token="wild">
      <label>Search</label>
      <default>*</default>
      <suffix/>
    </input>
  </fieldset>
  <row>
    <panel>
      <chart>
        <title>Current Active Users</title>
        <search>
          <query>| rest /services/authentication/httpauth-tokens splunk_server=* | where NOT userName="splunk-system-user" | stats dc(userName) AS "Total Users"</query>
          <earliest>$earliest$</earliest>
          <latest>$latest$</latest>
        </search>
        <option name="charting.axisLabelsX.majorLabelStyle.overflowMode">ellipsisNone</option>
        <option name="charting.axisLabelsX.majorLabelStyle.rotation">0</option>
        <option name="charting.axisTitleX.visibility">visible</option>
        <option name="charting.axisTitleY.visibility">visible</option>
        <option name="charting.axisTitleY2.visibility">visible</option>
        <option name="charting.axisX.scale">linear</option>
        <option name="charting.axisY.scale">linear</option>
        <option name="charting.axisY2.enabled">false</option>
        <option name="charting.axisY2.scale">inherit</option>
        <option name="charting.chart">fillerGauge</option>
        <option name="charting.chart.nullValueMode">gaps</option>
        <option name="charting.chart.sliceCollapsingThreshold">0.01</option>
        <option name="charting.chart.stackMode">default</option>
        <option name="charting.chart.style">shiny</option>
        <option name="charting.drilldown">all</option>
        <option name="charting.layout.splitSeries">0</option>
        <option name="charting.legend.labelStyle.overflowMode">ellipsisMiddle</option>
        <option name="charting.legend.placement">right</option>
      </chart>
    </panel>
    <panel>
      <table>
        <title>Current Logged in Users</title>
        <search>
          <query>| rest /services/authentication/httpauth-tokens splunk_server=* | where NOT userName ="splunk-system-user" | stats max(timeAccessed) AS "Latest Activity" by userName | rename userName AS "User" | sort -"Latest Activity"</query>
          <earliest>$earliest$</earliest>
          <latest>$latest$</latest>
        </search>
        <option name="count">10</option>
        <option name="dataOverlayMode">none</option>
        <option name="drilldown">cell</option>
        <option name="rowNumbers">false</option>
        <option name="wrap">true</option>
      </table>
    </panel>
    <panel>
      <table>
        <title>Total Searches</title>
        <search>
          <query>index=_audit user=* (action="search" AND info="granted") | where NOT user ="splunk-system-user" | stats count(action) AS Searches by user | sort - Searches</query>
          <earliest>$earliest$</earliest>
          <latest>$latest$</latest>
        </search>
        <option name="count">10</option>
        <option name="dataOverlayMode">none</option>
        <option name="drilldown">cell</option>
        <option name="rowNumbers">false</option>
        <option name="wrap">true</option>
      </table>
    </panel>
  </row>
  <row>
    <panel>
      <table>
        <title>Successful Logins</title>
        <search>
          <query>index=_audit user=* (action="login attempt" AND info="succeeded") | stats count(action) AS Logins by user | rename user AS User, Logins AS Successes | sort - Successes</query>
          <earliest>$earliest$</earliest>
          <latest>$latest$</latest>
        </search>
        <option name="count">10</option>
        <option name="dataOverlayMode">none</option>
        <option name="drilldown">cell</option>
        <option name="rowNumbers">false</option>
        <option name="wrap">true</option>
      </table>
    </panel>
    <panel>
      <table>
        <title>Failed Logins</title>
        <search>
          <query>index=_audit user=* (action="login attempt" AND info="failed") | stats count(action) AS Logins by user | rename user AS User, Logins AS Failures | sort - Failures</query>
          <earliest>0</earliest>
          <latest></latest>
        </search>
        <option name="count">10</option>
        <option name="dataOverlayMode">none</option>
        <option name="drilldown">cell</option>
        <option name="rowNumbers">false</option>
        <option name="wrap">true</option>
      </table>
    </panel>
    <panel>
      <table>
        <title>Aged Accounts (15 days or older)</title>
        <search>
          <query>index=_audit user=* (action="login attempt" AND info="succeeded") | dedup user | eval age_days=round((now()-_time)/(60*60*24)) | where age_days &gt;= 15 | eval time=strftime(_time, "%m/%d/%Y %H:%M:%S") | table user, time, age_days | sort -age_days</query>
          <earliest>-15d@d</earliest>
          <latest>now</latest>
        </search>
        <option name="wrap">true</option>
        <option name="rowNumbers">false</option>
        <option name="dataOverlayMode">none</option>
        <option name="drilldown">cell</option>
        <option name="count">10</option>
      </table>
    </panel>
  </row>
</form>

Splunk Vulnerability Lookup Tool Using the Qualys Knowledge Base

By Tony Lee

Are you a Splunk + Qualys customer? If so, are you downloading the Qualys Knowledge Base data? Hint: this is usually accomplished by enabling the Qualys TA knowledge base input.  Chances are good that you are, since that data is used by the Qualys Splunk app to map Qualys QID codes to human-readable vulnerability names.

While this is very useful for the Qualys app's dashboards, we took the by-product of that mapping a step further by creating a Vulnerability Lookup dashboard (see Figure 1 below) that analysts can use in a more flexible way, independent of the Qualys scans themselves. This dashboard lets SOC analysts search the knowledge base by QID, vulnerability title, CVE, and even vendor reference numbers such as MS or KB numbers.  Best of all, we included the code at the bottom of the article for anyone to use.  :-)


Figure 1:  Vulnerability Lookup dashboard


Understanding the Data

Once the Knowledge Base data is downloaded to the search head (per Qualys instructions), try searching for it by copying and pasting the following into a Splunk search box:

| inputlookup qualys_kb_lookup

If you see results, you are all set to use the dashboard code at the bottom of the article.

Figure 2:  Sample KB data.  If you see data returned with this query, you should be good to go.
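For readers curious how the dashboard's wildcard tokens behave, here is a small illustrative Python sketch that approximates the filtering with fnmatch patterns. The field names match the lookup columns used in the dashboard code below, but the records and the helper itself are hypothetical, not part of the Qualys TA.

```python
# Sketch: approximate the dashboard's token filtering, where each field
# must match its wildcard pattern (defaults of "*" match everything).
from fnmatch import fnmatch

def search_kb(records, qid="*", title="*", cve="*", vr="*"):
    """Filter Qualys KB records the way the dashboard tokens do."""
    def ok(rec):
        return (fnmatch(str(rec.get("QID", "")), qid)
                and fnmatch(rec.get("TITLE", "").lower(), title.lower())
                and fnmatch(rec.get("CVE", "").lower(), cve.lower())
                and fnmatch(rec.get("VENDOR_REFERENCE", "").lower(), vr.lower()))
    return [r for r in records if ok(r)]

# Usage: search_kb(records, cve="*2017-0272*") returns only records whose
# CVE field contains 2017-0272, just like entering *2017-0272* in the form.
```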


Conclusion

If you are going to spend the time and resources downloading the Qualys Knowledge Base, you might as well benefit twice by getting a handy localized vulnerability lookup tool at no extra cost. We hope this proves useful to others.  Enjoy!



Dashboard Code

<form>
  <label>Vulnerability Lookup</label>
  <description>Enter the known field below</description>
  <!-- Add time range picker -->
  <fieldset autoRun="false" submitButton="true">
    <input type="text" searchWhenChanged="true" token="qid">
      <label>Enter the QID.  ex: 90464</label>
      <default>*</default>
    </input>
    <input type="text" searchWhenChanged="true" token="title">
      <label>Enter the Title.  ex: *August 2017*</label>
      <default>*</default>
    </input>
    <input type="text" searchWhenChanged="true" token="cve">
      <label>Enter the CVE.  ex: *2017-0272*</label>
      <default>*</default>
    </input>
    <input type="text" searchWhenChanged="true" token="vr">
      <label>Enter the Vendor Reference (MS or KB).  ex: *08-067* or *4022747*</label>
      <default>*</default>
    </input>
  </fieldset>
  <row>
    <panel>
      <table>
        <title>Details</title>
        <search>
          <query>| inputlookup qualys_kb_lookup | rename VULN_TYPE as TYPE | table  QID, SEVERITY, TYPE, TITLE, CATEGORY, PATCHABLE, CVSS_BASE, CVSS_TEMPORAL, CVE, VENDOR_REFERENCE, PUBLISHED_DATETIME | fillnull | search TITLE="$title$" QID=$qid$ CVE=$cve$ VENDOR_REFERENCE=$vr$</query>
          <earliest>0</earliest>
          <latest></latest>
        </search>
        <option name="count">10</option>
        <option name="dataOverlayMode">none</option>
        <option name="drilldown">cell</option>
        <option name="rowNumbers">false</option>
        <option name="wrap">true</option>
      </table>
    </panel>
  </row>
</form>

Detecting Data Feed Issues with Splunk - Part II

By Tony Lee

As a Splunk admin, you don’t always control the devices that generate your data. As a result, you may only have control of the data once it reaches Splunk. But what happens when that data stops being sent to Splunk? How long does it take anyone to notice and how much data is lost in the meantime?

We have seen many customers struggle with monitoring and detecting data feed issues, so we figured we would cast some light on the subject. Part I of this series discusses the challenges and the steps required to build a potential solution. We highly recommend a quick read, since it lays the groundwork for the dashboard shown here:  http://www.securitysynapse.com/2017/11/detecting-data-feed-issues-with-splunk.html

In this article, we build on that work and provide a handy dashboard (screenshot shown below) that can be used for heads up awareness.


Figure 1:  Data Feed Monitor dashboard

Dashboard Explanation

The search that generates the percentage drop is similar to the search we created in part I of this series. It looks back over the past two days, calculates each day's worth of traffic, takes the difference, and generates a percentage drop. Any drop over 50% will be displayed as a tile. Notice that we are also excluding a few indexes such as test, main, and lastchanceindex; this can be customized depending on your needs.
 
| tstats prestats=t count where earliest=-2d@d latest=-0d@d index!=lastchanceindex index!=test index=* by index, _time span=1d | timechart useother=false limit=0 span=1d count by index | eval _time=strftime(_time,"%Y-%m-%d") | transpose | rename column AS DataSource, "row 1" AS TwoDaysAgo, "row 2" AS Yesterday | eval PercentageDiff=(100-((Yesterday/TwoDaysAgo)*100)) | where PercentageDiff>50 AND DataSource!="catch_all" | table DataSource, PercentageDiff | eval tmp="anything" | xyseries tmp DataSource PercentageDiff | fields - tmp | sort PercentageDiff
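For clarity, the core arithmetic of that SPL (the eval that produces PercentageDiff and the 50% threshold) can be sketched in plain Python; the function and dictionary names here are hypothetical, chosen only to mirror the SPL field names.

```python
def percentage_drop(two_days_ago, yesterday):
    """Mirror of the SPL eval: 100 - ((Yesterday / TwoDaysAgo) * 100)."""
    if two_days_ago == 0:
        return None  # no baseline to compare against
    return 100 - (yesterday / two_days_ago) * 100

def feeds_over_threshold(counts, threshold=50):
    """counts maps index name -> (two_days_ago, yesterday) event counts.
    Returns only the feeds whose drop exceeds the threshold, like the
    dashboard's `where PercentageDiff > 50` clause."""
    flagged = {}
    for name, (prev, cur) in counts.items():
        drop = percentage_drop(prev, cur)
        if drop is not None and drop > threshold:
            flagged[name] = drop
    return flagged

# Example: a firewall feed falling from 1000 to 400 events is a 60% drop
# and gets flagged; a DNS feed falling from 500 to 450 (10%) does not.
```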


The dashboard code uses a trellis layout in which tiles are dynamically created when the percentage drop exceeds 50%.  Range colors then indicate severity: anything below 50% (which is typically not shown) is green, 50 - 80% is yellow, and over 80% is red.  These can also be customized to fit your needs.

Conclusion

This dashboard can be one more tool to help detect data loss. It is not as real-time as it could be, but making it too real-time can produce false positives when legitimate dips in traffic occur (e.g., employees go home for the day). Because you have the code, you are welcome to adjust it as needed to fit your situation.  Enjoy!

Dashboard Code

<dashboard>
  <label>Data Feed Monitor</label>
  <description>Percentage Drop Shown Below</description>
  <row>
    <panel>
      <single>
        <search>
          <query>| tstats prestats=t count where earliest=-2d@d latest=-0d@d index!=test index!=main index!=lastchanceindex index=* by index, _time span=1d | timechart useother=false limit=0 span=1d count by index | eval _time=strftime(_time,"%Y-%m-%d") | transpose | rename column AS DataSource, "row 1" AS TwoDaysAgo, "row 2" AS Yesterday | eval PercentageDiff=(100-((Yesterday/TwoDaysAgo)*100)) | where PercentageDiff&gt;50 AND DataSource!="catch_all" | table DataSource, PercentageDiff | eval tmp="anything" | xyseries tmp DataSource PercentageDiff | fields - tmp | sort PercentageDiff</query>
          <earliest>-48h@h</earliest>
          <latest>now</latest>
          <sampleRatio>1</sampleRatio>
        </search>
        <option name="colorBy">value</option>
        <option name="colorMode">block</option>
        <option name="drilldown">none</option>
        <option name="numberPrecision">0</option>
        <option name="rangeColors">["0x65a637","0xf58f39","0xd93f3c"]</option>
        <option name="rangeValues">[50,80]</option>
        <option name="refresh.display">progressbar</option>
        <option name="showSparkline">1</option>
        <option name="showTrendIndicator">1</option>
        <option name="trellis.enabled">1</option>
        <option name="trellis.scales.shared">0</option>
        <option name="trellis.size">medium</option>
        <option name="trellis.splitBy">DataSource</option>
        <option name="trendColorInterpretation">standard</option>
        <option name="trendDisplayMode">absolute</option>
        <option name="unit">%</option>
        <option name="unitPosition">after</option>
        <option name="useColors">1</option>
        <option name="useThousandSeparators">1</option>
      </single>
    </panel>
  </row>
</dashboard>

Troubleshooting Windows Account Lockouts with Splunk - Part I

By Tony Lee

Hunting down Windows account lockout issues can be both confusing and infuriating. There are many logs to sift through, and they are not all in one convenient location. In this first article of the series, we will show you one potential method for detecting a lockout issue, one that also points you toward where to look next to determine why the lockout is occurring. As a bonus, we include a handy dashboard at the end of the article to help you start monitoring for lockout issues.

Examining the frequency of Event ID 4740


Detect the problem

Typically the easiest way to detect an account lockout issue in a domain environment is by collecting the Event ID 4740 logs from the domain controllers. Let's examine the contents of a 4740 event using a fictional lockout.

A user account was locked out.

Subject:

   Security ID:  S-1-5-18
   Account Name:  MyFakeDC$
   Account Domain:  MyFakeDomain
   Logon ID:  0x3e7

Account That Was Locked Out:

   Security ID:  MyFakeDomain\John
   Account Name:  John

Additional Information:

   Caller Computer Name: WIN-R9H5Y

The most important takeaways are:

  • In a domain setting, the "Subject" information will be the Domain and DC reporting the lockout
  • The "Account That Was Locked Out" section is self explanatory
  • The Caller Computer Name is where the lockout occurred
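If you ever need to pull those takeaways out of a raw 4740 event body outside of Splunk's own field extractions, a quick illustrative sketch might look like the following. The regular expressions assume the field layout of the sample event above and are hypothetical; real events from other OS versions or locales may differ.

```python
# Sketch: extract the three key takeaways from a raw 4740 event body.
import re

FIELDS = {
    # First "Account Name" after "Subject:" is the reporting DC account.
    "reporting_account": r"Subject:.*?Account Name:\s*(\S+)",
    # First "Account Name" after the lockout section is the locked user.
    "locked_account":    r"Account That Was Locked Out:.*?Account Name:\s*(\S+)",
    # The host where the lockout actually occurred.
    "caller_computer":   r"Caller Computer Name:\s*(\S+)",
}

def parse_4740(event_text):
    """Return a dict of the three fields, or None where a field is absent."""
    out = {}
    for name, pattern in FIELDS.items():
        m = re.search(pattern, event_text, re.DOTALL)
        out[name] = m.group(1) if m else None
    return out
```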


Where to go next

Are you feeling a bit underwhelmed by the "plethora" of information provided by this Windows Event ID? In fact, you might be asking yourself: "What caused the account lockout?!" Well, for that you will need to go to the "Caller Computer" and gather its logs for the additional details needed to solve the case. Now, let us remind you of the very first sentence of the article: "Hunting down Windows account lockout issues can be both confusing and infuriating." The next article in the series will cover collecting and examining Event ID 4625 from the Caller Computer so we can determine the cause of the lockout.

Conclusion

We now know how to detect account lockout issues and where to go to find out why the account is getting locked out. We also know that all of the logs necessary to accomplish this task cannot be pulled from one host. The DC will provide the account domain and name as well as the computer on which the failed authentication occurred, but we will then need to collect the 4625 logs from every computer to make this scale to an enterprise environment. This is where a central log aggregation platform such as Splunk comes in handy. We hope you find the dashboard code in the next section helpful.

Dashboard Code

The following dashboard code relies on the index name wineventlog.  If this is not your Windows event log index, just change it to suit your needs.  The past few cases we worked also had either a Qualys or Nessus scanner generating some noise.  We left the Qualys filter in but disabled it by default; feel free to tweak that as needed too.


<form>
  <label>Auth Examination - 4740</label>
  <description>Event ID 4740</description>
  <fieldset submitButton="true">
    <input type="time" searchWhenChanged="true" token="time">
      <label>Time Range</label>
      <default>
        <earliest>-4h@h</earliest>
        <latest>now</latest>
      </default>
    </input>
    <input type="text" searchWhenChanged="true" token="wild">
      <label>Wildcard Search</label>
      <default>*</default>
    </input>
    <input type="radio" searchWhenChanged="true" token="notqualys">
      <label>Exclude Qualys</label>
      <choice value="NOT Qualys">Yes</choice>
      <choice value="*">No</choice>
      <default>*</default>
      <initialValue>*</initialValue>
    </input>
  </fieldset>
  <row>
    <panel>
      <title>10 Day Glance of Total Lockouts (Independent of Dashboard Time Range Input):</title>
      <chart>
        <title>Unique Lockouts per 2 minutes</title>
        <search>
          <query>index=wineventlog $wild$ $notqualys$ source="WinEventLog:Security" EventCode="4740" |bin _time span=2min|dedup user _time| timechart count span=1h</query>
          <earliest>-10d@d</earliest>
          <latest>now</latest>
          <sampleRatio>1</sampleRatio>
        </search>
        <option name="charting.axisLabelsX.majorLabelStyle.overflowMode">ellipsisNone</option>
        <option name="charting.axisLabelsX.majorLabelStyle.rotation">0</option>
        <option name="charting.axisLabelsY.majorUnit">25</option>
        <option name="charting.axisTitleX.visibility">visible</option>
        <option name="charting.axisTitleY.visibility">visible</option>
        <option name="charting.axisTitleY2.visibility">visible</option>
        <option name="charting.axisX.scale">linear</option>
        <option name="charting.axisY.maximumNumber">285</option>
        <option name="charting.axisY.scale">linear</option>
        <option name="charting.axisY2.enabled">0</option>
        <option name="charting.axisY2.scale">inherit</option>
        <option name="charting.chart">column</option>
        <option name="charting.chart.bubbleMaximumSize">50</option>
        <option name="charting.chart.bubbleMinimumSize">10</option>
        <option name="charting.chart.bubbleSizeBy">area</option>
        <option name="charting.chart.nullValueMode">gaps</option>
        <option name="charting.chart.showDataLabels">none</option>
        <option name="charting.chart.sliceCollapsingThreshold">0.01</option>
        <option name="charting.chart.stackMode">default</option>
        <option name="charting.chart.style">shiny</option>
        <option name="charting.drilldown">none</option>
        <option name="charting.layout.splitSeries">0</option>
        <option name="charting.layout.splitSeries.allowIndependentYRanges">0</option>
        <option name="charting.legend.labelStyle.overflowMode">ellipsisMiddle</option>
        <option name="charting.legend.placement">right</option>
        <option name="refresh.display">progressbar</option>
        <option name="trellis.enabled">0</option>
        <option name="trellis.scales.shared">1</option>
        <option name="trellis.size">medium</option>
      </chart>
    </panel>
  </row>
  <row>
    <panel>
      <table>
        <title>Top Domain</title>
        <search>
          <query>index=wineventlog $wild$ $notqualys$ source="WinEventLog:Security" EventCode="4740" | table _time, EventCode, Account_Domain, user, dvc, Caller_Computer_Name | top limit=0 Account_Domain</query>
          <earliest>$time.earliest$</earliest>
          <latest>$time.latest$</latest>
        </search>
        <option name="drilldown">cell</option>
        <option name="refresh.display">progressbar</option>
      </table>
    </panel>
    <panel>
      <table>
        <title>Top User</title>
        <search>
          <query>index=wineventlog $wild$ $notqualys$ source="WinEventLog:Security" EventCode="4740" | table _time, EventCode, Account_Domain, user, dvc, Caller_Computer_Name | top limit=0 user</query>
          <earliest>$time.earliest$</earliest>
          <latest>$time.latest$</latest>
        </search>
        <option name="drilldown">cell</option>
        <option name="refresh.display">progressbar</option>
      </table>
    </panel>
    <panel>
      <table>
        <title>Top Reporting Server</title>
        <search>
          <query>index=wineventlog $wild$ $notqualys$ source="WinEventLog:Security" EventCode="4740" | table _time, EventCode, Account_Domain, user, dvc, Caller_Computer_Name | top limit=0 dvc</query>
          <earliest>$time.earliest$</earliest>
          <latest>$time.latest$</latest>
        </search>
        <option name="drilldown">cell</option>
        <option name="refresh.display">progressbar</option>
      </table>
    </panel>
    <panel>
      <table>
        <title>Top Caller_Computer_Name</title>
        <search>
          <query>index=wineventlog $wild$ $notqualys$ source="WinEventLog:Security" EventCode="4740" | table _time, EventCode, Account_Domain, user, dvc, Caller_Computer_Name | top limit=0 Caller_Computer_Name</query>
          <earliest>$time.earliest$</earliest>
          <latest>$time.latest$</latest>
        </search>
        <option name="drilldown">cell</option>
        <option name="refresh.display">progressbar</option>
      </table>
    </panel>
  </row>
  <row>
    <panel>
      <title>Timechart by Account_Name</title>
      <chart>
        <search>
          <query>index=wineventlog $wild$ $notqualys$ source="WinEventLog:Security" EventCode="4740"| timechart count by user</query>
          <earliest>$time.earliest$</earliest>
          <latest>$time.latest$</latest>
          <sampleRatio>1</sampleRatio>
        </search>
        <option name="charting.axisLabelsX.majorLabelStyle.overflowMode">ellipsisNone</option>
        <option name="charting.axisLabelsX.majorLabelStyle.rotation">0</option>
        <option name="charting.axisTitleX.visibility">visible</option>
        <option name="charting.axisTitleY.visibility">visible</option>
        <option name="charting.axisTitleY2.visibility">visible</option>
        <option name="charting.axisX.scale">linear</option>
        <option name="charting.axisY.scale">linear</option>
        <option name="charting.axisY2.enabled">0</option>
        <option name="charting.axisY2.scale">inherit</option>
        <option name="charting.chart">area</option>
        <option name="charting.chart.bubbleMaximumSize">50</option>
        <option name="charting.chart.bubbleMinimumSize">10</option>
        <option name="charting.chart.bubbleSizeBy">area</option>
        <option name="charting.chart.nullValueMode">gaps</option>
        <option name="charting.chart.showDataLabels">none</option>
        <option name="charting.chart.sliceCollapsingThreshold">0.01</option>
        <option name="charting.chart.stackMode">default</option>
        <option name="charting.chart.style">shiny</option>
        <option name="charting.drilldown">none</option>
        <option name="charting.layout.splitSeries">0</option>
        <option name="charting.layout.splitSeries.allowIndependentYRanges">0</option>
        <option name="charting.legend.labelStyle.overflowMode">ellipsisMiddle</option>
        <option name="charting.legend.placement">right</option>
        <option name="refresh.display">progressbar</option>
        <option name="trellis.enabled">0</option>
        <option name="trellis.scales.shared">1</option>
        <option name="trellis.size">medium</option>
      </chart>
    </panel>
    <panel>
      <title>Timechart by reporting host</title>
      <chart>
        <search>
          <query>index=wineventlog $wild$ $notqualys$ source="WinEventLog:Security" EventCode="4740"| timechart count by dvc</query>
          <earliest>$time.earliest$</earliest>
          <latest>$time.latest$</latest>
          <sampleRatio>1</sampleRatio>
        </search>
        <option name="charting.axisLabelsX.majorLabelStyle.overflowMode">ellipsisNone</option>
        <option name="charting.axisLabelsX.majorLabelStyle.rotation">0</option>
        <option name="charting.axisTitleX.visibility">visible</option>
        <option name="charting.axisTitleY.visibility">visible</option>
        <option name="charting.axisTitleY2.visibility">visible</option>
        <option name="charting.axisX.scale">linear</option>
        <option name="charting.axisY.scale">linear</option>
        <option name="charting.axisY2.enabled">0</option>
        <option name="charting.axisY2.scale">inherit</option>
        <option name="charting.chart">area</option>
        <option name="charting.chart.bubbleMaximumSize">50</option>
        <option name="charting.chart.bubbleMinimumSize">10</option>
        <option name="charting.chart.bubbleSizeBy">area</option>
        <option name="charting.chart.nullValueMode">gaps</option>
        <option name="charting.chart.showDataLabels">none</option>
        <option name="charting.chart.sliceCollapsingThreshold">0.01</option>
        <option name="charting.chart.stackMode">default</option>
        <option name="charting.chart.style">shiny</option>
        <option name="charting.drilldown">none</option>
        <option name="charting.layout.splitSeries">0</option>
        <option name="charting.layout.splitSeries.allowIndependentYRanges">0</option>
        <option name="charting.legend.labelStyle.overflowMode">ellipsisMiddle</option>
        <option name="charting.legend.placement">right</option>
        <option name="refresh.display">progressbar</option>
        <option name="trellis.enabled">0</option>
        <option name="trellis.scales.shared">1</option>
        <option name="trellis.size">medium</option>
      </chart>
    </panel>
  </row>
  <row>
    <panel>
      <title>Timechart by Account_Domain</title>
      <chart>
        <search>
          <query>index=wineventlog $wild$ $notqualys$ source="WinEventLog:Security" EventCode="4740"| timechart count by Account_Domain</query>
          <earliest>$time.earliest$</earliest>
          <latest>$time.latest$</latest>
          <sampleRatio>1</sampleRatio>
        </search>
        <option name="charting.axisLabelsX.majorLabelStyle.overflowMode">ellipsisNone</option>
        <option name="charting.axisLabelsX.majorLabelStyle.rotation">0</option>
        <option name="charting.axisTitleX.visibility">visible</option>
        <option name="charting.axisTitleY.visibility">visible</option>
        <option name="charting.axisTitleY2.visibility">visible</option>
        <option name="charting.axisX.scale">linear</option>
        <option name="charting.axisY.scale">linear</option>
        <option name="charting.axisY2.enabled">0</option>
        <option name="charting.axisY2.scale">inherit</option>
        <option name="charting.chart">area</option>
        <option name="charting.chart.bubbleMaximumSize">50</option>
        <option name="charting.chart.bubbleMinimumSize">10</option>
        <option name="charting.chart.bubbleSizeBy">area</option>
        <option name="charting.chart.nullValueMode">gaps</option>
        <option name="charting.chart.showDataLabels">none</option>
        <option name="charting.chart.sliceCollapsingThreshold">0.01</option>
        <option name="charting.chart.stackMode">default</option>
        <option name="charting.chart.style">shiny</option>
        <option name="charting.drilldown">none</option>
        <option name="charting.layout.splitSeries">0</option>
        <option name="charting.layout.splitSeries.allowIndependentYRanges">0</option>
        <option name="charting.legend.labelStyle.overflowMode">ellipsisMiddle</option>
        <option name="charting.legend.placement">right</option>
        <option name="refresh.display">progressbar</option>
        <option name="trellis.enabled">0</option>
        <option name="trellis.scales.shared">1</option>
        <option name="trellis.size">medium</option>
      </chart>
    </panel>
    <panel>
      <title>Timechart by Caller_Computer_Name</title>
      <chart>
        <search>
          <query>index=wineventlog $wild$ $notqualys$ source="WinEventLog:Security" EventCode="4740"| timechart count by src</query>
          <earliest>$time.earliest$</earliest>
          <latest>$time.latest$</latest>
          <sampleRatio>1</sampleRatio>
        </search>
        <option name="charting.axisLabelsX.majorLabelStyle.overflowMode">ellipsisNone</option>
        <option name="charting.axisLabelsX.majorLabelStyle.rotation">0</option>
        <option name="charting.axisTitleX.visibility">visible</option>
        <option name="charting.axisTitleY.visibility">visible</option>
        <option name="charting.axisTitleY2.visibility">visible</option>
        <option name="charting.axisX.scale">linear</option>
        <option name="charting.axisY.scale">linear</option>
        <option name="charting.axisY2.enabled">0</option>
        <option name="charting.axisY2.scale">inherit</option>
        <option name="charting.chart">area</option>
        <option name="charting.chart.bubbleMaximumSize">50</option>
        <option name="charting.chart.bubbleMinimumSize">10</option>
        <option name="charting.chart.bubbleSizeBy">area</option>
        <option name="charting.chart.nullValueMode">gaps</option>
        <option name="charting.chart.showDataLabels">none</option>
        <option name="charting.chart.sliceCollapsingThreshold">0.01</option>
        <option name="charting.chart.stackMode">default</option>
        <option name="charting.chart.style">shiny</option>
        <option name="charting.drilldown">none</option>
        <option name="charting.layout.splitSeries">0</option>
        <option name="charting.layout.splitSeries.allowIndependentYRanges">0</option>
        <option name="charting.legend.labelStyle.overflowMode">ellipsisMiddle</option>
        <option name="charting.legend.placement">right</option>
        <option name="refresh.display">progressbar</option>
        <option name="trellis.enabled">0</option>
        <option name="trellis.scales.shared">1</option>
        <option name="trellis.size">medium</option>
      </chart>
    </panel>
  </row>
  <row>
    <panel>
      <title>Details</title>
      <table>
        <search>
          <query>index=wineventlog $wild$ $notqualys$ source="WinEventLog:Security" EventCode="4740" | table _time, EventCode, Account_Domain, user, dvc, Caller_Computer_Name</query>
          <earliest>$time.earliest$</earliest>
          <latest>$time.latest$</latest>
          <sampleRatio>1</sampleRatio>
        </search>
        <option name="drilldown">cell</option>
        <option name="refresh.display">progressbar</option>
      </table>
    </panel>
  </row>
</form>

Troubleshooting Windows Account Lockouts with Splunk - Part II

By Tony Lee

Welcome to part II of the series dedicated to troubleshooting Windows account lockouts using Splunk. In part I (http://securitysynapse.com/2018/08/troubleshooting-windows-account-lockout-part-i.html) of the series, we highlighted and examined a 4740 event pulled from a domain controller. This 4740 event contained the following information:

  • The domain controller that handled the authentication request and reported the lockout
  • Domain name
  • Account name
  • The original host where the account attempted authentication

In this article we will look at a 4625 event from the originating host because it contains further authentication details, such as the reason for the failure and the application that is attempting to authenticate. The dashboard provided at the end of the article will also include searches for Event ID 529 to cover Windows operating systems that are end of life (EOL).
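Before diving into the raw event, here is a simple search sketch for pulling these failure events for a specific account. It assumes the index name wineventlog and the field names used in the dashboard code at the end of this article; the username John is a placeholder:

index=wineventlog source="WinEventLog:Security" (EventCode="4625" OR EventCode="529") user="John"
| table _time, EventCode, Logon_Type, Status, Failure_Reason, Caller_Process_Name, src

The table mirrors the fields we examine below: the failure reason tells you why authentication failed, and the caller process name tells you what was attempting it.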

Examine the Problem

As we did with the 4740 event, we will now examine a fictional 4625 event, highlighting and summarizing the key points below. This fictional 4625 event was pulled from the host indicated by the 4740 event on the domain controller.

LogName=Security
SourceName=Microsoft Windows security auditing.
EventCode=4625
EventType=0
Type=Information
ComputerName=WIN-R9H5Y.MYFAKEDOMAIN.COM
TaskCategory=Logon
OpCode=Info
RecordNumber=267153
Keywords=Audit Failure
Message=An account failed to log on.

Subject:
Security ID:S-1-5-18
Account Name:WIN-R9H5Y$
Account Domain:MYFAKEDOMAIN
Logon ID:0x3E7

Logon Type:8

Account For Which Logon Failed:
Security ID:S-1-0-0
Account Name:John
Account Domain:MyFakeDomain.com

Failure Information:
Failure Reason:Unknown user name or bad password.
Status:0xC000006D
Sub Status:0xC000006A

Process Information:
Caller Process ID:0x5aac
Caller Process Name:C:\Windows\System32\inetsrv\w3wp.exe

Network Information:
Workstation Name:WIN-R9H5Y
Source Network Address:192.1.1.100
Source Port:49770

Detailed Authentication Information:
Logon Process:Advapi  
Authentication Package:Negotiate
Transited Services:-
Package Name (NTLM only):-
Key Length:0


The most important takeaways from this event are:

  • Failure Reason:  In this case, it was an unknown user name or bad password.  We know the username is correct, so it must be a bad password.
  • Caller Process Name:  A quick Google search for w3wp.exe shows that it is an IIS worker process, in this case most likely associated with an Exchange server running IIS.

After pulling a few more events, we see several more bad passwords and then the eventual lockout. Common causes for account lockouts indicated by this process are mobile devices (phones or tablets) that contain stale credentials. The mobile device continues attempting to authenticate until it locks out the account. Mystery solved!
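If your extracted events include the raw status codes but not a friendly Failure_Reason field, you can map the common NTSTATUS sub-status values yourself. The code-to-meaning pairs below are standard Windows values, but the field name Sub_Status is an assumption based on typical Windows add-on extractions, so adjust it to match your data:

index=wineventlog source="WinEventLog:Security" EventCode="4625"
| eval reason=case(
    Sub_Status=="0xC000006A", "Bad password",
    Sub_Status=="0xC0000064", "User name does not exist",
    Sub_Status=="0xC0000072", "Account disabled",
    Sub_Status=="0xC0000234", "Account locked out",
    1=1, "Other (" . Status . " / " . Sub_Status . ")")
| stats count BY user, reason, Caller_Process_Name

A breakdown like this quickly separates simple bad-password churn from more interesting cases such as disabled or nonexistent accounts.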


Conclusion

Even though we presented fictional event logs, this example is based on real situations. Fortunately, we had the 4740 events from the domain controllers and were collecting the 4625 logs from the rest of the servers (and some workstations). It would be very difficult and time-consuming to perform this sort of correlation without a central point of aggregation such as Splunk. Even if you were to do this manually for one or two instances, you would not want to do it for the entire enterprise. To make your life easier, we are including dashboard code in the section below to display the 4625 events. We eventually added some workflow integration between the 4740 dashboard provided in the previous article and the 4625 dashboard below, but we will leave that exercise up to the reader. Have fun and happy Splunking!


Dashboard Code

The following dashboard code relies on the index name of wineventlog.  If this is not your Windows event log index, just change it to suit your needs. Also, the past few cases we worked had either a Qualys or Nessus scanner generating some noise. We left the Qualys filter in but disabled it.  Feel free to also tweak that as needed.

<form>
  <label>Auth Examination - 4625</label>
  <description>Event ID 4625 or 529</description>
  <fieldset submitButton="true">
    <input type="time" token="time" searchWhenChanged="true">
      <label>Time Range</label>
      <default>
        <earliest>-4h@h</earliest>
        <latest>now</latest>
      </default>
    </input>
    <input type="text" token="wild" searchWhenChanged="true">
      <label>Wildcard Search</label>
      <default>*</default>
    </input>
    <input type="radio" token="notqualys" searchWhenChanged="true">
      <label>Exclude Qualys</label>
      <choice value="NOT Qualys">Yes</choice>
      <choice value="*">No</choice>
      <default>*</default>
      <initialValue>*</initialValue>
    </input>
  </fieldset>
  <row>
    <panel>
      <table>
        <title>Top Failure_Reason</title>
        <search>
          <query>index=wineventlog $wild$ $notqualys$ source="WinEventLog:Security" EventCode="4625" OR EventCode="529" | table _time, EventCode, Logon_Type, Status, Failure_Reason, Account_Domain, Account_Name, user, dvc, src, src_ip, Caller_Process_Name | top limit=0 Failure_Reason</query>
          <earliest>$time.earliest$</earliest>
          <latest>$time.latest$</latest>
        </search>
        <option name="drilldown">cell</option>
      </table>
    </panel>
    <panel>
      <table>
        <title>Top Domain</title>
        <search>
          <query>index=wineventlog $wild$ $notqualys$ source="WinEventLog:Security" EventCode="4625" OR EventCode="529" | table _time, EventCode, Logon_Type, Status, Failure_Reason, Account_Domain, Account_Name, user, dvc, src, src_ip, Caller_Process_Name | top limit=0 Account_Domain</query>
          <earliest>$time.earliest$</earliest>
          <latest>$time.latest$</latest>
        </search>
        <option name="drilldown">cell</option>
      </table>
    </panel>
    <panel>
      <table>
        <title>Top User</title>
        <search>
          <query>index=wineventlog $wild$ $notqualys$ source="WinEventLog:Security" EventCode="4625" OR EventCode="529" | table _time, EventCode, Logon_Type, Status, Failure_Reason, Account_Domain, Account_Name, user, dvc, src, src_ip, Caller_Process_Name | top limit=0 user</query>
          <earliest>$time.earliest$</earliest>
          <latest>$time.latest$</latest>
        </search>
        <option name="drilldown">cell</option>
      </table>
    </panel>
    <panel>
      <table>
        <title>Top src</title>
        <search>
          <query>index=wineventlog $wild$ $notqualys$ source="WinEventLog:Security" EventCode="4625" OR EventCode="529" | table _time, EventCode, Logon_Type, Status, Failure_Reason, Account_Domain, Account_Name, user, dvc, src, src_ip, Caller_Process_Name | top limit=0 src</query>
          <earliest>$time.earliest$</earliest>
          <latest>$time.latest$</latest>
        </search>
        <option name="drilldown">cell</option>
      </table>
    </panel>
  </row>
  <row>
    <panel>
      <table>
        <title>Top Process</title>
        <search>
          <query>index=wineventlog $wild$ $notqualys$ source="WinEventLog:Security" EventCode="4625" OR EventCode="529" | table _time, EventCode, Logon_Type, Status, Failure_Reason, Account_Domain, Account_Name, user, dvc, src, src_ip, Caller_Process_Name | top limit=0 Caller_Process_Name</query>
          <earliest>$time.earliest$</earliest>
          <latest>$time.latest$</latest>
        </search>
        <option name="drilldown">cell</option>
      </table>
    </panel>
    <panel>
      <table>
        <title>Top Status</title>
        <search>
          <query>index=wineventlog $wild$ $notqualys$ source="WinEventLog:Security" EventCode="4625" OR EventCode="529" | table _time, EventCode, Logon_Type, Status, Failure_Reason, Account_Domain, Account_Name, user, dvc, src, src_ip, Caller_Process_Name | top limit=0 Status</query>
          <earliest>$time.earliest$</earliest>
          <latest>$time.latest$</latest>
        </search>
        <option name="drilldown">cell</option>
      </table>
    </panel>
  </row>
  <row>
    <panel>
      <title>Timechart by Account_Name</title>
      <chart>
        <search>
          <query>index=wineventlog $wild$ $notqualys$ source="WinEventLog:Security" EventCode="4625" OR EventCode="529"| timechart count by user</query>
          <earliest>$time.earliest$</earliest>
          <latest>$time.latest$</latest>
          <sampleRatio>1</sampleRatio>
        </search>
        <option name="charting.axisLabelsX.majorLabelStyle.overflowMode">ellipsisNone</option>
        <option name="charting.axisLabelsX.majorLabelStyle.rotation">0</option>
        <option name="charting.axisTitleX.visibility">visible</option>
        <option name="charting.axisTitleY.visibility">visible</option>
        <option name="charting.axisTitleY2.visibility">visible</option>
        <option name="charting.axisX.scale">linear</option>
        <option name="charting.axisY.scale">linear</option>
        <option name="charting.axisY2.enabled">0</option>
        <option name="charting.axisY2.scale">inherit</option>
        <option name="charting.chart">area</option>
        <option name="charting.chart.bubbleMaximumSize">50</option>
        <option name="charting.chart.bubbleMinimumSize">10</option>
        <option name="charting.chart.bubbleSizeBy">area</option>
        <option name="charting.chart.nullValueMode">gaps</option>
        <option name="charting.chart.showDataLabels">none</option>
        <option name="charting.chart.sliceCollapsingThreshold">0.01</option>
        <option name="charting.chart.stackMode">default</option>
        <option name="charting.chart.style">shiny</option>
        <option name="charting.drilldown">none</option>
        <option name="charting.layout.splitSeries">0</option>
        <option name="charting.layout.splitSeries.allowIndependentYRanges">0</option>
        <option name="charting.legend.labelStyle.overflowMode">ellipsisMiddle</option>
        <option name="charting.legend.placement">right</option>
        <option name="trellis.enabled">0</option>
        <option name="trellis.scales.shared">1</option>
        <option name="trellis.size">medium</option>
      </chart>
    </panel>
    <panel>
      <title>Timechart by reporting host</title>
      <chart>
        <search>
          <query>index=wineventlog $wild$ $notqualys$ source="WinEventLog:Security" EventCode="4625" OR EventCode="529"| timechart count by dvc</query>
          <earliest>$time.earliest$</earliest>
          <latest>$time.latest$</latest>
          <sampleRatio>1</sampleRatio>
        </search>
        <option name="charting.axisLabelsX.majorLabelStyle.overflowMode">ellipsisNone</option>
        <option name="charting.axisLabelsX.majorLabelStyle.rotation">0</option>
        <option name="charting.axisTitleX.visibility">visible</option>
        <option name="charting.axisTitleY.visibility">visible</option>
        <option name="charting.axisTitleY2.visibility">visible</option>
        <option name="charting.axisX.scale">linear</option>
        <option name="charting.axisY.scale">linear</option>
        <option name="charting.axisY2.enabled">0</option>
        <option name="charting.axisY2.scale">inherit</option>
        <option name="charting.chart">area</option>
        <option name="charting.chart.bubbleMaximumSize">50</option>
        <option name="charting.chart.bubbleMinimumSize">10</option>
        <option name="charting.chart.bubbleSizeBy">area</option>
        <option name="charting.chart.nullValueMode">gaps</option>
        <option name="charting.chart.showDataLabels">none</option>
        <option name="charting.chart.sliceCollapsingThreshold">0.01</option>
        <option name="charting.chart.stackMode">default</option>
        <option name="charting.chart.style">shiny</option>
        <option name="charting.drilldown">none</option>
        <option name="charting.layout.splitSeries">0</option>
        <option name="charting.layout.splitSeries.allowIndependentYRanges">0</option>
        <option name="charting.legend.labelStyle.overflowMode">ellipsisMiddle</option>
        <option name="charting.legend.placement">right</option>
        <option name="trellis.enabled">0</option>
        <option name="trellis.scales.shared">1</option>
        <option name="trellis.size">medium</option>
      </chart>
    </panel>
  </row>
  <row>
    <panel>
      <title>Timechart by Account_Domain</title>
      <chart>
        <search>
          <query>index=wineventlog $wild$ $notqualys$ source="WinEventLog:Security" EventCode="4625" OR EventCode="529"| timechart count by Account_Domain</query>
          <earliest>$time.earliest$</earliest>
          <latest>$time.latest$</latest>
          <sampleRatio>1</sampleRatio>
        </search>
        <option name="charting.axisLabelsX.majorLabelStyle.overflowMode">ellipsisNone</option>
        <option name="charting.axisLabelsX.majorLabelStyle.rotation">0</option>
        <option name="charting.axisTitleX.visibility">visible</option>
        <option name="charting.axisTitleY.visibility">visible</option>
        <option name="charting.axisTitleY2.visibility">visible</option>
        <option name="charting.axisX.scale">linear</option>
        <option name="charting.axisY.scale">linear</option>
        <option name="charting.axisY2.enabled">0</option>
        <option name="charting.axisY2.scale">inherit</option>
        <option name="charting.chart">area</option>
        <option name="charting.chart.bubbleMaximumSize">50</option>
        <option name="charting.chart.bubbleMinimumSize">10</option>
        <option name="charting.chart.bubbleSizeBy">area</option>
        <option name="charting.chart.nullValueMode">gaps</option>
        <option name="charting.chart.showDataLabels">none</option>
        <option name="charting.chart.sliceCollapsingThreshold">0.01</option>
        <option name="charting.chart.stackMode">default</option>
        <option name="charting.chart.style">shiny</option>
        <option name="charting.drilldown">none</option>
        <option name="charting.layout.splitSeries">0</option>
        <option name="charting.layout.splitSeries.allowIndependentYRanges">0</option>
        <option name="charting.legend.labelStyle.overflowMode">ellipsisMiddle</option>
        <option name="charting.legend.placement">right</option>
        <option name="trellis.enabled">0</option>
        <option name="trellis.scales.shared">1</option>
        <option name="trellis.size">medium</option>
      </chart>
    </panel>
    <panel>
      <title>Timechart by src</title>
      <chart>
        <search>
          <query>index=wineventlog $wild$ $notqualys$ source="WinEventLog:Security" EventCode="4625" OR EventCode="529"| timechart count by src</query>
          <earliest>$time.earliest$</earliest>
          <latest>$time.latest$</latest>
          <sampleRatio>1</sampleRatio>
        </search>
        <option name="charting.axisLabelsX.majorLabelStyle.overflowMode">ellipsisNone</option>
        <option name="charting.axisLabelsX.majorLabelStyle.rotation">0</option>
        <option name="charting.axisTitleX.visibility">visible</option>
        <option name="charting.axisTitleY.visibility">visible</option>
        <option name="charting.axisTitleY2.visibility">visible</option>
        <option name="charting.axisX.scale">linear</option>
        <option name="charting.axisY.scale">linear</option>
        <option name="charting.axisY2.enabled">0</option>
        <option name="charting.axisY2.scale">inherit</option>
        <option name="charting.chart">area</option>
        <option name="charting.chart.bubbleMaximumSize">50</option>
        <option name="charting.chart.bubbleMinimumSize">10</option>
        <option name="charting.chart.bubbleSizeBy">area</option>
        <option name="charting.chart.nullValueMode">gaps</option>
        <option name="charting.chart.showDataLabels">none</option>
        <option name="charting.chart.sliceCollapsingThreshold">0.01</option>
        <option name="charting.chart.stackMode">default</option>
        <option name="charting.chart.style">shiny</option>
        <option name="charting.drilldown">none</option>
        <option name="charting.layout.splitSeries">0</option>
        <option name="charting.layout.splitSeries.allowIndependentYRanges">0</option>
        <option name="charting.legend.labelStyle.overflowMode">ellipsisMiddle</option>
        <option name="charting.legend.placement">right</option>
        <option name="trellis.enabled">0</option>
        <option name="trellis.scales.shared">1</option>
        <option name="trellis.size">medium</option>
      </chart>
    </panel>
  </row>
  <row>
    <panel>
      <title>Details</title>
      <table>
        <search>
          <query>index=wineventlog $wild$ $notqualys$ source="WinEventLog:Security" EventCode="4625" OR EventCode="529" | table _time, EventCode, Logon_Type, Status, Failure_Reason, Account_Domain, Account_Name, user, dvc, src, src_ip, Caller_Process_Name</query>
          <earliest>$time.earliest$</earliest>
          <latest>$time.latest$</latest>
          <sampleRatio>1</sampleRatio>
        </search>
        <option name="drilldown">cell</option>
      </table>
    </panel>
  </row>
</form>


Troubleshooting Windows Account Lockouts with Splunk - Part III

By Tony Lee

Welcome to part III of the series dedicated to troubleshooting Windows account lockouts using Splunk. In this article we will give you a dashboard that we affectionately named Lockout Hunter. It combines the knowledge (and some dashboard panels) from both part I and part II of this series into a single interactive dashboard that allows users to drill down on data without leaving the dashboard. You will notice in the screenshot below that the first row contains panels related to event ID 4740. The two far-right panels are clickable and will populate the second row with event ID 4625 data. This filter can be cleared by clicking the "Reset Filters" link or by clicking on a different user or computer.

Figure 1:  Lockout Hunter Dashboard

Background

In part I (http://securitysynapse.com/2018/08/troubleshooting-windows-account-lockout-part-i.html) of the series, we highlighted and examined a 4740 event pulled from a domain controller. This 4740 event contained the following information:
  • The domain controller that handled the authentication request and reported the lockout
  • Domain name
  • Account name
  • The original host where the account attempted authentication

In part II (http://securitysynapse.com/2018/08/troubleshooting-windows-account-lockout-part-ii.html) of the series, we highlighted and examined a 4625 event (and Event ID 529 for EOL operating systems) pulled from workstations. The most important takeaways from this event are:

  • Why the authentication attempt is failing
  • The actual process (Caller Process Name) failing authentication

When combined, these two log sources are quite powerful.
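As a rough sketch of that combined power, a single search can line up the lockouts with the failures that preceded them. This assumes the same index and field names used by the dashboards in this series:

index=wineventlog source="WinEventLog:Security" (EventCode="4740" OR EventCode="4625")
| eval origin=coalesce(Caller_Computer_Name, src)
| stats count(eval(EventCode=="4740")) AS lockouts count(eval(EventCode=="4625")) AS failures values(Failure_Reason) AS reasons values(Caller_Process_Name) AS processes BY user, origin
| where lockouts > 0

Each resulting row ties a locked-out account to the host it came from, along with the failure reasons and caller processes that likely caused the lockout.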

Conclusion

We wanted to take lockout hunting up one more notch by releasing the Lockout Hunter dashboard. Our original intention was to help security practitioners find brute force attempts via account lockouts; however, it ended up having a huge impact with IT operations as well. These dashboards have saved help desks quite a few hours in determining the root cause for account lockout tickets. We hope you find them useful too. Happy Splunking!


Dashboard Code

The following dashboard code relies on the index name of wineventlog.  If this is not your Windows event log index, just change it to suit your needs. Also, the past few cases we worked had either a Qualys or Nessus scanner generating some noise. We left the Qualys filter in but disabled it.  Feel free to also tweak that as needed.  Be sure to name the dashboard lockout_hunter so the "Reset Filters" link works properly.

<form>
  <label>Lockout Hunter - 4740 &amp; 4625</label>
  <description>Click on Top User or Top Caller_Computer_Name to pivot on the next row</description>
  <fieldset submitButton="true">
    <input type="time" searchWhenChanged="true" token="time">
      <label>Time Range</label>
      <default>
        <earliest>-4h@h</earliest>
        <latest>now</latest>
      </default>
    </input>
    <input type="text" token="user" searchWhenChanged="true">
      <label>User</label>
      <default>*</default>
    </input>
    <input type="text" token="src" searchWhenChanged="true">
      <label>Source</label>
      <default>*</default>
    </input>
    <input type="text" searchWhenChanged="true" token="wild">
      <label>Wildcard Search</label>
      <default>*</default>
    </input>
    <input type="radio" searchWhenChanged="true" token="notqualys">
      <label>Exclude Qualys</label>
      <choice value="NOT Qualys">Yes</choice>
      <choice value="*">No</choice>
      <default>*</default>
      <initialValue>*</initialValue>
    </input>
  </fieldset>
       <row>
     <panel>
       <html>
       <u1><h3>Event ID 4740 row - Click a user or host below to drill in on the second row</h3></u1>
       <a href="lockout_hunter?form.user=*&amp;form.src=*" style="margin-left:0px">Reset Filters</a>      
     </html>
     </panel>
   </row>
  <row>
    <panel>
      <table>
        <title>Top Domain</title>
        <search>
          <query>index=wineventlog $wild$ $notqualys$ source="WinEventLog:Security" EventCode="4740" | table _time, EventCode, Account_Domain, user, dvc, Caller_Computer_Name | top limit=0 Account_Domain</query>
          <earliest>$time.earliest$</earliest>
          <latest>$time.latest$</latest>
        </search>
        <option name="drilldown">none</option>
      </table>
    </panel>
    <panel>
      <table>
        <title>Top Reporting Server</title>
        <search>
          <query>index=wineventlog $wild$ $notqualys$ source="WinEventLog:Security" EventCode="4740" | table _time, EventCode, Account_Domain, user, dvc, Caller_Computer_Name | top limit=0 dvc</query>
          <earliest>$time.earliest$</earliest>
          <latest>$time.latest$</latest>
        </search>
        <option name="drilldown">none</option>
      </table>
    </panel>
    <panel>
      <table>
        <title>Top User (Click pivots to 4625)</title>
        <search>
          <query>index=wineventlog $wild$ $notqualys$ source="WinEventLog:Security" EventCode="4740" | table _time, EventCode, Account_Domain, user, dvc, Caller_Computer_Name | top limit=0 user</query>
          <earliest>$time.earliest$</earliest>
          <latest>$time.latest$</latest>
        </search>
        <option name="drilldown">cell</option>
        <drilldown>
          <set token="form.user">$click.value$</set>
        </drilldown>
      </table>
    </panel>
    <panel>
      <table>
        <title>Top Caller_Computer_Name (Click pivots to 4625)</title>
        <search>
          <query>index=wineventlog $wild$ $notqualys$ source="WinEventLog:Security" EventCode="4740" | table _time, EventCode, Account_Domain, user, dvc, Caller_Computer_Name | top limit=0 Caller_Computer_Name</query>
          <earliest>$time.earliest$</earliest>
          <latest>$time.latest$</latest>
        </search>
        <option name="drilldown">cell</option>
        <drilldown>
          <set token="form.src">$click.value$</set>
        </drilldown>
      </table>
    </panel>
  </row>
     <row>
     <panel>
       <html>
       <u1><h3>Event ID 4625 and 529 logs from the hosts</h3></u1>
     </html>
     </panel>
   </row>
  <row>
    <panel>
      <table>
        <title>Top Failure_Reason</title>
        <search>
          <query>index=wineventlog $wild$ user=$user$ src=$src$ $notqualys$ source="WinEventLog:Security" EventCode="4625" OR EventCode="529" | table _time, EventCode, Logon_Type, Status, Failure_Reason, Account_Domain, Account_Name, user, dvc, src, src_ip, Caller_Process_Name | top limit=0 Failure_Reason</query>
          <earliest>$time.earliest$</earliest>
          <latest>$time.latest$</latest>
        </search>
        <option name="drilldown">cell</option>
        <option name="refresh.display">progressbar</option>
      </table>
    </panel>
    <panel>
      <table>
        <title>Top Domain</title>
        <search>
          <query>index=wineventlog $wild$ user=$user$ src=$src$ $notqualys$ source="WinEventLog:Security" EventCode="4625" OR EventCode="529" | table _time, EventCode, Logon_Type, Status, Failure_Reason, Account_Domain, Account_Name, user, dvc, src, src_ip, Caller_Process_Name | top limit=0 Account_Domain</query>
          <earliest>$time.earliest$</earliest>
          <latest>$time.latest$</latest>
        </search>
        <option name="drilldown">cell</option>
        <option name="refresh.display">progressbar</option>
      </table>
    </panel>
    <panel>
      <table>
        <title>Top User</title>
        <search>
          <query>index=wineventlog $wild$ user=$user$ src=$src$ $notqualys$ source="WinEventLog:Security" EventCode="4625" OR EventCode="529" | table _time, EventCode, Logon_Type, Status, Failure_Reason, Account_Domain, Account_Name, user, dvc, src, src_ip, Caller_Process_Name | top limit=0 user</query>
          <earliest>$time.earliest$</earliest>
          <latest>$time.latest$</latest>
        </search>
        <option name="drilldown">cell</option>
        <option name="refresh.display">progressbar</option>
      </table>
    </panel>
    <panel>
      <table>
        <title>Top src</title>
        <search>
          <query>index=wineventlog $wild$ user=$user$ src=$src$ $notqualys$ source="WinEventLog:Security" EventCode="4625" OR EventCode="529" | table _time, EventCode, Logon_Type, Status, Failure_Reason, Account_Domain, Account_Name, user, dvc, src, src_ip, Caller_Process_Name | top limit=0 src</query>
          <earliest>$time.earliest$</earliest>
          <latest>$time.latest$</latest>
        </search>
        <option name="drilldown">cell</option>
        <option name="refresh.display">progressbar</option>
      </table>
    </panel>
  </row>
  <row>
    <panel>
      <table>
        <title>Top Process</title>
        <search>
          <query>index=wineventlog $wild$ user=$user$ src=$src$ $notqualys$ source="WinEventLog:Security" EventCode="4625" OR EventCode="529" | table _time, EventCode, Logon_Type, Status, Failure_Reason, Account_Domain, Account_Name, user, dvc, src, src_ip, Caller_Process_Name | top limit=0 Caller_Process_Name</query>
          <earliest>$time.earliest$</earliest>
          <latest>$time.latest$</latest>
        </search>
        <option name="drilldown">cell</option>
        <option name="refresh.display">progressbar</option>
      </table>
    </panel>
    <panel>
      <table>
        <title>Top Status</title>
        <search>
          <query>index=wineventlog $wild$ user=$user$ src=$src$ $notqualys$ source="WinEventLog:Security" EventCode="4625" OR EventCode="529" | table _time, EventCode, Logon_Type, Status, Failure_Reason, Account_Domain, Account_Name, user, dvc, src, src_ip, Caller_Process_Name | top limit=0 Status</query>
          <earliest>$time.earliest$</earliest>
          <latest>$time.latest$</latest>
        </search>
        <option name="drilldown">cell</option>
        <option name="refresh.display">progressbar</option>
      </table>
    </panel>
  </row>
  <row>
    <panel>
      <title>10 Day Glance of Total Lockouts (Independent of Dashboard Time Range Input) :</title>
      <chart>
        <title>Unique Lockouts per 2 minutes</title>
        <search>
          <query>index=wineventlog $wild$ $notqualys$ source="WinEventLog:Security" EventCode="4740" |bin _time span=2min|dedup user _time| timechart count span=1h</query>
          <earliest>-10d@d</earliest>
          <latest>now</latest>
          <sampleRatio>1</sampleRatio>
        </search>
        <option name="charting.axisLabelsX.majorLabelStyle.overflowMode">ellipsisNone</option>
        <option name="charting.axisLabelsX.majorLabelStyle.rotation">0</option>
        <option name="charting.axisLabelsY.majorUnit">25</option>
        <option name="charting.axisTitleX.visibility">visible</option>
        <option name="charting.axisTitleY.visibility">visible</option>
        <option name="charting.axisTitleY2.visibility">visible</option>
        <option name="charting.axisX.scale">linear</option>
        <option name="charting.axisY.maximumNumber">285</option>
        <option name="charting.axisY.scale">linear</option>
        <option name="charting.axisY2.enabled">0</option>
        <option name="charting.axisY2.scale">inherit</option>
        <option name="charting.chart">column</option>
        <option name="charting.chart.bubbleMaximumSize">50</option>
        <option name="charting.chart.bubbleMinimumSize">10</option>
        <option name="charting.chart.bubbleSizeBy">area</option>
        <option name="charting.chart.nullValueMode">gaps</option>
        <option name="charting.chart.showDataLabels">none</option>
        <option name="charting.chart.sliceCollapsingThreshold">0.01</option>
        <option name="charting.chart.stackMode">default</option>
        <option name="charting.chart.style">shiny</option>
        <option name="charting.drilldown">none</option>
        <option name="charting.layout.splitSeries">0</option>
        <option name="charting.layout.splitSeries.allowIndependentYRanges">0</option>
        <option name="charting.legend.labelStyle.overflowMode">ellipsisMiddle</option>
        <option name="charting.legend.placement">right</option>
        <option name="refresh.display">progressbar</option>
        <option name="trellis.enabled">0</option>
        <option name="trellis.scales.shared">1</option>
        <option name="trellis.size">medium</option>
      </chart>
    </panel>
  </row>
</form>

Spelunking your Splunk – Part V (Splunk Stats)

By Tony Lee

Welcome to the fifth article of the Spelunking your Splunk series, all designed to help you understand your Splunk environment at a quick glance.  Here is a quick recap of the previous articles:


This article focuses on understanding your Splunk environment at a high level.  Have you ever wondered about the following?


  • How many events were ingested over a user-defined time period
  • How that equates to events per second (EPS)
  • The distinct host count
  • The number of indexes containing data
  • The number of sourcetypes
  • The number of sources
  • What the data ingest looks like visually, by total event count and by index

This dashboard will give you all of that, and do it fast!  As a bonus, we will provide the dashboard code at the end of the article.

Figure 1:  Splunk Stats dashboard


Finding detailed index information quickly

There are at least two places within Splunk to discover index information. The first uses a RESTful call and provides detailed information about indexes. The second requires more calculation and is less efficient. For this exercise, let's try copying and pasting the following RESTful search into your Splunk search bar to see what data is returned:

| rest /services/data/indexes-extended


Figure 2:  Results of the restful search (remember to scroll right)


The second place uses the dbinspect command:

| dbinspect index=*

Figure 3:  Column headers from dbinspect (remember to scroll right)

Now try the following which combines both (thank you Splunk!):

| dbinspect index=* cached=t
| where NOT match(index, "^_")
| stats max(rawSize) AS raw_size max(eventCount) AS event_count BY bucketId, index
| stats sum(raw_size) AS raw_size sum(event_count) AS event_count dc(bucketId) AS buckets BY index
| eval raw_size_gb = round(raw_size / 1024 / 1024 / 1024 , 2) | fields index raw_size_gb event_count buckets
| join type=outer index [| rest /services/data/indexes-extended 
| table title maxTime minTime frozenTimePeriodInSecs
| eval minTime = case(minTime >= "0", minTime)
| stats max(maxTime) AS maxTime min(minTime) AS minTime max(frozenTimePeriodInSecs) AS retention BY title
| eval maxTime = replace(maxTime, "T", ""), maxTime = replace(maxTime, "\+0000", ""), minTime = replace(minTime, "T", ""), minTime = replace(minTime, "\+0000", ""), retention = round(retention / 86400, 0)." Days" 
| rename title AS index] | fields index raw_size_gb event_count buckets  minTime maxTime retention
            | rename raw_size_gb AS "Index Size (GB)" event_count AS "Total Event Count" buckets AS "Total Bucket Count" minTime AS "Earliest Event"  maxTime AS "Latest Event"   retention AS Retention
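The final eval steps above only reshape the REST output: frozenTimePeriodInSecs becomes a day count, and the ISO timestamps have their "T" and "+0000" markers stripped. A rough Python equivalent of those two evals:

```python
def format_retention(frozen_secs):
    # retention = round(frozenTimePeriodInSecs / 86400, 0) . " Days"
    return f"{round(frozen_secs / 86400)} Days"

def clean_time(iso_ts):
    # Mirrors: replace(t, "T", "") then replace(t, "\+0000", "")
    return iso_ts.replace("T", "").replace("+0000", "")

print(format_retention(7_776_000))             # → 90 Days
print(clean_time("2018-11-01T12:00:00+0000"))  # → 2018-11-0112:00:00
```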


Now that you understand the basics, the sky is the limit.  :-)

Finding source, sourcetype, and host data quickly 

You may remember a command called tstats from the first article of this series (Spelunking your Splunk – Part I (Exploring Your Data)).  In a nutshell, tstats can perform statistical queries on indexed fields very quickly.  These indexed fields by default are index, source, sourcetype, and host, and it just so happens that these are the fields we need to understand the environment.  Best of all, even in an underpowered environment or one with lots of data ingested per day, these commands will still outperform the rest of your typical searches even over long periods of time.  This works great for our dashboard!
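Conceptually, tstats is fast because it reads counts from summary data recorded at index time instead of scanning raw events. A toy illustration of that difference (the data and structures here are invented for the example):

```python
from collections import Counter

# Raw events: what an ordinary search has to scan
events = [
    {"index": "web", "host": "web01"},
    {"index": "web", "host": "web02"},
    {"index": "fw",  "host": "fw01"},
]

# Per-(index, host) counts kept alongside the index: what tstats reads
summary = Counter((e["index"], e["host"]) for e in events)

# Roughly "| tstats count where index=web" -- no raw events touched
web_count = sum(c for (idx, _), c in summary.items() if idx == "web")
print(web_count)  # → 2
```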


Conclusion

Splunk provides decent visibility into various features within Monitoring Console / DMC (Distributed management console), but we found this flexible and customizable dashboard to be quite helpful for gaining additional insight.  We hope this helps you too.  Enjoy!


Dashboard XML code


Below is the dashboard code needed to enumerate your Splunk stats.  Feel free to modify the dashboard as needed:

<form>
  <label>Splunk Stats</label>
  <fieldset submitButton="true" autoRun="true">
    <input type="time" token="time">
      <label>Time Range Selector</label>
      <default>
        <earliest>-7d@h</earliest>
        <latest>now</latest>
      </default>
    </input>
  </fieldset>
  <row>
    <panel>
      <single>
        <title>Distinct Events</title>
        <search>
          <query>| tstats count where index=*</query>
          <earliest>$time.earliest$</earliest>
          <latest>$time.latest$</latest>
          <sampleRatio>1</sampleRatio>
        </search>
        <option name="drilldown">none</option>
        <option name="refresh.display">progressbar</option>
      </single>
    </panel>
    <panel>
      <single>
        <title>Events Per Second (EPS)</title>
        <search>
          <query>| tstats count where index=* | addinfo | eval diff = info_max_time - info_min_time | eval EPS = count / diff | table EPS</query>
          <earliest>$time.earliest$</earliest>
          <latest>$time.latest$</latest>
        </search>
        <option name="drilldown">none</option>
      </single>
    </panel>
    <panel>
      <single>
        <title>Distinct Hosts</title>
        <search>
          <query>| tstats dc(host) where index=*</query>
          <earliest>$time.earliest$</earliest>
          <latest>$time.latest$</latest>
          <sampleRatio>1</sampleRatio>
        </search>
        <option name="drilldown">none</option>
        <option name="refresh.display">progressbar</option>
      </single>
    </panel>
    <panel>
      <single>
        <title>Distinct Indexes with Data</title>
        <search>
          <query>| dbinspect index=* cached=t
| where NOT match(index, "^_")
| stats max(rawSize) AS raw_size max(eventCount) AS event_count BY bucketId, index
| stats sum(raw_size) AS raw_size sum(event_count) AS event_count dc(bucketId) AS buckets BY index
| eval raw_size_gb = round(raw_size / 1024 / 1024 / 1024 , 2) | fields index raw_size_gb event_count buckets
| join type=outer index [| rest /services/data/indexes-extended 
| table title maxTime minTime frozenTimePeriodInSecs
| eval minTime = case(minTime &gt;= "0", minTime)
| stats max(maxTime) AS maxTime min(minTime) AS minTime max(frozenTimePeriodInSecs) AS retention BY title
| eval maxTime = replace(maxTime, "T", ""), maxTime = replace(maxTime, "\+0000", ""), minTime = replace(minTime, "T", ""), minTime = replace(minTime, "\+0000", ""), retention = round(retention / 86400, 0)." Days" 
| rename title AS index] | fields index raw_size_gb event_count buckets  minTime maxTime retention
            | rename raw_size_gb AS "Index Size (GB)" event_count AS "Total Event Count" buckets AS "Total Bucket Count" minTime AS "Earliest Event"  maxTime AS "Latest Event"   retention AS Retention | stats count</query>
          <earliest>0</earliest>
          <latest></latest>
          <sampleRatio>1</sampleRatio>
        </search>
        <option name="drilldown">none</option>
        <option name="refresh.display">progressbar</option>
      </single>
    </panel>
    <panel>
      <single>
        <title>Distinct Sourcetypes</title>
        <search>
          <query>| tstats dc(sourcetype) where index=*</query>
          <earliest>$time.earliest$</earliest>
          <latest>$time.latest$</latest>
          <sampleRatio>1</sampleRatio>
        </search>
        <option name="drilldown">none</option>
        <option name="refresh.display">progressbar</option>
      </single>
    </panel>
    <panel>
      <single>
        <title>Distinct Sources</title>
        <search>
          <query>| tstats dc(source) where index=*</query>
          <earliest>$time.earliest$</earliest>
          <latest>$time.latest$</latest>
          <sampleRatio>1</sampleRatio>
        </search>
        <option name="drilldown">none</option>
        <option name="refresh.display">progressbar</option>
      </single>
    </panel>
  </row>
  <row>
    <panel>
      <chart>
        <title>Total Event Count Over Time</title>
        <search>
          <query>| tstats prestats=t count where index=* by _time | timechart  count</query>
          <earliest>$time.earliest$</earliest>
          <latest>$time.latest$</latest>
        </search>
        <option name="charting.chart">area</option>
        <option name="charting.drilldown">none</option>
        <option name="refresh.display">progressbar</option>
      </chart>
    </panel>
    <panel>
      <chart>
        <title>Event Count by Index Over Time</title>
        <search>
          <query>| tstats prestats=t count where index=* by index, _time | timechart count by index</query>
          <earliest>$time.earliest$</earliest>
          <latest>$time.latest$</latest>
        </search>
        <option name="charting.chart">area</option>
        <option name="charting.drilldown">none</option>
        <option name="refresh.display">progressbar</option>
      </chart>
    </panel>
  </row>
  <row>
    <panel>
      <table>
        <title>Indexes with Data</title>
        <search>
          <query>| dbinspect index=* cached=t
| where NOT match(index, "^_")
| stats max(rawSize) AS raw_size max(eventCount) AS event_count BY bucketId, index
| stats sum(raw_size) AS raw_size sum(event_count) AS event_count dc(bucketId) AS buckets BY index
| eval raw_size_gb = round(raw_size / 1024 / 1024 / 1024 , 2) | fields index raw_size_gb event_count buckets
| join type=outer index [| rest /services/data/indexes-extended 
| table title maxTime minTime frozenTimePeriodInSecs
| eval minTime = case(minTime &gt;= "0", minTime)
| stats max(maxTime) AS maxTime min(minTime) AS minTime max(frozenTimePeriodInSecs) AS retention BY title
| eval maxTime = replace(maxTime, "T", ""), maxTime = replace(maxTime, "\+0000", ""), minTime = replace(minTime, "T", ""), minTime = replace(minTime, "\+0000", ""), retention = round(retention / 86400, 0)." Days" 
| rename title AS index] | fields index raw_size_gb event_count buckets  minTime maxTime retention
            | rename raw_size_gb AS "Index Size (GB)" event_count AS "Total Event Count" buckets AS "Total Bucket Count" minTime AS "Earliest Event"  maxTime AS "Latest Event"   retention AS Retention</query>
          <earliest>0</earliest>
          <latest></latest>
        </search>
        <option name="drilldown">none</option>
        <option name="refresh.display">progressbar</option>
      </table>
    </panel>
  </row>
</form>

Troubleshooting Data Sources with Incorrect Times using Splunk

By Tony Lee

Have you ever had a data source that you suspected was sending data with the wrong time? This can be a problem: since Splunk tries to parse and use the event time instead of the ingest time, a skewed clock can make ingested data hard to find. If you suspect this is the case, you may be experiencing one of the following scenarios:
  • Systems not using NTP that experience clock drift
  • Systems using broken or faulty NTP
  • Systems using the wrong time zone (e.g., sending events in Central time while specifying GMT)
Depending on the time range selected, this can result in data not showing up within Splunk (or any SIEM) because the data may appear to be in the past or the future.  For example, events that are lagging current time by 5 hours will not show up if "Last 4 hours" is selected for the time range.  In a similar fashion, events that are sent with a future date and time will only show up when the time range selector of "All Time" is selected.
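A quick sketch of why a lagging clock hides events, using the 5-hour example above (epoch values are invented):

```python
def visible_in_range(event_epoch, earliest_epoch, latest_epoch):
    # An event only renders if its *parsed* timestamp falls inside the selected range
    return earliest_epoch <= event_epoch <= latest_epoch

now = 1_700_000_000
lagging_event = now - 5 * 3600          # source clock is 5 hours behind

print(visible_in_range(lagging_event, now - 4 * 3600, now))  # → False ("Last 4 hours")
print(visible_in_range(lagging_event, now - 6 * 3600, now))  # → True  ("Last 6 hours")
```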

Enough about the problems, let's walk through building one possible solution. As a bonus we provide the dashboard shown below at the bottom of the article.

Figure 1:  Last Communicated Calculator

Dashboard Components

To assist in usability, we provide a drop-down input at the top that contains a list of the indexes.  This list is populated dynamically using the dbinspect command, which returns data about the existing indexes within Splunk. The following search creates the drop-down input in the dashboard:

| dbinspect index=* | where NOT match(index, "^_") | table index | dedup index


The upper (host detail) panel consists of columns indicating the host, total count, first written time, last written time and so on--perfect information to determine time issues.  This information can be found using the metadata command which can quickly query info about hosts, sources, and sourcetypes. In this case, we care about the hosts.

| metadata index=<index we care about> type=hosts
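The host-detail panel then turns each host's lastTime into a lag, exactly as the dashboard's eval chain does (seconds, then minutes, then hours). A minimal sketch of that calculation:

```python
def comm_lag(last_time_epoch, now_epoch):
    # Mirrors the dashboard evals: seconds = now() - lastTime, minutes = seconds/60, hours = minutes/60
    seconds = now_epoch - last_time_epoch
    return {"seconds": seconds, "minutes": seconds / 60, "hours": seconds / 3600}

print(comm_lag(1_699_992_800, 1_700_000_000))
# → {'seconds': 7200, 'minutes': 120.0, 'hours': 2.0}
```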


The lower panel (a time-based area chart) represents the volume of data at a given time for a given host. We used the tstats command covered in a previous article; the search looks like the following:

| tstats prestats=t count where index=<index we care about> AND host=<host we care about> by host, _time | timechart useother=false count by host


It is certainly noteworthy that every search on this dashboard runs against index-time metadata, which is why it discovers these details so quickly. You will probably notice that there is no time wasted waiting for the search to return; the data renders almost instantly.

Conclusion

Splunk provides decent visibility into various features within Monitoring Console / DMC (Distributed management console), but we found this flexible and customizable dashboard to be quite helpful for gaining additional insight into the last time a host communicated. This can be used to identify, troubleshoot, and finally confirm the time being reported by devices. We hope this article helps you troubleshoot these very frustrating issues. Enjoy!

Dashboard XML code

Below is the dashboard code needed to see the Last Communicated Times for hosts by Index.  Feel free to modify the dashboard as needed:

<form>
  <label>Last Communicated Calculator</label>
  <description>Select an Index (or Indexes) - High Number is bad...</description>
  <fieldset submitButton="true">
    <input type="time" token="time">
      <label>Time Range</label>
      <default>
        <earliest>-24h@h</earliest>
        <latest>now</latest>
      </default>
    </input>
    <input type="multiselect" token="index">
      <label>Index</label>
      <fieldForLabel>Index</fieldForLabel>
      <fieldForValue>index</fieldForValue>
      <search>
        <query>| dbinspect index=* | where NOT match(index, "^_") | table index | dedup index</query>
        <earliest>-30d@d</earliest>
        <latest>now</latest>
      </search>
      <valuePrefix>index=</valuePrefix>
      <delimiter> OR </delimiter>
    </input>
    <input type="text" token="host">
      <label>Host</label>
      <default>*</default>
    </input>
  </fieldset>
  <row>
    <panel>
      <table>
        <title>Hosts</title>
        <search>
          <query>| metadata $index$ type=hosts | dedup host | eval currentTime=now() | eval seconds=now()-lastTime | eval minutes=(seconds/60) | eval hours=(minutes/60) | convert ctime(lastTime) ctime(firstTime) ctime(currentTime) | table host, totalCount, firstTime, lastTime, currentTime, hours, minutes, seconds | sort - seconds | rename hours AS "Last Comm (in hrs)", minutes AS "Last Comm (in mins)", seconds AS "Last Comm (in secs)" | search host=$host$</query>
          <earliest>$time.earliest$</earliest>
          <latest>$time.latest$</latest>
          <sampleRatio>1</sampleRatio>
        </search>
        <option name="count">20</option>
        <option name="dataOverlayMode">none</option>
        <option name="drilldown">none</option>
        <option name="percentagesRow">false</option>
        <option name="rowNumbers">true</option>
        <option name="totalsRow">false</option>
        <option name="wrap">true</option>
      </table>
    </panel>
  </row>
  <row>
    <panel>
      <chart>
        <title>Visual (Keep in mind your time range.  Anything beyond the time range will not show up)</title>
        <search>
          <query>| tstats prestats=t count where $index$ AND host=$host$ by host, _time | timechart useother=false count by host</query>
          <earliest>$time.earliest$</earliest>
          <latest>$time.latest$</latest>
        </search>
        <option name="charting.chart">area</option>
        <option name="charting.drilldown">none</option>
        <option name="refresh.display">progressbar</option>
      </chart>
    </panel>
  </row>
</form>

Monitoring USB Storage Activity with Splunk – Part 1 (Connectivity events)

By Tony Lee

Have you ever wanted to monitor what goes on with removable media in your environment, but maybe lack the money or manpower to run a Data Loss Prevention (DLP) tool to monitor the USB devices? The good news is that you can do this on the cheap using Microsoft Windows Event logs and a bit of data crunching effort. In this article we will provide a few ways to collect the logs, but we will ultimately use Splunk to aggregate, process, and display the information. As a bonus, we will not only outline the steps to accomplish this task, but we will also provide working dashboard code at the end of the article.

Figure 1: Dashboard provided at the end of the article

High-level steps

There are two main steps needed to accomplish this task. We need to generate and collect the Windows event logs and then we need to process and display the logs within Splunk. Each is outlined below.

Windows Event Generation
Microsoft logs USB connect and disconnect actions in the following Windows Event Viewer location:  Application and Services Logs > Microsoft > Windows > DriverFrameworks-UserMode > Operational

Unfortunately, this log is disabled by default. Administrators can manually enable it per machine or take action on a larger scale using a login script or other mechanism outlined in the References section below. For this article, we will enable the log manually: right-click “Operational” and select “Properties” to confirm that it is disabled, then check the box to enable logging. After checking the box, we rebooted for good measure, because hey, this is Windows.

Figure 2:  DriverFrameworks-UserMode enablement and log path


Connect Event IDs
Now that USB connectivity logging is enabled, insert a USB drive and click the refresh button to see some events. You will notice that there are quite a few event IDs associated with connecting a USB device, but fortunately for our situation, not all of them are important. For example, some of the event IDs pertain to USB functions needed to ready the device. For the sake of completeness, the event IDs associated with connecting a device are the following:

  • 2003 – This is a unique event created upon connecting a USB device which contains helpful data
  • 2004
  • 2006
  • 2010
  • 2100
  • 2101
  • 2105
  • 2106


Disconnect Event IDs
Fortunately, there are far fewer event IDs associated with disconnecting a USB device.

  • 2100
  • 2102 – This is a unique event created upon disconnecting a USB device which contains helpful data


Feel free to explore the data within each event, but note that we have called out the two Event IDs that contain the most data pertaining to connection (2003) and disconnection (2102).

Windows Event Collection
Now that the logs are being generated, they need to be forwarded from the endpoints to a central location—in this case Splunk. This task could be accomplished using a number of methods such as Windows Event Collector (WEC), a Splunk Universal Forwarder agent, or some other forwarding method. For this demo, we will use a Splunk Universal Forwarder as shown in the next section.

Splunk
While we are assuming a functional Splunk Enterprise installation exists, we still need to collect the logs. We provide a sample Splunk Universal Forwarder configuration file below to help those using the Splunk Universal Forwarder. Note: we will be placing the events into an index called wineventlog. If this index does not already exist, you will first need to create it.

inputs.conf 
Located on the Windows endpoint (Usually found here:  C:\Program Files\SplunkUniversalForwarder\etc\apps\SplunkUniversalForwarder\local\inputs.conf)

[WinEventLog://Microsoft-Windows-DriverFrameworks-UserMode/Operational]
index = wineventlog
checkpointInterval = 5
current_only = 0
disabled = 0
start_from = oldest
whitelist = 2003, 2102


Once the inputs.conf file is properly configured (and the universal forwarder restarted) to collect these logs from the endpoint, we need to verify that the logs are reaching Splunk. Try running the following Splunk search:

index=wineventlog 

If you see results, try something more specific, such as either of the following:

index=wineventlog EventCode=2003
index=wineventlog EventCode=2102

Field Extraction

Now that we have the proper Windows Event IDs, we need to make sure we can reference the fields. Unfortunately, Windows event logs are a hybrid between human readable and machine readable—which usually means that no one likes to read them. As a result, we need to perform some manual extraction within Splunk to pull out key information such as the USB vendor, product, serial number, and guid. Within Splunk (Settings -> Fields -> Field extractions) we added the following regex string to enable this parsing:

.*?VEN_(?<vendor>.*?)\&PROD_(?<product>.*?)\&.*?#(?<serialNumber>.*?)&.*?{(?<guid>.*?)}
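The same pattern is easy to test outside Splunk; Python just spells the named groups (?P&lt;name&gt;) instead of (?&lt;name&gt;). The device path below is a made-up but representative Event ID 2003 value:

```python
import re

# Splunk's (?<name>) capture syntax becomes (?P<name>) in Python
USB_RE = re.compile(
    r".*?VEN_(?P<vendor>.*?)&PROD_(?P<product>.*?)&"
    r".*?#(?P<serialNumber>.*?)&.*?\{(?P<guid>.*?)\}"
)

# Hypothetical device-instance path of the kind seen in Event ID 2003
sample = (r"SWD\WPDBUSENUM\_??_USBSTOR#DISK&VEN_SANDISK&PROD_CRUZER"
          r"&REV_1.26#4C530001234567890&0#{53f56307-b6bf-11d0-94f2-00a0c91efb8b}")

m = USB_RE.search(sample)
print(m.group("vendor"), m.group("product"), m.group("serialNumber"))
# → SANDISK CRUZER 4C530001234567890
```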


Figure 3:  Example Field Extraction

Figure 4:  Example Event ID 2003 showing fields are properly extracted


Conclusion

Now that we have the proper event IDs flowing into Splunk and the necessary fields extracted, we created a Removable Storage Connections dashboard. The dashboard provides statistical analysis for connects, disconnects, top vendors, products, serial numbers, and hosts.  It even includes events over time by action and serial number along with the details needed to investigate USB connections. For your convenience, we included the dashboard code below.

Caveats

Per Greg Shultz, “If you find an Event ID 2003 event record for a specific USB flash drive but don't find a corresponding Event ID 2102 event record, that either means that the USB flash drive is still attached to the system or the system was shut down before the device was removed.”
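That correlation is easy to check mechanically: track serial numbers seen in 2003 events and remove them on a matching 2102. A small sketch with invented serial numbers:

```python
# (EventCode, serialNumber) pairs, oldest first -- invented sample data
usb_events = [(2003, "4C53AAA"), (2102, "4C53AAA"), (2003, "4C53BBB")]

still_attached = set()
for code, serial in usb_events:
    if code == 2003:
        still_attached.add(serial)      # connect event
    elif code == 2102:
        still_attached.discard(serial)  # disconnect event

print(still_attached)  # → {'4C53BBB'}: still attached, or removed after shutdown
```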

Acknowledgement and References

Big thanks to the following articles which were quite useful:
https://www.techrepublic.com/article/how-to-track-down-usb-flash-drive-usage-in-windows-10s-event-viewer/ 
https://df-stream.com/2014/01/the-windows-7-event-log-and-usb-device/

Dashboard Code

Splunk dashboard code provided below:


<form>
  <label>Removable Storage Connections</label>
  <description>index=wineventlog EventCode=2003 &amp; 2102 - Microsoft-Windows-DriverFrameworks-UserMode/Operational</description>
  <fieldset autoRun="true" submitButton="true">
    <input type="time" token="time">
      <label>Time Range</label>
      <default>
        <earliest>0</earliest>
        <latest></latest>
      </default>
    </input>
    <input type="text" token="wild">
      <label>Wildcard Search</label>
      <default>*</default>
      <initialValue>*</initialValue>
    </input>
  </fieldset>
  <row>
    <panel>
      <single>
        <title>Number of Connect Events</title>
        <search>
          <query>index=wineventlog EventCode=2003 USBSTOR $wild$ | table _time, ComputerName, EventCode, User, vendor, product, serialNumber | stats count</query>
          <earliest>$time.earliest$</earliest>
          <latest>$time.latest$</latest>
        </search>
        <option name="drilldown">all</option>
        <option name="refresh.display">progressbar</option>
      </single>
    </panel>
    <panel>
      <single>
        <title>Number of Disconnect Events</title>
        <search>
          <query>index=wineventlog EventCode=2102 USBSTOR $wild$ | transaction maxspan=5s EventCode, ComputerName, serialNumber | dedup _time, ComputerName, serialNumber | table _time, ComputerName, EventCode, User, vendor, product, serialNumber | stats count</query>
          <earliest>$time.earliest$</earliest>
          <latest>$time.latest$</latest>
        </search>
        <option name="drilldown">all</option>
        <option name="refresh.display">progressbar</option>
      </single>
    </panel>
    <panel>
      <table>
        <title>Top Hosts with USB Activity</title>
        <search>
          <query>index=wineventlog (EventCode=2003 OR EventCode=2102) USBSTOR $wild$ | transaction maxspan=5s EventCode, ComputerName, serialNumber | table _time, ComputerName, EventCode, User, vendor, product, serialNumber | top limit=0 ComputerName</query>
          <earliest>$time.earliest$</earliest>
          <latest>$time.latest$</latest>
        </search>
        <option name="count">10</option>
        <option name="drilldown">cell</option>
        <option name="refresh.display">progressbar</option>
      </table>
    </panel>
    <panel>
      <table>
        <title>Top Removable Storage Vendors</title>
        <search>
          <query>index=wineventlog (EventCode=2003 OR EventCode=2102) USBSTOR $wild$ | dedup serialNumber | table _time, ComputerName, EventCode, User, vendor, product, serialNumber | top limit=0 vendor</query>
          <earliest>$time.earliest$</earliest>
          <latest>$time.latest$</latest>
        </search>
        <option name="count">10</option>
        <option name="drilldown">cell</option>
        <option name="refresh.display">progressbar</option>
      </table>
    </panel>
    <panel>
      <table>
        <title>Top Removable Storage Products</title>
        <search>
          <query>index=wineventlog (EventCode=2003 OR EventCode=2102) USBSTOR $wild$ | dedup serialNumber | table _time, ComputerName, EventCode, User, vendor, product, serialNumber | top limit=0 product</query>
          <earliest>$time.earliest$</earliest>
          <latest>$time.latest$</latest>
        </search>
        <option name="count">10</option>
        <option name="drilldown">cell</option>
        <option name="refresh.display">progressbar</option>
      </table>
    </panel>
    <panel>
      <table>
        <title>Top Serial Numbers</title>
        <search>
          <query>index=wineventlog (EventCode=2003 OR EventCode=2102) USBSTOR $wild$ | dedup serialNumber | table _time, ComputerName, EventCode, User, vendor, product, serialNumber | top limit=0 serialNumber</query>
          <earliest>$time.earliest$</earliest>
          <latest>$time.latest$</latest>
        </search>
        <option name="count">10</option>
        <option name="drilldown">cell</option>
      </table>
    </panel>
  </row>
  <row>
    <panel>
      <chart>
        <title>Events Over Time</title>
        <search>
          <query>index=wineventlog (EventCode=2003 OR EventCode=2102) USBSTOR $wild$ | eval action=case(EventCode == 2003, "Connect", EventCode == 2102, "Disconnect") | table _time, ComputerName, action, EventCode, User, vendor, product, serialNumber | eval ActionSerial = action + ":" + serialNumber | timechart dc(serialNumber) by ActionSerial</query>
          <earliest>$time.earliest$</earliest>
          <latest>$time.latest$</latest>
        </search>
        <option name="charting.chart">column</option>
        <option name="charting.drilldown">none</option>
      </chart>
    </panel>
  </row>
  <row>
    <panel>
      <table>
        <title>Connect Events (EventCode=2003)</title>
        <search>
          <query>index=wineventlog EventCode=2003 USBSTOR $wild$ | table _time, ComputerName, EventCode, User, vendor, product, serialNumber</query>
          <earliest>$time.earliest$</earliest>
          <latest>$time.latest$</latest>
          <sampleRatio>1</sampleRatio>
        </search>
        <option name="dataOverlayMode">none</option>
        <option name="drilldown">cell</option>
        <option name="percentagesRow">false</option>
        <option name="refresh.display">progressbar</option>
        <option name="rowNumbers">false</option>
        <option name="totalsRow">false</option>
        <option name="wrap">true</option>
      </table>
    </panel>
    <panel>
      <table>
        <title>Disconnect Events (EventCode=2102)</title>
        <search>
          <query>index=wineventlog EventCode=2102 USBSTOR $wild$ | transaction maxspan=5s EventCode, ComputerName, serialNumber | dedup _time, ComputerName, serialNumber | table _time, ComputerName, EventCode, User, vendor, product, serialNumber</query>
          <earliest>$time.earliest$</earliest>
          <latest>$time.latest$</latest>
        </search>
        <option name="count">10</option>
        <option name="drilldown">cell</option>
        <option name="refresh.display">progressbar</option>
      </table>
    </panel>
  </row>
</form>


Monitoring USB Storage Activity with Splunk – Part II (Read/Write/Delete/Modify events)

By Tony Lee

Welcome to Part two in our series on Monitoring USB Storage Activity. In the first article (http://www.securitysynapse.com/2018/11/monitoring-usb-storage-activity-part-1.html), we examined what is required to monitor USB Storage connect and disconnect events. But how about activity that happens after the drives are connected? The good news is that this is also possible using Microsoft Windows Event logs and a bit of data crunching effort. In this article we will again use Splunk to aggregate, process, and display the logs. As a bonus, we will not only outline the steps to accomplish this task, but we will also provide working dashboard code at the end of the article.

Note: The Audit Removable Storage policy is only available in Windows 8 / Server 2012 and above; it is not available in Windows 7 / Server 2008.  ☹

Figure 1:  Dashboard provided at the end of the article


High-level steps

There are two main steps needed to accomplish this task. We need to generate and collect the Windows event logs and then we need to process and display the logs within Splunk. Each is outlined below.

Windows Event Generation
For Windows 8 / Server 2012 hosts and above, Microsoft USB activity logs can be enabled manually one machine at a time or via Group Policy (see the References section below for instructions). For this demo, we will show how to enable it on one machine using Local Security Policy:  Advanced Audit Policy Configuration > System Audit Policies - Local Group Policy Object > Object Access > Audit Removable Storage

Figure 2:  Enabling Audit of Removable Storage

Double-click “Audit Removable Storage” and enable auditing for both Success and Failure. After enabling auditing, we rebooted for good measure, because hey, this is Windows.

Activity Event IDs
Now that Audit Removable Storage is enabled, open Event Viewer > Windows Logs > Security.  Select Filter Current Log on the right-hand side and type in 4663 for event ID and click OK.  Insert a USB device and click the Refresh button on the right-hand side. If all is well, there should be multiple 4663 success events. Note that Event ID 4656 is used for failures.


Figure 3:  Testing 4663 and 4656 event visibility

Feel free to explore the data within each event but take note that for USB auditing the events that we care about have a Task Category of “Removable Storage”. For convenience we provide a file delete event below:

XX/XX/XXXX 05:54:43 PM
LogName=Security
SourceName=Microsoft Windows security auditing.
EventCode=4663
EventType=0
Type=Information
ComputerName=DESKTOP-8HSPO8Q
TaskCategory=Removable Storage
OpCode=Info
RecordNumber=1211
Keywords=Audit Success
Message=An attempt was made to access an object.

Subject:
Security ID:S-1-5-21-XXXXXXX-XXXXXXXXX-XXXXXXXXXX-XXXX
Account Name:User
Account Domain:DESKTOP-8HSPO8Q
Logon ID:0x229E9

Object:
Object Server:Security
Object Type:File
Object Name:\Device\HarddiskVolume7\New Microsoft Word Document.docx
Handle ID:0x1404
Resource Attributes:

Process Information:
Process ID:0x17b4
Process Name:C:\Windows\explorer.exe

Access Request Information:
Accesses:DELETE

Access Mask:0x10000
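Most of the message body above is colon-delimited, so the interesting fields can be pulled out with a simple split. A rough sketch (not a full Windows event parser; the sample lines are abbreviated from the event above):

```python
def parse_4663_fields(raw_message):
    # Flatten "Key:Value" lines into a dict, skipping section headers like "Object:"
    fields = {}
    for line in raw_message.splitlines():
        key, sep, value = line.partition(":")
        if sep and value.strip():
            fields[key.strip()] = value.strip()
    return fields

sample = ("Object Name:\\Device\\HarddiskVolume7\\report.docx\n"
          "Process Name:C:\\Windows\\explorer.exe\n"
          "Accesses:DELETE")
fields = parse_4663_fields(sample)
print(fields["Accesses"])  # → DELETE
```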



Windows Event Collection
Now that the logs are being generated, they need to be forwarded from the endpoints to a central location—in this case Splunk. This task could be accomplished using a number of methods such as Windows Event Collector (WEC), a Splunk Universal Forwarder agent, or some other forwarding method. For this demo, we will use a Splunk Universal Forwarder as shown in the next section.

Splunk

While we are assuming a functional Splunk Enterprise installation exists, we still need to collect the logs. We provide a sample Splunk Universal Forwarder configuration file below to help those using the Splunk Universal Forwarder. Note: we will be placing the events into an index called wineventlog. If this index does not already exist, you will first need to create it.

inputs.conf 
Located on the Windows endpoint (Usually found here:  C:\Program Files\SplunkUniversalForwarder\etc\apps\SplunkUniversalForwarder\local\inputs.conf)

[WinEventLog://Security]
index = wineventlog
checkpointInterval = 5
current_only = 0
disabled = 0
start_from = oldest
whitelist = 4663, 4656


Once the inputs.conf file is properly configured (and the universal forwarder restarted) to collect these logs from the endpoint, we need to verify that the logs are reaching Splunk. Try running the following Splunk search:

index=wineventlog 


If you see results, try something more specific, such as either of the following:

index=wineventlog EventCode=4663
index=wineventlog EventCode=4656


Conclusion

Now that we have the proper event IDs flowing into Splunk, we created a Removable Storage Activity dashboard. The dashboard provides statistical analysis of top accounts, hostnames, actions, and processes. It even includes events over time by hostname and action, along with the details needed to investigate USB connections. Because some applications scan or otherwise interact with removable storage, it may be necessary to add filters, customized for each environment, to reduce noise. For your convenience, we included the dashboard code below.

Acknowledgement and References

https://www.eventtracker.com/tech-articles/tracking-removable-storage-windows-security-log/
https://docs.microsoft.com/en-us/previous-versions/windows/it-pro/windows-server-2012-R2-and-2012/jj574128(v=ws.11)


Dashboard Code

The following dashboard assumes that the appropriate logs are being collected and sent to Splunk. Additionally, the dashboard code assumes an index of wineventlog. Feel free to adjust as necessary. Splunk dashboard code provided below:


<form>
  <label>Removable Storage Activity</label>
  <description>index=wineventlog EventCode=4663 TaskCategory="Removable Storage"</description>
  <fieldset autoRun="true" submitButton="true">
    <input type="time" token="time">
      <label>Time Range</label>
      <default>
        <earliest>0</earliest>
        <latest></latest>
      </default>
    </input>
    <input type="text" token="wild">
      <label>Wildcard Search</label>
      <default>*</default>
      <initialValue>*</initialValue>
    </input>
    <input type="multiselect" token="Accesses">
      <label>Actions (Accesses)</label>
      <choice value="*">All</choice>
      <choice value="ReadData (or ListDirectory)">ReadData (or ListDirectory)</choice>
      <choice value="WriteData (or AddFile)">WriteData (or AddFile)</choice>
      <choice value="AppendData (or AddSubdirectory or CreatePipeInstance)">AppendData (or AddSubdirectory or CreatePipeInstance)</choice>
      <choice value="DELETE">DELETE</choice>
      <default>*</default>
      <initialValue>*</initialValue>
      <valuePrefix>Accesses="</valuePrefix>
      <valueSuffix>"</valueSuffix>
      <delimiter> OR </delimiter>
    </input>
  </fieldset>
  <row>
    <panel>
      <single>
        <title>Total events</title>
        <search>
          <query>index=wineventlog EventCode=4663 TaskCategory="Removable Storage" $wild$ $Accesses$ | dedup _time, Account_Domain, ComputerName, Account_Name, Accesses, Process_Name, Object_Name | table _time, Account_Domain, ComputerName, Account_Name, Accesses, Process_Name, Object_Name | stats count</query>
          <earliest>$time.earliest$</earliest>
          <latest>$time.latest$</latest>
        </search>
        <option name="drilldown">all</option>
        <option name="refresh.display">progressbar</option>
      </single>
    </panel>
    <panel>
      <table>
        <title>Top Account_Domain</title>
        <search>
          <query>index=wineventlog EventCode=4663 TaskCategory="Removable Storage" $wild$ $Accesses$ | dedup _time, Account_Domain, ComputerName, Account_Name, Accesses, Process_Name, Object_Name | table _time, Account_Domain, ComputerName, Account_Name, Accesses, Process_Name, Object_Name | top limit=0 Account_Domain</query>
          <earliest>$time.earliest$</earliest>
          <latest>$time.latest$</latest>
        </search>
        <option name="drilldown">cell</option>
        <option name="refresh.display">progressbar</option>
      </table>
    </panel>
    <panel>
      <table>
        <title>Top ComputerName</title>
        <search>
          <query>index=wineventlog EventCode=4663 TaskCategory="Removable Storage" $wild$ $Accesses$ | dedup _time, Account_Domain, ComputerName, Account_Name, Accesses, Process_Name, Object_Name | table _time, Account_Domain, ComputerName, Account_Name, Accesses, Process_Name, Object_Name | top limit=0 ComputerName</query>
          <earliest>$time.earliest$</earliest>
          <latest>$time.latest$</latest>
        </search>
        <option name="drilldown">cell</option>
        <option name="refresh.display">progressbar</option>
      </table>
    </panel>
    <panel>
      <table>
        <title>Top Account_Name</title>
        <search>
          <query>index=wineventlog EventCode=4663 TaskCategory="Removable Storage" $wild$ $Accesses$ | dedup _time, Account_Domain, ComputerName, Account_Name, Accesses, Process_Name, Object_Name | table _time, Account_Domain, ComputerName, Account_Name, Accesses, Process_Name, Object_Name | top limit=0 Account_Name</query>
          <earliest>$time.earliest$</earliest>
          <latest>$time.latest$</latest>
        </search>
        <option name="drilldown">cell</option>
        <option name="refresh.display">progressbar</option>
      </table>
    </panel>
  </row>
  <row>
    <panel>
      <table>
        <title>Top Accesses</title>
        <search>
          <query>index=wineventlog EventCode=4663 TaskCategory="Removable Storage" $wild$ $Accesses$ | dedup _time, Account_Domain, ComputerName, Account_Name, Accesses, Process_Name, Object_Name | table _time, Account_Domain, ComputerName, Account_Name, Accesses, Process_Name, Object_Name | top limit=0 Accesses</query>
          <earliest>$time.earliest$</earliest>
          <latest>$time.latest$</latest>
        </search>
        <option name="drilldown">cell</option>
        <option name="refresh.display">progressbar</option>
      </table>
    </panel>
    <panel>
      <table>
        <title>Top Process_Name</title>
        <search>
          <query>index=wineventlog EventCode=4663 TaskCategory="Removable Storage" $wild$ $Accesses$ | dedup _time, Account_Domain, ComputerName, Account_Name, Accesses, Process_Name, Object_Name | table _time, Account_Domain, ComputerName, Account_Name, Accesses, Process_Name, Object_Name | top limit=0 Process_Name</query>
          <earliest>$time.earliest$</earliest>
          <latest>$time.latest$</latest>
        </search>
        <option name="drilldown">cell</option>
        <option name="refresh.display">progressbar</option>
      </table>
    </panel>
  </row>
  <row>
    <panel>
      <chart>
        <title>Activity Over Time</title>
        <search>
          <query>index=wineventlog EventCode=4663 TaskCategory="Removable Storage" $wild$ $Accesses$ | dedup _time, Account_Domain, ComputerName, Account_Name, Accesses, Process_Name, Object_Name | eval ComputerAction = ComputerName + ":" + Accesses | timechart count(ComputerAction) by ComputerAction</query>
          <earliest>$time.earliest$</earliest>
          <latest>$time.latest$</latest>
        </search>
        <option name="charting.chart">column</option>
        <option name="charting.drilldown">none</option>
        <option name="refresh.display">progressbar</option>
      </chart>
    </panel>
  </row>
  <row>
    <panel>
      <table>
        <title>Details</title>
        <search>
          <query>index=wineventlog EventCode=4663 TaskCategory="Removable Storage" $wild$ $Accesses$ | dedup _time, Account_Domain, ComputerName, Account_Name, Accesses, Process_Name, Object_Name | table _time, Account_Domain, ComputerName, Account_Name, Accesses, Process_Name, Object_Name</query>
          <earliest>$time.earliest$</earliest>
          <latest>$time.latest$</latest>
        </search>
        <option name="dataOverlayMode">none</option>
        <option name="drilldown">cell</option>
        <option name="percentagesRow">false</option>
        <option name="refresh.display">progressbar</option>
        <option name="rowNumbers">false</option>
        <option name="totalsRow">false</option>
        <option name="wrap">true</option>
      </table>
    </panel>
  </row>
</form>



Parsing and Displaying AirWatch Data in Splunk

By Tony Lee

Have you ever searched for a Splunk app or TA and come up empty? We have too...  Not to worry though: with a little parsing and some dashboarding, we can create visibility where little existed before. This was exactly the case when we tried to parse AirWatch logs (https://www.air-watch.com/).


Figure 1:  At the time of writing this article, no app or TA existed for AirWatch.

If you have this same situation, hopefully we can help you too. Below is the process we followed, along with the regex we used and the final dashboard produced. As a bonus, we also include the dashboard code at the end of the article.


Figure 2:  Final dashboard to display airwatch data


Raw Log

Mar 15 07:43:45 airwatchhost Mar 15 13:43:45 AirWatch AirWatch Syslog Details are as follows Event Type: Device
Event: SecurityInformationConfirmed
User: sysadmin
Enrollment User: TLEE
Event Source: Device
Event Module: Devices
Event Category: Command
Event Data: 
Device Friendly Name: TLEE iPhone iOS 12.1.0 GRY9


Fields we need to parse


  • Event Type
  • Event
  • User
  • Enrollment User
  • Event Source
  • Event Module
  • Event Category
  • Event Data
  • Device Friendly Name

Regular Expression Needed

There may be more graceful ways to parse these logs, but this seemed to work for us.  Go to Settings > Fields > Field Extractions > New Field Extraction.  For the fields use the following:

  • Select the app
  • Name:  All-Airwatch-Fields
  • Select the sourcetype for airwatch data
  • Inline
  • Extraction:  Copy and paste what we have below


Event\sType:\s(?P<EventType>.*?)\sEvent:\s(?P<Event>.*?)\sUser:\s(?P<User>.*?)\sEnrollment\sUser:\s(?P<EnrollmentUser>.*?)\sEvent\sSource:\s(?P<EventSource>.*?)\sEvent\sModule:\s(?P<EventModule>.*?)\sEvent\sCategory:\s(?P<EventCategory>.*?)\sEvent\sData:\s(?P<EventData>.*?)\sDevice\sFriendly\sName:\s(?P<DeviceFriendlyName>.*)
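Before saving the extraction, it can be handy to sanity-check the regex offline. Python's re engine is close enough to Splunk's PCRE for this pattern, so a quick sketch using the sample event from above:

```python
import re

# Same pattern as the Splunk field extraction above.
PATTERN = re.compile(
    r"Event\sType:\s(?P<EventType>.*?)\sEvent:\s(?P<Event>.*?)\sUser:\s(?P<User>.*?)"
    r"\sEnrollment\sUser:\s(?P<EnrollmentUser>.*?)\sEvent\sSource:\s(?P<EventSource>.*?)"
    r"\sEvent\sModule:\s(?P<EventModule>.*?)\sEvent\sCategory:\s(?P<EventCategory>.*?)"
    r"\sEvent\sData:\s(?P<EventData>.*?)\sDevice\sFriendly\sName:\s(?P<DeviceFriendlyName>.*)"
)

# The multi-line portion of the sample AirWatch event from the Raw Log section.
sample = (
    "Event Type: Device\n"
    "Event: SecurityInformationConfirmed\n"
    "User: sysadmin\n"
    "Enrollment User: TLEE\n"
    "Event Source: Device\n"
    "Event Module: Devices\n"
    "Event Category: Command\n"
    "Event Data: \n"
    "Device Friendly Name: TLEE iPhone iOS 12.1.0 GRY9"
)

fields = PATTERN.search(sample).groupdict()
print(fields["EnrollmentUser"], fields["DeviceFriendlyName"])
```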

You should not need to restart Splunk, but give it about five minutes, then search your index and sourcetype again in Verbose mode; the fields should now be parsed.

Conclusion

Even though we did not have a Splunk TA or app to help create visibility, we built it ourselves using the flexibility provided within Splunk. We hope this article helps others save time. Whether it worked for you or not, feel free to leave a comment below. Happy Splunking!

Dashboard Code

The following dashboard assumes that the appropriate logs are being collected and sent to Splunk. Additionally, the dashboard code assumes an index of airwatch. Feel free to adjust as necessary. Splunk dashboard code provided below:


<form>
  <label>Airwatch</label>
  <fieldset submitButton="true" autoRun="true">
    <input type="time" token="time">
      <label>Time Range</label>
      <default>
        <earliest>-60m@m</earliest>
        <latest>now</latest>
      </default>
    </input>
    <input type="text" token="wild">
      <label>Wildcard Search</label>
      <default>*</default>
      <initialValue>*</initialValue>
    </input>
  </fieldset>
  <row>
    <panel>
      <single>
        <title>Event Count</title>
        <search>
          <query>index=airwatch $wild$ | table _time, EventType, Event, User, EnrollmentUser, EventSource, EventModule, EventCategory, EventData, DeviceFriendlyName | stats count</query>
          <earliest>$time.earliest$</earliest>
          <latest>$time.latest$</latest>
        </search>
        <option name="drilldown">all</option>
      </single>
    </panel>
    <panel>
      <table>
        <title>Top Event</title>
        <search>
          <query>index=airwatch $wild$ | table _time, EventType, Event, User, EnrollmentUser, EventSource, EventModule, EventCategory, EventData, DeviceFriendlyName | top limit=0 Event</query>
          <earliest>$time.earliest$</earliest>
          <latest>$time.latest$</latest>
        </search>
        <option name="count">10</option>
        <option name="drilldown">cell</option>
      </table>
    </panel>
    <panel>
      <table>
        <title>Top EventModule</title>
        <search>
          <query>index=airwatch $wild$ | table _time, EventType, Event, User, EnrollmentUser, EventSource, EventModule, EventCategory, EventData, DeviceFriendlyName | top limit=0 EventModule</query>
          <earliest>$time.earliest$</earliest>
          <latest>$time.latest$</latest>
        </search>
        <option name="count">10</option>
        <option name="drilldown">cell</option>
        <option name="refresh.display">progressbar</option>
      </table>
    </panel>
    <panel>
      <table>
        <title>Top Enrollment User</title>
        <search>
          <query>index=airwatch $wild$ | table _time, EventType, Event, User, EnrollmentUser, EventSource, EventModule, EventCategory, EventData, DeviceFriendlyName | top limit=0 EnrollmentUser</query>
          <earliest>$time.earliest$</earliest>
          <latest>$time.latest$</latest>
        </search>
        <option name="count">10</option>
        <option name="drilldown">cell</option>
      </table>
    </panel>
    <panel>
      <table>
        <title>Top Device Friendly Name</title>
        <search>
          <query>index=airwatch $wild$ | table _time, EventType, Event, User, EnrollmentUser, EventSource, EventModule, EventCategory, EventData, DeviceFriendlyName | top limit=0 DeviceFriendlyName</query>
          <earliest>$time.earliest$</earliest>
          <latest>$time.latest$</latest>
        </search>
        <option name="count">10</option>
        <option name="drilldown">cell</option>
      </table>
    </panel>
  </row>
  <row>
    <panel>
      <chart>
        <title>Top Event over Time</title>
        <search>
          <query>index=airwatch $wild$ | table _time, EventType, Event, User, EnrollmentUser, EventSource, EventModule, EventCategory, EventData, DeviceFriendlyName | timechart count by Event</query>
          <earliest>$time.earliest$</earliest>
          <latest>$time.latest$</latest>
        </search>
        <option name="charting.chart">line</option>
        <option name="charting.drilldown">none</option>
        <option name="refresh.display">progressbar</option>
      </chart>
    </panel>
    <panel>
      <chart>
        <title>Top Enrollment User over Time</title>
        <search>
          <query>index=airwatch $wild$ | table _time, EventType, Event, User, EnrollmentUser, EventSource, EventModule, EventCategory, EventData, DeviceFriendlyName | timechart count by EnrollmentUser</query>
          <earliest>$time.earliest$</earliest>
          <latest>$time.latest$</latest>
        </search>
        <option name="charting.chart">line</option>
        <option name="charting.drilldown">none</option>
      </chart>
    </panel>
  </row>
  <row>
    <panel>
      <table>
        <title>Details</title>
        <search>
          <query>index=airwatch $wild$ | table _time, EventType, Event, User, EnrollmentUser, EventSource, EventModule, EventCategory, EventData, DeviceFriendlyName</query>
          <earliest>$time.earliest$</earliest>
          <latest>$time.latest$</latest>
          <sampleRatio>1</sampleRatio>
        </search>
        <option name="count">10</option>
        <option name="dataOverlayMode">none</option>
        <option name="drilldown">cell</option>
        <option name="percentagesRow">false</option>
        <option name="refresh.display">progressbar</option>
        <option name="rowNumbers">false</option>
        <option name="totalsRow">false</option>
        <option name="wrap">true</option>
      </table>
    </panel>
  </row>
</form>


Parsing and Displaying Cisco ISE Data in Splunk

By Tony Lee

If you are reading this page, chances are good that you have both Splunk and Cisco Identity Services Engine (ISE). Chances are also pretty good that you have seen what a challenge it can be to parse these logs and turn them into a useful dashboard. Granted, one of the 50+ Cisco app and TA combinations on Splunkbase may work for you, but if you strike out there, you can always try the solution and dashboard provided in the article below.

To get started, you should at least install the TA found here to parse the fields: http://splunkbase.splunk.com/app/1915 and give your incoming Cisco ISE syslog stream a sourcetype of "cisco:ise:syslog" per the documentation here:  http://docs.splunk.com/Documentation/AddOns/released/CiscoISE/Datatypes.  If you have data flowing and the fields are parsed out, we are in business.
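For reference, if the ISE syslog stream comes straight into Splunk over a network input, the sourcetype can be assigned on the input itself. A minimal sketch (the port and index name here are assumptions; adjust to your environment):

```ini
# inputs.conf on the receiving Splunk instance (UDP port is an example)
[udp://514]
index = cisco-ise
sourcetype = cisco:ise:syslog
```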

Now, hold on to your hats, we are about to dive into the world of Cisco ISE logs and figure out just how to create the following dashboard.


Figure 1:  A useful Cisco ISE dashboard with all necessary data.

Caveat:  This article assumes that you called your Cisco ISE index "cisco-ise".  If you did not, just change the commands and dashboard to fit your index name.

The Problem

As mentioned in the introduction, the logs are a bit messy. The upside is that they are data rich. There is so much that you can extract from the logs, but first you need to piece them back together, quite literally. The logs are sent over in chunks as shown below:

2019-08-06T16:33:06+00:00 HOST CISE_Passed_Authentications 0000649495 4 3  ....

2019-08-06T16:33:06+00:00 HOST CISE_Passed_Authentications 0000649495 4 2  ....

2019-08-06T16:33:06+00:00 HOST CISE_Passed_Authentications 0000649495 4 1  ....

2019-08-06T16:33:06+00:00 HOST CISE_Passed_Authentications 0000649495 4 0  ....

Here is the kicker: those four events are related, as indicated by the first large number (which we are calling event_id), followed by two more numbers, the last of which is a per-chunk sequence counter. When combined into one event, they contain a ton of data. So, how do we combine the events?


The Solution

Fortunately, Splunk has a transaction command that we can use to indicate that the events are related and should be combined into one. There is a catch, though: the event_id field is not parsed by the Splunk TA mentioned in the introduction, so we will need to extract it ourselves.

We can parse it with the following gnarly regex:
^(?:[^ \n]* ){3}(?P<event_id>\d+)\s+

Figure 2:  event_id parsed using a Splunk field extraction
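To verify the pattern behaves as expected, you can test it offline against one of the raw chunks shown earlier (a quick sketch; the payload after the sequence numbers is elided just as in the sample):

```python
import re

# Same pattern as the field extraction: skip three space-delimited tokens
# (timestamp, host, message type), then capture the digits that follow.
EVENT_ID = re.compile(r"^(?:[^ \n]* ){3}(?P<event_id>\d+)\s+")

chunk = "2019-08-06T16:33:06+00:00 HOST CISE_Passed_Authentications 0000649495 4 3  ...."
m = EVENT_ID.match(chunk)
print(m.group("event_id"))  # 0000649495
```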

With the event_id parsed, we can now use the transaction command to combine the four chunks into a single event, as shown in the following search:

index=cisco-ise | transaction event_id

Now, let's take it a bit further and table the most interesting fields (feel free to leave a comment if you think we left out an interesting field):

index=cisco-ise | transaction event_id | table _time, host, event_id, NetworkDeviceName, NAS_IP_Address, NAS_Port, Location, SelectedAuthenticationIdentityStores, SelectedAuthorizationProfiles, SSID, ISEPolicySetName, UserName, EndPointMACAddress, Calling_Station_ID, Called_Station_ID
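For intuition, here is roughly what transaction is doing for us, sketched in Python: group the chunks that share an event_id, order them by the trailing sequence number, and stitch them into one logical event (payloads elided as in the sample above):

```python
import re
from collections import defaultdict

# Header: three space-delimited tokens, then event_id, total count, sequence.
HEADER = re.compile(r"^(?:[^ \n]* ){3}(?P<event_id>\d+) (?P<total>\d+) (?P<seq>\d+)")

chunks = [
    "2019-08-06T16:33:06+00:00 HOST CISE_Passed_Authentications 0000649495 4 3  ....",
    "2019-08-06T16:33:06+00:00 HOST CISE_Passed_Authentications 0000649495 4 2  ....",
    "2019-08-06T16:33:06+00:00 HOST CISE_Passed_Authentications 0000649495 4 1  ....",
    "2019-08-06T16:33:06+00:00 HOST CISE_Passed_Authentications 0000649495 4 0  ....",
]

# Group by event_id, keeping each chunk with its sequence number.
events = defaultdict(list)
for c in chunks:
    m = HEADER.match(c)
    events[m.group("event_id")].append((int(m.group("seq")), c))

# Reassemble each event in sequence order, mimicking `transaction event_id`.
combined = {eid: " ".join(c for _, c in sorted(parts))
            for eid, parts in events.items()}
print(len(combined), "combined event(s)")
```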

Conclusion

We hope that this article has been helpful in understanding Cisco ISE logs and how to combine them into single, feature-rich events. As always, happy Splunking!

Dashboard Code

The dashboard code below assumes the index is cisco-ise and the Cisco TA is properly parsing the data. Please adjust as necessary.

<form>
  <label>Cisco ISE</label>
  <description>Populated by syslog data</description>
  <fieldset submitButton="true" autoRun="false">
    <input type="time" token="time" searchWhenChanged="true">
      <label>Time Range</label>
      <default>
        <earliest>-8h@h</earliest>
        <latest>now</latest>
      </default>
    </input>
    <input type="text" token="wild">
      <label>Wildcard</label>
      <default>*</default>
    </input>
  </fieldset>
  <row>
    <panel>
      <table>
        <title>Top Network Device Name</title>
        <search>
          <query>index=cisco-ise | transaction event_id | table _time, host, event_id, NetworkDeviceName, NAS_IP_Address, NAS_Port, Location, SelectedAuthenticationIdentityStores, SelectedAuthorizationProfiles, SSID, ISEPolicySetName, UserName, EndPointMACAddress, Calling_Station_ID, Called_Station_ID | search $wild$ | top limit=0 NetworkDeviceName</query>
          <earliest>$time.earliest$</earliest>
          <latest>$time.latest$</latest>
        </search>
        <option name="count">10</option>
        <option name="drilldown">cell</option>
        <option name="refresh.display">progressbar</option>
      </table>
    </panel>
    <panel>
      <table>
        <title>Top Location</title>
        <search>
          <query>index=cisco-ise | transaction event_id | table _time, host, event_id, NetworkDeviceName, NAS_IP_Address, NAS_Port, Location, SelectedAuthenticationIdentityStores, SelectedAuthorizationProfiles, SSID, ISEPolicySetName, UserName, EndPointMACAddress, Calling_Station_ID, Called_Station_ID | search $wild$ | top limit=0 Location</query>
          <earliest>$time.earliest$</earliest>
          <latest>$time.latest$</latest>
        </search>
        <option name="count">10</option>
        <option name="drilldown">cell</option>
        <option name="refresh.display">progressbar</option>
      </table>
    </panel>
    <panel>
      <table>
        <title>Top ISE Policy Set Name</title>
        <search>
          <query>index=cisco-ise | transaction event_id | table _time, host, event_id, NetworkDeviceName, NAS_IP_Address, NAS_Port, Location, SelectedAuthenticationIdentityStores, SelectedAuthorizationProfiles, SSID, ISEPolicySetName, UserName, EndPointMACAddress, Calling_Station_ID, Called_Station_ID | search $wild$ | top limit=0 ISEPolicySetName</query>
          <earliest>$time.earliest$</earliest>
          <latest>$time.latest$</latest>
        </search>
        <option name="count">10</option>
        <option name="drilldown">cell</option>
        <option name="refresh.display">progressbar</option>
      </table>
    </panel>
    <panel>
      <table>
        <title>Top User Name</title>
        <search>
          <query>index=cisco-ise | transaction event_id | table _time, host, event_id, NetworkDeviceName, NAS_IP_Address, NAS_Port, Location, SelectedAuthenticationIdentityStores, SelectedAuthorizationProfiles, SSID, ISEPolicySetName, UserName, EndPointMACAddress, Calling_Station_ID, Called_Station_ID | search $wild$ | top limit=0 UserName</query>
          <earliest>$time.earliest$</earliest>
          <latest>$time.latest$</latest>
        </search>
        <option name="count">10</option>
        <option name="drilldown">cell</option>
        <option name="refresh.display">progressbar</option>
      </table>
    </panel>
  </row>
  <row>
    <panel>
      <table>
        <title>Details</title>
        <search>
          <query>index=cisco-ise | transaction event_id | table _time, host, event_id, NetworkDeviceName, NAS_IP_Address, NAS_Port, Location, SelectedAuthenticationIdentityStores, SelectedAuthorizationProfiles, SSID, ISEPolicySetName, UserName, EndPointMACAddress, Calling_Station_ID, Called_Station_ID | search $wild$</query>
          <earliest>$time.earliest$</earliest>
          <latest>$time.latest$</latest>
          <sampleRatio>1</sampleRatio>
        </search>
        <option name="count">10</option>
        <option name="dataOverlayMode">none</option>
        <option name="drilldown">cell</option>
        <option name="percentagesRow">false</option>
        <option name="refresh.display">progressbar</option>
        <option name="rowNumbers">false</option>
        <option name="totalsRow">false</option>
        <option name="wrap">true</option>
      </table>
    </panel>
  </row>
</form>

Parsing and Displaying Infoblox DNS Data in Splunk

By Tony Lee

If you are reading this page, chances are good that you have both Splunk and Infoblox DNS. While there is a pre-built TA (https://splunkbase.splunk.com/app/2934/) to help with the parsing, we needed some visualizations, so we wrote our own and figured we would share what we created.


Figure 1:  At the time of writing this article, only a TA existed for Infoblox DNS.

If you have this same situation, hopefully we can help you too. As a bonus, we will include the dashboard code at the end of the article.

Figure 2:  Dashboard that we include at the end of the article

Raw Log

This is what an Infoblox raw log might look like:

30-Apr-2013 13:35:02.187 client 10.120.20.32#42386: query: foo.com IN A + (100.90.80.102)

Source:  https://docs.infoblox.com/display/NAG8/Capturing+DNS+Queries+and+Responses
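To illustrate what gets extracted from a query line like the one above, here is a simplified offline sketch; the field names mirror the TA's, but the regex itself is our own approximation, not the TA's actual extraction:

```python
import re

# Approximate extraction for an Infoblox DNS query log line.
QUERY = re.compile(
    r"client (?P<dns_request_client_ip>[\d.]+)#(?P<dns_request_client_port>\d+): "
    r"query: (?P<query>\S+) IN (?P<record_type>\S+)"
)

line = ("30-Apr-2013 13:35:02.187 client 10.120.20.32#42386: "
        "query: foo.com IN A + (100.90.80.102)")

m = QUERY.search(line)
print(m.group("query"), m.group("record_type"))  # foo.com A
```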


Fields to Parse

Fortunately, our job is taken care of by the Infoblox TA (https://splunkbase.splunk.com/app/2934/)!  Just use the sourcetype of infoblox:dns to ensure it is properly parsed.

Conclusion

Even though we only had a Splunk TA (and not an app to go with it), we used the flexibility provided within Splunk to gain insight into Infoblox DNS logs. We hope this article helps others save time. Feel free to leave comments in the section below. Happy Splunking!

Dashboard Code

The following dashboard assumes that the appropriate logs are being collected and sent to Splunk. Additionally, the dashboard code assumes an index of infoblox and a sourcetype of infoblox:dns. Feel free to adjust as necessary. Splunk dashboard code provided below:


<form>
  <label>Infoblox DNS</label>
  <description>This is a high volume data feed - Be mindful of your time range</description>
  <fieldset submitButton="true">
    <input type="time" token="time" searchWhenChanged="true">
      <label>Time Range</label>
      <default>
        <earliest>-15m</earliest>
        <latest>now</latest>
      </default>
    </input>
    <input type="text" token="wild" searchWhenChanged="true">
      <label>Wildcard Search</label>
      <default>*</default>
    </input>
  </fieldset>
  <row>
    <panel>
      <table>
        <title>Total DNS Traffic by Infoblox Host</title>
        <search>
          <query>| tstats count where index=infoblox sourcetype="infoblox:dns" by host</query>
          <earliest>$time.earliest$</earliest>
          <latest>$time.latest$</latest>
        </search>
        <option name="count">10</option>
        <option name="drilldown">none</option>
        <option name="refresh.display">progressbar</option>
      </table>
    </panel>
    <panel>
      <table>
        <title>Top dns_request_client_ip</title>
        <search>
          <query>index=infoblox sourcetype="infoblox:dns" $wild$ | table _time, host, message_type, record_type, query, dns_request_client_ip, dns_request_client_port,  dns_request_name_serverIP, named_message | top limit=0 dns_request_client_ip</query>
          <earliest>$time.earliest$</earliest>
          <latest>$time.latest$</latest>
        </search>
        <option name="drilldown">cell</option>
        <option name="refresh.display">progressbar</option>
      </table>
    </panel>
    <panel>
      <table>
        <title>Top message_type</title>
        <search>
          <query>index=infoblox sourcetype="infoblox:dns" $wild$ | table _time, host, message_type, record_type, query, dns_request_client_ip, dns_request_client_port,  dns_request_name_serverIP, named_message | top limit=0 message_type</query>
          <earliest>$time.earliest$</earliest>
          <latest>$time.latest$</latest>
        </search>
        <option name="count">10</option>
        <option name="drilldown">cell</option>
        <option name="refresh.display">progressbar</option>
      </table>
    </panel>
    <panel>
      <table>
        <title>Top record_type</title>
        <search>
          <query>index=infoblox sourcetype="infoblox:dns" $wild$ | table _time, host, message_type, record_type, query, dns_request_client_ip, dns_request_client_port,  dns_request_name_serverIP, named_message | top limit=0 record_type</query>
          <earliest>$time.earliest$</earliest>
          <latest>$time.latest$</latest>
        </search>
        <option name="drilldown">cell</option>
        <option name="refresh.display">progressbar</option>
      </table>
    </panel>
    <panel>
      <table>
        <title>Top query</title>
        <search>
          <query>index=infoblox sourcetype="infoblox:dns" $wild$ | table _time, host, message_type, record_type, query, dns_request_client_ip, dns_request_client_port,  dns_request_name_serverIP, named_message | top limit=0 query</query>
          <earliest>$time.earliest$</earliest>
          <latest>$time.latest$</latest>
        </search>
        <option name="drilldown">cell</option>
        <option name="refresh.display">progressbar</option>
      </table>
    </panel>
  </row>
  <row>
    <panel>
      <table>
        <search>
          <query>index=infoblox sourcetype="infoblox:dns" $wild$ | table _time, host, message_type, record_type, query, dns_request_client_ip, dns_request_client_port,  dns_request_name_serverIP, named_message</query>
          <earliest>$time.earliest$</earliest>
          <latest>$time.latest$</latest>
          <sampleRatio>1</sampleRatio>
        </search>
        <option name="count">10</option>
        <option name="dataOverlayMode">none</option>
        <option name="drilldown">cell</option>
        <option name="percentagesRow">false</option>
        <option name="refresh.display">progressbar</option>
        <option name="rowNumbers">false</option>
        <option name="totalsRow">false</option>
        <option name="wrap">true</option>
      </table>
    </panel>
  </row>
</form>



Parsing and Displaying Infoblox DHCP Data in Splunk

By Tony Lee

This article builds on our Infoblox DNS article available at:  http://securitysynapse.com/2019/01/parsing-and-displaying-infoblox-dns-in-splunk.html

If you are reading this page, chances are good that you have both Splunk and Infoblox DHCP. While there is a pre-built TA (https://splunkbase.splunk.com/app/2934/) to help with the parsing, we needed some visualizations, so we wrote our own and figured we would share what we created.


Figure 1:  At the time of writing this article, only a TA existed for Infoblox DHCP.

If you have this same situation, hopefully we can help you too. As a bonus, we will include the dashboard code at the end of the article.

Figure 2:  Dashboard that we include at the end of the article

Raw Log

This is what an Infoblox raw log might look like:

Sep 4 09:23:44 10.34.6.28 dhcpd[20310]: DHCPACK on 70.1.20.250 to fc:5c:fc:5f:10:85 via eth1 relay 10.120.20.66 lease-duration 600

Source:  https://docs.infoblox.com/display/NAG8/Using+a+Syslog+Server
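To illustrate the kind of fields pulled from a DHCP lease line like the one above, here is a simplified offline sketch; the field names are illustrative approximations of the TA's, and the regex is our own:

```python
import re

# Approximate extraction for an Infoblox dhcpd syslog line.
DHCP = re.compile(
    r"dhcpd\[\d+\]: (?P<action>DHCP\w+) on (?P<dest_ip>[\d.]+) "
    r"to (?P<dest_mac>[0-9a-fA-F:]+) via (?P<interface>\S+)"
    r"(?: relay (?P<relay>[\d.]+))?"
)

line = ("Sep 4 09:23:44 10.34.6.28 dhcpd[20310]: DHCPACK on 70.1.20.250 "
        "to fc:5c:fc:5f:10:85 via eth1 relay 10.120.20.66 lease-duration 600")

m = DHCP.search(line)
print(m.group("action"), m.group("dest_mac"))  # DHCPACK fc:5c:fc:5f:10:85
```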


Fields to Parse

Fortunately, our job is taken care of by the Infoblox TA (https://splunkbase.splunk.com/app/2934/)!  Just use the sourcetype of infoblox:dhcp to ensure it is properly parsed.

Search String

Now that the data is parsed, we can use the following to table the data:

index=infoblox sourcetype="infoblox:dhcp" | table _time, host, action, signature, src_category, src_hostname, src_ip, src_mac, dest_category, dest_hostname, dest_ip, relay

Combine a few panels together and we will have a dashboard similar to the one in the dashboard code section at the bottom of the article.

Conclusion

Even though we only had a Splunk TA (and not an app to go with it), we used the flexibility provided within Splunk to gain insight into Infoblox DHCP logs. We hope this article helps others save time. Feel free to leave comments in the section below. Happy Splunking!

Dashboard Code

The following dashboard assumes that the appropriate logs are being collected and sent to Splunk. Additionally, the dashboard code assumes an index of infoblox. Feel free to adjust as necessary. Splunk dashboard code provided below:


<form>
  <label>Infoblox DHCP</label>
  <description>This is a high volume data feed - Be mindful of your time range</description>
  <fieldset submitButton="true">
    <input type="time" token="time" searchWhenChanged="true">
      <label>Time Range</label>
      <default>
        <earliest>-4h@h</earliest>
        <latest>now</latest>
      </default>
    </input>
    <input type="text" token="wild" searchWhenChanged="true">
      <label>Wildcard Search</label>
      <default>*</default>
    </input>
  </fieldset>
  <row>
    <panel>
      <table>
        <title>Total DHCP Traffic by Infoblox Host</title>
        <search>
          <query>| tstats count where index=infoblox sourcetype="infoblox:dhcp" by host</query>
          <earliest>$time.earliest$</earliest>
          <latest>$time.latest$</latest>
        </search>
        <option name="count">10</option>
        <option name="drilldown">cell</option>
        <option name="refresh.display">progressbar</option>
      </table>
    </panel>
    <panel>
      <table>
        <title>Top Action</title>
        <search>
          <query>index=infoblox sourcetype="infoblox:dhcp" $wild$ | table _time, host, action, signature, src_category, src_hostname, src_ip, src_mac, dest_category, dest_hostname, dest_ip, relay | top limit=0 action</query>
          <earliest>$time.earliest$</earliest>
          <latest>$time.latest$</latest>
        </search>
        <option name="drilldown">cell</option>
        <option name="refresh.display">progressbar</option>
      </table>
    </panel>
    <panel>
      <table>
        <title>Top signature</title>
        <search>
          <query>index=infoblox sourcetype="infoblox:dhcp" $wild$ | table _time, host, action, signature, src_category, src_hostname, src_ip, src_mac, dest_category, dest_hostname, dest_ip, relay | top limit=0 signature</query>
          <earliest>$time.earliest$</earliest>
          <latest>$time.latest$</latest>
        </search>
        <option name="drilldown">cell</option>
        <option name="refresh.display">progressbar</option>
      </table>
    </panel>
    <panel>
      <table>
        <title>Top Servicing Host</title>
        <search>
          <query>index=infoblox sourcetype="infoblox:dhcp" $wild$ | table _time, host, action, signature, src_category, src_hostname, src_ip, src_mac, dest_category, dest_hostname, dest_ip, relay | top limit=0 src_hostname</query>
          <earliest>$time.earliest$</earliest>
          <latest>$time.latest$</latest>
        </search>
        <option name="drilldown">cell</option>
        <option name="refresh.display">progressbar</option>
      </table>
    </panel>
    <panel>
      <table>
        <title>Top src_ip</title>
        <search>
          <query>index=infoblox sourcetype="infoblox:dhcp" $wild$ | table _time, host, action, signature, src_category, src_hostname, src_ip, src_mac, dest_category, dest_hostname, dest_ip, relay | top limit=0 src_ip</query>
          <earliest>$time.earliest$</earliest>
          <latest>$time.latest$</latest>
        </search>
        <option name="drilldown">cell</option>
        <option name="refresh.display">progressbar</option>
      </table>
    </panel>
    <panel>
      <table>
        <title>Top dest_ip</title>
        <search>
          <query>index=infoblox sourcetype="infoblox:dhcp" $wild$ |  table _time, host, action, signature, src_category, src_hostname, src_ip, src_mac, dest_category, dest_hostname, dest_ip, relay | top limit=0 dest_ip</query>
          <earliest>$time.earliest$</earliest>
          <latest>$time.latest$</latest>
        </search>
        <option name="drilldown">cell</option>
        <option name="refresh.display">progressbar</option>
      </table>
    </panel>
  </row>
  <row>
    <panel>
      <table>
        <title>Raw Logs</title>
        <search>
          <query>index=infoblox sourcetype="infoblox:dhcp" $wild$ | table _time, host, action, signature, src_category, src_hostname, src_ip, src_mac, dest_category, dest_hostname, dest_ip, relay</query>
          <earliest>$time.earliest$</earliest>
          <latest>$time.latest$</latest>
          <sampleRatio>1</sampleRatio>
        </search>
        <option name="count">10</option>
        <option name="dataOverlayMode">none</option>
        <option name="drilldown">cell</option>
        <option name="percentagesRow">false</option>
        <option name="refresh.display">progressbar</option>
        <option name="rowNumbers">false</option>
        <option name="totalsRow">false</option>
        <option name="wrap">true</option>
      </table>
    </panel>
  </row>
</form>




rsyslog fun - Basic Splunk Log Collection and Forwarding - Part I


By Tony Lee

We found it a bit surprising that there are so few articles on how to use an rsyslog server to forward logs to Splunk. This provided the motivation to write this article and hopefully save others some Googling. Choosing between rsyslog, syslog-ng, or other software is entirely up to the reader and may depend on their environment and approved/available software. We realize that this is not the only option for log architecture or collection, but it may help those faced with this task—especially if rsyslog is the standard in their environment.

Warnings

Before we jump in, we wanted to remind you of three potential gotchas that may thwart your success and give you a troubleshooting migraine.
  1. Network firewalls – You may not own this, but make sure the network path is clear
  2. iptables – Complex rule sets can throw you for a loop
  3. SELinux – Believe it or not, SELinux when locked down can prevent the writing of the log files

If something is not working the way you expect it to work, it is most likely due to one of the three items mentioned above. It could be worth temporarily disabling them until you get everything working. Just don’t forget to go back and lock it down.

Note:  We will also be using Splunk Universal Forwarders (UF) in this article.  Universal Forwarders have very little pre-processing or filtering capabilities when compared to Heavy Forwarders.  If significant filtering is necessary, consider using a Splunk Heavy Forwarder in the same fashion as we are using the UFs below.

Architecture

Whether your Splunk instance is on-prem or it is in the cloud, you will most likely need syslog collectors and forwarders at some point. The architecture diagram below shows one potential configuration. The number of each component is configurable and dependent upon the volume of traffic.



Figure 1:  Architecture diagram illustrating traffic flow from data sources to the Index Cluster

Rsyslog configuration

Rsyslog is a flexible service, but in this case rsyslog’s primary role will be to:

  • Open the sockets to accept data from the sources
  • Properly route traffic to local temporary files that Splunk will forward on to the indexers

If you are fortunate enough to be able to route network traffic to different ports, you may be able to reduce the if-then logic shown below for routing the events to separate files. In this case, we were not able to open separate ports from the load balancer, thus we needed to do the routing on our end. In the next article we will cover more advanced routing to include regex and traffic coming in on different ports.

Note:  Modern rsyslog is designed to include extra config files that exist in the /etc/rsyslog.d/ directory. If that directory exists, place the following 15-splunk-rsyslog.conf file in that directory. Otherwise, the /etc/rsyslog.conf file is interpreted from top to bottom, so make a copy of your current config file (cp /etc/rsyslog.conf /etc/rsyslog.bak) and selectively add the following at the top of the new active rsyslog.conf file. This addition to the rsyslog configuration will do the following (assuming the day is 2018-06-01):
  • Open TCP and UDP 514
  • Write all data from 192.168.1.1 to:  /rsyslog/cisco/192.168.1.1/2018-06-01.log
  • Write all data from 192.168.1.2 to:  /rsyslog/cisco/192.168.1.2/2018-06-01.log
  • Write all data from 10.1.1.* to /rsyslog/pan/10.1.1.*/2018-06-01.log (where * is the last octet of the source IP)
  • Write all remaining data to /rsyslog/unclaimed/<host>/2018-06-01.log (where <host> is the source IP or hostname of the sender)
Note:  If the rsyslog server sees the hosts by their hostname instead of IP address, feel free to use $fromhost == '<hostname>' in the configuration file below.

/etc/rsyslog.d/15-splunk-rsyslog.conf


$ModLoad imtcp
$ModLoad imudp
$UDPServerRun 514
$InputTCPServerRun 514

# do this in FRONT of the local/regular rules

$template ciscoFile,"/rsyslog/cisco/%fromhost%/%$YEAR%-%$MONTH%-%$DAY%.log"
$template PANFile,"/rsyslog/pan/%fromhost%/%$YEAR%-%$MONTH%-%$DAY%.log"
$template unclaimedFile,"/rsyslog/unclaimed/%fromhost%/%$YEAR%-%$MONTH%-%$DAY%.log"

if ($fromhost-ip == '192.168.1.1' or $fromhost-ip == '192.168.1.2') then ?ciscoFile
& stop

if $fromhost-ip startswith '10.1.1' then ?PANFile
& stop

else ?unclaimedFile
& stop

# local/regular rules, like
*.* /var/log/syslog.log



Note:  Rsyslog should create directories that don't already exist, but just in case it doesn't, you need to create the directories and make them writable.  For example:


mkdir -p /rsyslog/cisco/
mkdir -p /rsyslog/pan/
mkdir -p /rsyslog/unclaimed/



Pro tip:  After making changes to the rsyslog config file, you can verify that there are no syntax errors BEFORE you restart the rsyslog daemon. For a simple rsyslog config validation, try the following command:

rsyslogd -N 1

If there are no errors, then you should be good to restart the rsyslog service so your changes take effect:

service rsyslog restart

Log cleanup

The rsyslog servers in our setup are not intended to store the data permanently. They are intended to act as a caching server for temporary storage before shipping the logs off to the Splunk Indexers for proper long-term storage. Since disk space is not unlimited on these caching servers we will need to implement log rotation and deletion so we do not fill up the hard disk. Our rsyslog config file already takes care of the log rotation with the template parameter specifying the name of the file as “%$YEAR%-%$MONTH%-%$DAY%.log", however, we still need to clean up the files, so they don’t sit there indefinitely. One possible solution is to use a daily cron job to clean up files in the /rsyslog/ directory that are more than x days old (where x is defined by the organization). Once you have some files in the /rsyslog/ directory, try the following command to see what would potentially be deleted. The command below lists files in the rsyslog directory that are older than two days.

find /rsyslog/ -type f -mtime +1 -exec ls -l "{}" \;

If you are happy with a two-day cache period, add it to a daily cron job (as shown below).  Otherwise feel free to play with the +1 until you are comfortable with what it will delete and use that for your cron job.
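
The -mtime +1 cutoff can be verified on throwaway files before trusting it with real logs. A quick sketch (GNU touch -d assumed; the demo path is hypothetical):

```shell
# Demonstrate the -mtime +1 cutoff using touch to back-date a file
demo=/tmp/rsyslog-demo
mkdir -p "$demo"
touch -d "3 days ago" "$demo/old.log"   # back-dated: should be caught by the cutoff
touch "$demo/new.log"                   # fresh: should survive
find "$demo" -type f -mtime +1          # prints only the back-dated file
```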

/etc/cron.daily/logdelete.sh


find /rsyslog/ -type f -mtime +1 -delete



Splunk Universal Forwarder (UF) Configuration

Splunk Forwarders are very flexible in terms of data ingest. For example, they can create listening ports, monitor directories, run scripts, etc. In this case, since rsyslog is writing the information to a directory, we will use a Splunk UF to monitor those directories and send them to the appropriate indexes and label them with the appropriate sourcetypes.  See our example configuration below.

Note:  Make sure the indexes mentioned below exist prior to trying to send data there. These will need to be created within Splunk.  Also ensure that the UF is configured to forward data to indexers (out of the scope of this write up).

/opt/splunkforwarder/etc/apps/SplunkForwarder/local/inputs.conf 


[monitor:///rsyslog/cisco/]
whitelist = \.log$
host_segment=3
sourcetype = cisco:ios
index = cisco

[monitor:///rsyslog/pan/]
whitelist = \.log$
host_segment=3
sourcetype = pan:traffic
index = pan_logs

[monitor:///rsyslog/unclaimed/]
whitelist = \.log$
host_segment=3
sourcetype = syslog
index = lastchanceindex
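
As a sanity check on the host_segment=3 setting above: Splunk counts path segments after the leading slash, so for /rsyslog/cisco/192.168.1.1/2018-06-01.log the third segment is the source IP, which becomes the host field. A quick shell illustration (note that cut's field index is one higher because of the empty field before the leading slash):

```shell
# Segment 3 of the monitored path is the sender's IP; cut field 4 because field 1 is empty
echo "/rsyslog/cisco/192.168.1.1/2018-06-01.log" | cut -d/ -f4   # prints 192.168.1.1
```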



Pro tip:  Remember to restart the Splunk UF after modifying files.  

/opt/splunkforwarder/bin/splunk restart

Conclusion

A simple Splunk search of index=cisco, index=pan_logs, or index=lastchanceindex should be able to confirm that you are now receiving data in Splunk. Keep monitoring the lastchanceindex to move hosts to where they need to go as they come on-line. Moving the hosts is accomplished by editing the rsyslog.conf file and possibly adding another monitor stanza within the Splunk UF config. This process can be challenging to create, but once it is going, it just needs a little care from time to time to make sure that all is well.  We hope you found this article helpful.  Happy Splunking!





rsyslog fun - Basic Splunk Log Collection and Forwarding - Part II

By Tony Lee

Welcome to part II in our series covering how to use rsyslog to route and forward logs to Splunk. Please see Part I of the series (http://securitysynapse.blogspot.com/2019/01/rsyslog-fun-basic-splunk-log-collection-part-i.html) for the basics in opening ports, routing traffic by IP address or hostname, and monitoring files to send the data on to Splunk Indexers. As a reminder, choosing between rsyslog, syslog-ng, or other software is entirely up to the reader and may depend on their environment and approved/available software. We also realize that this is not the only option for log architecture or collection, but it may help those faced with this task—especially if rsyslog is the standard in their environment. That said, let's look at some more advanced scenarios concerning file permissions, routing logs via regex, and routing logs via ports. We will wrap up with some helpful hints on a possible method to synchronize the rsyslog and Splunk configuration files.

File Permissions

There are times where you may need to adjust the file permissions for the files that rsyslog writes to disk. For example, if you follow best practice and run the Splunk Universal Forwarder as a lower-privileged account, it will need access to the log files.  Placing the following rsyslog.conf directives at the top of the configuration file will change the permissions on the directories and files created.  The following example creates directories with permissions of 755 and files with permissions of 644:



$umask 0000
$DirCreateMode 0755
$FileCreateMode 0644



Routing logs via Regex

Another more advanced rsyslog option is the ability to drop or route data at the event level via regex. For example, maybe you want to drop certain packets -- such as Teardown packets generated by Cisco ASAs. Note: this rsyslog ability is useful since we are using Splunk Universal Forwarders in our example and not Splunk Heavy Forwarders.

Or maybe you have thousands of hosts and don't want to maintain a giant list of IP addresses in an if-statement. For example, maybe you want to route thousands of Cisco Meraki host packets to a particular file via a regex pattern.

Possibly even more challenging would be devices in a particular CIDR range that end in a specific octet.

These three examples are covered in the rsyslog.conf snippet below:



#Drop Cisco ASA Teardown packets
:msg, contains, ": Teardown " ~
& stop

#Route Cisco Meraki hosts to specific directory
if ($msg contains ' events type=') then ?ciscoMerakiFile
& stop

#ICS Devices 10.160.0.0/11 (last octet being .150)
:fromhost-ip, regex, "10\.\\(1[6-8][0-9]\\|19[0-1]\\)\..*\.150" -?icsDevices

& stop


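
Before wiring a pattern like the CIDR example into rsyslog, it can be sanity-checked with grep -E (a sketch; rsyslog's POSIX regex escaping differs slightly from the shell-quoted ERE shown here):

```shell
# 10.160.0.0/11 (second octet 160-191) with a last octet of exactly 150
pattern='^10\.(1[6-8][0-9]|19[01])\.[0-9]{1,3}\.150$'
echo "10.175.3.150" | grep -E "$pattern"                    # inside the range: prints the IP
echo "10.200.3.150" | grep -E "$pattern" || echo "no match" # outside the range
```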

Routing logs via Port

We know we just provided the ability to route packets via regex; however, that can sometimes be inefficient--especially at high event rates. If you are fortunate enough that the data source can send to a different port, it may be worth routing data to different files based on the receiving port.  The example below uses ports 6517 and 6518.



#Dynamic template names
template(name="file6517" type="string" string="/rsyslog/port6517/%FROMHOST%/%$YEAR%-%$MONTH%-%$DAY%.log")

template(name="file6518" type="string" string="/rsyslog/port6518/%FROMHOST%/%$YEAR%-%$MONTH%-%$DAY%.log")

#Rulesets 
ruleset(name="port6517"){
    action(type="omfile" dynafile="file6517")

}

ruleset(name="port6518"){
    action(type="omfile" dynafile="file6518")
}

input(type="imtcp" port="6517" ruleset="port6517")
input(type="imtcp" port="6518" ruleset="port6518")



Synchronizing Multiple Rsyslog Servers

Since our architecture in part I outlined using a load balancer and multiple rsyslog servers, we will eventually need a way to synchronize the configuration files across the multiple rsyslog servers.  The example below provides two bash shell scripts to perform just that task. The first one will synchronize the rsyslog configuration and the second will synchronize the Splunk configuration--both scripts restart the respective service. Note: This is not the only method available for synchronization, but it is one possible method. Remember to replace <other_server> with the actual IP or FQDN of that server.

On the rsyslog server that you make the changes on, create these two bash scripts and modify the <other_server> section. Once you make a change to the rsyslog or Splunk UF configuration, run the necessary script.

sync-rsyslog.sh



scp /etc/rsyslog.conf <other_server>:/etc/rsyslog.conf
ssh <other_server> service rsyslog restart



sync-splunk.sh


scp /opt/splunkforwarder/etc/apps/SplunkForwarder/local/inputs.conf <other_server>:/opt/splunkforwarder/etc/apps/SplunkForwarder/local/inputs.conf
ssh <other_server> /opt/splunkforwarder/bin/splunk restart



Conclusion

In this article, we outlined key advanced features within rsyslog that may not be immediately evident. Hopefully this article will save you some Googling time when trying to operationalize log collection and forwarding using rsyslog in your environment. After all, eventually you will probably need to deal with file permissions, routing logs via regex and/or port, and configuration synchronization. We hope you enjoyed the article and found it useful.  Feel free to post your favorite tips and tricks in the comments section below. Happy Splunking!




Splunk and ELK – Impartial Comparison Part I - Similarities


By Tony Lee

This series is not intended to start a “Big Data” holy war, but instead to offer some unbiased insight for those looking to implement Splunk, ELK, or even both platforms.  After all, both platforms are highly regarded for their abilities to collect, parse, analyze, and display log data.  In fact, the first article in this series will show how the two competing technologies are similar in the following areas:
  • Purpose
  • Architecture
  • Cost

Caveat

Most articles on this subject seem to have some sort of agenda to push folks in one direction or another—so we will do our absolute best to keep it unbiased. We admit that we know Splunk better than we know the ELK stack, so we are banking on ELK (and even Splunk) colleagues and readers to help keep us honest. Lastly, our hope is to update this article as we learn or receive more information and the two products continue to mature.

Similar Purpose

Both Splunk and ELK stack are designed to be highly efficient in log collection and search while allowing users to create visualizations and dashboards.  The similar goal and purpose of the two platforms naturally means that many of the concepts are also similar.  One minor annoyance is that the concepts are referred to by different names.  Thus, the table below should help those that are familiar with one platform map ideas and concepts to the other.


Splunk                           | ELK Stack
---------------------------------|----------------------------------
Search Head                      | Kibana
Indexer                          | Elasticsearch
Forwarder                        | Logstash
Universal Forwarder              | Beats (Filebeat, Metricbeat, Packetbeat, Winlogbeat, Auditbeat, Heartbeat, etc.)
Search Processing Language (SPL) | Lucene query syntax
Panel                            | Panel
Index                            | Index


Similar Architecture

In many ways, even the architecture between Splunk and ELK are very similar.  The diagram below highlights the key components along with the names of each component in both platforms.

Figure 1:  Architectural similarities

Cost

This is also an area where there are more similarities than most would imagine due to a misconception that ELK (with comparable features to Splunk) is free.  While the core components may be free, the extensions that make ELK an enterprise-scalable log collection platform are not free—and this is by design.  According to Shay Banon, Founder, CEO and Director of Elasticsearch:

“We are a business. And part of being a business is the belief that those businesses who can pay us, should. And those who cannot, should not be paying us. In return, our responsibility is to ensure that we continue to add features valuable to all our users and ensure a commercial relationship with us is beneficial to our customers. This is the balance required to be a healthy company.”

Elastic does this by identifying “high-value features and to offer them as commercial extensions to the core software. This model, sometimes called ‘open core’, is what culminated in our creation of X-Pack. To build and integrate features and capabilities that we maintain the Intellectual Property (IP) of and offer either on a subscription or a free basis. Maintaining this control of our IP has been what has allowed us to invest the vast majority of our engineering time and resources in continuing to improve our core, open source offerings.”


That said, which enterprise-critical features aren’t included in the open source or even basic free license?  The subscription comparison screenshot found below shows that one extension not included for free is Security (formerly Shield).  This includes encrypted communications, Role-Based Access Control (RBAC), and even authentication.  Most would argue that an enterprise needs a login page and the ability to control who can edit vs. view searches, visualizations, and dashboards, thus it is not a fair comparison to say that Splunk costs money while ELK is free.  There are alternatives to X-Pack, but we will leave that to another article since they are not officially developed and maintained as part of the ELK stack.

Figure 2:  Encryption, RBAC, and even authentication is not free
In terms of how much Splunk costs vs. ELK, there are also many arguments--some of which include the cost of build time, maintenance, etc.  Much also depends on your ability to negotiate with each vendor.

Conclusion

Splunk and ELK stack are similar in many ways.  In fact, knowing one platform can help a security practitioner learn the other because many of the concepts are close enough to transfer.  The reduction in the learning curve is a huge advantage for those that need to convert from one platform to the other.  That said, there are differences, however we will discuss those in the next article.  In the meantime, we hope that this article was useful for you and we are open to feedback and corrections, so feel free to leave your comments below.  Please note that any inappropriate comments will not be posted—thanks in advance.  😊

Splunk and ELK – Impartial Comparison Part II - Differences


By Tony Lee

In the first part of our series (http://securitysynapse.blogspot.com/2019/02/splunk-and-elk-impartial-comparison-part-i.html), we discussed the similarities between Splunk and the ELK stack.  Part II will discuss some of the differences in terms of limitations.  Not all of these are deal breakers and they cannot necessarily be scored as one for one in terms of importance.  But it is good for folks to know the differences before implementing one platform vs. the other.  We welcome the reader to chime in with their own limitations (or corrections) as well.  We will start off with the Splunk limitations and then follow up with the ELK limitations.  Remember, these are not necessarily weighted equally in terms of importance (as that is determined by the end user), so we are not declaring a winner.

Splunk Limitations

-          ELK can easily create dynamically named indexes and keys; Splunk cannot
-          ELK can search on a wildcarded key.  For example:  search host.*=foo
-          ELK provides Dev Tools → Console:  a useful method for running commands against the ELK instance from the Kibana GUI
-          Splunk does not provide relevance weighting such as ELK’s _score field

ELK limitations

-          ELK does not allow piping of search commands to create more complex commands (this is one of the most difficult differences to overcome when transitioning from Splunk to ELK)
-          Splunk is considered “Schema on read”, which means you can throw pretty much anything at it and it may autoparse or can be parsed later.  ELK requires more upfront parsing to make use of the data.
-          There is no central manager for beat agents, Splunk includes a deployment server for free which manages Universal Forwarders
-          discuss.elastic.co closes threads after 60 days of inactivity…  Splunk Answers never closes a thread and thus users can contribute at any time – this helps prevent duplicate entries and stale worthless data
-          Installation of Splunk can be completed in minutes, ELK takes much more time and is more dependent upon versions of each component since there is no unified installer
-          Kibana can only sort on numeric fields and not alphabetical fields
-          It appears that Splunk has more mathematical/statistical functions out of the box
-          ELK has a separate beat for collecting different sources/components of a system.  Splunk has a single Universal Forwarder that can collect different data sources by using a flexible configuration file.
-          ELK time range selector is missing a range for:  Quick → All time
-          ELK may introduce significant “breaking changes” on new version releases which can cause some customers to become stuck on a certain version of the platform.  Splunk seems to be very careful not to do this and it is rare and often not as limiting if it does occur.


Conclusion

This should serve as an initial list of limitations for both platforms.  Again, we will not declare a winner because some of those limitations may not matter to the end user, however it is good to get the list out in the open for discussion.  Both platforms are always looking for ways to innovate and improve the customer experience.  These lists are often a good start for that purpose and competition is definitely a good thing. If you have a correction, please keep it constructive and it will get posted in the comments section below.  Thanks for reading. 😉

Splunk Dashboard Tricks - Update Time Range for All Panels Using Splunk Timechart Selection

By Tony Lee


Have you ever wanted to update the time range for all of the panels in a dashboard using a timechart selection? (See screenshot below)


Figure 1:  Timechart selection to update earliest and latest variables

This feat is possible using a very small amount of code, but it is not the most intuitive process -- which makes it a perfect topic for a blog article.

At first we thought this would be a drilldown feature and spent many precious minutes in the GUI editor. However, my sharp colleague Arjun Mathew pointed out an obscure docs article that contained information regarding "selection". Then we found this other more concise article on Chart Controls:

https://docs.splunk.com/Documentation/Splunk/7.2.3/Viz/Chartcontrols


How it works

As mentioned before, we do not believe this is exposed through the GUI, so you will need to use the simple XML editor.  We are updating the dashboard code (first timechart panel) we provided in the 4740 account lockout article (http://securitysynapse.com/2018/08/troubleshooting-windows-account-lockout-part-i.html) to now possess this feature.

Inside of the <chart> tags, we will add the following:

        <selection>
          <set token="form.time.earliest">$start$</set>
          <set token="form.time.latest">$end$</set>
        </selection>

This sets the form.time.earliest and form.time.latest tokens in the dashboard in real time, which in turn controls all of the remaining panels in the 4740 dashboard -- a perfect use case for using a timechart selection to drive sub-panels.
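
For orientation, a minimal panel with the selection element in place might look like the following sketch (the title and index name are placeholders):

```xml
<chart>
  <title>Events Over Time</title>
  <search>
    <query>index=example | timechart count</query>
    <earliest>$time.earliest$</earliest>
    <latest>$time.latest$</latest>
  </search>
  <selection>
    <set token="form.time.earliest">$start$</set>
    <set token="form.time.latest">$end$</set>
  </selection>
</chart>
```

Dragging a selection across this chart rewrites the form's time tokens, and every other panel that references $time.earliest$ and $time.latest$ re-runs with the new range.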

Conclusion

We hope by highlighting the selection tag that it gets more use in creating a better user experience. For right now, it is not controlled via the web UI editor, however as its popularity grows, this may change.  Happy Splunking!



osquery - Part I - Local Agent Interaction

By Tony Lee and Matt Kemelhar

This series on osquery will take us on a journey from stand-alone agents, to managing multiple agents with Kolide, and then finally onto more advanced integrations, queries, and analysis.  Crawl, walk, run, right?  Ok, let's start crawling.

What is osquery?

osquery (https://osquery.io/) is an open source agent developed by Facebook that allows organizations to query endpoints running a variety of operating systems using the same SQL syntax. These queries can be used for security, compliance, or DevOps as event-based, user-driven, or scheduled information gathering. Once the user learns the SQL syntax and the osquery schema, queries work the same across multiple operating systems [Windows, macOS, FreeBSD, Debian, RPM Linux, etc.] (for the most part).

For example, listing processes on Windows can be accomplished natively using the tasklist command.  On Linux/Unix, the same task can be accomplished using the ps command.  In osquery, regardless of the operating system, it is simply select * from processes;  While this may seem more cumbersome at first, the advantage is a single query and normalized output across all supported operating systems.

Installation

Installation is simple using one of the provided installers found here: 
https://osquery.io/downloads/official

There are installation instructions for each operating system in the docs section of the site:

For example, if you are looking for Windows installation instructions you would go here: 
https://osquery.readthedocs.io/en/stable/installation/install-windows/

For the majority of this article, installation is simple: we will download the Windows .msi and double-click it.

Interaction

Once osquery is installed (in this example on Windows), you can check to make sure the default installation path was created and populated.  In Windows, it is:  C:\ProgramData\osquery


Then in a command prompt, check to see if the osqueryd agent is running using the following command:

C:\>sc.exe query osqueryd

SERVICE_NAME: osqueryd
        TYPE               : 10  WIN32_OWN_PROCESS
        STATE              : 4  RUNNING
                                (STOPPABLE, NOT_PAUSABLE, ACCEPTS_SHUTDOWN)
        WIN32_EXIT_CODE    : 0  (0x0)
        SERVICE_EXIT_CODE  : 0  (0x0)
        CHECKPOINT         : 0x0
        WAIT_HINT          : 0x0


If it is not running, try using:

C:\>sc.exe start osqueryd


Once running, we should be able to start the local client (osqueryi.exe) and run some queries.  By default it is located in:  c:\programdata\osquery\osqueryi.exe.  Run this from the command line and you will receive a new osquery prompt.  Try the following to ensure that the agent and client are working properly:

osquery> select * from uptime;

+------+-------+---------+---------+---------------+
| days | hours | minutes | seconds | total_seconds |
+------+-------+---------+---------+---------------+
| 21   | 10    | 17      | 34      | 1851454       |
+------+-------+---------+---------+---------------+

Here are a few useful commands to remember:
.help = help menu
.tables = list all the possible tables to query
.summary = version and configuration
.mode = change the output mode:  csv, column, line, list, pretty (default)
.exit = leave the program

Pro-tip:  The osqueryi client remembers command history so use the up and down arrows liberally.

Online Schema

We showed you a couple of queries so far, but how are you supposed to know what else exists?

1)  You can run .tables within the osqueryi client

2)  You can use the online schema (https://osquery.io/schema/) that contains every table, all columns, types, descriptions, and even displays the operating systems supported.


Figure 1:  The osquery schema - a great reference
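
To give a feel for what the schema offers, here are a few illustrative queries against commonly available tables (a sketch only; table availability varies by operating system, so check the schema for your platform):

```sql
-- Accounts with an interactive shell (POSIX systems)
SELECT username, shell FROM users WHERE shell LIKE '%sh';

-- Running processes whose binary is no longer on disk (a classic hunting query)
SELECT name, pid, path FROM processes WHERE on_disk = 0;

-- Listening network sockets joined to their owning process
SELECT p.name, l.port, l.address
FROM listening_ports l JOIN processes p ON l.pid = p.pid;
```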

Linux Example

For those with Linux, it is just as easy.  At the time of this writing here is the latest release:

Download:
wget https://pkg.osquery.io/deb/osquery_3.3.2_1.linux.amd64.deb


Install:
dpkg -i osquery_3.3.2_1.linux.amd64.deb


Usage:
root@ubuntu:~/osquery# osqueryi 
-- snip --
successfully completed!
Using a virtual database. Need help, type '.help'

osquery> select * from osquery_info;


Uninstall:
dpkg --remove osquery


Conclusion

Now that we understand the basics of osquery installation and local client usage, it should be very apparent that this will not scale to hundreds of thousands of hosts.  Thus, we need an osquery manager to make it enterprise ready.  However, we will leave this topic to the next article.
