A BlueVoyant Custom Levenshtein Detection
What Does This Detection Mitigate?
This use case has been designed to capture email spoofing attempts from an external attacker where the attacker impersonates an internal user or trusted supplier. As domain verification is not built into the Simple Mail Transfer Protocol (SMTP), attackers can counterfeit email addresses with the domain of a business or legitimate supplier. Unless the user inspects the email headers, they will be unaware that the headers have been forged and the email address displayed in their email is spoofed.
The objective is to trick an unsuspecting user into responding with data that could be exploited, or into opening malicious content. This use case covers both exact replication of internal or trusted domains and close matches that the naked eye could easily mistake for the genuine domain.
Prerequisites
- The Sentinel Microsoft 365 Defender Data Connector must be connected. The only schema required is: EmailEvents.
- The SPN used in the Logic App (for all Sentinel-related steps) requires the Log Analytics Contributor role in this workspace. This enables the Logic App to delete from and add to the Watchlist that holds the entries detailing spoof attempts.
High Level Solution
BlueVoyant makes use of the Microsoft advanced hunting schema ‘EmailEvents’. This log source forms the root of this detection because of its rich email logging.
KQL is used to expand on the customer’s internal domains and output the results to CSV. A PowerShell-based Function App then processes the CSV with a Levenshtein algorithm and sends close matches to a watchlist, which an Analytics Rule alerts on.
KQL Query
The Logic App first clears down the Watchlist, which is the final destination for the logs identified as email domain spoof attempts. The automation then runs the following KQL query in Sentinel:
let internal_domains = dynamic(['supplier.com', 'businessexample.com']);
EmailEvents
| where DeliveryAction == "Delivered"
// The external party is the sender on inbound mail and the recipient on outbound mail
| extend External_Domain = case(EmailDirection == "Inbound", SenderFromDomain, EmailDirection == "Outbound", extract(@'\@(\S+)$', 1, RecipientEmailAddress), "")
| extend SenderDomain = extract(@'\@(\S+)$', 1, SenderFromAddress), RecipientDomain = extract(@'\@(\S+)$', 1, RecipientEmailAddress)
// Ignore mail where the sender and recipient share a domain
| where not(SenderDomain =~ RecipientDomain)
// Normalise direction: User is always the internal party, Attacker the external one
| extend User = case(EmailDirection == "Inbound", RecipientEmailAddress, EmailDirection == "Outbound", SenderFromAddress, "")
| extend Attacker = case(EmailDirection == "Outbound", RecipientEmailAddress, EmailDirection == "Inbound", SenderFromAddress, "")
// Expand each row once per internal domain so every internal/external pair can be compared
| mv-apply internal_domains to typeof(string) on (where isnotempty(DeliveryAction) and isnotempty(User) and isnotempty(Attacker))
| project-rename InternalDomains = Column1
| summarize by InternalDomains, External_Domain, User, Attacker, Subject, EmailDirection
This query normalises the sender and recipient on inbound and outbound logs, so the attacker and internal user always appear in the same fields (Attacker and User) whatever the direction of traffic.
It expands each row across the internal domains so these can be compared against the external domains in the next step. The example above uses a sample customer domain and a fictitious key supplier. After the expansion, the query aggregates on the salient fields, and the results are written to a CSV table.
Function App
In the PowerShell script, the CSV from the previous step is split into individual lines, and each line is split on commas into columns. The script ignores empty columns and the first line (the header).
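The parsing step described above can be sketched as follows. This is a minimal Python illustration, not the PowerShell used in the Function App; the function name `parse_rows` and the sample data are hypothetical.

```python
def parse_rows(csv_text: str) -> list[list[str]]:
    """Split CSV text into rows of non-empty columns, skipping the header."""
    lines = csv_text.splitlines()
    rows = []
    for line in lines[1:]:  # the first line is the header
        # Split on commas and drop empty columns, as the article describes
        columns = [col.strip() for col in line.split(",") if col.strip()]
        if columns:  # skip lines that end up with no columns at all
            rows.append(columns)
    return rows


# Hypothetical sample resembling the query output (subset of columns)
sample = (
    "InternalDomains,External_Domain,User\n"
    "supplier.com,suppliar.com,alice@businessexample.com\n"
    "\n"
    "businessexample.com,gmail.com,bob@businessexample.com\n"
)
for row in parse_rows(sample):
    print(row)
```

A plain comma split assumes no field contains a comma. Free-text fields such as Subject can break that assumption, in which case a quote-aware parser (Python's `csv` module, for instance) would be the safer choice.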
The script then runs the first two columns, the internal and external domains, through the Levenshtein function for comparison. Any line with a Levenshtein score of 4 or below is output as an object and converted to JSON.
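The scoring and filtering step can be sketched as below. This is a Python illustration under stated assumptions, not the Function App's PowerShell: the helper names, the output keys other than `LevenshteinScore`, and the lowercasing before comparison are all hypothetical. The threshold of 4 comes from the text.

```python
import json

MAX_SCORE = 4  # threshold stated in the article


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance, keeping two rows in memory."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string on the inner loop
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]


def close_matches(rows: list[list[str]], max_score: int = MAX_SCORE) -> str:
    """Score the first two columns of each row and return close matches as JSON."""
    matches = []
    for row in rows:
        if len(row) < 2:
            continue  # nothing to compare
        internal, external = row[0], row[1]
        # Assumption: compare case-insensitively, since domains are not case-sensitive
        score = levenshtein(internal.lower(), external.lower())
        if score <= max_score:
            matches.append({
                "InternalDomain": internal,   # hypothetical key name
                "ExternalDomain": external,   # hypothetical key name
                "LevenshteinScore": score,
            })
    return json.dumps(matches)


rows = [["supplier.com", "suppliar.com"], ["businessexample.com", "gmail.com"]]
print(close_matches(rows))
```

Only the first pair is returned here: it is one edit away from the internal domain, while the second pair differs by far more than 4 edits.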
Rule Output
The results are then added to a watchlist.
The final product shows where very close matches have successfully reached a user’s mailbox. It also captures where the user has responded.
Conclusion
On initial deployment, this solution fully satisfied the key requirement: to capture an attacker who had successfully spoofed the customer domain, i.e. a 100% domain match in an email sent from an external source.
This solution compares external domains against internal domains. Therefore, Third Party / Supplier spoof attempts can only be caught where the domain is a close match but not a 100% match. This is because the Third Party / Supplier is always an external source, so when its domain is compared as an external domain against itself, the Levenshtein function generates a score of 0 (an exact match). The score is correct, but the traffic is also extremely likely to be legitimate. In this case, the Analytics Rule can be updated with an exclusion filter so that it does not alert on every supplier email, such as:
| where not(LevenshteinScore == 0 and Attacker has 'supplier.com')
This solution will capture spoofed Third Party / Supplier domains where there is a small difference such as:
- suppliar.com
- 2upplier.com
- supplier1.com
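As a quick check on the examples above, each listed domain is a single edit away from supplier.com, well inside the threshold of 4 used by the Function App. The sketch below is Python rather than the PowerShell function this article references, and `edit_distance` is an illustrative helper name.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance using one rolling row of the DP table."""
    row = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        # diagonal holds the previous row's value at column j - 1
        diagonal, row[0] = row[0], i
        for j, char_b in enumerate(b, start=1):
            diagonal, row[j] = row[j], min(
                row[j] + 1,                    # delete char_a
                row[j - 1] + 1,                # insert char_b
                diagonal + (char_a != char_b)  # substitute, free on a match
            )
    return row[-1]


GENUINE = "supplier.com"
for lookalike in ["suppliar.com", "2upplier.com", "supplier1.com", GENUINE]:
    print(lookalike, edit_distance(lookalike, GENUINE))
# suppliar.com 1   (e replaced by a)
# 2upplier.com 1   (s replaced by 2)
# supplier1.com 1  (1 inserted)
# supplier.com 0   (exact match, typically legitimate supplier traffic)
```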
In summary, the solution will capture:
- Internal domain spoof attempts on both a 100% match and a close match basis; and
- External domain spoof attempts where the domain is a close match only (not a 100% match)
References
https://www.powershellgallery.com/packages/gibbels-algorithms/1.0.3/Content/scripts%5Clevenshtein%5CTest-LevenshteinDistance.ps1