In June 2024, it was discovered that a financially motivated threat actor (tracked as “UNC5537”) had conducted a targeted campaign against Snowflake customers. According to investigations conducted by Mandiant and Snowflake, multiple Snowflake customers were affected by info-stealer malware, which led to credential theft that preceded the unauthorized access to the organizational Snowflake accounts. The threat actor was able to leverage these credentials because MFA was not enabled on the affected accounts.
While many blog posts and publications have outlined basic threat-hunting queries related to this breach, we aim to provide advanced technical insights into Team Axon's (Hunters' threat-hunting team) response to this threat campaign. Our approach includes sophisticated hunting efforts that leverage Snowflake's statistical and analytical functionality, as well as comprehensive follow-up investigations to ensure our customers' security. This blog post will showcase not just standard queries, but advanced techniques and methodologies that elevate threat-hunting and investigation capabilities beyond the basics.
Our insights can be used as part of security teams’ hunting and investigation efforts to make sure their Snowflake accounts are safe.
It is important to not only hunt for and investigate threats but to also carry out proactive measures to ensure the safety of your Snowflake accounts. You can utilize the Snowflake Trust Center to assess and monitor your account for security risks.
At the beginning of this incident, we had some specific pieces of information, including the following:
Using the information above, we initiated threat-hunting and investigation efforts. First, we collected and created a set of Snowflake threat-hunting queries, including queries to look for the specific IOCs mentioned above. The additional queries looked for different characteristics, such as suspicious clients, unauthorized login attempts, various exfiltration methods, suspicious daily error rates, spikes in user query volume, and more.
Following the hunting efforts, we investigated specific hunting “hits” that stood out, to make sure there were no malicious activities among them. The following section includes a human-friendly, in-depth explanation regarding some of the main hunting queries we used.
As part of the threat-hunting campaign, we used a variety of queries to hunt for different types of malicious activities, in addition to the specific IOC lookups conducted directly based on the IOC publications by Snowflake.
In this section, we provide a few examples of the queries we used, with an in-depth explanation of each one, so that anyone who uses them fully understands the rationale behind them and can apply these explanations to better understand other Snowflake-related hunting queries.
HUNTING QUERY 1: Suspicious Snowflake Sessions - Unusual Application
WITH PREVIOUSLY_USED_APPS AS (
SELECT DISTINCT PARSE_JSON(client_environment) :APPLICATION::STRING AS PREV_CLIENT_APPS
FROM SNOWFLAKE.ACCOUNT_USAGE.SESSIONS
WHERE CREATED_ON >= '2023-12-01'
AND CREATED_ON < '2024-03-01'
AND PREV_CLIENT_APPS IS NOT NULL
)
SELECT PARSE_JSON(client_environment) :APPLICATION::STRING AS CLIENT_APPLICATION,
PARSE_JSON(client_environment) :OS::STRING AS CLIENT_OS,
PARSE_JSON(client_environment) :OS_VERSION::STRING AS CLIENT_OS_VERSION,
ARRAY_AGG(SESSIONS.SESSION_ID) AS SESSION_IDS,
ARRAY_AGG(SESSIONS.LOGIN_EVENT_ID) AS LOGIN_EVENT_IDS,
MIN(created_on) AS MIN_CREATED_ON,
MAX(CREATED_ON) AS MAX_CREATED_ON,
COUNT(*) AS COUNTER
FROM SNOWFLAKE.ACCOUNT_USAGE.SESSIONS sessions
WHERE 1 = 1
AND sessions.created_on >= '2024-03-01'
AND CLIENT_APPLICATION NOT IN (
SELECT *
FROM PREVIOUSLY_USED_APPS
)
GROUP BY 1, 2, 3
HAVING COUNTER < 50
ORDER BY 1 ASC;
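To make the baseline logic of the query above concrete, here is a minimal Python sketch of what it computes: collect the distinct client applications seen during a baseline window, then flag later sessions whose application was never part of that baseline. The session records are illustrative (the client names mirror ones reported in the campaign), and the `HAVING COUNTER < 50` rarity filter is omitted for brevity:

```python
from datetime import date

# Hypothetical session records: (created_on, client_application)
sessions = [
    (date(2024, 1, 10), "Go 1.1.5"),
    (date(2024, 2, 2), "PythonConnector 3.0.4"),
    (date(2024, 4, 5), "PythonConnector 3.0.4"),    # known app, ignored
    (date(2024, 4, 6), "rapeflake"),                # never seen in baseline
    (date(2024, 4, 7), "DBeaver_DBeaverUltimate"),  # never seen in baseline
]

BASELINE_END = date(2024, 3, 1)

# Distinct apps observed during the baseline window (the PREVIOUSLY_USED_APPS CTE)
baseline_apps = {app for d, app in sessions if d < BASELINE_END}

# Sessions after the baseline that use an application never seen before
new_app_sessions = [
    (d, app) for d, app in sessions
    if d >= BASELINE_END and app not in baseline_apps
]
```

The three-month baseline window is a trade-off: long enough to capture legitimate-but-infrequent clients, short enough that stale applications age out of the baseline.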
HUNTING QUERY 2: Snowflake Daily Error Rate Anomaly
WITH error_stats AS (
SELECT date_trunc('day',START_TIME) AS DATE,
USER_NAME AS USER_NAME,
COUNT(*) AS ERROR_COUNT
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
WHERE ERROR_CODE IS NOT NULL
AND START_TIME >= current_timestamp - interval '4 months'
AND ERROR_MESSAGE <> 'Amount of data in Snowflake internal staging area exceeds the soft limit. Please remove unneeded files or contact Snowflake Support about raising the limit.'
AND NOT ERROR_MESSAGE LIKE 'SQL compilation error%'
GROUP BY date, USER_NAME
),
total_queries AS (
SELECT date_trunc('day',START_TIME) AS DATE,
USER_NAME,
COUNT(*) AS TOTAL_QUERIES
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
WHERE START_TIME >= current_timestamp - interval '4 months'
GROUP BY DATE, USER_NAME
),
final_stats AS (
SELECT tq.DATE,
tq.USER_NAME,
tq.TOTAL_QUERIES,
COALESCE(es.error_count, 0) AS ERROR_COUNT,
(COALESCE(es.error_count, 0) / tq.total_queries) * 100 AS DAILY_ERROR_PERCENTAGE
FROM total_queries tq
LEFT JOIN error_stats es ON tq.date = es.date AND tq.USER_NAME = es.USER_NAME
)
SELECT * FROM final_stats;
Both CTEs are derived from the QUERY_HISTORY view that Snowflake provides in the ACCOUNT_USAGE schema.
In the “total_queries” CTE we:
Finally, we join these two CTEs on the day and username to pair each user's daily error count with their daily total query count, and display the errors as a percentage of the total query count for each user per day.
(Note: we are using the COALESCE function to return 0 in case a NULL value is found in any of the subqueries.)
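The two CTEs and the final join reduce to the following Python sketch (the query records are hypothetical; note how the `COALESCE`-style `get(key, 0)` gives users with no errors a 0% rate instead of NULL):

```python
from collections import defaultdict

# Hypothetical (day, user, is_error) query records
queries = [
    ("2024-06-01", "ALICE", False),
    ("2024-06-01", "ALICE", True),
    ("2024-06-01", "ALICE", True),
    ("2024-06-01", "BOB", False),
]

total = defaultdict(int)   # the total_queries CTE
errors = defaultdict(int)  # the error_stats CTE
for day, user, is_error in queries:
    total[(day, user)] += 1
    if is_error:
        errors[(day, user)] += 1

# LEFT JOIN + COALESCE: users/days with no errors get 0
final_stats = {
    key: (errors.get(key, 0) / cnt) * 100
    for key, cnt in total.items()
}
```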
HUNTING QUERY 3: Potential exfiltration based on spikes in user query volume
WITH USER_DAILY_BYTES AS (
SELECT USER_NAME AS USER_NAME,
DATE_TRUNC('DAY', END_TIME) AS QUERY_DATE,
QUERY_TEXT,
BYTES_WRITTEN_TO_RESULT,
SUM(BYTES_WRITTEN_TO_RESULT) OVER (PARTITION BY USER_NAME, DATE_TRUNC('DAY',END_TIME)) AS TOTAL_BYTES_WRITTEN,
ROW_NUMBER() OVER (PARTITION BY USER_NAME, DATE_TRUNC('DAY', END_TIME) ORDER BY BYTES_WRITTEN_TO_RESULT DESC) AS ROW_NUMBER
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
WHERE END_TIME >= CURRENT_TIMESTAMP() - INTERVAL '30 DAY'
),
TOP_QUERY_PER_DAY AS (
SELECT USER_NAME,
QUERY_DATE,
QUERY_TEXT,
BYTES_WRITTEN_TO_RESULT,
TOTAL_BYTES_WRITTEN
FROM USER_DAILY_BYTES
WHERE ROW_NUMBER = 1
),
USER_DAILY_AVERAGE AS (
SELECT USER_NAME,
AVG(TOTAL_BYTES_WRITTEN) AS AVG_BYTES_WRITTEN,
STDDEV_SAMP(total_bytes_written) AS STDDEV_BYTES_WRITTEN
FROM TOP_QUERY_PER_DAY
GROUP BY USER_NAME
)
SELECT u.USER_NAME,
u.QUERY_DATE,
u.QUERY_TEXT AS TOP_QUERY_TEXT,
ROUND(u.BYTES_WRITTEN_TO_RESULT, 2) AS TOP_QUERY_BYTES_WRITTEN,
ROUND(u.TOTAL_BYTES_WRITTEN, 2) AS TOTAL_BYTES_WRITTEN,
ROUND(a.AVG_BYTES_WRITTEN, 2) AS AVG_DAILY_BYTES,
ROUND(a.STDDEV_BYTES_WRITTEN, 2) AS STDDEV_DAILY_BYTES
FROM TOP_QUERY_PER_DAY u
JOIN USER_DAILY_AVERAGE a
ON u.USER_NAME = a.USER_NAME
WHERE u.TOTAL_BYTES_WRITTEN > a.AVG_BYTES_WRITTEN
AND u.TOTAL_BYTES_WRITTEN > a.AVG_BYTES_WRITTEN + 3 * a.stddev_bytes_written
AND u.TOTAL_BYTES_WRITTEN > 1000000 -- Minimum threshold (1 MB)
ORDER BY u.USER_NAME, u.QUERY_DATE;
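The statistical core of this query, flagging any day whose total bytes written is more than three sample standard deviations above the user's average (and above the 1 MB floor), can be sketched as follows. The byte counts are illustrative:

```python
from statistics import mean, stdev

# Hypothetical daily totals of BYTES_WRITTEN_TO_RESULT for one user:
# 30 typical days followed by one exfiltration-sized spike.
daily_bytes = [2_000_000] * 30 + [60_000_000]

avg = mean(daily_bytes)
sd = stdev(daily_bytes)  # sample standard deviation, like STDDEV_SAMP

# Flag days exceeding the 3-sigma threshold and the 1 MB minimum,
# mirroring the WHERE clause of the query above.
spikes = [b for b in daily_bytes if b > avg + 3 * sd and b > 1_000_000]
```

One caveat worth knowing: because the spike day itself is included in the average and standard deviation, a single extreme day over a short lookback inflates both values; a 30-day window keeps the baseline stable enough for the 3-sigma test to fire.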
HUNTING QUERY 4: Authentication attempts to a locked/disabled user account
SELECT EVENT_TIMESTAMP AS TIMESTAMP,
EVENT_TYPE AS EVENT_TYPE,
USER_NAME AS USER_NAME,
CLIENT_IP AS CLIENT_IP,
REPORTED_CLIENT_TYPE AS CLIENT_TYPE,
REPORTED_CLIENT_VERSION AS CLIENT_VERSION,
IS_SUCCESS AS IS_SUCCESSFUL_LOGIN,
FIRST_AUTHENTICATION_FACTOR AS FIRST_AUTH_FACTOR,
SECOND_AUTHENTICATION_FACTOR AS SECOND_AUTH_FACTOR,
ERROR_MESSAGE AS ERROR_MESSAGE,
ERROR_CODE AS ERROR_CODE
FROM SNOWFLAKE.ACCOUNT_USAGE.LOGIN_HISTORY
WHERE IS_SUCCESS = 'NO'
AND ERROR_MESSAGE = 'USER_ACCESS_DISABLED'
AND event_timestamp >= current_timestamp - interval '8 months';

-- The same hunt, run against Reader Account logs:
SELECT READER_ACCOUNT_NAME AS READER_ACCOUNT_NAME,
EVENT_TIMESTAMP AS TIMESTAMP,
EVENT_TYPE AS EVENT_TYPE,
USER_NAME AS USER_NAME,
CLIENT_IP AS CLIENT_IP,
REPORTED_CLIENT_TYPE AS CLIENT_TYPE,
REPORTED_CLIENT_VERSION AS CLIENT_VERSION,
IS_SUCCESS AS IS_SUCCESSFUL_LOGIN,
FIRST_AUTHENTICATION_FACTOR AS FIRST_AUTH_FACTOR,
SECOND_AUTHENTICATION_FACTOR AS SECOND_AUTH_FACTOR,
ERROR_MESSAGE AS ERROR_MESSAGE,
ERROR_CODE AS ERROR_CODE
FROM SNOWFLAKE.READER_ACCOUNT_USAGE.LOGIN_HISTORY
WHERE IS_SUCCESS = 'NO'
AND ERROR_MESSAGE = 'USER_ACCESS_DISABLED'
AND event_timestamp >= current_timestamp - interval '8 months'
One aspect we wanted to highlight is the importance of covering Reader Accounts' logs as part of hunting and investigation efforts.
Reader accounts are designed to provide read-only access to a Snowflake database shared by a Snowflake “provider.” Imagine a multi-tenant cloud setup where each customer operates their own distinct database, all managed by a central vendor. If you, as the vendor, wish to grant read access to the underlying data lake, you would utilize a reader account.
For organizations functioning as Snowflake providers, where reader accounts are employed to facilitate data access, it is essential to include the logs from these accounts in your hunting and investigation routines. Ignoring them could leave significant gaps in your threat detection strategy.
The logs related to Reader Accounts can be found in dedicated log tables. For example, in Team Axon's UNC5537-focused threat-hunting/investigation efforts, the following tables were used:
By looking at those tables, we made sure that potential malicious activities related to Reader Accounts would be identified by the threat-hunting efforts conducted.
One of the most important parts of threat hunting is, of course, the follow-up investigation of interesting hits. This section includes questions analysts should ask when investigating suspicious hits that come up in Snowflake threat-hunting efforts, followed by examples of investigation queries that can be used throughout the investigation of different types of threat-hunting hits.
Main questions:
Queries that can be used as part of Snowflake investigation efforts:
QUERY 1: Logon information based on suspicious Session IDs
In case of identification of a suspicious Snowflake session, one of the first things we would like to do is get the relevant login information related to this session.
This logon information is important for us to get additional characteristics related to the user and application used for this session.
Here are the important pieces of information you can get using this investigation query:
SELECT SESSION_TABLE_1.CREATED_ON AS SESSION_CREATION_TIME,
PARSE_JSON(SESSION_TABLE_1.client_environment):APPLICATION::STRING AS SESSION_CLIENT_APPLICATION,
PARSE_JSON(SESSION_TABLE_1.client_environment):OS::STRING AS SESSION_CLIENT_OS,
PARSE_JSON(SESSION_TABLE_1.client_environment):OS_VERSION::STRING AS SESSION_CLIENT_OS_VERSION,
SESSION_TABLE_1.SESSION_ID AS SESSION_ID,
SESSION_TABLE_1.USER_NAME AS USER_NAME,
SESSION_TABLE_1.AUTHENTICATION_METHOD AS SESSION_AUTHENTICATION_METHOD,
SESSION_TABLE_1.LOGIN_EVENT_ID AS LOGIN_EVENT_ID,
SESSION_TABLE_1.CLIENT_APPLICATION_ID AS SESSION_CLIENT_APPLICATION_ID,
SESSION_TABLE_1.CLOSED_REASON AS SESSION_CLOSED_REASON,
LOGIN_HISTORY_2.EVENT_ID AS LOGIN_HISTORY_EVENT_ID,
LOGIN_HISTORY_2.EVENT_TIMESTAMP AS LOGIN_EVENT_TIME,
LOGIN_HISTORY_2.EVENT_TYPE AS LOGIN_EVENT_TYPE,
LOGIN_HISTORY_2.CLIENT_IP AS LOGIN_CLIENT_IP,
LOGIN_HISTORY_2.REPORTED_CLIENT_TYPE AS LOGIN_REPORTED_CLIENT_TYPE,
LOGIN_HISTORY_2.REPORTED_CLIENT_VERSION AS LOGIN_REPORTED_CLIENT_VERSION,
LOGIN_HISTORY_2.FIRST_AUTHENTICATION_FACTOR AS LOGIN_FIRST_AUTH_FACTOR,
LOGIN_HISTORY_2.SECOND_AUTHENTICATION_FACTOR AS LOGIN_SECOND_AUTH_FACTOR,
LOGIN_HISTORY_2.IS_SUCCESS AS SUCCESSFUL_LOGIN,
LOGIN_HISTORY_2.ERROR_MESSAGE AS LOGIN_ERROR_MESSAGE,
LOGIN_HISTORY_2.ERROR_CODE AS LOGIN_ERROR_CODE
FROM SNOWFLAKE.ACCOUNT_USAGE.SESSIONS AS SESSION_TABLE_1
JOIN SNOWFLAKE.ACCOUNT_USAGE.LOGIN_HISTORY AS LOGIN_HISTORY_2 ON SESSION_TABLE_1.LOGIN_EVENT_ID = LOGIN_HISTORY_2.EVENT_ID
-- Modify timestamps according to investigation timeframe
WHERE LOGIN_HISTORY_2.EVENT_TIMESTAMP BETWEEN '2024-01-01 00:00:00' and '2024-06-22 22:30:00'
-- Modify timestamps according to investigation timeframe
AND SESSION_TABLE_1.CREATED_ON BETWEEN '2024-01-01 00:00:00' and '2024-06-22 22:30:00'
-- Insert the Login Event ID of suspicious session(s) to get the relevant login details.
AND SESSION_TABLE_1.LOGIN_EVENT_ID IN ('')
QUERY 2: Get relevant Session information and Login information related to suspicious queries identified
This query is a convenient way to quickly gather the relevant session details and login information for suspicious queries that have been identified.
SELECT QUERY_HISTORY_3.QUERY_TEXT AS QUERY_TEXT,
QUERY_HISTORY_3.QUERY_TAG AS QUERY_TAG,
QUERY_HISTORY_3.START_TIME AS QUERY_TIME,
QUERY_HISTORY_3.QUERY_ID AS QUERY_ID,
QUERY_HISTORY_3.QUERY_TYPE AS QUERY_TYPE,
QUERY_HISTORY_3.ROLE_NAME AS ROLE_NAME,
QUERY_HISTORY_3.EXECUTION_STATUS AS EXECUTION_STATUS,
QUERY_HISTORY_3.ROWS_WRITTEN_TO_RESULT AS RESULTS_ROWS_WRITTEN,
QUERY_HISTORY_3.BYTES_WRITTEN_TO_RESULT AS RESULTS_BYTES_WRITTEN,
SESSIONS_TABLE_2.SESSION_ID AS SESSION_ID,
SESSIONS_TABLE_2.CREATED_ON AS SESSION_CREATION_TIME,
PARSE_JSON(SESSIONS_TABLE_2.client_environment):APPLICATION::STRING AS SESSION_CLIENT_APPLICATION,
PARSE_JSON(SESSIONS_TABLE_2.client_environment):OS::STRING AS SESSION_CLIENT_OS,
PARSE_JSON(SESSIONS_TABLE_2.client_environment):OS_VERSION::STRING AS SESSION_CLIENT_OS_VERSION,
SESSIONS_TABLE_2.USER_NAME AS USER_NAME,
SESSIONS_TABLE_2.AUTHENTICATION_METHOD AS SESSION_AUTHENTICATION_METHOD,
SESSIONS_TABLE_2.LOGIN_EVENT_ID AS LOGIN_EVENT_ID,
SESSIONS_TABLE_2.CLIENT_APPLICATION_ID AS SESSION_CLIENT_APPLICATION_ID,
SESSIONS_TABLE_2.CLOSED_REASON AS SESSION_CLOSED_REASON,
LOGIN_HISTORY_1.EVENT_ID AS LOGIN_HISTORY_EVENT_ID,
LOGIN_HISTORY_1.EVENT_TIMESTAMP AS LOGIN_EVENT_TIME,
LOGIN_HISTORY_1.EVENT_TYPE AS LOGIN_EVENT_TYPE,
LOGIN_HISTORY_1.CLIENT_IP AS LOGIN_CLIENT_IP,
LOGIN_HISTORY_1.REPORTED_CLIENT_TYPE AS LOGIN_REPORTED_CLIENT_TYPE,
LOGIN_HISTORY_1.REPORTED_CLIENT_VERSION AS LOGIN_REPORTED_CLIENT_VERSION,
LOGIN_HISTORY_1.FIRST_AUTHENTICATION_FACTOR AS LOGIN_FIRST_AUTH_FACTOR,
LOGIN_HISTORY_1.SECOND_AUTHENTICATION_FACTOR AS LOGIN_SECOND_AUTH_FACTOR,
LOGIN_HISTORY_1.IS_SUCCESS AS SUCCESSFUL_LOGIN
FROM SNOWFLAKE.ACCOUNT_USAGE.LOGIN_HISTORY AS LOGIN_HISTORY_1
JOIN SNOWFLAKE.ACCOUNT_USAGE.SESSIONS AS SESSIONS_TABLE_2 ON LOGIN_HISTORY_1.EVENT_ID = SESSIONS_TABLE_2.LOGIN_EVENT_ID
JOIN SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY AS QUERY_HISTORY_3 ON QUERY_HISTORY_3.SESSION_ID = SESSIONS_TABLE_2.SESSION_ID
WHERE SESSIONS_TABLE_2.SESSION_ID IN (
SELECT DISTINCT SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY.SESSION_ID
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
-- Modify Query text to look for based on investigated activity
WHERE (REGEXP_LIKE(QUERY_TEXT, 'create\\s+share\\s.*','i') OR REGEXP_LIKE(QUERY_TEXT, '\\s+to\\s+share\\s.*','i'))
-- Modify timestamps according to investigation timeframe
AND START_TIME BETWEEN '2024-04-01 00:00:00' and '2024-05-01 00:00:00')
AND LOGIN_HISTORY_1.EVENT_TIMESTAMP > '2024-03-20 00:00:05.188000 +00:00'
AND QUERY_HISTORY_3.START_TIME BETWEEN '2024-04-01 00:00:00.188000 +00:00' AND '2024-05-01 15:26:05.188000 +00:00'
AND SESSIONS_TABLE_2.CREATED_ON > '2024-03-20 00:00:05.188000 +00:00'
ORDER BY QUERY_HISTORY_3.START_TIME ASC;
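The share-related pattern matching in the inner query can be approximated in Python. One nuance: Snowflake's REGEXP_LIKE implicitly anchors the pattern to the entire subject string, whereas `re.search` below is unanchored, so this is an approximation; the query texts are hypothetical:

```python
import re

# The two patterns from the inner query, compiled case-insensitively
# (the 'i' parameter of REGEXP_LIKE).
share_patterns = [
    re.compile(r"create\s+share\s.*", re.IGNORECASE),
    re.compile(r"\s+to\s+share\s.*", re.IGNORECASE),
]

# Hypothetical query texts for illustration
queries = [
    "CREATE SHARE exfil_share;",
    "GRANT SELECT ON ALL TABLES IN SCHEMA prod.public TO SHARE exfil_share;",
    "SELECT * FROM orders;",
]

# Keep any query that matches at least one share-related pattern
suspicious = [q for q in queries if any(p.search(q) for p in share_patterns)]
```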
QUERY 3: Get relevant queries and Session IDs based on Login ID
This query can be used to explore the activities conducted in the context of a specific suspicious login; for example, a case in which a login event from a malicious or unknown IP address was identified.
SELECT ---- Login Information
LOGIN_HISTORY_1.EVENT_ID AS LOGIN_HISTORY_EVENT_ID,
LOGIN_HISTORY_1.EVENT_TIMESTAMP AS LOGIN_EVENT_TIME,
LOGIN_HISTORY_1.EVENT_TYPE AS LOGIN_EVENT_TYPE,
LOGIN_HISTORY_1.CLIENT_IP AS LOGIN_CLIENT_IP,
LOGIN_HISTORY_1.REPORTED_CLIENT_TYPE AS LOGIN_REPORTED_CLIENT_TYPE,
LOGIN_HISTORY_1.REPORTED_CLIENT_VERSION AS LOGIN_REPORTED_CLIENT_VERSION,
LOGIN_HISTORY_1.FIRST_AUTHENTICATION_FACTOR AS LOGIN_FIRST_AUTH_FACTOR,
LOGIN_HISTORY_1.SECOND_AUTHENTICATION_FACTOR AS LOGIN_SECOND_AUTH_FACTOR,
LOGIN_HISTORY_1.IS_SUCCESS AS SUCCESSFUL_LOGIN,
---- Session Details
SESSIONS_TABLE_2.SESSION_ID AS SESSION_ID,
SESSIONS_TABLE_2.CREATED_ON AS SESSION_CREATION_TIME,
PARSE_JSON(SESSIONS_TABLE_2.client_environment):APPLICATION::STRING AS SESSION_CLIENT_APPLICATION,
PARSE_JSON(SESSIONS_TABLE_2.client_environment):OS::STRING AS SESSION_CLIENT_OS,
PARSE_JSON(SESSIONS_TABLE_2.client_environment):OS_VERSION::STRING AS SESSION_CLIENT_OS_VERSION,
SESSIONS_TABLE_2.USER_NAME AS USER_NAME,
SESSIONS_TABLE_2.AUTHENTICATION_METHOD AS SESSION_AUTHENTICATION_METHOD,
SESSIONS_TABLE_2.LOGIN_EVENT_ID AS LOGIN_EVENT_ID,
SESSIONS_TABLE_2.CLIENT_APPLICATION_ID AS SESSION_CLIENT_APPLICATION_ID,
SESSIONS_TABLE_2.CLOSED_REASON AS SESSION_CLOSED_REASON,
---- Query Details
QUERY_HISTORY_3.QUERY_TEXT AS QUERY_TEXT,
QUERY_HISTORY_3.QUERY_TAG AS QUERY_TAG,
QUERY_HISTORY_3.START_TIME AS QUERY_TIME,
QUERY_HISTORY_3.QUERY_ID AS QUERY_ID,
QUERY_HISTORY_3.QUERY_TYPE AS QUERY_TYPE,
QUERY_HISTORY_3.ROLE_NAME AS ROLE_NAME,
QUERY_HISTORY_3.EXECUTION_STATUS AS EXECUTION_STATUS,
QUERY_HISTORY_3.ROWS_WRITTEN_TO_RESULT AS RESULTS_ROWS_WRITTEN,
QUERY_HISTORY_3.BYTES_WRITTEN_TO_RESULT AS RESULTS_BYTES_WRITTEN
FROM SNOWFLAKE.ACCOUNT_USAGE.LOGIN_HISTORY AS LOGIN_HISTORY_1
JOIN SNOWFLAKE.ACCOUNT_USAGE.SESSIONS AS SESSIONS_TABLE_2 ON LOGIN_HISTORY_1.EVENT_ID = SESSIONS_TABLE_2.LOGIN_EVENT_ID
JOIN SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY QUERY_HISTORY_3 ON SESSIONS_TABLE_2.SESSION_ID = QUERY_HISTORY_3.SESSION_ID
WHERE LOGIN_HISTORY_1.EVENT_ID = ''
-- Modify timestamps according to investigation timeframe
AND LOGIN_HISTORY_1.EVENT_TIMESTAMP BETWEEN '2024-03-12 00:00:00' AND '2024-03-15 00:30:00'
AND SESSIONS_TABLE_2.CREATED_ON BETWEEN '2024-03-12 00:00:00' AND '2024-03-15 00:30:00'
AND QUERY_HISTORY_3.START_TIME BETWEEN '2024-03-12 00:00:00' AND '2024-03-15 00:30:00'
ORDER BY QUERY_TIME ASC
QUERY 4: Correlation of Internal Logon Event (IP: 0.0.0.0) to potential source IP address
During the threat-hunting efforts conducted by the team, we witnessed cases in which hits of query-history-based hunting queries were tied to login events whose source IP was 0.0.0.0.
An example of a client application used in cases where the source IP address was 0.0.0.0 is “Snowflake Web App (Snowscope sub-service)”. This is a sub-service, meaning the communication originated from Snowflake's internal application; correlating it to a previous logon event by the same user can therefore be useful.
The following investigation query was used in those cases to correlate these Snowscope sub-service login events to the origin source IP used by the actual user.
SELECT DISTINCT
LHT1.EVENT_TIMESTAMP AS LOGON_EVENT_TIME,
DATEADD(HOUR, -6, LOGON_EVENT_TIME) AS LOGON_EVENT_TIME_MINUS_6_HOUR,
LHT1.USER_NAME AS LOGON_USER_NAME,
LHT1.EVENT_ID AS LOGON_EVENT_ID,
LHT1.FIRST_AUTHENTICATION_FACTOR AS LOGON_FIRST_AUTH_FACTOR,
LHT1.SECOND_AUTHENTICATION_FACTOR AS LOGON_SECOND_AUTH_FACTOR,
PARSE_JSON(ST2.client_environment) :APPLICATION :: STRING AS CLIENT_APPLICATION,
LHT1.CLIENT_IP AS LOGON_CLIENT_IP,
LHT3.EVENT_TIMESTAMP AS PREVIOUS_LOGONS_TIMESTAMP,
LHT3.USER_NAME AS PREVIOUS_LOGONS_USER_NAME,
LHT3.EVENT_ID AS PREVIOUS_LOGON_EVENT_ID,
PARSE_JSON(ST4.client_environment) :APPLICATION :: STRING AS PREVIOUS_LOGONS_CLIENT_APPLICATION,
LHT3.FIRST_AUTHENTICATION_FACTOR AS PREVIOUS_LOGON_FIRST_AUTH_FACTOR,
LHT3.SECOND_AUTHENTICATION_FACTOR AS PREVIOUS_LOGON_SECOND_AUTH_FACTOR,
LHT3.CLIENT_IP AS PREVIOUS_LOGONS_LOGON_CLIENT_IP
FROM SNOWFLAKE.ACCOUNT_USAGE.LOGIN_HISTORY AS LHT1
JOIN SNOWFLAKE.ACCOUNT_USAGE.SESSIONS AS ST2 ON ST2.LOGIN_EVENT_ID = LHT1.EVENT_ID
JOIN SNOWFLAKE.ACCOUNT_USAGE.LOGIN_HISTORY AS LHT3 ON (LHT3.EVENT_TIMESTAMP BETWEEN LOGON_EVENT_TIME_MINUS_6_HOUR AND LOGON_EVENT_TIME AND PREVIOUS_LOGONS_USER_NAME = LOGON_USER_NAME)
JOIN SNOWFLAKE.ACCOUNT_USAGE.SESSIONS AS ST4 ON ST4.LOGIN_EVENT_ID = LHT3.EVENT_ID
WHERE LHT1.EVENT_TIMESTAMP BETWEEN '2024-03-01 00:00:00.000000 +00:00' and '2024-05-01 00:30:00.000000 +00:00'
AND ST2.CREATED_ON BETWEEN '2024-03-01 00:00:00.000000 +00:00' and '2024-05-01 00:30:00.000000 +00:00'
AND LHT3.EVENT_TIMESTAMP BETWEEN '2024-03-01 00:00:00.000000 +00:00' and '2024-05-01 00:30:00.000000 +00:00'
AND ST4.CREATED_ON BETWEEN '2024-03-01 00:00:00.000000 +00:00' and '2024-05-01 00:30:00.000000 +00:00'
AND CLIENT_APPLICATION = 'Snowflake Web App (Snowscope sub-service)'
AND PREVIOUS_LOGONS_CLIENT_APPLICATION != 'Snowflake Web App (Snowscope sub-service)'
ORDER BY LOGON_EVENT_TIME ASC
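The correlation logic, pairing each sub-service login (IP 0.0.0.0) with earlier, non-sub-service logins by the same user within a six-hour window, can be sketched in Python. The timestamps and IP addresses are hypothetical:

```python
from datetime import datetime, timedelta

# Hypothetical login events: (timestamp, user, client_ip, application)
logins = [
    (datetime(2024, 4, 1, 9, 0), "ALICE", "203.0.113.7", "Snowflake UI"),
    (datetime(2024, 4, 1, 11, 30), "ALICE", "0.0.0.0",
     "Snowflake Web App (Snowscope sub-service)"),
]

SUB_SERVICE = "Snowflake Web App (Snowscope sub-service)"

def candidate_source_logins(logins, window=timedelta(hours=6)):
    """For each sub-service login, return earlier non-sub-service logins by
    the same user within the window -- the likely real source IPs."""
    pairs = []
    for ts, user, ip, app in logins:
        if app != SUB_SERVICE:
            continue
        for pts, puser, pip, papp in logins:
            if (puser == user and papp != SUB_SERVICE
                    and ts - window <= pts <= ts):
                pairs.append((ts, user, pts, pip))
    return pairs
```

As in the SQL version, the six-hour window is a heuristic: too short and the originating interactive login falls outside it, too long and unrelated logins add noise.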
QUERY 5.1 & 5.2: Get info regarding suspicious IP addresses (cross-data sources)
Pivoting is key in many types of investigations, including threat-hunting and investigation of Snowflake attacks.
While investigating different threat-hunting hits, we used additional data sources to reach a final verdict.
Query 5.1, for example, demonstrates a very simple query we used to check whether the IP address of a suspicious sign-in was part of our “Organizational_IP” schema at Hunters. Getting an indication of how prevalent this IP address is within the organization is extremely useful.
Query 5.2: in addition, we also used specific data sources to better understand the usage characteristics of specific user accounts and their activity from a specific source IP address. For example, in the case of a suspicious query that we correlated to a sign-in event from the IP x.x.x.x, we looked for sign-ins of the specific user account in Okta logs, to check two different things:
--- Query 5.1
SELECT IP AS IP_ADDRESS,
ID AS USER_ID,
ORIGINAL_SPECIFIC_SOURCE_TYPE AS SPECIFIC_LOG_SOURCE,
DAYS_SEEN AS NUMBERS_OF_DAYS_SEEN,
BASELINE_START AS BASELINE_START,
BASELINE_END AS BASELINE_END
FROM INVESTIGATION.ORGANIZATIONAL_IP
WHERE IP = ''
--- Query 5.2
SELECT PUBLISHED AS EVENT_TIME,
EVENT_TYPE AS EVENT_TYPE,
DISPLAY_MESSAGE AS DISPLAY_MESSAGE,
ACTOR_DISPLAY_NAME AS ACTOR_DISPLAY_NAME,
ACTOR_ALTERNATE_ID AS ACTOR_ALTERNATE_ID,
CLIENT_IP_ADDRESS AS SOURCE_IP,
CLIENT_DEVICE AS CLIENT_DEVICE,
CLIENT_USER_AGENT_RAW_USER_AGENT AS USER_AGENT
FROM RAW.OKTA_LOGS
WHERE CLIENT_IP_ADDRESS ILIKE ''
AND ACTOR_ALTERNATE_ID ILIKE ''
AND PUBLISHED BETWEEN '2024-03-01 00:00:00' and '2024-07-15 00:30:00'
QUERY 6: Application baseline of a user
Similar to the Logon History baseline of a user mentioned above, a baseline of the applications used in a user's sessions can also help determine whether suspicious activity aligns with that user's normal behavior.
SELECT USER_NAME AS USER_NAME,
MIN(CREATED_ON) AS MIN_CREATED_ON,
MAX(CREATED_ON) AS MAX_CREATED_ON,
ARRAY_AGG(DISTINCT AUTHENTICATION_METHOD) AS AUTH_METHODS,
ARRAY_AGG(DISTINCT CLIENT_APPLICATION_ID) AS CLIENT_APP_IDS,
PARSE_JSON(client_environment):APPLICATION::STRING AS CLIENT_APPLICATION,
PARSE_JSON(client_environment):BROWSER_VERSION::STRING AS BROWSER_VERSION,
COUNT(*) AS COUNTER
FROM SNOWFLAKE.ACCOUNT_USAGE.SESSIONS
---- Insert the user name of interest
WHERE USER_NAME ILIKE ''
---- Modify Timestamp based on investigation timeframe
AND CREATED_ON BETWEEN '2024-06-01 00:00:00' and '2024-07-01 00:30:00'
GROUP BY 1,6,7
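The grouping performed by this query amounts to counting sessions per (user, application, browser version) tuple; a rare or never-before-seen combination for a given user is what stands out. A minimal Python sketch with hypothetical session rows:

```python
from collections import Counter

# Hypothetical session rows: (user, application, browser_version)
rows = [
    ("ALICE", "Snowflake UI", "Chrome 124"),
    ("ALICE", "Snowflake UI", "Chrome 124"),
    ("ALICE", "PythonConnector 3.0.4", None),
]

# Equivalent of GROUP BY user, application, browser_version with COUNT(*)
baseline = Counter(rows)
```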
In addition to our threat hunting efforts, Hunters offers a detection layer of security that could provide early alerts and coverage if a customer was targeted by UNC5537. This added protection comes from our out-of-the-box detection content. This section outlines one such pre-existing detector that could have flagged the attack during its course and another that was developed in response to the campaign.
New Content: Snowflake Query Awareness Behavior
In response to the campaign, Hunters' researchers released a new detector based on the behavior and commands the threat actor executed on the database during the reconnaissance phase (examples can be seen in the Mandiant report).
From the Hunters documentation: "Detects an execution of multiple situational awareness database commands in a short period (30 minutes) on a Snowflake account.
Adversaries often use situational awareness commands on databases to discover critical information such as the location of sensitive data, user roles, and permissions, which helps them plan targeted attacks and privilege escalation. By listing and describing databases, tables, and schemas, as well as enumerating permissions, attackers gather insights that enable more effective and precise exploitation.
This detector checks the query_history table from both Snowflake accounts: account usage and reader account usage. The account usage contains auditing of the queries that were executed from this account and reader account usage contains auditing of the queries that were executed, but only the ones that are related to reader accounts.”
Important: The significance of monitoring Reader Account data (in the context of Snowflake) was discussed in the technical section above. It is worth noting that this detector covers that data as well, as it is often overlooked but vital for comprehensive security.
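As a rough illustration of this class of detection logic (the detector itself is Hunters' proprietary content; the threshold, window, and command list below are assumptions for the sketch), the idea is to count distinct situational-awareness command types executed within a sliding 30-minute window:

```python
from datetime import datetime, timedelta

# Illustrative situational-awareness command prefixes (assumed list)
RECON_PREFIXES = ("SHOW DATABASES", "SHOW TABLES", "SHOW GRANTS",
                  "DESCRIBE TABLE", "SHOW SCHEMAS")

# Hypothetical query-history events for one user: (start_time, query_text)
events = [
    (datetime(2024, 4, 1, 10, 0), "SHOW DATABASES;"),
    (datetime(2024, 4, 1, 10, 5), "SHOW SCHEMAS;"),
    (datetime(2024, 4, 1, 10, 9), "SHOW GRANTS ON ACCOUNT;"),
    (datetime(2024, 4, 1, 10, 20), "SHOW TABLES IN SCHEMA prod.public;"),
    (datetime(2024, 4, 1, 18, 0), "SELECT 1;"),
]

def burst_detected(events, threshold=3, window=timedelta(minutes=30)):
    """True if `threshold`+ distinct recon command types run within `window`."""
    recon = [(ts, q) for ts, q in events
             if q.upper().startswith(RECON_PREFIXES)]
    for i, (start, _) in enumerate(recon):
        kinds = {next(p for p in RECON_PREFIXES if q.upper().startswith(p))
                 for ts, q in recon[i:] if ts - start <= window}
        if len(kinds) >= threshold:
            return True
    return False
```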
Existing Detector: Impossible travel between SaaS logins
From the Hunters documentation: “Detects anomalous consecutive SaaS logins from two different IP addresses by the same user, with the required traveling speed between them being impossible in the observed time frame. Impossible travel may indicate the logins were not made by the same person, therefore may indicate the user was compromised by a malicious actor.”
If a threat actor had logged into a Snowflake account from an irregular user or from a geolocation significantly distant from the previous login location, this detector would trigger an alert. The notification would prompt security personnel to take action, potentially preventing data exfiltration or other malicious activities before the threat actor could fully exploit their access.
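The core impossible-travel check, computing the speed required to cover the distance between two consecutive login geolocations, can be sketched as follows. The 900 km/h cutoff (roughly airliner speed) and the coordinates are illustrative assumptions, not the detector's actual parameters:

```python
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometers."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371 * asin(sqrt(a))

def is_impossible_travel(geo1, geo2, hours_between, max_speed_kmh=900):
    """Flag consecutive logins whose required speed exceeds the cutoff."""
    distance = haversine_km(*geo1, *geo2)
    return distance / max(hours_between, 1e-9) > max_speed_kmh

# Hypothetical geolocations: New York -> Moscow, one hour apart
ny, moscow = (40.71, -74.01), (55.76, 37.62)
```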
Additional Existing Detectors
While these TTPs were not part of the UNC5537 attack vector, they would be useful in similar attack scenarios:
The following resources were used for the creation of some hunting/investigation queries: