Since the beginning of Hunters, we’ve been working to reshape the way teams approach security operations. If you’re familiar with us, you know we make a bold claim: we enable security teams to detect real threats faster and more reliably than SIEMs. As a young SOC Platform operating in the crowded SIEM market, we’re competing against big, established contenders. The onus is on us not just to tell you, but to show you what we have to offer against the biggest SIEM vendors in the industry.
The foundation of a SOC Platform is the ability to ingest and consolidate logs from all the tools and segments in the security environment. These tools are essential to ensure maximal security coverage for the organization. But between EDR, firewall, CSPM, identity, email, and every other tool in the enterprise security stack, managing each tool individually creates a cumbersome and inefficient workflow for the team. As a central tool for security teams, Hunters needs to be able to integrate with any product a security team might be using, which means hundreds of integrations, each slightly different. This is how we do it.
While teams use different security tools in different workflows, they all share a set of requirements for a successful integration. For the typical integration, all users need to be able to:
To meet all of these requirements, each integration needs to be carefully thought out. Keeping data airtight, up to date, well-structured, and accessible is a top priority when it comes to your organization’s security operations.
The first thing we do when developing an integration is map the use case that’s relevant both to the typical customer’s workflow and to the outcome of the product being used. As data and security engineers, we need to identify the relevant components that need to be accessed and queried from the product’s interface. This might be done via a REST API, webhook integrations, active export to storage, on-premise collection, or some other method.
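To make the most common of these methods concrete, here’s a minimal sketch of cursor-based collection over a REST API. The endpoint, authentication scheme, and pagination parameters are hypothetical placeholders, not any specific vendor’s API:

```python
# A minimal sketch of REST API collection. The endpoint, token, and
# cursor parameter are illustrative, not a real vendor's interface.
import requests

BASE_URL = "https://api.example-vendor.com/v1/events"  # hypothetical endpoint
API_TOKEN = "..."  # provisioned per customer

def fetch_events():
    """Page through the vendor's event feed until the cursor is exhausted."""
    headers = {"Authorization": f"Bearer {API_TOKEN}"}
    cursor = None
    while True:
        params = {"limit": 1000}
        if cursor:
            params["cursor"] = cursor
        resp = requests.get(BASE_URL, headers=headers, params=params, timeout=30)
        resp.raise_for_status()
        payload = resp.json()
        yield from payload["events"]
        cursor = payload.get("next_cursor")
        if not cursor:  # no more pages to fetch
            break
```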
Mapping the required data types, or concrete schemas defined by the product, is the cornerstone of every successful integration. It’s important to carefully survey the existing data collection methods and find the most efficient way of executing them. We dedicate time to this up-front to avoid the pain of altering integrations (and dealing with escaped bugs) that are running in the environments of dozens of customers.
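One lightweight way to make that schema mapping explicit is to pin the expected fields down in code, so a vendor-side change fails loudly instead of silently corrupting data downstream. This is a rough sketch with made-up field names, not a real product’s schema:

```python
# A minimal sketch of validating raw records against the schema mapped
# from the vendor's documentation. Field names here are illustrative.
EXPECTED_FIELDS = {
    "event_id": str,
    "timestamp": str,   # e.g. ISO-8601, per the vendor's docs
    "event_type": str,
    "source_ip": str,
}

def validate_record(record: dict) -> list[str]:
    """Return a list of schema violations found in one raw record."""
    problems = []
    for field, expected_type in EXPECTED_FIELDS.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif not isinstance(record[field], expected_type):
            problems.append(
                f"bad type for {field}: {type(record[field]).__name__}"
            )
    return problems
```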
For the highest-quality integrations, it’s especially important to study the vendor’s documentation, including API documentation, output schema documentation, limitations, and edge cases. In some cases, the vendors we integrate with are formal partners of ours, from either a technology alliance or a commercial perspective. This can both lead to a more thorough exploration and help us better demonstrate the value of combining the two tools.
Finally, we strive to test our new integrations against real customer data, to make sure they can handle the scale and variety of our customers’ real-world usage.
After we’ve examined the data types exposed by the product, the next step is to access the data. At this stage, the data is collected, usually in its raw format, in an intermediate staging area. Whether it’s JSON, CSV, key-value pairs, or CEF, we want it all!
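As a rough illustration of what landing data in a staging area can look like, here’s a minimal sketch that writes raw payloads to S3, partitioned by source and ingestion time. The bucket name and key layout are illustrative, not our production setup:

```python
# A minimal sketch of staging raw payloads in their original format.
# Bucket name and key layout are made up for the example.
import datetime
import uuid
import boto3

s3 = boto3.client("s3")
STAGING_BUCKET = "staging-bucket-example"  # hypothetical bucket

def stage_raw_payload(payload: bytes, source: str, fmt: str = "json") -> str:
    """Write one raw payload to S3, partitioned by source and ingestion hour."""
    now = datetime.datetime.now(datetime.timezone.utc)
    key = f"raw/{source}/{now:%Y/%m/%d/%H}/{uuid.uuid4()}.{fmt}"
    s3.put_object(Bucket=STAGING_BUCKET, Key=key, Body=payload)
    return key
```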
While doing this, we need to take several major issues into account:
After we’ve collected the data, we need to make it accessible for automatic querying, investigation, compliance, and other important activities. Large amounts of data need to be consumed and processed in a timely manner, then pushed to a chosen database infrastructure.
When we consume the data from the staging area, it’s important to do it in a versatile and resilient manner. Various sources, such as AWS S3 or Azure Blob Storage, need to be supported, and access to them needs to be managed and secure. Consuming the data itself is best done using a push methodology, i.e., letting the source manage the queue of data while we listen on it.
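One common realization of this push model, sketched below under the assumption of an AWS setup, is having S3 publish object-created notifications to an SQS queue that the consumer long-polls. The queue URL is a placeholder:

```python
# A minimal sketch of the push model: the source (S3 event notifications)
# manages the queue, and we listen on it rather than scanning the bucket.
import json
import boto3

sqs = boto3.client("sqs")
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/staging-events"  # hypothetical

def poll_staging_queue(handle_object):
    """Long-poll the queue and hand each newly staged object to a callback."""
    while True:
        resp = sqs.receive_message(
            QueueUrl=QUEUE_URL,
            MaxNumberOfMessages=10,
            WaitTimeSeconds=20,  # long polling: the source drives the pace
        )
        for msg in resp.get("Messages", []):
            body = json.loads(msg["Body"])
            for record in body.get("Records", []):
                bucket = record["s3"]["bucket"]["name"]
                key = record["s3"]["object"]["key"]
                handle_object(bucket, key)
            # Delete only after successful handling, so failures are retried.
            sqs.delete_message(
                QueueUrl=QUEUE_URL, ReceiptHandle=msg["ReceiptHandle"]
            )
```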
Parsing the data consists of two main steps:
At this point, the data is in the database, and we need to make it actionable for the relevant security use cases. This means processing the data and translating it into relevant detection content, whether authored by the Hunters team or custom-written by customers. It’s important to make sure alerts are generated in a timely manner, which is often challenging for high-volume data streams, like raw EDR telemetry or cloud audit events.
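To give a flavor of what detection content over normalized data can look like, here is a deliberately toy example; it is not Hunters’ actual detection engine, and the event fields and threshold are made up:

```python
# A toy detection over normalized events: flag users with an unusual
# number of failed logins in one batch. Fields and threshold are invented
# for illustration only.
from collections import Counter

def detect_brute_force(events, threshold=10):
    """Return alert records for users exceeding the failed-login threshold."""
    failures = Counter(
        e["user"] for e in events
        if e.get("event_type") == "login_failed"
    )
    return [
        {"user": user, "failed_logins": count, "alert": "possible brute force"}
        for user, count in failures.items()
        if count >= threshold
    ]
```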
At Hunters, we pride ourselves on seamlessly connecting the dots between security tools for next-level threat detection and response. This blog was a small peek behind the scenes of our integrations ecosystem, the backbone of our SOC Platform. For more information on how Hunters helps security operations teams move beyond SIEM, schedule a demo.