Can you explain how your solution works?
Our solution, the Enterprise Immune System, analyses network flows using a physical probe resembling a server. It is positioned at the heart of the network and monitors flows within the IS in order to detect anomalies and prevent an attack from spreading. Specifically, we monitor file-sharing flows like SMB, Internet traffic, and even new USB keys when they are connected to Windows systems, since these send a query to the network by default. We can retrieve metadata at different levels, which can provide clues when an endpoint is compromised. The alerts sent are classified mainly according to three metrics: the number of connected machines, the volume of data, and connection failures. Today's technologies do not yet allow systems to be 100% autonomous, and that is not our objective. With the growing volumes of data that we handle, operators need contextualised information in order to be effective. We want to guide and simplify decision-making by giving operators easy access to the most relevant information.
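As an illustration only, the three alert metrics mentioned here (connected machines, data volume, connection failures) could feed a simple deviation score like the Python sketch below. All names and the scoring formula are hypothetical, not Darktrace's actual algorithm.

```python
from dataclasses import dataclass

@dataclass
class FlowStats:
    """Aggregated flow statistics for one host over one observation window."""
    peer_count: int          # number of distinct machines contacted
    bytes_moved: int         # total data volume in bytes
    failed_connections: int  # connection attempts refused or timed out

def severity(current: FlowStats, baseline: FlowStats) -> float:
    """Score how far a host deviates from its learned baseline.

    Each metric contributes its relative excess over the baseline;
    the result is a continuous severity score, not a binary verdict.
    """
    def excess(now: int, usual: int) -> float:
        return max(0, now - usual) / max(usual, 1)

    return (excess(current.peer_count, baseline.peer_count)
            + excess(current.bytes_moved, baseline.bytes_moved)
            + excess(current.failed_connections, baseline.failed_connections))

baseline = FlowStats(peer_count=5, bytes_moved=10_000_000, failed_connections=2)
quiet = FlowStats(peer_count=4, bytes_moved=8_000_000, failed_connections=1)
noisy = FlowStats(peer_count=50, bytes_moved=10_000_000, failed_connections=40)
severity(quiet, baseline)  # → 0.0 (within baseline on every metric)
severity(noisy, baseline)  # → 28.0 (many new peers and failed connections)
```

The point of the continuous score is that an operator can triage alerts by severity rather than face a flat yes/no verdict.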
How many appliances (physical probes) are needed to cover an IS? Do they cover the entire network even if there are multiple sites or is one needed for each site?
It depends on the IS architecture and whether there is a point, such as a data centre, centralising the entire network flow. If so, a single appliance can suffice. On the other hand, if the sites are not fully linked to each other and each has its own Internet connection, DHCP servers, local AD and DNS, an appliance will be needed per site. However, we can centralise the information feedback and monitor all of the sites accessible to the solution from a single interface. Our solution is also very well suited to networks with IoT devices, smartphones and any other objects that communicate over the network.
We know that your solution can be used to study user behaviour and to detect anomalies. Can you explain how the different mathematical models to study these behaviours are generated? More specifically, do you start with a pre-trained learning model when you deploy your solution on a network, or is it entirely blank with each new deployment?
The behavioural model isn't pre-trained; it is trained on the network after the probe has been installed, in order to learn the client's habits and environment. For example, our solution doesn't know that SSH conventionally runs on TCP port 22. So, if the IS uses a different port for SSH, it will not raise an alert on that port: it simply learns which port is used for SSH. On the other hand, if a connection attempt is then made on port 22, which is unusual for that network, it will raise an alert. More specifically, our learning is based on three classes of information:
- The logs of the machines studied on a case-by-case basis
- Netflow and forensic network information accessed via ELK (ElasticSearch, Logstash, Kibana)
- Packet captures and analyses using Wireshark tools
Data retention is limited and varies according to volume. Logs are kept between eight and twelve months, ELK data for approximately six to eight weeks, and network frames (packet captures) for one week. Keeping this information lets us track users over time, and it all remains on the client's machines, with no analysis in the Cloud: DHCP log analysis makes it possible to follow a machine, its Kerberos traffic and its user over time, and learning can also be customised if we wish. In addition to the behavioural model, we include about 80 pre-trained detection models. These resemble SIEM correlation models and are used to detect targeted, known attacks such as Kerberos ticket brute-force attacks.
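The SSH-port example above can be sketched in a few lines: rather than hard-coding port conventions, the system learns which ports appear on this network during a learning phase, then flags ports it never saw. This is an illustrative toy, not Darktrace's implementation.

```python
from collections import defaultdict

class PortBaseline:
    """Learn which TCP ports are normally used on a network, then
    flag connections to ports never seen during learning."""

    def __init__(self):
        self.seen = defaultdict(int)  # port -> observed connection count

    def observe(self, port: int) -> None:
        """Record one connection observed during the learning phase."""
        self.seen[port] += 1

    def is_anomalous(self, port: int) -> bool:
        """A port is anomalous for this network if it was never observed."""
        return self.seen[port] == 0

# During learning, this (hypothetical) network runs SSH on port 2222, not 22.
baseline = PortBaseline()
for _ in range(100):
    baseline.observe(2222)

baseline.is_anomalous(2222)  # → False: the learned SSH port is normal here
baseline.is_anomalous(22)    # → True: port 22 was never seen, so it alerts
```

A real system would of course track ports per host and per protocol and use frequencies rather than a simple seen/unseen flag, but the principle is the same: the baseline comes from the client's own traffic, not from universal conventions.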
Tell us about false positives and negatives during your solution’s detections.
This is a hard question to address because our solution is based on behavioural analysis. We detect behaviours that are abnormal, and more or less worrisome, with respect to a baseline. To do this, we use a whole arsenal of algorithms, including unsupervised machine learning, and we qualify network symptoms. So, even if an abnormal connection is legitimate, it is still unusual. Our solution differs from approaches based on pre-established rules with IOCs and signatures, where the result is binary: true or false. Our solution is never binary; it assigns a severity score.
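To illustrate the contrast (purely a sketch, with hypothetical names): an IOC/signature check answers yes or no, while a behavioural check returns a continuous score measuring deviation from the host's own history, here a simple z-score.

```python
import statistics

def signature_match(payload: str, iocs: set[str]) -> bool:
    """Signature/IOC approach: binary — a known indicator is present or not."""
    return any(ioc in payload for ioc in iocs)

def behavioural_severity(value: float, history: list[float]) -> float:
    """Behavioural approach: a continuous score measuring how unusual
    `value` is relative to the host's own history (a simple z-score)."""
    mean = statistics.fmean(history)
    spread = statistics.pstdev(history) or 1.0  # avoid division by zero
    return abs(value - mean) / spread

# A connection that matches no signature can still be highly unusual.
history = [10.0, 12.0, 11.0, 9.0, 10.0]   # e.g. daily outbound MB for one host
signature_match("GET /index.html", {"evil.exe", "mimikatz"})  # → False
behavioural_severity(300.0, history)  # large score: unusual even if legitimate
```

The binary check misses anything without a known indicator; the score instead lets an operator rank unusual events, which is why a legitimate but unusual connection still surfaces, just with a severity attached.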
How long does your solution take to learn behaviours and recognise "normal" behaviours?
We generally consider that it takes about a week to ten days of learning on the network, but in reality the solution is constantly learning. It can detect abnormal behaviours within minutes of being connected to the IS, but it becomes truly functional after about ten days. Our clients' systems change, with new machines being added through mergers and acquisitions, integrations and so on, so we have to be able to manage this. With this unsupervised learning approach, the system is constantly learning and analysing, generating net gains in efficiency, time and flexibility.
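One simple way to picture "constantly learning" is an exponentially weighted baseline, which drifts toward new behaviour instead of being retrained from scratch. This is only an illustrative sketch under that assumption, not the solution's actual model.

```python
class OnlineBaseline:
    """Continuously updated baseline: an exponentially weighted moving
    average, so new habits (new machines, a merged network) are absorbed
    gradually without any retraining step."""

    def __init__(self, alpha: float = 0.1):
        self.alpha = alpha  # weight given to the newest observation
        self.mean = None    # current baseline value (None until first update)

    def update(self, value: float) -> float:
        """Fold one new observation into the baseline and return it."""
        if self.mean is None:
            self.mean = value
        else:
            self.mean = self.alpha * value + (1 - self.alpha) * self.mean
        return self.mean

b = OnlineBaseline(alpha=0.2)
for v in [100, 100, 100]:       # established habit
    b.update(v)
for v in [200, 200, 200]:       # behaviour shifts, e.g. a newly merged site
    b.update(v)
# b.mean now sits between 100 and 200, drifting toward the new normal.
```

The same mechanism explains why early alerts are possible within minutes (any baseline at all gives something to deviate from) while full usefulness takes days (the baseline needs enough history to be stable).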
Concerning IS evolutions, what happens if a new machine is added? Does a new rule have to be defined to avoid generating an alert, or does your solution understand that this machine is legitimate by analysing its behaviour?
Darktrace detects a new machine on the network and analyses its behaviour, for example to see whether it performs a network scan or other abnormal activity. If it does, an alert with the related information is sent to the operator. However, human intervention is still useful to determine whether the new machine is legitimate on the network: if an administrator scans the network when a machine is added, the solution could consider that scan malicious. Either way, it highlights the essential information for operator decision-making and saves a lot of time.
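A toy version of the scan check described here (hypothetical names and threshold): count how many distinct (destination, port) pairs each source touches in a window, and flag sources above a threshold. Whether a flagged source is an attacker or an administrator doing inventory is exactly the judgment left to the operator.

```python
from collections import defaultdict

def detect_scanners(connections, threshold: int = 20):
    """Flag hosts that contact unusually many distinct (host, port) pairs
    in one window — the classic signature of a network scan.

    `connections` is an iterable of (source_ip, dest_ip, dest_port) tuples.
    Returns the set of source IPs whose fan-out meets the threshold.
    """
    touched = defaultdict(set)  # source -> set of (dest, port) pairs
    for src, dst, port in connections:
        touched[src].add((dst, port))
    return {src for src, pairs in touched.items() if len(pairs) >= threshold}

# Normal traffic: a host talking to one server on two ports.
normal = [("10.0.0.5", "10.0.0.9", 443), ("10.0.0.5", "10.0.0.9", 22)]
# A new machine sweeping ports 1-100 on one target stands out immediately.
sweep = [("10.0.0.99", "10.0.0.9", p) for p in range(1, 101)]
detect_scanners(normal + sweep)  # → {"10.0.0.99"}
```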
What about your "Threat Visualizer" tool?
The Threat Visualizer is the main interface, providing total visibility of the network in 3D. Its purpose is to help contextualise something that is abstract. Contrary to what you might think, the aim of machine/deep learning is not to replace operators or human thinking. Currently, a tool like ours can't take decisions to remedy attacks, but it can guide the operator towards the right decisions by very quickly detecting whether a behaviour is abnormal, leaving the operator to judge whether it is legitimate. Content, however, is not considered: the solution doesn't know whether the files being transferred are critical. This is one of the current limitations of approaches like ours, simply because every IT business and its policies are different.
Do you have problems related to user data and new standards like GDPR?
We only use metadata, not the contents of user machines or of the packets transmitted on the network. Moreover, the data can be anonymised, so we don't have any problems at this level. And unlike many other solutions, the data collected by our solution stay on our clients' networks: we don't transfer them to our servers for analysis. Our clients are reassured to know that the data remain on their networks.
How are you positioned with respect to your competitors?
We were the first to develop a solution like this, and it is still quite a restricted market, so we don't have many competitors. Right now, we are pretty much the leader in this market.
What do you think of hackers who use deep learning to teach their malware to hide from this kind of solution?
Current threats rely more on bad internal practices, shadow IT, and the bad habits of system administrators or users. That is the reality of the market; few clients have an internal security level that lets them focus on greater threats. Some of our teams are exploring these subjects, but it's not the heart of our solution.
What are your plans for the future?
AI attacks are one of our priorities; not necessarily very advanced attacks but ones showing very dangerous potential.