Automating the SOC – Towards AI-Based Incident Response in the Factory of the Future
The Security Operations Centre (SOC) is an established service for continuous protection of companies against cyber threats, and it will extend to cover industrial environments in the near future.
However, as threat actors become more active and gain resources, knowledge and skills, cyber security remains a moving target. This is especially true for industrial systems, which have evolved from isolated proprietary enclaves into integrated, interconnected, and distributed digital manufacturing environments with a significantly larger attack surface. The defender must always be one step ahead, so the evolving threat landscape must be met with corresponding technological and organisational advances, especially in the SOC. To protect our customers in the face of this rising menace, and to help them achieve regulatory compliance, we at Airbus CyberSecurity develop cyber defence innovations in close cooperation with European industry and academia.
Automation of the SOC: Enabler for future security and resilience
The incident response process of the SOC typically comprises three distinct phases: detection, decision, and response. While significant advances have been made towards automating the detection phase, the subsequent decision and response phases remain dominated by manual analyst work. Even though it is currently impossible to achieve total automation and replace human expertise completely, there is still significant potential for relieving analysts of routine tasks that a machine could perform. With such support, analysts could focus on tasks where human capacities are actually critical, such as decision making.
This optimisation of resources is indispensable in preparing the SOC for its future challenges because, apart from the growing threat landscape, the number of endpoints and the amount of infrastructure to monitor are increasing considerably. The Industrial Internet of Things (IIoT) will inevitably establish a significantly larger baseline of devices to be protected in industrial environments. In addition, communication networks will experience a generational leap in available bandwidth, for instance with the introduction of 5G. As a result, incident generation and reporting rates may increase exponentially as the number of devices grows, moving the traditional incident response process beyond what can feasibly be done manually. Since an exponential increase in workload cannot be addressed with linear growth of the workforce, technology and process innovations are required.
Starting with low-hanging fruit: Automated data acquisition and enrichment
Our approach to advancing the state of the art of incident response towards automation is twofold and proposes a gradual increase of automation across all SOC processes. As a first step, we develop automation for tasks which are currently performed by analysts but can be assigned to a machine that would handle them not only correctly but also more efficiently, e.g. resolving hostnames to IP addresses. At this step of automation, the analyst keeps the handling authority over incidents, but is supported by automated acquisition, enrichment, and presentation of the data needed for decision making (see Figure 1). Currently, analysts need to acquire a significant part of this data manually, when they could instead devote their time and attention to actual decision making and thereby increase productivity. This initial step of automation is further supported by incident playbooks that are based on streamlined response processes. These playbooks are traversed automatically upon incident creation, i.e. conditions are checked, actions are taken or assigned, and analysts are involved via the respective interfaces.
Figure 1: Semi-automated incident response
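To make the first automation step more concrete, the following is a minimal Python sketch of enrichment followed by playbook traversal. It is an illustration under stated assumptions, not a description of any Airbus CyberSecurity product: the Incident record, the 1 to 5 severity scale, the playbook steps, and all function names are hypothetical. The only real library call is socket.gethostbyname from the Python standard library, and the resolver can be swapped out so that the sketch runs without network access.

```python
import socket
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Incident:
    """Hypothetical incident record; field names are illustrative only."""
    hostname: str
    severity: int  # assumed scale: 1 (low) to 5 (critical)
    context: Dict[str, Optional[str]] = field(default_factory=dict)
    log: List[str] = field(default_factory=list)


def resolve(hostname: str) -> Optional[str]:
    """Resolve a hostname to an IPv4 address, or None if the lookup fails."""
    try:
        return socket.gethostbyname(hostname)
    except OSError:  # socket.gaierror is a subclass of OSError
        return None


Resolver = Callable[[str], Optional[str]]


def enrich(incident: Incident, resolver: Resolver = resolve) -> None:
    """Attach data that an analyst would otherwise look up by hand."""
    incident.context["ip"] = resolver(incident.hostname)
    incident.log.append("enriched")


# A playbook step pairs a condition with the action to take when it holds.
Step = Tuple[Callable[[Incident], bool], Callable[[Incident], None]]

# Assumed example playbook. Every path ends with the analyst, who keeps
# the handling authority at this stage of automation.
PLAYBOOK: List[Step] = [
    (lambda i: i.context.get("ip") is None,
     lambda i: i.log.append("unresolved host flagged for manual lookup")),
    (lambda i: i.severity >= 4,
     lambda i: i.log.append("escalated to analyst with priority")),
    (lambda i: i.severity < 4,
     lambda i: i.log.append("queued for analyst review")),
]


def run_playbook(incident: Incident, playbook: List[Step],
                 resolver: Resolver = resolve) -> List[str]:
    """Enrich the incident, then check each condition and run its action."""
    enrich(incident, resolver)
    for condition, action in playbook:
        if condition(incident):
            action(incident)
    return incident.log


if __name__ == "__main__":
    # Offline stand-in for DNS so the example is reproducible.
    fake_dns = {"plc-7.factory.example": "10.0.0.7"}.get
    incident = Incident("plc-7.factory.example", severity=5)
    for entry in run_playbook(incident, PLAYBOOK, resolver=fake_dns):
        print(entry)
```

Representing the playbook as plain data (a list of condition and action pairs) keeps the response process reviewable by analysts and lets steps be added without changing the traversal logic.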
Our ambition: AI-based incident response for IIoT scalability
The next and more challenging step of our approach towards automation consists of assisting analysts with decision making and potentially taking over the decision-making authority from them. However, this does not make the role of the analyst redundant. Instead, the work and focus of analysts shift from performing all decision making themselves towards monitoring an Artificial Intelligence (AI) system which handles the bulk of the decisions. During the training phase of the AI system, analysts review all its decisions and provide feedback on their correctness and, in case of failure, on the parameters responsible. With time and data, the knowledge base of the AI grows, and as a result the number of incorrect decisions declines. In the subsequent operational phase, analysts continuously review sample sets of AI decisions, and can take over incident handling at any time if required and perform the job as usual. In that regard, the future analyst role will not require different skills and responsibilities but additional ones, such as knowledge of machine learning and prevention of adversarial learning (see Figure 2).
Figure 2: Automated AI-based incident response
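The review regime described above (all decisions reviewed during training, sample sets reviewed during operation) can be sketched as follows. This is a minimal illustration, not the actual system: the Decision record, the phase names "training" and "operational" as string values, the 10% default sample rate, and the error-rate metric are all assumptions made for the example.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Decision:
    """Hypothetical record of one AI decision and its optional review."""
    incident_id: str
    ai_verdict: str
    analyst_verdict: Optional[str] = None  # filled in when an analyst reviews


def select_for_review(decisions: List[Decision], phase: str,
                      sample_rate: float = 0.1,
                      seed: Optional[int] = None) -> List[Decision]:
    """Return the decisions an analyst should review in the given phase."""
    if phase == "training":
        # Training phase: analysts review every decision.
        return list(decisions)
    if phase != "operational":
        raise ValueError(f"unknown phase: {phase!r}")
    if not decisions:
        return []
    # Operational phase: review a random sample (assumed rate), at least one.
    k = max(1, round(len(decisions) * sample_rate))
    return random.Random(seed).sample(decisions, k)


def error_rate(reviewed: List[Decision]) -> Optional[float]:
    """Share of reviewed decisions where the analyst disagreed with the AI."""
    judged = [d for d in reviewed if d.analyst_verdict is not None]
    if not judged:
        return None  # nothing reviewed yet, so no estimate
    wrong = sum(d.ai_verdict != d.analyst_verdict for d in judged)
    return wrong / len(judged)


if __name__ == "__main__":
    decisions = [Decision(f"INC-{n}", "benign") for n in range(50)]
    sample = select_for_review(decisions, "operational", seed=1)
    for d in sample:
        d.analyst_verdict = "benign"      # analyst agrees with the AI
    sample[0].analyst_verdict = "malicious"  # one disagreement
    print(len(sample), error_rate(sample))
```

A rising error rate on the sampled decisions would be one possible signal for analysts to take over incident handling or to return the system to full review.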
Due to the resulting scalability of incident handling, the SOC will be ready to face a potentially exponential increase in the rate of generated incidents which will be produced by IIoT and other technological advances in the monitored networks. The use of machine learning for incident response appears to be especially promising for the industrial sector, where networks and processes display recurring patterns by nature, and are thus highly predictable.
Moreover, the handling of most incidents by an AI system will enable the detection of cross-incident and cross-customer attack patterns and insights. As of now, related incidents are likely to be handled by different analysts, so whether these incidents are correlated or belong to the same attack may never be examined. If this correlation analysis were to be done by analysts manually, the overhead would likely increase exponentially with the number of incidents, rendering the entire effort infeasible. For an AI system that has knowledge of all incidents, however, the additional workload would ultimately just require more computing power, which is now both more available and more affordable due to the emergence of cloud computing and computing-as-a-service.
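As a simple illustration of why a system with knowledge of all incidents can correlate them cheaply, the sketch below groups incidents that share at least one indicator (for example a source IP address or a file hash) using a union-find structure. It is a hedged example rather than the method used in practice: the incident identifiers, the indicator strings, and the rule "a shared indicator implies a relation" are assumptions, and a real system would likely use richer, learned similarity measures.

```python
from collections import defaultdict
from typing import Dict, List, Set


def correlate(incidents: Dict[str, Set[str]]) -> List[Set[str]]:
    """Group incident IDs that are linked, directly or transitively,
    by at least one shared indicator. Returns groups of size >= 2."""
    parent = {incident_id: incident_id for incident_id in incidents}

    def find(x: str) -> str:
        # Follow parent links to the group representative,
        # shortening the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a: str, b: str) -> None:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Remember the first incident seen for each indicator, so every
    # indicator is processed once instead of comparing all incident pairs.
    first_seen: Dict[str, str] = {}
    for incident_id, indicators in incidents.items():
        for indicator in indicators:
            if indicator in first_seen:
                union(first_seen[indicator], incident_id)
            else:
                first_seen[indicator] = incident_id

    groups: Dict[str, Set[str]] = defaultdict(set)
    for incident_id in incidents:
        groups[find(incident_id)].add(incident_id)
    return [group for group in groups.values() if len(group) > 1]


if __name__ == "__main__":
    # Hypothetical incidents, possibly from different customers.
    incidents = {
        "A": {"ip:198.51.100.4", "hash:abc"},
        "B": {"hash:abc"},
        "C": {"ip:203.0.113.9"},
        "D": {"ip:198.51.100.4"},
    }
    print(correlate(incidents))
```

In this example, incidents A, B, and D end up in one group even though B and D share no indicator with each other, which is the kind of link that is easily missed when each incident is handled by a different analyst.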
Our portfolio and research: A comprehensive story
Airbus already uses AI systems in manufacturing to perform traditionally manual tasks and to allow humans to focus their expertise on monitoring, maintaining, and improving the AI, thus significantly scaling up efficiency, productivity, and skill sets. From a conceptual point of view, the same approach will become indispensable for the SOC to handle the growing numbers of devices, incidents, and attacks in the near future. Even though incident detection, handling, and response are intellectually complex tasks, most cyber-attacks fall into specific categories, follow corresponding actions, and disclose detectable patterns. We at Airbus CyberSecurity have the ambition to develop AI systems capable of addressing these challenges using a baseline of consolidated and automated processes, a well-trained AI system with a comprehensive knowledge base, and the ability to hand over execution to analysts at any time.
This innovation towards automated and AI-based incident response supports our portfolio development for a dedicated SOC for Operational Technologies (OT), and leverages synergies resulting from our contributions to broader European research efforts regarding collaborative manufacturing and collaborative SOC architectures.