On 2025-07-15
by Emma Mullins, Cyber Defense Specialist - Intelligence
Cybersecurity

Cyber Threat Intelligence Part 3: Artificial Intelligence for Intelligence Collection


Reminder: Cyber Threat Intelligence

Cyber threats are continuously growing in complexity and frequency, so the ability to rapidly process and act upon Cyber Threat Intelligence (CTI) can mean the difference between a mitigated threat and a breach. In the first part of our CTI-focused blog post series, we introduced the Intelligence Production Cycle and proposed a functional and technical architecture for a Cyber Threat Intelligence platform integrated into, and supporting, both SOC and Incident Response (IR) operations. The second part centred on how Cyber Threat Intelligence can contribute to and support the SOC’s Threat Hunting and Detection Engineering activities through a targeted Threat Actor Intelligence activity. In this third part, we focus on the paramount ‘Collection’ stage of the Threat Intelligence cycle, how we at Airbus Protect are enhancing this stage with Artificial Intelligence (AI), and some of the ways we see AI being adopted in the constantly evolving Cyber Threat Intelligence landscape.

 

What is Intelligence Collection?

The collection phase in the intelligence lifecycle is a crucial step where raw data is gathered from various sources to answer specific intelligence requirements. This phase is designed to acquire the information that will later be analysed and processed to support decision-making, policy formation or security measure implementation. The primary goal is to gather relevant and actionable data from a variety of sources, so that it directly supports the intelligence needs or objectives set in the earlier stages of the lifecycle (such as during the planning and direction phase) and is comprehensive, relevant and reliable enough for future analysis.

 

Types of Collection in CTI

The gathering of Cyber Threat Intelligence initially requires the identification of the sources. There are several methods or sources that can be used for collecting intelligence, some of which are described below:

 

  • HUMINT (Human Intelligence): Information gathered from human sources, such as informants, interviews, or undercover operations.
  • SOCMINT (Social Media Intelligence): Information collected from an aggregation of social media platforms.
  • ISAC (Information Sharing and Analysis Centre): Intelligence derived from non-profit organisations that provide a central resource for gathering information.
  • OSINT (Open Source Intelligence): Information from publicly available sources, like newspapers, social media and online databases.
  • MISP (Malware Information Sharing Platform): An open source Threat Intelligence platform used for sharing indicators of compromise.
  • Dark Web Marketplaces and Forums: Marketplaces usually used for cybercrime activity such as the sale of stolen data, malware services and tools and compromised user credentials. 

 

In order to successfully collect from this vast range of sources, an effective collection plan needs to be put in place. This involves intelligence analysts defining the specific information requirements that must be met as part of their overall outcome. This could involve focusing on a particular customer, region, target, or product. The collection plan also outlines the logistics of the phase such as how and where the data will be gathered, the resources and methods to be used, and who will be responsible for each collection activity.
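A collection plan of this kind can be captured as a simple data structure. The sketch below is illustrative only: the class names, the fields (`requirement`, `sources`, `method`, `owner`) and the example entries are assumptions made for this example, not a format taken from any particular CTI platform.

```python
from dataclasses import dataclass, field


@dataclass
class CollectionTask:
    """One line of a hypothetical collection plan."""
    requirement: str      # the intelligence requirement being answered
    sources: list[str]    # where the data will be gathered (e.g. OSINT, MISP)
    method: str           # how it will be gathered
    owner: str            # who is responsible for the activity


@dataclass
class CollectionPlan:
    focus: str            # customer, region, target or product
    tasks: list[CollectionTask] = field(default_factory=list)

    def tasks_for_source(self, source: str) -> list[CollectionTask]:
        """Return every task that draws on the given source type."""
        return [t for t in self.tasks if source in t.sources]

    def unowned_tasks(self) -> list[CollectionTask]:
        """Tasks with no responsible analyst are gaps in the plan."""
        return [t for t in self.tasks if not t.owner.strip()]


# Hypothetical example entries.
plan = CollectionPlan(
    focus="Example aerospace customer",
    tasks=[
        CollectionTask("Which ransomware groups target the sector?",
                       ["OSINT", "Dark Web"], "scheduled scraping", "analyst-a"),
        CollectionTask("Which IOCs are shared by sector peers?",
                       ["ISAC", "MISP"], "feed subscription", ""),
    ],
)
```

Calling `plan.unowned_tasks()` here returns the second task, which makes the missing owner visible before collection starts.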

 

What challenges do CTI experts face when they do Intelligence Collection? 

As can be anticipated with such a vast amount of data, the intelligence collection phase faces various challenges, which can impact the overall effectiveness of the intelligence cycle. Some of the main problems with collection in the intelligence cycle include:

  • Volume of Data

The sheer amount of data collected can be overwhelming. With the explosion of digital information (social media, open-source intelligence, surveillance data, etc.), CTI teams may face difficulties in sorting through massive amounts of information to find relevant data.
This can also leave important intelligence buried under an avalanche of irrelevant or trivial information, making it harder to identify actionable insights.

  • Reliability and Accuracy of Sources

Not all intelligence sources are equally reliable, and there can be challenges in validating the authenticity of the information. Some sources may be biased, misinformed, or deliberately deceptive. Unreliable sources can lead to false conclusions, which can influence decisions that affect a customer’s security or operations.

  • Access to Information

Some critical information might be inaccessible due to technological limitations, political barriers, legal constraints, or secrecy. Additionally, information may be intentionally concealed by adversaries or withheld due to internal policies. Limited access to key intelligence can hinder decision-making, preventing an accurate picture of the situation from emerging.

  • Data Overload and Analysis Paralysis

Intelligence agencies may face a situation of “analysis paralysis” where they gather so much information that they struggle to make sense of it all. This can happen when there are too many conflicting or ambiguous pieces of data. The result is slow or poor decision-making, as analysts become bogged down in trying to process and prioritise the collected data.

  • Timeliness

Intelligence collection needs to be timely to be useful, especially in fast-moving industries such as cybersecurity. However, the time it takes to collect and process intelligence can lead to delays. Delays in gathering intelligence can result in missed opportunities or lagging response times, allowing threats to evolve and quickly rendering the original intelligence redundant.

  • Security and Encryption Challenges

As intelligence collection becomes increasingly digital, securing communications and data becomes more difficult. Cyber threats, hacking, and leaks can undermine the collection process and also make the community in general less willing to share with their fellow security professionals. Adversaries may use cyber attacks to disrupt or deceive the intelligence cycle.

  • Legal and Ethical Issues

Collection efforts often need to balance national security interests with privacy and human rights. There can be legal limitations on the collection of data. In some cases, these constraints can prevent the collection of certain types of intelligence, reducing the breadth and depth of the intelligence gathered.

  • Human Factors and Bias

As hard as they may try not to, analysts and agents involved in the collection process can introduce their own unconscious biases, either through personal opinions or pre-existing beliefs, into the collection process. Biases can distort the selection and interpretation of data, potentially leading to inaccurate intelligence and flawed conclusions.

  • Cost and Resource Constraints

Collecting high-quality intelligence, especially through human intelligence (HUMINT) or technical means (SIGINT, IMINT, etc.), requires significant resources in terms of personnel, technology, and funding. Insufficient resources or budget constraints can limit the effectiveness and consistency of intelligence collection, leading to gaps or inefficiencies.

 

Organisations have to retain visibility of existing, emerging and evolving cyber threats, implement appropriate proactive and reactive measures, and define effective mitigation strategies. The collection phase of the intelligence cycle is essential to this. As data is transformed into information and ultimately into intelligence, the sheer volume of output drops significantly, but the value of that output increases substantially, as pictured in the figure below:

[Figure: Cyber Threat Intelligence]

 

However, addressing the collection phase’s inherent issues requires careful planning, investment in technology, and ongoing evaluation of methods and processes to ensure that the intelligence cycle remains effective and responsive to the ever-evolving nature of threats and adversaries.

 

Intelligence Collection Models

There are several models and frameworks that can help with the collection phase and some of the previously mentioned challenges of the intelligence lifecycle. These models and frameworks are designed to streamline the collection process, improve efficiency, and ensure that the intelligence gathered is both relevant and actionable. Below are just some of the key models and frameworks:

  • The Intelligence Cycle Model: A widely recognised framework, which typically includes five phases: Planning and Direction, Collection, Processing and Exploitation, Analysis and Production, and Dissemination. The collection phase is directly informed by the “Planning and Direction” phase, which defines the priorities, objectives, and requirements for intelligence gathering. A clear understanding of priorities helps to focus collection efforts, directing resources toward high-value intelligence.
  • OODA Loop: A decision-making model that can be adapted to intelligence collection. It is a continuous, iterative process used for adapting to dynamic and uncertain environments. The collection phase can benefit from the OODA loop because it emphasises rapid adaptation to new information. The focus is on observation, that is, gathering intelligence and interpreting it within the context of the operational environment, which helps to make sure collection is agile and responsive.
  • Collection Management Framework (CMF): Focuses on the management of intelligence collection. It provides a structured approach for organising and directing collection efforts, ensuring that resources are applied efficiently. CMF emphasises the prioritisation of collection requirements and ensures that the collection process aligns with intelligence needs. It can be used to create a collection plan, assign tasks, and track performance.
  • The 5Ws and 1H Model (Who, What, When, Where, Why, How): This model is often used in journalism but can be applied to intelligence collection. It ensures that the intelligence gathered addresses the key aspects of a situation, event, or target, so that collection is thorough and well-rounded.
  • SWOT Analysis Model: SWOT analysis is traditionally used for strategic planning and business analysis but can also be applied to intelligence collection. The model assesses internal and external factors influencing the intelligence collection process. By applying a SWOT analysis to the collection phase, intelligence teams can assess the strengths and weaknesses of their collection efforts, as well as identify opportunities to improve.

These frameworks and models can help optimise the collection phase of the intelligence cycle by focusing on efficiency, adaptability and comprehensive planning. They provide structured approaches to organising what can at times be an overwhelming amount of resources, aid in the selection of appropriate collection methods, improve the management of intelligence requirements, and reduce the impact of the challenges that can arise. By integrating elements from these models, intelligence teams can improve the effectiveness and timeliness of their collection efforts.

Artificial Intelligence Collaboration 

AI is becoming an integral part of cybersecurity, offering advanced capabilities to protect systems, detect threats, and respond to incidents more effectively. Some of the ways we have seen AI used in cyber defence include machine learning for network anomaly detection, natural language processing for phishing defences, deep learning for advanced malware analysis, and automated security response procedures.

 

Cyber Threat Intelligence is also an area of cybersecurity that has seen huge benefits from AI-powered tooling. AI can help security analysts by sifting through huge amounts of security data to identify Indicators of Compromise (IOCs) or Tactics, Techniques, and Procedures (TTPs) that could suggest a targeted attack. Threat Intelligence Platforms have also advanced through AI, using its ability to aggregate and analyse vast datasets from the multiple sources already discussed.

 

The collection process in particular can benefit from the following machine learning advancements:

Data Ingestion

AI plays a big role in improving and automating data ingestion, especially when dealing with large volumes, unstructured formats or inconsistent data. It can enhance data collection and aggregation by automatically compiling Threat Intelligence from a range of sources into a unified format, and it can also improve the data itself through processes such as automated extraction, data tagging, and the ‘cleaning’ and normalisation of data. All of these functions consume significant resources when conducted manually; when aided by AI, they free analysts to focus on the next stages of the intelligence cycle.
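To make the idea of a "unified format" concrete, here is a minimal sketch of rule-based normalisation, the step that AI-assisted ingestion would automate at larger scale. The two feeds, their field names and the target schema are all hypothetical, and the code assumes every record carries a timestamp.

```python
from datetime import datetime, timezone

# Hypothetical raw records from two feeds that use different field names.
FEED_A = [{"indicator": "198.51.100.7", "kind": "ip",
           "seen": "2025-07-01T10:00:00+00:00"}]
FEED_B = [{"value": "EVIL.example.com ", "ioc_type": "Domain",
           "first_seen": 1751364000}]


def normalise(record: dict, source: str) -> dict:
    """Map a feed-specific record onto one unified (assumed) schema."""
    value = record.get("indicator") or record.get("value") or ""
    ioc_type = record.get("kind") or record.get("ioc_type") or "unknown"
    seen = record.get("seen") or record.get("first_seen")
    if isinstance(seen, (int, float)):
        # Feed B reports epoch seconds.
        seen_dt = datetime.fromtimestamp(seen, tz=timezone.utc)
    else:
        # Feed A reports an ISO 8601 string with an offset.
        seen_dt = datetime.fromisoformat(seen)
    return {
        "value": value.strip().lower(),
        "type": ioc_type.strip().lower(),
        "first_seen": seen_dt.astimezone(timezone.utc).isoformat(),
        "source": source,
    }


unified = ([normalise(r, "feed_a") for r in FEED_A]
           + [normalise(r, "feed_b") for r in FEED_B])
```

After this step both records share the same keys, lower-cased values and UTC timestamps, so later stages do not need to know which feed a record came from.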

Pattern Recognition

AI excels at pattern recognition: identifying hidden relationships, trends, or regularities in data, even when the patterns are complex or subtle. This ability is central to many AI applications and makes it highly valuable across various fields. Its advantage in cybersecurity is obvious, and it is commonly seen in anomaly detection. However, these benefits can also be transferred to data collection. AI’s ability to recognise complex patterns across a variety of data sources (network traffic, logs, endpoints, etc.) makes it invaluable for Cyber Threat Intelligence. AI-powered automated Threat Intelligence feeds, for example, help cybersecurity teams stay ahead of evolving threats by continuously learning from new data and sharing actionable insights in real time.
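As a very small illustration of anomaly detection applied to collected data, the sketch below flags days on which the number of collected mentions jumps well above the recent baseline. It is a plain statistical z-score check rather than a machine learning model, and the function name, window size, threshold and sample counts are assumptions chosen for the example.

```python
from statistics import mean, stdev


def find_spikes(daily_counts: list[int], window: int = 7,
                threshold: float = 3.0) -> list[int]:
    """Return indices of days whose count exceeds the trailing mean
    by more than `threshold` standard deviations."""
    spikes = []
    for i in range(window, len(daily_counts)):
        history = daily_counts[i - window:i]
        mu, sigma = mean(history), stdev(history)
        if sigma == 0:
            # Flat baseline: flag any increase at all.
            if daily_counts[i] > mu:
                spikes.append(i)
        elif (daily_counts[i] - mu) / sigma > threshold:
            spikes.append(i)
    return spikes


# Hypothetical number of mentions of one malware family per day.
mentions = [4, 5, 3, 6, 4, 5, 4, 5, 31, 6]
spike_days = find_spikes(mentions)
```

With this sample data only the ninth day (index 8) is flagged. A learned model would replace the fixed threshold with behaviour inferred from historical data, but the input and output would look much the same.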

Data Cleaning

Data cleaning is the process of detecting, correcting, or removing inconsistent or incomplete data from a dataset as it is being collected. Instead of leaving time-consuming data corrections until after ingestion, AI models help clean or validate the data in real time. This reduces manual effort, improves accuracy, and ensures smoother downstream processing for the later stages of the intelligence life cycle. It can include duplicate detection, multilingual conversion, missing value handling, and the detection of anomalies such as negative values or dates in the future. All of these functions can be used to customise a Threat Intelligence team’s data collection process, resulting in higher-quality data, resource savings, quicker processing times and, overall, more consistent and relevant Threat Intelligence outputs.
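Three of the checks mentioned above (missing values, dates in the future and duplicates) can be sketched with simple rules. This is a hand-written baseline under assumed field names (`type`, `value`, `first_seen`), not an AI model, and it assumes every record has a timezone-aware ISO 8601 `first_seen` timestamp.

```python
from datetime import datetime, timezone


def clean(records: list[dict],
          now: datetime) -> tuple[list[dict], list[tuple[dict, str]]]:
    """Split records into (kept, rejected-with-reason)."""
    kept, rejected, seen = [], [], set()
    for rec in records:
        value = (rec.get("value") or "").strip().lower()
        if not value:
            rejected.append((rec, "missing value"))
            continue
        if datetime.fromisoformat(rec["first_seen"]) > now:
            rejected.append((rec, "date in the future"))
            continue
        key = (rec.get("type"), value)
        if key in seen:
            rejected.append((rec, "duplicate"))
            continue
        seen.add(key)
        kept.append({**rec, "value": value})
    return kept, rejected


# Hypothetical records and a fixed "now" so the example is repeatable.
NOW = datetime(2025, 7, 15, tzinfo=timezone.utc)
raw = [
    {"type": "domain", "value": "evil.example.com",
     "first_seen": "2025-07-01T10:00:00+00:00"},
    {"type": "domain", "value": "EVIL.example.com ",
     "first_seen": "2025-07-02T08:00:00+00:00"},
    {"type": "ip", "value": "",
     "first_seen": "2025-07-03T00:00:00+00:00"},
    {"type": "ip", "value": "198.51.100.7",
     "first_seen": "2031-01-01T00:00:00+00:00"},
]
kept, rejected = clean(raw, NOW)
```

Keeping the rejected records together with a reason, rather than silently dropping them, lets an analyst review what the filter removed.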

These are just a few examples of how AI is revolutionising data collection by offering automated and highly intelligent mechanisms to combat its challenges. By automating the usually manual efforts of intelligence collection, AI enables cybersecurity teams to process vast amounts of data in real time more efficiently and effectively, helping them to stay one step ahead.

 

Operational Insights – From an MSSP Perspective

We have already discussed some of the ways that Managed Security Service Providers (MSSPs), such as Airbus Protect, are increasingly leveraging AI to enhance their security operations, improve threat detection, automate responses, and streamline their services. From a Cyber Threat Intelligence perspective, these enhancements are visible in the daily tasks completed by our team of analysts. Smart Data Extraction, for example, is a huge benefit to our Threat Management Centre and to the collection phase of the intelligence life cycle: it automatically extracts relevant information, such as Indicators of Compromise, from a variety of sources such as PDFs, images and webpages, even if the layout varies between sources. It has enabled us to provide real-time data collection, which results in real-time intelligence. The AI used to filter our data sources is equally crucial to our analysts’ work: automatically filtering out noise or irrelevant content before it is stored for processing gives our team crucial time to prioritise, focus on and investigate the relevant threats for our clients. Having AI built into our custom-built tooling also amplifies our ability as a team to detect anomalies and spikes during collection, making CTI outputs such as trend and pattern analysis far less challenging to produce. Overall, within the collection process of CTI, AI helps us to automate monotonous tasks, analyse large volumes of data, and make quicker, more accurate decisions, allowing Airbus Protect to deliver better security outcomes for its clients.
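For readers who have not seen IOC extraction before, the sketch below shows the simplest possible form of it: regular expressions run over report text. This is not the Smart Data Extraction tooling described above, which copes with PDFs, images and varying layouts. It is a regex baseline with deliberately simple, assumed patterns and a made-up report sentence.

```python
import re

# Deliberately simple patterns; production extraction needs stricter validation.
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
SHA256 = re.compile(r"\b[a-fA-F0-9]{64}\b")
DOMAIN = re.compile(r"\b(?:[a-z0-9-]+\.)+[a-z]{2,}\b", re.IGNORECASE)


def refang(text: str) -> str:
    """Undo common 'defanging' used in reports, e.g. hxxp:// and [.]"""
    return text.replace("[.]", ".").replace("hxxp", "http")


def extract_iocs(text: str) -> dict[str, set[str]]:
    """Return candidate indicators found in free text, grouped by type."""
    text = refang(text)
    ips = {ip for ip in IPV4.findall(text)
           if all(int(octet) <= 255 for octet in ip.split("."))}
    return {
        "ipv4": ips,
        "sha256": {h.lower() for h in SHA256.findall(text)},
        "domain": {d.lower() for d in DOMAIN.findall(text)},
    }


# Hypothetical report sentence using documentation-reserved addresses.
report = ("The loader (SHA-256 "
          "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855) "
          "beacons to hxxp://update.evil-cdn[.]com from 203.0.113[.]45.")
iocs = extract_iocs(report)
```

A pattern-only approach like this will also pick up false positives (for example, file names that look like domains), which is one reason filtering and analyst validation remain part of the process.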

 

Considerations

Using AI in Cyber Threat Intelligence offers massive potential, but it’s not plug-and-play. There are some important considerations to keep in mind to ensure it’s effective, secure, and ethical. Data privacy, for example, is a key consideration: AI models in CTI rely on massive datasets (logs, emails, endpoint behaviour, etc.), which can include sensitive or even personally identifiable information (PII). Models should be compliant with applicable regulations and avoid model leakage, and pre-processing techniques such as anonymisation can also be implemented. It is also evident that the processing power of AI is used not only for defensive measures, but for offensive ones as well. Threat actors can manipulate AI systems using adversarial techniques, potentially poisoning training data and developing complex evasion attacks to bypass AI detections. Over-reliance on such tooling can lead to false information or misinformation being fed into your datasets. This is why it is crucial for AI to be used to empower the Threat Intelligence Cycle, and not replace it. Human analysis remains crucial, not only for the processed data itself but also for the review, validation and improvement of the AI models in place. It is important that teams adopting such models understand what AI can and can’t do for them, and that they do not rely on single sources.
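One possible pre-processing step of this kind is to replace email addresses with keyed tokens before logs reach a model. The sketch below is an assumption-laden example: the function name, token format and email pattern are invented for illustration, and strictly speaking it performs pseudonymisation rather than full anonymisation, because anyone holding the key can recompute the mapping.

```python
import hashlib
import hmac
import re

# Simplified email pattern, good enough for an illustration.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def pseudonymise(text: str, key: bytes) -> str:
    """Replace each email address with a keyed, repeatable token."""
    def token(match: re.Match) -> str:
        address = match.group(0).lower().encode()
        digest = hmac.new(key, address, hashlib.sha256).hexdigest()
        return f"<user:{digest[:12]}>"
    return EMAIL.sub(token, text)


# Hypothetical log line; in practice the key would come from a secrets store.
line = "Login by Alice@example.com, then alice@example.com again"
masked = pseudonymise(line, b"demo-key-do-not-reuse")
```

Because the same address always maps to the same token, patterns such as repeated logins by one user remain visible to the model while the address itself does not.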

 

What is the Future of Artificial Intelligence in CTI?

The future of AI and data collection is incredibly dynamic and holds vast potential across numerous industries. As AI technology continues to evolve, we can expect major advancements in how data is collected, analysed and utilised. Cyber Threat Intelligence will be no exception. With AI-driven Threat Prediction and Advanced Threat Detection models, AI will be able to detect even the most subtle indicators of compromise and the tactics, techniques, and procedures used by advanced persistent threats, helping organisations stay ahead of cybercriminals by taking preventive actions based on AI-driven forecasts. Further stages of the intelligence lifecycle will also see advancements in areas such as AI-powered Threat Intelligence sharing and collaboration. AI could help automate and enhance collaboration among different organisations, industries, and Threat Intelligence sharing platforms. AI systems will be able to automatically share relevant and actionable Threat Intelligence across platforms, whilst maintaining the required level of security, allowing for faster and more coordinated global responses to cyber threats. At the same time, however, threat actors will continue to leverage AI to accelerate malicious activities such as vulnerability discovery, personalised phishing attacks, and sophisticated evasion techniques for malware, creating an escalating technological arms race between attackers and defenders.

 


