
ADHD: Adaptive Design Honeypot Deployments

Kate Highnam

Imperial College London

London, UK

Zach Hanif

Independent Researcher

Washington, D.C., USA

Ellie Van Vogt

Imperial College London

London, UK

Sonali Parbhoo

Imperial College London

London, UK

Sergio Maffeis

Imperial College London

London, UK

Nicholas R. Jennings

Loughborough University

Loughborough, UK

ABSTRACT

Traditional honeypot deployments expose large numbers of vulnerable systems for extended periods of time to gather empirical information on intrusion techniques. This deployment strategy can be too slow to respond to emerging threats and gives attackers the opportunity to develop detection techniques against the honeypots employed. In this poster, we present a novel approach to honeypot deployments that optimises the allocation of resources (based on events seen) for the purpose of proving or disproving an initial hypothesis about the environment. Our adaptive design (AD) honeypot deployment is inspired by the clinical trial community: it is a variant of a randomised control trial (RCT), which measures how a particular “treatment” affects a population. While more restrictive than the breadth of questions a traditional deployment could answer, our directed approach quickly answers specific questions such as “Does inserting this API vulnerability increase the chances of seeing exploits on the SQL server?” and “Are attackers constantly exploiting misconfigured cloud instances across cloud regions in the U.S.?”. We run a study to answer the latter question and compare the RCT, AD, and traditional deployment methods. By conducting studies with a control, we uncover (with high confidence) the cause-and-effect relationship a vulnerability has on the likelihood of system exploitation.

CCS CONCEPTS

• Security and privacy → Vulnerability management; Network security.

KEYWORDS

datasets, honeypots, randomised control trials, security, intrusion research

ACM Reference Format:

Kate Highnam, Zach Hanif, Ellie Van Vogt, Sonali Parbhoo, Sergio Maffeis, and Nicholas R. Jennings. 2023. ADHD: Adaptive Design Honeypot Deployments. In Proceedings of The 28th European Symposium on Research in Computer Security (ESORICS ’23). ACM, New York, NY, USA, 3 pages. https://doi.org/XXXXXXX.XXXXXXX


1 MOTIVATION AND METHODOLOGY

Traditional honeypot deployments - or “vanilla” deployments as we will call them in this paper - expose a large number of identical vulnerable systems for a particular (long) length of time [1, 4, 5, 7]. While sufficiently large and long-lived vanilla deployments all but guarantee observations and can summarise the general state of automated threats, they carry several risks and costs that may sometimes be unacceptable. For example, leaving a meaningful quantity of identical honeypots online gives adversaries the opportunity to identify the presence of the monitoring tools employed within the honeypots. This can hinder observations and render the tools useless, e.g., when adversaries stop acting after detecting active monitoring or debugging tools. Additionally, large-scale deployments cost time and money, absorbing budget, and can hinder or preclude timely observations.

As an alternative to the vanilla deployment, we propose to save resources and limit exposure by directing honeypot studies to answer pre-specified, critical security questions about the current state of the environment. Our new deployment strategy answers “What if...?” and “Why...?” questions using a control group: a set of deployed honeypots that remain unchanged while others are altered. This allows us to measure the effect of our change on the environment. These techniques are adapted from those used to design clinical trials; Table 1 maps some of the healthcare terminology to the security terminology used to define our work.

In healthcare, a typical control group study follows the design of a randomised control trial (RCT), the gold standard for clinical trial methods. RCTs are used to minimise the impact of researcher biases while evaluating causal relationships [3]. Our method is based on adaptive design (AD), a variant of the RCT that incorporates pre-planned opportunities to modify aspects of an ongoing trial in response to data accumulated during the study, without invalidating its integrity [6, 8, 10]. Unlike healthcare ADs, we use the Kaplan-Meier estimator to calculate the likelihood of an event (i.e., a honeypot being exploited), since for intrusion research purposes we want to encourage infection rather than prevent it. As shown in Figure 1, RCT and AD both separate the trial into multiple stages with interim analyses, which accounts for known conditions and for unforeseen events (e.g., a pandemic or war) that might require the trial to end early. We show these methods can significantly reduce the cost of a study while answering security questions through preset objectives.
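
To make the event-likelihood calculation concrete, the following is a minimal sketch of a Kaplan-Meier estimate over honeypot “time to first exploitation”, assuming per-honeypot observation durations and exploitation flags; the data and function names are illustrative, not the deployment’s actual tooling:

    # Kaplan-Meier estimate of the probability that a honeypot remains
    # unexploited beyond time t (illustrative sketch, hypothetical data).
    from collections import Counter

    def kaplan_meier(durations, exploited):
        """durations: hours until exploitation or end of observation (censoring).
        exploited: True if the honeypot was exploited, False if censored."""
        events = Counter(t for t, e in zip(durations, exploited) if e)
        at_risk = len(durations)
        survival, curve = 1.0, []
        for t in sorted(set(durations)):
            if events[t]:
                survival *= 1.0 - events[t] / at_risk
            curve.append((t, survival))
            # every honeypot whose observation ends at t leaves the risk set
            at_risk -= durations.count(t)
        return curve

    # four honeypots: exploited after 2h, 3h, 3h; one still clean when the 12h trial ends
    print(kaplan_meier([2, 3, 3, 12], [True, True, True, False]))

Honeypots that end a stage unexploited are treated as censored observations, exactly as surviving participants are in a clinical trial.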

In this paper, we present the first control-based deployment strategy for honeypots that optimises resource allocation and limits honeypot exposure. The strategy is then employed in an exemplary study to determine the impact of an SSH vulnerability on cloud servers across the United States, comparing our AD to vanilla and RCT deployments.


Table 1: Mapping of healthcare terminology to security terminology for control trials deploying honeypots.

Healthcare                      Security
“trial”                         “a study comparing honeypots with and without a vulnerability”
“study population”              “a collection of individual honeypots”
“patient”, “participant”        “a honeypot”
“recruiting more subjects”      “starting more honeypots with specific characteristics”
“infection”                     “exploitation”
“disease”                       “attacker technique for exploit”
“intervention”, “treatment”     “corruption” or “the presence or insertion of a vulnerability”
“treated”                       “corrupted”

We find that AD reduces costs and can answer a directed question in a shorter amount of time while limiting the likelihood of error. The study was run using automated scripts, presenting the first automated control study.

2 EXPERIMENTS AND RESULTS

We compare the vanilla, RCT, and AD designs in separate honeypot deployments (i.e., trials) for the same study, following Figure 1. Our study population consists of inactive cloud servers with no applications or connections besides our monitoring, running in an isolated Docker container. Our corruption for the study is an SSH vulnerability: the honeypots are altered to accept any password for four fake IT user accounts. Because we never log in, any login observed is considered malicious. Our control group only accepts “password” as the password for the same user accounts, which we know are also scanned for by attackers [4]. Our hypothesis for the study is that the chosen corruption significantly increases the likelihood of exploitation in the U.S. during local working hours. This closely follows the work of Highnam et al. [2], which provides the dataset used as our pilot study and the honeypot setup.
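
As a sketch of how exploitation events could be extracted under this setup, the snippet below flags any accepted SSH login as malicious, assuming standard OpenSSH authentication-log lines; the account names are placeholders, since the actual fake IT accounts are not listed here:

    # Any accepted login counts as exploitation, since no legitimate user ever logs in.
    import re

    FAKE_ACCOUNTS = {"itadmin", "itsupport", "backup", "deploy"}  # hypothetical names
    ACCEPTED = re.compile(r"Accepted password for (\S+) from (\S+) port (\d+)")

    def exploitation_events(log_lines):
        for line in log_lines:
            m = ACCEPTED.search(line)
            if m and m.group(1) in FAKE_ACCOUNTS:
                yield {"user": m.group(1), "src_ip": m.group(2), "port": int(m.group(3))}

    sample = ["Jul 21 03:14:15 hp1 sshd[812]: Accepted password for itadmin "
              "from 203.0.113.7 port 53411 ssh2"]
    print(list(exploitation_events(sample)))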

Each trial took approximately 12 hours; in the multi-stage trials (RCT and AD), the duration is the same but divided into three four-hour stages. After each stage, AD uses infection rates to determine allocation proportions by region. RCT maintains the same allocation proportions, redeploying the same number of control and corrupted honeypots unless it meets a predefined early stopping criterion. The honeypots are deployed in one of four cloud provider regions geographically located within the U.S. Each region has distinct IP ranges, so IP scanning or region-specific attacks should be observable in the logs.
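
The exact reallocation rule is not reproduced here, but a minimal sketch of the kind of per-region update AD performs between stages might look as follows, weighting the next stage’s deployments towards regions with higher observed infection rates; the region names and the weighting rule are illustrative assumptions:

    # Re-weight next-stage honeypot allocations by observed per-region infection rates
    # (illustrative rule; not the study's exact allocation procedure).
    def next_stage_allocation(deployed, infected, total_next, floor=2):
        """deployed/infected: region -> honeypot counts from the previous stage."""
        rates = {region: infected[region] / deployed[region] for region in deployed}
        total_rate = sum(rates.values()) or 1.0
        return {region: max(floor, round(total_next * rate / total_rate))
                for region, rate in rates.items()}

    deployed = {"us-east-1": 12, "us-east-2": 12, "us-west-1": 12, "us-west-2": 12}
    infected = {"us-east-1": 9, "us-east-2": 5, "us-west-1": 7, "us-west-2": 3}
    print(next_stage_allocation(deployed, infected, total_next=40))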

Over the 12-hour vanilla trial, which deployed 140 corrupted honeypots, there was no obvious pattern in the order of the IP addresses hit and only 3 honeypots remained unexploited by the end. Following a power analysis [9] to limit error, the RCT deployed 48 honeypots at each stage, split equally between regions and corruption status (control or corrupted).
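
The power analysis in [9] is simulator-based; as a simpler, hedged illustration of the same idea, a frequentist two-proportion sample-size calculation could look like the following, where the exploitation rates are assumed values rather than the study’s actual parameters:

    # Honeypots needed per arm to detect a difference in exploitation probability
    # between control and corrupted groups (illustrative rates, not the study's).
    from statsmodels.stats.power import NormalIndPower
    from statsmodels.stats.proportion import proportion_effectsize

    p_control, p_corrupted = 0.40, 0.70          # assumed per-stage exploitation rates
    effect = proportion_effectsize(p_corrupted, p_control)
    n_per_arm = NormalIndPower().solve_power(effect_size=effect, alpha=0.05, power=0.8)
    print(round(n_per_arm))                       # honeypots per arm, per stage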

[Figure 1: Flow diagram of the three trial designs (Vanilla, Randomised Control, Adaptive Design). The “Time to Stop?” step is where a trial could end early.]

The AD trial started the same as the RCT, but in the later stages it automatically updated the allocation proportions to request fewer honeypots overall (AD: 119, RCT: 144). Overall, the AD trial observed more exploitations than the RCT (AD: 50, RCT: 42).

This study demonstrates that, while observational studies (i.e., the vanilla trial) record more intrusions, the presence of a control group (as in RCT and AD) enables us to identify the corruption effect. Our AD shows it is capable of confirming the corruption effect more cheaply and quickly than RCT. Had the difference due to the corruption been less apparent a priori (e.g., when altering multiple points of entry or limiting sequences of vulnerability exploits), deploying control honeypots amongst the corrupted ones provides counterfactual information for confidence in the corruption’s (causal) effect.
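
As an illustration of how such counterfactual evidence can be summarised, one simple option (not necessarily the analysis used in this study) is to test the corrupted-versus-control exploitation counts with Fisher’s exact test; the counts below are hypothetical:

    # 2x2 table of exploited vs. unexploited honeypots, by corruption status
    # (hypothetical counts used purely for illustration).
    from scipy.stats import fisher_exact

    table = [[45, 15],   # corrupted: exploited, unexploited
             [20, 40]]   # control:   exploited, unexploited
    odds_ratio, p_value = fisher_exact(table, alternative="two-sided")
    print(f"odds ratio = {odds_ratio:.2f}, p = {p_value:.4f}")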


REFERENCES

[1] Samuel Kelly Brew and Emmanuel Ahene. 2022. Threat Landscape Across Multiple Cloud Service Providers Using Honeypots as an Attack Source. In Frontiers in Cyber Security: 5th International Conference, FCS 2022, Kumasi, Ghana, December 13–15, 2022, Proceedings. Springer, 163–179.

[2] Kate Highnam, Kai Arulkumaran, Zachary Hanif, and Nicholas R. Jennings. 2021.

BETH Dataset: Real Cybersecurity Data for Unsupervised Anomaly Detection

Research. The Conference on Applied Machine Learning in Information Security

(CAMLIS) (2021).

[3] Sherilyn Houle. 2015. An introduction to the fundamentals of randomized controlled trials in pharmacy research. The Canadian Journal of Hospital Pharmacy 68, 1 (2015), 28.

[4] Christopher Kelly, Nikolaos Pitropakis, Alexios Mylonas, Sean McKeown, and

William J Buchanan. 2021. A comparative analysis of honeypots on different

cloud platforms. Sensors 21, 7 (2021), 2433.

[5] Stefan Machmeier. 2023. Honeypot Implementation in a Cloud Environment.

arXiv preprint arXiv:2301.00710 (2023).

[6] Philip Pallmann, Alun W Bedding, Babak Choodari-Oskooei, Munyaradzi Dimairo, Laura Flight, Lisa V Hampson, Jane Holmes, Adrian P Mander, Lang’o Odondi, Matthew R Sydes, et al. 2018. Adaptive designs in clinical trials: why use them, and how to run and report them. BMC Medicine 16, 1 (2018), 1–15.

[7] Niels Provos and Thorsten Holz. 2007. Virtual honeypots: from botnet tracking to

intrusion detection. Pearson Education.

[8] Nigel Stallard, Lisa Hampson, Norbert Benda, Werner Brannath, Thomas Burnett,

Tim Friede, Peter K Kimani, Franz Koenig, Johannes Krisam, Pavel Mozgunov,

et al. 2020. Efficient adaptive designs for clinical trials of interventions for

COVID-19. Statistics in Biopharmaceutical Research 12, 4 (2020), 483–497.

[9] Kristian Thorlund, Shirin Golchi, Jonas Haggstrom, and Edward Mills. 2019. Highly Efficient Clinical Trials Simulator (HECT): Software application for planning and simulating platform adaptive trials. Gates Open Research 3 (2019).

[10] CH van Werkhoven, S Harbarth, and MJM Bonten. 2019. Adaptive designs in

clinical trials in critically ill patients: principles, advantages and pitfalls. Intensive

Care Medicine 45, 5 (2019), 678–682.

Received 21 July 2023