How ML can improve alarms classification to detect market abuse

May 10, 2024

by Davide Bonamico and Serena Manti

A crucial balance in financial regulation

In the financial sector, market abuse (a collection of practices that undermine or illegally exploit market operations) threatens market integrity and public trust. Under the Market Abuse Regulation (MAR), adopted by the European Parliament and the Council of the European Union in 2014 and in effect since 3 July 2016, institutions must monitor transactions and develop specific algorithms that identify anomalous patterns in data flows. The MAR, which applies across all EU member states, defines what an anomalous pattern is and the rules to apply for different abuse categories:

Insider Dealing
Market Manipulation
High-Frequency Trading (HFT)
Cross Product Manipulation
Inter-Trading Venues Manipulation, and
Bid-Ask Spread

This is where market abuse detection (MAD) systems come into play, offering an advanced solution in this space.

A high number of alarms to review

In MAD, the algorithms’ parameters defining an unusual trading activity are set within certain ranges, and surpassing these limits prompts the system to raise an alarm. This alarm is then assessed by a person in charge of reporting verified market abuse instances to regulators. Firms must calibrate the parameter thresholds to detect real abuses while minimizing false alarms. Typically, the approach is conservative, leading to the generation of an excessive number of alerts that, upon review, are found to be harmless. This burdens compliance officers with a largely unnecessary workload and undermines confidence in monitoring tools.
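
The range-based triggering described above can be sketched in a few lines of Python. This is only an illustration: the `ThresholdRule` structure, the parameter names, and the threshold values are hypothetical and do not come from any real MAD calibration.

```python
from dataclasses import dataclass


@dataclass
class ThresholdRule:
    """One monitored parameter with its allowed range (hypothetical structure)."""
    name: str
    lower: float
    upper: float

    def breached(self, value: float) -> bool:
        # An alarm is warranted when the observed value leaves the allowed range.
        return value < self.lower or value > self.upper


def raise_alarms(observations: dict[str, float], rules: list[ThresholdRule]) -> list[str]:
    """Return the names of all parameters whose observed value breaches its rule."""
    return [
        rule.name
        for rule in rules
        if rule.name in observations and rule.breached(observations[rule.name])
    ]


# Illustrative thresholds only; real calibrations are firm-specific.
rules = [
    ThresholdRule("intraday_price_delta_pct", lower=0.0, upper=20.0),
    ThresholdRule("pct_of_market_daily_volume", lower=0.0, upper=50.0),
]
observed = {"intraday_price_delta_pct": 28.63, "pct_of_market_daily_volume": 12.0}
alarms = raise_alarms(observed, rules)
print(alarms)
```

Tightening `upper` catches more genuine abuse but also raises more harmless alarms, which is the calibration trade-off discussed in this section.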

A machine learning approach

At LISTGroup, we prioritize cutting-edge technologies, including artificial intelligence (AI) solutions. To address the challenge above, we designed a machine learning (ML) classifier that is trained on the past decisions of compliance officers and guides their future decision-making. To do this, we analyzed a dataset of about 360,000 alarms generated by a European tier-one bank over one year (Table 1).

Pattern (share of all alarms)	Closed	Signaled	Total	Closed %
Large daily gross 19.3%	69855	177	70032	99.75%
Insider trading MAR 18.9%	68422	277	68719	99.96%
Pair crossed trades 12.7%	46028	4	46032	99.99%
Order frequency 5.9%	21568	22	21590	99.90%
Painting the tape 6.0%	21379	440	21819	97.98%
OTC off price 5.0%	18155	2	18157	99.99%
Daily gross leading price move 4.7%	16945	152	17097	99.11%
Daily net leading price move 3.6%	13006	88	13094	99.33%
Smoking 2.9%	10724	0	10724	100.00%
Unbalanced bid-ask spread 2.5%	9134	0	9134	100.00%
Crossed trades 2.3%	8202	323	8525	96.21%
Front running and tailgating 2.2%	7995	0	7995	100.00%
Decreasing the offer 1.7%	6330	2	6332	99.97%
Advancing the bid 1.7%	6017	17	6034	99.72%
Spoofing 1.4%	4803	193	5196	92.44%
Abusive squeeze 1.3%	4538	70	4608	98.48%
Momentum ignition 1.3%	4517	121	4638	97.39%
Marking the close 1.2%	4301	85	4386	98.06%
Quote stuffing 0.9%	3337	1	3338	99.97%
Creation of a ceiling in the price pattern 0.9%	3195	6	3201	99.81%
Creation of a floor in the price pattern 0.8%	2738	5	2743	99.82%
Attempted position reversal 0.8%	2564	315	2879	89.06%
Deleted orders 0.5%	1721	24	1745	98.62%
Position reversal 0.4%	1271	21	1292	98.37%
Front running 0.3%	1042	0	1042	100.00%
Wash trades on own account 0.3%	976	3	979	99.69%
Improper matched orders 0.2%	863	39	902	95.68%
Placing orders with no intention of executing …	422	0	422	100.00%
Pump and dump 0.1%	271	14	285	95.09%
Front running and tailgating OTC 0.1%	235	0	235	100.00%
Ping orders 0.1%	232	1	233	99.57%
Inter trading venues manipulation 0.0%	133	0	133	100.00%
Trash and cash 0.0%	70	4	74	94.59%
Underlying leaded cross product manipulation 0.0%	1	0	1	100.00%

The dataset is labeled with two categories: Closed and Signaled (that is, alarms sent to the regulators). Roughly 360,000 alarms were raised, but the vast majority were closed upon review because they were not related to actual abuse.
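
To make the imbalance concrete, the short Python sketch below recomputes the Closed % column for three rows of Table 1. The counts are copied from the table; the helper name `closed_share` is ours.

```python
# (closed, signaled) counts copied from three rows of Table 1.
counts = {
    "Large daily gross": (69855, 177),
    "Painting the tape": (21379, 440),
    "Crossed trades": (8202, 323),
}


def closed_share(closed: int, signaled: int) -> float:
    """Percentage of alarms that were closed after review."""
    return 100.0 * closed / (closed + signaled)


for pattern, (closed, signaled) in counts.items():
    print(f"{pattern}: {closed_share(closed, signaled):.2f}% closed")
```

Even for these patterns, fewer than 4 alarms in 100 end up being signaled.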

Each alarm is represented by a set of relevant features, including both numerical and categorical variables. In Table 2, we show a subset of the features available for the Large Daily Gross (LDG) pattern, which refers to a significant or unusually high level of trading activity or gross trading volume observed within a single trading day in financial markets.

Field	Values
TRADING_TYPE	Professional
CLIENT_TYPE	Professional
ENTITY_CATEGORY1	None
ENTITY_CATEGORY2	S02
INVOLVED_INSTRUMENT_ASSET_CLASS	Equity
REFERENCE_MIC	MTAA
NUMBER_OF_INVOLVED_ORDER	34
NUMBER_OF_INVOLVED_TRADES	76
NUMBER_OF_ALERTS_SAME_CLIENT	448
NUMBER_OF_ALERTS_SAME_CLIENT_INSTRUMENT	6
NUMBER_OF_ALERTS_SAME_CLIENT_INSTRUMENT_TYPE	1
Absolute % intraday price delta_ACTUALVALUE	28.63
High day price_ACTUALVALUE	0.656
Low day price_ACTUALVALUE	0.51
Market daily volume_ACTUALVALUE	541637
Min % daily volume_ACTUALVALUE	255.11
Min % exec daily volume_ACTUALVALUE	55.53
Min daily volume_ACTUALVALUE	None
Minimum value of orders/trades_ACTUALVALUE	None
Minimum value of single order/trade_ACTUALVALUE	None
Value of orders_ACTUALVALUE	834162.69
Value of trades_ACTUALVALUE	173260.75
Volume of orders_ACTUALVALUE	1381786
Volume of trades_ACTUALVALUE	300755
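
Because the features mix categorical and numerical fields, they need to be turned into numbers before a classifier can use them. The sketch below shows one common option, one-hot encoding, on a few fields from Table 2. It is an assumption on our part, not necessarily the preprocessing used in the study, and the category vocabularies in `CATEGORIES` are hypothetical.

```python
import math

# A few fields from Table 2; None marks a value that was not populated.
alarm = {
    "CLIENT_TYPE": "Professional",
    "INVOLVED_INSTRUMENT_ASSET_CLASS": "Equity",
    "NUMBER_OF_INVOLVED_TRADES": 76,
    "Value of trades_ACTUALVALUE": 173260.75,
    "Min daily volume_ACTUALVALUE": None,
}

# Hypothetical vocabularies: the real category sets are not given in the article.
CATEGORIES = {
    "CLIENT_TYPE": ["Professional", "Retail"],
    "INVOLVED_INSTRUMENT_ASSET_CLASS": ["Equity", "Bond", "Derivative"],
}


def encode(record: dict) -> list[float]:
    """One-hot encode categorical fields and pass numerical fields through."""
    vector: list[float] = []
    for field, value in record.items():
        if field in CATEGORIES:
            vector.extend(1.0 if value == c else 0.0 for c in CATEGORIES[field])
        else:
            # Missing numerical values become NaN for a later imputation step.
            vector.append(math.nan if value is None else float(value))
    return vector


features = encode(alarm)
print(features)
```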

Crafting the most suitable classifier

The primary challenge in training our model is the significant imbalance in our dataset, with Closed alarms far outnumbering Signaled ones. The two possible classification errors also carry asymmetric costs. Mistakenly closing an alarm that should have been signaled is a false negative, measured by the False Negative Rate (FNR), and could have serious regulatory repercussions. Conversely, incorrectly signaling an alarm that should have been closed is a false positive, measured by the False Positive Rate (FPR); it is less impactful, although still undesirable. Balancing these two quantities is our primary objective in the model fine-tuning process. We therefore chose a Random Forest Algorithm (RFA) for its proficiency in managing imbalanced datasets and its robustness against overfitting.
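
As a reminder of how the two error rates are computed, here is a minimal Python sketch that treats Signaled as the positive class. The counts are made up for illustration and are not results from our experiments.

```python
def error_rates(tp: int, fn: int, fp: int, tn: int) -> tuple[float, float]:
    """Return (FNR, FPR), treating Signaled as the positive class.

    tp: signaled alarms predicted signaled   fn: signaled alarms predicted closed
    fp: closed alarms predicted signaled     tn: closed alarms predicted closed
    """
    fnr = fn / (fn + tp) if (fn + tp) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return fnr, fpr


# Made-up counts for illustration, not results from the article.
fnr, fpr = error_rates(tp=18, fn=2, fp=80, tn=900)
print(f"FNR = {fnr:.1%}, FPR = {fpr:.1%}")
```

With so few Signaled alarms, a single extra false negative moves the FNR far more than a single extra false positive moves the FPR.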

Decoding ML classification: a look at metrics

The effectiveness of our RFA is measured by two critical customized metrics: RISK and WORK.

RISK quantifies the expected number of alarms, per 1,000 raised, that are mistakenly closed although they should be signaled. It is expressed as:

RISK = 1000 × P(RFA=clo) × P(sig|RFA=clo),

where P(RFA=clo) is the probability that RFA predicts ‘closed’, and P(sig|RFA=clo) is the conditional probability of an alert being signaled, given the RFA predicting ‘closed’.

WORK, on the other hand, counts the residual alarms, again per 1,000 raised, that still require manual checks by the compliance officer, and is defined as:

WORK = 1000 × P(RFA=sig).

Here, P(RFA=sig) represents the probability of RFA classifying an alarm as ‘signaled’.
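
Both metrics can be read directly off a confusion matrix. The Python sketch below follows the two definitions term by term; the function name and the example counts are ours and purely illustrative. Since P(RFA=clo) × P(sig|RFA=clo) is the joint probability of an alarm being predicted ‘closed’ while actually being signaled, RISK reduces to 1000 times the fraction of false negatives.

```python
def risk_and_work(tp: int, fn: int, fp: int, tn: int) -> tuple[float, float]:
    """Compute RISK and WORK per 1,000 alarms, with Signaled as the positive class."""
    n = tp + fn + fp + tn
    predicted_closed = fn + tn
    p_rfa_clo = predicted_closed / n                      # P(RFA=clo)
    p_sig_given_clo = fn / predicted_closed if predicted_closed else 0.0  # P(sig|RFA=clo)
    p_rfa_sig = (tp + fp) / n                             # P(RFA=sig)
    risk = 1000 * p_rfa_clo * p_sig_given_clo
    work = 1000 * p_rfa_sig
    return risk, work


# Made-up confusion counts for 1,000 alarms, not results from the article.
risk, work = risk_and_work(tp=18, fn=2, fp=80, tn=900)
print(f"RISK = {risk:.1f}, WORK = {work:.1f}")
```

In this toy case, 2 of 1,000 alarms would be wrongly closed, and 98 would remain for manual review.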

Results

Here, we present the results yielded by the classifier and their meaning for day-to-day MAD activities.

Each experiment consists of randomly splitting the data into training and test sets, training the RFA, and then measuring the performance using a confusion matrix (that is, a table with true classes on the rows and predicted classes on the columns) to calculate our key metrics, RISK and WORK. The results, for a total of 100 experiments, are shown in the above box plots (RISK on the left and WORK on the right), where each green triangle marks the average of all the experiments for each pattern.
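
The experimental loop can be sketched with scikit-learn as follows. This is a minimal illustration under several assumptions of ours: the data is synthetic (generated with `make_classification`), the `class_weight="balanced"` setting and the other hyperparameters are our choices rather than the configuration used in the study, and only 5 repetitions are run to keep the example quick.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the alarm data: class 0 = Closed, class 1 = Signaled.
X, y = make_classification(
    n_samples=2000, n_features=10, weights=[0.97, 0.03], random_state=0
)


def run_experiment(seed: int) -> tuple[float, float]:
    """One random split, one trained forest, RISK and WORK on the test set."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=seed
    )
    # class_weight="balanced" is our assumption for handling the imbalance.
    model = RandomForestClassifier(
        n_estimators=50, class_weight="balanced", random_state=seed
    )
    model.fit(X_train, y_train)
    # Rows are true classes, columns are predicted classes.
    cm = confusion_matrix(y_test, model.predict(X_test), labels=[0, 1])
    tn, fp, fn, tp = cm.ravel()
    n = cm.sum()
    risk = 1000 * fn / n         # equals 1000 * P(RFA=clo) * P(sig|RFA=clo)
    work = 1000 * (tp + fp) / n  # equals 1000 * P(RFA=sig)
    return float(risk), float(work)


# The article uses 100 repetitions; 5 keep this sketch quick.
results = np.array([run_experiment(seed) for seed in range(5)])
print("mean RISK:", results[:, 0].mean(), "mean WORK:", results[:, 1].mean())
```

Collecting one (RISK, WORK) pair per repetition gives the distribution that a box plot summarizes.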

The findings are promising: employing ML allows us to reduce the number of alarms to review from about 30,000 per month (the roughly 360,000 yearly alarms spread over twelve months) to roughly 2,593, with an average RISK of missing about 10.5 market abuse cases in a month. While this may seem significant, a closer inspection reveals that such risk is mostly associated with complex patterns like “Insider Trading” or “Painting the Tape”, which collectively account for half of the total risk.

Efforts are ongoing to enrich the dataset with relevant features we may have missed, in order to further refine our model. It is worth noting that the strength of this supervised learning approach in MAD is intrinsically linked to the availability of historical data. A robust history of classified data is critical for the algorithm to effectively identify, and adapt to, various abuse patterns.

This solution equips compliance officers with a tool to prioritize their attention on specific alarms. Integrated into a streamlined system that manages the generated alarms, it allows for a significant working time reduction. Below we show an example of our implementation in the LookOut product.

On the left part of the screen, you can see the status of the alert and the relevant metrics returned by the ML classifier for each alarm, such as the suggested classification, a risk measure, and the model confidence level. Selecting a specific alarm then opens a section on the right side containing all the relevant information about the generated alert, which can be used to carry out a deeper investigation.

In conclusion, we have shown the promising role of ML in addressing the increasingly resource-demanding task of market abuse detection, significantly lightening the compliance officer’s load.

A common concern with deploying ML in real-world scenarios is the perceived opacity of the models during their training phase, which can lead to skepticism about their decisions. Stay tuned for our upcoming posts, where we’ll tackle the topic of ML transparency, exploring methods to enhance the classifiers’ interpretability and demystifying the processes behind these powerful tools.

The content is adapted from this series on the LIST team’s blog.

Davide Bonamico is a Senior Quantitative Analyst at LIST with seven years of experience in financial engineering and artificial intelligence. He employs a data-driven approach, focusing on implementing innovative solutions to enhance the company's products.

Serena Manti is a Senior Quantitative Analyst at LIST with an eight-year background in AI and financial engineering. She is an expert in writing both technical and informative content to share the team’s research.