Project 3 – Fraud Detection

This report considers how Machine Learning can provide a solution to the fraud risks that COMPANY faces as a share registrar and pension service provider.

The first part of the report outlines AWS SageMaker as a potential cloud architecture for creating a fraud detection model. The second part showcases a fraud detection artifact, a deep learning artificial neural network built using a public, simulated dataset of credit card transactions. This artifact establishes the principles and approach needed to apply a similar model to COMPANY-specific problems and datasets.

Data privacy remains one of the key considerations, and a plan for safe implementation within COMPANY is presented. PROD data would be used only within COMPANY’s own infrastructure and systems to first generate an anonymized, or even fully synthetic/simulated, dataset. Once such a dataset has been produced, it is uploaded to a cloud architecture of choice, where a model is trained, tested, validated, and adjusted until it produces satisfactory results. The model is then moved on-premises, where it can be applied to real-world scenarios using PROD data.
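A first step towards such a dataset could be pseudonymizing direct identifiers before any data leaves COMPANY's systems. The sketch below is illustrative only and assumes hypothetical field names (`account_id`, `holder_name`, `national_insurance_no`) and a hypothetical on-premises secret key. It replaces each identifier with a keyed HMAC-SHA256 token, so the same account always maps to the same token and per-account patterns survive while the raw identifier does not.

```python
import hashlib
import hmac

# Hypothetical secret key: in practice it would be generated and held only
# inside COMPANY's infrastructure and never uploaded with the dataset.
SECRET_KEY = b"replace-with-a-key-held-on-premises"

# Hypothetical names of columns that directly identify a person or account.
IDENTIFYING_FIELDS = ("account_id", "holder_name", "national_insurance_no")


def pseudonymize_value(value: str, key: bytes = SECRET_KEY) -> str:
    """Map an identifier to a stable keyed token (truncated HMAC-SHA256)."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]


def pseudonymize_record(record: dict) -> dict:
    """Return a copy of a record with its identifying fields tokenized."""
    cleaned = dict(record)  # leave the original PROD record untouched
    for field in IDENTIFYING_FIELDS:
        if cleaned.get(field) is not None:
            cleaned[field] = pseudonymize_value(str(cleaned[field]))
    return cleaned


if __name__ == "__main__":
    tx = {"account_id": "ACC-001", "holder_name": "Jane Doe", "amount": 250.0}
    print(pseudonymize_record(tx))
```

Pseudonymized records can still be re-identified through the remaining fields, so this would only precede, not replace, the anonymization or synthetic data generation described above.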

Implementation of such a fraud detection system would need to be divided into several stages, based on typical business use cases and the complexity of the ML solution required.

The first stage, showcased in the artifact using the public dataset, would focus on individual transactions as the unit of analysis, with the neural network examining the features of each transaction for patterns indicating fraud. Future stages or updates could introduce an Ensemble model (a model made up of many specialist models) that takes a holistic view of an account or the entire dataset to spot anomalies and/or patterns in the data, which could help identify fraud types or fraudster gangs.
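Treating the transaction as the unit of analysis means each record must become a fixed-length numeric vector before a neural network can use it. The sketch below shows one possible encoding and is not taken from the artifact: the fields `timestamp`, `amount` and `category`, and the category list, are hypothetical assumptions.

```python
import math
from datetime import datetime

# Hypothetical merchant categories; a real list would come from the data.
CATEGORIES = ("grocery", "online", "travel", "other")


def transaction_features(tx: dict) -> list:
    """Turn one transaction record into a numeric feature vector."""
    ts = datetime.fromisoformat(tx["timestamp"])
    hour = ts.hour
    features = [
        math.log1p(tx["amount"]),           # compress very large amounts
        math.sin(2 * math.pi * hour / 24),  # cyclical hour-of-day encoding,
        math.cos(2 * math.pi * hour / 24),  # so 23:00 sits next to 00:00
        1.0 if ts.weekday() >= 5 else 0.0,  # weekend flag (Sat=5, Sun=6)
    ]
    category = tx.get("category", "other")
    if category not in CATEGORIES:
        category = "other"
    # One-hot encode the merchant category.
    features.extend(1.0 if category == c else 0.0 for c in CATEGORIES)
    return features
```

Each vector has the same length regardless of the input, which is what a dense network's input layer expects.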

The artifact is developed in Google Colab using TensorFlow, with a baseline model contrasted against progressively more advanced experiments, allowing an objective comparison of the model’s improving ability to predict and generalize. Focus is placed on data pre-processing, which is shown to have a strong influence on the model’s ability to learn patterns. The results achieved indicate that COMPANY would be best served by generating its own dataset and supplementing it with simulated data, where each feature can be thought through and controlled, giving the model the best chance of finding patterns.
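Two pre-processing steps are common for fraud data: scaling numeric features, and compensating for the rarity of fraudulent examples. The sketch below illustrates both under stated assumptions; it is not necessarily what the artifact does. It fits z-score scaling on the training split only (so test statistics do not leak into training) and computes "balanced" class weights as n_samples / (n_classes × class_count).

```python
from collections import Counter
from statistics import mean, pstdev


def fit_standardizer(rows):
    """Learn per-column mean and standard deviation from training rows only."""
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    # Guard against constant columns to avoid division by zero.
    stds = [pstdev(col) or 1.0 for col in columns]
    return means, stds


def standardize(rows, means, stds):
    """Apply z-score scaling, (x - mean) / std, column by column."""
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * count) for cls, count in counts.items()}
```

With 98 legitimate and 2 fraudulent examples, the fraud class receives a weight of 25.0 and the legitimate class about 0.51. A dictionary in this form can be passed to Keras via the `class_weight` argument of `Model.fit`, so that misclassified fraud cases contribute more to the loss.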

The volume of data need not be a limiting factor in such a scenario, as Transfer Learning can be applied: a small dataset can be used to fine-tune an established state-of-the-art fraud detection model, specializing it for COMPANY data.