An End-to-end Project-based Approach to Teaching Data Mining Process

class: center, middle, inverse, title-slide

.title[
# An End-to-end Project-based Approach to Teaching Data Mining Process 
]
.subtitle[
## A Case Study in Credit Card Fraud Detection 
]
.author[
### Cheng Peng 
]
.institute[
### West Chester University of Pennsylvania 
]
.date[
### 05/14/2022 Presented at eCOTS 2022: Teaching Data Mining Slides available at: 
]

---

class: inverse, middle
## Agenda

### Learning from Learning Theories

- Learning Theories

- Pedagogical Strategies

### Case-study: Credit Card Fraud Mining

- Fraud Background

- Analytic View of Fraud and Challenges

- Feature Extraction

- Analytic Fraud Identification Methods and Assessment

- Deployment and Automation

---
class: inverse, center, middle

# Learning from Learning Theories

---
class: center, middle

# Teaching DM Process vs Techniques

Cross-Industry Standard Process for Data Mining (CRISP-DM)

---
# Learning from Learning Theories

There are many learning theories, but most fall under three major families.

- **Behaviorism Learning Theory**: knowledge exists independently of, and external to, the learner. It focuses on how the outside environment influences learning.

- **Cognitive Learning Theory**: learning means processing the information received rather than just responding to a stimulus, as in behaviorism. It uses metacognition (“thinking about thinking”) to understand how thought processes influence learning.

- **Constructivism Learning Theory**: learners construct new ideas from prior knowledge and experience through active engagement with the world (such as experiments or real-world problem solving).

---

# Some Principles of Constructivism Theory

I am a firm believer in constructivism learning theory.

-	Knowledge is constructed. This is the basic principle, meaning that knowledge is built upon the foundation of previous learning.

-	Learning is a social activity. Learning is something we do together, in interaction with each other, rather than an abstract concept.

-	There is no knowledge independent of the meaning attributed to experience (constructed) by the learner, or community of learners.

-	Learning is contextual: we do not learn isolated facts and theories that are separated from the rest of our lives.

- Motivation is key to learning. Cognitive motivation is rooted in the availability of information and in past experience and prior knowledge.

---
# My Adopted Pedagogies in Teaching Analytics

-	Providing experience with the knowledge construction process - students determine how they will learn.

-	Providing experience in and appreciation for multiple perspectives - evaluation of alternative solutions.

-	Embedding learning in realistic contexts - authentic tasks.

-	Embedding learning in social experience – collaborative learning.

- Encouraging awareness of the knowledge construction process - reflection, metacognition.

- Helping students make sense of the information presently available and determine how to respond or relate to the current situation.

---
class: inverse, center, middle

# Case Study

### Credit Card Fraud Detection

---

# Adapt CRISP-DM for Fraud Mining Process

---

# Credit Card Transaction Process

---

# What is Credit Card Fraud?

**Credit card fraud** is a form of identity theft that involves the unauthorized taking of another’s credit card information for the purpose of charging purchases to the account or removing funds from it.

**Credit Card Fraud Types**: Credit card fraud schemes generally fall into one of two categories: application fraud and account takeover.

- Identity theft

- [Skimming Fraud (a kind of account takeover)](https://www.youtube.com/watch?v=G_aH50Tn8Fo)

**Why Combat Credit Fraud Loss**: Card fraud will cost the industry a collective $408.50 billion in losses globally over the next decade, according to an annual report from the industry research firm Nilson Report.

---

# Fraud Data Generation Process & Availability

Pre-authorization: timestamp, geo-info of POS, Card information (card number, expiration date, billing address, security code)

Authorization: Pre-auth info + requested payment amount

Authentication: the issuing bank will

- verify the authorization information sent from the processor: validating card info and checking the availability of funds (credit line); and

- send the result of the authentication to the merchant: approval or denial.

The merchant will then send the complete transaction information to the issuing bank or the processor.

---

# Fraud Data Generation Process: A General Fraud Management System

---
#	Availability and Types of Data

Based on credit card processing and the general fraud detection system, the following information is available at different processing stages:

-	**Pre-authorization Data**: geo-info of POS, timestamps, card information.

-	**Authorization and Authentication Data**: pre-auth info + payment info.

-	**Historical Data**: complete transaction information (at least 2 years back), confirmed fraud (labels), account information, etc.

- **Other Publicly Available Data**: crime rate, etc.

---

# Data Preparation - Collection

**Goal**: detect/identify fraudulent transactions.
 
**Challenges**:

- No information about fraudsters!

- Real-time detection.

- Rarity of fraud.

**What information is relevant?**

- Current transaction: card info, timestamps, amount, POS info.

- Historical transactions: timestamps, amount, POS info, fraud labels.

- Account information: cardholder’s info.

- Derived merchant site info (including publicly available info).

---
#	Creating Analytic Data According to Potential Analytic Methods

-	**Key Point**: Fraudulent activity alters genuine customers’ spending patterns!

-	**Cross-sectional Data**: current transactions.

- **Longitudinal/Panel Data**: current and historical transactions.

- **Hybrid Cross-sectional and Longitudinal Data**: both current transactions and aggregated information of historical transactions.
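
A minimal sketch of the hybrid layout, assuming each transaction is a dictionary with an `amount` field. The function name `hybrid_record` and the chosen aggregates are illustrative only, not the layout used in the case study.

```python
from statistics import mean

def hybrid_record(current, history):
    """Combine one current transaction with aggregates of the card's history."""
    amounts = [t["amount"] for t in history]
    return {
        "amount": current["amount"],                             # cross-sectional part
        "hist_n": len(amounts),                                  # aggregated history
        "hist_mean_amount": mean(amounts) if amounts else 0.0,
        "hist_max_amount": max(amounts) if amounts else 0.0,
    }

# Toy data (made up for illustration)
history = [{"amount": 20.0}, {"amount": 35.0}, {"amount": 25.0}]
current = {"amount": 400.0}
row = hybrid_record(current, history)
```

Each row of the analytic data set is then one current transaction plus a fixed number of history summaries, so any cross-sectional model can use it.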

---
#	Types of Candidate Models/Algorithms

-	Business rules (expert system).

- Supervised classification models/algorithms, such as logistic and tree-based classifiers – trained using fraud labels, with the fraud index expected to be the most powerful predictor variable. They need to handle the rarity of fraudulent transactions.

- Unsupervised anomaly detection methods – using the distribution to detect fraud: flag values above a high quantile, subject to operational constraints.

- Other probabilistic models/algorithms such as hidden Markov models (HMMs).
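
The quantile-based idea can be sketched as follows. This assumes a score where larger values are more suspicious. The `max_alerts` argument is a hypothetical stand-in for an operational constraint such as the number of cases an analyst team can review.

```python
import math

def flag_by_quantile(scores, q=0.99, max_alerts=None):
    """Return positions of scores above the empirical q-quantile,
    most extreme first, optionally capped at max_alerts."""
    ranked = sorted(scores)
    k = max(0, math.ceil(q * len(ranked)) - 1)   # nearest-rank quantile
    cutoff = ranked[k]
    flagged = [i for i, s in enumerate(scores) if s > cutoff]
    flagged.sort(key=lambda i: scores[i], reverse=True)
    return flagged if max_alerts is None else flagged[:max_alerts]

# Toy scores (made up): one extreme value among ordinary ones
scores = [12, 15, 11, 14, 13, 16, 12, 15, 14, 950]
alerts = flag_by_quantile(scores, q=0.9, max_alerts=5)
```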

---
#	Fraud Index Based on Historical Transactions

How fraudulent activities alter genuine customers’ spending patterns:

- The transaction dollar amount is significantly different from that of genuine customers.

- The genuine customers’ spending frequency changes.

- The genuine customers’ transaction gap times (time between consecutive transactions) change.
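
As a rough illustration, gap times and spending frequency can be derived from one card's transaction timestamps. The function name and the units (hours, transactions per day) are assumptions made for this sketch.

```python
from datetime import datetime

def spending_pattern(timestamps):
    """Return gap times in hours and transactions per day for one card."""
    ts = sorted(timestamps)
    gaps = [(b - a).total_seconds() / 3600 for a, b in zip(ts, ts[1:])]
    span_days = (ts[-1] - ts[0]).total_seconds() / 86400 if len(ts) > 1 else 0
    freq = len(ts) / span_days if span_days > 0 else float("nan")
    return gaps, freq

# Toy timestamps (made up): one purchase per day at 9 am
ts = [datetime(2022, 5, 1, 9), datetime(2022, 5, 2, 9), datetime(2022, 5, 3, 9)]
gaps, freq = spending_pattern(ts)
```

A sudden run of very short gaps, or a jump in frequency relative to the card's history, is the kind of change these features are meant to expose.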

---
# What is Process Capability Index (PCI)?

Process capability compares the output of an in-control process to the specification limits by using capability indices.

- If the PCI of a process is under a threshold, the process is incapable.
- There are different PCIs for different processes (mainly manufacturing processes).
- The upper and lower specification limits (USL and LSL) need to be estimated (there are different estimation methods).
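
For reference, two widely used textbook indices are Cp = (USL - LSL) / (6σ) and Cpk = min(USL - μ, μ - LSL) / (3σ). The sketch below computes both from a sample. The specification limits are supplied by hand here, and the data are made up.

```python
from statistics import mean, stdev

def capability_indices(x, lsl, usl):
    """Return (Cp, Cpk) using the sample mean and sample standard deviation."""
    mu, sigma = mean(x), stdev(x)
    cp = (usl - lsl) / (6 * sigma)
    cpk = min(usl - mu, mu - lsl) / (3 * sigma)
    return cp, cpk

# Toy measurements centred between the limits, so Cp and Cpk coincide
x = [9.0, 10.0, 11.0, 10.0, 10.0]
cp, cpk = capability_indices(x, lsl=7.0, usl=13.0)
```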

---
class:inverse, center, middle

# A Numerical Example

### Data Layout, Candidate Models and Algorithms

---

#	The “Capability” of Customers’ Spending Process – Fraud Index

For illustration, we define a fraud index using the payment dollar amount, as shown in the following figure.

---

# Pre-processed Data (Long Table)

---

# Data Matrix

---
class: inverse, center, middle

# A PCI-like Fraud Index Using Payment Amount

## 
`$$idx=\frac{(USL-\mu)^2}{9(\mbox{max} - \mu)^2+(T-\mu)^2}$$`

### USL, T: Estimated from the larger data.

### max, `$\mu$`: Estimated from the smaller data.

### Sample sizes of both data sets are tuning parameters
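
A minimal sketch of the index above. It makes two assumptions the slides do not specify: USL is taken as the maximum and T as the mean of the larger data set. Any other estimator could be substituted. Reading a small index as suspicious follows the PCI convention that a process under the threshold is incapable.

```python
from statistics import mean

def fraud_index(usl, target, recent):
    """idx = (USL - mu)^2 / (9 (max - mu)^2 + (T - mu)^2),
    with mu and max estimated from the smaller (recent) data set."""
    mu, mx = mean(recent), max(recent)
    denom = 9 * (mx - mu) ** 2 + (target - mu) ** 2
    return float("inf") if denom == 0 else (usl - mu) ** 2 / denom

# Larger data set: toy historical amounts (made up)
long_window = [20, 25, 30, 22, 28, 35, 24, 26, 30, 40]
usl, target = max(long_window), mean(long_window)   # assumed estimators

normal = fraud_index(usl, target, [24, 26, 30, 28])
suspect = fraud_index(usl, target, [24, 26, 30, 400])
```

In this toy example the window containing the 400-dollar payment gets a far smaller index than the ordinary window.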

---

# How Fraud Index Works in Fraud Detection

---

# Distribution of Resulting Fraud Index

- The above figure indicates that the fraud index can be used as a standalone fraud detection algorithm with no structural parameters - an unsupervised anomaly detection method.

---

# Performance Analysis

- Consider a multivariate fraud index that incorporates gap time and spending frequency to boost the discriminatory power of the index.

---
# Supervised Algorithms and Models

Fraud index will be used as a feature variable.

Models and algorithms need to account for imbalanced labels.

- Firth penalized logit models.

- King and Zeng's rare event logistic model.

- Qing's semi-parametric logistic model.

- Penalized tree-based algorithms (including bagging; random forest is not an option for this particular case).

- Regular logit models based on over-/under-sampled data.

- Asymmetric-link GLMs.
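
As one simple illustration of handling imbalanced labels, the sketch below performs random under-sampling of genuine transactions before model fitting. The `fraud` field name and the `ratio` parameter are assumptions made for this example. Note that a model fitted on resampled data generally needs its predicted probabilities recalibrated to the true fraud rate.

```python
import random

def undersample(rows, label_key="fraud", ratio=1.0, seed=0):
    """Keep every fraud row plus a random subset of genuine rows.
    ratio = genuine rows kept per fraud row (a tuning choice)."""
    rng = random.Random(seed)
    pos = [r for r in rows if r[label_key] == 1]
    neg = [r for r in rows if r[label_key] == 0]
    k = min(len(neg), int(ratio * len(pos)))
    sample = pos + rng.sample(neg, k)
    rng.shuffle(sample)
    return sample

# Toy data (made up): 98 genuine rows and 2 fraud rows
rows = [{"amount": a, "fraud": 0} for a in range(98)]
rows += [{"amount": 900, "fraud": 1}, {"amount": 950, "fraud": 1}]
balanced = undersample(rows, ratio=2.0)
```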

---
class: inverse, center, middle

# Deployment / Monitoring and Updating

---
class: inverse,center, middle

# Thanks!

Slides created via the R package [**xaringan**](https://github.com/yihui/xaringan).