IMI BigDataAIHUB Case Competition

Download the competition rules and our team's code: https://github.com/ChenhaoZhu/detecting_FinancialCrime.git

Competition overview: binary and multi-class classification tasks that also help Scotiabank, a Canadian bank, identify potential money-laundering groups. The project is well suited to data science beginners and those who are just getting started.

Motivation

Financial institutions are used by criminals to “legitimize” money generated from criminal activities. Society and regulators therefore expect banks to act proactively in detecting and stopping criminals from using the financial system to access their money, through effective monitoring and reporting to authorities.

Complication

- The number of clients and the transaction volume are large and only expected to grow in the future
- Adding more headcount yields diminishing returns
- Criminals are constantly innovating on ways to use the financial system to “legitimize” their funds

Case Goal

Use suitable data analytics tools and techniques to help Scotiabank detect financial crimes:

- Find known high-risk people in our customer base using public data
- Score clients according to their likelihood of being involved in money laundering using transactional data
- Enhance scoring and visualize networks using connections between clients

Case Tasks

1. Name Screening: Detect 50 Bad Actors in Scotiabank's customer base using public data sources.
2. Risk Rating: Classify customers into Low, Medium and High risk, and predict Bad Actors.
3. Improve Model using Graph Data: Add customer connection information to improve the risk rating model, or use a graph model directly.

Data Sources

Raw Data

- UofT_nodes.csv: contains KYC data, transactional data and the Risk Rating
- UofT_edges.csv: contains connections between customers, i.e. the amount of money sent via EMT from one customer (source) to another (target)
- UofT_occupation_risk.csv: contains a mapping from an occupation (code) to its risk level of being involved in financial crimes
- targets.simple.csv: contains a list of Bad Actors from https://www.opensanctions.org/datasets/default/

Intermediate Data

- matches_fuzzy.csv: generated from case.ipynb, contains potential Customer-Bad Actor matches, based on names and birth dates, found with the FuzzyWuzzy method.
- badactor_foundin_kyc_bt_match.csv: generated from task 1, a subset of UofT_nodes.csv containing ONLY the Bad Actors.
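
To make the name-screening idea behind matches_fuzzy.csv more concrete, here is a minimal sketch of matching customers to Bad Actors on names and birth dates. It is not the team's actual code from case.ipynb: it swaps FuzzyWuzzy for the standard library's `difflib.SequenceMatcher` so that it runs without extra dependencies, and the field names (`customer_id`, `name`, `birth_date`), the threshold of 85, and the sample records are all hypothetical.

```python
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lowercase, drop punctuation, and sort the tokens so word order is ignored."""
    cleaned = "".join(ch if ch.isalnum() else " " for ch in name.lower())
    return " ".join(sorted(cleaned.split()))


def name_similarity(a: str, b: str) -> float:
    """Similarity score in [0, 100] between two normalized names."""
    return 100.0 * SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def screen(customers, bad_actors, threshold=85.0):
    """Return (customer_id, bad_actor_name, score) for likely matches.

    Assumption: a candidate pair must share the exact birth date, and the
    name similarity must reach the (hypothetical) threshold.
    """
    matches = []
    for customer in customers:
        for actor in bad_actors:
            if customer["birth_date"] != actor["birth_date"]:
                continue  # different birth dates: not the same person
            score = name_similarity(customer["name"], actor["name"])
            if score >= threshold:
                matches.append((customer["customer_id"], actor["name"], round(score, 1)))
    return matches


if __name__ == "__main__":
    # Hypothetical toy records; the real inputs would come from
    # UofT_nodes.csv and targets.simple.csv.
    customers = [
        {"customer_id": 1, "name": "John Smith", "birth_date": "1970-01-01"},
        {"customer_id": 2, "name": "Alice Wong", "birth_date": "1985-05-05"},
    ]
    bad_actors = [{"name": "Smith, John", "birth_date": "1970-01-01"}]
    print(screen(customers, bad_actors))
```

Requiring an exact birth-date match before comparing names keeps the number of false positives down, at the cost of missing records whose birth dates are recorded in different formats or are missing; a real pipeline would likely need to normalize dates first.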