Posts tagged MachineLearning - Knorth


Knorth小柯北

5 posts in total

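Stock Price Prediction

Project repository: https://github.com/ChenhaoZhu/multiAssets_prediction.git

Project summary: uses a Transformer model to predict stock prices from the historical price data of four companies. Deployment is very simple, which makes it a good fit for beginners.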
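The post does not show how the price history is fed to the Transformer. Below is a minimal sketch of one common preparation step: turning a multi-asset price table into (input window, next-day target) pairs. It assumes daily prices for the four companies, already aligned by date, stored as `prices[t][k]`. The names `make_windows`, `chronological_split`, and `lookback` are hypothetical and do not come from the repository.

```python
from typing import List, Sequence, Tuple

Window = List[List[float]]  # lookback rows x n_assets columns


def make_windows(
    prices: Sequence[Sequence[float]], lookback: int
) -> Tuple[List[Window], List[List[float]]]:
    """Turn a (T, n_assets) price history into (input window, next-day target) pairs.

    prices[t][k] is the price of asset k on day t (hypothetical layout).
    """
    if lookback < 1:
        raise ValueError("lookback must be at least 1")
    inputs: List[Window] = []
    targets: List[List[float]] = []
    for end in range(lookback, len(prices)):
        # The model sees days [end - lookback, end) and must predict day `end`.
        inputs.append([list(row) for row in prices[end - lookback:end]])
        targets.append(list(prices[end]))
    return inputs, targets


def chronological_split(inputs, targets, train_frac: float = 0.8):
    """Split without shuffling, so the test period comes strictly after training."""
    cut = int(len(inputs) * train_frac)
    return (inputs[:cut], targets[:cut]), (inputs[cut:], targets[cut:])
```

Each window would become one Transformer input sequence. Its length is `lookback` and its feature size is the number of assets. The split is chronological rather than random so that the model is never evaluated on days that come before its training data.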
- 2 years ago
Weather_Classification

Project summary: trains a weather classifier with PyTorch. It suits people who are just starting to learn deep learning and gives them a hands-on project for practicing model training. The final test stage uses YouTube videos, which are split frame by frame into images with the cv2 library.

Project code: https://github.com/ChenhaoZhu/Weather_Classification.git

Our data comes from Kaggle. Here is the Kaggle dataset link: https://www.kaggle.com/datasets/vijaygiitk/multiclass-weather-dataset

We also uploaded the Kaggle dataset to our Google Drive. Here is the shared link: https://drive.google.com/drive/folders/1vBxCKRGNq8gs9xU6i3F4AjWyZzAuMQmn?usp=sharing

Our group intends to classify weather images first, and then convert severe-weather images into clear images that are easier to interpret. The project has a wide range of applications, such as autonomous driving. Severe weather has various negative effects on transportation. The environment that vehicle sensors observe is directly affected by weather conditions. Camera-led multi-sensor fusion is one of the mainstream designs, and for such systems it is particularly important to recognize the weather from images and to obtain information about the surroundings from clear pictures.

For the classification part, we tried three different networks and chose the best-performing one, ResNet101. First, we downloaded the data from Kaggle and divided the labeled images into five classes: Sunrise, Shine, Rainy, Foggy, and Cloudy. This is what the raw data looks like. We split the data into a training set and a validation set in a 2:1 ratio and resized the images to 255×255. For both sets, the input is an image and the output is the probability that the image belongs to each class.

For practicality, we use videos recorded by in-car cameras, downloaded from YouTube, as our test data.
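The training setup described above has two parts: a 2:1 train/validation split, and an output of one probability per weather class. Both can be shown without PyTorch. This plain-Python sketch is illustrative only. In PyTorch the same steps would usually be done with `torch.utils.data.random_split` and `torch.softmax` applied to the ResNet101 output. The class order in `CLASSES` and the helper names are assumptions, not taken from the project code.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple, TypeVar

T = TypeVar("T")

# Class order is an assumption; the real label order comes from the project's dataset loader.
CLASSES = ["Sunrise", "Shine", "Rainy", "Foggy", "Cloudy"]


def split_2_to_1(items: Sequence[T], seed: int = 0) -> Tuple[List[T], List[T]]:
    """Shuffle reproducibly, then keep 2/3 for training and 1/3 for validation."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = (2 * len(shuffled)) // 3
    return shuffled[:cut], shuffled[cut:]


def softmax(logits: Sequence[float]) -> List[float]:
    """Map raw scores to probabilities; subtracting the max avoids overflow in exp()."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def class_probabilities(logits: Sequence[float]) -> Dict[str, float]:
    """Pair each of the five network outputs with its class name."""
    if len(logits) != len(CLASSES):
        raise ValueError(f"expected {len(CLASSES)} logits, got {len(logits)}")
    return dict(zip(CLASSES, softmax(logits)))
```

The predicted weather is the class with the highest probability, for example `max(probs, key=probs.get)`.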
We capture a frame from each video every 30 s and output the probability of each weather class for that screenshot.

For the defogging and deraining part, unfortunately, the task proved quite challenging for us: we tried CycleGAN but did not get good results. The purpose of image translation is to map an input from a source domain to a target domain. Similar tasks include image colorization, domain adaptation, and data augmentation. Transferring weather conditions is important for photographic style transfer. Many approaches have been proposed for traditional image translation tasks, but few of them can handle multi-category weather translation, because weather conditions span many categories and have highly complex semantic structures.

Our project has a wide range of application scenarios. For example, we can classify the weather automatically and process bad-weather pictures captured by driving recorders, so that drivers can make better judgments. Because a driving recorder is usually installed behind the front windshield, the wipers ensure that raindrops on the glass do not affect picture clarity. Reversing cameras also often produce very blurry images in bad weather, and a similar approach can help drivers get a better view there.

Classification Analysis

To test the models, we used a YouTube driving video for each weather condition. Because the model is meant for real-world use, we need to see how it performs on weather classification while driving. The five YouTube videos are shared on Google Drive: https://drive.google.com/drive/folders/10E1oLFdjpivQNiQnFSVpzDtjwNuo5Yu9?usp=sharing

Our results on the real driving videos show that ResNet101 is the best of our three models. However, its accuracy still leaves room for improvement.
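The post grabs one frame every 30 s from each test video, but it does not include that code. Here is a minimal sketch using OpenCV. It uses `cv2.VideoCapture` and its `CAP_PROP_FPS`, `CAP_PROP_FRAME_COUNT`, and `CAP_PROP_POS_FRAMES` properties, plus `cv2.imwrite`. The function names and the output file pattern are hypothetical. The reported frame count can be approximate for some video files, so the loop stops at the first failed read.

```python
from typing import List


def sample_frame_indices(total_frames: int, fps: float, interval_s: float = 30.0) -> List[int]:
    """Indices of the frames to grab: one every `interval_s` seconds, starting at t = 0."""
    if fps <= 0 or interval_s <= 0:
        raise ValueError("fps and interval_s must be positive")
    step = max(1, round(fps * interval_s))
    return list(range(0, total_frames, step))


def extract_frames(
    video_path: str, out_pattern: str = "frame_{:05d}.jpg", interval_s: float = 30.0
) -> List[str]:
    """Save one frame every `interval_s` seconds with OpenCV and return the written paths."""
    import cv2  # imported here so sample_frame_indices works without OpenCV installed

    cap = cv2.VideoCapture(video_path)
    if not cap.isOpened():
        raise IOError(f"cannot open {video_path}")
    try:
        fps = cap.get(cv2.CAP_PROP_FPS)
        total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
        written: List[str] = []
        for idx in sample_frame_indices(total, fps, interval_s):
            cap.set(cv2.CAP_PROP_POS_FRAMES, idx)  # seek to the target frame
            ok, frame = cap.read()
            if not ok:
                break  # frame count was optimistic; stop at the real end of the video
            path = out_pattern.format(idx)
            cv2.imwrite(path, frame)
            written.append(path)
        return written
    finally:
        cap.release()
```

Each saved frame would then go through the same resizing used in training before being passed to the classifier.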
There are several areas where our model needs to improve. The model is supposed to recognize weather conditions while driving, so we could add data from car cameras; our training set does not include many such pictures. There are two main reasons we left them out of training. First, car-camera videos are hard to obtain: both searching YouTube and recording our own are time-consuming and difficult. Second, we wanted to see how a model trained only on regular weather images performs. So far, the best accuracy on the test set is around 85%. We consider this result relatively good given our limited training data. If we added pictures taken while driving to the training set, we could possibly get higher real-world test accuracy.

Image Style Transfer

Our next step is to work out how we can make use of a correctly identified weather condition. What is weather classification good for? Once we recognize the current weather, such as rainy or foggy, could we make the image clearer? One idea discussed in our group was to use SVD, a dimensionality-reduction technique, to remove the background. However, that does not help, because drivers still need to see the road even when the weather is rainy or foggy. We then looked for methods to remove fog and rain so that drivers can see more clearly what is going on.
- 2 years ago
IMI BigDataAIHUB Case Competition

Download the competition rules and our team's code: https://github.com/ChenhaoZhu/detecting_FinancialCrime.git

Competition overview: binary and multi-class classification, while helping Canada's Scotiabank find potential money-laundering groups. This competition project is well suited to newcomers and beginners in data science.

Motivation

Criminals use financial institutions to "legitimize" money generated from criminal activities. Society and regulators therefore expect banks to act proactively. Through effective monitoring and reporting to authorities, banks should detect criminals and stop them from using the financial system to access their money.

Complication

- The number of clients and the transaction volume are large and only expected to grow in the future.
- Adding more headcount yields diminishing returns.
- Criminals are constantly innovating on ways to use the financial system to "legitimize" their funds.

Case Goal

Use suitable data analytics tools and techniques to help Scotiabank detect financial crimes:
- Find known high-risk people in the customer base using public data.
- Score clients according to their likelihood of being involved in money laundering, using transactional data.
- Enhance the scoring and visualize networks using connections between clients.

Case Tasks

- Name Screening: detect 50 bad actors in Scotiabank's customer base using public data sources.
- Risk Rating: classify customers into Low, Medium, and High risk, and predict bad actors.
- Improve Model using Graph Data: add customer-connection information to improve the risk-rating model, or use a graph model directly.

Data Sources

Raw data:
- UofT_nodes.csv: contains KYC data, transactional data, and risk rating.
- UofT_edges.csv: contains connections between customers, i.e., the amount of money sent via EMT from one customer (source) to another (target).
- UofT_occupation_risk.csv: contains a mapping from an occupation (code) to its risk level of being involved in financial crimes.
- targets.simple.csv: contains a list of bad actors from https://www.opensanctions.org/datasets/default/

Intermediate data:
- matches_fuzzy.csv: generated from case.ipynb; contains potential customer/bad-actor matches based on names and birth dates, found with the FuzzyWuzzy method.
- badactor_foundin_kyc_bt_match.csv: generated from Task 1; a subset of UofT_nodes.csv containing ONLY the bad actors.
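The name-screening step matches customers to bad actors on birth date plus a fuzzy name comparison. Here is a standard-library sketch of that idea. It swaps in `difflib.SequenceMatcher` for the FuzzyWuzzy package the team used. Sorting tokens before comparing roughly imitates a token-sort style score. The record keys (`id`, `name`, `birth_date`) and the threshold of 90 are assumptions; the real column names in UofT_nodes.csv and targets.simple.csv may differ.

```python
from difflib import SequenceMatcher
from typing import Dict, List


def normalize(name: str) -> str:
    """Lower-case, replace punctuation with spaces, and sort tokens ("Doe, John" -> "doe john")."""
    cleaned = "".join(ch.lower() if ch.isalnum() or ch.isspace() else " " for ch in name)
    return " ".join(sorted(cleaned.split()))


def name_score(a: str, b: str) -> float:
    """Similarity of two names on a 0-100 scale, after normalization."""
    return 100.0 * SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def screen(
    customers: List[Dict[str, str]],
    bad_actors: List[Dict[str, str]],
    threshold: float = 90.0,
) -> List[Dict[str, object]]:
    """Return candidate customer/bad-actor pairs with the same birth date and a similar name."""
    # Block on birth date first so only customers sharing a date are compared by name.
    by_date: Dict[str, List[Dict[str, str]]] = {}
    for b in bad_actors:
        by_date.setdefault(b["birth_date"], []).append(b)

    matches: List[Dict[str, object]] = []
    for c in customers:
        for b in by_date.get(c["birth_date"], []):
            score = name_score(c["name"], b["name"])
            if score >= threshold:
                matches.append(
                    {"customer_id": c["id"], "bad_actor_id": b["id"], "score": round(score, 1)}
                )
    # Highest-scoring candidates first, for manual review.
    return sorted(matches, key=lambda m: m["score"], reverse=True)
```

Grouping bad actors by birth date keeps the number of name comparisons small even for a large sanctions list. Any surviving pair is only a candidate match and still needs review.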
- 2 years ago
CycleGAN: Turning People into Cartoons (human_to_cartoon)

Project summary: turns people into cartoon characters and cartoon characters into people, using the CycleGAN algorithm. I may write a post explaining CycleGAN in a while. The code is a complete set of Python files containing both training and testing code. The training images are too large to upload to GitHub, but you can search online for CycleGAN; many related datasets are available for download. This project suits newcomers preparing to enter the AIGC field: before studying diffusion models, they can try implementing the CycleGAN algorithm themselves.

Project code: https://github.com/ChenhaoZhu/human_to_cartoon.git

This is a small project that transforms human pictures into cartoon images and can also transform cartoon images into human images. The method applied is CycleGAN, which is primarily used to translate an image from one domain into another. In this project, we trained two generators and two discriminators. Because training takes a long time, our model does not perform very well on human-cartoon transformation. Given more time, we would train longer and improve the generators.

[Figure: Human / Cartoon (transformed) / Cartoon / Human (transformed)]
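For reference, here is the objective from the original CycleGAN paper (Zhu et al., 2017), written with X as human photos and Y as cartoon images. G: X → Y and F: Y → X are the two generators, and D_X and D_Y are the two discriminators described above. The cycle-consistency term penalizes any image that does not come back to itself after a round trip through both generators.

```latex
\begin{aligned}
\mathcal{L}_{\text{GAN}}(G, D_Y, X, Y) &= \mathbb{E}_{y \sim p_{\text{data}}(y)}\big[\log D_Y(y)\big]
  + \mathbb{E}_{x \sim p_{\text{data}}(x)}\big[\log\big(1 - D_Y(G(x))\big)\big] \\
\mathcal{L}_{\text{cyc}}(G, F) &= \mathbb{E}_{x \sim p_{\text{data}}(x)}\big[\lVert F(G(x)) - x \rVert_1\big]
  + \mathbb{E}_{y \sim p_{\text{data}}(y)}\big[\lVert G(F(y)) - y \rVert_1\big] \\
\mathcal{L}(G, F, D_X, D_Y) &= \mathcal{L}_{\text{GAN}}(G, D_Y, X, Y) + \mathcal{L}_{\text{GAN}}(F, D_X, Y, X)
  + \lambda\, \mathcal{L}_{\text{cyc}}(G, F) \\
G^{*}, F^{*} &= \arg\min_{G, F}\, \max_{D_X, D_Y}\, \mathcal{L}(G, F, D_X, D_Y)
\end{aligned}
```

The paper uses λ = 10 and, in practice, replaces the log-likelihood adversarial term with a least-squares loss. The post does not state which loss weights this project's code uses.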
- 2 years ago
Predicting the Rossmann Store Sales Using Different Models in R (Kaggle in-class project)

OK, let us start our project: predicting sales.

Step 0: Define a root mean square error (RMSE) function for checking the models later.

Step 1: Load the data sets and combine them accordingly. It is always important to know what kind of data we are dealing with. Use summary(train) to inspect it; the result is below:

Step 2: Exploratory data analysis. Take a closer look at the data we have. First, we can make one large graph that plots every variable against Sales. "Customers" seems to have a positive relationship with Sales. However, the test data does not contain "Customers", so I drop "Customers" from the later modeling. The graph also shows that most of our features are categorical.

I noticed several interesting points. First, sales are higher on weekdays than on weekends, and Sunday has the lowest sales. Second, holidays influence sales a lot. What about the other variables? Let us see whether "StoreType" shows anything interesting. It seems that stores of StoreType "a" are more likely to have higher sales.
Next, check "CompetitionDistance" together with "StoreType". The plot shows that for StoreType "a", higher sales go with a smaller competition distance. So "CompetitionDistance" and "StoreType" will be crucial features for modeling.

Step 3: Transforming the data. We derive "Week", "Day", and "Month" from "Date". We create dummy variables from "StateHoliday" and "StoreType", and create "WeekEnd" from "DayOfWeek". All possible holidays are combined into "HolidayT", and "HolidayT" and "CompetitionDistance" are used to create "DistanceHoliday". Finally, we delete the variables that are not useful. The data is now ready for modeling!

Step 4: Modeling. We do not want "Customers", and we need to build a data set suited to each model.

The first model: linear regression + lasso regression + ridge regression. For the linear regression model, we can use the stepwise method to select variables. Then we combine the three regression models with different weights, which gives our first Kaggle score.

The second model: random forest. Before using the raw model, we need to select the best parameters. I applied the random forest model to the test file many times, sometimes with mtry = 10 and sometimes with mtry = 9. Below is the best Kaggle score using the random forest model:

The third model: XGBoost. As before, we need to select the best parameters before training; here I used a grid search over several parameters. Then we apply XGBoost with the best parameters. Using code similar to the regression section, I get the predictions from XGBoost and submit them to Kaggle for a score. After several attempts, the best score is:

The final model: a combination of random forest and XGBoost. We can try combining the models.
The approach used in Model 1 (a weighted combination of regressions) may be helpful here. After several attempts, I found that 0.75 × XGBoost + 0.25 × random forest is the best combination for my data. This combination of random forest and XGBoost gives my final and best Kaggle score.

That is the end of the project. We used several models to predict sales: lasso regression, ridge regression, linear regression, random forest, and XGBoost, and we applied some combinations of them. Here is the rank of my result out of 80 students! I hope this helps anyone who wants to learn machine learning.
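The project was written in R. The Python sketch below mirrors a few of its concrete steps: the Step 0 RMSE check, two of the Step 3 features, and the final 0.75/0.25 blend. The function names are hypothetical. `weekend_flag` assumes DayOfWeek runs from 1 = Monday to 7 = Sunday, as in the Rossmann data. The public Rossmann Store Sales competition scored submissions with RMSPE and ignored days with zero sales, so `rmspe` is included as an alternative. The post does not say which metric the in-class version used.

```python
import math
from datetime import date
from typing import Dict, List, Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean square error, the Step 0 checking function."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and the same length")
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def rmspe(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean square percentage error; days with zero actual sales are skipped."""
    pairs = [(t, p) for t, p in zip(y_true, y_pred) if t != 0]
    if not pairs:
        raise ValueError("need at least one non-zero actual value")
    return math.sqrt(sum(((t - p) / t) ** 2 for t, p in pairs) / len(pairs))


def date_features(iso_date: str) -> Dict[str, int]:
    """Step 3: derive Day, Week (ISO week number), and Month from the Date column."""
    d = date.fromisoformat(iso_date)
    return {"Day": d.day, "Week": d.isocalendar()[1], "Month": d.month}


def weekend_flag(day_of_week: int) -> int:
    """Step 3: WeekEnd from DayOfWeek, assuming 1 = Monday ... 7 = Sunday."""
    return int(day_of_week >= 6)


def blend(pred_xgb: Sequence[float], pred_rf: Sequence[float], w_xgb: float = 0.75) -> List[float]:
    """Final model: weighted average, 0.75 * XGBoost + 0.25 * random forest by default."""
    if len(pred_xgb) != len(pred_rf):
        raise ValueError("prediction lists must be the same length")
    return [w_xgb * a + (1.0 - w_xgb) * b for a, b in zip(pred_xgb, pred_rf)]
```

The blend weight is just a tunable number. In practice it would be chosen on a hold-out set with the same metric used for scoring rather than by leaderboard submissions alone.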
- 5 years ago