Jaydeep Chakraborty

Data Scientist | Machine Learning Enthusiast

PROJECTS

Kaggle Competition Challenges

2016 - Present

Entity Linking - Google Summer of Code Participant 2019

May 2019 - Aug 2019

The goal of the project was to create a workflow for entity linking between DBpedia and an external data set. Entity linking here means ontology alignment (mapping) between a source and a target ontology: the workflow detects similar concepts/classes or instances/individuals across the two. Linking happens at two levels: schema-level linking maps concepts/classes of the source and target ontology, while instance-level linking maps their instances/individuals.
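A minimal sketch of the schema-level step, assuming a simple label-similarity matcher (the class labels and threshold below are illustrative, not the project's actual data):

```python
from difflib import SequenceMatcher

def label_similarity(a: str, b: str) -> float:
    """Normalized string similarity between two class labels."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def schema_level_links(source_classes, target_classes, threshold=0.8):
    """Pair up source/target ontology classes whose labels are similar enough."""
    links = []
    for s in source_classes:
        for t in target_classes:
            score = label_similarity(s, t)
            if score >= threshold:
                links.append((s, t, round(score, 2)))
    return links

# Toy class labels standing in for DBpedia and an external ontology.
dbpedia = ["Person", "Organisation", "Place"]
external = ["person", "organization", "location"]
print(schema_level_links(dbpedia, external))
```

In practice a real matcher would also compare properties and instance overlap, but label similarity is the usual first signal.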

Entity Ranking

Feb 2018 - Apr 2018
  • Implemented Andreas Thalhammer's Wikipedia page-ranking algorithm on the DBkwik 2017 knowledge graph.
  • Wrote Scala scripts to run the ranking algorithm over the DBkwik 2017 triple dataset.
  • Used a Fuseki server as the triple store to hold the rank information of each individual (subject, predicate, and object).
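The ranking is PageRank-style over the subject-object link graph of the triples. A sketch of plain iterative PageRank (in Python here for illustration; the project used Scala, and the toy edges below are assumptions):

```python
def pagerank(edges, damping=0.85, iters=50):
    """Iterative PageRank over a directed graph given as (source, target) pairs."""
    nodes = {n for e in edges for n in e}
    out_links = {n: [] for n in nodes}
    for s, t in edges:
        out_links[s].append(t)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iters):
        new_rank = {n: (1 - damping) / len(nodes) for n in nodes}
        for s in nodes:
            targets = out_links[s] or list(nodes)  # dangling nodes spread evenly
            share = damping * rank[s] / len(targets)
            for t in targets:
                new_rank[t] += share
        rank = new_rank
    return rank

# Tiny triple graph: each edge is a subject -> object link from an RDF triple.
edges = [("A", "B"), ("C", "B"), ("D", "B"), ("B", "A")]
ranks = pagerank(edges)
print(max(ranks, key=ranks.get))  # "B" receives the most incoming links
```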

Analysis on noisy data

Oct 2017 - Nov 2017
  • Implemented methods for learning from noisy data described in: Nagarajan Natarajan, Inderjit S. Dhillon, Pradeep K. Ravikumar, and Ambuj Tewari, "Learning with Noisy Labels," in Advances in Neural Information Processing Systems, pages 1196-1204, 2013.
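One of the paper's two approaches is the "method of unbiased estimators": given known class-conditional flip rates, correct any binary loss so its expectation under noisy labels matches the clean loss. A sketch (the hinge loss and flip rates below are illustrative):

```python
def make_unbiased(loss, rho_pos, rho_neg):
    """Natarajan et al. (2013) loss correction for class-conditional label
    noise, with flip rates rho_pos = P(flip | y=+1), rho_neg = P(flip | y=-1)."""
    denom = 1.0 - rho_pos - rho_neg
    def corrected(t, y):  # y is the *noisy* label in {+1, -1}
        rho_y = rho_pos if y == 1 else rho_neg
        rho_not_y = rho_neg if y == 1 else rho_pos
        return ((1 - rho_not_y) * loss(t, y) - rho_y * loss(t, -y)) / denom
    return corrected

hinge = lambda t, y: max(0.0, 1.0 - y * t)
tilde = make_unbiased(hinge, rho_pos=0.2, rho_neg=0.1)
# With zero noise the corrected loss reduces to the original loss.
identity = make_unbiased(hinge, 0.0, 0.0)
print(identity(0.5, 1), hinge(0.5, 1))
```

The unbiasedness property: averaging `tilde` over the noisy-label distribution of a clean positive example recovers the clean hinge loss exactly.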

Smart Diet

Oct 2017 - Nov 2017
  • This project retrieves and analyzes nutrition-intake information by analyzing users' eating gestures captured with wearable armband or wristband sensors.
  • It was part of a smart diet assessment system in the spirit of recently proposed systems such as MT-Diet (https://impact.asu.edu/MTDiet.html).
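A toy illustration of the kind of signal processing involved, assuming a wristband accelerometer magnitude trace where hand-to-mouth gestures show up as peaks (the data and threshold are invented for the sketch):

```python
def count_intake_gestures(signal, threshold=1.5):
    """Count local peaks above a threshold in an accelerometer magnitude
    trace -- a crude proxy for hand-to-mouth eating gestures."""
    peaks = 0
    for i in range(1, len(signal) - 1):
        if signal[i] > threshold and signal[i] > signal[i - 1] and signal[i] >= signal[i + 1]:
            peaks += 1
    return peaks

# Synthetic wristband magnitude samples: three spikes over baseline noise.
trace = [1.0, 1.1, 2.0, 1.2, 1.0, 1.9, 1.1, 1.0, 2.2, 1.3, 1.0]
print(count_intake_gestures(trace))  # -> 3
```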

Technologies used:

  • Python
  • MATLAB

RecoBoard – An Academic Recommendation Application

Dec 2016 - Jul 2017
  • Implemented scripts in Python for scraping data from course catalogs, instructor web pages and Google Scholar.
  • Employed topic-modelling techniques (Latent Dirichlet Allocation (LDA), Hierarchical Dirichlet Process (HDP), Latent Semantic Analysis (LSA)) and association-mapping techniques (Apriori algorithm) combined with a collaborative-filtering approach to recommend relevant professors to students.
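The association-mapping side can be sketched with a minimal Apriori pass over "transactions" of co-taken topics (the course data below is invented for illustration):

```python
def apriori(transactions, min_support=2):
    """Frequent-itemset mining: keep itemsets appearing in at least
    min_support transactions, growing candidates one item at a time."""
    items = sorted({i for t in transactions for i in t})
    frequent, k = {}, 1
    current = [frozenset([i]) for i in items]
    while current:
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        k += 1
        # Candidate generation: union pairs of frequent sets into size-k sets.
        current = list({a | b for a in level for b in level if len(a | b) == k})
    return frequent

courses = [{"ml", "stats", "nlp"}, {"ml", "stats"}, {"ml", "nlp"}, {"stats", "db"}]
freq = apriori(courses, min_support=2)
print(freq[frozenset({"ml", "stats"})])  # -> 2
```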

Technologies used:

  • Python
  • Selenium
  • Scrapy
  • scikit-learn
  • Gensim

Statistically Significant Hot-Spot Analysis

Dec 2016 - Jul 2017
  • Implemented a 3D space-time cube over the (~1.8 GB) NYC Taxi Trip dataset, with each grid cell corresponding to a specific taxi-pickup location, and computed the top 50 statistically significant locations based on z-scores from the Getis-Ord statistic.
  • Configured and set up an Apache Hadoop and Spark cluster on Amazon AWS for large-scale dataset processing.
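A simplified single-cell version of the Getis-Ord Gi* z-score, in Python for illustration (the project ran at scale in Java/Spark; the 1-D grid and binary neighborhood weights below are assumptions):

```python
from math import sqrt

def getis_ord_g_star(values, weights_row):
    """Simplified Getis-Ord Gi* z-score for one cell: weights_row[j] is the
    spatial weight between the focal cell and cell j (including itself)."""
    n = len(values)
    mean = sum(values) / n
    s = sqrt(sum(v * v for v in values) / n - mean ** 2)
    w_sum = sum(weights_row)
    w_sq = sum(w * w for w in weights_row)
    num = sum(w * v for w, v in zip(weights_row, values)) - mean * w_sum
    den = s * sqrt((n * w_sq - w_sum ** 2) / (n - 1))
    return num / den

# 1-D toy grid of pickup counts; neighborhood = the cell plus its two neighbors.
counts = [2, 3, 50, 48, 52, 3, 2, 1, 2, 3]
i = 3  # focal cell sits inside the high-count cluster
row = [1 if abs(j - i) <= 1 else 0 for j in range(len(counts))]
z = getis_ord_g_star(counts, row)
print(round(z, 2))
```

A z-score above roughly 1.96 marks a statistically significant hot spot at the 95% level, which is how the top-50 locations were selected.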

Technologies used:

  • Java
  • Apache Hadoop
  • Apache Spark
  • Hive
  • Amazon AWS

Movie Recommendation

Dec 2016 - Jul 2017
  • Implemented a movie recommendation system for the Netflix Prize task: find the best collaborative-filtering algorithm to predict users' film ratings from previous ratings alone, with no other information about the users or films.
  • Employed Naive Bayes, KNN, logistic regression, and SVM supervised algorithms to predict ratings from other users' ratings and performed a comparative analysis.
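The KNN variant of collaborative filtering can be sketched as a similarity-weighted average over the nearest users (the tiny rating matrix below is invented for the sketch):

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity over the items both users rated."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    nu = sqrt(sum(u[i] ** 2 for i in common))
    nv = sqrt(sum(v[i] ** 2 for i in common))
    return dot / (nu * nv)

def predict_rating(ratings, user, movie, k=2):
    """Predict a user's rating as the similarity-weighted mean of the k
    most similar users who rated the movie."""
    neighbours = [(cosine(ratings[user], ratings[o]), ratings[o][movie])
                  for o in ratings if o != user and movie in ratings[o]]
    neighbours.sort(reverse=True)
    top = neighbours[:k]
    sim_sum = sum(s for s, _ in top)
    return sum(s * r for s, r in top) / sim_sum if sim_sum else None

ratings = {
    "alice": {"m1": 5, "m2": 4},
    "bob":   {"m1": 5, "m2": 4, "m3": 5},
    "carol": {"m1": 1, "m2": 2, "m3": 1},
}
print(round(predict_rating(ratings, "alice", "m3"), 2))
```

Alice's tastes match Bob's closely and Carol's weakly, so the prediction leans toward Bob's rating of 5 while Carol's low rating drags it down.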

Technologies used:

  • Python
  • MATLAB

Sentiment Analysis

Sep 2016 - Dec 2016
  • An application that analyzes whether Facebook's "Safety Check" option and the "marksafe" tag are used properly and reasonably.
  • Implemented Python scripts for scraping data from Facebook and Twitter.
  • Employed a Naive Bayes classifier to classify users' posts into positive and negative classes.
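The classifier step can be sketched as word-count Naive Bayes with Laplace smoothing (the project used NLTK; the training posts below are invented stand-ins for the scraped data):

```python
from collections import Counter
from math import log

def train_nb(docs):
    """Train a word-count Naive Bayes model from (text, label) pairs."""
    word_counts = {"pos": Counter(), "neg": Counter()}
    class_counts = Counter()
    for text, label in docs:
        class_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocab = set(word_counts["pos"]) | set(word_counts["neg"])
    return word_counts, class_counts, vocab

def classify(model, text):
    """Pick the class with the higher log-probability (Laplace smoothing)."""
    word_counts, class_counts, vocab = model
    total = sum(class_counts.values())
    scores = {}
    for label in class_counts:
        total_words = sum(word_counts[label].values())
        score = log(class_counts[label] / total)
        for w in text.lower().split():
            score += log((word_counts[label][w] + 1) / (total_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

docs = [("i am safe and fine", "pos"), ("glad everyone is safe", "pos"),
        ("terrible attack so scared", "neg"), ("awful news very scared", "neg")]
model = train_nb(docs)
print(classify(model, "glad i am fine"))  # -> pos
```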

Technologies used:

  • Python
  • Selenium
  • Scrapy
  • NLTK