Krishna Mehta EXL United States
Poonam Gandhi EXL
Abhigya Chetna EXL
Achal Gupta EXL
Aditya Mahajan EXL
Anuja Ghosh EXL
Aparna Viswanathan EXL
Deepak Chopra EXL
Enakshy Dutta EXL
Harshad Ranadive EXL
Nekhil Agrawal EXL
Nihit Mohan EXL
Nitant Kaushal EXL
Nivedita Dangwal EXL
Pritish Kumar EXL
Rajinder Negi EXL
Reena Aggarwal EXL
Ruchika Ruchika EXL
Sachin Dean EXL
Sassoon Kosian EXL
Shilpi Jain EXL
Sonmitra Mondal EXL
Sunayna Agarwal EXL
Tanu Mahajan EXL
Tushar Mishra EXL
Varun Aggarwal EXL
Varun Kapoor EXL
Provide a URL to a web page, technical memorandum, or a paper.
No response.
Provide a general summary with relevant background information: Where does the method come from? Is it novel? Name the prior art.
Logistic regression and segmentation analysis were the primary techniques used for model development. The Rasch model was used to capture the effects of student-level proficiency and step-level difficulty. The random forest method was used for model stability.
Summarize the algorithms you used in a way that those skilled in the art should understand what to do. Profile of your methods as follows:
Please describe your data understanding efforts, and interesting observations:
Details on feature generation:
We divided the dataset into history, modeling, and in-sample validation sets. We used the history dataset to create summary variables at different levels. These features turned out to be highly correlated with the dependent variable.
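A minimal sketch of this kind of history-based summary feature, written in Python for illustration only (the team reports using SAS). The field names `student`, `step`, and `cfa` (correct first attempt), the choice of student and step as the summary levels, and the fallback to the overall rate for unseen keys are all assumptions, not details taken from the team's implementation.

```python
from collections import defaultdict

def rate_by_key(history, key):
    """Mean correct-first-attempt rate per value of `key` in the history rows."""
    totals = defaultdict(lambda: [0, 0])  # key value -> [sum of outcomes, row count]
    for row in history:
        acc = totals[row[key]]
        acc[0] += row["cfa"]
        acc[1] += 1
    return {k: s / n for k, (s, n) in totals.items()}

def add_history_features(history, modeling):
    """Attach history-derived rates to each modeling row (hypothetical feature names)."""
    overall = sum(r["cfa"] for r in history) / len(history)
    student_rate = rate_by_key(history, "student")
    step_rate = rate_by_key(history, "step")
    out = []
    for row in modeling:
        feat = dict(row)
        # Assumption: keys never seen in the history fall back to the overall rate.
        feat["student_hist_rate"] = student_rate.get(row["student"], overall)
        feat["step_hist_rate"] = step_rate.get(row["step"], overall)
        out.append(feat)
    return out

# Toy data, invented for illustration.
history = [
    {"student": "s1", "step": "a", "cfa": 1},
    {"student": "s1", "step": "b", "cfa": 0},
    {"student": "s2", "step": "a", "cfa": 1},
    {"student": "s2", "step": "b", "cfa": 1},
]
modeling = [
    {"student": "s1", "step": "a", "cfa": 1},
    {"student": "s3", "step": "c", "cfa": 0},
]
features = add_history_features(history, modeling)
```

Computing the summaries only from the history split keeps the outcome of a modeling row out of its own features, which is presumably why a separate history set was held out.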
Details on feature selection:
Variable clustering, variance inflation factor (VIF) checks, and logistic regressions were used for feature selection.
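As an illustration of the VIF check, the sketch below uses the standard identity that the VIF of feature j equals the j-th diagonal entry of the inverse of the feature correlation matrix. It is plain Python rather than the SAS procedures the team used. The function names, the toy columns, and the cutoffs in the closing note are assumptions for illustration.

```python
import math

def correlation_matrix(columns):
    """Pearson correlation matrix for a list of equal-length feature columns."""
    n = len(columns[0])
    centered = []
    for col in columns:
        mean = sum(col) / n
        centered.append([v - mean for v in col])
    norms = [math.sqrt(sum(v * v for v in c)) for c in centered]
    k = len(columns)
    return [[sum(a * b for a, b in zip(centered[i], centered[j])) / (norms[i] * norms[j])
             for j in range(k)] for i in range(k)]

def invert(matrix):
    """Gauss-Jordan inverse with partial pivoting (adequate for small matrices)."""
    k = len(matrix)
    aug = [list(row) + [float(i == j) for j in range(k)] for i, row in enumerate(matrix)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(k):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]

def vif(columns):
    """VIF of feature j is the j-th diagonal entry of the inverse correlation matrix."""
    inv = invert(correlation_matrix(columns))
    return [inv[j][j] for j in range(len(columns))]

# Toy columns, invented for illustration: x2 is nearly a copy of x1, x3 is not.
x1 = [1, 2, 3, 4, 5, 6]
x2 = [1.1, 1.9, 3.2, 3.9, 5.1, 5.8]
x3 = [1, -1, -1, 1, -1, 1]
vifs = vif([x1, x2, x3])
independent = vif([[1, -1, 1, -1], [1, 1, -1, -1]])
```

A feature with a large VIF (cutoffs of 5 or 10 are common rules of thumb, not values stated by the team) is largely explained by the other features and is a candidate for removal.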
Details on latent factor discovery (techniques used, useful student/step features, how were the factors used, etc.):
More details on preprocessing:
Details on classification:
Details on model selection:
We used the random forest technique to select a stable model.
Scores shown in the table below are Cup scores, not leaderboard scores. The difference between the two is described on the Evaluation page.
A reader should also know from reading the fact sheet what the strength of the method is.
Please comment about the following:
We developed a 1-PL logit (Rasch) model to estimate student proficiency and item difficulty, since student ability and item difficulty drive the likelihood of a correct response for a particular student. These two measures were then treated as independent variables in our model-building exercise. Because of the large number of steps (700k) and computational limitations, we built the Rasch model on an aggregated response, defined as follows. At the aggregate level, the event rate was 85%. Hence, for each Unit a student attempted, the new response indicator was set to 1 if the student's average First Correct Attempt was >= 85%, and to 0 otherwise.
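The aggregation and the 1-PL fit described above can be sketched as follows. This is an illustrative Python sketch, not the team's SAS code. It assumes the Unit plays the role of the Rasch "item" and that the model is P(y = 1) = 1 / (1 + exp(-(theta_student - b_unit))). The gradient-ascent fit, the small L2 penalty (added so the toy example stays finite), and all function and variable names are assumptions.

```python
import math
from collections import defaultdict

def unit_responses(rows, threshold=0.85):
    """Collapse step-level rows (student, unit, cfa) to one 0/1 response per (student, unit)."""
    sums = defaultdict(lambda: [0, 0])  # (student, unit) -> [sum of cfa, row count]
    for student, unit, cfa in rows:
        acc = sums[(student, unit)]
        acc[0] += cfa
        acc[1] += 1
    # Response is 1 when the average first-attempt correctness reaches the threshold.
    return {k: int(s / n >= threshold) for k, (s, n) in sums.items()}

def fit_rasch(responses, iters=500, lr=0.1, l2=0.01):
    """Fit P(y=1) = sigmoid(theta[student] - b[unit]) by penalized gradient ascent."""
    theta = defaultdict(float)  # student proficiency
    b = defaultdict(float)      # unit difficulty
    for _ in range(iters):
        g_theta = defaultdict(float)
        g_b = defaultdict(float)
        for (s, u), y in responses.items():
            p = 1.0 / (1.0 + math.exp(-(theta[s] - b[u])))
            g_theta[s] += y - p   # d log-likelihood / d theta
            g_b[u] -= y - p       # d log-likelihood / d b
        for s in g_theta:
            theta[s] += lr * (g_theta[s] - l2 * theta[s])
        for u in g_b:
            b[u] += lr * (g_b[u] - l2 * b[u])
    return dict(theta), dict(b)

# Toy step-level data, invented for illustration.
rows = [
    ("s1", "u1", 1), ("s1", "u1", 1),
    ("s1", "u2", 1), ("s1", "u2", 1),
    ("s2", "u1", 1), ("s2", "u1", 1),
    ("s2", "u2", 1), ("s2", "u2", 0),
]
responses = unit_responses(rows)
theta, b = fit_rasch(responses)
```

The fitted `theta` and `b` values can then be joined back to the step-level rows as two additional predictors, in line with the use of proficiency and difficulty as independent variables.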
CART, Fixed Effects Logit model, Random-effects logit model, Neural Network and Support Vector Machine.
Details on the relevance of the KC models and latent factors:
Details on software implementation:
SAS, CART
Details on hardware implementation. Specify whether you provide a self contained-application or libraries.
Provide a URL for the code (if available):
Unlike typical data mining problems, this problem was challenging because of the clustered structure of the data at different levels. Most data encountered in industry do not have such complexities, which made working with this data both challenging and very interesting.
List references below.