Krishna Mehta EXL United States

Poonam Gandhi EXL

Abhigya Chetna EXL

Achal Gupta EXL

Aditya Mahajan EXL

Anuja Ghosh EXL

Aparna Viswanathan EXL

Deepak Chopra EXL

Enakshy Dutta EXL

Harshad Ranadive EXL

Nekhil Agrawal EXL

Nihit Mohan EXL

Nitant Kaushal EXL

Nivedita Dangwal EXL

Pritish Kumar EXL

Rajinder Negi EXL

Reena Aggarwal EXL

Ruchika Ruchika EXL

Sachin Dean EXL

Sassoon Kosian EXL

Shilpi Jain EXL

Sonmitra Mondal EXL

Sunayna Agarwal EXL

Tanu Mahajan EXL

Tushar Mishra EXL

Varun Aggarwal EXL

Varun Kapoor EXL

Provide a URL to a web page, technical memorandum, or a paper.

No response.

Provide a general summary with relevant background information: Where does the method come from? Is it novel? Name the prior art.

Logistic regression and segmentation analysis were the primary techniques used for model development. A Rasch model was used to capture the effects of student-level proficiency and step-level difficulty. A random forest was used for model stability.

Summarize the algorithms you used in a way that those skilled in the art should understand what to do. Profile of your methods as follows:

Please describe your data understanding efforts, and interesting observations:

Details on feature generation:

We divided the dataset into history, modeling, and in-sample validation subsets. We used the history subset to create summary variables at different levels. These features turned out to be highly correlated with the dependent variable.
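The history-based summary variables described above can be sketched as follows. This is a minimal illustration only, not the team's SAS code: the row layout, the field names (`student`, `step`, `correct`), the choice of student and step as the aggregation levels, and the fallback to the overall rate for unseen students or steps are all assumptions made for the example.

```python
from collections import defaultdict

def mean_rate_by(rows, key):
    """Average first-attempt correctness per distinct value of `key`."""
    totals = defaultdict(lambda: [0, 0])  # key value -> [sum of correct, row count]
    for row in rows:
        acc = totals[row[key]]
        acc[0] += row["correct"]
        acc[1] += 1
    return {k: s / n for k, (s, n) in totals.items()}

def add_history_features(history, modeling):
    """Attach summary rates computed on the history rows to each modeling row."""
    student_rate = mean_rate_by(history, "student")
    step_rate = mean_rate_by(history, "step")
    # Hypothetical fallback: use the overall history rate when a student or
    # step in the modeling rows never appears in the history rows.
    overall = sum(r["correct"] for r in history) / len(history)
    enriched = []
    for row in modeling:
        feat = dict(row)
        feat["student_hist_rate"] = student_rate.get(row["student"], overall)
        feat["step_hist_rate"] = step_rate.get(row["step"], overall)
        enriched.append(feat)
    return enriched

# Toy data, invented for illustration.
history = [
    {"student": "s1", "step": "a", "correct": 1},
    {"student": "s1", "step": "b", "correct": 0},
    {"student": "s2", "step": "a", "correct": 1},
    {"student": "s2", "step": "b", "correct": 1},
]
modeling = [
    {"student": "s1", "step": "a", "correct": 1},
    {"student": "s3", "step": "b", "correct": 0},
]
features = add_history_features(history, modeling)
```

Computing the summaries only on the history rows keeps the dependent variable of the modeling rows out of their own features, which is presumably the reason for the three-way split.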

Details on feature selection:

Variable clustering, variance inflation factor (VIF) checks, and logistic regression were used for feature selection.
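As an illustration of the variance inflation factor check, the sketch below computes VIFs as the diagonal of the inverse of the feature correlation matrix, which is equivalent to 1 / (1 - R^2) from regressing each feature on the others. It is written in plain Python for self-containment and is not the team's SAS procedure; the function names and the toy columns are hypothetical.

```python
import math

def correlation_matrix(columns):
    """Pearson correlation matrix for a list of equal-length numeric columns."""
    n = len(columns[0])
    scaled = []
    for col in columns:
        mean = sum(col) / n
        dev = [x - mean for x in col]
        norm = math.sqrt(sum(d * d for d in dev))
        scaled.append([d / norm for d in dev])
    return [[sum(a * b for a, b in zip(ci, cj)) for cj in scaled] for ci in scaled]

def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("correlation matrix is singular (perfect collinearity)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def variance_inflation_factors(columns):
    """VIF of each column: the diagonal of the inverse correlation matrix."""
    inv = invert(correlation_matrix(columns))
    return [inv[j][j] for j in range(len(columns))]

# Toy columns: x1 and x2 have correlation 0.8; x3 is uncorrelated with both.
x1 = [1.0, 2.0, 3.0, 4.0]
x2 = [1.0, 3.0, 2.0, 4.0]
x3 = [1.0, -1.0, -1.0, 1.0]
vifs = variance_inflation_factors([x1, x2, x3])
```

Here the first two VIFs are about 2.78 (that is, 1 / (1 - 0.8^2)) and the third is 1. In practice a feature whose VIF exceeds some chosen cutoff would be a candidate for removal; the fact sheet does not state which cutoff was used.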

Details on latent factor discovery (techniques used, useful student/step features, how were the factors used, etc.):

More details on preprocessing:

Details on classification:

Details on model selection:

We used a random forest technique to select a stable model.

Scores shown in the table below are Cup scores, not leaderboard scores. The difference between the two is described on the Evaluation page.

A reader should also know from reading the fact sheet what the strength of the method is.

Please comment about the following:

We developed a 1-PL logit (Rasch) model to estimate student proficiency and item difficulty, since student ability and item difficulty drive the likelihood of a correct response for a particular student. These two measures were then treated as independent variables in our model-building exercise. Because of the large number of steps (700k) and computational limitations, we developed the Rasch model using the following definition. At an aggregate level, the event rate was 85%. Hence, in constructing the response indicator for each unit a student attempted, the new response indicator was tagged as 1 if the student had an average First Correct Attempt >= 85%, and as 0 otherwise.
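The unit-level response indicator and the 1-PL response probability can be sketched as below. This is an illustrative reading of the description above, not the original implementation: the tuple layout `(student, unit, correct)` and the function names are assumptions, and treating the unit as the Rasch "item" is our interpretation of the text. Parameter estimation is not shown.

```python
import math
from collections import defaultdict

THRESHOLD = 0.85  # aggregate event rate stated in the text

def unit_response_indicators(rows, threshold=THRESHOLD):
    """Map (student, unit) -> 1 if the mean first-attempt correctness
    over that student's steps in the unit is >= threshold, else 0."""
    totals = defaultdict(lambda: [0, 0])  # (student, unit) -> [sum correct, count]
    for student, unit, correct in rows:
        acc = totals[(student, unit)]
        acc[0] += correct
        acc[1] += 1
    return {key: int(s / n >= threshold) for key, (s, n) in totals.items()}

def rasch_probability(ability, difficulty):
    """1-PL (Rasch) probability of a positive response:
    logistic function of ability minus difficulty."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

# Toy step-level rows, invented for illustration.
rows = (
    [("s1", "u1", 1)] * 6 + [("s1", "u1", 0)]  # 6/7 correct, above 0.85
    + [("s1", "u2", 1), ("s1", "u2", 0)]       # 1/2 correct, below 0.85
)
indicators = unit_response_indicators(rows)
p_equal = rasch_probability(0.0, 0.0)  # ability equals difficulty
```

Collapsing steps into one binary response per student and unit reduces the number of item parameters from roughly 700k steps to the number of units, which is what makes the Rasch fit computationally feasible in the setting described.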

CART, Fixed Effects Logit model, Random-effects logit model, Neural Network and Support Vector Machine.

Details on the relevance of the KC models and latent factors:

Details on software implementation:

SAS, CART

Details on hardware implementation. Specify whether you provide a self contained-application or libraries.

Provide a URL for the code (if available):

Unlike typical data mining problems, this problem was challenging given the clustered structure of the data at different levels. Most of the data dealt with in most industries do not have such complexities, and working with this data was challenging and very interesting.

List references below.