Ensemble Hybrid Logit Model


Team Leader

Krishna Mehta
EXL
United States

Team Members

Poonam Gandhi
EXL

Abhigya Chetna
EXL

Achal Gupta
EXL

Aditya Mahajan
EXL

Anuja Ghosh
EXL

Aparna Viswanathan
EXL

Deepak Chopra
EXL

Enakshy Dutta
EXL

Harshad Ranadive
EXL

Nekhil Agrawal
EXL

Nihit Mohan
EXL

Nitant Kaushal
EXL

Nivedita Dangwal
EXL

Pritish Kumar
EXL

Rajinder Negi
EXL

Reena Aggarwal
EXL

Ruchika Ruchika
EXL

Sachin Dean
EXL

Sassoon Kosian
EXL

Shilpi Jain
EXL

Sonmitra Mondal
EXL

Sunayna Agarwal
EXL

Tanu Mahajan
EXL

Tushar Mishra
EXL

Varun Aggarwal
EXL

Varun Kapoor
EXL

Overview

Supplementary online material

Provide a URL to a web page, technical memorandum, or a paper.

No response.

Background*

Provide a general summary with relevant background information: Where does the method come from? Is it novel? Name the prior art.

Logistic regression and segmentation analysis were the primary model-development techniques. A Rasch model was used to capture the effects of student-level proficiency and step-level difficulty, and the Random Forest method was leveraged for model stability.
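The Rasch (1-PL) model referenced above predicts P(correct) from the difference between a student's proficiency and an item's difficulty. The team's implementation was in SAS; the sketch below is only a minimal Python reconstruction of the idea, fitting both parameter vectors by gradient ascent on the Bernoulli log-likelihood (function names and the fitting procedure are illustrative assumptions, not the contest code).

```python
import numpy as np

def rasch_prob(theta, b):
    """P(correct) under the 1-PL (Rasch) model: depends only on
    student proficiency theta minus item difficulty b."""
    return 1.0 / (1.0 + np.exp(-(theta - b)))

def fit_rasch(students, items, y, n_students, n_items, lr=0.1, epochs=200):
    """Illustrative joint fit of proficiencies (theta) and difficulties (b)
    by gradient ascent on the log-likelihood of (student, item, response)
    triples."""
    theta = np.zeros(n_students)
    b = np.zeros(n_items)
    for _ in range(epochs):
        p = rasch_prob(theta[students], b[items])
        resid = y - p  # gradient of the log-likelihood w.r.t. the linear predictor
        theta += lr * np.bincount(students, weights=resid, minlength=n_students)
        b     -= lr * np.bincount(items, weights=resid, minlength=n_items)
        b     -= b.mean()  # centre difficulties for identifiability
    return theta, b
```

The fitted theta and b can then be fed into a downstream classifier as features, which is how the fact sheet describes using them.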

Method

Summarize the algorithms you used in a way that those skilled in the art should understand what to do. Provide a profile of your methods as follows:

Data exploration and understanding

Did you use data exploration techniques to

  • [not checked]  Identify selection biases
  • [checked]  Identify temporal effects (e.g. students getting better over time)
  • [checked]  Understand the variables
  • [checked]  Explore the usefulness of the KC models
  • [checked]  Understand the relationships between the different KC types

Please describe your data understanding efforts, and interesting observations:

No response.

Preprocessing

Feature generation

  • [checked]  Features designed to capture the step type (e.g. enter given, or ... )
  • [checked]  Features based on the textual step name
  • [checked]  Features designed to capture the KC type
  • [not checked]  Features based on the textual KC name
  • [checked]  Features derived from opportunity counts
  • [checked]  Features derived from the problem name
  • [checked]  Features based on student ID
  • [checked]  Other features

Details on feature generation:

We divided the dataset into history, modeling, and in-sample validation windows. The history window was used to create summary variables at different levels of aggregation; these features turned out to be highly correlated with the dependent variable.
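The history-window summary features described above can be sketched as follows. This is a toy pandas reconstruction, not the team's SAS code: the column names (`student`, `kc`, `correct`, `step_ix`) and the chronological split rule are assumptions standing in for the actual KDD Cup step log.

```python
import pandas as pd

# Toy stand-in for the step log (schema is illustrative).
log = pd.DataFrame({
    "student": ["s1", "s1", "s1", "s2", "s2", "s2"],
    "kc":      ["k1", "k1", "k2", "k1", "k2", "k2"],
    "correct": [1, 0, 1, 1, 1, 0],
    "step_ix": [1, 2, 3, 1, 2, 3],   # chronological order per student
})

# Split chronologically: early rows form the history window,
# later rows the modeling window.
history  = log[log["step_ix"] <= 2]
modeling = log[log["step_ix"] > 2]

# Summary variables computed on history only, then joined onto the
# modeling rows, so the modeling target never leaks into the features.
stu_rate = history.groupby("student")["correct"].mean().rename("stu_hist_rate")
kc_rate  = history.groupby("kc")["correct"].mean().rename("kc_hist_rate")
modeling = (modeling.join(stu_rate, on="student")
                    .join(kc_rate, on="kc"))
```

The same pattern extends to summaries at other levels (unit, problem, section), which is presumably what "at different levels" refers to.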

Feature selection

  • [checked]  Feature ranking with correlation or other criterion (specify below)
  • [not checked]  Filter method (other than feature ranking)
  • [not checked]  Wrapper with forward or backward selection (nested subset method)
  • [not checked]  Wrapper with intensive search (subsets not nested)
  • [not checked]  Embedded method
  • [checked]  Other method not listed above (specify below)

Details on feature selection:

Variable clustering, variance inflation factor (VIF) checks, and logistic regressions were used for feature selection.
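The VIF check mentioned above flags near-collinear features: each feature is regressed on the remaining ones and VIF = 1 / (1 − R²), with large values indicating redundancy. A minimal NumPy sketch (the team used SAS; this is only an illustration of the statistic):

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of X: regress each
    feature on the remaining ones and report 1 / (1 - R^2)."""
    X = np.asarray(X, dtype=float)
    out = []
    for j in range(X.shape[1]):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(len(y)), others])  # add intercept
        beta, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ beta
        r2 = 1.0 - resid.var() / y.var()
        out.append(1.0 / (1.0 - r2))
    return np.array(out)
```

A common rule of thumb is to drop or combine features whose VIF exceeds 5 or 10 before fitting the logistic regressions.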

Did you attempt to identify latent factors?

  • [not checked]  Cluster students
  • [not checked]  Cluster knowledge components
  • [checked]  Cluster steps
  • [not checked]  Latent feature discovery was performed jointly with learning

Details on latent factor discovery (techniques used, useful student/step features, how were the factors used, etc.):

No response.

Other preprocessing

  • [checked]  Filling missing values (for KC)
  • [checked]  Principal component analysis

More details on preprocessing:

No response.

Classification

Base classifier

  • [checked]  Decision tree, stub, or Random Forest
  • [not checked]  Linear classifier (Fisher's discriminant, SVM, linear regression)
  • [not checked]  Non-linear kernel method (SVM, kernel ridge regression, kernel logistic regression)
  • [not checked]  Naïve Bayes
  • [not checked]  Bayesian Network (other than Naïve Bayes)
  • [not checked]  Neural Network
  • [not checked]  Bayesian Neural Network
  • [not checked]  Nearest neighbors
  • [not checked]  Latent variable models (e.g. matrix factorization)
  • [not checked]  Neighborhood/correlation based collaborative filtering
  • [not checked]  Bayesian Knowledge Tracing
  • [not checked]  Additive Factor Model
  • [not checked]  Item Response Theory
  • [not checked]  Other classifier not listed above (specify below)

Loss Function

  • [not checked]  Hinge loss (like in SVM)
  • [not checked]  Square loss (like in ridge regression)
  • [checked]  Logistic loss or cross-entropy (like in logistic regression)
  • [checked]  Exponential loss (like in boosting)
  • [not checked]  None
  • [not checked]  Don't know
  • [not checked]  Other loss (specify below)

Regularizer

  • [not checked]  One-norm (sum of weight magnitudes, like in Lasso)
  • [not checked]  Two-norm (||w||^2, like in ridge regression and regular SVM)
  • [not checked]  Structured regularizer (like in group lasso)
  • [not checked]  None
  • [not checked]  Don't know
  • [not checked]  Other (specify below)

Ensemble Method

  • [checked]  Boosting
  • [checked]  Bagging (check this if you use Random Forest)
  • [checked]  Other ensemble method
  • [not checked]  None

Were you able to use information present only in the training set?

  • [checked]  Corrects, incorrects, hints
  • [checked]  Step start/end times

Did you use post-training calibration to obtain accurate probabilities?

  • [selected]  Yes
  • [not selected]  No

Did you make use of the development data sets for training?

  • [not selected]  Yes
  • [selected]  No

Details on classification:

No response.

Model selection/hyperparameter selection

  • [checked]  We used the online feedback of the leaderboard.
  • [not checked]  K-fold or leave-one-out cross-validation (using training data)
  • [not checked]  Virtual leave-one-out (closed-form estimation of LOO with a single classifier training)
  • [checked]  Out-of-bag estimation (for bagging methods)
  • [checked]  Bootstrap estimation (other than out-of-bag)
  • [not checked]  Other cross-validation method
  • [not checked]  Bayesian model selection
  • [not checked]  Penalty-based method (non-Bayesian)
  • [not checked]  Bi-level optimization
  • [not checked]  Other method not listed above (specify below)

Details on model selection:

We used the random forest technique to select a stable model.
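Since out-of-bag estimation is checked above, the stability check likely relied on OOB error: each tree is fit on a bootstrap sample and scored on the rows that sample left out, giving a built-in validation estimate without a separate hold-out set. A generic scikit-learn sketch of the idea (the team's actual tooling was SAS/CART; the data here is synthetic):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic binary-response data standing in for the step log.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
y = (X[:, 0] + 0.3 * rng.normal(size=500) > 0).astype(int)

# oob_score=True scores each observation using only the trees
# whose bootstrap sample excluded it.
rf = RandomForestClassifier(n_estimators=200, oob_score=True, random_state=0)
rf.fit(X, y)
print(rf.oob_score_)   # OOB accuracy estimate
```

Comparing OOB scores across candidate feature sets or hyperparameters gives a cheap model-selection signal on training data alone.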

Results

Final Team Submission

Scores shown in the table below are Cup scores, not leaderboard scores. The difference between the two is described on the Evaluation page.

A reader should also know from reading the fact sheet what the strength of the method is.

Please comment about the following:

Quantitative advantages (e.g., compact feature subset, simplicity, computational advantages).

No response.

Qualitative advantages (e.g. compute posterior probabilities, theoretically motivated, has some elements of novelty).

We developed a 1-PL logit (Rasch) model to estimate student proficiency and item difficulty, since these two quantities drive the likelihood of a correct response for a particular student. Both measures were then treated as independent variables in our model-building exercise. Because of the large number of steps (700k) and computational limitations, we developed the Rasch model on an aggregated response defined as follows: the overall event rate was 85%, so for each unit a student attempted, the new response indicator was set to 1 if the student's average Correct First Attempt on that unit was >= 85%, and 0 otherwise.
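The unit-level response construction described above can be sketched in a few lines of pandas. This is an illustrative reconstruction, not the contest code: the column names (`student`, `unit`, `cfa` for Correct First Attempt) are assumptions about the schema.

```python
import pandas as pd

# Toy step-level data: one row per (student, unit, step).
steps = pd.DataFrame({
    "student": ["s1", "s1", "s1", "s2", "s2"],
    "unit":    ["U1", "U1", "U2", "U1", "U1"],
    "cfa":     [1, 1, 0, 1, 0],   # Correct First Attempt per step
})

# Average first-attempt correctness per (student, unit) pair.
unit = steps.groupby(["student", "unit"], as_index=False)["cfa"].mean()

# Threshold at the 85% overall event rate: 1 if the student's average
# CFA on the unit is >= 0.85, else 0.
unit["response"] = (unit["cfa"] >= 0.85).astype(int)
```

The Rasch model is then fit on these (student, unit) binary responses instead of the 700k individual steps, which is what makes the estimation tractable.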

Other methods. List other methods you tried.

CART, Fixed Effects Logit model, Random-effects logit model, Neural Network and Support Vector Machine.

How helpful did you find the included KC models?

  • [selected]  Crucial in getting good predictions
  • [not selected]  Somewhat helpful in getting good predictions
  • [not selected]  Neutral
  • [not selected]  Not particularly helpful
  • [not selected]  Irrelevant

If you learned latent factors, how helpful were they?

  • [not selected]  Crucial in getting good predictions
  • [not selected]  Somewhat helpful in getting good predictions
  • [not selected]  Neutral
  • [not selected]  Not particularly helpful
  • [not selected]  Irrelevant

Details on the relevance of the KC models and latent factors:

No response.

Software Implementation

Availability

  • [not checked]  Proprietary in-house software
  • [not checked]  Commercially available in-house software
  • [not checked]  Freeware or shareware in-house software
  • [checked]  Off-the-shelf third party commercial software
  • [not checked]  Off-the-shelf third party freeware or shareware

Language

  • [not checked]  C/C++
  • [not checked]  Java
  • [not checked]  Matlab
  • [not checked]  Python/NumPy/SciPy
  • [checked]  Other (specify below)

Details on software implementation:

SAS, CART

Hardware implementation

Platform

  • [checked]  Windows
  • [not checked]  Linux or other Unix
  • [not checked]  Mac OS
  • [not checked]  Other (specify below)

Memory

  • [not selected]  <= 2 GB
  • [not selected]  <= 8 GB
  • [not selected]  >= 8 GB
  • [not selected]  >= 32 GB

Parallelism

  • [checked]  Multi-processor machine
  • [not checked]  Run in parallel different algorithms on different machines
  • [not checked]  Other (specify below)

Details on hardware implementation. Specify whether you provide a self contained-application or libraries.

No response.

Code URL

Provide a URL for the code (if available):

No response.

Competition Setup

From a performance point of view, the training set was

  • [selected]  Too big (could have achieved the same performance with significantly less data)
  • [not selected]  Too small (more data would have led to better performance)

From a computational point of view, the training set was

  • [selected]  Too big (imposed serious computational challenges, limited the types of methods that can be applied)
  • [not selected]  Adequate (the computational load was easy to handle)

Was the time constraint imposed by the challenge a difficulty or did you feel enough time to understand the data, prepare it, and train models?

  • [not selected]  Not enough time
  • [not selected]  Enough time
  • [selected]  It was enough time to do something decent, but there was a lot left to explore. With more time performance could have been significantly improved.

How likely are you to keep working on this problem?

  • [not selected]  It is my main research area.
  • [not selected]  It was a very interesting problem. I'll keep working on it.
  • [selected]  This data is a good fit for the data mining methods I am using/developing. I will use it in the future for empirical evaluation.
  • [not selected]  Maybe I'll try some ideas, but it is not high priority.
  • [not selected]  Not likely to keep working on it.

Comments on the problem (What aspects of the problem you found most interesting? Did it inspire you to develop new techniques?)

Unlike typical data mining problems, this one was challenging because of the clustered structure of the data at multiple levels. Data encountered in most industries do not have such complexities, so working with this dataset was challenging and very interesting.

References

List references below.

No response.