ASSISTments2017 Data Analysis¶
Data Description¶
Column Description¶
Field | Annotation |
---|---|
student id | a deidentified ID/tag used for identifying an individual student |
SY ASSISTments Usage | the academic years the student used ASSISTments |
AveKnow | average student knowledge level (according to the Bayesian Knowledge Tracing algorithm; cf. Corbett & Anderson, 1995) |
AveCarelessness | average student carelessness (according to the San Pedro, Baker, & Rodrigo, 2011 model) |
AveCorrect | average student correctness |
NumActions | total number of student actions in the system |
AveResBored | average student affect: boredom (see Pardos, Baker, San Pedro, Gowda, & Gowda, 2014) |
AveResEngcon | average student affect: engaged concentration (see Pardos, Baker, San Pedro, Gowda, & Gowda, 2014) |
AveResConf | average student affect: confusion (see Pardos, Baker, San Pedro, Gowda, & Gowda, 2014) |
AveResFrust | average student affect: frustration (see Pardos, Baker, San Pedro, Gowda, & Gowda, 2014) |
AveResOfftask | average student affect: off task (see Pardos, Baker, San Pedro, Gowda, & Gowda, 2014 and also Baker, 2007) |
AveResGaming | average student affect: gaming the system (see Pardos, Baker, San Pedro, Gowda, & Gowda, 2014 and also Baker, Corbett, Koedinger, & Wagner, 2004) |
actionId | the unique ID of this specific action |
skill | a tag used for identifying the cognitive skill related to the problem (see Razzaq, Heffernan, Feng, & Pardos, 2007) |
problemId | a unique ID used for identifying a single problem |
assignmentId | a unique ID used for identifying an assignment |
assistmentId | a unique ID used for identifying an assistment (an instance of a multi-part problem) |
startTime | when the student started the problem (UNIX time, seconds) |
endTime | when the student ended the problem (UNIX time, seconds) |
timeTaken | Time spent on the current step |
correct | Answer is correct |
original | Problem is an original problem, not a scaffolding problem |
hint | Action is a hint response |
hintCount | Total number of hints requested so far |
hintTotal | total number of hints requested for the problem |
scaffold | Problem is a scaffolding problem |
bottomHint | Bottom-out hint is used |
attemptCount | Total problems attempted in the tutor so far. |
problemType | the type of the problem |
frIsHelpRequest | First response is a help request |
frPast5HelpRequest | Number of last 5 first responses that included a help request |
frPast8HelpRequest | Number of last 8 first responses that included a help request |
stlHintUsed | Second-to-last hint is used, which indicates a hint that gives considerable detail but is not quite bottom-out |
past8BottomOut | Number of last 8 problems that used the bottom-out hint. |
totalFrPercentPastWrong | Percent of all past problems that were wrong on this KC. |
totalFrPastWrongCount | Total first-response wrong attempts in the tutor so far. |
frPast5WrongCount | Number of last 5 first responses that were wrong |
frPast8WrongCount | Number of last 8 first responses that were wrong |
totalFrTimeOnSkill | Total first response time spent on this KC across all problems |
timeSinceSkill | Time since the current KC was last seen. |
frWorkingInSchool | First response made while working during school hours (between 7:00 am and 3:00 pm) |
totalFrAttempted | Total first responses attempted in the tutor so far. |
totalFrSkillOpportunities | Total first response practice opportunities on this KC so far. |
responseIsFillIn | Response is filled in (no list of answers available) |
responseIsChosen | Response is chosen from a list of answers (multiple choice, etc.). |
endsWithScaffolding | Problem ends with scaffolding |
endsWithAutoScaffolding | Problem ends with automatic scaffolding |
frTimeTakenOnScaffolding | First response time taken on scaffolding problems |
frTotalSkillOpportunitiesScaffolding | Total first response practice opportunities on this skill so far |
totalFrSkillOpportunitiesByScaffolding | Total first response scaffolding opportunities for this KC so far |
frIsHelpRequestScaffolding | First response is a help request on a scaffolding problem |
timeGreater5Secprev2wrong | Long pauses after 2 consecutive wrong answers |
sumRight | NaN |
helpAccessUnder2Sec | Time spent on help was under 2 seconds |
timeGreater10SecAndNextActionRight | Long pause after correct answer |
consecutiveErrorsInRow | Total number of 2 wrong answers in a row across all the problems |
sumTime3SDWhen3RowRight | NaN |
sumTimePerSkill | NaN |
totalTimeByPercentCorrectForskill | Total time spent on this KC across all problems divided by percent correct for the same KC |
prev5count | NaN |
timeOver80 | NaN |
manywrong | NaN |
confidence(BORED) | the confidence of the student affect prediction: bored |
confidence(CONCENTRATING) | the confidence of the student affect prediction: concentrating |
confidence(CONFUSED) | the confidence of the student affect prediction: confused |
confidence(FRUSTRATED) | the confidence of the student affect prediction: frustrated |
confidence(OFF TASK) | the confidence of the student affect prediction: off task |
confidence(GAMING) | the confidence of the student affect prediction: gaming |
RES_BORED | rescaled confidence of the student affect prediction: boredom |
RES_CONCENTRATING | rescaled confidence of the student affect prediction: concentration |
RES_CONFUSED | rescaled confidence of the student affect prediction: confusion |
RES_FRUSTRATED | rescaled confidence of the student affect prediction: frustration |
RES_OFFTASK | rescaled confidence of the student affect prediction: off task |
RES_GAMING | rescaled confidence of the student affect prediction: gaming |
Ln-1 | Bayesian Knowledge Tracing’s knowledge estimate at the previous time step |
Ln | Bayesian Knowledge Tracing’s knowledge estimate at the current time step |
schoolID | the ID (anonymized) of the school the student was in during the year the data was collected |
MCAS | Massachusetts Comprehensive Assessment System test score. In short, this number is the student’s state test score (outside ASSISTments) during that year. A value of -999 indicates that the score is missing |
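Because missing MCAS scores are stored as -999 rather than as NaN, summary statistics computed on the raw column are pulled toward the sentinel. The following is a minimal sketch of one way to handle this, assuming a small made-up frame in place of the real data; the `MCAS_clean` column name is hypothetical.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the real data; the values are made up for illustration.
toy = pd.DataFrame({
    "studentId": [1, 1, 2, 3],
    "MCAS": [45, 45, -999, 23],
})

# Replace the -999 sentinel with NaN so pandas excludes it from statistics.
# "MCAS_clean" is a hypothetical column name, not part of the dataset.
toy["MCAS_clean"] = toy["MCAS"].replace(-999, np.nan)

raw_mean = toy["MCAS"].mean()          # distorted by the sentinel
clean_mean = toy["MCAS_clean"].mean()  # NaN values are skipped by default
print(raw_mean, clean_mean)
```

Whether to drop, mask, or impute the missing scores depends on the analysis; masking with NaN is only the simplest option.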
[1]:
import numpy as np
import pandas as pd
import plotly.express as px
from plotly.subplots import make_subplots
import plotly.graph_objs as go
[70]:
path = "anonymized_full_release_competition_dataset.csv"
data = pd.read_csv(path, encoding = "ISO-8859-15",low_memory=False)
[36]:
pd.set_option('display.max_columns', 500)
data.head()
[36]:
studentId | MiddleSchoolId | InferredGender | SY ASSISTments Usage | AveKnow | AveCarelessness | AveCorrect | NumActions | AveResBored | AveResEngcon | AveResConf | AveResFrust | AveResOfftask | AveResGaming | action_num | skill | problemId | problemType | assignmentId | assistmentId | startTime | endTime | timeTaken | correct | original | hint | hintCount | hintTotal | scaffold | bottomHint | attemptCount | frIsHelpRequest | frPast5HelpRequest | frPast8HelpRequest | stlHintUsed | past8BottomOut | totalFrPercentPastWrong | totalFrPastWrongCount | frPast5WrongCount | frPast8WrongCount | totalFrTimeOnSkill | timeSinceSkill | frWorkingInSchool | totalFrAttempted | totalFrSkillOpportunities | responseIsFillIn | responseIsChosen | endsWithScaffolding | endsWithAutoScaffolding | frTimeTakenOnScaffolding | frTotalSkillOpportunitiesScaffolding | totalFrSkillOpportunitiesByScaffolding | frIsHelpRequestScaffolding | timeGreater5Secprev2wrong | sumRight | helpAccessUnder2Sec | timeGreater10SecAndNextActionRight | consecutiveErrorsInRow | sumTime3SDWhen3RowRight | sumTimePerSkill | totalTimeByPercentCorrectForskill | Prev5count | timeOver80 | manywrong | confidence(BORED) | confidence(CONCENTRATING) | confidence(CONFUSED) | confidence(FRUSTRATED) | confidence(OFF TASK) | confidence(GAMING) | RES_BORED | RES_CONCENTRATING | RES_CONFUSED | RES_FRUSTRATED | RES_OFFTASK | RES_GAMING | Ln-1 | Ln | MCAS | Enrolled | Selective | isSTEM | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 8 | 2 | Male | 2004-2005 | 0.352416 | 0.183276 | 0.483902 | 1056 | 0.208389 | 0.679126 | 0.115905 | 0.112408 | 0.156503 | 0.196561 | 9950 | properties-of-geometric-figures | 1118 | textfieldquestion | 20405010 | 104051118 | 1096470301 | 1096470350 | 49.0 | 0 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0.0 | 0 | 0 | 0 | 0.0 | 0.0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0 | 0 | 0.0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0 | 49.0 | 0.000000 | 0 | 0 | 0 | 0.597865 | 0.234294 | 0.0000 | 0.0 | 0.838710 | 0.008522 | 0.376427 | 0.320317 | 0.000000 | 0.0 | 0.785585 | 0.000264 | 0.13 | 0.061190409 | 45 | 0 | 0 | NaN |
1 | 8 | 2 | Male | 2004-2005 | 0.352416 | 0.183276 | 0.483902 | 1056 | 0.208389 | 0.679126 | 0.115905 | 0.112408 | 0.156503 | 0.196561 | 9951 | properties-of-geometric-figures | 1119 | noprobtype | 20405010 | 104051119 | 1096470350 | 1096470354 | 4.0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 1 | 1 | 1 | 0 | 0 | 0.0 | 0 | 0 | 0 | 49.0 | 0.0 | 1 | 1 | 1 | 0 | 0 | 1 | 0 | 4.0 | 0 | 0.0 | 1 | 0 | 1 | 0 | 1 | 0 | 0.0 | 53.0 | 106.000000 | 1 | 0 | 0 | 0.355694 | 0.992585 | 0.9375 | 0.0 | 0.600000 | 0.047821 | 0.156027 | 0.995053 | 0.887452 | 0.0 | 0.468252 | 0.001483 | 0.061190409 | 0.213509945 | 45 | 0 | 0 | NaN |
2 | 8 | 2 | Male | 2004-2005 | 0.352416 | 0.183276 | 0.483902 | 1056 | 0.208389 | 0.679126 | 0.115905 | 0.112408 | 0.156503 | 0.196561 | 9952 | sum-of-interior-angles-more-than-3-sides | 1120 | noprobtype | 20405010 | 104051120 | 1096470354 | 1096470360 | 6.0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0.0 | 0 | 0 | 0 | 0.0 | 0.0 | 1 | 2 | 0 | 0 | 0 | 0 | 0 | 6.0 | 0 | 0.0 | 0 | 0 | 1 | 0 | 0 | 0 | 0.0 | 6.0 | 0.000000 | 2 | 0 | 0 | 0.355694 | 0.992585 | 0.9375 | 0.0 | 0.600000 | 0.047821 | 0.156027 | 0.995053 | 0.887452 | 0.0 | 0.468252 | 0.001483 | 0.116 | 0.033305768 | 45 | 0 | 0 | NaN |
3 | 8 | 2 | Male | 2004-2005 | 0.352416 | 0.183276 | 0.483902 | 1056 | 0.208389 | 0.679126 | 0.115905 | 0.112408 | 0.156503 | 0.196561 | 9953 | sum-of-interior-angles-more-than-3-sides | 1120 | noprobtype | 20405010 | 104051120 | 1096470360 | 1096470378 | 18.0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0.0 | 1 | 0 | 0 | 0.0 | 0.0 | 1 | 3 | 1 | 0 | 0 | 0 | 0 | 6.0 | 1 | 0.0 | 0 | 0 | 1 | 0 | 0 | 1 | 0.0 | 24.0 | 0.000000 | 3 | 0 | 0 | 0.355694 | 0.617065 | 0.0000 | 0.0 | 0.204082 | 0.343996 | 0.156027 | 0.744520 | 0.000000 | 0.0 | 0.108417 | 0.010665 | 0.116 | 0.033305768 | 45 | 0 | 0 | NaN |
4 | 8 | 2 | Male | 2004-2005 | 0.352416 | 0.183276 | 0.483902 | 1056 | 0.208389 | 0.679126 | 0.115905 | 0.112408 | 0.156503 | 0.196561 | 9954 | sum-of-interior-angles-more-than-3-sides | 1121 | noprobtype | 20405010 | 104051121 | 1096470378 | 1096470380 | 2.0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1.0 | 1 | 1 | 1 | 6.0 | 0.0 | 1 | 3 | 1 | 0 | 0 | 0 | 0 | 2.0 | 1 | 1.0 | 0 | 0 | 2 | 0 | 1 | 0 | 0.0 | 26.0 | 77.999999 | 4 | 0 | 1 | 0.355694 | 0.617065 | 0.0000 | 0.0 | 0.204082 | 0.343996 | 0.156027 | 0.744520 | 0.000000 | 0.0 | 0.108417 | 0.010665 | 0.033305768 | 0.118385889 | 45 | 0 | 0 | NaN |
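The `startTime` and `endTime` columns are UNIX timestamps in seconds, so they can be converted to readable datetimes with `pd.to_datetime`. The sketch below uses the first two rows of the preview above; the `start_dt`, `end_dt`, and `elapsed` column names are hypothetical, and the result is timezone-naive UTC because the source does not state a time zone.

```python
import pandas as pd

# startTime / endTime / timeTaken copied from the first two preview rows.
toy = pd.DataFrame({
    "startTime": [1096470301, 1096470350],
    "endTime": [1096470350, 1096470354],
    "timeTaken": [49.0, 4.0],
})

# unit="s" tells pandas the integers are seconds since the UNIX epoch.
toy["start_dt"] = pd.to_datetime(toy["startTime"], unit="s")
toy["end_dt"] = pd.to_datetime(toy["endTime"], unit="s")

# For these two rows the difference matches the timeTaken column.
toy["elapsed"] = toy["endTime"] - toy["startTime"]
print(toy[["start_dt", "end_dt", "elapsed", "timeTaken"]])
```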
General features¶
[23]:
data.describe()
[23]:
studentId | MiddleSchoolId | AveKnow | AveCarelessness | AveCorrect | NumActions | AveResBored | AveResEngcon | AveResConf | AveResFrust | AveResOfftask | AveResGaming | action_num | problemId | assignmentId | assistmentId | startTime | endTime | timeTaken | correct | original | hint | hintCount | hintTotal | scaffold | bottomHint | attemptCount | frIsHelpRequest | frPast5HelpRequest | frPast8HelpRequest | stlHintUsed | past8BottomOut | totalFrPercentPastWrong | totalFrPastWrongCount | frPast5WrongCount | frPast8WrongCount | totalFrTimeOnSkill | timeSinceSkill | frWorkingInSchool | totalFrAttempted | totalFrSkillOpportunities | responseIsFillIn | responseIsChosen | endsWithScaffolding | endsWithAutoScaffolding | frTimeTakenOnScaffolding | frTotalSkillOpportunitiesScaffolding | totalFrSkillOpportunitiesByScaffolding | frIsHelpRequestScaffolding | timeGreater5Secprev2wrong | sumRight | helpAccessUnder2Sec | timeGreater10SecAndNextActionRight | consecutiveErrorsInRow | sumTime3SDWhen3RowRight | sumTimePerSkill | totalTimeByPercentCorrectForskill | Prev5count | timeOver80 | manywrong | confidence(BORED) | confidence(CONCENTRATING) | confidence(CONFUSED) | confidence(FRUSTRATED) | confidence(OFF TASK) | confidence(GAMING) | RES_BORED | RES_CONCENTRATING | RES_CONFUSED | RES_FRUSTRATED | RES_OFFTASK | RES_GAMING | MCAS | Enrolled | Selective | isSTEM | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
count | 942816.000000 | 942816.000000 | 942816.000000 | 942816.000000 | 942816.000000 | 942816.000000 | 942816.000000 | 942816.000000 | 942816.000000 | 942816.000000 | 942816.000000 | 942816.000000 | 9.428160e+05 | 942816.000000 | 9.428160e+05 | 9.428160e+05 | 9.428160e+05 | 9.428160e+05 | 942816.000000 | 942816.000000 | 942816.000000 | 942816.000000 | 942816.000000 | 942816.000000 | 942816.000000 | 942816.000000 | 942816.000000 | 942816.000000 | 942816.000000 | 942816.000000 | 942816.000000 | 942816.000000 | 942816.000000 | 942816.000000 | 942816.000000 | 942816.000000 | 942816.000000 | 9.428160e+05 | 942816.000000 | 942816.000000 | 942816.000000 | 942816.000000 | 942816.0 | 942816.000000 | 942816.000000 | 942816.000000 | 942816.000000 | 942816.000000 | 942816.000000 | 942816.000000 | 942816.000000 | 942816.000000 | 942816.000000 | 942816.000000 | 942731.000000 | 942816.000000 | 942816.000000 | 942816.000000 | 942816.000000 | 942816.000000 | 942816.000000 | 9.428160e+05 | 942816.000000 | 942816.000000 | 942816.000000 | 942816.000000 | 942816.000000 | 9.428160e+05 | 942816.000000 | 942816.000000 | 942816.000000 | 942816.000000 | 942816.000000 | 942816.000000 | 942816.000000 | 316974.000000 |
mean | 3844.844105 | 2.515472 | 0.195155 | 0.109436 | 0.372681 | 869.850594 | 0.232949 | 0.658442 | 0.098940 | 0.131406 | 0.172212 | 0.192703 | 1.849329e+06 | 1899.719319 | 1.198773e+07 | 6.061572e+07 | 1.120793e+09 | 1.120793e+09 | 29.747869 | 0.372681 | 0.264214 | 0.331025 | 1.218490 | 1.953967 | 0.385732 | 0.062794 | 2.673605 | 0.268104 | 1.947322 | 2.575323 | 0.004012 | 0.241678 | 0.227882 | 1.988008 | 0.719380 | 0.944750 | 376.213405 | 4.850802e+05 | 0.974588 | 193.316005 | 8.381019 | 0.023131 | 0.0 | 0.636085 | 0.005724 | 24.060853 | 3.989371 | 1.036189 | 0.670047 | 0.045388 | 145.982059 | 0.055674 | 0.207414 | 0.155307 | 0.087042 | 601.665586 | 2166.143744 | 4.972689 | 0.081323 | 0.715638 | 0.436958 | 5.400894e-01 | 0.134450 | 0.164114 | 0.256006 | 0.337888 | 0.232949 | 6.584415e-01 | 0.098940 | 0.131406 | 0.172212 | 0.192703 | -95.982302 | 0.641147 | 0.300434 | 0.204178 |
std | 2250.484065 | 1.039785 | 0.116451 | 0.059952 | 0.107367 | 530.210725 | 0.030637 | 0.027440 | 0.034788 | 0.038875 | 0.057992 | 0.153455 | 1.726001e+06 | 2579.212724 | 1.434706e+07 | 5.128829e+07 | 1.940359e+07 | 1.940354e+07 | 72.019768 | 0.483519 | 0.440914 | 0.470582 | 1.980665 | 2.929242 | 0.486768 | 0.242592 | 2.929801 | 0.442972 | 1.712580 | 2.457799 | 0.063217 | 0.674613 | 0.271404 | 3.390149 | 0.832699 | 1.076276 | 689.302924 | 2.075599e+06 | 0.157373 | 164.898869 | 11.998292 | 0.150319 | 0.0 | 0.481125 | 0.075443 | 71.665655 | 6.581897 | 1.184489 | 0.470196 | 0.208155 | 124.342503 | 0.229291 | 0.405455 | 0.885692 | 1.619202 | 953.900687 | 4601.435964 | 0.315281 | 0.273331 | 0.451110 | 0.120751 | 1.830483e-01 | 0.292877 | 0.326057 | 0.213177 | 0.335292 | 0.116371 | 1.734275e-01 | 0.249505 | 0.300351 | 0.216997 | 0.340232 | 332.827628 | 0.479664 | 0.458447 | 0.403100 |
min | 8.000000 | 1.000000 | 0.028057 | 0.007801 | 0.000000 | 2.000000 | 0.170871 | 0.403309 | 0.005075 | 0.000000 | 0.083167 | 0.001974 | 9.950000e+03 | 1.000000 | 2.000000e+00 | 5.000000e+00 | 1.095421e+09 | 1.095421e+09 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 1.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | -9.928014e+06 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.0 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | -11.332080 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.355694 | 6.500000e-07 | 0.000000 | 0.000000 | 0.000000 | 0.000039 | 0.156027 | 8.890000e-07 | 0.000000 | 0.000000 | 0.000000 | 0.000001 | -999.000000 | 0.000000 | 0.000000 | 0.000000 |
25% | 1952.000000 | 2.000000 | 0.110542 | 0.068760 | 0.294989 | 478.000000 | 0.209035 | 0.642060 | 0.076385 | 0.107278 | 0.131467 | 0.060724 | 7.221708e+05 | 721.000000 | 7.230000e+02 | 2.213000e+03 | 1.103136e+09 | 1.103136e+09 | 5.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 1.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 37.000000 | 0.000000e+00 | 1.000000 | 67.000000 | 2.000000 | 0.000000 | 0.0 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 50.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 107.000000 | 189.736571 | 5.000000 | 0.000000 | 0.000000 | 0.355694 | 3.743169e-01 | 0.000000 | 0.000000 | 0.090909 | 0.047821 | 0.156027 | 5.117519e-01 | 0.000000 | 0.000000 | 0.048295 | 0.001483 | 14.000000 | 0.000000 | 0.000000 | 0.000000 |
50% | 3766.000000 | 2.000000 | 0.159285 | 0.094513 | 0.345575 | 754.000000 | 0.230394 | 0.660669 | 0.096357 | 0.127504 | 0.159598 | 0.156245 | 9.578745e+05 | 1116.000000 | 2.040501e+07 | 1.040504e+08 | 1.112980e+09 | 1.112980e+09 | 11.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 1.000000 | 0.000000 | 0.000000 | 2.000000 | 0.000000 | 2.000000 | 2.000000 | 0.000000 | 0.000000 | 0.166667 | 1.000000 | 1.000000 | 1.000000 | 140.000000 | 0.000000e+00 | 1.000000 | 151.000000 | 4.000000 | 0.000000 | 0.0 | 1.000000 | 0.000000 | 8.000000 | 2.000000 | 1.176471 | 1.000000 | 0.000000 | 113.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 275.000000 | 840.000006 | 5.000000 | 0.000000 | 1.000000 | 0.355694 | 5.676439e-01 | 0.000000 | 0.000000 | 0.230769 | 0.186970 | 0.156027 | 7.115475e-01 | 0.000000 | 0.000000 | 0.122595 | 0.005797 | 23.000000 | 1.000000 | 0.000000 | 0.000000 |
75% | 5781.000000 | 4.000000 | 0.247704 | 0.137316 | 0.428822 | 1151.000000 | 0.252082 | 0.676588 | 0.119282 | 0.150582 | 0.198533 | 0.298914 | 2.722813e+06 | 1419.000000 | 2.040510e+07 | 1.040511e+08 | 1.138371e+09 | 1.138371e+09 | 30.000000 | 1.000000 | 1.000000 | 1.000000 | 2.000000 | 3.000000 | 1.000000 | 0.000000 | 3.000000 | 1.000000 | 3.000000 | 4.000000 | 0.000000 | 0.000000 | 0.333333 | 2.000000 | 1.000000 | 1.000000 | 396.000000 | 8.000000e+00 | 1.000000 | 276.000000 | 10.000000 | 0.000000 | 0.0 | 1.000000 | 0.000000 | 23.000000 | 5.000000 | 1.656250 | 1.000000 | 0.000000 | 210.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 669.000001 | 2404.499998 | 5.000000 | 0.000000 | 1.000000 | 0.597865 | 6.591692e-01 | 0.000000 | 0.091463 | 0.230769 | 0.614582 | 0.376427 | 7.726099e-01 | 0.000000 | 0.009561 | 0.122595 | 0.259648 | 34.000000 | 1.000000 | 1.000000 | 0.000000 |
max | 7783.000000 | 4.000000 | 0.752498 | 0.430576 | 0.932990 | 3057.000000 | 0.440870 | 0.723990 | 0.402483 | 0.543463 | 0.837402 | 0.709200 | 6.355811e+06 | 22761.000000 | 1.000000e+09 | 1.040515e+08 | 1.180218e+09 | 1.180218e+09 | 9999.000000 | 1.000000 | 1.000000 | 1.000000 | 56.000000 | 56.000000 | 1.000000 | 1.000000 | 91.000000 | 1.000000 | 5.000000 | 8.000000 | 1.000000 | 8.000000 | 1.000000 | 73.000000 | 5.000000 | 8.000000 | 11663.000000 | 4.840000e+07 | 1.000000 | 1378.000000 | 221.000000 | 1.000000 | 0.0 | 1.000000 | 1.000000 | 9999.000000 | 105.000000 | 37.000000 | 1.000000 | 1.000000 | 962.000000 | 1.000000 | 1.000000 | 56.000000 | 92.709045 | 12459.000000 | 310590.000100 | 5.000000 | 1.000000 | 1.000000 | 0.680982 | 1.000000e+00 | 1.000000 | 1.000000 | 1.000000 | 0.999676 | 0.505313 | 1.000000e+00 | 1.000000 | 1.000000 | 1.000000 | 0.999377 | 54.000000 | 1.000000 | 1.000000 | 1.000000 |
[40]:
print("The number of records: " + str(len(data['action_num'].unique())))
#or use print(data['action_num'].count())
The number of records: 942816
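Counting unique `action_num` values gives the number of records only if `action_num` never repeats. Here the figure agrees with the row count reported by `describe()`, but the assumption can be checked directly. A minimal sketch on made-up values, with one deliberate duplicate:

```python
import pandas as pd

# Made-up action numbers; the last value is duplicated on purpose.
toy = pd.DataFrame({"action_num": [9950, 9951, 9952, 9952]})

n_rows = len(toy)                                   # number of rows
n_unique = toy["action_num"].nunique()              # number of distinct IDs
has_duplicates = not toy["action_num"].is_unique    # True if any ID repeats

print(n_rows, n_unique, has_duplicates)
```

When `n_rows` and `n_unique` differ, `len(data)` is the safer record count.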
[37]:
print('Fraction of missing values for every column')
print(data.isnull().sum() / len(data))
print(data.count())  # non-null values per column
Fraction of missing values for every column
studentId 0.000000
MiddleSchoolId 0.000000
InferredGender 0.184189
SY ASSISTments Usage 0.000000
AveKnow 0.000000
...
Ln 0.000000
MCAS 0.000000
Enrolled 0.000000
Selective 0.000000
isSTEM 0.663801
Length: 82, dtype: float64
studentId 942816
MiddleSchoolId 942816
InferredGender 769160
SY ASSISTments Usage 942816
AveKnow 942816
...
Ln 942816
MCAS 942816
Enrolled 942816
Selective 942816
isSTEM 316974
Length: 82, dtype: int64
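With 82 columns the printed output is truncated, so columns with missing values can be hidden behind the `...`. One way around this is to keep only the columns that actually contain NaN. A minimal sketch on a made-up frame that reuses two column names from the dataset:

```python
import numpy as np
import pandas as pd

# Toy frame; the values are made up for illustration.
toy = pd.DataFrame({
    "studentId": [8, 8, 35, 35],
    "InferredGender": ["Male", "Male", None, None],
    "isSTEM": [np.nan, np.nan, 1.0, np.nan],
})

# isnull().mean() gives the fraction of missing values per column.
missing = toy.isnull().mean()

# Keep only columns with at least one missing value, largest fraction first.
missing = missing[missing > 0].sort_values(ascending=False)
print(missing)
```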
[28]:
len(data.studentId.unique())
[28]:
1709
[29]:
len(data.MiddleSchoolId.unique())
[29]:
4
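The two counts above can be combined to see how students are spread across schools. The sketch below assumes a made-up frame with the same `studentId` and `MiddleSchoolId` column names; `Series.nunique()` is an equivalent, slightly shorter alternative to `len(... .unique())`.

```python
import pandas as pd

# Toy frame; IDs are made up for illustration.
toy = pd.DataFrame({
    "studentId": [8, 8, 35, 39, 39, 64],
    "MiddleSchoolId": [2, 2, 2, 1, 1, 4],
})

# Number of distinct students overall.
n_students = toy["studentId"].nunique()

# Number of distinct students in each school.
students_per_school = toy.groupby("MiddleSchoolId")["studentId"].nunique()
print(n_students)
print(students_per_school)
```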
Sort by student id¶
[86]:
ds = data['studentId'].value_counts().reset_index()  # value_counts() puts studentId in the index, so reset it
ds.columns = [
    'studentId',
    'count'
]
ds['studentId'] = ds['studentId'].astype(str) + '-'  # cast to str; otherwise the y-axis is rendered incorrectly
ds = ds.sort_values(['count']).tail(40)
fig = px.bar(
    ds,
    x = 'count',
    y = 'studentId',
    orientation='h',
    title='Top 40 students by number of actions'
)
fig.show("svg")
[58]:
ds = data['studentId'].value_counts().reset_index()
ds.columns = [
    'studentId',
    'count'
]
# Order rows by studentId before plotting
ds = ds.sort_values('studentId')
fig = px.histogram(
    ds,
    x = 'studentId',
    y = 'count',
    title = 'User action distribution'
)
fig.show("svg")
Sort by MiddleSchoolId¶
[92]:
ds = data['MiddleSchoolId'].value_counts().reset_index()
ds.columns = [
    'MiddleSchoolId',
    'percent'
]
ds['percent'] /= len(data)
ds = ds.sort_values(['percent'])
fig = px.pie(
    ds,
    names = 'MiddleSchoolId',
    values = 'percent',
    title = 'Percent of schools',
)
fig.show("svg")
Sort by correct answers¶
[93]:
ds = data['correct'].value_counts().reset_index()
ds.columns = [
    'correct',
    'percent'
]
ds['percent'] /= len(data)
ds = ds.sort_values(['percent'])
fig = px.pie(
    ds,
    names = 'correct',  # take labels from the column so they follow the sorted rows
    values = 'percent',
    title = 'Percent of correct answers'
)
fig.show("svg")
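The share of each value can also be computed in one step with `value_counts(normalize=True)`, which avoids dividing by `len(data)` by hand. This is a minimal sketch on made-up answers, not the notebook's original code; naming the output column through `reset_index(name=...)` keeps the result independent of how a given pandas version labels the counts.

```python
import pandas as pd

# Made-up correctness flags: three wrong answers and two right ones.
toy = pd.DataFrame({"correct": [0, 0, 1, 0, 1]})

shares = (
    toy["correct"]
    .value_counts(normalize=True)   # fractions instead of raw counts
    .rename_axis("correct")         # name the index so it becomes a column
    .reset_index(name="percent")    # name the value column explicitly
)
print(shares)
```

The resulting frame has one row per value, so it can be passed to `px.pie` with `names='correct'` and `values='percent'`.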
Sort by problem id¶
[94]:
ds = data['problemId'].value_counts().reset_index()
ds.columns = [
    'problemId',
    'count'
]
ds['problemId'] = ds['problemId'].astype(str) + '-'
ds = ds.sort_values(['count']).tail(40)
fig = px.bar(
    ds,
    x = 'count',
    y = 'problemId',
    orientation = 'h',
    title = 'Top 40 most frequent problem IDs'
)
fig.show("svg")
[90]:
ds = data['problemId'].value_counts().reset_index()
ds.columns = [
    'problemId',
    'count'
]
ds = ds.sort_values('problemId')
fig = px.histogram(
    ds,
    x='problemId',
    y='count',
    title='Problem ID action distribution'
)
fig.show("svg")
[83]:
ds = data['problemType'].value_counts().reset_index()
ds.columns = [
    'problemType',
    'percent'
]
ds['percent'] /= len(data)
ds = ds.sort_values(['percent'])
fig = px.pie(
    ds,
    names = 'problemType',
    values = 'percent',
    title = 'Percent of problem types',
)
fig.show("svg")
[85]:
ds = ds.sort_values(['percent']).tail(6)  # the 6 most frequent problem types
fig = make_subplots(rows=3, cols=2)
traces = [
    go.Bar(
        x = ['wrong', 'right'],
        y = [
            len(data[(data['problemType'] == item) & (data['correct'] == 0)]),
            len(data[(data['problemType'] == item) & (data['correct'] == 1)])
        ],
        name = 'Type: ' + str(item),
        text = [
            str(round(100*len(data[(data['problemType'] == item)&(data['correct'] == 0)])/len(data[data['problemType'] == item]),2)) + '%',
            str(round(100*len(data[(data['problemType'] == item)&(data['correct'] == 1)])/len(data[data['problemType'] == item]),2)) + '%'
        ],
        textposition = 'auto'
    ) for item in ds['problemType'].unique().tolist()
]
# Place the traces on a 3x2 grid, filling each row left to right
for i in range(len(traces)):
    fig.append_trace(
        traces[i],
        (i // 2) + 1,
        (i % 2) + 1
    )
fig.update_layout(
    title_text = 'Percent of correct answers for top 6 problem types',
)
fig.show("svg")
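The per-type percentages shown on the bars can also be obtained in a single pass with `pd.crosstab`, instead of filtering the full frame several times per problem type. A minimal sketch on made-up rows, offered as an alternative rather than as the notebook's method:

```python
import pandas as pd

# Toy rows; the problem type names come from the preview, the answers are made up.
toy = pd.DataFrame({
    "problemType": ["textfieldquestion", "noprobtype", "noprobtype",
                    "noprobtype", "textfieldquestion"],
    "correct": [0, 1, 0, 1, 1],
})

# normalize="index" divides each row by its total, so each problem type sums to 1.
rates = pd.crosstab(toy["problemType"], toy["correct"], normalize="index") * 100
print(rates.round(2))
```

Each row of `rates` holds the percentage of wrong (column 0) and right (column 1) answers for one problem type.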
Sort by skills¶
[67]:
ds = data['skill'].dropna()  # ignore rows without a skill tag
ds = ds.value_counts().reset_index()
ds.columns = [
    'skill',
    'count'
]
ds['skill'] = ds['skill'].astype(str) + '-'
ds = ds.sort_values(['count']).tail(40)
fig = px.bar(
    ds,
    x = 'count',
    y = 'skill',
    orientation = 'h',
    title = 'Top 40 most frequent skills'
)
fig.show("svg")