ASSISTments2017 Data Analysis

Data Description

Column Description

Field

Annotation

student id

a deidentified ID/tag used for identifying an individual student

SY ASSISTments Usage

the academic years the student used ASSISTments

AveKnow

average student knowledge level (according to the Bayesian Knowledge Tracing algorithm; cf. Corbett & Anderson, 1995)

AveCarelessness

average student carelessness (according to the model of San Pedro, Baker, & Rodrigo, 2011)

AveCorrect

average student correctness

NumActions

total number of student actions in system

AveResBored

average student affect: boredom (see Pardos, Baker, San Pedro, Gowda, & Gowda, 2014)

AveResEngcon

average student affect: engaged concentration (see Pardos, Baker, San Pedro, Gowda, & Gowda, 2014)

AveResConf

average student affect: confusion (see Pardos, Baker, San Pedro, Gowda, & Gowda, 2014)

AveResFrust

average student affect: frustration (see Pardos, Baker, San Pedro, Gowda, & Gowda, 2014)

AveResOfftask

average student affect: off task (see Pardos, Baker, San Pedro, Gowda, & Gowda, 2014 and also Baker, 2007)

AveResGaming

average student affect: gaming the system (see Pardos, Baker, San Pedro, Gowda, & Gowda, 2014 and also Baker, Corbett, Koedinger, & Wagner, 2004)

actionId

the unique id of this specific action

skill

a tag used for identifying the cognitive skill related to the problem (see Razzaq, Heffernan, Feng, & Pardos, 2007)

problemId

a unique ID used for identifying a single problem

assignmentId

a unique ID used for identifying an assignment

assistmentId

a unique ID used for identifying an assistment (an instance of a multi-part problem)

startTime

when the student started the problem (UNIX time, seconds)

endTime

when the student ended the problem (UNIX time, seconds)

timeTaken

Time spent on the current step

correct

Answer is correct

original

Problem is an original problem, not a scaffolding problem

hint

Action is a hint response

hintCount

Total number of hints requested so far

hintTotal

total number of hints requested for the problem

scaffold

Problem is a scaffolding problem

bottomHint

Bottom-out hint is used

attemptCount

Total problems attempted in the tutor so far.

problemType

the type of the problem

frIsHelpRequest

First response is a help request

frPast5HelpRequest

Number of last 5 First responses that included a help request

frPast8HelpRequest

Number of last 8 First responses that included a help request

stlHintUsed

Second-to-last hint is used, which indicates a hint that gives considerable detail but is not quite bottom-out

past8BottomOut

Number of last 8 problems that used the bottom-out hint.

totalFrPercentPastWrong

Percent of all past problems that were wrong on this KC.

totalFrPastWrongCount

Total first responses wrong attempts in the tutor so far.

frPast5WrongCount

Number of last 5 First responses that were wrong

frPast8WrongCount

Number of last 8 First responses that were wrong

totalFrTimeOnSkill

Total first response time spent on this KC across all problems

timeSinceSkill

Time since the current KC was last seen.

frWorkingInSchool

First response was made during school hours (between 7:00 am and 3:00 pm)

totalFrAttempted

Total first responses attempted in the tutor so far.

totalFrSkillOpportunities

Total first response practice opportunities on this KC so far.

responseIsFillIn

Response is filled in (No list of answers available)

responseIsChosen

Response is chosen from a list of answers (Multiple choice, etc).

endsWithScaffolding

Problem ends with scaffolding

endsWithAutoScaffolding

Problem ends with automatic scaffolding

frTimeTakenOnScaffolding

First response time taken on scaffolding problems

frTotalSkillOpportunitiesScaffolding

Total first response practice opportunities on this skill so far

totalFrSkillOpportunitiesByScaffolding

Total first response scaffolding opportunities for this KC so far

frIsHelpRequestScaffolding

First response is a help request Scaffolding

timeGreater5Secprev2wrong

Long pauses after 2 Consecutive wrong answers

sumRight

NaN

helpAccessUnder2Sec

Time spent on help was under 2 seconds

timeGreater10SecAndNextActionRight

Long pause after correct answer

consecutiveErrorsInRow

Total number of 2 wrong answers in a row across all the problems

sumTime3SDWhen3RowRight

NaN

sumTimePerSkill

NaN

totalTimeByPercentCorrectForskill

Total time spent on this KC across all problems divided by percent correct for the same KC

prev5count

NaN

timeOver80

NaN

manywrong

NaN

confidence(BORED)

the confidence of the student affect prediction: bored

confidence(CONCENTRATING)

the confidence of the student affect prediction: concentrating

confidence(CONFUSED)

the confidence of the student affect prediction: confused

confidence(FRUSTRATED)

the confidence of the student affect prediction: frustrated

confidence(OFF TASK)

the confidence of the student affect prediction: off task

confidence(GAMING)

the confidence of the student affect prediction: gaming

RES_BORED

rescaled confidence of the student affect prediction: boredom

RES_CONCENTRATING

rescaled confidence of the student affect prediction: concentration

RES_CONFUSED

rescaled confidence of the student affect prediction: confusion

RES_FRUSTRATED

rescaled confidence of the student affect prediction: frustration

RES_OFFTASK

rescaled confidence of the student affect prediction: off task

RES_GAMING

rescaled confidence of the student affect prediction: gaming

Ln-1

Bayesian Knowledge Tracing’s knowledge estimate at the previous time step

Ln

Bayesian Knowledge Tracing’s knowledge estimate at the current time step

schoolID

the id (anonymized) of the school the student was in during the year the data was collected

MCAS

Massachusetts Comprehensive Assessment System test score, i.e. the student’s state test score (taken outside ASSISTments) during that year. A value of -999 indicates that the score is missing.
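
The Ln-1 and Ln fields hold Bayesian Knowledge Tracing estimates before and after a step. As a rough illustration of how such an estimate moves, the sketch below applies the standard BKT update (Corbett & Anderson, 1995) to a single response. The function name `bkt_update` and the guess, slip, and transit values are placeholders chosen for illustration; the parameters actually used to produce this dataset are not given here.

```python
def bkt_update(p_known, correct, p_guess=0.2, p_slip=0.1, p_transit=0.1):
    """One step of the standard BKT update.

    p_known plays the role of Ln-1 and the return value the role of Ln.
    The default parameter values are illustrative assumptions only.
    """
    if correct:
        # Student knew the skill and did not slip.
        evidence = p_known * (1 - p_slip)
        posterior = evidence / (evidence + (1 - p_known) * p_guess)
    else:
        # Student knew the skill but slipped.
        evidence = p_known * p_slip
        posterior = evidence / (evidence + (1 - p_known) * (1 - p_guess))
    # Account for the chance of learning the skill at this opportunity.
    return posterior + (1 - posterior) * p_transit


after_wrong = bkt_update(0.13, correct=0)
after_right = bkt_update(0.13, correct=1)
print(round(after_wrong, 4), round(after_right, 4))
```

With any sensible parameters (guess and slip below 0.5), a correct response raises the estimate more than an incorrect one, which is the pattern to look for when comparing Ln-1 and Ln across rows.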

[1]:
import numpy as np
import pandas as pd

import plotly.express as px
from plotly.subplots import make_subplots
import plotly.graph_objs as go
[70]:
path = "anonymized_full_release_competition_dataset.csv"
data = pd.read_csv(path, encoding="ISO-8859-15", low_memory=False)
[36]:
pd.set_option('display.max_columns', 500)
data.head()
[36]:
studentId MiddleSchoolId InferredGender SY ASSISTments Usage AveKnow AveCarelessness AveCorrect NumActions AveResBored AveResEngcon AveResConf AveResFrust AveResOfftask AveResGaming action_num skill problemId problemType assignmentId assistmentId startTime endTime timeTaken correct original hint hintCount hintTotal scaffold bottomHint attemptCount frIsHelpRequest frPast5HelpRequest frPast8HelpRequest stlHintUsed past8BottomOut totalFrPercentPastWrong totalFrPastWrongCount frPast5WrongCount frPast8WrongCount totalFrTimeOnSkill timeSinceSkill frWorkingInSchool totalFrAttempted totalFrSkillOpportunities responseIsFillIn responseIsChosen endsWithScaffolding endsWithAutoScaffolding frTimeTakenOnScaffolding frTotalSkillOpportunitiesScaffolding totalFrSkillOpportunitiesByScaffolding frIsHelpRequestScaffolding timeGreater5Secprev2wrong sumRight helpAccessUnder2Sec timeGreater10SecAndNextActionRight consecutiveErrorsInRow sumTime3SDWhen3RowRight sumTimePerSkill totalTimeByPercentCorrectForskill Prev5count timeOver80 manywrong confidence(BORED) confidence(CONCENTRATING) confidence(CONFUSED) confidence(FRUSTRATED) confidence(OFF TASK) confidence(GAMING) RES_BORED RES_CONCENTRATING RES_CONFUSED RES_FRUSTRATED RES_OFFTASK RES_GAMING Ln-1 Ln MCAS Enrolled Selective isSTEM
0 8 2 Male 2004-2005 0.352416 0.183276 0.483902 1056 0.208389 0.679126 0.115905 0.112408 0.156503 0.196561 9950 properties-of-geometric-figures 1118 textfieldquestion 20405010 104051118 1096470301 1096470350 49.0 0 1 1 1 1 0 0 1 1 0 0 0 0 0.0 0 0 0 0.0 0.0 1 0 0 0 0 0 0 0.0 0 0.0 0 0 0 0 0 0 0.0 49.0 0.000000 0 0 0 0.597865 0.234294 0.0000 0.0 0.838710 0.008522 0.376427 0.320317 0.000000 0.0 0.785585 0.000264 0.13 0.061190409 45 0 0 NaN
1 8 2 Male 2004-2005 0.352416 0.183276 0.483902 1056 0.208389 0.679126 0.115905 0.112408 0.156503 0.196561 9951 properties-of-geometric-figures 1119 noprobtype 20405010 104051119 1096470350 1096470354 4.0 1 0 0 0 0 1 0 1 1 1 1 0 0 0.0 0 0 0 49.0 0.0 1 1 1 0 0 1 0 4.0 0 0.0 1 0 1 0 1 0 0.0 53.0 106.000000 1 0 0 0.355694 0.992585 0.9375 0.0 0.600000 0.047821 0.156027 0.995053 0.887452 0.0 0.468252 0.001483 0.061190409 0.213509945 45 0 0 NaN
2 8 2 Male 2004-2005 0.352416 0.183276 0.483902 1056 0.208389 0.679126 0.115905 0.112408 0.156503 0.196561 9952 sum-of-interior-angles-more-than-3-sides 1120 noprobtype 20405010 104051120 1096470354 1096470360 6.0 0 0 0 0 0 0 0 1 0 0 0 0 0 0.0 0 0 0 0.0 0.0 1 2 0 0 0 0 0 6.0 0 0.0 0 0 1 0 0 0 0.0 6.0 0.000000 2 0 0 0.355694 0.992585 0.9375 0.0 0.600000 0.047821 0.156027 0.995053 0.887452 0.0 0.468252 0.001483 0.116 0.033305768 45 0 0 NaN
3 8 2 Male 2004-2005 0.352416 0.183276 0.483902 1056 0.208389 0.679126 0.115905 0.112408 0.156503 0.196561 9953 sum-of-interior-angles-more-than-3-sides 1120 noprobtype 20405010 104051120 1096470360 1096470378 18.0 0 0 0 0 0 0 0 2 0 0 0 0 0 0.0 1 0 0 0.0 0.0 1 3 1 0 0 0 0 6.0 1 0.0 0 0 1 0 0 1 0.0 24.0 0.000000 3 0 0 0.355694 0.617065 0.0000 0.0 0.204082 0.343996 0.156027 0.744520 0.000000 0.0 0.108417 0.010665 0.116 0.033305768 45 0 0 NaN
4 8 2 Male 2004-2005 0.352416 0.183276 0.483902 1056 0.208389 0.679126 0.115905 0.112408 0.156503 0.196561 9954 sum-of-interior-angles-more-than-3-sides 1121 noprobtype 20405010 104051121 1096470378 1096470380 2.0 1 0 0 0 1 0 0 1 0 0 0 0 0 1.0 1 1 1 6.0 0.0 1 3 1 0 0 0 0 2.0 1 1.0 0 0 2 0 1 0 0.0 26.0 77.999999 4 0 1 0.355694 0.617065 0.0000 0.0 0.204082 0.343996 0.156027 0.744520 0.000000 0.0 0.108417 0.010665 0.033305768 0.118385889 45 0 0 NaN
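
Since startTime and endTime are UNIX timestamps in seconds, they can be converted to readable datetimes. The sketch below uses the startTime and endTime values of the first two rows shown above; the frame name `toy` and the derived column names are made up for illustration.

```python
import pandas as pd

# startTime / endTime values copied from the first two rows shown above.
toy = pd.DataFrame({
    "startTime": [1096470301, 1096470350],
    "endTime": [1096470350, 1096470354],
})

# Interpret the integers as seconds since the UNIX epoch (UTC).
toy["start_dt"] = pd.to_datetime(toy["startTime"], unit="s")
toy["end_dt"] = pd.to_datetime(toy["endTime"], unit="s")

# Elapsed seconds; for these two rows this matches timeTaken (49.0 and 4.0).
toy["elapsed"] = (toy["end_dt"] - toy["start_dt"]).dt.total_seconds()
print(toy[["start_dt", "end_dt", "elapsed"]])
```

The same two `pd.to_datetime` calls should work on the full `data` frame, assuming every row stores the timestamps in seconds.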

General features

[23]:
data.describe()
[23]:
studentId MiddleSchoolId AveKnow AveCarelessness AveCorrect NumActions AveResBored AveResEngcon AveResConf AveResFrust AveResOfftask AveResGaming action_num problemId assignmentId assistmentId startTime endTime timeTaken correct original hint hintCount hintTotal scaffold bottomHint attemptCount frIsHelpRequest frPast5HelpRequest frPast8HelpRequest stlHintUsed past8BottomOut totalFrPercentPastWrong totalFrPastWrongCount frPast5WrongCount frPast8WrongCount totalFrTimeOnSkill timeSinceSkill frWorkingInSchool totalFrAttempted totalFrSkillOpportunities responseIsFillIn responseIsChosen endsWithScaffolding endsWithAutoScaffolding frTimeTakenOnScaffolding frTotalSkillOpportunitiesScaffolding totalFrSkillOpportunitiesByScaffolding frIsHelpRequestScaffolding timeGreater5Secprev2wrong sumRight helpAccessUnder2Sec timeGreater10SecAndNextActionRight consecutiveErrorsInRow sumTime3SDWhen3RowRight sumTimePerSkill totalTimeByPercentCorrectForskill Prev5count timeOver80 manywrong confidence(BORED) confidence(CONCENTRATING) confidence(CONFUSED) confidence(FRUSTRATED) confidence(OFF TASK) confidence(GAMING) RES_BORED RES_CONCENTRATING RES_CONFUSED RES_FRUSTRATED RES_OFFTASK RES_GAMING MCAS Enrolled Selective isSTEM
count 942816.000000 942816.000000 942816.000000 942816.000000 942816.000000 942816.000000 942816.000000 942816.000000 942816.000000 942816.000000 942816.000000 942816.000000 9.428160e+05 942816.000000 9.428160e+05 9.428160e+05 9.428160e+05 9.428160e+05 942816.000000 942816.000000 942816.000000 942816.000000 942816.000000 942816.000000 942816.000000 942816.000000 942816.000000 942816.000000 942816.000000 942816.000000 942816.000000 942816.000000 942816.000000 942816.000000 942816.000000 942816.000000 942816.000000 9.428160e+05 942816.000000 942816.000000 942816.000000 942816.000000 942816.0 942816.000000 942816.000000 942816.000000 942816.000000 942816.000000 942816.000000 942816.000000 942816.000000 942816.000000 942816.000000 942816.000000 942731.000000 942816.000000 942816.000000 942816.000000 942816.000000 942816.000000 942816.000000 9.428160e+05 942816.000000 942816.000000 942816.000000 942816.000000 942816.000000 9.428160e+05 942816.000000 942816.000000 942816.000000 942816.000000 942816.000000 942816.000000 942816.000000 316974.000000
mean 3844.844105 2.515472 0.195155 0.109436 0.372681 869.850594 0.232949 0.658442 0.098940 0.131406 0.172212 0.192703 1.849329e+06 1899.719319 1.198773e+07 6.061572e+07 1.120793e+09 1.120793e+09 29.747869 0.372681 0.264214 0.331025 1.218490 1.953967 0.385732 0.062794 2.673605 0.268104 1.947322 2.575323 0.004012 0.241678 0.227882 1.988008 0.719380 0.944750 376.213405 4.850802e+05 0.974588 193.316005 8.381019 0.023131 0.0 0.636085 0.005724 24.060853 3.989371 1.036189 0.670047 0.045388 145.982059 0.055674 0.207414 0.155307 0.087042 601.665586 2166.143744 4.972689 0.081323 0.715638 0.436958 5.400894e-01 0.134450 0.164114 0.256006 0.337888 0.232949 6.584415e-01 0.098940 0.131406 0.172212 0.192703 -95.982302 0.641147 0.300434 0.204178
std 2250.484065 1.039785 0.116451 0.059952 0.107367 530.210725 0.030637 0.027440 0.034788 0.038875 0.057992 0.153455 1.726001e+06 2579.212724 1.434706e+07 5.128829e+07 1.940359e+07 1.940354e+07 72.019768 0.483519 0.440914 0.470582 1.980665 2.929242 0.486768 0.242592 2.929801 0.442972 1.712580 2.457799 0.063217 0.674613 0.271404 3.390149 0.832699 1.076276 689.302924 2.075599e+06 0.157373 164.898869 11.998292 0.150319 0.0 0.481125 0.075443 71.665655 6.581897 1.184489 0.470196 0.208155 124.342503 0.229291 0.405455 0.885692 1.619202 953.900687 4601.435964 0.315281 0.273331 0.451110 0.120751 1.830483e-01 0.292877 0.326057 0.213177 0.335292 0.116371 1.734275e-01 0.249505 0.300351 0.216997 0.340232 332.827628 0.479664 0.458447 0.403100
min 8.000000 1.000000 0.028057 0.007801 0.000000 2.000000 0.170871 0.403309 0.005075 0.000000 0.083167 0.001974 9.950000e+03 1.000000 2.000000e+00 5.000000e+00 1.095421e+09 1.095421e+09 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 1.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 -9.928014e+06 0.000000 0.000000 0.000000 0.000000 0.0 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 -11.332080 0.000000 0.000000 0.000000 0.000000 0.000000 0.355694 6.500000e-07 0.000000 0.000000 0.000000 0.000039 0.156027 8.890000e-07 0.000000 0.000000 0.000000 0.000001 -999.000000 0.000000 0.000000 0.000000
25% 1952.000000 2.000000 0.110542 0.068760 0.294989 478.000000 0.209035 0.642060 0.076385 0.107278 0.131467 0.060724 7.221708e+05 721.000000 7.230000e+02 2.213000e+03 1.103136e+09 1.103136e+09 5.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 1.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 37.000000 0.000000e+00 1.000000 67.000000 2.000000 0.000000 0.0 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 50.000000 0.000000 0.000000 0.000000 0.000000 107.000000 189.736571 5.000000 0.000000 0.000000 0.355694 3.743169e-01 0.000000 0.000000 0.090909 0.047821 0.156027 5.117519e-01 0.000000 0.000000 0.048295 0.001483 14.000000 0.000000 0.000000 0.000000
50% 3766.000000 2.000000 0.159285 0.094513 0.345575 754.000000 0.230394 0.660669 0.096357 0.127504 0.159598 0.156245 9.578745e+05 1116.000000 2.040501e+07 1.040504e+08 1.112980e+09 1.112980e+09 11.000000 0.000000 0.000000 0.000000 0.000000 1.000000 0.000000 0.000000 2.000000 0.000000 2.000000 2.000000 0.000000 0.000000 0.166667 1.000000 1.000000 1.000000 140.000000 0.000000e+00 1.000000 151.000000 4.000000 0.000000 0.0 1.000000 0.000000 8.000000 2.000000 1.176471 1.000000 0.000000 113.000000 0.000000 0.000000 0.000000 0.000000 275.000000 840.000006 5.000000 0.000000 1.000000 0.355694 5.676439e-01 0.000000 0.000000 0.230769 0.186970 0.156027 7.115475e-01 0.000000 0.000000 0.122595 0.005797 23.000000 1.000000 0.000000 0.000000
75% 5781.000000 4.000000 0.247704 0.137316 0.428822 1151.000000 0.252082 0.676588 0.119282 0.150582 0.198533 0.298914 2.722813e+06 1419.000000 2.040510e+07 1.040511e+08 1.138371e+09 1.138371e+09 30.000000 1.000000 1.000000 1.000000 2.000000 3.000000 1.000000 0.000000 3.000000 1.000000 3.000000 4.000000 0.000000 0.000000 0.333333 2.000000 1.000000 1.000000 396.000000 8.000000e+00 1.000000 276.000000 10.000000 0.000000 0.0 1.000000 0.000000 23.000000 5.000000 1.656250 1.000000 0.000000 210.000000 0.000000 0.000000 0.000000 0.000000 669.000001 2404.499998 5.000000 0.000000 1.000000 0.597865 6.591692e-01 0.000000 0.091463 0.230769 0.614582 0.376427 7.726099e-01 0.000000 0.009561 0.122595 0.259648 34.000000 1.000000 1.000000 0.000000
max 7783.000000 4.000000 0.752498 0.430576 0.932990 3057.000000 0.440870 0.723990 0.402483 0.543463 0.837402 0.709200 6.355811e+06 22761.000000 1.000000e+09 1.040515e+08 1.180218e+09 1.180218e+09 9999.000000 1.000000 1.000000 1.000000 56.000000 56.000000 1.000000 1.000000 91.000000 1.000000 5.000000 8.000000 1.000000 8.000000 1.000000 73.000000 5.000000 8.000000 11663.000000 4.840000e+07 1.000000 1378.000000 221.000000 1.000000 0.0 1.000000 1.000000 9999.000000 105.000000 37.000000 1.000000 1.000000 962.000000 1.000000 1.000000 56.000000 92.709045 12459.000000 310590.000100 5.000000 1.000000 1.000000 0.680982 1.000000e+00 1.000000 1.000000 1.000000 0.999676 0.505313 1.000000e+00 1.000000 1.000000 1.000000 0.999377 54.000000 1.000000 1.000000 1.000000
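
Note that the MCAS column has a minimum of -999 and a mean of about -95.98 in the summary above, because -999 is the missing-value code rather than a real score. A minimal sketch of masking the sentinel before computing statistics, using toy values (the frame `toy` and the column `MCAS_clean` are illustrative names):

```python
import numpy as np
import pandas as pd

# Toy values; -999 is the documented missing-value code for MCAS.
toy = pd.DataFrame({"studentId": [1, 2, 3, 4], "MCAS": [45, -999, 23, 34]})

# Replace the sentinel with NaN so mean(), describe(), etc. ignore it.
toy["MCAS_clean"] = toy["MCAS"].replace(-999, np.nan)

print(toy["MCAS"].mean())        # pulled down by the -999 sentinel
print(toy["MCAS_clean"].mean())  # mean over observed scores only
```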
[40]:
print("The number of records: " + str(len(data['action_num'].unique())))
#or use print(data['action_num'].count())
The number of records: 942816
[37]:
print('Fraction of missing values for every column')
print(data.isnull().sum() / len(data))
print(data.count())  # non-null count per column (the second listing below)
Fraction of missing values for every column
studentId               0.000000
MiddleSchoolId          0.000000
InferredGender          0.184189
SY ASSISTments Usage    0.000000
AveKnow                 0.000000
                          ...
Ln                      0.000000
MCAS                    0.000000
Enrolled                0.000000
Selective               0.000000
isSTEM                  0.663801
Length: 82, dtype: float64
studentId               942816
MiddleSchoolId          942816
InferredGender          769160
SY ASSISTments Usage    942816
AveKnow                 942816
                         ...
Ln                      942816
MCAS                    942816
Enrolled                942816
Selective               942816
isSTEM                  316974
Length: 82, dtype: int64
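
Because the printed listing is truncated, it can help to show only the columns that actually contain missing values. A small sketch on a toy frame that stands in for `data` (the toy values are invented; only the column names come from the dataset):

```python
import numpy as np
import pandas as pd

# Toy frame standing in for `data`; column names follow the dataset.
toy = pd.DataFrame({
    "studentId": [8, 8, 9, 9],
    "InferredGender": ["Male", "Male", np.nan, np.nan],
    "isSTEM": [np.nan, np.nan, np.nan, 1.0],
})

# Fraction of missing values per column; same as isnull().sum() / len(toy).
missing = toy.isnull().mean()

# Keep only columns with at least one missing value, worst first.
incomplete = missing[missing > 0].sort_values(ascending=False)
print(incomplete)
```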
[28]:
len(data.studentId.unique())
[28]:
1709
[29]:
len(data.MiddleSchoolId.unique())
[29]:
4
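
The two counts above can also be combined to see how many distinct students each school contributes. A sketch on invented rows (the IDs below are placeholders, not taken from the dataset):

```python
import pandas as pd

# Invented action rows: one row per action, as in the real data.
toy = pd.DataFrame({
    "studentId": [8, 8, 35, 39, 39],
    "MiddleSchoolId": [2, 2, 2, 1, 1],
})

# Equivalent to len(toy.studentId.unique()).
n_students = toy["studentId"].nunique()

# Distinct students per school, ignoring how many actions each student has.
students_per_school = toy.groupby("MiddleSchoolId")["studentId"].nunique()
print(n_students)
print(students_per_school)
```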

Sort by student id

[86]:
ds = data['studentId'].value_counts().reset_index()  # after value_counts, studentId is the index, so reset it

ds.columns = [
    'studentId',
    'count'
]

ds['studentId'] = ds['studentId'].astype(str) + '-'  # cast to str; otherwise the y-axis treats the IDs as numbers
ds = ds.sort_values(['count']).tail(40)

fig = px.bar(
    ds,
    x = 'count',
    y = 'studentId',
    orientation='h',
    title='Top 40 students by number of actions'
)

fig.show("svg")
[Figure: "Top 40 students by number of actions" (horizontal bar chart)]
[58]:
ds = data['studentId'].value_counts().reset_index()

ds.columns = [
    'studentId',
    'count'
]
# Order by student ID for the histogram
ds = ds.sort_values('studentId')

fig = px.histogram(
    ds,
    x = 'studentId',
    y = 'count',
    title = 'User action distribution'
)

fig.show("svg")
[Figure: "User action distribution" (histogram)]

Sort by MiddleSchoolId

[92]:
ds = data['MiddleSchoolId'].value_counts().reset_index()

ds.columns = [
    'MiddleSchoolId',
    'percent'
]

ds['percent'] /= len(data)
ds = ds.sort_values(['percent'])

fig = px.pie(
    ds,
    names = 'MiddleSchoolId',
    values = 'percent',
    title = 'Percent of schools',
)

fig.show("svg")
[Figure: "Percent of schools" (pie chart)]

Sort by correct answers

[93]:
ds = data['correct'].value_counts().reset_index()

ds.columns = [
    'correct',
    'percent'
]

ds['percent'] /= len(data)
ds = ds.sort_values(['percent'])

fig = px.pie(
    ds,
    names = 'correct',  # take labels from the column so they follow the sorted rows
    values = 'percent',
    title = 'Percent of correct answers'
)

fig.show("svg")

[Figure: "Percent of correct answers" (pie chart)]

Sort by problem id

[94]:
ds = data['problemId'].value_counts().reset_index()

ds.columns = [
    'problemId',
    'count'
]

ds['problemId'] = ds['problemId'].astype(str) + '-'
ds = ds.sort_values(['count']).tail(40)

fig = px.bar(
    ds,
    x = 'count',
    y = 'problemId',
    orientation = 'h',
    title = 'Top 40 useful problem ids'
)

fig.show("svg")
[Figure: "Top 40 useful problem ids" (horizontal bar chart)]
[90]:
ds = data['problemId'].value_counts().reset_index()

ds.columns = [
    'problemId',
    'count'
]

ds = ds.sort_values('problemId')

fig = px.histogram(
    ds,
    x='problemId',
    y='count',
    title='problemid action distribution'
)

fig.show("svg")
[Figure: "problemid action distribution" (histogram)]
[83]:
ds = data['problemType'].value_counts().reset_index()

ds.columns = [
    'problemType',
    'percent'
]

ds['percent'] /= len(data)
ds = ds.sort_values(['percent'])

fig = px.pie(
    ds,
    names = 'problemType',
    values = 'percent',
    title = 'Percent of problem types',
)

fig.show("svg")
[Figure: "Percent of problem types" (pie chart)]
[85]:
# Reuses ds (share of each problemType) from the previous cell; keep the 6 most common types
ds = ds.sort_values(['percent']).tail(6)

fig = make_subplots(rows=3, cols=2)

traces = [
    go.Bar(
        x = ['wrong', 'right'],
        y = [
            len(data[(data['problemType'] == item) & (data['correct'] == 0)]),
            len(data[(data['problemType'] == item) & (data['correct'] == 1)])
        ],
        name = 'Type: ' + str(item),
        text = [
            str(round(100*len(data[(data['problemType'] == item)&(data['correct'] == 0)])/len(data[data['problemType'] == item]),2)) + '%',
            str(round(100*len(data[(data['problemType'] == item)&(data['correct'] == 1)])/len(data[data['problemType'] == item]),2)) + '%'
        ],
        textposition = 'auto'
    ) for item in ds['problemType'].unique().tolist()
]

for i, trace in enumerate(traces):
    # Fill the 3x2 grid row by row.
    fig.add_trace(trace, row=(i // 2) + 1, col=(i % 2) + 1)

fig.update_layout(
    title_text = 'Percent of correct answers for top 6 problem type',
)

fig.show("svg")
[Figure: "Percent of correct answers for top 6 problem type" (3x2 grid of bar charts)]
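
The wrong/right percentages per problem type can also be computed in one step with `pd.crosstab`, which avoids filtering the full frame repeatedly. This is an alternative sketch on toy rows, not the approach used in the cell above; the frame `toy` and its values are invented.

```python
import pandas as pd

# Toy rows; problem type names are taken from the sample shown earlier.
toy = pd.DataFrame({
    "problemType": ["textfieldquestion", "noprobtype", "noprobtype", "noprobtype"],
    "correct": [0, 1, 0, 1],
})

# Row-normalised table: for each problem type, percent wrong (0) and right (1).
shares = pd.crosstab(toy["problemType"], toy["correct"], normalize="index") * 100
print(shares.round(2))
```

Each row of `shares` sums to 100, so its columns could be passed directly as the bar heights and text labels.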

Sort by skills

[67]:
ds = data['skill'].dropna()  # drop rows without a skill tag before counting
ds = ds.value_counts().reset_index()

ds.columns = [
    'skill',
    'count'
]

ds['skill'] = ds['skill'].astype(str) + '-'
ds = ds.sort_values(['count']).tail(40)

fig = px.bar(
    ds,
    x = 'count',
    y = 'skill',
    orientation = 'h',
    title = 'Top 40 useful skills'
)

fig.show("svg")
[Figure: "Top 40 useful skills" (horizontal bar chart)]
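
Action counts per skill say how often a skill appears, but not how hard it is. As a possible next step, the sketch below computes both the number of actions and the share of correct answers per skill on toy rows (the frame `toy`, its values, and the output column names `actions` and `accuracy` are illustrative).

```python
import pandas as pd

# Toy rows; skill names are taken from the sample shown earlier.
toy = pd.DataFrame({
    "skill": [
        "properties-of-geometric-figures",
        "properties-of-geometric-figures",
        "sum-of-interior-angles-more-than-3-sides",
        "sum-of-interior-angles-more-than-3-sides",
        "sum-of-interior-angles-more-than-3-sides",
    ],
    "correct": [0, 1, 0, 0, 1],
})

# Count actions and average correctness per skill, most frequent first.
per_skill = (
    toy.dropna(subset=["skill"])
    .groupby("skill")["correct"]
    .agg(actions="count", accuracy="mean")
    .sort_values("actions", ascending=False)
)
print(per_skill)
```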