OLI data in fall, 2011(transaction)

[1]:
%matplotlib inline
import pandas as pd
import numpy as np
# global configuration: show every rows and cols
pd.set_option('display.max_rows', None)
pd.set_option('max_colwidth',None)
pd.set_option('display.max_columns', None)

1. Data Description

1.1 Column Description

[2]:
# help_table1: the description for data by transactions
df1 = pd.read_csv('OLI_data/help_table1.csv',sep=',',encoding="gbk")
df1 = df1.loc[:, ['Field', 'Annotation']]
df1
[2]:
Field Annotation
0 Row A row counter
1 Sample Name The sample that contains the transaction. If a transaction appears in multiple samples, the transaction will be repeated, but with a different sample name.
2 Transaction Id A unique ID that identifies the transaction. Currently used for annotating transactions with custom fields via web services.
3 Anon Student Id DataShop-generated anonymous student ID. To obtain original student identifiers or to learn more about data anonymization, see About data anonymization below.
4 Session Id A dataset-unique string that identifies the user's session with the tutor.
5 Time Time the transaction occurred. For instance, if a student types "25" and presses return, the transaction time is at the point in which they press return.
6 Time Zone The local time zone (e.g., EST, PST, US/Eastern).
7 Duration (sec) Duration of the transaction in seconds. This is the time of the current transaction minus that of the preceding transaction or problem start event—whichever is closer in time to the current transaction. If this difference is greater than 10 minutes, or if the prior transaction occurred during a different user session, DataShop reports the duration as null (a dot). If the current transaction is preceded by neither another transaction or a problem start event, duration is shown as null. The duration is formatted without decimal places if the two times used in the calculation were without millisecond precision.
8 Student Response Type The type of attempt made by the student (e.g., "ATTEMPT" or "HINT_REQUEST"). This is logged in the semantic_event element.
9 Student Response Subtype A more detailed classification of the student attempt. For example, the CTAT software describes actions taken by the tutor on behalf of the student as having subtype "tutor-performed".
10 Tutor Response Type The type of response made by the tutor (e.g., "RESULT" or "HINT_MSG").
11 Tutor Response Subtype A more detailed classification of the tutor response.
12 Level (level_type) The problem hierarchy name (e.g., "Understanding Fractions") of the type specified in the column header (e.g., "Unit"). There may be multiple "Level" columns if the problem hierarchy is more than one level deep. Level is logged in the level element.
13 Problem Name The name of the problem. Two problems with the same "Problem Name" are considered different "problems" by DataShop if the following logged values are not identical: problem name, context, tutor_flag (whether or not the problem or activity is tutored) and "other" field. These fields are logged in the problem element.
14 Problem View The number of times the student encountered the problem so far. This counter increases with each instance of the same problem. See "Problem View" in the "By Student-Step" table below.
15 Problem Start Time If the problem start time is not given in the original log data, then it is set to the time of the last transaction of the prior problem. If there is no prior problem for the session, the time of the earliest transaction is used. Earliest transaction time is equivalent to the minimum transaction time for the earliest step of the problem. For more detail on how problem start time is determined, see Determining Problem Start Time.
16 Step Name Formed by concatenating the "selection" and "action". Also see the glossary entry for "step".
17 Attempt at Step As of this transaction, the current number of attempts toward the identified step.
18 Outcome The tutor's evaluation of the student's attempt. For example, "CORRECT", "INCORRECT", or "HINT". This is logged in the action_evaluation element.
19 Selection A description of the interface element(s) that the student selected or interacted with (for example, "LowestCommonDenominatorCell"). This is logged in the event_descriptor element.
20 Action A description of the manipulation applied to the selection.
21 Input The input the student submitted (e.g., the text entered, the text of a menu item or a combobox entry).
22 Feedback Text The body of a hint, success, or incorrect action message shown to the student. It is generally a text value, logged in the tutor_advice element.
23 Feedback Classification The type of error (e.g., "sign error") or type of hint.
24 Help Level In the case of hierarchical hints, this is the depth of the hint. "1", for example, is an initial hint, while "3" is the third hint.
25 Total Num Hints The total number of hints available. This is logged in the action_evaluation element.
26 Condition Name The name of the condition (e.g., "Unworked").
27 Condition Type A condition classification (e.g., "Experimental", "Control"); optional at the time of logging.
28 KC (model_name) The knowledge component for this transaction. It is a member of the knowledge component model named in the column header. One "KC (model_name)" column should appear in the export for each KC model in the dataset.
29 KC Category (model_name) The knowledge component "category" logged by some tutors. It is a member of the knowledge component model named in the column header. One "KC Category (model_name)" column should appear in the export for each KC model in the dataset.
30 School The name of the school where the student used the tutor to create this transaction.
31 Class The name of the class the student was in when he or she used the tutor to create this transaction.
32 CF (custom_field_name) The value of a custom field. This is usually information that did not fit into any of the other logging fields (i.e., any of the other columns), and so was logged in this special container.
33 Event Type Allowed values are "assess", "instruct" and "assess_instruct". Blank is also allowed. Only "instruct" and "assess_instruct" values are treated as learning opportunities.

1.2 Summarization of Data

This table organizes the data as student-problem-step-transaction

[3]:
df_transaction =  pd.read_csv('OLI_data/AllData_transaction_2011F.csv',low_memory=False) # sep="\t"
df_transaction.head(5)
[3]:
Row Sample Name Transaction Id Anon Student Id Session Id Time Time Zone Duration (sec) Student Response Type Student Response Subtype Tutor Response Type Tutor Response Subtype Level (Sequence) Level (Unit) Level (Module) Level (Section1) Problem Name Problem View Problem Start Time Step Name Attempt At Step Is Last Attempt Outcome Selection Action Input Input.1 Feedback Text Feedback Classification Help Level Total Num Hints KC (Single-KC) KC Category (Single-KC) KC (Unique-step) KC Category (Unique-step) KC (F2011) KC Category (F2011) KC (F2011).1 KC Category (F2011).1 KC (F2011).2 KC Category (F2011).2 School Class CF (oli:activityGuid) CF (oli:highStakes) CF (oli:purpose) CF (oli:resourceType)
0 1 All Data 2adbe4abefd649d48862d3f62b1abf5e Stu_00b2b35fd027e7891e8a1a527125dd65 8dd109e680020ca6016f8e64290b5610 2011-09-21 17:26:36 US/Eastern 1 VIEW_PAGE UI Event NaN NaN Statics Concentrated Forces and Their Effects Introduction to Free Body Diagrams NaN _m2_assess 1 2011-09-21 17:26:35 NaN NaN NaN NaN Navigation SelectPageNumber 1 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN Marion Technical College MET2010B-01 NaN NaN NaN NaN
1 2 All Data 4393251e32a6f00502f3f1ef894af8fe Stu_00b2b35fd027e7891e8a1a527125dd65 8dd109e680020ca6016f8e64290b5610 2011-09-21 17:35:28 US/Eastern 23.13 ATTEMPT NaN RESULT NaN Statics Concentrated Forces and Their Effects Introduction to Free Body Diagrams NaN _m2_assess 1 2011-09-21 17:26:35 q1_point1i1 UpdateComboBox 1.0 1.0 CORRECT q1_point1i1 UpdateComboBox <material>cord c</material> NaN NaN NaN NaN NaN Single-KC NaN NaN NaN identify_interaction NaN NaN NaN NaN NaN Marion Technical College MET2010B-01 NaN NaN NaN NaN
2 3 All Data e2fb2cb788d10ebaa6f288e0757d1b09 Stu_00b2b35fd027e7891e8a1a527125dd65 8dd109e680020ca6016f8e64290b5610 2011-09-21 17:35:28 US/Eastern 23.13 ATTEMPT NaN RESULT NaN Statics Concentrated Forces and Their Effects Introduction to Free Body Diagrams NaN _m2_assess 1 2011-09-21 17:26:35 q1_point3i3 UpdateComboBox 1.0 1.0 CORRECT q1_point3i3 UpdateComboBox <material>120 lb</material> NaN NaN NaN NaN NaN Single-KC NaN NaN NaN gravitational_forces NaN NaN NaN NaN NaN Marion Technical College MET2010B-01 NaN NaN NaN NaN
3 4 All Data e7e150d423862e346dc7e36a95e394e4 Stu_00b2b35fd027e7891e8a1a527125dd65 8dd109e680020ca6016f8e64290b5610 2011-09-21 17:35:28 US/Eastern 23.13 ATTEMPT NaN RESULT NaN Statics Concentrated Forces and Their Effects Introduction to Free Body Diagrams NaN _m2_assess 1 2011-09-21 17:26:35 q1_point6i2 UpdateComboBox 1.0 1.0 INCORRECT q1_point6i2 UpdateComboBox <material>no interaction</material> NaN NaN NaN NaN NaN Single-KC NaN NaN NaN represent_interaction_spring NaN NaN NaN NaN NaN Marion Technical College MET2010B-01 NaN NaN NaN NaN
4 5 All Data 684b1f770a225f21745c6c4c977ddc32 Stu_00b2b35fd027e7891e8a1a527125dd65 8dd109e680020ca6016f8e64290b5610 2011-09-21 17:35:28 US/Eastern 23.13 ATTEMPT NaN RESULT NaN Statics Concentrated Forces and Their Effects Introduction to Free Body Diagrams NaN _m2_assess 1 2011-09-21 17:26:35 q1_point1i2 UpdateComboBox 1.0 1.0 CORRECT q1_point1i2 UpdateComboBox <material>up</material> NaN NaN NaN NaN NaN Single-KC NaN NaN NaN represent_interaction_cord NaN NaN NaN NaN NaN Marion Technical College MET2010B-01 NaN NaN NaN NaN

2. Data Analysis

[4]:
df_transaction.describe()
[4]:
Row Tutor Response Subtype Problem View Attempt At Step Is Last Attempt Feedback Classification Help Level Total Num Hints KC Category (Single-KC) KC Category (Unique-step) KC Category (F2011) KC Category (F2011).1 KC Category (F2011).2
count 361092.000000 0.0 361092.000000 289858.000000 289858.000000 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
mean 180546.500000 NaN 1.180192 2.382867 0.658678 NaN NaN NaN NaN NaN NaN NaN NaN
std 104238.426039 NaN 0.907172 9.948941 0.474154 NaN NaN NaN NaN NaN NaN NaN NaN
min 1.000000 NaN 1.000000 1.000000 0.000000 NaN NaN NaN NaN NaN NaN NaN NaN
25% 90273.750000 NaN 1.000000 1.000000 0.000000 NaN NaN NaN NaN NaN NaN NaN NaN
50% 180546.500000 NaN 1.000000 1.000000 1.000000 NaN NaN NaN NaN NaN NaN NaN NaN
75% 270819.250000 NaN 1.000000 2.000000 1.000000 NaN NaN NaN NaN NaN NaN NaN NaN
max 361092.000000 NaN 32.000000 427.000000 1.000000 NaN NaN NaN NaN NaN NaN NaN NaN

(1)Analysis for Null and Unique value of column attributes

[5]:
def work_col_analysis(df_work):
    num_nonull_toal = df_work.notnull().sum()  # Not Null
    dict_col_1 = {'col_name':num_nonull_toal.index,'num_nonull':num_nonull_toal.values}
    df_work_col_1 = pd.DataFrame(dict_col_1)

    num_null_toal = df_work.isnull().sum()  # Null
    dict_col_2 = {'col_name':num_null_toal.index,'num_null':num_null_toal.values}
    df_work_col_2 = pd.DataFrame(dict_col_2)

    num_unique_toal = df_work.apply(lambda col: len(col.unique()))   # axis=0
    print(type(num_unique_toal))
    dict_col_3 = {'col_name':num_unique_toal.index,'num_unique':num_unique_toal.values}
    df_work_col_3 = pd.DataFrame(dict_col_3)

    df_work_col = pd.merge(df_work_col_1, df_work_col_2, on=['col_name'])
    df_work_col = pd.merge(df_work_col, df_work_col_3, on=['col_name'])
    return df_work_col
print("-------------------num_unique_toal and num_nonull_toal----------------------")
df_result = work_col_analysis(df_transaction)
df_result
-------------------num_unique_toal and num_nonull_toal----------------------
<class 'pandas.core.series.Series'>
[5]:
col_name num_nonull num_null num_unique
0 Row 361092 0 361092
1 Sample Name 361092 0 1
2 Transaction Id 361092 0 361092
3 Anon Student Id 361092 0 335
4 Session Id 361092 0 6656
5 Time 361092 0 263172
6 Time Zone 361092 0 1
7 Duration (sec) 361092 0 2565
8 Student Response Type 361092 0 5
9 Student Response Subtype 71234 289858 2
10 Tutor Response Type 289858 71234 3
11 Tutor Response Subtype 0 361092 1
12 Level (Sequence) 361092 0 1
13 Level (Unit) 361092 0 7
14 Level (Module) 361092 0 19
15 Level (Section1) 59480 301612 10
16 Problem Name 361092 0 300
17 Problem View 361092 0 32
18 Problem Start Time 361092 0 46473
19 Step Name 289858 71234 383
20 Attempt At Step 289858 71234 428
21 Is Last Attempt 289858 71234 3
22 Outcome 289858 71234 4
23 Selection 361082 10 287
24 Action 361082 10 10
25 Input 302086 59006 6827
26 Input.1 1 361091 2
27 Feedback Text 231063 130029 1579
28 Feedback Classification 0 361092 1
29 Help Level 0 361092 1
30 Total Num Hints 0 361092 1
31 KC (Single-KC) 289858 71234 2
32 KC Category (Single-KC) 0 361092 1
33 KC (Unique-step) 283336 77756 1179
34 KC Category (Unique-step) 0 361092 1
35 KC (F2011) 152592 208500 81
36 KC Category (F2011) 0 361092 1
37 KC (F2011).1 16904 344188 19
38 KC Category (F2011).1 0 361092 1
39 KC (F2011).2 6690 354402 9
40 KC Category (F2011).2 0 361092 1
41 School 361092 0 7
42 Class 361092 0 9
43 CF (oli:activityGuid) 45002 316090 1244
44 CF (oli:highStakes) 45002 316090 3
45 CF (oli:purpose) 44516 316576 4
46 CF (oli:resourceType) 45002 316090 3

(2)Analysis for Discrete value of column attributes

Columns with a small number of discrete values may represent very informative, so identify these columns first and analyze them one by one

[6]:
discrete_cols = []
series = []
cols = list(df_transaction.columns.values)

for col in cols:
    if len(df_transaction[col].unique().tolist()) <= 20 and len(df_transaction[col].unique().tolist()) >= 2:
        discrete_cols.append(col)
        series.append(df_transaction[col].unique().tolist())

for a,b in zip(discrete_cols,series):
    print(a," : ",b)
    print("-"*80)
Student Response Type  :  ['VIEW_PAGE', 'ATTEMPT', 'SAVE_ATTEMPT', 'SUBMIT_ATTEMPT', 'HINT_REQUEST']
--------------------------------------------------------------------------------
Student Response Subtype  :  ['UI Event', nan]
--------------------------------------------------------------------------------
Tutor Response Type  :  [nan, 'RESULT', 'HINT_MSG']
--------------------------------------------------------------------------------
Level (Unit)  :  ['Concentrated Forces and Their Effects', 'Engineering Systems - Single Body Equilibrium', 'Complex Interactions Between Bodies', 'Multiple Body Equilibrium - Frames', 'Multiple Body Equilibrium - Trusses', 'Friction', 'Moments of Inertia']
--------------------------------------------------------------------------------
Level (Module)  :  ['Introduction to Free Body Diagrams', 'Effects of Force', 'Representing Interactions Between Bodies', 'Effects of Multiple Forces', 'Equilibrium Under 2D Concentrated Forces', 'Equilibrium of a Single Subsystem', 'Couples', 'Statically Equivalent Loads', 'Applications of Static Equivalency to Distributed Forces', 'Representing Engineering Connections', 'Drawing FBDs of a Single Subsystem', 'Choosing a Solvable Subsystem', 'Drawing FBDs of Multiple Subsystems', 'Solving Multiple Subsystems', 'Method of Joints', 'Method of Sections', 'Friction', 'Second Moment of Area', 'Mass Moment of Inertia']
--------------------------------------------------------------------------------
Level (Section1)  :  [nan, 'Combining Concurrent Forces', 'Combining Moments', 'Applying Force Equilibrium', 'Applying Force and Moment Equilibrium', 'Simplifying 3D loadings to 2D or 1D loading', 'Fixed Connections', 'Pin Connections', 'Other Connections', 'Center of Gravity and Centroid']
--------------------------------------------------------------------------------
Is Last Attempt  :  [nan, 1.0, 0.0]
--------------------------------------------------------------------------------
Outcome  :  [nan, 'CORRECT', 'INCORRECT', 'HINT']
--------------------------------------------------------------------------------
Action  :  ['SelectPageNumber', 'UpdateComboBox', 'Click', 'UpdateRadioButton', 'UpdateCheckbox', 'UpdateNumberField', 'UpdateShortAnswer', 'UpdateHotspotSingle', 'UpdateHotspotMultiple', nan]
--------------------------------------------------------------------------------
Input.1  :  [nan, 'No, the forces of B on A and A on B shown on the diagram on the right are not correct because body B and body A are interacting on one another when ???F??? is applied to the body ???B??? but A opposite senses on each other. In this case B will push A in a']
--------------------------------------------------------------------------------
KC (Single-KC)  :  [nan, 'Single-KC']
--------------------------------------------------------------------------------
KC (F2011).1  :  [nan, 'rotation_sense_of_force', 'identify_interaction', 'motion_dependence_on_force', 'couple_represents_net_zero_force', 'recognize_equivalence_from_motion', 'relate_direction_normal_force_and_contact', 'moment_sign_sense_relation', 'possible_interaction_for_nonuniform_contact', 'represent_interaction_contacting_body', 'represent_forces_two-force_member', 'represent_interaction_cord', 'identify_enabling_unknown', 'identify_equation_isolates_specific_unknown', 'sense_if_assuming_tension', 'determine_joint_is_solvable', 'judge_force_sense_based_on_sign', 'identify_internal_load_points_on_section', 'identify_external_load_points_on_section']
--------------------------------------------------------------------------------
KC (F2011).2  :  [nan, 'rotation_sense_of_force', 'statics_problem_force_and_moment', 'represent_interaction_cord', 'represent_interaction_pin_connection', 'recognize_variable_solvable_from_subsystem', 'tension_vs_compression_given_force_senses', 'sense_if_assuming_tension', 'identify_internal_load_points_on_section']
--------------------------------------------------------------------------------
School  :  ['Marion Technical College', 'Sinclair Community College', 'Carnegie Mellon University', 'Kettering University', 'Miami University', 'University of Maryland Eastern Shore', 'University of Mississippi']
--------------------------------------------------------------------------------
Class  :  ['MET2010B-01', 'F11-E213-01', '24-261Fall11', 'FEA-Fall11', 'DesignFall11', 'F11-E213-50', 'F11-MME-211', 'ENGE260-F11', 'ENGR309H-F11']
--------------------------------------------------------------------------------
CF (oli:highStakes)  :  [nan, False, True]
--------------------------------------------------------------------------------
CF (oli:purpose)  :  [nan, 'quiz', 'didigetthis', 'learnbydoing']
--------------------------------------------------------------------------------
CF (oli:resourceType)  :  [nan, 'x-oli-assessment2', 'x-oli-inline-assessment']
--------------------------------------------------------------------------------

(3)Data Cleaning

Data Cleaning Suggestions - Redundant columns: Columns that are all NULL or Single value. - Others

[7]:
df_transaction_clear = df_transaction.copy(deep=True) # deep copy
[8]:
# 直接清除所有”冗余列“
cols = list(df_transaction.columns.values)
drop_cols = []
for col in cols:
    if len(df_transaction_clear[col].unique().tolist()) == 1:
        df_transaction_clear.drop(col,axis =1,inplace=True)
        drop_cols.append(col)

print("the cols num before clear: ",len(df_transaction.columns.to_list()))
print("the cols num after clear:",len(df_transaction_clear.columns.to_list()))
for col in drop_cols:
    print("drop:---",col)
the cols num before clear:  47
the cols num after clear: 35
drop:--- Sample Name
drop:--- Time Zone
drop:--- Tutor Response Subtype
drop:--- Level (Sequence)
drop:--- Feedback Classification
drop:--- Help Level
drop:--- Total Num Hints
drop:--- KC Category (Single-KC)
drop:--- KC Category (Unique-step)
drop:--- KC Category (F2011)
drop:--- KC Category (F2011).1
drop:--- KC Category (F2011).2
[9]:
df_transaction_clear.head()
[9]:
Row Transaction Id Anon Student Id Session Id Time Duration (sec) Student Response Type Student Response Subtype Tutor Response Type Level (Unit) Level (Module) Level (Section1) Problem Name Problem View Problem Start Time Step Name Attempt At Step Is Last Attempt Outcome Selection Action Input Input.1 Feedback Text KC (Single-KC) KC (Unique-step) KC (F2011) KC (F2011).1 KC (F2011).2 School Class CF (oli:activityGuid) CF (oli:highStakes) CF (oli:purpose) CF (oli:resourceType)
0 1 2adbe4abefd649d48862d3f62b1abf5e Stu_00b2b35fd027e7891e8a1a527125dd65 8dd109e680020ca6016f8e64290b5610 2011-09-21 17:26:36 1 VIEW_PAGE UI Event NaN Concentrated Forces and Their Effects Introduction to Free Body Diagrams NaN _m2_assess 1 2011-09-21 17:26:35 NaN NaN NaN NaN Navigation SelectPageNumber 1 NaN NaN NaN NaN NaN NaN NaN Marion Technical College MET2010B-01 NaN NaN NaN NaN
1 2 4393251e32a6f00502f3f1ef894af8fe Stu_00b2b35fd027e7891e8a1a527125dd65 8dd109e680020ca6016f8e64290b5610 2011-09-21 17:35:28 23.13 ATTEMPT NaN RESULT Concentrated Forces and Their Effects Introduction to Free Body Diagrams NaN _m2_assess 1 2011-09-21 17:26:35 q1_point1i1 UpdateComboBox 1.0 1.0 CORRECT q1_point1i1 UpdateComboBox <material>cord c</material> NaN NaN Single-KC NaN identify_interaction NaN NaN Marion Technical College MET2010B-01 NaN NaN NaN NaN
2 3 e2fb2cb788d10ebaa6f288e0757d1b09 Stu_00b2b35fd027e7891e8a1a527125dd65 8dd109e680020ca6016f8e64290b5610 2011-09-21 17:35:28 23.13 ATTEMPT NaN RESULT Concentrated Forces and Their Effects Introduction to Free Body Diagrams NaN _m2_assess 1 2011-09-21 17:26:35 q1_point3i3 UpdateComboBox 1.0 1.0 CORRECT q1_point3i3 UpdateComboBox <material>120 lb</material> NaN NaN Single-KC NaN gravitational_forces NaN NaN Marion Technical College MET2010B-01 NaN NaN NaN NaN
3 4 e7e150d423862e346dc7e36a95e394e4 Stu_00b2b35fd027e7891e8a1a527125dd65 8dd109e680020ca6016f8e64290b5610 2011-09-21 17:35:28 23.13 ATTEMPT NaN RESULT Concentrated Forces and Their Effects Introduction to Free Body Diagrams NaN _m2_assess 1 2011-09-21 17:26:35 q1_point6i2 UpdateComboBox 1.0 1.0 INCORRECT q1_point6i2 UpdateComboBox <material>no interaction</material> NaN NaN Single-KC NaN represent_interaction_spring NaN NaN Marion Technical College MET2010B-01 NaN NaN NaN NaN
4 5 684b1f770a225f21745c6c4c977ddc32 Stu_00b2b35fd027e7891e8a1a527125dd65 8dd109e680020ca6016f8e64290b5610 2011-09-21 17:35:28 23.13 ATTEMPT NaN RESULT Concentrated Forces and Their Effects Introduction to Free Body Diagrams NaN _m2_assess 1 2011-09-21 17:26:35 q1_point1i2 UpdateComboBox 1.0 1.0 CORRECT q1_point1i2 UpdateComboBox <material>up</material> NaN NaN Single-KC NaN represent_interaction_cord NaN NaN Marion Technical College MET2010B-01 NaN NaN NaN NaN
[10]:
# the remaining columns
print("-------------------num_unique_toal and num_nonull_toal----------------------")
df_result = work_col_analysis(df_transaction_clear)
df_result
-------------------num_unique_toal and num_nonull_toal----------------------
<class 'pandas.core.series.Series'>
[10]:
col_name num_nonull num_null num_unique
0 Row 361092 0 361092
1 Transaction Id 361092 0 361092
2 Anon Student Id 361092 0 335
3 Session Id 361092 0 6656
4 Time 361092 0 263172
5 Duration (sec) 361092 0 2565
6 Student Response Type 361092 0 5
7 Student Response Subtype 71234 289858 2
8 Tutor Response Type 289858 71234 3
9 Level (Unit) 361092 0 7
10 Level (Module) 361092 0 19
11 Level (Section1) 59480 301612 10
12 Problem Name 361092 0 300
13 Problem View 361092 0 32
14 Problem Start Time 361092 0 46473
15 Step Name 289858 71234 383
16 Attempt At Step 289858 71234 428
17 Is Last Attempt 289858 71234 3
18 Outcome 289858 71234 4
19 Selection 361082 10 287
20 Action 361082 10 10
21 Input 302086 59006 6827
22 Input.1 1 361091 2
23 Feedback Text 231063 130029 1579
24 KC (Single-KC) 289858 71234 2
25 KC (Unique-step) 283336 77756 1179
26 KC (F2011) 152592 208500 81
27 KC (F2011).1 16904 344188 19
28 KC (F2011).2 6690 354402 9
29 School 361092 0 7
30 Class 361092 0 9
31 CF (oli:activityGuid) 45002 316090 1244
32 CF (oli:highStakes) 45002 316090 3
33 CF (oli:purpose) 44516 316576 4
34 CF (oli:resourceType) 45002 316090 3

Outlier Analysis

  • It is found that there is a non-numeric type in duration that is ‘.’ , which should represent 0

[11]:
# Change . to 0 in "duration"
rectify_cols = ['Duration (sec)']
for col in rectify_cols:
    df_transaction_clear[col] = df_transaction_clear[col].apply(lambda x: 0 if x=='.' else x)
    df_transaction_clear[col] = df_transaction_clear[col].astype(float)
print(df_transaction_clear[rectify_cols].dtypes)
Duration (sec)    float64
dtype: object

3. Data Visualization

[12]:
import plotly.express as px
from plotly.subplots import make_subplots
import plotly.graph_objs as go
[13]:
# Histogram of discrete values
def show_value_counts_bar(colname, sort = True):
    ds = df_transaction[colname].value_counts().reset_index()
    ds.columns = [
        colname,
        'Count'
    ]
    if sort:
        ds = ds.sort_values(by='Count', ascending=False)
    # histogram
    fig = px.bar(
        ds,
        x = colname,
        y = 'Count',
        title = colname + ' distribution'
    )
    fig.show("svg")


# Pie of discrete values
def show_value_counts_pie(colname, sort = True):
    ds = df_transaction[colname].value_counts().reset_index()
    ds.columns = [
        colname,
        'percent'
    ]
    ds['percent'] /= len(df_transaction)
    if sort:
        ds = ds.sort_values(by='percent', ascending=False)
    fig = px.pie(
        ds,
        names = colname,
        values = 'percent',
        title = colname+ ' Percentage',
    )
    fig.show("svg")
[14]:
col_pies = ['Student Response Type','Tutor Response Type','Outcome']
for col in col_pies:
    show_value_counts_pie(col)
../../../_images/build_blitz_OLI_Fall2011_OLI_2011F_transaction_22_0.svg
../../../_images/build_blitz_OLI_Fall2011_OLI_2011F_transaction_22_1.svg
../../../_images/build_blitz_OLI_Fall2011_OLI_2011F_transaction_22_2.svg

Analysis by label description: > - If the Student Response Type == ATTEMPT, then the Tutor Response Type == Result, then the Student Response Type => correct or incorrect

  • If the Student Response Type == HINT_REQUEST, then the Tutor Response Type == HINT_MSG, then the outCome => hint

  • If Student Response Type == other,then the Tutor Response Type == NaN, then the outCome => NaN

[15]:
%matplotlib inline
def show_value_counts_pie2(col1,type1,col2, sort = True):
    df_tmp = df_transaction[df_transaction[col1] == type1]
    ds = df_tmp[col2].value_counts().reset_index()
    ds.columns = [
        col2,
        'percent'
    ]
    ds['percent'] /= len(df_tmp)
    if sort:
        ds = ds.sort_values(by='percent', ascending=False)
    fig = px.pie(
        ds,
        names = col2,
        values = 'percent',
        title = col2+ ' Percentage when ' + col1 + ' =='+ type1,
    )
    fig.show("svg")

# Take Student Response Type as an example
col1 = 'Student Response Type'
col2 = 'Outcome'
# col1 = 'Tutor Response Type'
# col2 = 'Outcome'

show_value_counts_pie2(col1,"ATTEMPT",col2)
show_value_counts_pie2(col1,"HINT_REQUEST",col2)
../../../_images/build_blitz_OLI_Fall2011_OLI_2011F_transaction_24_0.svg
../../../_images/build_blitz_OLI_Fall2011_OLI_2011F_transaction_24_1.svg
[16]:
col_bars = ['Level (Unit)','Level (Module)','Level (Section1)','KC (F2011)']

fig = make_subplots(rows=3, cols=2,   # 2*2
      start_cell="top-left",
      subplot_titles=col_bars,
      column_widths=[0.5, 0.5])
traces = [
    go.Bar(
        x = df_transaction[colname].value_counts().reset_index().index.tolist(),
        y = df_transaction[colname].value_counts().reset_index()[colname].tolist(),
        name = 'Type: ' + str(colname),
        text = df_transaction[colname].value_counts().reset_index()['index'].tolist(),
        textposition = 'auto',
    ) for colname in col_bars
]
for i in range(len(traces)):
    fig.append_trace(
        traces[i],
        (i //2) + 1, # pos_row
        (i % 2) + 1  # pos_col
    )

fig.update_layout(
    title_text = 'Bar of distributions for every type',
)

fig.show("svg")

../../../_images/build_blitz_OLI_Fall2011_OLI_2011F_transaction_25_0.svg

According to the chart below, there are 3 schools with a smaller sample of students.

[17]:
# 按学校统计学生人数
schools = [item for item in df_transaction_clear['School'].unique().tolist()]
students = [len(df_transaction_clear[df_transaction_clear['School'] == sch]['Anon Student Id'].unique()) for sch in schools]
fig = go.Figure(data=[go.Bar(
        x = schools,
        y = students,
        name = 'The number of students is counted by school',
        text = schools,
        textposition = 'auto',
    )])
fig.show("svg")
fig = go.Figure(data=[go.Pie(
        labels = schools,
        values = students,
        name = 'The number of students is counted by school',
        text = schools,
        textposition = 'auto',
    )])
fig.show("svg")
../../../_images/build_blitz_OLI_Fall2011_OLI_2011F_transaction_27_0.svg
../../../_images/build_blitz_OLI_Fall2011_OLI_2011F_transaction_27_1.svg