math2015-Math2 Data Analysis

[1]:
import numpy as np
import pandas as pd

import plotly.express as px
from plotly.subplots import make_subplots
import plotly.graph_objs as go
[2]:
path = "C:/Users/Administrator/Desktop/Math2/rawdata.txt"
data = pd.read_table(path,header=None)

RECORDS

The learning records are saved in a score matrix. Each row corresponds to the records of a student on different test items.

[3]:
pd.set_option('display.max_rows',10)
data
[3]:
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
0 3 0 0 0 3 0 3 3 3 3 0 0 0 0 0 0 0 0 6 0
1 3 3 3 3 3 3 3 0 3 0 0 3 4 0 0 4 12 12 6 1
2 3 3 3 0 0 0 3 3 0 0 0 3 4 0 0 0 8 0 6 0
3 0 3 3 3 0 3 3 0 3 3 0 3 0 0 0 0 12 12 6 0
4 3 3 0 0 3 3 0 0 3 0 0 3 0 0 0 4 4 12 6 0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
3906 3 3 0 3 3 0 0 0 3 0 0 3 0 0 0 0 6 0 6 1
3907 3 0 3 3 3 3 3 3 0 0 3 3 4 4 0 0 6 2 6 0
3908 3 3 0 3 3 3 3 0 3 3 0 0 0 0 0 0 2 12 6 4
3909 3 3 0 3 3 3 0 0 3 0 0 3 0 0 4 0 1 12 3 0
3910 3 3 3 3 0 3 0 3 3 0 0 0 0 0 0 0 6 8 6 2

3911 rows × 20 columns

For example, the first row holds the learning records of student 0, who scores 3 points on item 0 and 6 points on item 18.
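
Reading individual entries out of a score matrix like this can be sketched as follows. The small `toy` frame is made-up data standing in for `rawdata.txt` (an assumption for illustration only), laid out the same way: students as rows, items as columns.

```python
import pandas as pd

# Hypothetical 3-student x 4-item score matrix (not the real Math2 data).
toy = pd.DataFrame([
    [3, 0, 6, 0],
    [3, 3, 6, 1],
    [0, 3, 0, 4],
])

# Rows are students and columns are items, so .loc[student, item]
# returns the score that student earned on that item.
score_s0_i2 = toy.loc[0, 2]

# Summing across a row gives one student's total score.
total_s1 = toy.loc[1].sum()

print(score_s0_i2, total_s1)
```

The same `.loc[row, column]` lookup applies to the real `data` frame, since it was read with `header=None` and therefore has integer column labels.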

General features

[4]:
data.describe()
[4]:
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
count 3911.000000 3911.000000 3911.000000 3911.000000 3911.000000 3911.000000 3911.000000 3911.000000 3911.000000 3911.00000 3911.000000 3911.000000 3911.000000 3911.000000 3911.000000 3911.000000 3911.000000 3911.000000 3911.000000 3911.000000
mean 2.633342 2.120174 1.608540 2.162363 1.300946 1.893122 1.765789 0.441064 2.145487 1.72360 1.418307 1.548709 1.557658 0.413194 0.805932 0.773204 4.384301 5.333419 6.609051 1.724367
std 0.982743 1.365965 1.496259 1.346009 1.486924 1.447754 1.476453 1.062517 1.354184 1.48343 1.497965 1.499401 1.950719 1.217549 1.604637 1.579750 3.812531 4.820757 3.085368 2.542170
min 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.00000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
25% 3.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.00000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 6.000000 0.000000
50% 3.000000 3.000000 3.000000 3.000000 0.000000 3.000000 3.000000 0.000000 3.000000 3.00000 0.000000 3.000000 0.000000 0.000000 0.000000 0.000000 4.000000 5.000000 6.000000 0.000000
75% 3.000000 3.000000 3.000000 3.000000 3.000000 3.000000 3.000000 0.000000 3.000000 3.00000 3.000000 3.000000 4.000000 0.000000 0.000000 0.000000 6.000000 11.000000 7.000000 3.000000
max 3.000000 3.000000 3.000000 3.000000 3.000000 3.000000 3.000000 3.000000 3.000000 3.00000 3.000000 3.000000 4.000000 4.000000 4.000000 4.000000 12.000000 12.000000 12.000000 10.000000
[5]:
print('The number of records:' + str(len(data)))
The number of records:3911
[6]:
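# Total score per student (stored under the column name 'count')
data['count'] = data.sum(axis=1)
data['index1'] = data.index
ds = data.loc[:, ['count', 'index1']].copy()
# Append '-' so Plotly treats student ids as category labels, not numbers
ds['index1'] = ds['index1'].astype(str) + '-'
# Keep the 50 highest totals
ds = ds.sort_values(['count']).tail(50)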
fig = px.bar(
    ds,
    x = 'count',
    y = 'index1',
    orientation='h',
    title='Top 50 students by score'
)

fig.show("svg")
../../../_images/build_blitz_math2015_Math2_10_0.svg

This figure shows the total scores of the 50 highest-scoring students.
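
As a rough sketch of the same selection on a smaller scale, `Series.nlargest` can pick the top students directly. The `toy` matrix below is invented for illustration and is not part of the dataset.

```python
import pandas as pd

# Hypothetical score matrix: 4 students x 3 items (made-up values).
toy = pd.DataFrame([
    [3, 0, 6],
    [3, 3, 6],
    [0, 3, 0],
    [3, 3, 12],
])

# Total score per student.
totals = toy.sum(axis=1)

# The two highest totals, largest first; the index holds the student ids.
top2 = totals.nlargest(2)

print(top2)
```

This is an alternative to sorting and taking the tail. Both approaches pick the same students unless there are ties at the cutoff, which may be resolved differently.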

[7]:
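ds = data.loc[:, ['count', 'index1']].copy()
ds = ds.sort_values(['index1'])
# With y='count', each bar sums the total scores of the students whose index falls in that bin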
fig = px.histogram(
    ds,
    x='index1',
    y='count',
    title='Students score distribution'
)
fig.show("svg")
../../../_images/build_blitz_math2015_Math2_12_0.svg

Sort by correct rate

[8]:
path = "C:/Users/Administrator/Desktop/Math2/data.txt"
data = pd.read_table(path,header=None)
[9]:
# Mean score per item, interpreted here as the item's correct rate
ds = data.mean()
# Build the frame in one step (DataFrame.append was removed in pandas 2.0)
ds1 = pd.DataFrame({'problem_id': range(len(ds)), 'count': ds.to_numpy()})

ds1=ds1.sort_values(['count'])
ds1['problem_id'] = (ds1['problem_id']).astype(str) + '-'
fig = px.bar(
    ds1,
    x = 'count',
    y = 'problem_id',
    orientation = 'h',
    title = 'Average correct rate of questions'
)

fig.show("svg")
../../../_images/build_blitz_math2015_Math2_15_0.svg

This figure presents the average correct rate of each question. Students do best on item 0 and have the most room to improve on item 13.
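
A minimal sketch of how the best and worst items can be read off programmatically, assuming the scores are normalized to the range 0 to 1 so that a column mean is a correct rate. The `toy` values are invented for illustration.

```python
import pandas as pd

# Hypothetical normalized scores (1.0 = full marks), 4 students x 3 items.
toy = pd.DataFrame([
    [1.0, 0.0, 0.5],
    [1.0, 1.0, 0.0],
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
])

# Column means give the average correct rate of each item.
rate = toy.mean()

# Labels of the items with the highest and lowest correct rate.
best_item = rate.idxmax()
worst_item = rate.idxmin()

print(best_item, worst_item)
```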

Sort by problem type

[10]:
# Mean score per item, interpreted here as the item's correct rate
ds = data.mean()
# Build the frame in one step (DataFrame.append was removed in pandas 2.0)
ds1 = pd.DataFrame({'problem_id': range(len(ds)), 'count': ds.to_numpy()})

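# Split the items into two non-overlapping groups by index: 0-14 objective, 15-19 subjective.
# The boundary is hard-coded; check it against the Type column in problemdesc.txt.
data2 = [('Obj', ds1[ds1['problem_id'] < 15]['count'].mean()),
    ('Sub', ds1[ds1['problem_id'] >= 15]['count'].mean())]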
ds2 = pd.DataFrame(
    data=data2,
    columns=['Type','Percent']
)

fig = make_subplots(rows=1, cols=2)
# One bar trace per problem type, showing the wrong/right shares side by side
pairs = zip(ds2['Type'].tolist(), ds2['Percent'].tolist())
for col, (item, percent) in enumerate(pairs, start=1):
    fig.add_trace(
        go.Bar(
            x=['wrong', 'right'],
            y=[1 - percent, percent],
            name='Type:' + str(item),
            text=[
                str(round(100 * (1 - percent))) + '%',
                str(round(100 * percent)) + '%'
            ],
            textposition='auto'
        ),
        row=1,
        col=col
    )
fig.update_layout(
    title_text = 'Average correct rate of questions for every problem type',
)

fig.show("svg")
../../../_images/build_blitz_math2015_Math2_18_0.svg

This figure shows that students perform better on objective questions; their ability to answer subjective questions needs strengthening.
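
The per-type averages can also be computed from explicit type labels instead of index ranges. The sketch below assumes one `Obj`/`Sub` label per item, like the `Type` column of `problemdesc.txt`; the rates and labels themselves are invented for illustration.

```python
import pandas as pd

# Hypothetical per-item correct rates (made-up values).
rate = pd.Series([0.9, 0.7, 0.5, 0.3])

# Hypothetical type label for each item, aligned with `rate` by index.
types = pd.Series(['Obj', 'Obj', 'Sub', 'Sub'])

# Group the rates by type label and average within each group.
by_type = rate.groupby(types).mean()

print(by_type)
```

Grouping by label avoids hard-coding which item indices belong to which type, so the two groups cannot overlap or leave an item out.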

[11]:
path = "C:/Users/Administrator/Desktop/Math2/problemdesc.txt"
data1 = pd.read_table(path,header=0)
[12]:
count = data1['Full Score'].sum()
data2= [('Obj',data1[data1['Type']=='Obj']['Full Score'].sum()),
    ('Sub',data1[data1['Type']=='Sub']['Full Score'].sum())]
ds = pd.DataFrame(
    data=data2,
    columns=['Type','Percent']
)

ds['Percent']/=count
ds=ds.sort_values('Percent')

fig=px.pie(
    ds,
    names='Type',
    values='Percent',
    title='Problem type',
)
fig.show("svg")
../../../_images/build_blitz_math2015_Math2_21_0.svg

Sort by skills

[13]:
path1 = "C:/Users/Administrator/Desktop/Math2/q.txt"
data1 = pd.read_table(path1,header=None)
[14]:
ds1 = data1.sum()
[15]:
path2 = "C:/Users/Administrator/Desktop/Math2/qnames.txt"
data2 = pd.read_table(path2,header=0)
[16]:
# One row per skill: its name and the number of items that require it.
# Built in one step because DataFrame.append was removed in pandas 2.0.
ds = pd.DataFrame({
    'skill': data2['Skill Names'][:len(ds1)].to_numpy(),
    'count': ds1.to_numpy()
})
[17]:
ds=ds.sort_values(['count'])
fig = px.bar(
    ds,
    x='count',
    y='skill',
    orientation='h',
    title='Skill count'
)
fig.show("svg")
../../../_images/build_blitz_math2015_Math2_27_0.svg

This figure shows that calculation is the most frequently required skill in the test, as almost every item involves it. This suggests that calculation is a necessary skill for math tests.
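
How the skill counts follow from the item-skill matrix can be sketched on a toy example. The layout is an assumption: rows are items, columns are skills, and a 1 means the item requires that skill. The matrix and the skill names are made up for illustration.

```python
import pandas as pd

# Hypothetical item-skill matrix: 4 items x 3 skills (1 = item requires the skill).
q = pd.DataFrame([
    [1, 0, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 1, 0],
])

# Made-up skill names, one per column of q.
skill_names = ['calculation', 'geometry', 'functions']

# Column sums count how many items require each skill.
skill_count = pd.Series(q.sum().to_numpy(), index=skill_names)

# Share of items that involve each skill.
coverage = skill_count / len(q)

print(skill_count)
print(coverage)
```

Dividing by the number of items turns the raw counts into a coverage share, which makes a statement such as "almost every item involves calculation" directly checkable.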