Hello everyone,

As part of a statistics project, we were asked to choose a dataset to work on and to carry out univariate, bivariate and multivariate analyses.

We chose a dataset that contains the following variables:

school
Student's school (binary: 'GP' - Gabriel Pereira or 'MS' - Mousinho da Silveira)

sex
Student's sex (binary: 'F' - female or 'M' - male)

age
Student's age (numeric: from 15 to 22)

address
Student's home address type (binary: 'U' - urban or 'R' - rural)

famsize
Family size (binary: 'LE3' - less or equal to 3 or 'GT3' - greater than 3)

Pstatus
Parents' cohabitation status (binary: 'T' - living together or 'A' - living apart)

Medu
Mother's education (numeric: 0 - none, 1 - primary education (4th grade), 2 - 5th to 9th grade, 3 - secondary education, or 4 - higher education)

Fedu
Father's education (numeric: 0 - none, 1 - primary education (4th grade), 2 - 5th to 9th grade, 3 - secondary education, or 4 - higher education)

Mjob
Mother's job (nominal: 'teacher', 'health' care related, civil 'services' (e.g. administrative or police), 'at_home' or 'other')

Fjob
Father's job (nominal: 'teacher', 'health' care related, civil 'services' (e.g. administrative or police), 'at_home' or 'other')

reason
Reason to choose this school (nominal: close to 'home', school 'reputation', 'course' preference or 'other')

guardian
Student's guardian (nominal: 'mother', 'father' or 'other')

traveltime
Home to school travel time (numeric: 1 - <15 min., 2 - 15 to 30 min., 3 - 30 min. to 1 hour, or 4 - >1 hour)

studytime
Weekly study time (numeric: 1 - <2 hours, 2 - 2 to 5 hours, 3 - 5 to 10 hours, or 4 - >10 hours)

failures
Number of past class failures (numeric: n if 1<=n<3, else 4)

schoolsup
Extra educational support (binary: yes or no)

famsup
Family educational support (binary: yes or no)

paid
Extra paid classes within the course subject (Math or Portuguese) (binary: yes or no)

activities
Extra-curricular activities (binary: yes or no)

nursery
Attended nursery school (binary: yes or no)

higher
Wants to take higher education (binary: yes or no)

internet
Internet access at home (binary: yes or no)

romantic
With a romantic relationship (binary: yes or no)

famrel
Quality of family relationships (numeric: from 1 - very bad to 5 - excellent)

freetime
Free time after school (numeric: from 1 - very low to 5 - very high)

goout
Going out with friends (numeric: from 1 - very low to 5 - very high)

Dalc
Workday alcohol consumption (numeric: from 1 - very low to 5 - very high)

Walc
Weekend alcohol consumption (numeric: from 1 - very low to 5 - very high)

health
Current health status (numeric: from 1 - very bad to 5 - very good)

absences
Number of school absences (numeric: from 0 to 93)

G1
First period grade (numeric: from 0 to 20)

G2
Second period grade (numeric: from 0 to 20)

G3
Final grade (numeric: from 0 to 20, output target)

(I have pasted the variable descriptions as they are, in English.)


We have more than 300 observations. For our multivariate analysis, we would like to identify the factors that influence the final grade (G3). To do this, we decided to fit a linear regression and wrote the following code:

%let VarExp_G3 = c_age Mjob traveltime studytime failures schoolsup famsup paid activities nursery higher internet romantic famrel freetime goout Dalc Walc health absences G1 G2;

PROC GENMOD data=projet.data;
    CLASS &VarExp_G3;
    MODEL G3 = &VarExp_G3;
RUN;
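
One thing we are unsure about is that the CLASS statement above lists every explanatory variable, so numeric variables such as absences, G1 and G2 are treated as categorical. A variant we could try is sketched below, where only the categorical variables go in CLASS. This is only a sketch under assumptions: the split between categorical and numeric variables is our own reading of the variable descriptions, the macro variable names VarCat_G3 and VarNum_G3 are hypothetical, c_age is assumed to be numeric, and the ordinal 1-to-4 and 1-to-5 scales are assumed to be acceptable as numeric covariates.

```sas
/* Hypothetical split: predictors treated as categorical (listed in CLASS) */
%let VarCat_G3 = Mjob schoolsup famsup paid activities nursery higher internet romantic;

/* Hypothetical split: predictors treated as numeric covariates.
   Assumes c_age is numeric and that the ordinal scales can enter linearly. */
%let VarNum_G3 = c_age traveltime studytime failures famrel freetime goout Dalc Walc health absences G1 G2;

PROC GENMOD data=projet.data;
    /* Only the categorical predictors are declared as classification variables */
    CLASS &VarCat_G3;
    /* Normal distribution with identity link corresponds to a linear regression;
       TYPE3 requests Type 3 tests for each effect */
    MODEL G3 = &VarCat_G3 &VarNum_G3 / dist=normal link=identity type3;
RUN;
```

With this version, each numeric variable would get a single slope estimate instead of one parameter per distinct value.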


And we obtained the following results:


Am I on the right track? If so, how should I interpret these results?
If not, what should I do instead?

Sorry for the wall of text, and thank you very much for the time you will spend on this.

Best regards