Bart Cockx, Matteo Picchio, and Stijn Baert, "Modeling the Effects of
Grade Retention in High School", Journal of Applied Econometrics,
Vol. 34, No. 3, 2019, pp. 403-424.
This paper uses the longitudinal SONAR dataset. The SONAR dataset was
collected starting in 1999 by the Flemish inter-university research
group for youths born in 1976, 1978, and 1980 and living in Flanders.
Survey participants were chosen by randomly drawing about 3,000
individuals per cohort from the National Register.
From the original sample, we selected those born in 1978 and 1980. We
removed pupils whose maternal grandmother had a foreign
nationality (584 pupils deleted), pupils who needed special help,
temporarily or permanently, and were therefore in special schools, and
pupils who started high school when older than 15 (473 pupils
deleted). We dropped students entering the arts track (103 students),
those leaving school before the end of compulsory education (9
pupils), those ending in part-time education (183 students), those
with inconsistent or missing information on the end-of-year evaluation
and grade mobility (396 pupils) and those with missing values for some
of the covariates used in the econometric analysis (146 students).
Since only 42 students were retained in seventh grade and only 46
students made track transitions involving more than two steps, we
deleted their records from our sample. After applying these selection
criteria, we ended up with a sample of 3,933 pupils who were observed
in each year of their high school career.
The dataset is made available in Matlab format. The file data_wide.mat
is in wide format and contains a matrix of dimension 3933x301. Each row
is a student. Each column contains a different variable, whose meaning
is explained below:
id = data(:,1); % individual identifier
coho = data(:,2); % cohort (1978 or 1980)
fem = data(:,3); % =1 if female
bro = data(:,4); % Number of brothers at 14
sis = data(:,5); % Number of sisters at 14
ageduf = data(:,6); % Age father left education
agedum = data(:,7); % Age mother left education
unemf = data(:,8); % Father was unemployed when student was 14
unemm = data(:,9); % Mother was unemployed when student was 14
sta_pri = data(:,10); % Year started primary school
end_pri = data(:,11); % Year ended primary school
monthb = data(:,12); % Month of birth
int_26 = data(:,13); % Interviewed at 26
int_29 = data(:,14); % Interviewed at 29
birdate = data(:,15); % Date of birth in calendar time: =1 at January 1976
intdate = data(:,16); % Calendar time of the last interview: =1 at January 1976
ageint = data(:,17); % Age in months at the last interview
agespri = data(:,18); % Age at which primary school started
ageepri = data(:,19); % Age at which primary school ended
sta_sec = data(:,20); % Starting year of high school
agessec = data(:,21); % Age at which high school started
faele = data(:,22); % Number of failures at the end of primary school
delay = data(:,23); % Age at which secondary school started minus 6
fele = data(:,24); % At least one failure at the end of primary school
drop = data(:,25); % School drop out
teve = data(:,26); % Total number of years in high school
retention = data(:,27:37); % Retention dummy across max. 11 years of high school
bdown = data(:,38:48); % 2-step track downgrade across max. 11 years of high school
sdown = data(:,49:59); % 1-step track downgrade across max. 11 years of high school
downgrade = data(:,60:70); % Track downgrade across max. 11 years of high school
grade = data(:,71:81); % Grade across max. 11 years of high school
year = data(:,82:92); % Calendar year across max. 11 years of high school
diploma = data(:,93); % =1 if high school is completed and diploma attained
yexit = data(:,94:104); % High school exit at the end of the school year across max. 11 years of high school
noexit = data(:,105:115); % No school exit at the end of the school year across max. 11 years of high school
nodip = data(:,116:126); % High school exit without diploma at the end of the school year across max. 11 years of high school
dip = data(:,127:137); % High school exit with diploma at the end of the school year across max. 11 years of high school
cens = data(:,138:148); % High school exit to part-time education at the end of the school year across max. 11 years of high school
fuga = data(:,149:159); % High school exit before June across max. 11 years of high school
sam = data(:,160:170); % =1 if the student started the school year across max. 11 years of high school
course = data(:,171:181); % School track across max. 11 years of high school
pdrop = data(:,182:192); % Legally possible to drop at the end of the school year across max. 11 years of high school
resu = data(:,193:203); % Evaluation across max. 11 years of high school
eduf = data(:,204); % Father's education
edum = data(:,205); % Mother's education
sib = data(:,206); % Number of siblings at 14
sib0 = data(:,207); % Number of siblings is 0
sib1 = data(:,208); % Number of siblings is 1
sib2 = data(:,209); % Number of siblings is 2
sib3 = data(:,210); % Number of siblings is 3 or more
eduf1 = data(:,211); % Father's education: primary or missing
eduf2 = data(:,212); % Father's education: lower secondary
eduf3 = data(:,213); % Father's education: upper secondary
eduf4 = data(:,214); % Father's education: Tertiary education
edum1 = data(:,215); % Mother's education: primary or missing
edum2 = data(:,216); % Mother's education: lower secondary
edum3 = data(:,217); % Mother's education: upper secondary
edum4 = data(:,218); % Mother's education: Tertiary education
kid = data(:,219:229); % =1 if student has a kid across max. 11 years of high school
freta = data(:,230:240); % =1 if father has already retired across max. 11 years of high school
mreta = data(:,241:251); % =1 if mother has already retired across max. 11 years of high school
preta = data(:,252:262); % =1 if at least one parent has already retired across max. 11 years of high school
fdeth = data(:,263:273); % =1 if father has already passed away across max. 11 years of high school
mdeth = data(:,274:284); % =1 if mother has already passed away across max. 11 years of high school
pdeth = data(:,285:295); % =1 if at least one parent has already passed away across max. 11 years of high school
dayb = data(:,296); % Day of birth (from 1 to 365)
moedu = data(:,297); % Mother's education
faedu = data(:,298); % Father's education
moedu_im = data(:,299); % Mother's education is imputed
faedu_im = data(:,300); % Father's education is imputed
univ = data(:,301); % =1 if the student will go to university
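The listing above can be applied directly once the matrix is in memory.
The following is a minimal sketch, not part of the archived code. It
assumes that data_wide.mat stores a single numeric matrix (its variable
name inside the file is not stated here, so the first stored variable is
used) and that sam equal to 1 marks the school years a student actually
started. The names S, names, keep and long are illustrative only.

```matlab
% Load the wide matrix. Assumption: the .mat file holds one numeric matrix.
S     = load('data_wide.mat');
names = fieldnames(S);
data  = S.(names{1});
assert(isequal(size(data), [3933 301]), 'Unexpected matrix dimensions');

% Rebuild a few variables using the column positions listed above.
id        = data(:,1);        % individual identifier
retention = data(:,27:37);    % retention dummy, one column per school year
sam       = data(:,160:170);  % =1 if the student started the school year

% Stack the yearly blocks into long format: one row per student-year.
[n, T]   = size(retention);
id_long  = repmat(id, T, 1);          % ids repeated once per year block
t_long   = kron((1:T)', ones(n,1));   % year index 1..T, n rows per year
ret_long = retention(:);              % column-major stacking, same order
keep     = sam(:) == 1;               % assumption: keep started years only
long     = [id_long(keep), t_long(keep), ret_long(keep)];
```

Each row of long then holds a student identifier, the year index within
high school (1 to 11), and the retention dummy for that year.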
The data are in the zipped folder cpb-data.zip, which contains the
dataset both in Matlab format (data_wide.mat) and as an ASCII file in
DOS format (data_wide.csv).
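For users who prefer the ASCII file, a minimal sketch of reading it in
Matlab is given below. It assumes that data_wide.csv is comma-delimited
and holds the same 3933x301 matrix as data_wide.mat. Whether the file
has a header row is not stated here, so the sketch relies on readmatrix
(available from Matlab R2019a) to detect any header lines.

```matlab
% Read the ASCII version of the dataset.
% Assumption: comma-delimited numeric content, same layout as data_wide.mat.
csvdata = readmatrix('data_wide.csv');

% Check that the dimensions match those documented for the wide matrix.
if ~isequal(size(csvdata), [3933 301])
    warning('data_wide.csv has size %d x %d, expected 3933 x 301.', ...
        size(csvdata, 1), size(csvdata, 2));
end
```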
The zipped folder cpb-estimation.zip contains the Matlab files used to
estimate the benchmark model. The file data.m loads the dataset
data_wide.mat, creates the variables needed to construct the
log-likelihood function, and starts the minimization using the
analytical derivatives in g_function_r.m, which contains minus the
log-likelihood. The analytical derivatives in g_function_r.m were
compiled using ADiMat. The file function_r.m contains minus the
log-likelihood and can be used for minimization with fminunc with
numerical derivatives. The estimation outputs are stored by data.m in
the subfolder 'results'.
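To illustrate what minimization with fminunc and numerical derivatives
looks like, a toy sketch follows. It does not call function_r.m, whose
input arguments are not documented here. Instead, negloglik is a
hypothetical stand-in: minus the log-likelihood of a simple logit model
on simulated data. The sketch requires the Optimization Toolbox.

```matlab
% Toy example only: negloglik is NOT the function in function_r.m.
rng(1);                                    % reproducible simulated data
n         = 500;
X         = [ones(n,1), randn(n,1)];       % constant and one regressor
beta_true = [0.5; -1];
y         = double(rand(n,1) < 1 ./ (1 + exp(-X*beta_true)));

% Minus the log-likelihood of a logit model.
negloglik = @(b) -sum(y .* (X*b) - log(1 + exp(X*b)));

% No gradient is supplied, so fminunc uses finite differences.
opts = optimoptions('fminunc', 'Algorithm', 'quasi-newton', ...
    'SpecifyObjectiveGradient', false, 'Display', 'off');
[b_hat, fval, exitflag] = fminunc(negloglik, zeros(2,1), opts);
```

Supplying analytical derivatives, as data.m does through g_function_r.m,
would instead set 'SpecifyObjectiveGradient' to true and return the
gradient as a second output of the objective function.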