{ "cells": [ { "cell_type": "markdown", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ "# This Notebook:\n", "A small Monte Carlo study of the performance of the log likelihood appraoch based on the normal normal model. The Chetty polynomial estimator is calculated as well in all cases.\n", "\n", "alpha = **0.05** (in this one!)\n", "\n", "We have first a bunch of declaration (functions etc...), then a presentation of the data, and then some calculations.\n", "The calculations are slightly more extensive than those presented in the text of the paper. This may convince the reader that we have carried out due diligence..." ] }, { "cell_type": "markdown", "metadata": { "tags": [] }, "source": [ "## BiModal Mixture Normal Normal case" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import pandas as pd\n", "import os" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/html": [ " $\\newcommand{\\Bold}[1]{\\mathbf{#1}}\\verb|/cygdrive/c/Users/gala0013/Dropbox/sagemath/alternativeETI|$ " ], "text/latex": [ "\\begin{math}\n", "\\newcommand{\\Bold}[1]{\\mathbf{#1}}\\verb|/cygdrive/c/Users/gala0013/Dropbox/sagemath/alternativeETI|\n", "\\end{math}" ], "text/plain": [ "'/cygdrive/c/Users/gala0013/Dropbox/sagemath/alternativeETI'" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "tt=os.getcwd()\n", "show(tt)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Tax Info" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "# this is for UMea...what is the swedish average...?\n", "taxinfo = [ [2001,0.677,0.477,252000],\n", " [2002,0.677,0.477,273800],\n", " [2003,0.677,0.477,284300],\n", " [2004,0.677,0.477,291800],\n", " [2005,0.669,0.469,298600],\n", " [2006,0.669,0.469,306000],\n", " [2007,0.669,0.469,316700], \n", " [2008,0.669,0.469,328800],\n", " [2009,0.669,0.469,367600],\n", " [2010,0.669,0.469,372100],\n", " [2011,0.669,0.469,383000],\n", " [2012,0.669,0.469,401100],\n", " [2013,0.669,0.469,413200],\n", " [2014,0.664,0.464,420800],\n", " [2015,0.664,0.464,430200],\n", " [2016,0.6635,0.4635,430200],\n", " [2017,0.6585,0.46585,438900],\n", " [2018,0.6585,0.46585,455300],\n", " [2019,0.6585,0.46585,490700]\n", " ]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "# so how do we select a symmetric range around the kink...?\n", "# we check the distance from the kink to the min or the max, \n", "# and select only if the difference from the kink \n", "# is less that the minimum between those 2 differences... simple...\n", "year = 2019\n", "Info_in_year = flatten([ nn for nn in taxinfo if nn[0]==year])\n", "Kink = Info_in_year[3]/1000.\n", "lnk=ln(Kink).n()" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/html": [ " $\\newcommand{\\Bold}[1]{\\mathbf{#1}}\\verb|/cygdrive/c/Users/gala0013/Dropbox/sagemath/alternativeETI|$ " ], "text/latex": [ "\\begin{math}\n", "\\newcommand{\\Bold}[1]{\\mathbf{#1}}\\verb|/cygdrive/c/Users/gala0013/Dropbox/sagemath/alternativeETI|\n", "\\end{math}" ], "text/plain": [ "'/cygdrive/c/Users/gala0013/Dropbox/sagemath/alternativeETI'" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "os.chdir(tt)\n", "tt2=os.getcwd()\n", "show(tt2)" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "#pl1,pl2,ps,ps1,ps2,pr12,lw,delta = var('pl1,pl2,ps,ps1,ps2,pr12,lw,delta')\n", "#delta = (STEP_BIN*.99999)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Tax parameters:" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "# this is complicated since their application aggregates many years...here I take the rates for 2004...\n", "lnt1c = log(0.6585)\n", "lnt2c = log(0.465850000000000)\n", "Kink = exp(ln(490))\n", "lnk = ln(Kink).n()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Range" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### we simulate approximately only between ln(Kink-70000) and ln(Kink+70000)." ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "lnebo = ln(Kink+70).n()+0.0001\n", "lnebu = ln(Kink-70).n()-0.0001" ] }, { "cell_type": "markdown", "metadata": { "tags": [] }, "source": [ "## Simulating data" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "tags": [] }, "outputs": [], "source": [ "# number of replications\n", "Nrep = 1001\n", "# trying to get close to 15000 observations...\n", "Nsim = round(15000*1.125)\n", "init_seed_GL = 91503022025\n", "# one unit between bins...a unit may be worth 1000SEK or more or less\n", "STEP_BIN = 1 \n", "delta = (STEP_BIN*.99999)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Collecting Results" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [], "source": [ "file_collect_res_ll = \"collect_res_ll_pareto_nor_05_23042023\"\n", "file_collect_res_ch = \"collect_res_ch_pareto_nor_05_23042023\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Several scenarii" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Homegenous ETI, Normal-Normal case" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [], "source": [ "#alpha!\n", "#a = 0.05\n", "\n", "#mu_nor_nor = 6.\n", "#mu_nor_nor = mu_nor_nor.n()\n", "#se_nor_nor = .25\n", "#s_nor_nor = 1/se_nor_nor\n", "\n", "#mese = 0.005\n", "#memu = -0.5*mese*mese" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Homegenous ETI, Bimodal Normal-Normal case" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [], "source": [ "#a = 0.1\n", "\n", "# these numbers are such that the normal model will not work well...\n", "#mu_bimod_nor_nor = [5.95,6.35]\n", "#se_bimod_nor_nor = [0.1,0.1]\n", "#sshare_bimod_nor_nor = 0.6\n", "#sshare_bimod_nor_nor = sshare_bimod_nor_nor.n()\n", "#s = 1/se\n", "\n", "#mese = 0.005\n", "#memu = -0.5*mese*mese" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Homegenous ETI, Pareto-Normal case" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [], "source": [ "a = 0.05\n", "\n", "mu_paret_nor = 0\n", "se_paret_nor = 1\n", "\n", "paretExp_paret_nor = 3.\n", "\n", "mese = 0.005\n", "memu = -0.5*mese*mese\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Cython code..." ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Compiling ./GLbunching2.spyx...\n" ] } ], "source": [ "#compile some useful routines\n", "#load(\"~/Users/gauthier/Dropbox/sagemath/alternativeETI/GLbunching.spyx\")\n", "#load('C:\\Users\\gala0013\\Dropbox\\sagemath\\alternativeETI\\GLbunching.spyx')\n", "load(\"GLbunching2.spyx\")\n" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [], "source": [ "#%%cython\n", "load(\"chetty_bunch.py\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Least Squares and Likelihood Routines\n", "The routines use various parametrisation. This matters when calculating the second derivatives and the ML precision." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Tax Info" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [], "source": [ "#newd = \"C:\\\\Users\\\\gala0013\\\\Dropbox\\\\sagemath\\\\alternativeETI\"\n", "#os.chdir(newd)" ] }, { "cell_type": "markdown", "metadata": { "tags": [] }, "source": [ "## Least Squares and Likelihood Routines\n", "The routines use various parametrisation. This matters when calculating the second derivatives and the ML precision." ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "tags": [] }, "outputs": [], "source": [ "%%cython\n", "clib: m\n", "\n", "cdef extern from \"math.h\":\n", " double sin(double x)\n", " double exp(double x)\n", " double log(double x)\n", " double sqrt(double x)\n", " double erf(double x)\n", " double tanh(double x)\n", " double pow(double x, double y)\n", " double fmax(double x, double y)\n", " double fmin(double x, double y)\n", " \n", " \n", "def grd(faf,x0,*args):\n", " x00 = [j for j in x0]\n", " r = len(x00)\n", " faf0 = faf(x00,*args)\n", " grd0 = [0.] * r\n", " petit = 1.3e-6\n", " for i in range(r):\n", " x = [j for j in x0]\n", " x[i]=x[i]*(1.+petit)+petit\n", " grd0[i] = (faf(x,*args)-faf0)/(x[i]-x00[i])\n", " \n", " return grd0\n", "\n", "# central gradient...\n", "def grdcd(faf,x0,*args):\n", " x00 = [j for j in x0]\n", " r = len(x00)\n", " \n", " grd0 = [0.] * r\n", " petit = 1.3e-6\n", " for i in range(r):\n", " x = [j for j in x0]\n", " xi = x[i]\n", " x[i] = (1.0+petit)*xi+petit\n", " fafp = faf(x,*args)\n", " dx = 2.0*petit*xi+2.0*petit\n", " x[i] = (1.0-petit)*xi-petit\n", " fafm = faf(x,*args)\n", " grd0[i] = (fafp-fafm)/(dx)\n", " \n", " return grd0\n", "\n", "# central gradient...long version\n", "# faf returns a 'vector' of size nfaf\n", "def grdcd_long(faf,x0,nfaf,*args): \n", " x00 = [j for j in x0]\n", " r = len(x00)\n", " #import pdb; pdb.set_trace() \n", " grd0 = [[0.]*nfaf] * r\n", " petit = 1.3e-6\n", " for i in range(r):\n", " x = [j for j in x0]\n", " xi = x[i]\n", " x[i] = (1.0+petit)*xi+petit\n", " fafp = faf(x,*args)\n", " dx = 2.0*petit*xi+2.0*petit\n", " x[i] = (1.0-petit)*xi-petit\n", " fafm = faf(x,*args)\n", " grd0[i] = [ (fafp[j]-fafm[j])/(dx) for j in range(nfaf)]\n", " \n", " return grd0\n", "\n", "\n", "#central hessian...\n", "def hescd(faf,x0,*args):\n", " x00 = [j for j in x0]\n", " r = len(x00)\n", " faf0 = faf(x00,*args)\n", " dh0 = [0.] * r\n", " hes0 = [[0. for j in range(r)] for l in range(r)]\n", " petit = 1.03e-4\n", " for i in range(r):\n", " dh0[i] = (x00[i]*(1.+petit)+petit)-x00[i]\n", "\n", " #print(hes0)\n", " #petit = 1.3e-5\n", " for i in range(r):\n", " for k in range(i,r):\n", " if i==k:\n", " x1 = [j for j in x00]\n", " x1[i] = x1[i] + 2*dh0[i]\n", " fafp2 = faf(x1,*args)\n", " x1 = [j for j in x00]\n", " x1[i] = x1[i] + dh0[i]\n", " fafp1 = faf(x1,*args) \n", " x1 = [j for j in x00]\n", " x1[i] = x1[i] - dh0[i]\n", " fafm1 = faf(x1,*args) \n", " x1 = [j for j in x00]\n", " x1[i] = x1[i] - 2*dh0[i]\n", " fafm2 = faf(x1,*args)\n", " hes0[i][k]=(-fafp2+16*fafp1-30*faf0+16*fafm1-fafm2)/12/dh0[i]/dh0[i]\n", " else:\n", " x1 = [j for j in x00]\n", " x1[i] = x1[i] + dh0[i]\n", " x1[k] = x1[k] + dh0[k]\n", " fafp1p1 = faf(x1,*args)\n", " x1 = [j for j in x00]\n", " x1[i] = x1[i] + dh0[i]\n", " x1[k] = x1[k] - dh0[k]\n", " fafp1m1 = faf(x1,*args)\n", " x1 = [j for j in x00]\n", " x1[i] = x1[i] - dh0[i]\n", " x1[k] = x1[k] + dh0[k]\n", " fafm1p1 = faf(x1,*args)\n", " x1 = [j for j in x00]\n", " x1[i] = x1[i] - dh0[i]\n", " x1[k] = x1[k] - dh0[k]\n", " fafm1m1 = faf(x1,*args) \n", " hes0[i][k]=(fafp1p1-fafp1m1-fafm1p1+fafm1m1)/4/dh0[i]/dh0[k]\n", " hes0[k][i]=hes0[i][k]\n", " \n", " return hes0\n" ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "tags": [] }, "outputs": [], "source": [ "# predicts the distribution\n", "def pred_f(pprm):\n", " global wxw\n", " qq = param(pprm)\n", " return ff0n(wxw,*qq)\n", "\n", "# predict the distribution and a naive confidence interval around it!\n", "def pred_se(qq,hhiovn,talf):\n", " global wxw\n", " tse = [0 for n in range(len(dens))]\n", " td = [[0,0] for n in range(len(dens))]\n", " t = [[0,0] for n in range(len(dens))]\n", " tu = [[0,0] for n in range(len(dens))]\n", " \n", " for n in range(0, len(dens)):\n", " wxw = dens[n][0]\n", " g = matrix(grd(pred_f,qq))\n", " #print(type(g))\n", " #print(g)\n", " st = (g*hhiovn)*transpose(g)\n", " tse[n] = sqrt(st[0][0])\n", " t[n][0] = wxw \n", " t[n][1] = pred_f(qq)\n", " td[n][0] = wxw\n", " td[n][1] = t[n][1] - talf*tse[n]\n", " tu[n][0] = wxw\n", " tu[n][1] = t[n][1] + talf*tse[n]\n", " \n", " return {'pred':t, 'CId':td ,'CIu':tu }\n", "\n", "#formats result...very crude...\n", "def res_lik_print0(likfct,qq,hhi,sN,N,ttl,*args):\n", " se = [ xx/sN for xx in map(sqrt,hhi.diagonal())]\n", " se = [xx.n(digits=5) for xx in se]\n", "\n", " print(ttl)\n", " print(\"Parameter Estimate Std.Err.\")\n", " if len(qq)==3:\n", " print(table(list(zip([\"eti\",\"mu\",\"sigma\"],qq.n(digits=4),se))))\n", " elif len(qq)==4:\n", " print(table(list(zip([\"eti\",\"mu\",\"sigma1\",\"v\"],qq.n(digits=4),se))))\n", " elif len(qq)==5:\n", " print(table(list(zip([\"eti\",\"mu\",\"sigma1\",\"sigma_alf\",\"v\"],qq.n(digits=4),se)))) \n", " \n", " ll = -N*likfct(qq,*args)\n", " ll = float(ll)\n", " #print(type(ll))\n", " #print(int(N))\n", " #\n", " print(table([[int(N)],[ll]],header_column=['number of observations','log_likelihood'],align='right'))\n", "\n", " return qq,se,ll" ] }, { "cell_type": "markdown", "metadata": { "tags": [] }, "source": [ "# Model with Imperfect Bunching. \n", "## model of density throughout. Log Normal-Log Normal model..." ] }, { "cell_type": "markdown", "metadata": { "tags": [] }, "source": [ "We start with some more routines..." ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "tags": [] }, "outputs": [], "source": [ "%%cython\n", "def lnwh_obs(gNsim,lnk,lnwh1,lnwh2):\n", " cdef int i\n", " lnwh = [0]*gNsim\n", " for i in range(gNsim):\n", " if lnwh1[i]lnk:\n", " lnwh[i] = lnwh2[i]\n", " else:\n", " lnwh[i] = lnk\n", " \n", " return lnwh" ] }, { "cell_type": "code", "execution_count": 20, "metadata": { "tags": [] }, "outputs": [], "source": [ "%%cython\n", "\n", "clib: m\n", "\n", "cdef extern from \"math.h\":\n", " double sin(double x)\n", " double exp(double x)\n", " double log(double x)\n", " double sqrt(double x)\n", " double erf(double x)\n", " double tanh(double x)\n", " double pow(double x, double y)\n", " double fmax(double x, double y)\n", " double fmin(double x, double y)\n", "\n", "def param1(pprm,delta,lnt1c,lnt2c,lnebu,lnebo,Kink,lnk):\n", " pps = exp(fmax(fmin(pprm[2],5.),-5))\n", " ppl2 = abs(pprm[0])*lnt2c + pprm[1]\n", " ppl2 = pps*ppl2\n", " ppl1 = abs(pprm[0])*lnt1c + pprm[1]\n", " ppl1 = pps*ppl1 \n", " \n", " return (ppl1,ppl2,pps)\n", " \n", "def paramib(pprm,delta,lnt1c,lnt2c,lnebu,lnebo,Kink,lnk):\n", " #pprm[1] = fmax(fmin(pprm[1],5.),-5)\n", " #pl1 = abs(pprm[0])*lnt1c + lnebu + (lnebo-lnebu)/(1+exp(-pprm[1]))\n", " #pl2 = abs(pprm[0])*lnt2c + lnebu + (lnebo-lnebu)/(1+exp(-pprm[1]))\n", " pl1 = abs(pprm[0])*lnt1c + pprm[1]\n", " pl2 = abs(pprm[0])*lnt2c + pprm[1]\n", " \n", " if len(pprm)==4:\n", " #homogenous model with imperfect bunching\n", " ps1 = exp(-fmax(fmin(pprm[2],5.),-5)) \n", " ps2 = ps1\n", " vs = exp(fmax(fmin(pprm[3],10.),-5)) \n", " return [pl1/ps1,pl2/ps2,1/ps1,1/ps2,1.,vs]\n", " \n", " elif len(pprm)==5:\n", " # heterogenous model with imperfect bunching \n", " # independence between disutility and response remains equal to 0\n", " prr = 0.0\n", " #elif len(pprm)==5:\n", " # prr = tanh(pprm[4])\n", " \n", " # variance of eti can't be too large under the normal model...\n", " sa = abs(pprm[0])/2.5/(1+exp(-fmax(fmin(pprm[3],5.),-5)))\n", " st = exp(-fmax(fmin(pprm[2],5.),-5)) \n", " \n", " #ps1 = pprm[2]^2 + 2.*prr*abs(pprm[2]*pprm[3])*lnt1c + (lnt1c*lnt1c)*(pprm[3]^2)\n", " ps1 = st^2 + 2.*prr*abs(st*sa)*lnt1c + (lnt1c*lnt1c)*(sa^2)\n", " ps1 = sqrt(ps1)\n", " ps2 = st^2 + 2.*prr*abs(st*sa)*lnt2c + (lnt2c*lnt2c)*(sa^2)\n", " ps2 = sqrt(ps2)\n", " pc12= st^2 + prr*abs(st*sa)*(lnt1c+lnt2c) + (lnt1c*lnt2c)*(sa^2)\n", " pc12= pc12/ps1/ps2\n", " \n", " vs = exp(fmax(fmin(pprm[4],10.),-5))\n", " #print(pc12)#\n", " pc12 = fmin(fmax(pc12,-1.+1e-5),1.-1e-5)\n", " #pc12 = pc12/ps1/ps2\n", " return [pl1/ps1,pl2/ps2,1/ps1,1/ps2,pc12,vs]" ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "tags": [] }, "outputs": [], "source": [ "import math \n", "\n", "def ff0n(w,pl1,pl2,ps,delta,lnt1c,lnt2c,lnebu,lnebo,Kink,lnk):\n", " ppprm = param1([pl1,pl2,ps])\n", " ppl1 = ppprm[0]\n", " ppl2 = ppprm[1]\n", " pps = ppprm[2] \n", " cndI = fmax(cdfn(pps*lnebo-ppl2)-cdfn(pps*lnebu-ppl1),0.0000001)\n", " \n", " if math.log(w)=log(Kink-delta):\n", " return (cdfn(pps*lnk-ppl2)-cdfn(pps*log(Kink-delta)-ppl1))/cndI \n", "\n", " if math.log(w)>lnk:\n", " return pdfn(pps*math.log(w)-ppl2)*pps/w/cndI \n", "\n", "def ff0na(w,pprm,delta,lnt1c,lnt2c,lnebu,lnebo,Kink,lnk):\n", " ppprm = param1(pprm)\n", " pl1 = ppprm[0]\n", " pl2 = ppprm[1]\n", " ps = ppprm[2]\n", "\n", " lw = math.log(w)\n", " lKmd = math.log(Kink-delta)\n", " \n", " if lwlnk:\n", " return pdfn(ps*lw-pl2)*ps/w\n", " if lw<=lnk and lw>=lKmd:\n", " return (cdfn(ps*lnk-pl2)-cdfn(ps*lKmd-pl1))\n", "\n", "def prob_int(pprm,delta,lnt1c,lnt2c,lnebu,lnebo,Kink,lnk):\n", " # this parametrisation is better behaved...\n", " # this feels very ad hoc but it works (at least here!)\n", " ppprm = param1(pprm)\n", " ppl1 = ppprm[0]\n", " ppl2 = ppprm[1]\n", " pps = ppprm[2]\n", " \n", " cndI = (cdfn(pps*lnebo-ppl2)-cdfn(pps*lnebu-ppl1))\n", " return cndI\n", "\n", "\n", "def h0iba1(ww,pl1,pl2,ps1,ps2,pr12,vs,delta,lnt1c,lnt2c,lnebu,lnebo,Kink,lnk):\n", " # density as in the paper...\n", " #import pdb; pdb.set_trace() \n", " lnw = math.log(ww)\n", " ffss1 = ps1\n", " ffss12=ffss1*ffss1\n", " ffmm1 = ffss1*lnk-pl1\n", " ffss2 = ps2\n", " ffss22=ffss2*ffss2\n", " ffmm2 = ffss2*lnk-pl2 \n", " evs = vs\n", " evs2 = evs*evs\n", " return (ffss1/sqrt(evs2+ffss12)*pdfn((evs*ffss1*lnw-(pl1*evs-1/2*(ffss1/evs)))/sqrt(evs2+ffss12))*\\\n", " cdfn((-evs2*lnw +(evs2+ffss12)*lnk-1/2-pl1*ffss1)/sqrt(evs2+ffss12))/ww)\n", "\n", "def h0iba2(ww,pl1,pl2,ps1,ps2,pr12,vs,delta,lnt1c,lnt2c,lnebu,lnebo,Kink,lnk):\n", " # density as in the paper...\n", " #import pdb; pdb.set_trace() \n", " lnw = math.log(ww)\n", " ffss1 = ps1\n", " ffss12=ffss1*ffss1\n", " ffmm1 = ffss1*lnk-pl1\n", " ffss2 = ps2\n", " ffss22=ffss2*ffss2\n", " ffmm2 = ffss2*lnk-pl2 \n", " evs = vs\n", " evs2 = evs*evs\n", " return (ffss2/sqrt(evs2+ffss22)*pdfn((evs*ffss2*lnw-(pl2*evs-1/2*(ffss2/evs)))/sqrt(evs2+ffss22))*\\\n", " cdfn((evs2*lnw-(evs2+ffss22)*lnk+1/2+pl2*ffss2)/sqrt(evs2+ffss22))/ww)\n", "\n", "def h0ibak(ww,pl1,pl2,ps1,ps2,pr12,vs,delta,lnt1c,lnt2c,lnebu,lnebo,Kink,lnk):\n", " # density as in the paper...\n", " #import pdb; pdb.set_trace() \n", " lnw = math.log(ww)\n", " ffss1 = ps1\n", " ffss12=ffss1*ffss1\n", " ffmm1 = ffss1*lnk-pl1\n", " ffss2 = ps2\n", " ffss22=ffss2*ffss2\n", " ffmm2 = ffss2*lnk-pl2 \n", " evs = vs\n", " evs2 = evs*evs\n", " if (pr12>=1):\n", " FFk= cdfn(ffmm2)-cdfn(ffmm1)\n", " else:\n", " FFk= cdfn(ffmm2)-cdfbvn(ffmm1,ffmm2,pr12)\n", " return (pdfn(evs*(lnw-lnk)+1/2/evs)/ww*FFk)\n", " #return FFk/evs\n", "\n", "def h0iba(ww,pl1,pl2,ps1,ps2,pr12,vs,delta,lnt1c,lnt2c,lnebu,lnebo,Kink,lnk):\n", " # density as in the paper...\n", " #import pdb; pdb.set_trace() \n", " lnw = math.log(ww)\n", " ffss1 = ps1\n", " ffss12=ffss1*ffss1\n", " ffmm1 = ffss1*lnk-pl1\n", " ffss2 = ps2\n", " ffss22=ffss2*ffss2\n", " ffmm2 = ffss2*lnk-pl2 \n", " evs = vs\n", " evs2 = evs*evs\n", " if (pr12>=1):\n", " FFk= cdfn(ffmm2)-cdfn(ffmm1)\n", " else:\n", " FFk= cdfn(ffmm2)-cdfbvn(ffmm1,ffmm2,pr12)\n", " #if tanh(pr12)<0.9999:\n", " # FFk= cdfN(ffmm2)-cdfbvn(ffmm1,ffmm2,tanh(pr12))\n", " #else:\n", " # FFk= cdfN(ffmm2)-cdfN(ffmm1)\n", " #print(FFk.n())\n", " # removed *evs\n", " return (pdfn(evs*(lnw-lnk)+1/2/evs)/ww*FFk+\\\n", " ffss2/sqrt(evs2+ffss22)*pdfn((evs*ffss2*lnw-(pl2*evs-1/2*(ffss2/evs)))/sqrt(evs2+ffss22))*\\\n", " cdfn((evs2*lnw-(evs2+ffss22)*lnk+1/2+pl2*ffss2)/sqrt(evs2+ffss22))/ww+ \\\n", " ffss1/sqrt(evs2+ffss12)*pdfn((evs*ffss1*lnw-(pl1*evs-1/2*(ffss1/evs)))/sqrt(evs2+ffss12))*\\\n", " cdfn((-evs2*lnw +(evs2+ffss12)*lnk-1/2-pl1*ffss1)/sqrt(evs2+ffss12))/ww)*evs\n", "\n", "\n", "def cndI0ib(prm,delta,lnt1c,lnt2c,lnebu,lnebo,Kink,lnk):\n", " pprm = paramib(prm,delta,lnt1c,lnt2c,lnebu,lnebo,Kink,lnk)\n", "\n", " cndI = prob_alt(math.exp(lnebo),*pprm,delta,lnt1c,lnt2c,lnebu,lnebo,Kink,lnk)-prob_alt(math.exp(lnebu),*pprm,delta,lnt1c,lnt2c,lnebu,lnebo,Kink,lnk)\n", " \n", " return cndI\n", "\n", " \n", "def prob_alt(ww,pl1,pl2,ps1,ps2,pr12,vs,delta,lnt1c,lnt2c,lnebu,lnebo,Kink,lnk):\n", " #a formal expression...\n", " lnw = math.log(ww)\n", " ffss1 = ps1\n", " ffss12=ffss1*ffss1\n", " ffmm1 = ffss1*lnk-pl1\n", " ffss2 = ps2\n", " ffss22=ffss2*ffss2\n", " ffmm2 = ffss2*lnk-pl2 \n", " evs = vs\n", " evs2 = evs*evs\n", " #import pdb; pdb.set_trace()\n", " FFk= cdfn(ffmm2)-cdfn(ffmm1)\n", " #show(FFk)\n", " P0 = FFk*cdfn(evs*(lnw-lnk)+1/2/evs)\n", " \n", " #show(ffss1/sqrt(evs2+ffss12))\n", " y2 = (evs*ffss1*lnw-(pl1*evs-1/2*(ffss1/evs)))/sqrt(evs2+ffss12) \n", " P1 = cdfbvn(ffmm1,y2,evs/sqrt(evs2+ffss12))\n", "\n", " y2 = (evs*ffss2*lnw-(pl2*evs-1/2*(ffss2/evs)))/sqrt(evs2+ffss22) \n", " #y1 = (evs*ffss2*lnk-(pl2*evs-1/2*(ffss2/evs)))/sqrt(evs2+ffss22) \n", " P2 = cdfbvn(-ffmm2,y2,-evs/sqrt(evs2+ffss22))\n", " #-cdfbvn(-ffmm2,y1,-evs/sqrt(evs2+ffss22))\n", " #show((P0.n(),P1.n(),P2.n()))\n", " #+P2\n", " return (P0+P1+P2) " ] }, { "cell_type": "code", "execution_count": 22, "metadata": { "tags": [] }, "outputs": [], "source": [ "import math\n", "\n", "def lik0(pprm,densi,delta,lnt1c,lnt2c,lnebu,lnebo,Kink,lnk): \n", " #ppprm = param0(pprm)\n", " ppprm = param1(pprm,delta,lnt1c,lnt2c,lnebu,lnebo,Kink,lnk)\n", " ppl1 = ppprm[0]\n", " ppl2 = ppprm[1]\n", " pps = ppprm[2]\n", "\n", "# print(pl1,pl2)\n", " #cndI = (cdfn(pps*lnebo-ppl2)-cdfn(pps*lnebu-ppl1))\n", " cndI = max(cdfn(pps*lnebo-ppl2)-cdfn(pps*lnebu-ppl1),0.000000001) \n", " if cndI<1.e-9:\n", " cndI = 1.e-9\n", " print(pprm,cndI)\n", " \n", " ores = [ n[1]*math.log(ff0na(n[0],pprm,delta,lnt1c,lnt2c,lnebu,lnebo,Kink,lnk)/cndI) for n in densi ]\n", " #print(res)\n", " #res = sum(res).n()-ln(cndI).n()\n", " return -sum(ores)\n", "\n", "# least square with minimize; rough and ready!\n", "def lsq0(pprm,densi,delta,lnt1c,lnt2c,lnebu,lnebo,Kink,lnk):\n", " #ppprm = param0(pprm)\n", " ppprm = param1(pprm,delta,lnt1c,lnt2c,lnebu,lnebo,Kink,lnk)\n", " ppl1 = ppprm[0]\n", " ppl2 = ppprm[1]\n", " pps = ppprm[2]\n", "\n", " #cndI = cdfn(pps*lnebo-ppl2)-cdfn(pps*lnebu-ppl1)\n", " cndI = max(cdfn(pps*lnebo-ppl2)-cdfn(pps*lnebu-ppl1),0.000000001) \n", " if cndI<1.e-9:\n", " cndI = 1.e-9 \n", " #print(pprm,cndI)\n", " \n", " ores = [ (n[1]-(ff0na(n[0],pprm,delta,lnt1c,lnt2c,lnebu,lnebo,Kink,lnk)/cndI))^2 for n in densi ]\n", " ores = sum(ores)\n", " #print(pprm,res)\n", " return ores \n", "\n", "def lik0ib(prm,densi,delta,lnt1c,lnt2c,lnebu,lnebo,Kink,lnk):\n", " pprm = paramib(prm,delta,lnt1c,lnt2c,lnebu,lnebo,Kink,lnk) \n", " cndI = prob_alt(math.exp(lnebo),*pprm,delta,lnt1c,lnt2c,lnebu,lnebo,Kink,lnk)-prob_alt(math.exp(lnebu),*pprm,delta,lnt1c,lnt2c,lnebu,lnebo,Kink,lnk)+0.00001\n", " #show(cndI)\n", " #import pdb; pdb.set_trace()\n", " ores = 0 \n", " p_bef = prob_alt(math.exp(lnebu),*pprm,delta,lnt1c,lnt2c,lnebu,lnebo,Kink,lnk)\n", " longueur = len(densi)\n", " for ii in range(longueur):\n", " p_at = prob_alt(densi[ii][0],*pprm,delta,lnt1c,lnt2c,lnebu,lnebo,Kink,lnk)\n", " #if ((p_at-p_bef)/cndI)<1e-10:\n", " # #show([ii,(p_at-p_bef),cndI,densi[ii]],prm,pprm)\n", " # ores -= densi[ii][1]*(-25)\n", " #else:\n", " #ores -= densi[ii][1]*math.log((p_at-p_bef)/cndI) \n", " ores -= densi[ii][1]*(math.log((p_at-p_bef))- math.log(cndI) )\n", " p_bef = p_at\n", " \n", " return ores\n", "\n", "def lik0ib_long(prm,densi,delta,lnt1c,lnt2c,lnebu,lnebo,Kink,lnk):\n", " pprm = paramib(prm,delta,lnt1c,lnt2c,lnebu,lnebo,Kink,lnk) \n", " \n", " cndI = prob_alt(math.exp(lnebo),*pprm,delta,lnt1c,lnt2c,lnebu,lnebo,Kink,lnk)-prob_alt(math.exp(lnebu),*pprm,delta,lnt1c,lnt2c,lnebu,lnebo,Kink,lnk)+0.00001\n", " \n", " p_bef = prob_alt(math.exp(lnebu),*pprm,delta,lnt1c,lnt2c,lnebu,lnebo,Kink,lnk)\n", " longueur = len(densi)\n", " ores = [0 for n in range(longueur)]\n", " for ii in range(longueur):\n", " p_at = prob_alt(densi[ii][0],*pprm,delta,lnt1c,lnt2c,lnebu,lnebo,Kink,lnk) \n", " ores[ii] = -densi[ii][1]*(math.log((p_at-p_bef))- math.log(cndI) )\n", " p_bef = p_at\n", " \n", " return ores\n", "\n", "def lsq0ib(prm,densi,delta,lnt1c,lnt2c,lnebu,lnebo,Kink,lnk):\n", " pprm = paramib(prm,delta,lnt1c,lnt2c,lnebu,lnebo,Kink,lnk)\n", "\n", " cndI = (prob_alt(math.exp(lnebo),*pprm,delta,lnt1c,lnt2c,lnebu,lnebo,Kink,lnk)-prob_alt(math.exp(lnebu),*pprm,delta,lnt1c,lnt2c,lnebu,lnebo,Kink,lnk)+0.00001)*STEP_BIN\n", "\n", " ores = 0 \n", " p_bef = prob_alt(exp(lnebu),*pprm,delta,lnt1c,lnt2c,lnebu,lnebo,Kink,lnk)\n", " longueur = len(densi)\n", " for ii in range(longueur):\n", " p_at = prob_alt(densi[ii][0],*pprm,delta,lnt1c,lnt2c,lnebu,lnebo,Kink,lnk)\n", " ores += (densi[ii][1]-((p_at-p_bef)/cndI))^2\n", " p_bef = p_at\n", "\n", " return ores\n", "\n", "def cndI0ib(prm,delta,lnt1c,lnt2c,lnebu,lnebo,Kink,lnk):\n", " #x = var('x')\n", " pprm = paramib(prm,delta,lnt1c,lnt2c,lnebu,lnebo,Kink,lnk)\n", " cndI = prob_alt(math.exp(lnebo),*pprm,delta,lnt1c,lnt2c,lnebu,lnebo,Kink,lnk)-prob_alt(math.exp(lnebu),*pprm,delta,lnt1c,lnt2c,lnebu,lnebo,Kink,lnk)+0.00001\n", " \n", " return cndI" ] }, { "cell_type": "code", "execution_count": 23, "metadata": { "tags": [] }, "outputs": [], "source": [ "def mak_vc(loglik,loglik_long,x,lendti,N,stp_bn,*args):\n", " #construct the variance covariance of the ML estimator...\n", " #Gradients, Hessian, Information matrices\n", " #adjusting for density etc...\n", " G_all = stp_bn*matrix(RDF,grdcd(loglik,x,*args))\n", " hh0ib = stp_bn*matrix(RDF,hescd(loglik,x,*args))\n", " hh0ibi = hh0ib.inverse()\n", " G00 = grdcd_long(loglik_long,x,lendti,*args)\n", " G0ib = stp_bn*matrix(RDF,G00) \n", " jj0ib = N*G0ib*G0ib.transpose()\n", " return {\n", " 'G_all':G_all,\n", " 'hh0ib':hh0ib,\n", " 'hh0ibi':hh0ibi,\n", " 'jj0ib':jj0ib\n", " }" ] }, { "cell_type": "code", "execution_count": 24, "metadata": { "tags": [] }, "outputs": [], "source": [ "def mak_chi2(datai,N,x,*par):\n", " #make a chi squared stat...\n", " OO2 = [ [nn[0],nn[1]*totnobs] for nn in datai] \n", " TT2 = [ [n[0],float(h0iba(n[0],*paramib(x,*par),*par))/cndI0ib(x,*par)*N] for n in datai ]\n", " csq = [ float((OO2[ii][1]-TT2[ii][1])^2/TT2[ii][1]) for ii in range(len(datai))]\n", " csq2 = [ [OO2[ii][0],float((OO2[ii][1]-TT2[ii][1])^2/TT2[ii][1])] for ii in range(len(datai))]\n", " return {\n", " 'Observed #':OO2,\n", " 'Theoretical #':TT2,\n", " 'Chi_squared stat': sum(csq),\n", " 'obs vs ind chi2 stat': csq2\n", " }" ] }, { "cell_type": "code", "execution_count": 25, "metadata": { "tags": [] }, "outputs": [], "source": [ "def mak_OLS_R2(O2,T2):\n", " data = [ (T2[ii][1],O2[ii][1]) for ii in range(len(O2))]\n", " N = sum([n[1] for n in O2])\n", " # design a model with adjustable parameters a,b,c that describes the data\n", " a,b,x = var('a, b, x')\n", " model(x) = a * x + b\n", "\n", " # calculate the values of the parameters that best fit the model to the data\n", " sol = find_fit(data,model)\n", " # define f(x), the model with the parameters set to the fitted values\n", " f(x) = model(a=sol[0].rhs(),b=sol[1].rhs())\n", "\n", " # create an empty plot object\n", " a = plot([])\n", " # add a plot of the model, with respect to x from -0.5 to 12.5\n", " a += plot(f(x),x,[min(nn[1] for nn in T2),max(nn[1] for nn in T2)])\n", "\n", " # add a plot of the data, in red\n", " a += list_plot(data,color='red')\n", " \n", " V_obs = variance([nn[1] for nn in O2])\n", " V_the = variance([f(nn[1]) for nn in T2])\n", " R2sq = V_the/V_obs\n", " \n", " return {\n", " 'a':sol[0].rhs(),\n", " 'b':sol[1].rhs(),\n", " 'Nobs':N,\n", " 'graph':a,\n", " 'Var obs':V_obs,\n", " 'Var theoretical':V_the,\n", " 'R2':R2sq\n", " }" ] }, { "cell_type": "code", "execution_count": 26, "metadata": { "tags": [] }, "outputs": [], "source": [ "from sage.misc.randstate import current_randstate\n", "from sage.misc.prandom import normalvariate\n", "from sage.misc.prandom import uniform\n", "\n", "uniform = current_randstate().python_random().uniform\n", "normalvariate = current_randstate().python_random().normalvariate" ] }, { "cell_type": "code", "execution_count": 27, "metadata": { "tags": [] }, "outputs": [ { "data": { "text/html": [ " $\\newcommand{\\Bold}[1]{\\mathbf{#1}}91503022025$ " ], "text/latex": [ "\\begin{math}\n", "\\newcommand{\\Bold}[1]{\\mathbf{#1}}91503022025\n", "\\end{math}" ], "text/plain": [ "91503022025" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "set_random_seed(init_seed_GL)\n", "srs = initial_seed()\n", "show(srs)" ] }, { "cell_type": "code", "execution_count": 28, "metadata": { "tags": [] }, "outputs": [], "source": [ "from sage.misc.randstate import current_randstate" ] }, { "cell_type": "code", "execution_count": 29, "metadata": { "tags": [] }, "outputs": [], "source": [ "def gen_nor_nor(gNsim,ga,gmu,gse,gmemu,gmese,glnt1c,glnt2c,glnebu,glnebo,gKink,glnk):\n", " # generates binned data\n", " # if chettydum==1 densb and st2\n", " # if chettydum==0 densb\n", " #import pdb; pdb.set_trace()\n", " U = RealDistribution('uniform', [0,1])\n", " T = RealDistribution('gaussian', 1)\n", " #show([(glnebu-ga*glnt1c-gmu-4*gmese),(glnebo-ga*glnt2c-gmu+4*gmese)])\n", " Phi1 = T.cum_distribution_function((glnebu-ga*glnt1c-gmu-4*gmese)/gse)\n", " DifPhi = T.cum_distribution_function((glnebo-ga*glnt2c-gmu+4*gmese)/gse)-Phi1\n", " \n", " uniform = current_randstate().python_random().uniform\n", " uu = [ uniform(0,1) for i in range(gNsim)]\n", " \n", " cst1 = gmu\n", " #+ga*(glnt1c)\n", " ge = [T.cum_distribution_function_inv(DifPhi*uu[i]+Phi1)*gse+cst1 for i in range(gNsim)]\n", " ge.sort() \n", " \n", " cst1 = ga*glnt1c\n", " cst2 = ga*glnt2c\n", " lnwh1 = [cst1+ge[i] for i in range(gNsim)]\n", " lnwh2 = [cst2+ge[i] for i in range(gNsim)]\n", " \n", " lnwh = lnwh_obs(gNsim,glnk,lnwh1,lnwh2) \n", " \n", " #generates data with noise\n", " normalvariate = current_randstate().python_random().normalvariate\n", " gme = [gmese*normalvariate(0,1)+gmemu for i in range(gNsim)]\n", " \n", " #cst1 = glnebu-4*gmese\n", " #cst2 = glnebo+4*gmese\n", " dd0 = [ lnwh[i]+gme[i] for i in range(gNsim)]\n", " Edd0 = [ exp(dd0[i]) for i in range(gNsim)]\n", " #simulated data with noise \n", " # still needs to be cut to the interval of interest\n", " \n", " return dd0,Edd0 " ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": 30, "metadata": { "tags": [] }, "outputs": [], "source": [ "def gen_nor_laplace(gNsim,ga,gmu,gse,glpld,glplu,glnt1c,glnt2c,glnebu,glnebo,gKink,glnk):\n", " # generates binned data\n", " # if chettydum==1 densb and st2\n", " # if chettydum==0 densb\n", " #import pdb; pdb.set_trace()\n", " #import pdb; pdb.set_trace()\n", " U = RealDistribution('uniform', [0,1])\n", " T = RealDistribution('gaussian', 1)\n", " \n", " # some extreme? based on double laplace distribution\n", " # this appears to be important!\n", " # dwn is negative by construction\n", " dwn = log(1-0.02)/glpld\n", " # upp is positive\n", " upp = -log(2*(1-0.999))/glplu\n", " \n", " Phi1 = T.cum_distribution_function((glnebu-ga*glnt1c-gmu+dwn)/gse)\n", " DifPhi = T.cum_distribution_function((glnebo-ga*glnt2c-gmu+upp)/gse)-Phi1\n", " \n", " uniform = current_randstate().python_random().uniform\n", " uu = [ uniform(0,1) for i in range(gNsim)]\n", " \n", " cst1 = gmu\n", " #+ga*(glnt1c)\n", " ge = [T.cum_distribution_function_inv(DifPhi*uu[i]+Phi1)*gse+cst1 for i in range(gNsim)]\n", " ge.sort() \n", " \n", " cst1 = ga*glnt1c\n", " cst2 = ga*glnt2c\n", " lnwh1 = [cst1+ge[i] for i in range(gNsim)]\n", " lnwh2 = [cst2+ge[i] for i in range(gNsim)]\n", " \n", " lnwh = lnwh_obs(gNsim,glnk,lnwh1,lnwh2) \n", " \n", " #generates data with noise\n", " #normalvariate = current_randstate().python_random().normalvariate\n", " uniform = current_randstate().python_random().uniform\n", " uua = [ -ln(uniform(0,1))-1. for i in range(gNsim)]\n", " uub = [ -ln(uniform(0,1))-1. for i in range(gNsim)]\n", " gme = [ uua[i]/glplu-uub[i]/glpld for i in range(gNsim)]\n", " dd0 = [ lnwh[i] + gme[i] for i in range(gNsim)]\n", " Edd0 = [ exp(dd0[i]) for i in range(gNsim)]\n", " #simulated data with noise \n", " # still needs to be cut to the interval of interest\n", " \n", " return dd0,Edd0 " ] }, { "cell_type": "code", "execution_count": 31, "metadata": { "tags": [] }, "outputs": [], "source": [ "def gen_binor_nor(gNsim,ga,gshare,gmu,gse,gmemu,gmese,glnt1c,glnt2c,glnebu,glnebo,gKink,glnk):\n", " # bimodal data...gmu and gse are vectors of 2 components each!\n", " # gshare is share of 0th type\n", " # generates binned data\n", " # if chettydum==1 densb and st2\n", " # if chettydum==0 densb\n", " #import pdb; pdb.set_trace()\n", " U = RealDistribution('uniform', [0,1])\n", " T = RealDistribution('gaussian', 1)\n", " #show(gmu)\n", " #show(gse)\n", " Phi1 = T.cum_distribution_function((glnebu-ga*glnt1c-gmu[0]-4*gmese)/gse[0])\n", " DifPhi1 = T.cum_distribution_function((glnebo-ga*glnt2c-gmu[0]+4*gmese)/gse[0])-Phi1\n", " \n", " Phi2 = T.cum_distribution_function((glnebu-ga*glnt1c-gmu[1]-4*gmese)/gse[1])\n", " DifPhi2 = T.cum_distribution_function((glnebo-ga*glnt2c-gmu[1]+4*gmese)/gse[1])-Phi2\n", " \n", " ptot = gshare*DifPhi1+(1.-gshare)*DifPhi2\n", " \n", " #show([DifPhi1,DifPhi2,ptot])\n", " \n", " w1 = gshare*DifPhi1/ptot\n", " w2 = (1.-gshare)*DifPhi2/ptot\n", " \n", " #show([w1,w2])\n", " uniform = current_randstate().python_random().uniform\n", " uu1 = [ uniform(0,1) for i in range(gNsim)]\n", " uu2 = [ uniform(0,1) for i in range(gNsim)]\n", " \n", " uus = [ uniform(0,1) for i in range(gNsim)]\n", " \n", " cst1 = gmu[0]\n", " #+ga*(glnt1c)\n", " #show(cst1)\n", " ge1 = [T.cum_distribution_function_inv(DifPhi1*uu1[i]+Phi1)*gse[0]+cst1 for i in range(gNsim)]\n", " \n", " cst1 = gmu[1]\n", " #+ga*(glnt1c)\n", " #show(cst1) \n", " ge2 = [T.cum_distribution_function_inv(DifPhi2*uu2[i]+Phi2)*gse[1]+cst1 for i in range(gNsim)]\n", " \n", " ge = [ ge1[i]+(uus[i]=cst1 and n<=cst2]\n", " dd0 = [ lnwh[i]+gme[i] for i in range(gNsim)]\n", " Edd0 = [ exp(dd0[i]) for i in range(gNsim)]\n", " #simulated data with noise \n", " # still needs to be cut to the interval of interest\n", " \n", " return dd0,Edd0 \n" ] }, { "cell_type": "code", "execution_count": 32, "metadata": { "tags": [] }, "outputs": [], "source": [ "def gen_pareto_nor(gNsim,ga,gmu,gparetexp,gmemu,gmese,glnt1c,glnt2c,glnebu,glnebo,gKink,glnk):\n", " # generates binned data\n", " # if chettydum==1 densb and st2\n", " # if chettydum==0 densb\n", " #import pdb; pdb.set_trace()\n", " U = RealDistribution('uniform', [0,1])\n", " T = RealDistribution('gaussian', 1)\n", "\n", " #show([w1,w2])\n", " uniform = current_randstate().python_random().uniform\n", " uu = [ uniform(0,1) for i in range(gNsim)]\n", " \n", " cst1 = glnebu-ga*glnt1c-gmu-4*gmese\n", " cst2 = (1-(cst1/(glnebo-ga*glnt2c-gmu+4*gmese))^gparetexp)\n", " \n", " #ge = [ lnebu*(1-(1-(lnebu/lnebo)^paretexp)*uu[i])^(-1/paretexp) for i in range(Nsim)]\n", " ge = [ cst1*(1-cst2*uu[i])^(-1/gparetexp) for i in range(gNsim)]\n", " \n", " #show([mean(ge1),std(ge1)])\n", " #show([mean(ge2),std(ge2)])\n", " #histogram(ge,bins=111).show()\n", " \n", " cst1 = ga*glnt1c+gmu\n", " cst2 = ga*glnt2c+gmu\n", " lnwh1 = [cst1+ge[i] for i in range(gNsim)]\n", " lnwh2 = [cst2+ge[i] for i in range(gNsim)]\n", " \n", " lnwh = lnwh_obs(gNsim,glnk,lnwh1,lnwh2) \n", " \n", " #generates data with noise\n", " normalvariate = current_randstate().python_random().normalvariate\n", " gme = [gmese*normalvariate(0,1)+gmemu for i in range(gNsim)]\n", " \n", " #cst1 = glnebu-4*gmese\n", " #cst2 = glnebo+4*gmese\n", " dd0 = [ lnwh[i]+gme[i] for i in range(gNsim)]\n", " Edd0 = [ exp(dd0[i]) for i in range(gNsim)]\n", " #simulated data with noise \n", " # still needs to be cut to the interval of interest\n", " \n", " return dd0,Edd0\n", " \n", "# we don't need this...\n", "# ddb = [exp(n) for n in dd0 if n>=cst1 and n<=cst2] \n", "# ddb.sort()\n", " " ] }, { "cell_type": "code", "execution_count": 33, "metadata": { "tags": [] }, "outputs": [], "source": [ "def make_bins(gendd,glnebu,glnebo,gKink,gnbrgrp,gstep_bin,garndkinkn,garndkinkp,gpolyorder,gchettydum): \n", " #show(len(ddb))\n", " bin_IG = binning_bunch(np.array(gendd),gKink,gstep_bin,gnbrgrp)\n", " #dd = list(zip(list(np.array(bin_IG['IG'])*gstep_bin+Kink),bin_IG['FREQ'].list()))\n", " #import pdb; pdb.set_trace() \n", " dd = [ (bin_IG['IG'][i][0]*gstep_bin+gKink , bin_IG['FREQ'][i][0]) for i in range(bin_IG['IG'].nrows())]\n", " Ntot = sum(bin_IG['FREQ'].list())\n", " densb = [ (nn[0],(nn[1]/Ntot).n()) for nn in dd]\n", "\n", " if gchettydum:\n", " st2 = design_bunch(bin_IG['IG'],bin_IG['FREQ'],gKink,garndkinkn,garndkinkp,gpolyorder)\n", " return densb,Ntot,bin_IG,st2\n", " \n", " return densb,Ntot" ] }, { "cell_type": "markdown", "metadata": { "tags": [] }, "source": [ "## Seeds" ] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [ { "data": { "text/html": [ " $\\newcommand{\\Bold}[1]{\\mathbf{#1}}91503022025$ " ], "text/latex": [ "\\begin{math}\n", "\\newcommand{\\Bold}[1]{\\mathbf{#1}}91503022025\n", "\\end{math}" ], "text/plain": [ "91503022025" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# reset the seed...\n", "set_random_seed(init_seed_GL)\n", "srs = initial_seed()\n", "show(srs)" ] }, { "cell_type": "code", "execution_count": 35, "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "outputs": [ { "data": { "text/html": [ " $\\newcommand{\\Bold}[1]{\\mathbf{#1}}1$ " ], "text/latex": [ "\\begin{math}\n", "\\newcommand{\\Bold}[1]{\\mathbf{#1}}1\n", "\\end{math}" ], "text/plain": [ "1" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ " $\\newcommand{\\Bold}[1]{\\mathbf{#1}}51$ " ], "text/latex": [ "\\begin{math}\n", "\\newcommand{\\Bold}[1]{\\mathbf{#1}}51\n", "\\end{math}" ], "text/plain": [ "51" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ " $\\newcommand{\\Bold}[1]{\\mathbf{#1}}101$ " ], "text/latex": [ "\\begin{math}\n", "\\newcommand{\\Bold}[1]{\\mathbf{#1}}101\n", "\\end{math}" ], "text/plain": [ "101" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ " $\\newcommand{\\Bold}[1]{\\mathbf{#1}}151$ " ], "text/latex": [ "\\begin{math}\n", "\\newcommand{\\Bold}[1]{\\mathbf{#1}}151\n", "\\end{math}" ], "text/plain": [ "151" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ " $\\newcommand{\\Bold}[1]{\\mathbf{#1}}201$ " ], "text/latex": [ "\\begin{math}\n", "\\newcommand{\\Bold}[1]{\\mathbf{#1}}201\n", "\\end{math}" ], "text/plain": [ "201" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ " $\\newcommand{\\Bold}[1]{\\mathbf{#1}}251$ " ], "text/latex": [ "\\begin{math}\n", "\\newcommand{\\Bold}[1]{\\mathbf{#1}}251\n", "\\end{math}" ], "text/plain": [ "251" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ " $\\newcommand{\\Bold}[1]{\\mathbf{#1}}301$ " ], "text/latex": [ "\\begin{math}\n", "\\newcommand{\\Bold}[1]{\\mathbf{#1}}301\n", "\\end{math}" ], "text/plain": [ "301" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ " $\\newcommand{\\Bold}[1]{\\mathbf{#1}}351$ " ], "text/latex": [ "\\begin{math}\n", "\\newcommand{\\Bold}[1]{\\mathbf{#1}}351\n", "\\end{math}" ], "text/plain": [ "351" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ " $\\newcommand{\\Bold}[1]{\\mathbf{#1}}401$ " ], "text/latex": [ "\\begin{math}\n", "\\newcommand{\\Bold}[1]{\\mathbf{#1}}401\n", "\\end{math}" ], "text/plain": [ "401" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ " $\\newcommand{\\Bold}[1]{\\mathbf{#1}}451$ " ], "text/latex": [ "\\begin{math}\n", "\\newcommand{\\Bold}[1]{\\mathbf{#1}}451\n", "\\end{math}" ], "text/plain": [ "451" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ " $\\newcommand{\\Bold}[1]{\\mathbf{#1}}501$ " ], "text/latex": [ "\\begin{math}\n", "\\newcommand{\\Bold}[1]{\\mathbf{#1}}501\n", "\\end{math}" ], "text/plain": [ "501" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ " $\\newcommand{\\Bold}[1]{\\mathbf{#1}}551$ " ], "text/latex": [ "\\begin{math}\n", "\\newcommand{\\Bold}[1]{\\mathbf{#1}}551\n", "\\end{math}" ], "text/plain": [ "551" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ " $\\newcommand{\\Bold}[1]{\\mathbf{#1}}601$ " ], "text/latex": [ "\\begin{math}\n", "\\newcommand{\\Bold}[1]{\\mathbf{#1}}601\n", "\\end{math}" ], "text/plain": [ "601" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ " $\\newcommand{\\Bold}[1]{\\mathbf{#1}}651$ " ], "text/latex": [ "\\begin{math}\n", "\\newcommand{\\Bold}[1]{\\mathbf{#1}}651\n", "\\end{math}" ], "text/plain": [ "651" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ " $\\newcommand{\\Bold}[1]{\\mathbf{#1}}701$ " ], "text/latex": [ "\\begin{math}\n", "\\newcommand{\\Bold}[1]{\\mathbf{#1}}701\n", "\\end{math}" ], "text/plain": [ "701" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ " $\\newcommand{\\Bold}[1]{\\mathbf{#1}}751$ " ], "text/latex": [ "\\begin{math}\n", "\\newcommand{\\Bold}[1]{\\mathbf{#1}}751\n", "\\end{math}" ], "text/plain": [ "751" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ " $\\newcommand{\\Bold}[1]{\\mathbf{#1}}801$ " ], "text/latex": [ "\\begin{math}\n", "\\newcommand{\\Bold}[1]{\\mathbf{#1}}801\n", "\\end{math}" ], "text/plain": [ "801" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ " $\\newcommand{\\Bold}[1]{\\mathbf{#1}}851$ " ], "text/latex": [ "\\begin{math}\n", "\\newcommand{\\Bold}[1]{\\mathbf{#1}}851\n", "\\end{math}" ], "text/plain": [ "851" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ " $\\newcommand{\\Bold}[1]{\\mathbf{#1}}901$ " ], "text/latex": [ "\\begin{math}\n", "\\newcommand{\\Bold}[1]{\\mathbf{#1}}901\n", "\\end{math}" ], "text/plain": [ "901" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ " $\\newcommand{\\Bold}[1]{\\mathbf{#1}}951$ " ], "text/latex": [ "\\begin{math}\n", "\\newcommand{\\Bold}[1]{\\mathbf{#1}}951\n", "\\end{math}" ], "text/plain": [ "951" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# this is the normal normal model...\n", "# the parameters we pass on determine the type of data that is generated.\n", "# collect results likelihood\n", "collect_res_ll = []\n", "# collect results chetty polynomial approach\n", "collect_res_ch = []\n", "\n", "verb_TF = False\n", "#verb_TF = True\n", "#Nrep = 1\n", "for i in range(Nrep):\n", " #this is only temporary\n", " #cleaning up\n", " if i%50==1:\n", " save(collect_res_ll,file_collect_res_ll)\n", " save(collect_res_ch,file_collect_res_ch)\n", " show(i)\n", " DD = []\n", " DD0= []\n", " solib_LS1 = []\n", " #import pdb; pdb.set_trace() \n", " #generating data \n", " DD0 = gen_pareto_nor(Nsim,a,mu_paret_nor,paretExp_paret_nor,memu,mese,lnt1c,lnt2c,lnebu,lnebo,Kink,lnk) \n", " DD0[1].sort()\n", " \n", " \n", " #binning data\n", " DD = make_bins(DD0[1],lnebu,lnebo,Kink,70,STEP_BIN,-10,10,7,1)\n", " densb = list(DD[0]) \n", "\n", " # getting rid of last category...this is messy programming\n", " del densb[-1]\n", " \n", " Ntot = DD[1]\n", " \n", " # least squares first, from almost the correct values...\n", " tax_par = (densb,delta,lnt1c,lnt2c,lnebu,lnebo,Kink,lnk)\n", " solib_LSl = minimize(lsq0ib,[a,6.05,.75,5.35], args=tax_par, verbose=verb_TF)\n", " gd_LSl = STEP_BIN*matrix(RDF,grdcd(lsq0ib,solib_LSl,*tax_par))\n", " # maximum likelihood second\n", " solib_Lll = minimize(lik0ib,solib_LSl,args=tax_par, verbose=verb_TF)\n", " gd_Lll = STEP_BIN*matrix(RDF,grdcd(lik0ib,solib_Lll,*tax_par))\n", " \n", " collect_res_ll.append([i,solib_LSl,gd_LSl,solib_Lll,gd_Lll,Ntot])\n", " \n", " \n", " chet_res = chetty_poly_bunch_court(DD[2]['IG'],DD[2]['FREQ'],DD[3]['X'],DD[3]['X_nodum'],DD[3]['XXinvX'] ,DD[3]['get_beta_poly'],DD[3]['get_beta_dummy'],DD[3]['ndum'],DD[3]['people_nearbunch'])\n", " collect_res_ch.append((i,10,chet_res['b']/(lnt1c-lnt2c)/Kink))\n", " \n", " #re binning data: plus or minus 5\n", " DD = make_bins(DD0[1],lnebu,lnebo,Kink,70,STEP_BIN,-5,5,7,1)\n", " chet_res = chetty_poly_bunch_court(DD[2]['IG'],DD[2]['FREQ'],DD[3]['X'],DD[3]['X_nodum'],DD[3]['XXinvX'] ,DD[3]['get_beta_poly'],DD[3]['get_beta_dummy'],DD[3]['ndum'],DD[3]['people_nearbunch'])\n", " collect_res_ch.append((i,5,chet_res['b']/(lnt1c-lnt2c)/Kink)) \n", " \n", " #re binning data: plus or minus 15\n", " DD = make_bins(DD0[1],lnebu,lnebo,Kink,70,STEP_BIN,-15,15,7,1)\n", " chet_res = chetty_poly_bunch_court(DD[2]['IG'],DD[2]['FREQ'],DD[3]['X'],DD[3]['X_nodum'],DD[3]['XXinvX'] ,DD[3]['get_beta_poly'],DD[3]['get_beta_dummy'],DD[3]['ndum'],DD[3]['people_nearbunch'])\n", " collect_res_ch.append((i,15,chet_res['b']/(lnt1c-lnt2c)/Kink))\n", " \n", "save(collect_res_ll,file_collect_res_ll)\n", "save(collect_res_ch,file_collect_res_ch) " ] } ], "metadata": { "kernelspec": { "display_name": "SageMath 9.5", "language": "sage", "name": "sagemath" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.12" } }, "nbformat": 4, "nbformat_minor": 4 }