Using Julia to Find MLE for a Factor Model
I’ve been staying home for a bit more than a week now. While keep working on my research, I also think it may help fill my time by picking up some skills. I’ve been following the development of Julia but haven’t had the time to start learning it seriously, so this is the perfect time! As my first learning note, I’ve decided to try obtaining maximum likelihood estimates on some simulated data based on a one-factor model. I’m sure a lot of improvement can be made as this is really my first Julia notebook. But since to my knowledge there is currently no SEM packages in Julia, may be it could be a starting point.
Set Seed
using Random, Distributions
Random.seed!(123) # Setting the seedGenerate Univarate Normal Data
Let’s generate some normal data. In Julia the Distribution.Normal function can be used to set a Normal distribution. The Base.rand function will then generate data from that class. I was amazed that the Distributions package supported a wide range of distributions.
d = Normal()
x = rand(d, 100)100-element Array{Float64,1}:
1.1902678809862768
2.04817970778924
1.142650902867199
0.45941562040708034
-0.396679079295223
-0.6647125451916877
0.9809678267585334
-0.07548306639775595
0.27381537121215616
-0.19422906710572047
-0.33936602980781916
-0.84387792144707
-0.8889357468973064
⋮
-0.7339610029444202
0.45939803668120377
1.7061863874321828
0.6784427697934589
0.28717953880710856
1.0681555054109295
-0.3068768981211787
-1.9202140874350073
1.6696020873668111
-0.2135576214062456
-0.16371133936712523
-0.9029858060964956
using StatsPlots
density(x)Generate Multivariate Normal Data
d = MvNormal([1 0.5
0.5 1])ZeroMeanFullNormal(
dim: 2
μ: [0.0, 0.0]
Σ: [1.0 0.5; 0.5 1.0]
)
x = rand(d, 1000)2×1000 Array{Float64,2}:
0.173994 1.25713 -1.33584 … 1.22644 0.207192 0.254031
-0.371218 -0.22681 -0.730016 0.944955 0.606183 -0.000434324
# Covariance matrix
using LinearAlgebra
cov(x')2×2 Array{Float64,2}:
0.987069 0.473074
0.473074 0.949026
The above shows how Julia can simulate multivariate normal data. Now, generate some cfa data
Λ = [.7
.7
.7
.7]
Σ = Λ*Λ' + Diagonal(ones(4) * .51)4×4 Array{Float64,2}:
1.0 0.49 0.49 0.49
0.49 1.0 0.49 0.49
0.49 0.49 1.0 0.49
0.49 0.49 0.49 1.0
# Simulate data
y = rand(MvNormal(Σ), 100)
Sy = cov(y')4×4 Array{Float64,2}:
0.846131 0.460656 0.418965 0.368367
0.460656 0.942118 0.396868 0.405196
0.418965 0.396868 0.98197 0.342197
0.368367 0.405196 0.342197 0.902086
function loglik(θ, S = Sy)
Λ = θ[1:4]
Θ = Diagonal(θ[5:8])
Σ = Λ*Λ' + Θ
logdet(Σ) + tr(Σ\Sy)
endloglik (generic function with 2 methods)
# Try the function
loglik([.8, .7, .7, .7, .51, .51, .51, .51])2.8167873279580555
# Find maximum likelihood estimates
using Optim
@time optimize(loglik, [ones(4) * 0.7; ones(4) * 0.51], LBFGS()) 0.030934 seconds (24.63 k allocations: 1.637 MiB)
* Status: success
* Candidate solution
Minimizer: [6.72e-01, 6.85e-01, 6.03e-01, ...]
Minimum: 2.748551e+00
* Found with
Algorithm: L-BFGS
Initial Point: [7.00e-01, 7.00e-01, 7.00e-01, ...]
* Convergence measures
|x - x'| = 7.06e-09 ≰ 0.0e+00
|x - x'|/|x'| = 1.03e-08 ≰ 0.0e+00
|f(x) - f(x')| = 1.33e-15 ≰ 0.0e+00
|f(x) - f(x')|/|f(x')| = 4.85e-16 ≰ 0.0e+00
|g(x)| = 7.33e-11 ≤ 1.0e-08
* Work counters
Seconds run: 0 (vs limit Inf)
Iterations: 12
f(x) calls: 33
∇f(x) calls: 33
# Use Hessians
using NLSolversBase, ForwardDiff
td = TwiceDifferentiable(loglik, [ones(4) * 0.7; ones(4) * 0.51]; autodiff = :forward)
@time opt = optimize(td, [ones(4) * 0.7; ones(4) * 0.51]) 0.540849 seconds (330.71 k allocations: 17.147 MiB, 2.58% gc time)
* Status: success
* Candidate solution
Minimizer: [6.72e-01, 6.85e-01, 6.03e-01, ...]
Minimum: 2.748551e+00
* Found with
Algorithm: Newton's Method
Initial Point: [7.00e-01, 7.00e-01, 7.00e-01, ...]
* Convergence measures
|x - x'| = 2.33e-06 ≰ 0.0e+00
|x - x'|/|x'| = 3.40e-06 ≰ 0.0e+00
|f(x) - f(x')| = 2.45e-11 ≰ 0.0e+00
|f(x) - f(x')|/|f(x')| = 8.92e-12 ≰ 0.0e+00
|g(x)| = 1.20e-11 ≤ 1.0e-08
* Work counters
Seconds run: 0 (vs limit Inf)
Iterations: 4
f(x) calls: 12
∇f(x) calls: 12
∇²f(x) calls: 4
# Numerical Hessian
Optim.minimizer(opt)
@time ForwardDiff.hessian(loglik, Optim.minimizer(opt)) 0.026639 seconds (8.52 k allocations: 540.367 KiB)
8×8 Array{Float64,2}:
3.16649 -0.6156 -0.440022 … -0.390832 -0.150853 -0.205811
-0.6156 2.72063 -0.325774 1.09962 -0.174804 -0.124924
-0.440022 -0.325774 2.24179 -0.260119 0.634179 -0.102112
-0.38865 -0.380951 -0.237476 -0.184736 -0.101475 0.686983
1.48048 -0.46019 -0.264316 0.333658 0.106909 0.198808
-0.390832 1.09962 -0.260119 … 2.64123 0.144781 0.0711164
-0.150853 -0.174804 0.634179 0.144781 1.94531 0.0498062
-0.205811 -0.124924 -0.102112 0.0711164 0.0498062 2.25437
# Using NLSolverBase
@time hess = NLSolversBase.hessian!(td, Optim.minimizer(opt)) 0.000150 seconds (46 allocations: 80.734 KiB)
8×8 Array{Float64,2}:
3.16649 -0.6156 -0.440022 … -0.390832 -0.150853 -0.205811
-0.6156 2.72063 -0.325774 1.09962 -0.174804 -0.124924
-0.440022 -0.325774 2.24179 -0.260119 0.634179 -0.102112
-0.38865 -0.380951 -0.237476 -0.184736 -0.101475 0.686983
1.48048 -0.46019 -0.264316 0.333658 0.106909 0.198808
-0.390832 1.09962 -0.260119 … 2.64123 0.144781 0.0711164
-0.150853 -0.174804 0.634179 0.144781 1.94531 0.0498062
-0.205811 -0.124924 -0.102112 0.0711164 0.0498062 2.25437
# Asymptotic covariance matrix
diag(inv(hess) * 2 / 100)8-element Array{Float64,1}:
0.008868570489264425
0.009937870865887575
0.010707718924937756
0.009921369374706349
0.00792975201139125
0.009846163164104109
0.011585717625197327
0.00995946113518446