Regression with constraints on contribution from variables
Problem Description:
I’m trying to develop a regression model with constraints on effect from the independent variables. So my model equation is y = a0 + a1x1 + a2x2 with 200 datapoints. What I want to achieve is sum(a1x1) over 200 datapoints should fall in certain range i.e. lb1<sum(a1x1)<ub1. I am using Gekko for the optimization part and a got stuck while applying this condition.
I am using the following code where ubdict is the dictionary for the boundaries:
m = gk.GEKKO(remote=False)
m.options.IMODE=2 #Regression mode
y = np.array(df['y']) #dependant vars for optimization
x = np.array(df[X]) #array of independent vars for optimization
n = x.shape[1] #number of variables
c = m.Array(m.FV, n+1) #array of parameters and intercept
for ci in c:
ci.STATUS = 1 #calculate fixed parameter
xp = [None]*n
#load data
xd = m.Array(m.Param,n)
yd = m.Param(value=y)
for i in range(n):
xd[i].value = x[:,i]
xp[i] = m.Var()
if ubound_dict[i] >= 0:
xp[i] = m.Var(lb=0, ub=ubdict[i])
elif ubound_dict[i] < 0:
xp[i] = m.Var(lb=ubdict[i], ub=0)
m.Equation(xp[i]==c[i]*xd[i])
yp = m.Var()
m.Equation(yp==m.sum([xp[i] for i in range(n)] + [c[n]]))
#Minimize difference between actual and predicted y
m.Minimize((yd-yp)**2)
#APOPT solver
m.options.SOLVER = 1
#Solve
m.solve(disp=True)
#Retrieve parameter values
a = [i.value[0] for i in c]
print(a)
But this is applying the constraint row-wise. What I want is something like
xp[i] = m.Var(lb=0, ub=ubdict[i])
m.Equation(xp[i]==sum(c[i]*xd[i]) over observations)
Any suggestion would be of great help!
Solution – 1
Below is a similar problem with sample data.
Regression Mode with IMODE=2
Use the m.vsum()
object in Gekko with IMODE=2
. Gekko lets you write the equations once and then applies the data to each equation. This is more efficient for large-scale data sets.
import numpy as np
from gekko import GEKKO
# load data
x1 = np.array([1,2,5,3,2,5,2])
x2 = np.array([5,6,7,2,1,3,2])
ym = np.array([3,2,3,5,6,7,8])
# model
m = GEKKO()
c = m.Array(m.FV,3)
for ci in c:
ci.STATUS=1
x1 = m.Param(value=x1)
x2 = m.Param(value=x2)
ymeas = m.Param(value=ym)
ypred = m.Var()
m.Equation(ypred == c[0] + c[1]*x1 + c[2]*x2)
# add constraint on sum(c[1]*x1) with vsum
v1 = m.Var(); m.Equation(v1==c[1]*x1)
con = m.Var(lb=0,ub=10); m.Equation(con==m.vsum(v1))
m.Minimize((ypred-ymeas)**2)
m.options.IMODE = 2
m.solve()
print('Final SSE Objective: ' + str(m.options.objfcnval))
print('Solution')
for i,ci in enumerate(c):
print(i,ci.value[0])
# plot solution
import matplotlib.pyplot as plt
plt.figure(figsize=(8,4))
plt.plot(ymeas,ypred,'ro')
plt.plot([0,10],[0,10],'k-')
plt.xlabel('Meas')
plt.ylabel('Pred')
plt.savefig('results.png',dpi=300)
plt.show()
Optimization Mode (IMODE=3)
The optimization mode 3 allows you to write each equation and objective term individually. Both give the same solution.
import numpy as np
from gekko import GEKKO
# load data
x1 = np.array([1,2,5,3,2,5,2])
x2 = np.array([5,6,7,2,1,3,2])
ym = np.array([3,2,3,5,6,7,8])
n = len(ym)
# model
m = GEKKO()
c = m.Array(m.FV,3)
for ci in c:
ci.STATUS=1
yp = m.Array(m.Var,n)
for i in range(n):
m.Equation(yp[i]==c[0]+c[1]*x1[i]+c[2]*x2[i])
m.Minimize((yp[i]-ym[i])**2)
# add constraint on sum(c[1]*x1)
s = m.Var(lb=0,ub=10); m.Equation(s==c[1]*sum(x1))
m.options.IMODE = 3
m.solve()
print('Final SSE Objective: ' + str(m.options.objfcnval))
print('Solution')
for i,ci in enumerate(c):
print(i,ci.value[0])
# plot solution
import matplotlib.pyplot as plt
plt.figure(figsize=(8,4))
ypv = [yp[i].value[0] for i in range(n)]
plt.plot(ym,ypv,'ro')
plt.plot([0,10],[0,10],'k-')
plt.xlabel('Meas')
plt.ylabel('Pred')
plt.savefig('results.png',dpi=300)
plt.show()
For future questions, please create a simple and complete example that demonstrates the issue.