Machine Learning Series -- 02: Logistic Regression and Maximum Entropy Models (pure Python and sklearn)

掘金用戶007 · 2021-08-15



Logistic Regression

Introduction

First, some intuition: logistic regression is a classification model. It is also a linear model, with the same parameter space as linear regression; all of the information is contained in $w$ and $b$.

In other words, it adds a mapping function $f$ on top of linear regression. For $f$ we generally use the sigmoid function, so that:

$$f(x_{11} w_1 + x_{12} w_2 + x_{13} w_3 + \dots + x_{1j} w_j + \dots + x_{1m} w_m + b) = \hat y_1$$

where $f(x) = \frac{1}{1 + e^{-x}}$.

[Figure: the sigmoid curve, an S-shaped function mapping the real line to (0, 1)]
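As a quick sanity check (an illustrative snippet, not from the original post), sigmoid squashes any real input into (0, 1), with $f(0) = 0.5$:

import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

print(sigmoid(np.array([-10.0, 0.0, 10.0])))  # ~[4.5e-05, 0.5, 0.99995]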

In one sentence: logistic regression assumes the data follow a Bernoulli distribution and maximizes the likelihood function, solving for the parameters by gradient descent, to achieve binary classification. Here:

$$p(y_i = 1 \mid x_i; w, b) = f(x_i w + b) = \frac{1}{1 + e^{-(x_i w + b)}}$$

$$p(y_i = 0 \mid x_i; w, b) = 1 - f(x_i w + b) = \frac{e^{-(x_i w + b)}}{1 + e^{-(x_i w + b)}}$$

Loss Function

The likelihood over all samples is:

$$L(w,b) = \prod_{i=1}^{n} [f(x_i w + b)]^{y_i}\,[1 - f(x_i w + b)]^{1 - y_i}$$

Taking the log gives the log-likelihood:

$$\log L(w,b) = \log \prod_{i=1}^{n} [f(x_i w + b)]^{y_i}\,[1 - f(x_i w + b)]^{1 - y_i}$$

The likelihood is to be maximized, so the loss (cost) function can be defined as its negation:

$$J(w,b) = -\log L(w,b) = -\sum_{i=1}^n \left\{ y_i \log[f(x_i w + b)] + (1 - y_i) \log[1 - f(x_i w + b)] \right\}$$
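To make the formula concrete, here it is written directly in numpy (a minimal sketch; the function name is illustrative, not from the original post):

import numpy as np

def log_loss(w, b, X, y):
    # J(w, b) = -sum_i { y_i log f(x_i w + b) + (1 - y_i) log[1 - f(x_i w + b)] }
    y_hat = 1 / (1 + np.exp(-(X @ w + b)))
    return -np.sum(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))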

First, compute $f'(x)$ in advance, since we will need it below:

$$\begin{aligned} f'(x) &= \left( \frac{1}{1 + e^{-x}} \right)' \\ &= -\frac{1}{(1 + e^{-x})^2}\,(1 + e^{-x})' \\ &= \frac{e^{-x}}{(1 + e^{-x})^2} \\ &= \frac{1}{1 + e^{-x}} \cdot \frac{e^{-x}}{1 + e^{-x}} \\ &= f(1 - f) \end{aligned}$$
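The identity $f'(x) = f(x)(1 - f(x))$ is easy to verify numerically with central differences (an illustrative check, not part of the original post):

import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

x = np.linspace(-5, 5, 11)
h = 1e-6
numeric = (sigmoid(x + h) - sigmoid(x - h)) / (2 * h)  # central difference
analytic = sigmoid(x) * (1 - sigmoid(x))               # f * (1 - f)
print(np.allclose(numeric, analytic))  # True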

Take the partial derivative with respect to $w_j$:

$$\begin{aligned} \frac{\partial J(w,b)}{\partial w_j} &= \frac{\partial}{\partial w_j} \left\{ -\sum_{i=1}^n y_i \log[f(x_i w + b)] + (1 - y_i) \log[1 - f(x_i w + b)] \right\} \\ &= -\sum_{i=1}^n \left\{ \frac{y_i}{f(x_i w + b)} - \frac{1 - y_i}{1 - f(x_i w + b)} \right\} \frac{\partial f(x_i w + b)}{\partial w_j} \\ &= -\sum_{i=1}^n \left\{ \frac{y_i}{f(x_i w + b)} - \frac{1 - y_i}{1 - f(x_i w + b)} \right\} f(x_i w + b)\,[1 - f(x_i w + b)]\,\frac{\partial (x_i w + b)}{\partial w_j} \\ &= -\sum_{i=1}^n \left\{ y_i [1 - f(x_i w + b)] - (1 - y_i) f(x_i w + b) \right\} \frac{\partial (x_i w + b)}{\partial w_j} \\ &= -\sum_{i=1}^n \left\{ y_i [1 - f(x_i w + b)] - (1 - y_i) f(x_i w + b) \right\} x_{ij} \\ &= -\sum_{i=1}^n \left\{ y_i - f(x_i w + b) \right\} x_{ij} \\ &= \sum_{i=1}^n \left\{ f(x_i w + b) - y_i \right\} x_{ij} \\ &= \sum_{i=1}^n \left\{ \hat y_i - y_i \right\} x_{ij} \end{aligned}$$

Similarly, take the partial derivative with respect to $b$:

$$\frac{\partial J(w,b)}{\partial b} = \sum_{i=1}^n [f(x_i w + b) - y_i] = \sum_{i=1}^n (\hat y_i - y_i)$$

We can then solve for $w$ and $b$ by gradient descent (the updates below sum over all samples, i.e. batch gradient descent; replacing the sum with a single randomly chosen sample gives the stochastic variant):

$$w_j \leftarrow w_j - \alpha \sum_{i=1}^{n} [f(x_i w + b) - y_i]\, x_{ij}$$

$$b \leftarrow b - \alpha \sum_{i=1}^{n} [f(x_i w + b) - y_i]$$

These two update rules are the key to the from-scratch code.

Example

Pure Python implementation

For now we use only the first two classes of the iris dataset, so this is a binary classification problem.

# Binary classification only, for now
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
import numpy as np

def get_train_test():
    iris = load_iris()
    index = list(iris.target).index(2)  # keep only class 0 and class 1
    X = iris.data[:index]
    y = iris.target[:index]
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
    return X_train, y_train, X_test, y_test

def sigmoid(x):
    return 1 / (1 + np.exp(-x))
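# NOTE: the from-scratch LogisticRegression class referenced below appears to
# be missing from the post as published. The class here is a minimal sketch
# (an assumption, not the author's original) implementing the batch gradient
# updates derived above:
#   w_j <- w_j - alpha * sum_i (f(x_i w + b) - y_i) * x_ij
#   b   <- b   - alpha * sum_i (f(x_i w + b) - y_i)
class LogisticRegression:
    def __init__(self, lr=0.01, epoch=1000):
        self.lr = lr        # learning rate (alpha)
        self.epoch = epoch  # number of passes over the training data

    def fit(self, X, y):
        n, m = X.shape
        self.w = np.zeros(m)  # one weight per feature
        self.b = 0.0
        for _ in range(self.epoch):
            error = sigmoid(X @ self.w + self.b) - y  # f(x_i w + b) - y_i
            self.w -= self.lr * (X.T @ error)         # update all w_j at once
            self.b -= self.lr * error.sum()           # update b

    def predict(self, X):
        # returns P(y = 1 | x); threshold at 0.5 for hard labels
        return sigmoid(X @ self.w + self.b)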
lr = LogisticRegression()
X_train, y_train, X_test, y_test = get_train_test()
print(X_train.shape, y_train.shape, X_test.shape, y_test.shape)
lr.fit(X_train, y_train)
predictions = lr.predict(X_test)
print(y_test == (predictions > 0.5))

sklearn

from sklearn.datasets import load_iris
iris = load_iris()
X = iris.data
Y = iris.target

# Split the data into training and test sets
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(X, Y, test_size=0.2, random_state=0)

# Import the model and call the LogisticRegression() function
from sklearn.linear_model import LogisticRegression
# lr = LogisticRegression(penalty='l2', solver='newton-cg', multi_class='multinomial')
lr = LogisticRegression()
lr.fit(x_train, y_train)

# Evaluate the model
print('Logistic regression training accuracy: %.3f' % lr.score(x_train, y_train))
print('Logistic regression test accuracy: %.3f' % lr.score(x_test, y_test))

from sklearn import metrics
pred = lr.predict(x_test)
accuracy = metrics.accuracy_score(y_test, pred)
print('Logistic regression model accuracy: %.3f' % accuracy)

Maximum Entropy Model

First, some intuition: the maximum entropy model is the model that maximizes entropy (which, stated like that, sounds circular). The idea is that, apart from the constraints already given by the data, everything else should be as uncertain, or as random, as possible; expressing this randomness mathematically yields the maximum entropy model.
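Formally (the standard formulation, stated here for context), among all conditional distributions $P(y \mid x)$ that satisfy the feature constraints, the model picks the one maximizing the conditional entropy

$$H(P) = -\sum_{x,y} \hat P(x)\, P(y \mid x) \log P(y \mid x)$$

where $\hat P(x)$ is the empirical distribution of the inputs.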

The core update formula used in the code:

$$\delta_i = \frac{1}{f^*(x,y)} \log \frac{E_{\hat P}(f_i)}{E_P(f_i)}$$

The parameters $w_i$ can then be obtained by iterating:

$$w_i \leftarrow w_i + \delta_i$$
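Here $E_{\hat P}(f_i)$ and $E_P(f_i)$ are the empirical and model expectations of the feature $f_i$ (the standard definitions, which get_hat_Ep and get_Ep in the code below estimate):

$$E_{\hat P}(f_i) = \sum_{x,y} \hat P(x,y)\, f_i(x,y), \qquad E_P(f_i) = \sum_{x,y} \hat P(x)\, P(y \mid x)\, f_i(x,y)$$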

Python implementation

import numpy as np

class MaxEntropy(object):
    def __init__(self, lr=0.01, epoch=1000):
        self.lr = lr            # learning rate
        self.N = None           # number of samples
        self.n = None           # number of (feature, label) pairs
        self.hat_Ep = None      # empirical expectations E_hat_P(f_i)
        self.labels = None      # set of class labels
        self.xy_couple = {}     # counts of each (feature, label) pair
        self.xy_id = {}         # (feature, label) pair -> index
        self.id_xy = {}         # index -> (feature, label) pair
        self.epoch = epoch

    def _rebuild_X(self, X):
        # Prefix every value with its column name so that features from
        # different columns never collide, e.g. 'age_youth'.
        X_result = []
        for x in X:
            X_result.append([y_s + '_' + x_s for x_s, y_s in zip(x, self.X_columns)])
        return X_result

    def build_data(self, X, y, X_columns):
        self.X_columns = X_columns
        self.y = y
        self.X = self._rebuild_X(X)
        self.N = len(X)
        self.labels = set(y)
        for x_i, y_i in zip(self.X, y):
            for f in x_i:
                self.xy_couple[(f, y_i)] = self.xy_couple.get((f, y_i), 0) + 1
        self.n = len(self.xy_couple.items())

    def fit(self, X, y, X_columns):
        self.build_data(X, y, X_columns)
        self.w = [0] * self.n
        for _ in range(self.epoch):
            for i in range(self.n):
                # multiply by a small learning rate here (or by 1/self.n)
                self.w[i] += self.lr * np.log(self.get_hat_Ep(i) / self.get_Ep(i))

    def predict(self, X):
        X = self._rebuild_X(X)
        result = [{} for _ in range(len(X))]
        for i, x_i in enumerate(X):
            for y in self.labels:
                result[i][y] = self.get_Pyx(x_i, y)
        return result

    def get_hat_Ep(self, index):
        # Empirical expectation of f_i: count(x, y) / N.
        # Also builds the pair <-> index lookup tables.
        self.hat_Ep = [0] * self.n
        for i, xy in enumerate(self.xy_couple):
            self.hat_Ep[i] = self.xy_couple[xy] / self.N
            self.xy_id[xy] = i
            self.id_xy[i] = xy
        return self.hat_Ep[index]

    def get_Zx(self, x_i):
        # Normalization constant Z(x) = sum over labels of exp(sum of active weights)
        Zx = 0
        for y in self.labels:
            count = 0
            for f in x_i:
                if (f, y) in self.xy_couple:
                    count += self.w[self.xy_id[(f, y)]]
            Zx += np.exp(count)
        return Zx

    def get_Pyx(self, x_i, y):
        # Model probability P(y|x) = exp(sum of active weights) / Z(x)
        count = 0
        for f in x_i:
            if (f, y) in self.xy_couple:
                count += self.w[self.xy_id[(f, y)]]
        return np.exp(count) / self.get_Zx(x_i)

    def get_Ep(self, index):
        # Model expectation of f_i: sum of P(y|x) / N over samples
        # containing feature f (i.e. hat_P(x) = 1/N)
        f, y = self.id_xy[index]
        ans = 0
        for x_i in self.X:
            if f not in x_i:
                continue
            ans += self.get_Pyx(x_i, y) / self.N
        return ans
data_set = [['youth', 'no', 'no', '1', 'refuse'],
            ['youth', 'no', 'no', '2', 'refuse'],
            ['youth', 'yes', 'no', '2', 'agree'],
            ['youth', 'yes', 'yes', '1', 'agree'],
            ['youth', 'no', 'no', '1', 'refuse'],
            ['mid', 'no', 'no', '1', 'refuse'],
            ['mid', 'no', 'no', '2', 'refuse'],
            ['mid', 'yes', 'yes', '2', 'agree'],
            ['mid', 'no', 'yes', '3', 'agree'],
            ['mid', 'no', 'yes', '3', 'agree'],
            ['elder', 'no', 'yes', '3', 'agree'],
            ['elder', 'no', 'yes', '2', 'agree'],
            ['elder', 'yes', 'no', '2', 'agree'],
            ['elder', 'yes', 'no', '3', 'agree'],
            ['elder', 'no', 'no', '1', 'refuse'],
            ]
columns = ['age', 'working', 'house', 'credit_situation', 'labels']
X_columns = columns[:-1]
X = [i[:-1] for i in data_set]
Y = [i[-1] for i in data_set]
train_X = X[:12]
test_X = X[12:]
train_Y = Y[:12]
test_Y = Y[12:]
mae = MaxEntropy()
mae.fit(train_X, train_Y, X_columns)
print(mae.predict(test_X))
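predict returns, for each test sample, a dict mapping each label to its model probability $P(y \mid x)$. To turn these into hard predictions, take the most probable label (a small usage sketch):

preds = [max(p, key=p.get) for p in mae.predict(test_X)]
print(preds, test_Y)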

