[Machine Learning Lecture 4] Logistic Regression with Gradient Ascent

HibisCircus 2021. 4. 15. 23:00

※ This post is a reorganized summary based on Chapter 4 of "Introduction to Artificial Intelligence and Machine Learning 1" on edwith (KAIST Open Online Course). ※

Finding $\theta$ with Gradient Ascent

Previously, for Logistic Regression, we set out to find

$\hat{\theta} = argmax_{\theta} \prod_{1 \leq i \leq N} P(Y_i | X_i ; \theta)$

To obtain this $\theta$, we took the log-likelihood and worked through its partial derivative:

$f(\theta) = \sum_{1 \leq i \leq N} log( P(Y_i | X_i ; \theta))$

$\frac{\partial f(\theta)}{\partial \theta_j} = \frac{\partial}{\partial \theta_j} \left \{ \sum_{1 \leq i \leq N} log(P(Y_i | X_i ; \theta)) \right \} = \sum_{1 \leq i \leq N} X_{i, j} (Y_i - P(Y = 1 | X_i ; \theta))$

Setting this derivative to zero does not give a closed-form solution for $\theta$, so the optimum has to be approximated numerically. Because the problem is an argmax, we can use the Gradient Ascent method. The general update rule is

$x_{t+1} \leftarrow x_t + hu^* = x_t + h \frac{f'(x_t)}{|f'(x_t)|}$

Applying this to each $\theta_j$ gives

$\theta_{j}^{t+1} \leftarrow \theta_{j}^{t} + \frac{h}{C} \frac{\partial f(\theta^t)}{\partial \theta_{j}^t} = \theta_{j}^{t} + \frac{h}{C} \left \{ \sum_{1 \leq i \leq N} X_{i, j} (Y_i - P(Y = 1 | X_i ; \theta^t)) \right \}$

$= \theta_{j}^{t} + \frac{h}{C} \left \{ \sum_{1 \leq i \leq N} X_{i, j} ( Y_i - \frac{e^{X_i \theta^t}}{1 + e^{X_i \theta^t}}) \right \}$

(Here $\theta_{j}^{0}$ can be chosen arbitrarily, and $C = |\nabla f(\theta^t)|$ is the normalizing value that turns the gradient into a unit vector.)
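The update rule above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the lecture's implementation: the function names (`sigmoid`, `log_likelihood`, `gradient`, `gradient_ascent`), the step size `h`, the step count, and the toy data are all made up for this example. The first column of `X` is a constant 1 so that $\theta_0$ acts as an intercept.

```python
import math

def sigmoid(z):
    """Logistic function e^z / (1 + e^z), written to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def log_likelihood(theta, X, Y):
    """f(theta) = sum_i log P(Y_i | X_i; theta)."""
    total = 0.0
    for x_i, y_i in zip(X, Y):
        z = sum(x * t for x, t in zip(x_i, theta))
        # log P(Y_i | X_i) = Y_i * z - log(1 + e^z), in a numerically stable form
        total += y_i * z - (max(z, 0.0) + math.log1p(math.exp(-abs(z))))
    return total

def gradient(theta, X, Y):
    """df/dtheta_j = sum_i X_ij * (Y_i - P(Y = 1 | X_i; theta))."""
    grad = [0.0] * len(theta)
    for x_i, y_i in zip(X, Y):
        err = y_i - sigmoid(sum(x * t for x, t in zip(x_i, theta)))
        for j, x_ij in enumerate(x_i):
            grad[j] += x_ij * err
    return grad

def gradient_ascent(X, Y, h=0.05, steps=200):
    """Repeat theta <- theta + (h / C) * gradient, with C the gradient norm."""
    theta = [0.0] * len(X[0])  # theta^0 is arbitrary; zeros are assumed here
    for _ in range(steps):
        grad = gradient(theta, X, Y)
        C = math.sqrt(sum(g * g for g in grad))  # normalizer for a unit direction
        if C < 1e-12:  # gradient is numerically zero, nothing left to climb
            break
        theta = [t + (h / C) * g for t, g in zip(theta, grad)]
    return theta

# Toy, non-separable data invented for illustration: [1, x] rows and 0/1 labels.
X = [[1.0, 0.5], [1.0, 1.0], [1.0, 1.5], [1.0, 2.0],
     [1.0, 2.5], [1.0, 3.0], [1.0, 3.5], [1.0, 4.0]]
Y = [0, 0, 1, 0, 1, 0, 1, 1]

theta_hat = gradient_ascent(X, Y)
print("theta:", theta_hat)
print("log-likelihood:", log_likelihood(theta_hat, X, Y))
```

Because every step has the fixed length `h`, the iterate ends up hovering within roughly `h` of the maximizer rather than converging exactly; shrinking `h` over time is one common way to tighten this.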

Now let us take another look at the Linear Regression solution we derived earlier.

Linear Regression Revisited

Previously, we derived

$\theta = argmin_{\theta}(f - \hat{f})^{2}$

$= argmin_{\theta} (Y - X \theta)^{2}$

$= argmin_{\theta} (Y - X \theta)^{t}(Y - X \theta)$

$= argmin_{\theta} (Y^{t} - \theta^{t}X^{t})(Y - X \theta)$

$= argmin_{\theta} (Y^{t}Y - Y^{t} X \theta - \theta^{t} X^{t} Y + \theta^{t} X^{t} X \theta)$

$= argmin_{\theta} (\theta^{t} X^{t} X \theta - 2 \theta^{t} X^{t} Y + Y^{t}Y)$ , $\because Y^{t}X\theta = (X\theta)^{t}Y = \theta^{t} X^{t}Y$

$= argmin_{\theta} (\theta^{t} X^{t} X \theta - 2 \theta^{t} X^{t} Y)$ , because $Y^{t} Y$ is a constant with respect to $\theta$

The optimal $\theta$ then satisfies

$\nabla_{\theta} (\theta^{t}X^{t}X\theta - 2\theta^{t}X^{t}Y) = 0$

$2X^{t}X\theta - 2X^{t}Y = 0$

$\theta = (X^{t}X)^{-1} X^{t}Y$

as we have already seen.
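As a small sketch of the closed form $\theta = (X^{t}X)^{-1} X^{t}Y$, the code below builds $X^{t}X$ and $X^{t}Y$ and solves the linear system $(X^{t}X)\theta = X^{t}Y$ by Gauss-Jordan elimination instead of forming the inverse explicitly. The function name `normal_equation` and the toy data (generated from $y = 1 + 2x$) are assumptions made for illustration.

```python
def normal_equation(X, Y):
    """Solve (X^t X) theta = X^t Y with Gauss-Jordan elimination."""
    d = len(X[0])
    # Augmented matrix [X^t X | X^t Y], one row per parameter.
    A = [[sum(row[j] * row[k] for row in X) for k in range(d)]
         + [sum(row[j] * y for row, y in zip(X, Y))]
         for j in range(d)]
    for col in range(d):
        # Partial pivoting: bring the largest remaining entry onto the diagonal.
        pivot = max(range(col, d), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("X^t X is singular; the closed form does not apply")
        A[col], A[pivot] = A[pivot], A[col]
        # Eliminate this column from every other row.
        for r in range(d):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    # The left block is now diagonal, so each theta_j is a single division.
    return [A[j][d] / A[j][j] for j in range(d)]

# Toy data invented for illustration: rows are [1, x], targets follow y = 1 + 2x.
X_lin = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
Y_lin = [1.0, 3.0, 5.0, 7.0]

theta_closed = normal_equation(X_lin, Y_lin)
print("closed-form theta:", theta_closed)
```

Since the targets lie exactly on a line, the recovered parameters should match the intercept and slope used to generate them, up to floating-point error.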

Since Linear Regression has a closed-form solution, it may look as if nothing is left to solve. However, it becomes a problem when the data making up $X$ grows large, because computing the inverse matrix is very expensive. Gradient descent is a way around this. Applying it, we obtain

$\theta = argmin_{\theta}(f - \hat{f})^{2} = argmin_{\theta} (Y - X \theta)^{2}$

$\frac{\partial}{\partial \theta_k} \sum_{1 \leq i \leq N} (Y^i - \sum_{1 \leq j \leq d} X_{j}^{i} \theta_j)^2 = -\sum_{1 \leq i \leq N} 2(Y^i - \sum_{1 \leq j \leq d} X_{j}^{i} \theta_j) X_{k}^{i}$

$\theta_{k}^{t+1} \leftarrow \theta_{k}^{t} - h \frac{\partial f(\theta^t)}{\partial \theta_{k}^{t}} = \theta_{k}^{t} + h \sum_{1 \leq i \leq N} 2(Y^i - \sum_{1 \leq j \leq d} X_{j}^{i} \theta_{j}^{t}) X_{k}^{i}$
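The gradient descent update for the squared error can be sketched as follows. This is an illustrative example only: the function name `linear_gradient_descent`, the step size `h = 0.01`, the 2000 iterations, and the toy data (generated from $y = 1 + 2x$) are assumptions, and a step size this large is only safe because the data set is tiny.

```python
def linear_gradient_descent(X, Y, h=0.01, steps=2000):
    """Repeat theta_k <- theta_k + h * sum_i 2 * (Y^i - X^i . theta) * X^i_k."""
    d = len(X[0])
    theta = [0.0] * d  # arbitrary starting point, assumed to be zeros
    for _ in range(steps):
        # Residuals Y^i - sum_j X^i_j * theta_j under the current theta.
        residuals = [y - sum(x * t for x, t in zip(row, theta))
                     for row, y in zip(X, Y)]
        # Negative gradient of the squared error for each parameter k.
        step = [sum(2.0 * r * row[k] for r, row in zip(residuals, X))
                for k in range(d)]
        theta = [t + h * s for t, s in zip(theta, step)]
    return theta

# Toy data invented for illustration: rows are [1, x], targets follow y = 1 + 2x.
X_gd = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
Y_gd = [1.0, 3.0, 5.0, 7.0]

theta_gd = linear_gradient_descent(X_gd, Y_gd)
print("gradient-descent theta:", theta_gd)
```

No matrix inverse is needed here; each iteration only costs one pass over the data, which is the trade-off that makes this approach attractive when $X$ is large. If `h` is chosen too large relative to the scale of $X^{t}X$, the iterates diverge instead of converging.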

-HibisCircus-
