Machine Learning의 개념과 용어

Sung Kim 교수님 강의로 공부하는 머신러닝 Section 0,1

Machine Learning 등장 배경

명시적인 프로그래밍(explicit programming)의 한계를 느끼기 시작했다. Spam detection이나, 자율주행의 경우 rule이 너무 많아 explicit programming을 적용하기 어렵다. 하지만, 머신러닝은 학습하고, 자동화되는 프로그래밍으로, 이런 문제들을 해결하기에 적합하다.

Supervised v.s. Unsupervised

학습 방법에 따라 supervised learning과 unsupervised learning으로 나뉜다.

Supervised Learning
- most common problem type of ML
- learning with labeled examples - training set
- 예로 Image labeling, Spam filtering(spam or pam), Predicting exam score이 있다.
- 대표적인 기법으로는 regression, classification이 있다.
- classification은 label의 개수에 따라 binary(class 2개), multi-label(class 3개 이상)로 구분된다.
Unsupervised Learning
- 예로는 News grouping, Word clustering이 있다.

Training set(data) v.s. Test set(data)

Modeling에 사용되는 데이터를 training data, 모델을 만든 후 성능을 검증할 때 쓰는 data를 test data라고 한다. 보통 6:4 또는 7:3의 비율로 나눈다.

Tensorflow

an open source software library for numerical computation using data flow graphs

구글이 만든 오픈소스 라이브러리
Machine learning, artificail intellegence를 위한 python library

Data Flow Graph

Node와 Edge로 이루어진 flow graph

Node: mathematical operation
Edge: tensor(multidimensional data arrays), data

TensorFlow Mechanics

Node and Session

Build graph(tensors) using TF operations
Feed data and run graph(operation)
- sess.run(op) Node를 placeholder라는 특별한 node로 만들 수 있다.
- sess.run(op, feed_dict={x: x_data})
Update variables in the graph and return values

Tensor Ranks, Shapes, and Types

Rank: 차원
- Rank 0: scalar = 4
- Rank 1: vector = [1, 2, 3]
- Rank 2: Matrix = [[1,2,3], [4,5,6]]
- Rank 3: 3-Tensor = [[[2],[4],[6]]]
- Rank n: n-Tensor = ...
Shape: element number, dimension number
- 0-D: [] (scalar)
- 1-D: [5]
- 2-D: [3,4]
- n-D: [1,3,..,n]
Type
- float, double, int…
Example
- [[1,2,3], [4,5,6]]: Rank 2, Shape [2, 3]
- [[[1., 2., 3.]], [[7., 8., 9.]]]: Rank 3, Shape [2,1,3]

Variable

우리가 흔히 아는 변수가 아닌, tensorflow가 학습하는 과정에서 자체적으로 변경시키는 trainable한 variable

W = tf.Variable(tf.random_normal([1]), name="weight")
b = tf.Variable(tf.random_normal([1]), name="bias")

모든 코드는 Sung Kim 교수님의 Github에서 확인하실 수 있습니다.