# The Ones and Zeros of Numpy as np

As with any good project nowadays, there eventually comes a point when some machine-learning-based classification "needs" to come into play (this is partly a joke; you should always seek to use the right tool for the job).

There are a lot of different toolkits for working with datasets for the purposes of data mining, feature extraction, and classification. Scikit-learn is a Python package that began as a Google Summer of Code project and is now maintained by a large community of contributors. It is easily installed using `pip install -U scikit-learn`.

The Support Vector Classifier (SVC) is a common classifier that supports multiple classes. It belongs to the family of machine learning algorithms known as Support Vector Machines (SVMs).

Let's say we wanted to classify a set of X, Y coordinates into two groups: positive and negative numbers. We could use an SVC trained on a dataset to predict the group of future inputs.

- 0 will represent the group of positive coordinates
- 1 will represent the group of negative coordinates

```
import numpy as np
from sklearn import svm
# define positive and negative coordinates
positive = np.array([[1,1], [5,5], [2,3], [2,4]])
negative = np.array([[-1,-1], [-3,-4], [-5, -1], [-2,-2]])
```

Positive and negative pairs of coordinates have now been created; they represent groups 0 and 1 respectively. We will use this information to train a classifier and then test some new coordinates to see how well it was trained. Real datasets are usually much larger, but this one keeps things simple.

```
# generate X data for the classifier by combining the positive and negative numbers
X = np.vstack((positive, negative))
# generate y group labels for the data
y = [0,0,0,0,1,1,1,1]
# or using the ones and zeros of np!
# might not seem like much here, but on larger datasets, this helps a lot!
y = np.append(np.zeros(positive.shape[0]), np.ones(negative.shape[0]))
```
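As a quick sanity check, the combined arrays can be inspected. This is a standalone sketch reusing the same arrays defined above:

```
import numpy as np

positive = np.array([[1, 1], [5, 5], [2, 3], [2, 4]])
negative = np.array([[-1, -1], [-3, -4], [-5, -1], [-2, -2]])

# stack the two groups into one sample matrix
X = np.vstack((positive, negative))
# build the matching labels with np.zeros and np.ones
y = np.append(np.zeros(positive.shape[0]), np.ones(negative.shape[0]))

print(X.shape)  # -> (8, 2)
print(y)        # -> [0. 0. 0. 0. 1. 1. 1. 1.]
```

Note that `np.append` flattens its inputs, so `y` comes out as a single 1-D array of eight labels, one per row of `X`.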

The classifier takes samples of both groups (`X`), while an array of binary group labels (`y`), 1 and 0, classifies the training data.

The model for the data can then be fit using `sklearn.svm`.

```
clf = svm.SVC(kernel='linear')
fit = clf.fit(X, y)
```

The output of this command in a `print` statement is probably similar to,

```
>>> print(fit)
SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
decision_function_shape=None, degree=3, gamma='auto', kernel='linear',
max_iter=-1, probability=False, random_state=None, shrinking=True,
tol=0.001, verbose=False)
```

Yay! We have now trained our classifier.
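As an aside, the fitted model exposes which training points ended up defining the decision boundary. This is a self-contained sketch retraining on the same data as above; the exact support vectors shown are whatever the solver selects:

```
import numpy as np
from sklearn import svm

positive = np.array([[1, 1], [5, 5], [2, 3], [2, 4]])
negative = np.array([[-1, -1], [-3, -4], [-5, -1], [-2, -2]])
X = np.vstack((positive, negative))
y = np.append(np.zeros(4), np.ones(4))

clf = svm.SVC(kernel='linear')
fit = clf.fit(X, y)

# the training points that define the separating boundary
print(fit.support_vectors_)
# hyperplane coefficients and intercept (available for linear kernels)
print(fit.coef_, fit.intercept_)
```

With data this cleanly separated, only a couple of points near the origin should be needed as support vectors; the rest of the training set does not influence the boundary.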

Let's see how well we did by trying a few test cases.

Points can be passed as an array, or individual coordinates using,

`predict = fit.predict([[1,4]])`

A score can be derived using a set of labeled test data such as

`score = fit.score([[2,3],[4,1],[5,4],[2,1],[4,3]], np.zeros(5))`

for testing new positive coordinates. The same can be done with negative coordinates or any combination of both.
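For instance, here is a self-contained sketch (retraining the same model) that scores a batch of new negative coordinates and a mixed batch; the test points are made up for illustration:

```
import numpy as np
from sklearn import svm

positive = np.array([[1, 1], [5, 5], [2, 3], [2, 4]])
negative = np.array([[-1, -1], [-3, -4], [-5, -1], [-2, -2]])
X = np.vstack((positive, negative))
y = np.append(np.zeros(4), np.ones(4))
fit = svm.SVC(kernel='linear').fit(X, y)

# five new negative coordinates, all labeled 1 via np.ones
neg_score = fit.score([[-2, -3], [-4, -1], [-5, -4], [-2, -1], [-4, -3]],
                      np.ones(5))

# a mixed batch: two positives (labeled 0) then two negatives (labeled 1)
mixed_score = fit.score([[3, 3], [1, 2], [-3, -3], [-1, -2]],
                        np.append(np.zeros(2), np.ones(2)))
```

`score` returns the fraction of test points whose predicted label matches the label you supplied, so a well-separated test batch like this should score at or near 1.0.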

Print out the results.

```
print "predict ", predict # -> [0]
print "score ", score # -> [1.0]
```

np.ones and np.zeros. A new appreciation. Perfect classification.