# The Ones and Zeros of Numpy as np

As with any good project nowadays, there eventually came a point when some machine learning based classification "needed" to come into play (this is partly a joke; you should always seek to use the right tool for the job).

There are a lot of different toolkits for working with datasets for the purposes of data mining, feature extraction, and classification. Scikit-learn is a Python package that began as a Google Summer of Code project. It is easily installed using `pip install -U scikit-learn`.
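After installing, a quick sanity check is to import the package and print its version (a minimal sketch, assuming a standard Python environment):

``````import sklearn

# confirm the package imports cleanly and see which version is installed
print(sklearn.__version__)``````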

The Support Vector Classifier (SVC) is a common classifier that can handle multiple classes. It belongs to the family of machine learning algorithms known as Support Vector Machines (SVMs).

Let's say we wanted to classify a set of X, Y coordinates into two groups: positive and negative numbers. We could use an SVC trained on a dataset to predict the group of future inputs.

• 0 will represent the group of positive coordinates
• 1 will represent the group of negative coordinates
``````import numpy as np
from sklearn import svm

# define positive and negative coordinates
positive = np.array([[1, 1], [5, 5], [2, 3], [2, 4]])
negative = np.array([[-1, -1], [-3, -4], [-5, -1], [-2, -2]])``````

Positive and negative pairs of coordinates have now been created, representing groups 0 and 1 respectively. We will use them to train a classifier and then test some new coordinates to see how well it was trained. Real datasets are usually much larger, but this keeps things simple.

``````# generate X data for the classifier by combining the positive and negative arrays
X = np.vstack((positive, negative))

# generate y group labels for the data
y = [0, 0, 0, 0, 1, 1, 1, 1]
# or using the ones and zeros of np!
# might not seem like much here, but on larger datasets, this helps a lot!
# note: len(...) gives one label per sample; passing the full 2-D shape
# would produce one label per element instead
y = np.append(np.zeros(len(positive)), np.ones(len(negative)))``````

The classifier takes the samples of both groups (`X`) along with an array of binary group labels (`y`), 0 and 1, that assigns each training sample to its group.
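One pitfall worth noting: `np.zeros` takes a shape, so passing the full 2-D shape of `positive` would produce one zero per *element* (eight labels after flattening) rather than one per sample. Using `len(...)` gives one label per row; a quick check:

``````import numpy as np

positive = np.array([[1, 1], [5, 5], [2, 3], [2, 4]])

# the full shape yields a 4x2 array -> 8 values after flattening
print(np.zeros(positive.shape).size)  # -> 8
# the number of rows yields one label per sample
print(np.zeros(len(positive)).size)   # -> 4``````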

The model for the data can then be fit using `sklearn.svm`.

``````clf = svm.SVC(kernel='linear')
fit = clf.fit(X, y)``````

Printing the fitted model shows its parameters; for older versions of scikit-learn the output looks similar to,

``````>>> print(fit)
SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
decision_function_shape=None, degree=3, gamma='auto', kernel='linear',
max_iter=-1, probability=False, random_state=None, shrinking=True,
tol=0.001, verbose=False)``````
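With a linear kernel, the fitted model also exposes the separating hyperplane directly. `coef_`, `intercept_`, and `support_vectors_` are standard `SVC` attributes; the exact values depend on the solver, so none are shown here (a minimal sketch repeating the training setup above):

``````import numpy as np
from sklearn import svm

positive = np.array([[1, 1], [5, 5], [2, 3], [2, 4]])
negative = np.array([[-1, -1], [-3, -4], [-5, -1], [-2, -2]])
X = np.vstack((positive, negative))
y = np.append(np.zeros(len(positive)), np.ones(len(negative)))

fit = svm.SVC(kernel='linear').fit(X, y)

# weights and bias of the separating hyperplane w.x + b = 0
print(fit.coef_, fit.intercept_)
# the training points that sit on the margin
print(fit.support_vectors_)``````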

Yay! We have now trained our classifier.

Let's see how well we did by trying a few test cases.

Points are passed as rows of a 2-D array; a single coordinate pair can be classified using,

``predict = fit.predict([[1,4]])``
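Several points can be classified in one call by passing them as rows of the same array. A small sketch on the same trained model, with one clearly positive and one clearly negative point:

``````import numpy as np
from sklearn import svm

positive = np.array([[1, 1], [5, 5], [2, 3], [2, 4]])
negative = np.array([[-1, -1], [-3, -4], [-5, -1], [-2, -2]])
X = np.vstack((positive, negative))
y = np.append(np.zeros(len(positive)), np.ones(len(negative)))
fit = svm.SVC(kernel='linear').fit(X, y)

# predict a positive and a negative point in one call
print(fit.predict([[1, 4], [-2, -3]]))  # -> [0. 1.]``````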

A score can be derived using a set of labeled test data such as

``score = fit.score([[2,3],[4,1],[5,4],[2,1],[4,3]], np.zeros(5))``

for testing new positive coordinates. The same can be done with negative coordinates or any combination of both.
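`score` returns the fraction of test points whose predicted label matches the label supplied, so a mixed set works the same way (a sketch reusing the trained model from above):

``````import numpy as np
from sklearn import svm

positive = np.array([[1, 1], [5, 5], [2, 3], [2, 4]])
negative = np.array([[-1, -1], [-3, -4], [-5, -1], [-2, -2]])
X = np.vstack((positive, negative))
y = np.append(np.zeros(len(positive)), np.ones(len(negative)))
fit = svm.SVC(kernel='linear').fit(X, y)

# two positive and two negative test points with their true labels
mixed = fit.score([[3, 3], [4, 1], [-4, -2], [-1, -3]], [0, 0, 1, 1])
print(mixed)  # -> 1.0``````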

Print out the results.

``````print("predict", predict)
print("score", score)  # -> 1.0``````

np.ones and np.zeros. A new appreciation. Perfect classification.