The Ones and Zeros of Numpy as np
As with any good project nowadays, there eventually came a point when some machine learning-based classification "needed" to come into play (this is more of a joke; you should always seek to use the right tool for the job).
There are a lot of different toolkits for working with datasets for data mining, feature extraction, and classification. scikit-learn is a Python package that began as a Google Summer of Code project and is now maintained by a large open-source community. It is easily installed using

pip install -U scikit-learn
The Support Vector Classifier (SVC) is a common classifier that can handle multiple classes. It belongs to the family of machine learning algorithms known as Support Vector Machines (SVMs).
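To show that SVC is not limited to two classes, here is a minimal sketch with three made-up clusters (the points and labels are invented purely for illustration):

```python
import numpy as np
from sklearn import svm

# three well-separated, made-up clusters, one per class
samples = np.array([[0, 0], [0, 1], [5, 5], [5, 6], [-5, 5], [-5, 6]])
labels = np.array([0, 0, 1, 1, 2, 2])

# SVC handles more than two classes out of the box
clf = svm.SVC(kernel='linear')
clf.fit(samples, labels)

# classify one test point near each cluster
print(clf.predict([[0, 0.5], [5, 5.5], [-5, 5.5]]))
```

Because the clusters are far apart, each test point comes back with the label of its nearest cluster.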
Let's say we wanted to classify a set of X, Y coordinates into two groups: positive and negative numbers. We could use an SVC trained on a dataset to predict the group of future inputs.
- 0 will represent the group of positive coordinates
- 1 will represent the group of negative coordinates
import numpy as np
from sklearn import svm

# define positive and negative coordinates
positive = np.array([[1,1], [5,5], [2,3], [2,4]])
negative = np.array([[-1,-1], [-3,-4], [-5,-1], [-2,-2]])
Positive and negative pairs of coordinates have now been created, representing groups 0 and 1 respectively. We will use them to train a classifier and then test some new coordinates to see how accurately it has been trained. Real datasets are usually much larger, but this keeps things simple.
# generate X data for the classifier by combining the positive and negative coordinates
X = np.vstack((positive, negative))

# generate y group labels for the data
y = [0,0,0,0,1,1,1,1]

# or using the ones and zeros of np!
# might not seem like much here, but on larger datasets, this helps a lot!
# len() counts rows of a 2-D array, so we get exactly one label per sample
y = np.append(np.zeros(len(positive)), np.ones(len(negative)))
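It is worth sanity-checking the shapes at this point: X should have one row per sample, and y one label per row of X. A quick self-contained check using the same arrays:

```python
import numpy as np

positive = np.array([[1, 1], [5, 5], [2, 3], [2, 4]])
negative = np.array([[-1, -1], [-3, -4], [-5, -1], [-2, -2]])

X = np.vstack((positive, negative))
# len() counts rows of a 2-D array, giving one label per sample
y = np.append(np.zeros(len(positive)), np.ones(len(negative)))

print(X.shape)  # (8, 2)
print(y.shape)  # (8,)
```

If the shapes disagree (for example, 16 labels for 8 samples), fit will raise an error about inconsistent numbers of samples.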
The classifier takes the samples from both groups (X), along with an array of binary group labels (y) that marks each training sample as class 0 or 1.
The model for the data can then be fit using
clf = svm.SVC(kernel='linear')
fit = clf.fit(X, y)
Printing the fitted model shows the parameters it was trained with:
>>> print fit
SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape=None, degree=3, gamma='auto', kernel='linear',
    max_iter=-1, probability=False, random_state=None, shrinking=True,
    tol=0.001, verbose=False)
Yay! We have now trained our classifier.
Let's see how well we did by trying a few test cases.
Points are passed as an array of coordinate pairs, even when predicting a single point:
predict = fit.predict([[1,4]])
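Several points can also be classified in one call by stacking them into a single array. A sketch that retrains the same classifier so the snippet runs on its own (the three test points are made up):

```python
import numpy as np
from sklearn import svm

positive = np.array([[1, 1], [5, 5], [2, 3], [2, 4]])
negative = np.array([[-1, -1], [-3, -4], [-5, -1], [-2, -2]])
X = np.vstack((positive, negative))
y = np.append(np.zeros(len(positive)), np.ones(len(negative)))

fit = svm.SVC(kernel='linear').fit(X, y)

# one 2-D array, one row per point to classify
points = np.array([[1, 4], [-2, -3], [3, 3]])
print(fit.predict(points))
```

The result is one predicted label per row: here, group 0 for the two positive points and group 1 for the negative one.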
A score can be derived using a set of labeled test data such as
score = fit.score([[2,3],[4,1],[5,4],[2,1],[4,3]], np.zeros(5))
for testing new positive coordinates. The same can be done with negative coordinates or any combination of both.
Print out the results.
print "predict ", predict # -> [ 0.]
print "score ", score # -> 1.0
np.ones and np.zeros. A new appreciation. Perfect classification.