Training

Training a machine learning model means providing a learning algorithm with training data to learn from. The algorithm takes data in which the relationship between inputs and outputs is known and builds a model of that relationship. The trained model can then be used for predictive analysis, for example to understand patterns in customer behaviour.
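As a minimal illustration of "learning a known relationship" (not the product's internal algorithm), the Python sketch below fits a straight line y = slope * x + intercept to example data by ordinary least squares and then uses the trained model to predict a new value. The function names and the data are hypothetical.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y divided by the variance of x gives the slope.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply a trained (slope, intercept) model to a new input."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical training data generated from the known relationship y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

model = fit_line(xs, ys)
print(model)                 # (2.0, 1.0)
print(predict(model, 10.0))  # 21.0
```

Training recovers the relationship from the examples alone; prediction then applies it to an input the model has never seen.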

Follow these steps to train a model:

1. Provide a model name.

2. Select the algorithm you want to apply to the model, then click the + button on the right of the dialog box. Choose one of the following options, depending on your requirement:

  • Classification – Single Label: Exactly one label is assigned to each instance of data.
  • Classification – Multi Label: Multiple labels may be assigned to each instance of data.
  • Regression: Models the relationship between a dependent variable and one or more independent variables, so that a continuous value can be predicted.
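The three options differ mainly in what the model learns to output. The sketch below assumes a hypothetical set of class names and shows one common way to represent each kind of target: a single class index, a 0/1 indicator per class, or a real number. The product's expected training-data format may differ.

```python
from typing import List, Sequence

# Hypothetical class names used only for illustration.
CLASSES = ["sports", "politics", "finance"]


def encode_single_label(label: str, classes: Sequence[str] = CLASSES) -> int:
    """Single label: each instance maps to exactly one class index."""
    return list(classes).index(label)


def encode_multi_label(labels: Sequence[str], classes: Sequence[str] = CLASSES) -> List[int]:
    """Multi label: one 0/1 indicator per class; several can be 1 at once."""
    return [1 if c in labels else 0 for c in classes]


print(encode_single_label("politics"))            # 1
print(encode_multi_label(["sports", "finance"]))  # [1, 0, 1]

# Regression: the target is a continuous number rather than a class,
# e.g. a hypothetical monthly spend for a customer.
regression_target: float = 249.99
```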

The attributes to fill in for the three options above are listed below. They are support vector machine (SVM) parameters:

Attribute       Description
kernel          string, optional (default 'rbf'). Kernel type used by the algorithm. Must be one of 'linear', 'poly', 'rbf', 'sigmoid' or 'precomputed'. If none is given, 'rbf' is used.
degree          int, optional (default 3). Degree of the polynomial kernel function ('poly'). Ignored by all other kernels.
gamma           float, optional (default 1/num_features). Kernel coefficient for 'rbf', 'poly' and 'sigmoid'.
coef0           float, optional (default 0.0). Independent term in the kernel function. Only significant for 'poly' and 'sigmoid'.
cache_size      float, optional. Size of the kernel cache, in MB.
eps             float (default 0.001). Tolerance of the termination criterion.
C               float, optional (default 1.0). Penalty parameter C of the error term.
nr_weight       int (default 0). Number of elements in weight_label and weight.
weight_label    array (default NULL). Class labels (output variable values) whose penalty C is changed.
weight          array (default 1). Weight for each element of weight_label; the penalty for class weight_label[i] becomes C * weight[i].
nu              float (default 0.5). Upper bound on the fraction of training errors and lower bound on the fraction of support vectors; acceptable range (0, 1].
p               float (default 0.1). Epsilon in the epsilon-insensitive loss function of epsilon-SVR (regression): errors smaller than p are not penalized.
shrinking       boolean, optional (default True). Whether to use the shrinking heuristic.
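The kernel attributes combine as in the standard SVM kernel definitions. The sketch below (helper names are hypothetical; 'precomputed' is omitted because there the kernel matrix is supplied directly) evaluates each kernel for two feature vectors. It also shows how p and the class weights are commonly interpreted: the epsilon-insensitive loss ignores errors smaller than p, and class weight_label[i] is trained with penalty C * weight[i] (nr_weight is simply the length of both arrays).

```python
import math
from typing import Optional, Sequence


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def kernel(u, v, kernel_type="rbf", degree=3, gamma: Optional[float] = None, coef0=0.0):
    """Evaluate the standard SVM kernels for two feature vectors."""
    if gamma is None:
        gamma = 1.0 / len(u)  # default gamma = 1 / num_features
    if kernel_type == "linear":
        return dot(u, v)
    if kernel_type == "poly":
        return (gamma * dot(u, v) + coef0) ** degree
    if kernel_type == "rbf":
        squared_distance = sum((a - b) ** 2 for a, b in zip(u, v))
        return math.exp(-gamma * squared_distance)
    if kernel_type == "sigmoid":
        return math.tanh(gamma * dot(u, v) + coef0)
    # 'precomputed' is not handled: the kernel matrix would be supplied directly.
    raise ValueError(f"unsupported kernel: {kernel_type}")


def epsilon_insensitive_loss(y_true: float, y_pred: float, p: float = 0.1) -> float:
    """Loss used by epsilon-SVR: errors within p cost nothing."""
    return max(0.0, abs(y_true - y_pred) - p)


def class_penalty(label, C=1.0, weight_label=(), weight=()):
    """Penalty for a class: C * weight[i] if label == weight_label[i], else C."""
    for lbl, w in zip(weight_label, weight):
        if lbl == label:
            return C * w
    return C


u, v = [1.0, 2.0], [3.0, 4.0]
print(kernel(u, v, "linear"))  # 11.0
print(kernel(u, v, "poly"))    # 166.375, i.e. (0.5 * 11.0) ** 3
print(kernel(u, v, "rbf"))     # equals math.exp(-4.0)
print(epsilon_insensitive_loss(3.0, 3.05))                       # 0.0
print(class_penalty(1, C=1.0, weight_label=[1], weight=[5.0]))  # 5.0
```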
The following algorithm options are also available:
  • IE: Information extraction (IE) extracts structured information from unstructured and/or semi-structured machine-readable documents. It performs named entity recognition (NER), builds relationships as (subject, predicate, object) triples, and lets the user traverse a knowledge graph to understand and extract hidden rules, conditions or information.
  • IE Word Dictionary: IE can be applied to broad domains or to specific ones, so a custom knowledge base (KB) is needed for a particular domain. The word dictionary option creates a basic KB for a domain, which is then used to train NER and IE models.
  • IE Named Entity Recognition (NER): Extracts named entities from a given text. These entities are the basis for information extraction.
  • Sentiment Analysis: Contextual mining of text that identifies and extracts subjective information from source material, helping a business understand the social sentiment around its brand or product.
  • KMEANS: An unsupervised learning algorithm that partitions a given data set into a chosen number (k) of clusters.
  • KMEANS Centroid: K-means assigns each data point to its nearest cluster. The center of each cluster is called its centroid, so the distance from a test data point to each centroid shows how close it is to, or how far it is from, each cluster.
  • KMEANS Statistics: Statistics computed for k-means clusters, such as the mean of each cluster (its centroid), help determine the class of a given data point.
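To make the KMEANS options concrete, the sketch below implements plain k-means (Lloyd's algorithm) on 2-D points, then computes each cluster's size (a simple statistic) and the distances from a test point to every centroid. It assumes a deterministic farthest-point initialization; the product may initialize clusters differently (for example randomly or with k-means++). All names and data are illustrative.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def sq_dist(a: Point, b: Point) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point: Point, centroids: List[List[float]]) -> int:
    """Index of the centroid closest to point."""
    return min(range(len(centroids)), key=lambda j: sq_dist(point, centroids[j]))


def init_farthest(points: List[Point], k: int) -> List[List[float]]:
    """Start from the first point, then repeatedly add the point farthest from all chosen centroids."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(farthest))
    return centroids


def kmeans(points: List[Point], k: int, max_iter: int = 100):
    centroids = init_farthest(points, k)
    assignment: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        assignment = [nearest(p, centroids) for p in points]
        # Update step: each centroid moves to the mean of its member points.
        new_centroids = []
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:
                new_centroids.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster in place
        if new_centroids == centroids:
            break  # converged: centroids no longer move
        centroids = new_centroids
    return centroids, assignment


points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]
centroids, assignment = kmeans(points, k=2)
sizes = [assignment.count(j) for j in range(len(centroids))]

test_point = (2.0, 2.0)
distances = [math.sqrt(sq_dist(test_point, c)) for c in centroids]
print(assignment)                      # [0, 0, 0, 1, 1, 1]
print(sizes)                           # [3, 3]
print(nearest(test_point, centroids))  # 0
```

The distance list is what the centroid option relies on: the smaller the distance, the closer the test point is to that cluster.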

3. Select the training speed of the model, from Very Fast to Very Slow. Slower speeds process the data more accurately.

4. Choose the input stream that will be used to train the model.

5. Choose the attribute type: string, number or hybrid (a combination of string and number).
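The choice between string, number and hybrid depends on the columns in the training data. As a hypothetical sketch (the product's own detection rules are not documented here), the functions below label a column "number" if every value parses as a float and "string" otherwise, and call the whole data set "hybrid" when it contains both kinds of column.

```python
from typing import List, Sequence


def is_number(text: str) -> bool:
    """True if the raw value parses as a number."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def column_types(rows: Sequence[Sequence[str]]) -> List[str]:
    """Label each column 'number' if all its values are numeric, else 'string'."""
    return ["number" if all(is_number(v) for v in column) else "string"
            for column in zip(*rows)]


def infer_attribute_type(rows: Sequence[Sequence[str]]) -> str:
    """Hypothetical rule: all numeric -> number, all text -> string, mixed -> hybrid."""
    if not rows:
        raise ValueError("no training rows")
    kinds = set(column_types(rows))
    if kinds == {"number"}:
        return "number"
    if kinds == {"string"}:
        return "string"
    return "hybrid"


rows = [["red", "12.5"], ["blue", "7"]]
print(column_types(rows))          # ['string', 'number']
print(infer_attribute_type(rows))  # hybrid
```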

6. Choose the file format in which the model should be produced.

7. Choose the training source: specify whether to map the attributes to an existing stream or to an external source, i.e., a URL or a file.

8. Upload the training data. To check the required format of the training data, click 'here' on the page. After uploading the data, click 'Start Training' to begin creating the model.
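The training-data format is described on the 'here' page rather than in this section, so the sketch below simply assumes a CSV file with a header row and the target value in the last column. It reads the text from either a local file or a URL (the two external sources named in step 7) and splits it into features and labels. The layout, sample data and helper names are all assumptions; check the documented format before uploading.

```python
import csv
import io
import urllib.request
from typing import List, Tuple


def read_source(source: str) -> str:
    """Return the raw text of a training file from a URL or a local path."""
    if source.startswith(("http://", "https://")):
        with urllib.request.urlopen(source) as response:
            return response.read().decode("utf-8")
    with open(source, newline="", encoding="utf-8") as handle:
        return handle.read()


def parse_training_csv(text: str) -> Tuple[List[str], List[List[str]], List[str]]:
    """Assumed layout: header row, then one row per instance, label in the last column."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    rows = [row for row in reader if row]  # skip blank lines
    features = [row[:-1] for row in rows]
    labels = [row[-1] for row in rows]
    return header, features, labels


sample = "age,city,churned\n34,Pune,no\n51,Delhi,yes\n"
header, features, labels = parse_training_csv(sample)
print(header)    # ['age', 'city', 'churned']
print(features)  # [['34', 'Pune'], ['51', 'Delhi']]
print(labels)    # ['no', 'yes']
```

With a real file the two helpers chain together, for example `parse_training_csv(read_source("train.csv"))`, where `train.csv` is a placeholder path.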

9. Once training succeeds, the model is listed on the right-hand side of the page.