Machine Learning

Machine Learning (ML) is a branch of artificial intelligence in which systems learn from data and improve their performance on a task without being explicitly programmed for it. ML algorithms analyze data to detect patterns and relationships, which are then used to make predictions, classifications, or decisions. These techniques are commonly applied in fraud detection, recommendation systems, and predictive analytics. Unlike traditional programming, where a developer writes the rules by hand, ML derives its rules from data, and it can handle both structured and unstructured data.

Process

  • Training
    • Input data
    • Feature extraction (manual in traditional ML, automatic in deep learning)
    • Model learning
  • Prediction (Inference)
    • New input data
    • Apply trained model
    • Output prediction or classification
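The two phases above can be sketched in a few lines. This is a minimal illustration, not a recommended design: the messages, the `extract_features` helper, and the choice of a decision tree are all assumptions made for the example. It uses hand-written features, which corresponds to the "manual" feature extraction of traditional ML.

```python
from sklearn.tree import DecisionTreeClassifier

def extract_features(message):
    """Hypothetical hand-written features: word count and whether 'click' appears."""
    words = message.lower().split()
    return [len(words), int("click" in words)]

# Hypothetical training data: 1 = suspicious, 0 = normal
train_messages = [
    "Click here to win",
    "Click this link right now",
    "See you at lunch",
    "Thanks for the update today",
]
train_labels = [1, 1, 0, 0]

# Training: input data -> feature extraction -> model learning
X_train = [extract_features(m) for m in train_messages]
model = DecisionTreeClassifier(random_state=0)
model.fit(X_train, train_labels)

# Prediction (inference): new input data -> same feature extraction -> trained model
new_messages = ["Please click the link", "Are we meeting today"]
X_new = [extract_features(m) for m in new_messages]
predictions = model.predict(X_new)
print(predictions)
```

Note that inference reuses the same feature extraction as training. A model can only interpret inputs that are encoded the way its training data was.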

Data Splitting

  • Training set: Used to train the model
  • Validation set: Used to tune hyperparameters and evaluate the model during training
  • Test set: Used to evaluate final performance on unseen data
  • A common split is 70% / 20% / 10% (training / validation / test), but the proportions vary from project to project.
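One way to produce such a three-way split is to call scikit-learn's `train_test_split` twice: once to set aside the test set, and once more to carve the validation set out of the remainder. The sketch below assumes a synthetic dataset of 100 samples; the variable names and the fixed `random_state` are illustrative choices.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic data, assumed for illustration: 100 samples, 2 features each
n_samples = 100
X = np.arange(n_samples * 2).reshape(n_samples, 2)
y = np.arange(n_samples) % 2

# Absolute set sizes for a 70% / 20% / 10% split
n_test = n_samples // 10       # 10% for the test set
n_val = 2 * n_samples // 10    # 20% for the validation set

# Step 1: set aside the test set
X_trainval, X_test, y_trainval, y_test = train_test_split(
    X, y, test_size=n_test, random_state=0
)

# Step 2: split the remainder into training and validation sets
X_train, X_val, y_train, y_val = train_test_split(
    X_trainval, y_trainval, test_size=n_val, random_state=0
)

print(len(X_train), len(X_val), len(X_test))  # 70 20 10
```

Passing integer sizes avoids rounding surprises that fractional sizes can cause on small datasets. For classification data with imbalanced labels, the `stratify` parameter of `train_test_split` can keep class proportions similar across the sets.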

Example

import numpy as np  # Array handling
from sklearn.feature_extraction.text import CountVectorizer  # Converts text to numeric feature vectors
from sklearn.ensemble import RandomForestClassifier  # Classification model

# Input texts (simulated messages) and labels
texts = np.array([
    'Click at this link',              # Suspicious / phishing-like message
    'Click at this link to download',  # Suspicious
    'Click here to transfer money',    # Suspicious
    'My name is Jone',                 # Normal / safe message
    'How are you'                      # Normal / safe message
])
labels = np.array([1, 1, 1, 0, 0])  # 1 = positive (suspicious), 0 = negative (normal)
tags = np.array(["negative", "positive"])  # Human-readable names, indexed by label

# Extract features from text using Bag-of-Words
count_vectorizer = CountVectorizer(min_df=1)  # Keep every word that appears in at least one text
features = count_vectorizer.fit_transform(texts).toarray()  # Learn the vocabulary and build word-count vectors

# Train Random Forest classifier
random_forest_classifier = RandomForestClassifier()  # Initialize model with default settings
random_forest_classifier.fit(features, labels)  # Train model on features and labels

# Predict new text
new_features = count_vectorizer.transform(['How are you']).toarray()  # Encode with the vocabulary learned above
prediction = random_forest_classifier.predict(new_features)  # Predict label (0 or 1)
print(prediction, tags[prediction])  # Print numeric prediction and human-readable tag