    Machine Learning (ML)

    Machine Learning (ML) is a branch of artificial intelligence in which systems learn from data and improve at a task without being explicitly programmed for it. ML algorithms examine data to detect patterns and relationships, then use those patterns to make predictions, classifications, or decisions. These techniques are commonly applied in fraud detection, recommendation systems, and predictive analytics. Unlike traditional programming, where a developer writes the rules by hand, ML derives the rules from data, and it can handle both structured and unstructured data.

    Process

    • Training
      • Input data
      • Feature extraction (manual in traditional ML, automatic in deep learning)
      • Model learning
    • Prediction (Inference)
      • New input data
      • Apply trained model
      • Output prediction or classification
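
    The two phases above can be sketched with a scikit-learn pipeline. This is a minimal illustration, not a production setup: the toy messages, the labels, and the choice of LogisticRegression are assumptions made for this sketch only.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Training: input data (toy messages invented for this sketch) with labels
train_texts = [
    "win a free prize now",
    "free money click now",
    "lunch at noon tomorrow",
    "see you at the meeting",
]
train_labels = [1, 1, 0, 0]  # 1 = spam-like, 0 = normal

# Feature extraction and model learning chained in one pipeline
model = make_pipeline(CountVectorizer(), LogisticRegression())
model.fit(train_texts, train_labels)

# Prediction (inference): new input data -> apply trained model -> output
new_texts = ["free prize", "meeting tomorrow"]
predictions = model.predict(new_texts)
print(predictions)
```

    Chaining the vectorizer and the classifier means the same feature extraction learned during training is applied automatically at inference time.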

    Data Splitting

    • Training set: Used to train the model
    • Validation set: Used to tune hyperparameters and compare candidate models during training
    • Test set: Used to evaluate final performance on unseen data
    • A common split is 70% / 20% / 10% (training / validation / test), but the proportions vary with dataset size and task.
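
    One way to produce a 70% / 20% / 10% split is to call scikit-learn's train_test_split twice. The sketch below uses placeholder data invented for illustration; the variable names and the random_state value are arbitrary choices.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data invented for this sketch: 100 samples, 2 features each
X = np.arange(200).reshape(100, 2)
y = np.arange(100) % 2

# Step 1: hold out 10% of the data as the test set
X_rest, X_test, y_rest, y_test = train_test_split(
    X, y, test_size=0.10, random_state=42
)

# Step 2: split the remaining 90% into training and validation.
# 20% of the whole dataset equals 2/9 of the remaining 90%.
X_train, X_val, y_train, y_val = train_test_split(
    X_rest, y_rest, test_size=2 / 9, random_state=42
)

print(len(X_train), len(X_val), len(X_test))  # 70 20 10
```

    The test set is split off first so that it stays untouched while the model is trained and tuned on the other two sets.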

    Example

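    import numpy as np  # Array handling
    from sklearn.feature_extraction.text import CountVectorizer  # Bag-of-words text features
    from sklearn.ensemble import RandomForestClassifier  # Classification model

    # Input: sample messages and their labels
    texts = np.array([
        'Click at this link',              # suspicious
        'Click at this link to download',  # suspicious
        'Click here to transfer money',    # suspicious
        'My name is Jone',                 # normal
        'How are you',                     # normal
    ])
    labels = np.array([1, 1, 1, 0, 0])  # 1 = positive (suspicious), 0 = negative (normal)
    tags = np.array(["negative", "positive"])  # Display names indexed by label

    # Extract features: learn the vocabulary and build word-count vectors
    count_vectorizer = CountVectorizer(min_df=1)
    features = count_vectorizer.fit_transform(texts).toarray()

    # Train (random_state is fixed so repeated runs build the same forest)
    random_forest_classifier = RandomForestClassifier(random_state=0)
    random_forest_classifier.fit(features, labels)

    # Predict: reuse the fitted vectorizer so the new text maps onto the same vocabulary
    new_features = count_vectorizer.transform(['How are you']).toarray()
    prediction = random_forest_classifier.predict(new_features)
    print(prediction, tags[prediction])  # Numeric label and its display name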