QeeqBox

Author: Giga Alqeeq

Identification, Authentication, Authorization, and Accountability
Identification

Identification is the process of claiming or declaring an identity, where a person, system, or object presents information that indicates who or what they are. It is the first step in identity verification, often involving credentials such as a username, ID number, card, or biometric data that represent the claimed identity. Identification by itself does not prove authenticity; instead, it signals the intent of the entity to be recognized, which is later confirmed through authentication mechanisms. This process is essential in security systems, as it establishes the basis for determining access rights and privileges.
- Username
- SSN
Example

id # command that shows user identity information (UID, GID, groups)
```
id
```
whoami # command that prints the current logged-in username
```
whoami
```
Authentication

Authentication is the process of verifying that a claimed identity is valid by confirming that the person, system, or object truly is who or what they assert to be. It typically follows identification and uses various methods such as passwords, PINs, security tokens, smart cards, or biometrics like fingerprints and facial recognition. Authentication can be implemented through single-factor, two-factor, or multi-factor approaches, depending on the required security level. By validating the authenticity of an identity before granting access, authentication helps prevent unauthorized use, strengthens trust, and protects sensitive systems and data.

Authentication factors
- Something you know
  - Password
    
    A secret sequence of characters used to verify a user’s identity
  - Personal Identification Number (PIN)
    
    A secret sequence of digits used to verify a user’s identity
- Something you have
  - Passport
    
    A government-issued travel document that verifies the holder’s identity and nationality for international travel
  - Smartphone
    
    A cellular telephone with an integrated computer
  - Smart Card
    
    A physical plastic card with an embedded microprocessor that stores and processes data (It acts as a security token)
  - Token
    
    A device that’s used to gain access to a restricted resource (might include a name, password, key, certificate, group, or privilege)
- Something you are
  - Fingerprint
    
    A unique pattern made by a person’s fingertip friction ridges
  - Facial recognition
    
    A technology that identifies a user based on their face
  - Iris Scan
    
    A technology that identifies a user based on their iris
- Somewhere you are
  - IP address
    
    A logical network address that is used to locate the device
  - MAC Address
    
    A hardware address that identifies the device’s network interface on the local network
- Something you do
  - Pattern unlock
    
    A technology that identifies a user based on drawing a specific pattern
  - Picture Password
    
    A technology that identifies a user based on selection of images
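The difference between single-factor, two-factor, and multi-factor authentication comes down to how many distinct factor categories are presented, not how many credentials. The following is a minimal sketch of that idea; the `FACTOR_CATEGORY` mapping and the `count_factors` helper are hypothetical names made up for illustration.

```python
# Hypothetical mapping from a credential type to its factor category.
FACTOR_CATEGORY = {
    "password": "something you know",
    "pin": "something you know",
    "smart_card": "something you have",
    "token": "something you have",
    "fingerprint": "something you are",
    "iris_scan": "something you are",
}

def count_factors(presented):
    """Return how many distinct factor categories the credentials cover."""
    categories = {FACTOR_CATEGORY[c] for c in presented if c in FACTOR_CATEGORY}
    return len(categories)

print(count_factors(["password", "pin"]))              # 1 (both are something you know)
print(count_factors(["password", "token"]))            # 2 (two-factor)
print(count_factors(["pin", "token", "fingerprint"]))  # 3 (multi-factor)
```

A password plus a PIN still counts as one factor here, because both belong to the same category.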
Example

passwd # command used to set or change a user’s password in Linux
```
passwd
```
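Password authentication usually compares a hash of the submitted password against a stored hash rather than storing the password itself. The sketch below illustrates that idea with PBKDF2 from the Python standard library. It is not the scheme that `passwd` itself uses; the iteration count and the function names `hash_password` and `verify_password` are assumptions chosen for the example.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # assumed work factor for the example; tune for real use

def hash_password(password, salt=None):
    """Derive a hash from the password; generate a random salt if none is given."""
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest

def verify_password(password, salt, expected):
    """Re-derive the hash with the stored salt and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected)

salt, stored = hash_password("correct horse")
print(verify_password("correct horse", salt, stored))  # True
print(verify_password("wrong guess", salt, stored))    # False
```

Only the salt and the derived hash need to be stored, so a leaked credential store does not directly reveal the passwords.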
Authorization

Authorization is the process of determining what actions, resources, or services a person, system, or object is permitted to access after their identity has been successfully verified through authentication. It defines the level of access granted, such as whether a user can read, modify, delete, or execute specific files, applications, or functions. Authorization is typically enforced through policies, access control lists, or role-based access control (RBAC), ensuring that users only perform actions aligned with their roles and responsibilities. This step is critical for enforcing the principle of least privilege, reducing security risks, and protecting sensitive information from misuse or unauthorized access.
- Access Control
  - A security technique to protect a system against unauthorized access
Example

echo # command that prints text to the terminal or output stream
"QeeqBox" # the text string being written
> # output redirection operator
file # the file where the output will be written (created or overwritten)
```
echo "QeeqBox" > file
```
sudo # run the command with administrative (root) privileges
groupadd # command used to create a new group in Linux
sales # name of the group being created
```
sudo groupadd sales
```
chown # command to change ownership of a file
john # user who will become the new owner
file.txt # file whose ownership is being changed
```
chown john file.txt
```
chgrp # command to change group ownership of a file
sales # group that will become the new group owner
file.txt # file whose group is being changed
```
chgrp sales file.txt
```
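The read, write, and execute rights mentioned above can also be inspected programmatically. The following is a small sketch using Python's `os` and `stat` modules on a temporary file; it assumes a Linux or other POSIX system, and the helper name `mode_after_chmod` is hypothetical.

```python
import os
import stat
import tempfile

def mode_after_chmod(mode_bits):
    """Create a temporary file, set its permission bits, and return them as text."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "file.txt")
        with open(path, "w") as handle:
            handle.write("QeeqBox\n")
        os.chmod(path, mode_bits)  # set owner/group/other permission bits
        return stat.filemode(os.stat(path).st_mode)

# Owner can read and write, group can read, others have no access.
print(mode_after_chmod(0o640))  # -rw-r----- (on a POSIX system)
```

The returned string uses the same layout as the first column of `ls -l`.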
Accountability (Auditing)

Accountability (Auditing) is the ability to trace actions, events, or system changes back to a specific individual, system, or object, ensuring that every activity within an environment can be attributed to its source. It involves maintaining detailed logs, audit trails, and monitoring mechanisms that record who did what, when, and how. Accountability helps detect misuse, supports investigations, enforces compliance with policies and regulations, and promotes responsible behavior by making users aware that their actions are being tracked. By providing transparency and traceability, auditing strengthens overall security and trust within an organization’s systems.
- Audit logs
Example

who # shows currently logged-in users on the system
```
who
```
last # shows history of user logins and system reboots
```
last
```
sudo # runs the command with root (admin) privileges
cat # command used to display file contents
/var/log/auth.log # system authentication log file (login, sudo, SSH events) on Debian-based systems; RHEL-based systems use /var/log/secure
```
sudo cat /var/log/auth.log
```
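An audit trail records who did what, when, and with what result. The sketch below shows one possible shape for such records using an in-memory list; a real system would write to protected, append-only storage. The names `audit_log`, `record_event`, and `events_for` are hypothetical.

```python
from datetime import datetime, timezone

audit_log = []  # in-memory stand-in for an append-only audit log

def record_event(user, action, resource, success):
    """Append one audit record describing who did what, when, and the outcome."""
    entry = {
        "time": datetime.now(timezone.utc).isoformat(),
        "user": user,
        "action": action,
        "resource": resource,
        "success": success,
    }
    audit_log.append(entry)
    return entry

def events_for(user):
    """Return every recorded event attributed to the given user."""
    return [entry for entry in audit_log if entry["user"] == user]

record_event("john", "read", "apache.log", True)
record_event("jane", "read", "apache.log", False)
record_event("john", "write", "report.txt", True)

for entry in events_for("john"):
    print(entry["time"], entry["action"], entry["resource"], entry["success"])
```

Because each record carries a user name and a timestamp, an investigator can filter the trail by user to reconstruct a sequence of actions.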
Identity Management (IdM)

Identity Management (IdM) is the process of managing and controlling digital identities within an organization or system. It involves creating, maintaining, and governing user accounts, as well as assigning appropriate access rights and permissions based on roles or responsibilities. IdM ensures that only authorized individuals can access specific resources, applications, or data, while also maintaining compliance and security. This process includes activities such as authentication, authorization, password management, and account lifecycle management, helping to protect sensitive information and streamline user access.

Example

sudo # run command with admin (root) privileges
useradd # command to create a new user account
john # username being created
```
sudo useradd john
```
sudo # run command with admin (root) privileges
passwd # command to set/change a user password
john # username whose password is being set
```
sudo passwd john
```
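Account lifecycle management covers creating, disabling, and removing identities. The following sketch models that lifecycle with an in-memory dictionary; it is an illustration only, and the `accounts` store and its helper functions are hypothetical names, not part of any real IdM product.

```python
accounts = {}  # hypothetical in-memory identity store

def create_account(username, roles):
    """Provision a new, active account with the given roles."""
    if username in accounts:
        raise ValueError(f"account {username!r} already exists")
    accounts[username] = {"roles": set(roles), "active": True}

def disable_account(username):
    """Suspend an account without deleting its record."""
    accounts[username]["active"] = False

def delete_account(username):
    """Remove the account entirely (deprovisioning)."""
    accounts.pop(username, None)

def is_active(username):
    """Return True only for accounts that exist and are not disabled."""
    account = accounts.get(username)
    return bool(account and account["active"])

create_account("john", ["sales"])
print(is_active("john"))  # True
disable_account("john")
print(is_active("john"))  # False
delete_account("john")
print(is_active("john"))  # False
```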
Access Management (AM)

Access Management (AM) is the process of ensuring that people, systems, or objects have the appropriate level of access to resources, applications, and data based on their roles and responsibilities. It deals specifically with permissions and privileges, determining what a user or entity can do once authenticated. AM enforces policies such as granting, restricting, or revoking access, often using methods like role-based access control (RBAC), attribute-based access control (ABAC), or least privilege principles. By managing access effectively, organizations can reduce the risk of unauthorized activities, protect sensitive assets, and maintain compliance with security and regulatory requirements.

Example

echo # command that prints text to the terminal or output stream
"QeeqBox" # the text string being written
> # output redirection operator
file # the file where the output will be written (created or overwritten)
```
echo "QeeqBox" > file
```
sudo # run the command with administrative (root) privileges
groupadd # command used to create a new group in Linux
sales # name of the group being created
```
sudo groupadd sales
```
chown # command to change ownership of a file
john # user who will become the new owner
file.txt # file whose ownership is being changed
```
chown john file.txt
```
chgrp # command to change group ownership of a file
sales # group that will become the new group owner
file.txt # file whose group is being changed
```
chgrp sales file.txt
```
Identity and Access Management (IAM)

Identity and Access Management (IAM) is the integrated framework of policies, processes, and technologies used to manage and control digital identities while ensuring that users, systems, or objects have the appropriate level of access to organizational resources. It combines Identity Management (IdM), which focuses on creating and maintaining digital identities, with Access Management (AM), which governs permissions and privileges. IAM solutions handle authentication, authorization, and account lifecycle management, enforcing security principles like least privilege and separation of duties. By implementing IAM, organizations can safeguard sensitive data, streamline user access, improve operational efficiency, and ensure compliance with industry regulations.
May 16, 2026
Access Controls

Access Control is a security technique that regulates who or what can view, use, or interact with resources in a system, thereby protecting it against unauthorized access. It involves defining and enforcing policies that determine permissions based on user roles, attributes, or contexts, ensuring that only authorized entities can perform specific actions. Common models include Discretionary Access Control (DAC), Mandatory Access Control (MAC), and Role-Based Access Control (RBAC), each offering different levels of granularity and security. By restricting access to sensitive data and critical systems, access control minimizes risks, supports compliance, and upholds the confidentiality, integrity, and availability of information.

Attribute-based Access Control (ABAC)

Attribute-Based Access Control (ABAC) is an access control model that grants or denies access to resources based on a combination of attributes associated with users, resources, actions, and the environment. Attributes can include factors such as a user’s role, department, security clearance, location, time of access, or even device type. ABAC policies use logical rules to evaluate these attributes and determine whether access should be permitted, making it highly flexible and context-aware. Unlike role-based models, ABAC allows more fine-grained control, enabling organizations to enforce dynamic and situation-specific access decisions that strengthen security and compliance.
- User attributes
- Object attributes
- Environment conditions
Example

```
user_attrs = {  # attributes for each user
    "john": {"role": "admin", "department": "IT"},   # john is an admin in the IT department
    "jane": {"role": "dev", "department": "Sales"}   # jane is a developer in the Sales department
}

resource_attrs = {  # attributes for each resource
    "apache.log": {"department": "IT", "sensitivity": "high"},    # IT log file with high sensitivity
    "report.txt": {"department": "Sales", "sensitivity": "low"}   # Sales report with low sensitivity
}

def check_access(user, resource):
    user_info = user_attrs.get(user)
    resource_info = resource_attrs.get(resource)
    if not user_info or not resource_info:  # deny if the user or resource is unknown
        return False
    if user_info["department"] == resource_info["department"]:  # allow on department match
        return True
    if user_info["role"] == "admin":  # allow admins regardless of department
        return True
    return False  # deny if no rule matched

print(check_access("john", "apache.log"))  # True (john's department matches)
print(check_access("jane", "apache.log"))  # False (different department, not an admin)
```
Output
```
True
False
```
Discretionary Access Control (DAC)

Discretionary Access Control (DAC) or Owner-based Access Control is an access control model in which the owner of a resource has the authority to decide who can access it and what level of access is granted. This model relies on Access Control Lists (ACLs), which are used to specify the permissions for individual users or groups, such as read, write, or execute rights. DAC provides flexibility by allowing owners to manage access to their resources directly, but it can also introduce security risks if permissions are misconfigured or broadly shared. It is widely used in operating systems and file systems, where individual users control access to their own files while maintaining a balance between usability and security.
- The data owner of an organization determines the level of access
Example

```
files = {  # file ownership and permissions
    "report.txt": {
        "owner": "alice",  # the owner decides who gets access under DAC
        "permissions": {
            "alice": "rw",   # owner: read and write
            "sales": "r",    # the sales entry: read only
            "others": "r"    # default for everyone else
        }
    }
}

def can_access(user, file, action):
    entry = files.get(file)
    if not entry:  # deny if the file is unknown
        return False
    permissions = entry["permissions"]
    if user in permissions and action in permissions[user]:  # explicit entry allows the action
        return True
    return action in permissions["others"]  # otherwise fall back to "others"

print(can_access("alice", "report.txt", "r"))  # True (alice has rw)
print(can_access("alice", "report.txt", "x"))  # False ("x" is not in rw or in others)
print(can_access("sales", "report.txt", "r"))  # True (sales has r)
print(can_access("test", "report.txt", "r"))   # True (unknown user falls back to others = r)
```
Output
```
True
False
True
True
```
Graph-based Access Control (GBAC)

Graph-Based Access Control (GBAC) is an access control model that determines permissions based on the relationships between data, users, and resources within a system. In GBAC, entities and their interactions are represented as nodes and edges in a graph, allowing access decisions to be made by analyzing these connections. For example, a user’s access to a document might depend on their relationship to the document’s owner or their position within an organizational network. This model provides fine-grained, context-aware control, enabling organizations to enforce dynamic policies that reflect complex real-world relationships, making it particularly useful in social networks, collaborative environments, and interconnected data systems.
- Using an organizational query language (john -> admin_role -> IT_access -> apache.log)
Example

```
graph = {  # relationships between users, roles, access groups, and resources
    "john": ["admin_role"],   # john is assigned the admin role
    "jane": ["sales_role"],   # jane is assigned the sales role
    "admin_role": ["IT_access", "Security_access"],   # admin role grants IT and security access
    "sales_role": ["Salesforce_access"],              # sales role grants Salesforce access
    "IT_access": ["apache.log", "system_logs"],       # IT access reaches the apache and system logs
    "Security_access": ["security.log"],              # security access reaches the security log
    "Salesforce_access": ["report.txt", "app_code"]   # Salesforce access reaches reports and code
}

def check_access(user, resource):
    def depth_first_search(node, target, visited=None):
        if visited is None:
            visited = set()  # track visited nodes to avoid loops
        if node == target:  # the target resource was reached
            return True
        visited.add(node)
        for neighbor in graph.get(node, []):  # explore connected nodes
            if neighbor not in visited:
                if depth_first_search(neighbor, target, visited):
                    return True  # a path to the resource exists
        return False  # no path found from this node
    return depth_first_search(user, resource)  # start the search at the user node

print(check_access("john", "apache.log"))    # True (john -> admin_role -> IT_access -> apache.log)
print(check_access("jane", "app_code"))      # True (jane -> sales_role -> Salesforce_access -> app_code)
print(check_access("jane", "security.log"))  # False (no path from jane to security.log)
```
Output
```
True
True
False
```
History-Based Access Control (HBAC)

History-Based Access Control (HBAC) is an access control model that determines whether a user, system, or object can access a resource based on the evaluation of their past activities or behavior. Unlike static models, HBAC dynamically considers a history of actions (such as previous logins, transaction patterns, or resource usage) to make real-time access decisions. For example, a user who has performed unusual or risky actions in the past may be temporarily restricted from sensitive operations. By leveraging behavioral history, HBAC enhances security, helps prevent misuse, and enables more context-aware, adaptive access control that responds to potential threats or anomalies as they occur.
- A user is denied access to sensitive information because of past behavior
Example

```
user_history = {  # past actions performed by each user
    "john": ["success_login", "read_log", "write_log"],
    "jane": ["success_login", "failed_login", "read_report"]
}

restricted_actions = ["failed_login", "security_violation"]  # events that indicate risky behavior

def check_access(user, action):  # note: action is not used by this simplified check
    history = user_history.get(user, [])  # empty history if the user is unknown
    for event in history:
        if event in restricted_actions:  # deny if any past event is considered risky
            return False
    return True  # allow if no restricted events were found

print(check_access("john", "access_log"))  # True (no restricted events)
print(check_access("jane", "access_log"))  # False (history contains failed_login)
```
Output
```
True
False
```
Identity-Based Access Control (IBAC)

Identity-Based Access Control (IBAC) is an access control model in which access to resources is granted or denied based on the specific identity of an individual user rather than a group or role. Each user is assigned explicit permissions that define what actions they can perform on particular resources, ensuring precise, user-specific control. This model provides a high level of granularity and accountability, as every action can be traced directly to a single identity. IBAC is particularly useful in environments where individualized access policies are required, such as sensitive data systems, administrative applications, or scenarios demanding strict auditing and compliance.
- A specific user has access to sensitive information
Example

```
access_list = {  # maps each user identity to the resources it may access
    "john": ["apache.log", "system_logs"],  # john can access apache.log and system_logs
    "jane": ["report.txt"]                  # jane can only access report.txt
}

def check_access(user, resource):
    allowed_resources = access_list.get(user, [])  # resources allowed for this user
    if resource in allowed_resources:  # allow if the resource is on the user's list
        return True
    return False  # deny otherwise

print(check_access("john", "apache.log"))  # True (john has access)
print(check_access("john", "report.txt"))  # False (john is not allowed)
print(check_access("jane", "report.txt"))  # True (jane has access)
```
Output
```
True
False
True
```
Mandatory Access Control (MAC)

Mandatory Access Control (MAC) is an access control model in which access to resources is strictly regulated by policies set by a central authority rather than by individual users. In MAC, every user and resource is assigned a classification or security label, and access decisions are made based on these labels according to predefined rules. Users cannot change permissions on the resources they own, ensuring that access is enforced consistently and in compliance with organizational or regulatory requirements. This model is commonly used in highly secure environments, such as government, military, and critical infrastructure systems, where strict control and data confidentiality are paramount.
- A user must hold sufficient clearance and demonstrate a need for the information before access is granted
Example

```
security_labels = {  # clearance level assigned to each user
    "john": "secret",
    "jane": "confidential"
}

resource_labels = {  # classification level assigned to each resource
    "apache.log": "secret",
    "report.txt": "confidential"
}

levels = {  # numeric rank of each level, used for comparison
    "public": 1,        # lowest level
    "confidential": 2,  # medium level
    "secret": 3         # highest level
}

def check_access(user, resource):
    user_level = levels.get(security_labels.get(user, "public"))  # numeric clearance of the user
    resource_level = levels.get(resource_labels.get(resource, "public"))  # numeric classification of the resource
    if user_level >= resource_level:  # the user needs equal or higher clearance
        return True
    return False  # deny if clearance is too low

print(check_access("john", "apache.log"))  # True (secret >= secret)
print(check_access("jane", "apache.log"))  # False (confidential < secret)
print(check_access("jane", "report.txt"))  # True (confidential >= confidential)
```
Output
```
True
False
True
```
Role-Based Access Control (RBAC)

Role-Based Access Control (RBAC) is an access control model in which access to resources and permissions is granted based on a user’s role within an organization rather than on an individual basis. Each role is assigned specific access rights that correspond to the responsibilities and tasks associated with that role, and users are then assigned to one or more roles. This approach simplifies administration by allowing permissions to be managed collectively for roles instead of individually for each user, reduces the risk of excessive privileges, and supports the principle of least privilege. RBAC is widely used in enterprise systems to enforce consistent security policies, streamline user management, and maintain compliance with regulatory standards.
- Job title
Example

```
user_roles = {  # maps each user to a role
    "john": "admin",
    "jane": "developer"
}

role_permissions = {  # maps each role to the resources it may access
    "admin": ["apache.log", "report.txt"],  # admin can access both resources
    "developer": ["report.txt"]             # developer can only access report.txt
}

def check_access(user, resource):
    role = user_roles.get(user)  # role assigned to the user
    if not role:  # deny if the user has no role
        return False
    allowed_resources = role_permissions.get(role, [])  # resources allowed for that role
    if resource in allowed_resources:
        return True
    return False  # deny if the role does not permit the resource

print(check_access("john", "apache.log"))  # True (admin has access)
print(check_access("jane", "apache.log"))  # False (developer does not have access)
print(check_access("jane", "report.txt"))  # True (developer has access)
```
Output
```
True
False
True
```
Rule-Based Access Control (RAC)

Rule-Based Access Control (RAC) is an access control model in which access to resources is determined by a predefined set of rules or conditions established by the system or administrator. These rules evaluate factors such as time of access, location, device type, or specific actions being attempted to decide whether access should be granted or denied. Unlike role-based or identity-based models, RAC enforces dynamic policies that can automatically adapt to context or system states, making it suitable for environments that require flexible, policy-driven control. By using clearly defined rules, RAC helps maintain security, ensure compliance, and reduce the risk of unauthorized access while allowing automated, consistent decision-making.
- Allowing access to a resource based on IP address, time, and role
Example

```
def check_access(user, resource, hour, ip):
    if hour < 8 or hour > 17:  # rule: allow access only during hours 8 through 17
        return False
    if not ip.startswith("10.1."):  # rule: allow only the trusted internal IP range
        return False
    if resource == "apache.log" and user != "admin":  # rule: only admin may read the sensitive log
        return False
    return True  # allow if every rule passes

print(check_access("admin", "apache.log", 9, "10.1.1.10"))  # True (admin, valid time, trusted IP)
print(check_access("jane", "apache.log", 9, "10.1.1.10"))   # False (not admin)
print(check_access("admin", "report.txt", 9, "10.2.0.5"))   # False (untrusted IP)
```
Output
```
True
False
False
```
Responsibility-Based Access Control (ReBAC)

Responsibility-Based Access Control (ReBAC) is an access control model in which access to resources is granted based on the specific responsibilities assigned to a user or group of users. Unlike role-based models that focus on general job roles, ReBAC ties permissions directly to the duties or tasks a user is expected to perform, ensuring that access is closely aligned with operational responsibilities. This approach enforces the principle of least privilege, as users receive only the access necessary to fulfill their assigned duties. ReBAC is particularly useful in complex or task-oriented environments, where security and accountability must be balanced with operational efficiency.
- Data engineer has access to a backup management interface
Example

```
responsibilities = {  # maps users to the tasks they are responsible for
    "john": ["incident_6998", "server_backup"],  # incident handling and backups
    "jane": ["ticket_3467", "report_review"]     # ticket handling and report review
}

task_resources = {  # maps each task to the resource it needs
    "incident_6998": "apache.log",
    "ticket_3467": "report.txt",
    "server_backup": "backup.log",
    "report_review": "report.txt"
}

def check_access(user, resource):
    user_tasks = responsibilities.get(user, [])  # tasks assigned to the user
    for task in user_tasks:
        if task_resources.get(task) == resource:  # the task maps to the requested resource
            return True
    return False  # deny if no responsibility matches the resource

print(check_access("john", "apache.log"))  # True (john is responsible for incident_6998)
print(check_access("jane", "apache.log"))  # False (jane has no matching responsibility)
print(check_access("jane", "report.txt"))  # True (jane is responsible for ticket_3467 and report_review)
```
Output
```
True
False
True
```
May 16, 2026
CIA Triad

The CIA Triad is a fundamental cybersecurity model that encompasses three key principles for protecting information systems and data: Confidentiality, Integrity, and Availability. This model provides a straightforward framework for designing security controls and assessing risks, ensuring that information remains protected, accurate, and accessible when needed. These principles guide organizations in securing their systems, responding to threats, and managing data protection across various environments.

Confidentiality

Confidentiality is the principle of protecting information from unauthorized access, use, or disclosure. It ensures that data is accessible only to individuals who have been explicitly granted permission to view or handle it. This protection applies whether the data is stored, processed, or transmitted.

To maintain confidentiality, organizations store data in secure locations and use access controls such as passwords, file permissions, encryption, and authentication systems. They implement policies and technical safeguards to prevent sensitive information from being accessed by unauthorized users, accidentally exposed, or intentionally leaked.

Overall, confidentiality ensures that private or sensitive data remains “hidden” from unauthorized users, keeping it safe in a controlled, secure environment.

Example

Access is restricted so that only the file owner can read and write the file.

chmod # change file permissions command
600 # owner read/write only, no permissions for group or others
file.txt # target file to apply permissions
```
chmod 600 file.txt
```
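The same restriction can also be applied from a script. The following is a minimal sketch, assuming a POSIX system; the file name `secret.txt` and its contents are placeholders and not part of the example above.

```python
import os
import stat
import tempfile

# Work in a throwaway directory; "secret.txt" is a hypothetical file name
workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "secret.txt")
with open(path, "w") as handle:
    handle.write("sensitive data\n")

# Equivalent of `chmod 600`: the owner may read and write, group and others get nothing
os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)

# Read the permission bits back to confirm the change
mode = stat.S_IMODE(os.stat(path).st_mode)
print(oct(mode))
```

Reading the mode back after setting it is a cheap way to confirm that the permissions actually in effect match the intended policy.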
Integrity

Integrity is the principle that ensures data remains accurate, complete, and trustworthy throughout its lifecycle. It means that information should not be altered, deleted, or manipulated by unauthorized users, whether intentionally or accidentally. When data integrity is maintained, users can trust that the information is correct and consistent.

To protect integrity, systems employ controls such as permissions, checksums, hashing, audit logs, and version control. These mechanisms help detect or prevent unauthorized changes, ensuring that any data modifications are tracked and approved.

Overall, integrity guarantees that data remains reliable and unchanged unless properly authorized, making it trustworthy for decision-making and operations.

Example

A hash value is generated to verify that the file contents have not changed.

sha256sum # command to compute SHA-256 hash of a file
message.txt # target file whose integrity is being checked
```
sha256sum message.txt
```
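A hash only helps if it is compared against a known-good value. The sketch below, using Python's standard `hashlib` module, shows that comparison on in-memory data; the message contents are made-up placeholders.

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()

original = b"transfer 100 to account 42"   # placeholder content
baseline = sha256_hex(original)            # recorded while the data is known to be good

tampered = b"transfer 900 to account 42"   # a single character was changed

print(sha256_hex(original) == baseline)    # unchanged data matches the baseline
print(sha256_hex(tampered) == baseline)    # modified data does not
```

The baseline digest has to be stored somewhere an attacker cannot modify, otherwise both the data and its hash can be replaced together.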
Availability

Availability is the principle that ensures data and systems are accessible to authorized users whenever necessary. This means that information, applications, and services should be reliably accessible without unnecessary delays or downtime.

To maintain availability, organizations implement measures such as backups, redundancy, failover systems, load balancing, and disaster recovery plans. These controls ensure that even if hardware fails, networks go down, or unexpected incidents occur, users can still access the data they are authorized to use.

Overall, availability ensures that information is readily accessible to the right people at the right time, supporting continuous, reliable operations.

Example

The status of a web service is checked to ensure it is running and available for users.

systemctl # systemd command to control and manage services
status # check the current state of a service
apache2 # the Apache web server service being checked
```
systemctl status apache2
```
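`systemctl status` reports on the local service manager. A complementary check is whether the service actually accepts connections. The sketch below is one possible way to do that with the standard `socket` module; the helper name `is_port_open` is hypothetical, and a local listening socket stands in for a real web server.

```python
import socket

def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Stand-in service: a listening socket on a free local port chosen by the OS
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
server.listen(1)
port = server.getsockname()[1]

up = is_port_open("127.0.0.1", port)     # the stand-in service is listening
server.close()
down = is_port_open("127.0.0.1", port)   # the stand-in service has been stopped
print(up, down)
```

A successful TCP connection only shows that something is listening; a fuller health check would also send a request and inspect the response.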
Authenticity

Authenticity is the principle that guarantees data, users, or systems are genuine and verifiable as coming from a trusted and legitimate source. It ensures that information has not been falsified or impersonated and that communication or data originates from the claimed sender.

To maintain authenticity, systems employ various methods such as digital signatures, certificates, multi-factor authentication, cryptographic keys, and identity verification protocols. These mechanisms help validate identities and ensure that interactions between users and systems are trustworthy.

Overall, authenticity enables you to trust the source of the data or user you are interacting with, preventing impersonation and the spread of fake or misleading information.

Example

A digital signature is verified to confirm the sender’s identity.

gpg # GNU Privacy Guard tool used for encryption and signature verification
--verify # option to verify a digital signature
signed_message.txt.sig # signature file used to confirm authenticity
message.txt # original file being verified against the signature
```
gpg --verify signed_message.txt.sig message.txt
```
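GPG signatures rely on public-key cryptography, which the Python standard library does not provide. As a simpler stand-in, the sketch below uses an HMAC, a message authentication code based on a shared secret rather than a digital signature. The key and messages are hypothetical.

```python
import hashlib
import hmac

SHARED_KEY = b"example-shared-secret"   # hypothetical key known to both sender and receiver

def make_tag(message: bytes, key: bytes = SHARED_KEY) -> str:
    """Compute an HMAC-SHA256 tag for the message."""
    return hmac.new(key, message, hashlib.sha256).hexdigest()

def is_authentic(message: bytes, tag: str, key: bytes = SHARED_KEY) -> bool:
    """Check the tag using a constant-time comparison."""
    return hmac.compare_digest(make_tag(message, key), tag)

message = b"meeting moved to 10am"
tag = make_tag(message)

print(is_authentic(message, tag))                                  # genuine message and tag
print(is_authentic(b"meeting moved to 11am", tag))                 # altered message
print(is_authentic(message, make_tag(message, b"attacker-key")))   # tag made with the wrong key
```

Because both parties hold the same key, an HMAC shows that a message came from someone who knows the key, but it cannot tell the two parties apart. That distinction is what digital signatures add.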
Non-repudiation

Non-repudiation is a security principle that guarantees an individual or system cannot deny having performed a specific action. It provides proof of both origin and integrity, allowing actions such as sending messages, approving transactions, or modifying data to be reliably attributed to a specific actor.

To achieve non-repudiation, systems employ mechanisms like digital signatures, cryptographic keys, audit logs, timestamps, and secure authentication records. These tools generate verifiable evidence that a particular user or system executed a specific action and that the action has not been altered afterward.

Overall, non-repudiation ensures accountability by making it impossible for someone to credibly deny their actions within a system.

Example

A digital signature is created to prove who signed the file.

gpg # GNU Privacy Guard tool for encryption and signing
--sign # create a digital signature to prove authorship and ensure integrity
message.txt # file being signed to provide proof of origin and non-repudiation
```
gpg --sign message.txt
```
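Audit logs and timestamps are among the mechanisms listed above. The following is a minimal sketch of a tamper-evident, hash-chained audit log, in which each entry stores the hash of the previous one. All function names, users, and actions are illustrative assumptions. A hash chain on its own only reveals that recorded entries were altered; attributing an entry to an actor still requires signatures or authenticated sessions.

```python
import hashlib
import json
import time

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry

def _digest(entry):
    """Hash the fields of an entry, excluding its own hash."""
    body = {key: entry[key] for key in ("actor", "action", "time", "prev")}
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()

def append_entry(log, actor, action):
    """Append an entry that is linked to the previous entry by its hash."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    entry = {"actor": actor, "action": action, "time": time.time(), "prev": prev_hash}
    entry["hash"] = _digest(entry)
    log.append(entry)

def verify_chain(log):
    """Return True only if every entry is unmodified and correctly linked."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev"] != prev_hash or entry["hash"] != _digest(entry):
            return False
        prev_hash = entry["hash"]
    return True

audit_log = []
append_entry(audit_log, "john", "approved transaction 1001")
append_entry(audit_log, "jane", "modified report.txt")
intact = verify_chain(audit_log)

audit_log[0]["action"] = "approved transaction 9999"  # someone rewrites history
after_edit = verify_chain(audit_log)
print(intact, after_edit)
```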
May 16, 2026
Data Classification

Data classification involves defining and categorizing information based on its type, sensitivity, and value to the organization. This process allows for more effective management, protection, and utilization of data. By identifying data as confidential, sensitive, internal, or public, organizations can implement appropriate security controls, access restrictions, and handling procedures to safeguard confidentiality, integrity, and availability.

Furthermore, classification enhances operational efficiency by making it easier for authorized users to locate and access information while ensuring compliance with regulatory requirements and internal policies. Organizations typically create their own classification models and categories to align with their business objectives, regulatory obligations, and risk tolerance. This enables them to prioritize resources and protect their most critical or valuable information.

Content-Based Classification

This approach examines the actual content of files (payload) to determine sensitivity, often identifying patterns such as credit card numbers or PII (personally identifiable information).
- Techniques: Uses automated scanning, pattern matching, algorithms, or machine learning to scan text.
- Pros: Highly accurate because it analyzes data content directly.
- Cons: Can be resource-intensive.
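Pattern matching of this kind can be sketched with regular expressions. The example below is a simplified illustration, not a production scanner: the patterns, the `classify` helper, and the "Unlabeled" fallback are assumptions, and the card check uses the Luhn checksum to reduce false positives on arbitrary digit runs.

```python
import re

# Simplified patterns; real scanners use broader and stricter rules
CARD_PATTERN = re.compile(r"\b(?:\d[ -]?){13,16}\b")
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def luhn_valid(number: str) -> bool:
    """Validate a card number candidate with the Luhn checksum."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    total = 0
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:      # double every second digit from the right
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0

def classify(text: str) -> str:
    """Label text as Restricted if it appears to contain an SSN or a card number."""
    if SSN_PATTERN.search(text):
        return "Restricted"
    for match in CARD_PATTERN.finditer(text):
        if luhn_valid(match.group()):
            return "Restricted"
    return "Unlabeled"

print(classify("Card: 4111 1111 1111 1111"))   # well-known test card number
print(classify("SSN 123-45-6789"))
print(classify("Meeting tomorrow at 10am"))
```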
Context-Based Classification

This approach analyzes the surrounding circumstances (the metadata) rather than the data itself to infer sensitivity.
- Techniques: Evaluates application (e.g., Salesforce, Jira), location (e.g., specific file paths), creator, or time of creation.
- Pros: Fast and efficient, often used in DLP (Data Loss Prevention) tools.
- Cons: May miss sensitive data if the context is deceptive.
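A context-based rule set can be sketched as a lookup on metadata alone. The paths, application names, and labels below are hypothetical and are not taken from any real DLP product.

```python
# Hypothetical rules mapping context (metadata) to classification labels
PATH_RULES = {
    "/finance/payroll/": "Restricted",
    "/hr/contracts/": "Confidential",
    "/public/": "Public",
}
APP_RULES = {"Salesforce": "Confidential", "Jira": "Sensitive"}

def classify_by_context(path, application=None):
    """Infer a label from where a file lives and which application produced it."""
    for prefix, label in PATH_RULES.items():
        if path.startswith(prefix):
            return label
    return APP_RULES.get(application, "Unlabeled")

print(classify_by_context("/finance/payroll/may.csv"))
print(classify_by_context("/tmp/export.csv", application="Salesforce"))
print(classify_by_context("/public/payroll_copy.csv"))
```

The last call illustrates the weakness noted in the list: a payroll file copied into a public folder is labeled "Public" because the file content is never inspected.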
User-Based Classification

This approach relies on human judgment, where creators or users manually select a classification label for a file at creation or modification.
- Techniques: Manual tagging prompts that ask the user to classify the data (e.g., Public, Confidential).
- Pros: Highly accurate for understanding business value, as the creator knows the data’s true purpose.
- Cons: Subjective, inconsistent, and prone to user error or negligence.
Military Classification Scheme
- Top Secret
  - Data requires the highest degree of protection, and disclosure of it would cause exceptionally grave damage to national security
  - Policy for conducting intelligence
- Secret
  - Disclosure of it would cause serious damage to national security
  - Indications of weakness
- Confidential
  - Disclosure of it would cause damage to national security
  - Intelligence reports
- Sensitive
  - Data is not classified, and disclosure of it would cause limited damage to national security
  - For Official Use Only (FOUO)
  - Limited Official Use (LOU)
  - Official Use Only (OUO)
- Unclassified
  - Data is not classified and non-sensitive
Commercial Classification Scheme
- Restricted
  - Highly sensitive data; access is restricted to specific individuals or authorized third parties (disclosure would lead to permanent damage)
  - Examples:
    
    SSN
    
    Credit cards
    
    Criminal Record
    
    Medical info
    
    Biometric data
- Confidential
  - Sensitive data that is shared team-wide; disclosure would harm the organization's operations
  - Examples:
    
    Vendor contracts
    
    Employee salaries
    
    Names, addresses, and dates
- Sensitive
  - Low-sensitivity data that is shared organization-wide but must not be disclosed outside the organization
  - Examples:
    
    Internal policies
    
    Internal user guides
    
    Organizational charts
    
    Project documents
- Public
  - Information that can be disclosed to anyone
  - Examples:
    
    Public API documents
    
    Job titles and names
    
    Open API Data
May 16, 2026
Data States

Data states refer to the different conditions in which data exists, encompassing both structured and unstructured information. They are typically divided into three categories: at rest, in use, and in transit.

Data at Rest

Data stored on physical or digital media that is not actively being processed or transmitted.
- Examples: Databases, File servers, Cloud storage, Backups, Endpoint devices
- Security Controls:
  - Encryption: Full disk, file-level, and database encryption to protect confidentiality.
  - Access Controls: Role-Based Access Control (RBAC) and the principle of least privilege.
  - Data Loss Prevention (DLP): Identifies and protects sensitive stored data.
  - Integrity Controls: Hashing and checksums to detect unauthorized modifications.
  - Availability Controls: Backups, redundancy, and disaster recovery plans.
  - Cloud Access Security Broker (CASB): Enforces policies for cloud-stored data.
  - Mobile Device Management (MDM): Secures data on mobile endpoints (e.g., remote wipe, enforced encryption).
Example

echo # prints text to standard output
"Qeeqbox" # the exact string being printed
> # redirects output into a file (overwrites file if it exists)
file.txt # destination file
```
echo "Qeeqbox" > file.txt
```
ls # list directory contents
-l # use long listing format (permissions, owner, size, date)
file.txt # the specific file to display info about
```
ls -l file.txt
```
Data in Use

Data actively accessed, processed, or modified by users or applications, typically in memory (RAM).
- Examples: Editing documents, Running applications, Processing transactions
- Security Controls:
  - Access Controls & Authentication: Ensures only authorized users or processes can access data.
  - Privileged Access Management (PAM): Monitors and restricts administrative access.
  - Rights Management (Digital Rights Management/Information Rights Management): Controls usage (e.g., restricts copy, print, and forwarding).
  - Endpoint Security: Endpoint Detection and Response (EDR) and antivirus solutions to detect malicious activity during use.
  - Data Masking/Tokenization: Protects sensitive data during processing.
  - Session Controls: Implement timeouts, re-authentication, and continuous monitoring.
  - DLP (Endpoint): Prevents unauthorized actions, such as copying to USB devices.
Note: Traditional encryption does not fully protect data in use since it must be decrypted in memory. Advanced methods like confidential computing exist but are not yet standard.

Example

nano # open the nano text editor
file.txt # target file to open or create
```
nano file.txt
```
ps aux # list all running processes with details
| # pipe sends output of left command to right command
grep nano # filter results to only lines containing “nano”
```
ps aux | grep nano
```
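Data masking, one of the controls listed for data in use, can be sketched as follows. The helper name `mask_card_number` and the sample number are illustrative assumptions.

```python
def mask_card_number(card_number: str, visible: int = 4) -> str:
    """Replace all but the last `visible` digits with '*', keeping separators in place."""
    total_digits = sum(ch.isdigit() for ch in card_number)
    hidden = total_digits - visible   # how many leading digits to hide
    masked = []
    seen = 0
    for ch in card_number:
        if ch.isdigit():
            masked.append("*" if seen < hidden else ch)
            seen += 1
        else:
            masked.append(ch)         # keep spaces and dashes for readability
    return "".join(masked)

print(mask_card_number("4111 1111 1111 1111"))
```

An application can then display or log the masked value, so the full number sits in memory only where processing genuinely requires it.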
Data in Transit

Data that is transmitted between systems, networks, or users.
- Examples: Emails, Web traffic, File transfers, API communications
- Security Controls:
  - Encryption in Transit: Utilize TLS/SSL (HTTPS), secure email encryption, and VPNs.
  - Secure Protocols: Use SFTP and SSH instead of insecure protocols like FTP and Telnet.
  - DLP (Network): Monitors and blocks unauthorized data exfiltration.
  - CASB: Controls data movement to and from cloud services.
  - Integrity Controls: Use digital signatures to verify authenticity and prevent tampering.
  - Network Security Monitoring: Intrusion Detection Systems (IDS) and Intrusion Prevention Systems (IPS) to detect attacks and anomalies.
  - Rights Management (DRM/IRM): Maintains usage restrictions after sharing.
Example

curl # Run the curl command-line download tool
https://qeeqbox.com/dummy.txt # URL of the file to download
-o file.txt # Save the downloaded content as “file.txt”
```
curl https://qeeqbox.com/dummy.txt -o file.txt
```
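When a script rather than `curl` makes the request, the encryption-in-transit settings can be made explicit. The sketch below only builds and inspects a TLS client configuration with Python's standard `ssl` module and makes no network request. Requiring TLS 1.2 as the minimum version is an assumption chosen for illustration.

```python
import ssl

# Default client context: verifies the server certificate against the system trust store
context = ssl.create_default_context()

# Assumption for this sketch: refuse protocol versions older than TLS 1.2
context.minimum_version = ssl.TLSVersion.TLSv1_2

print(context.verify_mode == ssl.CERT_REQUIRED)   # certificate verification is enforced
print(context.check_hostname)                     # hostname must match the certificate
```

A context like this can be passed to `urllib.request.urlopen(url, context=context)` so that the download fails if the certificate cannot be verified.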
May 16, 2026
Data Visualization

Data visualization is the process of translating data into a visual context, that is, a graphical representation of the data. It matters because it lets businesses see relationships and patterns in their data. Visualization also makes large datasets more coherent, accessible, and understandable.

Line Chart

A line chart is a graphical representation used to track changes in data over time. It displays data points connected by straight lines, making it easy to visualize trends, patterns, and fluctuations. Line charts are commonly used for time-series data, such as stock prices, temperature changes, website traffic, or sales performance.

Example

from datetime import datetime, timedelta # Import tools to work with dates and time differences
from random import randint # Import function to generate random numbers
import matplotlib.pyplot as plt # Import Matplotlib for plotting graphs
x = [datetime.now() + timedelta(hours=i) for i in range(24)] # Create 24 timestamps (one per hour starting now)
y = [randint(0, i) for i, _ in enumerate(x)] # Generate random values based on index position
plt.plot(x, y) # Plot the x (time) and y (random values) data
plt.show() # Display the graph
```
from datetime import datetime, timedelta
from random import randint
import matplotlib.pyplot as plt
x = [datetime.now() + timedelta(hours=i) for i in range(24)]
y = [randint(0,i) for i,_ in enumerate(x)]
plt.plot(x,y)
plt.show()
```
Output

You can also plot multiple lines like this

Scatter Chart

A scatter plot is a graphical representation in which each value in a dataset is plotted as a dot. It is used to visualize the relationship or correlation between two variables. The position of each dot along the x-axis and y-axis corresponds to the values of the two variables. Scatter plots are useful for identifying patterns, trends, clusters, and outliers in data.

Example

import numpy as np # Import NumPy for generating random data
import matplotlib.pyplot as plt # Import Matplotlib for plotting
x_1 = np.random.randint(low=20, high=50, size=20) # Generate 20 random x-values for Day 1
y_1 = np.random.randint(low=25, high=120, size=20) # Generate 20 random y-values for Day 1
x_2 = np.random.randint(low=20, high=50, size=20) # Generate 20 random x-values for Day 2
y_2 = np.random.randint(low=25, high=70, size=20) # Generate 20 random y-values for Day 2
plt.scatter(x_1, y_1) # Create scatter plot for Day 1 data
plt.scatter(x_2, y_2) # Create scatter plot for Day 2 data
plt.legend(labels=['Day 1', 'Day 2'], loc='upper right') # Add legend to distinguish datasets
plt.show() # Display the scatter plot
```
import numpy as np
import matplotlib.pyplot as plt
x_1 = np.random.randint(low=20,high=50, size=20)
y_1 = np.random.randint(low=25,high=120, size=20)
x_2 = np.random.randint(low=20,high=50, size=20)
y_2 = np.random.randint(low=25,high=70, size=20)
plt.scatter(x_1,y_1)
plt.scatter(x_2,y_2)
plt.legend(labels=['Day 1', 'Day 2'], loc='upper right')
plt.show()
```
Output

Bar Chart

A bar chart is a graphical representation in which values are depicted as vertical or horizontal bars. The length of each bar corresponds to the magnitude of the value it represents, making it easy to compare different categories or groups. Bar charts are commonly used to display discrete data, such as sales by product, population by region, or survey results.

Example

import matplotlib.ticker as mticker # Import ticker module to control axis ticks
import numpy as np # Import NumPy for handling arrays
import matplotlib.pyplot as plt # Import Matplotlib for plotting
x = np.array(["MON", "TUE", "WED", "THU", "FRI", "SAT", "SUN"]) # Days of the week
y = np.array([20, 10, 5, 5, 8, 1, 1]) # Malware counts per day
plt.bar(x, y) # Create a bar chart
plt.gca().yaxis.set_major_locator(mticker.MultipleLocator(5)) # Set y-axis ticks at intervals of 5
plt.xlabel('Day') # Label x-axis
plt.ylabel('Malware Count') # Label y-axis
plt.show() # Display the bar chart
```
import matplotlib.ticker as mticker
import numpy as np
import matplotlib.pyplot as plt
x = np.array(["MON", "TUE", "WED", "THU", "FRI", "SAT", "SUN"])
y = np.array([20,10, 5, 5, 8, 1, 1])
plt.bar(x,y)
plt.gca().yaxis.set_major_locator(mticker.MultipleLocator(5))
plt.xlabel('Day')
plt.ylabel('Malware Count')
plt.show()
```
Output

Maps

Maps are a type of data visualization used to display geographic data. You can plot points, lines, or areas on a map to show locations, routes, or spatial patterns. Tools like Plotly provide built-in integration with OpenStreetMap, allowing you to create interactive maps without needing an access token. Maps are useful for visualizing data such as population distribution, weather patterns, travel routes, or incidents across different locations.

Example

import plotly.express as px # Import Plotly Express for interactive plotting
from random import uniform # Import uniform to generate random floating-point numbers
temp_list = [] # Initialize empty list to store random coordinates
for i in range(5): # Loop 5 times
    temp_list.append({'lat': round(uniform(-90, 90), 5), 'lon': round(uniform(-180, 180), 5)}) # Append a dictionary with random latitude (-90 to 90) and longitude (-180 to 180)
fig = px.scatter_mapbox(temp_list, lat="lat", lon="lon", zoom=3) # Create an interactive scatter map using the generated coordinates
fig.update_layout(mapbox_style="open-street-map", margin={"r":0,"t":0,"l":0,"b":0}) # Set the map style and remove extra margins
fig.show() # Display the interactive map
```
import plotly.express as px
from random import uniform

temp_list = []

for i in range(5):
    temp_list.append({'lat':round(uniform( -90,  90), 5),'lon':round(uniform(-180, 180), 5)})

fig = px.scatter_mapbox(temp_list, lat="lat", lon="lon", zoom=3)
fig.update_layout(mapbox_style="open-street-map", margin={"r":0,"t":0,"l":0,"b":0})
fig.show()
```
Output

You can also add lines between dots

Example

import plotly.graph_objects as go # Import Plotly Graph Objects for more customizable plots
fig = go.Figure(go.Scattermapbox( # Create a scatter map with markers connected by lines
    mode="markers+lines", # Show both points (markers) and connecting lines
    lat=[45.6280, 38.9072], # Latitude coordinates of the points
    lon=[-122.6615, -77.0369], # Longitude coordinates of the points
    marker={'size': 10} # Set the size of the markers
))
fig.update_layout(mapbox_style="open-street-map", margin={"r":0, "t":0, "l":0, "b":0}) # Set map style and remove extra margins
fig.show() # Display the interactive map
```
import plotly.graph_objects as go

fig = go.Figure(go.Scattermapbox(
    mode = "markers+lines",
    lat = [45.6280, 38.9072],
    lon = [-122.6615, -77.0369 ],
    marker = {'size': 10}))

fig.update_layout(mapbox_style="open-street-map",margin={"r":0,"t":0,"l":0,"b":0})
fig.show()
```
Output
May 3, 2026
Deep Learning
Deep Learning (DL)

Deep Learning (DL) is a subset of Machine Learning that utilizes artificial neural networks with multiple layers to identify complex patterns in data. Unlike traditional machine learning methods, which often depend on manually engineered features, deep learning models automatically learn hierarchical representations directly from raw data. This capability makes them particularly effective for processing unstructured data, such as images, audio, text, and video. Deep learning is widely applied in various fields, including image recognition, speech processing, natural language processing, autonomous systems, and cybersecurity, where large-scale and complex data need to be analyzed efficiently.

Process
- Input (raw data)
- Hidden layers (learn low-level -> high-level features automatically)
- Output (prediction / classification)
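The three stages above can be traced by hand on a tiny network. The sketch below is a pure-Python forward pass with hand-picked weights chosen so that the output equals the sum of the two inputs when that sum is non-negative. Nothing is learned here; the weights are assumptions for illustration, whereas a real model would find them during training.

```python
def relu(values):
    """ReLU activation: negative values become 0."""
    return [max(0.0, value) for value in values]

def dense(inputs, weights, biases):
    """Fully connected layer: one weighted sum plus bias per output neuron."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

# Hand-picked (not learned) parameters, purely for illustration
hidden_weights = [[1.0, 1.0], [1.0, -1.0]]   # neuron 1: x1 + x2, neuron 2: x1 - x2
hidden_biases = [0.0, 0.0]
output_weights = [[1.0, 0.0]]                # output reads only the first hidden neuron
output_biases = [0.0]

def forward(inputs):
    """Input -> hidden layer (with ReLU) -> output."""
    hidden = relu(dense(inputs, hidden_weights, hidden_biases))
    return dense(hidden, output_weights, output_biases)

print(forward([2.0, 3.0]))   # [5.0]
```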
Example (Addition)

import numpy as np # For numerical operations and generating random data
from tensorflow.keras.models import Sequential # For building a sequential neural network
from tensorflow.keras.layers import Dense # Fully connected (dense) neural network layer
from tensorflow.keras.callbacks import EarlyStopping # Stop training early if model stops improving

# Generate random input data
x = np.random.randint(0, 500, size=(1000,2)) # 1000 samples, 2 features each (random integers 0-499)
y = x[:, 0] + x[:, 1] # Target is sum of two features

# Build a simple neural network
model = Sequential() # Initialize sequential model
model.add(Dense(32, input_shape=(2,), activation='relu')) # Hidden layer with 32 neurons, ReLU activation
model.add(Dense(1)) # Output layer with 1 neuron (predict sum)

# Compile the model
model.compile(loss='mean_absolute_error', optimizer='adam', metrics=['mae']) # Use MAE loss and Adam optimizer

# Train the model
model.fit(
x, y, # Training data and targets
validation_split=0.2, # Use 20% of data for validation
batch_size=32, # Batch size for training
epochs=100, # Maximum number of epochs
verbose=1, # Show progress
callbacks=[EarlyStopping(monitor='val_loss', patience=5)] # Stop early if validation loss doesn't improve for 5 epochs
)

# Predict on new data
print(model.predict(np.array([[0.2, 10], [50, 1]]))) # Predict sum for two new samples
```
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.callbacks import EarlyStopping

x = np.random.randint(0, 500, size=(1000,2))
y = x[:, 0] + x[:, 1]

model = Sequential()
model.add(Dense(32, input_shape=(2,), activation='relu'))
model.add(Dense(1))
model.compile(loss='mean_absolute_error', optimizer='adam', metrics=['mae'])

# ~1000 samples, batch size 32 (hyperparameter)
# For fixed validation, use train_test_split instead of validation_split
model.fit(x, y, validation_split=0.2, batch_size=32, epochs=100, verbose=1, callbacks=[EarlyStopping(monitor='val_loss', patience=5)])

print(model.predict(np.array([[0.2, 10], [50, 1]])))
```
Example (Multiplication)

import numpy as np # For creating and handling arrays
from tensorflow.keras.models import Sequential # For building a sequential neural network
from tensorflow.keras.layers import Dense # Fully connected (dense) neural network layer
from tensorflow.keras.callbacks import EarlyStopping # Stop training early if validation loss stops improving

# Generate random input data
x = np.random.randint(0, 10, size=(1000,2)) # 1000 samples, each with 2 features (integers 0-9)
y = x[:, 0] * x[:, 1] # Target = multiplication of the two features

# Build the neural network
model = Sequential() # Initialize sequential model
model.add(Dense(64, input_shape=(2,), activation='relu')) # First hidden layer with 64 neurons, ReLU activation
model.add(Dense(64, activation='relu')) # Second hidden layer with 64 neurons, ReLU activation
model.add(Dense(1)) # Output layer with 1 neuron (predict the product)

# Compile the model
model.compile(loss='mean_absolute_error', optimizer='adam', metrics=['mae']) # MAE loss for regression, Adam optimizer

# Train the model
model.fit(
x, y, # Training data and targets
validation_split=0.2, # Use 20% of data for validation
batch_size=32, # Batch size
epochs=100, # Maximum number of epochs
verbose=1, # Show progress bar
callbacks=[EarlyStopping(monitor='val_loss', patience=5)] # Stop early if validation loss does not improve for 5 epochs
)

# Predict new data
print(model.predict(np.array([[2, 3]]))) # Predict the product of 2*3
```
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.callbacks import EarlyStopping

x = np.random.randint(0, 10, size=(1000,2))
y = x[:, 0] * x[:, 1]

model = Sequential()
model.add(Dense(64, input_shape=(2,), activation='relu'))
model.add(Dense(64, activation='relu'))
model.add(Dense(1))
model.compile(loss='mean_absolute_error', optimizer='adam', metrics=['mae'])

# ~1000 samples, batch size 32 (hyperparameter)
# For fixed validation, use train_test_split instead of validation_split
model.fit(x, y, validation_split=0.2, batch_size=32, epochs=100, verbose=1, callbacks=[EarlyStopping(monitor='val_loss', patience=5)])

print(model.predict(np.array([[2, 3]])))
```
Predicting Suspicious Emails (phishing)

import numpy as np # NumPy arrays (used here for the label vector)
from tensorflow.keras.models import Sequential # Sequential model (stack layers linearly)
from tensorflow.keras.layers import Dense # Fully connected (dense) neural network layers
from sklearn.feature_extraction.text import CountVectorizer # Converts text into numeric feature vectors (bag-of-words)
emails = [
    "Click here to reset your password", # Likely phishing example
    "Your invoice is attached", # Likely safe example
    "Verify your bank account immediately", # Likely phishing example
    "Meeting tomorrow at 10am", # Likely safe example
]
labels = np.array([1, 0, 1, 0]) # Target labels as a NumPy array (a plain list may be rejected by fit): 1 = phishing, 0 = safe
vectorizer = CountVectorizer() # Initialize text vectorizer (bag-of-words model)
features = vectorizer.fit_transform(emails).toarray() # Learn vocabulary + convert emails into numeric feature matrix
model = Sequential() # Create a sequential neural network model
model.add(Dense(32, input_shape=(features.shape[1],), activation='relu')) # Input layer + first hidden layer (32 neurons)
model.add(Dense(16, activation='relu')) # Second hidden layer (16 neurons)
model.add(Dense(1, activation='sigmoid')) # Output layer (1 neuron for binary classification, sigmoid = probability)
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy']) # Configure model training settings
model.fit(features, labels, epochs=50, verbose=0) # Train the model for 50 epochs, no training output shown
new_emails = vectorizer.transform([
    "Your account will be locked, click here", # Suspicious/phishing-like message
    "Lunch tomorrow?" # Normal/safe message
]).toarray() # Convert new emails into the same feature format
prediction = model.predict(new_emails) > 0.5 # Predict probabilities and convert to True/False using threshold 0.5
print("Phishing predictions (True=Phishing, False=Safe):", prediction) # Display prediction results
```
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from sklearn.feature_extraction.text import CountVectorizer

emails = [
    "Click here to reset your password",
    "Your invoice is attached",
    "Verify your bank account immediately",
    "Meeting tomorrow at 10am",
]

labels = np.array([1, 0, 1, 0])  # 1 = phishing, 0 = safe

vectorizer = CountVectorizer()
features = vectorizer.fit_transform(emails).toarray()

model = Sequential()
model.add(Dense(32, input_shape=(features.shape[1],), activation='relu'))
model.add(Dense(16, activation='relu'))
model.add(Dense(1, activation='sigmoid'))  
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.fit(features, labels, epochs=50, verbose=0)

new_emails = vectorizer.transform([
    "Your account will be locked, click here",
    "Lunch tomorrow?"
]).toarray()

prediction = model.predict(new_emails) > 0.5
print("Phishing predictions (True=Phishing, False=Safe):", prediction)
```
Predicting Suspicious Files (Malware)

import numpy as np # Library for numerical operations and arrays
from tensorflow.keras.models import Sequential # Sequential model to stack layers
from tensorflow.keras.layers import Dense # Fully connected neural network layers
x = np.random.randint(0, 100, size=(1000, 3)) # Generate 1000 samples, each with 3 random features (0–99)
y = (x[:,0] + x[:,1] + x[:,2] > 150).astype(int) # Label: 1 (malware) if sum > 150, else 0 (safe)
model = Sequential() # Initialize the neural network model
model.add(Dense(32, input_shape=(3,), activation='relu')) # Input layer + first hidden layer (32 neurons, ReLU activation)
model.add(Dense(16, activation='relu')) # Second hidden layer (16 neurons)
model.add(Dense(1, activation='sigmoid')) # Output layer (1 neuron, sigmoid for binary classification)
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy']) # Configure model with optimizer, loss, and accuracy metric
model.fit(x, y, epochs=50, batch_size=32, verbose=0) # Train the model for 50 epochs with batch size of 32
new_files = np.array([[60, 50, 50], [10, 5, 15]]) # New data samples to classify (each has 3 features)
prediction = model.predict(new_files) > 0.5 # Predict probabilities and convert to True/False using threshold 0.5
print("Malware predictions (True=Malware, False=Safe):", prediction) # Print classification results
```
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

x = np.random.randint(0, 100, size=(1000, 3))
y = (x[:,0] + x[:,1] + x[:,2] > 150).astype(int)  # 1 = malware, 0 = safe

model = Sequential()
model.add(Dense(32, input_shape=(3,), activation='relu'))
model.add(Dense(16, activation='relu'))
model.add(Dense(1, activation='sigmoid'))

model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.fit(x, y, epochs=50, batch_size=32, verbose=0)

new_files = np.array([[60, 50, 50], [10, 5, 15]])
prediction = model.predict(new_files) > 0.5
print("Malware predictions (True=Malware, False=Safe):", prediction)
```
May 3, 2026
Machine Learning
Machine Learning (ML)

Machine Learning (ML) is a branch of artificial intelligence that allows systems to learn from data and enhance their performance on tasks without needing explicit programming. ML algorithms examine data to detect patterns and relationships, which can then be utilized for making predictions, classifications, or decisions. These techniques are commonly applied in areas such as fraud detection, recommendation systems, and predictive analytics. Unlike traditional programming, ML focuses on data-driven learning and can handle both structured and unstructured data.

Process
- Training
  - Input data
  - Feature extraction (manual in traditional ML, automatic in deep learning)
  - Model learning
- Prediction (Inference)
  - New input data
  - Apply trained model
  - Output prediction or classification
Data Splitting
- Training set: Used to train the model
- Validation set: Used to tune and evaluate during training
- Test set: Used to evaluate final performance on unseen data
- A common split is 70% / 20% / 10%, but this may vary.
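The 70% / 20% / 10% split can be sketched in plain Python. This is a minimal illustration rather than a fixed recipe: the `split_dataset` helper, its parameter names, and the fixed seed are assumptions made for the example.

```python
import random

def split_dataset(samples, train_ratio=0.7, val_ratio=0.2, seed=0):
    """Shuffle samples and split them into train/validation/test lists.

    Hypothetical helper for illustration; the test set receives whatever
    remains after the training and validation shares are taken.
    """
    shuffled = list(samples)               # copy so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)  # seeded shuffle for reproducibility

    train_end = round(len(shuffled) * train_ratio)
    val_end = train_end + round(len(shuffled) * val_ratio)

    return shuffled[:train_end], shuffled[train_end:val_end], shuffled[val_end:]

train, val, test = split_dataset(range(100))
print(len(train), len(val), len(test))  # 70 20 10
```

Shuffling before slicing matters: if the data is ordered (for example, all suspicious messages first), an unshuffled split would leave one class out of the test set.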
Example

import numpy as np # For handling arrays
from sklearn.feature_extraction.text import CountVectorizer # Convert text to numeric feature vectors
from sklearn.ensemble import RandomForestClassifier # Machine learning model for classification

# Input texts (simulated messages) and labels
texts = np.array([
    'Click at this link', # Suspicious / phishing-like message
    'Click at this link to download', # Suspicious
    'Click here to transfer money', # Suspicious
    'My name is Jone', # Normal / safe message
    'How are you' # Normal / safe message
])
labels = np.array([1, 1, 1, 0, 0]) # 1 = positive/suspicious, 0 = negative/normal
tags = np.array(["negative", "positive"]) # Labels for display

# Extract features from text using Bag-of-Words
count_vectorizer = CountVectorizer(min_df=1) # Convert text to word frequency vectors
features = count_vectorizer.fit_transform(texts).toarray() # Learn vocabulary and convert texts to array

# Train Random Forest classifier
random_forest_classifier = RandomForestClassifier() # Initialize model
random_forest_classifier.fit(features, labels) # Train model on features and labels

# Predict new text
features = count_vectorizer.transform(['How are you']) # Convert new text to feature vector
prediction = random_forest_classifier.predict(features) # Predict label (0 or 1)
print(prediction, tags[prediction]) # Print numeric prediction and human-readable tag
```
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.ensemble import RandomForestClassifier

#Input
texts = np.array(['Click at this link', 'Click at this link to download', 'Click here to transfer money', 'My name is Jone', 'How are you'])
labels = np.array([1, 1, 1, 0, 0])
#0 = negative
#1 = positive
tags = np.array(["negative","positive"])

#Extract Features
count_vectorizer = CountVectorizer(min_df=1)
features = count_vectorizer.fit_transform(texts).toarray()

#Train
random_forest_classifier = RandomForestClassifier()
random_forest_classifier.fit(features, labels)

#Predict
features = count_vectorizer.transform(['How are you'])
prediction = random_forest_classifier.predict(features)
print(prediction, tags[prediction])
```
May 3, 2026
Natural Language Processing

Natural Language Processing (NLP) is a branch of artificial intelligence that focuses on enabling computers to understand, interpret, and interact with human language in a meaningful way. It combines linguistics, computer science, and machine learning to process text and speech, allowing machines to analyze syntax, semantics, and context in written or spoken language. NLP is used for tasks such as sentiment analysis, language translation, chatbots, information extraction, and text summarization. Its focus is on understanding and interpreting language rather than predicting future events, and it forms the foundation for applications that need to comprehend and respond to human communication naturally.

Text Pre-Processing

A popular Python module for NLP is nltk (the Natural Language Toolkit). It can be used to enhance threat detection and response, for example by analyzing the text of suspicious emails.

Install

pip3 # Python package installer for Python 3
install # Command that tells pip to install a package
nltk # The Natural Language Toolkit library (used for NLP tasks)
```
pip3 install nltk
```
Run this in Python

import nltk # Imports the Natural Language Toolkit (NLP library) into your Python script
nltk.download('all') # Downloads all available NLTK datasets, models, and corpora
```
import nltk
nltk.download('all')
```
Breaking Sentences Into Words

A tokenizer breaks unstructured natural-language text into smaller units called tokens (words and punctuation marks), which can later be converted into numerical features for machine learning. E.g., the word_tokenize() function splits a sentence into words.

Example

from nltk.tokenize import word_tokenize # Imports the word_tokenize function from NLTK’s tokenize module
print(word_tokenize("Please follow this link.")) # Tokenizes (splits) the sentence into individual words and punctuation, then prints the resulting list
```
from nltk.tokenize import word_tokenize
print(word_tokenize("Please follow this link."))
```
Output
```
['Please', 'follow', 'this', 'link', '.']
```
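Tokens are still text. To feed them to a machine learning model they are usually mapped to numbers. Below is a minimal sketch of that step, assuming a simple bag-of-words count is wanted. The regex tokenizer is a dependency-free stand-in for word_tokenize(), and `build_vocabulary` and `to_counts` are hypothetical helper names.

```python
import re
from collections import Counter

def tokenize(text):
    # Stand-in for nltk's word_tokenize(): words and single punctuation marks
    return re.findall(r"\w+|[^\w\s]", text)

def build_vocabulary(sentences):
    # Give each distinct lowercase token a column index, in order of first appearance
    vocabulary = {}
    for sentence in sentences:
        for token in tokenize(sentence.lower()):
            vocabulary.setdefault(token, len(vocabulary))
    return vocabulary

def to_counts(sentence, vocabulary):
    # Turn one sentence into a fixed-length vector of token counts
    counts = Counter(tokenize(sentence.lower()))
    vector = [0] * len(vocabulary)
    for token, count in counts.items():
        if token in vocabulary:  # tokens not in the vocabulary are ignored
            vector[vocabulary[token]] = count
    return vector

vocabulary = build_vocabulary(["Please follow this link."])
print(vocabulary)                                 # {'please': 0, 'follow': 1, 'this': 2, 'link': 3, '.': 4}
print(to_counts("Follow, follow this.", vocabulary))  # [0, 2, 1, 0, 1]
```

This is the same idea that CountVectorizer implements in scikit-learn, reduced to a few lines.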
Finding Common Words

You can count how often each word appears in a sentence using the FreqDist class

Example

from nltk.probability import FreqDist # Imports FreqDist class to calculate word frequency distribution
from nltk.tokenize import word_tokenize # Imports the word_tokenize function to split text into tokens
tokens = word_tokenize("Please follow this link.") # Tokenizes the sentence into individual words and punctuation marks
FreqDist(tokens).tabulate() # Creates a frequency distribution of the tokens and displays the counts in a formatted table
```
from nltk.probability import FreqDist
from nltk.tokenize import word_tokenize
tokens = word_tokenize("Please follow this link.")
FreqDist(tokens).tabulate()
```
Output
```
 Please follow    this    link       . 
      1       1       1       1       1 
```
Finding Sentence Parts

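To label each word in a sentence with its part of speech (noun, pronoun, verb, adjective, adverb, preposition, conjunction, interjection, etc.), use the pos_tag() function. You can review all available tags with nltk.help.upenn_tagset(). Note that the example here tags each token on its own, so the tagger has no surrounding context (which is likely why "follow" comes out as NN, a noun). Passing the whole token list to pos_tag() in a single call generally gives more accurate tags.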

Example

from nltk import pos_tag # Imports the part-of-speech (POS) tagging function
from nltk.tokenize import word_tokenize # Imports the tokenizer to split text into words
tokens = word_tokenize("Please follow this link.") # Splits the sentence into individual tokens (words and punctuation)
for token in tokens: # Loops through each token
    print(pos_tag([token])) # Tags the token with its part of speech and prints it
```
from nltk import pos_tag
from nltk.tokenize import word_tokenize
tokens = word_tokenize("Please follow this link.")
for token in tokens:
    print(pos_tag([token]))
```
Output
```
[('Please', 'VB')]
[('follow', 'NN')]
[('this', 'DT')]
[('link', 'NN')]
[('.', '.')]
```
Normalizing Words

To normalize a word, you can use stemming (the PorterStemmer class) or lemmatization (the WordNetLemmatizer.lemmatize() method). Stemming strips suffixes from the end of a word using fixed rules, whereas lemmatization looks the word up and returns its dictionary form (the lemma). Search engines typically use these techniques to match all relevant forms of a word. E.g., if you search for cars, you also get results for car. Bots use them to understand the overall meaning of a sentence.

Example

from nltk.stem import PorterStemmer # Imports the Porter Stemmer algorithm for word stemming
for item in ["test", "tests", "testing", "tested"]: # Loops through each word in the list
    print(item, ": ", PorterStemmer().stem(item)) # Applies stemming to each word and prints the original word along with its stemmed (root) form
```
from nltk.stem import PorterStemmer
for item in ["test","tests","testing","tested"]:
    print(item, ": ",PorterStemmer().stem(item))
```
Output
```
test :  test
tests :  test
testing :  test
tested :  test
```
Example

from nltk.stem import WordNetLemmatizer # Imports the WordNet lemmatizer (uses vocabulary + morphology rules)
for item in ["test", "tests", "testing", "tested"]: # Loops through each word in the list
    print(item, ": ", WordNetLemmatizer().lemmatize(item)) # Lemmatizes (reduces to dictionary base form) each word and prints the original word with its lemma
```
from nltk.stem import WordNetLemmatizer
for item in ["test","tests","testing","tested"]:
    print(item, ": ", WordNetLemmatizer().lemmatize(item))
```
Output
```
test :  test
tests :  test
testing :  testing
tested :  tested
```
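The difference between stemming and lemmatization can also be shown without nltk's data. The following is a deliberately naive sketch: a stemmer that only strips a few suffixes, and a lemmatizer that looks words up in a tiny hand-written table. Both `naive_stem` and `LEMMA_TABLE` are illustrative assumptions, far simpler than PorterStemmer or WordNetLemmatizer.

```python
SUFFIXES = ("ing", "ed", "s")  # checked in this order

# Tiny hand-written lookup table standing in for a real dictionary such as WordNet
LEMMA_TABLE = {"better": "good", "mice": "mouse", "ran": "run", "cars": "car"}

def naive_stem(word):
    # Stemming: chop a known suffix off the end, with no knowledge of the vocabulary
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def naive_lemmatize(word):
    # Lemmatization: look the word up and return its dictionary form if known
    return LEMMA_TABLE.get(word, word)

for word in ["cars", "tested", "mice", "better"]:
    print(word, "->", naive_stem(word), "|", naive_lemmatize(word))
```

The rule-based stemmer handles "tested" but cannot relate "mice" to "mouse", while the lookup handles irregular forms but only for words it knows.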
Example

from nltk.stem import WordNetLemmatizer # Imports the WordNet lemmatizer
from nltk.corpus import wordnet # Imports WordNet corpus (provides POS constants)
from nltk import word_tokenize, pos_tag # Imports tokenizer and POS tagger
from collections import defaultdict # (Not used here, but commonly used for default dictionary behavior)
mapped = {
    "V": wordnet.VERB, # Maps POS tags starting with 'V' to VERB
    "J": wordnet.ADJ, # Maps POS tags starting with 'J' to ADJECTIVE
    "R": wordnet.ADV # Maps POS tags starting with 'R' to ADVERB
}
tokens = word_tokenize("caring") # Tokenizes the word
for token, tag in pos_tag(tokens): # Tags the token with its Penn Treebank POS tag (e.g., VBG, NN, JJ)
    tag = mapped.get(tag[0], wordnet.NOUN) # Looks at the first letter of the POS tag; if it exists in the mapped dictionary, uses the corresponding WordNet POS, otherwise defaults to NOUN
    print(token, WordNetLemmatizer().lemmatize(token, tag)) # Lemmatizes the token using the correct POS
```
from nltk.stem import WordNetLemmatizer
from nltk.corpus import wordnet
from nltk import word_tokenize, pos_tag
from collections import defaultdict

mapped = {
    "V": wordnet.VERB,
    "J": wordnet.ADJ,
    "R": wordnet.ADV
}

tokens = word_tokenize("caring")
for token, tag in pos_tag(tokens):
    tag  = mapped.get(tag[0], wordnet.NOUN)
    print(token, WordNetLemmatizer().lemmatize(token, tag))
```
Part-Of-Speech

POS stands for Part-Of-Speech, which is a grammatical category assigned to each word in a sentence. POS tagging tells you whether a word is a noun, verb, adjective, adverb, etc., based on its role in the sentence. The tags listed here are the Penn Treebank tags that pos_tag() returns by default.
```
CC Coordinating conjunction
CD Cardinal number
DT Determiner
EX Existential there 
FW Foreign word
IN Preposition or subordinating conjunction
JJ Adjective
JJR Adjective, comparative
JJS Adjective, superlative
LS List item marker
MD Modal
NN Noun, singular or mass
NNS Noun, plural
NNP Proper noun, singular
NNPS Proper noun, plural
PDT Predeterminer
POS Possessive ending
PRP Personal pronoun
PRP$ Possessive pronoun
RB Adverb
RBR Adverb, comparative
RBS Adverb, superlative
RP Particle
SYM Symbol
TO to
UH Interjection
VB Verb, base form
VBD Verb, past tense
VBG Verb, gerund or present participle
VBN Verb, past participle
VBP Verb, non-3rd person singular present
VBZ Verb, 3rd person singular present
WDT Wh-determiner
WP Wh-pronoun
WP$ Possessive wh-pronoun
WRB Wh-adverb
```
Removing Stop Words

To remove stopwords (very common words such as "this" or "the" that carry little meaning) from a sentence, compare each token against nltk's stopword list and keep only the tokens that are not in it

Example

from nltk.tokenize import sent_tokenize, word_tokenize # Import sentence and word tokenizers
from nltk.corpus import stopwords # Import stopwords list
tokens = word_tokenize("Please followw this link.") # Tokenize sentence into words
stop_words = set(stopwords.words('english')) # Get the set of English stopwords
filtered = [w for w in tokens if w.lower() not in stop_words] # Filter out tokens that are stopwords
print(filtered) # Print the filtered words
```
from nltk.tokenize import sent_tokenize, word_tokenize
from nltk.corpus import stopwords
tokens = word_tokenize("Please followw this link.")
stop_words = set(stopwords.words('english'))
filtered = [w for w in tokens if w.lower() not in stop_words]
print(filtered)
```
Output
```
['Please', 'followw', 'link', '.']
```
Example #1

You can clean text using regex and nltk

import re # Import regular expressions for pattern-based text cleaning
from nltk.corpus import stopwords # Import list of common English stopwords
def clean_text(text):
    text = text.lower() # Convert all letters to lowercase so that 'This' and 'this' are treated the same
    text = re.sub(r'\d+', ' ', text) # Remove all digits/numbers by replacing them with a space
    text = re.sub(r'[^\w\s]', ' ', text) # Remove punctuation by replacing anything that is NOT a word character or whitespace with a space
    stop_words = set(stopwords.words('english')) # Build the stopword set once instead of once per word
    text = " ".join(w for w in text.split() if w not in stop_words) # Remove stopwords (common words like 'the', 'is', 'this')
    return text # Return the cleaned text
print(clean_text("Please follow this link.")) # Expected output: "please follow link"
```
import re
from nltk.corpus import stopwords

def clean_text(text):
    text = text.lower()
    text = re.sub(r'\d+', ' ', text)
    text = re.sub(r'[^\w\s]', ' ', text)
    # Build the stopword set once instead of once per word
    stop_words = set(stopwords.words('english'))
    text = " ".join(w for w in text.split() if w not in stop_words)
    return text

print(clean_text("Please follow this link."))
```
Output
```
please follow link
```
Example #2

If you want to check a phishing email for broken (misspelled) words, you can do that using the nltk module

import nltk # Import NLTK library
words = set(nltk.corpus.words.words()) # Load the set of valid English words from the NLTK corpus
sentence = "Please followw this link." # Example sentence to check
errors = [] # List to store words not found in the dictionary (possible typos)
for w in nltk.wordpunct_tokenize(sentence): # Tokenize the sentence into words and punctuation
    if w.lower() in words or not w.isalpha(): # Check if the word is in the dictionary or is non-alphabetic (punctuation, numbers)
        pass # Word is correct or ignored
    else:
        errors.append(w) # Word is likely a typo
print("Error(s): ", len(errors)) # Print the number of errors found
```
import nltk 
words = set(nltk.corpus.words.words())
sentence = "Please followw this link."
errors = []
for w in nltk.wordpunct_tokenize(sentence):
    if w.lower() in words or not w.isalpha():
        pass
    else:
        errors.append(w)
print("Error(s): ", len(errors))
```
Output
```
Error(s): 1
```
May 3, 2026
Web Scraping Prevention
Web Scraping Prevention Techniques

Many websites prohibit web scraping and use anti-scraping measures to block automated data extraction. These protections can make it challenging and time-consuming to scale scraping activities. For instance, if a script sends requests too frequently (like once every second), the website may block those requests or display a message asking the user to slow down or try again later.

Fingerprinting

Fingerprinting is a technique used to identify and track clients based on detailed technical information such as IP addresses, user-agent strings, browser versions, operating systems, screen resolutions, installed fonts, and even hardware characteristics. By combining these signals, websites can create a unique “fingerprint” for each visitor. If multiple requests appear to originate from the same fingerprint in an automated pattern, the system can flag or block them, even if the IP address changes.

Example

from http.server import BaseHTTPRequestHandler, HTTPServer # import base classes for HTTP server
from time import time # import time function for request timing
requests = {} # dictionary to store request history per fingerprint

class CustomHandler(BaseHTTPRequestHandler): # define request handler class
    def do_GET(self): # handle GET requests
        now = time() # current timestamp
        ip = self.client_address[0] # get client IP address
        user_agent = self.headers.get("User-Agent", "") # browser info
        accept_lang = self.headers.get("Accept-Language", "") # language preference
        encoding = self.headers.get("Accept-Encoding", "") # compression support
        fingerprint = f"{ip}|{user_agent}|{accept_lang}|{encoding}" # create a simple fingerprint using IP + headers
        requests[fingerprint] = [t for t in requests.get(fingerprint, []) if now - t < 10] # keep only requests from last 10 seconds for this fingerprint
        requests[fingerprint].append(now) # log current request time

        if len(requests[fingerprint]) > 5: # if too many requests in time window, block client
            self.send_response(429) # HTTP status: Too Many Requests
            self.send_header('Content-type', 'text/plain') # response type
            self.end_headers() # finish HTTP headers
            self.wfile.write(f"Fingerprint:{fingerprint} - Too many requests...".encode("utf-8")) # send blocked message with fingerprint info
        else:
            self.send_response(200) # HTTP OK
            self.send_header('Content-type', 'text/plain') # response type
            self.end_headers() # finish headers
            self.wfile.write(f"Fingerprint:{fingerprint} - Server Running...".encode("utf-8")) # send normal response with fingerprint info

        return # end request handling

HTTPServer(("", 8080), CustomHandler).serve_forever() # start server on port 8080 and run forever
```
from http.server import BaseHTTPRequestHandler, HTTPServer
from time import time
requests = {}

class CustomHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        now = time()
        ip = self.client_address[0]
        user_agent = self.headers.get("User-Agent", "")
        accept_lang = self.headers.get("Accept-Language", "")
        encoding = self.headers.get("Accept-Encoding", "")
        fingerprint = f"{ip}|{user_agent}|{accept_lang}|{encoding}"
        requests[fingerprint] = [t for t in requests.get(fingerprint, []) if now - t < 10]
        requests[fingerprint].append(now)

        if len(requests[fingerprint]) > 5:
            self.send_response(429)
            self.send_header('Content-type', 'text/plain')
            self.end_headers()
            self.wfile.write(f"Fingerprint:{fingerprint} - Too many requests...".encode("utf-8"))
        else:
            self.send_response(200)
            self.send_header('Content-type', 'text/plain')
            self.end_headers()
            self.wfile.write(f"Fingerprint:{fingerprint} - Server Running...".encode("utf-8"))

        return

HTTPServer(("", 8080), CustomHandler).serve_forever()
```
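The server above includes the client IP in its fingerprint, so a client that rotates IP addresses gets a fresh counter each time. Below is a minimal sketch of a fingerprint that survives IP changes, assuming only request headers are available: it hashes a fixed set of header values. The header list and the `header_fingerprint` name are assumptions for illustration; real systems combine many more signals.

```python
import hashlib

# Headers assumed to be reasonably stable for one client (illustrative choice)
FINGERPRINT_HEADERS = ("User-Agent", "Accept-Language", "Accept-Encoding")

def header_fingerprint(headers):
    """Return a short hex digest built from selected request headers.

    `headers` is any mapping with .get(), such as self.headers in a
    BaseHTTPRequestHandler or a plain dict.
    """
    # Join with a separator so ("ab", "c") and ("a", "bc") hash differently
    material = "|".join(headers.get(name, "") for name in FINGERPRINT_HEADERS)
    return hashlib.sha256(material.encode("utf-8")).hexdigest()[:16]

scraper = {"User-Agent": "python-requests/2.31", "Accept-Encoding": "gzip, deflate"}
browser = {"User-Agent": "Mozilla/5.0", "Accept-Language": "en-US", "Accept-Encoding": "gzip, br"}

print(header_fingerprint(scraper))
print(header_fingerprint(browser))
```

Using this value as the dictionary key in place of the IP-based string would keep the request history attached to the same client across IP changes, at the cost of grouping together different clients that send identical headers.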
Authentication

Authentication systems require users to verify their identity before accessing content. This is often achieved through login pages, API keys, or session tokens. By requiring users to authenticate, websites can better control who accesses their data and monitor usage per account. This also allows them to enforce limits on a per-user basis rather than per IP address, making scraping more challenging.

Example

from http.server import BaseHTTPRequestHandler, HTTPServer # import basic HTTP server classes
api_keys = {"Example-6C324086-6B3B-48D5-9FEE-4A30C66B70CC": {"ip": "", "user": ""}} # dictionary mapping each valid API key to optional metadata

class CustomHandler(BaseHTTPRequestHandler): # define request handler class
    def do_GET(self): # handle GET requests
        api_key = self.headers.get("X-API-Key", "") # extract API key from request headers
        if api_key not in api_keys: # check if API key is invalid or missing
            self.send_response(401) # return HTTP 401 Unauthorized
            self.send_header('Content-type', 'text/plain') # set response content type
            self.end_headers() # finish HTTP headers
            self.wfile.write(b"Authentication required") # send authentication error message
        else: # if API key is valid
            self.send_response(200) # return HTTP 200 OK
            self.send_header('Content-type', 'text/plain') # set response content type
            self.end_headers() # finish HTTP headers
            self.wfile.write(b"Server Running...") # send success response message
        return # end request handling

HTTPServer(("", 8080), CustomHandler).serve_forever() # start server on port 8080 and run forever
```
from http.server import BaseHTTPRequestHandler, HTTPServer
api_keys = {"Example-6C324086-6B3B-48D5-9FEE-4A30C66B70CC": {"ip": "", "user": ""}}

class CustomHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        api_key = self.headers.get("X-API-Key", "")
        if api_key not in api_keys:
            self.send_response(401)
            self.send_header('Content-type', 'text/plain')
            self.end_headers()
            self.wfile.write(b"Authentication required")
        else:
            self.send_response(200)
            self.send_header('Content-type', 'text/plain')
            self.end_headers()
            self.wfile.write(b"Server Running...")
        return

HTTPServer(("", 8080), CustomHandler).serve_forever()
```
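Once requests carry a key, limits can be enforced per key instead of per IP address. Below is a minimal sketch of that idea, assuming a sliding window of 5 requests per 10 seconds. The `KeyRateLimiter` class, its limits, and the example key names are hypothetical.

```python
from collections import defaultdict, deque
from time import time

class KeyRateLimiter:
    """Sliding-window request counter kept separately for each API key."""

    def __init__(self, max_requests=5, window_seconds=10):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.history = defaultdict(deque)  # api_key -> timestamps of recent requests

    def allow(self, api_key, now=None):
        now = time() if now is None else now
        recent = self.history[api_key]
        # Drop timestamps that have fallen out of the window
        while recent and now - recent[0] >= self.window_seconds:
            recent.popleft()
        if len(recent) >= self.max_requests:
            return False  # over the limit: the caller should answer 429
        recent.append(now)
        return True

limiter = KeyRateLimiter(max_requests=5, window_seconds=10)
results = [limiter.allow("key-a", now=100 + i) for i in range(6)]
print(results)                          # [True, True, True, True, True, False]
print(limiter.allow("key-b", now=105))  # True: other keys have their own budget
print(limiter.allow("key-a", now=111))  # True: the oldest requests have expired
```

In the handler above, `allow(api_key)` would be called after the key check, returning 429 when it yields False.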
Challenges (CAPTCHA)

CAPTCHA tests are designed to differentiate humans from bots. They may involve identifying distorted text, selecting images, solving puzzles, or performing simple interactive tasks. Since most automated scripts struggle with these challenges, CAPTCHA serves as an effective barrier to prevent large-scale scraping or automated form submissions.

Example

from http.server import BaseHTTPRequestHandler, HTTPServer # HTTP server framework
from random import randint # generate random numbers for CAPTCHA
from uuid import uuid4 # generate unique session ID for each CAPTCHA
captcha_db = {} # store captcha_id -> correct answer mapping

class Handler(BaseHTTPRequestHandler): # request handler class
    def do_GET(self): # handle GET requests (show CAPTCHA page)
        random_a = randint(1, 10) # first random number
        random_b = randint(1, 10) # second random number
        captcha_id = str(uuid4()) # create unique ID for this CAPTCHA session
        captcha_db[captcha_id] = str(random_a + random_b) # store correct answer on server
        self.send_response(200) # HTTP 200 OK
        self.send_header("Content-type", "text/html") # response is HTML page
        self.end_headers() # finish headers
        # send HTML form to user
        self.wfile.write(f"""
        <html>
        <body>
            <h3>CAPTCHA: What is {random_a} + {random_b}?</h3>
            <form method="POST">
                <input name="answer" type="text">
                <input type="hidden" name="captcha_id" value="{captcha_id}">
                <input type="submit" value="Submit">
            </form>

        </body>
        </html>
        """.encode())

    def do_POST(self): # handle form submission
        length = int(self.headers.get('Content-Length', 0)) # get size of request body
        data = self.rfile.read(length).decode() # read and decode form data
        fields = dict(x.split("=", 1) for x in data.split("&") if "=" in x) # parse form fields
        user_answer = fields.get("answer", "") # user submitted answer
        captcha_id = fields.get("captcha_id", "") # session id from form
        correct_answer = captcha_db.pop(captcha_id, None) # fetch and remove the stored answer (single-use); None if the id is unknown
        self.send_response(200) # HTTP OK
        self.send_header("Content-type", "text/plain") # plain text response
        self.end_headers() # finish headers
        if correct_answer is not None and user_answer == correct_answer: # check if answer is correct
            self.wfile.write(b"CAPTCHA passed") # success message
        else:
            self.wfile.write(b"CAPTCHA failed") # failure message

HTTPServer(("", 8080), Handler).serve_forever() # start server on port 8080
```
from http.server import BaseHTTPRequestHandler, HTTPServer
from random import randint
from uuid import uuid4
captcha_db = {}

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        random_a = randint(1, 10)
        random_b = randint(1, 10)
        captcha_id = str(uuid4())
        captcha_db[captcha_id] = str(random_a + random_b)
        self.send_response(200)
        self.send_header("Content-type", "text/html")
        self.end_headers()
       
        self.wfile.write(f"""
        <html>
        <body>
            <h3>CAPTCHA: What is {random_a} + {random_b}?</h3>
            <form method="POST">
                <input name="answer" type="text">
                <input type="hidden" name="captcha_id" value="{captcha_id}">
                <input type="submit" value="Submit">
            </form>

        </body>
        </html>
        """.encode())

    def do_POST(self):
        length = int(self.headers.get('Content-Length', 0))
        data = self.rfile.read(length).decode()
        fields = dict(x.split("=", 1) for x in data.split("&") if "=" in x)
        user_answer = fields.get("answer", "")
        captcha_id = fields.get("captcha_id", "")
        # pop() makes the CAPTCHA single-use and returns None for unknown ids,
        # so an empty answer with a made-up id can no longer pass
        correct_answer = captcha_db.pop(captcha_id, None)
        self.send_response(200)
        self.send_header("Content-type", "text/plain")
        self.end_headers()
        if correct_answer is not None and user_answer == correct_answer:
            self.wfile.write(b"CAPTCHA passed")
        else:
            self.wfile.write(b"CAPTCHA failed")

HTTPServer(("", 8080), Handler).serve_forever()
```
Dynamic Content

Dynamic content is generated at runtime rather than being fixed in the HTML source. This often involves JavaScript rendering, API calls, or asynchronous data loading. Since the content is not directly present in the initial page source, simple HTML-only scraping tools cannot easily extract the data without simulating a real browser environment.

from http.server import BaseHTTPRequestHandler, HTTPServer # HTTP server framework
from datetime import datetime # used to generate dynamic runtime timestamp

class CustomHandler(BaseHTTPRequestHandler): # request handler class
    def do_GET(self): # handle GET requests
        if self.path == "/": # main webpage route
            self.send_response(200) # HTTP 200 OK
            self.send_header('Content-type', 'text/html') # response is HTML page
            self.end_headers() # finish headers
            self.wfile.write(b"""
            <html>
            <body>
                <h1>Server Running...</h1>
                <div id="data">Loading...</div>
                <script>
                    setTimeout(() => { // wait 10 seconds before loading data
                        fetch("/data") // request dynamic backend endpoint
                        .then(r => r.text()) // convert response to text
                        .then(t => document.getElementById("data").innerText = t); // update page content
                    }, 10000); // 10000ms delay (10 seconds)
                </script>
            </body>
            </html>
            """)
            return # stop processing this request

        if self.path == "/data": # dynamic data endpoint
            self.send_response(200) # HTTP OK
            self.send_header('Content-type', 'text/plain') # plain text response
            self.end_headers() # finish headers
            self.wfile.write(f"Dynamic Content Loaded: {datetime.now().strftime('%m-%d-%Y %I:%M %p')}".encode()) # write the dynamic content
            return # end request

        self.send_error(404) # any other path: reply 404 so the client is not left waiting

HTTPServer(("", 8080), CustomHandler).serve_forever() # start server on port 8080
```
from http.server import BaseHTTPRequestHandler, HTTPServer
from datetime import datetime

class CustomHandler(BaseHTTPRequestHandler):# request handler class
    def do_GET(self):
        if self.path == "/":
            self.send_response(200)
            self.send_header('Content-type', 'text/html')
            self.end_headers()
            self.wfile.write(b"""
            <html>
            <body>
                <h1>Server Running...</h1>
                <div id="data">Loading...</div>
                <script>
                    setTimeout(() => { // wait 10 seconds before loading data
                        fetch("/data") // request dynamic backend endpoint
                        .then(r => r.text()) // convert response to text
                        .then(t => document.getElementById("data").innerText = t); // update page content
                    }, 10000);// 10000ms delay (10 seconds)
                </script>
            </body>
            </html>
            """)
            return

        if self.path == "/data":
            self.send_response(200)
            self.send_header('Content-type', 'text/plain')
            self.end_headers()
            self.wfile.write(f"Dynamic Content Loaded: {datetime.now().strftime('%m-%d-%Y %I:%M %p')}".encode())
            return

        # Any other path: reply 404 so the client is not left waiting
        self.send_error(404)

HTTPServer(("", 8080), CustomHandler).serve_forever()
```
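To see why this defeats simple scrapers, consider what an HTML-only tool receives from the page above: the placeholder, not the data. Below is a minimal sketch using the standard library's html.parser on a copy of the initial HTML. The `DivTextParser` class is a hypothetical helper.

```python
from html.parser import HTMLParser

# The initial HTML as served, before any JavaScript has run
INITIAL_HTML = """
<html>
<body>
    <h1>Server Running...</h1>
    <div id="data">Loading...</div>
    <script>/* fetch("/data") runs later, only in a real browser */</script>
</body>
</html>
"""

class DivTextParser(HTMLParser):
    """Collect the text inside the <div> with a given id."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.inside = False
        self.text = ""

    def handle_starttag(self, tag, attrs):
        if tag == "div" and dict(attrs).get("id") == self.target_id:
            self.inside = True

    def handle_endtag(self, tag):
        if tag == "div":
            self.inside = False

    def handle_data(self, data):
        if self.inside:
            self.text += data

parser = DivTextParser("data")
parser.feed(INITIAL_HTML)
print(parser.text)  # Loading...
```

The timestamp only appears after the browser runs the script and calls /data, which a parser like this never does.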
Randomized Identifiers

Websites often change element IDs, class names, or API endpoints dynamically. This prevents scrapers from relying on fixed selectors to locate data. For instance, a product price element might have a different ID each time the page loads. This forces scrapers to constantly adapt and makes automation less reliable.

from http.server import BaseHTTPRequestHandler, HTTPServer # import HTTP server classes
from random import randint # used to generate random IDs

class CustomHandler(BaseHTTPRequestHandler): # define request handler
    def do_GET(self): # handle GET requests
        self.send_response(200) # send HTTP 200 OK status
        self.send_header('Content-type', 'text/html') # response is HTML
        self.end_headers() # finish headers
        random_id = f"id_{randint(1000,9999)}" # generate random element ID each request
        # send HTML response to client
        self.wfile.write(f"""
        <html>
            <body>
                <div id="{random_id}">Gas Price is: $5.99 per gallon</div>
            </body>
        </html>
        """.encode())

HTTPServer(("", 8080), CustomHandler).serve_forever() # start server on port 8080
```
from http.server import BaseHTTPRequestHandler, HTTPServer
from random import randint 

class CustomHandler(BaseHTTPRequestHandler): 
    def do_GET(self):
        self.send_response(200)
        self.send_header('Content-type', 'text/html') 
        self.end_headers()
        random_id = f"id_{randint(1000,9999)}"
        self.wfile.write(f"""
        <html>
            <body>
                <div id="{random_id}">Gas Price is: $5.99 per gallon</div>
            </body>
        </html>
        """.encode()) 

HTTPServer(("", 8080), CustomHandler).serve_forever()
```
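The effect on a scraper can be sketched without running the server: a selector learned from one page load stops matching on the next. `render_page` and `extract_by_id` are hypothetical helpers, and the regex is a simplification of a real HTML selector.

```python
import re
from random import randint

def render_page(element_id=None):
    # Mirrors the server above: a new random id unless one is supplied
    element_id = element_id or f"id_{randint(1000, 9999)}"
    return f'<html><body><div id="{element_id}">Gas Price is: $5.99 per gallon</div></body></html>'

def extract_by_id(html, element_id):
    # Naive scraper: find the div by its exact id
    match = re.search(rf'<div id="{re.escape(element_id)}">(.*?)</div>', html)
    return match.group(1) if match else None

first_load = render_page("id_1234")   # ids fixed here so the output is repeatable
second_load = render_page("id_5678")

print(extract_by_id(first_load, "id_1234"))   # Gas Price is: $5.99 per gallon
print(extract_by_id(second_load, "id_1234"))  # None
```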
User Behavior Analysis

User behavior analysis focuses on how users interact with a website over time. Typical human behavior includes pauses, scrolling, clicks, and irregular timing, while bots tend to generate consistent, fast, and repetitive request patterns. Websites use machine learning or rule-based systems to detect anomalies, such as extremely fast navigation, identical click paths, or repetitive page access patterns, and then restrict or block the suspicious activity.
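Example

A rule-based check of this kind can be sketched from request timestamps alone. This is a minimal illustration, not a production detector: the `looks_automated` function and both thresholds are assumptions chosen for the example.

```python
from statistics import mean, pstdev

def looks_automated(timestamps, min_mean_gap=1.0, min_jitter=0.2):
    """Rule-based check on request timestamps (seconds) from one client.

    Flags the client when requests arrive very quickly on average or with
    almost no variation between them. Thresholds are illustrative guesses.
    """
    if len(timestamps) < 5:
        return False  # too little data to judge
    gaps = [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]
    too_fast = mean(gaps) < min_mean_gap
    too_regular = pstdev(gaps) < min_jitter
    return too_fast or too_regular

bot_visits = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5]       # a request every 0.5 s, no variation
human_visits = [0.0, 4.2, 5.1, 19.8, 23.0, 61.4]  # irregular pauses

print(looks_automated(bot_visits))    # True
print(looks_automated(human_visits))  # False
```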

Honeypots

Honeypots are hidden elements embedded in a webpage that are either invisible or irrelevant to normal users (such as hidden links or form fields). Bots that blindly follow all available elements may end up interacting with these traps. Once triggered, the system can flag the behavior as automated and take action such as blocking the IP address, logging the activity, or redirecting the user.
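Example

A hidden form field is one common way to build such a trap. Below is a minimal sketch, assuming a form whose extra field is hidden with CSS so that people never fill it in. The field name `website` and the `is_bot_submission` helper are hypothetical.

```python
from urllib.parse import parse_qs

# Form served to visitors: the "website" field is hidden, so people leave it empty
FORM_HTML = """
<form method="POST">
    <input name="email" type="text">
    <input name="website" type="text" style="display:none" tabindex="-1" autocomplete="off">
    <input type="submit" value="Submit">
</form>
"""

HONEYPOT_FIELD = "website"  # hypothetical field name chosen to look tempting to bots

def is_bot_submission(form_body):
    """Return True when the hidden honeypot field was filled in."""
    fields = parse_qs(form_body)  # blank values are dropped by default
    return bool(fields.get(HONEYPOT_FIELD))

print(is_bot_submission("email=user%40example.com&website="))                       # False
print(is_bot_submission("email=bot%40example.com&website=http%3A%2F%2Fspam.test"))  # True
```

A server would run this check in its POST handler and log or block the client when it returns True.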
April 28, 2026