QeeqBox

Tag: Python

Natural Language Processing
Natural Language Processing

Natural Language Processing (NLP) is a branch of artificial intelligence that focuses on enabling computers to understand, interpret, and interact with human language in a meaningful way. It combines linguistics, computer science, and machine learning to process text and speech, allowing machines to analyze syntax, semantics, and context in written or spoken language. NLP is used for tasks such as sentiment analysis, language translation, chatbots, information extraction, and text summarization. While NLP focuses on understanding and interpreting language, rather than predicting future events, it forms the foundation for applications that require machines to comprehend and respond to human communication in a natural, human-like manner.

Text Pre-Processing

Python has a popular module called nltk (the Natural Language Toolkit) that is widely used for NLP. It can also be used to enhance threat detection and response, for example by analyzing the text of suspicious emails.

Install

pip3 # Python package installer for Python 3
install # Command that tells pip to install a package
nltk # The Natural Language Toolkit library (used for NLP tasks)
```
pip3 install nltk
```
Run this in Python

import nltk # Imports the Natural Language Toolkit (NLP library) into your Python script
nltk.download('all') # Downloads all available NLTK datasets, models, and corpora
```
import nltk
nltk.download('all')
```
Breaking Sentences Into Words

You can break unstructured natural language text into smaller units called tokens using a tokenizer; the tokens can then be converted into numerical data structures for machine learning. E.g., you can break a sentence into words using the word_tokenize() function.

Example

from nltk.tokenize import word_tokenize # Imports the word_tokenize function from NLTK’s tokenize module
print(word_tokenize("Please follow this link.")) # Tokenizes (splits) the sentence into individual words and punctuation, then prints the resulting list
```
from nltk.tokenize import word_tokenize
print(word_tokenize("Please follow this link."))
```
Output
```
['Please', 'follow', 'this', 'link', '.']
```
Finding Common Words

You can count how often each word appears in a sentence, and so find the most common words, using the FreqDist class.

Example

from nltk.probability import FreqDist # Imports FreqDist class to calculate word frequency distribution
from nltk.tokenize import word_tokenize # Imports the word_tokenize function to split text into tokens
tokens = word_tokenize("Please follow this link.") # Tokenizes the sentence into individual words and punctuation marks
FreqDist(tokens).tabulate() # Creates a frequency distribution of the tokens and displays the counts in a formatted table
```
from nltk.probability import FreqDist
from nltk.tokenize import word_tokenize
tokens = word_tokenize("Please follow this link.")
FreqDist(tokens).tabulate()
```
Output
```
 Please follow    this    link       . 
      1       1       1       1       1 
```
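FreqDist builds on Python's collections.Counter, so the counting idea can be sketched with the standard library alone. The sample text and the regex tokenizer below are assumptions made for illustration; the regex is only a rough stand-in for word_tokenize(), not NLTK's tokenizer.

```python
import re
from collections import Counter

# Hypothetical sample text (an assumption, not from the examples above)
text = "Please follow this link. This link is safe, please follow it."

# Rough stand-in for word_tokenize(): runs of word characters or single punctuation marks
tokens = re.findall(r"\w+|[^\w\s]", text.lower())

# Count every token, then list the three most frequent ones
counts = Counter(tokens)
for token, count in counts.most_common(3):
    print(token, count)
```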
Finding Sentence Parts

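If you want to label the nouns, pronouns, verbs, adjectives, adverbs, prepositions, conjunctions, interjections, etc. in a sentence, you can use the pos_tag() function. You can review all the tags using nltk.help.upenn_tagset(). Note that the tagger relies on neighboring words, so tagging tokens one at a time, as in the example below, discards that context (which is likely why follow is tagged NN here); passing the whole token list to pos_tag() generally gives better results.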

Example

from nltk import pos_tag # Imports the part-of-speech (POS) tagging function
from nltk.tokenize import word_tokenize # Imports the tokenizer to split text into words
tokens = word_tokenize("Please follow this link.") # Splits the sentence into individual tokens (words and punctuation)
for token in tokens: # Loops through each token
    print(pos_tag([token])) # Tags the token with its part of speech and prints it
```
from nltk import pos_tag
from nltk.tokenize import word_tokenize
tokens = word_tokenize("Please follow this link.")
for token in tokens:
    print(pos_tag([token]))
```
Output
```
[('Please', 'VB')]
[('follow', 'NN')]
[('this', 'DT')]
[('link', 'NN')]
[('.', '.')]
```
Normalizing Words

If you want to normalize a word, you can use stemming (e.g., the PorterStemmer class) or lemmatization (e.g., WordNetLemmatizer.lemmatize()). Stemming strips the suffix from a word using fixed rules, so the result is not always a dictionary word, whereas lemmatization returns the lemma of the word, its dictionary base form. By default, lemmatize() treats every word as a noun, so verbs such as testing are only reduced when you pass the matching part of speech. Search engines usually use these techniques to return results that include all relevant forms of a word. E.g., if you search for cars, you also get results for car. Bots use them to understand the overall meaning of a sentence.

Example

from nltk.stem import PorterStemmer # Imports the Porter Stemmer algorithm for word stemming
for item in ["test", "tests", "testing", "tested"]: # Loops through each word in the list
    print(item, ": ", PorterStemmer().stem(item)) # Applies stemming to each word and prints the original word along with its stemmed (root) form
```
from nltk.stem import PorterStemmer
for item in ["test","tests","testing","tested"]:
    print(item, ": ",PorterStemmer().stem(item))
```
Output
```
test :  test
tests :  test
testing :  test
tested :  test
```
Example

from nltk.stem import WordNetLemmatizer # Imports the WordNet lemmatizer (uses vocabulary + morphology rules)
for item in ["test", "tests", "testing", "tested"]: # Loops through each word in the list
    print(item, ": ", WordNetLemmatizer().lemmatize(item)) # Lemmatizes (reduces to dictionary base form) each word and prints the original word with its lemma
```
from nltk.stem import WordNetLemmatizer
for item in ["test","tests","testing","tested"]:
    print(item, ": ", WordNetLemmatizer().lemmatize(item))
```
Output
```
test :  test
tests :  test
testing :  testing
tested :  tested
```
Example

from nltk.stem import WordNetLemmatizer # Imports the WordNet lemmatizer
from nltk.corpus import wordnet # Imports WordNet corpus (provides POS constants)
from nltk import word_tokenize, pos_tag # Imports tokenizer and POS tagger
mapped = {
    "V": wordnet.VERB, # Maps POS tags starting with 'V' to VERB
    "J": wordnet.ADJ, # Maps POS tags starting with 'J' to ADJECTIVE
    "R": wordnet.ADV # Maps POS tags starting with 'R' to ADVERB
}
tokens = word_tokenize("caring") # Tokenizes the word
for token, tag in pos_tag(tokens): # Tags the token with its Penn Treebank POS tag (e.g., VBG, NN, JJ)
    tag = mapped.get(tag[0], wordnet.NOUN) # Looks at the first letter of the POS tag; if it exists in the mapped dictionary, uses the corresponding WordNet POS, otherwise defaults to NOUN
    print(token, WordNetLemmatizer().lemmatize(token, tag)) # Lemmatizes the token using the correct POS
```
from nltk.stem import WordNetLemmatizer
from nltk.corpus import wordnet
from nltk import word_tokenize, pos_tag

mapped = {
    "V": wordnet.VERB,
    "J": wordnet.ADJ,
    "R": wordnet.ADV
}

tokens = word_tokenize("caring")
for token, tag in pos_tag(tokens):
    tag  = mapped.get(tag[0], wordnet.NOUN)
    print(token, WordNetLemmatizer().lemmatize(token, tag))
```
Part-Of-Speech

POS stands for Part-Of-Speech, which is a grammatical category assigned to each word in a sentence. POS tagging tells you whether a word is a noun, verb, adjective, adverb, etc., based on its role in the sentence. The Penn Treebank tags returned by pos_tag() are listed below.
```
CC Coordinating conjunction
CD Cardinal number
DT Determiner
EX Existential there 
FW Foreign word
IN Preposition or subordinating conjunction
JJ Adjective
JJR Adjective, comparative
JJS Adjective, superlative
LS List item marker
MD Modal
NN Noun, singular or mass
NNS Noun, plural
NNP Proper noun, singular
NNPS Proper noun, plural
PDT Predeterminer
POS Possessive ending
PRP Personal pronoun
PRP$ Possessive pronoun
RB Adverb
RBR Adverb, comparative
RBS Adverb, superlative
RP Particle
SYM Symbol
TO to
UH Interjection
VB Verb, base form
VBD Verb, past tense
VBG Verb, gerund or present participle
VBN Verb, past participle
VBP Verb, non-3rd person singular present
VBZ Verb, 3rd person singular present
WDT Wh-determiner
WP Wh-pronoun
WP$ Possessive wh-pronoun
WRB Wh-adverb
```
Remove Stop Words

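If you want to remove stopwords (very common words, such as this or the, that carry little meaning on their own) from a sentence, you can compare each word of the sentence against NLTK's stopwords list and keep only the words that are not in it.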

Example

from nltk.tokenize import word_tokenize # Import the word tokenizer
from nltk.corpus import stopwords # Import stopwords list
tokens = word_tokenize("Please followw this link.") # Tokenize sentence into words
stop_words = set(stopwords.words('english')) # Get the set of English stopwords
filtered = [w for w in tokens if w.lower() not in stop_words] # Filter out tokens that are stopwords
print(filtered) # Print the filtered words
```
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords
tokens = word_tokenize("Please followw this link.")
stop_words = set(stopwords.words('english'))
filtered = [w for w in tokens if w.lower() not in stop_words]
print(filtered)
```
Output
```
['Please', 'followw', 'link', '.']
```
Example #1

You can clean text using regex and nltk

import re # Import regular expressions for pattern-based text cleaning
from nltk.corpus import stopwords # Import list of common English stopwords
def clean_text(text):
    stop_words = set(stopwords.words('english')) # Load the English stopwords once, instead of once per word
    text = text.lower() # Convert all letters to lowercase so that 'This' and 'this' are treated the same
    text = re.sub(r'\d+', ' ', text) # Remove all digits/numbers by replacing them with a space
    text = re.sub(r'[^\w\s]', ' ', text) # Remove punctuation by replacing anything that is NOT a word character or whitespace with a space
    text = " ".join(w for w in text.split() if w not in stop_words) # Remove stopwords (common words like 'the', 'is', 'this')
    return text # Return the cleaned text
print(clean_text("Please follow this link.")) # Expected output: "please follow link"
```
import re
from nltk.corpus import stopwords

def clean_text(text):
    stop_words = set(stopwords.words('english'))
    text = text.lower()
    text = re.sub(r'\d+', ' ', text)
    text = re.sub(r'[^\w\s]', ' ', text)
    text = " ".join(w for w in text.split() if w not in stop_words)
    return text

print(clean_text("Please follow this link."))
```
Output
```
please follow link
```
Example #2

If you want to check a phishing email for misspelled words, you can do that using the nltk module by comparing each word against NLTK's English word list.

import nltk # Import NLTK library
words = set(nltk.corpus.words.words()) # Load the set of valid English words from the NLTK corpus
sentence = "Please followw this link." # Example sentence to check
errors = [] # List to store words not found in the dictionary (possible typos)
for w in nltk.wordpunct_tokenize(sentence): # Tokenize the sentence into words and punctuation
    if w.lower() in words or not w.isalpha(): # Check if the word is in the dictionary or is non-alphabetic (punctuation, numbers)
        pass # Word is correct or ignored
    else:
        errors.append(w) # Word is likely a typo
print("Error(s): ", len(errors)) # Print the number of errors found
```
import nltk 
words = set(nltk.corpus.words.words())
sentence = "Please followw this link."
errors = []
for w in nltk.wordpunct_tokenize(sentence):
    if w.lower() in words or not w.isalpha():
        pass
    else:
        errors.append(w)
print("Error(s): ", len(errors))
```
Output
```
Error(s): 1
```
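Once a misspelled word is found, a possible next step is to guess which word was intended. The sketch below uses difflib from the standard library; the four-word vocabulary and the 0.8 cutoff are assumptions chosen for illustration, and a real check would use a full word list such as the NLTK corpus above.

```python
from difflib import get_close_matches

# Hypothetical mini-dictionary (assumption); swap in a full word list for real use
vocabulary = ["please", "follow", "this", "link"]

def suggest(word, vocab=vocabulary):
    """Return the closest known word, or None if nothing is similar enough."""
    matches = get_close_matches(word.lower(), vocab, n=1, cutoff=0.8)
    return matches[0] if matches else None

print(suggest("followw"))
```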
May 3, 2026
Web Scraping Prevention
Web Scraping Prevention Techniques

Many websites prohibit web scraping and use anti-scraping measures to block automated data extraction. These protections can make it challenging and time-consuming to scale scraping activities. For instance, if a script sends requests too frequently (like once every second), the website may block those requests or display a message asking the user to slow down or try again later.

Fingerprinting

Fingerprinting is a technique used to identify and track clients based on detailed technical information such as IP addresses, user-agent strings, browser versions, operating systems, screen resolutions, installed fonts, and even hardware characteristics. By combining these signals, websites can create a unique “fingerprint” for each visitor. If multiple requests appear to originate from the same fingerprint in an automated pattern, the system can flag or block them, even if the IP address changes.

Example

from http.server import BaseHTTPRequestHandler, HTTPServer # import base classes for HTTP server
from time import time # import time function for request timing
requests = {} # dictionary to store request history per fingerprint

class CustomHandler(BaseHTTPRequestHandler): # define request handler class
    def do_GET(self): # handle GET requests
        now = time() # current timestamp
        ip = self.client_address[0] # get client IP address
        user_agent = self.headers.get("User-Agent", "") # browser info
        accept_lang = self.headers.get("Accept-Language", "") # language preference
        encoding = self.headers.get("Accept-Encoding", "") # compression support
        fingerprint = f"{ip}|{user_agent}|{accept_lang}|{encoding}" # create a simple fingerprint using IP + headers
        requests[fingerprint] = [t for t in requests.get(fingerprint, []) if now - t < 10] # keep only requests from last 10 seconds for this fingerprint
        requests[fingerprint].append(now) # log current request time

        if len(requests[fingerprint]) > 5: # if too many requests in time window, block client
            self.send_response(429) # HTTP status: Too Many Requests
            self.send_header('Content-type', 'text/plain') # response type
            self.end_headers() # finish HTTP headers
            self.wfile.write(f"Fingerprint:{fingerprint} - Too many requests...".encode("utf-8")) # send blocked message with fingerprint info
        else:
            self.send_response(200) # HTTP OK
            self.send_header('Content-type', 'text/plain') # response type
            self.end_headers() # finish headers
            self.wfile.write(f"Fingerprint:{fingerprint} - Server Running...".encode("utf-8")) # send normal response with fingerprint info

        return # end request handling

HTTPServer(("", 8080), CustomHandler).serve_forever() # start server on port 8080 and run forever
```
from http.server import BaseHTTPRequestHandler, HTTPServer
from time import time
requests = {}

class CustomHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        now = time()
        ip = self.client_address[0]
        user_agent = self.headers.get("User-Agent", "")
        accept_lang = self.headers.get("Accept-Language", "")
        encoding = self.headers.get("Accept-Encoding", "")
        fingerprint = f"{ip}|{user_agent}|{accept_lang}|{encoding}"
        requests[fingerprint] = [t for t in requests.get(fingerprint, []) if now - t < 10]
        requests[fingerprint].append(now)

        if len(requests[fingerprint]) > 5:
            self.send_response(429)
            self.send_header('Content-type', 'text/plain')
            self.end_headers()
            self.wfile.write(f"Fingerprint:{fingerprint} - Too many requests...".encode("utf-8"))
        else:
            self.send_response(200)
            self.send_header('Content-type', 'text/plain')
            self.end_headers()
            self.wfile.write(f"Fingerprint:{fingerprint} - Server Running...".encode("utf-8"))

        return

HTTPServer(("", 8080), CustomHandler).serve_forever()
```
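The server above includes the IP address in the fingerprint, so a client that switches IPs gets a fresh one. A minimal sketch of a fingerprint that survives an IP change is to hash only the header signals. The function name header_fingerprint, the choice of headers, and the 16-character truncation are assumptions for illustration; real systems combine many more signals.

```python
import hashlib

def header_fingerprint(headers):
    """Hash a few request headers into a short, fixed-length fingerprint."""
    # Header choice is illustrative (assumption)
    signals = [
        headers.get("User-Agent", ""),
        headers.get("Accept-Language", ""),
        headers.get("Accept-Encoding", ""),
    ]
    # The IP address is left out on purpose so the value survives an IP change
    return hashlib.sha256("|".join(signals).encode("utf-8")).hexdigest()[:16]

# Two hypothetical requests with identical headers, e.g. sent through different proxies
first = header_fingerprint({"User-Agent": "ExampleBot/1.0", "Accept-Language": "en-US"})
second = header_fingerprint({"User-Agent": "ExampleBot/1.0", "Accept-Language": "en-US"})
print(first == second)
```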
Authentication

Authentication systems require users to verify their identity before accessing content. This is often achieved through login pages, API keys, or session tokens. By requiring users to authenticate, websites can better control who accesses their data and monitor usage per account. This also allows them to enforce limits on a per-user basis rather than per IP address, making scraping more challenging.

Example

from http.server import BaseHTTPRequestHandler, HTTPServer # import basic HTTP server classes
api_keys = {"Example-6C324086-6B3B-48D5-9FEE-4A30C66B70CC": {"ip": "", "user": ""}} # dictionary storing valid API keys and optional metadata

class CustomHandler(BaseHTTPRequestHandler): # define request handler class
    def do_GET(self): # handle GET requests
        api_key = self.headers.get("X-API-Key", "") # extract API key from request headers
        if api_key not in api_keys: # check if API key is invalid or missing
            self.send_response(401) # return HTTP 401 Unauthorized
            self.send_header('Content-type', 'text/plain') # set response content type
            self.end_headers() # finish HTTP headers
            self.wfile.write(b"Authentication required") # send authentication error message
        else: # if API key is valid
            self.send_response(200) # return HTTP 200 OK
            self.send_header('Content-type', 'text/plain') # set response content type
            self.end_headers() # finish HTTP headers
            self.wfile.write(b"Server Running...") # send success response message
        return # end request handling

HTTPServer(("", 8080), CustomHandler).serve_forever() # start server on port 8080 and run forever
```
from http.server import BaseHTTPRequestHandler, HTTPServer
api_keys = {"Example-6C324086-6B3B-48D5-9FEE-4A30C66B70CC": {"ip": "", "user": ""}}

class CustomHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        api_key = self.headers.get("X-API-Key", "")
        if api_key not in api_keys:
            self.send_response(401)
            self.send_header('Content-type', 'text/plain')
            self.end_headers()
            self.wfile.write(b"Authentication required")
        else:
            self.send_response(200)
            self.send_header('Content-type', 'text/plain')
            self.end_headers()
            self.wfile.write(b"Server Running...")
        return

HTTPServer(("", 8080), CustomHandler).serve_forever()
```
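Per-account limits, as opposed to per-IP limits, can be sketched as a sliding window keyed by API key. The names allow_request, WINDOW_SECONDS, and MAX_REQUESTS and their values are assumptions for illustration; the timestamp is passed in explicitly so the logic is easy to follow and test.

```python
from collections import defaultdict, deque

WINDOW_SECONDS = 10   # assumed window length
MAX_REQUESTS = 5      # assumed per-key quota inside one window

history = defaultdict(deque)  # api_key -> timestamps of recent requests

def allow_request(api_key, now):
    """Return True if this key is still under its quota at time `now`."""
    recent = history[api_key]
    # Drop timestamps that have fallen out of the sliding window
    while recent and now - recent[0] >= WINDOW_SECONDS:
        recent.popleft()
    if len(recent) >= MAX_REQUESTS:
        return False
    recent.append(now)
    return True

# Six requests from the same hypothetical key within one second
results = [allow_request("demo-key", now=100.0 + i * 0.1) for i in range(6)]
print(results)
```

Inside a handler like the one above, the call would use the validated X-API-Key value and time() as `now`, returning 429 when the function returns False.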
Challenges (CAPTCHA)

CAPTCHA tests are designed to differentiate humans from bots. They may involve identifying distorted text, selecting images, solving puzzles, or performing simple interactive tasks. Since most automated scripts struggle with these challenges, CAPTCHA serves as an effective barrier to prevent large-scale scraping or automated form submissions.

Example

from http.server import BaseHTTPRequestHandler, HTTPServer # HTTP server framework
from random import randint # generate random numbers for CAPTCHA
from uuid import uuid4 # generate unique session ID for each CAPTCHA
captcha_db = {} # store captcha_id -> correct answer mapping

class Handler(BaseHTTPRequestHandler): # request handler class
    def do_GET(self): # handle GET requests (show CAPTCHA page)
        random_a = randint(1, 10) # first random number
        random_b = randint(1, 10) # second random number
        captcha_id = str(uuid4()) # create unique ID for this CAPTCHA session
        captcha_db[captcha_id] = str(random_a + random_b) # store correct answer on server
        self.send_response(200) # HTTP 200 OK
        self.send_header("Content-type", "text/html") # response is HTML page
        self.end_headers() # finish headers
        # send HTML form to user
        self.wfile.write(f"""
        <html>
        <body>
            <h3>CAPTCHA: What is {random_a} + {random_b}?</h3>
            <form method="POST">
                <input name="answer" type="text">
                <input type="hidden" name="captcha_id" value="{captcha_id}">
                <input type="submit" value="Submit">
            </form>

        </body>
        </html>
        """.encode())

    def do_POST(self): # handle form submission
        length = int(self.headers.get('Content-Length', 0)) # get size of request body (0 if the header is missing)
        data = self.rfile.read(length).decode() # read and decode form data
        fields = dict(x.split("=", 1) for x in data.split("&") if "=" in x) # parse form fields, skipping malformed pairs
        user_answer = fields.get("answer", "") # user submitted answer
        captcha_id = fields.get("captcha_id", "") # session id from form
        correct_answer = captcha_db.pop(captcha_id, None) # get stored answer and remove it (single-use); None if the id is unknown
        self.send_response(200) # HTTP OK
        self.send_header("Content-type", "text/plain") # plain text response
        self.end_headers() # finish headers
        if correct_answer is not None and user_answer == correct_answer: # check if answer is correct; unknown ids never pass
            self.wfile.write(b"CAPTCHA passed") # success message
        else:
            self.wfile.write(b"CAPTCHA failed") # failure message

HTTPServer(("", 8080), Handler).serve_forever() # start server on port 8080
```
from http.server import BaseHTTPRequestHandler, HTTPServer
from random import randint
from uuid import uuid4
captcha_db = {}

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        random_a = randint(1, 10)
        random_b = randint(1, 10)
        captcha_id = str(uuid4())
        captcha_db[captcha_id] = str(random_a + random_b)
        self.send_response(200)
        self.send_header("Content-type", "text/html")
        self.end_headers()
       
        self.wfile.write(f"""
        <html>
        <body>
            <h3>CAPTCHA: What is {random_a} + {random_b}?</h3>
            <form method="POST">
                <input name="answer" type="text">
                <input type="hidden" name="captcha_id" value="{captcha_id}">
                <input type="submit" value="Submit">
            </form>

        </body>
        </html>
        """.encode())

    def do_POST(self):
        length = int(self.headers.get('Content-Length', 0))
        data = self.rfile.read(length).decode()
        fields = dict(x.split("=", 1) for x in data.split("&") if "=" in x)
        user_answer = fields.get("answer", "")
        captcha_id = fields.get("captcha_id", "")
        # pop() makes the CAPTCHA single-use and avoids a KeyError on unknown ids
        correct_answer = captcha_db.pop(captcha_id, None)
        self.send_response(200)
        self.send_header("Content-type", "text/plain")
        self.end_headers()
        # an unknown or reused id must never pass, even with an empty answer
        if correct_answer is not None and user_answer == correct_answer:
            self.wfile.write(b"CAPTCHA passed")
        else:
            self.wfile.write(b"CAPTCHA failed")

HTTPServer(("", 8080), Handler).serve_forever()
```
Dynamic Content

Dynamic content is generated at runtime rather than being fixed in the HTML source. This often involves JavaScript rendering, API calls, or asynchronous data loading. Since the content is not directly present in the initial page source, simple HTML-only scraping tools cannot easily extract the data without simulating a real browser environment.

from http.server import BaseHTTPRequestHandler, HTTPServer # HTTP server framework
from datetime import datetime # used to generate dynamic runtime timestamp

class CustomHandler(BaseHTTPRequestHandler): # request handler class
    def do_GET(self): # handle GET requests
        if self.path == "/": # main webpage route
            self.send_response(200) # HTTP 200 OK
            self.send_header('Content-type', 'text/html') # response is HTML page
            self.end_headers() # finish headers
            self.wfile.write(b"""
            <html>
            <body>
                <h1>Server Running...</h1>
                <div id="data">Loading...</div>
                <script>
                    setTimeout(() => { // wait 10 seconds before loading data
                        fetch("/data") // request dynamic backend endpoint
                        .then(r => r.text()) // convert response to text
                        .then(t => document.getElementById("data").innerText = t); // update page content
                    }, 10000); // 10000ms delay (10 seconds)
                </script>
            </body>
            </html>
            """)
            return # stop processing this request

        if self.path == "/data": # dynamic data endpoint
            self.send_response(200) # HTTP OK
            self.send_header('Content-type', 'text/plain') # plain text response
            self.end_headers() # finish headers
            self.wfile.write(f"Dynamic Content Loaded: {datetime.now().strftime('%m-%d-%Y %I:%M %p')}".encode()) # write the dynamic content (single quotes inside the f-string keep it valid before Python 3.12)
            return # end request

HTTPServer(("", 8080), CustomHandler).serve_forever() # start server on port 8080
```
from http.server import BaseHTTPRequestHandler, HTTPServer
from datetime import datetime

class CustomHandler(BaseHTTPRequestHandler):# request handler class
    def do_GET(self):
        if self.path == "/":
            self.send_response(200)
            self.send_header('Content-type', 'text/html')
            self.end_headers()
            self.wfile.write(b"""
            <html>
            <body>
                <h1>Server Running...</h1>
                <div id="data">Loading...</div>
                <script>
                    setTimeout(() => { // wait 10 seconds before loading data
                        fetch("/data") // request dynamic backend endpoint
                        .then(r => r.text()) // convert response to text
                        .then(t => document.getElementById("data").innerText = t); // update page content
                    }, 10000);// 10000ms delay (10 seconds)
                </script>
            </body>
            </html>
            """)
            return

        if self.path == "/data":
            self.send_response(200)
            self.send_header('Content-type', 'text/plain')
            self.end_headers()
            self.wfile.write(f"Dynamic Content Loaded: {datetime.now().strftime('%m-%d-%Y %I:%M %p')}".encode())
            return

HTTPServer(("", 8080), CustomHandler).serve_forever()
```
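To see why an HTML-only scraper misses this data, the sketch below parses a trimmed copy of the initial page (the script element is left out) with the standard library's HTMLParser. The class name DivText is an assumption for illustration. Since no JavaScript runs, the div still holds its placeholder text.

```python
from html.parser import HTMLParser

# Trimmed copy of the HTML served by the "/" route, before any JavaScript runs
initial_html = """
<html><body>
<h1>Server Running...</h1>
<div id="data">Loading...</div>
</body></html>
"""

class DivText(HTMLParser):
    """Collect the text inside <div id="data">."""
    def __init__(self):
        super().__init__()
        self.inside = False
        self.text = ""

    def handle_starttag(self, tag, attrs):
        if tag == "div" and dict(attrs).get("id") == "data":
            self.inside = True

    def handle_endtag(self, tag):
        if tag == "div":
            self.inside = False

    def handle_data(self, data):
        if self.inside:
            self.text += data

parser = DivText()
parser.feed(initial_html)
print(parser.text)  # the placeholder, not the dynamic content
```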
Randomized Identifiers

Websites often change element IDs, class names, or API endpoints dynamically. This prevents scrapers from relying on fixed selectors to locate data. For instance, a product price element might have a different ID each time the page loads. This forces scrapers to constantly adapt and makes automation less reliable.

from http.server import BaseHTTPRequestHandler, HTTPServer # import HTTP server classes
from random import randint # used to generate random IDs

class CustomHandler(BaseHTTPRequestHandler): # define request handler
    def do_GET(self): # handle GET requests
        self.send_response(200) # send HTTP 200 OK status
        self.send_header('Content-type', 'text/html') # response is HTML
        self.end_headers() # finish headers
        random_id = f"id_{randint(1000,9999)}" # generate random element ID each request
        # send HTML response to client
        self.wfile.write(f"""
        <html>
            <body>
                <div id="{random_id}">Gas Price is: $5.99 per gallon</div>
            </body>
        </html>
        """.encode())

HTTPServer(("", 8080), CustomHandler).serve_forever() # start server on port 8080
```
from http.server import BaseHTTPRequestHandler, HTTPServer
from random import randint 

class CustomHandler(BaseHTTPRequestHandler): 
    def do_GET(self):
        self.send_response(200)
        self.send_header('Content-type', 'text/html') 
        self.end_headers()
        random_id = f"id_{randint(1000,9999)}"
        self.wfile.write(f"""
        <html>
            <body>
                <div id="{random_id}">Gas Price is: $5.99 per gallon</div>
            </body>
        </html>
        """.encode()) 

HTTPServer(("", 8080), CustomHandler).serve_forever()
```
User Behavior Analysis

User behavior analysis focuses on how users interact with a website over time. Typical human behavior includes pauses, scrolling, clicks, and irregular timing, while bots tend to generate consistent, fast, and repetitive request patterns. Websites use machine learning or rule-based systems to detect anomalies, such as extremely fast navigation, identical click paths, or repetitive page access patterns, and then restrict or block suspicious activity.
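
A rule-based check of this kind can be sketched by measuring how regular the gaps between requests are. The function name looks_automated, both thresholds, and the two sample sessions are assumptions for illustration; production systems weigh many more signals than timing.

```python
from statistics import pstdev

def looks_automated(timestamps, min_requests=5, max_jitter=0.05):
    """Flag a session whose gaps between requests are almost identical."""
    if len(timestamps) < min_requests:
        return False  # too little data to judge
    gaps = [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]
    # Near-zero spread in the gaps suggests a script running on a fixed timer
    return pstdev(gaps) < max_jitter

bot_session = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]        # hypothetical: one request per second
human_session = [0.0, 2.3, 9.1, 10.4, 31.0, 33.7]   # hypothetical: irregular pauses
print(looks_automated(bot_session), looks_automated(human_session))
```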

Honeypots

Honeypots are hidden elements embedded in a webpage that are either invisible or irrelevant to normal users (such as hidden links or form fields). Bots that blindly follow all available elements may end up interacting with these traps. Once triggered, the system can flag the behavior as automated and take action such as blocking the IP address, logging the activity, or redirecting the user.
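
A minimal sketch of a form-field honeypot follows. The field name website, the form markup, and the function is_bot_submission are assumptions for illustration: the extra field is hidden with CSS, so a person leaves it empty, while a bot that fills every field gives itself away.

```python
from urllib.parse import parse_qs

# Hypothetical form: the "website" field is hidden, so real visitors never fill it in
FORM_HTML = """
<form method="POST">
    <input name="email" type="text">
    <input name="website" type="text" style="display:none" tabindex="-1" autocomplete="off">
    <input type="submit" value="Submit">
</form>
"""

def is_bot_submission(body):
    """Return True when the hidden honeypot field was filled in."""
    fields = parse_qs(body, keep_blank_values=True)
    return any(value.strip() for value in fields.get("website", []))

print(is_bot_submission("email=a%40example.com&website="))
print(is_bot_submission("email=a%40example.com&website=http%3A%2F%2Fspam.example"))
```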
April 28, 2026
Web Scraping
Data Scraping

Data scraping is the process of extracting information from a target source and saving it into a file for further use. This target could be a website, an application, or any digital platform containing structured or unstructured data. The main goal of data scraping is to collect large amounts of data efficiently without manual copying, making it easier for organizations or individuals to gather the information they need for analysis or reporting.

The process often involves using automated tools or scripts, such as web crawlers, bots, or specialized scraping frameworks. These tools navigate the target source, locate the desired data, and extract it in a structured format such as CSV, JSON, or Excel. Depending on the source, data scraping may require overcoming challenges such as dynamic content, login requirements, or anti-bot measures. It is a technical process that requires careful handling to ensure accuracy and efficiency.

While data scraping focuses on data collection, the extracted information is often analyzed in a subsequent process called data mining. For example, a web crawler may scrape product details, prices, and reviews from e-commerce websites, and the collected data can then be analyzed to identify trends, patterns, or insights. By separating extraction from analysis, organizations can efficiently manage raw data and transform it into actionable intelligence, making data scraping a crucial first step in many data-driven workflows.
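
The step of saving extracted data in a structured format can be sketched with the standard library. The records below are made-up values for illustration, and the output is written to in-memory buffers; replacing the buffer with open("data.csv", "w", newline="") would write a real file.

```python
import csv
import io
import json

# Hypothetical scraped records (made-up values, not real data)
records = [
    {"product": "Widget", "price": 9.99, "reviews": 12},
    {"product": "Gadget", "price": 24.50, "reviews": 3},
]

# CSV: a header row followed by one row per record
csv_buffer = io.StringIO()
writer = csv.DictWriter(csv_buffer, fieldnames=["product", "price", "reviews"])
writer.writeheader()
writer.writerows(records)
csv_text = csv_buffer.getvalue()

# JSON: the same records as a list of objects
json_text = json.dumps(records, indent=2)

print(csv_text)
print(json_text)
```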

Web Scraping

Web Scraping is the automated process of extracting data from websites by using software tools or scripts to collect information directly from web pages. Websites can contain either static content, which is fixed in the page’s HTML and generally easier to scrape, or dynamic content, which is generated using JavaScript and may require more advanced tools or browser automation to access. Web scraping is commonly used for data collection, research, price monitoring, market analysis, and cybersecurity investigations. However, it is important to follow ethical and legal guidelines when scraping data, including reviewing the website’s terms of service and robots.txt file to ensure that scraping is permitted, as unauthorized data extraction may violate policies or laws.
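
Checking robots.txt can be automated with urllib.robotparser from the standard library. The rules, the bot name ExampleBot, and the URLs below are assumptions for illustration; a real check would load the target site's own file, for example with set_url() and read().

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content (assumption)
robots_txt = """
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# "ExampleBot" and the URLs are made-up names for illustration
allowed_public = parser.can_fetch("ExampleBot", "https://example.com/public/page")
allowed_private = parser.can_fetch("ExampleBot", "https://example.com/private/page")
print(allowed_public, allowed_private)
```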

Manual Web Scraping

Manual web scraping, the process of extracting data from webpages without using any scraping tools or features, is convenient for very small amounts of content. Still, it becomes impractical if the data is large or needs to be scraped often. One of the great benefits of manual scraping is human review: every data point is checked by the person who scrapes it.

Manual Web Scraping (Example #1)

Getting all the URLs from this wiki page

Right-click the page and choose View Page Source

Search the page for the href attribute (this attribute defines the target of a hyperlink), click on Highlight All, and copy the matches one by one. This takes a very long time, so instead you can paste the page source into a text editor and search it with a regex such as href=["'](?<link>.*?)['"] or (?<=href=")[^"]*

Save them into a file
```
href="/w/load.php?lang=en&amp;modules=codex-search-styles%7Cext.cite.styles%7Cext.uls.interlanguage%7Cext.visualEditor.desktopArticleTarget.noscript%7Cext.wikimediaBadges%7Cjquery.makeCollapsible.styles%7Cskins.vector.icons%2Cstyles%7Cwikibase.client.init&amp;only=styles&amp;skin=vector-2022"
href="/w/load.php?lang=en&amp;modules=ext.gadget.SubtleUpdatemarker%2CWatchlistGreenIndicators&amp;only=styles&amp;skin=vector-2022"
href="/w/load.php?lang=en&amp;modules=site.styles&amp;only=styles&amp;skin=vector-2022"
href="//upload.wikimedia.org"
href="//en.m.wikipedia.org/wiki/Malware"
href="/w/index.php?title=Malware&amp;action=edit"
href="/static/apple-touch/wikipedia.png"
href="/static/favicon/wikipedia.ico"
href="/w/opensearch_desc.php"
href="//en.wikipedia.org/w/api.php?action=rsd"
href="https://en.wikipedia.org/wiki/Malware"
href="https://creativecommons.org/licenses/by-sa/4.0/deed.en"
href="/w/index.php?title=Special:RecentChanges&amp;feed=atom"
href="//meta.wikimedia.org"
href="//login.wikimedia.org"
...
...
...
```
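The same regex idea can be sketched in Python. The three-line page source below is a made-up stand-in for the copied HTML, and note that Python writes named groups as (?P<link>...) rather than (?<link>...).

```python
import re

# Small hypothetical page source standing in for the copied HTML
page_source = '''
<link rel="stylesheet" href="/static/site.css">
<a href="https://example.com/wiki/Malware">Malware</a>
<a href='/wiki/Virus'>Virus</a>
'''

# Match href values wrapped in either double or single quotes
pattern = re.compile(r"""href=["'](?P<link>.*?)["']""")
links = [match.group("link") for match in pattern.finditer(page_source)]

print(links)
```

Regexes work for quick extraction like this, but they are fragile on real-world HTML; an HTML parser is usually the safer choice.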
Automated Web Scraping

Automated web scraping uses tools that fetch the content and save it into files. Python is heavily used for web scraping, and modules such as beautifulsoup and pandas are used for both scraping and mining.

Automated Web Scraping (Example #1)

The beautifulsoup module is good for getting all the URLs from a webpage. This method of scraping is limited: it works great with static content, but you cannot get dynamic content or a screenshot of the website with it.

Install beautifulsoup4, requests, and lxml using the pip command

from bs4 import BeautifulSoup # Import BeautifulSoup for HTML parsing
from requests import get # Import get() to send HTTP requests
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.75 Safari/537.36"} # Mimic a real browser
response = get("https://en.wikipedia.org/wiki/Main_Page", headers=headers) # Send GET request with the defined headers
print(response.status_code) # Print HTTP status code (200 = OK)
soup = BeautifulSoup(response.text, 'html.parser') # Parse HTML content
for item in soup.find_all(href=True): # Loop through all tags containing an href attribute
    print(item['href']) # Print the link URL
```
from bs4 import BeautifulSoup
from requests import get
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.75 Safari/537.36"}
response = get("https://en.wikipedia.org/wiki/Main_Page", headers=headers)
print(response.status_code)
soup = BeautifulSoup(response.text, 'html.parser')
for item in soup.find_all(href=True):
    print(item['href'])
```
Output
```
href="/w/load.php?lang=en&amp;modules=codex-search-styles%7Cext.cite.styles%7Cext.uls.interlanguage%7Cext.visualEditor.desktopArticleTarget.noscript%7Cext.wikimediaBadges%7Cjquery.makeCollapsible.styles%7Cskins.vector.icons%2Cstyles%7Cwikibase.client.init&amp;only=styles&amp;skin=vector-2022"
href="/w/load.php?lang=en&amp;modules=ext.gadget.SubtleUpdatemarker%2CWatchlistGreenIndicators&amp;only=styles&amp;skin=vector-2022"
href="/w/load.php?lang=en&amp;modules=site.styles&amp;only=styles&amp;skin=vector-2022"
href="//upload.wikimedia.org"
href="//en.m.wikipedia.org/wiki/Malware"
href="/w/index.php?title=Malware&amp;action=edit"
href="/static/apple-touch/wikipedia.png"
href="/static/favicon/wikipedia.ico"
href="/w/opensearch_desc.php"
href="//en.wikipedia.org/w/api.php?action=rsd"
href="https://en.wikipedia.org/wiki/Malware"
href="https://creativecommons.org/licenses/by-sa/4.0/deed.en"
href="/w/index.php?title=Special:RecentChanges&amp;feed=atom"
href="//meta.wikimedia.org"
href="//login.wikimedia.org"
...
...
...
```
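Many of the links collected above are relative (`/static/...`) or protocol-relative (`//upload.wikimedia.org`), so they cannot be requested as-is. The following is a minimal sketch of turning them into absolute URLs with `urljoin` from the standard library. The `base_url` value, the sample `hrefs` list, and the `to_absolute` helper are illustrative assumptions, not part of the original example.

```python
from urllib.parse import urljoin

# Page the links were scraped from (assumed base for resolving relative links)
base_url = "https://en.wikipedia.org/wiki/Main_Page"

# Sample href values in the three forms seen in the output above
hrefs = [
    "/static/favicon/wikipedia.ico",           # root-relative path
    "//upload.wikimedia.org",                  # protocol-relative URL
    "https://en.wikipedia.org/wiki/Malware",   # already absolute
]

def to_absolute(base, links):
    """Resolve each link against the base URL and return absolute URLs."""
    return [urljoin(base, link) for link in links]

absolute_links = to_absolute(base_url, hrefs)
for link in absolute_links:
    print(link)
```

In the scraping loop, the same call could be applied to each `item['href']` before the link is stored or requested.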
Automated Web Scraping (Example #2)

The pandas module is good for getting all the tables within a page. As in the previous example, this method of scraping is limited: it works well with static content, but it cannot retrieve dynamic content or take a screenshot of the website.

Install pandas and lxml using the pip command

# bash /Applications/Python*/Install\ Certificates.command # macOS command to install SSL certificates if needed
import pandas as pd # Import pandas for data handling and HTML table parsing
import ssl # Import SSL module to handle HTTPS settings
ssl._create_default_https_context = ssl._create_unverified_context # Disable SSL certificate verification (useful when encountering certificate errors)
tables = pd.read_html("https://goblackbears.com/sports/baseball/stats") # Read all HTML tables from the given URL into a list of DataFrames
for i, table in enumerate(tables): # Loop through each table with its index
    print("Table %s\n" % i, table.head()) # Print the table index and the first 5 rows
```
import pandas as pd
tables = pd.read_html("https://goblackbears.com/sports/baseball/stats")
for i, table in enumerate(tables):
    print("Table %s\n" % i,table.head())
```
Output
```
Table 0
     0                                                  1
0 NaN  This article has multiple issues. Please help ...
1 NaN  This article needs to be updated. Please help ...
2 NaN  This article needs additional citations for ve...
Table 1
     0                                                  1
0 NaN  This article needs to be updated. Please help ...
Table 2
     0                                                  1
0 NaN  This article needs additional citations for ve...
Table 3
      Virus  ...                                              Notes
0     1260  ...   First virus family to use polymorphic encryption
1       4K  ...  The first known MS-DOS-file-infector to use st...
2      5lo  ...                            Infects .EXE files only
3  Abraxas  ...  Infects COM file. Disk directory listing will ...
4     Acid  ...  Infects COM file. Disk directory listing will ...

[5 rows x 9 columns]
Table 4
      vteMalware topics                                vteMalware topics.1
0   Infectious malware  Comparison of computer viruses Computer virus ...
1          Concealment  Backdoor Clickjacking Man-in-the-browser Man-i...
2   Malware for profit  Adware Botnet Crimeware Fleeceware Form grabbi...
3  By operating system  Android malware Classic Mac OS viruses iOS mal...
4           Protection  Anti-keylogger Antivirus software Browser secu...
```
Automated Web Scraping (Example #3)

One of the best web scraping techniques is using a headless browser, which is a browser that runs without a graphical user interface (GUI). Headless browsers were originally used for automated quality assurance tests but are now also used for scraping. The two main benefits of using a headless browser are rendering dynamic content and behaving like a human browsing a website.

The following scripts will not run on Google Colab

Scrape using Firefox (with geckodriver setup)
1. Install the latest Firefox version
2. Install selenium using the pip command
3. Download the geckodriver from here (The Firefox application version has to match the webdriver version)
4. Extract the geckodriver and note the location (E.g., /scrape/geckodriver)
from selenium import webdriver # Import Selenium WebDriver
options = webdriver.firefox.options.Options() # Create Firefox options object
options.add_argument("--headless") # Run Firefox in headless mode (no GUI)
service = webdriver.firefox.service.Service(r'path to the geckodriver') # Specify the local path to the geckodriver executable
browser = webdriver.Firefox(options=options, service=service) # Launch Firefox with the specified options
browser.get('https://www.google.com') # Open the Google homepage
# print(browser.find_element(By.XPATH, "/html/body").text) # (Optional) Print the full page text
browser.save_screenshot("screenshot_using_firefox.png") # Save a screenshot of the loaded page
browser.close() # Close the browser window
browser.quit() # Quit the browser and end the WebDriver session
```
from selenium import webdriver
options = webdriver.firefox.options.Options()
options.add_argument("--headless")
service = webdriver.firefox.service.Service(r'path to the geckodriver')
browser = webdriver.Firefox(options=options, service=service)
browser.get('https://www.google.com')
#print(browser.find_element(By.XPATH, "/html/body").text)
browser.save_screenshot("screenshot_using_firefox.png")
browser.close()
browser.quit()
```
Scrape using Firefox (without geckodriver setup)
1. Install the latest Firefox version
2. Install selenium and webdriver-manager using the pip command
from selenium import webdriver # Import Selenium WebDriver
from webdriver_manager.firefox import GeckoDriverManager # Automatically download/manage GeckoDriver
options = webdriver.firefox.options.Options() # Create Firefox options object
options.add_argument("--headless") # Run Firefox in headless (no GUI) mode
service = webdriver.firefox.service.Service(GeckoDriverManager().install()) # Set up the GeckoDriver service
browser = webdriver.Firefox(options=options, service=service) # Launch Firefox with the specified options
browser.get('https://www.google.com') # Open the Google homepage
# print(browser.find_element(By.XPATH, "/html/body").text) # (Optional) Print the full page text
browser.save_screenshot("screenshot_using_firefox.png") # Capture a screenshot of the page
browser.close() # Close the browser window
browser.quit() # Quit the browser and end the WebDriver session
```
from selenium import webdriver
from webdriver_manager.firefox import GeckoDriverManager
options = webdriver.firefox.options.Options()
options.add_argument("--headless")
service = webdriver.firefox.service.Service(GeckoDriverManager().install())
browser = webdriver.Firefox(options=options, service=service)
browser.get('https://www.google.com')
#print(browser.find_element(By.XPATH, "/html/body").text)
browser.save_screenshot("screenshot_using_firefox.png")
browser.close()
browser.quit()
```
Scrape using Chrome (with chromedriver setup)
1. Install the latest Chrome version
2. Install selenium using the pip command
3. Download the ChromeDriver from here (The chrome web browser version has to match the webdriver version)
4. Extract the ChromeDriver and note the location (E.g., /scrape/chromedriver)
from selenium import webdriver # Import Selenium WebDriver
options = webdriver.chrome.options.Options() # Create Chrome options object
options.add_argument('--headless') # Run Chrome in headless (no GUI) mode
options.add_argument('--no-sandbox') # Disable the sandbox (required in containers/VMs)
options.add_argument('--disable-dev-shm-usage') # Prevent shared memory issues
service = webdriver.chrome.service.Service(r'path to the chromedriver') # Specify the local path to chromedriver
browser = webdriver.Chrome(options=options, service=service) # Launch Chrome with the specified options
browser.get('https://www.google.com') # Open the Google homepage
browser.save_screenshot("screenshot_using_chrome.png") # Take a screenshot of the loaded page
browser.close() # Close the browser window
browser.quit() # Quit the browser and end the WebDriver session
```
from selenium import webdriver
options = webdriver.chrome.options.Options()
options.add_argument('--headless')
options.add_argument('--no-sandbox')
options.add_argument('--disable-dev-shm-usage')
service = webdriver.chrome.service.Service(r'path to the chromedriver')
browser = webdriver.Chrome(options=options, service=service)
browser.get('https://www.google.com')
#print(browser.find_element(By.XPATH, "/html/body").text)
browser.save_screenshot("screenshot_using_chrome.png")
browser.close()
browser.quit()
```
Scrape using Chrome (without chromedriver setup)
1. Install the latest Chrome version
2. Install selenium and webdriver-manager using the pip command
from selenium import webdriver # Import Selenium WebDriver
from webdriver_manager.chrome import ChromeDriverManager # Automatically download/manage ChromeDriver
options = webdriver.chrome.options.Options() # Create Chrome options object
options.add_argument('--headless') # Run Chrome in headless (no GUI) mode
options.add_argument('--no-sandbox') # Disable the sandbox (required in some environments)
options.add_argument('--disable-dev-shm-usage') # Avoid shared memory issues in containers
service = webdriver.chrome.service.Service(ChromeDriverManager().install()) # Set up the ChromeDriver service
browser = webdriver.Chrome(options=options, service=service) # Launch Chrome with the specified options
browser.get('https://www.google.com') # Open the Google homepage
browser.save_screenshot("screenshot_using_chrome.png") # Capture a screenshot of the page
browser.close() # Close the browser
browser.quit() # Quit the browser and end the WebDriver session
```
from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager
options = webdriver.chrome.options.Options()
options.add_argument('--headless')
options.add_argument('--no-sandbox')
options.add_argument('--disable-dev-shm-usage')
service = webdriver.chrome.service.Service(ChromeDriverManager().install())
browser = webdriver.Chrome(options=options, service=service)
browser.get('https://www.google.com')
#print(browser.find_element(By.XPATH, "/html/body").text)
browser.save_screenshot("screenshot_using_chrome.png")
browser.close()
browser.quit()
```
Automated Web Scraping (Example #4 – Best Option)

You can run this one in Google Colab

Install the latest Chrome version

!apt update # Update the package list from repositories
!apt install libu2f-udev libvulkan1 # Install dependencies required by Google Chrome
!wget https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb # Download the Google Chrome .deb package
!dpkg -i google-chrome-stable_current_amd64.deb # Install the Chrome package manually
!apt --fix-broken install # Fix missing dependencies caused by the dpkg install
!pip install selenium webdriver-manager # Install Selenium and Chrome driver manager via pip
```
!apt update
!apt install libu2f-udev libvulkan1
!wget https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb
!dpkg -i google-chrome-stable_current_amd64.deb
!apt --fix-broken install 
!pip install selenium webdriver-manager
```
Scrape the website

from selenium import webdriver # Import Selenium WebDriver
from webdriver_manager.chrome import ChromeDriverManager # Automatically manage ChromeDriver
from selenium.webdriver.common.by import By # Import locator strategies (e.g., XPATH)
options = webdriver.chrome.options.Options() # Create Chrome options object
options.add_argument('--headless') # Run Chrome without a visible window
options.add_argument('--no-sandbox') # Disable the sandbox (needed in containers/Colab)
options.add_argument('--disable-dev-shm-usage') # Prevent shared memory issues
service = webdriver.chrome.service.Service(ChromeDriverManager().install()) # Install and configure the ChromeDriver service
browser = webdriver.Chrome(options=options, service=service) # Launch Chrome with the defined options
browser.get('https://www.google.com') # Open the Google homepage
# print(browser.find_element(By.XPATH, "/html/body").text) # (Optional) Print the page text using XPath
browser.save_screenshot("screenshot_using_chrome.png") # Save a screenshot of the loaded page
browser.close() # Close the browser window
browser.quit() # Quit the browser and end the WebDriver session
```
from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.common.by import By 
options = webdriver.chrome.options.Options()
options.add_argument('--headless')
options.add_argument('--no-sandbox')
options.add_argument('--disable-dev-shm-usage')
service = webdriver.chrome.service.Service(ChromeDriverManager().install())
browser = webdriver.Chrome(options=options, service=service)
browser.get('https://www.google.com')
#print(browser.find_element(By.XPATH, "/html/body").text)
browser.save_screenshot("screenshot_using_chrome.png")
browser.close()
browser.quit()
```
If you want to wait until a website loads, you can use the sleep function

from selenium import webdriver # Import Selenium WebDriver
from webdriver_manager.chrome import ChromeDriverManager # Automatically manage ChromeDriver
from selenium.webdriver.common.by import By # Import locator strategies (e.g., XPATH)
from time import sleep # Import sleep function
options = webdriver.chrome.options.Options() # Create Chrome options object
options.add_argument('--headless') # Run Chrome without a visible window
options.add_argument('--no-sandbox') # Disable the sandbox (needed in containers/Colab)
options.add_argument('--disable-dev-shm-usage') # Prevent shared memory issues
service = webdriver.chrome.service.Service(ChromeDriverManager().install()) # Install and configure the ChromeDriver service
browser = webdriver.Chrome(options=options, service=service) # Launch Chrome with the defined options
browser.get('https://us.shop.battle.net/en-us') # Open the Battle.net shop homepage
sleep(10) # Wait 10 seconds
# print(browser.find_element(By.XPATH, "/html/body").text) # (Optional) Print the page text using XPath
browser.save_screenshot("screenshot_using_chrome.png") # Save a screenshot of the loaded page
browser.close() # Close the browser window
browser.quit() # Quit the browser and end the WebDriver session
```
from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.common.by import By 
from time import sleep
options = webdriver.chrome.options.Options()
options.add_argument('--headless')
options.add_argument('--no-sandbox')
options.add_argument('--disable-dev-shm-usage')
service = webdriver.chrome.service.Service(ChromeDriverManager().install())
browser = webdriver.Chrome(options=options, service=service)
browser.get('https://us.shop.battle.net/en-us')
sleep(10)
#print(browser.find_element(By.XPATH, "/html/body").text)
browser.save_screenshot("screenshot_using_chrome.png")
browser.close()
browser.quit()
```
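A fixed sleep either wastes time or is too short on a slow connection. A common alternative is to poll for a condition until it holds or a timeout expires. The following is a minimal standard-library sketch of that idea: `wait_until` and `page_is_ready` are hypothetical names, and `page_is_ready` only simulates a page check so the example runs without a browser. Selenium also ships its own explicit-wait helper (`WebDriverWait`), which may be a better fit in real scripts.

```python
from time import monotonic, sleep

def wait_until(condition, timeout=10.0, interval=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once the timeout expires.
    """
    deadline = monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if monotonic() >= deadline:
            raise TimeoutError("condition not met within %s seconds" % timeout)
        sleep(interval)

# Stand-in for a real page check (e.g. "is the element present yet?")
attempts = []

def page_is_ready():
    attempts.append(1)
    return len(attempts) >= 3  # Becomes true on the third check

ready = wait_until(page_is_ready, timeout=5, interval=0.01)
print(ready, len(attempts))
```

With Selenium, the condition could be a small function that calls `browser.find_elements(...)` and returns the (possibly empty) list, so the wait ends as soon as the element appears.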
April 5, 2026
TinyDB
TinyDB

TinyDB is a document-oriented database written in pure Python. You will need to download and install it using the pip command

Install

pip # Python's package manager
install # A command to download and install libraries from PyPI (Python Package Index)
tinydb # A lightweight Python NoSQL database library
```
pip install tinydb
```
Create a Database

The TinyDB() function is used to connect to the local database or create a new one if the file does not exist

from tinydb import TinyDB # Import the TinyDB class from the tinydb module
db = TinyDB('database.json') # Create (or open) a TinyDB database stored in a JSON file named 'database.json'; if the file doesn't exist, TinyDB creates it automatically
```
from tinydb import TinyDB
db = TinyDB('database.json')
```
List All Tables

You can list all tables using the .tables() method. A table needs to contain data, otherwise it will not be shown

from tinydb import TinyDB # Import the TinyDB class from the tinydb module
db = TinyDB('database.json') # Create (or open) a TinyDB database stored in a JSON file named 'database.json'; if the file doesn't exist, TinyDB creates it automatically
db.tables() # List all tables in the TinyDB database
```
from tinydb import TinyDB
db = TinyDB('database.json')
db.tables()
```
Output
```
{'_default'}
```
Create a Table

TinyDB supports tables (you do not need to use them). To create a table, use the .table() method

from tinydb import TinyDB # Import the TinyDB class from the tinydb module
db = TinyDB('database.json') # Create (or open) a TinyDB database stored in a JSON file named 'database.json'; if the file doesn't exist, TinyDB creates it automatically
table = db.table('users') # Access (or create if it doesn't exist) a table named 'users' in the TinyDB database
```
from tinydb import TinyDB
db = TinyDB('database.json')
table = db.table('users')
```
Delete Table

You can delete a table and all the data within it using the .drop_table() method

from tinydb import TinyDB # Import the TinyDB class from the tinydb module
db = TinyDB('database.json') # Create (or open) a TinyDB database stored in a JSON file named 'database.json'; if the file doesn't exist, TinyDB creates it automatically
db.drop_table('users') # Delete the entire table named 'users' from the TinyDB database
print(db.tables()) # Show all tables
```
from tinydb import TinyDB
db = TinyDB('database.json')
db.drop_table('users')
print(db.tables())
```
Output
```
{'_default'}
```
Insert Data

To add new data, use the .insert() method

from tinydb import TinyDB # Import the TinyDB class from the tinydb module
db = TinyDB('database.json') # Create (or open) a TinyDB database stored in a JSON file named 'database.json'; if the file doesn't exist, TinyDB creates it automatically
db.drop_table('users') # Delete the entire table named 'users' from the TinyDB database
table = db.table('users') # Access (or create if it doesn't exist) a table named 'users' in the TinyDB database
table.insert({"id": 1, "user": "john", "hash": "e66860546f18"}) # Insert a new record (dictionary) into the 'users' table
table.insert({"id": 2, "user": "jane", "hash": "cdbbcd86b35e", "car": "ford"}) # Insert a new record (dictionary) into the 'users' table
```
from tinydb import TinyDB
db = TinyDB('database.json')
db.drop_table('users')
table = db.table('users')
table.insert({"id": 1,"user": "john","hash": "e66860546f18"})
table.insert({"id": 2,"user": "jane","hash": "cdbbcd86b35e", "car":"ford"})
```
Fetching Results

To fetch items from the database, use the .all() method

from tinydb import TinyDB # Import the TinyDB class from the tinydb module
db = TinyDB('database.json') # Create (or open) a TinyDB database stored in a JSON file named 'database.json'; if the file doesn't exist, TinyDB creates it automatically
db.drop_table('users') # Delete the entire table named 'users' from the TinyDB database
table = db.table('users') # Access (or create if it doesn't exist) a table named 'users' in the TinyDB database
table.insert({"id": 1, "user": "john", "hash": "e66860546f18"}) # Insert a new record (dictionary) into the 'users' table
table.insert({"id": 2, "user": "jane", "hash": "cdbbcd86b35e", "car": "ford"}) # Insert a new record (dictionary) into the 'users' table
print(table.all()) # Retrieve and print all records from the 'users' table
```
from tinydb import TinyDB
db = TinyDB('database.json')
db.drop_table('users')
table = db.table('users')
table.insert({"id": 1,"user": "john","hash": "e66860546f18"})
table.insert({"id": 2,"user": "jane","hash": "cdbbcd86b35e", "car":"ford"})
print(table.all())
```
Output
```
[{'id': 1, 'user': 'john', 'hash': 'e66860546f18'}, {'id': 2, 'user': 'jane', 'hash': 'cdbbcd86b35e', 'car': 'ford'}]
```
Find Data

You can fetch specific records using the .search() method

from tinydb import TinyDB, where # Import the TinyDB class and the where() query helper from the tinydb module
db = TinyDB('database.json') # Create (or open) a TinyDB database stored in a JSON file named 'database.json'; if the file doesn't exist, TinyDB creates it automatically
db.drop_table('users') # Delete the entire table named 'users' from the TinyDB database
table = db.table('users') # Access (or create if it doesn't exist) a table named 'users' in the TinyDB database
table.insert({"id": 1, "user": "john", "hash": "e66860546f18"}) # Insert a new record (dictionary) into the 'users' table
table.insert({"id": 2, "user": "jane", "hash": "cdbbcd86b35e", "car": "ford"}) # Insert a new record (dictionary) into the 'users' table
results = table.search(where('user') == 'jane') # Search the 'users' table for all records where the 'user' field equals 'jane'
print(results) # Print the list of matching records
```
from tinydb import TinyDB, where
db = TinyDB('database.json')
db.drop_table('users')
table = db.table('users')
table.insert({"id": 1,"user": "john","hash": "e66860546f18"})
table.insert({"id": 2,"user": "jane","hash": "cdbbcd86b35e", "car":"ford"})
results = table.search(where('user') == 'jane')
print(results)
```
Output
```
[{'id': 2, 'user': 'jane', 'hash': 'cdbbcd86b35e', 'car': 'ford'}]
```
Update Data

You can update data by using the .update() method

from tinydb import TinyDB, where # Import the TinyDB class and the where() query helper from the tinydb module
db = TinyDB('database.json') # Create (or open) a TinyDB database stored in a JSON file named 'database.json'; if the file doesn't exist, TinyDB creates it automatically
db.drop_table('users') # Delete the entire table named 'users' from the TinyDB database
table = db.table('users') # Access (or create if it doesn't exist) a table named 'users' in the TinyDB database
table.insert({"id": 1, "user": "john", "hash": "e66860546f18"}) # Insert a new record (dictionary) into the 'users' table
table.insert({"id": 2, "user": "jane", "hash": "cdbbcd86b35e", "car": "ford"}) # Insert a new record (dictionary) into the 'users' table
table.update({'car': 'jeep'}, where('user') == 'jane') # Update all records in the 'users' table where 'user' is 'jane', setting the 'car' field to 'jeep'
print(table.all()) # Retrieve and print all records from the 'users' table
```
from tinydb import TinyDB, where
db = TinyDB('database.json')
db.drop_table('users')
table = db.table('users')
table.insert({"id": 1,"user": "john","hash": "e66860546f18"})
table.insert({"id": 2,"user": "jane","hash": "cdbbcd86b35e", "car":"ford"})
table.update({'car': 'jeep'}, where('user') == 'jane')
print(table.all())
```
Output
```
[{'id': 1, 'user': 'john', 'hash': 'e66860546f18'}, {'id': 2, 'user': 'jane', 'hash': 'cdbbcd86b35e', 'car': 'jeep'}]
```
Delete Specific Data

You can delete data by using the .remove() method

from tinydb import TinyDB, where # Import the TinyDB class and the where() query helper from the tinydb module
db = TinyDB('database.json') # Create (or open) a TinyDB database stored in a JSON file named 'database.json'; if the file doesn't exist, TinyDB creates it automatically
db.drop_table('users') # Delete the entire table named 'users' from the TinyDB database
table = db.table('users') # Access (or create if it doesn't exist) a table named 'users' in the TinyDB database
table.insert({"id": 1, "user": "john", "hash": "e66860546f18"}) # Insert a new record (dictionary) into the 'users' table
table.insert({"id": 2, "user": "jane", "hash": "cdbbcd86b35e", "car": "ford"}) # Insert a new record (dictionary) into the 'users' table
table.remove(where('user') == 'jane') # Remove all records in the 'users' table where 'user' is 'jane'
print(table.all()) # Retrieve and print all records from the 'users' table
```
from tinydb import TinyDB, where
db = TinyDB('database.json')
db.drop_table('users')
table = db.table('users')
table.insert({"id": 1,"user": "john","hash": "e66860546f18"})
table.insert({"id": 2,"user": "jane","hash": "cdbbcd86b35e", "car":"ford"})
table.remove(where('user') == 'jane')
print(table.all())
```
Output
```
[{'id': 1, 'user': 'john', 'hash': 'e66860546f18'}]
```
Delete All Data

You can delete all the data within a table using the .drop_table() method (the .drop_tables() method removes every table in the database)

from tinydb import TinyDB # Import the TinyDB class from the tinydb module
db = TinyDB('database.json') # Create (or open) a TinyDB database stored in a JSON file named 'database.json'; if the file doesn't exist, TinyDB creates it automatically
db.drop_table('users') # Delete the entire table named 'users' from the TinyDB database
print(db.tables()) # Retrieve and print all tables
```
from tinydb import TinyDB
db = TinyDB('database.json')
db.drop_table('users')
print(db.tables())
```
Output
```
{'_default'}
```
User Input (NoSQL Injection)

A threat actor can construct a malicious query and use it to perform an unauthorized action

from tinydb import TinyDB, Query # Import the TinyDB and Query classes from the tinydb module
temp_user = input("Enter username: ") # Prompt the user to enter a username
temp_hash = input("Enter password: ") # Prompt the user to enter a password (normally a function would hash the password; it is omitted here)
db = TinyDB('database.json') # Create (or open) a TinyDB database stored in a JSON file named 'database.json'; if the file doesn't exist, TinyDB creates it automatically
db.drop_table('users') # Delete the entire table named 'users' from the TinyDB database
table = db.table('users') # Access (or create if it doesn't exist) a table named 'users' in the TinyDB database
table.insert({"id": 1, "user": "john", "hash": "e66860546f18"}) # Insert a new record (dictionary) into the 'users' table
table.insert({"id": 2, "user": "jane", "hash": "cdbbcd86b35e", "car": "ford"}) # Insert a new record (dictionary) into the 'users' table
if len(temp_hash) == 12: # Check if the hash value length is 12
    results = table.search(Query().user.search(temp_user) & Query().hash.search(temp_hash)) # Search the table for records where the 'user' field matches temp_user and the 'hash' field matches temp_hash using regex search
    print(results) # Print all results
```
from tinydb import TinyDB, Query
temp_user = input("Enter username: ")
temp_hash = input("Enter password: ")
db = TinyDB('database.json')
db.drop_table('users')
table = db.table('users')
table.insert({"id": 1,"user": "john","hash": "e66860546f18"})
table.insert({"id": 2,"user": "jane","hash": "cdbbcd86b35e", "car":"ford"})
if len(temp_hash) == 12:
    results = table.search(Query().user.search(temp_user) & Query().hash.search(temp_hash))
    print(results)
```
Malicious statement

If a user enters [a-zA-Z0-9]+ for both the username and the password, the input passes the length check because the pattern is exactly 12 characters long, and the regex then matches both john and jane. When TinyDB evaluates Query().user.search(temp_user), it does not search for the literal text [a-zA-Z0-9]+; it treats the input as a regex pattern, which matches any value made up of letters and numbers.
```
[a-zA-Z0-9]+ matches john -> True, retrieve this user
[a-zA-Z0-9]+ matches jane -> True, retrieve this user
```
Output
```
[{'id': 1, 'user': 'john', 'hash': 'e66860546f18'}, {'id': 2, 'user': 'jane', 'hash': 'cdbbcd86b35e', 'car': 'ford'}]
```
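One way to avoid this is to compare user input literally instead of passing it to a regex search. The following is a minimal sketch that simulates both behaviors with the standard `re` module and an in-memory list, so it runs without TinyDB. The `users` list and the `find_unsafe` and `find_safe` helpers are hypothetical stand-ins for the table and queries above; in TinyDB itself, the literal comparison would use `==` as in the Find Data example.

```python
import re

# In-memory stand-in for the 'users' table (same records as above)
users = [
    {"id": 1, "user": "john", "hash": "e66860546f18"},
    {"id": 2, "user": "jane", "hash": "cdbbcd86b35e", "car": "ford"},
]

def find_unsafe(user_input, hash_input):
    """Treat both inputs as regex patterns, as a regex-based search does."""
    return [u for u in users
            if re.search(user_input, u["user"]) and re.search(hash_input, u["hash"])]

def find_safe(user_input, hash_input):
    """Compare both inputs literally, so regex metacharacters have no effect."""
    return [u for u in users
            if u["user"] == user_input and u["hash"] == hash_input]

payload = "[a-zA-Z0-9]+"  # 12 characters, so it passes the length check
print(len(payload))
print(find_unsafe(payload, payload))       # Matches every record
print(find_safe(payload, payload))         # Matches nothing
print(find_safe("jane", "cdbbcd86b35e"))   # Matches only jane's record
# If regex search must be kept, re.escape() turns the input into a literal pattern
print(find_unsafe(re.escape(payload), re.escape(payload)))
```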
April 5, 2026
SQLite
SQLite3

SQLite is a lightweight disk-based database library written in C. You can use the sqlite3 binary directly from the command-line interface after installing it, or use Python's built-in sqlite3 module.

Command-Line Interface
```
sqlite>
```
Python
```
import sqlite3
```
Create a Database

The .connect() method is used to connect to the local database or create a new one if the file does not exist (the command-line equivalent is .open)

Command-Line Interface

sqlite> .open database.db # Open (or create if it doesn’t exist) a SQLite database file named ‘database.db’
sqlite> .quit # Exit the SQLite command-line interface
```
sqlite> .open database.db
sqlite> .quit
```
Python

from sqlite3 import connect # Import the connect function from sqlite3 to interact with SQLite databases
from contextlib import closing # Import closing from contextlib to ensure the connection is properly closed

with closing(connect("database.db", isolation_level=None)) as conn: # Use a context manager to automatically close the database connection when done
    pass # 'pass' is just a placeholder; replace with actual DB operations
```
from sqlite3 import connect
from contextlib import closing

with closing(connect("database.db",isolation_level=None)) as conn: 
    pass
```
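The `closing()` wrapper matters because using a sqlite3 connection directly in a `with` statement only commits or rolls back the transaction; it does not close the connection. The following is a small sketch that uses an in-memory database (`":memory:"`, chosen here so no file is created) to show that the connection is closed once the block ends. The `value` and `still_open` names are illustrative.

```python
from sqlite3 import connect, ProgrammingError
from contextlib import closing

# ":memory:" creates a temporary database in RAM instead of a file
with closing(connect(":memory:", isolation_level=None)) as conn:
    value = conn.execute("SELECT 1").fetchone()[0]
    print(value)

# After the block, closing() has called conn.close()
try:
    conn.execute("SELECT 1")
    still_open = True
except ProgrammingError:
    still_open = False
print(still_open)
```

Passing `isolation_level=None` puts the connection in autocommit mode, so each statement takes effect immediately without an explicit commit.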
Drop a Table

To drop a table, use the DROP TABLE keyword followed by the table name

Command-Line Interface

sqlite> .open database.db # Open (or create if it doesn’t exist) a SQLite database file named ‘database.db’
sqlite> DROP TABLE IF EXISTS test; # Delete the table named ‘test’ if it exists
sqlite> .quit # Exit the SQLite command-line interface
```
sqlite> .open database.db
sqlite> DROP TABLE IF EXISTS test;
sqlite> .quit
```
Python

from sqlite3 import connect # Import the connect function from sqlite3 to interact with SQLite databases
from contextlib import closing # Import closing from contextlib to ensure the connection is properly closed

with closing(connect("database.db", isolation_level=None)) as conn: # Use a context manager to automatically close the database connection when done
    conn.execute("DROP TABLE IF EXISTS users") # Delete the table named 'users' if it exists
```
from sqlite3 import connect
from contextlib import closing

with closing(connect("database.db",isolation_level=None)) as conn:
    conn.execute("DROP TABLE IF EXISTS users")
```
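The `IF EXISTS` clause is what makes the statement safe to run when the table is missing. The following sketch, which assumes an empty in-memory database, compares both forms: the first statement is a no-op, while the plain `DROP TABLE` raises an error. The `error_message` name is illustrative.

```python
from sqlite3 import connect, OperationalError
from contextlib import closing

with closing(connect(":memory:", isolation_level=None)) as conn:
    # No 'users' table exists yet, so IF EXISTS makes this a no-op
    conn.execute("DROP TABLE IF EXISTS users")
    try:
        # Without IF EXISTS, dropping a missing table raises an error
        conn.execute("DROP TABLE users")
        error_message = None
    except OperationalError as error:
        error_message = str(error)
    print(error_message)
```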
Create a Table

To create a table, use the CREATE TABLE keyword followed by the table name. You also need to define the table columns and their types or properties

Command-Line Interface

sqlite> .open database.db # Open (or create if it doesn’t exist) a SQLite database file named ‘database.db’
sqlite> DROP TABLE IF EXISTS users; # Delete the table named 'users' if it exists
sqlite> CREATE TABLE IF NOT EXISTS users (id integer, user text, hash text); # Create a table named ‘users’ if it doesn’t already exist, column ‘id’: stores a numeric identifier for each user, column ‘user’: stores the username as text, column ‘hash’: stores the password hash as text
sqlite> .quit # Exit the SQLite command-line interface
```
sqlite> .open database.db
sqlite> DROP TABLE IF EXISTS users;
sqlite> CREATE TABLE IF NOT EXISTS users (id integer, user text, hash text);
sqlite> .quit
```
Python

from sqlite3 import connect # Import the connect function from sqlite3 to interact with SQLite databases
from contextlib import closing # Import closing from contextlib to ensure the connection is properly closed

with closing(connect("database.db", isolation_level=None)) as conn: # Use a context manager to automatically close the database connection when done
    conn.execute("DROP TABLE IF EXISTS users") # Delete the table named 'users' if it exists
    conn.execute("CREATE TABLE IF NOT EXISTS users (id integer, user text, hash text)") # Create a table named 'users' if it doesn't already exist; column 'id' stores a numeric identifier for each user, column 'user' stores the username as text, column 'hash' stores the password hash as text
```
from sqlite3 import connect
from contextlib import closing

with closing(connect("database.db",isolation_level=None)) as conn:
    conn.execute("DROP TABLE IF EXISTS users")
    conn.execute("CREATE TABLE IF NOT EXISTS users (id integer, user text, hash text)")
```
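To confirm that the table was created with the expected columns, one option is SQLite's `PRAGMA table_info` statement. The following is a minimal sketch that assumes an in-memory database rather than database.db; the `columns` name is illustrative.

```python
from sqlite3 import connect
from contextlib import closing

with closing(connect(":memory:", isolation_level=None)) as conn:
    conn.execute("CREATE TABLE IF NOT EXISTS users (id integer, user text, hash text)")
    # Each returned row is (cid, name, type, notnull, dflt_value, pk)
    columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(users)")]
    print(columns)
```

Each printed pair holds a column name and its declared type, in the order used in the CREATE TABLE statement.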
List All Tables

To review all tables in a database, query the table names from sqlite_master using the SELECT keyword

Command-Line Interface

sqlite> .open database.db # Open (or create if it doesn’t exist) a SQLite database file named ‘database.db’
sqlite> DROP TABLE IF EXISTS users; # Delete the table named ‘users’ if it exists
sqlite> CREATE TABLE IF NOT EXISTS users (id integer, user text, hash text); # Create a table named ‘users’ if it doesn’t already exist, column ‘id’: stores a numeric identifier for each user, column ‘user’: stores the username as text, column ‘hash’: stores the password hash as text
sqlite> SELECT name FROM sqlite_master WHERE type=’table’; #Query the SQLite system table ‘sqlite_master’ to list all tables in the database
sqlite> .quit # Exit the SQLite command-line interface
```
sqlite> .open database.db
sqlite> DROP TABLE IF EXISTS users;
sqlite> CREATE TABLE IF NOT EXISTS users (id integer, user text, hash text);
sqlite> SELECT name FROM sqlite_master WHERE type='table';
sqlite> .quit
```
Python

from sqlite3 import connect # Import the connect function from sqlite3 to interact with SQLite databases
from contextlib import closing # Import closing from contextlib to ensure the connection is properly closed

with closing(connect(“database.db”,isolation_level=None)) as conn: # Use a context manager to automatically close the database connection when done
conn.execute(“DROP TABLE IF EXISTS users”) # Delete the table named ‘users’ if it exists
conn.execute(“CREATE TABLE IF NOT EXISTS users (id integer, user text, hash text)”) # Create a table named ‘users’ if it doesn’t already exist, column ‘id’: stores a numeric identifier for each user, column ‘user’: stores the username as text, column ‘hash’: stores the password hash as text
print(conn.execute(“SELECT name FROM sqlite_master WHERE type=’table’”).fetchall()) #Query the SQLite system table ‘sqlite_master’ to list all tables in the database
```
from sqlite3 import connect
from contextlib import closing

with closing(connect("database.db",isolation_level=None)) as conn:
    conn.execute("DROP TABLE IF EXISTS users")
    conn.execute("CREATE TABLE IF NOT EXISTS users (id integer, user text, hash text)")
    print(conn.execute("SELECT name FROM sqlite_master WHERE type='table'").fetchall())
```
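Once the table names are known, the columns of a table can be inspected as well. This is a small sketch, assuming an in-memory database instead of database.db; it uses SQLite's PRAGMA table_info statement, where the second field of each returned row is the column name.

```python
from sqlite3 import connect
from contextlib import closing

# ":memory:" is used here so the sketch leaves no file behind
with closing(connect(":memory:", isolation_level=None)) as conn:
    conn.execute("CREATE TABLE IF NOT EXISTS users (id integer, user text, hash text)")
    # Collect the table names from the sqlite_master system table
    tables = [row[0] for row in conn.execute("SELECT name FROM sqlite_master WHERE type='table'")]
    # Each PRAGMA table_info row describes one column; index 1 holds the column name
    columns = [row[1] for row in conn.execute("PRAGMA table_info(users)")]

print(tables)
print(columns)
```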
Insert Into a Table

To add new data, use the INSERT keyword (Always use parameterized statements in code so that user-supplied values cannot cause SQL injection)

Command-Line Interface

sqlite> .open database.db # Open (or create if it doesn’t exist) a SQLite database file named ‘database.db’
sqlite> DROP TABLE IF EXISTS users; # Delete the table named ‘users’ if it exists
sqlite> CREATE TABLE IF NOT EXISTS users (id integer, user text, hash text); # Create a table named ‘users’ if it doesn’t already exist, column ‘id’: stores a numeric identifier for each user, column ‘user’: stores the username as text, column ‘hash’: stores the password hash as text
sqlite> INSERT into users(id, user, hash) values(1, 'john', 'e66860546f18'); # Insert a new row into the ‘users’ table
sqlite> INSERT into users(id, user, hash) values(2, 'jane', 'cdbbcd86b35e'); # Insert a new row into the ‘users’ table
sqlite> .quit # Exit the SQLite command-line interface
```
sqlite> .open database.db
sqlite> DROP TABLE IF EXISTS users;
sqlite> CREATE TABLE IF NOT EXISTS users (id integer, user text, hash text);
sqlite> INSERT into users(id, user, hash) values(1, 'john', 'e66860546f18');
sqlite> INSERT into users(id, user, hash) values(2, 'jane', 'cdbbcd86b35e');
sqlite> .quit
```
Python

from sqlite3 import connect # Import the connect function from sqlite3 to interact with SQLite databases
from contextlib import closing # Import closing from contextlib to ensure the connection is properly closed

with closing(connect(“database.db”,isolation_level=None)) as conn: # Use a context manager to automatically close the database connection when done
conn.execute(“DROP TABLE IF EXISTS users”) # Delete the table named ‘users’ if it exists
conn.execute(“CREATE TABLE IF NOT EXISTS users (id integer, user text, hash text)”) # Create a table named ‘users’ if it doesn’t already exist, column ‘id’: stores a numeric identifier for each user, column ‘user’: stores the username as text, column ‘hash’: stores the password hash as text
conn.execute(“INSERT into users(id ,user, hash) values(?,?, ?)”, (1,”john”, “e66860546f18”)) # Insert a new row into the ‘users’ table
conn.execute(“INSERT into users(id, user, hash) values(?,?, ?)”, (2,”jane”, “cdbbcd86b35e”)) # Insert a new row into the ‘users’ table
```
from sqlite3 import connect
from contextlib import closing

with closing(connect("database.db",isolation_level=None)) as conn:
    conn.execute("DROP TABLE IF EXISTS users")
    conn.execute("CREATE TABLE IF NOT EXISTS users (id integer, user text, hash text)")
    conn.execute("INSERT into users(id ,user, hash) values(?,?, ?)", (1,"john", "e66860546f18"))
    conn.execute("INSERT into users(id, user, hash) values(?,?, ?)", (2,"jane", "cdbbcd86b35e"))
```
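When several rows need to be added, the same parameterized statement can be reused for each row. A minimal sketch, assuming an in-memory database and a hypothetical list named new_users; it relies on the executemany method of the sqlite3 connection.

```python
from sqlite3 import connect
from contextlib import closing

# new_users is a hypothetical list of rows, one tuple per row
new_users = [(1, "john", "e66860546f18"), (2, "jane", "cdbbcd86b35e")]

with closing(connect(":memory:", isolation_level=None)) as conn:
    conn.execute("CREATE TABLE IF NOT EXISTS users (id integer, user text, hash text)")
    # The statement is executed once per tuple, each value bound to a ? placeholder
    conn.executemany("INSERT into users(id, user, hash) values(?, ?, ?)", new_users)
    row_count = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]

print(row_count)
```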
Fetching Results

To fetch all results from the database, use the SELECT keyword with .fetchall(), or fetch a single result using the SELECT keyword with .fetchone()

Command-Line Interface

sqlite> .open database.db # Open (or create if it doesn’t exist) a SQLite database file named ‘database.db’
sqlite> DROP TABLE IF EXISTS users; # Delete the table named ‘users’ if it exists
sqlite> CREATE TABLE IF NOT EXISTS users (id integer, user text, hash text); # Create a table named ‘users’ if it doesn’t already exist, column ‘id’: stores a numeric identifier for each user, column ‘user’: stores the username as text, column ‘hash’: stores the password hash as text
sqlite> INSERT into users(id, user, hash) values(1, 'john', 'e66860546f18'); # Insert a new row into the ‘users’ table
sqlite> INSERT into users(id, user, hash) values(2, 'jane', 'cdbbcd86b35e'); # Insert a new row into the ‘users’ table
sqlite> SELECT * FROM users; # Select all columns and all rows from the ‘users’ table
sqlite> .quit # Exit the SQLite command-line interface
```
sqlite> .open database.db
sqlite> DROP TABLE IF EXISTS users;
sqlite> CREATE TABLE IF NOT EXISTS users (id integer, user text, hash text);
sqlite> INSERT into users(id, user, hash) values(1, 'john', 'e66860546f18');
sqlite> INSERT into users(id, user, hash) values(2, 'jane', 'cdbbcd86b35e');
sqlite> SELECT * FROM users;
sqlite> .quit
```
Python

from sqlite3 import connect # Import the connect function from sqlite3 to interact with SQLite databases
from contextlib import closing # Import closing from contextlib to ensure the connection is properly closed

with closing(connect(“database.db”,isolation_level=None)) as conn: # Use a context manager to automatically close the database connection when done
conn.execute(“DROP TABLE IF EXISTS users”) # Delete the table named ‘users’ if it exists
conn.execute(“CREATE TABLE IF NOT EXISTS users (id integer, user text, hash text)”) # Create a table named ‘users’ if it doesn’t already exist, column ‘id’: stores a numeric identifier for each user, column ‘user’: stores the username as text, column ‘hash’: stores the password hash as text
conn.execute(“INSERT into users(id ,user, hash) values(?,?, ?)”, (1,”john”, “e66860546f18”)) # Insert a new row into the ‘users’ table
conn.execute(“INSERT into users(id, user, hash) values(?,?, ?)”, (2,”jane”, “cdbbcd86b35e”)) # Insert a new row into the ‘users’ table
print(conn.execute(“SELECT * FROM users”).fetchall()) # Select all columns and all rows from the ‘users’ table
```
from sqlite3 import connect
from contextlib import closing

with closing(connect("database.db",isolation_level=None)) as conn:
    conn.execute("DROP TABLE IF EXISTS users")
    conn.execute("CREATE TABLE IF NOT EXISTS users (id integer, user text, hash text)")
    conn.execute("INSERT into users(id ,user, hash) values(?,?, ?)", (1,"john", "e66860546f18"))
    conn.execute("INSERT into users(id, user, hash) values(?,?, ?)", (2,"jane", "cdbbcd86b35e"))
    print(conn.execute("SELECT * FROM users").fetchall())
```
Output
```
[(1, 'john', 'e66860546f18'), (2, 'jane', 'cdbbcd86b35e')]
```
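The difference between .fetchone() and .fetchall() can be seen by calling both on the same cursor. This is a sketch, assuming an in-memory database and an ORDER BY clause added only to make the row order predictable.

```python
from sqlite3 import connect
from contextlib import closing

with closing(connect(":memory:", isolation_level=None)) as conn:
    conn.execute("CREATE TABLE IF NOT EXISTS users (id integer, user text, hash text)")
    conn.execute("INSERT into users(id, user, hash) values(?, ?, ?)", (1, "john", "e66860546f18"))
    conn.execute("INSERT into users(id, user, hash) values(?, ?, ?)", (2, "jane", "cdbbcd86b35e"))
    cursor = conn.execute("SELECT * FROM users ORDER BY id")
    first_row = cursor.fetchone()       # one row as a tuple
    remaining_rows = cursor.fetchall()  # the rows not yet fetched, as a list of tuples
    no_more_rows = cursor.fetchone()    # None once the results are exhausted

print(first_row)
print(remaining_rows)
print(no_more_rows)
```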
Find Data

You can fetch specific rows using the WHERE keyword

Command-Line Interface

sqlite> .open database.db # Open (or create if it doesn’t exist) a SQLite database file named ‘database.db’
sqlite> DROP TABLE IF EXISTS users; # Delete the table named ‘users’ if it exists
sqlite> CREATE TABLE IF NOT EXISTS users (id integer, user text, hash text); # Create a table named ‘users’ if it doesn’t already exist, column ‘id’: stores a numeric identifier for each user, column ‘user’: stores the username as text, column ‘hash’: stores the password hash as text
sqlite> INSERT into users(id, user, hash) values(1, 'john', 'e66860546f18'); # Insert a new row into the ‘users’ table
sqlite> INSERT into users(id, user, hash) values(2, 'jane', 'cdbbcd86b35e'); # Insert a new row into the ‘users’ table
sqlite> SELECT * FROM users WHERE id=2; # Select all columns from the ‘users’ table where the user’s id is 2
sqlite> .quit # Exit the SQLite command-line interface
```
sqlite> .open database.db
sqlite> DROP TABLE IF EXISTS users;
sqlite> CREATE TABLE IF NOT EXISTS users (id integer, user text, hash text);
sqlite> INSERT into users(id, user, hash) values(1, 'john', 'e66860546f18');
sqlite> INSERT into users(id, user, hash) values(2, 'jane', 'cdbbcd86b35e');
sqlite> SELECT * FROM users WHERE id=2;
sqlite> .quit
```
Python

from sqlite3 import connect # Import the connect function from sqlite3 to interact with SQLite databases
from contextlib import closing # Import closing from contextlib to ensure the connection is properly closed

with closing(connect(“database.db”,isolation_level=None)) as conn: # Use a context manager to automatically close the database connection when done
conn.execute(“DROP TABLE IF EXISTS users”) # Delete the table named ‘users’ if it exists
conn.execute(“CREATE TABLE IF NOT EXISTS users (id integer, user text, hash text)”) # Create a table named ‘users’ if it doesn’t already exist, column ‘id’: stores a numeric identifier for each user, column ‘user’: stores the username as text, column ‘hash’: stores the password hash as text
conn.execute(“INSERT into users(id ,user, hash) values(?,?, ?)”, (1,”john”, “e66860546f18”)) # Insert a new row into the ‘users’ table
conn.execute(“INSERT into users(id, user, hash) values(?,?, ?)”, (2,”jane”, “cdbbcd86b35e”)) # Insert a new row into the ‘users’ table
print(conn.execute(“SELECT * FROM users WHERE id=2”).fetchall()) # Select all columns from the ‘users’ table where the user’s id is 2
```
from sqlite3 import connect
from contextlib import closing

with closing(connect("database.db",isolation_level=None)) as conn:
    conn.execute("DROP TABLE IF EXISTS users")
    conn.execute("CREATE TABLE IF NOT EXISTS users (id integer, user text, hash text)")
    conn.execute("INSERT into users(id ,user, hash) values(?,?, ?)", (1,"john", "e66860546f18"))
    conn.execute("INSERT into users(id, user, hash) values(?,?, ?)", (2,"jane", "cdbbcd86b35e"))
    print(conn.execute("SELECT * FROM users WHERE id=2").fetchall())
```
Output
```
[(2, 'jane', 'cdbbcd86b35e')]
```
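When the value in the WHERE clause comes from a variable, it can be bound with a placeholder instead of being written into the statement. A minimal sketch, assuming an in-memory database and a hypothetical helper named find_user; it uses the named placeholder style (:user) that sqlite3 supports alongside ?.

```python
from sqlite3 import connect
from contextlib import closing

def find_user(conn, user_name):
    # find_user is a hypothetical helper; :user is bound from the dictionary key "user"
    return conn.execute("SELECT * FROM users WHERE user=:user", {"user": user_name}).fetchone()

with closing(connect(":memory:", isolation_level=None)) as conn:
    conn.execute("CREATE TABLE IF NOT EXISTS users (id integer, user text, hash text)")
    conn.execute("INSERT into users(id, user, hash) values(?, ?, ?)", (1, "john", "e66860546f18"))
    conn.execute("INSERT into users(id, user, hash) values(?, ?, ?)", (2, "jane", "cdbbcd86b35e"))
    found = find_user(conn, "jane")     # a matching row as a tuple
    missing = find_user(conn, "alice")  # None because no row matches

print(found)
print(missing)
```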
Delete Data

You can delete data by using the DELETE keyword

Command-Line Interface

sqlite> .open database.db # Open (or create if it doesn’t exist) a SQLite database file named ‘database.db’
sqlite> DROP TABLE IF EXISTS users; # Delete the table named ‘users’ if it exists
sqlite> CREATE TABLE IF NOT EXISTS users (id integer, user text, hash text); # Create a table named ‘users’ if it doesn’t already exist, column ‘id’: stores a numeric identifier for each user, column ‘user’: stores the username as text, column ‘hash’: stores the password hash as text
sqlite> INSERT into users(id, user, hash) values(1, 'john', 'e66860546f18'); # Insert a new row into the ‘users’ table
sqlite> INSERT into users(id, user, hash) values(2, 'jane', 'cdbbcd86b35e'); # Insert a new row into the ‘users’ table
sqlite> DELETE from users WHERE id=1; # Delete rows from the ‘users’ table where the id equals 1
sqlite> SELECT * FROM users; # Select all columns and all rows from the ‘users’ table
sqlite> .quit # Exit the SQLite command-line interface
```
sqlite> .open database.db
sqlite> DROP TABLE IF EXISTS users;
sqlite> CREATE TABLE IF NOT EXISTS users (id integer, user text, hash text);
sqlite> INSERT into users(id, user, hash) values(1, 'john', 'e66860546f18');
sqlite> INSERT into users(id, user, hash) values(2, 'jane', 'cdbbcd86b35e');
sqlite> DELETE from users WHERE id=1;
sqlite> SELECT * FROM users;
sqlite> .quit
```
Python

from sqlite3 import connect # Import the connect function from sqlite3 to interact with SQLite databases
from contextlib import closing # Import closing from contextlib to ensure the connection is properly closed

with closing(connect(“database.db”,isolation_level=None)) as conn: # Use a context manager to automatically close the database connection when done
conn.execute(“DROP TABLE IF EXISTS users”) # Delete the table named ‘users’ if it exists
conn.execute(“CREATE TABLE IF NOT EXISTS users (id integer, user text, hash text)”) # Create a table named ‘users’ if it doesn’t already exist, column ‘id’: stores a numeric identifier for each user, column ‘user’: stores the username as text, column ‘hash’: stores the password hash as text
conn.execute(“INSERT into users(id ,user, hash) values(?,?, ?)”, (1,”john”, “e66860546f18”)) # Insert a new row into the ‘users’ table
conn.execute(“INSERT into users(id, user, hash) values(?,?, ?)”, (2,”jane”, “cdbbcd86b35e”)) # Insert a new row into the ‘users’ table
conn.execute(“DELETE from users WHERE id=1”) # Delete rows from the ‘users’ table where the id equals 1
print(conn.execute(“SELECT * FROM users”).fetchall()) # Select all columns and all rows from the ‘users’ table
```
from sqlite3 import connect
from contextlib import closing

with closing(connect("database.db",isolation_level=None)) as conn:
    conn.execute("DROP TABLE IF EXISTS users")
    conn.execute("CREATE TABLE IF NOT EXISTS users (id integer, user text, hash text)")
    conn.execute("INSERT into users(id ,user, hash) values(?,?, ?)", (1,"john", "e66860546f18"))
    conn.execute("INSERT into users(id, user, hash) values(?,?, ?)", (2,"jane", "cdbbcd86b35e"))
    conn.execute("DELETE from users WHERE id=1")
    print(conn.execute("SELECT * FROM users").fetchall())
```
Output
```
[(2, 'jane', 'cdbbcd86b35e')]
```
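To confirm whether a DELETE actually removed anything, the cursor's rowcount attribute can be checked. This is a sketch, assuming an in-memory database; the id is passed through a ? placeholder rather than written into the statement.

```python
from sqlite3 import connect
from contextlib import closing

with closing(connect(":memory:", isolation_level=None)) as conn:
    conn.execute("CREATE TABLE IF NOT EXISTS users (id integer, user text, hash text)")
    conn.execute("INSERT into users(id, user, hash) values(?, ?, ?)", (1, "john", "e66860546f18"))
    conn.execute("INSERT into users(id, user, hash) values(?, ?, ?)", (2, "jane", "cdbbcd86b35e"))
    # rowcount reports how many rows the statement deleted
    first_delete = conn.execute("DELETE from users WHERE id=?", (1,)).rowcount
    # Deleting the same id again matches nothing
    second_delete = conn.execute("DELETE from users WHERE id=?", (1,)).rowcount
    remaining = conn.execute("SELECT * FROM users").fetchall()

print(first_delete, second_delete)
print(remaining)
```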
User Input (SQL Injection)

A threat actor can construct a malicious query and use it to perform an unauthorized action (This happens because the statement is built with format strings/string concatenation)

Command-Line Interface

sqlite> .open database.db # Open (or create if it doesn’t exist) a SQLite database file named ‘database.db’
sqlite> DROP TABLE IF EXISTS users; # Delete the table named ‘users’ if it exists
sqlite> CREATE TABLE IF NOT EXISTS users (id integer, user text, hash text); # Create a table named ‘users’ if it doesn’t already exist, column ‘id’: stores a numeric identifier for each user, column ‘user’: stores the username as text, column ‘hash’: stores the password hash as text
sqlite> INSERT into users(id, user, hash) values(1, 'john', 'e66860546f18'); # Insert a new row into the ‘users’ table
sqlite> INSERT into users(id, user, hash) values(2, 'jane', 'cdbbcd86b35e'); # Insert a new row into the ‘users’ table
sqlite> SELECT * FROM users WHERE user='' or ''='' AND hash='' or ''=''; # Select all columns from the ‘users’ table; the WHERE clause is crafted to always be TRUE
sqlite> .quit # Exit the SQLite command-line interface
```
sqlite> .open database.db
sqlite> DROP TABLE IF EXISTS users;
sqlite> CREATE TABLE IF NOT EXISTS users (id integer, user text, hash text);
sqlite> INSERT into users(id, user, hash) values(1, 'john', 'e66860546f18');
sqlite> INSERT into users(id, user, hash) values(2, 'jane', 'cdbbcd86b35e');
sqlite> SELECT * FROM users WHERE user='' or ''='' AND hash='' or ''='';
sqlite> .quit
```
Python

from sqlite3 import connect # Import the connect function from sqlite3 to interact with SQLite databases
from contextlib import closing # Import closing from contextlib to ensure the connection is properly closed
temp_user = input(“Enter username: “) # Prompt the user to enter a username
temp_hash = input(“Enter password: “) # Prompt the user to enter a password (Usually, there will be a function to hash the password, it’s removed from here)
with closing(connect(“database.db”,isolation_level=None)) as conn: # Use a context manager to automatically close the database connection when done
conn.execute(“DROP TABLE IF EXISTS users”) # Delete the table named ‘users’ if it exists
conn.execute(“CREATE TABLE IF NOT EXISTS users (id integer, user text, hash text)”) # Create a table named ‘users’ if it doesn’t already exist, column ‘id’: stores a numeric identifier for each user, column ‘user’: stores the username as text, column ‘hash’: stores the password hash as text
conn.execute(“INSERT into users(id ,user, hash) values(?,?, ?)”, (1,”john”, “e66860546f18”)) # Insert a new row into the ‘users’ table
conn.execute(“INSERT into users(id, user, hash) values(?,?, ?)”, (2,”jane”, “cdbbcd86b35e”)) # Insert a new row into the ‘users’ table
print(conn.execute(“SELECT * FROM users WHERE user=’%s’ AND hash=’%s’” % (temp_user,temp_hash)).fetchall()) # Execute a SQL query using string formatting to insert user-controlled values
```
from sqlite3 import connect
from contextlib import closing
temp_user = input("Enter username: ")
temp_hash = input("Enter password: ")
with closing(connect("database.db",isolation_level=None)) as conn:
    conn.execute("DROP TABLE IF EXISTS users")
    conn.execute("CREATE TABLE IF NOT EXISTS users (id integer, user text, hash text)")
    conn.execute("INSERT into users(id ,user, hash) values(?,?, ?)", (1,"john", "e66860546f18"))
    conn.execute("INSERT into users(id, user, hash) values(?,?, ?)", (2,"jane", "cdbbcd86b35e"))
    print(conn.execute("SELECT * FROM users WHERE user='%s' AND hash='%s'" % (temp_user,temp_hash)).fetchall())
```
Malicious statement

If a user enters ' or ''=' for both the username and the password, the statement becomes:
```
SELECT * FROM users WHERE user='' or ''='' AND hash='' or ''=''
```
This is always true. Because AND binds more tightly than OR, the WHERE clause is evaluated as:
```
user='' OR (''='' AND hash='') OR ''=''
FALSE OR (TRUE AND FALSE) OR TRUE → TRUE
```
Output

The result is every row in the users table is returned, regardless of username or hash.
```
[(1, 'john', 'e66860546f18'), (2, 'jane', 'cdbbcd86b35e')]
```
User Input (Blind SQL Injection)

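A threat actor can construct a malicious query and use it to perform an unauthorized action without relying on error messages about the injection; information is inferred from the application's true/false behavior instead (This happens because the statement is built with format strings/string concatenation)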

Command-Line Interface

sqlite> .open database.db # Open (or create if it doesn’t exist) a SQLite database file named ‘database.db’
sqlite> DROP TABLE IF EXISTS users; # Delete the table named ‘users’ if it exists
sqlite> CREATE TABLE IF NOT EXISTS users (id integer, user text, hash text); # Create a table named ‘users’ if it doesn’t already exist, column ‘id’: stores a numeric identifier for each user, column ‘user’: stores the username as text, column ‘hash’: stores the password hash as text
sqlite> INSERT into users(id, user, hash) values(1, 'john', 'e66860546f18'); # Insert a new row into the ‘users’ table
sqlite> INSERT into users(id, user, hash) values(2, 'jane', 'cdbbcd86b35e'); # Insert a new row into the ‘users’ table
sqlite> SELECT * FROM users WHERE user='' OR (SELECT COUNT(*) FROM users) > 0; -- AND hash='test' # Determine if table users exists using only true/false behavior (e.g., login success vs failure). The semicolon is placed before -- because the comment would otherwise hide it from the command-line interface
sqlite> .quit # Exit the SQLite command-line interface
```
sqlite> .open database.db
sqlite> DROP TABLE IF EXISTS users;
sqlite> CREATE TABLE IF NOT EXISTS users (id integer, user text, hash text);
sqlite> INSERT into users(id, user, hash) values(1, 'john', 'e66860546f18');
sqlite> INSERT into users(id, user, hash) values(2, 'jane', 'cdbbcd86b35e');
sqlite> SELECT * FROM users WHERE user='' OR (SELECT COUNT(*) FROM users) > 0; -- AND hash='test'
sqlite> .quit
```
Python

from sqlite3 import connect # Import the connect function from sqlite3 to interact with SQLite databases
from contextlib import closing # Import closing from contextlib to ensure the connection is properly closed
temp_user = input(“Enter username: “) # Prompt the user to enter a username
temp_hash = input(“Enter password: “) # Prompt the user to enter a password (Usually, there will be a function to hash the password, it’s removed from here)
with closing(connect(“database.db”,isolation_level=None)) as conn: # Use a context manager to automatically close the database connection when done
conn.execute(“DROP TABLE IF EXISTS users”) # Delete the table named ‘users’ if it exists
conn.execute(“CREATE TABLE IF NOT EXISTS users (id integer, user text, hash text)”) # Create a table named ‘users’ if it doesn’t already exist, column ‘id’: stores a numeric identifier for each user, column ‘user’: stores the username as text, column ‘hash’: stores the password hash as text
conn.execute(“INSERT into users(id ,user, hash) values(?,?, ?)”, (1,”john”, “e66860546f18”)) # Insert a new row into the ‘users’ table
conn.execute(“INSERT into users(id, user, hash) values(?,?, ?)”, (2,”jane”, “cdbbcd86b35e”)) # Insert a new row into the ‘users’ table
result = conn.execute(“SELECT * FROM users WHERE user=’%s’ AND hash=’%s’” % (temp_user,temp_hash)).fetchone() # Execute a SQL query built with string formatting from user-controlled values and fetch the first matching row
if result: # If a row is returned
print(“Login successful”) # Show the successful message
else: # If there is no row
print(“Login failed”) # Show the failed message
```
from sqlite3 import connect
from contextlib import closing
temp_user = input("Enter username: ")
temp_hash = input("Enter password: ")
with closing(connect("database.db",isolation_level=None)) as conn:
    conn.execute("DROP TABLE IF EXISTS users")
    conn.execute("CREATE TABLE IF NOT EXISTS users (id integer, user text, hash text)")
    conn.execute("INSERT into users(id ,user, hash) values(?,?, ?)", (1,"john", "e66860546f18"))
    conn.execute("INSERT into users(id, user, hash) values(?,?, ?)", (2,"jane", "cdbbcd86b35e"))
    result = conn.execute("SELECT * FROM users WHERE user='%s' AND hash='%s'" % (temp_user,temp_hash)).fetchone()
    if result:
        print("Login successful")
    else:
        print("Login failed")
```
Malicious statement

If a user enters ' OR (SELECT COUNT(*) FROM users) > 0 -- for the username and any password, the subquery counts how many rows exist in the users table, and the -- comments out the rest of the statement. If at least one row exists, the expression evaluates to TRUE.
```
SELECT * FROM users WHERE user='' OR (SELECT COUNT(*) FROM users) > 0 -- AND hash='test'
```
Output

It will show login successful which indicates the users table does exist.
```
Login successful
```
If a user enters ' OR (SELECT COUNT(*) FROM userx) > 0 -- for the username and any password, the subquery references the userx table, which does not exist, so the statement fails and no row is returned.
```
SELECT * FROM users WHERE user='' OR (SELECT COUNT(*) FROM userx) > 0 -- AND hash='test'
```
Output

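The login does not succeed, which indicates that the userx table does not exist. (The script above does not catch database errors, so it would stop with sqlite3.OperationalError: no such table: userx; an application that handles the error and reports a failed login would show the following.)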
```
Login failed
```
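The two probes can be compared side by side in a self-contained sketch. It assumes an in-memory database and a hypothetical function named vulnerable_login that, like many applications, hides database errors and reports a failed login; the function is deliberately unsafe and only illustrates the true/false behavior described above.

```python
from sqlite3 import connect, Error
from contextlib import closing

def vulnerable_login(conn, user, password_hash):
    # Deliberately unsafe: the statement is built with string formatting
    query = "SELECT * FROM users WHERE user='%s' AND hash='%s'" % (user, password_hash)
    try:
        return conn.execute(query).fetchone() is not None
    except Error:
        # Assumption: the application swallows database errors and reports a failed login
        return False

with closing(connect(":memory:", isolation_level=None)) as conn:
    conn.execute("CREATE TABLE IF NOT EXISTS users (id integer, user text, hash text)")
    conn.execute("INSERT into users(id, user, hash) values(?, ?, ?)", (1, "john", "e66860546f18"))
    # The users table exists, so the injected condition is true and a row is returned
    probe_existing = vulnerable_login(conn, "' OR (SELECT COUNT(*) FROM users) > 0 --", "test")
    # The userx table does not exist, so the statement fails and the login is rejected
    probe_missing = vulnerable_login(conn, "' OR (SELECT COUNT(*) FROM userx) > 0 --", "test")

print("Login successful" if probe_existing else "Login failed")
print("Login successful" if probe_missing else "Login failed")
```

The only signal that leaks is success versus failure, yet it is enough to confirm or rule out a table name.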
Insecure Design

A threat actor may use any ID to retrieve user info (The logic retrieves users by incremental IDs and does not check who is asking)

from sqlite3 import connect # Import the connect function from sqlite3 to interact with SQLite databases
from contextlib import closing # Import closing from contextlib to ensure the connection is properly closed
temp_id = input(“Enter id: “) # Prompt the user to enter an id
with closing(connect(“database.db”,isolation_level=None)) as conn: # Use a context manager to automatically close the database connection when done
conn.execute(“DROP TABLE IF EXISTS users”) # Delete the table named ‘users’ if it exists
conn.execute(“CREATE TABLE IF NOT EXISTS users (id integer, user text, hash text)”) # Create a table named ‘users’ if it doesn’t already exist, column ‘id’: stores a numeric identifier for each user, column ‘user’: stores the username as text, column ‘hash’: stores the password hash as text
conn.execute(“INSERT into users(id ,user, hash) values(?,?, ?)”, (1,”john”, “e66860546f18”)) # Insert a new row into the ‘users’ table
conn.execute(“INSERT into users(id, user, hash) values(?,?, ?)”, (2,”jane”, “cdbbcd86b35e”)) # Insert a new row into the ‘users’ table
print(conn.execute(“SELECT * FROM users WHERE id=?”, (temp_id,)).fetchall()) # Safely query the users table for a specific id using a parameterized query
```
from sqlite3 import connect
from contextlib import closing
temp_id = input("Enter id: ")
with closing(connect("database.db",isolation_level=None)) as conn:
    conn.execute("DROP TABLE IF EXISTS users")
    conn.execute("CREATE TABLE IF NOT EXISTS users (id integer, user text, hash text)")
    conn.execute("INSERT into users(id ,user, hash) values(?,?, ?)", (1,"john", "e66860546f18"))
    conn.execute("INSERT into users(id, user, hash) values(?,?, ?)", (2,"jane", "cdbbcd86b35e"))
    print(conn.execute("SELECT * FROM users WHERE id=?", (temp_id,)).fetchall())
```
Statement will be
```
SELECT * FROM users WHERE id=1
```
Output
```
[(1, 'john', 'e66860546f18')]
```
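One way to address this design flaw is to check that the requester is allowed to see the requested record before querying it. The following is only a sketch under stated assumptions: the function get_own_record and the session_user_id value (the id of the already authenticated user) are hypothetical and not part of the original example, and an in-memory database is used.

```python
from sqlite3 import connect
from contextlib import closing

def get_own_record(conn, session_user_id, requested_id):
    # Hypothetical authorization check: a user may only read their own record
    if requested_id != session_user_id:
        return None
    return conn.execute("SELECT id, user FROM users WHERE id=?", (requested_id,)).fetchone()

with closing(connect(":memory:", isolation_level=None)) as conn:
    conn.execute("CREATE TABLE IF NOT EXISTS users (id integer, user text, hash text)")
    conn.execute("INSERT into users(id, user, hash) values(?, ?, ?)", (1, "john", "e66860546f18"))
    conn.execute("INSERT into users(id, user, hash) values(?, ?, ?)", (2, "jane", "cdbbcd86b35e"))
    own_record = get_own_record(conn, session_user_id=1, requested_id=1)
    other_record = get_own_record(conn, session_user_id=1, requested_id=2)

print(own_record)
print(other_record)
```

The parameterized query alone prevents injection, but it does not decide who may read which row; that decision has to be made by the application logic.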
User Input (SQL/Blind SQL Injection)

If you want to pass dynamic values to the SQL statement, use ? as a placeholder and pass the values in a tuple, e.g., (value,). The ? tells the database engine to bind the passed value as data instead of parsing it as part of the SQL statement. E.g., if someone enters the ' symbol, which can be used to close a string literal, the engine treats it as an ordinary character inside the value, and the structure of the query does not change

Python

from sqlite3 import connect # Import the connect function from sqlite3 to interact with SQLite databases
from contextlib import closing # Import closing from contextlib to ensure the connection is properly closed
temp_user = input(“Enter username: “) # Prompt the user to enter a username
temp_hash = input(“Enter password: “) # Prompt the user to enter a password (Usually, there will be a function to hash the password, it’s removed from here)
with closing(connect(“database.db”,isolation_level=None)) as conn: # Use a context manager to automatically close the database connection when done
conn.execute(“DROP TABLE IF EXISTS users”) # Delete the table named ‘users’ if it exists
conn.execute(“CREATE TABLE IF NOT EXISTS users (id integer, user text, hash text)”) # Create a table named ‘users’ if it doesn’t already exist, column ‘id’: stores a numeric identifier for each user, column ‘user’: stores the username as text, column ‘hash’: stores the password hash as text
conn.execute(“INSERT into users(id ,user, hash) values(?,?, ?)”, (1,”john”, “e66860546f18”)) # Insert a new row into the ‘users’ table
conn.execute(“INSERT into users(id, user, hash) values(?,?, ?)”, (2,”jane”, “cdbbcd86b35e”)) # Insert a new row into the ‘users’ table
print(conn.execute(“SELECT * FROM users WHERE user=? AND hash=?”, (temp_user,temp_hash,)).fetchall()) # Safely query the users table for a specific username and password using a parameterized query
```
from sqlite3 import connect
from contextlib import closing
temp_user = input("Enter username: ")
temp_hash = input("Enter password: ")
with closing(connect("database.db",isolation_level=None)) as conn:
    conn.execute("DROP TABLE IF EXISTS users")
    conn.execute("CREATE TABLE IF NOT EXISTS users (id integer, user text, hash text)")
    conn.execute("INSERT into users(id ,user, hash) values(?,?, ?)", (1,"john", "e66860546f18"))
    conn.execute("INSERT into users(id, user, hash) values(?,?, ?)", (2,"jane", "cdbbcd86b35e"))
    print(conn.execute("SELECT * FROM users WHERE user=? AND hash=?", (temp_user,temp_hash,)).fetchall())
```
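The effect of the placeholder can be checked by sending the same malicious input through both query styles. A minimal sketch, assuming an in-memory database; the variable payload holds the ' or ''=' input used earlier.

```python
from sqlite3 import connect
from contextlib import closing

payload = "' or ''='"  # the malicious input, used for both username and password

with closing(connect(":memory:", isolation_level=None)) as conn:
    conn.execute("CREATE TABLE IF NOT EXISTS users (id integer, user text, hash text)")
    conn.execute("INSERT into users(id, user, hash) values(?, ?, ?)", (1, "john", "e66860546f18"))
    conn.execute("INSERT into users(id, user, hash) values(?, ?, ?)", (2, "jane", "cdbbcd86b35e"))
    # Unsafe: the payload becomes part of the SQL text and changes the WHERE clause
    unsafe_rows = conn.execute("SELECT * FROM users WHERE user='%s' AND hash='%s'" % (payload, payload)).fetchall()
    # Safe: the payload is bound as a plain value and compared literally
    safe_rows = conn.execute("SELECT * FROM users WHERE user=? AND hash=?", (payload, payload)).fetchall()

print(len(unsafe_rows))
print(safe_rows)
```

With the placeholder, the input is compared against the user and hash columns as an ordinary string, so no row matches.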
April 5, 2026
Python Reading and Writing Files
Read From File

To read from the file, you can use the open function to open the file. It opens it and returns a file object that users can use to read or modify the content of that file. The syntax is open(file_name, mode), the file_name is the name of the file you want to interact with, and the mode could be any of these:
- r read mode
- w write mode (Overwrites existing file)
- a append to the end mode
- b binary mode
  - There are other modes, but these are commonly used
File Content
```
Test1
Test2
```
Example

temp_file = open(“test_1.txt”, “r”) # Open the file “test_1.txt” in read mode (“r”)
print(temp_file.read()) # Read the entire contents of the file and print it
temp_file.close() # Close the file to free system resources
```
temp_file = open("test_1.txt","r")

print(temp_file.read())
temp_file.close()
```
Result
```
Test1
Test2
```
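The modes listed above can be compared in one short run. This is a sketch, assuming a temporary directory and a hypothetical file name (modes_demo.txt) so that no existing file is overwritten.

```python
import os
import tempfile

# A temporary directory keeps this sketch away from real files
with tempfile.TemporaryDirectory() as temp_dir:
    path = os.path.join(temp_dir, "modes_demo.txt")  # hypothetical file name

    temp_file = open(path, "w")   # write mode creates the file
    temp_file.write("Test1\n")
    temp_file.close()

    temp_file = open(path, "a")   # append mode adds to the end
    temp_file.write("Test2\n")
    temp_file.close()

    temp_file = open(path, "r")   # read mode returns text (str)
    text = temp_file.read()
    temp_file.close()

    temp_file = open(path, "rb")  # adding b returns raw bytes instead of text
    raw = temp_file.read()
    temp_file.close()

    temp_file = open(path, "w")   # write mode again overwrites the old content
    temp_file.write("Test3\n")
    temp_file.close()

    temp_file = open(path, "r")
    overwritten = temp_file.read()
    temp_file.close()

print(repr(text))
print(type(raw).__name__)
print(repr(overwritten))
```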
Read From File (Line by Line)

You can use the .readlines method to read all lines into a list, then loop over that list line by line

File Content
```
Test1
Test2
```
Example

temp_file = open(“test_1.txt”) # Open the file “test_1.txt” in read mode (default mode is “r”)
for line in temp_file.readlines(): # Read all lines into a list and iterate through each line
print(line, end=””) # Print each line without adding extra newlines (end=””)
temp_file.close() # Close the file to free system resources
```
temp_file = open("test_1.txt")
for line in temp_file.readlines():
    print(line, end="")

temp_file.close()
```
Result
```
Test1
Test2
```
Or, you can store the list returned by the .readlines method in a variable first

Example

temp_file = open(“test_1.txt”) # Open the file “test_1.txt” in read mode (default “r”)
lines = temp_file.readlines() # Read all lines into a list called ‘lines’
for line in lines: # Iterate through each line in the list
print(line, end=””) # Print each line without adding extra newlines
temp_file.close() # Close the file to free system resources
```
temp_file = open("test_1.txt")

lines = temp_file.readlines()
for line in lines:
    print(line, end="")

temp_file.close()
```
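A file can also be read one line at a time, without loading every line into a list, by calling the .readline method repeatedly or by iterating over the file object. This is a sketch, assuming a temporary directory so the example creates its own copy of test_1.txt.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as temp_dir:
    path = os.path.join(temp_dir, "test_1.txt")
    temp_file = open(path, "w")
    temp_file.write("Test1\nTest2\n")
    temp_file.close()

    # Option 1: call .readline until it returns an empty string (end of file)
    lines_from_readline = []
    temp_file = open(path)
    line = temp_file.readline()
    while line != "":
        lines_from_readline.append(line)
        line = temp_file.readline()
    temp_file.close()

    # Option 2: iterate over the file object, which yields one line per step
    lines_from_iteration = []
    temp_file = open(path)
    for line in temp_file:
        lines_from_iteration.append(line)
    temp_file.close()

print(lines_from_readline)
print(lines_from_iteration)
```

Both options keep only one line in memory at a time while reading, which matters for large files.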
Write to File

To write, you can use the .write method

Example

temp_file = open(“test_1.txt”, “w”) # Open the file in write mode (“w”); creates the file if it doesn’t exist, or overwrites it if it exists
temp_file.write(“Test\n”) # Write the string “Test” followed by a newline to the file
temp_file.close() # Close the file to save changes and free resources
temp_file = open(“test_1.txt”, “r”) # Reopen the file in read mode (“r”)
print(temp_file.read()) # Read the entire file contents and print that
temp_file.close() # Close the file after reading
```
temp_file = open("test_1.txt","w")
temp_file.write("Test\n")
temp_file.close()

temp_file = open("test_1.txt","r")
print(temp_file.read())
temp_file.close()
```
Result
```
Test
```
Write to File (With User Input)

You can ask the user for input, then save that to a file

User Input
```
Hello World!
```
Example

temp_file = open(“test_1.txt”, “a+”) # Open the file in append and read mode (“a+”); creates file if it doesn’t exist
temp_user_input = input(“Enter text: “) # Prompt the user to enter text
temp_file.write(temp_user_input) # Append the user’s input to the end of the file
temp_file.close() # Close the file to save changes
temp_file = open(“test_1.txt”, “r”) # Reopen the file in read mode
print(temp_file.read()) # Read and print the entire contents of the file
temp_file.close() # Close the file after reading
```
temp_file = open("test_1.txt","a+")
temp_user_input = input("Enter text: ")
temp_file.write(temp_user_input)
temp_file.close()

temp_file = open("test_1.txt","r")
print(temp_file.read())
temp_file.close()
```
Result
```
Hello World!
```
Read/Write Without Close Method

The .close method is used to close the opened file (It’s a good practice to do that). If you do not want to call it yourself, use the with statement, which will automatically close the file when flow control leaves the with block

File Content
```
Test1
Test2
```
Example

with open(“test_1.txt”, “r”) as f: # Open the file “test_1.txt” in read mode; ‘with’ ensures it will be automatically closed
print(f.read()) # Read the entire file content and print it
```
with open("test_1.txt","r") as f:
    print(f.read())
```
Result
```
Test1
Test2
```
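The same with statement works for writing, and the closed attribute of the file object shows that the file is closed once the block ends. A short sketch, assuming a temporary directory so the example creates its own test_1.txt.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as temp_dir:
    path = os.path.join(temp_dir, "test_1.txt")

    with open(path, "w") as f:
        f.write("Test1\nTest2\n")
    closed_after_write = f.closed  # True: the with block closed the file

    with open(path, "r") as f:
        content = f.read()
        closed_inside_block = f.closed  # False: still open inside the block

print(closed_after_write)
print(closed_inside_block)
print(content, end="")
```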
Remove a File

There are different ways to delete a file; one of them is to use the remove function from the os module (Miscellaneous operating system interfaces). You need to import it first using import os.

Example

import os # Import the os module for interacting with the operating system
os.remove(“test_1.txt”) # Delete the file “test_1.txt” from the filesystem
```
import os
os.remove("test_1.txt")
```
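os.remove raises FileNotFoundError when the file does not exist. The sketch below wraps it in a hypothetical helper named remove_if_exists (not part of the os module) and uses a temporary file as a stand-in for test_1.txt.

```python
import os
import tempfile

# Create a throwaway file to delete (stand-in for test_1.txt)
temp_fd, temp_path = tempfile.mkstemp(suffix=".txt")
os.close(temp_fd)

def remove_if_exists(path):
    """Delete path and return True, or return False if it does not exist."""
    try:
        os.remove(path)
        return True
    except FileNotFoundError:
        return False

first_result = remove_if_exists(temp_path)   # The file exists, so it is deleted
second_result = remove_if_exists(temp_path)  # The file is already gone

print(first_result, second_result)  # True False
```

Catching the exception avoids the gap between checking that a file exists and deleting it.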
March 8, 2026
Python Input
Input

The input function gets input from the user and always returns it as a string (if the user enters [1,2,3], the result is "[1,2,3]", which is a string and not a list)

Example

age = input(“Enter your age: “) # Prompt the user to enter their age; the input is returned as a string
print(“Your age is: “, age) # Print the age entered by the user
```
age = input("Enter your age: ")
print("Your age is: ", age)
```
Result
```
Enter your age: 40
Your age is: 40
```
You can also have that in a loop

Example

temp_var = "" # Initialize an empty string variable
while temp_var != "exit": # Continue looping until the user types "exit"
    temp_var = input("Enter text: ") # Prompt the user to enter text
    print("You entered: ", temp_var) # Print the text entered by the user
```
temp_var = ""
while temp_var != "exit":
    temp_var = input("Enter text: ")
    print("You entered: ", temp_var)
```
Result
```
Enter text: 10
You entered: 10
Enter text: test
You entered: test
Enter text: exit
You entered: exit
```
Also, you can check the length

Example

temp_var = "" # Initialize an empty string variable
while len(temp_var) != 4: # Repeat the loop until the user enters a string of length 4
    temp_var = input("Enter a number: ") # Prompt the user to enter a number
    print("You entered: ", temp_var) # Print the value entered by the user
```
temp_var = ""
while len(temp_var) != 4:
    temp_var = input("Enter a number: ")
    print("You entered: ", temp_var)
```
Result
```
Enter a number: a
You entered: a
Enter a number: bb
You entered: bb
Enter a number: ccc
You entered: ccc
Enter a number: dddd
You entered: dddd
```
Input (Type)

The input function returns a string, and you can check that using the type function

Example

temp_var = input(“Enter a number: “) # Prompt the user to enter a number; input is always returned as a string
print(type(temp_var)) # Print the type of temp_var
```
temp_var = input("Enter a number: ")
print(type(temp_var))
```
Result
```
Enter a number: 40
<class 'str'>
```
Input (Casting or Converting to int)

To cast, or convert a string into an int, you can use the int function

Example

temp_var = input(“Enter a number: “) # Prompt the user to enter a number; input is returned as a string
temp_var = int(temp_var) # Convert the input string to an integer
print(type(temp_var)) # Print the type of temp_var
```
temp_var = input("Enter a number: ")
temp_var = int(temp_var)
print(type(temp_var))
```
Result
```
Enter a number: 40
<class 'int'>
```
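The int function raises ValueError when the string is not a valid integer, so user input is usually converted inside a try block. A minimal sketch, assuming a hypothetical helper named to_int and fixed strings in place of input so that it runs without prompting:

```python
def to_int(text, default=None):
    """Return text converted to int, or default if it is not a valid integer."""
    try:
        return int(text)
    except ValueError:
        return default

print(to_int("40"))     # 40
print(to_int(" 40 "))   # 40, surrounding whitespace is ignored by int
print(to_int("forty"))  # None
print(to_int("40.0"))   # None, int does not accept a float-formatted string
```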
Input (Safe Casting or Converting)

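Functions that evaluate a string as code, such as eval, can be exploited when they receive user input. If you need to convert input into a Python value, prefer a restricted evaluator such as literal_eval from the ast module, which only accepts literals (numbers, strings, lists, and so on)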

Example

import ast # Import the Abstract Syntax Trees module (used here for safe evaluation)
temp_var = input(“Enter a float number: “) # Prompt the user to enter a number; input is returned as a string
temp_var = ast.literal_eval(temp_var) # Safely evaluate the input to its Python type (int, float, etc.)
print(type(temp_var)) # Print the type of temp_var
```
import ast
temp_var = input("Enter a float number: ")
temp_var = ast.literal_eval(temp_var)
print(type(temp_var))
```
Result
```
Enter a float number: 40.0
<class 'float'>
```
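To see the difference from evaluating arbitrary code, the sketch below passes both literals and a function call to literal_eval. The helper parse_literal is hypothetical, and returning None on failure is a simplification, because the literal None itself would also produce None.

```python
import ast

def parse_literal(text):
    """Return the Python value for a literal, or None if text is not a plain literal."""
    try:
        return ast.literal_eval(text)
    except (ValueError, SyntaxError, TypeError):
        # Raised for anything that is not a literal, such as function calls
        return None

print(parse_literal("40.0"))                       # 40.0
print(parse_literal("[1, 2, 3]"))                  # [1, 2, 3]
print(parse_literal("__import__('os').getcwd()"))  # None, the call is rejected and never executed
```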
Sanitizing Input

If you are expecting input that does not contain specific characters, you need to sanitize the input (Do not rely on the user to input something without the specific characters)

Example

temp_var = input(“Enter a string that does not contain @: “) # Prompt the user to enter a string
temp_var = temp_var.replace(“@”, “”) # Remove all occurrences of “@” from the string
print(temp_var) # Print the modified string
```
temp_var = input("Enter a string that does not contain @: ")
temp_var = temp_var.replace("@", "")
print(temp_var)
```
Result
```
Enter a string that does not contain @: Hello World!@
Hello World!
```
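Removing one known character works when the unwanted set is small. An alternative, sketched here as an assumption and not as the original author's method, is to keep only the characters on an allowlist; ALLOWED_CHARACTERS and sanitize are illustrative names.

```python
import string

# Characters that are allowed to stay in the input (illustrative allowlist)
ALLOWED_CHARACTERS = set(string.ascii_letters + string.digits + " !")

def sanitize(text):
    """Return text with every character outside the allowlist removed."""
    return "".join(character for character in text if character in ALLOWED_CHARACTERS)

print(sanitize("Hello World!@"))    # Hello World!
print(sanitize("Hello <World>;#"))  # Hello World
```

An allowlist also covers characters you did not think of in advance, which a single .replace call cannot do.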
March 8, 2026
Python Pattern Matching With Regular Expressions
Search for a value

Some data types, such as string, list, set, and tuple, let you search them using the in keyword

Example

temp_list = [1, 2, 3] # Create a list with elements 1, 2, 3
temp_string = "Hello World!" # Create a string variable with value "Hello World!"
if 1 in temp_list: # Check if the number 1 exists in temp_list
    print("Found number 1") # If True, print this message
if "Hello" in temp_string: # Check if the substring "Hello" exists in temp_string
    print("Found Hello") # If True, print this message
```
temp_list = [1,2,3]
temp_string = "Hello World!"

if 1 in temp_list:
    print("Found number 1")

if "Hello" in temp_string:
    print("Found Hello")
```
Result
```
Found number 1
Found Hello
```
Check the length

You can use the len function to check the length

Example

mobile = "1112223333" # Create a string variable representing a mobile number
if len(mobile) == 10: # Check if the length of the mobile number is exactly 10
    print("Mobile number length is correct") # If True, print this message
```
mobile = "1112223333"

if len(mobile) == 10:
    print("Mobile number length is correct")
```
Result
```
Mobile number length is correct
```
Check if Numeric

You can either use the .isdecimal method or loop over the characters of the string and check each one individually

Example

mobile = "1112223333" # Create a string variable representing a mobile number
if len(mobile) == 10: # Check if the mobile number has exactly 10 characters
    print("Mobile number length is valid") # If True, print this message
    if mobile.isdecimal(): # Check if all characters in the string are decimal digits (0-9)
        print("Mobile number pattern is valid") # If True, print this message
```
mobile = "1112223333"

if len(mobile) == 10:
    print("Mobile number length is valid")
    if mobile.isdecimal():
        print("Mobile number pattern is valid")
```
Result
```
Mobile number length is valid
Mobile number pattern is valid
```
Or, you can loop over each character and check whether it is a digit

Example

mobile = "1112223333" # Create a string variable representing a mobile number
numbers = "1234567890" # String containing all valid numeric digits
if len(mobile) == 10: # Check if mobile number has exactly 10 characters
    print("Mobile number length is valid") # Output message if length is valid
    for character in mobile: # Loop through each character in the mobile number
        if character in numbers: # Check if the character is a valid number
            print(character + " is valid") # Print a message for each valid character
```
mobile = "1112223333"
numbers = "1234567890"

if len(mobile) == 10:
    print("Mobile number length is valid")
    for character in mobile:
        if character in numbers:
            print(character + " is valid")
```
Result
```
Mobile number length is valid
1 is valid
1 is valid
1 is valid
2 is valid
2 is valid
2 is valid
3 is valid
3 is valid
3 is valid
3 is valid
```
Check by index

You can also use indexing to check a specific character or sub-string

Example

mobile = "111-222-3333" # Create a string variable representing a mobile number in the format XXX-XXX-XXXX
if len(mobile) == 12: # Check if the total length is 12 characters (including dashes)
    if mobile[3] == "-" and mobile[7] == "-": # Check if the 4th and 8th characters are dashes
        if mobile[0:3].isdecimal() and mobile[4:7].isdecimal() and mobile[8:12].isdecimal(): # Check if the number parts are all digits: first three, middle three, last four
            print("Mobile number is valid") # If all conditions are met, print this message
```
mobile = "111-222-3333"

if len(mobile) == 12:
    if mobile[3] == "-" and mobile[7] == "-":
        if mobile[0:3].isdecimal() and mobile[4:7].isdecimal() and mobile[8:12].isdecimal():
            print("Mobile number is valid")
```
Result
```
Mobile number is valid
```
Regex

Regex, or regular expression, is a language for finding a particular string based on a search pattern.
- Characters
  - \d matches 0 to 9
    
    \d\d\d\d with 1234567 returns 1234
    
    \d+ with 1234567 returns 1234567
  - \w matches word character A to Z, a to z, 0 to 9, and _
    
    \w\w with Hello! returns He, and ll
    
    \w+ with Hello! returns Hello
  - \s matches white space character
  - . matches any character except line break
    
    . with car returns c, a, and r
    
    .* with car returns car
- Character classes
  - [ ] for matching characters within the brackets
    
    [abcd] matches a, b, c, or d
    
    [a-d] matches a, b, c, or d (The - means to)
    
    [^abcd] matches anything except a, b, c, or d (The ^ means negated character class)
    
    [^a-d] matches anything except a, b, c, or d (The - means to, and ^ means negated character class)
- Quantifiers
  - + one or more
    
    [1-2] with 112233 returns 1, 1, 2, 2
    
    [1-2]+ with 112233 returns 1122
  - * zero or more
    
    1*2* with 112233 returns 1122
  - {n} matches exactly n times
    
    1{4} with 111111 returns 1111
- Boundaries
  - ^ start of string
  - $ end of string
- Normal
  - 123456 with 123456789 returns 123456
  - abcdef with abcdefghijklmnopqrstuvwxyz returns abcdef
    
    Escape special characters using \
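Several of the patterns listed above can be checked with re.findall, which returns every non-overlapping match as a list. This is a minimal sketch; the examples and results names are illustrative, and the patterns are written as raw strings (r"...") so that the backslashes reach the regex engine unchanged.

```python
import re

# (pattern, text) pairs taken from the list above
examples = [
    (r"\d\d\d\d", "1234567"),
    (r"\d+", "1234567"),
    (r"\w\w", "Hello!"),
    (r"\w+", "Hello!"),
    (r".", "car"),
    (r"[1-2]", "112233"),
    (r"[1-2]+", "112233"),
    (r"1{4}", "111111"),
]

# re.findall scans left to right and returns all non-overlapping matches
results = [re.findall(pattern, text) for pattern, text in examples]

for (pattern, text), found in zip(examples, results):
    print(pattern, "with", text, "returns", found)
```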
Importing Regex (re) Module

To use the regex module, named re, you first need to make it available with the import statement

Example

import re # Import Python’s built-in regular expression (regex) module
print(dir(re)) # Print a list of all attributes, functions, and classes available in the ‘re’ module
```
import re
print(dir(re))
```
Result
```
['A', 'ASCII', 'DEBUG', 'DOTALL', 'I', 'IGNORECASE', 'L', 'LOCALE', 'M', 'MULTILINE', 'Match', 'Pattern', 'RegexFlag', 'S', 'Scanner', 'T', 'TEMPLATE', 'U', 'UNICODE', 'VERBOSE', 'X', '_MAXCACHE', '__all__', '__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__spec__', '__version__', '_cache', '_compile', '_compile_repl', '_expand', '_locale', '_pickle', '_special_chars_map', '_subx', 'compile', 'copyreg', 'enum', 'error', 'escape', 'findall', 'finditer', 'fullmatch', 'functools', 'match', 'purge', 'search', 'split', 'sre_compile', 'sre_parse', 'sub', 'subn', 'template']
```
Regex (.search)

You can use the .search function of the re module to find the first place in a string where a regex pattern matches

Example

import re # Import the regular expression module
mobile = "111-222-3333" # Create a string variable representing a mobile number
if re.search(r"\d\d\d-\d\d\d-\d\d\d\d", mobile): # Search for the pattern XXX-XXX-XXXX using regex; the r prefix makes a raw string so \d is not treated as a string escape
    print("Mobile number is valid") # Print this message if the pattern matches
```
import re

mobile = "111-222-3333"

if re.search(r"\d\d\d-\d\d\d-\d\d\d\d", mobile):
    print("Mobile number is valid")
```
Result
```
Mobile number is valid
```
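Note that re.search succeeds if the pattern appears anywhere in the string, so a longer string that merely contains a mobile number also passes. When the whole value has to match, re.fullmatch is one option. The sketch below assumes a hypothetical helper named is_valid_mobile and the constant MOBILE_PATTERN.

```python
import re

# Same XXX-XXX-XXXX pattern, written with quantifiers
MOBILE_PATTERN = r"\d{3}-\d{3}-\d{4}"

def is_valid_mobile(text):
    """Return True only if the entire text matches the mobile pattern."""
    return re.fullmatch(MOBILE_PATTERN, text) is not None

temp_value = "call 111-222-33334 now"

print(bool(re.search(MOBILE_PATTERN, temp_value)))  # True, the pattern is found inside the text
print(is_valid_mobile(temp_value))                  # False, the text has extra characters
print(is_valid_mobile("111-222-3333"))              # True
```

The same effect can be achieved with re.search by adding the ^ and $ boundaries to the pattern.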
March 8, 2026
Python Strings
Indexing

You can slice a string using smart indexing [] and : or ::

Example

temp_string = “abcdefghijk” # Create a string variable with value “abcdefghijk”
print(temp_string[1:]) # Slice from index 1 to the end and print that
print(temp_string[2:6]) # Slice from index 2 up to (but not including) index 6 and print that
print(temp_string[::-1]) # Reverse the string using slicing and print that
```
temp_string = "abcdefghijk"

print(temp_string[1:])
print(temp_string[2:6])
print(temp_string[::-1])
```
Result
```
bcdefghijk
cdef
kjihgfedcba
```
Concatenation

You can concatenate strings using the + operator

Example

first = “1234” # Create a string variable named first with value “1234”
second = “5678” # Create a string variable named second with value “5678”
print(first + second) # Concatenate the two strings and print that
```
first = "1234"
second = "5678"

print(first + second)
```
Result
```
12345678
```
Replace a letter or sub-string

You can use the .replace method to replace a word or letter in the string. The .replace method has 3 parameters (old value, new value, count)

Example

temp_string = “Hello World!” # Create a string variable with value “Hello World!”
print(temp_string.replace(“!”, “$”)) # Replace all occurrences of “!” with “$” and print that
```
temp_string = "Hello World!"

print(temp_string.replace("!","$"))
```
Result
```
Hello World$
```
Or, you can replace a word

Example

temp_string = “Hello World!” # Create a string variable with value “Hello World!”
print(temp_string.replace(“World!”, “Mike”)) # Replace the substring “World!” with “Mike” and print that
```
temp_string = "Hello World!"

print(temp_string.replace("World!","Mike"))
```
Result
```
Hello Mike
```
Also, you can remove a word by replacing it with nothing

Example

temp_string = “Hello World!” # Create a string variable with value “Hello World!”
print(temp_string.replace(“World!”, “”)) # Replace the substring “World!” with an empty string and print that
```
temp_string = "Hello World!"

print(temp_string.replace("World!",""))
```
Result
```
Hello
```
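The examples above replace every occurrence. The optional third argument, count, limits how many occurrences are replaced, as this short sketch shows (the variable names are illustrative):

```python
temp_string = "1122334411"

replaced_all = temp_string.replace("1", "x")       # Replace every "1"
replaced_first = temp_string.replace("1", "x", 1)  # Replace only the first "1"

print(replaced_all)    # xx223344xx
print(replaced_first)  # x122334411
print(temp_string)     # 1122334411, the original string is unchanged
```

Strings are immutable, so .replace always returns a new string and leaves the original as it was.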
Uppercase

You can use the .upper method to return a copy of the string in upper case

Example

temp_string = “Hello World!” # Create a string variable with value “Hello World!”
print(temp_string.upper()) # Convert all characters in the string to uppercase and print that
```
temp_string = "Hello World!"

print(temp_string.upper())
```
Result
```
HELLO WORLD!
```
Lowercase

You can use the .lower method to return a copy of the string in lower case

Example

temp_string = “Hello World!” # Create a string variable with value “Hello World!”
print(temp_string.lower()) # Convert all characters in the string to lowercase and print that
```
temp_string = "Hello World!"

print(temp_string.lower())
```
Result
```
hello world!
```
Split

You can use the .split method to split the string. The .split method has 2 parameters (sep, maxsplit), and the result is a list

Example

temp_string = “Hello World!” # Create a string variable with value “Hello World!”
print(temp_string.split(” “)) # Split the string into a list using space as the separator and print that
```
temp_string = "Hello World!"

print(temp_string.split(" "))
```
Result
```
['Hello', 'World!']
```
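The separator and the maximum number of splits change the result in ways that are easy to miss. A short sketch with illustrative variable names:

```python
temp_string = "Hello  big World!"  # Note the two spaces after Hello

split_on_space = temp_string.split(" ")   # An explicit separator keeps empty strings
split_on_whitespace = temp_string.split() # No separator: runs of whitespace count as one
split_twice = "a,b,c,d".split(",", 2)     # Split at most 2 times

print(split_on_space)       # ['Hello', '', 'big', 'World!']
print(split_on_whitespace)  # ['Hello', 'big', 'World!']
print(split_twice)          # ['a', 'b', 'c,d']
```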
Join

You can use the .join method to convert a list of strings into one single string

Example

temp_items = [“Hello”, “World”, “1”] # Create a list of strings
print(“,”.join(temp_items)) # Join all elements of the list into a single string, separated by “,” and print that
```
temp_items = ["Hello","World","1"]

print(",".join(temp_items))
```
Result
```
Hello,World,1
```
Find

You can use .find to return the index of the first occurrence if found; otherwise, it returns -1

Example

temp_string = “0123456789” # Create a string variable with value “0123456789”
print(temp_string.find(“34”)) # Find the starting index of the substring “34” and print that
```
temp_string = "0123456789"

print(temp_string.find("34"))
```
Result
```
3
```
Count

You can use .count to return the number of non-overlapping occurrences; if there are none, it returns 0

Example

temp_string = “1122334455” # Create a string variable with value “1122334455”
print(temp_string.count(“1”)) # Count how many times the substring “1” appears in the string and print that
```
temp_string = "1122334455"

print(temp_string.count("1"))
```
Result
```
2
```
String Class

When you assign a string to a variable, Python creates a str object. The str object includes different methods, such as __str__, which returns the string itself. You can list them with the dir function

Example

temp_var = “test” # Create a variable named temp_var and assign it the string “test”
print(type(temp_var)) # Print the type of temp_var
print(dir(temp_var)) # Print the names of the attributes and methods of the str object
```
temp_var = "test"
print(type(temp_var))
print(dir(temp_var))
```
Result
```
<class 'str'>
['__add__', '__class__', '__contains__', '__delattr__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__getnewargs__', '__getstate__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__iter__', '__le__', '__len__', '__lt__', '__mod__', '__mul__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__rmod__', '__rmul__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', 'capitalize', 'casefold', 'center', 'count', 'encode', 'endswith', 'expandtabs', 'find', 'format', 'format_map', 'index', 'isalnum', 'isalpha', 'isascii', 'isdecimal', 'isdigit', 'isidentifier', 'islower', 'isnumeric', 'isprintable', 'isspace', 'istitle', 'isupper', 'join', 'ljust', 'lower', 'lstrip', 'maketrans', 'partition', 'removeprefix', 'removesuffix', 'replace', 'rfind', 'rindex', 'rjust', 'rpartition', 'rsplit', 'rstrip', 'split', 'splitlines', 'startswith', 'strip', 'swapcase', 'title', 'translate', 'upper', 'zfill']
```
Custom Example

class string(): # Define a class named string
    def __init__(self, var): # Constructor method, called when creating a new object
        self.var = var # Store the argument var in the instance variable self.var
    def __str__(self): # Define the string representation for printing
        return "{} __str__".format(self.var) # Return the string with " __str__" appended
    def __eq__(self, other): # Define equality comparison for string objects
        if isinstance(other, string): # Check if other is also an instance of string
            return (self.var == other.var) # Compare the stored values
        return False # If other is not a string object, return False
print(string("test") == string("test")) # Compare two string objects and print that
```
class string():
    def __init__(self, var):
        self.var = var

    def __str__(self):
        return "{} __str__".format(self.var)

    def __eq__(self, other):
        if isinstance(other, string):
            return (self.var == other.var)
        return False

print(string("test") == string("test"))
```
Result
```
True
```
March 8, 2026
Python Dictionaries
Dictionary

A dict is a data type that stores a collection of key:value pairs (each key is associated with a value). Keys have to be hashable (in practice, immutable types such as str, int, or tuple) and cannot be duplicated. Notice that dict and set both use the {} syntax: a dict holds key:value pairs, whereas a set holds only values. An empty {} creates a dict; use set() for an empty set. Dictionaries are also known as associative arrays, and Python implements them as hash tables.

Example

dict_1 = {“key_1”: “value_1”, “key_2”: “value_2”} # Create a dictionary with key-value pairs
set_1 = {“value_1”, “value_2”} # Create a set with two unique values
print(type(dict_1), “=”, dict_1) # Print the type of dict_1 and its contents
print(type(set_1), “=”, set_1) # Print the type of set_1 and its contents
```
dict_1 = {"key_1":"value_1","key_2":"value_2"}
set_1 = {"value_1","value_2"}

print(type(dict_1), "=", dict_1)
print(type(set_1), "=", set_1)
```
Result
```
<class 'dict'> = {'key_1': 'value_1', 'key_2': 'value_2'}
<class 'set'> = {'value_2', 'value_1'}
```
Structuring Data

Dictionaries are very powerful. Let's say the following is a list of users in a company:
- Jim drives a Toyota Tacoma 2010, and he is 44 years old
- Sara drives a Ford F-150 2021, and she is 31 years old
We can have that organized into a dict
```
{
    "Users": {
        "Jim": {
            "car": "Toyota Tacoma 2010",
            "age": 44
        },
        "Sara": {
            "car": "Ford F-150 2021",
            "age": 31
        }
    }
}
```
Or, we can structure that in a list of dictionaries
```
[
    {
        "name": "Jim",
        "car": "Toyota Tacoma 2010",
        "age": 44
    },
    {
        "name": "Sara",
        "car": "Ford F-150 2021",
        "age": 31
    }
]
```
Or, more structured (The more structured, the easier to search or analyze)
```
[
    {
        "name": "Jim",
        "car": {
            "model": "Tacoma",
            "make": "Toyota",
            "year": 2010
        },
        "age": 44
    },
    {
        "name": "Sara",
        "car": {
            "model": "F-150",
            "make": "Ford",
            "year": 2021
        },
        "age": 31
    }
]
```
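With the most structured form, searching becomes a matter of indexing into the nested dictionaries. The following sketch uses the same data; the variable names (users, ford_drivers, oldest_user, recent_cars) are illustrative.

```python
users = [
    {"name": "Jim", "car": {"model": "Tacoma", "make": "Toyota", "year": 2010}, "age": 44},
    {"name": "Sara", "car": {"model": "F-150", "make": "Ford", "year": 2021}, "age": 31},
]

# Names of users whose car was made by Ford
ford_drivers = [user["name"] for user in users if user["car"]["make"] == "Ford"]

# The user with the highest age
oldest_user = max(users, key=lambda user: user["age"])

# Names of users whose car is from 2015 or later
recent_cars = [user["name"] for user in users if user["car"]["year"] >= 2015]

print(ford_drivers)         # ['Sara']
print(oldest_user["name"])  # Jim
print(recent_cars)          # ['Sara']
```

With the car stored as one string ("Ford F-150 2021"), the same questions would require parsing the string first.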
Accessing Values

You can access a value by its key. If you have a dict named temp_dict that contains {"key_1":"value_1","key_2":"value_2"}, then you can access the value_1 by using key_1 as temp_dict["key_1"] and so on.

Example

temp_dict = {“key_1”: “value_1”, “key_2”: “value_2”} # Create a dictionary with two key-value pairs
print(temp_dict[“key_1”]) # Access the value associated with the key “key_1” and print it
```
temp_dict = {"key_1":"value_1","key_2":"value_2"}

print(temp_dict["key_1"])
```
Result
```
value_1
```
Or you can use the .get method

Example

temp_dict = {“key_1”: “value_1”, “key_2”: “value_2”} # Create a dictionary with two key-value pairs
print(temp_dict.get("key_1")) # Access the value associated with the key "key_1" using the .get method and print it
```
temp_dict= {"key_1":"value_1","key_2":"value_2"}
print(temp_dict.get("key_1"))
```
Result
```
value_1
```
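The two ways of accessing a value differ when the key is missing: square brackets raise KeyError, while .get returns None or a default that you supply. A short sketch (key_3 is a key that is deliberately absent):

```python
temp_dict = {"key_1": "value_1", "key_2": "value_2"}

missing_value = temp_dict.get("key_3")               # No default given, so None is returned
default_value = temp_dict.get("key_3", "not found")  # The second argument is the default

print(missing_value)  # None
print(default_value)  # not found

try:
    temp_dict["key_3"]  # Square brackets raise KeyError for a missing key
    key_error_raised = False
except KeyError:
    key_error_raised = True

print(key_error_raised)  # True
```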
Get All Keys

To get the keys of a dict, you can use the .keys method

Example

temp_dict = {“key_1”: “value_1”, “key_2”: “value_2”} # Create a dictionary with two key-value pairs
print(temp_dict.keys()) # Print all temp_dict keys
```
temp_dict = {"key_1":"value_1","key_2":"value_2"}

print(temp_dict.keys())
```
Result
```
dict_keys(['key_1', 'key_2'])
```
Get All Values

To get the values of a dict, you can use the .values method

Example

temp_dict = {“key_1”: “value_1”, “key_2”: “value_2”} # Create a dictionary with two key-value pairs
print(temp_dict.values()) # Print all temp_dict values
```
temp_dict = {"key_1":"value_1","key_2":"value_2"}
print(temp_dict.values())
```
Result
```
dict_values(['value_1', 'value_2'])
```
Add or Update key:value Pair

You can use the update method to add a new pair or update a current pair. Remember that a dict cannot have duplicate keys. So, if you use an existing key, the value will be updated. Otherwise, a new pair will be added to the dict

Example

temp_dict = {“key_1”: “value_1”, “key_2”: “value_2”} # Create a dictionary with two key-value pairs
temp_dict.update({“key_1”: “new_value”}) # Update the value of “key_1” to “new_value”
print(temp_dict) # Print the updated dictionary
temp_dict.update({“key_3”: “value_3”}) # Add a new key-value pair “key_3”: “value_3” to the dictionary
print(temp_dict) # Print the updated dictionary
```
temp_dict = {"key_1":"value_1","key_2":"value_2"}

temp_dict.update({"key_1":"new_value"})
print(temp_dict)

temp_dict.update({"key_3":"value_3"})
print(temp_dict)
```
Result
```
{'key_1': 'new_value', 'key_2': 'value_2'}
{'key_1': 'new_value', 'key_2': 'value_2', 'key_3': 'value_3'}
```
Modify a value by its Key

You can use an assignment statement (=) with the key of the value you want to change

Example

temp_dict = {“key_1”: “value_1”, “key_2”: “value_2”} # Create a dictionary with two key-value pairs
temp_dict[“key_1”] = “new_value” # Update the value of “key_1” to “new_value”
print(temp_dict[“key_1”]) # Access and print the updated value of “key_1”
```
temp_dict = {"key_1":"value_1","key_2":"value_2"}

temp_dict["key_1"] = "new_value"
print(temp_dict["key_1"])
```
Result
```
new_value
```
Length

You can use the len function, which will return the number of keys

Example

temp_dict = {“key_1”: “value_1”, “key_2”: “value_2”} # Create a dictionary with two key-value pairs
print(len(temp_dict)) # Print the number of key-value pairs in the dictionary
```
temp_dict = {"key_1":"value_1","key_2":"value_2"}

print(len(temp_dict))
```
Result
```
2
```
Delete a key:value Pair

To delete a key:value pair, use the del statement with the key

Example

temp_dict = {“key_1”: “value_1”, “key_2”: “value_2”} # Create a dictionary with two key-value pairs
del(temp_dict[“key_1”]) # Delete the key-value pair with key “key_1” from the dictionary
print(temp_dict) # Print the updated dictionary
```
temp_dict = {"key_1":"value_1","key_2":"value_2"}

del(temp_dict["key_1"])
print(temp_dict)
```
Result
```
{'key_2': 'value_2'}
```
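del raises KeyError if the key does not exist. When you also want the removed value, or want to tolerate a missing key, the .pop method is an alternative, sketched below with illustrative variable names:

```python
temp_dict = {"key_1": "value_1", "key_2": "value_2"}

removed_value = temp_dict.pop("key_1")        # Remove the pair and return its value
missing_value = temp_dict.pop("key_3", None)  # Key is absent, so the default is returned

print(removed_value)  # value_1
print(missing_value)  # None
print(temp_dict)      # {'key_2': 'value_2'}
```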
Pass By Reference

A dict is a mutable object, and a function receives a reference to the same object, so changes made inside the function are visible to the caller

Example

def change_value(param_in): # Define a function that takes one parameter called param_in
    param_in.update({2: "test"}) # Update the dictionary by adding a new key-value pair 2: "test"
var = {1: "test"} # Create a dictionary with one key-value pair 1: "test"
print("Value before passing: ", var) # Print the dictionary before calling the function
change_value(var) # Call the function; the dictionary is modified inside the function
print("Value after passing: ", var) # Print the dictionary after the function call
```
def change_value(param_in):
    param_in.update({2:"test"})

var = {1:"test"}

print("Value before passing: ", var)
change_value(var)
print("Value after passing: ", var)
```
Result
```
Value before passing:  {1: 'test'}
Value after passing:  {1: 'test', 2: 'test'}
```
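If the caller's dict should stay unchanged, one option is to pass a copy. The sketch below also shows that assigning a new dict to the parameter inside a function does not affect the caller. The function rebind_value and the variable snapshot are illustrative additions, and .copy makes a shallow copy only.

```python
def change_value(param_in):
    # Mutates the object that the caller passed in
    param_in.update({2: "test"})

def rebind_value(param_in):
    # Rebinds the local name only; the caller's dict is untouched
    param_in = {99: "new"}

var = {1: "test"}
snapshot = var.copy()  # Shallow copy: a new dict with the same keys and values

change_value(snapshot)  # Only the copy is modified
rebind_value(var)       # Has no visible effect on var

print(var)       # {1: 'test'}
print(snapshot)  # {1: 'test', 2: 'test'}
```

For nested dictionaries, a shallow copy still shares the inner objects; copy.deepcopy from the standard library copies those as well.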
March 8, 2026