Mimicking Human Activity using Selenium and Python

Intro

The Binary Defense threat hunting team are experts on today’s threat actor groups. In addition to monitoring criminal forums, we conduct our own research to share with the infosec community. This post summarizes a project our threat hunting team set up to mimic human activity in a controlled environment. We give an overview of our findings and explain how you can do something similar in your organization.

Be sure to check out the video demo as well!

Purpose of the Project

Binary Defense has set up a controlled lab environment, isolated from any other network, that threat actors can attack with no repercussions. Letting threat actors attack the lab gives our threat hunters and researchers data to analyze and use to prevent attacks on our clients’ infrastructure.

One of the first things attackers look for once they have access to a network is its size and level of activity. If a network is seemingly empty, the threat actor might move on to a target they feel is more worthwhile. To check activity, some malware variants use screen capture techniques to see what a person is doing on a machine, and this technique is becoming increasingly popular. By creating a script that mimics human activity programmatically, we can simulate that activity without needing real people on the machines.

What is Selenium?

Selenium is an open-source framework for browser automation and web application testing. It eliminates repetitive manual testing that consumes a lot of time and effort. Users can write scripts in languages such as Java, Python, Ruby, JavaScript, Perl, PHP and C# to run against browsers and virtual machines, so testers can work in a language they already know. It also allows for cross-browser compatibility testing using most standard browsers and runs on Windows, macOS, and Linux systems. Selenium manipulates a page by locating elements in its source code. While Selenium was developed for testing purposes, its browser manipulation capabilities allow for a wide range of use cases. These capabilities are what led Binary Defense to the idea of simulating human activity.

Requirements

The project source code is published on GitHub: https://github.com/stacycreasey/Browsing-Bot

To replicate this project, you will need a web browser and its corresponding driver. We used Chrome and ChromeDriver, along with Python, NumPy version 1.21.1, and Selenium version 4.0.0a6.post2. The Amazon code calls str.removeprefix(), which requires Python 3.9 or later. Selenium documentation can be found at https://selenium-python.readthedocs.io/.

Starting Webdriver Instance

main.py

from selenium import webdriver
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.chrome.service import Service
from selenium.common.exceptions import *

PATH = Service("/path/to/driver")  ##constant file path of Chrome driver

options = webdriver.ChromeOptions()  # Initializing Chrome Options from the Webdriver
options.add_experimental_option("useAutomationExtension", False)  # Adding Argument to Not Use Automation Extension
options.add_experimental_option("excludeSwitches", ["enable-automation"])  # Excluding enable-automation Switch
options.add_argument("disable-popup-blocking")
options.add_argument("disable-notifications")
options.add_argument("disable-gpu")  ##renderer timeout
   
driver = webdriver.Chrome(options=options, service=PATH)

Figure 1. Setting up Selenium Webdriver instance

Once the path to ChromeDriver is set and the options are added to suppress popups and hide Chrome’s “controlled by automated test software” notice, the driver starts and opens a blank Chrome window.

Fake Credentials for Web Form Grabbing

To allow attackers to grab credentials from websites with form grabbers, we made some fake credentials to log into a few websites. The usernames and passwords for these sites are saved in a text file called “usernamesPasswords.txt” in the format “username:password”, one pair per line. Keeping them in a text file makes it easy to change and add credentials as needed. Currently, the script signs into four websites solely for the purpose of having the credentials grabbed: Shein, Wish, AliExpress and Gearbest. The functionality and code for these four sites are almost identical, with minor changes to accommodate different element names and paths.

The first part of each of these functions splits the login credentials on “:” and pulls the pair that corresponds to the site.

AliExpress.py

def login_info():
    with open("usernamesPasswords.txt", "r") as infile:
        data = [line.rstrip().split(":") for line in infile]
        username = data[1][0]
        password = data[1][1]
    return username, password

Figure 2. Function to grab login credentials
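The hard-coded index in login_info() ties each site to a line number in the file. As a minimal sketch (not the project's code), the file contents could instead be parsed once into a list of pairs. The helper name parse_credentials is hypothetical, and splitting only on the first colon is an assumption made here so that a password containing “:” is not cut short.

```python
def parse_credentials(lines):
    """Turn 'username:password' lines into (username, password) tuples."""
    pairs = []
    for line in lines:
        line = line.strip()
        if not line or ":" not in line:
            continue  # skip blank or malformed lines
        # Split on the first colon only, so colons inside the password survive
        username, password = line.split(":", 1)
        pairs.append((username, password))
    return pairs
```

With a file object, this could be called as parse_credentials(infile), and each site function would then pick its own entry from the returned list.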

Selenium then checks whether an ad has popped up and, if one has, closes it.

def ad_popup(driver):
    try:
        WebDriverWait(driver,5).until(EC.visibility_of_element_located((By.CLASS_NAME, "coupon-poplayer-modal"))) ##coupon popup
        WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.CLASS_NAME, "btn-close"))).click()
    except:
        pass

Figure 3. Close ads

Selenium then waits for specific elements to be located and clickable to determine whether sign-in was successful.

def logged_in_check(driver):
    try:
        WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.ID, "nav-user-account"))).click()  ##drop down menu
        WebDriverWait(driver, 5).until(EC.presence_of_element_located((By.XPATH, "//b[@class='welcome-name']")))
        print("ALIEXPRESS: sign in successful")
        return True
    except:
        return False

Figure 4. Checking if Selenium successfully logged in

The next function opens the website with driver.get(“https://www.aliexpress.com/”), closes any ad, and calls logged_in_check(). If isLoggedIn == True, Selenium moves on to the next site. If isLoggedIn == False, Selenium looks up the login form elements in the page source and sends the credentials from login_info() to them. For example, on AliExpress the element ID for the username/email box is “fm-login-id”; Selenium waits until the element is visible, clicks on it with emailElement.click(), and types the username with emailElement.send_keys(username). The process is repeated for the password input box and the submit button, and then Selenium checks again whether login was successful. If it was not, the function calls itself again, incrementing the loginAttempts counter each time, until loginAttempts equals 4. If login has still not succeeded after four attempts, the script prints “ALIEXPRESS: sign in NOT successful” and moves on to the next site.

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import *
import time

loginAttempts = 0

def ali_express_run(driver):
    global loginAttempts

    if loginAttempts < 4:
        try:
            username, password = login_info()
            driver.get("https://www.aliexpress.com/")
            ad_popup(driver)

            isLoggedIn = logged_in_check(driver)

            if isLoggedIn == True:
                return
            elif isLoggedIn == False:
                try:
                    WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.ID, "nav-user-account"))).click() ##drop down menu
                    WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.CLASS_NAME, "sign-btn"))).click() ##sign in button
                    emailElement = WebDriverWait(driver, 10).until(EC.visibility_of_element_located((By.ID, "fm-login-id")))
                    emailElement.click()
                    time.sleep(1)
                    emailElement.send_keys(username)
                    time.sleep(1)
                    passwordElement = WebDriverWait(driver, 10).until(EC.visibility_of_element_located((By.ID, "fm-login-password")))
                    passwordElement.click()
                    time.sleep(1)
                    passwordElement.send_keys(password)
                    time.sleep(1)
                    WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.CLASS_NAME, "fm-button"))).click()  ##submit

                    isLoggedIn = logged_in_check(driver)

                    if isLoggedIn == True:
                        return
                    elif isLoggedIn == False:
                        loginAttempts += 1
                        ali_express_run(driver)
                except:
                    loginAttempts += 1
                    ali_express_run(driver)
        except:
            loginAttempts += 1
            ali_express_run(driver)
    else:
        print("ALIEXPRESS: sign in NOT successful")

Figure 5. Login function
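The login function retries by calling itself and tracking attempts in a global counter. The same four-attempt limit can also be expressed as a plain loop. The sketch below is an alternative under stated assumptions, not the project's implementation: run_with_retries is a hypothetical helper, and attempt stands for any zero-argument callable that returns True on a successful login.

```python
def run_with_retries(attempt, max_attempts=4):
    """Call attempt() until it returns True or max_attempts is reached.

    Returns the number of the successful attempt, or None if all failed.
    """
    for attempt_number in range(1, max_attempts + 1):
        try:
            if attempt():
                return attempt_number
        except Exception:
            # Treat any error (for example a Selenium timeout) as a failed attempt
            pass
    return None
```

A loop keeps the attempt count local to one call, so the counter starts from zero every time the function runs and no global state has to be reset between sites.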

Simulating User Activity

Mimicking realistic human activity involved multiple parts and websites. These functions vary more than the login functions as the websites used for this part of the project are not all the same type of website, e.g., ecommerce vs. games. The main part of this project was to simulate user browser activity in real time, meaning Selenium opens and closes pages and tabs, slowly scrolls through pages at the speed of a person, and even stops to “read” articles and product pages. Six websites are included for human activity purposes: Amazon, eBay, Etsy, Fox News, 2048 and Cookie Clicker.

For the ecommerce sites, popular search terms were put into text files, “keywordsAmazon.txt”, “keywordsEbay.txt”, and “keywordsEtsy.txt”. Search terms are chosen at random to give the illusion of different search activities. The number of searches made and product page numbers are also chosen at random. Seven to 20 pages were looked at for each search term, and 15 to 25 search terms were used. A function is used to split the keywords and add them to a list to be randomly chosen, similar to the splitting of the usernames and passwords.
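The keyword helper itself is not listed in this post, so the following is only a sketch of how the splitting and random selection could look. It assumes one search term per line in the keyword file; the names split_keywords and pick_search_terms are hypothetical.

```python
import random

def split_keywords(text):
    """Split the contents of a keyword file into a list, one term per line."""
    return [line.strip() for line in text.splitlines() if line.strip()]

def pick_search_terms(keywords, low=15, high=25, rng=random):
    """Pick a random number of terms (low..high inclusive), each chosen at random."""
    count = rng.randint(low, high)
    return [rng.choice(keywords) for _ in range(count)]
```

Because each term is drawn independently, the same keyword can come up more than once in a session, which matches how random.choice is used per iteration in the Amazon function.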

Each website uses its own URL template to determine the page number a user is on. Knowing the template allows us to manipulate the URL to act as if someone is pressing “next page.” By taking our search term and adding the site’s page template, we can fill in the page number in the URL to reach the different product pages.

amazon.py

def next_page(keyword):
    template = "https://www.amazon.com/s?k={}"
    keyword = keyword.replace(" ", "+")

    url = template.format(keyword)
    url += "&page={}"

    return url

Figure 6. Function for getting the next product page
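Replacing spaces with “+” works for simple keywords, but a term containing characters such as “&” would change the meaning of the query string. As a small sketch, the standard library's urllib.parse.quote_plus can do the encoding; the function name search_url is hypothetical, and the URL template is the one used above.

```python
from urllib.parse import quote_plus

def search_url(keyword, page):
    """Build an Amazon search URL for a keyword and a results page number."""
    # quote_plus turns spaces into '+' and percent-encodes reserved characters
    return "https://www.amazon.com/s?k={}&page={}".format(quote_plus(keyword), page)
```

For example, search_url("usb c cable", 3) yields the same URL shape as next_page() followed by url.format(3).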

After searching our first keyword, we can get the list of the different products on the page by finding the element associated with the results, in this case by class name “a-link-normal.a-text-normal,” and then grabbing the href attributes from the element and putting them into a separate list, making sure to remove duplicates.

def link_product(tempList, driver):  ##product links for opening in new tab
    resultList = driver.find_elements(By.CLASS_NAME, "a-link-normal.a-text-normal")  ##class name for search results
    links = [x.get_attribute("href") for x in resultList]  ##pull links for search results to open product page

    for m in links:  ##removing duplicates links because class name is broad and has multiple instances of same href
        if m not in tempList:
            tempList.append(m)

    links = tempList

    return links

Figure 7. Getting product links
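The duplicate removal can also be written without the temporary list. This is a minimal sketch using dict.fromkeys, which keeps the first occurrence of each link in its original order; the name unique_links is hypothetical, and dropping empty values is an added assumption for elements that have no href.

```python
def unique_links(hrefs):
    """Remove duplicate and empty hrefs while preserving first-seen order."""
    return list(dict.fromkeys(h for h in hrefs if h))
```

Preserving order matters here because the products are later scrolled to and opened in the order they appear on the results page.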

The product pages for Amazon have a container called “a-container” that covers the entirety of the product; the script gets the location and size of this container for slow scrolling in a later function.

def product_page(driver):  ##coords and dimensions to scroll through page
    container = WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.CLASS_NAME, "a-container")))
    containerLoc = container.location
    containerSize = container.size
    startY = containerLoc["y"]
    height = containerSize["height"]

    return startY, height

Figure 8. Location and size of product page container

The main function for each website is called “*_run” where “*” is the name of the site. For Amazon, the number of search terms and product pages are randomly determined with numKeywords = random.randint(15,25) and numPages = random.randint(7,20). The keywords are then randomly chosen from the get_keywords() function. The product page links are then received from the link_product() function. Each product on the search results page is then scrolled to and opened in a new tab. Selenium slow scrolls through the product page and when the end of the product page is reached, the product tab closes and returns to the search results tab. The same process is then repeated for each page and search term.

import numpy
from selenium import webdriver
from selenium.webdriver.common.keys import Keys  ##gives access to enter and escape key for results
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.chrome.service import Service
from selenium.common.exceptions import *
import time
import random 

def amazon_run(driver):
    numKeywords = random.randint(15,25) ##random num of keywords
    numPages = random.randint(7,20) ##random num of pages
    driver.get("https://www.amazon.com/")
    for i in range(numKeywords):
        keyword = random.choice(get_keywords()) ##random keyword selection
        url = next_page(keyword) ##get the product page num
        try:
            for j in range(1,numPages):
                tempList = []
                time.sleep(2)
                driver.get(url.format(j)) ##add page num to url with j
                time.sleep(5)
                links = link_product(tempList, driver) ##get links to products

                for k in range(0, len(links)):
                    parentWindow = driver.window_handles[0] ##main tab
                    time.sleep(1)
                    fullLink = links[k]  ##link for the current product
                    partialLink = fullLink.removeprefix("https://www.amazon.com")  ##href doesnt use this prefix but web element did
                    elementLink = driver.find_element(By.XPATH, '//a[contains(@href, "' + partialLink + '")]')
                    driver.execute_script("arguments[0].scrollIntoView({ behavior: 'smooth', block: 'center'});", elementLink) ##scroll to product
                    time.sleep(1)
                    driver.execute_script("window.open('" + fullLink + "')")  ##open product in new tab
                    childWindow = driver.window_handles[1]  ##product window
                    time.sleep(1)
                    driver.switch_to.window(childWindow)  ##without switch stays on parent (home) window
                    time.sleep(1)
                    startY, height = product_page(driver)
                    for l in numpy.arange(startY, height, random.uniform(0.04, 0.1)): ##slow scroll through product page
                        driver.execute_script("window.scrollTo(0, {});".format(l))
                    time.sleep(1)
                    driver.close()  ##closes child (product) window
                    driver.switch_to.window(parentWindow)  ##switch back to parent window
        except Exception as e:
            exception_handler(e, driver)

Figure 9. Opening the website and scrolling through products
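The slow scroll above works by asking the browser to scroll to a long series of closely spaced Y positions. The sketch below isolates that idea as a generator so the step size can be tested on its own. It is an illustration rather than the project's code: scroll_positions is a hypothetical name, and the default step range simply mirrors the random.uniform(0.04, 0.1) call in the function above.

```python
import random

def scroll_positions(start_y, end_y, min_step=0.04, max_step=0.1, rng=random):
    """Yield Y offsets from start_y up to (not including) end_y.

    One step size is drawn at random per page, as in the loop above.
    """
    step = rng.uniform(min_step, max_step)
    y = float(start_y)
    while y < end_y:
        yield y
        y += step

# Hypothetical usage with a Selenium driver (not run here):
# for y in scroll_positions(startY, height):
#     driver.execute_script("window.scrollTo(0, arguments[0]);", y)
```

With steps this small, the perceived scroll speed is set mostly by how long each execute_script round trip takes, so larger steps combined with a short sleep are another way to control the pace.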

The game functions are less complex than the other ones. The point of including the games is to use them as filler space to allow the script to run as long as possible.

The Cookie Clicker game is as simple as it gets: Selenium clicks on a cookie to earn money and buy upgrades. The amount of money is read from the element with ID “money,” and the upgrades are found in elements matching the CSS selector “#store b.” If the price of an upgrade is smaller than the amount of money, Selenium buys it.

cookieClicker.py

from selenium import webdriver
from selenium.common.exceptions import ElementNotInteractableException, NoSuchElementException, StaleElementReferenceException
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import time

def cookie_run(driver):
    driver.get("http://orteil.dashnet.org/experiments/cookie/")

    cookie = WebDriverWait(driver, 10).until(EC.visibility_of_element_located((By.ID, "cookie")))

    timeout = time.time() + 5
    game_time = time.time() + 60 * 30  # stop playing after 30 minutes

    while True:
        time.sleep(0.1)
        cookie.click()

        if time.time() > timeout:
            currentMoney = int("".join(driver.find_element(By.ID, "money").text.split(",")))
            access = driver.find_elements(By.CSS_SELECTOR, "#store b")
            upgrades = [
                {"id": f"buy{i.text.split('-')[0].strip()}", "price": int("".join(i.text.split("-")[1].strip().split(",")))}
                for i in access[:-1]
            ]

            for item in upgrades[::-1]:
                if item["price"] < currentMoney:
                    buy = driver.find_element(By.ID, item["id"])
                    buy.click()
                    break

            timeout = time.time() + 5

        if time.time() > game_time:
            time.sleep(5)
            break

Figure 10. Cookie Clicker game
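The list comprehension in the game loop packs the text parsing into one line. The sketch below pulls the same logic into two small functions that can be checked without a browser. It assumes, as the code above does, that each store entry reads “Name - price” with commas as thousands separators; parse_store_item and best_affordable are hypothetical names.

```python
def parse_store_item(text):
    """Convert store text such as 'Grandma - 1,000' into an id/price record."""
    name, price = text.split("-", 1)
    return {"id": "buy" + name.strip(), "price": int(price.strip().replace(",", ""))}

def best_affordable(items, money):
    """Return the most expensive item priced below money, or None."""
    affordable = [item for item in items if item["price"] < money]
    return max(affordable, key=lambda item: item["price"], default=None)
```

Choosing the most expensive affordable upgrade has the same effect as iterating the store list in reverse and stopping at the first match, provided the store lists upgrades from cheapest to most expensive.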

All of these functions are imported into main.py, where the activity functions are run in random order. Two lists are made, one for activity functions and one for login functions. They are kept separate so that all of the login functions run at the beginning of the script, because they are supposed to be done quickly and quietly. The activity list is randomly shuffled, and both lists are then iterated through until completion. Start and end times are also included for testing purposes. The full main.py file is listed below.

import numpy
from selenium import webdriver
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.chrome.service import Service
from selenium.common.exceptions import *
import time
import random
import datetime
from amazon import amazon_run
from foxnews import fox_news_run
from etsy import etsy_run
from aliExpress import ali_express_run
from gearbest import gearbest_run
from wish import wish_run
from shein import shein_run
from game2048 import game_run
from ebay import ebay_run
from cookieClicker import cookie_run

functionList = [amazon_run, etsy_run, fox_news_run, game_run, cookie_run, ebay_run]
loginList = [gearbest_run, ali_express_run, wish_run, shein_run]

def setDriver():
    PATH = Service("/path/to/driver")  ##constant file path of Chrome driver

    options = webdriver.ChromeOptions()  # Initializing Chrome Options from the Webdriver
    options.add_experimental_option("useAutomationExtension", False)  # Adding Argument to Not Use Automation Extension
    options.add_experimental_option("excludeSwitches", ["enable-automation"])  # Excluding enable-automation Switch
    options.add_argument("disable-popup-blocking")
    options.add_argument("disable-notifications")
    options.add_argument("disable-gpu")  ##renderer timeout

    driver = webdriver.Chrome(options=options, service=PATH)

    return driver

def main():
    start = datetime.datetime.now()
    random.shuffle(functionList) ##shuffles list of functions
    driver = setDriver()
    driver.get("https://www.google.com")
    for i in range(len(loginList)): ##go through login functions
        loginList[i](driver)
        time.sleep(2)
    for i in range(len(functionList)): ##go through functions in shuffled order
        functionList[i](driver)
        time.sleep(2)

    end = datetime.datetime.now()
    executionTime = end - start

    print("Start time - " + str(start))
    print("End time - " + str(end))
    print("Execution Time - " + str(executionTime))

if __name__ == "__main__":
    main()

Figure 11. Completed main.py
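The ordering logic in main() can be reduced to a few lines, which makes it easy to confirm that logins always come first. This is a sketch with a hypothetical helper name, build_run_order; it copies the activity list before shuffling so that the module-level list is left unchanged.

```python
import random

def build_run_order(login_funcs, activity_funcs, rng=random):
    """Return login functions in their given order, then shuffled activities."""
    activities = list(activity_funcs)  # copy so the caller's list is not mutated
    rng.shuffle(activities)
    return list(login_funcs) + activities
```

In main(), the result could be consumed with a single loop that calls each function with the driver and sleeps between calls.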

Advanced Features for Mouse Movement

To simulate humanlike mouse movements, we recommend bezier version 2021.2.12 and pyautogui version 0.9.53.

Real human mouse movement does not follow a straight line. People tend to move the mouse along a curved path that can be approximated by a Bezier curve. For a cubic curve, two control points, b1 and b2, shape the path between the start and end points, b0 and b3. Hand movement tracks this kind of curve quite closely, so it makes a great feature for simulating human activity.

When scrolling through a webpage, you might also notice that your mouse shakes just a tiny bit as you scroll with the wheel. pyautogui can determine the location of the mouse relative to the computer screen and move it to the next HTML element along a Bezier curve. The same approach can simulate the mouse jitter while scrolling, and the script also types slowly, letter by letter, at varying speeds.

The slow typing function takes the element to type into, pageElem, and the text to type, pageInput. It iterates through the string one character at a time and sends each character to the element after a random pause, which gives a realistic typing speed.
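To make the role of the four points concrete, here is a small sketch that evaluates a cubic Bezier curve directly from its standard polynomial form, without the bezier library. The function name cubic_bezier is hypothetical; the point names follow the b0 to b3 labels used above.

```python
def cubic_bezier(b0, b1, b2, b3, t):
    """Return the (x, y) point on a cubic Bezier curve at parameter t in [0, 1]."""
    u = 1.0 - t
    # Bernstein weights for the start point, two control points, and end point
    w0, w1, w2, w3 = u ** 3, 3 * u ** 2 * t, 3 * u * t ** 2, t ** 3
    x = w0 * b0[0] + w1 * b1[0] + w2 * b2[0] + w3 * b3[0]
    y = w0 * b0[1] + w1 * b1[1] + w2 * b2[1] + w3 * b3[1]
    return x, y
```

At t = 0 the result is b0 and at t = 1 it is b3, while b1 and b2 pull the path away from the straight line in between. Stepping t from 0 to 1 in small increments gives the sequence of positions for the mouse to visit.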

def slow_type(pageElem, pageInput):
    for letter in pageInput:
        time.sleep(float(random.uniform(.05, .3)))
        pageElem.send_keys(letter)

Figure 12. Slow typing function

panelHeight = driver.execute_script('return window.outerHeight - window.innerHeight;') gives the combined height of the browser’s own panels (tab strip, address bar, and toolbars), which is used later to translate element locations on the page into mouse locations on the screen. The bezier_mouse function determines the absolute x and y locations of a page element and finds the middle of the element for the mouse to move to. The starting position of the mouse can be found with pyautogui.position(). Control points are then determined from the coordinates of the start and end locations of the mouse. The curve is created from the control points and the degree of the curve, and pyautogui then moves the mouse along it.

import bezier
import numpy as np
import pyautogui

def bezier_mouse(location, size, panelHeight): ##move mouse to middle of element
    x, relY = location["x"], location["y"] ##abs X and relative Y
    absY = relY + panelHeight
    w, h = size["width"], size["height"]
    wCenter = w/2
    hCenter = h/2
    xCenter = int(wCenter + x)
    yCenter = int(hCenter + absY)

    start = pyautogui.position()
    end = xCenter, yCenter

    x2 = (start[0] + end[0]) / 2 #midpoint x
    y2 = (start[1] + end[1]) / 2 ##midpoint y

    control1X = (start[0] + x2) / 2
    control1Y = (end[1] + y2) / 2

    control2X = (end[0] + x2) / 2
    control2Y = (start[1] + y2) / 2

    # Two intermediate control points that may be adjusted to modify the curve.
    control1 = control1X, y2 ##combine midpoints to create perfect curve
    control2 = control2X, y2

    # Format points to use with bezier
    control_points = np.array([start, control1, control2, end])
    points = np.array([control_points[:, 0], control_points[:, 1]])  # Split x and y coordinates
    
    # You can set the degree of the curve here, should be less than # of control points
    degree = 3
    
    # Create the bezier curve
    curve = bezier.Curve(points, degree)

    curve_steps = 50  # How many points the curve should be split into. Each is a separate pyautogui.moveTo() execution
    delay = 0.003  # Time between movements. A delay of 1/curve_steps would take about 1 second for the entire curve

    # Move the mouse
    for j in range(1, curve_steps + 1):
        # The evaluate method takes a float from [0.0, 1.0] and returns the coordinates at that point in the curve
        # Another way of thinking about it is that j/curve_steps gets the coordinates at (100*j/curve_steps) percent into the curve
        x, y = curve.evaluate(j / curve_steps)
        pyautogui.moveTo(x, y)  # Move to point in curve
        pyautogui.sleep(delay)  # Wait delay

Figure 13. Mouse movement function
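The first few lines of bezier_mouse convert an element's page position into a screen position for pyautogui. The sketch below isolates that arithmetic. The name element_center is hypothetical, and the mapping rests on assumptions worth stating: the browser window sits at the top-left corner of the screen, the page is not scrolled, and panel_height is the window.outerHeight minus window.innerHeight value described above.

```python
def element_center(location, size, panel_height):
    """Return the screen (x, y) of the middle of an element.

    location and size are dicts shaped like Selenium's element.location
    and element.size; panel_height offsets Y by the browser's own panels.
    """
    x_center = int(location["x"] + size["width"] / 2)
    y_center = int(location["y"] + panel_height + size["height"] / 2)
    return x_center, y_center
```

If the page has been scrolled or the window moved, the scroll offset and window position would also need to be taken into account before handing the coordinates to pyautogui.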

I personally move my mouse to the right of a screen to move it out of the way when browsing, so I created a function to move the mouse to a random position on the right of the screen using bezier curves. The two functions are almost identical.

def resting_mouse(): #move mouse to right of screen

    start = pyautogui.position()
    end = random.randint(1600,1750), random.randint(400,850)

    x2 = (start[0] + end[0])/2 #midpoint x
    y2 = (start[1] + end[1]) / 2 ##midpoint y

    control1X = (start[0] + x2)/2
    control2X = (end[0] + x2) / 2

    # Two intermediate control points that may be adjusted to modify the curve.
    control1 = control1X, y2 ##combine midpoints to create perfect curve
    control2 = control2X, y2 ## using y2 for both to get a more linear curve

    # Format points to use with bezier
    control_points = np.array([start, control1, control2, end])
    points = np.array([control_points[:, 0], control_points[:, 1]])  # Split x and y coordinates
    # You can set the degree of the curve here, should be less than # of control points
    degree = 3
    # Create the bezier curve
    curve = bezier.Curve(points, degree)

    curve_steps = 50  # How many points the curve should be split into. Each is a separate pyautogui.moveTo() execution
    delay = 0.003  # Time between movements