Intro
The Binary Defense threat hunting team are experts on today’s threat actor groups. In addition to monitoring criminal forums, we conduct our own research to share with the infosec community. This post is a summary of a project our threat hunting team has set up to be able to mimic human activity in a controlled environment. In this post, we will give you an overview of our findings, and how you can do something similar in your organization.
Be sure to check out the video demo as well!
Purpose of the Project
Binary Defense has set up a controlled lab environment, isolated from any other network, that threat actors can attack with no repercussions. Letting threat actors attack the lab environment gives our threat hunters and researchers data to analyze and use to prevent attacks on our clients’ infrastructure.
One of the main things attackers look for as soon as they have access to a network is its size and level of activity. If a network is seemingly empty, the threat actor might move on to a target they feel is more worthwhile. To check activity, some malware variants use screen capture techniques to see what a person is doing on a machine, and this technique is becoming increasingly popular. By creating a script that mimics human activity programmatically, we can simulate the activity without needing real people on the machines.
What is Selenium?
Selenium is an open-source framework for browser automation and web application testing. It eliminates repetitive manual testing that consumes a lot of time and effort. Users can write scripts in languages such as Java, Python, Ruby, JavaScript, Perl, PHP and C# to run against browsers and virtual machines, so testers can work in the language they already know. It also allows for cross-browser compatibility testing with most standard browsers on Windows, macOS, and Linux systems. Selenium manipulates a page by finding elements in the page’s source code. While Selenium was developed for testing purposes, its browser manipulation capabilities allow for a wide range of use cases. These capabilities are what led Binary Defense to the idea of simulating human activity.
Requirements
The project source code is published on GitHub: https://github.com/stacycreasey/Browsing-Bot
To replicate this project, you will need a web browser and its corresponding driver. We used Chrome and ChromeDriver, Python, NumPy version 1.21.1, and Selenium version 4.0.0a6.post2. Selenium documentation can be found at https://selenium-python.readthedocs.io/.
Starting Webdriver Instance
main.py
from selenium import webdriver
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.chrome.service import Service
from selenium.common.exceptions import *

PATH = Service("/path/to/driver")  ##constant file path of Chrome driver
options = webdriver.ChromeOptions()  # Initializing Chrome Options from the Webdriver
options.add_experimental_option("useAutomationExtension", False)  # Adding Argument to Not Use Automation Extension
options.add_experimental_option("excludeSwitches", ["enable-automation"])  # Excluding enable-automation Switch
options.add_argument("disable-popup-blocking")
options.add_argument("disable-notifications")
options.add_argument("disable-gpu")  ##renderer timeout
driver = webdriver.Chrome(options=options, service=PATH)
Figure 1. Setting up Selenium Webdriver instance
After setting the path to Chrome driver and adding options to remove popups and turn off a Chrome feature that says the browser is being used with automation, the driver will start and open a blank Chrome webpage.
Fake Credentials for Web Form Grabbing
To allow attackers to grab credentials from websites with form grabbers, we made some fake credentials to log in to a few websites. The usernames and passwords for these sites are saved in a text file called “usernamesPasswords.txt” in the format “username:password”, one site per line. This text file makes it easier to change and add new credentials as needed. Currently, the script signs in to four websites solely for the purpose of having the credentials grabbed. The four sites are Shein, Wish, AliExpress and Gearbest. The functionality and code for these four sites are almost identical but include minor changes to accommodate different element names and paths.
The first part of each of these functions splits every line of the credentials file on “:” and pulls the entry that corresponds to the site.
AliExpress.py
def login_info():
    with open("usernamesPasswords.txt", "r") as infile:
        data = [line.rstrip().split(":") for line in infile]
    username = data[1][0]  ##second line of the file holds the AliExpress credentials
    password = data[1][1]
    return username, password
Figure 2. Function to grab login credentials
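As a rough sketch of the same parsing step, the lookup could take the line index as a parameter and split on the first colon only, so that a password containing “:” stays intact. The name parse_credentials and the sample values below are hypothetical and are not part of the project.

```python
def parse_credentials(lines, index):
    """Return (username, password) from the entry at `index`.

    Splits on the first ":" only, so a password that itself
    contains ":" is kept whole. Blank lines are ignored.
    """
    entries = [line.strip() for line in lines if line.strip()]
    username, password = entries[index].split(":", 1)
    return username, password


# Hypothetical sample data standing in for usernamesPasswords.txt
sample = ["alice@example.com:hunter2\n", "bob@example.com:pa:ss\n"]
print(parse_credentials(sample, 1))
```

Passing the index in, rather than hard-coding it per site, would let all four login modules share one helper.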
Selenium then checks whether an ad has popped up and, if it has, closes it.
def ad_popup(driver):
    try:
        WebDriverWait(driver, 5).until(EC.visibility_of_element_located((By.CLASS_NAME, "coupon-poplayer-modal")))  ##coupon popup
        WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.CLASS_NAME, "btn-close"))).click()
    except Exception:  ##no popup appeared (or it could not be closed), so carry on
        pass
Figure 3. Close ads
Selenium then waits for specific elements to be located and clickable to determine whether sign-in was successful.
def logged_in_check(driver):
    try:
        WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.ID, "nav-user-account"))).click()  ##drop down menu
        WebDriverWait(driver, 5).until(EC.presence_of_element_located((By.XPATH, "//b[@class='welcome-name']")))
        print("ALIEXPRESS: sign in successful")
        return True
    except Exception:  ##welcome element never appeared, so treat as not logged in
        return False
Figure 4. Checking if Selenium successfully logged in
The next function opens the website with driver.get(“https://www.aliexpress.com/”) and makes up to four login attempts with the credentials from login_info(). If isLoggedIn == True, Selenium moves on to the next site. If isLoggedIn == False, the script looks up the login form elements in the page source and sends the credentials to them. For example, on AliExpress the element ID for the username/email box is “fm-login-id”; Selenium waits until the element is visible, clicks on it with emailElement.click(), and types the username with emailElement.send_keys(username). The process is then repeated for the password input box and submit button, and Selenium checks again to see whether the login was successful. If it was not, the function calls itself again until the variable loginAttempts equals 4. If the login is still not successful after four attempts, the script prints “ALIEXPRESS: sign in NOT successful” and moves on to the next site.
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import *
import time

loginAttempts = 0

def ali_express_run(driver):
    global loginAttempts
    if loginAttempts < 4:
        try:
            username, password = login_info()
            driver.get("https://www.aliexpress.com/")
            ad_popup(driver)
            isLoggedIn = logged_in_check(driver)
            if isLoggedIn == True:
                return
            elif isLoggedIn == False:
                try:
                    WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.ID, "nav-user-account"))).click()  ##drop down menu
                    WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.CLASS_NAME, "sign-btn"))).click()  ##sign in button
                    emailElement = WebDriverWait(driver, 10).until(EC.visibility_of_element_located((By.ID, "fm-login-id")))
                    emailElement.click()
                    time.sleep(1)
                    emailElement.send_keys(username)
                    time.sleep(1)
                    passwordElement = WebDriverWait(driver, 10).until(EC.visibility_of_element_located((By.ID, "fm-login-password")))
                    passwordElement.click()
                    time.sleep(1)
                    passwordElement.send_keys(password)
                    time.sleep(1)
                    WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.CLASS_NAME, "fm-button"))).click()  ##submit
                    isLoggedIn = logged_in_check(driver)
                    if isLoggedIn == True:
                        return
                    elif isLoggedIn == False:
                        loginAttempts += 1
                        ali_express_run(driver)
                except Exception:  ##form element missing or not clickable: count the attempt and retry
                    loginAttempts += 1
                    ali_express_run(driver)
        except Exception:  ##page failed to load or credentials could not be read: count the attempt and retry
            loginAttempts += 1
            ali_express_run(driver)
    else:
        print("ALIEXPRESS: sign in NOT successful")
Figure 5. Login function
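The retry behavior in Figure 5 relies on recursion and a global counter. A loop is one possible alternative; the sketch below is only an illustration under that assumption, and run_with_retries is a hypothetical helper, not something from the project. It accepts any zero-argument callable that returns True on success.

```python
def run_with_retries(attempt, max_attempts=4):
    """Call `attempt()` until it returns True or attempts run out.

    Exceptions count as a failed attempt, mirroring the except
    branches in the login function. Returns the number of attempts
    used on success, or None if every attempt failed.
    """
    for attempt_number in range(1, max_attempts + 1):
        try:
            if attempt():
                return attempt_number
        except Exception:
            pass  # treat any error as a failed attempt and try again
    return None


# Stand-in for a login routine that succeeds on the third try
outcomes = iter([False, False, True])
used = run_with_retries(lambda: next(outcomes))
print(used)
```

Because the counter is local to the call, the helper could be reused for each site without resetting any shared state.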
Simulating User Activity
Mimicking realistic human activity involved multiple parts and websites. These functions vary more than the login functions as the websites used for this part of the project are not all the same type of website, e.g., ecommerce vs. games. The main part of this project was to simulate user browser activity in real time, meaning Selenium opens and closes pages and tabs, slowly scrolls through pages at the speed of a person, and even stops to “read” articles and product pages. Six websites are included for human activity purposes: Amazon, eBay, Etsy, Fox News, 2048 and Cookie Clicker.
For the ecommerce sites, popular search terms were put into text files, “keywordsAmazon.txt”, “keywordsEbay.txt”, and “keywordsEtsy.txt”. Search terms are chosen at random to give the illusion of different search activities. The number of searches made and product page numbers are also chosen at random. Seven to 20 pages were looked at for each search term, and 15 to 25 search terms were used. A function is used to split the keywords and add them to a list to be randomly chosen, similar to the splitting of the usernames and passwords.
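The random selection described above can be sketched in a few lines. This is a minimal illustration that assumes the keywords have already been loaded into a list; pick_search_plan and the sample keywords are hypothetical names and values.

```python
import random


def pick_search_plan(keywords, rng=random):
    """Choose how many searches and pages to run, and which terms.

    The ranges follow the article: 15 to 25 search terms and
    7 to 20 pages. Terms are drawn with replacement, so the same
    term can come up more than once.
    """
    num_keywords = rng.randint(15, 25)
    num_pages = rng.randint(7, 20)
    terms = [rng.choice(keywords) for _ in range(num_keywords)]
    return terms, num_pages


keywords = ["usb cable", "desk lamp", "water bottle"]  # hypothetical keywords
terms, num_pages = pick_search_plan(keywords, random.Random(0))
print(len(terms), num_pages)
```

Passing a seeded random.Random instance, as in the usage line, makes a run repeatable while testing; the default uses the shared random module.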
Each website uses its own URL template to determine the page number a user is on. Knowing the template allows us to manipulate the URL to act as if someone is pressing “next page.” By taking our search term and adding the site’s page template, we can replace parts of the URL to reach the different product pages.
amazon.py
def next_page(keyword):
    template = "https://www.amazon.com/s?k={}"
    keyword = keyword.replace(" ", "+")
    url = template.format(keyword)
    url += "&page={}"  ##placeholder filled in later with the page number
    return url
Figure 6. Function for getting the next product page
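If the keyword files ever contain characters other than letters and spaces, replacing spaces by hand may not be enough. A hedged variant using the standard library’s quote_plus is sketched below; search_url is a hypothetical name, and the URL layout is simply the one shown in Figure 6.

```python
from urllib.parse import quote_plus


def search_url(keyword, page):
    """Build a search-results URL for `keyword` at page number `page`."""
    # quote_plus turns spaces into "+" and percent-encodes characters
    # such as "&" that would otherwise break the query string.
    return "https://www.amazon.com/s?k={}&page={}".format(quote_plus(keyword), page)


print(search_url("usb c cable", 2))
# https://www.amazon.com/s?k=usb+c+cable&page=2
```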
After searching our first keyword, we can get the list of the different products on the page by finding the element associated with the results, in this case by class name “a-link-normal.a-text-normal,” and then grabbing the href attributes from the element and putting them into a separate list, making sure to remove duplicates.
def link_product(tempList, driver):  ##product links for opening in new tab
    resultList = driver.find_elements(By.CLASS_NAME, "a-link-normal.a-text-normal")  ##class name for search results
    links = [x.get_attribute("href") for x in resultList]  ##pull links for search results to open product page
    for m in links:  ##removing duplicate links because class name is broad and has multiple instances of same href
        if m not in tempList:
            tempList.append(m)
    links = tempList
    return links
Figure 7. Getting product links
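The de-duplication step can also be written without the helper list. The sketch below is one possible approach, not the project’s code; unique_in_order is a hypothetical name, and the sample hrefs are made up. It also drops None, which get_attribute returns when an element has no href.

```python
def unique_in_order(links):
    """Drop empty values and duplicates while keeping first-seen order."""
    # dict keys are unique and preserve insertion order (Python 3.7+)
    return list(dict.fromkeys(link for link in links if link))


hrefs = ["/dp/A1", "/dp/B2", "/dp/A1", None, "/dp/C3", "/dp/B2"]
print(unique_in_order(hrefs))
# ['/dp/A1', '/dp/B2', '/dp/C3']
```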
The product pages for Amazon have a container called “a-container” that covers the entirety of the product; the script gets the location and size of this container for slow scrolling in a later function.
def product_page(driver):  ##coords and dimensions to scroll through page
    container = WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.CLASS_NAME, "a-container")))
    containerLoc = container.location
    containerSize = container.size
    startY = containerLoc["y"]
    height = containerSize["height"]
    return startY, height
Figure 8. Location and size of product page container
The main function for each website is called “*_run” where “*” is the name of the site. For Amazon, the number of search terms and product pages are randomly determined with numKeywords = random.randint(15,25) and numPages = random.randint(7,20). The keywords are then randomly chosen from the get_keywords() function. The product page links are then received from the link_product() function. Each product on the search results page is then scrolled to and opened in a new tab. Selenium slow scrolls through the product page and when the end of the product page is reached, the product tab closes and returns to the search results tab. The same process is then repeated for each page and search term.
import numpy
from selenium import webdriver
from selenium.webdriver.common.keys import Keys  ##gives access to enter and escape key for results
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.chrome.service import Service
from selenium.common.exceptions import *
import time
import random

def amazon_run(driver):
    numKeywords = random.randint(15,25)  ##random num of keywords
    numPages = random.randint(7,20)  ##random num of pages
    driver.get("https://www.amazon.com/")
    for i in range(numKeywords):
        keyword = random.choice(get_keywords())  ##random keyword selection
        url = next_page(keyword)  ##get the product page num
        try:
            for j in range(1,numPages):
                tempList = []
                time.sleep(2)
                driver.get(url.format(j))  ##add page num to url with j
                time.sleep(5)
                links = link_product(tempList, driver)  ##get links to products
                for k in range(0, len(links)):
                    parentWindow = driver.window_handles[0]  ##main tab
                    time.sleep(1)
                    fullLink = links[k]  ##link for the current product
                    partialLink = fullLink.removeprefix("https://www.amazon.com")  ##href doesnt use this prefix but web element did (removeprefix needs Python 3.9+)
                    elementLink = driver.find_element(By.XPATH, '//a[contains(@href, "' + partialLink + '")]')
                    driver.execute_script("arguments[0].scrollIntoView({ behavior: 'smooth', block: 'center'});", elementLink)  ##scroll to product
                    time.sleep(1)
                    driver.execute_script("window.open('" + fullLink + "')")  ##open product in new tab
                    childWindow = driver.window_handles[1]  ##product window
                    time.sleep(1)
                    driver.switch_to.window(childWindow)  ##without switch stays on parent (home) window
                    time.sleep(1)
                    startY, height = product_page(driver)
                    for l in numpy.arange(startY, height, random.uniform(0.04, 0.1)):  ##slow scroll through product page
                        driver.execute_script("window.scrollTo(0, {});".format(l))
                    time.sleep(1)
                    driver.close()  ##closes child (product) window
                    driver.switch_to.window(parentWindow)  ##switch back to parent window
        except Exception as e:
            exception_handler(e, driver)
Figure 9. Opening the website and scrolling through products
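The scrolling loop in Figure 9 uses one fixed step size per page. A variation, offered here only as a sketch and not as the project’s method, is to vary the step on every move so the motion is less uniform. The name scroll_offsets and the 40 to 120 pixel step range are assumptions chosen for illustration.

```python
import random


def scroll_offsets(start_y, height, min_step=40, max_step=120, rng=random):
    """Yield increasing scroll positions from start_y to start_y + height.

    Step sizes vary randomly between min_step and max_step pixels.
    The step range is an assumed value, not taken from the project.
    """
    position = float(start_y)
    end = float(start_y + height)
    while position < end:
        yield position
        position += rng.uniform(min_step, max_step)
    yield end  # finish exactly at the bottom of the container


offsets = list(scroll_offsets(100, 500, rng=random.Random(1)))
print(offsets[0], offsets[-1])
```

Each yielded value could then be passed to the browser with driver.execute_script("window.scrollTo(0, arguments[0]);", y), with a short time.sleep between calls to set the overall pace.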
The game functions are less complex than the other ones. The point of including the games is to use them as filler space to allow the script to run as long as possible.
The Cookie Clicker game is as simple as it gets: Selenium clicks on a cookie to earn money and buy upgrades. The amount of money is read from the element with ID “money,” and the upgrades are found in elements matching the CSS selector “#store b.” If the price of an upgrade is smaller than the amount of money, Selenium buys it.
cookieClicker.py
from selenium import webdriver
from selenium.common.exceptions import ElementNotInteractableException, NoSuchElementException, StaleElementReferenceException
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import time

def cookie_run(driver):
    driver.get("http://orteil.dashnet.org/experiments/cookie/")
    cookie = WebDriverWait(driver, 10).until(EC.visibility_of_element_located((By.ID, "cookie")))
    timeout = time.time() + 5  ##check the store every 5 seconds
    game_time = time.time() + 60 * 30  ##stop playing after 30 minutes
    while True:
        time.sleep(0.1)
        cookie.click()
        if time.time() > timeout:
            currentMoney = int("".join(driver.find_element(By.ID, "money").text.split(",")))
            access = driver.find_elements(By.CSS_SELECTOR, "#store b")
            ##each store label looks like "Name - price"; build the buy button id and numeric price
            upgrades = [
                {"id": f"buy{i.text.split('-')[0].strip()}",
                 "price": int("".join(i.text.split("-")[1].strip().split(",")))}
                for i in access[:-1]
            ]
            for item in upgrades[::-1]:  ##most expensive upgrade first
                if item["price"] < currentMoney:
                    buy = driver.find_element(By.ID, item["id"])
                    buy.click()
                    break
            timeout = time.time() + 5
        if time.time() > game_time:
            time.sleep(5)
            break
Figure 10. Cookie Clicker game
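The purchase decision in Figure 10 can be separated from the browser calls, which makes it easier to check in isolation. The sketch below assumes store labels of the form “Name - price”, as implied by the parsing in the figure; parse_upgrade, best_affordable, and the sample labels are hypothetical.

```python
def parse_upgrade(text):
    """Turn a store label such as "Factory - 1,000" into an id and a price."""
    name, price = text.split("-", 1)
    return {"id": "buy" + name.strip(), "price": int(price.strip().replace(",", ""))}


def best_affordable(upgrades, money):
    """Return the most expensive upgrade costing less than `money`, else None."""
    affordable = [u for u in upgrades if u["price"] < money]
    return max(affordable, key=lambda u: u["price"]) if affordable else None


labels = ["Cursor - 15", "Grandma - 100", "Factory - 1,000"]  # assumed label format
upgrades = [parse_upgrade(t) for t in labels]
print(best_affordable(upgrades, 150))
# {'id': 'buyGrandma', 'price': 100}
```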
All of these functions are imported into main.py. Two lists are made, one for activity functions and one for login functions. They’re separated to allow all of the login functions to run at the beginning of the script because they are supposed to be done quickly and quietly. The activity list is randomly shuffled, and both lists are then iterated through until completion. Start and end times are also included for testing purposes. The full main.py file is listed below.
import numpy
from selenium import webdriver
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.chrome.service import Service
from selenium.common.exceptions import *
import time
import random
import datetime
from amazon import amazon_run
from foxnews import fox_news_run
from etsy import etsy_run
from aliExpress import ali_express_run
from gearbest import gearbest_run
from wish import wish_run
from shein import shein_run
from game2048 import game_run
from ebay import ebay_run
from cookieClicker import cookie_run

functionList = [amazon_run, etsy_run, fox_news_run, game_run, cookie_run, ebay_run]
loginList = [gearbest_run, ali_express_run, wish_run, shein_run]

def setDriver():
    PATH = Service("/path/to/driver")  ##constant file path of Chrome driver
    options = webdriver.ChromeOptions()  # Initializing Chrome Options from the Webdriver
    options.add_experimental_option("useAutomationExtension", False)  # Adding Argument to Not Use Automation Extension
    options.add_experimental_option("excludeSwitches", ["enable-automation"])  # Excluding enable-automation Switch
    options.add_argument("disable-popup-blocking")
    options.add_argument("disable-notifications")
    options.add_argument("disable-gpu")  ##renderer timeout
    driver = webdriver.Chrome(options=options, service=PATH)
    return driver

def main():
    start = datetime.datetime.now()
    random.shuffle(functionList)  ##shuffles list of functions
    driver = setDriver()
    driver.get("https://www.google.com")
    for i in range(len(loginList)):  ##go through login functions
        loginList[i](driver)
        time.sleep(2)
    for i in range(len(functionList)):  ##go through functions in shuffled order
        functionList[i](driver)
        time.sleep(2)
    end = datetime.datetime.now()
    executionTime = end - start
    print("Start time - " + str(start))
    print("End time - " + str(end))
    print("Execution Time - " + str(executionTime))

if __name__ == "__main__":  ##run the bot when the file is executed directly
    main()
Figure 11. Completed main.py
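The ordering logic of main.py, logins first and activities in shuffled order, can be exercised without a browser. The following is a minimal sketch under that assumption; run_session and the two stand-in task functions are hypothetical, and the driver argument is just passed through.

```python
import random


def run_session(login_tasks, activity_tasks, driver, rng=random):
    """Run login tasks in order, then activity tasks in shuffled order."""
    order = list(activity_tasks)  # copy so the caller's list is untouched
    rng.shuffle(order)
    executed = []
    for task in list(login_tasks) + order:
        task(driver)
        executed.append(task.__name__)
    return executed


def fake_login(driver):  # stand-in for a login function
    pass


def fake_browse(driver):  # stand-in for an activity function
    pass


print(run_session([fake_login], [fake_browse], driver=None))
# ['fake_login', 'fake_browse']
```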
Advanced Features for Mouse Movement
To simulate humanlike mouse movements, we recommend bezier version 2021.2.12 and pyautogui version 0.9.53.
A real human does not move the mouse in a straight line. Humans move the mouse along a curved path that can be approximated by a Bezier curve. Two control points, b1 and b2, shape the curve between the start and end points, b0 and b3. Because hand movement tends to follow such a curve closely, it makes a great feature for simulating human activity.
When scrolling through a webpage, you might also notice that your mouse shakes just a tiny bit as you scroll with the wheel. pyautogui can determine the location of the mouse relative to the computer screen and move it to the next HTML element along a Bezier curve, while also simulating the mouse jitter when scrolling. The script also types slowly, letter by letter, at varying speeds.
The slow typing function takes pageElem, the element to type into, and pageInput, the words that need to be typed. Each letter of the string is iterated through and sent to the element after a random pause, which gives realistic typing speeds.
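For readers who want to see the curve itself, a cubic Bezier curve can be evaluated in plain Python from its four points. This is only an illustrative sketch; cubic_bezier and the sample coordinates are hypothetical, and the project itself relies on the bezier package rather than this function.

```python
def cubic_bezier(b0, b1, b2, b3, t):
    """Evaluate a cubic Bezier curve at parameter t in [0, 1].

    B(t) = (1-t)^3 b0 + 3(1-t)^2 t b1 + 3(1-t) t^2 b2 + t^3 b3
    Each point is an (x, y) tuple.
    """
    u = 1.0 - t
    weights = (u**3, 3 * u**2 * t, 3 * u * t**2, t**3)
    points = (b0, b1, b2, b3)
    x = sum(w * p[0] for w, p in zip(weights, points))
    y = sum(w * p[1] for w, p in zip(weights, points))
    return x, y


# Hypothetical start, control, and end points in screen pixels
b0, b1, b2, b3 = (0, 0), (100, 300), (300, 300), (400, 0)
path = [cubic_bezier(b0, b1, b2, b3, i / 50) for i in range(51)]
print(path[25])
# (200.0, 225.0)
```

The curve starts at b0 and ends at b3, while b1 and b2 pull the path toward themselves without the path passing through them.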
def slow_type(pageElem, pageInput):
    for letter in pageInput:
        time.sleep(float(random.uniform(.05, .3)))  ##random pause before each keystroke
        pageElem.send_keys(letter)
Figure 12. Slow typing function
panelHeight = driver.execute_script('return window.outerHeight - window.innerHeight;') gives the height of the browser’s window panel, which is used later when determining mouse location and element location. The bezier_mouse function determines the absolute x and y locations of a page element and finds the middle of the element for the mouse to move to. The starting position of the mouse can be found with pyautogui.position(). Control points are then determined based on the coordinates of both the start and end locations of the mouse. The curve is then created with the control points and degree of curve, then pyautogui moves the mouse accordingly.
import bezier
import numpy as np
import pyautogui

def bezier_mouse(location, size, panelHeight):  ##move mouse to middle of element
    x, relY = location["x"], location["y"]  ##abs X and relative Y
    absY = relY + panelHeight
    w, h = size["width"], size["height"]
    wCenter = w/2
    hCenter = h/2
    xCenter = int(wCenter + x)
    yCenter = int(hCenter + absY)
    start = pyautogui.position()
    end = xCenter, yCenter
    x2 = (start[0] + end[0]) / 2  ##midpoint x
    y2 = (start[1] + end[1]) / 2  ##midpoint y
    control1X = (start[0] + x2) / 2
    control1Y = (end[1] + y2) / 2  ##alternative y for control1 (currently unused)
    control2X = (end[0] + x2) / 2
    control2Y = (start[1] + y2) / 2  ##alternative y for control2 (currently unused)
    # Two intermediate control points that may be adjusted to modify the curve.
    control1 = control1X, y2  ##combine midpoints to create perfect curve
    control2 = control2X, y2
    # Format points to use with bezier
    control_points = np.array([start, control1, control2, end])
    points = np.array([control_points[:, 0], control_points[:, 1]])  # Split x and y coordinates
    # You can set the degree of the curve here, should be less than # of control points
    degree = 3
    # Create the bezier curve
    curve = bezier.Curve(points, degree)
    curve_steps = 50  # How many points the curve should be split into. Each is a separate pyautogui.moveTo() execution
    delay = 0.003  # Time between movements
    # Move the mouse
    for j in range(1, curve_steps + 1):
        # The evaluate method takes a float from [0.0, 1.0] and returns the coordinates at that point in the curve
        # Another way of thinking about it is that j/steps gets the coordinates at (100*j/steps) percent into the curve
        x, y = curve.evaluate(j / curve_steps)
        pyautogui.moveTo(x, y)  # Move to point in curve
        pyautogui.sleep(delay)  # Wait delay
Figure 13. Mouse movement function
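The first few lines of Figure 13 compute the target point for the mouse. That calculation can be pulled out and checked on its own, as in the sketch below. element_center is a hypothetical name, and the sketch assumes the browser window sits at the top-left of the screen and the page has not been scrolled.

```python
def element_center(location, size, panel_height):
    """Return the screen (x, y) of an element's center.

    `location` and `size` are dicts shaped like Selenium's
    element.location and element.size. `panel_height` is the height
    of the browser panel above the page (outerHeight - innerHeight).
    Assumes the window is at the top-left of the screen and the
    page is not scrolled.
    """
    x = location["x"] + size["width"] / 2
    y = location["y"] + panel_height + size["height"] / 2
    return int(x), int(y)


print(element_center({"x": 100, "y": 200}, {"width": 80, "height": 30}, 120))
# (140, 335)
```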
I personally move my mouse to the right of a screen to move it out of the way when browsing, so I created a function to move the mouse to a random position on the right of the screen using bezier curves. The two functions are almost identical.
def resting_mouse():  ##move mouse to right of screen
    start = pyautogui.position()
    end = random.randint(1600,1750), random.randint(400,850)
    x2 = (start[0] + end[0])/2  ##midpoint x
    y2 = (start[1] + end[1]) / 2  ##midpoint y
    control1X = (start[0] + x2)/2
    control2X = (end[0] + x2) / 2
    # Two intermediate control points that may be adjusted to modify the curve.
    control1 = control1X, y2  ##combine midpoints to create perfect curve
    control2 = control2X, y2  ##using y2 for both to get a more linear curve
    # Format points to use with bezier
    control_points = np.array([start, control1, control2, end])
    points = np.array([control_points[:, 0], control_points[:, 1]])  # Split x and y coordinates
    # You can set the degree of the curve here, should be less than # of control points
    degree = 3
    # Create the bezier curve
    curve = bezier.Curve(points, degree)
    curve_steps = 50  # How many points the curve should be split into. Each is a separate pyautogui.moveTo() execution
    delay = 0.003  # Time between movements