Web Scraping My Router

I decided to build a tool to scrape the admin panel web UI of my home internet modem/router so I could programmatically look up the IP addresses of connected devices. It turned into a “capture the flag” style reverse-engineering challenge.

I wanted this tool because I often connect headless devices (like my new RISC-V development board) directly to my home router, and I find it tedious to log in to the web UI just to check whether the device has acquired an IP address and what it is. I mostly live in the terminal, so something that runs as a CLI tool and/or can be used programmatically would be a plus for me. I thought it would be trivial, as scraping some small HTML and sending some form data is no big challenge, but I soon discovered a gauntlet of challenges while reverse-engineering the web UI.

The router is a Vodafone Station which comes bundled with the internet plans for Vodafone Germany. I don’t know much more detail about it as it’s fairly generic and custom branded by Vodafone. The modem has a typical web ui for administration that shows device information and can list the connected devices and their IP addresses.

It should be trivial! Simply send some form data and parse some HTML! But I guess that anytime a programmer says something should be simple, it never will be. Here are the different challenges I faced.

Part 1: Assets not loaded via plain HTML

Logging into the router through the browser shows the list of connected devices on the overview page. The HTML, however, shows no sign of the IP addresses, device aliases or other related strings that are visible in the browser.

[Screenshot: the router web UI]

Opening the Firefox web developer tools and accessing the network debugger gives the hint that resolves that mystery. The data is loaded via Javascript on the client-side and is not embedded as part of the HTML.

Part 2: Discovering the API endpoint for HostTbl

Looking at the network connections made during page load showed the files and api endpoints accessed. There are some javascript files loaded, some static asset files and some data api endpoints accessed that return json formatted data payloads.

[Screenshot: the Firefox network debugger]

The network accesses that interest me are those to paths prefixed with /api. These appear to be the api endpoints of the router. Digging through them reveals one specific api endpoint that provides the IP addresses of the connected devices.

GET http://192.168.0.1/api/v1/host/hostTbl,WPSEnable1,WPSEnable2,RadioEnable1,RadioEnable2,SSIDEnable1,SSIDEnable2,SSIDEnable3,operational,call_no,call_no2,LineStatus1,LineStatus2,DeviceMode,ScheduleEnable,dhcpLanTbl,dhcpV4LanTbl,lpspeed_1,lpspeed_2,lpspeed_3,lpspeed_4,AdditionalInfos1,AdditionalInfos2?_=1680890428018

However, if I replicate that network request with curl (copying the headers and cookies), I hit a security wall and get an “unauthorized access” response.

{"error":"error","message":"Unauthorized access!"}

Part 3: The Nonce of the Situation

As I digested the situation, I noticed the query parameter named “_” in the request. At first I had ignored this parameter, thinking it was just noise, but I realised it might be related to my failed attempts to access the api endpoints.

The network debugger showed that the first time this query parameter gets used is to access the bsd_acl_rules.js file. From there, the other requests also use this parameter and increment the value of the parameter by 1 sequentially.

Digging into the Javascript, I found that the parameter is populated on the client side with the Unix timestamp in millisecond precision (13 digits).

var nonce = Date.now();

I assume this is some kind of protection mechanism so that stale URLs do not get re-used by the browser or other tools. It seems to check that a request has a timestamp relatively close to the current local time on the router and if it doesn’t, it blocks it.

Adding this parameter to my curl command let me successfully access the api endpoint from the terminal and pull the list of connected devices on the router.

curl -v \
  -H "X-Requested-With: XMLHttpRequest" \
  --referer 'http://192.168.0.1/' \
  -A 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/110.0' \
  --cookie "PHPSESSID=822c56c105b353e537406a54b1e510eb;cwd=No" \
  "http://192.168.0.1/api/v1/host/hostTbl?_=$(date +%s%3N)"

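For use from a script rather than the shell, the same nonce could be produced in Python. This is only a sketch of the behaviour observed above (start at the current time in milliseconds, add 1 per request); the helper names nonce_counter and with_nonce are made up for illustration, and whether the router really insists on the increments is an assumption.

```python
import itertools
import time

ROUTER = "http://192.168.0.1"  # router address as seen in the requests above


def nonce_counter(start=None):
    """Return an iterator of nonces: the current time in milliseconds,
    then +1 for every following request (mirrors what the browser does)."""
    first = int(time.time() * 1000) if start is None else start
    return itertools.count(first)


def with_nonce(path, nonce):
    """Append the `_` query parameter to a router path."""
    separator = "&" if "?" in path else "?"
    return f"{ROUTER}{path}{separator}_={nonce}"


nonces = nonce_counter()
print(with_nonce("/api/v1/host/hostTbl", next(nonces)))
```
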
Part 4: Getting a Session Cookie

The network debugger showed that the requests to the router also include a session cookie (named PHPSESSID). This is set by the router software at some point during the network interactions but, interestingly, not on the initial page access (i.e. the index html).

Perusing through the network connections made and traced in the network debugger shows that the cookie first appears after accessing the session api endpoint (/api/v1/session/dlang). So it seems that this must be accessed in order to get a cookie to then be able to log in to the router.
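
A rough sketch of how a script might pick up that cookie using only the Python standard library. The function names are hypothetical, the headers are copied from the curl command earlier, and whether the dlang endpoint needs any query parameters is unknown to me, so none are sent here.

```python
import http.cookiejar
import urllib.request

ROUTER = "http://192.168.0.1"  # router address as seen in the requests above


def make_session():
    """Build a URL opener that stores cookies across requests."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    opener.addheaders = [
        ("X-Requested-With", "XMLHttpRequest"),
        ("Referer", ROUTER + "/"),
    ]
    return opener, jar


def session_id(jar):
    """Return the PHPSESSID value from the jar, or None if not set yet."""
    for cookie in jar:
        if cookie.name == "PHPSESSID":
            return cookie.value
    return None


def fetch_session_cookie(opener, jar):
    """Hit the dlang endpoint so the router hands out a session cookie."""
    with opener.open(ROUTER + "/api/v1/session/dlang", timeout=5) as response:
        response.read()
    return session_id(jar)
```

Calling fetch_session_cookie(*make_session()) on the home network should then return the cookie value, and the same opener can be reused for the later requests so the cookie is sent along automatically.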

Part 5: Understanding the Login

The login process itself was also interesting. Setting the network debugger to persistent logs mode keeps the log of network connections across several page loads, which is needed for the login (otherwise, as the page refreshes during login, the trace gets erased).

With persistent logging turned on, I could trace an actual login performed in the browser and see what data is transferred and the responses from the router.

The login process seems simple at first. Triggered by the user entering their username and password and clicking the button, the browser makes a request to a session api endpoint: /api/v1/session/login. But it does not send the login data and instead sends a request with the following form data.

username=admin
password=seeksalthash

The endpoint responds with a json payload containing two values: salt and saltwebui.

{"error":"ok","salt":"kxbf8pyhmkkp","saltwebui":"9MIw1aXgDzCp"}
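
As an illustration, the response above could be unpacked in Python along these lines. The function name is hypothetical, and treating any "error" value other than "ok" as a failure is an assumption based only on the two responses seen so far.

```python
import json


def parse_salt_response(body):
    """Extract (salt, saltwebui) from the JSON body of the salt request."""
    payload = json.loads(body)
    if payload.get("error") != "ok":
        # Assumption: anything other than "ok" means the request was refused.
        raise RuntimeError(f"salt request failed: {payload.get('message', payload)}")
    return payload["salt"], payload["saltwebui"]


example = '{"error":"ok","salt":"kxbf8pyhmkkp","saltwebui":"9MIw1aXgDzCp"}'
print(parse_salt_response(example))  # ('kxbf8pyhmkkp', '9MIw1aXgDzCp')
```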

The browser then sends a second request to the same endpoint but with a hash of the admin password.

username=admin
password=32dd055052fc682e1e23dc97241504d8

It seems the password is being salted by the salt values retrieved in the first api request. Searching through the Javascript files loaded for saltwebui reveals that the password is being hashed with pbkdf2 in two rounds. A first round with the password and the “salt” (the pbkdf2 algorithm configured with 1000 iterations and the result trimmed to 128 bits) followed by a second round with the result of the first round and the “saltwebui” being hashed (again with the algorithm configured for 1000 iterations and the result trimmed to 128 bits).

The javascript snippets summarised:

// PBKDF2 with 1000 iterations and a 128-bit result, returned as a hex string
function doPbkdf2NotCoded(passwd, saltLocal) {
  var derivedKey = sjcl.misc.pbkdf2(passwd, saltLocal, 1000, 128);
  var hexdevkey = sjcl.codec.hex.fromBits(derivedKey);
  return hexdevkey;
}

// Round 1: the entered password with the "salt" value
var hashed1 = doPbkdf2NotCoded($("#password").val(), distantsaltstored);
$.ajax({
  url: 'api/session/login',
  type: 'POST',
  data: {
    username: username,
    // Round 2: the hex result of round 1 with the "saltwebui" value
    password: doPbkdf2NotCoded(hashed1, distantsaltstoredWebui)
  }
});

I found a Python implementation for pbkdf2 and began trying to replicate the hashes using the sniffed values from the network debugger. At first I didn’t have much success but then by looking through the Javascript I found the algorithm used sha256 while the Python module I was using defaulted to sha1.

Switching from sha1-hmac to sha256-hmac made it possible to replicate the password hashing in Python and I could write a Python script to get a cookie from the router and perform the log in. But trying to use that cookie to then access the host api endpoint, like I did previously, failed with “unauthorized access”.
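
The two-round hashing can be sketched with Python's standard hashlib instead of a third-party module. This is a minimal version under a few assumptions: that the strings are encoded as UTF-8 before hashing, that the second round takes the hex string of the first round as its password (as the Javascript snippet suggests), and the function names are my own.

```python
import hashlib


def pbkdf2_hex(password, salt):
    """PBKDF2-HMAC-SHA256, 1000 iterations, 128-bit (16-byte) key, as hex."""
    derived = hashlib.pbkdf2_hmac(
        "sha256",
        password.encode("utf-8"),
        salt.encode("utf-8"),
        1000,
        dklen=16,
    )
    return derived.hex()


def login_hash(password, salt, saltwebui):
    """Round 1 with the salt, then round 2 over the hex result with saltwebui."""
    first_round = pbkdf2_hex(password, salt)
    return pbkdf2_hex(first_round, saltwebui)


# Placeholder password; the salts are the ones from the response shown earlier.
print(login_hash("my-router-password", "kxbf8pyhmkkp", "9MIw1aXgDzCp"))
```

The result is a 32-character hex string, the same shape as the password value the browser sends in its second request.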

Part 6: There Can Only Be One

At first everything seemed to be in place to log in, but on the second try the login api endpoint responded with an error code. The message returned was MSG_LOGIN_150.

This scenario can be replicated in the browser by trying to log in twice at the same time from different sessions. A pop up appears that allows you to “force log out” the current user and then the log in proceeds.

The network debugger shows in this case that a second login attempt is made but this time the network request to get the salt values includes another parameter named “logout” which is set to the string “true”.

Replicating this in the Python script allowed the login to always succeed but I would still hit the “unauthorized access” error when trying to scrape the host api endpoint.
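
The form data for the two login requests, including the forced logout, might be built roughly like this. The function names are hypothetical, and I am assuming that “logout” travels as an extra form field next to username and password, which is how it looked in the network debugger.

```python
from urllib.parse import urlencode


def salt_request_form(username, force_logout=False):
    """Form data for the first request, which only asks for the salts."""
    form = {"username": username, "password": "seeksalthash"}
    if force_logout:
        # Kicks out the session that is currently logged in.
        form["logout"] = "true"
    return form


def login_request_form(username, hashed_password):
    """Form data for the second request, carrying the hashed password."""
    return {"username": username, "password": hashed_password}


def already_logged_in(response_body):
    """True if the router reported that another session is active."""
    return "MSG_LOGIN_150" in response_body


print(urlencode(salt_request_form("admin", force_logout=True)))
```

A script could first try without the flag and only retry with force_logout=True when already_logged_in() is true, or simply always send it.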

Part 7: Hitting the Browser Security Heuristic

I tested again with the browser and decided that some of the other pages it accesses must be vital to the process. My script was requesting the host api endpoint directly, without requesting any of the other pages. Given that the browser normally requests the host api endpoint almost last, the script was skipping many paths that the browser typically loads first.

I started inserting requests to other resources in the same order the browser does and suddenly it came to life. After some experimentation I found that the router requires a session to first access /js/app/bsd_acl_rules.js followed by /api/v1/session/menu in order to then be able to access any of the other paths.
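
The required ordering can be captured as a small plan of URLs to request after logging in. This is a sketch only: the names are made up, and giving each request a nonce one higher than the previous one copies what the browser does rather than anything I know the router to require.

```python
import time

ROUTER = "http://192.168.0.1"  # router address as seen in the requests above

# Paths the router wants to see, in this order, before anything else works.
WARM_UP_PATHS = ["/js/app/bsd_acl_rules.js", "/api/v1/session/menu"]


def plan_requests(target_path, start_nonce=None):
    """Return the URLs to fetch in order: warm-up paths, then the target."""
    nonce = int(time.time() * 1000) if start_nonce is None else start_nonce
    urls = []
    for offset, path in enumerate(WARM_UP_PATHS + [target_path]):
        urls.append(f"{ROUTER}{path}?_={nonce + offset}")
    return urls


for url in plan_requests("/api/v1/host/hostTbl"):
    print(url)
```

Each URL would then be fetched with the same logged-in session, with the last response carrying the device list.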

Summary

And with that, I had a Python script that could access authenticated http api endpoints in my home router to do things like learn the IP addresses of connected devices.

A summary of the gauntlet of challenges faced to be able to script the data scraping of the router web ui:

  • Data is not embedded in the HTML and instead hidden in some undocumented http api endpoints
  • Some network paths are protected by a timestamp nonce
  • The session cookie only becomes available when hitting a certain path
  • The login process involves pbkdf2 hashing on the client side
  • An already logged in user can block another user from logging in
  • The router requires certain paths to be accessed in order, otherwise the session is invalidated

Reverse engineering the security on the router api endpoints was a fun challenge, a sweet capture the flag hidden in my home router, and as a bonus I now have a useful cli tool for finding the IP addresses of connected devices. You can find the script on my GitHub page.