September 2, 2024

Unpacking Clever Devices' BusTracker

Background

Clever Devices is a New York-based public transit automation provider. They develop a variety of systems for public bus and rail networks, including a real-time bus tracking software suite called BusTime, today’s target. BusTime can be hosted or run on-premises and provides, among other things, a real-time map of bus/train locations and an API exposing that data for developer use.

While developing a map of real-time data for various services in my area, I ran across this software, which is used by my local transit authority (TA), so I decided to explore what exactly could be found by snooping around publicly available data.

Exploration

Before attempting to scrape any data, I needed to find where the API endpoints were, what type of data they returned, and how they might be secured. This type of information is often publicly available (even if it isn’t supposed to be), so I started looking.

Documentation

First of all, I wanted some documentation. Unfortunately for me, Clever Devices doesn’t publish BusTime’s documentation on its website, providing only a product fact sheet (written for sales teams).

Simply searching Google for bustime documentation got a couple of hits. The MTA (New York’s transit authority) has its own custom system for tracking its vehicles, coincidentally also called “BusTime”, so there are quite a few results for that. A little way down the results page, I ran across a link to a BusTime® Developer API Version 3 Guide, hosted by a small local TA. By changing the version number in the URL (replacing 3_0 with 2_0 or 1_0), I could find the documentation for all three currently available versions of the API! I found a couple more copies of the documentation online, each hosted by a different TA, but all at the same path: /bustime/apidoc/docs/DeveloperAPIGuide3_0.pdf.

As it turns out, BusTime doesn’t just host an API and a map; it hosts its own documentation too! If you can find the subdomain a TA uses to host BusTime, you can find the exact revision of the software they’re running and the full documentation for it.
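Because the guide lives at the same path on every deployment, a hostname is all you need to enumerate the guides. A minimal TypeScript sketch, where bustime.example.org is a placeholder host and the 1_0/2_0 file names are assumed to follow the same pattern as the 3_0 one:

```typescript
// Versions found so far; file names are assumed to follow the 3_0 pattern.
const API_DOC_VERSIONS = ['1_0', '2_0', '3_0']

// Build the self-hosted Developer API Guide URL for each version on a BusTime host.
function apiDocUrls(host: string): string[] {
	return API_DOC_VERSIONS.map((version) =>
		new URL(`/bustime/apidoc/docs/DeveloperAPIGuide${version}.pdf`, host).toString(),
	)
}

// bustime.example.org is a placeholder, not a real deployment.
for (const url of apiDocUrls('https://bustime.example.org')) {
	console.log(url)
}
```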

Client Code

Now that I had the documentation, I wanted to find how TAs interacted with BusTime’s API to create the real-time map. I searched around for keywords like “BusTime Real-Time” and “BusTime Map” to find a list of ~10 different TAs using BusTime, and took a look at how the map was implemented in each one.

Notably, there seem to be two different types of map, often both hosted at the same time.

Old Map

The first is almost always hosted at /bustime/map/displaymap.jsp and appears to be older (example images date it to at least 2009). This map is built with JavaServer Pages, using jQuery on the frontend. Its network requests fetch data from JSP files rather than from the BusTime API, which leads me to believe it was developed before the modern BusTime API existed.

JavaScript resources are found at /bustime/javascript/ and /bustime/map/javascript/. The general (/bustime/javascript/) files are un-minified, showing us comments from Clever Devices developers. At the top of each file is a copyright notice and a changelog for the file, noting the developer, date, pull request (always empty), and internal ticket information.

The map (/bustime/map/javascript/) files are lightly minified, which strips most of the useful comments and leaves only change markers like //@8.

New Map

The second map is hosted at /map, and is what is used in the BusTime app that TAs can publish. This map seems much more modern, being built with Angular and using the BusTime API to fetch data.

JavaScript resources are bundled via Webpack into multiple files at the root, but main.__HASH__.js is the most interesting. This ~63,000-line file contains the Angular app, alongside their custom BusTime API client.

Scraping and Automation

With both documentation and real-world examples of usage, I could start attempting to grab data from either of the two maps for use in other applications.

Old Map

When first selecting a route to view, the map will make a request like this:

GET /bustime/map/getRoutePoints.jsp?route=BLU&key=0.2806529356346592 HTTP/1.1

When I simply fetched this with curl, I immediately got a valid XML response! No need to do anything special here: a plain HTTP GET to the correct endpoint with a couple of URL parameters gets you the data you want.

Looking at other similar requests, I noticed that the key param changes on every request. I’ve never seen an API key that looks like this or changes with each request, so I dug into exactly where the code was constructing these URLs. Almost immediately, I found this code in /bustime/map/javascript/CDAjaxGetAllBuses.js (similar code exists in other files):

var url =
	g_requestContextPath +
	'/map/getBusesForRouteAll.jsp' +
	'?key=' +
	Math.random()

That’s right! The API key is literally a call to Math.random(). It isn’t even checked by the server. I tried completely removing key from requests, and it still returns valid responses!

Scraping data from the old map is as easy as identifying the endpoint you want and sending a GET request to it; no key needed.
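A minimal TypeScript sketch of that workflow. The host is a placeholder, and the /bustime/map/ prefix is taken from the request shown earlier; the key parameter is simply left out:

```typescript
// Placeholder host; substitute the BusTime deployment you are looking at.
const OLD_MAP_HOST = 'https://bustime.example.org'

// Build an old-map JSP URL. The `key` parameter is omitted entirely,
// because the server never checks it.
function oldMapUrl(endpoint: string, params: Record<string, string> = {}): string {
	const url = new URL(`/bustime/map/${endpoint}`, OLD_MAP_HOST)
	for (const [name, value] of Object.entries(params)) {
		url.searchParams.set(name, value)
	}
	return url.toString()
}

// A plain, unauthenticated GET; the body comes back as XML text.
async function getRoutePoints(route: string): Promise<string> {
	const res = await fetch(oldMapUrl('getRoutePoints.jsp', { route }))
	if (!res.ok) throw new Error(`HTTP ${res.status}`)
	return res.text()
}
```

Against a real host, `getRoutePoints('BLU').then(console.log)` would print the raw XML for the BLU route from the example request.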

New Map

The new map makes scraping slightly more difficult, even though it returns the same (or similar) data as the old map, just in a different format. Section 2 of the official documentation says:

The “key” parameter represents the API key assigned to the developer making the request. All requests to the API must be accompanied by a valid API key.

Sure enough, making a curl request without a key returns a “No API access key supplied” error, and a bogus key returns an “Invalid API access key supplied” error. Looks like they implemented proper API authentication!

Unfortunately, this authentication isn’t very strong. Grepping for key= immediately returns a hit for a line of code reading:

H += Q + P + '&key=Qskvu4Z5JDwGEVswqdAVkiA5B&format=json'

The key isn’t Math.random() anymore, but it’s hard-coded into the bundle! I checked the other deployments I had found, and in every case the key is exactly the same. This key (Qskvu4Z5JDwGEVswqdAVkiA5B) appears to be a global “map” key that will authenticate you with every TA using Clever Devices BusTime. Neat!
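A sketch of how a request URL carrying that key could be assembled, mirroring the map’s habit of appending the key and format=json to every call. The /bustime/api/v3/ base path and host are assumptions, and on its own such a request is still rejected by the header check described below:

```typescript
// Shared map key hard-coded in the map bundle.
const MAP_KEY = 'Qskvu4Z5JDwGEVswqdAVkiA5B'
// Assumption: the API is served under /bustime/api/v3/ on this placeholder host.
const API_BASE = 'https://bustime.example.org/bustime/api/v3/'

// Build a v3 API URL: call-specific parameters first, then the key and format=json.
function apiUrl(call: string, params: Record<string, string> = {}): string {
	const url = new URL(call, API_BASE)
	for (const [name, value] of Object.entries(params)) {
		url.searchParams.set(name, value)
	}
	url.searchParams.set('key', MAP_KEY)
	url.searchParams.set('format', 'json')
	return url.toString()
}

console.log(apiUrl('getvehicles', { rt: '1' }))
```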

Notably, section 1.5 of the documentation reads:

1.5 Is there a limit to the number of requests I can make to the Developer API?

Yes. By default, one API key can make a maximum of 10,000 requests per day. If you believe that you will require more than 10,000 daily requests, you must request that the cap on your key be raised to handle the additional traffic.

This means one of two things is true:

1. A bad actor could make over 10,000 requests in a day to one specific TA’s map, making it inaccessible for all other users, or, more likely,
2. This key is exempt from this restriction, allowing anyone with it to make an unlimited number of requests a day.
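For scale, the default cap works out to one request every few seconds per key over a full day:

```latex
\frac{86\,400\ \text{s/day}}{10\,000\ \text{requests/day}} = 8.64\ \text{s per request}
```

If the same key really is shared by every visitor to every BusTime map, ordinary traffic alone would blow through that budget quickly, which is why the second possibility seems more plausible.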

Either way, not a great design choice.

DISCLAIMER: Please don’t spam your local TA’s website!

Unfortunately, simply knowing the “secret” API key is not enough to avoid the “No API access permitted” error. The client also sets two non-standard HTTP headers on each request, X-Date and X-Request-ID, which the server validates for every call except gettime. The documentation never mentions either of these headers, which seems like a real pain for any TA developers trying to interface with the API.

An example of these headers is:

X-Date: Sun, 01 Sep 2024 01:41:40 GMT
X-Request-ID: 22ce9958bebe7c1b07a7ac91e998f216bbab98109db42ec1b656bb86c7ea8274

X-Date is clearly the output of Date.prototype.toUTCString(), but with no information about how X-Request-ID is generated, I had to dive into the code to work out what was going on.
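Reproducing X-Date takes one call. A small sketch, assuming the local clock is close enough to the server’s (the real client uses a server-synchronized time instead):

```typescript
// X-Date is the request time formatted with Date.prototype.toUTCString().
// Assumption: the local clock stands in for the server-synchronized time.
function xDate(now: Date = new Date()): string {
	return now.toUTCString()
}

console.log(xDate(new Date(Date.UTC(2024, 8, 1, 1, 41, 40)))) // Sun, 01 Sep 2024 01:41:40 GMT
```

Note that toUTCString() has one-second resolution, which matters once this value is fed into X-Request-ID.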

Grepping for X-Request-ID returned a single result within the ~66,000 lines of code, around 23,000 lines in. The class containing it turned out to be an Angular HTTP interceptor, so I cleaned up the code and got this:

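import { Injectable } from '@angular/core'
import { HttpHandler, HttpInterceptor, HttpRequest } from '@angular/common/http'
import * as CryptoJS from 'crypto-js'
// DevAPIService is BusTime's own Angular service (not shown here).

@Injectable()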
class HeaderInterceptor implements HttpInterceptor {
	constructor(private devAPIService: DevAPIService) {}

	intercept(req: HttpRequest<any>, handler: HttpHandler) {
		let r = req

		if (!this.isGetTimeCall(r)) {
			this.devAPIService.getLocalSynchronizedServerTime().subscribe((date) => {
				const utcString = date.toUTCString()
				r = r.clone({
					headers: r.headers.set('X-Date', utcString),
				})

				const apiEndpoint = this.getApiEndpoint(r)
				const id = this.hashData(apiEndpoint + utcString, 'SHA256')
				r = r.clone({
					headers: r.headers.set('X-Request-ID', id.toString()),
				})
			})
		}

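		// handle(r) runs as soon as subscribe() returns, so the headers set above
		// are only included if the observable emits synchronously.
		return handler.handle(r)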
	}
	isGetTimeCall(req: HttpRequest<any>) {
		return req.url.includes('/api/v3/gettime')
	}
	getApiEndpoint(req: HttpRequest<any>) {
		const i = req.urlWithParams.indexOf('/api/')
		return req.urlWithParams.substring(i)
	}
	hashData(data: string, hashAlgo: string) {
		const hmacAlgo = CryptoJS['Hmac' + hashAlgo]
		if (hmacAlgo) {
			return hmacAlgo(data, 'ZSqCAFdU7bwxHJUHKYfQUxKin06hMxCK').toString(
				CryptoJS.enc.Hex,
			)
		} else {
			return ''
		}
	}
}

If the request isn’t a gettime call (the server doesn’t check headers on time requests), the API endpoint, meaning everything in the URL from /api/ onward including the query string, is concatenated with the X-Date string (the synchronized server time):

let data =
	'/api/v3/getvehicles?requestType=getvehicles&rt=1&key=Qskvu4Z5JDwGEVswqdAVkiA5B&format=json&xtime=1725154901451' +
	'Sun, 01 Sep 2024 01:41:40 GMT'

The result of that concatenation is hashed with CryptoJS.HmacSHA256, using a second ‘secret’ hard-coded key (always ZSqCAFdU7bwxHJUHKYfQUxKin06hMxCK):

let id = CryptoJS.HmacSHA256(data, 'ZSqCAFdU7bwxHJUHKYfQUxKin06hMxCK').toString(
	CryptoJS.enc.Hex,
)

Finally, X-Request-ID is set to the hash.

X-Request-ID: 0aaea89fdd69e16e4bbcdcb96f4473ce80c5e9d3ca064566f81dc3d8efdb9e7e
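Putting those steps together, here is a minimal Node.js sketch of the header computation, with node:crypto’s createHmac standing in for CryptoJS.HmacSHA256 (both compute a standard hex-encoded HMAC-SHA256, so they should agree for these ASCII inputs). It uses the local clock rather than the map’s server-synchronized time, and it is a reconstruction from the interceptor above, not Clever Devices’ code:

```typescript
import { createHmac } from 'node:crypto'

// Second hard-coded secret from the map bundle, used as the HMAC key.
const HMAC_SECRET = 'ZSqCAFdU7bwxHJUHKYfQUxKin06hMxCK'

type SignedHeaders = { 'X-Date': string; 'X-Request-ID': string }

// Mirrors getApiEndpoint(): everything from "/api/" onward, query string included.
function apiEndpoint(urlWithParams: string): string {
	return urlWithParams.substring(urlWithParams.indexOf('/api/'))
}

// Mirrors the interceptor: gettime is left unsigned; everything else gets
// X-Date plus a hex HMAC-SHA256 of (endpoint + X-Date) as X-Request-ID.
function signRequest(urlWithParams: string, now: Date = new Date()): SignedHeaders | null {
	if (urlWithParams.includes('/api/v3/gettime')) return null
	const date = now.toUTCString()
	const id = createHmac('sha256', HMAC_SECRET)
		.update(apiEndpoint(urlWithParams) + date)
		.digest('hex')
	return { 'X-Date': date, 'X-Request-ID': id }
}
```

In use, the result would be passed straight to the HTTP client, e.g. `fetch(url, { headers: signRequest(url) ?? {} })`.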

My best guess is that this header pair is meant to give each request a unique identifier (like Cloudflare’s CF-Ray), although there are some holes in that theory.

If a request uses only the documented URL parameters, every request sent to a given endpoint within the same second will have the exact same ID, since Date.prototype.toUTCString() only has one-second resolution. The map’s developers seem to have noticed this, as they tack on an undocumented and seemingly unused xtime URL parameter set to the current UNIX time in milliseconds. That lowers the chance of two requests sharing an ID, but two requests sent to the same endpoint in the same millisecond will still get the same ID.
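A small sketch of both cases, reusing the same HMAC construction via node:crypto. The endpoint string and timestamps are illustrative: two requests 800 ms apart within the same second hash identically without xtime, and differently once xtime is appended.

```typescript
import { createHmac } from 'node:crypto'

const HMAC_SECRET = 'ZSqCAFdU7bwxHJUHKYfQUxKin06hMxCK'

// Same construction as the map: hex HMAC-SHA256 of endpoint + toUTCString().
function requestId(endpoint: string, at: Date): string {
	return createHmac('sha256', HMAC_SECRET)
		.update(endpoint + at.toUTCString())
		.digest('hex')
}

const endpoint = '/api/v3/getvehicles?rt=1&key=Qskvu4Z5JDwGEVswqdAVkiA5B&format=json'
const first = new Date(Date.UTC(2024, 8, 1, 1, 41, 40, 100))
const second = new Date(Date.UTC(2024, 8, 1, 1, 41, 40, 900)) // 800 ms later, same second

// Documented parameters only: toUTCString() drops milliseconds, so the IDs collide.
const plainFirst = requestId(endpoint, first)
const plainSecond = requestId(endpoint, second)
console.log(plainFirst === plainSecond) // true

// With xtime (milliseconds since the epoch) appended, the hashed strings differ.
const timedFirst = requestId(`${endpoint}&xtime=${first.getTime()}`, first)
const timedSecond = requestId(`${endpoint}&xtime=${second.getTime()}`, second)
console.log(timedFirst === timedSecond) // false
```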

Unfortunately, the server appears to continue accepting an X-Date and X-Request-ID pair that is up to 10 minutes old! You can keep sending the exact same request over and over, defeating the purpose of a unique identifier.
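To make the replay problem concrete, here is a hypothetical TypeScript model of the check as it appears to behave from the outside. This is not Clever Devices’ code: the 10-minute window comes from the observation above, and treating future-dated X-Date values the same as past ones is an assumption.

```typescript
import { createHmac } from 'node:crypto'

const HMAC_SECRET = 'ZSqCAFdU7bwxHJUHKYfQUxKin06hMxCK'
const MAX_AGE_MS = 10 * 60 * 1000 // observed tolerance of roughly 10 minutes

// Hypothetical model inferred from behaviour only: the ID must match the HMAC and
// X-Date must be within the window (either direction, an assumption), but nothing
// records which IDs have already been used, so identical requests can be replayed.
function accepts(endpoint: string, xDate: string, requestId: string, now: Date): boolean {
	const expected = createHmac('sha256', HMAC_SECRET)
		.update(endpoint + xDate)
		.digest('hex')
	const ageMs = Math.abs(now.getTime() - Date.parse(xDate))
	return requestId === expected && ageMs <= MAX_AGE_MS
}
```

Because accepts() depends only on the request itself, the same headers pass every time within the window; an ID that is actually meant to be unique would require the server to remember which IDs it has already seen for the length of that window.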

The more cynical reading is that these headers are a crude deterrent to would-be scrapers who, having found one hard-coded key, presumably wouldn’t go looking for a second one.

Conclusion

Scraping information from BusTime’s legacy map is easy, and the new API isn’t much harder. They’ve begun authenticating API keys (yay!), but use a single hard-coded one for everything (boo…). They also now validate two secret headers that are never mentioned in the documentation (booooooo…), yet neither of these provides any real security.

All in all, BusTime is an interesting target to scrape, with useful documentation and near-real-time data, but Clever Devices doesn’t seem to have quite gotten the hang of securing APIs yet.