Solving Dragster Optimally
Problem formulation
From the description of the game’s physics one can see that it can be modeled as a deterministic, discrete-time system of the form
where the state and the inputs . The subscripts indicate the time step. Note that the state-update function is time-inhomogeneous because of the frame rule for updating the motor speed.
The state space
is the cartesian product of four variables (clutch), (gear), (motor speed), and (dragster speed). With the given domains of the state variables , the state space has
different configurations. Similarly, the input space
is the cartesian product of two variables (throttle) and (clutch) with a cardinality of .
The objective is to maximize the distance in a fixed amount of time . Since the distance at time is the sum of the speeds (the fourth component of the state vector ) from , this leads to
where is denoting the set of feasible initial conditions. Note that the speed of the final state does not influence the objective value.
Dynamic programming
Since the cardinality of the state space and input space are so small, it is possible to use a dynamic programming approach to efficiently solve problem to optimality.
To this end, one decomposes the above problem into nested sub-problems as follows
with the initial condition .
One solves the problems backwards in time and the best value of (among any feasible initial ) corresponds to the value of the original problem.
Python implementation
The following Python (pseudo-)code implements the solution of the dynamic programming problems .
u = [[None]*m for j in range(n-1)]
Q = [[0]*m for j in range(n)]
for j in range(n-2, -1, -1):
for x in product(range(2),range(5),range(32),range(254)):
umax = max(product(range(2),range(2)), key=lambda u: Q[j+1][f(j, u, x)])
u[j][x] = umax
Qmax = Q[j+1][f(j, umax, x)]
Q[j][x] = x[3] + Qmax
Remarks
- Indexing the lists
Q[j]
/u[j]
by a 4-tuplex
does not work – one has to transform the states into a linear index. - To deal with infeasible inputs, it is practical to augment the state space by one more state denoting infeasibility, e.g. , and defining .
- The time index
j
in the above snippet does not correspond to the global frame counter . One needs an explicit mapping between the two. - One can choose the initial time step to be the last frame before the in-game timer starts running. In this case, the set of feasible initial conditions are: arbitrary (0 or 1), , any multiple of 3 between 0 and 31, and .
- For reconstructing the optimal inputs, one has to store them in a table
(
u[][]
above). However, one only needs the optimal values for the current time stepj
and the previous time stepj+1
. So one could eliminate the tableQ[][]
and improve a little bit on the memory efficienty of the above code.
A working implementation can be found in dprog.py.