The receptive field (RF) $l_k$ of layer $k$ is:

$$l_k = l_{k-1} + \left( (f_k - 1) \prod_{i=1}^{k-1} s_i \right)$$

where $l_{k-1}$ is the receptive field of layer $k-1$ (with $l_0 = 1$, a single input pixel), $f_k$ is the filter size of layer $k$ (height or width; the two are assumed equal here), and $s_i$ is the stride of layer $i$.
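A short worked example, with layer sizes chosen only for illustration: three layers with $3 \times 3$ filters and strides $s_1 = 1$, $s_2 = 2$, starting from $l_0 = 1$. Each step adds $f_k - 1$ times the product of the strides of the layers below $k$ (an empty product, for $k = 1$, equals 1):

$$
\begin{aligned}
l_1 &= 1 + (3 - 1) = 3, \\
l_2 &= 3 + (3 - 1) \cdot s_1 = 3 + 2 \cdot 1 = 5, \\
l_3 &= 5 + (3 - 1) \cdot s_1 s_2 = 5 + 2 \cdot 1 \cdot 2 = 9.
\end{aligned}
$$

The stride of layer 3 never appears, because only the strides of lower layers enter the product.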

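The formula above computes the receptive field bottom up, starting from layer 1. Intuitively, the RF of layer $k$ covers $(f_k - 1) \cdot s_{k-1}$ more pixels than the RF of layer $k-1$, counted on the input grid of layer $k-1$. However, this increment must be translated down to the first layer, so it is multiplied by the product of all lower strides (a product, not a factorial): one step between adjacent positions in layer $k-1$ corresponds to $s_{k-1} s_{k-2} \cdots s_1$ input pixels, which grows exponentially with depth when the strides are greater than 1.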
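A minimal Python sketch of the recurrence, assuming plain (non-dilated) convolution or pooling layers with square filters, where padding does not change the RF size. The names `receptive_fields` and `receptive_field_top_down` are hypothetical helpers, not part of any library. The second function walks the network from the top unit down to the input as an independent sanity check of the bottom-up formula.

```python
from math import prod
from typing import List, Sequence


def receptive_fields(filter_sizes: Sequence[int], strides: Sequence[int]) -> List[int]:
    """Return [l_1, ..., l_n] using the bottom-up recurrence
    l_k = l_{k-1} + (f_k - 1) * s_1 * ... * s_{k-1}, with l_0 = 1.
    Assumes square, non-dilated filters (hypothetical helper)."""
    if len(filter_sizes) != len(strides):
        raise ValueError("expected one filter size and one stride per layer")
    rf = 1  # l_0: a single input pixel
    result = []
    for k in range(1, len(filter_sizes) + 1):
        f_k = filter_sizes[k - 1]  # lists are 0-indexed, layers are 1-indexed
        # Product of the strides of all layers below k; math.prod of an empty slice is 1.
        jump = prod(strides[: k - 1])
        rf += (f_k - 1) * jump
        result.append(rf)
    return result


def receptive_field_top_down(filter_sizes: Sequence[int], strides: Sequence[int]) -> int:
    """Independent check (hypothetical helper): start from one unit in the top
    layer and walk down; r positions at a layer's output span s * (r - 1) + f
    positions at that layer's input."""
    r = 1
    for f, s in zip(reversed(filter_sizes), reversed(strides)):
        r = s * (r - 1) + f
    return r


if __name__ == "__main__":
    f, s = [3, 3, 3, 3], [1, 2, 2, 1]
    print(receptive_fields(f, s))          # [3, 5, 9, 17]
    print(receptive_field_top_down(f, s))  # 17
```

Recomputing `prod(strides[: k - 1])` at every layer mirrors the formula term by term; keeping a running product instead would avoid the quadratic cost, which hardly matters for networks of realistic depth.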