It is a robust empirical finding that temporally binding features takes time.
For us to perceive sensory features (motion and colour, shape and depth, or all four) as occurring at the same time, these features must be bound together in time. If they are not temporally bound, we might perceive these features as occurring at different times.
But, the process of temporally binding features takes time. I want to argue that this fact can inform discussions about how time perception functions more broadly.
The colour-motion asynchrony illusion
A very good example of the fact that temporal binding takes time comes from a temporal illusion called “colour-motion asynchrony” (Moutoussis & Zeki, 1997).
Consider a scenario where you are staring at a grey screen ‘on top’ of which a pattern of small boxes is changing direction (moving from left to right) and changing colour (from black to white). If both the colour and the motion direction change simultaneously (left and white, right and black) at a relatively slow rate (< 1 Hz), we can temporally integrate the two types of change (direction and colour) so that we consciously perceive them as occurring together (see top row of Figure 8.4 here).
In other words, we are able to temporally bind the two sensory features.
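The stimulus can be sketched as a simple timeline. This is a minimal illustration, not the experimental code: the rates and state pairings follow the description above, while the function name and encoding are my own.

```python
# Illustrative sketch of the colour-motion asynchrony stimulus.
# At each moment the pattern is in one of two states:
#   state A: moving left,  coloured white
#   state B: moving right, coloured black
# The display alternates between the two states at a given rate.

def stimulus(t, rate_hz):
    """Return the (direction, colour) pair shown at time t (seconds)."""
    half_cycle = 1.0 / (2 * rate_hz)    # duration of each state
    state = int(t // half_cycle) % 2    # 0 -> state A, 1 -> state B
    return ("left", "white") if state == 0 else ("right", "black")

# At a slow 0.5 Hz alternation each state lasts a full second,
# leaving ample time to bind direction and colour:
for t in [0.0, 0.5, 1.0, 1.5]:
    print(t, stimulus(t, rate_hz=0.5))
```

At faster rates (e.g. `rate_hz=2.0`) each state lasts only 250 ms, which is where, per the text, binding starts to fail.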
But if we increase the rate of change a bit (to 1–2 Hz), so that the colour and motion patterns oscillate more quickly, then it becomes hard to temporally bind the colour change and the motion-direction change (see the middle row of Figure 8.4 here).
Although we can consciously see the two features changing, we are unable to consciously perceive them as changing simultaneously: we cannot report which direction/colour pairing goes together.
So the process of binding features seems to take more time than identifying the features. One proposed reason for this slow temporal binding is that, while early visual processing is extremely quick and processes multiple features in parallel, the binding/pairing of features involves some sort of bottleneck that makes the binding process slower (Holcombe, 2009).
This bottleneck could be attention. If temporal binding requires attentional selection of the same features at two different positions, which must then be brought together, then this selection must happen faster than the rate of change of the external features. Perhaps attentional selection is simply too slow to make this comparison before the external stimuli have already changed. This, however, is not the most common explanation. The most popular explanation is far more direct, and comes from an interesting finding about the colour-motion asynchrony illusion.
The brain time explanation
As I just said, when the rate of change of motion and colour is higher than about 1–2 Hz, people cannot confidently make the pairing between colour and direction.
Yet if we delay the colour change by ~100 ms, the temporal binding problem disappears, and participants easily report which of the feature pairs go together.
This shows that there must be a ~100 ms difference between physical simultaneity in event time and perceived simultaneity in subjective time. But what explains this? The popular and direct explanation states that this result proves that the processing of the colour change is faster than the processing of the direction change. Proponents argue that this difference in neural processing times is then mirrored in our conscious temporal perception (see Moutoussis & Zeki, 1997; Zeki, 2007).
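The brain-time reading can be put numerically. This is a toy sketch: the individual latency values below are assumptions chosen for illustration; only their ~100 ms difference comes from the finding described above.

```python
# Toy model of the brain-time explanation: each signal reaches awareness
# after its own fixed processing latency, and perceived timing simply
# mirrors those latencies.

COLOUR_LATENCY = 0.08   # assumed, in seconds (illustrative value)
MOTION_LATENCY = 0.18   # assumed; difference of ~100 ms matches the finding

def perceived_time(physical_onset, latency):
    """Perceived time = physical onset plus processing latency."""
    return physical_onset + latency

# Physically simultaneous changes are perceived ~100 ms apart:
gap = perceived_time(0.0, MOTION_LATENCY) - perceived_time(0.0, COLOUR_LATENCY)
print(round(gap, 3))  # 0.1

# Delaying the physical colour change by that gap restores
# perceived simultaneity, as the experiment reports:
delayed_colour = perceived_time(0.0 + gap, COLOUR_LATENCY)
print(round(delayed_colour, 3) == round(perceived_time(0.0, MOTION_LATENCY), 3))  # True
```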
However intuitive this explanation is, there is evidence that it is wrongheaded: tracking differential neural latencies of sensory processing is not enough to account for this illusion.
Another view, by Shin’ya Nishida and Alan Johnston, gives a more complicated yet more convincing explanation of the colour-motion asynchrony illusion: their time-marker account. I will first explain this account superficially, and then use it to account for the colour-motion asynchrony illusion.
The time-marker account
To account for time perception, the time-marker account distinguishes between early-level systems and mid-level systems and takes these to play two distinct and important roles in time perception.
Early-level systems: These systems process and keep track of the event time of external features. They do so by representing time markers of the onset of the sensory signals for those features (the moment a stimulus hits our sensory boundary surface). The idea is that these time markers can be used as input to mid-level systems.
Mid-level systems: These systems compare the more complex temporal relations between sensory features of different modalities. This system receives two types of input: time markers (from early-level systems) and salient sensory features (extracted from many different sensory channels). The mid-level comparison thus makes a “saliency-based cross-channel comparison” to determine the temporal relations of these features.
The consequence of Nishida & Johnston’s view is that, when comparing the temporal properties of later-processed perceptual features, our brain refers back to early time markers that indicate something close to event time. This yields a time-perceptual system that is less subject to the differential latencies of different features and modalities.
Because subjective timing is linked to these time markers, temporal comparisons of features are made between event-time occurrences rather than between the brain-time occurrences of the sensory mechanisms’ finished perceptual representations.
The time-marker explanation
Now we can interpret the colour-motion asynchrony in accord with the time-marker view.
Remember, in the colour-motion asynchrony, a white pattern moving leftward and a black pattern moving rightward alternate at some set rate of change. When the rate of change is higher than about 1–2 Hz, people cannot confidently make the pairing between colour and direction, an effect that disappears if we delay the colour change by ~100 ms.
Yet at even higher rates of change (> 2 Hz), the inability to pair the features remains in place, even after compensating for a possible asynchrony in the speed at which our system processes colour vs. direction (see Arnold 2005).
The time-marker account can explain this fact because, on this view, differences in the processing time of sensory features play no essential role in the temporal representation of external features.
What is essential on this view is the temporal structure of the stimuli. The time-marker account explains the perceived asynchrony as a consequence of the brain’s attempt to match features that have different temporal structures (Nishida & Johnston 2002). Colour change is a first-order temporal change (a first-order temporal derivative of a static attribute, which can be defined over two successive points in time), whereas motion-direction change is a second-order temporal change (a change in the direction of change, a second-order temporal derivative, whose definition requires at least three successive points in time).
This is an important difference for the time-marker view. While there is evidence for early-level sensors that work at a high temporal resolution and can detect first-order temporal changes (colour, position, change in luminance), there are no such early sensors for second-order temporal changes (motion-direction change). Any detection of second-order changes is constrained by the low temporal resolution of the mid-level visual comparison.
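The first-order/second-order distinction can be made concrete with discrete samples. The sequences and encodings below are my own illustration, not from the papers: positions stand in for the moving pattern, a 0/1 value for its colour.

```python
# Sketch of first- vs second-order temporal change on sampled sequences.

positions = [0, 1, 2, 3, 2, 1, 0]   # pattern moves right, then reverses
colours   = [0, 0, 0, 0, 1, 1, 1]   # 0 = black, 1 = white

def first_order(seq):
    """Change of a static attribute: defined over two successive samples."""
    return [b - a for a, b in zip(seq, seq[1:])]

def second_order(seq):
    """Change of the change: needs at least three successive samples."""
    return first_order(first_order(seq))

print(first_order(colours))     # [0, 0, 0, 1, 0, 0]   colour change visible over 2 samples
print(first_order(positions))   # [1, 1, 1, -1, -1, -1] velocity (still first-order)
print(second_order(positions))  # [0, 0, -2, 0, 0]     direction change needs 3 samples
```

Note that the direction reversal only shows up in the second-order sequence, which is exactly the quantity the view says lacks fast early-level detectors.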
This means that at fast rates of change, the points of direction change are blurred out or obscured, as they exceed the temporal resolution of the detection system. As such, the comparison between first-order colour change and second-order direction change collapses at fast rates of change, where the processes underpinning the detection of second-order changes break down, which leads to faulty matches between the two changes.
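Why a low-resolution comparator loses reversal points can also be sketched. Again this is an assumed encoding for illustration: if the comparator samples the motion signal more coarsely than the stimulus alternates, the reversals fall between its samples.

```python
# Sketch: coarse temporal sampling obscures direction reversals.

positions = [0, 1, 2, 1, 0, 1, 2, 1, 0]   # direction reverses every 2 frames

def first_order(seq):
    """Velocity: difference between successive samples."""
    return [b - a for a, b in zip(seq, seq[1:])]

fine   = first_order(positions)        # full-resolution velocity signal
coarse = first_order(positions[::4])   # comparator sampling every 4th frame only

print(fine)    # [1, 1, -1, -1, 1, 1, -1, -1]  reversals clearly visible
print(coarse)  # [0, 0]  sampled positions are all 0: the reversals vanish
```

At the coarse sampling rate the reversing motion is indistinguishable from no motion at all, which is the sense in which fast second-order changes get "blurred out".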
Contrary to the brain-time view’s analysis, this experiment does not show that the processing of the second-order change is subject to a larger neural delay than the processing of the first-order change. It shows that representations of the second-order change are unavailable to the participant when such changes occur above a certain rate. We do perceive the motion direction, thanks to a mid-level comparator, but we see it as a blur (as a non-salient feature), and this is not enough to ground an accurate pairing with the salient features of the colour changes.
The time-perceptual system thus seems to have different levels of temporal resolution, which cause bottlenecks that result in visual illusions. This can be explained if we abandon the brain-time model, according to which all that matters is whether the differential neural delays of our sensory processing systems match up.