Hopefully the right forum for this question about sampling and reconstruction.
The sinc function in the spatial domain has infinite extent which causes aliasing in the spatial domain when convolved with the sampled function. However the Fourier transform of the sinc function in the frequency domain is the box function which allows for perfect reconstruction in the frequency domain (if all other sampling conditions are satisfied).
All the books I read about sampling and reconstruction in computer graphics explain reconstruction in the frequency domain and then jump to the spatial domain. Telling that an approximation is necessary because the sinc function is impractical.
So what is the reason then that we don't reconstruct in the frequency domain and use the box function to achieve perfect reconstruction and move from there back to the spatial domain?
I know that what I propose is not possible. But why isn't it possible? I am completely unable to find the explanation for that. (p.s. If I'm unclear I can explain my question in a more mathematical terms, but for now I hope this will do).
Sampling and reconstruction (sinc, box)
Started by MrFrankly, Apr 18 2008 06:30 PM
5 replies to this topic
#1
Posted 18 April 2008 - 06:30 PM
#2
Posted 18 April 2008 - 08:28 PM
Well, whenever computers work with signals, they're always in a sampled format. So in real life, "reconstruction" is what happens when the speakers play back the audio and turn it from a set of samples into a continuous sound wave in the air. Or for graphics, "reconstruction" is the pixels on the screen lighting up, turning a set of samples into a continuous distribution of light. I'm not really sure how you would build an actual speaker or monitor that performed the reconstruction in the frequency domain and then Fourier-transformed things back into the spatial domain.
Given that, I would guess the reason that most of the literature on signal processing kind of skips over reconstruction is just that it's a topic of interest only to hardware designers, and software designers can't really affect reconstruction, so they don't think about it that much.
Given that, I would guess the reason that most of the literature on signal processing kind of skips over reconstruction is just that it's a topic of interest only to hardware designers, and software designers can't really affect reconstruction, so they don't think about it that much.
reedbeta.com - developer blog, OpenGL demos, and other projects
#3
Posted 18 April 2008 - 09:16 PM
Reedbeta said:
...
Given that, I would guess the reason that most of the literature on signal processing kind of skips over reconstruction is just that it's a topic of interest only to hardware designers, and software designers can't really affect reconstruction, so they don't think about it that much.
Given that, I would guess the reason that most of the literature on signal processing kind of skips over reconstruction is just that it's a topic of interest only to hardware designers, and software designers can't really affect reconstruction, so they don't think about it that much.
Well that's not entirely true. Interpolation during image scaling or texture mapping on a surface is a form of reconstruction. Things like bilinear, nearest-neighbour or cubic interpolation are convolution filters in the spatial domain. So it's not just hardware designers that have to deal with it - when you're writing a renderer or image manipulation software you have to deal with reconstruction.
Transferring back to the spatial domain from the frequency domain would mean applying the inverse Fourier transformation.
The only reason I can think of that this isn't done in graphics is the fact that image functions are generally not band-limited so applying a box filter in the frequency domain isn't possible anyway.
#4
Posted 18 April 2008 - 09:41 PM
Hmm...I'm not sure, but I guess that e.g. if you're scaling up an image, then taking its Fourier transform, scaling in frequency space, then inverse Fouriering it might simply take too much time. The FFT algorithm takes O(n log n) time and you'd have to do it for the entire image. Realistically it's faster and gets you just as good-looking results to use a Gaussian filter with a support of a few pixels in the spatial domain.
reedbeta.com - developer blog, OpenGL demos, and other projects
#5
Posted 19 April 2008 - 06:18 AM
sinc filter is actually used in computer graphics to generate mipmaps, so it's perfectly sensible to use it as a pre-processing filter. NVidia texture tools has sinc filter as an option, and generates somewhat better results than plain box filter. It's not reasonable to use it as a real-time filter though because GPUs must support random access to textures, and for sinc filter you would have to sample the entire texture for each sample, while for bilinear filter you sample only 4 surrounding texels.
#6
Posted 02 June 2008 - 11:20 AM
I'm not fully sure what you're asking, so I'll shotgun it, hopefully some of this will help. As you recall, we can reconstruct a chunk of suitably band-limited signal X(t) from its discretization x[t] as follows:
X(t) = sum_{n=0}^N x[n] sinc(t/T-n),
where T is the sampling interval, sinc is the normalized sinc, and x is assumed to be finite and appropriately shifted. This can be viewed as a continuous convolution, but more importantly, it's almost a discrete convolution; the "almost" bit can be illustrated by only sampling X at multiples of T:
y[m] = sum_{n=0}^N x[n] s[m-n],
where s[k] = sinc(k), i.e. s[0] = 1 and s[k] = 0 for all non-zero k. It's easy to see that y equals x pointwise, so we haven't broken anything yet. More generally, we'll get a discrete convolution if we sample X at some countable number of points, call it M.
You get some fun stuff by examining the equation further. Making the left-hand side of the second equation continuous and rearranging, gives you an expression for a good fractional delay -- chorus, flanger, phaser, and wah effects, as well as every physical modelling based instrument you've ever heard, incorporate a fractional delay.
Anyway, back on track. The problem is that the equality, as written, is the BAD version of convolution, the O(N^2) version, assuming (as is typical in this case) that M depends on N linearly -- it's not that sinc is impractical, it's that you'd be evaluating it O(N^2) times when you don't have to evaluate it at all.
So, here's the DSP point of view: convolution in the spatial/time domain is slow, for sufficiently large N, whenever M grows faster than log N (the reason Reedbeta said "it might take too much time" is that M is often a constant in image processing, making spatial domain convolution linear). As you know, convolution is pointwise multiplication in the frequency domain, and you're multiplying by a box, so even that simplifies -- so that's one way to speed it up. I don't understand what you mean by "impossible."
Here's the CG point of view: as fundamental as this is, it's difficult to come up with a good reason to actually use the identities given above. First, sinc falls off quickly -- truncate it to obtain finite support, you get linear time. Second, you're nowhere near Nyquist frequency, so none of this is really relevant, and you have to examine the specifics. If M > N, then you're supersampling, your box function is N wide, and you're filling the gaps with something that may or may not be to your liking. If M < N, then you're subsampling, the convolution doesn't do anything, s[*] is constant in the frequency domain -- you're doing sine interpolation. In both cases, you're feeding your output to something with very non-linear frequency and amplitude response, the human eye. This means you're making very costly decisions based on the assumption of a flat response, which is incorrect a priori. Finally, if you're messing with X(t) in the first equation and then resampling it, then simplification alarms should be going off in your head.
Anyway, like I said, I'm not fully sure what the question is, let me know if this didn't answer it. Conversely, I'm rather new to DSP, so hopefully someone will correct me if I'm on crack.
X(t) = sum_{n=0}^N x[n] sinc(t/T-n),
where T is the sampling interval, sinc is the normalized sinc, and x is assumed to be finite and appropriately shifted. This can be viewed as a continuous convolution, but more importantly, it's almost a discrete convolution; the "almost" bit can be illustrated by only sampling X at multiples of T:
y[m] = sum_{n=0}^N x[n] s[m-n],
where s[k] = sinc(k), i.e. s[0] = 1 and s[k] = 0 for all non-zero k. It's easy to see that y equals x pointwise, so we haven't broken anything yet. More generally, we'll get a discrete convolution if we sample X at some countable number of points, call it M.
You get some fun stuff by examining the equation further. Making the left-hand side of the second equation continuous and rearranging, gives you an expression for a good fractional delay -- chorus, flanger, phaser, and wah effects, as well as every physical modelling based instrument you've ever heard, incorporate a fractional delay.
Anyway, back on track. The problem is that the equality, as written, is the BAD version of convolution, the O(N^2) version, assuming (as is typical in this case) that M depends on N linearly -- it's not that sinc is impractical, it's that you'd be evaluating it O(N^2) times when you don't have to evaluate it at all.
So, here's the DSP point of view: convolution in the spatial/time domain is slow, for sufficiently large N, whenever M grows faster than log N (the reason Reedbeta said "it might take too much time" is that M is often a constant in image processing, making spatial domain convolution linear). As you know, convolution is pointwise multiplication in the frequency domain, and you're multiplying by a box, so even that simplifies -- so that's one way to speed it up. I don't understand what you mean by "impossible."
Here's the CG point of view: as fundamental as this is, it's difficult to come up with a good reason to actually use the identities given above. First, sinc falls off quickly -- truncate it to obtain finite support, you get linear time. Second, you're nowhere near Nyquist frequency, so none of this is really relevant, and you have to examine the specifics. If M > N, then you're supersampling, your box function is N wide, and you're filling the gaps with something that may or may not be to your liking. If M < N, then you're subsampling, the convolution doesn't do anything, s[*] is constant in the frequency domain -- you're doing sine interpolation. In both cases, you're feeding your output to something with very non-linear frequency and amplitude response, the human eye. This means you're making very costly decisions based on the assumption of a flat response, which is incorrect a priori. Finally, if you're messing with X(t) in the first equation and then resampling it, then simplification alarms should be going off in your head.
Anyway, like I said, I'm not fully sure what the question is, let me know if this didn't answer it. Conversely, I'm rather new to DSP, so hopefully someone will correct me if I'm on crack.
1 user(s) are reading this topic
0 members, 1 guests, 0 anonymous users











