Vision systems have become ubiquitous. They are used for traffic monitoring, elderly care, video conferencing, virtual reality, surveillance, smart rooms, home automation, sports game analysis, industrial safety, and medical care. In most vision systems, the data coming from the visual sensor(s) is processed before transmission in order to save communication bandwidth or achieve higher frame rates. The type of data processing needs to be chosen carefully depending on the targeted application, taking into account the available memory, computational power, energy resources and bandwidth constraints. In this dissertation, we investigate how a vision system should be built under practical constraints. First, such a system should be intelligent, so that the right data is extracted from the video source. Second, when processing video data, this intelligent vision system should know its own practical limitations and should try to achieve the best possible output within its capabilities. We study and improve a wide range of vision systems for a variety of applications, each of which comes with different types of constraints.

First, we present a modulo-PCM-based coding algorithm for applications that demand very low complexity coding and need to preserve some of the advantageous properties of PCM coding (direct processing, random access, rate scalability). Our modulo-PCM coding scheme combines three well-known, simple source coding strategies: PCM, binning, and interpolative coding. The encoder first analyzes the signal statistics in a very simple way. Then, based on these statistics, the encoder simply discards a number of bits of each image sample. The modulo-PCM decoder recovers the removed bits of each sample by using its received bits and side information, which is generated by interpolating previously decoded signals.
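The core encode/decode step can be illustrated with a toy Python sketch (an assumption-laden simplification, not the dissertation's actual implementation; the statistics analysis, binning and interpolative-coding details are omitted). For 8-bit samples, the encoder keeps only the k least significant bits, and the decoder selects the value consistent with those bits that lies closest to the interpolation-based side information:

```python
def mpcm_encode(sample, k):
    """Toy modulo-PCM encoder: keep only the k least significant
    bits of an 8-bit sample (i.e., transmit sample mod 2^k)."""
    return sample % (1 << k)

def mpcm_decode(received, side_info, k):
    """Toy modulo-PCM decoder: among all 8-bit values congruent to
    `received` mod 2^k, pick the one closest to the side information
    (a prediction interpolated from previously decoded signals)."""
    step = 1 << k
    # Candidate near the side information with the right low-order bits.
    base = (side_info // step) * step + received
    candidates = [c for c in (base - step, base, base + step)
                  if 0 <= c <= 255]
    return min(candidates, key=lambda c: abs(c - side_info))

# Example: sample 200 encoded with k = 4 is recovered exactly as long
# as the side information is within 2^(k-1) = 8 of the true value.
print(mpcm_decode(mpcm_encode(200, 4), 197, 4))  # -> 200
```

Decoding fails when the prediction error exceeds 2^(k-1), which is why the encoder must choose k from the signal statistics and the expected side-information quality.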
Our algorithm is especially appropriate for image coding since it introduces larger coding errors in those regions where they are least visible (edges and textured regions). We develop a model for the coding distortion introduced by this modulo-PCM coder. Using this model, we analyze how the coding parameters should be chosen as a function of the target rate and the quality of the side information. Experimental results obtained in the encoding of several digital images show that our algorithm has better objective and subjective performance than PCM at low rates. At high rates, modulo-PCM and PCM provide similar results. Our algorithm has a worse rate-distortion performance than other source coding techniques, such as modulo-PCM coding with side information or Wyner-Ziv video coding, but it has the advantage of a much lower computational complexity (comparable to PCM). This makes our algorithm very useful in applications that require extremely simple encoders, such as the encoding of video signals from high-speed cameras.

Second, in some video applications it is desirable to reduce the complexity of the video encoder at the expense of a more complex decoder. Examples of such applications are wireless low-power surveillance, wireless PC cameras, multimedia sensor networks, disposable cameras, and mobile camera phones. Distributed video coding is a new paradigm that fulfills this requirement by performing intra-frame encoding and inter-frame decoding. Hence, most of the computational load is moved from the encoder to the decoder, since in this case the distributed video decoders (and not the encoders) perform motion estimation and motion-compensated interpolation.
Two theorems from information theory, namely the Slepian-Wolf theorem for lossless distributed source coding and the Wyner-Ziv theorem for lossy source coding with side information, suggest that such a system with intra-frame encoding and inter-frame decoding can come close to the efficiency of a traditional inter-frame encoding-decoding system. To gain better insight into the functioning of this type of coder, we start with an in-depth study of the coding distortion introduced by pixel-domain Wyner-Ziv video coders. Our coding distortion model can be used to determine the optimal values of the coding parameters under rate and distortion constraints. As an example, we show how our model can be used to reduce quality fluctuations between different frames of the video.

Many Wyner-Ziv video coders make use of a feedback channel to allocate an appropriate rate. However, this feedback channel is not always available, as is the case in offline coding or in unidirectional applications. We propose a rate allocation algorithm that makes it possible to remove the feedback channel from the coding scheme. Our algorithm computes the number of bits needed to encode each video frame without significantly increasing the encoder complexity. Experimental results show that our rate allocation algorithm delivers good estimates of the rate, and that the frame qualities provided by our algorithm are quite close to those provided by a feedback channel-based algorithm. A general aim in distributed video coding is to reduce the complexity of the encoder as much as possible, but this is of course at the expense of more decoder complexity. In this respect, we observe that this increase in decoder complexity is excessive, and hence the complexity of the entire coding process is much higher than in traditional coding schemes. To overcome this problem, we develop a method that drastically reduces the decoder complexity.
In this method, we utilize a feedback channel to fine-tune the output of our rate allocation algorithm and achieve a near-optimal rate allocation, while at the same time eliminating two main inconveniences of the feedback channel, i.e., its negative impact on latency and decoder complexity.

Third, we study in detail how a vision system for the specific application of 2D occupancy sensing should be designed. A 2D occupancy map provides an abstract top view of a scene containing people or objects. Such maps are important in many applications, such as surveillance, smart rooms, video conferencing and sports game analysis. We present two different methods. With the first method we aim at providing very accurate 2D occupancy maps. For this, we use a network of smart cameras, which means that the cameras have strong on-board processing capabilities. Consequently, the cameras can process and compress the video data in an intelligent way before sending it to the base station for central processing. In particular, each camera calculates a foreground (FG)/background (BG) silhouette and transfers this silhouette to a reference plane using its camera image-floor homographies. The ground occupancies computed from each view are transmitted to a central processing station. Since the amount of data needed to represent these ground occupancies is small (much smaller than that needed for a typical natural image), the required bandwidth is rather low. At the base station, the ground occupancies from the cameras are fused using the Dempster-Shafer theory of evidence. The method yields very accurate occupancy detection results and outperforms other state-of-the-art multi-camera 2D occupancy calculation methods. This first method is very accurate but cannot always be used, due to practical limitations.
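The Dempster-Shafer fusion step can be illustrated with a minimal Python sketch (a hypothetical toy model, not the dissertation's actual implementation). Per ground-plane cell, each camera is assumed to report a mass function over 'occupied' (O), 'free' (F) and 'unknown' (OF); Dempster's combination rule fuses two such reports, renormalizing by the conflicting mass:

```python
def ds_combine(m1, m2):
    """Dempster's rule of combination on the frame {O, F}.
    m1, m2: mass functions given as dicts over the focal elements
    'O' (occupied), 'F' (free) and 'OF' (unknown / full frame)."""
    sets = {'O': {'O'}, 'F': {'F'}, 'OF': {'O', 'F'}}
    combined = {'O': 0.0, 'F': 0.0, 'OF': 0.0}
    conflict = 0.0
    for a, ma in m1.items():
        for b, mb in m2.items():
            inter = sets[a] & sets[b]
            if not inter:
                conflict += ma * mb      # contradictory evidence
            elif inter == {'O', 'F'}:
                combined['OF'] += ma * mb
            else:
                combined[inter.pop()] += ma * mb
    # Normalize by 1 - K, the usual Dempster renormalization.
    return {k: v / (1.0 - conflict) for k, v in combined.items()}

# Two cameras weakly agreeing that a cell is occupied:
fused = ds_combine({'O': 0.6, 'F': 0.1, 'OF': 0.3},
                   {'O': 0.5, 'F': 0.2, 'OF': 0.3})
```

Combining evidence this way lets a view that cannot see a cell assign most mass to 'OF' instead of being forced into a binary occupied/free vote.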
The major concerns are the possibility of privacy breaches, the high cost, the expensive alterations to the infrastructure, the high-complexity processing and the large power consumption. Taking these requirements into consideration, we present a second, novel method for 2D occupancy sensing. In this method, we replace the camera by a more specific device consisting of a linear array of optical sensing elements (e.g. photodiodes), which we call a line sensor. We propose to use multiple of these line sensors to calculate an accurate 2D occupancy map. The line sensor is particularly suited for this application due to its low price, its low power consumption, its high data rates, its high bit depth and its privacy-friendly nature. We propose to use the line sensor together with a light-integrating optical system, which ensures that each sensing element integrates all light within a certain range of incidence angles. The scan line outputs from multiple light-integrating line sensors are very well suited as input for a 2D occupancy calculation algorithm. Occupancy calculation with light-integrating line sensors yields accurate results that closely approximate those obtained with cameras, especially when the line sensors view the scene from the side rather than from above.

Fourth, we investigate how a vision network can deal with many vision tasks that need to be performed simultaneously, e.g. the tracking of multiple persons in a room. The number and type of tasks a camera network can handle is of course limited by the network resources. The most important camera network restrictions are the limited computational power of the cameras and the communication constraints. In a practical multi-camera network charged with multiple tasks and with restricted network resources, the aim is to achieve the best overall task performance by distributing the tasks efficiently among the sensors in accordance with the given restrictions.
This distribution of tasks among the sensors is called task assignment. In this dissertation, we present a novel, general solution to task assignment in practical (i.e. with network restrictions) vision networks with overlapping fields of view. This framework offers the possibility of controlling the quality with which tasks are performed, while distributing the tasks among the cameras according to practical criteria. In particular, the framework entails, on the one hand, cost functions to model the practical criteria, such as the limited computational power of the cameras. On the other hand, we use suitability value functions, which indicate how well a set of cameras can perform a certain task, in order to monitor the quality of the executed tasks. The cost and value functions are combined in a constrained optimization problem, whose solution is the optimal distribution of the tasks over the cameras. As a proof of concept, we use our method for the management of multiple person-tracking tasks. We evaluate how the tracking performance is influenced by bandwidth and computational constraints in the network. We test our method on extensive real data from different camera network environments.

To summarize, the main contributions of this dissertation are:
1. a modulo-PCM-based coding algorithm for very low complexity coding of images;
2. a thorough study and improvement of pixel-domain distributed video coding algorithms;
3. two novel vision systems for calculating accurate 2D occupancy maps;
4. a task assignment framework for intelligent vision networks.
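As a toy illustration of the constrained optimization underlying the task assignment framework, the following hypothetical Python sketch exhaustively searches over camera-subset assignments. Here `value(task, subset)` stands in for the suitability value functions, and `cost`/`budget` stand in for the cost functions modeling per-camera computational limits; the actual framework's functions and solver are of course more elaborate:

```python
from itertools import combinations, product

def assign_tasks(cameras, tasks, value, cost, budget):
    """Brute-force constrained task assignment (toy model).
    value(task, subset) -> suitability of a camera subset for a task.
    cost(cam)           -> load one task places on a camera.
    budget[cam]         -> maximum load the camera can carry."""
    # All non-empty camera subsets (each task needs at least one camera).
    subsets = [frozenset(s) for r in range(1, len(cameras) + 1)
               for s in combinations(cameras, r)]
    best, best_val = None, float('-inf')
    for choice in product(subsets, repeat=len(tasks)):
        # Accumulate the load each camera carries under this assignment.
        load = {c: 0.0 for c in cameras}
        for subset in choice:
            for c in subset:
                load[c] += cost(c)
        if any(load[c] > budget[c] for c in cameras):
            continue  # violates a practical (resource) constraint
        total = sum(value(t, s) for t, s in zip(tasks, choice))
        if total > best_val:
            best, best_val = dict(zip(tasks, choice)), total
    return best, best_val

# Toy example: camera c1 can serve only one tracking task, c2 two.
suit = {('t1', 'c1'): 0.9, ('t1', 'c2'): 0.4,
        ('t2', 'c1'): 0.5, ('t2', 'c2'): 0.8}
value = lambda t, s: sum(suit[(t, c)] for c in s)
best, best_val = assign_tasks(['c1', 'c2'], ['t1', 't2'], value,
                              lambda c: 1.0, {'c1': 1.0, 'c2': 2.0})
```

The exponential search is only feasible for tiny networks; it is meant to make the cost/value trade-off concrete, not to suggest how a practical solver works.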
The research performed during this PhD resulted in five international journal publications (two published, two under review, one in preparation), of which three as first author [Morbee et al., 2011, Prades-Nebot et al., 2010, Tessens et al., 2011, Morbee et al., 2010, Morbee et al., 2008a], two (submitted) patent applications as first author [Morbee and Tessens, 2010, Morbee and Tessens, 2011], two chapters in Lecture Notes in Computer Science, of which one as first author [Lee et al., 2008, Morbee et al., 2007a], and twelve publications at international conferences, of which eight as first author [Morbee et al., 2009b, Morbee et al., 2009a, Tessens et al., 2009, Morbee et al., 2008b, Tessens et al., 2008, Roca et al., 2008, Roca et al., 2007, Morbee et al., 2007d, Morbee et al., 2007c, Morbee et al., 2007b, Morbee et al., 2006a, Morbee et al., 2006b].

References:

[Morbee et al., 2007a] Morbee, M., Prades-Nebot, J., Pizurica, A., and Philips, W. (2007a). Improved pixel-based rate allocation for pixel-domain distributed video coders without feedback channel. In Advanced Concepts for Intelligent Vision Systems (ACIVS), Lecture Notes in Computer Science, pages 663-674, Delft, the Netherlands. Springer-Verlag.

[Morbee et al., 2007b] Morbee, M., Prades-Nebot, J., Pizurica, A., and Philips, W. (2007b). Rate allocation algorithm for pixel-domain distributed video coding without feedback channel. In IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), volume 1, pages I521-I524, Honolulu, HI, USA.

[Morbee et al., 2006a] Morbee, M., Prades-Nebot, J., Pizurica, A., and Philips, W. (2006a). Feedback channel suppression in pixel-domain distributed video coding. In Proceedings of the 17th Annual Workshop on Circuits, Systems and Signal Processing (ProRISC), pages 154-157, Eindhoven, The Netherlands. Technology Foundation/IEEE Benelux.

[Morbee et al., 2006b] Morbee, M., Prades-Nebot, J., Pizurica, A., and Philips, W. (2006b).
Content-based MPEG-4 FGS video coding for video surveillance. In Proceedings of SPS-DARTS 2006 (the second annual IEEE Benelux/DSP Valley Signal Processing Symposium), pages 135-138.

[Morbee et al., 2008a] Morbee, M., Roca, A., Prades-Nebot, J., Pizurica, A., and Philips, W. (2008a). Reduced decoder complexity and latency in pixel-domain Wyner-Ziv video coders. Springer Journal on Signal, Image and Video Processing (SIViP), 2(2):129-140.

[Morbee and Tessens, 2010] Morbee, M. and Tessens, L. (2010). Multiple light-integrating line sensors for 2D occupancy sensing. EPO Patent Office, Application Number EP10164483.9.

[Morbee and Tessens, 2011] Morbee, M. and Tessens, L. (2011). Multiple light-integrating line sensors for 2D occupancy sensing. EPO Patent Office, Application Number EP11000138.5.

[Morbee et al., 2010] Morbee, M., Tessens, L., Aghajan, H., and Philips, W. (2010). Dempster-Shafer based multi-view occupancy maps. Electronics Letters, 46.

[Morbee et al., 2011] Morbee, M., Tessens, L., Aghajan, H., and Philips, W. (2011). Dempster-Shafer based task assignment in vision networks. Submitted to International Journal of Computer Vision.

[Morbee et al., 2008b] Morbee, M., Tessens, L., Lee, H., Philips, W., and Aghajan, H. (2008b). Optimal camera selection in vision networks through shape approximation. In Proceedings of the 2008 IEEE 10th Workshop on Multimedia Signal Processing, pages 46-51, Cairns, Queensland, Australia. ISBN: 978-1-4244-2295-1.

[Morbee et al., 2009a] Morbee, M., Tessens, L., Philips, W., and Aghajan, H. (2009a). PhD forum: Dempster-Shafer based camera contribution evaluation for task assignment in vision networks. In Third ACM/IEEE International Conference on Distributed Smart Cameras (ICDSC), pages 1-2.

[Morbee et al., 2007c] Morbee, M., Tessens, L., Prades-Nebot, J., Pizurica, A., and Philips, W. (2007c). A distributed coding-based extension of a mono-view to a multi-view video system. In 3DTV Conference, Kos, Greece.
[Morbee et al., 2007d] Morbee, M., Tessens, L., Quang-Luong, H., Prades-Nebot, J., Pizurica, A., and Philips, W. (2007d). A distributed coding-based content-aware multi-view video system. In International Conference on Distributed Smart Cameras (ICDSC), pages 355-362, Vienna, Austria.

[Morbee et al., 2009b] Morbee, M., Velisavljevic, V., Mrak, M., and Philips, W. (2009b). Scalable feature-based video retrieval for mobile devices. In ACM International Conference on Internet Multimedia Computing and Service (ICIMCS), pages 1-7, Kunming, Yunnan, China.

[Lee et al., 2008] Lee, H., Tessens, L., Morbee, M., Aghajan, H., and Philips, W. (2008). Sub-optimal camera selection in practical vision networks through shape approximation. Volume 5259 of Lecture Notes in Computer Science, pages 266-277, Juan-les-Pins, France.

[Roca et al., 2008] Roca, A., Morbee, M., Prades-Nebot, J., and Delp, E. (2008). Rate control algorithm for pixel-domain Wyner-Ziv video coding. In Proceedings of Visual Communications and Image Processing (VCIP), San Jose, CA, USA.

[Roca et al., 2007] Roca, A., Morbee, M., Prades-Nebot, J., and Delp, E. J. (2007). A distortion control algorithm for pixel-domain Wyner-Ziv video coding. In Picture Coding Symposium, Lisbon, Portugal.

[Prades-Nebot et al., 2010] Prades-Nebot, J., Morbee, M., and Delp, E. J. (2010). Very low complexity coding of images using modulo-PCM. Submitted to IEEE Transactions on Circuits and Systems for Video Technology.

[Tessens et al., 2011] Tessens, L., Morbee, M., Aghajan, H., and Philips, W. (2011). Camera selection for tracking in smart camera networks. Submitted to IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI).

[Tessens et al., 2008] Tessens, L., Morbee, M., Lee, H., Philips, W., and Aghajan, H. (2008). Principal view determination for camera selection in distributed smart camera networks. In Proceedings of ACM/IEEE ICDSC, pages 1-8, Stanford, CA, USA.

[Tessens et al., 2009] Tessens, L., Morbee, M., Philips, W., Kleihorst, R., and Aghajan, H. (2009).
Efficient approximate foreground detection for low-resource devices. In Third ACM/IEEE International Conference on Distributed Smart Cameras (ICDSC), pages 1-8.