MIVIA

ENSICAEN 6, bd maréchal Juin F-14050 Caen cedex 4

Computer Science – 2nd year
Training course report

Tracking Object With a PTZ Network Camera

Loïc FAYEL Stéphanie COCHET

Tutors:

Mario VENTO
Gennaro PERCANNELLA

May - July 2010

Special thanks

We are very grateful to all the members of the MIVIA laboratory, without whom we would not have been able to carry out this project: Mario VENTO, Gennaro PERCANNELLA,

Donatello CONTE, Francesco TUFANO.

Thank you for your support.

Thank you for your assistance.

Contents

Introduction
1 Program architecture
 1.1 Components
  1.1.1 The AXIS 215 PTZ Network Camera
  1.1.2 Network
  1.1.3 Software
 1.2 Structure
  1.2.1 Procedure
  1.2.2 Organization
 1.3 Technical choices
  1.3.1 Hypotheses
  1.3.2 Functionalities of OpenCV
2 Tracking system (FAYEL Loïc)
 2.1 Background subtraction
  2.1.1 Principle
  2.1.2 Background importance
  2.1.3 Implementation and processing results
 2.2 Camshift
  2.2.1 Meanshift
  2.2.2 Camshift
  2.2.3 Improvement
 2.3 State switcher
  2.3.1 State initialization
  2.3.2 State switching
3 Application (COCHET Stéphanie)
 3.1 Communication
  3.1.1 HTTP API «VAPIX»
  3.1.2 HTTP client
 3.2 Control class
 3.3 Graphical user interface
4 Tests
Conclusion
Bibliographical references
5 Annexes: OpenCV


Introduction

The use of video surveillance cameras increases year after year. Cameras can now be seen in the street, in buses and subways, and in public squares or car parks; in other words, in all public places. Video processing and object tracking also inspire very different high-technology domains, such as video games, so it would be interesting to push these techniques a little further. Nowadays, cameras are mobile and can be controlled remotely, for example in order to follow someone. How could we automate the detection and tracking of a moving object entering the scene, as a human does naturally?

Such software could be really useful for video surveillance in public or private places: to analyze the movement of a person and obtain a better image of the target than usual video surveillance installations, which do not focus on the target and settle for filming a large scene. It could also reduce the number of cameras needed for a video surveillance installation (only one camera for several rooms), or film someone moving continuously without the restriction of a fixed field of vision. Furthermore, most video processing algorithms depend on the quality of the target image to be processed, so pre-processing the video stream with our software to obtain the best possible image of the target could improve the output of such algorithms.

Fig. 1: Tracking software


To complete this project, we have an AXIS 215 Pan-Tilt-Zoom (PTZ) IP camera. The main goal is to create software that controls this camera to follow a moving target in a large, ordinary area. The program has to detect a relevant moving target and track it through the scene with an adapted zoom factor, in order to record the best possible image of it. To address both detection and tracking, we chose to base the program on two different algorithms, corresponding to the two states of our scenario:

- First, the camera films the scene and waits for a moving target. The camera is fixed.
- Then, once a moving target is detected, the camera follows it through the scene. The camera is moving.

When the camera is fixed, we use an algorithm based on background subtraction with a mixture of Gaussians. When the camera is moving, we use the Continuously Adaptive Mean Shift (CAMShift) algorithm. Regarding the implementation, the main body of the application is coded in C++, in a control module. Both algorithms quoted above are implemented in a motion detection module with the OpenCV 2.1 C++ library (cf. Annex), a library of programming functions for real-time computer vision. Furthermore, we chose the C++ PoCo library to implement an adapter interface between the camera interface and our program, in a communication module. Finally, our application must be user friendly; that is why we decided to implement a graphical user interface with the C++ Qt library, in a visualization module. It has to provide the necessary functions:

- fill in the IP address of the required camera,
- choose a favored field of vision of the camera,
- start and stop the video processing,
- record the resulting video stream,
- select a target with the mouse.

All of these functionalities have to bring the software within everybody's reach: the goal is a general and accessible object tracking software. To round out its accessibility, a website (http://www.ecole.ensicaen.fr/~fayel/stage/index.php) was created to show the results of our work (cf. fig. 2), but also to publish the report of our activities in the MIVIA laboratory and the whole documentation of the software (cf. fig. 10).


Fig. 2: Website

We had three months to achieve this ambitious project; that is why we planned our work early:
- Stéphanie COCHET worked on the communication system and the graphical interface.
- Loïc FAYEL worked on real-time computer vision and the website.
The final tests of the software were carried out by both. The Gantt diagram below illustrates this planning:

Fig. 3: Provisional Gantt diagram


1 Program architecture

As we said in the introduction, the main goal of this project is to create software for a PTZ camera in order to automatically track an object that crosses in front of the camera. For this project, we have three different tools:
- the AXIS 215 PTZ Network Camera to capture the video,
- software using OpenCV in C++ to detect and track the object,
- an HTTP client using PoCo in C++ to communicate with the camera.

1.1 Components

1.1.1 The AXIS 215 PTZ Network Camera

We use the AXIS 215 PTZ Network Camera for our project, as shown in fig. 4.

Fig. 4: AXIS 215 PTZ Network Camera

We chose this camera because it has some interesting network characteristics; we can:
- connect the camera to the network,
- control the camera with the HTTP protocol, especially VAPIX provided by Axis,
- give the camera an IPv4 or IPv6 address,
- use the API to upload software to the flash memory of the camera.
Moreover, we chose it because:
- it can do a 360-degree pan with Auto-flip, so we can track the object all the time,
- it has a 48x zoom, so we can focus on the object,
- it has day and night functionality.
If you want more details about the AXIS 215 PTZ Network Camera, please see [1].


1.1.2 Network

The network can be the internet or a local network; in our case, in the MIVIA laboratory, we used the local network of the university. The network is the link between the software and the camera, and we use the HTTP protocol: indeed, VAPIX, provided by AXIS to control the pan, tilt and zoom behavior of a PTZ unit, is an HTTP-based application programming interface [2]. In order to communicate from our software, we need an HTTP client library in C++; we chose the PoCo library for this task [3]. A simple plan of our architecture is shown in fig. 5.

Fig. 5: Network

1.1.3 Software

The programming language is C++ with the OpenCV 2.1 library. OpenCV is an open library of programming functions for real-time computer vision. It is useful for us especially because it provides tracking functions, such as cvCamShift and cvMeanShift, and other functions for background subtraction. The OpenCV documentation is available in [4].


1.2 Structure

Regarding the structure of the software, fig. 6 shows a simplified sequence diagram of its global working.

Fig. 6: Simplified sequence diagram

1.2.1 Procedure

With OpenCV, we implement the object tracking function. This function has two different states:
- The first is to detect an object to track; in this state, the camera is fixed in an initial position chosen by the user. The program uses background subtraction to detect the object. The characteristics of a relevant object are detailed later in this report.
- The second is to track the detected object; in this state, the camera moves to follow the object. The program uses the MeanShift algorithm to follow the object. The different problems of this state are also detailed later.
Fig. 7 shows a detailed sequence diagram explaining the working of the software.


Fig. 7: Detailed sequence diagram


1.2.2 Organization

We organized the software in five parts (cf. fig. 8 and 9):
− Capture: the camera captures the video,
− Communication: the PoCo client and VAPIX carry information from the capture to the operator,
− Operator: the control class drives the application,
− Visualization: the user interface enables users to pilot the camera, the detection and the tracking,
− Motion Detection: the state manager switches between the background subtraction state and the tracking state to detect and track an object.

Fig. 8: Architectural diagram


Fig. 9: Class diagram

To help understand this organization, we built full documentation of the code with Doxygen (cf. fig. 10).

Fig. 10: doxygen documentation


1.3 Technical choices

1.3.1 Hypotheses

Before going on, we have to define what is to be implemented. We have to deal with several issues concerning the definition of an object to track, the motion and zoom, and the privileged field of vision. We consider only objects which are bigger than a child (1 m to 1.5 m) and stay more than 5 frames (0.2 s) in the field of vision of the camera. If several tracked objects appear simultaneously in the field of vision, we focus on the group of objects to keep an overall view of them; if the objects have conflicting motions, we zoom out to keep all tracked objects in the scene. If a tracked object disappears from the scene for more than 25 frames (1 s), the camera returns to its initial position. We zoom in on an object if its size is smaller than 1/3 of the size of the frame, and we keep the object as close as possible to the centre of the picture. Finally, we define a privileged field of vision to watch when there is nobody in the scene or when the tracked object is smaller than the threshold defined above. This privileged field is chosen by the user.
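These hypotheses translate naturally into a few numeric thresholds in the code. The constants below are only an illustrative sketch of the values above; the identifiers and the assumed 25 fps frame rate are not taken from the actual software.

// Illustrative thresholds reflecting the hypotheses above (assumed names, 25 fps).
const int    MIN_PRESENCE_FRAMES = 5;          // ~0.2 s before a moving blob becomes a target
const int    LOST_TARGET_FRAMES  = 25;         // ~1 s without the target before returning to the initial position
const double ZOOM_SIZE_RATIO     = 1.0 / 3.0;  // zoom in when the target is smaller than 1/3 of the frame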

1.3.2 Functionalities of OpenCV

In this part, we explain how we use OpenCV in our software. First, we present the main functions used to implement background subtraction and CamShift; then we present the most useful HighGUI functions we used to easily implement the visualization and recording modules.

a. Computer vision functions

cvUpdateBGStatModel(tmp_frame, bg_model);
This function is used in the background subtraction module to detect an object in the video. It updates a background subtraction model containing the foreground mask and the background reference image. Updating the background model is necessary to cope with many background problems, such as gradual changes of brightness.

cvFilterByArea(blobs, BLOB_TAILLE_MIN, BLOB_TAILLE_MAX);
This function is used in background subtraction to select interesting blobs in the foreground mask. After this processing, the computer vision stream keeps only the blobs whose size lies between BLOB_TAILLE_MIN and BLOB_TAILLE_MAX; like this, we detect only relevant moving objects entering the scene.

cvCalcBackProject(&hue, imgBackProject, histogram->getHist());
This function is used in the CamShift algorithm to calculate the back projection of the computer vision stream before processing it with CamShift. It calculates the back projection with the help of a histogram. We can apply any modification we want to the calculation of this histogram in order to improve the efficiency of the back projection calculation and of the CamShift processing.

cvCalcHist(&hue, hist, 0, mask);
This function is used in the CamShift algorithm to calculate a histogram of the current frame. This histogram is used by the back projection calculation and the CamShift processing.

cvCamShift(imgBackProject, *target, cvTermCriteria(CV_TERMCRIT_EPS | CV_TERMCRIT_ITER, 10, 1), &track_comp, &box);
This function applies the CamShift algorithm as described in part 2.2.2. It searches for the most relevant target in the imgBackProject image and fills in the track_comp argument with its conclusion; we use track_comp to obtain the new position and geometry of the tracked target.

These functions enabled us to implement the software easily and correctly; thus, OpenCV makes the implementation of such a program much easier.
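To show how these functions fit together, here is a minimal sketch of one CamShift iteration built only from the calls listed above; variable and function names are illustrative, not those of the actual software.

#include <opencv/cv.h>

// One CamShift iteration: back-project the hue histogram of the target on the
// current frame, then let cvCamShift move the search window (illustrative sketch).
CvRect trackOnce(IplImage* frame, CvHistogram* hist, CvRect trackWindow)
{
    IplImage* hsv  = cvCreateImage(cvGetSize(frame), IPL_DEPTH_8U, 3);
    IplImage* hue  = cvCreateImage(cvGetSize(frame), IPL_DEPTH_8U, 1);
    IplImage* back = cvCreateImage(cvGetSize(frame), IPL_DEPTH_8U, 1);

    cvCvtColor(frame, hsv, CV_BGR2HSV);   // work in the HSV color space
    cvSplit(hsv, hue, 0, 0, 0);           // keep only the hue plane

    cvCalcBackProject(&hue, back, hist);  // probability of each pixel belonging to the target

    CvConnectedComp comp;
    CvBox2D box;
    cvCamShift(back, trackWindow,
               cvTermCriteria(CV_TERMCRIT_EPS | CV_TERMCRIT_ITER, 10, 1),
               &comp, &box);              // shift and resize the window toward the densest region

    cvReleaseImage(&hsv);
    cvReleaseImage(&hue);
    cvReleaseImage(&back);
    return comp.rect;                     // new position and size of the tracked target
}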

b. HighGUI functions

cvQueryFrame(…);
This function captures a frame from the video stream given in argument and converts it into the IplImage structure used by OpenCV.

IplImage* cvCreateImage(CvSize size, int depth, int channels);
This function creates an image header and allocates the data, given the size of the image, its depth and its number of channels.

void cvRectangle(IplImage* img, CvPoint pt1, CvPoint pt2, int color, int thickness);
This function draws a rectangle with two opposite corners pt1 and pt2. It is useful to visualize the target.

cvSaveImage(name, IplImage* Image);
This function saves an IplImage whenever we want. It is useful to see what the background subtraction detected as a target.

cvCreateVideoWriter(…); cvWriteFrame(IplImage* Image);
These two functions are used to save a video. The first creates the header of a CvVideoWriter object and allocates it with the size of the frames, the frame rate and the number of channels of the resulting video; the second writes a frame into it. They are used to record videos of our tests.

These functions enabled us to implement a first version of the graphical interface. We later used the Qt library, which is more complete and efficient, but we still needed some of these functions, for example to implement the conversion between IplImage, used by OpenCV, and QImage, used by Qt. We also kept the functions for creating and writing a video stream, to record our results.
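As an illustration of these HighGUI functions, here is a minimal, self-contained capture-and-record loop; the camera index, file name and drawn rectangle are placeholders, not the actual code of the software.

#include <opencv/cv.h>
#include <opencv/highgui.h>

int main()
{
    CvCapture* capture = cvCaptureFromCAM(0);        // or cvCaptureFromFile("video.avi")
    if (!capture) return 1;

    IplImage* frame = cvQueryFrame(capture);          // first frame, used to get the size
    CvVideoWriter* writer = cvCreateVideoWriter("out.avi", CV_FOURCC('M','J','P','G'),
                                                25, cvGetSize(frame));
    cvNamedWindow("tracking", CV_WINDOW_AUTOSIZE);

    while ((frame = cvQueryFrame(capture)) != NULL)
    {
        // e.g. draw the bounding box of the current target
        cvRectangle(frame, cvPoint(10, 10), cvPoint(100, 100), CV_RGB(255, 0, 0), 2);
        cvWriteFrame(writer, frame);                   // append the frame to the output video
        cvShowImage("tracking", frame);
        if (cvWaitKey(30) == 27) break;                // stop on ESC
    }

    cvReleaseVideoWriter(&writer);
    cvReleaseCapture(&capture);
    return 0;
}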


2 Tracking system (FAYEL Loïc)

The tracking system elaborated for this project works with different states. This choice was made in order to keep a large freedom in the choice of algorithms. Here we explain the two algorithms chosen for this application and the system which controls them.

2.1 Background subtraction

At first, the camera is fixed. We chose to use a background subtraction algorithm, as explained above, to detect a moving target entering the scene.

2.1.1 Principle

To detect a moving target in the frame, we simply subtract a model of the background from the current frame. The background reference image is calculated with a mixture-of-Gaussians algorithm: the background reference is a weighted sum of the previous frames, so the background reference image is regularly updated. The subtraction of this reference image from the current frame returns a binary black and white image which contains the moving target.
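A minimal sketch of this principle, assuming the mixture-of-Gaussians model of OpenCV's auxiliary module accessed through cvUpdateBGStatModel (the function quoted in part 1.3.2), could look as follows; names are illustrative.

#include <opencv/cv.h>
#include <opencv/cvaux.h>     // legacy background statistical models
#include <opencv/highgui.h>

void runBackgroundSubtraction(CvCapture* capture)
{
    IplImage* frame = cvQueryFrame(capture);
    if (!frame) return;

    // The first frame initializes the mixture-of-Gaussians background model.
    CvBGStatModel* bg_model = cvCreateGaussianBGModel(frame);
    cvNamedWindow("foreground mask", CV_WINDOW_AUTOSIZE);
    cvNamedWindow("background model", CV_WINDOW_AUTOSIZE);

    while ((frame = cvQueryFrame(capture)) != NULL)
    {
        cvUpdateBGStatModel(frame, bg_model);          // update the weighted background reference
        // bg_model->foreground: binary mask of moving pixels,
        // bg_model->background: current background reference image.
        cvShowImage("foreground mask", bg_model->foreground);
        cvShowImage("background model", bg_model->background);
        if (cvWaitKey(30) == 27) break;
    }
    cvReleaseBGStatModel(&bg_model);
}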

2.1.2 Background importance

The result of this algorithm depends on the quality of the background reference calculation. Indeed, it must take into account light changes and small movements of background elements, such as trees. We must configure several parameters, such as the update speed and the weight of each new pixel in the update. Figure 11 shows the foreground mask of a bad background model; with this picture, it is easy to understand that a moving object crossing the scene would not be detected, as it would be lost in fixed white blobs which are not supposed to be white.

Fig. 11: bad background model - foreground mask


2.1.3 Implementation and processing results

This algorithm returns a binary image containing the detected moving target: the target is represented as white pixels and the background as black pixels. This image is then processed to detect relevant targets as defined by the technical choices described previously. A blob detector analyzes the different connected components of the image and filters them by area; relevant blobs are then saved. If they remain visible in the central part of the image for several consecutive frames, the background subtraction module saves the target and stops. Figure 12 shows the result of the background subtraction algorithm and of the blob filtering on a video: on the left, the background model; in the middle, the foreground mask; on the right, the real frame with the result of the detection.

Fig. 12: background subtraction - background model, foreground mask and real frame

Thus, we can detect all types of moving objects entering the scene. Figure 13 shows several examples of detection: three persons and two cars.

Fig. 13: examples of target detection - foreground masks


After detection, we just have to select the pixels of the foreground mask in the real frame to obtain an image of the detected moving target. Figure 14 shows some examples.

Fig. 14: example of target detection - real targets

These images are then used by the CamShift algorithm to run the tracking process.

2.2 Camshift

Once a target is detected, the camera follows it. To track the target during the camera motion, we chose to implement the CamShift algorithm.

2.2.1 Meanshift

The MeanShift algorithm consists in searching in a frame for the region whose histogram is closest to the histogram of a given target. MeanShift works with only one color histogram: it takes as parameters the frame in which to search for the target, the target and its histogram. The algorithm then methodically calculates histograms around the position of the target and compares them with the given target's histogram. This algorithm follows tolerance and target speed conditions; for example, the faster the target, the further MeanShift moves away from the previous center of the target.

2.2.2 Camshift

CamShift is finally an application of MeanShift: the algorithm consists in applying several iterations of MeanShift while continuously adapting the size of the search window. The color range must be defined, as well as the maximal number of iterations.


2.2.3 Improvement

In order to improve the robustness of the tracking algorithm, it is possible to limit the color range taken into account. That is why we implemented a calculation of the main color of the target given as argument to the tracking module. It takes the frame where the target was detected and its foreground mask, and calculates a hue histogram only on the pixels belonging to the target. The maximum of this histogram is then taken to define a color range corresponding to the main color of the object (cf. figure 15).

Fig. 15: color range selection

This amounts to selecting only a portion of the HSV color space (cf. figure 16) in the frame to calculate the back-projection.

Fig. 16: Hue/Saturation/Value space

As we make the selection only on the hue component, the software is able to track the object despite variations of the two other components; indeed, the color of an object changes because of light variations or changes of the object orientation, for example. However, black and white appear in this space as special colors which cannot be identified by the hue component alone.
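A minimal sketch of this color-range selection, assuming a one-bin-per-hue histogram and a fixed margin of ±10 around the dominant hue (both values are assumptions, not those of the software):

#include <opencv/cv.h>

// Hue histogram of the target pixels only (mask = foreground mask), and selection
// of a narrow range around the dominant hue.
CvHistogram* mainHueRange(IplImage* frame, IplImage* mask, int* hueMin, int* hueMax)
{
    IplImage* hsv = cvCreateImage(cvGetSize(frame), IPL_DEPTH_8U, 3);
    IplImage* hue = cvCreateImage(cvGetSize(frame), IPL_DEPTH_8U, 1);
    cvCvtColor(frame, hsv, CV_BGR2HSV);
    cvSplit(hsv, hue, 0, 0, 0);

    int histSize = 180;                            // one bin per hue value
    float hueRangeVals[] = {0, 180};
    float* ranges[] = {hueRangeVals};
    CvHistogram* hist = cvCreateHist(1, &histSize, CV_HIST_ARRAY, ranges, 1);
    cvCalcHist(&hue, hist, 0, mask);               // histogram restricted to the target pixels

    float maxVal = 0;
    int   maxIdx = 0;
    cvGetMinMaxHistValue(hist, 0, &maxVal, 0, &maxIdx);   // dominant hue bin
    *hueMin = (maxIdx > 10)  ? maxIdx - 10 : 0;
    *hueMax = (maxIdx < 169) ? maxIdx + 10 : 179;

    cvReleaseImage(&hsv);
    cvReleaseImage(&hue);
    return hist;
}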


2.3 State switcher

In order to manage the different states of the tracking system, we made a class managing the state switching.

2.3.1 State initialization

At first, the state manager creates the states which the system needs and initializes them with the parameters of the frames to be processed. Then it chooses the initial state of the system, the background subtraction module, and activates it. Once the module is started, the state manager runs, for each received frame, the process of the active state. If the module returns a success code, the state manager starts the following state, providing it the needed parameters.

2.3.2 State switching

Once the background subtraction module is started, the process searches for a moving target in the received frames. When a target is found, the module returns a success code to the state manager, which saves the current target and transmits it to the CamShift module. In the same way, when CamShift loses the target, the state manager resets the target and the background reference image and restarts the background subtraction module in order to find a new target to track (cf. figure 17). The state manager permanently holds the last valid target. A minimal code sketch of this mechanism is given after figure 17.

Fig. 17: State switching - tracking state to detection state
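Below is a minimal sketch of this state switching; the two module classes are reduced to stubs, and all names are illustrative rather than those of the actual software.

#include <opencv/cv.h>

struct DetectionModule {                                 // would wrap the background subtraction module (2.1)
    bool process(IplImage*, CvRect*) { return false; }   // true when a relevant target is found
    void reset() {}                                      // reset the background reference
};

struct TrackingModule {                                  // would wrap the CamShift module (2.2)
    void init(IplImage*, CvRect) {}                      // compute the hue histogram of the target
    bool process(IplImage*, CvRect*) { return false; }   // false when the target is lost
};

class StateManager
{
public:
    StateManager() : state(DETECTION) {}

    void processFrame(IplImage* frame)                   // called for every received frame
    {
        if (state == DETECTION)
        {
            if (detector.process(frame, &target))        // background subtraction found a target
            {
                tracker.init(frame, target);             // hand it over to CamShift
                state = TRACKING;
            }
        }
        else if (!tracker.process(frame, &target))       // CamShift lost the target
        {
            detector.reset();                            // reset the background reference
            state = DETECTION;                           // look for a new target
        }
    }

private:
    enum State { DETECTION, TRACKING } state;
    DetectionModule detector;
    TrackingModule  tracker;
    CvRect          target;                              // last valid target
};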


3 Application (COCHET Stéphanie)

In order to run the tracking system, several complementary modules are necessary. Here we detail the communication with the PTZ camera, the system control module and the graphical user interface.

3.1 Communication

The communication tool supplied by AXIS for the PTZ IP camera is a Hyper Text Transfer Protocol Application Programming Interface (HTTP API).

3.1.1 HTTP API «VAPIX»

In order to communicate with the camera, we use the API supplied by AXIS, named "VAPIX"; the version supported by our camera is 2.0. First, we must assign an IP address to the camera on the network in order to access it. Once the IP address is configured, we can control the camera simply with HTTP requests sent to its server. For example, to list the parameters of the camera we must send:

GET axis-cgi/admin/param.cgi?action=list HTTP/1.1
Host: 10.0.77.33
Connection: keep-alive

Some requests require operator or administrator rights. With this tool we can control the movement of the camera and its pan, tilt and zoom (PTZ) parameters by sending the adapted request. However, these requests require identification and a cookie, ptz_ctl_id, which is returned by the camera server in response to the first PTZ request and must be sent with all following requests. If the cookie is valid, the camera carries out the request and replies to the client with the status code 204 when the movement was done. For example, here is the request to move the camera to the left:

GET axis-cgi/com/ptz.cgi?camera=1&move=left HTTP/1.1
Host: 10.0.77.33
Authorization: root=xxxxxx
Connection: keep-alive
Cookie: ptz_ctl_id=41755

In the same way, to access the video stream, we send the following request. The server response contains the video stream as a multipart/x-mixed-replace object and the HTTP status code "200 OK" if the request succeeds.

GET axis-cgi/mjpg/video.cgi?camera=1&resolution=320x240 HTTP/1.1
Host: 10.0.77.33
Authorization: root=xxxxxx
Connection: keep-alive


3.1.2 HTTP client

In order to enable our program to send data to and receive data from the camera, we wrote an HTTP client using the C++ library "POCO" [3]. The httpClient class adapts the needs of the tracking program and translates them into requests comprehensible by the camera interface. httpClient contains methods to translate and send adapted HTTP requests to the AXIS server; it also has methods to authenticate requests and obtain cookies for PTZ control requests. This HTTP client is initialized with a camera IP address or a server name; this address is saved and used to send all following requests. In this class we included several methods covering the possibilities offered by the VAPIX API:

- send any request,
- send a request with a list of parameters,
- send a request requiring cookies,
- move the camera to the left, right, up or down,
- center the camera on a point (x,y),
- center the camera and zoom on a point (x,y) with a zoom factor z,
- save the current frame with the default resolution,
- save the current frame with a given resolution,
- access the current video stream with the default resolution,
- access the current video stream with a given resolution,
- save a favored position,
- go to the favored position.

This class does not depend on the rest of the system and can be used by any application using an AXIS camera with VAPIX 2.0. To reduce the computation time of the system, we set the default resolution of the video stream to 320x240 pixels.
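As an illustration, a minimal PoCo-based request following the VAPIX example of section 3.1.1 could look like the sketch below; the port and error handling are placeholders, and the authentication and ptz_ctl_id cookie handling of the real httpClient class are omitted.

#include <istream>
#include <string>
#include <Poco/Net/HTTPClientSession.h>
#include <Poco/Net/HTTPRequest.h>
#include <Poco/Net/HTTPResponse.h>
#include <Poco/NullStream.h>
#include <Poco/StreamCopier.h>

// Send "move=left" to the camera and report whether the server answered 204 (movement done).
bool moveLeft(const std::string& cameraIp)
{
    Poco::Net::HTTPClientSession session(cameraIp, 80);
    Poco::Net::HTTPRequest request(Poco::Net::HTTPRequest::HTTP_GET,
                                   "/axis-cgi/com/ptz.cgi?camera=1&move=left",
                                   Poco::Net::HTTPMessage::HTTP_1_1);
    request.setKeepAlive(true);
    // The real client also adds the Authorization header and the ptz_ctl_id cookie here.
    session.sendRequest(request);

    Poco::Net::HTTPResponse response;
    std::istream& body = session.receiveResponse(response);
    Poco::NullOutputStream devNull;
    Poco::StreamCopier::copyStream(body, devNull);   // drain the (empty) response body
    return response.getStatus() == Poco::Net::HTTPResponse::HTTP_NO_CONTENT;
}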

3.2 Control class

To complete the program, we made a main control class in order to link the position of an IP camera with the result of the video processing, or simply to run the video processing on a video file. Indeed, we can initialize the class either with a camera IP address or server name, or with a video file name. In the case of a video file, the class just runs the state manager and shows the output image in a window. In the case of a camera, the program runs several steps:

- connection to the camera,
- initialization of the camera position,
- access to the video stream,
- creation of the visualization window,
- run of the video processing,
- calculation of the pan, tilt and zoom parameters,
- camera movement,
- display.


This class respects the technical choices; it links the state manager and the camera. Indeed, if the state manager signals that the target is lost, the camera returns to its initial position and the state manager restarts the background subtraction module until a new target is found. Conversely, when the CamShift tracking module is active, the control class retrieves, for each frame, the target coordinates held by the state manager class and asks the camera for the adapted movement through the HTTP client. This class also displays the result of the video processing and the camera movement.
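As an illustration of how the control class can turn the target position into a camera command, here is a small sketch building a camera-centering request (cf. the method list of section 3.1.2) from the target rectangle; the dead-zone threshold and the request form are assumptions, not the values used in the software.

#include <cstdio>
#include <cstdlib>
#include <string>
#include <opencv/cv.h>

// Build the PTZ request path when the target drifts away from the image centre;
// return an empty string when no movement is needed.
std::string ptzRequestFor(CvRect target, CvSize frameSize)
{
    int cx = target.x + target.width  / 2;       // centre of the target in the frame
    int cy = target.y + target.height / 2;

    if (std::abs(cx - frameSize.width  / 2) < frameSize.width  / 8 &&
        std::abs(cy - frameSize.height / 2) < frameSize.height / 8)
        return "";                               // target close enough to the centre

    char path[128];
    std::sprintf(path, "/axis-cgi/com/ptz.cgi?camera=1&center=%d,%d", cx, cy);
    return std::string(path);
}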

3.3 Graphical user interface

Finally, to make the software as user friendly as possible, we made a graphical user interface with the following possibilities (cf. figure 18):
- choose the application mode (camera tracking or video file processing),
- start or stop the tracking/video processing,
- record the resulting video and the images of the tracked targets,
- see the results (output image, tracked target, active state),
- configure the position of the camera.

Fig. 18: graphical user interface - starting


This graphical interface was programmed with the C++ library Qt because it is the best C++ library for implementing graphical user interfaces in Ubuntu or Windows environments.

Fig. 19: graphical user interface - camera motion

The graphical user interface makes it possible to move the camera directly (cf. figure 19: a. fixed, b. moving up, and c. moving right). The user can move the camera in all directions and send it to the favored position. He can also choose to record the resulting video stream.

Fig. 20: graphical user interface - detection of a target

Each detected target, its histogram and the color range selected to initialize the CamShift algorithm can be seen in the tabs on the right (cf. figure 20). Each new target opens a new tab in this widget, so the user can visualize all the targets tracked since the start. Finally, the user can start, stop and reset the tracking process with the push buttons below the computer vision and tracking windows: "Start tracking" and "Stop" start and stop the video processing, while "Reset" moves the camera to the favored position and waits for a new start.


Fig. 21: graphical user interface - tracking of a target

The user can visualize the result of the tracking process live in the tracking window, and the result of both modules, background subtraction and CamShift, in the computer vision window (cf. figure 21). It is useful to understand what the computer "sees" and what it is tracking. In this window the user sees the result of the background subtraction module until it finds a relevant moving target (cf. figure 20), then the result of the back projection calculation until the target is lost (cf. figure 21).


4 Tests

In order to evaluate our software, we finally made some tests in different conditions (cf. figure 22):
- in an indoor area with a rigid object (cf. figure 25),
- in an indoor area with a non-rigid object (cf. figure 26),
- in an outdoor area with a rigid object (cf. figure 27),
- in an outdoor area with a non-rigid object (cf. figure 28).

Weather | Indoor, rigid object | Indoor, non-rigid object | Outdoor, rigid object   | Outdoor, non-rigid object
Sunny   | Works correctly      | Works correctly          | Works with difficulties | Works with difficulties
Cloudy  | Works correctly      | Works correctly          | Works correctly         | Works correctly

Fig. 22: experimental results

Each time, we ran the test in two different light conditions: sunny and cloudy weather. Furthermore, we made a single test in an indoor area with colored light. With these experiments we can draw conclusions on the output of our tracking software. Our software correctly tracks all types of moving objects in an indoor area under normal light conditions. Furthermore, it works correctly even if there is some disturbance in the scene, for example if the tracked object is masked by another one for a short time. On the other hand, it has some difficulties tracking moving objects in an outdoor area in sunny weather or in an indoor area with colored light (for example yellow light, cf. figures 23 and 24).

Fig. 23: red object in yellow light – real frame and back-projection


This is caused by the color of the light: the software tracks the main color of an object and cannot differentiate the object color from the light color.

Fig. 24: red object in yellow light – hue histogram and selected color range

In each case, black and white objects cannot be tracked by the software without configuration changes, because the histogram calculation is based only on the hue component of the object color. Indeed, black is the apex of the Hue/Saturation/Value cone and white is the center of its base; these two particular colors cannot be defined with the hue component alone. The software can be parameterized by hand to resolve these two particular cases, by tuning the value and saturation ranges.

Fig. 25: indoor - rigid object


Fig. 26: indoor - non rigid object

Fig. 27: outdoor - rigid object


Fig. 28: outdoor - non rigid object


Conclusion

Finally, our software is functional: it is able to detect and track a moving object. It is based on two algorithms: background subtraction with a mixture of Gaussians, and CamShift, which uses color to track the object. We also added a blob filtering module to refine the detection of a target, and a color filtering module to refine the tracking of a moving target in a moving scene. We made several tests in outdoor and indoor areas and could conclude that our program works correctly in normal light conditions. Furthermore, these tests allowed us to show the limits of the software: it cannot track black or white objects, or objects in a scene with colored light. In addition, the software can confuse two objects of the same color if they are too close. Nevertheless, this software can be considered as general object tracking software. We implemented many tools to demonstrate its capacity; for example, we can choose, at any moment, the object that we want to track.

This internship made it possible for us to discover the research world and to have a first great experience in computer science. We learned to use several useful libraries, such as OpenCV and PoCo, which could help us in our future engineering careers. We also learned to work on our own, and for someone else, in order to complete a precise objective. At the end of our internship, we installed our work in the MIVIA laboratory: we connected the camera to the network of the laboratory and installed the software on its server. Moreover, we created a DVD with our software, our tests and a file explaining precisely how to install and use the software. The members of the MIVIA laboratory were very happy with our work and congratulated us. For the MIVIA laboratory this is important, because they want to continue working on this project and maybe hand it over to other students.


Bibliographical references

[1] AXIS 215 PTZ Network Camera: http://www.axis.com/fr/product/cam_215/
[2] VAPIX, the HTTP-based API for controlling the PTZ camera: http://www.axis.com/files/manuals/VAPIX_3_HTTP_API_3_00.pdf/
[3] The PoCo library: http://pocoproject.org
[4] The OpenCV documentation: http://www.cs.unc.edu/Research/stc/FAQs/OpenCV/OpenCVReferenceManual.pdf


5 Annexes: OpenCV

OpenCV is the most important library of our software, especially because it helps us implement the two main algorithms: background subtraction and CamShift. Here is a description of OpenCV.

5.1 Description

5.1.1 Main functionalities

OpenCV (Open Source Computer Vision) is a library of programming functions for real-time computer vision. OpenCV implements a wide variety of tools for image interpretation. It is compatible with the Image Processing Library (IPL) that implements low-level operations on digital images. Besides primitives such as binarization, filtering, image statistics and pyramids, OpenCV is mostly a high-level library implementing algorithms for calibration techniques (Camera Calibration), feature detection (Feature) and tracking (Optical Flow), shape analysis (Geometry, Contour Processing), motion analysis (Motion Templates, Estimators), 3D reconstruction (View Morphing), and object segmentation and recognition (Histogram, Embedded Hidden Markov Models, Eigen Objects). The essential feature of the library, along with functionality and quality, is performance. The algorithms are based on highly flexible data structures (Dynamic Data Structures) coupled with IPL data structures.

The software provides a set of image processing functions, as well as image and pattern analysis functions. The OpenCV library has a platform-independent interface and is supplied with whole C sources; OpenCV is open. OpenCV is designed to be used together with the Image Processing Library (IPL) and extends the latter's functionality toward image and pattern analysis; therefore, OpenCV shares the same image format (IplImage) with IPL. Also, OpenCV uses Integrated Performance Primitives (IPP) at a lower level, if it can locate the IPP binaries on startup. IPP provides a cross-platform interface to highly optimized low-level functions that perform domain-specific operations, particularly image processing and computer vision primitive operations. IPP exists on multiple platforms including IA32, IA64, and StrongARM, and OpenCV can automatically benefit from using IPP on all these platforms.

There are a few fundamental types OpenCV operates on, and several helper data types that are introduced to make the OpenCV API more simple and uniform. The fundamental data types include array-like types: IplImage (IPL image), CvMat (matrix); growable collections: CvSeq (deque), CvSet, CvGraph; and mixed types: CvHistogram (multi-dimensional histogram). Helper data types include CvPoint (2D point), CvSize (width and height), CvTermCriteria (termination criteria for iterative processes), IplConvKernel (convolution kernel), CvMoments (spatial moments), etc.

There are no return error codes. Instead, there is a global error status that can be set or retrieved via the cvError and cvGetErrStatus functions, respectively. The error handling mechanism is adjustable: for example, it can be specified whether cvError prints out an error message and terminates the program execution afterwards, or just sets an error code while execution continues. The OpenCV software runs on personal computers based on Intel architecture processors running Microsoft Windows 95, Windows 98, Windows 2000, Windows NT, Windows XP, Windows Vista, Windows 7, or Linux (Ubuntu). OpenCV integrates into the customer's application or library written in C or C++. The code and syntax used for function and variable declarations in this manual are written in the ANSI C style; however, versions of OpenCV for different processors or operating systems may, of necessity, vary slightly.

5.1.2 Blob filtering

Blob filtering is an external module of OpenCV; we used this external library to implement the background subtraction algorithm. Blob extraction is an image segmentation technique that categorizes the pixels in an image as belonging to one of many discrete regions. Blob extraction is generally performed on the binary image resulting from a thresholding step. Blobs may be counted, filtered, and tracked. Inconsistent terminology exists for this procedure, including region labeling, connected-component labeling, blob discovery, or region extraction. Well-known algorithms exist for accomplishing this, including a sequential algorithm and a recursive algorithm. Blob extraction is related to, but distinct from, blob detection.

The sequential algorithm works as follows. Create a region counter, then scan the image (in the following, scanning is assumed to be done from left to right and from top to bottom):
- For every pixel, check the north and west pixels (for 4-connectivity) or the northeast, north, northwest, and west pixels (for 8-connectivity) against a given region criterion (e.g. an intensity value of 1 in a binary image, or a similar intensity to connected pixels in a gray-scale image).
- If none of the neighbors fits the criterion, assign the pixel the value of the region counter and increment the region counter.
- If only one neighbor fits the criterion, assign the pixel to that region.
- If multiple neighbors match and are all members of the same region, assign the pixel to their region.
- If multiple neighbors match and are members of different regions, assign the pixel to one of the regions (it does not matter which one) and mark all of these regions as equivalent.
Finally, scan the image again, assigning all equivalent regions the same region value.
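In our software the filtering itself is done with the blob library quoted in section 1.3.2 (cvFilterByArea). As an illustration of the same idea with plain OpenCV, the sketch below keeps the largest connected component whose area lies in a given range, using contours; the names and the use of contours are assumptions, not the actual implementation.

#include <cmath>
#include <opencv/cv.h>

// Find the largest contour of the binary foreground mask whose area is in [minArea, maxArea].
// Note: cvFindContours modifies its input image, so a copy of the mask should be passed.
CvSeq* largestBlob(IplImage* mask, double minArea, double maxArea, CvMemStorage* storage)
{
    CvSeq* contours = 0;
    cvFindContours(mask, storage, &contours, sizeof(CvContour),
                   CV_RETR_EXTERNAL, CV_CHAIN_APPROX_SIMPLE);

    CvSeq* best = 0;
    double bestArea = 0;
    for (CvSeq* c = contours; c != 0; c = c->h_next)
    {
        double area = fabs(cvContourArea(c));
        if (area < minArea || area > maxArea)
            continue;                            // too small (noise) or too large (global change)
        if (area > bestArea) { bestArea = area; best = c; }
    }
    return best;                                 // NULL when no relevant blob was found
}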


5.2 Motion analysis algorithms and OpenCV

We decided to use OpenCV especially because it allows the background subtraction and CamShift algorithms to be implemented easily. These two algorithms are explained in the OpenCV documentation.

5.2.1 Background subtraction

This section describes basic functions that enable building a statistical model of the background for its further subtraction; it is extracted from the OpenCV Reference Manual [4].

In this chapter, the term "background" stands for a set of motionless image pixels, that is, pixels that do not belong to any object moving in front of the camera. This definition can vary if considered in other techniques of object extraction; for example, if a depth map of the scene is obtained, the background can be determined as the parts of the scene that are located far enough from the camera.

The simplest background model assumes that the brightness of every background pixel varies independently, according to a normal distribution. The background characteristics can be calculated by accumulating several dozen frames, as well as their squares; that means finding the sum of pixel values S(x,y) and the sum of squares of the values Sq(x,y) for every pixel location. Then the mean is calculated as m(x,y) = S(x,y)/N, where N is the number of frames collected, and the standard deviation as sigma(x,y) = sqrt(Sq(x,y)/N - (S(x,y)/N)^2). After that, a pixel p(x,y) in a certain frame is regarded as belonging to a moving object if the condition |p(x,y) - m(x,y)| > C*sigma(x,y) is met, where C is a certain constant; if C is equal to 3, this is the well-known "three sigmas" rule. To obtain this background model, any objects should be put away from the camera for a few seconds, so that the whole image from the camera represents subsequent background observation.

The above technique can be improved. First, it is reasonable to provide adaptation of the background differencing model to changes of lighting conditions and background scenes, e.g., when the camera moves or some object passes behind a foreground object.


The simple accumulation used to calculate the mean brightness can be replaced with a running average. Also, several techniques can be used to identify moving parts of the scene and exclude them during background information accumulation; these techniques include change detection, e.g., via cvAbsDiff with cvThreshold, optical flow and, probably, others. OpenCV provides simple basic functions to implement this algorithm:
- Acc(): adds a new image to the accumulating sum,
- SquareAcc(): calculates the square of the source image and adds it to the destination image,
- MultiplyAcc(): calculates the product of two input images and adds it to the destination image,
- RunningAvg(): calculates the weighted sum of two images.

These functions from the Motion Analysis and Object Tracking section of the reference are simply the basic functions for background information accumulation, and they cannot make up a complete background differencing module on their own. That is why we use a non-official module of OpenCV, a function implemented in the auxiliary part of OpenCV:

cvUpdateBGStatModel(tmp_frame, bg_model);
This function is used in the background subtraction module to detect an object in the video. It updates a background subtraction model containing the foreground mask and the background reference image. Updating the background model is necessary to cope with many background problems, such as gradual changes of brightness.
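For comparison, a minimal sketch of the simple adaptive model described above (running average plus frame differencing), using cvRunningAvg, cvAbsDiff and cvThreshold, is given below; the learning rate and threshold are illustrative values, and this is not the mixture-of-Gaussians model actually used by our software.

#include <opencv/cv.h>

// backgroundF must be a 32-bit float, single-channel accumulator of the same size as the frame;
// foregroundMask is an 8-bit, single-channel output image.
void simpleBackgroundSubtraction(IplImage* frame, IplImage* backgroundF, IplImage* foregroundMask)
{
    IplImage* gray = cvCreateImage(cvGetSize(frame), IPL_DEPTH_8U, 1);
    IplImage* bg8u = cvCreateImage(cvGetSize(frame), IPL_DEPTH_8U, 1);
    IplImage* diff = cvCreateImage(cvGetSize(frame), IPL_DEPTH_8U, 1);

    cvCvtColor(frame, gray, CV_BGR2GRAY);

    cvRunningAvg(gray, backgroundF, 0.05);      // update the running-average background
    cvConvertScale(backgroundF, bg8u);          // back to 8 bits for the difference

    cvAbsDiff(gray, bg8u, diff);                // |current frame - background|
    cvThreshold(diff, foregroundMask, 30, 255, CV_THRESH_BINARY);   // fixed "C sigma"-style threshold

    cvReleaseImage(&gray);
    cvReleaseImage(&bg8u);
    cvReleaseImage(&diff);
}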

5.2.2 Camshift

This part is extracted from the OpenCV Reference Manual [4]. CamShift stands for the "Continuously Adaptive Mean-SHIFT" algorithm; figure 29 summarizes it. For each video frame, the raw image is converted to a color probability distribution image via a color histogram model of the color being tracked, e.g., flesh color in the case of face tracking. The center and size of the color object are found via the CamShift algorithm operating on the color probability image. The current size and location of the tracked object are reported and used to set the size and location of the search window in the next video image. The process is then repeated for continuous tracking. The algorithm is a generalization of the Mean Shift algorithm, highlighted in gray in figure 29.


Fig. 29: Block diagram of the CamShift algorithm

CamShift operates on a 2D color probability distribution image produced by histogram back-projection (see the section on Histogram in Image Analysis). The core part of the CamShift algorithm is the Mean Shift algorithm. The Mean Shift part of the algorithm (gray area in figure 29) is as follows:
1. Choose the search window size.
2. Choose the initial location of the search window.
3. Compute the mean location in the search window.
4. Center the search window at the mean location computed in Step 3.
5. Repeat Steps 3 and 4 until the search window center converges, i.e., until it has moved by a distance less than the preset threshold.
Unlike the Mean Shift algorithm, which is designed for static distributions, CamShift is designed for dynamically changing distributions. These occur when objects in video sequences are being tracked and the object moves so that the size and location of the probability distribution change over time. The CamShift algorithm adjusts the search window size in the course of its operation. The initial window size can be set to any reasonable value; for discrete distributions (digital data), the minimum window length or width is three. Instead of a set, or externally adapted, window size, CamShift relies on the zeroth moment information, extracted as part of the internal workings of the algorithm, to continuously adapt its window size within or over each video frame.


CamShift algorithm:
1. Set the calculation region of the probability distribution to the whole image.
2. Choose the initial location of the 2D mean shift search window.
3. Calculate the color probability distribution in the 2D region centered at the search window location, in an ROI slightly larger than the mean shift window size.
4. Run the Mean Shift algorithm to find the search window center. Store the zeroth moment (area or size) and the center location.
5. For the next video frame, center the search window at the mean location stored in Step 4 and set the window size to a function of the zeroth moment found there. Go to Step 3.
Figure 30 shows CamShift finding the face center on a 1D slice through a face and hand flesh hue distribution. Figure 31 shows the next frame, when the face and hand flesh hue distribution has moved, and convergence is reached in two iterations.

Fig. 30: Cross section of flesh hue distribution

The rectangular CamShift window is shown behind the hue distribution, while the triangle in front marks the window center. CamShift is shown iterating to convergence down the left then the right columns.


Fig. 31: Flesh hue distribution (next frame)

Starting from the converged search location at the bottom right of figure 30, CamShift converges on the new center of the distribution in two iterations.
