On November 8, as part of the Pandemic Media Space project, media artist Georgiy Potopalskiy (alias Ujif_notfound) held the master class “Application of algorithms in media art”. The musician explained, step by step, how to build a patch in Max/MSP that converts the movements of one or more objects in a webcam video into sound events. Georgiy Potopalskiy used this method in practice in Car-Beat (2018); its visual representation can be seen in Practice of Strings (2011):
Practice of Strings: live set 2012 from Ujif_Notfound on Vimeo.
Algorithm description:
Gray patch cords transmit messages with numbers; green ones carry the video.
- First, build the block of objects and operations at the top left (a conceptual code sketch follows this part of the list):
– toggle (the “X” box) turns the metronome on and off and displays its status
– qmetro 33 is a metronome with a 33-millisecond interval between clicks (roughly 30 frames per second); it drives the whole calculation process
– jit.grab connects the webcam; the open and close messages attached to it turn the webcam on and off, respectively
– jit.resamp @xscale -1. mirrors the webcam image
– jit.rgb2luma converts the webcam image from color to grayscale (luminance)
– jit.matrix 1 float32 127 480 sets the matrix size: 127 cells on the X-axis and 480 on the Y-axis, where the X-axis maps to the pitch scale (MIDI pitch values run from 0 to 127)
– the t l l object and the jit.op @op absdiff operation, connected by two cords, subtract the previous frame from each new frame, so that the algorithm captures movement as an outline, the contour of the image
– the jit.op @op > @val 0.1 and jit.slide @slide_down 2 operations set the threshold that regulates the sensitivity of the motion-detection system (the higher the threshold, the more pixels of the image are eliminated); the values in both operations can be changed
– the number in the numberbox attached to jit.op @op > @val 0.1 (the screenshot shows the value 0.097) depends on the lighting; only image values greater than this number pass through the algorithm
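This first stage can be sketched in ordinary code. Below is a minimal conceptual illustration in Python with OpenCV and NumPy (these libraries are stand-ins, not what the patch uses; the actual patch is built from the Jitter objects listed above, and the jit.slide smoothing step is omitted here for brevity):

    # Conceptual Python sketch of the top-left block: grab a frame about
    # 30 times per second, mirror it, convert it to grayscale, resize it
    # to 127 x 480, subtract the previous frame, and threshold the result,
    # as the qmetro 33 / jit.grab / jit.resamp / jit.rgb2luma /
    # jit.op @op absdiff / jit.op @op > chain does in the patch.
    import cv2
    import numpy as np

    cap = cv2.VideoCapture(0)      # like the "open" message to jit.grab
    prev = None
    threshold = 0.097              # lighting-dependent, as in the numberbox

    while True:
        ok, frame = cap.read()     # one frame per metronome tick
        if not ok:
            break
        frame = cv2.flip(frame, 1)                    # jit.resamp @xscale -1.
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        gray = cv2.resize(gray, (127, 480)).astype(np.float32) / 255.0
        if prev is not None:
            diff = cv2.absdiff(gray, prev)            # jit.op @op absdiff
            contour = (diff > threshold).astype(np.float32)
            cv2.imshow("motion contour", contour)     # white = movement
        prev = gray
        if cv2.waitKey(33) & 0xFF == 27:              # ~30 fps; Esc quits
            break

    cap.release()
    cv2.destroyAllWindows()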
- Next, move to the upper right part of the algorithm (a sketch follows this part of the list):
– by adding the objects pak 0 0, clear, setcell $1 $2 val 1., bang, jit.matrix 1 float32 1 480, and jit.matrix 1 float32 127 480 @interp 1, we create a white horizontal line 1 pixel high and 127 pixels long
– the number in the numberbox attached to the pak 0 0 object (the screenshot shows the value 196) determines the vertical position of the white line
– the jit.matrix 1 char 127 @planemap 1 @thru 1 @srcdimstart 0 0 @srcdimend 128 0 @usesrcdim 1 object singles out the white line among all the pixels of the 127 × 480 image and, accordingly, discards every pixel outside the line
– the srcdimstart 0 $1 and srcdimend 126 $1 messages, connected to the object described above, transmit the white line's coordinates to the matrix so that everything above and below it is cut off
– the jit.op @op + operation adds the image matrix and the white-line matrix so they can be displayed together on screen (in order to see their intersection)
– create a video output window (jit.window) and connect both the webcam image and the line image to it
– the jit.op @op * operation multiplies the image matrix by the white-line matrix so that, once an audio generator is connected later, sound events occur only where the white line intersects the object(s) in the image. White pixels carry the value 1 and black pixels the value 0: when black pixels cross the white line, the product is zero (1 × 0 = 0) and no sound event results; when the white contour pixels of the moving object(s) cross the line (1 × 1 = 1), sound events are triggered.
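As a rough illustration of this part, here is a short NumPy sketch (the names line_mask, contour, and hits are illustrative, not taken from the patch): build a 480 × 127 matrix containing a single white row, add it to the motion matrix for display, and multiply the two matrices so that only the crossings survive.

    # Conceptual NumPy sketch of the upper-right block: a white horizontal
    # line 1 pixel high, positioned by line_y, combined with the motion
    # matrix by addition (for display) and multiplication (for detection).
    import numpy as np

    HEIGHT, WIDTH = 480, 127
    line_y = 196                       # vertical position, as in the numberbox

    def line_mask(y):
        mask = np.zeros((HEIGHT, WIDTH), dtype=np.float32)
        mask[y, :] = 1.0               # the 1-pixel-high, 127-pixel-long line
        return mask

    # contour: the thresholded motion matrix from the previous stage
    contour = np.zeros((HEIGHT, WIDTH), dtype=np.float32)  # placeholder frame

    display = np.clip(contour + line_mask(line_y), 0.0, 1.0)  # jit.op @op +
    hits = contour * line_mask(line_y)                        # jit.op @op *
    active_row = hits[line_y, :]       # 127 values: 1 where the line is crossed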
- The lower part of the algorithm converts the matrix into numerical values (a sketch follows this part of the list):
– the block of jit.iter, + 1, pack 0 0, and set $2 $1 objects converts the matrix into numerical values: it determines the positions of the 1-values produced by the movement of the object(s) in the image (where their white contour crosses the white line) and maps each position on the 127-pixel line to a frequency on the 127-step MIDI scale
– the multislider displays the list of values that appears when the object(s) in the image intersect the white line, and passes these values on at each tick of the qmetro 33 metronome
– the listfunnel object unrolls the list, outputting each element in the format “sequence number / value”
– the unpack 0. 0. object splits each “sequence number / value” pair obtained earlier into individual values
– the following objects (t f f b, > 0., sel 1, t b b, f, f) form a filter that discards zero values, so that only 1-values are processed, and determines their sequence numbers
– the makenote object generates MIDI notes; its inlets are (from left to right): pitch (f), velocity (f), and duration (the screenshot shows the value 250)
– the numberbox sets the duration of the notes
– the noteout object transmits the data (in MIDI format) to external devices, such as a synthesizer or another program
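To make this last stage concrete, here is a hedged Python sketch of the list-to-MIDI step, using the mido library as a stand-in for makenote and noteout (an assumption; the patch sends MIDI directly, and makenote schedules note-offs on its own rather than blocking as the sleep here does):

    # Conceptual Python sketch of the lower block: walk over the 127
    # values on the line, skip zeros, and turn each remaining index into
    # a MIDI pitch, as the jit.iter / listfunnel / unpack / makenote /
    # noteout chain does in the patch.
    import time
    import mido

    out = mido.open_output()           # default MIDI port, like noteout

    def play_row(active_row, velocity=100, duration=0.25):
        # index = pitch, like listfunnel's "sequence number / value" pairs
        pitches = [i for i, v in enumerate(active_row) if v > 0.0]  # like "> 0."
        for pitch in pitches:
            out.send(mido.Message('note_on', note=pitch, velocity=velocity))
        time.sleep(duration)           # 250 ms, as in the duration numberbox
        for pitch in pitches:
            out.send(mido.Message('note_off', note=pitch, velocity=0))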
You can download the algorithm in .maxpat format by following the link.
The master class video: