<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en-CA">
	<id>https://nmsl.cs.sfu.ca/index.php?action=history&amp;feed=atom&amp;title=ACMM08</id>
	<title>ACMM08 - Revision history</title>
	<link rel="self" type="application/atom+xml" href="https://nmsl.cs.sfu.ca/index.php?action=history&amp;feed=atom&amp;title=ACMM08"/>
	<link rel="alternate" type="text/html" href="https://nmsl.cs.sfu.ca/index.php?title=ACMM08&amp;action=history"/>
	<updated>2026-04-08T17:54:56Z</updated>
	<subtitle>Revision history for this page on the wiki</subtitle>
	<generator>MediaWiki 1.35.1</generator>
	<entry>
		<id>https://nmsl.cs.sfu.ca/index.php?title=ACMM08&amp;diff=1854&amp;oldid=prev</id>
		<title>Mbagheri at 00:02, 19 April 2008</title>
		<link rel="alternate" type="text/html" href="https://nmsl.cs.sfu.ca/index.php?title=ACMM08&amp;diff=1854&amp;oldid=prev"/>
		<updated>2008-04-19T00:02:31Z</updated>

		<summary type="html">&lt;p&gt;&lt;/p&gt;
&lt;table class=&quot;diff diff-contentalign-left diff-editfont-monospace&quot; data-mw=&quot;interface&quot;&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;tr class=&quot;diff-title&quot; lang=&quot;en-CA&quot;&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;← Older revision&lt;/td&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;Revision as of 00:02, 19 April 2008&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot; id=&quot;mw-diff-left-l35&quot; &gt;Line 35:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 35:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;The algorithm fails to extract meaningful key frames for surveillance videos especially the last two. As it can be seen from the plots, although there is a significant raise/drop in the distance, no new key frame is extracted. This is semantically correct since there is no new shot. However, we need to address this issue either by parameter tuning or changing the algorithm. Characteristics of surveillance videos can help us tune the algorithm for such applications. For example, since the background in surveillance videos is usually steady, we may want to subtract an average histogram (computed progressively) from all video frames, so small changes can be detected more easily.&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;The algorithm fails to extract meaningful key frames for surveillance videos especially the last two. As it can be seen from the plots, although there is a significant raise/drop in the distance, no new key frame is extracted. This is semantically correct since there is no new shot. However, we need to address this issue either by parameter tuning or changing the algorithm. Characteristics of surveillance videos can help us tune the algorithm for such applications. For example, since the background in surveillance videos is usually steady, we may want to subtract an average histogram (computed progressively) from all video frames, so small changes can be detected more easily.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;−&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;''' Hierarchical Summarization''': From what I understand, the distance of frames are measured to the last key frame so far. This means that we are tracing the distance and when it reaches its maximum we declare a new key frame. The algorithm provides us with root key frames as a starting points. These frames are intuitively the ones that are farthest from each other thus maximizing the ''frame coverage''. In order to add more details to the summary we process each shot individually. Therefore, for an online system, we need to keep all the frames between the two subsequent root key frames (peaks in the plot). From all the frames in &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;this &lt;/del&gt;shot we calculate the maximum distance to the root key frame. &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;We have &lt;/del&gt;the number of frames in the current shot&lt;del class=&quot;diffchange diffchange-inline&quot;&gt;, N&lt;/del&gt;, and the &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;percentage from which &lt;/del&gt;we can calculate the number of frames we need, &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;m&lt;/del&gt;. The &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;distance &lt;/del&gt;is divided &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;by m&lt;/del&gt;+1 and &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;the &lt;/del&gt;frames are &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;taken at equal distances (on y-axis)&lt;/del&gt;. 
This approach maximizes the distance (visual difference) between the &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;new &lt;/del&gt;key frames and the root key frames &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;selected by the algorithm&lt;/del&gt;. Note that this is very different from &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;equal distance &lt;/del&gt;temporal sampling (on x-axis) as it provides more detail for the parts that have more changes. &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;Moreover&lt;/del&gt;, &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;in some sense &lt;/del&gt;it is &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;reflecting &lt;/del&gt;the content progression of the video instead of temporal progression by taking more samples only if there is more visual change. We &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;also &lt;/del&gt;need to come up with &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;an idea of how &lt;/del&gt;to allocate key frame to &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;shot&lt;/del&gt;. Some shots do not need as many frames as others. The distance plot can help us identify these shots &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;corresponding &lt;/del&gt;to flat parts of the plot (e.g. the last segment in foreman).&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;''' Hierarchical Summarization''': From what I understand, the distance of frames are measured to the last key frame so far. This means that we are tracing the distance and when it reaches its maximum we declare a new key frame. The algorithm provides us with root key frames as a starting points. 
These frames are intuitively the ones that are farthest from each other thus maximizing the ''frame coverage''. In order to add more details to the summary we process each shot individually. Therefore, for an online system, we need to keep all the frames between the two subsequent root key frames (peaks in the plot). From all the frames in &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;th &lt;/ins&gt;shot&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;, &lt;/ins&gt;we calculate the maximum distance to the &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;previous &lt;/ins&gt;root key frame. &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;From &lt;/ins&gt;the number of frames in the current shot, and the &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;desired summarization ratio, &lt;/ins&gt;we can calculate the number of &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;extra &lt;/ins&gt;frames we need, &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;M&lt;/ins&gt;. The &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;interval (on y-axis) &lt;/ins&gt;is divided &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;into M&lt;/ins&gt;+1 &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;equal segments &lt;/ins&gt;and &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;M &lt;/ins&gt;frames are &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;selected&lt;/ins&gt;. This approach maximizes the distance (visual difference) between the &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;extra &lt;/ins&gt;key frames and the root key frames. Note that this is very different from &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;uniform &lt;/ins&gt;temporal sampling (&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;equal distances &lt;/ins&gt;on x-axis) as it provides more detail for the parts that have more changes. 
&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;In other words&lt;/ins&gt;, it is &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;reflects &lt;/ins&gt;the content progression of the video instead of temporal progression by taking more samples only if there is more visual change. We need to come up with &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;a technique &lt;/ins&gt;to allocate key frame to &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;shots&lt;/ins&gt;. Some shots do not need as many frames as others. The distance plot can help us identify these shots &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;which correspond &lt;/ins&gt;to flat parts of the plot (e.g. the last segment in foreman).&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;</summary>
		<author><name>Mbagheri</name></author>
	</entry>
	<entry>
		<id>https://nmsl.cs.sfu.ca/index.php?title=ACMM08&amp;diff=1853&amp;oldid=prev</id>
		<title>Mbagheri at 23:57, 18 April 2008</title>
		<link rel="alternate" type="text/html" href="https://nmsl.cs.sfu.ca/index.php?title=ACMM08&amp;diff=1853&amp;oldid=prev"/>
		<updated>2008-04-18T23:57:24Z</updated>

		<summary type="html">&lt;p&gt;&lt;/p&gt;
&lt;table class=&quot;diff diff-contentalign-left diff-editfont-monospace&quot; data-mw=&quot;interface&quot;&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;tr class=&quot;diff-title&quot; lang=&quot;en-CA&quot;&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;← Older revision&lt;/td&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;Revision as of 23:57, 18 April 2008&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot; id=&quot;mw-diff-left-l35&quot; &gt;Line 35:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 35:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;The algorithm fails to extract meaningful key frames for surveillance videos especially the last two. As it can be seen from the plots, although there is a significant raise/drop in the distance, no new key frame is extracted. This is semantically correct since there is no new shot. However, we need to address this issue either by parameter tuning or changing the algorithm. Characteristics of surveillance videos can help us tune the algorithm for such applications. For example, since the background in surveillance videos is usually steady, we may want to subtract an average histogram (computed progressively) from all video frames, so small changes can be detected more easily.&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;The algorithm fails to extract meaningful key frames for surveillance videos especially the last two. As it can be seen from the plots, although there is a significant raise/drop in the distance, no new key frame is extracted. This is semantically correct since there is no new shot. However, we need to address this issue either by parameter tuning or changing the algorithm. Characteristics of surveillance videos can help us tune the algorithm for such applications. For example, since the background in surveillance videos is usually steady, we may want to subtract an average histogram (computed progressively) from all video frames, so small changes can be detected more easily.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;−&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;''' Hierarchical Summarization''': From what I understand, the distance of frames are measured to the last key frame so far. This means that we are tracing the distance and when it reaches its maximum we declare a new key frame. The algorithm provides us with root key frames as a starting points. These frames are intuitively the ones that are farthest from each other thus maximizing the ''frame coverage''. In order to add more details to the summary we process each shot individually. Therefore, for an online system, we need to keep all the frames between the two root key frames (peaks in the plot). From all the frames in this shot we calculate the maximum distance to the root key frame. We have the number of frames in the current shot, N, and the percentage from which we can calculate the number of frames we need, m. The distance is divided by m+1 and the frames are taken at equal distances (on y-axis). This approach maximizes the distance (visual difference) between the new key frames and the root key frames selected by the algorithm. Note that this is very different from equal distance temporal sampling (on x-axis) as it provides more detail for the parts that have more changes. Moreover, in some sense it is reflecting the content progression of the video instead of temporal progression by taking more samples only if there is more visual change. We also need to come up with an idea of how to allocate key frame to shot. Some shots do not need as many frames as others. The distance plot can help us identify these shots corresponding to flat parts of the plot (e.g. 
the last segment in foreman).&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;''' Hierarchical Summarization''': From what I understand, the distance of frames are measured to the last key frame so far. This means that we are tracing the distance and when it reaches its maximum we declare a new key frame. The algorithm provides us with root key frames as a starting points. These frames are intuitively the ones that are farthest from each other thus maximizing the ''frame coverage''. In order to add more details to the summary we process each shot individually. Therefore, for an online system, we need to keep all the frames between the two &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;subsequent &lt;/ins&gt;root key frames (peaks in the plot). From all the frames in this shot we calculate the maximum distance to the root key frame. We have the number of frames in the current shot, N, and the percentage from which we can calculate the number of frames we need, m. The distance is divided by m+1 and the frames are taken at equal distances (on y-axis). This approach maximizes the distance (visual difference) between the new key frames and the root key frames selected by the algorithm. Note that this is very different from equal distance temporal sampling (on x-axis) as it provides more detail for the parts that have more changes. Moreover, in some sense it is reflecting the content progression of the video instead of temporal progression by taking more samples only if there is more visual change. We also need to come up with an idea of how to allocate key frame to shot. Some shots do not need as many frames as others. The distance plot can help us identify these shots corresponding to flat parts of the plot (e.g. 
the last segment in foreman).&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;</summary>
		<author><name>Mbagheri</name></author>
	</entry>
	<entry>
		<id>https://nmsl.cs.sfu.ca/index.php?title=ACMM08&amp;diff=1852&amp;oldid=prev</id>
		<title>Mbagheri at 23:56, 18 April 2008</title>
		<link rel="alternate" type="text/html" href="https://nmsl.cs.sfu.ca/index.php?title=ACMM08&amp;diff=1852&amp;oldid=prev"/>
		<updated>2008-04-18T23:56:01Z</updated>

		<summary type="html">&lt;p&gt;&lt;/p&gt;
&lt;table class=&quot;diff diff-contentalign-left diff-editfont-monospace&quot; data-mw=&quot;interface&quot;&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;tr class=&quot;diff-title&quot; lang=&quot;en-CA&quot;&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;← Older revision&lt;/td&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;Revision as of 23:56, 18 April 2008&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot; id=&quot;mw-diff-left-l35&quot; &gt;Line 35:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 35:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;The algorithm fails to extract meaningful key frames for surveillance videos especially the last two. As it can be seen from the plots, although there is a significant raise/drop in the distance, no new key frame is extracted. This is semantically correct since there is no new shot. However, we need to address this issue either by parameter tuning or changing the algorithm. Characteristics of surveillance videos can help us tune the algorithm for such applications. For example, since the background in surveillance videos is usually steady, we may want to subtract an average histogram (computed progressively) from all video frames, so small changes can be detected more easily.&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;The algorithm fails to extract meaningful key frames for surveillance videos especially the last two. As it can be seen from the plots, although there is a significant raise/drop in the distance, no new key frame is extracted. This is semantically correct since there is no new shot. However, we need to address this issue either by parameter tuning or changing the algorithm. Characteristics of surveillance videos can help us tune the algorithm for such applications. For example, since the background in surveillance videos is usually steady, we may want to subtract an average histogram (computed progressively) from all video frames, so small changes can be detected more easily.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;−&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;''' Hierarchical Summarization''': From what I understand, the distance of frames are measured to the last key frame so far. This means that we are tracing the distance and when it reaches its maximum we declare a new key frame. The algorithm provides &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;are &lt;/del&gt;with root key frames as a starting points. These frames are intuitively the ones that are farthest from each other thus maximizing the ''frame coverage''. In order to add more details to the summary we process each shot individually. Therefore, for an online system, we need to keep all the frames between the two root key frames (peaks in the plot). From all the frames in this shot we calculate the maximum distance to the root key frame. We have the number of frames in the current shot, N, and the percentage from which we can calculate the number of frames we need, m. The distance is divided by m+1 and the frames are taken at equal distances (on y-axis). This approach maximizes the distance (visual difference) between the new key frames and the root key frames selected by the algorithm. Note that this is very different from equal distance temporal sampling (on x-axis) as it provides more detail for the parts that have more changes. Moreover, in some sense it is reflecting the content progression of the video instead of temporal progression by taking more samples only if there is more visual change. We also need to come up with an idea of how to allocate key frame to shot. Some shots do not need as many frames as others. The distance plot can help us identify these shots corresponding to flat parts of the plot (e.g. 
the last segment in foreman).&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;''' Hierarchical Summarization''': From what I understand, the distance of frames are measured to the last key frame so far. This means that we are tracing the distance and when it reaches its maximum we declare a new key frame. The algorithm provides &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;us &lt;/ins&gt;with root key frames as a starting points. These frames are intuitively the ones that are farthest from each other thus maximizing the ''frame coverage''. In order to add more details to the summary we process each shot individually. Therefore, for an online system, we need to keep all the frames between the two root key frames (peaks in the plot). From all the frames in this shot we calculate the maximum distance to the root key frame. We have the number of frames in the current shot, N, and the percentage from which we can calculate the number of frames we need, m. The distance is divided by m+1 and the frames are taken at equal distances (on y-axis). This approach maximizes the distance (visual difference) between the new key frames and the root key frames selected by the algorithm. Note that this is very different from equal distance temporal sampling (on x-axis) as it provides more detail for the parts that have more changes. Moreover, in some sense it is reflecting the content progression of the video instead of temporal progression by taking more samples only if there is more visual change. We also need to come up with an idea of how to allocate key frame to shot. Some shots do not need as many frames as others. The distance plot can help us identify these shots corresponding to flat parts of the plot (e.g. 
the last segment in foreman).&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;</summary>
		<author><name>Mbagheri</name></author>
	</entry>
	<entry>
		<id>https://nmsl.cs.sfu.ca/index.php?title=ACMM08&amp;diff=1851&amp;oldid=prev</id>
		<title>Mbagheri at 23:54, 18 April 2008</title>
		<link rel="alternate" type="text/html" href="https://nmsl.cs.sfu.ca/index.php?title=ACMM08&amp;diff=1851&amp;oldid=prev"/>
		<updated>2008-04-18T23:54:59Z</updated>

		<summary type="html">&lt;p&gt;&lt;/p&gt;
&lt;table class=&quot;diff diff-contentalign-left diff-editfont-monospace&quot; data-mw=&quot;interface&quot;&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;tr class=&quot;diff-title&quot; lang=&quot;en-CA&quot;&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;← Older revision&lt;/td&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;Revision as of 23:54, 18 April 2008&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot; id=&quot;mw-diff-left-l33&quot; &gt;Line 33:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 33:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;As it can be seen from the extracted summaries, the algorithm succeeds in detecting shots. For videos that contain only one shot, only one key frame is extracted such as city and ice sequences. While for multi shot videos several frames are extracted. A good example here is the doc_reality sequence provided by CBC that has a lot of shots. In addition, since we are using all three color elements (HSV), changes in the brightness are detected e.g. the first fews key frames in doc_reality. We may want to change this for surveillance applications and use only H and S.&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;As it can be seen from the extracted summaries, the algorithm succeeds in detecting shots. For videos that contain only one shot, only one key frame is extracted such as city and ice sequences. While for multi shot videos several frames are extracted. A good example here is the doc_reality sequence provided by CBC that has a lot of shots. In addition, since we are using all three color elements (HSV), changes in the brightness are detected e.g. the first fews key frames in doc_reality. We may want to change this for surveillance applications and use only H and S.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;−&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt; &lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;The algorithm fails to extract meaningful key frames for surveillance videos especially the last two. As it can be seen from the plots, although there is a significant raise&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;/&lt;/ins&gt;drop in the distance, no new &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;key frame &lt;/ins&gt;is &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;extracted. This &lt;/ins&gt;is semantically correct &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;since there &lt;/ins&gt;is &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;no new &lt;/ins&gt;shot. However, we need to &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;address this issue &lt;/ins&gt;either &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;by parameter tuning &lt;/ins&gt;or &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;changing &lt;/ins&gt;the algorithm. &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;Characteristics of surveillance videos can help us tune &lt;/ins&gt;the algorithm for such &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;applications&lt;/ins&gt;. 
&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;For example&lt;/ins&gt;, since the background in surveillance videos &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;is &lt;/ins&gt;usually steady, we may want to subtract an average histogram (computed progressively) from all video frames, so small changes can be detected more easily.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;−&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;The algorithm fails to extract meaningful key frames for surveillance videos especially the last two. As it can be seen from the plots, although there is a significant raise &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;or &lt;/del&gt;drop in the distance, no new &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;shot &lt;/del&gt;is &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;detected which &lt;/del&gt;is semantically correct &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;because the whole video &lt;/del&gt;is &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;one &lt;/del&gt;shot. However, we need to either &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;tune the parameters &lt;/del&gt;or &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;change &lt;/del&gt;the algorithm &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;to take key frames even when there is no change in the shot&lt;/del&gt;. &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;This could mean there is a need for tuning &lt;/del&gt;the algorithm for such &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;application&lt;/del&gt;. &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;In addition&lt;/del&gt;, since the background in surveillance videos &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;are &lt;/del&gt;usually steady, we may want to subtract an average histogram (computed progressively) from all video frames, so small changes can be detected more easily.&lt;/div&gt;&lt;/td&gt;&lt;td colspan=&quot;2&quot;&gt; &lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;''' Hierarchical Summarization''': From what I understand, the distance of frames are measured to the last key frame so far. This means that we are tracing the distance and when it reaches its maximum we declare a new key frame. The algorithm provides are with root key frames as a starting points. These frames are intuitively the ones that are farthest from each other thus maximizing the ''frame coverage''. In order to add more details to the summary we process each shot individually. Therefore, for an online system, we need to keep all the frames between the two root key frames (peaks in the plot). From all the frames in this shot we calculate the maximum distance to the root key frame. We have the number of frames in the current shot, N, and the percentage from which we can calculate the number of frames we need, m. The distance is divided by m+1 and the frames are taken at equal distances (on y-axis). This approach maximizes the distance (visual difference) between the new key frames and the root key frames selected by the algorithm. Note that this is very different from equal distance temporal sampling (on x-axis) as it provides more detail for the parts that have more changes. Moreover, in some sense it is reflecting the content progression of the video instead of temporal progression by taking more samples only if there is more visual change. We also need to come up with an idea of how to allocate key frame to shot. Some shots do not need as many frames as others. The distance plot can help us identify these shots corresponding to flat parts of the plot (e.g. 
the last segment in foreman).&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;''' Hierarchical Summarization''': From what I understand, the distance of frames are measured to the last key frame so far. This means that we are tracing the distance and when it reaches its maximum we declare a new key frame. The algorithm provides are with root key frames as a starting points. These frames are intuitively the ones that are farthest from each other thus maximizing the ''frame coverage''. In order to add more details to the summary we process each shot individually. Therefore, for an online system, we need to keep all the frames between the two root key frames (peaks in the plot). From all the frames in this shot we calculate the maximum distance to the root key frame. We have the number of frames in the current shot, N, and the percentage from which we can calculate the number of frames we need, m. The distance is divided by m+1 and the frames are taken at equal distances (on y-axis). This approach maximizes the distance (visual difference) between the new key frames and the root key frames selected by the algorithm. Note that this is very different from equal distance temporal sampling (on x-axis) as it provides more detail for the parts that have more changes. Moreover, in some sense it is reflecting the content progression of the video instead of temporal progression by taking more samples only if there is more visual change. We also need to come up with an idea of how to allocate key frame to shot. Some shots do not need as many frames as others. The distance plot can help us identify these shots corresponding to flat parts of the plot (e.g. the last segment in foreman).&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;</summary>
		<author><name>Mbagheri</name></author>
	</entry>
	<entry>
		<id>https://nmsl.cs.sfu.ca/index.php?title=ACMM08&amp;diff=1850&amp;oldid=prev</id>
		<title>Mbagheri at 23:52, 18 April 2008</title>
		<link rel="alternate" type="text/html" href="https://nmsl.cs.sfu.ca/index.php?title=ACMM08&amp;diff=1850&amp;oldid=prev"/>
		<updated>2008-04-18T23:52:22Z</updated>

		<summary type="html">&lt;p&gt;&lt;/p&gt;
&lt;table class=&quot;diff diff-contentalign-left diff-editfont-monospace&quot; data-mw=&quot;interface&quot;&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;tr class=&quot;diff-title&quot; lang=&quot;en-CA&quot;&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;← Older revision&lt;/td&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;Revision as of 23:52, 18 April 2008&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot; id=&quot;mw-diff-left-l31&quot; &gt;Line 31:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 31:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&amp;lt;/table&amp;gt;&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&amp;lt;/table&amp;gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;−&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;As it can be seen from the extracted summaries, the algorithm succeeds in detecting shots. A good example here is the doc_reality sequence provided by CBC that has a lot of shots. In addition, since we are using all three color elements (HSV), changes in the brightness are detected e.g. the first fews key frames in doc_reality. We may want to change this for surveillance applications and use only H and S.&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;As it can be seen from the extracted summaries, the algorithm succeeds in detecting shots&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;. For videos that contain only one shot, only one key frame is extracted such as city and ice sequences. While for multi shot videos several frames are extracted&lt;/ins&gt;. A good example here is the doc_reality sequence provided by CBC that has a lot of shots. In addition, since we are using all three color elements (HSV), changes in the brightness are detected e.g. the first fews key frames in doc_reality. We may want to change this for surveillance applications and use only H and S.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt; &lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt; &lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;−&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;For videos that contain only one shot, only one key frame is extracted such as city and ice sequences.&lt;/del&gt;&lt;/div&gt;&lt;/td&gt;&lt;td colspan=&quot;2&quot;&gt; &lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;The algorithm fails to extract meaningful key frames for surveillance videos especially the last two. As it can be seen from the plots, although there is a significant raise or drop in the distance, no new shot is detected which is semantically correct because the whole video is one shot. However, we need to either tune the parameters or change the algorithm to take key frames even when there is no change in the shot. This could mean there is a need for tuning the algorithm for such application. In addition, since the background in surveillance videos are usually steady, we may want to subtract an average histogram (computed progressively) from all video frames, so small changes can be detected more easily.&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;The algorithm fails to extract meaningful key frames for surveillance videos especially the last two. As it can be seen from the plots, although there is a significant raise or drop in the distance, no new shot is detected which is semantically correct because the whole video is one shot. However, we need to either tune the parameters or change the algorithm to take key frames even when there is no change in the shot. This could mean there is a need for tuning the algorithm for such application. 
In addition, since the background in surveillance videos are usually steady, we may want to subtract an average histogram (computed progressively) from all video frames, so small changes can be detected more easily.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;''' Hierarchical Summarization''': From what I understand, the distance of frames are measured to the last key frame so far. This means that we are tracing the distance and when it reaches its maximum we declare a new key frame. The algorithm provides are with root key frames as a starting points. These frames are intuitively the ones that are farthest from each other thus maximizing the ''frame coverage''. In order to add more details to the summary we process each shot individually. Therefore, for an online system, we need to keep all the frames between the two root key frames (peaks in the plot). From all the frames in this shot we calculate the maximum distance to the root key frame. We have the number of frames in the current shot, N, and the percentage from which we can calculate the number of frames we need, m. The distance is divided by m+1 and the frames are taken at equal distances (on y-axis). This approach maximizes the distance (visual difference) between the new key frames and the root key frames selected by the algorithm. Note that this is very different from equal distance temporal sampling (on x-axis) as it provides more detail for the parts that have more changes. Moreover, in some sense it is reflecting the content progression of the video instead of temporal progression by taking more samples only if there is more visual change. We also need to come up with an idea of how to allocate key frame to shot. Some shots do not need as many frames as others. The distance plot can help us identify these shots corresponding to flat parts of the plot (e.g. 
the last segment in foreman).&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;''' Hierarchical Summarization''': From what I understand, the distance of frames are measured to the last key frame so far. This means that we are tracing the distance and when it reaches its maximum we declare a new key frame. The algorithm provides are with root key frames as a starting points. These frames are intuitively the ones that are farthest from each other thus maximizing the ''frame coverage''. In order to add more details to the summary we process each shot individually. Therefore, for an online system, we need to keep all the frames between the two root key frames (peaks in the plot). From all the frames in this shot we calculate the maximum distance to the root key frame. We have the number of frames in the current shot, N, and the percentage from which we can calculate the number of frames we need, m. The distance is divided by m+1 and the frames are taken at equal distances (on y-axis). This approach maximizes the distance (visual difference) between the new key frames and the root key frames selected by the algorithm. Note that this is very different from equal distance temporal sampling (on x-axis) as it provides more detail for the parts that have more changes. Moreover, in some sense it is reflecting the content progression of the video instead of temporal progression by taking more samples only if there is more visual change. We also need to come up with an idea of how to allocate key frame to shot. Some shots do not need as many frames as others. The distance plot can help us identify these shots corresponding to flat parts of the plot (e.g. the last segment in foreman).&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;</summary>
		<author><name>Mbagheri</name></author>
	</entry>
	<entry>
		<id>https://nmsl.cs.sfu.ca/index.php?title=ACMM08&amp;diff=1849&amp;oldid=prev</id>
		<title>Mbagheri at 23:51, 18 April 2008</title>
		<link rel="alternate" type="text/html" href="https://nmsl.cs.sfu.ca/index.php?title=ACMM08&amp;diff=1849&amp;oldid=prev"/>
		<updated>2008-04-18T23:51:35Z</updated>

		<summary type="html">&lt;p&gt;&lt;/p&gt;
&lt;table class=&quot;diff diff-contentalign-left diff-editfont-monospace&quot; data-mw=&quot;interface&quot;&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;tr class=&quot;diff-title&quot; lang=&quot;en-CA&quot;&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;← Older revision&lt;/td&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;Revision as of 23:51, 18 April 2008&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot; id=&quot;mw-diff-left-l33&quot; &gt;Line 33:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 33:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;As it can be seen from the extracted summaries, the algorithm succeeds in detecting shots. A good example here is the doc_reality sequence provided by CBC that has a lot of shots. In addition, since we are using all three color elements (HSV), changes in the brightness are detected e.g. the first fews key frames in doc_reality. We may want to change this for surveillance applications and use only H and S.&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;As it can be seen from the extracted summaries, the algorithm succeeds in detecting shots. A good example here is the doc_reality sequence provided by CBC that has a lot of shots. In addition, since we are using all three color elements (HSV), changes in the brightness are detected e.g. the first fews key frames in doc_reality. We may want to change this for surveillance applications and use only H and S.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;−&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;For videos that only &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;contain &lt;/del&gt;one shot &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;without much camera motion&lt;/del&gt;, only one key frame is extracted such as city and ice sequences.&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;For videos that &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;contain &lt;/ins&gt;only one shot, only one key frame is extracted such as city and ice sequences.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;The algorithm fails to extract meaningful key frames for surveillance videos especially the last two. As it can be seen from the plots, although there is a significant raise or drop in the distance, no new shot is detected which is semantically correct because the whole video is one shot. However, we need to either tune the parameters or change the algorithm to take key frames even when there is no change in the shot. This could mean there is a need for tuning the algorithm for such application. In addition, since the background in surveillance videos are usually steady, we may want to subtract an average histogram (computed progressively) from all video frames, so small changes can be detected more easily.&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;The algorithm fails to extract meaningful key frames for surveillance videos especially the last two. As it can be seen from the plots, although there is a significant raise or drop in the distance, no new shot is detected which is semantically correct because the whole video is one shot. However, we need to either tune the parameters or change the algorithm to take key frames even when there is no change in the shot. This could mean there is a need for tuning the algorithm for such application. 
In addition, since the background in surveillance videos are usually steady, we may want to subtract an average histogram (computed progressively) from all video frames, so small changes can be detected more easily.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;''' Hierarchical Summarization''': From what I understand, the distance of frames are measured to the last key frame so far. This means that we are tracing the distance and when it reaches its maximum we declare a new key frame. The algorithm provides are with root key frames as a starting points. These frames are intuitively the ones that are farthest from each other thus maximizing the ''frame coverage''. In order to add more details to the summary we process each shot individually. Therefore, for an online system, we need to keep all the frames between the two root key frames (peaks in the plot). From all the frames in this shot we calculate the maximum distance to the root key frame. We have the number of frames in the current shot, N, and the percentage from which we can calculate the number of frames we need, m. The distance is divided by m+1 and the frames are taken at equal distances (on y-axis). This approach maximizes the distance (visual difference) between the new key frames and the root key frames selected by the algorithm. Note that this is very different from equal distance temporal sampling (on x-axis) as it provides more detail for the parts that have more changes. Moreover, in some sense it is reflecting the content progression of the video instead of temporal progression by taking more samples only if there is more visual change. We also need to come up with an idea of how to allocate key frame to shot. Some shots do not need as many frames as others. The distance plot can help us identify these shots corresponding to flat parts of the plot (e.g. 
the last segment in foreman).&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;''' Hierarchical Summarization''': From what I understand, the distance of frames are measured to the last key frame so far. This means that we are tracing the distance and when it reaches its maximum we declare a new key frame. The algorithm provides are with root key frames as a starting points. These frames are intuitively the ones that are farthest from each other thus maximizing the ''frame coverage''. In order to add more details to the summary we process each shot individually. Therefore, for an online system, we need to keep all the frames between the two root key frames (peaks in the plot). From all the frames in this shot we calculate the maximum distance to the root key frame. We have the number of frames in the current shot, N, and the percentage from which we can calculate the number of frames we need, m. The distance is divided by m+1 and the frames are taken at equal distances (on y-axis). This approach maximizes the distance (visual difference) between the new key frames and the root key frames selected by the algorithm. Note that this is very different from equal distance temporal sampling (on x-axis) as it provides more detail for the parts that have more changes. Moreover, in some sense it is reflecting the content progression of the video instead of temporal progression by taking more samples only if there is more visual change. We also need to come up with an idea of how to allocate key frame to shot. Some shots do not need as many frames as others. The distance plot can help us identify these shots corresponding to flat parts of the plot (e.g. the last segment in foreman).&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;</summary>
		<author><name>Mbagheri</name></author>
	</entry>
	<entry>
		<id>https://nmsl.cs.sfu.ca/index.php?title=ACMM08&amp;diff=1848&amp;oldid=prev</id>
		<title>Mbagheri at 23:51, 18 April 2008</title>
		<link rel="alternate" type="text/html" href="https://nmsl.cs.sfu.ca/index.php?title=ACMM08&amp;diff=1848&amp;oldid=prev"/>
		<updated>2008-04-18T23:51:07Z</updated>

		<summary type="html">&lt;p&gt;&lt;/p&gt;
&lt;table class=&quot;diff diff-contentalign-left diff-editfont-monospace&quot; data-mw=&quot;interface&quot;&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;tr class=&quot;diff-title&quot; lang=&quot;en-CA&quot;&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;← Older revision&lt;/td&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;Revision as of 23:51, 18 April 2008&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot; id=&quot;mw-diff-left-l31&quot; &gt;Line 31:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 31:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&amp;lt;/table&amp;gt;&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&amp;lt;/table&amp;gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;−&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;As can be seen from the extracted summaries, the algorithm succeeds in detecting shots. A good example here is the doc_reality sequence provided by CBC, which has many shots. In addition, since we are using all three color components (HSV), changes in brightness are also detected, e.g. in the first few key frames of doc_reality. We may want to change this for surveillance applications.&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;As can be seen from the extracted summaries, the algorithm succeeds in detecting shots. A good example here is the doc_reality sequence provided by CBC, which has many shots. In addition, since we are using all three color components (HSV), changes in brightness are also detected, e.g. in the first few key frames of doc_reality. We may want to change this for surveillance applications &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;and use only H and S&lt;/ins&gt;.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;For videos that contain only one shot without much camera motion, only one key frame is extracted, as in the city and ice sequences.&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;For videos that contain only one shot without much camera motion, only one key frame is extracted, as in the city and ice sequences.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;</summary>
		<author><name>Mbagheri</name></author>
	</entry>
	<entry>
		<id>https://nmsl.cs.sfu.ca/index.php?title=ACMM08&amp;diff=1847&amp;oldid=prev</id>
		<title>Mbagheri at 23:50, 18 April 2008</title>
		<link rel="alternate" type="text/html" href="https://nmsl.cs.sfu.ca/index.php?title=ACMM08&amp;diff=1847&amp;oldid=prev"/>
		<updated>2008-04-18T23:50:53Z</updated>

		<summary type="html">&lt;p&gt;&lt;/p&gt;
&lt;table class=&quot;diff diff-contentalign-left diff-editfont-monospace&quot; data-mw=&quot;interface&quot;&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;tr class=&quot;diff-title&quot; lang=&quot;en-CA&quot;&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;← Older revision&lt;/td&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;Revision as of 23:50, 18 April 2008&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot; id=&quot;mw-diff-left-l31&quot; &gt;Line 31:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 31:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&amp;lt;/table&amp;gt;&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&amp;lt;/table&amp;gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;−&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;As can be seen from the extracted summaries, the algorithm succeeds in detecting shots. A good example here is the doc_reality sequence provided by CBC, which has many shots. In addition, since we are using all three color components (HSV), changes in brightness are also detected, e.g. in the first few key frames of doc_reality.&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;As can be seen from the extracted summaries, the algorithm succeeds in detecting shots. A good example here is the doc_reality sequence provided by CBC, which has many shots. In addition, since we are using all three color components (HSV), changes in brightness are also detected, e.g. in the first few key frames of doc_reality. &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;We may want to change this for surveillance applications.&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt; &lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt; &lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;For videos that contain only one shot without much camera motion, only one key frame is extracted, as in the city and ice sequences.&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;For videos that contain only one shot without much camera motion, only one key frame is extracted, as in the city and ice sequences.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;The algorithm fails to extract meaningful key frames for surveillance videos, especially the last two. As can be seen from the plots, although there is a significant rise or drop in the distance, no new shot is detected, which is semantically correct because the whole video is one shot. However, we need to either tune the parameters or change the algorithm so that it takes key frames even when there is no shot change. This could mean the algorithm needs tuning for such applications. In addition, since the background in surveillance videos is usually steady, we may want to subtract an average histogram (computed progressively) from all video frames, so small changes can be detected more easily.&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;The algorithm fails to extract meaningful key frames for surveillance videos, especially the last two. As can be seen from the plots, although there is a significant rise or drop in the distance, no new shot is detected, which is semantically correct because the whole video is one shot. However, we need to either tune the parameters or change the algorithm so that it takes key frames even when there is no shot change. This could mean the algorithm needs tuning for such applications. In addition, since the background in surveillance videos is usually steady, we may want to subtract an average histogram (computed progressively) from all video frames, so small changes can be detected more easily.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;''' Hierarchical Summarization''': From what I understand, the distance of each frame is measured to the last key frame so far. This means that we are tracing the distance and, when it reaches its maximum, we declare a new key frame. The algorithm provides us with root key frames as starting points. These frames are intuitively the ones that are farthest from each other, thus maximizing the ''frame coverage''. In order to add more detail to the summary, we process each shot individually. Therefore, for an online system, we need to keep all the frames between two root key frames (peaks in the plot). From all the frames in this shot we calculate the maximum distance to the root key frame. We have the number of frames in the current shot, N, and the percentage from which we can calculate the number of frames we need, m. The distance is divided by m+1 and the frames are taken at equal distances (on the y-axis). This approach maximizes the distance (visual difference) between the new key frames and the root key frames selected by the algorithm. Note that this is very different from equal-distance temporal sampling (on the x-axis), as it provides more detail for the parts that have more changes. Moreover, in some sense it reflects the content progression of the video rather than its temporal progression, by taking more samples only where there is more visual change. We also need to come up with an idea of how to allocate key frames to shots. Some shots do not need as many frames as others. The distance plot can help us identify these shots, which correspond to flat parts of the plot (e.g. the last segment in foreman).&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;''' Hierarchical Summarization''': From what I understand, the distance of each frame is measured to the last key frame so far. This means that we are tracing the distance and, when it reaches its maximum, we declare a new key frame. The algorithm provides us with root key frames as starting points. These frames are intuitively the ones that are farthest from each other, thus maximizing the ''frame coverage''. In order to add more detail to the summary, we process each shot individually. Therefore, for an online system, we need to keep all the frames between two root key frames (peaks in the plot). From all the frames in this shot we calculate the maximum distance to the root key frame. We have the number of frames in the current shot, N, and the percentage from which we can calculate the number of frames we need, m. The distance is divided by m+1 and the frames are taken at equal distances (on the y-axis). This approach maximizes the distance (visual difference) between the new key frames and the root key frames selected by the algorithm. Note that this is very different from equal-distance temporal sampling (on the x-axis), as it provides more detail for the parts that have more changes. Moreover, in some sense it reflects the content progression of the video rather than its temporal progression, by taking more samples only where there is more visual change. We also need to come up with an idea of how to allocate key frames to shots. Some shots do not need as many frames as others. The distance plot can help us identify these shots, which correspond to flat parts of the plot (e.g. the last segment in foreman).&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;</summary>
		<author><name>Mbagheri</name></author>
	</entry>
	<entry>
		<id>https://nmsl.cs.sfu.ca/index.php?title=ACMM08&amp;diff=1846&amp;oldid=prev</id>
		<title>Mbagheri at 23:49, 18 April 2008</title>
		<link rel="alternate" type="text/html" href="https://nmsl.cs.sfu.ca/index.php?title=ACMM08&amp;diff=1846&amp;oldid=prev"/>
		<updated>2008-04-18T23:49:52Z</updated>

		<summary type="html">&lt;p&gt;&lt;/p&gt;
&lt;table class=&quot;diff diff-contentalign-left diff-editfont-monospace&quot; data-mw=&quot;interface&quot;&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;tr class=&quot;diff-title&quot; lang=&quot;en-CA&quot;&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;← Older revision&lt;/td&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;Revision as of 23:49, 18 April 2008&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot; id=&quot;mw-diff-left-l33&quot; &gt;Line 33:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 33:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;As can be seen from the extracted summaries, the algorithm succeeds in detecting shots. A good example here is the doc_reality sequence provided by CBC, which has many shots. In addition, since we are using all three color components (HSV), changes in brightness are also detected, e.g. in the first few key frames of doc_reality.&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;As can be seen from the extracted summaries, the algorithm succeeds in detecting shots. A good example here is the doc_reality sequence provided by CBC, which has many shots. In addition, since we are using all three color components (HSV), changes in brightness are also detected, e.g. in the first few key frames of doc_reality.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;For videos that contain only one shot without much camera motion, only one key frame is extracted, as in the city and ice sequences.&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;For videos that contain only one shot without much camera motion, only one key frame is extracted, as in the city and ice sequences.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;−&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;The algorithm fails to extract meaningful key frames for surveillance videos, especially the last two. This could mean the algorithm needs &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;fine &lt;/del&gt;tuning for such applications. In addition, since the background in surveillance videos is usually steady, we may want to subtract an average histogram (computed progressively) from all video frames, so small changes can be detected more easily.&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;The algorithm fails to extract meaningful key frames for surveillance videos, especially the last two&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;. As can be seen from the plots, although there is a significant rise or drop in the distance, no new shot is detected, which is semantically correct because the whole video is one shot. However, we need to either tune the parameters or change the algorithm so that it takes key frames even when there is no shot change&lt;/ins&gt;. This could mean the algorithm needs tuning for such applications. In addition, since the background in surveillance videos is usually steady, we may want to subtract an average histogram (computed progressively) from all video frames, so small changes can be detected more easily.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;''' Hierarchical Summarization''': From what I understand, the distance of each frame is measured to the last key frame so far. This means that we are tracing the distance and, when it reaches its maximum, we declare a new key frame. The algorithm provides us with root key frames as starting points. These frames are intuitively the ones that are farthest from each other, thus maximizing the ''frame coverage''. In order to add more detail to the summary, we process each shot individually. Therefore, for an online system, we need to keep all the frames between two root key frames (peaks in the plot). From all the frames in this shot we calculate the maximum distance to the root key frame. We have the number of frames in the current shot, N, and the percentage from which we can calculate the number of frames we need, m. The distance is divided by m+1 and the frames are taken at equal distances (on the y-axis). This approach maximizes the distance (visual difference) between the new key frames and the root key frames selected by the algorithm. Note that this is very different from equal-distance temporal sampling (on the x-axis), as it provides more detail for the parts that have more changes. Moreover, in some sense it reflects the content progression of the video rather than its temporal progression, by taking more samples only where there is more visual change. We also need to come up with an idea of how to allocate key frames to shots. Some shots do not need as many frames as others. The distance plot can help us identify these shots, which correspond to flat parts of the plot (e.g. the last segment in foreman).&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;''' Hierarchical Summarization''': From what I understand, the distance of each frame is measured to the last key frame so far. This means that we are tracing the distance and, when it reaches its maximum, we declare a new key frame. The algorithm provides us with root key frames as starting points. These frames are intuitively the ones that are farthest from each other, thus maximizing the ''frame coverage''. In order to add more detail to the summary, we process each shot individually. Therefore, for an online system, we need to keep all the frames between two root key frames (peaks in the plot). From all the frames in this shot we calculate the maximum distance to the root key frame. We have the number of frames in the current shot, N, and the percentage from which we can calculate the number of frames we need, m. The distance is divided by m+1 and the frames are taken at equal distances (on the y-axis). This approach maximizes the distance (visual difference) between the new key frames and the root key frames selected by the algorithm. Note that this is very different from equal-distance temporal sampling (on the x-axis), as it provides more detail for the parts that have more changes. Moreover, in some sense it reflects the content progression of the video rather than its temporal progression, by taking more samples only where there is more visual change. We also need to come up with an idea of how to allocate key frames to shots. Some shots do not need as many frames as others. The distance plot can help us identify these shots, which correspond to flat parts of the plot (e.g. the last segment in foreman).&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;</summary>
		<author><name>Mbagheri</name></author>
	</entry>
	<entry>
		<id>https://nmsl.cs.sfu.ca/index.php?title=ACMM08&amp;diff=1845&amp;oldid=prev</id>
		<title>Mbagheri at 23:37, 18 April 2008</title>
		<link rel="alternate" type="text/html" href="https://nmsl.cs.sfu.ca/index.php?title=ACMM08&amp;diff=1845&amp;oldid=prev"/>
		<updated>2008-04-18T23:37:46Z</updated>

		<summary type="html">&lt;p&gt;&lt;/p&gt;
&lt;table class=&quot;diff diff-contentalign-left diff-editfont-monospace&quot; data-mw=&quot;interface&quot;&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;tr class=&quot;diff-title&quot; lang=&quot;en-CA&quot;&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;← Older revision&lt;/td&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;Revision as of 23:37, 18 April 2008&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot; id=&quot;mw-diff-left-l35&quot; &gt;Line 35:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 35:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;The algorithm fails to extract meaningful key frames for surveillance videos, especially the last two. This could mean the algorithm needs fine tuning for such applications. In addition, since the background in surveillance videos is usually steady, we may want to subtract an average histogram (computed progressively) from all video frames, so small changes can be detected more easily.&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;The algorithm fails to extract meaningful key frames for surveillance videos, especially the last two. This could mean the algorithm needs fine tuning for such applications. In addition, since the background in surveillance videos is usually steady, we may want to subtract an average histogram (computed progressively) from all video frames, so small changes can be detected more easily.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;−&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;''' Hierarchical Summarization''': From what I understand, the distance of each frame is measured to the last key frame so far. This means that we are tracing the distance and, when it reaches its maximum, we declare a new key frame. The algorithm provides us with root key frames as starting points. These frames are intuitively the ones that are farthest from each other, thus maximizing the ''frame coverage''. In order to add more detail to the summary, we process each shot individually. Therefore, for an online system, we need to keep all the frames between two root key frames (peaks in the plot). From all the frames in this shot we calculate the maximum distance to the root key frame &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;KF&lt;/del&gt;. We have the number of frames in the current shot, N, and the percentage from which we can calculate the number of frames we need, m. The distance is divided by m+1 and the frames are taken at equal distances (on the y-axis). This approach maximizes the distance (visual difference) between the new key frames and the root key frames selected by the algorithm. Note that this is very different from equal-distance temporal sampling (on the x-axis), as it provides more detail for the parts that have more changes. Moreover, in some sense it reflects the content progression of the video rather than its temporal progression, by taking more samples only where there is more visual change. We also need to come up with an idea of how to allocate key frames to shots. Some shots do not need as many frames as others. The distance plot can help us identify these shots, which correspond to flat parts of the plot (e.g. the last segment in foreman).&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;''' Hierarchical Summarization''': From what I understand, the distance of each frame is measured to the last key frame so far. This means that we are tracing the distance and, when it reaches its maximum, we declare a new key frame. The algorithm provides us with root key frames as starting points. These frames are intuitively the ones that are farthest from each other, thus maximizing the ''frame coverage''. In order to add more detail to the summary, we process each shot individually. Therefore, for an online system, we need to keep all the frames between two root key frames (peaks in the plot). From all the frames in this shot we calculate the maximum distance to the root key frame. We have the number of frames in the current shot, N, and the percentage from which we can calculate the number of frames we need, m. The distance is divided by m+1 and the frames are taken at equal distances (on the y-axis). This approach maximizes the distance (visual difference) between the new key frames and the root key frames selected by the algorithm. Note that this is very different from equal-distance temporal sampling (on the x-axis), as it provides more detail for the parts that have more changes. Moreover, in some sense it reflects the content progression of the video rather than its temporal progression, by taking more samples only where there is more visual change. We also need to come up with an idea of how to allocate key frames to shots. Some shots do not need as many frames as others. The distance plot can help us identify these shots, which correspond to flat parts of the plot (e.g. the last segment in foreman).&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;</summary>
		<author><name>Mbagheri</name></author>
	</entry>
</feed>