<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en-CA">
	<id>https://nmsl.cs.sfu.ca/index.php?action=history&amp;feed=atom&amp;title=ACMM08</id>
	<title>ACMM08 - Revision history</title>
	<link rel="self" type="application/atom+xml" href="https://nmsl.cs.sfu.ca/index.php?action=history&amp;feed=atom&amp;title=ACMM08"/>
	<link rel="alternate" type="text/html" href="https://nmsl.cs.sfu.ca/index.php?title=ACMM08&amp;action=history"/>
	<updated>2026-04-08T17:54:56Z</updated>
	<subtitle>Revision history for this page on the wiki</subtitle>
	<generator>MediaWiki 1.35.1</generator>
	<entry>
		<id>https://nmsl.cs.sfu.ca/index.php?title=ACMM08&amp;diff=1854&amp;oldid=prev</id>
		<title>Mbagheri at 00:02, 19 April 2008</title>
		<link rel="alternate" type="text/html" href="https://nmsl.cs.sfu.ca/index.php?title=ACMM08&amp;diff=1854&amp;oldid=prev"/>
		<updated>2008-04-19T00:02:31Z</updated>

		<summary type="html">&lt;p&gt;&lt;/p&gt;
&lt;table class=&quot;diff diff-contentalign-left diff-editfont-monospace&quot; data-mw=&quot;interface&quot;&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;tr class=&quot;diff-title&quot; lang=&quot;en-CA&quot;&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;← Older revision&lt;/td&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;Revision as of 00:02, 19 April 2008&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot; id=&quot;mw-diff-left-l35&quot; &gt;Line 35:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 35:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;The algorithm fails to extract meaningful key frames for surveillance videos especially the last two. As it can be seen from the plots, although there is a significant raise/drop in the distance, no new key frame is extracted. This is semantically correct since there is no new shot. However, we need to address this issue either by parameter tuning or changing the algorithm. Characteristics of surveillance videos can help us tune the algorithm for such applications. For example, since the background in surveillance videos is usually steady, we may want to subtract an average histogram (computed progressively) from all video frames, so small changes can be detected more easily.&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;The algorithm fails to extract meaningful key frames for surveillance videos especially the last two. As it can be seen from the plots, although there is a significant raise/drop in the distance, no new key frame is extracted. This is semantically correct since there is no new shot. However, we need to address this issue either by parameter tuning or changing the algorithm. Characteristics of surveillance videos can help us tune the algorithm for such applications. For example, since the background in surveillance videos is usually steady, we may want to subtract an average histogram (computed progressively) from all video frames, so small changes can be detected more easily.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;−&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;''' Hierarchical Summarization''': From what I understand, the distance of frames are measured to the last key frame so far. This means that we are tracing the distance and when it reaches its maximum we declare a new key frame. The algorithm provides us with root key frames as a starting points. These frames are intuitively the ones that are farthest from each other thus maximizing the ''frame coverage''. In order to add more details to the summary we process each shot individually. Therefore, for an online system, we need to keep all the frames between the two subsequent root key frames (peaks in the plot). From all the frames in &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;this &lt;/del&gt;shot we calculate the maximum distance to the root key frame. &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;We have &lt;/del&gt;the number of frames in the current shot&lt;del class=&quot;diffchange diffchange-inline&quot;&gt;, N&lt;/del&gt;, and the &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;percentage from which &lt;/del&gt;we can calculate the number of frames we need, &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;m&lt;/del&gt;. The &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;distance &lt;/del&gt;is divided &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;by m&lt;/del&gt;+1 and &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;the &lt;/del&gt;frames are &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;taken at equal distances (on y-axis)&lt;/del&gt;. 
This approach maximizes the distance (visual difference) between the &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;new &lt;/del&gt;key frames and the root key frames &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;selected by the algorithm&lt;/del&gt;. Note that this is very different from &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;equal distance &lt;/del&gt;temporal sampling (on x-axis) as it provides more detail for the parts that have more changes. &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;Moreover&lt;/del&gt;, &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;in some sense &lt;/del&gt;it is &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;reflecting &lt;/del&gt;the content progression of the video instead of temporal progression by taking more samples only if there is more visual change. We &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;also &lt;/del&gt;need to come up with &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;an idea of how &lt;/del&gt;to allocate key frame to &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;shot&lt;/del&gt;. Some shots do not need as many frames as others. The distance plot can help us identify these shots &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;corresponding &lt;/del&gt;to flat parts of the plot (e.g. the last segment in foreman).&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;''' Hierarchical Summarization''': From what I understand, the distance of frames are measured to the last key frame so far. This means that we are tracing the distance and when it reaches its maximum we declare a new key frame. The algorithm provides us with root key frames as a starting points. 
These frames are intuitively the ones that are farthest from each other thus maximizing the ''frame coverage''. In order to add more details to the summary we process each shot individually. Therefore, for an online system, we need to keep all the frames between the two subsequent root key frames (peaks in the plot). From all the frames in &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;th &lt;/ins&gt;shot&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;, &lt;/ins&gt;we calculate the maximum distance to the &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;previous &lt;/ins&gt;root key frame. &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;From &lt;/ins&gt;the number of frames in the current shot, and the &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;desired summarization ratio, &lt;/ins&gt;we can calculate the number of &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;extra &lt;/ins&gt;frames we need, &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;M&lt;/ins&gt;. The &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;interval (on y-axis) &lt;/ins&gt;is divided &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;into M&lt;/ins&gt;+1 &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;equal segments &lt;/ins&gt;and &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;M &lt;/ins&gt;frames are &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;selected&lt;/ins&gt;. This approach maximizes the distance (visual difference) between the &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;extra &lt;/ins&gt;key frames and the root key frames. Note that this is very different from &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;uniform &lt;/ins&gt;temporal sampling (&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;equal distances &lt;/ins&gt;on x-axis) as it provides more detail for the parts that have more changes. 
&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;In other words&lt;/ins&gt;, it is &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;reflects &lt;/ins&gt;the content progression of the video instead of temporal progression by taking more samples only if there is more visual change. We need to come up with &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;a technique &lt;/ins&gt;to allocate key frame to &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;shots&lt;/ins&gt;. Some shots do not need as many frames as others. The distance plot can help us identify these shots &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;which correspond &lt;/ins&gt;to flat parts of the plot (e.g. the last segment in foreman).&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;</summary>
		<author><name>Mbagheri</name></author>
	</entry>
	<entry>
		<id>https://nmsl.cs.sfu.ca/index.php?title=ACMM08&amp;diff=1853&amp;oldid=prev</id>
		<title>Mbagheri at 23:57, 18 April 2008</title>
		<link rel="alternate" type="text/html" href="https://nmsl.cs.sfu.ca/index.php?title=ACMM08&amp;diff=1853&amp;oldid=prev"/>
		<updated>2008-04-18T23:57:24Z</updated>

		<summary type="html">&lt;p&gt;&lt;/p&gt;
&lt;table class=&quot;diff diff-contentalign-left diff-editfont-monospace&quot; data-mw=&quot;interface&quot;&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;tr class=&quot;diff-title&quot; lang=&quot;en-CA&quot;&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;← Older revision&lt;/td&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;Revision as of 23:57, 18 April 2008&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot; id=&quot;mw-diff-left-l35&quot; &gt;Line 35:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 35:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;The algorithm fails to extract meaningful key frames for surveillance videos especially the last two. As it can be seen from the plots, although there is a significant raise/drop in the distance, no new key frame is extracted. This is semantically correct since there is no new shot. However, we need to address this issue either by parameter tuning or changing the algorithm. Characteristics of surveillance videos can help us tune the algorithm for such applications. For example, since the background in surveillance videos is usually steady, we may want to subtract an average histogram (computed progressively) from all video frames, so small changes can be detected more easily.&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;The algorithm fails to extract meaningful key frames for surveillance videos especially the last two. As it can be seen from the plots, although there is a significant raise/drop in the distance, no new key frame is extracted. This is semantically correct since there is no new shot. However, we need to address this issue either by parameter tuning or changing the algorithm. Characteristics of surveillance videos can help us tune the algorithm for such applications. For example, since the background in surveillance videos is usually steady, we may want to subtract an average histogram (computed progressively) from all video frames, so small changes can be detected more easily.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;−&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;''' Hierarchical Summarization''': From what I understand, the distance of frames are measured to the last key frame so far. This means that we are tracing the distance and when it reaches its maximum we declare a new key frame. The algorithm provides us with root key frames as a starting points. These frames are intuitively the ones that are farthest from each other thus maximizing the ''frame coverage''. In order to add more details to the summary we process each shot individually. Therefore, for an online system, we need to keep all the frames between the two root key frames (peaks in the plot). From all the frames in this shot we calculate the maximum distance to the root key frame. We have the number of frames in the current shot, N, and the percentage from which we can calculate the number of frames we need, m. The distance is divided by m+1 and the frames are taken at equal distances (on y-axis). This approach maximizes the distance (visual difference) between the new key frames and the root key frames selected by the algorithm. Note that this is very different from equal distance temporal sampling (on x-axis) as it provides more detail for the parts that have more changes. Moreover, in some sense it is reflecting the content progression of the video instead of temporal progression by taking more samples only if there is more visual change. We also need to come up with an idea of how to allocate key frame to shot. Some shots do not need as many frames as others. The distance plot can help us identify these shots corresponding to flat parts of the plot (e.g. 
the last segment in foreman).&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;''' Hierarchical Summarization''': From what I understand, the distance of frames are measured to the last key frame so far. This means that we are tracing the distance and when it reaches its maximum we declare a new key frame. The algorithm provides us with root key frames as a starting points. These frames are intuitively the ones that are farthest from each other thus maximizing the ''frame coverage''. In order to add more details to the summary we process each shot individually. Therefore, for an online system, we need to keep all the frames between the two &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;subsequent &lt;/ins&gt;root key frames (peaks in the plot). From all the frames in this shot we calculate the maximum distance to the root key frame. We have the number of frames in the current shot, N, and the percentage from which we can calculate the number of frames we need, m. The distance is divided by m+1 and the frames are taken at equal distances (on y-axis). This approach maximizes the distance (visual difference) between the new key frames and the root key frames selected by the algorithm. Note that this is very different from equal distance temporal sampling (on x-axis) as it provides more detail for the parts that have more changes. Moreover, in some sense it is reflecting the content progression of the video instead of temporal progression by taking more samples only if there is more visual change. We also need to come up with an idea of how to allocate key frame to shot. Some shots do not need as many frames as others. The distance plot can help us identify these shots corresponding to flat parts of the plot (e.g. 
the last segment in foreman).&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;</summary>
		<author><name>Mbagheri</name></author>
	</entry>
	<entry>
		<id>https://nmsl.cs.sfu.ca/index.php?title=ACMM08&amp;diff=1852&amp;oldid=prev</id>
		<title>Mbagheri at 23:56, 18 April 2008</title>
		<link rel="alternate" type="text/html" href="https://nmsl.cs.sfu.ca/index.php?title=ACMM08&amp;diff=1852&amp;oldid=prev"/>
		<updated>2008-04-18T23:56:01Z</updated>

		<summary type="html">&lt;p&gt;&lt;/p&gt;
&lt;table class=&quot;diff diff-contentalign-left diff-editfont-monospace&quot; data-mw=&quot;interface&quot;&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;tr class=&quot;diff-title&quot; lang=&quot;en-CA&quot;&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;← Older revision&lt;/td&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;Revision as of 23:56, 18 April 2008&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot; id=&quot;mw-diff-left-l35&quot; &gt;Line 35:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 35:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;The algorithm fails to extract meaningful key frames for surveillance videos especially the last two. As it can be seen from the plots, although there is a significant raise/drop in the distance, no new key frame is extracted. This is semantically correct since there is no new shot. However, we need to address this issue either by parameter tuning or changing the algorithm. Characteristics of surveillance videos can help us tune the algorithm for such applications. For example, since the background in surveillance videos is usually steady, we may want to subtract an average histogram (computed progressively) from all video frames, so small changes can be detected more easily.&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;The algorithm fails to extract meaningful key frames for surveillance videos especially the last two. As it can be seen from the plots, although there is a significant raise/drop in the distance, no new key frame is extracted. This is semantically correct since there is no new shot. However, we need to address this issue either by parameter tuning or changing the algorithm. Characteristics of surveillance videos can help us tune the algorithm for such applications. For example, since the background in surveillance videos is usually steady, we may want to subtract an average histogram (computed progressively) from all video frames, so small changes can be detected more easily.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;−&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;''' Hierarchical Summarization''': From what I understand, the distance of frames are measured to the last key frame so far. This means that we are tracing the distance and when it reaches its maximum we declare a new key frame. The algorithm provides &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;are &lt;/del&gt;with root key frames as a starting points. These frames are intuitively the ones that are farthest from each other thus maximizing the ''frame coverage''. In order to add more details to the summary we process each shot individually. Therefore, for an online system, we need to keep all the frames between the two root key frames (peaks in the plot). From all the frames in this shot we calculate the maximum distance to the root key frame. We have the number of frames in the current shot, N, and the percentage from which we can calculate the number of frames we need, m. The distance is divided by m+1 and the frames are taken at equal distances (on y-axis). This approach maximizes the distance (visual difference) between the new key frames and the root key frames selected by the algorithm. Note that this is very different from equal distance temporal sampling (on x-axis) as it provides more detail for the parts that have more changes. Moreover, in some sense it is reflecting the content progression of the video instead of temporal progression by taking more samples only if there is more visual change. We also need to come up with an idea of how to allocate key frame to shot. Some shots do not need as many frames as others. The distance plot can help us identify these shots corresponding to flat parts of the plot (e.g. 
the last segment in foreman).&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;''' Hierarchical Summarization''': From what I understand, the distance of frames are measured to the last key frame so far. This means that we are tracing the distance and when it reaches its maximum we declare a new key frame. The algorithm provides &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;us &lt;/ins&gt;with root key frames as a starting points. These frames are intuitively the ones that are farthest from each other thus maximizing the ''frame coverage''. In order to add more details to the summary we process each shot individually. Therefore, for an online system, we need to keep all the frames between the two root key frames (peaks in the plot). From all the frames in this shot we calculate the maximum distance to the root key frame. We have the number of frames in the current shot, N, and the percentage from which we can calculate the number of frames we need, m. The distance is divided by m+1 and the frames are taken at equal distances (on y-axis). This approach maximizes the distance (visual difference) between the new key frames and the root key frames selected by the algorithm. Note that this is very different from equal distance temporal sampling (on x-axis) as it provides more detail for the parts that have more changes. Moreover, in some sense it is reflecting the content progression of the video instead of temporal progression by taking more samples only if there is more visual change. We also need to come up with an idea of how to allocate key frame to shot. Some shots do not need as many frames as others. The distance plot can help us identify these shots corresponding to flat parts of the plot (e.g. 
the last segment in foreman).&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;</summary>
		<author><name>Mbagheri</name></author>
	</entry>
	<entry>
		<id>https://nmsl.cs.sfu.ca/index.php?title=ACMM08&amp;diff=1851&amp;oldid=prev</id>
		<title>Mbagheri at 23:54, 18 April 2008</title>
		<link rel="alternate" type="text/html" href="https://nmsl.cs.sfu.ca/index.php?title=ACMM08&amp;diff=1851&amp;oldid=prev"/>
		<updated>2008-04-18T23:54:59Z</updated>

		<summary type="html">&lt;p&gt;&lt;/p&gt;
&lt;table class=&quot;diff diff-contentalign-left diff-editfont-monospace&quot; data-mw=&quot;interface&quot;&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;tr class=&quot;diff-title&quot; lang=&quot;en-CA&quot;&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;← Older revision&lt;/td&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;Revision as of 23:54, 18 April 2008&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot; id=&quot;mw-diff-left-l33&quot; &gt;Line 33:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 33:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;As it can be seen from the extracted summaries, the algorithm succeeds in detecting shots. For videos that contain only one shot, only one key frame is extracted such as city and ice sequences. While for multi shot videos several frames are extracted. A good example here is the doc_reality sequence provided by CBC that has a lot of shots. In addition, since we are using all three color elements (HSV), changes in the brightness are detected e.g. the first fews key frames in doc_reality. We may want to change this for surveillance applications and use only H and S.&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;As it can be seen from the extracted summaries, the algorithm succeeds in detecting shots. For videos that contain only one shot, only one key frame is extracted such as city and ice sequences. While for multi shot videos several frames are extracted. A good example here is the doc_reality sequence provided by CBC that has a lot of shots. In addition, since we are using all three color elements (HSV), changes in the brightness are detected e.g. the first fews key frames in doc_reality. We may want to change this for surveillance applications and use only H and S.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;−&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt; &lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;The algorithm fails to extract meaningful key frames for surveillance videos especially the last two. As it can be seen from the plots, although there is a significant raise&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;/&lt;/ins&gt;drop in the distance, no new &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;key frame &lt;/ins&gt;is &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;extracted. This &lt;/ins&gt;is semantically correct &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;since there &lt;/ins&gt;is &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;no new &lt;/ins&gt;shot. However, we need to &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;address this issue &lt;/ins&gt;either &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;by parameter tuning &lt;/ins&gt;or &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;changing &lt;/ins&gt;the algorithm. &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;Characteristics of surveillance videos can help us tune &lt;/ins&gt;the algorithm for such &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;applications&lt;/ins&gt;. 
&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;For example&lt;/ins&gt;, since the background in surveillance videos &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;is &lt;/ins&gt;usually steady, we may want to subtract an average histogram (computed progressively) from all video frames, so small changes can be detected more easily.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;−&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;The algorithm fails to extract meaningful key frames for surveillance videos especially the last two. As it can be seen from the plots, although there is a significant raise &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;or &lt;/del&gt;drop in the distance, no new &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;shot &lt;/del&gt;is &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;detected which &lt;/del&gt;is semantically correct &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;because the whole video &lt;/del&gt;is &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;one &lt;/del&gt;shot. However, we need to either &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;tune the parameters &lt;/del&gt;or &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;change &lt;/del&gt;the algorithm &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;to take key frames even when there is no change in the shot&lt;/del&gt;. &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;This could mean there is a need for tuning &lt;/del&gt;the algorithm for such &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;application&lt;/del&gt;. &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;In addition&lt;/del&gt;, since the background in surveillance videos &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;are &lt;/del&gt;usually steady, we may want to subtract an average histogram (computed progressively) from all video frames, so small changes can be detected more easily.&lt;/div&gt;&lt;/td&gt;&lt;td colspan=&quot;2&quot;&gt; &lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;''' Hierarchical Summarization''': From what I understand, the distance of frames are measured to the last key frame so far. This means that we are tracing the distance and when it reaches its maximum we declare a new key frame. The algorithm provides are with root key frames as a starting points. These frames are intuitively the ones that are farthest from each other thus maximizing the ''frame coverage''. In order to add more details to the summary we process each shot individually. Therefore, for an online system, we need to keep all the frames between the two root key frames (peaks in the plot). From all the frames in this shot we calculate the maximum distance to the root key frame. We have the number of frames in the current shot, N, and the percentage from which we can calculate the number of frames we need, m. The distance is divided by m+1 and the frames are taken at equal distances (on y-axis). This approach maximizes the distance (visual difference) between the new key frames and the root key frames selected by the algorithm. Note that this is very different from equal distance temporal sampling (on x-axis) as it provides more detail for the parts that have more changes. Moreover, in some sense it is reflecting the content progression of the video instead of temporal progression by taking more samples only if there is more visual change. We also need to come up with an idea of how to allocate key frame to shot. Some shots do not need as many frames as others. The distance plot can help us identify these shots corresponding to flat parts of the plot (e.g. 
the last segment in foreman).&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;''' Hierarchical Summarization''': From what I understand, the distance of frames are measured to the last key frame so far. This means that we are tracing the distance and when it reaches its maximum we declare a new key frame. The algorithm provides are with root key frames as a starting points. These frames are intuitively the ones that are farthest from each other thus maximizing the ''frame coverage''. In order to add more details to the summary we process each shot individually. Therefore, for an online system, we need to keep all the frames between the two root key frames (peaks in the plot). From all the frames in this shot we calculate the maximum distance to the root key frame. We have the number of frames in the current shot, N, and the percentage from which we can calculate the number of frames we need, m. The distance is divided by m+1 and the frames are taken at equal distances (on y-axis). This approach maximizes the distance (visual difference) between the new key frames and the root key frames selected by the algorithm. Note that this is very different from equal distance temporal sampling (on x-axis) as it provides more detail for the parts that have more changes. Moreover, in some sense it is reflecting the content progression of the video instead of temporal progression by taking more samples only if there is more visual change. We also need to come up with an idea of how to allocate key frame to shot. Some shots do not need as many frames as others. The distance plot can help us identify these shots corresponding to flat parts of the plot (e.g. the last segment in foreman).&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;</summary>
		<author><name>Mbagheri</name></author>
	</entry>
	<entry>
		<id>https://nmsl.cs.sfu.ca/index.php?title=ACMM08&amp;diff=1850&amp;oldid=prev</id>
		<title>Mbagheri at 23:52, 18 April 2008</title>
		<link rel="alternate" type="text/html" href="https://nmsl.cs.sfu.ca/index.php?title=ACMM08&amp;diff=1850&amp;oldid=prev"/>
		<updated>2008-04-18T23:52:22Z</updated>

		<summary type="html">&lt;p&gt;&lt;/p&gt;
&lt;table class=&quot;diff diff-contentalign-left diff-editfont-monospace&quot; data-mw=&quot;interface&quot;&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;tr class=&quot;diff-title&quot; lang=&quot;en-CA&quot;&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;← Older revision&lt;/td&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;Revision as of 23:52, 18 April 2008&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot; id=&quot;mw-diff-left-l31&quot; &gt;Line 31:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 31:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&amp;lt;/table&amp;gt;&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&amp;lt;/table&amp;gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;−&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;As it can be seen from the extracted summaries, the algorithm succeeds in detecting shots. A good example here is the doc_reality sequence provided by CBC that has a lot of shots. In addition, since we are using all three color elements (HSV), changes in the brightness are detected e.g. the first fews key frames in doc_reality. We may want to change this for surveillance applications and use only H and S.&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;As it can be seen from the extracted summaries, the algorithm succeeds in detecting shots&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;. For videos that contain only one shot, only one key frame is extracted such as city and ice sequences. While for multi shot videos several frames are extracted&lt;/ins&gt;. A good example here is the doc_reality sequence provided by CBC that has a lot of shots. In addition, since we are using all three color elements (HSV), changes in the brightness are detected e.g. the first fews key frames in doc_reality. We may want to change this for surveillance applications and use only H and S.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt; &lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt; &lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;−&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;For videos that contain only one shot, only one key frame is extracted such as city and ice sequences.&lt;/del&gt;&lt;/div&gt;&lt;/td&gt;&lt;td colspan=&quot;2&quot;&gt; &lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;The algorithm fails to extract meaningful key frames for surveillance videos especially the last two. As it can be seen from the plots, although there is a significant raise or drop in the distance, no new shot is detected which is semantically correct because the whole video is one shot. However, we need to either tune the parameters or change the algorithm to take key frames even when there is no change in the shot. This could mean there is a need for tuning the algorithm for such application. In addition, since the background in surveillance videos are usually steady, we may want to subtract an average histogram (computed progressively) from all video frames, so small changes can be detected more easily.&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;The algorithm fails to extract meaningful key frames for surveillance videos especially the last two. As it can be seen from the plots, although there is a significant raise or drop in the distance, no new shot is detected which is semantically correct because the whole video is one shot. However, we need to either tune the parameters or change the algorithm to take key frames even when there is no change in the shot. This could mean there is a need for tuning the algorithm for such application. 
In addition, since the background in surveillance videos are usually steady, we may want to subtract an average histogram (computed progressively) from all video frames, so small changes can be detected more easily.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;''' Hierarchical Summarization''': From what I understand, the distance of frames are measured to the last key frame so far. This means that we are tracing the distance and when it reaches its maximum we declare a new key frame. The algorithm provides are with root key frames as a starting points. These frames are intuitively the ones that are farthest from each other thus maximizing the ''frame coverage''. In order to add more details to the summary we process each shot individually. Therefore, for an online system, we need to keep all the frames between the two root key frames (peaks in the plot). From all the frames in this shot we calculate the maximum distance to the root key frame. We have the number of frames in the current shot, N, and the percentage from which we can calculate the number of frames we need, m. The distance is divided by m+1 and the frames are taken at equal distances (on y-axis). This approach maximizes the distance (visual difference) between the new key frames and the root key frames selected by the algorithm. Note that this is very different from equal distance temporal sampling (on x-axis) as it provides more detail for the parts that have more changes. Moreover, in some sense it is reflecting the content progression of the video instead of temporal progression by taking more samples only if there is more visual change. We also need to come up with an idea of how to allocate key frame to shot. Some shots do not need as many frames as others. The distance plot can help us identify these shots corresponding to flat parts of the plot (e.g. 
the last segment in foreman).&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;''' Hierarchical Summarization''': From what I understand, the distance of frames are measured to the last key frame so far. This means that we are tracing the distance and when it reaches its maximum we declare a new key frame. The algorithm provides are with root key frames as a starting points. These frames are intuitively the ones that are farthest from each other thus maximizing the ''frame coverage''. In order to add more details to the summary we process each shot individually. Therefore, for an online system, we need to keep all the frames between the two root key frames (peaks in the plot). From all the frames in this shot we calculate the maximum distance to the root key frame. We have the number of frames in the current shot, N, and the percentage from which we can calculate the number of frames we need, m. The distance is divided by m+1 and the frames are taken at equal distances (on y-axis). This approach maximizes the distance (visual difference) between the new key frames and the root key frames selected by the algorithm. Note that this is very different from equal distance temporal sampling (on x-axis) as it provides more detail for the parts that have more changes. Moreover, in some sense it is reflecting the content progression of the video instead of temporal progression by taking more samples only if there is more visual change. We also need to come up with an idea of how to allocate key frame to shot. Some shots do not need as many frames as others. The distance plot can help us identify these shots corresponding to flat parts of the plot (e.g. the last segment in foreman).&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;</summary>
		<author><name>Mbagheri</name></author>
	</entry>
	<entry>
		<id>https://nmsl.cs.sfu.ca/index.php?title=ACMM08&amp;diff=1849&amp;oldid=prev</id>
		<title>Mbagheri at 23:51, 18 April 2008</title>
		<link rel="alternate" type="text/html" href="https://nmsl.cs.sfu.ca/index.php?title=ACMM08&amp;diff=1849&amp;oldid=prev"/>
		<updated>2008-04-18T23:51:35Z</updated>

		<summary type="html">&lt;p&gt;&lt;/p&gt;
&lt;table class=&quot;diff diff-contentalign-left diff-editfont-monospace&quot; data-mw=&quot;interface&quot;&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;tr class=&quot;diff-title&quot; lang=&quot;en-CA&quot;&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;← Older revision&lt;/td&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;Revision as of 23:51, 18 April 2008&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot; id=&quot;mw-diff-left-l33&quot; &gt;Line 33:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 33:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;As it can be seen from the extracted summaries, the algorithm succeeds in detecting shots. A good example here is the doc_reality sequence provided by CBC that has a lot of shots. In addition, since we are using all three color elements (HSV), changes in the brightness are detected e.g. the first fews key frames in doc_reality. We may want to change this for surveillance applications and use only H and S.&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;As it can be seen from the extracted summaries, the algorithm succeeds in detecting shots. A good example here is the doc_reality sequence provided by CBC that has a lot of shots. In addition, since we are using all three color elements (HSV), changes in the brightness are detected e.g. the first fews key frames in doc_reality. We may want to change this for surveillance applications and use only H and S.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;−&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;For videos that only &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;contain &lt;/del&gt;one shot &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;without much camera motion&lt;/del&gt;, only one key frame is extracted such as city and ice sequences.&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;For videos that &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;contain &lt;/ins&gt;only one shot, only one key frame is extracted such as city and ice sequences.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;The algorithm fails to extract meaningful key frames for surveillance videos especially the last two. As it can be seen from the plots, although there is a significant raise or drop in the distance, no new shot is detected which is semantically correct because the whole video is one shot. However, we need to either tune the parameters or change the algorithm to take key frames even when there is no change in the shot. This could mean there is a need for tuning the algorithm for such application. In addition, since the background in surveillance videos are usually steady, we may want to subtract an average histogram (computed progressively) from all video frames, so small changes can be detected more easily.&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;The algorithm fails to extract meaningful key frames for surveillance videos especially the last two. As it can be seen from the plots, although there is a significant raise or drop in the distance, no new shot is detected which is semantically correct because the whole video is one shot. However, we need to either tune the parameters or change the algorithm to take key frames even when there is no change in the shot. This could mean there is a need for tuning the algorithm for such application. 
In addition, since the background in surveillance videos are usually steady, we may want to subtract an average histogram (computed progressively) from all video frames, so small changes can be detected more easily.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;''' Hierarchical Summarization''': From what I understand, the distance of frames are measured to the last key frame so far. This means that we are tracing the distance and when it reaches its maximum we declare a new key frame. The algorithm provides are with root key frames as a starting points. These frames are intuitively the ones that are farthest from each other thus maximizing the ''frame coverage''. In order to add more details to the summary we process each shot individually. Therefore, for an online system, we need to keep all the frames between the two root key frames (peaks in the plot). From all the frames in this shot we calculate the maximum distance to the root key frame. We have the number of frames in the current shot, N, and the percentage from which we can calculate the number of frames we need, m. The distance is divided by m+1 and the frames are taken at equal distances (on y-axis). This approach maximizes the distance (visual difference) between the new key frames and the root key frames selected by the algorithm. Note that this is very different from equal distance temporal sampling (on x-axis) as it provides more detail for the parts that have more changes. Moreover, in some sense it is reflecting the content progression of the video instead of temporal progression by taking more samples only if there is more visual change. We also need to come up with an idea of how to allocate key frame to shot. Some shots do not need as many frames as others. The distance plot can help us identify these shots corresponding to flat parts of the plot (e.g. 
the last segment in foreman).&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;''' Hierarchical Summarization''': From what I understand, the distance of frames are measured to the last key frame so far. This means that we are tracing the distance and when it reaches its maximum we declare a new key frame. The algorithm provides are with root key frames as a starting points. These frames are intuitively the ones that are farthest from each other thus maximizing the ''frame coverage''. In order to add more details to the summary we process each shot individually. Therefore, for an online system, we need to keep all the frames between the two root key frames (peaks in the plot). From all the frames in this shot we calculate the maximum distance to the root key frame. We have the number of frames in the current shot, N, and the percentage from which we can calculate the number of frames we need, m. The distance is divided by m+1 and the frames are taken at equal distances (on y-axis). This approach maximizes the distance (visual difference) between the new key frames and the root key frames selected by the algorithm. Note that this is very different from equal distance temporal sampling (on x-axis) as it provides more detail for the parts that have more changes. Moreover, in some sense it is reflecting the content progression of the video instead of temporal progression by taking more samples only if there is more visual change. We also need to come up with an idea of how to allocate key frame to shot. Some shots do not need as many frames as others. The distance plot can help us identify these shots corresponding to flat parts of the plot (e.g. the last segment in foreman).&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;</summary>
		<author><name>Mbagheri</name></author>
	</entry>
	<entry>
		<id>https://nmsl.cs.sfu.ca/index.php?title=ACMM08&amp;diff=1848&amp;oldid=prev</id>
		<title>Mbagheri at 23:51, 18 April 2008</title>
		<link rel="alternate" type="text/html" href="https://nmsl.cs.sfu.ca/index.php?title=ACMM08&amp;diff=1848&amp;oldid=prev"/>
		<updated>2008-04-18T23:51:07Z</updated>

		<summary type="html">&lt;p&gt;&lt;/p&gt;
&lt;table class=&quot;diff diff-contentalign-left diff-editfont-monospace&quot; data-mw=&quot;interface&quot;&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;tr class=&quot;diff-title&quot; lang=&quot;en-CA&quot;&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;← Older revision&lt;/td&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;Revision as of 23:51, 18 April 2008&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot; id=&quot;mw-diff-left-l31&quot; &gt;Line 31:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 31:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&amp;lt;/table&amp;gt;&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&amp;lt;/table&amp;gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;−&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;As can be seen from the extracted summaries, the algorithm succeeds in detecting shots. A good example here is the doc_reality sequence provided by CBC, which has many shots. In addition, since we are using all three color components (HSV), changes in brightness are also detected, e.g. in the first few key frames of doc_reality. We may want to change this for surveillance applications.&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;As can be seen from the extracted summaries, the algorithm succeeds in detecting shots. A good example here is the doc_reality sequence provided by CBC, which has many shots. In addition, since we are using all three color components (HSV), changes in brightness are also detected, e.g. in the first few key frames of doc_reality. We may want to change this for surveillance applications &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;and use only H and S&lt;/ins&gt;.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;For videos that contain only one shot without much camera motion, only one key frame is extracted, as in the city and ice sequences.&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;For videos that contain only one shot without much camera motion, only one key frame is extracted, as in the city and ice sequences.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;</summary>
		<author><name>Mbagheri</name></author>
	</entry>
	<entry>
		<id>https://nmsl.cs.sfu.ca/index.php?title=ACMM08&amp;diff=1847&amp;oldid=prev</id>
		<title>Mbagheri at 23:50, 18 April 2008</title>
		<link rel="alternate" type="text/html" href="https://nmsl.cs.sfu.ca/index.php?title=ACMM08&amp;diff=1847&amp;oldid=prev"/>
		<updated>2008-04-18T23:50:53Z</updated>

		<summary type="html">&lt;p&gt;&lt;/p&gt;
&lt;table class=&quot;diff diff-contentalign-left diff-editfont-monospace&quot; data-mw=&quot;interface&quot;&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;tr class=&quot;diff-title&quot; lang=&quot;en-CA&quot;&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;← Older revision&lt;/td&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;Revision as of 23:50, 18 April 2008&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot; id=&quot;mw-diff-left-l31&quot; &gt;Line 31:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 31:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&amp;lt;/table&amp;gt;&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&amp;lt;/table&amp;gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;−&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;As can be seen from the extracted summaries, the algorithm succeeds in detecting shots. A good example here is the doc_reality sequence provided by CBC, which has many shots. In addition, since we are using all three color components (HSV), changes in brightness are also detected, e.g. in the first few key frames of doc_reality.&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;As can be seen from the extracted summaries, the algorithm succeeds in detecting shots. A good example here is the doc_reality sequence provided by CBC, which has many shots. In addition, since we are using all three color components (HSV), changes in brightness are also detected, e.g. in the first few key frames of doc_reality. &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;We may want to change this for surveillance applications.&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt; &lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt; &lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;For videos that contain only one shot without much camera motion, only one key frame is extracted, as in the city and ice sequences.&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;For videos that contain only one shot without much camera motion, only one key frame is extracted, as in the city and ice sequences.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;The algorithm fails to extract meaningful key frames for surveillance videos, especially the last two. As can be seen from the plots, although there is a significant rise or drop in the distance, no new shot is detected, which is semantically correct because the whole video is one shot. However, we need to either tune the parameters or change the algorithm so that it takes key frames even when there is no shot change. This could mean the algorithm needs tuning for such applications. In addition, since the background in surveillance videos is usually steady, we may want to subtract an average histogram (computed progressively) from all video frames, so small changes can be detected more easily.&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;The algorithm fails to extract meaningful key frames for surveillance videos, especially the last two. As can be seen from the plots, although there is a significant rise or drop in the distance, no new shot is detected, which is semantically correct because the whole video is one shot. However, we need to either tune the parameters or change the algorithm so that it takes key frames even when there is no shot change. This could mean the algorithm needs tuning for such applications. In addition, since the background in surveillance videos is usually steady, we may want to subtract an average histogram (computed progressively) from all video frames, so small changes can be detected more easily.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;''' Hierarchical Summarization''': From what I understand, the distance of each frame is measured to the last key frame so far. This means that we are tracing the distance and, when it reaches its maximum, we declare a new key frame. The algorithm provides us with root key frames as starting points. These frames are intuitively the ones that are farthest from each other, thus maximizing the ''frame coverage''. In order to add more detail to the summary, we process each shot individually. Therefore, for an online system, we need to keep all the frames between two root key frames (peaks in the plot). From all the frames in this shot we calculate the maximum distance to the root key frame. We have the number of frames in the current shot, N, and the percentage from which we can calculate the number of frames we need, m. The distance is divided by m+1 and the frames are taken at equal distances (on the y-axis). This approach maximizes the distance (visual difference) between the new key frames and the root key frames selected by the algorithm. Note that this is very different from equal-distance temporal sampling (on the x-axis), as it provides more detail for the parts that have more changes. Moreover, in some sense it reflects the content progression of the video rather than its temporal progression, by taking more samples only where there is more visual change. We also need to come up with an idea of how to allocate key frames to shots. Some shots do not need as many frames as others. The distance plot can help us identify these shots, which correspond to flat parts of the plot (e.g. the last segment in foreman).&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;''' Hierarchical Summarization''': From what I understand, the distance of each frame is measured to the last key frame so far. This means that we are tracing the distance and, when it reaches its maximum, we declare a new key frame. The algorithm provides us with root key frames as starting points. These frames are intuitively the ones that are farthest from each other, thus maximizing the ''frame coverage''. In order to add more detail to the summary, we process each shot individually. Therefore, for an online system, we need to keep all the frames between two root key frames (peaks in the plot). From all the frames in this shot we calculate the maximum distance to the root key frame. We have the number of frames in the current shot, N, and the percentage from which we can calculate the number of frames we need, m. The distance is divided by m+1 and the frames are taken at equal distances (on the y-axis). This approach maximizes the distance (visual difference) between the new key frames and the root key frames selected by the algorithm. Note that this is very different from equal-distance temporal sampling (on the x-axis), as it provides more detail for the parts that have more changes. Moreover, in some sense it reflects the content progression of the video rather than its temporal progression, by taking more samples only where there is more visual change. We also need to come up with an idea of how to allocate key frames to shots. Some shots do not need as many frames as others. The distance plot can help us identify these shots, which correspond to flat parts of the plot (e.g. the last segment in foreman).&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;</summary>
		<author><name>Mbagheri</name></author>
	</entry>
	<entry>
		<id>https://nmsl.cs.sfu.ca/index.php?title=ACMM08&amp;diff=1846&amp;oldid=prev</id>
		<title>Mbagheri at 23:49, 18 April 2008</title>
		<link rel="alternate" type="text/html" href="https://nmsl.cs.sfu.ca/index.php?title=ACMM08&amp;diff=1846&amp;oldid=prev"/>
		<updated>2008-04-18T23:49:52Z</updated>

		<summary type="html">&lt;p&gt;&lt;/p&gt;
&lt;table class=&quot;diff diff-contentalign-left diff-editfont-monospace&quot; data-mw=&quot;interface&quot;&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;tr class=&quot;diff-title&quot; lang=&quot;en-CA&quot;&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;← Older revision&lt;/td&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;Revision as of 23:49, 18 April 2008&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot; id=&quot;mw-diff-left-l33&quot; &gt;Line 33:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 33:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;As can be seen from the extracted summaries, the algorithm succeeds in detecting shots. A good example here is the doc_reality sequence provided by CBC, which has many shots. In addition, since we are using all three color components (HSV), changes in brightness are also detected, e.g. in the first few key frames of doc_reality.&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;As can be seen from the extracted summaries, the algorithm succeeds in detecting shots. A good example here is the doc_reality sequence provided by CBC, which has many shots. In addition, since we are using all three color components (HSV), changes in brightness are also detected, e.g. in the first few key frames of doc_reality.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;For videos that contain only one shot without much camera motion, only one key frame is extracted, as in the city and ice sequences.&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;For videos that contain only one shot without much camera motion, only one key frame is extracted, as in the city and ice sequences.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;−&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;The algorithm fails to extract meaningful key frames for surveillance videos, especially the last two. This could mean the algorithm needs &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;fine &lt;/del&gt;tuning for such applications. In addition, since the background in surveillance videos is usually steady, we may want to subtract an average histogram (computed progressively) from all video frames, so small changes can be detected more easily.&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;The algorithm fails to extract meaningful key frames for surveillance videos, especially the last two&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;. As can be seen from the plots, although there is a significant rise or drop in the distance, no new shot is detected, which is semantically correct because the whole video is one shot. However, we need to either tune the parameters or change the algorithm so that it takes key frames even when there is no shot change&lt;/ins&gt;. This could mean the algorithm needs tuning for such applications. In addition, since the background in surveillance videos is usually steady, we may want to subtract an average histogram (computed progressively) from all video frames, so small changes can be detected more easily.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;''' Hierarchical Summarization''': From what I understand, the distance of each frame is measured to the last key frame so far. This means that we are tracing the distance and, when it reaches its maximum, we declare a new key frame. The algorithm provides us with root key frames as starting points. These frames are intuitively the ones that are farthest from each other, thus maximizing the ''frame coverage''. In order to add more detail to the summary, we process each shot individually. Therefore, for an online system, we need to keep all the frames between two root key frames (peaks in the plot). From all the frames in this shot we calculate the maximum distance to the root key frame. We have the number of frames in the current shot, N, and the percentage from which we can calculate the number of frames we need, m. The distance is divided by m+1 and the frames are taken at equal distances (on the y-axis). This approach maximizes the distance (visual difference) between the new key frames and the root key frames selected by the algorithm. Note that this is very different from equal-distance temporal sampling (on the x-axis), as it provides more detail for the parts that have more changes. Moreover, in some sense it reflects the content progression of the video rather than its temporal progression, by taking more samples only where there is more visual change. We also need to come up with an idea of how to allocate key frames to shots. Some shots do not need as many frames as others. The distance plot can help us identify these shots, which correspond to flat parts of the plot (e.g. the last segment in foreman).&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;''' Hierarchical Summarization''': From what I understand, the distance of each frame is measured to the last key frame so far. This means that we are tracing the distance and, when it reaches its maximum, we declare a new key frame. The algorithm provides us with root key frames as starting points. These frames are intuitively the ones that are farthest from each other, thus maximizing the ''frame coverage''. In order to add more detail to the summary, we process each shot individually. Therefore, for an online system, we need to keep all the frames between two root key frames (peaks in the plot). From all the frames in this shot we calculate the maximum distance to the root key frame. We have the number of frames in the current shot, N, and the percentage from which we can calculate the number of frames we need, m. The distance is divided by m+1 and the frames are taken at equal distances (on the y-axis). This approach maximizes the distance (visual difference) between the new key frames and the root key frames selected by the algorithm. Note that this is very different from equal-distance temporal sampling (on the x-axis), as it provides more detail for the parts that have more changes. Moreover, in some sense it reflects the content progression of the video rather than its temporal progression, by taking more samples only where there is more visual change. We also need to come up with an idea of how to allocate key frames to shots. Some shots do not need as many frames as others. The distance plot can help us identify these shots, which correspond to flat parts of the plot (e.g. the last segment in foreman).&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;</summary>
		<author><name>Mbagheri</name></author>
	</entry>
	<entry>
		<id>https://nmsl.cs.sfu.ca/index.php?title=ACMM08&amp;diff=1845&amp;oldid=prev</id>
		<title>Mbagheri at 23:37, 18 April 2008</title>
		<link rel="alternate" type="text/html" href="https://nmsl.cs.sfu.ca/index.php?title=ACMM08&amp;diff=1845&amp;oldid=prev"/>
		<updated>2008-04-18T23:37:46Z</updated>

		<summary type="html">&lt;p&gt;&lt;/p&gt;
&lt;table class=&quot;diff diff-contentalign-left diff-editfont-monospace&quot; data-mw=&quot;interface&quot;&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;tr class=&quot;diff-title&quot; lang=&quot;en-CA&quot;&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;← Older revision&lt;/td&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;Revision as of 23:37, 18 April 2008&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot; id=&quot;mw-diff-left-l35&quot; &gt;Line 35:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 35:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;The algorithm fails to extract meaningful key frames for surveillance videos, especially the last two. This could mean the algorithm needs fine tuning for such applications. In addition, since the background in surveillance videos is usually steady, we may want to subtract an average histogram (computed progressively) from all video frames, so small changes can be detected more easily.&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;The algorithm fails to extract meaningful key frames for surveillance videos, especially the last two. This could mean the algorithm needs fine tuning for such applications. In addition, since the background in surveillance videos is usually steady, we may want to subtract an average histogram (computed progressively) from all video frames, so small changes can be detected more easily.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;−&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;''' Hierarchical Summarization''': From what I understand, the distance of each frame is measured to the last key frame so far. This means that we are tracing the distance and, when it reaches its maximum, we declare a new key frame. The algorithm provides us with root key frames as starting points. These frames are intuitively the ones that are farthest from each other, thus maximizing the ''frame coverage''. In order to add more detail to the summary, we process each shot individually. Therefore, for an online system, we need to keep all the frames between two root key frames (peaks in the plot). From all the frames in this shot we calculate the maximum distance to the root key frame &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;KF&lt;/del&gt;. We have the number of frames in the current shot, N, and the percentage from which we can calculate the number of frames we need, m. The distance is divided by m+1 and the frames are taken at equal distances (on the y-axis). This approach maximizes the distance (visual difference) between the new key frames and the root key frames selected by the algorithm. Note that this is very different from equal-distance temporal sampling (on the x-axis), as it provides more detail for the parts that have more changes. Moreover, in some sense it reflects the content progression of the video rather than its temporal progression, by taking more samples only where there is more visual change. We also need to come up with an idea of how to allocate key frames to shots. Some shots do not need as many frames as others. The distance plot can help us identify these shots, which correspond to flat parts of the plot (e.g. the last segment in foreman).&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;''' Hierarchical Summarization''': From what I understand, the distance of each frame is measured to the last key frame so far. This means that we are tracing the distance and, when it reaches its maximum, we declare a new key frame. The algorithm provides us with root key frames as starting points. These frames are intuitively the ones that are farthest from each other, thus maximizing the ''frame coverage''. In order to add more detail to the summary, we process each shot individually. Therefore, for an online system, we need to keep all the frames between two root key frames (peaks in the plot). From all the frames in this shot we calculate the maximum distance to the root key frame. We have the number of frames in the current shot, N, and the percentage from which we can calculate the number of frames we need, m. The distance is divided by m+1 and the frames are taken at equal distances (on the y-axis). This approach maximizes the distance (visual difference) between the new key frames and the root key frames selected by the algorithm. Note that this is very different from equal-distance temporal sampling (on the x-axis), as it provides more detail for the parts that have more changes. Moreover, in some sense it reflects the content progression of the video rather than its temporal progression, by taking more samples only where there is more visual change. We also need to come up with an idea of how to allocate key frames to shots. Some shots do not need as many frames as others. The distance plot can help us identify these shots, which correspond to flat parts of the plot (e.g. the last segment in foreman).&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;</summary>
		<author><name>Mbagheri</name></author>
	</entry>
</feed>