
Check that only the OLD data are deleted, rather than all data that do not belong to the latest validated CP (we may already have saved data for a future CP).

Romain LION, 4 years ago
Parent
Commit
f36588fe08
1 changed file with 4 additions and 3 deletions

+4 -3
mpi/src/mpi_failure_tolerance/starpu_mpi_checkpoint_package.c

@@ -50,9 +50,9 @@ int checkpoint_package_data_del(int cp_id, int cp_inst, int rank)
 	while (checkpoint_data != _starpu_mpi_checkpoint_data_list_end(checkpoint_data_list))
 	{
 		next_checkpoint_data = _starpu_mpi_checkpoint_data_list_next(checkpoint_data);
-		if (!(checkpoint_data->cp_id==cp_id && checkpoint_data->cp_inst==cp_inst)
-//		if ((checkpoint_data->cp_id==cp_id && checkpoint_data->cp_inst==cp_inst)
-			&& checkpoint_data->rank==rank)
+		// I delete all the old data (i.e. the cp inst is strictly lower than the one of the just validated CP) only for
+		// the rank that initiated the CP
+		if (checkpoint_data->cp_inst<cp_inst && checkpoint_data->rank==rank)
 		{
 			if (checkpoint_data->type==STARPU_R)
 			{
@@ -64,6 +64,7 @@ int checkpoint_package_data_del(int cp_id, int cp_inst, int rank)
 				free(checkpoint_data->ptr);
 			}
 			_starpu_mpi_checkpoint_data_list_erase(checkpoint_data_list, checkpoint_data);
+			free(checkpoint_data);
 			done++;
 		}
 		checkpoint_data = next_checkpoint_data;
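The commit changes two things: the deletion predicate (the old one deleted everything that was not the exact (cp_id, cp_inst) pair, which also wiped data already saved for a future CP) and a memory leak (erased list nodes were never freed). The self-contained sketch below reproduces both points on a hypothetical doubly linked list. The struct `cp_data`, the `owns_ptr` flag (standing in for the `type`-based test in the real code), and the helpers `cp_list_push`, `cp_list_erase`, `old_should_delete`, `new_should_delete` and `del_old_data` are all illustrative assumptions, not StarPU's actual list API.

```c
#include <stdlib.h>

/* Hypothetical stand-in for a saved checkpoint datum. Field names mirror
 * the diff; the layout and the owns_ptr flag are assumptions. */
struct cp_data {
    int cp_id;
    int cp_inst;
    int rank;
    int owns_ptr;               /* assumption: replaces the type-based test */
    void *ptr;
    struct cp_data *prev, *next;
};

struct cp_list {
    struct cp_data *head;
};

/* Allocate a node (with a small owned buffer) and push it at the front. */
struct cp_data *cp_list_push(struct cp_list *l, int cp_id, int cp_inst, int rank)
{
    struct cp_data *d = calloc(1, sizeof *d);
    if (d == NULL)
        return NULL;
    d->cp_id = cp_id;
    d->cp_inst = cp_inst;
    d->rank = rank;
    d->owns_ptr = 1;
    d->ptr = malloc(16);        /* free(NULL) is harmless if this fails */
    d->next = l->head;
    if (l->head != NULL)
        l->head->prev = d;
    l->head = d;
    return d;
}

/* Unlink a node from the list WITHOUT freeing it. */
void cp_list_erase(struct cp_list *l, struct cp_data *d)
{
    if (d->prev != NULL) d->prev->next = d->next; else l->head = d->next;
    if (d->next != NULL) d->next->prev = d->prev;
    d->prev = d->next = NULL;
}

int cp_list_size(const struct cp_list *l)
{
    int n = 0;
    for (const struct cp_data *d = l->head; d != NULL; d = d->next)
        n++;
    return n;
}

/* Old predicate: anything that is not the validated CP is deleted,
 * including data already saved for a FUTURE CP. */
int old_should_delete(const struct cp_data *d, int cp_id, int cp_inst, int rank)
{
    return !(d->cp_id == cp_id && d->cp_inst == cp_inst) && d->rank == rank;
}

/* New predicate: only strictly older instances of the given rank. */
int new_should_delete(const struct cp_data *d, int cp_inst, int rank)
{
    return d->cp_inst < cp_inst && d->rank == rank;
}

/* Delete old entries; returns how many nodes were removed. */
int del_old_data(struct cp_list *l, int cp_inst, int rank)
{
    int done = 0;
    struct cp_data *d = l->head;
    while (d != NULL) {
        struct cp_data *next = d->next;   /* save before d is freed */
        if (new_should_delete(d, cp_inst, rank)) {
            if (d->owns_ptr)
                free(d->ptr);
            cp_list_erase(l, d);
            free(d);                      /* without this the node leaks */
            done++;
        }
        d = next;
    }
    return done;
}
```

As in the diff, the sketch saves `next` before the current node is erased and freed, so iteration never touches freed memory. Also like the diff, the new predicate compares only `cp_inst` and `rank`, not `cp_id`.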