
Check that only the OLD data are deleted, rather than all data that do not belong to the latest valid CP (we may already have saved data for a future CP).

Romain LION, 5 years ago
Commit f36588fe08
1 changed file with 4 additions and 3 deletions

+ 4 - 3
mpi/src/mpi_failure_tolerance/starpu_mpi_checkpoint_package.c

@@ -50,9 +50,9 @@ int checkpoint_package_data_del(int cp_id, int cp_inst, int rank)
 	while (checkpoint_data != _starpu_mpi_checkpoint_data_list_end(checkpoint_data_list))
 	{
 		next_checkpoint_data = _starpu_mpi_checkpoint_data_list_next(checkpoint_data);
-		if (!(checkpoint_data->cp_id==cp_id && checkpoint_data->cp_inst==cp_inst)
-//		if ((checkpoint_data->cp_id==cp_id && checkpoint_data->cp_inst==cp_inst)
-			&& checkpoint_data->rank==rank)
+		// I delete all the old data (i.e. the cp inst is strictly lower than the one of the just validated CP) only for
+		// the rank that initiated the CP
+		if (checkpoint_data->cp_inst<cp_inst && checkpoint_data->rank==rank)
 		{
 			if (checkpoint_data->type==STARPU_R)
 			{
@@ -64,6 +64,7 @@ int checkpoint_package_data_del(int cp_id, int cp_inst, int rank)
 				free(checkpoint_data->ptr);
 			}
 			_starpu_mpi_checkpoint_data_list_erase(checkpoint_data_list, checkpoint_data);
+			free(checkpoint_data);
 			done++;
 		}
 		checkpoint_data = next_checkpoint_data;
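
The deletion rule introduced by this change can be illustrated in isolation. The following is a minimal sketch, not the StarPU implementation: `struct cp_entry`, `cp_push`, `cp_delete_old` and `cp_count` are hypothetical names, and a plain singly linked list stands in for the `_starpu_mpi_checkpoint_data_list` API. It shows the same two points as the diff: only entries strictly older than the validated instance are removed, for the given rank only, and each node is unlinked before both its payload and the node itself are freed.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a saved checkpoint entry (not the StarPU type). */
struct cp_entry
{
	int cp_id;
	int cp_inst;
	int rank;
	void *ptr;               /* payload owned by the entry */
	struct cp_entry *next;
};

/* Push a new entry at the head of the list and return the new head. */
static struct cp_entry *cp_push(struct cp_entry *head, int cp_id, int cp_inst, int rank)
{
	struct cp_entry *e = malloc(sizeof(*e));
	if (e == NULL)
		abort();
	e->cp_id = cp_id;
	e->cp_inst = cp_inst;
	e->rank = rank;
	e->ptr = malloc(16);     /* dummy payload */
	if (e->ptr == NULL)
		abort();
	e->next = head;
	return e;
}

/* Remove every entry of `rank` whose instance is strictly lower than
 * `cp_inst`. Entries of the validated instance, of future instances and
 * of other ranks are kept. Returns the number of entries removed. */
static int cp_delete_old(struct cp_entry **head, int cp_inst, int rank)
{
	int done = 0;
	struct cp_entry **link = head;

	assert(head != NULL);
	while (*link != NULL)
	{
		struct cp_entry *cur = *link;
		if (cur->cp_inst < cp_inst && cur->rank == rank)
		{
			*link = cur->next;   /* unlink first, so iteration never touches freed memory */
			free(cur->ptr);      /* release the payload */
			free(cur);           /* release the node itself */
			done++;
		}
		else
		{
			link = &cur->next;
		}
	}
	return done;
}

/* Count the entries that remain in the list. */
static int cp_count(const struct cp_entry *head)
{
	int n = 0;
	for (; head != NULL; head = head->next)
		n++;
	return n;
}
```

With entries for instances 1, 2 and 3 of rank 0 plus one entry for rank 1, calling `cp_delete_old(&list, 2, 0)` removes only the instance 1 entry of rank 0; the instance 3 entry, which belongs to a future CP, is kept.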