소스 검색

Check that only the OLD data are deleted, not only the ones that are not the latest valid CP (as we can yet have saved data for future CP).

Romain LION 4 년 전
부모
커밋
f36588fe08
1개의 변경된 파일4개의 추가작업 그리고 3개의 파일을 삭제
  1. 4 3
      mpi/src/mpi_failure_tolerance/starpu_mpi_checkpoint_package.c

+ 4 - 3
mpi/src/mpi_failure_tolerance/starpu_mpi_checkpoint_package.c

@@ -50,9 +50,9 @@ int checkpoint_package_data_del(int cp_id, int cp_inst, int rank)
 	while (checkpoint_data != _starpu_mpi_checkpoint_data_list_end(checkpoint_data_list))
 	{
 		next_checkpoint_data = _starpu_mpi_checkpoint_data_list_next(checkpoint_data);
-		if (!(checkpoint_data->cp_id==cp_id && checkpoint_data->cp_inst==cp_inst)
-//		if ((checkpoint_data->cp_id==cp_id && checkpoint_data->cp_inst==cp_inst)
-			&& checkpoint_data->rank==rank)
+		// I delete all the old data (i.e. the cp inst is strictly lower than the one of the just validated CP) only for
+		// the rank that initiated the CP
+		if (checkpoint_data->cp_inst<cp_inst && checkpoint_data->rank==rank)
 		{
 			if (checkpoint_data->type==STARPU_R)
 			{
@@ -64,6 +64,7 @@ int checkpoint_package_data_del(int cp_id, int cp_inst, int rank)
 				free(checkpoint_data->ptr);
 			}
 			_starpu_mpi_checkpoint_data_list_erase(checkpoint_data_list, checkpoint_data);
+			free(checkpoint_data);
 			done++;
 		}
 		checkpoint_data = next_checkpoint_data;