123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214215216217218219220221222223224225226227228229230231232233234235236237238239240241242243244245246247248249250251252253254255256257258259260261262263264265266267268269270271272273274275276277278279280281282283284285286287288289290291292293294295296297298299300301302303304305306307308309310311312313314315316317318319320321322323324325326327328329330331332333334335336337338339340341342343344345346347348349350351352353354355356357358359360361362363364365366367368369370371372373374375376377378379380381382383384385386387388389390391392393394395396397398399400401402403404405406407408409410411412413414415416417418419420421422423424425426427428429430431432433434435436437438439440441442443444445446447448449450451452453454455456457458459460461462463464465466467468469470471472473474475476477478479480481482483484485486487488489490491492493494495496497498499500501502503504505506507508509510511512513514515516517518519520521522523524525526527528529530531532533534535536537538539540541542543544545546547548549550551552553554555556557558559560561562563564565566567568569570571572573574575576577578579580581582583584585586587588589590591592593594595596597598599600601602603604605606607608609610611612613614615616617618619620621622623624625626627628629630631632633634635636637638639640641642643644645646647648649650651652653654655656657658659660661662663664665666667668669670671672673674675676677678679680681682683684685686687688689690691692693694695696697698699700701702703704705706707708709710711712713714715716717718719720721722723724725726727728729730731732733734735736737738739740741742743744745746747748749750751752753754755756757758759760761762763764765766767768769770771772773774775776777778779780781782783784785786787788789790791792793794795796797798799800801802803804805806807808809810811812813814815816817818819820821822823824825826827828829830831832833834835836837838839840841842843844845846847848849850851852853854855856857858859860861862863864865866867868869870871872873874875876877878879880881882883884885886887888889890891892893894895896897898899900901902903904905906907908909910911912913914915916917918919920921922923924925926927928929930931932933934935936937938939940941942943944945946947948949950951952953954955956957 |
- # StarPU --- Runtime system for heterogeneous multicore architectures.
- #
- # Copyright (C) 2009-2017 Université de Bordeaux
- # Copyright (C) 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017 CNRS
- # Copyright (C) 2014, 2016, 2017 INRIA
- #
- # StarPU is free software; you can redistribute it and/or modify
- # it under the terms of the GNU Lesser General Public License as published by
- # the Free Software Foundation; either version 2.1 of the License, or (at
- # your option) any later version.
- #
- # StarPU is distributed in the hope that it will be useful, but
- # WITHOUT ANY WARRANTY; without even the implied warranty of
- # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
- #
- # See the GNU Lesser General Public License in COPYING.LGPL for more details.
- StarPU 1.3.0 (svn revision xxxx)
- ==============================================
- New features:
- * New scheduler with heterogeneous priorities
- * Support priorities for data transfers.
- * Add support for Ayudame version 2.x debugging library.
- * Add support for multiple linear regression performance models
- * Add MPI Master-Slave support to use the cores of remote nodes. Use the
- --enable-mpi-master-slave option to activate it.
- * Add STARPU_CUDA_THREAD_PER_DEV environment variable to support driving all
- GPUs from only one thread when almost all kernels are asynchronous.
- * Add starpu_replay tool to replay tasks.rec files with Simgrid.
- * Add experimental support of NUMA nodes. Use STARPU_USE_NUMA to activate it.
- * Add a new set of functions to make Out-of-Core based on HDF5 Library.
- Small features:
- * Scheduling contexts may now be associated a user data pointer at creation
- time, that can later be recalled through starpu_sched_ctx_get_user_data().
- * Add STARPU_SIMGRID_TASK_SUBMIT_COST and STARPU_SIMGRID_FETCHING_INPUT_COST
- to simulate the cost of task submission and data fetching in simgrid mode.
- This provides more accurate simgrid predictions, especially for the
- beginning of the execution and regarding data transfers.
- * STARPU_SIMGRID_SCHED_COST to take into account the time to perform scheduling
- when running in SimGrid mode.
- * New configure option --enable-mpi-pedantic-isend (disabled by
- default) to acquire data in STARPU_RW (instead of STARPU_R) before
- performing MPI_Isend call
- * New function starpu_worker_display_names to display the names of
- all the workers of a specified type.
- * Arbiters now support concurrent read access.
- * Add a field starpu_task::where similar to starpu_codelet::where
- which allows to restrict where to execute a task. Also add
- STARPU_TASK_WHERE to be used when calling starpu_task_insert().
- * Add SubmitOrder trace field.
- * Add workerids and workerids_len task fields.
- * Add priority management to StarPU-MPI.
- * Add STARPU_MAIN_THREAD_CPUID and STARPU_MPI_THREAD_CPUID environment
- variables.
- * Add disk to disk copy functions and support asynchronous full read/write
- in disk backends.
- * New function starpu_mpi_comm_get_attr() which allows to return the
- value of the attribute STARPU_MPI_TAG_UB, i.e the upper bound for
- tag value.
- * New configure option enable-mpi-verbose to manage the display of
- extra MPI debug messages.
- Changes:
- * Vastly improve simgrid simulation time.
- * Switch default scheduler to lws.
- Small changes:
- * Use asynchronous transfers for task data fetches with were not prefetched.
- * Allow to call starpu_sched_ctx_set_policy_data on the main
- scheduler context
- * Fonction starpu_is_initialized() is moved to the public API.
- StarPU 1.2.2 (svn revision xxx)
- ==============================================
- New features:
- * Add starpu_data_acquire_try and starpu_data_acquire_on_node_try.
- * Add NVCC_CC environment variable.
- * Add -no-flops and -no-events options to starpu_fxt_tool to make
- traces lighter
- * Add starpu_cusparse_init/shutdown/get_local_handle for proper CUDA
- overlapping with cusparse.
- * Allow precise debugging by setting STARPU_TASK_BREAK_ON_PUSH,
- STARPU_TASK_BREAK_ON_SCHED, STARPU_TASK_BREAK_ON_POP, and
- STARPU_TASK_BREAK_ON_EXEC environment variables, with the job_id
- of a task. StarPU will raise SIGTRAP when the task is being
- scheduled, pushed, or popped by the scheduler.
- * Add per-node MPI data.
- Small features:
- * New function starpu_worker_get_job_id(struct starpu_task *task)
- which returns the job identifier for a given task
- * Show package/numa topology in starpu_machine_display
- * MPI: Add mpi communications in dag.dot
- * Add STARPU_PERF_MODEL_HOMOGENEOUS_CPU environment variable to
- allow having one perfmodel per CPU core
- * Add starpu_vector_filter_list_long filter.
- * Add starpu_perfmodel_arch_comb_fetch function.
- * Add STARPU_WATCHDOG_DELAY environment variable.
- * Add starpu_mpi_get_data_on_all_nodes_detached function.
- Small changes:
- * Output generated through STARPU_MPI_COMM has been modified to
- allow easier automated checking
- * MPI: Fix reactivity of the beginning of the application, when a
- lot of ready requests have to be processed at the same time, we
- want to poll the pending requests from time to time.
- * MPI: Fix gantt chart for starpu_mpi_irecv: it should use the
- termination time of the request, not the submission time.
- * MPI: Modify output generated through STARPU_MPI_COMM to allow
- easier automated checking
- * MPI: enable more tests in simgrid mode
- * Use assumed-size instead of assumed-shape arrays for native
- fortran API, for better backward compatibility.
- * Fix odd ordering of CPU workers on CPUs due to GPUs stealing some
- cores
- StarPU 1.2.1 (svn revision 20299)
- ==============================================
- New features:
- * Add starpu_fxt_trace_user_event_string.
- * Add starpu_tasks_rec_complete tool to add estimation times in tasks.rec
- files.
- * Add STARPU_FXT_TRACE environment variable.
- * Add starpu_data_set_user_data and starpu_data_get_user_data.
- * Add STARPU_MPI_FAKE_SIZE and STARPU_MPI_FAKE_RANK to allow simulating
- execution of just one MPI node.
- * Add STARPU_PERF_MODEL_HOMOGENEOUS_CUDA/OPENCL/MIC/SCC to share performance
- models between devices, making calibration much faster.
- * Add modular-heft-prio scheduler.
- * Add starpu_cublas_get_local_handle helper.
- * Add starpu_data_set_name, starpu_data_set_coordinates_array, and
- starpu_data_set_coordinates to describe data, and starpu_iteration_push and
- starpu_iteration_pop to describe tasks, for better offline traces analysis.
- * New function starpu_bus_print_filenames() to display filenames
- storing bandwidth/affinity/latency information, available through
- tools/starpu_machine_display -i
- * Add support for Ayudame version 2.x debugging library.
- * Add starpu_sched_ctx_get_workers_list_raw, much less costly than
- starpu_sched_ctx_get_workers_list
- * Add starpu_task_get_name and use it to warn about dmda etc. using
- a dumb policy when calibration is not finished
- * MPI: Add functions to test for cached values
- Changes:
- * Fix performance regression of lws for small tasks.
- * Improve native Fortran support for StarPU
- Small changes:
- * Fix type of data home node to allow users to pass -1 to define
- temporary data
- * Fix compatibility with simgrid 3.14
- StarPU 1.2.0 (svn revision 18521)
- ==============================================
- New features:
- * MIC Xeon Phi support
- * SCC support
- * New function starpu_sched_ctx_exec_parallel_code to execute a
- parallel code on the workers of the given scheduler context
- * MPI:
- - New internal communication system : a unique tag called
- is now used for all communications, and a system
- of hashmaps on each node which stores pending receives has been
- implemented. Every message is now coupled with an envelope, sent
- before the corresponding data, which allows the receiver to
- allocate data correctly, and to submit the matching receive of
- the envelope.
- - New function
- starpu_mpi_irecv_detached_sequential_consistency which
- allows to enable or disable the sequential consistency for
- the given data handle (sequential consistency will be
- enabled or disabled based on the value of the function
- parameter and the value of the sequential consistency
- defined for the given data)
- - New functions starpu_mpi_task_build() and
- starpu_mpi_task_post_build()
- - New flag STARPU_NODE_SELECTION_POLICY to specify a policy for
- selecting a node to execute the codelet when several nodes
- own data in W mode.
- - New selection node policies can be un/registered with the
- functions starpu_mpi_node_selection_register_policy() and
- starpu_mpi_node_selection_unregister_policy()
- - New environment variable STARPU_MPI_COMM which enables
- basic tracing of communications.
- - New function starpu_mpi_init_comm() which allows to specify
- a MPI communicator.
- * New STARPU_COMMUTE flag which can be passed along STARPU_W or STARPU_RW to
- let starpu commute write accesses.
- * Out-of-core support, through registration of disk areas as additional memory
- nodes. It can be enabled programmatically or through the STARPU_DISK_SWAP*
- environment variables.
- * Reclaiming is now periodically done before memory becomes full. This can
- be controlled through the STARPU_*_AVAILABLE_MEM environment variables.
- * New hierarchical schedulers which allow the user to easily build
- its own scheduler, by coding itself each "box" it wants, or by
- combining existing boxes in StarPU to build it. Hierarchical
- schedulers have very interesting scalability properties.
- * Add STARPU_CUDA_ASYNC and STARPU_OPENCL_ASYNC flags to allow asynchronous
- CUDA and OpenCL kernel execution.
- * Add STARPU_CUDA_PIPELINE and STARPU_OPENCL_PIPELINE to specify how
- many asynchronous tasks are submitted in advance on CUDA and
- OpenCL devices. Setting the value to 0 forces a synchronous
- execution of all tasks.
- * Add CUDA concurrent kernel execution support through
- the STARPU_NWORKER_PER_CUDA environment variable.
- * Add CUDA and OpenCL kernel submission pipelining, to overlap costs and allow
- concurrent kernel execution on Fermi cards.
- * New locality work stealing scheduler (lws).
- * Add STARPU_VARIABLE_NBUFFERS to be set in cl.nbuffers, and nbuffers and
- modes field to the task structure, which permit to define codelets taking a
- variable number of data.
- * Add support for implementing OpenMP runtimes on top of StarPU
- * New performance model format to better represent parallel tasks.
- Used to provide estimations for the execution times of the
- parallel tasks on scheduling contexts or combined workers.
- * starpu_data_idle_prefetch_on_node and
- starpu_idle_prefetch_task_input_on_node allow to queue prefetches to be done
- only when the bus is idle.
- * Make starpu_data_prefetch_on_node not forcibly flush data out, introduce
- starpu_data_fetch_on_node for that.
- * Add data access arbiters, to improve parallelism of concurrent data
- accesses, notably with STARPU_COMMUTE.
- * Anticipative writeback, to flush dirty data asynchronously before the
- GPU device is full. Disabled by default. Use STARPU_MINIMUM_CLEAN_BUFFERS
- and STARPU_TARGET_CLEAN_BUFFERS to enable it.
- * Add starpu_data_wont_use to advise that a piece of data will not be used
- in the close future.
- * Enable anticipative writeback by default.
- * New scheduler 'dmdasd' that considers priority when deciding on
- which worker to schedule
- * Add the capability to define specific MPI datatypes for
- StarPU user-defined interfaces.
- * Add tasks.rec trace output to make scheduling analysis easier.
- * Add Fortran 90 module and example using it
- * New StarPU-MPI gdb debug functions
- * Generate animated html trace of modular schedulers.
- * Add asynchronous partition planning. It only supports coherency through
- the home node of data for now.
- * Add STARPU_MALLOC_SIMULATION_FOLDED flag to save memory when simulating.
- * Include application threads in the trace.
- * Add starpu_task_get_task_scheduled_succs to get successors of a task.
- * Add graph inspection facility for schedulers.
- * New STARPU_LOCALITY flag to mark data which should be taken into account
- by schedulers for improving locality.
- * Experimental support for data locality in ws and lws.
- * Add a preliminary framework for native Fortran support for StarPU
- Small features:
- * Tasks can now have a name (via the field const char *name of
- struct starpu_task)
- * New functions starpu_data_acquire_cb_sequential_consistency() and
- starpu_data_acquire_on_node_cb_sequential_consistency() which allows
- to enable or disable sequential consistency
- * New configure option --enable-fxt-lock which enables additional
- trace events focused on locks behaviour during the execution
- * Functions starpu_insert_task and starpu_mpi_insert_task are
- renamed in starpu_task_insert and starpu_mpi_task_insert. Old
- names are kept to avoid breaking old codes.
- * New configure option --enable-calibration-heuristic which allows
- the user to set the maximum authorized deviation of the
- history-based calibrator.
- * Allow application to provide the task footprint itself.
- * New function starpu_sched_ctx_display_workers() to display worker
- information belonging to a given scheduler context
- * The option --enable-verbose can be called with
- --enable-verbose=extra to increase the verbosity
- * Add codelet size, footprint and tag id in the paje trace.
- * Add STARPU_TAG_ONLY, to specify a tag for traces without making StarPU
- manage the tag.
- * On Linux x86, spinlocks now block after a hundred tries. This avoids
- typical 10ms pauses when the application thread tries to submit tasks.
- * New function char *starpu_worker_get_type_as_string(enum starpu_worker_archtype type)
- * Improve static scheduling by adding support for specifying the task
- execution order.
- * Add starpu_worker_can_execute_task_impl and
- starpu_worker_can_execute_task_first_impl to optimize getting the
- working implementations
- * Add STARPU_MALLOC_NORECLAIM flag to allocate without running a reclaim if
- the node is out of memory.
- * New flag STARPU_DATA_MODE_ARRAY for the function family
- starpu_task_insert to allow to define a array of data handles
- along with their access modes.
- * New configure option --enable-new-check to enable new testcases
- which are known to fail
- * Add starpu_memory_allocate and _deallocate to let the application declare
- its own allocation to the reclaiming engine.
- * Add STARPU_SIMGRID_CUDA_MALLOC_COST and STARPU_SIMGRID_CUDA_QUEUE_COST to
- disable CUDA costs simulation in simgrid mode.
- * Add starpu_task_get_task_succs to get the list of children of a given
- task.
- * Add starpu_malloc_on_node_flags, starpu_free_on_node_flags, and
- starpu_malloc_on_node_set_default_flags to control the allocation flags
- used for allocations done by starpu.
- * Ranges can be provided in STARPU_WORKERS_CPUID
- * Add starpu_fxt_autostart_profiling to be able to avoid autostart.
- * Add arch_cost_function perfmodel function field.
- * Add STARPU_TASK_BREAK_ON_SCHED, STARPU_TASK_BREAK_ON_PUSH, and
- STARPU_TASK_BREAK_ON_POP environment variables to debug schedulers.
- * Add starpu_sched_display tool.
- * Add starpu_memory_pin and starpu_memory_unpin to pin memory allocated
- another way than starpu_malloc.
- * Add STARPU_NOWHERE to create synchronization tasks with data.
- * Document how to switch between differents views of the same data.
- * Add STARPU_NAME to specify a task name from a starpu_task_insert call.
- * Add configure option to disable fortran --disable-fortran
- * Add configure option to give path for smpirun executable --with-smpirun
- * Add configure option to disable the build of tests --disable-build-tests
- * Add starpu-all-tasks debugging support
- * New function
- void starpu_opencl_load_program_source_malloc(const char *source_file_name, char **located_file_name, char **located_dir_name, char **opencl_program_source)
- which allocates the pointers located_file_name, located_dir_name
- and opencl_program_source.
- * Add submit_hook and do_schedule scheduler methods.
- * Add starpu_sleep.
- * Add starpu_task_list_ismember.
- * Add _starpu_fifo_pop_this_task.
- * Add STARPU_MAX_MEMORY_USE environment variable.
- * Add starpu_worker_get_id_check().
- * New function starpu_mpi_wait_for_all(MPI_Comm comm) that allows to
- wait until all StarPU tasks and communications for the given
- communicator are completed.
- * New function starpu_codelet_unpack_args_and_copyleft() which
- allows to copy in a new buffer values which have not been unpacked by
- the current call
- * Add STARPU_CODELET_SIMGRID_EXECUTE flag.
- * Add STARPU_CODELET_SIMGRID_EXECUTE_AND_INJECT flag.
- * Add STARPU_CL_ARGS flag to starpu_task_insert() and
- starpu_mpi_task_insert() functions call
- Changes:
- * Data interfaces (variable, vector, matrix and block) now define
- pack und unpack functions
- * StarPU-MPI: Fix for being able to receive data which have not yet
- been registered by the application (i.e it did not call
- starpu_data_set_tag(), data are received as a raw memory)
- * StarPU-MPI: Fix for being able to receive data with the same tag
- from several nodes (see mpi/tests/gather.c)
- * Remove the long-deprecated cost_model fields and task->buffers field.
- * Fix complexity of implicit task/data dependency, from quadratic to linear.
- Small changes:
- * Rename function starpu_trace_user_event() as
- starpu_fxt_trace_user_event()
- * "power" is renamed into "energy" wherever it applies, notably energy
- consumption performance models
- * Update starpu_task_build() to set starpu_task::cl_arg_free to 1 if
- some arguments of type ::STARPU_VALUE are given.
- * Simplify performance model loading API
- * Better semantic for environment variables STARPU_NMIC and
- STARPU_NMICDEVS, the number of devices and the number of cores.
- STARPU_NMIC will be the number of devices, and STARPU_NMICCORES
- will be the number of cores per device.
- StarPU 1.1.5 (svn revision xxx)
- ==============================================
- The scheduling context release
- * Add starpu_memory_pin and starpu_memory_unpin to pin memory allocated
- another way than starpu_malloc.
- * Add starpu_task_wait_for_n_submitted() and
- STARPU_LIMIT_MAX_NSUBMITTED_TASKS/STARPU_LIMIT_MIN_NSUBMITTED_TASKS to
- easily control the number of submitted tasks by making task submission
- block.
- StarPU 1.1.4 (svn revision 14856)
- ==============================================
- The scheduling context release
- New features:
- * Fix and actually enable the cache allocation.
- * Enable allocation cache in main RAM when STARPU_LIMIT_CPU_MEM is set by
- the user.
- * New MPI functions starpu_mpi_issend and starpu_mpi_issend_detached
- to send data using a synchronous and non-blocking mode (internally
- uses MPI_Issend)
- * New data access mode flag STARPU_SSEND to be set when calling
- starpu_mpi_insert_task to specify the data has to be sent using a
- synchronous and non-blocking mode
- * New environment variable STARPU_PERF_MODEL_DIR which can be set to
- specify a directory where to store performance model files in.
- When unset, the files are stored in $STARPU_HOME/.starpu/sampling
- * MPI:
- - New function starpu_mpi_data_register_comm to register a data
- with another communicator than MPI_COMM_WORLD
- - New functions starpu_mpi_data_set_rank() and starpu_mpi_data_set_tag()
- which call starpu_mpi_data_register_comm()
- Small features:
- * Add starpu_memory_wait_available() to wait for a given size to become
- available on a given node.
- * New environment variable STARPU_RAND_SEED to set the seed used for random
- numbers.
- * New function starpu_mpi_cache_set() to enable or disable the
- communication cache at runtime
- * Add starpu_paje_sort which sorts Pajé traces.
- Changes:
- * Fix complexity of implicit task/data dependency, from quadratic to linear.
- StarPU 1.1.3 (svn revision 13450)
- ==============================================
- The scheduling context release
- New features:
- * One can register an existing on-GPU buffer to be used by a handle.
- * Add the starpu_paje_summary statistics tool.
- * Enable gpu-gpu transfers for matrices.
- * Let interfaces declare which transfers they allow with the can_copy
- methode.
- Small changes:
- * Lock performance model files while writing and reading them to avoid
- issues on parallel launches, MPI runs notably.
- * Lots of build fixes for icc on Windows.
- StarPU 1.1.2 (svn revision 13011)
- ==============================================
- The scheduling context release
- New features:
- * The reduction init codelet is automatically used to initialize temporary
- buffers.
- * Traces now include a "scheduling" state, to show the overhead of the
- scheduler.
- * Add STARPU_CALIBRATE_MINIMUM environment variable to specify the minimum
- number of calibration measurements.
- * Add STARPU_TRACE_BUFFER_SIZE environment variable to specify the size of
- the trace buffer.
- StarPU 1.1.1 (svn revision 12638)
- ==============================================
- The scheduling context release
- New features:
- * MPI:
- - New variable STARPU_MPI_CACHE_STATS to print statistics on
- cache holding received data.
- - New function starpu_mpi_data_register() which sets the rank
- and tag of a data, and also allows to automatically clear
- the MPI communication cache when unregistering the data. It
- should be called instead of both calling
- starpu_data_set_tag() and starpu_data_set_rank()
- * Use streams for all CUDA transfers, even initiated by CPUs.
- * Add paje traces statistics tools.
- * Use streams for GPUA->GPUB and GPUB->GPUA transfers.
- Small features:
- * New STARPU_EXECUTE_ON_WORKER flag to specify the worker on which
- to execute the task.
- * New STARPU_DISABLE_PINNING environment variable to disable host memory
- pinning.
- * New STARPU_DISABLE_KERNELS environment variable to disable actual kernel
- execution.
- * New starpu_memory_get_total function to get the size of a memory node.
- * New starpu_parallel_task_barrier_init_n function to let a scheduler decide
- a set of workers without going through combined workers.
- Changes:
- * Fix simgrid execution.
- * Rename starpu_get_nready_tasks_of_sched_ctx to starpu_sched_ctx_get_nready_tasks
- * Rename starpu_get_nready_flops_of_sched_ctx to starpu_sched_ctx_get_nready_flops
- * New functions starpu_pause() and starpu_resume()
- * New codelet specific_nodes field to specify explicit target nodes for data.
- * StarPU-MPI: Fix overzealous allocation of memory.
- * Interfaces: Allow interface implementation to change pointers at will, in
- unpack notably.
- Small changes:
- * Use big fat abortions when one tries to make a task or callback
- sleep, instead of just returning EDEADLCK which few people will test
- * By default, StarPU FFT examples are not compiled and checked, the
- configure option --enable-starpufft-examples needs to be specified
- to change this behaviour.
- StarPU 1.1.0 (svn revision 11960)
- ==============================================
- The scheduling context release
- New features:
- * OpenGL interoperability support.
- * Capability to store compiled OpenCL kernels on the file system
- * Capability to load compiled OpenCL kernels
- * Performance models measurements can now be provided explicitly by
- applications.
- * Capability to emit communication statistics when running MPI code
- * Add starpu_data_unregister_submit, starpu_data_acquire_on_node and
- starpu_data_invalidate_submit
- * New functionnality to wrapper starpu_insert_task to pass a array of
- data_handles via the parameter STARPU_DATA_ARRAY
- * Enable GPU-GPU direct transfers.
- * GCC plug-in
- - Add `registered' attribute
- - A new pass was added that warns about the use of possibly
- unregistered memory buffers.
- * SOCL
- - Manual mapping of commands on specific devices is now
- possible
- - SOCL does not require StarPU CPU tasks anymore. CPU workers
- are automatically disabled to enhance performance of OpenCL
- CPU devices
- * New interface: COO matrix.
- * Data interfaces: The pack operation of user-defined data interface
- defines a new parameter count which should be set to the size of
- the buffer created by the packing of the data.
- * MPI:
- - Communication statistics for MPI can only be enabled at
- execution time by defining the environment variable
- STARPU_COMM_STATS
- - Communication cache mechanism is enabled by default, and can
- only be disabled at execution time by setting the
- environment variable STARPU_MPI_CACHE to 0.
- - Initialisation functions starpu_mpi_initialize_extended()
- and starpu_mpi_initialize() have been made deprecated. One
- should now use starpu_mpi_init(int *, char ***, int). The
- last parameter indicates if MPI should be initialised.
- - Collective detached operations have new parameters, a
- callback function and a argument. This is to be consistent
- with the detached point-to-point communications.
- - When exchanging user-defined data interfaces, the size of
- the data is the size returned by the pack operation, i.e
- data with dynamic size can now be exchanged with StarPU-MPI.
- * Add experimental simgrid support, to simulate execution with various
- number of CPUs, GPUs, amount of memory, etc.
- * Add support for OpenCL simulators (which provide simulated execution time)
- * Add support for Temanejo, a task graph debugger
- * Theoretical bound lp output now includes data transfer time.
- * Update OpenCL driver to only enable CPU devices (the environment
- variable STARPU_OPENCL_ONLY_ON_CPUS must be set to a positive
- value when executing an application)
- * Add Scheduling contexts to separate computation resources
- - Scheduling policies take into account the set of resources corresponding
- to the context it belongs to
- - Add support to dynamically change scheduling contexts
- (Create and Delete a context, Add Workers to a context, Remove workers from a context)
- - Add support to indicate to which contexts the tasks are submitted
- * Add the Hypervisor to manage the Scheduling Contexts automatically
- - The Contexts can be registered to the Hypervisor
- - Only the registered contexts are managed by the Hypervisor
- - The Hypervisor can detect the initial distribution of resources of
- a context and constructs it consequently (the cost of execution is required)
- - Several policies can adapt dynamically the distribution of resources
- in contexts if the initial one was not appropriate
- - Add a platform to implement new policies of redistribution
- of resources
- * Implement a memory manager which checks the global amount of
- memory available on devices, and checks there is enough memory
- before doing an allocation on the device.
- * Discard environment variable STARPU_LIMIT_GPU_MEM and define
- instead STARPU_LIMIT_CUDA_MEM and STARPU_LIMIT_OPENCL_MEM
- * Introduce new variables STARPU_LIMIT_CUDA_devid_MEM and
- STARPU_LIMIT_OPENCL_devid_MEM to limit memory per specific device
- * Introduce new variable STARPU_LIMIT_CPU_MEM to limit memory for
- the CPU devices
- * New function starpu_malloc_flags to define a memory allocation with
- constraints based on the following values:
- - STARPU_MALLOC_PINNED specifies memory should be pinned
- - STARPU_MALLOC_COUNT specifies the memory allocation should be in
- the limits defined by the environment variables STARPU_LIMIT_xxx
- (see above). When no memory is left, starpu_malloc_flag tries
- to reclaim memory from StarPU and returns -ENOMEM on failure.
- * starpu_malloc calls starpu_malloc_flags with a value of flag set
- to STARPU_MALLOC_PINNED
- * Define new function starpu_free_flags similarly to starpu_malloc_flags
- * Define new public API starpu_pthread which is similar to the
- pthread API. It is provided with 2 implementations: a pthread one
- and a Simgrid one. Applications using StarPU and wishing to use
- the Simgrid StarPU features should use it.
- * Allow to have a dynamically allocated number of buffers per task,
- and so overwrite the value defined --enable-maxbuffers=XXX
- * Performance models files are now stored in a directory whose name
- include the version of the performance model format. The version
- number is also written in the file itself.
- When updating the format, the internal variable
- _STARPU_PERFMODEL_VERSION should be updated. It is then possible
- to switch easily between differents versions of StarPU having
- different performance model formats.
- * Tasks can now define a optional prologue callback which is executed
- on the host when the task becomes ready for execution, before getting
- scheduled.
- * Small CUDA allocations (<= 4MiB) are now batched to avoid the huge
- cudaMalloc overhead.
- * Prefetching is now done for all schedulers when it can be done whatever
- the scheduling decision.
- * Add a watchdog which permits to easily trigger a crash when StarPU gets
- stuck.
- * Document how to migrate data over MPI.
- * New function starpu_wakeup_worker() to be used by schedulers to
- wake up a single worker (instead of all workers) when submitting a
- single task.
- * The functions starpu_sched_set/get_min/max_priority set/get the
- priorities of the current scheduling context, i.e the one which
- was set by a call to starpu_sched_ctx_set_context() or the initial
- context if the function has not been called yet.
- * Fix for properly dealing with NAN on windows systems
- Small features:
- * Add starpu_worker_get_by_type and starpu_worker_get_by_devid
- * Add starpu_fxt_stop_profiling/starpu_fxt_start_profiling which permits to
- pause trace recording.
- * Add trace_buffer_size configuration field to permit to specify the tracing
- buffer size.
- * Add starpu_codelet_profile and starpu_codelet_histo_profile, tools which draw
- the profile of a codelet.
- * File STARPU-REVISION --- containing the SVN revision number from which
- StarPU was compiled --- is installed in the share/doc/starpu directory
- * starpu_perfmodel_plot can now directly draw GFlops curves.
- * New configure option --enable-mpi-progression-hook to enable the
- activity polling method for StarPU-MPI.
- * Permit to disable sequential consistency for a given task.
- * New macro STARPU_RELEASE_VERSION
- * New function starpu_get_version() to return as 3 integers the
- release version of StarPU.
- * Enable by default data allocation cache
- * New function starpu_perfmodel_directory() to print directory
- storing performance models. Available through the new option -d of
- the tool starpu_perfmodel_display
- * New batch files to execute StarPU applications under Microsoft
- Visual Studio (They are installed in path_to_starpu/bin/msvc)/
- * Add cl_arg_free, callback_arg_free, prologue_callback_arg_free fields to
- enable automatic free(cl_arg); free(callback_arg);
- free(prologue_callback_arg) on task destroy.
- * New function starpu_task_build
- * New configure options --with-simgrid-dir
- --with-simgrid-include-dir and --with-simgrid-lib-dir to specify
- the location of the SimGrid library
- Changes:
- * Rename all filter functions to follow the pattern
- starpu_DATATYPE_filter_FILTERTYPE. The script
- tools/dev/rename_filter.sh is provided to update your existing
- applications to use new filters function names.
- * Renaming of diverse functions and datatypes. The script
- tools/dev/rename.sh is provided to update your existing
- applications to use the new names. It is also possible to compile
- with the pkg-config package starpu-1.0 to keep using the old
- names. It is however recommended to update your code and to use
- the package starpu-1.1.
- * Fix the block filter functions.
- * Fix StarPU-MPI on Darwin.
- * The FxT code can now be used on systems other than Linux.
- * Keep only one hashtable implementation common/uthash.h
- * The cache of starpu_mpi_insert_task is fixed and thus now enabled by
- default.
- * Improve starpu_machine_display output.
- * Standardize objects name in the performance model API
- * SOCL
- - Virtual SOCL device has been removed
- - Automatic scheduling still available with command queues not
- assigned to any device
- - Remove modified OpenCL headers. ICD is now the only supported
- way to use SOCL.
- - SOCL test suite is only run when environment variable
- SOCL_OCL_LIB_OPENCL is defined. It should contain the location
- of the libOpenCL.so file of the OCL ICD implementation.
- * Fix main memory leak on multiple unregister/re-register.
- * Improve hwloc detection by configure
- * Cell:
- - It is no longer possible to enable the cell support via the
- gordon driver
- - Data interfaces no longer define functions to copy to and from
- SPU devices
- - Codelet no longer define pointer for Gordon implementations
- - Gordon workers are no longer enabled
- - Gordon performance models are no longer enabled
- * Fix data transfer arrows in paje traces
- * The "heft" scheduler no longer exists. Users should now pick "dmda"
- instead.
- * StarPU can now use poti to generate paje traces.
- * Rename scheduling policy "parallel greedy" to "parallel eager"
- * starpu_scheduler.h is no longer automatically included by
- starpu.h, it has to be manually included when needed
- * New batch files to run StarPU applications with Microsoft Visual C
- * Add examples/release/Makefile to test StarPU examples against an
- installed version of StarPU. That can also be used to test
- examples using a previous API.
- * Tutorial is installed in ${docdir}/tutorial
- * Schedulers eager_central_policy, dm and dmda no longer erroneously respect
- priorities. dmdas has to be used to respect priorities.
- * StarPU-MPI: Fix potential bug for user-defined datatypes. As MPI
- can reorder messages, we need to make sure the sending of the size
- of the data has been completed.
- * Documentation is now generated through doxygen.
- * Modification of perfmodels output format for future improvements.
- * Fix for properly dealing with NAN on windows systems
- * Function starpu_sched_ctx_create() now takes a variable argument
- list to define the scheduler to be used, and the minimum and
- maximum priority values
- * The functions starpu_sched_set/get_min/max_priority set/get the
- priorities of the current scheduling context, i.e the one which
- was set by a call to starpu_sched_ctx_set_context() or the initial
- context if the function was not called yet.
- * MPI: Fix of the livelock issue discovered while executing applications
- on a CPU+GPU cluster of machines by adding a maximum trylock
- threshold before a blocking lock.
- Small changes:
- * STARPU_NCPU should now be used instead of STARPU_NCPUS. STARPU_NCPUS is
- still available for compatibility reasons.
- * include/starpu.h includes all include/starpu_*.h files, applications
- therefore only need to have #include <starpu.h>
- * Active task wait is now included in blocked time.
- * Fix GCC plugin linking issues starting with GCC 4.7.
- * Fix forcing calibration of never-calibrated archs.
- * CUDA applications are no longer compiled with the "-arch sm_13"
- option. It is specifically added to applications which need it.
- * Explicitly name the non-sleeping-non-running time "Overhead", and use
- another color in vite traces.
- * Use C99 variadic macro support, not GNU.
- * Fix performance regression: dmda queues were inadvertently made
- LIFOs in r9611.
- StarPU 1.0.3 (svn revision 7379)
- ==============================================
- Changes:
- * Several bug fixes in the build system
- * Bug fixes in source code for non-Linux systems
- * Fix generating FXT traces bigger than 64MiB.
- * Improve ENODEV error detections in StarPU FFT
- StarPU 1.0.2 (svn revision 7210)
- ==============================================
- Changes:
- * Add starpu_block_shadow_filter_func_vector and an example.
- * Add tag dependency in trace-generated DAG.
- * Fix CPU binding for optimized CPU-GPU transfers.
- * Fix parallel tasks CPU binding and combined worker generation.
- * Fix generating FXT traces bigger than 64MiB.
- StarPU 1.0.1 (svn revision 6659)
- ==============================================
- Changes:
- * hwloc support. Warn users when hwloc is not found on the system and
- produce error when not explicitely disabled.
- * Several bug fixes
- * GCC plug-in
- - Add `#pragma starpu release'
- - Fix bug when using `acquire' pragma with function parameters
- - Slightly improve test suite coverage
- - Relax the GCC version check
- * Update SOCL to use new API
- * Documentation improvement.
- StarPU 1.0.0 (svn revision 6306)
- ==============================================
- The extensions-again release
- New features:
- * Add SOCL, an OpenCL interface on top of StarPU.
- * Add a gcc plugin to extend the C interface with pragmas which allows to
- easily define codelets and issue tasks.
- * Add reduction mode to starpu_mpi_insert_task.
- * A new multi-format interface permits to use different binary formats
- on CPUs & GPUs, the conversion functions being provided by the
- application and called by StarPU as needed (and as less as
- possible).
- * Deprecate cost_model, and introduce cost_function, which is provided
- with the whole task structure, the target arch and implementation
- number.
- * Permit the application to provide its own size base for performance
- models.
- * Applications can provide several implementations of a codelet for the
- same architecture.
- * Add a StarPU-Top feedback and steering interface.
- * Permit to specify MPI tags for more efficient starpu_mpi_insert_task
- Changes:
- * Fix several memory leaks and race conditions
- * Make environment variables take precedence over the configuration
- passed to starpu_init()
- * Libtool interface versioning has been included in libraries names
- (libstarpu-1.0.so, libstarpumpi-1.0.so,
- libstarpufft-1.0.so, libsocl-1.0.so)
- * Install headers under $includedir/starpu/1.0.
- * Make where field for struct starpu_codelet optional. When unset, its
- value will be automatically set based on the availability of the
- different XXX_funcs fields of the codelet.
- * Define access modes for data handles into starpu_codelet and no longer
- in starpu_task. Hence mark (struct starpu_task).buffers as
- deprecated, and add (struct starpu_task).handles and (struct
- starpu_codelet).modes
- * Fields xxx_func of struct starpu_codelet are made deprecated. One
- should use fields xxx_funcs instead.
- * Some types were renamed for consistency. when using pkg-config libstarpu,
- starpu_deprecated_api.h is automatically included (after starpu.h) to
- keep compatibility with existing software. Other changes are mentioned
- below, compatibility is also preserved for them.
- To port code to use new names (this is not mandatory), the
- tools/dev/rename.sh script can be used, and pkg-config starpu-1.0 should
- be used.
- * The communication cost in the heft and dmda scheduling strategies now
- take into account the contention brought by the number of GPUs. This
- changes the meaning of the beta factor, whose default 1.0 value should
- now be good enough in most case.
- Small features:
- * Allow users to disable asynchronous data transfers between CPUs and
- GPUs.
- * Update OpenCL driver to enable CPU devices (the environment variable
- STARPU_OPENCL_ON_CPUS must be set to a positive value when
- executing an application)
- * struct starpu_data_interface_ops --- operations on a data
- interface --- define a new function pointer allocate_new_data
- which creates a new data interface of the given type based on
- an existing handle
- * Add a field named magic to struct starpu_task which is set when
- initialising the task. starpu_task_submit will fail if the
- field does not have the right value. This will hence avoid
- submitting tasks which have not been properly initialised.
- * Add a hook function pre_exec_hook in struct starpu_sched_policy.
- The function is meant to be called in drivers. Schedulers
- can use it to be notified when a task is about being computed.
- * Add codelet execution time statistics plot.
- * Add bus speed in starpu_machine_display.
- * Add a STARPU_DATA_ACQUIRE_CB which permits to inline the code to be
- done.
- * Add gdb functions.
- * Add complex support to LU example.
- * Permit to use the same data several times in write mode in the
- parameters of the same task.
- Small changes:
- * Increase default value for STARPU_MAXCPUS -- Maximum number of
- CPUs supported -- to 64.
- * Add man pages for some of the tools
- * Add C++ application example in examples/cpp/
- * Add an OpenMP fork-join example.
- * Documentation improvement.
- StarPU 0.9 (svn revision 3721)
- ==============================================
- The extensions release
- * Provide the STARPU_REDUX data access mode
- * Externalize the scheduler API.
- * Add theoretical bound computation
- * Add the void interface
- * Add power consumption optimization
- * Add parallel task support
- * Add starpu_mpi_insert_task
- * Add profiling information interface.
- * Add STARPU_LIMIT_GPU_MEM environment variable.
- * OpenCL fixes
- * MPI fixes
- * Improve optimization documentation
- * Upgrade to hwloc 1.1 interface
- * Add fortran example
- * Add mandelbrot OpenCL example
- * Add cg example
- * Add stencil MPI example
- * Initial support for CUDA4
- StarPU 0.4 (svn revision 2535)
- ==============================================
- The API strengthening release
- * Major API improvements
- - Provide the STARPU_SCRATCH data access mode
- - Rework data filter interface
- - Rework data interface structure
- - A script that automatically renames old functions to accomodate with the new
- API is available from https://scm.gforge.inria.fr/svn/starpu/scripts/renaming
- (login: anonsvn, password: anonsvn)
- * Implement dependencies between task directly (eg. without tags)
- * Implicit data-driven task dependencies simplifies the design of
- data-parallel algorithms
- * Add dynamic profiling capabilities
- - Provide per-task feedback
- - Provide per-worker feedback
- - Provide feedback about memory transfers
- * Provide a library to help accelerating MPI applications
- * Improve data transfers overhead prediction
- - Transparently benchmark buses to generate performance models
- - Bind accelerator-controlling threads with respect to NUMA locality
- * Improve StarPU's portability
- - Add OpenCL support
- - Add support for Windows
- StarPU 0.2.901 aka 0.3-rc1 (svn revision 1236)
- ==============================================
- The asynchronous heterogeneous multi-accelerator release
- * Many API changes and code cleanups
- - Implement starpu_worker_get_id
- - Implement starpu_worker_get_name
- - Implement starpu_worker_get_type
- - Implement starpu_worker_get_count
- - Implement starpu_display_codelet_stats
- - Implement starpu_data_prefetch_on_node
- - Expose the starpu_data_set_wt_mask function
- * Support nvidia (heterogeneous) multi-GPU
- * Add the data request mechanism
- - All data transfers use data requests now
- - Implement asynchronous data transfers
- - Implement prefetch mechanism
- - Chain data requests to support GPU->RAM->GPU transfers
- * Make it possible to bypass the scheduler and to assign a task to a specific
- worker
- * Support restartable tasks to reinstanciate dependencies task graphs
- * Improve performance prediction
- - Model data transfer overhead
- - One model is created for each accelerator
- * Support for CUDA's driver API is deprecated
- * The STARPU_WORKERS_CUDAID and STARPU_WORKERS_CPUID env. variables make it possible to
- specify where to bind the workers
- * Use the hwloc library to detect the actual number of cores
- StarPU 0.2.0 (svn revision 1013)
- ==============================================
- The Stabilizing-the-Basics release
- * Various API cleanups
- * Mac OS X is supported now
- * Add dynamic code loading facilities onto Cell's SPUs
- * Improve performance analysis/feedback tools
- * Application can interact with StarPU tasks
- - The application may access/modify data managed by the DSM
- - The application may wait for the termination of a (set of) task(s)
- * An initial documentation is added
- * More examples are supplied
- StarPU 0.1.0 (svn revision 794)
- ==============================================
- First release.
- Status:
- * Only supports Linux platforms yet
- * Supported architectures
- - multicore CPUs
- - NVIDIA GPUs (with CUDA 2.x)
- - experimental Cell/BE support
- Changes:
- * Scheduling facilities
- - run-time selection of the scheduling policy
- - basic auto-tuning facilities
- * Software-based DSM
- - transparent data coherency management
- - High-level expressive interface
- # Local Variables:
- # mode: text
- # coding: utf-8
- # ispell-local-dictionary: "american"
- # End:
|