- # StarPU --- Runtime system for heterogeneous multicore architectures.
- #
- # Copyright (C) 2009-2020 Université de Bordeaux, CNRS (LaBRI UMR 5800), Inria
- #
- # StarPU is free software; you can redistribute it and/or modify
- # it under the terms of the GNU Lesser General Public License as published by
- # the Free Software Foundation; either version 2.1 of the License, or (at
- # your option) any later version.
- #
- # StarPU is distributed in the hope that it will be useful, but
- # WITHOUT ANY WARRANTY; without even the implied warranty of
- # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
- #
- # See the GNU Lesser General Public License in COPYING.LGPL for more details.
- #
- StarPU 1.4.0 (git revision xxxx)
- ==============================================
- New features:
- * Fault tolerance support with starpu_task_ft_failed().
- * Julia programming interface.
- * Add get_max_size method to data interfaces for applications using data with
- variable size to express their maximal potential size.
- * New offline tool to draw graph showing elapsed time between sent
- or received data and their use by tasks
- * Add 4D tensor data interface.
- * New sched_tasks.rec trace file which monitors task scheduling push/pop actions
- * New STARPU_MPI_MEM_THROTTLE environment variable to throttle mpi
- submission according to memory use.
- * New number_events.data trace file which monitors number of events in trace
- files. This file can be parsed by the new script
- starpu_fxt_number_events_to_names.py to convert event keys to event names.
- * New STARPU_PER_WORKER perfmodel.
- * Add energy accounting in the simgrid mode: starpu_energy_use() and
- starpu_energy_used().
- * New function starpu_mpi_get_thread_cpuid() to know where the MPI thread is
- bound.
- * New function starpu_get_pu_os_index() to convert logical index of a PU to
- its OS index.
- * New function starpu_get_hwloc_topology() to get the hwloc topology used by
- StarPU.
- * Add a task prefetch level, to improve retaining data in accelerators so we
- can make prefetch more aggressive.
- * Add starpu_data_dup_ro().
- * Add starpu_data_release_to() and starpu_data_release_to_on_node().
- Small changes:
- * Add a synthetic energy efficiency testcase.
- StarPU 1.3.5 (git revision xxx)
- ====================================================================
- Small features:
- * New environment variable STARPU_FXT_SUFFIX to set the filename in
- which to save the fxt trace
- * New option -d for starpu_fxt_tool to specify in which directory to
- generate files
- Small changes:
- * Move MPI cache functions into the public API
- * Add STARPU_MPI_NOBIND environment variable.
- StarPU 1.3.4 (git revision c37a5d024cd997596da41f765557c58099baf896)
- ====================================================================
- Small features:
- * New environment variables STARPU_BUS_STATS_FILE and
- STARPU_WORKER_STATS_FILE to specify files in which to display
- statistics about data transfers and workers.
- * Add starpu_bcsr_filter_vertical_block filtering function.
- * Add starpu_interface_copy2d, 3d, and 4d to easily request data copies from
- data interfaces.
- * Move optimized cuda 2d copy from interfaces to new
- starpu_cuda_copy2d_async_sync and starpu_cuda_copy3d_async_sync, and use
- them from starpu_interface_copy2d and 3d.
- * New function starpu_task_watchdog_set_hook to specify a function
- to be called when the watchdog is raised
- * Add STARPU_LIMIT_CPU_NUMA_MEM environment variable.
- * Add STARPU_WORKERS_GETBIND environment variable.
- * Add STARPU_SCHED_SIMPLE_DECIDE_ALWAYS modular scheduler flag.
- * Add STARPU_LIMIT_BANDWIDTH environment variable.
- * Add field starpu_conf::precedence_over_environment_variables to ignore
- environment variables when parameters are set directly in starpu_conf
- * Add starpu_data_get_coordinates_array
- * MPI: new functions starpu_mpi_interface_datatype_register() and
- starpu_mpi_interface_datatype_unregister() which take an enum
- starpu_data_interface_id instead of a starpu_data_handle_t
- * New script starpu_env to set up StarPU environment variables
- * New STARPU_BACKOFF_MIN and STARPU_BACKOFF_MAX environment variables to set
- the exponential backoff limits of the number of cycles to pause while drivers
- are spinning.
- * Add STARPU_DISPLAY_BINDINGS environment variable and
- starpu_display_bindings() function to display all bindings on the machine by
- calling hwloc-ps
- Small changes:
- * New configure option --disable-build-doc-pdf
- StarPU 1.3.3 (git revision 11afc5b007fe1ab1c729b55b47a5a98ef7f3cfad)
- ====================================================================
- New features:
- * New semantic for starpu_task_insert() and alike parameters
- STARPU_CALLBACK_ARG, STARPU_PROLOGUE_CALLBACK_ARG, and
- STARPU_PROLOGUE_CALLBACK_POP_ARG which set respectively
- starpu_task::callback_arg_free,
- starpu_task::prologue_callback_arg_free and
- starpu_task::prologue_callback_pop_arg_free to 1 when used.
- New parameters STARPU_CALLBACK_ARG_NFREE,
- STARPU_CALLBACK_WITH_ARG_NFREE, STARPU_PROLOGUE_CALLBACK_ARG_NFREE, and
- STARPU_PROLOGUE_CALLBACK_POP_ARG_NFREE which set the corresponding
- fields of starpu_task to 0.
- * starpufft: Support 3D.
- * New modular-eager-prio scheduler.
- * Add 'ready' heuristic to modular schedulers.
- * New modular-heteroprio scheduler.
- * Add STARPU_TASK_SCHED_DATA
- * Add support for staging schedulers.
- * New modular-heteroprio-heft scheduler.
- * New dmdap "data-aware performance model (priority)" scheduler
- Changes:
- * Modification in the Native Fortran interface of the functions
- fstarpu_mpi_task_insert, fstarpu_mpi_task_build and
- fstarpu_mpi_task_post_build to only take 1 parameter being the MPI
- communicator, the codelet and the various parameters for the task.
- Small features:
- * New starpu_task_insert() and alike parameter STARPU_TASK_WORKERIDS
- allowing to set the fields starpu_task::workerids_len and
- starpu_task::workerids
- * New starpu_task_insert() and alike parameters
- STARPU_SEQUENTIAL_CONSISTENCY, STARPU_TASK_NO_SUBMITORDER and
- STARPU_TASK_PROFILING_INFO
- * New function starpu_create_callback_task() which creates and
- submits an empty task with the specified callback
- * Use the S4U interface of Simgrid instead of xbt and MSG.
- Small changes:
- * Default modular worker queues to 2 tasks unless it's a heft
- scheduler
- * Separate out STATUS_SLEEPING_SCHEDULING state from
- STATUS_SLEEPING state
- When running the scheduler while being idle, workers do not go in
- the STATUS_SCHEDULING state, so that that time is considered as
- idle time instead of overhead.
- StarPU 1.3.2 (git revision af22a20fc00a37addf3cc6506305f89feed940b0)
- ====================================================================
- Small changes:
- * Improve OpenMP support to detect the environment is valid before
- launching OpenMP
- * Delete old code (drivers gordon, scc, starpu-top, and plugin gcc)
- and update authors file accordingly
- * Add Heteroprio documentation (including a simple example)
- * Add a progression hook, to be called when workers are idle, which
- is used in the NewMadeleine implementation of StarPU-MPI to ensure
- communications progress.
- StarPU 1.3.1 (git revision 01949488b4f8e6fe26d2c200293b8aae5876b038)
- ====================================================================
- Small features:
- * Add starpu_filter_nparts_compute_chunk_size_and_offset helper.
- * Add starpu_bcsr_filter_canonical_block_child_ops.
- Small changes:
- * Improve detection of NVML availability. Do not only check the
- library is available, also check the compiled code can be run.
- StarPU 1.3.0 (git revision 24ca83c6dbb102e1cfc41db3bb21c49662067062)
- ====================================================================
- New features:
- * New scheduler 'heteroprio' with heterogeneous priorities
- * Support priorities for data transfers.
- * Add support for multiple linear regression performance models
- - Bump performance model file format version to 45.
- * Add MPI Master-Slave support to use the cores of remote nodes. Use the
- --enable-mpi-master-slave option to activate it.
- * Add STARPU_CUDA_THREAD_PER_DEV environment variable to support driving all
- GPUs from only one thread when almost all kernels are asynchronous.
- * Add starpu_replay tool to replay tasks.rec files with Simgrid.
- * Add experimental support of NUMA nodes. Use STARPU_USE_NUMA to activate it.
- * Add a new set of functions to support Out-of-Core based on the HDF5 library.
- * Add a new implementation of StarPU-MPI on top of NewMadeleine
- * Add optional callbacks to notify an external resource manager
- about workers going to sleep and waking up
- * Add implicit support for asynchronous partition planning. This means one
- does not need to call starpu_data_partition_submit() etc. explicitly any
- more, StarPU will make the appropriate calls as needed.
- * Add starpu_task_notify_ready_soon_register() to be notified when it is
- determined that a task will be ready an estimated amount of time from now.
- * New StarPU-MPI initialization function (starpu_mpi_init_conf())
- which allows StarPU-MPI to manage reserving a core for the MPI thread, or
- merging it with CPU driver 0.
- * Add possibility to delay the termination of a task with the
- functions starpu_task_end_dep_add() which specifies the number of
- calls to the function starpu_task_end_dep_release() needed to
- trigger the task termination, or with starpu_task_declare_end_deps_array()
- and starpu_task_declare_end_deps() to just declare termination dependencies
- between tasks.
- * Add possibility to define the sequential consistency at the task level
- for each handle used by the task.
- * Add STARPU_SPECIFIC_NODE_LOCAL, STARPU_SPECIFIC_NODE_CPU, and
- STARPU_SPECIFIC_NODE_SLOW as generic values for codelet specific memory
- nodes which can be used instead of exact node numbers.
- * Add starpu_get_next_bindid() and starpu_bind_thread_on() to allow
- binding an application-started thread on a free core. Use it in
- StarPU-MPI to automatically bind the MPI thread on an available core.
- * Add STARPU_RESERVE_NCPU environment variable and
- starpu_config::reserve_ncpus field to make StarPU use a few cores
- less.
- * Add STARPU_MAIN_THREAD_BIND environment variable to make StarPU reserve a
- core for the main thread.
- * New StarPU-RM resource management module to share processor cores and
- accelerator devices with other parallel runtime systems. Use
- --enable-starpurm option to activate it.
- * New schedulers modular-gemm, modular-pheft, modular-prandom and
- modular-prandom-prio
- * Add STARPU_MATRIX_SET_NX/NY/LD and STARPU_VECTOR_SET_NX to change a matrix
- tile or vector size without reallocating the buffer.
- * Application can change the allocation used by StarPU with
- starpu_malloc_set_hooks()
- * XML output for starpu_perfmodel_display and starpu_perfmodel_dump_xml()
- function
- Small features:
- * Scheduling contexts may now be associated with a user data pointer at
- creation time, that can later be recalled through
- starpu_sched_ctx_get_user_data().
- * New environment variables STARPU_SIMGRID_TASK_SUBMIT_COST and
- STARPU_SIMGRID_FETCHING_INPUT_COST to simulate the cost of task
- submission and data fetching in simgrid mode.
- This provides more accurate simgrid predictions, especially for the
- beginning of the execution and regarding data transfers.
- * New environment variable STARPU_SIMGRID_SCHED_COST to take into
- account the time to perform scheduling when running in SimGrid mode.
- * New configure option --enable-mpi-pedantic-isend (disabled by
- default) to acquire data in STARPU_RW (instead of STARPU_R) before
- performing MPI_Isend() call
- * New function starpu_worker_display_names() to display the names of
- all the workers of a specified type.
- * Arbiters now support concurrent read access.
- * Add a field starpu_task::where similar to starpu_codelet::where
- which allows to restrict where to execute a task. Also add
- STARPU_TASK_WHERE to be used when calling starpu_task_insert().
- * Add SubmitOrder trace field.
- * Add workerids and workerids_len task fields.
- * Add priority management to StarPU-MPI. Can be disabled with
- the STARPU_MPI_PRIORITIES environment variable.
- * Add STARPU_MAIN_THREAD_CPUID and STARPU_MPI_THREAD_CPUID environment
- variables.
- * Add disk to disk copy functions and support asynchronous full read/write
- in disk backends.
- * New starpu_task_insert() parameter STARPU_CL_ARGS_NFREE which allows
- to set codelet parameters but without freeing them.
- * New starpu_task_insert() parameter STARPU_TASK_DEPS_ARRAY which
- allows to declare task dependencies similarly to
- starpu_task_declare_deps_array()
- * Add dependency backward information in debugging mode for gdb's
- starpu-print-task
- * Add sched_data field in starpu_task structure.
- * New starpu_fxt_tool option -label-deps to label dependencies on
- the output graph
- * New environment variable STARPU_GENERATE_TRACE_OPTIONS to specify
- fxt options (to be used with STARPU_GENERATE_TRACE)
- * New function starpu_task_set() similar to starpu_task_build() but
- with a task object given as the first parameter
- * New functions
- starpu_data_partition_submit_sequential_consistency() and
- starpu_data_unpartition_submit_sequential_consistency()
- * Add a new value STARPU_TASK_SYNCHRONOUS to be used in
- starpu_task_insert() to define if the task is (or not) synchronous
- * Add memory states events in the traces.
- * Add starpu_sched_component_estimated_end_min_add() to fix termination
- estimations in modular schedulers.
- * New function starpu_data_partition_not_automatic() to disable the
- automatic partitioning of a data handle for which an asynchronous
- plan has previously been submitted
- * Add starpu_task_declare_deps()
- * New function starpu_data_unpartition_submit_sequential_consistency_cb()
- to specify a callback for the task submitting the unpartitioning
- * New tool starpu_mpi_comm_trace.py to draw heatmap of MPI
- communications
- * Support for ARM performance libraries
- * Add functionality to disable signal catching either through field
- starpu_conf::catch_signals or through the environment variable
- STARPU_CATCH_SIGNALS
- * Support for OpenMP Taskloop directive
- * Optional data interface init function (used by the vector and
- matrix interfaces)
- Changes:
- * Vastly improve simgrid simulation time.
- * Switch default scheduler to lws.
- * Add "to" parameter to pull_task and can_push methods of
- components.
- * Deprecate starpu_data_interface_ops::handle_to_pointer interface
- operation in favor of new starpu_data_interface_ops::to_pointer
- operation.
- * Sort data access requests by priority.
- * Cluster support is disabled by default, unless the configure
- option --enable-cluster is specified
- * For unpack operations, move the memory deallocation from
- starpu_data_unpack() to the interface function
- starpu_data_interface_ops::unpack_data(). Pack and unpack
- functions of predefined interfaces
- use public API starpu_malloc_on_node_flags() and
- starpu_free_on_node_flags() to allocate and de-allocate memory
- Small changes:
- * Use asynchronous transfers for task data fetches which were not prefetched.
- * Allow to call starpu_sched_ctx_set_policy_data on the main
- scheduler context
- * Function starpu_is_initialized() is moved to the public API.
- * Fix code to allow to submit tasks to empty contexts
- * STARPU_COMM_STATS also displays the bandwidth
- * Update data interfaces implementations to only use public API
- StarPU 1.2.11 (git revision xxx)
- ====================================================================
- Small features:
- * Add starpu_tag_notify_restart_from_apps().
- StarPU 1.2.10 (git revision beb6ac9cc07dc9ae1c838a38d11ed2dae3775996)
- ====================================================================
- Small features:
- * New script starpu_env to set up StarPU environment variables
- * New configure option --disable-build-doc-pdf
- StarPU 1.2.9 (git revision 3aca8da3138a99e93d7f93905d2543bd6f1ea1df)
- ====================================================================
- Small changes:
- * Add STARPU_SIMGRID_TRANSFER_COST environment variable to easily disable
- data transfer costs.
- * New dmdap "data-aware performance model (priority)" scheduler
- * Modification in the Native Fortran interface of the functions
- fstarpu_mpi_task_insert, fstarpu_mpi_task_build and
- fstarpu_mpi_task_post_build to only take 1 parameter being the MPI
- communicator, the codelet and the various parameters for the task.
- StarPU 1.2.8 (git revision f66374c9ad39aefb7cf5dfc31f9ab3d756bcdc3c)
- ====================================================================
- Small features:
- * Minor fixes
- StarPU 1.2.7 (git revision 07cb7533c22958a76351bec002955f0e2818c530)
- ====================================================================
- Small features:
- * Add STARPU_HWLOC_INPUT environment variable to save initialization time.
- * Add starpu_data_set/get_ooc_flag.
- * Use starpu_mpi_tag_t (int64_t) for MPI communication tag
- StarPU 1.2.6 (git revision 23049adea01837479f309a75c002dacd16eb34ad)
- ====================================================================
- Small changes:
- * Fix crash for lws scheduler
- * Avoid making hwloc load PCI topology when CUDA is not enabled
- StarPU 1.2.5 (git revision 22f32916916d158e3420033aa160854d1dd341bd)
- ====================================================================
- Small features:
- * Add a new value STARPU_TASK_COLOR to be used in
- starpu_task_insert() to pick up the color of a task in dag.dot
- * Add starpu_data_pointer_is_inside().
- Changes:
- * Do not export -lcuda -lcudart -lOpenCL in *starpu*.pc.
- StarPU 1.2.4 (git revision 255cf98175ef462749780f30bfed21452b74b594)
- ====================================================================
- Small features:
- * Catch signals SIGINT and SIGSEGV to dump fxt trace files.
- * New configure option --disable-icc to disable the compilation of
- specific ICC examples
- * Add starpu_codelet_pack_arg_init, starpu_codelet_pack_arg,
- starpu_codelet_pack_arg_fini for more fine-grain packing capabilities.
- * Add starpu_task_insert_data_make_room,
- starpu_task_insert_data_process_arg,
- starpu_task_insert_data_process_array_arg,
- starpu_task_insert_data_process_mode_array_arg
- * Do not show internal tasks in fxt dag by default. Allow to hide
- acquisitions too.
- * Add a way to choose the dag.dot colors.
- StarPU 1.2.3 (git revision 586ba6452a8eef99f275c891ce08933ae542c6c2)
- ====================================================================
- New features:
- * Add per-node MPI data.
- Small features:
- * When debug is enabled, starpu data accessors first check the
- validity of the data interface type
- * Print disk bus performances when STARPU_BUS_STATS is set
- * Add starpu_vector_filter_list_long filter.
- * Data interfaces now define a name through the struct starpu_data_interface_ops
- * StarPU-MPI :
- - allow predefined data interface not to define a mpi datatype and
- to be exchanged through pack/unpack operations
- - New function starpu_mpi_comm_get_attr() which returns
- the value of the attribute STARPU_MPI_TAG_UB, i.e. the upper
- bound for tag value.
- - New configure option enable-mpi-verbose to manage the display of
- extra MPI debug messages.
- * Add STARPU_WATCHDOG_DELAY environment variable.
- * Add a 'waiting' worker status
- * Allow new value 'extra' for configure option --enable-verbose
- Small changes:
- * Add data_unregister event in traces
- * StarPU-MPI
- - push detached requests at the back of the testing list, so they
- are tested last since they will most probably finish latest
- * Automatically initialize handles on data acquisition when
- reduction methods are provided, and make sure a handle is
- initialized before trying to read it.
- StarPU 1.2.2 (git revision a0b01437b7b91f33fb3ca36bdea35271cad34464)
- ===================================================================
- New features:
- * Add starpu_data_acquire_try and starpu_data_acquire_on_node_try.
- * Add NVCC_CC environment variable.
- * Add -no-flops and -no-events options to starpu_fxt_tool to make
- traces lighter
- * Add starpu_cusparse_init/shutdown/get_local_handle for proper CUDA
- overlapping with cusparse.
- * Allow precise debugging by setting STARPU_TASK_BREAK_ON_PUSH,
- STARPU_TASK_BREAK_ON_SCHED, STARPU_TASK_BREAK_ON_POP, and
- STARPU_TASK_BREAK_ON_EXEC environment variables, with the job_id
- of a task. StarPU will raise SIGTRAP when the task is being
- scheduled, pushed, or popped by the scheduler.
- Small features:
- * New function starpu_worker_get_job_id(struct starpu_task *task)
- which returns the job identifier for a given task
- * Show package/numa topology in starpu_machine_display
- * MPI: Add mpi communications in dag.dot
- * Add STARPU_PERF_MODEL_HOMOGENEOUS_CPU environment variable to
- allow having one perfmodel per CPU core
- * Add starpu_perfmodel_arch_comb_fetch function.
- * Add starpu_mpi_get_data_on_all_nodes_detached function.
- Small changes:
- * Output generated through STARPU_MPI_COMM has been modified to
- allow easier automated checking
- * MPI: Fix reactivity of the beginning of the application, when a
- lot of ready requests have to be processed at the same time, we
- want to poll the pending requests from time to time.
- * MPI: Fix gantt chart for starpu_mpi_irecv: it should use the
- termination time of the request, not the submission time.
- * MPI: Modify output generated through STARPU_MPI_COMM to allow
- easier automated checking
- * MPI: enable more tests in simgrid mode
- * Use assumed-size instead of assumed-shape arrays for native
- fortran API, for better backward compatibility.
- * Fix odd ordering of CPU workers on CPUs due to GPUs stealing some
- cores
- StarPU 1.2.1 (git revision 473acaec8a1fb4f4c73d8b868e4f044b736b41ea)
- ====================================================================
- New features:
- * Add starpu_fxt_trace_user_event_string.
- * Add starpu_tasks_rec_complete tool to add estimation times in tasks.rec
- files.
- * Add STARPU_FXT_TRACE environment variable.
- * Add starpu_data_set_user_data and starpu_data_get_user_data.
- * Add STARPU_MPI_FAKE_SIZE and STARPU_MPI_FAKE_RANK to allow simulating
- execution of just one MPI node.
- * Add STARPU_PERF_MODEL_HOMOGENEOUS_CUDA/OPENCL/MIC/SCC to share performance
- models between devices, making calibration much faster.
- * Add modular-heft-prio scheduler.
- * Add starpu_cublas_get_local_handle helper.
- * Add starpu_data_set_name, starpu_data_set_coordinates_array, and
- starpu_data_set_coordinates to describe data, and starpu_iteration_push and
- starpu_iteration_pop to describe tasks, for better offline traces analysis.
- * New function starpu_bus_print_filenames() to display filenames
- storing bandwidth/affinity/latency information, available through
- tools/starpu_machine_display -i
- * Add support for Ayudame version 2.x debugging library.
- * Add starpu_sched_ctx_get_workers_list_raw, much less costly than
- starpu_sched_ctx_get_workers_list
- * Add starpu_task_get_name and use it to warn about dmda etc. using
- a dumb policy when calibration is not finished
- * MPI: Add functions to test for cached values
- Changes:
- * Fix performance regression of lws for small tasks.
- * Improve native Fortran support for StarPU
- Small changes:
- * Fix type of data home node to allow users to pass -1 to define
- temporary data
- * Fix compatibility with simgrid 3.14
- StarPU 1.2.0 (git revision 5a86e9b61cd01b7797e18956283cc6ea22adfe11)
- ====================================================================
- New features:
- * MIC Xeon Phi support
- * SCC support
- * New function starpu_sched_ctx_exec_parallel_code to execute a
- parallel code on the workers of the given scheduler context
- * MPI:
- - New internal communication system : a unique tag called
- is now used for all communications, and a system
- of hashmaps on each node which stores pending receives has been
- implemented. Every message is now coupled with an envelope, sent
- before the corresponding data, which allows the receiver to
- allocate data correctly, and to submit the matching receive of
- the envelope.
- - New function
- starpu_mpi_irecv_detached_sequential_consistency which
- allows to enable or disable the sequential consistency for
- the given data handle (sequential consistency will be
- enabled or disabled based on the value of the function
- parameter and the value of the sequential consistency
- defined for the given data)
- - New functions starpu_mpi_task_build() and
- starpu_mpi_task_post_build()
- - New flag STARPU_NODE_SELECTION_POLICY to specify a policy for
- selecting a node to execute the codelet when several nodes
- own data in W mode.
- - New selection node policies can be un/registered with the
- functions starpu_mpi_node_selection_register_policy() and
- starpu_mpi_node_selection_unregister_policy()
- - New environment variable STARPU_MPI_COMM which enables
- basic tracing of communications.
- - New function starpu_mpi_init_comm() which allows to specify
- a MPI communicator.
- * New STARPU_COMMUTE flag which can be passed along STARPU_W or STARPU_RW to
- let starpu commute write accesses.
- * Out-of-core support, through registration of disk areas as additional memory
- nodes. It can be enabled programmatically or through the STARPU_DISK_SWAP*
- environment variables.
- * Reclaiming is now periodically done before memory becomes full. This can
- be controlled through the STARPU_*_AVAILABLE_MEM environment variables.
- * New hierarchical schedulers which allow users to easily build
- their own scheduler, either by coding each "box" they want
- themselves or by combining existing boxes in StarPU. Hierarchical
- schedulers have very interesting scalability properties.
- * Add STARPU_CUDA_ASYNC and STARPU_OPENCL_ASYNC flags to allow asynchronous
- CUDA and OpenCL kernel execution.
- * Add STARPU_CUDA_PIPELINE and STARPU_OPENCL_PIPELINE to specify how
- many asynchronous tasks are submitted in advance on CUDA and
- OpenCL devices. Setting the value to 0 forces a synchronous
- execution of all tasks.
- * Add CUDA concurrent kernel execution support through
- the STARPU_NWORKER_PER_CUDA environment variable.
- * Add CUDA and OpenCL kernel submission pipelining, to overlap costs and allow
- concurrent kernel execution on Fermi cards.
- * New locality work stealing scheduler (lws).
- * Add STARPU_VARIABLE_NBUFFERS to be set in cl.nbuffers, and nbuffers and
- modes field to the task structure, which permit to define codelets taking a
- variable number of data.
- * Add support for implementing OpenMP runtimes on top of StarPU
- * New performance model format to better represent parallel tasks.
- Used to provide estimations for the execution times of the
- parallel tasks on scheduling contexts or combined workers.
- * starpu_data_idle_prefetch_on_node and
- starpu_idle_prefetch_task_input_on_node allow to queue prefetches to be done
- only when the bus is idle.
- * Make starpu_data_prefetch_on_node not forcibly flush data out, introduce
- starpu_data_fetch_on_node for that.
- * Add data access arbiters, to improve parallelism of concurrent data
- accesses, notably with STARPU_COMMUTE.
- * Anticipative writeback, to flush dirty data asynchronously before the
- GPU device is full. Disabled by default. Use STARPU_MINIMUM_CLEAN_BUFFERS
- and STARPU_TARGET_CLEAN_BUFFERS to enable it.
- * Add starpu_data_wont_use to advise that a piece of data will not be used
- in the near future.
- * Enable anticipative writeback by default.
- * New scheduler 'dmdasd' that considers priority when deciding on
- which worker to schedule
- * Add the capability to define specific MPI datatypes for
- StarPU user-defined interfaces.
- * Add tasks.rec trace output to make scheduling analysis easier.
- * Add Fortran 90 module and example using it
- * New StarPU-MPI gdb debug functions
- * Generate animated html trace of modular schedulers.
- * Add asynchronous partition planning. It only supports coherency through
- the home node of data for now.
- * Add STARPU_MALLOC_SIMULATION_FOLDED flag to save memory when simulating.
- * Include application threads in the trace.
- * Add starpu_task_get_task_scheduled_succs to get successors of a task.
- * Add graph inspection facility for schedulers.
- * New STARPU_LOCALITY flag to mark data which should be taken into account
- by schedulers for improving locality.
- * Experimental support for data locality in ws and lws.
- * Add a preliminary framework for native Fortran support for StarPU
- Small features:
- * Tasks can now have a name (via the field const char *name of
- struct starpu_task)
- * New functions starpu_data_acquire_cb_sequential_consistency() and
- starpu_data_acquire_on_node_cb_sequential_consistency() which allow
- enabling or disabling sequential consistency
- * New configure option --enable-fxt-lock which enables additional
- trace events focused on lock behaviour during execution
- * Functions starpu_insert_task and starpu_mpi_insert_task are
- renamed to starpu_task_insert and starpu_mpi_task_insert. Old
- names are kept to avoid breaking existing code.
- * New configure option --enable-calibration-heuristic which allows
- the user to set the maximum authorized deviation of the
- history-based calibrator.
- * Allow application to provide the task footprint itself.
- * New function starpu_sched_ctx_display_workers() to display worker
- information belonging to a given scheduler context
- * The option --enable-verbose can be called with
- --enable-verbose=extra to increase the verbosity
- * Add codelet size, footprint and tag id in the paje trace.
- * Add STARPU_TAG_ONLY, to specify a tag for traces without making StarPU
- manage the tag.
- * On Linux x86, spinlocks now block after a hundred tries. This avoids
- typical 10ms pauses when the application thread tries to submit tasks.
- * New function char *starpu_worker_get_type_as_string(enum starpu_worker_archtype type)
- * Improve static scheduling by adding support for specifying the task
- execution order.
- * Add starpu_worker_can_execute_task_impl and
- starpu_worker_can_execute_task_first_impl to optimize getting the
- working implementations
- * Add STARPU_MALLOC_NORECLAIM flag to allocate without running a reclaim if
- the node is out of memory.
- * New flag STARPU_DATA_MODE_ARRAY for the function family
- starpu_task_insert to allow defining an array of data handles
- along with their access modes.
- * New configure option --enable-new-check to enable new testcases
- which are known to fail
- * Add starpu_memory_allocate and _deallocate to let the application declare
- its own allocation to the reclaiming engine.
- * Add STARPU_SIMGRID_CUDA_MALLOC_COST and STARPU_SIMGRID_CUDA_QUEUE_COST to
- disable CUDA costs simulation in simgrid mode.
- * Add starpu_task_get_task_succs to get the list of children of a given
- task.
- * Add starpu_malloc_on_node_flags, starpu_free_on_node_flags, and
- starpu_malloc_on_node_set_default_flags to control the allocation flags
- used for allocations done by starpu.
- * Ranges can be provided in STARPU_WORKERS_CPUID
- * Add starpu_fxt_autostart_profiling to be able to avoid autostart.
- * Add arch_cost_function perfmodel function field.
- * Add STARPU_TASK_BREAK_ON_SCHED, STARPU_TASK_BREAK_ON_PUSH, and
- STARPU_TASK_BREAK_ON_POP environment variables to debug schedulers.
- * Add starpu_sched_display tool.
- * Add starpu_memory_pin and starpu_memory_unpin to pin memory allocated
- by other means than starpu_malloc.
- * Add STARPU_NOWHERE to create synchronization tasks with data.
- * Document how to switch between different views of the same data.
- * Add STARPU_NAME to specify a task name from a starpu_task_insert call.
- * Add configure option to disable fortran --disable-fortran
- * Add configure option to give path for smpirun executable --with-smpirun
- * Add configure option to disable the build of tests --disable-build-tests
- * Add starpu-all-tasks debugging support
- * New function
- void starpu_opencl_load_program_source_malloc(const char *source_file_name, char **located_file_name, char **located_dir_name, char **opencl_program_source)
- which allocates the pointers located_file_name, located_dir_name
- and opencl_program_source.
- * Add submit_hook and do_schedule scheduler methods.
- * Add starpu_sleep.
- * Add starpu_task_list_ismember.
- * Add _starpu_fifo_pop_this_task.
- * Add STARPU_MAX_MEMORY_USE environment variable.
- * Add starpu_worker_get_id_check().
- * New function starpu_mpi_wait_for_all(MPI_Comm comm) which waits
- until all StarPU tasks and communications for the given
- communicator are completed.
- * New function starpu_codelet_unpack_args_and_copyleft() which
- copies into a new buffer the values which have not been unpacked by
- the current call
- * Add STARPU_CODELET_SIMGRID_EXECUTE flag.
- * Add STARPU_CODELET_SIMGRID_EXECUTE_AND_INJECT flag.
- * Add STARPU_CL_ARGS flag to the starpu_task_insert() and
- starpu_mpi_task_insert() function calls
- Changes:
- * Data interfaces (variable, vector, matrix and block) now define
- pack and unpack functions
- * StarPU-MPI: Fix for being able to receive data which have not yet
- been registered by the application (i.e. it did not call
- starpu_data_set_tag(); data are received as raw memory)
- * StarPU-MPI: Fix for being able to receive data with the same tag
- from several nodes (see mpi/tests/gather.c)
- * Remove the long-deprecated cost_model fields and task->buffers field.
- * Fix complexity of implicit task/data dependency, from quadratic to linear.
- Small changes:
- * Rename function starpu_trace_user_event() as
- starpu_fxt_trace_user_event()
- * "power" is renamed into "energy" wherever it applies, notably energy
- consumption performance models
- * Update starpu_task_build() to set starpu_task::cl_arg_free to 1 if
- some arguments of type ::STARPU_VALUE are given.
- * Simplify performance model loading API
- * Better semantics for environment variables STARPU_NMIC and
- STARPU_NMICDEVS, the number of devices and the number of cores.
- STARPU_NMIC will be the number of devices, and STARPU_NMICCORES
- will be the number of cores per device.
- StarPU 1.1.8 (git revision f7b7abe9f86361cbc96f2b51c6ad7336b7d1d628)
- ====================================================================
- The scheduling context release
- Small changes:
- * Fix compatibility with simgrid 3.14
- * Fix lock ordering for memory reclaiming
- StarPU 1.1.7 (git revision 341044b67809892cf4a388e482766beb50256907)
- ====================================================================
- The scheduling context release
- Small changes:
- * Fix type of data home node to allow users to pass -1 to define
- temporary data
- StarPU 1.1.6 (git revision cdffbd5f5447e4d076d659232b3deb14f3c20da6)
- ====================================================================
- The scheduling context release
- Small features:
- * Add starpu_task_get_task_succs to get the list of children of a given
- task.
- * Ranges can be provided in STARPU_WORKERS_CPUID
- Small changes:
- * Various fixes for MacOS and Windows systems
- StarPU 1.1.5 (git revision 20469c6f3e7ecd6c0568c8e4e4b5b652598308d8xxx)
- =======================================================================
- The scheduling context release
- New features:
- * Add starpu_memory_pin and starpu_memory_unpin to pin memory allocated
- by other means than starpu_malloc.
- * Add starpu_task_wait_for_n_submitted() and
- STARPU_LIMIT_MAX_NSUBMITTED_TASKS/STARPU_LIMIT_MIN_NSUBMITTED_TASKS to
- easily control the number of submitted tasks by making task submission
- block.
- * Add STARPU_NOWHERE to create synchronization tasks with data.
- * Document how to switch between different views of the same data.
- * Add Fortran 90 module and example using it
- StarPU 1.1.4 (git revision 2a3d30b28d6d099d271134a786335acdbb3931a3)
- ====================================================================
- The scheduling context release
- New features:
- * Fix and actually enable the cache allocation.
- * Enable allocation cache in main RAM when STARPU_LIMIT_CPU_MEM is set by
- the user.
- * New MPI functions starpu_mpi_issend and starpu_mpi_issend_detached
- to send data using a synchronous and non-blocking mode (internally
- uses MPI_Issend)
- * New data access mode flag STARPU_SSEND to be set when calling
- starpu_mpi_insert_task to specify the data has to be sent using a
- synchronous and non-blocking mode
- * New environment variable STARPU_PERF_MODEL_DIR which can be set to
- specify the directory in which to store performance model files.
- When unset, the files are stored in $STARPU_HOME/.starpu/sampling
- * MPI:
- - New function starpu_mpi_data_register_comm to register a piece of
- data with a communicator other than MPI_COMM_WORLD
- - New functions starpu_mpi_data_set_rank() and starpu_mpi_data_set_tag()
- which call starpu_mpi_data_register_comm()
- Small features:
- * Add starpu_memory_wait_available() to wait for a given size to become
- available on a given node.
- * New environment variable STARPU_RAND_SEED to set the seed used for random
- numbers.
- * New function starpu_mpi_cache_set() to enable or disable the
- communication cache at runtime
- * Add starpu_paje_sort which sorts Pajé traces.
- Changes:
- * Fix complexity of implicit task/data dependency, from quadratic to linear.
- StarPU 1.1.3 (git revision 11afc5b007fe1ab1c729b55b47a5a98ef7f3cfad)
- ====================================================================
- The scheduling context release
- New features:
- * One can register an existing on-GPU buffer to be used by a handle.
- * Add the starpu_paje_summary statistics tool.
- * Enable gpu-gpu transfers for matrices.
- * Let interfaces declare which transfers they allow with the can_copy
- method.
- Small changes:
- * Lock performance model files while writing and reading them to avoid
- issues on parallel launches, MPI runs notably.
- * Lots of build fixes for icc on Windows.
- StarPU 1.1.2 (git revision d14c550798630bbc4f3da2b07d793c47e3018f02)
- ====================================================================
- The scheduling context release
- New features:
- * The reduction init codelet is automatically used to initialize temporary
- buffers.
- * Traces now include a "scheduling" state, to show the overhead of the
- scheduler.
- * Add STARPU_CALIBRATE_MINIMUM environment variable to specify the minimum
- number of calibration measurements.
- * Add STARPU_TRACE_BUFFER_SIZE environment variable to specify the size of
- the trace buffer.
- StarPU 1.1.1 (git revision dab2e51117fac5bef767f3a6b7677abb2147d2f2)
- ====================================================================
- The scheduling context release
- New features:
- * MPI:
- - New variable STARPU_MPI_CACHE_STATS to print statistics on
- cache holding received data.
- - New function starpu_mpi_data_register() which sets the rank
- and tag of a piece of data, and also allows the MPI
- communication cache to be cleared automatically when
- unregistering the data. It should be called instead of calling
- both starpu_data_set_tag() and starpu_data_set_rank()
- * Use streams for all CUDA transfers, even initiated by CPUs.
- * Add paje traces statistics tools.
- * Use streams for GPUA->GPUB and GPUB->GPUA transfers.
- Small features:
- * New STARPU_EXECUTE_ON_WORKER flag to specify the worker on which
- to execute the task.
- * New STARPU_DISABLE_PINNING environment variable to disable host memory
- pinning.
- * New STARPU_DISABLE_KERNELS environment variable to disable actual kernel
- execution.
- * New starpu_memory_get_total function to get the size of a memory node.
- * New starpu_parallel_task_barrier_init_n function to let a scheduler decide
- a set of workers without going through combined workers.
- Changes:
- * Fix simgrid execution.
- * Rename starpu_get_nready_tasks_of_sched_ctx to starpu_sched_ctx_get_nready_tasks
- * Rename starpu_get_nready_flops_of_sched_ctx to starpu_sched_ctx_get_nready_flops
- * New functions starpu_pause() and starpu_resume()
- * New codelet specific_nodes field to specify explicit target nodes for data.
- * StarPU-MPI: Fix overzealous allocation of memory.
- * Interfaces: Allow interface implementation to change pointers at will, in
- unpack notably.
- Small changes:
- * Use big fat abortions when one tries to make a task or callback
- sleep, instead of just returning EDEADLK which few people will test
- * By default, StarPU FFT examples are not compiled and checked, the
- configure option --enable-starpufft-examples needs to be specified
- to change this behaviour.
- StarPU 1.1.0 (git revision 3c4bc72ccef30e767680cad3d749c4e9010d4476)
- ====================================================================
- The scheduling context release
- New features:
- * OpenGL interoperability support.
- * Capability to store compiled OpenCL kernels on the file system
- * Capability to load compiled OpenCL kernels
- * Performance models measurements can now be provided explicitly by
- applications.
- * Capability to emit communication statistics when running MPI code
- * Add starpu_data_unregister_submit, starpu_data_acquire_on_node and
- starpu_data_invalidate_submit
- * New functionality in the starpu_insert_task wrapper to pass an array of
- data handles via the parameter STARPU_DATA_ARRAY
- * Enable GPU-GPU direct transfers.
- * GCC plug-in
- - Add `registered' attribute
- - A new pass was added that warns about the use of possibly
- unregistered memory buffers.
- * SOCL
- - Manual mapping of commands on specific devices is now
- possible
- - SOCL does not require StarPU CPU tasks anymore. CPU workers
- are automatically disabled to enhance performance of OpenCL
- CPU devices
- * New interface: COO matrix.
- * Data interfaces: The pack operation of user-defined data interface
- defines a new parameter count which should be set to the size of
- the buffer created by the packing of the data.
- * MPI:
- - Communication statistics for MPI can only be enabled at
- execution time by defining the environment variable
- STARPU_COMM_STATS
- - Communication cache mechanism is enabled by default, and can
- only be disabled at execution time by setting the
- environment variable STARPU_MPI_CACHE to 0.
- - Initialisation functions starpu_mpi_initialize_extended()
- and starpu_mpi_initialize() have been deprecated. One
- should now use starpu_mpi_init(int *, char ***, int). The
- last parameter indicates if MPI should be initialised.
- - Collective detached operations have new parameters, a
- callback function and an argument. This is to be consistent
- with the detached point-to-point communications.
- - When exchanging user-defined data interfaces, the size of
- the data is the size returned by the pack operation, i.e.
- data with dynamic size can now be exchanged with StarPU-MPI.
- * Add experimental simgrid support, to simulate execution with various
- number of CPUs, GPUs, amount of memory, etc.
- * Add support for OpenCL simulators (which provide simulated execution time)
- * Add support for Temanejo, a task graph debugger
- * Theoretical bound lp output now includes data transfer time.
- * Update OpenCL driver to only enable CPU devices (the environment
- variable STARPU_OPENCL_ONLY_ON_CPUS must be set to a positive
- value when executing an application)
- * Add Scheduling contexts to separate computation resources
- - Scheduling policies take into account the set of resources corresponding
- to the context they belong to
- - Add support to dynamically change scheduling contexts
- (Create and Delete a context, Add Workers to a context, Remove workers from a context)
- - Add support to indicate to which contexts the tasks are submitted
- * Add the Hypervisor to manage the Scheduling Contexts automatically
- - The Contexts can be registered to the Hypervisor
- - Only the registered contexts are managed by the Hypervisor
- - The Hypervisor can detect the initial distribution of resources of
- a context and construct it accordingly (the cost of execution is required)
- - Several policies can adapt dynamically the distribution of resources
- in contexts if the initial one was not appropriate
- - Add a platform to implement new policies of redistribution
- of resources
- * Implement a memory manager which checks the global amount of
- memory available on devices, and checks there is enough memory
- before doing an allocation on the device.
- * Discard environment variable STARPU_LIMIT_GPU_MEM and define
- instead STARPU_LIMIT_CUDA_MEM and STARPU_LIMIT_OPENCL_MEM
- * Introduce new variables STARPU_LIMIT_CUDA_devid_MEM and
- STARPU_LIMIT_OPENCL_devid_MEM to limit memory per specific device
- * Introduce new variable STARPU_LIMIT_CPU_MEM to limit memory for
- the CPU devices
- * New function starpu_malloc_flags to define a memory allocation with
- constraints based on the following values:
- - STARPU_MALLOC_PINNED specifies memory should be pinned
- - STARPU_MALLOC_COUNT specifies the memory allocation should be in
- the limits defined by the environment variables STARPU_LIMIT_xxx
- (see above). When no memory is left, starpu_malloc_flags tries
- to reclaim memory from StarPU and returns -ENOMEM on failure.
- * starpu_malloc calls starpu_malloc_flags with a value of flag set
- to STARPU_MALLOC_PINNED
- * Define new function starpu_free_flags similarly to starpu_malloc_flags
- * Define new public API starpu_pthread which is similar to the
- pthread API. It is provided with 2 implementations: a pthread one
- and a Simgrid one. Applications using StarPU and wishing to use
- the Simgrid StarPU features should use it.
- * Allow a dynamically allocated number of buffers per task,
- thus overriding the value defined by --enable-maxbuffers=XXX
- * Performance models files are now stored in a directory whose name
- includes the version of the performance model format. The version
- number is also written in the file itself.
- When updating the format, the internal variable
- _STARPU_PERFMODEL_VERSION should be updated. It is then possible
- to switch easily between different versions of StarPU having
- different performance model formats.
- * Tasks can now define an optional prologue callback which is executed
- on the host when the task becomes ready for execution, before getting
- scheduled.
- * Small CUDA allocations (<= 4MiB) are now batched to avoid the huge
- cudaMalloc overhead.
- * Prefetching is now done for all schedulers whenever it can be done
- regardless of the scheduling decision.
- * Add a watchdog which makes it easy to trigger a crash when StarPU gets
- stuck.
- * Document how to migrate data over MPI.
- * New function starpu_wakeup_worker() to be used by schedulers to
- wake up a single worker (instead of all workers) when submitting a
- single task.
- * The functions starpu_sched_set/get_min/max_priority set/get the
- priorities of the current scheduling context, i.e. the one which
- was set by a call to starpu_sched_ctx_set_context() or the initial
- context if the function has not been called yet.
- * Fix for properly dealing with NAN on Windows systems
- Small features:
- * Add starpu_worker_get_by_type and starpu_worker_get_by_devid
- * Add starpu_fxt_stop_profiling/starpu_fxt_start_profiling which allow
- pausing trace recording.
- * Add trace_buffer_size configuration field to specify the tracing
- buffer size.
- * Add starpu_codelet_profile and starpu_codelet_histo_profile, tools which draw
- the profile of a codelet.
- * File STARPU-REVISION --- containing the SVN revision number from which
- StarPU was compiled --- is installed in the share/doc/starpu directory
- * starpu_perfmodel_plot can now directly draw GFlops curves.
- * New configure option --enable-mpi-progression-hook to enable the
- activity polling method for StarPU-MPI.
- * Allow disabling sequential consistency for a given task.
- * New macro STARPU_RELEASE_VERSION
- * New function starpu_get_version() to return as 3 integers the
- release version of StarPU.
- * Enable by default data allocation cache
- * New function starpu_perfmodel_directory() to print directory
- storing performance models. Available through the new option -d of
- the tool starpu_perfmodel_display
- * New batch files to execute StarPU applications under Microsoft
- Visual Studio (they are installed in path_to_starpu/bin/msvc).
- * Add cl_arg_free, callback_arg_free, prologue_callback_arg_free fields to
- enable automatic free(cl_arg); free(callback_arg);
- free(prologue_callback_arg) on task destroy.
- * New function starpu_task_build
- * New configure options --with-simgrid-dir
- --with-simgrid-include-dir and --with-simgrid-lib-dir to specify
- the location of the SimGrid library
- Changes:
- * Rename all filter functions to follow the pattern
- starpu_DATATYPE_filter_FILTERTYPE. The script
- tools/dev/rename_filter.sh is provided to update your existing
- applications to use the new filter function names.
- * Renaming of diverse functions and datatypes. The script
- tools/dev/rename.sh is provided to update your existing
- applications to use the new names. It is also possible to compile
- with the pkg-config package starpu-1.0 to keep using the old
- names. It is however recommended to update your code and to use
- the package starpu-1.1.
- * Fix the block filter functions.
- * Fix StarPU-MPI on Darwin.
- * The FxT code can now be used on systems other than Linux.
- * Keep only one hashtable implementation common/uthash.h
- * The cache of starpu_mpi_insert_task is fixed and thus now enabled by
- default.
- * Improve starpu_machine_display output.
- * Standardize objects name in the performance model API
- * SOCL
- - Virtual SOCL device has been removed
- - Automatic scheduling still available with command queues not
- assigned to any device
- - Remove modified OpenCL headers. ICD is now the only supported
- way to use SOCL.
- - SOCL test suite is only run when environment variable
- SOCL_OCL_LIB_OPENCL is defined. It should contain the location
- of the libOpenCL.so file of the OCL ICD implementation.
- * Fix main memory leak on multiple unregister/re-register.
- * Improve hwloc detection by configure
- * Cell:
- - It is no longer possible to enable the cell support via the
- gordon driver
- - Data interfaces no longer define functions to copy to and from
- SPU devices
- - Codelet no longer define pointer for Gordon implementations
- - Gordon workers are no longer enabled
- - Gordon performance models are no longer enabled
- * Fix data transfer arrows in paje traces
- * The "heft" scheduler no longer exists. Users should now pick "dmda"
- instead.
- * StarPU can now use poti to generate paje traces.
- * Rename scheduling policy "parallel greedy" to "parallel eager"
- * starpu_scheduler.h is no longer automatically included by
- starpu.h, it has to be manually included when needed
- * New batch files to run StarPU applications with Microsoft Visual C
- * Add examples/release/Makefile to test StarPU examples against an
- installed version of StarPU. That can also be used to test
- examples using a previous API.
- * Tutorial is installed in ${docdir}/tutorial
- * Schedulers eager_central_policy, dm and dmda no longer erroneously respect
- priorities. dmdas has to be used to respect priorities.
- * StarPU-MPI: Fix potential bug for user-defined datatypes. As MPI
- can reorder messages, we need to make sure the sending of the size
- of the data has been completed.
- * Documentation is now generated through doxygen.
- * Modification of perfmodels output format for future improvements.
- * Fix for properly dealing with NAN on Windows systems
- * Function starpu_sched_ctx_create() now takes a variable argument
- list to define the scheduler to be used, and the minimum and
- maximum priority values
- * The functions starpu_sched_set/get_min/max_priority set/get the
- priorities of the current scheduling context, i.e. the one which
- was set by a call to starpu_sched_ctx_set_context() or the initial
- context if the function was not called yet.
- * MPI: Fix of the livelock issue discovered while executing applications
- on a CPU+GPU cluster of machines by adding a maximum trylock
- threshold before a blocking lock.
- Small changes:
- * STARPU_NCPU should now be used instead of STARPU_NCPUS. STARPU_NCPUS is
- still available for compatibility reasons.
- * include/starpu.h includes all include/starpu_*.h files, applications
- therefore only need to have #include <starpu.h>
- * Active task wait is now included in blocked time.
- * Fix GCC plugin linking issues starting with GCC 4.7.
- * Fix forcing calibration of never-calibrated archs.
- * CUDA applications are no longer compiled with the "-arch sm_13"
- option. It is specifically added to applications which need it.
- * Explicitly name the non-sleeping-non-running time "Overhead", and use
- another color in vite traces.
- * Use C99 variadic macro support, not GNU.
- * Fix performance regression: dmda queues were inadvertently made
- LIFOs in r9611.
- StarPU 1.0.3 (git revision 25f8b3a7b13050e99bf1725ca6f52cfd62e7a861)
- ====================================================================
- Changes:
- * Several bug fixes in the build system
- * Bug fixes in source code for non-Linux systems
- * Fix generating FXT traces bigger than 64MiB.
- * Improve ENODEV error detections in StarPU FFT
- StarPU 1.0.2 (git revision 6f95de279d6d796a39debe8d6c5493b3bdbe0c37)
- ====================================================================
- Changes:
- * Add starpu_block_shadow_filter_func_vector and an example.
- * Add tag dependency in trace-generated DAG.
- * Fix CPU binding for optimized CPU-GPU transfers.
- * Fix parallel tasks CPU binding and combined worker generation.
- * Fix generating FXT traces bigger than 64MiB.
- StarPU 1.0.1 (git revision 97ea6e15a273e23e4ddabf491b0f9481373ca01a)
- ====================================================================
- Changes:
- * hwloc support. Warn users when hwloc is not found on the system and
- produce an error when not explicitly disabled.
- * Several bug fixes
- * GCC plug-in
- - Add `#pragma starpu release'
- - Fix bug when using `acquire' pragma with function parameters
- - Slightly improve test suite coverage
- - Relax the GCC version check
- * Update SOCL to use new API
- * Documentation improvement.
- StarPU 1.0.0 (git revision d3ad9ca318ec9acfeaf8eb7d8a018b09e4722292)
- ====================================================================
- The extensions-again release
- New features:
- * Add SOCL, an OpenCL interface on top of StarPU.
- * Add a gcc plugin to extend the C interface with pragmas which make it
- easy to define codelets and issue tasks.
- * Add reduction mode to starpu_mpi_insert_task.
- * A new multi-format interface makes it possible to use different binary
- formats on CPUs & GPUs, the conversion functions being provided by the
- application and called by StarPU as needed (and as rarely as
- possible).
- * Deprecate cost_model, and introduce cost_function, which is provided
- with the whole task structure, the target arch and implementation
- number.
- * Permit the application to provide its own size base for performance
- models.
- * Applications can provide several implementations of a codelet for the
- same architecture.
- * Add a StarPU-Top feedback and steering interface.
- * Allow specifying MPI tags for more efficient starpu_mpi_insert_task
- Changes:
- * Fix several memory leaks and race conditions
- * Make environment variables take precedence over the configuration
- passed to starpu_init()
- * Libtool interface versioning has been included in libraries names
- (libstarpu-1.0.so, libstarpumpi-1.0.so,
- libstarpufft-1.0.so, libsocl-1.0.so)
- * Install headers under $includedir/starpu/1.0.
- * Make where field for struct starpu_codelet optional. When unset, its
- value will be automatically set based on the availability of the
- different XXX_funcs fields of the codelet.
- * Define access modes for data handles into starpu_codelet and no longer
- in starpu_task. Hence mark (struct starpu_task).buffers as
- deprecated, and add (struct starpu_task).handles and (struct
- starpu_codelet).modes
- * Fields xxx_func of struct starpu_codelet are deprecated. One
- should use fields xxx_funcs instead.
- * Some types were renamed for consistency. When using pkg-config libstarpu,
- starpu_deprecated_api.h is automatically included (after starpu.h) to
- keep compatibility with existing software. Other changes are mentioned
- below, compatibility is also preserved for them.
- To port code to use new names (this is not mandatory), the
- tools/dev/rename.sh script can be used, and pkg-config starpu-1.0 should
- be used.
- * The communication cost in the heft and dmda scheduling strategies now
- takes into account the contention brought by the number of GPUs. This
- changes the meaning of the beta factor, whose default 1.0 value should
- now be good enough in most cases.
- Small features:
- * Allow users to disable asynchronous data transfers between CPUs and
- GPUs.
- * Update OpenCL driver to enable CPU devices (the environment variable
- STARPU_OPENCL_ON_CPUS must be set to a positive value when
- executing an application)
- * struct starpu_data_interface_ops --- operations on a data
- interface --- defines a new function pointer allocate_new_data
- which creates a new data interface of the given type based on
- an existing handle
- * Add a field named magic to struct starpu_task which is set when
- initialising the task. starpu_task_submit will fail if the
- field does not have the right value. This will hence avoid
- submitting tasks which have not been properly initialised.
- * Add a hook function pre_exec_hook in struct starpu_sched_policy.
- The function is meant to be called in drivers. Schedulers
- can use it to be notified when a task is about to be computed.
- * Add codelet execution time statistics plot.
- * Add bus speed in starpu_machine_display.
- * Add a STARPU_DATA_ACQUIRE_CB which allows inlining the code to be
- executed.
- * Add gdb functions.
- * Add complex support to LU example.
- * Allow using the same data several times in write mode in the
- parameters of the same task.
- Small changes:
- * Increase default value for STARPU_MAXCPUS -- Maximum number of
- CPUs supported -- to 64.
- * Add man pages for some of the tools
- * Add C++ application example in examples/cpp/
- * Add an OpenMP fork-join example.
- * Documentation improvement.
- StarPU 0.9 (git revision 12bba8528fc0d85367d885cddc383ba54efca464)
- ==================================================================
- The extensions release
- * Provide the STARPU_REDUX data access mode
- * Externalize the scheduler API.
- * Add theoretical bound computation
- * Add the void interface
- * Add power consumption optimization
- * Add parallel task support
- * Add starpu_mpi_insert_task
- * Add profiling information interface.
- * Add STARPU_LIMIT_GPU_MEM environment variable.
- * OpenCL fixes
- * MPI fixes
- * Improve optimization documentation
- * Upgrade to hwloc 1.1 interface
- * Add fortran example
- * Add mandelbrot OpenCL example
- * Add cg example
- * Add stencil MPI example
- * Initial support for CUDA4
- StarPU 0.4 (git revision ad8d8be3619f211f228c141282d7d504646fc2a6)
- ==================================================================
- The API strengthening release
- * Major API improvements
- - Provide the STARPU_SCRATCH data access mode
- - Rework data filter interface
- - Rework data interface structure
- - A script that automatically renames old functions to accommodate the new
- API is available from https://scm.gforge.inria.fr/svn/starpu/scripts/renaming
- (login: anonsvn, password: anonsvn)
- * Implement dependencies between tasks directly (i.e. without tags)
- * Implicit data-driven task dependencies simplify the design of
- data-parallel algorithms
- * Add dynamic profiling capabilities
- - Provide per-task feedback
- - Provide per-worker feedback
- - Provide feedback about memory transfers
- * Provide a library to help accelerate MPI applications
- * Improve data transfer overhead prediction
- - Transparently benchmark buses to generate performance models
- - Bind accelerator-controlling threads with respect to NUMA locality
- * Improve StarPU's portability
- - Add OpenCL support
- - Add support for Windows
- StarPU 0.2.901 aka 0.3-rc1 (git revision 991f2abb772c17c3d45bbcf27f46197652e6a3ef)
- ==================================================================================
- The asynchronous heterogeneous multi-accelerator release
- * Many API changes and code cleanups
- - Implement starpu_worker_get_id
- - Implement starpu_worker_get_name
- - Implement starpu_worker_get_type
- - Implement starpu_worker_get_count
- - Implement starpu_display_codelet_stats
- - Implement starpu_data_prefetch_on_node
- - Expose the starpu_data_set_wt_mask function
- * Support NVIDIA (heterogeneous) multi-GPU
- * Add the data request mechanism
- - All data transfers use data requests now
- - Implement asynchronous data transfers
- - Implement prefetch mechanism
- - Chain data requests to support GPU->RAM->GPU transfers
- * Make it possible to bypass the scheduler and to assign a task to a specific
- worker
- * Support restartable tasks to reinstantiate task dependency graphs
- * Improve performance prediction
- - Model data transfer overhead
- - One model is created for each accelerator
- * Support for CUDA's driver API is deprecated
- * The STARPU_WORKERS_CUDAID and STARPU_WORKERS_CPUID env. variables make it possible to
- specify where to bind the workers
- * Use the hwloc library to detect the actual number of cores
- StarPU 0.2.0 (git revision 73e989f0783e10815aff394f80242760c4ed098c)
- ====================================================================
- The Stabilizing-the-Basics release
- * Various API cleanups
- * Mac OS X is supported now
- * Add dynamic code loading facilities onto Cell's SPUs
- * Improve performance analysis/feedback tools
- * Applications can interact with StarPU tasks
- - The application may access/modify data managed by the DSM
- - The application may wait for the termination of a (set of) task(s)
- * Initial documentation is added
- * More examples are supplied
- StarPU 0.1.0 (git revision 911869a96b40c74eb92b30a43d3e08bf445d8078)
- ====================================================================
- First release.
- Status:
- * Only Linux platforms are supported so far
- * Supported architectures
- - multicore CPUs
- - NVIDIA GPUs (with CUDA 2.x)
- - experimental Cell/BE support
- Changes:
- * Scheduling facilities
- - run-time selection of the scheduling policy
- - basic auto-tuning facilities
- * Software-based DSM
- - transparent data coherency management
- - High-level expressive interface
- # Local Variables:
- # mode: text
- # coding: utf-8
- # ispell-local-dictionary: "american"
- # End:
|