Grid Engine

Release Notes Univa Grid Engine 8.1

License

TERM SOFTWARE LICENSE AND SUPPORT AGREEMENT

PLEASE READ THIS AGREEMENT BEFORE USING THE SOFTWARE.

BY USING THE SOFTWARE AND CLICKING OR CHOOSING ‘YES,’ YOU ARE AGREEING TO BE BOUND BY THIS AGREEMENT. - SIGNIFY YOUR AGREEMENT BY CLICKING OR CHOOSING ‘YES.’

IF YOU DO NOT WANT TO AGREE TO THIS AGREEMENT, CLICK OR CHOOSE ‘NO.’ - IF YOU CLICK OR CHOOSE ‘NO’ YOU CANNOT USE THE SOFTWARE.

This agreement is between the individual or entity agreeing to this agreement and Univa Corporation, a Delaware corporation (Univa) of 11044 Research Blvd. Suite B-415, Austin, TX 78759.

1. SCOPE: This agreement governs the licensing of the Univa Software and Support provided to Customer.

  • Univa Software means the Univa software described in the order, all updates and enhancements provided under Support, its software documentation, and license keys (Univa Software), which are licensed under this agreement. This Univa Software is only licensed and is not sold to Company.
  • Third-Party Software/Open Source Software licensing terms are addressed on the bottom of this agreement.

2. LICENSE. Subject to the other terms of this agreement, Univa grants Customer, under an order, a non-exclusive, non-transferable, term license up to the license capacity purchased to:

(a) Operate the Univa Software in Customer’s business operations; and

(b) Make a reasonable number of copies of the Univa Software for archival and backup purposes.

Customer’s contractors and majority owned affiliates are allowed to use and access the Univa Software under the terms of this agreement. Customer is responsible for their compliance with the terms of this agreement.

3. RESTRICTIONS. Univa reserves all rights not expressly granted. Customer is prohibited from:

(a) assigning, sublicensing, or renting the Univa Software or using it as any type of software service provider or outsourcing environment; or

(b) causing or permitting the reverse engineering (except to the extent expressly permitted by applicable law despite this limitation), decompiling, disassembly, modification, translation, attempting to discover the source code of the Univa Software or to create derivative works from the Univa Software.

4. PROPRIETARY RIGHTS AND CONFIDENTIALITY.

(a) Proprietary Rights. The Univa Software, workflow processes, designs, know-how and other technologies provided by Univa as part of the Univa Software are the proprietary property of Univa and its licensors, and all right, title and interest in and to such items, including all associated intellectual property rights, remain only with Univa. The Univa Software is protected by applicable copyright, trade secret, and other intellectual property laws. Customer may not remove any product identification, copyright, trademark or other notice from the Univa Software.

(b) Confidentiality. Recipient may not disclose Confidential Information of Discloser to any third party or use the Confidential Information in violation of this agreement.

(i) Confidential Information means all proprietary or confidential information that is disclosed to the recipient (Recipient) by the discloser (Discloser), and includes, among other things:

  • any and all information relating to Univa Software or Support provided by a Discloser, its financial information, software code, flow charts, techniques, specifications, development and marketing plans, strategies, and forecasts;
  • as to Univa the Univa Software and the terms of this agreement (including without limitation, pricing information).

(ii) Confidential Information excludes information that:

  • was rightfully in Recipient's possession without any obligation of confidentiality before receipt from the Discloser;
  • is or becomes a matter of public knowledge through no fault of Recipient;
  • is rightfully received by Recipient from a third party without violation of a duty of confidentiality;
  • is independently developed by or for Recipient without use or access to the Confidential Information; or
  • is licensed under an open source license.

Customer acknowledges that any misuse or threatened misuse of the Univa Software may cause immediately irreparable harm to Univa for which there is no adequate remedy at law. Univa may seek immediate injunctive relief in such event.

5. PAYMENT. Customer will pay all fees due under an order within 30 days of the invoice date, plus applicable sales, use and other similar taxes.

6. WARRANTY DISCLAIMER. UNIVA DISCLAIMS ALL EXPRESS AND IMPLIED WARRANTIES, INCLUDING WITHOUT LIMITATION THE IMPLIED WARRANTY OF TITLE, MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE UNIVA SOFTWARE MAY NOT BE ERROR FREE, AND USE MAY BE INTERRUPTED.

7. TERMINATION. Either party may terminate this agreement upon a material breach of the other party after a 30 days notice/cure period, if the breach is not cured during such time period. Upon termination of this agreement or expiration of an order, Customer must discontinue using the Univa Software, de-install it and destroy or return the Univa Software and all copies, within 5 days. Upon Univa request, Customer will provide written certification of such compliance.

8. SUPPORT INCLUDED. Univa technical support and maintenance services (Support) is included with the fees paid under an order. Univa may change its Support terms, but Support will not materially degrade during any paid term. More details on Support are located at www.univa.com/support

9. LIMITATION OF LIABILITY AND DISCLAIMER OF DAMAGES. There may be situations in which, as a result of material breach or other liability, Customer is entitled to make a claim for damages against Univa. In each situation (regardless of the form of the legal action (e.g. contract or tort claims)), Univa is not responsible beyond:

(a) the amount of any direct damages up to the amount paid by Customer to Univa in the prior 12 months under this agreement; and

(b) damages for bodily injury (including death), and physical damage to tangible property, to the extent caused by the gross negligence or willful misconduct of Univa employees while at Customer’s facility.

Other than for breach of the Confidentiality section by a party, the infringement indemnity, violation of Univa’s intellectual property rights by Customer, or for breach of Section 2 by Customer, in no circumstances is either party responsible for any (even if it knows of the possibility of such damage or loss):

(a) loss of (including any loss of use), or damage to: data, information or hardware;

(b) lost profits, business, or goodwill; or

(c) other special, consequential, or indirect damages

10. INTELLECTUAL PROPERTY INDEMNITY. If a third-party claims that Customer’s use of the Univa Software under the terms of this agreement infringes that party's patent, copyright or other proprietary right, Univa will defend Customer against that claim at Univa’s expense and pay all costs, damages, and attorney's fees, that a court finally awards or that are included in a settlement approved by Univa, provided that Customer:

(a) promptly notifies Univa in writing of the claim; and

(b) allows Univa to control, and cooperates with Univa in, the defence and any related settlement.

If such a claim is made, Univa could continue to enable Customer to use the Univa Software or to modify it. If Univa determines that these alternatives are not reasonably available, Univa may terminate the license to the Univa Software and refund any unused fees.

Univa’s obligations above do not apply if the infringement claim is based on the use of the Univa Software in combination with products not supplied or approved by Univa in writing or in the Univa Software, or Customer’s failure to use any updates within a reasonable time after such updates are made available.

This section contains Customer’s exclusive remedies and Univa’s sole liability for infringement claims.

11. GOVERNING LAW AND EXCLUSIVE FORUM. This agreement is governed by the laws of the State of Texas, without regard to conflict of law principles. Any dispute arising out of or related to this agreement may only be brought in the state and federal courts for Travis County, TX. Customer consents to the personal jurisdiction of such courts and waives any claim that it is an inconvenient forum. The prevailing party in litigation is entitled to recover its attorneys’ fees and costs from the other party.

12. MISCELLANEOUS.

(a) Inspection. Univa, or its representative, may audit Customer’s usage of the Univa Software at any Customer facility. Customer will cooperate with such audit. Customer agrees to pay within 30 days of written notification any fees applicable to Customer’s use of the Univa Software in excess of the license.

(b) Entire Agreement. This agreement, and all orders, constitute the entire agreement between the parties, and supersedes all prior or contemporaneous negotiations, representations or agreements, whether oral or written, related to this subject matter.

(c) Modification Only in Writing. No modification or waiver of any term of this agreement is effective unless signed by both parties.

(d) Non-Assignment. Neither party may assign or transfer this agreement to a third party, except that the agreement and all orders may be assigned upon notice as part of a merger, or sale of all or substantially all of the business or assets, of a party.

(e) Export Compliance. Customer must comply with all applicable export control laws of the United States, foreign jurisdictions and other applicable laws and regulations.

(f) US Government Restricted Rights. The Univa Software is provided with RESTRICTED RIGHTS. Use, duplication, or disclosure by the U.S. government or any agency thereof is subject to restrictions as set forth in subparagraph (c)(I)(ii) of the Rights in Technical Data and Computer Software clause at DFARS 252.227-7013 or subparagraphs (c)(1) and (2) of the Commercial Computer Software Restricted Rights at 48 C.F.R. 52.227-19, as applicable.

(g) Independent Contractors. The parties are independent contractors with respect to each other.

(h) Enforceability. If any term of this agreement is invalid or unenforceable, the other terms remain in effect.

(i) No PO Terms. Univa rejects additional or conflicting terms of a Customer’s form-purchasing document.

(j) No CISG. The United Nations Convention on Contracts for the International Sale of Goods does not apply.

(k) Survival. All terms that by their nature survive termination or expiration of this agreement, will survive.

Additional software specific licensing terms:

UniCloud Kits

  • Third Party Software means certain third-party software which is provided along with the Univa Software, and such software is licensed under the license terms located at: http://www.univa.com/resources/licenses
  • Open Source Software means certain open source software which is provided along with the Univa Software, and such software is licensed under the license terms located at: http://www.univa.com/resources/licenses

Grid Engine

  • Third Party Software means certain third-party software which is provided along with the Univa Software, and such software is licensed under the license terms located at: http://www.univa.com/resources/licenses
  • Open Source Software means certain open source software which is provided along with the Univa Software, and such software is licensed under the license terms located at: http://www.univa.com/resources/licenses

Rev: 03/09/2011

Fixes and Enhancements

Summary

Univa Grid Engine v8.1 support is available from http://univa.com/products/grid-engine.php

Here is a summary of what has changed since version 8.0.1:

  • Introduced a new configuration object: Job Classes. Job classes make it possible to:
    • specify job templates that can be used to create new jobs.
    • reduce the learning curve for users submitting jobs.
    • avoid errors during the job submission or jobs which may not fit site requirements.
    • ease the cluster management for system administrators.
    • provide more control to the administrator for ensuring jobs are in line with the cluster set-up.
    • define defaults for all jobs that are submitted into a cluster.
    • improve the performance of the scheduler component and thereby the throughput in the cluster.
  • Due to the Job Class enhancements, the qstat output has changed slightly compared to 8.0.1. qstat -ext|-urg|-pri shows an additional column with the name of the job class variant a job might have been derived from. qstat -j <jid> also shows this information, as does the corresponding XML output when these switches are combined with the -xml switch.
  • The processors attribute of the queue configuration has been removed.
  • Moved the decision about core binding from the execution host to the scheduler in order to guarantee a binding and make a better host selection.
  • A core binding request on the command line for a parallel job is now a per-host request (previously it was a per qrsh -inherit request). Core binding is now better supported for parallel jobs: when submitting with "-binding pe ...", the pe_hostfile now contains different core binding decisions for different hosts. Previously, the core selection of the master host was used for all slave nodes.
  • Added a new complex type, RSMAP, which allows a set of strings to be configured as resources. The job is mapped to the selected strings, which are available to the job through an environment variable (SGE_HGR_<rsmap>). The qstat -j output shows the selected strings as well. RSMAP supports all kinds of jobs: resource reservation, parallel jobs, and array jobs.
  • Added NUMA scheduling capabilities, which respect memory per NUMA node (using -mbind with -l m_mem_free) and set memory allocation modes on lx-amd64 hosts.
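  • Added new resource complexes (columns as in the complex configuration shown by qconf -sc):
name                shortcut   type        relop requestable consumable default  urgency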
m_cache_l1          mcache1    MEMORY      <=    YES         NO         0        0
m_cache_l2          mcache2    MEMORY      <=    YES         NO         0        0
m_cache_l3          mcache3    MEMORY      <=    YES         NO         0        0
m_mem_free          mfree      MEMORY      <=    YES         YES        0        0
m_mem_free_n0       mfree0     MEMORY      <=    YES         YES        0        0
m_mem_free_n1       mfree1     MEMORY      <=    YES         YES        0        0
m_mem_free_n2       mfree2     MEMORY      <=    YES         YES        0        0
m_mem_free_n3       mfree3     MEMORY      <=    YES         YES        0        0
m_mem_total         mtotal     MEMORY      <=    YES         YES        0        0
m_mem_total_n0      mmem0      MEMORY      <=    YES         YES        0        0
m_mem_total_n1      mmem1      MEMORY      <=    YES         YES        0        0
m_mem_total_n2      mmem2      MEMORY      <=    YES         YES        0        0
m_mem_total_n3      mmem3      MEMORY      <=    YES         YES        0        0
m_mem_used          mused      MEMORY      >=    YES         NO         0        0
m_mem_used_n0       mused0     MEMORY      >=    YES         NO         0        0
m_mem_used_n1       mused1     MEMORY      >=    YES         NO         0        0
m_mem_used_n2       mused2     MEMORY      >=    YES         NO         0        0
m_mem_used_n3       mused3     MEMORY      >=    YES         NO         0        0
m_numa_nodes        nodes      INT         <=    YES         NO         0        0
m_topology_inuse    utopo      RESTRING    ==    YES         NO         NONE     0
m_topology_numa     unuma      RESTRING    ==    YES         NO         NONE     0
  • The resource complex "m_mem_free" is both a consumable and a reported load value. For scheduling decisions the scheduler takes the minimum of the two into account. The consumable is initialized in the complex_values field of the host configuration. When used with core binding, a request for it is automatically turned into "m_mem_free_nX" requests (memory per socket (NUMA node) requests), depending on which cores the scheduler has chosen. These additional requests are called implicit requests.
  • The qstat -j output now contains implicit requests, which are displayed per host.
  • The qstat -j output for "binding" changed from a topology based output (e.g. "SCCcc") to a numerical output (e.g. 0,2:0,3). It shows all bindings for all hosts in the case of parallel jobs.
  • New scheduler parameter: PE_SORT_ORDER, which can be ASCENDING or DESCENDING. This determines the order in which PEs are traversed.
  • New scheduler parameter: PREFER_SOFT_REQUESTS, which can be true or false (default). If true, fulfilling soft requests is more important than an earlier start time in the case of reservation.
  • New fair urgency policy used to achieve an even distribution of jobs on resources (scheduler configuration attribute fair_urgency_list).
  • New sge_qmaster spooling method to a PostgreSQL database as an alternative to Berkeley DB spooling on NFS4.
  • Improved behaviour of the -masterq switch. The -masterq request is now always fulfilled. If it contradicts the allocation_rule of the parallel environment, the allocation_rule is obeyed and a further task is added automatically to fulfill the -masterq request.
  • Added templates for out of the box tight integration of the most common MPI implementations.
  • The execd parameter KEEP_ACTIVE was extended by the options "ERROR" and "ALWAYS". If set to ERROR, all job-relevant log files are sent to $SGE_ROOT/$SGE_CELL/faulty_jobs/$job_id if the job fails. If set to ALWAYS, the log files of all jobs are sent. KEEP_ACTIVE=ERROR is set on every default installation.
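
As an illustration of the m_mem_free and binding items in the list above, the following Python sketch shows how the numerical binding output could be read and which per-NUMA-node complexes an implicit request would refer to. It is a sketch under assumptions, not Univa code: it assumes that each colon-separated entry of a string such as "0,2:0,3" is a <socket>,<core> pair and that socket N corresponds to the complex m_mem_free_nN. The function names parse_binding, implicit_request_names and schedulable_memory are hypothetical.

```python
from typing import List, Tuple


def parse_binding(binding: str) -> List[Tuple[int, int]]:
    """Split a numerical binding string into (socket, core) pairs.

    Assumption: entries are separated by ':' and each entry is
    '<socket>,<core>', as in the example "0,2:0,3".
    """
    pairs = []
    for entry in binding.split(":"):
        socket, core = entry.split(",")
        pairs.append((int(socket), int(core)))
    return pairs


def implicit_request_names(pairs: List[Tuple[int, int]]) -> List[str]:
    """Name the per-NUMA-node complexes touched by the chosen cores.

    Assumption: socket N maps to the complex m_mem_free_nN.
    """
    sockets = sorted({socket for socket, _ in pairs})
    return [f"m_mem_free_n{socket}" for socket in sockets]


def schedulable_memory(consumable_left: float, reported_free: float) -> float:
    """m_mem_free is both a consumable and a load value; use the minimum."""
    return min(consumable_left, reported_free)


if __name__ == "__main__":
    cores = parse_binding("0,2:0,3")
    print(cores)
    print(implicit_request_names(cores))
    print(schedulable_memory(8.0, 6.5))
```

With the binding from the example, both cores sit on socket 0, so under these assumptions only m_mem_free_n0 would be involved. How the requested memory is divided between several sockets is not described here and is deliberately left out of the sketch.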

List of Fixes and Enhancements

Univa Grid Engine 8.1.0

GE-1412	additional pseudo variable $sge_root for pe definition
GE-1926	no info messages in execd messages file on aix
GE-2418	qrsh fails with 'connection refused' error message
GE-2601	multiple occurrence of same compile parameters in aimk
GE-2603	qsub option -q breaks -masterq
GE-2643	accounting and online usage of jobs are wrong on aix
GE-2841	submit(1) man page reports that qrsh does not support -display option
GE-3132	Job validation behaviour changed since 6.0 / 6.1
GE-3214	manpage queue_conf does not fully describe 'slots' notation
GE-3265	array jobs with PE and dependencies killing qmaster
GE-3299	On Windows Vista Enterprise, sgeexecd can fail to start up at boot time
GE-3302	net continue SGE_Helper_Service.exe STOPS the service
GE-3304	no accounting information for Windows GUI jobs
GE-3354	Cache sizes and cache topology should be reported by GE execution hosts per default
GE-3363	new spooling method writing data into a relational database
GE-3364	evaluate / fix / improve the spooling performance tests
GE-3365	create a prototype for database spooling using postgres
GE-3373	Create a default parallel environment for OpenMP jobs (pe_slots), which is available right after installation
GE-3386	Multi-core NUMA awareness and binding. 
GE-3390	qrsh does not forward necessary environment variables
GE-3403	Add support to Grid Engine for GPUs
GE-3409	Out of Box support for MPI Libraries - likely OpenMPI
GE-3414	sge_execd sometimes hangs during daemonization
GE-3440	file descriptor -1 passed to system calls in interactive job support
GE-3441	change default shell_start_mode from posix_compliant to unix_behavior in global config and in default queues (all.q)
GE-3456	adding a new complex type "resource map" RSMAP
GE-3474	PDC_INTERVAL=NEVER does not work
GE-3480	qstat -xml -j output changes after a job is altered with qalter -l
GE-3483	Within JSV it is not possible to distinguish if -v or -V was used
GE-3484	submit client and host is not available in JSV
GE-3498	remove reporting_param log_consumables
GE-3503	on Windows, the loadcheck.exe binary output misses a line break
GE-3505	fstype binary doesn't detect NFS4 on Linux 
GE-3509	qacct segfaults when bootstrap file was not found
GE-3512	gdi_retries option shall also have effect on sending gdi requests
GE-3514	Pass data as part of GDI return value
GE-3546	adjust JC from JSV
GE-3575	qmaster can't read spooled jobs after a hot upgrade to version 8.0.0
GE-3583	Cleanup: Move sgeijs-lib in clients/common
GE-3591	sge_shepherd might not deliver a signal because "remaining_alarm" might be 0
GE-3595	output np_load_avg instead of load_avg in qstat -f and qhost
GE-3603	RSMAP complex must be compatible with a per JOB consumable
GE-3604	Qconf man pages have wrong object_spec for Resource Quotas
GE-3607	62u5 clients causes a segmentation fault of a 8.0.0 qmaster
GE-3609	CMDNAME needs to be documented in the wiki
GE-3611	adding qstat resource map output to qstat -xml output
GE-3612	add new version 8.1.0alpha1
GE-3620	adding memory per NUMA node reporting
GE-3623	Add all inherited environment variables of qrsh to documentation and man-page
GE-3627	Limit number of multi GDI get requests in qmaster
GE-3628	Make it possible to disable sending of environment variables in combination with qstat -j requests
GE-3629	qstat -j * should show only own jobs per default
GE-3630	documentation shall contain a matrix listing the rights of user/operator/manager for the different commands
GE-3637	wrong header file might be included when GE is compiled with UGE extensions
GE-3638	ship the CUDA load sensor with Grid Engine
GE-3640	GUI- and text installer both don't show the trial license when they are compiled in trial mode
GE-3642	Make it possible to extract source code documentation with aimk
GE-3643	queue/job error states should be explained in more detail
GE-3648	About-Dialog of qmon is not readable
GE-3651	Better description of master and slave queues in Parallel Environments.
GE-3656	add description of UGE_Starter_Service.exe to documentation
GE-3659	add a way to control error state behavior of DRMAA jobs
GE-3673	fix Insure errors reported for IJS in 8.0.1alpha2
GE-3674	make qrsh without command work with Insure instrumented sge_shepherd, qrsh_starter, qsh
GE-3682	add support for kernel 3.0
GE-3686	scheduler monitoring enable/disable is not clear
GE-3695	sharetree man page doesn't explain internal nodes
GE-3708	bundle a script which adds the complexes reported by the CUDA load sensor to the complex configuration
GE-3713	qrsh (without command) fails if something is entered in qrsh client during job start
GE-3715	online job usage is not reported on AIX
GE-3718	on Windows, if run as the local Administrator, qstat prints 'invalid user name "Administrator"'
GE-3722	qsub -sync y and drmaa clients on AIX cannot connect to sge_qmaster
GE-3724	FDs are closed too fast which leads to problems in combination with nsswitch module from BeyondTrust
GE-3731	Java DRMAA Error : can't send response for this message id - protocol error
GE-3743	prevent sge_execd to crash when "/" is not a directory and in out of memory scenarios
GE-3746	pe array jobs put queues in error state with a file not found error for the job script
GE-3748	scheduler has to make the decision about core selection in case of core binding
GE-3755	tightly integrated parallel jobs are not correctly handled when suspending/unsuspending queues
GE-3756	qalter -verify is not shown in qalter -help output
GE-3757	reduce impact of qstat -j "*" on qmaster in clusters with many jobs
GE-3758	pe tasks of tightly integrated parallel jobs are never suspended
GE-3760	"qsub -pty yes -b y tty" exits with an exit code 3
GE-3761	security hole in UGE when setting LD_PRELOAD or LD_LIBRARY path
GE-3763	enhance loadcheck -cb with memory binding capability checks and memory binding testing
GE-3764	shepherd consumes 100% CPU if IJS does not use builtin as starter method
GE-3765	documentation of XFILESEARCHPATH in qmon man page and Qmon config file missing 
GE-3767	the output of qsub -pty yes jobs is not written to the jobs output file
GE-3768	a pty is created for the pre- and post-commands of any -pty yes job
GE-3773	qrsh -pty yes fails if invoked within a qsub-job
GE-3775	UGE crash with error in qmaster message: got NULL element for JB_type
GE-3781	calling JSV "jsv_set_param binding_exp_n 0" twice segfaults qsub or qmaster
GE-3783	Missing field in mpich template
GE-3785	bash shell functions are not properly transferred in the environment
GE-3790	after job end sge_shepherd processes stay running
GE-3792	auto installation fails if EXEC_HOST_LIST points to a file containing host names
GE-3793	the scheduler thread can be stopped by a normal user
GE-3810	resource reservation does not work correctly with serial jobs
GE-3812	user can be added to multiple departments but it should be denied
GE-3829	importance of soft requests in resource reservations should be higher than earlier start time (make it configurable)
GE-3831	need a "fair urgency" policy
GE-3832	Execd init script does not stop sge_execd daemon on non MacOS hosts
GE-3839	getJobProgramStatus call in drmaa is throwing DRMCommunicationException
GE-3841	possible buffer overflow in command line parsing of sgepasswd
GE-3846	builtin qrsh <command> fails if data still has to be transferred after job end
GE-3847	the 48 core limit in the test binaries should apply only to running execution hosts, not to all that are registered at the QMaster
GE-3848	qstat -fjc
GE-3849	document required mount options for the spooling shared file system
GE-3857	People allowed to create AR's is not documented.
GE-3860	parallel environment selection order in case of wildcards pes should be configurable
GE-3863	host_alias file is not covered in documentation
GE-3866	qsub -w p $SGE_ROOT/examples/jobs/sleeper.sh is crashing
GE-3887	unhelpful error message in qmaster messages file, regarding sharetree
GE-3900	both man page and documentation must explain how the new -masterq switch works
GE-3903	sge_aliases file is not backed up and restored
GE-3904	man page sge_conf is unclear about the meaning of projects/xprojects
GE-3905	Reconcile the Archimedes Hadoop docs with the current Grid Engine Hadoop docs
GE-3907	slave tasks are wrongly scheduled to the master queue
GE-3918	qalter man page: wrong tag is used for italic text
GE-3921	qmod man page should mention that only the master task of a parallel job gets suspended
GE-3923	GUI Installer must support Postgres spooling method
GE-3924	if "-soft -q <queue>" is specified, "-masterq" doesn't prevent slave tasks from being scheduled to the master queue
GE-3925	Support chkconfig Tool for RHEL Variants
GE-3926	qstat -j "*" -xml is crashing
GE-3932	can delete a complex attribute which is referenced in the load_formula by shortcut
GE-3943	'long_term_usage' in the 'users' spool object is not spooled when QMaster is shut down
GE-3944	sgeexecd script returns 0 even if it was not able to stop the execution daemon because of missing permissions
GE-3949	Extend KEEP_ACTIVE execd param
GE-3950	the fstype binary doesn't report the right file system types anymore
GE-3952	test_sge_lock_fifo test binary throws a floating point exception from time to time
GE-3954	'job duration is longer than duration of AR' message is no longer printed if requested job runtime exceeds AR duration
GE-3955	the caller of qalter, not the job owner, is checked against the ACL of a project
GE-3956	the argument of the '-r' option of qsub is ignored, if '-r' is provided, it is assumed it is always set to 'yes'.
GE-3957	remove the warning "Job Done" in the execd messages file
GE-3958	qresub of a job in hold state doesn't clear the hold state for the new job
GE-3959	make berkeley db spooling platform independent
GE-3960	update openssl used for CSP mode to openssl-1.0.0
GE-3961	update Berkeley DB to current version 5.3.21
GE-3962	job hold doesn't ignore jobs that are not currently in the system
GE-3963	the halflife_decay_list parser doesn't reject invalid separators
GE-3966	qmon works only partly on lx-amd64
GE-3971	qsub option "-shell <y[es]|n[o]>" is ignored
GE-3972	Releasing jobs with qrls -u <username> aborts all held jobs of this user
GE-3973	failing DRMAA jobs are always set to error state, even if SGE_DRMAA_ENABLE_ERROR_STATE is not set
GE-3974	Remove 'Deprecated' Message on SHARETREE_RESERVED_USAGE option in sge_conf
GE-3975	Remove Hadoop integration from default install
GE-3978	qalter -cwd does not change the working directory of a job
GE-3979	qalter -c does not change the time when a job should be checkpointed
GE-3980	the sge_conf(5) man page doesn't mention the key word "infinity" for resource limits
GE-3982	adding support for different core binding decisions (selected processors) to PE hostfile
GE-3984	for PE jobs different core bindings on different hosts should be displayed in the qstat output
GE-3988	qalter prints error message but alters the job
GE-3991	SUBMIT_HOST parameter in JSV is missing
GE-3992	on AIX, win and 32 bit linux, qstat prints out error message: PE_RANGE_ALG=bin is not a vaild parameter, qconf is not working
GE-3993	Interactive jobs have no name
GE-3994	cannot start a 32bit drmaa application on Solaris when CSP mode is installed
GE-3998	it is not possible to change mbind with qalter
GE-4000	abort of qmaster when job derived from a JC changes PE request
GE-4001	qalter -pe shows incorrect message in case of success
GE-4002	classic spooling of job fails sometimes when pe name or range of a parallel job is changed
GE-4004	csp mode is broken on multiple platforms like aix51, hp11-ia64
GE-4006	DISPLAY variable is sent as part of the 'full environment' although -V is not used
GE-4014	Set KEEP_ACTIVE=ERROR as default
GE-4015	adding sections about RSMAP and mbind in man pages
GE-4017	qconf -mc always shows message about m_topology_numa
GE-4018	fair urgency policy is broken after some time
GE-4025	qsub/qalter -mods is not able to change CMDARG elements of a job
GE-4029	qstat -xml -j <not_existing_jid> prints invalid XML
GE-4031	when a tightly integrated job is rescheduled with qmod -rj the slave tasks are not signalled
GE-4032	qmaster crashes when a parallel job requests soft resources
GE-4036	adding a execution daemon parameter in order to disable reporting of m_mem_free
GE-4038	backup/restore mechanism of inst_sge does not restore all files
GE-4039	sge_execd silently ignores duplicate job delivery
GE-4041	restore with spooling method postgres should clear the database
GE-4044	make the cuda loadsensor Makefile available in the load sensor directory
GE-4051	wrong message when resubmitting a job with hold
GE-4052	test_enumeration binary crashes because it calls lFreeDescr() with NULL pointer
GE-4053	setting SGE_BINDING variable in any case of binding
GE-4056	epilog with exit status 1 doesn't set queue in error state
GE-4057	adding SGE 6.5u5 compatibility wrapper script for qhost which transforms non existent "-cb" switch
GE-4060	qalter man page is lacking description of the -clearp option
GE-4061	'qrsh -inherit -v ENVVAR=value' doesn't transfer the value to the execution daemon
GE-4062	It is possible to add not existing queue references to a job via -soft -q of qalter/qsub.
GE-4063	qalter -mods q_soft/q_hard does not change the specified queue name
GE-4064	qresub as non deadline user fails
GE-4065	Deadline and start time of a job cannot be cleared with qsub/qalter -clearp
GE-4070	restore (inst_sge -rst) is failing with Postgres spooling to a remote database server
GE-4072	qstat -xml output contains invalid tag <context list> when job context is changed with qalter.
GE-4073	qstat -xml output might contain a tag without a name if the list of mail recipients is changed
GE-4078	add scripts which create a OpenMPI rankfile out of the pe_hostfile
GE-4081	low level spooling info or error messages may not be propagated to clients (e.g. qsub)
GE-4086	the qmaster generates wrong m_mem_used complex value if it doesn't exist
GE-4088	parallel job which could run is not started when using job exclusive complex
GE-4092	job classes bug when submitting binary job
GE-4094	when qmaster is started on command line in debug mode unnecessary information is printed out at the beginning
GE-4096	provide a script for reformatting an accounting file to reporting file
GE-4098	misleading error message when initializing the Postgres database during installation
GE-4104	support non standard port for postgres spooling
GE-4107	user_lists in job classes does not check existence of ACL if this ACL is preceded by user entry
GE-4108	strange error message when showing job class variant with qconf -sjc
GE-4117	free and used memory per NUMA node is wrong for multi-socket hosts

Univa Grid Engine 8.1.1

TS-531	create testsuite test for GE-4122	
TS-501	jsv_ge_mod test is broken in rulevel 1	
GE-4162	automatically create per socket memory consumables (m_mem_free_nX) for more than 4 sockets as soon as bigger exec hosts connect to master	
GE-4156	-mbind nlocal with parallel jobs does not decrement per socket memory correctly	
GE-4152	GUI installer has problems with installation in case of BDB spooling	
GE-4148	qalter using job names instead of JIDs to identify jobs causes job names to be reset to (null)
GE-4144	very short qrsh job sometimes seems to fail	
GE-4133	the qrsh client hangs for parallel jobs if consumables are requested	
GE-4129	past usage information of tightly integrated jobs can get lost at qmaster restart with postgres spooling	
GE-4121	qrsh sometimes crashes at job end	
GE-4113	qsub -help output shows -pty not in alphabetical order
GE-3919	GE-3386 adding a possibility to express that amount of cores can also be amount of threads depending on the selected execution host	
GE-3769	GE-3386 provide a technical paper which discusses all memory requests UGE offers for sequential and parallel jobs

Fixed with the first patch (UGE 8.1.1p1)

GE-4199 qstat -j does not return with an error code if no job can be selected according to a specified job name or pattern	

Univa Grid Engine 8.1.2

GE-4198 qmon crash if user tries to submit an array job
GE-4196 qmon opens an error message dialog at job submission -> job submission is not possible
GE-4195 The JB_tgt job attribute is cleared when using the AFS security model
GE-4172 qmaster crash if qsub job class is specified with a binary submission but without -b y switch
GE-4151 add option for triggering a scheduling run without writing the schedd_runlog file
GE-4150 too slow reservation scheduling
GE-3776 jobs submitted from an old qsub binary are accepted and might crash qmaster

Univa Grid Engine 8.1.3

GE-4272 wrong DRMAA errno after session reconnect and status check of an array job which is in hold
GE-4266 tightly integrated jobs do not set SGE_HGR_ for slave tasks correctly
GE-4244 KEEP_ACTIVE=ERROR does not transfer all files and doesn't transfer files to the right location
GE-4243 qmaster crashes when KEEP_ACTIVE=ERROR and the path to $SGE_ROOT/$SGE_CELL/faulty_jobs is very short
GE-4231 Non-LSB RedHat Installations Fail Init Service Install
GE-4226 RSMAP id is not freed in certain exceptional error conditions
GE-4211 qmaster becomes unresponsive when submitting a large job net
GE-4201 event client id's are not always reused
GE-4199 qstat -j does not return with an error code if no job can be selected according to a specified job name or pattern
GE-4194 improve scheduling performance with exclusive complexes
GE-4147 GE-4135 enhance RSMAP consumable so that it can be a per HOST consumable
GE-4143 qstat -u @unix_group does not work
GE-4137 GE-4135 create an installation script for Intel Phi load sensor complexes
GE-4136 GE-4135 create a load sensor which reports metrics about Intel Phi (MIC) co-processor cards
GE-4101 resource reservation for a (job) exclusive consumable does not work
GE-4058 The KEEP_ACTIVE=ERROR functionality does not always transfer the generated_job_script
GE-4023 GE-4135 adding support for binding a job near a specific (by the scheduler granted) PCIe device for better I/O rate
GE-3433 Add time stamp to debug output
GE-3313 on Windows hosts reporting their fully qualified hostname, certificates got wrong names
GE-2894 SGE 6.2u2 Beta execd install failed on Windows 2008 server
GE-2842 listener threads get stuck in cl_commlib_receive_message

Supported Platforms and Upgrade Notes

Supported Platforms

Univa Grid Engine 8.1 supports various hardware architectures and versions of operating systems.

Supported Platforms, Operating Systems and Architectures
Operating System	Version	Architecture
SLES	10, 11	x86, x86-64
RHEL	4-5.6, 6-6.3	x86, x86-64
CentOS	4-5.6, 6-6.3	x86, x86-64
Oracle Linux	4-5.6, 6-6.3	x86, x86-64
Ubuntu Server	10.04LTS-10.10	x86, x86-64
Microsoft Windows [1]	Vista, Server 2008 R1, HPC Server 2003, XP SP3 (x86 only)	x86, x86-64
Oracle Solaris	9, 10	x86_64
HP-UX	11.0 or higher	32 and 64 bit
IBM AIX	5.3, 6.1 or later	64 bit

[1] Hosts running the Microsoft Windows operating system cannot be used as master or shadow hosts.

Upgrade Requirements

This is a summary of the Upgrade Matrix, which describes how to move from Sun or Oracle Grid Engine 6.2uX or Univa Grid Engine 8.0.X to Univa Grid Engine 8.1 when you currently use classic or BDB local spooling. If the version of Grid Engine you are running is not listed in this overview, consult the full Upgrade Matrix in the section Updating Univa Grid Engine of the Installation Guide.

Upgrading from SGE, OGE and UGE 8.0.X to UGE 8.1.X
Version	Upgrade Method
Sun Grid Engine 6.2u5	Backup/Restore
Sun Grid Engine 6.2u4	Upgrade to SGE 6.2u5 and then Backup/Restore
Sun Grid Engine 6.2u3	Upgrade to SGE 6.2u5 and then Backup/Restore
Sun Grid Engine 6.2u2	Upgrade to SGE 6.2u5 and then Backup/Restore
Sun Grid Engine 6.2u1	Upgrade to SGE 6.2u5 and then Backup/Restore
Sun Grid Engine 6.2 FCS	Upgrade to SGE 6.2u5 and then Backup/Restore
Oracle Grid Engine 6.2u6	Backup/Restore
Oracle Grid Engine 6.2u7	Backup/Restore
Univa Grid Engine 8.0.X	Backup/Restore


For any upgrade, the backup/restore mechanism outlined in the section Clone Configuration of the Installation Guide must be used to move to Univa Grid Engine 8.1.
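
For sites that script their upgrade planning, the summary table can be expressed as a simple lookup. The following is only an illustrative sketch in Python: the function name upgrade_method and the way products and versions are spelled as strings are assumptions made for this example and are not part of Univa Grid Engine or its installer. A result of None means the version is not covered by this summary and the full Upgrade Matrix in the Installation Guide should be consulted.

```python
# Illustrative sketch only: a lookup over the upgrade summary table.
# The names below are hypothetical and not part of any Grid Engine tool.

DIRECT = "Backup/Restore"
VIA_62U5 = "Upgrade to SGE 6.2u5 and then Backup/Restore"

# (product, version) -> upgrade method, transcribed from the summary table.
UPGRADE_METHODS = {
    ("Sun Grid Engine", "6.2u5"): DIRECT,
    ("Sun Grid Engine", "6.2u4"): VIA_62U5,
    ("Sun Grid Engine", "6.2u3"): VIA_62U5,
    ("Sun Grid Engine", "6.2u2"): VIA_62U5,
    ("Sun Grid Engine", "6.2u1"): VIA_62U5,
    ("Sun Grid Engine", "6.2 FCS"): VIA_62U5,
    ("Oracle Grid Engine", "6.2u6"): DIRECT,
    ("Oracle Grid Engine", "6.2u7"): DIRECT,
}


def upgrade_method(product, version):
    """Return the upgrade method to UGE 8.1.X, or None if not in the summary."""
    # The table lists Univa Grid Engine 8.0.X as a wildcard entry.
    if product == "Univa Grid Engine" and version.startswith("8.0."):
        return DIRECT
    return UPGRADE_METHODS.get((product, version))


if __name__ == "__main__":
    for product, version in [("Sun Grid Engine", "6.2u3"),
                             ("Univa Grid Engine", "8.0.1"),
                             ("Sun Grid Engine", "6.1")]:
        method = upgrade_method(product, version)
        print(product, version, "->", method or "see the full Upgrade Matrix")
```

The lookup deliberately returns None instead of guessing a path for unlisted versions, since this summary only covers the classic and BDB local spooling cases named above.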

Known Issues and Limitations

Slotwise Preemption

If you used slotwise preemption with parallel jobs in OGE 6.2u6 or OGE 6.2u7, be aware that Univa Grid Engine does not support this combination.