CentOS 7, Torque single-node deployment - weixin_30648587's blog - Programmer Resources

Tags: operations  operating systems  databases

1. Preparation

 

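Before installing Torque you must first set the Linux host name. Most servers default to localhost, and using localhost as the host name is not recommended.

How to change the Linux host name: http://www.cnblogs.com/smbin/p/8488909.html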
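A minimal pre-flight check, as a sketch. The function below (hypothetical name `check_torque_hostname`) rejects the default localhost names and checks that the host name appears in a hosts file. On CentOS 7 the host name itself can be set with `hostnamectl set-hostname master`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if HOST is not a default "localhost" name
# and HOSTS_FILE maps it to some address.
check_torque_hostname() {
  local host="${1:-$(hostname)}"
  local hosts_file="${2:-/etc/hosts}"

  case "$host" in
    localhost|localhost.localdomain|localhost4*|localhost6*)
      echo "host name '$host' is a default localhost name; set a real one first" >&2
      return 1
      ;;
  esac

  # Field 1 of a hosts line is the address; the remaining fields are names.
  if ! awk -v h="$host" '$1 !~ /^#/ { for (i = 2; i <= NF; i++) if ($i == h) found = 1 } END { exit !found }' "$hosts_file"; then
    echo "host name '$host' is not listed in $hosts_file" >&2
    return 1
  fi
  echo "ok: $host"
}

# Example (as root on the target machine):
#   hostnamectl set-hostname master
#   check_torque_hostname master /etc/hosts
```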

 

Linux system: CentOS 7

Host name: master

System user: root

 

Official Torque download page: http://www.adaptivecomputing.com/support/download-center/torque-download/

Version the author downloaded: http://wpfilebase.s3.amazonaws.com/torque/torque-6.1.2.tar.gz

 

 

2. Installing and configuring Torque

 

First create a torque directory under /opt, then download the Torque tarball into it and extract it:

[root@master ]# cd /opt
[root@master ]# mkdir torque
[root@master ]# cd torque
[root@master torque]# wget http://wpfilebase.s3.amazonaws.com/torque/torque-6.1.2.tar.gz
......download output omitted
[root@master torque]# tar -zxvf torque-6.1.2.tar.gz
......extraction output omitted
[root@master torque]# cd torque-6.1.2/
[root@master torque-6.1.2]#

 

 

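Install the dependencies, then configure, build, and install Torque. The "master" setting links the host to PBS: master is the host name, and it is passed to configure as the default server.

[root@master torque-6.1.2]# yum install libxml2-devel openssl-devel gcc gcc-c++ boost-devel libtool -y          // libtool and -y need a space between them; written as "libtool-y", yum looks for a package of that name, hence "No package libtool-y available" below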
Loaded plugins: fastestmirror, langpacks
base                                                                                                                                                                                      | 3.6 kB  00:00:00     
extras                                                                                                                                                                                    | 3.4 kB  00:00:00     
mysql-connectors-community                                                                                                                                                                | 2.5 kB  00:00:00     
mysql-tools-community                                                                                                                                                                     | 2.5 kB  00:00:00     
mysql56-community                                                                                                                                                                         | 2.5 kB  00:00:00     
updates                                                                                                                                                                                   | 3.4 kB  00:00:00     
Determining fastest mirrors
 * base: mirrors.cn99.com
 * extras: mirrors.tuna.tsinghua.edu.cn
 * updates: mirrors.tuna.tsinghua.edu.cn
Package libxml2-devel-2.9.1-6.el7_2.3.x86_64 already installed and latest version
Package 1:openssl-devel-1.0.2k-8.el7.x86_64 already installed and latest version
Package gcc-4.8.5-16.el7_4.1.x86_64 already installed and latest version
Package gcc-c++-4.8.5-16.el7_4.1.x86_64 already installed and latest version
Package boost-devel-1.53.0-27.el7.x86_64 already installed and latest version
No package libtool-y available.
Nothing to do
[root@master torque-6.1.2]# ./configure --prefix=/usr/local/torque --with-scp --with-default-server=master
......configure output omitted
Building components: server=yes mom=yes clients=yes
                     gui=no drmaa=no pam=no
PBS Machine type    : linux
Remote copy         : /bin/scp -rpB
PBS home            : /var/spool/torque
Default server      : master

Unix Domain sockets : 
Linux cpusets       : no
Tcl                 : disabled
Tk                  : disabled
Authentication      : trqauthd

configure: WARNING: This compilation has strict compiler options enabled that cause
the build to fail if any compiler warnings are emitted.  If this build fails
because of a harmless warning, please report the problem to [email protected]org
and run configure again without --enable-gcc-warnings.

Ready for 'make'.
[root@master torque-6.1.2]# make
......build output omitted
[root@master torque-6.1.2]# make install
......install output omitted

  [root@master torque-6.1.2]# make packages
  Building packages from /opt/torque/torque-6.1.2/tpackages
  rm -rf /opt/torque/torque-6.1.2/tpackages
  mkdir /opt/torque/torque-6.1.2/tpackages
  Building ./torque-package-server-linux-x86_64.sh ...
  libtool: install: warning: remember to run `libtool --finish /usr/local/torque/lib'          // you need to run: libtool --finish /usr/local/torque/lib
  Building ./torque-package-mom-linux-x86_64.sh ...
  libtool: install: warning: remember to run `libtool --finish /usr/local/torque/lib'
  Building ./torque-package-clients-linux-x86_64.sh ...
  libtool: install: warning: remember to run `libtool --finish /usr/local/torque/lib'
  Building ./torque-package-devel-linux-x86_64.sh ...
  libtool: install: warning: remember to run `libtool --finish /usr/local/torque/lib'
  Building ./torque-package-doc-linux-x86_64.sh ...
  Done.

  The package files are self-extracting packages that can be copied
  and executed on your production machines. Use --help for options.
  [root@master torque-6.1.2]# libtool --finish /usr/local/torque/lib
  libtool: finish: PATH="/usr/lib/jvm/java-1.7.0-openjdk/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/root/bin:/usr/local/torque/bin:/usr/local/torque/sbin:/root/bin:/sbin" ldconfig -n /usr/local/torque/lib
  ----------------------------------------------------------------------
  Libraries have been installed in:
  /usr/local/torque/lib

  If you ever happen to want to link against installed libraries
  in a given directory, LIBDIR, you must either use libtool, and
  specify the full pathname of the library, or use the `-LLIBDIR'
  flag during linking and do at least one of the following:
  - add LIBDIR to the `LD_LIBRARY_PATH' environment variable
  during execution
  - add LIBDIR to the `LD_RUN_PATH' environment variable
  during linking
  - use the `-Wl,-rpath -Wl,LIBDIR' linker flag
  - have your system administrator add LIBDIR to `/etc/ld.so.conf'

  See any operating system documentation about shared libraries for
  more information, such as the ld(1) and ld.so(8) manual pages.
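The last option in the message above, adding LIBDIR to the dynamic linker configuration, can be made persistent without touching LD_LIBRARY_PATH. This is a sketch. It assumes CentOS 7's `/etc/ld.so.conf` includes `/etc/ld.so.conf.d/*.conf`, which is its default. The file name `torque.conf` and the function name `register_libdir` are my own choices.

```shell
#!/usr/bin/env bash
# Hypothetical helper: register a library directory with the dynamic linker.
# CONF_DIR defaults to /etc/ld.so.conf.d; pass another directory for a dry run.
register_libdir() {
  local libdir="$1"
  local conf_dir="${2:-/etc/ld.so.conf.d}"
  local conf_file="$conf_dir/torque.conf"

  # Append the directory only if it is not already listed (idempotent).
  if ! grep -qxF "$libdir" "$conf_file" 2>/dev/null; then
    echo "$libdir" >> "$conf_file"
  fi

  # Rebuild the linker cache only when writing the real system directory.
  if [ "$conf_dir" = "/etc/ld.so.conf.d" ]; then
    ldconfig
  fi
}

# Usage as root:
#   register_libdir /usr/local/torque/lib
```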

 

 

 

 Configure the services: pbs_server, pbs_sched, pbs_mom, and trqauthd

[root@master torque-6.1.2]# cp contrib/init.d/{pbs_{server,sched,mom},trqauthd} /etc/init.d/
[root@master torque-6.1.2]# for i in pbs_server pbs_sched pbs_mom trqauthd; do chkconfig --add $i; chkconfig $i on; done      // if prompted y/n, answer y and press Enter

 

 

Set the Torque environment variables

[root@master torque-6.1.2]# TORQUE=/usr/local/torque
[root@master torque-6.1.2]# echo "TORQUE=$TORQUE" >> /etc/profile
[root@master torque-6.1.2]# echo "export PATH=\$PATH:$TORQUE/bin:$TORQUE/sbin" >> /etc/profile
[root@master torque-6.1.2]# source /etc/profile
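Each time the `echo ... >> /etc/profile` lines are re-run, they append another copy of the same settings. The sketch below is an idempotent variant. It assumes a separate snippet at `/etc/profile.d/torque.sh`, which CentOS 7's /etc/profile sources for login shells, instead of editing /etc/profile. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write Torque's environment settings to a profile
# snippet. The file is overwritten on every run, so repeated runs do not
# pile up duplicate PATH entries.
write_torque_profile() {
  local torque_home="${1:-/usr/local/torque}"
  local target="${2:-/etc/profile.d/torque.sh}"

  # $torque_home expands now; \$PATH and \$TORQUE are written literally.
  cat > "$target" <<EOF
export TORQUE=$torque_home
export PATH=\$PATH:\$TORQUE/bin:\$TORQUE/sbin
EOF
}

# Usage as root, then start a new login shell or run:
#   write_torque_profile /usr/local/torque
#   source /etc/profile.d/torque.sh
```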

 

 

Initializing as root failed because the host name the server points to does not match the actual host name. I found no fix during installation. A fix found after installation is given at the end of this article.

[root@master torque-6.1.2]# ./torque.setup root          // try to initialize as root; error: pbs_server is already running
initializing TORQUE (admin: root)
pbs_server already running... run 'qterm' to stop pbs_server and rerun          // run qterm to stop the server
[root@master torque-6.1.2]# qterm                        // the server host name differs from the host name shown in the prompt, so qterm cannot stop it
Can not resolve name for server mastar. (rc = -2 - )
Cannot resolve specified server host 'mastar'.
qterm: could not connect to server '' (15010) Access from host not allowed, or unknown host
[root@master mom_priv]# ps -e | grep pbs          // find the process, then try to stop it with kill -9
30505 ?        00:00:00 pbs_server
[root@master mom_priv]# kill -9 30505
[root@master mom_priv]# ps -e | grep pbs
[root@master torque-6.1.2]# ./torque.setup root        // even with the server stopped, setup still fails: the server host name does not match the actual one. The configure step above was checked and is correct:
                           // './configure --prefix=/usr/local/torque --with-scp --with-default-server=master'. No fix found; I suspect something cached on the system.
initializing TORQUE (admin: root)              // for now the only workaround is to edit /etc/hosts

You have selected to start pbs_server in create mode.
If the server database exists it will be overwritten.
do you wish to continue y/(n)?y
Can not resolve name for server mastar. (rc = -2 - )
Cannot resolve specified server host 'mastar'.
qmgr: cannot connect to server (errno=15010) Access from host not allowed, or unknown host
ERROR: cannot set root@master in operators list
Can not resolve name for server mastar. (rc = -2 - )
Cannot resolve specified server host 'mastar'.
qterm: could not connect to server '' (15010) Access from host not allowed, or unknown host
[root@master torque-6.1.2]# vi /etc/hosts            // edit /etc/hosts
10.131.101.142 master
10.131.101.142 mastar        // add this line
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6

  [root@master torque-6.1.2]# ./torque.setup root            // this time it succeeds
  initializing TORQUE (admin: root)

  You have selected to start pbs_server in create mode.
  If the server database exists it will be overwritten.
  do you wish to continue y/(n)?y          // enter y
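The /etc/hosts workaround can be scripted so it is safe to run more than once. This is a minimal sketch. The function name is hypothetical, and the IP address and the alias "mastar" come from the session above, so substitute your own.

```shell
#!/usr/bin/env bash
# Hypothetical helper: add "IP NAME" to a hosts file unless NAME is
# already mapped to that IP. Defaults to /etc/hosts (requires root).
ensure_hosts_entry() {
  local ip="$1" name="$2" hosts_file="${3:-/etc/hosts}"

  if awk -v ip="$ip" -v n="$name" '$1 == ip { for (i = 2; i <= NF; i++) if ($i == n) found = 1 } END { exit !found }' "$hosts_file"; then
    return 0   # already present, nothing to do
  fi
  printf '%s %s\n' "$ip" "$name" >> "$hosts_file"
}

# Workaround from above: also map the unexpected server name "mastar".
#   ensure_hosts_entry 10.131.101.142 master
#   ensure_hosts_entry 10.131.101.142 mastar
```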

 

 

Start the pbs_server, pbs_sched, pbs_mom, and trqauthd services

[root@master torque-6.1.2]# qterm          // stop the server started by torque.setup
[root@master torque-6.1.2]# for i in pbs_server pbs_sched pbs_mom trqauthd; do service $i start; done
Starting pbs_server (via systemctl):                       [  OK  ]
Starting pbs_sched (via systemctl):                        [  OK  ]
Starting pbs_mom (via systemctl):                          [  OK  ]
Starting trqauthd (via systemctl):                         [  OK  ]
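This article uses the same `for ... service $i ...; done` loop for start, stop, and restart. The sketch below wraps it in a small function (hypothetical name `torque_services`) so the service list lives in one place. Setting `DRY_RUN=1` only prints the commands, which is a convenient way to check them before running as root.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the loop used in this article.
# ACTION is passed straight to `service` (start, stop, restart, status).
TORQUE_SERVICES=(pbs_server pbs_sched pbs_mom trqauthd)

torque_services() {
  local action="$1" svc
  for svc in "${TORQUE_SERVICES[@]}"; do
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "service $svc $action"
    else
      # Stop at the first failing service so the error is visible.
      service "$svc" "$action" || return 1
    fi
  done
}

# Usage as root:
#   torque_services start
#   DRY_RUN=1 torque_services stop   # print only
```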

 

 

 

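Specify the compute node

Add the compute node "master" and set its number of CPUs.

Check the number of CPUs with the lscpu or nproc command.

[root@master torque-6.1.2]# vi /var/spool/torque/server_priv/nodes
master np=8          // add this line; no spaces around the equals sign; master is the host name
[root@master torque-6.1.2]# vi /var/spool/torque/mom_priv/config
$pbsserver master        // add these two lines; MOM config parameters start with $; master is the host name
$logevent 255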
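You can build the nodes line from `nproc` instead of typing the core count by hand. This is a sketch, and the function name is hypothetical. By default it uses every core the kernel reports (ncpus=56 on the machine above), whereas the author chose np=8, so pass an explicit count if you want to reserve cores.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a server_priv/nodes line "NAME np=N".
# The np= attribute must not have spaces around "=".
nodes_line() {
  local name="${1:-$(hostname)}"
  local np="${2:-$(nproc)}"
  case "$np" in
    ''|*[!0-9]*|0) echo "np must be a whole number greater than 0, got '$np'" >&2; return 1 ;;
  esac
  printf '%s np=%s\n' "$name" "$np"
}

# Usage as root (overwrites the nodes file for this single-node setup):
#   nodes_line master 8 > /var/spool/torque/server_priv/nodes
```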

 

 

Check the PBS status

[root@master torque-6.1.2]# ps -e | grep pbs
11188 ?        00:00:00 pbs_sched
11215 ?        00:00:00 pbs_mom
29683 ?        00:00:00 pbs_server
[root@master torque-6.1.2]# for i in pbs_server pbs_sched pbs_mom trqauthd; do service $i restart; done
Restarting pbs_server (via systemctl):                     [  OK  ]
Restarting pbs_sched (via systemctl):                      [  OK  ]
Restarting pbs_mom (via systemctl):                        [  OK  ]
Restarting trqauthd (via systemctl):                       [  OK  ]

 

 

Create the default queue

[root@master torque-6.1.2]# qmgr -c 'create queue master'
[root@master torque-6.1.2]# qmgr -c 'set queue master queue_type= execution'
[root@master torque-6.1.2]# qmgr -c 'set queue master started= true'
[root@master torque-6.1.2]# qmgr -c 'set queue master enabled= true'
[root@master torque-6.1.2]# qmgr -c 'set queue master resources_default.walltime= 240:00:00'
[root@master torque-6.1.2]# qmgr -c 'set queue master resources_default.nodes= 1'
[root@master torque-6.1.2]# qmgr -c 'set server default_queue= master'
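The seven `qmgr -c` calls above can also be kept together as one directive list and fed to qmgr on standard input, one directive per line. This is a sketch. The function name is hypothetical, and the queue name and walltime default to the values used above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the qmgr directives for an execution queue
# that is also the server's default queue.
queue_directives() {
  local queue="${1:-master}" walltime="${2:-240:00:00}"
  cat <<EOF
create queue $queue
set queue $queue queue_type = execution
set queue $queue started = true
set queue $queue enabled = true
set queue $queue resources_default.walltime = $walltime
set queue $queue resources_default.nodes = 1
set server default_queue = $queue
EOF
}

# Usage as root (pbs_server must be running):
#   queue_directives master 240:00:00 | qmgr
# Inspect the result afterwards:
#   qmgr -c 'print queue master'
```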

 

 

 Submit a test job:

[root@master torque-6.1.2]# qnodes      // check the compute node status
master
     state = free
     power_state = Running
     np = 8
     ntype = cluster
     status = opsys=linux,uname=Linux master 3.10.0-514.el7.x86_64 #1 SMP Tue Nov 22 16:42:41 UTC 2016 x86_64,sessions=3154 3489 41105 41699,nsessions=4,nusers=3,idletime=3198,totmem=94868512kb,availmem=92195284kb,physmem=32367652kb,ncpus=56,loadave=0.85,gres=,netload=4005925534,state=free,varattr= ,cpuclock=Fixed,macaddr=68:cc:6e:c3:cf:87,version=6.1.2,rectime=1519980694,jobs=
     mom_service_port = 15002
     mom_manager_port = 15003
[root@master torque-6.1.2]# su master        // switch user: this master is a user name, not the host name
[master@master torque-6.1.2]$ echo sleep 10 | qsub
0.master
[master@master torque-6.1.2]$ qstat        // check job status
Job ID                    Name             User            Time Use S Queue
------------------------- ---------------- --------------- -------- - -----
0.master                  STDIN            master                 0 R master
[master@master torque-6.1.2]$ qstat -a -n      // job status, plus the node/core slot each job runs on (master/0 = core 0 of node master)

master:
                                                                         Req'd       Req'd       Elap
Job ID                  Username    Queue    Jobname          SessID  NDS   TSK   Memory      Time    S   Time
----------------------- ----------- -------- ---------------- ------ ----- ------ --------- --------- - ---------
0.master                master      master   STDIN             12470     1      1       --  240:00:00 C       --
   master/0
[master@master torque-6.1.2]$
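`echo sleep 10 | qsub` submits the command as a job named STDIN. For anything beyond a smoke test, a job script with `#PBS` directives is easier to read and reuse. The sketch below writes such a script. The function name, job name, and resource values are assumptions. Submit it as an ordinary user, as above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a minimal Torque job script to PATH.
# qsub reads the "#PBS" lines as options; bash treats them as comments.
write_job_script() {
  local path="$1" queue="${2:-master}"
  cat > "$path" <<EOF
#!/bin/bash
#PBS -N sleep_test
#PBS -q $queue
#PBS -l nodes=1:ppn=1,walltime=00:05:00
#PBS -j oe
cd "\${PBS_O_WORKDIR:-.}"   # qsub sets this to the submission directory
sleep 10
EOF
}

# Usage as a normal user:
#   write_job_script sleep.pbs master
#   qsub sleep.pbs
#   qstat -a -n
```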

 

 

 

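Fix for the mismatched host name:

 I never found the root cause. My guess is that an earlier Torque installation was not removed completely, so settings cached from the "Create the default queue" step were still present.

Once Torque has been installed successfully, stop it:

[root@master torque-6.1.2]# for i in pbs_server pbs_sched pbs_mom trqauthd; do service $i stop; done        // stop the services (the same loop as before, with start changed to stop)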
Stopping pbs_server (via systemctl):                       [  OK  ]
Stopping pbs_sched (via systemctl):                        [  OK  ]
Stopping pbs_mom (via systemctl):                          [  OK  ]
Stopping trqauthd (via systemctl):                         [  OK  ]
[root@master torque-6.1.2]# ./torque.setup root        // rerun this step
hostname: master
Currently no servers active. Default server will be listed as active server. Error  15133
Active server name: master  pbs_server port is: 15001
trqauthd daemonized - port /tmp/trqauthd-unix
trqauthd successfully started
initializing TORQUE (admin: root)

You have selected to start pbs_server in create mode.
If the server database exists it will be overwritten.
do you wish to continue y/(n)?y          // enter y
[root@master torque-6.1.2]# vi /var/spool/torque/server_priv/nodes
master np=8           // no spaces around =
[root@master torque-6.1.2]# qterm          // stop pbs_server
[root@master torque-6.1.2]# for i in pbs_server pbs_sched pbs_mom trqauthd; do service $i start; done        // start the services again
Starting pbs_server (via systemctl): [ OK ]
Starting pbs_sched (via systemctl): [ OK ]
Starting pbs_mom (via systemctl): [ OK ]
Starting trqauthd (via systemctl): [ OK ]

  [root@master torque-6.1.2]# qnodes          // check status; error: trqauthd is not running
  socket_connect_unix failed: 15137
  qnodes: cannot connect to server master, error=15137 (could not connect to trqauthd)
  [root@master torque-6.1.2]# for i in pbs_server pbs_sched pbs_mom trqauthd; do service $i restart; done        // restart the services
  Restarting pbs_server (via systemctl): [ OK ]
  Restarting pbs_sched (via systemctl): [ OK ]
  Restarting pbs_mom (via systemctl): [ OK ]
  Restarting trqauthd (via systemctl): [ OK ]
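Error 15137 above means the client could not reach trqauthd, possibly because trqauthd had not finished starting. Instead of restarting everything, one option is to retry the client command a few times with a short pause. This is an assumption, not what the author did. A sketch with a hypothetical `retry` helper:

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command up to TRIES times, sleeping DELAY
# seconds between failed attempts. Returns the last exit status.
retry() {
  local tries="$1" delay="$2" n=1 rc
  shift 2
  while true; do
    "$@" && return 0
    rc=$?
    if [ "$n" -ge "$tries" ]; then
      return "$rc"
    fi
    n=$((n + 1))
    sleep "$delay"
  done
}

# Usage after starting the services:
#   retry 5 2 qnodes
```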


[root@master torque-6.1.2]# qnodes      // check status: success
master
state = free
power_state = Running
np = 8
ntype = cluster
status = opsys=linux,uname=Linux master 3.10.0-514.el7.x86_64 #1 SMP Tue Nov 22 16:42:41 UTC 2016 x86_64,sessions=3154 3489 10903 41105 41699,nsessions=5,nusers=4,idletime=5287,totmem=94868512kb,availmem=92236268kb,physmem=32367652kb,ncpus=56,loadave=0.01,gres=,netload=8920006882,state=free,varattr= ,cpuclock=Fixed,macaddr=68:cc:6e:c3:cf:87,version=6.1.2,rectime=1519982783,jobs=
mom_service_port = 15002 mom_manager_port = 15003

 

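Reposted from: https://www.cnblogs.com/zhaosongbin/p/8492470.html

Copyright notice: This article is an original work by the blogger, licensed under CC 4.0 BY-SA. When reposting, include a link to the original source and this notice.
Article link: https://blog.csdn.net/weixin_30648587/article/details/96796533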
