Tags: Hadoop
[hadoop@hadoop004 hive-1.1.0-cdh5.7.0]$ which lzop
/bin/lzop
[hadoop@hadoop004 data]$ lzop -v page_views_big.dat
[hadoop@hadoop004 data]$ ls -lah
total 1.4G
drwxrwxr-x 2 hadoop hadoop 4.0K Apr 21 18:29 .
drwx------ 12 hadoop hadoop 4.0K Apr 22 01:14 ..
-rw-rw-r-- 1 hadoop hadoop 304 Apr 21 18:29 live.txt
-rw-r--r-- 1 root root 455M Apr 19 12:08 login.log
-rw-rw-r-- 1 hadoop hadoop 599M Apr 19 18:08 page_views_big.dat
-rw-rw-r-- 1 hadoop hadoop 285M Apr 19 18:08 page_views_big.dat.lzo
-rw-r--r-- 1 root root 19M Apr 18 20:47 page_views.dat
-rw-rw-r-- 1 hadoop hadoop 44 Apr 18 19:55 wc.txt
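From the listing above, lzop shrank the 599 MB source file to 285 MB. A quick sanity check of the ratio, using the sizes reported by `ls -lah`:

```python
# Compression ratio of page_views_big.dat after lzop,
# using the approximate sizes from the `ls -lah` output above.
original_mb = 599
compressed_mb = 285

ratio = compressed_mb / original_mb
print(f"compressed to {ratio:.1%} of the original size")
```

So LZO cuts this text dataset roughly in half, which is typical for LZO: it trades compression ratio for speed.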
[hadoop@hadoop004 maven_repo]$ cd ~/software/
[hadoop@hadoop004 software]$ cd hadoop-lzo/
[hadoop@hadoop004 hadoop-lzo]$ mvn clean package -Dmaven.test.skip=true
[hadoop@hadoop004 target]$ ll
total 456
drwxrwxr-x 2 hadoop hadoop 4096 Apr 19 18:43 antrun
drwxrwxr-x 5 hadoop hadoop 4096 Apr 19 18:43 apidocs
drwxrwxr-x 5 hadoop hadoop 4096 Apr 19 18:43 classes
drwxrwxr-x 3 hadoop hadoop 4096 Apr 19 18:43 generated-sources
-rw-rw-r-- 1 hadoop hadoop 188970 Apr 19 18:43 hadoop-lzo-0.4.21-SNAPSHOT.jar
-rw-rw-r-- 1 hadoop hadoop 184565 Apr 19 18:43 hadoop-lzo-0.4.21-SNAPSHOT-javadoc.jar
-rw-rw-r-- 1 hadoop hadoop 52024 Apr 19 18:43 hadoop-lzo-0.4.21-SNAPSHOT-sources.jar
drwxrwxr-x 2 hadoop hadoop 4096 Apr 19 18:43 javadoc-bundle-options
drwxrwxr-x 2 hadoop hadoop 4096 Apr 19 18:43 maven-archiver
drwxrwxr-x 3 hadoop hadoop 4096 Apr 19 18:43 native
drwxrwxr-x 3 hadoop hadoop 4096 Apr 19 18:43 test-classes
[hadoop@hadoop004 target]$ cp hadoop-lzo-0.4.21-SNAPSHOT.jar ~/app/hadoop-2.6.0-cdh5.7.0/share/hadoop/common/
[hadoop@hadoop004 common]$ ll
total 5548
-rw-r--r-- 1 hadoop hadoop 3411839 Apr 10 01:41 hadoop-common-2.6.0-cdh5.7.0.jar
-rw-r--r-- 1 hadoop hadoop 1892451 Apr 10 01:41 hadoop-common-2.6.0-cdh5.7.0-tests.jar
-rw-rw-r-- 1 hadoop hadoop 188970 Apr 19 18:47 hadoop-lzo-0.4.21-SNAPSHOT.jar
-rw-r--r-- 1 hadoop hadoop 161018 Apr 10 01:41 hadoop-nfs-2.6.0-cdh5.7.0.jar
drwxr-xr-x 2 hadoop hadoop 4096 Apr 10 01:41 jdiff
drwxr-xr-x 2 hadoop hadoop 4096 Apr 10 01:41 lib
drwxr-xr-x 2 hadoop hadoop 4096 Apr 10 01:41 sources
drwxr-xr-x 2 hadoop hadoop 4096 Apr 10 01:41 templates
[hadoop@hadoop004 hadoop]$ vim core-site.xml
<property>
<name>io.compression.codecs</name>
<value>org.apache.hadoop.io.compress.GzipCodec,
org.apache.hadoop.io.compress.DefaultCodec,
org.apache.hadoop.io.compress.BZip2Codec,
org.apache.hadoop.io.compress.SnappyCodec,
com.hadoop.compression.lzo.LzoCodec,
com.hadoop.compression.lzo.LzopCodec
</value>
</property>
<property>
<name>io.compression.codec.lzo.class</name>
<value>com.hadoop.compression.lzo.LzoCodec</value>
</property>
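As far as I know, Hadoop reads `io.compression.codecs` as a comma-separated list and trims surrounding whitespace, so the line breaks inside the `<value>` above are safe. A minimal Python sketch of that trimming (an illustration only, not Hadoop's actual Java code):

```python
# Illustration: split a comma-separated codec list and trim
# whitespace/newlines, mimicking how Hadoop's Configuration
# treats the multi-line <value> in core-site.xml above.
value = """org.apache.hadoop.io.compress.GzipCodec,
org.apache.hadoop.io.compress.DefaultCodec,
org.apache.hadoop.io.compress.BZip2Codec,
org.apache.hadoop.io.compress.SnappyCodec,
com.hadoop.compression.lzo.LzoCodec,
com.hadoop.compression.lzo.LzopCodec
"""

codecs = [c.strip() for c in value.split(",") if c.strip()]
print(len(codecs))  # 6 codec class names
print(codecs[-1])   # com.hadoop.compression.lzo.LzopCodec
```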
hive> create table page_views_lzo(
> track_times string,
> url string,
> session_id string,
> referer string,
> ip string,
> end_user_id string,
> city_id string
> ) row format delimited fields terminated by '\t'
> STORED AS INPUTFORMAT "com.hadoop.mapred.DeprecatedLzoTextInputFormat"
> OUTPUTFORMAT "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat";
OK
Time taken: 0.199 seconds
hive> load data local inpath '/home/hadoop/data/page_views_big.dat.lzo' overwrite into table page_views_lzo;
Loading data to table default.page_views_lzo
Table default.page_views_lzo stats: [numFiles=1, numRows=0, totalSize=298200895, rawDataSize=0]
OK
Time taken: 4.064 seconds
[hadoop@hadoop004 data]$ hdfs dfs -ls /user/hive/warehouse/page_views_lzo
Found 1 items
-rwxr-xr-x 1 hadoop supergroup 298200895 2019-04-23 14:28 /user/hive/warehouse/page_views_lzo/page_views_big.dat.lzo
[hadoop@hadoop004 data]$ hdfs dfs -du -s -h /user/hive/warehouse/page_views_lzo
284.4 M 284.4 M /user/hive/warehouse/page_views_lzo
hive> select count(1) from page_views_lzo;
Query ID = hadoop_20190423142626_386a65de-1dad-4000-b223-15239ce16743
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1556000359234_0001, Tracking URL = http://hadoop004:8088/proxy/application_1556000359234_0001/
Kill Command = /home/hadoop/app/hadoop-2.6.0-cdh5.7.0/bin/hadoop job -kill job_1556000359234_0001
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2019-04-23 14:33:15,184 Stage-1 map = 0%, reduce = 0%
2019-04-23 14:33:26,643 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 7.06 sec
2019-04-23 14:33:32,982 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 8.51 sec
MapReduce Total cumulative CPU time: 8 seconds 510 msec
Ended Job = job_1556000359234_0001
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Reduce: 1 Cumulative CPU: 8.51 sec HDFS Read: 298207931 HDFS Write: 8 SUCCESS
Total MapReduce CPU Time Spent: 8 seconds 510 msec
OK
3300000
Time taken: 28.124 seconds, Fetched: 1 row(s)
As the last few lines of the output show:
Stage-Stage-1: Map: 1 Reduce: 1 Cumulative CPU: 8.51 sec HDFS Read: 298207931 HDFS Write: 8 SUCCESS
this query launched only one map task, yet page_views_big.dat.lzo is 285 MB, which spans at least three HDFS blocks and should in principle yield three splits. This confirms that an LZO file without an index is not splittable by default.
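The expected split count can be checked with a bit of arithmetic, assuming the default 128 MB HDFS block size on this CDH version:

```python
import math

# File size taken from the `hdfs dfs -ls` output above;
# block size assumes the default dfs.blocksize of 128 MB.
file_size = 298_200_895
block_size = 128 * 1024 * 1024

blocks = math.ceil(file_size / block_size)
print(blocks)  # 3 blocks -> 3 splits, if the format were splittable
```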
Next, let's make the LZO file splittable.
hive> SET hive.exec.compress.output;
hive.exec.compress.output=false
hive> SET hive.exec.compress.output=true;
hive> SET hive.exec.compress.output;
hive.exec.compress.output=true
hive> SET mapreduce.output.fileoutputformat.compress.codec=com.hadoop.compression.lzo.LzopCodec;
hive> SET mapreduce.output.fileoutputformat.compress.codec;
mapreduce.output.fileoutputformat.compress.codec=com.hadoop.compression.lzo.LzopCodec
hive> create table page_views_lzo_split
> STORED AS INPUTFORMAT "com.hadoop.mapred.DeprecatedLzoTextInputFormat"
> OUTPUTFORMAT "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat"
> as select * from page_views_lzo;
Query ID = hadoop_20190423142626_386a65de-1dad-4000-b223-15239ce16743
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1556000359234_0002, Tracking URL = http://hadoop004:8088/proxy/application_1556000359234_0002/
Kill Command = /home/hadoop/app/hadoop-2.6.0-cdh5.7.0/bin/hadoop job -kill job_1556000359234_0002
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2019-04-23 14:42:08,062 Stage-1 map = 0%, reduce = 0%
2019-04-23 14:42:18,703 Stage-1 map = 36%, reduce = 0%, Cumulative CPU 7.42 sec
2019-04-23 14:42:21,813 Stage-1 map = 59%, reduce = 0%, Cumulative CPU 10.92 sec
2019-04-23 14:42:24,211 Stage-1 map = 81%, reduce = 0%, Cumulative CPU 14.05 sec
2019-04-23 14:42:26,738 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 16.69 sec
MapReduce Total cumulative CPU time: 16 seconds 690 msec
Ended Job = job_1556000359234_0002
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to: hdfs://hadoop004:9000/user/hive/warehouse/.hive-staging_hive_2019-04-23_14-42-01_301_8465660280055053580-1/-ext-10001
Moving data to: hdfs://hadoop004:9000/user/hive/warehouse/page_views_lzo_split
Table default.page_views_lzo_split stats: [numFiles=1, numRows=3300000, totalSize=296148323, rawDataSize=624194769]
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Cumulative CPU: 16.69 sec HDFS Read: 298204253 HDFS Write: 296148419 SUCCESS
Total MapReduce CPU Time Spent: 16 seconds 690 msec
OK
Time taken: 27.738 seconds
[hadoop@hadoop004 data]$ hdfs dfs -du -s -h /user/hive/warehouse/page_views_lzo_split
282.4 M 282.4 M /user/hive/warehouse/page_views_lzo_split
Build the index for the LZO file:
[hadoop@hadoop004 data]$ hadoop jar ~/app/hadoop-2.6.0-cdh5.7.0/share/hadoop/common/hadoop-lzo-0.4.21-SNAPSHOT.jar com.hadoop.compression.lzo.LzoIndexer /user/hive/warehouse/page_views_lzo_split
19/04/23 14:47:58 INFO lzo.GPLNativeCodeLoader: Loaded native gpl library from the embedded binaries
19/04/23 14:47:58 INFO lzo.LzoCodec: Successfully loaded & initialized native-lzo library [hadoop-lzo rev f1deea9a313f4017dd5323cb8bbb3732c1aaccc5]
19/04/23 14:47:59 INFO lzo.LzoIndexer: LZO Indexing directory /user/hive/warehouse/page_views_lzo_split...
19/04/23 14:47:59 INFO lzo.LzoIndexer: [INDEX] LZO Indexing file hdfs://hadoop004:9000/user/hive/warehouse/page_views_lzo_split/000000_0.lzo, size 0.28 GB...
19/04/23 14:47:59 INFO Configuration.deprecation: hadoop.native.lib is deprecated. Instead, use io.native.lib.available
19/04/23 14:48:00 INFO lzo.LzoIndexer: Completed LZO Indexing in 0.72 seconds (393.90 MB/s). Index size is 19.97 KB.
[hadoop@hadoop004 data]$ hdfs dfs -ls /user/hive/warehouse/page_views_lzo_split
Found 2 items
-rwxr-xr-x 1 hadoop supergroup 296148323 2019-04-23 14:42 /user/hive/warehouse/page_views_lzo_split/000000_0.lzo
-rw-r--r-- 1 hadoop supergroup 20448 2019-04-23 14:48 /user/hive/warehouse/page_views_lzo_split/000000_0.lzo.index
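To the best of my understanding, the `.lzo.index` file produced above is simply a flat list of 8-byte big-endian byte offsets, one per compressed LZO block, which `DeprecatedLzoTextInputFormat` uses to align splits on compressed-block boundaries. A minimal sketch of parsing such an index (with synthetic offsets, not the real file):

```python
import struct

# Synthetic stand-in for a .lzo.index file: each entry is an
# 8-byte big-endian offset of a compressed block in the .lzo file.
offsets = [0, 262144, 524288, 786432]
index_bytes = b"".join(struct.pack(">q", o) for o in offsets)

# Read the longs back, as hadoop-lzo's LzoIndex does.
n = len(index_bytes) // 8
parsed = [struct.unpack_from(">q", index_bytes, i * 8)[0] for i in range(n)]
print(parsed)  # [0, 262144, 524288, 786432]

# By the same logic, the 20448-byte index above holds 20448 / 8
# = 2556 block offsets.
print(20448 // 8)  # 2556
```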
hive> select count(1) from page_views_lzo_split;
Query ID = hadoop_20190423142626_386a65de-1dad-4000-b223-15239ce16743
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1556000359234_0003, Tracking URL = http://hadoop004:8088/proxy/application_1556000359234_0003/
Kill Command = /home/hadoop/app/hadoop-2.6.0-cdh5.7.0/bin/hadoop job -kill job_1556000359234_0003
Hadoop job information for Stage-1: number of mappers: 3; number of reducers: 1
2019-04-23 14:49:57,100 Stage-1 map = 0%, reduce = 0%
2019-04-23 14:50:11,166 Stage-1 map = 33%, reduce = 0%, Cumulative CPU 2.27 sec
2019-04-23 14:50:12,201 Stage-1 map = 67%, reduce = 0%, Cumulative CPU 6.27 sec
2019-04-23 14:50:14,285 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 10.41 sec
2019-04-23 14:50:19,470 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 12.18 sec
MapReduce Total cumulative CPU time: 12 seconds 180 msec
Ended Job = job_1556000359234_0003
MapReduce Jobs Launched:
Stage-Stage-1: Map: 3 Reduce: 1 Cumulative CPU: 12.18 sec HDFS Read: 296399059 HDFS Write: 58 SUCCESS
Total MapReduce CPU Time Spent: 12 seconds 180 msec
OK
3300000
Time taken: 29.314 seconds, Fetched: 1 row(s)
From the output above, the map count is now 3, proving that an indexed LZO file supports splitting.