Posts in category: Shell

Removing duplicate lines from text (sort+uniq/awk/sed)

December 11, 2009, by 夜行人

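When processing text, we often need to remove duplicate lines. How can this be done?
Here are three common approaches.
First, sort + uniq. Note that uniq on its own is not enough.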
shell> sort file | uniq

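I ran a quick test: when the duplicate lines in file are not adjacent, uniq cannot remove all of them. After sorting, identical lines sit next to each other, so uniq removes the duplicates correctly.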
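That test can be reproduced with a minimal sketch like the one below. The sample contents are made up for illustration and are not the author's file.

```shell
# Build a small sample in which the duplicate lines are not adjacent.
sample=$(mktemp)
printf '%s\n' aaa bbb aaa ccc bbb > "$sample"

# uniq alone only collapses adjacent duplicates, so nothing is removed here.
uniq_only=$(uniq "$sample")

# After sorting, identical lines are adjacent and uniq removes the repeats.
sorted_uniq=$(sort "$sample" | uniq)

echo "uniq alone:"
echo "$uniq_only"
echo "sort | uniq:"
echo "$sorted_uniq"
rm -f "$sample"
```

Where only the deduplicated result is needed, `sort -u file` gives the same output as `sort file | uniq` in a single command.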

Second, sort + awk. This awk script on its own does not work either, for the same reason.
shell> sort file | awk '{if ($0!=line) print; line=$0}'

Of course, if you redesign the code after the pipe, the initial sort may no longer be needed.
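One such redesign, sketched here as an illustration rather than as the author's own solution, keeps an awk array of lines already seen. The function name dedup_keep_order is made up for this example.

```shell
# dedup_keep_order: print each distinct line once, in order of first appearance.
# seen[$0]++ evaluates to 0 the first time a line is met, so !seen[$0]++ is
# true only for first occurrences; awk's default action then prints the line.
dedup_keep_order() {
    awk '!seen[$0]++' "$@"
}

printf '%s\n' fff eee fff eee ggg | dedup_keep_order
```

Unlike the sort-based pipelines, this preserves the original line order, at the cost of holding every distinct line in memory.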

Third, sort + sed. Again, sort has to run first.
shell> sort file | sed '$!N; /^\(.*\)\n\1$/!P; D'
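The sed one-liner is terse, so here is an annotated sketch of what each command does. The wrapper function name dedup_sorted_sed and the sample input are assumptions for illustration.

```shell
# dedup_sorted_sed: remove adjacent duplicate lines, much like uniq.
#   $!N               unless on the last line, append the next line to the pattern space
#   /^\(.*\)\n\1$/!P  if the two lines differ, print the first of them
#   D                 delete the first line and restart the cycle with what is left
dedup_sorted_sed() {
    sed '$!N; /^\(.*\)\n\1$/!P; D' "$@"
}

printf '%s\n' b a b c a | sort | dedup_sorted_sed
```

Because only two neighbouring lines are ever compared, the input must already be sorted (or at least have its duplicates adjacent).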

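Finally, here is a sample text that has to be sorted first. The reason is simple: the scripts above work "locally", comparing each line only with the one before it. Identical lines may be scattered across different parts of the file, and as soon as a different line comes in, the remembered line is overwritten, so a later repeat is no longer recognized. The example makes this easy to see.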
ffffffffffffffffff
ffffffffffffffffff
eeeeeeeeeeeeeeeeeeee
fffffffffffffffffff
eeeeeeeeeeeeeeeeeeee
eeeeeeeeeeeeeeeeeeee
gggggggggggggggggggg

In fact, these are just a few lines I typed at random; they happened to be a good example of why sort is needed. Try it yourself.

Reposted from: http://www.91linux.com/html/article/shell/20090205/15636.html

Linux

Linux, shell

The most powerful bash function library ever

November 22, 2009, by 夜行人

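Most of the functions deal with system operations, for example:

Get the LAN IP address: get_localip()
Get the LAN network interface: get_local_iface()
Get the gateway IP: get_gateway_ip()
Add a cron job: add_cron()
Remove a cron job: del_cron()
Get the memory size: get_mem_size()
Get the disk size: get_hdd_size()
Find accounts that are not system defaults: find_non_sys_user()
………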

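Usage:
Put func-common.sh in the same directory as the main script.
Source the library in the main script, then call the functions directly; see the comments in the library for each function's parameters.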

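# Resolve the directory that contains the running script.
export WORKDIR=$( cd "$(dirname "$0")" && pwd )

# Stop early if the library is missing or unreadable.
if [[ ! -r "$WORKDIR/func-common.sh" ]]; then
    echo "[$WORKDIR/func-common.sh] NOT FOUND"
    exit 1
fi

# Load the function library.
. "$WORKDIR/func-common.sh" || exit 1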
The code is kept in Google SVN and is updated from time to time:
http://huan.googlecode.com/svn/bash/func-common.sh
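To give a feel for what a helper of this kind might look like, here is a hypothetical sketch in the spirit of get_mem_size(). It is not taken from func-common.sh: the function name mem_size_mb, its optional file argument, and the reliance on the Linux /proc/meminfo format are all assumptions made for illustration.

```shell
# mem_size_mb (hypothetical): print the total memory in MB.
# Reads the MemTotal line (reported in kB) from /proc/meminfo, or from the
# file given as $1, which makes the function easy to try on any host.
mem_size_mb() {
    _meminfo=${1:-/proc/meminfo}
    [ -r "$_meminfo" ] || { echo "cannot read $_meminfo" >&2; return 1; }
    awk '/^MemTotal:/ { printf "%d\n", $2 / 1024; found = 1 }
         END { exit !found }' "$_meminfo"
}

# Demo with a fake meminfo file so the result does not depend on the host.
fake=$(mktemp)
printf 'MemTotal:        2048000 kB\nMemFree:          512000 kB\n' > "$fake"
mem_size_mb "$fake"
rm -f "$fake"
```

Returning a non-zero status when the file is unreadable or has no MemTotal line lets a caller use the same `|| exit 1` pattern as the sourcing code above.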

Reposted from: http://bbs.linuxeden.com/thread-192227-1-1.html

Linux

How to find content that spans consecutive lines

November 6, 2009, by 夜行人

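For example, I have a file with the content below, and I need to find the statements that assign to the variables abc and k on two consecutive lines.

…..
abc=13 (this one matches)
k=12
…..

abc=5 (abc assigned on its own, does not match)

…..
…..
abc=14 (this one matches as well)
k=5
……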

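In other words, searching with two separate grep commands would certainly return many results that do not qualify.
How can I get what I described above? The filtered result I want looks like this: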
abc=13
k=12
abc=14
k=5
Answers:

awk '/abc=/{s=$0;getline;if (/k=/) print s"\n"$0}' file

sed -n '/abc=/{N;/k=/p}' file

while read -r line
do
    echo "$line" | grep -q k= && echo "$pre" | grep -q abc= && echo "$pre" && echo "$line"
    pre=$line
done < file
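The while loop works by remembering the previous line. The same idea can be sketched in awk alone, which avoids starting two grep processes per line. The function name find_pairs and the sample input are made up for illustration.

```shell
# find_pairs: print every "abc=" line that is immediately followed by a "k="
# line, together with that "k=" line. prev always holds the previous line.
find_pairs() {
    awk '/k=/ && prev ~ /abc=/ { print prev; print } { prev = $0 }' "$@"
}

printf '%s\n' 'abc=13' 'k=12' 'x=1' 'abc=5' 'y=2' 'abc=14' 'k=5' | find_pairs
```

Because every line is compared with its predecessor, this also catches a pair such as abc=2 / k=3 when it directly follows another abc= line, a case where a getline-based script has already consumed the second abc= line.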

Source: http://bbs3.chinaunix.net/viewthread.php?tid=1604893

Linux, shell

Processing mp3player.xml from "不许联想"

October 14, 2009, by 夜行人

<?xml version="1.0" encoding="UTF-8"?>
<player showDisplay="yes" showPlaylist="yes" autoStart="no"> <song path="http://music.u148.net/Kris_Kristofferson-holy_woman.mp3" title="Kris_Kristofferson-holy_woman" /> <song path="http://music.u148.net/public_image_ltd-disappointed.mp3" title="public_image_ltd-disappointed" /> <song path="http://music.u148.net/Muse-Uprising.mp3" title="Muse-Uprising" /> <song path="http://music.u148.net/Orla_Fallon-She_Moved_Thro_the_Fair.mp3" title="Orla_Fallon-She_Moved_Thro_the_Fair" /> <song path="http://music.u148.net/damon_and_naomi-E.T.A.mp3" title="damon_and_naomi-E.T.A." /> <song path="http://music.u148.net/England_Dan_And_John_Ford_Coley-Who_s_Lonely_Now.mp3" title="England_Dan_And_John_Ford_Coley-Who_s_Lonely_Now" /> <song path="http://music.u148.net/the_avett_brothers-will_you_return.mp3" title="the_avett_brothers-will_you_return" /> <song path="http://music.u148.net/richard_hawley-ashes_on_the_fire.mp3" title="richard_hawley-ashes_on_the_fire" /> <song path="http://music.u148.net/avion_travel-abbassando.mp3" title="avion_travel-abbassando" /> <song path="http://music.u148.net/Elisabeth_Kontomanou-I_gotta_right_to_sing_the_blues.mp3" title="Elisabeth_Kontomanou-I_gotta_right_to_sing_the_blues" /> <song path="http://music.u148.net/Matthew_Barber-Easily_bruised.mp3" title="Matthew_Barber-Easily_bruised" /> <song path="http://music.u148.net/Pearl_Jam-Just_Breathe.mp3" title="Pearl_Jam-Just_Breathe" /> <song path="http://music.u148.net/Hope_Sandoval_&_The_Warm_Inventions-Wild_Roses.mp3" title="Hope_Sandoval_&_The_Warm_Inventions-Wild_Roses" /> <song path="http://music.u148.net/u2-ill_go_crazy_if_i_dont_go_crazy_tonight.mp3" title="u2-ill_go_crazy_if_i_dont_go_crazy_tonight" /> <song path="http://music.u148.net/Monsters_Of_Folk-Say_Please.mp3" title="Monsters_Of_Folk-Say_Please" /> <song path="http://music.u148.net/Andres_Cepeda-Faltarán.mp3" title="Andres_Cepeda-Faltarán" /> <song path="http://music.u148.net/Mark_Mulcahy-Be_Sure.mp3" title="Mark_Mulcahy-Be_Sure" /> <song 
path="http://music.u148.net/Brett_Dennen-San Francisco.mp3" title="Brett_Dennen-San Francisco" /> <song path="http://music.u148.net/Hnir_Pan-Piyka.mp3" title="Hnir_Pan-Piyka" /> <song path="http://music.u148.net/KAYAH-EMBARCACAO.mp3" title="KAYAH-EMBARCACAO" /> <song path="http://music.u148.net/Mosh_Ben_Ari-Bein_hazlilim.mp3" title="Mosh_Ben_Ari-Bein_hazlilim" /> <song path="http://music.u148.net/the_black_crowes-appaloosa.mp3" title="the_black_crowes-appaloosa" /></player>

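Today I wanted to download the music from Wang Xiaofeng's blog. As usual, I fetched http://www.wangxiaofeng.net/mp3player.xml, only to find the formatting was a mess, as shown above: the file has just 2 lines (everything from <player ×××> to </player> is on a single line).
I thought about extracting whatever lies between http and mp3 but did not know where to start. In the end I decided to split the file into lines first and then pull out the mp3 addresses: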

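sed "s/<song/\n<song/g" mp3player.xml | cut -d\" -f2

In the output, one song whose name is not in English came out garbled, Mark_Mulcahy-Be_Sure.mp3

http://music.u148.net/Brett_Dennen-San Francisco.mp3

Note: when editing with vi mp3player.xml, the line break in the replacement is written \r, whereas in GNU sed it is \n; writing \n\r in sed also inserts a stray carriage return at the start of each new line.
:s/<song/\r<song/g

On Windows, when replacing with regular expressions in Notepad++, the line break is \n.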

As a single command:

grep -oP '(?<=song path=")[^"]*(?=" title=)' mp3player.xml

Simpler still:

grep -o 'http[^"]*\.mp3' mp3player.xml
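One of the extracted addresses contains a space (Brett_Dennen-San Francisco.mp3), which download tools may not accept as-is. Below is a minimal sketch that percent-encodes spaces after extraction. The function name list_mp3_urls is made up here, and only spaces are handled, not other special characters.

```shell
# list_mp3_urls: print one mp3 address per line, with spaces encoded as %20.
list_mp3_urls() {
    grep -o 'http[^"]*\.mp3' "$@" | sed 's/ /%20/g'
}

# Two-entry sample in the same shape as mp3player.xml.
printf '%s\n' '<player><song path="http://music.u148.net/Muse-Uprising.mp3" title="Muse-Uprising" /> <song path="http://music.u148.net/Brett_Dennen-San Francisco.mp3" title="Brett_Dennen-San Francisco" /></player>' | list_mp3_urls
```

Because [^"]* stops at the closing quote of each path attribute, the title attributes are never picked up, even though they repeat the song names.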

Linux, shell

Computing the total size of several directories

October 13, 2009, by 夜行人

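# Add up the sizes (in KB) of several directories and print the total in GB.
DD=0
for i in awstats www phpmyadmin webconsole
do
    # du -sk prints "size-in-KB<TAB>path"; keep only the size.
    DX=$(du -sk "$i" | awk '{print $1}')
    DD=$((DD + DX))
done
# Integer division: the result is rounded down to whole GB.
DDD=$((DD / 1024 / 1024))
echo "${DDD}G"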
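Shell arithmetic works on integers only, so totals below 1 GB print as 0G. A possible variant, sketched here with a made-up function name total_size_gb, lets awk do the summing and print two decimal places.

```shell
# total_size_gb: print the combined size of the given directories in GB.
# du -sk prints one "size-in-KB<TAB>path" line per argument; awk adds them up.
total_size_gb() {
    du -sk "$@" | awk '{ total += $1 } END { printf "%.2fG\n", total / 1024 / 1024 }'
}

# Demo on two throwaway directories.
demo=$(mktemp -d)
mkdir "$demo/a" "$demo/b"
total_size_gb "$demo/a" "$demo/b"
rm -rf "$demo"
```

For the directories in the post the call would be `total_size_gb awstats www phpmyadmin webconsole`.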

Linux

shell
