User Guide: Rule Management -- Add Rule -- Replace or Filter URLs
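Rules that replace or filter list-page URLs are used to correct wrong or invalid links picked up from the list page.
1. Replace
Separate the search string from the replacement string with @@, and use (*) for any part of the search string that varies.
For example, aa@@bb replaces aa with bb.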
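A minimal Python sketch of how a single search@@replacement rule could be applied. It assumes (*) behaves like a lazy wildcard and that every match in the URL is replaced; the tool's real matching rules may differ, and `compile_search` and `apply_rule` are hypothetical names.

```python
import re


def compile_search(search: str) -> re.Pattern:
    """Escape the literal parts of a search string and turn each (*)
    into a lazy wildcard (assumed semantics)."""
    return re.compile(".*?".join(re.escape(part) for part in search.split("(*)")))


def apply_rule(rule: str, url: str) -> str:
    """Apply one hypothetical 'search@@replacement' rule to a URL."""
    search, replacement = rule.split("@@", 1)
    # Passing a function as the replacement keeps any backslashes literal.
    return compile_search(search).sub(lambda _m: replacement, url)


print(apply_rule("aa@@bb", "xaay"))                                # xbby
print(apply_rule("page-(*).html@@page.html", "list/page-3.html"))  # list/page.html
```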
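If part of the search string must be carried over unchanged into the result, represent it with a placeholder.
For example: index-{1}-{2}.html@@index/{1}/{2}.html turns index-3-7.html into index/3/7.html.
Notes:
1. A placeholder is a single character enclosed in curly braces, e.g. {1}, {2}, {3}.
2. The same placeholder may not appear more than once in the search string.
3. The text a placeholder stands for is copied unchanged from the part before @@ to the part after it, wherever the placeholder appears.
4. A placeholder and (*) must be separated by at least one character.
Enter one rule per line. Rules are applied in order: each rule operates on the result of the previous one.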
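The placeholder notes and the line-by-line ordering can be sketched in Python as follows. This is an illustration under assumptions, not the tool's implementation: (*) and placeholders are treated as lazy wildcards, every match is replaced, and `compile_rule` and `apply_rules` are hypothetical names.

```python
import re

# Either (*) or a single-character placeholder such as {1}.
TOKEN = re.compile(r"\(\*\)|\{(.)\}")


def compile_rule(rule: str):
    """Compile a hypothetical 'search@@replacement' rule into (regex, repl)."""
    search, replacement = rule.split("@@", 1)
    parts, names, pos = [], [], 0
    for tok in TOKEN.finditer(search):
        parts.append(re.escape(search[pos:tok.start()]))
        name = tok.group(1)
        if name is None:              # (*): match any text and discard it
            parts.append(".*?")
        elif name in names:           # note 2: no duplicate placeholders
            raise ValueError(f"duplicate placeholder {{{name}}}")
        else:                         # {x}: capture the text for reuse after @@
            names.append(name)
            parts.append("(.*?)")
        pos = tok.end()
    parts.append(re.escape(search[pos:]))
    regex = re.compile("".join(parts))

    def repl(match: re.Match) -> str:
        # Note 3: each {x} after @@ receives the text {x} matched before @@,
        # regardless of where it appears in the replacement.
        def fill(tok: re.Match) -> str:
            name = tok.group(1)
            if name in names:
                return match.group(names.index(name) + 1)
            return tok.group(0)       # anything else stays literal
        return TOKEN.sub(fill, replacement)

    return regex, repl


def apply_rules(rules_text: str, url: str) -> str:
    """One rule per line, each applied to the previous rule's output."""
    for line in rules_text.splitlines():
        if line.strip():
            regex, repl = compile_rule(line.strip())
            url = regex.sub(repl, url)
    return url


rules = "index-{1}-{2}.html@@index/{1}/{2}.html\nindex/@@list/"
print(apply_rules(rules, "http://example.com/index-5-2.html"))
# http://example.com/list/5/2.html
```

The second rule only sees `index/` because the first rule has already rewritten the URL, which is what "applied in order" means here.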
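Example
Suppose a Discuz site has enabled "URL rewriting" (URL静态化) under "SEO Settings", with the thread page URL format set to thread-{tid}-{page}-{prevpage}.html.
Because {prevpage} records the list page a thread was linked from, the same thread can appear under several URLs. To avoid collecting it more than once, replace {prevpage} with 1, i.e. 1-10.html in the figure above must become 1-1.html. The rule is: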
1-(*).html@@1-1.html
Here (*) stands for any number of characters between 1- and .html.
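To make the example concrete, here is a small Python check of the rule 1-(*).html@@1-1.html, again assuming (*) is a lazy wildcard and the replacement applies wherever the search string matches. The sample URLs follow the thread-{tid}-{page}-{prevpage}.html pattern; `normalize` is a hypothetical helper.

```python
import re

# Assumed equivalent of the rule 1-(*).html@@1-1.html:
# literal "1-", then any characters (lazily), then literal ".html".
PATTERN = re.compile(re.escape("1-") + ".*?" + re.escape(".html"))


def normalize(url: str) -> str:
    return PATTERN.sub("1-1.html", url)


for url in ["thread-14660-1-10.html", "thread-14660-1-1.html", "thread-14660-2-1.html"]:
    print(url, "->", normalize(url))
# thread-14660-1-10.html -> thread-14660-1-1.html
# thread-14660-1-1.html -> thread-14660-1-1.html
# thread-14660-2-1.html -> thread-14660-2-1.html
```

Under these assumed semantics the search string matches the first "1-" anywhere in the URL, so a thread ID ending in 1 is affected too: thread-31-1-10.html would become thread-31-1.html. Putting more literal text into the search string narrows the match.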
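2. Filter
The following filter rules are available:
- URL must contain the following characters
- URL must not contain the following characters
- Page content must not contain the following strings
Enter one entry per line, and use (*) for any part that varies.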
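A rough Python sketch of the three filters. It rests on assumptions the text does not spell out: (*) is a lazy wildcard, an entry may match anywhere in the URL or page text, a URL passes "must contain" if it matches at least one entry, and it passes "must not contain" only if it matches none. `keep_link` and `keep_page` are hypothetical names.

```python
import re


def to_regex(entry: str) -> re.Pattern:
    # Literal text is escaped; each (*) matches any run of characters.
    return re.compile(".*?".join(re.escape(p) for p in entry.split("(*)")))


def matches_any(text: str, entries: list[str]) -> bool:
    return any(to_regex(e).search(text) for e in entries)


def keep_link(url: str, must_contain: list[str], must_not_contain: list[str]) -> bool:
    """'URL must contain' / 'URL must not contain' (assumed any-of semantics)."""
    if must_contain and not matches_any(url, must_contain):
        return False
    return not matches_any(url, must_not_contain)


def keep_page(page_text: str, content_excludes: list[str]) -> bool:
    """'Page content must not contain': needs the downloaded content page."""
    return not matches_any(page_text, content_excludes)


urls = [
    "thread-14660-1-1.html",
    "thread-957-2-1.html",
    "forum.php?mod=redirect&tid=18112&goto=lastpost#lastpost",
]
print([u for u in urls if keep_link(u, ["thread-(*)-1-1.html"], ["redirect"])])
# ['thread-14660-1-1.html']
```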
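Important:
Testing does not fetch the content pages, so the link list shown in the test window still includes links that "Page content must not contain the following strings" would remove. These links are skipped during actual collection.
Example: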
<th class="common">
<a href="javascript:;" id="content_14660" class="showcontent y" title="更多操作" onclick="CONTENT_TID='14660';CONTENT_ID='stickthread_14660';showMenu({'ctrlid':this.id,'menuid':'content_menu'})"></a>
<a href="javascript:void(0);" onclick="hideStickThread('14660')" class="closeprev y" title="隐藏置顶帖">隐藏置顶帖</a>
<a href="thread-14660-1-1.html" style="font-weight: bold;color: #EC1282;" onclick="atarget(this)" class="s xst">Discuz! X3.5 正式版【2023-05-20】</a>
<i class="fico-image fic4 fc-p fnmr vm" title="图片附件"></i>
- <span class="xi1">[回帖奖励 <strong> 21</strong> ]</span>
<span class="tps"> ...<a href="thread-14660-2-1.html" onclick="atarget(this)">2</a><a href="thread-14660-3-1.html" onclick="atarget(this)">3</a><a href="thread-14660-4-1.html" onclick="atarget(this)">4</a><a href="thread-14660-5-1.html" onclick="atarget(this)">5</a><a href="thread-14660-6-1.html" onclick="atarget(this)">6</a>..<a href="thread-14660-126-1.html" onclick="atarget(this)">126</a></span>
</th>
...
<th class="common">
<a href="javascript:;" id="content_957" class="showcontent y" title="更多操作" onclick="CONTENT_TID='957';CONTENT_ID='stickthread_957';showMenu({'ctrlid':this.id,'menuid':'content_menu'})"></a>
<a href="javascript:void(0);" onclick="hideStickThread('957')" class="closeprev y" title="隐藏置顶帖">隐藏置顶帖</a>
<a href="thread-957-1-1.html" style="font-weight: bold;color: #EE1B2E;" onclick="atarget(this)" class="s xst">新版Discuz!应用中心接入教程</a>
<i class="fico-attachment fic4 fc-p fnmr vm" title="附件"></i>
<span class="tps"> ...<a href="thread-957-2-1.html" onclick="atarget(this)">2</a><a href="thread-957-3-1.html" onclick="atarget(this)">3</a><a href="thread-957-4-1.html" onclick="atarget(this)">4</a><a href="thread-957-5-1.html" onclick="atarget(this)">5</a><a href="thread-957-6-1.html" onclick="atarget(this)">6</a>..<a href="thread-957-18-1.html" onclick="atarget(this)">18</a></span>
</th>
...
<th class="common">
<a href="javascript:;" id="content_11030" class="showcontent y" title="更多操作" onclick="CONTENT_TID='11030';CONTENT_ID='stickthread_11030';showMenu({'ctrlid':this.id,'menuid':'content_menu'})"></a>
<a href="javascript:void(0);" onclick="hideStickThread('11030')" class="closeprev y" title="隐藏置顶帖">隐藏置顶帖</a>
<a href="thread-11030-1-1.html" style="font-weight: bold;color: #EE1B2E;background-color: #CCCCCC;" onclick="atarget(this)" class="s xst">打击盗版行动,重拳再出击</a>
<i class="fico-image fic4 fc-p fnmr vm" title="图片附件"></i>
<span class="tps"> ...<a href="thread-11030-2-1.html" onclick="atarget(this)">2</a><a href="thread-11030-3-1.html" onclick="atarget(this)">3</a></span>
</th>
...
<th class="common">
<a href="javascript:;" id="content_12713" class="showcontent y" title="更多操作" onclick="CONTENT_TID='12713';CONTENT_ID='stickthread_12713';showMenu({'ctrlid':this.id,'menuid':'content_menu'})"></a>
<a href="javascript:void(0);" onclick="hideStickThread('12713')" class="closeprev y" title="隐藏置顶帖">隐藏置顶帖</a>
<a href="thread-12713-1-1.html" style="font-weight: bold;color: #EE1B2E;background-color: #FFCC99;" onclick="atarget(this)" class="s xst">Discuz!应用分销功能,新手推广指南</a>
<i class="fico-attachment fic4 fc-p fnmr vm" title="附件"></i>
<span class="tps"> ...<a href="thread-12713-2-1.html" onclick="atarget(this)">2</a></span>
</th>
...
<th class="common">
<a href="javascript:;" id="content_1890" class="showcontent y" title="更多操作" onclick="CONTENT_TID='1890';CONTENT_ID='stickthread_1890';showMenu({'ctrlid':this.id,'menuid':'content_menu'})"></a>
<a href="javascript:void(0);" onclick="hideStickThread('1890')" class="closeprev y" title="隐藏置顶帖">隐藏置顶帖</a>
<a href="thread-1890-1-1.html" style="font-weight: bold;color: #2B65B7;background-color: #FFCC00;" onclick="atarget(this)" class="s xst">严正声明:盗版将受到中国法律应有的惩罚,请自重</a>
<i class="fico-image fic4 fc-p fnmr vm" title="图片附件"></i>
<span class="tps"> ...<a href="thread-1890-2-1.html" onclick="atarget(this)">2</a><a href="thread-1890-3-1.html" onclick="atarget(this)">3</a><a href="thread-1890-4-1.html" onclick="atarget(this)">4</a><a href="thread-1890-5-1.html" onclick="atarget(this)">5</a><a href="thread-1890-6-1.html" onclick="atarget(this)">6</a>..<a href="thread-1890-8-1.html" onclick="atarget(this)">8</a></span>
</th>
...
<th class="common">
<a href="javascript:;" id="content_297" class="showcontent y" title="更多操作" onclick="CONTENT_TID='297';CONTENT_ID='stickthread_297';showMenu({'ctrlid':this.id,'menuid':'content_menu'})"></a>
<a href="javascript:void(0);" onclick="hideStickThread('297')" class="closeprev y" title="隐藏置顶帖">隐藏置顶帖</a>
<a href="thread-297-1-1.html" style="font-weight: bold;color: #2B65B7;" onclick="atarget(this)" class="s xst">Discuz安装插件模板提示“数据无法识别,请返回”的解决办法</a>
<i class="fico-image fic4 fc-p fnmr vm" title="图片附件"></i>
<span class="tps"> ...<a href="thread-297-2-1.html" onclick="atarget(this)">2</a><a href="thread-297-3-1.html" onclick="atarget(this)">3</a><a href="thread-297-4-1.html" onclick="atarget(this)">4</a><a href="thread-297-5-1.html" onclick="atarget(this)">5</a><a href="thread-297-6-1.html" onclick="atarget(this)">6</a>..<a href="thread-297-7-1.html" onclick="atarget(this)">7</a></span>
</th>
...
<th class="common">
<a href="javascript:;" id="content_14718" class="showcontent y" title="更多操作" onclick="CONTENT_TID='14718';CONTENT_ID='stickthread_14718';showMenu({'ctrlid':this.id,'menuid':'content_menu'})"></a>
<a href="javascript:void(0);" onclick="hideStickThread('14718')" class="closeprev y" title="隐藏置顶帖">隐藏置顶帖</a>
<a href="thread-14718-1-1.html" onclick="atarget(this)" class="s xst">升级X3.5常见问题汇总</a>
<i class="fico-attachment fic4 fc-p fnmr vm" title="附件"></i>
<span class="tps"> ...<a href="thread-14718-2-1.html" onclick="atarget(this)">2</a><a href="thread-14718-3-1.html" onclick="atarget(this)">3</a><a href="thread-14718-4-1.html" onclick="atarget(this)">4</a><a href="thread-14718-5-1.html" onclick="atarget(this)">5</a><a href="thread-14718-6-1.html" onclick="atarget(this)">6</a>..<a href="thread-14718-10-1.html" onclick="atarget(this)">10</a></span>
</th>
...
<th class="common">
<a href="javascript:;" id="content_36" class="showcontent y" title="更多操作" onclick="CONTENT_TID='36';CONTENT_ID='stickthread_36';showMenu({'ctrlid':this.id,'menuid':'content_menu'})"></a>
<a href="javascript:void(0);" onclick="hideStickThread('36')" class="closeprev y" title="隐藏置顶帖">隐藏置顶帖</a>
<a href="thread-36-1-1.html" style="font-weight: bold;color: #EE1B2E;" onclick="atarget(this)" class="s xst">★ 开发者报道帖! 回家啦!</a>
<span class="tps"> ...<a href="thread-36-2-1.html" onclick="atarget(this)">2</a><a href="thread-36-3-1.html" onclick="atarget(this)">3</a><a href="thread-36-4-1.html" onclick="atarget(this)">4</a><a href="thread-36-5-1.html" onclick="atarget(this)">5</a><a href="thread-36-6-1.html" onclick="atarget(this)">6</a>..<a href="thread-36-13-1.html" onclick="atarget(this)">13</a></span>
</th>
...
<th class="common">
<a href="javascript:;" id="content_1000" class="showcontent y" title="更多操作" onclick="CONTENT_TID='1000';CONTENT_ID='stickthread_1000';showMenu({'ctrlid':this.id,'menuid':'content_menu'})"></a>
<a href="javascript:void(0);" onclick="hideStickThread('1000')" class="closeprev y" title="隐藏置顶帖">隐藏置顶帖</a>
<a href="thread-1000-1-1.html" style="color: #8F2A90;" onclick="atarget(this)" class="s xst">新版应用中心开放试运行常见问题解答(站长篇&开发者篇)</a>
<span class="tps"> ...<a href="thread-1000-2-1.html" onclick="atarget(this)">2</a><a href="thread-1000-3-1.html" onclick="atarget(this)">3</a></span>
</th>
...
<th class="common">
<a href="javascript:;" id="content_13786" class="showcontent y" title="更多操作" onclick="CONTENT_TID='13786';CONTENT_ID='normalthread_13786';showMenu({'ctrlid':this.id,'menuid':'content_menu'})"></a>
<a href="thread-13786-1-1.html" onclick="atarget(this)" class="s xst">【限时免费模板】简单美化手机版,专为Discuz!X3.5优化</a>
<i class="fico-image fic4 fc-p fnmr vm" title="图片附件"></i>
<span class="tbox theatlevel" title="热度: 239">火...</span>
<i class="fico-thumbup fic4 fc-l fnmr vm" title="帖子被加分"></i>
<span class="tps"> ...<a href="thread-13786-2-1.html" onclick="atarget(this)">2</a><a href="thread-13786-3-1.html" onclick="atarget(this)">3</a><a href="thread-13786-4-1.html" onclick="atarget(this)">4</a><a href="thread-13786-5-1.html" onclick="atarget(this)">5</a><a href="thread-13786-6-1.html" onclick="atarget(this)">6</a>..<a href="thread-13786-29-1.html" onclick="atarget(this)">29</a></span>
</th>
...
<th class="common">
<a href="javascript:;" id="content_18112" class="showcontent y" title="更多操作" onclick="CONTENT_TID='18112';CONTENT_ID='normalthread_18112';showMenu({'ctrlid':this.id,'menuid':'content_menu'})"></a>
<a href="thread-18112-1-1.html" onclick="atarget(this)" class="s xst">游戏类论坛换友链</a>
<a href="forum.php?mod=redirect&tid=18112&goto=lastpost#lastpost" class="xi1">New</a>
</th>
...
<th class="common">
<a href="javascript:;" id="content_17065" class="showcontent y" title="更多操作" onclick="CONTENT_TID='17065';CONTENT_ID='normalthread_17065';showMenu({'ctrlid':this.id,'menuid':'content_menu'})"></a>
<a href="thread-17065-1-1.html" onclick="atarget(this)" class="s xst">都用的哪个微信登录插件呢,自带微信登录插件?</a>
</th>
...
<th class="common">
<a href="javascript:;" id="content_16756" class="showcontent y" title="更多操作" onclick="CONTENT_TID='16756';CONTENT_ID='normalthread_16756';showMenu({'ctrlid':this.id,'menuid':'content_menu'})"></a>
<a href="thread-16756-1-1.html" onclick="atarget(this)" class="s xst">ChatGPT 网页源码分享</a>
<i class="fico-image fic4 fc-p fnmr vm" title="图片附件"></i>
<span class="tps"> ...<a href="thread-16756-2-1.html" onclick="atarget(this)">2</a><a href="thread-16756-3-1.html" onclick="atarget(this)">3</a><a href="thread-16756-4-1.html" onclick="atarget(this)">4</a><a href="thread-16756-5-1.html" onclick="atarget(this)">5</a></span>
</th>
...
<th class="common">
<a href="javascript:;" id="content_2398" class="showcontent y" title="更多操作" onclick="CONTENT_TID='2398';CONTENT_ID='normalthread_2398';showMenu({'ctrlid':this.id,'menuid':'content_menu'})"></a>
<a href="thread-2398-1-1.html" onclick="atarget(this)" class="s xst">【投票】你的网站盈利了吗?</a>
<i class="fico-image fic4 fc-p fnmr vm" title="图片附件"></i>
<span class="tbox theatlevel" title="热度: 54">火</span>
<span class="tps"> ...<a href="thread-2398-2-1.html" onclick="atarget(this)">2</a><a href="thread-2398-3-1.html" onclick="atarget(this)">3</a><a href="thread-2398-4-1.html" onclick="atarget(this)">4</a><a href="thread-2398-5-1.html" onclick="atarget(this)">5</a><a href="thread-2398-6-1.html" onclick="atarget(this)">6</a>..<a href="thread-2398-7-1.html" onclick="atarget(this)">7</a></span>
</th>
Extract the page links with the following string rule:
<a href="[link]" onclick="atarget(this)" class="s xst">
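The string rule can be read as a template in which [link] marks the text to capture. The Python sketch below turns it into a regular expression and runs it on a few lines adapted from the sample page (link text shortened). Two assumptions: [link] matches as few characters as possible, and it cannot span a line break. `compile_string_rule` is a hypothetical name.

```python
import re


def compile_string_rule(rule: str) -> re.Pattern:
    # Everything around [link] is literal; [link] becomes a lazy capture
    # group. Without re.DOTALL, "." does not match newlines.
    before, after = rule.split("[link]", 1)
    return re.compile(re.escape(before) + "(.*?)" + re.escape(after))


RULE = '<a href="[link]" onclick="atarget(this)" class="s xst">'

HTML = """\
<a href="thread-14660-1-1.html" style="font-weight: bold;color: #EC1282;" onclick="atarget(this)" class="s xst">...</a>
<span class="tps"> ...<a href="thread-14660-2-1.html" onclick="atarget(this)">2</a></span>
<a href="thread-14718-1-1.html" onclick="atarget(this)" class="s xst">...</a>
<a href="thread-13786-1-1.html" onclick="atarget(this)" class="s xst">...</a>
"""

links = compile_string_rule(RULE).findall(HTML)
for link in links:
    print(link)
# thread-14660-1-1.html" style="font-weight: bold;color: #EC1282;
# thread-14718-1-1.html
# thread-13786-1-1.html
```

The first captured value is not a usable URL: that anchor has a style attribute between href and onclick, so [link] absorbs it. The page-number link on the second line is skipped because it lacks class="s xst".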
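The extracted list is as follows:
Entries that contain a style attribute are not links we want: for those anchors the style attribute sits between href and onclick, so [link] captures the URL plus the style text. Enter the following in "URL must not contain the following characters":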
style="
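Applying that entry can be sketched in Python as a plain substring test, since the entry contains no (*). The list holds what the string rule would capture for five threads in the sample page, assuming [link] matches lazily; the variable names are illustrative.

```python
# Values the string rule would capture for five threads in the sample page.
captured = [
    'thread-14660-1-1.html" style="font-weight: bold;color: #EC1282;',
    'thread-957-1-1.html" style="font-weight: bold;color: #EE1B2E;',
    "thread-14718-1-1.html",
    "thread-13786-1-1.html",
    "thread-18112-1-1.html",
]

# Entries of "URL must not contain the following characters", one per line.
must_not_contain = ['style="']

links = [u for u in captured if not any(entry in u for entry in must_not_contain)]
print(links)
# ['thread-14718-1-1.html', 'thread-13786-1-1.html', 'thread-18112-1-1.html']
```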
After filtering, the list is as follows: