Good tool to crawl my site and help me find dead links and unlinked files

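I have a fairly large legacy site with hundreds of PDFs. Some of them are referenced from a database, but most are just linked from web pages, and they are stored in almost every directory on the site.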

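I wrote a PHP spider that follows every link on my site, and then I compare its results against a dump of the site's directory structure. Is there something less complicated?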
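The crawl-then-diff approach described in the question can be sketched in Python. This is a minimal illustration under stated assumptions, not the asker's PHP spider: it assumes the site lives on a single host, that URL paths under `base_url` map one-to-one onto files under a local document root, and that only `.pdf` files need to be checked for orphans. The names `crawl`, `extract_links`, `http_fetch`, and `find_orphans` are hypothetical. It does not honor robots.txt, throttle requests, or see links that are generated by JavaScript.

```python
from collections import deque
from html.parser import HTMLParser
from pathlib import Path
from urllib.error import HTTPError
from urllib.parse import unquote, urldefrag, urljoin, urlparse
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collect every href/src attribute value (a, link, img, iframe, ...)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


def extract_links(html, page_url):
    """Absolute URLs referenced by a page, with #fragments stripped."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return {urldefrag(urljoin(page_url, link))[0] for link in parser.links}


def http_fetch(url):
    """Return (status, content_type, body); the body is read only for HTML pages."""
    try:
        with urlopen(url, timeout=10) as resp:
            ctype = resp.headers.get_content_type()
            body = resp.read().decode("utf-8", "replace") if ctype == "text/html" else ""
            return resp.status, ctype, body
    except HTTPError as err:
        return err.code, "", ""
    except OSError:
        return None, "", ""  # URLError (DNS, refused connection), socket timeouts


def crawl(start_url, fetch=http_fetch):
    """Breadth-first crawl limited to start_url's host (assumption: one host).

    Returns (linked, dead): every same-host URL reached, including start_url,
    and a dict of URL -> status for failures (None means no response at all).
    """
    host = urlparse(start_url).netloc
    linked, queue, dead = {start_url}, deque([start_url]), {}
    while queue:
        url = queue.popleft()
        status, ctype, body = fetch(url)
        if status is None or status >= 400:
            dead[url] = status
            continue
        if ctype != "text/html":
            continue  # PDFs, images, etc. count as linked but are not parsed
        for link in extract_links(body, url):
            if urlparse(link).netloc == host and link not in linked:
                linked.add(link)
                queue.append(link)
    return linked, dead


def find_orphans(linked, doc_root, base_url, suffixes=(".pdf",)):
    """Files under doc_root with the given suffixes that no crawled page links to.

    Assumption: the URL path base_url + relative/path maps directly onto
    doc_root/relative/path (no rewrites, aliases or case-insensitive matching).
    """
    root = Path(doc_root)
    # Compare percent-decoded paths so "old%20report.pdf" matches "old report.pdf".
    linked_paths = {unquote(urlparse(url).path) for url in linked}
    base_path = urlparse(base_url).path.rstrip("/")
    orphans = []
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix.lower() in suffixes:
            url_path = f"{base_path}/{path.relative_to(root).as_posix()}"
            if url_path not in linked_paths:
                orphans.append(path)
    return orphans
```

For example, `linked, dead = crawl("https://www.example.com/")` followed by `find_orphans(linked, "/path/to/docroot", "https://www.example.com/")` would report dead links and unlinked PDFs (the document-root path is a placeholder). Passing a different `fetch` function, such as a dictionary lookup, lets the crawl logic be tested without touching the network.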

2019-05-07 09:48:13
Answers: 5

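MICROSYS has a number of products, in particular their A1 Sitemap Generator and A1 Website Analyzer, which will crawl your site and record every little thing you can imagine.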

This includes broken links, but also a table view of all your pages, so you can compare things like identical titles and meta description tags, nofollow links, meta noindex on pages, and a whole lot of issues that just need a keen eye and a quick hand to fix. </ xx_p></div> <div class="clearfix"></div> <div class="action-time"> <span itemprop="author" itemscope itemtype="http://schema.org/Person"><span itemprop="name"><a href="/profile/8530" rel="noopener noreferrer nofollow" target="_blank">Evgeny</a></span></span> <span title="2019-05-13 15:15:28"> 2019-05-13 15:15:28</span><time class="hidden" itemprop="dateCreated" datetime="2019-05-13T03:05:28+02:00">2019-05-13T03:05:28+02:00</time> </div> <div class="clearfix"></div> </div> </div> </div> <div class="answer" id="15295" itemscope itemtype="http://schema.org/Answer" itemprop="suggestedAnswer"> <div class="answer-row"> <div class="answer-text"> <div class="desc" itemprop="text"> <p> I've used <a href="http://home.snafu.de/tilman/xenulink.html" rel="noopener noreferrer nofollow">Xenu's Link Sleuth</a>. It works pretty well; just make sure you don't DoS your own server! 
</p></div> <div class="clearfix"></div> <div class="action-time"> <span itemprop="author" itemscope itemtype="http://schema.org/Person"><span itemprop="name"><a href="/profile/1525" rel="noopener noreferrer nofollow" target="_blank">plntxt</a></span></span> <span title="2019-05-09 04:29:19"> 2019-05-09 04:29:19</span><time class="hidden" itemprop="dateCreated" datetime="2019-05-09T04:05:19+02:00">2019-05-09T04:05:19+02:00</time> </div> <div class="clearfix"></div> </div> </div> </div> <div class="answer" id="15206" itemscope itemtype="http://schema.org/Answer" itemprop="suggestedAnswer"> <div class="answer-row"> <div class="answer-text"> <div class="desc" itemprop="text"> <p> If you are using Windows 7, the best tool is the IIS7 SEO Toolkit 1.0. It's free to download. </p><p> The tool will scan any site and tell you where every dead link is, which pages take too long to load, which pages have missing or duplicate titles (and the same for keywords and descriptions), and which pages have broken HTML. </p></div> <div class="clearfix"></div> <div class="action-time"> <span itemprop="author" itemscope itemtype="http://schema.org/Person"><span itemprop="name"><a href="/profile/1832" rel="noopener noreferrer nofollow" target="_blank">Ben Hoffman</a></span></span> <span title="2019-05-09 04:09:45"> 2019-05-09 04:09:45</span><time class="hidden" itemprop="dateCreated" datetime="2019-05-09T04:05:45+02:00">2019-05-09T04:05:45+02:00</time> </div> <div class="clearfix"></div> </div> </div> </div> <div class="answer" id="15200" itemscope itemtype="http://schema.org/Answer" itemprop="suggestedAnswer"> <div class="answer-row"> <div class="answer-text"> <div class="desc" itemprop="text"> <p> I'm a big fan of <a href="http://www.linklint.org/" rel="noopener noreferrer nofollow"><xx_strong> linklint </xx_strong></a> for link-checking large static sites, if you have a Unix command line available (I've used it on Linux, macOS, and FreeBSD). See their site for installation instructions. Once it's installed, I create a file called <code>check.ll</code> and run: </p> <pre><code>linklint @check.ll</code></pre> <p> Here's roughly what my <code>check.ll</code> file looks like: </p> <pre><code># linklint
-doc .
-delay 0
-http
-htmlonly
-limit 4000
-net
-host www.example.com
-timeout 10
</code></pre> <p> This crawls <code>www.example.com</code> and writes HTML files with cross-referenced reports of what is broken, missing, and so on. </p></div> <div class="clearfix"></div> <div class="action-time"> <span itemprop="author" itemscope itemtype="http://schema.org/Person"><span itemprop="name"><a href="/profile/165" rel="noopener noreferrer nofollow" target="_blank">artlung</a></span></span> <span title="2019-05-09 04:08:27"> 2019-05-09 04:08:27</span><time class="hidden" itemprop="dateCreated" datetime="2019-05-09T04:05:27+02:00">2019-05-09T04:05:27+02:00</time> </div> <div class="clearfix"></div> </div> </div> </div> <div class="answer" id="13290" itemscope itemtype="http://schema.org/Answer" itemprop="suggestedAnswer"> <div class="answer-row"> <div class="answer-text"> <div class="desc" itemprop="text"> <p> Try <a href="http://validator.w3.org/docs/checklink.html" 
rel="noopener noreferrer nofollow">the W3C's open-source Link Checker</a>. You can use it online or install it locally. </p></div> <div class="clearfix"></div> <div class="action-time"> <span itemprop="author" itemscope itemtype="http://schema.org/Person"><span itemprop="name"><a href="/profile/1116" rel="noopener noreferrer nofollow" target="_blank">mvark</a></span></span> <span title="2019-05-08 21:00:15"> 2019-05-08 21:00:15</span><time class="hidden" itemprop="dateCreated" datetime="2019-05-08T09:05:15+02:00">2019-05-08T09:05:15+02:00</time> </div> <div class="clearfix"></div> </div> </div> </div> </div> <div class="similar"> <p>Related questions</p> <div> <div class="votes-question accepted"> <div class="vote-count">0</div> </div><a href="https://askdev.io/cn/questions/4000/ke-qian-ru-de-wysiwigwen-ben-bian-ji-qi-you-na-xie-ti-dai" target="_blank">What are the alternatives for an embeddable WYSIWYG text editor?</a> </div> <div> <div class="votes-question accepted"> <div class="vote-count">0</div> </div><a href="https://askdev.io/cn/questions/3989/wo-ke-yi-zai-html5zhong-shi-yong-rdfama" target="_blank">Can I use RDFa in HTML5?</a> </div> <div> <div class="votes-question accepted"> <div class="vote-count">0</div> </div><a href="https://askdev.io/cn/questions/3982/wo-ru-he-zai-bu-fang-qi-da-bu-fen-ke-hu-mu-biao-shi-chang-de" target="_blank">How can I adopt HTML5 without giving up a large part of my client's target market?</a> </div> <div> <div class="votes-question accepted"> <div class="vote-count">0</div> </div><a href="https://askdev.io/cn/questions/3961/shi-yao-shi-zui-jian-dan-zui-qing-de-an-pai-yi-huo-de-biao" target="_blank">What is the simplest/lightest way to set up a standard LAMP stack for development?</a> </div> <div> <div class="votes-question accepted"> <div 
class="vote-count">0</div> </div><a href="https://askdev.io/cn/questions/3955/wo-zen-yang-cai-neng-zeng-jia-wang-zhan-de-liu-liang" target="_blank">How can I increase my website's traffic?</a> </div> <div> <div class="votes-question accepted"> <div class="vote-count">0</div> </div><a href="https://askdev.io/cn/questions/3946/wo-ru-he-chu-li-you-yu-wo-de-cai-dan-dao-zhi-ye-mian-shang" target="_blank">How do I deal with having too many links on a page because of my menu?</a> </div> <div> <div class="votes-question accepted"> <div class="vote-count">0</div> </div><a href="https://askdev.io/cn/questions/3937/shi-yao-shi-gu-ge-chromedeng-yu-firebug" target="_blank">What is the Google Chrome equivalent of Firebug?</a> </div> <div> <div class="votes-question accepted"> <div class="vote-count">0</div> </div><a href="https://askdev.io/cn/questions/3918/liao-jie-apacheguan-li-you-na-xie-hen-hao-de-zi-yuan" target="_blank">What are good resources for learning Apache administration?</a> </div> <div> <div class="votes-question accepted"> <div class="vote-count">0</div> </div><a href="https://askdev.io/cn/questions/3877/ji-suan-ji-dong-hua-de-favicon-icoshi-yi-ge-fu-mian-de-jian" target="_blank">Is an animated favicon.ico a bad idea? Is there a tasteful way to use an animated favicon.ico?</a> </div> <div> <div class="votes-question accepted"> <div class="vote-count">0</div> </div><a href="https://askdev.io/cn/questions/3872/chuang-jian-ge-ren-yin-si-ji-hua-yi-ji-shi-yong-fang-mian" target="_blank">What are good resources for creating a privacy policy and terms of use?</a> </div> <div> <div class="votes-question accepted"> <div class="vote-count">0</div> </div><a href="https://askdev.io/cn/questions/3868/ru-he-zai-wang-luo-tuo-guan-he-yun-tuo-guan-zhi-jian-jin" target="_blank">How do I choose between web hosting and cloud hosting?</a> </div> <div> <div class="votes-question accepted"> <div class="vote-count">0</div> </div><a href="https://askdev.io/cn/questions/3865/nin-ru-he-zhi-jie-jiang-iphone-androidhu-lian-wang-liu-lan" target="_blank">How do you redirect iPhone/Android web browsers to m.example.com?</a> </div> <div> <div class="votes-question accepted"> <div class="vote-count">0</div> </div><a href="https://askdev.io/cn/questions/3836/zai-sou-suo-hou-wo-ke-yi-zai-na-li-an-quan-di-sou-suo-yu" target="_blank">Where can I safely run a domain whois search without worrying that the search site will grab the domain after I search?</a> </div> <div> <div class="votes-question accepted"> <div class="vote-count">0</div> </div><a href="https://askdev.io/cn/questions/3830/wo-ru-he-shi-yong-htaccesszai-xiang-xi-xin-xi-wen-jian-jia" target="_blank">How do I use .htaccess to set up authentication on a specific folder?</a> </div> <div> <div class="votes-question accepted"> <div class="vote-count">0</div> </div><a href="https://askdev.io/cn/questions/3734/ru-guo-wo-zai-wo-de-wang-zhan-shang-xu-yao-https-sslan-quan" target="_blank">If I need HTTPS/SSL security on my website, how do I get a certificate?</a> </div> </div> </div> </div> </div> </div> <footer class="footer"> <div class="container"> <div class="pull-left"> <div class="license"> licensed under <a href="https://creativecommons.org/licenses/by-sa/3.0/" rel="nofollow license">cc by-sa 3.0</a> with attribution. </div> </div> <div class="pull-right logo"> <a class="hidden-xs mail" href="mailto:info@askdev.io">info@askdev.io</a> <a href="#"> <div class="name"><span>AskDev.io</span></div> </a> </div> </div> </footer> <script src="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/9.15.6/highlight.min.js"></script> <script>hljs.initHighlightingOnLoad();</script> <script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/3.4.1/jquery.slim.min.js"></script> <script src="https://cdnjs.cloudflare.com/ajax/libs/twitter-bootstrap/3.4.1/js/bootstrap.min.js"></script> </body> </html>