Two Methods for Filtering HTML Strings in ASP.NET

2010-05-20

Noting these two methods down here for future reference.

        // Requires: using System.Text.RegularExpressions; using System.Web;

        /// <summary>
        /// Removes HTML markup from a string.
        /// </summary>
        /// <param name="Htmlstring">Source text that contains HTML.</param>
        /// <returns>The text with markup removed, HTML-encoded and trimmed.</returns>
        public static string GetNoHTMLString(string Htmlstring)
        {
            // Singleline lets "." span line breaks, so multi-line scripts and comments match.
            RegexOptions spanLines = RegexOptions.IgnoreCase | RegexOptions.Singleline;

            // Remove scripts together with their content
            Htmlstring = Regex.Replace(Htmlstring, @"<script[^>]*?>.*?</script>", "", spanLines);
            // Remove complete comments first, so a ">" inside a comment cannot leak its text
            Htmlstring = Regex.Replace(Htmlstring, @"<!--.*?-->", "", spanLines);
            // Remove the remaining HTML tags
            Htmlstring = Regex.Replace(Htmlstring, @"<(.[^>]*)>", "", RegexOptions.IgnoreCase);
            Htmlstring = Regex.Replace(Htmlstring, @"([\r\n])[\s]+", "", RegexOptions.IgnoreCase);
            // Remove leftover, unbalanced comment delimiters
            Htmlstring = Regex.Replace(Htmlstring, @"-->", "", RegexOptions.IgnoreCase);
            Htmlstring = Regex.Replace(Htmlstring, @"<!--.*", "", RegexOptions.IgnoreCase);

            // Decode a few common character entities
            Htmlstring = Regex.Replace(Htmlstring, @"&(quot|#34);", "\"", RegexOptions.IgnoreCase);
            Htmlstring = Regex.Replace(Htmlstring, @"&(amp|#38);", "&", RegexOptions.IgnoreCase);
            Htmlstring = Regex.Replace(Htmlstring, @"&(lt|#60);", "<", RegexOptions.IgnoreCase);
            Htmlstring = Regex.Replace(Htmlstring, @"&(gt|#62);", ">", RegexOptions.IgnoreCase);
            Htmlstring = Regex.Replace(Htmlstring, @"&(nbsp|#160);", " ", RegexOptions.IgnoreCase);
            Htmlstring = Regex.Replace(Htmlstring, @"&(iexcl|#161);", "\xa1", RegexOptions.IgnoreCase);
            Htmlstring = Regex.Replace(Htmlstring, @"&(cent|#162);", "\xa2", RegexOptions.IgnoreCase);
            Htmlstring = Regex.Replace(Htmlstring, @"&(pound|#163);", "\xa3", RegexOptions.IgnoreCase);
            Htmlstring = Regex.Replace(Htmlstring, @"&(copy|#169);", "\xa9", RegexOptions.IgnoreCase);
            // Drop any other numeric character references
            Htmlstring = Regex.Replace(Htmlstring, @"&#(\d+);", "", RegexOptions.IgnoreCase);

            // String.Replace returns a new string, so the result has to be assigned
            Htmlstring = Htmlstring.Replace("<", "");
            Htmlstring = Htmlstring.Replace(">", "");
            Htmlstring = Htmlstring.Replace("\r\n", "");
            Htmlstring = HttpContext.Current.Server.HtmlEncode(Htmlstring).Trim();

            return Htmlstring;
        }

        /// <summary>
        /// Returns a string for display. HTML tags are kept, but dangerous
        /// elements such as iframe and script are filtered out.
        /// </summary>
        /// <param name="str">The unprocessed string.</param>
        /// <returns>The string with the listed elements removed.</returns>
        public static string GetSafeHTMLString(string str)
        {
            // Singleline lets "." span line breaks, so multi-line elements are removed too.
            RegexOptions options = RegexOptions.IgnoreCase | RegexOptions.Singleline;

            // Elements removed together with their content.
            // \b keeps "frame" from matching the opening tag of "frameset".
            string[] tags = { "applet", "body", "embed", "frame", "script", "frameset", "html",
                              "iframe", "style", "layer", "link", "ilayer", "meta", "object" };
            foreach (string tag in tags)
            {
                str = Regex.Replace(str, "<" + tag + @"\b[^>]*>.*?</" + tag + @"\s*>", "", options);
            }

            // frame, embed, link and meta normally have no closing tag, so strip lone tags as well.
            str = Regex.Replace(str, @"</?(frame|embed|link|meta)\b[^>]*>", "", options);
            return str;
        }
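
Both methods rely on the same regex idea: match an opening tag, lazily consume everything up to the matching closing tag, and replace the whole match with an empty string. The following is a minimal, self-contained sketch of that idea only, not the author's code. The class `HtmlFilterDemo` and the helper `RemoveElement` are hypothetical names introduced for illustration, and the sketch assumes that `RegexOptions.Singleline` is wanted so that elements spanning several lines are matched.

```csharp
using System;
using System.Text.RegularExpressions;

public static class HtmlFilterDemo
{
    // Hypothetical helper (not part of the original post): removes every
    // occurrence of one named element together with its content.
    public static string RemoveElement(string html, string tag)
    {
        string name = Regex.Escape(tag);
        // \b stops "frame" from matching "<frameset ...>";
        // .*? is lazy, so each match ends at the nearest closing tag.
        string pattern = "<" + name + @"\b[^>]*>.*?</" + name + @"\s*>";
        // IgnoreCase matches <SCRIPT> as well as <script>;
        // Singleline lets "." cross line breaks inside the element.
        return Regex.Replace(html, pattern, "",
            RegexOptions.IgnoreCase | RegexOptions.Singleline);
    }
}
```

For example, `HtmlFilterDemo.RemoveElement("<p>Hello</p><SCRIPT>\nalert(1);\n</SCRIPT><b>world</b>", "script")` yields `<p>Hello</p><b>world</b>`. Without `RegexOptions.Singleline` the same call would leave the script in place, because `.` does not match `\n` by default.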
