Saturday, 12 February 2022

C# To HTML

In this post I use Roslyn to convert C# code to HTML. It took me the best part of a day to understand the Roslyn code; it is not exactly straightforward.

The Code

The code is split into two parts: a syntax tree walker and a code renderer. There is also a small amount of code that parses the C# code and converts it to HTML. To date, comments, keywords, document comments (e.g. ///) and character/string literals are catered for.

IRenderCode interface

To keep things simple, this interface renders code based upon a supplied Token and a string. An alternative would be one method per token type; however, I tend to prefer smaller interfaces. The IRenderCode interface is defined as follows...


using System.IO;
using CSharpToHtml.CodeRenderers;
namespace CSharpToHtml
{
  /// <summary>
  /// The Token enum lists all tokens we are interested in for C# to HTML conversion.
  /// </summary>
  public enum Token
  {
    Comment,
    DocumentComment,
    LiteralChar,
    LiteralString,
    Keyword,
    Text,
  }

  /// <summary>
  /// The IRenderCode interface renders token and supplied text.
  /// The Completed method is called once tokenisation is completed.
  /// </summary>
  public interface IRenderCode
  {
    void Render(Token token, string text);
    void Completed();
  }

  /// <summary>
  /// The FactoryRenderCode class creates specific renderer instances.
  /// </summary>
  public static class FactoryRenderCode
  {
    public static IRenderCode ToConsole() =>
      new RenderCodeConsole();

    public static IRenderCode ToHtml(TextWriter writer) =>
      new RenderCodeHtml(writer);
  }
}
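To show how little a renderer needs, here is a minimal sketch of another IRenderCode implementation that ignores the token kind and simply collects the text, which makes it easy to compare the walker's output with the original source. RenderCodeText is a hypothetical name, not part of the original code; Token and IRenderCode are repeated so the snippet compiles on its own.

```csharp
using System.Text;

// Repeated from above so this snippet is self-contained.
public enum Token { Comment, DocumentComment, LiteralChar, LiteralString, Keyword, Text }

public interface IRenderCode
{
    void Render(Token token, string text);
    void Completed();
}

// Hypothetical renderer: concatenates every piece of text it receives,
// regardless of token kind, and exposes the result once rendering completes.
public class RenderCodeText : IRenderCode
{
    private readonly StringBuilder _builder = new StringBuilder();

    public string Output { get; private set; } = string.Empty;

    public void Render(Token token, string text) => _builder.Append(text);

    public void Completed() => Output = _builder.ToString();
}
```

Passing an instance to SyntaxWalker.RenderCode and comparing Output with the input gives a quick regression check, and the single Render method means the renderer only has to care about the token kinds it actually treats differently.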

Rendering HTML

The HTML render class is as follows...


using System;
using System.Collections.Generic;
using System.IO;
using System.Web;
namespace CSharpToHtml.CodeRenderers
{
  class RenderCodeHtml : IRenderCode
  {
    private readonly TextWriter _writer;
    private readonly Dictionary<Token, Action<string>> _handlers;    

    public RenderCodeHtml(TextWriter writer)
    {
      _writer = writer;
      _writer.Write("<pre class=\"code\"><code>\n");
      _handlers = new Dictionary<Token, Action<string>>
      {
        { Token.Comment, s => Write("Green", s) },
        { Token.DocumentComment, s => Write("Green", s) },
        { Token.Keyword, s => Write("Blue", s) },
        { Token.LiteralChar, s => Write("Red", s) },
        { Token.LiteralString, s => Write("Red", s) },
        { Token.Text, s => Write(s) },
      };
    }

    public void Render(Token type, string text)
    {
      _handlers[type](text);
    }

    public void Completed()
    {
      _writer.Write("</code></pre>");
    }

    private void Write(string text)
    {
      _writer.Write(HttpUtility.HtmlEncode(text));
    }

    private void Write(string color, string text)
    {
      _writer.Write($"<span style=\"color:{color};\">");
      Write(text);
      _writer.Write("</span>");
    }
  }
}
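Because every piece of text passes through HttpUtility.HtmlEncode before it is written, generic type arguments and operators such as `<` or `&&` cannot break the generated markup. The helper below isolates that step as a small sketch; HtmlSpan and Wrap are hypothetical names that mirror the private Write(color, text) method above.

```csharp
using System.Web;

public static class HtmlSpan
{
    // Mirrors RenderCodeHtml.Write(color, text): the text is HTML-encoded
    // before being wrapped in a coloured span.
    public static string Wrap(string color, string text) =>
        $"<span style=\"color:{color};\">{HttpUtility.HtmlEncode(text)}</span>";
}
```

For example, HtmlSpan.Wrap("Blue", "List<int>") produces a span containing `List&lt;int&gt;`, which the browser displays as `List<int>`.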

Syntax Tree Walker

To make all of this work, one needs a syntax tree walker. CSharpSyntaxWalker was a good starting point. The first thing I wanted to do, for test purposes, was to traverse the syntax tree and print the values so that the output looked like my original code; a sanity check, if you like. The syntax walker class is as follows...


using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;

namespace CSharpToHtml
{
  /// <summary>
  /// The SyntaxWalker parses supplied code (RenderCode) and walks tokens.
  /// The base class, CSharpSyntaxWalker, can walk trivia as well.
  /// Trivia covers whitespace, comments, etc., and is attached to tokens
  /// as leading or trailing trivia. As such, walking to token depth and
  /// processing each token's trivia should suffice for the IRenderCode interface.
  /// </summary>
  public class SyntaxWalker : CSharpSyntaxWalker
  {
    // _model is not currently used
    private readonly SemanticModel _model;
    private readonly IRenderCode _render;


    /// <summary>
    /// Take the code to parse and the code renderer to render tokens.
    /// </summary>
    public static void RenderCode(string code, IRenderCode renderer)
    {
      // Create syntax tree from supplied code.
      var tree = CSharpSyntaxTree.ParseText(code);

      // Create a new compilation unit, gives access to the semantic model.
      var compilation = CSharpCompilation.Create(
        "MyCompilation",
        new[] { tree },
        new[] { MetadataReference.CreateFromFile(typeof(object).Assembly.Location) });

      // Create semantic model, a walker and node visitor.
      // Visit nodes then call renderer completed.
      var semanticModel = compilation.GetSemanticModel(tree);
      var walker = new SyntaxWalker(semanticModel, renderer);
      walker.Visit(semanticModel.SyntaxTree.GetRoot());
      renderer.Completed();
    }    

    /// <summary>
    /// Visit all tokens.
    /// A token may have leading or trailing trivia.
    /// Trivia is defined as whitespace or comments.
    /// </summary>
    /// <param name="token"></param>
    public override void VisitToken(SyntaxToken token)
    {
      // Process trivia that may lead the supplied token.
      if (token.HasLeadingTrivia)
        ProcessTrivia(token.LeadingTrivia);

      // Special case - token is a keyword.
      if (token.IsKeyword())
        _render.Render(Token.Keyword, token.ValueText);
      else
      {
        // get token type.
        var kind = token.Kind();
        switch (kind)
        {
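          // token is a character literal. token.Text is the literal as written
          // in the source (quotes and escape sequences included), whereas
          // token.ValueText is the unescaped value.
          case SyntaxKind.CharacterLiteralToken:
            _render.Render(Token.LiteralChar, token.Text);
            break;

          // token is a string literal (regular or verbatim @"...").
          case SyntaxKind.StringLiteralToken:
            _render.Render(Token.LiteralString, token.Text);
            break;

          // don't care what the token is at this point; token.Text keeps
          // verbatim identifiers such as @class intact.
          default:
            _render.Render(Token.Text, token.Text);
            break;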
        }        
      }

      // Process trailing trivia (typically whitespace).
      if (token.HasTrailingTrivia)
        ProcessTrivia(token.TrailingTrivia);

      base.VisitToken(token);
    }

    private SyntaxWalker(
      SemanticModel model,
      IRenderCode renderer) : base(SyntaxWalkerDepth.Token)
    {
      _model = model;
      _render = renderer;
    }

    /// <summary>
    /// Process trivia (comments, whitespace or text).
    /// </summary>
    /// <param name="triviaCollection"></param>
    private void ProcessTrivia(SyntaxTriviaList triviaCollection)
    {
      foreach (var trivia in triviaCollection)
      {
        var kind = trivia.Kind();

        switch (kind)
        {
          // Single or multiline comments are rendered as a Token.Comment.
          case SyntaxKind.SingleLineCommentTrivia:
          case SyntaxKind.MultiLineCommentTrivia:          
            _render.Render(Token.Comment, trivia.ToString());
            break;

          // Document comments are rendered as a Token.DocumentComment.
          case SyntaxKind.SingleLineDocumentationCommentTrivia:
          case SyntaxKind.MultiLineDocumentationCommentTrivia:
            string text = "///" + trivia.ToString();
            _render.Render(Token.DocumentComment, text);
            break;
          default:
            _render.Render(Token.Text, trivia.ToString());
            break;
        }        
      }
    }
  }
}
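The sanity check mentioned earlier works because Roslyn attaches every character of the source either to a token or to a token's leading or trailing trivia. The sketch below rebuilds the source from tokens and their trivia; RoundTrip is a hypothetical name, and it assumes the Microsoft.CodeAnalysis.CSharp NuGet package that the walker already uses.

```csharp
using System.Text;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;

public static class RoundTrip
{
    // Rebuilds source text from tokens plus their attached trivia.
    // Documentation comments stay as single structured trivia items here;
    // ToFullString() returns their complete text.
    public static string Rebuild(string code)
    {
        SyntaxNode root = CSharpSyntaxTree.ParseText(code).GetRoot();
        var builder = new StringBuilder();

        foreach (SyntaxToken token in root.DescendantTokens())
        {
            foreach (SyntaxTrivia trivia in token.LeadingTrivia)
                builder.Append(trivia.ToFullString());

            builder.Append(token.Text);

            foreach (SyntaxTrivia trivia in token.TrailingTrivia)
                builder.Append(trivia.ToFullString());
        }

        return builder.ToString();
    }
}
```

If Rebuild(code) ever differs from code, some text is being dropped; walking at SyntaxWalkerDepth.Token visits the same tokens in the same order, so the same reasoning applies to the walker.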

Using the code

To generate HTML, use code along the following lines.


using System.IO;
using CSharpToHtml;

// Read the source file, render it to HTML and collect the result.
var code = File.ReadAllText("myfile");
using var writer = new StringWriter();
SyntaxWalker.RenderCode(code, FactoryRenderCode.ToHtml(writer));
string html = writer.ToString();

One can also pass C# code as a string to the SyntaxWalker.RenderCode method.

Summary

I showed how to use the C# syntax walker to pick out tokens when converting code. Note that the syntax highlighting for the code in this post was generated using the above code.

Saturday, 27 November 2021

Fast File Traversal in C#

If you have tried to use the .NET methods for traversing files and directories, you have probably encountered problems, including access-denied exceptions and slow execution. File system traversal appears to be implemented largely within .NET itself rather than delegated directly to the operating system. The code supplied in this post was run successfully on Windows 10, 64-bit, and was developed using Visual Studio 2022 and .NET 5.

Why are the .NET methods troublesome?

In my opinion the .NET framework API contains some great code. The problem is that some API methods are somewhat lacking. The idea that traversing a file structure may throw exceptions seems a little bizarre.

Most of the methods for traversing the file structure use IEnumerable. In principle, using IEnumerable is a good idea. Problems arise, however, because when an exception (such as UnauthorizedAccessException) is thrown part-way through an enumeration, the enumeration ends and the remaining entries are never returned. This is certainly the case for the methods that obtain a list of directories or files.

I think the file system API, especially for traversal, is very poor. I should be able to view all directories and files. Attempting to open or modify those directories or files is a different matter (perhaps that is where .NET file security should kick in?).
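For comparison, .NET Core 2.1 and later (including .NET 5) provide EnumerationOptions, which lets the built-in enumeration skip inaccessible entries instead of aborting, unlike the SearchOption-based overloads. A minimal sketch follows; ManagedTraversal and ListFiles are hypothetical names, and its speed relative to the Win32 approach below has not been measured.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class ManagedTraversal
{
    // Recursively lists files under root, silently skipping directories
    // that cannot be read instead of throwing UnauthorizedAccessException.
    public static List<string> ListFiles(string root)
    {
        var options = new EnumerationOptions
        {
            RecurseSubdirectories = true,
            IgnoreInaccessible = true, // already the default; stated for clarity
            AttributesToSkip = 0       // the default skips Hidden and System entries
        };
        return Directory.EnumerateFiles(root, "*", options).ToList();
    }
}
```

Note that AttributesToSkip has to be cleared explicitly if hidden and system files should be included.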


Under the hood

Some of the .NET implementations appear to be wrappers over the Win32 API. The Win32 API is an old-school collection of C functions, of which there are many. Calling Win32 functions from a C# app incurs overhead due to the marshalling of types. In my experience the overhead is not significant, but it may well be in tight loops. (So, just something to be aware of.)


The code

using System;
using System.Collections.Generic;
using System.IO;
using System.Runtime.InteropServices;

public static partial class Functional
{
  #region Constants
  public static readonly IntPtr InvalidHandle = new IntPtr(-1);
  #endregion // Constants

  #region Data
  [StructLayout(LayoutKind.Sequential)]
  struct FILE_TIME
  {
    public uint Low;
    public uint High;
  }

  [StructLayout(LayoutKind.Sequential, CharSet = CharSet.Auto)]
  struct WIN32_FIND_DATA
  {
    public int Attributes;
    public FILE_TIME Created;
    public FILE_TIME LastAccessed;
    public FILE_TIME LastWrite;
    public int SizeLow;
    public int SizeHigh;
    public int Reserved;
    public int Reserved2;

    [MarshalAs(UnmanagedType.ByValTStr, SizeConst = 260)]
    public string Name;

    [MarshalAs(UnmanagedType.ByValTStr, SizeConst = 14)]
    public string AlternateName;

    public uint Type;
    public uint CreatorType;
    public uint FinderFlag;

    // True for real directory names; false for the "." and ".." entries.
    // (The previous length-based check also rejected two-character names
    // such as ".a".)
    public bool IsValidDirectory()
    {
      return Name.Length > 0 && Name != "." && Name != "..";
    }
  }
  #endregion // Data

  #region Imports
  [DllImport("kernel32.dll", CharSet = CharSet.Auto, SetLastError = true)]
  private static extern IntPtr FindFirstFile(
    string fileName,
    [Out] out WIN32_FIND_DATA data);

  [DllImport("kernel32.dll", CharSet = CharSet.Auto, SetLastError = true)]
  private static extern bool FindNextFile(
    IntPtr hndFindFile,
    [Out] out WIN32_FIND_DATA lpFindFileData);

  [DllImport("kernel32.dll")]
  private static extern bool FindClose(IntPtr handle);
  #endregion // Imports

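  /// <summary>
  /// Walks the directory tree under root iteratively, using a stack rather
  /// than recursion. As written, it only discovers directories; it does not
  /// return or record any results.
  /// </summary>
  public static void FindFiles(string root)
  {
    WIN32_FIND_DATA fd = new WIN32_FIND_DATA();
    Stack<string> dirs = new Stack<string>();

    dirs.Push(root);

    while (dirs.Count > 0)
    {
      string curDir = dirs.Pop();
      // "*.*" matches every entry in the directory.
      string search = Path.Combine(curDir, "*.*");
      IntPtr handle = FindFirstFile(search, out fd);

      // An invalid handle means the directory could not be read
      // (e.g. access denied); it is skipped rather than throwing.
      if (handle != InvalidHandle)
      {
        bool valid = true;
        while (valid)
        {
          // Queue sub-directories, skipping the "." and ".." entries.
          bool isDir = 0 != (fd.Attributes & (int)FileAttributes.Directory);
          if (isDir && fd.IsValidDirectory())
            dirs.Push(Path.Combine(curDir, fd.Name));

          valid = FindNextFile(handle, out fd);
        }
        FindClose(handle);
      }
    }
  }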
}

Summary

First, note that the code does not return any results; it only walks the directory tree. It simply serves as a starting point. It is fast, and can be modified to suit a project's needs.
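As one example of such a modification, the FILE_TIME and size fields of WIN32_FIND_DATA can be converted into familiar .NET types. The helpers below are a hypothetical sketch (FindDataConversions is not part of the original code); they rely on a FILETIME counting 100-nanosecond intervals since 1 January 1601 (UTC), and on the file size being split into high and low 32-bit halves.

```csharp
using System;

public static class FindDataConversions
{
    // Combines the two 32-bit halves of a Win32 FILETIME into a UTC DateTime.
    public static DateTime ToDateTimeUtc(uint low, uint high) =>
        DateTime.FromFileTimeUtc(((long)high << 32) | low);

    // Combines the size halves into a 64-bit byte count. The struct declares
    // them as int, so each half is reinterpreted as uint to avoid sign extension.
    public static long ToFileSize(int sizeHigh, int sizeLow) =>
        ((long)(uint)sizeHigh << 32) | (uint)sizeLow;
}
```

Inside the loop, for example, FindDataConversions.ToFileSize(fd.SizeHigh, fd.SizeLow) would give the size of the current entry, and ToDateTimeUtc(fd.LastWrite.Low, fd.LastWrite.High) its last write time.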

Memory allocations (mainly strings, as is to be expected) are quite high. I tried string interning, using StringBuilder to build paths and so on. Some techniques yielded slightly lower memory consumption, but nothing drastic enough to make me think "yeah, I should post that".
