Best method of Textfile Parsing in C#?

I want to parse a config file sort of thing, like so:

[KEY:Value]     
    [SUBKEY:SubValue]

I started with a StreamReader, converting lines into character arrays, when I figured there has got to be a better way. So I ask you, humble reader, to help me.

One restriction is that it has to work in a Linux/Mono environment (1.2.6 to be exact). I don't have the latest 2.0 release (of Mono), so try to restrict language features to C# 2.0 or C# 1.0.

2019-12-02 02:50:58
Answers: 7

Irrespective of the persisted format, using a Regex would be the fastest way of parsing. In Ruby it'd probably be a few lines of code.

\[KEY:(.*)\] 
\[SUBKEY:(.*)\]

These two would get you the Value and SubValue in the first group. Check MSDN for how to match a regex against a string.

This is something everyone should have in their toolkit. Pre-regex days would seem like the Ice Age.
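A minimal sketch of the regex approach in C#, assuming one entry per line and no escaped characters. The names RegexConfigDemo and TryParseLine are made up for illustration, and the single combined pattern (which also captures the indentation and the key name) is a variation on the two patterns above, not something from the answer. Regex, Match and Groups are the standard System.Text.RegularExpressions types.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexConfigDemo
{
    // Group 1: leading whitespace, group 2: key, group 3: value.
    // Assumption: keys contain no ':' or ']', values contain no ']'.
    private static readonly Regex EntryPattern =
        new Regex(@"^(\s*)\[([^:\]]+):([^\]]*)\]\s*$");

    // Returns false (instead of throwing) when the line is not an entry.
    public static bool TryParseLine(string line, out int indent,
                                    out string key, out string value)
    {
        Match m = EntryPattern.Match(line);
        if (!m.Success)
        {
            indent = 0;
            key = null;
            value = null;
            return false;
        }
        indent = m.Groups[1].Length;
        key = m.Groups[2].Value;
        value = m.Groups[3].Value;
        return true;
    }

    public static void Main()
    {
        string[] lines = new string[] { "[KEY:Value]     ", "    [SUBKEY:SubValue]" };
        foreach (string line in lines)
        {
            int indent;
            string key;
            string value;
            if (TryParseLine(line, out indent, out key, out value))
            {
                Console.WriteLine(key + "=" + value + " (indent " + indent + ")");
            }
        }
    }
}
```

The indentation count is what you would use to decide which parent a sub-entry belongs to; the regex itself knows nothing about nesting.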

2019-12-04 09:18:44

@Gishu

Actually, once I had accommodated escape characters, my regex ran slightly slower than my hand-written top-down recursive parser, and that was without the nesting (linking sub-items to their parents) and error reporting that the hand-written parser had.

The regex was slightly faster to write (though I do have a bit of experience with hand parsers), but that is without good error reporting. Once you add that, it becomes slightly harder and longer to do.

I also find the intention of the hand-written parser easier to understand. For instance, here is a snippet of the code:

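private static Node ParseNode(TextReader reader)
{
    Node node = new Node();
    // Leading whitespace gives the nesting depth of this node.
    int indentation = ParseWhitespace(reader);
    Expect(reader, '[');
    // The key runs up to ':' and the value up to ']'.
    node.Key = ParseTerminatedString(reader, ':');
    node.Value = ParseTerminatedString(reader, ']');
    return node;
}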
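The fragment relies on helpers that are not shown. One way they might look is sketched below. This is a guess at their shape, not the poster's actual code: the Node fields, the choice of FormatException for errors, and the bodies of ParseWhitespace, Expect and ParseTerminatedString are all assumptions, and escape handling is left out.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical node type; the original Node class is not shown.
public class Node
{
    public int Indentation;
    public string Key;
    public string Value;
}

public static class HandParser
{
    // Consumes leading spaces and tabs and returns how many were read.
    private static int ParseWhitespace(TextReader reader)
    {
        int count = 0;
        while (reader.Peek() == ' ' || reader.Peek() == '\t')
        {
            reader.Read();
            count++;
        }
        return count;
    }

    // Consumes one character and fails with a readable message if it is wrong.
    private static void Expect(TextReader reader, char expected)
    {
        int c = reader.Read();
        if (c != expected)
        {
            string found = c < 0 ? "end of input" : "'" + (char)c + "'";
            throw new FormatException("Expected '" + expected + "' but found " + found);
        }
    }

    // Reads up to (and consumes) the terminator; returns the text before it.
    private static string ParseTerminatedString(TextReader reader, char terminator)
    {
        StringBuilder sb = new StringBuilder();
        while (true)
        {
            int c = reader.Read();
            if (c < 0)
            {
                throw new FormatException("Unexpected end of input, expected '" + terminator + "'");
            }
            if (c == terminator)
            {
                return sb.ToString();
            }
            sb.Append((char)c);
        }
    }

    public static Node ParseNode(TextReader reader)
    {
        Node node = new Node();
        node.Indentation = ParseWhitespace(reader);
        Expect(reader, '[');
        node.Key = ParseTerminatedString(reader, ':');
        node.Value = ParseTerminatedString(reader, ']');
        return node;
    }

    public static void Main()
    {
        Node n = ParseNode(new StringReader("    [SUBKEY:SubValue]"));
        Console.WriteLine(n.Indentation + " " + n.Key + " " + n.Value);
    }
}
```

Because every helper knows what it was looking for when it fails, the error messages come almost for free, which is the point made above about error reporting.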
2019-12-04 09:18:12

Using a library is almost always preferable to rolling your own. Here is a quick list of "Oh, I'll never need that / I didn't think about that" points which will end up coming back to bite you later down the line:

  • Escaping characters. What if you want a : in the key or ] in the value?
  • Escaping the escape character.
  • Unicode
  • Mix of tabs and spaces (see the problems with Python's whitespace-sensitive syntax)
  • Handling different line-ending (return character) formats
  • Handling syntax error reporting
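To make the first two points concrete, here is a rough sketch of reading a key or value that may contain escaped delimiters. It assumes a backslash escape character, which the format in the question does not define; EscapeDemo and ReadUntil are hypothetical names.

```csharp
using System;
using System.Text;

public static class EscapeDemo
{
    // Reads text from 'start' up to the first unescaped terminator.
    // Returns the unescaped text; 'next' is the index just past the terminator.
    // Assumption: '\' escapes the following character, including '\' itself.
    public static string ReadUntil(string text, int start, char terminator, out int next)
    {
        StringBuilder sb = new StringBuilder();
        for (int i = start; i < text.Length; i++)
        {
            char c = text[i];
            if (c == '\\')
            {
                if (i + 1 >= text.Length)
                {
                    throw new FormatException("Dangling escape at position " + i);
                }
                // Take the next character literally, whatever it is.
                i++;
                sb.Append(text[i]);
            }
            else if (c == terminator)
            {
                next = i + 1;
                return sb.ToString();
            }
            else
            {
                sb.Append(c);
            }
        }
        throw new FormatException("Missing '" + terminator + "' after position " + start);
    }

    public static void Main()
    {
        // The entry body is: a\:b:c\]d]  (key "a:b", value "c]d")
        string body = @"a\:b:c\]d]";
        int pos;
        string key = ReadUntil(body, 0, ':', out pos);
        string value = ReadUntil(body, pos, ']', out pos);
        Console.WriteLine(key + " / " + value);
    }
}
```

Even this small piece already needs two error cases, which is the kind of creeping complexity the list warns about.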

Like others have suggested, YAML looks like your best bet.

2019-12-04 01:39:31

You can also use a stack with a push/pop algorithm. This one matches opening and closing tags.

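public string check()
{
    // getTags() is not shown; it presumably returns every tag in document
    // order, e.g. "<a>", "</a>".
    ArrayList tags = getTags();

    Stack stack = new Stack(tags.Count);

    foreach (string tag in tags)
    {
        if (tag.IndexOf('/') < 0)
        {
            // Opening tag: remember it until its closing tag turns up.
            stack.Push(tag);
        }
        else
        {
            if (stack.Count > 0)
            {
                // Stack is non-generic, so Pop() returns object.
                string startTag = (string)stack.Pop();
                // Strip "<" and "</" so both sides read "name>".
                startTag = startTag.Substring(1, startTag.Length - 1);
                string endTag = tag.Substring(2, tag.Length - 2);
                if (!startTag.Equals(endTag))
                {
                    return "Error: no matching end tag";
                }
            }
            else
            {
                return "Error: no matching opening tag";
            }
        }
    }

    // Anything left on the stack was opened but never closed.
    if (stack.Count > 0)
    {
        return "Error: no matching end tag";
    }
    return "Xml is valid";
}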

You can probably adapt this so you can read the contents of your file. Regular expressions are also a good idea.

2019-12-03 04:15:44

I was looking at almost this exact problem the other day: this article on string tokenizing is exactly what you need. You'll want to define your tokens as something like:

@"(?<level>\s) | " +
@"(?<term>[^:\s]) | " +
@"(?<separator>:)"

The article does a pretty good job of explaining it. From there you just start eating up tokens as you see fit.
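A rough sketch of how a named-group token pattern like this could be driven from C#. It is not the article's code: TokenizerDemo and Tokenize are hypothetical names, and the patterns are changed to \s+ and [^:\s]+ so that a whole run of characters becomes one token. Because the pattern contains spaces around each |, it is compiled with RegexOptions.IgnorePatternWhitespace.

```csharp
using System;
using System.Collections;
using System.Text.RegularExpressions;

public static class TokenizerDemo
{
    // One alternative per token type; the named group tells us which one matched.
    private static readonly Regex TokenPattern = new Regex(
        @"(?<level>\s+) | " +
        @"(?<term>[^:\s]+) | " +
        @"(?<separator>:)",
        RegexOptions.IgnorePatternWhitespace);

    // Returns the tokens as strings such as "level:2", "term:KEY", "separator".
    public static ArrayList Tokenize(string input)
    {
        ArrayList tokens = new ArrayList();
        foreach (Match m in TokenPattern.Matches(input))
        {
            if (m.Groups["level"].Success)
            {
                tokens.Add("level:" + m.Length);
            }
            else if (m.Groups["term"].Success)
            {
                tokens.Add("term:" + m.Value);
            }
            else
            {
                tokens.Add("separator");
            }
        }
        return tokens;
    }

    public static void Main()
    {
        foreach (string token in Tokenize("  [KEY:Value]"))
        {
            Console.WriteLine(token);
        }
    }
}
```

Note that with only these three token types the brackets end up inside the terms ("[KEY" and "Value]"); for the format in the question you would probably add token types for [ and ] as well.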

Protip: For an LL(1) parser (read: easy), tokens cannot share a prefix. If you have abc as a token, you cannot have ace as a token.

Note: The article is missing the | characters in its examples, just throw them in.

2019-12-03 04:15:26

It looks to me that you would be better off using an XML-based config file, as there are already .NET classes which can read and store the information for you relatively easily. Is there a reason that this is not possible?
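As a rough illustration of what that could look like, the sketch below reads a nested entry with XmlDocument from System.Xml. The element and attribute names (entry, key, value) are invented for the example, as are XmlConfigDemo and GetValue; only XmlDocument, LoadXml, SelectSingleNode and GetAttribute are real framework calls.

```csharp
using System;
using System.Xml;

public static class XmlConfigDemo
{
    // Returns the "value" attribute of the element selected by 'xpath',
    // or null when no such element exists.
    public static string GetValue(string xml, string xpath)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(xml);
        XmlElement element = doc.SelectSingleNode(xpath) as XmlElement;
        if (element == null)
        {
            return null;
        }
        return element.GetAttribute("value");
    }

    public static void Main()
    {
        // Hypothetical XML equivalent of [KEY:Value] with a nested [SUBKEY:SubValue].
        string xml =
            "<entry key=\"KEY\" value=\"Value\">" +
            "<entry key=\"SUBKEY\" value=\"SubValue\" />" +
            "</entry>";
        Console.WriteLine(GetValue(xml, "/entry"));
        Console.WriteLine(GetValue(xml, "/entry/entry[@key='SUBKEY']"));
    }
}
```

Escaping, Unicode and error reporting are handled by the XML parser, at the cost of a noisier file to edit by hand.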

@Bernard: It is true that hand editing XML is tedious, but the structure that you are presenting already looks very similar to XML.

Then yes, has a good method there.

2019-12-03 04:15:18

I considered it, but I'm not going to use XML. I am going to be writing this stuff by hand, and hand editing XML makes my brain hurt. :')

Have you considered YAML?

You get the benefits of XML without all the pain and suffering. It is used extensively in the Ruby community for things like config files, pre-prepared database data, etc.

Here is an example:

customer:
  name: Orion
  age: 26
  addresses:
    - type: Work
      number: 12
      street: Bob Street
    - type: Home
      number: 15
      street: Secret Road

There appears to be a C# library here, which I haven't used personally, but YAML is pretty simple, so "how hard can it be?" :-)

I'd say it's preferable to inventing your own ad-hoc format (and dealing with parser bugs).

2019-12-03 04:14:53