Remove statistical outliers

I've analysed newspapers by computing, for each category, the share of articles written in each language.

The results look like this:

Day 1               Day 2              Day 3

Economy             Economy            Economy
language 1: 0,35    language 1: 0,30   language 1: 0,90
language 2: 0,11    language 2: 0,10   language 2: 0,00
language 3: 0,54    language 3: 0,60   language 3: 0,10

Sports              Sports             Sports
language 1: 0,40    language 1: 0,30   language 1: 1,00
language 2: 0,20    language 2: 0,20   language 2: 0,00
language 3: 0,40    language 3: 0,50   language 3: 0,00

So for instance, on day 1, 35 % of the Economy articles are written in language 1, 11 % in language 2, and so on.

Now I want to remove the outliers (e.g. day 3) from my data. I was thinking about calculating the mean and the standard deviation of each series and removing all values that lie more than two standard deviations from the mean.
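Here is a rough sketch of the two-standard-deviation filter I have in mind, in Python. The function name `remove_outliers` and the parameter `k` are just placeholders of mine. The first three values are taken from the table above (language 1, Economy); the other five are made up only so that the example has enough points, because with just three days no value can lie more than two sample standard deviations from the mean, so nothing would ever be removed.

```python
from statistics import mean, stdev


def remove_outliers(values, k=2.0):
    """Keep only the values within k sample standard deviations of the mean."""
    if len(values) < 2:
        # The standard deviation is undefined for fewer than two points.
        return list(values)
    m = mean(values)
    s = stdev(values)  # sample standard deviation (divides by n - 1)
    return [v for v in values if abs(v - m) <= k * s]


# Share of language 1 in Economy articles, one value per day.
# Days 1-3 come from the table; the last five values are HYPOTHETICAL
# and exist only to make the example meaningful.
economy_lang1 = [0.35, 0.30, 0.90, 0.33, 0.31, 0.36, 0.32, 0.34]

kept = remove_outliers(economy_lang1)
print(kept)

# With only the three days from the table, nothing is removed.
print(remove_outliers([0.35, 0.30, 0.90]))
```

In this sketch the filter is applied to one series at a time (one language within one category), so the remaining shares of a day no longer have to add up to 1 after filtering.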

Does that make sense? Is it a problem if my values are not normally distributed? Or is there another way to get rid of the outliers?

In the end, I want to calculate the average of the language 1, language 2, etc. values for each category over time and see how the values change.

Is there a technique for doing that?
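To make the averaging step concrete, this is roughly what I mean, sketched in Python with the three days from the table above. The nested dictionary layout and the function name `average_shares` are my own assumptions, not a fixed format; any outlier filtering would happen before this step.

```python
from statistics import mean

# shares[category][language] holds one value per day (day 1, day 2, day 3),
# copied from the table above.
shares = {
    "Economy": {
        "language 1": [0.35, 0.30, 0.90],
        "language 2": [0.11, 0.10, 0.00],
        "language 3": [0.54, 0.60, 0.10],
    },
    "Sports": {
        "language 1": [0.40, 0.30, 1.00],
        "language 2": [0.20, 0.20, 0.00],
        "language 3": [0.40, 0.50, 0.00],
    },
}


def average_shares(data):
    """Return the mean share per language within each category."""
    return {
        category: {lang: mean(values) for lang, values in languages.items()}
        for category, languages in data.items()
    }


averages = average_shares(shares)
for category, languages in averages.items():
    for lang, avg in languages.items():
        print(f"{category} / {lang}: {avg:.2f}")
```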

Thanks in advance.

2022-07-25 20:40:09