NNUE Blog | Rebel

Here you can follow the testing of the newly created data in the new-data folder and whether it is an improvement or not. But first read the introduction page about the why and how.

Results of the 4B MAIN net against a pool of 7 engines, playing 2100 games at each of 5 epoch steps.

We will start with testing Mobility, containing almost 400M positions which were added to the 4B MAIN data, and the goal is that it produces a higher score. A test usually takes about 2 days this way, quite a difference from testing with the strongest net structure and the best data, usually between 26 and 30B, which takes 10-12 days. And as my experience with NNUE programming has taught me, if new data works on 4B it will usually also work when added to the current best data of 26.5B, as shown by the 2 improvements in the good-data folder. Conclusion: no progress, Mobility also loses with 49.7% against the MAIN data.

Testing King Safety, adding 400M positions (if possible always 10%) to the 4B MAIN data. Conclusion: no progress.

Testing Rook Imbalance, adding 400M positions to the 4B MAIN data. Conclusion: nice but not convincing, but good enough to create 2B of those, merge them into the 26.5B best data, start the trainer and go fishing for 10-12 days.

Tested 3 ideas so far: one is working, 2 are not, meaning I saved myself 20 days of wasted time.

---------------------------------------------------------------------------------------------------

Testing New Raw Data, this is my way to test whether new data is good data. Conclusion: bad data.

Testing Horizon effects, adding 400M positions to the 4B MAIN data. Conclusion: no progress.

Testing Bishop Pair, adding 163M positions to the 4B MAIN data. Conclusion: No doubt it's good data, the trouble is that the pattern is rare, so there are too few positions. A new run with NU+[F6] and wider margins might solve this. Tested 6 ideas so far, 2 are promising, so I saved myself 4 x 10 days of testing, one month and 10 days.

---------------------------------------------------------------------------------------------------

Testing Queen Imbalance, adding 129M positions to the 4B MAIN data. Conclusion: it's pretty amazing what only 3% extra specific data can do, same as with the 4% of the Bishop Pair.
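The percentages mentioned in these entries are simply the added positions divided by the 4B base. A minimal sketch of that arithmetic, using only the numbers reported above (the function and variable names are my own, for illustration):

```python
# Added positions per test, in millions, as reported in the entries above.
BASE_MILLIONS = 4000  # the 4B MAIN data

added_millions = {
    "Mobility": 400,        # reported as "almost 400M"
    "King Safety": 400,
    "Rook Imbalance": 400,
    "Bishop Pair": 163,
    "Queen Imbalance": 129,
}


def share_of_base(added: float, base: float = BASE_MILLIONS) -> float:
    """Added data as a percentage of the base data set."""
    return 100.0 * added / base


for name, added in added_millions.items():
    print(f"{name:16s} {added:4d}M  {share_of_base(added):4.1f}% of 4B")
```

The Queen Imbalance and Bishop Pair rows come out at roughly 3% and 4%, against the 10% that is aimed for when enough positions are available.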

Testing Special, adding 400M positions to the 4B MAIN data. Conclusion: A typical case where the error bar rears its ugly head, triggered by Murphy's Law, and starts to disturb the natural progression with peaks and lows. In cases where the standard 10.500 games don't give a clear picture we play 2100 more, as in this case. Internally we have called the phenomenon POTM, Phase of the Moon results. Final verdict: inconclusive.
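To get a feeling for how wide that error bar is, here is a rough sketch. It is not how the results on this page were computed; it assumes independent games, treats each game as a win/loss trial (draws make the real margin smaller, so this is on the safe side) and uses the usual logistic elo formula. The function names are hypothetical.

```python
import math


def score_margin(score: float, games: int, z: float = 1.96) -> float:
    """Approximate 95% margin on a match score, as a fraction.

    Assumption: independent win/loss trials. Draws lower the real
    variance, so this estimate is conservative (too wide).
    """
    return z * math.sqrt(score * (1.0 - score) / games)


def elo_from_score(score: float) -> float:
    """Elo difference implied by an expected score (logistic model)."""
    return -400.0 * math.log10(1.0 / score - 1.0)


for games in (2100, 10500, 12600):
    m = score_margin(0.5, games)
    low, high = elo_from_score(0.5 - m), elo_from_score(0.5 + m)
    print(f"{games:6d} games: 50% +/- {100 * m:.2f}%  ({low:+.1f} to {high:+.1f} elo)")
```

Under these assumptions 10.500 games give a margin of a little under 1% in score, which is why results close to 50% can swing either way.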

Testing New Raw Data [2], before moving to the endgame we are trying new data first while creating data for the endgame in the meantime. Conclusion: More than excellent data, even a 49% result would have been good. Data can be added to the big 26.5B main base for more variety, the more variety the better the data and thus more elo. One minor point, training will take one day longer.

--------------------------------------------------------------------------------------------------

We now start trying to improve the Endgame, and we start carefully.

Testing Bishop and Knight Endings, adding 250M positions to the 4B MAIN data. Conclusion: Not bad at all, maybe a redo with 500M will make more sense.

Testing Queen and Rook Endings, adding 250M positions to the 4B MAIN data. Conclusion: The 51% is surprising. It definitely leaves us wanting more, redo with 500M. Maybe later.

Testing Bishop Endings, adding 311M positions to the 4B MAIN data. The Bishop ending is notorious among programmers because there are 2 types, the same-colored bishop ending and the opposite-colored bishop ending, each having its own dynamics. We will see what the training tells us. Also added are about 6 million positions of the so-called bad-bishop ending KBPK, to see if that helps. If not, it should be done by classic HCE code. Conclusion: No progress. Oh wait, I have seen it plays the bad-bishop ending a lot better. So that's 0.1 elo progress.

-------------------------------------------------------------------------------------------------

Testing New Raw Data [3]. Conclusion: More excellent data, it will be good to extract new data from.

Tuning Piece Division, this is what our 4B test data looks like when we make an overview of it with NU [F8]. As you can see, some piece counts are not well synchronized, especially the high odd piece counts.
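For readers without the NU tool, an overview like this can be approximated with a few lines of Python. This is only a sketch of the idea, not the NU [F8] function itself: it assumes one EPD/FEN position per line and counts every piece letter in the board field, kings and pawns included. The function names and the sample positions are made up for illustration.

```python
from collections import Counter


def piece_count(epd_line: str) -> int:
    """Number of pieces (kings and pawns included) in the board field."""
    board = epd_line.split()[0]
    return sum(ch.isalpha() for ch in board)


def piece_count_histogram(epd_lines):
    """Map piece count -> number of positions, skipping blank lines."""
    return Counter(piece_count(line) for line in epd_lines if line.strip())


sample = [
    "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq -",
    "8/8/4k3/8/8/4K3/4P3/8 w - -",
    "8/8/4k3/8/8/4K3/4R3/8 b - -",
]
hist = piece_count_histogram(sample)
for n in sorted(hist, reverse=True):
    print(f"{n:2d} pieces: {hist[n]}")
```

For a real data file the list would be replaced by the open file object, so the positions are read line by line instead of all at once.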

With NU [F12] we create extra data to fill the gaps so that the data is better synchronized. Click on the picture above to view the data before and after. Edit the file balance-piece-count.txt and enter, in millions, the data that should be added for each of the low piece-counts. This was good for ~10 elo in this particular case.

Another use of NU [F12] is to set everything (for instance) to "1" in balance-piece-count.txt, which will create a new epd where every piece count has exactly 1.000.000 positions. In other words you can design any piece-count division you wish, from midgame to endgame.
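The gap filling described above boils down to a per-piece-count shortfall: how many positions are missing to reach the division you want. A small sketch of that calculation follows. It does not read or write balance-piece-count.txt (its format is not described here), and the counts are made-up illustrative numbers in millions, not the real 4B distribution.

```python
def shortfall(current: dict, target: dict) -> dict:
    """Positions still needed per piece count to reach the target.

    Piece counts that already meet or exceed the target get 0.
    """
    return {n: max(0, want - current.get(n, 0)) for n, want in sorted(target.items())}


# Made-up illustrative counts in millions, NOT the real 4B distribution.
current = {32: 5, 31: 2, 30: 6, 29: 3}

# A flat division: the same amount for every piece count.
flat_target = {n: 6 for n in current}

for n, need in shortfall(current, flat_target).items():
    print(f"{n} pieces: add {need}M")
```

Any other division, for example one that leans towards the endgame, is just a different target dictionary.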

Real training started

Prune Draws [1] - Based on the statistics we remove all pawnless positions from our 4B test data, 160M in total. Conclusion: First sign that garbage positions like pawnless RxR, RxB, QxQ etc. have a bad influence on the quality of the net. This leaves us wanting more. Pruning 100% draw positions seems to make sense. We will check later.
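Removing pawnless positions is easy to express in code. The sketch below shows the idea only and is not the utility used here: it assumes one EPD/FEN position per line and drops every position whose board field contains no pawn of either color. The function names and sample positions are hypothetical.

```python
def is_pawnless(epd_line: str) -> bool:
    """True when the board field contains no white or black pawn."""
    board = epd_line.split()[0]
    return "P" not in board and "p" not in board


def prune_pawnless(epd_lines):
    """Keep only the positions that still have at least one pawn."""
    return [line for line in epd_lines if line.strip() and not is_pawnless(line)]


sample = [
    "8/8/4k3/8/8/4K3/4R3/3r4 w - -",   # RxR without pawns: pruned
    "8/5p2/4k3/8/8/4K3/4R3/3r4 w - -",  # RxR with a pawn: kept
]
print(prune_pawnless(sample))
```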

Prune Draws [2] - is an extension of Delete Pawnless with changes. It doesn't delete all pawnless positions, we keep essential ones like KQKR and KBNK. Furthermore we prune dead draws in RxR, BxB, NxN and BxN endings. In this way we prune 360M positions, 9% of the total. Conclusion: Prune Draws [1] scores better. Likely not good.
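Keeping the essential pawnless endings can be sketched with a material signature and a keep-list. This covers only the first half of the idea (which pawnless positions to keep); detecting dead draws in RxR, BxB, NxN and BxN endings needs real chess knowledge and is not attempted. The piece values, the keep-list beyond the two examples named above, and all function names are assumptions for illustration.

```python
ORDER = "KQRBNP"
VALUES = {"K": 0, "Q": 9, "R": 5, "B": 3, "N": 3, "P": 1}  # assumed piece values

KEEP_PAWNLESS = {"KQKR", "KBNK"}  # the two examples named in the text


def side_signature(letters: str) -> str:
    """Pieces of one side in KQRBNP order, e.g. 'KBN'."""
    return "".join(p * letters.count(p) for p in ORDER)


def material_signature(epd_line: str) -> str:
    """Signature such as 'KQKR', with the materially stronger side first."""
    board = epd_line.split()[0]
    white = side_signature("".join(c for c in board if c.isupper()))
    black = side_signature("".join(c.upper() for c in board if c.islower()))
    strong, weak = sorted(
        [white, black], key=lambda s: sum(VALUES[c] for c in s), reverse=True
    )
    return strong + weak


def keep_position(epd_line: str) -> bool:
    """Keep positions with pawns, plus the listed pawnless endings."""
    signature = material_signature(epd_line)
    return "P" in signature or signature in KEEP_PAWNLESS


print(material_signature("8/8/4k3/3q4/8/4K3/4R3/8 w - -"))
```

Putting the stronger side first means a position where Black has the queen maps to the same key as one where White has it.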

Prune Draws [3] - is the third and last utility we try. It works via the NU -[Trim Draws [HCE] from the menu, using the ProDeo [HCE] knowledge as a base. The criteria used to trim draws are: [1] low on King Safety, [2] low on Mobility, [3] low on passed pawn evaluation and [4] positions that are not tactical in nature. It prunes 8.5% in total. Conclusion: no progress.

Sacrifices - collecting sacrifices is a hobby project to explore the possibility of teaching a net to sacrifice material. We currently have 419M with captures only. Now we have created 163M non-capture sacrifices to see how that works. Note, this is not about playing strength by definition but, like REBEL-EAS, about playing style in the first place. Conclusion: very positive. The disadvantage is that there are not many, we need more.

-----------------------------------------------------------------------------------------------------

Prune More Data - Pruning more dead-draw positions (with 50% or more) that might influence the trainer in a bad way. Total trim is now 10%. Conclusion: no progress.

Syzygy experiment - Adding 200M (highly selective) 4-5-6 man Syzygy-analyzed positions to our 4B test data. It's an inferior approach because all 4-5-6 man positions in the 4B should also be analyzed with Syzygy in the same way, which isn't the case at the moment. Hence it's called an experiment. Just wait and see what happens. Conclusion: not bad at all, maybe it has a future once all of the 4-5-6 man positions in the 4B test data are also analyzed with Syzygy bases.

More Sacrifices - We collected more non-capture sacrifices, 439M in total. Conclusion: unclear, perhaps 11% sacrifices is too high a number.
