BBO Discussion Forums: New hand evaluation method - BBO Discussion Forums


New hand evaluation method

#61 User is offline   Stefan_O 

  • Group: Full Members
  • Posts: 469
  • Joined: 2016-April-01

Posted 2016-July-16, 13:01

tnevolin, on 2016-July-16, 11:45, said:

So please please don't ask me why this suit takes 10 tricks in NT while my model gives it only 21 points.
AKQJTxxxxx



Well... since it is the opponents who lead to trick one, this suit might also take 0 tricks in NT?
Perhaps the long-term average actually is 21/3 = 7 tricks for such a suit? :)

#62 User is offline   jogs 

  • Group: Advanced Members
  • Posts: 1,316
  • Joined: 2011-March-01
  • Gender:Male
  • Interests:student of the game

Posted 2016-July-16, 17:57

m1cha, on 2016-July-15, 21:47, said:

No, I disagree here. These coefficients don't come up randomly, they come up for a reason. For example, if you hold AKQ opposite xxx you can expect that those three honors in one hand will cover the three losers in the other hand. But if you hold AKQ opposite x, there is only one loser in this suit to be covered and the other two honors will make a trick only if you have losers in other suits. But sometimes your opponents will take their tricks in those other suits, so your honors become worthless. Or you may have to guess which card will become a loser and you discard the wrong card. This is why AKQ opposite x gets -1 point, it is worth 1/3 trick less than opposite xxx. Even worse opposite a void, you may not be able to access the honors when you need to because you cannot play to them from the other hand. This is why here you get -3 points for AKQ opposite a void.


This is why one needs a system that counts both winners and losers. In high-level auctions those losers must be counted carefully; in low-level auctions counting losers is too complex to be practical.

Quote

400k boards is a huge amount of data. I am honestly respecting this.

400k boards sounds like a huge amount of data to study, but it is often an insufficient amount for learning anything useful. Pick any specific situation: usually one is lucky if one board out of 100 is useful for studying it.

#63 User is offline   m1cha 

  • Group: Full Members
  • Posts: 397
  • Joined: 2014-February-23
  • Gender:Male
  • Location:Germany

Posted 2016-July-16, 18:47

tnevolin, on 2016-July-16, 09:59, said:

What I meant about coefficients is that there are different ways to get them. The computer and people solve the same task of calculating coefficients (feature values), but they do it differently. Sometimes they converge on a coefficient, and people are happy to see two different approaches match in the end. That is all there is to it. We cannot actually judge the way the computer thinks if the coefficients do not converge as well as we would like. Our attempt to "explain" it is just a rationalization of our own model that doesn't actually prove that we are right.

This is true but these two models are not independent. I mean if you analyse a situation and you think you understand it, and then a computer analyses the situation by means of a complex model, you expect it to get a similar result, at least qualitatively, right?

I believe your model is somewhat similar to what climate scientists do to understand the warming of the earth: use a formula or a set of formulas and determine a multitude of coefficients from a huge set of data. So if simple physics tells us that the temperature should go up when the CO2 content increases, and the model would tell us that the temperature should go down instead, that would be quite spectacular. On the other hand, the influence of water vapor is too complicated to treat with simple physics because you have opposing effects, so that's where you need the model. But I believe most of the situations in your analysis are simpler than the water vapor example. Although some might not be.

tnevolin, on 2016-July-16, 09:59, said:

Now, if we continue our speculative mind game :), keep in mind that the "High card combinations in side suit with 8+ cards on line" and "Value duplication" features are corrective ones. You can see the "Optional. Count only if known." note for each of them. That means that even if you do not count them, due to lack of knowledge of partner's hand, the result will still be correct. These two coefficients allow you to do finer tuning in case you have the information to use them. That's why they go to both positive and negative sides.

Yes yes. But still, when one of these coefficients is positive, you expect a higher probability of making more tricks.

tnevolin, on 2016-July-16, 09:59, said:

So K-x = -1 doesn't say anything about the king's trick-taking potential. It says that the king's trick-taking potential and the singleton's trick-taking potential clash, and the result of this clash is that the combined trick-taking potential of king and singleton, when they are in the same suit, is 1 point less than if they were in different suits.

This is certainly a precise formulation but it does not help to understand the problem of these figures. The problem is this:
You have a hand with an average side suit; you add a king in the opposite hand in this suit, and the combined value of the hand rises by 2 points, equivalent to 2/3 of a trick.
You have a hand with a singleton in a side suit; you add a king in the opposite hand in this suit, and the combined value of the hand rises by 1 point, equivalent to 1/3 of a trick.
You have a hand with a void in a side suit; you add a king in the opposite hand in this suit, and the combined value of the hand rises by 2 points, equivalent to 2/3 of a trick.
This is strange. I can't prove it wrong, but I find it strange.

tnevolin, on 2016-July-16, 09:59, said:

You are right that some features occur more often than others. That's why I explicitly excluded unstable coefficients with insufficient statistics. Those included in the document are reliable!

I see, OK.

tnevolin, on 2016-July-16, 09:59, said:

In numbers: I excluded features that occur fewer than 100-200 times overall. With 100 results the statistical error for the corresponding coefficient is about 10%. So if its numeric value is less than 5, the absolute error is less than 0.5, which is OK.

This is true for independent random events of equal weight on a single variable. Is it correct that you kept the other coefficients constant in this part of the test and only determined the value duplication coefficients? In that case I would accept the test. If you did it within a multi-factor analysis, working with several variables within one test, I wonder whether you could have additional noise from other sources.

5 points sounds OK but not too much, considering that a contract can make an overtrick or go 2-3 tricks down depending on the play. An error of 0.5 still means a 32% chance of being off by more than that. Well, okay, those are borderline figures.
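The 10%-per-100-observations rule of thumb quoted above can be illustrated with a quick simulation. This is a sketch, not tnevolin's actual fitting code; the per-observation noise level of one trick is my assumption:

```python
import random
import statistics

random.seed(1)

def estimate_error(n_obs, true_value=5.0, noise_sd=1.0, trials=2000):
    """Empirical standard error of a coefficient estimated as a mean
    over n_obs noisy observations, measured across repeated trials."""
    estimates = [
        statistics.fmean(random.gauss(true_value, noise_sd) for _ in range(n_obs))
        for _ in range(trials)
    ]
    return statistics.stdev(estimates)

# Theory: standard error = noise_sd / sqrt(n_obs), so 100 observations with
# per-observation noise of 1 trick give an error of about 0.1. Whether that
# corresponds to "about 10%" depends on the coefficient's own magnitude.
print(estimate_error(100))  # close to 0.1
```

As m1cha notes, this 1/sqrt(N) behaviour only holds for independent, single-variable estimates; a multi-factor fit can leak noise between coefficients.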

tnevolin, on 2016-July-16, 09:59, said:

This is only for very, very rare features. I can tell you a void is not rare. A 10-card suit is rare. :)

I never had a 10-card suit. :)

Well, with ~400k boards (boards are independent, observations are not): voids occur in ~4.5% of hands, voids held by the declaring side in a side suit in ~2%, which is ~8000 observations. Voids opposite some definite combination of honors, and you are down to ~1000 events. That seems to be on the safe side. But I am starting to understand why you need 400k boards :).
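For what it's worth, the single-hand void frequency can be computed exactly by inclusion-exclusion over the four suits; a quick check of the ballpark figures above:

```python
from math import comb

hands = comb(52, 13)  # number of possible 13-card hands

p1 = comb(39, 13) / hands  # one specific suit void
p2 = comb(26, 13) / hands  # two specific suits void
p3 = comb(13, 13) / hands  # three specific suits void (hand = fourth suit)

# Inclusion-exclusion over the 4 suits: at least one void
p_void = 4 * p1 - 6 * p2 + 4 * p3
print(f"P(at least one void) = {p_void:.4f}")  # about 0.0511
```

This comes out to about 5.1% for a random hand, the same ballpark as the ~4.5% quoted above (which may be conditioned on the sample used).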

If your figures are correct, what could that mean? Could it mean that the opponents, when they know declarer is very short in a suit, carelessly lead their aces, promoting tricks for declarer?

#64 User is offline   jogs 

  • Group: Advanced Members
  • Posts: 1,316
  • Joined: 2011-March-01
  • Gender:Male
  • Interests:student of the game

Posted 2016-July-16, 20:14

tnevolin, on 2016-July-15, 05:31, said:

I thought about it myself a lot. Here is my speculation on it.
Let's take, for example, 25 HCP combined, 4333-4333 distribution on both hands, and 4-4 fit. This is said to be enough for 3NT.


4333 // 4333 or a joint pattern of 8666
I suspect that with 25 HCP, 3NT will make less than 40% of the time. You should be able to prove my statement true or false: examine the database of all 25 HCP, 4333-4333 hands played in 3NT, make a histogram of tricks made for the entire study, and post the mean tricks made, the standard deviation of those tricks, and the median tricks made. Perform the same study with 24, 26, and 27 HCP hands. TIA
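The study jogs asks for is easy to script once such a database exists; a sketch with invented toy numbers (the `boards` records and trick values are placeholders, not real data):

```python
import statistics
from collections import Counter

# Hypothetical records: (combined_hcp, joint_pattern, tricks_taken_in_3NT).
# In a real study these would come from the 400k-board database.
boards = [
    (25, "4333-4333", t)
    for t in [9, 8, 10, 9, 7, 9, 8, 11, 9, 8, 10, 9, 8, 9, 7]
]

def study(boards, hcp):
    """Histogram, mean, std dev, median and 3NT make rate for one HCP level."""
    tricks = [t for h, pat, t in boards if h == hcp and pat == "4333-4333"]
    return {
        "n": len(tricks),
        "histogram": dict(sorted(Counter(tricks).items())),
        "mean": statistics.fmean(tricks),
        "stdev": statistics.stdev(tricks),
        "median": statistics.median(tricks),
        "make_rate": sum(t >= 9 for t in tricks) / len(tricks),
    }

print(study(boards, 25))
```

Running the same function for 24, 26 and 27 HCP would give exactly the comparison jogs requests.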

#65 User is offline   tnevolin 

  • Group: Full Members
  • Posts: 64
  • Joined: 2011-November-12
  • Gender:Male

Posted 2016-July-16, 20:30

m1cha, on 2016-July-16, 18:47, said:

You have a hand with an average side suit; you add a king in the opposite hand in this suit, and the combined value of the hand rises by 2 points, equivalent to 2/3 of a trick.
You have a hand with a singleton in a side suit; you add a king in the opposite hand in this suit, and the combined value of the hand rises by 1 point, equivalent to 1/3 of a trick.
You have a hand with a void in a side suit; you add a king in the opposite hand in this suit, and the combined value of the hand rises by 2 points, equivalent to 2/3 of a trick.
This is strange. I can't prove it wrong, but I find it strange.

Believe me, I feel exactly the same. It is some strange fluctuation that I cannot explain. I even highlighted this irregularity in the document.
I kept it there because the other parts of the model seem reasonable. I also varied input parameters and features about 500 times and approved for the final version only those displaying some persistence, to count them as proven effects and not noise. So I tend to believe this is some sort of new rule that got discovered during experimentation. Spectacular, as you said. :)

m1cha, on 2016-July-16, 18:47, said:

This is true for independent random events of equal weight on a single variable. Is it correct that you kept the other coefficients constant in this part of the test and only determined the value duplication coefficients? In that case I would accept the test. If you did it within a multi-factor analysis, working with several variables within one test, I wonder whether you could have additional noise from other sources.

5 points sounds OK but not too much, considering that a contract can make an overtrick or go 2-3 tricks down depending on the play. An error of 0.5 still means a 32% chance of being off by more than that. Well, okay, those are borderline figures.

I didn't do an extensive probabilistic analysis of errors. I just decided that 200 events should be enough to include a feature in the system; that threshold is my judgment call. My explanation is just an illustration of a careful approach to result interpretation. I didn't mean to prove that it is mathematically correct.

m1cha, on 2016-July-16, 18:47, said:

But I am starting to understand why you need 400k boards :).

I didn't need them. That is how much my computer can chew. Fortunately, I was able to find the threshold where a further increase in features doesn't significantly add to the accuracy. So I can say the current version strikes a more or less optimal balance between the number of features and accuracy.
I couldn't analyze the slam missing-cards conditions completely, though. This is the only thing that requires more data.
0

#66 User is offline   m1cha 

  • Group: Full Members
  • Posts: 397
  • Joined: 2014-February-23
  • Gender:Male
  • Location:Germany

Posted 2016-July-17, 16:26

@tnevolin:

Thank you for your explanations. Here is one, maybe final, point from my side. We were talking about the possibility of setting the point requirement for game at 25/26. I understand your reasons for not wanting to do so. But there is something we have not discussed so far: the question of which hands are weak, which are strong, and which are invitational. For example, opposite a 1M opening, 11-12 point hands are typically considered invitational. There are many such situations. If you set your game requirement to ~26, all point ranges can remain and your point-counting system can easily be integrated into almost all standard bidding systems. If you put the game requirement at 28/29, everything changes. I wonder if people are willing to undergo the trouble of changing all the bidding ranges to test your system. Indeed, I did this once to try Zar points, but apart from my partner at the time I don't know anyone in my environment who found it worth trying. And Zar points are relatively easy in that respect, because all the ranges just multiply by 2.

#67 User is offline   tnevolin 

  • Group: Full Members
  • Posts: 64
  • Joined: 2011-November-12
  • Gender:Male

Posted 2016-July-17, 18:59

m1cha, on 2016-July-17, 16:26, said:

@tnevolin:

Thank you for your explanations. Here is one, maybe final, point from my side. We were talking about the possibility of setting the point requirement for game at 25/26. I understand your reasons for not wanting to do so. But there is something we have not discussed so far: the question of which hands are weak, which are strong, and which are invitational. For example, opposite a 1M opening, 11-12 point hands are typically considered invitational. There are many such situations. If you set your game requirement to ~26, all point ranges can remain and your point-counting system can easily be integrated into almost all standard bidding systems. If you put the game requirement at 28/29, everything changes. I wonder if people are willing to undergo the trouble of changing all the bidding ranges to test your system. Indeed, I did this once to try Zar points, but apart from my partner at the time I don't know anyone in my environment who found it worth trying. And Zar points are relatively easy in that respect, because all the ranges just multiply by 2.

I agree, and this is exactly the point someone made earlier. I can shift some value, say trump length, up or down to bring the game requirements close to 26. That is doable and makes no difference to me; the arithmetic is the same. I leave it for people to decide which way they like.

#68 User is offline   tnevolin 

  • Group: Full Members
  • Posts: 64
  • Joined: 2011-November-12
  • Gender:Male

Posted 2016-July-31, 11:35

I have finalized the article about the whole method's internals. Lots of charts, descriptions, analysis, and results.
https://drive.google...QWxsZmlUR2ZUblU

#69 User is offline   jogs 

  • Group: Advanced Members
  • Posts: 1,316
  • Joined: 2011-March-01
  • Gender:Male
  • Interests:student of the game

Posted 2016-July-31, 18:09

Look at the chart on page 11. If you trade stock markets, you would be familiar with Bollinger bands. This chart should show both mean estimates of tricks and the one standard deviation of those estimates. If the std dev is much greater than 1 trick per board, those mean estimates should be taken with a grain of salt.
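The band jogs describes amounts to bucketing boards by estimated tricks and reporting the mean plus/minus one standard deviation per bucket; a minimal sketch with made-up data:

```python
import statistics
from collections import defaultdict

# Hypothetical (estimated_tricks, actual_tricks) pairs.
data = [(9.2, 9), (9.4, 10), (9.6, 8), (10.1, 10), (10.3, 11),
        (9.8, 9), (10.4, 10), (9.9, 10), (10.2, 9), (9.5, 9)]

buckets = defaultdict(list)
for est, actual in data:
    buckets[round(est)].append(actual)  # nearest whole trick is the bucket key

for est in sorted(buckets):
    actuals = buckets[est]
    mean = statistics.fmean(actuals)
    sd = statistics.stdev(actuals) if len(actuals) > 1 else 0.0
    # The band jogs asks for: mean - sd .. mean + sd
    print(f"est={est}: mean={mean:.2f}, "
          f"band=({mean - sd:.2f}, {mean + sd:.2f}), n={len(actuals)}")
```

Plotting those three per-bucket lines (mean, lower, upper) gives exactly a Bollinger-band-style chart.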

#70 User is offline   tnevolin 

  • Group: Full Members
  • Posts: 64
  • Joined: 2011-November-12
  • Gender:Male

Posted 2016-August-01, 15:16

jogs, on 2016-July-31, 18:09, said:

Look at the chart on page 11. If you trade stock markets, you would be familiar with Bollinger bands. This chart should show both mean estimates of tricks and the one standard deviation of those estimates. If the std dev is much greater than 1 trick per board, those mean estimates should be taken with a grain of salt.

The residual quadratic mean for each method and contract type is in the table on page 9. Initially I planned to chart it on the same graph as on page 11, then decided it wouldn't give much more insight than a single average number anyway. The quadratic mean is about the same as a standard deviation, except it measures deviation from the predicted value, not from the mean of the experimental population. It essentially shows how far the experimental points are from the predicted ones on average (which is what I wanted to measure), versus how far the experimental points are from their own mean.
In the table you can see the deviation is about 0.7-0.8.

#71 User is offline   jogs 

  • Group: Advanced Members
  • Posts: 1,316
  • Joined: 2011-March-01
  • Gender:Male
  • Interests:student of the game

Posted 2016-August-01, 16:54

Are those charts for NT contracts only? By my count a partnership has 32+ HCP about once every 150 boards. I can't find any bridge website which publishes the combined HCP of a partnership. Since most successful slams are in a suit (mostly majors), I'm much more interested in tricks for suit strains.
If one knows both the combined HCP and the combined trumps, one can get a better estimate of tricks. But the std dev of those estimates is often greater than 1.25 tricks/board. It often depends on whether there is duplication of values in the short suits.

#72 User is offline   tnevolin 

  • Group: Full Members
  • Posts: 64
  • Joined: 2011-November-12
  • Gender:Male

Posted 2016-August-01, 20:56

jogs, on 2016-August-01, 16:54, said:

Are those charts for NT contracts only? By my count a partnership has 32+ HCP about once every 150 boards. I can't find any bridge website which publishes the combined HCP of a partnership. Since most successful slams are in a suit (mostly majors), I'm much more interested in tricks for suit strains.
If one knows both the combined HCP and the combined trumps, one can get a better estimate of tricks. But the std dev of those estimates is often greater than 1.25 tricks/board. It often depends on whether there is duplication of values in the short suits.

The charts are for both NT (N) and trump (T) contracts. See the legend.

#73 User is offline   jogs 

  • Group: Advanced Members
  • Posts: 1,316
  • Joined: 2011-March-01
  • Gender:Male
  • Interests:student of the game

Posted 2016-August-08, 19:35

tnevolin, on 2016-August-01, 15:16, said:

The residual quadratic mean for each method and contract type is in the table on page 9. Initially I planned to chart it on the same graph as on page 11, then decided it wouldn't give much more insight than a single average number anyway. The quadratic mean is about the same as a standard deviation, except it measures deviation from the predicted value, not from the mean of the experimental population. It essentially shows how far the experimental points are from the predicted ones on average (which is what I wanted to measure), versus how far the experimental points are from their own mean.
In the table you can see the deviation is about 0.7-0.8.

It's not clear what you did on page 9. Is this residual quadratic mean calculated for each observation separately? Is it the difference between the observed value and the expected value for each observation? I hope it is not the difference between the observed value and the mean value of the sample.

#74 User is offline   tnevolin 

  • Group: Full Members
  • Posts: 64
  • Joined: 2011-November-12
  • Gender:Male

Posted 2016-August-09, 07:20

jogs, on 2016-August-08, 19:35, said:

It's not clear what you did on page 9. Is this residual quadratic mean calculated for each observation separately? Is it the difference between the observed value and the expected value for each observation? I hope it is not the difference between the observed value and the mean value of the sample.

There are two graphs, on pages 9 and 10, depicting the experimental average as a function of the theoretical prediction for SAYC and Evolin points, respectively. In simpler words: I estimated the number of tricks for each hand in an observation; that is the "estimated tricks" (horizontal) value. Then for each observation I took the real tricks; that is the "actual tricks" (vertical) value. Then I averaged results by whole tricks. So the graph point corresponding to 10 estimated tricks represents all hands that were estimated in the range 9.5-10.5 tricks, and the vertical value of this point is the average real trick count for all of these hands.
Now back to the residual quadratic mean. A residual is the difference between the experimental and theoretical values. The standard measure of fit quality is the sum of squared residuals; the smaller the value, the better the fit. This number is good when you run the optimization, but it is not good for understanding how much the experimental values deviate from the theoretical ones on average. For that, <residual quadratic mean> = SQRT(<sum of residual squares> / N) is used.
What I meant in the previous post is that I calculated the residual quadratic mean across all observations as a single number, instead of splitting it into buckets as I did on the prediction accuracy graphs. The graphs are for visual perception only; it is nice to see how the two lines stay close to each other. For the residual quadratic mean you need just one number, which shows you the average error. So, taking the residual quadratic mean of 0.8, one can draw two supporting lines on the prediction quality graph, one 0.8 tricks above and one 0.8 tricks below, and say that about 70% of the experimental dots would fall into this band. I just didn't want to draw them on the graph, to avoid overloading it with heavy math.
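The distinction tnevolin draws, deviation from the prediction (his "residual quadratic mean", i.e. RMSE) versus deviation from the sample's own mean (standard deviation), can be seen on a toy example:

```python
import math
import statistics

# Hypothetical per-board (predicted_tricks, actual_tricks) pairs.
pairs = [(9.0, 9), (9.0, 10), (10.0, 9), (8.0, 8), (9.0, 8), (10.0, 11)]

residuals = [actual - predicted for predicted, actual in pairs]

# Residual quadratic mean (RMSE): average distance from the *prediction*.
rmse = math.sqrt(sum(r * r for r in residuals) / len(residuals))

# Standard deviation of the actuals: average distance from their *own mean*,
# a different (and here larger) quantity, as the post explains.
sd_actuals = statistics.pstdev([actual for _, actual in pairs])

print(f"RMSE={rmse:.3f}, sd(actuals)={sd_actuals:.3f}")
```

A good model has RMSE well below the raw standard deviation of the actuals; if the two were equal, the predictions would add nothing over just guessing the mean.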

#75 User is offline   jogs 

  • Group: Advanced Members
  • Posts: 1,316
  • Joined: 2011-March-01
  • Gender:Male
  • Interests:student of the game

Posted 2016-August-09, 08:22

Our side has 32+ HCP about once every 150 boards. We have a biddable and makeable slam about 3% of the time. HCP is just one of many parameters for generating tricks. Trump holding, mainly the quantity of trumps and sometimes the quality, is the second parameter. With your large database it is possible to learn which other parameters affect tricks and to measure that effect. Lawrence's short-suit total is another parameter; his contribution has been dismissed by most experts. Then there is a source of tricks from a second, and sometimes a third, suit. Controls play a major role in slams. Even the opponents' patterns and their ability to defend play a role, but we have no control over those parameters.

#76 User is offline   tnevolin 

  • Group: Full Members
  • Posts: 64
  • Joined: 2011-November-12
  • Gender:Male

Posted 2016-August-11, 08:10

To all forum buddies. I need your expert opinion.
m1cha proposed shifting the trump model scale down so that the critical contract requirements match the popular values. After some thought I agreed with him: even though such a shift distorts the exact trick estimate, bidders are much more often interested in the critical contract condition than in a generic trick estimate, so the convenience is large enough to outweigh the inaccuracy. So I did it and updated the evaluation document accordingly. Again, thanks to m1cha for pointing this out.

With this done, I continued thinking about further improvements in this direction, to make the transition from points to contract requirements even more convenient. Here is the background info for you. I have analyzed the standard HCP evaluation model and found that the 3NT contract requirement is actually 23 points, not 25. That number marks the breaking point where declaring the contract becomes profitable on average, not where the contract makes 100% of the time. This is an interesting finding, and I believe it is correct because my recent NT model calculations suggest adding 2 points as a constant to the combined strength for a better trick estimate in NT contracts. My critical contract strength table lists 25 points for 3NT (IMPs). With the 2-point constant in mind, this is equivalent to a 23-point 3NT requirement in the HCP model. So far the math matches. Now back to convenience. This 2-point constant is practically the only difference between the Evolin NT model and HCP; the long-and-strong-suit rule occurs quite rarely. So if we removed the constant, the two models would match exactly 99% of the time and differ by at most 1 point in the remaining cases, which would be a huge in-game convenience. The only consequence of this change would be shifting the NT critical contract requirements down by 2 points. I understand this would drift away from the popular values. However, let me reiterate that the 25/26-point HCP requirement for 3NT is incorrect; the correct one is 23/24 anyway. Please let me know which approach would be more convenient in your opinion.

If we decide to shift the NT model scale, the same should be done with the trump model to keep the critical contract requirements in sync. The trump model shift doesn't present a challenge, though: I would just add/subtract the value for number of trumps, and that would do it. The trump model is already quite complicated, so such a shift doesn't make it more or less complicated anyway.
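The "profitable on average" breaking point can be made concrete with standard IMP arithmetic. A sketch, assuming the alternative to bidding game is a making 2NT with an overtrick (+150) and ignoring doubles and going down more than one:

```python
# Standard IMP scale: (max score difference, IMPs awarded).
IMP_SCALE = [(10, 0), (40, 1), (80, 2), (120, 3), (160, 4), (210, 5),
             (260, 6), (310, 7), (360, 8), (420, 9), (490, 10), (590, 11),
             (740, 12), (890, 13), (1090, 14), (1290, 15), (1490, 16),
             (1740, 17), (1990, 18), (2240, 19), (2490, 20), (2990, 21),
             (3490, 22), (3990, 23), (10**9, 24)]

def imps(diff):
    """Convert a raw score difference to IMPs."""
    diff = abs(diff)
    for limit, imp in IMP_SCALE:
        if diff <= limit:
            return imp

def breakeven_3nt(vulnerable):
    """Make probability at which bidding 3NT (vs stopping in 2NT making an
    overtrick, +150) breaks even at IMPs."""
    game = 600 if vulnerable else 400    # 3NT making exactly
    down_one = -100 if vulnerable else -50
    partscore = 150
    gain = imps(game - partscore)        # IMPs won when game makes
    loss = imps(partscore - down_one)    # IMPs lost when game goes down one
    return loss / (gain + loss)

print(breakeven_3nt(vulnerable=False))  # 5/11, i.e. ~45% nonvulnerable
print(breakeven_3nt(vulnerable=True))   # 6/16, i.e. ~38% vulnerable
```

This only shows the scoring-table side of the argument; whether 23 combined points actually reaches a ~38-45% make probability is exactly the empirical question under dispute in this thread.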

#77 User is offline   jogs 

  • Group: Advanced Members
  • Posts: 1,316
  • Joined: 2011-March-01
  • Gender:Male
  • Interests:student of the game

Posted 2016-August-11, 12:54

tnevolin, on 2016-August-11, 08:10, said:

With this done, I continued thinking about further improvements in this direction, to make the transition from points to contract requirements even more convenient. Here is the background info for you. I have analyzed the standard HCP evaluation model and found that the 3NT contract requirement is actually 23 points, not 25. That number marks the breaking point where declaring the contract becomes profitable on average, not where the contract makes 100% of the time. This is an interesting finding, and I believe it is correct because my recent NT model calculations suggest adding 2 points as a constant to the combined strength for a better trick estimate in NT contracts. My critical contract strength table lists 25 points for 3NT (IMPs). With the 2-point constant in mind, this is equivalent to a 23-point 3NT requirement in the HCP model.

You'll need to show proof. Defenders with 17 HCP and the opening lead can't find 5 tricks quicker than declarer finds 9 tricks? I've actually made my own studies and don't agree with your conclusion. Sometimes declarer has long suits as a source of tricks. It is easier to make 9 tricks in 1NT than in 3NT: in 3NT the defenders know their goal is to win 5 tricks.
We all know Meckwell often bid 3NT with 24 HCP or less. But does anyone have a study of their boards? Have they won or lost IMPs on these shaky 3NTs?

#78 User is offline   Stefan_O 

  • Group: Full Members
  • Posts: 469
  • Joined: 2016-April-01

Posted 2016-August-11, 13:27

Have to agree with Jogs here...

Could it also be that your deals/scoreboard data files come from too weak a field of players?

I would expect weak/inexperienced players to be quicker to pick up how to develop the tricks they need as declarer than how to defend reasonably...

Not that you should blindly accept any old rules of thumb, of course, but this specific point, how many HCP you need for 3NT, is probably the one that has been by far the most thoroughly examined in past hand-evaluation studies...

And if it were that wrong, it would certainly have been refuted by good players a long time ago, because it is so easy to test and experience in everyday actual play.

#79 User is offline   tnevolin 

  • Group: Full Members
  • Posts: 64
  • Joined: 2011-November-12
  • Gender:Male

Posted 2016-August-11, 15:04

Stefan_O, on 2016-August-11, 13:27, said:

Have to agree with Jogs here...

Could it also be that your deals/scoreboard data files come from too weak a field of players?

I would expect weak/inexperienced players to be quicker to pick up how to develop the tricks they need as declarer than how to defend reasonably...

Not that you should blindly accept any old rules of thumb, of course, but this specific point, how many HCP you need for 3NT, is probably the one that has been by far the most thoroughly examined in past hand-evaluation studies...

And if it were that wrong, it would certainly have been refuted by good players a long time ago, because it is so easy to test and experience in everyday actual play.

OK. Let me try to find another game source and see if the result holds. On the other hand, 2 points is a huge difference that would be very difficult to explain by some constant dumbness of players across thousands of games.

By the way, can you point me to where it was "most thoroughly examined in past hand-eval studies"? I never found one after an extensive search over a few years. That might resolve many of the questions right away.

#80 User is offline   Stephen Tu 

  • Group: Advanced Members
  • Posts: 4,097
  • Joined: 2003-May-14

Posted 2016-August-11, 15:36

You might look at

http://bridge.thomas...com/valuations/

This was based on double-dummy simulations. But other studies have compared single-dummy to double-dummy results and found that actual results are fairly close to DD: declarers tend to do slightly better than DD at lower levels and somewhat worse at the slam level. At 3NT I've seen studies showing declarers do between 0.1 and 0.2 tricks better than DD on average.

I'd be pretty surprised to see that 23 is enough to bid 3NT.

