FrontierMath's difficult questions remain unpublished so that AI companies can't train against it.
On Friday, research organization Epoch AI released FrontierMath, a new mathematics benchmark that has been turning heads in the AI world because it contains hundreds of expert-level problems that leading AI models solve less than 2 percent of the time, according to Epoch AI. The benchmark tests AI language models (such as GPT-4o, which powers ChatGPT) against original mathematics problems that typically require hours or days for specialist mathematicians to complete.
FrontierMath's performance results, revealed in a preprint research paper, paint a stark picture of current AI model limitations. Even with access to Python environments for testing and verification, top models like Claude 3.5 Sonnet, GPT-4o, o1-preview, and Gemini 1.5 Pro scored extremely poorly. This contrasts with their high performance on simpler math benchmarks—many models now score above 90 percent on tests like GSM8K and MATH.
The design of FrontierMath differs from many existing AI benchmarks because the problem set remains private and unpublished to prevent data contamination. Many existing AI models have been trained on the problem sets of other benchmarks, letting them solve those problems easily and appear more generally capable than they actually are. Many experts cite this as evidence that current large language models (LLMs) are poor generalist learners.
68 °F above average is a lot. For a tropical country it is not credible for temperatures to be that much warmer than average because the average is too high to give enough headroom. So what gives?
Reading the article I found this:
parts of Malawi saw a maximum temperature of 43C (109F), compared with an average of nearly 25C (77F)
As I expected, the actual temperature increase was 32 °F, not 68 °F. So what’s up with that headline? Here’s a hint: this is what the headline might say if you set your location to somewhere other than the United States:
Now “nearly 20C” is an odd way of saying “18 °C”, but I guess they really like round numbers, and that’s not the problem. The problem is that somebody – the localization team? an algorithm? – decided that 20 °C was equivalent to 68 °F. And they’re not wrong. And yet they are.
When converting from a temperature in Celsius to one in Fahrenheit you have to multiply by 1.8 (because each degree Celsius covers a range 1.8 times as large as a degree Fahrenheit) and you have to add 32 °F (because the freezing point in Fahrenheit is 32, compared to 0 in Celsius). However if you are converting a temperature difference you just multiply by 1.8.
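The two conversions can be sketched in a few lines of Python (the function names are mine, not from the article):

```python
def c_to_f(temp_c):
    """Convert an absolute Celsius temperature to Fahrenheit."""
    return temp_c * 1.8 + 32.0

def c_to_f_delta(delta_c):
    """Convert a Celsius temperature *difference*: scale only, no offset."""
    return delta_c * 1.8

# The figures from the Malawi article:
assert round(c_to_f(43)) == 109            # a 43 °C maximum is 109 °F
assert round(c_to_f_delta(43 - 25)) == 32  # an 18 °C anomaly is about 32 °F

# The headline's error: feeding the ~20 °C *difference* through the
# absolute conversion produces the bogus 68 °F figure.
assert c_to_f(20) == 68.0
```

Applying the 32-degree offset to a difference double-counts the freezing point, which is exactly the mistake in the headline.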
This is just another version of the fallacy involved when somebody says that it is “twice as hot” when the temperature goes from 5 °C to 10 °C – note that this is equivalent to going from 278 K to 283 K, or 41 °F to 50 °F, so clearly not “twice as hot” in any meaningful way.
Intel’s manuals for their x86/x64 processor clearly state that the fsin instruction (calculating the trigonometric sine) has a maximum error, in round-to-nearest mode, of one unit in the last place. This is not true. It’s not even close.
The worst-case error for the fsin instruction for small inputs is actually about 1.37 quintillion units in the last place, leaving fewer than four bits correct. For huge inputs it can be much worse, but I’m going to ignore that.
I was shocked when I discovered this. Both the fsin instruction and Intel’s documentation are hugely inaccurate, and the inaccurate documentation has led to poor decisions being made.
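As a rough illustration of what “units in the last place” means, here is a sketch of my own (not from the article) that measures the ULP error of a sine implementation against an exact rational Taylor series. It checks the C library’s sin via Python’s math module, since the x87 fsin instruction isn’t reachable from portable Python; a good library sine should stay within a couple of ulps even near pi, where fsin’s short internal value of pi causes its huge errors.

```python
import math
from fractions import Fraction

def sin_ref(x, terms=30):
    """Taylor series for sin(x) in exact rational arithmetic.
    For |x| up to a few units, the truncation error is far below
    one double ulp of the result."""
    fx = Fraction(x)
    term = fx
    total = term
    for n in range(1, terms):
        term *= -fx * fx / ((2 * n) * (2 * n + 1))
        total += term
    return total

def ulp_error(x):
    """Error of math.sin(x), in units in the last place of its result."""
    approx = math.sin(x)
    return float(abs(Fraction(approx) - sin_ref(x)) / Fraction(math.ulp(approx)))

# Near pi the true result is tiny (about 1.22e-16 for the double closest
# to pi), so sloppy argument reduction shows up as an enormous ULP error.
print(ulp_error(math.pi))
```

By this measure, an error of “1.37 quintillion ulps” means almost every bit of the result is wrong, not just the last one.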
brucedawson on October 9, 2014 at 10:38 pm
This will affect programmers, who then have to work around the issue so that everyday computer users are not affected. The developers of VC++ and glibc had to write alternate versions, so that’s one thing. The inaccuracies could add up over repeated calls to sin and lead to errors in flight-control software, CAD software, games, various things. It’s hard to predict where the undocumented inaccuracy could cause problems.
It likely won’t now because most C runtimes don’t use fsin anymore and because the documentation will now be fixed.
The DM32 is our enhanced classic all-rounder based on the HP 32SII. 171 functions, of which 75 are directly accessible from the keypad. Programmable. Conversions, statistics, fractions, equations, solver and more. The perfect choice for almost everybody. BETA firmware installed; updates will be required.
Anonymous Coward
Don't put it in your pocket
Are we now going to discover that Hezbollah bought a batch of calculators from Brazil some months ago?
Ian Johnston (Silver badge)
Re: Don't put it in your pocket
If they did, it's a bad move which might easily blow up in their faces.
Yet Another Anonymous coward (Silver badge)
Re: Scientific Calculator
Scientific calculators use a body of tested and published algorithms to determine the answer.
Non-scientific calculators believe what they read in the Daily Mail and what someone's sister's best friend's hairdresser's partner saw on Facebook
Andy Non (Silver badge)
Re: Scientific Calculator
Scientific calculator:
1+2x3=7
Daily Mail calculator:
1+2x3=9
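The joke hinges on operator precedence. A minimal Python sketch of the two behaviours (the left-to-right evaluator is a toy of my own):

```python
# A scientific calculator applies precedence: multiplication binds tighter.
assert 1 + 2 * 3 == 7

def left_to_right(tokens):
    """Evaluate a flat [value, op, value, ...] expression strictly
    left to right, ignoring operator precedence."""
    result = tokens[0]
    for op, val in zip(tokens[1::2], tokens[2::2]):
        result = result + val if op == "+" else result * val
    return result

# The "Daily Mail calculator" computes (1 + 2) * 3 instead.
assert left_to_right([1, "+", 2, "*", 3]) == 9
```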
Here’s how I used ChatGPT and a little bit of JavaScript to figure out I could win this video contest... and then proceed to win it.
All told, there were 538 entries competing for prizes in the contest. n/(538 + n) doesn’t sound like great odds, does it?
Let’s dig deeper to see why winning isn’t as improbable as it sounds on the surface. To do this, we’ll review the existing submissions.
Now we can get to figuring out the probabilities and odds. For this, we’ll use a function to calculate the binomial coefficient.
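The post’s own code is JavaScript; here is a Python equivalent using the standard library’s math.comb as the binomial coefficient. The model below — k winners drawn uniformly from all entries — is an assumption of mine; the actual contest rules may differ.

```python
from math import comb

def p_win(n, k, others=538):
    """Probability that at least one of your n entries is among k winners
    drawn uniformly at random from `others` + n total entries.
    (Assumed model, not necessarily the contest's actual rules.)"""
    total = others + n
    # P(none of your entries win) = C(others, k) / C(total, k)
    return 1 - comb(others, k) / comb(total, k)

# With one entry and one prize, the chance is 1/539 -- the n/(538 + n)
# figure from above with n = 1.
assert abs(p_win(1, 1) - 1 / 539) < 1e-12
```

Submitting more entries, or competing for more prizes, raises the probability in the obvious monotonic way.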
For more than 2,300 years, Euclid’s Elements has been the foundation for countless students to learn how to reason with precision and pursue knowledge in all fields of learning.
The brilliance of his work has made it the second most published book in history because it provides profound tools to distinguish truth from error and discover fundamental principles about the world.
Modeled after our core mathematics course, “Mathematics and Logic” examines the vital importance of good reasoning to the liberal arts.
With this course, you’ll study the transformation of mathematics by the ancient Greeks, the fundamentals of logic and deductive reasoning, the central proofs of Euclid, the birth of modern geometry, and much more.
And now, you can own a DVD box set of “Mathematics and Logic” for a gift of $100 or more to Hillsdale College.
What price common sense? • June 11, 2024 7:30 AM
@Levi B.
“Those who are not familiar with the term “bit-squatting” should look that up”
Are you sure you want to go down that rabbit hole?
It’s an instance of a general class of problems that are never going to go away.
And why, in
“Web servers would usually have error-correcting (ECC) memory, in which case they’re unlikely to create such links themselves.”
the key word is “unlikely” – or, more formally, “low probability”.
Because it’s down to the fundamentals of the universe and the failings of logic and reason as we formally use them. Which in turn is why, from at least as early as the ancient Greeks through to the 20th century, some of those thinking about it in its various guises have gone mad, and some have committed suicide.
To understand why, you need to understand why things like “Error Correcting Codes” (ECC) will never be 100% effective, and why deterministic encryption systems, especially stream ciphers, will always be vulnerable.
No matter what you do, all error checking systems have both false positive and false negative results. All you can do is tailor the system to the more probable errors.
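A concrete toy example of a check with false negatives (my own illustration, not from the comment): a single parity bit detects any odd number of flipped bits but is blind to an even number.

```python
def parity(bits):
    """Even-parity check bit for a list of 0/1 values."""
    return sum(bits) % 2

word = [1, 0, 1, 1, 0, 0, 1, 0]
check = parity(word)

one_flip = word.copy()
one_flip[3] ^= 1
assert parity(one_flip) != check   # single-bit error: detected

two_flips = word.copy()
two_flips[2] ^= 1
two_flips[5] ^= 1
assert parity(two_flips) == check  # double-bit error: a false negative
```

Stronger codes such as the SECDED schemes used in ECC memory push the undetected patterns further out, but some error pattern always slips through.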
But there are other underlying issues: bit flips happen in memory by deterministic processes that apparently happen by chance. Back in the early 1970s, when putting computers into space became a reality, it was known that computers were affected by radiation. Initially it was assumed the radiation had to be energetic enough to be ‘ionizing’, but it later turned out that with low-energy CMOS chips any EM radiation, such as that from the antenna of a hand-held two-way radio, would do.
This was due to metastability. In practice the logic gates we use are very high gain analog amplifiers that are designed to “crash into the rails”. Some logic such as ECL was actually kept linear to get speed advantages but these days it’s all a bit murky.
The point is as the level at a simple logic gate input changes it goes through a transition region where the relationship between the gate input and output is indeterminate. Thus an inverter in effect might or might not invert or even oscillate with the input in the transition zone.
I won’t go into the reasons behind it, but it’s down to two basic issues: firstly, the universe is full of noise; secondly, it’s full of quantum effects. The two can be difficult to differentiate even in very long-term measurements, and engineers tend to lump it all under a first approximation of a Gaussian distribution as “Additive White Gaussian Noise” (AWGN), which has nice properties such as averaging predictably to zero over time and a well-defined root mean square. However, the universe tends not to play that way when you get up close, so instead “phase noise in a measurement window” is often used, with Allan deviation.
There are things we cannot know because they are unpredictable or beyond our ability to measure.
But they are also beyond a deterministic system's ability to calculate.
Computers only know “natural numbers”, or “unsigned integers”, within a finite range. Everything else is approximated, or as others would say “faked”. Between the natural numbers there are other numbers: some can be found as ratios of natural numbers, and others can not. What drove philosophers and mathematicians mad was the realisation that the likes of “root two” and pi exist, and that there is an infinity of such numbers we can never know. Another issue was the gaps created by integer multiplication: the smaller the integers, the smaller the gaps between their multiples. Eventually it was realised that there was an advantage to this, in that it scaled. The result in computers is floating-point numbers. They work well for many things, but not for addition and subtraction of small values with large values.
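A quick Python illustration of that last point about mixing small and large values:

```python
import math

# At 1e16 the gap between adjacent doubles is 2.0, so adding 1.0 changes
# nothing: the small value is absorbed.
big = 1.0e16
assert big + 1.0 == big

vals = [1.0e16, 1.0, -1.0e16]
assert sum(vals) == 0.0        # naive left-to-right sum loses the 1.0
assert math.fsum(vals) == 1.0  # compensated summation recovers it
```

math.fsum tracks the rounding error of each partial sum, which is one standard way around the problem when the magnitudes in play differ wildly.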
As has been mentioned, LLMs are in reality no different from “Digital Signal Processing” (DSP) systems in their fundamental algorithms. One of these is “Multiply and ADd” (MAD) using integers. These have issues in that values disappear or can not be calculated. With continuous signals the errors can be integrated in with little distortion. In LLMs they can cause errors that are part of what has been called “hallucinations”. That is where something with meaning to a human, such as the name of the Pokemon trading-card character “Solidgoldmagikarp”, gets mapped to an entirely unrelated word, “distribute” – thus mayhem resulted on GPT-3.5, and much hilarity once it became widely known.
Purdue University mathematics professor Clarence Waldo was only at the Indiana Statehouse to lobby for the school during budget talks in February of 1897. That’s when he happened to witness House Bill 246 – to legally change the value of the number pi to 3.2 – pass its third and final reading in the General Assembly’s lower house.
Waldo resolved to make sure the Senate didn’t make the same embarrassing mistake, privately coaching several senators on how to speak against the bill. At the same time, newspapers outside the state were picking up the story, correctly making fun of Indiana legislators for being so easily hoodwinked.
Sen. Orrin Hubbel of Elkhart County took the lead in trying to kill the bill when it reached the floor of the Senate, calling it “utter folly” and stating he and his colleagues “might as well try to legislate water to run up hill as to establish mathematical truth by law,” according to a report in the Indianapolis Journal.
Thankfully, the bill died before coming to a vote, but that was due more to Waldo’s lobbying and the negative publicity than any principled opposition based on basic mathematical knowledge.