Interesting numerical problem from Google: post your implementations

karkhaz · 2014-08-08 21:28:28

From this problem sheet (warning: links to an image):

Consider a function which, for a given whole number n, returns the number of ones required when writing out all numbers between 0 and n. For example, f(13) = 6. Notice that f(1) = 1. What is the next largest n such that f(n) = n?

I thought that this was an interesting problem I get 199,981 for the next such n, followed by 1,599,990.

I implemented the problem in (ugly) C and (sweet) OCaml, and was pleasantly surprised to find that the ML version was faster than C. Who has a nicer/faster C implementation than mine, or an implementation in your favourite functional language that beats out my ML implementation?

#include <stdio.h>
#include <stdlib.h>
#include <math.h>

#define MAX_DIGITS 10

int main(){
  unsigned n = 2;                                                                         

  while(1){ // For each n
    unsigned ones_up_to_n = 0; 
    unsigned curr;

    // For current n, curr gets each int up to n
    // Then we find the number of 1s in curr, and add them onto
    // `ones_up_to_n'.
    for(curr = 0; curr <= n; curr++){
      // Make array with digits of number
      // i.e. 123 => [0,0,0,0,0,0,0,0,1,2,3]
      unsigned str[MAX_DIGITS] = {0}; 
      long     denom = (long)pow(10, MAX_DIGITS - 1);
      unsigned temp_n = curr;
      unsigned i;
      for(i = 0; i < MAX_DIGITS; i++){
        unsigned div = temp_n / denom;
        temp_n -= (div * denom);
        str[i] = div; 
        denom /= 10;
      }    
      // Count 1s
      for(i = 0; i < MAX_DIGITS; i++) 
        if(str[i] == 1) ones_up_to_n++;
    }    
    if(n == ones_up_to_n){
      printf("\rFound n: %d\n", n);
      exit(0);
    }    
    n += 1;
  }
  return 0;

open Printf

let ones_in n =
  let str = string_of_int n in 
  let len = String.length str in
  let rec ones_in str idx total =
    if idx = len
    then total
    else begin
      match str.[idx] with 
        | '1' -> ones_in str (idx + 1) (total + 1) 
        |  _  -> ones_in str (idx + 1) total
      end in
    ones_in str 0 0

let ones_up_to n =
  let rec ones_up_to n total =
    match n with 
      | 0 -> total
      | m -> ones_up_to (m - 1) (total + (ones_in m)) in
    ones_up_to n 0

let _ =
  let n = ref 2 in 
    while true do
      let ones = ones_up_to !n in
        if ones = !n                                                                      
        then (printf "\r%15d\n%!" !n; exit 0)
        else n := (!n + 1) 
    done;

What I found interesting is how f(x) varies with x. f(x) stays lower than x (as you might expect) until it equalises at 199,981. However, the slope of f(x) / x becomes much steeper when x is a number beginning with 1, as you can see from the below graph of f(x) over x...the green line is y = x, and the red line is y = f(x).

Note the sudden increase in steepness at x=10,000, and also at x=100,000. In addition, if the first digit of x is not a 1, but the second digit is, then you can see little 'jumps' in the steepness of the graph.

Last edited by karkhaz (2014-08-08 21:34:21)

falconindy · 2014-08-08 22:36:43

karkhaz wrote:

I thought that this was an interesting problem I get 199,981 for the next such n, followed by 1,599,990.

Hrmm, but 199,982 uses one more 1, and thus is a candidate. And, all the numbers between 199,982 and 200,001 are candidates too. This seems to be a common occurrence for larger values of N, too.

Here's my C contribution:

#include <stdio.h>

int f(int n) {
  int ones = 0;

  while (n) {
    if (n % 10 == 1)
      ++ones;

    n /= 10;
  }

  return ones;
}

int main(void) {
  int n = 0, total = 0;

  for (;;) {
    ++n;

    total += f(n);;

    if (n == total)
      printf("fn(%d) = %d\n", n, total);
  }

  return 0;
}

Compiled at -O3, this finds 117463825 as a candidate in 3.7s (3.3ghz CPU)

Last edited by falconindy (2014-08-08 22:38:26)

karkhaz · 2014-08-08 23:09:26

falconindy wrote:

Hrmm, but 199,982 uses one more 1, and thus is a candidate. And, all the numbers between 199,982 and 200,001 are candidates too. This seems to be a common occurrence for larger values of N, too.

You're right. What I meant was that, given that these `blocks' of candidates occur, the next block starts at 1,599,981.

falconindy wrote:

Here's my C contribution:

*...slow clap* Way way faster than my solution, thanks! And simpler too.

Trent · 2014-08-09 20:53:34

I'm not at my main machine, but there is a faster way to calculate f(n) for large n rather than summing it up as you go. Consider the pattern
f(9) = 1
f(99) = 10 + 10
f(999) = 100 + 100 + 100
...

which can be proven to extrapolate to any number of nines, and consider how to calculate separately the number of ones that are in the ones' place, those that are in the tens' place, those that are in the hundreds' place, and so on, without having to count them individually. I would post my proof of concept code but it's just way too ugly and I'm dealing with a weird keyboard layout (qwerty) so I'm not willing to try to fix it now.

Granted, if you have to walk over all the positive integers anyway it's probably cheaper to use an accumulator like falconindy's solution does.

karkhaz · 2014-08-10 17:03:42

Trent wrote:

Granted, if you have to walk over all the positive integers anyway it's probably cheaper to use an accumulator like falconindy's solution does.

Indeed, but your observation is useful if you want to calculate f(n) for a particular n without finding all the previous ones first. Here's my code based on your observation:

#include <math.h>

long ones_in(long n){
  long ones = 0; 

  while(n){
    if(n % 10 == 1) ++ones;
    n /= 10;
  }
  return ones;
}

long f(long n){
  unsigned digits = 0; 
  long temp_n = n; 
  while(temp_n){
    digits++;
    temp_n /= 10;
  }

  // The largest 100...000 smaller than n
  long pow_ten = (long)pow(10, digits - 1);

  // f(the largest 999...999 smaller than n)                                              
  long ones = (digits - 1) * (pow_ten / 10); 

  // Then calculate the remainder of the ones from pow_ten upwards
  long i;
  for(i = pow_ten; i <= n; i++){
    ones += ones_in(i);
  }

  return ones;
}

This code calculates f(1000000000000000000) instantly. I can't go much higher than that number, the program spins for some reason (limits on the size of long?).

Edit: Also, here's a new ML implementation based on falconindy's idea. It's about as fast as falconindy's C code on my machine.

open Printf

let ones_in n =
  let rec ones_in n total =
    match n with 
      | 0 -> total
      | _ -> begin match n mod 10 with 
        | 1 -> ones_in (n / 10) (total + 1) 
        | _ -> ones_in (n / 10) total
       end in
    ones_in n 0

let _ =
  let n = ref 1 in 
  let ones = ref 0 in 
    while true do
      ones := !ones + (ones_in !n); 
      if !ones = !n 
      then printf "%12d\n%!" !n                                                           
      else ();
      n := (!n + 1) 
    done;

Last edited by karkhaz (2014-08-18 00:44:21)

skottish · 2014-08-16 19:22:41

Here's a Haskell version. It could be a little cleaner, but it's not bad. It runs at ~30 seconds up to 1111111110 (compiled with -O2 -fllvm). To put that in perspective, falconindy's posted version (32 bit ints) runs ~22 seconds here. Using longs (64 bit ints) with his code, it's runs about ~26 seconds. When using arbitrary length integers on my Haskell version -- which is the problem domain -- things come out to ~360 seconds*:

import Control.Monad

ones :: Int -> Int -> Int
ones n acc
    | n == 0  = acc
    | n /= 0  = if digit == 1
                then ones next (acc + 1)
                else ones next acc
    where
        (next, digit) = n `quotRem` 10

matches :: Int -> Int -> [Int]
matches n a
    | n == acc  = n : matches (n + 1) acc
    | n /= acc  = matches (n + 1) acc
    where
        acc = a + ones n 0

main :: IO ()
main = mapM_ (\x -> print x) $ matches 1 0

* Integer division for arbitrary length integers is a major bottleneck in Haskell (at least).

**EDIT**

I had to know that if for arbitrary length integers if it was simply faster to convert the numbers to strings and count the 1s. It is. This blob runs at ~315 seconds. Admittedly, it will fail with the first number with 9223372036854775808 1's in it as 'length' only returns 64 bit ints.

import Control.Monad

ones :: Integer -> Integer -> [String]
ones n a
    | n == acc  = enn : ones (n + 1) acc
    | n /= acc  = ones (n + 1) acc
    where
        enn = show n
        acc = a + (fromIntegral $ length $ filter (== '1') $ enn)

main :: IO ()
main = mapM_ (\x -> putStrLn x) $ ones 1 0

Last edited by skottish (2014-08-16 22:25:50)

karkhaz · 2014-08-17 17:50:58

skottish wrote:

Integer division for arbitrary length integers is a major bottleneck in Haskell (at least).

So consider switching to Caml Here's my arbitrary-length integer version, and it gets to 1111111110 in about 38 seconds. Your second Haskell program (using strings) took about 340s on my machine.

open Printf
open Big_int

let ten = big_int_of_int 10
let one = unit_big_int
let zero = zero_big_int

let ones_in n =
  let rec ones_in n total =
    if eq_big_int n zero
    then total
    else let (quot, rem) = quomod_big_int n ten in
      if eq_big_int rem one
      then ones_in quot (succ_big_int total)
      else ones_in quot total in
    ones_in n zero

let _ =
  let rec f_all n ones_so_far =
    let new_ones = add_big_int ones_so_far (ones_in n) in
      begin
        if eq_big_int new_ones n
        then printf "%d\n%!" (int_of_big_int n)
        else ();
        f_all (succ_big_int n) new_ones
      end in
    f_all zero zero

My rubbish first attempt in the first post used strings as well, but it was slower than your version in Haskell. I still find OCaml simpler to read then Haskell though, and I need to learn Haskell at some point soon, yuck.

Last edited by karkhaz (2014-08-17 17:58:05)

skottish · 2014-08-17 19:29:35

@karkhaz,

What compiler options are you using to compile your OCaml code? With just ocamlopt, your regular int version is running over four minutes on my end.

OCaml seems like a pretty cool language. I wouldn't mind messing around with it so more.

My Haskell is still pretty novice. I'm going to try to get this one more performant.

**EDIT**

Just for fun, this is a still slow, but totally funtioning Clojure version:

(ns cl.core
  (:gen-class)
  (:require [clojure.java.io :as io :only (reader)]))

(defn ones
  ([n] (ones n 0))
  ([n acc]
    (let [digit (rem n 10)
          next-num (quot n 10)]
      (if (== n 0)
        acc
        (if (== 1 digit)
          (recur next-num (inc acc))
          (recur next-num acc))))))

(defn matches
  ([] (matches 1 0))
  ([n a]
    (let [acc (+ (ones n) a)]
      (if (== n acc)
        (println n))
        (recur (inc n) acc))))

(defn -main
  []
  (matches))

Last edited by skottish (2014-08-17 19:36:44)

karkhaz · 2014-08-17 20:22:18

Oh wow, eating humble pie at the moment. When I thought I was running my arbitrary-int Caml code, I was actually running your regular-int Haskell code. *facepalm* That's why it was so fast, I thought that couldn't be right.

Here's the scoreboard so far:

Time to reach 1111111110
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
falconindy C:                  00m 31s
skottish regular-int Haskell:  00m 36s
karkhaz regular int Ocaml:     03m 35s
skottish big-int Haskell:      04m 46s
karkhaz big-int Ocaml:         43m 57s

I'm relatively new to ML. AFAIK, there are no optimisation flags that you can pass to ocamlopt. If you want to build my big-int version, you'll need to link in the nums library: ocamlopt -o ml_big_int nums.cmxa ml_big_int.ml.

I can't figure out how to run the Clojure code (never used Clojure before). I did

lein new cloj
cd cloj

Then copied your code into src/cloj/core.clj, and then started lein repl in the cloj directory and typed (-main), but it is "Unable to resolve symbol". The code looks cute though.

Based on trent's observation, I wonder if this problem could be fruitfully multi-threaded. Given that we can trivially calculate the answer for 10, 100, 1000,... then it might be worth spawning some threads with those numbers as their 'starting point' and then each thread carries on from there. I'll do an implementation at some point, but I bet it would be cute if done in Erlang or some other concurrency-oriented language.

skottish · 2014-08-17 21:34:23

karkhaz wrote:

I can't figure out how to run the Clojure code (never used Clojure before). I did
lein new cloj
cd cloj
Then copied your code into src/cloj/core.clj, and then started lein repl in the cloj directory and typed (-main), but it is "Unable to resolve symbol". The code looks cute though.

On the top of my core.clj, change the namespace line to:

(ns cloj.core

and see if that helps. If everything runs, go into the project directory and use:

lein uberjar

The version that will run will be in ./target/uberjar and is the "standalone" one. Run it with "java -jar". If the uberjar sub-directory isn't there, then your version of leiningen is getting old. If that's the case, then the jars will be simply be under "target" (I think). I am running openjdk8 with Clojure 1.6.

Based on trent's observation, I wonder if this problem could be fruitfully multi-threaded. Given that we can trivially calculate the answer for 10, 100, 1000,... then it might be worth spawning some threads with those numbers as their 'starting point' and then each thread carries on from there. I'll do an implementation at some point, but I bet it would be cute if done in Erlang or some other concurrency-oriented language.

If I understand what's happening with the code that you wrote with this algorithm before, I have some observations. First off, I ported it to Haskell to see if I could use it. I quickly realized that I was going to end up doing more work with that version that what we've been using. The reason why is that even if an accumulator is implemented, I would still have to transverse nearly every single number (the only exceptions would be when the upper limit is an even power of 10). Even though for a single number that is way faster, adding the extra math while still having to transverse nearly every number will slow it down. I may be wrong about this, but I couldn't see how to escape it.

Last edited by skottish (2014-08-17 21:36:15)

Trent · 2014-08-17 23:43:14

There is an algorithm that calculates f(n) directly (without calculating f(n-1) first, for all values of n) in O(log n) time. I spent a while inventing it last week, left it on a desk in another state and really don't feel like working it out again from scratch right now. But it certainly does exist, and could be used to multithread the original problem.

If somebody else cares to figure out what it was and post it before I get around to it, I won't complain :-)

Last edited by Trent (2014-08-17 23:44:36)

karkhaz · 2014-08-18 00:28:01

skottish wrote:

The reason why is that even if an accumulator is implemented, I would still have to transverse nearly every single number (the only exceptions would be when the upper limit is an even power of 10). Even though for a single number that is way faster, adding the extra math while still having to transverse nearly every number will slow it down. I may be wrong about this, but I couldn't see how to escape it.

You don't need to traverse nearly every single number...given n, you need to find the largest x = 100...000 that is smaller than n, and then traverse every number from x up to n. Trent's observation was that you can trivially calculate f(x) (well, Trent did it using x - 1), and so you don't have to traverse numbers up to x.

Here's a solution using multi-processing, with Trent's observation. It's a bit faster than falconindy's solution on my dual-core machine. It forks a new process for each order of magnitude (so a process to find candidates between 1-10, another process for candidates between 10-100, and so on).

#include <math.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <stdlib.h>
#include <sys/wait.h>

// Look for solutions with up to N_THREADS digits
#define N_THREADS 10

long ones_in(long n){
  long ones = 0;
  while(n){
    if(n % 10 == 1) ones++;
    n /= 10;
  }
  return ones;
}

int main(){
  pid_t pids[N_THREADS] = {0};

  for(int exp = 0; exp < N_THREADS; exp++){
    long start_n = pow(10, exp);
    long end_n = pow(10, exp + 1);

    pid_t pid = fork();
    if(pid != 0){  // top-level parent process
      pids[exp] = pid;
      continue;
    }

    // child worker process
    long ones = (start_n / 10) * exp;     // f(999...999)

    for(long n = start_n; n < end_n; n++){
      ones += ones_in(n);
      if(ones == n) fprintf(stderr, "%ld\n", n);
    }

    exit(0);
  }

  int status;
  for(int i = 0; i < N_THREADS; i++)
    waitpid(pids[i], &status, 0);
}

Note that this will print the numbers out-of-order, so don't kill it when it prints out 1111111110. You need to wait for it to print out 535200001, which is an order of magnitude smaller but takes longer to calculate.

Time to reach 1111111110
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
karkhaz multiprocess C:        00m 24s
falconindy C:                  00m 31s
skottish regular-int Haskell:  00m 36s
karkhaz regular int Ocaml:     03m 35s
skottish big-int Haskell:      04m 46s
karkhaz big-int Ocaml:         43m 57s

Trent, I had a half-formed algorithm in my mind which would work in O(log n) time (if it works at all); perhaps it might be similar to what you were thinking of. The general idea is that I start at the largest 100...000 lower than n, as before, and then 'zoom in' to n in large steps rather than one at a time (using binary partitioning).

Anyway, the code I wrote in post #5 (based on your idea) for working out a particular n is already super-fast even for huge n. We'd have to use abitrary-size ints in Haskell or whatever to demonstrate the advantages of an O(log n) algorithm

Last edited by karkhaz (2014-08-18 00:41:32)

Trent · 2014-08-18 12:37:06

Here is more or less the algorithm I was thinking of. It's actually not that different in concept from the "zooming in" idea, but it works inside out: first it counts all the ones that would be in the ones' place when written out, then all the ones that would be in the tens' place, etc. whereas your proposed method would start with the highest order of magnitude and get closer by degrees. This algorithm works by partitioning the number at each step into three pieces like so:

 1 0 1(4)    # 101 "ones", plus 1 because 4 > 1
 1 0(1)4     # 10*10 "ones", plus 4+1 because 1 = 1
 1(0)1 4     # 1*100 "ones", plus nothing because 0 < 1
(1)0 1 4     # 0*1000 "ones", plus 14+1 because 1 = 1

At each step the digit in (parens) gets prepended to the remainder. The remainder is only used in calculation when the digit under examination is exactly 1.

The following is a linear recursive implementation in Python. It calculates f(2**3300) on my machine without breaking a sweat or detectable latency. Much larger than that and I run into recursion depth limits. Tail call optimization would fix that.

def f(n, place=1, rmdr=0, accum=0):
	if n == 0: return accum
	new_n, digit = divmod(n, 10)
	tmp = place*new_n
	if digit == 1:
		tmp += rmdr + 1
	elif digit > 1:
		tmp += place
	return f(new_n, place*10, digit*place + rmdr, accum + tmp)

Considering arbitrary precision arithmetic, I think this is actually O(log^2 n). It's still not going to beat a straightforward counting solution for enumerating all of them, but perhaps there's some extra cleverness that could be applied, like sampling f(n) every so often and singling out for closer examination intervals where (f(n) - n) changes sign.

Here's a puzzler for the mathematicians: Are there infinitely many n > 0 such that f(n) = n? I suspect there are, and furthermore I suspect there are infinitely many n > 0 such that f(n) != n, but I couldn't prove either assertion.

(edit) Hmm, on reflection, I'm not sure whether that makes sense or not. Still interesting to think about.

Last edited by Trent (2014-08-18 12:48:41)

tavianator · 2014-08-18 15:36:56

Trent wrote:

It's still not going to beat a straightforward counting solution for enumerating all of them, but perhaps there's some extra cleverness that could be applied, like sampling f(n) every so often and singling out for closer examination intervals where (f(n) - n) changes sign.

The general idea would be to try galloping search to find a and b such that (f(a) - a) and (f(b) - b) have opposite sign, then bisect the interval repeatedly to find zeros.

Here's a puzzler for the mathematicians: Are there infinitely many n > 0 such that f(n) = n? I suspect there are, and furthermore I suspect there are infinitely many n > 0 such that f(n) != n, but I couldn't prove either assertion.

If there are infinitely many n such that f(n) = n then there are also infinitely many such that f(n) != n because there are many intervals where f(n) grows faster than n (whenever n has multiple 1s in it).

Xyne · 2014-08-18 16:59:55

I have been thinking about this off and on since I saw this thread yesterday. Here's my algorithm using slopes to jump ahead. My count_ones function counts the total number of "1"s for a given number without cumulation.

The algorithm is fresh-baked and unrefined. I expect that the next slope at each jump could be chosen more efficiently. There are probably also some numerical tricks that can be used to speed up some calculations that I have done naively at the moment.

I have used unsigned ints for now. At some point I hit an integer overflow error which could be avoided using unsigned long ints. I have changed the algorithm since then so I'm not sure if I still hit the overflow or if the higher orders just take too much time. I haven't bothered debugging it yet.

Note that within an order of magnitude n (10^n), the rate of increase of "1"s ranges from n to 0 but averages to n/10, which means that for some x above n=10, the "1"s accumulate faster than than the numbers increase. The total number of solutions is therefore finite.

Here's my code (v8)

#include <stdlib.h>
#include <stdio.h>

#define MIN(a, b) (((a) < (b)) ? (a) : (b))
#define MAX(a, b) (((a) > (b)) ? (a) : (b))

#define NEXT(x, y, k) ((unsigned long int) ((((k)*(x)) - (y)) / ((k) - 1)))




/*
  Count all "1"s required to write the sequence of numbers up to the given
  number.

  The number of "1"s in 10^n is n(10^n)+1 (+1 for the leading "1").
*/
unsigned long int
count_ones(
  unsigned long int n,
  unsigned int * order,
  unsigned int * digit
)
{
  unsigned long int ones = 0, remainder = 0, t = 0;
  *order = -1;

  while (n)
  {
    ++(*order);
    *digit = (n % 10);
    if (t)
    {
      ones += *digit * (*order * t);
      t *= 10;
    }
    else
    {
      t = 1;
    }
    if (*digit == 1)
    {
      ones += remainder + 1;
    }
    else if (*digit > 1)
    {
      ones += t;
    }
    remainder += *digit * t;
    n /= 10;
  }

  return ones;
}



int
main(int argc, char * * argv)
{
  double x, y, k;
  unsigned long int x0, y0, x1, y1 = 0, next_range, previous_range = 1;
  unsigned int order = 0, previous_order = 0, digit;
  int i;


  // Code for testing the count_ones function.
  if (argc > 1)
  {
    for (i = 1; i < argc; i++)
    {
      x0 = atoi(argv[i]);
      printf("%lu -> %lu\n", x0, count_ones(x0, &order, &digit));
    }
    return EXIT_SUCCESS;
  }



  /*
    Print out the first solution here and skip to the first order of magnitude
    to reduce the number of checks needed (k > 1).
  */
  printf("1\n");
  x0 = 10;

  /*
    After the order reaches 10, the number of "1"s grows too fast for the number
    to catch up.
  */
  while (order <= 10)
  {
    y0 = count_ones(x0, &order, &digit);

    /*
      Print the solution and check if the next number is also a solution.
    */
    if (x0 == y0)
    {
      printf("%lu\n", x0);
      x0++;
      continue;
    }
    /*
      Because the "1"s accumulate (i.e. the sequence is non-decreasing), no
      value of x0 below y0 at this point can have as many "1"s. Jump ahead to
      y0.
    */
    else if (y0 > x0)
    {
      x0 = y0;
      continue;
    }


    /*
      The next range is the next sequence of numbers with a leading "1". This
      range will increase the maximum slope by 1.
    */
    next_range = previous_range;
    // ">=" to reach the next order, i.e. "(order + 1) > previous_order"
    while (order >= previous_order)
    {
      next_range *= 10;
      previous_order++;
    }
    previous_range = next_range;

    x = (double) x0;
    y = (double) y0;
    /*
      The maximum slope is the order + 1 because this is the maximum number of
      "1"s that may appear in the number of that order. Here the slope is set to
      the order and the +1 is conditionally added below if the current number
      begins with "1".
    */
    k = (double) order;


    /*
      If we are current in a range of numbers beginning with "1" and x is less
      than f(x) then f(x) is increasing at least as fast as x. Jump ahead to
      the next number with a leading "2".
    */
    if (digit == 1)
    {
      next_range /= 5;
      k += 1.0;
    }


    // Use the slope to find the next possible candidate in the current range.
    x1 = NEXT(x, y, k);
    // Jump to the next range if it begins before the candidate.
    x1 = MIN(x1, next_range);
    // Make sure that x0 always increases.
    x0++;
    x0 = MAX(x0, x1);
  }


  // Print out information about where the algorithm stopped.
  printf("\nStopped at %lu.\n", x0);

  y0 = count_ones(x0, &order, &digit);
  y1 = y0;
  if (y1 > x0)
  {
    // +1 to ensure that an order of 10 implies a difference >10^10.
    y1 -= (x0 + 1);
    order = 0;
    while (y1 /= 10)
    {
      order++;
    }
    if (order >= 10)
    {
      printf(
        "f(%lu) - %lu =\n"
        "%lu - %lu =\n"
        "%lu >\n"
        "10^10 => No more solutions are possible.\n",
        x0, x0,
        y0, x0,
        y0 - x0
      );
      return EXIT_SUCCESS;
    }
  }

  printf("More solutions may be possible (increase the maximum order to find them).\n");

  return EXIT_SUCCESS;
}

Compiled with

gcc -Wall -O3...

Results:

$ time ./a.out 
1
199981
199982
199983
199984
199985
199986
199987
199988
199989
199990
200000
200001
1599981
1599982
1599983
1599984
1599985
1599986
1599987
1599988
1599989
1599990
2600000
2600001
13199998
35000000
35000001
35199981
35199982
35199983
35199984
35199985
35199986
35199987
35199988
35199989
35199990
35200000
35200001
117463825
500000000
500000001
500199981
500199982
500199983
500199984
500199985
500199986
500199987
500199988
500199989
500199990
500200000
500200001
501599981
501599982
501599983
501599984
501599985
501599986
501599987
501599988
501599989
501599990
502600000
502600001
513199998
535000000
535000001
535199981
535199982
535199983
535199984
535199985
535199986
535199987
535199988
535199989
535199990
535200000
535200001
1111111110

Stopped at 125521283796.
f(125521283796) - 125521283796 =
171546411754 - 125521283796 =
46025127958 >
10^10 => No more solutions are possible.

real	0m0.001s
user	0m0.000s
sys	0m0.000s

The time varies between 1 and 4 ms.

Btw, if you pass it numbers on the command line then it will print out the number of "1"s for each given number.

edit 1
I have tweaked the algorithm and updated the code and results. The code now uses unsigned long ints.

If fails to detect some numbers so the algorithm obviously needs further tweaking.

edit 2
Previous errors were due to floating point rounding errors. Changing floats to doubles along with another tweak to the algorithm seems to have fixed it. Can someone confirm that it finds all solutions?

edit 3
v6: stopping data

edit 4
v7: simplified code, added comments, removed debugging statements

edit 5
v8: reduced iteration by adjusting k value based on first digit

Last edited by Xyne (2014-08-25 18:49:43)

skottish · 2014-08-18 19:49:50

Xyne wrote:

Can someone confirm that it finds all solutions?

Your output matches up with mine.

Xyne · 2014-08-18 20:17:52

skottish wrote:

Xyne wrote:
Can someone confirm that it finds all solutions?
Your output matches up with mine.

Thanks!

Trent · 2014-08-18 22:56:07

Xyne wrote:

Note that within an order of magnitude n (10^n), the rate of increase of "1"s ranges from n to n/10, which means that for some x above n=10, the "1"s accumulate faster than than the numbers increase. The total number of solutions is therefore finite.

While this makes logical sense and agrees with everything I've found so far, it's not sufficient evidence because there are an infinite number of intervals over which the slope is 0. (For example, [23456789098765432, 23456789098765440] is such an interval.) Knowing the average rate of accumulation over an order of magnitude isn't enough to tell you that f(n) is never equal to n over that interval. But I guess the conclusion is correct regardless.

Xyne · 2014-08-19 02:00:20

Trent wrote:

Xyne wrote:
Note that within an order of magnitude n (10^n), the rate of increase of "1"s ranges from n to n/10, which means that for some x above n=10, the "1"s accumulate faster than than the numbers increase. The total number of solutions is therefore finite.
While this makes logical sense and agrees with everything I've found so far, it's not sufficient evidence because there are an infinite number of intervals over which the slope is 0. (For example, [23456789098765432, 23456789098765440] is such an interval.) Knowing the average rate of accumulation over an order of magnitude isn't enough to tell you that f(n) is never equal to n over that interval. But I guess the conclusion is correct regardless.

I disagree. The average rate over order n is n/10. The only thing missing is to show that at some point (f(x) - x) > 10^10. After that point, regardless of however many intervals of 0 slope there are, they are not long enough to close the difference of 10^10. For every subsequent interval of 10^10 numbers ([x, x+10^10]) at least 10^10 "1"s will accumulate. The numbers cannot catch up. v6 of my script includes stopping data which shows that this condition is met:

...
535199990
535200000
535200001
1111111110

Stopped at 108265749976.
f(108265749976) - 108265749976 =
126782229975 - 108265749976 =
18516479999 >
10^10 => No more solutions are possible.

Trent · 2014-08-19 11:02:47

Xyne wrote:

I disagree. The average rate over order n is n/10. The only thing missing is to show that at some point (f(x) - x) > 10^10. After that point, regardless of however many intervals of 0 slope there are, they are not long enough to close the difference of 10^10. For every subsequent interval of 10^10 numbers ([x, x+10^10]) at least 10^10 "1"s will accumulate. The numbers cannot catch up.

Ok, I'm happy now

Xyne · 2014-08-19 11:44:22

I am clearly not making the most of my time, but I have made some notes in case anyone is interested. It is not necessary to find f(x) for which f(x) - x > 10^10. Its existence is ensured by the average rate surpassing 1.

edit: cleaned up after the brainfart that led to "notes in anyone is interesting".

Last edited by Xyne (2014-08-19 17:05:36)

Trilby · 2014-08-19 12:36:24

FINAL EDIT: please ignore this post. It turns out it's complete nonsense that is evidence of only two things: 1) my embarrasment that I generally consider myself quite mathematically minded, yet post this; and 2) I should never ever try to tackle such problems before the morning coffee.
-----

Perhaps another way to conceptualize Xyne's solution is that in an infinite set of rational numbers each digit is represented uniformly at each place. In otherwords, the number '1' shows up as often in the ones place as '2' does and '3' does, etc. Specifically, each numeral shows up 1/10 times in each place. So for the set of all numbers greater than ten digits, there are more than ten places for each of these ten numerals to be distributed.

In fact, we should be able to conclude just frm this, that among 10 digit numbers, the total number of 1s used to write those numbers should be exactly equal to the number of numbers ~~(i.e. f(10^10) = 10^10)~~. [edit: I deleted this once when I second guessed it, then reconfirmed it: 10^10 distinct numbers, each with 10 digits (leading zeros are counted) and each place holding a '1' one out of ten times, so 10^10 numbers * 10 digits * 1/10 ones-per-digit = 10^10]

The further beyond 10 digits one goes, the faster this divergence of f(x) > x will be as there are more places in which 1s will be uniformly distributed.

Or, perhaps this is exactly how Xyne was conceptualizing it to start with and I'm just catching up.

EDIT: now I'm second guessing again. I'm convinced by the logic of this approach - but I realized the actual number would be different. The largest 10-digit number is not 10^10 of course, but 10^11 - 1. So it is f(10^11-1) that should be equal to 10^10.

Last edited by Trilby (2014-08-19 13:11:53)

Trent · 2014-08-23 19:54:18

I thought falconindy's solution was a good starting point for me to experiment a little with Rust. The following is a near-straight port of falconindy's C:

/// `ones(n)` calculates and returns the number of ones in the decimal
/// representation of the natural number `n`
fn ones(n: int) -> int {
	let mut tmp = n;
	let mut result = 0;
	while tmp > 0 {
		if tmp%10 == 1 {
			result += 1;
		}
		tmp /= 10;
	}
	return result;
}
	
fn main() {
	let mut total = 0;
	let mut n = 0;
	while n < 1111111110 {
		n += 1;
		total += ones(n);
		if n == total {
			println!("f({}) = {}", n, total);
		}
	}
}

Rust's int is 64 bits wide. I changed the types in each version (and added the termination condition to the C version) to do comparisons for four different integer types:

                 C   Rust  delta
  signed 32  29.98  31.37  +1.39 (4.6% slower)
  signed 64  35.51  38.46  +2.95 (8.3% slower)
unsigned 32  26.17  26.90  +0.73 (2.8% slower)
unsigned 64  33.80  32.36  -1.44 (4.3% faster)

I'm guessing the C compiler can make the signed versions faster partly by assuming that undefined behavior never happens, in a way that Rust isn't permitted to. I was more surprised by the fact that Rust with uint actually beats C with uint64_t.

I also wrote a linear recursive ones() function to see the compiler TCO it. This uint version runs in the same time as the above:

fn ones(n: uint, ac: uint) -> uint {
	if n < 1 {
		return ac;
	}
	let (q, d) = (n/10, n%10);
	let tmp = if d == 1 { 1 } else { 0 };
	return ones(q, ac + tmp);
}

I think it's possible to make it perform better, though. In any case, I'm satisfied that Rust's claim to zero-cost abstractions isn't all wind. Thanks for the interesting problem karkhaz!

Rust is under development, has bugs, may eat your laundry, etc. Programs were compiled with `gcc -O2` and `rustc --opt-level 2`.

skottish · 2014-08-23 22:35:02

Trent wrote:

Thanks for the interesting problem karkhaz!

No doubt! This is a fun one. And I'm really hoping that more people get involved with programming questions here; Stack Overflow, while great, is a little strict for my tastes.

Anyway, this Haskell version is way slower than my last one. The point being is that I implemented my own function that calculates all of the ones in one pass. It calculates the the top positive 64 bit int instantly:

***EDIT***

My last implementation of the 'ones' function was using exponents for the 10's and not simple multiplication. This one is 7x faster than the other and is no longer horrifyingly slow. It's still slower than my first version, but that's the next problem...

import Control.Monad

ones :: Int -> Int
ones n
  | n == 0     = 0 
  | n < 10     = 1
  | digit == 0 = go next 0 1 1
  | digit /= 0 = go next 1 1 1
  where
    (next, digit) = n `quotRem` 10
    num = n
    go n acc pass tens
      | n == 0 = acc
      | n /= 0 = go next (acc + result) (pass + 1) (tens * 10)      
      where
        (next, digit) = n `quotRem` 10
        p = pass * tens
        t = 10 * tens
        result
          | digit == 0 = 0
          | digit == 1 = p + num `rem` t + 1
          | digit /= 1 = p * digit + t

matches :: [Int]
matches = go 1
  where
    go n
      | n == 1111111111 = []
      | n == ones n = n : go (n + 1)
      | otherwise   = go (n + 1)

main :: IO ()
main = mapM_ (\x -> print x) $ matches

Last edited by skottish (2014-08-24 19:17:45)

karkhaz · 2014-08-24 02:28:39

skottish wrote:

Trent wrote:
Thanks for the interesting problem karkhaz!
No doubt! This is a fun one. And I'm really hoping that more people get involved with programming questions here; Stack Overflow, while great, is a little strict for my tastes.

Cheers. I haven't replied for a few days because I'm super-impressed with xyne's solution, and want to devote a few hours to trying to understand it. I'll just note that this problem came from some kind of parody entrance exam for job candidates at Google...if any of you are between jobs at the moment, you might want to send your CV and a link to this thread to the folks at Mountain View

edit: Just to make it clear, I'm not a head-hunter for google or anything. In case I implied that I was

Last edited by karkhaz (2014-08-24 02:29:41)

Arch Linux

#1 2014-08-08 21:28:28

Interesting numerical problem from Google: post your implementations

#2 2014-08-08 22:36:43

Re: Interesting numerical problem from Google: post your implementations

#3 2014-08-08 23:09:26

Re: Interesting numerical problem from Google: post your implementations

#4 2014-08-09 20:53:34

Re: Interesting numerical problem from Google: post your implementations

#5 2014-08-10 17:03:42

Re: Interesting numerical problem from Google: post your implementations

#6 2014-08-16 19:22:41

Re: Interesting numerical problem from Google: post your implementations

#7 2014-08-17 17:50:58

Re: Interesting numerical problem from Google: post your implementations

#8 2014-08-17 19:29:35

Re: Interesting numerical problem from Google: post your implementations

#9 2014-08-17 20:22:18

Re: Interesting numerical problem from Google: post your implementations

#10 2014-08-17 21:34:23

Re: Interesting numerical problem from Google: post your implementations

#11 2014-08-17 23:43:14

Re: Interesting numerical problem from Google: post your implementations

#12 2014-08-18 00:28:01

Re: Interesting numerical problem from Google: post your implementations

#13 2014-08-18 12:37:06

Re: Interesting numerical problem from Google: post your implementations

#14 2014-08-18 15:36:56

Re: Interesting numerical problem from Google: post your implementations

#15 2014-08-18 16:59:55

Re: Interesting numerical problem from Google: post your implementations

#16 2014-08-18 19:49:50

Re: Interesting numerical problem from Google: post your implementations

#17 2014-08-18 20:17:52

Re: Interesting numerical problem from Google: post your implementations

#18 2014-08-18 22:56:07

Re: Interesting numerical problem from Google: post your implementations

#19 2014-08-19 02:00:20

Re: Interesting numerical problem from Google: post your implementations

#20 2014-08-19 11:02:47

Re: Interesting numerical problem from Google: post your implementations

#21 2014-08-19 11:44:22

Re: Interesting numerical problem from Google: post your implementations

#22 2014-08-19 12:36:24

Re: Interesting numerical problem from Google: post your implementations

#23 2014-08-23 19:54:18

Re: Interesting numerical problem from Google: post your implementations

#24 2014-08-23 22:35:02

Re: Interesting numerical problem from Google: post your implementations

#25 2014-08-24 02:28:39

Re: Interesting numerical problem from Google: post your implementations

Board footer