You are not logged in.
Pages: 1
Hi guys,
One of my test case takes approximately 10 times more time than it should be. I tried to use gprof to find where is the bottleneck, and get the following thing.
time ./test_rintegral.out
... <skipped output>...
./test_rintegral.out 29.79s user 0.08s system 99% cpu 30.058 total
Around 30 seconds.
While the output of profile
Flat profile:
Each sample counts as 0.01 seconds.
% cumulative self self total
time seconds seconds calls s/call s/call name
20.53 0.42 0.42 6812301 0.00 0.00 grid_lin_log_table_eval_function_derivative
15.46 0.74 0.32 6808185 0.00 0.00 power_free_bound_eval
14.01 1.03 0.29 6810796 0.00 0.00 d_free_coulomb_eval_all
11.59 1.27 0.24 6810796 0.00 0.00 general_dirac_free_eval
10.63 1.50 0.22 6810796 0.00 0.00 general_dirac_bound_eval
10.39 1.71 0.21 6810796 0.00 0.00 d_free_coulomb_tail_eval
7.25 1.86 0.15 6791217 0.00 0.00 bound_artificial_eval
3.38 1.93 0.07 6809490 0.00 0.00 integrant_change_variables_eval
3.38 2.00 0.07 1306 0.00 0.00 quad_romberg_2
0.97 2.02 0.02 1 0.02 0.02 general_dirac_bound_init_bound_artificial
0.48 2.03 0.01 1305 0.00 0.00 power_free_bound_eval_const
0.48 2.04 0.01 2 0.01 0.01 general_function_init_integrant_change_variables
0.48 2.05 0.01 1 0.01 0.02 rintegral_power_fb_const
0.48 2.06 0.01 d_free_coulomb_eval
0.24 2.06 0.01 1 0.01 0.01 general_grid_init_grid_lin_log_table
0.24 2.07 0.01 d_free_coulomb_tail_conjugate_eval_asymptotic
0.00 2.07 0.00 19579 0.00 0.00 d_power_mexp_eval
0.00 2.07 0.00 2811 0.00 0.00 grid_lin_log_table_eval
0.00 2.07 0.00 1502 0.00 0.00 grid_lin_log_eval_function_derivative
0.00 2.07 0.00 1502 0.00 0.00 solve_linear_log_eq
0.00 2.07 0.00 1500 0.00 0.00 analityc_inner_fb_eval_0_0_0
0.00 2.07 0.00 1500 0.00 0.00 general_inner_eval
0.00 2.07 0.00 1500 0.00 0.00 test_rinner_compare_given
0.00 2.07 0.00 1306 0.00 0.00 rintegral_power_fb_origin_without_power
0.00 2.07 0.00 1303 0.00 0.00 bisect_find_index
0.00 2.07 0.00 1303 0.00 0.00 cubic_spline_eval
0.00 2.07 0.00 4 0.00 0.00 grid_lin_log_eval_inverse
0.00 2.07 0.00 2 0.00 0.00 power_series_eval
0.00 2.07 0.00 2 0.00 0.00 rintegral_power_fb_asymptotic_without_power
0.00 2.07 0.00 1 0.00 0.00 bound_artificial_init
0.00 2.07 0.00 1 0.00 0.00 cubic_spline_calc
0.00 2.07 0.00 1 0.00 0.00 cubic_spline_ialloc
0.00 2.07 0.00 1 0.00 0.00 cubic_spline_ifree
0.00 2.07 0.00 1 0.00 0.00 cubic_spline_init
0.00 2.07 0.00 1 0.00 0.00 d_free_coulomb_init_k
0.00 2.07 0.00 1 0.00 0.00 d_free_coulomb_tail_init_finish
0.00 2.07 0.00 1 0.00 0.00 general_dirac_free_init_d_free_coulomb
...<skipped the rest. The thing is that cumulative time stuck on 2.07>...
Therefore I get that, gprof thinks that cumulative time of execution is 2 seconds.
What do I miss?
Offline
My guess is that the 6 M calls * 8 functions = 48 M function calls take some time .
Maybe try some inlining or -fomit-frame-pointer or eliminating some calls...
Offline
Thats obvious dealing with this right now.
The question is why gprof shows that cumulative time is only 2s, while it takes 30s to execute program.
Last edited by ogronom (2010-06-04 17:25:49)
Offline
Maybe the flat profile doesn't take calling time into account? Have you tried --graph?
(I've never used gprof.)
Edit: I don't think what I wrote makes any sense. I'll stop here .
Last edited by stqn (2010-06-04 18:37:06)
Offline
By default gprof generates both of them. (Values of time are the same). I just posted flat one to get the idea.
Offline
gprof doesn't provide "inclusive" timing, to my knowledge, which means that time spent in system calls is largely unaccounted for.
I usually use "valgrind --tool=callgrind --separate-recs=10 --simulate-cache=yes --dump-instr=yes --collect-jumps=yes" for profiling, and kcachegrind to view the data. A major benefit is that it doesn't require the code to be compiled with any particular set of flags, although "-g" is helpful.
Offline
Thanks
Offline
Pages: 1