platform_system_core/metricsd
Alex Deymo de4497ef3d Fix UploadService unittests.
The unittest was creating a brillo::Daemon in a smart pointer and then
resetting it to a newly created different brillo::Daemon instance. This
creates two brillo::Daemon instances for a short period of time, both of
which set up global context (the default message loop).

This patch fixes this situation by deleting the service before creating
a new one.

Bug: 27121855
TEST=`/data/nativetest/metricsd_tests/metricsd_tests` on edison-eng

Change-Id: I3b60cbfcdba77febbfa6104aa0ecad60a29afec5
2016-02-10 11:57:55 -08:00
| File | Latest commit | Date |
| --- | --- | --- |
| aidl/android/brillo/metrics | metrics_collector: add libmetricscollectorservice binder service | 2015-12-10 13:34:10 -08:00 |
| collectors | metricsd: Removing brillo::Any from weaved's public APIs | 2016-01-28 14:44:28 -08:00 |
| etc/weaved/traits | Fix metricsd to conform to the lastest server schema | 2015-12-16 12:23:14 -08:00 |
| include/metrics | metricsd: Replace scoped_ptr with unique_ptr. | 2015-12-22 09:18:04 -08:00 |
| uploader | Fix UploadService unittests. | 2016-02-10 11:57:55 -08:00 |
| .clang-format | Add BinPackArguments: false to .clang-format | 2016-01-25 17:46:21 -08:00 |
| Android.mk | Unit tests: Compile only on 'eng' builds. | 2016-02-04 14:17:46 -08:00 |
| OWNERS | metricsd: Rename metrics to metricsd. | 2015-08-10 15:26:01 -07:00 |
| README.md | Fix documentation. | 2016-01-06 17:19:30 -08:00 |
| WATCHLISTS | metricsd: Rename metrics to metricsd. | 2015-08-10 15:26:01 -07:00 |
| c_metrics_library.cc | metrics: Remove user action logic. | 2015-12-04 13:04:23 -08:00 |
| constants.h | metricsd: Persist the metrics to disk periodically. | 2016-01-12 15:54:11 -08:00 |
| libmetrics-369476.gyp | libmetrics: Update BASE_VERS to 369476 | 2016-01-20 15:49:35 -08:00 |
| libmetrics.gypi | core: Rename libchromeos into libbrillo | 2015-10-27 11:46:37 -07:00 |
| libmetrics.pc.in | metricsd: Rename metrics to metricsd. | 2015-08-10 15:26:01 -07:00 |
| metrics.gyp | core: Rename libchromeos into libbrillo | 2015-10-27 11:46:37 -07:00 |
| metrics_client.cc | metricsd: Allows introspecting the metrics. | 2015-12-04 13:14:58 -08:00 |
| metrics_collector.cc | crash_reporter: Fix paths for kernel panic collection | 2016-02-02 16:14:52 -08:00 |
| metrics_collector.h | metricsd: Removing brillo::Any from weaved's public APIs | 2016-01-28 14:44:28 -08:00 |
| metrics_collector.rc | metricsd: Removing brillo::Any from weaved's public APIs | 2016-01-28 14:44:28 -08:00 |
| metrics_collector_main.cc | crash_reporter/metricsd: Update libchrome APIs to r369476 | 2016-01-20 13:43:30 -08:00 |
| metrics_collector_service_client.cc | metrics_collector: add libmetricscollectorservice binder service | 2015-12-10 13:34:10 -08:00 |
| metrics_collector_service_impl.cc | Remove metrics collector service trampoline | 2016-01-05 15:48:04 -08:00 |
| metrics_collector_service_impl.h | Remove metrics collector service trampoline | 2016-01-05 15:48:04 -08:00 |
| metrics_collector_test.cc | metricsd: Removing brillo::Any from weaved's public APIs | 2016-01-28 14:44:28 -08:00 |
| metrics_library.cc | metricsd: Allows introspecting the metrics. | 2015-12-04 13:14:58 -08:00 |
| metrics_library_test.cc | metricsd: Log over binder. | 2015-12-03 17:01:27 -08:00 |
| metricsd.rc | metricsd: Removing brillo::Any from weaved's public APIs | 2016-01-28 14:44:28 -08:00 |
| metricsd_main.cc | metricsd: Persist the metrics to disk periodically. | 2016-01-12 15:54:11 -08:00 |
| persistent_integer.cc | metricsd: Specify directory for persistent integers. | 2015-11-30 13:34:49 -08:00 |
| persistent_integer.h | metricsd: Specify directory for persistent integers. | 2015-11-30 13:34:49 -08:00 |
| persistent_integer_mock.h | metricsd: Specify directory for persistent integers. | 2015-11-30 13:34:49 -08:00 |
| persistent_integer_test.cc | metricsd: Replace scoped_ptr with unique_ptr. | 2015-12-22 09:18:04 -08:00 |
| timer.cc | metricsd: Replace scoped_ptr with unique_ptr. | 2015-12-22 09:18:04 -08:00 |
| timer_test.cc | metricsd: Replace scoped_ptr with unique_ptr. | 2015-12-22 09:18:04 -08:00 |

README.md

Metricsd

The metricsd daemon gathers metrics from the platform and applications, aggregates them, and periodically uploads them to a server. The aggregated metrics are then available to developers for analysis.

Three components are provided to interact with metricsd: libmetrics, metrics_collector and metrics_client.

The Metrics Library: libmetrics

libmetrics is a small library that implements the basic C++ API for metrics collection. All metrics collection is funneled through this library. The easiest and recommended way for a client-side module to collect user metrics is to link libmetrics and use its APIs to send metrics to metricsd for transport to UMA. In order to use the library in a module, you need to do the following:

  • Add a dependency on the shared library in your Android.mk file: LOCAL_SHARED_LIBRARIES += libmetrics

  • To access the metrics library API in the module, include the <metrics/metrics_library.h> header file.

  • The API is documented in metrics_library.h. Before using the API methods, a MetricsLibrary object needs to be constructed and initialized through its Init method.

  • Samples are uploaded only if the /data/misc/metrics/enabled file exists.

Server Side

You will be able to see all uploaded metrics on the metrics dashboard, accessible via the developer console.

Note: it usually takes a day for metrics to be available on the dashboard.

The Metrics Client: metrics_client

metrics_client is a simple shell command-line utility for sending histogram samples and querying metricsd. It's installed under /system/bin on the target platform and uses libmetrics.

For usage information and command-line options, run metrics_client on the target platform or look for "Usage:" in metrics_client.cc.

The Metrics Daemon: metricsd

metricsd is the daemon that listens for metrics logging calls (via Binder), aggregates the metrics and uploads them periodically. This daemon should start as early as possible so that dependent daemons can log at any time.

metricsd is made of two threads that work as follows:

  • The binder thread listens for one-way Binder calls, aggregates the metrics in memory (via base::StatisticsRecorder) and increments the crash counters when a crash is reported. This thread is kept as simple as possible to maximize throughput.
  • The uploader thread periodically backs up the metrics to disk (to avoid losing metrics on crashes), collects metadata about the client (version number, channel, etc.) and uploads the metrics to the server.

The Metrics Collector: metrics_collector

metrics_collector is a daemon that runs in the background on the target platform, gathers health information about the system and maintains long-running counters (e.g., the number of crashes per week).

The recommended way to generate metrics data from a module is to link and use libmetrics directly. However, we may not want to add a dependency on libmetrics to some modules (e.g., the kernel). In this case, we can add a collector to metrics_collector that will, for example, take measurements and report them periodically to metricsd (this is the case for the disk utilization histogram).

FAQ

What should my histogram's |min| and |max| values be set to?

You should set the values to a range that covers the vast majority of samples that would appear in the field. Note that samples below the |min| will still be collected in the underflow bucket and samples above the |max| will end up in the overflow bucket. Also, the reported mean of the data will be correct regardless of the range.

How many buckets should I use in my histogram?

You should allocate as many buckets as necessary to perform proper analysis on the collected data. Note, however, that the memory allocated in metricsd for each histogram is proportional to the number of buckets. Therefore, it is strongly recommended to keep this number low (e.g., 50 is normal, while 100 is probably high).

When should I use an enumeration (linear) histogram vs. a regular (exponential) histogram?

Enumeration histograms should really be used only for sampling enumerated events and, in some cases, percentages. Normally, you should use a regular histogram with exponential bucket layout that provides higher resolution at the low end of the range and lower resolution at the high end. Regular histograms are generally used for collecting performance data (e.g., timing, memory usage, power) as well as aggregated event counts.

How can I test that my histogram was reported correctly?

  • Make sure no error messages appear in logcat when you log a sample.
  • Run metrics_client -d to dump the currently aggregated metrics. Your histogram should appear in the list.
  • Make sure that the aggregated metrics were uploaded to the server successfully (check for an OK message from metricsd in logcat).
  • After a day, your histogram should be available on the dashboard.