9  eicu数据安装

9.1 下载数据

wget -r -N -c -np --user hulinhui --ask-password https://physionet.org/files/eicu-crd/2.0/

开始下载时间:2024-02-02 09:32:00 CST

FINISHED 2024-02-02 12:02:13
Total wall clock time: 2h 30m 49s
Downloaded: 34 files, 5.1G in 2h 30m 28s (594 KB/s)

9.2 装载数据

9.2.1 下载github code

github网址:https://github.com/mit-lcp/eicu-code

export GH_TOKEN=your_gh_token
gh repo clone MIT-LCP/eicu-code

9.2.2 安装postgres数据库

9.2.3 成功实践

  • 创建eicu_crd模式(SCHEMA)
psql "dbname=hulihuihong user=hulihuihong options=--search_path=eicu_crd" -v ON_ERROR_STOP=1 -c "CREATE SCHEMA eicu_crd AUTHORIZATION hulihuihong;"
  • 查看文件是否完整
cd eicu-code/build-db/postgres

make eicu-check-gz datadir=/home/hulihuihong/Books/phd/physionet.org/files/eicu-crd/2.0/
  • 修改Makefile的默认设置
make initialize DBUSER=myusername DBPASS=mypassword

这个方法会报错,因为内置代码限定了postgres用户的原因。采用手动更改:

DBNAME := hulihuihong
DBUSER := hulihuihong
DBPASS := mypassowrd
DBSCHEMA := eicu_crd  # 这个不用改
  • 装载数据并添加索引,检测完整性
make eicu-gz datadir=/home/hulihuihong/Books/phd/physionet.org/files/eicu-crd/2.0/

开始载入时间:2024-02-02 22:41:30 CST
完成载入时间:2024-02-02 23:11:15 CST

  • 查看模式(schema)
psql
\dn
 List of schemas
      Name       |    Owner    
-----------------+-------------
 eicu_crd        | hulihuihong
 mimiciv_derived | hulihuihong
 mimiciv_hosp    | hulihuihong
 mimiciv_icu     | hulihuihong
 public          | postgres
(5 rows)
  • 查看模型里的数据表
\dt eicu_crd.*
 List of relations
  Schema  |           Name            | Type  |    Owner    
----------+---------------------------+-------+-------------
 eicu_crd | admissiondrug             | table | hulihuihong
 eicu_crd | admissiondx               | table | hulihuihong
 eicu_crd | allergy                   | table | hulihuihong
 eicu_crd | apacheapsvar              | table | hulihuihong
 eicu_crd | apachepatientresult       | table | hulihuihong
 eicu_crd | apachepredvar             | table | hulihuihong
 eicu_crd | careplancareprovider      | table | hulihuihong
 eicu_crd | careplaneol               | table | hulihuihong
 eicu_crd | careplangeneral           | table | hulihuihong
 eicu_crd | careplangoal              | table | hulihuihong
 eicu_crd | careplaninfectiousdisease | table | hulihuihong
 eicu_crd | customlab                 | table | hulihuihong
 eicu_crd | diagnosis                 | table | hulihuihong
 eicu_crd | hospital                  | table | hulihuihong
 eicu_crd | infusiondrug              | table | hulihuihong
 eicu_crd | intakeoutput              | table | hulihuihong
 eicu_crd | lab                       | table | hulihuihong
 eicu_crd | medication                | table | hulihuihong
 eicu_crd | microlab                  | table | hulihuihong
 eicu_crd | note                      | table | hulihuihong
 eicu_crd | nurseassessment           | table | hulihuihong
 eicu_crd | nursecare                 | table | hulihuihong
 eicu_crd | nursecharting             | table | hulihuihong
 eicu_crd | pasthistory               | table | hulihuihong
 eicu_crd | patient                   | table | hulihuihong
 eicu_crd | physicalexam              | table | hulihuihong
 eicu_crd | respiratorycare           | table | hulihuihong
 eicu_crd | respiratorycharting       | table | hulihuihong
 eicu_crd | treatment                 | table | hulihuihong
 eicu_crd | vitalaperiodic            | table | hulihuihong
 eicu_crd | vitalperiodic             | table | hulihuihong
(31 rows)
  • 查看表占据的空间和行数
SELECT schemaname AS "Schema Name",
       relname AS "Table Name",
       n_live_tup AS "Number of Rows",
       pg_size_pretty(pg_total_relation_size(schemaname || '.' || relname)) AS "Total Size"
FROM pg_stat_user_tables
WHERE schemaname = 'eicu_crd'
ORDER BY pg_total_relation_size(schemaname || '.' || relname) DESC;
 Schema Name |        Table Name         | Number of Rows | Total Size 
-------------+---------------------------+----------------+------------
 eicu_crd    | vitalperiodic             |      146608245 | 17 GB
 eicu_crd    | nursecharting             |      151613417 | 14 GB
 eicu_crd    | lab                       |       39120770 | 4727 MB
 eicu_crd    | vitalaperiodic            |       25068372 | 2865 MB
 eicu_crd    | nurseassessment           |       15602197 | 2557 MB
 eicu_crd    | intakeoutput              |       12024190 | 1907 MB
 eicu_crd    | respiratorycharting       |       20165942 | 1583 MB
 eicu_crd    | physicalexam              |        9211101 | 1517 MB
 eicu_crd    | nursecare                 |        8308346 | 1315 MB
 eicu_crd    | medication                |        7303894 | 776 MB
 eicu_crd    | treatment                 |        3688386 | 562 MB
 eicu_crd    | diagnosis                 |        2710569 | 448 MB
 eicu_crd    | careplangeneral           |        3115120 | 387 MB
 eicu_crd    | note                      |        2253525 | 360 MB
 eicu_crd    | infusiondrug              |        4803770 | 344 MB
 eicu_crd    | admissiondrug             |         874931 | 324 MB
 eicu_crd    | pasthistory               |        1149180 | 273 MB
 eicu_crd    | admissiondx               |         626858 | 131 MB
 eicu_crd    | allergy                   |         251949 | 131 MB
 eicu_crd    | apachepatientresult       |         297064 | 85 MB
 eicu_crd    | respiratorycare           |         865381 | 79 MB
 eicu_crd    | patient                   |         200859 | 76 MB
 eicu_crd    | careplangoal              |         504139 | 68 MB
 eicu_crd    | careplancareprovider      |         502765 | 59 MB
 eicu_crd    | apachepredvar             |         171177 | 43 MB
 eicu_crd    | apacheapsvar              |         171177 | 39 MB
 eicu_crd    | microlab                  |          16996 | 1512 kB
 eicu_crd    | careplaninfectiousdisease |           8056 | 1088 kB
 eicu_crd    | careplaneol               |           1433 | 200 kB
 eicu_crd    | customlab                 |           1082 | 128 kB
 eicu_crd    | hospital                  |            208 | 56 kB
(31 rows)
  • 查询mimic和eicu的数据容量
SELECT CONCAT(ROUND(SUM(pg_total_relation_size(c.oid)) / (1024.0 * 1024.0 * 1024.0), 3), 'GB') AS total_size
FROM pg_class c
JOIN pg_namespace n ON c.relnamespace = n.oid
WHERE n.nspname IN ('mimic_hosp', 'mimic_icu', 'eicu_crd') AND c.relkind = 'r';
total_size  
-------------
 51.291GB
(1 row)