so-VITS-svc training error in Colab
INFO:48k:{'train': {'log_interval': 200, 'eval_interval': 1000, 'seed': 1234, 'epochs': 10000, 'learning_rate': 0.0001, 'betas': [0.8, 0.99], 'eps': 1e-09, 'batch_size': 12, 'fp16_run': False, 'lr_decay': 0.999875, 'segment_size': 17920, 'init_lr_ratio': 1, 'warmup_epochs': 0, 'c_mel': 45, 'c_kl': 1.0, 'use_sr': True, 'max_speclen': 384, 'port': '8001'}, 'data': {'training_files': 'filelists/train.txt', 'validation_files': 'filelists/val.txt', 'max_wav_value': 32768.0, 'sampling_rate': 48000, 'filter_length': 1280, 'hop_length': 320, 'win_length': 1280, 'n_mel_channels': 80, 'mel_fmin': 0.0, 'mel_fmax': None}, 'model': {'inter_channels': 192, 'hidden_channels': 192, 'filter_channels': 768, 'n_heads': 2, 'n_layers': 6, 'kernel_size': 3, 'p_dropout': 0.1, 'resblock': '1', 'resblock_kernel_sizes': [3, 7, 11], 'resblock_dilation_sizes': [[1, 3, 5], [1, 3, 5], [1, 3, 5]], 'upsample_rates': [10, 8, 2, 2], 'upsample_initial_channel': 512, 'upsample_kernel_sizes': [16, 16, 4, 4], 'n_layers_q': 3, 'use_spectral_norm': False, 'gin_channels': 256, 'ssl_dim': 256, 'n_speakers': 0}, 'spk': {}, 'model_dir': './logs/48k'}
2023-02-18 14:55:41.659435: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 AVX512F FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
DEBUG:tensorflow:Falling back to TensorFlow client; we recommended you install the Cloud TPU client directly with pip install cloud-tpu-client.
2023-02-18 14:55:42.619057: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia
2023-02-18 14:55:42.619210: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia
2023-02-18 14:55:42.619232: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
DEBUG:h5py._conv:Creating converter from 7 to 5
DEBUG:h5py._conv:Creating converter from 5 to 7
DEBUG:h5py._conv:Creating converter from 7 to 5
DEBUG:h5py._conv:Creating converter from 5 to 7
DEBUG:root:Initializing MLIR with module: _site_initialize_0
DEBUG:root:Registering dialects from initializer <module 'jaxlib.mlir._mlir_libs._site_initialize_0' from '/usr/local/lib/python3.8/dist-packages/jaxlib/mlir/_mlir_libs/_site_initialize_0.so'>
DEBUG:jax._src.path:etils.epath found. Using etils.epath for file I/O.
INFO:numexpr.utils:NumExpr defaulting to 2 threads.
INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0
INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes.
/usr/local/lib/python3.8/dist-packages/torch/utils/data/dataloader.py:554: UserWarning: This DataLoader will create 8 worker processes in total. Our suggested max number of worker in current system is 2, which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.
  warnings.warn(_create_warning_msg(
Traceback (most recent call last):
? File "train.py", line 281, in <module>
? ? main()
? File "train.py", line 48, in main
? ? mp.spawn(run, nprocs=n_gpus, args=(n_gpus, hps,))
? File "/usr/local/lib/python3.8/dist-packages/torch/multiprocessing/spawn.py", line 240, in spawn
? ? return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
? File "/usr/local/lib/python3.8/dist-packages/torch/multiprocessing/spawn.py", line 198, in start_processes
? ? while not context.join():
? File "/usr/local/lib/python3.8/dist-packages/torch/multiprocessing/spawn.py", line 160, in join
? ? raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException:?
-- Process 0 terminated with the following error:
Traceback (most recent call last):
? File "/usr/local/lib/python3.8/dist-packages/torch/multiprocessing/spawn.py", line 69, in _wrap
? ? fn(i, *args)
? File "/content/so-vits-svc/train.py", line 88, in run
? ? net_g = DDP(net_g, device_ids=[rank])? # , find_unused_parameters=True)
? File "/usr/local/lib/python3.8/dist-packages/torch/nn/parallel/distributed.py", line 657, in __init__
? ? _sync_module_states(
? File "/usr/local/lib/python3.8/dist-packages/torch/distributed/utils.py", line 136, in _sync_module_states
? ? _sync_params_and_buffers(
? File "/usr/local/lib/python3.8/dist-packages/torch/distributed/utils.py", line 154, in _sync_params_and_buffers
? ? dist._broadcast_coalesced(
RuntimeError: The size of tensor a (256) must match the size of tensor b (0) at non-singleton dimension 1